Learning COBOL with Examples - Part 2: FILE I/O
The previous post in this series covered the basics of the COBOL programming language. This one covers an essential part of COBOL: file input/output (I/O). I find it really interesting, as I haven't seen any other language that makes it this easy to read a file record by record and split each record into fields almost automatically. I suspect this is the reason COBOL exists, since most business applications did exactly that a lot.
Nowadays there is tooling for almost everything in COBOL: you can write web services, query databases with SQL and do loads more, as we will see in future posts.
In programming, files are basically containers for input or output data. On Linux, or Unix systems in general, files can also be pipes or sockets, which can be used to communicate with other programs. In this post I will only discuss opening, reading, writing and closing known files; using system calls to find or list files is not part of this post.
There are basically four operations you want to perform on files. Opening a file tells the operating system to find the file on disk and make it accessible to your application. Internally, the so-called file descriptor of that file is added to the list of file descriptors of your process. In languages like C, this file descriptor is what you pass to syscalls such as read. Closing a file tells the OS that we are finished: any content still buffered can be flushed to disk, and work with this file is done.
Reading and writing do exactly what the names suggest. A read gets some data from the file and makes it available in variables or memory regions, whereas a write takes your variables or a memory region and transfers the data into the file at the current position of the file pointer. Writing is usually not immediate but deferred, i.e. the operation is put off until enough data has accumulated to be written to disk.
The example program
Take this example program I wrote a while ago. It reads a file containing my worked hours per day, reformats the data and calculates the sum of the worked hours. It is an extremely simple program, but it is still loads of text in COBOL.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. wrkhrs.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           select hrs assign to "hours.txt"
               organization is line sequential.
           select hrs-out assign to "hours.out"
               organization is line sequential.

       DATA DIVISION.
       FILE SECTION.
       fd hrs.
       01 hrs-record.
           02 d PIC IS Z9.
           02 FILLER PIC IS X.
           02 dayname PIC IS XX.
           02 FILLER PIC IS X.
           02 hours PIC IS XX.
           02 FILLER PIC IS X(6).
           02 descr PIC IS X(50).
       fd hrs-out.
       01 hrs-out-rec.
           02 dayname PIC IS XX.
           02 a PIC IS X(2).
           02 d PIC IS XX.
           02 b PIC IS X(7).
           02 hours PIC IS XX.
           02 c PIC IS X(15).
           02 descr PIC IS X(50).
       01 total-rec.
           02 s PIC IS X(14).
           02 total PIC IS X(3).

       WORKING-STORAGE SECTION.
       01 ls.
           02 d PIC IS 99.
           02 dayname PIC IS XX.
           02 hours PIC IS 99.
           02 descr PIC IS X(50).
       01 nmbr PIC IS 9(3).

       PROCEDURE DIVISION.
       bg.
           OPEN INPUT hrs.
           OPEN OUTPUT hrs-out.
           MOVE zeroes TO nmbr.
           MOVE ", " TO a of hrs-out-rec.
           MOVE " Sep.: " TO b of hrs-out-rec.
           MOVE " hrs., Report: " TO c of hrs-out-rec.
       reed.
           READ hrs RECORD AT END go to e.
           MOVE CORR hrs-record TO ls.
           ADD hours OF ls TO nmbr.
           MOVE CORR ls TO hrs-out-rec.
           WRITE hrs-out-rec.
           go to reed.
       e.
           MOVE " Hours total: " TO s OF total-rec.
           MOVE nmbr TO total OF total-rec.
           WRITE total-rec.
           CLOSE hrs-out.
           CLOSE hrs.
           STOP RUN.
The environment division contains the input-output section, which in turn holds the file-control paragraph. Here the two files are declared, one for reading and one for writing.
           select hrs assign to "hours.txt"
               organization is line sequential.
Here the name hrs is chosen, by which the file will be referred to in the remainder of the program. The name is assigned to the path hours.txt, and the file is organized as lines, i.e. every record is terminated by a newline character. This also tells COBOL that the file is text-based and not binary. Strictly speaking, line sequential names the file organization, meaning records are stored and processed one after the other; the other organizations are plain sequential, relative and indexed, and the access pattern itself is set with a separate ACCESS MODE clause (sequential, random or dynamic).
Both files are declared identically here, as COBOL makes no distinction between input and output files in this part of the program.
The data division contains two sections: the file section and the working-storage section. You should already know the working-storage section if you read the previous post, that is.
In the file section, records are defined. In my eyes this is a special feature of COBOL, as it lets you define data structures explicitly for working with files and I/O. There are three data structures defined here.
       fd hrs.
       01 hrs-record.
           02 d PIC IS Z9.
           02 FILLER PIC IS X.
           02 dayname PIC IS XX.
           02 FILLER PIC IS X.
           02 hours PIC IS XX.
           02 FILLER PIC IS X(6).
           02 descr PIC IS X(50).
The data structure hrs-record is defined for use with the file hrs. The definition of the variables and fields is identical to what it would be in the working-storage section. The keyword FILLER, however, is something special. It serves as a placeholder: the space after the day number, for example, has to be part of the record, but you never want to access it. So I can use the "name" FILLER to let the compiler know that I will never access this field and don't want any name registered for it. This also means I can use FILLER more than once in my program.
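To see FILLER at work without any file involved, here is a minimal sketch. It assumes a GnuCOBOL-style compiler; the program name fillerdemo and the record demo-line are made up for this illustration and are not part of the example program.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. fillerdemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Hypothetical record, shaped like the start of hrs-record.
       01  demo-line.
           02 d        PIC XX.
           02 FILLER   PIC X.
           02 dayname  PIC XX.
           02 FILLER   PIC X.
           02 hours    PIC XX.
       PROCEDURE DIVISION.
      *> The two spaces land in the unnamed FILLER fields.
           MOVE "01 Fr 12" TO demo-line.
           DISPLAY "day=" d " name=" dayname " hours=" hours.
           STOP RUN.
```

The literal is moved into the whole group, so each named field picks up its slice of the text, while the separating spaces end up in fields that cannot be addressed by name.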
       fd hrs-out.
       01 hrs-out-rec.
           02 dayname PIC IS XX.
           02 a PIC IS X(2).
           02 d PIC IS XX.
           02 b PIC IS X(7).
           02 hours PIC IS XX.
           02 c PIC IS X(15).
           02 descr PIC IS X(50).
       01 total-rec.
           02 s PIC IS X(14).
           02 total PIC IS X(3).
The second file has two record layouts. Both are attached to the same file, so records of either layout can be written one after the other.
One thing to note is that the records defined here consist of characters only, which means we cannot do any calculations on them. You could in theory define certain fields as numeric. I always found it easier to keep a separate working copy when a program needs both input and output records, so I relied on the conversions COBOL performs when moving data between fields.
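The conversion I rely on can be sketched in isolation. This is a minimal illustration, assuming a GnuCOBOL-style compiler; the names convdemo, hours-text, hours-num and total are hypothetical and only mirror the roles of the character field in the file record and the numeric field in the working copy.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. convdemo.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> Character field, as it would arrive from the file.
       01  hours-text  PIC XX    VALUE "12".
      *> Numeric working copy that can be used in arithmetic.
       01  hours-num   PIC 99.
       01  total       PIC 9(3)  VALUE ZERO.
       PROCEDURE DIVISION.
      *> MOVE converts the two digit characters to a number.
           MOVE hours-text TO hours-num.
           ADD hours-num TO total.
           ADD hours-num TO total.
           DISPLAY "total=" total.
           STOP RUN.
```

This should display total=024. The conversion only works as long as the character field really contains digits, which is why the input file has to stick to its format.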
The working-storage section describes the variables used in the program. In this case there is nmbr, which is used for calculating the total number of hours worked in the month, and ls, which is basically the working copy of the input record.
       01 ls.
           02 d PIC IS 99.
           02 dayname PIC IS XX.
           02 hours PIC IS 99.
           02 descr PIC IS X(50).
       01 nmbr PIC IS 9(3).
The procedure division consists of three paragraphs. The bg paragraph is the beginning of the program; it opens the two files and initialises the data structures. reed reads the next record from the input file, does some reformatting and writes the output record to the output file. e is the end of the program and writes the total of the worked hours to the output file.
A file that has previously been declared under file-control in the environment division can be opened with the OPEN verb:
OPEN <mode> <file>.
For simplicity some arguments are omitted. The mode can be one of four:
INPUT -- the file can only be read from
OUTPUT -- the file can only be written to; it is created if not present, or emptied if present
I-O -- the file can be read from and written to
EXTEND -- the file can be written to; the file pointer is placed after the last logical record of the file. This mode can only be used with sequential access patterns.
In our case, one file is opened as input, one as output.
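The difference between OUTPUT and EXTEND is easiest to see by opening the same file several times. The following is a sketch under assumptions: a GnuCOBOL-style compiler, and a made-up file name opendemo.txt with hypothetical names demo-file and demo-rec.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. opendemo.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT demo-file ASSIGN TO "opendemo.txt"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  demo-file.
       01  demo-rec    PIC X(20).
       PROCEDURE DIVISION.
      *> OUTPUT creates the file, or empties an existing one.
           OPEN OUTPUT demo-file.
           MOVE "first line" TO demo-rec.
           WRITE demo-rec.
           CLOSE demo-file.
      *> EXTEND keeps the content and appends after it.
           OPEN EXTEND demo-file.
           MOVE "second line" TO demo-rec.
           WRITE demo-rec.
           CLOSE demo-file.
      *> INPUT only allows reading; two records are expected.
           OPEN INPUT demo-file.
           READ demo-file.
           DISPLAY demo-rec.
           READ demo-file.
           DISPLAY demo-rec.
           CLOSE demo-file.
           STOP RUN.
```

Running it should display both lines, the second one having been appended by the EXTEND step. Had the second OPEN used OUTPUT instead, the first line would have been lost.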
Reading from files
Reading from a file is as simple as calling READ followed by the file handle. However, the READ verb can do loads of things; for example, it can divert the control flow when the end of the file is reached, either by jumping to a different place or by setting a variable.
READ <file> [INTO <data-structure>] [AT END <statement>].
All records declared for a file share one record area, which READ fills. The optional INTO clause additionally copies the record that was read into another data structure, typically one in the working-storage section. AT END specifies the statement that should be executed when the end of the file is reached.
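As an illustration of the INTO clause, here is a minimal sketch, assuming a GnuCOBOL-style compiler. The file intodemo.txt and the names demo-file, demo-rec and work-copy are hypothetical; the program first writes one line itself so that it does not depend on an existing input file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. intodemo.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT demo-file ASSIGN TO "intodemo.txt"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  demo-file.
       01  demo-rec    PIC X(8).
       WORKING-STORAGE SECTION.
      *> Structured copy that READ ... INTO fills.
       01  work-copy.
           02 d        PIC XX.
           02 FILLER   PIC X.
           02 dayname  PIC XX.
           02 FILLER   PIC X.
           02 hours    PIC XX.
       PROCEDURE DIVISION.
      *> Create a one-line input file for the demonstration.
           OPEN OUTPUT demo-file.
           MOVE "01 Fr 12" TO demo-rec.
           WRITE demo-rec.
           CLOSE demo-file.
      *> Read the record and copy it into work-copy.
           OPEN INPUT demo-file.
           READ demo-file INTO work-copy
               AT END DISPLAY "file is empty"
           END-READ.
           DISPLAY dayname " worked " hours " hours".
           CLOSE demo-file.
           STOP RUN.
```

The record in the file section is a plain character field here, and the structure only exists in working-storage. The copy behaves like a MOVE of the whole record, so the layouts of both sides have to line up byte for byte.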
If you want to read a file until the end, one way is to use goto in the AT END clause of the READ verb. A different approach is using a PERFORM loop:
           MOVE 0 TO feof.
           PERFORM UNTIL feof > 0
               READ hrs RECORD
                   AT END
                       MOVE 1 TO feof
                   NOT AT END
      *>               do stuff with the record here, so that
      *>               nothing is processed at end of file
                       CONTINUE
               END-READ
           END-PERFORM.
In this case the variable feof needs to be declared in the working-storage section. PERFORM loops until the condition is true, i.e. until feof is bigger than 0, which happens as soon as the end of the file is found. This way one can at least avoid the use of goto. Please note that, within the loop, the only period follows END-PERFORM, as the sentence does not end until the PERFORM loop is finished.
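Put together, a complete read-until-end loop could look like the following sketch. It assumes a GnuCOBOL-style compiler, and the file loopdemo.txt as well as the names demo-file and demo-rec are made up for this illustration. The hours field is declared numeric directly in the record here, which differs from the example program, simply to keep the sketch short.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. loopdemo.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT demo-file ASSIGN TO "loopdemo.txt"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  demo-file.
       01  demo-rec.
           02 dayname  PIC XX.
           02 FILLER   PIC X.
           02 hours    PIC 99.
       WORKING-STORAGE SECTION.
       01  feof        PIC 9    VALUE 0.
       01  nmbr        PIC 9(3) VALUE ZERO.
       PROCEDURE DIVISION.
      *> Create a two-line input file for the demonstration.
           OPEN OUTPUT demo-file.
           MOVE "Fr 12" TO demo-rec.
           WRITE demo-rec.
           MOVE "Sa 32" TO demo-rec.
           WRITE demo-rec.
           CLOSE demo-file.
      *> Read until AT END sets the flag; no GO TO needed.
           OPEN INPUT demo-file.
           PERFORM UNTIL feof > 0
               READ demo-file
                   AT END
                       MOVE 1 TO feof
                   NOT AT END
                       ADD hours TO nmbr
               END-READ
           END-PERFORM.
           CLOSE demo-file.
           DISPLAY "Hours total: " nmbr.
           STOP RUN.
```

With the two records written above, this should display Hours total: 044. The NOT AT END branch ensures that the addition only runs when a record was actually read.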
Writing a record to file
Writing is just as simple: use WRITE followed by the name of the record to write, not the name of the file, as in WRITE hrs-out-rec. There are more parameters for the WRITE verb, but you will probably know when you need one.
Closing a file
Closing a file is done via the CLOSE verb, which takes the file handle as its parameter.
So what does it do?
The example program we have walked through takes an input file that is essentially formatted like this:
    01 Fr 12 hrs. Really good.
    02 Sa 32 hrs. Even better!
The program reformats it and prints the reformatted text alongside the total number of hours into an output-file:
    Fr, 01 Sep.: 12 hrs., Report: Really good.
    Sa, 02 Sep.: 32 hrs., Report: Even better!
     Hours total: 044
When I wrote this program it was apparently September. The program is of course of limited use, but nonetheless I extended it quite a bit, so that it even produces input files for gnuplot and creates reports of how long I've worked in a month.
In a way I find it quite interesting that COBOL gives me the tools to work with files as input and output this easily. I've not seen any other language that makes reading and writing files this easy for me. C structs would be similar, but there I have to find out myself where the newline character is, and that's a huge hassle. At least I have found one thing COBOL seems to be quite good at.