GHS BULLETIN

The Fundamentals of READ (DATA)

READ (DATA) is a very useful command, but it has a few quirks you need to understand before it will work for your intended purpose. Its basic power should be immediately evident: it gives you a quick and easy way to read data from a file into variables of your choice.

Reading a data file is first initiated by issuing a command line similar to the following:

READ (DATA) "mydata.txt" /DELIM:","

Running this command will tell GHS to look for the data file "mydata.txt". You must issue this command before trying to read data into variables. Why? Because without the above command GHS has no idea which data file you intend to be reading!

READ (DATA) gives users the option to specify their desired delimiter. The delimiter is the character that is used as a field separator in the data file. In the above example, we know that the data file "mydata.txt" is comma-separated, meaning each data value in a line is separated by a comma. This might look something like:

"Apple",4.566,7.89,3,56.7
"Orange",6.342,8.56,1,42.3

The /DELIM parameter accepts other delimiting characters, such as semicolons and dashes, as well as ASCII values. For example, a tab character is indicated by /DELIM:9. Knowing the delimiting character for your data file is very important because it tells READ (DATA) where each data field starts and stops.
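To make the role of the delimiter concrete, here is a rough Python analogy (not GHS syntax). The helper name split_fields is hypothetical, and the sketch assumes that an integer delimiter is treated as an ASCII code, in the same spirit as /DELIM:9.

```python
import csv

def split_fields(line, delim):
    """Split one line into fields; delim is a character or an ASCII code."""
    # An integer such as 9 is interpreted as an ASCII code (9 = tab).
    sep = chr(delim) if isinstance(delim, int) else delim
    # csv.reader also removes the quotation marks around quoted fields.
    return next(csv.reader([line], delimiter=sep))

comma_fields = split_fields('"Apple",4.566,7.89,3,56.7', ",")
tab_fields = split_fields("Apple\t4.566\t7.89", 9)
```

If the wrong delimiter is given, the whole line comes back as a single field, which is a quick way to spot a mismatch between the file and the setting.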

Once the file specification (and delimiter) are established, reading data is accomplished by issuing successive READ commands. For example:

VARI x
READ x

...would take the first field in the current line (in this example "Apple") and attempt to place it in the user variable x. This brings up a very important point: string data cannot be stored in just any user variable. To read string data (i.e. text or characters, as in this example), you must create a user variable of string type. This is a familiar idea to those with some programming experience; for everyone else, it simply means that the variable holds one specific type of data, namely strings. To create such a variable, you would use:

VARI (STRING) str

...where str is a variable that accepts string data. If you ever encounter a "Non-numeric data" error, as you would in this example, it is because you are trying to read string data into a regular user variable. Keep in mind that so-called numeric data can be read into a string variable, because the digits can be treated as ordinary characters rather than as a number. For this reason, when a data file mixes types, it is sometimes easiest to parse everything with string variables and then "convert" to regular variables once you are certain which fields are numeric and which are not. This conversion is essentially automatic: if you perform a numeric operation on a string variable that holds numeric text, the operation is valid.
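The "read everything as strings, then convert" approach can be sketched in Python as follows. This is only an analogy for the idea, not a description of how GHS converts values internally, and the function name to_number_if_possible is hypothetical.

```python
def to_number_if_possible(text):
    """Return text as a float when it is numeric, otherwise unchanged."""
    try:
        return float(text)
    except ValueError:
        # Non-numeric text (such as a name) stays a string.
        return text

# Every field is first held as a string, as if read into string variables.
raw_fields = ["Apple", "4.566", "7.89", "3", "56.7"]
converted = [to_number_if_possible(field) for field in raw_fields]
```

After the conversion, the first entry is still the string "Apple" and the remaining entries are numbers that can be used in arithmetic.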

Continuing with our example, now with a string variable, we look at how to parse all entries in a data line. There are several ways to do this, but the most general method is to parse with successive READ commands. So, for this example, we would issue:

READ str `assigns field entry "Apple" to str
READ str `assigns field entry 4.566 to str
READ str `assigns field entry 7.89 to str
READ str `assigns field entry 3 to str
READ str `assigns field entry 56.7 to str
READ str `assigns field entry "Orange" to str, the first field from the next line

As you can see, each READ command reads the next field in the line. When the end of the line is reached and there are no more fields, the next READ command moves to the first field of the next line. This is all well and good, but there are two problems here: 1) we overwrite the variable str with each call of READ, and 2) we may not know how many fields are on the line.

To address these problems, we use three very useful techniques: indexing, dynamic variable initialization and the EOL system variable. These techniques are implemented in a MACRO loop:

MACRO read_line
 SET j={j} PLUS 1
 VARI (STRING) field_{j}
 READ field_{j}
 IF eol THEN exit ELSE exit read_line
/

VARI j=0
.read_line

Let's look carefully at this macro. It is an example of a while loop, so called because the macro repeats, by exiting and then calling itself, while the EOL variable is not true (i.e. while EOL=0). Once EOL is true (EOL=1), the macro exits permanently. The EOL variable, as you might expect, is true only when the end of the line is reached, so the macro repeats through all fields in the line and no more.

During this looping process, we are using an indexing variable j to count each loop. Before calling the macro, we initialize this variable to 0 so that on the first loop it is increased to 1, then on the second loop 2, and so forth until the loop is exited. This also means that we will know the total number of fields in the line at the end of the macro loop, which can be very useful.

We want to store each field in its own variable so that no value is overwritten. To do this, we create a unique variable name on each loop, which is a form of dynamic variable initialization. The name is built by appending the index value held in j to a common prefix (in this case "field_"). With this technique, using the example data given above, we will create the following variables:

field_1 `assigned the entry "Apple"
field_2 `assigned the entry 4.566
field_3 `assigned the entry 7.89
field_4 `assigned the entry 3
field_5 `assigned the entry 56.7

We've now parsed the entire line, put each field into its own variable, and counted the number of fields in the line. Not bad for one macro!
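For readers who think more easily in a general-purpose language, the logic of the read_line macro can be mimicked in Python. This is a sketch under stated assumptions: the FieldReader class is a hypothetical stand-in for an open data file, its eol attribute imitates the EOL system variable, and the data is assumed to contain no empty lines.

```python
import csv

class FieldReader:
    """Hypothetical stand-in for a data file that is read field by field."""

    def __init__(self, lines, delim=","):
        self._rows = list(csv.reader(lines, delimiter=delim))
        self._row = 0
        self._col = 0
        self.eol = False  # true right after the last field of a line is read

    def read(self):
        """Return the next field, moving to the next line when one is used up."""
        field = self._rows[self._row][self._col]
        self._col += 1
        self.eol = self._col >= len(self._rows[self._row])
        if self.eol:
            self._row += 1
            self._col = 0
        return field

reader = FieldReader(['"Apple",4.566,7.89,3,56.7', '"Orange",6.342,8.56,1,42.3'])

fields = {}  # plays the role of the dynamically named field_{j} variables
j = 0
while True:
    j += 1                                  # count the field
    fields[f"field_{j}"] = reader.read()    # store it under a unique name
    if reader.eol:                          # stop at the end of the line
        break
```

When the loop ends, j holds the number of fields on the line (5 for the example data) and the dictionary holds field_1 through field_5. The next call to reader.read() would return "Orange", the first field of the following line.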

While the method described up to this point is the most general means to parse data in a line, it is not the only method. If you already know how many fields are in a line, and you also know their types, you can simply read the fields into individual variables in a single line:

VARI (STRING) field_1
VARI field_2,field_3,field_4,field_5

READ field_1,field_2,field_3,field_4,field_5

One may also skip remaining fields and move to the next line using the /SKIP parameter:

VARI (STRING) field_1
VARI field_2,field_3

READ field_1,field_2,field_3 /SKIP

Or, one may read an entire line into a single variable using the /LINE parameter (and move to the next line):

VARI (STRING) line

READ line /LINE `note that this preserves the "" surrounding the first field entry
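The three alternatives above (reading a known layout, skipping the rest of a line, and taking a whole line) can be compared side by side in a small Python analogy. Nothing here is GHS syntax; the variable names are hypothetical and the two lines of data are the example lines used earlier.

```python
import csv

lines = ['"Apple",4.566,7.89,3,56.7', '"Orange",6.342,8.56,1,42.3']
rows = list(csv.reader(lines))

# Known layout: one string field followed by four numeric fields.
name = rows[0][0]
values = [float(v) for v in rows[0][1:]]

# /SKIP analogy: keep the first three fields and ignore the rest of the line.
skip_name, skip_a, skip_b = rows[1][:3]

# /LINE analogy: take the raw line text, quotation marks included.
raw_line = lines[1]
```

Note how the field-by-field results have lost their quotation marks while raw_line still starts with "Orange" in quotes, which mirrors the comment on the /LINE example.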

All of the aforementioned methods can be useful depending on how much information you already know about the file being parsed. However, let us continue with our generalized version given by the macro read_line. This macro works well for a single line of data, but how might we parse all lines in a data file?

To parse all lines in a file, we modify the original macro and add a second macro:

MACRO read_line
 SET j={j} PLUS 1
 VARI (STRING) field_%1_{j} 
 READ field_%1_{j}
 IF eol THEN exit ELSE exit read_line
/

MACRO read_all_lines
 SET i={i} PLUS 1
 SET j=0
 .read_line {i}
 IF eof THEN exit ELSE exit read_all_lines
/ 

VARI j
VARI i=0
.read_all_lines

You will notice that we've appended another index to the field variables that we are creating. Now, the first index, %1, is the line number from the file, and the second index, j, is the field or column number like before. Notice that when read_line is called, we pass the line index, i, as the first parameter (%1) to the macro, so we are really indexing by i and j. Pretty nifty!

Also notice our use of the EOF system variable. This variable is true (EOF=1) when we reach the end of the file. Much like the read_line macro, the read_all_lines macro will loop until the end of the file is reached, thereby parsing each line and storing each field entry from each line in the entire file.

After running macro read_all_lines, we will have created the following variables:

field_1_1 `assigned the entry from line 1, column 1: "Apple"
field_1_2 `assigned the entry from line 1, column 2: 4.566
field_1_3 `assigned the entry from line 1, column 3: 7.89
field_1_4 `assigned the entry from line 1, column 4: 3
field_1_5 `assigned the entry from line 1, column 5: 56.7

field_2_1 `assigned the entry from line 2, column 1: "Orange"
field_2_2 `assigned the entry from line 2, column 2: 6.342
field_2_3 `assigned the entry from line 2, column 3: 8.56
field_2_4 `assigned the entry from line 2, column 4: 1
field_2_5 `assigned the entry from line 2, column 5: 42.3
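The two-index naming scheme can be reproduced in a few lines of Python, shown here only as an analogy to the pair of macros. The dictionary fields is a hypothetical substitute for the dynamically created GHS variables, with i as the line number and j as the column number.

```python
import csv

lines = ['"Apple",4.566,7.89,3,56.7', '"Orange",6.342,8.56,1,42.3']

fields = {}
# Outer loop: one pass per line (the role of read_all_lines).
for i, row in enumerate(csv.reader(lines), start=1):
    # Inner loop: one pass per field in the line (the role of read_line).
    for j, value in enumerate(row, start=1):
        fields[f"field_{i}_{j}"] = value
```

Looking up fields["field_2_1"] then gives "Orange", matching the list of variables above.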

After all parsing is complete, we must close the data file. This is accomplished by issuing the following command:

READ (DATA) OFF

It is very important that the data file being read is always closed when finished. Otherwise, you may run into errors if you try to access or modify the file later.
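The same discipline applies in most programming environments. As a loose Python analogy (not GHS syntax), a with-block plays the role of pairing the opening command with READ (DATA) OFF; the temporary file and its contents below are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway data file (hypothetical contents).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write('"Apple",4.566,7.89,3,56.7\n')

# The with-block closes the file when it ends, even if parsing raises an error.
with open(path) as data_file:
    first_line = data_file.readline().rstrip("\n")

file_is_closed = data_file.closed

# On some systems (e.g. Windows) deleting the file would fail if it were still open.
os.remove(path)
```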

Data Parsing Example

The following example outlines the development of a data file parsing run file for a very large and varied data file. The techniques discussed in this bulletin will be used, as well as a few clever tricks. The data file used is a large publicly available database of 2017 Vessel Characteristics (in this case towboats) as sourced from the US Army Corps of Engineers public data library. Visit the USACE page here (Accessed: 3/21/2019). Both the data file and example run file may be downloaded here. The run file includes line-by-line comments to explain the logical progression of commands and macros.

If you would like to see another bulletin created regarding a specific topic, please email Creative Systems, Inc. at support@ghsport.com.