TRANSREAD
Name
TRANSREAD
Author
Craig B. Markwardt, NASA/GSFC Code 662, Greenbelt, MD 20770
craigm@lheamail.gsfc.nasa.gov
Purpose
Parse a tabular ASCII data file or string array.
Calling Sequence
TRANSREAD, UNIT, VARi [, FORMAT=FORMAT] (first usage)
or
TRANSREAD, UNIT, VARi [, FORMAT=FORMAT], FILENAME=FILENAME (second usage)
or
TRANSREAD, STRINGARRAY, VARi [, FORMAT=FORMAT] (third usage)
Description
TRANSREAD parses an ASCII table into IDL variables, one variable
for each column in the table. The tabular data is not limited to
numerical values, and can be processed with an IDL FORMAT
expression or with a delimiter character.
TRANSREAD behaves similarly to READF/READS in that it transfers
ASCII input data into IDL variables. The difference is that
TRANSREAD reads more than one row in one pass, and returns data by
column. In a sense, it forms the *transpose* of the typical
output from READF or READS (which returns data by row), hence the
name TRANSREAD. [ TRANSREAD can parse up to 20 columns in its
current implementation, but that number can be easily increased. ]
TRANSREAD can optionally be provided with a FORMAT expression to
control the transfer of data. The usage is the same as for
READ/READF/READS. However, you may find that you need to slightly
modify your format statements to read properly. In this
implementation, variables are intermediately parsed with READS,
which appears from my experimentation to require at least a
default length for transfers.
Hence, you should use: ..., FORMAT='(D0.0,D0.0,I0)' ; GOOD
instead of: ..., FORMAT='(D,D,I)' ; BAD
As with the standard IDL READ-style commands, you need to supply
initial values to your variables before calling TRANSREAD, which
are used to determine the type. The dimensions of the variable
are not important; TRANSREAD will grow the arrays to an
appropriate size to accommodate the input. Lines from the input
which do not contain the correct number of columns or do not obey
the format statement are ignored.
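As a minimal sketch of how the initial values set the column types, the following passes a small, hypothetical two-column numeric table as a string array (third usage) and relies on the default transfer format. The variable names and data are illustrative only.

```idl
; Hypothetical two-column table, one string per row.
table = ['1 2.5', '2 3.5', '3 4.5']

; Only the type of each initial value matters, not its dimensions:
; ID becomes a long-integer column, VAL a double-precision column.
id = 0L  &  val = 0D

TRANSREAD, table, id, val, COUNT=count

; If any rows were parsed, ID and VAL are now arrays with one
; element per row; otherwise they keep their initial values.
IF count GT 0 THEN HELP, id, val
```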
TRANSREAD will also flexibly manage typical data files, which may
contain blank lines, lines with comments (see COMMENT keyword), or
incomplete lines. These lines are ignored. It can be programmed
to wait for a user-specified "trigger" phrase in the input before
beginning or ending processing, which can be useful if for example
the input table contains some header lines (see STARTCUE and
STOPCUE keywords). [ The user can also pre-read these lines
before calling TRANSREAD. ] Finally, the total number of lines
read can be controlled (see MAXLINES keyword). TRANSREAD parses
until (a) the file ends, (b) the STOPCUE condition is met or (c)
the number of lines read reaches MAXLINES.
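The cue and comment handling described above might be used as follows. This is a sketch under stated assumptions: the table contents, the cue phrases 'BEGIN DATA' and 'END DATA', and the '#' comment character are all invented for illustration.

```idl
; Hypothetical input with a header, comments, a blank line and a trailer.
buf = ['Widget inventory', $
       'BEGIN DATA', $
       '# id  weight', $
       '1  2.5', $
       '', $
       '2  3.5', $
       'END DATA', $
       '99 99.0']

id = 0L  &  weight = 0D   ; initial values set the column types

TRANSREAD, buf, id, weight, COMMENT='#', $
           STARTCUE='BEGIN DATA', STOPCUE='END DATA', $
           COUNT=count, FAILCOUNT=failcount

; Going by the description above, only the rows between the two cues
; are candidates for parsing, and the comment and blank lines are
; skipped without adding to FAILCOUNT.
PRINT, count, failcount
```

Adding MAXLINES=1 to the same call should limit processing to the first line after the STARTCUE line.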
TRANSREAD has three possible usages. In the first, the file must
already be open, and TRANSREAD begins reading at the current file
position. In the second usage, a filename is given. TRANSREAD
automatically opens the file, and reads tabular data from the
beginning of the file. Normally the file is then closed, but this
can be prevented by using the NOCLOSE keyword.
In the third usage, a string array is passed instead of a file
unit. Elements from the array are used one-by-one as if they were
read from the file.
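The second usage combined with NOCLOSE can be sketched as below. The file name 'widgets.dat' and the format are borrowed from the Examples section; the MAXLINES value is arbitrary.

```idl
a = ''  &  b = 0L  &  c = 0D

; TRANSREAD opens the file itself, reads at most 10 lines, and
; leaves the file open because NOCLOSE is set.
TRANSREAD, unit, a, b, c, FORMAT='(A10,I0,D0.0)', $
           FILENAME='widgets.dat', MAXLINES=10, /NOCLOSE, COUNT=count

; ... further reads from UNIT could go here ...

; The caller must release the unit. The guard covers the case where
; the file could not be opened and UNIT was never assigned.
IF N_ELEMENTS(unit) GT 0 THEN FREE_LUN, unit
```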
Since TRANSREAD is not vectorized, and does a significant amount
of processing on a per-line basis, it is probably not optimal to
use on very large data files.
Inputs
UNIT - in the first usage, UNIT is an open file unit which
contains ASCII tabular data to read. UNIT must not be a
variable which could be mistaken for a string array.
In the second usage, when FILENAME is specified, then upon
return UNIT contains the file unit that TRANSREAD used for
reading. Normally, the UNIT is closed before return, but
it can be kept open using the NOCLOSE keyword. In that
case the unit should be closed with FREE_LUN.
STRINGARRAY - this is the third usage of TRANSREAD. When a string
array is passed, elements from the array are used as
if they were lines from an input file. The array
must not be of a numeric type, so it cannot be
mistaken for a file unit. [ Of course, the string
itself can contain ASCII numeric data. ]
Outputs
VARi - List of named variables to receive columns from the table,
one variable for each column. Upon output each variable
will be an array containing the same number of elements,
one for each row in the table. If no rows were
successfully parsed, then the variable values are not
changed. Use the COUNT output keyword to determine whether
any rows were parsed.
NOTE: Up to twenty columns may be parsed. If more columns
are desired, then a simple modification must be made to the
IDL source code. To do so, find the beginning of the
procedure definition, identified by the words, "pro
transread, ..." and follow the instructions there.
Input Keyword Parameters
FORMAT - an IDL format expression to be used to transfer *each*
row in the table. If no format is given then the default
IDL transfer format is used, based on the types of the
input variables. As mentioned in the description above,
a length should be assigned to each format code; a length
of zero can be used for numeric types. Lines from the
input which do not contain the correct number of columns
or do not obey the format statement are ignored.
DELIM - An ASCII character string which separates (delimits) each
field in each row. This is commonly a comma or space. When
the DELIM keyword is used, the FORMAT string does not
require lengths for each variable. This allows data
entries in the text file to vary in length from line to
line. For example:
TRANSREAD, UNIT, A,B,C, DELIM=',', FORMAT='(A,I,F)', FILENAME='file.csv'
Notice that the format expression does not specify the
length of variables A, B, and C. They are separated by ','
on each line.
COMMENT - A one-character string which designates a "comment" in
the input. Input lines beginning with this character
(preceded by optional spaces) are ignored. FAILCOUNT
does not increase.
DEFAULT: no comments are recognized.
NOTE: lines which do not match the format statement are
ignored. Comments are likely to be ignored based on
this behavior, even without specifying the COMMENT
keyword; however the FAILCOUNT will increase.
MAXLINES - the maximum number of lines to be read from input. The
count begins *after* the STARTCUE is satisfied (if any).
DEFAULT: no maximum is imposed.
SKIPLINES - the number of lines of input to skip before beginning
to parse the table.
DEFAULT: no lines are skipped.
NOTE: if STARTCUE is also given, then the line count
does not start until after the STARTCUE phrase has
been encountered.
STARTCUE - a unique string phrase that triggers the start of
parsing. Lines up to and including the line containing
the cue are ignored. Because each line is checked for
this starting cue, it should be unambiguous.
DEFAULT: parsing begins immediately.
STOPCUE - a unique string phrase that triggers the finishing of
parsing. The line including the cue is ignored, and no
more reads occur afterward.
DEFAULT: no STOPCUE is imposed.
FILENAME - the presence of this keyword signals the second usage,
where TRANSREAD explicitly opens the input file named
by the string FILENAME. Reading begins at the start of
the file.
Normally TRANSREAD will close the input file when it
finishes. This can be prevented by setting the NOCLOSE
keyword.
DEFAULT: input is either an already-opened file passed
via the UNIT parameter, or a string array.
NOCLOSE - if set and if FILENAME is given, then the file is not
closed upon return. The file unit is returned in UNIT,
and must be closed by the user via FREE_LUN, UNIT.
DEFAULT: any files that TRANSREAD opens are closed.
DEBUG - set this keyword to enable debugging messages. Detailed
error messages will be printed for each failed line.
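As an illustration of combining several of the input keywords above, the sketch below skips a one-line column header and splits the remaining lines on commas. The data are hypothetical, and passing a string array together with DELIM is an assumption here; the documented DELIM examples read from a file.

```idl
; Hypothetical comma-separated table with a one-line header.
buf = ['name,count,price', $
       'bolt,10,0.25', $
       'nut,20,0.10']

name = ''  &  n = 0L  &  price = 0D

; SKIPLINES=1 drops the header line. With DELIM, the format codes
; need no explicit lengths.
TRANSREAD, buf, name, n, price, DELIM=',', FORMAT='(A,I,D)', $
           SKIPLINES=1, COUNT=count
```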
Output Keywords
LINES - the number of lines read, including comments and failed
parses.
COUNT - the number of rows successfully parsed. Can be zero if
accessing the input utterly fails, or if no rows are
present.
FAILCOUNT - the number of rows that could not be parsed
successfully. Comments and blank lines are not
included.
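The three output keywords can be checked together after a call. The following is a sketch with made-up input that includes one line which is expected not to parse as numbers.

```idl
; Hypothetical input; the middle line is not numeric.
buf = ['1 2.5', 'not a number', '2 3.5']

id = 0L  &  val = 0D

TRANSREAD, buf, id, val, LINES=lines, COUNT=count, FAILCOUNT=failcount

; COUNT is the value to test before using ID and VAL, since the
; variables are left unchanged when no rows are parsed.
IF count EQ 0 THEN $
  PRINT, 'No rows parsed; ID and VAL keep their initial values.' $
ELSE $
  PRINT, count, ' rows parsed, ', failcount, ' failed, ', lines, ' lines read.'
```

Setting the DEBUG keyword on the same call should print a detailed message for each failed line.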
Examples
OPENR, UNIT, 'widgets.dat', /GET_LUN
A = '' & B = 0L & C = 0D
TRANSREAD, UNIT, A, B, C, COUNT=COUNT, FORMAT='(A10,I0,D0.0)'
FREE_LUN, UNIT
(First usage) Opens widgets.dat and reads three columns. The
first column is a ten-character string, the second an integer, and
the third a double precision value.
A = '' & B = 0L & C = 0D
TRANSREAD, UNIT, A, B, C, COUNT=COUNT, FORMAT='(A10,I0,D0.0)', $
FILENAME='widgets.dat'
(Second usage) Achieves the same effect as the first example, but
TRANSREAD opens and closes the file automatically.
SPAWN, 'cat widgets.dat', BUF
A = '' & B = 0L & C = 0D
TRANSREAD, BUF, A, B, C, COUNT=COUNT, FORMAT='(A10,I0,D0.0)'
(Third usage) Achieves the same effect as the first two examples,
but input is read from the string variable BUF.
A = '' & B = 0L & C = 0D
TRANSREAD, UNIT, A, B, C, DELIM=',', COUNT=COUNT, FORMAT='(A,I,D)', $
FILENAME='widgets.dat'
(Second usage, with DELIM) Example with the DELIM keyword. Here the
delimiter is a comma (DELIM=',').
Modification History
Feb 1999, Written, CM
Mar 1999, Added SKIPLINES and moved on_ioerror out of loop, CM
Jun 2000, Added NOCATCH and DEBUG keyword options, CM
Jul 2009, Added DELIM keyword, thanks to Chris Holmes