Reading in strings -> numbers ??

David Bolen db3l at fitlinxx.com
Mon May 1 21:39:33 EDT 2000


"Louis M. Pecora" <pecora at anvil.nrl.navy.mil> writes:

> After three weeks of learning Python have I actually found a real wart? 
> A common requirement in programming (especially for numerical stuff) is
> to read in data that is often generated by other programs and other
> people.  The common form is a "table" structure:
> 
> data11(white space)data12(white space)...data1m(return/newline)
> data21(white space)data22(white space)...data2m(return/newline)
> ...
> datan1(white space)datan2(white space)...datanm(return/newline/EOF)
> EOF
> 
> So you're saying that reading in something as basic as this is a
> "work-around?"  Sigh.

I definitely don't think handling a file like this is a "work-around",
but that's also because I don't think it requires a direct correlation
to the [f]scanf function.  Assuming for the moment that your lines are
columns of integers, this would be one way to process the file:

	input = open('filename')
	while 1:
	    line = input.readline()
	    if not line: break    # an empty string signals end of file
	    # split on whitespace, then convert every field to an int
	    columns = list(map(int, line.split()))
	input.close()

This handles the file row by row, so you don't have to read the entire
thing into memory first.  Alternatively, using "input.readlines()"
would return a list of all lines from the file that you could parse or
access in any order you preferred, at the expense of memory.
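
For what it's worth, here is a rough sketch of the readlines() variant,
assuming a whitespace-separated file of integers.  The helper name
read_table and the temporary sample file are made up purely for
illustration:

```python
import os
import tempfile

def read_table(path):
    """Read a whitespace-separated table of integers into a list of rows."""
    with open(path) as f:
        lines = f.readlines()      # whole file in memory, one string per line
    # Skip blank lines; convert every field on each remaining line to int.
    return [[int(field) for field in line.split()]
            for line in lines if line.strip()]

# Hypothetical sample data, written to a temporary file for the demonstration.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("1 2 3\n4 5 6\n")

table = read_table(sample_path)
os.remove(sample_path)

last_row = table[-1]               # rows can now be visited in any order
```

The trade-off is the one already mentioned: every row is held in memory
at once, in exchange for random access to the rows.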

Inside the loop, you could use columns[x], where x runs from 0 to m-1,
since Python lists are indexed from zero.  If your columns all shared a
single but different datatype, you could change the first argument to
map() (the function applied to each item of the list) to something
else, such as float or long.  The reason for the map is that splitting
the string yields a list of strings, which you'll likely need to
convert into some numeric type for your actual computations.  You
could also use the "eval" suggested by a previous poster, which would
actually allow Python expressions in each column.
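
As a small sketch of swapping the conversion function, assuming a
made-up sample row rather than a real data file:

```python
line = "1.5 2.25 -3e2\n"                    # hypothetical sample row

# Same pattern as before, with float as the function handed to map().
columns = list(map(float, line.split()))
first = columns[0]                          # fields are indexed from zero
last = columns[-1]

# The eval variant accepts Python expressions in each field.  Note that
# eval executes whatever it is given, so this only suits trusted input,
# and the expressions cannot contain spaces because of the split.
expr_line = "2*3 1/4.0 10-7\n"              # hypothetical sample row
values = [eval(field) for field in expr_line.split()]
```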

While not quite as flexible as a [f]scanf-style format string, this
easily handles the most common case of matrix information or other
consistent data types.  If your columns were more varied, you could
just do the split and then process each column in whatever way is
appropriate for it.
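
One possible way to handle mixed columns, sketched under the assumption
of a three-column layout (a name, an integer count, and a float
measurement).  The layout, the converters list, and the parse_row
helper are all invented for this example:

```python
# Hypothetical layout: one conversion function per column, in order.
converters = [str, int, float]

def parse_row(line):
    """Split a line on whitespace and convert each field by its column."""
    fields = line.split()
    if len(fields) != len(converters):
        raise ValueError("expected %d columns, got %d"
                         % (len(converters), len(fields)))
    return [convert(field) for convert, field in zip(converters, fields)]

row = parse_row("probeA 12 0.75\n")         # hypothetical sample row
name, count, measurement = row
```

A row with the wrong number of fields raises ValueError here instead of
being silently truncated, which is a design choice of this sketch and
not something the split itself enforces.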

As an aside, the [f]scanf functions have never been something I'd
suggest using in C code, simply because of the possibility of buffer
overruns, mismatched pointer types in the arguments, and the like.
They can be convenient, but they can also leave a big gaping risk
point in your program if you aren't careful.  I always prefer
splitting and parsing the input myself (which, as it turns out, is a
bit closer to the Python approach) for more robust code.

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/


