python: ascii read

Alex Martelli aleaxit at yahoo.com
Thu Sep 16 09:02:34 EDT 2004


Sebastian Krause <canopus at gmx.net> wrote:

> I did not explicitly mention that the ascii file should be read in as an
> array of numbers (either integer or float).

Ah, right, you didn't.  So I was answering the literal question you
asked rather than the one you had in mind.

> Using open() and read() is very fast, but it only reads the data in as
> a string, and it also does not work with large files.

It works just fine with files as large as you have memory for (and mmap
works for files as large as you have _spare address space_ for, if your
OS is decently good at its job).  But if what you want is not the job
that .read() and mmap do, the fact that they _do_ perform that job quite
well on large files is of course of no use to you.
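
To make the distinction concrete, here is a minimal sketch in present-day
Python (not the Python of this thread's date) of what .read() and mmap
each give you.  The function names read_whole and count_lines_mmap are
made up for illustration; only open(), .read() and the mmap module come
from the standard library.  Neither function turns the text into numbers.

```python
import mmap


def read_whole(path):
    """Return the whole file as one string; needs memory for all of it."""
    with open(path, "r") as f:
        return f.read()


def count_lines_mmap(path):
    """Count newline bytes through a read-only memory map of the file.

    The file is mapped into the address space rather than copied into a
    Python string, so the limit is address space, not free memory.
    Note: an empty file cannot be mapped and raises ValueError.
    """
    with open(path, "rb") as f:
        # Length 0 means "map the entire file".
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count
```

Both hand back raw text (or bytes); converting that into an array of
numbers is a separate step.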

As for why scipy.io.read_array works so badly for you -- I don't know,
it's rather complicated code, as well as maybe old-ish (wraps files into
class instances to be able to iterate on their lines) and very general
(lots of options regarding what are separators, etc, etc).  If your
needs are very specific (you know a lot about the format of those huge
files -- e.g. they're column-oriented, or only use whitespace separators
and \n line termination, or other such specifics) you might well be able
to do better -- likely even in Python, worst case in C.  I assume you
need Numeric arrays, 2-d, specifically, as the result of reading your
files?  Would you know in advance whether you're reading int or float
(it might be faster to have two separate functions)?  Could you
pre-dimension the Numeric array and pass it in, or do you need it to
dimension itself dynamically based on file contents?  The less
flexibility you need, the simpler and faster the reading can be...
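
As a rough sketch of how little code the restricted case needs, assume
whitespace-separated fields, \n line termination, one element type known
in advance, and a rectangular table.  The names read_table and fill_table
are hypothetical, and plain nested lists stand in for Numeric arrays
here; this is not how scipy.io.read_array works, and I have not timed it.

```python
def read_table(path, convert=float):
    """Read a whitespace-separated table, sizing it from the file contents.

    convert is int or float, chosen by the caller in advance.
    Returns a list of rows; every row must have the same number of fields.
    """
    rows = []
    ncols = None
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if ncols is None:
                ncols = len(fields)
            elif len(fields) != ncols:
                raise ValueError("line %d: expected %d fields, got %d"
                                 % (lineno, ncols, len(fields)))
            rows.append([convert(x) for x in fields])
    return rows


def fill_table(path, out, convert=float):
    """Fill a pre-dimensioned list of rows in place and return it.

    out must already have the right number of rows and columns.
    """
    row = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if row >= len(out):
                raise ValueError("file has more rows than out")
            if len(fields) != len(out[row]):
                raise ValueError("row %d: wrong number of fields" % row)
            out[row][:] = [convert(x) for x in fields]
            row += 1
    if row != len(out):
        raise ValueError("file has fewer rows than out")
    return out
```

For example, read_table("data.txt", int) sizes the result from the file,
while fill_table("data.txt", [[0.0] * 3 for _ in range(1000)]) expects
exactly 1000 rows of 3 columns ("data.txt" being a placeholder name).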


Alex


