Building arrays from text file data.
Mike C. Fletcher
mcfletch at rogers.com
Tue Apr 30 19:55:48 EDT 2002
Okay, off the top:
string.split( string ) instead of re.split( '\s+', string ) should be
significantly faster.
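The difference is easy to see in a small sketch (the old string module's split is today's str.split method; the sample text here is just for illustration):

```python
import re

text = "  1.0 2.0  3.0\n4.0 5.0 "

# str.split() with no separator splits on any run of whitespace
# and drops leading/trailing empties in one C-level pass.
fast = text.split()

# re.split leaves empty strings at the ends whenever the text
# starts or ends with whitespace.
slow = re.split(r'\s+', text)

print(fast)  # ['1.0', '2.0', '3.0', '4.0', '5.0']
print(slow)  # ['', '1.0', '2.0', '3.0', '4.0', '5.0', '']
```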
your string.strip( ) usage causes almost the whole string to be
copied if there is any whitespace at the beginning/end. It might be
better (test it) to simply delete null strings from the beginning/end
of the list if they exist, i.e.
if strings and not strings[0]: del strings[0]
if strings and not strings[-1]: del strings[-1]
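Those two del statements in action, as a small sketch (the sample text is mine):

```python
import re

text = " 1 2 3 "
strings = re.split(r'\s+', text)  # ['', '1', '2', '3', '']

# Drop the empty strings at the ends instead of copying the whole
# text with strip() before splitting.
if strings and not strings[0]:
    del strings[0]
if strings and not strings[-1]:
    del strings[-1]

print(strings)  # ['1', '2', '3']
```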
if you can process the file by lines, using xreadlines might help if the
lines are sufficiently long and you're really running out of memory.
The trade-off is that a lot of Python code gets run.
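xreadlines is a Python-2-era tool; in modern Python the file object itself iterates lazily, line by line. A rough sketch of the per-line approach, using io.StringIO as a stand-in for the real file:

```python
import io

# Stand-in for open('datafile.txt'); a real file iterates the same way.
f = io.StringIO("1 2 3\n4 5 6\n")

values = []
for line in f:  # lazy, one line at a time, like xreadlines
    values.extend(float(tok) for tok in line.split())

print(values)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```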
minor note: shape can probably accept -1 as an argument saying "whatever
fits in this dimension"
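Numeric itself is hard to come by now; a minimal sketch with its successor NumPy, where the -1 idiom does work in a shape assignment:

```python
import numpy as np  # Numeric's modern successor; the idiom is the same

a = np.arange(10.0)
a.shape = (-1, 5)   # -1 means "whatever fits in this dimension"
print(a.shape)      # (2, 5)
```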
if none of those gets you within spitting distance of acceptable, the
next step in optimising would likely be a scheme where you read in a
large chunk (a few tens of megabytes of text), accumulate those results
in a list, add the tail to the next few megabytes, then process that
chunk (rinse, repeat).
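That chunk-and-tail scheme could look something like this sketch (iter_tokens and chunk_size are my names, not from the post; the tiny chunk size here just exercises the tail-carrying logic):

```python
import io

def iter_tokens(f, chunk_size=1 << 20):
    """Read fixed-size chunks, carrying a possibly split trailing
    token over to the next chunk so no number is cut in half."""
    tail = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            if tail:
                yield tail
            return
        parts = (tail + chunk).split()
        # If the chunk doesn't end on whitespace, the last token may
        # continue in the next chunk: keep it back as the new tail.
        if chunk[-1].isspace():
            tail = ''
        else:
            tail = parts.pop() if parts else ''
        for tok in parts:
            yield tok

f = io.StringIO("1.5 2.5 3.5 4.5 5.5")
print([float(t) for t in iter_tokens(f, chunk_size=4)])
# [1.5, 2.5, 3.5, 4.5, 5.5]
```

In the real setting each chunk's tokens would be converted and appended to the growing array instead of collected in one list.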
HTH,
Mike
Joe Woodward wrote:
> Does anyone know of a faster way to build an array out of a text file
> with formatted data? What I have been doing works well, but now my
> files are 200MB and up. The following way builds a Numeric.array
> without knowing the size to start with. I can then resize it after the
> fact.
>
>
> yldpos=Numeric.array(map(float,re.split('\s+',string.strip(open('datafile.txt').read()))))
>
> yldpos.shape=(len(yldpos)/5,5)
>
>
> Thank you,
>
> Joe Woodward
> PHoenix Analysis & Design Technologies
--
_______________________________________
Mike C. Fletcher
http://members.rogers.com/mcfletch/