Building arrays from text file data.

Mike C. Fletcher mcfletch at rogers.com
Tue Apr 30 19:55:48 EDT 2002


Okay, off the top:

	string.split( data ) instead of re.split( '\s+', data ) should be 
significantly faster, since the plain string version splits on runs of 
whitespace in C without any regex machinery.
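
	For instance, a rough sketch of the substitution (assuming, as in 
your original, that the whole file fits in memory):

		import string
		import Numeric
		data = open( 'datafile.txt' ).read()
		# no-separator split breaks on runs of whitespace in C code
		# and discards empty strings at the ends for you
		fields = string.split( data )
		yldpos = Numeric.array( map( float, fields ) )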

	your string.strip( ) usage causes almost the whole string to be 
copied whenever there is whitespace at the beginning/end.  It might be 
better (test it) to simply delete empty strings from the beginning/end 
of the resulting list if they exist, i.e.

		if strings and not strings[0]: del strings[0]
		if strings and not strings[-1]: del strings[-1]

	if you can process the file line by line, using xreadlines might 
help when the lines are sufficiently long and you're really running out 
of memory.  The trade-off is that a lot more Python-level code gets run.
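
	A sketch of that approach (untested; assumes Python 2.1 or later, 
where file objects have an xreadlines method):

		import Numeric
		result = []
		for line in open( 'datafile.txt' ).xreadlines():
			# one short list of floats per line, never the whole file
			result.extend( map( float, line.split() ) )
		yldpos = Numeric.array( result )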

	minor note: shape can probably accept -1 for one dimension, meaning 
"whatever fits in this dimension"
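
	If Numeric honours that (test it), the reshape becomes:

		yldpos.shape = ( -1, 5 )   # Numeric computes the row count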
	
if none of those gets you within spitting distance of acceptable, the 
next step in optimising would be a scheme where you read a large chunk 
(a few tens of megabytes of text), accumulate the parsed results in a 
list, carry the possibly-truncated tail field over to the next chunk, 
then process that chunk (rinse, repeat).
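
A rough, untested sketch of that scheme (the chunk size and the -1 
shape trick are assumptions to adjust for your setup):

	import string
	import Numeric
	CHUNK = 32 * 1024 * 1024   # a few tens of megabytes at a time
	pieces = []
	tail = ''
	fh = open( 'datafile.txt' )
	while 1:
		data = fh.read( CHUNK )
		if not data:
			# end of file; anything left over is a complete field
			if tail:
				pieces.append( float( tail ) )
			break
		chunk = tail + data
		fields = string.split( chunk )
		if chunk[-1] not in string.whitespace:
			# last field may be cut in half; save it for the next round
			tail = fields.pop()
		else:
			tail = ''
		pieces.extend( map( float, fields ) )
	yldpos = Numeric.array( pieces )
	yldpos.shape = ( -1, 5 )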

HTH,
Mike
	

Joe Woodward wrote:
> Does anyone know of a faster way to build an array out of a text file
> with formatted data?  What I have been doing works well, but now my
> files are 200MB and up.  The following builds a Numeric.array without
> knowing the size to start with; I can then resize it after the fact.
> 
> 
> yldpos=Numeric.array(map(float,re.split('\s+',string.strip(open('datafile.txt').read()))))
> 
> yldpos.shape=(len(yldpos)/5,5)
> 
> 
> Thank you,
> 
> Joe Woodward 
> Phoenix Analysis & Design Technologies


-- 
_______________________________________
   Mike C. Fletcher
   http://members.rogers.com/mcfletch/