speeding up reading files (possibly with cython)
Tim Chase
python.list at tim.thechases.com
Sat Mar 7 19:19:55 EST 2009
> I have a program that essentially loops through a text file that's
> about 800 MB in size, containing tab-separated data. My program
> parses this file and stores its fields in a dictionary of lists.
>
> for line in file:
>     split_values = line.strip().split('\t')
>     # do stuff with split_values
>
> Currently, this is very slow in Python, even if all I do is break up
> each line using split() and store its values in a dictionary, indexing
> by one of the tab-separated values in the file.
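One plausible reading of "a dictionary of lists, indexing by one of the tab-separated values" is sketched below. It assumes each key maps to the list of rows that share that key; `build_index` and `key_column` are hypothetical names for illustration, not from the original post. It also uses `rstrip('\r\n')` instead of `strip()`, because `strip()` removes trailing tabs too and silently drops empty trailing fields.

```python
from collections import defaultdict


def build_index(lines, key_column=0):
    """Group tab-separated rows into a dict of lists keyed by one column.

    `lines` can be any iterable of strings, including an open file object.
    `key_column` is a hypothetical zero-based column index (an assumption).
    """
    index = defaultdict(list)
    for line in lines:
        # Remove only the line terminator: strip() would also eat trailing
        # tabs and lose empty trailing fields.
        fields = line.rstrip('\r\n').split('\t')
        index[fields[key_column]].append(fields)
    # Return a plain dict so missing keys raise KeyError instead of
    # silently creating empty lists.
    return dict(index)


# Usage with a real file (the path is a placeholder):
# with open('in.txt', 'r') as f:
#     index = build_index(f)
```

Because a file object yields lines, the same function works on the 800 MB file or on a small list of strings when testing.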
I'm not sure what the situation is, but I regularly skim through
tab-delimited files of similar size and haven't noticed any
problems like the ones you describe. You might try tweaking the
optional (and infrequently specified) third argument, buffering, of
the open()/file() call:
bufsize = 4 * 1024 * 1024  # buffer 4 MB at a time
with open('in.txt', 'r', bufsize) as f:  # file() also works in Python 2
    for line in f:
        split_values = line.strip().split('\t')
        # do stuff with split_values
If it isn't specified, you're at the mercy of the system default
(perhaps OS-specific?). You can read more at [1], along with the
associated warning about setvbuf().
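To check whether the buffer size matters on a particular machine, a rough timing harness like the following parses the same file under several buffer sizes. It is written for Python 3, and the function names, the synthetic test file, and the two sizes compared are illustrative assumptions. In Python 3 text mode, the buffering argument sizes the binary buffer beneath the text layer, so results may differ from the Python 2 file() behavior discussed here.

```python
import io
import os
import tempfile
import time


def time_parse(path, bufsize):
    """Split every line of a tab-separated file using the given buffer size.

    Returns (total_fields, elapsed_seconds). A rough harness, not a
    rigorous benchmark: the OS page cache alone can swamp the difference.
    """
    start = time.perf_counter()
    total = 0
    with open(path, 'r', bufsize) as f:
        for line in f:
            total += len(line.rstrip('\r\n').split('\t'))
    return total, time.perf_counter() - start


def compare_buffers(path, sizes=(io.DEFAULT_BUFFER_SIZE, 4 * 1024 * 1024)):
    """Map each buffer size to its (total_fields, elapsed_seconds) result."""
    return {size: time_parse(path, size) for size in sizes}


def demo(rows=100000):
    """Write a synthetic three-column file, time it, then delete it."""
    fd, path = tempfile.mkstemp(suffix='.tsv')
    try:
        with os.fdopen(fd, 'w') as out:
            for i in range(rows):
                out.write('key%d\t%d\tvalue\n' % (i % 100, i))
        return compare_buffers(path)
    finally:
        os.remove(path)


if __name__ == '__main__':
    for size, (fields, seconds) in sorted(demo().items()):
        print('bufsize=%9d  fields=%d  %.3fs' % (size, fields, seconds))
```

Running it several times on a real file gives a fairer picture. If the timings are close, the bottleneck is more likely the splitting and dictionary work than the disk reads.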
-tkc
[1] http://docs.python.org/library/functions.html#open
More information about the Python-list mailing list