speeding up reading files (possibly with cython)

Sun Mar 8 05:15:29 EDT 2009

per wrote:

> i have a program that essentially loops through a text file that's
> about 800 MB in size containing tab separated data... my program
> parses this file and stores its fields in a dictionary of lists.
> 
> for line in file:
>   split_values = line.strip().split('\t')
>   # do stuff with split_values
> 
> currently, this is very slow in python, even if all i do is break up
> each line using split() and store its values in a dictionary, indexing
> by one of the tab separated values in the file.
> 
> is this just an overhead of python that's inevitable? do you guys
> think that switching to cython might speed this up, perhaps by
> optimizing the main for loop?  or is this not a viable option?

For the general approach and the overall speed of your program it does
matter what you want to do with the data once you've read it -- can you
tell us a bit about that?
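
For reference, here is a minimal sketch of the loop as I understand it
from your description. The names index_by_column and key_index are made
up for illustration, and I am assuming that "dictionary of lists" means
each key maps to a list of the rows sharing that key. It uses
rstrip("\n") instead of strip() so that empty fields at the start or end
of a line are kept; that is my choice, not something taken from your code.

```python
from collections import defaultdict


def index_by_column(lines, key_index=0):
    """Group tab-separated rows by the value found in one column.

    `lines` is any iterable of strings, such as an open text file.
    `key_index` (hypothetical parameter) selects the column used as key.
    """
    table = defaultdict(list)
    for line in lines:
        # Remove only the trailing newline so empty fields survive.
        fields = line.rstrip("\n").split("\t")
        table[fields[key_index]].append(fields)
    return table
```

Because the function takes any iterable of lines, you can pass it an open
file object directly, or a short list of strings when timing or testing it.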

Peter