Reading from file up to arbitrary byte.
David M. Wilson
dw-google.com at botanicus.net
Fri Feb 13 19:34:34 EST 2004
Peter Hansen wrote:
> I'd say the simple way to do it would be to read in chunks and hold a
> remainder over. It's not complicated: why don't you like that way?
> Or even simpler: just use read()[:chunksize] and not worry about the
> fact that you're reading all the data and throwing some away.
> Performance-wise, this probably beats the pants off most alternatives,
> if performance is what concerns you, and unless your file is really
> big and chunksize is small, who cares about the memory that is wasted
> for a few microseconds?
> It might also help respondents if you describe the reason for wanting
> to read the first part of the file like that. Maybe there's a more
> suitable approach.
Hi Peter, thanks for your reply.
I wanted to avoid keeping a remainder because I would have thought the
underlying implementation has to do this anyway when doing
readline(). The tool I am working on reads the UK Postal Address File
(1.5 GB of data) and is to be deployed on a small 800 MHz VIA C3 server.
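For reference, the chunk-and-remainder approach Peter describes might look roughly like the sketch below. It is not the code from my module: the name read_records and the delimiter and chunk_size parameters are hypothetical, and it is written for current Python 3 with a binary file object rather than for Python 2.3.

```python
import io


def read_records(fileobj, delimiter=b"\n", chunk_size=64 * 1024):
    """Yield records from a binary file, split on an arbitrary delimiter.

    Hypothetical helper: reads fixed-size chunks and holds the trailing,
    possibly incomplete record over to the next read.
    """
    remainder = b""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parts = (remainder + chunk).split(delimiter)
        # The last piece may be an incomplete record, so hold it over.
        remainder = parts.pop()
        for part in parts:
            yield part
    if remainder:
        # Trailing data that was not followed by a final delimiter.
        yield remainder


# Small in-memory example; a real file would be opened with open(path, "rb").
records = list(read_records(io.BytesIO(b"a|bb|ccc"), delimiter=b"|", chunk_size=2))
```

Only one chunk plus one partial record is held in memory at a time, which is the point of the approach for a file this size.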
I have created a minimalist module for reading in the tabular data in a
way that is as close to 'wire speed' as possible. Previously I have used
the Python 2.3 csv module and a C implementation of a CSV reader I
found on the web. However, the data set I am dealing with has a very
basic structure, and I found the two CSV modules overly complicated for
the task.
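To give an idea of what "very basic structure" allows, here is a minimal sketch of splitting simple delimited rows without a CSV parser. It is an illustration only, not the linked module: the name parse_rows is hypothetical, and the tab delimiter and the absence of quoting or escaping are assumptions.

```python
def parse_rows(lines, delimiter="\t"):
    """Yield each line as a list of fields.

    Hypothetical helper: assumes one record per line and no quoting or
    escaping, so a plain split is enough.
    """
    for line in lines:
        # Strip the line ending only, so empty trailing fields are kept.
        yield line.rstrip("\r\n").split(delimiter)


rows = list(parse_rows(["a\tb\n", "c\t\r\n"]))
```

If the data ever contained quoted fields or embedded delimiters, this would break and a proper CSV reader would be the safer choice.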
I failed to produce something that is clean, but it does exactly what it
says on the tin and that's all I need. If you care for a nosey:
http://botanicus.net/dw/IDTDR.py.txt
David.