Reading from file up to arbitrary byte.

Fri Feb 13 19:34:34 EST 2004

Peter Hansen wrote:

> I'd say the simple way to do it would be to read in chunks and hold a 
> remainder over.  It's not complicated: why don't you like that way?

> Or even simpler: just use read()[:chunksize] and not worry about the
> fact that you're reading all the data and throwing some away.
> Performance-wise, this probably beats the pants off most alternatives,
> if performance is what concerns you, and unless your file is really 
> big and chunksize is small, who cares about the memory that is wasted
> for a few microseconds?

> It might also help respondents if you describe the reason for wanting
> to read the first part of the file like that.  Maybe there's a more
> suitable approach.

Hi Peter, thanks for your reply.

I wanted to avoid keeping a remainder as I would have thought the 
underlying implementation would have to do this anyway when doing 
readline(). The tool I am working on reads the UK Postal Address File 
(1.5 GB of data) and is to be deployed on a small 800 MHz VIA C3 server.
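
For reference, the chunk-and-remainder approach Peter describes might be
sketched roughly as below. This is an illustration only, not the code in
the module linked further down: the name iter_records, the newline
delimiter and the 64 KB chunk size are all assumptions, and it is written
for Python 3 with a file opened in binary mode.

```python
import io


def iter_records(stream, delimiter=b"\n", chunksize=64 * 1024):
    """Yield byte records from a binary stream, split on `delimiter`.

    Hypothetical helper: reads fixed-size chunks and carries any
    incomplete trailing record over to the next read.
    """
    remainder = b""
    while True:
        chunk = stream.read(chunksize)
        if not chunk:
            break
        # Prepend what was held over, then split on the delimiter.
        parts = (remainder + chunk).split(delimiter)
        # The last piece may be an incomplete record; hold it over.
        remainder = parts.pop()
        for part in parts:
            yield part
    # Data after the final delimiter (if any) is the last record.
    if remainder:
        yield remainder


# Usage with an in-memory stream and a deliberately tiny chunk size.
sample = io.BytesIO(b"a,b\nc,d\ne")
records = list(iter_records(sample, chunksize=3))
```

Only one chunk plus one partial record is held in memory at a time, which
is the point when the input is far larger than RAM.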

I have created a minimalist module for reading in the tabular data, in a 
way that is as close to 'wire speed' as possible. Previously I have used 
the Python 2.3 csv module and a C implementation of a CSV reader I 
found on the web. However, the data set I am dealing with has a very 
basic structure, and I found both CSV modules overly complicated for 
the task.

I failed to produce something that is clean, but it does exactly what it 
says on the tin and that's all I need. If you care for a nosey:

	http://botanicus.net/dw/IDTDR.py.txt

David.