[Tutor] Read-ahead for large fixed-width binary files?

Marc Tompkins marc.tompkins at gmail.com
Sun Nov 18 06:03:39 CET 2007


On Nov 17, 2007 8:14 PM, Kent Johnson <kent37 at tds.net> wrote:

> Marc Tompkins wrote:
> > My question is this: does anybody know of an equivalent to
> > "readlines(sizehint)" for non-delimited, binary files?  I've Googled
> > and Googled until I'm groggy, but I don't seem to find what I want.
>
> Have you tried specifying a buffer size in the open() call?
>
> Kent
>

Yes.
I compared:
-  no buffer size specified
-  any of a wide range of positive numbers (from 1 to 4M)
-  -1
and saw no noticeable difference - as opposed to adding the StringIO
buffering, which sped things up by a factor of six or so.
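
For illustration, here is a minimal sketch of that kind of read-ahead. It is not the code that was timed above. It is written for Python 3, where io.BytesIO stands in for StringIO, and the names iter_records, record_size and chunk_records are hypothetical. It assumes the file object returns full-sized reads until end of file, as ordinary buffered disk files do.

```python
import io


def iter_records(fileobj, record_size, chunk_records=1024):
    """Yield fixed-width records, fetching many records per read() call.

    record_size and chunk_records are illustrative parameters: each
    read() pulls chunk_records whole records into memory, and the
    records are then sliced off an in-memory buffer.
    """
    chunk_size = record_size * chunk_records
    while True:
        # One large read replaces many small ones.
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buf = io.BytesIO(chunk)
        while True:
            record = buf.read(record_size)
            if len(record) < record_size:
                # End of this chunk; a trailing partial record at end
                # of file is silently dropped (an assumption here).
                break
            yield record
```

Usage would look something like "for rec in iter_records(open(path, 'rb'), 128): ...", with chunk_records tuned by experiment.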

By the way, this is really obscurely documented.  It took me a lot of
Googling to find even one mention of it - in Programming Python by Mark Lutz
- and I was very excited... until I tested it and found that it did nothing
for me.  Bummer.  Then I re-read the passage:

> *Buffer size*
>
> The open call also takes an optional third buffer size argument, which
> lets you control stdio buffering for the file -- the way that data is
> queued up before being transferred to boost performance. If passed, 0 means
> file operations are unbuffered (data is transferred immediately), 1 means
> they are line buffered, any other positive value means use a buffer of
> approximately that size, and a negative value means to use the system
> default (which you get if no third argument is passed, and generally means
> buffering is enabled). The buffer size argument works on most platforms, but
> is currently ignored on platforms that don't provide the sevbuf system
> call.
>
I've only tested on Windows XP; is XP one of those that don't provide
sevbuf?  (Actually, I think that's a typo - I think it should be "setvbuf" -
but it exists in both the 2001 and 2006 editions of the book.)  Perhaps
someone who codes closer to the silicon can enlighten me on that score.

I just realized that the one thing I didn't try was passing a value of 0 to
turn buffering OFF -  if, as the above passage seems to suggest, it's always
on by default.  I might try that tomorrow.  But in any case it looks like
all it'll do is make things slower.
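
A rough sketch of how such a comparison could be timed, assuming Python 3. Its open() takes the buffer size as the "buffering" argument and differs from the Python 2 behaviour described in the passage: 0 is allowed only in binary mode, and 1 (line buffering) is meaningful only in text mode, so it is left out here. The helper names time_record_reads and make_sample_file, and the sizes used, are made up for the example.

```python
import os
import tempfile
import time


def time_record_reads(path, record_size, buffering):
    """Read a file one record at a time; return (record_count, seconds)."""
    count = 0
    start = time.perf_counter()
    with open(path, "rb", buffering) as f:
        while True:
            record = f.read(record_size)
            if len(record) < record_size:
                break  # end of file (or a trailing partial record)
            count += 1
    return count, time.perf_counter() - start


def make_sample_file(record_size=128, n_records=2000):
    """Create a throwaway file of fixed-width records; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * (record_size * n_records))
    return path


if __name__ == "__main__":
    path = make_sample_file()
    try:
        # 0 = unbuffered, -1 = system default, others = explicit sizes
        for buffering in (0, -1, 4096, 4 * 1024 * 1024):
            count, secs = time_record_reads(path, 128, buffering)
            print("buffering=%d: %d records in %.4f s" % (buffering, count, secs))
    finally:
        os.remove(path)
```

The timings will depend heavily on the platform and on whether the file is already in the OS cache, so repeated runs on a large file are needed before drawing conclusions.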

-- 
www.fsrtechnologies.com