[Tutor] Read-ahead for large fixed-width binary files?

Marc Tompkins marc.tompkins at gmail.com
Sun Nov 18 18:09:20 CET 2007


On Nov 18, 2007 5:15 AM, Kent Johnson <kent37 at tds.net> wrote:

> Marc Tompkins wrote:
> > On Nov 17, 2007 8:20 PM, Kent Johnson <kent37 at tds.net
> > <mailto:kent37 at tds.net>> wrote:
> >     use plain slicing to return the individual records instead of
> StringIO.
> >
> > I hope I'm not being obtuse, but could you clarify that?
>
> I think it will simplify the looping. A sketch, probably needs work:
>
> def by_record(path, recLen):
>   with open(path,'rb') as inFile:
>     inFile.read(recLen)  # throw away the header record
>     while True:
>       buf = inFile.read(recLen*4096)
>       if not buf:
>         return
>       for ix in range(0, len(buf), recLen):
>         yield buf[ix:ix+recLen]
>
>  > I'm not sure I see how this makes my
> > life better than using StringIO (especially since I'm actually using
> > cStringIO, with a "just-in-case" fallback in the import section, and it
> > seems to be pretty fast.)
>
> This version seems simpler and more readable to me.
>
> Kent
>
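(Side note, since it came up: the "just-in-case" fallback in my import section is the usual try/except idiom - roughly this sketch, with the last branch only there for Pythons that have dropped cStringIO entirely:)

```python
try:
    from cStringIO import StringIO     # fast C implementation
except ImportError:
    try:
        from StringIO import StringIO  # pure-Python fallback
    except ImportError:
        from io import StringIO        # newer Pythons have neither module

# whichever we got, it behaves the same for this usage:
buf = StringIO()
buf.write('record data')
print(buf.getvalue())  # 'record data'
```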
It does look lean and mean, true.  I'll time this against the cStringIO
version. One thing, though - I think I need to do

>       if len(buf) < recLen:
>         return
>
rather than

>       if not buf:
>         return
>
I'll have to experiment again to refresh my memory, but I believe I tried
that in one of my first iterations (about a year ago, so I may be
remembering wrong.)  If I remember correctly, read() was still returning a
result - a short final buffer, so it didn't evaluate to false. As you can
imagine, hilarity ensued when I tried to slice the last record.

Of course, I may have hallucinated that while on an extended caffeine jag,
so feel free to disregard!
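P.S. For the archive, here's a self-contained sketch of what I mean (the
four-byte records and temp-file setup are made up for illustration). Note
that the len(buf) < recLen guard also quietly drops a short trailing
fragment rather than yielding it:

```python
import os
import tempfile

def by_record(path, recLen):
    """Yield fixed-width records of recLen bytes, reading 4096 at a time."""
    with open(path, 'rb') as inFile:
        inFile.read(recLen)  # throw away the header record
        while True:
            buf = inFile.read(recLen * 4096)
            if len(buf) < recLen:  # EOF, or only a partial record left
                return
            # stop short of any partial record at the end of buf
            for ix in range(0, len(buf) - recLen + 1, recLen):
                yield buf[ix:ix + recLen]

# demo: a header, two full records, and a truncated third
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'HDR!' + b'AAAA' + b'BBBB' + b'CC')
print(list(by_record(path, 4)))  # [b'AAAA', b'BBBB'] - b'CC' is dropped
os.remove(path)
```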
-- 
www.fsrtechnologies.com