file tell in a for-loop

Magdoll magdoll at gmail.com
Wed Nov 19 11:45:21 EST 2008


Gotcha. Thanks!

Magdoll

On Nov 19, 2:57 am, Tim Chase <python.l... at tim.thechases.com> wrote:
> Magdoll wrote:
> > I was trying to map various locations in a file to a dictionary. At
> > first I read through the file using a for-loop, but tell() gave back
> > weird results, so I switched to while, then it worked.
>
> > The for-loop version was something like:
> >                 d = {}
> >                 for line in f:
> >                          if line.startswith('>'): d[line] = f.tell()
>
> > And the while version was:
> >                 d = {}
> >                 while 1:
> >                         line = f.readline()
> >                         if len(line) == 0: break
> >                         if line.startswith('>'): d[line] = f.tell()
>
> > In the for-loop version, f.tell() would sometimes return the same
> > result multiple times consecutively, even though the for-loop
> > apparently progressed the file descriptor. I don't have a clue why
> > this happened, but I switched to while loop and then it worked.
>
> > Does anyone have any ideas as to why this is so?
>
> I suspect that at least the iterator version uses internal
> read-ahead buffering, so the tell() call returns the position up
> to which the buffer has been filled, not the position just past
> the line the loop handed you.  I've also had problems with tell()
> returning bogus results while reading through large non-binary
> files (in this case about a 530 meg text-file) once the
> file-offset passed some point I wasn't able to identify.  It may
> have to do with newline translation, as this was Python 2.4 on
> Win32.  Switching to "b"inary mode resolved the issue for me.
>
> I created the following generator to make my life a little easier:
>
>    def offset_iter(fp):
>      assert 'b' in fp.mode.lower(), \
>        "offset_iter must have a binary file"
>      while True:
>        addr = fp.tell()
>        line = fp.readline()
>        if not line: break
>        yield (addr, line.rstrip('\n\r'))
>
> That way, I can just use
>
>    d = {}
>    f = open('foo.txt', 'rb')  # mode needs the 'r'; 'b' alone is invalid
>    for offset, line in offset_iter(f):
>      if line.startswith('>'): d[line] = offset
>
> This bookmarks the *beginning* of each line that starts with ">"
> (I think your code notes the *end*).
>
> -tkc
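
For anyone reading this thread on a newer interpreter, here is a minimal sketch of the same generator, assuming Python 3 (the code in the thread is Python 2). It is not Tim's original, only an adaptation: lines come back as bytes, so the '>' test and the rstrip() argument use bytes literals. The index_headers() helper and the sample data are made up for illustration.

```python
import os
import tempfile


def offset_iter(fp):
    """Yield (offset, line) pairs; offset is the byte position where line starts."""
    # Binary mode keeps tell() in plain byte offsets, with no newline translation.
    if 'b' not in fp.mode:
        raise ValueError("offset_iter needs a file opened in binary mode")
    while True:
        addr = fp.tell()          # position *before* reading the line
        line = fp.readline()
        if not line:              # empty bytes object means end of file
            break
        yield addr, line.rstrip(b'\r\n')


def index_headers(path):
    """Hypothetical helper: map each '>' header line to its starting offset."""
    d = {}
    with open(path, 'rb') as f:
        for offset, line in offset_iter(f):
            if line.startswith(b'>'):
                d[line] = offset
    return d


# Small demonstration on a throwaway file (sample data is invented).
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b">seq1\nACGT\n>seq2\nTTGA\n")
    demo_path = tmp.name

index = index_headers(demo_path)
os.remove(demo_path)
print(index)  # {b'>seq1': 0, b'>seq2': 11}
```

Because readline() is called explicitly instead of iterating over the file object, tell() is always asked between reads, so each stored offset can be handed straight to f.seek() later to jump back to that header.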



