file tell in a for-loop
Magdoll
magdoll at gmail.com
Wed Nov 19 11:45:21 EST 2008
Gotcha. Thanks!
Magdoll
On Nov 19, 2:57 am, Tim Chase <python.l... at tim.thechases.com> wrote:
> Magdoll wrote:
> > I was trying to map various locations in a file to a dictionary. At
> > first I read through the file using a for-loop, but tell() gave back
> > weird results, so I switched to a while-loop, and then it worked.
>
> > The for-loop version was something like:
> > d = {}
> > for line in f:
> >     if line.startswith('>'): d[line] = f.tell()
>
> > And the while version was:
> > d = {}
> > while 1:
> >     line = f.readline()
> >     if len(line) == 0: break
> >     if line.startswith('>'): d[line] = f.tell()
>
> > In the for-loop version, f.tell() would sometimes return the same
> > result multiple times consecutively, even though the for-loop had
> > apparently advanced through the file. I don't have a clue why
> > this happened, but I switched to a while loop and then it worked.
>
> > Does anyone have any ideas as to why this is so?
>
> I suspect that at least the iterator version uses an internal
> read-ahead buffer, so the tell() call returns how far that buffer
> has read into the file, not the position of the line you were just
> handed; several consecutive lines can therefore report the same
> offset. I've also had problems with tell() returning bogus results
> while reading through large non-binary files (in this case a text
> file of about 530 MB) once the file offset passed some point I
> wasn't able to identify. It may have to do with newline
> translation, as this was Python 2.4 on Win32. Switching to
> "b"inary mode resolved the issue for me.
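For comparison, here is a minimal sketch of how the same for-loop-plus-tell() pattern behaves under Python 3. This is an assumption beyond the thread, which concerns Python 2.4, where the file iterator's hidden read-ahead buffer produced the repeated offsets. In Python 3's io module, a binary file reports the offset just past each yielded line, while a text file refuses outright: tell() raises OSError once next() has been called. The helper name tell_during_iteration and the sample file contents are hypothetical.

```python
import os
import tempfile

def tell_during_iteration(path, mode):
    """Collect f.tell() after each line a for-loop yields (hypothetical helper)."""
    offsets = []
    with open(path, mode) as f:
        for _line in f:
            offsets.append(f.tell())
    return offsets

# Illustrative sample file: four 3-byte lines.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'>a\nxx\n>b\nyy\n')

# Binary mode: each offset points just past the line that was yielded.
binary_offsets = tell_during_iteration(path, 'rb')

# Text mode: Python 3 disables tell() while iterating and raises OSError.
try:
    tell_during_iteration(path, 'r')
    text_mode_error = None
except OSError as exc:
    text_mode_error = str(exc)

os.remove(path)
print(binary_offsets)  # [3, 6, 9, 12]
print(text_mode_error)
```

Note that even in binary mode the offsets mark the *end* of each line, which matches what the while-loop version in the question records.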
>
> I created the following generator to make my life a little easier:
>
> def offset_iter(fp):
>     assert 'b' in fp.mode.lower(), \
>            "offset_iter must have a binary file"
>     while True:
>         addr = fp.tell()
>         line = fp.readline()
>         if not line: break
>         yield (addr, line.rstrip('\n\r'))
>
> That way, I can just use
>
> d = {}
> f = open('foo.txt', 'rb')
> for offset, line in offset_iter(f):
>     if line.startswith('>'): d[line] = offset
>
> This bookmarks the *beginning* (I think your code notes the
> *end*) of each line that starts with ">"
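A possible Python 3 adaptation of offset_iter, sketched under the assumption that the file is opened with 'rb', so every line is bytes (hence the b'...' literals, which the Python 2.4 original did not need). It records the offset where each '>' line begins, then uses seek() to jump back to one of them. The sample file contents and the variable name header are illustrative only.

```python
import os
import tempfile

def offset_iter(fp):
    """Yield (offset, line) pairs, where offset is where the line begins.

    Python 3 sketch: fp must be opened in binary mode, so lines are bytes.
    """
    assert 'b' in fp.mode, "offset_iter must have a binary file"
    while True:
        addr = fp.tell()       # position *before* reading the line
        line = fp.readline()
        if not line:           # b'' means end of file
            break
        yield addr, line.rstrip(b'\r\n')

# Illustrative sample file with '>'-prefixed header lines.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'>first\r\nACGT\n>second\nTTAA\n')

d = {}
with open(path, 'rb') as f:
    for offset, line in offset_iter(f):
        if line.startswith(b'>'):
            d[line] = offset
    # Jump straight back to a bookmarked header line and reread it.
    f.seek(d[b'>second'])
    header = f.readline()

os.remove(path)
print(d)       # {b'>first': 0, b'>second': 13}
print(header)  # b'>second\n'
```

Because tell() is called before readline(), each stored offset points at the start of its line, which is exactly what seek() needs to reread that record later.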
>
> -tkc
More information about the Python-list mailing list