Lookahead while doing: for line in fh.readlines():

Veek. M vek.m1234 at gmail.com
Fri Mar 4 22:31:14 EST 2016


MRAB wrote:

> On 2016-03-04 13:04, Veek. M wrote:
>> Terry Reedy wrote:
>>
>>> On 2/27/2016 4:39 AM, Veek. M wrote:
>>>> I want to do something like:
>>>>
>>>> #!/usr/bin/env python3
>>>>
>>>> fh = open('/etc/motd')
>>>> for line in fh.readlines():
>>>>      print(fh.tell())
>>>>
>>>> why doesn't this work as expected.. fh.readlines() should return a
>>>> generator object and fh.tell() ought to start at 0 first.
>>>
>>> Not after you have already read some data.  readlines() reads the
>>> entire file and splits it into lines.  readline() reads at least a
>>> single block.  Reading a single byte or character at a time looking
>>> for \n would be too slow, so even after readline(), the file pointer
>>> will be somewhere past the end of the last line returned.
>>>
>>>> Instead i get the final count repeated for the number of lines.
>>>>
>>>> What i'm trying to do is lookahead:
>>>> #!whatever
>>>>
>>>> fh = open(whatever)
>>>> for line in fh.readlines():
>>>>      x = fh.tell()
>>>>      temp = fh.readline()
>>>>      fh.seek(x)
>>>>
>>>
>>>
>>
>> I get that readlines() would slurp the whole file for efficiency
>> reasons. Why doesn't fh.seek() work though. Object 'fh' is a data
>> structure for the OS file descriptor similar to FILE in C.
>> <class '_io.TextIOWrapper'>
>>
>> So if seek works in C, how come it doesn't work in python wrt
>> readlines() which is just a method. What obviates the functioning
>> of seek wrt readlines()?
>>
>> fh.tell() works at the line level.. and fh.readline() works with
>> fh.seek(0)
>>
> 
> fh.readlines() reads the entire file.
> 
> At this point, it's at the end of the file.
> 
> The 'body' of the 'for' loop is then executed.
> 
> fh.tell() returns the position of the end of the file because it's
> at the end of the file.
> 
> fh.readline() returns an empty string because it's at the end of the
> file.
> 
> fh.seek(x) seeks to the end of the file, which is where it already is.
> 
> Is that clearer?
Ah, right - got it - sorry for being thick. readlines() slurps the whole
darn thing, so the file pointer is at EOF, and within the loop body
I'm just saving that EOF position and restoring it every time.
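
For the lookahead itself, one option is to drop readlines() and drive the
loop with readline(), so that tell() and seek() still refer to the line
being processed. What follows is only a minimal sketch, assuming an ordinary
seekable text file. The names positions_after_readlines and
lines_with_lookahead are made up for illustration and are not from the
thread. The first function reproduces the behaviour described above; the
second peeks at the next line and then rewinds.

```python
import os
import tempfile


def positions_after_readlines(path):
    """Return the tell() value seen on each pass of the original loop."""
    with open(path) as fh:
        lines = fh.readlines()  # the whole file is consumed here, once
        # Nothing in the loop body moves the file pointer, so every
        # tell() reports the same end-of-file position.
        return [fh.tell() for _ in lines]


def lines_with_lookahead(path):
    """Return (line, next_line) pairs; next_line is '' after the last line."""
    pairs = []
    with open(path) as fh:
        while True:
            line = fh.readline()
            if not line:             # empty string means end of file
                break
            pos = fh.tell()          # position just after the current line
            peeked = fh.readline()   # look ahead one line
            fh.seek(pos)             # rewind so the next pass rereads it
            pairs.append((line, peeked))
    return pairs


if __name__ == "__main__":
    # Illustrative demo on a throwaway file (hypothetical contents).
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("one\ntwo\nthree\n")
        demo_path = tmp.name
    try:
        print(positions_after_readlines(demo_path))
        for current, following in lines_with_lookahead(demo_path):
            print(repr(current), "->", repr(following))
    finally:
        os.unlink(demo_path)
```

The loop uses readline() rather than "for line in fh:" on purpose: in
Python 3 text mode, calling tell() while iterating over the file object
raises OSError, because the iterator reads ahead. Rereading each line costs
an extra read per line; if that matters, keeping the previous line in a
variable while iterating avoids seek() altogether.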
