readlines() reading incorrect number of lines?

John Machin sjmachin at lexicon.net
Thu Dec 20 17:47:49 EST 2007


On Dec 21, 8:13 am, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> [Fixing top-posting.]
>
> On Thu, 20 Dec 2007 12:41:44 -0800, Wojciech Gryc wrote:
> > On Dec 20, 3:30 pm, John Machin <sjmac... at lexicon.net> wrote:
> [snip]
> >> > However, when I use Python's various methods -- readline(),
> >> > readlines(), or xreadlines() and loop through the lines of the file,
> >> > the program exits at 16,000 lines. No error output or anything
> >> > -- it seems the end of the loop was reached, and the code was
> >> > executed successfully.
> ...
> >> One possibility: you are running this on Windows and the file contains
> >> Ctrl-Z aka chr(26) aka '\x1a'.
>
> > Hi,
>
> > Python 2.5, on Windows XP. Actually, I think you may be right about \x1a
> > -- there's a few lines that definitely have some strange character
> > sequences, so this would make sense... Would you happen to know how I
> > can actually fix this (e.g. replace the character)? Since Python doesn't
> > see the rest of the file, I don't even know how to get to it to fix the
> > problem... Due to the nature of the data I'm working with, manual
> > editing is also not an option.
>
> > Thanks,
> > Wojciech
>
> Open the file in binary mode:
>
> open(filename, 'rb')
>
> and Windows should do no special handling of Ctrl-Z characters.
>
> --
> Steven

I don't know whether it's a bug, a feature, or just a dark corner,
but using mode='rU' (universal newlines) does no special handling of
Ctrl-Z either:

>>> x = 'foo\r\n\x1abar\r\n'
>>> f = open('udcray.txt', 'wb')
>>> f.write(x)
>>> f.close()
>>> open('udcray.txt', 'r').readlines()
['foo\n']
>>> open('udcray.txt', 'rU').readlines()
['foo\n', '\x1abar\n']
>>> for line in open('udcray.txt', 'rU'):
...    print repr(line)
...
'foo\n'
'\x1abar\n'
>>>

Using 'rU' should make the OP's task of finding the strange character
sequences a bit easier -- he won't have to read a block at a time and
worry about the guff straddling a block boundary.
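
For comparison, the binary-mode route Steven suggested can also be done
line by line. What follows is only a rough sketch, assuming a current
Python 3 interpreter rather than the 2.5 used above; find_ctrl_z and
strip_ctrl_z are made-up helper names, not anything from the standard
library, and dropping the byte outright (rather than replacing it with
something else) is just one possible choice.

```python
# Sketch only: find_ctrl_z and strip_ctrl_z are hypothetical helper names.
# Assumes Python 3. Binary mode sidesteps the Windows text-mode
# treatment of Ctrl-Z as end-of-file.

CTRL_Z = b'\x1a'


def find_ctrl_z(path):
    """Return (line_number, column) pairs for every Ctrl-Z byte in the file."""
    hits = []
    with open(path, 'rb') as f:
        # Iterating a binary file splits on b'\n', so each line arrives
        # whole and nothing straddles a block boundary.
        for lineno, line in enumerate(f, 1):
            col = line.find(CTRL_Z)
            while col != -1:
                hits.append((lineno, col))
                col = line.find(CTRL_Z, col + 1)
    return hits


def strip_ctrl_z(src, dst):
    """Copy src to dst without Ctrl-Z bytes; return how many were dropped."""
    removed = 0
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        for line in fin:
            removed += line.count(CTRL_Z)
            fout.write(line.replace(CTRL_Z, b''))
    return removed
```

Run against the udcray.txt file created in the session above,
find_ctrl_z('udcray.txt') should report a single hit at line 2,
column 0. Writing the cleaned copy to a separate file leaves the
original untouched in case the strange sequences turn out to matter.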


