readlines() reading incorrect number of lines?

Wojciech Gryc wojciech at gmail.com
Thu Dec 20 15:41:44 EST 2007


Hi,

Python 2.5, on Windows XP. Actually, I think you may be right about
\x1a -- there are a few lines that definitely have some strange
character sequences, so this would make sense... Would you happen to
know how I can actually fix this (e.g. replace the character)? Since
Python doesn't see the rest of the file, I don't even know how to get
to it to fix the problem... Due to the nature of the data I'm working
with, manual editing is also not an option.

Thanks,
Wojciech

On Dec 20, 3:30 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Dec 21, 6:48 am, Wojciech Gryc <wojci... at gmail.com> wrote:
>
> > Hi,
>
> > I'm currently using Python to deal with a fairly large text file (800
> > MB), which I know has about 85,000 lines of text. I can confirm this
> > because (1) I built the file myself, and (2) running a basic Java
> > program to count lines yields a number in that range.
>
> > However, when I use Python's various methods -- readline(),
> > readlines(), or xreadlines() -- and loop through the lines of the file,
> > the program exits at 16,000 lines. No error output or anything --
> > it seems the end of the loop was reached, and the code was executed
> > successfully.
>
> > I'm baffled and confused, and would be grateful for any advice as to
> > what I'm doing wrong, or why this may be happening.
>
> What platform, what version of python?
>
> One possibility: you are running this on Windows and the file contains
> Ctrl-Z aka chr(26) aka '\x1a'.
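
For reference, a minimal sketch of one way to get past the Ctrl-Z and
clean it out, assuming the diagnosis above is right. The idea is to open
the file in binary mode ('rb'), where the Windows text-mode end-of-file
handling does not apply, and to copy it in chunks so the 800 MB file
never has to fit in memory. The function names, the chunk size and the
choice of a space as the replacement byte are illustrative assumptions,
not anything from the thread. It is written for a current Python; the
same binary-mode approach should carry over to 2.5 with minor syntax
changes (no "with" statement or b"" literals there).

```python
CTRL_Z = b"\x1a"  # the byte that text mode on Windows can treat as end-of-file


def find_first_ctrl_z(path, chunk_size=1 << 20):
    """Return the byte offset of the first 0x1A in the file, or -1 if none."""
    offset = 0
    # Binary mode: bytes are returned as stored, with no end-of-file marker handling.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1
            pos = chunk.find(CTRL_Z)
            if pos != -1:
                return offset + pos
            offset += len(chunk)


def replace_ctrl_z(src_path, dst_path, replacement=b" ", chunk_size=1 << 20):
    """Copy src_path to dst_path, replacing every 0x1A byte.

    Returns the number of bytes replaced. The pattern is a single byte,
    so it can never straddle a chunk boundary.
    """
    replaced = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            replaced += chunk.count(CTRL_Z)
            dst.write(chunk.replace(CTRL_Z, replacement))
    return replaced
```

Calling find_first_ctrl_z("big.txt") first would confirm whether the
byte is there at all (and roughly where), and
replace_ctrl_z("big.txt", "big_clean.txt") would then write a cleaned
copy that the usual readline()/readlines() loops can read to the end.
Both file names are placeholders.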



