readlines() reading incorrect number of lines?

John Machin sjmachin at lexicon.net
Thu Dec 20 16:07:15 EST 2007


On Dec 21, 7:41 am, Wojciech Gryc <wojci... at gmail.com> wrote:
> Hi,
>
> Python 2.5, on Windows XP. Actually, I think you may be right about
> \x1a -- there's a few lines that definitely have some strange
> character sequences, so this would make sense... Would you happen to
> know how I can actually fix this (e.g. replace the character)? Since
> Python doesn't see the rest of the file, I don't even know how to get
> to it to fix the problem... Due to the nature of the data I'm working
> with, manual editing is also not an option.
>

Please don't top-post.

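Quick hack to remove all occurrences of '\x1a' (untested). The files
are opened in binary mode because, in text mode on Windows, '\x1a'
(Ctrl-Z) is treated as end-of-file, which is why the rest of your file
is never seen:

fin = open('old_file', 'rb')   # note b: BINARY, so '\x1a' is just data
fout = open('new_file', 'wb')  # binary here too; line endings pass through unchanged
blksz = 1024 * 1024            # copy in 1 MB blocks
while True:
    blk = fin.read(blksz)
    if not blk:                # empty string means real end of file
        break
    fout.write(blk.replace('\x1a', ''))
fout.close()
fin.close()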
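
For anyone reading this on Python 3 (an assumption on my part; the
question above is about Python 2.5), the same idea needs bytes literals,
because a file opened with 'rb' yields bytes rather than str. A minimal
sketch follows; the function name strip_byte and its parameters are made
up for illustration, and the file names are the same placeholders as in
the hack above:

```python
import os
import tempfile


def strip_byte(src_path, dst_path, target=b'\x1a', blksz=1024 * 1024):
    """Copy src_path to dst_path, dropping every occurrence of `target`.

    `target` should be a single byte, so that a match can never
    straddle the boundary between two blocks.
    Returns the number of bytes removed.
    """
    removed = 0
    with open(src_path, 'rb') as fin, open(dst_path, 'wb') as fout:
        while True:
            blk = fin.read(blksz)
            if not blk:  # b'' means end of file
                break
            removed += blk.count(target)
            fout.write(blk.replace(target, b''))
    return removed


if __name__ == '__main__':
    # Small self-contained demonstration on a throwaway file.
    tmpdir = tempfile.mkdtemp()
    old_file = os.path.join(tmpdir, 'old_file')
    new_file = os.path.join(tmpdir, 'new_file')
    with open(old_file, 'wb') as f:
        f.write(b'first line\r\nsecond\x1a line\r\n')
    print(strip_byte(old_file, new_file), 'byte(s) removed')
```

The with statement closes both files even if the copy fails part-way,
which the explicit close() calls above do not guarantee.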

You may however want to investigate the "strange character sequences"
that have somehow appeared in your file after you built it
yourself :-)
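
One way to start that investigation is to list where the '\x1a' bytes
sit and look at what surrounds them. This is only a sketch, written for
Python 3 as an assumption; find_byte_offsets and context are
hypothetical helper names, not anything from the standard library:

```python
def find_byte_offsets(path, target=b'\x1a', blksz=1024 * 1024):
    """Return the file offsets of every occurrence of the single byte `target`."""
    offsets = []
    base = 0  # offset of the current block within the file
    with open(path, 'rb') as f:
        while True:
            blk = f.read(blksz)
            if not blk:
                break
            pos = blk.find(target)
            while pos != -1:
                offsets.append(base + pos)
                pos = blk.find(target, pos + 1)
            base += len(blk)
    return offsets


def context(path, offset, width=20):
    """Return up to `width` bytes either side of `offset`, for viewing with repr()."""
    start = max(0, offset - width)
    with open(path, 'rb') as f:
        f.seek(start)
        return f.read(offset - start + width + 1)
```

Printing repr(context('old_file', off)) for each offset returned by
find_byte_offsets('old_file') shows the neighbouring bytes, which may
hint at which step of building the file wrote them.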

HTH,
John


