whitespace again

Fredrik Lundh fredrik at pythonware.com
Fri Apr 14 05:37:46 EDT 2006


"david brochu jr" wrote:

> still having problems....i have the following in a txt file:
>
> Windows Registry Editor Version 5.00

if this is a regedit export, the data is encoded as UTF-16.  treating
that as plain ASCII doesn't really work.

> for line in new_line.readlines():
>  line = re.sub('"',"",line)
>  print line
>
> I get:
>
> i n d o w s   R e g i s t r y   E d i t o r   V e r s i o n   5 . 0 0

> etc etc...Too much space...

those are NUL bytes (chr(0)), not spaces.  in UTF-16, each ASCII
character is stored as two bytes, one of which is zero.
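
a small sketch of where the zero bytes come from.  it assumes the export
uses little-endian UTF-16 (an assumption; the byte order isn't stated in
the post), and the variable names are made up for illustration:

```python
# encode the header line the way a little-endian UTF-16 file stores it
# (byte order is an assumption; no BOM is written by "utf-16-le")
text = u"Windows Registry Editor Version 5.00"
raw = text.encode("utf-16-le")

# every ASCII character becomes two bytes: the character, then a zero byte
nul_count = raw.count(b"\x00")

# a terminal typically shows the zero bytes as blanks, hence the "spaces"
looks_spaced = raw.replace(b"\x00", b" ")

# decoding with the right codec recovers the original text
decoded = raw.decode("utf-16-le")
```

stripping the zero bytes by hand might appear to work for plain ASCII
data, but it breaks on anything else; decoding is the safer route.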

to open a UTF-16 file with automatic decoding, use codecs.open:

    import codecs
    infile = codecs.open("file", "r", "utf-16")

reading from "infile" will now give you properly decoded unicode strings,
which you can process as usual.
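
for example, the loop from the original question could look roughly like
this.  it's only a sketch: the sample file name and contents are made up
so that the code can run on its own, and str.replace stands in for the
re.sub call since no pattern matching is needed:

```python
import codecs
import os
import tempfile

# write a small UTF-16 sample file so the sketch is self-contained
# (the path and the contents are hypothetical, not from the real export)
path = os.path.join(tempfile.mkdtemp(), "sample.reg")
sample = u'Windows Registry Editor Version 5.00\r\n"Name"="Value"\r\n'
outfile = codecs.open(path, "w", "utf-16")
outfile.write(sample)
outfile.close()

# read it back with automatic decoding and strip the double quotes
infile = codecs.open(path, "r", "utf-16")
lines = []
for line in infile:
    lines.append(line.replace(u'"', u""))
infile.close()
```

each entry in "lines" is a decoded unicode string without the stray
zero bytes.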

> this is killing me please help

sounds like you need to read up on what text encodings are, and how
you can let Python handle them for you.  start here:

    http://www.google.com/search?q=python+unicode

</F>