whitespace again
Fredrik Lundh
fredrik at pythonware.com
Fri Apr 14 05:37:46 EDT 2006
"david brochu jr" wrote:
> still having problems....i have the following in a txt file:
>
> Windows Registry Editor Version 5.00
if this is a regedit export, the data is encoded as UTF-16. treating
that as plain ASCII doesn't really work.
> for line in new_line.readlines():
>     line = re.sub('"',"",line)
>     print line
>
> I get:
>
> i n d o w s R e g i s t r y E d i t o r V e r s i o n 5 . 0 0
> etc etc...Too much space...
it's NUL bytes (chr(0)), not space.
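
a minimal sketch of where those bytes come from, assuming the file is little-endian UTF-16 (the sample string is copied from the quoted header line; the variable names are made up for illustration). every ASCII character is stored as two bytes, and the second one is chr(0):

```python
# encode the header line the way a little-endian UTF-16 file stores it
# (assumption: no byte order mark, to keep the byte dump short)
text = u"Windows Registry Editor Version 5.00"
raw = text.encode("utf-16-le")

nul = b"\x00"

# the first seven characters, "Windows", take fourteen bytes
print(repr(raw[:14]))

# one NUL byte per ASCII character
print(raw.count(nul))

# stripping the NULs by hand happens to work for pure ASCII data,
# but it is not a real decoder and breaks on anything else
stripped = raw.replace(nul, b"").decode("ascii")
print(stripped)
```

printing such data to a terminal shows each NUL as something that looks like a blank, which is why the output appears to be spaced out.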
to open a UTF-16 file with automatic decoding, use codecs.open:
import codecs
infile = codecs.open("file", "r", "utf-16")
reading from "infile" will now give you properly decoded unicode strings,
which you can process as usual.
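
for example, the quote-stripping loop could look something like this. it's only a sketch: the sample contents and the temporary file are made up so that the snippet runs on its own, and it uses the string replace method instead of re.sub, since no regular expression is needed here.

```python
import codecs
import os
import tempfile

# hypothetical stand-in for a regedit export (made-up contents)
sample = u'Windows Registry Editor Version 5.00\r\n"Name"="value"\r\n'

# write the sample as UTF-16 to a temporary file
fd, path = tempfile.mkstemp(suffix=".reg")
os.close(fd)
outfile = codecs.open(path, "w", "utf-16")
outfile.write(sample)
outfile.close()

# read it back; codecs.open decodes the bytes while reading
infile = codecs.open(path, "r", "utf-16")
lines = [line.replace(u'"', u"") for line in infile]
infile.close()
os.remove(path)

# the lines are unicode strings with no NUL bytes in them
for line in lines:
    print(line.rstrip())
```

the "utf-16" codec looks at the byte order mark at the start of the file to pick the byte order, so the same call should work for both little-endian and big-endian files that carry one.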
> this is killing me please help
sounds like you need to read up on what text encodings are, and how
you can let Python handle them for you. start here:
http://www.google.com/search?q=python+unicode
</F>