unknown encoding problem

John Machin sjmachin at lexicon.net
Fri Apr 8 19:36:50 EDT 2005


On Fri, 08 Apr 2005 15:45:35 +0200, Uwe Mayer <merkosh at hadiko.de>
wrote:

>Hi,
>
>I need to read in a text file which seems to be stored in some unknown
>encoding. Opening and reading the file's content returns:
>
>>>> f.read()
>'\x00 \x00 \x00<\x00l\x00o\x00g\x00E\x00n\x00t\x00r\x00y\x00...
>
>Each character has a \x00 prepended to it. I suspect it's some kind of
>unicode - how do I get rid of it? 
>

Interesting attitude. Why do you want to "get rid of it"? Have you
considered investigating the source of this suspicious text? You never
know, there could be something really interesting in there, like
'\x00v\x00o\x00n\x00 \x04\x1c\x04>\x04A\x04:\x042\x040\x00 \x00m\x00i\x00t\x00 \x00L\x00i\x00e\x00b' :-)
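Read as UTF-16 big-endian (two bytes per character, high byte first), that example does hide something: the Cyrillic spelling of Moscow. A minimal Python 3 sketch, assuming the line break inside the quoted literal stands in for one space byte lost when the message was wrapped (without it the byte count would be odd, which no UTF-16 text can be):

```python
# The example above as a Python 3 bytes literal. The space byte that the
# mail wrapping swallowed (right after the trailing \x00) is restored here.
sample = (b'\x00v\x00o\x00n\x00 \x04\x1c\x04>\x04A\x04:\x042\x040\x00'
          b' \x00m\x00i\x00t\x00 \x00L\x00i\x00e\x00b')

# Every character is two bytes, high byte first: UTF-16 big-endian.
text = sample.decode('utf-16-be')
print(text)  # von Москва mit Lieb
```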

> str.replace('\x00', '')

Why not go the whole hog:

''.join([c for c in foreign_text if 32 <= ord(c) <= 126 or c in
'\t\r\n'])
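Applied to the sample quoted earlier, both crude fixes destroy data. A Python 3 sketch of the two approaches, again assuming the wrapped literal hides one space byte. Iterating over `bytes` in Python 3 yields integers, so the filter compares byte values directly instead of calling `ord()` as the Python 2 version does:

```python
sample = (b'\x00v\x00o\x00n\x00 \x04\x1c\x04>\x04A\x04:\x042\x040\x00'
          b' \x00m\x00i\x00t\x00 \x00L\x00i\x00e\x00b')

# Dropping the NULs leaves the high bytes (\x04) of the six Cyrillic
# characters mixed in with their low bytes.
stripped = sample.replace(b'\x00', b'')

# The "whole hog" filter keeps printable ASCII plus tab/CR/LF, so each
# Cyrillic character collapses to its low byte, or vanishes entirely when
# that byte is not printable (\x1c in the first letter).
filtered = bytes(b for b in sample if 32 <= b <= 126 or b in b'\t\r\n')

print(stripped)  # b'von \x04\x1c\x04>\x04A\x04:\x042\x040 mit Lieb'
print(filtered)  # b'von >A:20 mit Lieb'
```

Either way the city name is gone; decoding with the matching codec is the only option that keeps it.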

Alternatively, try embracing Unicode -- it's the way forward, and it's
not that difficult.
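In Python 3, embracing Unicode mostly means reading the file as bytes and decoding it with the right codec. A minimal sketch, assuming the file is some flavour of UTF-16: the NUL before each ASCII byte in the original f.read() output suggests big-endian with no byte-order mark. `guess_utf16_encoding`, `read_text` and the file name are hypothetical, and counting NULs is only a heuristic that works for mostly-ASCII text; when the encoding is known, `open(path, encoding='utf-16-be')` does the whole job.

```python
import codecs
import os
import tempfile


def guess_utf16_encoding(raw):
    """Hypothetical helper: guess which UTF-16 variant `raw` uses.

    A byte-order mark settles it; otherwise compare NUL counts at even
    and odd offsets, which only works for mostly-ASCII text.
    """
    if raw.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"  # this codec reads the BOM and strips it
    nuls_even = raw[0::2].count(0)
    nuls_odd = raw[1::2].count(0)
    if nuls_even > nuls_odd:
        return "utf-16-be"  # NUL first, as in the f.read() output above
    if nuls_odd > nuls_even:
        return "utf-16-le"
    return None


def read_text(path):
    """Hypothetical helper: read a file as bytes and decode it to str."""
    with open(path, "rb") as f:
        raw = f.read()
    encoding = guess_utf16_encoding(raw)
    if encoding is None:
        raise ValueError("cannot tell which UTF-16 variant %r uses" % path)
    return raw.decode(encoding)


# Usage: write the visible start of the posted data to a temporary file
# and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write("  <logEntry".encode("utf-16-be"))
    text = read_text(path)

print(repr(text))  # '  <logEntry'
```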


