Interpreting string containing \u000a
Peter Otten
__peter__ at web.de
Wed Jun 18 08:21:18 EDT 2008
Francis Girard wrote:
> I have an ISO-8859-1 file containing things like
> "Hello\u000d\u000aWorld", i.e. the character '\', followed by the
> character 'u' and then '0', etc.
>
> What is the easiest way to automatically translate these codes into
> unicode characters ?
If the file really contains the escape sequences, use "unicode-escape" as the
encoding:
>>> "Hello\\u000d\\u000aWorld".decode("unicode-escape")
u'Hello\r\nWorld'
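The session above is Python 2, where byte strings have a .decode() method. Here is a minimal Python 3 sketch of the same idea, assuming the data is held as bytes. Note that "unicode-escape" interprets every Python backslash escape (\n, \t, \\ and so on), not only \uXXXX, and decodes all other bytes as Latin-1, which matches an ISO-8859-1 file:

```python
# Python 3 sketch: the escapes are literal backslash characters inside the bytes.
raw = b"Hello\\u000d\\u000aWorld"
text = raw.decode("unicode-escape")   # \u000d -> CR, \u000a -> LF
print(repr(text))                     # 'Hello\r\nWorld'

# Bytes outside an escape sequence are decoded as Latin-1 (ISO-8859-1).
print(repr(b"caf\xe9\\u000a".decode("unicode-escape")))  # 'café\n'
```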
If it contains the raw bytes, use "iso-8859-1":
>>> "Hello\x0d\x0aWorld".decode("iso-8859-1")
u'Hello\r\nWorld'
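The same distinction in Python 3, as a small sketch: real CR/LF bytes need only "iso-8859-1", while decoding the escaped form that way leaves the backslash sequences untouched as ordinary characters:

```python
# The file holds real carriage-return and line-feed bytes.
raw = b"Hello\x0d\x0aWorld"
text = raw.decode("iso-8859-1")
print(repr(text))  # 'Hello\r\nWorld'

# Decoding the escaped form as plain ISO-8859-1 keeps the backslashes literally.
escaped = b"Hello\\u000d\\u000aWorld"
print(escaped.decode("iso-8859-1"))  # Hello\u000d\u000aWorld
```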
Open the file with
codecs.open(filename, encoding=encoding_as_determined_above)
instead of the builtin open().
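In Python 3 a file can also be read in binary mode and decoded in one step. The sketch below takes that approach instead of codecs.open() for the escaped case; the helper name read_escaped_file is hypothetical and the temporary sample file exists only for illustration:

```python
import os
import tempfile


def read_escaped_file(path):
    """Read an ISO-8859-1 file whose special characters are written as \\uXXXX escapes."""
    # Read raw bytes, then let unicode-escape turn the escapes into real characters;
    # bytes that are not part of an escape are decoded as Latin-1 (ISO-8859-1).
    with open(path, "rb") as f:
        return f.read().decode("unicode-escape")


# Usage: write a small sample file containing literal backslash escapes.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"Hello\\u000d\\u000aWorld caf\xe9")
    sample_path = tmp.name

text = read_escaped_file(sample_path)
os.remove(sample_path)
print(repr(text))  # 'Hello\r\nWorld café'
```

Reading the whole file before decoding avoids splitting an escape sequence across chunk boundaries, at the cost of holding the file in memory.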
Peter