universal newlines and utf-16

Stefan Behnel stefan_ml at behnel.de
Sun Apr 11 10:37:01 EDT 2010


Baz Walter, 11.04.2010 16:12:
> i am using python 2.6 on a linux box and i have some utf-16 encoded
> files with crlf line-endings which i would like to open with universal
> newlines.
>
> so far, i have been unable to get this to work correctly.
>
> for example:
>
>  >>> open('test.txt', 'w').write(u'a\r\nb\r\n'.encode('utf-16'))
>  >>> repr(open('test.txt', 'rbU').read().decode('utf-16'))
> "u'a\\n\\nb\\n\\n'"
>  >>> import codecs
>  >>> repr(codecs.open('test.txt', 'rbU', 'utf-16').read())
> "u'a\\n\\nb\\n\\n'"
>
> of course, the output i want is:
>
> "u'a\\nb\\n'"
>
> i suppose it's not too surprising that the built-in open converts the
> line endings before decoding, but it surprised me that codecs.open does
> this as well.

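The codecs module does not support universal newline parsing (see the
docs). The 'U' in your mode string is applied to the raw bytes before they
are decoded, so each UTF-16 encoded '\r' becomes a '\n' of its own. You
need to use the new io module (added in Python 2.6) instead.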
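
A minimal sketch of what that could look like, assuming io.open() in text
mode with its default newline handling (newline=None), which decodes first
and translates line endings afterwards. The temporary directory and the
file name are only there for illustration:

```python
import io
import os
import tempfile

# Illustrative location only; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Write the sample as raw bytes so the CRLF endings are stored verbatim.
with io.open(path, 'wb') as f:
    f.write(u'a\r\nb\r\n'.encode('utf-16'))

# Text mode: the bytes are decoded as UTF-16 first, then '\r\n' and '\r'
# are translated to '\n' because newline is left at its default (None).
with io.open(path, 'r', encoding='utf-16') as f:
    text = f.read()

print(repr(text))
```

Passing newline='' instead should return the line endings untranslated, if
you ever need to see what is actually in the file.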

Stefan



