ascii to unicode line endings
Jean-Paul Calderone
exarkun at divmod.com
Wed May 2 12:29:20 EDT 2007
On 2 May 2007 09:19:25 -0700, fidtz at clara.co.uk wrote:
>The code:
>
>import codecs
>
>udlASCII = file("c:\\temp\\CSVDB.udl",'r')
>udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
>
>udlUNI.write(udlASCII.read())
>
>udlUNI.close()
>udlASCII.close()
>
>This doesn't seem to generate the correct line endings. Instead of
>converting 0x0D/0x0A to 0x0D/0x00/0x0A/0x00, it leaves it as
>0x0D/0x0A.
>
>I have tried various 2-byte Unicode encodings but it doesn't seem to
>make a difference. I have also tried modifying the code to read and
>convert a line at a time, but that didn't make any difference either.
>
>I have tried to understand the Unicode docs but nothing seems to
>indicate why a seemingly incorrect conversion is being done.
>Obviously I am missing something blindingly obvious here. Any help
>much appreciated.
Consider this simple example:
>>> import codecs
>>> f = codecs.open('test-newlines-file', 'w', 'utf16')
>>> f.write('\r\n')
>>> f.close()
>>> f = file('test-newlines-file')
>>> f.read()
'\xff\xfe\r\x00\n\x00'
>>>
Note how it differs from your example. Are you sure you're examining
the resulting output properly?
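One way to examine the output is to read the file back in binary mode and
print every byte in hex. The following is only a minimal sketch, assuming
Python 3 (the session above is Python 2); the helper name
utf16_bytes_written and the temporary file are illustrative and not part of
the original code.

```python
import codecs
import os
import tempfile


def utf16_bytes_written(text):
    """Write text through a UTF-16 codecs writer; return the raw bytes on disk.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Same kind of call as in the original post: codecs.open with utf_16.
        with codecs.open(path, "w", "utf_16") as f:
            f.write(text)
        # Binary mode, so no newline translation hides or alters any byte.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


raw = utf16_bytes_written("\r\n")
# Two BOM bytes followed by four bytes for "\r\n"; the byte order is the
# machine's native one (ff fe 0d 00 0a 00 on a little-endian machine).
print(" ".join("%02x" % b for b in raw))
```

If the hex dump of your own output file shows 0d 0a with no 00 bytes, the
file being inspected is probably not the one the codec wrote.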
By the way, "\r\0\n\0" isn't a "unicode line ending"; it's just the
little-endian UTF-16 encoding of "\r\n".
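To make that concrete, the same two characters can be encoded directly,
without touching a file. This is a small sketch assuming Python 3; the
variable names are illustrative.

```python
import codecs

text = "\r\n"

# Explicit byte orders: no byte order mark (BOM) is emitted.
le = text.encode("utf-16-le")   # each character as low byte, then high byte
be = text.encode("utf-16-be")   # each character as high byte, then low byte

# Plain "utf-16" prepends a BOM and uses the machine's native byte order.
with_bom = text.encode("utf-16")

print(repr(le))
print(repr(be))
print(repr(with_bom))

# Decoding gives back the ordinary two-character string in every case.
print(le.decode("utf-16-le") == be.decode("utf-16-be") == with_bom.decode("utf-16"))
```

Nothing here is specific to line endings: every character below U+10000
becomes two bytes in UTF-16, and "\r" and "\n" are treated like any other.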
Jean-Paul