ascii to unicode line endings

Jean-Paul Calderone exarkun at divmod.com
Wed May 2 12:29:20 EDT 2007


On 2 May 2007 09:19:25 -0700, fidtz at clara.co.uk wrote:
>The code:
>
>import codecs
>
>udlASCII = file("c:\\temp\\CSVDB.udl",'r')
>udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
>
>udlUNI.write(udlASCII.read())
>
>udlUNI.close()
>udlASCII.close()
>
>This doesn't seem to generate the correct line endings. Instead of
>converting 0x0D/0x0A to 0x0D/0x00/0x0A/0x00, it leaves it as
>0x0D/0x0A
>
>I have tried various 2-byte Unicode encodings but it doesn't seem to
>make a difference. I have also tried modifying the code to read and
>convert a line at a time, but that didn't make any difference either.
>
>I have tried to understand the Unicode docs but nothing seems to
>indicate why a seemingly incorrect conversion is being done.
>Obviously I am missing something blindingly obvious here; any help is
>much appreciated.

Consider this simple example:

  >>> import codecs
  >>> f = codecs.open('test-newlines-file', 'w', 'utf16')
  >>> f.write('\r\n')
  >>> f.close()
  >>> f = file('test-newlines-file')
  >>> f.read()
  '\xff\xfe\r\x00\n\x00'
  >>>

Note how its output differs from what you describe. Are you sure you're
examining the resulting output properly?
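One way to check is to look at the raw bytes of the output file. Below is a minimal sketch, written for Python 3 (the thread itself uses Python 2), assuming the input really is pure ASCII. It opens both files in binary mode so that no text-mode newline handling can alter "\r\n" before it is encoded, then dumps the result in hex. The helper names `convert_to_utf16` and `hex_dump` and the temporary file names are hypothetical. Passing "utf-16-le" omits the BOM; the plain "utf-16" codec prepends one in native byte order, like the '\xff\xfe' in the session above.

```python
import os
import tempfile


def convert_to_utf16(src_path, dst_path, encoding="utf-16"):
    """Copy an ASCII file to dst_path re-encoded with the given codec.

    Both files are opened in binary mode, so "\r\n" is read and
    written without any newline translation.
    """
    with open(src_path, "rb") as src:
        text = src.read().decode("ascii")  # assumes pure-ASCII input
    with open(dst_path, "wb") as dst:
        dst.write(text.encode(encoding))


def hex_dump(path):
    """Return the raw bytes of a file as space-separated hex pairs."""
    with open(path, "rb") as f:
        return " ".join("%02X" % b for b in f.read())


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    src = os.path.join(tmpdir, "CSVDB.udl")   # hypothetical input file
    dst = os.path.join(tmpdir, "CSVDB2.udl")  # hypothetical output file
    with open(src, "wb") as f:
        f.write(b"a\r\n")
    convert_to_utf16(src, dst, "utf-16-le")
    print(hex_dump(dst))  # 61 00 0D 00 0A 00
```

Viewing the bytes this way sidesteps any editor or viewer that might silently re-render the line endings.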

By the way, "\r\0\n\0" isn't a "unicode line ending"; it's just the
little-endian UTF-16 encoding of "\r\n".
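To make that concrete, here is a short Python 3 sketch comparing the bytes each UTF-16 variant produces for "\r\n". The byte-order-specific codecs differ only in byte order. The plain "utf-16" codec additionally writes a byte order mark (BOM) first, in the machine's native order. The helper name `encoded_bytes` is hypothetical.

```python
import sys

CRLF = "\r\n"


def encoded_bytes(text, encoding):
    """Return the encoded form of text as a list of hex byte strings."""
    return ["%02X" % b for b in text.encode(encoding)]


print(encoded_bytes(CRLF, "utf-16-le"))  # ['0D', '00', '0A', '00']
print(encoded_bytes(CRLF, "utf-16-be"))  # ['00', '0D', '00', '0A']
# The BOM comes first: FF FE on a little-endian machine, FE FF on big-endian.
print(sys.byteorder, encoded_bytes(CRLF, "utf-16"))
```

Decoding with "utf-16" reads the BOM back and strips it, so the round trip yields the original "\r\n".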

Jean-Paul
