ascii to unicode line endings

fidtz at clara.co.uk
Thu May 3 07:30:37 EDT 2007


On 2 May, 17:29, Jean-Paul Calderone <exar... at divmod.com> wrote:
> On 2 May 2007 09:19:25 -0700, f... at clara.co.uk wrote:
>
> >The code:
>
> >import codecs
>
> >udlASCII = file("c:\\temp\\CSVDB.udl",'r')
> >udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
>
> >udlUNI.write(udlASCII.read())
>
> >udlUNI.close()
> >udlASCII.close()
>
> >This doesn't seem to generate the correct line endings. Instead of
> >converting 0x0D/0x0A to 0x0D/0x00/0x0A/0x00, it leaves them as
> >0x0D/0x0A.
>
> >I have tried various 2-byte Unicode encodings, but none of them seems to
> >make a difference. I have also tried modifying the code to read and
> >convert one line at a time, but that didn't make any difference either.
>
> >I have tried to understand the Unicode docs, but nothing seems to
> >indicate why a seemingly incorrect conversion is being done.
> >Obviously I am missing something blindingly obvious here; any help is
> >much appreciated.
>
> Consider this simple example:
>
>   >>> import codecs
>   >>> f = codecs.open('test-newlines-file', 'w', 'utf16')
>   >>> f.write('\r\n')
>   >>> f.close()
>   >>> f = file('test-newlines-file')
>   >>> f.read()
>   '\xff\xfe\r\x00\n\x00'
>   >>>
>
> Note how it differs from your example. Are you sure you're examining
> the resulting output properly?
>
> By the way, "\r\0\n\0" isn't a "unicode line ending", it's just the UTF-16
> encoding of "\r\n".
>
> Jean-Paul

I am not sure what you are driving at here, since I started with an
ASCII file, whereas you just write a Unicode file from the start. I
guess the direct question is: "Is there a simple way to convert my
ASCII file to a UTF-16 file?" I thought either string.encode() or
writing to a UTF-16 file would do the trick, but it probably isn't that
simple!
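A minimal sketch of one answer to that question, written for Python 3 (the code in this thread uses the Python 2 `file()` builtin). It assumes the goal is to keep the source file's CR/LF bytes exactly as they are: reading in binary mode sidesteps any platform newline translation, and the decoded text is then encoded as UTF-16. The function name `ascii_to_utf16` and the sample file content are hypothetical.

```python
import os
import tempfile


def ascii_to_utf16(src_path, dst_path):
    """Re-encode an ASCII file as UTF-16, keeping its line endings byte-for-byte."""
    # Binary mode: Python performs no newline translation, so b"\r\n" stays b"\r\n".
    with open(src_path, "rb") as src:
        # Assumption: the input is pure ASCII; decode() raises UnicodeDecodeError otherwise.
        text = src.read().decode("ascii")
    # The "utf-16" codec emits a byte order mark, then 2 bytes per character in the
    # platform's native byte order (FF FE, then 0D 00 0A 00 for each CR/LF on a
    # little-endian machine such as x86 Windows).
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-16"))


if __name__ == "__main__":
    # Hypothetical sample content; a temporary directory stands in for c:\temp.
    work_dir = tempfile.mkdtemp()
    src = os.path.join(work_dir, "CSVDB.udl")
    dst = os.path.join(work_dir, "CSVDB2.udl")
    with open(src, "wb") as f:
        f.write(b"[oledb]\r\n")
    ascii_to_utf16(src, dst)
    with open(dst, "rb") as f:
        print(f.read())
```

Writing the encoded bytes through a binary-mode file also avoids the reverse problem: a Python 3 text-mode file on Windows translates each `\n` written into `\r\n`, so text that already contains CR/LF would come out as `\r\r\n`.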

I got the hex values with a binary file editor that I have used a great
deal for all sorts of things.
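For a cross-check that does not depend on the editor, a short Python 3 helper can produce the same hex values. `hex_dump` is a hypothetical name, not a standard-library function. In a correctly converted file on a little-endian machine, the dump should start with the `ff fe` byte order mark, and each line ending should appear as `0d 00 0a 00`, the UTF-16 encoding of `\r\n` that Jean-Paul describes.

```python
def hex_dump(path, width=16):
    """Return a file's bytes as hex text, `width` bytes per row with an offset column."""
    # Binary mode: show the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        data = f.read()
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Iterating over a bytes object yields ints, formatted here as 2-digit hex.
        rows.append(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)
```

Usage: `print(hex_dump("c:\\temp\\CSVDB2.udl"))`.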

Dom



