base64 and unicode

Fri May 4 05:47:40 EDT 2007

Duncan Booth wrote:
> However, the decoded text looks as though it is utf16 encoded so it should be
> written as binary, i.e. the output mode should be "wb".

Thanks for the "wb" tip, that works (see below). I guess it is 
experience-based, but how could you tell that it was utf16 encoded?
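
One common tell, which may or may not be what Duncan relied on, is the byte order mark: UTF-16 data very often starts with the bytes FF FE or FE FF. A minimal sketch of that check in current Python follows; the helper name guess_utf16 and the sample text are made up for illustration, and the check is only a heuristic (a UTF-32 little-endian BOM begins with the same two bytes).

```python
import codecs


def guess_utf16(data):
    """Return a codec name if data starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration; a heuristic, not a real detector.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


# The plain "utf-16" codec prepends a BOM in the machine's native byte order.
sample = "שלום".encode("utf-16")
detected = guess_utf16(sample)

# Plain ASCII bytes carry no BOM, so nothing is detected.
not_detected = guess_utf16(b"hello")
```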

> Simpler than using the base64 module you can just use the base64 codec. 
> This will decode a string to a byte sequence and you can then decode that 
> to get the unicode string:
> 
> with file("hebrew.b64","r") as f:
>    text = f.read().decode('base64').decode('utf16')
> 
> You can then write the text to a file through any desired codec or process 
> it first.

 >>> with file("hebrew.lang","wb") as f:
 ...     f.write(text.encode('utf16'))

Done ... superb!
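
For anyone on Python 3, where the 'base64' string codec and file() are gone, the same round trip might look roughly like the sketch below. The sample text and the temporary directory are assumptions made so the example is self-contained; only the file name hebrew.lang comes from this thread.

```python
import base64
import os
import tempfile

# Made-up sample standing in for the contents of hebrew.b64.
original = "שלום עולם"
b64_data = base64.b64encode(original.encode("utf-16"))

# Step 1: base64 text -> raw bytes. Step 2: raw bytes -> unicode string.
raw = base64.b64decode(b64_data)
text = raw.decode("utf-16")  # the BOM is consumed by the decoder

# Write the encoded bytes in binary mode so no newline translation
# can alter the UTF-16 byte stream.
path = os.path.join(tempfile.mkdtemp(), "hebrew.lang")
with open(path, "wb") as f:
    f.write(text.encode("utf-16"))

# Read it back to confirm nothing was lost on the way.
with open(path, "rb") as f:
    round_tripped = f.read().decode("utf-16")
```

Binary mode matters because a text-mode file on Windows rewrites newline bytes, which would corrupt two-byte UTF-16 code units.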

> BTW, you may just have shortened your example too much, but depending on 
> python to close files for you is risky behaviour. If you get an exception 
> thrown before the file goes out of scope it may not get closed when you 
> expect and that can lead to some fairly hard to track problems. It is much 
> better to either call the close method explicitly or to use Python 2.5's 
> 'with' statement.

Yes I had shortened my example but thanks for the 'with' statement tip 
... I never think about using it and I should ;)
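
A small sketch of why 'with' is the safer habit, written for current Python: the file is closed even though an exception is raised inside the block. The file name and the simulated error are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

try:
    with open(path, "wb") as f:
        f.write(b"partial data")
        # Simulate a failure before the block finishes normally.
        raise ValueError("simulated failure")
except ValueError:
    pass

# The with statement closed the file on the way out, despite the exception.
closed_after_error = f.closed
```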

Thanks,

EuGeNe -- http://www.3kwa.com