Unicode strings, struct, and files

Tom Plunket tomas at fancy.org
Mon Oct 9 03:34:45 EDT 2006


John Machin wrote:

> > message = unicode('Hello, world')
> > myFile.write(message)
> >
> > results in 'message' being converted back to a string before being
> > written.  Is the way to do this to do something hideous like this:
> >
> > for c in message:
> >    myFile.write(struct.pack('>H', ord(unicode(c))))
> 
> I'd suggest encoding it as a byte string, using the UTF-16 variant
> that matches whatever wchar means on the target machine. For example,
> assuming big-endian and sizeof(wchar) == 2:

Ahh, this is the info that my trawling through the documentation
didn't let me find!

Thanks a bunch.

> utf_line1 = unicode_line1.encode('utf_16_be')
> etc
> struct.pack(">.........64s64s", ......, utf_line1, utf_line2)
> Presumes (1) you have already checked that you don't have more than 32
> characters in each "line" (2) padding with unichr(0) is acceptable.

This works frighteningly well.  ;)
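
For anyone finding this in the archive, here is roughly what the whole
thing looks like end to end. It is only a minimal sketch under some
assumptions: Python 3 (where str is already Unicode, so the unicode()
call goes away), big-endian 16-bit wide characters, and two fields of 32
code units each. The helper name pack_wide_lines and the in-memory
buffer standing in for a file opened in binary mode are made up for
illustration.

```python
import io
import struct


def pack_wide_lines(line1, line2, width=32):
    """Pack two text lines as fixed-width big-endian UTF-16 fields.

    `width` is the field size in 16-bit code units (an assumption
    matching the 64-byte fields discussed in this thread).
    """
    field_bytes = width * 2
    encoded = []
    for line in (line1, line2):
        data = line.encode('utf_16_be')
        # Check the encoded size, not len(line): characters outside the
        # Basic Multilingual Plane take two 16-bit code units each.
        if len(data) > field_bytes:
            raise ValueError('line does not fit in %d code units' % width)
        encoded.append(data)
    # The 's' format pads short values with NUL bytes on the right.
    fmt = '>%ds%ds' % (field_bytes, field_bytes)
    return struct.pack(fmt, *encoded)


record = pack_wide_lines('Hello, world', 'second line')

# Stand-in for a real file opened with mode 'wb'.
my_file = io.BytesIO()
my_file.write(record)

# Reading it back: unpack, decode, and strip the NUL padding.
raw1, raw2 = struct.unpack('>64s64s', my_file.getvalue())
line1 = raw1.decode('utf_16_be').rstrip('\x00')
line2 = raw2.decode('utf_16_be').rstrip('\x00')
```

The length check is done on the encoded bytes rather than on the
character count, so a line that would overflow its field raises an error
instead of being silently truncated by struct.pack.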


-tom!


