convert string with raw binary data to unicode

Thomas Heller theller at python.net
Thu Feb 12 13:40:46 EST 2004


"Achim Domma" <domma at procoders.net> writes:

> Hi,
>
> I want to pass raw binary data from a file to a COM object. I read the data
> from file like this:
>
> data = file('path_to_file','rb').read()
>
> If passed to a COM object, data is converted to unicode in the way one would
> expect for strings. I.e. a lot of zeros are filled in. I want each two
> characters from data to be interpreted as one unicode character. I read the
> docu about codecs but can not find a suitable codec. I also tried to read
> the data like this:
>
> data = codecs.open('path_to_file','rb','???').read()
>
> I tried to use UCS2 for the ???, but this encoding does not exist. A posting
> found via google supposes to use UTF-16 but this is not the same and raises
> an error.
>
> This shouldn't be a big problem, but I can't figure out how to solve it. Can
> anybody help?

If I understand your problem correctly, you want to construct a unicode
object containing arbitrary data in its internal buffer.

And if I understand Python's unicode implementation correctly, then I
would say it isn't possible, since unicode objects do not contain
binary data; they contain characters (or whatever the right term is in
the unicode world).

OTOH, it should be possible to write a small extension wrapping the
PyUnicode_FromUnicode() function to accept arbitrary data.

Is there also a possibility to write a codec which does this?
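
As a rough sketch of the codec route, assuming a current Python 3 (which
postdates this thread): the built-in 'utf-16-le' codec maps each pair of
bytes to one 16-bit code unit, and the 'surrogatepass' error handler lets
unpaired surrogate values through instead of raising an error. The helper
names below are hypothetical, and padding odd-length input with a zero byte
is just one possible choice.

```python
def bytes_to_utf16_units(data: bytes) -> str:
    """Interpret each two bytes of data as one little-endian UTF-16 code unit."""
    if len(data) % 2:
        # Assumption: pad odd-length input with a trailing zero byte.
        data += b"\x00"
    # 'surrogatepass' accepts lone surrogates (0xD800-0xDFFF) instead of raising.
    return data.decode("utf-16-le", errors="surrogatepass")


def utf16_units_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_utf16_units (apart from any padding byte added)."""
    return text.encode("utf-16-le", errors="surrogatepass")


raw = b"AB\x00\xd8\xff\xff"
text = bytes_to_utf16_units(raw)
restored = utf16_units_to_bytes(text)
```

Note that a valid surrogate pair in the data still decodes to a single
character, so len(text) can be smaller than len(raw) // 2. Whether a COM
bridge hands such a string through unchanged is not something this sketch
checks.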

Note that the 'if's above are probably big 'if's...

Thomas


