Character set woes with binary data

Paul Boddie paul at boddie.org.uk
Sun Apr 1 14:09:02 EDT 2007


Michael B. Trausch wrote:
>
> I never said it did.  It just happens to be the context with which I am
> working.  I said I wanted to concatenate materials without regard for
> the character set.  I am mixing binary data with ASCII and Unicode, for
> sure, but I should be able to do this.

The problem is that Unicode has no default representation for mixing
with binary data and ASCII. What you should therefore ask yourself is,
"Which encoded representation of Unicode should I be using to mix my
text with those things?" Then, you should choose an encoding, call the
encode method on your Unicode objects, take the result, and mix away!
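
As a minimal sketch of that advice (the sample values and the choice of
UTF-8 are assumptions for illustration, not something from the original
question), encoding first and concatenating afterwards might look like
this:

```python
# Hypothetical sample data; any encoding could be chosen in place of UTF-8.
text = u"caf\u00e9"            # Unicode text: characters, not bytes
ascii_part = b"HEADER:"        # ASCII data, already a byte string
binary = b"\x00\xff\x10"       # arbitrary binary data

# Choose an encoding and call the encode method on the Unicode object...
encoded = text.encode("utf-8")

# ...then mix the results, since everything is now a byte string.
mixed = ascii_part + encoded + binary
```

Whoever reads the data back needs to know which encoding was chosen in
order to decode the text portion again.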

[...]

> In short:  How do I create a string that contains raw binary content
> without Python caring?  Is that possible?

All strings can contain raw binary content without Python caring.
Unicode objects, however, work on a higher level of abstraction:
characters, not bytes. Thus, you need to make sure that your Unicode
objects have been converted to bytes (i.e. encoded to strings) in order
for the content to be workable at the same level as that binary
content.
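
To illustrate the difference in levels, here is a small sketch under
assumed sample values. A byte string happily holds every possible byte
value, whereas mixing in a Unicode object that has not been encoded
fails: with a TypeError on Python 3, or typically with a
UnicodeDecodeError on Python 2, where an implicit ASCII decoding is
attempted first.

```python
# A byte string can hold every byte value from 0 to 255.
raw = bytes(bytearray(range(256)))

label = u"na\u00efve"          # hypothetical Unicode text

# Mixing the unencoded Unicode object with binary content fails.
error_name = None
try:
    combined = raw + label
except (TypeError, UnicodeDecodeError) as exc:
    error_name = type(exc).__name__

# Encoding first brings the text down to the level of bytes.
combined = raw + label.encode("utf-8")

# The text can be recovered by decoding its portion of the bytes.
recovered = combined[len(raw):].decode("utf-8")
```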

Paul



