Can I get the 8bit-string representation of any unicode string

wanghz at gmail.com wanghz at gmail.com
Sun Feb 12 10:11:13 EST 2006


Hello, everyone.

I have a problem when I'm processing unicode strings.  Is it possible
to get the 8bit-string representation of any unicode string?

Suppose I get a unicode string:
  a = u'\xc8\xce\xcf\xcd\xc6\xeb'
then, by
  a.encode('latin-1')
I can get the 8bit-string representation of it, that is, the physical
storage format of this string.

But for another kind of unicode string, say:
  b = u'\u4efb\u8d24\u9f50'
latin-1 cannot represent these characters, so I have to use:
  b.encode('utf-8')
to get the 8bit-string format of it.
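
A minimal sketch of the difference between the two cases, using the same
two strings as above.  The result names (a_latin1, b_utf8, b_fits_latin1,
a_utf8) are only illustrative and are not part of any library.  It shows
that latin-1 rejects the second string, while utf-8 accepts both but
produces different bytes for the first string than latin-1 does.

```python
a = u'\xc8\xce\xcf\xcd\xc6\xeb'   # every code point is below U+0100
b = u'\u4efb\u8d24\u9f50'         # code points far above U+00FF

# latin-1 maps each code point below U+0100 to exactly one byte.
a_latin1 = a.encode('latin-1')

# utf-8 needs three bytes for each of these characters.
b_utf8 = b.encode('utf-8')

# latin-1 has no byte for code points above U+00FF, so encoding b fails.
try:
    b.encode('latin-1')
    b_fits_latin1 = True
except UnicodeEncodeError:
    b_fits_latin1 = False

# utf-8 also accepts a, but uses two bytes per character here, so the
# result is not the same byte string as a_latin1.
a_utf8 = a.encode('utf-8')
```

So the bytes I get depend on the encoding I pick, not only on the string.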

Since these unicode strings are returned by an external library function,
I don't know which kind a given string is until I receive it at
runtime.  So I wonder whether there is a unified way to get the 8bit-string
representation, say, byte by byte, of any unicode string?

Thank you very much.




More information about the Python-list mailing list