problem with unicode

Fri Apr 25 17:34:34 EDT 2008

On Apr 26, 6:42 am, Bjoern Schliessmann <usenet-
mail-0306.20.chr0n... at spamgourmet.com> wrote:
> John Machin wrote:
> > On Apr 25, 10:01 pm, Bjoern Schliessmann <usenet-
> >> >>> media="x???[?"
> >> >>> print repr(media.decode("utf-8"))
>
> >> u'x\u30ef\u30e6\u30ed[\u30e8'
>
> (dang, KNode doesn't autodetect encodings ...)
>
> > But that_unicode_string.encode("utf-8") produces
> > 'x\xe3\x83\xaf\xe3\x83\xa6\xe3\x83\xad[\xe3\x83\xa8'
> > which does not contain the complained-about byte 0x9c in position
> > 1 (or any other position) -- how can that be?
>
> Probably the OP used a different encoding. That seems even more
> likely given the fact that his postings have a Japanese encoding
> (but this one doesn't produce any 0x9c, either).
>

I've tried just about every Japanese encoding there is. None of those
produced 0x9c. Perhaps the OP might like to tell us a bit more about
"# media is a binary string (mysql escaped zipped file)". mysql,
escaped, zipped -- we could be three layers of onion skin away from
enlightenment.
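
For anyone who wants to repeat the check, here is a minimal sketch (not the exact script I ran). The codec list is just an illustrative selection of the Japanese codecs that ship with CPython, and the helper name encodings_producing is made up for this post. The last few lines are a separate guess about the "zipped" layer: they assume "zipped" means zlib at the default compression level, which nothing in this thread confirms.

```python
import zlib

# The unicode string taken from the repr() output quoted above.
text = u"x\u30ef\u30e6\u30ed[\u30e8"

# Illustrative selection of codecs to try (an assumption, not an exhaustive list).
CODECS = ["utf-8", "shift_jis", "cp932", "euc_jp", "iso2022_jp"]

def encodings_producing(text, byte_value, codecs=CODECS):
    """Return the codec names whose encoding of `text` contains `byte_value`."""
    hits = []
    for name in codecs:
        try:
            data = text.encode(name)
        except UnicodeEncodeError:
            # This codec cannot represent the string at all; skip it.
            continue
        # bytearray() yields ints on both Python 2 and Python 3.
        if byte_value in bytearray(data):
            hits.append(name)
    return hits

print(encodings_producing(text, 0x9C))

# Separate guess about the "zipped" layer: a zlib stream compressed at the
# default level starts with the two bytes 0x78 0x9c, i.e. 'x' then 0x9c.
compressed = zlib.compress(b"any payload")
print(repr(compressed[:2]))
```

If the first print shows an empty list, none of the listed codecs can account for the 0x9c. If the second shows 'x' followed by 0x9c, then a raw zlib stream would put 0x9c in position 1 right after a leading 'x', which is worth ruling out before blaming the text encoding.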