Python Unicode to String conversion

Fri Aug 31 20:29:48 EDT 2007

On Sep 1, 9:56 am, "Chris Mellon" <arka... at gmail.com> wrote:
> On 8/31/07, thijs.br... at gmail.com <thijs.br... at gmail.com> wrote:
>
> > Hi everyone,
>
> > I'm having quite some troubles trying to convert Unicode to String
> > (for use in psycopg, which apparently doesn't know how to cope with
> > unicode strings).
>
> > The error I keep having is something like this:
> > ERREUR:  Séquence d'octets invalide pour le codage «UTF8» : 0xe02063
>
> > (sorry, locale is french, it means "byte sequence invalid for encoding
> > <<utf8>>", the value is probably an e with one of the french accents)
>
> > I've found lots of stuff about this googling the error, but I don't
> > seem to be able to find a "works always"-function just to convert a
> > unicode variable back to string...
>
> encode().
>
> You didn't post the code that was failing. I can encode that value
> into UTF-8

What is "that value"?
(1) unichr(0xe02063)? You must have a wide unicode build of Python ...
(2) u"\xe0\x20\x63"? Of course you can encode it; so what?
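
To make the two readings concrete, here is a minimal sketch. It is written in Python 3 notation, which is an assumption on my part: in the Python 2 of this thread the literal would be u"\xe0\x20\x63" and chr() would be unichr(). The variable names are made up for illustration.

```python
# Reading (2): three separate code points, U+00E0, U+0020, U+0063.
text = "\xe0\x20\x63"
encoded = text.encode("utf-8")  # encoding ordinary text to UTF-8 succeeds

# Reading (1): 0xe02063 as ONE code point. Unicode stops at 0x10FFFF,
# so this is rejected regardless of how the interpreter was built.
try:
    chr(0xE02063)
    single_code_point_ok = True
except ValueError:
    single_code_point_ok = False

print(repr(text), repr(encoded), single_code_point_ok)
```

Either way, the ability to encode that text says nothing about where the error message came from.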

> (and unless I'm very much mistaken, you should be able to
> encode any unicode string to UTF-8).

That is true, by definition. However, you are barking up the wrong
tree with this truism. The unknown complainant's whinge is that it
expected a sequence of octets (an 8-bit string) that is valid UTF-8,
but what it actually received was something else. It is *NOT* trying
to say that a unicode input can't be converted to UTF-8.
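
A rough sketch of what the complaint amounts to, again in Python 3 notation. The octets are the ones quoted in the error message. The guess that they are Latin-1 is only an assumption (the original post never says what encoding the data was in), and is_valid_utf8 is a hypothetical helper, not part of psycopg or any other library.

```python
raw = b"\xe0\x20\x63"  # the octets named in the error message


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE0 announces a three-byte UTF-8 sequence, but 0x20 is not a
# continuation byte, so these octets are not valid UTF-8.
print(is_valid_utf8(raw))

# ASSUMPTION: the octets are really Latin-1. Decode with the encoding
# the data is actually in, then encode to UTF-8 for the consumer.
text = raw.decode("latin-1")
fixed = text.encode("utf-8")
print(repr(text), repr(fixed), is_valid_utf8(fixed))
```

If the assumption holds, the failing code was handing over an 8-bit string in some other encoding while the receiver expected UTF-8, which is a different problem from failing to encode a unicode object.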