'ascii' codec can't encode character u'\xf3'

Diez B. Roggisch deetsNOSPAM at web.de
Tue Aug 17 12:01:28 EDT 2004


oziko wrote:

> Now I can print the tags with no apparent problem. But now when I tried to
> insert that value into a PostgreSQL database I get the same error. I
> create the PostgreSQL database with default Unicode with

There seems to be a general misunderstanding about what unicode, an encoding
and all of that together mean in python.

Unicode is only an abstract definition of character-sets - the usual
suspects like what is in ascii, but also nearly everything somebody on this
planet of ours cares to write down once in a while.

Now an actual encoding is how these totally abstract character sets are
mapped to actual values. So for the capital letter "A", the ascii encoding
maps it to the well known value 65.

BUT: You can define another encoding, call it oziko or whatever, and map "A"
to 1 - if you like it.
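A small sketch of that idea. The "oziko" table and the two helper functions are made up purely for illustration; only the ascii part is a real codec. It uses only u'' literals, so it should behave the same on Python 2.6+ and 3.3+.

```python
# The ascii encoding maps the abstract character "A" to the byte value 65.
ascii_bytes = u'A'.encode('ascii')
ascii_value = ord(ascii_bytes)        # a single byte, read back as an integer

# A made-up "oziko" encoding (hypothetical): nothing more than a lookup table.
OZIKO_TABLE = {u'A': 1, u'B': 2, u'C': 3}

def oziko_encode(text):
    """Encode text with the toy table; returns a list of integer values."""
    return [OZIKO_TABLE[ch] for ch in text]

def oziko_decode(values):
    """Invert the toy table to get the characters back."""
    reverse = dict((value, ch) for ch, value in OZIKO_TABLE.items())
    return u''.join(reverse[value] for value in values)

print(ascii_value)
print(oziko_encode(u'ABC'))
print(oziko_decode([3, 1, 2]))
```

The characters are the same in both cases; only the numbers they are mapped to differ.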

Now UTF-8 is also only an encoding - with the capability to map all of
ascii to the usual numbers where you expect them, and to use byte values
above 127 to build multi-byte sequences that each encode one character. So
it can encode the whole unicode set, at the price of not being able to
determine the length of a string by dividing the number of bytes it
contains by the number of bytes a character uses - usually one.
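To make the length point concrete, here is a minimal sketch with a sample text of my own choosing (the word "cafe" with an e-acute, a space, and a euro sign). It should run unchanged on Python 2.6+ and 3.3+.

```python
# Sample text: c, a, f, e-acute, space, euro sign.
text = u'caf\xe9 \u20ac'
utf8 = text.encode('utf-8')

char_count = len(text)     # number of characters
byte_count = len(utf8)     # number of bytes after encoding

# Per-character byte widths in UTF-8: ascii stays at 1 byte, others take more.
widths = [len(ch.encode('utf-8')) for ch in text]

print(char_count)
print(byte_count)
print(widths)
```

Six characters end up as nine bytes, so the byte count alone tells you nothing definite about the number of characters.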

So this is an extremely important lesson: unicode is _not_ - I repeat, _not_
- UTF-8. 

Now python has unicode objects. They are sequences of characters - what
shape these internally have is opaque to you and not of your concern. They
are _not_ strings!!!! strings in python are sequences of bytes - as we are
used to from C.

Now whenever you want to use a string that is encoded in a special encoding,
you can get it from a unicode-object by invoking encode on it. That's what

u.encode('iso-8859-1')

does, if u is a unicode object.
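As a quick illustration, here is the character from the error message in the subject line, encoded two different ways. The variable names are mine; the codecs are the standard ones. It should work on Python 2.6+ and 3.3+.

```python
u = u'\xf3'                      # o with acute accent: one character

latin1 = u.encode('iso-8859-1')  # one byte: 0xf3
utf8 = u.encode('utf-8')         # two bytes: 0xc3 0xb3

print(repr(latin1))
print(repr(utf8))
```

Same character, different byte sequences, depending only on the encoding you ask for.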

The other way round, if you have a byte-sequence - conveniently stored in a
string - and want to get a unicode object from it, use decode

s.decode('iso-8859-1')
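A minimal sketch of the decode direction, which also shows what happens when you name the wrong encoding. The byte sequence is my own example. It should run on Python 2.6+ and 3.3+.

```python
s = b'\xc3\xb3'                   # the UTF-8 bytes for o with acute accent

right = s.decode('utf-8')         # one character: u'\xf3'
wrong = s.decode('iso-8859-1')    # two characters: u'\xc3\xb3'

# Encoding again with the codec that was used for decoding restores the bytes.
round_trip = right.encode('utf-8') == s

print(len(right))
print(len(wrong))
print(round_trip)
```

Decoding with the wrong codec does not necessarily raise an error; it can silently give you different characters, so you have to know which encoding the bytes are in.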

Now if you pass a unicode object to a function that wants a _string_, python
automatically encodes it for you - with the default encoding!!!! As this
is usually ascii, you get the problems you had.
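The automatic step itself is hard to show in a portable way, so this sketch does explicitly what the implicit conversion amounts to when the default encoding is ascii. It should fail in the same way on Python 2.6+ and 3.3+; the exact wording of the message differs slightly between versions.

```python
u = u'\xf3'

try:
    u.encode('ascii')            # ascii has no value for this character
    failed = False
    message = ''
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)
```

The message starts with the same "'ascii' codec can't encode character" text as the subject of this thread.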

So what do you need to solve your problem at hand? You need to know which
encoding the sql driver wants for transmitting strings - most probably
utf-8, so they can encode all possible characters. And thus you have to
encode the strings you pass beforehand, or set the default encoding
properly.
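A rough sketch of "encode beforehand", under the assumption that the driver really wants UTF-8 byte strings (check its documentation, I am only guessing). The helper encode_params, the constant DRIVER_ENCODING, and the table in the commented-out call are hypothetical names of mine, not part of any driver.

```python
DRIVER_ENCODING = 'utf-8'        # assumption: replace with what your driver expects

text_type = type(u'')            # the unicode type of the running interpreter

def encode_params(params, encoding=DRIVER_ENCODING):
    """Return a tuple in which every unicode value is encoded to bytes."""
    encoded = []
    for value in params:
        if isinstance(value, text_type):
            value = value.encode(encoding)
        encoded.append(value)    # non-unicode values pass through untouched
    return tuple(encoded)

params = encode_params((u'Ram\xf3n', 42))
print(params)

# Hypothetical use with a DB-API cursor; table and column names are made up:
# cursor.execute("INSERT INTO tags (name, plays) VALUES (%s, %s)", params)
```

Doing the encoding in one place like this keeps the choice of encoding out of the rest of the program.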

The last thing is to explain where the u''-thingies fit in. They are a
shortcut for getting a unicode object - whatever characters are encountered
inside the u'' are interpreted with the encoding the python interpreter
uses to parse the file at hand. Which one that is can either be specified
implicitly (system settings) or explicitly using the 


# -*- coding: <codec> -*-

comment line at the top of the source file (it has to be the first or second
line).
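For example, a source file along these lines, assuming your editor really saves it as UTF-8 (the declaration only tells python what to expect, it does not convert anything):

```python
# -*- coding: utf-8 -*-
# The declaration above only takes effect on the first or second line of the
# file, and the file must actually be stored in the encoding it names.
name = u'ó'

print(len(name))
print(repr(name.encode('utf-8')))
```

If the declared encoding and the encoding the file was saved in disagree, the characters inside the u'' come out wrong or the file fails to parse.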

You might want to start reading about unicode and python on the net, google
is as always your friend. 

-- 
Regards,

Diez B. Roggisch


