'ascii' codec can't encode character u'\u2013'

Fredrik Lundh fredrik at pythonware.com
Fri Sep 30 09:50:05 EDT 2005


Thomas Armstrong wrote:

> I'm trying to parse a UTF-8 document with special characters like
> acute-accent vowels:
> --------
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> ...
> -------
>
> But I get this error message:
> -------
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 122: ordinal not in range(128)
> -------

> It works, but I don't want to substitute each special character, because there
> are always forgotten ones which can crack the program.

if you really want to use latin-1 in the database, and you don't mind losing the
characters latin-1 can't represent, you can replace them with question marks

    text_extrated = text_extrated.encode('iso-8859-1', 'replace')

or drop them silently

    text_extrated = text_extrated.encode('iso-8859-1', 'ignore')
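here's a quick way to see the difference between the two error handlers. it's a sketch in Python 3 syntax, and the sample string is made up; under the Python 2 of the original post the encoded results are plain str objects rather than bytes:

```python
# the sample text is hypothetical: e-acute exists in latin-1,
# the en dash (U+2013) from the original error message does not
text = u"caf\u00e9 \u2013 menu"

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)  # the same kind of error as in the original post

replaced = text.encode("iso-8859-1", "replace")  # unencodable chars become b"?"
ignored = text.encode("iso-8859-1", "ignore")    # unencodable chars are dropped

print(replaced)  # b'caf\xe9 ? menu'
print(ignored)   # b'caf\xe9  menu'
```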

a better approach is of course to convert your database to use UTF-8 and use

    text_extrated = text_extrated.encode('utf-8')
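UTF-8 can represent every Unicode character, so nothing is lost. a small sketch (Python 3 syntax, made-up sample text) showing that the encoded bytes decode back to the original string:

```python
text = u"caf\u00e9 \u2013 menu"

encoded = text.encode("utf-8")  # no error handler needed: every character has a UTF-8 form
print(encoded)                  # b'caf\xc3\xa9 \xe2\x80\x93 menu'

# the round trip is lossless
assert encoded.decode("utf-8") == text
```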

it's also a good idea to switch to parameter substitution in your SQL queries:

    cursor.execute("update ... set text = %s where id = %s", (text_extrated, id))
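the sketch below uses the standard library's sqlite3 module as a stand-in for whatever database you're using. note that sqlite3 expects "?" placeholders, while drivers such as MySQLdb use "%s" (check your driver's paramstyle). the articles table and its contents are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# hypothetical table, just for the example
cursor.execute("create table articles (id integer primary key, text text)")
cursor.execute("insert into articles (id, text) values (?, ?)", (1, "draft"))

new_text = "O'Reilly's guide"  # the apostrophes would break a hand-built SQL string
article_id = 1

# all parameters go in a single sequence; the driver takes care of quoting
cursor.execute("update articles set text = ? where id = ?", (new_text, article_id))
conn.commit()

cursor.execute("select text from articles where id = ?", (article_id,))
stored = cursor.fetchone()[0]
print(stored)  # O'Reilly's guide
```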

it's possible that your database layer can automatically encode unicode strings if
you pass them in as parameters; see the database API documentation for details.
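with sqlite3, for example, you can pass a unicode string straight in as a parameter and get the same string back, with no encode call on your side. this is again a sketch with a hypothetical table; other drivers may need an encoding or charset option when you connect:

```python
import sqlite3

text = "caf\u00e9 \u2013 menu"  # contains the en dash from the original error

conn = sqlite3.connect(":memory:")
# hypothetical table, just for the example
conn.execute("create table articles (id integer primary key, text text)")

# no .encode() here: the driver handles the conversion itself
conn.execute("insert into articles (id, text) values (?, ?)", (1, text))
conn.commit()

stored = conn.execute("select text from articles where id = ?", (1,)).fetchone()[0]
print(stored == text)  # True: the en dash survives the round trip
```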

</F> 

More information about the Python-list mailing list