'ascii' codec can't encode character u'\u2013'
Fredrik Lundh
fredrik at pythonware.com
Fri Sep 30 09:50:05 EDT 2005
Thomas Armstrong wrote:
> I'm trying to parse a UTF-8 document with special characters like
> acute-accent vowels:
> --------
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> ...
> -------
>
> But I get this error message:
> -------
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 122: ordinal not in range(128)
> -------
> It works, but I don't want to substitute each special character, because there
> are always forgotten ones which can crack the program.
if you really want to use latin-1 in the database, and you don't mind dropping
unsupported characters, you can use

    text_extrated = text_extrated.encode('iso-8859-1', 'replace')

or

    text_extrated = text_extrated.encode('iso-8859-1', 'ignore')

a better approach is of course to convert your database to use UTF-8 and use

    text_extrated = text_extrated.encode('utf-8')

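to see what the three calls above actually do to the data, here is a small sketch. the sample string is made up for illustration (it contains the u'\u2013' en dash from the traceback), and the code assumes a current Python 3, where encode() returns a bytes object:

```python
# Hypothetical sample text; u"\u2013" is the en dash from the traceback.
text = u"caf\u00e9 \u2013 men\u00fa"

# With the default "strict" handler, latin-1 cannot represent the en dash,
# so the call raises UnicodeEncodeError.
try:
    text.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "replace" substitutes a question mark for each unsupported character.
replaced = text.encode("iso-8859-1", "replace")

# "ignore" silently drops each unsupported character.
ignored = text.encode("iso-8859-1", "ignore")

# UTF-8 can represent every character, so nothing is lost.
utf8 = text.encode("utf-8")

print(strict_failed)
print(replaced)
print(ignored)
print(utf8.decode("utf-8") == text)
```

both latin-1 variants lose the dash for good; only the UTF-8 result decodes back to the original text.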
it's also a good idea to switch to parameter substitution in your SQL queries,
passing the values as a single tuple:

    cursor.execute("update ... set text = %s where id = %s", (text_extrated, id))

it's possible that your database layer can automatically encode unicode strings if
you pass them in as parameters; see the database API documentation for details.
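a minimal sketch of parameter substitution, using the standard library's sqlite3 module as a stand-in for whatever database layer you have. the table, column names and sample text are hypothetical, and sqlite3 uses "?" placeholders where the query above uses "%s"; check your own driver's paramstyle:

```python
import sqlite3

# In-memory database and a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("create table docs (id integer primary key, text text)")
cursor.execute("insert into docs (id, text) values (?, ?)", (1, u""))

# Made-up text containing non-ASCII characters and a quote character.
text_extrated = u"caf\u00e9 \u2013 it's quoted"
doc_id = 1

# The values are passed as one tuple, separate from the SQL string, so the
# driver takes care of quoting and of storing the Unicode text.
cursor.execute(
    "update docs set text = ? where id = ?",
    (text_extrated, doc_id),
)
conn.commit()

# Read the row back to confirm the text survived unchanged.
cursor.execute("select text from docs where id = ?", (doc_id,))
stored = cursor.fetchone()[0]
print(stored == text_extrated)
```

here the driver accepts the unicode string directly, with no explicit encode() call; whether your database layer does the same depends on the driver.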
</F>