unicode to ascii converting

Peter Wilkinson pwilkinson at videotron.ca
Fri Aug 6 15:35:31 EDT 2004


Well, this is interestingly annoying:

u"ä".encode("ascii", "ignore") ->  ''    # works just fine

but the way I have actually written it:

aa = "ä"
aa.encode("ascii","ignore") ->

Traceback (most recent call last):
   File "<interactive input>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

So I am guessing that I don't understand something about the syntax.
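
A sketch of what seems to be going on, assuming the "ä" reached Python as the single Latin-1 byte 0xe4 (the byte named in the traceback): a plain string holds bytes, so .encode() first has to turn it into unicode with the default ascii codec, and that hidden decode step is what fails. Decoding explicitly with the real encoding avoids it. The helper name to_ascii and the latin-1 default are my own assumptions, not part of any library.

```python
def to_ascii(raw, source_encoding="latin-1", errors="ignore"):
    """Convert a byte string to ASCII bytes.

    source_encoding is an assumption: it must match how the bytes were
    actually produced (latin-1 here, because 0xe4 is "ä" in that codec).
    """
    text = raw.decode(source_encoding)    # bytes -> unicode, explicit codec
    return text.encode("ascii", errors)   # unicode -> ASCII bytes


aa = b"\xe4"                    # the byte from the traceback
result = to_ascii(aa)           # non-ASCII character dropped
mixed = to_ascii(b"M\xe4rz")    # ASCII letters around it are kept
```

If the bytes came from a file or terminal in some other encoding (utf-8, cp1252, ...), source_encoding would have to be changed to match.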

Peter

At 02:31 PM 8/6/2004, Bernhard Herzog wrote:
>Peter Wilkinson <pwilkinson at videotron.ca> writes:
>
> > It would be good to find out _why_ this happens in the first place. I
> > will keep searching on this for a few days.
>
>Most likely because you have characters in that file that are not in the
>ASCII character set. ASCII is after all only a very small subset of
>unicode.  E.g.
>
> >>> u"ä".encode("ascii")
>Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
>
>
>If it's OK to lose information, you could use the errors argument of
>.encode, like
>
> >>> u"ä".encode("ascii", "ignore")
>''
>
>or
>
> >>> u"ä".encode("ascii", "replace")
>'?'
>
>
>    Bernhard
>
>--
>Intevation GmbH                                 http://intevation.de/
>Skencil                                http://sketch.sourceforge.net/
>Thuban                                  http://thuban.intevation.org/
>--
>http://mail.python.org/mailman/listinfo/python-list
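
On the errors argument Bernhard mentions above, here is a small sketch comparing the handlers I know the codecs accept besides "ignore" and "replace". The last line is a different technique (NFKD decomposition followed by "ignore"), included only as an assumption about what one might want when the base letter should survive.

```python
import unicodedata

text = u"\xe4"  # u"ä", written as an escape so the source file encoding does not matter

ignored = text.encode("ascii", "ignore")             # character dropped
replaced = text.encode("ascii", "replace")           # character becomes "?"
xmlref = text.encode("ascii", "xmlcharrefreplace")   # numeric XML character reference
escaped = text.encode("ascii", "backslashreplace")   # Python-style backslash escape

# Different technique: decompose "ä" into "a" plus a combining diaeresis,
# then let "ignore" drop the combining mark so the base letter survives.
stripped = unicodedata.normalize("NFKD", text).encode("ascii", "ignore")
```

All of these lose or rewrite information in some way, so which one fits depends on what the ASCII output is for.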



