Trouble with unicode

Brian Quinlan BrianQ at ActiveState.com
Mon May 14 17:01:35 EDT 2001


Hmmmm, are you sure that the characters are Unicode? They look like latin-1
to me...

Anyway, I'm assuming that you want to generate ASCII text from a unicode
object and that you simply want to drop or replace characters that are not
representable in ASCII. Let me know if these assumptions are not true. If
they are, try this:

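>>> from codecs import lookup
>>> toASCII = lookup( 'ascii' )[0]    # the codec's encoder function
>>> toASCII( u'123\555' )             # default 'strict' mode raises UnicodeError
>>> toASCII( u'123\555', 'replace' )  # substitutes '?' for the non-ASCII character
('123?', 4)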

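The result tuple contains the converted buffer and the number of input
characters that were consumed. Pass 'ignore' instead of 'replace' to drop
the unrepresentable characters rather than substituting '?'.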
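
If the data really is Latin-1 bytes rather than a unicode object, it has to be decoded before it can be encoded to ASCII. Here is a minimal sketch of that two-step round trip. It assumes the bytes are Latin-1 (a guess based on the values quoted below) and uses present-day Python 3 syntax rather than the Python 2 session above; the variable names are only illustrative.

```python
import codecs

# Bytes as reported in the original message; assumed to be Latin-1.
raw = b"\xe4, \xc4, \xf6, \xd6, \xfc, \xdc, \xdf"

# Step 1: decode the bytes into a Unicode string.
text = raw.decode("latin-1")

# Step 2: encode to ASCII, choosing what happens to unrepresentable characters.
to_ascii = codecs.lookup("ascii").encode
replaced, consumed = to_ascii(text, "replace")  # '?' for each non-ASCII character
stripped, _ = to_ascii(text, "ignore")          # non-ASCII characters dropped

print(replaced, consumed)
print(stripped)
```

Decoding the same bytes as ASCII fails in strict mode because every value above 0x7f is outside the ASCII range, which matches the "ordinal not in range(128)" error in the quoted message.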

> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Charlie Clark
> Sent: Monday, May 14, 2001 1:14 PM
> To: python-list at python.org
> Subject: Trouble with unicode
>
>
> I'm having trouble converting the contents of e-mails stored as unicode
> files into plain text. I'm not sure if I've understood how to deal with
> unicode :-(
>
> As usual the problem is with non-ascii characters.
>
> For example I have the following characters in the mail:
> "ä, Ä, ö, Ö, ü, Ü, ß"
>
> when I read the mail in Python as a string I get:
> "\xe4, \xc4, \xf6, \xd6, \xfc, \xdc, \xdf"
>
> I've followed the example from
> http://www.python.org/2.0/new-python.html
> but don't seem to be getting very far and ascii_decode() gives me the
> following error:
> "UnicodeError: ASCII decoding error: ordinal not in range(128)"
>
> Please help me "get it" with unicode.
>
> Thanx
>
> Charlie Clark
> --
> http://mail.python.org/mailman/listinfo/python-list
>




