help with unicode email parse

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Fri Sep 8 04:34:55 EDT 2006


In <1157693089.096269.191580 at b28g2000cwb.googlegroups.com>, neoedmund
wrote:

> John, you can look at my code:
> it downloads email and saves it to the local filesystem (the filenames
> and emails contain non-English characters).
> It works now.
> But I still think Python's unicode strings are not as straightforward as
> Java's.
> Strings SHOULD always be unicode, or I have trouble dealing with them when
> they are in different encodings, because before using one I must find out
> which encoding it uses, unicode or 8-bit. And why does the system use ASCII
> to decode? You have explained some of this, but I could not follow you.
> However, I never have encoding problems using strings in Java.

Really?  That would be true magic.  Outside the program, strings have to be
encoded somehow, so they always have to be decoded when you want a unicode
string.  And it's impossible to guess the encoding 100% correctly.
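
To make the guessing problem concrete, here is a minimal sketch in Python 3
syntax (an assumption on my part, as are the variable names).  It encodes the
text once and then decodes the very same bytes with two different 8-bit
encodings.  Both decodes succeed without an error, so nothing in the bytes
themselves says which result is the right one.

```python
# Encode the text once as iso-8859-1, then decode the SAME bytes two ways.
text = "H\xe4ll\xf6"                  # "Hällö", written with escapes
raw = text.encode("iso-8859-1")       # b'H\xe4ll\xf6'

as_latin1 = raw.decode("iso-8859-1")  # what the writer meant
as_cp850 = raw.decode("cp850")        # also decodes without any error

print(repr(as_latin1))
print(repr(as_cp850))                 # same length, different characters
```

Every byte value is valid in both encodings, so a program can only pick the
right one if it is told, or if it guesses and happens to be lucky.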

Just go ahead and create some text files encoded in `utf-8`, `iso-8859-1`,
`ibm850` and make sure they contain characters that are not encoded with
the same byte values across those encodings, "umlauts" for instance.
"Hällö" might be a good test string.  Now read all those different text
files and print the content in Java, and see for yourself that it's
necessary to give the encoding explicitly if you want to deal with
arbitrary encodings and not just the default Java uses.
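
The experiment above is described for Java; the following is a rough sketch
of the same idea in Python 3 instead, not a Java program.  The file names and
the helper functions `write_samples`, `read_with` and `run_experiment` are
made up for illustration, and `cp850` is the name Python uses for `ibm850`.
It writes the test string once per encoding and then reads every file back
with every encoding.

```python
import os
import tempfile

TEXT = "H\xe4ll\xf6"  # "Hällö", written with escapes
ENCODINGS = ["utf-8", "iso-8859-1", "cp850"]  # cp850 is Python's name for ibm850


def write_samples(directory):
    """Write TEXT once per encoding and return {encoding: path}."""
    paths = {}
    for enc in ENCODINGS:
        path = os.path.join(directory, "sample-" + enc + ".txt")
        with open(path, "w", encoding=enc) as f:
            f.write(TEXT)
        paths[enc] = path
    return paths


def read_with(path, encoding):
    """Decode a file with the given encoding; None if the bytes are invalid."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        return None


def run_experiment():
    """Return {(written_as, read_as): decoded text or None} for all pairs."""
    results = {}
    with tempfile.TemporaryDirectory() as directory:
        paths = write_samples(directory)
        for written, path in paths.items():
            for read_as in ENCODINGS:
                results[(written, read_as)] = read_with(path, read_as)
    return results


if __name__ == "__main__":
    for (written, read_as), text in sorted(run_experiment().items()):
        print(written, "read as", read_as, "->", repr(text))
```

Only the combinations where the file is read with the encoding it was written
in give back the original string.  Every mismatch either fails to decode or
silently produces different characters.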

Ciao,
	Marc 'BlackJack' Rintsch


