recycling internationalized garbage

Ross Ridge rridge at csclub.uwaterloo.ca
Wed Mar 8 12:51:09 EST 2006


aaronwmail-usenet at yahoo.com wrote:
> Question: what is a good strategy for taking an 8bit
> string of unknown encoding and recovering the largest
> amount of reasonable information from it (translated to
> utf8 if needed)?

Copy the string unmodified to the WWW page and ensure your page doesn't
identify the encoding used.  That way it becomes the browser's problem,
and if the user reading the page can understand the language the string
is written in, there's a very good chance the browser will display it
correctly.  Unfortunately, that's how text like this is supposed to be
displayed.

> The output must be clean utf8 suitable for arbitrary xml parsers.

Oh, you're screwed then.
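
To make the problem concrete, here is a minimal sketch of about the best a
program can do, assuming modern Python 3 and an arbitrary choice of Latin-1
as the fallback guess.  The name to_clean_utf8 is hypothetical and the
fallback encoding is an assumption, not something the bytes can tell you:

```python
import re

# Characters XML 1.0 forbids that can survive decoding: the C0 controls
# other than tab, LF and CR, plus the noncharacters U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def to_clean_utf8(raw: bytes) -> bytes:
    """Best-effort conversion of bytes of unknown encoding to XML-safe UTF-8."""
    try:
        # Valid UTF-8 is rarely valid by accident, so try it first.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 maps every byte to a character, so this
        # never fails, but it is only a guess at what the author meant.
        text = raw.decode("latin-1")
    # Replace anything an XML 1.0 parser would reject with U+FFFD.
    text = _XML_ILLEGAL.sub("\ufffd", text)
    return text.encode("utf-8")
```

The output will always be well-formed UTF-8 that an XML parser accepts, but
whenever the fallback guess is wrong the text itself is still garbage, just
validly encoded garbage.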

                                          Ross Ridge



