recycling internationalized garbage

Fredrik Lundh fredrik at pythonware.com
Thu Mar 16 05:47:17 EST 2006


"Martin v. Löwis" wrote:

>> It should be obvious that any 8-bit single-byte character set can
>> produce byte sequences that are valid in UTF-8.
>
> It is certainly possible to interpret UTF-8 data as if they were
> in a specific single-byte encoding. However, the text you then
> obtain is not meaningful in any language of the world.

Except those languages that use words consisting of runs of accented
letters immediately followed by either undefined characters or odd
symbols, and never use accented characters in any other way.
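
To make that pattern concrete, here is a minimal Python 3 sketch. The
sample strings are illustrative assumptions, not taken from any real
freedb record. It shows what UTF-8 data looks like when misread as
iso-8859-1, and that ordinary iso-8859-1 text with a lone accented
letter is usually not valid UTF-8 at all.

```python
# UTF-8 encodes "é" as two bytes, 0xC3 0xA9.
utf8_bytes = "é".encode("utf-8")

# Read as iso-8859-1, those bytes become an accented letter (0xC3)
# followed by an odd symbol (0xA9).
misread = utf8_bytes.decode("iso-8859-1")

# For "À" the second byte is 0x80, which iso-8859-1 does not define
# as a printable character.
misread_undefined = "À".encode("utf-8").decode("iso-8859-1")

# Ordinary iso-8859-1 text: the accented byte 0xE9 is not followed by
# UTF-8 continuation bytes, so decoding it as UTF-8 fails.
try:
    "café".encode("iso-8859-1").decode("utf-8")
    latin1_is_valid_utf8 = False if False else True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
```

Only iso-8859-1 text that already has the "accented letter plus odd
symbol" shape, such as the two-character string in misread, round-trips
as valid UTF-8.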

(Given that the freedb spec says that it's okay to mix iso-8859-1 with
utf-8 on a record-by-record level, one might assume that they've
decided that the number of bands using such languages is very close to
zero...)
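
A reader of such mixed records might handle them roughly as follows.
This is a sketch under stated assumptions: decode_record is a
hypothetical helper, not part of freedb or any library, and the
"try utf-8 first, then fall back" rule is one plausible heuristic
rather than something the spec is known to prescribe.

```python
def decode_record(raw: bytes) -> str:
    """Decode one record, guessing between utf-8 and iso-8859-1.

    Hypothetical helper: try utf-8 first, since most iso-8859-1 text
    containing accented letters is not valid utf-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # iso-8859-1 maps every byte value, so this cannot fail.
        return raw.decode("iso-8859-1")


from_utf8 = decode_record("Motörhead".encode("utf-8"))
from_latin1 = decode_record("Motörhead".encode("iso-8859-1"))

# The rare false positive: iso-8859-1 text that happens to be valid
# utf-8 is silently decoded as utf-8.
false_positive = decode_record("Ã©".encode("iso-8859-1"))
```

Both encodings of the band name decode to the same string, while the
last case shows the kind of input the heuristic gets wrong.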

</F> 





