recycling internationalized garbage

Wed Mar 15 03:06:21 EST 2006

Martin v. Löwis wrote:
> The point is that you can tell UTF-8 reliably. If the data decodes
> as UTF-8, it *is* UTF-8, because no other encoding in the world
> produces the same byte sequences (except for ASCII, which is
> an UTF-8 subset).

It should be obvious that any 8-bit single-byte character set can
produce byte sequences that are valid in UTF-8. In fact, I can't think
of any multi-byte encoding that can't produce valid UTF-8 byte
sequences.
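
A minimal sketch of the point above, assuming Python 3 and its standard
"latin-1", "gbk", and "utf-8" codecs. The byte string and the helper
name decodes_as are chosen here for illustration and are not from the
thread:

```python
# Illustrative bytes: 0xC3 0xA9 is the UTF-8 encoding of U+00E9.
data = b"\xc3\xa9"


def decodes_as(raw, encoding):
    """Return True if raw decodes without error in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The same two bytes are accepted by a single-byte charset (Latin-1),
# a multi-byte charset (GBK), and UTF-8.
results = {enc: decodes_as(data, enc) for enc in ("latin-1", "gbk", "utf-8")}
print(results)

# Text written as Latin-1 (U+00C3 U+00A9) therefore passes a
# "does it decode as UTF-8?" check and comes out as U+00E9.
latin1_text = "\u00c3\u00a9"
assert latin1_text.encode("latin-1").decode("utf-8") == "\u00e9"
```

This only shows that such byte sequences exist; it says nothing about
how often they turn up in real text.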

                         Ross Ridge