Problem with Japanese characters in filenames

Philip 'Yes, that's my address' Newton nospam.newton at gmx.li
Sat Oct 21 07:41:00 EDT 2000


On Mon, 16 Oct 2000 21:51:49 -0700, jay.krell at cornell.edu wrote:

> At least with Unicode there is sort of only one, or at least fewer, "code
> pages". (You could consider UTF7, UTF8 and Java UTF8 as Unicode code
> pages...and I'm a bit ignorant, but the generalized terms are probably UCS7,
> UCS8, Java UCS8, UCS16, UCS32. UCS16 being the normal "big/wide/large/etc."
> representation. Java UTF8/UCS8 changing the representation of 0, how nice of
> Sun to follow standards...)

Nope; AFAIK UTF-n is "Unicode Transformation Format" and n is in bits;
UCS-n is "Universal Character Set" and n is in bytes. So you have UTF-7,
UTF-8, UTF-16 (sometimes split into UTF-16BE and UTF-16LE) and UTF-32,
as well as UCS-2 and UCS-4. I think there's also a UTF-1 that became
obsolete. http://czyborra.com/ has more on Unicode and stuff.
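
To make the bits-versus-bytes naming concrete, here is a rough sketch (not from the original thread) that encodes two sample characters with the codecs shipped in a modern Python 3. The helper name encoded_lengths and the choice of sample characters are my own assumptions for illustration. Note that UCS-2 has no codec of its own in Python; for characters inside the Basic Multilingual Plane it yields the same bytes as UTF-16.

```python
# Sketch only: compare how many bytes one character takes in each encoding.
# The helper name and the sample characters are illustrative assumptions.

CODECS = ("utf-7", "utf-8", "utf-16-be", "utf-16-le", "utf-32-be")


def encoded_lengths(ch):
    """Return {codec name: byte length} for a single character."""
    return {name: len(ch.encode(name)) for name in CODECS}


bmp_char = "\u65e5"          # U+65E5, a kanji inside the BMP
astral_char = "\U0001F600"   # U+1F600, outside the BMP

for label, ch in (("U+65E5", bmp_char), ("U+1F600", astral_char)):
    print(label, encoded_lengths(ch))

# UTF-16BE and UTF-16LE differ only in the byte order of each 16-bit unit.
print(bmp_char.encode("utf-16-be"), bmp_char.encode("utf-16-le"))
```

The BMP character fits in a single 16-bit unit in UTF-16, while the character outside the BMP needs a surrogate pair, which is exactly the case where a fixed-width UCS-2 falls short and UTF-16 or UCS-4/UTF-32 is needed.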

Cheers,
Philip
-- 
Philip Newton <nospam.newton at gmx.li>
If you're not part of the solution, you're part of the precipitate.


