unicode filenames
Beni Cherniavsky
cben at techunix.technion.ac.il
Thu Feb 6 09:29:32 EST 2003
On 2003-02-06, Carlos Ribeiro wrote:
> On Thursday 06 February 2003 11:16 am, Beni Cherniavsky wrote:
> > Since unix can't afford to change all APIs and programs like windows did
> > (the mess that resulted explains why <wink>), unix must stay with the
> > byte-oriented filenames at the low level. This ensures that all programs
> > that store file names in files, etc., continue to work. UTF-8 is the only
> > encoding that can represent all of unicode that satisfies all these needs,
> > so everybody should migrate to UTF-8 filenames (CJK users might have
> > reservations to this; I'd be happy to learn their opinion).
>
> Sorry. It would be a big mess. Here in Brazil, I can safely assume that it is
> nearly impossible to find a computer *without* filenames with latin-1
> accented characters. Not to mention the problems that we have when mounting
> FAT partitions under Linux - many Unix users still need to use dual boot
> machines in order to use a few Windows apps.
>
If you use latin1 everywhere on the computer, you are OK too. Just don't
have one directory in latin1, another in latin8 and another in UTF-8.
If and when you decide to convert to UTF-8, you can run one script to
convert the whole filesystem. The problem will be with remaining
filenames lurking in files (e.g. playlists). That most probably requires
a period of manual fix-as-it-breaks after the conversion...
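
A rough sketch of what such a one-shot conversion script might look like,
assuming the whole tree is in latin1 and a Python with byte-string paths.
The names convert_name and convert_tree are made up for illustration, and
the "already valid UTF-8, so leave it alone" check is only a heuristic: a
latin1 name whose bytes happen to form valid UTF-8 would be skipped.

```python
import os

def convert_name(name: bytes, src_encoding: str = "latin-1") -> bytes:
    """Return the UTF-8 form of a byte filename stored in src_encoding.

    Names that already decode as UTF-8 (including plain ASCII) are
    returned unchanged, so the script can be re-run after a partial run.
    """
    try:
        name.decode("utf-8")
        return name
    except UnicodeDecodeError:
        return name.decode(src_encoding).encode("utf-8")

def convert_tree(root: bytes, src_encoding: str = "latin-1", dry_run: bool = True):
    """Rename entries under root to UTF-8; return the (old, new) path pairs."""
    renames = []
    # Walk bottom-up so children are renamed before their parent directory,
    # which keeps dirpath valid while we work inside it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new = convert_name(name, src_encoding)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.lexists(new_path):
                continue  # assumption: never overwrite, leave collisions for manual review
            renames.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return renames
```

Running it first with dry_run=True on a byte-string root (for example
convert_tree(b"/home")) only lists what would be renamed. It does nothing
about names stored inside files such as playlists.
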
> In my opinion, this is the type of problem that has to be solved at its root,
> by slowly migrating the filesystem itself to accept only UTF-8 filenames. All
> conversions during the migration phase have to be done by the operating
> system itself; when moving files from one FS to the other, it would do the
> necessary conversions. It's not going to be easy, though.
>
No encoding conversion is easy :-(.
--
Beni Cherniavsky <cben at tx.technion.ac.il>
Do not feed the Bugzillas.