unicode filenames

Beni Cherniavsky cben at techunix.technion.ac.il
Thu Feb 6 09:29:32 EST 2003


On 2003-02-06, Carlos Ribeiro wrote:

> On Thursday 06 February 2003 11:16 am, Beni Cherniavsky wrote:
> > Since unix can't afford to change all APIs and programs like windows did
> > (the mess that resulted explains why <wink>), unix must stay with
> > byte-oriented filenames at the low level.  This ensures that all programs
> > that store file names in files, etc., continue to work.  UTF-8 is the only
> > encoding that can represent all of unicode and satisfies all these needs,
> > so everybody should migrate to UTF-8 filenames (CJK users might have
> > reservations about this; I'd be happy to learn their opinion).
>
> Sorry. It would be a big mess. Here in Brazil, I can safely assume that it is
> nearly impossible to find a computer *without* filenames with latin-1
> accented characters. Not to mention the problems that we have when mounting
> FAT partitions under Linux - many Unix users still need to use dual boot
> machines in order to use a few Windows apps.
>
If you use latin1 everywhere on the computer, you are OK too.  Just don't
have one directory in latin1, another in latin8 and another in UTF-8.

If and when you decide to convert to UTF-8, you can run one script to
convert the whole filesystem.  The problem will be with remaining
filenames lurking in files (e.g. playlists).  That most probably requires
a period of manual fix-as-it-breaks after the conversion...
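
A minimal sketch of what such a conversion script might look like in
Python, assuming every non-UTF-8 name on the filesystem really is latin-1.
The names convert_name and convert_tree are made up for illustration.  It
works on byte strings throughout, since that is what the filesystem
stores, and defaults to a dry run that only reports what it would rename:

```python
import os

def convert_name(name: bytes, src_encoding: str = "latin-1") -> bytes:
    """Re-encode one filename as UTF-8; keep names that already are valid UTF-8."""
    try:
        name.decode("utf-8")
        return name  # already valid UTF-8 (this includes pure ASCII)
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in src_encoding.
        return name.decode(src_encoding).encode("utf-8")

def convert_tree(root: bytes, src_encoding: str = "latin-1", dry_run: bool = True):
    """Walk root bottom-up and return the list of (old_path, new_path) renames."""
    renames = []
    # A bytes root makes os.walk yield bytes names.  topdown=False visits
    # children before their parent, so a directory is renamed only after
    # everything inside it has been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new = convert_name(name, src_encoding)
            if new == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            if os.path.lexists(new_path):
                continue  # never overwrite an existing entry; fix by hand
            renames.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return renames
```

Calling convert_tree(b"/home", dry_run=True) first and reading the list
before running it for real seems prudent.  Note the heuristic: a latin-1
name whose bytes happen to form valid UTF-8 is left untouched, and the
script does nothing about names stored inside files.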

> In my opinion, this is the type of problem that has to be solved at its root,
> by slowly migrating the filesystem itself to accept only UTF-8 filenames. All
> conversions during the migration phase have to be done by the operating
> system itself; when moving files from one FS to the other, it would do the
> necessary conversions. It's not going to be easy, though.
>
No encoding conversion is easy :-(.

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>

Do not feed the Bugzillas.




