unicode filenames

Erik Max Francis max at alcyone.com
Sun Feb 2 20:45:19 EST 2003


Andrew Dalke wrote:

> I normally use unix.  What's the right way to treat filenames
> under that OS?  As Latin-1?  Or UTF-8?  As far as I can tell,
> filenames are simply bytes, so I can make whatever interpretation
> I want on the characters, and the standard viewpoint is to
> interpret those characters as Latin-1.

I believe that's the most common interpretation, but as you say, it
doesn't much matter since filenames in UNIX are just considered streams
of bytes.  No reference to an encoding -- as far as I know -- is made in
any UNIX-relevant standard.

> Does that mean Unix filenames can't contain non-Latin-1 characters?
> Or does it mean I need to get the info on how to interpret the
> filename using something from the current environment?

It means that filenames are strings of bytes.  What those bytes mean is
entirely application dependent.  They could be raw ASCII (the most
common), Latin-1 (probably the most common with filenames that contain
bytes with the MSB set), or any other encoding whatsoever.  It's
applications that make the files; it's applications that decide what
encoding to use.
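
To make that concrete, here is a rough sketch of one byte string read
under three different encodings.  It assumes a Python 3 interpreter
(where bytes and text are separate types), and decode_candidates is a
made-up helper name for illustration, not anything in the standard
library.  The second half relies on os.listdir returning bytes names
when it is handed a bytes path.

```python
import os
import tempfile


def decode_candidates(raw_name, encodings=("ascii", "latin-1", "utf-8")):
    """Map each encoding to the decoded text, or None if the bytes are invalid in it."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw_name.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# One filename, as the filesystem stores it: five bytes, no encoding attached.
raw = b"caf\xc3\xa9"
readings = decode_candidates(raw)
for enc, text in readings.items():
    print(enc, repr(text))

# Passing a bytes path to os.listdir gives back the names as bytes,
# leaving the interpretation to the caller.
with tempfile.TemporaryDirectory() as tmp:
    tmp_bytes = os.fsencode(tmp)
    with open(os.path.join(tmp_bytes, b"plain.txt"), "wb"):
        pass
    names = os.listdir(tmp_bytes)
print(names)
```

The same five bytes are invalid as ASCII, two accented-looking
characters as Latin-1, and a single e-acute as UTF-8.  Nothing in the
filename itself says which reading was intended.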

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/  \ The quickest way of ending a war is to lose it.
\__/ George Orwell
    REALpolitik / http://www.realpolitik.com/
 Get your own customized newsfeed online in realtime ... for free!



