[Python-3000] Unicode and OS strings

martin at v.loewis.de martin at v.loewis.de
Thu Sep 20 15:51:16 CEST 2007


> On Linux, filenames are *byte* strings and not *character* strings.

That's not true, although it is a widespread misunderstanding.

The POSIX standard requires that the character set used for file names
be a superset of the portable character set, which includes characters
such as '/', the path separator.
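
As a rough illustration of why this matters, the sketch below encodes '/'
in a handful of encodings and collects the resulting bytes. The list of
encodings and the helper name separator_bytes are assumptions chosen for
the example, not anything POSIX prescribes; the point is only that in
ASCII-compatible encodings the separator is the same single byte.

```python
# Encodings picked purely for illustration; all are ASCII-compatible.
ENCODINGS = ["ascii", "latin-1", "utf-8", "koi8-r", "euc-jp"]


def separator_bytes(encodings=ENCODINGS):
    """Return a mapping of encoding name to the encoded form of '/'."""
    return {name: "/".encode(name) for name in encodings}


if __name__ == "__main__":
    for name, encoded in separator_bytes().items():
        print(name, encoded)
```

Under that assumption, a path can be split on the separator even when the
encoding of the individual components is unknown.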

> I always
> have this problem with Python 2.x. I converted the filename (argv[x]) to Unicode
> to be able to format error messages in full Unicode... but it's not possible.
> Linux allows invalid UTF-8 filenames even on a full UTF-8 installation (Ubuntu),
> see Marcin's examples.

True. However, this does not mean that the file names are byte strings -
they are character strings in an unspecified/undetermined encoding.
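
One way to live with an undetermined encoding can be sketched as follows.
This is only an illustration, assuming the "surrogateescape" error handler
that later Python 3 releases provide; it is not something this message
proposes. The sample name and the helpers decode_name and encode_name are
hypothetical. Bytes that do not decode under the assumed encoding are
mapped to lone surrogates, so the original bytes can be recovered.

```python
# Hypothetical file name: Latin-1 bytes that are not valid UTF-8.
RAW_NAME = b"caf\xe9.txt"


def decode_name(raw, encoding="utf-8"):
    """Decode a file name, smuggling undecodable bytes as lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_name(name, encoding="utf-8"):
    """Reverse decode_name, restoring the original bytes."""
    return name.encode(encoding, errors="surrogateescape")


if __name__ == "__main__":
    text = decode_name(RAW_NAME)
    # The str can be passed around and turned back into the same bytes.
    print(encode_name(text) == RAW_NAME)
    # With the right guess at the encoding, the name decodes cleanly.
    print(RAW_NAME.decode("latin-1"))
```

The round trip keeps the name usable for opening the file, while a correct
guess at the encoding is still needed to display it properly.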

Regards,
Martin



