PEP 277, Unicode filenames & mbcs encoding &c.

Edward K. Ream edreamleo at charter.net
Wed Oct 22 07:44:49 EDT 2003


Many thanks, Martin, for these comments.  They are so helpful...

> You should either use only Unicode strings, or only byte strings. The
> functions of os.path are not all affected by the PEP 277
> implementation (although they probably should).

My working assumption is that all strings in my app must be Unicode strings.
For example, the crashes I am seeing right now while trying to support Unicode
filenames occur when a byte string is implicitly converted to Unicode in
situations like:

if fileName1 == fileName2:

where one fileName is a unicode string and the other isn't yet.  That's why
I wanted to do:

myFile = unicode(__file__, "mbcs", "strict")

The challenge in my app is to make sure the proper encoding is used in the
more than 30 situations where a filename gets created somehow.  Naturally,
that's not your problem, nor PEP 277's problem either :-)
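
One way to tame those 30-odd situations is to funnel every filename through a
single conversion point, so that comparisons never mix string types. What
follows is only a minimal sketch under stated assumptions: the names as_text
and same_filename are hypothetical, the encoding is left to the caller, and
the code is written for a Python where bytes and str are distinct types.

```python
def as_text(name, encoding, errors="strict"):
    """Return name as a text string (hypothetical helper).

    Byte strings are decoded with the caller-supplied encoding;
    text strings are returned unchanged.
    """
    if isinstance(name, bytes):
        return name.decode(encoding, errors)
    return name


def same_filename(name1, name2, encoding):
    """Compare two filenames only after both have been converted to text."""
    return as_text(name1, encoding) == as_text(name2, encoding)
```

With "strict" error handling a badly encoded name fails at the conversion
point rather than later, in some unrelated comparison.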

> > If so, what encoding should be specified when converting to Unicode?
>
> Nobody knows, but the convention is to use the locale's encoding, as
> returned by locale.getpreferredencoding().

Thanks for this.
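
For the record, a short sketch of what that convention might look like in
code. The function name decode_with_locale is hypothetical, and whether the
locale's encoding really matches how a given filename was encoded is an
assumption, as Martin's "nobody knows" makes clear.

```python
import locale


def decode_with_locale(raw):
    """Decode a byte-string filename using the locale's preferred encoding.

    Assumes (by convention only) that the bytes were produced in that
    encoding; "strict" makes a wrong guess raise instead of passing silently.
    """
    encoding = locale.getpreferredencoding()
    return raw.decode(encoding, "strict")
```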

Edward
--------------------------------------------------------------------
Edward K. Ream   email:  edreamleo at charter.net
Leo: Literate Editor with Outlines
Leo: http://webpages.charter.net/edreamleo/front.html
--------------------------------------------------------------------





