[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

Victor Stinner victor.stinner at haypocalc.com
Tue Oct 25 10:22:26 CEST 2011


On Tuesday 25 October 2011 13:20:12 you wrote:
> Victor Stinner writes:
>  > I propose to raise Unicode errors if a filename cannot be decoded
>  > on Windows, instead of creating bogus filenames with question
>  > marks.
> 
> By "bogus" you mean "sometimes (?) invalid and the OS will refuse to
> use them, causing a later hard-to-diagnose exception", rather than
> "not what the user thinks he wants", right?

If the ("Unicode") filename cannot be encoded to the ANSI code page, which is 
usually a small charset (e.g. cp1252 has at most 256 code points), Windows 
replaces unencodable characters with question marks.

Imagine that the code page is ASCII: the ("Unicode") filename "hého.txt" will 
be encoded to b"h?ho.txt". You can display this string in a dialog, but you 
cannot open the file to read its content... If you pass the filename to 
os.listdir(), it is even worse, because "?" is interpreted as a wildcard (a 
pattern that matches any single character in a filename).
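
As a rough illustration of the example above, Python's own ASCII codec can 
stand in for the ANSI code page. This is only a sketch: errors="replace" 
imitates the question-mark substitution and is not the Windows conversion 
itself.

```python
name = "hého.txt"

# errors="replace" imitates the lossy behaviour: the unencodable "é"
# is silently turned into "?".
lossy = name.encode("ascii", errors="replace")
print(lossy)  # b'h?ho.txt'

# The default errors="strict" refuses to produce a bogus name and
# reports where the problem is instead.
strict_error = None
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc
    print("cannot encode character at index", exc.start)
```

With the strict handler, the caller learns about the problem at encoding 
time rather than later, when the bogus name fails to open.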

I would like to raise an error in such a situation, because currently the user 
cannot be notified otherwise. The user may search for "?" in the filename, but 
Windows also replaces unencodable characters with a *similar glyph* (e.g. "é" 
is replaced by "e").
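
To show why searching for "?" is not enough, here is a small sketch that 
checks whether an encoded name decodes back to the original. The helper 
survives_round_trip() is a hypothetical name, and the two byte strings are 
hand-written stand-ins for what Windows might produce, not real API output.

```python
def survives_round_trip(name, encoded, encoding):
    """Return True if `encoded` decodes back to exactly `name`."""
    try:
        return encoded.decode(encoding) == name
    except UnicodeDecodeError:
        return False


name = "hého.txt"
question_mark = b"h?ho.txt"  # stand-in for a "?" replacement
best_fit = b"heho.txt"       # stand-in for a similar-glyph replacement

# Looking for "?" only catches the first kind of damage.
print(b"?" in question_mark)  # True
print(b"?" in best_fit)       # False

# Comparing the decoded bytes with the original catches both.
print(survives_round_trip(name, question_mark, "ascii"))  # False
print(survives_round_trip(name, best_fit, "ascii"))       # False
```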

> In the "hard errors" case, a hearty +1 (I'm dealing with this in an
> experimental version of XEmacs and it's a right PITA if the codec
> doesn't complain). 

If you use MultiByteToWideChar and WideCharToMultiByte, you can be notified of 
errors using some flags, but the functions of the ANSI API don't give access to 
these flags...
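
A minimal ctypes sketch of that idea, for Windows only. It assumes the 
WC_NO_BEST_FIT_CHARS flag and the lpUsedDefaultChar output parameter of 
WideCharToMultiByte are the flags in question; encode_ansi_strict() is a 
hypothetical helper name, and this is not the actual codec implementation.

```python
import ctypes
import sys

CP_ACP = 0                     # the current ANSI code page
WC_NO_BEST_FIT_CHARS = 0x0400  # disable similar-glyph substitution


def encode_ansi_strict(text):
    """Encode text to the ANSI code page; raise if a character is lost."""
    if sys.platform != "win32":
        raise OSError("WideCharToMultiByte is only available on Windows")
    if not text:
        return b""

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    convert = kernel32.WideCharToMultiByte
    convert.argtypes = [
        ctypes.c_uint, ctypes.c_ulong,   # CodePage, dwFlags
        ctypes.c_wchar_p, ctypes.c_int,  # lpWideCharStr, cchWideChar
        ctypes.c_char_p, ctypes.c_int,   # lpMultiByteStr, cbMultiByte
        ctypes.c_char_p,                 # lpDefaultChar
        ctypes.POINTER(ctypes.c_int),    # lpUsedDefaultChar (BOOL *)
    ]
    convert.restype = ctypes.c_int

    # cchWideChar counts UTF-16 code units, not Python code points.
    units = len(text.encode("utf-16-le", "surrogatepass")) // 2
    used_default = ctypes.c_int(0)

    # First call: ask for the required buffer size.
    size = convert(CP_ACP, WC_NO_BEST_FIT_CHARS, text, units,
                   None, 0, None, ctypes.byref(used_default))
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    # Second call: do the conversion.
    buffer = ctypes.create_string_buffer(size)
    written = convert(CP_ACP, WC_NO_BEST_FIT_CHARS, text, units,
                      buffer, size, None, ctypes.byref(used_default))
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    # The API sets this flag when it had to substitute the default char.
    if used_default.value:
        raise UnicodeEncodeError("mbcs", text, 0, len(text),
                                 "not representable in the ANSI code page")
    return buffer.raw[:written]


if sys.platform == "win32":
    print(encode_ansi_strict("hello.txt"))
```

The sketch reports the whole string as the error range because the API only 
says that a substitution happened, not where.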

> Backward compatibility is important, but here the
> costs of fixing such bugs outweigh the value of bug-compatibility.

I only want to change how unencodable filenames are handled; the bytes API will 
still be available. If your filesystem has the "8dot3name" feature enabled, it 
may work even for unencodable filenames (Windows generates names like 
HEHO~1.TXT).

Victor

