[ python-Bugs-926427 ] OEM codepage chars in mbcs filenames are misinterpreted

SourceForge.net noreply at sourceforge.net
Tue Mar 30 23:05:00 EST 2004


Bugs item #926427, was opened at 2004-03-30 21:04
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=926427&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Mike Brown (mike_j_brown)
Assigned to: Nobody/Anonymous (nobody)
Summary: OEM codepage chars in mbcs filenames are misinterpreted

Initial Comment:
My system: Windows XP, English - US locale, Python 
2.3.3

I believe the bug I am reporting here is this:

On Windows XP, when os.listdir() is called with a non-
Unicode argument, characters that are not in the 
default locale's encoding but are in the default OEM 
code page get mapped to ASCII characters other than '?'. 
For example, Greek capital letter Sigma (U+03A3) is not 
in windows-1252, but it is in cp437.

For example, things seem to work in a predictable way 
when I put windows-1252 characters into filenames (I 
do this in Explorer and then I see what 
os.listdir(r'C:\path\to\the\dir') returns):

— (U+2014) becomes \x97
• (U+2022) becomes \x95
é (U+00E9) becomes \xe9
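
The three mappings above match what Python's own cp1252 codec produces. The following is a minimal sketch for checking that, assuming a Python 3 interpreter (not the 2.3.3 of this report); it does not call os.listdir or touch the filesystem, and the name `samples` is only illustrative.

```python
# Characters from the report that exist in windows-1252 (cp1252).
samples = ["\u2014", "\u2022", "\u00e9"]  # em dash, bullet, e with acute

# Encode each one with Python's cp1252 codec and keep the raw byte.
encoded = {ch: ch.encode("cp1252") for ch in samples}

for ch, raw in encoded.items():
    # Print code points rather than the characters themselves so the
    # output does not depend on the console encoding.
    print("U+%04X -> %r" % (ord(ch), raw))
```

The bytes agree with the values listed, which suggests the in-range characters are converted with the ANSI code page as expected.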

But things are much less predictable when I use 
characters from outside this range. I thought I'd try 
some Greek characters first. Some of them (the ones 
that happen to be in cp437, interestingly enough) come 
back as random ASCII letters:

Θ (U+0398) becomes "T"
Σ (U+03A3) becomes "S"
Φ (U+03A6) becomes "F"

Greek letters that are not in cp437 come back as 
question marks, as expected (I guess):
Τ (U+03A4) becomes "?"
Υ (U+03A5) becomes "?"

...as do some Hebrew letters and Japanese hiragana:
א (U+05D0) becomes "?"
ה (U+05D4) becomes "?"
ס (U+05E1) becomes "?"
あ (U+3042) becomes "?"
う (U+3046) becomes "?"
た (U+305F) becomes "?"
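
The split between the characters that come back as letters and those that come back as "?" can be checked against Python's bundled codec tables. This is a rough sketch under the same Python 3 assumption; `in_code_page` is a hypothetical helper, not part of any library. Note that Python's codecs only substitute "?" when asked to replace, so they do not reproduce the "T"/"S"/"F" results seen from os.listdir; the sketch only shows which characters exist in cp437.

```python
def in_code_page(ch, codec):
    """Return True if ch can be encoded by the named Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Theta, Sigma, Phi, Tau, Upsilon, Hebrew alef, hiragana a.
samples = ["\u0398", "\u03a3", "\u03a6", "\u03a4", "\u03a5", "\u05d0", "\u3042"]

# Map each character to (is in cp1252, is in cp437).
table = {ch: (in_code_page(ch, "cp1252"), in_code_page(ch, "cp437"))
         for ch in samples}

for ch, (ansi, oem) in table.items():
    # errors="replace" makes the cp1252 codec emit b"?" for unmappable input.
    replaced = ch.encode("cp1252", "replace")
    print("U+%04X  cp1252=%s  cp437=%s  replace=%r"
          % (ord(ch), ansi, oem, replaced))
```

None of the sampled characters is in cp1252, and only Theta, Sigma and Phi are in cp437, which lines up with the three that came back as letters.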

I don't know if this is something that anyone cares 
about, since the filenames are useless anyway, but it 
does seem to be unintended behavior.

(And before you ask, it's just a theoretical exercise; I 
have no urgent need to use os.listdir with non-Unicode 
directory names on Windows.)

----------------------------------------------------------------------
