[ python-Bugs-896236 ] Unicode problem in os.path.getsize ?

SourceForge.net noreply at sourceforge.net
Mon Feb 16 09:55:26 EST 2004


Bugs item #896236, was opened at 2004-02-13 03:49
Message generated for change (Settings changed) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=896236&group_id=5470

Category: Python Library
Group: Python 2.3
>Status: Closed
>Resolution: Wont Fix
Priority: 5
Submitted By: Ronald L. Rivest (ronrivest)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode problem in os.path.getsize ?

Initial Comment:
I am running on Windows XP 5.1 using python version 2.3.
The following simple code fails on my system.

import os

for dirpath, dirnames, filenames in os.walk("C:/"):
    for name in filenames:
        pathname = os.path.join(dirpath, name)
        size = os.path.getsize(pathname)
        print size, pathname

I get an error from getsize that the file given by 
pathname does not exist.  When it breaks, the
variable "name" contains two question marks, which
makes me think that this is a Unicode problem.

In any case, shouldn't names returned by walk be
acceptable in all cases to getsize???
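
One way to pin down which names fail is to catch the error instead of letting the loop die. The following is only a sketch, written in current Python syntax rather than 2.3; the function name walk_sizes and the (sizes, failures) return shape are made up for illustration and are not part of the report.

```python
import os


def walk_sizes(top):
    """Collect file sizes under `top`, recording names that cannot be stat'ed.

    Returns a pair (sizes, failures):
      sizes    -- list of (size_in_bytes, pathname)
      failures -- list of (pathname, error) for names that os.walk returned
                  but os.path.getsize could not resolve
    """
    sizes = []
    failures = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            pathname = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(pathname), pathname))
            except OSError as exc:
                # getsize wraps os.stat(); a name the system cannot find
                # surfaces here as an OSError.
                failures.append((pathname, exc))
    return sizes, failures
```

Calling walk_sizes("C:/") and printing the failures list would show exactly which path names the system refuses, without stopping at the first one.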

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2004-02-16 15:55

Message:
Logged In: YES 
user_id=21627

This behaviour is standard behaviour of Win32, and,
disturbing as it may sound, is somewhat outside Python's
control.

When a file is found whose name cannot be represented in the
system code page (CP_ACP, the "ANSI" code page), then
non-representable characters are converted to question
marks. What's worse: "roughly-representable" characters are
sometimes converted to look-alike characters.

When such a file name is passed back to Win32, the file is not
found, because the name now contains question marks in place of
the original characters.
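
The effect can be imitated in pure Python. This is a rough sketch, not the actual Win32 conversion: it assumes cp1252 as the "ANSI" code page and uses Python's "replace" error handler, which reproduces the question marks but not the look-alike substitutions. The file name and the helper to_ansi_lossy are hypothetical.

```python
def to_ansi_lossy(name, codepage="cp1252"):
    """Encode a Unicode file name into a code page, replacing every
    character the code page cannot represent with '?'."""
    return name.encode(codepage, "replace")


# Hypothetical name containing two characters that cp1252 cannot represent.
original = u"pr\u524d\u594flude.mp3"

# What a byte-string ("ANSI") directory listing would hand back:
# the byte string pr??lude.mp3
listed = to_ansi_lossy(original)

# The name that is later passed back to the system for lookup.
round_trip = listed.decode("cp1252")

# The round-tripped name differs from the real one, so the lookup fails.
names_match = (round_trip == original)
```

The conversion is one-way: nothing in "pr??lude.mp3" says which characters were lost, so no byte-string name can refer to the file.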

With the "ANSI" API, there is really no solution. Instead,
you should use Unicode file names, i.e. write

for dirpath,dirnames,filenames in os.walk(u"C:/"):
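
Spelled out in full, the Unicode variant might look like the sketch below. It uses current Python syntax (the with statement did not exist in 2.3), and the temporary directory, the file name, and the function unicode_walk_sizes are all invented for the demonstration. When os.walk is given a Unicode string, the names it returns are Unicode as well, so they can be passed back to os.path.getsize unchanged.

```python
import os
import tempfile


def unicode_walk_sizes(top):
    """Walk `top` (a Unicode string) and return {pathname: size_in_bytes}."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # dirpath and name are Unicode because `top` is Unicode.
            pathname = os.path.join(dirpath, name)
            result[pathname] = os.path.getsize(pathname)
    return result


# Demonstration on a scratch directory with a hypothetical non-ASCII name.
top = tempfile.mkdtemp()
name = u"pr\u524d\u594flude.mp3"
with open(os.path.join(top, name), "wb") as f:
    f.write(b"abc")

sizes = unicode_walk_sizes(top)
```

Here sizes maps the full Unicode path of the scratch file to its size in bytes, with no question marks substituted along the way.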

Closing as "won't fix".

----------------------------------------------------------------------

Comment By: Ronald L. Rivest (ronrivest)
Date: 2004-02-14 02:46

Message:
Logged In: YES 
user_id=863876

TJREEDY -- Thanks for the reply...

To answer your questions:
   (1) What does Windows show when I visit the directory?
        -- I have several files in this directory that have the
            same problem.  It is a hard, reproducible problem, not a
            transient glitch.  The files are mp3 files that have
            the name "prelude.mp3", except that the first "e" is
            replaced by two question marks (for Python) or by
            two "boxes" in Windows Explorer.  I would guess that
            this is some funky representation of the French "e"
            with an "accent aigu".
   (2) What does "dir" do in a Command Prompt?
        -- From a command prompt, I see two question marks
            at the problematic position.

Does Windows allow one to create filenames with characters
in the filename that are illegal for Windows?  

As I said in the original post, I find it very disturbing that
os.walk should return a filename that os.path.exists says
doesn't exist!  If you can walk the directory and find the
file, then os.path.exists (or, equivalently, os.path.getsize),
should find it!  This looks like a Python bug to me... no?

    Cheers,
    Ron Rivest



----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2004-02-14 01:47

Message:
Logged In: YES 
user_id=593130

Though it might be, I suspect that this is not a Python bug.  
Whether it is a Windows design or coding bug is another 
matter.

>variable "name" contains two question marks, which
>makes me think that this is a Unicode problem.

Since '?' is not legal in filenames, as you seem to know, I 
am more inclined to believe this is the substitute that the Win 
function called by os.listdir and os.walk uses for illegal 
characters in the filename.  So of course getsize, which wraps 
os.stat(), which calls a system function, chokes on it.

Could be disk bit glitch, or bad program writing directly to 
directory block.  Happened to me once - difficult to get rid of.

What does Windows Explorer show when you visit that 
directory?  Ditto for 'dir' in a CommandPrompt window
(Start/Accessories)? 


----------------------------------------------------------------------
