Chinese character error

Mark Tolonen mark.e.tolonen at mailinator.com
Fri Feb 8 21:00:37 EST 2008


"Chris" <cwitts at gmail.com> wrote in message 
news:0d61dcee-5102-4d35-9b74-9538335d33ca at l32g2000hse.googlegroups.com...
> On Feb 8, 11:29 am, John Deas <john.d... at gmail.com> wrote:
>> Hi,
>>
>> I made a small script to recursively copy files from a directory tree
>> to an exportDir only if they have an mp3 extension :
>>
>> a=os.walk(os.getcwd())
>> for root, dirs, files in a:
>>         for currFile in files:
>>                 pathCurrFile=os.path.join(root, currFile)
>>                 if mp3Reg.search(pathCurrFile):
>>                         shutil.copy(pathCurrFile,exportDir)
>>                 else:
>>                         print pathCurrFile
>>
>> The problem is that I get stuck with files containing name in
>> Chinese :
>>
>> Traceback (most recent call last):
>>   File "/cygdrive/c/Documents and Settings/vku/Mes documents/Ma musique/iTunes/iTunes Music/script.py", line 21, in <module>
>>     shutil.copy(pathCurrFile,exportDir)
>>   File "/usr/lib/python2.5/shutil.py", line 80, in copy
>>     copyfile(src, dst)
>>   File "/usr/lib/python2.5/shutil.py", line 46, in copyfile
>>     fsrc = open(src, 'rb')
>> IOError: [Errno 2] No such file or directory: '/cygdrive/c/Documents and Settings/vku/Mes documents/Ma musique/iTunes/iTunes Music/Podcasts/Learn Chinese - ChinesePod/785 Advanced - ????.mp3'
>>
>> I am using python on cygwin, so could this be the source of the error,
>> and is there a way to fix this ?
>
> It has to do with the way the OS reports the filename.  Explorer's GUI
> displays it as square blocks and both CmdPrompt + Cygwin display it as
> question marks, as does os.listdir in Python.  Copying Chinese
> characters and checking their ordinal values directly from Python gave
> me e.g. 230+188+162 for 1 Chinese character, yet Python shows it is
> ordinal 63 (a question mark) after reading the filename.
>
> Those files you will need to manually copy, I even tried
> find /cygdrive/d/Temp/ -name "*.mp3" -exec cp {} /cygdrive/d/Temp/test/ \;
> which yielded
> cp: cannot stat '/cygdrive/d/Temp/??.mp3': No such file or directory

If you call os.walk() with a Unicode string, its return values will be
Unicode as well, and you should be able to process files with non-ASCII
characters.  This worked for me (on Windows):

import os
import shutil
import fnmatch

exportDir = u'c:\\mp3s'

a=os.walk(os.getcwdu()) # Unicode version of os.getcwd()
for root, dirs, files in a:
    for currFile in files:
        pathCurrFile=os.path.join(root, currFile)
        if fnmatch.fnmatch(pathCurrFile,u'*.mp3'):
            shutil.copy(pathCurrFile,exportDir)
        else:
            print pathCurrFile
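
For anyone reading this with Python 3, where str is already Unicode, the
same idea might look roughly like the sketch below.  The function name
copy_mp3s and its two parameters are made up for illustration and are not
part of the script above; it also assumes exportDir lives outside the tree
being walked.

```python
import fnmatch
import os
import shutil


def copy_mp3s(src_root, export_dir):
    """Copy every *.mp3 found under src_root into export_dir.

    Hypothetical helper: returns the list of file names that were copied.
    """
    copied = []
    # Passing a str (Unicode) path makes os.walk yield str names as well,
    # so non-ASCII file names are not replaced by question marks.
    for root, dirs, files in os.walk(src_root):
        for name in files:
            if fnmatch.fnmatch(name, '*.mp3'):
                shutil.copy(os.path.join(root, name), export_dir)
                copied.append(name)
    return copied
```

Calling copy_mp3s(os.getcwd(), 'c:\\mp3s') would then correspond to the
loop above, minus the printing of the skipped files.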

--Mark



