Help with Latin Characters

Terry Reedy tjreedy at udel.edu
Sun Jul 24 14:30:10 EDT 2011


On 7/24/2011 11:15 AM, Joao Jacome wrote:
> http://pastebin.com/aMrzczt4

         list = os.listdir(dir)
While somewhat natural, using 'list' as a local name masks the builtin 
list and is a *very bad* idea. Someday you will do this and then write 
'list(args)' expecting to call the builtin, and it will fail because 
the name now refers to your own list object.
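
To make the failure concrete, here is a minimal sketch. The function 
names and the file names are made up for illustration; they are not 
from your script.

```python
def shadowed():
    # Hypothetical stand-in for the result of os.listdir(dir).
    list = ["a.txt", "b.txt"]
    try:
        # 'list' now names the local list object, which is not callable.
        return list("abc")
    except TypeError as exc:
        return "TypeError: %s" % exc


def renamed():
    # Same data under a different local name; the builtin stays reachable.
    filenames = ["a.txt", "b.txt"]
    return list(filenames[0])


print(shadowed())
print(renamed())
```

Any name other than 'list' (for example 'filenames') avoids the problem.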

> When the script reaches a file with latin characters (ê é ã etc) it crashes.
>
> Traceback (most recent call last):
>    File "C:\backup\ORGANI~1\teste.py", line 37, in <module>
>      Retrieve(rootdir);
>    File "C:\backup\ORGANI~1\teste.py", line 25, in Retrieve
>      Retrieve(os.path.join(dir,filename))
>    File "C:\backup\ORGANI~1\teste.py", line 18, in Retrieve
>      print l
>    File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
>      return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character u'\x8a' in
> position 43: character maps to <undefined>

'\x8a' *is* the cp850 encoded byte for e with a grave accent: è
But your program treats it as a unicode value, where it is a control 
char (Line Tabulation Set), and tries to encode it to cp850, which is 
not possible.
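
The difference between the byte and the code point can be checked 
directly. This is only a sketch; 'try_encode' is a helper name I made 
up for the illustration.

```python
import unicodedata

raw = b"\x8a"                # the byte as it appears in a cp850 file name
text = raw.decode("cp850")   # decoding yields U+00E8, e with grave accent
wrong = u"\x8a"              # the same number taken as a code point: U+008A


def try_encode(s):
    """Return s encoded to cp850, or None if cp850 cannot represent it."""
    try:
        return s.encode("cp850")
    except UnicodeEncodeError:
        return None


print(text == u"\xe8")              # True
print(unicodedata.category(wrong))  # Cc, i.e. a control character
print(try_encode(text))             # round-trips to the original byte
print(try_encode(wrong))            # None: U+008A has no cp850 mapping
```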

I suspect this has something to do with defining the rootdir as a 
unicode string: rootdir = u"D:\\ghostone"
Perhaps if you removed the 'u', your program would work.
Or perhaps you should explicitly decode the values in os.listdir(dir) 
before joining them to the rootdir and re-encoding.
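
A rough sketch of the explicit-decoding idea follows. The helper names 
'to_text' and 'join_text' are hypothetical, and passing "cp850" is an 
assumption: the right codec depends on where the bytes actually came 
from, so treat this as a starting point rather than a fix.

```python
import os


def to_text(name, encoding):
    """Decode a byte-string file name; pass text through unchanged."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


def join_text(root, name, encoding):
    """Join root and name once both are known to be text."""
    return os.path.join(to_text(root, encoding), to_text(name, encoding))


# Sample byte name standing in for one entry returned by os.listdir().
sample = b"voc\x88.txt"      # 0x88 is e with circumflex in cp850
decoded = to_text(sample, "cp850")
path = join_text(u"D:\\ghostone", sample, "cp850")
print(decoded == u"voc\xea.txt")
```

Decoding once, at the point where the names enter the program, keeps 
every later join and print working on text only.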

This sort of thing sometimes works better with Python 3.

> Does someone knows how to fix this?
>
> Thank you!
>
> João Victor Sousa Jácome
>


-- 
Terry Jan Reedy




