python 3.1 unicode question

Mark Tolonen metolone+gmane at gmail.com
Wed Sep 16 00:25:23 EDT 2009


"jeffunit" <jeff at jeffunit.com> wrote in message 
news:20090915144123964.LJKA6569 at cdptpa-omta01.mail.rr.com...
> I wrote a program that diffs files and prints out matching file names.
> I will be executing the output with sh, to delete select files.
>
> Most of the file names are plain ASCII, but about 10% of them have Unicode
> characters in them. When I try to print the string containing the name, I
> get an exception:
>
> 'ascii' codec can't encode character '\udce9'
> in position 37: ordinal not in range(128)
>
> The string is:
>
> './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'
>
> This is on a Windows XP system, using Python 3.1, which I compiled
> with the Cygwin Linux compatibility layer.
>
> Can you tell me what encoding I need to print \udce9, and how to set
> Python to that encoding mode?

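That looks like a "surrogate escape" (see PEP 383, 
http://www.python.org/dev/peps/pep-0383/).  It indicates the wrong encoding 
was used to decode the filename: the name contained the byte 0xE9, which the 
codec in use (ASCII, judging by the error message) could not decode, so 
Python mapped it to the lone surrogate U+DCE9 instead of raising an error.  
A lone surrogate cannot be encoded for output, hence the exception when you 
print.  0xE9 is "é" in Latin-1 and cp1252, which fits "Qué", so the name was 
probably stored in one of those encodings.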
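
To make the surrogate-escape round trip concrete, here is a minimal sketch. 
It assumes the name was decoded as ASCII and that the bytes on disk are 
really Latin-1; both are guesses based on the error message and the "Qu?" in 
the name, and repair_name is a hypothetical helper, not anything from the 
standard library.

```python
def repair_name(name, decoded_as="ascii", real_encoding="latin-1"):
    """Undo a surrogate-escaped decode and redo it with another codec.

    Hypothetical helper: both encoding arguments are assumptions and
    must match how the name was actually produced.
    """
    # Encoding with the same codec and 'surrogateescape' turns each lone
    # surrogate U+DC80..U+DCFF back into the original byte 0x80..0xFF.
    raw = name.encode(decoded_as, errors="surrogateescape")
    # Decode the recovered bytes with the encoding we believe is correct.
    return raw.decode(real_encoding)


# Simulate the situation: Latin-1 bytes decoded as ASCII with
# surrogateescape, the PEP 383 error handler.
raw_bytes = b"./Qu\xe9_no_se_rompa_la_noche.mp3"
escaped = raw_bytes.decode("ascii", errors="surrogateescape")

repaired = repair_name(escaped)

# ascii() is used so the demo prints even on an ASCII-only terminal.
print(ascii(escaped))
print(ascii(repaired))
```

Since the output will be fed to sh to delete files, it may be safer to write 
the recovered bytes unchanged (the value of raw inside the helper) rather 
than a re-decoded string, so the names match what is on disk byte for byte.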

-Mark




