UnicodeDecodeError: problem when path contains folder starting with character 'u'

Mark Tolonen metolone+gmane at gmail.com
Tue Jun 23 10:27:41 EDT 2009


"aberry" <aberry at aol.in> wrote in message 
news:24164207.post at talk.nabble.com...
> Mark Tolonen-3 wrote:
>> "aberry" <aberry at aol.in> wrote in message
>> news:24146775.post at talk.nabble.com...
>>>
>>> I am getting an error when Unicode-decoding a path if it contains a
>>> folder/file name starting with the character 'u'.
>>>
>>> Here is what I did in IDLE
>>> >>> fp = "C:\\ab\\anil"
>>> >>> unicode(fp, "unicode_escape")
>>> u'C:\x07b\x07nil'
>>> >>> fp = "C:\\ab\\unil"
>>> >>> unicode(fp, "unicode_escape")
>>>
>>> Traceback (most recent call last):
>>>   File "<pyshell#41>", line 1, in <module>
>>>     unicode(fp, "unicode_escape")
>>> UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 5-9: end of string in escape sequence
>>>
>>> Not sure whether I am doing something wrong or this is the designed
>>> behavior.
>>> Any help appreciated.
>>
>> What is your intent?  The code below gives unicode strings with
>> backslashes.  No need for unicode_escape here.
>>
>> >>> fp = "C:\\ab\\unil"
>> >>> fp
>> 'C:\\ab\\unil'
>> >>> print fp
>> C:\ab\unil
>> >>> unicode(fp)
>> u'C:\\ab\\unil'
>> >>> print unicode(fp)
>> C:\ab\unil
>> >>> u'C:\\ab\\unil'
>> u'C:\\ab\\unil'
>> >>> print u'C:\\ab\\unil'
>> C:\ab\unil
>
> Thanks all for the help...
> Actually this was in old code that used 'unicode_escape'.
> I assume it was there to handle paths which may contain localized chars...
>
> But after removing 'unicode_escape' it worked fine... :)

If that was the case, then here are a few other options:

>>> print 'c:\\\\abc\\\\unil\\xe4'.decode('unicode_escape')
c:\abc\unilä
>>> print r'c:\\abc\\unil\xe4'.decode('unicode_escape')
c:\abc\unilä
>>> print u'c:\\abc\u005cunil\u00e4'
c:\abc\unilä
>>> print ur'c:\abc\u005cunil\u00e4'
c:\abc\unilä
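
For anyone reading this on Python 3, where unicode() and ur'' literals are gone, here is a rough sketch of the same behaviour using bytes.decode. The helper name decode_escaped is made up for illustration. It shows why a folder starting with 'a' silently decodes to the wrong thing, why one starting with 'u' raises, and how doubled backslashes in the data avoid both:

```python
# Python 3 sketch (an assumption; the session above is Python 2).
# In Python 3 the unicode_escape codec is applied to bytes, not str.

def decode_escaped(raw: bytes) -> str:
    """Hypothetical helper: decode backslash escapes found in the data."""
    return raw.decode("unicode_escape")

# The data is C:\ab\anil. "\a" is the BEL escape, so decoding succeeds
# but the result is no longer the intended path.
mangled = decode_escaped(b"C:\\ab\\anil")

# The data is C:\ab\unil. "\u" must be followed by four hex digits,
# and "nil" is not hex, so the codec raises UnicodeDecodeError.
try:
    decode_escaped(b"C:\\ab\\unil")
    failed = False
except UnicodeDecodeError:
    failed = True

# With doubled backslashes in the data, each pair decodes to a single
# backslash, and \xe4 still decodes to a-umlaut.
ok = decode_escaped(rb"c:\\abc\\unil\xe4")

print(repr(mangled))
print(failed)
print(ok)
```

The first result matches the u'C:\x07b\x07nil' seen in the original report, which is a hint that unicode_escape was never safe for paths with single backslashes, even when it did not raise.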

You can also use forward slashes as another poster mentioned.  If you want 
to display the filenames with backslashes, os.path.normpath can be used (on 
Windows it converts forward slashes to backslashes):

>>> import os
>>> print os.path.normpath('c:/abc/unil\u00e4'.decode('unicode_escape'))
c:\abc\unilä
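
A small Python 3 sketch of the forward-slash approach, in case it helps. os.path.normpath only produces backslashes when running on Windows, so this uses ntpath (the Windows flavour of os.path, importable on any platform) to make the result reproducible elsewhere. That substitution is my assumption, not something from the thread:

```python
# Python 3 sketch; ntpath is used instead of os.path so the output
# is the same on every platform (on Windows, os.path is ntpath).
import ntpath

# In a Python 3 str literal, \u00e4 is a-umlaut, so no decode step is needed.
path = ntpath.normpath("c:/abc/unil\u00e4")

print(path)
```

Forward slashes sidestep the escaping problem entirely because there are no backslashes left in the hard-coded literal.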

Note you only have to jump through these hoops to generate hard-coded 
filenames with special characters.  If they are already on disk, just read 
them in with something like os.listdir(u'.'), which generates a list of 
unicode filenames.
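
As a rough Python 3 illustration of reading names from disk instead of hard-coding them: os.listdir returns str (unicode) names when given a str argument. The temporary directory and the file name below are made up for the example, and it assumes a filesystem that accepts non-ASCII names:

```python
# Python 3 sketch: names read back from disk are already text (str).
import os
import tempfile

def list_names(directory: str) -> list:
    """Hypothetical helper: return the directory's entries, sorted."""
    return sorted(os.listdir(directory))

with tempfile.TemporaryDirectory() as tmp:
    # Create one empty file whose name contains a non-ASCII character.
    open(os.path.join(tmp, "unil\u00e4.txt"), "w").close()
    names = list_names(tmp)

print(names)
```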

-Mark




