urllib unquote providing string mismatch between string found using os.walk (Python3)

Richard Damon Richard at Damon-Family.org
Sat Dec 21 21:36:16 EST 2019


On 12/21/19 8:25 PM, MRAB wrote:
> On 2019-12-22 00:22, Michael Torrie wrote:
>> On 12/21/19 2:46 PM, Ben Hearn wrote:
>>> These 2 paths look identical, one from the drive & the other from an
>>> xml url:
>>> a = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf -
>>> ¡Móchate! _PromoMix_.wav'
>>                                                                 ^^
>>> b = '/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - ¡Móchate!
>>> _PromoMix_.wav'
>>                                                                 ^^
>> They are actually different strings.  The name is spelled
>> differently between the two.  Móchate vs Móchate (the former seems to
>> be the correct spelling according to my inline spell checker).  Is this
>> from your own program? I wonder how it got switched?
>>
> Use the 'ascii' function to see what's different:
>
> >>> ascii(a)
> "'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf -
> \\xa1Mo\\u0301chate! _PromoMix_.wav'"
> >>> ascii(b)
> "'/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf -
> \\xa1M\\xf3chate! _PromoMix_.wav'"
> >>>

It is a Unicode normalization issue. A number of characters can be
'spelled' in more than one way.

ó can be either the single codepoint U+00F3, or the pair of codepoints
'o' (U+006F) followed by U+0301 (the combining acute accent).
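A minimal sketch using only the standard-library unicodedata module. It builds the two spellings of ó described above, shows that they compare unequal, and shows that normalizing one to the other's form makes them match. The variable names are illustrative only.

```python
import unicodedata

# U+00F3 LATIN SMALL LETTER O WITH ACUTE: one precomposed codepoint
composed = "\u00f3"
# 'o' (U+006F) followed by U+0301 COMBINING ACUTE ACCENT: two codepoints
decomposed = "o\u0301"

print(composed == decomposed)          # False: different codepoint sequences
print(len(composed), len(decomposed))  # 1 2

# List the codepoints that make up the decomposed spelling
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFD splits the precomposed character; NFC recombines the pair
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The loop prints `U+006F LATIN SMALL LETTER O` and `U+0301 COMBINING ACUTE ACCENT`, which correspond to the `o\u0301` visible in the ascii() output of string a.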

If you want the strings to compare equal, you need to make sure that
both have been normalized to the same form. I believe that Mac OS always
converts file names into the NFD format when it uses them (that is the
form the first string, a, is in; b uses the composed form, NFC).
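A minimal sketch of normalizing both sides before comparing, assuming the names from os.walk and the names decoded with urllib.parse.unquote only differ in normalization. The helpers `norm` and `find_on_disk` and the percent-encoded string are hypothetical, made up for illustration. The strings a and b are rebuilt from the ascii() output quoted above, assuming the line break after "J.Staaf -" was only email wrapping.

```python
import os
import unicodedata
from urllib.parse import unquote


def norm(s, form="NFC"):
    """Hypothetical helper: normalize s so that visually identical names compare equal."""
    return unicodedata.normalize(form, s)


def find_on_disk(root, wanted_name):
    """Hypothetical helper: return the real path under root whose file name
    matches wanted_name after normalization, or None if there is no match."""
    target = norm(wanted_name)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if norm(name) == target:
                # Return the on-disk spelling so later open() calls use it
                return os.path.join(dirpath, name)
    return None


# Rebuilt from the ascii() output: a is decomposed (NFD), b is composed (NFC)
a = "/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - \xa1Mo\u0301chate! _PromoMix_.wav"
b = "/Users/macbookpro/Music/tracks_new/_NS_2018/J.Staaf - \xa1M\xf3chate! _PromoMix_.wav"

print(a == b)                            # False
print(norm(a) == norm(b))                # True
print(norm(a, "NFD") == norm(b, "NFD"))  # True

# Hypothetical percent-encoded name as it might appear in an XML URL;
# unquote() decodes it as UTF-8, which yields the composed U+00F3
encoded = "J.Staaf%20-%20%C2%A1M%C3%B3chate%21%20_PromoMix_.wav"
print(unquote(encoded) == os.path.basename(b))  # True
```

Normalizing both sides to the same form (NFC here) works whichever form the file system happens to hand back, so the comparison does not depend on the belief about Mac OS above being exactly right.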

-- 
Richard Damon



More information about the Python-list mailing list