[Python-Dev] Security implications of pep 383
Victor Stinner
victor.stinner at haypocalc.com
Tue Mar 29 23:02:47 CEST 2011
On Tuesday, 29 March 2011 at 22:45 +0200, Lennart Regebro wrote:
> On Tue, Mar 29, 2011 at 22:40, Lennart Regebro <regebro at gmail.com> wrote:
> > The lesson here seems to be "if you have to use blacklists, and you
> > use unicode strings for those blacklists, also make sure the string
> > you compare with doesn't have surrogates".
> >
>
> For that matter, what happens with combining characters?
>
> '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}' !=
> '\N{LATIN SMALL LETTER O WITH DIAERESIS}'
>
> I guess the filesystem shouldn't treat these as the same (even though
> they are), but what if some webservice does?
Mac OS X does normalize filenames to a variant of the D (decomposed)
form.
http://www.haypocalc.com/tmp/unicode-2011-03-25/html/operating_systems.html#mac-os-x
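As a rough illustration of the difference between the two spellings, here is a small sketch. It uses the standard NFD and NFC forms from unicodedata, which only approximate the variant that Mac OS X applies to filenames; the variable names are made up for the example.

```python
import unicodedata

composed = '\N{LATIN SMALL LETTER O WITH DIAERESIS}'            # U+00F6
decomposed = '\N{LATIN SMALL LETTER O}\N{COMBINING DIAERESIS}'  # U+006F U+0308

# The two spellings differ as code point sequences...
differ_raw = composed != decomposed

# ...but compare equal once both are in the same normalization form.
nfd_equal = unicodedata.normalize('NFD', composed) == decomposed
nfc_equal = unicodedata.normalize('NFC', decomposed) == composed
```

A filesystem that decomposes names would therefore hand back the second spelling even if the file was created with the first one.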
> I suspect you should normalize both strings before comparing them in any blacklist,
Yes, but a blacklist is not safe: use a whitelist.
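A minimal sketch of what such a whitelist check could look like, assuming NFC is chosen as the common form. ALLOWED_NAMES and is_allowed() are hypothetical names used only for this example, not an existing API.

```python
import unicodedata

# Hypothetical whitelist; every entry is stored in NFC.
ALLOWED_NAMES = {
    unicodedata.normalize('NFC', name)
    for name in ('readme.txt', 'f\xf6\xf6.txt')
}

def is_allowed(name):
    """Return True only if the NFC form of name is in the whitelist."""
    return unicodedata.normalize('NFC', name) in ALLOWED_NAMES
```

With this approach a decomposed spelling such as 'fo\u0308o\u0308.txt' is accepted, while anything not explicitly listed, including a name that carries a lone surrogate, is rejected by default.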
> and what happens with surrogates when you normalize?
Surrogates are left unchanged by all four normalization forms (C, D, KC and KD):
>>> import unicodedata
>>> (unicodedata.normalize('NFC', '\uDC80') ==
...  unicodedata.normalize('NFD', '\uDC80') ==
...  unicodedata.normalize('NFKC', '\uDC80') ==
...  unicodedata.normalize('NFKD', '\uDC80') == '\uDC80')
True
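Since normalization does not remove surrogates, they have to be detected separately. The following is only a sketch: has_surrogates() is a hypothetical helper, and the byte string is an invented filename containing one undecodable byte, decoded with the surrogateescape error handler from PEP 383.

```python
import unicodedata

# With PEP 383, the undecodable byte 0x80 is mapped to the lone surrogate U+DC80.
name = b'caf\x80'.decode('utf-8', 'surrogateescape')

def has_surrogates(text):
    """Return True if text contains any code point in U+D800..U+DFFF."""
    return any('\ud800' <= ch <= '\udfff' for ch in text)

# Normalization leaves the surrogate in place in every form.
unchanged = all(
    unicodedata.normalize(form, name) == name
    for form in ('NFC', 'NFD', 'NFKC', 'NFKD')
)

# Encoding with the same error handler restores the original bytes.
roundtrip = name.encode('utf-8', 'surrogateescape')
```

A check like has_surrogates() could be run on a name before it is compared against any list of unicode strings.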
Victor