[Python-Dev] Security implications of pep 383

"Martin v. Löwis" martin at v.loewis.de
Tue Mar 29 20:56:41 CEST 2011

> Not sure how real the security risk is here:
>     http://blog.omega-prime.co.uk/?p=107
> Basically, he is saying that if you store a list of blacklisted file
> names encoded in Big5 (or some other encoding that is not compatible
> with UTF-8), and those names are then passed on the command line, or
> otherwise read in and decoded from an assumed-UTF-8 source with
> surrogate escaping, the surrogate-escaped names will not match the
> properly decoded blacklisted names.

As described, I find the problem a little artificial: presumably, he
was passing the file name on the command line. However, since his
terminal was in UTF-8 and the file name in Big5, the console would not
have displayed the file name in a meaningful way when he ran the
program. So whoever ran the program ignored the mojibake and didn't
wonder whether it could affect the proper functioning of the program.
In addition, if he had run ls(1) on the directory, it would have
displayed question marks throughout. This should have alerted the user
that something was wrong.

Notice that this isn't really PEP 383's fault. If the file system
encoding were UTF-8, the blacklist were UTF-8, and the program ran in
a Latin-1 locale, it would have decoded the file name without error
(and without surrogates), but the blacklist check would still have
failed.
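
A minimal sketch of that Latin-1 case, for illustration only. The file
name is a hypothetical example, and the explicit decode() call stands in
for what the interpreter would do with a command-line argument in a
Latin-1 locale:

```python
# Hypothetical file name ("Löwis.txt"), stored on disk as UTF-8 bytes.
raw_name = "L\u00f6wis.txt".encode("utf-8")

# The blacklist was read correctly as UTF-8.
blacklist = {"L\u00f6wis.txt"}

# In a Latin-1 locale every byte maps to some character, so decoding
# never fails and the surrogateescape handler is never triggered.
decoded = raw_name.decode("latin-1", errors="surrogateescape")

# Lone surrogates produced by surrogateescape lie in U+DC80..U+DCFF.
has_surrogates = any(0xDC80 <= ord(ch) <= 0xDCFF for ch in decoded)

# The decoded name is mojibake, so the lookup misses anyway.
blocked = decoded in blacklist

print(has_surrogates, blocked)
```

The lookup fails even though no surrogate is involved, so the mismatch
comes from the differing encodings rather than from the error handler.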

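He should have opened the blacklist file in the locale's encoding (i.e.,
without specifying an encoding), using the surrogateescape error
handler, so that the blacklist entries are decoded the same way as the
file names.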
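
A rough sketch of the difference, under some assumptions: the file name
and the blacklist.txt path are made up, and ASSUMED_ENCODING is an
explicit stand-in for the locale's encoding so that the example behaves
the same everywhere. In a real program one would leave out the encoding
argument and pass only errors="surrogateescape":

```python
import os
import tempfile

# Stand-in for the locale's encoding (assumption made for this sketch).
ASSUMED_ENCODING = "utf-8"

# Hypothetical Big5-encoded file name.
name_bytes = "\u4e2d\u6587.txt".encode("big5")

# Roughly what the program sees for a command-line argument under
# PEP 383: undecodable bytes become lone surrogates.
argv_name = name_bytes.decode(ASSUMED_ENCODING, errors="surrogateescape")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blacklist.txt")
    with open(path, "wb") as f:
        f.write(name_bytes + b"\n")

    # Reading the blacklist as Big5 yields properly decoded names.
    with open(path, encoding="big5") as f:
        blacklist_big5 = {line.rstrip("\n") for line in f}

    # Reading it with the same encoding and error handler as the
    # argument yields the same surrogate-escaped string.
    with open(path, encoding=ASSUMED_ENCODING,
              errors="surrogateescape") as f:
        blacklist_escaped = {line.rstrip("\n") for line in f}

matches_big5 = argv_name in blacklist_big5
matches_escaped = argv_name in blacklist_escaped

print(matches_big5, matches_escaped)
```

Only the second lookup matches, because both strings were produced from
the same bytes by the same decoding step.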

