[Python-3000] String formatting operations in python 3k

Ian Bicking ianb at colorstudy.com
Mon Apr 3 22:23:55 CEST 2006


Barry Warsaw wrote:
>>Even what Mailman 
>>does is potentially slightly unsafe if they were to accept input to _() 
>>from untrusted sources, though exploiting str() is rather hard, and 
>>Mailman presumably has at least a moderate amount of trust for translators.
> 
> 
> Right, the attack vector would be through a broken translation (either
> maliciously or inadvertently) accessing a local unescaped string causing
> an XSS exploit.

I hadn't even thought of that one; XSS opens up a whole new batch of 
security errors related to string substitution.  Ideally in this case, 
then, you'd actually do HTML escaping on the extracted locals before 
string substitution.  You could do this in _(), but you'd have to pass 
something in to indicate if you were creating HTML/XML or plain text.


>>It's not actually unreasonable that translation strings could contain 
>>expressions, though it's unlikely that Python expressions are really 
>>called for.  Like with pluralization: "Displaying $count ${'user' if 
>>count==1 else 'users'}" is reasonable, though a more constrained syntax 
>>would probably be more usable for the translators.  It seems there's a 
>>continuum of use cases.
> 
> 
> Except with some languages' plural forms (e.g. Polish IIUC) simple
> expressions like that won't cut it.  

"Simple", sure, but with the full power of Python expressions you can 
manage any pluralization, even if the string degrades into one big chunk 
of code squeezed into an expression.  Though a DSL would be more 
appropriate for these rules than Python syntax.

> OTOH, gettext has facilities for
> supporting all those bizarre plural forms so I don't think we have to
> reinvent them in Python (though we may need to do more to support them).

It's not magic; it's just code, whether that code lives in gettext or 
directly in the translation strings.  E.g., "%{user}s es %{'bonita' if 
user.gender == 'f' else 'guapo'}".  You can't tell me gettext also has 
support for gender-appropriate adjectives!
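
For comparison, here is a minimal sketch of the gettext side of that argument, using the standard library's ngettext.  The describe() helper is a made-up name, and NullTranslations stands in for a real catalog, so only the English singular/plural fallback is exercised; a real catalog would supply its own plural rule.

```python
import gettext

# NullTranslations has no catalog behind it, so ngettext falls back to
# the English rule: first form when n == 1, second form otherwise.
translations = gettext.NullTranslations()

def describe(count):
    # Hypothetical helper: pick the noun form, then do plain % substitution
    noun = translations.ngettext("user", "users", count)
    return "Displaying %d %s" % (count, noun)

one = describe(1)
many = describe(5)
```

The plural selection happens in code outside the translated string, which is the trade-off under discussion: the translator never writes an expression, but anything gettext has no rule for (like gendered adjectives) has nowhere to go.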

This is all wandering off-topic, except that all these cases make me 
think that different kinds of wrapping are very useful.  For instance, 
if you want to make sure everything is quoted before being inserted:

import html

class EscapingWrapper:
    def __init__(self, d):
        self.d = d
    def __getitem__(self, item):
        # html.escape replaces cgi.escape, which newer Pythons no longer ship
        return html.escape(str(self.d[item]), quote=True)

Or if you want expressions:

class EvalingWrapper:
    def __init__(self, d):
        self.d = d
    def __getitem__(self, item):
        # Evaluate the key as a Python expression against the wrapped dict
        return eval(item, self.d)

Then you do:

string.Template(pattern).substitute(EscapingWrapper(EvalingWrapper(locals())))
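
A self-contained sketch of that composition, under a few assumptions: html.escape stands in for cgi.escape, render() and ExprTemplate are made-up names, and braceidpattern needs a reasonably recent Python.  The subclass is there because Template's default pattern only accepts plain identifiers inside ${...}, so expressions would otherwise be rejected.

```python
import html
from string import Template

class EscapingWrapper:
    """HTML-escape each value looked up in the wrapped mapping."""
    def __init__(self, d):
        self.d = d
    def __getitem__(self, item):
        return html.escape(str(self.d[item]), quote=True)

class EvalingWrapper:
    """Treat each key as a Python expression evaluated against a dict."""
    def __init__(self, d):
        self.d = d
    def __getitem__(self, item):
        # eval on untrusted patterns is unsafe; this is illustration only
        return eval(item, self.d)

class ExprTemplate(Template):
    # Hypothetical subclass: let ${...} hold anything up to the closing
    # brace, since the default pattern only accepts plain identifiers.
    braceidpattern = r"[^}]+"

def render(pattern, namespace):
    # The inner wrapper evaluates the key, the outer one escapes the result
    wrapped = EscapingWrapper(EvalingWrapper(namespace))
    return ExprTemplate(pattern).substitute(wrapped)

greeting = render("Hello, $name!", {"name": "<b>Bob</b>"})
summary = render("Displaying $count ${'user' if count == 1 else 'users'}",
                 {"count": 3})
```

Here greeting comes out as "Hello, &lt;b&gt;Bob&lt;/b&gt;!" and summary as "Displaying 3 users".  Note the order of wrapping: escaping is applied to the result of the evaluation, not to the expression text.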

You'd probably wrap that in a function of some sort, of course, because 
it's no longer something you just whip out on a whim.  In this case 
Template.substitute works nicely, but str.format would not work well if 
it required **kw for named arguments (since these wrappers can't be 
turned into actual dictionaries).
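
A small sketch of why **kw is the sticking point, assuming the wrapper only defines __getitem__ as above.  Unpacking with ** needs keys() to enumerate, which a lazy wrapper cannot provide.  The str.format_map method, which was added to Python well after this message, takes the mapping directly and is shown only for contrast.

```python
import html

class EscapingWrapper:
    """HTML-escape each value looked up in the wrapped mapping."""
    def __init__(self, d):
        self.d = d
    def __getitem__(self, item):
        return html.escape(str(self.d[item]), quote=True)

values = EscapingWrapper({"name": "<Ian>"})

# ** unpacking has to build a real dict, so it fails on this wrapper
try:
    "Hello {name}".format(**values)
    unpack_error = None
except TypeError as exc:
    unpack_error = exc

# format_map performs lookups through __getitem__ without copying
via_format_map = "Hello {name}".format_map(values)
```

The first call raises TypeError because the wrapper is not a full mapping; the second yields "Hello &lt;Ian&gt;".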

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org

