[Python-ideas] Fix the DRY problem (was Re: PEP 501 - i18n with marked strings)

Sat Aug 15 14:27:54 CEST 2015

On 08/13/2015 07:58 AM, Eric V. Smith wrote:
> On 08/13/2015 12:37 AM, Guido van Rossum wrote:
>> On Wed, Aug 12, 2015 at 6:06 PM, Barry Warsaw <barry at python.org>
>> wrote:
> 
>>
>>         placeholders = source_string.extract_placeholders()
>>         substitutions = scope(*placeholders)
>>         translated_string = i18n.lookup(source_string)
>>         return translated_string.safe_substitute(substitutions)
>>
>>     That would actually be quite useful.
>>
>>
>> Agreed. But whereas you are quite happy having only simple variable
>> names in i18n templates, the feature required for the non-i18n use case
>> really needs arbitrary expressions. If we marry the two, your i18n code
>> will just have to yell at the programmer if they use something too
>> complex for the translators as a substitution. So possibly PEP 501 can
>> be rescued. But I think we need separate prefixes for the PEP 498 and
>> PEP 501 use cases; perhaps f'{...}' and _'{...}'. (But it would not be
>> up to the compiler to limit the substitution syntax in _'{...}')
> 
> For the sake of the following argument, let's agree to disagree on:
> - arbitrary expressions: we'll say yes
> - string prefix character: we'll say 'f'
> - how to identify expressions in a string: we'll say {...}
> 
> I promise we can bikeshed about these later. I'm just using the PEP 498
> version because I'm more familiar with it.
> 
> And let's say that PEP 498 will take this:
> 
> name = 'Eric'
> dog_name = 'Fluffy'
> f"My name is {name}, my dog's name is {dog_name}"
> 
> And convert it to this (inspired by Victor):
> 
> "My name is {0}, my dog's name is {1}".format('Eric', 'Fluffy')
> Resulting in:
> "My name is Eric, my dog's name is Fluffy"
> 
> It seems to me that all you need for i18n is to instead make it produce:
> 
> __i18n__("My name is {0}, my dog's name is {1}").format('Eric', 'Fluffy')
> 
> The __i18n__ function would do whatever lookup is needed to produce the
> translated string. So, in some English dialect where pet names had to
> come first, it could return:
> 'The owner of the dog {1} is named {0}'
> 
> So the result would be:
> 'The owner of the dog Fluffy is named Eric'
> 
> I promise we can bikeshed about the name __i18n__.
> 
> So the translator has no say in how the expressions are evaluated. This
> removes any concern about information leakage. If the source code said:
> f"My name is {name}, my dog's name is {dog_name.upper()}"
> 
> then the string being passed to __i18n__ would remain unchanged. If by
> convention you wanted to not use arbitrary expressions and just use
> identifiers, then just make it a coding standard thing. It doesn't
> affect the implementation one way or the other.
> 
> The default implementation for my proposed __i18n__ function (probably a
> builtin) would be just to return its string argument. Then you get the
> PEP 498 behavior. But in your module, you could say:
> __i18n__ = gettext.gettext
> and now you'd be using that machinery.
> 
> The one downside of this is that the strings that the translator is
> translating from do not appear in the source code. The translator would
> have to know that the string being translated is:
> "My name is {0}, my dog's name is {1}"

Okay, here's a new proposal that handles Barry's concern about the
format strings passed to __i18n__ not having the same contents as the
source code.

Instead of translating:
name = 'Eric'
dog_name = 'Fluffy'
f"My name is {name}, my dog's name is {dog_name}"

to:
__i18n__("My name is {0}, my dog's name is {1}").format('Eric', 'Fluffy')

We instead translate it to:
__i18n__("My name is {name}, my dog's name is {dog_name}").format_map(
    {'name': 'Eric', 'dog_name': 'Fluffy'})

The string would be unchanged from the text of the f-string. The keys in
the dict would be exactly the expressions inside the braces in the
f-string. The values in the dict would be the values of those
expressions.

This solution works whether the expressions inside the braces are simple
identifiers or more complicated expressions. For i18n work, I'd expect
them all to be simple identifiers, but that need not be the case. I
consider this a code review item.
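
If a project wanted help with that review, a small check could flag
fields that are not plain identifiers. This is only a sketch: the
function name is made up, and it leans on string.Formatter.parse, which
follows str.format syntax rather than whatever f-strings end up using:

```python
import string

def non_identifier_fields(template):
    """Return the replacement fields that are not plain identifiers."""
    fields = [
        field
        for _, field, _, _ in string.Formatter().parse(template)
        if field is not None  # None means a trailing literal, no field
    ]
    return [field for field in fields if not field.isidentifier()]

simple = "My name is {name}, my dog's name is {dog_name}"
complicated = "My name is {name}, my dog's name is {dog_name.upper()}"

simple_problems = non_identifier_fields(simple)
complicated_problems = non_identifier_fields(complicated)
```

An empty list means the string uses only simple names; anything else is
something to raise in review.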

We could add something like PEP 501's iu-strings, which would be
interpolated but not translated, so we could mix translated and
non-translated strings in the same module. Probably not spelled
fu-strings, though!

We'd probably want to add a str.safe_format_map to match the behavior of
string.Template.safe_substitute, or add a parameter to str.format_map.
I'm not sure how this parameter would get set from an f-string, or if it
would always default to "safe" for the __i18n__ case.
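
str.safe_format_map does not exist today, so the following is only an
approximation of what I have in mind, built on a dict subclass whose
__missing__ puts the placeholder back. The helper names are made up, and
it ignores format specs and conversions on missing fields:

```python
import string

class _KeepMissing(dict):
    """Mapping that re-emits '{key}' for any key it does not have."""
    def __missing__(self, key):
        return '{' + key + '}'

def safe_format_map(template, mapping):
    # format_map uses the mapping as-is, so __missing__ is honored.
    return template.format_map(_KeepMissing(mapping))

partial = safe_format_map("{name} owns {dog_name}", {'name': 'Eric'})

# The existing behavior this is meant to mirror:
reference = string.Template("$name owns $dog_name").safe_substitute(
    name='Eric')
```

Both leave the unknown placeholder in the output instead of raising
KeyError.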

Maybe instead of __i18n__ just doing the string lookup, it would also be
responsible for calling .format_map or .safe_format_map, so it could
choose the behavior it wanted on a per-module basis.
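
A sketch of that variant, with the hook taking both the string and the
values. The two-argument signature, make_i18n, and the catalog are all
assumptions for illustration, not part of any proposal text:

```python
class _KeepMissing(dict):
    """Mapping that re-emits '{key}' for any key it does not have."""
    def __missing__(self, key):
        return '{' + key + '}'

def make_i18n(lookup, safe):
    """Build a hook that looks up the string and then formats it."""
    def i18n(template, values):
        translated = lookup(template)
        mapping = _KeepMissing(values) if safe else values
        return translated.format_map(mapping)
    return i18n

# Made-up catalog whose translation mentions a field the source lacks.
catalog = {"Hello {name}": "Bonjour {name}, voici {dog_name}"}

def lookup(source):
    return catalog.get(source, source)

# Each module picks the behavior it wants when it installs its hook.
strict_i18n = make_i18n(lookup, safe=False)
lenient_i18n = make_i18n(lookup, safe=True)

lenient_result = lenient_i18n("Hello {name}", {'name': 'Eric'})
```

With the strict hook the same call raises KeyError, which may be what a
module wants during development.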

Eric.