[Python-ideas] Binary f-strings

Steven D'Aprano steve at pearwood.info
Mon Sep 28 04:09:58 CEST 2015


On Sun, Sep 27, 2015 at 09:23:30PM -0400, Eric V. Smith wrote:
> Now that f-strings are in the 3.6 branch, I'd like to turn my attention
> to binary f-strings (fb'' or bf'').
> 
> The idea is that:
> 
> >>> bf'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'
> 
> Might be translated as:
> 
> >>> (b'datestamp:' +
> ...  bytes(format(datetime.datetime.now(),
> ...               str(b'%Y%m%d', 'ascii')),
> ...        'ascii') +
> ...  b'\r\n')

What's wrong with this?

f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n'.encode('ascii')

This eliminates all your questions about which encoding we should guess 
is more useful (ascii? utf-8? something else?), allows the caller 
to set an error handler without inventing yet more cryptic format codes, 
and is nicely explicit.
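
As a minimal sketch of that point (Python 3.6+ assumed; the fixed timestamp and the `name` value are made up here so the result is reproducible), the caller picks both the encoding and the error handler at the call site:

```python
import datetime

# Hypothetical fixed timestamp, used instead of now() so the output is stable.
stamp = datetime.datetime(2015, 9, 27)

# Strict encoding: raises UnicodeEncodeError on any non-ASCII character.
line = f'datestamp:{stamp:%Y%m%d}\r\n'.encode('ascii')

# Same idea, but the caller explicitly asks for lossy replacement.
name = 'caf\u00e9'  # illustrative non-ASCII value
lossy = f'name:{name}\r\n'.encode('ascii', errors='replace')
```

Here `line` comes out as the bytes Eric's example expects, and the non-ASCII character in `lossy` is replaced with `?` rather than raising.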

If people are worried about the length of ".encode(...)", a helper 
function works great:

def b(s):
    # or whatever encoding makes sense for them
    return bytes(s, 'utf-8')

b(f'datestamp:{datetime.datetime.now():%Y%m%d}\r\n')


> Which would result in:
> b'datestamp:20150927\r\n'
> 
> The only real question is: what encoding to use for the second parameter
> to bytes()? Since an object must return unicode from __format__(), I
> need to convert that to bytes in order to join everything together. But how?
> 
> Here I suggest 'ascii'. Unfortunately, this would give an error if
> __format__ returned anything with a char greater than 127. I think we've
> learned that an API that only raises an exception with certain specific
> inputs is fragile.
> 
> Guido has suggested using 'utf-8' as the encoding. That has some appeal,
> but if we're designing this for wire protocols, not all protocols will
> be using utf-8.

Using UTF-8 is not sufficient, since there are strings that can't be 
encoded into UTF-8 because they contain surrogates:

py> '\uDA11'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\uda11' in position 0: surrogates not allowed


but we surely don't want to suppress such errors by default. Sometimes 
they will be an error that needs fixing.
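
To illustrate, here is a rough sketch (not a proposal for what bf'' should do) of the difference between the strict default and an explicit opt-in. It uses the standard 'surrogatepass' error handler; whether that handler is appropriate for any given protocol is an assumption the caller has to make:

```python
s = '\uDA11'  # a lone surrogate, as in the traceback above

# Default (strict) behaviour: the error is reported, not hidden.
try:
    s.encode('utf-8')
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason

# Explicit opt-in: the caller decides the surrogate should pass through.
passed = s.encode('utf-8', errors='surrogatepass')

# The bytes only round-trip if the decoder opts in as well.
roundtrip = passed.decode('utf-8', errors='surrogatepass')
```

With an explicit .encode() call the choice of handler is visible at the call site, which is the point: the default stays strict and any suppression is deliberate.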



-- 
Steve

