Coding challenge: Optimise a custom string encoding

Alex Willmer alex at moreati.org.uk
Mon Aug 18 17:27:16 EDT 2014


On Monday, 18 August 2014 21:16:26 UTC+1, Terry Reedy  wrote:
> On 8/18/2014 3:16 PM, Alex Willmer wrote:
> > A challenge, just for fun. Can you speed up this function?
> 
> You should give a specification here, with examples. You should perhaps 

Sorry, the (informal) spec was further down.

> > a custom encoding to store unicode usernames in a config file that only allowed mixed case ascii, digits, underscore, dash, at-sign and plus sign. We also wanted to keep the encoded usernames somewhat human readable.

> > My design was utf-8 and a variant of %-escaping, using the plus symbol. So u'alic€123' would be encoded as b'alic+e2+82+ac123'.

Other examples:
>>> plus_encode(u'alice')
'alice'
>>> plus_encode(u'Bacon & eggs only $19.95')
'Bacon+20+26+20eggs+20only+20+2419+2e95'
>>> plus_encode(u'üｎïｃｏԁë')
'+c3+bc+ef+bd+8e+c3+af+ef+bd+83+ef+bd+8f+d4+81+c3+ab'
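
For illustration, here is a minimal sketch of an encoder that is consistent with the examples above. It is not the function from the original post. The name plus_encode_sketch, the SAFE set, and the choice to escape the plus sign itself are assumptions, and it is written for Python 3, returning str rather than bytes.

```python
import string

# Assumed pass-through alphabet: ASCII letters, digits, underscore, dash, at-sign.
# The plus sign is the escape character, so this sketch escapes it as well.
SAFE = frozenset((string.ascii_letters + string.digits + "_-@").encode("ascii"))


def plus_encode_sketch(text):
    """Encode text as UTF-8 and write every byte outside SAFE as +xx (lowercase hex)."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in SAFE:
            out.append(chr(byte))
        else:
            out.append("+%02x" % byte)
    return "".join(out)


print(plus_encode_sketch("Bacon & eggs only $19.95"))
```

This is a straightforward baseline, written for clarity rather than speed.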

> You should perhaps be using .maketrans and .translate.

That wouldn't work: maketrans() can only map single bytes to other single bytes. Encoding 256 possible source bytes with only 66 permitted symbols requires expanding some or all source bytes into multiple symbols.
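
The counting argument can be checked with a short sketch. It assumes Python 3, where the byte-level table is built with bytes.maketrans(); the names ALPHABET and can_build_table are hypothetical helpers introduced only for this illustration.

```python
import string

# The permitted symbols: mixed-case ASCII letters, digits, and _ - @ +
ALPHABET = string.ascii_letters + string.digits + "_-@+"


def can_build_table(src, dst):
    """Return True if bytes.maketrans() accepts a src -> dst mapping."""
    try:
        bytes.maketrans(src, dst)
    except ValueError:
        # Raised when src and dst differ in length, i.e. the mapping
        # is not strictly one byte to one byte.
        return False
    return True


print(len(ALPHABET))                    # 66 symbols for 256 byte values
print(can_build_table(b" ", b"_"))      # one byte to one byte
print(can_build_table(b" ", b"+20"))    # one byte to three bytes
```

This only concerns the byte-to-byte translation table discussed here; it says nothing about other ways of expanding bytes.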


