Coding challenge: Optimise a custom string encoding

Chris Angelico rosuav at gmail.com
Mon Aug 18 19:28:08 EDT 2014


On Tue, Aug 19, 2014 at 5:16 AM, Alex Willmer <alex at moreati.org.uk> wrote:
> Back story:
> Last week we needed a custom encoding to store Unicode usernames in a config file that only allowed mixed-case ASCII, digits, underscore, dash, at-sign and plus sign. We also wanted to keep the encoded usernames somewhat human readable.
>

If you can drop the "somewhat human readable" requirement, this fits
perfectly into a Base 64 encoding. All you need to do is this:

>>> import base64
>>> base64.b64encode("alic€123".encode(),b"+@").replace(b'=',b'-')
b'YWxpY+KCrDEyMw--'


The second argument replaces the usual + and / (the last two characters
of the alphabet) with + and @. (The last step is needed because
Python's b64encode doesn't allow customization of the padding
character. Alternatively, you could simply rstrip() the padding, and
reinstate it by rounding the encoded length up to a multiple of four
characters.)
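
The rstrip() alternative might look something like the sketch below.
The names ALTCHARS, encode_unpadded and decode_unpadded are made up
for illustration, and UTF-8 is assumed as the text encoding.

```python
import base64

ALTCHARS = b"+@"  # stand-ins for the standard '+' and '/'


def encode_unpadded(name):
    """Encode text as Base 64 with the custom alphabet, dropping '=' padding."""
    return base64.b64encode(name.encode("utf-8"), ALTCHARS).rstrip(b"=")


def decode_unpadded(data):
    """Put the padding back, then decode to text."""
    # Round the encoded length up to the next multiple of four characters.
    padded = data + b"=" * (-len(data) % 4)
    return base64.b64decode(padded, ALTCHARS).decode("utf-8")
```

This leaves the dash unused, at the cost of having to work out the
padding again before decoding.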

Decoding is, obviously, the reverse:

>>> base64.b64decode(_.replace(b'-',b'='),b"+@").decode()
'alic€123'
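
Wrapped up as a pair of functions, the two steps above could be
sketched as follows. The function names are hypothetical, UTF-8 is
assumed, and the result is returned as text rather than bytes so it
can be written straight to a config file.

```python
import base64

ALTCHARS = b"+@"  # stand-ins for the standard '+' and '/'


def encode_username(name):
    """Return an ASCII token using only A-Z, a-z, 0-9, '+', '@' and '-'."""
    encoded = base64.b64encode(name.encode("utf-8"), ALTCHARS)
    # '-' stands in for the '=' padding character.
    return encoded.replace(b"=", b"-").decode("ascii")


def decode_username(token):
    """Reverse encode_username()."""
    padded = token.encode("ascii").replace(b"-", b"=")
    return base64.b64decode(padded, ALTCHARS).decode("utf-8")


# Round-trip check on a few sample names.
for sample in ["alic\u20ac123", "bob", ""]:
    assert decode_username(encode_username(sample)) == sample
```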

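This is done in Python 3, not Python 2. I expect it'll work much the
same way in 2.7, as long as you start from a unicode literal
(u"alic€123") and name the codec explicitly, i.e. .encode("utf-8")
and .decode("utf-8").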

ChrisA


