Good cross-version ASCII serialisation protocol for simple types

Chris Angelico rosuav at gmail.com
Sat Feb 23 11:00:10 EST 2013


On Sun, Feb 24, 2013 at 2:45 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> At the moment, I'm using
>
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
>
> The double-encoding ensures that non-ASCII characters don't make it into the result.
>
> This works fine, but is there something simpler (i.e., less of a hack!) that I could use? (Base64 and the like don't work because they encode bytes->strings, not strings->strings).

Hmm. How likely is it that you'll have non-ASCII characters in the
input? If they're uncommon, you could use UTF-7. It passes most ASCII
characters through unchanged, so it's space-efficient when the input is
mostly ASCII, but every non-ASCII run gets expanded to base64 between
'+' and '-', which is inefficient for other text.

Not sure what the problem is with bytes vs strings; you can always do
an encode("ascii") or decode("ascii") to convert 7-bit strings between
those types.
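
A minimal sketch of that conversion, assuming Base64 is acceptable as the
ASCII-safe layer. The names encode_ascii and decode_ascii are hypothetical
helpers made up for illustration, not anything from the standard library:

```python
import base64
import json

# Hypothetical helper names, for illustration only.
def encode_ascii(obj):
    # JSON text -> UTF-8 bytes -> Base64 bytes -> ASCII-only text string
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def decode_ascii(text):
    # Reverse the steps: text string -> Base64 bytes -> UTF-8 bytes -> object
    raw = base64.b64decode(text.encode("ascii"))
    return json.loads(raw.decode("utf-8"))

sample = {"name": u"asdf\u1234zxcv", "n": [1, 2, 3]}
packed = encode_ascii(sample)
restored = decode_ascii(packed)
```

The decode("ascii") / encode("ascii") calls are the only glue needed to get
from Base64's bytes to a text string and back.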

With that covered, I'd just go with a single JSON packaging, and work
with the resulting Unicode string.

Python 2.6:
>>> s=u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
u'asdf+EjQ-zxcv'

Python 3.3:
>>> s=u"asdf\u1234zxcv"
>>> s.encode("utf-7").decode("ascii")
'asdf+EjQ-zxcv'
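
Going back the other way is the same two steps reversed. A short sketch of
the full round trip, using the same sample string as above (the variable
names packed and restored are just for illustration):

```python
s = u"asdf\u1234zxcv"

# Text -> UTF-7 bytes -> ASCII-only text string
packed = s.encode("utf-7").decode("ascii")

# ASCII-only text string -> bytes -> original text
restored = packed.encode("ascii").decode("utf-7")

assert restored == s
```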

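Another option would be to JSON-encode in pure-ASCII mode (ensure_ascii=True
is already the default, so spelling it out just makes the intent explicit):

>>> import json
>>> json.dumps([s], ensure_ascii=True)
'["asdf\\u1234zxcv"]'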
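
As a rough sketch of how that could stand in for the double-encoding, here
is the round trip with a check that the output really is 7-bit clean. The
helper name dumps_ascii is hypothetical, and this assumes the data is made
up of plain JSON types only:

```python
import json

# Hypothetical helper name, for illustration only.
def dumps_ascii(obj):
    # ensure_ascii=True escapes every non-ASCII character as \uXXXX
    return json.dumps(obj, ensure_ascii=True)

s = u"asdf\u1234zxcv"
encoded = dumps_ascii([s])

# Every character in the result is in the ASCII range
assert all(ord(ch) < 128 for ch in encoded)

# A single json.loads recovers the original data
decoded = json.loads(encoded)
assert decoded == [s]
```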

Would that cover it?

ChrisA


