Good cross-version ASCII serialisation protocol for simple types

Jussi Piitulainen jpiitula at ling.helsinki.fi
Sat Feb 23 11:06:11 EST 2013


Paul Moore writes:

> I need to transfer some data (nothing fancy, some dictionaries,
> strings, numbers and lists, basically) between 2 Python
> processes. However, the data (string values) is potentially not
> ASCII, but the transport is (I'm piping between 2 processes, but
> thanks to nasty encoding issues, the only characters I can be sure
> won't be mangled are ASCII).
> 
> What's the best ASCII-only protocol to use that's portable between
> versions of Python back to about 2.6/2.7 and in the stdlib, so I
> don't need external modules?
> 
> At the moment, I'm using
> 
> encoded = json.dumps([ord(c) for c in json.dumps(obj)])
> decoded = json.loads(''.join([chr(n) for n in json.loads(encoded)]))
> 
> The double-encoding ensures that non-ASCII characters don't make it
> into the result.
> 
> This works fine, but is there something simpler (i.e., less of a
> hack!) that I could use? (Base64 and the like don't work because
> they encode bytes->strings, not strings->strings).

I don't know much about these things, but I've been using Python's
json.dump and json.load for a couple of weeks now, and they escape
non-ASCII characters automatically (ensure_ascii=True is the default).
A four-character string is written as "\u00e4\u00e4ni" instead of in
the UTF-8 that my environment is set to handle. That output goes to
stdout, which a shell script redirects to a file, and I copy-pasted it
here from the resulting file.

I'm using Python 3.3, though.
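A minimal sketch of that round trip, assuming Python 3 and made-up sample data. json.dumps with its default ensure_ascii=True should give pure-ASCII text, and json.loads should restore the original object, so the double encoding may not be needed. The json module has been in the stdlib since 2.6 with the same default.

```python
import json

# Made-up sample data of the kind described in the question:
# dicts, lists, numbers and a non-ASCII string ("ääni").
obj = {"word": "\u00e4\u00e4ni", "counts": [1, 2, 3], "nested": {"pi": 3.14}}

# ensure_ascii=True is already the default; it is spelled out only for clarity.
# Non-ASCII characters come out as \uXXXX escapes. sort_keys makes the
# output order predictable.
encoded = json.dumps(obj, ensure_ascii=True, sort_keys=True)

# Raises UnicodeEncodeError if any non-ASCII character slipped through.
encoded.encode("ascii")

# Decoding the escaped text gives back an equal object.
decoded = json.loads(encoded)
assert decoded == obj

print(encoded)
# {"counts": [1, 2, 3], "nested": {"pi": 3.14}, "word": "\u00e4\u00e4ni"}
```

The escaping is done by the JSON encoder itself, so no second json.dumps over a list of code points is involved. The text that goes through the pipe is ordinary JSON that any JSON reader can parse.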


