[issue18679] include a codec to handle escaping only control characters but not any others

Derek Wilson report at bugs.python.org
Thu Aug 8 18:23:03 CEST 2013


Derek Wilson added the comment:

> ast.literal_eval("'%s'" % e)

this doesn't work if you pick the wrong quote. without introspecting the data in e you can't reliably choose between "'%s'", '"%s"', '"""%s"""' and "'''%s'''". which ones break (and break silently) depends on the data.
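
a minimal sketch of the quoting problem, assuming e is the escaped text taken from repr(s)[1:-1]. the helper name wrap_and_eval is hypothetical and only there for illustration; it shows one loud failure and one silent one for the "'%s'" template.

```python
import ast

def wrap_and_eval(e, template):
    """Wrap the escaped text in a quote template and evaluate it as a literal."""
    return ast.literal_eval(template % e)

# escaped text with no quotes in it: the single-quote template round-trips
plain = repr("tab\there")[1:-1]
round_tripped = wrap_and_eval(plain, "'%s'")

# loud failure: repr() switches to double quotes for this string, so the
# apostrophe is left unescaped and terminates the single-quoted literal early
apostrophe = repr("it's")[1:-1]
try:
    wrap_and_eval(apostrophe, "'%s'")
    raised = False
except SyntaxError:
    raised = True

# silent failure: the wrapped text parses as two adjacent string literals,
# which python concatenates, so the quote and the space vanish without an error
tricky = repr("a' 'b")[1:-1]
silently_wrong = wrap_and_eval(tricky, "'%s'")
```

switching to the '"%s"' template fixes these two inputs and breaks others, which is why the template can't be picked without looking at the data.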


> e.encode().decode('unicode-escape').encode('latin1').decode()

so ... encode the repr()[1:-1] string as utf-8 bytes, decode the backslash escape sequences and the individual bytes as if they were latin1, encode as latin1 (which is just byte-for-byte serialization), then decode those bytes as utf-8 to recombine the characters that the earlier 'unicode-escape' decode broke apart?

this may work for my example, but it looks and feels very hacky for something that should be simple and straightforward. and again, tools other than python will run into escaped quotes in the data, which may cause problems.
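
the four steps of that chain can be traced one at a time. this is only a sketch: the sample string and the intermediate variable names are made up for illustration, and e is again assumed to come from repr(s)[1:-1].

```python
original = "caf\u00e9\tbar"

# escaped form: the tab becomes backslash + t, the printable e-acute stays as is
e = repr(original)[1:-1]

# step 1: utf-8 encode, so the e-acute becomes the two bytes C3 A9
utf8_bytes = e.encode()

# step 2: 'unicode-escape' resolves the backslash escape, but maps every other
# byte to the latin1 character with the same value, splitting the e-acute in two
mojibake = utf8_bytes.decode('unicode-escape')

# step 3: latin1 encode turns those characters back into the same byte values
raw = mojibake.encode('latin1')

# step 4: utf-8 decode recombines C3 A9 into the single e-acute
result = raw.decode()
```

the intermediate value after step 2 is not meaningful text; it only exists so that step 3 can undo the damage.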

> e.encode('latin1', 'backslashescape').decode('unicode-escape')

when i execute this i get a traceback

LookupError: unknown error handler name 'backslashescape'
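
for reference, the built-in error handler with a similar name is 'backslashreplace'. assuming that is what the suggestion meant (an assumption, the quoted line doesn't say so), a sketch looks like this. handler_exists and unescape are hypothetical helper names.

```python
import codecs

def handler_exists(name):
    """Return True if name is a registered codec error handler."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True

def unescape(e):
    """Assumed intent of the quoted one-liner, with 'backslashreplace'.

    characters outside latin1 are turned into backslash escapes by the
    error handler, and 'unicode-escape' then decodes those together with
    the escapes that were already in the text.
    """
    return e.encode('latin1', 'backslashreplace').decode('unicode-escape')

misspelled_ok = handler_exists('backslashescape')
builtin_ok = handler_exists('backslashreplace')

# escaped input: e-acute (inside latin1), an escaped tab, a euro sign (outside latin1)
decoded = unescape("\u00e9\\t\u20ac")
```

the handler name is only looked up when the encoder actually hits a character it can't encode, so the misspelled name can go unnoticed on data that fits in latin1.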

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18679>
_______________________________________

