Putting Unicode characters in JSON

Thomas Jollans tjol at tjol.eu
Thu Mar 22 20:27:46 EDT 2018


On 22/03/18 20:46, Tobiah wrote:
> I was reading though, that JSON files must be encoded with UTF-8.  So
> should I be doing string.decode('latin-1').encode('utf-8')?  Or does
> the json module do that for me when I give it a unicode object?

Definitely not. In fact, that won't even work.

>>> import json
>>> s = 'déjà vu'.encode('latin1')
>>> s
b'd\xe9j\xe0 vu'
>>> json.dumps(s.decode('latin1').encode('utf8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable
>>>
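
For contrast, here is a minimal sketch of what does work, assuming the data arrives as Latin-1 bytes as in the session above: decode once to get a str and hand that straight to json.dumps(), with no re-encoding step. The variable names (raw, text, encoded) are only illustrative.

```python
import json

# Bytes as they might arrive from a Latin-1 source (same sample as above).
raw = 'déjà vu'.encode('latin1')

# Decode once to get a str; do not encode again before serialising.
text = raw.decode('latin1')

# With the default ensure_ascii=True, non-ASCII characters are escaped.
encoded = json.dumps(text)
print(encoded)  # "d\u00e9j\u00e0 vu"
```

The result is a str containing only ASCII characters, and json.loads(encoded) gives back the original text.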

You should make sure that either the file you're writing to is opened as
UTF-8 text, or that the ensure_ascii parameter of dumps() or dump() is set
to True (the default). In the latter case the output contains only ASCII
characters, so you can write it in ASCII or any ASCII-compatible encoding
(e.g. UTF-8).
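
A rough sketch of the two options, in case it helps. The dictionary, the temporary directory and the file name example.json are made up for illustration; only json.dumps(), json.dump(), json.load() and open() with encoding='utf-8' are the real API.

```python
import json
import os
import tempfile

data = {'phrase': 'déjà vu'}  # illustrative sample data

# Option 1: keep the default ensure_ascii=True. The output contains only
# ASCII characters, so any ASCII-compatible encoding can store it.
ascii_json = json.dumps(data)

# Option 2: turn escaping off, and open the file explicitly as UTF-8 text.
path = os.path.join(tempfile.mkdtemp(), 'example.json')  # hypothetical file
with open(path, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

# Inspect the raw bytes that ended up on disk.
with open(path, 'rb') as f:
    utf8_bytes = f.read()

# Reading the file back as UTF-8 text recovers the original data.
with open(path, encoding='utf-8') as f:
    round_tripped = json.load(f)
```

Either way a JSON parser gets the same text back. The difference is only whether the non-ASCII characters are stored as \uXXXX escapes or as UTF-8 byte sequences.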

Basically, the default behaviour of the json module means you don't
really have to worry about encodings at all once your original data is
in Unicode strings (str objects in Python 3).

-- Thomas


