[Python-porting] json encoding

M.-A. Lemburg mal at egenix.com
Sat Jun 20 10:24:24 CEST 2015


On 20.06.2015 01:18, R. David Murray wrote:
> On Fri, 19 Jun 2015 14:41:59 -0700, Clay Gerrard <clay.gerrard at gmail.com> wrote:
>> Has anyone ported any projects that use json as a data interchange format -
>> or can think of any that probably had to port some json handling code?
>>
>> I'm perplexed that python3's json.dumps returns a str.  I mean I know it's
>> dump*s* - but that was from back when we only had binary strings ;)
>>
>> My reading leads me to believe that the json format is not a "string" - it's
>> a binary format - encoded in utf-8 [1]
> 
> Well, utf-8 is the wire encoding, but the *intent* (as I understand
> it) is that json is a unicode string.
> 
>> Why would you want to first create a unicode string - then encode it to utf-8?
>> What else are you doing with this data exchange format if not using it
>> to exchange the representation of the data encoded as bytes to somewhere
>> else?
>>
>>     json_data = json.dumps(my_object)
>>     if not isinstance(json_data, six.binary_type):
>>         json_data = json_data.encode('utf-8')
>>     my_buffer = BytesIO(json_data)
>>
>> ^ Is that really the best I can do?
>>
>> Why did json.dumps in python3 [3] lose the encoding kwarg from python2 [2]?
> 
> Because json is a string.  You encode it to utf-8 using the string
> encode method when you put it on the wire.  It's not json if it's not
> unicode or utf-8 (or -16 or -32), so it wouldn't make sense to have an
> encoding keyword that can take an arbitrary codec.

The encoding argument in Python 2 refers to the encoding to use
for binary strings in the object you pass to json.dumps():

https://docs.python.org/2/library/json.html#json.dump
"""
encoding is the character encoding for str instances, default is UTF-8.
"""

In Python 3, bytes are not supported by the json module, so you
don't need the encoding parameter anymore.
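
To illustrate the point about bytes, here is a minimal sketch, assuming
Python 3. The payload dict and its contents are made up for the example.
json.dumps() raises a TypeError when it meets a bytes value, so any bytes
have to be decoded to str before serializing:

```python
import json

# Hypothetical object that still carries a UTF-8 encoded bytes value
payload = {"name": b"caf\xc3\xa9"}

try:
    json.dumps(payload)
    bytes_rejected = False
except TypeError:
    # Python 3's json module does not serialize bytes objects
    bytes_rejected = True

# Decode the bytes yourself, then serialize the resulting str values
decoded = {key: value.decode("utf-8") for key, value in payload.items()}
text = json.dumps(decoded)
```

The decoding step is where the codec gets chosen, which is roughly the job
the encoding parameter did in Python 2.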

Python 2's json will convert all strings to Unicode on input
and then write out a UTF-8 encoded string or, when using
ensure_ascii, an ASCII string with escape sequences.
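
For the Python 3 side, something along these lines should get you bytes for
a buffer. This is only a sketch: dumps_bytes is a hypothetical helper name,
not part of the json module, and the sample data is invented. It serializes
to str first and then encodes the result as UTF-8:

```python
import json
from io import BytesIO


def dumps_bytes(obj, **kwargs):
    """Hypothetical helper: serialize to JSON text, then encode as UTF-8."""
    text = json.dumps(obj, **kwargs)
    return text.encode("utf-8")


data = {"city": "K\u00f6ln"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes
ascii_bytes = dumps_bytes(data)

# ensure_ascii=False: characters are kept and encoded as UTF-8 bytes
utf8_bytes = dumps_bytes(data, ensure_ascii=False)

my_buffer = BytesIO(utf8_bytes)
```

Both results are valid UTF-8, since ASCII is a subset of it. The difference
is only whether non-ASCII characters travel as escapes or as raw bytes.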

>> Thanks for any guidance or suggestion - maybe I'm just thinking about it
>> wrong.
> 
> I haven't done any code that used json and needed to run on both python2
> and python3, so I can't tell you if there is an easier way.  But in general
> you do have to jump through some hoops when supporting both dialects.

-- 
Marc-Andre Lemburg
eGenix.com


