[Web-SIG] Time for a JSON parser in the standard library?

Alan Kennedy pywebsig at xhaus.com
Tue Mar 11 11:30:30 CET 2008


[Massimo]
> It would also be nice to have a common interface to all modules that
> do serialization. For example, pickle, cPickle, and marshal all have
> dumps, so json should also have dumps.

Indeed, this is my primary concern also.
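To make the point concrete, here is a minimal sketch of what a shared
dumps/loads interface buys you. It assumes a module named "json" that
exposes the same dumps/loads pair as pickle and marshal (Python's
standard json module does); round_trip is a name I made up for
illustration.

```python
import json
import marshal
import pickle


def round_trip(module, value):
    """Serialise and restore `value` using the module's dumps/loads pair."""
    return module.loads(module.dumps(value))


data = {"name": "web-sig", "ids": [1, 2, 3]}

# The same calling code works for every module that follows the convention.
results = {m.__name__: round_trip(m, data) for m in (pickle, marshal, json)}
```

The calling code never needs to know which serialiser it was handed,
which is exactly the interoperability argument.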

The reason is that I have a pure-java JSON codec for jython, that I
will either publish separately or contribute to jython itself.

If we're going to have the facility in both cpython and jython (and
probably ironpython, etc), then it would be optimal to have a
compatible API so that we have full interoperability. And given that
we in jython land are always left implementing cpython APIs (which are
not necessarily always the optimal design for jython) it would be nice
if we could agree on APIs, etc, *before* stuff goes into the standard
library.

The API for my codec is slightly different from simplejson, although
it could be made the same with a little work, including exception
signatures, etc.

But there are some things about my own design that I like. For
example, simplejson allows override of the JSON output representing
certain objects, by the use of subclasses of JSONEncoder. My design
does it differently; it simply looks for a "__json__()" callable on
every object being serialised, and if found, calls it and uses its
return value to represent the object. I have no equivalent of
simplejson's decoding extensions.
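As a rough illustration of the "__json__()" idea (not my actual codec),
the lookup can be approximated with the "default" callable accepted by
simplejson-style dumps. Two assumptions: the hook returns a
JSON-compatible Python value rather than a JSON string, and the names
call_json_hook and Point are hypothetical. Note that "default" is only
consulted for objects the encoder cannot already serialise, so this is
narrower than checking every object.

```python
import json


def call_json_hook(obj):
    """Fallback for unserialisable objects: use obj.__json__() if present."""
    hook = getattr(obj, "__json__", None)
    if callable(hook):
        return hook()
    raise TypeError(f"{type(obj).__name__} is not JSON serialisable")


class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __json__(self):
        # Assumption: the hook returns plain dicts/lists/strings/numbers.
        return {"x": self.x, "y": self.y}


encoded = json.dumps({"origin": Point(0, 0)}, default=call_json_hook)
```

No JSONEncoder subclass is needed; the object itself decides how it is
represented.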

Another difference is the set of options. Simplejson has options to
control parsing and generation, and so does mine. But the sets of
options are different, e.g. simplejson has no option to permit/reject
dangling commas (e.g. "[1,2,3,]")*, whereas mine has no support for
accepting NaN, infinity, etc, etc.

On the encoding side, I simply make the assumption that all character
transcoding has happened before the JSON text reaches the JSON parser.
(I think this is a reasonable assumption, given that byte streams are
always associated with file storage, network transmission, etc, and
only the programmer has access to the relevant encoding information).
But given that RFC 4627 specifies how to guess encoding of JSON byte
streams, I'll probably change that policy.
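For reference, the RFC 4627 (section 3) heuristic relies on the first
two characters of a JSON text always being ASCII, so the pattern of
null bytes in the first four octets identifies the encoding. A minimal
sketch follows; the function names are my own, and the fallback to
UTF-8 for inputs shorter than four bytes is an assumption rather than
something the RFC spells out.

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding of JSON bytes from the null-byte pattern of the
    first four octets, following RFC 4627 section 3."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short to apply the pattern
    nulls = tuple(byte == 0 for byte in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx


def decode_json_bytes(data: bytes) -> str:
    """Decode a JSON byte stream to text using the guessed encoding."""
    return data.decode(guess_json_encoding(data))
```

The caller can still override the guess when it knows the encoding from
a Content-Type header or similar.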

Lastly, another area of potential cooperation is testing: I have over
100 unit-tests, with fairly extensive coverage. I think that test
coverage is very important in the case of JSON; you can never have too
many tests.

So, what is the best way to go about agreeing on the best API?

1. Discussion on web-sig?
2. Discussion on stdlib-sig?
3. Collaborative authoring/discussion on a wiki page?
4. ????

Regards,

Alan.

* Which can mean different things to different software. Some
javascript interpreters interpret it as a 4-element list (inferring
the last element, between the comma and the closing square bracket, as
a null), others as a 3-element list. Python obviously interprets it as
a 3-element list. So the general internet maxim "be liberal in what
you accept and strict in what you produce" applies. My API gives
control of this strictness/relaxedness to the user.
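
A sketch of what such a user-controlled option could look like, layered
over a strict parser. This is not my codec's API: the loads wrapper and
its allow_trailing_comma flag are hypothetical, and the relaxed mode
takes the 3-element reading by dropping a comma that directly precedes
a closing bracket or brace, leaving commas inside strings alone.

```python
import json
import re

# Matches either a complete JSON string (left untouched) or a comma that is
# followed only by whitespace and a closing bracket or brace.
_TRAILING_COMMA = re.compile(r'"(?:\\.|[^"\\])*"|,(\s*[\]}])')


def loads(text, allow_trailing_comma=False):
    """Hypothetical wrapper: strict by default, relaxed on request."""
    if allow_trailing_comma:
        text = _TRAILING_COMMA.sub(
            lambda m: m.group(0) if m.group(1) is None else m.group(1),
            text,
        )
    return json.loads(text)
```

With the flag off, "[1,2,3,]" is rejected with a ValueError; with it
on, the same text parses as a 3-element list.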

