multiple JSON documents in one file, change proposal

Chris Angelico rosuav at gmail.com
Sat Dec 1 05:30:53 EST 2018


On Sat, Dec 1, 2018 at 9:16 PM Marko Rauhamaa <marko at pacujo.net> wrote:
>
> Paul Rubin <no.email at nospam.invalid>:
>
> > Marko Rauhamaa <marko at pacujo.net> writes:
> >> Having rejected different options (<URL:
> >> https://en.wikipedia.org/wiki/JSON_streaming>), I settled with
> >> terminating each JSON value with an ASCII NUL character, which is
> >> illegal in JSON proper.
> >
> > Thanks, that Wikipedia article is helpful.  I'd prefer to not use stuff
> > like NUL or RS because I like keeping the file human readable.  I might
> > use netstring format (http://cr.yp.to/proto/netstrings.txt) but I'm even
> > more convinced now that adding a streaming feature to the existing json
> > module is the right way to do it.
>
> We all have our preferences.
>
> In my case, I need an explicit terminator marker to know when a JSON
> value is complete. For example, if I should read from a socket:
>
>    123
>
> I can't yet parse it because there might be another digit coming. On the
> other hand, the peer might not see any reason to send any further bytes
> because "123" is all they wanted to send at the moment.

Bare numbers are actually the only special case. Every other JSON value
has a clear end: objects, arrays, and strings have closing delimiters,
and true, false, and null have fixed spellings. So the only thing you
need to say is that, if the sender wishes to transmit a bare number, it
must append a space. Seriously, how often do you ACTUALLY send a bare
number? I've sometimes sent a string on its own, but even that is
incredibly rare. Having to send a simple space after a bare number is
unlikely to cause much trouble.
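
As a rough sketch of that convention (not code from this thread; the
name split_complete and its exact whitespace rule are made up for
illustration), the stdlib's json.JSONDecoder.raw_decode can pull
complete values off the front of a buffer, holding back a top-level
number until whitespace follows it:

```python
import json

_decoder = json.JSONDecoder()
_WHITESPACE = " \t\r\n"  # the four whitespace characters JSON allows


def split_complete(buffer):
    """Return (values, remainder) for a str buffer of concatenated JSON.

    Hypothetical helper: a top-level number only counts as complete
    once it is followed by whitespace, since "123" might still grow
    into "1234" or "123e5".
    """
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(buffer) and buffer[pos] in _WHITESPACE:
            pos += 1
        if pos == len(buffer):
            break
        try:
            value, end = _decoder.raw_decode(buffer, pos)
        except json.JSONDecodeError:
            break  # incomplete (or malformed): wait for more data
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and (end == len(buffer) or buffer[end] not in _WHITESPACE):
            break  # bare number without its trailing space yet
        values.append(value)
        pos = end
    return values, buffer[pos:]


print(split_complete("123"))              # nothing complete yet
print(split_complete("123 "))             # the number is released
print(split_complete('{"a": 1}[2, 3] [4, '))
```

Note that this sketch cannot tell malformed input from incomplete
input, so a real reader would also want a size limit or a timeout.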

> As for NUL, a control character that is illegal in all JSON contexts is
> practical so the JSON chunks don't need to be escaped. An ASCII-esque
> solution would be to pick ETX (= end of text). Unfortunately, a human
> operator typing ETX (= ctrl-C) to terminate a JSON value will cause a
> KeyboardInterrupt in many modern command-line interfaces.
>
> It happens NUL (= ctrl-SPC = ctrl-@) is pretty easy to generate and
> manipulate in editors and the command line.

I have no idea which editors YOU use, but if you poll across platforms
and systems, I'm pretty sure you'll find that not everyone can type
it. Furthermore, many tools use the presence of a 0x00 byte as
evidence that a file is binary, not text. (For instance, git does
this.) That might be a good choice for your personal use-case, but not
the general case, whereas the much simpler options listed on the
Wikipedia page are far more general, and actually wouldn't be THAT
hard for you to use.

> The need for the format to be "typable" (and editable) is essential for
> ad-hoc manual testing of components. That precludes all framing formats
> that would necessitate a length prefix. HTTP would be horrible to have
> to type even without the content-length problem, but BEEP (RFC 3080)
> would suffer from the content-length (and CRLF!) issue as well.

I dunno, I type HTTP manually often enough that it can't be all *that* horrible.

> Finally, couldn't any whitespace character work as a terminator? Yes, it
> could, but it would force you to use a special JSON parser that is
> prepared to handle the self-delineation. A NUL gives you many more
> degrees of freedom in choosing your JSON tools.

Either non-delimited or newline-delimited JSON is supported in a lot
of tools. I'm quite at a loss here as to how an unprintable character
gives you more freedom.
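
For the newline-delimited case, a minimal reader needs nothing beyond
json.loads, assuming the sender writes exactly one JSON value per line
(the name read_ndjson is hypothetical). json.dumps with its default
settings never emits a raw newline, so a bare number is unambiguous
here too:

```python
import io
import json


def read_ndjson(stream):
    """Yield one decoded JSON value per non-blank line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between values
            yield json.loads(line)


# Writing side: one json.dumps() result per line.
out = io.StringIO()
for value in [{"a": 1}, 123, "two\nlines"]:
    out.write(json.dumps(value) + "\n")

# Reading side: the embedded newline was escaped, so it round-trips.
out.seek(0)
print(list(read_ndjson(out)))
```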

I get it: you have a bizarre set of tools and the normal solutions
don't work for you. But you can't complain about the tools not
supporting your use-cases. Just code up your own styles of doing
things that are unique to you.

ChrisA


