multiple JSON documents in one file, change proposal

Chris Angelico rosuav at gmail.com
Sat Dec 1 06:28:21 EST 2018


On Sat, Dec 1, 2018 at 10:16 PM Marko Rauhamaa <marko at pacujo.net> wrote:
>
> Chris Angelico <rosuav at gmail.com>:
> > On Sat, Dec 1, 2018 at 9:16 PM Marko Rauhamaa <marko at pacujo.net> wrote:
> >> The need for the format to be "typable" (and editable) is essential
> >> for ad-hoc manual testing of components. That precludes all framing
> >> formats that would necessitate a length prefix. HTTP would be
> >> horrible to have to type even without the content-length problem, but
> >> BEEP (RFC 3080) would suffer from the content-length (and CRLF!)
> >> issue as well.
> >
> > I dunno, I type HTTP manually often enough that it can't be all *that*
> > horrible.
>
> Say I want to send this piece of JSON:
>
>    {
>        "msgtype": "echo-req",
>        "opid": 3487547843
>    }
>
> and the framing format is HTTP. I will need to type something like this:
>
>    POST / HTTP/1.1^M
>    Host: localhost^M
>    Content-type: application/json^M
>    Content-length: 54^M
>    ^M
>    {
>        "msgtype": "echo-req",
>        "opid": 3487547843
>    }
>
> That's almost impossible to type without a syntax error.

1) Set your Enter key to send CR-LF, at least for this operation.
That's half your problem solved.
2) Send the request like this:

POST / HTTP/1.0
Content-type: application/json

{"msgtype": "echo-req", "opid": 3487547843}

Then shut down your end of the connection, probably with Ctrl-D. I'm
fairly sure I can type that without bugs, and any compliant HTTP
server should be fine with it.
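
If typing is still too error-prone, the same request can be scripted. This is only a rough sketch of the idea above: build_request and send_request are names made up for illustration, and it assumes the server will accept a POST with no Content-length header and read the body until the client shuts down its sending side (some servers insist on the header).

```python
import json
import socket


def build_request(payload):
    """Build the bytes of the hand-typed HTTP/1.0 request shown above."""
    # json.dumps with no indent keeps the body on a single line.
    body = json.dumps(payload)
    head = "POST / HTTP/1.0\r\nContent-type: application/json\r\n\r\n"
    return (head + body + "\r\n").encode("ascii")


def send_request(host, port, payload):
    """Send the request, half-close the socket, and return the raw reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(payload))
        # Scripted equivalent of Ctrl-D: no more data from our side.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling send_request("localhost", 8080, {"msgtype": "echo-req", "opid": 3487547843}) would send exactly the four lines above with CR-LF line endings; the host and port are placeholders.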

> >> Finally, couldn't any whitespace character work as a terminator? Yes,
> >> it could, but it would force you to use a special JSON parser that is
> >> prepared to handle the self-delineation. A NUL gives you many more
> >> degrees of freedom in choosing your JSON tools.
> >
> > Either non-delimited or newline-delimited JSON is supported in a lot
> > of tools. I'm quite at a loss here as to how an unprintable character
> > gives you more freedom.
>
> As stated by Paul in another context, newline-delimited is a no-go
> because it forces you to restrict JSON to a subset that doesn't contain
> newlines (see the JSON example above).
>
> Of course, you could say that the terminating newline is only
> interpreted as a terminator after a complete JSON value, but that's not
> the format "supported in a lot of tools".

The subset in question is simply "JSON without any newlines between
tokens", which has exactly the same meaning as it would have *with*
those newlines. So what you lose is the human-readability of being
able to break an object over multiple lines. Is that a problem? Use
non-delimited instead.
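
As a small illustration of that subset (a sketch only, with write_ndjson and read_ndjson as invented names): json.dumps without an indent argument never emits a literal newline, because newlines inside strings come out escaped as \n, so one document per line falls out naturally.

```python
import io
import json


def write_ndjson(docs, stream):
    """Write each document on its own line."""
    for doc in docs:
        # No indent argument, so the serialized form has no raw newlines.
        stream.write(json.dumps(doc) + "\n")


def read_ndjson(stream):
    """Yield one parsed document per non-blank line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


docs = [{"msgtype": "echo-req", "opid": 3487547843}, {"text": "two\nlines"}]
buf = io.StringIO()
write_ndjson(docs, buf)
round_tripped = list(read_ndjson(io.StringIO(buf.getvalue())))
```

The second document contains a newline inside a string and still occupies a single line on the wire.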

> If you use any legal JSON character as a terminator, you have to make it
> contextual or add an escaping syntax.

Or just use non-delimited, strip all whitespace between objects, and
then special-case the one otherwise-ambiguous situation of two Numbers
back to back. Anything that sends newline-delimited JSON will work
with that.

> > I get it: you have a bizarre set of tools and the normal solutions
> > don't work for you. But you can't complain about the tools not
> > supporting your use-cases. Just code up your own styles of doing
> > things that are unique to you.
>
> There are numerous tools that parse complete JSON documents fine.
> Framing JSON values with NUL-termination is trivial to add in any
> programming environment. For example:
>
>    import json
>
>    def json_docs(path):
>        with open(path) as f:
>            for doc in f.read().split("\0")[:-1]:
>                yield json.loads(doc)

Yes, but many text-processing tools don't let you manually insert
NULs. Of *course* you can put anything you like in there when you
control both ends and everything in between; that's kinda the point of
coding. But I'm going to use newlines, and parse as non-delimited.
That can be done just as easily (see my example code earlier - it
could be converted into the same style of generator as you have here
and would be about as many lines), and the result will behave as text
in most applications.
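
Roughly, such a generator might look like the following. Treat it as a sketch rather than the exact code from the earlier post: iter_json is a helper name invented here, and it leans on json.JSONDecoder.raw_decode, which returns the parsed value together with the index where parsing stopped.

```python
import json


def iter_json(text):
    """Yield each JSON value from text holding several values in a row."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            return
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def json_docs(path):
    """Same shape as the NUL-splitting generator, but for plain text."""
    with open(path) as f:
        yield from iter_json(f.read())
```

Whitespace is skipped rather than stripped up front, so two Numbers separated by a space or newline still come out as two values; only Numbers written with nothing at all between them would run together.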

ChrisA


