Decoding a huge JSON file incrementally

Paul Moore p.f.moore at gmail.com
Thu Dec 20 10:42:43 EST 2018


I'm looking for a way to incrementally decode a JSON file. I know this
has come up before, and in general the problem is not soluble (because
in theory the JSON file could be a single object). In my particular
situation, though, I have a 9GB file containing a top-level array
object, with many elements. So what I could (in theory) do is to parse
an element at a time, yielding them.

The problem is that the stdlib json library reads the whole file into
memory (json.load() just calls read() on the file), which defeats my
purpose. What I'd like is for it to read one complete element, then
just far enough ahead to find out that the parse was done, and return
the object it found (it should probably also return the "next token",
as it can't reliably push it back - I'd check that it was a comma
before proceeding with the next list element).
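
To make the element-at-a-time behaviour concrete, here is a rough
sketch, not a tested solution for a 9GB file. It assumes the chunked
reading is done by hand and that json.JSONDecoder.raw_decode() (which
parses one value from a string and returns the value plus the index
where it stopped) is an acceptable building block. The names
iter_array_items and chunk_size are hypothetical, chosen only for
illustration. A value is accepted only once the following "," or "]"
is visible in the buffer, because a number cut off at a chunk boundary
(say "1." out of "1.5") would otherwise parse as a shorter, valid
number.

```python
import io
import json

WS = " \t\n\r"  # the four whitespace characters JSON allows


def iter_array_items(fp, chunk_size=65536):
    """Yield the elements of a top-level JSON array one at a time.

    fp is a text-mode file object. Hypothetical helper, not a stdlib API.
    """
    decoder = json.JSONDecoder()
    buf, pos, eof = "", 0, False

    def fill():
        # Discard consumed text, then append the next chunk from the file.
        nonlocal buf, pos, eof
        chunk = fp.read(chunk_size)
        eof = not chunk
        buf, pos = buf[pos:] + chunk, 0

    def peek():
        # Skip whitespace and return the next character ("" at end of file).
        nonlocal pos
        while True:
            while pos < len(buf) and buf[pos] in WS:
                pos += 1
            if pos < len(buf):
                return buf[pos]
            if eof:
                return ""
            fill()

    if peek() != "[":
        raise ValueError("expected a top-level JSON array")
    pos += 1
    if peek() == "]":
        return  # empty array

    while True:
        peek()  # move to the first character of the next element
        while True:
            try:
                obj, end = decoder.raw_decode(buf, pos)
                k = end
                while k < len(buf) and buf[k] in WS:
                    k += 1
                # Trust the parse only if the delimiter after it is visible,
                # or if there is nothing left to read.
                complete = eof or buf[k:k + 1] in (",", "]")
            except json.JSONDecodeError:
                if eof:
                    raise  # genuinely malformed or truncated input
                complete = False
            if complete:
                break
            fill()  # element may be cut off at the chunk boundary; read more
        pos = end
        yield obj
        nxt = peek()
        if nxt == ",":
            pos += 1
        elif nxt == "]":
            return
        else:
            raise ValueError("expected ',' or ']' after array element")


# Small demonstration with an in-memory file and a deliberately tiny chunk.
sample = io.StringIO('[{"id": 1}, {"id": 2}, 3.5]')
for item in iter_array_items(sample, chunk_size=8):
    print(item)
```

For a real file this would be called as something like
iter_array_items(open(path, encoding="utf-8")). Two limitations of the
sketch: a malformed element makes it read the rest of the file before
the error is raised, and a single very large element is re-parsed each
time another chunk is appended.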

I couldn't see a way to get the stdlib json library to read "just as
much as needed" in this way. Did I miss a trick? Or alternatively, is
there a JSON decoder library on PyPI that supports this sort of usage?
I'd rather not have to implement my own JSON parser if I can avoid it.

Thanks,
Paul


