Relaxed, or best-efforts JSON parser for Python?

victor.hooi at 10gen.com
Mon Oct 12 18:43:07 EDT 2015


On Monday, October 12, 2015 at 10:02:13 PM UTC+11, Laura Creighton wrote:
> In a message of Sun, 11 Oct 2015 17:56:33 -0700, Victor Hooi writes:
> >Hi,
> >
> >I'm attempting to parse MongoDB loglines.
> >
> >The formatting of these loglines could best be described as JSON-like...
> >
> >For example - arrays 
> >
> >Anyhow, say I had the following logline snippet:
> >
> >    { Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } }
> >
> >This won't parse with json.loads() - the main issue is the missing quotation marks (") around the strings.
> >
> >My question: is there a more lenient, or relaxed, JSON parser available for Python that will make a best-efforts attempt at parsing non-spec JSON?
> >
> >Cheers,
> >Victor
> 
> Won't this 
> http://blog.mongodb.org/post/85123256973/introducing-mtools
> https://github.com/rueckstiess/mtools
> https://pypi.python.org/pypi/mtools/1.1.3
> 
> be better? :)

Hi,

@MRAB - Thanks for the tip. I did think of doing that as well; it's what we (MongoDB) do internally for a few of our tools, but I was really hoping to avoid going down the regex route. However, this is what I'm doing for now:

    import re
    locks = re.sub(r"(\w+):", r'"\g<1>":', locks)  # wrap each bare key in double quotes
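For reference, here is a self-contained sketch of the same regex workaround, run end to end on the snippet from the original question. It is a slightly tightened variant of the one-liner: it only quotes a word when it directly follows `{` or `,` and precedes `:`, so a colon inside an already-quoted value (e.g. `"x:y"`) is left alone. `loads_bare_keys` is a hypothetical helper name, not a library API, and this is still not a real parser: a quoted string that happens to contain `, word:` would be mangled.

```python
import json
import re

# Hypothetical helper (not part of any library): quote bare keys so that
# json.loads() accepts MongoDB's JSON-like lock documents.
# A "key" here is a run of word characters that directly follows "{" or ","
# (plus optional whitespace) and is followed by ":".
BARE_KEY = re.compile(r'([{,]\s*)(\w+)\s*:')


def loads_bare_keys(text: str) -> dict:
    # Rewrite `{ Global:` as `{ "Global":`, then hand off to the stdlib parser.
    quoted = BARE_KEY.sub(r'\1"\2":', text)
    return json.loads(quoted)


snippet = ("{ Global: { acquireCount: { r: 2, w: 2 } }, "
           "Database: { acquireCount: { w: 2 } }, "
           "Collection: { acquireCount: { w: 1 } }, "
           "oplog: { acquireCount: { w: 1 } } }")

locks = loads_bare_keys(snippet)
print(locks["Global"]["acquireCount"])  # {'r': 2, 'w': 2}
```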

@Random832 - No, it's not YAML. The MongoDB log format is... sort of JSON, but not. IMHO, it's a bit of an ugly mess: string fields aren't quoted, you get random custom types, and brackets aren't necessarily balanced (e.g. long loglines get truncated at 10K characters). I could go on.
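Since a truncated logline leaves brackets unclosed, one cheap best-effort guard is to check bracket balance before trying to parse, and skip lines that fail. A minimal sketch; `braces_balanced` is a hypothetical helper, and it assumes any quoted values use double quotes with backslash escapes, as in JSON.

```python
def braces_balanced(text: str) -> bool:
    """Return True if every { and [ in `text` is closed in the right order.

    Brackets inside double-quoted strings are ignored.
    """
    pairs = {"}": "{", "]": "["}
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Inside a string: only track escapes and the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]":
            # A closer with no opener, or the wrong kind of opener, is unbalanced.
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Leftover openers (e.g. a truncated line) or an unterminated string fail.
    return not stack and not in_string


print(braces_balanced("{ a: [1, 2 }"))  # False
```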

@Laura Creighton - Yup, mtools is actually written by a colleague of mine =). Awesome guy. He does a lot of stuff to work around the idiosyncrasies of the MongoDB log format. However, using the full module carries quite a bit of overhead for this use case: I only need to parse one specific "locks" document from a logline, so I was hoping for a clean way to pull it out and parse it. Here, the only issue that could hit us (AFAIK) is the lack of quotes around the field names. If they ever introduced a field name with spaces in it... I don't know what would happen, lol.
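Pulling out just the "locks" document can be done without a full parser by finding `locks:` and scanning forward to the matching closing brace. A minimal sketch under stated assumptions: `extract_document` is a hypothetical helper, the sample logline below is made up (only loosely modelled on the format discussed in this thread), and the scan assumes no braces appear inside quoted strings within the document.

```python
import re
from typing import Optional


def extract_document(line: str, field: str) -> Optional[str]:
    """Return the {...} document that follows `field:` in `line`, or None.

    Hypothetical helper; assumes no braces inside quoted strings.
    """
    match = re.search(r"\b" + re.escape(field) + r":\s*\{", line)
    if match is None:
        return None  # field not present in this logline
    start = match.end() - 1  # index of the opening "{"
    depth = 0
    for i in range(start, len(line)):
        if line[i] == "{":
            depth += 1
        elif line[i] == "}":
            depth -= 1
            if depth == 0:
                return line[start:i + 1]
    return None  # line was truncated before the document closed


# Made-up logline for illustration only.
line = ("2015-10-12T18:43:07.123+1100 I WRITE [conn42] insert test.foo "
        "ninserted:1 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, "
        "Database: { acquireCount: { w: 2 } } } 0ms")
print(extract_document(line, "locks"))
```

The returned string can then go through the `re.sub()` quoting and `json.loads()`; a `None` result signals a missing or truncated document that can simply be skipped. As for field names with spaces: `\w+` would only quote the last word before the colon, so `json.loads()` would most likely raise `json.JSONDecodeError` rather than silently return wrong data.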



