Which one is the best XML-parser?

Marko Rauhamaa marko at pacujo.net
Fri Jun 24 09:16:27 EDT 2016


Random832 <random832 at fastmail.com>:
> You know what would be really nice? A "semi-incremental" parser that
> can e.g. yield (whether through an event or through the iterator
> protocol) a fully formed element (preferably one that can be queried
> with xpath) at a time for each record of a document representing a
> list of objects. Does anything like that exist?

You can construct that from a SAX parser, but it's less convenient than
it could be. Python's JSON parser has no incremental mode either, so I've
had to build a clumsy one myself:

            def decode_json_object_array(self):
                # A very clumsy implementation of an incremental JSON decoder.
                # Needs "import json"; self.get_text() must return an iterator
                # of text chunks.
                it = self.get_text()
                inbuf = ""
                while True:
                    try:
                        inbuf += next(it)
                    except StopIteration:
                        # a premature end; trigger a decode error
                        json.loads("[" + inbuf)
                        # no "[" was ever seen, so fail even if that parsed
                        raise ValueError("no JSON array in input")
                    try:
                        head, tail = inbuf.split("[", 1)
                    except ValueError:
                        continue
                    break
                # trigger a decode error if head contains junk
                json.loads(head + "[]")
                inbuf = ""
                chunk = tail
                while True:
                    bracket_maybe = ""
                    for big in chunk.split("]"):
                        # put back the "]" that split() removed before this piece
                        inbuf += bracket_maybe
                        bracket_maybe = "]"
                        comma_maybe = ""
                        for small in big.split(","):
                            if inbuf.strip():
                                # inside an element: the comma belongs to it
                                inbuf += comma_maybe + small
                            else:
                                # between elements: the comma is only a separator
                                inbuf = small
                            comma_maybe = ","
                            try:
                                yield json.loads(inbuf)
                            #except json.JSONDecodeError:
                            except ValueError: # legacy exception
                                pass
                            else:
                                inbuf = ""
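                    try:
                        # scan only the new text on the next round
                        chunk = next(it)
                    except StopIteration:
                        break
                    # keep the whole array text for the final check
                    tail += chunk
                # trigger a decode error if the array as a whole is malformed
                json.loads("[" + tail)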
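
For comparison, here is a minimal sketch of the same idea built on
json.JSONDecoder.raw_decode() from the standard library instead of
trial-and-error splitting. The names iter_json_array and chunks are made
up for illustration, and it assumes the input is an iterable of text
chunks that together form a single JSON array:

```python
import json

JSON_WS = " \t\r\n"  # the only whitespace JSON allows between tokens


def iter_json_array(chunks):
    """Yield the elements of one JSON array as soon as each is complete."""
    decoder = json.JSONDecoder()
    buf = ""
    state = "start"  # start -> value <-> sep -> done
    first = True     # no element seen yet, so "[]" is still possible
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip(JSON_WS)
            if not buf:
                break
            if state == "start":
                if buf[0] != "[":
                    raise ValueError("expected '['")
                buf, state = buf[1:], "value"
            elif state == "value":
                if first and buf[0] == "]":
                    buf, state = buf[1:], "done"
                    continue
                try:
                    obj, end = decoder.raw_decode(buf)
                except ValueError:
                    break  # element not complete yet; wait for more text
                if end == len(buf):
                    break  # e.g. "12" may still grow into "123"
                yield obj
                buf, state, first = buf[end:], "sep", False
            elif state == "sep":
                if buf[0] == ",":
                    buf, state = buf[1:], "value"
                elif buf[0] == "]":
                    buf, state = buf[1:], "done"
                else:
                    raise ValueError("expected ',' or ']'")
            else:
                raise ValueError("trailing data after the array")
    if state != "done":
        raise ValueError("premature end of JSON array")
```

Something like list(iter_json_array(['[{"a":', '1},', ' 2]'])) should
then produce the elements one by one. An element that never parses is
only reported at the end of the input, which is a limitation of this
sketch.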

It could easily be converted to an analogous XML parser.
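
As a rough illustration of what the XML counterpart might look like,
here is a sketch based on xml.etree.ElementTree.XMLPullParser from the
standard library rather than on SAX. The name iter_xml_records is made
up, and the sketch assumes that every direct child of the root element
is one record:

```python
import xml.etree.ElementTree as ET


def iter_xml_records(chunks):
    """Yield each direct child of the root element once it is complete."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    root = None

    def drain():
        nonlocal depth, root
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    yield elem
                    # detach the record so the tree does not keep growing
                    root.remove(elem)

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    # close() raises ParseError if the document is incomplete
    parser.close()
    # pick up any events the parser held back until the end
    yield from drain()
```

Each yielded element is an ordinary Element, so the limited XPath
support of ElementTree is available on it, for example
elem.findtext("name") or elem.findall("./item[@id]").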


Marko


