[BangPypers] parsing xml

Fri Sep 30 10:06:01 CEST 2011

On Fri, Jul 29, 2011 at 10:47 AM, Anand Chitipothu <anandology at gmail.com> wrote:

> 2011/7/28 Venkatraman S <venkat83 at gmail.com>:
> > parsing using minidom is one of the slowest. if you just want to extract
> > the distance and assuming that it (the tag) will always be consistent,
> > then i would always suggest regexp. xml parsing is a pain.
>
> regexp is a bad solution to parse xml.
>

Partly because the answer is loosely related and partly because of the
humour quotient, I thought this response to using regexes to parse HTML
(which is perhaps more challenging in general than XML) was quite an
interesting read. Note that this response could be considered a bit OT, so
don't take it too seriously in the context of this thread's discussion.

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

>
> minidom is the fastest solution if you consider the programmer time
> instead of execution time.  Minidom is available in the standard library,
> so you don't have to add another dependency and worry about PyPI
> downtimes and lxml compilation failures.
>
> I don't think there will be a significant performance difference between
> regexp and minidom unless you are doing it a million times.
>
>
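
To make the trade-off above a bit more concrete, here is a rough sketch that
pulls a distance value out of a small document with both minidom and a
regexp. The XML layout, the "distance" tag name and the helper function names
are all assumptions made up for illustration (the thread does not show the
real data), so treat it as a sketch rather than the way to do it.

```python
import re
import timeit
from xml.dom import minidom

# Hypothetical sample document; the real XML from the thread is not shown.
SAMPLE = """<route>
  <leg>
    <distance unit="km">12.5</distance>
  </leg>
</route>"""

# Same data, plus a comment that merely looks like the tag we want.
TRICKY = """<route>
  <!-- <distance unit="km">0</distance> -->
  <leg>
    <distance
        unit="km">12.5</distance>
  </leg>
</route>"""

# Naive pattern: an opening tag with optional attributes, text, closing tag.
DISTANCE_RE = re.compile(r"<distance[^>]*>([^<]*)</distance>")


def distance_minidom(xml_text):
    """Parse the whole document and read the first <distance> element."""
    doc = minidom.parseString(xml_text)
    node = doc.getElementsByTagName("distance")[0]
    return float(node.firstChild.data)


def distance_regex(xml_text):
    """Grab the first thing that textually looks like a <distance> tag."""
    match = DISTANCE_RE.search(xml_text)
    return float(match.group(1)) if match else None


def time_both(number=1000):
    """Return (minidom_seconds, regex_seconds) for `number` runs each."""
    return (
        timeit.timeit(lambda: distance_minidom(SAMPLE), number=number),
        timeit.timeit(lambda: distance_regex(SAMPLE), number=number),
    )


if __name__ == "__main__":
    print(distance_minidom(SAMPLE), distance_regex(SAMPLE))
    print(distance_minidom(TRICKY), distance_regex(TRICKY))
    print(time_both())
```

On TRICKY the regexp picks up the commented-out tag, while minidom skips the
comment and returns the real element. That is the kind of surprise the
"tag will always be consistent" assumption has to rule out. time_both() is
there so the speed question can be measured on your own data instead of
guessed at.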