[BangPypers] parsing xml

Dhananjay Nene dhananjay.nene at gmail.com
Sun Jul 31 19:28:58 CEST 2011


On Thu, Jul 28, 2011 at 3:18 PM, Kenneth Gonsalves <lawgon at gmail.com> wrote:

> hi,
>
> here is a simplified version of an xml file:
>
> <?xml version="1.0" encoding="UTF-8"?>
>    <gpx >
>        <metadata>
>                <author>
>                <name>CloudMade</name>
>                <email id="support" domain="cloudmade.com" />
>                <link href="http://maps.cloudmade.com"></link>
>                </author>
>                <copyright author="CloudMade">
>                <license>http://cloudmade.com/faq#license</license>
>                </copyright>
>                <time>2011-07-28T07:04:01</time>
>        </metadata>
>            <extensions>
>                <distance>1489</distance>
>                <time>344</time>
>                <start>Sägerstraße</start>
>                <end>Im Gisinger Feld</end>
>            </extensions>
>    </gpx>
>
> I want to get the value of the distance element - 1489. What is the
> simplest way of doing this?
>

re.search(r"<distance>\s*(\d+)\s*</distance>", data).group(1)

would appear to be the most succinct and quite fast. Adjust for whitespace
as and if necessary.
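
A slightly fuller sketch of the one-liner above, in case it helps. The
sample string and the function name distance_by_regex are made up for
illustration; the sample is a trimmed copy of the quoted file, and returning
None when nothing matches is my own assumption about what you would want.

```python
import re

# Hypothetical sample: a trimmed version of the gpx file quoted above.
sample_gpx = """<gpx>
    <extensions>
        <distance>1489</distance>
        <time>344</time>
    </extensions>
</gpx>"""


def distance_by_regex(text):
    """Return the first <distance> value as an int, or None if absent."""
    # Raw string so that \s and \d reach the regex engine untouched.
    match = re.search(r"<distance>\s*(\d+)\s*</distance>", text)
    return int(match.group(1)) if match else None


print(distance_by_regex(sample_gpx))
```

Note that the pattern only handles a bare <distance> tag holding digits; a
tag with attributes or a decimal value would need the pattern adjusted.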

Yet I would probably use the minidom-based approach if I was sure the input
was likely to continue to be xml. Anand C's solution (elsewhere in the
thread) reflects the programmer's intent in a simpler, less obfuscated form
(both correctly working solutions will communicate the intent with exactly
the same precision - the precision required to make the program work).
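
For reference, a minimal sketch of what a minidom-based lookup could look
like. This is not Anand C's code (that is elsewhere in the thread), just my
own illustration using the standard library's xml.dom.minidom; the sample
string and the name distance_by_minidom are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample: a trimmed version of the gpx file quoted above.
sample_xml = """<gpx>
    <extensions>
        <distance>1489</distance>
        <time>344</time>
    </extensions>
</gpx>"""


def distance_by_minidom(xml_text):
    """Return the text of the first <distance> element as an int, or None."""
    doc = minidom.parseString(xml_text)
    nodes = doc.getElementsByTagName("distance")
    # No such element, or an empty one: report that nothing was found.
    if not nodes or nodes[0].firstChild is None:
        return None
    return int(nodes[0].firstChild.data.strip())


print(distance_by_minidom(sample_xml))
```

The code names the element it wants and asks the parser for it, which is
what I mean by the intent being less obfuscated than in the regex.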

As far as optimisation goes - I can see at least 3 options

a. the minidom performance is acceptable - no further optimisation required
b. minidom performance is not acceptable - try the regex one
c. python library performance is not acceptable - switch to C

I can imagine people starting with a and then deciding to move along the
path a->b->c if and as necessary.
I believe starting with b risks obfuscating code (imo regex is obfuscated
compared to xml nodes - YMMV).
I don't know of any python programmers who are speed-maniacs. I am worried
anytime someone programs in something other than assembly/machine code and
uses the latter word. The rest of us are just trading off development speed
vs. runtime speed.
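
Whether to move from a to b can be measured rather than guessed. A rough
sketch using the standard timeit module follows; the sample string and the
names with_minidom, with_regex and compare are hypothetical, and the numbers
will depend entirely on your machine and on the size of your real files.

```python
import re
import timeit
from xml.dom import minidom

# Hypothetical, very small sample; real files will behave differently.
SAMPLE = ("<gpx><extensions><distance>1489</distance>"
          "<time>344</time></extensions></gpx>")


def with_minidom(text):
    """Option a: parse the document and read the element's text."""
    node = minidom.parseString(text).getElementsByTagName("distance")[0]
    return node.firstChild.data


def with_regex(text):
    """Option b: pull the digits out with a regular expression."""
    return re.search(r"<distance>\s*(\d+)\s*</distance>", text).group(1)


def compare(repeat=1000):
    """Return total seconds taken by each approach over `repeat` runs."""
    return {
        "minidom": timeit.timeit(lambda: with_minidom(SAMPLE), number=repeat),
        "regex": timeit.timeit(lambda: with_regex(SAMPLE), number=repeat),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(name, seconds)
```

If the minidom figure is acceptable for your workload, option a is where I
would stop.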

Dhananjay

