Questions about XML processing?

Shaozhong SHI shishaozhong at gmail.com
Sat Nov 7 07:14:32 EST 2020


Hi, Hernan,

Did you try to parse GML?

Surely there must be more concise and smarter ways to do these things.

Regards,

David

On Fri, 6 Nov 2020 at 20:57, Hernán De Angelis <variablestarlight at gmail.com>
wrote:

> Thank you Terry, Dan and Dieter for encouraging me to post here. I have
> already solved the problem albeit with a not so efficient solution.
> Perhaps, it is useful to present it here anyway in case some light can
> be added to this.
>
> My job is to parse a complicated XML document (ISO metadata) and pick up
> the values of certain fields under certain conditions. For the most part
> this goes well. I am working with xml.etree.ElementTree, which has proved
> sufficient for the rest of the project. JSON is not an option
> within this project.
>
> The specific trouble was in this section, itself the child of a more
> complicated parent: (for simplicity tags are renamed and namespaces
> removed)
>
>            <tagA>
>              <tagB>
>                <tagC>
>                  <string>Something</string>
>                </tagC>
>                <tagC>
>                  <string>Something else</string>
>                </tagC>
>                <tagC>
>                  <note>
>                    <title>
>                      <string>value</string>
>                    </title>
>                    <date0>
>                      <date1>
>                        <date2>
>                          <gco:Date>2020-11-06</gco:Date>
>                        </date2>
>                        <dateType>
>                          <code blah lots of strange things blah />
>                        </dateType>
>                      </date1>
>                    </date0>
>                  </note>
>                </tagC>
>              </tagB>
>            </tagA>
>
> Basically, I have to get what is in tagC/string but only if the value of
> tagC/note/title/string is "value". As you see, there are several tagC,
> all children of tagB, but tagC can have different meanings(!). And no, I
> have no control over how these XML fields are constructed.
>
> In principle it is easy to make a "findall" and get strings for tagC,
> using:
>
> elem.findall("./tagA/tagB/tagC/string")
>
> and then get the content and append in case there is more than one
> tagC/string like: "Something, Something else".
>
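
A minimal sketch of that findall-and-join step, assuming the simplified, namespace-free tag names quoted above. The `<root>` wrapper and the SAMPLE string are made up for illustration; they are not from the real metadata file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the simplified structure quoted above.
SAMPLE = """
<root>
  <tagA>
    <tagB>
      <tagC><string>Something</string></tagC>
      <tagC><string>Something else</string></tagC>
    </tagB>
  </tagA>
</root>
"""

elem = ET.fromstring(SAMPLE)

# findall returns the matching elements in document order.
found = elem.findall("./tagA/tagB/tagC/string")

# Join the text of each match, skipping elements whose text is empty or None.
joined = ", ".join(e.text for e in found if e.text)
print(joined)
```

With real ISO metadata the tags carry namespaces, so the paths would need the namespace prefixes that were stripped out of this simplified version.
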
> However, the hard thing to do here is to get those only when
> tagC/note/title/string='value'. I was expecting to find a way of
> specifying a certain construction in square brackets, like
> [@string='value'] or [@/tagC/note/title/string='value'], as is usual in
> XML and possible in xml.etree. However this proved difficult (at least
> for me). So this is the "brute" solution I implemented:
>
> - find all children of tagA/tagB
> - check if /tagA/tagB/tagC/note/title/string has "value"
> - if yes find all tagA/tagB/tagC/string
>
> In quasi-Python:
>
> strings = []
> # Iterate over the children of tagA/tagB (the tagC elements).
> for child in elem.findall("./tagA/tagB/*"):
>     # Paths are relative to the child, so no tagA/tagB prefix here.
>     title = child.find("./note/title/string")
>     if title is not None and title.text == 'value':
>         # Collect every tagC/string, searching from the top again.
>         for s in elem.findall("./tagA/tagB/tagC/string"):
>             strings.append(s.text)
>
>
> Crude, but works. As I wrote above, I was wishing that a bracketed
> clause of the type [@ ...] already in the first "findall" would do a
> more efficient job but alas my knowledge of xml is too rudimentary.
> Perhaps something to tinker on in the coming weeks.
>
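
On the bracketed clause: in XPath the "@" prefix refers to attributes, so [@string='value'] looks for an attribute named "string" rather than a child element. ElementTree's limited XPath subset does support a child-text predicate of the form [tag='text'], which may be close to what you were after. Below is a rough sketch, not a definitive solution. It assumes the simplified, namespace-free tag names from the message. The `<root>` wrapper, the SAMPLE data and the helper name `strings_for_title` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first tagB carries the wanted note title,
# the second one does not.
SAMPLE = """
<root>
  <tagA>
    <tagB>
      <tagC><string>Something</string></tagC>
      <tagC><string>Something else</string></tagC>
      <tagC>
        <note>
          <title><string>value</string></title>
        </note>
      </tagC>
    </tagB>
    <tagB>
      <tagC><string>Ignored</string></tagC>
      <tagC>
        <note>
          <title><string>other</string></title>
        </note>
      </tagC>
    </tagB>
  </tagA>
</root>
"""


def strings_for_title(elem, wanted):
    """Collect tagC/string text from every tagB that has a matching note title."""
    # The predicate selects <title> elements that have a <string> child
    # whose text equals `wanted` (no attribute lookup, hence no "@").
    marker = f"./tagC/note/title[string='{wanted}']"
    result = []
    for tag_b in elem.findall("./tagA/tagB"):
        if tag_b.find(marker) is not None:
            # Only direct tagC/string children; the nested title string is skipped.
            result.extend(s.text for s in tag_b.findall("./tagC/string"))
    return result


elem = ET.fromstring(SAMPLE)
print(strings_for_title(elem, "value"))
```

The check is done per tagB, so strings from a tagB without the matching note are left out. Note that `wanted` is pasted straight into the path, so this sketch would break on a value containing a single quote.
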
> Have a nice weekend!
>
>
>
>
>
> On 2020-11-06 20:10, Terry Reedy wrote:
> > On 11/6/2020 11:17 AM, Hernán De Angelis wrote:
> >> I am confronting some XML parsing challenges and would like to ask
> >> some questions to more knowledgeable Python users. Apparently there
> >> exists a group for such questions but that list (xml-sig) has
> >> apparently not received (or archived) posts since May 2018(!). I
> >> wonder if there is another list or forum for Python XML questions, or
> >> if this list would be fine for that.
> >
> > If you don't hear otherwise, try here.  Or try stackoverflow.com and
> > tag questions with python and xml.
> >
> >