Questions about XML processing?

Hernán De Angelis variablestarlight at gmail.com
Sat Nov 7 07:24:01 EST 2020


No, it is XML metadata. I also believe there should be a better way using
[@...] expressions in the path.

H.

On Sat, 7 Nov 2020 at 13:14, Shaozhong SHI <shishaozhong at gmail.com> wrote:

> Hi, Hernan,
>
> Did you try to parse GML?
>
> Surely, there can be very concise and smart ways to do these things.
>
> Regards,
>
> David
>
> On Fri, 6 Nov 2020 at 20:57, Hernán De Angelis <
> variablestarlight at gmail.com> wrote:
>
>> Thank you Terry, Dan and Dieter for encouraging me to post here. I have
>> already solved the problem, albeit with a not-so-efficient solution.
>> Perhaps it is useful to present it here anyway, in case someone can
>> shed some light on it.
>>
>> My job is to parse a complicated XML document (ISO metadata) and pick
>> up the values of certain fields under certain conditions. For the most
>> part this goes well. I am working with xml.etree.ElementTree, which has
>> proved sufficient for this and for the rest of the project. JSON is not
>> an option within this project.
>>
>> The specific trouble was in this section, itself the child of a more
>> complicated parent: (for simplicity tags are renamed and namespaces
>> removed)
>>
>>            <tagA>
>>              <tagB>
>>                <tagC>
>>                  <string>Something</string>
>>                </tagC>
>>                <tagC>
>>                  <string>Something else</string>
>>                </tagC>
>>                <tagC>
>>                  <note>
>>                    <title>
>>                      <string>value</string>
>>                    </title>
>>                    <date0>
>>                      <date1>
>>                        <date2>
>>                          <gco:Date>2020-11-06</gco:Date>
>>                        </date2>
>>                        <dateType>
>>                          <code blah lots of strange things blah />
>>                        </dateType>
>>                      </date1>
>>                    </date0>
>>                  </note>
>>                </tagC>
>>              </tagB>
>>            </tagA>
>>
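The snippet still carries one prefixed tag, gco:Date. In the real ISO files the tags are namespaced, and ElementTree's find() and findall() accept an optional namespaces mapping from prefix to URI. A minimal sketch follows; the trimmed document is hypothetical, and the URI http://www.isotc211.org/2005/gco is assumed to be the ISO 19139 gco namespace, so check the xmlns:gco declaration in your own files.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed document; the xmlns:gco URI is an assumption.
XML = """<date2 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gco:Date>2020-11-06</gco:Date>
</date2>"""

# Prefix -> namespace URI mapping used when evaluating the path.
NS = {"gco": "http://www.isotc211.org/2005/gco"}

root = ET.fromstring(XML)
date = root.find("gco:Date", NS)
print(date.text)  # 2020-11-06

# The same element addressed with the expanded {uri}localname form.
same = root.find("{http://www.isotc211.org/2005/gco}Date")
print(same is date)  # True
```

Only the URI has to match: the prefix used in the path is just a key into the mapping and need not be the one the document itself uses.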
>> Basically, I have to get what is in tagC/string, but only if the value
>> of tagC/note/title/string (found in another tagC under the same tagB) is
>> "value". As you can see, there are several tagC elements, all children
>> of tagB, but a tagC can have different meanings (!). And no, I have no
>> control over how these XML fields are constructed.
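Because the tagC siblings differ only in what they contain, ElementTree's [tag] predicate, which keeps elements that have a child with that tag, is one way to tell them apart. A minimal sketch on a trimmed copy of the example above, wrapped in a hypothetical root element with the date block left out:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example; <root> is a hypothetical wrapper and
# the date0 subtree is omitted for brevity.
XML = """<root><tagA><tagB>
  <tagC><string>Something</string></tagC>
  <tagC><string>Something else</string></tagC>
  <tagC><note><title><string>value</string></title></note></tagC>
</tagB></tagA></root>"""

elem = ET.fromstring(XML)

# tagC elements that have a direct <string> child: the plain values.
plain = elem.findall("./tagA/tagB/tagC[string]")
# tagC elements that have a direct <note> child: the descriptive one.
noted = elem.findall("./tagA/tagB/tagC[note]")

print([c.find("string").text for c in plain])  # ['Something', 'Something else']
print(len(noted))  # 1
```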
>>
>> In principle it is easy to do a findall and get the strings for tagC,
>> using:
>>
>> elem.findall("./tagA/tagB/tagC/string")
>>
>> and then take the text of each match, joining them when there is more
>> than one tagC/string, e.g. "Something, Something else".
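The unconditional version described here can be sketched as follows, again on a trimmed copy of the example wrapped in a hypothetical root element. Each / step selects direct children only, so tagC/string does not pick up the nested note/title/string.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example; <root> is a hypothetical wrapper.
XML = """<root><tagA><tagB>
  <tagC><string>Something</string></tagC>
  <tagC><string>Something else</string></tagC>
  <tagC><note><title><string>value</string></title></note></tagC>
</tagB></tagA></root>"""

elem = ET.fromstring(XML)

# Direct-child steps only: note/title/string is not matched here.
texts = [s.text for s in elem.findall("./tagA/tagB/tagC/string")]
joined = ", ".join(texts)
print(joined)  # Something, Something else
```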
>>
>> However, the hard part is to get those only when
>> tagC/note/title/string='value'. I was expecting to find a way of
>> specifying such a condition in square brackets, like [@string='value']
>> or [@/tagC/note/title/string='value'], as is usual in XPath and partly
>> supported by xml.etree. However, this proved difficult (at least for
>> me). So this is the "brute-force" solution I implemented:
>>
>> - find all tagA/tagB elements
>> - check whether tagC/note/title/string under each one has "value"
>> - if so, collect all tagC/string under that same tagB
>>
>> In Python:
>>
>> strings = []
>> # elem is the parent element that contains tagA
>> for tagB in elem.findall("./tagA/tagB"):
>>     # Does a tagC under this tagB carry note/title/string == 'value'?
>>     title = tagB.find("./tagC/note/title/string")
>>     if title is not None and title.text == 'value':
>>         # If so, collect the text of every tagC/string under this tagB
>>         for s in tagB.findall("./tagC/string"):
>>             strings.append(s.text)
>>
>>
>> Crude, but it works. As I wrote above, I was hoping that a bracketed
>> clause of the type [@...] in the first findall would do a more
>> efficient job, but alas my knowledge of XML is too rudimentary.
>> Perhaps something to tinker with in the coming weeks.
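ElementTree's limited XPath subset has no predicate over a nested path: @ refers to attributes, and [tag='text'] accepts only a single child tag name. It does support .. (parent), so one possible single-findall sketch filters on the title element and then climbs back to tagB. This assumes the layout of the example (title three levels below tagB); the root wrapper is hypothetical. Since [tag='text'] compares the complete text of the child, descendants and whitespace included, the filter is placed on title, whose string child holds exactly 'value'.

```python
import xml.etree.ElementTree as ET

# Copy of the example; <root> is a hypothetical wrapper.
XML = """<root><tagA><tagB>
  <tagC><string>Something</string></tagC>
  <tagC><string>Something else</string></tagC>
  <tagC><note>
    <title><string>value</string></title>
    <date0><date1><date2>2020-11-06</date2></date1></date0>
  </note></tagC>
</tagB></tagA></root>"""

elem = ET.fromstring(XML)

# title[string='value'] keeps <title> elements whose <string> child has
# the full text 'value'; each '..' then moves up one level
# (title -> note -> tagC -> tagB) before descending to tagC/string.
path = "./tagA/tagB/tagC/note/title[string='value']/../../../tagC/string"
strings = [s.text for s in elem.findall(path)]
print(strings)  # ['Something', 'Something else']
```

Each .. step yields a given parent only once, so the plain strings are not duplicated even if several titles match. If a third-party dependency is acceptable, lxml's xpath() method implements full XPath 1.0, where a nested predicate such as tagA/tagB[tagC/note/title/string='value']/tagC/string is allowed.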
>>
>> Have a nice weekend!
>>
>> On 2020-11-06 20:10, Terry Reedy wrote:
>> > On 11/6/2020 11:17 AM, Hernán De Angelis wrote:
>> >> I am confronting some XML parsing challenges and would like to ask
>> >> more knowledgeable Python users some questions. There is apparently
>> >> a group for such questions, but that list (xml-sig) has not received
>> >> (or archived) any posts since May 2018 (!). I wonder if there are
>> >> other lists or forums for Python XML questions, or if this list would
>> >> be fine for that.
>> >
>> > If you don't hear otherwise, try here.  Or try stackoverflow.com and
>> > tag questions with python and xml.
>> >
>> >
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>

