lxml, comparing nodes

Stefan Behnel stefan_ml at behnel.de
Thu Jul 24 12:26:16 EDT 2008


code_berzerker wrote:
> Thanks for the help. That's inspiring, though not exactly what I need,
> because ignoring document order is a requirement (ignoring changes in
> the order of different siblings of the same type, etc.). I plan to try
> something like this:
> 
> def xmlCmp(xmlStr1, xmlStr2):
>   et1 = etree.XML(xmlStr1)
>   et2 = etree.XML(xmlStr2)
> 
>   queue = []
>   tmpq = deque([et1])
>   tmpq2 = deque([et2])
> 
>   while tmpq:
>     el = tmpq.popleft()
>     tmpq.extend(el)
>     queue.append(el.tag)
> 
>   while queue:
>     el = queue.pop()
>     foundEl = findMatchingElem(el, et2)
>     if foundEl:
>       et1.remove(el)
>       tmpq2.remove(foundEl)
>     else:
>       return False
> 
>   if len(tmpq2) == 0:
>     return True
>   else:
>     return False

If document order doesn't matter, try sorting the elements at each level of
the two documents by some arbitrary but deterministic key, such as (tag name,
text, attribute count, ...), and then compare them in order, instead of trying
to find matches in multiple passes. itertools.groupby() might be your friend here.
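
A minimal sketch of the sort-then-compare idea, under a few assumptions that
are not from the thread: the names canonical() and xml_equal_unordered() are
made up for illustration, the sort key is the whole recursively normalised
subtree (one possible deterministic key among many), surrounding whitespace
in text is treated as insignificant, and comments and processing
instructions are skipped. It falls back to the standard library's
ElementTree if lxml is not installed, since only calls common to both are
used. itertools.groupby() is not needed in this variant.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib module is close enough for the calls used here.
    import xml.etree.ElementTree as etree


def canonical(el):
    """Return a nested tuple describing el, with its children sorted.

    Hypothetical helper: two elements get equal tuples exactly when they
    match in tag, stripped text/tail, attributes, and (recursively) in
    their children, regardless of sibling order.
    """
    children = sorted(
        canonical(child)
        for child in el
        # In lxml, comments and processing instructions have a non-string
        # tag; skip them so that only real elements are compared.
        if isinstance(child.tag, str)
    )
    return (
        el.tag,
        (el.text or "").strip(),   # assumption: whitespace is insignificant
        (el.tail or "").strip(),
        tuple(sorted(el.attrib.items())),
        tuple(children),
    )


def xml_equal_unordered(xml_str1, xml_str2):
    """Compare two XML strings while ignoring sibling order."""
    return canonical(etree.XML(xml_str1)) == canonical(etree.XML(xml_str2))
```

Because each level is sorted once and then compared position by position,
duplicate siblings are handled naturally: a document with two identical <b/>
children does not compare equal to one with a single <b/>.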

Stefan


