lxml, comparing nodes

Stefan Behnel stefan_ml at behnel.de
Fri Jul 25 07:02:09 EDT 2008


code_berzerker wrote:
>> If document order doesn't matter, try sorting the elements of each level in
>> the two documents by some arbitrary deterministic key, such as (tag name,
>> text, attr count, whatever), and then compare them in order, instead of trying
>> to find matches in multiple passes. itertools.groupby() might be your friend here.
> 
> I think that sorting multiple times by each attribute will cost more
> than the approach I've come up with:
[...]
>   let1 = list(et1.iter())
>   let2 = list(et2.iter())
> 
[...]
>   while let1:
>     el = let1.pop(0)
>     foundEl = findMatchingElem(el, let2)
>     if foundEl is None:
>       return False
>     let2.remove(foundEl)
>   return True
> 
> def findMatchingElem(el, eList):
>   for elem in eList:
>     if elemsEqual(el, elem):
>       return elem
>   return None
[...]
> Notice that if documents are in exact same order, each element is
> compared only once!

Not in your code.
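
For reference, the sort-then-compare idea quoted at the top could be sketched
roughly as below. This is only a minimal sketch under a few assumptions: it
uses the standard library's xml.etree.ElementTree so that it runs without
lxml (lxml.etree provides the same calls used here), the names canonical()
and equal_unordered() are made up for illustration, and the choice of what
goes into the key (tag, stripped text, stripped tail, attributes) is
arbitrary. Comments and processing instructions are not handled.

```python
import xml.etree.ElementTree as ET  # assumption: lxml.etree could be swapped in


def canonical(el):
    """Build a nested tuple describing el and its subtree.

    Children are sorted by their own canonical form, so two trees that
    differ only in the order of siblings produce the same tuple.
    """
    children = sorted(canonical(child) for child in el)
    return (
        el.tag,
        (el.text or "").strip(),
        (el.tail or "").strip(),
        tuple(sorted(el.attrib.items())),
        tuple(children),
    )


def equal_unordered(root1, root2):
    """Compare two trees while ignoring the order of sibling elements."""
    return canonical(root1) == canonical(root2)


a = ET.fromstring('<r><x k="1">t</x><y/></r>')
b = ET.fromstring('<r><y/><x k="1">t</x></r>')  # same content, siblings swapped
c = ET.fromstring('<r><y/><x k="2">t</x></r>')  # attribute value differs

print(equal_unordered(a, b))  # True
print(equal_unordered(a, c))  # False
```

Each level is sorted once and then compared in order, so no element has to be
searched for in a list of remaining candidates.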

Stefan


