NEWB: reverse traversal of xml file

Serge Orlov Serge.Orlov at gmail.com
Mon May 22 21:42:52 EDT 2006


manstey wrote:
> Hi,
>
> I have an xml file of about 140Mb like this:
>
> <book>
>   <record>
> ...
>      <wordpartWTS>1</wordpartWTS>
>   </record>
>   <record>
>     ...
>     <wordpartWTS>2</wordpartWTS>
>   </record>
>   <record>
> ...
>     <wordpartWTS>1</wordpartWTS>
>   </record>
> </book>
>
> I want to traverse it from bottom to top and add another field to each
> record         <totalWordPart>1</totalWordPart>
> which would give the highest value of wordpartWTS for each record for
> each word
>
> so if wordparts for the first ten records were 1 2 1 1 1 2 3 4 1 2
> I want totalWordPart to be 2 2 1 1 4 4 4 4 2 2
>
> I figure the easiest way to do this is to go thru the file backwards.
>
> Any ideas how to do this with an xml data file?

You don't need to go backwards. Iterate from the beginning, number the
words as you go, and group the records of each word with
itertools.groupby:

from itertools import groupby

def enumerate_words(parts):
    # Tag each part number with the index of the word it belongs to.
    # A new word starts whenever the part number fails to increase.
    word_num = 0
    prev = 0
    for part in parts:
        if prev >= part:
            word_num += 1
        prev = part
        yield word_num, part


def get_word_num(item):
    return item[0]

parts = 1,2,1,1,1,2,3,4,1,2
for word_num, word in groupby(enumerate_words(parts), get_word_num):
    parts_list = list(word)
    # Parts count up within a word, so the last part is the highest.
    max_part = parts_list[-1][1]
    for _, part_num in parts_list:
        print(max_part, part_num)

prints:

2 1
2 2
1 1
1 1
4 1
4 2
4 3
4 4
2 1
2 2
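
To apply the same grouping to the XML itself, something like the
following might work. It is only a sketch. It assumes every <record> is
a direct child of <book> and has exactly one <wordpartWTS> child holding
an integer. The function name add_total_word_part and the three-record
sample document are made up for illustration, and it uses
xml.etree.ElementTree, which builds the whole tree in memory.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

def add_total_word_part(book):
    """Append <totalWordPart> to every <record> under `book`, in place."""
    word_num = 0
    prev = 0
    numbered = []
    for record in book.findall("record"):
        part = int(record.findtext("wordpartWTS"))
        if prev >= part:  # part number did not increase: a new word starts
            word_num += 1
        prev = part
        numbered.append((word_num, part, record))

    for _, word in groupby(numbered, key=lambda item: item[0]):
        word = list(word)
        max_part = word[-1][1]  # parts count up, so the last one is the largest
        for _, _, record in word:
            ET.SubElement(record, "totalWordPart").text = str(max_part)

# Small stand-in for the real file, with wordparts 1 2 1.
sample = (
    "<book>"
    "<record><wordpartWTS>1</wordpartWTS></record>"
    "<record><wordpartWTS>2</wordpartWTS></record>"
    "<record><wordpartWTS>1</wordpartWTS></record>"
    "</book>"
)
book = ET.fromstring(sample)
add_total_word_part(book)
totals = [r.findtext("totalWordPart") for r in book.findall("record")]
print(totals)  # ['2', '2', '1']
```

For the real file you would load it with ET.parse(), pass
tree.getroot() to the function, and save the result with tree.write().
Whether a 140Mb document fits comfortably in memory that way depends on
your machine, so treat this as a starting point.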



