Controlling the passing of data

Peter Otten __peter__ at web.de
Fri Apr 29 07:36:00 EDT 2016


Sayth Renshaw wrote:

> 
>> 
>> Your actual problem is drowned in too much source code. Can you restate
>> it in English, optionally with a few small snippets of Python?
>> 
>> It is not even clear what the code you provide should accomplish once
>> it's running as desired.
>> 
>> To give at least one code-related advice: You have a few repetitions of
>> the following structure
>> 
>> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
>> > 'trackcondition')
>> 
>> >     meet = d('meeting')
>> 
>> >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]
>> 
>> You should move the pieces into a function that works for meetings,
>> clubs, races, and so on. Finally (If I am repeating myself so be it): the
>> occurrence of range(len(something)) in your code is a strong indication
>> that you are not using Python the way Guido intended it. Iterate over the
>> `something` directly whenever possible.
> 
> Hi Peter
> 
>> meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')
> 
> is created to define a list of the attrs in the XML. Rather than referencing
> each attr individually, I create a list and pass it into
> 
>> >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]
> 
> This list comprehension reads the XML attr by attr using meet =
> d('meeting') as the call to pyquery to locate the class in the XML and
> identify the attrs.

You misunderstood me. I do understand what your code does; I just have no 
idea what you want to do in terms of the domain, e.g.

"Print horses with the last three races they took part in."

Why does this matter? Here's an extreme example:

bars = []
for foo in whatever:
    bars.append(foo.baz)

What does this do? The description

"It puts all baz attributes of the items in whatever into a list"

doesn't help. If you say "I want to make a list of all brands in the car 
park", I could recommend a change to

brands = {car.brand for car in car_park}

because a set avoids duplicates. If you say "I want to document my 
achievements for posterity", I would recommend that you print to a file 
rather than append to a list, and the original code could be changed to

with open("somefile", "w") as f:  # "w": open() defaults to read-only
    for achievement in my_achievements:
        print(achievement.description, file=f)
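To make the car-park suggestion concrete, here is a minimal runnable sketch. The `Car` record, the sample data and the output file name are hypothetical stand-ins for whatever the real objects are; it combines both points, deduplicating with a set and writing the result to a file.

```python
from collections import namedtuple
import os
import tempfile

# Hypothetical record type standing in for the objects in the car park.
Car = namedtuple("Car", ["plate", "brand"])

car_park = [
    Car("ABC123", "Volvo"),
    Car("XYZ789", "Fiat"),
    Car("JKL456", "Volvo"),
]

# A list keeps every occurrence, duplicates included ...
brand_list = [car.brand for car in car_park]
# ... while a set comprehension keeps each brand once.
brands = {car.brand for car in car_park}

# open() defaults to read mode, so writing needs "w".
path = os.path.join(tempfile.gettempdir(), "brands.txt")
with open(path, "w") as f:
    for brand in sorted(brands):
        print(brand, file=f)

# Read the file back to check what was written.
with open(path) as f:
    written = f.read().splitlines()
```

Sets are unordered, so the sketch sorts before writing to get a stable file.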


Back to my coding hint: Don't repeat yourself. If you move the pieces

>> > meetattrs = ('id', 'venue', 'date', 'rail', 'weather',
>> > 'trackcondition')
>> 
>> >     meet = d('meeting')
>> 
>> >     meetdata = [[meet.eq(i).attr(x)
>> >                  for x in meetattrs] for i in range(len(meet))]

into a function

def extract_attrs(nodes, attrs):
    return [[nodes.eq(i).attr(name) for name in attrs]
            for i in range(len(nodes))]

You can reuse it for clubs, races, etc.:

meetdata = extract_attrs(d("meeting"), meetattrs)
racedata = extract_attrs(d("race"), raceattrs)
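The helper above still uses range(len(nodes)). As a sketch of iterating over the nodes directly, here is the same idea written against the standard library's `xml.etree.ElementTree` instead of PyQuery; that swap is made only so the example runs without third-party packages. The XML snippet and attribute tuples are invented for illustration. With PyQuery itself, `nodes.items()` should yield one PyQuery object per matched element, so `[[node.attr(name) for name in attrs] for node in nodes.items()]` would be the analogous spelling, but check that against your PyQuery version.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the real file has more attributes and children.
xml_text = """
<meeting id="m1" venue="Sample Park" date="2016-04-30">
  <race id="r1" number="1" distance="1200"/>
  <race id="r2" number="2" distance="1400"/>
</meeting>
"""

def extract_attrs(nodes, attrs):
    """Return one row per node, holding the requested attributes in order.

    Iterates over the nodes directly; missing attributes become None.
    """
    return [[node.get(name) for name in attrs] for node in nodes]

root = ET.fromstring(xml_text)
meetattrs = ("id", "venue", "date")
raceattrs = ("id", "number", "distance")

# Element.iter(tag) also yields the element itself when its tag matches,
# so this finds the root <meeting> as well as any nested ones.
meetdata = extract_attrs(root.iter("meeting"), meetattrs)
racedata = extract_attrs(root.iter("race"), raceattrs)
```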

If you put the parts into a dict, you can generalize even further:

tables = {
    "meeting": ([], meetattrs),
    "race": ([], raceattrs),
}
for name, (data, attrs) in tables.items():
    data.extend(extract_attrs(d(name), attrs))
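A runnable sketch of the dict-driven loop, again swapping in `xml.etree.ElementTree` for PyQuery so it needs only the standard library. The tags, attribute tuples and sample XML are invented. Because each table's list is extended in place, supporting another element type means adding one dict entry rather than another copy of the extraction code.

```python
import xml.etree.ElementTree as ET

# Invented sample document with one meeting, one race and two nominations.
xml_text = """
<meeting id="m1" venue="Sample Park">
  <race id="r1" distance="1200">
    <nomination id="n1" horse="Alpha"/>
    <nomination id="n2" horse="Beta"/>
  </race>
</meeting>
"""

def extract_attrs(nodes, attrs):
    # One row per node, attributes in the order given by attrs.
    return [[node.get(name) for name in attrs] for node in nodes]

root = ET.fromstring(xml_text)

# Tag name -> (rows collected so far, attribute names to pull out).
tables = {
    "meeting": ([], ("id", "venue")),
    "race": ([], ("id", "distance")),
    "nomination": ([], ("id", "horse")),
}
for name, (data, attrs) in tables.items():
    # iter(name) searches the whole tree, including root itself.
    data.extend(extract_attrs(root.iter(name), attrs))

nomdata = tables["nomination"][0]
```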

> 
> I do apologise for the lack of output. I asked a question about parsing,
> which I always seem to get wrong: I overthink it and then find the solution
> simpler than I thought.
> 
> The output is 4 tables of the classes and their selected attributes, e.g.
> meetattrs = ('id', 'venue', 'date', 'rail', 'weather', 'trackcondition')
> from the meeting class of the XML.
> 
> I want to keep the flexibility and the relational structure they have
> defined in the tables. When exporting the nominations section via pandas
> to csv, I found that I had no way to determine which race each nomination
> belonged to: there was no race_id in the nominations to relate them back
> to the race, and no meeting id in the races to relate them back to the
> meeting.
> 
> 
> So I wanted to traverse the classes Meeting, Race and Nomination and
> insert the id of each element into its direct children. Since there are
> many races per meeting and many nominations per race, I need to ensure
> that only the direct children get the id.
> 
> Otherwise, the code I supplied was working: it parsed the output and
> pushed it to pandas to use its csv writing capability.
> 
> So I inserted
> 
>     for race_el in d('race'):
>         race = pq(race_el)
>         race_id = race.attr('id')
> 
>     for nom_el in race.items('nomination'):
>         res.append((pq(nom_el).attr('raceid', race_id)))
> 
> which traverses the races and inserts the race_id into the child
> nominations. However, what boggles me is how to pass this to the list
> comprehension that was working, without changing the data from the XML or
> creating another intermediate step and variable. I just want to parse it
> as before, but with the extra race_id included.

So you want to go from a tree structure to a set of tables and preserve the 
structural information:

for meeting in meetings:
    meeting_table.append(...meeting attributes...)
    meeting_id = ...
    for race in meeting.races:
        race_table.append(meeting_id, ...race attributes...)
        race_id = ...
        for nomination in race.nominations:
            nomination_table.append(race_id, ...nomination attributes...)

I don't know how to spell that in PyQuery, so here's how to do it with lxml:

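import lxml.etree

# meetattrs, raceattrs and horseattrs are the attribute tuples from your
# code; filename is the path to the XML file.
meeting_table = []
race_table = []
nomination_table = []
tree = lxml.etree.parse(filename)
# "//meeting" matches meeting elements at any depth; "/meeting" would
# only match a root element.
for meeting in tree.xpath("//meeting"):
    meeting_table.append([meeting.attrib[name] for name in meetattrs])
    meeting_id = meeting.attrib["id"]
    # "./race" selects direct children only, so each race stays tied
    # to the meeting that contains it.
    for race in meeting.xpath("./race"):
        race_table.append(
            [meeting_id] + [race.attrib[name] for name in raceattrs])
        race_id = race.attrib["id"]
        for nomination in race.xpath("./nomination"):
            nomination_table.append(
                [race_id]
                + [nomination.attrib[name] for name in horseattrs])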

Not as clean and not as general as I would hope -- basically I'm neglecting 
my recommendations from above -- but if it works for you I might have a 
second look later.
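For comparison, here is a standard-library sketch of the same nested walk using `xml.etree.ElementTree` (swapped in for lxml so it runs without third-party packages), finishing with a CSV export through the `csv` module rather than pandas. The sample XML, the attribute tuples and the `row` helper are all hypothetical. The key point is that `findall("race")` only looks at direct children, which keeps each race tied to its own meeting.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample document: one meeting, two races, three nominations.
xml_text = """
<meeting id="m1" venue="Sample Park">
  <race id="r1" distance="1200">
    <nomination id="n1" horse="Alpha"/>
  </race>
  <race id="r2" distance="1400">
    <nomination id="n2" horse="Beta"/>
    <nomination id="n3" horse="Gamma"/>
  </race>
</meeting>
"""

# Hypothetical attribute selections; substitute the real ones.
meetattrs = ("id", "venue")
raceattrs = ("id", "distance")
horseattrs = ("id", "horse")

def row(element, attrs, parent_id=None):
    """Attributes of element in order, prefixed with the parent's id if given."""
    values = [element.get(name) for name in attrs]
    return values if parent_id is None else [parent_id] + values

meeting_table, race_table, nomination_table = [], [], []
root = ET.fromstring(xml_text)
for meeting in root.iter("meeting"):
    meeting_table.append(row(meeting, meetattrs))
    # findall("race") matches direct children only.
    for race in meeting.findall("race"):
        race_table.append(row(race, raceattrs, meeting.get("id")))
        for nomination in race.findall("nomination"):
            nomination_table.append(
                row(nomination, horseattrs, race.get("id")))

# Write the nominations as CSV, with a header naming the foreign key.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(("race_id",) + horseattrs)
writer.writerows(nomination_table)
csv_text = buf.getvalue()
```

Writing to an `io.StringIO` keeps the sketch self-contained. For a real export, pass an open file (opened with `newline=""`) to `csv.writer` instead.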
