Namedtuples problem

Peter Otten __peter__ at web.de
Thu Feb 23 05:34:11 EST 2017


Deborah Swanson wrote:

> This is how the list of namedtuples is originally created from a csv:
> 
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in -
> test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
> 
> Thanks to Peter Otten for this succinct code, and to Greg Ewing for
> suggesting namedtuples for this type of problem to begin with.
> 
> Namedtuples worked beautifully for the first two thirds of this code,
> but I've run into a snag attempting to proceed.
> 
> Here's my code up to the snag, and I'll explain afterwards what I'm
> trying to do:
> 
> import operator
> records[1:] = sorted(records[1:], key=operator.attrgetter("title",
> "Date"))
>     
> groups = defaultdict()
> for r in records[1:]:
>     # if the key doesn't exist, make a new group
>     if r.title not in groups.keys():
>         groups[r.title] = [r]
>     # if key (group) exists, append this record
>     else:
>         groups[r.title].append(r)
> 
> # make lookup table: indices for field names
> records_idx = {}
> for idx, label in enumerate(records[0]):
>     records_idx[label] = idx
> 
> LABELS = ['Location', 'ST', 'co', 'miles', 'first', 'Kind', 'Notes']
> # look at field values for each label on group
> for group in groups.values():
>     values = []
>     for idx, row in enumerate(group):
>         for label in LABELS:
>             values.append(group[[idx][records_idx[label]]])  <-snag
> 
> I want to get lists of field values from the list of namedtuples, one
> list of field values for each row in each group (groups are defined in
> the section beginning with "groups = defaultdict()").
> 
> LABELS defines the field names for the columns of field values of
> interest. So all the locations in this group would be in one list, all
> the states in another list, etc. (Jussi, I'm looking at your suggestion
> for the next part.)
> 
> (I'm quite sure this bit of code could be written with list and dict
> comprehensions, but here I was just trying to get it to work, and
> comprehensions still confuse me a little.)
> 
> Using the debugger's watch window, from
> group[[idx][records_idx[label]]], I get:
> 
> idx = {int}: 0
> records_idx[label] = {int}: 4
> 
> which is the correct indices for the first row of the current group (idx
> = 0) and the first field label in LABELS, 'Location' (records_idx[label]
> = 4).
> 
> And if I look at
> 
> group[0][4] = 'Longview'
> 
> this is also correct. Longview is the Location field value for the first
> row of this group.
> 
> However,
> 
> group[[idx][records_idx[label]]]
> gets an Index Error: list index out of range
> 
> I've run into this kind of problem with namedtuples before, trying to
> access field values with variable names, like:
> 
> label = 'Location'
> records.label
> 
> and I get something like "'records' has no attribute 'label'". This can
> be fixed by using the subscript form and an index, like:
> 
> for idx, r in enumerate(records):
> ...
> records[idx] = r
> 
> But here, I get the Index Error and I'm a bit baffled why. Both
> subscripts evaluate to valid indices and give the correct value when
> explicitly used.
> 
> Can anyone see why I'm getting this Index error? and how to fix it?

I'm not completely sure I can follow you, but you seem to be mixing two 
problems

(1) split a list into groups
(2) convert a list of rows into a list of columns

and making a kind of mess in the process. Functions to the rescue:

#untested

import operator
from collections import defaultdict, namedtuple

def split_into_groups(records, key):
    groups = defaultdict(list)
    for record in records:
        # no need to check if a group already exists;
        # an empty list will automatically be added for every
        # missing key
        groups[key(record)].append(record)
    return groups

def extract_column(records, name):
    # you will agree that extracting one column is easy :)
    return [getattr(record, name) for record in records]

def extract_columns(records, names):
    # we can build on that to make a list of columns
    return [extract_column(records, name) for name in names]

wanted_columns = ['Location', ...]
records = ...
groups = split_into_groups(records, operator.attrgetter("title"))

Columns = namedtuple("Columns", wanted_columns)
for title, group in groups.items():
    # for easier access we turn the list of columns
    # into a namedtuple of columns
    groups[title] = Columns._make(extract_columns(group, wanted_columns))

If all worked well you should now be able to get a group with

groups["whatever"]

and all locations for that group with

groups["whatever"].Location
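
For a quick check of the approach, here is a minimal self-contained sketch. The
three sample records and the two-column selection are made up for illustration
(the real data comes from the csv file), and extract_column is folded into
extract_columns to keep it short. The result is stored in a separate dict,
columns_by_title, which is a name introduced here.

```python
import operator
from collections import defaultdict, namedtuple

# Hypothetical sample data standing in for the rows read from the csv file.
Record = namedtuple("Record", ["title", "Location", "ST"])
records = [
    Record("A", "Longview", "WA"),
    Record("B", "Salem", "OR"),
    Record("A", "Kelso", "WA"),
]

def split_into_groups(records, key):
    # defaultdict(list) creates an empty list for every missing key
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return groups

def extract_columns(records, names):
    # one list of field values per requested field name
    return [[getattr(record, name) for record in records] for name in names]

wanted_columns = ["Location", "ST"]
Columns = namedtuple("Columns", wanted_columns)

groups = split_into_groups(records, operator.attrgetter("title"))
columns_by_title = {
    title: Columns._make(extract_columns(group, wanted_columns))
    for title, group in groups.items()
}

print(columns_by_title["A"].Location)  # ['Longview', 'Kelso']
print(columns_by_title["A"].ST)        # ['WA', 'WA']
```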

If there is a bug you can pinpoint the function that doesn't work and ask 
for specific help on that one.
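
As for the IndexError itself: in group[[idx][records_idx[label]]] the inner
[idx] is a list literal with a single item, so [idx][4] asks for the fifth item
of a one-item list. The intended expression is group[idx][records_idx[label]].
Looking up a field by a name held in a variable needs getattr(); record.label
looks for a field literally called "label". A small sketch with made-up records
(the field names and values below are only for illustration):

```python
from collections import namedtuple

# Hypothetical two-field records; the real ones have more fields.
Record = namedtuple("Record", ["title", "Location"])
group = [Record("A", "Longview"), Record("A", "Kelso")]
idx, field_idx = 0, 1

# [idx] builds the one-item list [0]; indexing it with 1 fails
# before group is ever touched.
try:
    group[[idx][field_idx]]
    error = None
except IndexError as exc:
    error = exc
print(repr(error))

# Two separate subscripts: first the row, then the field.
by_index = group[idx][field_idx]

# Field access by a name stored in a variable.
label = "Location"
by_name = getattr(group[idx], label)

print(by_index, by_name)  # Longview Longview
```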



