Namedtuples: some unexpected inconveniences

Wed Apr 12 15:57:08 EDT 2017

I won't say the following points are categorically true, but I became
convinced enough they were true in this instance that I abandoned the
advised strategy, which was to use defaultdict to group the list of
namedtuples by one of the fields, in order to determine whether
certain other fields in each group were either missing values or
contained contradictory values.

Are these bugs, or was there something I could have done to avoid these
problems? Or are they just things you need to know working with
namedtuples?

The list of namedtuples was created with:

import csv
from collections import namedtuple

infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
rows = csv.reader(infile)
fieldnames = next(rows)    # first CSV row holds the column names
Record = namedtuple("Record", fieldnames)
records = [Record._make(fieldnames)]    # records[0] is the header row
records.extend(Record._make(row) for row in rows)
    . . .
(many lines of field processing code)
    . . .

then the attempt to group the records by title:

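import operator
from collections import defaultdict

# sort the data rows (records[0] is the header row) by title, then Date
records[1:] = sorted(records[1:], key=operator.attrgetter("title", "Date"))
groups = defaultdict()
for r in records[1:]: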
    # if the key doesn't exist, make a new group
    if r.title not in groups.keys():
        groups[r.title] = [r]
    # if key (group) exists, append this record
    else:
        groups[r.title].append(r)

(Please note that this defaultdict will not automatically make new keys
when they are encountered, possibly because the keys of the defaultdict
are made from namedtuple fields and the values are namedtuples. So you
have to include the step to make a new key when a key is not found.)
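
A likely explanation that has nothing to do with namedtuples: defaultdict()
called with no default_factory argument behaves like a plain dict and raises
KeyError for a missing key. Keys are only created automatically when a factory
such as list is passed. A minimal sketch, using a made-up two-field Record in
place of the real CSV-derived one:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the Record type built from the CSV header.
Record = namedtuple("Record", ["title", "Date"])
records = [
    Record("Alpha", "2017-01-02"),
    Record("Beta", "2017-01-03"),
    Record("Alpha", "2017-01-05"),
]

# No factory given: a missing key raises KeyError, as with a plain dict.
bare = defaultdict()
try:
    bare["Alpha"].append(records[0])
    bare_raised = False
except KeyError:
    bare_raised = True

# list as the factory: a missing key gets a fresh empty list on first access,
# so no explicit "make a new group" step is needed.
groups = defaultdict(list)
for r in records:
    groups[r.title].append(r)
```

With defaultdict(list), the if/else branch in the grouping loop collapses to
the single append line.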

If you succeed in modifying records in a group, the dismaying thing is
that the underlying records are not updated, making the entire exercise
totally pointless, which was a severe and unexpected inconvenience.

It looks like the values and the structure were only copied from the
original list of namedtuples to the defaultdict. The rows of the
grouped-by dict still behave like namedtuples, but they are no longer
the same namedtuples as the original list of namedtuples. (I'm sure I
didn't say that quite right, please correct me if you have better words
for it.) 
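
One way to describe what is probably happening: the dict does not copy the
records, it holds references to the very same objects as the list. But
namedtuples are immutable, so a "modification" such as _replace builds a new
tuple and leaves the original untouched. The sketch below assumes the changes
were made with _replace (the post does not say) and uses made-up fields:

```python
from collections import namedtuple

# Hypothetical stand-in for the real Record type.
Record = namedtuple("Record", ["title", "Date"])
records = [Record("Alpha", ""), Record("Alpha", "2017-01-05")]

# Group the same objects under their title (no copying happens here).
groups = {"Alpha": list(records)}
same_object = groups["Alpha"][0] is records[0]

# Direct assignment to a field is refused because tuples are immutable.
try:
    groups["Alpha"][0].Date = "2017-01-05"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace returns a NEW namedtuple; storing it in the group rebinds only
# that slot of the group's list.
updated = groups["Alpha"][0]._replace(Date="2017-01-05")
groups["Alpha"][0] = updated
original_still_empty = records[0].Date == ""

# To change the original list, its slot has to be rebound as well.
records[0] = updated
```

If in-place updates shared between both containers are needed, a mutable
record type (a small class, or a dict per row) avoids the rebinding step.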

It might be possible to complete the operation and then write out the
groups of rows in the dict to a simple list of namedtuples, discarding
the original. But by the time I noticed that modifying rows in a group
didn't change the values in the original list of namedtuples, I still
had further to go with the dict of groups, and it was looking easier by
the minute to solve the missing values problem directly from the
original list of namedtuples, so that's what I did.

If requested I can reproduce how I saw that the original list of
namedtuples was not changed when I modified field values in group rows
of the dict, but it's lengthy and messy. It might be worthwhile if
someone could spot a mistake I made, though I found the same behavior
several different ways. That was when I decided I was barking up the
wrong tree and quit trying to solve the problem that way.

Another inconvenience is that there appears to be no way to access field
values of a namedtuple by variable, although I've had limited success
accessing by variable indices. However, direct attempts to do so, like:

    values = {row[label] for row in group}

(where 'label' is a variable for the field names of a namedtuple)
gets "object has no attribute 'label'"

or, where 'record' is a row in a list of namedtuples and 'label' is a
variable for the fieldnames of a namedtuple:

    value = getattr(record, label)
    setattr(record, label, value)

also don't work. You get the error "object has no attribute 'label'" every
time.
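
For comparison, here is a minimal sketch of field access by variable on a
made-up Record. In it, getattr with a variable holding a valid field name
does work, indexing with a string raises TypeError, and setattr fails only
because namedtuples are immutable. An error that mentions 'label' itself
would be consistent with the literal string "label" (or a name that is not
one of the fields) having been passed, though that is a guess:

```python
from collections import namedtuple

# Hypothetical stand-in for the real Record type.
Record = namedtuple("Record", ["title", "Date"])
record = Record("Alpha", "2017-01-05")
label = "Date"    # a variable holding one of the field names

# Reading by variable name works through getattr.
value = getattr(record, label)

# Tuples are indexed by integers, so a string index raises TypeError.
try:
    record[label]
    index_error = None
except TypeError as exc:
    index_error = exc

# Two other ways to read a field by name.
by_index = record[Record._fields.index(label)]
as_dict = record._asdict()[label]

# Writing fails because the tuple is immutable, not because of the variable.
try:
    setattr(record, label, "2017-02-01")
    set_failed = False
except AttributeError:
    set_failed = True

# _replace accepts the field name as a keyword, so unpack a one-item dict.
changed = record._replace(**{label: "2017-02-01"})
```

Note that changed is a new tuple; record itself keeps its old Date.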