Namedtuples: some unexpected inconveniences

Peter Otten __peter__ at web.de
Wed Apr 12 16:44:46 EDT 2017


Deborah Swanson wrote:

> I won't say the following points are categorically true, but I became
> convinced enough they were true in this instance that I abandoned the
> advised strategy. Which was to use defaultdict to group the list of
> namedtuples by one of the fields for the purpose of determining whether
> certain other fields in each group were either missing values or
> contained contradictory values.
> 
> Are these bugs, or was there something I could have done to avoid these
> problems? Or are they just things you need to know working with
> namedtuples?
> 
> The list of namedtuples was created with:
> 
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
>     . . .
> (many lines of field processing code)
>     . . .
> 
> then the attempt to group the records by title:
> 
> import operator
> records[1:] = sorted(records[1:], key=operator.attrgetter("title", "Date"))

Personally I would immediately discard the header row once and for all, not 
again and again on every operation.
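
A minimal sketch of that, assuming an in-memory CSV with made-up contents in 
place of your file: next() consumes the header once, so the list holds only 
data rows and no [1:] slicing is needed afterwards.

```python
import csv
import io
import operator
from collections import namedtuple

# Made-up sample data standing in for the real CSV file.
infile = io.StringIO("title,Date\nb,2017-01-02\na,2017-01-01\n")

rows = csv.reader(infile)
fieldnames = next(rows)  # consume the header row exactly once
Record = namedtuple("Record", fieldnames)

# Only data rows end up in the list, so it can be sorted as a whole.
records = [Record._make(row) for row in rows]
records.sort(key=operator.attrgetter("title", "Date"))
```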

> groups = defaultdict()
> for r in records[1:]:
>     # if the key doesn't exist, make a new group
>     if r.title not in groups.keys():
>         groups[r.title] = [r]
>     # if key (group) exists, append this record
>     else:
>         groups[r.title].append(r)

You are not using the defaultdict the way it is intended; the groups can be 
built with

groups = defaultdict(list)
for r in records[1:]:
    groups[r.title].append(r)
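
For illustration, a self-contained version with made-up records (the field 
names and values are assumptions, not your data). A defaultdict created 
without a factory raises KeyError for missing keys just like a plain dict; 
passing list as the factory is what makes new groups appear automatically.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type and sample rows, for illustration only.
Record = namedtuple("Record", ["title", "Date"])
records = [
    Record("a", "2017-01-01"),
    Record("b", "2017-01-02"),
    Record("a", "2017-01-03"),
]

groups = defaultdict(list)  # a missing key gets a fresh empty list
for r in records:
    groups[r.title].append(r)

# Without a factory there is no automatic key creation.
no_factory = defaultdict()
try:
    no_factory["a"].append(records[0])
    raised = False
except KeyError:
    raised = True
```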
 
> (Please note that this default dict will not automatically make new keys
> when they are encountered, possibly because the keys of the defaultdict
> are made from namedtuples and the values are namedtuples. So you have to
> include the step to make a new key when a key is not found.)
> 
> If you succeed in modifying records in a group, the dismaying thing is
> that the underlying records are not updated, making the entire exercise
> totally pointless, which was a severe and unexpected inconvenience.
> 
> It looks like the values and the structure were only copied from the
> original list of namedtuples to the defaultdict. The rows of the
> grouped-by dict still behave like namedtuples, but they are no longer
> the same namedtuples as the original list of namedtuples. (I'm sure I
> didn't say that quite right, please correct me if you have better words
> for it.)

They should be the same namedtuple objects: appending a record to a group 
stores a reference to it, not a copy. Something is wrong with your actual 
code or your diagnosis or both.
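
A quick way to check this, sketched with made-up records: the "is" operator 
compares object identity, so it shows whether the group holds the original 
tuples or copies of them.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type and sample rows, for illustration only.
Record = namedtuple("Record", ["title", "Date"])
records = [Record("a", "2017-01-01"), Record("a", "2017-01-03")]

groups = defaultdict(list)
for r in records:
    groups[r.title].append(r)

# Every entry in the group is the very same object as in the list.
same_objects = all(g is r for g, r in zip(groups["a"], records))
print(same_objects)
```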

> 
> It might be possible to complete the operation and then write out the
> groups of rows of namedtuples in the dict to a simple list of
> namedtuples, discarding the original, but at the time I noticed that
> modifying rows in a group didn't change the values in the original list
> of namedtuples, I still had further to go with the dict of groups,  and
> it was looking easier by the minute to solve the missing values problem
> directly from the original list of namedtuples, so that's what I did.
> 
> If requested I can reproduce how I saw that the original list of
> namedtuples was not changed when I modified field values in group rows
> of the dict, but it's lengthy and messy. It might be worthwhile though
> if someone might see a mistake I made, though I found the same behavior
> several different ways. Which was when I called it barking up the wrong
> tree and quit trying to solve the problem that way.
> 
> Another inconvenience is that there appears to be no way to access field
> values of a named tuple by variable, although I've had limited success
> accessing by variable indices. However, direct attempts to do so, like:
> 
> values = {row[label] for row in group}
>     (where 'label' is a variable for the field names of a namedtuple)
>     
>     gets "object has no attribute 'label'"
> 
> or, where 'record' is a row in a list of namedtuples and 'label' is a
> variable for the fieldnames of a namedtuple:
> 
>     value = getattr(record, label)

That should work.
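
A small sketch with made-up records, assuming label holds a field name as a 
string. Indexing with row[label] fails because tuple indices must be 
integers; getattr() looks the field up by name, and Record._fields can be 
used to translate a name into a position.

```python
from collections import namedtuple

# Hypothetical record type and sample rows, for illustration only.
Record = namedtuple("Record", ["title", "Date"])
record = Record("a", "2017-01-01")
group = [record, Record("a", "2017-01-03")]

label = "Date"  # the field name is held in a variable

value = getattr(record, label)
values = {getattr(row, label) for row in group}

# Equivalent lookup by position, derived from the field name.
index = Record._fields.index(label)
by_index = record[index]
```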

>     setattr(record, label, value)    also don't work.
>     
> You get the error 'object has no attribute 'label' every time.

Indeed you cannot change the namedtuple's attributes. Like the "normal" 
tuple it is designed to be immutable. If you want changes in one list (the 
group) to appear in another (the original records) you need a mutable data 
type.
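
To illustrate with made-up records: setattr() on a namedtuple raises 
AttributeError, and _replace() returns a new tuple that has to be stored 
back explicitly. Converting the rows to dicts with _asdict() is one possible 
mutable alternative (an assumption about what suits your data, not the only 
option); a change made through a group is then visible in the original list.

```python
from collections import namedtuple

# Hypothetical record type and sample rows, for illustration only.
Record = namedtuple("Record", ["title", "Date"])
records = [Record("a", ""), Record("a", "2017-01-03")]

# Namedtuples are immutable: assigning to a field fails.
error = None
try:
    setattr(records[0], "Date", "2017-01-01")
except AttributeError as err:
    error = err

# _replace() creates a new tuple; the list must be updated explicitly.
records[0] = records[0]._replace(Date="2017-01-01")

# Mutable alternative: dicts shared between the list and the group.
rows = [r._asdict() for r in records]
group = [row for row in rows if row["title"] == "a"]
group[1]["Date"] = "2017-01-04"  # also changes rows[1], the same dict
```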




