Namedtuples: some unexpected inconveniences

Wed Apr 12 18:00:06 EDT 2017

Peter Otten wrote, on Wednesday, April 12, 2017 1:45 PM
> 
> Deborah Swanson wrote:
> 
> > I won't say the following points are categorically true, but I became
> > convinced enough they were true in this instance that I abandoned the
> > advised strategy. Which was to use defaultdict to group the list of
> > namedtuples by one of the fields for the purpose of determining
> > whether certain other fields in each group were either missing values
> > or contained contradictory values.
> > 
> > Are these bugs, or was there something I could have done to avoid 
> > these problems? Or are they just things you need to know working with
> > namedtuples?
> > 
> > The list of namedtuples was created with:
> > 
> > infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> > rows = csv.reader(infile)
> > fieldnames = next(rows)
> > Record = namedtuple("Record", fieldnames)
> > records = [Record._make(fieldnames)]
> > records.extend(Record._make(row) for row in rows)
> >     . . .
> > (many lines of field processing code)
> >     . . .
> > 
> > then the attempt to group the records by title:
> > 
> > import operator
> > records[1:] = sorted(records[1:], key=operator.attrgetter("title",
> > "Date"))
> 
> Personally I would immediately discard the header row once and for all, not
> again and again on every operation.

Well, perhaps, but I need the header row to stay in place to write the
list to a csv when I'm done (which is why it's there in the first
place). There might be a tiny performance edge in discarding the header
row for the sort, but there would also be a hit to recreate it at output
time.
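
For reference, a minimal sketch of the alternative Peter describes, assuming
made-up column names and in-memory text in place of the real file. It relies on
the header being recoverable from Record._fields, which namedtuple keeps on the
class, so the header row itself need not stay in the list.

```python
import csv
import io
from collections import namedtuple

# Stand-in for the real CSV file; the column names and rows are made up.
infile = io.StringIO("title,Date\nB,2017-02-01\nA,2017-01-01\n")

rows = csv.reader(infile)
Record = namedtuple("Record", next(rows))   # header row is consumed here
records = [Record._make(row) for row in rows]

records.sort(key=lambda r: (r.title, r.Date))  # no header row to skip

# At output time the header comes back from the class itself.
outfile = io.StringIO()
writer = csv.writer(outfile)
writer.writerow(Record._fields)
writer.writerows(records)
output_text = outfile.getvalue()
```

Writing Record._fields is a single tuple lookup, so the cost of recreating the
header at output time should be negligible.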

> > groups = defaultdict()
> > for r in records[1:]:
> >     # if the key doesn't exist, make a new group
> >     if r.title not in groups.keys():
> >         groups[r.title] = [r]
> >     # if key (group) exists, append this record
> >     else:
> >         groups[r.title].append(r)
> 
> You are not using the defaultdict the way it is intended; the 
> groups can be built with
> 
> groups = defaultdict(list)
> for r in records[1:]:
>     groups[r.title].append(r)

Yes, going back to your original post I see now that's what you gave,
and it's probably why I noticed defaultdict's being characterized by
what you make the default to be. Too bad I lost track of that.
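
A minimal sketch of the difference, assuming two made-up fields: with a
default factory, missing keys are created on first access; with no factory,
defaultdict raises KeyError like a plain dict.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", ["title", "Date"])  # hypothetical fields
records = [
    Record("A", "2017-01-01"),
    Record("B", "2017-02-01"),
    Record("A", "2017-03-01"),
]

groups = defaultdict(list)       # missing keys start out as an empty list
for r in records:
    groups[r.title].append(r)

plain = defaultdict()            # no factory: behaves like a plain dict
missing_key_raised = False
try:
    plain["A"].append(records[0])
except KeyError:
    missing_key_raised = True
```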

> > (Please note that this default dict will not automatically make new 
> > keys when they are encountered, possibly because the keys of the 
> > defaultdict are made from namedtuples and the values are namedtuples.
> > So you have to include the step to make a new key when a key is not 
> > found.)
> > 
> > If you succeed in modifying records in a group, the dismaying thing is
> > that the underlying records are not updated, making the entire 
> > exercise totally pointless, which was a severe and unexpected 
> > inconvenience.
> > 
> > It looks like the values and the structure were only copied from the
> > original list of namedtuples to the defaultdict. The rows of the
> > grouped-by dict still behave like namedtuples, but they are no longer
> > the same namedtuples as the original list of namedtuples. (I'm sure I
> > didn't say that quite right, please correct me if you have better
> > words for it.)
> 
> They should be the same namedtuple. Something is wrong with 
> your actual code or your diagnosis or both.

Well, I didn't see them behaving as the same namedtuples, and I looked
hard at it, many different ways. If someone could point out the mistake
I might have made to get only copies of them or why they necessarily
would be the same namedtuples, I'd certainly look into it. Or better yet
some code that does the same thing and they remain the same ones. 

(But I think you got it right in your last sentence below. defaultdict
copied them because they were immutable, leaving the original list
unchanged.)
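
A minimal sketch of the kind of check asked for above, assuming two made-up
fields. In it the grouped rows are the same objects as the list entries, and
the values diverge only once a row in the group is swapped for a modified copy
made with _replace(). Whether that matches what happened in the real code is
an assumption.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", ["title", "Date"])  # hypothetical fields
records = [Record("A", ""), Record("A", "2017-03-01")]

groups = defaultdict(list)
for r in records:
    groups[r.title].append(r)

# Identity test: is the row in the group the very object in the list?
same_object = groups["A"][0] is records[0]

# _replace() returns a new namedtuple; rebinding the slot in the group
# leaves the slot in the original list pointing at the old object.
groups["A"][0] = groups["A"][0]._replace(Date="2017-03-01")
list_unchanged = records[0].Date == ""
```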

> > It might be possible to complete the operation and then write out the
> > groups of rows of namedtuples in the dict to a simple list of
> > namedtuples, discarding the original, but at the time I noticed that
> > modifying rows in a group didn't change the values in the original
> > list of namedtuples, I still had further to go with the dict of
> > groups, and it was looking easier by the minute to solve the missing
> > values problem directly from the original list of namedtuples, so
> > that's what I did.
> > 
> > If requested I can reproduce how I saw that the original list of
> > namedtuples was not changed when I modified field values in group rows
> > of the dict, but it's lengthy and messy. It might be worthwhile though
> > if someone might see a mistake I made, though I found the same
> > behavior several different ways. Which was when I called it barking up
> > the wrong tree and quit trying to solve the problem that way.
> > 
> > Another inconvenience is that there appears to be no way to access
> > field values of a named tuple by variable, although I've had limited
> > success accessing by variable indices. However, direct attempts to do
> > so, like:
> > 
> > values = {row[label] for row in group}
> >     (where 'label' is a variable for the field names of a namedtuple)
> >     
> >     gets "object has no attribute 'label'"
> > 
> > or, where 'record' is a row in a list of namedtuples and 'label' is a
> > variable for the fieldnames of a namedtuple:
> > 
> >     value = getattr(record, label)
> 
> That should work.

We may agree that it *should* work, by an intuitive grasp of how it
should work, but it doesn't. You get "object has no attribute 'label'".
It wants to see one of the specific names of a defined field and it
rejects the variable as an invalid attribute. I think MRAB nailed it by
pointing out that as fundamentally a tuple itself, a namedtuple is
immutable, and its elements can't be referenced by variable. 
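
A minimal sketch, assuming made-up field names, of how a field name held in a
variable can be looked up on a namedtuple. The quoted "label" case is only a
guess at one way the reported error message could arise; the real code may
have differed.

```python
from collections import namedtuple

Record = namedtuple("Record", ["title", "Date"])  # hypothetical fields
record = Record("A", "2017-01-01")

label = "Date"                       # variable holding a field name
by_getattr = getattr(record, label)  # looks up the attribute named "Date"
by_index = record[Record._fields.index(label)]  # tuple access needs an int

quoted_error = ""
try:
    getattr(record, "label")         # quoted: asks for a field called "label"
except AttributeError as exc:
    quoted_error = str(exc)

str_index_error = ""
try:
    record[label]                    # a str index is rejected by tuples
except TypeError as exc:
    str_index_error = str(exc)
```

Reading by variable is separate from immutability: getattr() only reads,
while setattr() on a namedtuple field raises AttributeError.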

> >     setattr(record, label, value)    also don't work.
> >     
> > You get the error "object has no attribute 'label'" every time.
> 
> Indeed you cannot change the namedtuple's attributes. Like the "normal"
> tuple it is designed to be immutable. If you want changes in one list (the
> group) to appear in another (the original records) you need a mutable data
> type.

Sadly, that does seem to be the correct conclusion here.
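
A minimal sketch of one such mutable data type, assuming made-up fields:
each namedtuple is converted to a mapping with _asdict(), edited through the
group, and converted back at the end. Other mutable record types would serve
equally well; this is just one option.

```python
from collections import defaultdict, namedtuple

Record = namedtuple("Record", ["title", "Date"])  # hypothetical fields
records = [Record("A", ""), Record("A", "2017-03-01")]

# Swap each namedtuple for a mutable mapping with the same fields.
rows = [r._asdict() for r in records]

groups = defaultdict(list)
for row in rows:
    groups[row["title"]].append(row)

# Fill in the missing value through the group...
groups["A"][0]["Date"] = "2017-03-01"

# ...and the change is visible in the original list: it is the same object.
date_in_list = rows[0]["Date"]

# Back to namedtuples once the edits are done, if wanted.
records = [Record(**row) for row in rows]
```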