Namedtuples: some unexpected inconveniences

Fri Apr 14 17:16:10 EDT 2017

Deborah Swanson wrote:

> Peter,
> 
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
> 
> I ran into problems with values and has_empty. values has a problem
> because row[label] gets a TypeError. has_empty has a problem because a
> list of field values will be shorter with missing values than a full
> list, but a namedtuple with missing values will be the same length as a
> full namedtuple since missing values have '' placeholders.  Two more
> unexpected inconveniences.
> 
> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
> 
> import csv
> from collections import namedtuple, defaultdict
> 
> def get_title(row):
>     return row.title
> 
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"

Yes, the function expects row to be dict-like. However, when you change

row[label]

to

getattr(row, label)

this part of the code will work...
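
To illustrate the difference, here is a minimal sketch with a made-up
two-field Record (the field names and values are assumptions for the demo,
not taken from your csv). Indexing by name fails, while getattr() and
indexing by position both work:

```python
from collections import namedtuple

# Hypothetical record type and data, for illustration only
Record = namedtuple("Record", ["title", "Location"])
row = Record("Flat A", "Downtown")
label = "Location"

try:
    row[label]  # namedtuples are indexed by position, not by name
except TypeError as err:
    index_error = err
else:
    index_error = None

by_name = getattr(row, label)  # looks up the attribute called "Location"
by_position = row[1]           # integer indexing still works
print(by_name, by_position)
```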

>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

but here you'll get an error, because namedtuples are immutable and do not
support item assignment. I experimented with changing everything necessary
to make it work with namedtuples, but you'll probably find the result a bit
hard to follow:

import csv
from collections import namedtuple, defaultdict

INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
OUTFILE = "tmp.csv" 

def get_title(row):
    return row.title

def complete(group, label):
    values = {getattr(row, label) for row in group}  
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # replace namedtuples in the group. Yes, it's ugly
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True

# newline="" is what the Python 3 csv docs recommend for csv files
with open(INFILE, newline="") as infile:
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    groups = defaultdict(list)
    for row in rows:
        record = Record._make(row)
        groups[get_title(record)].append(record)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
    for label in LABELS:
        complete(group, label)

# dump data (as a demo that you do not need the list of all records)
# newline="" (Python 3) avoids extra blank lines in the output on Windows
with open(OUTFILE, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    writer.writerows(
        record for group in groups.values() for record in group
    )
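
The two less obvious steps in complete() can be tried in isolation. Below is
a small sketch with invented data (Record and its values are assumptions).
has_empty relies on the empty string being both the shortest value and
falsy, and _replace() returns a new namedtuple instead of changing the old
one, which is why the list is refilled in place with group[:] = ...

```python
from collections import namedtuple

Record = namedtuple("Record", ["title", "Location"])  # hypothetical fields
group = [Record("Flat A", "Downtown"), Record("Flat A", "")]
same_list = group    # second reference, like the list stored in groups
original = group[1]  # keep a handle on the incomplete record

values = {getattr(row, "Location") for row in group}
has_empty = not min(values, key=len)  # shortest value is "", so True
fill = max(values, key=len)           # longest value: "Downtown"

# _replace() builds a new tuple; the original record is untouched
patched = original._replace(Location=fill)

# slice assignment changes the list in place, so same_list sees it too
group[:] = [record._replace(Location=fill) for record in group]
print(same_list)
```

A plain "group = [...]" inside complete() would only rebind the local name
and leave the list held in groups unchanged.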

One alternative is to keep the original code and replace the namedtuple
with the class suggested by Gregory Ewing. Then it should suffice to also
change

>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)

to

>     elif has_empty:
>         for row in group:
              setattr(row, label, max(values, key=len))
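
I am not reproducing Gregory's class here. As a stand-in, assume any class
whose instances allow attribute assignment; MutableRecord and its fields
below are made up for the demo. With such a class the setattr() loop works
as-is:

```python
class MutableRecord:
    """Hypothetical stand-in for a record class with writable attributes."""
    __slots__ = ("title", "Location")

    def __init__(self, title, Location):
        self.title = title
        self.Location = Location

group = [MutableRecord("Flat A", "Downtown"), MutableRecord("Flat A", "")]
label = "Location"

values = {getattr(row, label) for row in group}
for row in group:
    # unlike a namedtuple, the instance can be modified in place
    setattr(row, label, max(values, key=len))
```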

PS: Personally I would probably take the opposite direction and use dicts 
throughout...
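
A rough sketch of what the dict-based direction might look like, using
csv.DictReader so that every row is a dict and the original complete() with
row[label] runs unchanged. The sample data and column names are invented,
and io.StringIO stands in for the real file:

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for the real csv file
SAMPLE = "title,Location,Kind\nFlat A,Downtown,\nFlat A,,Rental\nFlat B,,\n"

def complete(group, label):
    values = {row[label] for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)  # dicts allow item assignment
    return True

groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    groups[row["title"]].append(row)

# record which (title, label) pairs could be completed automatically
results = {
    (title, label): complete(group, label)
    for title, group in groups.items()
    for label in ["Location", "Kind"]
}
```

For writing the result back, csv.DictWriter takes the fieldnames and accepts
the same dicts.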