Using namedtuples field names for column indices in a list of lists

Deborah Swanson python at deborahswanson.net
Sun Jan 8 04:53:00 EST 2017


Steven D'Aprano wrote, on January 07, 2017 10:43 PM
> 
> On Sunday 08 January 2017 16:39, Deborah Swanson wrote:
> 
> > What I've done so far:
> > 
> > with open('E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in.csv',
> >           'r') as infile:
> >     ls = list(csv.reader(infile))
> >     lst = namedtuple('lst', ls[0])
> > 
> > where 'ls[0]' is the header row of the csv, and it works perfectly
> > well. 'lst' is a namedtuple instance with each of the column titles as
> > field names.
> 
> Are you sure? namedtuple() returns a class, not a list:

Yes. 'ls' is defined as 'list(csv.reader(infile))', so ls[0] is the
first row from the csv, the header row. 'lst' is the namedtuple.

Perhaps what's puzzling you is that the way I've written it, the list of
data and the namedtuple are disjoint, and that's the problem.

> py> from collections import namedtuple
> py> names = ['A', 'B', 'C']
> py> namedtuple('lst', names)
> <class '__main__.lst'>
> 
> The way namedtuple() is intended to be used is like this:
> 
> 
> py> from collections import namedtuple
> py> names = ['A', 'B', 'C']
> py> Record = namedtuple('Record', names)
> py> instance = Record(10, 20, 30)
> py> print(instance)
> Record(A=10, B=20, C=30)
> 
> 
> There is no need to call fget directly to access the individual fields:
> 
> py> instance.A
> 10
> py> instance.B
> 20
> py> instance[1]  # indexing works too
> 20
> 
> 
> which is *much* simpler than:
> 
> py> Record.A.fget(instance)
> 10
 
I don't disagree with anything you've said and shown here. But I want to
use the 'instance.A' as a subscript for the list 'ls', and the only way
to do that is with .fget(). Believe me, I tried every possible way to
use instance.A or instance[1] and no way could I get ls[instance.A]. 

The problem I'm having here is one of linkage between the named tuple
for the column titles and the list that holds the data in the columns.
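
One way to get that linkage while keeping 'ls' as a plain list of lists
is to build a single namedtuple instance whose field values are the
column positions. This is only a minimal sketch: the header row, the
sample data and the names 'Columns' and 'col' are made up for
illustration and are not from the original csv.

```python
from collections import namedtuple

# Hypothetical stand-in for list(csv.reader(infile)); row 0 is the header.
ls = [
    ['Description', 'Location', 'Price'],
    ['Sofa', 'Living room', '250'],
    ['Desk', 'Office', '120'],
]

header = ls[0]
Columns = namedtuple('Columns', header)
# One instance whose field values are the column indices 0, 1, 2, ...
col = Columns._make(range(len(header)))

# Field names now work as subscripts into each row of the list of lists.
location_of_first = ls[1][col.Location]

# The inner lists are still mutable, so assignment works too.
ls[2][col.Price] = '99'
```

With this arrangement 'ls[row][col.Location]' reads like a field lookup,
and no call to .fget() is needed.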

> I think you should be doing something like this:
> 
> pathname = 'E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in.csv'
> with open(pathname, 'r') as infile:
>     rows = list(csv.reader(infile))
>     Record = namedtuple("Record", rows[0])
>     for row in rows[1:]:  # skip the first row, the header
>         row = Record(*row)  # unpack the list: one argument per field
>         # process this row...
>         if row.location == 0:
>             ...

Now here you have something I didn't think of: 'row = Record(*row)' in a
loop through the rows. 
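
A namedtuple class takes one positional argument per field, so each row
list has to be unpacked or handed to the class's _make() method. Passing
the list itself as a single argument raises TypeError when there is more
than one field. Below is a small sketch of the loop, using io.StringIO
with invented data in place of the real csv file. The field names
'description', 'location' and 'price' are assumptions for the example.

```python
import csv
import io
from collections import namedtuple

# Invented sample data standing in for the real csv file.
sample = io.StringIO(
    "description,location,price\n"
    "Sofa,0,250\n"
    "Desk,1,120\n"
)

rows = list(csv.reader(sample))
Record = namedtuple("Record", rows[0])

# _make builds one Record from each row list (same effect as Record(*row)).
records = [Record._make(row) for row in rows[1:]]

for row in records:
    # csv.reader yields strings, so compare against '0', not 0.
    if row.location == '0':
        print(row.description, row.price)
```

Keeping the converted rows in a new list ('records') means the data stays
available after the loop instead of being lost when 'row' is rebound.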

> [...]
> > But I haven't found a way to assign new values to a list element
> > using namedtuple.fieldname. I think a basic problem is that
> > namedtuples have the properties of tuples, and you can't assign to an
> > existing tuple because they're immutable.
> 
> Indeed. Being tuples, you have to create a new one. You can do it with
> slicing, like ordinary tuples, but that's rather clunky:
> 
> py> print(instance)
> Record(A=10, B=20, C=30)
> py> Record(999, *instance[1:])
> Record(A=999, B=20, C=30)

Very clunky. I don't like modifying standard tuples with slicing, and
this is even worse.

> The recommended way is with the _replace method:
> 
> py> instance._replace(A=999)
> Record(A=999, B=20, C=30)
> py> instance._replace(A=999, C=888)
> Record(A=999, B=20, C=888)
> 
> 
> Note that despite the leading underscore, _replace is *not* a private
> method of the class. It is intentionally documented as public. The
> leading underscore is so that it won't clash with any field names.
> 
> 
> 
> 
> -- 
> Steven
> "Ever since I learned about confirmation bias, I've been seeing 
> it everywhere." - Jon Ronson

I will have to work with this. It's entirely possible it will do what I
want it to do. The key problem I was having was getting a linkage
between the namedtuple and the list of data from the csv.
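
Since _replace returns a new namedtuple rather than changing the old one,
"assigning" to a field in a list of records means storing the result back
in the list. A sketch with made-up field names and values:

```python
from collections import namedtuple

Record = namedtuple('Record', ['A', 'B', 'C'])
records = [Record(10, 20, 30), Record(40, 50, 60)]

# _replace leaves the original instance untouched and returns a copy...
updated = records[0]._replace(A=999)

# ...so put the copy back into the list to make the change stick.
records[0] = updated

# The same idea in a loop, rebuilding the list as we go.
records = [r._replace(C=0) if r.B > 20 else r for r in records]
```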

I want to implement a suggestion I got to use a namedtuple made from the
header row as subscripts for elements in the list of data, and the
example given in the docs: 

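from collections import namedtuple
import csv

EmployeeRecord = namedtuple('EmployeeRecord',
                            'name, age, title, department, paygrade')

# The docs example opens the file with "rb"; under Python 3 csv.reader
# needs a text-mode file, hence newline="" here.
for emp in map(EmployeeRecord._make,
               csv.reader(open("employees.csv", newline=""))):
    print(emp.name, emp.title)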

assumes the field names will be hardcoded. Reading the csv into a list
and then trying to use the namedtuple made from the header row as
subscripts is how I ended up resorting to 'Record.A.fget(instance)' to
read values, and wasn't able to assign them. 

But assigning the rows of data into namedtuple instances with: 

Record = namedtuple("Record", rows[0])
for row in rows[1:]: 
    row = Record(*row)

does look like the linkage I need and wasn't finding the way I was doing
it. If 'Record(*row)' holds the list data and the columns are the same as
defined in 'namedtuple("Record", rows[0])', it really should work. And I
didn't get it that _replace could be used to get a namedtuple with new
values (duh. Pretty clear now that I reread it, and all the row data is
in namedtuple instances.) 

The big question is whether the namedtuple instances can be used as
something recognizable as field name subscripts, but that's something
I'll just have to try and see what it looks like. The goal is that
they'll look like row.Description, row.Location, etc., and I think they
will.
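
One thing to watch for: field names have to be valid Python identifiers,
so a column title containing a space would make namedtuple() raise
ValueError. A sketch of two ways around that, assuming a hypothetical
header row (the real column titles are not shown in this thread):

```python
from collections import namedtuple

# Hypothetical header row; 'Move date' contains a space.
header = ['Description', 'Location', 'Move date']

# Option 1: tidy the titles yourself before building the class.
fields = [title.strip().replace(' ', '_') for title in header]
Record = namedtuple('Record', fields)
row = Record('Sofa', 'Living room', '2017-01-08')

# Option 2: let namedtuple rename anything invalid to _<position>.
Renamed = namedtuple('Renamed', header, rename=True)
```

With the first option the fields read as row.Description, row.Location
and row.Move_date. With rename=True the third field becomes '_2', which
is safe but less readable.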

Thanks Steven. I'm comfortable with the things I've already learned and
used a lot, but I'm a new enough python coder that I still have to thrash
around with something new before I understand it. I hope I get to a point
where I'll be more systematic in learning complex new things without a
professor to tell me how to do it, but that hasn't quite happened yet.

Deborah



