Using namedtuples field names for column indices in a list of lists

Deborah Swanson python at deborahswanson.net
Mon Jan 9 01:58:34 EST 2017


Peter Otten wrote, on January 08, 2017 5:21 AM
> 
> Deborah Swanson wrote:
> 
> > Peter Otten wrote, on January 08, 2017 3:01 AM
>  
> Personally I would recommend against mixing data (an actual location) and
> metadata (the column name, "Location"), but if you wish my code can be
> adapted as follows:
> 
> import csv
> from collections import namedtuple
> 
> infile = open("dictreader_demo.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]  # header row kept as first record
> records.extend(Record._make(row) for row in rows)

Works like a charm. I stumbled a bit changing all my subscripted
variables to namedtuples and rewriting the inevitable places in my code
that didn't work the same. But actually it was fun, especially deleting
all the sections and variables I no longer needed. And it executes
correctly now too, with recognizable field names instead of my quirky
2-letter code subscripts. All in all a huge win!
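For reference, here is a self-contained sketch of the quoted approach. The inline CSV text and the helper name `load_records` are hypothetical stand-ins for dictreader_demo.csv; `io.StringIO` just lets the example run without a file. As in the quoted code, the header row is kept as the first record. The header names must be valid Python identifiers for `namedtuple` to accept them (or pass `rename=True`).

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for dictreader_demo.csv
SAMPLE = """Location,Description,Price
Oakland,2BR flat,1500
Berkeley,Studio,1200
"""

def load_records(infile):
    """Read CSV rows into namedtuples whose fields come from the header row.

    The header row itself becomes records[0], so records[0].Location
    is the string "Location".
    """
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    records = [Record._make(fieldnames)]
    records.extend(Record._make(row) for row in rows)
    return records

records = load_records(io.StringIO(SAMPLE))
print(records[1].Location)  # Oakland
```

With a real file, `with open("dictreader_demo.csv", newline="") as infile: records = load_records(infile)` also closes the file when done.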

I do have two more questions.

1) I have a section that loops through the sorted data, compares two
adjacent rows at a time, and marks one of them for deletion if the rows
are identical.

I'm using

for i in range(len(records) - 1):
    r1 = records[i]
    r2 = records[i + 1]
    if r1.xx == r2.xx:
        ...  # mark one of the two rows for deletion

My question is whether there's a way to work with two adjacent rows
without using subscripts.
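One common idiom, sketched here under the assumption that the records are already sorted so identical rows are adjacent: `zip(records, records[1:])` yields (previous, current) pairs, so the loop body never indexes `records[i]` or `records[i+1]`. The `Record` fields and the helper name `drop_adjacent_duplicates` are hypothetical. On Python 3.10+, `itertools.pairwise(records)` produces the same pairs.

```python
from collections import namedtuple

# Hypothetical two-field record type for illustration
Record = namedtuple("Record", ["Description", "Price"])

def drop_adjacent_duplicates(records):
    """Keep the first record of each run of identical adjacent records."""
    if not records:
        return []
    kept = [records[0]]
    # Each iteration sees one adjacent pair; no subscripts in the body.
    for prev, cur in zip(records, records[1:]):
        if cur != prev:  # namedtuples compare field by field, like tuples
            kept.append(cur)
    return kept

data = [Record("Studio", "1200"), Record("Studio", "1200"), Record("Loft", "1800")]
print(drop_adjacent_duplicates(data))
```

Building a new list of kept records avoids a separate "marked for deletion" pass; comparing one field instead (`cur.xx != prev.xx`) works the same way.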

Even better, is there a way to get hold of all the records with the same
Description as the current row, compare them all, mark all but the
distinct ones for deletion, and then resume processing with the record
after the last one in that group?
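A possible sketch of that "whole group at a time" idea uses `itertools.groupby` with `operator.attrgetter("Description")` as the key. `groupby` only groups *consecutive* records with equal keys, so this assumes the data is already sorted (or at least clustered) by Description. When one group is finished, the loop simply continues with the first record of the next group. The field names and the helper `dedupe_by_description` are hypothetical.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical record type for illustration
Record = namedtuple("Record", ["Description", "Price", "Location"])

def dedupe_by_description(records):
    """Within each run of records sharing a Description, keep only distinct records."""
    kept = []
    for _, group in groupby(records, key=attrgetter("Description")):
        seen = set()
        for rec in group:
            # namedtuples are hashable when their field values are
            if rec not in seen:
                seen.add(rec)
                kept.append(rec)
    return kept

data = [
    Record("Studio", "1200", "Oakland"),
    Record("Studio", "1200", "Oakland"),
    Record("Studio", "1300", "Oakland"),
    Record("Loft", "1800", "Berkeley"),
]
print(dedupe_by_description(data))
```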

2) I'm using mergesort. (I didn't see any way to sort a list of
namedtuples in the docs.) In the list version of my code I copied the 2
columns I wanted to sort by, inserted the copies at the beginning of
each row, and then deleted them after the list was sorted. But looking
at records, I'm not so sure that can easily be done. I remember your
code for working with columns of the data:

columnA = [record.A for record in records]

and I can see how that would get me columnA and columnB, but then is
there any better way to insert and delete columns in an existing
namedtuple than slicing? And I don't think you can insert or delete a
whole column while slicing.
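Because namedtuples are immutable, "inserting" or "deleting" a column in place isn't possible; one approach is to build a new namedtuple type and rebuild every record from it. A minimal sketch, assuming all records share one type; the helper names `select_columns` and `add_column` are hypothetical. It relies on the documented `_fields` attribute and `_make` class method.

```python
from collections import namedtuple

# Hypothetical record type for illustration
Record = namedtuple("Record", ["A", "B", "C"])

def select_columns(records, names, typename="Selected"):
    """Rebuild records keeping only the fields in `names`, in that order.

    Covers "deleting" columns (leave them out) and reordering them.
    """
    New = namedtuple(typename, names)
    return [New._make(getattr(rec, name) for name in names) for rec in records]

def add_column(records, name, values, typename="Extended"):
    """Rebuild records with one extra trailing field, one value per record."""
    if not records:
        return []
    New = namedtuple(typename, records[0]._fields + (name,))
    # rec + (value,) concatenates to a plain tuple, which _make accepts
    return [New._make(rec + (value,)) for rec, value in zip(records, values)]

rows = [Record(1, 2, 3), Record(4, 5, 6)]
print(select_columns(rows, ["C", "A"]))
print(add_column(rows, "D", [7, 8]))
```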

Or maybe my entire approach is not the best. I know it's possible to do
keyed sorts, but I haven't actually written or used any. So I just
pulled a mergesort off the shelf and got what I wanted by inserting
copies of those 2 columns at the front, and then deleting them when the
sort was complete. Not exactly elegant, but it works.
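For comparison, a keyed sort makes the temporary columns unnecessary: `sorted()` and `list.sort()` accept a `key` function, and `operator.attrgetter` with several field names returns a tuple, so ties on the first field are broken by the second. Python's built-in sort is also stable. This sketch assumes the header row is kept as `records[0]`, as in the quoted code, so only `records[1:]` is sorted; the field names and data are hypothetical.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type; records[0] is the header row
Record = namedtuple("Record", ["Description", "Location", "Price"])

records = [
    Record("Description", "Location", "Price"),
    Record("Studio", "Oakland", "1200"),
    Record("Loft", "Berkeley", "1800"),
    Record("Studio", "Berkeley", "1100"),
]

# Sort the data rows by Description, then Location; the header stays first.
records[1:] = sorted(records[1:], key=attrgetter("Description", "Location"))

# CSV fields are strings, so convert inside the key for a numeric order.
by_price = sorted(records[1:], key=lambda r: float(r.Price))

for rec in records:
    print(rec.Description, rec.Location, rec.Price)
```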

Any suggestions would be most welcome. 




More information about the Python-list mailing list