Using namedtuples field names for column indices in a list of lists

Deborah Swanson python at deborahswanson.net
Mon Jan 9 16:40:47 EST 2017


Peter Otten wrote, on January 09, 2017 6:51 AM
> 
> Deborah Swanson wrote:
> 
> > Even better, to get hold of all the records with the same Description
> > as the current row, compare them all, mark all but the different ones
> > for deletion, and then resume processing the records after the last
> > one?
> 
> When you look at all fields for deduplication anyway there's no need to
> treat one field (Description) specially. Just
> 
> records = set(records)

I haven't worked with sets before, so this would be a good time to
start.
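
For reference, a minimal sketch of what set() does with namedtuple
records. The Record type, its three fields, and the sample rows are
made up for illustration; the real records have more fields.

```python
from collections import namedtuple

# Hypothetical record layout, for illustration only.
Record = namedtuple("Record", ["Description", "Date", "Price"])

records = [
    Record("2 bedroom flat", "2017-01-03", "900"),
    Record("Studio", "2017-01-05", "650"),
    Record("2 bedroom flat", "2017-01-03", "900"),  # exact duplicate
]

# Namedtuples hash and compare like plain tuples, so a set keeps one
# copy of each distinct record. The order of the result is arbitrary.
unique = set(records)
```

This only works when every field value is hashable (strings and numbers
are), and two records count as duplicates only if all fields are equal.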

> should be fine. As the initial order is lost* you probably want to sort
> afterwards. The code then becomes
> 
> records = sorted(
>     set(records), 
>     key=operator.attrgetter("Description")
> )

Good, this is confirmation that 'sorted()' is the way to go. I want a
two-key sort, on Description and then Date, but I think I can figure out
how to do that.
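
One possible way to get a two-key sort, sketched under the assumption
that Date is stored as an ISO-style string (so that string order matches
date order). The Record type and sample rows are hypothetical.

```python
import operator
from collections import namedtuple

# Hypothetical two-field record, for illustration only.
Record = namedtuple("Record", ["Description", "Date"])

records = [
    Record("Studio", "2017-01-05"),
    Record("2 bedroom flat", "2017-01-07"),
    Record("2 bedroom flat", "2017-01-03"),
    Record("Studio", "2017-01-05"),  # exact duplicate
]

# attrgetter() with two names returns a (Description, Date) tuple for
# each record, so records sort by Description first and Date second.
records_sorted = sorted(
    set(records),
    key=operator.attrgetter("Description", "Date")
)
```

If the dates are in another format (for example "1/9/2017"), they would
need to be parsed into date objects before the sort order is meaningful.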


> Now if you want to fill in missing values, you should probably do this
> before deduplication

That's how my original code was written, to fill in missing values as
the very last thing before saving to csv.

> -- and the complete() function introduced in
> https://mail.python.org/pipermail/python-list/2016-December/717847.html
> can be adapted to work with namedtuples instead of dicts.

Ah, your defaultdict suggestion. Since my original comprows() function
to fill in missing values is now broken after the rest of the code was
rewritten for namedtuples (I just commented it out to test the
namedtuples version), this would be a good time to look at defaultdict.
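
A rough sketch of how filling in missing values might look with a
defaultdict and namedtuples. This is not the complete() function from
the linked post; fill_missing(), the Record type, and the rule "copy the
first non-empty value seen for the same Description" are all assumptions
made for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical record layout, for illustration only.
Record = namedtuple("Record", ["Description", "Date", "Price"])


def fill_missing(records):
    """Fill empty fields from other records with the same Description."""
    # known[description][field] holds the first non-empty value seen.
    known = defaultdict(dict)
    for rec in records:
        for field, value in zip(rec._fields, rec):
            if value:
                known[rec.Description].setdefault(field, value)

    filled = []
    for rec in records:
        # Only fields that are empty in this record get a new value.
        updates = {
            field: known[rec.Description][field]
            for field, value in zip(rec._fields, rec)
            if not value and field in known[rec.Description]
        }
        # Namedtuples are immutable, so _replace() builds a new record.
        filled.append(rec._replace(**updates))
    return filled


records = [
    Record("Studio", "2017-01-05", ""),
    Record("Studio", "2017-01-06", "650"),
]
filled = fill_missing(records)
```

Here the first record picks up the Price "650" from the second one, and
the second record is left unchanged.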

> (*) If you want to preserve the initial order you can use a 
> collections.OrderedDict instead of the set.

OrderedDict is another thing I haven't used, but would love to, so I
think I'll try both the set and the OrderedDict, and see which one is
best here.
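
A minimal sketch of the order-preserving variant, assuming the records
are hashable namedtuples. The Record type and sample rows are
hypothetical.

```python
from collections import OrderedDict, namedtuple

# Hypothetical two-field record, for illustration only.
Record = namedtuple("Record", ["Description", "Date"])

records = [
    Record("Studio", "2017-01-05"),
    Record("2 bedroom flat", "2017-01-03"),
    Record("Studio", "2017-01-05"),  # exact duplicate
]

# fromkeys() uses each record as a key; a repeated key keeps the
# position of its first occurrence, so the original order survives.
unique_in_order = list(OrderedDict.fromkeys(records))
```

Unlike the set version, no sort is needed afterwards if the original
row order is the one wanted.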

Thanks again Peter, all your help is very much appreciated.

Deborah



