Using namedtuples field names for column indices in a list of lists

Sun Jan 8 08:53:15 EST 2017

Peter Otten wrote, on January 08, 2017 5:21 AM
> 
> Deborah Swanson wrote:
> 
> > Peter Otten wrote, on January 08, 2017 3:01 AM
> >> 
> >> Deborah Swanson wrote:
> >> 
> >> > to do that is with .fget(). Believe me, I tried every possible way to
> >> > use instance.A or instance[1] and no way could I get
> >> > ls[instance.A].
> >> 
> >> Sorry, no.
> > 
> > I quite agree, I was describing the dead end I was in from peeling the
> > list of data and the namedtuple from the header row off the csv
> > separately. That was quite obviously the wrong path to take, but I
> > didn't know what a good way would be.
> > 
> >> To get a list of namedtuple instances use:
> >> 
> >> rows = csv.reader(infile)
> >> Record = namedtuple("Record", next(rows))
> >> records = [Record._make(row) for row in rows]
> > 
> > This is slightly different from Steven's suggestion, and it makes a
> > block of records that I think would be iterable. At any rate all the
> > data from the csv would belong to a single data structure, and that
> > seems inherently a good thing.
> > 
> > a = records[i].A , for example
> > 
> > And I think that this would produce recognizable field names in my 
> > code (which was the original goal) if the following works:
> > 
> > records[0] is the header row == ('Description', 'Location', etc.)
> 
> Personally I would recommend against mixing data (an actual location) and
> metadata (the column name, "Location"), but if you wish my code can be
> adapted as follows:
> 
> infile = open("dictreader_demo.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)

Peter, this looks really good, and yes, I didn't feel so good about 
records[i].Location either, but it was the only way I could see to get
the recognizable variable names I want. By extending records from a
namedtuple of field names, I think it can be done cleanly. I'll try it
and see.
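
For anyone following along, here is a minimal self-contained sketch of the
adapted approach quoted above. It is only an illustration under a few
assumptions: SAMPLE_CSV and load_records are hypothetical names, the data is
read from an in-memory string rather than dictreader_demo.csv, and the sample
rows are borrowed from the places.csv session quoted further down.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for the real csv file.
SAMPLE_CSV = """Location,Description,Size
here,something,17
there,something else,10
"""


def load_records(infile):
    """Return a list of Record namedtuples; records[0] holds the field names."""
    rows = csv.reader(infile)
    fieldnames = next(rows)                    # header row of the csv
    Record = namedtuple("Record", fieldnames)
    records = [Record._make(fieldnames)]       # header stored as the first record
    records.extend(Record._make(row) for row in rows)
    return records


records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0].Location)   # Location  (the header row holds the field name)
print(records[1].Location)   # here      (first data row)
```

Note that csv.reader returns every value as a string, so records[1].Size is
'17' rather than 17 until it is converted.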

> If you want a lot of flexibility without doing the legwork yourself you
> might also have a look at pandas. Example session:
> 
> $ cat places.csv
> Location,Description,Size
> here,something,17
> there,something else,10
> $ python3
> Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
> [GCC 4.8.4] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pandas
> >>> places = pandas.read_csv("places.csv")
> >>> places
>   Location     Description  Size
> 0     here       something    17
> 1    there  something else    10
> 
> [2 rows x 3 columns]
> >>> places.Location
> 0     here
> 1    there
> Name: Location, dtype: object
> >>> places.sort(columns="Size")
>   Location     Description  Size
> 1    there  something else    10
> 0     here       something    17
> 
> [2 rows x 3 columns]
> >>> places.Size.mean()
> 13.5
> 
> Be aware that there is a learning curve...

Yes, and I'm sure the learning curve is steep. I watched a webinar on
pandas about a year ago, not to actually learn it, but just to take in
the big picture and see something people were really accomplishing with
Python. I won't take this on right away, but I'll definitely keep it
and work with it sometime, maybe as an intro to pandas using my data
from the real estate project.

> > If I can use records[i].Location for the Location column data in row
> > 'i', then I've got my recognizable-field-name variables.
> > 
> >> If you want a column from a list of records you need to extract it 
> >> manually:
> >> 
> >> columnA = [record.A for record in records]
> > 
> > This is very neat. Something like a list comprehension for named 
> > tuples?
> > 
> > Thanks Peter, I'll try it all tomorrow and see how it goes.
> > 
> > PS. I haven't forgotten your defaultdict suggestion, I'm just taking
> > the suggestions I got in the "Cleaning up Conditionals" thread one at
> > a time, and I will get to defaultdict. Then I'll look at all of them
> > and see what final version of the code will work best with all the
> > factors to consider.
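
As a small illustration of the column extraction quoted above, here is a
sketch that assumes the records list keeps the header row in records[0], as
in the adapted layout. The Record fields and sample values are hypothetical
stand-ins taken from the places.csv example. With the header stored alongside
the data, it has to be skipped whenever a column is pulled out.

```python
from collections import namedtuple

# Hypothetical record type and rows mirroring the places.csv example.
Record = namedtuple("Record", ["Location", "Description", "Size"])
records = [
    Record._make(Record._fields),            # header row kept as records[0]
    Record("here", "something", "17"),
    Record("there", "something else", "10"),
]

# Slice off records[0] so the field names do not end up in the column.
locations = [record.Location for record in records[1:]]

# Values read from a csv are strings, so convert before doing arithmetic.
sizes = [int(record.Size) for record in records[1:]]

print(locations)                 # ['here', 'there']
print(sum(sizes) / len(sizes))   # 13.5
```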