Grouping on and exporting to csv files

Peter Otten __peter__ at web.de
Wed Mar 20 03:21:40 EDT 2013


accessnewbie at gmail.com wrote:

> My current Test_Sort.csv looks like this (which constantly
> changes-although never more than 4 records, thus far anyway):
> 
> RecNo,Count,District,Fruit,StoreNo
> 1,100,0,oranges,1254
> 2,30,3,apples,654
> 3,100,0,bananas,21
> 4,0,4,grapes,478
> 
> I wrote the following python 2.7 script (stuck with 2.7)
> 
> ---------------------
> import sys
> import csv
> import operator
> 
> 
> source = "E:\\Data\\Test_Sort.csv"
> result = "E:\\Data\\Test_Sort1.csv"
> 
> with open(source,"rb") as source:
>     rdr= csv.reader( source )
>     with open(result,"wb") as result:
>         next(rdr, None) # Skip header
>         wtr= csv.writer( result )
>         for r in rdr:
>             wtr.writerow( (r[2], r[3])) # Only write columns 3 and 4 to a
>             csv file
>     exit
> exit
> 
> data = csv.reader(open("E:\\Data\\Test_Sort1.csv"),delimiter=',')
> sortedlist = sorted(data, key=operator.itemgetter(0,1)) # sort by column
> 1, the 2 print sortedlist
> 
> -----------------------------
> 
> My sortedlist results are:
> 
> [['0', 'bananas'], ['0', 'oranges'], ['3', 'apples'], ['4', 'grapes']]
> 
> I want to output all fruits with the same District number out to a csv
> file called by its District number. (No headers)
> 
> Thus in above example I would ultimately have three csv files:
> 
> 0.csv
> bananas
> oranges
> 
> 3.csv
> apples
> 
> 4.csv
> grapes
> 
> Question is : Any ideas as to how export this sortedlist to the tables
> described above?

The standard library offers itertools.groupby()

for district, group in itertools.groupby(
        sortedlist, key=operator.itemgetter(0)):
    with open(district + ".csv", "wb") as outfile:
        csv.writer(outfile).writerows(group)

If you want try to do it by hand as an exercise -- you need a variable to 
keep track of the current district:

last_district = None
for row in sortedlist:
    district = row[0]
    if last_district != district:
        # open new file/writer
        last_district = district
    # write row

> I already have a script that already will look at each file and count the
> number of lines and insert the correct number of commas and the word "and"
> where appropriate in a sentence.
> 
> My ultimate goal is to read each file and print out a sentence based on
> whether it has record or not, and if so how many records and then add the
> word "and" and/or commas between records if greater than 1. I will then
> concatenate all the sentences together to form a paragraph.
> 
> Thus, the above example would say:
> 
> The western district sold bananas and oranges. The central district sold
> apples. The southern district sold grapes.

Have a look at str.join():

 >>> colors = ["red", "yellow", "blue"]
>>> print "Who's afraid of {}, and {}?".format(", ".join(colors[:-1]), 
colors[-1])
Who's afraid of red, yellow, and blue?

> or if the files were:
> 
> 0.csv
> apples
> bananas
> oranges
> 
> 2.csv
> grapes
> 
> The resulting paragraph would read:
> 
> The western district sold apples, bananas, and oranges. The eastern
> district sold grapes.
> 
> It may seem like a crazy way about getting my results but it was the best
> I could come up with given my limited knowledge of python.

No, it is what experienced programmers do all the time -- break a complex 
task into manageable steps, come up with a simple solution for each step, 
and then improve these steps as needed.




More information about the Python-list mailing list