easy questions from python newbie

Mon Jul 24 01:52:07 EDT 2006

In <1153701209.069639.199860 at i42g2000cwa.googlegroups.com>, walterbyrd
wrote:

> This is the first real python program I have ever worked on. What I
> want to do is:
> 1) count identical records in a csv file
> 2) create a new file with quantities instead of duplicate records
> 3) open the new file in ms-excel
> 
> For example, I will start with a file like:
> 
> 1001
> 1012
> 1008
> 1012
> 1001
> 1001
> 
> and finish with a file like:
> 
> 1001,3
> 1008,1
> 1012,2
> 
> What I need to know:
> 1) Is there a function in python that will sort a file into another
> file? Something like:
> sort file1.txt > file2.txt from the DOS command line. I know there is
> also a similar "sort" function in Unix.

Lists have a `sort()` method, so there is no need for temporary files.
Just read the first file into a list and sort it.
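
A minimal sketch of that idea, assuming one record per line as in the
sample data. The helper name `sorted_lines` and the file name `file1.txt`
are made up for illustration, and the demo writes the sample to a
temporary directory so it can be run as-is:

```python
import os
import tempfile


def sorted_lines(path):
    """Return the non-empty lines of the file at `path` as a sorted list."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    lines.sort()  # sorts the list in place; no temporary file involved
    return lines


# Demo: write the sample data from the question to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file1.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("1001\n1012\n1008\n1012\n1001\n1001\n")
    records = sorted_lines(path)

print(records)
```

Note that the records are compared as strings here. That is fine for
equal-length IDs like these, but numbers of different lengths would need
converting with `int()` before sorting.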

> 3) I will probably be working with 50 items or fewer. Would it be best
> for me to do this with a multi-dimensional array? For example: sort the
> file, read a rec into the array, if the next rec is the same then incr
> the count, otherwise add a new rec with a count of 1. Then write the
> array to a file?

I would read the file into a list of lists via the `csv` module; that's
what comes closest to a multidimensional array.  Sort that (outer) list so
identical rows end up next to each other, and then use
`itertools.groupby()` to group the identical lists (it only groups
adjacent items).  You can write the rows with the `csv` module again.
Short example:

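import csv
from itertools import groupby

# Read all rows; each row is a list of strings.
# (Python 3: open csv files in text mode with newline=''.)
with open('test.csv', newline='') as in_file:
    data = list(csv.reader(in_file))

# groupby() only groups adjacent items, so sort first.
data.sort()

with open('test2.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for row, identical_rows in groupby(data):
        # Write the row followed by its number of occurrences.
        writer.writerow(row + [len(list(identical_rows))])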
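
To see what the grouping step produces without touching any files, here
is a small in-memory sketch using the sample data from the question.  The
variable names `rows` and `counted` are just for illustration, and each
result is built as a new list (`key + [count]`) rather than by appending
to the row:

```python
from itertools import groupby

# The sample records, shaped the way csv.reader would return them:
# one list of strings per line.
rows = [['1001'], ['1012'], ['1008'], ['1012'], ['1001'], ['1001']]

# groupby() only merges adjacent equal items, so sort first.
rows.sort()

# For each run of identical rows, emit the row plus its count.
counted = [key + [len(list(group))] for key, group in groupby(rows)]

print(counted)  # [['1001', 3], ['1008', 1], ['1012', 2]]
```

Without the `rows.sort()` call, the two separated `1012` records would
show up as two groups of one instead of one group of two.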

Ciao,
	Marc 'BlackJack' Rintsch