Perl and Python, a practical side-by-side example.

Jussi Salmela tiedon_jano at hotmail.com
Sat Mar 3 12:53:01 EST 2007


Shawn Milo wrote:
> <snip> 
> I am not looking for the smallest number of lines, or anything else
> that would make the code more difficult to read in six months. Just
> any instances where I'm doing something inefficiently or in a "bad"
> way.
> 
> I'm attaching both the Perl and Python versions, and I'm open to
> comments on either. The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.
> 

I don't know if this attempt satisfies your criteria but here goes!

This is not a rewrite of your program; I wrote it from your problem
description above. I have left out reading the data because it has
little to do with the problem itself.
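
For completeness, the omitted reading step might look roughly like the
sketch below. It assumes one tab-separated record per line on standard
input; the helper name read_rows is hypothetical and is not taken from
the original scripts.

```python
import sys

def read_rows(stream=None):
    """Split each tab-separated line of `stream` into a list of fields.

    `stream` is any iterable of lines; it defaults to standard input.
    """
    if stream is None:
        stream = sys.stdin
    rows = []
    for line in stream:
        line = line.rstrip('\n')
        if line:  # skip blank lines
            rows.append(line.split('\t'))
    return rows
```

The resulting list of rows has the same shape as the split data used in
the code below.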

#============================================================
input = [
     "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
     "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
     "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
     "aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
     "aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
     "ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
     "ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
     "ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
     "ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
     "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
     "bbb\tAAA\t...\t...\t...\t20070101\tAAA\n",
     "bbb\tAAA\t...\t...\t...\t20071212\tAAA\n",
     "bbb\tAAA\t...\t...\t...\t20070612\tAAA\n",
     "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
     ]

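# Split each line into its tab-separated fields.
records = [x.rstrip('\n').split('\t') for x in input]

# Group the rows by ID (index 0).
recs = {}
for row in records:
    recs.setdefault(row[0], []).append(row)

for key in sorted(recs):
    rows = recs[key]
    # Newest expiration date (index 5) first; the sort is stable.
    rows.sort(key=lambda x: x[5], reverse=True)
    # Take the first row whose state (index 1) matches the desired state (index 6).
    for current in rows:
        if current[1] == current[6]:
            break
    else:
        # No row matched, so fall back to the newest one.
        current = rows[0]
    print('\t'.join(current))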
#============================================================


The output is:

aaa	AAA	...	...	...	20070120	AAA
bbb	AAA	...	...	...	20071212	AAA
ccc	AAA	...	...	...	20071212	AAA

and it is the same as the output of your original code on this data.
Further testing would naturally be beneficial.
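
As one way to start on that testing, the selection logic can be wrapped
in a function and checked with a few assertions. This is only a sketch:
the names best_records and make_row are hypothetical, the sample rows
are made up for the test, and it assumes the dates are YYYYMMDD strings
so that plain string comparison orders them correctly.

```python
def best_records(rows):
    """Return {piid: best row} for rows given as lists of at least 7 fields."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)

    best = {}
    for piid, group in groups.items():
        # Newest expiration date first; sorted() is stable, so rows with
        # equal dates keep their input order.
        ordered = sorted(group, key=lambda r: r[5], reverse=True)
        # Prefer rows whose state (index 1) equals the desired state (index 6).
        matching = [r for r in ordered if r[1] == r[6]]
        best[piid] = matching[0] if matching else ordered[0]
    return best


def make_row(piid, state, date, wanted):
    """Build a 7-field row with placeholder values in the unused fields."""
    return [piid, state, '...', '...', '...', date, wanted]


sample = [
    make_row('aaa', 'AAA', '20071212', 'BBB'),  # newest, but wrong state
    make_row('aaa', 'AAA', '20070120', 'AAA'),  # newest matching row
    make_row('aaa', 'AAA', '20070101', 'AAA'),
    make_row('ddd', 'AAA', '20070101', 'BBB'),  # no match for this ID
    make_row('ddd', 'AAA', '20070301', 'BBB'),
]

result = best_records(sample)
assert result['aaa'][5] == '20070120'  # matching state wins over a newer date
assert result['ddd'][5] == '20070301'  # no match, so the newest date wins
```

The two assertions cover the two branches of the rule in your
description: a matching record beats a newer non-matching one, and when
nothing matches the newest record is taken.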

Cheers,
Jussi


