Perl and Python, a practical side-by-side example.

Paul Rubin http
Fri Mar 2 21:06:02 EST 2007


Here's my version (not tested much).  Main differences from yours:

1. It defines a Python class to hold row data, and defines the __cmp__ operation
on the class, so given two Row objects r1,r2, you can say simply
   if r1 > r2: ...
to see which is "better".  

2. Instead of reading all the rows into memory and then scanning the
list of records of each piid, it simply remembers the best it has seen
for each piid.

By putting the "better than" logic into the class definition, the main
loop becomes very simple.  It does parse out and store fields on the
Row objects, which costs some extra memory, but you could eliminate that
at the cost of a little code and speed by re-parsing as needed in the
comparison function.
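
For what it's worth, the re-parsing trade-off mentioned above might look
roughly like the sketch below.  It is written in Python 3 syntax and keeps
only the raw line for each piid.  The names rank and best_rows are made up
for illustration, and the ranking rule is an assumption: a row whose state
matches its desired state beats one that doesn't, ties go to the later
expiration date, and dates are assumed to sort correctly as plain strings.

```python
def rank(line):
    # Hypothetical helper: re-parse the raw line every time it is compared,
    # so nothing but the line itself has to be stored.
    # Assumed rule: state == desired_state wins first, then the later
    # expiration date (compared as a string).
    fields = line.split('\t')
    return (fields[1] == fields[6], fields[5])


def best_rows(lines):
    # Remember only the best raw line seen so far for each piid (field 0).
    best = {}
    for line in lines:
        line = line.rstrip('\n')
        piid = line.split('\t', 1)[0]
        if piid not in best or rank(line) > rank(best[piid]):
            best[piid] = line
    return best
```

Each comparison now splits two lines again, so this saves memory at the
price of some speed, as noted above.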

================================================================

#! /usr/bin/env python

import sys

class Row:
    def __init__(self, row):
        self.row = row.rstrip('\n')
        fields = self.row.split('\t')
        self.piid = fields[0]
        self.state = fields[1]
        self.expiration_date = fields[5]
        self.desired_state = fields[6]

    def __cmp__(self, other):
        # return +1 if self is better than other, -1 if other is better
        # than self, or 0 if they are equally good
        if self.state == self.desired_state:
            if other.state != other.desired_state:
                return 1
            return cmp(self.expiration_date, other.expiration_date)
        elif other.expiration_date > self.expiration_date:
            # other record is better only if its exp date is newer,
            # so report self as the lesser of the two
            return -1
        return 0
    
best = {}
input = sys.stdin

for row in input:
    r = Row(row)
    if r.piid not in best or r > best[r.piid]:
        best[r.piid] = r

for piid,r in best.iteritems():
    print r.row
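
The script above relies on Python 2 features (__cmp__, cmp(), iteritems()
and the print statement), none of which exist in Python 3.  Here is a rough
sketch of the same idea in Python 3 syntax, not a drop-in replacement.  The
_key method, pick_best and main are names invented here, and the ordering
is an assumption about the intended rule: rows whose state matches the
desired state rank first, then the later expiration date wins.

```python
import sys


class Row:
    def __init__(self, row):
        self.row = row.rstrip('\n')
        fields = self.row.split('\t')
        self.piid = fields[0]
        self.state = fields[1]
        self.expiration_date = fields[5]
        self.desired_state = fields[6]

    def _key(self):
        # Assumed ranking: matching the desired state counts most,
        # then the later expiration date (compared as a string).
        return (self.state == self.desired_state, self.expiration_date)

    def __gt__(self, other):
        # Python 3 ignores __cmp__, so define the one operator the loop uses.
        return self._key() > other._key()


def pick_best(lines):
    # Remember only the best Row seen so far for each piid.
    best = {}
    for line in lines:
        r = Row(line)
        if r.piid not in best or r > best[r.piid]:
            best[r.piid] = r
    return best


def main(stream=sys.stdin):
    for r in pick_best(stream).values():
        print(r.row)
```

Calling main() at the bottom of the file reads rows from standard input,
the same way the original script does.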


