Perl and Python, a practical side-by-side example.

bearophileHUGS at lycos.com
Fri Mar 2 19:07:03 EST 2007


A few suggestions, some important, some less so. All of them are
untested.


Use 4 spaces to indent.


If you want to speed up this code you can move it inside a function.
After that, if you want to make it even faster you can use Psyco too.
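Moving the code into a function helps because names inside a function
are local, and local lookups are cheaper than module-level ones. A
minimal sketch of the shape I mean; count_rows_per_piid is a made-up
name, and the counting is only a stand-in for your real per-row work:

```python
def count_rows_per_piid(lines):
    """Count rows per first tab-separated field (illustrative stand-in)."""
    counts = {}  # local name: cheaper to look up than a module-level one
    for row in lines:
        # Only the first field is needed, so split at most once.
        piid = row.split("\t", 1)[0]
        counts[piid] = counts.get(piid, 0) + 1
    return counts
```

Since the function accepts any iterable of lines, you can pass it
sys.stdin directly, or a plain list when testing.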


How are the dates represented? How do you test which one is older?


You seem to compute current.split("\t") and best.split("\t") many
times, so you can compute each of them only once.
You can keep both best and best_splitted.
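A sketch of keeping the split fields next to the row. I don't know your
real selection rule, so this one is an assumption: it keeps the row
whose field 5 is the largest, comparing as strings (which only orders
dates correctly if they are stored like 2007-03-02). pick_best is a
made-up name:

```python
def pick_best(rows):
    """Return the row with the largest field 5, splitting each row once."""
    best = None
    best_splitted = None
    for current in rows:
        current_splitted = current.split("\t")  # split once per row
        # Assumed rule: field 5 holds a date that sorts correctly as text.
        if best is None or current_splitted[5] > best_splitted[5]:
            best = current
            best_splitted = current_splitted
    return best
```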


You can split the last line:
if best.split("\t")[1] != best.split("\t")[6] and \
   current.split("\t")[5] > best.split("\t")[5]:


input() is a builtin function, so you may want to use a different
name, or just:
for row in sys.stdin:


Instead of:
row = row.rstrip('\n')
You can often use just:
row = row.strip()
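One thing to check before switching: strip() removes all leading and
trailing whitespace, tabs included, so empty fields at the end of a
tab-separated row disappear. A small demonstration on a made-up record:

```python
row = "42\tfoo\t\t\n"  # made-up record whose last two fields are empty

# rstrip('\n') removes only the newline, so the empty fields survive.
fields_rstrip = row.rstrip("\n").split("\t")

# strip() also removes the trailing tabs, so the empty fields are lost.
fields_strip = row.strip().split("\t")

print(fields_rstrip)
print(fields_strip)
```

If your rows never start or end with an empty field, the two behave the
same.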


Instead of:
piid = row.split('\t')[0]
You can probably use (test it):
piid = row.split('\t', 1)[0]
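The second argument of split() is the maximum number of splits, so the
rest of the row is left in one piece instead of being cut at every tab.
On a made-up row:

```python
row = "42\tfoo\tbar"  # made-up record

all_fields = row.split("\t")     # cuts at every tab
head_rest = row.split("\t", 1)   # cuts at the first tab only
piid = head_rest[0]

print(all_fields)
print(head_rest)
```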


Instead of:
if recs.has_key(piid) is False:
Better:
if piid not in recs:


Instead of (on Python 2.5):
recs = {}
for row in input:
    row = ...
    piid = ...
    if recs.has_key(piid) is False:
        recs[piid] = []
    recs[piid].append(row)
You can probably use:
from collections import defaultdict
recs = defaultdict(list)
for row in input:
    row = ...
    piid = ...
    recs[piid].append(row)
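A runnable version of the same idea, with the elided parts filled in as
I guess them (first field as the key, newline removed); group_by_piid is
a made-up name:

```python
from collections import defaultdict


def group_by_piid(rows):
    """Group rows by their first tab-separated field."""
    recs = defaultdict(list)  # a missing key starts as an empty list
    for row in rows:
        row = row.rstrip("\n")
        piid = row.split("\t", 1)[0]
        recs[piid].append(row)
    return recs
```

Keep in mind that merely reading recs[key] for an unknown key creates an
empty entry, so test membership with "in" if you only want to look.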


Instead of:
for piid in recs.keys():
You can use this, lazily:
for piid in recs:


Instead of:
for piid in recs.keys():
    best = ""
    for current in recs[piid]:
You can probably use:
for piid, piid_recs in recs.iteritems():
    best = ""
    for current in piid_recs:
But your version may be a little faster anyway.
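A small sketch of iterating over keys and values together, on made-up
data. It is written with items() so it also runs on Python 3, where
iteritems() no longer exists; on Python 2 you would use iteritems() as
above:

```python
recs = {"a": ["a\t1", "a\t2"], "b": ["b\t9"]}  # made-up grouped rows

sizes = {}
for piid, piid_recs in recs.items():  # recs.iteritems() on Python 2
    # piid_recs is already the list, no second lookup recs[piid] needed.
    sizes[piid] = len(piid_recs)

print(sizes)
```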


Instead of:
best = ""
for current in recs[piid]:
    if best == "":
        best = current;
You may want to use the singleton None:
best = None
for current in recs[piid]:
    if best is None:
        best = current


Note that to read such files you may use the csv module (present in
the standard library).
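A minimal sketch of reading tab-separated rows with csv.reader, using an
in-memory file with made-up data in place of your real input. Written
for Python 3; I'm assuming your fields contain no quote characters,
which the csv module treats specially by default:

```python
import csv
import io

# Stand-in for the real input file, with made-up records.
data = io.StringIO("42\tfoo\tbar\n43\tbaz\tqux\n")

# delimiter="\t" makes the reader split each line on tabs.
rows = list(csv.reader(data, delimiter="\t"))

for fields in rows:
    print(fields[0], fields[1:])
```

Each row comes back already split into a list of fields, with the
newline removed.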

You can try to modify the code and show us the second working version.

Bye,
bearophile



