Perl and Python, a practical side-by-side example.

John Machin sjmachin at lexicon.net
Fri Mar 2 19:33:57 EST 2007


On Mar 3, 9:44 am, "Shawn Milo" <S... at Milochik.com> wrote:
> I'm new to Python and fairly experienced in Perl, although that
> experience is limited to the things I use daily.
>
> I wrote the same script in both Perl and Python, and the output is
> identical. The run speed is similar (very fast) and the line count is
> similar.
>
> Now that they're both working, I was looking at the code and wondering
> what Perl-specific and Python-specific improvements to the code would
> look like, as judged by others more knowledgeable in the individual
> languages.
>
> I am not looking for the smallest number of lines, or anything else
> that would make the code more difficult to read in six months. Just
> any instances where I'm doing something inefficiently or in a "bad"
> way.
>
> I'm attaching both the Perl and Python versions, and I'm open to
> comments on either. The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.
>

[big snip]
Here is my rewrite in what I regard as idiomatic reasonably-modern
Python (OMMV of course). A few of the comments are applicable
irrespective of the language used.

HTH,
John
8<------
#! /usr/bin/env python

### Layout: Use 4-space indent. Don't use tabs.
### Don't exceed 79 chars per line.

import sys

def process_file(opened_file=sys.stdin):
    ### Local variable access is faster than global

    ### input = sys.stdin ### shadowing built_in function "input"

    ### Use names to access elements in rows
    PIID = 0
    ACTUAL_STATE = 1
    DESIRED_STATE = 6
    EXPIRY_DATE = 5

    recs = {}

    for row in opened_file:
        row = row.rstrip('\n').split('\t')
        ### Do the split('\t') *once* per row
        piid = row[PIID]
        ### if recs.has_key(piid) is False:
        ### has_key() is ancient
        ### "if boolean_expression is False" is megabletchworthy;
        ### use "if not boolean_expression"
        if piid not in recs:
            recs[piid] = []
        recs[piid].append(row)

    ### for piid in recs.keys():
    for piid in recs:
        best = None ### use out-of-band sentinel
        for current in recs[piid]:
            if best is None:
                best = current ### had cockroach crap (";") at EOL
            else:
                #If the current record is the correct state
                ### Clear code (like the next line) doesn't need
                ### comments like the above line
                if current[ACTUAL_STATE] == current[DESIRED_STATE]:
                    if best[ACTUAL_STATE] == best[DESIRED_STATE]:
                        if current[EXPIRY_DATE] > best[EXPIRY_DATE]:
                            best = current
                    else:
                        best = current
                else:
                    if (best[ACTUAL_STATE] != best[DESIRED_STATE]
                    and current[EXPIRY_DATE] > best[EXPIRY_DATE]):
                        best = current
        print("\t".join(best))

if __name__ == "__main__":
    ### Standard idiom to avoid executing script content
    ### when/if file is imported
    process_file()
8<----
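
The selection rule from the original question (a record whose state matches the desired state beats one that doesn't; otherwise the newer expiry date wins) can also be written as a sort key and handed to max(). What follows is only a sketch of that idea, not part of the script above. The names rank and best_records are made up for illustration, it assumes Python 3.7 or later (for defaultdict and insertion-ordered dicts), and like the script it assumes the expiry dates compare correctly as plain strings (for example YYYY-MM-DD).

```python
from collections import defaultdict

# Column positions, same as in the script above.
PIID = 0
ACTUAL_STATE = 1
EXPIRY_DATE = 5
DESIRED_STATE = 6


def rank(row):
    """Hypothetical helper: sort key for one row.

    Tuples compare element by element and False < True, so a row whose
    state matches the desired state outranks any row that doesn't, and
    the expiry date only decides between rows in the same group.
    """
    return (row[ACTUAL_STATE] == row[DESIRED_STATE], row[EXPIRY_DATE])


def best_records(lines):
    """Hypothetical helper: yield the best row for each piid."""
    recs = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        row = line.rstrip('\n').split('\t')
        recs[row[PIID]].append(row)
    for rows in recs.values():
        # max() returns the first of several equally ranked rows, so the
        # earliest row wins ties, as a strict ">" comparison would.
        yield max(rows, key=rank)
```

Looping over best_records(sys.stdin) and printing "\t".join(row) for each result should give one line per piid, in the order the piids first appear in the input. Whether this reads better than the explicit nested ifs is a matter of taste; the key function does put the whole rule in one place.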




