Perl and Python, a practical side-by-side example.
Bruno Desthuilliers
bdesth.quelquechose at free.quelquepart.fr
Sun Mar 4 17:07:35 EST 2007
Shawn Milo wrote:
(snip)
> The script reads a file from standard input and
> finds the best record for each unique ID (piid). The best is defined
> as follows: The newest expiration date (field 5) for the record with
> the state (field 1) which matches the desired state (field 6). If
> there is no record matching the desired state, then just take the
> newest expiration date.
>
Here's a version that is fixed with respect to the test data and uses a
somewhat better (and faster) algorithm based on Decorate/Sort/Undecorate
(aka the Schwartzian transform):
import sys
output = sys.stdout
input = [
#ID STATE ... ... ... DATE TARGET
"aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
"aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
"aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
"aaa\tAAA\t...\t...\t...\t20071010\tBBB\n",
"aaa\tAAA\t...\t...\t...\t20071111\tBBB\n",
"ccc\tAAA\t...\t...\t...\t20071201\tBBB\n",
"ccc\tAAA\t...\t...\t...\t20070101\tAAA\n",
"ccc\tAAA\t...\t...\t...\t20071212\tBBB\n",
"ccc\tAAA\t...\t...\t...\t20071212\tAAA\n",
"bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
"bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
"bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
"bbb\tAAA\t...\t...\t...\t20070612\tBBB\n",
"bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]
def find_best_match(input=input, output=output):
    PIID = 0
    STATE = 1
    EXP_DATE = 5
    DESIRED_STATE = 6
    recs = {}
    for line in input:
        line = line.rstrip('\n')
        row = line.split('\t')
        sort_key = (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])
        recs.setdefault(row[PIID], []).append((sort_key, line))
    for decorated_lines in recs.itervalues():
        print >> output, sorted(decorated_lines, reverse=True)[0][1]
Lines are sorted first on whether the state matches the desired state,
then on the expiration date. Since it's a reverse sort, we first have
lines that match (if any) sorted by date descending, then the lines that
don't match sorted by date descending. So in both cases, the 'best match'
is the first item in the list. Then we just have to get rid of the sort
key, et voilà !-)
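To make the ordering concrete, here is a minimal sketch in Python 3 syntax (the code above is Python 2). The names best_records and sample are hypothetical, made up for illustration. It relies on two facts: tuples compare element by element, and True sorts above False. The dates are YYYYMMDD strings, so plain string comparison orders them chronologically. Since only the first item of the sorted list is used, the sketch swaps the full reverse sort for max(), which should pick the same line.

```python
# Python 3 sketch; best_records and sample are illustrative names only.
PIID, STATE, EXP_DATE, DESIRED_STATE = 0, 1, 5, 6


def sort_key(line):
    """Return (state matches desired state, expiration date) for one line."""
    row = line.rstrip('\n').split('\t')
    return (row[STATE] == row[DESIRED_STATE], row[EXP_DATE])


def best_records(lines):
    """Map each ID to its best line, using max() instead of a full sort."""
    groups = {}
    for line in lines:
        line = line.rstrip('\n')
        piid = line.split('\t')[PIID]
        # Decorate: pair each line with its sort key.
        groups.setdefault(piid, []).append((sort_key(line), line))
    # Undecorate: keep only the line from the largest (key, line) pair.
    return {piid: max(decorated)[1] for piid, decorated in groups.items()}


sample = [
    "aaa\tAAA\t...\t...\t...\t20071212\tBBB\n",
    "aaa\tAAA\t...\t...\t...\t20070120\tAAA\n",
    "aaa\tAAA\t...\t...\t...\t20070101\tAAA\n",
    "bbb\tAAA\t...\t...\t...\t20070101\tBBB\n",
    "bbb\tAAA\t...\t...\t...\t20071212\tBBB\n",
]

# Keys for the "aaa" lines in reverse order: matching lines first,
# newest date first within each group.
keys = sorted((sort_key(l) for l in sample if l.startswith("aaa")),
              reverse=True)
best = best_records(sample)
```

For "aaa" the 20070120 line wins even though a newer non-matching line exists; for "bbb" nothing matches, so the newest date wins.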
HTH