Suggest more finesse, please. I/O and sequences.
Qertoip
qer1 at o2.pl
Fri Mar 25 17:06:17 EST 2005
On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:
Thanks for your reply! It was really enlightening.
> How about:
>     for line in inFile:
>         for word in line.split():
>             try:
>                 corpus[word] += 1
>             except KeyError:
>                 corpus[word] = 1
The above is (probably) not efficient when the exception is raised, which
happens the first time each new word is seen. However, I've just read about
the following:
corpus[word] = corpus.setdefault( word, 0 ) + 1
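To compare the two idioms side by side, here is a minimal sketch. The
function names count_try and count_setdefault and the sample text are made
up for illustration; both functions should build the same dictionary.

```python
def count_try(words):
    """Count words with try/except; KeyError fires once per new word."""
    corpus = {}
    for word in words:
        try:
            corpus[word] += 1
        except KeyError:
            corpus[word] = 1
    return corpus


def count_setdefault(words):
    """Count words with setdefault; no exception handling involved."""
    corpus = {}
    for word in words:
        # setdefault returns the stored count, or stores and returns 0
        corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


# Hypothetical sample input, not taken from any real file
sample = "the cat and the hat and the bat".split()
assert count_try(sample) == count_setdefault(sample)
```

Which one is faster probably depends on how often words repeat in the
input, so it is worth timing both on real data before deciding.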
>> wordsLst = wordsDic.items()
>> wordsLst.sort( moreCommonWord )
> OK, here I'm going to get version specific.
> For Python 2.4 and later:
> words = sorted((-freq, word) for word, freq in corpus.iteritems())
This is my favorite! :) You managed to avoid moreCommonWord() through the
clever use of a generator expression and the sequence comparison rules.
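As I understand it, the trick works because tuples compare element by
element: the negated frequency puts the most common words first, and ties
fall back to alphabetical order of the word. A small sketch with made-up
counts (it uses items() instead of iteritems() so that it also runs on
Python versions without iteritems()):

```python
# Hypothetical counts, for illustration only
corpus = {'the': 3, 'and': 2, 'cat': 1, 'bat': 1}

# Tuples are compared by first element, then by second on ties
words = sorted((-freq, word) for word, freq in corpus.items())

# Most frequent first; 'bat' sorts before 'cat' because both have count 1
assert words == [(-3, 'the'), (-2, 'and'), (-1, 'bat'), (-1, 'cat')]
```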
> After python 2.2:
>     for negfrequency, word in words:
>         print >>outFile, '%7d : %s' % (-negfrequency, word)
This is also cool; I didn't know about this kind of 'print' usage.
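For comparison, the same output line can be produced with a plain write()
call on the file object. This is only a sketch: write_report is a
hypothetical helper name, and io.StringIO stands in for the real output
file so the result can be inspected in memory.

```python
import io


def write_report(words, out):
    """Write one 'count : word' line per (negated count, word) pair."""
    for neg_freq, word in words:
        # %7d right-aligns the count in a field seven characters wide
        out.write('%7d : %s\n' % (-neg_freq, word))


# In-memory stand-in for the output file
buf = io.StringIO()
write_report([(-3, 'the'), (-2, 'and')], buf)
report = buf.getvalue()
```

Unlike print, write() does not append a newline by itself, hence the
explicit '\n' in the format string.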
> So, with all my prejudices in place and python 2.4 on my box, I'd
> lift a few things to functions:
While I like your functionality and reusability improvements, I will stick
to my as-simple-as-possible solution for the given requirements (which I
didn't mention, and which assume, for example, correct command-line
arguments).
Therefore, the current code is:
-------------------------------------------------------------------------
import sys

corpus = {}
inFile = open( sys.argv[1] )
for line in inFile:
    for word in line.split():
        corpus[word] = corpus.setdefault( word, 0 ) + 1
inFile.close()

words = sorted( ( -freq, word ) for word, freq in corpus.iteritems() )

outFile = open( sys.argv[2], 'w')
for negFreq, word in words:
    print >>outFile, '%7d : %s' % ( -negFreq, word )
outFile.close()
-------------------------------------------------------------------------
Any ideas how to make it even better? :>
--
Regards,
Piotrek