"Newbie" questions - "unique" sorting ?
Cousin Stanley
CousinStanley at hotmail.com
Tue Jun 24 04:55:30 EDT 2003
| How about the simple approach?
| ...
Kim ...
The approach I used is fairly simple and similar
to the one you posted: it just splits each line
into words and stuffs them into a dictionary,
counting occurrences as it goes ...
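The dictionary approach fits in a few lines. The sketch below is written for Python 3 and is not the script that follows. It uses `dict.get()` with a default of 0, which folds the script's `if ... else` update into a single statement. The function name `count_words` is hypothetical.

```python
def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    dict_words = {}
    for line in lines:
        for word in line.split():
            # get() returns 0 for an unseen word, so no if/else is needed
            dict_words[word] = dict_words.get(word, 0) + 1
    return dict_words


print(count_words(["the cat sat", "the mat"]))
# {'the': 2, 'cat': 1, 'sat': 1, 'mat': 1}
```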
The following script produces an indexed word list
fairly quickly for relatively small files,
but bogged down on the input file John supplied,
which yielded ...
Total Words .... 467381
Unique Words .... 47122
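One likely contributor to the slowdown on a file this size is offered here as a hedged observation. In Python 2, `dict.keys()` returns a new list. The script's membership test `this_word not in dict_words.keys()` therefore builds a list of every word seen so far and scans it linearly, once per word, so the total work grows roughly with (words read) x (vocabulary size). Testing `this_word not in dict_words` uses the dictionary's hash lookup instead. (In Python 3, `keys()` returns a view whose membership test is also a hash lookup.) The Python 3 timing sketch below reproduces the Python 2 behaviour with an explicit `list()` call. The vocabulary is synthetic and only borrows the 47122 size reported above. Absolute timings will vary by machine.

```python
import timeit

# Synthetic vocabulary the size of the reported unique-word count
vocabulary = [f"word{i}" for i in range(47122)]
dict_words = dict.fromkeys(vocabulary, 1)

probe = "not-a-word"  # a miss is the worst case for a linear scan

# list(dict_words) mimics Python 2's dict.keys(): build a list, then scan it
list_time = timeit.timeit(lambda: probe in list(dict_words), number=100)
# Membership on the dict itself is a hash lookup
dict_time = timeit.timeit(lambda: probe in dict_words, number=100)

print(f"list scan:   {list_time:.4f} s for 100 lookups")
print(f"dict lookup: {dict_time:.6f} s for 100 lookups")
```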
Perhaps skipping the dictionary word-count update
in the following lines might speed things up ...
    else :
        dict_words[ this_word ] += 1
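If only the distinct words are needed, skipping the count update amounts to collecting words in a set. Below is a minimal Python 3 sketch, assuming the counts really are not required. The name `unique_words` is hypothetical. Each word still costs one hash operation either way, so the gain from dropping the increment alone is likely modest.

```python
def unique_words(lines):
    """Return the distinct whitespace-separated words, sorted case-insensitively."""
    seen = set()
    for line in lines:
        # set.update() adds every word from the split line; duplicates are ignored
        seen.update(line.split())
    return sorted(seen, key=str.lower)


print(unique_words(["pear apple", "apple fig"]))
# ['apple', 'fig', 'pear']
```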
--
Cousin Stanley
Human Being
Phoenix, Arizona
-------------------------------------------------------------------
'''
Module ........... word_list.py
Usage ............ python word_list.py File_In.txt File_Out.txt
NewsGroup ........ comp.lang.python
Date ............. 2003-06-18
Posted_By ........ John Fitzsimmons
Replies_From ..... [ kpop , Erik Max Francis ]
Coded_By ......... Stanley C. Kitching
'''
import math
import sys
import time
time_in = time.time()
NL = '\n'
module_name = sys.argv[ 0 ]
print '%s %s ' % ( NL , module_name )
path_in = sys.argv[ 1 ]
path_out = sys.argv[ 2 ]
file_in = file( path_in , 'r' )
file_out = file( path_out , 'w' )
word_total = 0
dict_words = {}
print
print ' Indexing Words .... ' ,
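for iLine in file_in :
    # Progress dot whenever the running word count is a multiple
    # of 1000 at the start of a line
    if math.fmod( word_total , 1000 ) == 0 :
        print '.' ,
    list_words = iLine.strip().split()
    for this_word in list_words :
        # Test membership on the dictionary itself : in Python 2,
        # dict_words.keys() builds a new list and scans it for every word
        if this_word not in dict_words :
            dict_words[ this_word ] = 1
        else :
            dict_words[ this_word ] += 1
        word_total += 1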
# Sort the unique words case-insensitively ( Python 2 cmp-style sort )
list_words = dict_words.keys()
list_words.sort( lambda x , y : cmp( x.lower() , y.lower() ) )
print NL
print ' Writing Output File ....' ,
for this_word in list_words :
    word_count = dict_words[ this_word ]
    str_out = '%6d %s %s' % ( word_count , this_word , NL )
    file_out.write( str_out )
word_str = '%s Total Words .... %d %s' % ( NL , word_total , NL )
keys_total = len( dict_words )
keys_str = '%s Unique Words .... %d %s' % ( NL , keys_total , NL )
file_out.write( word_str )
file_out.write( keys_str )
print NL
print ' Complete .................'
print
print ' Total Words ....' , word_total
print
print ' Unique Words ....' , keys_total
file_in.close()
file_out.close()
time_out = time.time()
time_diff = time_out - time_in
print NL
print ' Process Time ........ %-6.2f Seconds' % ( time_diff )
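Readers on Python 3 will find that the script above does not run as-is: `print` is a function, the `file()` builtin is gone, and `list.sort()` no longer accepts a cmp function. The following is a minimal sketch of the same job for Python 3. It is not Stanley's original. It swaps the hand-rolled dictionary for `collections.Counter`, sorts with `key=str.lower`, and times the run with `time.perf_counter()`. The names `index_words`, `format_index` and `main` are hypothetical. Output lines omit the trailing space the original wrote before each newline.

```python
import sys
import time
from collections import Counter


def index_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Counter.update() with an iterable adds 1 per occurrence
        counts.update(line.split())
    return counts


def format_index(counts):
    """Build '%6d word' lines sorted case-insensitively, plus totals."""
    # key=str.lower replaces the Python 2 cmp-style lambda;
    # words differing only in case remain separate entries
    ordered = sorted(counts, key=str.lower)
    out = [f"{counts[word]:6d} {word}\n" for word in ordered]
    out.append(f"\n Total Words .... {sum(counts.values())}\n")
    out.append(f"\n Unique Words .... {len(counts)}\n")
    return out


def main(path_in, path_out):
    start = time.perf_counter()
    with open(path_in) as file_in:
        counts = index_words(file_in)
    with open(path_out, "w") as file_out:
        file_out.writelines(format_index(counts))
    print(f" Total Words .... {sum(counts.values())}")
    print(f" Unique Words .... {len(counts)}")
    print(f" Process Time .... {time.perf_counter() - start:.2f} Seconds")


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Usage mirrors the original: `python <script> File_In.txt File_Out.txt`.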