help make it faster please
Sion Arrowsmith
siona at chiark.greenend.org.uk
Fri Nov 11 09:53:08 EST 2005
<pkilambi at gmail.com> wrote:
>Oh sorry indentation was messed here...the
>wordlist = countDict.keys()
>wordlist.sort()
>should be outside the word loop.... now
>def create_words(lines):
>    cnt = 0
>    spl_set = '[",;<>{}_&?!():-[\.=+*\t\n\r]+'
>    for content in lines:
>        words=content.split()
>        countDict={}
>        wordlist = []
>        for w in words:
>            w=string.lower(w)
>            if w[-1] in spl_set: w = w[:-1]
>            if w != '':
>                if countDict.has_key(w):
>                    countDict[w]=countDict[w]+1
>                else:
>                    countDict[w]=1
>        wordlist = countDict.keys()
>        wordlist.sort()
>        cnt += 1
>        if countDict != {}:
>            for word in wordlist: print (word+' '+
>str(countDict[word])+'\n')
>
>ok now this is the correct question I am asking...
(a) You might be better off lower-casing the whole line once, before
splitting it:
words = content.lower().split()
for w in words:
...
instead of calling lower() on each separate word (and note that most
functions from string are deprecated in favour of string methods).
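A minimal sketch of (a), assuming content is one line of text as in the quoted function. The helper name lowered_words is made up for illustration. Note that the lower() has to happen on the string before the split, since the list returned by split() has no lower() method.

```python
def lowered_words(content):
    """Lower-case a whole line once, then split it on whitespace.

    `lowered_words` is a hypothetical helper name, not part of the
    original code; `content` is assumed to be a single line of text.
    """
    # One lower() call per line instead of one per word.
    return content.lower().split()


if __name__ == "__main__":
    for w in lowered_words("The Quick BROWN fox"):
        print(w)
```

This keeps the per-word loop body free of case handling, so only the trailing-punctuation check and the dictionary update remain inside it.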
(b) spl_set isn't doing what you might think it is -- it looks like
you've written it as a regexp but you're using it as a character set.
What you might want is:
spl_set = '",;<>{}_&?!():-[\.=+*\t\n\r'
and
while w and w[-1] in spl_set: w = w[:-1]
That loop can be written:
w = w.rstrip(spl_set)
(which by my timings is faster if you have multiple characters from
spl_set at the end of your word, but slower if you have 0 or 1).
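A rough sketch comparing the two ways of stripping trailing characters from (b). The function names are invented for illustration, and the character set is written with an escaped backslash ('\\.'), which gives the same string as the '\.' above without relying on an unrecognised escape. The loop version checks that w is non-empty first, since a word made up entirely of characters from the set would otherwise end with w[-1] raising IndexError.

```python
# Same characters as spl_set in the message, used as a plain character set.
SPL_SET = '",;<>{}_&?!():-[\\.=+*\t\n\r'


def strip_with_loop(w, chars=SPL_SET):
    """Drop trailing characters one at a time (hypothetical helper)."""
    # Guard against w becoming empty, e.g. for a word like "--".
    while w and w[-1] in chars:
        w = w[:-1]
    return w


def strip_with_rstrip(w, chars=SPL_SET):
    """Drop all trailing characters in one call (hypothetical helper)."""
    # str.rstrip treats its argument as a set of characters to remove.
    return w.rstrip(chars)


if __name__ == "__main__":
    for word in ["word", "word.", "word...", "--"]:
        print(repr(strip_with_loop(word)), repr(strip_with_rstrip(word)))
```

Both functions should give the same result for any word; which one is faster depends on how many trailing characters your data typically has, so it is worth timing on real input (for example with the timeit module) rather than taking either for granted.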
--
\S -- siona at chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
___ | "Frankly I have no feelings towards penguins one way or the other"
\X/ | -- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump