Help me cythonize a python routine!

Wed Nov 9 14:34:15 EST 2016

On 05/11/2016 04:11, DFS wrote:
>
> It reads in a text file of the Bible, and counts the Top 20 most common
> words.
>
> http://www.truth.info/download/bible.htm
>
> ------------------------------------------------
> import time; start=time.clock()
> import sys, string
> from collections import Counter
>
> #read file
> with open(sys.argv[1],'r') as f:
> 	chars=f.read().lower()
>
> #remove numbers and punctuation
> chars=chars.translate(None,string.digits)
> chars=chars.translate(None,string.punctuation)
>
> #count words
> counts=Counter(chars.split()).most_common(20)		
>
> #print
> i=1
> for word,count in counts:
> 	print str(i)+'.',count,word
> 	i+=1
>
> print "%.3gs"%(time.clock()-start)
> ------------------------------------------------

> 1.17s isn't too bad, but it could be faster.

> Is it easy to cythonize?  Can someone show me how?
>
> I installed Cython and made some attempts but got lost.

The trouble is that there isn't really any Python code here to Cythonise.

All the real work is done by the built-in string methods and inside the 
collections module. If the latter were written in Python, then you'd have 
to Cythonise that, and there might be quite a lot of it!

But I think 'collections' is a built-in module, which means it's already 
written in something like C. So it might already be as fast as it gets 
(0.7 to 0.8 seconds on my machine), unless perhaps a different algorithm 
is used.
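
One way to check where the time actually goes is to time each stage on 
its own. This is only a rough sketch, written for Python 3 (the quoted 
script is Python 2), and the name time_stages is made up for 
illustration. It also deletes digits and punctuation in one translate 
pass rather than two, which is my assumption about a cheap tweak, not a 
measured improvement.

```python
import string
import time
from collections import Counter


def time_stages(text, n=20):
    """Return (top-n word counts, seconds spent per stage) for text."""
    timings = {}

    # Stage 1: lowercase, then strip digits and punctuation in one pass.
    t0 = time.perf_counter()
    table = str.maketrans('', '', string.digits + string.punctuation)
    cleaned = text.lower().translate(table)
    timings['clean'] = time.perf_counter() - t0

    # Stage 2: split on whitespace into a list of words.
    t0 = time.perf_counter()
    words = cleaned.split()
    timings['split'] = time.perf_counter() - t0

    # Stage 3: count the words and pick the n most common.
    t0 = time.perf_counter()
    top = Counter(words).most_common(n)
    timings['count'] = time.perf_counter() - t0

    return top, timings
```

Calling time_stages(open(sys.argv[1]).read()) and printing the timings 
dict should show which stage dominates. If nearly all of the time is 
spent inside translate, split and Counter, there is little left for 
Cython to speed up.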

-- 
Bartc