Python performance notes...

Moshe Zadka moshez at math.huji.ac.il
Sat May 27 05:11:35 EDT 2000


On Thu, 25 May 2000, Brett g Porter wrote:

> > So?  What's your point?  Try writing the following code in C:
> >
> >
> > How long did it take you?  Did it run as fast as the Python code?  Was
> > the overhead of the two loops here really a significant factor?
> 
> here it is in C++. Took me about the same amount of time it woulda taken me
> to write it in Python, actually...
> Your point is well taken, nonetheless...
> 
> // splitter.cpp : Defines the entry point for the console application.
> //
> 
> #include <fstream>
> #include <iostream>
> #include <map>
> #include <string>
> 
> 
> using namespace std;
> 
> int main(int argc, char* argv[])
> {
>    ifstream f(argv[1]);
> 
>    string newWord;
>    map<string, int> unique;
> 
>    while (f >> newWord)   // test the read itself, so a failed read at EOF is not counted
>    {
>       unique[newWord]+= 1;
>    }
> 
>    for (map<string, int>::iterator i = unique.begin();
>    i != unique.end(); ++i)
>    {
>       cout << i->first << endl;
>    }
> 
>     return 0;
> }

Well, of course, it is much longer in C++, but just for the heck of it,
here's the real challenge: change it so that it prints

word lineno lineno .....

And the words have to be sorted. 

import string, sys

uniq = {}
f=open(sys.argv[1])
lines = f.readlines()
f.close()
for i in range(len(lines)):
	words = string.split(lines[i])
	for word in words:
		uniq[word] = uniq.get(word, [])
		uniq[word].append(i)
words = uniq.items()
words.sort()
for word, lines in words:
	print word,
	for line in lines:
		print line,
	print

(It took me 3 minutes to take Aahz's script and change it, and I had one
bug: I forgot to import sys and string.)
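
For readers on a current interpreter, the same index can be sketched in
Python 3. This is an adaptation, not the script as posted: the function
names build_index and format_index are made up for illustration, and line
numbers stay 0-based, as in the loop above.

```python
from collections import defaultdict


def build_index(lines):
    """Map each whitespace-separated word to the 0-based numbers of the lines it occurs on."""
    index = defaultdict(list)
    for lineno, line in enumerate(lines):
        for word in line.split():
            index[word].append(lineno)
    return index


def format_index(index):
    """Return one 'word lineno lineno ...' string per word, sorted by word."""
    return [" ".join([word] + [str(n) for n in index[word]])
            for word in sorted(index)]
```

To use it on a file, pass the open file object (or f.readlines()) to
build_index and print each string that format_index returns.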

In 1 more minute, it is possible to change the definition of a word to "a
run of consecutive alphabetic characters". In 30 more seconds, it is
possible to make the words case-insensitive, and in 30 more seconds to
disregard "_"s.
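
All three tweaks only change how a line is split into words. A rough
Python 3 sketch, under two assumptions the post does not settle:
"alphabetic" is taken to mean ASCII letters, and disregarding "_" is taken
to mean dropping underscores before matching. The helper name words_in is
hypothetical.

```python
import re

# A word is a run of consecutive ASCII letters (assumption: ASCII only).
_WORD = re.compile(r"[a-z]+")


def words_in(line):
    """Return the lowercased alphabetic runs in line, with underscores removed first."""
    # Assumption: disregarding "_" means deleting it, so "foo_bar" becomes "foobar".
    return _WORD.findall(line.replace("_", "").lower())
```

In the script above, this would stand in for the string.split(lines[i])
call; the rest of the loop stays the same.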

How much time would it take in C++? 

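Remember that map<string, vector<int>> is invalid: the compiler reads the
closing ">>" as the shift operator, so it has to be written
map<string, vector<int> >.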

--
Moshe Zadka <moshez at math.huji.ac.il>
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com




