Python performance notes...
Moshe Zadka
moshez at math.huji.ac.il
Sat May 27 05:11:35 EDT 2000
On Thu, 25 May 2000, Brett g Porter wrote:
> > So? What's your point? Try writing the following code in C:
> >
> >
> > How long did it take you? Did it run as fast as the Python code? Was
> > the overhead of the two loops here really a significant factor?
>
> here it is in C++. Took me about the same amount of time it woulda taken me
> to write it in Python, actually...
> Your point is well taken, nonetheless...
>
> // splitter.cpp : Defines the entry point for the console application.
> //
>
> #include <fstream>
> #include <iostream>
> #include <map>
> #include <string>
>
>
> using namespace std;
>
> int main(int argc, char* argv[])
> {
>     ifstream f(argv[1]);
>
>     string newWord;
>     map<string, int> unique;
>
>     // Test the extraction itself so a failed read at EOF is not counted.
>     while (f >> newWord)
>     {
>         unique[newWord] += 1;
>     }
>
>     // std::map iterates in key order, so the words come out sorted.
>     for (map<string, int>::iterator i = unique.begin();
>          i != unique.end(); ++i)
>     {
>         cout << i->first << endl;
>     }
>
>     return 0;
> }
Well, of course, it is much longer in C++. But just for the heck of it,
here's the real challenge: change it so that it prints
word lineno lineno ...
for each word, with the words sorted.
import string, sys
uniq = {}
f = open(sys.argv[1])
lines = f.readlines()
f.close()
for i in range(len(lines)):
    words = string.split(lines[i])
    for word in words:
        uniq[word] = uniq.get(word, [])
        uniq[word].append(i)
words = uniq.items()
words.sort()
for word, lines in words:
    print word,
    for line in lines:
        print line,
    print
(It took me 3 minutes to take Aahz's script and change it, and I had one
bug: I forgot to import sys and string.)
In 1 more minute, it is possible to change the definition of a word to "a
run of consecutive alphabetic characters". In 30 more seconds, it is possible
to make the words case-insensitive, and in 30 more seconds to disregard "_"s.
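A minimal sketch of those three changes, written in present-day Python 3 rather than the Python of the script above. Everything named here (WORD_RE, index_words, format_index) is hypothetical, and two readings are assumptions: "alphabetic" is taken to mean ASCII letters only, and "disregard _" is taken to mean deleting underscores before matching, so that foo_bar counts as foobar. Line numbers stay 0-based, as in the script above.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters; anything else separates words.
WORD_RE = re.compile(r"[A-Za-z]+")


def index_words(lines):
    """Map each lowercased word to the (0-based) line numbers it occurs on."""
    index = defaultdict(list)
    for lineno, line in enumerate(lines):
        # Assumption: "disregard _" means underscores are simply deleted.
        cleaned = line.replace("_", "")
        for word in WORD_RE.findall(cleaned):
            # Lowercasing makes the index case-insensitive.
            index[word.lower()].append(lineno)
    return index


def format_index(index):
    """Return one 'word lineno lineno ...' string per word, sorted by word."""
    return [
        "%s %s" % (word, " ".join(str(n) for n in linenos))
        for word, linenos in sorted(index.items())
    ]
```

Calling format_index(index_words(open(path).readlines())) and printing each resulting string would give the same shape of output as the script above, one word per line followed by its line numbers.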
How much time would it take in C++?
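Remember that map<string, vector<int>> is invalid: the closing ">>" is
parsed as the right-shift operator, so you have to write
map<string, vector<int> > with a space.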
--
Moshe Zadka <moshez at math.huji.ac.il>
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com