Python performance notes...

Daniel Berlin dan at cgsoftware.com
Thu May 25 17:16:35 EDT 2000


On Thu, 25 May 2000, Brett g Porter wrote:

> 
> "Aahz Maruch" <aahz at netcom.com> wrote in message
> news:8gjcc1$nam$1 at nntp9.atl.mindspring.net...
> > So?  What's your point?  Try writing the following code in C:
> >
> > import re,sys
> > f=open(sys.argv[1])
> > s=f.read()
> > f.close()
> > words=re.split(r'\s+', s)
> > uniqWords = {}
> > for word in words:
> >   uniqWords[word] = 1
> > for word in uniqWords.keys():
> >   print word
> >
> > How long did it take you?  Did it run as fast as the Python code?  Was
> > the overhead of the two loops here really a significant factor?
> 
> here it is in C++. Took me about the same amount of time it woulda taken me
> to write it in Python, actually...
> Your point is well taken, nonetheless...

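Except your version will run *much* slower on a large dataset.
std::map is implemented as a balanced binary search tree (typically a
red-black tree), so each insertion costs O(log n) string comparisons,
while the Python example uses a hash table (the dictionary), where an
insertion takes O(1) time on average.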

> 
> // splitter.cpp : Defines the entry point for the console application.
> //
> 
> #include <fstream>
> #include <iostream>
> #include <map>
> #include <string>
> 
> 
> using namespace std;
> 
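> int main(int argc, char* argv[])
> {
>    if (argc < 2)
>    {
>       cerr << "usage: " << argv[0] << " FILE" << endl;
>       return 1;
>    }
> 
>    ifstream f(argv[1]);
> 
>    string newWord;
>    map<string, int> unique;
> 
>    // Test the extraction itself: "while (f)" runs the body once more
>    // after the final failed read, re-counting the last word (or
>    // inserting an empty string when the file is empty).
>    while (f >> newWord)
>    {
>       unique[newWord] += 1;
>    }
> 
>    for (map<string, int>::iterator i = unique.begin();
>         i != unique.end(); ++i)
>    {
>       cout << i->first << endl;
>    }
> 
>    return 0;
> }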
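A minimal sketch of the hash-table alternative, assuming a C++11 compiler: std::unordered_map only became standard in C++11 (in 2000 the nearest equivalent was the non-standard hash_map extension shipped by libraries such as SGI's STL). The function name count_words is hypothetical, and it takes the whole file contents as one string, much as the Python example reads the file with f.read().

```cpp
// Hypothetical sketch: count words with a hash table (std::unordered_map,
// standard since C++11) instead of the balanced tree behind std::map.
#include <sstream>
#include <string>
#include <unordered_map>

// Split `text` on whitespace and count each word. Insertion into an
// unordered_map is O(1) on average, versus O(log n) comparisons for std::map.
std::unordered_map<std::string, int> count_words(const std::string& text)
{
    std::istringstream in(text);
    std::unordered_map<std::string, int> counts;
    std::string word;
    // Loop on the extraction itself so a failed read at end of input
    // is never counted.
    while (in >> word)
        ++counts[word];
    return counts;
}
```

To mirror the quoted program, main would read argv[1] into a string and print each key of the result. The trade-off is ordering: std::map hands the words back sorted, whereas the hash table, like the keys of the Python dictionary, returns them in no particular order.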
> 
> 
> 





More information about the Python-list mailing list