Python performance notes...
Daniel Berlin
dan at cgsoftware.com
Thu May 25 17:16:35 EDT 2000
On Thu, 25 May 2000, Brett g Porter wrote:
>
> "Aahz Maruch" <aahz at netcom.com> wrote in message
> news:8gjcc1$nam$1 at nntp9.atl.mindspring.net...
> > So? What's your point? Try writing the following code in C:
> >
> > import re,sys
> > f=open(sys.argv[1])
> > s=f.read()
> > f.close()
> > words=re.split(r'\s',s)
> > uniqWords = {}
> > for word in words:
> >     uniqWords[word] = 1
> > for word in uniqWords.keys():
> >     print word
> >
> > How long did it take you? Did it run as fast as the Python code? Was
> > the overhead of the two loops here really a significant factor?
>
> here it is in C++. Took me about the same amount of time it woulda taken me
> to write it in Python, actually...
> Your point is well taken, nonetheless...
Except your version will run *much* slower given a large dataset.
std::map is implemented as a balanced tree, so every insert or lookup
costs O(log n) string comparisons, while the Python example uses a hash
table, where the same operations are O(1) on average.
>
> // splitter.cpp : Defines the entry point for the console application.
> //
>
> #include <fstream>
> #include <iostream>
> #include <map>
> #include <string>
>
>
> using namespace std;
>
> int main(int argc, char* argv[])
> {
>     ifstream f(argv[1]);
>
>     string newWord;
>     map<string, int> unique;
>
>     // Test the read itself; "while (f)" would process newWord once
>     // more after the final read fails.
>     while (f >> newWord)
>     {
>         unique[newWord] += 1;
>     }
>
>     for (map<string, int>::iterator i = unique.begin();
>          i != unique.end(); ++i)
>     {
>         cout << i->first << endl;
>     }
>
>     return 0;
> }
>
>
>
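For what it's worth, here is a minimal sketch of how the C++ side could
use a hash table too, which is the closer match to the Python dictionary.
It assumes a C++11 compiler, since std::unordered_set is not part of
older standard libraries. The names unique_words and unique_words_in_text
are made up for this illustration and are not taken from the program
quoted above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Collect the distinct whitespace-separated words from a stream.
// std::unordered_set (C++11, an assumption here) is a hash table, so each
// insert is O(1) on average, versus O(log n) for the tree-based std::map.
std::unordered_set<std::string> unique_words(std::istream& in)
{
    std::unordered_set<std::string> unique;
    std::string word;
    while (in >> word)          // stops cleanly at EOF or on a failed read
        unique.insert(word);    // duplicates are ignored by the set
    return unique;
}

// Hypothetical convenience wrapper for text that is already in memory.
std::unordered_set<std::string> unique_words_in_text(const std::string& text)
{
    std::istringstream in(text);
    return unique_words(in);
}
```

To use it on a file, open an ifstream on argv[1], pass it to
unique_words, and print each element of the result. Note that the
iteration order of an unordered_set is unspecified, so the output is not
sorted the way the std::map version's output is.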