Replacing words from strings except 'and' / 'or' / 'and not'

John Machin sjmachin at lexicon.net
Sun Nov 28 18:33:13 EST 2004


Skip Montanaro <skip at pobox.com> wrote in message news:<mailman.6853.1101656845.5135.python-list at python.org>...
>     >> > Is there a reason to use sets here? I think lists will do as well.
>     >> 
>     >> Sets are implemented using dictionaries, so the "if w in KEYWORDS"
>     >> part would be O(1) instead of O(n) as with lists...
>     >> 
>     >> (I.e. searching a list is a brute-force operation, whereas
>     >> sets are not.)
> 
>     Jp>   And yet... using sets here is slower in every possible case:
>     ...
>     Jp>   This is a pretty clear example of premature optimization.
> 
> I think the set concept is correct.  The keywords of interest are best
> thought of as an unordered collection.  Lists imply some ordering (or at
> least that potential).  Premature optimization would have been realizing
> that scanning a short list of strings was faster than testing for set
> membership and choosing to use lists instead of sets.
> 
> Skip
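The test being debated is the per-word check `if w in KEYWORDS`. The thread's actual code is not quoted here, so the following is only a minimal sketch in modern Python. `replace_words` and its `replace` callback are hypothetical names, and the sketch assumes whitespace-separated words compared case-sensitively. It holds the keywords in a frozenset, in line with Skip's view of them as an unordered collection:

```python
# Hypothetical sketch: the thread's original code is not shown in this message.
# A frozenset models the keywords as an unordered, immutable collection.
KEYWORDS = frozenset(["and", "or", "not"])


def replace_words(text, replace):
    """Apply `replace` to every word that is not one of KEYWORDS.

    Assumes words are separated by whitespace; the result is re-joined
    with single spaces. Matching is case-sensitive.
    """
    out = []
    for w in text.split():
        # For a set this is a hash lookup; for a list it would be a linear scan.
        out.append(w if w in KEYWORDS else replace(w))
    return " ".join(out)


print(replace_words("python and not perl", lambda w: "'%s'" % w))
```

Swapping the frozenset for a list gives identical output; only the cost of each `w in KEYWORDS` test changes, which is what the timings below measure.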

Jp scores extra points for pre-maturity by not trying out version 2.4,
and by not reading the bit about sets now being built-in and based on
dicts, which are one of the timbot's optimise-the-snot-out-of targets.
Herewith some results from a box with a 1.4 GHz Athlon chip running
Windows 2000:

C:\junk>\python24\python \python24\lib\timeit.py -s "from sets import Set; x = Set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 1.81 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "from sets import Set; x = Set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 1.77 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 0.29 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "None in x"
1000000 loops, best of 3: 0.289 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "None in x"
1000000 loops, best of 3: 0.804 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "None in x"
1000000 loops, best of 3: 0.81 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "from sets import Set; x = Set(['and', 'or', 'not'])" "'and' in x"
1000000 loops, best of 3: 1.69 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "'and' in x"
1000000 loops, best of 3: 0.243 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "'and' in x"
1000000 loops, best of 3: 0.245 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "'and' in x"
1000000 loops, best of 3: 0.22 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "'and' in x"
1000000 loops, best of 3: 0.22 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = set(['and', 'or', 'not'])" "'not' in x"
1000000 loops, best of 3: 0.257 usec per loop

C:\junk>\python24\python \python24\lib\timeit.py -s "x = ['and', 'or', 'not']" "'not' in x"
1000000 loops, best of 3: 0.34 usec per loop
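The list timings above depend on where the probe sits: `'and'` is the first element, `'not'` the last, and `None` is absent, so a left-to-right scan stops after one, three and three comparisons respectively. A rough model of that scan follows; it is a simplification, not CPython's actual list membership code, and the helper name is hypothetical:

```python
def list_contains_comparisons(seq, target):
    """Count the equality tests a left-to-right scan makes before stopping.

    Simplified model of `target in seq` for a list: it ignores the identity
    shortcut and the cost of each comparison, and only counts how far the
    scan goes.
    """
    count = 0
    for item in seq:
        count += 1
        if item == target:
            break
    return count


keywords = ['and', 'or', 'not']
for probe in ['and', 'not', None]:
    print(repr(probe), list_contains_comparisons(keywords, probe))
```

The model says nothing about the cost of each individual comparison, so it accounts for the `'and'` versus `'not'` gap on the list but not for why `None in x` is slower still. A set lookup, by contrast, hashes the probe and inspects its hash-table slot(s) instead of scanning, which fits the nearly flat set timings across all three probes.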

tee hee ...
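To rerun the comparison today, note that the `sets` module (and its `Set` class) was removed in Python 3, so only the built-in `set` and a plain list remain. Below is a minimal sketch using the `timeit` module's `repeat` function to report the best of three runs in microseconds per loop, mirroring the command-line figures above. The helper names are hypothetical, and the numbers will differ by machine and Python version:

```python
import timeit

# sets.Set no longer exists in Python 3, so only the built-in set and a
# plain list are compared.
SETUPS = {
    "set": "x = set(['and', 'or', 'not'])",
    "list": "x = ['and', 'or', 'not']",
}
PROBES = ["None", "'and'", "'not'"]


def best_usec_per_loop(stmt, setup, number=1000000, repeat=3):
    """Best of `repeat` runs, in microseconds per loop of `stmt`."""
    times = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(times) / number * 1e6


def compare(number=1000000):
    """Time every (container, probe) pair; returns {(name, probe): usec}."""
    results = {}
    for name, setup in SETUPS.items():
        for probe in PROBES:
            results[(name, probe)] = best_usec_per_loop(probe + " in x", setup, number)
    return results


if __name__ == "__main__":
    for (name, probe), usec in sorted(compare().items()):
        print("%-4s %-6s %.3f usec per loop" % (name, probe, usec))
```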


