Replacing words from strings except 'and' / 'or' / 'and not'

Jp Calderone exarkun at divmod.com
Sun Nov 28 19:16:35 EST 2004


On 28 Nov 2004 15:33:13 -0800, sjmachin at lexicon.net (John Machin) wrote:
>Skip Montanaro <skip at pobox.com> wrote in message news:<mailman.6853.1101656845.5135.python-list at python.org>...
> > >> > Is there a reason to use sets here? I think lists will do as well.
> >     >> 
> >     >> Sets are implemented using dictionaries, so the "if w in KEYWORDS"
> >     >> part would be O(1) instead of O(n) as with lists...
> >     >> 
> >     >> (I.e. searching a list is a brute-force operation, whereas
> >     >> sets are not.)
> > 
> >     Jp>   And yet... using sets here is slower in every possible case:
> >     ...
> >     Jp>   This is a pretty clear example of premature optimization.
> > 
> > I think the set concept is correct.  The keywords of interest are best
> > thought of as an unordered collection.  Lists imply some ordering (or at
> > least that potential).  Premature optimization would have been realizing
> > that scanning a short list of strings was faster than testing for set
> > membership and choosing to use lists instead of sets.
> > 
> > Skip
> 
> Jp scores extra points for pre-maturity by not trying out version 2.4,
> by not reading the bit about sets now being built-in, based on dicts,
> dicts being one of the timbot's optimise-the-snot-out-of targets ...
> herewith some results from a box with a 1.4Ghz Athlon chip running
> Windows 2000:
> 
> [snip - builtin `set' faster than sets.Set]

  John,

    Thanks for pointing out the existence of the new, built-in set type.  I was well aware of it and the performance improvements it brings, but others may not have been.  Since Python 2.4 hasn't actually be released yet, I don't think it could be of any help to the original poster, although I am well aware of the proclivity of a significant portion of the Python community to eagerly begin developing against pre-release versions of Python.

    Since the code I was claiming suffered from premature optimization didn't use the set type, I'm not sure of the relevance of this point.  In fact, it is trivially _more_ work to convert from usage of sets.Set to usage of set than it is to convert from usage of lists to usage of set.  If anything, the list version would have been easier to migrate, once 2.4 was a suitable deployment platform.

    As Skip pointed out, though, Sets can be seen as a conceptually better fit for the problem, so the performance or simplified migration of lists over set.Sets is hardly a justification for the use of lists.  My post was intended solely to rebut the position that sets.Set would desirable because they were more efficient, as the poster to whom I was responding had claimed.  I also tried to emphasize the fact that optimization is senseless without first identifying bottlenecks and that the function in question almost certainly was not even a blip on the performance radar of the program to which it belonged, but perhaps I was not sufficiently clear on that point.

  In any case, thanks for the points.  I'll try to exchange them for some alms at the next town. :)

  Hope this resolves some confusion,

  Jp



More information about the Python-list mailing list