making a valid file name...

Tim Chase python.list at tim.thechases.com
Tue Oct 17 13:50:58 EDT 2006


>> If you're doing it on a time-critical basis, it might help to
>> make "valid" a set, which should have O(1) membership testing,
>> rather than using the "in" test with a string.  I don't know
>> how well the find() method of a string performs in relationship
>> to "in" testing of a set.  Test and see, if it's important.
> 
> The find method of (8-bit) strings is really, really fast. My
> guess is that set can't beat it. I tried to beat it recently with
> a binary search function. Even after applying psyco find was
> still faster (though I could beat the bisect functions by a
> little bit by replacing a divide with a shift).

In "theory" (you know...that little town in west Texas where 
everything goes right), a set-membership test should be O(1).  A 
binary search function would be O(log N).  A linear search of a 
string for a member should be O(N).

In practice, however, for such small strings as the given 
whitelist, the underlying find() operation likely doesn't put a 
blip on the radar.  If your whitelist were some huge document 
that you were searching repeatedly, it could have worse 
performance.  Additionally, the find() in the underlying C code 
is likely about as bare-metal as it gets, whereas the set 
membership aspect of things may go through some more convoluted 
setup/teardown/hashing and spend a lot more time further from the 
processor's op-codes.

And I know that a number of folks have done some hefty 
optimization of Python's string-handling abilities.  There's 
likely a tradeoff point where it's better to use one over the 
other depending on the size of the whitelist.  YMMV

-tkc










More information about the Python-list mailing list