Comparing strings from the back?

Duncan Booth duncan.booth at invalid.invalid
Tue Sep 11 05:41:50 EDT 2012


Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:

> But for the record, in principle string comparisons *could* be the 
> bottleneck. Example: you have 10000 strings, which are each created
> once and stored in a list. Then you iterate over the list, comparing
> every string against every other string. And due to some weird vagary
> of the data, the strings are nearly all equal.
> 
> (Why would you do this? I don't know, maybe it's a programmers'
> challenge found on the Internet, make up your own scenario...)
> 
> Total number of strings created: 10000.
> Total number of strings compared: 100000000.

which is exactly what I meant by doing a lot of comparisons (10000) on
the same string. Perhaps if I'd phrased it more as "you have to be doing
many more comparisons than string creation operations" it would have
been clearer what I meant. 


> The overhead of creating the strings is trivial compared to the
> overhead of comparing them, and since each string is only created once
> anyway, interning them is just a waste of time.

No, you created 10k strings many of which are equal and then did 10k
comparisons on each most of which found 'yes' they are equal. Interning
them would have reduced all the 'true' comparisons to an identity check 
at the cost of 1 hash and 1 full comparison per string. 


>> so at the expense of a single dictionary
>> insertion when the string is created you can get guaranteed O(1) on
>> all the comparisons.
> 
> What interning buys you is that "s == t" is an O(1) pointer compare if
> they are equal. But if s and t differ in the last character, __eq__
> will still inspect every character. There is no way to tell Python
> "all strings are interned, if s is not t then s != t as well".

Right if the strings differ only in the last character, but the rest of
this thread has been about how, for random strings, the not-equal case
is O(1) as well. 

-- 
Duncan Booth http://kupuguy.blogspot.com



More information about the Python-list mailing list