Comparing strings from the back?

Dave Angel d at davea.name
Thu Sep 6 11:54:58 EDT 2012


On 09/06/2012 10:42 AM, Johannes Bauer wrote:
> On 06.09.2012 16:23, Dave Angel wrote:
>> On 09/06/2012 09:43 AM, Johannes Bauer wrote:
>>> <snip>
>>> Yes, worst-case is O(N), best case O(1). Average is O(n log n).
>> Can't see how you came up with an average of n log(n).  Fourteen minutes
>> before you made this post, you demonstrated it was less than 2 for any N.
>>
>> And I previously posted that for a character set of size 10, the average
>> was 1.1111
> Again playing with the equations and thinking about it again. And I
> completely missed your point. It wasn't about n log n vs. log n. Both
> are wrong. This surprises me. I was somehow thinking about the limit of
> sum (1/n), n -> infty. But since the summands are shrinking
> exponentially, the limit is different.
>
> I think the limit of the average comparisons for a given wordlength n
> against infinity with alphabet size l is
>
> l / (l - 1)
>
> i.e. for bitstrings it's 2 and for bytestrings it's 256/255.
>
> This would mean string comparison of randomized strings is O(1). Can
> that really be true? It looks like it.
>
(Just lost the internet in a storm, so I'm not sure how long it'll be
before this sends.)

Thanks, that was exactly my point.  Since el is at least 2, the average
number of comparisons is no larger than 2, for any value of N.  That's
why, when I'm visually comparing MD5 values, I usually only look at the
first few, and last few hex digits.

However, Chris Angelico (at 10:39) pointed out again that totally random
strings aren't real-world equivalent.

He has now proposed that somebody measure real-world cases here, by
patching the string comparison to gather statistics.  I think that's
beyond me at this point.  I only joined this thread when the cases under
study were well-defined random, and therefore predictable.  <g>



-- 

DaveA




More information about the Python-list mailing list