Comparing strings from the back?

Johannes Bauer dfnsonfsduifb at gmx.de
Wed Sep 5 10:51:10 EDT 2012


On 05.09.2012 16:30, Steven D'Aprano wrote:

> Since these are *completely different Ns*, you can't combine them to get 
> O(N**2 log N) as the overall algorithmic complexity. Doing so is sheer 
> nonsense. If you state it like this:

Yes, you are correct here.

> You are making unjustified assumptions about the distribution of letters 
> in the words. This might be a list of long chemical compounds where the 
> words typically differ only in their suffix. It might be a list of people 
> with titles:

Actually, I'm not. I'm stating exactly which assumptions I'm making to
arrive at my calculation. I'm comparing *random* character strings or
bitstrings.

You, on the other hand, are making vague assumptions which you do not
care to formalize, and yet you claim that "the number of comparisons is
equally likely to be 1, 2, 3, ..., N. The average then is". Without any
explanation for this. At all.
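To make the random-string model concrete, here is a minimal sketch (not the original program) that measures the average number of character comparisons a left-to-right equality check performs on pairs of random strings. The function and parameter names are my own for illustration:

```python
import random
import string

def char_comparisons(a, b):
    """Count the character comparisons a left-to-right equality check makes."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break
    return count

def average_comparisons(pairs=100_000, length=50):
    """Average comparison count over random lowercase string pairs."""
    alphabet = string.ascii_lowercase
    total = 0
    for _ in range(pairs):
        a = ''.join(random.choices(alphabet, k=length))
        b = ''.join(random.choices(alphabet, k=length))
        total += char_comparisons(a, b)
    return total / pairs

# Over a k-letter alphabet, a given position matches with probability 1/k,
# so the comparison count is (truncated) geometric with mean close to
# k/(k-1) -- about 1.04 for k=26 -- independent of the string length.
```

Under this model the cost of an equality test on unequal random strings is effectively O(1), which is the point being argued.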

> Herr Professor Frederick Schmidt
> Herr Professor Frederick Wagner
> ...

Is your assumption that we're comparing words that have the common prefix
"Herr Professor Frederick "? Why don't you clearly state what you're
comparing? In the example you gave in the other thread you claimed "Note
that this average assumes the strings are completely random.", yet now
you're talking about specific word patterns.
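For contrast, a quick sketch (reusing the names from the quote; the helper is mine) shows what that word pattern does to the cost of a single equality test — with a long shared prefix, the whole prefix must be scanned before the first difference is found:

```python
def char_comparisons(a, b):
    """Count the character comparisons a left-to-right equality check makes."""
    count = 0
    for x, y in zip(a, b):
        count += 1
        if x != y:
            break
    return count

prefix = "Herr Professor Frederick "
# The comparison walks the entire shared prefix plus one differing character,
# so the cost grows with the prefix length instead of staying near constant.
cost = char_comparisons(prefix + "Schmidt", prefix + "Wagner")
print(cost)  # len(prefix) + 1
```

So the two models genuinely disagree: random strings give a near-constant expected cost, common-prefix data does not.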

> I have no idea why you think this program demonstrates anything about 
> string equality comparisons. What you are counting is not the average 
> number of comparisons for string equality, but the number of comparisons 
> when sorting. These are *completely different*, because sorting a list 
> does not require testing each string against every other string.

You're wrong. It's not counting the number of string comparisons, it's
counting the number of character comparisons. Which is exactly what
we're talking about in this thread, is it not? The sorting of the list
is just there to provide an example in which lots of strings are compared.
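A minimal sketch of that kind of instrumentation (the class name `CountingStr` and the word parameters are my own; this is not the original program): a `str` subclass whose ordering comparisons also tally the per-character work, so a sort yields the average number of character comparisons per string comparison.

```python
import random
import string

class CountingStr(str):
    """str subclass whose ordering comparisons tally per-character work."""
    char_comparisons = 0   # class-wide counters
    str_comparisons = 0

    def __lt__(self, other):
        CountingStr.str_comparisons += 1
        # Count characters examined up to and including the first difference.
        n = 0
        for x, y in zip(self, other):
            n += 1
            if x != y:
                break
        CountingStr.char_comparisons += n
        return str.__lt__(self, other)

alphabet = string.ascii_lowercase
words = [CountingStr(''.join(random.choices(alphabet, k=8)))
         for _ in range(1000)]
result = sorted(words)   # sorted() drives __lt__, which does the counting
avg = CountingStr.char_comparisons / CountingStr.str_comparisons
```

Here `avg` is exactly the quantity under discussion: character comparisons per string comparison, independent of how many string comparisons the sort happens to make.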

>> So, interestingly, in this real-world case(tm), the complexity does not
>> scale with O(n). It actually goes down (which is probably due to the
>> limited number of words of greater length).
> 
> I would guess that it probably has more to do with the ability of the 
> timsort algorithm to exploit local order within a list and avoid 
> performing comparisons.

Again, it's not counting the number of string comparisons. It's counting
the average number of character comparisons per string comparison. The
total number of string comparisons has nothing to do with it.

>> In summary, let's please not debate "feelings" or such.
> 
> I have no idea what feelings you are referring to.

I'm referring to "the number of comparisons is equally likely to be 1,
2, 3, ..., N. The average then is" without any deduction or rationale.
Without any explanation or assumption.

Regards,
Johannes

-- 
>> Where exactly did you predict the earthquake again?
> At least not publicly!
Ah, the latest and to this day most ingenious stunt of our great
cosmologists: the secret prediction.
 - Karl Kaos about Rüdiger Thomas in dsa <hidbv3$om2$1 at speranza.aioe.org>


