Comparing strings from the back?

Johannes Bauer dfnsonfsduifb at gmx.de
Wed Sep 5 05:24:52 EDT 2012


On 04.09.2012 23:59, Chris Angelico wrote:

>> n = (256 / 255) * (1 - 256 ^ (-c))
>>
>> where n is the average number of character comparisons and c. The
>> rationale as follows: The first character has to be compared in any
>> case. The second with a probability of 1/256, the third with 1/(256^2)
>> and so on.
> 
> That would be for comparing two random areas of memory. Python strings
> don't have 256 options per character; and in terms of actual strings,
> there's so many possibilities. 

The unit of comparison is completely arbitraty and can equally be used
to compare bitstrings if changed accordingly.

> The strings that a program is going to
> compare for equality are going to use a vastly restricted alphabet;
> for a lot of cases, there might be only a few dozen plausible
> characters.

This is very true. But if so, I would like to see the assumtion that
you're making (which might very well be plausible) in order to arrive at
a average of N/2 comparisons (which just seems *way* off).

> But even so, it's going to scale approximately linearly with the
> string length. If they're really random, then yes, there's little
> chance that either a 1MB string or a 2MB string will be the same, but
> with real data, they might very well have a long common prefix. So
> it's still going to be more or less O(n).

I understand what you're saying and surely this is true to some extent
(someone mentioned filenames, which is an excellent example). However in
order to derive a *formula* you need to formularize your assumtions.
With random data, it's definitely not O(n). And even in reality (with a
more limited alphabet) it usually will early abort:

Consider sorting a large dictionary. Sorting is in O(n log(n)), but this
assumes O(1) comparisons! If the comparison operation itself were in
O(n), the total sorting complexity would be O(n^2 log(n)), which is
definitely false. Most comparisons will abort *very* early (after the
first character).

Best regards,
Johannes

-- 
>> Wo hattest Du das Beben nochmal GENAU vorhergesagt?
> Zumindest nicht öffentlich!
Ah, der neueste und bis heute genialste Streich unsere großen
Kosmologen: Die Geheim-Vorhersage.
 - Karl Kaos über Rüdiger Thomas in dsa <hidbv3$om2$1 at speranza.aioe.org>



More information about the Python-list mailing list