Comparing strings from the back?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Sep 3 22:17:30 EDT 2012


On Mon, 03 Sep 2012 21:54:01 -0400, Roy Smith wrote:

> Let's assume you're testing two strings for equality.  You've already
> done the obvious quick tests (i.e they're the same length), and you're
> down to the O(n) part of comparing every character.
> 
> I'm wondering if it might be faster to start at the ends of the strings
> instead of at the beginning?  If the strings are indeed equal, it's the
> same amount of work starting from either end.  But, if it turns out that
> for real-life situations, the ends of strings have more entropy than the
> beginnings, the odds are you'll discover that they're unequal quicker by
> starting at the end.

And if the strings have the same end and different beginnings:

    running
    jumping
    cooking

then you'll find out quicker by starting at the beginning.

No general-purpose string comparison operator can make assumptions about 
the content of the strings, like "they are nearly always pathnames on 
Unix-like systems like /home/user/x and /home/user/y".

I suppose you could have two different operators, == for "test from the 
front" and === for "test from the back", and push that decision onto the 
user, who will then spend hours angsting about the "most efficient" 
equality test and days painstakingly profiling his data to save four 
nanoseconds at runtime.

Besides, then somebody will say "Yes, but what about the cases where the 
prefix and the suffix are both equal, but the middle will be different?" 
and propose a third string-equality operator ==== and then there will be 
bloodshed.

On average, string equality needs to check half the characters in the 
string. Any individual comparisons may require fewer, or more, than that, 
but whichever way you scan the string, some comparisons will take more 
and some will take fewer. With no general way of telling ahead of time 
which will be which, on average you can't do better than O(N/2) and no 
complexity justification for picking "start from the end" from "start 
from the start".



-- 
Steven



More information about the Python-list mailing list