is implemented with id ?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sat Nov 3 23:10:24 EDT 2012


On Sun, 04 Nov 2012 01:14:29 +0000, Oscar Benjamin wrote:

> On 3 November 2012 22:50, Chris Angelico <rosuav at gmail.com> wrote:
>> This one I haven't checked the source for, but ISTR discussions on this
>> list about comparison of two unequal interned strings not being
>> optimized, so they'll end up being compared char-for-char. Using 'is'
>> guarantees that the check stops with identity. This may or may not be
>> significant, and as you say, defending against an uninterned string
>> slipping through is potentially critical.
> 
> The source is here (and it shows what you suggest):
> http://hg.python.org/cpython/file/6c639a1ff53d/Objects/unicodeobject.c#l6128

I don't think it does, although I could be wrong: I find reading C quite
difficult.

The unicode_compare function compares character by character, true, but 
it doesn't get called directly. The public interface is 
PyUnicode_Compare, which includes this test before calling 
unicode_compare:

/* Shortcut for empty or interned objects */
if (v == u) {
    Py_DECREF(u);
    Py_DECREF(v);
    return 0;
}
result = unicode_compare(u, v);

where v and u are pointers to the two unicode objects being compared.

So it appears that the test for the strings having equal length has been
dropped, but the identity test is still present.
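As a rough illustration of that control flow, here is a pure-Python model:
an identity test first, then a character-by-character comparison. The
names compare and unicode_compare are hypothetical stand-ins chosen to
mirror the C names; this is a sketch of the logic under those
assumptions, not a port of CPython's code. sys.intern is the real
standard-library function.

```python
import sys


def unicode_compare(u: str, v: str) -> int:
    """Compare two strings code point by code point, returning -1, 0 or 1.

    A simplified model of a character-by-character comparison; it does
    not reproduce CPython's C implementation.
    """
    for a, b in zip(u, v):
        if ord(a) != ord(b):
            return -1 if ord(a) < ord(b) else 1
    # Every shared position matched: the shorter string sorts first.
    if len(u) != len(v):
        return -1 if len(u) < len(v) else 1
    return 0


def compare(u: str, v: str) -> int:
    """Identity shortcut first, then the full character-by-character scan."""
    if u is v:  # same object, e.g. two references to one interned string
        return 0
    return unicode_compare(u, v)


# sys.intern returns one shared object for equal strings, so the
# identity shortcut fires without examining any characters.
a = sys.intern("".join(["spam", "eggs"]))
b = sys.intern("spameggs")
print(a is b, compare(a, b))
```

Two distinct but equal string objects skip the shortcut and fall through
to the full scan, which is where the cost discussed below comes from.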

> Comparing strings char for char is really not that big a deal though.

It depends on how big the strings are and where the first difference is.
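A quick way to see this is to count how many positions a plain
left-to-right scan has to inspect. The helper chars_examined is
hypothetical, written only for this illustration; CPython exposes no
such count.

```python
def chars_examined(u: str, v: str) -> int:
    """Count the positions a left-to-right scan inspects before it knows
    whether u and v differ (a hypothetical helper for illustration)."""
    for i, (a, b) in enumerate(zip(u, v)):
        if a != b:
            return i + 1  # the mismatching position is inspected too
    # No mismatch found: every position of the shorter string was inspected.
    return min(len(u), len(v))


n = 100_000
base = "x" * n
early = "y" + base[1:]   # differs at the first character
late = base[:-1] + "y"   # differs only at the last character

print(chars_examined(base, early))  # 1
print(chars_examined(base, late))   # 100000
```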

> This has been discussed before: you don't need to compare very many
> characters to conclude that strings are unequal (if I remember correctly
> you were part of that discussion).

On average, yes. In the worst case, you have to look at every character.
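To put rough numbers on "on average" versus "worst case", assume (an
idealisation, not a property of real-world text) two strings of length n
whose characters are drawn independently and uniformly from an alphabet
of k symbols. Position i (counting from 0) is examined only if the first
i characters all matched, which happens with probability k^{-i}, so the
expected number of positions examined is a geometric sum:

```latex
\mathbb{E}[\text{positions examined}]
  = \sum_{i=0}^{n-1} k^{-i}
  = \frac{1 - k^{-n}}{1 - k^{-1}}
  < \frac{k}{k-1}
```

For k = 26 that is under 26/25 = 1.04 positions regardless of n, while
the worst case (equal strings, or strings differing only in the last
character) examines all n positions.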



-- 
Steven


