[New-bugs-announce] [issue17628] str==str: compare the first and last character before calling memcmp()

Wed Apr 3 23:29:12 CEST 2013

New submission from STINNER Victor:

In Python 3.4, str==str is implemented by calling memcmp().

The unicode_eq() function, used by the dict and set types, checks the first byte before calling memcmp(). bytes==bytes uses the same check.

The Py_UNICODE_MATCH macro has checked the first *and* last character before calling memcmp() since this commit:
---
changeset:   38242:0de9a789de39
branch:      legacy-trunk
user:        Fredrik Lundh <fredrik at pythonware.com>
date:        Tue May 23 10:10:57 2006 +0000
files:       Include/unicodeobject.h
description:
needforspeed: check first *and* last character before doing a full memcmp
---

The attached patch changes str==str to check the first and last character before calling memcmp(). This might save the overhead of a C function call, but the main benefit is that it is much faster when comparing two different strings of the same length that share a common prefix but differ in their suffix.
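
To illustrate the idea, here is a minimal sketch of a first-and-last check placed in front of memcmp(). It is not the code from unicode_eq.patch: the helper name buffers_equal() and its signature are hypothetical, and it works on plain byte buffers, whereas the real str implementation also has to take the character width of each string into account.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only; not taken from the patch.
 * Returns 1 if the two buffers hold the same bytes, 0 otherwise. */
static int
buffers_equal(const unsigned char *a, size_t alen,
              const unsigned char *b, size_t blen)
{
    /* Buffers of different lengths can never be equal. */
    if (alen != blen)
        return 0;
    /* Two empty buffers are equal; this also guards the a[alen - 1] access. */
    if (alen == 0)
        return 1;
    /* Cheap rejection: compare the first and last byte before the full scan.
     * Same-length strings with a common prefix but a different suffix
     * are rejected here without calling memcmp(). */
    if (a[0] != b[0] || a[alen - 1] != b[alen - 1])
        return 0;
    /* Both ends match: fall back to the full comparison. */
    return memcmp(a, b, alen) == 0;
}
```

With this sketch, comparing "abcX" and "abcY" returns at the last-byte check, while "aXc" and "aYc" still need the full memcmp() call because both ends match.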

The patch also merges unicode_compare_eq() and unicode_eq() so that str, dict and set use the same code.

We may use the same optimization on byte strings.

See also #16321.

----------
files: unicode_eq.patch
keywords: patch
messages: 185956
nosy: haypo, pitrou, serhiy.storchaka
priority: normal
severity: normal
status: open
title: str==str: compare the first and last character before calling memcmp()
versions: Python 3.4
Added file: http://bugs.python.org/file29668/unicode_eq.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17628>
_______________________________________