[Cython] Specializing str methods?
Stefan Behnel
stefan_ml at behnel.de
Thu May 19 21:35:17 CEST 2011
Hi,
the right place to discuss this is the cython-devel mailing list.
John Ehresman, 19.05.2011 18:17:
> On 4/13/11 11:23 AM, Stefan Behnel wrote:
>> You can add BuiltinMethod entries for the "str" type in Builtin.py and
>> map them to a function name like "__Pyx_Str_PyString_WhatEver()". Then,
>> define "utility_code" code blocks for each of the methods that #defines
>> that name to the appropriate PyString_*() or PyUnicode_*() C-API
>> function, depending on the Python version it runs with. There are a lot
>> of examples for that in Builtin.py already.
>
> I've finally got back to this and have the start of an implementation. I've
> attached a diff, though I'm not sure what the preferred workflow is. Should
> I push to github and / or create a bug in trac?
It's best to create a pull request on github, potentially accompanied by a
request for review on the developer mailing list. We get a notification
about pull requests, but it's still better to tell us what state you think
your patch is in and what kind of review you need. The review can then
happen by commenting on the changes directly on github.
> I'll probably add more
> methods, but wanted to get feedback before doing so.
Certainly the right decision.
> I also intend to contribute other patches in the future.
Please do. :)
> One thing that I didn't expect is that I originally declared the method as
> BuiltinMethod("startswith", "TT", "O", "_Pyx_StrStartsWith",
> utility_code=str_startswith_utility_code),
> and expected the specialization to only be used when both self and arg
> were a str, but instead cython always used the specialization and added a
> check for arg being a str before the specialization was called. I ended up
> writing a function that works for both unicode & bytes self and an object arg.
You don't really have to care about the type of the argument when applying
this optimisation. As long as it's clear that the object that provides the
method is a "str" (although it's potentially None even then!), it's clear
what the method will do when being called, even if we don't know the
argument type at compile time. Just provide a fallback, as you did anyway.
I'll comment on your patch below.
diff --git a/Cython/Compiler/Builtin.py b/Cython/Compiler/Builtin.py
+str_startswith_utility_code = UtilityCode(
+proto="""
"proto" is for the forward declaration of the function signature. Except
for very simple cases (short macros) the implementation goes into "impl".
+static PyObject* _Pyx_StrStartsWith(PyObject* self, PyObject* arg)
+{
+    PyObject* ret;
+
+    if ( PyBytes_CheckExact(self) && PyBytes_CheckExact(arg) ) {
+        if ( PyBytes_GET_SIZE(self) < PyBytes_GET_SIZE(arg) )
+            ret = Py_False;
+        else if ( memcmp(PyBytes_AS_STRING(self), PyBytes_AS_STRING(arg),
+                         PyBytes_GET_SIZE(arg)) == 0 )
+            ret = Py_True;
+        else
+            ret = Py_False;
+        Py_INCREF(ret);
+        return ret;
+    }
+
+    if ( PyUnicode_CheckExact(self) && PyUnicode_CheckExact(arg) ) {
+        if ( PyUnicode_GET_SIZE(self) < PyUnicode_GET_SIZE(arg) )
+            ret = Py_False;
+        else if ( memcmp(PyUnicode_AS_UNICODE(self), PyUnicode_AS_UNICODE(arg),
+                         PyUnicode_GET_DATA_SIZE(arg)) == 0 )
+            ret = Py_True;
+        else
+            ret = Py_False;
+        Py_INCREF(ret);
+        return ret;
+    }
+
+    return PyObject_CallMethod(self, "startswith", "O", arg);
I'd split this up into two functions that can be applied to different types
independently, and wrap it in a function that calls either one depending on
PY_MAJOR_VERSION. After all, "str" is *known* to be "bytes" in Py2 and
"unicode" in Py3.
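A minimal sketch of how that could look in Builtin.py, under several assumptions. The small UtilityCode class is only a stand-in so that the snippet runs on its own; the real class would be used in Builtin.py. The helper names __Pyx_BytesStartsWith and __Pyx_UnicodeStartsWith are hypothetical. The wrapper returns a C int instead of a Python object, which would go together with a "b" return type in the method table, as described in the comment on the table hunk. The unicode helper reuses the same C-API macros as the patch, which later Python versions deprecated. The C code has not been compiled here.

```python
class UtilityCode(object):
    """Stand-in for Cython's UtilityCode so this snippet runs on its own."""

    def __init__(self, proto=None, impl=None):
        self.proto = proto
        self.impl = impl


str_startswith_utility_code = UtilityCode(
# "proto" only carries the forward declaration.
proto="""
static int _Pyx_StrStartsWith(PyObject* self, PyObject* arg);
""",
# "impl" carries the function bodies.
impl="""
/* Exact bytes on both sides: compare the raw buffers. */
static int __Pyx_BytesStartsWith(PyObject* self, PyObject* arg) {
    Py_ssize_t arg_len = PyBytes_GET_SIZE(arg);
    if (PyBytes_GET_SIZE(self) < arg_len)
        return 0;
    return memcmp(PyBytes_AS_STRING(self), PyBytes_AS_STRING(arg),
                  (size_t) arg_len) == 0;
}

/* Exact unicode on both sides, using the same macros as the patch. */
static int __Pyx_UnicodeStartsWith(PyObject* self, PyObject* arg) {
    if (PyUnicode_GET_SIZE(self) < PyUnicode_GET_SIZE(arg))
        return 0;
    return memcmp(PyUnicode_AS_UNICODE(self), PyUnicode_AS_UNICODE(arg),
                  PyUnicode_GET_DATA_SIZE(arg)) == 0;
}

/* Assumes self is an exact 'str' and not None.
   Returns 1 or 0, or -1 with an exception set. */
static int _Pyx_StrStartsWith(PyObject* self, PyObject* arg) {
    PyObject* result;
    int ret;
#if PY_MAJOR_VERSION < 3
    if (PyBytes_CheckExact(arg))
        return __Pyx_BytesStartsWith(self, arg);
#else
    if (PyUnicode_CheckExact(arg))
        return __Pyx_UnicodeStartsWith(self, arg);
#endif
    /* Fallback for any other argument type. The "(O)" format keeps a
       tuple argument as one argument instead of unpacking it. */
    result = PyObject_CallMethod(self, (char*)"startswith", (char*)"(O)", arg);
    if (!result)
        return -1;
    ret = __Pyx_PyObject_IsTrue(result);
    Py_DECREF(result);
    return ret;
}
""")
```

Because the two helpers do not check the type of self themselves, each of them could also be attached to the "bytes" or "unicode" type on its own, and only the wrapper needs to know which of the two "str" means for the Python version being compiled against.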
@@ -522,8 +557,12 @@ builtin_types_table = [
]),
("bytes", "PyBytes_Type", []),
- ("str", "PyString_Type", []),
+ ("str", "PyString_Type", [BuiltinMethod("startswith", "TO", "O", "_Pyx_StrStartsWith",
+                           utility_code=str_startswith_utility_code),
+ ]),
It's better to return a "bint" to avoid returning Python objects for
True/False. The type character for that is "b". There's a
"__Pyx_PyObject_IsTrue(obj)" macro that you can use to convert the fallback
return value. It's usually also ok to copy error handling code from
CPython, and to raise the exception directly without going into the
fallback case.
("unicode", "PyUnicode_Type", [BuiltinMethod("join", "TO", "T", "PyUnicode_Join"),
+ BuiltinMethod("startswith", "TO", "O", "_Pyx_StrStartsWith",
+               utility_code=str_startswith_utility_code),
]),
unicode.startswith() is already optimised in Cython/Compiler/Optimize.py
("tailmatch"), basically because it accepts different numbers of arguments
that need to be dealt with. It may be worth going the same route for "str".
The decision usually depends on how complex the code transformation is. The
method table in Builtin.py is clearly limited in what it can express.
diff --git a/tests/run/stropts.pyx b/tests/run/stropts.pyx
@@ -0,0 +1,84 @@
+cimport cython
+
+@cython.test_assert_path_exists(
+    '//AttributeNode[@member="_Pyx_StrStartsWith"]')
+def str_startswith_str():
+    """
+    >>> str_startswith_str()
+    True
+    """
+
+    cdef str a
+    cdef str b
+
+    a = ''
+    b = ''
+    return a.startswith(b)
[...]
I usually just implement one or a few test functions and use the doctest to
pass in different values.
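A sketch of that test style, written as plain Python so that it runs without compiling anything. In the actual .pyx file the first parameter would presumably be declared as "str s" so that the specialisation applies, and the tree assertion decorator from the patch would stay in place. The function name and the sample values are only illustrative.

```python
def str_startswith(s, sub):
    """
    >>> str_startswith('abc', 'a')
    True
    >>> str_startswith('abc', 'b')
    False
    >>> str_startswith('', '')
    True
    >>> str_startswith('a', 'abc')
    False
    >>> str_startswith('abc', ('x', 'ab'))
    True
    """
    return s.startswith(sub)
```

One function then covers the matching, non-matching, empty, too-long and tuple cases, and adding another case only takes one more doctest line.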
Stefan