[Cython] Redundant Cython exception message strings

Robert Bradshaw robertwb at math.washington.edu
Sat May 28 18:15:26 CEST 2011


On Sat, May 28, 2011 at 2:37 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Robert Bradshaw, 28.05.2011 00:39:
>>
>> On Fri, May 27, 2011 at 3:32 PM, Stefan Behnel wrote:
>>>
>>> I recently stumbled over a tradeoff question with AttributeError, and now
>>> found the same situation for UnboundLocalError in Vitja's control flow
>>> branch. So here it is.
>>>
>>> When we raise an exception several times in different parts of the code
>>> with a message that only differs slightly each time (usually something
>>> like "'NoneType' has no attribute X", or "local variable X referenced
>>> before assignment"), we have three choices to handle this:
>>>
>>> 1) Optimise for speed: create a Python string object at module
>>> initialisation time and call PyErr_SetObject(exc_type, msg_str_obj).
>>>
>>> 2) Current way: let CPython create the string object when raising the
>>> exception and just call PyErr_SetString(exc_type, "complete message").
>>>
>>> 3) Trade speed for size and allow the C compiler to reduce the storage
>>> redundancy: write only the message template and the names as C char*
>>> constants by calling PyErr_Format(exc_type, "message template %s", "X").
>>>
>>> Assuming that exceptions should be exceptional, I'm leaning towards 3).
>>> This would allow the C compiler to collapse multiple usages of the same
>>> C string into one data constant, thus reducing a bit of redundancy in
>>> the shared library size and the memory footprint. However, it would
>>> (slightly?) slow down the exception raising due to the additional string
>>> formatting, which comes on top of building the Python string object that
>>> 2) also requires. While 1) would obviously be the fastest way to raise
>>> an exception (no memory allocation, only refcounting), I think it's not
>>> worth it for exceptions as it increases both the runtime memory overhead
>>> and the module startup time.
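
As a rough illustration of the size argument behind option 3), the following standalone C sketch compares the string data needed for complete messages with the data needed for one shared template plus the names. It is not Cython's generated code: MSG_TEMPLATE, format_message, bytes_complete and bytes_template are hypothetical names chosen for this example, and snprintf only stands in for the formatting that PyErr_Format would perform at raise time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical template, modelled on the UnboundLocalError message. */
#define MSG_TEMPLATE "local variable '%s' referenced before assignment"
#define TEMPLATE_LEN (sizeof(MSG_TEMPLATE) - 1)

/* Stand-in for the formatting step of option 3): build the full message
 * from the shared template and a name. Returns the message length, or -1
 * if the buffer is too small. */
static int format_message(char *buf, size_t size, const char *name)
{
    int n = snprintf(buf, size, MSG_TEMPLATE, name);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Option 2): every name gets its own complete message constant.
 * The "%s" (2 bytes) is replaced by the name; +1 for the terminating NUL. */
static size_t bytes_complete(const char *const *names, size_t count)
{
    size_t total = 0;
    assert(names != NULL);
    for (size_t i = 0; i < count; i++)
        total += TEMPLATE_LEN - 2 + strlen(names[i]) + 1;
    return total;
}

/* Option 3): the template is stored once, plus one constant per name. */
static size_t bytes_template(const char *const *names, size_t count)
{
    size_t total = TEMPLATE_LEN + 1;
    assert(names != NULL);
    for (size_t i = 0; i < count; i++)
        total += strlen(names[i]) + 1;
    return total;
}
```

Calling bytes_complete and bytes_template on the same list of names shows the effect: option 3) stores the template text once instead of once per name, so the saving grows with the number of messages that share a template.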
>>
>> Any back-of-the-envelope calculations on how much the savings would
>> be?
>
> As a micro benchmark, I wrote three C functions that do 10 exception setting
> calls and then clear the exception, and called those 10x in a loop (i.e. 100
> exceptions). Results:
>
> 1) PyErr_SetObject(PyExc_TypeError, Py_None)
> Py3.3: 1000000 loops, best of 3: 1.42 usec
> Py2.7: 1000000 loops, best of 3: 0.965 usec
>
> 2) PyErr_SetString(PyExc_TypeError, "[complete message]")
> Py3.3: 100000 loops, best of 3: 11.2 usec
> Py2.7: 100000 loops, best of 3: 4.85 usec
>
> 3) PyErr_Format(PyExc_TypeError, "[message %s template]", "Abc1")
> Py3.3: 10000 loops, best of 3: 37.3 usec
> Py2.7: 10000 loops, best of 3: 25.3 usec
>
> Observations: these are really tiny numbers for 100 exceptions. The string
> formatting case is only some 0.3 microseconds (25x) slower per exception
> than the constant pointer case, and about 0.2 microseconds (4-5x) slower
> than the C string constant case.
>
> Note that this only benchmarks the exception setting, not the catching, i.e.
> without the instantiation of the exception object etc., which is identical
> for all three cases.
>
> This change would only apply to Cython generated exceptions (from None
> safety checks, unbound locals, etc.), which can appear in a lot of places in
> the C code but should not normally be triggered in production code. If they
> occur, we'd lose about 0.2 microseconds per exception, comparing 2) and 3).
> I think that's totally negligible, given that these exceptions potentially
> indicate a bug in the user code.
>
> "strings" tells me that the C compiler really only keeps one copy of the
> string constants. The savings per exception message are somewhere between 30
> and 40 bytes. Not much by today's standards. Assuming even 1000 such
> exceptions in a large module, that's only some 30K of savings, whereas such
> a module would likely have a total stripped size of a *lot* more than 1MB.
>
> Personally, I think that the performance degradation is basically
> non-existent, so the space savings come almost for free, however tiny they
> may be.
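
The per-exception figures quoted above can be re-derived from the benchmark totals, since each timed run sets 100 exceptions. A minimal C sketch of that arithmetic follows; the function names are made up for this example and the inputs are the timings listed in the benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* Each timed run sets 100 exceptions: 10 loop iterations of a function
 * that makes 10 exception-setting calls. */
#define EXCEPTIONS_PER_RUN 100.0

/* Cost of a single exception, given the time of one full run. */
static double per_exception_usec(double run_usec)
{
    return run_usec / EXCEPTIONS_PER_RUN;
}

/* Additional cost per exception of the slower variant. */
static double extra_cost_usec(double slow_run_usec, double fast_run_usec)
{
    return per_exception_usec(slow_run_usec - fast_run_usec);
}

/* How many times slower the slower variant is. */
static double slowdown_factor(double slow_run_usec, double fast_run_usec)
{
    assert(fast_run_usec > 0.0);
    return slow_run_usec / fast_run_usec;
}

/* Print one comparison line for a pair of run timings. */
static void print_comparison(const char *label,
                             double slow_run_usec, double fast_run_usec)
{
    printf("%s: +%.2f usec per exception, %.1fx slower\n", label,
           extra_cost_usec(slow_run_usec, fast_run_usec),
           slowdown_factor(slow_run_usec, fast_run_usec));
}
```

For example, print_comparison("Py2.7, 3) vs 2)", 25.3, 4.85) reports an extra cost of roughly 0.2 microseconds per exception, and the Py3.3 timings (37.3 and 11.2) give roughly 0.26.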

Sounds good. I'm fine with 2 or 3. Despite the performance advantage
of 1, raising this kind of error should be the exceptional case, and
the module initialization time is an issue.

- Robert

