Performance issue with CPython 3.10 + Cython

Andreas Ames andreas.0815.qwertz at gmail.com
Fri Oct 7 10:39:00 EDT 2022


Answering to myself, just for the records:

1. The culprit was me. As lazy as I am, I have used f-strings all over the
place in calls to `logging.logger.debug()` and friends, evaluating all
arguments regardless of whether the logger was enabled or not.  Replacing
these f-strings by regular printf-like format strings solved the issue.
Now the application bowls happily along, consistently below 0.02 seconds
per second application time.
2. Valgrind + callgrind is an awesome toolchain to spot performance issues,
even on VMs.


Am Di., 4. Okt. 2022 um 11:05 Uhr schrieb Andreas Ames <
andreas.0815.qwertz at gmail.com>:

> Hi all,
>
> I am wrapping an embedded application (, which does not use any dynamic
> memory management,) using Cython to call it from CPython.  The wrapped
> application uses a cyclic executive, i.e. everything is done in the
> input-logic-output design, typical for some real-time related domains.
> Consequentially, every application cycle executes more or less the very
> same code.  As I am still in a prototyping stadium, the wrapped process is
> completely CPU-bound, i.e. except of some logging output there is no I/O
> whatsoever.
>
> During one second of "application time", I am doing exactly 120 calls into
> the application through three Cython-wrapped API functions.  The
> application uses some platform-dependent APIs, which I have also wrapped
> with Cython, so that there are numerous callbacks into the Python realm per
> call into the application. What I am observing now, is that the performance
> per "application second" decreases (remember: executing code that does the
> same thing on every cycle) and extending the number of loop iterations does
> not seem to cause any bound to this decrease.  In the log ouput below, you
> can see the GC counts, which look innocent to me.  The "real time" is
> measured using "time.time()". The "top" utility does not suggest any memory
> leak either.  I am developing on WSL2, but I have checked that this
> performance effect also happens on physical machines.  Right now, I am
> staring at "kcachegrind", but I have no idea, how to determine time series
> for the performance of functions (I am not looking for those functions,
> which need the most time, but for those, which consume more and more
> execution time).
>
> One more thing to look for could be memory fragmentation, but before that
> I would like to ask the experts here for their ideas and experiences and/or
> for tools, which could help to find the culprit.
>
> 2022-10-04 10:40:50|INFO    |__main__       |Execution loop 0 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.06862711906433105 seconds real time.
>> 2022-10-04 10:40:51|INFO    |__main__       |Execution loop 100 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.08224177360534668 seconds real time.
>> 2022-10-04 10:40:52|INFO    |__main__       |Execution loop 200 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.08225250244140625 seconds real time.
>> 2022-10-04 10:40:53|INFO    |__main__       |Execution loop 300 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.10176873207092285 seconds real time.
>> 2022-10-04 10:40:54|INFO    |__main__       |Execution loop 400 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.10900592803955078 seconds real time.
>> 2022-10-04 10:40:55|INFO    |__main__       |Execution loop 500 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.12233948707580566 seconds real time.
>> 2022-10-04 10:40:56|INFO    |__main__       |Execution loop 600 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.14058256149291992 seconds real time.
>> 2022-10-04 10:40:58|INFO    |__main__       |Execution loop 700 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.14777183532714844 seconds real time.
>> 2022-10-04 10:40:59|INFO    |__main__       |Execution loop 800 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.15729451179504395 seconds real time.
>> 2022-10-04 10:41:01|INFO    |__main__       |Execution loop 900 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.17365813255310059 seconds real time.
>> 2022-10-04 10:41:03|INFO    |__main__       |Execution loop 1000 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.17772984504699707 seconds real time.
>> 2022-10-04 10:41:05|INFO    |__main__       |Execution loop 1100 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.1955263614654541 seconds real time.
>> 2022-10-04 10:41:07|INFO    |__main__       |Execution loop 1200 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.20046710968017578 seconds real time.
>> 2022-10-04 10:41:09|INFO    |__main__       |Execution loop 1300 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.22513842582702637 seconds real time.
>> 2022-10-04 10:41:11|INFO    |__main__       |Execution loop 1400 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.23578548431396484 seconds real time.
>> 2022-10-04 10:41:13|INFO    |__main__       |Execution loop 1500 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.24581527709960938 seconds real time.
>> 2022-10-04 10:41:16|INFO    |__main__       |Execution loop 1600 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.2541334629058838 seconds real time.
>> 2022-10-04 10:41:19|INFO    |__main__       |Execution loop 1700 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.26812195777893066 seconds real time.
>> 2022-10-04 10:41:21|INFO    |__main__       |Execution loop 1800 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.28777456283569336 seconds real time.
>> 2022-10-04 10:41:24|INFO    |__main__       |Execution loop 1900 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.3005337715148926 seconds real time.
>> 2022-10-04 10:41:28|INFO    |__main__       |Execution loop 2000 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.30992960929870605 seconds real time.
>> 2022-10-04 10:41:31|INFO    |__main__       |Execution loop 2100 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.32058119773864746 seconds real time.
>> 2022-10-04 10:41:34|INFO    |__main__       |Execution loop 2200 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.33020949363708496 seconds real time.
>> 2022-10-04 10:41:37|INFO    |__main__       |Execution loop 2300 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.34426307678222656 seconds real time.
>> 2022-10-04 10:41:41|INFO    |__main__       |Execution loop 2400 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.35851263999938965 seconds real time.
>> 2022-10-04 10:41:45|INFO    |__main__       |Execution loop 2500 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.3664553165435791 seconds real time.
>> 2022-10-04 10:41:48|INFO    |__main__       |Execution loop 2600 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.3767662048339844 seconds real time.
>> 2022-10-04 10:41:52|INFO    |__main__       |Execution loop 2700 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.39760732650756836 seconds real time.
>> 2022-10-04 10:41:56|INFO    |__main__       |Execution loop 2800 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.42221736907958984 seconds real time.
>> 2022-10-04 10:42:01|INFO    |__main__       |Execution loop 2900 done. GC
>> counts = (381, 9, 3); 1 second of application time corresponds to
>> 0.4237234592437744 seconds real time.
>
>
> Thanks in advance,
>
> Andreas
>
>


More information about the Python-list mailing list