From stefan_ml at behnel.de Tue Sep 19 02:56:51 2023
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 19 Sep 2023 08:56:51 +0200
Subject: [Cython] Can we remove the FastGIL implementation?
Message-ID: <1912d66b-4ecc-2eb7-8171-b4f3b196fb1a@behnel.de>

Hi,

I've seen reports that Cython's "FastGIL" implementation (which basically
keeps the GIL state in a thread-local variable) is no longer faster than
CPython's plain GIL implementation in recent Python 3.x versions.
Potentially even slower. See the report in

https://github.com/cython/cython/issues/5703

It would be helpful to get user feedback on this.

If you have GIL-heavy Cython code, especially with nested
with-nogil/with-gil sections across functions, and a benchmark that
exercises it, could you please run the benchmark with and without the
feature enabled and report the results?

You can add "-DCYTHON_FAST_GIL=0" to your CFLAGS to disable it (and "=1"
to enable it explicitly). It's enabled by default in CPython 3.6-3.11 (but
disabled in Cython 0.29.x on Python 3.11).

Thanks,
Stefan

From dalcinl at gmail.com Tue Sep 19 06:58:50 2023
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 19 Sep 2023 13:58:50 +0300
Subject: [Cython] Can we remove the FastGIL implementation?
In-Reply-To: <1912d66b-4ecc-2eb7-8171-b4f3b196fb1a@behnel.de>
References: <1912d66b-4ecc-2eb7-8171-b4f3b196fb1a@behnel.de>
Message-ID:

Disclaimer: I may be doing something wrong, I did not put a lot of
effort into it.
With the microbenchmark that was offered in the GH issue, I see little
difference.
Use the attached zip file to reproduce yourself.
Change tox.ini to "cython<3" to try 0.29.x.
BTW, in the 0.29.x case, I see no compilation error as claimed in the
GH issue.

$ ./run.sh
CFLAGS=-g0 -Ofast -DCYTHON_FAST_GIL=0
Running test_gil_already_held ... took 0.08735537528991699
Running test_gil_released     ... took 0.6329536437988281
py37: OK ✔ in 3.57 seconds
Running test_gil_already_held ... took 0.09007453918457031
Running test_gil_released     ... took 0.4598276615142822
py38: OK ✔ in 3.19 seconds
Running test_gil_already_held ... took 0.10935306549072266
Running test_gil_released     ... took 0.4512367248535156
py39: OK ✔ in 3.25 seconds
Running test_gil_already_held ... took 0.09970474243164062
Running test_gil_released     ... took 0.46637773513793945
py310: OK ✔ in 3.21 seconds
Running test_gil_already_held ... took 0.08569073677062988
Running test_gil_released     ... took 0.46811795234680176
py311: OK ✔ in 3.22 seconds
Running test_gil_already_held ... took 0.15221118927001953
Running test_gil_released     ... took 0.2246694564819336
  py37: OK (3.57 seconds)
  py38: OK (3.19 seconds)
  py39: OK (3.25 seconds)
  py310: OK (3.21 seconds)
  py311: OK (3.22 seconds)
  pypy3.9: OK (5.24 seconds)
  congratulations :) (21.71 seconds)
CFLAGS=-g0 -Ofast -DCYTHON_FAST_GIL=1
Running test_gil_already_held ... took 0.08835673332214355
Running test_gil_released     ... took 0.6265637874603271
py37: OK ✔ in 1.42 seconds
Running test_gil_already_held ... took 0.09030938148498535
Running test_gil_released     ... took 0.456279993057251
py38: OK ✔ in 1.17 seconds
Running test_gil_already_held ... took 0.10986089706420898
Running test_gil_released     ... took 0.45894527435302734
py39: OK ✔ in 1.2 seconds
Running test_gil_already_held ... took 0.10107588768005371
Running test_gil_released     ... took 0.5052204132080078
py310: OK ✔ in 1.21 seconds
Running test_gil_already_held ... took 0.08566665649414062
Running test_gil_released     ... took 0.4581136703491211
py311: OK ✔ in 1.13 seconds
Running test_gil_already_held ... took 0.15286779403686523
Running test_gil_released     ... took 0.22533607482910156
  py37: OK (1.42 seconds)
  py38: OK (1.17 seconds)
  py39: OK (1.20 seconds)
  py310: OK (1.21 seconds)
  py311: OK (1.13 seconds)
  pypy3.9: OK (1.64 seconds)
  congratulations :) (7.81 seconds)

--
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fastgil.zip
Type: application/zip
Size: 1218 bytes
Desc: not available
URL:

From dw-git at d-woods.co.uk Tue Sep 19 15:38:11 2023
From: dw-git at d-woods.co.uk (da-woods)
Date: Tue, 19 Sep 2023 20:38:11 +0100
Subject: [Cython] Can we remove the FastGIL implementation?
In-Reply-To:
References: <1912d66b-4ecc-2eb7-8171-b4f3b196fb1a@behnel.de>
Message-ID: <1b62cca9-7a5d-6bf2-a801-d59ef4a3f553@d-woods.co.uk>

I think the detail that was missing is you need to add the `#cython:
fast_gil = True` to enable it.

For me:
Python 3.9 and 3.10 are basically identical (on master)

**test_gil_already_held**
with fast_gil
Running the test...
took 0.175062894821167
without
Running the test...
took 0.10976791381835938

**test_gil_released**
with fast_gil
Running the test...
took 0.583066463470459
without
Running the test...
took 0.5824759006500244

test_gil_already_held is noticeably faster with fast_gil.

For Python 3.11:
I get the crash in 0.29.x if I try to run using fast_gil. No defines
are needed to get that...
On master:

**test_gil_already_held**
with fast_gil
Running the test...
took 0.17254948616027832
without
Running the test...
took 0.10958600044250488

**test_gil_released**
with fast_gil
Running the test...
took 0.5791811943054199
without
Running the test...
took 0.5597968101501465

Note that "without fastgil" is now as fast as "fastgil" used to be. As
fastgil is now slower. This is reproducible.

On Python 3.12 on master they're identical by default (which makes
sense since I think we disable it). Defining -DCYTHON_FAST_GIL brings
us back to roughly the same as 3.11 (i.e. now slower).

So my conclusion is that from 3.11 onwards Python sped up their own
GIL handling to about the same as we used to have, and fastgil has
turned into a pessimization.

David

From dw-git at d-woods.co.uk Tue Sep 19 15:53:13 2023
From: dw-git at d-woods.co.uk (da-woods)
Date: Tue, 19 Sep 2023 20:53:13 +0100
Subject: [Cython] [cython-users] Re: Can we remove the FastGIL implementation?
In-Reply-To: <1b62cca9-7a5d-6bf2-a801-d59ef4a3f553@d-woods.co.uk>
References: <1912d66b-4ecc-2eb7-8171-b4f3b196fb1a@behnel.de> <1b62cca9-7a5d-6bf2-a801-d59ef4a3f553@d-woods.co.uk>
Message-ID:

One more detail - on 0.29.x it becomes a pessimization in Python 3.10
rather than Python 3.11.

So in conclusion

        | Python <3.10       | Python 3.10        | Python 3.11       | Python 3.12b2
-----------------------------------------------------------------------------------------------------------
0.29.x  | fast_gil is better | fast_gil is worse  | fast_gil crashes  | fast_gil crashes
master  | fast_gil is better | fast_gil is better | fast_gil is worse | fast_gil is worse (but off by default)

From stefan_ml at behnel.de Wed Sep 20 05:27:37 2023
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 20 Sep 2023 11:27:37 +0200
Subject: [Cython] Can we remove the FastGIL implementation?
In-Reply-To: <1b62cca9-7a5d-6bf2-a801-d59ef4a3f553@d-woods.co.uk>
References: <1912d66b-4ecc-2eb7-8171-b4f3b196fb1a@behnel.de> <1b62cca9-7a5d-6bf2-a801-d59ef4a3f553@d-woods.co.uk>
Message-ID: <4305dccf-8ced-f0c7-fdce-7e19f16c97c1@behnel.de>

da-woods wrote on 19.09.23 at 21:38:
> I think the detail that was missing is you need to add the `#cython:
> fast_gil = True` to enable it.
> [...]
> So my conclusion is that from 3.11 onwards Python sped up their own GIL
> handling to about the same as we used to have, and fastgil has turned into
> a pessimization.

I tried the benchmark with the master branch on my side again, this time
with correct configuration.
:) Turns out that enabling the FastGIL feature makes it much slower for me
(on Ubuntu Linux 20.04) in both Py3.8 and 3.10:

"""
* Python 3.10 (-DCYTHON_FAST_GIL=0)
Running the test (already held)...
took 1.2482502460479736
Running the test (released)...
took 6.444956541061401
Running the test (already held)...
took 1.2358744144439697
Running the test (released)...
took 6.4064109325408936

* Python 3.10 (-DCYTHON_FAST_GIL=1)
Running the test (already held)...
took 2.243091583251953
Running the test (released)...
took 7.32707667350769
Running the test (already held)...
took 2.4065449237823486
Running the test (released)...
took 7.50264573097229
"""

I also tried it with PGO enabled and got more or less the same result. The
Python installations that I tried it with were both PGO builds.

It's probably mixed across platforms, different configurations and C
compilers.

I looked through the "What's new" document for Py3.10 and 3.11 but couldn't
find mentions of GIL improvements. Just that some other things have become
faster.

So, disable the feature in Python 3.11 and later?
(Currently it's disabled in 3.12+.)

Py3.11+ would suggest that we keep the code in Cython 3.1, since that will
support older Python versions that still seem to benefit from it.

Stefan

From dw-git at d-woods.co.uk Wed Sep 20 14:40:13 2023
From: dw-git at d-woods.co.uk (da-woods)
Date: Wed, 20 Sep 2023 19:40:13 +0100
Subject: [Cython] Can we remove the FastGIL implementation?
In-Reply-To: <4305dccf-8ced-f0c7-fdce-7e19f16c97c1@behnel.de>
References: <1912d66b-4ecc-2eb7-8171-b4f3b196fb1a@behnel.de> <1b62cca9-7a5d-6bf2-a801-d59ef4a3f553@d-woods.co.uk> <4305dccf-8ced-f0c7-fdce-7e19f16c97c1@behnel.de>
Message-ID: <38ee70a5-9c1c-550f-8320-685f5e75feda@d-woods.co.uk>

On 20/09/2023 10:27, Stefan Behnel wrote:
> So, disable the feature in Python 3.11 and later? (Currently it's
> disabled in 3.12+.)
>

That seems sensible.

I think the other question is 0.29.x. On Python 3.11+ it silently produces
code that crashes at runtime. We should probably disable it there (at least
if there is another 0.29.x release).

David
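
For anyone who wants to script the with/without comparison requested at the start of this thread, the following is a minimal sketch, not anything shipped with Cython. The helper names `cflags_with_fast_gil` and `proposed_fast_gil_default` are hypothetical; only the `-DCYTHON_FAST_GIL=0/1` macro, the `-g0 -Ofast` base flags, and the "disable in Python 3.11 and later" cut-off are taken from the messages above, and the cut-off is still a proposal rather than a decision.

```python
import sys


def cflags_with_fast_gil(base_cflags, enabled):
    """Return base_cflags with an explicit -DCYTHON_FAST_GIL=0/1 appended.

    Hypothetical helper: any CYTHON_FAST_GIL define already present in
    base_cflags is dropped first, so the result carries exactly one setting.
    """
    kept = [
        flag for flag in base_cflags.split()
        if not flag.startswith("-DCYTHON_FAST_GIL")
    ]
    kept.append("-DCYTHON_FAST_GIL=%d" % (1 if enabled else 0))
    return " ".join(kept)


def proposed_fast_gil_default(python_version=None):
    """Encode the proposal from this thread for the master branch:
    leave FastGIL on before CPython 3.11, turn it off from 3.11 onwards.
    This is an assumption about a possible default, not Cython's behaviour.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    return tuple(python_version[:2]) < (3, 11)


if __name__ == "__main__":
    # Print the two CFLAGS settings to benchmark against each other.
    for enabled in (False, True):
        print("CFLAGS=" + cflags_with_fast_gil("-g0 -Ofast", enabled))
```

Exporting each printed value as CFLAGS before rebuilding the extension gives the two configurations to time. As noted earlier in the thread, the module also needs the `#cython: fast_gil = True` directive for the feature to be used at all.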