From aixtools at felt.demon.nl Sat Nov 10 10:01:04 2018
From: aixtools at felt.demon.nl (Michael)
Date: Sat, 10 Nov 2018 16:01:04 +0100
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
Message-ID: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl>

Just noticed that my build-bot (#161) and the other AIX build-bot (#10) are stuck acquiring locks.

#161 stuck 5 days (actually, 159, 160, 161, 162 and 163)

#10 stuck 2 days (9, 10 and 131) - with 9 pinging builder for 9 hours (builders 4 and 104 are quiet)

FYI.

p.s. Not unique to AIX - this one has been acquiring locks for 9 days!
https://buildbot.python.org/all/#/builders/8

Michael

From db3l.net at gmail.com Sat Nov 10 13:47:44 2018
From: db3l.net at gmail.com (David Bolen)
Date: Sat, 10 Nov 2018 13:47:44 -0500
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl>
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl>
Message-ID:

If you look back on the worker, there's probably some other build on the same worker still technically running that's blocking the rest. It might be old enough not to show on the main summary page, and you won't see it just by looking at a single builder's history, so look at the worker in general.

For example, it looks like there's a 3.6 build from 5 days ago which appears stuck in test_multiprocessing_fork for the past 123 hours. So the later builds are just waiting for the worker to be available.

I've seen the rare multiprocessing test hang on some of my slower Windows workers, I think only within the past 5-6 months, and probably no more than a few times overall. I don't know why the regular test timeouts fail to interrupt things.
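David's remark that the regular test timeouts did not interrupt the hung test suggests a complementary safeguard: a watchdog in the parent process that kills a test run once it exceeds a wall-clock limit. The sketch below uses only the standard library. The helper name `run_with_watchdog` and the timeout values are hypothetical, and this is not how the CPython buildbots are actually configured.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s):
    """Run cmd, killing it if it runs longer than timeout_s seconds.

    Returns (returncode, timed_out); returncode is None when the watchdog fired.
    Hypothetical helper for illustration, not part of buildbot or regrtest.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the direct child and waited for it.
        # Grandchildren (e.g. multiprocessing workers) are NOT killed by this.
        return None, True
    return completed.returncode, False


if __name__ == "__main__":
    # Simulate a hung test with a child that sleeps far longer than the limit.
    hung = [sys.executable, "-c", "import time; time.sleep(600)"]
    print(run_with_watchdog(hung, timeout_s=2))
```

Because `subprocess.run()` only kills the process it started, a hang inside a multiprocessing child may additionally need a process-group kill on POSIX (for example, starting the child with `start_new_session=True` and signalling the group with `os.killpg`). That is left out of this sketch.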
But if you manually kill the stuck test process on the worker, everything should free up for the rest. Not sure whether a master-side cancellation works or not.

The other cases also appear stuck in builds, but in different places. The ARM worker has a 3.x build from 10 days ago stuck in test_pydoc. The other AIX worker is somehow stuck in a checkout step.

But in all cases, it's not the waiting builds that are at issue, but the hung build that blocks the rest.

-- David

On Sat, Nov 10, 2018 at 10:00 AM Michael wrote:

> Just noticed my build-bot (#161) and the other AIX build-bot (#10) are
> stuck acquiring locks.
>
> #161 stuck 5 days (actually, 159, 160, 161, 162 and 163)
>
> #10 stuck 2 days (9, 10 and 131) - with 9 pinging builder for 9 hours
> (builders 4 and 104 are quiet)
>
> FYI.
>
> p.s. Not unique to AIX - this one is acquiring locks for 9 days!
> https://buildbot.python.org/all/#/builders/8
>
> Michael
>
>
> _______________________________________________
> Python-Buildbots mailing list
> Python-Buildbots at python.org
> https://mail.python.org/mailman/listinfo/python-buildbots
>

From zachary.ware+pydev at gmail.com Sat Nov 10 15:29:02 2018
From: zachary.ware+pydev at gmail.com (Zachary Ware)
Date: Sat, 10 Nov 2018 14:29:02 -0600
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To:
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl>
Message-ID:

On Sat, Nov 10, 2018 at 12:48 PM David Bolen wrote:
> If you look back on the worker there's probably some other build on the
> same worker still technically running that's blocking the rest. It might
> be old enough to not show on the main summary page, and you won't see if it
> just looking at a single builder history, so look at the worker in general.

David is exactly right; there were hung builds on a few builders. I
thought I'd added configuration to fix that, but apparently it's not
working as expected.
I have manually gone through and cancelled the
hung builds, so hopefully those builders will catch up soon.

--
Zach

From aixtools at felt.demon.nl Sat Nov 10 17:34:05 2018
From: aixtools at felt.demon.nl (Michael Felt (aixtools))
Date: Sat, 10 Nov 2018 23:34:05 +0100
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To:
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl>
Message-ID: <43BBA36D-72B9-4FCD-B6B5-20F7247FE736@felt.demon.nl>

Sent from my iPhone

> On 10 Nov 2018, at 21:29, Zachary Ware wrote:
>
>> On Sat, Nov 10, 2018 at 12:48 PM David Bolen wrote:
>> If you look back on the worker there's probably some other build on the
>> same worker still technically running that's blocking the rest. It might
>> be old enough to not show on the main summary page, and you won't see if it
>> just looking at a single builder history, so look at the worker in general.
>
> David is exactly right; there were hung builds on a few builders. I
> thought I'd added configuration to fix that, but apparently it's not
> working as expected. I have manually gone through and cancelled the
> hung builds, so hopefully those builders will catch up soon.

Thx. My 5-day job finished. Scary, though: 20 tests failed, rather than the expected 7.
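The mechanism David and Zach describe (a hung build keeps the worker busy, so every later build on that worker sits in "acquiring locks" until the hung one is killed or cancelled) can be pictured with a toy model. This sketch assumes the worker runs one build at a time; `ToyWorker` and its methods are hypothetical names, and buildbot's real lock handling is more involved.

```python
import threading


class ToyWorker:
    """Hypothetical stand-in for a buildbot worker with a fixed number of build slots."""

    def __init__(self, max_builds=1):
        # A bounded semaphore counts free slots and refuses extra releases.
        self._slots = threading.BoundedSemaphore(max_builds)

    def try_start_build(self, wait_s):
        """Wait up to wait_s seconds for a free slot; False means still 'acquiring locks'."""
        return self._slots.acquire(timeout=wait_s)

    def finish_build(self):
        """Completing or cancelling a build hands its slot to the next waiter."""
        self._slots.release()


if __name__ == "__main__":
    worker = ToyWorker()
    print("hung build started:", worker.try_start_build(0.1))
    print("next build started:", worker.try_start_build(0.1))
    worker.finish_build()  # e.g. the hung build is cancelled on the master
    print("next build started after cancel:", worker.try_start_build(0.1))
```

Nothing is wrong with the waiting builds in this model; only releasing the stuck slot lets them proceed, which matches David's conclusion.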
From aixtools at felt.demon.nl Sun Nov 11 15:27:50 2018
From: aixtools at felt.demon.nl (Michael)
Date: Sun, 11 Nov 2018 21:27:50 +0100
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To: <43BBA36D-72B9-4FCD-B6B5-20F7247FE736@felt.demon.nl>
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl> <43BBA36D-72B9-4FCD-B6B5-20F7247FE736@felt.demon.nl>
Message-ID: <3d5e3e65-2bc6-b77e-e4c3-c6768dc624c8@felt.demon.nl>

OK. I spent a bit of time looking at the results and, to my dismay, there was a major shift in the number of errors that the AIX bot returns.

I spent about a month of real time finding corrections for all the open AIX test failures, so that the build bot could be useful in spotting changes that seriously affect AIX. Without that effort there is not really any point in running a bot.

Anyway, my bot went from having 9 to 10 fails (and 10k lines of output) to 20 fails and roughly 22k lines of test output.

The last "near normal FAIL" is:
https://buildbot.python.org/all/#/builders/161/builds/325/steps/4/logs/stdio
and the first mega fail is
https://buildbot.python.org/all/#/builders/161/builds/326/steps/4/logs/stdio

No way am I going to research all of those if no one is going to review and reject or merge my submissions. I suppose I could look daily, but the bot just says fail, so I watch the PRs instead.

Does anyone besides myself care?

Sincerely,
Michael

From nad at python.org Sun Nov 11 20:16:02 2018
From: nad at python.org (Ned Deily)
Date: Sun, 11 Nov 2018 20:16:02 -0500
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To: <3d5e3e65-2bc6-b77e-e4c3-c6768dc624c8@felt.demon.nl>
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl> <43BBA36D-72B9-4FCD-B6B5-20F7247FE736@felt.demon.nl> <3d5e3e65-2bc6-b77e-e4c3-c6768dc624c8@felt.demon.nl>
Message-ID: <6B950C32-FD6F-4A2D-B1A5-4B29BE3BAAFE@python.org>

On Nov 11, 2018, at 15:27, Michael wrote:
> Anyway, my bot went from having 9 to 10 fails (and 10k lines of output)
> to 20 fails and roughly 22k lines of test.
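Comparing the failure lists of two runs, as Michael did between builds 325 and 326, can be scripted once the two stdio logs are saved locally. A minimal sketch, assuming the regrtest summary layout that appears later in this thread (an `N tests failed:` header followed by indented test names and ended by a blank line). The function names are hypothetical and the sample log text is made up for illustration.

```python
import re

# Assumed regrtest summary header, e.g. "11 tests failed:" or "1 test failed:".
_FAILED_HEADER = re.compile(r"^\d+ tests? failed:\s*$")


def failed_tests(log_text):
    """Return the set of test names listed under 'N tests failed:' in a log."""
    names = set()
    collecting = False
    for line in log_text.splitlines():
        if _FAILED_HEADER.match(line):
            collecting = True
            continue
        if collecting:
            if line[:1].isspace() and line.strip():
                names.update(line.split())
            else:
                collecting = False  # a blank or unindented line ends the block
    return names


def compare_runs(before_log, after_log):
    """Return (newly failing, no longer failing) test names, each sorted."""
    before = failed_tests(before_log)
    after = failed_tests(after_log)
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__":
    sample_before = "2 tests failed:\n    test_ctypes test_time\n\n"
    sample_after = "3 tests failed:\n    test_ctypes test_pyexpat\n    test_time\n\n"
    print(compare_runs(sample_before, sample_after))
```

Taking a set difference drops the long-standing failures out of the picture and leaves only the newly failing tests to investigate.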
>
> The last "near normal FAIL" is:
> https://buildbot.python.org/all/#/builders/161/builds/325/steps/4/logs/stdio
> and the first mega fail is
> https://buildbot.python.org/all/#/builders/161/builds/326/steps/4/logs/stdio

Michael,

Thanks for tracking down where the additional failures began. I took a quick look at the buildbot listings for builds 325 and 326, and it seems like all of the additional failures that appear in 326 vs 325 are the result of a new build failure for the pyexpat module. That's visible at the end of the "compile" stage output:

    Failed to build these modules:
    pyexpat

and near the top of the test output when the Makefile tries to rebuild any failed modules.

Looking at the output of the git stage of 326, the git branch HEAD of the build is 9d4712bc8f26bf1d7e626b53ab092fe030bcd68d. Running git show to get the commit message:

    $ git show 9d4712bc8f26bf1d7e626b53ab092fe030bcd68d
    commit 9d4712bc8f26bf1d7e626b53ab092fe030bcd68d
    Author: Gregory P. Smith
    Date:   Wed Oct 17 18:10:46 2018 -0700

        bpo-35011: Restore use of pyexpatns.h in libexpat (GH-9939)

        Restores the use of pyexpatns.h to isolate our embedded copy of the expat C
        library so that its symbols do not conflict at link or dynamic loading time
        with an embedding application or other extension modules with their own
        version of libexpat.

        https://github.com/python/cpython/commit/5dc3f23b5fb0b510926012cb3732dae63cddea60#diff-3afaf7274c90ce1b7405f75ad825f545 inadvertently removed it when upgrading expat.

That looks rather suspicious!

So I guess the next step would be to try to figure out why that change (or some other change in the range between buildbot runs 325 and 326) causes the pyexpat module to stop building. There are a bunch of "ld: 0711-317 ERROR: Undefined symbol" errors in the link step of the "building 'pyexpat' extension" step. In the "configure" step, it is determined that it is not using an operating-system-provided version of expat:

    checking for --with-system-expat... no

Which is to be expected.
So it should be trying to pick up a version of expat that is provided elsewhere outside of the build. Perhaps there is an issue with it. Or the above change doesn't take something into account. To proceed further, it would be most helpful to have someone with AIX experience and/or familiarity with that buildbot's setup.

--
  Ned Deily
  nad at python.org -- []

From aixtools at felt.demon.nl Tue Nov 13 16:07:49 2018
From: aixtools at felt.demon.nl (Michael Felt)
Date: Tue, 13 Nov 2018 23:07:49 +0200
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To: <6B950C32-FD6F-4A2D-B1A5-4B29BE3BAAFE@python.org>
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl> <43BBA36D-72B9-4FCD-B6B5-20F7247FE736@felt.demon.nl> <3d5e3e65-2bc6-b77e-e4c3-c6768dc624c8@felt.demon.nl> <6B950C32-FD6F-4A2D-B1A5-4B29BE3BAAFE@python.org>
Message-ID:

On 11/12/2018 3:16 AM, Ned Deily wrote:
> On Nov 11, 2018, at 15:27, Michael wrote:
>> Anyway, my bot went from having 9 to 10 fails (and 10k lines of output)
>> to 20 fails and roughly 22k lines of test.
>>
>> The last "near normal FAIL" is:
>> https://buildbot.python.org/all/#/builders/161/builds/325/steps/4/logs/stdio
>> and the first mega fail is
>> https://buildbot.python.org/all/#/builders/161/builds/326/steps/4/logs/stdio
> Michael,
>
> Thanks for tracking down where the additional failures began. I took a quick look at the buildbot listings for builds 325 and 325 and it seems like all of the additional failures that appear in 326 vs 325 are a result of a new build failure for the pyexpat module. That's visible at the end of the "compile" stage output:
>
> Failed to build these modules:
> pyexpat
>
> and near the top of the test output when the Makefile tries to rebuild any failed modules.
>
> Looking at the output of the git stage of 326, the git branch HEAD of the build is 9d4712bc8f26bf1d7e626b53ab092fe030bcd68d.
> Running git show to get the commit message:
>
> $ git show 9d4712bc8f26bf1d7e626b53ab092fe030bcd68d
> commit 9d4712bc8f26bf1d7e626b53ab092fe030bcd68d
> Author: Gregory P. Smith
> Date:   Wed Oct 17 18:10:46 2018 -0700
>
>     bpo-35011: Restore use of pyexpatns.h in libexpat (GH-9939)
>
>     Restores the use of pyexpatns.h to isolate our embedded copy of the expat C
>     library so that its symbols do not conflict at link or dynamic loading time
>     with an embedding application or other extension modules with their own
>     version of libexpat.
>
>     https://github.com/python/cpython/commit/5dc3f23b5fb0b510926012cb3732dae63cddea60#diff-3afaf7274c90ce1b7405f75ad825f545 inadvertently removed it when upgrading expat.
>
> That looks rather suspicious!
>
> So I guess the next step would be to try to figure out why that change (or some other in the range between buildbot run 325 and 326) causes the pyexpact module to stop building. There are a bunch of ld: 0711-317 ERROR: Undefined symbol errors in the link step of the "building 'pyexpat' extension" step. In the "configure" step, it is determined that it is not using an operating system provided version of expat:
>
> checking for --with-system-expat... no
>
> Which is to be expected. So it should be trying to pick up a version of expat that is provided elsewhere outside of the build.

Basically, for as long as this bot has been running there has been a libexpat.a in /opt/lib, at expat version 2.2.5.

    buildbot at x064:[/home/buildbot]lslpp -L | grep expat
      aixtools.expat.rte         2.2.5.0    C     F    aixtools expat 12-Nov-2017
      aixtools.expat.share       2.2.5.0    C     F    aixtools expat universal files
    buildbot at x064:[/home/buildbot]lslpp -h aixtools.expat.rte
      Fileset         Level     Action       Status       Date         Time
      ----------------------------------------------------------------------------
    Path: /usr/lib/objrepos
      aixtools.expat.rte
                      2.2.5.0   COMMIT       COMPLETE     08/02/18     10:16:19

That has not changed.

I manually did a clone of cpython master and built it using the same arguments, userid and environment as I expect the bot to be using, and I get 11 failed tests:

    11 tests failed:
        test__xxsubinterpreters test_binascii test_ctypes test_distutils
        test_httpservers test_importlib test_multiprocessing_forkserver
        test_multiprocessing_spawn test_pyexpat test_socket test_time

I checked on my regular (manual) build system, and I see that I do not have any libexpat.a library there. I recall (now) that I have occasionally had to "manually" add -lexpat to one of the modules because of many undefined symbols, e.g.,

    building 'pyexpat' extension
    xlc_r -O2 -I/opt/include -I/opt/buildaix/include -I/opt/include -I/opt/buildaix/include -I./Include/internal -DHAVE_EXPAT_CONFIG_H=1 -DXML_POOR_ENTROPY=1 -DUSE_PYEXPAT_CAPI -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat -I./Include -I. -I/opt/include -I/opt/buildaix/include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0 -c /home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/pyexpat.c -o build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/pyexpat.o
    "/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/pyexpat.c", line 1221.23: 1506-068 (W) Operation between types "void*" and "void(*)(void*,const char*,int)" is not allowed.
        1500-030: (I) INFORMATION: PyInit_pyexpat: Additional optimization may be attained by recompiling and specifying MAXMEM option with a value greater than 8192.
    xlc_r -O2 -I/opt/include -I/opt/buildaix/include -I/opt/include -I/opt/buildaix/include -I./Include/internal -DHAVE_EXPAT_CONFIG_H=1 -DXML_POOR_ENTROPY=1 -DUSE_PYEXPAT_CAPI -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat -I./Include -I. -I/opt/include -I/opt/buildaix/include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0 -c /home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmlparse.c -o build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmlparse.o
    xlc_r -O2 -I/opt/include -I/opt/buildaix/include -I/opt/include -I/opt/buildaix/include -I./Include/internal -DHAVE_EXPAT_CONFIG_H=1 -DXML_POOR_ENTROPY=1 -DUSE_PYEXPAT_CAPI -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat -I./Include -I. -I/opt/include -I/opt/buildaix/include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0 -c /home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmlrole.c -o build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmlrole.o
    xlc_r -O2 -I/opt/include -I/opt/buildaix/include -I/opt/include -I/opt/buildaix/include -I./Include/internal -DHAVE_EXPAT_CONFIG_H=1 -DXML_POOR_ENTROPY=1 -DUSE_PYEXPAT_CAPI -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat -I./Include -I. -I/opt/include -I/opt/buildaix/include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Include -I/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0 -c /home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmltok.c -o build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmltok.o
    Modules/ld_so_aix xlc_r -bI:Modules/python.exp -L/opt/lib -bmaxdata:0x40000000 -L/opt/lib -bmaxdata:0x40000000 -L/opt/lib -bmaxdata:0x40000000 -I/opt/include -I/opt/buildaix/include -I/opt/include -I/opt/buildaix/include build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/pyexpat.o build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmlparse.o build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmlrole.o build/temp.aix-7.1-3.8-pydebug/home/buildbot/buildarea/3.x.aixtools-aix-power6/build.0/Modules/expat/xmltok.o -L/opt/lib -o build/lib.aix-7.1-3.8-pydebug/pyexpat.so
    ld: 0711-317 ERROR: Undefined symbol: XML_SetStartElementHandler
    ld: 0711-317 ERROR: Undefined symbol: XML_SetEndElementHandler
    ld: 0711-317 ERROR: Undefined symbol: XML_SetProcessingInstructionHandler
    ld: 0711-317 ERROR: Undefined symbol: XML_SetCharacterDataHandler
    ld: 0711-317 ERROR: Undefined symbol: XML_SetUnparsedEntityDeclHandler
    ...

When I add "-lexpat" the module builds normally. But that is "contrary" to having it find a system libexpat (it is there, but in /opt/lib).

As my system that does not have any libexpat does not need this "help", I am going to guess that since configure finds the include files, it builds pyexpat differently than when the include file is not available.

I'll research further. (I'll start with renaming the expat include files.)

All for tonight. Thanks for the hints to look at.

> Perhaps there is an issue with it. Or the above change doesn't take something into account.
> To proceed further, it would be most helpful to have someone with AIX experience and/or familiarity with that buildbot's setup.
>
>
> --
>   Ned Deily
>   nad at python.org -- []
>
>

From aixtools at felt.demon.nl Tue Nov 13 16:39:32 2018
From: aixtools at felt.demon.nl (Michael Felt)
Date: Tue, 13 Nov 2018 23:39:32 +0200
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To:
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl> <43BBA36D-72B9-4FCD-B6B5-20F7247FE736@felt.demon.nl> <3d5e3e65-2bc6-b77e-e4c3-c6768dc624c8@felt.demon.nl> <6B950C32-FD6F-4A2D-B1A5-4B29BE3BAAFE@python.org>
Message-ID: <00a7b76b-c43b-e245-e9cc-673f6bea39a9@felt.demon.nl>

On 11/13/2018 11:07 PM, Michael Felt wrote:
> As my system that does not any libexpat does not need this "help", I am
> going to guess that since configure finds the include files it builds
> pyexpat differently than when the include file is not available.
>
> I'll research further. (I'll start with renaming the expat include files).
>
> All for tonight. Thanks for hints to look at.

FYI - it now builds "normally":

    Python build finished successfully!
    The necessary bits to build these optional modules were not found:
    _curses_panel         _gdbm                 _tkinter
    ossaudiodev           readline              spwd
    To find the necessary bits, look in setup.py in detect_modules() for the module's name.

    The following modules found by detect_modules() in setup.py, have been
    built by the Makefile instead, as configured by the Setup files:
    _abc                  atexit                pwd
    time

Tests are running...
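The two clues Ned used, the "Failed to build these modules:" list at the end of the compile step and the `ld: 0711-317 ERROR: Undefined symbol:` lines from the AIX linker, can be pulled out of a saved build log automatically. This sketch assumes the module list starts on the line after that header and ends at a blank line. The function names are hypothetical, and the sample text is abbreviated from the output quoted in this thread. It only collects the names; it does not explain why the symbols are missing.

```python
import re

# AIX ld reports each unresolved reference as "ld: 0711-317 ERROR: Undefined symbol: NAME".
_UNDEFINED = re.compile(r"ld: 0711-317 ERROR: Undefined symbol: (\S+)")


def failed_modules(build_log):
    """Return the module names listed after 'Failed to build these modules:'."""
    lines = build_log.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Failed to build these modules:":
            names = []
            for follow in lines[i + 1:]:
                if not follow.strip():
                    break  # a blank line ends the (possibly multi-column) list
                names.extend(follow.split())
            return names
    return []


def undefined_symbols(build_log):
    """Return unresolved symbols in first-seen order, without duplicates."""
    return list(dict.fromkeys(_UNDEFINED.findall(build_log)))


if __name__ == "__main__":
    sample = (
        "ld: 0711-317 ERROR: Undefined symbol: XML_SetStartElementHandler\n"
        "ld: 0711-317 ERROR: Undefined symbol: XML_SetEndElementHandler\n"
        "\n"
        "Failed to build these modules:\n"
        "pyexpat\n"
        "\n"
    )
    print(failed_modules(sample))
    print(undefined_symbols(sample))
```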
From aixtools at felt.demon.nl Wed Nov 14 16:33:25 2018
From: aixtools at felt.demon.nl (Michael Felt)
Date: Wed, 14 Nov 2018 23:33:25 +0200
Subject: [Python-buildbots] [Python-Buildbots] - some/many bots stuck acquiring locks
In-Reply-To: <00a7b76b-c43b-e245-e9cc-673f6bea39a9@felt.demon.nl>
References: <4a9374ab-fea6-cd82-6183-fd633d762594@felt.demon.nl> <43BBA36D-72B9-4FCD-B6B5-20F7247FE736@felt.demon.nl> <3d5e3e65-2bc6-b77e-e4c3-c6768dc624c8@felt.demon.nl> <6B950C32-FD6F-4A2D-B1A5-4B29BE3BAAFE@python.org> <00a7b76b-c43b-e245-e9cc-673f6bea39a9@felt.demon.nl>
Message-ID:

@Ned - thx for the hints. The bot behavior is "back to normal".

I still have to find out why the test_multiprocessing_* tests fail on the bot, but not when I run them manually (on my AIX 6.1 build server). As I have time, I'll continue to research.

Also, I'll look into a patch that checks BOTH include file AND library availability. My bot system could have been (should have been) using an "external", aka "system", expat library.

Is it the intent to use a "system" expat if one is found?

On 11/13/2018 11:39 PM, Michael Felt wrote:
> Tests are running...
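On Michael's closing question, the configure output quoted earlier ("checking for --with-system-expat... no") suggests that an installed expat is only used when explicitly requested. One quick check after a build is to ask the interpreter which expat its pyexpat module reports. `xml.parsers.expat.EXPAT_VERSION` and `version_info` are part of the standard library. Note that they report a version, not a library path, so if the bundled copy and the one in /opt/lib were the same release, this check alone could not tell them apart. Running it with the freshly built `./python` is an assumption about how it would be used; the helper name is hypothetical.

```python
import xml.parsers.expat as expat


def expat_build_info():
    """Return (version string, (major, minor, micro)) reported by pyexpat."""
    return expat.EXPAT_VERSION, expat.version_info


if __name__ == "__main__":
    version_string, version_tuple = expat_build_info()
    print(version_string, version_tuple)
```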