[issue41642] Buildbot: workers detached every minute and "no space left on device" issue

STINNER Victor report at bugs.python.org
Fri Aug 28 04:32:35 EDT 2020


STINNER Victor <vstinner at python.org> added the comment:

> The buildbot server migrated to a new machine and is now behind a load balancer. tcp/80 (buildbot web page, HTTP) and tcp/9020 (used by buildbot workers) are both behind the load balancer.
> (...)
> Buildbot workers have a TCP keepalive option of 1 hour (3600 seconds) by default (...)

Ernest confirmed that the PSF infra in DigitalOcean sits behind edge load balancers. Yesterday around 17:30 UTC, he updated the load balancers to allow a full 24-hour timeout on buildbot TCP connections.

This does not seem to have fixed the issue: the same worker is still detached roughly every 90 seconds. Example from the server logs:

(...)
2020-08-28 08:21:55+0000 [Broker,50831,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:23:25+0000 [Broker,50856,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:24:55+0000 [Broker,50881,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:26:26+0000 [Broker,50906,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:27:56+0000 [Broker,50931,10.132.169.156] Worker.detached(koobs-freebsd-564d)
(...)
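
To make the interval between detaches explicit, here is a minimal sketch that parses the five log lines quoted above and prints the gap in seconds between consecutive Worker.detached events. The regular expression and the helper name detach_intervals are illustrative assumptions based only on the line format shown here; they are not part of buildbot.

```python
import re
from datetime import datetime

# The five log lines quoted above, copied verbatim.
LOG_LINES = """\
2020-08-28 08:21:55+0000 [Broker,50831,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:23:25+0000 [Broker,50856,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:24:55+0000 [Broker,50881,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:26:26+0000 [Broker,50906,10.132.169.157] Worker.detached(koobs-freebsd-564d)
2020-08-28 08:27:56+0000 [Broker,50931,10.132.169.156] Worker.detached(koobs-freebsd-564d)
""".splitlines()

# Assumed line layout: "<date> <time+offset> [Broker,<id>,<ip>] Worker.detached(<name>)"
DETACHED_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \[Broker,\d+,[\d.]+\] Worker\.detached\((?P<worker>[^)]+)\)$"
)


def detach_intervals(lines, worker):
    """Return the seconds elapsed between consecutive detach events of `worker`."""
    times = []
    for line in lines:
        match = DETACHED_RE.match(line.strip())
        if match and match.group("worker") == worker:
            times.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S%z"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


intervals = detach_intervals(LOG_LINES, "koobs-freebsd-564d")
print(intervals)  # [90, 90, 91, 90]
```

On this excerpt the gaps are all 90 or 91 seconds, far below both the 1 hour worker keepalive and the 24 hour load balancer timeout mentioned above.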

----------
title: RHEL and fedora buildbots fail due to disk space error -> Buildbot: workers detached every minute and "no space left on device" issue

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41642>
_______________________________________

