Fwd: socket: Too many open files

Cameron Simpson cs at cskk.id.au
Sun Oct 14 18:57:07 EDT 2018


On 14Oct2018 20:24, Shakti Kumar <shakti.shrivastava13 at gmail.com> wrote:
>>On 13Oct2018 14:10, Shakti Kumar <shakti.shrivastava13 at gmail.com> wrote:
>>>I’m running a script which basically does a traceroute to the list of
>>>hosts provided, and then pulls up some info by logging in to gateways
>>>in the path. I am running this script for a list of almost 40k hosts
>>>in our data centers. Also, I am using commands module to get the
>>>traceroute output.
>>>
>>>out = commands.getstatusoutput('traceroute ' + ip)
>>>
>>>However I observe that this particular line is failing with socket error
>>>after I reach some 5k to 6k hosts.
[...]
>>In fact, given that it is file descriptors, maybe sockets are not what
>>is leaking?
>
>>From another terminal, see what your Python programme has open when this
>>happens with "lsof -n -p pid-of-python-programme". Maybe the leaks are
>>pipes, or connections from your "logging in to gateways in the path"
>>code. It may be as simple as you not closing files or connections.
>
>Thanks Cameron, I still can't get over the fact that you were able to
>pinpoint the issue even without looking at my code XD

Well it seemed likely that traceroute itself was not the cause because 
as you expected, when the command terminates its resources should clear 
up. That is one of the nice things about external executables - the OS 
tidies up their leaks, if any. (TCP sockets tend to hang around for a 
while to catch stray resent packets, but traceroute doesn't use TCP.)
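
As an illustration of running the command as a separate process, here is 
a minimal sketch, assuming Python 3 (where the commands module no longer 
exists). The name run_command is my own invention, not anything from 
your script. Whatever the child opens belongs to the child and is 
released by the OS when it exits; the pipe on the Python side is closed 
by subprocess.run before it returns.

```python
import subprocess


def run_command(argv, timeout=60):
    """Run an external command; return (exit status, combined output).

    Hypothetical helper. argv is a list, so no shell is involved and the
    host string is passed to the command as a single argument.
    """
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        text=True,                 # decode bytes to str (Python 3.7+)
        timeout=timeout,           # raises TimeoutExpired if exceeded
    )
    return proc.returncode, proc.stdout
```

Usage would be along the lines of run_command(['traceroute', ip]). 
Unlike commands.getstatusoutput, this does not strip the trailing 
newline from the output.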

>Indeed when I started looking for sockets, I realised I was not closing
>connections to ACI hosts in our datacentres, and the error was starting to
>pop up at around 2900 to 3k TCP connections open to the hosts. XD
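
For the archive, a sketch of the usual shape of the fix. I have not seen 
your code, so fetch_banner below is purely a hypothetical stand-in for 
whatever your gateway login does, and it assumes Python 3, where socket 
objects work as context managers. The point is only that the connection 
is closed on every path out of the function, including exceptions.

```python
import socket


def fetch_banner(host, port, timeout=5.0):
    """Connect, read up to 1024 bytes, and always close the connection.

    Hypothetical example: the real login and query logic would go where
    the recv() call is.
    """
    # Leaving the with block closes the socket, even if recv() raises.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024)
```

If the connection comes from a library object rather than a raw socket, 
the same idea applies: call its close method in a try/finally, or wrap 
it with contextlib.closing.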

I think the takeaway from my post, if any, is the usefulness of some 
external commands for inspecting behaviour if a problem seems 
intractable by inspection (eg "in Python" debugging, including prints 
etc).

Tools like strace, netstat and lsof are really good for seeing what the 
program is doing/leaking/misusing at an OS level. Even for opaque things 
like third party Java programmes, etc.

In your particular case, you're out of sockets (or possibly file 
descriptors). So use netstat to inspect the open sockets (and on Linux, 
"the open sockets for a particular PID" is an answerable question).  
Also, lsof will show your programme's open files, including the sockets.
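
The same question can also be asked from inside the programme. A rough 
sketch, assuming Linux (it reads /proc/self/fd) and Python 3; the 
function names are my own. Logging count_sockets() every few hundred 
hosts would show a leak as a steadily climbing number, well before the 
limit is reached.

```python
import os
import resource


def open_descriptors():
    """Map each open fd of this process to its target (Linux only)."""
    fds = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            fds[int(name)] = os.readlink("/proc/self/fd/" + name)
        except OSError:
            # The fd used to list the directory has already been closed.
            continue
    return fds


def count_sockets():
    """Count descriptors whose target looks like 'socket:[inode]'."""
    targets = open_descriptors().values()
    return sum(1 for target in targets if target.startswith("socket:"))


def descriptor_limit():
    """Soft per-process limit on open files (see also 'ulimit -n')."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

This is no substitute for lsof, which shows the remote addresses as 
well, but it is cheap enough to leave in a long running script.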

Cheers,
Cameron Simpson <cs at cskk.id.au>


