CPython thread starvation
Roy Smith
roy at panix.com
Sat Apr 28 09:04:42 EDT 2012
In article <7xy5pgqwto.fsf at ruckus.brouhaha.com>,
Paul Rubin <no.email at nospam.invalid> wrote:
> John Nagle <nagle at animats.com> writes:
> > I may do that to prevent the stall. But the real problem was all
> > those DNS requests. Parallelizing them wouldn't help much when it took
> > hours to grind through them all.
>
> True dat. But building a DNS cache into the application seems like a
> kludge. Unless the number of requests is insane, running a caching
> nameserver on the local box seems cleaner.
I agree that application-level name caching is "wrong", but sometimes
doing it the wrong way just makes sense. I could whip up a simple
caching wrapper around getaddrinfo() in 5 minutes. Depending on the
environment (both technology and bureaucracy), getting a caching
nameserver installed might take anywhere from 5 minutes, to a few days,
to kicking a dead whale down the beach, to it just ain't happening (the
last two if you need to involve your corporate IT department).
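As a sketch of what that 5-minute wrapper might look like (names here are
my own, not from any library), a dict keyed on the getaddrinfo() arguments
is about all it takes -- with no expiration at all, which is exactly the
problem discussed below:

```python
# A minimal, illustration-only caching wrapper around socket.getaddrinfo().
# Results are memoized for the life of the process -- deliberately naive.
import socket

_cache = {}

def cached_getaddrinfo(host, port, *args, **kwargs):
    """Return getaddrinfo() results, caching them forever (don't do this)."""
    key = (host, port, args, tuple(sorted(kwargs.items())))
    if key not in _cache:
        _cache[key] = socket.getaddrinfo(host, port, *args, **kwargs)
    return _cache[key]
```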
Doing DNS caching correctly is non-trivial. In fact, if you're
building it on top of getaddrinfo(), it may be impossible, since I don't
think getaddrinfo() exposes all the data you need (i.e., the TTL
values). But doing a half-assed job of cache expiration is better than
not expiring your cache at all. I would suggest (from experience) that
if you build a getaddrinfo() wrapper, you have cache entries time out
after a fairly short time. From the problem description, it sounds like
a 1-minute timeout would get 99% of the benefit and might keep you from
seeing some bizarre behavior.
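A self-contained sketch of that half-assed-but-adequate approach: since
getaddrinfo() doesn't expose real TTLs, the fixed 60-second timeout below
is a stand-in, not anything DNS gives you. (The function and variable
names are mine, for illustration.)

```python
# Caching getaddrinfo() wrapper with a short, fixed expiration.
# The 60-second TTL is a guess standing in for the real DNS TTLs,
# which getaddrinfo() does not expose.
import socket
import time

_cache = {}   # key -> (expiry_time, result)
TTL = 60.0    # seconds; half-assed, but far better than never expiring

def cached_getaddrinfo(host, port, *args, **kwargs):
    """Return getaddrinfo() results, re-resolving after TTL seconds."""
    key = (host, port, args, tuple(sorted(kwargs.items())))
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and hit[0] > now:
        return hit[1]
    result = socket.getaddrinfo(host, port, *args, **kwargs)
    _cache[key] = (now + TTL, result)
    return result
```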
PS -- I've also learned by experience that nscd can mess up. If DNS
starts doing stuff that doesn't make sense, my first line of attack is
usually killing and restarting the local nscd. Often enough, that
solves the problem, and it rarely causes any problems that anybody would
notice.