Threads + Re(gex) Baffled!?!?!?

Gordon McMillan gmcm at hypernet.com
Wed Sep 15 09:10:25 EDT 1999


Cliff,

The problem could be re, or a platform problem not exposed 
until you use multiple CPUs, or a combination of these.

A solution might be to dedicate one thread to running the 
regexes, and use Queue to pass information to it from the 
socket threads. Assuming that your regex exhibits decent 
behavior, the bottleneck is probably in the network anyway.
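
A rough sketch of that arrangement, in case it helps. It is written 
against a current Python, where the Queue module is spelled queue; 
the structure is the same under 1.5.2. The names regex_worker, 
socket_worker, run and the canned reply are made up for illustration, 
and str.split stands in for re.split. Only the pattern comes from 
your message.

```python
import queue
import re
import threading

# Pattern from the original message, written as a raw string.
PAT = re.compile(r'\s+(\w+)\s+\(.+?\):\s+(\d+)\s+(\d+)')

_STOP = object()  # sentinel telling the regex thread to exit


def regex_worker(replies, results):
    """Run every regex match in this single thread."""
    while True:
        item = replies.get()  # blocks until a socket thread hands over a reply
        if item is _STOP:
            break
        host, reply = item
        for line in reply.split('\r\n'):
            m = PAT.match(line)
            if m:
                results.append(
                    (host, m.group(1), int(m.group(2)), int(m.group(3))))


def socket_worker(host, replies):
    """Stand-in for a socket thread: it only queues the raw reply."""
    # Hypothetical canned reply; a real thread would read it from its socket.
    reply = '  eth0 (up): 10 20\r\n> '
    replies.put((host, reply))


def run(hosts):
    replies = queue.Queue()
    results = []  # appended to by the regex thread only
    parser = threading.Thread(target=regex_worker, args=(replies, results))
    parser.start()
    workers = [threading.Thread(target=socket_worker, args=(h, replies))
               for h in hosts]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    replies.put(_STOP)  # all replies are queued; let the parser drain and exit
    parser.join()
    return results
```

The socket threads never touch re; they just put (host, reply) pairs 
on the queue, so whatever contention the matching causes stays 
confined to one thread.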

----original message-------------
> I'm totally baffled right now, wrt threads.  I have built a test
> program using sockets to pull down just a few lines of text,
> parse it and junk the result.  
> 
> For practical purposes I have spawned 15 worker threads who
> each will connect to a different host (480 Hosts total).  They
> would then execute code similar to this:
> 
> ---------------------------------------
> Sock.send('some-command')
> Reply = ReadUntil('> ', Sock, Timeout)
> 
> if Reply == '':
>     return(-1)
> else:
>     tmp = re.split('\r\n', Reply)
>     for line in tmp:
>         m = pat.match(line)
>         .etc.
> 
> Upon running the script you see nice and speedy results at first
> with very little cpu usage.  However, the cpu usage grows
> continuously until the program ends.  I have noticed that when the
> bloating occurs a lot of the cpu time is spent in the kernel,
> which would lead me to believe this is some sort of locking issue
> hidden within the 're' module?  When I put a 'continue' in front
> of the pat.match() everything works just fine.  
> 
> With re:
> CPU states: 12.5% idle, 18.8% user, 46.4% kernel
> 26.54% test.py
> 
> Without:
> CPU states: 19.1% idle, 35.8% user, 10.6% kernel,
> 0.76% test.py
> 
> The pattern I'm matching is as follows:
> pat = re.compile('\s+(\w+)\s+\(.+?\):\s+(\d+)\s+(\d+)')
> 
> Are there any known issues with locking and 're'?
> 
> The system is an UE5000 with 8 cpus, Solaris 2.6, Python 1.5.2
> 
> If anyone has any ideas about this I desperately need some help. 
> I hate to have to dump this entire project for performance
> reasons. I've stripped out EVERYTHING in the program just to
> reproduce this to rule out my code.
> 
> Regards,
> Cliff



- Gordon



