optimizations

Roy Smith roy at panix.com
Mon Apr 22 21:53:11 EDT 2013


In article <mailman.944.1366680414.3114.python-list at python.org>,
 Rodrick Brown <rodrick.brown at gmail.com> wrote:

> I would like some feedback on possible solutions to make this script run
> faster.

If I had to guess, I would say it's this stuff:

>                     line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
>                     line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
>                     line = line.replace('cdn.xxx', 'www.xxx')
>                     line = line.replace('cdn.xxx', 'www.xxx')
>                     line = line.replace('cdn.xx', 'www.xx')
>                     siteurl = line.split()[6].split('/')[2]
>                     line = re.sub(r'\bhttps?://%s\b' % siteurl, "", line, 1)

You make 6 copies of every line (five replace() calls plus a re.sub(),
each returning a new string).  That's slow.  But I'm also going to 
quote something I wrote here a couple of months back:

> I've been doing some log analysis.  It's been taking a grovelingly long 
> time, so I decided to fire up the profiler and see what's taking so 
> long.  I had a pretty good idea of where the ONLY TWO POSSIBLE hotspots 
> might be (looking up IP addresses in the geolocation database, or 
> producing some pretty pictures using matplotlib).  It was just a matter 
> of figuring out which it was. 
> 
> As with most attempts to out-guess the profiler, I was totally, 
> absolutely, and embarrassingly wrong. 

So, my real advice to you is to fire up the profiler and see what it 
says.
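To act on that advice, here is a minimal sketch using the standard library's cProfile and pstats modules. The names rewrite_line(), process() and profile_report(), and the sample log line, are hypothetical; rewrite_line() just reproduces the quoted per-line code (including the duplicated replace) so it can be timed in isolation, and it assumes, as the quoted script does, that whitespace-separated field 6 is the request URL. For the real script, something like `python -m cProfile -s cumulative yourscript.py` (script name assumed) produces the same kind of report without touching the code.

```python
import cProfile
import io
import pstats
import re


def rewrite_line(line):
    # Hypothetical stand-alone copy of the quoted per-line work, so it can
    # be profiled on its own.  Assumes field 6 of the line is the URL.
    line = line.replace('mediacdn.xxx.com', 'media.xxx.com')
    line = line.replace('staticcdn.xxx.co.uk', 'static.xxx.co.uk')
    line = line.replace('cdn.xxx', 'www.xxx')
    line = line.replace('cdn.xxx', 'www.xxx')
    line = line.replace('cdn.xx', 'www.xx')
    siteurl = line.split()[6].split('/')[2]
    return re.sub(r'\bhttps?://%s\b' % siteurl, "", line, count=1)


def process(lines):
    # Stand-in for the script's main loop over the log file.
    return [rewrite_line(line) for line in lines]


def profile_report(func, *args, limit=15):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the expensive call chains float to the top.
    stats.sort_stats('cumulative').print_stats(limit)
    return result, buf.getvalue()


if __name__ == '__main__':
    sample = ('1.2.3.4 - - [22/Apr/2013:21:53:11 -0400] '
              '"GET http://mediacdn.xxx.com/a.jpg HTTP/1.1" 200 512')
    result, report = profile_report(process, [sample] * 10000)
    print(report)
```

Reading the report top-down shows where the time actually goes, which is the whole point: it may or may not be the replace() calls.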


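If the profile does point at the chain of replace() calls quoted above, one thing to try is a single precompiled regex that does all the CDN rewrites in one pass. This is a sketch under assumptions: CDN_MAP and CDN_RE are hypothetical names; the duplicated replace('cdn.xxx', 'www.xxx') is dropped because the second call can never find anything left to replace; and re.escape() is added so the dots in siteurl match literally (in the quoted pattern they match any character). For these particular strings it should give the same result as the chained calls, but check it against real log lines. Whether it is actually faster is far from certain, since str.replace() runs in C and the per-match lambda has a cost, so measure it before keeping it.

```python
import re

# Hypothetical mapping taken from the quoted replace() calls.  Dicts keep
# insertion order, so at each position the alternation below tries the
# patterns in this order: 'cdn.xxx' is tried before 'cdn.xx'.
CDN_MAP = {
    'mediacdn.xxx.com': 'media.xxx.com',
    'staticcdn.xxx.co.uk': 'static.xxx.co.uk',
    'cdn.xxx': 'www.xxx',
    'cdn.xx': 'www.xx',
}

# Compiled once, outside the per-line loop.
CDN_RE = re.compile('|'.join(re.escape(key) for key in CDN_MAP))


def rewrite_line(line):
    """Apply the CDN rewrites in one pass, then strip the first site URL."""
    line = CDN_RE.sub(lambda m: CDN_MAP[m.group(0)], line)
    # Assumes field 6 is the request URL, as in the quoted script.
    siteurl = line.split()[6].split('/')[2]
    # re.escape() makes the dots in the host name match literally.
    pattern = r'\bhttps?://%s\b' % re.escape(siteurl)
    return re.sub(pattern, '', line, count=1)
```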
