[Pydotorg-redesign] how to search the site

Simon Willison cs1spw at bath.ac.uk
Mon Sep 15 10:18:26 EDT 2003


Skip Montanaro wrote:
> Why?  Look at
> 
>     http://www.python.org/wwwstats/agent_200308.html
> 
> and see if you still think Netscape 4 really matters.

It took a while (those stats are pretty hard to figure out), but
eventually I worked out that the various versions of Netscape 4 account
for 1.17% of hits to the Python site. In case anyone wants to check for
themselves, here's the code I used (after first saving the stats from
that page into a text file):

import re

# Regexp to extract the hit count at the start of a line
num = re.compile(r'^(\d+)')

# Load in the lines from the file
with open('python-browser-stats.txt') as f:
    lines = f.readlines()
# Filter out any lines that don't start with a number
lines = [line for line in lines if num.match(line)]
# Find all lines referring to a Netscape 4 version
netscape = [line for line in lines if
            'Mozilla/4' in line and
            'compatible' not in line and
            'Gecko' not in line]
# Build list of hit counts for the NS4 user agent strings
nscounts = [int(num.match(line).group(1)) for line in netscape]
# Do the same for ALL user agent strings
allcounts = [int(num.match(line).group(1)) for line in lines]
# Now sum the above lists
nstotal = sum(nscounts)
alltotal = sum(allcounts)
# And calculate the percentage
print(float(nstotal) / alltotal * 100)

The Netscape 4 list comprehension relies on the fact that Netscape 4's
user agent string always contains 'Mozilla/4', but so do the strings of
a number of other browsers. Filtering out strings containing 'compatible'
removes Microsoft browsers (Internet Explorer reports itself as
'Mozilla/4.0 (compatible; MSIE ...)'), and filtering out 'Gecko' removes
any Gecko-based browsers.
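To make the heuristic concrete, here is a minimal Python 3 sketch that
applies the same three substring tests to a single user agent string.
The function name is_netscape4 and the sample strings are illustrative
assumptions, not taken from the python.org stats; they are written in the
usual Netscape 4, Internet Explorer 6 and Mozilla 1.4 formats.

```python
def is_netscape4(user_agent):
    """Apply the three substring tests from the script to one UA string."""
    return ('Mozilla/4' in user_agent
            and 'compatible' not in user_agent
            and 'Gecko' not in user_agent)

# Hypothetical sample user agent strings, mapped to the expected result
samples = {
    'Mozilla/4.79 [en] (Windows NT 5.0; U)': True,                  # Netscape 4.79
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)': False,    # Internet Explorer 6
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) '
    'Gecko/20030624': False,                                        # Mozilla 1.4
}

for ua, expected in samples.items():
    print(is_netscape4(ua), expected, ua)
```

In these samples the Mozilla 1.4 string already fails the 'Mozilla/4'
test because it reports 'Mozilla/5.0', so the 'Gecko' check acts as an
extra safeguard rather than doing the work on its own.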

Cheers,

Simon



