[Pydotorg-redesign] how to search the site
Simon Willison
cs1spw at bath.ac.uk
Mon Sep 15 10:18:26 EDT 2003
Skip Montanaro wrote:
> Why? Look at
>
> http://www.python.org/wwwstats/agent_200308.html
>
> and see if you still think Netscape 4 really matters.
It took a while (those stats are pretty hard to figure out), but
eventually I figured out that the various versions of Netscape 4 account
for 1.17% of hits to the Python site. In case anyone wants to check for
themselves, here's the code I used (after first saving the stats from
that page into a text file):
import re
# Regexp to extract number at start of line
num = re.compile(r'^(\d+)')
# Load in the lines from the file
lines = open('python-browser-stats.txt').readlines()
# Filter out any lines that don't start with a number
lines = [line for line in lines if num.match(line)]
# Find all lines referring to a Netscape 4 version
netscape = [line for line in lines if
            'Mozilla/4' in line and
            'compatible' not in line and
            'Gecko' not in line]
# Build a list of hit counts, one for each NS4 user agent string
nscounts = [int(num.match(line).groups()[0]) for line in netscape]
# Do the same for ALL user agent strings
allcounts = [int(num.match(line).groups()[0]) for line in lines]
# Now sum the above lists
nstotal = sum(nscounts)
alltotal = sum(allcounts)
# And calculate the percentage
print(float(nstotal) / alltotal * 100)
The Netscape 4 list comprehension is based on the idea that Netscape 4's
user agent string always contains 'Mozilla/4', but so do the strings of
a number of other browsers. Filtering out 'compatible' removes the
Microsoft browsers (Internet Explorer reports itself as, e.g.,
'Mozilla/4.0 (compatible; MSIE 6.0; ...)'), and filtering out 'Gecko'
removes any Gecko-based variants.
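For anyone re-running this, here is a minimal single-pass sketch of the same heuristic. It assumes each data line of the saved stats file starts with a hit count followed by the user agent string; the function names and the sample lines below are made up for illustration and are not real python.org figures.

```python
import re

# Leading hit count at the start of each stats line (file format assumed).
COUNT_RE = re.compile(r'^(\d+)')

def is_netscape4(agent):
    """Same heuristic as above: 'Mozilla/4' but not 'compatible' or 'Gecko'."""
    return ('Mozilla/4' in agent
            and 'compatible' not in agent
            and 'Gecko' not in agent)

def netscape4_share(lines):
    """Return the percentage of hits whose user agent looks like Netscape 4."""
    ns_total = 0
    all_total = 0
    for line in lines:
        m = COUNT_RE.match(line)
        if not m:
            continue  # skip headings and other lines without a leading count
        count = int(m.group(1))
        all_total += count
        if is_netscape4(line):
            ns_total += count
    # Guard against an empty or unparseable file.
    return 100.0 * ns_total / all_total if all_total else 0.0

# Hypothetical sample lines, for illustration only.
sample = [
    "User agent report",
    "90 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "6 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624",
    "4 Mozilla/4.79 [en] (Windows NT 5.0; U)",
]
print(netscape4_share(sample))  # 4.0: only the Mozilla/4.79 line counts
```

Summing as it goes avoids building the intermediate lists, and the same `is_netscape4` test can be tried on individual strings to see which browsers the filter keeps or drops.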
Cheers,
Simon