programming languages (etc) "web popularity" fun
Alex Martelli
aleax at aleax.it
Fri Oct 31 11:16:05 EST 2003
Cameron Laird wrote:
...
> It's easy to imagine sources of noise for these data, including
> such English-language commonplaces as "go forth" you already
Sure! Although juxtaposing "programming" to the search, as my
little script did, is, I believe, going to help a lot, it's no
magic. If a language was called, for example, 'and', we'd NEVER
manage to get reliable statistics about it:-).
Actually there is a lesson here about "product naming for the
21st century". If you want to help people googling for your
product (firm, project, whatever), *use a made-up word* so that
all the google hits on it will be real ones. If you want to make
sure you're basically ungooglable-for, well -- take a leaf from
MS, and name your technologies "COM", ".NET" and so on:-).
> mentioned. A next step might be to try to refine the queries
> to eliminate classes of noise. The one that most catches my at-
> tention is PHP; I've got to think that a lot of those are pages
> that use PHP, rather than discuss it.
No doubt, and google can help a little with THIS kind of artefact,
thanks to the "allintext:" qualifier. (BTW, should anybody with
any interest in web searching not have O'Reilly's book "Google
Hacks" yet, GET IT!-).
So, I've made a 2nd release of my script, more targeted at those
languages which stand a chance for the top spots and more subject
to automatic cleaning. The quoter function has gone, the langs
variable is built in more detail with:
langs = [x.strip() for x in '''
"c" -"c++" -"c#"
basic -visual
"c++"
"visual basic"
"assembly language" OR "machine code" OR "machine language"
forth -"go forth" -"and so forth"
"c#"
pascal -object
[ ...many simple unquoted single-word languages snipped... ]
smalltalk
ruby
'''.splitlines() if x.strip()]
and the search, in the loop, has become:
data = google.doGoogleSearch('allintext: %s programming' % lang)
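For anybody following along without the script at hand, here is a minimal sketch of how the query strings end up being assembled. It makes no network calls, the language list is cut down to four entries, and the names RAW_LANGS and build_query are mine for illustration only (they are not part of the script or of the google module):

```python
# Shortened, illustrative language list; the real one is much longer.
RAW_LANGS = '''
"c" -"c++" -"c#"
basic -visual
forth -"go forth" -"and so forth"
ruby
'''

# Same idiom as in the post: one search term per non-blank line.
langs = [x.strip() for x in RAW_LANGS.splitlines() if x.strip()]


def build_query(lang):
    """Build the query string passed to the search call (hypothetical helper)."""
    return 'allintext: %s programming' % lang


queries = [build_query(lang) for lang in langs]

for q in queries:
    print(q)
```

Each entry carries its own quoting and "-" exclusions, which is why the separate quoter function is no longer needed.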
With these refinements, we get the following top 30 languages:
Language                                                  # of hits
java                                                        3050000
c" -"c++" -"c#                                              2470000
basic -visual                                               1880000
c++                                                         1710000
perl                                                        1510000
php                                                         1060000
javascript                                                   939000
visual basic                                                 758000
python                                                       682000
scheme                                                       642000
c#                                                           460000
forth -"go forth" -"and so forth                             325000
fortran                                                      322000
delphi                                                       305000
tcl                                                          254000
postscript                                                   236000
abc                                                          233000
lisp                                                         201000
ada                                                          177000
ml                                                           174000
vbscript                                                     165000
cobol                                                        157000
assembly language" OR "machine code" OR "machine language    146000
pascal -object                                               142000
foxpro                                                       127000
vba                                                          112000
matlab                                                       103000
smalltalk                                                     90300
ruby                                                          88000
php has indeed lost a couple of notches, and so have forth, assembly
(most particularly), pascal and basic. The "top 10" are still the same,
though. A few hints for would-be further-cleaner-uppers...:
abc programming gets a LOT of help from one certain TV network!-)
[all others on this list, from a simple eyeball test w/interactive
searches on 1st pages only, appear legit]
c is HEAVILY handicapped by those "-" exclusions; if we did
java -"c++" -"c#" (tried interactively), we'd only get 2,370,000,
so c is in fact still quite likely to be king of the heap (the same
query, interactive, with C instead of java, gives over 4,000,000...).
This is also an example of the fact that these numbers don't get
reproduced when I try the same google queries interactively (in opera)
-- there may be different filtering schemes in play.
Being careful is of course particularly warranted when two contenders
appear to be very close, and there are many such pairs here --
python and scheme, forth and fortran, ada and ml, smalltalk and ruby...
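One rough way to flag such near-ties automatically is sketched below. The counts are copied from the table above (a subset only); the 6% tolerance and the helper name close_pairs are arbitrary choices of mine, not anything the script does:

```python
# Hit counts copied from the table in this post (subset for brevity).
hits = {
    "python": 682000,
    "scheme": 642000,
    "c#": 460000,
    "forth": 325000,
    "fortran": 322000,
    "ada": 177000,
    "ml": 174000,
    "smalltalk": 90300,
    "ruby": 88000,
}


def close_pairs(hits, tolerance=0.06):
    """Return neighbours in the ranking whose counts differ by less than
    `tolerance`, measured relative to the larger count of the pair."""
    ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
    pairs = []
    for (a, na), (b, nb) in zip(ranked, ranked[1:]):
        # float() keeps the division exact under old-style integer division too
        if float(na - nb) / na < tolerance:
            pairs.append((a, b))
    return pairs


for a, b in close_pairs(hits):
    print("%s and %s are too close to call" % (a, b))
```

With these numbers and this tolerance, the flagged pairs are exactly the four named above; a different tolerance would of course draw the line elsewhere.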
Let's see what somebody else can dream up, perhaps on a very different
tack than my idea of tacking the word 'programming' on...
Alex