a python puzzle

Lulu of the Lotus-Eaters mertz at gnosis.cx
Thu Sep 26 00:20:33 EDT 2002


|the fact that it was mostly Python code would have
|likely skewed the letter frequencies, since Python keywords, modules,
|and builtin names appear more frequently in Python code than general
|user-chosen identifiers; the letter frequency of the code would be
|biased toward the letter frequency of the common "words" in Python,
|which is likely to be somewhat different from English as a whole.

I wonder about that.  Python reserved words (and pseudo-reserved names)
are all rather ordinary English words.  I have a hunch that their letter
distribution falls pretty close to that of English prose.

Of course, you'd have to decide how to weight things.  If you merely
built a histogram from a list of keywords, you might get a somewhat
different pattern than if you checked actual scripts (with the comments
and variable names removed).  For example, most scripts have just a few
'import's at the top, but a whole bunch of 'if's, 'for's, and 'in's
scattered throughout the body.
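
To make the weighting question concrete, here is a rough sketch of the two
histograms. It is not the experiment itself, just one way it might be set
up, and it assumes a modern Python 3 with the standard tokenize and keyword
modules. The helper names (letter_histogram, keywords_in_source) and the
sample script are made up for illustration, and builtin names are left out
for simplicity.

```python
import io
import keyword
import tokenize
from collections import Counter


def letter_histogram(words):
    """Count letters over an iterable of words, case-folded."""
    counts = Counter()
    for word in words:
        counts.update(ch for ch in word.lower() if ch.isalpha())
    return counts


def keywords_in_source(source):
    """Yield every keyword occurrence in a string of Python source.

    Comments and string literals come back from tokenize as COMMENT and
    STRING tokens, so keeping only NAME tokens drops them; the
    keyword.iskeyword() test then drops user-chosen identifiers.
    """
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            yield tok.string


# Unweighted: each keyword counted once, straight from the keyword list.
flat = letter_histogram(keyword.kwlist)

# Weighted: keywords counted as often as they occur in a (made-up) script.
sample = '''\
import sys  # one import at the top

for arg in sys.argv:
    if arg in ("-h", "--help"):
        print("usage")
'''
weighted = letter_histogram(keywords_in_source(sample))

# Compare the most common letters under each weighting.
print(flat.most_common(5))
print(weighted.most_common(5))
```

In the sample, 'import' shows up once while 'in' shows up twice, so the
weighted counts lean toward the letters of the short control-flow words.
A real comparison would need a much larger body of scripts and an English
letter table to compare against.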

Maybe I'll try an experiment.

Yours, Lulu...

--
---[ to our friends at TLAs (spread the word) ]--------------------------
Echelon North Korea Nazi cracking spy smuggle Columbia fissionable Stego
White Water strategic Clinton Delta Force militia TEMPEST Libya Mossad
---[ Postmodern Enterprises <mertz at gnosis.cx> ]--------------------------




