Python slow for filter scripts
Bengt Richter
bokr at oz.net
Wed Oct 29 22:36:07 EST 2003
On 29 Oct 2003 06:34:22 GMT, William Park <opengeometry at yahoo.ca> wrote:
>Alex Martelli <aleax at aleax.it> wrote:
>> and I'm specifically reading the King James' Bible (an easily
>> available text so you can reproduce my results!) and writing
>
>Can you post URL for the Bible?
>
Try Project Gutenberg, at
http://www.gutenberg.net/
or their new host at
http://www.ibiblio.org/gutenberg/
They have a number of bibles in various languages, and a ton (>10,000 e-texts) of other stuff,
also some audio texts, apparently. BTW I read somewhere that the BBC is going to make all their
archives, video and audio, freely available on the net, except where there is some legal reason
they can't. I guess they're a kind of FEF -- Free Entertainment Foundation (thank you British
telly owners ;-)
Apparently a new King James e-text is at (long URL, or use their search for "bible" (w/o quotes)
and go to entry #16):
http://www.ibiblio.org/gutenberg/cgi-bin/sdb/t9.cgi?entry=30&full=yes&ftpsite=http://www.ibiblio.org/gutenberg/
They also have the Koran, BTW. It's interesting to compare word frequencies, e.g., the 20 most frequent
(unless I goofed) in the texts I downloaded:
"C:\Info\Linguistics\Gutenberg\bible\bible11.txt"
6647: 'LORD'
6649: 'him'
6856: 'is'
6893: 'be'
6971: 'they'
7249: 'for'
7972: 'a'
8388: 'his'
8854: 'I'
8940: 'unto'
9666: 'he'
9760: 'shall'
12353: 'in'
12592: 'that'
12846: 'And'
13429: 'to'
34472: 'of'
38891: 'and'
62135: 'the'
"C:\Info\Linguistics\Gutenberg\koran\koran10.txt"
1739: 'ye'
1752: 'with'
1956: 'And'
1979: 'for'
1991: 'who'
2037: 'be'
2108: 'not'
2186: 'that'
2254: 'shall'
2366: 'them'
2575: 'a'
2644: 'they'
2799: 'is'
2900: 'in'
3320: 'God'
5144: 'to'
6855: 'of'
6896: 'and'
10982: 'the'
Both start with the-and-of-to ;-)
(I hope this does not offend anyone ;-)
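For anyone who wants to try the same comparison, here is a minimal sketch of how counts like
the ones above could be produced. It is not necessarily how the listings in this post were
generated: the tokenization (runs of ASCII letters, case-sensitive, so 'And' and 'and' stay
separate), the latin-1 encoding, and the names top_words and print_top_words are all
assumptions made for illustration.

```python
import re
from collections import Counter

def top_words(text, n=20):
    """Return the n most frequent words as (word, count) pairs, least frequent first."""
    # Assumption: a "word" is a run of ASCII letters; case is preserved.
    counts = Counter(re.findall(r"[A-Za-z]+", text))
    # most_common() yields the highest counts first; reverse so the
    # biggest count comes last, as in the listings above.
    return list(reversed(counts.most_common(n)))

def print_top_words(path, n=20):
    """Print the listing for one text file in the 'count: word' layout."""
    # Assumption: latin-1 decodes any byte sequence, so this will not fail
    # on stray non-ASCII bytes in an e-text.
    with open(path, encoding="latin-1") as f:
        text = f.read()
    print('"%s"' % path)
    for word, count in top_words(text, n):
        print("%d: %r" % (count, word))
```

Calling print_top_words on a downloaded e-text prints one block in the layout shown above. A
different definition of "word" (for example, keeping apostrophes or folding case) will shift
the numbers somewhat.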
Regards,
Bengt Richter