YOU ALL SUCK!

Tris Orendorff triso at remove-me.cogeco.ca
Thu Sep 9 20:05:52 EDT 2004


carl.scharenberg at gmail.com (Carl Scharenberg) wrote in
news:e930c085.0409020529.2db830fc at posting.google.com: 


>> This seems to be of somewhat better quality than the output of the
>> typical random-text generator.  Can anyone suggest something on CPAN
>> useful for such?
> 
> You can do this by analyzing a sample text at a higher level. Instead
> of generating text from the frequency of single letters, you generate
> using the frequencies of 2, 3, or 4-letter sequences. You analyze a
> large text so you have a database of frequencies. When generating each
> new character you look at the frequencies of the letters given that the
> 3 previous letters are 'the'. The possibilities are a space, 'r'
> (their), 'y' (they), and some others. Overall it will generate words
> and even phrases that seem to almost make sense. It is neat stuff.

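This is known as a Markov chain, and it works even better if you generate
using words rather than letters. Generating from letters produces both real
words and non-words, whereas generating from words only ever emits words
that appear in the input. Either way, the output is written in the same
style as the input text.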
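
For illustration, here is a minimal Python sketch of the idea, not any
particular CPAN module's approach. The names build_chain, generate, and
sample are made up for this example. It assumes the input is already split
into tokens: pass a list of words for a word-level chain, or a list of
characters for the letter-level version described in the quoted text.
Storing every observed follower, duplicates included, is one simple way to
keep the frequencies without counting them explicitly.

```python
import random
from collections import defaultdict


def build_chain(tokens, order=2):
    """Map each run of `order` tokens to the list of tokens seen right after it.

    Followers are stored with repeats, so picking one at random later
    respects the frequencies observed in the sample.
    """
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return chain


def generate(chain, length=50, seed=None):
    """Random-walk the chain, starting from a randomly chosen state."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # dead end: state was never followed by anything
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return out


# Hypothetical sample text; a real run would use a much larger corpus.
sample = "the cat sat on the mat and the cat ate the rat on the mat"

# Word-level chain: each state is the single previous word.
word_chain = build_chain(sample.split(), order=1)
print(" ".join(generate(word_chain, length=12, seed=1)))

# Letter-level chain: each state is the three previous characters.
letter_chain = build_chain(list(sample), order=3)
print("".join(generate(letter_chain, length=40, seed=1)))
```

With a sample this small the output mostly replays the input. The effect
described above only shows up with a large text, and a higher order makes
the output track the source more closely.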


-- 
Sincerely,

Tris Orendorff

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d++ s+:- a+ C+ UL++++ P+ L+ E- W+ N++ o- K++ w+ O+ M !V PS+ PE Y+ PGP t+ !5 X- R- tv--- b++ 
DI++ D+ G++ e++ h---- r+++ y+++
------END GEEK CODE BLOCK------




