Markov Analysis Help

Thu May 22 12:45:58 EDT 2008

dave wrote:
> Hi Guys,
> 
> I've written a Markov analysis program and would like to get your 
> comments on the code.  As it stands now the final output comes out as a 
> tuple, then list, then tuple.  Something like ('the', 'water') ['us'] 
> ('we', 'took')..etc...
> 
> I'm still learning so I don't know any advanced techniques or methods 
> that may have made this easier.
> 
> 
> here's the code:
> 
> def makelist(f):     #turn a document into a list
>     fin = open(f)
>     results = []
>     for line in fin:
>         line = line.replace('"', '')
>         line = line.strip().split()
>         for word in line:
>             results.append(word)
>     return results
> 
> 

What does your data look like?  Just straight text?

> 
> def markov(f, preflen=2):    #f is the file to analyze, preflen is prefix length
>     convert_file = makelist(f)
>     mapdict = {}        #dict where the prefixes will map to suffixes
>     start = 0
>     end = preflen         #start/end set the slice size
>     for words in convert_file:
>         prefix = tuple(convert_file[start:end])     #tuple as mapdict key
>         suffix = convert_file[end : end + 1]        #word as suffix to key
>         mapdict[prefix] = mapdict.get(prefix, []) + suffix #append suffixes
>         start += 1
>         end += 1
>     return mapdict
> 
> 

What is convert_file??

> 
> def randsent(f, amt=10):     #prints a random sentence
>     analyze = markov(f)
>     for i in range(amt):
>         rkey = random.choice(analyze.keys())
>         print rkey, analyze[rkey],
> 
> 
> The book gave a hint  saying to make the prefixes in the dict using:
> 
> def shift(prefix, word):
>     return prefix[1:] + (word, )

That's not a very helpful hint.

It works if you call it with a tuple and a word --- it shifts off the 
front of the tuple ... so :

shift(('foo','bar'), "word")
returns   ('bar', 'word')

Whoopty doo --- I'm not sure what that accomplishes!!

Unless the author means "pass a list and randomly pick a word from the 
list", in which case the return statement could be

return (random.choice(prefix), word)

* shrug *

But -- that's not very Markov ... you'd want a weighted choice of words 
... depending on how you define your Markov chain -- say a Markov chain 
based on part-of-speech or probability of occurrence from a given word-set.
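
A quick sketch of what "weighted" could mean here, assuming the suffixes for a prefix are kept in a plain list with repeats (the way mapdict does above). The names suffixes and suffix_probabilities are made up for illustration. A uniform random.choice over such a list is already weighted by how often each word followed the prefix:

```python
import random
from collections import Counter

# Hypothetical suffix list for one prefix; repeats are kept on purpose.
suffixes = ['us', 'us', 'us', 'them']

def suffix_probabilities(words):
    # Relative frequency of each suffix in the list.
    counts = Counter(words)
    total = float(len(words))
    return dict((word, count / total) for word, count in counts.items())

# 'us' comes out three times as likely as 'them'.
probs = suffix_probabilities(suffixes)

# Picking uniformly from the list with repeats gives the same weighting.
pick = random.choice(suffixes)
```

If you de-duplicated the suffix lists you would lose that weighting and every suffix would become equally likely.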

Can you give some more detail??
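
In the meantime, here is one guess at how shift might fit in, assuming the prefix is meant to be a sliding window over the word list. build_map and generate are names I made up, not anything from the book, so treat this as a sketch rather than the intended answer:

```python
import random

def shift(prefix, word):
    # Drop the oldest word and append the new one.
    return prefix[1:] + (word,)

def build_map(words, preflen=2):
    # Map each prefix tuple to the list of words seen right after it.
    mapping = {}
    prefix = tuple(words[:preflen])
    for word in words[preflen:]:
        mapping.setdefault(prefix, []).append(word)
        prefix = shift(prefix, word)   # slide the window one word forward
    return mapping

def generate(mapping, length=10):
    # Start from a random prefix, then keep following suffixes.
    prefix = random.choice(sorted(mapping))
    out = list(prefix)
    for _ in range(length):
        suffixes = mapping.get(prefix)
        if not suffixes:               # dead end: prefix was never followed
            break
        word = random.choice(suffixes)
        out.append(word)
        prefix = shift(prefix, word)
    return out

words = "a b a b a c".split()
chain = build_map(words)
sentence = generate(chain, 5)
```

Used this way, shift replaces the start/end bookkeeping: the same call advances the prefix while building the dict and while generating text.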

> 
> However I can't seem to wrap my head around incorporating that into the 
> code above, if you know a method or could point me in the right 
> direction (or think that I don't need to use it) please let me know.
> 
> Thanks for all your help,
> 
> Dave
>