Markov Analysis Help
Andrew Lee
fiacre.patrick at gmail.com
Thu May 22 12:45:58 EDT 2008
dave wrote:
> Hi Guys,
>
> I've written a Markov analysis program and would like to get your
> comments on the code. As it stands now the final output comes out as a
> tuple, then list, then tuple. Something like ('the', 'water') ['us']
> ('we', 'took')..etc...
>
> I'm still learning so I don't know any advanced techniques or methods
> that may have made this easier.
>
>
> here's the code:
>
> def makelist(f): #turn a document into a list
>     fin = open(f)
>     results = []
>     for line in fin:
>         line = line.replace('"', '')
>         line = line.strip().split()
>         for word in line:
>             results.append(word)
>     return results
>
>
What does your data look like? Just straight text?
>
> def markov(f, preflen=2): #f is the file to analyze, preflen is prefix length
>     convert_file = makelist(f)
>     mapdict = {} #dict where the prefixes will map to suffixes
>     start = 0
>     end = preflen #start/end set the slice size
>     for words in convert_file:
>         prefix = tuple(convert_file[start:end]) #tuple as mapdict key
>         suffix = convert_file[start + 2 : end + 1] #word as suffix to key
>         mapdict[prefix] = mapdict.get(prefix, []) + suffix #append suffixes
>         start += 1
>         end += 1
>     return mapdict
>
>
What is convert_file??
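Going by the quoted code, convert_file is just the word list that makelist returns. As a rough sketch of the same slicing idea on an in-memory list (map_prefixes and the sample sentence are made up here for illustration), bounding the loop keeps every key at full length. The quoted loop runs once per word, so its last passes add short prefixes with empty suffix lists:

```python
def map_prefixes(words, preflen=2):
    """Map each prefix tuple to the list of words seen right after it."""
    mapdict = {}
    # Stop early enough that every prefix still has a word following it.
    for start in range(len(words) - preflen):
        end = start + preflen
        prefix = tuple(words[start:end])   # tuple so it can be a dict key
        suffix = words[end]                # the single word after the prefix
        mapdict.setdefault(prefix, []).append(suffix)
    return mapdict


# Hypothetical sample text, not from the original post.
words = "we took the water and we took the boat".split()
mapdict = map_prefixes(words)
```

Using words[end] rather than a hard-coded offset means the same function should also work for a preflen other than 2.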
>
> import random
>
> def randsent(f, amt=10): #prints a random sentence
>     analyze = markov(f)
>     for i in range(amt):
>         rkey = random.choice(analyze.keys())
>         print rkey, analyze[rkey],
>
>
> The book gave a hint saying to make the prefixes in the dict using:
>
> def shift(prefix, word):
>     return prefix[1:] + (word, )
That's not a very helpful hint.
It works if you call it with a tuple and a word --- it shifts off the
front of the tuple ... so:
shift(('foo', 'bar'), "word")
returns ('bar', 'word')
Whoopty doo --- I'm not sure what that accomplishes!!
Unless the author means "pass a list and randomly pick a word from the
list", in which case the return statement could be
(random.choice(prefix), word)
* shrug *
But -- that's not very Markov ... you'd want a weighted choice of words
... depending on how you define your Markov chain -- say a Markov chain
based on part-of-speech or probability of occurrence from a given word-set.
Can you give some more detail??
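
For what it's worth, one possible reading of the hint is that shift keeps a sliding prefix while you walk the word list once. A minimal sketch under that assumption (build_chain and generate are names invented here, not from the book or from dave's code):

```python
import random


def shift(prefix, word):
    # The book's hint: drop the oldest word, append the new one.
    return prefix[1:] + (word,)


def build_chain(words, preflen=2):
    """Map each prefix tuple to every word that followed it."""
    chain = {}
    prefix = ()
    for word in words:
        if len(prefix) < preflen:
            prefix += (word,)          # still filling the first prefix
            continue
        chain.setdefault(prefix, []).append(word)
        prefix = shift(prefix, word)   # slide the window forward by one word
    return chain


def generate(chain, n=10):
    """Start at a random prefix and follow up to n random suffixes."""
    prefix = random.choice(list(chain))
    out = list(prefix)
    for _ in range(n):
        suffixes = chain.get(prefix)
        if not suffixes:               # dead end: prefix only seen at the end
            break
        word = random.choice(suffixes)
        out.append(word)
        prefix = shift(prefix, word)
    return " ".join(out)
```

Because repeated suffixes stay in the list, random.choice picks them in proportion to how often they followed the prefix, which is one simple form of weighted choice.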
>
> However I can't seem to wrap my head around incorporating that into the
> code above, if you know a method or could point me in the right
> direction (or think that I don't need to use it) please let me know.
>
> Thanks for all your help,
>
> Dave
>