[Tutor] parsing x is y statements from stdin
Magnus Lyckå
magnus@thinkware.se
Fri Jul 18 16:44:02 2003
At 13:19 2003-07-18 +0000, Scott Fallin wrote:
>I want to do the same thing in Python, well, I want to achieve the same
>goal: parse stdin on an irc channel, do a bit of regex to pull out "she
>is ..."/"they are ..." statements.
This works for me:
import sys

verbs = "is am are have has feel feels".split()
d = {}

for line in sys.stdin:
    line = line.lower()
    for verb in verbs:
        space_verb = " %s " % verb
        if space_verb in line:
            who, what = line.split(space_verb, 1)
            d.setdefault(verb, {}).setdefault(
                who.strip(), []).append(what.strip())

for verb in d:
    for who in d[verb]:
        print who, verb
        for what in d[verb][who]:
            print '\t%s' % what
        print
I'll go through it in some detail:
import sys
# Needed to access sys.stdin
verbs = "is am are have has feel feels".split()
# A bit more convenient to type than
# verbs = ["is", "am", ...]
d = {}
# I imagine this dictionary will eventually contain
# something like:
# d = {'is': {'dog': ['lazy', 'stupid'], 'cat': ['sleepy']},
# 'has': {'bill': ['money'], 'linus': ['respect']} }
# Note the lists! They are needed to allow several sentences
# with the same subject and verb.
for line in sys.stdin:
    # Iterating directly over a file object yields one line at a time,
    # much like calling readline() repeatedly. This works in recent
    # Python versions (2.2 and later).
    line = line.lower()
    # Make it all lower case. If you don't want this, the splitting will
    # probably be a little more complicated.
    for verb in verbs:
        # Iterate over the list of verbs.
        space_verb = " %s " % verb
        # I assume the verbs are surrounded by spaces, and I don't want
        # to find the "is" in "disk" when I look for verbs.
        if space_verb in line:
            # OK, we found the current verb in this sentence.
            who, what = line.split(space_verb, 1)
            # Split the line on the verb, but don't split into more
            # than two parts (a maxsplit of 1 means at most one split).
            # E.g. "he is what he is today" should return
            # ['he', 'what he is today'], not ['he', 'what he', 'today'].
            d.setdefault(verb, {}).setdefault(
                who.strip(), []).append(what.strip())
            # This is obviously the tricky part...
            # First of all, it could be rewritten like:
            #     who = who.strip()    # remove leading/trailing whitespace
            #     what = what.strip()  # remove leading/trailing whitespace
            #     d_verb = d.setdefault(verb, {})
            #     d_verb_who = d_verb.setdefault(who, [])
            #     d_verb_who.append(what)
            # but if you don't understand .setdefault(), that won't make
            # you a lot wiser...
            # d.setdefault(x, y) means: return d[x] if it exists, otherwise
            # let d[x] = y and then return d[x]. This method was created
            # because code like the following was so common in Python code:
            #     if not d.has_key(x):
            #         d[x] = []
            #     d[x].append(y)
            # I think you can figure out the rest...
for verb in d:
    for who in d[verb]:
        print who, verb
        for what in d[verb][who]:
            print '\t%s' % what
        print
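
If you want to try the splitting and the .setdefault() chain without typing into stdin, here is a small sketch of the same logic. It is written for Python 3 (so print is a function), and the helper name add_statement and the sample sentences are made up for illustration; they are not part of the program above.

```python
# Sketch for Python 3. add_statement and the sample lines are hypothetical,
# chosen only to illustrate the splitting and setdefault() steps.
VERBS = "is am are have has feel feels".split()

def add_statement(d, line):
    """Record every 'who <verb> what' statement found in line into d."""
    line = line.lower()
    for verb in VERBS:
        space_verb = " %s " % verb
        if space_verb in line:
            # A maxsplit of 1 splits on the first occurrence only,
            # so we always get exactly two parts back.
            who, what = line.split(space_verb, 1)
            # setdefault() returns the existing value for the key, or
            # stores the default and returns it, so each level of the
            # nested structure is created on first use.
            d.setdefault(verb, {}).setdefault(
                who.strip(), []).append(what.strip())
    return d

samples = [
    "The dog is lazy\n",
    "the dog is stupid\n",
    "Bill has money\n",
    "My disk crashed\n",   # no verb here: the "is" in "disk" has no spaces
]

d = {}
for line in samples:
    add_statement(d, line)

print(d)
# {'is': {'the dog': ['lazy', 'stupid']}, 'has': {'bill': ['money']}}

# The effect of maxsplit on a sentence that contains the verb twice:
print("he is what he is today".split(" is ", 1))  # ['he', 'what he is today']
print("he is what he is today".split(" is "))     # ['he', 'what he', 'today']
```

The two "dog" sentences end up in the same list because the second call finds the existing entries and just appends to them.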
--
Magnus Lycka (It's really Lyckå), magnus@thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The Agile Programming Language