[Tutor] parsing x is y statements from stdin

Magnus Lyckå magnus@thinkware.se
Fri Jul 18 16:44:02 2003


At 13:19 2003-07-18 +0000, Scott Fallin wrote:
>I want to do the same thing in Python, well, I want to achieve the same
>goal: parse stdin on an irc channel, do a bit of regex to pull out "she
>is ..."/"they are ..." statements.

This works for me:

import sys

verbs = "is am are have has feel feels".split()

d = {}

for line in sys.stdin:
     line = line.lower()
     for verb in verbs:
         space_verb = " %s " % verb
         if space_verb in line:
             who, what = line.split(space_verb, 1)
             d.setdefault(verb, {}).setdefault(
                 who.strip(), []).append(what.strip())

for verb in d:
     for who in d[verb]:
         print who, verb
         for what in d[verb][who]:
             print '\t%s' % what
         print
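
To try the loop without an IRC feed, the same logic can be wrapped in a
function that accepts any iterable of lines. This is only a minimal sketch:
the name collect_statements and the sample sentences are made up for
illustration, and the split is limited to one split so that each line
always yields exactly two parts.

```python
def collect_statements(lines,
                       verbs=("is", "am", "are", "have", "has", "feel", "feels")):
    """Return {verb: {who: [what, ...]}} for lines shaped like "who verb what"."""
    d = {}
    for line in lines:
        line = line.lower()
        for verb in verbs:
            space_verb = " %s " % verb
            if space_verb in line:
                # Split on the first occurrence only: always two parts.
                who, what = line.split(space_verb, 1)
                d.setdefault(verb, {}).setdefault(
                    who.strip(), []).append(what.strip())
    return d


# Hypothetical sample input standing in for sys.stdin.
sample = [
    "The dog is lazy\n",
    "the dog is stupid\n",
    "Bill has money\n",
    "He is what he is today\n",
]
result = collect_statements(sample)
```

Passing sys.stdin instead of the sample list gives the behaviour of the
script above, since a file object is also an iterable of lines.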


I'll go through it in some detail:

import sys
# Needed to access sys.stdin

verbs = "is am are have has feel feels".split()
# A bit more convenient to type than
# verbs = ["is", "am", ...]

d = {}
# I imagine this dictionary will eventually contain
# something like:
# d = {'is': {'dog': ['lazy', 'stupid'], 'cat': ['sleepy']},
#      'has': {'bill': ['money'], 'linus': ['respect']} }
# Note the lists! They are needed to allow several sentences
# with the same subject and verb.

for line in sys.stdin:
# Iterating directly over a file object gives you one line at a time,
# much like calling readline() repeatedly. This works in recent
# Python versions (2.2 and later).

     line = line.lower()
     # Make it all lower case. If you don't want this, the splitting will
     # probably be a little more complicated.

     for verb in verbs:
     # Iterate over the list of verbs

         space_verb = " %s " % verb
         # I assume the verbs are surrounded by space, and I don't want
         # to find the "is" in "disk" when I look for verbs.

         if space_verb in line:
             # OK, we found the current verb in this sentence

             who, what = line.split(space_verb, 1)
             # Split the line on the first occurrence of the verb only,
             # so we never get more than two parts. E.g. "he is what he
             # is today" should give ['he', 'what he is today'], not
             # ['he', 'what he', 'today']

             d.setdefault(verb, {}).setdefault(
                 who.strip(), []).append(what.strip())
             # This is obviously the tricky part...
             # First of all, it could be rewritten like:
             #   who = who.strip() # remove leading/trailing whitespace
             #   what = what.strip() # remove leading/trailing whitespace
             #   d_verb = d.setdefault(verb, {})
             #   d_verb_who = d_verb.setdefault(who, [])
             #   d_verb_who.append(what)
             # but if you don't understand .setdefault(), that won't make
             # you a lot wiser...
             # d.setdefault(x,y) means: Return d[x] if it exists, otherwise
             # let d[x] = y and then return d[x]. This method was created
             # since code like the following was so common in Python code:
             #   if not d.has_key(x):
             #       d[x] = []
             #   d[x].append(y)


# I think you can figure out the rest...
for verb in d:
     for who in d[verb]:
         print who, verb
         for what in d[verb][who]:
             print '\t%s' % what
         print
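
The two calls that do the real work in the loop, split() with a limit and
setdefault(), can be tried in isolation. The snippet below is just a sketch
with made-up sample data (the sentence, the key "dog" and the variable names
are not from the script); it shows how the split limit changes the number
of parts, and that setdefault() builds the same dictionary as the explicit
membership test.

```python
# split() with a limit of 1 performs at most one split: two parts.
line = "he is what he is today"
parts_one = line.split(" is ", 1)
# A limit of 2 allows two splits, so this line comes back in three parts
# and could not be unpacked into "who, what".
parts_two = line.split(" is ", 2)

# setdefault() returns the existing list, or stores and returns a new one.
short = {}
short.setdefault("dog", []).append("lazy")
short.setdefault("dog", []).append("stupid")

# The longer form that setdefault() replaces.
explicit = {}
for what in ("lazy", "stupid"):
    if "dog" not in explicit:
        explicit["dog"] = []
    explicit["dog"].append(what)
```

Both dictionaries end up holding a single key whose value is the list of
everything appended for it, which is the shape the nested dictionary d
relies on.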


--
Magnus Lycka (It's really Lyckå), magnus@thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The Agile Programming Language