Generic logic/conditional class or library for classification of data

Sun Apr 1 06:55:30 EDT 2007

On Sat, 31 Mar 2007 21:54:46 -0700, Basilisk96 wrote:

> As a very basic example, consider a set of uncategorized objects that
> have text descriptions associated with them. The objects are some type
> of tangible product, e.g., books. So the input object has a
> Description attribute, and the output object (a categorized book)
> would have some attributes like Discipline, Target audience, etc.
> Let's say that one such rule is "if ( 'description' contains
> 'algebra') then ('discipline' = 'math', 'target' = 'student')". Keep
> in mind that all these attribute names and their values are not known at
> design time.

Easy-peasy.

rules = {'algebra': {'discipline': 'math', 'target': 'student'},
    'python': {'section': 'programming', 'os': 'linux, windows'}}

class Input_Book(object):
    def __init__(self, description):
        self.description = description

class Output_Book(object):
    def __repr__(self):
        return "Book - %s" % self.__dict__

def process_book(book):
    out = Output_Book()
    for desc in rules:
        if desc in book.description:
            attributes = rules[desc]
            for attr in attributes:
                setattr(out, attr, attributes[attr])
    return out

book1 = Input_Book('python for cheese-makers')
book2 = Input_Book('teaching algebra in haikus')
book3 = Input_Book('how to teach algebra to python programmers')

>>> process_book(book1)
Book - {'section': 'programming', 'os': 'linux, windows'}
>>> process_book(book2)
Book - {'discipline': 'math', 'target': 'student'}
>>> process_book(book3)
Book - {'discipline': 'math', 'section': 'programming', 
'os': 'linux, windows', 'target': 'student'}

I've made some simplifying assumptions: the input object always has a
description attribute. Also, the behaviour when two or more rules set the
same attribute is left undefined: whichever matching rule happens to be
processed last wins. If you want more complex rules you can follow the
same technique, except you'll need a set of meta-rules to decide which
rules to follow.
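
One way to make such a meta-rule explicit is to keep the rules in an
ordered list and decide up front what happens on a clash. This is only a
rough sketch, written for Python 3: the names ORDERED_RULES,
merge_first_wins and classify are made up for illustration, the second
rule's 'discipline' value is invented to force a clash, and "first
matching rule wins" is just one possible policy.

```python
# Hypothetical rule table: a list, so the order of evaluation is explicit.
ORDERED_RULES = [
    ('algebra', {'discipline': 'math', 'target': 'student'}),
    ('python', {'discipline': 'computing', 'section': 'programming'}),
]

def merge_first_wins(result, attributes):
    """Meta-rule: an attribute set by an earlier rule is never overwritten."""
    for attr, value in attributes.items():
        result.setdefault(attr, value)

def classify(description, rules=ORDERED_RULES, merge=merge_first_wins):
    """Apply every rule whose keyword occurs in the description."""
    result = {}
    for keyword, attributes in rules:
        if keyword in description:
            merge(result, attributes)
    return result

print(classify('how to teach algebra to python programmers'))
```

Swapping in a different merge function (last wins, raise an error on a
clash, collect all values in a list) changes the policy without touching
the rules themselves.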

But having said that, I STRONGLY recommend that you don't follow that
approach of creating variable instance attributes at runtime. The reason
is, it's quite hard for you to know what to do with an Output_Book once
you've got it. You'll probably end up filling your code with horrible
stuff like this:

if hasattr(book, 'target'):
    do_something_with(book.target)
elif hasattr(book, 'discipline'):
    do_something_with(book.discipline)
elif ... # etc.

Replacing the hasattr() checks with try...except blocks isn't any
less icky.
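
For comparison, the try...except version of the same dispatch might look
something like this. It is a sketch for Python 3; Book, handle and
do_something_with are stand-ins invented for the example.

```python
class Book(object):
    """Stand-in for an Output_Book whose attributes vary per instance."""

def do_something_with(value):
    # Placeholder for whatever real processing is needed.
    return "handled %s" % value

def handle(book):
    # Each optional attribute needs its own try/except, nested inside the
    # previous except branch to mirror the if/elif chain.
    try:
        return do_something_with(book.target)
    except AttributeError:
        try:
            return do_something_with(book.discipline)
        except AttributeError:
            return None  # ... and so on for every other possible attribute

book = Book()
book.discipline = 'math'
print(handle(book))
```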

Creating instance attributes at runtime has its place; I just don't think
this is it.

Instead, I suggest you encapsulate the variable parts of the book
attributes into a single attribute:

class Output_Book(object):
    def __init__(self, name, data):
        self.name = name # common attribute(s)
        self.data = data # variable attributes

Then, instead of setting each variable attribute individually with
setattr(), simply collect all of them in a dict and save them in data:

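def process_book(book):
    # NB: the input book now needs a name attribute as well as a
    # description; the Input_Book class above doesn't have one yet.
    data = {}
    for desc in rules:
        if desc in book.description:
            data.update(rules[desc])
    return Output_Book(book.name, data)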

Now you can do this:

outbook = process_book(book)
# handle the common attributes that are always there
print(outbook.name)
# handle the variable attributes
print("Stock = %s" % outbook.data.setdefault('status', 0))
print("discipline = %s" % outbook.data.get('discipline', 'none'))
# handle all the variable attributes
for key, value in outbook.data.items():
    do_something_with(key, value)

Any time you have to deal with variable attributes that may or may not be
there, you have to use more complex code, but you can minimize the
complexity by keeping the variable attributes separate from the common
attributes.
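
Putting the pieces together, here is a self-contained sketch of that
layout for Python 3. It assumes the input book carries a name as well as
a description, which the earlier Input_Book did not, and the sample title
is invented.

```python
rules = {'algebra': {'discipline': 'math', 'target': 'student'},
         'python': {'section': 'programming', 'os': 'linux, windows'}}

class Input_Book(object):
    def __init__(self, name, description):
        self.name = name                # assumed: every input book has a name
        self.description = description

class Output_Book(object):
    def __init__(self, name, data):
        self.name = name  # common attribute(s)
        self.data = data  # variable attributes, collected in one dict

def process_book(book):
    data = {}
    for desc in rules:
        if desc in book.description:
            data.update(rules[desc])
    return Output_Book(book.name, data)

outbook = process_book(Input_Book('Haiku Algebra', 'teaching algebra in haikus'))
print(outbook.name)
# A missing variable attribute falls back to a default instead of raising.
print("discipline = %s" % outbook.data.get('discipline', 'none'))
print("section = %s" % outbook.data.get('section', 'none'))
for key, value in sorted(outbook.data.items()):
    print("%s: %s" % (key, value))
```

The code that consumes an Output_Book never needs hasattr(): anything
that might be missing is looked up in data with a default.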

-- 
Steven.