From Michael@RCP.co.uk  Wed Aug  1 09:22:01 2001
From: Michael@RCP.co.uk (Michael Abbott)
Date: Wed, 1 Aug 2001 09:22:01 +0100
Subject: [Types-sig] Query about Types SIG status
Message-ID: <217F6DFA440ED111ACDA00A0C906B00601AAE6BC@arsenic.rcp.co.uk>

I'd be very grateful if someone could post a summary of the current status
of the Python Types SIG.  

I'm a little concerned:

1.	The only mail I've seen since joining this list has been SPAM!
2.	Some of the links on the SIG home page seem to be broken (of course
the recent outage of python.org doesn't help, but that's not what I'm
referring to).
3.	There doesn't seem to be much in the way of recent and current
proposals, as far as I can see.

There seem to be a variety of documents in varying stages of maturity, but
it's difficult to see what the current state of thinking is.  There's a
document from Guido van Rossum with some early ideas, an unnumbered PEP from
Paul Prescod on an interface declaration language, and PEP-0245 by Michel
Pelletier, plus a number of other papers.  However (it's difficult to tell),
most of these seem to be quite elderly!

Clearly the ideas of interfaces and of static types are distinct but closely
related developments.  Is this an area of active development, or is the
current consensus that it's not worth the effort?


From gward@mems-exchange.org  Mon Aug 20 22:01:38 2001
From: gward@mems-exchange.org (Greg Ward)
Date: Mon, 20 Aug 2001 17:01:38 -0400
Subject: [Types-sig] Pre-announce: Oscar 0.1
Message-ID: <20010820170138.A8954@mems-exchange.org>

--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi all --

several months ago, I cooked up a tool, Oscar, for rigorously
type-checking a Python object graph: you define an object schema
(currently through specially-formatted class docstrings), and Oscar
crawls a persistent object graph to ensure that every scrap of data in
it conforms to your schema.  We use this regularly in the MEMS Exchange
for integrity-checking our ZODB database; it's not the be-all and end-all of
checking that all is well with an object database, but it's a hell of a
lot better than nothing.

In the past few weeks, I finally got around to writing the scripts and
documentation necessary to release Oscar publicly.  Now I'm ready to do
so, pending approval by the CNRI brass (sigh).  There's nothing
available for download just yet, so no chest-thumping post to
python-announce.  But there is documentation describing the Oscar type
language, which I think is a fine way to describe Python data types.  So,
on the assumption that types-sig and zodb-dev readers are more likely
than most to want to rush out and try Oscar as soon as it's available,
I'm posting all that documentation right here.  I welcome feedback as to
whether this is a crazy idea or not, whether the type syntax is bogus or
excellent, whether the type-system is "good enough" or needs to be
all-encompassing, etc.

Attached you'll find:

  type-system.txt
    a description of Oscar's type system and the syntax for
    defining Oscar types
  schema.txt
    a description of what an object schema consists of and
    how you define one
  checking.txt
    how to use Oscar to type-check an existing persistent
    object graph

Enjoy!  Hopefully the real release will happen this week or next.

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org

--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="type-system.txt"

Oscar's type system
-------------------

Oscar's type system is a large, useful subset of Python's type system.
The major advantages of Oscar's type system are that it is explicit and
enforced.  Since Python types are implicit (determined at run-time) and
mostly unenforced, Oscar sits quite neatly on top of Python, bringing
order and structure to a potentially chaotic situation.

Oscar understands the following major classes of data types:

  * atomic types: anything with a distinct Python type object can
    be an atomic type in Oscar, but they're intended for types with
    a single, atomic value.  The built-in types int, string, and float
    are obvious candidates (and in fact these are present as atomic
    types by default in any Oscar schema, along with long and complex).
    You can also use other built-in types (e.g. file, function) as atomic
    types, or any extension type.  For example, if you use the
    mx.DateTime module, you might add DateTime as an atomic type, so
    you can declare variables as being of type DateTime and have Oscar
    enforce that requirement.

    Examples:
      "string" denotes a string variable
      "int" denotes an integer variable
      "DateTime" denotes a DateTime variable; this only works if you
        have explicitly added an atomic type called "DateTime" to your
        schema

  * container types: Python's built-in list, dictionary, and tuple types.
    (Classes that act like lists, dictionaries, and tuples are
    "instance-container" types, and I haven't yet decided what to do
    about the type-class unification in Python 2.2.)

    Oscar enforces fairly stringent rules for container types:
      - lists must be homogeneous, i.e. all elements of the same
        type, and may be of any length
        Examples:
          "[string]" denotes a list of strings
          "[int|long]" denotes a list of either ints or longs
            (a union type; see below)
          "[any]" denotes a list of anything (i.e., no enforcement)
            (see below for "any" types)

      - dictionaries must be separately homogeneous: all keys must
        be of the same type, and all values must be of the same type.
        (Incidentally, Oscar knows nothing about which types are
        hashable and allowed to be dictionary keys; that's enforced by
        Python at run-time.)  The key type and value type are specified
        separately.
        
        Examples:
          "{ string : int }" denotes a dictionary mapping strings to ints
          "{string : int|long}" denotes a dictionary mapping strings
            to either ints or longs
          "{long : [string]}" denotes a dictionary mapping longs to
            lists of strings

      - tuples are heterogeneous (mixed-type) but fixed in size, and each
        "slot" is fixed in type.

        Examples:
          "(int,)" denotes a tuple containing exactly one integer
          "(string, string)" denotes a pair of strings
          "([int|long], string, int)" denotes a triple:
            list of (int or long), string, int

        Tuple types have one exception to this rule: if a tuple type is
        "extended", then the rules change for its last slot: for
        example, the extended tuple type "(string, int*)" (note the "*")
        denotes a tuple with exactly one string followed by zero or more
        ints.  The following are all valid values of this type:
          ("foo", 3)
          ("foo", 3, 1)
          ("foo", 2, 5, 1, 6, 2, 1, 4, 5, 1, 15, 6, 2, 5)
          ("foo",)
        
        This is mainly used for tuples that act like lists: e.g., if you
        want a list of strings to be usable as a dictionary key, you
        code it as a tuple of strings instead (lists aren't hashable).
        This practice is incompatible with Oscar's basic tuple
        definition, so extended tuples are provided as an escape.

    Note that "of the same type" refers to Oscar types, not Python
    types.  For example, if a variable is declared "[int|long]",
    each element is checked separately to make sure it is either
    an int or a long; [1, 2L, 3] is a valid value of the type
    "[int|long]".  (Again, union types are described below.)

  * instance types: used for class instances.  A class Foo defined in
    the module foo.bar has an associated instance type "foo.bar.Foo".
    Generally, it's not enough to say that a variable is of type
    "foo.bar.Foo"; you also want to specify the instance attributes of
    Foo (and their types!).  Each instance type has an associated class
    definition that stores this information.  This is where Oscar's real
    power shines through, because typically Python data is accessed via
    an instance of some class.  If your schema has a class definition
    for that "root class", and for the class of each object reachable
    from the root, Oscar will crawl your entire object graph, ensuring
    that every instance, every attribute of every instance, and every
    element of every container anywhere in that object graph is of the
    correct type.

    The essential ingredient of a class definition is its attribute
    list.  This is described in schema.txt, under "Defining an object
    schema: class docstrings".

    Examples:
      "FooBar" denotes an instance of class FooBar defined in
        the main program
      "thing.Thing" denotes an instance of class Thing defined
        in module thing

  * instance-container types: Python classes often implement the
    semantics of lists, tuples, or dictionaries.  You don't want to give
    up type-checking every attribute of instances of such classes, but
    you also want to make sure that they conform to the strict
    type-checking rules Oscar applies to containers.  Hence,
    instance-container types marry the two.

    Examples:
      "UserList.UserList [string]"
        denotes an instance of the UserList class, defined in the
        UserList module, that acts like a list of strings
      "MyDict { string : int|long }"
        denotes an instance of the MyDict class that acts like a
        dictionary mapping strings to either ints or longs

  * union types: any set of Oscar types may be combined to form a
    union type.  A candidate value is tested against each sub-type of
    the union type, and only rejected if all of the sub-types reject it.

    Examples:
      "int | long" denotes a value that may be either an int or a long
      "string | [string] | (string, string)"
        denotes a value that may be either a string, a list of strings,
        or a pair (tuple) of strings

  * wildcard type: used for variables that may hold a value of any type.
    There is only one wildcard type, spelled "any".

  * boolean type: used for boolean (true/false) values.  Strictly
    speaking, any Python value can be interpreted in a boolean way:
    e.g., 0, 0L, 0.0, "", and None are all false values, while 42,
    3.14159, and "foo!" are all true.  Oscar restricts this drastically:
    the only allowed values for boolean variables are 0, 1, and None.

  * alias types: used to define shorthand names for commonly-used 
    types.  The most common use of this is to alias the bare name of a
    class to its fully-qualified name -- e.g. if class Thing is defined
    in module project.util, then "Thing" might be an alias for
    "project.util.Thing".  ("project.util.Thing" is the instance type,
    and "Thing" is an alias type that expands to that instance type.)

    Aliases are also useful if you have a particular union type used
    frequently; instead of always spelling out "int | float | long", you
    can define "number" as an alias for this union type.  (This also
    makes it easy to change your definition of "number" if someday you
    have to extend it to handle, say, complex or rational numbers.)
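
The container, union, wildcard, and boolean rules above can be sketched as a
small value checker.  This is a minimal sketch in modern Python 3, not
Oscar's implementation: the tuple-based type encoding and the name
'conforms' are hypothetical.  Because Python 3 merged long into int, the
sketch has no separate "long" atomic type, and atomic types are matched by
exact Python type (an assumption), so a bool is not accepted as an "int".

```python
# Hypothetical encoding of Oscar types as plain Python data:
#   "any", "boolean", or an atomic type name ("string", "int", ...)
#   ("list", element_type)
#   ("dict", key_type, value_type)
#   ("tuple", [slot_type, ...], extended)  # extended: last slot repeats 0+ times
#   ("union", [alternative_type, ...])
ATOMIC = {"string": str, "int": int, "float": float, "complex": complex}


def conforms(value, t):
    """Return True if value is a valid value of the (hypothetical) type t."""
    if t == "any":
        return True                      # wildcard: no enforcement
    if t == "boolean":
        # only 0, 1, or None; True/False also pass, since bool subclasses int
        return value is None or (isinstance(value, int) and value in (0, 1))
    if isinstance(t, str):
        return type(value) is ATOMIC[t]  # exact type match
    kind = t[0]
    if kind == "union":                  # rejected only if every alternative rejects
        return any(conforms(value, alt) for alt in t[1])
    if kind == "list":                   # homogeneous, any length
        return type(value) is list and all(conforms(v, t[1]) for v in value)
    if kind == "dict":                   # keys and values separately homogeneous
        return type(value) is dict and all(
            conforms(k, t[1]) and conforms(v, t[2]) for k, v in value.items())
    if kind == "tuple":
        slots, extended = t[1], t[2]
        if type(value) is not tuple:
            return False
        if not extended:                 # fixed size, each slot fixed in type
            return len(value) == len(slots) and all(
                conforms(v, s) for v, s in zip(value, slots))
        fixed, repeated = slots[:-1], slots[-1]
        return (len(value) >= len(fixed)
                and all(conforms(v, s) for v, s in zip(value, fixed))
                and all(conforms(v, repeated) for v in value[len(fixed):]))
    raise ValueError(f"unknown type {t!r}")
```

For example, conforms([1, 2.0, 3], ("list", ("union", ["int", "float"])))
is true because each element is checked against the union separately,
just as [1, 2L, 3] is a valid "[int|long]".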


Type grammar
------------

[taken from the type_parser.py module]

type : NAME                     # atomic, alias, instance, boolean, any
     | container_type           # list, tuple, dictionary
     | NAME container_type      # instance-container type
     | union_type

container_type : list_type
               | tuple_type
               | dictionary_type
list_type      : "[" type "]"
tuple_type     : "(" (type ",")* type "*"? ","? ")"
dictionary_type: "{" type ":" type "}"

union_type : type ("|" type)+

Tokens:
  NAME : [a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*
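
A recursive-descent parser for this grammar fits in a few dozen lines of
Python 3.  The parser below is a hypothetical illustration, not
type_parser.py (which builds on SPARK).  It returns nested tuples such as
("list", ("name", "string")) and leaves bare NAMEs unresolved, since
deciding whether a name is atomic, an alias, or an instance type requires
the schema.  Union ("|") binds loosest, and "*" is accepted only after the
last tuple slot, as in the tuple_type rule.

```python
import re

# One token: a NAME (pattern copied from the Tokens section above) or a
# single punctuation character used by the grammar.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<name>[a-zA-Z_][a-zA-Z0-9_]*(?:\.[a-zA-Z_][a-zA-Z0-9_]*)*)"
    r"|(?P<op>[\[\](){}:,|*]))"
)


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos:]!r}")
        if m.group("name"):
            tokens.append(("NAME", m.group("name")))
        else:
            tokens.append(("OP", m.group("op")))
        pos = m.end()
    return tokens


class TypeParser:
    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def at(self, op):
        return self.peek() == ("OP", op)

    def take(self, op):
        if not self.at(op):
            raise SyntaxError(f"expected {op!r}, got {self.peek()[1]!r}")
        self.i += 1

    def parse(self):
        result = self.type_()
        if self.i != len(self.tokens):
            raise SyntaxError(f"trailing input at {self.peek()[1]!r}")
        return result

    def type_(self):
        # type ("|" type)*: a single type, or a union of two or more
        alternatives = [self.simple()]
        while self.at("|"):
            self.take("|")
            alternatives.append(self.simple())
        return alternatives[0] if len(alternatives) == 1 else ("union", alternatives)

    def simple(self):
        kind, value = self.peek()
        if kind == "NAME":
            self.i += 1
            if self.at("[") or self.at("(") or self.at("{"):
                return ("instance-container", value, self.container())
            return ("name", value)       # atomic, alias, instance, boolean, any
        return self.container()

    def container(self):
        if self.at("["):
            self.take("[")
            elem = self.type_()
            self.take("]")
            return ("list", elem)
        if self.at("{"):
            self.take("{")
            key = self.type_()
            self.take(":")
            value = self.type_()
            self.take("}")
            return ("dict", key, value)
        if self.at("("):
            self.take("(")
            slots, extended = [self.type_()], False
            while True:
                if self.at("*"):         # "*" may only follow the last slot
                    self.take("*")
                    extended = True
                    if self.at(","):
                        self.take(",")
                    break
                if not self.at(","):
                    break
                self.take(",")
                if self.at(")"):         # trailing comma, as in "(int,)"
                    break
                slots.append(self.type_())
            self.take(")")
            return ("tuple", slots, extended)
        raise SyntaxError(f"expected a type, got {self.peek()[1]!r}")


def parse_type(text):
    return TypeParser(text).parse()
```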


$Id: type-system.txt,v 1.1 2001/08/20 18:10:09 gward Exp $

--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: attachment; filename="schema.txt"
Content-Transfer-Encoding: 8bit

Object schemata
---------------

An object schema consists of the following components:

  * a set of atomic types, usually a subset of Python's builtin
    types.  The default atomic types are string, int, long, float, and
    complex.  In principle, you can add other builtin types (like
    function, class, or file) or extension types to a schema, but
    Oscar currently has problems with many builtin types.  (In
    particular, only types whose values can be pickled may be atomic
    types in Oscar.)

  * a type alias mapping, letting you define shorthand names for common
    types.

  * a set of class definitions.  A class definition maps instance
    attribute names to attribute types.  This performs two purposes: it
    defines the expected set of attributes for instances of a class, and
    it defines the type of each attribute.

In the current version of Oscar, an object schema is defined through a
project description file and the class docstrings in a set of source
files.  This is useful in practice, but it's kind of hard to talk about
object schemata without a simple, compact schema description language.
Thus, consider the following pseudo-schema:

  class Thing:
    name : string

  class Animal (Thing):
    num_legs : int
    furry : boolean

(Coincidentally, this is the syntax emitted by gen_schema's "-t" option.
However, this is currently a write-only language; Oscar has no way to
parse schemata created by "gen_schema -t".)

This defines an object schema with no additional atomic types (just the
default five: string, int, long, float, and complex), no aliases, and
two classes (both, presumably, in the __main__ module, since the class
names are unqualified).

If you ask Oscar to type-check an instance of Thing under this schema,
or if it comes across a Thing instance in the course of type-checking a
larger object graph, it does the following:
  * ensure that the instance has exactly one attribute, 'name'
  * ensure that the value of this attribute is a string

Similarly, Oscar type-checks an Animal instance under this schema as
follows:
  * ensure that it has exactly three attributes, 'name', 'num_legs',
    and 'furry' (note that 'name' is inherited from Thing)
  * ensure that the value of 'name' is a string, 'num_legs' an int,
    and 'furry' a boolean (i.e. 0, 1, or None)
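
These two checks amount to comparing an instance's __dict__ with the
attributes merged from its class and all of its bases.  A minimal Python 3
sketch, assuming a hypothetical dictionary encoding of the pseudo-schema
above and unqualified class names (as for classes in __main__); the names
SCHEMA, CHECKS, and check_instance are illustrative only:

```python
# Hypothetical encoding of the pseudo-schema above:
#   class name -> (base class name or None, {attribute: type name})
SCHEMA = {
    "Thing": (None, {"name": "string"}),
    "Animal": ("Thing", {"num_legs": "int", "furry": "boolean"}),
}

# Value tests for the few type names this sketch needs.
CHECKS = {
    "string": lambda v: type(v) is str,
    "int": lambda v: type(v) is int,
    # Oscar's boolean: only 0, 1, or None (True/False pass too, as bool subclasses int)
    "boolean": lambda v: v is None or (isinstance(v, int) and v in (0, 1)),
}


def expected_attributes(class_name):
    """Merge a class's own attributes with everything inherited from its bases."""
    attrs = {}
    while class_name is not None:
        base, own = SCHEMA[class_name]
        for name, type_name in own.items():
            attrs.setdefault(name, type_name)  # a subclass entry overrides its base
        class_name = base
    return attrs


def check_instance(obj):
    """Return a list of problems: extra, missing, or mistyped attributes."""
    expected = expected_attributes(type(obj).__name__)
    actual = vars(obj)
    problems = [f"unexpected attribute {n!r}" for n in sorted(set(actual) - set(expected))]
    problems += [f"missing attribute {n!r}" for n in sorted(set(expected) - set(actual))]
    for name, type_name in expected.items():
        if name in actual and not CHECKS[type_name](actual[name]):
            problems.append(f"{name}: expected {type_name}, got {type(actual[name]).__name__}")
    return problems
```

An empty list means the instance has exactly the expected attributes, each
of the expected type.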


Defining an object schema: class docstrings
-------------------------------------------

Currently, you define an object schema by writing specially-formatted
class docstrings.  (There is no separate schema description
language... yet.)  For example, the Thing class in the above
pseudo-schema might be documented as:

  class Thing:
      """A single thing, which may be an animal, vegetable, or mineral.
      The only property common to all things is a name.

      Instance attributes:
        name : string
          the name of the thing
      """

Oscar (specifically, the gen_schema script that parses these docstrings)
ignores everything in the docstring up to the "Instance attributes:"
line.  After that, things get fairly rigid:
    
  * the "Instance attributes:" line must be indented to the same depth
    as the main body of the docstring
    
  * each attribute name is indented two spaces relative to that,
    and followed by a colon (":") and the attribute's type
    
  * attribute descriptions (which are optional, and are ignored by
    Oscar) are indented a further two spaces
    
  * when indentation returns to the same level as the "Instance
    attributes:" line, Oscar stops processing the docstring and
    goes on to the next class in the module (thus, blank lines
    are allowed in the attribute list)

Here is a slightly more elaborate example:

  class Animal (Thing):
      """An animal, i.e., a thing with multiple legs and possibly fur.

      Instance attributes:
        num_legs : int
          the number of legs this animal has
        furry : boolean
          whether this animal is furry or not

      Outsiders should use 'get_num_legs()' and 'is_furry()' to access
      these attributes.
      """

Here is a stripped-down version of this docstring that is exactly
equivalent as far as Oscar is concerned:

  class Animal (Thing):
      """
      Instance attributes:
        num_legs : int
        furry : boolean
      """

Sometimes a class will have no instance attributes of its own; Oscar has
special syntax for this:

  class Mammal (Animal):
      """Instance attributes: none"""

This is different from simply omitting the list of instance attributes,
or omitting the docstring entirely.  If Oscar sees a Mammal instance
with any attributes apart from those inherited from Animal, it will
complain.  However, if Mammal has no docstring or attribute list, Oscar
can't do detailed type-checking of instances of that class.  Instead, it
  * complains that the class has no docstring (or no attribute list)
  * excludes the class from the schema
  * complains about any instances of that class that it discovers when
    type-checking an object graph
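
The docstring rules above can be approximated with inspect.cleandoc (which
normalizes the docstring's indentation) plus an indentation scan.  This is
a hypothetical Python 3 sketch, not gen_schema's parser: it returns a dict
mapping attribute names to type strings, an empty dict for "Instance
attributes: none", and None when there is no attribute list at all (the
case where Oscar excludes the class).

```python
import inspect
import re

HEADER = "Instance attributes:"
# "name : type"; the type may itself contain colons, e.g. "{ string : int }"
ATTR_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\s*:\s*(\S.*)$")


def parse_instance_attributes(docstring):
    """Return {attribute: type string}, {} for "none", or None if no list."""
    if not docstring:
        return None
    lines = inspect.cleandoc(docstring).splitlines()
    for start, line in enumerate(lines):
        if line.lstrip().startswith(HEADER):
            header_indent = len(line) - len(line.lstrip())
            rest = line.strip()[len(HEADER):].strip()
            break
    else:
        return None                      # no attribute list at all
    if rest == "none":
        return {}                        # explicitly no attributes of its own
    if rest:
        raise ValueError(f"unexpected text after {HEADER!r}: {rest!r}")
    attrs = {}
    for line in lines[start + 1:]:
        if not line.strip():
            continue                     # blank lines are allowed in the list
        indent = len(line) - len(line.lstrip())
        if indent <= header_indent:
            break                        # back at the header's level: list ends
        if indent == header_indent + 2:  # attribute line
            m = ATTR_RE.match(line.strip())
            if m is None:
                raise ValueError(f"bad attribute line: {line!r}")
            attrs[m.group(1)] = m.group(2).rstrip()
        # lines indented further are descriptions, which are ignored
    return attrs
```

Applied to the full and stripped-down Animal docstrings shown earlier, this
sketch yields the same result, {"num_legs": "int", "furry": "boolean"}.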


Defining an object schema: the project description file
-------------------------------------------------------

Writing class docstrings that document every instance attribute is the
key part of defining an object schema.  However, you still have to tell
Oscar how to find those class docstrings and what to do with them.  This
is done with the gen_schema script and its project description file.


[Searching by directory]

At its simplest, the project description file contains a list of
directories to search for Python source files, and possibly a prefix to
use in turning source filenames into module names.  For example, the
project description file for Oscar itself (oscar.cfg in the top-level
Oscar directory) starts out with this:

  dirs = ["."]
  prefix = "oscar"

(The project description file is just Python code; it's execfile'd by
gen_schema.)  This instructs gen_schema to search for *.py in the
current directory, and to assume that all modules found actually live in
the "oscar" package.  Hence when it finds schema.py, it considers that
module to be "oscar.schema", and a class ObjectSchema in that file will
be called "oscar.schema.ObjectSchema".

gen_schema does *not* search recursively; if you want it to descend into
sub-directories, you must specify them explicitly:
  dirs = ["compiler", "compiler/parser", "compiler/optimizer"]

The directories in 'dirs' are interpreted relative to a base directory
supplied with the "-d" (or --base-dir) option to gen_schema.  If you run
gen_schema from Oscar's top directory (i.e., where schema.py lives),
everything is fine -- the current directory is the right place to look
for Oscar's source files.  In that case,
  ./scripts/gen_schema -p oscar.cfg

is the right incantation.  (The resulting schema is pickled to
schema.pkl.)

If you're in /home/greg and Oscar is in /tmp/oscar, though, the above
incantation is wrong: Oscar will consider any *.py files in /home/greg to
be part of the "oscar" package, and will scan them for docstrings to
generate a schema.  This probably won't work; you need to specify the
base directory that 'dirs' is interpreted relative to:
  /tmp/oscar/scripts/gen_schema -p /tmp/oscar/oscar.cfg -d /tmp/oscar

(Obviously, it's easier just to run gen_schema from the right place!)
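
Because the project description file is plain Python, loading it and
searching 'dirs' can be sketched as below (Python 3, so exec takes the place
of execfile).  The function names are hypothetical.  How a sub-directory
such as "compiler/parser" maps to a dotted module name is an assumption;
the text above only shows the top-level case, where schema.py with prefix
"oscar" becomes "oscar.schema".

```python
import glob
import os


def load_project(path):
    """Execute a project description file; return the variables it defines."""
    namespace = {}
    with open(path) as f:
        exec(compile(f.read(), path, "exec"), namespace)
    namespace.pop("__builtins__", None)  # added by exec, not by the file
    return namespace


def find_modules(base_dir, dirs, prefix=None):
    """Map dotted module names to *.py files, one directory level per entry."""
    modules = {}
    for d in dirs:
        # relative directory parts, e.g. "compiler/parser" -> ["compiler", "parser"]
        parts = [p for p in os.path.normpath(d).split(os.sep) if p not in ("", ".")]
        # glob does not descend into sub-directories, matching gen_schema
        for path in sorted(glob.glob(os.path.join(base_dir, d, "*.py"))):
            stem = os.path.splitext(os.path.basename(path))[0]
            name_parts = ([prefix] if prefix else []) + parts + [stem]
            modules[".".join(name_parts)] = path
    return modules
```

Here base_dir plays the role of the "-d" option: with dirs = ["."] and
prefix = "oscar", a base directory of /tmp/oscar maps /tmp/oscar/schema.py
to "oscar.schema".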


[Specifying individual modules]

If you don't want to search every "*.py" file in a list of directories,
you can supply a list of explicit module names, e.g.:
  extra_modules = ["oscar.schema",
                   "oscar.valuetype"]

Note that extra_modules is a list of fully-qualified module names, *not*
filenames.

This variable is called 'extra_modules' because these modules are added
to the list of modules found by searching the directories named in
'dirs'.  If 'dirs' isn't supplied, the modules in 'extra_modules' are
Oscar's only source for class definitions.


[Excluding individual modules]

You can refine gen_schema's search for classes by excluding certain
modules.  As an example, Oscar includes a copy of SPARK (John Aycock's
nifty parser framework) as the "oscar.spark" module; since this is
really someone else's code, it doesn't have Oscar-style docstrings to
parse.  Also, the parser classes are transient and shouldn't wind up in
any persistent store of an Oscar object graph, so there's not much point
in type-checking them.  Thus, I exclude both oscar.spark and
oscar.type_parser (which provides classes derived from the SPARK
classes) from gen_schema's scan:
  exclude_modules = ["oscar.spark", "oscar.type_parser"]

Like extra_modules, exclude_modules is a list of fully-qualified module
names.


[Excluding individual classes]

You can also exclude specific classes from the search, instead of whole
modules.  This is useful if a particular module provides some transient
classes and other first-class persistent classes.  For example, I might
wish to exclude the TypecheckContext class, defined in oscar.context,
from schema generation:
  exclude_classes = ["oscar.context.TypecheckContext"]

Again, classes are specified as fully-qualified Python names.
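
One plausible way to combine 'dirs', 'extra_modules', 'exclude_modules',
and 'exclude_classes' is sketched below in Python 3.  The function names
are hypothetical, and the precedence (an exclusion wins even over a module
listed in extra_modules) is an assumption, not documented gen_schema
behaviour.

```python
def select_modules(found, extra_modules=(), exclude_modules=()):
    """Modules to scan: those found via 'dirs', plus extras, minus exclusions."""
    excluded = set(exclude_modules)
    selected = []
    for name in list(found) + list(extra_modules):
        if name not in excluded and name not in selected:
            selected.append(name)
    return selected


def select_classes(class_names, exclude_classes=()):
    """Drop individually excluded classes (fully-qualified names)."""
    excluded = set(exclude_classes)
    return [name for name in class_names if name not in excluded]
```

With no 'dirs', 'found' is simply empty, so extra_modules becomes the only
source of class definitions, as described above.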


[Adding atomic types]

If the five default atomic types aren't enough for your project, you'll
have to add new ones.  This might happen if you use extension types in
your application, or if you store slightly odd objects in your
persistent object graph, like functions or class objects.  New atomic
types are specified using an example value, not using the type object
itself.  (This is necessary because type objects can't be pickled, and
gen_schema pickles the schema for future use.  We can't store type
objects in the pickled schema, so we store sample values instead.)

For instance, to add Marc-André Lemburg's DateTime type to your schema,
add this to your project description file:
  import mx.DateTime
  atomic_types = [mx.DateTime.now()]

The structure of 'atomic_types' is a tad complex.  Most often, each
element of the list is simply a value of the atomic type you want to add
to your schema -- e.g., here I created a sample DateTime object.  Since
these sample values go straight into the object schema, which is
subsequently pickled by gen_schema, these must be pickle-able values.
Oscar probably needs to grow a real schema definition language before
you can have, say, Python function or file objects as atomic types in an
object schema.  (In other words, I think this is an implementation
problem due to reliance on pickling rather than a fundamental problem.)

In this simple case, the name of the atomic type is implicit, because
the type itself supplies its name -- "DateTime" in the above example.
(Try "type(mx.DateTime.now()).__name__".)

In some cases, though, you may want to specify your own name for an
atomic type.  In that case, just supply a tuple (sample_value,
type_name) in atomic_types.  This is useful if you're dealing with
ExtensionClass, where every class is a new type.  (This is also the case
with classes derived from types in Python 2.2.)  For instance, a ZODB
application that needs "class" and "instance" types (for class objects
and generic instance objects) might do this:

   import ZODB
   from Persistence import Persistent
   # ...
   atomic_types = [(Persistent(), "instance"),
                   (Persistent, "class")]

If you don't understand why you might need this, you probably don't need
it.
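
The two forms of atomic_types entry (a bare sample value, or a
(sample_value, type_name) tuple) can be normalized into a name-to-type
table as sketched below.  This is a hypothetical Python 3 sketch: the
standard library's datetime stands in for mx.DateTime, and datetime.date
(a class object, whose type is 'type') stands in for the Persistent class
in the ZODB example.  The pickle.dumps call mirrors the requirement that
sample values be picklable.

```python
import datetime
import pickle


def atomic_type_table(atomic_types):
    """Map atomic type names to type objects, from sample values."""
    table = {}
    for entry in atomic_types:
        # (sample_value, type_name) form; note that a genuine 2-tuple sample
        # whose second item is a string would be misread by this heuristic
        if isinstance(entry, tuple) and len(entry) == 2 and isinstance(entry[1], str):
            sample, name = entry
        else:
            sample, name = entry, type(entry).__name__  # name implied by the type
        pickle.dumps(sample)   # raises if the sample cannot go into a pickled schema
        table[name] = type(sample)
    return table


if __name__ == "__main__":
    print(atomic_type_table([datetime.datetime.now(), (datetime.date, "class")]))
    # {'datetime': <class 'datetime.datetime'>, 'class': <class 'type'>}
```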


Putting it all together
-----------------------

For a simple example of defining an object schema, take a look in the
"examples" sub-directory of Oscar's source distribution.  There, you'll
find:
  * the thing.py and animal.py modules, which provide the classes
    ThingCollection, Thing, Animal, and Mammal
  * the make_things script, which creates some things, bundles
    them in a collection, and pickles them to things.pkl
  * the things.proj project description file, which tells
    gen_schema how to generate a schema for this project

For now, we're just going to generate a schema from the Python source
files and things.proj.  Later (in "checking.txt", the document that
covers type-checking an object graph) we'll run make_things and
type-check the results.

If you haven't installed Oscar yet, you should either do so now or
perpetrate your favourite kludge for ensuring that it's available
through sys.path.  (If you don't have a favourite kludge, just install
it.)  Run
  python -c 'import oscar'
to make sure it worked -- if this command completes silently, all is
well.

Installing Oscar should also install the gen_schema and check_data
scripts.  I'll assume they're in your shell's PATH; you might have to
adjust your PATH or the commands here accordingly.

Before we run gen_schema, let's take a look at the ingredients of this
project.  First, the project description file, things.proj, is quite
simple:

  extra_modules = [("thing", "thing.py"), ("animal", "animal.py")]

There's no 'dirs' here, meaning gen_schema won't go searching for "*.py"
anywhere.  Note that each extra_modules entry here is a (module name,
source filename) pair rather than a bare module name: gen_schema just
looks for the 'thing' module in thing.py, and the 'animal' module in
animal.py.  Since explicit source filenames are supplied, the 'thing' and
'animal' modules don't have to be in Python's path -- gen_schema simply
parses the source files.

Next, take a look at thing.py.  You'll see that it defines two classes,
Thing and ThingCollection, and that the instance attributes of each are
fully documented.  Similarly, animal.py provides the Animal and Mammal
classes.

Finally, let's run gen_schema.  We'll save the schema for this project
to things_schema.pkl and things_schema.osc -- the two files have the same
content, but only the latter is human-readable.  From the "examples"
directory:

  gen_schema -p things.proj -o things_schema.pkl -t things_schema.osc

If you're really curious about what's going on here, add the "-v"
option.  The output of gen_schema (without "-v") should look like this:

  looking for classes...
  found 4 classes
  parsing class docstrings...
  writing object schema to things_schema.osc...
  pickling object schema to things_schema.pkl...

Take a look at things_schema.osc for a human-readable representation of
the schema also saved in things_schema.pkl.

Now that we have an object schema for this project, we can use it later
to type-check a persistent object graph created by applications that use
this project, such as make_things.  This will be done in the next
document, "checking.txt".


$Id: schema.txt,v 1.5 2001/08/20 18:32:31 gward Exp $

--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="checking.txt"

Type-checking an object graph
-----------------------------

Once you've gone to the trouble of defining your object schema by
documenting your instance attributes, creating a project description
file, and running gen_schema, it's quite simple to ensure that a
collection of objects conforms to your schema.  I'm going to assume
you've followed the steps at the end of the "schema.txt" document, and
created a schema for the "things" project in things_schema.pkl.  Once
you have a schema, you can type-check a saved object graph.  First, we
have to create an object graph using the make_things application.

This requires a bit more setup than running gen_schema, since now we
have to be able to import the 'thing' and 'animal' modules.  The easiest
way to do that is to switch into the "examples" subdirectory and force
Python to search it for modules:

  cd examples
  export PYTHONPATH=.

(That last line is for Bourne-like shells under Unix.  For csh-like
shells, say "setenv PYTHONPATH .".  For other operating systems, you're
on your own.)

Now run make_things:

  python make_things

You should now see the file "things.pkl" in the current directory.

Note that make_things has a (deliberate) type error in it, which means
the object graph captured in things.pkl is inconsistent with the object
schema in things_schema.pkl.  We're going to use the check_data script
to find the type error in the data.

(See if you can spot the error by reading the make_things script.  It
would be easy to modify the underlying class -- Animal, in this case --
to catch this particular error.  However, there's no limit to the number
of type errors you can make in Python code, and hardening your
underlying classes to catch every single one of them is a lot of work.
Thus, Oscar exists to catch such errors after the fact.  That is, Oscar
doesn't tell you immediately when a type error is made -- it only
detects it in data that has been saved to a persistent store.  I suspect
the Oscar machinery could be used for run-time type-checking, but that
would probably impose a pretty severe performance penalty, so I haven't
experimented in that direction... yet.)

Obviously, in order to load your pickled object graph, Oscar needs to be
able to import the 'thing' and 'animal' modules.  Since you already ran
make_things, that precondition is satisfied, so we can go ahead and run
check_data:

  check_data -f pickle things_schema.pkl things.pkl

The "-f" option tells check_data what format your object graph is stored
in.  (Currently, the only other option is "zodb", which is further
modified by the "-s" (storage) option.)  The first filename,
things_schema.pkl, is the file containing the object schema for this
project.  This is *always* a pickle, regardless of "-f".  The second
argument is the location of the data to be checked -- in this case, the
object graph created by make_things.

The one type error in the make_things script corresponds to one type
error in things.pkl, reported by check_data as:

  root.things['Tyrannosaurus rex'].num_legs:
    expected int, got string ('2 big, 2 small')

If you were unable to track down the error by reading the code
earlier, this error message should help a lot.  ;-)
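
The path in that report comes from crawling the graph from the root and
extending a path string at each step (".attr" for instance attributes,
"[key]" for dictionary values).  A minimal Python 3 sketch of such a crawl,
covering only atomic, list, dictionary, and instance types; the type
encoding, the schema dictionary, the stand-in classes, and the function
names are all hypothetical, not check_data's internals:

```python
# Hypothetical, simplified type encoding for this sketch:
#   "string" / "int" / "float"       atomic types
#   ("list", elem_type)
#   ("dict", key_type, value_type)
#   ("instance", class_name)          attributes looked up in the schema dict
ATOMIC = {"string": str, "int": int, "float": float}
OSCAR_NAMES = {py: name for name, py in ATOMIC.items()}


def type_name(value):
    """Report Python values using Oscar-style names where possible (str -> "string")."""
    return OSCAR_NAMES.get(type(value), type(value).__name__)


def check_graph(root, root_type, schema):
    """Crawl every value reachable from root; return a list of error messages."""
    errors = []

    def report(path, expected, value):
        errors.append(f"{path}:\n    expected {expected}, got {type_name(value)} ({value!r})")

    def crawl(value, t, path):
        if isinstance(t, str):
            if type(value) is not ATOMIC[t]:
                report(path, t, value)
        elif t[0] == "list":
            if type(value) is not list:
                return report(path, "list", value)
            for i, item in enumerate(value):
                crawl(item, t[1], f"{path}[{i}]")
        elif t[0] == "dict":
            if type(value) is not dict:
                return report(path, "dict", value)
            for key, item in value.items():
                crawl(key, t[1], f"{path} (key {key!r})")
                crawl(item, t[2], f"{path}[{key!r}]")
        elif t[0] == "instance":
            if type(value).__name__ != t[1]:
                return report(path, t[1], value)
            attrs, actual = schema[t[1]], vars(value)
            for name in sorted(set(actual) - set(attrs)):
                errors.append(f"{path}: unexpected attribute {name!r}")
            for name, attr_type in attrs.items():
                if name not in actual:
                    errors.append(f"{path}: missing attribute {name!r}")
                else:
                    crawl(actual[name], attr_type, f"{path}.{name}")
        else:
            raise ValueError(f"unknown type {t!r}")

    crawl(root, root_type, "root")
    return errors


if __name__ == "__main__":
    class ThingCollection:      # hypothetical stand-ins for the example classes
        pass

    class Animal:
        def __init__(self, name, num_legs):
            self.name, self.num_legs = name, num_legs

    schema = {"ThingCollection": {"things": ("dict", "string", ("instance", "Animal"))},
              "Animal": {"name": "string", "num_legs": "int"}}
    root = ThingCollection()
    root.things = {"Tyrannosaurus rex": Animal("Tyrannosaurus rex", "2 big, 2 small")}
    for error in check_graph(root, ("instance", "ThingCollection"), schema):
        print(error)   # the two-line num_legs report shown above
```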

--6TrnltStXW4iwmi0--


From kiko@async.com.br  Tue Aug 21 20:35:38 2001
From: kiko@async.com.br (Christian Robottom Reis)
Date: Tue, 21 Aug 2001 16:35:38 -0300 (BRT)
Subject: [Types-sig] Re: [ZODB-Dev] Pre-announce: Oscar 0.1
In-Reply-To: <20010820170138.A8954@mems-exchange.org>
Message-ID: <Pine.LNX.4.32.0108211626180.325-100000@blackjesus.async.com.br>

On Mon, 20 Aug 2001, Greg Ward wrote:

> several months ago, I cooked up a tool, Oscar for rigorously
> type-checking a Python object graph: you define an object schema
> (currently through specially-formatted class docstrings), and Oscar

"Currently" meaning this is likely to change? Let me just say this is
quite nice, and I'll have to implement something like this in our Domain
classes _soon_. My question is:

Does the typesystem offer any introspection? I.e., can I discover at
runtime the attributes registered for my class, and what types they are?
I need this for type-checking when sorting columns in my UI framework, so
that would come in handy.

Oh, this _is_ a runtime typecheck? :)

> In the past few weeks, I finally got around to writing the scripts and
> documentation necessary to release Oscar publicly.  Now I'm ready to do
> so, pending approval by the CNRI brass (sigh).  There's nothing

How's it going?

Ok, assuming this will be released, can I go ahead and docstring my
classes and assume I'm going to be able to do the runtime checking when
it's available? I had devised a complete typesystem based on dictionaries,
but if yours is ready and tested, I'll chuck mine.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 272 3330 | NMFL


From gward@mems-exchange.org  Wed Aug 22 00:24:19 2001
From: gward@mems-exchange.org (Greg Ward)
Date: Tue, 21 Aug 2001 19:24:19 -0400
Subject: [Types-sig] Re: [ZODB-Dev] Pre-announce: Oscar 0.1
In-Reply-To: <Pine.LNX.4.32.0108211626180.325-100000@blackjesus.async.com.br>; from kiko@async.com.br on Tue, Aug 21, 2001 at 04:35:38PM -0300
References: <20010820170138.A8954@mems-exchange.org> <Pine.LNX.4.32.0108211626180.325-100000@blackjesus.async.com.br>
Message-ID: <20010821192419.C10598@mems-exchange.org>

On Mon, 20 Aug 2001, I wrote:
> several months ago, I cooked up a tool, Oscar for rigorously
> type-checking a Python object graph: you define an object schema
> (currently through specially-formatted class docstrings), and Oscar

First of all, let me modify the announcement a bit.  There are too many
software packages out there already called "Oscar" -- so Oscar is dead,
long live the Grouch!  (Does anyone who grew up outside of North America
get it?  Oh well...)

On 21 August 2001, Christian Robottom Reis said:
> "Currently" meaning this will presumably change? Let me just say this is
> quite nice, and I'll have to implement something like this in our Domain
> classes _soon_. My question is:

As far as I am concerned, Grouch will *always* support extracting an
object schema from docstrings.  The MEMS Exchange has ~100 classes in
20,000 lines of code that use Grouch's docstring format for database
type-checking, and another 100 classes that aren't in the object schema
but use the same docstring format for clarity and consistency.

When/if a new schema language becomes part of Grouch, it will be offered
as a complement to schema extraction from docstrings.
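The thread never shows Grouch's actual docstring syntax, so the following is only a rough illustration of the general idea of pulling a schema out of class docstrings. It assumes a made-up convention (an "Instance attributes:" header followed by "name : type" lines); both that format and the extract_schema function are hypothetical, not Grouch's API.

```python
import re

# Hypothetical convention for this sketch only: an "Instance attributes:"
# header followed by one "name : type" line per attribute.
ATTR_LINE = re.compile(r"^\s*(\w+)\s*:\s*(.+?)\s*$")


def extract_schema(cls):
    """Return {attribute name: type string} parsed from cls.__doc__."""
    doc = cls.__doc__ or ""
    schema = {}
    in_block = False
    for line in doc.splitlines():
        if line.strip() == "Instance attributes:":
            in_block = True
            continue
        if not in_block:
            continue
        if not line.strip():
            if schema:      # a blank line after the entries ends the block
                break
            continue
        match = ATTR_LINE.match(line)
        if match is None:   # anything that is not "name : type" ends it too
            break
        schema[match.group(1)] = match.group(2)
    return schema


class Address:
    """A postal address.

    Instance attributes:
      street1 : string
      city : string
      zip : string
    """
```

Because the docstring stays the single source of truth, the same text serves human readers and the checker, which matches the "clarity and consistency" argument above.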

> Does the typesystem offer any introspection? I.e., can I, at runtime,
> discover the attributes registered for my class, and what types they are?
> I need this for type-checking when sorting columns in my UI framework, so
> that would come in handy.

Yes, although it's a tad clunky right now.  E.g., say I have a schema in
schema.pkl, as created by the gen_schema script:
  >>> from cPickle import load
  >>> schema = load(open("schema.pkl", "rb"))
  >>> cdef = schema.get_class_definition("mems.access.user.User")
  >>> cdef
  <ClassDefinition at 81a12d4: mems.access.user.User>

OK, you want to know what attributes this class has?
  >>> cdef.attrs
  ['user_id', 'password_hash', 'prefix', 'first_name', 'last_name',
  'suffix', 'address', 'email', 'phone', 'fax', 'timezone',
  'allow_mailing', 'group_list', 'history']

You want to know what type various attributes are?
  >>> cdef.get_attribute_type('password_hash')
  <AtomicType at 81475e4: string>
  >>> cdef.get_attribute_type('address')
  <AliasType at 81a176c: Address>

Hmmm, the 'address' attribute is an alias type -- let's expand the
alias to see what it really is:
  >>> schema.get_alias('Address')
  <InstanceType at 8149694: mems.lib.address.Address>

Digging up the class definition for this is more awkward than it needs
to be:
  >>> name = schema.get_alias('Address').get_class_name()
  >>> name
  'mems.lib.address.Address'
  >>> cdef2 = schema.get_class_definition(name)
  >>> cdef2
  <ClassDefinition at 81489ec: mems.lib.address.Address>

And now we can get the list of attributes in *this* class:
  >>> cdef2.attrs
  ['street1', 'street2', 'street3', 'city', 'state', 'zip',
  'country_code']

...and around we go.  You get the picture.  The documentation for this
API is all in the code.
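To make the "around we go" traversal concrete, here is a minimal sketch that flattens a class definition into dotted attribute paths by following aliases into nested classes, roughly what a UI would need to pick a sort order per column. It uses plain dictionaries as a stand-in schema; CLASS_DEFS, ALIASES and walk are hypothetical and do not mirror Grouch's own classes.

```python
# Hypothetical stand-in schema, loosely modelled on the session above:
# class definitions map attribute names to type names, and aliases map a
# type name to the class it stands for.
CLASS_DEFS = {
    "mems.access.user.User": {
        "user_id": "string",
        "password_hash": "string",
        "address": "Address",
    },
    "mems.lib.address.Address": {
        "street1": "string",
        "city": "string",
        "zip": "string",
    },
}
ALIASES = {"Address": "mems.lib.address.Address"}


def walk(class_name, prefix="", seen=frozenset()):
    """Yield (dotted attribute path, atomic type name) pairs, expanding aliases."""
    if class_name in seen:          # stop if a class refers back to itself
        return
    seen = seen | {class_name}
    for attr, type_name in CLASS_DEFS[class_name].items():
        path = prefix + attr
        target = ALIASES.get(type_name, type_name)
        if target in CLASS_DEFS:    # a class type: descend into its attributes
            yield from walk(target, path + ".", seen)
        else:                       # an atomic type: report it
            yield path, type_name


for path, type_name in walk("mems.access.user.User"):
    print(path, type_name)
```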

> Oh, this _is_ a runtime typecheck? :)

Not currently.  Right now, we do a type-checking pass nightly on our
database.  So far, it mostly finds documentation errors -- i.e., it's
mainly peace of mind, rather than something that regularly finds bugs.
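A nightly pass of this kind boils down to crawling every reachable object and comparing each attribute with the schema. The sketch below assumes a toy schema dictionary and plain in-memory objects rather than a ZODB FileStorage; SCHEMA, check_graph and the two classes are illustrative only.

```python
# Hypothetical schema: class name -> {attribute: expected}. An expected
# value that is a string names another schema class, so the crawler
# descends into that attribute's value.
SCHEMA = {
    "User": {"user_id": str, "address": "Address"},
    "Address": {"city": str, "zip": str},
}


class User:
    pass


class Address:
    pass


def check_graph(root, schema=SCHEMA):
    """Visit every object reachable from root; return a list of problems."""
    problems = []
    seen = set()            # ids of visited objects, so cycles are harmless
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        cls_name = type(obj).__name__
        for attr, expected in schema.get(cls_name, {}).items():
            if not hasattr(obj, attr):
                problems.append(f"{cls_name}.{attr}: missing")
                continue
            value = getattr(obj, attr)
            if isinstance(expected, str):
                # Reference to another schema class: check it, then crawl it.
                if type(value).__name__ == expected:
                    stack.append(value)
                else:
                    problems.append(f"{cls_name}.{attr}: expected {expected}, "
                                    f"got {type(value).__name__}")
            elif not isinstance(value, expected):
                problems.append(f"{cls_name}.{attr}: expected "
                                f"{expected.__name__}, got {type(value).__name__}")
    return problems


user = User()
user.user_id = "gward"
addr = Address()
addr.city = "Reston"
addr.zip = 20191            # wrong type on purpose
user.address = addr
print(check_graph(user))    # ['Address.zip: expected str, got int']
```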

The main reason Grouch doesn't step in at run-time is that I'm afraid
of the performance hit.  The implementation right now concentrates on
correctness and completeness, with efficiency to come later.  I don't
even have performance figures at hand, although the MEMS Exchange
database (140,000 objects in a 45 MB ZODB FileStorage) is a pretty good
test case.

Ideas for run-time:
  * invoke Grouch in __getstate__(), so your objects are checked before
    they're actually written
  * invoke Grouch in __setattr__(), so every attribute is checked at
    assignment time (this is the one that really scares me, performance-
    wise)
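For the __setattr__() idea, a minimal sketch with an ordinary, non-persistent base class might look like this. The _schema class attribute and the Checked and Order names are hypothetical; every assignment pays for a dictionary lookup plus an isinstance() call, which is exactly the per-assignment overhead the performance worry is about.

```python
class Checked:
    """Base class that type-checks declared attributes on every assignment.

    Subclasses list expected types in a (hypothetical) _schema dict mapping
    attribute names to a single class; undeclared attributes pass unchecked.
    """
    _schema = {}

    def __setattr__(self, name, value):
        expected = self._schema.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(
                f"{type(self).__name__}.{name}: expected "
                f"{expected.__name__}, got {type(value).__name__}")
        super().__setattr__(name, value)


class Order(Checked):
    _schema = {"quantity": int, "price": float}


order = Order()
order.quantity = 3          # accepted
try:
    order.price = "cheap"   # rejected before the attribute is stored
except TypeError as exc:
    print(exc)              # Order.price: expected float, got str
```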

The __getstate__() hook ought to be doable if you have a common base
class for all your database classes.  The __setattr__() hook would be
scary for ZODB apps right now; probably best to wait until Python 2.2 is
out and Persistent has been rewritten as a meta-class.  (Or whatever is
going to happen to Persistent.)
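The __getstate__() variant can be sketched as a common base class that validates an object's state at the moment it is pickled, so a bad object never gets written. This uses the standard pickle module rather than ZODB, and the Validated base class, its _schema attribute and the Contact example are assumptions for illustration.

```python
import pickle


class Validated:
    """Hypothetical common base class that checks state before pickling.

    Subclasses map attribute names to expected classes in _schema.
    """
    _schema = {}

    def __getstate__(self):
        state = self.__dict__.copy()
        for name, expected in self._schema.items():
            if name not in state:
                raise ValueError(f"{type(self).__name__}.{name}: missing")
            if not isinstance(state[name], expected):
                raise TypeError(
                    f"{type(self).__name__}.{name}: expected "
                    f"{expected.__name__}, got {type(state[name]).__name__}")
        # Returning a plain dict lets pickle restore it into __dict__ on load.
        return state


class Contact(Validated):
    _schema = {"email": str}


contact = Contact()
contact.email = "someone@example.com"
restored = pickle.loads(pickle.dumps(contact))   # passes the check
```

Checking only at write time keeps the cost to one pass per stored object instead of one per assignment.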

> > In the past few weeks, I finally got around to writing the scripts and
> > documentation necessary to release Oscar publicly.  Now I'm ready to do
> > so, pending approval by the CNRI brass (sigh).  There's nothing
> 
> How's it going?

Pretty good, actually.  The actual idea of releasing the code went down
pretty well; nailing down a license took a bit longer.  It's basically
the same as the Quixote 0.3 license; the main problem is that --
according to the FSF -- it's not GPL-compatible.  To be honest, I'm not
entirely certain what this means, but I don't think it matters as much
for Python applications/libraries as it does for Python itself.

> Ok, assuming this will be released, can I go ahead and docstring my
> classes and assume I'm going to be able to do the runtime checking when
> it's available? I had devised a complete typesystem based on dictionaries,
> but if yours is ready and tested, I'll chuck mine.

Give it a whirl -- pre-release tarball is at
http://starship.python.net/~gward/Grouch-0.1.tar.gz.  Shh!  Don't tell
anyone I mentioned this.  It's *not* the final release.  There will be
another Grouch-0.1.tar.gz with possible differences in a few days.

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org


From kiko@async.com.br  Mon Aug 27 16:15:54 2001
From: kiko@async.com.br (Christian Robottom Reis)
Date: Mon, 27 Aug 2001 12:15:54 -0300 (BRT)
Subject: [Types-sig] Re: [ZODB-Dev] Pre-announce: Oscar 0.1
In-Reply-To: <20010821192419.C10598@mems-exchange.org>
Message-ID: <Pine.LNX.4.32.0108271205510.157-100000@blackjesus.async.com.br>

On Tue, 21 Aug 2001, Greg Ward wrote:

> First of all, let me modify the announcement a bit.  There are too many
> software packages out there already called "Oscar" -- so Oscar is dead,
> long live the Grouch!  (Does anyone who grew up outside of North America
> get it?  Oh well...)

Ewww. This looks awfully cool. Congrats, Greg, this works nicely. :)

> When/if a new schema language becomes part of Grouch, it will be offered
> as a complement to schema extraction from docstrings.

This would be an XML schema thing like we have for GNOME's GAL and
libglade?

> Yes, although it's a tad clunky right now.  Eg. say I have a schema in
> schema.pkl, as created by the gen_schema script:
[... include general clunkiness ...]
> ...and around we go.  You get the picture.  The documentation for this
> API is all in the code.

Hmm. It is a bit clunky (the open database part definitely is!). Okay, so
now let me ask you this: Don't _you_ do runtime typechecking of anything?
How do you prevent 3vil things going into the database if no runtime
checking is done? Oh, this _is_ how you do runtime checking?

> > Oh, this _is_ a runtime typecheck? :)
>
> Not currently.  Right now, we do a type-checking pass nightly on our
> database.  So far, it mostly finds documentation errors -- i.e., it's
> mainly peace of mind, rather than something that regularly finds bugs.

Okay, so if it only finds doc errors, how do you ensure that user-input
data is of the correct type? One thing that bothers me is that, for
instance, GTK entries always give me back strings, which I have to "cast"
into a float/int/DateTime/whatever.

Apart from having in-field validation (using GtkEntry's *_text callbacks),
I do validation in the domain classes.  However, doing this with
custom-written code is very ugly, because everybody has to check the types
inside each handler themselves (with type(foo) == type(0.0) statements).
Grouch's runtime system could do this very transparently, AFAICS.
Agreed?
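As a rough illustration of schema-driven conversion replacing hand-written type(foo) == type(0.0) checks, the sketch below maps expected types to converter functions and applies them to the strings a text entry returns. It uses the standard datetime module for the DateTime case and assumes a "YYYY-MM-DD" date format; CONVERTERS, convert_entry and apply_form are hypothetical names, not part of Grouch or GTK.

```python
from datetime import datetime

# Hypothetical converters keyed by expected type. The date format is an
# assumption; adjust it to whatever the entry widget actually accepts.
CONVERTERS = {
    str: str,
    int: int,
    float: float,
    datetime: lambda text: datetime.strptime(text, "%Y-%m-%d"),
}


def convert_entry(text, expected):
    """Turn the string from a text entry into a value of the expected type."""
    converter = CONVERTERS.get(expected)
    if converter is None:
        raise TypeError(f"no converter registered for {expected.__name__}")
    try:
        return converter(text.strip())
    except ValueError as exc:
        raise ValueError(
            f"cannot convert {text!r} to {expected.__name__}") from exc


def apply_form(obj, schema, entries):
    """Convert each entry string per schema and set it on the domain object."""
    for name, text in entries.items():
        setattr(obj, name, convert_entry(text, schema[name]))


class Item:
    pass


item = Item()
apply_form(item, {"qty": int, "price": float}, {"qty": " 4 ", "price": "9.99"})
print(item.qty, item.price)  # 4 9.99
```

The types live in one schema instead of being repeated in every handler; a checker like the __setattr__() hook could then verify the converted values.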

>   * invoke Grouch in __setattr__(), so every attribute is checked at
>     assignment time (this is the one that really scares me, performance-
>     wise)

Yeah, but for user-input data going into persistent domain classes, it's a
must.

> Give it a whirl -- pre-release tarball is at
> http://starship.python.net/~gward/Grouch-0.1.tar.gz.  Shh!  Don't tell
> anyone I mentioned this.  It's *not* the final release.  There will be
> another Grouch-0.1.tar.gz with possible differences in a few days.

Vere is da stone? :)

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 272 3330 | NMFL