From Michael@RCP.co.uk Wed Aug 1 09:22:01 2001 From: Michael@RCP.co.uk (Michael Abbott) Date: Wed, 1 Aug 2001 09:22:01 +0100 Subject: [Types-sig] Query about Types SIG status Message-ID: <217F6DFA440ED111ACDA00A0C906B00601AAE6BC@arsenic.rcp.co.uk> I'd be very grateful if someone could post a summary of the current status of the Python Types SIG. I'm a little concerned: 1. The only mail I've seen since joining this list has been SPAM! 2. Some of the links on the SIG home page seem to be broken (of course the recent outage of python.org doesn't help, but that's not what I'm referring to). 3. There doesn't seem to be much in the way of recent and current proposals, as far as I can see. There seem to be a variety of documents in varying stages of maturity, but it's difficult to see what the current state of thinking is. There's a document from Guido van Rossum with some early ideas, an unnumbered PEP from Paul Prescod on an interface declaration language, and PEP-0245 by Michel Pelletier, plus a number of other papers. However (it's difficult to tell), most of these seem to be quite elderly! Clearly the ideas of interfaces and of static types are distinct but closely related developments. Is this an area of active development, or is the current consensus that it's not worth the effort? From gward@mems-exchange.org Mon Aug 20 22:01:38 2001 From: gward@mems-exchange.org (Greg Ward) Date: Mon, 20 Aug 2001 17:01:38 -0400 Subject: [Types-sig] Pre-announce: Oscar 0.1 Message-ID: <20010820170138.A8954@mems-exchange.org> --6TrnltStXW4iwmi0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi all -- several months ago, I cooked up a tool, Oscar, for rigorously type-checking a Python object graph: you define an object schema (currently through specially-formatted class docstrings), and Oscar crawls a persistent object graph to ensure that every scrap of data in it conforms to your schema.
We use this regularly in the MEMS Exchange for integrity-checking our ZODB database; it's not the be-all-end-all to checking that all is well with an object database, but it's a hell of a lot better than nothing. In the past few weeks, I finally got around to writing the scripts and documentation necessary to release Oscar publicly. Now I'm ready to do so, pending approval by the CNRI brass (sigh). There's nothing available for download just yet, so no chest-thumping post to python-announce. But there is documentation describing the Oscar type language, which I think is a fine way to describe Python data types. So, on the assumption that types-sig and zodb-dev readers are more likely than most to want to rush out and try Oscar as soon as it's available, I'm posting all that documentation right here. I welcome feedback as to whether this is a crazy idea or not, whether the type syntax is bogus or excellent, whether the type-system is "good enough" or needs to be all-encompassing, etc. Attached you'll find: type-system.txt: a description of Oscar's type system and the syntax for defining Oscar types; schema.txt: a description of what an object schema consists of and how you define one; checking.txt: how to use Oscar to type-check an existing persistent object graph. Enjoy! Hopefully the real release will happen this week or next. Greg -- Greg Ward - software developer gward@mems-exchange.org MEMS Exchange http://www.mems-exchange.org --6TrnltStXW4iwmi0 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="type-system.txt" Oscar's type system ------------------- Oscar's type system is a large, useful subset of Python's type system. The major advantages of Oscar's type system are that it is explicit and enforced. Since Python types are implicit (determined at run-time) and mostly unenforced, Oscar sits quite neatly on top of Python, bringing order and structure to a potentially chaotic situation.
Oscar understands the following major classes of data types: * atomic types: anything with a distinct Python type object can be an atomic type in Oscar, but they're intended for types with a single, atomic value. The built-in types int, string, and float are obvious candidates (and in fact these are present as atomic types by default in any Oscar schema, along with long and complex). You can use other built-in types (e.g. file, function) as atomic types, or any extension type. For example, if you use the mx.DateTime module, you might add DateTime as an atomic type, so you can declare variables as being of type DateTime and have Oscar enforce that requirement. Examples: "string" denotes a string variable "int" denotes an integer variable "DateTime" denotes a DateTime variable; this only works if you have explicitly added an atomic type called "DateTime" to your schema * container types: Python's built-in list, dictionary, and tuple types. (Classes that act like lists, dictionaries, and tuples are "instance-container" types, and I haven't yet decided what to do about the type-class unification in Python 2.2.) Oscar enforces fairly stringent rules for container types: - lists must be homogeneous, i.e. all elements of the same type, and may be of any length Examples: "[string]" denotes a list of strings "[int|long]" denotes a list of either ints or longs (a union type; see below) "[any]" denotes a list of anything (i.e., no enforcement) (see below for "any" types) - dictionaries must be separately homogeneous: all keys must be of the same type, and all values must be of the same type. (Incidentally, Oscar knows nothing about which types are hashable and allowed to be dictionary keys; that's enforced by Python at run-time.) The key type and value type are specified separately.
Examples: "{ string : int }" denotes a dictionary mapping strings to ints "{string : int|long}" denotes a dictionary mapping strings to either ints or longs "{long : [string]}" denotes a dictionary mapping longs to lists of strings - tuples are heterogeneous (mixed-type) but fixed in size, and each "slot" is fixed in type. Examples: "(int,)" denotes a tuple containing exactly one integer "(string, string)" denotes a pair of strings "([int|long], string, int)" denotes a triple: list of (int or long), string, int Tuple types have one exception to this rule: if a tuple type is "extended", then the rules change for its last slot: for example, the extended tuple type "(string, int*)" (note the "*") denotes a tuple with exactly one string followed by zero or more ints. The following are all valid values of this type: ("foo", 3) ("foo", 3, 1) ("foo", 2, 5, 1, 6, 2, 1, 4, 5, 1, 15, 6, 2, 5) ("foo",) This is mainly used for tuples that act like lists, e.g. if you want a list of strings to be usable as a dictionary key, you code it as a tuple of strings instead (lists aren't hashable). This practice is incompatible with Oscar's basic tuple definition, so extended tuples are provided as an escape. Note that "of the same type" refers to Oscar types, not Python types. For example, if a variable is declared "[int|long]", each element is checked separately to make sure it is either an int or a long; [1, 2L, 3] is a valid value of the type "[int|long]". (Again, union types are described below.) * instance types: used for class instances. A class Foo defined in the module foo.bar has an associated instance type "foo.bar.Foo". Generally, it's not enough to say that a variable is of type "foo.bar.Foo"; you also want to specify the instance attributes of Foo (and their types!). Each instance type has an associated class definition that stores this information. This is where Oscar's real power shines through, because typically Python data is accessed via an instance of some class.
If your schema has a class definition for that "root class", and for the class of each object reachable from the root, Oscar will crawl your entire object graph, ensuring that every instance, every attribute of every instance, and every element of every container anywhere in that object graph is of the correct type. The essential ingredient of a class definition is its attribute list. This is described in schema.txt, under "Defining an object schema: class docstrings". Examples: "FooBar" denotes an instance of class FooBar defined in the main program "thing.Thing" denotes an instance of class Thing defined in module thing * instance-container types: Python classes often implement the semantics of lists, tuples, or dictionaries. You don't want to give up type-checking every attribute of instances of such classes, but you also want to make sure that they conform to the strict type-checking rules Oscar applies to containers. Hence, instance-container types marry the two. Examples: "UserList.UserList [string]" denotes an instance of the UserList class, defined in the UserList module, that acts like a list of strings "MyDict { string : int|long }" denotes an instance of the MyDict class that acts like a dictionary mapping strings to either ints or longs * union types: any set of Oscar types may be combined to form a union type. A candidate value is tested against each sub-type of the union type, and only rejected if all of the sub-types reject it. Examples: "int | long" denotes a value that may be either an int or a long "string | [string] | (string, string)" denotes a value that may be either a string, a list of strings, or a pair (tuple) of strings * wildcard type: used for variables that can hold any value. There is only one wildcard type, spelled "any". * boolean type: used for boolean (true/false) values. Strictly speaking, any Python value can be interpreted in a boolean way: e.g. 0, 0L, 0.0, "", and None are all false values, while 42, 3.14159, and "foo!" are all true.
Oscar restricts this drastically: the only allowed values for boolean variables are 0, 1, and None. * alias types: used to define shorthand names for commonly-used types. The most common use of this is to alias the bare name of a class to its fully-qualified name -- e.g. if class Thing is defined in module project.util, then "Thing" might be an alias for "project.util.Thing". ("project.util.Thing" is the instance type, and "Thing" is an alias type that expands to that instance type.) Aliases are also useful if you have a particular union type used frequently; instead of always spelling out "int | float | long", you can define "number" as an alias for this union type. (This also makes it easy to change your definition of "number" if someday you have to extend it to handle, say, complex or rational numbers.) Type grammar ------------ [taken from the type_parser.py module]

    type           : NAME                    # atomic, alias, instance, boolean, any
                   | container_type          # list, tuple, dictionary
                   | NAME container_type     # instance-container type
                   | union_type
    container_type : list_type
                   | tuple_type
                   | dictionary_type
    list_type      : "[" type "]"
    tuple_type     : "(" (type ",")* type "*"? ","? ")"
    dictionary_type: "{" type ":" type "}"
    union_type     : type ("|" type)+

Tokens:

    NAME : [a-zA-Z_][a-zA-Z0-9_]*(\.[a-zA-Z_][a-zA-Z0-9_]*)*

$Id: type-system.txt,v 1.1 2001/08/20 18:10:09 gward Exp $ --6TrnltStXW4iwmi0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: attachment; filename="schema.txt" Content-Transfer-Encoding: 8bit Object schemata --------------- An object schema consists of the following components: * a set of atomic types, usually a subset of Python's builtin types. The default atomic types are string, int, long, float, and complex. In principle, you can add other builtin types (like function, class, or file) or extension types to a schema, but Oscar currently has problems with many builtin types. (In particular, only types whose values can be pickled may be atomic types in Oscar.)
* a type alias mapping, letting you define shorthand names for common types. * a set of class definitions. A class definition maps instance attribute names to attribute types. This serves two purposes: it defines the expected set of attributes for instances of a class, and it defines the type of each attribute. In the current version of Oscar, an object schema is defined through a project description file and the class docstrings in a set of source files. This is useful in practice, but it's kind of hard to talk about object schemata without a simple, compact schema description language. Thus, consider the following pseudo-schema:

    class Thing:
      name : string

    class Animal (Thing):
      num_legs : int
      furry : boolean

(Coincidentally, this is the syntax emitted by gen_schema's "-t" option. However, this is currently a write-only language; Oscar has no way to parse schemata created by "gen_schema -t".) This defines an object schema with no additional atomic types (just the default five: string, int, long, float, and complex), no aliases, and two classes (both, presumably, in the __main__ module, since the class names are unqualified). If you ask Oscar to type-check an instance of Thing under this schema, or if it comes across a Thing instance in the course of type-checking a larger object graph, it does the following: * ensure that the instance has exactly one attribute, 'name' * ensure that the value of this attribute is a string Similarly, Oscar type-checks an Animal instance under this schema as follows: * ensure that it has exactly three attributes, 'name', 'num_legs', and 'furry' (note that 'name' is inherited from Thing) * ensure that the value of 'name' is a string, 'num_legs' an int, and 'furry' a boolean (i.e. 0, 1, or None) Defining an object schema: class docstrings ------------------------------------------- Currently, you define an object schema by writing specially-formatted class docstrings. (There is no separate schema description language... yet.)
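The type rules from type-system.txt, together with the Thing/Animal checks just described, can be sketched in plain Python. This is a minimal illustration, not Oscar's implementation, and it rests on several assumptions: type expressions are hand-built nested tuples (a hypothetical encoding; Oscar's string syntax is not parsed here), "long" is dropped because Python 3 has no separate long type, alias and instance-container types are left out, and classes are matched by bare name instead of their fully-qualified "module.Class" name.

```python
# Sketch only: type expressions are nested tuples, a hypothetical encoding
# chosen for illustration (not Oscar's internal type classes).
ANY = ("any",)
BOOLEAN = ("boolean",)


def atomic(pytype):
    return ("atomic", pytype)


def check(value, t, classes):
    """Return True if value conforms to the Oscar-style type t."""
    kind = t[0]
    if kind == "any":
        return True
    if kind == "boolean":
        # Only 0, 1 and None are accepted, not arbitrary true/false values.
        return value is None or (type(value) is int and value in (0, 1))
    if kind == "atomic":
        return type(value) is t[1]
    if kind == "union":
        # Rejected only if every member type rejects the value.
        return any(check(value, sub, classes) for sub in t[1])
    if kind == "list":
        # Homogeneous: every element must match the one element type.
        return type(value) is list and all(check(v, t[1], classes) for v in value)
    if kind == "dict":
        # Separately homogeneous: one key type, one value type.
        return (type(value) is dict
                and all(check(k, t[1], classes) for k in value)
                and all(check(v, t[2], classes) for v in value.values()))
    if kind == "tuple":
        slots, extended = t[1], t[2]
        if type(value) is not tuple:
            return False
        if extended:
            # Extended tuple: the last slot may repeat zero or more times.
            fixed, rest = slots[:-1], slots[-1]
            return (len(value) >= len(fixed)
                    and all(check(v, s, classes) for v, s in zip(value, fixed))
                    and all(check(v, rest, classes) for v in value[len(fixed):]))
        # Plain tuple: fixed length, each slot fixed in type.
        return (len(value) == len(slots)
                and all(check(v, s, classes) for v, s in zip(value, slots)))
    if kind == "instance":
        return check_instance(value, t[1], classes)
    raise ValueError("unknown type kind: %r" % (kind,))


def declared_attrs(name, classes):
    """Declared attribute types of a class, including inherited ones."""
    base, attrs = classes[name]
    merged = declared_attrs(base, classes) if base else {}
    merged.update(attrs)
    return merged


def check_instance(obj, name, classes):
    # Simplification: compare bare class names, not "module.Class" names.
    if type(obj).__name__ != name:
        return False
    expected = declared_attrs(name, classes)
    actual = vars(obj)
    # The instance must have exactly the declared attributes.
    if set(actual) != set(expected):
        return False
    return all(check(actual[a], expected[a], classes) for a in expected)


# Usage with the Thing/Animal pseudo-schema; classes map to (base, attrs).
STRING, INT = atomic(str), atomic(int)
CLASSES = {
    "Thing": (None, {"name": STRING}),
    "Animal": ("Thing", {"num_legs": INT, "furry": BOOLEAN}),
}


class Animal:
    def __init__(self, name, num_legs, furry):
        self.name = name
        self.num_legs = num_legs
        self.furry = furry


print(check_instance(Animal("cat", 4, 1), "Animal", CLASSES))           # True
print(check_instance(Animal("T. rex", "2 big", 0), "Animal", CLASSES))  # False
```

Keeping every kind of type as a branch of one recursive function mirrors how the type language nests: a list of unions of tuples is checked by the same function all the way down.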
For example, the Thing class in the above pseudo-schema might be documented as:

    class Thing:
        """A single thing, which may be an animal, vegetable, or mineral.
        The only property common to all things is a name.

        Instance attributes:
          name : string
            the name of the thing
        """

Oscar (specifically, the gen_schema script that parses these docstrings) ignores everything in the docstring up to the "Instance attributes:" line. After that, things get fairly rigid:

  * the "Instance attributes:" line must be indented to the same depth as the main body of the docstring
  * each attribute name is indented two spaces relative to that, and followed by a colon (":") and the attribute's type
  * attribute descriptions (which are optional, and are ignored by Oscar) are indented a further two spaces
  * when indentation returns to the same level as the "Instance attributes:" line, Oscar stops processing the docstring and goes on to the next class in the module (thus, blank lines are allowed in the attribute list)

Here is a slightly more elaborate example:

    class Animal (Thing):
        """An animal, ie. a thing with multiple legs and possibly fur.

        Instance attributes:
          num_legs : int
            the number of legs this animal has
          furry : boolean
            whether this animal is furry or not

        Outsiders should use 'get_num_legs()' and 'is_furry()' to access
        these attributes.
        """

Here is a stripped-down version of this docstring that is exactly equivalent as far as Oscar is concerned:

    class Animal (Thing):
        """
        Instance attributes:
          num_legs : int
          furry : boolean
        """

Sometimes a class will have no instance attributes of its own; Oscar has special syntax for this:

    class Mammal (Animal):
        """Instance attributes: none"""

This is different from simply omitting the list of instance attributes, or omitting the docstring entirely. If Oscar sees a Mammal instance with any attributes apart from those inherited from Animal, it will complain.
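The docstring rules above map naturally onto the standard library's inspect.cleandoc(), which strips the common indentation Python leaves in docstrings. The function below is a hypothetical sketch, not gen_schema's actual parser; it assumes the "Instance attributes:" line sits at the docstring's body indentation, as in the examples, and it returns None when there is no attribute list, so a caller can tell that case apart from "Instance attributes: none".

```python
import inspect

HEADER = "Instance attributes:"


def parse_instance_attributes(docstring):
    """Return {attribute_name: type_string} from an Oscar-style docstring,
    {} for "Instance attributes: none", or None if there is no list."""
    if not docstring:
        return None
    lines = inspect.cleandoc(docstring).splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(HEADER):
            header_indent = len(line) - len(line.lstrip())
            if stripped[len(HEADER):].strip() == "none":
                return {}  # explicitly no attributes of its own
            break
    else:
        return None  # no attribute list at all

    attrs = {}
    for line in lines[i + 1:]:
        if not line.strip():
            continue  # blank lines are allowed inside the attribute list
        indent = len(line) - len(line.lstrip())
        if indent <= header_indent:
            break  # back at the header's level: the list is over
        if indent == header_indent + 2:
            # "name : type"; split at the first colon only, because dict
            # types such as "{ string : int }" contain colons too.
            name, sep, type_str = line.strip().partition(":")
            if not sep:
                raise ValueError("expected 'name : type', got %r" % line.strip())
            attrs[name.strip()] = type_str.strip()
        # Deeper lines are free-form descriptions, which are ignored.
    return attrs


class Animal:
    """An animal, ie. a thing with multiple legs and possibly fur.

    Instance attributes:
      num_legs : int
        the number of legs this animal has
      furry : boolean
        whether this animal is furry or not

    Outsiders should use 'get_num_legs()' and 'is_furry()' to access
    these attributes.
    """


class Mammal(Animal):
    """Instance attributes: none"""


print(parse_instance_attributes(Animal.__doc__))  # {'num_legs': 'int', 'furry': 'boolean'}
print(parse_instance_attributes(Mammal.__doc__))  # {}
```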
However, if Mammal has no docstring or attribute list, Oscar can't do detailed type-checking of instances of that class. Instead, it: * complains that the class has no docstring (or no attribute list) * excludes the class from the schema * when type-checking an object graph, complains about any instances of that class it discovers Defining an object schema: the project description file ------------------------------------------------------- Writing class docstrings that document every instance attribute is the key part of defining an object schema. However, you still have to tell Oscar how to find those class docstrings and what to do with them. This is done with the gen_schema script and its project description file. [Searching by directory] At its simplest, the project description file contains a list of directories to search for Python source files, and possibly a prefix to use in turning source filenames into module names. For example, the project description file for Oscar itself (oscar.cfg in the top-level Oscar directory) starts out with this:

    dirs = ["."]
    prefix = "oscar"

(The project description file is just Python code; it's execfile'd by gen_schema.) This instructs gen_schema to search for *.py in the current directory, and to assume that all modules found actually live in the "oscar" package. Hence when it finds schema.py, it considers that module to be "oscar.schema", and a class ObjectSchema in that file will be called "oscar.schema.ObjectSchema". gen_schema does *not* search recursively; if you want it to descend into sub-directories, you must specify them explicitly:

    dirs = ["compiler", "compiler/parser", "compiler/optimizer"]

The directories in 'dirs' are interpreted relative to a base directory supplied with the "-d" (or --base-dir) option to gen_schema. If you run gen_schema from Oscar's top directory (i.e., where schema.py lives), everything is fine -- the current directory is the right place to look for Oscar's source files.
In that case,

    ./scripts/gen_schema -p oscar.cfg

is the right incantation. (The resulting schema will be written (as a pickle) to schema.pkl.) If you're in /home/greg and Oscar is in /tmp/oscar, though, the above incantation is wrong: Oscar will consider any *.py files in /home/greg to be part of the "oscar" package, and will scan them for docstrings to generate a schema. This probably won't work; you need to specify the base directory that 'dirs' is interpreted relative to:

    /tmp/oscar/scripts/gen_schema -p /tmp/oscar/oscar.cfg -d /tmp/oscar

(Obviously, it's easier just to run gen_schema from the right place!) [Specifying individual modules] If you don't want to search every "*.py" file in a list of directories, you can supply a list of explicit module names, e.g.:

    extra_modules = ["oscar.schema", "oscar.valuetype"]

Note that extra_modules is a list of fully-qualified module names, *not* filenames. This variable is called 'extra_modules' because these modules are added to the list of modules found by searching the directories named in 'dirs'. If 'dirs' isn't supplied, the modules in 'extra_modules' are Oscar's only source for class definitions. [Excluding individual modules] You can refine gen_schema's search for classes by excluding certain modules. As an example, Oscar includes a copy of SPARK (John Aycock's nifty parser framework) as the "oscar.spark" module; since this is really someone else's code, it doesn't have Oscar-style docstrings to parse. Also, the parser classes are transient and shouldn't wind up in any persistent store of an Oscar object graph, so there's not much point in type-checking them. Thus, I exclude both oscar.spark and oscar.type_parser (which provides classes derived from the SPARK classes) from gen_schema's scan:

    exclude_modules = ["oscar.spark", "oscar.type_parser"]

Like extra_modules, exclude_modules is a list of fully-qualified module names.
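Because the project description file is plain Python, loading it and applying the dirs/prefix/extra_modules/exclude_modules rules takes only a few lines. The sketch below is hypothetical (load_project() and find_modules() are not part of Oscar or gen_schema): it uses exec() since Python 3 has no execfile(), accepts a listdir argument so the search can be demonstrated without real files, and guesses how sub-directory names become package names, which the text above does not spell out.

```python
import os


def load_project(source, filename="<project>"):
    """Run a project description file as ordinary Python and return its
    top-level variables (Python 3 has no execfile(), so use exec())."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    return namespace


def find_modules(config, base_dir=".", listdir=os.listdir):
    """Hypothetical helper mirroring the rules described above: scan each
    entry of 'dirs' (non-recursively) for *.py files, add 'extra_modules',
    then drop anything named in 'exclude_modules'."""
    prefix = config.get("prefix")
    modules = []
    for d in config.get("dirs", []):
        for filename in sorted(listdir(os.path.join(base_dir, d))):
            if not filename.endswith(".py"):
                continue
            # Assumption: a sub-directory such as "compiler/parser" maps to
            # the dotted package path "compiler.parser".
            parts = [] if d in ("", ".") else d.strip("/").split("/")
            parts.append(filename[:-3])
            if prefix:
                parts.insert(0, prefix)
            modules.append(".".join(parts))
    for entry in config.get("extra_modules", []):
        # Also accept (module_name, filename) pairs, as used in things.proj.
        modules.append(entry[0] if isinstance(entry, tuple) else entry)
    excluded = set(config.get("exclude_modules", []))
    return [m for m in modules if m not in excluded]


# Usage: a fake directory listing stands in for the real file system.
config = load_project('''
dirs = ["."]
prefix = "oscar"
exclude_modules = ["oscar.spark", "oscar.type_parser"]
''')


def fake_listdir(path):
    return ["schema.py", "spark.py", "type_parser.py", "README"]


print(find_modules(config, listdir=fake_listdir))  # ['oscar.schema']
```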
[Excluding individual classes] You can also exclude specific classes from the search, instead of whole modules. This is useful if a particular module provides some transient classes and other first-class persistent classes. For example, I might wish to exclude the TypecheckContext class, defined in oscar.context, from schema generation:

    exclude_classes = ["oscar.context.TypecheckContext"]

Again, classes are specified as fully-qualified Python names. [Adding atomic types] If the five default atomic types aren't enough for your project, you'll have to add new ones. This might happen if you use extension types in your application, or if you store slightly odd objects in your persistent object graph, like functions or class objects. New atomic types are specified using an example value, not using the type object itself. (This is necessary because type objects can't be pickled, and gen_schema pickles the schema for future use. We can't store type objects in the pickled schema, so we store sample values instead.) For instance, to add Marc-André Lemburg's DateTime type to your schema, add this to your project description file:

    import mx.DateTime
    atomic_types = [mx.DateTime.now()]

The structure of 'atomic_types' is a tad complex. Most often, each element of the list is simply a value of the atomic type you want to add to your schema -- e.g. here I created a sample DateTime object. Since these sample values go straight into the object schema, which is subsequently pickled by gen_schema, these must be pickle-able values. Oscar probably needs to grow a real schema definition language before you can have, say, Python function or file objects as atomic types in an object schema. (In other words, I think this is an implementation problem due to reliance on pickling rather than a fundamental problem.) In this simple case, the name of the atomic type is implicit, because the type itself supplies its name -- "DateTime" in the above example. (Try "type(mx.DateTime.now()).__name__".)
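The sample-value trick can be illustrated with the standard library, using datetime as a stand-in for mx.DateTime (an assumption made so the example runs without mx installed). The hypothetical atomic_type_table() helper below pickles and unpickles only the sample values, then recovers each type object with type(); it also accepts the (sample_value, type_name) pairs used for custom names. The default names mirror Oscar's string/int/float/complex, minus long, which Python 3 no longer has.

```python
import datetime
import pickle

# Oscar-style default atomic types as (sample_value, name) pairs.
# "long" is omitted: Python 3 folds it into int.
DEFAULT_ATOMIC = [("", "string"), (0, "int"), (0.0, "float"), (0j, "complex")]


def atomic_type_table(atomic_types):
    """Map atomic type names to type objects.  Each entry is either a bare
    sample value (name taken from type(sample).__name__) or a
    (sample_value, type_name) pair."""
    table = {}
    for entry in DEFAULT_ATOMIC + list(atomic_types):
        if isinstance(entry, tuple) and len(entry) == 2 and isinstance(entry[1], str):
            sample, name = entry
        else:
            sample, name = entry, type(entry).__name__
        table[name] = type(sample)
    return table


# Usage: only the sample values are pickled; the type objects are
# recovered from them after unpickling.
atomic_types = [datetime.datetime.now(), (datetime.timedelta(0), "interval")]
restored = pickle.loads(pickle.dumps(atomic_types))
table = atomic_type_table(restored)
print(sorted(table))
# ['complex', 'datetime', 'float', 'int', 'interval', 'string']
```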
In some cases, though, you may want to specify your own name for an atomic type. In that case, just supply a tuple (sample_value, type_name) in atomic_types. This is useful if you're dealing with ExtensionClass, where every class is a new type. (This is also the case with classes derived from types in Python 2.2.) For instance, a ZODB application that needs "class" and "instance" types (for class objects and generic instance objects) might do this:

    import ZODB
    from Persistence import Persistent
    # ...
    atomic_types = [(Persistent(), "instance"), (Persistent, "class")]

If you don't understand why you might need this, you probably don't need it. Putting it all together ----------------------- For a simple example of defining an object schema, take a look in the "examples" sub-directory of Oscar's source distribution. There, you'll find: * the thing.py and animal.py modules, which provide the classes ThingCollection, Thing, Animal, and Mammal * the make_things script, which creates some things, bundles them in a collection, and pickles them to things.pkl * the things.proj project description file, which tells gen_schema how to generate a schema for this project For now, we're just going to generate a schema from the Python source files and things.proj. Later (in "checking.txt", the document that covers type-checking an object graph) we'll run make_things and type-check the results. If you haven't installed Oscar yet, you should either do so now or perpetrate your favourite kludge for ensuring that it's available through sys.path. (If you don't have a favourite kludge, just install it.) Run python -c 'import oscar' to make sure it worked -- if this command completes silently, all is well. Installing Oscar should also install the gen_schema and check_data scripts. I'll assume they're in your shell's PATH; you might have to adjust your PATH or the commands here accordingly. Before we run gen_schema, let's take a look at the ingredients of this project.
First, the project description file, things.proj, is quite simple:

    extra_modules = [("thing", "thing.py"), ("animal", "animal.py")]

There's no 'dirs' here, meaning gen_schema won't go searching for "*.py" anywhere. It just looks for the 'thing' module in thing.py, and the 'animal' module in animal.py. Since explicit source filenames are supplied, the 'thing' and 'animal' modules don't have to be in Python's path -- gen_schema simply parses the source files. Next, take a look at thing.py. You'll see that it defines two classes, Thing and ThingCollection, and that the instance attributes of each are fully documented. Similarly, animal.py provides the Animal and Mammal classes. Finally, let's run gen_schema. We'll save the schema for this project to things_schema.pkl and things_schema.osc -- the two files have the same content, but only the latter is human-readable. From the "examples" directory:

    gen_schema -p things.proj -o things_schema.pkl -t things_schema.osc

If you're really curious about what's going on here, add the "-v" option. The output of gen_schema (without "-v") should look like this:

    looking for classes...
    found 4 classes
    parsing class docstrings...
    writing object schema to things_schema.osc...
    pickling object schema to things_schema.pkl...

Take a look at things_schema.osc for a human-readable representation of the schema also saved in things_schema.pkl. Now that we have an object schema for this project, we can use it later to type-check a persistent object graph created by applications that use this project, such as make_things. This will be done in the next document, "checking.txt".
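Since gen_schema parses the source files instead of importing them, the thing and animal modules need not be importable. A rough way to do the same with the standard library is the ast module, shown below as a sketch; classes_in_source() and scan_project() are hypothetical helpers, they look only at top-level classes, and ast.get_docstring() returns each docstring with its indentation already cleaned.

```python
import ast


def classes_in_source(module_name, source):
    """Return {"module.Class": docstring} for the top-level classes in a
    source string, found by parsing it rather than importing it."""
    found = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.ClassDef):
            # get_docstring() cleans indentation, like inspect.cleandoc().
            found["%s.%s" % (module_name, node.name)] = ast.get_docstring(node)
    return found


def scan_project(extra_modules):
    """Scan (module_name, filename) pairs like those in things.proj."""
    found = {}
    for module_name, filename in extra_modules:
        with open(filename) as f:
            found.update(classes_in_source(module_name, f.read()))
    return found


# Usage: the 'thing' import is never executed, so it need not exist.
ANIMAL_SOURCE = '''
from thing import Thing

class Animal(Thing):
    """An animal.

    Instance attributes:
      num_legs : int
      furry : boolean
    """

class Mammal(Animal):
    """Instance attributes: none"""
'''
found = classes_in_source("animal", ANIMAL_SOURCE)
print(sorted(found))  # ['animal.Animal', 'animal.Mammal']
```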
$Id: schema.txt,v 1.5 2001/08/20 18:32:31 gward Exp $ --6TrnltStXW4iwmi0 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="checking.txt" Type-checking an object graph ----------------------------- Once you've gone to the trouble of defining your object schema by documenting your instance attributes, creating a project description file, and running gen_schema, it's quite simple to ensure that a collection of objects conforms to your schema. I'm going to assume you've followed the steps at the end of the "schema.txt" document, and created a schema for the "things" project in things_schema.pkl. Once you have a schema, you can type-check a saved object graph. First, we have to create an object graph using the make_things application. This requires a bit more setup than running gen_schema, since now we have to be able to import the 'thing' and 'animal' modules. The easiest way to do that is to switch into the "examples" subdirectory and force Python to search it for modules:

    cd examples
    export PYTHONPATH=.

(That last line is for Bourne-like shells under Unix. For csh-like shells, say "setenv PYTHONPATH .". For other operating systems, you're on your own.) Now run make_things:

    python make_things

You should now see the file "things.pkl" in the current directory. Note that make_things has a (deliberate) type error in it, which means the object graph captured in things.pkl is inconsistent with the object schema in things_schema.pkl. We're going to use the check_data script to find the type error in the data. (See if you can spot the error by reading the make_things script. It would be easy to modify the underlying class -- Animal, in this case -- to catch this particular error. However, there's no limit to the number of type errors you can make in Python code, and hardening your underlying classes to catch every single one of them is a lot of work. Thus, Oscar exists to catch such errors after the fact.
That is, Oscar doesn't tell you immediately when a type error is made -- it only detects it in data that has been saved to a persistent store. I suspect the Oscar machinery could be used for run-time type-checking, but that would probably impose a pretty severe performance penalty, so I haven't experimented in that direction... yet.) Obviously, in order to load your pickled object graph, Oscar needs to be able to import the 'thing' and 'animal' modules. Since you already ran make_things, that precondition is satisfied, so we can go ahead and run check_data:

    check_data -f pickle things_schema.pkl things.pkl

The "-f" option tells check_data what format your object graph is stored in. (Currently, the only other option is "zodb", which is further modified by the "-s" (storage) option.) The first filename, things_schema.pkl, is the file containing the object schema for this project. This is *always* a pickle, regardless of "-f". The second argument is the location of the data to be checked -- in this case, the object graph created by make_things. The one type error in the make_things script corresponds to one type error in things.pkl, reported by check_data as:

    root.things['Tyrannosaurus rex'].num_legs: expected int, got string ('2 big, 2 small')

If you were unable to track down the error by reading the code earlier, this error message should help a lot. ;-) --6TrnltStXW4iwmi0-- From kiko@async.com.br Tue Aug 21 20:35:38 2001 From: kiko@async.com.br (Christian Robottom Reis) Date: Tue, 21 Aug 2001 16:35:38 -0300 (BRT) Subject: [Types-sig] Re: [ZODB-Dev] Pre-announce: Oscar 0.1 In-Reply-To: <20010820170138.A8954@mems-exchange.org> Message-ID: On Mon, 20 Aug 2001, Greg Ward wrote: > several months ago, I cooked up a tool, Oscar for rigorously > type-checking a Python object graph: you define an object schema > (currently through specially-formatted class docstrings), and Oscar Currently meaning this will change predictably?
Let me just say this is quite nice, and I'll have to implement something like this in our Domain classes _soon_. My question is: Does the typesystem offer any introspection? I.E., can I in runtime discover the attributes registered for my class, and what types they are? I need this for type-checking when sorting columns in my UI framework, so that would come in handy. Oh, this _is_ a runtime typecheck? :) > In the past few weeks, I finally got around to writing the scripts and > documentation necessary to release Oscar publicly. Now I'm ready to do > so, pending approval by the CNRI brass (sigh). There's nothing How's it going? Ok, assuming this will be released, can I go ahead and docstring my classes and assume I'm going to be able to do the runtime checking when it's available? I had devised a complete typesystem based on dictionaries, but if yours is ready and tested, I'll chuck mine. Take care, -- Christian Reis, Senior Engineer, Async Open Source, Brazil. http://async.com.br/~kiko/ | [+55 16] 272 3330 | NMFL From gward@mems-exchange.org Wed Aug 22 00:24:19 2001 From: gward@mems-exchange.org (Greg Ward) Date: Tue, 21 Aug 2001 19:24:19 -0400 Subject: [Types-sig] Re: [ZODB-Dev] Pre-announce: Oscar 0.1 In-Reply-To: ; from kiko@async.com.br on Tue, Aug 21, 2001 at 04:35:38PM -0300 References: <20010820170138.A8954@mems-exchange.org> Message-ID: <20010821192419.C10598@mems-exchange.org> On Mon, 20 Aug 2001, I wrote: > several months ago, I cooked up a tool, Oscar for rigorously > type-checking a Python object graph: you define an object schema > (currently through specially-formatted class docstrings), and Oscar First of all, let me modify the announcement a bit. There are too many software packages out there already called "Oscar" -- so Oscar is dead, long live the Grouch! (Does anyone who grew up outside of North America get it? Oh well...) On 21 August 2001, Christian Robottom Reis said: > Currently meaning this will change predictably? 
Let me just say this is > quite nice, and I'll have to implement something like this in our Domain > classes _soon_. My question is: As far as I am concerned, Grouch will *always* support extracting an object schema from docstrings. The MEMS Exchange has ~100 classes in 20,000 lines of code that use Grouch's docstring format for database type-checking, and another 100 classes that aren't in the object schema but use the same docstring format for clarity and consistency. When/if a new schema language becomes part of Grouch, it will be offered as a complement to schema extraction from docstrings. > Does the typesystem offer any introspection? I.E., can I in runtime > discover the attributes registered for my class, and what types they are? > I need this for type-checking when sorting columns in my UI framework, so > that would come in handy. Yes, although it's a tad clunky right now. E.g., say I have a schema in schema.pkl, as created by the gen_schema script:

    >>> from cPickle import *
    >>> schema = load(open("schema.pkl"))
    >>> cdef = schema.get_class_definition("mems.access.user.User")
    >>> cdef

OK, you want to know what attributes this class has?

    >>> cdef.attrs
    ['user_id', 'password_hash', 'prefix', 'first_name', 'last_name', 'suffix', 'address', 'email', 'phone', 'fax', 'timezone', 'allow_mailing', 'group_list', 'history']

You want to know what type various attributes are?
  >>> cdef.get_attribute_type('password_hash')
  >>> cdef.get_attribute_type('address')

Hmmm, the 'address' attribute is an alias type -- let's expand the alias
to see what it really is:

  >>> schema.get_alias('Address')

Digging up the class definition for this is more awkward than it needs
to be:

  >>> name = schema.get_alias('Address').get_class_name()
  >>> name
  'mems.lib.address.Address'
  >>> cdef2 = schema.get_class_definition(name)
  >>> cdef2

And now we can get the list of attributes in *this* class:

  >>> cdef2.attrs
  ['street1', 'street2', 'street3', 'city', 'state', 'zip', 'country_code']

...and around we go. You get the picture. The documentation for this
API is all in the code.

> Oh, this _is_ a runtime typecheck? :)

Not currently. Right now, we do a type-checking pass nightly on our
database. So far, it mostly finds documentation errors -- ie. it's
mainly peace-of-mind, rather than something that regularly finds bugs.

The main reason Grouch doesn't step in at run-time is because I'm afraid
of the performance hit. The implementation right now concentrates on
correctness and completeness, with efficiency to come later. I don't
even have performance figures at hand, although the MEMS Exchange
database (140,000 objects in a 45 MB ZODB FileStorage) is a pretty good
test case.

Ideas for run-time:

  * invoke Grouch in __getstate__(), so your objects are checked before
    they're actually written

  * invoke Grouch in __setattr__(), so every attribute is checked at
    assignment time (this is the one that really scares me,
    performance-wise)

The __getstate__() hook ought to be doable if you have a common base
class for all your database classes. The __setattr__() hook would be
scary for ZODB apps right now; probably best to wait until Python 2.2 is
out and Persistent has been rewritten as a meta-class. (Or whatever is
going to happen to Persistent.)

> > In the past few weeks, I finally got around to writing the scripts and
> > documentation necessary to release Oscar publicly.
> > Now I'm ready to do
> > so, pending approval by the CNRI brass (sigh). There's nothing
>
> How's it going?

Pretty good, actually. The actual idea of releasing the code went down
pretty well; nailing down a license took a bit longer. It's basically the
same as the Quixote 0.3 license; the main problem is that -- according to
the FSF -- it's not GPL-compatible. To be honest, I'm not entirely certain
what this means, but I don't think it matters as much for Python
applications/libraries as it does for Python itself.

> Ok, assuming this will be released, can I go ahead and docstring my
> classes and assume I'm going to be able to do the runtime checking when
> it's available? I had devised a complete typesystem based on dictionaries,
> but if yours is ready and tested, I'll chuck mine.

Give it a whirl -- pre-release tarball is at
http://starship.python.net/~gward/Grouch-0.1.tar.gz. Shh! Don't tell
anyone I mentioned this. It's *not* the final release. There will be
another Grouch-0.1.tar.gz with possible differences in a few days.

        Greg
--
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org

From kiko@async.com.br Mon Aug 27 16:15:54 2001
From: kiko@async.com.br (Christian Robottom Reis)
Date: Mon, 27 Aug 2001 12:15:54 -0300 (BRT)
Subject: [Types-sig] Re: [ZODB-Dev] Pre-announce: Oscar 0.1
In-Reply-To: <20010821192419.C10598@mems-exchange.org>
Message-ID:

On Tue, 21 Aug 2001, Greg Ward wrote:

> First of all, let me modify the announcement a bit. There are too many
> software packages out there already called "Oscar" -- so Oscar is dead,
> long live the Grouch! (Does anyone who grew up outside of North America
> get it? Oh well...)

Ewww. This looks awfully cool. Congrats, Greg, this works nicely. :)

> When/if a new schema language becomes part of Grouch, it will be offered
> as a complement to schema extraction from docstrings.

This would be an XML schema thing like we have for gnome's GAL and
libglade?
> Yes, although it's a tad clunky right now. Eg. say I have a schema in
> schema.pkl, as created by the gen_schema script:
[... include general clunkiness ...]
> ...and around we go. You get the picture. The documentation for this
> API is all in the code.

Hmm. It is a bit clunky (the open database part definitely is!). Okay,
so now let me ask you this: Don't _you_ do runtime typechecking of
anything? How do you prevent 3vil things going into the database if no
runtime checking is done? Oh, this _is_ how you do runtime checking?

> > Oh, this _is_ a runtime typecheck? :)
>
> Not currently. Right now, we do a type-checking pass nightly on our
> database. So far, it mostly finds documentation errors -- ie. it's
> mainly peace-of-mind, rather than something that regularly finds bugs.

Okay, so if it only finds doc errors, how do you make sure user-input
data is of the correct type? One thing that bothers me is that, f.i.,
GTK entries always give me back strings, which I have to "cast" into a
float/int/DateTime/whatever. Apart from having in-field validation
(using GtkEntry's *_text callbacks), I do validation in the domain
classes. However, this is very ugly using custom-written code because
everybody has to specify the types inside the handler themselves (with
some type(foo) == type(0.0) statements). This could be done using
Grouch's runtime system very transparently, AFAICS. Agreed?

> * invoke Grouch in __setattr__(), so every attribute is checked at
>   assignment time (this is the one that really scares me,
>   performance-wise)

Yeah, but for user-input data going into persistent domain classes, it's
a must.

> Give it a whirl -- pre-release tarball is at
> http://starship.python.net/~gward/Grouch-0.1.tar.gz. Shh! Don't tell
> anyone I mentioned this. It's *not* the final release. There will be
> another Grouch-0.1.tar.gz with possible differences in a few days.

Vere is da stone? :)

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 272 3330 | NMFL
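
Greg's two run-time ideas (checking an object in __getstate__() just before it is written, and checking every assignment in __setattr__()), together with Christian's need to turn GTK entry strings into typed values, can be sketched in plain modern Python. This is a minimal sketch, not Grouch's API: the Checked base class, its attr_types dictionary (standing in for Grouch's docstring schema), TypeCheckError, set_from_text() and the Product example class are all hypothetical names, and the base class derives from object rather than ZODB's Persistent.

```python
class TypeCheckError(TypeError):
    """Raised when an attribute value does not match its declared type."""


class Checked:
    """Hypothetical base class (not part of Grouch): subclasses declare
    attribute types in ``attr_types`` instead of in docstrings."""

    attr_types = {}  # attribute name -> expected Python type

    def _check(self, name, value):
        expected = self.attr_types.get(name)
        # Attributes without a declared type are not checked.
        if expected is not None and not isinstance(value, expected):
            raise TypeCheckError(
                "%s.%s: expected %s, got %s"
                % (type(self).__name__, name, expected.__name__,
                   type(value).__name__))

    def __setattr__(self, name, value):
        # Idea 2: check every assignment before storing it (the costly option).
        self._check(name, value)
        object.__setattr__(self, name, value)

    def __getstate__(self):
        # Idea 1: check the whole object when pickle (or a ZODB-style
        # storage) asks for its state, i.e. just before it is written.
        for name, value in self.__dict__.items():
            self._check(name, value)
        return self.__dict__.copy()

    def __setstate__(self, state):
        # Restore saved state without going through __setattr__.
        self.__dict__.update(state)

    def set_from_text(self, name, text):
        """Convert a string (e.g. from a GUI text entry) to the declared
        type, then assign it through the checked __setattr__."""
        expected = self.attr_types.get(name, str)
        setattr(self, name, expected(text))


# Usage: a hypothetical domain class.
class Product(Checked):
    attr_types = {"name": str, "price": float, "stock": int}


p = Product()
p.name = "widget"
p.set_from_text("price", "9.95")  # "9.95" -> 9.95
p.set_from_text("stock", "12")    # "12" -> 12
try:
    p.price = "cheap"             # rejected at assignment time
except TypeCheckError as exc:
    print(exc)                    # Product.price: expected float, got str
```

The per-assignment check adds at least a dictionary lookup and an isinstance() call to every attribute write, which is the overhead Greg is wary of; the __getstate__() check only runs when an object is serialized. Malformed input such as float("abc") still raises ValueError inside set_from_text(), so the caller has to handle that case.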