From john@viega.org Tue Mar 14 22:54:52 2000
From: john@viega.org (John Viega)
Date: Tue, 14 Mar 2000 17:54:52 -0500
Subject: [Types-sig] A late entry
Message-ID: <38CEC33C.6AC199A1@viega.org>

Whoops. For me, this has always been an area of interest, but I completely missed the fact that people were actually discussing stuff here now, until the DC PIGgies meeting last night. Here are my comments so far on what I've seen on the type sig. Unfortunately, I haven't read much of what's been said, so I'm sorry if some things have been discussed or are not appropriate. I've tried to at least read or skim the proposals on the list web page.

Here I'm going to mainly focus on Guido's proposal, since it seems to be one of the more recent proposals, incorporating select ideas from other proposals.

Let's start with some notes on terminology. I've seen the word "typesafe" thrown around quite a bit in a short period of time. Let's try to avoid using the word casually here, because it can lead to confusion. I find people often don't know what they're referring to when they use (or hear) the term. Does it mean that no type errors can ever happen at run time? At all? Or that they can only happen under specific conditions? If so, what are those conditions? Cardelli would probably say that "safe" means that the error won't cause a crash or go unnoticed at runtime (not everyone agrees with his definitions). By his definition, Python is already a type safe language. However, I often hear people use the term to mean static type safety... i.e., no type error will happen at run time. There's essentially no practical hope for that in an object oriented language like Python.

I think it might be nice for us to agree to share a common terminology. I know there's a bit of dissension w/ Cardelli's preferred terminology, but it at least provides us with a common reference in a field where terminology tends to be very muddled. I would recommend we all read Section 1 of Cardelli's "Type Systems" (it's about 9 pages). I have copies of this paper in ps and pdf:

http://www.list.org/~viega/cs655/TypeSystems.ps
http://www.list.org/~viega/cs655/TypeSystems.pdf

Oh, one thing that I should mention is that throughout this document I assume an inline syntax for type checking. I far prefer it.

Okay, next I want to talk about the way that "checked" and "unchecked" modules interact in Guido's proposal. He says that when a checked module imports and uses an unchecked module, all the objects in the unchecked module are assigned the type 'any' for the purposes of the checking. I am not 100% sure that's the best idea in the face of type inference. The checked module deserves type information that's as specific as possible, so that better types can be inferred for the checked program. Of course, in many situations a type inference engine isn't going to be able to infer much more than "any" when there are absolutely no type hints in the code. But if something can be done, why not?

The flip side of the coin is that after the type checking occurs, the unchecked code can exhibit bad behavior, breaking inferred invariants. For example, let's say some checked code calls foo.bar(), which is inferred to have the type integer*integer->integer. Someone can dynamically load a module from unchecked code which replaces foo.bar() with a function of type string->integer.

Options here? Run-time checks can be added at points where this is possible. I don't know if I support runtime checks if they can be avoided.
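To make the hazard concrete, here's the foo.bar scenario played out in plain Python; I'm faking the unchecked module with a throwaway module object, and all the names are made up:

import types                     # a throwaway module object stands in
foo = types.ModuleType("foo")    # for the unchecked module "foo"

def bar(x, y):
    return x * y                 # every use suggests integer*integer->integer
foo.bar = bar

print(foo.bar(6, 7))             # checked call site: fine, prints 42

foo.bar = lambda s: len(s)       # unchecked code rebinds it: string->integer

try:
    foo.bar(6, 7)                # the very call the checker blessed...
except TypeError:
    print("runtime type error: the inferred invariant no longer holds")

The static pass was perfectly sound when it ran; only a check at run time, at the rebinding or at the call site, can catch the breakage afterwards.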
Another option is to say that "guarantees" made by the type checker only hold if the whole program is checked. I'm comfortable with that (though I am still probably slightly in favor of the addition of runtime checks), but I suspect that others will not be. I have the same opinion when it comes to using checked modules from unchecked modules. Guido proposes adding runtime checks. I might prefer not to have them. Of course, one option is to support both options...

I also might disagree with always performing type checking at load time (when an unchecked module imports a checked module). I see new type checking features as a static convenience for people who want to use them. Are people going to want to pay a runtime hit when linking against checked code when they don't want to check code themselves? I don't think so.

Arguing against myself again, you can also make a case that this sort of dynamic checking is necessary to be able to live up to the promises a static checker makes when it "passes" a piece of code. Guido talks about dynamically importing a module and checking to see if the module matches the expected signature. That's certainly true... here's a really simple case: let's assume that A imports B, and that A+B were fully checked statically, but someone replaced B with C after checking and before runtime. Is it worth performing those extra checks? The static checks have done the best they could, and now, if there are gross incompatibilities, we're hopefully going to see an error dynamically whether we added extra checks or not (you can certainly construct cases where that's not true). I'd probably say it is worth performing those extra checks in some less frequent, highly dynamic situations, such as when you've got a completely checked application, then you import a module that was essentially typed in at stdin.

I think the right answer for Python depends on what the goals are. I'm looking for something that will help me find bugs statically that traditionally only cropped up at run time, when possible. Getting controversial, I don't know if I really care about supporting those dynamic checks that are only there to support cases in which the application is used in ways that subvert the assumptions I made when performing the type checking, such as what modules I'd be using in my application. It should be possible to turn all that stuff off, at the very least, and just let everything go while completely ignoring type information. Remember, dynamic features in the type system can end up slowing down a language...

Okay, going to the GCD example... Do we really need to use the "decl" keyword alone to indicate that a module has been checked? I'm fine with gcd(3.5,3) throwing a dynamic error if the module is not checked (I'd also be fine with it trying to run here, honestly). But should it try to compute the result if I remove the "decl" keyword from the file, but still specify types of the arguments to gcd??

Why does the word "decl" indicate the module is checked? Just because the programmer added the keyword doesn't mean the types he or she assigned are even internally consistent. Is the interpreter going to recheck on startup every single time? It seems like a lot of overhead to me. Plus, if the keyword isn't there, but there's some type information, shouldn't we check the types?

If the type checker is a separate static program, it seems to me that things are more natural. We try to check the inputs to the program. If those inputs use other modules, we go and check them, too.
If the checking fails (say it's unchecked), then we warn, and can either fail or (optionally?) make assumptions (e.g., assign any everywhere) and continue.

I'm worried a bit about a type checked library breaking code that doesn't use type checking. For example, let's say that Guido's GCD example is in a std library right now, and type checking gets added. What if code passes in floats? The unchecked code causes a runtime exception in Guido's proposal. Well, the code might not always be wrong! For example, let's go back to GCD. While GCD would get typed as integer*integer->integer, you can get perfectly valid results if you pass in floats, as long as your floats always happen to be integers. Maybe my code always calls GCD as so:

gcd(12.0,28.0)

Why should this code break? That's one more reason why I think that checking maybe should only be performed "on demand" (dynamic checks are okay if they are explicitly requested).

Next, let's talk briefly about what type inferencing can do. In particular, consider Guido's example where he assumes that the type inference algorithm isn't sophisticated enough to handle complex types, such as lists. Let me say that lists are pretty easy to handle when doing type inferencing. Object types can get a tad bit tricky, but everything should be doable. When I have time, I'll ramble a bit about what I see as the best approach to implementing a type inference engine for Python based on well-known algorithms.

There's a problem though. No one has ever been successful at coming up with an OO language that successfully integrates all 3 of the following:

1) Type inferencing
2) Subtyping
3) Principal types

Currently, you can choose any two (see Jens Palsberg's brief note "Type Inference For Objects", which I have at http://www.list.org/~viega/cs655/ObjInference.pdf). BTW, a principal type summarizes all possible types of a given piece of code (variable, function, etc). I'd like to capture all 3. Basically, 'a->'a is the principal type of the "identity" function, even though there are plenty of valid specific types that will work in a particular context.

I don't know if we can solve this problem in the general case. I don't think it's possible for Python's needs, but I haven't really given it too much thought yet.

A couple of brief thoughts at this point... What should the type checker do with inferred types? Place them directly into the code? Place them in comments or a doc string so that they have no effect but you can see what the checker did and double-check its work?

Also, it might be nice to type code when you don't have access to the source. Should it be possible to specify the types of features in a module for which you don't have source (assume it wasn't written with checking code)? If so, how? An interface definition file? Then do you try to check the byte code to make sure the given interfaces are actually valid?

Here's an issue I haven't seen brought up yet. Consider the following function:

def identity(x):
    return x

What's the type of this function? There are a couple ways to look at this problem. First, we can look at all contexts in which "identity" is called, and have its type be the union of all of those types. That is a very ad hoc method of genericity. If we call "identity" as such:

identity('foo')

And there are no other instances of this call, then we would assign this function the type string->string. If there's also a call:

identity(12)

Then would the type then become (string|integer)->(string|integer)? That sounds like a bad idea.
At that point, the type system has lost precision (I have a big problem with OR-ing of types, which has been proposed... more below). The signature implies you can pass in a string and get back an integer, which means the type is too liberal. Another reason this isn't a great idea is that you'd have to defer the type until you've seen all calls. Plus, as you add more code, and call "identity" in different contexts, the type will grow. That seems unwieldy, especially in documentation. I think it would be best to be able to say, "the in type is the out type, and who cares about the type beyond that".

Parametric polymorphism to the rescue... the type could be <x> -> <x>, where <x> indicates that x can be of any type. I think the syntax could be better (I prefer ML's, which types identity as 'a->'a, but I hear there's some resistance to it in this community... I'm going to use 'a from now on because it looks more natural to me). One problem here is that it probably isn't going to be possible to infer an accurate principal polymorphic type for things in all cases (see above). I'll have to give the worst cases some thought.

You may have noticed that 'a looks a heck of a lot like the "any" keyword proposed by others. It turns out to be pretty much the same thing, except parametrically polymorphic. One thing you can do is differentiate between different types (basically buying you parameterized functions, but with a much better syntax, IMHO). Eg:

def init_dict(key : 'a, val : 'b):
    global dict : {'a:'b}
    dict[key] = val

The proposed alternative is something like:

def init_dict(key: a, val : b):
    ...

There seems to be an implication that functions named init_dict will need to be instantiated... That's a kludgy C++-ism... instantiation is not necessary for parametric polymorphism.

So should the proposed "any" keyword go away? Unfortunately, no. The difference in semantics between "any" and generic types is that the "any" keyword basically forces the program to forego type checks when variables involving "any" are involved, which will sometimes be necessary.

An example might clear up the distinction:

def f(x):     # Here, x is inferred to be of type 'a, which is currently
              # the principal type SO FAR.
    x.foo()   # Whoops, we just had to narrow 'a to any object with a "foo"
              # method. If x were of type "any", its type would stay the same.

I really hope that people avoid using "any" *always*, but it does need to be there, IMHO.

Another problem that falls out at this point is how to write the type of x when we've seen x call "foo". Do we look at all classes across the application for ones with a foo method? And do we care if that solution precludes classes that dynamically add their "foo" method from applying? Ick. I'd prefer to avoid the concrete, and infer something like:

<< foo: None->'a >>

Which would read: an object with a field foo of type function taking no arguments and returning anything.
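If it helps to picture the constraints being collected, here's a crude dynamic trace in plain Python. This is *not* the static algorithm, and Probe is a made-up helper; a real inferencer derives the same record type without running anything:

class Probe:
    # Records each attribute fetched from it, plus the argument types of
    # the resulting call -- a toy, run-time version of constraint gathering.
    def __init__(self):
        self.seen = {}
    def __getattr__(self, name):
        def record(*args):
            self.seen[name] = tuple([type(a).__name__ for a in args])
            return 0    # dummy result; a real engine would track this as 'a
        return record

def f(x):
    x.draw()
    x.resize(1, 2)

p = Probe()
f(p)
print(p.seen)   # {'draw': (), 'resize': ('int', 'int')}

From a trace like that, the record type to write down is << draw: None->'a, resize: integer*integer->'b >>.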
In the following case:

def f(x):
    decl i : integer
    x.foo()
    i = i + x.bar("xxx")
    z = x.blah(1,2)

We would infer the following type:

<<
  foo: None->'a,
  bar: string->integer,
  blah: integer*integer->'b
>>

Adding constrained polymorphic types in declarations should be possible, even though it'd get messy without some sort of typedef statement:

def add_observer(x : << notify: string->None >> ) -> None:
    global notify_list: << notify : string->None >>
    if not notify_list: notify_list = [x]
    else: notify_list.append(x)

It'd be nice to be able to do:

typedef notifyable << notify: string->None >>

def add_observer(x : notifyable) -> None:
    global notify_list: notifyable
    if not notify_list: notify_list = [x]
    else: notify_list.append(x)

Ok, back to the OR-ing of types. The result of such a construct is only going to be lots of ad hoc typecase stuff and types that are not as precise as they should be (e.g., the (string|integer)->(string|integer) example above). For example, consider the following code:

class a:
    def foo(self : a, x : integer): ...
class b:
    def bar(self : b): ...

def blah(x : a | b):
    x.foo(2)

That code shouldn't pass through the type checker, because there's no guarantee that x has a method foo. The only real solution every time you have an OR type is a typecase statement, which is ad hoc and can lead to maintenance problems. I really don't think there should be a language construct to support what's really bad programming practice... I think that "any" should be the only place in the system where types can be that ambiguous. (To clarify... typecases are sometimes necessary, but I think the OR-ing of types is a pretty bad idea.)

I do, however, support the AND-ing of types. There's still a minor issue here. Consider the code:

class a:
    def foo(self: a, x: integer) -> integer: ...
class b:
    def bar(self: b) -> None: ...

def blah(x:a&b):
    x.foo(2)

This should definitely type check. However, what are the requirements we should impose on the variable x? Must it inherit both a and b? Or need it only implement the same methods that a and b statically define? I prefer the former. There's also the option of restricting &'s to interface types only, which I think is fine.

BTW, if there isn't an interface mechanism in the 1.0 version of the type system, people will start defining "interfaces" as such:

class IWidget:
    def draw(self: IWidget) -> None: pass
    def getboundingbox(self: IWidget) -> (float*float)*(float*float): pass

That's okay, but you do want the type checker to be able to distinguish between an abstract method and a concrete method. Otherwise:

class Scrollbar(IWidget):
    pass

Would automatically be a correct implementation of the IWidget interface, even though we failed to define the methods listed in that interface (we should be forced to add them explicitly, even if their body is just a "pass"). I'd much rather the above give an error. I think that special casing classes with no concrete implementations isn't that good an idea, so an "interface" keyword should be considered, which would look the same as classes without the method bodies, and with the restriction that interfaces cannot inherit classes (though it's desirable for classes to inherit interfaces, obviously). I wonder if it would be good to also pull out the explicit self parameter. Eg:

interface IWidget:
    def draw() -> None
    def getboundingbox() -> (float*float)*(float*float)

I think it would be good to allow parameter-based method overloading for people who use the type system.
You'd be allowed to do stuff like:

class Formatter:
    def print(self: Formatter, i : integer)->None: ...
    def print(self: Formatter, s : string)->None: ...
    def print(self: Formatter, l : ['a])->None: ...

It would be easy for the compiler to turn the above into something like:

class Formatter:
    def $print_integer(self, i)->None: ...
    def $print_string(self, s)->None: ...
    def $print_list_of_generic(self, l)->None: ...
    def print(self, x)->None:
        typecase x:
            case i: integer => self.$print_integer(i)
            case s: string => self.$print_string(s)
            case l: ['a] => self.$print_list_of_generic(l)
            # The following should be implicit, but I'll list it explicitly...
            default => raise TypeError

The $'s above are just some magic to prevent collisions... however this is actually implemented is also not very important. One difficulty here is making sure the debugger handles things properly (i.e., maps stuff back to the original code properly)...

The syntax of typecasing is not too important here. It could be the casting syntax proposed elsewhere. I prefer a real typecase statement based on matching as above. It's more powerful, and easier to read (I don't like choosing arbitrary operators). The problem is what exact syntax to use so that you can show types, bind to variables and allow for code blocks, while keeping the syntax as consistent with the rest of the language and type system as possible. The colon always precedes a code block, but we're using the colon to separate a variable from its type, too. However, I don't like:

case l : ['a] : self.$print_list_of_generic(l)

Perhaps:

case ['a] l : self.$print_list_of_generic(l)

Though that suddenly makes type decls very irregular. Then there's the option of not explicitly assigning to a new variable:

case ['a] : self.$print_list_of_generic(x)   # note the x

I think that last one gets my vote, currently. We can definitely figure out the type of x within that block. If it needs to be used outside the block, then people can copy it into a variable if they want to preserve the cast:

decl l : ['a]
typecase x:
    case ['a] : l = x

By the way, note that the two "'a"'s in the above code can be different and still work:

decl l : ['whatever]
typecase x:
    case ['a] : l = x   # This is okay... the types are compatible,
                        # but now there is an implicit equivalence of
                        # the 2 types.

To implement the same kind of matching better functional languages provide, we'd need something that allowed for assigning to multiple vars at once:

decl z : integer
typecase a:
    case (x : integer, y : integer) => z = x + y
    case (x : integer,) => z = x
    case x : integer => z = x

I'd like this type of matching, but it's got the too-many-colons syntax problem, since the => is not pythonesque... Hmm, what if all types were expressed using :- instead of :? Yes, :- is the assignment operator in one or two languages, but not too many people have used those languages:

decl z :- integer
typecase a:
    case (x :- integer, y :- integer) : z = x + y
    case (x :- integer,) : z = x
    case x :- integer : z = x

That's not quite as bad.

Now, on to variance of arguments. Contravariance is definitely bad IMHO. Yes, it's a simpler model, and lends itself to type safety better, but if you've ever programmed in Sather, you probably know that it can get really inconvenient to do really simple things. Invariance (aka nonvariance or novariance) is the approach taken by C++ and Java. It usually does pretty well, and is simple to implement. It usually does what you want, and doesn't lead to the same runtime type problems covariance does.
However, consider a situation like the following:

class Container: pass

class LinkedList(Container):
    def add(self, l :- LinkedList):
        ...

class BidirectionalLinkedList(LinkedList):
    def add(self, l :- ???):
        ...

(BTW, I'm going to drop the type of self from here on out, and assume that it is always the type of the class in which the method is defined. I don't think there should be a type variable to allow for talking about the type of self, such as in the __add__ example in Guido's proposal. Let covariance of parameters do the work... type variables can lead to some pretty subtle problems.)

In the above example, what should the type of l be? In a contravariant language, as the class gets more specific, the parameters must get more generic or stay the same. Therefore, implementing BidirectionalLinkedList requires an explicit type cast if we want to enforce the natural restriction that you can only add another bidirectional linked list on to the end of a bidirectional linked list...

In a covariant language, "BidirectionalLinkedList" would be the right answer, and that seems natural. Choosing to keep your parameter invariant is actually fine as well, so you can do:

class LinkedList:
    def merge(self, l :- LinkedList):
        ...

class BidirectionalLinkedList(LinkedList):
    def merge(self, l :- LinkedList):
        ...

(We don't care whether the parameter is bidirectional or not.) I'll get back to the problems with this approach in a second.

Invariance is an answer... the parameter would forever have to be LinkedList in all subclasses, but really requires a typecase on the parameter, or the clever use of a constrained generic type:

class LinkedList<T -> LinkedList>:   # specifies that the parameter must
                                     # be substitutable with LinkedList.
    def add(self, l :- T):
        ...
    def merge(self, l : LinkedList):
        ...

class BidirectionalLinkedList(LinkedList):
    def add(self, l):
        ...   # l is of type BidirectionalLinkedList here.
    def merge(self, l : LinkedList):
        ...

I think constrained generic types are good(tm) and should be added. This was the approach I was leaning towards last night at the DC PIGgies meeting, but I've changed my mind, as there are some reasonable problems with using them here. If we have a lot of parameters of different types that each need to vary, we have to use a ton of constrained types. It can get ugly quickly. Plus, if we wanted to add a feature to a base class that required a covariant parameter, we'd have to add a new constrained parameter, which changes the interface of the class, potentially breaking a lot of code. I think this is bad, and perhaps worse than typecasting, which is another solution when you only have invariance.

So what's wrong with covariance? Here's a common example of the problem:

class Person:
    def ShareRoom(self, p :- Person)->None:
        self.room = p.room

class Boy(Person):
    pass

class Girl(Person):
    pass

Hmm, the above isn't quite what we want, as the intent is probably to keep boys from rooming with girls. Let's assume we want to allow covariant parameters. We'd have to recode our Boy and Girl classes as such:

class Boy(Person):
    def ShareRoom(self, p :- Boy)->None:
        Person.ShareRoom(self, p)

class Girl(Person):
    def ShareRoom(self, p :- Girl)->None:
        Person.ShareRoom(self, p)

The problem here is that we can now do the following:

decl p :- Person
decl g :- Girl
p = Boy()
g = Girl()
g.room = blah
p.ShareRoom(g)

The last call there appears type correct... a static type checker will say "looks good!" because Person's ShareRoom accepts any Person object.
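Acted out in plain dynamic Python, with an isinstance test standing in for the check a covariant runtime would generate (the isinstance calls are my illustration, not proposed syntax):

class Person:
    def ShareRoom(self, p):
        self.room = p.room

class Boy(Person):
    def ShareRoom(self, p):
        if not isinstance(p, Boy):            # the covariantly narrowed parameter
            raise TypeError("Boy.ShareRoom expects a Boy")
        Person.ShareRoom(self, p)

class Girl(Person):
    def ShareRoom(self, p):
        if not isinstance(p, Girl):
            raise TypeError("Girl.ShareRoom expects a Girl")
        Person.ShareRoom(self, p)

p = Boy()            # statically, p was declared as a Person
g = Girl()
g.room = "room 5"
try:
    p.ShareRoom(g)   # statically fine: Person.ShareRoom takes any Person
except TypeError:
    print("caught the narrowed-parameter failure at run time")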
At run time, we will dispatch to the Boy's version of ShareRoom, which narrows the type, and yields a type error (if we are adding dynamic checks).

One solution is adding anchored types to the language, which is okay, but the programmer could still write the above code, and it would still be broken. Anchored types essentially just give the programmer a way to perform the above and get an error on the p.ShareRoom(g) call statically, but only if he added additional magic to one of his classes. Anchored types are cool, but I don't think this is the right solution for the problem, so I won't cover them right now.

Meyer's preferred approach is to add a rule that says "polymorphic catcalls are invalid". What's a catcall? CAT stands for "Changing Availability or Type". In the context of this discussion, a routine that is a CAT is one where the type of its arguments varies in a derived class(*). A catcall is any call to a CAT method. A polymorphic catcall is a call to a CAT method where the target object is polymorphic, which happens in at least 2 cases:

1) The object appears in the LHS of an assignment, where the RHS is a subtype of the LHS.
2) The object is a formal parameter to a method.

There might be a third case... I'll have to go look it up. I don't think there is any problem that would make the solution not appropriate for Python, though. It should be possible to implement this solution, and even do so incrementally.

There's another (better, less pessimistic) solution called the global validity approach. The problem with it, IIRC, is that the algorithm basically assumes that type checking goes on in a "closed world" environment, where you're checking the entire system at once. That probably isn't desirable. I wonder if there haven't been refinements to this algorithm that allow it to work incrementally.

Therefore, I'd definitely prefer covariance w/ a polymorphic catcall rule, assuming that the catcall rule can actually work for Python. BTW, return types should be covariant too.

Syntax for exceptions: if we add exceptions as part of a signature, I don't think that I see a good reason to use anything other than something similar to Java's "throws" syntax. I'd add a comma before the "throws" for clarity's sake. Here's a right-recursive (LL) grammar fragment:

throws_clause: "," "raises" throws_list;
throws_list: identifier more_throws;
more_throws: "," throws_list |;

Example:

class Client:
    def getServerVersion(self) -> string, raises ENotConnected, ETimeout:
        pass

The problem is that "raises" is a bit wordy. I just don't like using symbols without natural meanings in these situations... who wants to be Perl? *Maybe* a ! would be alright, since there is a very loose connection:

class Client:
    def getServerVersion(self) -> string ! ENotConnected, ETimeout:
        pass

Or perhaps 2 bangs... but I am really uncomfortable about the readability of that syntax for people new to the language.

Whatever, I'm not too enamored of having exceptions as part of a signature. Part of the reason is variance of exceptions. I've seen cases where people wanted contravariance and it seemed natural. I don't think those situations are all that common, but I do also believe that exceptions in signatures can lead to code that is massively difficult to maintain, as exceptions added to one method end up needing to propagate all the way up a call chain through a program. You end up with LONG exception lists.
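The growth is mechanical. Here it is in miniature, in plain Python (the functions are hypothetical):

def parse(s):
    raise ValueError("bad input")   # a new exception introduced at the bottom

def load(path):
    return parse(path)   # with declared exceptions, load's signature must
                         # now list ValueError...

def main():
    load("config")       # ...and so must main's, and every caller above it

try:
    main()
except ValueError:
    print("propagated up the entire chain")

Every signature on that chain grows by one entry the moment something new can be raised at the bottom.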
I need to think more about this topic, but right now I don't really care whether the programmer explicitly lists exceptions on a per-function basis. The type checker can still determine what exceptions don't get caught in the scope of what's been checked. If there is a new build process for static checking, it could report all uncaught exceptions at that time (probably optionally), and it could dump them to a supplemental file when doing incremental checking. In short, I haven't thought about this problem enough recently to say which way I prefer, but I lean towards not making exceptions part of signatures.

Okay, this has been pretty long and rambling, and it's time for me to stop writing. I'm sorry if I didn't make total sense. I have more to say, but it'll have to wait until another day...

John

(*) Changing availability: the catcall rule also applies if a derived type chooses to make a feature unavailable to the outside world (e.g., by removing the feature). No one has been proposing a feature of the system to do this sort of thing (yet... but visibility mechanisms haven't been discussed much so far as I've seen). Of course, you can always muck around with the attribute directly...

From faassen@vet.uu.nl Wed Mar 15 18:21:28 2000
From: faassen@vet.uu.nl (Martijn Faassen)
Date: Wed, 15 Mar 2000 19:21:28 +0100
Subject: [Types-sig] An ignorable wild idea
Message-ID: <20000315192128.A28183@vet.uu.nl>

Hi there,

I had another wild idea the other day that's probably fairly silly, so of course I instantly thought I should share it with you all. If you don't feel like having the discussion move out into a strange direction simply don't reply or don't read this, though of course I wouldn't post this if I didn't want feedback.

What are dynamic types in Python? They're basically attributes of Python objects.

What would static types be? They're attributes of variables.

We change attributes of objects like this:

a.attribute = b

i.e., with the '.' notation and assignment. It's sometimes even possible to change the type of an object (or at least the class, or the methods and data a particular object has).

Now, the idea was to introduce variable attributes. I'll use the operator -> for that, but any operator would do.

a->attribute = b

This would change the attribute of the variable 'a' to whatever's in 'b'. Since a type is an attribute of a variable, we'd set the type of a particular variable like this:

a->type = Integer

By default, all variables have the attribute 'type' set to 'Any'.

Variable attribute access is all analyzed and evaluated during compile time, so one can't put the result of some arbitrary Python expression into a variable attribute at runtime. There needs to be a separate compile-time namespace for variable attribute assignments, with some heavy restrictions. Accessing the variable attribute space during runtime is no problem, though. This could work:

print a->type

and this too:

if a->type == Integer:
    # do whatever
    # though conditional type assignments should probably be disallowed:
    b->type = Integer

though it's doubtful how useful this would be.

Is this idea in fact useful at all, besides the nice parallel with object attributes? I'm not sure. This might come in handy if you're doing generic functions, perhaps:

a->type = foo->type   # the variable a will contain the same type as foo.

And this variable attribute facility may have more uses.
Perhaps it could support docstrings:

a->doc = "Holds temporary value"

And it could be used for introspection, if any variable inside scope is accessible through the -> operator:

def foo(a):
    b->doc = "holds a number"
    b = 5
    return b

print foo->a->type
print foo->b->doc

class Bar:
    def __init__(self):
        pass

    method->type = String
    def method(self):
        b->type = Integer
        b = 15
        return str(b)

print Bar->method->b->type

bar->type = Bar
bar = Bar()

if hasmethod(bar->type, 'method'):
    bar->method()

Where 'hasmethod' could be evaluated at compile-time, so we'd get this:

if 1:
    bar->method()

but this if we removed 'method':

if 0:
    bar->method()

Of course you'd need many restrictions about what can be done with a class and objects at run time, but you need those in any case if you do static type checking.

Anyway, these are mostly idle speculations, based on the idea that variables themselves have attributes. Just wanted to let you all know, if this is news at all. Many questions are left unanswered. :)

Regards,

Martijn

From John@list.org Wed Mar 15 21:00:24 2000
From: John@list.org (John Viega)
Date: Wed, 15 Mar 2000 13:00:24 -0800
Subject: [Types-sig] An ignorable wild idea
In-Reply-To: <20000315192128.A28183@vet.uu.nl>; from Martijn Faassen on Wed, Mar 15, 2000 at 07:21:28PM +0100
References: <20000315192128.A28183@vet.uu.nl>
Message-ID: <20000315130024.A30986@viega.org>

Martijn,

General, off-the-cuff comments: I think it isn't a bad idea to add attributes to variables, but I see that as completely separate from type systems. For the most part, I can interpret your proposal as an alternate syntax for a proposed type system. I personally would prefer the type system to look more different than similar when compared to attributes.

Beyond the syntax issues: you talk about disallowing conditional assignments to type variables:

if some_condition:
    a->type = Integer
else:
    a->type = Float

Yes, this would be bad for static checking. When statically checking this, the only thing I can infer about a's type is Integer|Float. I've argued against OR-ing types previously. However, people aren't going to understand the restriction. Everything about the syntax and the feature seems to point to "dynamic". Plus, types can have lifetimes now, where a is definitely only an Integer for part of the program, and a Float for part of the program. It becomes difficult to reason about things statically when that's possible:

a->type = Integer
... operations on a...
a->type = Float
... more operations on a...

It's not impossible to deal with, but an unnecessary headache. Then, how to specify parameter types, etc.? The syntax wouldn't provide an elegant solution.

I don't think it's the right approach *for types*, but I do think there may be some utility in having slots for variables in other areas for meta-information, including debug information.

BTW, one minor nit with what you said up front: every value in the language has a type, not just variables.

John
From scott@chronis.pobox.com Thu Mar 16 02:26:28 2000
From: scott@chronis.pobox.com (scott)
Date: Wed, 15 Mar 2000 21:26:28 -0500
Subject: [Types-sig] A late entry
In-Reply-To: <38CEC33C.6AC199A1@viega.org>; from john@viega.org on Tue, Mar 14, 2000 at 05:54:52PM -0500
References: <38CEC33C.6AC199A1@viega.org>
Message-ID: <20000315212628.A99258@chronis.pobox.com>

On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote:
> Whoops. For me, this has always been an area of interest, but I completely missed the fact that people were actually discussing stuff here now, until the DC PIGgies meeting last night. Here are my comments so far on what I've seen on the type sig. Unfortunately, I haven't read much of what's been said, so I'm sorry if some things have been discussed or are not appropriate. I've tried to at least read or skim the proposals on the list web page.

It's great to see a post from you here. I know you've studied this stuff and can offer valuable insights.
> Here I'm going to mainly focus on Guido's proposal, since it seems to be one of the more recent proposals, incorporating select ideas from other proposals.
>
> Let's start with some notes on terminology. I've seen the word "typesafe" thrown around quite a bit in a short period of time. Let's try to avoid using the word casually here, because it can lead to confusion. I find people often don't know what they're referring to when they use (or hear) the term. Does it mean that no type errors can ever happen at run time? At all? Or that they can only happen under specific conditions? If so, what are those conditions? Cardelli would probably say that "safe" means that the error won't cause a crash or go unnoticed at runtime (not everyone agrees with his definitions). By his definition, Python is already a type safe language. However, I often hear people use the term to mean static type safety... i.e., no type error will happen at run time. There's essentially no practical hope for that in an object oriented language like Python.
>
> I think it might be nice for us to agree to share a common terminology. I know there's a bit of dissension w/ Cardelli's preferred terminology, but it at least provides us with a common reference in a field where terminology tends to be very muddled. I would recommend we all read Section 1 of Cardelli's "Type Systems" (it's about 9 pages). I have copies of this paper in ps and pdf:
>
> http://www.list.org/~viega/cs655/TypeSystems.ps
> http://www.list.org/~viega/cs655/TypeSystems.pdf

neat paper. any other references to throw at us?

> Oh, one thing that I should mention is that throughout this document I assume an inline syntax for type checking. I far prefer it.
>
> Okay, next I want to talk about the way that "checked" and "unchecked" modules interact in Guido's proposal. He says that when a checked module imports and uses an unchecked module, all the objects in the unchecked module are assigned the type 'any' for the purposes of the checking. I am not 100% sure that's the best idea in the face of type inference. The checked module deserves type information that's as specific as possible, so that better types can be inferred for the checked program. Of course, in many situations a type inference engine isn't going to be able to infer much more than "any" when there are absolutely no type hints in the code. But if something can be done, why not?
>
> The flip side of the coin is that after the type checking occurs, the unchecked code can exhibit bad behavior, breaking inferred invariants. For example, let's say some checked code calls foo.bar(), which is inferred to have the type integer*integer->integer. Someone can dynamically load a module from unchecked code which replaces foo.bar() with a function of type string->integer.
>
> Options here? Run-time checks can be added at points where this is possible. I don't know if I support runtime checks if they can be avoided. Another option is to say that "guarantees" made by the type checker only hold if the whole program is checked. I'm comfortable with that (though I am still probably slightly in favor of the addition of runtime checks), but I suspect that others will not be. I have the same opinion when it comes to using checked modules from unchecked modules. Guido proposes adding runtime checks. I might prefer not to have them. Of course, one option is to support both options...
When we talk about adding runtime checks, it's a little unclear to me exactly what is meant. Are you referring to additional runtime checks that the existing dynamic type system in python does not provide? Is leveraging the existing dynamic type system in this way feasible for these sorts of checks? If this is possible, it is one approach I'd prefer -- there's no performance hit, and nothing that isn't checked. The only drawback I can see to using the existing system as a fallback is that it would limit the degree of optimization that is available, but I believe that's OK.

> I also might disagree with always performing type checking at load time (when an unchecked module imports a checked module). I see new type checking features as a static convenience for people who want to use them. Are people going to want to pay a runtime hit when linking against checked code when they don't want to check code themselves? I don't think so.
>
> Arguing against myself again, you can also make a case that this sort of dynamic checking is necessary to be able to live up to the promises a static checker makes when it "passes" a piece of code. Guido talks about dynamically importing a module and checking to see if the module matches the expected signature. That's certainly true... here's a really simple case: let's assume that A imports B, and that A+B were fully checked statically, but someone replaced B with C after checking and before runtime. Is it worth performing those extra checks? The static checks have done the best they could, and now, if there are gross incompatibilities, we're hopefully going to see an error dynamically whether we added extra checks or not (you can certainly construct cases where that's not true). I'd probably say it is worth performing those extra checks in some less frequent, highly dynamic situations, such as when you've got a completely checked application, then you import a module that was essentially typed in at stdin.
>
> I think the right answer for Python depends on what the goals are. I'm looking for something that will help me find bugs statically that traditionally only cropped up at run time, when possible. Getting controversial, I don't know if I really care about supporting those dynamic checks that are only there to support cases in which the application is used in ways that subvert the assumptions I made when performing the type checking, such as what modules I'd be using in my application. It should be possible to turn all that stuff off, at the very least, and just let everything go while completely ignoring type information. Remember, dynamic features in the type system can end up slowing down a language...

I agree very much with all these points about checking modules dynamically.

> Okay, going to the GCD example... Do we really need to use the "decl" keyword alone to indicate that a module has been checked? I'm fine with gcd(3.5,3) throwing a dynamic error if the module is not checked (I'd also be fine with it trying to run here, honestly). But should it try to compute the result if I remove the "decl" keyword from the file, but still specify types of the arguments to gcd??
>
> Why does the word "decl" indicate the module is checked? Just because the programmer added the keyword doesn't mean the types he or she assigned are even internally consistent. Is the interpreter going to recheck on startup every single time? It seems like a lot of overhead to me.
> Plus, if the keyword isn't there, but there's some type information, shouldn't we check the types?
>
> If the type checker is a separate static program, it seems to me that things are more natural. We try to check the inputs to the program. If those inputs use other modules, we go and check them, too. If the checking fails (say it's unchecked), then we warn, and can either fail or (optionally?) make assumptions (e.g., assign any everywhere) and continue.

[ gcd working w/ floats ]

> Next, let's talk briefly about what type inferencing can do. In particular, consider Guido's example where he assumes that the type inference algorithm isn't sophisticated enough to handle complex types, such as lists. Let me say that lists are pretty easy to handle when doing type inferencing. Object types can get a tad bit tricky, but everything should be doable. When I have time, I'll ramble a bit about what I see as the best approach to implementing a type inference engine for Python based on well-known algorithms.

looking forward to the ramblings!

> There's a problem though. No one has ever been successful at coming up with an OO language that successfully integrates all 3 of the following:
>
> 1) Type inferencing
> 2) Subtyping
> 3) Principal types
>
> Currently, you can choose any two (see Jens Palsberg's brief note "Type Inference For Objects", which I have at http://www.list.org/~viega/cs655/ObjInference.pdf). BTW, a principal type summarizes all possible types of a given piece of code (variable, function, etc). I'd like to capture all 3. Basically, 'a->'a is the principal type of the "identity" function, even though there are plenty of valid specific types that will work in a particular context.
>
> I don't know if we can solve this problem in the general case. I don't think it's possible for Python's needs, but I haven't really given it too much thought yet.

It seems like you are referring to inferencing both as a mechanism which can allow the user to denote fewer types and as a means of dealing with the mixing of unchecked code with checked code. With regards to meeting the first goal, a very limited kind of inferencing is possible, where the first assignment to a variable from an expression of a given type has the same effect as declaring the variable as that type in the first place. I think that the former goal of reducing the number of declarations the programmer must make is attainable with a mechanism like this, but the latter goal would require a real inferencing algorithm.

> A couple of brief thoughts at this point... What should the type checker do with inferred types? Place them directly into the code? Place them in comments or a doc string so that they have no effect but you can see what the checker did and double-check its work?

maybe just print out inferred types if requested and the type checker is a separate program?

> Also, it might be nice to type code when you don't have access to the source. Should it be possible to specify the types of features in a module for which you don't have source (assume it wasn't written with checking code)? If so, how? An interface definition file? Then do you try to check the byte code to make sure the given interfaces are actually valid?
>
> Here's an issue I haven't seen brought up yet. Consider the following function:

[ identity function signature ]

> I think it would be best to be able to say, "the in type is the out type, and who cares about the type beyond that".
> Parametric polymorphism to the rescue... the type could be <x> -> <x>, where <x> indicates that x can be of any type. I think the syntax could be better (I prefer ML's, which types identity as 'a->'a, but I hear there's some resistance to it in this community... I'm going to use 'a from now on because it looks more natural to me). One problem here is that it probably isn't going to be possible to infer an accurate principal polymorphic type for things in all cases (see above). I'll have to give the worst cases some thought.

The stab I have taken at writing a checker allows this kind of polymorphism for functions. It uses ~ instead of '. Using a prefix character is something I prefer as well, though I think there are much more important issues to ponder, so I get on with it :)

> You may have noticed that 'a looks a heck of a lot like the "any" keyword proposed by others. It turns out to be pretty much the same thing, except parametrically polymorphic. One thing you can do is differentiate between different types (basically buying you parameterized functions, but with a much better syntax, IMHO). Eg:
>
> def init_dict(key : 'a, val : 'b):
>     global dict : {'a:'b}
>     dict[key] = val
>
> The proposed alternative is something like:
>
> def init_dict(key: a, val : b):
>     ...
>
> There seems to be an implication that functions named init_dict will need to be instantiated... That's a kludgy C++-ism... instantiation is not necessary for parametric polymorphism.

Correct. The stab-at-type-checker I wrote does this, and it would be handy.

> So should the proposed "any" keyword go away? Unfortunately, no. The difference in semantics between "any" and generic types is that the "any" keyword basically forces the program to forego type checks when variables involving "any" are involved, which will sometimes be necessary.
>
> An example might clear up the distinction:
>
> def f(x):     # Here, x is inferred to be of type 'a, which is currently
>               # the principal type SO FAR.
>     x.foo()   # Whoops, we just had to narrow 'a to any object with a "foo"
>               # method. If x were of type "any", its type would stay the same.
>
> I really hope that people avoid using "any" *always*, but it does need to be there, IMHO.
>
> Another problem that falls out at this point is how to write the type of x when we've seen x call "foo". Do we look at all classes across the application for ones with a foo method? And do we care if that solution precludes classes that dynamically add their "foo" method from applying? Ick. I'd prefer to avoid the concrete, and infer something like:
>
> << foo: None->'a >>
>
> Which would read: an object with a field foo of type function taking no arguments and returning anything.

Do you think it's ok to require that the programmer declare something about x in this case? If something like what you suggest is inferred there, it seems like a class of errors might slip through that we might not want to allow to slip.
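For instance (plain Python; Window is a made-up class, and nothing is declared about x):

class Window:
    def foo(self):
        return 1

def f(x):
    return x.fooo()    # a misspelling of foo(); pure inference just concludes
                       # x : << fooo: None->'a >> and raises no complaint

try:
    f(Window())        # the typo only surfaces here, at run time
except AttributeError:
    print("AttributeError: fooo")

Requiring some declaration for x would have caught the misspelling statically.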
> In the following case:
>
> def f(x):
>     decl i : integer
>     x.foo()
>     i = i + x.bar("xxx")
>     z = x.blah(1,2)
>
> We would infer the following type:
>
> <<
>   foo: None->'a,
>   bar: string->integer,
>   blah: integer*integer->'b
> >>
>
> Adding constrained polymorphic types in declarations should be possible, even though it'd get messy without some sort of typedef statement:
>
> def add_observer(x : << notify: string->None >> ) -> None:
>     global notify_list: << notify : string->None >>
>     if not notify_list: notify_list = [x]
>     else: notify_list.append(x)
>
> It'd be nice to be able to do:
>
> typedef notifyable << notify: string->None >>
>
> def add_observer(x : notifyable) -> None:
>     global notify_list: notifyable
>     if not notify_list: notify_list = [x]
>     else: notify_list.append(x)

Constrained polymorphism seems to solve lots of more complex static type issues in clean ways. Do you know of any references on implementations for constrained polymorphism? I'd like to look more closely at how it's done elsewhere before taking a crack at it myself.

> Ok, back to the OR-ing of types. The result of such a construct is only going to be lots of ad hoc typecase stuff and types that are not as precise as they should be (e.g., the (string|integer)->(string|integer) example above). For example, consider the following code:
>
> class a:
>     def foo(self : a, x : integer): ...
> class b:
>     def bar(self : b): ...
>
> def blah(x : a | b):
>     x.foo(2)
>
> That code shouldn't pass through the type checker, because there's no guarantee that x has a method foo. The only real solution every time you have an OR type is a typecase statement, which is ad hoc and can lead to maintenance problems. I really don't think there should be a language construct to support what's really bad programming practice... I think that "any" should be the only place in the system where types can be that ambiguous. (To clarify... typecases are sometimes necessary, but I think the OR-ing of types is a pretty bad idea.)

I like the idea of discouraging OR'ing of types as much as possible. There are two cases where I think it could come in very handy: first, if the static type of None is considered a distinct type rather than something like a null type, then there are lots of things that return or produce type-of(None) OR something else. Examples are dict.get and default arguments. After wrestling with this for a while, I've come to think that the introduction of a null-type in a static type system is more manageable for the programmer. Do you see other ways of dealing with that, or how would you prefer that those things are handled? The other case is recursive data types such as trees, where a node can contain either other nodes or leaves.

> I do, however, support the AND-ing of types. There's still a minor issue here. Consider the code:
>
> class a:
>     def foo(self: a, x: integer) -> integer: ...
> class b:
>     def bar(self: b) -> None: ...
>
> def blah(x:a&b):
>     x.foo(2)
>
> This should definitely type check. However, what are the requirements we should impose on the variable x? Must it inherit both a and b? Or need it only implement the same methods that a and b statically define? I prefer the former. There's also the option of restricting &'s to interface types only, which I think is fine.
> BTW, if there isn't an interface mechanism in the 1.0 version of the type system, people will start defining "interfaces" as such:
>
> class IWidget:
>     def draw(self: IWidget) -> None: pass
>     def getboundingbox(self: IWidget) -> (float*float)*(float*float): pass
>
> That's okay, but you do want the type checker to be able to distinguish between an abstract method and a concrete method. Otherwise:
>
> class Scrollbar(IWidget):
>     pass
>
> Would automatically be a correct implementation of the IWidget interface, even though we failed to define the methods listed in that interface (we should be forced to add them explicitly, even if their body is just a "pass"). I'd much rather the above give an error. I think that special casing classes with no concrete implementations isn't that good an idea, so an "interface" keyword should be considered, which would look the same as classes without the method bodies, and with the restriction that interfaces cannot inherit classes (though it's desirable for classes to inherit interfaces, obviously). I wonder if it would be good to also pull out the explicit self parameter.

I think it's a good idea to pull out the self parameter. It makes interfaces things that are more flexible and not constrained to class methods being the only callable attributes.

> Eg:
>
> interface IWidget:
>     def draw() -> None
>     def getboundingbox() -> (float*float)*(float*float)
>
> I think it would be good to allow parameter-based method overloading for people who use the type system. You'd be allowed to do stuff like:
>
> class Formatter:
>     def print(self: Formatter, i : integer)->None: ...
>     def print(self: Formatter, s : string)->None: ...
>     def print(self: Formatter, l : ['a])->None: ...
>
> It would be easy for the compiler to turn the above into something like:
>
> class Formatter:
>     def $print_integer(self, i)->None: ...
>     def $print_string(self, s)->None: ...
>     def $print_list_of_generic(self, l)->None: ...
>     def print(self, x)->None:
>         typecase x:
>             case i: integer => self.$print_integer(i)
>             case s: string => self.$print_string(s)
>             case l: ['a] => self.$print_list_of_generic(l)
>             # The following should be implicit, but I'll list it explicitly...
>             default => raise TypeError
>
> The $'s above are just some magic to prevent collisions... however this is actually implemented is also not very important. One difficulty here is making sure the debugger handles things properly (i.e., maps stuff back to the original code properly)...

Having multiple signatures for method overloading is a convenient way to express the idea. If such a thing were available to the type checking engine, it would be fairly easy to use the underlying data structures to map the overloading of the arithmetic operators, for example.

> The syntax of typecasing is not too important here. It could be the casting syntax proposed elsewhere. I prefer a real typecase statement based on matching as above. It's more powerful, and easier to read (I don't like choosing arbitrary operators). The problem is what exact syntax to use so that you can show types, bind to variables and allow for code blocks, while keeping the syntax as consistent with the rest of the language and type system as possible. The colon always precedes a code block, but we're using the colon to separate a variable from its type, too.
> However, I don't like: > case l : ['a] : self.$print_list_of_generic(l) > Perhaps: > case ['a] l : self.$print_list_of_generic(l) > > Though that suddenly makes type decls very irregular. Then there's > the option of not explicitly assigning to a new variable: > > case ['a] : self.$print_list_of_generic(x) # note the x > > I think that last one gets my vote, currently. We can definitely > figure out the type of x within that block. If it needs to be used > outside the block, then people can copy it into a variable if they > want to preserve the cast: > > decl l : ['a] > typecase x: > case ['a] : l = x > > By the way, note that the two "'a"'s in the above code can be > different and still work: > > decl l : ['whatever] > typecase x: > case ['a] : l = x # This is okay... the types are compatible, > # but now there is an implicit equivalence of > # the 2 types. > > To implement the same kind of matching that better functional languages > provide, we'd need something that allowed for assigning to multiple > vars at once: > > decl z : integer > typecase a: > case (x : integer, y : integer) => z = x + y > case (x : integer,) => z = x > case x : integer => z = x > > I'd like this type of matching, but it's got the too-many-colons syntax > problem, since the => is not pythonesque... > > Hmm, what if all types were expressed using :- instead of :? Yes, :- > is the assignment operator in one or two languages, but not too many > people have used those languages: > > decl z :- integer > typecase a: > case (x :- integer, y :- integer) : z = x + y > case (x :- integer,) : z = x > case x :- integer : z = x > > That's not quite as bad. With regards to type casing possibly causing the language to slow down by bringing the static type information into runtime, do you think it would be reasonable to allow typecasing only on types that are easily expressible in terms of the existing dynamic type system? It seems to me that this approach would save a lot of work, limit the runtime overhead, and discourage OR'ing all at once. > > > Now, on to variance of arguments. Contravariance is definitely bad > IMHO. Yes, it's a simpler model, and lends itself to type safety > better, but if you've ever programmed in Sather, you probably know > that it can get really inconvenient to do really simple > things. Invariance (aka nonvariance or novariance) is the approach > taken by C++ and Java. It's simple to implement, usually does what > you want, and doesn't lead to the same runtime type problems > covariance does. However, consider a > situation like the following: > > class Container: pass > class LinkedList(Container): > def add(self, l :- LinkedList): > ... > > class BidirectionalLinkedList(LinkedList): > def add(self, l :- ???): > ... > > (BTW, I'm going to drop the type of self from here on out, and assume > that it is always the type of the class in which the method is > defined. I don't think there should be a type variable to allow for > talking about the type of self, such as in the __add__ example in > Guido's proposal. Let covariance of parameters do the work... type > variables can lead to some pretty subtle problems.) > > In the above example, what should the type of l be? In a > contravariant language, as the class gets more specific, the > parameters must get more generic or stay the same.
> Therefore, > implementing BidirectionalLinkedList requires an explicit type cast if > we want to enforce the natural restriction that you can only add > another bidirectional linked list on to the end of a bidirectional > linked list... > > In a covariant language, "BidirectionalLinkedList" would be the right > answer, and that seems natural. Choosing to keep your parameter > invariant is actually fine as well, so you can do: > > class LinkedList: > def merge(self, l :- LinkedList): > ... > > class BidirectionalLinkedList(LinkedList): > def merge(self, l :- LinkedList): > ... > > (We don't care whether the parameter is bidirectional or not) > I'll get back to the problems with this approach in a second. > > Invariance is an answer... the parameter would forever have to be > LinkedList in all subclasses, but really requires a typecase on the > parameter, or the clever use of a constrained generic type: > > class LinkedList< T -> LinkedList >: # specifies that the parameter T must > # be substitutable with LinkedList. > def add(self, l :- T): > ... > def merge(self, l : LinkedList): > ... > > class BidirectionalLinkedList(LinkedList< BidirectionalLinkedList >): > def add(self, l): > ... # l is of type BidirectionalLinkedList here. > def merge(self, l : LinkedList): > ... > > I think constrained generic types are good(tm) and should be added. > This was the approach I was leaning towards last night at the DC > PIGgies meeting, but I've changed my mind, as there are some > reasonable problems with using them here. If we have a lot of > parameters of different types that each need to vary, we have to use a > ton of constrained types. It can get ugly quickly. Plus, if we > wanted to add a feature to a base class that required a covariant > parameter, we'd have to add a new constrained parameter, which changes > the interface of the class, potentially breaking a lot of code. I > think this is bad, and perhaps worse than typecasting, which is > another solution when you only have invariance. Have you read about expressing the above with "mytype"? that is: interface if_LinkedList: def add(self, l :- mytype): ... class LinkedList(if_LinkedList): def add(self, l): ... class BiDirectionalLinkedList(LinkedList): ... the syntax is fairly simple, and 'mytype' just means the type of instances of the class that implements the method. > > So what's wrong with covariance? Here's a common example of the > problem: > > class Person: > def ShareRoom(self, p :- Person)->None: > self.room = p.room > > class Boy(Person): > pass > > class Girl(Person): > pass > > Hmm, the above isn't quite what we want, as the intent is probably to > keep boys from rooming with girls. Let's assume we want to allow > covariant parameters. We'd have to recode our Boy and Girl classes as > such: > > class Boy(Person): > def ShareRoom(self, p :- Boy)->None: > Person.ShareRoom(self, p) > > class Girl(Person): > def ShareRoom(self, p :- Girl)->None: > Person.ShareRoom(self, p) > > The problem here is that we can now do the following: > > decl p :- Person > decl g :- Girl > p = Boy() > g = Girl() > g.room = blah > p.ShareRoom(g) > > The last call there appears type correct... a static type checker will > say "looks good!" because Person's ShareRoom accepts any person > object. At run time, we will dispatch to the Boy's version of > ShareRoom, which narrows the type, and yields a type error (if we are > adding dynamic checks).
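The failure mode is easy to reproduce in plain dynamic python by writing the narrowed check by hand. A sketch (lower-case method names are mine, and the isinstance test stands in for the dynamic check mentioned above):

    class Person:
        def share_room(self, p):
            self.room = p.room

    class Boy(Person):
        def share_room(self, p):
            # the covariantly narrowed parameter, checked dynamically
            if not isinstance(p, Boy):
                raise TypeError("a Boy can only share a room with a Boy")
            Person.share_room(self, p)

    class Girl(Person):
        pass

    g = Girl()
    g.room = "blah"
    p = Boy()             # statically, p could be declared as just a Person
    try:
        p.share_room(g)   # the checker sees Person.ShareRoom(Person): looks good!
    except TypeError:
        print("runtime type error: dispatched to Boy's narrowed share_room")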
> > One solution is adding anchored types to the language, which is okay, > but the programmer could still write the above code, and it would > still be broken. Anchored types essentially just give the programmer > a way to perform the above and get an error on the p.ShareRoom(g) call > statically, but only if he added additional magic to one of his > classes. Anchored types are cool, but I don't think this is the right > solution for the problem, so I won't cover them right now. > > Meyer's preferred approach is to add a rule that says "polymorphic > catcalls are invalid". What's a catcall? CAT stands for "Changing > Availability or Type". In the context of this discussion, a routine > that is a cat is one where the types of its arguments vary in a > derived class(*). A catcall is any call to a cat > method. A polymorphic catcall is a call to a cat method where the > target object is polymorphic, which happens in at least 2 cases: > > 1) The object appears on the LHS of an assignment, where the RHS is > a subtype of the LHS. > > 2) The object is a formal parameter to a method. > > There might be a third case... I'll have to go look it up. I don't > think there is any problem that would make the solution not > appropriate for Python, though. > > It should be possible to implement this solution, and even do so > incrementally. There's another (better, less pessimistic) solution > called the global validity approach. The problem with it, IIRC, is > that the algorithm basically assumes that type checking goes on in a > "closed world" environment, where you're checking the entire system at > once. That probably isn't desirable. I wonder if there haven't been > refinements to this algorithm that allow it to work incrementally. > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall > rule, assuming that the catcall rule can actually work for Python. One possible approach to covariance of method parameters is to check each method of a class against all the possible different types of 'self'. This is what "stick" does, and it finds exactly the cases where there are type errors. It does require more checking than a general rule, and it does add complications to the problem of mixing checked modules and unchecked ones across class inheritance, but it does work. I'd be interested in any feedback on this approach you have... > > BTW, return types should be covariant too. > > Syntax for exceptions: If we add exceptions as part of a signature, I > don't think that I see a good reason to use anything other than > something similar to Java's "throws" syntax. I'd add a comma before > the keyword for clarity's sake. Here's a right-recursive (LL) grammar > fragment: > > throws_clause: "," "raises" throws_list; > throws_list: identifier more_throws; > more_throws: "," throws_list > |; > > Example: > > class Client: > def getServerVersion(self) -> string, raises ENotConnected, ETimeout: > pass > > The problem is that "raises" is a bit wordy. I just don't like using > symbols without natural meanings in these situations... who wants to > be Perl? *Maybe* a ! would be alright, since there is a very loose > connection: > > class Client: > def getServerVersion(self) -> string ! ENotConnected, ETimeout: > pass > > Or perhaps 2 bangs..., but I am really uncomfortable about the > readability of that syntax for people new to the language. > > Whatever, I'm not too enamored of having exceptions as part of a > signature. Part of the reason is variance of exceptions.
> I've seen > cases where people wanted contravariance and it seemed natural. I > don't think those situations are all that common, but I do also > believe that exceptions in signatures can lead to code that is > massively difficult to maintain as exceptions added to one method end > up needing to propagate all the way up a call chain through a program. > You end up with LONG exception lists. I need to think more about this > topic, but right now I don't really care whether the programmer > explicitly lists exceptions on a per-function basis. The type checker > can still determine what exceptions don't get caught in the scope of > what's been checked. If there is a new build process for static > checking, it could report all uncaught exceptions at that time > (probably optionally), and it could dump them to a supplemental file > when doing incremental checking. In short, I haven't thought about > this problem enough recently to say which way I prefer, but I lean > towards not making exceptions part of signatures. > > Okay, this has been pretty long and rambling, and it's time for me to > stop writing. I'm sorry if I didn't make total sense. I have more to > say, but it'll have to wait until another day... I'm waiting :) thanks for taking the time to write all this. scott > > John > > (*) Changing availability: The catcall rule also applies if a derived > type chooses to make a feature unavailable to the outside world > (e.g., by removing the feature). No one has been proposing a feature > of the system to do this sort of thing (yet... but visibility > mechanisms haven't been discussed much so far as I've seen). Of > course, you can always muck around with the attribute directly... > > > > _______________________________________________ > Types-SIG mailing list > Types-SIG@python.org > http://www.python.org/mailman/listinfo/types-sig From John@list.org Thu Mar 16 19:24:19 2000 From: John@list.org (John Viega) Date: Thu, 16 Mar 2000 11:24:19 -0800 Subject: [Types-sig] A late entry In-Reply-To: <20000315212628.A99258@chronis.pobox.com>; from scott on Wed, Mar 15, 2000 at 09:26:28PM -0500 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> Message-ID: <20000316112419.D3845@viega.org> On Wed, Mar 15, 2000 at 09:26:28PM -0500, scott wrote: > On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote: > > neat paper. any other references to throw at us? Tons, more than you would want to read. I think the most interesting for you would be: Jens Palsberg and Michael Schwartzbach. Object-Oriented Type Systems. John Wiley and Sons, 1994. ISBN 0-471-941288 You can also look at some more of the papers I've given students in the past, some of which are downloadable from: http://www.list.org/~viega/cs655/ In particular, the Unit 6 papers. The Cardelli and Wegner paper is worthwhile. The Milner paper isn't there, and it's pretty dense anyway. You'll be interested in Day et al. (which is a reasonable starting place for constrained genericity). Out of the Unit 8 papers, I'd probably recommend only Agesen's beyond what I've already given you. Abadi and Cardelli's is really dense and not all that interesting/applicable. Castagna's is there to see if students can find the big problems with the work... > When we talk about adding runtime checks, it's a little unclear to me > exactly what is meant. Are you referring to additional runtime checks > that the existing dynamic type system in python does not provide?
> Is > leveraging the existing dynamic type system in this way feasible for > these sorts of checks? If this is possible, it is one approach I'd > prefer -- there's no performance hit, and nothing that isn't checked. > The only drawback I can see to using the existing system as a fallback > is that it would limit the degree of optimization that is available, > but I believe that's OK. Any time you add a dynamic type check that wasn't there before, there is a performance hit. I'm just saying that we should try hard to minimize the number of dynamic checks, period. > It seems like you are referring to inferencing as a mechanism which can > both allow the user to denote fewer types and serve as a means of dealing > with the mixing of unchecked code with checked code. Honestly, without some type annotations, you're not likely to get very far on the latter there. > With regards to > meeting the first goal, a very limited kind of inferencing is > possible, where the first assignment to a variable from an expression > of a given type has the same effect as declaring the variable as that > type in the first place. I think that the former goal of reducing the > number of declarations the programmer must make is attainable with a > mechanism like this, but the latter goal would require a real > inferencing algo. I think it's silly to do a 1/2 assed job with an inference algorithm... I'd rather not have one at all. People don't want to memorize a rule beyond "if the type is ambiguous you must declare it explicitly". Honestly, type inferencing has its problems too. For example, code can infer a more general type than intended, etc. However, minimizing the effort of the programmer definitely seems more pythonesque. > > A couple of brief thoughts at this point... What should the type > checker do with inferred types? Place them directly into the code? > Place them in comments or a doc string so that they have no effect but > you can see what the checker did and double-check its work? > > maybe just print out inferred types if requested and the type checker > is a separate program? I dunno, I always thought it'd be cool if it spit out another copy of the code that's fully annotated, improving documentation and keeping the amount of work down. I actually prefer to see all type information, myself... > > I think it would be best to be able to say, "the in type is the out > type, and who cares about the type beyond that". Parametric > polymorphism to the rescue... the type could be <x> -> <x>, where <x> > indicates that x can be of any type. I think the syntax could be > better (I prefer ML's, which types identity as 'a->'a, but I hear > there's some resistance to it in this community... I'm going to use 'a > from now on because it looks more natural to me). One problem here is > that it probably isn't going to be possible to infer an accurate > principal polymorphic type for things in all cases (see above). I'll > have to give the worst cases some thought. > > The stab I have taken at writing a checker allows this kind of > polymorphism for functions. It uses ~ instead of '. Using a prefix > character is something I prefer as well, though I think there are much > more important issues to ponder, so I get on with it :) Use the tick, as it's widely accepted... assume the emacs mode problems can be fixed :) I take it you're not trying to infer principal types or anything complex yet... > > Another problem that falls out at this point is how to write the type > > of x when we've seen x call "foo".
> > Do we look at all classes across > > the application for ones with a foo method? And do we care if that > > solution precludes classes that dynamically add their "foo" method > > from applying? Ick. I'd prefer to avoid the concrete, and infer > > something like: > > > > << foo: None->'a >> > > > > Which would read: an object with a field foo of type function taking > > no arguments and returning anything. > > Do you think it's ok to require that the programmer declare something > about x in this case? If something like what you suggest is inferred > there, it seems like a class of errors might slip through that we > might not want to allow to slip. That's true with type inferencing, period. I think if we're going to have it, we should stick to regular rules instead of special casing stuff like this. I don't think that it's going to end up being a huge source of problems anyway. > > constrained polymorphism seems to solve lots of more complex static > type issues in clean ways. Do you know of any references on > implementations for constrained polymorphism? I'd like to look more > closely at how it's done elsewhere before taking a crack at it myself. An okay place to start is with the Day et al. paper on the website I gave you above. > I like the idea of discouraging OR'ing of types as much as possible. > There are two cases where I think it could come in very handy: first, > if the static type of None is considered a distinct type rather than > something like a null type, then there are lots of things that return > or produce type-of(None) OR something else. Examples are dict.get and > default arguments. After wrestling with this for a while, I've come > to think that the introduction of a null-type in a static type system > is more manageable for the programmer. Do you see other ways of > dealing with that, or how would you prefer those things be > handled? The other case is recursive data types such as trees, where > a node can contain either other nodes or leaves. I disagree with you here. For your first case, "None type": Many languages have void single-valued types without any need for OR-ing types. Remember, a type specifies the universe of possible values. For all object types, None is a valid value in that set of values. The void type is the set that only contains the value None. The void type is thus a subtype of all object types, and you get all the benefits of subtyping polymorphism. I don't see any problem of the sort you're talking about here, at all. For your second case, modeling a tree where a node contains either other nodes or leaves: There are far better ways to model the problem. First, most trees don't have nodes without values, but let's ignore that for a minute, and assume otherwise. The natural way to model this problem is with subtyping polymorphism, not with the OR-ing of types: class NodeBase: # Theoretically abstract. def print_tree(self): pass class NonLeafNode(NodeBase): left :- NodeBase right :- NodeBase def print_tree(self): left.print_tree() right.print_tree() typedef printable << print()-> None >> class LeafNode< T -> printable >(NodeBase): value :- T def print_tree(self): value.print() "T -> printable" should read something like "any type T that is printable" (constrained genericity). I still assert that OR-ing types should *not* be in a python type system. You're basically saying, "here are things that should require a runtime cast, but we're going to completely ignore that statically and dynamically". When are dynamic checks necessary?
Generally, you're trying to do something that can be written as an assignment. Since types are essentially sets, the LHS has to be a subset of the RHS in order for us to make the determination that an assignment will always yield an object of a legal type. If the LHS and RHS are disjoint (well, if None is the only shared value w/ object types), that should never be possible. If there is some overlap, then a dynamic cast is required. I'd *really* like to see it be the case that the only times that "foo" is in the same set of values as 12 is when parametric polymorphism is involved, and with the any type. I am aware that not allowing OR'd types makes things a bit harder for legacy code that people want to change to use the type system (such as the standard library). The one place where it's a big issue is with heterogeneous lists. However, I think it really reduces the power of a type system to allow a list to be typed (string|integer|file), etc. > > > > decl z :- integer > > typecase a: > > case (x :- integer, y :- integer) : z = x + y > > case (x :- integer,) : z = x > > case x :- integer : z = x > > > > That's not quite as bad. > > With regards to type casing possibly causing the language to slow down > by bringing the static type information into runtime, do you think it > would be reasonable to allow typecasing only on types that are easily > expressible in terms of the existing dynamic type system? It seems to > me that this approach would save a lot of work, limit the runtime > overhead, and discourage OR'ing all at once. Well, any time you have to dynamically cast there's going to be a performance hit. I'm not really worried about matching, though. You can do it fairly efficiently, plus it satisfies the principle of localized cost... the feature costs the programmer nothing unless he uses it. Also, I think that our most important goal here is to design the right type system for python, not to pick one that is very easy to implement... > > Have you read about expressing the above with "mytype"? that is: > > interface if_LinkedList: > def add(self, l :- mytype): > ... > > class LinkedList(if_LinkedList): > def add(self, l): > ... > > class BiDirectionalLinkedList(LinkedList): > ... > > the syntax is fairly simple, and 'mytype' just means the type of > instances of the class that implements the method. Well, first of all, let me say that covariance provides a much more elegant solution to the problem. Type variables are not as easy for the average programmer to understand. If type variables are done right, then they'd basically duplicate language features like genericity. No one wants to have non-orthogonal constructs, so we'd probably remove genericity, which would result in a type system that's more difficult to explain and use, plus not nearly as well understood. Another problem with using type variables to solve this problem is that it requires the programmer to anticipate how people will want to use their classes upfront. If you don't happen to use a type variable the first time you specify a parameter, derived classes cannot change the variance of the parameters without going back and modifying the original code. Plus, you guys haven't talked about any type variables except "mytype", which is not powerful enough to handle uses of covariance where the argument to a method is of some type other than the type for which the method is a member. > > It should be possible to implement this solution, and even do so
> > incrementally. There's another (better, less pessimistic) solution > called the global validity approach. The problem with it, IIRC, is > that the algorithm basically assumes that type checking goes on in a > "closed world" environment, where you're checking the entire system at > once. That probably isn't desirable. I wonder if there haven't been > refinements to this algorithm that allow it to work incrementally. > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall > rule, assuming that the catcall rule can actually work for Python. > > One possible approach to covariance of method parameters is to check > each method of a class against all the possible different types of > 'self'. This is what "stick" does, and it finds exactly the cases > where there are type errors. It does require more checking than a > general rule, and it does add complications to the problem of mixing > checked modules and unchecked ones across class inheritance, but it > does work. I'd be interested in any feedback on this approach you > have... To get it right, you would essentially be doing the same thing that the global validity approach does. In particular, you have the exact same problems in that a "closed world" assumption is required. Incremental checking is far more useful, and I think that the polymorphic catcall rule is simple enough (though not if you call it "polymorphic catcall" when you report an error to the end user!). John From scott@chronis.pobox.com Thu Mar 16 21:06:12 2000 From: scott@chronis.pobox.com (scott) Date: Thu, 16 Mar 2000 16:06:12 -0500 Subject: [Types-sig] A late entry In-Reply-To: <20000316112419.D3845@viega.org>; from John@list.org on Thu, Mar 16, 2000 at 11:24:19AM -0800 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> Message-ID: <20000316160612.A11488@chronis.pobox.com> On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote: > On Wed, Mar 15, 2000 at 09:26:28PM -0500, scott wrote: > > On Tue, Mar 14, 2000 at 05:54:52PM -0500, John Viega wrote: > > > > neat paper. any other references to throw at us? > > Tons, more than you would want to read. I think the most interesting > for you would be: > > Jens Palsberg and Michael Schwartzbach. Object-Oriented Type Systems. > John Wiley and Sons, 1994. ISBN 0-471-941288 > > You can also look at some more of the papers I've given students in > the past, some of which are downloadable from: > http://www.list.org/~viega/cs655/ > > In particular, the Unit 6 papers. The Cardelli and Wegner > paper is worthwhile. The Milner paper isn't there, and it's pretty > dense anyway. You'll be interested in Day et al. (which is a > reasonable starting place for constrained genericity). > > Out of the Unit 8 papers, I'd probably recommend only Agesen's beyond > what I've already given you. Abadi and Cardelli's is really dense and > not all that interesting/applicable. Castagna's is there to see if > students can find the big problems with the work... I guess I need to set aside some time to read :) > > > When we talk about adding runtime checks, it's a little unclear to me > > exactly what is meant. Are you referring to additional runtime checks > > that the existing dynamic type system in python does not provide? Is > > leveraging the existing dynamic type system in this way feasible for > > these sorts of checks? If this is possible, it is one approach I'd > > prefer -- there's no performance hit, and nothing that isn't checked.
> > The only drawback I can see to using the existing system as a fallback > > is that it would limit the degree of optimization that is available, > > but I believe that's OK. > > Any time you add a dynamic type check that wasn't there before, there > is a performance hit. I'm just saying that we should try hard to > minimize the number of dynamic checks, period. I don't get the feeling I was clear enough about what I'm saying re: dynamic checks, so I'll try again. Minimizing run time checks is something I very much agree with. I also agree that some are likely to be necessary. I think that where those checks are necessary, it seems to make sense to leverage python's existing type system to implement them, because that type system is already in place and there would be no need for python objects as they exist in C or java or whatever to carry around any additional information. For example, if we add a field to the list structure in C Python that contains the set of types contained in the list, then every time a del somelist[x] occurred, the extra information would have to be updated by potentially checking the entire list. If there was a way to make runtime checks work reasonably without this kind of extra weight, it seems worth pursuing to me. One way that seems feasible is to leverage the existing type system in python. For example, if we provided a hierarchical interface to the existing type system, and that hierarchy were mirrored in the static system, then dynamic checks or casts could be limited to those expressible in the hierarchy based on python's existing type system (where you can compare lists and tuples but not lists of ints and lists of strings). I hope that clarifies the idea... > > > It seems like you are referring to inferencing as a mechanism which can > > both allow the user to denote fewer types and serve as a means of dealing > > with the mixing of unchecked code with checked code. > > Honestly, without some type annotations, you're not likely to get very > far on the latter there. Of course :) I didn't mean to imply otherwise at all. > > > With regards to > > meeting the first goal, a very limited kind of inferencing is > > possible, where the first assignment to a variable from an expression > > of a given type has the same effect as declaring the variable as that > > type in the first place. I think that the former goal of reducing the > > number of declarations the programmer must make is attainable with a > > mechanism like this, but the latter goal would require a real > > inferencing algo. > > I think it's silly to do a 1/2 assed job with an inference > algorithm... I'd rather not have one at all. People don't want to > memorize a rule beyond "if the type is ambiguous you must declare it > explicitly". Honestly, type inferencing has its problems too. For > example, code can infer a more general type than intended, etc. > However, minimizing the effort of the programmer definitely seems more > pythonesque. In my own experience, I've never seen a type inferencer that succeeded at only requiring type annotations where they would otherwise be ambiguous; that includes ML. I spent more time guessing what the type inferencer considered ambiguous than anything else. The rule of thumb for where annotations are required would be easier for me to get if it didn't involve potentially vague notions like deciding what is and is not ambiguous. That's just me, though. [...] > > I take it you're not trying to infer principal types or anything > complex yet...
No, deduction would be a more accurate term. > > > > Another problem that falls out at this point is how to write the type > > > of x when we've seen x call "foo". Do we look at all classes across > > > the application for ones with a foo method? And do we care if that > > > solution precludes classes that dynamically add their "foo" method > > > from applying? Ick. I'd prefer to avoid the concrete, and infer > > > something like: > > > > > > << foo: None->'a >> > > > > > > Which would read: an object with a field foo of type function taking > > > no arguments and returning anything. > > > > Do you think it's ok to require that the programmer declare something > > about x in this case? If something like what you suggest is inferred > > there, it seems like a class of errors might slip through that we > > might not want to allow to slip. > > > That's true with type inferencing, period. I think if we're going to > have it, we should stick to regular rules instead of special casing > stuff like this. I don't think that it's going to end up being a huge > source of problems anyway. One of the things expressed earlier was the idea of leaving the implementation of inferencing until after the type system and checker were done. Does that order of events seem reasonable to you? > > I like the idea of discouraging OR'ing of types as much as possible. > > There are two cases where I think it could come in very handy: first, > > if the static type of None is considered a distinct type rather than > > something like a null type, then there are lots of things that return > > or produce type-of(None) OR something else. Examples are dict.get and > > default arguments. After wrestling with this for a while, I've come > > to think that the introduction of a null-type in a static type system > > is more manageable for the programmer. Do you see other ways of > > dealing with that, or how would you prefer those things be > > handled? The other case is recursive data types such as trees, where > > a node can contain either other nodes or leaves. > > I disagree with you here. The way I read what you say below, we're actually agreeing about having a special type for the value None; it seems to work best to me as a valid value in the set of values of any object type. That's what I meant by 'something like a null type' above. By doing this, you lose the ability of a type checker to distinguish when something should be None and when it should not, but this approach makes lots of things easier both for the programmer and the implementation of a static type system. > > For your first case, "None type": Many languages have void > single-valued types without any need for OR-ing types. Remember, a > type specifies the universe of possible values. For all object types, > None is a valid value in that set of values. The void type is the set > that only contains the value None. The void type is thus a subtype of > all object types, and you get all the benefits of subtyping polymorphism. > I don't see any problem of the sort you're talking about here, at all. > > For your second case, modeling a tree where a node contains either > other nodes or leaves: There are far better ways to model the problem. > First, most trees don't have nodes without values, but let's ignore > that for a minute, and assume otherwise. The natural way to model > this problem is with subtyping polymorphism, not with the OR-ing of > types: > > class NodeBase: # Theoretically abstract.
> def print_tree(self): pass > > class NonLeafNode(NodeBase): > left :- NodeBase > right :- NodeBase > def print_tree(self): > left.print_tree() > right.print_tree() > > typedef printable << print()-> None >> > class LeafNode< T -> printable >(NodeBase): > value :- T > def print_tree(self): > value.print() > > "T -> printable" should read something like "any type T that is > printable" (constrained genericity). This seems like a good approach, and if None is treated specially as above, then recursive types such as: typedef IntTree (IntTree, int, IntTree) aren't a problem either (at least in terms of the need for OR). > > I still assert that OR-ing types should *not* be in a python type > system. > >You're basically saying, "here are things that should require > a runtime cast, but we're going to completely ignore that statically > and dynamically". huh? You mean you feel that OR-ing creates this situation? In the worst case, I agree. I also have been searching for ways to eliminate or at least reduce OR-ing myself. It seems essentially bad for static systems. > > > > > When are dynamic checks necessary? Generally, you're trying to do > > something that can be written as an assignment. Since types are > > essentially sets, the LHS has to be a subset of the RHS in order for > > us to make the determination that an assignment will always yield an > > object of a legal type. > > You mean RHS must be a subset of the LHS, right? as in x :- numeric y :- int y = 1 x = y # this is ok, int is subset of numeric > If the LHS and RHS are disjoint (well, if > None is the only shared value w/ object types), that should never be > possible. If there is some overlap, then a dynamic cast is required. > I'd *really* like to see it be the case that the only times that "foo" > is in the same set of values as 12 is when parametric polymorphism is > involved, and with the any type. I don't think many will disagree with you there. > > I am aware that not allowing OR'd types makes things a bit harder for > legacy code that people want to change to use the type system (such as > the standard library). The one place where it's a big issue is with > heterogeneous lists. However, I think it really reduces the power of a > type system to allow a list to be typed (string|integer|file), etc. > > > > > > > > decl z :- integer > > > typecase a: > > > case (x :- integer, y :- integer) : z = x + y > > > case (x :- integer,) : z = x > > > case x :- integer : z = x > > > > > > That's not quite as bad. > > > > With regards to type casing possibly causing the language to slow down > > by bringing the static type information into runtime, do you think it > > would be reasonable to allow typecasing only on types that are easily > > expressible in terms of the existing dynamic type system? It seems to > > me that this approach would save a lot of work, limit the runtime > > overhead, and discourage OR'ing all at once. > > Well, any time you have to dynamically cast there's going to be a > performance hit. I'm not really worried about matching, though. You > can do it fairly efficiently, plus it satisfies the principle of > localized cost... the feature costs the programmer nothing unless he > uses it. This is true so long as extra information isn't carried around and kept up to date at run time in order to make matching more efficient. > Also, I think that our most important goal here is to design > the right type system for python, not to pick one that is very easy to > implement... agreed.
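one way to picture the subset rule is with issubclass standing in for "the set of RHS values is contained in the set of LHS values". A tiny sketch, assuming the numbers module's Number class as a stand-in for the 'numeric' type above (the function name is invented):

    import numbers

    def assignment_ok(lhs_type, rhs_type):
        # an assignment is statically safe when the set of RHS values
        # is a subset of the set of LHS values
        return issubclass(rhs_type, lhs_type)

    print(assignment_ok(numbers.Number, int))  # True:  x :- numeric; x = y with y :- int
    print(assignment_ok(int, numbers.Number))  # False: would require a dynamic cast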
There seems to be an initially steep learning curve to designing type systems. I know it's been that way for me. At some point in the near future I hope to have read and learned enough to take another stab at implementing an optional static type system for python. > > > > > > > Have you read about expressing the above with "mytype"? that is: > > > > interface if_LinkedList: > > def add(self, l :- mytype): > > ... > > > > class LinkedList(if_LinkedList): > > def add(self, l): > > ... > > > > class BiDirectionalLinkedList(LinkedList): > > ... > > > > the syntax is fairly simple, and 'mytype' just means the type of > > instances of the class that implements the method. > > Well, first of all, let me say that covariance provides a much more > elegant solution to the problem. Type variables are not as easy for > the average programmer to understand. If type variables are done > right, then they'd basically duplicate language features like > genericity. No one wants to have non-orthogonal constructs, so we'd > probably remove genericity, which would result in a type system that's > more difficult to explain and use, plus not nearly as well understood. > > Another problem with using type variables to solve this problem is > that it requires the programmer to anticipate how people will want to > use their classes upfront. If you don't happen to use a type variable > the first time you specify a parameter, derived classes cannot change > the variance of the parameters without going back and modifying the > original code. Plus, you guys haven't talked about any type variables > except "mytype", which is not powerful enough to handle uses of > covariance where the argument to a method is of some type other than > the type for which the method is a member. time to take a closer look at constrained parametric polymorphism :) > > > > It should be possible to implement this solution, and even do so > > > incrementally. There's another (better, less pessimistic) solution > > > called the global validity approach. The problem with it, IIRC, is > > > that the algorithm basically assumes that type checking goes on in a > > > "closed world" environment, where you're checking the entire system at > > > once. That probably isn't desirable. I wonder if there haven't been > > > refinements to this algorithm that allow it to work incrementally. > > > Therefore, I'd definitely prefer covariance w/ a polymorphic catcall > > > rule, assuming that the catcall rule can actually work for Python. > > > > One possible approach to covariance of method parameters is to check > > each method of a class against all the possible different types of > > 'self'. This is what "stick" does, and it finds exactly the cases > > where there are type errors. It does require more checking than a > > general rule, and it does add complications to the problem of mixing > > checked modules and unchecked ones across class inheritance, but it > > does work. I'd be interested in any feedback on this approach you > > have... > > To get it right, you would essentially be doing the same thing that > the global validity approach does. In particular, you have the exact > same problems in that a "closed world" assumption is required. > Incremental checking is far more useful, and I think that the > polymorphic catcall rule is simple enough (though not if you call it > "polymorphic catcall" when you report an error to the end user!). I'll look into that more, as well as potential means for making the global validity approach work more incrementally.
thanks, scott From bwarsaw@cnri.reston.va.us Thu Mar 16 21:36:12 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 16 Mar 2000 16:36:12 -0500 (EST) Subject: [Types-sig] A late entry References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> Message-ID: <14545.21452.817822.231182@anthem.cnri.reston.va.us> >>>>> "JV" == John Viega writes: JV> Use the tick, as it's widely accepted... assume the emacs mode JV> problems can be fixed :) I'll just pipe in here. :) If you use a tick, you will break python-mode and I predict that it will never be fixed, because what's really happening is that you're breaking some fundamental assumptions that X/Emacs makes about code. Trust me on this. Why do you think Perl added `::' ? Not /just/ to make C++ programmers more comfortable. -Barry From John@list.org Thu Mar 16 22:00:56 2000 From: John@list.org (John Viega) Date: Thu, 16 Mar 2000 14:00:56 -0800 Subject: [Types-sig] A late entry In-Reply-To: <20000316160612.A11488@chronis.pobox.com>; from scott on Thu, Mar 16, 2000 at 04:06:12PM -0500 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <20000316160612.A11488@chronis.pobox.com> Message-ID: <20000316140056.F3845@viega.org> On Thu, Mar 16, 2000 at 04:06:12PM -0500, scott wrote: > On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote: > > I don't get the feeling I was clear enough about what I'm saying re: > dynamic checks, so I'll try again. Minimizing run time checks is > something I very much agree with. I also agree that some are likely > to be necessary. I think that where those checks are necessary, it > seems to make sense to leverage python's existing type system to > implement them, because that type system is already in place and there > would be no need for python objects as they exist in C or java or > whatever to carry around any additional information. I didn't disagree with this. It seems rather obvious to try to avoid duplicating work if it's not necessary. > For example, if we add a field to the list structure in C Python that > contains the set of types contained in the list, then every time a del > somelist[x] occurred, the extra information would have to be updated by > potentially checking the entire list. If there was a way to make > runtime checks work reasonably without this kind of extra weight, it > seems worth pursuing to me. One way that seems feasible is to > leverage the existing type system in python. Ack! Set of types bad! The only problem with disallowing sets of types (OR-ing types) is that legacy code will either need to be rewritten or use "any", where it might be nice to get a bit more accurate than "any". I don't think it's worth supporting the feature for this reason alone, as it has way too many bad consequences, and the good consequences aren't that good. > For example, if we provided a hierarchical interface to the existing > type system, and that hierarchy were mirrored in the static system, > then dynamic checks or casts could be limited to those expressible in > the hierarchy based on python's existing type system (where you can > compare lists and tuples but not lists of ints and lists of strings). Oh, I don't think I agree with you here. I think it's fine to leverage off the existing type system where possible, but you need to be able to add dynamic checks any time you can determine statically that something may or may not need to give a type error at runtime.
If the dynamic checks needed can't be expressed in Python's dynamic type system currently, then the dynamic system will have to be changed. I wouldn't worry too much about this problem, though. If you have a good static type system, the number of dynamic checks that get added will generally be pretty low. The code in question can be added without changes to Python's current type system. > > > > I think it's silly to do a 1/2 assed job with an inference > > algorithm... I'd rather not have one at all. People don't want to > > memorize a rule beyond "if the type is ambiguous you must declare it > > explicitly". Honestly, type inferencing has its problems too. For > > example, code can infer a more general type than intended, etc. > > However, minimizing the effort of the programmer definitely seems more > > pythonesque. > > In my own experience, I've never seen a type inferencer that succeeded > at only requiring type annotations where they would otherwise be > ambiguous; that includes ML. I spent more time guessing what the type > inferencer considered ambiguous than anything else. The rule of > thumb for where annotations are required would be easier for me to get > if it didn't involve potentially vague notions like deciding what is > and is not ambiguous. That's just me, though. ML is unification-based. Depending on what kind of implementation of unification is used, there can be some bizarre corner cases (the correct implementation is harder/less efficient). But generally, if the code you wrote was actually unambiguous, then ML can infer the principal type. In that respect, I do not believe you are correct. However, I will agree that it can take people a while to develop a good mental model of what an inferencer is actually going to deduce. Like I said before, I'm perfectly comfortable with requiring explicit types in order to type check a program. But there seems to be some interest in having the automation, even despite the drawbacks. If that is the case, I'd rather see something powerful and well-designed with few restrictions than something ad hoc with many restrictions. > One of the things expressed earlier was the idea of leaving the > implementation of inferencing until after the type system and checker > were done. Does that order of events seem reasonable to you? Well, if you design things right, your inferencer can leverage some of the infrastructure of your checker quite effectively. It's definitely the preferred ordering. > The way I read what you say below, we're actually agreeing about > having a special type for the value None; it seems to work best to me > as a valid value in the set of values of any object type. That's what > I meant by 'something like a null type' above. By doing this, you > lose the ability of a type checker to distinguish when something > should be None and when it should not, but this approach makes lots of > things easier both for the programmer and the implementation of a > static type system. No, most languages have a rule that variables cannot have the void type as their principal type. This is no reason to allow OR-ing of types. Note how it isn't an issue in pretty much every other language, either. So I still don't see what you are seeing that forces an OR construct. > > For your second case, modeling a tree where a node contains either > > other nodes or leaves: There are far better ways to model the problem. > > First, most trees don't have nodes without values, but let's ignore > > that for a minute, and assume otherwise.
> > The natural way to model > > this problem is with subtyping polymorphism, not with the OR-ing of > > types: > > > > class NodeBase: # Theoretically abstract. > > def print_tree(self): pass > > > > class NonLeafNode(NodeBase): > > left :- NodeBase > > right :- NodeBase > > def print_tree(self): > > left.print_tree() > > right.print_tree() > > > > typedef printable << print()-> None >> > > class LeafNode< T -> printable >(NodeBase): > > value :- T > > def print_tree(self): > > value.print() > > > > "T -> printable" should read something like "any type T that is > > printable" (constrained genericity). > > This seems like a good approach, and if None is treated specially as > above, then recursive types such as: > > typedef IntTree (IntTree, int, IntTree) > > aren't a problem either (at least in terms of the need for OR). I got confused here for a second... this looks too much like an actual tuple use for me. :) Whatever syntax ends up getting used, can we use (x*y*z) to refer to the 3-tuple where arg 0 is of type x, 1 of y and 2 of z? That's a very common notation. > > I still assert that OR-ing types should *not* be in a python type > > system. > > > >You're basically saying, "here are things that should require > > a runtime cast, but we're going to completely ignore that statically > > and dynamically". > > huh? You mean you feel that OR-ing creates this situation? In the > worst case, I agree. I also have been searching for ways to eliminate > or at least reduce OR-ing myself. It seems essentially bad for static > systems. Yes, OR-ing creates that situation. I've definitely been arguing that it makes your static checking less precise, forcing many type errors to be caught at runtime. I think that "any" should be the only shady construct here, personally. It should suffice for supporting code that was untyped as written. There is no need for an OR construct; it can and should be eliminated. I'll be very disappointed if it makes it into the Python type system :) > > > > > When are dynamic checks necessary? Generally, you're trying to do > > something that can be written as an assignment. Since types are > > essentially sets, the LHS has to be a subset of the RHS in order for > > us to make the determination that an assignment will always yield an > > object of a legal type. > > You mean RHS must be a subset of the LHS, right? Of course; my typo. > > Well, any time you have to dynamically cast there's going to be a > > performance hit. I'm not really worried about matching, though. You > > can do it fairly efficiently, plus it satisfies the principle of > > localized cost... the feature costs the programmer nothing unless he > > uses it. > > This is true so long as extra information isn't carried around and > kept up to date at run time in order to make matching more efficient. Well, you're going to want to keep complete type information around for the runtime to use anyway, so that doesn't really matter. > > Well, first of all, let me say that covariance provides a much more > > elegant solution to the problem. Type variables are not as easy for > > the average programmer to understand. If type variables are done > > right, then they'd basically duplicate language features like > > genericity. No one wants to have non-orthogonal constructs, so we'd > > probably remove genericity, which would result in a type system that's > > more difficult to explain and use, plus not nearly as well understood.
> > > > Another problem with using type variables to solve this problem is > > > that it requires the programmer to anticipate how people will want to > > > use their classes upfront. If you don't happen to use a type variable > > > the first time you specify a parameter, derived classes cannot change > > > the variance of the parameters without going back and modifying the > > > original code. Plus, you guys haven't talked about any type variables > > > except "mytype", which is not powerful enough to handle uses of > > > covariance where the argument to a method is of some type other than > > > the type for which the method is a member. > > > > time to take a closer look at constrained parametric polymorphism :) This isn't the best solution, IMHO. I advocate covariance + a poly catcall rule. > > To get it right, you would essentially be doing the same thing that > > the global validity approach does. In particular, you have the exact > > same problems in that a "closed world" assumption is required. > > Incremental checking is far more useful, and I think that the > > polymorphic catcall rule is simple enough (though not if you call it > > "polymorphic catcall" when you report an error to the end user!). > > I'll look into that more, as well as potential means for making the > global validity approach work more incrementally. There's been some work done in this area, but nothing that had actually been implemented last I checked. Why is it necessary? What do you have against a poly catcall rule? John From John@list.org Thu Mar 16 22:12:31 2000 From: John@list.org (John Viega) Date: Thu, 16 Mar 2000 14:12:31 -0800 Subject: [Types-sig] A late entry In-Reply-To: <14545.21452.817822.231182@anthem.cnri.reston.va.us>; from Barry A. Warsaw on Thu, Mar 16, 2000 at 04:36:12PM -0500 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <14545.21452.817822.231182@anthem.cnri.reston.va.us> Message-ID: <20000316141231.G3845@viega.org> As far as I know, the assumption is that you won't be able to approximate the grammar with regexp-based matching. You already can't do it perfectly, of course. In practice, you just have to change your regular expressions around in contexts where people can specify a type. For example, if you looked at a def line and just said "find all quote pairs and format the crap in between", you'd have to get more complex: "find the next argument, and format pairs of ticks, if a pair is found up to a :-". I haven't ever written a font-lock mode, so I don't know what the interface to emacs primitives for this stuff looks like. And thus I might be wrong; it may turn out to be really difficult. It's definitely possible, though, even if font-lock had to essentially be recreated from scratch with better primitives :) Now that might not be worth the effort, but I'd like to assume that the emacs problems can be fixed and are worth fixing to gain a syntax that's more natural to people who are actually familiar with this stuff. At least, let's please do that for the sake of discussing these concepts in this thread, because I get confused very easily :) John On Thu, Mar 16, 2000 at 04:36:12PM -0500, Barry A. Warsaw wrote: > > >>>>> "JV" == John Viega writes: > > JV> Use the tick, as it's widely accepted... assume the emacs mode > JV> problems can be fixed :) > > I'll just pipe in here.
> :) > > If you use a tick, you will break python-mode and I predict that it > will never be fixed, because what's really happening is that you're > breaking some fundamental assumptions that X/Emacs makes about code. > Trust me on this. Why do you think Perl added `::' ? Not /just/ to > make C++ programmers more comfortable. > > -Barry From scott@chronis.pobox.com Thu Mar 16 22:51:08 2000 From: scott@chronis.pobox.com (scott) Date: Thu, 16 Mar 2000 17:51:08 -0500 Subject: [Types-sig] A late entry In-Reply-To: <20000316140056.F3845@viega.org>; from John@list.org on Thu, Mar 16, 2000 at 02:00:56PM -0800 References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <20000316160612.A11488@chronis.pobox.com> <20000316140056.F3845@viega.org> Message-ID: <20000316175108.C14723@chronis.pobox.com> On Thu, Mar 16, 2000 at 02:00:56PM -0800, John Viega wrote: > On Thu, Mar 16, 2000 at 04:06:12PM -0500, scott wrote: > > On Thu, Mar 16, 2000 at 11:24:19AM -0800, John Viega wrote: > > The way I read what you say below, we're actually agreeing about > > having a special type for the value None; it seems to work best to me > > as a valid value in the set of values of any object type. That's what > > I meant by 'something like a null type' above. By doing this, you > > lose the ability of a type checker to distinguish when something > > should be None and when it should not, but this approach makes lots of > > things easier both for the programmer and the implementation of a > > static type system. > > No, most languages have a rule that variables cannot have the void > type as their principal type. This is no reason to allow OR-ing of > types. Note how it isn't an issue in pretty much every other > language, either. So I still don't see what you are seeing that > forces an OR construct. I'm not saying anything forces an OR construct, or even that one is a good idea. I'm just trying to get the implications straight. One of the implications of treating the type of None as a principal type is that a static type checker will be able to say "hey, x might be None, but you're assuming it's a string!" in code like the following: x = {'foo': 'bar'}.get('baz') x = x + '' That's a good thing, and allowing the type of x to be 'None | string' combined with the necessity of typecasing the result is one way of having a static type system understand this error. But, as you argue, there are lots of tradeoffs to consider with allowing OR's. I personally don't think that it's worth it to introduce OR's for cases like this (I used to, but trying to build a static type system with this construct made me change my mind). The idea does have its proponents, so it's definitely worth mentioning that there is a tradeoff to be considered. > > This seems like a good approach, and if None is treated specially as > > above, then recursive types such as: > > > > typedef IntTree (IntTree, int, IntTree) > > > > aren't a problem either (at least in terms of the need for OR). > > I got confused here for a second... this looks too much like an actual > tuple use for me. :) Whatever syntax ends up getting used, can we use > (x*y*z) to refer to the 3-tuple where arg 0 is of type x, 1 of y and 2 > of z? That's a very common notation. hmm. I don't want to get into syntax wars. I'll use whatever syntax you like for this discussion. The reason I used the above syntax is that it was proposed and used lots in previous discussions. [...]
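to spell out the dict.get case: here is exactly what goes wrong at runtime today, and the guard a checker would in effect be demanding (the guard is just a sketch of one possible fix):

    x = {'foo': 'bar'}.get('baz')   # 'baz' is absent, so x is None
    try:
        x = x + ''                  # "assuming it's a string"
    except TypeError:
        print("runtime type error: x was None, not a string")

    # the guarded version a static checker could insist on:
    x = {'foo': 'bar'}.get('baz')
    if x is None:
        x = ''
    x = x + ''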
From bwarsaw@cnri.reston.va.us Thu Mar 16 22:58:56 2000
From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us)
Date: Thu, 16 Mar 2000 17:58:56 -0500 (EST)
Subject: [Types-sig] A late entry
References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <14545.21452.817822.231182@anthem.cnri.reston.va.us> <20000316141231.G3845@viega.org>
Message-ID: <14545.26416.628396.276382@anthem.cnri.reston.va.us>

font-lock is only one of the problems, and even it is only partially
driven by regexps. There are C primitives that handle things like
parsing over a string, comment, or s-expression. These cannot be
taught that a non-embedded unescaped tick opens a string sometimes but
not other times. Having actually implemented support for dual-comment
styles in a single buffer (i.e. /*...*/ and //...\n), I can tell you
that this stuff is really, really tricky. So somebody (not me :) is
either going to have to rewrite the syntax parsing model and
primitives from scratch, or throw out anything that actually uses the
primitives. Font-locking is one thing, but there may be more subtle
breakages. I have no idea how well cperl-mode handles this stuff, but
they've already been down that road.

You could also argue that Py3K shouldn't have to cater to a 20 year
old technology like Emacs, and you'd probably be right. I'd still
grumble though :) I'd also be interested in seeing what the IDLE
developers think about such syntax changes.

My prediction stands: it'll never get done, even if it were possible.
Meaning, I really don't think it's worth the effort, and I can't
imagine anybody actually spending the time to do it.

>>>>> "JV" == John Viega writes:

JV> Now that might not be worth the effort, but I'd like to assume
JV> that the emacs problems can be fixed and are worth fixing to
JV> gain a syntax that's more natural to people who are actually
JV> familiar with this stuff. At least, let's please do that for
JV> the sake of discussing these concepts in this thread, because
JV> I get confused very easily :)

Sure! Use whatever notation makes sense for the current discussions.
Your first point is more interesting, because I don't think /any/ of
these typing issues will seem natural to the vast majority of Python
hackers. I could be wrong, and besides you know my biases already, so
I'll shut up now :).

-Barry
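[Editorial sketch: the kind of regexp approximation John and Barry are
debating, in Python rather than Emacs Lisp. The pattern and the sample
lines are invented; note that the same pattern happily fires inside an
ordinary string literal, which is exactly the ambiguity Barry
describes.]

    import re

    # Inside a context where types may appear, treat a tick followed
    # by an identifier as a type variable rather than a string open.
    TYPEVAR = re.compile(r"'([A-Za-z_][A-Za-z_0-9]*)")

    print(TYPEVAR.findall("def map(f :- 'a -> 'b, xs :- ['a]) -> ['b]: ..."))
    # -> ['a', 'b', 'a', 'b']

    # ...but the approximation breaks down on real strings:
    print(TYPEVAR.findall("s = 'hello'"))
    # -> ['hello']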
From tim_one@email.msn.com Fri Mar 17 06:24:09 2000
From: tim_one@email.msn.com (Tim Peters)
Date: Fri, 17 Mar 2000 01:24:09 -0500
Subject: [Types-sig] A late entry
In-Reply-To: <14545.26416.628396.276382@anthem.cnri.reston.va.us>
Message-ID: <000301bf8fd9$6798b700$682d153f@tim>

[Barry Warsaw, explaining the problems 'a would create for python-mode.el]
> ...
> You could also argue that Py3K shouldn't have to cater to a 20 year
> old technology like Emacs, and you'd probably be right.

P3K should cater to Python programmers, though!

    'a

simply looks like an unterminated string, regardless of whether pymode
or Python programmers are looking at it. The second most likely bad
interpretation will come from Lispers, viewing it as a symbol. So it's
simply poor notation for Python. 'a' would work, though! Haskell uses
unadorned letters for type parameters (i.e., the ML convention isn't
universal even among its relatives) -- but Haskell doesn't have inline
function declarations.

> ...
> I'd also be interested in seeing what the IDLE developers think about
> such syntax changes.

I expect that context-sensitive literal syntax is a non-starter
regardless of tool (don't forget PythonWorks, and tokenize.py, and
pyclbr.py, and kjlint, and untold mounds of homegrown stuff that also
expects apostrophe to mean string).

unary-plus-is-pretty-much-unused-ly y'rs - tim

PS: [John Viega]
> ...
> As far as I know, the assumption is that you won't be able to
> approximate the grammar with regexp-based matching. You already can't
> do it perfectly, of course.

pymode cannot because of its reliance on the Emacs parsing functions.
But IDLE's regexp-based parsing is believed to be 100% correct(*).
Ditto tokenize.py's.

Don't get hung up on the spelling! As someone else wise once said,
Guido is a master of syntax, and will pick something *he* likes
regardless of what we recommend <0.9 wink>.

(*) For a value of 100 strictly less than 100, but equal to 100 for
the almost-inclusive subset of Python's full grammar Guido doesn't
regret:

    >>> i = 3and 4

is mis-colorized by IDLE, and that's the way Guido wants it (in order,
of course, to discourage it).
From jeremy-home@cnri.reston.va.us Fri Mar 17 16:56:42 2000
From: jeremy-home@cnri.reston.va.us (Jeremy Hylton)
Date: Fri, 17 Mar 2000 11:56:42 -0500 (EST)
Subject: [Types-sig] Re: A late entry
In-Reply-To: <200003171631.LAA23287@ns1.cnri.reston.va.us>
References: <200003171631.LAA23287@ns1.cnri.reston.va.us>
Message-ID: <14546.24663.399881.121026@walden>

> I think it would be good to allow parameter-based method overloading
> for people who use the type system. You'd be allowed to do stuff like:
>
> class Formatter:
>     def print(self: Formatter, i : integer)->None: ...
>     def print(self: Formatter, s : string)->None: ...
>     def print(self: Formatter, l : ['a])->None: ...

I have been uneasy about OR types, too. I think the primary source of
OR-ing is default arguments and various methods that implement
Pythonic method overloading. If we allow method overloading in the
type system -- to describe multiple valid signatures of a single
method object -- we might eliminate many of the problems.

    class Foo:
        decl __init__(self, arg1: int, arg2: int)
        decl __init__(self, arg1: string)
        def __init__(self, arg1=None, arg2=None):
            [...]

I think this is a little simpler than the proposal you made. It merely
provides a mechanism to define simple types for existing Python code.

The other significant source of OR types is treating None as a
distinct type, which requires an OR type anywhere that you want to
pass an object or None. If we also eliminate that, there is little
need for OR types.

-- Jeremy Hylton
From John@list.org Sat Mar 18 01:26:08 2000
From: John@list.org (John Viega)
Date: Fri, 17 Mar 2000 17:26:08 -0800
Subject: [Types-sig] A late entry
In-Reply-To: <20000316175108.C14723@chronis.pobox.com>; from scott on Thu, Mar 16, 2000 at 05:51:08PM -0500
References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <20000316160612.A11488@chronis.pobox.com> <20000316140056.F3845@viega.org> <20000316175108.C14723@chronis.pobox.com>
Message-ID: <20000317172608.A12852@viega.org>

On Thu, Mar 16, 2000 at 05:51:08PM -0500, scott wrote:
> On Thu, Mar 16, 2000 at 02:00:56PM -0800, John Viega wrote:
>
> I'm not saying anything forces an OR construct, or even that one is a
> good idea. I'm just trying to get the implications straight. One of
> the implications of treating the type of None as a principal type is
> that a static type checker will be able to say "hey, x might be None,
> but you're assuming it's a string!" in code like the following:
>
> x = {'foo': 'bar'}.get('baz')
> x = x + ''
>
> That's a good thing, and allowing the type of x to be 'None | string'
> combined with the necessity of typecasing the result is one way of
> having a static type system understand this error.

It's one way. Most languages don't find the problem worth fixing
statically, of course. There's another, more natural way of modeling
the problem that does allow for static checking of this sort of thing.
Basically, for each object type, you have a second type which is
identical, with the exception that it can never hold null. We'd then
be able to give a type warning in your example above. The problem
there is that you have to explicitly typecase (or typecast) every time
you make a call to get() and get back a valid result. No language
bothers here. Most languages just do an analysis, and try to figure
out which uses might perform illegal operations on the void value,
giving errors at only those points. That kind of analysis can be done
statically, and leverages the type system, even though it doesn't have
any obvious manifestations in the syntax.

> > I got confused here for a second... this looks too much like an actual
> > tuple use for me. :) Whatever syntax ends up getting used, can we use
> > (x*y*z) to refer to the 3-tuple where arg 0 is of type x, 1 of y and 2
> > of z? That's a very common notation.
>
> hmm. I don't want to get into syntax wars. I'll use whatever syntax
> you like for this discussion. The reason I used the above syntax is
> that it was proposed and used lots in previous discussions.

I'm not so much concerned about the final syntax... I just got
confused seeing something that looked like a tuple, and not a type :)

> > Well, you're going to want to keep complete type information around
> > for the runtime to use anyway, so that doesn't really matter.
>
> yikes. I was hoping to avoid that, as it implies a major leap in
> difficulty. It also seems like making this info available at runtime
> may imply that a static type system isn't really feasible until
> PY3000, which is a little disappointing. We'll see, I guess :)

Why is that? I don't see a major leap in difficulty myself...

> > There's been some work done in this area, but nothing that had
> > actually been implemented last I checked. Why is it necessary?
>
> I'm just trying to understand the options; I don't have the knowledge
> to rule a whole lot of things out at this point, and I feel the need
> to understand things well enough to make those calls for myself.

Fair enough.

> > What
> > do you have against a poly catcall rule?
>
> Nothing except that I don't know enough about it yet. Just found a
> good description at
> http://www.eiffel.com/doc/manuals/technology/typing/paper/page.html
>
> like I said before, I need to set aside more time to read!

Now that I'm thinking about it, another good thing to read is Meyer's
chapter on types in "Object-Oriented Software Construction" 2nd ed.

John

From John@list.org Sat Mar 18 01:27:08 2000
From: John@list.org (John Viega)
Date: Fri, 17 Mar 2000 17:27:08 -0800
Subject: [Types-sig] A late entry
In-Reply-To: <14545.26416.628396.276382@anthem.cnri.reston.va.us>; from bwarsaw@cnri.reston.va.us on Thu, Mar 16, 2000 at 05:58:56PM -0500
References: <38CEC33C.6AC199A1@viega.org> <20000315212628.A99258@chronis.pobox.com> <20000316112419.D3845@viega.org> <14545.21452.817822.231182@anthem.cnri.reston.va.us> <20000316141231.G3845@viega.org> <14545.26416.628396.276382@anthem.cnri.reston.va.us>
Message-ID: <20000317172708.B12852@viega.org>

On Thu, Mar 16, 2000 at 05:58:56PM -0500, bwarsaw@cnri.reston.va.us wrote:
>
> My prediction stands: it'll never get done, even if it were possible.
> Meaning, I really don't think it's worth the effort, and I can't
> imagine anybody actually spending the time to do it.

You are probably not wrong :)

John
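[Editorial sketch: one conceivable shape for the "complete type
information around for the runtime" that John and scott argue about
above. Everything here -- the decl_sig attribute, its layout, the
check_call helper -- is invented for illustration and is not part of
any proposal.]

    # Stash a declared signature on the function object, so a runtime
    # could consult it where checked and unchecked code meet.
    def gcd(a, b):
        while b:
            a, b = b, a % b
        return a

    gcd.decl_sig = (('int', 'int'), 'int')   # argument types -> result

    def check_call(func, *args):
        # a toy runtime check against the stashed signature
        argtypes, _ = func.decl_sig
        for value, tname in zip(args, argtypes):
            if type(value).__name__ != tname:
                raise TypeError('%r does not have type %s' % (value, tname))
        return func(*args)

    print(check_call(gcd, 12, 18))   # -> 6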
From John@list.org Sat Mar 18 01:55:03 2000
From: John@list.org (John Viega)
Date: Fri, 17 Mar 2000 17:55:03 -0800
Subject: [Types-sig] A late entry
In-Reply-To: <000301bf8fd9$6798b700$682d153f@tim>; from Tim Peters on Fri, Mar 17, 2000 at 01:24:09AM -0500
References: <14545.26416.628396.276382@anthem.cnri.reston.va.us> <000301bf8fd9$6798b700$682d153f@tim>
Message-ID: <20000317175503.C12852@viega.org>

On Fri, Mar 17, 2000 at 01:24:09AM -0500, Tim Peters wrote:
> [Barry Warsaw, explaining the problems 'a would create for python-mode.el]
> > ...
> > You could also argue that Py3K shouldn't have to cater to a 20 year
> > old technology like Emacs, and you'd probably be right.
>
> P3K should cater to Python programmers, though!
>
>     'a
>
> simply looks like an unterminated string, regardless of whether pymode or
> Python programmers are looking at it.

The syntactic context in which types are used is more than different
enough for programmers, IMHO. I think you'd have a much bigger problem
with "=" and "==". My personal goal for the syntax is to choose
something natural to people coming in with some familiarity with the
concepts, while avoiding anything really ugly.

I think it's fair to say the tick isn't very obvious to anyone not
familiar with ML and similar languages. I think the formatting
problems are a fair objection, too. I think <> is probably going to be
a bit more familiar to people, but looks horrible in source. Hey, how
about the backtick? :) Using that would mess up people's emacs
formatting in far fewer cases, if you tell emacs mode that ` is just a
regular old operator :) Seriously, though, it's probably not all that
likely that a syntax will be chosen that's all things to all people,
and that's to be expected.

> The second most likely bad
> interpretation will come from Lispers, viewing it as a symbol. So it's
> simply poor notation for Python. 'a' would work, though!

That's almost like saying that people with a lisp background are going
to think there's a function call every time they see a left
parenthesis. Plus, there are definitely reasons why the tick is common
between types in ML and symbols in lisp. They both represent
abstractions away from concrete values. Granted, there are some
significant semantic differences, though :)

> Haskell uses
> unadorned letters for type parameters (i.e., the ML convention isn't
> universal even among its relatives) -- but Haskell doesn't have inline
> function declarations.

Right... there are good reasons why Haskell doesn't need such
syntactic garbage. You're also right that there isn't a universal
syntax. ML's is pretty widely known as far as it goes, but the FP
community is really small. There are all sorts of syntaxes for type
variables... I know some languages start with "*" and then keep adding
"*"'s as they need more vars in a type.

> > ...
> > I'd also be interested in seeing what the IDLE developers think about
> > such syntax changes.
>
> I expect that context-sensitive literal syntax is a non-starter regardless
> of tool (don't forget PythonWorks, and tokenize.py, and pyclbr.py, and
> kjlint, and untold mounds of homegrown stuff that also expects apostrophe to
> mean string).

Nooo, using a tick in type expressions doesn't do anything to affect
whether that particular piece of syntax is regular, context-free or
context-sensitive. Context-sensitive is definitely wrong; the entire
type syntax I proposed is context-free, and not because of the ticks.
Replace the ' with a `, a +, or a ~, and nothing has changed. The
presence of the :- segregates the type quite distinctly... the tick
itself doesn't even move the syntax into the context-free world...
it's the ability to nest types (e.g., ('x ('y 'z ('x))) ) that brings
the type language into the realm of the non-regular.

> Don't get hung up on the spelling! As someone else wise once said, Guido is
> a master of syntax, and will pick something *he* likes regardless of what we
> recommend <0.9 wink>.

In this particular case, I'm not really attached to the syntax; I'm
just hard-pressed to come up with an alternative that isn't at least
as ugly. Point well taken, however.

John
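[Editorial sketch: John's point that nesting, not the tick, is what
pushes the type language out of the regular (pure-regexp) world. The
toy reader below handles his ('x ('y 'z ('x))) example; the mini-
grammar and the parse function are invented for illustration.]

    # Balanced nesting needs recursion (or a stack), which no single
    # regular expression can supply.
    def parse(tokens, i=0):
        node = []
        while i < len(tokens):
            t = tokens[i]
            if t == '(':
                sub, i = parse(tokens, i + 1)   # recurse on a nested type
                node.append(sub)
            elif t == ')':
                return node, i + 1
            else:
                node.append(t)                  # a type variable like 'x
                i += 1
        return node, i

    tokens = "( 'x ( 'y 'z ( 'x ) ) )".split()
    print(parse(tokens)[0])   # -> [["'x", ["'y", "'z", ["'x"]]]]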
From John@list.org Sun Mar 19 14:30:35 2000
From: John@list.org (John Viega)
Date: Sun, 19 Mar 2000 06:30:35 -0800
Subject: [Types-sig] Re: A late entry
In-Reply-To: <14546.24663.399881.121026@walden>; from Jeremy Hylton on Fri, Mar 17, 2000 at 11:56:42AM -0500
References: <200003171631.LAA23287@ns1.cnri.reston.va.us> <14546.24663.399881.121026@walden>
Message-ID: <20000319063035.A16949@viega.org>

On Fri, Mar 17, 2000 at 11:56:42AM -0500, Jeremy Hylton wrote:
>
> I have been uneasy about OR types, too. I think the primary source of
> OR-ing is default arguments and various methods that implement
> Pythonic method overloading. If we allow method overloading in the
> type system -- to describe multiple valid signatures of a single
> method object -- we might eliminate many of the problems.
>
> class Foo:
>     decl __init__(self, arg1: int, arg2: int)
>     decl __init__(self, arg1: string)
>     def __init__(self, arg1=None, arg2=None):
>         [...]
>
> I think this is a little simpler than the proposal you made. It
> merely provides a mechanism to define simple types for existing Python
> code.

Really? I think it ends up being more complex all around. First,
you're going to end up having the same problems as prototypes, namely,
what to do with argument names. Do you ignore them if they don't match
between the declaration and definition, or treat it as an error? Since
the argument names aren't actually valuable in the declaration, it's a
bit of extra syntax... but removing it doesn't necessarily help.

The second problem I see is that type declarations will sometimes not
accompany the method definition. That will be slightly confusing. Of
course, if you allow multiple definitions, those definitions don't
necessarily have to be in the same place (though you could add some
ML-like syntax to force the issue).

More importantly, if you do things this way, it's much more difficult
to type check, because you can't assume the programmer wrote correct
code, and now you have to enforce potentially complex constraints.
Consider, for example:

    class Foo:
        decl __init__(self, arg1 :- int, arg2 :- float)
        decl __init__(self, arg1 :- float, arg2 :- string)
        def __init__(self, arg1, arg2):
            ...

When we type check the actual definition, not only do we get complex
OR'd types (int|float, float|string), but we have to maintain
constraints between these types (if arg1 is a float then arg2 must be
a string). Plus, it doesn't take very long to construct a program
where you end up performing more dynamic type checks than you would if
you just had to bind a call to a method at run time. I think being
able to accurately type method bodies is too useful to give up.

Getting back to that ML-ish syntax, you could try something like:

    class Foo:
        def __init__:  # No parens
            version __init__(self, arg1 :- int, arg2 :- float):
                ...
            version __init__(self, arg1 :- float, arg2 :- int):
                ...

At least it would be easy to type check, and it keeps all definitions
together, except those overloaded in a derived class.

> The other significant source of OR types is treating None as a
> distinct type, which requires an OR type anywhere that you want to
> pass an object or None. If we also eliminate that, there is little
> need for OR types.

Hopefully I've already argued effectively against this solution in
other messages...

John
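[Editorial sketch: the dynamic checks John warns about. To emulate his
two declared __init__ signatures at run time, the body has to re-test
argument types on every call and enforce the cross-argument constraint
(float arg1 implies string arg2) by hand. The class body is invented
for illustration.]

    class Foo:
        def __init__(self, arg1, arg2):
            if isinstance(arg1, int) and isinstance(arg2, float):
                pass    # the (int, float) signature
            elif isinstance(arg1, float) and isinstance(arg2, str):
                pass    # the (float, string) signature
            else:
                raise TypeError('no declared signature matches')

    Foo(1, 2.0)                   # ok: first signature
    Foo(2.0, 'two')               # ok: second signature
    try:
        Foo(1, 'two')             # violates the cross-argument constraint
    except TypeError as e:
        print(e)                  # -> no declared signature matches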