From da@ski.org Tue Aug 3 00:01:26 1999
From: da@ski.org (David Ascher)
Date: Mon, 2 Aug 1999 16:01:26 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Pickling w/ low overhead
Message-ID:

An issue which has dogged the NumPy project is that there is (to my
knowledge) no way to pickle very large arrays without creating strings
which contain all of the data.  This can be a problem given that NumPy
arrays tend to be very large -- often several megabytes, sometimes much
bigger.  This slows things down, sometimes a lot, depending on the
platform.  It seems that it should be possible to do something more
efficient.

Two alternatives come to mind:

-- define a new pickling protocol which passes a file-like object to the
   instance and have the instance write itself to that file, being as
   efficient or inefficient as it cares to.  This protocol is used only
   if the instance/type defines the appropriate slot.  Alternatively,
   enrich the semantics of the getstate interaction, so that an object
   can return partial data and tell the pickling mechanism to come back
   for more.

-- make pickling of objects which support the buffer interface use that
   interface's notion of segments and use that 'chunk' size to do
   something more efficient, if not necessarily most efficient.  (Oh,
   and make NumPy arrays support the buffer interface =).

Thoughts?  Alternatives?

--david

From mhammond@skippinet.com.au Tue Aug 3 01:41:23 1999
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Tue, 3 Aug 1999 10:41:23 +1000
Subject: [Python-Dev] Buffer interface in abstract.c?
Message-ID: <001001bedd48$ea796280$1101a8c0@bobcat>

Hi all,
I'm trying to slowly wean myself over to the buffer interfaces.
My exploration so far indicates that, for most cases, simply replacing
"PyString_FromStringAndSize" with "PyBuffer_FromMemory" handles the vast
majority of cases, and is preferred when the data contains arbitrary
bytes.  PyArg_ParseTuple("s#", ...) still works correctly, as we would
hope.

However, performing this explicitly is a pain.  Looking at getargs.c,
the code to achieve this is a little too convoluted to cut-and-paste
each time.  Therefore, I would like to propose these functions to be
added to abstract.c:

int PyObject_GetBufferSize();
void *PyObject_GetReadWriteBuffer(); /* or "char *"? */
const void *PyObject_GetReadOnlyBuffer();

Although equivalent functions exist for the buffer object, I can't see
the equivalent abstract implementations - ie, ones that work with any
object supporting the protocol.

I'm willing to provide a patch if there is agreement that a) the general
idea is good, and b) my specific spelling of the idea is OK (less
likely - PyBuffer_* seems better, but loses any implication of being
abstract?).

Thoughts?

Mark.

From gstein@lyra.org Tue Aug 3 02:51:43 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 02 Aug 1999 18:51:43 -0700
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat>
Message-ID: <37A64B2F.3386F0A9@lyra.org>

Mark Hammond wrote:
> ...
> Therefore, I would like to propose these functions to be added to
> abstract.c:
>
> int PyObject_GetBufferSize();
> void *PyObject_GetReadWriteBuffer(); /* or "char *"? */
> const void *PyObject_GetReadOnlyBuffer();
>
> Although equivalent functions exist for the buffer object, I can't see
> the equivalent abstract implementations - ie, ones that work with any
> object supporting the protocol.
>
> I'm willing to provide a patch if there is agreement that a) the
> general idea is good, and b) my specific spelling of the idea is OK
> (less likely - PyBuffer_* seems better, but loses any implication of
> being abstract?).
Marc-Andre proposed exactly the same thing back at the end of March (to
me and Guido).  The two of us hashed out some of the stuff and M.A. came
up with a full patch for it.  Guido was relatively non-committal at that
point one way or another, but said it seemed fine.  It appears the stuff
never made it into source control.

If Marc-Andre can resurface the final proposal/patch, then we'd be set.

Until then: use the bufferprocs :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From mal@lemburg.com Tue Aug 3 10:11:11 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 11:11:11 +0200
Subject: [Python-Dev] Pickling w/ low overhead
References:
Message-ID: <37A6B22F.7A14BA2C@lemburg.com>

David Ascher wrote:
>
> An issue which has dogged the NumPy project is that there is (to my
> knowledge) no way to pickle very large arrays without creating strings
> which contain all of the data.  This can be a problem given that NumPy
> arrays tend to be very large -- often several megabytes, sometimes
> much bigger.  This slows things down, sometimes a lot, depending on
> the platform.  It seems that it should be possible to do something
> more efficient.
>
> Two alternatives come to mind:
>
> -- define a new pickling protocol which passes a file-like object to
>    the instance and have the instance write itself to that file, being
>    as efficient or inefficient as it cares to.  This protocol is used
>    only if the instance/type defines the appropriate slot.
>    Alternatively, enrich the semantics of the getstate interaction, so
>    that an object can return partial data and tell the pickling
>    mechanism to come back for more.
>
> -- make pickling of objects which support the buffer interface use
>    that interface's notion of segments and use that 'chunk' size to do
>    something more efficient, if not necessarily most efficient.  (Oh,
>    and make NumPy arrays support the buffer interface =).
> This is simple
> for NumPy arrays since we want to pickle "everything", but may not be
> what other buffer-supporting objects want.
>
> Thoughts?  Alternatives?

Hmm, types can register their own pickling/unpickling functions via
copy_reg, so they can access the self.write method in pickle.py to
implement the write-to-file interface.  Don't know how this would be
done for cPickle.c though.

For instances the situation is different, since there is no dispatching
done on a per-class basis.  I guess an optional argument could help
here.

Perhaps some lazy pickling wrapper would help fix this in general: an
object which calls back into the to-be-pickled object to access the
data rather than storing the data in a huge string.

Yet another idea would be using memory-mapped files instead of strings
as temporary storage (but this is probably hard to implement right and
not as portable).

Dunno... just some thoughts.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal@lemburg.com Tue Aug 3 08:50:33 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 09:50:33 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org>
Message-ID: <37A69F49.3575AE85@lemburg.com>

Greg Stein wrote:
>
> Mark Hammond wrote:
> > ...
> > Therefore, I would like to propose these functions to be added to
> > abstract.c:
> >
> > int PyObject_GetBufferSize();
> > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */
> > const void *PyObject_GetReadOnlyBuffer();
> >
> > Although equivalent functions exist for the buffer object, I can't
> > see the equivalent abstract implementations - ie, ones that work
> > with any object supporting the protocol.
> >
> > I'm willing to provide a patch if there is agreement that a) the
> > general idea is good, and b) my specific spelling of the idea is OK
> > (less likely - PyBuffer_* seems better, but loses any implication of
> > being abstract?).
>
> Marc-Andre proposed exactly the same thing back at the end of March
> (to me and Guido).  The two of us hashed out some of the stuff and
> M.A. came up with a full patch for it.  Guido was relatively
> non-committal at that point one way or another, but said it seemed
> fine.  It appears the stuff never made it into source control.
>
> If Marc-Andre can resurface the final proposal/patch, then we'd be
> set.

Below is the code I currently use.  I don't really remember if this is
what Greg and I discussed a while back, but I'm sure he'll correct me
;-)  Note that the buffer length is implicitly returned by these APIs.

/* Takes an arbitrary object which must support the character (single
   segment) buffer interface and returns a pointer to a read-only
   memory location usable as character based input for subsequent
   processing.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsCharBuffer(PyObject *obj,
                      const char **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    const char *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getcharbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a character buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getcharbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Same as PyObject_AsCharBuffer() except that this API expects a
   readable (single segment) buffer interface and returns a pointer to
   a read-only memory location which can contain arbitrary data.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsReadBuffer(PyObject *obj,
                      const void **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getreadbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a readable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getreadbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Takes an arbitrary object which must support the writeable (single
   segment) buffer interface and returns a pointer to a writeable
   memory location in buffer of size buffer_len.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsWriteBuffer(PyObject *obj,
                       void **buffer,
                       int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getwritebuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a writeable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getwritebuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack@oratrix.nl Tue Aug 3 10:53:39 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Tue, 03 Aug 1999 11:53:39 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: Message by "M.-A.
Lemburg" , Tue, 03 Aug 1999 09:50:33 +0200 , <37A69F49.3575AE85@lemburg.com>
Message-ID: <19990803095339.E02CE303120@snelboot.oratrix.nl>

Why not pass the index to the As*Buffer routines as well and make
getsegcount available too?  Then you could code things like

    for(i=0; i < segcount; i++) {
        if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
            return -1;
        write(fp, buf, count);
    }

From gstein@lyra.org Tue Aug 3 11:25:11 1999
From: gstein@lyra.org (Greg Stein)
Date: Tue, 03 Aug 1999 03:25:11 -0700
Subject: [Python-Dev] Buffer interface in abstract.c?
Message-ID: <37A6C387.7360D792@lyra.org>

Jack Jansen wrote:
>
> Why not pass the index to the As*Buffer routines as well and make
> getsegcount available too?  Then you could code things like
>
>     for(i=0; i < segcount; i++) {
>         if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>             return -1;
>         write(fp, buf, count);
>     }

Simply because multiple segments hasn't been seen.  All objects
supporting the buffer interface have a single segment.  IMO, it is best
to drop the argument to make typical usage easier.  For handling
multiple segments, a caller can use the raw interface rather than the
handy functions.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From jim@digicool.com Tue Aug 3 11:58:54 1999
From: jim@digicool.com (Jim Fulton)
Date: Tue, 03 Aug 1999 06:58:54 -0400
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat>
Message-ID: <37A6CB6E.C990F561@digicool.com>

Mark Hammond wrote:
>
> Hi all,
> I'm trying to slowly wean myself over to the buffer interfaces.

OK, I'll bite.  Where is the buffer interface documented?  I found
references to it in various places (e.g. built-in buffer()) but didn't
find the interface itself.

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.

From mal@lemburg.com Tue Aug 3 12:06:46 1999
From: mal@lemburg.com (M.-A.
Lemburg)
Date: Tue, 03 Aug 1999 13:06:46 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <19990803095339.E02CE303120@snelboot.oratrix.nl>
Message-ID: <37A6CD46.642A9C6D@lemburg.com>

Jack Jansen wrote:
>
> Why not pass the index to the As*Buffer routines as well and make
> getsegcount available too?  Then you could code things like
>
>     for(i=0; i < segcount; i++) {
>         if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>             return -1;
>         write(fp, buf, count);
>     }

Well, just like Greg said, this is not much different than using the
buffer interface directly.  While the above would be a handy
PyObject_WriteAsBuffer() kind of helper, I don't think that this is
really used all that much.

E.g. in mxODBC I use the APIs for accessing the raw char data in a
buffer: the pointer is passed directly to the ODBC APIs without
copying, which makes things quite fast.  IMHO, this is the greatest
advantage of the buffer interface.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From: "Fred L. Drake, Jr."
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org>
Message-ID: <14246.59808.561395.761772@weyr.cnri.reston.va.us>

Greg Stein writes:
> Until then: use the bufferprocs :-)

Greg,
On the topic of the buffer interface: Have you written documentation
for this that I can include in the API reference?  Bugging you about
this is on my to-do list. ;-)

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From mal@lemburg.com Tue Aug 3 12:29:43 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 13:29:43 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A6CB6E.C990F561@digicool.com>
Message-ID: <37A6D2A7.27F27554@lemburg.com>

Jim Fulton wrote:
>
> Mark Hammond wrote:
> >
> > Hi all,
> > I'm trying to slowly wean myself over to the buffer interfaces.
>
> OK, I'll bite.  Where is the buffer interface documented?  I found
> references to it in various places (e.g. built-in buffer()) but didn't
> find the interface itself.

I guess it's a read-the-source feature :-)  Objects/bufferobject.c and
Include/object.h provide a start.  Objects/stringobject.c has a
"sample" implementation.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack@oratrix.nl Tue Aug 3 15:45:25 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Tue, 03 Aug 1999 16:45:25 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: Message by Greg Stein , Tue, 03 Aug 1999 03:25:11 -0700 , <37A6C387.7360D792@lyra.org>
Message-ID: <19990803144526.6B796303120@snelboot.oratrix.nl>

> > Why not pass the index to the As*Buffer routines as well and make
> > getsegcount available too?
>
> Simply because multiple segments hasn't been seen.  All objects
> supporting the buffer interface have a single segment.

Hmm.  And I went out of my way to include this stupid multi-buffer
stuff because the NumPy folks said they couldn't live without it (and
one of the reasons for the buffer stuff was to allow NumPy arrays,
which may be discontiguous, to be written efficiently).

Can someone confirm that the Numeric stuff indeed doesn't use this?
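[Editorial aside: the buffer interface being discussed can also be
exercised from Python code.  In this era the entry point was the
built-in buffer(); its modern descendant is memoryview, used in the
sketch below, but the idea is the same: zero-copy access to an object's
underlying bytes.]

```python
# Minimal sketch of the buffer protocol from the Python side, using the
# modern memoryview spelling (the 1.5-era buffer() built-in is gone).
data = b"hello world"
view = memoryview(data)          # no copy of the underlying bytes is made

assert view[0:5].tobytes() == b"hello"
assert len(view) == len(data)

# A mutable object such as bytearray exposes a writable, single-segment
# buffer -- the common case discussed in this thread.
buf = bytearray(b"xxxxx")
memoryview(buf)[0:5] = b"abcde"  # writes through to the original object
assert buf == bytearray(b"abcde")
```

The single-segment restriction that MAL's helper functions enforce
corresponds to the contiguous case here; a discontiguous NumPy-style
array is what the multi-segment part of the protocol was meant for.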
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From da@ski.org Tue Aug 3 17:19:19 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 09:19:19 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Pickling w/ low overhead
In-Reply-To: <37A6B22F.7A14BA2C@lemburg.com>
Message-ID:

On Tue, 3 Aug 1999, M.-A. Lemburg wrote:

> Hmm, types can register their own pickling/unpickling functions
> via copy_reg, so they can access the self.write method in pickle.py
> to implement the write to file interface.

Are you sure?  My understanding of copy_reg is, as stated in the doc:

    pickle(type, function[, constructor])
        Declares that function should be used as a "reduction" function
        for objects of type or class type.  function should return
        either a string or a tuple.  The optional constructor
        parameter, if provided, is a callable object which can be used
        to reconstruct the object when called with the tuple of
        arguments returned by function at pickling time.

How does one access the 'self.write method in pickle.py'?

> Perhaps some lazy pickling wrapper would help fix this in general:
> an object which calls back into the to-be-pickled object to
> access the data rather than store the data in a huge string.

Right.  That's an idea.

> Yet another idea would be using memory mapped files instead
> of strings as temporary storage (but this is probably hard to
> implement right and not as portable).

That's a very interesting idea!  I'll try that -- it might just be the
easiest way to do this.  I think that portability isn't a huge concern
-- the folks who are coming up with the speed issue are on platforms
which have mmap support.

Thanks for the suggestions.
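[Editorial aside: the copy_reg protocol quoted above can be sketched as
follows.  The module is spelled copyreg in modern Python; BigArray and
its reduction function are hypothetical stand-ins for a NumPy-style
array, not real NumPy code.  Note how the entire payload still funnels
through a single bytes/string object returned by the reduction
function, which is exactly the overhead this thread is about.]

```python
import copyreg   # spelled copy_reg in 1.5-era Python
import pickle

class BigArray:
    """Hypothetical stand-in for a large NumPy-style array."""
    def __init__(self, data):
        self.data = data

def reduce_big_array(a):
    # The "reduction" function from the copy_reg docs: return a tuple
    # of (callable, args) used to reconstruct the object on unpickling.
    # All of the array data passes through one bytes object here.
    return BigArray, (a.data,)

# Register the reduction for the type, per the quoted documentation.
copyreg.pickle(BigArray, reduce_big_array)

original = BigArray(b"\x00" * (1 << 20))        # ~1 MB of data
restored = pickle.loads(pickle.dumps(original))
assert restored.data == original.data
```

As the doc quote says, nothing here ever sees the pickler's write
method, which is why MAL retracts the suggestion in his follow-up.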
--david

From da@ski.org Tue Aug 3 17:20:37 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 09:20:37 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: <37A6C387.7360D792@lyra.org>
Message-ID:

On Tue, 3 Aug 1999, Greg Stein wrote:

> Simply because multiple segments hasn't been seen.  All objects
> supporting the buffer interface have a single segment.  IMO, it is
> best

FYI, if/when NumPy objects support the buffer API, they will require
multiple segments.

From da@ski.org Tue Aug 3 17:23:31 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 09:23:31 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Buffer interface in abstract.c?
In-Reply-To: <19990803144526.6B796303120@snelboot.oratrix.nl>
Message-ID:

On Tue, 3 Aug 1999, Jack Jansen wrote:

> > > Why not pass the index to the As*Buffer routines as well and make
> > > getsegcount available too?
> >
> > Simply because multiple segments hasn't been seen.  All objects
> > supporting the buffer interface have a single segment.
>
> Hmm.  And I went out of my way to include this stupid multi-buffer
> stuff because the NumPy folks said they couldn't live without it (and
> one of the reasons for the buffer stuff was to allow NumPy arrays,
> which may be discontiguous, to be written efficiently).
>
> Can someone confirm that the Numeric stuff indeed doesn't use this?

/usr/LLNLDistribution/Numerical/Include$ grep buffer *.h
/usr/LLNLDistribution/Numerical/Include$

Yes. =)  See the other thread on low-overhead pickling.  But again,
*if* multiarrays supported the buffer interface, they'd have to use
the multi-segment feature (repeating myself).

--david

From mal@lemburg.com Tue Aug 3 20:17:16 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 03 Aug 1999 21:17:16 +0200
Subject: [Python-Dev] Pickling w/ low overhead
References:
Message-ID: <37A7403C.3BC05D02@lemburg.com>

David Ascher wrote:
>
> On Tue, 3 Aug 1999, M.-A.
Lemburg wrote:
>
> > Hmm, types can register their own pickling/unpickling functions
> > via copy_reg, so they can access the self.write method in pickle.py
> > to implement the write to file interface.
>
> Are you sure?  My understanding of copy_reg is, as stated in the doc:
>
>     pickle(type, function[, constructor])
>         Declares that function should be used as a "reduction"
>         function for objects of type or class type.  function should
>         return either a string or a tuple.  The optional constructor
>         parameter, if provided, is a callable object which can be
>         used to reconstruct the object when called with the tuple of
>         arguments returned by function at pickling time.
>
> How does one access the 'self.write method in pickle.py'?

Ooops.  Sorry, that doesn't work... well, at least not using "normal"
Python ;-)  You could of course simply go up one stack frame and then
grab the self object and then... well, you know...

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 150 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From skip@mojam.com Tue Aug 3 21:47:04 1999
From: skip@mojam.com (Skip Montanaro)
Date: Tue, 3 Aug 1999 15:47:04 -0500 (CDT)
Subject: [Python-Dev] Pickling w/ low overhead
In-Reply-To:
References:
Message-ID: <14247.21628.225029.392711@dolphin.mojam.com>

    David> An issue which has dogged the NumPy project is that there is
    David> (to my knowledge) no way to pickle very large arrays without
    David> creating strings which contain all of the data.  This can be
    David> a problem given that NumPy arrays tend to be very large --
    David> often several megabytes, sometimes much bigger.  This slows
    David> things down, sometimes a lot, depending on the platform.  It
    David> seems that it should be possible to do something more
    David> efficient.

David,

Using __getstate__/__setstate__, could you create a compressed
representation using zlib or some other scheme?
I don't know how well numeric data compresses in general, but that
might help.  Also, I trust you use cPickle when it's available, yes?

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-475-3758

From da@ski.org Tue Aug 3 21:58:23 1999
From: da@ski.org (David Ascher)
Date: Tue, 3 Aug 1999 13:58:23 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Pickling w/ low overhead
In-Reply-To: <14247.21628.225029.392711@dolphin.mojam.com>
Message-ID:

On Tue, 3 Aug 1999, Skip Montanaro wrote:

> Using __getstate__/__setstate__, could you create a compressed
> representation using zlib or some other scheme?  I don't know how
> well numeric data compresses in general, but that might help.  Also,
> I trust you use cPickle when it's available, yes?

I *really* hate to admit it, but I've found the source of the most
massive problem in the pickling process that I was using.  I didn't
use binary mode, which meant that the huge strings were written & read
one character at a time.  I think I'll put a big fat note in the NumPy
doc to that effect.  (Note that luckily this just affected my usage,
not all NumPy users.)

--da

From gstein@lyra.org Wed Aug 4 20:15:27 1999
From: gstein@lyra.org (Greg Stein)
Date: Wed, 04 Aug 1999 12:15:27 -0700
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex
References: <199908041313.JAA26344@weyr.cnri.reston.va.us>
Message-ID: <37A8914F.6F5B9971@lyra.org>

Fred L. Drake wrote:
>
> Update of /projects/cvsroot/python/dist/src/Doc/api
> In directory weyr:/home/fdrake/projects/python/Doc/api
>
> Modified Files:
>       api.tex
> Log Message:
>
> Started documentation on buffer objects & types.  Very preliminary.
>
> Greg Stein: Please help with this; it's your baby!
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://www.python.org/mailman/listinfo/python-checkins

All righty.  I'll send some doc on this stuff.
Somebody else did the initial buffer interface, but it seems that it
has fallen to me now :-)

Please give me a little while to get to this, though.  I'm in and out
of town for the next four weeks.  I'm in the process of moving into a
new house in Palo Alto, CA, and I'm travelling back and forth until
Anni and I move for real in September.

I should be able to get to this by the weekend, or possibly in a
couple weeks.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From: "Fred L. Drake, Jr."
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex
References: <199908041313.JAA26344@weyr.cnri.reston.va.us> <37A8914F.6F5B9971@lyra.org>
Message-ID: <14248.43498.664539.597656@weyr.cnri.reston.va.us>

Greg Stein writes:
> All righty.  I'll send some doc on this stuff.  Somebody else did the
> initial buffer interface, but it seems that it has fallen to me now
> :-)

I was not aware that you were not the origin of this work; feel free
to pass it to the right person.

> Please give me a little while to get to this, though.  I'm in and out
> of town for the next four weeks.  I'm in the process of moving into a
> new house in Palo Alto, CA, and I'm travelling back and forth until
> Anni and I move for real in September.

Cool!

> I should be able to get to this by the weekend, or possibly in a
> couple weeks.

That's good enough for me.  I expect it may be a couple of months or
more before I try and get another release out with various fixes and
additions.  There's not a huge need to update the released doc set,
other than a few embarrassing editorial... er, "oversights" (!).

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives

From jack@oratrix.nl Thu Aug 5 10:57:33 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Thu, 05 Aug 1999 11:57:33 +0200
Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex
In-Reply-To: Message by Greg Stein , Wed, 04 Aug 1999 12:15:27 -0700 , <37A8914F.6F5B9971@lyra.org>
Message-ID: <19990805095733.69D90303120@snelboot.oratrix.nl>

> All righty.  I'll send some doc on this stuff.
> Somebody else did the
> initial buffer interface, but it seems that it has fallen to me now
> :-)

I think I did, but I gladly bequeath it to you.  (Hmm, that's the
first time I typed "bequeath", I think.)

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From fredrik@pythonware.com Thu Aug 5 16:46:43 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Thu, 5 Aug 1999 17:46:43 +0200
Subject: [Python-Dev] Buffer interface in abstract.c?
References:
Message-ID: <009801bedf59$b8150020$f29b12c2@secret.pythonware.com>

> > Simply because multiple segments hasn't been seen.  All objects
> > supporting the buffer interface have a single segment.  IMO, it is
> > best
>
> FYI, if/when NumPy objects support the buffer API, they will require
> multiple segments.

same goes for PIL.  in the worst case, there's one segment per line.

...

on the other hand, I think something is missing from the buffer
design; I definitely don't like that people can write and marshal
objects that happen to implement the buffer interface, only to find
that Python didn't do what they expected...

>>> import unicode
>>> import marshal
>>> u = unicode.unicode
>>> s = u("foo")
>>> data = marshal.dumps(s)
>>> marshal.loads(data)
'f\000o\000o\000'
>>> type(marshal.loads(data))

as for PIL, I would also prefer if the exported buffer corresponded
to what you get from im.tostring().  iirc, that cannot be done -- I
cannot export via a temporary memory buffer, since there's no way to
know when to get rid of it...

From jack@oratrix.nl Thu Aug 5 21:59:46 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Thu, 05 Aug 1999 22:59:46 +0200
Subject: [Python-Dev] marshal (was: Buffer interface in abstract.c?
)
In-Reply-To: Message by "Fredrik Lundh" , Thu, 5 Aug 1999 17:46:43 +0200 , <009801bedf59$b8150020$f29b12c2@secret.pythonware.com>
Message-ID: <19990805205952.531B9E267A@oratrix.oratrix.nl>

Recently, "Fredrik Lundh" said:
> on the other hand, I think something is missing from
> the buffer design; I definitely don't like that people
> can write and marshal objects that happen to
> implement the buffer interface, only to find that
> Python didn't do what they expected...
>
> >>> import unicode
> >>> import marshal
> >>> u = unicode.unicode
> >>> s = u("foo")
> >>> data = marshal.dumps(s)
> >>> marshal.loads(data)
> 'f\000o\000o\000'
> >>> type(marshal.loads(data))

Hmm.  Looking at the code there is a catchall at the end, with a
comment explicitly saying "Write unknown buffer-style objects as a
string".

IMHO this is an incorrect design, but that's a bit philosophical (so
I'll gladly defer to Our Great Philosopher if he has anything to say
on the matter :-).  Unless, of course, there are buffer-style
non-string objects around that are better read back as strings than
not read back at all.

Hmm again, I think I'd like it better if marshal.dumps() would barf on
attempts to write unrepresentable data.  Currently unrepresentable
objects are written as TYPE_UNKNOWN (unless they have bufferness (or
should I call that "a buffer-aspect"? :-)), which means you think you
are writing correctly marshalled data but you'll be in for an
exception when you try to read it back...

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From akuchlin@mems-exchange.org Thu Aug 5 23:24:03 1999
From: akuchlin@mems-exchange.org (Andrew M.
Kuchling)
Date: Thu, 5 Aug 1999 18:24:03 -0400 (EDT)
Subject: [Python-Dev] mmapfile module
Message-ID: <199908052224.SAA24159@amarok.cnri.reston.va.us>

A while back the suggestion was made that the mmapfile module be added
to the core distribution, and there was a guardedly positive reaction.
Should I go ahead and do that?  No one reported any problems when I
asked for bug reports, but that was probably because no one tried it;
putting it in the core would cause more people to try it.

I suppose this leads to a more important question: at what point
should we start checking 1.6-only things into the CVS tree?  For
example, once the current alphas of the re module are up to it
(they're not yet), when should they be checked in?

--
A.M. Kuchling                   http://starship.python.net/crew/amk/
Kids!  Bringing about Armageddon can be dangerous.  Do not attempt it
in your home.
    -- Terry Pratchett & Neil Gaiman, _Good Omens_

From bwarsaw@cnri.reston.va.us Fri Aug 6 03:10:18 1999
From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw)
Date: Thu, 5 Aug 1999 22:10:18 -0400 (EDT)
Subject: [Python-Dev] mmapfile module
References: <199908052224.SAA24159@amarok.cnri.reston.va.us>
Message-ID: <14250.17418.781127.684009@anthem.cnri.reston.va.us>

>>>>> "AMK" == Andrew M Kuchling writes:

    AMK> I suppose this leads to a more important question: at what
    AMK> point should we start checking 1.6-only things into the CVS
    AMK> tree?  For example, once the current alphas of the re module
    AMK> are up to it (they're not yet), when should they be checked
    AMK> in?

Good question.  I've had a bunch of people ask about the string
methods branch, which I'm assuming will be a 1.6 feature, and I'd like
to get that checked in at some point too.  I think what's holding this
up is that Guido hasn't decided whether there will be a patch release
to 1.5.2 or not.
-Barry From tim_one@email.msn.com Fri Aug 6 03:26:06 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 5 Aug 1999 22:26:06 -0400 Subject: [Python-Dev] mmapfile module In-Reply-To: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <000201bedfb3$09a99000$98a22299@tim> [Andrew M. Kuchling] > ... > I suppose this leads to a more important question: at what point > should we start checking 1.6-only things into the CVS tree? For > example, once the current alphas of the re module are up to it > (they're not yet), when should they be checked in? I'd like to see a bugfix release of 1.5.2 put out first, then have at it. There are several bugfixes that ought to go out ASAP. Thread tstate races, the cpickle/cookie.py snafu, and playing nice with current Tcl/Tk pop to mind immediately. I'm skeptical that anyone other than Guido could decide what *needs* to go out, so it's a good thing he's got nothing to do . one-boy's-opinion-ly y'rs - tim From mhammond@skippinet.com.au Fri Aug 6 04:30:55 1999 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 6 Aug 1999 13:30:55 +1000 Subject: [Python-Dev] mmapfile module In-Reply-To: <000201bedfb3$09a99000$98a22299@tim> Message-ID: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> [Tim laments] > mind immediately. I'm skeptical that anyone other than Guido > could decide > what *needs* to go out, so it's a good thing he's got nothing > to do . He has been very quiet recently - where are you hiding, Guido? > one-boy's-opinion-ly y'rs - tim Here is another. Let's take a different tack - what has been checked in since 1.5.2 that should _not_ go out - ie, is too controversial? If nothing else, it makes a good starting point, and may help Guido out: Below is a summary of the CVS diff I just did, categorized by my opinion. It turns out that most of the changes would appear to be candidates. While not actually "bug-fixes", many have better documentation, removal of unused imports etc, so it would definitely not hurt to get them out.
Looks like some build issues have been fixed too. Apart from possibly Tim's recent "UnboundLocalError" (which is the only serious behaviour change) I can't see anything that should obviously be omitted. Hopefully this is of interest... [Disclaimer - lots of files here - it is quite possible I missed something...] Mark. UNCONTROVERSIAL: ---------------- RCS file: /projects/cvsroot/python/dist/src/README,v RCS file: /projects/cvsroot/python/dist/src/Lib/cgi.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ftplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/poplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/re.py,v RCS file: /projects/cvsroot/python/dist/src/Tools/audiopy/README,v Doc changes. RCS file: /projects/cvsroot/python/dist/src/Lib/SimpleHTTPServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmd.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/htmllib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/netrc.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pipes.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pty.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/shlex.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/urlparse.py,v Remove unused imports RCS file: /projects/cvsroot/python/dist/src/Lib/pdb.py,v Remove unused globals RCS file: /projects/cvsroot/python/dist/src/Lib/popen2.py,v Change to cleanup RCS file: /projects/cvsroot/python/dist/src/Lib/profile.py,v Remove unused imports and changes to comments. RCS file: /projects/cvsroot/python/dist/src/Lib/pyclbr.py,v Better doc, and support for module level functions. RCS file: /projects/cvsroot/python/dist/src/Lib/repr.py,v self.maxlist changed to self.maxdict RCS file: /projects/cvsroot/python/dist/src/Lib/rfc822.py,v Doc changes, and better date handling. RCS file: /projects/cvsroot/python/dist/src/configure,v RCS file: /projects/cvsroot/python/dist/src/configure.in,v Looks like FreeBSD build flag changes.
RCS file: /projects/cvsroot/python/dist/src/Demo/classes/bitvec.py,v RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v Whitespace fixes. RCS file: /projects/cvsroot/python/dist/src/Demo/scripts/makedir.py,v Check we have passed a non empty string RCS file: /projects/cvsroot/python/dist/src/Include/patchlevel.h,v 1.5.2+ RCS file: /projects/cvsroot/python/dist/src/Lib/BaseHTTPServer.py,v Remove import rfc822 and more robust errors. RCS file: /projects/cvsroot/python/dist/src/Lib/CGIHTTPServer.py,v Support for HTTP_COOKIE RCS file: /projects/cvsroot/python/dist/src/Lib/fpformat.py,v NotANumber supports class exceptions. RCS file: /projects/cvsroot/python/dist/src/Lib/macpath.py,v Use constants from stat module RCS file: /projects/cvsroot/python/dist/src/Lib/macurl2path.py,v Minor changes to path parsing RCS file: /projects/cvsroot/python/dist/src/Lib/mimetypes.py,v Recognise '.js': 'application/x-javascript', RCS file: /projects/cvsroot/python/dist/src/Lib/sunau.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/wave.py,v Support for binary files. RCS file: /projects/cvsroot/python/dist/src/Lib/whichdb.py,v Reads file header to check for bsddb format. RCS file: /projects/cvsroot/python/dist/src/Lib/xmllib.py,v XML may be at the start of the string, instead of the whole string. RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/tkSimpleDialog.py,v Destroy method added. RCS file: /projects/cvsroot/python/dist/src/Modules/cPickle.c,v As in the log :-) RCS file: /projects/cvsroot/python/dist/src/Modules/cStringIO.c,v No longer a Py_FatalError on module init failure. 
RCS file: /projects/cvsroot/python/dist/src/Modules/fpectlmodule.c,v Support for OSF in #ifdefs RCS file: /projects/cvsroot/python/dist/src/Modules/makesetup,v # to handle backslashes for sh's that don't automatically # continue a read when the last char is a backslash RCS file: /projects/cvsroot/python/dist/src/Modules/posixmodule.c,v Better error handling RCS file: /projects/cvsroot/python/dist/src/Modules/timemodule.c,v #ifdef changes for __GNU_LIBRARY__/_GLIBC_ RCS file: /projects/cvsroot/python/dist/src/Python/errors.c,v Better error messages on Win32 RCS file: /projects/cvsroot/python/dist/src/Python/getversion.c,v Bigger buffer and strings. RCS file: /projects/cvsroot/python/dist/src/Python/pystate.c,v Threading bug RCS file: /projects/cvsroot/python/dist/src/Objects/floatobject.c,v Tim Peters writes: 1. Fixes float divmod etc. RCS file: /projects/cvsroot/python/dist/src/Objects/listobject.c,v Doc changes, and: when deallocating a list, DECREF the items from the end back to the start. RCS file: /projects/cvsroot/python/dist/src/Objects/stringobject.c,v Bug fix to do with the width of a format specifier RCS file: /projects/cvsroot/python/dist/src/Objects/tupleobject.c,v Appropriate overflow checks so that things like sys.maxint*(1,) can't dump core. RCS file: /projects/cvsroot/python/dist/src/Lib/tempfile.py,v don't cache attributes of type int RCS file: /projects/cvsroot/python/dist/src/Lib/urllib.py,v Number of revisions. RCS file: /projects/cvsroot/python/dist/src/Lib/aifc.py,v Chunk moved to new module. RCS file: /projects/cvsroot/python/dist/src/Lib/audiodev.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dbhash.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dis.py,v Changes in comments. RCS file: /projects/cvsroot/python/dist/src/Lib/cmpcache.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmp.py,v New "shallow" arg. RCS file: /projects/cvsroot/python/dist/src/Lib/dumbdbm.py,v Coerce f.tell() to int.
RCS file: /projects/cvsroot/python/dist/src/Modules/main.c,v Fix to tracebacks off by a line with -x RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/Tkinter.py,v Number of changes you can review! OTHERS: -------- RCS file: /projects/cvsroot/python/dist/src/Lib/asynchat.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/asyncore.py,v Latest versions from Sam??? RCS file: /projects/cvsroot/python/dist/src/Lib/smtplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/sched.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/SocketServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/calendar.py,v Sorry - out of time to detail RCS file: /projects/cvsroot/python/dist/src/Python/bltinmodule.c,v Unbound local, docstring, and better support for ExtensionClasses. Freeze: Few changes IDLE: Lotsa changes :-) Number of .h files have #ifdef changes for CE, which I won't detail (but it would be great to get a few of these in - and I have more :-) Tools directory: Number of changes - out of time to detail From mal@lemburg.com Fri Aug 6 09:54:20 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 06 Aug 1999 10:54:20 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> Message-ID: <37AAA2BC.466750B5@lemburg.com> Jack Jansen wrote: > > Recently, "Fredrik Lundh" said: > > on the other hand, I think something is missing from > > the buffer design; I definitely don't like that people > > can write and marshal objects that happen to > > implement the buffer interface, only to find that > > Python didn't do what they expected... > > > > >>> import unicode > > >>> import marshal > > >>> u = unicode.unicode > > >>> s = u("foo") > > >>> data = marshal.dumps(s) > > >>> marshal.loads(data) > > 'f\000o\000o\000' > > >>> type(marshal.loads(data)) > > Why do Unicode objects implement the bf_getcharbuffer slot?
I thought that unicode objects use a two-byte character representation. Note that implementing the char buffer interface will also give you strange results with other code that uses PyArg_ParseTuple(...,"s#",...), e.g. you could search through Unicode strings as if they were normal 1-byte/char strings (and most certainly not find what you're looking for, I guess). > Hmm again, I think I'd like it better if marshal.dumps() would barf on > attempts to write unrepresentable data. Currently unrepresentable > objects are written as TYPE_UNKNOWN (unless they have bufferness (or > should I call that "a buffer-aspect"? :-)), which means you think you > are writing correctly marshalled data but you'll be in for an > exception when you try to read it back... I'd prefer an exception on write too. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 147 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fred L. Drake, Jr." References: <000201bedfb3$09a99000$98a22299@tim> <00a801bedfbc$1871a7e0$1101a8c0@bobcat> Message-ID: <14250.62675.807129.878242@weyr.cnri.reston.va.us> Mark Hammond writes: > Apart from possibly Tim's recent "UnboundLocalError" (which is the only > serious behaviour change) I can't see anything that should obviously be Since UnboundLocalError is a subclass of NameError (what you got before) normally, and they are the same string when -X is used, this only represents a new name in the __builtin__ module for legacy code. This should not be a problem; the only real difference is that, using class exceptions for built-in exceptions, you get more useful information in your tracebacks. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik@pythonware.com Sat Aug 7 11:51:56 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 7 Aug 1999 12:51:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> Message-ID: <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> > > > >>> import unicode > > > >>> import marshal > > > >>> u = unicode.unicode > > > >>> s = u("foo") > > > >>> data = marshal.dumps(s) > > > >>> marshal.loads(data) > > > 'f\000o\000o\000' > > > >>> type(marshal.loads(data)) > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > that unicode objects use a two-byte character representation. >>> import array >>> import marshal >>> a = array.array >>> s = a("f", [1, 2, 3]) >>> data = marshal.dumps(s) >>> marshal.loads(data) '\000\000\200?\000\000\000@\000\000@@' looks like the various implementors haven't really understood the intentions of whoever designed the buffer interface... From mal@lemburg.com Sat Aug 7 17:14:56 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 07 Aug 1999 18:14:56 +0200 Subject: [Python-Dev] Some more constants for the socket module Message-ID: <37AC5B80.56F740DD@lemburg.com> Following the recent discussion on c.l.p about socket options, I found that the socket module does not define all constants defined in the (Linux) socket header file. Below is a patch that adds a few more (note that the SOL_* constants should be used for the setsockopt() level, not the IPPROTO_* constants). 
--- socketmodule.c~ Sat Aug 7 17:56:05 1999 +++ socketmodule.c Sat Aug 7 18:10:07 1999 @@ -2005,14 +2005,48 @@ initsocket() PySocketSock_Type.tp_doc = sockettype_doc; Py_INCREF(&PySocketSock_Type); if (PyDict_SetItemString(d, "SocketType", (PyObject *)&PySocketSock_Type) != 0) return; + + /* Address families (we only support AF_INET and AF_UNIX) */ +#ifdef AF_UNSPEC + insint(moddict, "AF_UNSPEC", AF_UNSPEC); +#endif insint(d, "AF_INET", AF_INET); #ifdef AF_UNIX insint(d, "AF_UNIX", AF_UNIX); #endif /* AF_UNIX */ +#ifdef AF_AX25 + insint(moddict, "AF_AX25", AF_AX25); /* Amateur Radio AX.25 */ +#endif +#ifdef AF_IPX + insint(moddict, "AF_IPX", AF_IPX); /* Novell IPX */ +#endif +#ifdef AF_APPLETALK + insint(moddict, "AF_APPLETALK", AF_APPLETALK); /* Appletalk DDP */ +#endif +#ifdef AF_NETROM + insint(moddict, "AF_NETROM", AF_NETROM); /* Amateur radio NetROM */ +#endif +#ifdef AF_BRIDGE + insint(moddict, "AF_BRIDGE", AF_BRIDGE); /* Multiprotocol bridge */ +#endif +#ifdef AF_AAL5 + insint(moddict, "AF_AAL5", AF_AAL5); /* Reserved for Werner's ATM */ +#endif +#ifdef AF_X25 + insint(moddict, "AF_X25", AF_X25); /* Reserved for X.25 project */ +#endif +#ifdef AF_INET6 + insint(moddict, "AF_INET6", AF_INET6); /* IP version 6 */ +#endif +#ifdef AF_ROSE + insint(moddict, "AF_ROSE", AF_ROSE); /* Amateur Radio X.25 PLP */ +#endif + + /* Socket types */ insint(d, "SOCK_STREAM", SOCK_STREAM); insint(d, "SOCK_DGRAM", SOCK_DGRAM); #ifndef __BEOS__ /* We have incomplete socket support. */ insint(d, "SOCK_RAW", SOCK_RAW); @@ -2048,11 +2082,10 @@ initsocket() insint(d, "SO_OOBINLINE", SO_OOBINLINE); #endif #ifdef SO_REUSEPORT insint(d, "SO_REUSEPORT", SO_REUSEPORT); #endif - #ifdef SO_SNDBUF insint(d, "SO_SNDBUF", SO_SNDBUF); #endif #ifdef SO_RCVBUF insint(d, "SO_RCVBUF", SO_RCVBUF); @@ -2111,14 +2144,43 @@ initsocket() #ifdef MSG_ETAG insint(d, "MSG_ETAG", MSG_ETAG); #endif /* Protocol level and numbers, usable for [gs]etsockopt */ -/* Sigh -- some systems (e.g. 
Linux) use enums for these. */ #ifdef SOL_SOCKET insint(d, "SOL_SOCKET", SOL_SOCKET); #endif +#ifdef SOL_IP + insint(moddict, "SOL_IP", SOL_IP); +#else + insint(moddict, "SOL_IP", 0); +#endif +#ifdef SOL_IPX + insint(moddict, "SOL_IPX", SOL_IPX); +#endif +#ifdef SOL_AX25 + insint(moddict, "SOL_AX25", SOL_AX25); +#endif +#ifdef SOL_ATALK + insint(moddict, "SOL_ATALK", SOL_ATALK); +#endif +#ifdef SOL_NETROM + insint(moddict, "SOL_NETROM", SOL_NETROM); +#endif +#ifdef SOL_ROSE + insint(moddict, "SOL_ROSE", SOL_ROSE); +#endif +#ifdef SOL_TCP + insint(moddict, "SOL_TCP", SOL_TCP); +#else + insint(moddict, "SOL_TCP", 6); +#endif +#ifdef SOL_UDP + insint(moddict, "SOL_UDP", SOL_UDP); +#else + insint(moddict, "SOL_UDP", 17); +#endif #ifdef IPPROTO_IP insint(d, "IPPROTO_IP", IPPROTO_IP); #else insint(d, "IPPROTO_IP", 0); #endif @@ -2266,10 +2328,32 @@ initsocket() #ifdef IP_ADD_MEMBERSHIP insint(d, "IP_ADD_MEMBERSHIP", IP_ADD_MEMBERSHIP); #endif #ifdef IP_DROP_MEMBERSHIP insint(d, "IP_DROP_MEMBERSHIP", IP_DROP_MEMBERSHIP); +#endif +#ifdef IP_DEFAULT_MULTICAST_TTL + insint(moddict, "IP_DEFAULT_MULTICAST_TTL", IP_DEFAULT_MULTICAST_TTL); +#endif +#ifdef IP_DEFAULT_MULTICAST_LOOP + insint(moddict, "IP_DEFAULT_MULTICAST_LOOP", IP_DEFAULT_MULTICAST_LOOP); +#endif +#ifdef IP_MAX_MEMBERSHIPS + insint(moddict, "IP_MAX_MEMBERSHIPS", IP_MAX_MEMBERSHIPS); +#endif + + /* TCP options */ +#ifdef TCP_NODELAY + insint(moddict, "TCP_NODELAY", TCP_NODELAY); +#endif +#ifdef TCP_MAXSEG + insint(moddict, "TCP_MAXSEG", TCP_MAXSEG); +#endif + + /* IPX options */ +#ifdef IPX_TYPE + insint(moddict, "IPX_TYPE", IPX_TYPE); #endif /* Initialize gethostbyname lock */ #ifdef USE_GETHOSTBYNAME_LOCK gethostbyname_lock = PyThread_allocate_lock(); -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 146 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sat Aug 7 21:15:08 1999 From: gstein@lyra.org (Greg 
Stein) Date: Sat, 07 Aug 1999 13:15:08 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> Message-ID: <37AC93CC.53982F3F@lyra.org> Fredrik Lundh wrote: > > > > > >>> import unicode > > > > >>> import marshal > > > > >>> u = unicode.unicode > > > > >>> s = u("foo") > > > > >>> data = marshal.dumps(s) > > > > >>> marshal.loads(data) > > > > 'f\000o\000o\000' > > > > >>> type(marshal.loads(data)) > > > > This was a "nicety" that was put in during a round of patches that I submitted to Guido. We both had questions about it but figured that it couldn't hurt since it at least let some things be marshalled out that couldn't be marshalled before. I would suggest backing out the marshalling of buffer-interface objects and adding a mechanism for arbitrary type objects to marshal themselves. Without the second part, arrays and Unicode objects aren't marshallable at all (seems bad). > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > that unicode objects use a two-byte character representation. Unicode objects should *not* implement the getcharbuffer slot. Only read, write, and segcount. > >>> import array > >>> import marshal > >>> a = array.array > >>> s = a("f", [1, 2, 3]) > >>> data = marshal.dumps(s) > >>> marshal.loads(data) > '\000\000\200?\000\000\000@\000\000@@' > > looks like the various implementors haven't > really understood the intentions of whoever > designed the buffer interface... Arrays can/should support both the getreadbuffer and getcharbuffer interface. The former: definitely. The latter: only if the contents are byte-sized. The loading back as a string is a different matter, as pointed out above.
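Both garbled outputs quoted in this thread, and Greg's byte-sized distinction, can be reproduced with the modern array module and encoded text; a sketch (bytes/str stand in for the old string/unicode split, and the float byte layout assumes a little-endian machine):

```python
import array

# A byte-sized array's raw buffer is sensible character data...
assert array.array("b", [102, 111, 111]).tobytes() == b"foo"

# ...while a float array's raw buffer is just IEEE-754 bits: the
# '\000\000\200?\000\000\000@\000\000@@' gibberish from the session above.
floats = array.array("f", [1, 2, 3])
assert floats.tobytes() == b"\x00\x00\x80?\x00\x00\x00@\x00\x00@@"
assert memoryview(floats).itemsize == 4  # 4-byte items, not chars

# The unicode case: two bytes per character, so the raw buffer reads as
# 'f\000o\000o\000' and a byte-wise search for b"foo" finds nothing.
two_byte = "foo".encode("utf-16-le")
assert two_byte == b"f\x00o\x00o\x00"
assert two_byte.find(b"foo") == -1
```

The raw buffer is well-defined in every case; what differs is whether treating it as a run of 1-byte characters means anything, which is exactly the getreadbuffer/getcharbuffer split.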
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack@oratrix.nl Sun Aug 8 21:20:52 1999 From: jack@oratrix.nl (Jack Jansen) Date: Sun, 08 Aug 1999 22:20:52 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by Greg Stein , Sat, 07 Aug 1999 13:15:08 -0700 , <37AC93CC.53982F3F@lyra.org> Message-ID: <19990808202057.DB803E267A@oratrix.oratrix.nl> Recently, Greg Stein said: > I would suggest backing out the marshalling of buffer-interface objects > and adding a mechanism for arbitrary type objects to marshal themselves. > Without the second part, arrays and Unicode objects aren't marshallable > at all (seems bad). This sounds like the right approach. It would require 2 slots in the tp_ structure and a little extra glue for the typecodes (currently marshal knows all the 1-letter typecodes for all object types it can handle), but types marshalling their own objects would require a centralized registry of object types. For the time being it would probably suffice to have the mapping of type<->letter be hardcoded in marshal.h, but eventually you probably want a more extensible scheme, where Joe R. Extension-Writer could add a marshaller to his objects and know it won't collide with someone else's. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Mon Aug 9 09:56:30 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 10:56:30 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990808202057.DB803E267A@oratrix.oratrix.nl> Message-ID: <37AE97BE.2CADF48E@lemburg.com> Jack Jansen wrote: > > Recently, Greg Stein said: > > I would suggest backing out the marshalling of buffer-interface objects > > and adding a mechanism for arbitrary type objects to marshal themselves.
> > Without the second part, arrays and Unicode objects aren't marshallable > > at all (seems bad). > > This sounds like the right approach. It would require 2 slots in the > tp_ structure and a little extra glue for the typecodes (currently > marshal knows all the 1-letter typecodes for all object types it can > handle), but types marshalling their own objects would require a > centralized registry of object types. For the time being it would > probably suffice to have the mapping of type<->letter be hardcoded in > marshal.h, but eventually you probably want a more extensible scheme, > where Joe R. Extension-Writer could add a marshaller to his objects > and know it won't collide with someone else's. This registry should ideally be reachable via C APIs. Then a module writer could call these APIs in the init function of his module and he'd be set. Since marshal won't be able to handle imports on the fly (like pickle et al.), these modules will have to be imported before unmarshalling. Aside: wouldn't it make sense to move from marshal to pickle and deprecate marshal altogether ? cPickle is quite fast and much more flexible than marshal, plus it already provides mechanisms for registering new types. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Mon Aug 9 14:49:44 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 09 Aug 1999 15:49:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "M.-A. Lemburg" , Mon, 09 Aug 1999 10:56:30 +0200 , <37AE97BE.2CADF48E@lemburg.com> Message-ID: <19990809134944.BB2FC303120@snelboot.oratrix.nl> > Aside: wouldn't it make sense to move from marshal to pickle and > deprecate marshal altogether ? cPickle is quite fast and much more > flexible than marshal, plus it already provides mechanisms for > registering new types.
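The registration mechanism Marc-Andre alludes to survives in today's pickle as the `copyreg` module; a sketch, with a hypothetical `Point` class standing in for an extension type:

```python
import copyreg
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def reduce_point(p):
    # Tell pickle how to rebuild a Point: a callable plus its arguments.
    return (Point, (p.x, p.y))

# Register the reduction; pickle consults this table for Point instances.
copyreg.pickle(Point, reduce_point)

data = pickle.dumps(Point(3, 4))
q = pickle.loads(data)
assert (q.x, q.y) == (3, 4)
```

This is the "centralized registry" idea from the previous messages, reachable from Python rather than only from C.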
This is probably the best idea so far. Just remove the buffer-workaround in marshal, keep it functioning for the things it is used for now (like pyc files) and refer people to (c)Pickle for new development. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido@CNRI.Reston.VA.US Mon Aug 9 15:50:46 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 10:50:46 -0400 Subject: [Python-Dev] Some more constants for the socket module In-Reply-To: Your message of "Sat, 07 Aug 1999 18:14:56 +0200." <37AC5B80.56F740DD@lemburg.com> References: <37AC5B80.56F740DD@lemburg.com> Message-ID: <199908091450.KAA29179@eric.cnri.reston.va.us> Thanks for the socketmodule patch, Marc. This was on my mental TO-DO list for a long time! I've checked it in. (One note: I had a bit of trouble applying the patch; apparently your mailer expanded all tabs to spaces. Perhaps you could use attachments to mail diffs? Also, you seem to have renamed 'd' to 'moddict' but you didn't send the patch for that...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Aug 9 17:26:28 1999 From: guido@python.org (Guido van Rossum) Date: Mon, 09 Aug 1999 12:26:28 -0400 Subject: [Python-Dev] preferred conference date? Message-ID: <199908091626.MAA29411@eric.cnri.reston.va.us> I need your input about the date of the next Python conference. Foretec is close to a deal for a Python conference in January 2000 at the Alexandria Old Town Hilton hotel. Given our requirement of a good location in the DC area, this is a very good deal (it's a brand new hotel). The prices are high (they tell me that the whole conference will cost $900, with a room rate of $129) but it's a class A location (metro, tons of restaurants, close to National Airport, etc.)
and we have found no cheaper DC hotel suitable for our purposes (even in drab suburban locations). I'm worried that I'll be flamed to hell for this by the PSA members, but I don't think we can get the price any lower without starting all over in a different location, probably causing several months of delay. If people won't come, Foretec (and I) will have learned a valuable lesson and we'll rethink the issue for the 2001 conference. Anyway, given that Foretec is likely to go with this hotel, we have a choice of two dates: January 16-19, or 23-26 (both starting on a Sunday with the tutorials). This is where I need your help: which date would you prefer? Please mail me personally. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Mon Aug 9 17:31:43 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 9 Aug 1999 11:31:43 -0500 (CDT) Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> References: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <14255.557.474160.824877@dolphin.mojam.com> Guido> The prices are high (they tell me that the whole conference will Guido> cost $900, with a room rate of $129) but it's a class A location No way I (or my company) can afford to plunk down $900 for me to attend... Skip From mal@lemburg.com Mon Aug 9 17:40:45 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 18:40:45 +0200 Subject: [Python-Dev] Some more constants for the socket module References: <37AC5B80.56F740DD@lemburg.com> <199908091450.KAA29179@eric.cnri.reston.va.us> Message-ID: <37AF048D.FC0A540@lemburg.com> Guido van Rossum wrote: > > Thanks for the socketmodule patch, Marc. This was on my mental TO-DO > list for a long time! I've checked it in. Cool, thanks. > (One note: I had a bit of trouble applying the patch; apparently your > mailer expanded all tabs to spaces. Perhaps you could use attachments > to mail diffs? Ok. 
> Also, you seem to have renamed 'd' to 'moddict' but > you didn't send the patch for that...) Oops, sorry... my "#define to insint" script uses 'd' as moddict, that's the reason why. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@CNRI.Reston.VA.US Mon Aug 9 18:30:36 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 13:30:36 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: Your message of "Mon, 09 Aug 1999 11:31:43 CDT." <14255.557.474160.824877@dolphin.mojam.com> References: <199908091626.MAA29411@eric.cnri.reston.va.us> <14255.557.474160.824877@dolphin.mojam.com> Message-ID: <199908091730.NAA29559@eric.cnri.reston.va.us> > Guido> The prices are high (they tell me that the whole conference will > Guido> cost $900, with a room rate of $129) but it's a class A location > > No way I (or my company) can afford to plunk down $900 for me to attend... Let me clarify this. The $900 is for the whole 4-day conference, including a day of tutorials and developers' day. I don't know what the exact price breakdown will be, but the tutorials will probably be $300. Last year the total price was $700, with $250 for tutorials. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Tue Aug 10 13:04:27 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 13:04:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts Message-ID: <199908101204.NAA29572@pukapuka.inrialpes.fr> Currently, dictionaries always grow until they are deallocated from memory. 
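That grow-only behavior can still be observed from pure Python; a sketch against a modern CPython (an implementation detail, and exact sizes vary by version):

```python
import sys

d = {i: i for i in range(10000)}
peak = sys.getsizeof(d)  # footprint after growing to 10000 entries

for i in range(10000):
    del d[i]

# Empty again, but deletion alone never hands the table back:
# the dict still occupies its peak footprint.
assert len(d) == 0
assert sys.getsizeof(d) == peak
assert sys.getsizeof(d) > sys.getsizeof({})
```

Resizing happens only on insertion, so a dict that once held many items keeps its large table until it is rebuilt or deallocated.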
This happens in PyDict_SetItem according to the following code (before inserting the new item into the dict): /* if fill >= 2/3 size, double in size */ if (mp->ma_fill*3 >= mp->ma_size*2) { if (dictresize(mp, mp->ma_used*2) != 0) { if (mp->ma_fill+1 > mp->ma_size) return -1; } } The symmetric case is missing and this has intrigued me for a long time, but I've never had the courage to look deeply into this portion of code and try to propose a solution, which is: reduce the size of the dict by half when the number of used items is <= 1/6 of the size. This situation occurs far less frequently than dict growing, but anyway, it seems useful for the degenerate cases where a dict has a peak usage, then most of the items are deleted. This is usually the case for global dicts holding dynamic object collections, etc. A bonus effect of shrinking big dicts with deleted items is that the lookup speed may be improved, because of the cleaned entries and the reduced overall size (resulting in a better hit ratio). The (only) solution I could come up with for this problem is the appended patch. It is not immediately obvious, but in practice, it seems to work fine. (inserting a print statement after the condition, showing the dict size and current usage helps in monitoring what's going on). Any other ideas on how to deal with this? Thoughts, comments? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999 --- dictobject.c Tue Aug 10 12:21:15 1999 *************** *** 417,423 **** ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); return 0; } --- 417,430 ---- ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); ! /* For bigger dictionaries, if used <= 1/6 size, half the size */ ! if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) { !
if (dictresize(mp, mp->ma_used*2) != 0) { ! if (mp->ma_fill > mp->ma_size) ! return -1; ! } ! } return 0; } From Vladimir.Marangozov@inrialpes.fr Tue Aug 10 14:20:36 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 14:20:36 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 10, 99 01:04:27 pm" Message-ID: <199908101320.OAA21986@pukapuka.inrialpes.fr> I wrote: > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? > To clarify a bit what the patch does "as is", here's a short description: The code is triggered in PyDict_DelItem only for sizes which are > MINSIZE*4, i.e. greater than 4*4 = 16. Therefore, resizing will occur for a min size of 32 items. one third 32 / 3 = 10 two thirds 32 * 2/3 = 21 one sixth 32 / 6 = 5 So the shrinking will happen for a dict size of 32, of which 5 items are used (the sixth was just deleted). After the dictresize, the size will be 16, of which 5 items are used, i.e. one third. The threshold is fixed by the first condition of the patch. It could be made 64, instead of 32. This is subject to discussion... Obviously, this is most useful for bigger dicts, not for small ones. A threshold of 32 items seemed to me to be a reasonable compromise. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik@pythonware.com Tue Aug 10 13:35:33 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 10 Aug 1999 14:35:33 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> Message-ID: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > > > > >>> import unicode > > > > > >>> import marshal > > > > > >>> u = unicode.unicode > > > > > >>> s = u("foo") > > > > > >>> data = marshal.dumps(s) > > > > > >>> marshal.loads(data) > > > > > 'f\000o\000o\000' > > > > > >>> type(marshal.loads(data)) > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > that unicode objects use a two-byte character representation. > > Unicode objects should *not* implement the getcharbuffer slot. Only > read, write, and segcount. unicode objects do not implement the getcharbuffer slot. here's the relevant descriptor:

    static PyBufferProcs unicode_as_buffer = {
        (getreadbufferproc) unicode_buffer_getreadbuf,
        (getwritebufferproc) unicode_buffer_getwritebuf,
        (getsegcountproc) unicode_buffer_getsegcount
    };

the array module uses a similar descriptor. maybe the unicode class shouldn't implement the buffer interface at all? sure looks like the best way to avoid trivial mistakes (the current behaviour of fp.write(unicodeobj) is even more serious than the marshal glitch...) or maybe the buffer design needs an overhaul? From guido@CNRI.Reston.VA.US Tue Aug 10 15:12:23 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Tue, 10 Aug 1999 10:12:23 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Tue, 10 Aug 1999 14:35:33 +0200."
<000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <199908101412.KAA02065@eric.cnri.reston.va.us> > Greg Stein wrote: > > > > > > >>> import unicode > > > > > > >>> import marshal > > > > > > >>> u = unicode.unicode > > > > > > >>> s = u("foo") > > > > > > >>> data = marshal.dumps(s) > > > > > > >>> marshal.loads(data) > > > > > > 'f\000o\000o\000' > > > > > > >>> type(marshal.loads(data)) > > > > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > > that unicode objects use a two-byte character representation. > > > > Unicode objects should *not* implement the getcharbuffer slot. Only > > read, write, and segcount. > > unicode objects do not implement the getcharbuffer slot. > here's the relevant descriptor: > > static PyBufferProcs unicode_as_buffer = { > (getreadbufferproc) unicode_buffer_getreadbuf, > (getwritebufferproc) unicode_buffer_getwritebuf, > (getsegcountproc) unicode_buffer_getsegcount > }; > > the array module uses a similar descriptor. > > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) > > or maybe the buffer design needs an overhaul? I think most places that should use the charbuffer interface actually use the readbuffer interface. This is what should be fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Tue Aug 10 18:53:56 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:53:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <37B06734.4339D3BF@lemburg.com> Fredrik Lundh wrote: > > unicode objects do not implement the getcharbuffer slot. >... > or maybe the buffer design needs an overhaul? I think its usage does. The character slot should be used whenever character data is needed, not the read buffer slot. The latter one is for passing around raw binary data (without reinterpretation !), if I understood Greg correctly back when I gave those abstract APIs a try. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Aug 10 18:39:29 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:39:29 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <37B063D1.29F3106A@lemburg.com> Vladimir Marangozov wrote: > > Currently, dictionaries always grow until they are deallocated from > memory. This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } > > The symmetric case is missing and this has intrigued me for a long time, > but I've never had the courage to look deeply into this portion of code > and try to propose a solution. Which is: reduce the size of the dict by > half when the nb of used items <= 1/6 the size. 
> > This situation occurs far less frequently than dict growing, but anyway, > it seems useful for the degenerate cases where a dict has a peak usage, > then most of the items are deleted. This is usually the case for global > dicts holding dynamic object collections, etc. > > A bonus effect of shrinking big dicts with deleted items is that > the lookup speed may be improved, because of the cleaned entries > and the reduced overall size (resulting in a better hit ratio). > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (Inserting a print statement after the condition, showing the dict size > and current usage, helps in monitoring what's going on.) > > Any other ideas on how to deal with this? Thoughts, comments? I think that integrating this into the C code is not really that effective since the situation will not occur that often, and then it is often better to let the programmer decide rather than integrate an automatic downsize. You can call dict.update({}) to force an internal resize (the empty dictionary can be made global since it is not manipulated in any way and thus does not cause creation overhead). Perhaps a new method .resize(approx_size) would make this even clearer. This would also have the benefit of allowing a programmer to force allocation of the wanted size, e.g.

    d = {}
    d.resize(10000)
    # Insert 10000 items in a batch insert

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Tue Aug 10 20:58:27 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 20:58:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B063D1.29F3106A@lemburg.com> from "M.-A.
Lemburg" at "Aug 10, 99 07:39:29 pm" Message-ID: <199908101958.UAA22028@pukapuka.inrialpes.fr> M.-A. Lemburg wrote: > > [me] > > Any other ideas on how to deal with this? Thoughts, comments? > > I think that integrating this into the C code is not really that > effective since the situation will not occur that often and then > it is often better to let the programmer decide rather than integrate > an automatic downsize. Agreed that the situation is rare. But if it occurs, it's Python's responsibility to manage its data structures (and system resources) efficiently. As a programmer, I really don't want to be bothered with internals -- I trust the interpreter for that. Moreover, how could I decide that at some point, some dict needs to be resized in my fairly big app, say IDLE? > > You can call dict.update({}) to force an internal > resize (the empty dictionary can be made global since it is not > manipulated in any way and thus does not cause creation overhead). I know that I can force the resize in other ways, but this is not the point. I'm usually against the idea of changing the programming logic because of my advanced knowledge of the internals. > > Perhaps a new method .resize(approx_size) would make this even > clearer. This would also have the benefit of allowing a programmer > to force allocation of the wanted size, e.g. > > d = {} > d.resize(10000) > # Insert 10000 items in a batch insert This is interesting, but the two ideas are not mutually exclusive. Python has to downsize dicts automatically (just the same way it doubles the size automatically). Offering more through an API is a plus for hackers. ;-) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Tue Aug 10 21:19:46 1999 From: mal@lemburg.com (M.-A.
Lemburg) Date: Tue, 10 Aug 1999 22:19:46 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101958.UAA22028@pukapuka.inrialpes.fr> Message-ID: <37B08962.6DFB3F0@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > [me] > > > Any other ideas on how to deal with this? Thoughts, comments? > > > > I think that integrating this into the C code is not really that > > effective since the situation will not occur that often and then > > it is often better to let the programmer decide rather than integrate > > an automatic downsize. > > Agreed that the situation is rare. But if it occurs, it's Python's > responsibility to manage its data structures (and system resources) > efficiently. As a programmer, I really don't want to be bothered with > internals -- I trust the interpreter for that. Moreover, how could > I decide that at some point, some dict needs to be resized in my > fairly big app, say IDLE? You usually don't ;-) because "normal" dicts only grow (well, more or less). The downsizing thing will only become a problem if you use dictionaries in certain algorithms, and there you handle the problem manually. My stack implementation uses the same trick, BTW. Memory is cheap and with an extra resize method (which the mxStack implementation has), problems can be dealt with explicitly for everyone to see in the code. > > You can call dict.update({}) to force an internal > > resize (the empty dictionary can be made global since it is not > > manipulated in any way and thus does not cause creation overhead). > > I know that I can force the resize in other ways, but this is not > the point. I'm usually against the idea of changing the programming > logic because of my advanced knowledge of the internals. True, that's why I mentioned... > > > > Perhaps a new method .resize(approx_size) would make this even > > clearer. This would also have the benefit of allowing a programmer > > to force allocation of the wanted size, e.g.
> > > > d = {} > > d.resize(10000) > > # Insert 10000 items in a batch insert > > This is interesting, but the two ideas are not mutually exclusive. > Python has to downsize dicts automatically (just the same way it doubles > the size automatically). Offering more through an API is a plus for > hackers. ;-) It's not really for hackers: the point is that it makes the technique visible and understandable (as opposed to the hack above). The same could be useful for lists too (the hack there is l = [None] * size, which I find rather difficult to understand at first sight...). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond@skippinet.com.au Tue Aug 10 23:39:30 1999 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 11 Aug 1999 08:39:30 +1000 Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B08962.6DFB3F0@lemburg.com> Message-ID: <010901bee381$36ee5d30$1101a8c0@bobcat> Looking over the messages from Marc and Vladimir, I'm going to add my 2c worth. IMO, Marc's position is untenable iff it can be demonstrated that the "average" program is likely to see "sparse" dictionaries, and such dictionaries have an adverse effect on either speed or memory. The analogy is quite simple - you don't need to manually resize lists or dicts before inserting (to allocate more storage - an internal implementation issue) so neither should you need to manually resize when deleting (to reclaim that storage - still internal implementation). Suggesting that the allocation of resources should be automatic, but the recycling of them not be automatic flies in the face of everything else - e.g., you don't need to delete each object - when it is no longer referenced, its memory is reclaimed automatically.
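The grow-but-never-shrink behaviour under discussion is easy to observe empirically. A small sketch (behaviour observed on a modern CPython 3.x; the exact byte counts are implementation details, not a language guarantee) showing that deleting every key leaves the dict's table at its high-water size, while rebuilding the dict reclaims it:

```python
import sys

# Grow a dict, then delete every key one at a time.
d = {i: None for i in range(10_000)}
grown = sys.getsizeof(d)

for k in list(d):
    del d[k]

after_deletes = sys.getsizeof(d)   # table keeps its high-water allocation
rebuilt = sys.getsizeof(dict(d))   # a copy is sized for the 0 live items

print(grown, after_deletes, rebuilt)
```

So the "copy it into a fresh dict" trick mentioned elsewhere in this thread is still the way to give the memory back.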
Marc's position is only reasonable if the specific case we are talking about is very very rare, and unlikely to be hit by anyone with normal, real-world requirements or programs. In this case, exposing the implementation detail is reasonable. So, the question comes down to: "What is the benefit of Vladimir's patch?" Maybe we need some metrics on some dictionaries. For example, maybe a doctored Python that kept stats for each dictionary and logged this info. The output of this should be able to tell you what savings you could possibly expect. If you find that the average program really would not benefit at all (say only a few K from a small number of dicts) then the horse was probably dead well before we started flogging it. If however you can demonstrate serious benefits could be achieved, then interest may pick up and I too would lobby for automatic downsizing. Mark. From tim_one@email.msn.com Wed Aug 11 06:30:20 1999 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 01:30:20 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <000001bee3ba$9b226f60$8d2d2399@tim> [Vladimir] > Currently, dictionaries always grow until they are deallocated from > memory. It's more accurate to say they never shrink <0.9 wink>. Even that has exceptions, though, starting with: > This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } This code can shrink the dict too. The load factor computation is based on "fill", but the resize is based on "used". If you grow a huge dict, then delete all the entries one by one, "used" falls to 0 but "fill" stays at its high-water mark.
At least 1/3rd of the entries are NULL, so "fill" continues to climb as keys are added again: when the load factor computation triggers again, "used" may be as small as 1, and dictresize can shrink the dict dramatically. The only clear a priori return I see in your patch is that I might save memory if I delete gobs of stuff from a dict and then neither get rid of it nor add keys to it again. But my programs generally grow dicts forever, grow then delete them entirely, or cycle through fat and lean times (in which case the code above already shrinks them from time to time). So I don't expect that your patch would buy me anything I want, but would cost me more on every delete. > ... > Any other ideas on how to deal with this? Thoughts, comments? Just that slowing the expected case to prevent theoretical bad cases is usually a net loss -- I think the onus is on you to demonstrate that this change is an exception to that rule. I do recall one real-life complaint about it on c.l.py a couple years ago: the poster had a huge dict, eventually deleted most of the items, and then kept it around purely for lookups. They were happy enough to copy the dict into a fresh one a key+value pair at a time; today they could just do d = d.copy() or even d.update({}) to shrink the dict. It would certainly be good to document these tricks! if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim From tim_one@email.msn.com Wed Aug 11 07:45:49 1999 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 02:45:49 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <000201bee3c5$25b47b00$8d2d2399@tim> [Guido] > ... > The prices are high (they tell me that the whole conference will cost > $900, with a room rate of $129) Is room rental in addition to, or included in, that $900? > ...
> I'm worried that I'll be flamed to hell for this by the PSA members, So have JulieK announce it . > ... > Anyway, given that Foretec is likely to go with this hotel, we have a > choice of two dates: January 16-19, or 23-26 (both starting on a > Sunday with the tutorials). This is where I need your help: which > date would you prefer? 23-26 for me; 16-19 may not be doable. or-everyone-can-switch-to-windows-and-we'll-do-the-conference-via-netmeeting-ly y'rs - tim From Vladimir.Marangozov@inrialpes.fr Wed Aug 11 15:33:17 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 11 Aug 1999 15:33:17 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000001bee3ba$9b226f60$8d2d2399@tim> from "Tim Peters" at "Aug 11, 99 01:30:20 am" Message-ID: <199908111433.PAA31842@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir] > > Currently, dictionaries always grow until they are deallocated from > > memory. > > It's more accurate to say they never shrink <0.9 wink>. Even that has > exceptions, though, starting with: > > > This happens in PyDict_SetItem according to the following > > code (before inserting the new item into the dict): > > > > /* if fill >= 2/3 size, double in size */ > > if (mp->ma_fill*3 >= mp->ma_size*2) { > > if (dictresize(mp, mp->ma_used*2) != 0) { > > if (mp->ma_fill+1 > mp->ma_size) > > return -1; > > } > > } > > This code can shrink the dict too. The load factor computation is based on > "fill", but the resize is based on "used". If you grow a huge dict, then > delete all the entries one by one, "used" falls to 0 but "fill" stays at its > high-water mark. Thanks for clarifying this! > [snip] > > > ... > > Any other ideas on how to deal with this? Thoughts, comments?
> > Just that slowing the expected case to prevent theoretical bad cases is > usually a net loss -- I think the onus is on you to demonstrate that this > change is an exception to that rule. I won't, because this case is rare in practice, classifying it already as an exception. A real exception. I'll have to think a bit more about all this. Adding 1/3 new entries to trigger the next resize sounds suboptimal (if it happens at all). > I do recall one real-life complaint > about it on c.l.py a couple years ago: the poster had a huge dict, > eventually deleted most of the items, and then kept it around purely for > lookups. They were happy enough to copy the dict into a fresh one a > key+value pair at a time; today they could just do > > d = d.copy() > > or even > > d.update({}) > > to shrink the dict. > > It would certainly be good to document these tricks! I think that officializing these tricks in the documentation is a bad idea. > > if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to- > see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, but don't forget that you've been around, teaching people various tricks. And 1999 is special -- we just had a solar eclipse today, the next being scheduled for 2081. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik@pythonware.com Wed Aug 11 15:07:44 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 11 Aug 1999 16:07:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> > > or maybe the buffer design needs an overhaul? 
> > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. ok. btw, how about adding support for buffer access to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible (like "virtual" and "abstract" image buffers in PIL 1.1). something like:

    int initbuffer(PyObject* obj, void** context);
    int exitbuffer(PyObject* obj, void* context);

and corresponding context arguments to the rest of the functions... From guido@CNRI.Reston.VA.US Wed Aug 11 15:42:10 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 11 Aug 1999 10:42:10 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Wed, 11 Aug 1999 16:07:44 +0200." <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> Message-ID: <199908111442.KAA04423@eric.cnri.reston.va.us> > btw, how about adding support for buffer access > to data that have strange internal formats (like certain > PIL image memories) or isn't directly accessible > (like "virtual" and "abstract" image buffers in PIL 1.1). > something like: > > int initbuffer(PyObject* obj, void** context); > int exitbuffer(PyObject* obj, void* context); > > and corresponding context arguments to the > rest of the functions... Can you explain this idea more? Without more understanding of PIL I have no idea what you're talking about...
--Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Thu Aug 12 06:15:39 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 01:15:39 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908111433.PAA31842@pukapuka.inrialpes.fr> Message-ID: <000301bee481$b78ae5c0$4e2d2399@tim> [Tim] >> ...slowing the expected case to prevent theoretical bad cases is >> usually a net loss -- I think the onus is on you to demonstrate >> that this change is an exception to that rule. [Vladimir Marangozov] > I won't, because this case is rare in practice, classifying it already > as an exception. A real exception. I'll have to think a bit more about > all this. Adding 1/3 new entries to trigger the next resize sounds > suboptimal (if it happens at all). "Suboptimal" with respect to which specific cost model? Exhibiting a specific bad case isn't compelling, and especially not when it's considered to be "a real exception". Adding new expense to every delete is an obvious new burden -- where's the payback, and is the expected net effect amortized across all dict usage a win or loss? Offhand it sounds like a small loss to me, although I haven't worked up a formal cost model either . > ... > I think that officializing these tricks in the documentation is a > bad idea. It's rarely a good idea to keep truths secret, although implementation-du-jour tricks don't belong in the current doc set. Probably in a HowTo. >> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard- >> to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, I actually hate that kind of argument -- it's one of *Guido's* favorites, and in his current silent state I'm simply channeling him . > but don't forget that you've been around, teaching people various > tricks. As I said, this particular trick has come up only once in real life in my experience; it's never come up in my own code; it's an anti-FAQ. 
People are 100x more likely to whine about theoretical quadratic-time list growth nobody has ever encountered (although it looks like they may finally get it under an out-of-the-box BDW collector!). > And 1999 is special -- we just had a solar eclipse today, the next being > scheduled for 2081. Ya, like any of us will survive Y2K to see it . 1999-is-special-cuz-it's-the-end-of-civilization-ly y'rs - tim From Vladimir.Marangozov@inrialpes.fr Thu Aug 12 19:22:06 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 12 Aug 1999 19:22:06 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908121822.TAA40444@pukapuka.inrialpes.fr> Just curious: Is python with vs. without "-O" equivalent today regarding line numbers? Are SET_LINENO opcodes a plus in some situations or not? Next, I see quite often several SET_LINENO in a row in the beginning of code objects due to doc strings, etc. Since I don't think that folding them into one SET_LINENO would be an optimisation (it would rather be avoiding the redundancy), is it possible and/or reasonable to do something in this direction? A trivial example:

>>> def f():
...     "This is a comment about f"
...     a = 1
...
>>> import dis
>>> dis.dis(f)
          0 SET_LINENO          1
          3 SET_LINENO          2
          6 SET_LINENO          3
          9 LOAD_CONST          1 (1)
         12 STORE_FAST          0 (a)
         15 LOAD_CONST          2 (None)
         18 RETURN_VALUE
>>>

Can the above become something like this instead:

          0 SET_LINENO          3
          3 LOAD_CONST          1 (1)
          6 STORE_FAST          0 (a)
          9 LOAD_CONST          2 (None)
         12 RETURN_VALUE

-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack@oratrix.nl Thu Aug 12 23:02:06 1999 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 00:02:06 +0200 Subject: [Python-Dev] about line numbers In-Reply-To: Message by Vladimir Marangozov , Thu, 12 Aug 1999 19:22:06 +0100 (NFT) , <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <19990812220211.B3CED993@oratrix.oratrix.nl> The only possible problem I can see with folding line numbers is if someone sets a breakpoint on such a line. And I think it'll be difficult to explain the missing line numbers to pdb, so there isn't an easy workaround (at least, it takes more than my 30 seconds of brainpower to come up with one:-). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov@inrialpes.fr Fri Aug 13 00:10:26 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:10:26 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000301bee481$b78ae5c0$4e2d2399@tim> from "Tim Peters" at "Aug 12, 99 01:15:39 am" Message-ID: <199908122310.AAA29618@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Tim] > >> ...slowing the expected case to prevent theoretical bad cases is > >> usually a net loss -- I think the onus is on you to demonstrate > >> that this change is an exception to that rule. > > [Vladimir Marangozov] > > I won't, because this case is rare in practice, classifying it already > > as an exception.
A real exception. I'll have to think a bit more about > > all this. Adding 1/3 new entries to trigger the next resize sounds > > suboptimal (if it happens at all). > > "Suboptimal" with respect to which specific cost model? Exhibiting a > specific bad case isn't compelling, and especially not when it's considered > to be "a real exception". Adding new expense to every delete is an obvious > new burden -- where's the payback, and is the expected net effect amortized > across all dict usage a win or loss? Offhand it sounds like a small loss to > me, although I haven't worked up a formal cost model either . C'mon Tim, don't try to impress me with cost models. I'm already impressed :-) Anyway, I've looked at some traces. As expected, the conclusion is that this case is extremely rare wrt the average dict usage. There are 3 reasons: (1) dicts are usually deleted entirely, (2) del d[key] is rare in practice, and (3) often d[key] = None is used instead of (2). There is, however, a small percentage of dicts which are used below 1/3 of their size. I must say, below 1/3 of their peak size, because downsizing is also rare. To trigger a downsize, 1/3 new entries of the peak size must be inserted. Besides these observations, after looking at the code one more time, I can't really understand why the resize logic is based on the "fill" watermark and not on "used". fill = used + dummy, but since lookdict returns the first free slot (null or dummy), I don't really see what's the point of using a fill watermark... Perhaps you can enlighten me on this. Using only the "used" metrics seems fine to me. I even deactivated "fill" and replaced it with "used" to see what happens -- no visible changes, except a tiny speedup I'm willing to neglect.
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov@inrialpes.fr Fri Aug 13 00:21:48 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:21:48 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <19990812220211.B3CED993@oratrix.oratrix.nl> from "Jack Jansen" at "Aug 13, 99 00:02:06 am" Message-ID: <199908122321.AAA29572@pukapuka.inrialpes.fr> Jack Jansen wrote: > > > The only possible problem I can see with folding linenumbers is if > someone sets a breakpoint on such a line. And I think it'll be > difficult to explain the missing line numbers to pdb, so there isn't > an easy workaround (at least, it takes more than my 30 seconds of > brainpoewr to come up with one:-). > Eek! We can set a breakpoint on a doc string? :-) There's no code in there. It should be treated as a comment by pdb. I can't set a breakpoint on a comment line even in C ;-) There must be something deeper about it... -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one@email.msn.com Fri Aug 13 01:07:32 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 20:07:32 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <000101bee51f$d7601de0$fb2d2399@tim> [Vladimir Marangozov] > Is python with vs. without "-O" equivalent today regarding > line numbers? > > Are SET_LINENO opcodes a plus in some situations or not? In theory it should make no difference, except that the trace mechanism makes a callback on each SET_LINENO, and that's how the debugger implements line-number breakpoints. Under -O, there are no SET_LINENOs, so debugger line-number breakpoints don't work under -O. 
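As a point of comparison, later CPython releases resolved exactly this tension by dropping SET_LINENO altogether: line numbers moved into a table attached to the code object, so tracing and debuggers consult the table instead of paying for an opcode per line. A minimal sketch of inspecting that table on a modern (3.x) interpreter:

```python
import dis

def f():
    "This is a comment about f"
    a = 1
    return a

# No SET_LINENO opcodes exist anymore; the bytecode-offset -> line-number
# mapping is stored on the code object and exposed by dis.findlinestarts().
starts = list(dis.findlinestarts(f.__code__))
print(starts)   # [(offset, lineno), ...]
```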
I think there's also a sporadic buglet, which I've never bothered to track down: sometimes a line number reported in a traceback under -O (&, IIRC, it's always the topmost line number) comes out as a senseless negative value. > Next, I see quite often several SET_LINENO in a row in the beginning > of code objects due to doc strings, etc. Since I don't think that > folding them into one SET_LINENO would be an optimisation (it would > rather be avoiding the redundancy), is it possible and/or reasonable > to do something in this direction? All opcodes consume time, although a wasted trip or two around the eval loop at the start of a function isn't worth much effort to avoid. Still, it's a legitimate opportunity for provable speedup, even if unmeasurable speedup . Would be more valuable to rethink the debugger's breakpoint approach so that SET_LINENO is never needed (line-triggered callbacks are expensive because called so frequently, turning each dynamic SET_LINENO into a full-blown Python call; if I used the debugger often enough to care , I'd think about munging in a new opcode to make breakpoint sites explicit). immutability-is-made-to-be-violated-ly y'rs - tim From tim_one@email.msn.com Fri Aug 13 05:53:38 1999 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 13 Aug 1999 00:53:38 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908122307.AAA06018@pukapuka.inrialpes.fr> Message-ID: <000101bee547$cffaa020$992d2399@tim> [Vladimir Marangozov, *almost* seems ready to give up on a counterproductive dict pessimization ] > ... > There is, however, a small percentage of dicts which are used > below 1/3 of their size. I must say, below 1/3 of their peak size, > because downsizing is also rare. To trigger a downsize, 1/3 new > entries of the peak size must be inserted. Not so, although "on average" 1/6 may be correct. Look at an extreme: Say a dict has size 333 (it can't, but it makes the math obvious ...). Say it contains 221 items.
Now someone deletes them all, one at a time. used==0 and fill==221 at this point. They insert one new key that happens to hit one of the 333-221 = 112 remaining NULL keys. Then used==1 and fill==222. They insert a 2nd key, and before the dict is searched the new fill of 222 triggers the 2/3rds load-factor resizing -- which asks for a new size of 1*2 == 2. For the minority of dicts that go up and down in size wildly many times, the current behavior is fine. > Besides these observations, after looking at the code one more > time, I can't really understand why the resize logic is based on > the "fill" watermark and not on "used". fill = used + dummy, but > since lookdict returns the first free slot (null or dummy), I don't > really see what's the point of using a fill watermark... Let's just consider an unsuccessful search. Then it does return "the first" free slot, but not necessarily at the time it *sees* the first free slot. So long as it sees a dummy, it has to keep searching; the search doesn't end until it finds a NULL. So consider this, assuming the resize triggered only on "used":

    d = {}
    for i in xrange(50000):
        d[random.randrange(1000000)] = 1
    for k in d.keys():
        del d[k]
    # now there are 50000 dummy dict keys, and some number of NULLs
    # loop invariant: used == 0
    for i in xrange(sys.maxint):
        j = random.randrange(10000000)
        d[j] = 1
        del d[j]
        assert not d.has_key(i)

However many NULL slots remained, the last loop eventually transforms them *all* into dummies. The dummies act exactly like "real keys" with respect to expected time for an unsuccessful search, which is why it's thoroughly appropriate to include dummies in the load factor computation. The loop will run slower and slower as the percentage of dummies approaches 100%, and each failing has_key approaches O(N) time.
In most hash table implementations that's the worst that can happen (and it's a disaster), but under Python's implementation it's worse: Python never checks to see whether the probe sequence "wraps around", so the first search after the last NULL is changed to a dummy never ends. Counting the dummies in the load-factor computation prevents all that: no matter how many inserts and deletes are intermixed, the "effective load factor" stays under 2/3rds so gives excellent expected-case behavior; and it also protects against an all-dummy dict, making the lack of an expensive inner-loop "wrapped around?" check safe. > Perhaps you can enlighten me on this. Using only the "used" metrics > seems fine to me. I even deactivated "fill" and replaced it with "used" > to see what happens -- no visible changes, except a tiny speedup I'm > willing to neglect. You need a mix of deletes and inserts for the dummies to make a difference; dicts that always grow don't have dummies, so they're not likely to have any dummy-related problems either. Try this (untested):

    import time
    from random import randrange
    N = 1000
    thatmany = [None] * N
    d = {}
    while 1:
        start = time.clock()
        for i in thatmany:
            d[randrange(10000000)] = 1
        for i in d.keys():
            del d[i]
        finish = time.clock()
        print round(finish - start, 3)

Succeeding iterations of the outer loop should grow dramatically slower, and finally get into an infinite loop, even though "used" never exceeds N. Short course rewording: for purposes of predicting expected search time, a dummy is the same as a live key, because finding a dummy doesn't end a search -- it has to press on until either finding the key it was looking for, or finding a NULL. And with a mix of insertions and deletions, and if the hash function is doing a good job, then over time all the slots in the table will become either live or dummy, even if "used" stays within a very small range. So, that's why.
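[A present-day aside: Tim's argument can be made concrete with a toy open-addressing table. This is NOT CPython's lookdict -- it is a minimal sketch with linear probing and invented names, kept only to show why a miss must walk past every dummy, and why "fill" rather than "used" must drive resizing.]

    # An empty slot (NULL) is the only thing that ends a search;
    # a deleted slot (DUMMY) must be probed past, just like a live key.
    NULL = None
    DUMMY = object()

    class ToyDict:
        def __init__(self, size=8):
            self.slots = [NULL] * size
            self.used = 0    # live keys
            self.fill = 0    # live keys + dummies (the "fill" watermark)

        def _probe(self, key):
            i = hash(key) % len(self.slots)
            while True:
                yield i
                i = (i + 1) % len(self.slots)

        def insert(self, key):
            target = None
            for i in self._probe(key):
                s = self.slots[i]
                if s == key:
                    return                   # already present
                if s is DUMMY and target is None:
                    target = i               # first free slot, but keep searching
                if s is NULL:
                    if target is None:
                        target = i
                        self.fill += 1       # only NULL -> key raises fill
                    break
            self.slots[target] = key
            self.used += 1

        def delete(self, key):
            for i in self._probe(key):
                if self.slots[i] == key:
                    self.slots[i] = DUMMY    # leaves a dummy; fill stays put
                    self.used -= 1
                    return
                if self.slots[i] is NULL:
                    raise KeyError(key)

        def probes_for_miss(self, key):
            n = 0
            for i in self._probe(key):
                n += 1
                if self.slots[i] is NULL:
                    return n
                if n > len(self.slots):
                    raise RuntimeError("no NULL left: the search would never end")

    t = ToyDict(8)
    for k in range(6):
        t.insert(k)
    for k in range(6):
        t.delete(k)
    print(t.used, t.fill)        # 0 6: no live keys, but six dummies
    print(t.probes_for_miss(0))  # 7: six dummies plus the NULL at slot 6

With eight slots, six inserts followed by six deletes leave used == 0 but fill == 6, and an unsuccessful search then costs seven probes; a table with no NULLs left would loop forever (this sketch raises RuntimeError instead). That is the behavior resizing on "fill" exists to prevent.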
dictobject-may-be-the-subtlest-object-there-is-ly y'rs - tim From gstein@lyra.org Fri Aug 13 10:13:55 1999 From: gstein@lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 02:13:55 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: On Tue, 10 Aug 1999, Fredrik Lundh wrote: >... > unicode objects do not implement the getcharbuffer slot. This is Goodness. All righty. >... > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way It is needed for fp.write(unicodeobj) ... It is also very handy for C functions to deal with Unicode strings. > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) What's wrong with fp.write(unicodeobj)? It should write the unicode value to the file. Are you suggesting that it will need to be done differently? Icky. > or maybe the buffer design needs an overhaul? Not that I know of. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Aug 13 11:36:13 1999 From: gstein@lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 03:36:13 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: On Tue, 10 Aug 1999, Guido van Rossum wrote: >... > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. I believe that I properly changed all of these within the core distribution. Per your requested design, third-party extensions must switch from "s#" to "t#" to move to the charbuffer interface, as needed. 
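[A present-day aside: the readbuffer/charbuffer split Greg describes survives in modern Python as the split between bytes-like objects and text. A rough analogue of the "s#" vs "t#" rules using today's io API -- not the 1.5.2 C-level format codes:]

    import array
    import io

    f = io.BytesIO()

    a = array.array("b", [104, 105])  # sample object exporting the buffer protocol
    f.write(a)                        # accepted: any bytes-like object ("s#" territory)

    try:
        f.write("hi")                 # text: today's analogue of the "t#" distinction
    except TypeError:
        pass                          # a byte stream refuses text; encode first

    print(f.getvalue())               # b'hi' -- the array's two bytes, 104 and 105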
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov@inrialpes.fr Fri Aug 13 14:47:05 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 14:47:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908131347.OAA30740@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir Marangozov, *almost* seems ready to give up on a counter- > productive dict pessimization ] > Of course I will! Now everything is perfectly clear. Thanks. > ... > So, that's why. > Now, *this* one explanation of yours should go into a HowTo/BecauseOf for developers. I timed your scripts and a couple of mine which attest (again) to the validity of the current implementation. My patch is out of bounds. It even disturbs the existing harmony in the results from time to time ;-) because of early resizing. All in all, for performance reasons, dicts remain an exception to the rule of releasing memory ASAP. They have been designed to tolerate caching because of their dynamics, which is the main reason for the rare case addressed by my patch. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Fri Aug 13 18:27:19 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 19:27:19 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B45577.7772CAA1@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Guido van Rossum wrote: > >... > > > or maybe the buffer design needs an overhaul? > > > > I think most places that should use the charbuffer interface actually > > use the readbuffer interface. This is what should be fixed. > > I believe that I properly changed all of these within the core > distribution.
Per your requested design, third-party extensions must > switch from "s#" to "t#" to move to the charbuffer interface, as needed. Shouldn't this be the other way around ? After all, extensions using "s#" do expect character data and not arbitrary binary encodings of information. IMHO, the latter should be special cased, not the former. E.g. it doesn't make sense to use the re module to scan over 2-byte Unicode with single character based search patterns. Aside: Is the buffer interface reachable in any way from within Python ? Why isn't the interface exposed via __XXX__ methods on normal Python instances (could be implemented by returning a buffer object) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fred L. Drake, Jr." References: <37B45577.7772CAA1@lemburg.com> Message-ID: <14260.15000.398399.840716@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Aside: Is the buffer interface reachable in any way from within > Python ? Why isn't the interface exposed via __XXX__ methods > on normal Python instances (could be implemented by returning a > buffer object) ? Would it even make sense? I thought a large part of the intent was for performance, avoiding memory copies. Perhaps there should be an .__as_buffer__() which returned an object that supports the C buffer interface. I'm not sure how useful it would be; perhaps for classes that represent image data? They could return a buffer object created from a string/array/NumPy array. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik@pythonware.com Fri Aug 13 16:59:12 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 17:59:12 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies. looks like there's some confusion here over what the buffer interface is all about. time for a new GvR essay, perhaps? From Fred L. Drake, Jr." References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> Message-ID: <14260.17969.497916.382752@weyr.cnri.reston.va.us> Fredrik Lundh writes: > looks like there's some confusion here over > what the buffer interface is all about. time > for a new GvR essay, perhaps? If he'll write something about it, I'll be glad to adapt it to the extending & embedding manual. It seems important that it be included in the standard documentation since it will be important for extension writers to understand when they should implement it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik@pythonware.com Fri Aug 13 17:34:46 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 18:34:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> Message-ID: <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Guido van Rossum wrote: > > btw, how about adding support for buffer access > > to data that have strange internal formats (like cer- > > tain PIL image memories) or isn't directly accessible > > (like "virtual" and "abstract" image buffers in PIL 1.1). > > something like: > > > > int initbuffer(PyObject* obj, void** context); > > int exitbuffer(PyObject* obj, void* context); > > > > and corresponding context arguments to the > > rest of the functions... > > Can you explain this idea more? Without more understanding of PIL I > have no idea what you're talking about... in code:

    void* context;

    // this can be done at any time
    segments = pb->getsegcount(obj, NULL, context);

    if (!pb->bf_initbuffer(obj, &context))
        ... failed to initialise buffer api ...

    ... allocate segment size buffer ...
    pb->getsegcount(obj, &bytes, context);
    ... calculate total buffer size and allocate buffer ...

    for (i = offset = 0; i < segments; i++) {
        n = pb->getreadbuffer(obj, i, &p, context);
        if (n < 0)
            ... failed to fetch a given segment ...
        memcpy(buf + offset, p, n); // or write to file, or whatever
        offset = offset + n;
    }

    pb->bf_exitbuffer(obj, context);

in other words, this would give the target object a chance to keep some local context (like a temporary buffer) during a sequence of buffer operations... for PIL, this would make it possible to 1) store required metadata (size, mode, palette) along with the actual buffer contents.
2) possibly pack formats that use extra internal storage for performance reasons -- RGB pixels are stored as 32-bit integers, for example. 3) access virtual image memories (that can only be accessed via a buffer-like interface in themselves -- given an image object, you acquire an access handle, and use a getdata method to access the actual data. without initbuffer, there's no way to do two buffer accesses in parallel. without exitbuffer, there's no way to release the access handle. without the context variable, there's nowhere to keep the access handle between calls.) 4) access abstract image memories (like virtual memories, but they reside outside PIL, like on a remote server, or inside another image processing library, or on a hardware device). 5) convert to external formats on the fly: fp.write(im.buffer("JPEG")) and probably a lot more. as far as I can tell, nothing of this can be done using the current design... ... besides, what about buffers and threads? if you return a pointer from getreadbuf, wouldn't it be good to know exactly when Python doesn't need that pointer any more? explicit initbuffer/exitbuffer calls around each sequence of buffer operations would make that a lot safer... From mal@lemburg.com Fri Aug 13 20:16:44 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 21:16:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <37B46F1C.1A513F33@lemburg.com> Fred L. Drake, Jr. wrote: > > M.-A. Lemburg writes: > > Aside: Is the buffer interface reachable in any way from within > > Python ? Why isn't the interface exposed via __XXX__ methods > > on normal Python instances (could be implemented by returning a > > buffer object) ? > > Would it even make sense? I thought a large part of the intent was > for performance, avoiding memory copies.
Perhaps there should be > an .__as_buffer__() which returned an object that supports the C > buffer interface. I'm not sure how useful it would be; perhaps for > classes that represent image data? They could return a buffer object > created from a string/array/NumPy array. That's what I had in mind.

    def __getreadbuffer__(self):
        return buffer(self.data)

    def __getcharbuffer__(self):
        return buffer(self.string_data)

    def __getwritebuffer__(self):
        return buffer(self.mmaped_file)

Note that buffer() does not copy the data, it only adds a reference to the object being used. Hmm, how about adding a writeable binary object to the core ? This would be useful for the __getwritebuffer__() API because currently, I think, only mmap'ed files are useable as write buffers -- no other in-memory type. Perhaps buffer objects could be used for this purpose too, e.g. by having them allocate the needed memory chunk in case you pass None as object. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack@oratrix.nl Fri Aug 13 22:48:12 1999 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 23:48:12 +0200 Subject: [Python-Dev] Quick-and-dirty weak references Message-ID: <19990813214817.5393C1C4742@oratrix.oratrix.nl> This week again I was bitten by the fact that Python doesn't have any form of weak references, and while I was toying with some ideas I came up with the following quick-and-dirty scheme that I thought I'd bounce off this list. I might even volunteer to implement it, if people agree it is worth it:-) We add a new builtin function (or a module with that function) weak(). This returns a weak reference to the object passed as a parameter. A weak object has one method: strong(), which returns the corresponding real object or raises an exception if the object doesn't exist anymore.
For convenience we could add a method exists() that returns true if the real object still exists. Now comes the bit that I'm unsure about: to implement this I need to add a pointer to every object. This pointer is either NULL or points to the corresponding weak object (so for every object there is either no weak reference object or exactly one). But, for the price of 4 bytes extra in every object we get the nicety that there is little cpu-overhead: refcounting macros work identically to the way they do now, the only thing to take care of is that during object deallocation we have to zero the weak pointer. (actually: we could make do with a single bit in every object, with the bit meaning "this object has an associated weak object". We could then use a global dictionary indexed by object address to find the weak object) From here on life is easy: the weak object is a normal refcounted object with a pointer to the real object as its only data. weak() creates the weak object if it doesn't exist and returns the existing (and INCREFfed) weak object if it does. Strong() checks that self->object->weak == self and returns self->object (INCREFfed) if it is. This works on all platforms that I'm aware of, but it could break if there are any (Python) platforms that can have objects at VM addresses that later, when the object has been free()d, become invalid addresses. And even then a vmaddrvalid() function, only needed in the strong() method, could solve this. The weak object isn't transparent, because you have to call strong() before you can do anything with it, but this is an advantage (says he, aspiring to a career in politics or sales:-): with a transparent weak object the object could disappear at unexpected moments and with this scheme it can't, because when you have the object itself in hand you have a refcount too.
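[A present-day aside: Jack's proposed API maps almost one-to-one onto the weakref module that later landed in the core. A sketch of his weak()/strong()/exists() interface on top of it -- the method names are his, the wrapper class shown here is hypothetical:]

    import weakref

    class Weak:
        def __init__(self, obj):
            self._ref = weakref.ref(obj)   # does not raise the refcount
        def exists(self):
            return self._ref() is not None
        def strong(self):
            obj = self._ref()
            if obj is None:
                raise ReferenceError("object no longer exists")
            return obj                     # a real (strong) reference

    class Thing:
        pass

    t = Thing()
    w = Weak(t)
    assert w.strong() is t
    del t                # CPython's refcounting frees the object immediately
    print(w.exists())    # False

Note how the non-transparency Jack argues for is preserved: you must go through strong() to touch the object, and once strong() has returned, you hold a refcount and the object cannot vanish underneath you.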
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Sat Aug 14 00:15:39 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 01:15:39 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B4A71B.2073875F@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > maybe the unicode class shouldn't implement the > > buffer interface at all? sure looks like the best way > > It is needed for fp.write(unicodeobj) ... > > It is also very handy for C functions to deal with Unicode strings. Wouldn't a special C API be (even) more convenient ? > > to avoid trivial mistakes (the current behaviour of > > fp.write(unicodeobj) is even more serious than the > > marshal glitch...) > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > to the file. Are you suggesting that it will need to be done differently? > Icky. Would this also write some kind of Unicode encoding header ? [Sorry, this is my Unicode ignorance shining through... I only remember lots of talk about these things on the string-sig.] Since fp.write() uses "s#" this would use the getreadbuffer slot in 1.5.2... I think what it *should* do is use the getcharbuffer slot instead (see my other post), since dumping the raw unicode data would lose too much information. Again, such things should be handled by extra methods, e.g. fp.rawwrite(). Hmm, I guess the philosophy behind the interface is not really clear. Binary data is fetched via getreadbuffer and then interpreted as character data... I always thought that the getcharbuffer should be used for such an interpretation.
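[A present-day aside: the information-loss worry is easiest to see with explicit encodings. In today's Python, where the text/bytes split is final, the difference between dumping raw data and encoding it deliberately looks like this -- a hedged sketch, not 1.5.2 behavior:]

    import codecs

    u = "pyth\u00f6n"                 # sample Unicode text

    raw = u.encode("utf-16-le")       # the raw 2-byte values -- roughly what a
                                      # bare getreadbuffer dump would amount to
    with_bom = u.encode("utf-16")     # codec-level encoding: BOM, then the data

    print(len(raw), len(with_bom))            # 12 14
    print(with_bom.startswith(codecs.BOM))    # True -- the BOM is explicit
    print(raw.decode("utf-16-le") == u)       # True -- nothing is lost when the
                                              # caller picks the encoding both ways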
Or maybe, we should dump the getcharbuffer slot again and use the getreadbuffer information just as we would a void* pointer in C: with no explicit or implicit type information. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sat Aug 14 09:53:04 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 01:53:04 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> Message-ID: <37B52E70.2D957546@lyra.org> M.-A. Lemburg wrote: > > Greg Stein wrote: > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > maybe the unicode class shouldn't implement the > > > buffer interface at all? sure looks like the best way > > > > It is needed for fp.write(unicodeobj) ... > > > > It is also very handy for C functions to deal with Unicode strings. > > Wouldn't a special C API be (even) more convenient ? Why? Accessing the Unicode values as a series of bytes matches exactly to the semantics of the buffer interface. Why throw in Yet Another Function? Your abstract.c functions make it quite simple. > > > to avoid trivial mistakes (the current behaviour of > > > fp.write(unicodeobj) is even more serious than the > > > marshal glitch...) > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > to the file. Are you suggesting that it will need to be done differently? > > Icky. > > Would this also write some kind of Unicode encoding header ? > [Sorry, this is my Unicode ignorance shining through... I only > remember lots of talk about these things on the string-sig.] Absolutely not. Placing the Byte Order Mark (BOM) into an output stream is an application-level task. It should never be done by any subsystem. There are no other "encoding headers" that would go into the output stream.
The output would simply be UTF-16 (2-byte values in host byte order). > Since fp.write() uses "s#" this would use the getreadbuffer > slot in 1.5.2... I think what it *should* do is use the > getcharbuffer slot instead (see my other post), since dumping > the raw unicode data would loose too much information. Again, I very much disagree. To me, fp.write() is not about writing characters to a stream. I think it makes much more sense as "writing bytes to a stream" and the buffer interface fits that perfectly. There is no loss of data. You could argue that the byte order is lost, but I think that is incorrect. The application defines the semantics: the file might be defined as using host-order, or the application may be writing a BOM at the head of the file. > such things should be handled by extra methods, e.g. fp.rawwrite(). I believe this would be a needless complication of the interface. > Hmm, I guess the philosophy behind the interface is not > really clear. I didn't design or implement it initially, but (as you may have guessed) I am a proponent of its existence. > Binary data is fetched via getreadbuffer and then > interpreted as character data... I always thought that the > getcharbuffer should be used for such an interpretation. The former is bad behavior. That is why getcharbuffer was added (by me, for 1.5.2). It was a preventative measure for the introduction of Unicode strings. Using getreadbuffer for characters would break badly given a Unicode string. Therefore, "clients" that want (8-bit) characters from an object supporting the buffer interface should use getcharbuffer. The Unicode object doesn't implement it, implying that it cannot provide 8-bit characters. You can get the raw bytes thru getreadbuffer. > Or maybe, we should dump the getcharbufer slot again and > use the getreadbuffer information just as we would a > void* pointer in C: with no explicit or implicit type information. Nope. 
That path is fraught with failure :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Sat Aug 14 11:21:51 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 12:21:51 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <37B5433F.61CE6F76@lemburg.com> Jack Jansen wrote: > > This week again I was bitten by the fact that Python doesn't have any > form of weak references, and while I was toying with some ideas I came > up with the following quick-and-dirty scheme that I thought I'd bounce > off this list. I might even volunteer to implement it, if people agree > it is worth it:-) Have you checked the weak reference dictionary implementation by Dieter Maurer ? It's at: http://www.handshake.de/~dieter/weakdict.html While I like the idea of having weak references in the core, I think 4 extra bytes for *every* object is just a little too much. The flag bit idea (with the added global dictionary of weak referenced objects) looks promising though. BTW, how would this be done in JPython ? I guess it doesn't make much sense there because cycles are no problem for the Java VM GC. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sat Aug 14 13:30:45 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 14:30:45 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> Message-ID: <37B56175.23ABB350@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > > > > Greg Stein wrote: > > > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > > maybe the unicode class shouldn't implement the > > > > buffer interface at all? sure looks like the best way > > > > > > It is needed for fp.write(unicodeobj) ...
> > > > > > It is also very handy for C functions to deal with Unicode strings. > > > > Wouldn't a special C API be (even) more convenient ? > > Why? Accessing the Unicode values as a series of bytes matches exactly > to the semantics of the buffer interface. Why throw in Yet Another > Function? I meant PyUnicode_* style APIs for dealing with all the aspects of Unicode objects -- much like the PyString_* APIs available. > Your abstract.c functions make it quite simple. BTW, do we need an extra set of those with buffer index or not ? Those would really be one-liners for the sake of hiding the type slots from applications. > > > > to avoid trivial mistakes (the current behaviour of > > > > fp.write(unicodeobj) is even more serious than the > > > > marshal glitch...) > > > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > > to the file. Are you suggesting that it will need to be done differently? > > > Icky. > > > > Would this also write some kind of Unicode encoding header ? > > [Sorry, this is my Unicode ignorance shining through... I only > > remember lots of talk about these things on the string-sig.] > > Absolutely not. Placing the Byte Order Mark (BOM) into an output stream > is an application-level task. It should never by done by any subsystem. > > There are no other "encoding headers" that would go into the output > stream. The output would simply be UTF-16 (2-byte values in host byte > order). Ok. > > Since fp.write() uses "s#" this would use the getreadbuffer > > slot in 1.5.2... I think what it *should* do is use the > > getcharbuffer slot instead (see my other post), since dumping > > the raw unicode data would loose too much information. Again, > > I very much disagree. To me, fp.write() is not about writing characters > to a stream. I think it makes much more sense as "writing bytes to a > stream" and the buffer interface fits that perfectly. 
This is perfectly ok, but shouldn't the behaviour of fp.write() mimic that of previous Python versions ? How does JPython write the data ? Inlined different subject: I think the internal semantics of "s#" using the getreadbuffer slot and "t#" the getcharbuffer slot should be switched; see my other post. In previous Python versions "s#" had the semantics of string data with possibly embedded NULL bytes. Now it suddenly has the meaning of binary data and you can't simply change extensions to use the new "t#" because people are still using them with older Python versions. > There is no loss of data. You could argue that the byte order is lost, > but I think that is incorrect. The application defines the semantics: > the file might be defined as using host-order, or the application may be > writing a BOM at the head of the file. The problem here is that many applications were not written to handle these kinds of objects. Previously they could only handle strings, now they can suddenly handle any object having the buffer interface and then fail when the data gets read back in. > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > I believe this would be a needless complication of the interface. It would clarify things and make the interface 100% backward compatible again. > > Hmm, I guess the philosophy behind the interface is not > > really clear. > > I didn't design or implement it initially, but (as you may have guessed) > I am a proponent of its existence. > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > The former is bad behavior. That is why getcharbuffer was added (by me, > for 1.5.2).
Therefore, "clients" that want (8-bit) > characters from an object supporting the buffer interface should use > getcharbuffer. The Unicode object doesn't implement it, implying that it > cannot provide 8-bit characters. You can get the raw bytes thru > getreadbuffer. I agree 100%, but did you add the "t#" instead of having "s#" use the getcharbuffer interface ? E.g. my mxTextTools package uses "s#" on many APIs. Now someone could stick in a Unicode object and get pretty strange results without any notice about mxTextTools and Unicode being incompatible. You could argue that I change to "t#", but that doesn't work since many people out there still use Python versions <1.5.2 and those didn't have "t#", so mxTextTools would then fail completely for them. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sat Aug 14 12:34:17 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 04:34:17 -0700 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <37B55439.683272D2@lyra.org> M.-A. Lemburg wrote: >... > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. Sure, these could be added as necessary. For raw access to the bytes, I would refer people to the abstract buffer functions, tho. > > Your abstract.c functions make it quite simple. > > BTW, do we need an extra set of those with buffer index or not ? > Those would really be one-liners for the sake of hiding the > type slots from applications. It sounds like NumPy and PIL would need it, which makes the landscape quite a bit different from the last time we discussed this (when we didn't imagine anybody needing those). >... 
> > > Since fp.write() uses "s#" this would use the getreadbuffer > > > slot in 1.5.2... I think what it *should* do is use the > > > getcharbuffer slot instead (see my other post), since dumping > > > the raw unicode data would loose too much information. Again, > > > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? fp.write() had no semantics for writing Unicode objects since they didn't exist. Therefore, we are not breaking or changing any behavior. > Inlined different subject: > I think the internal semantics of "s#" using the getreadbuffer slot > and "t#" the getcharbuffer slot should be switched; see my other post. 1) Too late 2) The use of "t#" ("text") for the getcharbuffer slot was decided by the Benevolent Dictator. 3) see (2) > In previous Python versions "s#" had the semantics of string data > with possibly embedded NULL bytes. Now it suddenly has the meaning > of binary data and you can't simply change extensions to use the > new "t#" because people are still using them with older Python > versions. Guido and I had a pretty long discussion on what the best approach here was. I think we even pulled in Tim as a final arbiter, as I recall. I believe "s#" remained getreadbuffer simply because it *also* meant "give me the bytes of that object". If it changed to getcharbuffer, then you could see exceptions in code that didn't raise exceptions beforehand. (more below) > > There is no loss of data. You could argue that the byte order is lost, > > but I think that is incorrect. The application defines the semantics: > > the file might be defined as using host-order, or the application may be > > writing a BOM at the head of the file. 
> > The problem here is that many applications were not written > to handle these kinds of objects. Previously they could only > handle strings, now they can suddenly handle any object > having the buffer interface and then fail when the data > gets read back in. An application is a complete unit. How are you suddenly going to manifest Unicode objects within that application? The only way is if the developer goes in and changes things; let them deal with the issues and fallout of their change. The other way is external changes such as an upgrade to the interpreter or a module. Again, (IMO) if you're perturbing a system, then you are responsible for also correcting any problems you introduce. In any case, Guido's position was that things can easily switch over to the "t#" interface to prevent the class of error where you pass a Unicode string to a function that expects a standard string. > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > I believe this would be a needless complication of the interface. > > It would clarify things and make the interface 100% backward > compatible again. No. "s#" used to pull bytes from any buffer-capable object. Your suggestion for "s#" to use the getcharbuffer could introduce exceptions into currently-working code. (this was probably Guido's prime motivation for the current meaning of "t#"... I can dig up the mail thread if people need an authoritative commentary on the decision that was made) > > > Hmm, I guess the philosophy behind the interface is not > > > really clear. > > > > I didn't design or implement it initially, but (as you may have guessed) > > I am a proponent of its existence. > > > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > for 1.5.2).
It was a preventative measure for the introduction of > > Unicode strings. Using getreadbuffer for characters would break badly > > given a Unicode string. Therefore, "clients" that want (8-bit) > > characters from an object supporting the buffer interface should use > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > cannot provide 8-bit characters. You can get the raw bytes thru > > getreadbuffer. > > I agree 100%, but did you add the "t#" instead of having > "s#" use the getcharbuffer interface ? Yes. For reasons detailed above. > E.g. my mxTextTools > package uses "s#" on many APIs. Now someone could stick > in a Unicode object and get pretty strange results without > any notice about mxTextTools and Unicode being incompatible. They could also stick in an array of integers. That supports the buffer interface, meaning the "s#" in your code would extract the bytes from it. In other words, people can already stick bogus stuff into your code. This seems to be a moot argument. > You could argue that I change to "t#", but that doesn't > work since many people out there still use Python versions > <1.5.2 and those didn't have "t#", so mxTextTools would then > fail completely for them. If support for the older versions is needed, then use an #ifdef to set up the appropriate macro in some header. Use that throughout your code. In any case: yes -- I would argue that you should absolutely be using "t#". Cheers, -g -- Greg Stein, http://www.lyra.org/ From fredrik@pythonware.com Sat Aug 14 14:19:07 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 15:19:07 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <003101bee657$972d1550$f29b12c2@secret.pythonware.com> M.-A. 
Lemburg wrote: > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. it's already there, of course. see unicode.h in the unicode distribution (Mark is hopefully adding this to 1.6 in this very moment...) > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? the crucial point is how an average user expects things to work. the current design is quite asymmetric -- you can easily *write* things that implement the buffer interface to a stream, but how the heck do you get them back? (as illustrated by the marshal buglet...) From fredrik@pythonware.com Sat Aug 14 16:21:48 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 17:21:48 +0200 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Except that people may expect unicode strings to work just like any other kind of string, while arrays are surely a different thing.
I'm beginning to suspect that the current buffer design is partially broken; it tries to work around at least two problems at once: a) the current use of "string" objects for two purposes: as strings of 8-bit characters, and as buffers containing arbitrary binary data. b) performance issues when reading/writing certain kinds of data to/from streams. and fails to fully address either of them. From mal@lemburg.com Sat Aug 14 17:30:21 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:30:21 +0200 Subject: [Python-Dev] Re: buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <37B5999D.201EA88C@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > >... > > I meant PyUnicode_* style APIs for dealing with all the aspects > > of Unicode objects -- much like the PyString_* APIs available. > > Sure, these could be added as necessary. For raw access to the bytes, I > would refer people to the abstract buffer functions, tho. I guess that's up to them... PyUnicode_AS_WCHAR() could also be exposed I guess (are C's wchar strings useable as Unicode basis ?). > > > Your abstract.c functions make it quite simple. > > > > BTW, do we need an extra set of those with buffer index or not ? > > Those would really be one-liners for the sake of hiding the > > type slots from applications. > > It sounds like NumPy and PIL would need it, which makes the landscape > quite a bit different from the last time we discussed this (when we > didn't imagine anybody needing those). Ok, then I'll add them and post the new set next week. > >... > > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > > slot in 1.5.2... I think what it *should* do is use the > > > > getcharbuffer slot instead (see my other post), since dumping > > > > the raw unicode data would loose too much information. Again, > > > > > > I very much disagree. 
To me, fp.write() is not about writing characters > > > to a stream. I think it makes much more sense as "writing bytes to a > > > stream" and the buffer interface fits that perfectly. > > > > This is perfectly ok, but shouldn't the behaviour of fp.write() > > mimic that of previous Python versions ? How does JPython > > write the data ? > > fp.write() had no semantics for writing Unicode objects since they > didn't exist. Therefore, we are not breaking or changing any behavior. The problem is hidden in polymorphic functions and tools: previously they could not handle anything but strings, now they also work on arbitrary buffers without raising exceptions. That's what I'm concerned about. > > Inlined different subject: > > I think the internal semantics of "s#" using the getreadbuffer slot > > and "t#" the getcharbuffer slot should be switched; see my other post. > > 1) Too late > 2) The use of "t#" ("text") for the getcharbuffer slot was decided by > the Benevolent Dictator. > 3) see (2) 1) It's not too late: most people aren't even aware of the buffer interface (except maybe the small crowd on this list). 2) A mistake in a patchlevel release of Python can easily be undone in the next minor release. No big deal. 3) To remain compatible with 1.5.2 even in future revisions, a new explicit marker, e.g. "r#" for raw data, could be added to hold the code for getreadbuffer. "s#" and "z#" should then switch to using getcharbuffer. > > In previous Python versions "s#" had the semantics of string data > > with possibly embedded NULL bytes. Now it suddenly has the meaning > > of binary data and you can't simply change extensions to use the > > new "t#" because people are still using them with older Python > > versions. > > Guido and I had a pretty long discussion on what the best approach here > was. I think we even pulled in Tim as a final arbiter, as I recall. What was the final argument then ? (I guess the discussion was held *before* the addition of getcharbuffer, right ?)
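The dispatch being argued over can be modelled in a few lines of present-day Python. The class and function names below are hypothetical stand-ins for the C-level PyBufferProcs slots and the getargs.c logic, not real APIs; the point is only to show why "s#" accepts any buffer-capable object while "t#" rejects objects that cannot hand out 8-bit characters:

```python
# Toy model of the "s#" vs "t#" dispatch under discussion.
# getreadbuffer/getcharbuffer stand in for the C buffer slots.

class StringLike:
    # 8-bit string: raw bytes and characters coincide, so the
    # object can sensibly fill in both slots.
    def getreadbuffer(self):
        return b"hello"
    def getcharbuffer(self):
        return b"hello"

class UnicodeLike:
    # Wide characters: raw bytes exist (with a byte order!), but
    # there is no honest way to hand out 8-bit characters, so the
    # char-buffer slot is left unimplemented.
    def getreadbuffer(self):
        return "hi".encode("utf-16-le")

def parse_s_hash(obj):
    # "s#": any buffer-capable object yields its raw bytes.
    return obj.getreadbuffer()

def parse_t_hash(obj):
    # "t#": only objects that can present 8-bit text qualify.
    getchar = getattr(obj, "getcharbuffer", None)
    if getchar is None:
        raise TypeError("object cannot provide 8-bit characters")
    return getchar()

print(parse_s_hash(UnicodeLike()))  # raw bytes, byte order and all
print(parse_t_hash(StringLike()))
try:
    parse_t_hash(UnicodeLike())
except TypeError as exc:
    print("t# rejected:", exc)
```

Under this model, switching "s#" to the char slot (as proposed above) would turn the first call into an exception for every buffer-capable, non-text object, which is the compatibility break Greg is objecting to.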
> I believe "s#" remained getreadbuffer simply because it *also* meant > "give me the bytes of that object". If it changed to getcharbuffer, then > you could see exceptions in code that didn't raise exceptions > beforehand. > > (more below) "s#" historically always meant "give me char* data with length". It did not mean: "give me a pointer to the data area and its length". That interpretation is new in 1.5.2. Even integers and lists could provide buffer access with the new interpretation... (sounds evil ;-) > > > There is no loss of data. You could argue that the byte order is lost, > > > but I think that is incorrect. The application defines the semantics: > > > the file might be defined as using host-order, or the application may be > > > writing a BOM at the head of the file. > > > > The problem here is that many applications were not written > > to handle these kinds of objects. Previously they could only > > handle strings, now they can suddenly handle any object > > having the buffer interface and then fail when the data > > gets read back in. > > An application is a complete unit. How are you suddenly going to > manifest Unicode objects within that application? The only way is if the > developer goes in and changes things; let them deal with the issues and > fallout of their change. The other way is external changes such as an > upgrade to the interpreter or a module. Again, (IMO) if you're > perturbing a system, then you are responsible for also correcting any > problems you introduce. Well, ok, if you're talking about standalone apps. I was referring to applications which interact with other applications, e.g. via files or sockets. You could pass a Unicode obj to a socket and have it transfer the data to the other end without getting an exception on the sending part of the connection. The receiver would read the data as string and most probably fail.
The whole application sitting in between and dealing with the protocol and connection management wouldn't even notice that you've just tried to extend its capabilities. > In any case, Guido's position was that things can easily switch over to > the "t#" interface to prevent the class of error where you pass a > Unicode string to a function that expects a standard string. Strange, why should code that relies on 8-bit character data be changed because a new unsupported object type pops up ? Code supporting the new type will have to be rewritten anyway, but why break existing extensions in unpredictable ways ? > > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > > > I believe this would be a needless complication of the interface. > > > > It would clarify things and make the interface 100% backward > > compatible again. > > No. "s#" used to pull bytes from any buffer-capable object. Your > suggestion for "s#" to use the getcharbuffer could introduce exceptions > into currently-working code. The buffer objects were introduced in 1.5.1, AFAIR. Changing the semantics back to the original ones would only break extensions relying on the behaviour you describe -- the distribution can easily be adapted to use some other marker, such as "r#". > (this was probably Guido's prime motivation for the currently meaning of > "t#"... I can dig up the mail thread if people need an authoritative > commentary on the decision that was made) > > > > > Hmm, I guess the philosophy behind the interface is not > > > > really clear. > > > > > > I didn't design or implement it initially, but (as you may have guessed) > > > I am a proponent of its existence. > > > > > > > Binary data is fetched via getreadbuffer and then > > > > interpreted as character data... I always thought that the > > > > getcharbuffer should be used for such an interpretation. > > > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > > for 1.5.2).
It was a preventative measure for the introduction of > > > Unicode strings. Using getreadbuffer for characters would break badly > > > given a Unicode string. Therefore, "clients" that want (8-bit) > > > characters from an object supporting the buffer interface should use > > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > > cannot provide 8-bit characters. You can get the raw bytes thru > > > getreadbuffer. > > > > I agree 100%, but did you add the "t#" instead of having > > "s#" use the getcharbuffer interface ? > > Yes. For reasons detailed above. > > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Right now they can with 1.5.1 and 1.5.2, which is unfortunate. I'd rather have the parsing function raise an exception. > This seems to be a moot argument. Not really when you have to support extensions across three different patchlevels of Python. > > You could argue that I change to "t#", but that doesn't > > work since many people out there still use Python versions > > <1.5.2 and those didn't have "t#", so mxTextTools would then > > fail completely for them. > > If support for the older versions is needed, then use an #ifdef to set > up the appropriate macro in some header. Use that throughout your code. > > In any case: yes -- I would argue that you should absolutely be using > "t#". I can easily change my code, no big deal, but what about the dozens of other extensions I don't want to bother diving into ? I'd rather see an exception than complete garbage written to a file or a socket.
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sat Aug 14 17:53:45 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:53:45 +0200 Subject: [Python-Dev] buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Message-ID: <37B59F19.45C1D23B@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > > E.g. my mxTextTools > > > package uses "s#" on many APIs. Now someone could stick > > > in a Unicode object and get pretty strange results without > > > any notice about mxTextTools and Unicode being incompatible. > > > > They could also stick in an array of integers. That supports the buffer > > interface, meaning the "s#" in your code would extract the bytes from > > it. In other words, people can already stick bogus stuff into your code. > > Except that people may expect unicode strings > to work just like any other kind of string, while > arrays are surely a different thing. > > I'm beginning to suspect that the current buffer > design is partially broken; it tries to work around > at least two problems at once: > > a) the current use of "string" objects for two purposes: > as strings of 8-bit characters, and as buffers containing > arbitrary binary data. > > b) performance issues when reading/writing certain kinds > of data to/from streams. > > and fails to fully address either of them. True, a higher level interface for those two objectives would certainly address them much better than what we are trying to do at bit level. Buffers should probably only be treated as pointers to abstract memory areas and nothing more. BTW, what about my suggestion to extend buffers to also allocate memory (in case you pass None as object) ? 
Or should array be used for that purpose (its an undocumented feature of arrays) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Sun Aug 15 03:59:25 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 19:59:25 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Message-ID: <37B62D0D.6EC24240@lyra.org> Fredrik Lundh wrote: >... > besides, what about buffers and threads? if you > return a pointer from getreadbuf, wouldn't it be > good to know exactly when Python doesn't need > that pointer any more? explicit initbuffer/exitbuffer > calls around each sequence of buffer operations > would make that a lot safer... This is a pretty obvious one, I think: it lasts only as long as the object. PyString_AS_STRING is similar. Nothing new or funny here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Aug 15 04:09:19 1999 From: gstein@lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 20:09:19 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> Message-ID: <37B62F5E.30C62070@lyra.org> M.-A. Lemburg wrote: > > Fred L. Drake, Jr. wrote: > > > > M.-A. Lemburg writes: > > > Aside: Is the buffer interface reachable in any way from within > > > Python ? 
Why isn't the interface exposed via __XXX__ methods > > > on normal Python instances (could be implemented by returning a > > > buffer object) ? > > > > Would it even make sense? I though a large part of the intent was > > to for performance, avoiding memory copies. Perhaps there should be > > an .__as_buffer__() which returned an object that supports the C > > buffer interface. I'm not sure how useful it would be; perhaps for > > classes that represent image data? They could return a buffer object > > created from a string/array/NumPy array. There is no way to do this. The buffer interface only returns pointers to memory. There would be no place to return an intermediary object, nor a way to retain the reference for it. For example, your class instance quickly sets up a PyBufferObject with the relevant data and returns that. The underlying C code must now hold that reference *and* return a pointer to the calling code. Impossible. Fredrik's open/close concept for buffer accesses would make this possible, as long as clients are aware that any returned pointer is valid only until the buffer_close call. The context argument he proposes would hold the object reference. Having class instances respond to the buffer interface is interesting, but until more code attempts to *use* the interface, I'm not quite sure of the utility... >... > Hmm, how about adding a writeable binary object to the core ? > This would be useful for the __getwritebbuffer__() API because > currently, I think, only mmap'ed files are useable as write > buffers -- no other in-memory type. Perhaps buffer objects > could be used for this purpose too, e.g. by having them > allocate the needed memory chunk in case you pass None as > object. Yes, this would be very good. I would recommend that you pass an integer, however, rather than None. You need to tell it the size of the buffer to allocate. 
Since buffer(5) has no meaning at the moment, altering the semantics to include this form would not be a problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From da@ski.org Sun Aug 15 07:10:59 1999 From: da@ski.org (David Ascher) Date: Sat, 14 Aug 1999 23:10:59 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B62F5E.30C62070@lyra.org> Message-ID: On Sat, 14 Aug 1999, Greg Stein wrote: > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, here's an example from my work today. Maybe someone can suggest an alternative that I haven't seen. I'm using buffer objects to pass pointers to structs back and forth between Python and Windows (Win32's GUI scheme involves sending messages to functions with, oftentimes, addresses of structs as arguments, and expects the called function to modify the struct directly -- similarly, I must call Win32 functions w/ pointers to memory that Windows will modify, and be able to read the modified memory). With 'raw' buffer object manipulation (after exposing the PyBuffer_FromReadWriteMemory call to Python), this works fine [*]. So far, no instances. I also have a class which allows the user to describe the buffer memory layout in a natural way given the C struct, and manipulate the buffer layout w/ getattr/setattr. For example:

class Win32MenuItemStruct(AutoStruct):
    #
    # for each slot, specify type (maps to a struct.pack specifier),
    # name (for setattr/getattr behavior) and optional defaults.
    #
    table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
             (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
             (UINT, 'fType', MFT_STRING),
             (UINT, 'fState', MFS_ENABLED),
             (UINT, 'wID', None),
             (HANDLE, 'hSubMenu', 0),
             (HANDLE, 'hbmpChecked', 0),
             (HANDLE, 'hbmpUnchecked', 0),
             (DWORD, 'dwItemData', 0),
             (LPSTR, 'name', None),
             (UINT, 'cch', 0)]

AutoStruct has machinery which allows setting of buffer slices by slot name, conversion of numeric types, etc. This is working well. The only hitch is that to send the buffer to the SWIG'ed function call, I have three options, none ideal:

1) define a __str__ method which makes a string of the buffer and pass that to the function which expects an "s#" argument. This sends a copy of the data, not the address. As a result, this works well for structs which I create from scratch as long as I don't need to see any changes that Windows might have performed on the memory.

2) send the instance but make up my own 'get-the-instance-as-buffer' API -- complicates extension module code.

3) send the buffer attribute of the instance instead of the instance -- complicates Python code, and the C code isn't trivial because there is no 'buffer' typecode for PyArg_ParseTuple().

If I could define an

def __aswritebuffer__

and if there was a PyArg_ParseTuple() typecode associated with read/write buffers (I nominate 'w'!), I believe things would be simpler -- I could then send the instance, specify in the PyArg_ParseTuple that I want a pointer to memory, and I'd be golden. What did I miss? --david [*] I feel naughty modifying random bits of memory from Python, but Bill Gates made me do it! From mal@lemburg.com Sun Aug 15 09:47:00 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:47:00 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> <37B62F5E.30C62070@lyra.org> Message-ID: <37B67E84.6BBC8136@lemburg.com> Greg Stein wrote: > > [me suggesting new __XXX__ methods on Python instances to provide > the buffer slots to Python programmers] > > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, there already is lots of code supporting the interface, e.g. fp.write(), socket.write() etc. Basically all streaming interfaces I guess. So these APIs could be used to "write" the object directly into a file. > >... > > Hmm, how about adding a writeable binary object to the core ? > > This would be useful for the __getwritebuffer__() API because > > currently, I think, only mmap'ed files are useable as write > > buffers -- no other in-memory type. Perhaps buffer objects > > could be used for this purpose too, e.g. by having them > > allocate the needed memory chunk in case you pass None as > > object. > > Yes, this would be very good. I would recommend that you pass an > integer, however, rather than None. You need to tell it the size of the > buffer to allocate. Since buffer(5) has no meaning at the moment, > altering the semantics to include this form would not be a problem. I was thinking of using the existing buffer(object,offset,size) constructor... that's why I took None as object. offset would then always be 0 and size gives the size of the memory chunk to allocate. Of course, buffer(size) would look nicer, but it seems a rather peculiar interface definition to say: ok, if you pass a real Python integer, we'll take that as size. Who knows, maybe at some point in the future, you want to "write" integers via the buffer interface too... then you'd probably also want to write None... so how about a new builtin writebuffer(size) ?
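A minimal sketch of how the proposed writebuffer(size) could behave, modelled in present-day Python with a bytearray standing in for the malloc'ed chunk. The name and the semantics are the proposal's only; nothing like this shipped in 1.5.x:

```python
class writebuffer:
    """Sketch of the proposed builtin: a buffer that allocates its
    own writeable memory instead of borrowing another object's --
    i.e. the buffer(None, 0, size) idea with a nicer spelling."""

    def __init__(self, size):
        self._mem = bytearray(size)  # stands in for malloc(size)
        self.writeable = True        # unlike a buffer over a string

    def __len__(self):
        return len(self._mem)

    def __setitem__(self, index, value):
        # writes go straight into the owned memory chunk
        self._mem[index] = value

    def raw(self):
        # a string-style copy of the current contents
        return bytes(self._mem)

buf = writebuffer(4)
buf[0:2] = b"OK"
print(len(buf), buf.raw())
```

The key property being asked for is the middle one: the object owns a fixed-size, mutable region that C code could hand to the operating system to fill in.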
Also, I think it would make sense to extend buffers to have methods and attributes:

.writeable - attribute that tells whether the buffer is writeable
.chardata - true iff the getcharbuffer slot is available
.asstring() - return the buffer as Python string object

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Sun Aug 15 09:59:21 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:59:21 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B68169.73E03C84@lemburg.com> David Ascher wrote: > > On Sat, 14 Aug 1999, Greg Stein wrote: > > > Having class instances respond to the buffer interface is interesting, > > but until more code attempts to *use* the interface, I'm not quite sure > > of the utility... > > Well, here's an example from my work today. Maybe someone can suggest an > alternative that I haven't seen. > > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expects the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromReadWriteMemory call to > Python), this works fine [*]. So far, no instances. So that's why you were suggesting that struct.pack returns a buffer rather than a string ;-) Actually, I think you could use arrays to do the trick right now, because they are writeable (unlike strings). Until creating writeable buffer objects becomes possible that is...
> I also have a class which allows the user to describe the buffer memory
> layout in a natural way given the C struct, and manipulate the buffer
> layout w/ getattr/setattr. For example:
>
> class Win32MenuItemStruct(AutoStruct):
>     #
>     # for each slot, specify type (maps to a struct.pack specifier),
>     # name (for setattr/getattr behavior) and optional defaults.
>     #
>     table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
>              (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
>              (UINT, 'fType', MFT_STRING),
>              (UINT, 'fState', MFS_ENABLED),
>              (UINT, 'wID', None),
>              (HANDLE, 'hSubMenu', 0),
>              (HANDLE, 'hbmpChecked', 0),
>              (HANDLE, 'hbmpUnchecked', 0),
>              (DWORD, 'dwItemData', 0),
>              (LPSTR, 'name', None),
>              (UINT, 'cch', 0)]
>
> AutoStruct has machinery which allows setting of buffer slices by slot
> name, conversion of numeric types, etc. This is working well.
>
> The only hitch is that to send the buffer to the SWIG'ed function call, I
> have three options, none ideal:
>
> 1) define a __str__ method which makes a string of the buffer and pass
>    that to the function which expects an "s#" argument. This sends
>    a copy of the data, not the address. As a result, this works
>    well for structs which I create from scratch as long as I don't need
>    to see any changes that Windows might have performed on the memory.
>
> 2) send the instance but make up my own 'get-the-instance-as-buffer'
>    API -- complicates extension module code.
>
> 3) send the buffer attribute of the instance instead of the instance --
>    complicates Python code, and the C code isn't trivial because there
>    is no 'buffer' typecode for PyArg_ParseTuple().
>
> If I could define an
>
> def __aswritebuffer__
>
> and if there was a PyArg_ParseTuple() typecode associated with read/write
> buffers (I nominate 'w'!), I believe things would be simpler -- I could
> then send the instance, specify in the PyArg_ParseTuple that I want a
> pointer to memory, and I'd be golden.
>
> What did I miss?
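For readers without the original package, the table-driven layout David quotes can be approximated with the standard struct module. Everything below (the class, the three-field table, the plain "I"/"i" type letters in place of the Win32 type names) is a hypothetical reconstruction for illustration, not his actual AutoStruct code:

```python
import struct

class AutoStructSketch:
    # (format, name, default) triples drive both attribute access
    # and packing into one contiguous C-struct-like layout.
    # "I" plays the role of UINT; real code would map Win32 types.
    table = [("I", "cbSize", 0),
             ("I", "fMask", 0),
             ("i", "wID", -1)]

    def __init__(self, **fields):
        for fmt, name, default in self.table:
            setattr(self, name, fields.get(name, default))
        # by Win32 convention, cbSize holds the struct's own size
        self.cbSize = struct.calcsize(self.format())

    @classmethod
    def format(cls):
        return "".join(fmt for fmt, _, _ in cls.table)

    def pack(self):
        # option 1 above: hand out a *copy* of the bytes ("s#"),
        # which is exactly why changes made by Windows to the
        # original memory cannot be seen afterwards.
        return struct.pack(self.format(),
                           *(getattr(self, name)
                             for _, name, _ in self.table))

item = AutoStructSketch(wID=42)
print(struct.unpack(item.format(), item.pack()))
```

The sketch reproduces option 1's limitation on purpose: pack() returns an immutable copy, so only a writeable-buffer typecode of the kind proposed here would let the extension see in-place modifications.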
Just a naming thingie: __getwritebuffer__ et al. would map to the C interfaces more directly. The new typecode "w#" for writeable buffer style objects is a good idea (it should only work on single segment buffers). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@pythonware.com Sun Aug 15 11:32:59 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Sun, 15 Aug 1999 12:32:59 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> Message-ID: <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> > Fredrik Lundh wrote: > >... > > besides, what about buffers and threads? if you > > return a pointer from getreadbuf, wouldn't it be > > good to know exactly when Python doesn't need > > that pointer any more? explicit initbuffer/exitbuffer > > calls around each sequence of buffer operations > > would make that a lot safer... > > This is a pretty obvious one, I think: it lasts only as long as the > object. PyString_AS_STRING is similar. Nothing new or funny here. well, I think the buffer behaviour is both new and pretty funny:

from array import array

a = array("f", [0]*8192)

b = buffer(a)

for i in range(1000):
    a.append(1234)

print b

in other words, the buffer interface should be redesigned, or removed. (though I'm sure AOL would find some interesting use for this ;-) "Confusing? Yes, but this is a lot better than allowing arbitrary pointers!"
-- GvR on assignment operators, November 91 From da@ski.org Sun Aug 15 17:54:23 1999 From: da@ski.org (David Ascher) Date: Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B68169.73E03C84@lemburg.com> Message-ID: On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > Actually, I think you could use arrays to do the trick right now, > because they are writeable (unlike strings). Until creating > writeable buffer objects becomes possible that is... No, because I can't make an array around existing memory which Win32 allocates before I get to it. > Just a naming thingie: __getwritebuffer__ et al. would map to the > C interfaces more directly. Whatever. > The new typecode "w#" for writeable buffer style objects is a good idea > (it should only work on single segment buffers). Indeed. --david From gstein@lyra.org Sun Aug 15 21:27:57 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:27:57 -0700 Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: Message-ID: <37B722CD.383A2A9E@lyra.org> David Ascher wrote: > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > ... > > The new typecode "w#" for writeable buffer style objects is a good idea > > (it should only work on single segment buffers). > > Indeed. I just borrowed Guido's time machine. That typecode is already in 1.5.2. 
:-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Sun Aug 15 21:35:25 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:35:25 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B7248D.31E5D2BF@lyra.org> Fredrik Lundh wrote: >... > well, I think the buffer behaviour is both > new and pretty funny: I think the buffer interface was introduced in 1.5 (by Jack?). I added the 8-bit character buffer slot and buffer objects in 1.5.2. > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. I don't understand what you believe is weird here. Also, are you saying the buffer *interface* is weird, or the buffer *object* ? thx, -g -- Greg Stein, http://www.lyra.org/ From da@ski.org Sun Aug 15 21:49:23 1999 From: da@ski.org (David Ascher) Date: Sun, 15 Aug 1999 13:49:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) In-Reply-To: <37B722CD.383A2A9E@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. 
That typecode is already in 1.5.2. Ha. Cool. --da From gstein@lyra.org Sun Aug 15 21:53:51 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:53:51 -0700 Subject: [Python-Dev] instances as buffers References: Message-ID: <37B728DF.2CA2A20A@lyra.org> David Ascher wrote: >... > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances. How do you manage the lifetimes of the memory and objects? PyBuffer_FromReadWriteMemory() creates a buffer object that points to memory. You need to ensure that the memory exists as long as the buffer does. Would it make more sense to use PyBuffer_New(size)? Note: PyBuffer_FromMemory() (read-only) was built primarily for the case where you have static constants in an extension module (strings, code objects, etc) and want to expose them to Python without copying them into the heap. Currently, stuff like this must be copied into a dynamic string object to be exposed to Python. The PyBuffer_FromReadWriteMemory() is there for symmetry, but it can be very dangerous to use because of the lifetime problem. PyBuffer_New() allocates its own memory, so the lifetimes are managed properly. PyBuffer_From*Object maintains a reference to the target object so that the target object can be kept around at least as long as the buffer. > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. For example: This is a very cool class. 
Mark and I had discussed doing something just like this (a while back) for
some of the COM stuff. Basically, we'd want to generate these structures
from type libraries.

>...
> The only hitch is that to send the buffer to the SWIG'ed function call, I
> have three options, none ideal:
>
> 1) define a __str__ method which makes a string of the buffer and pass
>    that to the function which expects an "s#" argument. This sends
>    a copy of the data, not the address. As a result, this works
>    well for structs which I create from scratch as long as I don't need
>    to see any changes that Windows might have performed on the memory.

Note that "s#" can be used directly against the buffer object. You could
pass it directly rather than via __str__.

> 2) send the instance but make up my own 'get-the-instance-as-buffer'
>    API -- complicates extension module code.
>
> 3) send the buffer attribute of the instance instead of the instance --
>    complicates Python code, and the C code isn't trivial because there
>    is no 'buffer' typecode for PyArg_ParseTuple().
>
> If I could define an
>
>     def __aswritebuffer__
>
> and if there was a PyArg_ParseTuple() typecode associated with read/write
> buffers (I nominate 'w'!), I believe things would be simpler -- I could
> then send the instance, specify in the PyArg_ParseTuple that I want a
> pointer to memory, and I'd be golden.
>
> What did I miss?

You can do #3 today since there is a buffer typecode present ("w" or
"w#"). It will complicate Python code a bit since you need to pass the
buffer, but it is the simplest of the three options.

Allowing instances to return buffers does seem to make sense, although it
exposes a lot of underlying machinery at the Python level. It might be
nicer to find a better semantic for this than just exposing the buffer
interface slots.
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From da@ski.org Sun Aug 15 22:07:35 1999
From: da@ski.org (David Ascher)
Date: Sun, 15 Aug 1999 14:07:35 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Re: instances as buffers
In-Reply-To: <37B728DF.2CA2A20A@lyra.org>
Message-ID: 

On Sun, 15 Aug 1999, Greg Stein wrote:

> How do you manage the lifetimes of the memory and objects?
> PyBuffer_FromReadWriteMemory() creates a buffer object that points to
> memory. You need to ensure that the memory exists as long as the buffer
> does.

For those cases where I use PyBuffer_FromReadWriteMemory, I have no
control over the memory involved. Windows allocates the memory, lets me
use it for a little while, and it cleans it up whenever it feels like it.
It hasn't been a problem yet, but I agree that it's possibly a problem.
I'd call it a problem w/ the win32 API, though.

> Would it make more sense to use PyBuffer_New(size)?

Again, I can't because I am given a pointer and am expected to modify
e.g. bytes 10-12 starting from that memory location.

> This is a very cool class. Mark and I had discussed doing something just
> like this (a while back) for some of the COM stuff. Basically, we'd want
> to generate these structures from type libraries.

I know zilch about type libraries. This is for CE work, although nothing
about this class is CE-specific. Do type libraries give the same kind of
info?

> You can do #3 today since there is a buffer typecode present ("w" or
> "w#"). It will complicate Python code a bit since you need to pass the
> buffer, but it is the simplest of the three options.

Ok. Time to patch SWIG again!
--david From Vladimir.Marangozov@inrialpes.fr Mon Aug 16 00:35:10 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Mon, 16 Aug 1999 00:35:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908152335.AAA55842@pukapuka.inrialpes.fr> Tim Peters wrote: > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; if I used the debugger often enough to care , I'd think > about munging in a new opcode to make breakpoint sites explicit). > > immutability-is-made-to-be-violated-ly y'rs - tim > Could you elaborate a bit more on this? Do you mean setting breakpoints on a per opcode basis (for example by exchanging the original opcode with a new BREAKPOINT opcode in the code object) and use the lineno tab for breakpoints based on the source listing? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one@email.msn.com Mon Aug 16 03:31:16 1999 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 15 Aug 1999 22:31:16 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908152335.AAA55842@pukapuka.inrialpes.fr> Message-ID: <000101bee78f$6aa217e0$f22d2399@tim> [Vladimir Marangozov] > Could you elaborate a bit more on this? No time for this now -- sorry. > Do you mean setting breakpoints on a per opcode basis (for example > by exchanging the original opcode with a new BREAKPOINT opcode in > the code object) and use the lineno tab for breakpoints based on > the source listing? Something like that. The classic way to implement positional breakpoints is to perturb the code; the classic problem is how to get back the effect of the code that was overwritten. 
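[The classic perturb-the-code breakpoint scheme Tim alludes to -- overwrite an instruction with a BREAKPOINT opcode, then re-execute the displaced instruction after the callback -- can be sketched with a toy bytecode interpreter. Everything here (the opcode names, the (opcode, arg) code format) is invented for illustration; it is not CPython's eval loop:]

```python
# Toy VM: a code object is just a list of (opcode, arg) pairs.
def run(code, breakpoints=(), on_break=None):
    # Perturb the code: save each displaced instruction, overwrite
    # it with a BREAKPOINT opcode.
    saved = {}
    code = list(code)
    for i in breakpoints:
        saved[i] = code[i]
        code[i] = ('BREAKPOINT', None)

    acc, pc = 0, 0
    while pc < len(code):
        op, arg = code[pc]
        if op == 'BREAKPOINT':
            if on_break:
                on_break(pc)          # the debugger hook fires here
            op, arg = saved[pc]       # recover the overwritten instruction
        if op == 'LOAD':
            acc = arg
        elif op == 'ADD':
            acc += arg
        elif op == 'RETURN':
            return acc
        pc += 1

hits = []
result = run([('LOAD', 1), ('ADD', 2), ('RETURN', None)],
             breakpoints=[1], on_break=hits.append)
```

[Unpatched instructions run at full speed -- the point of avoiding a SET_LINENO-style per-line callback -- and the "how to get back the effect of the overwritten code" problem is solved here by the saved-instruction table.]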
From gstein@lyra.org Mon Aug 16 05:42:19 1999 From: gstein@lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 21:42:19 -0700 Subject: [Python-Dev] Re: why References: Message-ID: <37B796AB.34F6F93@lyra.org> David Ascher wrote: > > Why does buffer(array('c', 'test')) return a read-only buffer? Simply because the buffer() builtin always creates a read-only object, rather than selecting read/write when possible. Shouldn't be hard to alter the semantics of buffer() to do so. Maybe do this at the same time as updating it to create read/write buffers out of the blue. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Mon Aug 16 07:42:17 1999 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 16 Aug 1999 02:42:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <000b01bee7b2$7c62d780$f22d2399@tim> [Jack Jansen] > ... A long time ago, Dianne Hackborn actually implemented a scheme like this, under the name VREF (for "virtual reference", or some such). IIRC, differences from your scheme were mainly that: 1) There was an elaborate proxy mechanism to avoid having to explicitly strengthen the weak. 2) Each object contained a pointer to a linked list of associated weak refs. This predates DejaNews, so may be a pain to find. > ... > We add a new builtin function (or a module with that function) > weak(). This returns a weak reference to the object passed as a > parameter. A weak object has one method: strong(), which returns the > corresponding real object or raises an exception if the object doesn't > exist anymore. 
This interface appears nearly isomorphic to MIT Scheme's "hash" and "unhash" functions, except that their hash returns an (unbounded) int and guarantees that hash(o1) != hash(o2) for any distinct objects o1 and o2 (this is a stronger guarantee than Python's "id", which may return the same int for objects with disjoint lifetimes; the other reason object address isn't appropriate for them is that objects can be moved by garbage collection, but hash is an object invariant). Of course unhash(hash(o)) is o, unless o has been gc'ed; then unhash raises an exception. By most accounts (I haven't used it seriously myself), it's a usable interface. > ... > to implement this I need to add a pointer to every object. That's unattractive, of course. > ... > (actually: we could make do with a single bit in every object, with > the bit meaning "this object has an associated weak object". We could > then use a global dictionary indexed by object address to find the > weak object) Is a single bit actually smaller than a pointer? For example, on most machines these days #define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; is two 4-byte fields packed solid already, and structure padding prevents adding anything less than a 4-byte increment in reality. I guess on Alpha there's a 4-byte hole here, but I don't want weak pointers enough to switch machines . OTOH, sooner or later Guido is going to want a mark bit too, so the other way to view this is that 32 new flag bits are as cheap as one . There's one other thing I like about this: it can get rid of the dicey > Strong() checks that self->object->weak == self and returns > self->object (INCREFfed) if it is. check. If object has gone away, you're worried that self->object may (on some systems) point to a newly-invalid address. But worse than that, its memory may get reused, and then self->object may point into the *middle* of some other object where the bit pattern at the "weak" offset just happens to equal self. 
Let's try a sketch in pseudo-Python, where __xxx are secret functions that
do the obvious things (and glossing over thread safety since these are
presumably really implemented in C):

# invariant: __is_weak_bit_set(obj) == id2weak.has_key(id(obj))
# So "the weak bit" is simply an optimization, sparing most objects
# from a dict lookup when they die.
# The invariant is delicate in the presence of threads.

id2weak = {}

class _Weak:
    def __init__(self, obj):
        self.id = id(obj)   # obj's refcount not bumped
        __set_weak_bit(obj)
        id2weak[self.id] = self
        # note that "the system" (see below) sets self.id
        # to None if obj dies

    def strong(self):
        if self.id is None:
            raise DeadManWalkingError(self.id)
        return __id2obj(self.id)   # will bump obj's refcount

    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

def weak(obj):
    return id2weak.get(id(obj), None) or _Weak(obj)

and then whenever an object of any kind is deleted the system does:

    if __is_weak_bit_set(obj):
        objid = id(obj)
        id2weak[objid].id = None
        del id2weak[objid]

In my current over-tired state, I think that's safe (modulo threads),
portable and reasonably fast; I do think the extra bit costs 4 bytes,
though.

> ...
> The weak object isn't transparent, because you have to call strong()
> before you can do anything with it, but this is an advantage (says he,
> aspiring to a career in politics or sales:-): with a transparent weak
> object the object could disappear at unexpected moments and with this
> scheme it can't, because when you have the object itself in hand you
> have a refcount too.

Explicit is better than implicit for me.

[M.-A. Lemburg]
> Have you checked the weak reference dictionary implementation
> by Dieter Maurer ? It's at:
>
>     http://www.handshake.de/~dieter/weakdict.html

A project where I work is using it; it blows up a lot .
While some form of weak dict is what most people want in the end, I'm not sure Dieter's decision to support weak dicts with only weak values (not weak keys) is sufficient. For example, the aforementioned project wants to associate various computed long strings with certain hashable objects, and for some reason or other (ain't my project ...) these objects can't be changed. So they can't store the strings in the objects. So they'd like to map the objects to the strings via assorted dicts. But using the object as a dict key keeps it (and, via the dicts, also its associated strings) artificially alive; they really want a weakdict with weak *keys*. I'm not sure I know of a clear & fast way to implement a weakdict building only on the weak() function. Jack? Using weak objects as values (or keys) with an ordinary dict can prevent their referents from being kept artificially alive, but that doesn't get the dict itself cleaned up by magic. Perhaps "the system" should notify a weak object when its referent goes away; that would at least give the WO a chance to purge itself from structures it knows it's in ... > ... > BTW, how would this be done in JPython ? I guess it doesn't > make much sense there because cycles are no problem for the > Java VM GC. Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of "hard", "soft", "weak", and "phantom" references. See java.lang.ref for details. I stopped paying attention to Java, so it's up to you to tell us what you learn about it . 
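[For the record, an interface very close to the weak()/strong() sketch in this thread -- including the weak-*key* mapping Tim asks for -- later shipped as the standard weakref module (Python 2.1, PEP 205). A minimal demonstration, noting that the immediate collection below relies on CPython's reference counting; other implementations may delay it:]

```python
import weakref

class Target:
    """Plain class; its instances support weak references."""

target = Target()
ref = weakref.ref(target)    # roughly Jack's weak()
assert ref() is target       # roughly strong(); returns None once target dies

# The weak-keys dict: the entry does not keep the key alive.
cache = weakref.WeakKeyDictionary()
cache[target] = "computed long string"

del target                   # drop the last strong reference
collected = ref() is None    # True under CPython's refcounting
remaining = len(cache)       # the weak-key entry vanishes with the key
```

[Instead of strong() raising an exception for a dead referent, the shipped design has the ref call return None; both make the "is it still there?" check explicit, as Jack wanted.]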
From fredrik@pythonware.com Mon Aug 16 08:06:43 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 16 Aug 1999 09:06:43 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> Message-ID: <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > from array import array > > > > a = array("f", [0]*8192) > > > > b = buffer(a) > > > > for i in range(1000): > > a.append(1234) > > > > print b > > > > in other words, the buffer interface should > > be redesigned, or removed. > > I don't understand what you believe is weird here. did you run that code? it may work, it may bomb, or it may generate bogus output. all depending on your memory allocator, the phase of the moon, etc. just like back in the C/C++ days... imo, that's not good enough for a core feature. 
From gstein@lyra.org Mon Aug 16 08:15:54 1999 From: gstein@lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 00:15:54 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> Message-ID: <37B7BAAA.1E6EE4CA@lyra.org> Fredrik Lundh wrote: > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > from array import array > > > > > > a = array("f", [0]*8192) > > > > > > b = buffer(a) > > > > > > for i in range(1000): > > > a.append(1234) > > > > > > print b > > > > > > in other words, the buffer interface should > > > be redesigned, or removed. > > > > I don't understand what you believe is weird here. > > did you run that code? Yup. It printed nothing. > it may work, it may bomb, or it may generate bogus > output. all depending on your memory allocator, the > phase of the moon, etc. just like back in the C/C++ > days... It probably appeared as an empty string because the construction of the array filled it with zeroes (at least the first byte). Regardless, I'd be surprised if it crashed the interpreter. The print command is supposed to do a str() on the object, which creates a PyStringObject from the buffer contents. Shouldn't be a crash there. > imo, that's not good enough for a core feature. If it crashed, then sure. But I'd say that indicates a bug rather than a design problem. 
Do you have a stack trace from a crash? Ah. I just worked through, in my head, what is happening here. The buffer object caches the pointer returned by the array object. The append on the array does a realloc() somewhere, thereby invalidating the pointer inside the buffer object. Icky. Gotta think on this one... As an initial thought, it would seem that the buffer would have to re-query the pointer for each operation. There are performance implications there, of course, but that would certainly fix the problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack@oratrix.nl Mon Aug 16 10:42:42 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:42:42 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by David Ascher , Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) , Message-ID: <19990816094243.3CE83303120@snelboot.oratrix.nl> > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > > Actually, I think you could use arrays to do the trick right now, > > because they are writeable (unlike strings). Until creating > > writeable buffer objects becomes possible that is... > > No, because I can't make an array around existing memory which Win32 > allocates before I get to it. Would adding a buffer interface to cobject solve your problem? Cobject is described as being used for passing C objects between Python modules, but I've always thought of it as passing C objects from one C routine to another C routine through Python, which doesn't necessarily understand what the object is all about. That latter description seems to fit your bill quite nicely. 
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From jack@oratrix.nl Mon Aug 16 10:49:41 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Mon, 16 Aug 1999 11:49:41 +0200
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: Message by Greg Stein , Sun, 15 Aug 1999 13:35:25 -0700 , <37B7248D.31E5D2BF@lyra.org>
Message-ID: <19990816094941.83BE2303120@snelboot.oratrix.nl>

> >...
> > well, I think the buffer behaviour is both
> > new and pretty funny:
>
> I think the buffer interface was introduced in 1.5 (by Jack?). I added
> the 8-bit character buffer slot and buffer objects in 1.5.2.

Ah, now I understand why I didn't understand some of the previous
conversation: I had never come across the buffer *objects* (as opposed to
the buffer *interface*) until Fredrik's example.

I've just looked at it, and I'm not sure I understand the full intentions
of the buffer object. Buffer objects can either behave as the
"buffer-aspect" of the object behind them (without the rest of their
functionality) or as array objects, and if they start out life as the
first they can evolve into the second, is that right?

Is there a rationale behind this design, or is it just something that
happened?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From gstein@lyra.org Mon Aug 16 10:56:31 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 02:56:31 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990816094941.83BE2303120@snelboot.oratrix.nl>
Message-ID: <37B7E04F.3843004@lyra.org>

Jack Jansen wrote:
>...
> I've just looked at it, and I'm not sure I understand the full intentions of the
Buffer objects can either behave as the "buffer-aspect" of the
> object behind them (without the rest of their functionality) or as array
> objects, and if they start out life as the first they can evolve into the
> second, is that right?
>
> Is there a rationale behind this design, or is it just something that
> happened?

The object doesn't change. You create it as a reference to an existing
object's buffer (as exported via the buffer interface), or you create it
as a reference to some arbitrary memory.

The buffer object provides (optionally read/write) string-like behavior
to any object that supports buffer behavior. It can also be used to make
lightweight slices of another object. For example:

>>> a = "abcdefghi"
>>> b = buffer(a, 3, 3)
>>> print b
def
>>>

In the above example, there is only one copy of "def" (the portion inside
of the string object referenced by a).

The string-like behavior can be quite nice for memory-mapped files.
Andrew's mmapfile module's file objects export the buffer interface. This
means that you can open a file, wrap a buffer around it, and perform
quick and easy random-access on the thing. You could even select slices
of the file and pass them around as if they were strings, without loading
anything into the process heap.

(I want to try mmap'ing a .pyc and create code objects that have
buffer-based bytecode streams; it will be interesting to see if this
significantly reduces memory consumption (in terms of the heap size; the
mmap'd .pyc can be shared across processes)).
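[The lightweight-slice behavior Greg describes survives today in memoryview, the eventual successor to the buffer object (the buffer() builtin itself was removed in Python 3). A sketch of the same "one copy of the data" point in modern terms:]

```python
data = b"abcdefghi"
view = memoryview(data)[3:6]   # a slice of the view; no bytes are copied

# The slice still references the original object rather than owning a copy.
base_is_original = view.obj is data
```

[bytes(view) materializes a copy only on demand, which is the same trade the 1999 buffer object offered for mmap'd files: pass slices around cheaply, pay for a string only when you need one.]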
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim@digicool.com Mon Aug 16 13:30:41 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 08:30:41 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B80471.F0F467C9@digicool.com> Fredrik Lundh wrote: > > > Fredrik Lundh wrote: > > >... > > > besides, what about buffers and threads? if you > > > return a pointer from getreadbuf, wouldn't it be > > > good to know exactly when Python doesn't need > > > that pointer any more? explicit initbuffer/exitbuffer > > > calls around each sequence of buffer operations > > > would make that a lot safer... > > > > This is a pretty obvious one, I think: it lasts only as long as the > > object. PyString_AS_STRING is similar. Nothing new or funny here. > > well, I think the buffer behaviour is both > new and pretty funny: > > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. A while ago I asked for some documentation on the Buffer interface. I basically got silence. At this point, I don't have a good idea what buffers are for and I don't see alot of evidence that there *is* a design. I assume that there was a design, but I can't see it. This whole discussion makes me very queasy. I'm probably just out of it, since I don't have time to read the Python list anymore. 
Presumably the buffer interface was proposed and discussed there at some
distant point in the past.

(I can't pay as much attention to this discussion as I suspect I should,
due to time constraints and due to a basic lack of understanding of the
rationale for the buffer interface. Just now I caught a sniff of
something I find kinda repulsive. I think I hear you all talking about
beasties that hold a reference to some object's internal storage and that
have write operations so you can write directly to the object's storage,
bypassing the object interfaces. I probably just imagined it.)

Jim

--
Jim Fulton           mailto:jim@digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com   http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.

From gstein@lyra.org Mon Aug 16 13:41:23 1999
From: gstein@lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 05:41:23 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com>
Message-ID: <37B806F3.2C5EDC44@lyra.org>

Jim Fulton wrote:
>...
> A while ago I asked for some documentation on the Buffer
> interface. I basically got silence.
At this point, I I think the silence was caused by the simple fact that the documentation does not (yet) exist. That's all... nothing nefarious. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Mon Aug 16 13:05:35 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:05:35 +0200 Subject: [Python-Dev] Re: w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: <37B722CD.383A2A9E@lyra.org> Message-ID: <37B7FE8F.30C35284@lemburg.com> Greg Stein wrote: > > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. That typecode is already in 1.5.2. > > :-) Ah, cool :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Aug 16 13:29:31 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:29:31 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000b01bee7b2$7c62d780$f22d2399@tim> Message-ID: <37B8042B.21DE6053@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Have you checked the weak reference dictionary implementation > > by Dieter Maurer ? It's at: > > > > http://www.handshake.de/~dieter/weakdict.html > > A project where I work is using it; it blows up a lot . > > While some form of weak dict is what most people want in the end, I'm not > sure Dieter's decision to support weak dicts with only weak values (not weak > keys) is sufficient. For example, the aforementioned project wants to > associate various computed long strings with certain hashable objects, and > for some reason or other (ain't my project ...) these objects can't be > changed. So they can't store the strings in the objects. 
So they'd like to > map the objects to the strings via assorted dicts. But using the object as > a dict key keeps it (and, via the dicts, also its associated strings) > artificially alive; they really want a weakdict with weak *keys*. > > I'm not sure I know of a clear & fast way to implement a weakdict building > only on the weak() function. Jack? > > Using weak objects as values (or keys) with an ordinary dict can prevent > their referents from being kept artificially alive, but that doesn't get the > dict itself cleaned up by magic. Perhaps "the system" should notify a weak > object when its referent goes away; that would at least give the WO a chance > to purge itself from structures it knows it's in ... Perhaps one could fiddle something out of the Proxy objects in mxProxy (you know where...). These support a special __cleanup__ protocol that I use a lot to work around circular garbage: the __cleanup__ method of the referenced object is called prior to destroying the proxy; even if the reference count on the object has not yet gone down to 0. This makes direct circles possible without problems: the parent can reference a child through the proxy and the child can reference the parent directly. As soon as the parent is cleaned up, the reference to the proxy is deleted which then automagically makes the back reference in the child disappear, allowing the parent to be deallocated after cleanup without leaving a circular reference around. > > ... > > BTW, how would this be done in JPython ? I guess it doesn't > > make much sense there because cycles are no problem for the > > Java VM GC. > > Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of > "hard", "soft", "weak", and "phantom" references. See java.lang.ref for > details. I stopped paying attention to Java, so it's up to you to tell us > what you learn about it . Thanks for the reference... 
but I guess this will remain a weak one for some time since the latter is currently a limited resource :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Aug 16 13:41:51 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:41:51 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> <37B7BAAA.1E6EE4CA@lyra.org> Message-ID: <37B8070F.763C3FF8@lemburg.com> Greg Stein wrote: > > Fredrik Lundh wrote: > > > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > > > from array import array > > > > > > > > a = array("f", [0]*8192) > > > > > > > > b = buffer(a) > > > > > > > > for i in range(1000): > > > > a.append(1234) > > > > > > > > print b > > > > > > > > in other words, the buffer interface should > > > > be redesigned, or removed. > > > > > > I don't understand what you believe is weird here. > > > > did you run that code? > > Yup. It printed nothing. > > > it may work, it may bomb, or it may generate bogus > > output. all depending on your memory allocator, the > > phase of the moon, etc. just like back in the C/C++ > > days... 
> > It probably appeared as an empty string because the construction of the > array filled it with zeroes (at least the first byte). > > Regardless, I'd be surprised if it crashed the interpreter. The print > command is supposed to do a str() on the object, which creates a > PyStringObject from the buffer contents. Shouldn't be a crash there. > > > imo, that's not good enough for a core feature. > > If it crashed, then sure. But I'd say that indicates a bug rather than a > design problem. Do you have a stack trace from a crash? > > Ah. I just worked through, in my head, what is happening here. The > buffer object caches the pointer returned by the array object. The > append on the array does a realloc() somewhere, thereby invalidating the > pointer inside the buffer object. > > Icky. Gotta think on this one... As an initial thought, it would seem > that the buffer would have to re-query the pointer for each operation. > There are performance implications there, of course, but that would > certainly fix the problem. I guess that's the way to go. I wouldn't want to think about those details when using buffer objects and a function call is still better than a copy... it would do the init/exit wrapping implicitly: init at the time the getreadbuffer call is made and exit next time a thread switch is done - provided that the functions using the memory pointer also keep a reference to the buffer object alive (but that should be natural as this is always done when dealing with references in a safe way). 
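The realloc hazard Greg works through above is easy to demonstrate with today's Python, which grew a safeguard for exactly this case: an exported view pins the array's storage, so the resize that would have invalidated the cached pointer is refused outright. A sketch in modern Python (memoryview stands in for the 1.5.2 buffer object; this is illustration, not 1.5.2 code):

```python
from array import array

# Fredrik's example: a buffer over an array, then an append that
# reallocates the array's storage. In 1.5.2, buffer(a) cached the raw
# pointer and the append left it dangling. Modern Python instead
# refuses to resize an array while a view is exported.
a = array("f", [0.0] * 8192)
m = memoryview(a)

try:
    a.append(1234.0)
except BufferError as e:
    print("resize refused:", e)

m.release()
a.append(1234.0)  # safe again once the view is gone
print(len(a))     # 8193
```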
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim@digicool.com Mon Aug 16 14:26:40 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 09:26:40 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> <37B806F3.2C5EDC44@lyra.org> Message-ID: <37B81190.165C373E@digicool.com> Greg Stein wrote: > > Jim Fulton wrote: > >... > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > I think the silence was caused by the simple fact that the documentation > does not (yet) exist. That's all... nothing nefarious. I didn't mean to suggest anything nefarious. I do think that a change that affects something as basic as the standard object type layout and that generates this much discussion really should be documented before it becomes part of the core. I'd especially like to see some kind of document that includes information like: - A problem statement that describes the problem the change is solving, - How does the solution solve the problem, - When and how should people writing new types support the new interfaces? We're not talking about a new library module here. There's been a change to the core object interface. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jack@oratrix.nl Mon Aug 16 14:45:31 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 15:45:31 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Jim Fulton , Mon, 16 Aug 1999 08:30:41 -0400 , <37B80471.F0F467C9@digicool.com> Message-ID: <19990816134531.C30B5303120@snelboot.oratrix.nl> > A while ago I asked for some documentation on the Buffer > interface. I basically got silence. At this point, I > don't have a good idea what buffers are for and I don't see alot > of evidence that there *is* a design. I assume that there was > a design, but I can't see it. This whole discussion makes me > very queasy. Okay, as I'm apparently not the only one who is queasy, let's start from scratch. First, there is the old buffer _interface_. This is a C interface that allows extension (and builtin) modules and functions a unified way to access objects if they want to write the object to file and similar things. It is also what the PyArg_ParseTuple "s#" returns. This is, in C, the getreadbuffer/getwritebuffer interface. Second, there's the extension to the buffer interface as of 1.5.2. This is again only available in C, and it allows C programmers to get an object _as an ASCII string_. This is meant for things like regexp modules, to access any "textual" object as an ASCII string. This is the getcharbuffer interface, and bound to the "t#" specifier in PyArg_ParseTuple. Third, there is the buffer _object_, also new in 1.5.2.
This sort-of exports the functionality of the buffer interface to Python, but it does a bit more as well, because the buffer objects have a sort of copy-on-write semantics that means they may or may not be "attached" to a python object through the buffer interface. I think that the C interface and the object should be treated completely separately. I definitely want the C interface, but I personally don't use the Python buffer objects, so I don't really care all that much about those. Also, I think that the buffer objects might become easier to understand if we don't think of it as "the buffer interface exported to python", but as "Python buffer objects, that may share memory with other Python objects as an optimization". -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jim@digicool.com Mon Aug 16 17:03:54 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 12:03:54 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990816134531.C30B5303120@snelboot.oratrix.nl> Message-ID: <37B8366A.82B305C7@digicool.com> Jack Jansen wrote: > > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > don't have a good idea what buffers are for and I don't see alot > > of evidence that there *is* a design. I assume that there was > > a design, but I can't see it. This whole discussion makes me > > very queasy. > > Okay, as I'm apparently not the only one who is queasy let's start from > scratch. Yee ha! > First, there is the old buffer _interface_. This is a C interface that allows > extension (and builtin) modules and functions a unified way to access objects > if they want to write the object to file and similar things. Is this serialization? 
What does this achieve that, say, the pickling protocols don't achieve? What other problems does it solve? > It is also what > the PyArg_ParseTuple "s#" returns. This is, in C, the > getreadbuffer/getwritebuffer interface. Huh? "s#" doesn't return a string? Or are you saying that you can pass a non-string object to a C function that uses "s#" and have it bufferized and then stringized? In either case, this is not consistent with the documentation (interface) of PyArg_ParseTuple. > Second, there's the extension to the buffer interface as of 1.5.2. This is > again only available in C, and it allows C programmers to get an object _as an > ASCII string_. This is meant for things like regexp modules, to access any > "textual" object as an ASCII string. This is the getcharbuffer interface, and > bound to the "t#" specifier in PyArg_ParseTuple. Hm. So this is making a little more sense. So, there is a notion that there are "textual" objects that want to provide a method for getting their "text". How does this text differ from what you get from __str__ or __repr__? > Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports > the functionality of the buffer interface to Python, How so? Maybe I'm at sea because I still don't get what the C buffer interface is for. > but it does a bit more as > well, because the buffer objects have a sort of copy-on-write semantics that > means they may or may not be "attached" to a python object through the buffer > interface. What is this thing used for? Where does the slot in tp_as_buffer come into all of this? Why does this need to be a slot in the first place? Are these "textual" objects really common? Is the presence of this slot a flag for "textualness"? It would help a lot, at least for me, if there was a clearer description of what motivates these things. What problems are they trying to solve? Jim -- Jim Fulton mailto:jim@digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From da@ski.org Mon Aug 16 17:45:47 1999 From: da@ski.org (David Ascher) Date: Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8366A.82B305C7@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: > > Second, there's the extension the the buffer interface as of 1.5.2. This is > > again only available in C, and it allows C programmers to get an object _as an > > ASCII string_. This is meant for things like regexp modules, to access any > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > bound to the "t#" specifier in PyArg_ParseTuple. > > Hm. So this is making a little more sense. So, there is a notion that > there are "textual" objects that want to provide a method for getting > their "text". How does this text differ from what you get from __str__ > or __repr__? I'll let others give a well thought out rationale. Here are some examples of use which I think worthwile: * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile objects fit this aspect of the buffer interface allows you to do regexp searches on it w/o ever building a twelve gigabyte PyString. * Consider a non-contiguous NumPy array. If the array type supported the multi-segment buffer interface, extension module writers could manipulate the data within this array w/o having to worry about the non-contiguous nature of the data. They'd still have to worry about the multi-byte nature of the data, but it's still a win. 
In other words, I think that the buffer interface could be useful even w/ non-textual data. * If NumPy was modified to have arrays with data stored in buffer objects as opposed to the current "char *", and if PIL was modified to have images stored in buffer objects as opposed to whatever it uses, one could have arrays and images which shared data. I think all of these provide examples of motivations which are appealing to at least some Python users. I make no claim that they motivate the specific interface. In all the cases I can think of, one or both of two features are the key asset: - access to subset of huge data regions w/o creation of huge temporary variables. - sharing of data space. Yes, it's a power tool, and as a such should come with safety goggles. But then again, the same is true for ExtensionClasses =). leaving-out-the-regexp-on-NumPy-arrays-example, --david PS: I take back the implicit suggestion that buffer() return read-write buffers when possible. From jim@digicool.com Mon Aug 16 18:06:19 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:06:19 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B8450B.C5D308E4@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > > > Second, there's the extension the the buffer interface as of 1.5.2. This is > > > again only available in C, and it allows C programmers to get an object _as an > > > ASCII string_. This is meant for things like regexp modules, to access any > > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > > bound to the "t#" specifier in PyArg_ParseTuple. > > > > Hm. So this is making a little more sense. So, there is a notion that > > there are "textual" objects that want to provide a method for getting > > their "text". How does this text differ from what you get from __str__ > > or __repr__? > > I'll let others give a well thought out rationale. I eagerly await this. 
:) > Here are some examples > of use which I think worthwile: > > * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile > objects fit this aspect of the buffer interface allows you to do regexp > searches on it w/o ever building a twelve gigabyte PyString. This seems reasonable, if a bit exotic. :) > * Consider a non-contiguous NumPy array. If the array type supported the > multi-segment buffer interface, extension module writers could > manipulate the data within this array w/o having to worry about the > non-contiguous nature of the data. They'd still have to worry about > the multi-byte nature of the data, but it's still a win. In other > words, I think that the buffer interface could be useful even w/ > non-textual data. Why is this a good thing? Why should extension module writers worry about the non-contiguous nature of the data now? Does the NumPy C API somehow expose this now? Will multi-segment buffers make it go away somehow? > * If NumPy was modified to have arrays with data stored in buffer objects > as opposed to the current "char *", and if PIL was modified to have > images stored in buffer objects as opposed to whatever it uses, one > could have arrays and images which shared data. Uh, and this would be a good thing? Maybe PIL should just be modified to use NumPy arrays. > I think all of these provide examples of motivations which are appealing > to at least some Python users. Perhaps, although Guido knows how they'd find out about them. ;) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.
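David's twelve-gigabyte example is precisely what the mmap module that later joined the standard library enables: a mapped file exposes the buffer interface, so the regexp engine scans the mapped pages directly instead of a giant in-memory string. A sketch in modern Python (a small file stands in for the twelve gigabytes):

```python
import mmap
import os
import re
import tempfile

# Create a sample file; the point is that re searches the mapped
# pages directly -- no file-sized PyString is ever built.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 100_000 + b"NEEDLE-42\n")

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        match = re.search(rb"NEEDLE-(\d+)", m)
        print(match.group(1))  # b'42'

os.remove(path)
```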
From da@ski.org Mon Aug 16 18:18:46 1999 From: da@ski.org (David Ascher) Date: Mon, 16 Aug 1999 10:18:46 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8450B.C5D308E4@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: >> [regexps on gigabyte files] > > This seems reasonable, if a bit exotic. :) In the bioinformatics world, I think it's everyday stuff. > Why is this a good thing? Why should extension module writes worry > abot the non-contiguous nature of the data now? Does the NumPy C API > somehow expose this now? Will multi-segment buffers make it go away > somehow? A NumPy extension module writer needs to create and modify NumPy arrays. These arrays may be non-contiguous (if e.g. they are the result of slicing). The NumPy C API exposes the non-contiguous nature, but it's hard enough to deal with it that I suspect most extension writers require contiguous arrays, which means unnecessary copies. Multi-segment buffers won't make the API go away necessarily (backwards compatibility and all that), but it could make it unnecessary for many extension writers. > > * If NumPy was modified to have arrays with data stored in buffer objects > > as opposed to the current "char *", and if PIL was modified to have > > images stored in buffer objects as opposed to whatever it uses, one > > could have arrays and images which shared data. > > Uh, and this would be a good thing? Maybe PIL should just be modified > to use NumPy arrays. Why? PIL was designed for image processing, and made design decisions appropriate to that domain. NumPy was designed for multidimensional numeric array processing, and made design decisions appropriate to that domain. The intersection of interests exists (e.g. in the medical imaging world), and I know people who spend a lot of their CPU time moving data between images and arrays with "stupid" tostring/fromstring operations. 
Given the size of the images, it's a prodigious waste of time, and kills the use of Python in many a project. > Perhaps, although Guido knows how they'd find out about them. ;) Uh? These issues have been discussed in the NumPy/PIL world for a while, with no solution in sight. Recently, I and others saw mentions of buffers in the source, and they seemed like a reasonable approach, which could be done w/o a rewrite of either PIL or NumPy. Don't get me wrong -- I'm all for better documentation of the buffer stuff, design guidelines, warnings and protocols. I stated as much on June 15: http://www.python.org/pipermail/python-dev/1999-June/000338.html --david From jim@digicool.com Mon Aug 16 18:38:22 1999 From: jim@digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:38:22 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B84C8E.46885C8E@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > >> [regexps on gigabyte files] > > > > This seems reasonable, if a bit exotic. :) > > In the bioinformatics world, I think it's everyday stuff. Right, in some (exotic ;) domains it's not exotic at all. > > Why is this a good thing? Why should extension module writes worry > > abot the non-contiguous nature of the data now? Does the NumPy C API > > somehow expose this now? Will multi-segment buffers make it go away > > somehow? > > A NumPy extension module writer needs to create and modify NumPy arrays. > These arrays may be non-contiguous (if e.g. they are the result of > slicing). The NumPy C API exposes the non-contiguous nature, but it's > hard enough to deal with it that I suspect most extension writers require > contiguous arrays, which means unnecessary copies. Hm. This sounds like an API problem to me. > Multi-segment buffers won't make the API go away necessarily (backwards > compatibility and all that), but it could make it unnecessary for many > extension writers. 
Multi-segment buffers don't make the multi-segmented nature of the memory go away. Do they really simplify the API that much? They seem to strip away an awful lot of information hiding. > > > * If NumPy was modified to have arrays with data stored in buffer objects > > > as opposed to the current "char *", and if PIL was modified to have > > > images stored in buffer objects as opposed to whatever it uses, one > > > could have arrays and images which shared data. > > > > Uh, and this would be a good thing? Maybe PIL should just be modified > > to use NumPy arrays. > > Why? PIL was designed for image processing, and made design decisions > appropriate to that domain. NumPy was designed for multidimensional > numeric array processing, and made design decisions appropriate to that > domain. The intersection of interests exists (e.g. in the medical imaging > world), and I know people who spend a lot of their CPU time moving data > between images and arrays with "stupid" tostring/fromstring operations. > Given the size of the images, it's a prodigious waste of time, and kills > the use of Python in many a project. It seems to me that NumPy is sufficiently broad to encompass image processing. My main concern is having two systems rely on some low-level "shared memory" mechanism to achieve efficient communication. > > Perhaps, although Guido knows how they'd find out about them. ;) > > Uh? These issues have been discussed in the NumPy/PIL world for a while, > with no solution in sight. Recently, I and others saw mentions of buffers > in the source, and they seemed like a reasonable approach, which could be > done w/o a rewrite of either PIL or NumPy. My point was that people would be lucky to find out about buffers or about how to use them as things stand. > Don't get me wrong -- I'm all for better documentation of the buffer > stuff, design guidelines, warnings and protocols.
I stated as much on > June 15: > > http://www.python.org/pipermail/python-dev/1999-June/000338.html Yes, that was quite a jihad you launched. ;) Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From da@ski.org Mon Aug 16 19:25:54 1999 From: da@ski.org (David Ascher) Date: Mon, 16 Aug 1999 11:25:54 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B84C8E.46885C8E@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: [ Aside: > It seems to me that NumPy is sufficiently broad enogh to encompass > image processing. Well, I'll just say that you could have been right, but w/ the current NumPy, I don't blame F/ for having developed his own data structures. NumPy is messy, and some of its design decisions are wrong for image things (memory handling, casting rules, etc.). It's all water under the bridge at this point. ] Back to the main topic: You say: > [Multi-segment buffers] seem to strip away an awful lot of information > hiding. My impression of the buffer notion was that it is intended to *provide* information hiding, by giving a simple API to byte arrays which could be stored in various ways. I do agree that whether those bytes should be shared or not is a decision which should be weighted carefully. > My main concern is having two systems rely on some low-level "shared > memory" mechanism to achiev effiecient communication. I don't particularly care about the specific buffer interface (the low-level nature of which is what I think you object to). 
I do care about having a well-defined mechanism for sharing memory between objects, and I think there is value in defining such an interface generically. Maybe the notion of segmented arrays of bytes is too low-level, and instead we should think of the data spaces as segmented arrays of chunks, where a chunk can be one or more bytes? Or do you object to any 'generic' interface? Just for fun, here's the list of things which either currently do or have been talked about possibly in the future supporting some sort of buffer interface, and my guesses as to chunk size, segmented status and writeability): - strings (1 byte, single-segment, r/o) - unicode strings (2 bytes, single-segment, r/o) - struct.pack() things (1 byte, single-segment,r/o) - arrays (1-4? bytes, single-segment, r/w) - NumPy arrays (1-8 bytes, multi-segment, r/w) - PIL images (1-? bytes, multi-segment, r/w) - CObjects (1-byte, single-segment, r/?) - mmapfiles (1-byte, multi-segment?, r/w) - non-python-owned memory (1-byte, single-segment, r/w) --david From jack@oratrix.nl Mon Aug 16 20:36:40 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 21:36:40 +0200 Subject: [Python-Dev] Buffer interface and multiple threads Message-ID: <19990816193645.9E5B5CF320@oratrix.oratrix.nl> Hmm, something that just struck me: the buffer _interface_ (i.e. the C routines, not the buffer object stuff) is potentially thread-unsafe. In the "old world", where "s#" only worked on string objects, you could be sure that the C pointer returned remained valid as long as you had a reference to the python string object in hand, as strings are immutable. In the "new world", where "s#" also works on, say, array objects, this doesn't hold anymore. So, potentially, while one thread is in a write() system call writing the contents of the array to a file another thread could come in and change the data. Is this a problem? 
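Jack's worry really splits into two hazards: the pointer moving (a realloc under your feet) and the bytes changing in place. Modern Python's buffer protocol pins the storage against the first, but, as this sketch shows, in-place mutation is still visible through a live view, which is exactly the window a concurrent writer would see (modern Python for illustration; the 1.5.2 C interface had no pinning at all):

```python
from array import array

a = array("b", [1, 2, 3, 4])
m = memoryview(a)   # pins the storage: a cannot be resized now

# The *location* is safe (no realloc while the view is exported),
# but the *contents* are not: an in-place store shows through the
# view, so a thread blocked in write() could still see data change.
a[0] = 99
print(bytes(m))     # b'c\x02\x03\x04'  (0x63 == 99)
```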
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal@lemburg.com Mon Aug 16 21:22:12 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 22:22:12 +0200 Subject: [Python-Dev] New htmlentitydefs.py file Message-ID: <37B872F4.1C3F5D39@lemburg.com> This is a multi-part message in MIME format. --------------3B4AC9E96FE0666068F893B2 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Attached you find a new HTML entity definitions file taken and parsed from: http://www.w3.org/TR/1998/REC-html40-19980424/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLspecial.ent The latter two contain Unicode charcodes which obviously cannot (yet) be mapped to Unicode strings... perhaps Fredrik wants to include a spiced up version in with his Unicode type. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------3B4AC9E96FE0666068F893B2 Content-Type: text/plain; charset=us-ascii; name="htmlentitydefs.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="htmlentitydefs.py" """ Entity definitions for HTML4.0. 
Taken and parsed from:

http://www.w3.org/TR/1998/REC-html40/HTMLlat1.ent
http://www.w3.org/TR/1998/REC-html40/HTMLsymbol.ent
http://www.w3.org/TR/1998/REC-html40/HTMLspecial.ent

"""

entitydefs = {
    'AElig': chr(198),    # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1
    'Aacute': chr(193),   # latin capital letter A with acute, U+00C1 ISOlat1
    'Acirc': chr(194),    # latin capital letter A with circumflex, U+00C2 ISOlat1
    'Agrave': chr(192),   # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1
    'Alpha': '&#913;',    # greek capital letter alpha, U+0391
    'Aring': chr(197),    # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1
    'Atilde': chr(195),   # latin capital letter A with tilde, U+00C3 ISOlat1
    'Auml': chr(196),     # latin capital letter A with diaeresis, U+00C4 ISOlat1
    'Beta': '&#914;',     # greek capital letter beta, U+0392
    'Ccedil': chr(199),   # latin capital letter C with cedilla, U+00C7 ISOlat1
    'Chi': '&#935;',      # greek capital letter chi, U+03A7
    'Dagger': '&#8225;',  # double dagger, U+2021 ISOpub
    'Delta': '&#916;',    # greek capital letter delta, U+0394 ISOgrk3
    'ETH': chr(208),      # latin capital letter ETH, U+00D0 ISOlat1
    'Eacute': chr(201),   # latin capital letter E with acute, U+00C9 ISOlat1
    'Ecirc': chr(202),    # latin capital letter E with circumflex, U+00CA ISOlat1
    'Egrave': chr(200),   # latin capital letter E with grave, U+00C8 ISOlat1
    'Epsilon': '&#917;',  # greek capital letter epsilon, U+0395
    'Eta': '&#919;',      # greek capital letter eta, U+0397
    'Euml': chr(203),     # latin capital letter E with diaeresis, U+00CB ISOlat1
    'Gamma': '&#915;',    # greek capital letter gamma, U+0393 ISOgrk3
    'Iacute': chr(205),   # latin capital letter I with acute, U+00CD ISOlat1
    'Icirc': chr(206),    # latin capital letter I with circumflex, U+00CE ISOlat1
    'Igrave': chr(204),   # latin capital letter I with grave, U+00CC ISOlat1
    'Iota': '&#921;',     # greek capital letter iota, U+0399
    'Iuml': chr(207),     # latin capital letter I with diaeresis, U+00CF ISOlat1
    'Kappa': '&#922;',    # greek capital letter kappa, U+039A
    'Lambda': '&#923;',   # greek capital letter lambda, U+039B ISOgrk3
    'Mu': '&#924;',       # greek capital letter mu, U+039C
    'Ntilde': chr(209),   # latin capital letter N with tilde, U+00D1 ISOlat1
    'Nu': '&#925;',       # greek capital letter nu, U+039D
    'Oacute': chr(211),   # latin capital letter O with acute, U+00D3 ISOlat1
    'Ocirc': chr(212),    # latin capital letter O with circumflex, U+00D4 ISOlat1
    'Ograve': chr(210),   # latin capital letter O with grave, U+00D2 ISOlat1
    'Omega': '&#937;',    # greek capital letter omega, U+03A9 ISOgrk3
    'Omicron': '&#927;',  # greek capital letter omicron, U+039F
    'Oslash': chr(216),   # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1
    'Otilde': chr(213),   # latin capital letter O with tilde, U+00D5 ISOlat1
    'Ouml': chr(214),     # latin capital letter O with diaeresis, U+00D6 ISOlat1
    'Phi': '&#934;',      # greek capital letter phi, U+03A6 ISOgrk3
    'Pi': '&#928;',       # greek capital letter pi, U+03A0 ISOgrk3
    'Prime': '&#8243;',   # double prime = seconds = inches, U+2033 ISOtech
    'Psi': '&#936;',      # greek capital letter psi, U+03A8 ISOgrk3
    'Rho': '&#929;',      # greek capital letter rho, U+03A1
    'Sigma': '&#931;',    # greek capital letter sigma, U+03A3 ISOgrk3
    'THORN': chr(222),    # latin capital letter THORN, U+00DE ISOlat1
    'Tau': '&#932;',      # greek capital letter tau, U+03A4
    'Theta': '&#920;',    # greek capital letter theta, U+0398 ISOgrk3
    'Uacute': chr(218),   # latin capital letter U with acute, U+00DA ISOlat1
    'Ucirc': chr(219),    # latin capital letter U with circumflex, U+00DB ISOlat1
    'Ugrave': chr(217),   # latin capital letter U with grave, U+00D9 ISOlat1
    'Upsilon': '&#933;',  # greek capital letter upsilon, U+03A5 ISOgrk3
    'Uuml': chr(220),     # latin capital letter U with diaeresis, U+00DC ISOlat1
    'Xi': '&#926;',       # greek capital letter xi, U+039E ISOgrk3
    'Yacute': chr(221),   # latin capital letter Y with acute, U+00DD ISOlat1
    'Zeta': '&#918;',     # greek capital letter zeta, U+0396
    'aacute': chr(225),   # latin small letter a with acute, U+00E1 ISOlat1
    'acirc': chr(226),    # latin small letter a with circumflex,
U+00E2 ISOlat1 'acute': chr(180), # acute accent = spacing acute, U+00B4 ISOdia 'aelig': chr(230), # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 'agrave': chr(224), # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 'alefsym': 'ℵ', # alef symbol = first transfinite cardinal, U+2135 NEW 'alpha': 'α', # greek small letter alpha, U+03B1 ISOgrk3 'and': '∧', # logical and = wedge, U+2227 ISOtech 'ang': '∠', # angle, U+2220 ISOamso 'aring': chr(229), # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 'asymp': '≈', # almost equal to = asymptotic to, U+2248 ISOamsr 'atilde': chr(227), # latin small letter a with tilde, U+00E3 ISOlat1 'auml': chr(228), # latin small letter a with diaeresis, U+00E4 ISOlat1 'bdquo': '„', # double low-9 quotation mark, U+201E NEW 'beta': 'β', # greek small letter beta, U+03B2 ISOgrk3 'brvbar': chr(166), # broken bar = broken vertical bar, U+00A6 ISOnum 'bull': '•', # bullet = black small circle, U+2022 ISOpub 'cap': '∩', # intersection = cap, U+2229 ISOtech 'ccedil': chr(231), # latin small letter c with cedilla, U+00E7 ISOlat1 'cedil': chr(184), # cedilla = spacing cedilla, U+00B8 ISOdia 'cent': chr(162), # cent sign, U+00A2 ISOnum 'chi': 'χ', # greek small letter chi, U+03C7 ISOgrk3 'clubs': '♣', # black club suit = shamrock, U+2663 ISOpub 'cong': '≅', # approximately equal to, U+2245 ISOtech 'copy': chr(169), # copyright sign, U+00A9 ISOnum 'crarr': '↵', # downwards arrow with corner leftwards = carriage return, U+21B5 NEW 'cup': '∪', # union = cup, U+222A ISOtech 'curren': chr(164), # currency sign, U+00A4 ISOnum 'dArr': '⇓', # downwards double arrow, U+21D3 ISOamsa 'dagger': '†', # dagger, U+2020 ISOpub 'darr': '↓', # downwards arrow, U+2193 ISOnum 'deg': chr(176), # degree sign, U+00B0 ISOnum 'delta': 'δ', # greek small letter delta, U+03B4 ISOgrk3 'diams': '♦', # black diamond suit, U+2666 ISOpub 'divide': chr(247), # division sign, U+00F7 ISOnum 'eacute': 
chr(233), # latin small letter e with acute, U+00E9 ISOlat1 'ecirc': chr(234), # latin small letter e with circumflex, U+00EA ISOlat1 'egrave': chr(232), # latin small letter e with grave, U+00E8 ISOlat1 'empty': '∅', # empty set = null set = diameter, U+2205 ISOamso 'emsp': ' ', # em space, U+2003 ISOpub 'ensp': ' ', # en space, U+2002 ISOpub 'epsilon': 'ε', # greek small letter epsilon, U+03B5 ISOgrk3 'equiv': '≡', # identical to, U+2261 ISOtech 'eta': 'η', # greek small letter eta, U+03B7 ISOgrk3 'eth': chr(240), # latin small letter eth, U+00F0 ISOlat1 'euml': chr(235), # latin small letter e with diaeresis, U+00EB ISOlat1 'exist': '∃', # there exists, U+2203 ISOtech 'fnof': 'ƒ', # latin small f with hook = function = florin, U+0192 ISOtech 'forall': '∀', # for all, U+2200 ISOtech 'frac12': chr(189), # vulgar fraction one half = fraction one half, U+00BD ISOnum 'frac14': chr(188), # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum 'frac34': chr(190), # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum 'frasl': '⁄', # fraction slash, U+2044 NEW 'gamma': 'γ', # greek small letter gamma, U+03B3 ISOgrk3 'ge': '≥', # greater-than or equal to, U+2265 ISOtech 'hArr': '⇔', # left right double arrow, U+21D4 ISOamsa 'harr': '↔', # left right arrow, U+2194 ISOamsa 'hearts': '♥', # black heart suit = valentine, U+2665 ISOpub 'hellip': '…', # horizontal ellipsis = three dot leader, U+2026 ISOpub 'iacute': chr(237), # latin small letter i with acute, U+00ED ISOlat1 'icirc': chr(238), # latin small letter i with circumflex, U+00EE ISOlat1 'iexcl': chr(161), # inverted exclamation mark, U+00A1 ISOnum 'igrave': chr(236), # latin small letter i with grave, U+00EC ISOlat1 'image': 'ℑ', # blackletter capital I = imaginary part, U+2111 ISOamso 'infin': '∞', # infinity, U+221E ISOtech 'int': '∫', # integral, U+222B ISOtech 'iota': 'ι', # greek small letter iota, U+03B9 ISOgrk3 'iquest': chr(191), # inverted question mark = turned question mark, 
U+00BF ISOnum 'isin': '∈', # element of, U+2208 ISOtech 'iuml': chr(239), # latin small letter i with diaeresis, U+00EF ISOlat1 'kappa': 'κ', # greek small letter kappa, U+03BA ISOgrk3 'lArr': '⇐', # leftwards double arrow, U+21D0 ISOtech 'lambda': 'λ', # greek small letter lambda, U+03BB ISOgrk3 'lang': '〈', # left-pointing angle bracket = bra, U+2329 ISOtech 'laquo': chr(171), # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum 'larr': '←', # leftwards arrow, U+2190 ISOnum 'lceil': '⌈', # left ceiling = apl upstile, U+2308 ISOamsc 'ldquo': '“', # left double quotation mark, U+201C ISOnum 'le': '≤', # less-than or equal to, U+2264 ISOtech 'lfloor': '⌊', # left floor = apl downstile, U+230A ISOamsc 'lowast': '∗', # asterisk operator, U+2217 ISOtech 'loz': '◊', # lozenge, U+25CA ISOpub 'lrm': '‎', # left-to-right mark, U+200E NEW RFC 2070 'lsaquo': '‹', # single left-pointing angle quotation mark, U+2039 ISO proposed 'lsquo': '‘', # left single quotation mark, U+2018 ISOnum 'macr': chr(175), # macron = spacing macron = overline = APL overbar, U+00AF ISOdia 'mdash': '—', # em dash, U+2014 ISOpub 'micro': chr(181), # micro sign, U+00B5 ISOnum 'middot': chr(183), # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum 'minus': '−', # minus sign, U+2212 ISOtech 'mu': 'μ', # greek small letter mu, U+03BC ISOgrk3 'nabla': '∇', # nabla = backward difference, U+2207 ISOtech 'nbsp': chr(160), # no-break space = non-breaking space, U+00A0 ISOnum 'ndash': '–', # en dash, U+2013 ISOpub 'ne': '≠', # not equal to, U+2260 ISOtech 'ni': '∋', # contains as member, U+220B ISOtech 'not': chr(172), # not sign, U+00AC ISOnum 'notin': '∉', # not an element of, U+2209 ISOtech 'nsub': '⊄', # not a subset of, U+2284 ISOamsn 'ntilde': chr(241), # latin small letter n with tilde, U+00F1 ISOlat1 'nu': 'ν', # greek small letter nu, U+03BD ISOgrk3 'oacute': chr(243), # latin small letter o with acute, U+00F3 ISOlat1 'ocirc': chr(244), # latin small letter 
o with circumflex, U+00F4 ISOlat1 'ograve': chr(242), # latin small letter o with grave, U+00F2 ISOlat1 'oline': '‾', # overline = spacing overscore, U+203E NEW 'omega': 'ω', # greek small letter omega, U+03C9 ISOgrk3 'omicron': 'ο', # greek small letter omicron, U+03BF NEW 'oplus': '⊕', # circled plus = direct sum, U+2295 ISOamsb 'or': '∨', # logical or = vee, U+2228 ISOtech 'ordf': chr(170), # feminine ordinal indicator, U+00AA ISOnum 'ordm': chr(186), # masculine ordinal indicator, U+00BA ISOnum 'oslash': chr(248), # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1 'otilde': chr(245), # latin small letter o with tilde, U+00F5 ISOlat1 'otimes': '⊗', # circled times = vector product, U+2297 ISOamsb 'ouml': chr(246), # latin small letter o with diaeresis, U+00F6 ISOlat1 'para': chr(182), # pilcrow sign = paragraph sign, U+00B6 ISOnum 'part': '∂', # partial differential, U+2202 ISOtech 'permil': '‰', # per mille sign, U+2030 ISOtech 'perp': '⊥', # up tack = orthogonal to = perpendicular, U+22A5 ISOtech 'phi': 'φ', # greek small letter phi, U+03C6 ISOgrk3 'pi': 'π', # greek small letter pi, U+03C0 ISOgrk3 'piv': 'ϖ', # greek pi symbol, U+03D6 ISOgrk3 'plusmn': chr(177), # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum 'pound': chr(163), # pound sign, U+00A3 ISOnum 'prime': '′', # prime = minutes = feet, U+2032 ISOtech 'prod': '∏', # n-ary product = product sign, U+220F ISOamsb 'prop': '∝', # proportional to, U+221D ISOtech 'psi': 'ψ', # greek small letter psi, U+03C8 ISOgrk3 'rArr': '⇒', # rightwards double arrow, U+21D2 ISOtech 'radic': '√', # square root = radical sign, U+221A ISOtech 'rang': '〉', # right-pointing angle bracket = ket, U+232A ISOtech 'raquo': chr(187), # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum 'rarr': '→', # rightwards arrow, U+2192 ISOnum 'rceil': '⌉', # right ceiling, U+2309 ISOamsc 'rdquo': '”', # right double quotation mark, U+201D ISOnum 'real': 'ℜ', # blackletter 
capital R = real part symbol, U+211C ISOamso 'reg': chr(174), # registered sign = registered trade mark sign, U+00AE ISOnum 'rfloor': '⌋', # right floor, U+230B ISOamsc 'rho': 'ρ', # greek small letter rho, U+03C1 ISOgrk3 'rlm': '‏', # right-to-left mark, U+200F NEW RFC 2070 'rsaquo': '›', # single right-pointing angle quotation mark, U+203A ISO proposed 'rsquo': '’', # right single quotation mark, U+2019 ISOnum 'sbquo': '‚', # single low-9 quotation mark, U+201A NEW 'sdot': '⋅', # dot operator, U+22C5 ISOamsb 'sect': chr(167), # section sign, U+00A7 ISOnum 'shy': chr(173), # soft hyphen = discretionary hyphen, U+00AD ISOnum 'sigma': 'σ', # greek small letter sigma, U+03C3 ISOgrk3 'sigmaf': 'ς', # greek small letter final sigma, U+03C2 ISOgrk3 'sim': '∼', # tilde operator = varies with = similar to, U+223C ISOtech 'spades': '♠', # black spade suit, U+2660 ISOpub 'sub': '⊂', # subset of, U+2282 ISOtech 'sube': '⊆', # subset of or equal to, U+2286 ISOtech 'sum': '∑', # n-ary sumation, U+2211 ISOamsb 'sup': '⊃', # superset of, U+2283 ISOtech 'sup1': chr(185), # superscript one = superscript digit one, U+00B9 ISOnum 'sup2': chr(178), # superscript two = superscript digit two = squared, U+00B2 ISOnum 'sup3': chr(179), # superscript three = superscript digit three = cubed, U+00B3 ISOnum 'supe': '⊇', # superset of or equal to, U+2287 ISOtech 'szlig': chr(223), # latin small letter sharp s = ess-zed, U+00DF ISOlat1 'tau': 'τ', # greek small letter tau, U+03C4 ISOgrk3 'there4': '∴', # therefore, U+2234 ISOtech 'theta': 'θ', # greek small letter theta, U+03B8 ISOgrk3 'thetasym': 'ϑ', # greek small letter theta symbol, U+03D1 NEW 'thinsp': ' ', # thin space, U+2009 ISOpub 'thorn': chr(254), # latin small letter thorn with, U+00FE ISOlat1 'times': chr(215), # multiplication sign, U+00D7 ISOnum 'trade': '™', # trade mark sign, U+2122 ISOnum 'uArr': '⇑', # upwards double arrow, U+21D1 ISOamsa 'uacute': chr(250), # latin small letter u with acute, U+00FA ISOlat1 'uarr': '↑', # 
upwards arrow, U+2191 ISOnum 'ucirc': chr(251), # latin small letter u with circumflex, U+00FB ISOlat1 'ugrave': chr(249), # latin small letter u with grave, U+00F9 ISOlat1 'uml': chr(168), # diaeresis = spacing diaeresis, U+00A8 ISOdia 'upsih': 'ϒ', # greek upsilon with hook symbol, U+03D2 NEW 'upsilon': 'υ', # greek small letter upsilon, U+03C5 ISOgrk3 'uuml': chr(252), # latin small letter u with diaeresis, U+00FC ISOlat1 'weierp': '℘', # script capital P = power set = Weierstrass p, U+2118 ISOamso 'xi': 'ξ', # greek small letter xi, U+03BE ISOgrk3 'yacute': chr(253), # latin small letter y with acute, U+00FD ISOlat1 'yen': chr(165), # yen sign = yuan sign, U+00A5 ISOnum 'yuml': chr(255), # latin small letter y with diaeresis, U+00FF ISOlat1 'zeta': 'ζ', # greek small letter zeta, U+03B6 ISOgrk3 'zwj': '‍', # zero width joiner, U+200D NEW RFC 2070 'zwnj': '‌', # zero width non-joiner, U+200C NEW RFC 2070 } --------------3B4AC9E96FE0666068F893B2-- From tim_one@email.msn.com Tue Aug 17 08:30:17 1999 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 17 Aug 1999 03:30:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37B8042B.21DE6053@lemburg.com> Message-ID: <000001bee882$5b7d8da0$112d2399@tim> [about weakdicts and the possibility of building them on weak references; the obvious way doesn't clean up the dict itself by magic; maybe a weak object should be notified when its referent goes away ] [M.-A. Lemburg] > Perhaps one could fiddle something out of the Proxy objects > in mxProxy (you know where...). These support a special __cleanup__ > protocol that I use a lot to work around circular garbage: > the __cleanup__ method of the referenced object is called prior > to destroying the proxy; even if the reference count on the > object has not yet gone down to 0. > > This makes direct circles possible without problems: the parent > can reference a child through the proxy and the child can reference the > parent directly. 
What you just wrote is:

    parent --> proxy --> child -->+
      ^                           v
      +<--------------------------+

Looks like a plain old cycle to me!

> As soon as the parent is cleaned up, the reference to
> the proxy is deleted which then automagically makes the
> back reference in the child disappear, allowing the parent
> to be deallocated after cleanup without leaving a circular
> reference around.

M-A, this is making less sense by the paragraph : skipping the middle, this says "as soon as the parent is cleaned up ... allowing the parent to be deallocated after cleanup". If we presume that the parent gets cleaned up explicitly (since the reference from the child is keeping it alive, it's not going to get cleaned up by magic, right?), then the parent could just as well call the __cleanup__ methods of the things it references directly without bothering with a proxy. For that matter, if it's the straightforward

    parent <-> child

kind of cycle, the parent's cleanup method can just do

    self.__dict__.clear()

and the cycle is broken without writing a __cleanup__ method anywhere (that's what I usually do, and in this kind of cycle that clears the last reference to the child, which then goes away, which in turn automagically clears its back reference to the parent).

So, offhand, I don't see that the proxy protocol could help here. In a sense, what's really needed is the opposite: notifying the *proxy* when the *real* object goes away (which makes no sense in the context of what your proxy objects were designed to do).

[about Java and its four reference strengths]

Found a good introductory writeup at (sorry, my mailer will break this URL, so I'll break it myself at a sensible place):

http://developer.java.sun.com/developer/
technicalArticles//ALT/RefObj/index.html

They have a class for each of the three "not strong" flavors of references. For all three you pass the referenced object to the constructor, and all three accept (optional in two of the flavors) a second ReferenceQueue argument.
In the latter case, when the referenced object goes away the weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn, is a thread-safe queue with various put, get, and timeout-limited polling functions. So you have to write code to look at the queue from time to time, to find the proxies whose referents have gone away.

The three flavors may (or may not ...) have these motivations:

soft: an object reachable at strongest by soft references can go away at any time, but the garbage collector strives to keep it intact until it can't find any other way to get enough memory

weak: an object reachable at strongest by weak references can go away at any time, and the collector makes no attempt to delay its death

phantom: an object reachable at strongest by phantom references can get *finalized* at any time, but won't get *deallocated* before its phantom proxy does something or other (goes away? wasn't clear). This is the flavor that requires passing a queue argument to the constructor. Seems to be a major hack to worm around Java's notorious problems with order of finalization -- along the lines that you give phantom referents trivial finalizers, and put the real cleanup logic in the phantom proxy. This lets your program take responsibility for running the real cleanup code in the order-- and in the thread! --where it makes sense.

Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with under-the-cover weak keys (unlike Dieter's flavor with weak values), and where the key+value pairs vanish by magic when the key object goes away. The details and the implementation of these guys weren't clear to me, but then I didn't download the code, just scanned the online docs.

Ah, a correction to my last post:

    class _Weak:
        ...
        def __del__(self):
            # this is purely an optimization: if self gets nuked,
            # exempt its referent from greater expense when *it*
            # dies
            if self.id is not None:
                __clear_weak_bit(__id2obj(self.id))
                del id2weak[self.id]

Root of all evil: this method is useless, since the id2weak dict keeps each _Weak object alive until its referent goes away (at which time self.id gets set to None, so _Weak.__del__ doesn't do anything). Even if it did do something, it's no cheaper to do it here than in the system cleanup code ("greater expense" was wrong).

weakly y'rs - tim

PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and called them "limp dicts". I'm not entirely sure it was an innocent Freudian slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds like American slang for "flaccid one-eyed trouser snake" ...).

From fredrik@pythonware.com Tue Aug 17 08:23:03 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 09:23:03 +0200
Subject: [Python-Dev] buffer interface considered harmful
References:
Message-ID: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>

David Ascher wrote:
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

as an aside, PIL 1.1 (*) introduces "virtual image memories" which are, as I mentioned in an earlier post, accessed via an API rather than via direct pointers. it'll also include an adapter allowing you to use NumPy objects as image memories.
unfortunately, the buffer interface is not good enough to use on top of the virtual image memory interface...

*) 1.1 is our current development thread, which will be released to plus customers in a number of weeks...

From mal@lemburg.com Tue Aug 17 09:50:01 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Tue, 17 Aug 1999 10:50:01 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim>
Message-ID: <37B92239.4076841E@lemburg.com>

Tim Peters wrote:
>
> [about weakdicts and the possibility of building them on weak
> references; the obvious way doesn't clean up the dict itself by
> magic; maybe a weak object should be notified when its referent
> goes away
> ]
>
> [M.-A. Lemburg]
> > Perhaps one could fiddle something out of the Proxy objects
> > in mxProxy (you know where...). These support a special __cleanup__
> > protocol that I use a lot to work around circular garbage:
> > the __cleanup__ method of the referenced object is called prior
> > to destroying the proxy; even if the reference count on the
> > object has not yet gone down to 0.
> >
> > This makes direct circles possible without problems: the parent
> > can reference a child through the proxy and the child can reference the
> > parent directly.
>
> What you just wrote is:
>
>     parent --> proxy --> child -->+
>       ^                           v
>       +<--------------------------+
>
> Looks like a plain old cycle to me!

Sure :-) That was the intention. I'm using this to implement acquisition without turning to ExtensionClasses. [Nice picture, BTW]

> > As soon as the parent is cleaned up, the reference to
> > the proxy is deleted which then automagically makes the
> > back reference in the child disappear, allowing the parent
> > to be deallocated after cleanup without leaving a circular
> > reference around.
>
> M-A, this is making less sense by the paragraph : skipping the
> middle, this says "as soon as the parent is cleaned up ...
allowing the
> parent to be deallocated after cleanup". If we presume that the parent gets
> cleaned up explicitly (since the reference from the child is keeping it
> alive, it's not going to get cleaned up by magic, right?), then the parent
> could just as well call the __cleanup__ methods of the things it references
> directly without bothering with a proxy. For that matter, if it's the
> straightforward
>
>     parent <-> child
>
> kind of cycle, the parent's cleanup method can just do
>
>     self.__dict__.clear()
>
> and the cycle is broken without writing a __cleanup__ method anywhere
> (that's what I usually do, and in this kind of cycle that clears the last
> reference to the child, which then goes away, which in turn automagically
> clears its back reference to the parent).
>
> So, offhand, I don't see that the proxy protocol could help here. In a
> sense, what's really needed is the opposite: notifying the *proxy* when the
> *real* object goes away (which makes no sense in the context of what your
> proxy objects were designed to do).

All true :-). The nice thing about the proxy is that it takes care of the process automagically. And yes, the parent is used via a proxy too. So the picture looks like this:

    --> proxy --> parent --> proxy --> child -->+
         ^                                      v
         +<-------------------------------------+

Since the proxy isn't noticed by the referencing objects (well, at least if they don't fiddle with internals), the picture for the objects looks like this:

    --> parent --> child -->+
         ^                  v
         +<-----------------+

You could of course do the same via explicit invocation of the __cleanup__ method, but the object references involved could be hidden in some other structure, so they might be hard to find.

And there's another feature about Proxies (as defined in mxProxy): they allow you to control access in a much more strict way than Python does.
You can actually hide attributes and methods you don't want exposed in a way that doesn't even let you access them via some dict or pass-me-the-frame-object trick. This is very useful when you program multi-user application host servers where you don't want users to access internal structures of the server.

> [about Java and its four reference strengths]
>
> Found a good introductory writeup at (sorry, my mailer will break this URL,
> so I'll break it myself at a sensible place):
>
> http://developer.java.sun.com/developer/
> technicalArticles//ALT/RefObj/index.html

Thanks for the reference... and for the summary ;-)

> They have a class for each of the three "not strong" flavors of references.
> For all three you pass the referenced object to the constructor, and all
> three accept (optional in two of the flavors) a second ReferenceQueue
> argument. In the latter case, when the referenced object goes away the
> weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn,
> is a thread-safe queue with various put, get, and timeout-limited polling
> functions. So you have to write code to look at the queue from time to
> time, to find the proxies whose referents have gone away.
>
> The three flavors may (or may not ...) have these motivations:
>
> soft: an object reachable at strongest by soft references can go away at
> any time, but the garbage collector strives to keep it intact until it can't
> find any other way to get enough memory

So there is a possibility of reviving these objects, right ?

I've just recently added a hackish function to my mxTools which allows me to regain access to objects via their address (no, not thread safe, not even necessarily correct).

sys.makeref(id)
    Provided that id is a valid address of a Python object (id(object)
    returns this address), this function returns a new reference to it.
    Only objects that are "alive" can be referenced this way, ones with
    zero reference count cause an exception to be raised.
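[Editorial aside: the same address-to-object trick can be sketched in today's CPython with ctypes. This is an illustration, not the mxTools implementation, and MAL's "expert-only" warnings apply in full -- passing a stale address crashes the interpreter.]

```python
import ctypes

def makeref(addr):
    # Reinterpret a raw address (as returned by id()) as an object
    # reference.  Safe only while the object at that address is still
    # alive; a dangling address causes undefined behavior.
    return ctypes.cast(addr, ctypes.py_object).value

obj = ["tracked"]
assert makeref(id(obj)) is obj  # regain a reference from the bare address
```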
You can use this function to re-access objects lost during garbage collection.

USE WITH CARE: this is an expert-only function since it can cause instant core dumps and many other strange things -- even ruin your system if you don't know what you're doing !

SECURITY WARNING: This function can provide you with access to objects that are otherwise not visible, e.g. in restricted mode, and thus be a potential security hole.

I use it for tracking objects via an id-keyed dictionary and hooks in the create/del mechanisms of Python instances. It helps find those memory-eating cycles.

> weak: an object reachable at strongest by weak references can go away at
> any time, and the collector makes no attempt to delay its death
>
> phantom: an object reachable at strongest by phantom references can get
> *finalized* at any time, but won't get *deallocated* before its phantom
> proxy does something or other (goes away? wasn't clear). This is the flavor
> that requires passing a queue argument to the constructor. Seems to be a
> major hack to worm around Java's notorious problems with order of
> finalization -- along the lines that you give phantom referents trivial
> finalizers, and put the real cleanup logic in the phantom proxy. This lets
> your program take responsibility for running the real cleanup code in the
> order-- and in the thread! --where it makes sense.

Wouldn't these flavors be possible using the following setup ? Note that it's quite similar to your _Weak class except that I use a proxy without the need to first get a strong reference for the object and that it doesn't use a weak bit.

    --> proxy --> object
                    ^
                    |
         all_managed_objects

all_managed_objects is a dictionary indexed by address (its id) and keeps a strong reference to the objects. The proxy does not keep a strong reference to the object, but only the address as integer and checks the ref-count on the object in the all_managed_objects dictionary prior to every dereferencing action.
In case this refcount falls down to 1 (only the all_managed_objects dict references it), the proxy takes appropriate action, e.g. raises an exception and deletes the reference in all_managed_objects to mimic a weak reference. The same check is done prior to garbage collection of the proxy.

Add to this some queues, pepper and salt and place it in an oven at 220° for 20 minutes... plus take a look every 10 seconds or so...

The downside is obvious: the zombified object will not get inspected (and then GCed) until the next time a weak reference to it is used.

> Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
> under-the-cover weak keys (unlike Dieter's flavor with weak values), and
> where the key+value pairs vanish by magic when the key object goes away.
> The details and the implementation of these guys weren't clear to me, but
> then I didn't download the code, just scanned the online docs.

Would the above help in creating such beasts ?

> Ah, a correction to my last post:
>
>     class _Weak:
>         ...
>         def __del__(self):
>             # this is purely an optimization: if self gets nuked,
>             # exempt its referent from greater expense when *it*
>             # dies
>             if self.id is not None:
>                 __clear_weak_bit(__id2obj(self.id))
>                 del id2weak[self.id]
>
> Root of all evil: this method is useless, since the id2weak dict keeps each
> _Weak object alive until its referent goes away (at which time self.id gets
> set to None, so _Weak.__del__ doesn't do anything). Even if it did do
> something, it's no cheaper to do it here than in the system cleanup code
> ("greater expense" was wrong).
>
> weakly y'rs - tim
>
> PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and
> called them "limp dicts". I'm not entirely sure it was an innocent Freudian
> slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds
> like American slang for "flaccid one-eyed trouser snake" ...).
:-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 136 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mhammond@skippinet.com.au Tue Aug 17 17:05:40 1999
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Wed, 18 Aug 1999 02:05:40 +1000
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>
Message-ID: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>

Fredrik,

Care to elaborate? Statements like "buffer interface needs a redesign" or "the buffer interface is not good enough to use on top of the virtual image memory interface" really only give me the impression you have a bee in your bonnet over these buffer interfaces. If you could actually stretch these statements out to provide even _some_ background, problem statement or potential solution it would help. All I know is "Fredrik doesn't like it for some unexplained reason".

You found an issue with array reallocation - great - but that's a bug rather than a design flaw. Can you tell us why it's not good enough, and give an off-the-cuff design that would solve it? Or are you suggesting it is unsolvable? I really don't have a clue what your issue is.

Jim (for example) has made his position and reasoning clear. You have only made your position clear, but your reasoning is still a mystery.

Mark.

> > unfortunately, the buffer interface is not good enough to use
> on top of the virtual image memory interface...

From fredrik@pythonware.com Tue Aug 17 17:48:31 1999
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 18:48:31 +0200
Subject: [Python-Dev] buffer interface considered harmful
References: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>
Message-ID: <005201bee8d0$9b4737d0$f29b12c2@secret.pythonware.com>

> Care to elaborate?
> Statements like "buffer interface needs a redesign" or
> "the buffer interface is not good enough to use on top of the virtual image
> memory interface" really only give me the impression you have a bee in your
> bonnet over these buffer interfaces.

re "good enough":
http://www.python.org/pipermail/python-dev/1999-August/000650.html

re "needs a redesign":
http://www.python.org/pipermail/python-dev/1999-August/000659.html

and to some extent:
http://www.python.org/pipermail/python-dev/1999-August/000658.html

> Jim (for example) has made his position and reasoning clear.

among other things, Jim said: "At this point, I don't have a good idea what buffers are for and I don't see a lot of evidence that there *is* a design. I assume that there was a design, but I can't see it". which pretty much echoes my concerns in:

http://www.python.org/pipermail/python-dev/1999-August/000612.html
http://www.python.org/pipermail/python-dev/1999-August/000648.html

> You found an issue with array reallocation - great - but that's
> a bug rather than a design flaw.

for me, that bug (and the marshal glitch) indicates that the design isn't as crystal-clear as it needs to be, for such a fundamental feature. otherwise, Greg would never have made that mistake, and Guido would have spotted it when he added the "buffer" built-in...

so what are you folks waiting for? could someone who thinks he understands exactly what this thing is spend an hour on writing that design document, so me and Jim can put this entire thing behind us?

PS. btw, was it luck or careful analysis behind the decision to make buffer() always return read-only buffers, also for objects implementing the read/write protocol?

From da@ski.org Tue Aug 17 23:41:14 1999
From: da@ski.org (David Ascher)
Date: Tue, 17 Aug 1999 15:41:14 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) In-Reply-To: <19990816094243.3CE83303120@snelboot.oratrix.nl> Message-ID: On Mon, 16 Aug 1999, Jack Jansen wrote: > Would adding a buffer interface to cobject solve your problem? Cobject is > described as being used for passing C objects between Python modules, but I've > always thought of it as passing C objects from one C routine to another C > routine through Python, which doesn't necessarily understand what the object > is all about. > > That latter description seems to fit your bill quite nicely. It's an interesting idea, but it wouldn't do as it is, as I'd need the ability to create a CObject given a memory location and a size. Also, I am not expected to free() the memory, which would happen when the CObject got GC'ed. (BTW: I am *not* arguing that PyBuffer_FromReadWriteMemory() should be exposed by default. I'm happy with exposing it in my little extension module for my exotic needs.) --david From mal@lemburg.com Wed Aug 18 10:02:02 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 11:02:02 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000001bee882$5b7d8da0$112d2399@tim> <37B92239.4076841E@lemburg.com> Message-ID: <37BA768A.50DF5574@lemburg.com> [about weakdicts and the possibility of building them on weak references; the obvious way doesn't clean up the dict itself by magic; maybe a weak object should be notified when its referent goes away ] Here is a new version of my Proxy package which includes a self managing weak reference mechanism without the need to add extra bits or bytes to all Python objects: http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip The docs and an explanation of how the thingie works are included in the archive's Doc subdir. 
Basically it builds upon the idea I posted earlier on in this thread -- with a few extra kicks to get it right in the end ;-)

Usage is pretty simple:

from Proxy import WeakProxy
object = []
wr = WeakProxy(object)
wr.append(8)
del object

>>> wr[0]
Traceback (innermost last):
  File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected

I have checked the ref counts pretty thoroughly, but before going public I would like the Python-Dev crowd to run some tests as well: after all, the point is for the weak references to be weak and that's sometimes a bit hard to check.

Hope you have as much fun with it as I had writing it ;-)

Ah yes, for the raw details have a look at the code. The code uses a list of back references to the weak Proxies and notifies them when the object goes away... would it be useful to add a hook to the Proxies so that they can apply some other action as well ?

-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Vladimir.Marangozov@inrialpes.fr Wed Aug 18 12:42:08 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Wed, 18 Aug 1999 12:42:08 +0100 (NFT)
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37BA768A.50DF5574@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 11:02:02 am"
Message-ID: <199908181142.MAA22596@pukapuka.inrialpes.fr>

M.-A. Lemburg wrote:
>
> Usage is pretty simple:
>
> from Proxy import WeakProxy
> object = []
> wr = WeakProxy(object)
> wr.append(8)
> del object
>
> >>> wr[0]
> Traceback (innermost last):
>   File "", line 1, in ?
> mxProxy.LostReferenceError: object already garbage collected > > I have checked the ref counts pretty thoroughly, but before > going public I would like the Python-Dev crowd to run some > tests as well: after all, the point is for the weak references > to be weak and that's sometimes a bit hard to check. It's even harder to implement them without side effects. I used the same hack for the __heirs__ class attribute some time ago. But I knew that a parent class cannot be garbage collected before all of its descendants. That allowed me to keep weak refs in the parent class, and preserve the existing strong refs in the subclasses. On every dealloc of a subclass, the corresponding weak ref in the parent class' __heirs__ is removed. In your case, the lifetime of the objects cannot be predicted, so implementing weak refs by messing with refcounts or checking mem pointers is a dead end. I don't know whether this is the case with mxProxy as I just browsed the code quickly, but here's a scenario where your scheme (or implementation) is not working: >>> from Proxy import WeakProxy >>> o = [] >>> p = WeakProxy(o) >>> d = WeakProxy(o) >>> p >>> d >>> print p [] >>> print d [] >>> del o >>> p >>> d >>> print p Illegal instruction (core dumped) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack@oratrix.nl Wed Aug 18 12:02:13 1999 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 13:02:13 +0200 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <19990818110213.A558F303120@snelboot.oratrix.nl> The one thing I'm not thrilled by in mxProxy is that a call to CheckWeakReferences() is needed before an object is cleaned up. 
I guess this boils down to the same problem I had with my weak reference scheme: you somehow want the Python core to tell the proxy stuff that the object can be cleaned up (although the details are different: in my scheme this would be triggered by refcount==0 and in mxProxy by refcount==1). And because objects are created and destroyed in Python at a tremendous rate you don't want to do this call for every object, only if you have a hint that the object has a weak reference (or a proxy).

-- Jack Jansen          | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal@lemburg.com Wed Aug 18 12:46:45 1999
From: mal@lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 13:46:45 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <19990818110213.A558F303120@snelboot.oratrix.nl>
Message-ID: <37BA9D25.95E46EA@lemburg.com>

Jack Jansen wrote:
>
> The one thing I'm not thrilled by in mxProxy is that a call to
> CheckWeakReferences() is needed before an object is cleaned up. I guess this
> boils down to the same problem I had with my weak reference scheme: you
> somehow want the Python core to tell the proxy stuff that the object can be
> cleaned up (although the details are different: in my scheme this would be
> triggered by refcount==0 and in mxProxy by refcount==1). And because objects
> are created and destroyed in Python at a tremendous rate you don't want to do
> this call for every object, only if you have a hint that the object has a weak
> reference (or a proxy).

Well, the check is done prior to every action using a proxy to the object and also when a proxy to it is deallocated. The additional checkweakrefs() API is only included to enable additional explicit checking of the whole weak refs dictionary, e.g. every 10 seconds or so (just like you would with a mark&sweep GC).
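For context: the notification scheme being discussed here - the core telling each weak reference when its referent goes away, instead of requiring explicit CheckWeakReferences() sweeps - is essentially what the standard weakref module later provided (it was added in Python 2.1). A minimal sketch of that pattern, relying on CPython's deterministic reference counting:

```python
import weakref

class Referent:
    # Plain lists can't be weakly referenced, so use a tiny class.
    pass

notified = []

def on_collect(ref):
    # Invoked by the core while the referent is being finalized.
    notified.append(ref)

obj = Referent()
ref = weakref.ref(obj, on_collect)

assert ref() is obj        # referent still alive
del obj                    # last strong reference gone; CPython collects now
assert ref() is None       # weak reference cleared automatically
assert notified == [ref]   # the notification hook fired exactly once
```

The deallocation path itself fires the callback, so no periodic sweep of a weak-reference dictionary is needed.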
But yes, GC of the phantom object is delayed a bit depending on how you set up the proxies. Still, I think most usages won't have this problem, since the proxies themselves are usually temporary objects. It may sometimes even make sense to have the phantom object around as long as possible, e.g. to implement the soft references Tim quoted from the Java paper. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Aug 18 12:33:18 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 13:33:18 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181142.MAA22596@pukapuka.inrialpes.fr> Message-ID: <37BA99FE.45D582AD@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > I have checked the ref counts pretty thoroughly, but before > > going public I would like the Python-Dev crowd to run some > > tests as well: after all, the point is for the weak references > > to be weak and that's sometimes a bit hard to check. > > It's even harder to implement them without side effects. I used > the same hack for the __heirs__ class attribute some time ago. > But I knew that a parent class cannot be garbage collected before > all of its descendants. That allowed me to keep weak refs in > the parent class, and preserve the existing strong refs in the > subclasses. On every dealloc of a subclass, the corresponding > weak ref in the parent class' __heirs__ is removed. > > In your case, the lifetime of the objects cannot be predicted, > so implementing weak refs by messing with refcounts or checking > mem pointers is a dead end. 
> I don't know whether this is the > case with mxProxy as I just browsed the code quickly, but here's > a scenario where your scheme (or implementation) is not working: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = WeakProxy(o) > >>> d = WeakProxy(o) > >>> p > > >>> d > > >>> print p > [] > >>> print d > [] > >>> del o > >>> p > > >>> d > > >>> print p > Illegal instruction (core dumped) Could you tell me where the core dump originates ? Also, it would help to compile the package with the -DMAL_DEBUG switch turned on (edit Setup) and then run the same things using 'python -d'. The package will then print a pretty complete list of things it is doing to mxProxy.log, which would help track down errors like these. BTW, I get: >>> print p Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> [Don't know why the print statement prints an empty line, though.] Thanks for trying it, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Wed Aug 18 14:12:14 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 14:12:14 +0100 (NFT) Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37BA99FE.45D582AD@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 01:33:18 pm" Message-ID: <199908181312.OAA20542@pukapuka.inrialpes.fr> [about mxProxy, WeakProxy] M.-A. Lemburg wrote: > > Could you tell me where the core dump originates ? Also, it would > help to compile the package with the -DMAL_DEBUG switch turned > on (edit Setup) and then run the same things using 'python -d'. > The package will then print a pretty complete list of things it > is doing to mxProxy.log, which would help track down errors like > these. 
> > BTW, I get: > >>> print p > > Traceback (innermost last): > File "", line 1, in ? > mxProxy.LostReferenceError: object already garbage collected > >>> > > [Don't know why the print statement prints an empty line, though.] > The previous example now *seems* to work fine in a freshly launched interpreter, so it's not a good example, but this shorter one definitely doesn't: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = q = WeakProxy(o) >>> del o >>> print p or q Illegal instruction (core dumped) Or even shorter: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = WeakProxy(o) >>> del o >>> print p Illegal instruction (core dumped) It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). I can mail you a complete trace in private, if you still need it. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Wed Aug 18 13:50:08 1999 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 14:50:08 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> Message-ID: <37BAAC00.27A34FF7@lemburg.com> Vladimir Marangozov wrote: > > [about mxProxy, WeakProxy] > > M.-A. Lemburg wrote: > > > > Could you tell me where the core dump originates ? Also, it would > > help to compile the package with the -DMAL_DEBUG switch turned > > on (edit Setup) and then run the same things using 'python -d'. > > The package will then print a pretty complete list of things it > > is doing to mxProxy.log, which would help track down errors like > > these. > > > > BTW, I get: > > >>> print p > > > > Traceback (innermost last): > > File "", line 1, in ? > > mxProxy.LostReferenceError: object already garbage collected > > >>> > > > > [Don't know why the print statement prints an empty line, though.] 
>
> The previous example now *seems* to work fine in a freshly launched
> interpreter, so it's not a good example, but this shorter one
> definitely doesn't:
>
> >>> from Proxy import WeakProxy
> >>> o = []
> >>> p = q = WeakProxy(o)
> >>> p = q = WeakProxy(o)
> >>> del o
> >>> print p or q
> Illegal instruction (core dumped)
>
> It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference().
> I can mail you a complete trace in private, if you still need it.

That would be nice (please also include the log-file), because I get:

>>> print p or q
Traceback (innermost last):
  File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected
>>>

Thank you,
-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From skip@mojam.com Wed Aug 18 15:47:23 1999
From: skip@mojam.com (Skip Montanaro)
Date: Wed, 18 Aug 1999 09:47:23 -0500
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
Message-ID: <199908181447.JAA05151@dolphin.mojam.com>

I posted a note to the main list yesterday in response to Dan Connolly's complaint that the os module isn't very portable. I saw no followups (it's amazing how fast a thread can die out :-), but I think it's a reasonable idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback from people more interested in long-term Python developments.

The basic premise is that for each platform on which Python runs there are portable and nonportable interfaces to the underlying operating system. The term POSIX has some portability connotations, so let's assume that the posix module exposes the portable subset of the OS interface. To keep things simple, let's also assume there are only three supported general OS platforms: unix, nt and mac.
The proposal then is that importing the platform's module by name will import both the portable and non-portable interface elements. Importing the posix module will import just that portion of the interface that is truly portable across all platforms. To add new functionality to the posix interface it would have to be added to all three platforms. The posix module will be able to ferret out the platform it is running on and import the correct OS-independent posix implementation:

    import sys
    _plat = sys.platform
    del sys

    if _plat == "mac": from posixmac import *
    elif _plat == "nt": from posixnt import *
    else: from posixunix import * # some unix variant

The platform-dependent module would simply import everything it could, e.g.:

    from posixunix import *
    from nonposixunix import *

The os module would vanish or be deprecated with its current behavior intact. The documentation would be modified so that the posix module documents the portable interface and the OS-dependent module's documentation documents the rest and just refers users to the posix module docs for the portable stuff.

In theory, this could be done for 1.6, however as I've proposed it, the semantics of importing the posix module would change. Dan Connolly probably isn't going to have a problem with that, though I suppose Guido might... If this idea is good enough for 1.6, perhaps we leave os and posix module semantics alone and add a module named "portable", "portableos" or "portableposix" or something equally arcane.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098

From guido@CNRI.Reston.VA.US Wed Aug 18 15:54:28 1999
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Wed, 18 Aug 1999 10:54:28 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: Your message of "Wed, 18 Aug 1999 09:47:23 CDT."
<199908181447.JAA05151@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <199908181454.KAA07692@eric.cnri.reston.va.us> > I posted a note to the main list yesterday in response to Dan Connolly's > complaint that the os module isn't very portable. I saw no followups (it's > amazing how fast a thread can die out :-), but I think it's a reasonable > idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback > from people more interesting in long-term Python developments. > > The basic premise is that for each platform on which Python runs there are > portable and nonportable interfaces to the underlying operating system. The > term POSIX has some portability connotations, so let's assume that the posix > module exposes the portable subset of the OS interface. To keep things > simple, let's also assume there are only three supported general OS > platforms: unix, nt and mac. The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. To > add new functionality to the posix interface it would have to be added to > all three platforms. The posix module will be able to ferret out the > platform it is running on and import the correct OS-independent posix > implementation: > > import sys > _plat = sys.platform > del sys > > if _plat == "mac": from posixmac import * > elif _plat == "nt": from posixnt import * > else: from posixunix import * # some unix variant > > The platform-dependent module would simply import everything it could, e.g.: > > from posixunix import * > from nonposixunix import * > > The os module would vanish or be deprecated with its current behavior > intact. 
The documentation would be modified so that the posix module > documents the portable interface and the OS-dependent module's documentation > documents the rest and just refers users to the posix module docs for the > portable stuff. > > In theory, this could be done for 1.6, however as I've proposed it, the > semantics of importing the posix module would change. Dan Connolly probably > isn't going to have a problem with that, though I suppose Guido might... If > this idea is good enough for 1.6, perhaps we leave os and posix module > semantics alone and add a module named "portable", "portableos" or > "portableposix" or something equally arcane. And the advantage of this would be...? Basically, it seems you're just renaming the functionality of os to posix. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Wed Aug 18 16:10:41 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 18 Aug 1999 10:10:41 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <199908181454.KAA07692@eric.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> Message-ID: <14266.51743.904066.470431@dolphin.mojam.com> Guido> And the advantage of this would be...? Guido> Basically, it seems you're just renaming the functionality of os Guido> to posix. I see a few advantages. 1. We will get the meaning of the noun "posix" more or less right. Programmers coming from other languages are used to thinking of programming to a POSIX API or the "POSIX subset of the OS API". Witness all the "#ifdef _POSIX" in the header files on my Linux box In Python, the exact opposite is true. Importing the posix module is documented to be the non-portable way to interface to Unix platforms. 2. 
You would make it clear on all platforms when you expect to be programming in a non-portable fashion, by importing the platform-specific os (unix, nt, mac). "import unix" would mean I expect this code to only run on Unix machines. You could argue that you are declaring your non-portability by importing the posix module today, but to the casual user or to a new Python programmer with a C or C++ background, that won't be obvious. 3. If Dan Connolly's contention is correct, importing the os module today is not all that portable. I can't really say one way or the other, because I'm lucky enough to be able to confine my serious programming to Unix. I'm sure there's someone out there that can try the following on a few platforms: import os dir(os) and compare the output. Skip From jack@oratrix.nl Wed Aug 18 16:33:20 1999 From: jack@oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 17:33:20 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <19990818153320.D61F6303120@snelboot.oratrix.nl> > The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. There's one slight problem with this: when you use functionality that is partially portable, i.e. a call that is available on Windows and Unix but not on the Mac. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin@mems-exchange.org Wed Aug 18 16:39:30 1999 From: akuchlin@mems-exchange.org (Andrew M. 
Kuchling) Date: Wed, 18 Aug 1999 11:39:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <14266.54194.715887.808096@amarok.cnri.reston.va.us> Skip Montanaro writes: > 2. You would make it clear on all platforms when you expect to be > programming in a non-portable fashion, by importing the > platform-specific os (unix, nt, mac). "import unix" would mean I To my mind, POSIX == Unix; other platforms may have bits of POSIX-ish functionality, but most POSIX functions will only be found on Unix systems. One of my projects for 1.6 is to go through the O'Reilly POSIX book and add all the missing calls to the posix modules. Practically none of those functions would exist on Windows or Mac. Perhaps it's really a documentation fix: the os module should document only those features common to all of the big 3 platforms (Unix, Windows, Mac), and have pointers to a section for each of the platform-specific modules, listing the platform-specific functions. -- A.M. Kuchling http://starship.python.net/crew/amk/ Setting loose on the battlefield weapons that are able to learn may be one of the biggest mistakes mankind has ever made. It could also be one of the last. 
-- Richard Forsyth, "Machine Learning for Expert Systems" From skip@mojam.com (Skip Montanaro) Wed Aug 18 16:52:20 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 18 Aug 1999 10:52:20 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54194.715887.808096@amarok.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> Message-ID: <14266.54907.143970.101594@dolphin.mojam.com> Andrew> Perhaps it's really a documentation fix: the os module should Andrew> document only those features common to all of the big 3 Andrew> platforms (Unix, Windows, Mac), and have pointers to a section Andrew> for each of the platform-specific modules, listing the Andrew> platform-specific functions. Perhaps. Should that read ... the os module should *expose* only those features common to all of the big 3 platforms ... ? Skip From skip@mojam.com (Skip Montanaro) Wed Aug 18 16:54:11 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 18 Aug 1999 10:54:11 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> Message-ID: <14266.54991.27912.12075@dolphin.mojam.com> >>>>> "Jack" == Jack Jansen writes: >> The proposal then is that importing the >> platform's module by name will import both the portable and non-portable >> interface elements. Importing the posix module will import just that >> portion of the interface that is truly portable across all platforms. Jack> There's one slight problem with this: when you use functionality that is Jack> partially portable, i.e. a call that is available on Windows and Unix but not Jack> on the Mac. 
Agreed. I'm not sure what to do there. Is the intersection of the common OS calls on Unix, Windows and Mac so small as to be useless (or are there some really gotta have functions not in the intersection because they are missing only on the Mac)? Skip From guido@CNRI.Reston.VA.US Wed Aug 18 17:16:27 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 12:16:27 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 10:52:20 CDT." <14266.54907.143970.101594@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> <14266.54907.143970.101594@dolphin.mojam.com> Message-ID: <199908181616.MAA07901@eric.cnri.reston.va.us> > ... the os module should *expose* only those features common to all of > the big 3 platforms ... Why? My experience has been that functionality that was thought to be Unix specific has gradually become available on other platforms, which makes it hard to decide in which module a function should be placed. The proper test for portability of a program is not whether it imports certain module names, but whether it uses certain functions from those modules (and whether it uses them in a portable fashion). As platforms evolve, a program that was previously thought to be non-portable might become more portable. 
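Guido's "uses certain functions" test is usually applied by probing for the function at runtime rather than switching on a module name. A small sketch of that style; the helper name and the fallback behaviour here are made up for illustration, not part of any proposal in this thread:

```python
import os

def current_user_id():
    """Best-effort identity of the current user.

    os.getuid() only exists on Unix-like platforms; on systems that
    lack it (e.g. Windows or the Mac), fall back to an environment
    variable instead of refusing to run at all.
    """
    if hasattr(os, "getuid"):
        return "uid:%d" % os.getuid()
    return "name:" + os.environ.get("USERNAME", "unknown")
```

Code written this way keeps working unchanged if a platform later grows the call, which is exactly the kind of evolution Guido describes.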
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Wed Aug 18 18:33:44 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 18:33:44 +0100 (NFT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54991.27912.12075@dolphin.mojam.com> from "Skip Montanaro" at "Aug 18, 99 10:54:11 am" Message-ID: <199908181733.SAA08434@pukapuka.inrialpes.fr> Everybody's right in this debate. I have to type a lot to express objectively my opinion, but better filter my reasoning and just say the conclusion. Having in mind: - what POSIX is - what an OS is - that an OS may or may not comply w/ the POSIX standard, and if it doesn't, it may do so in a couple of years (Windows 3K and PyOS come to mind ;-) - that the os module claims portability amongst the different OSes, mainly regarding their filesystem & process management services, hence it's exposing only a *subset* of the os specific services - the current state of Python It would be nice: - to leave the os module as a common denominator - to have a "unix" module (which could further incorporate the different brands of unix) - to have the posix module capture the fraction of posix functionality, exported from a particular OS specific module, and add the appropriate POSIX propaganda in the docs - to manage to do this, or argue what's wrong with the above -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Thu Aug 19 11:02:26 1999 From: mal@lemburg.com (M.-A. 
Lemburg)
Date: Thu, 19 Aug 1999 12:02:26 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <199908181312.OAA20542@pukapuka.inrialpes.fr> <37BAAC00.27A34FF7@lemburg.com>
Message-ID: <37BBD632.3F66419C@lemburg.com>

[about weak references and a sample implementation in mxProxy]

With the help of Vladimir, I have solved the problem and uploaded a modified version of the prerelease:

http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip

The archive now also contains a precompiled Win32 PYD file for those on WinXX platforms.

Please give it a try and tell me what you think.

Cheers,
-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 134 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack@oratrix.nl Thu Aug 19 15:06:01 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Thu, 19 Aug 1999 16:06:01 +0200
Subject: [Python-Dev] Optimization idea
Message-ID: <19990819140602.433BC303120@snelboot.oratrix.nl>

I just had yet another idea for optimizing Python that looks so plausible that I guess someone else must have looked into it already (and, hence, probably rejected it:-):

We add to the type structure a "type identifier" number, a small integer for the common types (int=1, float=2, string=3, etc) and 0 for everything else.

When eval_code2 sees, for instance, a MULTIPLY operation it does something like the following:

    case BINARY_MULTIPLY:
        w = POP();
        v = POP();
        code = (BINARY_MULTIPLY << 8) |
               ((v->ob_type->tp_typeid) << 4) |
               (w->ob_type->tp_typeid);
        x = (binopfuncs[code])(v, w);
        .... etc ...

The idea is that all the 256 BINARY_MULTIPLY entries would be filled with PyNumber_Multiply, except for a few common cases. The int*int field could point straight to int_mul(), etc.
Assuming the common cases are really more common than the uncommon cases, the fact that they jump straight out to the implementation function instead of mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset the added overhead of shifts, ors and indexing.

Any thoughts?

-- Jack Jansen          | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From guido@CNRI.Reston.VA.US Thu Aug 19 15:05:28 1999
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Thu, 19 Aug 1999 10:05:28 -0400
Subject: [Python-Dev] Localization expert needed
Message-ID: <199908191405.KAA10401@eric.cnri.reston.va.us>

My contact at HP is asking for expert advice on localization and multi-byte characters. I have little to share except pointing to Martin von Loewis and Pythonware. Does anyone on this list have a suggestion besides those? Don't hesitate to recommend yourself -- there's money in it!

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Wed, 18 Aug 1999 23:15:55 -0700
From: JOE_ELLSWORTH
To: guido@CNRI.Reston.VA.US
Subject: Localization efforts and state in Python.

Hi Guido. Can you give me some references to the best references currently available for using Python in CGI applications when multi-byte localization is known to be needed? Who is the expert in this in the Python area? Can you recommend that they work with us in this area?
Thanks, Joe E.

------- End of Forwarded Message

From guido@CNRI.Reston.VA.US Thu Aug 19 15:15:28 1999
From: guido@CNRI.Reston.VA.US (Guido van Rossum)
Date: Thu, 19 Aug 1999 10:15:28 -0400
Subject: [Python-Dev] Optimization idea
In-Reply-To: Your message of "Thu, 19 Aug 1999 16:06:01 +0200."
<19990819140602.433BC303120@snelboot.oratrix.nl> References: <19990819140602.433BC303120@snelboot.oratrix.nl> Message-ID: <199908191415.KAA10432@eric.cnri.reston.va.us> > I just had yet another idea for optimizing Python that looks so > plausible that I guess someone else must have looked into it already > (and, hence, probably rejected it:-): > > We add to the type structure a "type identifier" number, a small integer for > the common types (int=1, float=2, string=3, etc) and 0 for everything else. > > When eval_code2 sees, for instance, a MULTIPLY operation it does something > like the following: > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > code = (BINARY_MULTIPLY << 8) | > ((v->ob_type->tp_typeid) << 4) | > (w->ob_type->tp_typeid); > x = (binopfuncs[code])(v, w); > .... etc ... > > The idea is that all the 256 BINARY_MULTIPLY entries would be filled with > PyNumber_Multiply, except for a few common cases. The int*int field could > point straight to int_mul(), etc. > > Assuming the common cases are really more common than the uncommon cases the > fact that they jump straight out to the implementation function instead of > mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset > the added overhead of shifts, ors and indexing. You're assuming that arithmetic operations are a major time sink. I doubt that; much of my code contains hardly any arithmetic these days. Of course, if you *do* have a piece of code that does a lot of basic arithmetic, it might pay off -- but even then I would guess that the majority of opcodes are things like list accessors and variable access. But we needn't speculate. It's easy enough to measure the speedup: you can use tp_xxx5 in the type structure and plug a typecode into it for the int and float types. (Note that you would need a separate table of binopfuncs per operator.)
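[Editor's note: to make the dispatch-table idea concrete, here is a sketch of it in Python rather than C. The names TYPEID, generic_multiply and int_mul are hypothetical stand-ins for tp_typeid, PyNumber_Multiply and the int*int fast path from Jack's pseudo-code; this models the lookup only, not the interpreter loop.]

```python
# Model of the typecode-dispatch idea: 16 possible left type ids times
# 16 right ids gives 256 slots per operator, all pointing at the generic
# (coercing) implementation except a few specialized fast paths.

TYPEID = {int: 1, float: 2, str: 3}   # 0 is reserved for "everything else"

def generic_multiply(v, w):
    # Stand-in for PyNumber_Multiply: handles any type, coercion and all.
    return v * w

def int_mul(v, w):
    # Stand-in for the specialized int*int path.
    return v * w

binopfuncs = [generic_multiply] * 256
binopfuncs[(1 << 4) | 1] = int_mul    # int * int jumps straight to int_mul

def binary_multiply(v, w):
    # What the BINARY_MULTIPLY case in eval_code2 would compute.
    code = (TYPEID.get(type(v), 0) << 4) | TYPEID.get(type(w), 0)
    return binopfuncs[code](v, w)
```

Whether this pays off is exactly Guido's question above: the shifts and the table index are cheap, but they are only a win if the specialized slots are hit often enough to beat the generic path's coercion overhead.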
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Thu Aug 19 20:09:26 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 19 Aug 1999 20:09:26 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908191909.UAA20618@pukapuka.inrialpes.fr> [Tim, in an earlier msg] > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; Ok. In the meantime I think that folding the redundant SET_LINENO doesn't hurt. I ended up with a patchlet that seems to have no side effects, that updates the lnotab as it should and that even makes pdb a bit more clever, IMHO. Consider an extreme case for the function f (listed below). Currently, we get the following: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 SET_LINENO 4 12 SET_LINENO 5 15 LOAD_CONST 1 (1) 18 STORE_FAST 0 (a) 21 SET_LINENO 6 24 SET_LINENO 7 27 SET_LINENO 8 30 LOAD_CONST 2 (None) 33 RETURN_VALUE >>> pdb.runcall(f) > test.py(1)f() -> def f(): (Pdb) list 1, 20 1 -> def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(2)f() -> """Comment about f""" (Pdb) step > test.py(3)f() -> """Another one""" (Pdb) step > test.py(4)f() -> """A third one""" (Pdb) step > test.py(5)f() -> a = 1 (Pdb) step > test.py(6)f() -> """Forth""" (Pdb) step > test.py(7)f() -> "and pdb can set a breakpoint on this one (simple quotes)" (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple 
quotes...""" (Pdb) >>> ------------------------------------------- With folded SET_LINENO, we have this: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 5 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 SET_LINENO 8 12 LOAD_CONST 2 (None) 15 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 20 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) break 7 Breakpoint 1 at test.py:7 (Pdb) break 8 *** Blank or comment (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> ------------------------------------------- i.e., pdb stops at (points to) the first real instruction and doesn't step through the doc strings. Or is there something I'm missing here? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** compile.c-orig Thu Aug 19 19:27:13 1999 --- compile.c Thu Aug 19 19:00:31 1999 *************** *** 615,620 **** --- 615,623 ---- int arg; { if (op == SET_LINENO) { + if (!Py_OptimizeFlag && c->c_last_addr == c->c_nexti - 3) + /* Hack for folding several SET_LINENO in a row. */ + c->c_nexti -= 3; com_set_lineno(c, arg); if (Py_OptimizeFlag) return; From guido@CNRI.Reston.VA.US Thu Aug 19 22:10:33 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 17:10:33 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Thu, 19 Aug 1999 20:09:26 BST."
<199908191909.UAA20618@pukapuka.inrialpes.fr> References: <199908191909.UAA20618@pukapuka.inrialpes.fr> Message-ID: <199908192110.RAA12755@eric.cnri.reston.va.us> Earlier, you argued that this is "not an optimization," but rather avoiding redundancy. I should have responded right then that I disagree, or at least I'm lukewarm about your patch. Either you're not using -O, and then you don't care much about this; or you care, and then you should be using -O. Rather than encrusting the code with more and more ad-hoc micro optimizations, I'd prefer to have someone look into Tim's suggestion of supporting more efficient breakpoints... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Fri Aug 20 13:45:46 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 13:45:46 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908192110.RAA12755@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 19, 99 05:10:33 pm" Message-ID: <199908201245.NAA27098@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I haven't argued so much; I asked whether this would be reasonable. Probably I should have said that I don't see the purpose of emitting SET_LINENO instructions for those nodes for which the compiler generates no code, mainly because (as I learned subsequently) SET_LINENO serves no other purpose than debugging. As I haven't paid much attention to this aspect of the code, I thought that they might still be used for tracebacks. But I couldn't have said that because I didn't know it. > I should have responded right then that I disagree, ... Although I agree this is a minor issue, I'm interested in your argument here, if it's something other than the dialectic: "we're more interested in long term improvements" which is also my opinion. > ... or at least I'm lukewarm about your patch.
No surprise here :-) But I haven't found another way of not generating SET_LINENO for doc strings other than backpatching. > Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. Neither of those. I don't really care, frankly. I was just intrigued by the consecutive SET_LINENO in my disassemblies, so I started to think and ask questions about it. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... This is *the* real issue with the real potential solution. I'm willing to have a look at this (although I don't know pdb/bdb in its finest details). All suggestions and thoughts are welcome. We would probably leave the SET_LINENO opcode as is and (eventually) introduce a new opcode (instead of transforming/renaming it) for compatibility reasons, methinks. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm@hypernet.com Fri Aug 20 17:04:22 1999 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990818110213.A558F303120@snelboot.oratrix.nl> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <1276961301-70195@hypernet.com> In reply to no one in particular: I've often wished that the instance type object had an (optimized) __decref__ slot. With nothing but hand-waving to support it, I'll claim that would enable all these games. 
- Gordon From gmcm@hypernet.com Fri Aug 20 17:04:22 1999 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <1276961295-70552@hypernet.com> Jack Jansen wrote: > There's one slight problem with this: when you use functionality > that is partially portable, i.e. a call that is available on Windows > and Unix but not on the Mac. It gets worse, I think. How about the inconsistencies in POSIX support among *nixes? How about NT being a superset of Win9x? How about NTFS having capabilities that FAT does not? I'd guess there are inconsistencies between Mac flavors, too. The Java approach (if you can't do it everywhere, you can't do it) sucks. In some cases you could probably have the missing functionality (in os) fail silently, but in other cases that would be a disaster. "Least-worst"-is-not-necessarily-"good"-ly y'rs - Gordon From tismer@appliedbiometrics.com Fri Aug 20 16:05:47 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Fri, 20 Aug 1999 17:05:47 +0200 Subject: [Python-Dev] about line numbers References: <199908191909.UAA20618@pukapuka.inrialpes.fr> <199908192110.RAA12755@eric.cnri.reston.va.us> Message-ID: <37BD6ECB.9DD17460@appliedbiometrics.com> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I should have responded right then that I > disagree, or at least I'm lukewarm about your patch. Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... 
I didn't think of this before, but I just realized that I have something like that already in Stackless Python. It is possible to set a breakpoint at every opcode, for every frame. Adding an extra opcode for breakpoints is a good thing as well. The former are good for tracing, conditional breakpoints and such, and cost a little more time since there is always one extra function call. The latter would be a quick, less versatile thing. Inserting extra breakpoint opcodes into running code turns out to be easy to implement, if the running frame gets a local extra copy of its code object, with the breakpoints replacing the original opcodes. The breakpoint handler would then simply look into the original code object. Inserting breakpoints on the source level gives us breakpoints per procedure. Doing it in a running frame gives "instance" level debugging of code. Checking a monitor function on every opcode is slightly more expensive but most general. We can have it all; what do you think? I'm going to finish and publish the stackless/continuous package and submit a paper by the end of September. Should I include this debugging feature? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido@CNRI.Reston.VA.US Fri Aug 20 16:09:32 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:09:32 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> References: Message by "M.-A.
Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> <1276961301-70195@hypernet.com> Message-ID: <199908201509.LAA14726@eric.cnri.reston.va.us> > In reply to no one in particular: > > I've often wished that the instance type object had an (optimized) > __decref__ slot. With nothing but hand-waving to support it, I'll > claim that would enable all these games. Without context, I don't know when this would be called. If you want this called on all DECREFs (regardless of the refcount value), realize that this is a huge slowdown because it would mean the DECREF macro has to inspect the type object, which means several indirections. This would slow down *every* DECREF operation, not just those on instances with a __decref__ slot, because the DECREF macro doesn't know the type of the object! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@CNRI.Reston.VA.US Fri Aug 20 16:13:16 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:13:16 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961295-70552@hypernet.com> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> <1276961295-70552@hypernet.com> Message-ID: <199908201513.LAA14741@eric.cnri.reston.va.us> From: "Gordon McMillan" > Jack Jansen wrote: > > > There's one slight problem with this: when you use functionality > > that is partially portable, i.e. a call that is available on Windows > > and Unix but not on the Mac. > > It gets worse, I think. How about the inconsistencies in POSIX > support among *nixes? How about NT being a superset of Win9x? How > about NTFS having capabilities that FAT does not? I'd guess there are > inconsistencies between Mac flavors, too. > > The Java approach (if you can't do it everywhere, you can't do it) > sucks. 
In some cases you could probably have the missing > functionality (in os) fail silently, but in other cases that would > be a disaster. The Python policy has always been "if it's available, there's a standard name and API for it; if it's not available, the function is not defined or will raise an exception; you can use hasattr(os, ...) or catch exceptions to cope if you can live without it." There are a few cases where unavailable calls are emulated, a few where they are made no-ops, and a few where they are made to raise an exception unconditionally, but in most cases the function will simply not exist, so it's easy to test. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Fri Aug 20 21:54:10 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 21:54:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BD6ECB.9DD17460@appliedbiometrics.com> from "Christian Tismer" at "Aug 20, 99 05:05:47 pm" Message-ID: <199908202054.VAA26970@pukapuka.inrialpes.fr> I'll try to sketch here the scheme I'm thinking of for the callback/breakpoint issue (without SET_LINENO), although some technical details are still missing. I'm assuming the following, in this order: 1) No radical changes in the current behavior, i.e. preserve the current architecture / strategy as much as possible. 2) We don't have breakpoints per opcode, but per source line. For that matter, we have sys.settrace (and for now, we don't aim to have sys.settracei that would be called on every opcode, although we might want this in the future) 3) SET_LINENO disappears. Actually, SET_LINENO opcodes are conditional breakpoints, used for callbacks from C to Python. So the basic problem is to generate these callbacks. If any of the above is not an appropriate assumption and we want a radical change in the strategy of setting breakpoints / generating callbacks, then this post is invalid.
The solution I'm thinking of: a) Currently, we have a function PyCode_Addr2Line which computes the source line from the opcode's address. I hereby assume that we can write the reverse function PyCode_Line2Addr that returns the address from a given source line number. I don't have the implementation, but it should be doable. Furthermore, we can compute, having the co_lnotab table and co_firstlineno, the source line range for a code object. As a consequence, even with the dumbest of all algorithms, by looping through this source line range, we can enumerate with PyCode_Line2Addr the sequence of addresses for the source lines of this code object. b) As Chris pointed out, in case sys.settrace is defined, we can allocate and keep a copy of the original code string per frame. We can further dynamically overwrite the original code string with a new (internal, one byte) CALL_TRACE opcode at the addresses we have enumerated in a). The CALL_TRACE opcodes will trigger the callbacks from C to Python, just as the current SET_LINENO does. c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger the callback and if it returns successfully, we'll fetch the original opcode for the current location from the copy of the original co_code. Then we directly jump to the arg fetch code (or in case we fetch the entire original opcode in CALL_TRACE - we jump to the dispatch code). Hmm. I think that's all. At the heart of this scheme is the PyCode_Line2Addr function, which is the only blob in my head, for now. Christian Tismer wrote: > > I didn't think of this before, but I just realized that > I have something like that already in Stackless Python. > It is possible to set a breakpoint at every opcode, for every > frame. Adding an extra opcode for breakpoints is a good thing > as well. The former are good for tracing, conditional breakpoints > and such, and cost a little more time since there is always one extra > function call.
The latter would be a quick, less versatile thing. I don't think I understand clearly the difference you're talking about, and why the one thing is better than the other, probably because I'm a bit far from stackless python. > I'm going to finish and publish the stackless/continuous package > and submit a paper by the end of September. Should I include this debugging > feature? Write the paper first; you have more than enough material to talk about already ;-). Then if you have time to implement some debugging support, you could always add another section, but it won't be a central point of your paper. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido@CNRI.Reston.VA.US Fri Aug 20 20:59:24 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 15:59:24 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Fri, 20 Aug 1999 21:54:10 BST." <199908202054.VAA26970@pukapuka.inrialpes.fr> References: <199908202054.VAA26970@pukapuka.inrialpes.fr> Message-ID: <199908201959.PAA16105@eric.cnri.reston.va.us> > I'll try to sketch here the scheme I'm thinking of for the > callback/breakpoint issue (without SET_LINENO), although some > technical details are still missing. > > I'm assuming the following, in this order: > > 1) No radical changes in the current behavior, i.e. preserve the > current architecture / strategy as much as possible. > > 2) We don't have breakpoints per opcode, but per source line. For that > matter, we have sys.settrace (and for now, we don't aim to have > sys.settracei that would be called on every opcode, although we might > want this in the future) > > 3) SET_LINENO disappears. Actually, SET_LINENO opcodes are conditional breakpoints, > used for callbacks from C to Python. So the basic problem is to generate > these callbacks.
They used to be the only mechanism by which the traceback code knew the current line number (long before the debugger hooks existed), but with the lnotab, that's no longer necessary. > If any of the above is not an appropriate assumption and we want a radical > change in the strategy of setting breakpoints/ generating callbacks, then > this post is invalid. Sounds reasonable. > The solution I'm thinking of: > > a) Currently, we have a function PyCode_Addr2Line which computes the source > line from the opcode's address. I hereby assume that we can write the > reverse function PyCode_Line2Addr that returns the address from a given > source line number. I don't have the implementation, but it should be > doable. Furthermore, we can compute, having the co_lnotab table and > co_firstlineno, the source line range for a code object. > > As a consequence, even with the dumbiest of all algorithms, by looping > trough this source line range, we can enumerate with PyCode_Line2Addr > the sequence of addresses for the source lines of this code object. > > b) As Chris pointed out, in case sys.settrace is defined, we can allocate > and keep a copy of the original code string per frame. We can further > dynamically overwrite the original code string with a new (internal, > one byte) CALL_TRACE opcode at the addresses we have enumerated in a). > > The CALL_TRACE opcodes will trigger the callbacks from C to Python, > just as the current SET_LINENO does. > > c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger > the callback and if it returns successfully, we'll fetch the original > opcode for the current location from the copy of the original co_code. > Then we directly jump to the arg fetch code (or in case we fetch the > entire original opcode in CALL_TRACE - we jump to the dispatch code). Tricky, but doable. > Hmm. I think that's all. > > At the heart of this scheme is the PyCode_Line2Addr function, which is > the only blob in my head, for now. 
I'm pretty sure that this would be straightforward. I'm a little anxious about modifying the code, and was thinking myself of a way to specify a bitvector of addresses where to break. But that would still cause some overhead for code without breakpoints, so I guess you're right (and it's certainly a long-standing tradition in breakpoint-setting!) --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Fri Aug 20 22:22:12 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 22:22:12 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908201959.PAA16105@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 20, 99 03:59:24 pm" Message-ID: <199908202122.WAA26956@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > > I'm a little anxious about modifying the code, and was thinking myself > of a way to specify a bitvector of addresses where to break. But that > would still cause some overhead for code without breakpoints, so I > guess you're right (and it's certainly a long-standing tradition in > breakpoint-setting!) > Hm. You're probably right, especially if someone wants to inspect a code object from the debugger or something. But I believe that we can manage to redirect the instruction pointer at the beginning of eval_code2 to the *copy* of co_code, and modify the copy with CALL_TRACE, preserving the original intact.
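[Editor's note: the copy-and-patch scheme being discussed can be modeled in a few lines of Python. This is a sketch with made-up values: the CALL_TRACE byte and the opcode stream below are hypothetical, and the real work would of course happen in C inside eval_code2.]

```python
# Model of the CALL_TRACE scheme: execute from a patched *copy* of the
# opcode string, and keep the original bytes intact so the breakpoint
# handler can fetch the real opcode at the patched address.

CALL_TRACE = 0xFF   # hypothetical one-byte breakpoint opcode

def patch_code(co_code, breakpoint_addrs):
    """Return a patched copy of co_code; the original is left untouched."""
    patched = bytearray(co_code)
    for addr in breakpoint_addrs:
        patched[addr] = CALL_TRACE
    return bytes(patched)

def fetch_original_opcode(co_code, addr):
    # What the CALL_TRACE handler does after the trace callback returns:
    # look up the displaced opcode in the pristine original.
    return co_code[addr]

original = bytes([100, 1, 125, 0, 100, 2, 83])   # made-up opcode stream
patched = patch_code(original, [0, 4])           # breakpoints at two lines
```

The point of the model is the invariant Vladimir states above: the interpreter runs `patched`, while `original` stays intact for both opcode lookup and later inspection of the code object.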
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip@mojam.com (Skip Montanaro) Fri Aug 20 21:25:25 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 20 Aug 1999 15:25:25 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <1276961295-70552@hypernet.com> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> <1276961295-70552@hypernet.com> Message-ID: <14269.47443.192469.525132@dolphin.mojam.com> Gordon> It gets worse, I think. How about the inconsistencies in POSIX Gordon> support among *nixes? How about NT being a superset of Win9x? Gordon> How about NTFS having capabilities that FAT does not? I'd guess Gordon> there are inconsistencies between Mac flavors, too. To a certain degree I think the C module(s) that interface to the underlying OS's API can iron out differences. In other cases you may have to document minor (known) differences. In still other cases you may have to relegate particular functionality to the OS-dependent modules. Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From gmcm@hypernet.com Fri Aug 20 23:38:14 1999 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 17:38:14 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <199908201509.LAA14726@eric.cnri.reston.va.us> References: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> Message-ID: <1276937670-1491544@hypernet.com> [me] > > > > I've often wished that the instance type object had an (optimized) > > __decref__ slot. With nothing but hand-waving to support it, I'll > > claim that would enable all these games. [Guido] > Without context, I don't know when this would be called. 
If you > want this called on all DECREFs (regardless of the refcount value), > realize that this is a huge slowdown because it would mean the > DECREF macro has to inspect the type object, which means several > indirections. This would slow down *every* DECREF operation, not > just those on instances with a __decref__ slot, because the DECREF > macro doesn't know the type of the object! This was more 2.0-ish speculation, and really thinking of classic C++ ref counting where decref would be a function call, not a macro. Still a slowdown, of course, but not quite so massive. The upside is opening up all kinds of tricks at the type object and user class levels, (such as weak refs and copy on write etc). Worth it? I'd think so, but I'm not a speed demon. - Gordon From tim_one@email.msn.com Sat Aug 21 09:09:17 1999 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 04:09:17 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <000201beebac$776d32e0$0c2d2399@tim> [Skip Montanaro] > ... > 3. If Dan Connolly's contention is correct, importing the os module > today is not all that portable. I can't really say one way or the > other, because I'm lucky enough to be able to confine my serious > programming to Unix. I'm sure there's someone out there that > can try the following on a few platforms: > > import os > dir(os) > > and compare the output. There's no need to, Skip. Just read the os module docs; where a function says, e.g., "Availability: Unix.", it doesn't show up on a Windows or Mac box. In that sense using (some) os functions is certainly unportable. But I have no sympathy for the phrasing of Dan's complaint: if he calls os.getegid(), *he* knows perfectly well that's a Unix-specific function, and expressing outrage over it not working on NT is disingenuous. 
OTOH, I don't think you're going to find anything in the OS module documented as available only on Windows or only on Macs, and some semi-portable functions (notoriously chmod) are documented in ways that make sense only to Unixheads. This certainly gives a strong impression of Unix-centricity to non-Unix weenies, and has got to baffle true newbies completely. So 'twould be nice to have a basic os module all of whose functions "run everywhere", whose interfaces aren't copies of cryptic old Unixisms, and whose docs are platform neutral. If Guido is right that the os functions tend to get more portable over time, fine, that module can grow over time too. In the meantime, life would be easier for everyone except Python's implementers. From Vladimir.Marangozov@inrialpes.fr Sat Aug 21 16:34:32 1999 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 16:34:32 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908202122.WAA26956@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 20, 99 10:22:12 pm" Message-ID: <199908211534.QAA22392@pukapuka.inrialpes.fr> [me] > > Guido van Rossum wrote: > > > > > > I'm a little anxious about modifying the code, and was thinking myself > > of a way to specify a bitvector of addresses where to break. But that > > would still cause some overhead for code without breakpoints, so I > > guess you're right (and it's certainly a long-standing tradition in > > breakpoint-setting!) > > > > Hm. You're probably right, especially if someone wants to inspect > a code object from the debugger or something. But I believe that > we can manage to redirect the instruction pointer at the beginning > of eval_code2 to the *copy* of co_code, and modify the copy with > CALL_TRACE, preserving the original intact. > I wrote a very rough first implementation of this idea.
The files are at: http://sirac.inrialpes.fr/~marangoz/python/lineno/ Basically, what I did is: 1) what I said :-) 2) No more SET_LINENO 3) In tracing mode, a copy of the original code is put in an additional slot (co_tracecode) of the code object. Then it's overwritten with CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. The VM is routed to execute this code, and not the original one. 4) When tracing is off (i.e. sys_tracefunc is NULL) the VM falls back to normal execution of the original code. A couple of things that need finalization: a) how to deallocate the modified code string when tracing is off b) the value of CALL_TRACE (I almost randomly picked 76) c) I don't handle the cases where sys_tracefunc is enabled or disabled within the same code object. Tracing or not is determined before the main loop. d) update pdb, so that it does not allow setting breakpoints on lines with no code. To achieve this, I think that python versions of PyCode_Addr2Line & PyCode_Line2Addr have to be integrated into pdb as helper functions. e) correct bugs and design flaws f) something else?
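[Editor's note: the two line-table functions mentioned in d) can be modeled in Python. This sketch assumes co_lnotab is a byte string of (address-increment, line-increment) pairs relative to address 0 and co_firstlineno; it ignores the 255-increment overflow entries the real table can contain, and the lnotab value below is a hypothetical one matching the folded f() shown later (code only at lines 5 and 8).]

```python
# Python models of PyCode_Addr2Line and the proposed PyCode_Line2Addr.

def addr2line(lnotab, firstlineno, addr):
    """Source line that generated the instruction at 'addr'."""
    line, cur = firstlineno, 0
    for i in range(0, len(lnotab), 2):
        cur += lnotab[i]
        if cur > addr:
            break
        line += lnotab[i + 1]
    return line

def line2addr(lnotab, firstlineno, lineq):
    """Address of the first instruction at (or after) source line 'lineq'."""
    line, addr = firstlineno, 0
    if lineq <= line:
        return 0
    for i in range(0, len(lnotab), 2):
        addr += lnotab[i]
        line += lnotab[i + 1]
        if line >= lineq:
            return addr
    return addr

# Hypothetical lnotab for the folded f(): line 5 at addr 0, line 8 at addr 9.
lnotab, firstlineno = bytes([0, 4, 9, 3]), 1
```

Note that line2addr rounds forward: asking for a line that generated no code (a doc string, say) yields the address of the next line that did, which is exactly what pdb needs to refuse or relocate such breakpoints.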
And here's the sample session of my lousy function f with this 'proof of concept' code: >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 LOAD_CONST 1 (1) 3 STORE_FAST 0 (a) 6 LOAD_CONST 2 (None) 9 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 10 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer@appliedbiometrics.com Sat Aug 21 18:10:50 1999 From: tismer@appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 19:10:50 +0200 Subject: [Python-Dev] about line numbers References: <199908211534.QAA22392@pukapuka.inrialpes.fr> Message-ID: <37BEDD9A.DBA817B1@appliedbiometrics.com> Vladimir Marangozov wrote: ... > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > Basically, what I did is: > > 1) what I said :-) > 2) No more SET_LINENO > 3) In tracing mode, a copy of the original code is put in an additional > slot (co_tracecode) of the code object. Then it's overwritten with > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. I'd rather keep the original code object as it is, create a copy with inserted breakpoints and put that into the frame slot. Pointing back to the original from there. Then I'd redirect the code from the CALL_TRACE opcode completely to a user-defined function. Getting rid of the extra code object would be done by this function when tracing is off. It also vanishes automatically when the frame is released. 
> a) how to deallocate the modified code string when tracing is off

By making the copy a frame property which is temporary, I think. Or, if tracing should work for all frames, by pushing the original in the back of the modified. Both work.

ciao - chris

--
Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home

From Vladimir.Marangozov@inrialpes.fr Sat Aug 21 22:40:05 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Sat, 21 Aug 1999 22:40:05 +0100 (NFT)
Subject: [Python-Dev] about line numbers
In-Reply-To: <37BEDD9A.DBA817B1@appliedbiometrics.com> from "Christian Tismer" at "Aug 21, 99 07:10:50 pm"
Message-ID: <199908212140.WAA51054@pukapuka.inrialpes.fr>

Chris, could you please repeat that step by step in more detail? I'm not sure I understand your suggestions.

Christian Tismer wrote:
>
> Vladimir Marangozov wrote:
> ...
> > I wrote a very rough first implementation of this idea. The files are at:
> >
> > http://sirac.inrialpes.fr/~marangoz/python/lineno/
> >
> > Basically, what I did is:
> >
> > 1) what I said :-)
> > 2) No more SET_LINENO
> > 3) In tracing mode, a copy of the original code is put in an additional
> >    slot (co_tracecode) of the code object. Then it's overwritten with
> >    CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr.
>
> I'd rather keep the original code object as it is, create a copy
> with inserted breakpoints and put that into the frame slot.

You seem to be suggesting duplicating the entire code object, right? And referencing the modified duplicate from the current frame? I actually duplicate only the opcode string (that is, the co_code string object) and I don't see the point of duplicating the entire code object.
Keeping a reference from the current frame makes sense, but won't it deallocate the modified version on every frame release (then redo all the code duplication work for every frame)?

> Pointing back to the original from there.

I don't understand this. What points back where?

> Then I'd redirect the code from the CALL_TRACE opcode completely
> to a user-defined function.

What user-defined function? I don't understand that either... Except the sys_tracefunc, what other (user-defined) function do we have here? Is it a Python or a C function?

> Getting rid of the extra code object would be done by this function
> when tracing is off.

How exactly? This seems to be obvious for you, but obviously, not for me ;-)

> It also vanishes automatically when the frame is released.

The function or the extra code object?

> > a) how to deallocate the modified code string when tracing is off
>
> By making the copy a frame property which is temporary, I think.

I understood that the frame lifetime could be exploited "somehow"...

> Or, if tracing should work for all frames, by pushing the original
> in the back of the modified. Both work.

Tracing is done for all frames if sys_tracefunc is not NULL; that function usually ends up in the f_trace slot.

> ciao - chris

I'm confused. I didn't understand your idea.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From tismer@appliedbiometrics.com Sat Aug 21 22:23:10 1999
From: tismer@appliedbiometrics.com (Christian Tismer)
Date: Sat, 21 Aug 1999 23:23:10 +0200
Subject: [Python-Dev] about line numbers
References: <199908212140.WAA51054@pukapuka.inrialpes.fr>
Message-ID: <37BF18BE.B3D58836@appliedbiometrics.com>

Vladimir Marangozov wrote:
>
> Chris, could you please repeat that step by step in more detail?
> I'm not sure I understand your suggestions.

I think I was too quick. I thought of copying the whole code object, of course. ...
> > I'd rather keep the original code object as it is, create a copy
> > with inserted breakpoints and put that into the frame slot.
>
> You seem to be suggesting duplicating the entire code object, right?
> And referencing the modified duplicate from the current frame?

Yes.

> I actually duplicate only the opcode string (that is, the co_code string
> object) and I don't see the point of duplicating the entire code object.
>
> Keeping a reference from the current frame makes sense, but won't it
> deallocate the modified version on every frame release (then redo all the
> code duplication work for every frame)?

You get two options by that.

1) Permanently modify one code object to be traceable, pushing a copy of the original "behind" it by means of some co_back pointer. This keeps the patched one where the original was, and makes a global debugging version.

2) Create a copy for one frame, and put the original into a co_back pointer. This gives debugging just for this one frame.

...

> > Then I'd redirect the code from the CALL_TRACE opcode completely
> > to a user-defined function.
>
> What user-defined function? I don't understand that either...
> Except the sys_tracefunc, what other (user-defined) function do we have here?
> Is it a Python or a C function?

I would suggest a Python function, of course.

> > Getting rid of the extra code object would be done by this function
> > when tracing is off.
>
> How exactly? This seems to be obvious for you, but obviously, not for me ;-)

If the permanent tracing "1)" is used, just restore the code object's contents from the original in co_back, and drop co_back. In the "2)" version, just pull the co_back into the frame's code pointer and lose the reference to the copy. This occurs automatically on frame release.

> > It also vanishes automatically when the frame is released.
>
> The function or the extra code object?

The extra code object.

...

> I'm confused. I didn't understand your idea.
Forget it, it isn't more than another brain fart :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tim_one@email.msn.com Sun Aug 22 02:25:22 1999 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 21:25:22 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908131347.OAA30740@pukapuka.inrialpes.fr> Message-ID: <000001beec3d$348f0160$cb2d2399@tim> [going back a week here, to dict resizing ...] [Vladimir Marangozov] > ... > All in all, for performance reasons, dicts remain an exception > to the rule of releasing memory ASAP. Yes, except I don't think there is such a rule! The actual rule is a balancing act between the cost of keeping memory around "just in case", and the expense of getting rid of it. Resizing a dict is extraordinarily expensive because the entire table needs to be rearranged, but lists make this tradeoff too (when you del a list element or list slice, it still goes thru NRESIZE, which still keeps space for as many as 100 "extra" elements around). The various internal caches for int and frame objects (etc) also play this sort of game; e.g., if I happen to have a million ints sitting around at some time, Python effectively assumes I'll never want to reuse that int storage for anything other than ints again. 
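Tim's point about lists keeping spare slots around is still observable in today's CPython (a sketch; the exact growth pattern is an implementation detail, not a documented guarantee):

```python
import sys

lst, sizes = [], []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Over-allocation shows up as plateaus: getsizeof stays flat across
# several appends, then jumps when the list grows its spare capacity.
plateaus = sum(1 for a, b in zip(sizes, sizes[1:]) if a == b)
```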
python-rarely-releases-memory-asap-ly y'rs - tim

From Vladimir.Marangozov@inrialpes.fr Sun Aug 22 20:41:59 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Sun, 22 Aug 1999 20:41:59 +0100 (NFT)
Subject: [Python-Dev] Memory (was: about line numbers, which was shrinking dicts)
In-Reply-To: <000001beec3d$348f0160$cb2d2399@tim> from "Tim Peters" at "Aug 21, 99 09:25:22 pm"
Message-ID: <199908221941.UAA54480@pukapuka.inrialpes.fr>

Tim Peters wrote:
>
> [going back a week here, to dict resizing ...]

Yes, and the subject line does not correspond to the contents because at the moment I sent this message, I ran out of disk space and the mailer picked a random header after destroying half of the messages in this mailbox.

> [Vladimir Marangozov]
> > ...
> > All in all, for performance reasons, dicts remain an exception
> > to the rule of releasing memory ASAP.
>
> Yes, except I don't think there is such a rule! The actual rule is a
> balancing act between the cost of keeping memory around "just in case", and
> the expense of getting rid of it.

Good point.

> Resizing a dict is extraordinarily expensive because the entire table needs
> to be rearranged, but lists make this tradeoff too (when you del a list
> element or list slice, it still goes thru NRESIZE, which still keeps space
> for as many as 100 "extra" elements around).
>
> The various internal caches for int and frame objects (etc) also play this
> sort of game; e.g., if I happen to have a million ints sitting around at
> some time, Python effectively assumes I'll never want to reuse that int
> storage for anything other than ints again.
>
> python-rarely-releases-memory-asap-ly y'rs - tim

Yes, and I'm somewhat sensitive to this issue after spending 6 years in a team which deals a lot with memory management (mainly DSM). In other words, you say that Python tolerates *virtual* memory fragmentation (a funny term :-).
In the case of dicts and strings, we tolerate "internal fragmentation" (a contiguous chunk is allocated, then partially used). In the case of ints, floats or frames, we tolerate "external fragmentation". And as you said, Python tolerates this because of the speed/space tradeoff.

Hopefully, all we deal with at this level is virtual memory, so even if you have zillions of ints, it's the OS VMM that will help you more with its long-term scheduling than Python's wild guesses about a hypothetical usage of zillions of ints later.

I think that some OS concepts can give us hints on how to reduce our virtual fragmentation (which, as we all know, is not a very good thing). A few keywords: compaction, segmentation, paging, sharing. We can't do much about our internal fragmentation, except changing the algorithms of dicts & strings (which is not appealing anyway). But it would be nice to think about the external fragmentation of Python's caches. Or even try to reduce the internal fragmentation in combination with the internal caches...

BTW, this is the whole point of PyMalloc: in a virtual memory world, try to reduce the distance between the user view and the OS view on memory. PyMalloc addresses the fragmentation problem at a lower level of granularity than an OS (that is, *within* a page), because most Python objects are very small. However, it can't efficiently handle large chunks like the int/float caches. Basically what it does is: segmentation of the virtual space and sharing of the cached free space. I think that Python could improve on sharing its internal caches, without significant slowdowns...

The bottom line is that there's still plenty of room for exploring alternate memory management strategies that better fit Python's memory needs as a whole.
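The int cache under discussion survives in modern CPython as the small-integer cache: the integers -5 through 256 are preallocated once and shared. A tiny illustration (this is a CPython implementation detail, not a language guarantee):

```python
# int() on equal text hands back the very same preallocated object
# for values in the cached range -5..256.
a = int('256')
b = int('256')
small_cached = a is b   # True in CPython: both names refer to the cached 256
```

Larger values are allocated per call, which is exactly the "external fragmentation" being traded away for speed here.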
--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jack@oratrix.nl Sun Aug 22 22:25:56 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Sun, 22 Aug 1999 23:25:56 +0200
Subject: [Python-Dev] Converting C objects to Python objects and back
Message-ID: <19990822212601.2D4BE18BA0D@oratrix.oratrix.nl>

Here's another silly idea, not having to do with optimization. On the Mac, and as far as I know on Windows as well, there are quite a few OS API structures that have a Python object representation that is little more than the PyObject boilerplate plus a pointer to the C API object. (And, of course, lots of methods to operate on the object.) To convert these from Python to C I always use boilerplate code like

    WindowPtr *win;
    PyArg_ParseTuple(args, "O&", PyWin_Convert, &win);

where PyWin_Convert is the function that takes a PyObject * and a void **, does the typecheck and sets the pointer. A similar way is used to convert C pointers back to Python objects in Py_BuildValue.

What I was thinking is that it would be nice (if you are _very_ careful) if this functionality was available in struct. So, if I would somehow obtain (in a Python string) a C structure that contained, say, a WindowPtr and two ints, I would be able to say

    win, x, y = struct.unpack("Ohh", Win.WindowType)

and struct would be able, through the WindowType type object, to get at the PyWin_Convert and PyWin_New functions.

A nice side issue is that you can add an option to PyArg_ParseTuple so you can say

    PyArg_ParseTuple(args, "O+", Win_WinObject, &win)

and you don't have to remember the different names the various types use for their conversion routines.

Is this worth pursuing or is it just too dangerous? And, if it is worth pursuing, I have to stash away the two function pointers somewhere in the TypeObject; should I grab one of the tp_xxx fields for this or is there a better place?
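Jack's struct idea can be mocked up in pure Python: read each 'O' field as a pointer-sized integer (struct code 'P') and feed it through a converter, the role PyWin_New plays in C. Purely an illustrative sketch of the proposal: unpack_with_converters is invented here, and struct itself supports nothing like it.

```python
import struct

def unpack_with_converters(fmt, data, converters):
    # Treat each 'O' as a native pointer-sized integer ('P'), then pass
    # it to the next converter in line. Sketch only: assumes one value
    # per format character, no repeat counts.
    values = struct.unpack(fmt.replace('O', 'P'), data)
    conv = iter(converters)
    return tuple(next(conv)(v) if ch == 'O' else v
                 for ch, v in zip(fmt, values))

# A fake "window pointer" followed by two shorts, as in Jack's example;
# hex() stands in for a real PyWin_New-style wrapper.
data = struct.pack('Phh', 0xDEAD, 3, 4)
win, x, y = unpack_with_converters('Ohh', data, [hex])
```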
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From Fred L. Drake, Jr."
References: <14266.51743.904066.470431@dolphin.mojam.com> <000201beebac$776d32e0$0c2d2399@tim>
Message-ID: <14273.24719.865520.797568@weyr.cnri.reston.va.us>

Tim Peters writes:
> OTOH, I don't think you're going to find anything in the OS module
> documented as available only on Windows or only on Macs, and some

Tim,

Actually, the spawn*() functions are included in os and are documented as Windows-only, along with the related P_* constants. These are provided by the nt module.

> everywhere", whose interfaces aren't copies of cryptic old Unixisms, and
> whose docs are platform neutral.

I'm always glad to see documentation patches, or even pointers to specific problems. Being a Unix-weenie myself, making the documentation more readable to Windows-weenies can be difficult at times. But given useful pointers, I can usually pull it off, or at least drive someone who can to do so. ;-)

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From tim_one@email.msn.com Tue Aug 24 07:32:49 1999
From: tim_one@email.msn.com (Tim Peters)
Date: Tue, 24 Aug 1999 02:32:49 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: <14273.24719.865520.797568@weyr.cnri.reston.va.us>
Message-ID: <000701beedfa$7c5c8e40$902d2399@tim>

[Fred L. Drake, Jr.]
> Actually, the spawn*() functions are included in os and are
> documented as Windows-only, along with the related P_* constants.
> These are provided by the nt module.

I stand corrected, Fred -- so how do the Unix dweebs like this Windows crap cluttering "their" docs?

[Tim, pitching a portable sane interface to a portable sane subset of os functionality]
> I'm always glad to see documentation patches, or even pointers to
> specific problems.
> Being a Unix-weenie myself, making the documentation more readable
> to Windows-weenies can be difficult at times. But given useful
> pointers, I can usually pull it off, or at least drive someone who
> can to do so. ;-)

No, it's deeper than that. Some of the inherited Unix interfaces are flatly incomprehensible to anyone other than a Unix-head, but the functionality is supplied only in that form (docs may ease the pain, but the interfaces still suck); for example,

    mkdir(path[, mode])
        Create a directory named path with numeric mode mode. The default
        mode is 0777 (octal). On some systems, mode is ignored. Where it
        is used, the current umask value is first masked out.
        Availability: Macintosh, Unix, Windows.

If you have a sister or parent or 3-year-old child (they're all equivalent for this purpose), just imagine them reading that. If you can't, I'll have my sister call you. Raw numeric permission modes, octal mode notation, and the "umask" business are Unix-specific -- and even Unices supply symbolic ways to specify permissions.

chmod is likely the one I hear the most gripes about. Windows heads are looking to change "file attributes", the name "chmod" is gibberish to them, most of the Unix mode bits make no sense under Windows (& contra Guido's optimism, never will) even if you know the secret octal code, and Windows has several attributes (hidden bit, system bit, archive bit) chmod can't get at. The only portable functionality here is the write bit, but no non-Unix person could possibly guess either that chmod is the function they need, or what to type after someone tells them it's chmod.

So this is less a doc issue than a sign that more of os needs to become more like os.path (i.e., intelligently named functions with intelligently abstracted interfaces).
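The mode/umask interplay in that mkdir excerpt is concrete enough to demonstrate; a POSIX-flavored sketch, using the symbolic spellings the stat module supplies (assumes a Unix-like platform where mode is honored):

```python
import os
import stat
import tempfile

parent = tempfile.mkdtemp()
path = os.path.join(parent, 'sub')

old = os.umask(0o027)   # deny group-write and everything for "other"
os.mkdir(path, 0o777)   # ask for everything...
os.umask(old)           # restore the process umask

# ...and the current umask value is "first masked out": 0o777 & ~0o027
mode = stat.S_IMODE(os.stat(path).st_mode) & 0o777
```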
never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - tim From skip@mojam.com (Skip Montanaro) Tue Aug 24 18:21:53 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 24 Aug 1999 12:21:53 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000701beedfa$7c5c8e40$902d2399@tim> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> Message-ID: <14274.53860.210265.71990@dolphin.mojam.com> Tim> chmod is likely the one I hear the most gripes about. Windows Tim> heads are looking to change "file attributes", the name "chmod" is Tim> gibberish to them Well, we could confuse everyone and rename "chmod" to "chfat" (is that like file system liposuction?). Windows probably has an equivalent function whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Tim> most of the Unix mode bits make no sense under Windows (& contra Tim> Guido's optimism, never will) even if you know the secret octal Tim> code ... It beats a secret handshake. Imagine all the extra peripherals we'd have to make available for everyone's computer. ;-) Tim> So this is less a doc issue than that more of os needs to become Tim> more like os.path (i.e., intelligently named functions with Tim> intelligently abstracted interfaces). Hasn't Guido's position been that the interface modules like os, posix, etc are just a thin layer over the underlying API (Guido: note how I cleverly attributed this position to you but also placed the responsibility for correctness on your head!)? If that's the case, perhaps we should provide a slightly higher level module that abstracts the file system as objects, and adopts a more user-friendly approach to the secret octal codes. Those of us worried about job security could continue to use the lower level module and leave the higher level interface for former Visual Basic programmers. 
Tim> never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs -

maybe-the-"e"-key-stuck-on-his-TTY-ly y'rs...

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098 | Python: Programming the way Guido indented...

From Fred L. Drake, Jr."
References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> <14274.53860.210265.71990@dolphin.mojam.com>
Message-ID: <14274.58040.138331.413958@weyr.cnri.reston.va.us>

Skip Montanaro writes:
> whose name is 17 characters long which we'd all love to type, I'm sure. ;-)

Just 17? ;-)

> Tim> So this is less a doc issue than that more of os needs to become
> Tim> more like os.path (i.e., intelligently named functions with
> Tim> intelligently abstracted interfaces).

Sounds like some doc improvements could really help, at least in the short term.

> correctness on your head!)? If that's the case, perhaps we should provide a
> slightly higher level module that abstracts the file system as objects, and
> adopts a more user-friendly approach to the secret octal codes. Those of us

I'm all for an object interface to a logical filesystem; I had to write just such a thing in Java not long ago, and we have a similar construct in Python (not by me, though) that we use in our Knowbot work.

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From tim_one@email.msn.com Wed Aug 25 08:02:21 1999
From: tim_one@email.msn.com (Tim Peters)
Date: Wed, 25 Aug 1999 03:02:21 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com>
Message-ID: <000801beeec7$c6f06b20$fc2d153f@tim>

[Skip Montanaro]
> Well, we could confuse everyone and rename "chmod" to "chfat" ...

I don't want to rename anything, nor do I want to use MS-specific names. chmod is both the wrong spelling & the wrong functionality for all non-Unix systems.
os.path did a Good Thing by, e.g., introducing getmtime(), despite that everyone knows it's just os.stat()[8]. New isreadonly(path) and setreadonly(path) are more what I'm after; nothing beyond that is portable, & never will be. > Windows probably has an equivalent function whose name is 17 > characters long Indeed, SetFileAttributes is exactly 17 characters long (you moonlighting on NT, Skip?!). But while Windows geeks would like to use that, it's both the wrong spelling & the wrong functionality for all non-Windows systems. > ... > Hasn't Guido's position been that the interface modules like os, > posix, etc are just a thin layer over the underlying API (Guido: > note how I cleverly attributed this position to you but also placed > the responsibility for correctness on your head!)? If that's the > case, perhaps we should provide a slightly higher level module that > abstracts the file system as objects, and adopts a more user-friendly > approach to the secret octal codes. Like that, yes. > Those of us worried about job security could continue to use the > lower level module and leave the higher level interface for former > Visual Basic programmers. You're just *begging* Guido to make the Python2 os module take all of its names from the Win32 API . it's-no-lamer-to-be-ignorant-of-unix-names-than-it-is- to-be-ignorant-of-chinese-ly y'rs - tim From tim_one@email.msn.com Wed Aug 25 08:05:31 1999 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:05:31 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Message-ID: <000901beeec8$380d05c0$fc2d153f@tim> [Fred L. Drake, Jr.] > ... > I'm all for an object interface to a logical filesystem; having > had to write just such a thing in Java not long ago, and we have > a similar construct in Python (not by me, though), that we use in > our Knowbot work. 
Well, don't read anything unintended into this, but Guido *is* out of town, and you *do* have the power to check in code outside the doc subtree ... barry-will-help-he's-been-itching-to-revolt-too-ly y'rs - tim From bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Wed Aug 25 12:20:16 1999 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) (Barry A. Warsaw) Date: Wed, 25 Aug 1999 07:20:16 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14275.53616.585669.890621@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Well, don't read anything unintended into this, but Guido *is* TP> out of town, and you *do* have the power to check in code TP> outside the doc subtree ... TP> barry-will-help-he's-been-itching-to-revolt-too-ly y'rs I'll bring the pitchforks if you bring the torches! -Barry From skip@mojam.com (Skip Montanaro) Wed Aug 25 16:17:35 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 25 Aug 1999 10:17:35 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.2229.983969.228891@dolphin.mojam.com> > I'm all for an object interface to a logical filesystem; having had to > write just such a thing in Java not long ago, and we have a similar > construct in Python (not by me, though), that we use in our Knowbot > work. Fred, Since this is the dev group, how about showing us the Knowbot's logical filesystem API, and let's do some dev-ing... Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From Fred L. Drake, Jr." 
References: <14274.53860.210265.71990@dolphin.mojam.com> <000801beeec7$c6f06b20$fc2d153f@tim>
Message-ID: <14276.6236.605103.369339@weyr.cnri.reston.va.us>

Tim Peters writes:
> os.path did a Good Thing by, e.g., introducing getmtime(), despite that
> everyone knows it's just os.stat()[8]. New isreadonly(path) and
> setreadonly(path) are more what I'm after; nothing beyond that is portable,

Tim,

I think we can simply declare that isreadonly() checks that the file doesn't allow the user to read it, but setreadonly() sounds to me like it wouldn't be portable to Unix. There's more than one (reasonable) way to make a file unreadable to a user just by manipulating permission bits, and which is best will vary according to both the user and the file's existing permissions.

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From Fred L. Drake, Jr."
References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim>
Message-ID: <14276.6449.428851.402955@weyr.cnri.reston.va.us>

Tim Peters writes:
> Well, don't read anything unintended into this, but Guido *is* out
> of town, and you *do* have the power to check in code outside the
> doc subtree ...

Good thing I turned off the python-checkins list when I added the curly bracket patch I've been working on!

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From Fred L. Drake, Jr."
References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> <14276.2229.983969.228891@dolphin.mojam.com>
Message-ID: <14276.14854.366220.664463@weyr.cnri.reston.va.us>

Skip Montanaro writes:
> Since this is the dev group, how about showing us the Knowbot's logical
> filesystem API, and let's do some dev-ing...
Well, I took a look at it, and I must confess it's just not really different from the set of interfaces in the os module; the important point is that they are methods instead of functions (other than a few data items: sep, pardir, curdir). The path attribute provided the same interface as os.path. Its only user-visible state is the current-directory setting, which may or may not be that useful.

We left off chmod(), which would make Tim happy, but that was only because it wasn't meaningful in context. We'd have to add it (or something equivalent) for a general purpose filesystem object. So Tim's only happy if he can come up with a general interface that is actually portable (consider my earlier comments on setreadonly()). On the other hand, you don't need chmod() or anything like it for most situations where a filesystem object would be useful. An FTPFilesystem class would not be hard to write!

-Fred

--
Fred L. Drake, Jr. Corporation for National Research Initiatives

From jack@oratrix.nl Wed Aug 25 22:43:16 1999
From: jack@oratrix.nl (Jack Jansen)
Date: Wed, 25 Aug 1999 23:43:16 +0200
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: Message by "Fred L. Drake, Jr." , Wed, 25 Aug 1999 12:22:52 -0400 (EDT) , <14276.6236.605103.369339@weyr.cnri.reston.va.us>
Message-ID: <19990825214321.D50AD18BA0F@oratrix.oratrix.nl>

But in Python, with its nice high-level data structures, couldn't we design the Mother Of All File Attribute Calls, which would optionally map functionality from one platform to another? As an example consider the Mac resource fork size. If on Unix I did

    fattrs = os.getfileattributes(filename)
    rfsize = fattrs.get('resourceforksize')

it would raise an exception. If, however, I did

    rfsize = fattrs.get('resourceforksize', compat=1)

I would get a "close approximation", 0.
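Jack's hypothetical os.getfileattributes() can be mocked up in a few lines. Everything below -- the class, the attribute names, the compat flag -- is invented here purely to illustrate the proposal, not a real API:

```python
import os
import tempfile

class FileAttributes:
    # "Close approximations" handed out when compat=1 asks for an
    # attribute this platform does not have.
    _COMPAT_DEFAULTS = {'resourceforksize': 0}

    def __init__(self, filename):
        st = os.stat(filename)
        self._attrs = {'size': st.st_size, 'mtime': st.st_mtime}

    def get(self, name, compat=0):
        if name in self._attrs:
            return self._attrs[name]
        if compat:
            return self._COMPAT_DEFAULTS.get(name, 0)
        raise KeyError(name)   # no compat requested: fail loudly

# Off the Mac there is no resource fork, so only the compat form answers.
with tempfile.NamedTemporaryFile() as tf:
    fattrs = FileAttributes(tf.name)
    rfsize = fattrs.get('resourceforksize', compat=1)   # -> 0
```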
Note that you want some sort of a compat parameter, not a default value, as for some attributes (the various atime/mtime/ctimes, permission bits, etc) you'd get a default based on other file attributes that do exist on the current platform.

Hmm, the file-attribute-object idea has the added advantage that you can then use setfileattributes(filename, fattrs) to be sure that you've copied all relevant attributes, independent of the platform you're on.

Mapping permissions takes a bit more (design) work, with Unix having user/group/other only and Windows having full-fledged ACLs (or nothing at all, depending how you look at it :-), but should also be doable.

--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From Vladimir.Marangozov@inrialpes.fr Thu Aug 26 07:10:01 1999
From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov)
Date: Thu, 26 Aug 1999 07:10:01 +0100 (NFT)
Subject: [Python-Dev] about line numbers
In-Reply-To: <199908211534.QAA22392@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 21, 99 04:34:32 pm"
Message-ID: <199908260610.HAA20304@pukapuka.inrialpes.fr>

[me, dropping SET_LINENO]
> > I wrote a very rough first implementation of this idea. The files are at:
> > http://sirac.inrialpes.fr/~marangoz/python/lineno/
> > ...
> > A couple of things that need finalization:
> > ...

An updated version is available at the same location. I think that this one does The Right Thing (tm).

a) Everything is internal to the VM and totally hidden, as it should be.
b) No modifications of the code and frame objects (no additional slots).
c) The modified code string (used for tracing) is allocated dynamically
   when the 1st frame pointing to its original switches into trace mode,
   and is deallocated automatically when the last frame pointing to its
   original dies.
I feel better with this code so I can stop thinking about it and move on :-) (leaving it to your appreciation). What's next? File attributes? ;-)

It's not easy to weigh what kind of common interface would be easy to grasp, intuitive and unambiguous for the average user. I think that the people on this list (being core developers) are more or less biased here (I'd say more than less). Perhaps some input from the community (c.l.py) would help?

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From tim_one@email.msn.com Thu Aug 26 06:06:57 1999
From: tim_one@email.msn.com (Tim Peters)
Date: Thu, 26 Aug 1999 01:06:57 -0400
Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart
In-Reply-To: <14276.14854.366220.664463@weyr.cnri.reston.va.us>
Message-ID: <000301beef80$d26158c0$522d153f@tim>

[Fred L. Drake, Jr.]
> ...
> We left off chmod(), which would make Tim happy, but that was only
> because it wasn't meaningful in context.

I'd be appalled to see chmod go away; for many people it's comfortable and useful. I want *another* way, to do what little bit is portable in a way that doesn't require first mastering a badly designed interface from a dying OS.

> We'd have to add it (or something equivalent) for a general purpose
> filesystem object. So Tim's only happy if he can come up with a
> general interface that is actually portable (consider my earlier
> comments on setreadonly()).

I don't care about general here; making up a general new way to spell everything that everyone may want to do under every OS would create an interface even worse than chmod's. My sister doesn't want to create files that are read-only to the world but writable to her group -- she just wants to mark certain precious files as read-only to minimize the chance of accidental destruction. What she wants is easy to do under Windows or Unix, and I expect she's the norm rather than the exception.
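The little portable core Tim keeps circling -- mark a file read-only, test whether it is -- fits in a few lines of POSIX-flavored Python. isreadonly/setreadonly are his proposed names, not real os functions, and "read-only" here means no write bit for anyone, the least-common-denominator reading from the follow-up message:

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def isreadonly(path):
    # Read-only in the portable sense: nobody holds a write bit.
    return not (os.stat(path).st_mode & WRITE_BITS)

def setreadonly(path):
    os.chmod(path, os.stat(path).st_mode & ~WRITE_BITS)

fd, path = tempfile.mkstemp()   # fresh temp files are owner-writable (0600)
os.close(fd)
writable_before = not isreadonly(path)
setreadonly(path)
readonly_after = isreadonly(path)
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)   # make removable again
os.remove(path)
```

A Windows implementation of the same two names would flip the read-only file attribute instead; the point of the pair is that callers never see the octal codes.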
> On the other hand, you don't need chmod() or anything like it for > most situations where a filesystem object would be useful. An > FTPFilesystem class would not be hard to write! An OO filesystem object with a .makereadonly method suits me fine . From tim_one@email.msn.com Thu Aug 26 06:06:54 1999 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:54 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <000201beef80$d072f640$522d153f@tim> [Fred L. Drake, Jr.] > I think we can simply declare that isreadonly() checks that the > file doesn't allow the user to read it, Had more in mind that the file doesn't allow the user to write it . > but setreadonly() sounds to me like it wouldn't be portable to Unix. > There's more than one (reasonable) way to make a file unreadable to > a user just by manipulating permission bits, and which is best will > vary according to both the user and the file's existing permissions. "Portable" implies least common denominator, and the plain meaning of read-only is that nobody (whether owner, group or world in Unix) has write permission. People wanting something beyond that are going beyond what's portable, and that's fine -- I'm not suggesting getting rid of chmod for Unix dweebs. But by the same token, Windows dweebs should get some other (as non-portable as chmod) way to fiddle the bits important on *their* OS (only one of which chmod can affect). Billions of newbies will delightedly stick to the portable interface with the name that makes sense. the-percentage-of-programmers-doing-systems-programming-shrinks-by- the-millisecond-ly y'rs - tim From mal@lemburg.com Sat Aug 28 15:37:50 1999 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Sat, 28 Aug 1999 16:37:50 +0200 Subject: [Python-Dev] Iterating over dictionaries and objects in general References: <990826114149.ZM59302@rayburn.hcs.tl> <199908261702.NAA01866@eric.cnri.reston.va.us> <37C57E01.2ADC02AE@digicool.com> <990826150216.ZM60002@rayburn.hcs.tl> <37C5BAF1.4D6C1031@lemburg.com> <37C5C320.CF11BC7C@digicool.com> <37C643B0.7ECA586@lemburg.com> <37C69FB3.9CB279C7@digicool.com> Message-ID: <37C7F43E.67EEAB98@lemburg.com> [Followup to a discussion on psa-members about iterating over dictionaries without creating intermediate lists] Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > > > Jim Fulton wrote: > > > > > > > The problem with the PyDict_Next() approach is that it will only > > > > work reliably from within a single C call. You can't return > > > > to Python between calls to PyDict_Next(), because those could > > > > modify the dictionary causing the next PyDict_Next() call to > > > > fail or core dump. > > > > > > I do this all the time without problem. Basically, you provide an > > > index and if the index is out of range, you simply get an end-of-data return. > > > The only downside of this approach is that you might get "incorrect" > > > results if the dictionary is modified between calls. This isn't > > > all that different from iterating over a list with an index. > > > > Hmm, that's true... but what if the dictionary gets resized > > in between iterations ? The item layout is then likely to > > change, so you could potentially get completely bogus results. > > I think I said that. :) Just wanted to verify my understanding ;-) > > Even iterating over items twice may then occur, I guess. > > Yup. > > Again, this is not so different from iterating over > a list using a range: > > l=range(10) > for i in range(len(l)): > l.insert(0,'Bruce') > print l[i] > > This always outputs 'Bruce'. :) Ok, so the "risk" is under user control. Fine with me...
> > Or perhaps via a special dictionary iterator, so that the following > > works: > > > > for item in dictrange(d): > > ... > > Yup. > > > The iterator could then also take some extra actions to insure > > that the dictionary hasn't been resized. > > I don't think it should do that. It should simply > stop when it has run out of items. I think I'll give such an iterator a spin. Would be a nice extension to mxTools. BTW, a generic type slot for iterating over types would probably be a nice feature too. The type slot could provide hooks of the form it_first, it_last, it_next, it_prev which all work integer index based, e.g. in pseudo code:

int i;
PyObject *item;

/* set up i and item to point to the first item */
if (obj.it_first(&i,&item) < 0)
    goto onError;

while (1) {
    PyObject_Print(item);
    /* move i and item to the next item; an IndexError is raised
       in case there are no more items */
    if (obj.it_next(&i,&item) < 0) {
        PyErr_Clear();
        break;
    }
}

These slots would cover all problem instances where iteration over non-sequences or non-uniform sequences (i.e. sequences like objects which don't provide convex index sets, e.g. 1,2,3,6,7,8,11,12) is required, e.g. dictionaries, multi-segment buffers. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 127 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gward@cnri.reston.va.us Mon Aug 30 20:02:22 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:02:22 -0400 Subject: [Python-Dev] Portable "spawn" module for core? Message-ID: <19990830150222.B428@cnri.reston.va.us> Hi all -- it recently occurred to me that the 'spawn' module I wrote for the Distutils (and which Perry Stoll extended to handle NT), could fit nicely in the core library. On Unix, it's just a front-end to fork-and-exec; on NT, it's a front-end to spawnv().
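[An editorial aside on the dictrange() idea above: a pure-Python model of its semantics, using the index-based __getitem__ protocol of the day — a for loop calls __getitem__ with 0, 1, 2, ... until IndexError. A real implementation would sit in C on top of PyDict_Next(); pure Python has to go through the keys, so the snapshot below is only a modelling shortcut, not the no-intermediate-list behavior the thread is after:]

```python
class dictrange:
    """Index-based iteration over a dictionary's items, modelling the
    proposed dictrange().  A real version would walk the dict's table
    directly via PyDict_Next() in C; this sketch only models the
    interface and the "just stop at end-of-data" semantics."""

    def __init__(self, d):
        self._d = d
        self._keys = list(d)    # modelling shortcut, see note above

    def __getitem__(self, i):
        # The pre-iterator protocol: raising IndexError ends the loop.
        if i >= len(self._keys):
            raise IndexError(i)
        k = self._keys[i]
        # If the dict was modified meanwhile, this may raise KeyError --
        # the "risk is under user control", as in the thread.
        return (k, self._d[k])
```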
In either case, it's just enough code (and just tricky enough code) that not everybody should have to duplicate it for their own uses. The basic idea is this: from spawn import spawn ... spawn (['cmd', 'arg1', 'arg2']) # or spawn (['cmd'] + args) you get the idea: it takes a *list* representing the command to spawn: no strings to parse, no shells to get in the way, no sneaky meta-characters ruining your day, draining your efficiency, or compromising your security. (Conversely, no pipelines, redirection, etc.) The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' depending on os.name. Additionally, it takes a couple of optional keyword arguments (all booleans): 'search_path', 'verbose', and 'dry_run', which do pretty much what you'd expect. The module as it's currently in the Distutils code is attached. Let me know what you think... Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip@mojam.com (Skip Montanaro) Mon Aug 30 20:11:50 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 30 Aug 1999 14:11:50 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <14282.54880.922571.792484@dolphin.mojam.com> Greg> it recently occured to me that the 'spawn' module I wrote for the Greg> Distutils (and which Perry Stoll extended to handle NT), could fit Greg> nicely in the core library. How's spawn.spawn semantically different from the Windows-dependent os.spawn? How are stdout/stdin/stderr connected to the child process - just like fork+exec or something slightly higher level like os.popen? 
If it's semantically like os.spawn and a little bit higher level abstraction than fork+exec, I'd vote for having the os module simply import it: from spawn import spawn and thus make that function more widely available... Greg> The module as it's currently in the Distutils code is attached. Not in the message I saw... Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From gward@cnri.reston.va.us Mon Aug 30 20:14:57 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:14:57 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us>; from Greg Ward on Mon, Aug 30, 1999 at 03:02:22PM -0400 References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <19990830151457.C428@cnri.reston.va.us> On 30 August 1999, To python-dev@python.org said: > The module as it's currently in the Distutils code is attached. Let me > know what you think... New definition of "attached": I'll just reply to my own message with the code I meant to attach. D'oh!

------------------------------------------------------------------------
"""distutils.spawn

Provides the 'spawn()' function, a front-end to various platform-specific
functions for launching another program in a sub-process."""

# created 1999/07/24, Greg Ward

__rcsid__ = "$Id: spawn.py,v 1.2 1999/08/29 18:20:56 gward Exp $"

import sys, os, string
from distutils.errors import *


def spawn (cmd, search_path=1, verbose=0, dry_run=0):
    """Run another program, specified as a command list 'cmd', in a new
    process.  'cmd' is just the argument list for the new process, ie.
    cmd[0] is the program to run and cmd[1:] are the rest of its arguments.
    There is no way to run a program with a name different from that of its
    executable.

    If 'search_path' is true (the default), the system's executable search
    path will be used to find the program; otherwise, cmd[0] must be the
    exact path to the executable.  If 'verbose' is true, a one-line summary
    of the command will be printed before it is run.  If 'dry_run' is true,
    the command will not actually be run.

    Raise DistutilsExecError if running the program fails in any way; just
    return on success."""

    if os.name == 'posix':
        _spawn_posix (cmd, search_path, verbose, dry_run)
    elif os.name in ( 'nt', 'windows' ):        # ???
        _spawn_nt (cmd, search_path, verbose, dry_run)
    else:
        raise DistutilsPlatformError, \
              "don't know how to spawn programs on platform '%s'" % os.name

# spawn ()


def _spawn_nt ( cmd, search_path=1, verbose=0, dry_run=0):
    import string
    executable = cmd[0]
    if search_path:
        paths = string.split( os.environ['PATH'], os.pathsep)
        base,ext = os.path.splitext(executable)
        if (ext != '.exe'):
            executable = executable + '.exe'
        if not os.path.isfile(executable):
            paths.reverse()         # go over the paths and keep the last one
            for p in paths:
                f = os.path.join( p, executable )
                if os.path.isfile ( f ):
                    # the file exists, we have a shot at spawn working
                    executable = f
    if verbose:
        print string.join ( [executable] + cmd[1:], ' ')
    if not dry_run:
        # spawn for NT requires a full path to the .exe
        rc = os.spawnv (os.P_WAIT, executable, cmd)
        if rc != 0:
            raise DistutilsExecError("command failed: %d" % rc)


def _spawn_posix (cmd, search_path=1, verbose=0, dry_run=0):
    if verbose:
        print string.join (cmd, ' ')
    if dry_run:
        return
    exec_fn = search_path and os.execvp or os.execv
    pid = os.fork ()

    if pid == 0:                        # in the child
        try:
            #print "cmd[0] =", cmd[0]
            #print "cmd =", cmd
            exec_fn (cmd[0], cmd)
        except OSError, e:
            sys.stderr.write ("unable to execute %s: %s\n" %
                              (cmd[0], e.strerror))
            os._exit (1)

        sys.stderr.write ("unable to execute %s for unknown reasons" % cmd[0])
        os._exit (1)

    else:                               # in the parent
        # Loop until the child either exits or is terminated by a signal
        # (ie. keep waiting if it's merely stopped)
        while 1:
            (pid, status) = os.waitpid (pid, 0)
            if os.WIFSIGNALED (status):
                raise DistutilsExecError, \
                      "command %s terminated by signal %d" % \
                      (cmd[0], os.WTERMSIG (status))
            elif os.WIFEXITED (status):
                exit_status = os.WEXITSTATUS (status)
                if exit_status == 0:
                    return              # hey, it succeeded!
                else:
                    raise DistutilsExecError, \
                          "command %s failed with exit status %d" % \
                          (cmd[0], exit_status)
            elif os.WIFSTOPPED (status):
                continue
            else:
                raise DistutilsExecError, \
                      "unknown error executing %s: termination status %d" % \
                      (cmd[0], status)

# _spawn_posix ()
------------------------------------------------------------------------

-- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From gward@cnri.reston.va.us Mon Aug 30 20:31:55 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:31:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <14282.54880.922571.792484@dolphin.mojam.com>; from Skip Montanaro on Mon, Aug 30, 1999 at 02:11:50PM -0500 References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> Message-ID: <19990830153155.D428@cnri.reston.va.us> On 30 August 1999, Skip Montanaro said: > > Greg> it recently occured to me that the 'spawn' module I wrote for the > Greg> Distutils (and which Perry Stoll extended to handle NT), could fit > Greg> nicely in the core library. > > How's spawn.spawn semantically different from the Windows-dependent > os.spawn? My understanding (purely from reading Perry's code!) is that the Windows spawnv() and spawnve() calls require the full path of the executable, and there is no spawnvp(). Hence, the bulk of Perry's '_spawn_nt()' function is code to search the system path if the 'search_path' flag is true.
In '_spawn_posix()', I just use either 'execv()' or 'execvp()' for this. The bulk of my code is the complicated dance required to wait for a fork'ed child process to finish. > How are stdout/stdin/stderr connected to the child process - just > like fork+exec or something slightly higher level like os.popen? Just like fork 'n exec -- '_spawn_posix()' is just a front end to fork and exec (either execv or execvp). In a previous life, I *did* implement a spawning module for a certain other popular scripting language that handles redirection and capturing (backticks in the shell and that other scripting language). It was a lot of fun, but pretty hairy. Took three attempts gradually developed over two years to get it right in the end. In fact, it does all the easy stuff that a Unix shell does in spawning commands, ie. search the path, fork 'n exec, and redirection and capturing. Doesn't handle the tricky stuff, ie. pipelines and job control. The documentation for this module is 22 pages long; the code is 600+ lines of somewhat tricky Perl (1300 lines if you leave in comments and blank lines). That's why the Distutils spawn module doesn't do anything with std{out,err,in}. > If it's semantically like os.spawn and a little bit higher level > abstraction than fork+exec, I'd vote for having the os module simply > import it: So os.spawnv and os.spawnve would be Windows-specific, but os.spawn portable? Could be confusing. And despite the recent extended discussion of the os module, I'm not sure if this fits the model. BTW, is there anything like this on the Mac? On what other OSs does it even make sense to talk about programs spawning other programs? (Surely those GUI user interfaces have to do *something*...) 
Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip@mojam.com (Skip Montanaro) Mon Aug 30 20:52:49 1999 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 30 Aug 1999 14:52:49 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> <19990830153155.D428@cnri.reston.va.us> Message-ID: <14282.57574.918011.54595@dolphin.mojam.com> Greg> BTW, is there anything like this on the Mac? There will be, once Jack Jansen contributes _spawn_mac... ;-) Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From jack@oratrix.nl Mon Aug 30 22:25:04 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:25:04 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 15:31:55 -0400 , <19990830153155.D428@cnri.reston.va.us> Message-ID: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Recently, Greg Ward said: > BTW, is there anything like this on the Mac? On what other OSs does it > even make sense to talk about programs spawning other programs? (Surely > those GUI user interfaces have to do *something*...) Yes, but the interface is quite a bit more high-level, so it's pretty difficult to reconcile with the Unix and Windows "every argument is a string" paradigm. You start the process and pass along an AppleEvent (basically an RPC-call) that will be presented to the program upon startup. 
So on the mac there's a serious difference between (inventing the API interface here, cut down to make it understandable to non-macheads:-) spawn("netscape", ("Open", "file.html")) and spawn("netscape", ("OpenURL", "http://foo.com/file.html")) The mac interface is (of course:-) infinitely more powerful, allowing you to talk to running apps, addressing stuff in it as COM/OLE does, etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is impossible to represent in a meaningful way. Add to that the fact that there's no stdin/stdout/stderr and there's little common ground. The one area of common ground is "run program X on files Y and Z and wait (or don't wait) for completion", so that is something that could maybe have a special method that could be implemented on all three mentioned platforms (and probably everything else as well). And even then it'll be surprising to Mac users that they have to _exit_ their editor (if you specify wait), not something people commonly do. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido@CNRI.Reston.VA.US Mon Aug 30 22:29:55 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:29:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 23:25:04 +0200." <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Message-ID: <199908302129.RAA08442@eric.cnri.reston.va.us> > Recently, Greg Ward said: > > BTW, is there anything like this on the Mac? On what other OSs does it > > even make sense to talk about programs spawning other programs? (Surely > > those GUI user interfaces have to do *something*...)
> > Yes, but the interface is quite a bit more high-level, so it's pretty > difficult to reconcile with the Unix and Windows "every argument is a > string" paradigm. You start the process and pass along an AppleEvent > (basically an RPC-call) that will be presented to the program upon > startup. > > So on the mac there's a serious difference between (inventing the API > interface here, cut down to make it understandable to non-macheads:-) > spawn("netscape", ("Open", "file.html")) > and > spawn("netscape", ("OpenURL", "http://foo.com/file.html")) > > The mac interface is (of course:-) infinitely more powerful, allowing > you to talk to running apps, adressing stuff in it as COM/OLE does, > etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is > impossible to represent in a meaningful way. > > Add to that the fact that there's no stdin/stdout/stderr and there's > little common ground. The one area of common ground is "run program X > on files Y and Z and wait (or don't wait) for completion", so that is > something that could maybe have a special method that could be > implemented on all three mentioned platforms (and probably everything > else as well). And even then it'll be surprising to Mac users that > they have to _exit_ their editor (if you specify wait), not something > people commonly do. Indeed. I'm guessing that Greg wrote his code specifically to drive compilers, not so much to invoke an editor on a specific file. It so happens that the Windows compilers have command lines that look sufficiently like the Unix compilers that this might actually work. On the Mac, driving the compilers is best done using AppleEvents, so it's probably better not to try to abuse the spawn() interface for that... (Greg, is there a higher level where the compiler actions are described without referring to specific programs, but perhaps just to compiler actions and input and output files?)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@CNRI.Reston.VA.US Mon Aug 30 22:35:45 1999 From: guido@CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:35:45 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 15:02:22 EDT." <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <199908302135.RAA08467@eric.cnri.reston.va.us> > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). In either case, > it's just enough code (and just tricky enough code) that not everybody > should have to duplicate it for their own uses. > > The basic idea is this: > > from spawn import spawn > ... > spawn (['cmd', 'arg1', 'arg2']) > # or > spawn (['cmd'] + args) > > you get the idea: it takes a *list* representing the command to spawn: > no strings to parse, no shells to get in the way, no sneaky > meta-characters ruining your day, draining your efficiency, or > compromising your security. (Conversely, no pipelines, redirection, > etc.) > > The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' > depending on os.name. Additionally, it takes a couple of optional > keyword arguments (all booleans): 'search_path', 'verbose', and > 'dry_run', which do pretty much what you'd expect. > > The module as it's currently in the Distutils code is attached. Let me > know what you think... I'm not sure that the verbose and dry_run options belong in the standard library. When both are given, this does something semi-useful; for Posix that's basically just printing the arguments, while for NT it prints the exact command that will be executed. Not sure if that's significant though. 
Perhaps it's better to extract the code that runs the path to find the right executable and make that into a separate routine. (Also, rather than reversing the path, I would break out of the loop at the first hit.) --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@cnri.reston.va.us Mon Aug 30 22:38:36 1999 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 17:38:36 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <199908302129.RAA08442@eric.cnri.reston.va.us>; from Guido van Rossum on Mon, Aug 30, 1999 at 05:29:55PM -0400 References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> <199908302129.RAA08442@eric.cnri.reston.va.us> Message-ID: <19990830173836.F428@cnri.reston.va.us> On 30 August 1999, Guido van Rossum said: > Indeed. I'm guessing that Greg wrote his code specifically to drive > compilers, not so much to invoke an editor on a specific file. It so > happens that the Windows compilers have command lines that look > sufficiently like the Unix compilers that this might actually work. Correct, but the spawn module I posted should work for any case where you want to run an external command synchronously without redirecting I/O. (And it could probably be extended to handle those cases, but a) I don't need them for Distutils [yet!], and b) I don't know how to do it portably.) > On the Mac, driving the compilers is best done using AppleEvents, so > it's probably better to to try to abuse the spawn() interface for > that... (Greg, is there a higher level where the compiler actions are > described without referring to specific programs, but perhaps just to > compiler actions and input and output files?) [off-topic alert... probably belongs on distutils-sig, but there you go] Yes, my CCompiler class is all about providing a (hopefully) compiler- and platform-neutral interface to a C/C++ compiler. 
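[Guido's suggestion above — extracting the path search into a separate routine that breaks out of the loop at the first hit, rather than reversing the list and keeping the last one — might look something like this. A sketch in present-day Python spelling; find_executable is a hypothetical name here, not something that existed in the module as posted:]

```python
import os

def find_executable(program, path=None):
    """Return the full path to 'program', searching 'path' (a string in
    os.pathsep-separated form, defaulting to $PATH), or None if it
    cannot be found."""
    if path is None:
        path = os.environ.get('PATH', '')
    if os.name == 'nt' and os.path.splitext(program)[1] != '.exe':
        program = program + '.exe'   # NT's spawnv wants the real file name
    if os.path.isfile(program):
        return program               # already an explicit path
    for p in path.split(os.pathsep):
        f = os.path.join(p, program)
        if os.path.isfile(f):
            return f                 # first hit wins, per Guido's comment
    return None
```

[With this factored out, _spawn_nt() shrinks to the spawnv() call, and the routine is reusable on its own.]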
Currently there're only two concrete subclasses of this: UnixCCompiler and MSVCCompiler, and they both obviously use spawn, because Unix C compilers and MSVC both provide that kind of interface. A hypothetical sibling class that provides an interface to some Mac C compiler might use a souped-up spawn that "knows about" Apple Events, or it might use some other interface to Apple Events. If Jack's simplified summary of what passing Apple Events to a command looks like is accurate, maybe spawn can be souped up to work on the Mac. Or we might need a dedicated module for running Mac programs. So does anybody have code to run external programs on the Mac using Apple Events? Would it be possible/reasonable to add that as '_spawn_mac()' to my spawn module? Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From jack@oratrix.nl Mon Aug 30 22:52:29 1999 From: jack@oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:52:29 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 17:38:36 -0400 , <19990830173836.F428@cnri.reston.va.us> Message-ID: <19990830215234.ED4E718B9FB@oratrix.oratrix.nl> Hmm, if we're talking a "Python Make" or some such here the best way would probably be to use Tool Server. Tool Server is a thing that is based on Apple's old MPW programming environment, that is still supported by compiler vendors like MetroWerks. The nice thing of Tool Server for this type of work is that it _is_ command-line based, so you can probably send it things like spawn("cc", "-O", "test.c") But, although I know it is possible to do this with ToolServer, I haven't a clue on how to do it... 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim_one@email.msn.com Tue Aug 31 06:44:18 1999 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 31 Aug 1999 01:44:18 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> Message-ID: <000101bef373$de2974c0$932d153f@tim> [Greg Ward] > ... > In a previous life, I *did* implement a spawning module for > a certain other popular scripting language that handles > redirection and capturing (backticks in the shell and that other > scripting language). It was a lot of fun, but pretty hairy. Took > three attempts gradually developed over two years to get it right > in the end. In fact, it does all the easy stuff that a Unix shell > does in spawning commands, ie. search the path, fork 'n exec, and > redirection and capturing. Doesn't handle the tricky stuff, ie. > pipelines and job control. > > The documentation for this module is 22 pages long; the code > is 600+ lines of somewhat tricky Perl (1300 lines if you leave > in comments and blank lines). That's why the Distutils spawn > module doesn't do anything with std{out,err,in}. Note that win/tclWinPipe.c-- which contains the Windows-specific support for Tcl's "exec" cmd --is about 3,200 lines of C. It does handle pipelines and redirection, and even fakes pipes as needed with temp files when it can identify a pipeline component as belonging to the 16-bit subsystem. Even so, the Tcl help page for "exec" bristles with hilarious caveats under the Windows subsection; e.g., When redirecting from NUL:, some applications may hang, others will get an infinite stream of "0x01" bytes, and some will actually correctly get an immediate end-of-file; the behavior seems to depend upon something compiled into the application itself. 
When redirecting greater than 4K or so to NUL:, some applications will hang. The above problems do not happen with 32-bit applications. Still, people seem very happy with Tcl's exec, and I'm certain no language tries harder to provide a portable way to "do command lines". Two points to that: 1) If Python ever wants to do something similar, let's steal the Tcl code (& unlike stealing Perl's code, stealing Tcl's code actually looks possible -- it's very much better organized and written). 2) For all its heroic efforts to hide platform limitations,

int
Tcl_ExecObjCmd(dummy, interp, objc, objv)
    ClientData dummy;           /* Not used. */
    Tcl_Interp *interp;         /* Current interpreter. */
    int objc;                   /* Number of arguments. */
    Tcl_Obj *CONST objv[];      /* Argument objects. */
{
#ifdef MAC_TCL
    Tcl_AppendResult(interp, "exec not implemented under Mac OS",
            (char *)NULL);
    return TCL_ERROR;
#else
    ...

a-generalized-spawn-is-a-good-start-ly y'rs - tim From fredrik@pythonware.com Tue Aug 31 07:39:56 1999 From: fredrik@pythonware.com (Fredrik Lundh) Date: Tue, 31 Aug 1999 08:39:56 +0200 Subject: [Python-Dev] Portable "spawn" module for core? References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <005101bef37b$b0415070$f29b12c2@secret.pythonware.com> Greg Ward wrote: > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). any reason this couldn't go into the os module instead? just add parts of it to os.py, and change the docs to say that spawn* are supported on Windows and Unix... (supporting the full set of spawn* primitives would of course be nice, btw. just like os.py provides all exec variants...)
From gstein at lyra.org Tue Aug 3 03:51:43 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 02 Aug 1999 18:51:43 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A64B2F.3386F0A9@lyra.org> Mark Hammond wrote: > ... > Therefore, I would like to propose these functions to be added to > abstract.c: > > int PyObject_GetBufferSize(); > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > const void *PyObject_GetReadOnlyBuffer(); > > Although equivalent functions exist for the buffer object, I can't see the > equivalent abstract implementations - ie, that work with any object > supporting the protocol. > > Im willing to provide a patch if there is agreement a) the general idea is > good, and b) my specific spelling of the idea is OK (less likely - > PyBuffer_* seems better, but loses any implication of being abstract?).
Marc-Andre proposed exactly the same thing back at the end of March (to me and Guido). The two of us hashed out some of the stuff and M.A. came up with a full patch for the stuff. Guido was relatively non-committal at that point one way or another, but said they seemed fine. It appears the stuff never made it into source control. If Marc-Andre can resurface the final proposal/patch, then we'd be set. Until then: use the bufferprocs :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Tue Aug 3 11:11:11 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 11:11:11 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A6B22F.7A14BA2C@lemburg.com> David Ascher wrote: > > An issue which has dogged the NumPy project is that there is (to my > knowledge) no way to pickle very large arrays without creating strings > which contain all of the data. This can be a problem given that NumPy > arrays tend to be very large -- often several megabytes, sometimes much > bigger. This slows things down, sometimes a lot, depending on the > platform. It seems that it should be possible to do something more > efficient. > > Two alternatives come to mind: > > -- define a new pickling protocol which passes a file-like object to the > instance and have the instance write itself to that file, being as > efficient or inefficient as it cares to. This protocol is used only > if the instance/type defines the appropriate slot. Alternatively, > enrich the semantics of the getstate interaction, so that an object > can return partial data and tell the pickling mechanism to come back > for more. > > -- make pickling of objects which support the buffer interface use that > interface's notion of segments and use that 'chunk' size to do > something more efficient if not necessarily most efficient. (oh, and > make NumPy arrays support the buffer interface =).
This is simple > for NumPy arrays since we want to pickle "everything", but may not be > what other buffer-supporting objects want. > > Thoughts? Alternatives? Hmm, types can register their own pickling/unpickling functions via copy_reg, so they can access the self.write method in pickle.py to implement the write to file interface. Don't know how this would be done for cPickle.c though. For instances the situation is different since there is no dispatching done on a per-class basis. I guess an optional argument could help here. Perhaps some lazy pickling wrapper would help fix this in general: an object which calls back into the to-be-pickled object to access the data rather than store the data in a huge string. Yet another idea would be using memory mapped files instead of strings as temporary storage (but this is probably hard to implement right and not as portable). Dunno... just some thoughts. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 3 09:50:33 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 09:50:33 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <37A69F49.3575AE85@lemburg.com> Greg Stein wrote: > > Mark Hammond wrote: > > ... > > Therefore, I would like to propose these functions to be added to > > abstract.c: > > > > int PyObject_GetBufferSize(); > > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > > const void *PyObject_GetReadOnlyBuffer(); > > > > Although equivalent functions exist for the buffer object, I can't see the > > equivalent abstract implementations - ie, that work with any object > > supporting the protocol. 
> > > > I'm willing to provide a patch if there is agreement a) the general idea is > > good, and b) my specific spelling of the idea is OK (less likely - > > PyBuffer_* seems better, but loses any implication of being abstract?). > > Marc-Andre proposed exactly the same thing back at the end of March (to > me and Guido). The two of us hashed out some of the stuff and M.A. came > up with a full patch for the stuff. Guido was relatively non-committal > at that point one way or another, but said they seemed fine. It appears > the stuff never made it into source control. > > If Marc-Andre can resurface the final proposal/patch, then we'd be set. Below is the code I currently use. I don't really remember if this is what Greg and I discussed a while back, but I'm sure he'll correct me ;-) Note that the buffer length is implicitly returned by these APIs.

/* Takes an arbitrary object which must support the character (single
   segment) buffer interface and returns a pointer to a read-only memory
   location usable as character based input for subsequent processing.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsCharBuffer(PyObject *obj,
                      const char **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    const char *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getcharbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a character buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getcharbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Same as PyObject_AsCharBuffer() except that this API expects a readable
   (single segment) buffer interface and returns a pointer to a read-only
   memory location which can contain arbitrary data.
   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsReadBuffer(PyObject *obj,
                      const void **buffer,
                      int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getreadbuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a readable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getreadbuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

/* Takes an arbitrary object which must support the writeable (single
   segment) buffer interface and returns a pointer to a writeable memory
   location in buffer of size buffer_len.

   buffer and buffer_len are only set in case no error occurs.
   Otherwise, -1 is returned and an exception set. */

static int
PyObject_AsWriteBuffer(PyObject *obj,
                       void **buffer,
                       int *buffer_len)
{
    PyBufferProcs *pb = obj->ob_type->tp_as_buffer;
    void *pp;
    int len;

    if ( pb == NULL ||
         pb->bf_getwritebuffer == NULL ||
         pb->bf_getsegcount == NULL ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a writeable buffer object");
        goto onError;
    }
    if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) {
        PyErr_SetString(PyExc_TypeError,
                        "expected a single-segment buffer object");
        goto onError;
    }
    len = (*pb->bf_getwritebuffer)(obj,0,&pp);
    if (len < 0)
        goto onError;
    *buffer = pp;
    *buffer_len = len;
    return 0;

 onError:
    return -1;
}

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 11:53:39 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 11:53:39 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by "M.-A.
Lemburg" , Tue, 03 Aug 1999 09:50:33 +0200 , <37A69F49.3575AE85@lemburg.com> Message-ID: <19990803095339.E02CE303120@snelboot.oratrix.nl> Why not pass the index to the As*Buffer routines as well and make getsegcount available too? Then you could code things like

for(i=0; i<segcount; i++) {
    if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
        return -1;
    write(fp, buf, count);
}

From gstein at lyra.org Tue Aug 3 12:25:11 1999 From: gstein at lyra.org (Greg Stein) Date: Tue, 03 Aug 1999 03:25:11 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6C387.7360D792@lyra.org> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like
> for(i=0; i<segcount; i++) {
>     if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>         return -1;
>     write(fp, buf, count);
> }
Simply because multiple segments hasn't been seen. All objects supporting the buffer interface have a single segment. IMO, it is best to drop the argument to make typical usage easier. For handling multiple segments, a caller can use the raw interface rather than the handy functions. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at digicool.com Tue Aug 3 12:58:54 1999 From: jim at digicool.com (Jim Fulton) Date: Tue, 03 Aug 1999 06:58:54 -0400 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A6CB6E.C990F561@digicool.com> Mark Hammond wrote: > > Hi all, > I'm trying to slowly wean myself over to the buffer interfaces. OK, I'll bite. Where is the buffer interface documented? I found references to it in various places (e.g. built-in buffer()) but didn't find the interface itself. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Tue Aug 3 13:06:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 03 Aug 1999 13:06:46 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6CD46.642A9C6D@lemburg.com> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like
> for(i=0; i<segcount; i++) {
>     if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 )
>         return -1;
>     write(fp, buf, count);
> }
Well, just like Greg said, this is not much different than using the buffer interface directly. While the above would be a handy PyObject_WriteAsBuffer() kind of helper, I don't think that this is really used all that much. E.g. in mxODBC I use the APIs for accessing the raw char data in a buffer: the pointer is passed directly to the ODBC APIs without copying, which makes things quite fast. IMHO, this is the greatest advantage of the buffer interface. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at cnri.reston.va.us Tue Aug 3 15:07:44 1999 From: fdrake at cnri.reston.va.us (Fred L. Drake) Date: Tue, 3 Aug 1999 09:07:44 -0400 (EDT) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A64B2F.3386F0A9@lyra.org> References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <14246.59808.561395.761772@weyr.cnri.reston.va.us> Greg Stein writes: > Until then: use the bufferprocs :-) Greg, On the topic of the buffer interface: Have you written documentation for this that I can include in the API reference? Bugging you about this is on my to-do list. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Tue Aug 3 13:29:43 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 13:29:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A6CB6E.C990F561@digicool.com> Message-ID: <37A6D2A7.27F27554@lemburg.com> Jim Fulton wrote: > > Mark Hammond wrote: > > > > Hi all, > > Im trying to slowly wean myself over to the buffer interfaces. > > OK, I'll bite. Where is the buffer interface documented? I found references > to it in various places (e.g. built-in buffer()) but didn't find the interface > itself. I guess it's a read-the-source feature :-) Objects/bufferobject.c and Include/object.h provide a start. Objects/stringobject.c has a "sample" implementation. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 16:45:25 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 16:45:25 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by Greg Stein , Tue, 03 Aug 1999 03:25:11 -0700 , <37A6C387.7360D792@lyra.org> Message-ID: <19990803144526.6B796303120@snelboot.oratrix.nl> > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > available too? > > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. Hmm. And I went out of my way to include this stupid multi-buffer stuff because the NumPy folks said they couldn't live without it (and one of the reasons for the buffer stuff was to allow NumPy arrays, which may be discontiguous, to be written efficiently). Can someone confirm that the Numeric stuff indeed doesn't use this? 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From da at ski.org Tue Aug 3 18:19:19 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:19:19 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <37A6B22F.7A14BA2C@lemburg.com> Message-ID: On Tue, 3 Aug 1999, M.-A. Lemburg wrote: > Hmm, types can register their own pickling/unpickling functions > via copy_reg, so they can access the self.write method in pickle.py > to implement the write to file interface. Are you sure? My understanding of copy_reg is, as stated in the doc: pickle (type, function[, constructor]) Declares that function should be used as a ``reduction'' function for objects of type or class type. function should return either a string or a tuple. The optional constructor parameter, if provided, is a callable object which can be used to reconstruct the object when called with the tuple of arguments returned by function at pickling time. How does one access the 'self.write method in pickle.py'? > Perhaps some lazy pickling wrapper would help fix this in general: > an object which calls back into the to-be-pickled object to > access the data rather than store the data in a huge string. Right. That's an idea. > Yet another idea would be using memory mapped files instead > of strings as temporary storage (but this is probably hard to implement > right and not as portable). That's a very interesting idea! I'll try that -- it might just be the easiest way to do this. I think that portability isn't a huge concern -- the folks who are coming up with the speed issue are on platforms which have mmap support. Thanks for the suggestions. 
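[Editorial note: the copy_reg registration David quotes above is spelled copyreg in today's Python, but works as described: the registered reducer returns a constructor plus arguments, and never sees the pickler's output stream -- which is precisely David's complaint. A minimal sketch:]

```python
import copyreg
import pickle

class Vec:
    """Toy stand-in for a NumPy-style array type."""
    def __init__(self, data):
        self.data = list(data)

def reduce_vec(v):
    # A reduction returns (constructor, args) -- the state comes back as
    # an in-memory object; there is no hook here for streaming chunks.
    return Vec, (v.data,)

copyreg.pickle(Vec, reduce_vec)

v2 = pickle.loads(pickle.dumps(Vec([1, 2, 3])))
assert v2.data == [1, 2, 3]
```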
--david From da at ski.org Tue Aug 3 18:20:37 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:20:37 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A6C387.7360D792@lyra.org> Message-ID: On Tue, 3 Aug 1999, Greg Stein wrote: > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. IMO, it is best FYI, if/when NumPy objects support the buffer API, they will require multiple-segments. From da at ski.org Tue Aug 3 18:23:31 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:23:31 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <19990803144526.6B796303120@snelboot.oratrix.nl> Message-ID: On Tue, 3 Aug 1999, Jack Jansen wrote: > > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > > available too? > > > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. > > Hmm. And I went out of my way to include this stupid multi-buffer stuff > because the NumPy folks said they couldn't live without it (and one of the > reasons for the buffer stuff was to allow NumPy arrays, which may be > discontiguous, to be written efficiently). > > Can someone confirm that the Numeric stuff indeed doesn't use this? /usr/LLNLDistribution/Numerical/Include$ grep buffer *.h /usr/LLNLDistribution/Numerical/Include$ Yes. =) See the other thread on low-overhead pickling. But again, *if* multiarrays supported the buffer interface, they'd have to use the multi-segment feature (repeating myself). --david From mal at lemburg.com Tue Aug 3 21:17:16 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 21:17:16 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A7403C.3BC05D02@lemburg.com> David Ascher wrote: > > On Tue, 3 Aug 1999, M.-A. 
Lemburg wrote: > > > Hmm, types can register their own pickling/unpickling functions > > via copy_reg, so they can access the self.write method in pickle.py > > to implement the write to file interface. > > Are you sure? My understanding of copy_reg is, as stated in the doc: > > pickle (type, function[, constructor]) > Declares that function should be used as a ``reduction'' function for > objects of type or class type. function should return either a string > or a tuple. The optional constructor parameter, if provided, is a > callable object which can be used to reconstruct the object when > called with the tuple of arguments returned by function at pickling > time. > > How does one access the 'self.write method in pickle.py'? Ooops. Sorry, that doesn't work... well at least not using "normal" Python ;-) You could of course simply go up one stack frame and then grab the self object and then... well, you know... -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Tue Aug 3 22:47:04 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 3 Aug 1999 15:47:04 -0500 (CDT) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: References: Message-ID: <14247.21628.225029.392711@dolphin.mojam.com> David> An issue which has dogged the NumPy project is that there is (to David> my knowledge) no way to pickle very large arrays without creating David> strings which contain all of the data. This can be a problem David> given that NumPy arrays tend to be very large -- often several David> megabytes, sometimes much bigger. This slows things down, David> sometimes a lot, depending on the platform. It seems that it David> should be possible to do something more efficient. David, Using __getstate__/__setstate__, could you create a compressed representation using zlib or some other scheme? 
I don't know how well numeric data compresses in general, but that might help. Also, I trust you use cPickle when it's available, yes? Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-475-3758 From da at ski.org Tue Aug 3 22:58:23 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 13:58:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <14247.21628.225029.392711@dolphin.mojam.com> Message-ID: On Tue, 3 Aug 1999, Skip Montanaro wrote: > Using __getstate__/__setstate__, could you create a compressed > representation using zlib or some other scheme? I don't know how well > numeric data compresses in general, but that might help. Also, I trust you > use cPickle when it's available, yes? I *really* hate to admit it, but I've found the source of the most massive problem in the pickling process that I was using. I didn't use binary mode, which meant that the huge strings were written & read one-character-at-a-time. I think I'll put a big fat note in the NumPy doc to that effect. (note that luckily this just affected my usage, not all NumPy users). --da From gstein at lyra.org Wed Aug 4 21:15:27 1999 From: gstein at lyra.org (Greg Stein) Date: Wed, 04 Aug 1999 12:15:27 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex References: <199908041313.JAA26344@weyr.cnri.reston.va.us> Message-ID: <37A8914F.6F5B9971@lyra.org> Fred L. Drake wrote: > > Update of /projects/cvsroot/python/dist/src/Doc/api > In directory weyr:/home/fdrake/projects/python/Doc/api > > Modified Files: > api.tex > Log Message: > > Started documentation on buffer objects & types. Very preliminary. > > Greg Stein: Please help with this; it's your baby! > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins All righty. I'll send some doc on this stuff. 
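[Editorial note: both remedies from this exchange are easy to sketch in modern Python, where the class and names below are invented for illustration -- Skip's compressed __getstate__, and David's binary mode, which today is a protocol number rather than a file-mode flag:]

```python
import pickle
import zlib

class Blob:
    """Stand-in for a large numeric array, using Skip's compression idea."""
    def __init__(self, data):
        self.data = data

    def __getstate__(self):
        return zlib.compress(self.data)

    def __setstate__(self, state):
        self.data = zlib.decompress(state)

payload = bytes(1_000_000)  # a megabyte of highly compressible data
blob = pickle.loads(pickle.dumps(Blob(payload)))
assert blob.data == payload

# David's discovery: the old text protocol (0) is dramatically bulkier
# than the binary protocols for byte data.
text = pickle.dumps(payload, protocol=0)
binary = pickle.dumps(payload, protocol=2)
assert len(binary) < len(text)
```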
Somebody else did the initial buffer interface, but it seems that it has fallen to me now :-) Please give me a little while to get to this, though. I'm in and out of town for the next four weeks. I'm in the process of moving into a new house in Palo Alto, CA, and I'm travelling back and forth until Anni and I move for real in September. I should be able to get to this by the weekend, or possibly in a couple weeks. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Wed Aug 4 23:00:26 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 4 Aug 1999 17:00:26 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: <37A8914F.6F5B9971@lyra.org> References: <199908041313.JAA26344@weyr.cnri.reston.va.us> <37A8914F.6F5B9971@lyra.org> Message-ID: <14248.43498.664539.597656@weyr.cnri.reston.va.us> Greg Stein writes: > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I was not aware that you were not the origin of this work; feel free to pass it to the right person. > Please give me a little while to get to this, though. I'm in and out of > town for the next four weeks. I'm in the process of > moving into a new house in Palo Alto, CA, and I'm travelling back and > forth until Anni and I move for real in September. Cool! > I should be able to get to this by the weekend, or possibly in a couple > weeks. That's good enough for me. I expect it may be a couple of months or more before I try and get another release out with various fixes and additions. There's not a huge need to update the released doc set, other than a few embarassing editorial...er, "oversights" (!). -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From jack at oratrix.nl Thu Aug 5 11:57:33 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 11:57:33 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: Message by Greg Stein , Wed, 04 Aug 1999 12:15:27 -0700 , <37A8914F.6F5B9971@lyra.org> Message-ID: <19990805095733.69D90303120@snelboot.oratrix.nl> > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I think I did, but I gladly bequeath it to you. (Hmm, that's the first time I typed "bequeath", I think). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Thu Aug 5 17:46:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 5 Aug 1999 17:46:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: Message-ID: <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. IMO, it is best > > FYI, if/when NumPy objects support the buffer API, they will require > multiple-segments. same goes for PIL. in the worst case, there's one segment per line. ... on the other hand, I think something is missing from the buffer design; I definitely don't like that people can write and marshal objects that happen to implement the buffer interface, only to find that Python didn't do what they expected...

>>> import unicode
>>> import marshal
>>> u = unicode.unicode
>>> s = u("foo")
>>> data = marshal.dumps(s)
>>> marshal.loads(data)
'f\000o\000o\000'
>>> type(marshal.loads(data))
<type 'string'>

as for PIL, I would also prefer if the exported buffer corresponded to what you get from im.tostring().
iirc, that cannot be done -- I cannot export via a temporary memory buffer, since there's no way to know when to get rid of it... From jack at oratrix.nl Thu Aug 5 22:59:46 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 22:59:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "Fredrik Lundh" , Thu, 5 Aug 1999 17:46:43 +0200 , <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> Message-ID: <19990805205952.531B9E267A@oratrix.oratrix.nl> Recently, "Fredrik Lundh" said: > on the other hand, I think something is missing from > the buffer design; I definitely don't like that people > can write and marshal objects that happen to > implement the buffer interface, only to find that > Python didn't do what they expected...
> >>> import unicode
> >>> import marshal
> >>> u = unicode.unicode
> >>> s = u("foo")
> >>> data = marshal.dumps(s)
> >>> marshal.loads(data)
> 'f\000o\000o\000'
> >>> type(marshal.loads(data))
> <type 'string'>
Hmm. Looking at the code there is a catchall at the end, with a comment explicitly saying "Write unknown buffer-style objects as a string". IMHO this is an incorrect design, but that's a bit philosophical (so I'll gladly defer to Our Great Philosopher if he has anything to say on the matter:-). Unless, of course, there are buffer-style non-string objects around that are better read back as strings than not read back at all. Hmm again, I think I'd like it better if marshal.dumps() would barf on attempts to write unrepresentable data. Currently unrepresentable objects are written as TYPE_UNKNOWN (unless they have bufferness (or should I call that "a buffer-aspect"? :-)), which means you think you are writing correctly marshalled data but you'll be in for an exception when you try to read it back...
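[Editorial note: Jack's preference eventually won -- modern marshal refuses unrepresentable objects at dump time instead of writing TYPE_UNKNOWN or silently stringifying buffer-style objects. A quick check against today's Python:]

```python
import marshal

# Core types round-trip fine.
data = marshal.dumps({"key": [1, 2.0, "three"]})
assert marshal.loads(data) == {"key": [1, 2.0, "three"]}

# An arbitrary instance is rejected up front -- no unknown type on disk,
# no surprise exception at load time.
class Opaque:
    pass

try:
    marshal.dumps(Opaque())
    rejected = False
except ValueError:
    rejected = True
assert rejected
```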
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Fri Aug 6 00:24:03 1999 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 5 Aug 1999 18:24:03 -0400 (EDT) Subject: [Python-Dev] mmapfile module Message-ID: <199908052224.SAA24159@amarok.cnri.reston.va.us> A while back the suggestion was made that the mmapfile module be added to the core distribution, and there was a guardedly positive reaction. Should I go ahead and do that? No one reported any problems when I asked for bug reports, but that was probably because no one tried it; putting it in the core would cause more people to try it. I suppose this leads to a more important question: at what point should we start checking 1.6-only things into the CVS tree? For example, once the current alphas of the re module are up to it (they're not yet), when should they be checked in? -- A.M. Kuchling http://starship.python.net/crew/amk/ Kids! Bringing about Armageddon can be dangerous. Do not attempt it in your home. -- Terry Pratchett & Neil Gaiman, _Good Omens_ From bwarsaw at cnri.reston.va.us Fri Aug 6 04:10:18 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 5 Aug 1999 22:10:18 -0400 (EDT) Subject: [Python-Dev] mmapfile module References: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <14250.17418.781127.684009@anthem.cnri.reston.va.us> >>>>> "AMK" == Andrew M Kuchling writes: AMK> I suppose this leads to a more important question: at what AMK> point should we start checking 1.6-only things into the CVS AMK> tree? For example, once the current alphas of the re module AMK> are up to it (they're not yet), when should they be checked AMK> in? Good question. 
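[Editorial note: the mmapfile module Andrew describes did land as the standard mmap module. A small sketch of the pattern MAL floated earlier in the thread -- a memory-mapped file as mutable temporary storage -- assuming a platform with mmap support:]

```python
import mmap
import tempfile

# Map a 4 KiB temporary file and treat the mapping as a mutable buffer;
# writes go through to the file without an intermediate string copy.
with tempfile.TemporaryFile() as f:
    f.write(b"\x00" * 4096)
    f.flush()
    with mmap.mmap(f.fileno(), 4096) as m:
        m[0:5] = b"hello"
        snapshot = bytes(m[0:5])

assert snapshot == b"hello"
```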
I've had a bunch of people ask about the string methods branch, which I'm assuming will be a 1.6 feature, and I'd like to get that checked in at some point too. I think what's holding this up is that Guido hasn't decided whether there will be a patch release to 1.5.2 or not. -Barry From tim_one at email.msn.com Fri Aug 6 04:26:06 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 5 Aug 1999 22:26:06 -0400 Subject: [Python-Dev] mmapfile module In-Reply-To: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <000201bedfb3$09a99000$98a22299@tim> [Andrew M. Kuchling] > ... > I suppose this leads to a more important question: at what point > should we start checking 1.6-only things into the CVS tree? For > example, once the current alphas of the re module are up to it > (they're not yet), when should they be checked in? I'd like to see a bugfix release of 1.5.2 put out first, then have at it. There are several bugfixes that ought to go out ASAP. Thread tstate races, the cpickle/cookie.py snafu, and playing nice with current Tcl/Tk pop to mind immediately. I'm skeptical that anyone other than Guido could decide what *needs* to go out, so it's a good thing he's got nothing to do . one-boy's-opinion-ly y'rs - tim From mhammond at skippinet.com.au Fri Aug 6 05:30:55 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 6 Aug 1999 13:30:55 +1000 Subject: [Python-Dev] mmapfile module In-Reply-To: <000201bedfb3$09a99000$98a22299@tim> Message-ID: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> [Tim laments] > mind immediately. I'm skeptical that anyone other than Guido > could decide > what *needs* to go out, so it's a good thing he's got nothing > to do . He has been very quiet recently - where are you hiding, Guido? > one-boy's-opinion-ly y'rs - tim Here is another. Let's take a different tack - what has been checked in since 1.5.2 that should _not_ go out - ie, is too controversial?
If nothing else, makes a good starting point, and may help Guido out: Below is a summary of the CVS diff I just did, categorized by my opinion. It turns out that most of the changes would appear candidates. While not actually "bug-fixes", many have better documentation, removal of unused imports etc, so would definitely not hurt to get out. Looks like some build issues have been fixed too. Apart from possibly Tim's recent "UnboundLocalError" (which is the only serious behaviour change) I can't see anything that should obviously be omitted. Hopefully this is of interest... [Disclaimer - lots of files here - it is quite possible I missed something...] Mark.

UNCONTROVERSIAL:
----------------
RCS file: /projects/cvsroot/python/dist/src/README,v
RCS file: /projects/cvsroot/python/dist/src/Lib/cgi.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/ftplib.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/poplib.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/re.py,v
RCS file: /projects/cvsroot/python/dist/src/Tools/audiopy/README,v
Doc changes.
RCS file: /projects/cvsroot/python/dist/src/Lib/SimpleHTTPServer.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/cmd.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/htmllib.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/netrc.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/pipes.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/pty.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/shlex.py,v
RCS file: /projects/cvsroot/python/dist/src/Lib/urlparse.py,v
Remove unused imports
RCS file: /projects/cvsroot/python/dist/src/Lib/pdb.py,v
Remove unused globals
RCS file: /projects/cvsroot/python/dist/src/Lib/popen2.py,v
Change to cleanup
RCS file: /projects/cvsroot/python/dist/src/Lib/profile.py,v
Remove unused imports and changes to comments.
RCS file: /projects/cvsroot/python/dist/src/Lib/pyclbr.py,v
Better doc, and support for module level functions.
RCS file: /projects/cvsroot/python/dist/src/Lib/repr.py,v self.maxlist changed to self.maxdict RCS file: /projects/cvsroot/python/dist/src/Lib/rfc822.py,v Doc changes, and better date handling. RCS file: /projects/cvsroot/python/dist/src/configure,v RCS file: /projects/cvsroot/python/dist/src/configure.in,v Looks like FreeBSD build flag changes. RCS file: /projects/cvsroot/python/dist/src/Demo/classes/bitvec.py,v RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v Whitespace fixes. RCS file: /projects/cvsroot/python/dist/src/Demo/scripts/makedir.py,v Check we have passed a non-empty string RCS file: /projects/cvsroot/python/dist/src/Include/patchlevel.h,v 1.5.2+ RCS file: /projects/cvsroot/python/dist/src/Lib/BaseHTTPServer.py,v Remove import rfc822 and more robust errors. RCS file: /projects/cvsroot/python/dist/src/Lib/CGIHTTPServer.py,v Support for HTTP_COOKIE RCS file: /projects/cvsroot/python/dist/src/Lib/fpformat.py,v NotANumber supports class exceptions. RCS file: /projects/cvsroot/python/dist/src/Lib/macpath.py,v Use constants from stat module RCS file: /projects/cvsroot/python/dist/src/Lib/macurl2path.py,v Minor changes to path parsing RCS file: /projects/cvsroot/python/dist/src/Lib/mimetypes.py,v Recognise '.js': 'application/x-javascript', RCS file: /projects/cvsroot/python/dist/src/Lib/sunau.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/wave.py,v Support for binary files. RCS file: /projects/cvsroot/python/dist/src/Lib/whichdb.py,v Reads file header to check for bsddb format. RCS file: /projects/cvsroot/python/dist/src/Lib/xmllib.py,v XML may be at the start of the string, instead of the whole string. RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/tkSimpleDialog.py,v Destroy method added. RCS file: /projects/cvsroot/python/dist/src/Modules/cPickle.c,v As in the log :-) RCS file: /projects/cvsroot/python/dist/src/Modules/cStringIO.c,v No longer a Py_FatalError on module init failure.
RCS file: /projects/cvsroot/python/dist/src/Modules/fpectlmodule.c,v Support for OSF in #ifdefs RCS file: /projects/cvsroot/python/dist/src/Modules/makesetup,v # to handle backslashes for sh's that don't automatically # continue a read when the last char is a backslash RCS file: /projects/cvsroot/python/dist/src/Modules/posixmodule.c,v Better error handling RCS file: /projects/cvsroot/python/dist/src/Modules/timemodule.c,v #ifdef changes for __GNU_LIBRARY__/_GLIBC_ RCS file: /projects/cvsroot/python/dist/src/Python/errors.c,v Better error messages on Win32 RCS file: /projects/cvsroot/python/dist/src/Python/getversion.c,v Bigger buffer and strings. RCS file: /projects/cvsroot/python/dist/src/Python/pystate.c,v Threading bug RCS file: /projects/cvsroot/python/dist/src/Objects/floatobject.c,v Tim Peters writes: 1. Fixes float divmod etc. RCS file: /projects/cvsroot/python/dist/src/Objects/listobject.c,v Doc changes, and when deallocating a list, DECREF the items from the end back to the start. RCS file: /projects/cvsroot/python/dist/src/Objects/stringobject.c,v Bug fix to do with the width of a format specifier RCS file: /projects/cvsroot/python/dist/src/Objects/tupleobject.c,v Appropriate overflow checks so that things like sys.maxint*(1,) can't dump core. RCS file: /projects/cvsroot/python/dist/src/Lib/tempfile.py,v don't cache attributes of type int RCS file: /projects/cvsroot/python/dist/src/Lib/urllib.py,v Number of revisions. RCS file: /projects/cvsroot/python/dist/src/Lib/aifc.py,v Chunk moved to new module. RCS file: /projects/cvsroot/python/dist/src/Lib/audiodev.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dbhash.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dis.py,v Changes in comments. RCS file: /projects/cvsroot/python/dist/src/Lib/cmpcache.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmp.py,v New "shallow" arg. RCS file: /projects/cvsroot/python/dist/src/Lib/dumbdbm.py,v Coerce f.tell() to int.
RCS file: /projects/cvsroot/python/dist/src/Modules/main.c,v Fix to tracebacks off by a line with -x RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/Tkinter.py,v Number of changes you can review! OTHERS: -------- RCS file: /projects/cvsroot/python/dist/src/Lib/asynchat.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/asyncore.py,v Latest versions from Sam??? RCS file: /projects/cvsroot/python/dist/src/Lib/smtplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/sched.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/SocketServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/calendar.py,v Sorry - out of time to detail RCS file: /projects/cvsroot/python/dist/src/Python/bltinmodule.c,v Unbound local, docstring, and better support for ExtensionClasses. Freeze: Few changes IDLE: Lotsa changes :-) A number of .h files have #ifdef changes for CE that I won't detail (but it would be great to get a few of these in - and I have more :-) Tools directory: Number of changes - outa time to detail From mal at lemburg.com Fri Aug 6 10:54:20 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 06 Aug 1999 10:54:20 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> Message-ID: <37AAA2BC.466750B5@lemburg.com> Jack Jansen wrote: > > Recently, "Fredrik Lundh" said: > > on the other hand, I think something is missing from > > the buffer design; I definitely don't like that people > > can write and marshal objects that happen to > > implement the buffer interface, only to find that > > Python didn't do what they expected... > > > > >>> import unicode > > >>> import marshal > > >>> u = unicode.unicode > > >>> s = u("foo") > > >>> data = marshal.dumps(s) > > >>> marshal.loads(data) > > 'f\000o\000o\000' > > >>> type(marshal.loads(data)) > > Why do Unicode objects implement the bf_getcharbuffer slot ?
I thought that unicode objects use a two-byte character representation. Note that implementing the char buffer interface will also give you strange results with other code that uses PyArg_ParseTuple(...,"s#",...), e.g. you could search through Unicode strings as if they were normal 1-byte/char strings (and most certainly not find what you're looking for, I guess). > Hmm again, I think I'd like it better if marshal.dumps() would barf on > attempts to write unrepresentable data. Currently unrepresentable > objects are written as TYPE_UNKNOWN (unless they have bufferness (or > should I call that "a buffer-aspect"? :-)), which means you think you > are writing correctly marshalled data but you'll be in for an > exception when you try to read it back... I'd prefer an exception on write too. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 147 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Aug 6 16:44:35 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Aug 1999 10:44:35 -0400 (EDT) Subject: [Python-Dev] mmapfile module In-Reply-To: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> References: <000201bedfb3$09a99000$98a22299@tim> <00a801bedfbc$1871a7e0$1101a8c0@bobcat> Message-ID: <14250.62675.807129.878242@weyr.cnri.reston.va.us> Mark Hammond writes: > Apart from possibly Tim's recent "UnboundLocalError" (which is the only > serious behaviour change) I can't see anything that should obviously be Since UnboundLocalError is a subclass of NameError (what you got before) normally, and they are the same string when -X is used, this only represents a new name in the __builtin__ module for legacy code. This should not be a problem; the only real difference is that, using class exceptions for built-in exceptions, you get more useful information in your tracebacks. -Fred -- Fred L. Drake, Jr. 
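Fred's point about the subclass relationship is easy to check directly; the sketch below uses modern Python syntax (not 1.5.2-era code, where string exceptions were still the -X fallback), but the relationship it demonstrates is the one he describes:

```python
def f():
    print(x)  # 'x' is assigned below, so it is local here -- and not yet bound
    x = 1

try:
    f()
except NameError as exc:  # UnboundLocalError is caught as a NameError
    caught = exc

assert isinstance(caught, UnboundLocalError)
assert issubclass(UnboundLocalError, NameError)
```

So legacy code that catches NameError keeps working unchanged; the new name only adds precision to tracebacks.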
Corporation for National Research Initiatives From fredrik at pythonware.com Sat Aug 7 12:51:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 7 Aug 1999 12:51:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> Message-ID: <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> > > > >>> import unicode > > > >>> import marshal > > > >>> u = unicode.unicode > > > >>> s = u("foo") > > > >>> data = marshal.dumps(s) > > > >>> marshal.loads(data) > > > 'f\000o\000o\000' > > > >>> type(marshal.loads(data)) > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > that unicode objects use a two-byte character representation. >>> import array >>> import marshal >>> a = array.array >>> s = a("f", [1, 2, 3]) >>> data = marshal.dumps(s) >>> marshal.loads(data) '\000\000\200?\000\000\000@\000\000@@' looks like the various implementors haven't really understood the intentions of whoever designed the buffer interface... From mal at lemburg.com Sat Aug 7 18:14:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 07 Aug 1999 18:14:56 +0200 Subject: [Python-Dev] Some more constants for the socket module Message-ID: <37AC5B80.56F740DD@lemburg.com> Following the recent discussion on c.l.p about socket options, I found that the socket module does not define all constants defined in the (Linux) socket header file. Below is a patch that adds a few more (note that the SOL_* constants should be used for the setsockopt() level, not the IPPROTO_* constants). 
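To illustrate that parenthetical about levels in present-day terms (a hedged sketch; SO_REUSEADDR is simply a convenient option that exists on every platform): options at the socket level are set with SOL_SOCKET as the level argument, while the IPPROTO_* values name protocol levels such as IP or TCP.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Socket-level option: the level argument is SOL_SOCKET, not an IPPROTO_* value.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
assert s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
s.close()
```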
--- socketmodule.c~ Sat Aug 7 17:56:05 1999 +++ socketmodule.c Sat Aug 7 18:10:07 1999 @@ -2005,14 +2005,48 @@ initsocket() PySocketSock_Type.tp_doc = sockettype_doc; Py_INCREF(&PySocketSock_Type); if (PyDict_SetItemString(d, "SocketType", (PyObject *)&PySocketSock_Type) != 0) return; + + /* Address families (we only support AF_INET and AF_UNIX) */ +#ifdef AF_UNSPEC + insint(moddict, "AF_UNSPEC", AF_UNSPEC); +#endif insint(d, "AF_INET", AF_INET); #ifdef AF_UNIX insint(d, "AF_UNIX", AF_UNIX); #endif /* AF_UNIX */ +#ifdef AF_AX25 + insint(moddict, "AF_AX25", AF_AX25); /* Amateur Radio AX.25 */ +#endif +#ifdef AF_IPX + insint(moddict, "AF_IPX", AF_IPX); /* Novell IPX */ +#endif +#ifdef AF_APPLETALK + insint(moddict, "AF_APPLETALK", AF_APPLETALK); /* Appletalk DDP */ +#endif +#ifdef AF_NETROM + insint(moddict, "AF_NETROM", AF_NETROM); /* Amateur radio NetROM */ +#endif +#ifdef AF_BRIDGE + insint(moddict, "AF_BRIDGE", AF_BRIDGE); /* Multiprotocol bridge */ +#endif +#ifdef AF_AAL5 + insint(moddict, "AF_AAL5", AF_AAL5); /* Reserved for Werner's ATM */ +#endif +#ifdef AF_X25 + insint(moddict, "AF_X25", AF_X25); /* Reserved for X.25 project */ +#endif +#ifdef AF_INET6 + insint(moddict, "AF_INET6", AF_INET6); /* IP version 6 */ +#endif +#ifdef AF_ROSE + insint(moddict, "AF_ROSE", AF_ROSE); /* Amateur Radio X.25 PLP */ +#endif + + /* Socket types */ insint(d, "SOCK_STREAM", SOCK_STREAM); insint(d, "SOCK_DGRAM", SOCK_DGRAM); #ifndef __BEOS__ /* We have incomplete socket support. */ insint(d, "SOCK_RAW", SOCK_RAW); @@ -2048,11 +2082,10 @@ initsocket() insint(d, "SO_OOBINLINE", SO_OOBINLINE); #endif #ifdef SO_REUSEPORT insint(d, "SO_REUSEPORT", SO_REUSEPORT); #endif - #ifdef SO_SNDBUF insint(d, "SO_SNDBUF", SO_SNDBUF); #endif #ifdef SO_RCVBUF insint(d, "SO_RCVBUF", SO_RCVBUF); @@ -2111,14 +2144,43 @@ initsocket() #ifdef MSG_ETAG insint(d, "MSG_ETAG", MSG_ETAG); #endif /* Protocol level and numbers, usable for [gs]etsockopt */ -/* Sigh -- some systems (e.g. 
Linux) use enums for these. */ #ifdef SOL_SOCKET insint(d, "SOL_SOCKET", SOL_SOCKET); #endif +#ifdef SOL_IP + insint(moddict, "SOL_IP", SOL_IP); +#else + insint(moddict, "SOL_IP", 0); +#endif +#ifdef SOL_IPX + insint(moddict, "SOL_IPX", SOL_IPX); +#endif +#ifdef SOL_AX25 + insint(moddict, "SOL_AX25", SOL_AX25); +#endif +#ifdef SOL_ATALK + insint(moddict, "SOL_ATALK", SOL_ATALK); +#endif +#ifdef SOL_NETROM + insint(moddict, "SOL_NETROM", SOL_NETROM); +#endif +#ifdef SOL_ROSE + insint(moddict, "SOL_ROSE", SOL_ROSE); +#endif +#ifdef SOL_TCP + insint(moddict, "SOL_TCP", SOL_TCP); +#else + insint(moddict, "SOL_TCP", 6); +#endif +#ifdef SOL_UDP + insint(moddict, "SOL_UDP", SOL_UDP); +#else + insint(moddict, "SOL_UDP", 17); +#endif #ifdef IPPROTO_IP insint(d, "IPPROTO_IP", IPPROTO_IP); #else insint(d, "IPPROTO_IP", 0); #endif @@ -2266,10 +2328,32 @@ initsocket() #ifdef IP_ADD_MEMBERSHIP insint(d, "IP_ADD_MEMBERSHIP", IP_ADD_MEMBERSHIP); #endif #ifdef IP_DROP_MEMBERSHIP insint(d, "IP_DROP_MEMBERSHIP", IP_DROP_MEMBERSHIP); +#endif +#ifdef IP_DEFAULT_MULTICAST_TTL + insint(moddict, "IP_DEFAULT_MULTICAST_TTL", IP_DEFAULT_MULTICAST_TTL); +#endif +#ifdef IP_DEFAULT_MULTICAST_LOOP + insint(moddict, "IP_DEFAULT_MULTICAST_LOOP", IP_DEFAULT_MULTICAST_LOOP); +#endif +#ifdef IP_MAX_MEMBERSHIPS + insint(moddict, "IP_MAX_MEMBERSHIPS", IP_MAX_MEMBERSHIPS); +#endif + + /* TCP options */ +#ifdef TCP_NODELAY + insint(moddict, "TCP_NODELAY", TCP_NODELAY); +#endif +#ifdef TCP_MAXSEG + insint(moddict, "TCP_MAXSEG", TCP_MAXSEG); +#endif + + /* IPX options */ +#ifdef IPX_TYPE + insint(moddict, "IPX_TYPE", IPX_TYPE); #endif /* Initialize gethostbyname lock */ #ifdef USE_GETHOSTBYNAME_LOCK gethostbyname_lock = PyThread_allocate_lock(); -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 146 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 7 22:15:08 1999 From: gstein at lyra.org 
(Greg Stein) Date: Sat, 07 Aug 1999 13:15:08 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> Message-ID: <37AC93CC.53982F3F@lyra.org> Fredrik Lundh wrote: > > > > > >>> import unicode > > > > >>> import marshal > > > > >>> u = unicode.unicode > > > > >>> s = u("foo") > > > > >>> data = marshal.dumps(s) > > > > >>> marshal.loads(data) > > > > 'f\000o\000o\000' > > > > >>> type(marshal.loads(data)) > > > > This was a "nicety" that was put in during a round of patches that I submitted to Guido. We both had questions about it but figured that it couldn't hurt since it at least let some things be marshalled out that couldn't be marshalled before. I would suggest backing out the marshalling of buffer-interface objects and adding a mechanism for arbitrary type objects to marshal themselves. Without the second part, arrays and Unicode objects aren't marshallable at all (seems bad). > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > that unicode objects use a two-byte character representation. Unicode objects should *not* implement the getcharbuffer slot. Only read, write, and segcount. > >>> import array > >>> import marshal > >>> a = array.array > >>> s = a("f", [1, 2, 3]) > >>> data = marshal.dumps(s) > >>> marshal.loads(data) > '\000\000\200?\000\000\000@\000\000@@' > > looks like the various implementors haven't > really understood the intentions of whoever > designed the buffer interface... Arrays can/should support both the getreadbuffer and getcharbuffer interface. The former: definitely. The latter: only if the contents are byte-sized. The loading back as a string is a different matter, as pointed out above.
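Greg's distinction is easier to see with the modern spellings of the array API (tobytes/frombytes here stand in for 1999's tostring/fromstring): the read-buffer view is just raw machine bytes, which only turn back into values when paired with the typecode -- exactly the information marshal was throwing away.

```python
import array

a = array.array("f", [1.0, 2.0, 3.0])
raw = a.tobytes()                 # the raw "read buffer" view of the array
assert len(raw) == 3 * a.itemsize

# The bytes alone are an opaque string, like marshal.loads() returned above;
# only together with the typecode can the floats be recovered.
b = array.array("f")
b.frombytes(raw)
assert b.tolist() == [1.0, 2.0, 3.0]
```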
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Sun Aug 8 22:20:52 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 08 Aug 1999 22:20:52 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by Greg Stein , Sat, 07 Aug 1999 13:15:08 -0700 , <37AC93CC.53982F3F@lyra.org> Message-ID: <19990808202057.DB803E267A@oratrix.oratrix.nl> Recently, Greg Stein said: > I would suggest backing out the marshalling of buffer-interface objects > and adding a mechanism for arbitrary type objects to marshal themselves. > Without the second part, arrays and Unicode objects aren't marshallable > at all (seems bad). This sounds like the right approach. It would require 2 slots in the tp_ structure and a little extra glue for the typecodes (currently marshal knows all the 1-letter typecodes for all object types it can handle), but types marshalling their own objects would require a centralized registry of object types. For the time being it would probably suffice to have the mapping of type<->letter be hardcoded in marshal.h, but eventually you probably want a more extensible scheme, where Joe R. Extension-Writer could add a marshaller to his objects and know it won't collide with someone else's. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 9 10:56:30 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 10:56:30 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990808202057.DB803E267A@oratrix.oratrix.nl> Message-ID: <37AE97BE.2CADF48E@lemburg.com> Jack Jansen wrote: > > Recently, Greg Stein said: > > I would suggest backing out the marshalling of buffer-interface objects > > and adding a mechanism for arbitrary type objects to marshal themselves.
> > Without the second part, arrays and Unicode objects aren't marshallable > > at all (seems bad). > > This sounds like the right approach. It would require 2 slots in the > tp_ structure and a little extra glue for the typecodes (currently > marshal knows all the 1-letter typecodes for all object types it can > handle), but types marshalling their own objects would require a > centralized registry of object types. For the time being it would > probably suffice to have the mapping of type<->letter be hardcoded in > marshal.h, but eventually you probably want a more extensible scheme, > where Joe R. Extension-Writer could add a marshaller to his objects > and know it won't collide with someone else's. This registry should ideally be reachable via C APIs. Then a module writer could call these APIs in the init function of his module and he'd be set. Since marshal won't be able to handle imports on the fly (like pickle et al.), these modules will have to be imported before unmarshalling. Aside: wouldn't it make sense to move from marshal to pickle and deprecate marshal altogether ? cPickle is quite fast and much more flexible than marshal, plus it already provides mechanisms for registering new types. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Mon Aug 9 15:49:44 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 09 Aug 1999 15:49:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "M.-A. Lemburg" , Mon, 09 Aug 1999 10:56:30 +0200 , <37AE97BE.2CADF48E@lemburg.com> Message-ID: <19990809134944.BB2FC303120@snelboot.oratrix.nl> > Aside: wouldn't it make sense to move from marshal to pickle and > deprecate marshal altogether ?
cPickle is quite fast and much more > flexible than marshal, plus it already provides mechanisms for > registering new types. This is probably the best idea so far. Just remove the buffer-workaround in marshal, keep it functioning for the things it is used for now (like pyc files) and refer people to (c)Pickle for new development. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 9 16:50:46 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 10:50:46 -0400 Subject: [Python-Dev] Some more constants for the socket module In-Reply-To: Your message of "Sat, 07 Aug 1999 18:14:56 +0200." <37AC5B80.56F740DD@lemburg.com> References: <37AC5B80.56F740DD@lemburg.com> Message-ID: <199908091450.KAA29179@eric.cnri.reston.va.us> Thanks for the socketmodule patch, Marc. This was on my mental TO-DO list for a long time! I've checked it in. (One note: I had a bit of trouble applying the patch; apparently your mailer expanded all tabs to spaces. Perhaps you could use attachments to mail diffs? Also, you seem to have renamed 'd' to 'moddict' but you didn't send the patch for that...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 9 18:26:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 12:26:28 -0400 Subject: [Python-Dev] preferred conference date? Message-ID: <199908091626.MAA29411@eric.cnri.reston.va.us> I need your input about the date of the next Python conference. Foretec is close to a deal for a Python conference in January 2000 at the Alexandria Old Town Hilton hotel. Given our requirement of a good location in the DC area, this is a very good deal (it's a brand new hotel).
The prices are high (they tell me that the whole conference will cost $900, with a room rate of $129) but it's a class A location (metro, tons of restaurants, close to National Airport, etc.) and we have found no cheaper DC hotel suitable for our purposes (even in drab suburban locations). I'm worried that I'll be flamed to hell for this by the PSA members, but I don't think we can get the price any lower without starting all over in a different location, probably causing several months of delay. If people won't come, Foretec (and I) will have learned a valuable lesson and we'll rethink the issue for the 2001 conference. Anyway, given that Foretec is likely to go with this hotel, we have a choice of two dates: January 16-19, or 23-26 (both starting on a Sunday with the tutorials). This is where I need your help: which date would you prefer? Please mail me personally. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Mon Aug 9 18:31:43 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 9 Aug 1999 11:31:43 -0500 (CDT) Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> References: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <14255.557.474160.824877@dolphin.mojam.com> Guido> The prices are high (they tell me that the whole conference will Guido> cost $900, with a room rate of $129) but it's a class A location No way I (or my company) can afford to plunk down $900 for me to attend... Skip From mal at lemburg.com Mon Aug 9 18:40:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 18:40:45 +0200 Subject: [Python-Dev] Some more constants for the socket module References: <37AC5B80.56F740DD@lemburg.com> <199908091450.KAA29179@eric.cnri.reston.va.us> Message-ID: <37AF048D.FC0A540@lemburg.com> Guido van Rossum wrote: > > Thanks for the socketmodule patch, Marc. This was on my mental TO-DO > list for a long time! I've checked it in. 
Cool, thanks. > (One note: I had a bit of trouble applying the patch; apparently your > mailer expanded all tabs to spaces. Perhaps you could use attachments > to mail diffs? Ok. > Also, you seem to have renamed 'd' to 'moddict' but > you didn't send the patch for that...) Oops, sorry... my "#define to insint" script uses 'd' as moddict, that's the reason why. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at CNRI.Reston.VA.US Mon Aug 9 19:30:36 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 13:30:36 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: Your message of "Mon, 09 Aug 1999 11:31:43 CDT." <14255.557.474160.824877@dolphin.mojam.com> References: <199908091626.MAA29411@eric.cnri.reston.va.us> <14255.557.474160.824877@dolphin.mojam.com> Message-ID: <199908091730.NAA29559@eric.cnri.reston.va.us> > Guido> The prices are high (they tell me that the whole conference will > Guido> cost $900, with a room rate of $129) but it's a class A location > > No way I (or my company) can afford to plunk down $900 for me to attend... Let me clarify this. The $900 is for the whole 4-day conference, including a day of tutorials and developers' day. I don't know what the exact price breakdown will be, but the tutorials will probably be $300. Last year the total price was $700, with $250 for tutorials. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 14:04:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 13:04:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts Message-ID: <199908101204.NAA29572@pukapuka.inrialpes.fr> Currently, dictionaries always grow until they are deallocated from memory. 
This happens in PyDict_SetItem according to the following code (before inserting the new item into the dict): /* if fill >= 2/3 size, double in size */ if (mp->ma_fill*3 >= mp->ma_size*2) { if (dictresize(mp, mp->ma_used*2) != 0) { if (mp->ma_fill+1 > mp->ma_size) return -1; } } The symmetric case is missing and this has intrigued me for a long time, but I've never had the courage to look deeply into this portion of code and try to propose a solution. Which is: reduce the size of the dict by half when the nb of used items <= 1/6 the size. This situation occurs far less frequently than dict growing, but anyways, it seems useful for the degenerate cases where a dict has a peak usage, then most of the items are deleted. This is usually the case for global dicts holding dynamic object collections, etc. A bonus effect of shrinking big dicts with deleted items is that the lookup speed may be improved, because of the cleaned entries and the reduced overall size (resulting in a better hit ratio). The (only) solution I could come up with for this problem is the appended patch. It is not immediately obvious, but in practice, it seems to work fine. (inserting a print statement after the condition, showing the dict size and current usage helps in monitoring what's going on). Any other ideas on how to deal with this? Thoughts, comments? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999 --- dictobject.c Tue Aug 10 12:21:15 1999 *************** *** 417,423 **** ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); return 0; } --- 417,430 ---- ep->me_value = NULL; mp->ma_used--; Py_DECREF(old_value); ! Py_DECREF(old_key); ! /* For bigger dictionaries, if used <= 1/6 size, half the size */ ! if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) { !
if (dictresize(mp, mp->ma_used*2) != 0) { ! if (mp->ma_fill > mp->ma_size) ! return -1; ! } ! } return 0; } From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 15:20:36 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 14:20:36 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 10, 99 01:04:27 pm" Message-ID: <199908101320.OAA21986@pukapuka.inrialpes.fr> I wrote: > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? > To clarify a bit what the patch does "as is", here's a short description: The code is triggered in PyDict_DelItem only for sizes which are > MINSIZE*4, i.e. greater than 4*4 = 16. Therefore, resizing will occur for a min size of 32 items. one third: 32 / 3 = 10, two thirds: 32 * 2/3 = 21, one sixth: 32 / 6 = 5. So the shrinking will happen for a dict size of 32, of which 5 items are used (the sixth was just deleted). After the dictresize, the size will be 16, of which 5 items are used, i.e. one third. The threshold is fixed by the first condition of the patch. It could be made 64, instead of 32. This is subject to discussion... Obviously, this is most useful for bigger dicts, not for small ones. A threshold of 32 items seemed to me to be a reasonable compromise. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Tue Aug 10 14:35:33 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 10 Aug 1999 14:35:33 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> Message-ID: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > > > > >>> import unicode > > > > > >>> import marshal > > > > > >>> u = unicode.unicode > > > > > >>> s = u("foo") > > > > > >>> data = marshal.dumps(s) > > > > > >>> marshal.loads(data) > > > > > 'f\000o\000o\000' > > > > > >>> type(marshal.loads(data)) > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > that unicode objects use a two-byte character representation. > > Unicode objects should *not* implement the getcharbuffer slot. Only > read, write, and segcount. unicode objects do not implement the getcharbuffer slot. here's the relevant descriptor: static PyBufferProcs unicode_as_buffer = { (getreadbufferproc) unicode_buffer_getreadbuf, (getwritebufferproc) unicode_buffer_getwritebuf, (getsegcountproc) unicode_buffer_getsegcount }; the array module uses a similar descriptor. maybe the unicode class shouldn't implement the buffer interface at all? sure looks like the best way to avoid trivial mistakes (the current behaviour of fp.write(unicodeobj) is even more serious than the marshal glitch...) or maybe the buffer design needs an overhaul? From guido at CNRI.Reston.VA.US Tue Aug 10 16:12:23 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Tue, 10 Aug 1999 10:12:23 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Tue, 10 Aug 1999 14:35:33 +0200." 
<000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <199908101412.KAA02065@eric.cnri.reston.va.us> > Greg Stein wrote: > > > > > > >>> import unicode > > > > > > >>> import marshal > > > > > > >>> u = unicode.unicode > > > > > > >>> s = u("foo") > > > > > > >>> data = marshal.dumps(s) > > > > > > >>> marshal.loads(data) > > > > > > 'f\000o\000o\000' > > > > > > >>> type(marshal.loads(data)) > > > > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > > that unicode objects use a two-byte character representation. > > > > Unicode objects should *not* implement the getcharbuffer slot. Only > > read, write, and segcount. > > unicode objects do not implement the getcharbuffer slot. > here's the relevant descriptor: > > static PyBufferProcs unicode_as_buffer = { > (getreadbufferproc) unicode_buffer_getreadbuf, > (getwritebufferproc) unicode_buffer_getwritebuf, > (getsegcountproc) unicode_buffer_getsegcount > }; > > the array module uses a similar descriptor. > > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) > > or maybe the buffer design needs an overhaul? I think most places that should use the charbuffer interface actually use the readbuffer interface. This is what should be fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Aug 10 19:53:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:53:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <37B06734.4339D3BF@lemburg.com> Fredrik Lundh wrote: > > unicode objects do not implement the getcharbuffer slot. >... > or maybe the buffer design needs an overhaul? I think its usage does. The character slot should be used whenever character data is needed, not the read buffer slot. The latter one is for passing around raw binary data (without reinterpretation !), if I understood Greg correctly back when I gave those abstract APIs a try. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 10 19:39:29 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:39:29 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <37B063D1.29F3106A@lemburg.com> Vladimir Marangozov wrote: > > Currently, dictionaries always grow until they are deallocated from > memory. This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict):
>
>     /* if fill >= 2/3 size, double in size */
>     if (mp->ma_fill*3 >= mp->ma_size*2) {
>         if (dictresize(mp, mp->ma_used*2) != 0) {
>             if (mp->ma_fill+1 > mp->ma_size)
>                 return -1;
>         }
>     }
>
> The symmetric case is missing and this has intrigued me for a long time, > but I've never had the courage to look deeply into this portion of code > and try to propose a solution. Which is: reduce the size of the dict by > half when the nb of used items <= 1/6 the size.
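[Editor's note: the two thresholds being proposed can be sketched in present-day Python. This is a hypothetical helper, not CPython code -- the real logic lives in C inside PyDict_SetItem -- and the `next_size` name and the size floor of 8 are assumptions made for illustration:]

```python
# Sketch of the resize policy under discussion (hypothetical helper):
# grow when fill reaches 2/3 of the table size, and -- the proposed
# symmetric case -- halve when used falls to 1/6 of the size.
def next_size(size, used, fill):
    if fill * 3 >= size * 2:        # existing rule: table >= 2/3 full
        return used * 2             # the resize target is based on "used"
    if used * 6 <= size:            # proposed rule: <= 1/6 actually used
        return max(size // 2, 8)    # shrink by half, with a small floor
    return size                     # otherwise leave the table alone

assert next_size(8, 6, 6) == 12     # growing dict is resized to used*2
assert next_size(64, 4, 4) == 32    # sparse dict would be halved
assert next_size(64, 30, 40) == 64  # healthy dict is left alone
```

Whether the shrink branch is worth a check on every operation is exactly the question the thread goes on to debate.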
> > This situation occurs far less frequently than dict growing, but anyways, > it seems useful for the degenerate cases where a dict has a peak usage, > then most of the items are deleted. This is usually the case for global > dicts holding dynamic object collections, etc. > > A bonus effect of shrinking big dicts with deleted items is that > the lookup speed may be improved, because of the cleaned entries > and the reduced overall size (resulting in a better hit ratio). > > The (only) solution I could come up with for this problem is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? I think that integrating this into the C code is not really that effective since the situation will not occur that often and then it is often better to let the programmer decide rather than integrate an automatic downsize. You can call dict.update({}) to force an internal resize (the empty dictionary can be made global since it is not manipulated in any way and thus does not cause creation overhead). Perhaps a new method .resize(approx_size) would make this even clearer. This would also have the benefit of allowing a programmer to force allocation of the wanted size, e.g.

    d = {}
    d.resize(10000)
    # Insert 10000 items in a batch insert

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 21:58:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 20:58:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B063D1.29F3106A@lemburg.com> from "M.-A.
Lemburg" at "Aug 10, 99 07:39:29 pm" Message-ID: <199908101958.UAA22028@pukapuka.inrialpes.fr> M.-A. Lemburg wrote: > > [me] > > Any other ideas on how to deal with this? Thoughts, comments? > > I think that integrating this into the C code is not really that > effective since the situation will not occur that often and then > it is often better to let the programmer decide rather than integrate > an automatic downsize. Agreed that the situation is rare. But if it occurs, it's Python's responsibility to manage its data structures (and system resources) efficiently. As a programmer, I really don't want to be bothered with internals -- I trust the interpreter for that. Moreover, how could I decide that at some point, some dict needs to be resized in my fairly big app, say IDLE? > > You can call dict.update({}) to force an internal > resize (the empty dictionary can be made global since it is not > manipulated in any way and thus does not cause creation overhead). I know that I can force the resize in other ways, but this is not the point. I'm usually against the idea of changing the programming logic because of my advanced knowledge of the internals. > > Perhaps a new method .resize(approx_size) would make this even > clearer. This would also have the benefit of allowing a programmer > to force allocation of the wanted size, e.g.
> >
> > d = {}
> > d.resize(10000)
> > # Insert 10000 items in a batch insert
This is interesting, but the two ideas are not mutually exclusive. Python has to downsize dicts automatically (just the same way it doubles the size automatically). Offering more through an API is a plus for hackers. ;-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Tue Aug 10 22:19:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 10 Aug 1999 22:19:46 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101958.UAA22028@pukapuka.inrialpes.fr> Message-ID: <37B08962.6DFB3F0@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > [me] > > > Any other ideas on how to deal with this? Thoughts, comments? > > > > I think that integrating this into the C code is not really that > > effective since the situation will not occur that often and then > > it is often better to let the programmer decide rather than integrate > > an automatic downsize. > > Agreed that the situation is rare. But if it occurs, it's Python's > responsibility to manage its data structures (and system resources) > efficiently. As a programmer, I really don't want to be bothered with > internals -- I trust the interpreter for that. Moreover, how could > I decide that at some point, some dict needs to be resized in my > fairly big app, say IDLE? You usually don't ;-) because "normal" dicts only grow (well, more or less). The downsizing thing will only become a problem if you use dictionaries in certain algorithms and there you handle the problem manually. My stack implementation uses the same trick, BTW. Memory is cheap and with an extra resize method (which the mxStack implementation has), problems can be dealt with explicitly for everyone to see in the code. > > You can call dict.update({}) to force an internal > > resize (the empty dictionary can be made global since it is not > > manipulated in any way and thus does not cause creation overhead). > > I know that I can force the resize in other ways, but this is not > the point. I'm usually against the idea of changing the programming > logic because of my advanced knowledge of the internals. True, that's why I mentioned... > > > > Perhaps a new method .resize(approx_size) would make this even > > clearer. This would also have the benefit of allowing a programmer > > to force allocation of the wanted size, e.g.
> >
> > d = {}
> > d.resize(10000)
> > # Insert 10000 items in a batch insert
>
> This is interesting, but the two ideas are not mutually exclusive. > Python has to downsize dicts automatically (just the same way it doubles > the size automatically). Offering more through an API is a plus for > hackers. ;-) It's not really for hackers: the point is that it makes the technique visible and understandable (as opposed to the hack above). The same could be useful for lists too (the hack there is l = [None] * size, which I find rather difficult to understand at first sight...). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Wed Aug 11 00:39:30 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 11 Aug 1999 08:39:30 +1000 Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B08962.6DFB3F0@lemburg.com> Message-ID: <010901bee381$36ee5d30$1101a8c0@bobcat> Looking over the messages from Marc and Vladimir, I'm going to add my 2c worth. IMO, Marc's position is untenable iff it can be demonstrated that the "average" program is likely to see "sparse" dictionaries, and such dictionaries have an adverse effect on either speed or memory. The analogy is quite simple - you don't need to manually resize lists or dicts before inserting (to allocate more storage - an internal implementation issue) so neither should you need to manually resize when deleting (to reclaim that storage - still internal implementation). Suggesting that the allocation of resources should be automatic, but the recycling of them not be automatic flies in the face of everything else - eg, you don't need to delete each object - when it is no longer referenced, its memory is reclaimed automatically.
Marc's position is only reasonable if the specific case we are talking about is very very rare, and unlikely to be hit by anyone with normal, real-world requirements or programs. In this case, exposing the implementation detail is reasonable. So, the question comes down to: "What is the benefit of Vladimir's patch?" Maybe we need some metrics on some dictionaries. For example, maybe a doctored Python that kept stats for each dictionary and logged this info. The output of this should be able to tell you what savings you could possibly expect. If you find that the average program really would not benefit at all (say only a few K from a small number of dicts) then the horse was probably dead well before we started flogging it. If however you can demonstrate serious benefits could be achieved, then interest may pick up and I too would lobby for automatic downsizing. Mark. From tim_one at email.msn.com Wed Aug 11 07:30:20 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 01:30:20 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <000001bee3ba$9b226f60$8d2d2399@tim> [Vladimir] > Currently, dictionaries always grow until they are deallocated from > memory. It's more accurate to say they never shrink <0.9 wink>. Even that has exceptions, though, starting with: > This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict):
>
>     /* if fill >= 2/3 size, double in size */
>     if (mp->ma_fill*3 >= mp->ma_size*2) {
>         if (dictresize(mp, mp->ma_used*2) != 0) {
>             if (mp->ma_fill+1 > mp->ma_size)
>                 return -1;
>         }
>     }
This code can shrink the dict too. The load factor computation is based on "fill", but the resize is based on "used". If you grow a huge dict, then delete all the entries one by one, "used" falls to 0 but "fill" stays at its high-water mark.
At least 1/3rd of the entries are NULL, so "fill" continues to climb as keys are added again: when the load factor computation triggers again, "used" may be as small as 1, and dictresize can shrink the dict dramatically. The only clear a priori return I see in your patch is that I might save memory if I delete gobs of stuff from a dict and then neither get rid of it nor add keys to it again. But my programs generally grow dicts forever, grow then delete them entirely, or cycle through fat and lean times (in which case the code above already shrinks them from time to time). So I don't expect that your patch would buy me anything I want, but would cost me more on every delete. > ... > Any other ideas on how to deal with this? Thoughts, comments? Just that slowing the expected case to prevent theoretical bad cases is usually a net loss -- I think the onus is on you to demonstrate that this change is an exception to that rule. I do recall one real-life complaint about it on c.l.py a couple years ago: the poster had a huge dict, eventually deleted most of the items, and then kept it around purely for lookups. They were happy enough to copy the dict into a fresh one a key+value pair at a time; today they could just do

    d = d.copy()

or even

    d.update({})

to shrink the dict. It would certainly be good to document these tricks! if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim From tim_one at email.msn.com Wed Aug 11 08:45:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 02:45:49 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <000201bee3c5$25b47b00$8d2d2399@tim> [Guido] > ... > The prices are high (they tell me that the whole conference will cost > $900, with a room rate of $129) Is room rental in addition to, or included in, that $900? > ...
> I'm worried that I'll be flamed to hell for this by the PSA members, So have JulieK announce it . > ... > Anyway, given that Foretec is likely to go with this hotel, we have a > choice of two dates: January 16-19, or 23-26 (both starting on a > Sunday with the tutorials). This is where I need your help: which > date would you prefer? 23-26 for me; 16-19 may not be doable. or-everyone-can-switch-to-windows-and-we'll-do-the-conference-via-netmeeting-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Wed Aug 11 16:33:17 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 11 Aug 1999 15:33:17 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000001bee3ba$9b226f60$8d2d2399@tim> from "Tim Peters" at "Aug 11, 99 01:30:20 am" Message-ID: <199908111433.PAA31842@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir] > > Currently, dictionaries always grow until they are deallocated from > > memory. > > It's more accurate to say they never shrink <0.9 wink>. Even that has > exceptions, though, starting with: > > > This happens in PyDict_SetItem according to the following > > code (before inserting the new item into the dict):
> >
> >     /* if fill >= 2/3 size, double in size */
> >     if (mp->ma_fill*3 >= mp->ma_size*2) {
> >         if (dictresize(mp, mp->ma_used*2) != 0) {
> >             if (mp->ma_fill+1 > mp->ma_size)
> >                 return -1;
> >         }
> >     }
>
> This code can shrink the dict too. The load factor computation is based on > "fill", but the resize is based on "used". If you grow a huge dict, then > delete all the entries one by one, "used" falls to 0 but "fill" stays at its > high-water mark. At least 1/3rd of the entries are NULL, so "fill" > continues to climb as keys are added again: when the load factor > computation triggers again, "used" may be as small as 1, and dictresize can > shrink the dict dramatically. Thanks for clarifying this! > [snip] > > > ... > > Any other ideas on how to deal with this? Thoughts, comments?
> > Just that slowing the expected case to prevent theoretical bad cases is > usually a net loss -- I think the onus is on you to demonstrate that this > change is an exception to that rule. I won't, because this case is rare in practice, classifying it already as an exception. A real exception. I'll have to think a bit more about all this. Adding 1/3 new entries to trigger the next resize sounds suboptimal (if it happens at all). > I do recall one real-life complaint > about it on c.l.py a couple years ago: the poster had a huge dict, > eventually deleted most of the items, and then kept it around purely for > lookups. They were happy enough to copy the dict into a fresh one a > key+value pair at a time; today they could just do
>
> d = d.copy()
>
> or even
>
> d.update({})
>
> to shrink the dict. > > It would certainly be good to document these tricks! I think that officializing these tricks in the documentation is a bad idea. > > if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, but don't forget that you've been around, teaching people various tricks. And 1999 is special -- we just had a solar eclipse today, the next being scheduled for 2081. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Wed Aug 11 16:07:44 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 11 Aug 1999 16:07:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. ok. btw, how about adding support for buffer access to data that have strange internal formats (like certain PIL image memories) or aren't directly accessible (like "virtual" and "abstract" image buffers in PIL 1.1). something like:

    int initbuffer(PyObject* obj, void** context);
    int exitbuffer(PyObject* obj, void* context);

and corresponding context arguments to the rest of the functions... From guido at CNRI.Reston.VA.US Wed Aug 11 16:42:10 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 11 Aug 1999 10:42:10 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Wed, 11 Aug 1999 16:07:44 +0200." <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <199908111442.KAA04423@eric.cnri.reston.va.us> > btw, how about adding support for buffer access > to data that have strange internal formats (like certain PIL image memories) or aren't directly accessible > (like "virtual" and "abstract" image buffers in PIL 1.1).
> something like:
>
>     int initbuffer(PyObject* obj, void** context);
>     int exitbuffer(PyObject* obj, void* context);
>
> and corresponding context arguments to the > rest of the functions... Can you explain this idea more? Without more understanding of PIL I have no idea what you're talking about... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Thu Aug 12 07:15:39 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 01:15:39 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908111433.PAA31842@pukapuka.inrialpes.fr> Message-ID: <000301bee481$b78ae5c0$4e2d2399@tim> [Tim] >> ...slowing the expected case to prevent theoretical bad cases is >> usually a net loss -- I think the onus is on you to demonstrate >> that this change is an exception to that rule. [Vladimir Marangozov] > I won't, because this case is rare in practice, classifying it already > as an exception. A real exception. I'll have to think a bit more about > all this. Adding 1/3 new entries to trigger the next resize sounds > suboptimal (if it happens at all). "Suboptimal" with respect to which specific cost model? Exhibiting a specific bad case isn't compelling, and especially not when it's considered to be "a real exception". Adding new expense to every delete is an obvious new burden -- where's the payback, and is the expected net effect amortized across all dict usage a win or loss? Offhand it sounds like a small loss to me, although I haven't worked up a formal cost model either . > ... > I think that officializing these tricks in the documentation is a > bad idea. It's rarely a good idea to keep truths secret, although implementation-du-jour tricks don't belong in the current doc set. Probably in a HowTo.
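[Editor's note: the effect behind these tricks is still observable in a modern CPython -- a hedged illustration, since today's dict internals differ and d.update({}) no longer forces a resize there; rebuilding via copy() is what sheds the oversized table:]

```python
import sys

# Present-day sketch of the "rebuild to shrink" trick: grow a dict,
# delete every key, and compare the hollowed-out dict's footprint with
# that of a freshly built copy.
d = dict.fromkeys(range(100000))
for k in list(d):
    del d[k]            # d is now empty, but its table is not freed

hollow = sys.getsizeof(d)        # still sized for its peak population
fresh = sys.getsizeof(d.copy())  # a copy gets a minimal new table
assert len(d) == 0
assert fresh < hollow
```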
>> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, I actually hate that kind of argument -- it's one of *Guido's* favorites, and in his current silent state I'm simply channeling him . > but don't forget that you've been around, teaching people various > tricks. As I said, this particular trick has come up only once in real life in my experience; it's never come up in my own code; it's an anti-FAQ. People are 100x more likely to whine about theoretical quadratic-time list growth nobody has ever encountered (although it looks like they may finally get it under an out-of-the-box BDW collector!). > And 1999 is special -- we just had a solar eclipse today, the next being > scheduled for 2081. Ya, like any of us will survive Y2K to see it . 1999-is-special-cuz-it's-the-end-of-civilization-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Thu Aug 12 20:22:06 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 12 Aug 1999 19:22:06 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908121822.TAA40444@pukapuka.inrialpes.fr> Just curious: Is python with vs. without "-O" equivalent today regarding line numbers? Are SET_LINENO opcodes a plus in some situations or not? Next, I see quite often several SET_LINENO in a row in the beginning of code objects due to doc strings, etc. Since I don't think that folding them into one SET_LINENO would be an optimisation (it would rather be avoiding the redundancy), is it possible and/or reasonable to do something in this direction? A trivial example:

>>> def f():
...     "This is a comment about f"
...     a = 1
...
>>> import dis
>>> dis.dis(f)
          0 SET_LINENO          1
          3 SET_LINENO          2
          6 SET_LINENO          3
          9 LOAD_CONST          1 (1)
         12 STORE_FAST          0 (a)
         15 LOAD_CONST          2 (None)
         18 RETURN_VALUE
>>>

Can the above become something like this instead:

          0 SET_LINENO          3
          3 LOAD_CONST          1 (1)
          6 STORE_FAST          0 (a)
          9 LOAD_CONST          2 (None)
         12 RETURN_VALUE

-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Fri Aug 13 00:02:06 1999 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 00:02:06 +0200 Subject: [Python-Dev] about line numbers In-Reply-To: Message by Vladimir Marangozov , Thu, 12 Aug 1999 19:22:06 +0100 (NFT) , <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <19990812220211.B3CED993@oratrix.oratrix.nl> The only possible problem I can see with folding linenumbers is if someone sets a breakpoint on such a line. And I think it'll be difficult to explain the missing line numbers to pdb, so there isn't an easy workaround (at least, it takes more than my 30 seconds of brainpower to come up with one:-). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:10:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:10:26 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000301bee481$b78ae5c0$4e2d2399@tim> from "Tim Peters" at "Aug 12, 99 01:15:39 am" Message-ID: <199908122310.AAA29618@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Tim] > >> ...slowing the expected case to prevent theoretical bad cases is > >> usually a net loss -- I think the onus is on you to demonstrate > >> that this change is an exception to that rule.
> > [Vladimir Marangozov] > > I won't, because this case is rare in practice, classifying it already > > as an exception. A real exception. I'll have to think a bit more about > > all this. Adding 1/3 new entries to trigger the next resize sounds > > suboptimal (if it happens at all). > > "Suboptimal" with respect to which specific cost model? Exhibiting a > specific bad case isn't compelling, and especially not when it's considered > to be "a real exception". Adding new expense to every delete is an obvious > new burden -- where's the payback, and is the expected net effect amortized > across all dict usage a win or loss? Offhand it sounds like a small loss to > me, although I haven't worked up a formal cost model either . C'mon Tim, don't try to impress me with cost models. I'm already impressed :-) Anyways, I've looked at some traces. As expected, the conclusion is that this case is extremely rare wrt the average dict usage. There are 3 reasons: (1) dicts are usually deleted entirely, (2) del d[key] is rare in practice, and (3) often d[key] = None is used instead of (2). There is, however, a small percentage of dicts which are used below 1/3 of their size. I must say, below 1/3 of their peak size, because downsizing is also rare. To trigger a downsize, 1/3 new entries of the peak size must be inserted. Besides these observations, after looking at the code one more time, I can't really understand why the resize logic is based on the "fill" watermark and not on "used". fill = used + dummy, but since lookdict returns the first free slot (null or dummy), I don't really see what's the point of using a fill watermark... Perhaps you can enlighten me on this. Using only the "used" metric seems fine to me. I even deactivated "fill" and replaced it with "used" to see what happens -- no visible changes, except a tiny speedup I'm willing to neglect.
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:21:48 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:21:48 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <19990812220211.B3CED993@oratrix.oratrix.nl> from "Jack Jansen" at "Aug 13, 99 00:02:06 am" Message-ID: <199908122321.AAA29572@pukapuka.inrialpes.fr> Jack Jansen wrote: > > > The only possible problem I can see with folding linenumbers is if > someone sets a breakpoint on such a line. And I think it'll be > difficult to explain the missing line numbers to pdb, so there isn't > an easy workaround (at least, it takes more than my 30 seconds of > brainpoewr to come up with one:-). > Eek! We can set a breakpoint on a doc string? :-) There's no code in there. It should be treated as a comment by pdb. I can't set a breakpoint on a comment line even in C ;-) There must be something deeper about it... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Fri Aug 13 02:07:32 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 20:07:32 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <000101bee51f$d7601de0$fb2d2399@tim> [Vladimir Marangozov] > Is python with vs. without "-O" equivalent today regarding > line numbers? > > Are SET_LINENO opcodes a plus in some situations or not? In theory it should make no difference, except that the trace mechanism makes a callback on each SET_LINENO, and that's how the debugger implements line-number breakpoints. Under -O, there are no SET_LINENOs, so debugger line-number breakpoints don't work under -O. 
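[Editor's note: the callback machinery Tim describes can be sketched in present-day terms -- SET_LINENO itself is long gone, but the per-line trace event it used to drive is still how pdb implements line breakpoints. The `tracer` name below is an assumption for illustration:]

```python
import sys

# The debugger's line-number breakpoints ride on per-line trace events
# (fired by SET_LINENO in 1.5.2; by the 'line' trace event today).
lines_seen = []

def tracer(frame, event, arg):
    if event == "line":
        lines_seen.append(frame.f_lineno)
    return tracer   # keep tracing inside the called function

def f():
    a = 1
    b = 2
    return a + b

sys.settrace(tracer)
f()
sys.settrace(None)
assert len(lines_seen) == 3   # one 'line' event per executed line of f()
```

Every executed line costs a full Python-level callback while tracing is on, which is exactly the expense Tim objects to.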
I think there's also a sporadic buglet, which I've never bothered to track down: sometimes a line number reported in a traceback under -O (&, IIRC, it's always the topmost line number) comes out as a senseless negative value. > Next, I see quite often several SET_LINENO in a row in the beginning > of code objects due to doc strings, etc. Since I don't think that > folding them into one SET_LINENO would be an optimisation (it would > rather be avoiding the redundancy), is it possible and/or reasonable > to do something in this direction? All opcodes consume time, although a wasted trip or two around the eval loop at the start of a function isn't worth much effort to avoid. Still, it's a legitimate opportunity for provable speedup, even if unmeasurable speedup . Would be more valuable to rethink the debugger's breakpoint approach so that SET_LINENO is never needed (line-triggered callbacks are expensive because called so frequently, turning each dynamic SET_LINENO into a full-blown Python call; if I used the debugger often enough to care , I'd think about munging in a new opcode to make breakpoint sites explicit). immutability-is-made-to-be-violated-ly y'rs - tim From tim_one at email.msn.com Fri Aug 13 06:53:38 1999 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 13 Aug 1999 00:53:38 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908122307.AAA06018@pukapuka.inrialpes.fr> Message-ID: <000101bee547$cffaa020$992d2399@tim> [Vladimir Marangozov, *almost* seems ready to give up on a counterproductive dict pessimization ] > ... > There is, however, a small percentage of dicts which are used > below 1/3 of their size. I must say, below 1/3 of their peak size, > because downsizing is also rare. To trigger a downsize, 1/3 new > entries of the peak size must be inserted. Not so, although "on average" 1/6 may be correct. Look at an extreme: Say a dict has size 333 (it can't, but it makes the math obvious ...). Say it contains 221 items.
Now someone deletes them all, one at a time. used==0 and fill==221 at this point. They insert one new key that happens to hit one of the 333-221 = 112 remaining NULL keys. Then used==1 and fill==222. They insert a 2nd key, and before the dict is searched the new fill of 222 triggers the 2/3rds load-factor resizing -- which asks for a new size of 1*2 == 2. For the minority of dicts that go up and down in size wildly many times, the current behavior is fine. > Besides these observations, after looking at the code one more > time, I can't really understand why the resize logic is based on > the "fill" watermark and not on "used". fill = used + dummy, but > since lookdict returns the first free slot (null or dummy), I don't > really see what's the point of using a fill watermark... Let's just consider an unsuccessful search. Then it does return "the first" free slot, but not necessarily at the time it *sees* the first free slot. So long as it sees a dummy, it has to keep searching; the search doesn't end until it finds a NULL. So consider this, assuming the resize triggered only on "used":

    import random, sys

    d = {}
    for i in xrange(50000):
        d[random.randrange(1000000)] = 1
    for k in d.keys():
        del d[k]
    # now there are 50000 dummy dict keys, and some number of NULLs
    # loop invariant: used == 0
    for i in xrange(sys.maxint):
        j = random.randrange(10000000)
        d[j] = 1
        del d[j]
        assert not d.has_key(i)

However many NULL slots remained, the last loop eventually transforms them *all* into dummies. The dummies act exactly like "real keys" with respect to expected time for an unsuccessful search, which is why it's thoroughly appropriate to include dummies in the load factor computation. The loop will run slower and slower as the percentage of dummies approaches 100%, and each failing has_key approaches O(N) time.
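[Editor's note: the dummy effect described above can be reproduced with a toy open-addressing table. This is a sketch -- it uses simple linear probing rather than CPython's actual probe sequence, and the class and method names are invented for illustration:]

```python
# Toy open-addressing table. Deleting a key leaves a "dummy": used
# drops but fill does not, and an unsuccessful search must probe past
# every dummy until it finds a NULL slot.
NULL, DUMMY = object(), object()

class ToyDict:
    def __init__(self, size=8):
        self.slots = [NULL] * size
        self.used = 0               # live keys
        self.fill = 0               # live keys + dummies

    def insert(self, key):
        i = hash(key) % len(self.slots)
        reusable = None
        while self.slots[i] is not NULL and self.slots[i] != key:
            if self.slots[i] is DUMMY and reusable is None:
                reusable = i        # first free (dummy) slot seen
            i = (i + 1) % len(self.slots)
        if self.slots[i] == key:
            return                  # already present
        if reusable is not None:
            self.slots[reusable] = key   # reusing a dummy: fill unchanged
        else:
            self.slots[i] = key          # claiming a NULL: fill grows
            self.fill += 1
        self.used += 1

    def delete(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] != key:
            i = (i + 1) % len(self.slots)
        self.slots[i] = DUMMY       # used drops, fill stays put
        self.used -= 1

    def probes_for_miss(self, key):
        # Only a NULL ends an unsuccessful search; dummies don't.
        i, n = hash(key) % len(self.slots), 1
        while self.slots[i] is not NULL:
            i, n = (i + 1) % len(self.slots), n + 1
        return n

t = ToyDict()
for k in range(6):                  # churn: insert then delete each key
    t.insert(k)
    t.delete(k)
assert (t.used, t.fill) == (0, 6)   # empty, yet 6 of 8 slots are dummies
assert t.probes_for_miss(0) == 7    # a miss crawls past every dummy
assert t.probes_for_miss(6) == 1    # slot 6 is still NULL
```

With the table empty the whole time ("used" never exceeds 1), a failed lookup still costs 7 probes -- which is why the load factor has to count dummies.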
In most hash table implementations that's the worst that can happen (and it's a disaster), but under Python's implementation it's worse: Python never checks to see whether the probe sequence "wraps around", so the first search after the last NULL is changed to a dummy never ends. Counting the dummies in the load-factor computation prevents all that: no matter how much inserts and deletes are intermixed, the "effective load factor" stays under 2/3rds so gives excellent expected-case behavior; and it also protects against an all-dummy dict, making the lack of an expensive inner-loop "wrapped around?" check safe. > Perhaps you can enlighten me on this. Using only the "used" metrics > seems fine to me. I even deactivated "fill" and replaced it with "used" > to see what happens -- no visible changes, except a tiny speedup I'm > willing to neglect. You need a mix of deletes and inserts for the dummies to make a difference; dicts that always grow don't have dummies, so they're not likely to have any dummy-related problems either . Try this (untested):

    import time
    from random import randrange

    N = 1000
    thatmany = [None] * N
    d = {}
    while 1:
        start = time.clock()
        for i in thatmany:
            d[randrange(10000000)] = 1
        for i in d.keys():
            del d[i]
        finish = time.clock()
        print round(finish - start, 3)

Succeeding iterations of the outer loop should grow dramatically slower, and finally get into an infinite loop, despite that "used" never exceeds N. Short course rewording: for purposes of predicting expected search time, a dummy is the same as a live key, because finding a dummy doesn't end a search -- it has to press on until either finding the key it was looking for, or finding a NULL. And with a mix of insertions and deletions, and if the hash function is doing a good job, then over time all the slots in the table will become either live or dummy, even if "used" stays within a very small range. So, that's why .
dictobject-may-be-the-subtlest-object-there-is-ly y'rs - tim From gstein at lyra.org Fri Aug 13 11:13:55 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 02:13:55 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: On Tue, 10 Aug 1999, Fredrik Lundh wrote: >... > unicode objects do not implement the getcharbuffer slot. This is Goodness. All righty. >... > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way It is needed for fp.write(unicodeobj) ... It is also very handy for C functions to deal with Unicode strings. > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) What's wrong with fp.write(unicodeobj)? It should write the unicode value to the file. Are you suggesting that it will need to be done differently? Icky. > or maybe the buffer design needs an overhaul? Not that I know of. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Aug 13 12:36:13 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 03:36:13 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: On Tue, 10 Aug 1999, Guido van Rossum wrote: >... > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. I believe that I properly changed all of these within the core distribution. Per your requested design, third-party extensions must switch from "s#" to "t#" to move to the charbuffer interface, as needed. 
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 15:47:05 1999
From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov)
Date: Fri, 13 Aug 1999 14:47:05 +0100 (NFT)
Subject: [Python-Dev] about line numbers
In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm"
Message-ID: <199908131347.OAA30740@pukapuka.inrialpes.fr>

Tim Peters wrote:
> [Vladimir Marangozov, *almost* seems ready to give up on a
> counter-productive dict pessimization <wink>]

Of course I will! Now everything is perfectly clear. Thanks.

> ...
> So, that's why <wink>.

Now, *this* one explanation of yours should go into a HowTo/BecauseOf for developers. I timed your scripts and a couple of mine which attest (again) to the validity of the current implementation. My patch is out of bounds. It even disturbs from time to time the existing harmony in the results ;-) because of early resizing.

All in all, for performance reasons, dicts remain an exception to the rule of releasing memory ASAP. They have been designed to tolerate caching because of their dynamics, which is the main reason for the rare case addressed by my patch.

--
Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
> Per your requested design, third-party extensions must
> switch from "s#" to "t#" to move to the charbuffer interface, as needed.

Shouldn't this be the other way around ? After all, extensions using "s#" do expect character data and not arbitrary binary encodings of information. IMHO, the latter should be special cased, not the former. E.g. it doesn't make sense to use the re module to scan over 2-byte Unicode with single character based search patterns.

Aside: Is the buffer interface reachable in any way from within Python ? Why isn't the interface exposed via __XXX__ methods on normal Python instances (could be implemented by returning a buffer object) ?

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 140 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From fdrake at acm.org Fri Aug 13 17:32:40 1999
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Fri, 13 Aug 1999 11:32:40 -0400 (EDT)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
In-Reply-To: <37B45577.7772CAA1@lemburg.com>
References: <37B45577.7772CAA1@lemburg.com>
Message-ID: <14260.15000.398399.840716@weyr.cnri.reston.va.us>

M.-A. Lemburg writes:
> Aside: Is the buffer interface reachable in any way from within
> Python ? Why isn't the interface exposed via __XXX__ methods
> on normal Python instances (could be implemented by returning a
> buffer object) ?

Would it even make sense? I thought a large part of the intent was performance, avoiding memory copies. Perhaps there should be an .__as_buffer__() which returned an object that supports the C buffer interface. I'm not sure how useful it would be; perhaps for classes that represent image data? They could return a buffer object created from a string/array/NumPy array.

-Fred

--
Fred L. Drake, Jr.
Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 17:59:12 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 17:59:12 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies. looks like there's some confusion here over what the buffer interface is all about. time for a new GvR essay, perhaps? From fdrake at acm.org Fri Aug 13 18:22:09 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Aug 1999 12:22:09 -0400 (EDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> Message-ID: <14260.17969.497916.382752@weyr.cnri.reston.va.us> Fredrik Lundh writes: > looks like there's some confusion here over > what the buffer interface is all about. time > for a new GvR essay, perhaps? If he'll write something about it, I'll be glad to adapt it to the extending & embedding manual. It seems important that it be included in the standard documentation since it will be important for extension writers to understand when they should implement it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 18:34:46 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 18:34:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us>
Message-ID: <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com>

Guido van Rossum wrote:
> > btw, how about adding support for buffer access
> > to data that have strange internal formats (like certain
> > PIL image memories) or isn't directly accessible
> > (like "virtual" and "abstract" image buffers in PIL 1.1).
> > something like:
> >
> >     int initbuffer(PyObject* obj, void** context);
> >     int exitbuffer(PyObject* obj, void* context);
> >
> > and corresponding context arguments to the
> > rest of the functions...
>
> Can you explain this idea more? Without more understanding of PIL I
> have no idea what you're talking about...

in code:

    void* context;

    // this can be done at any time
    segments = pb->getsegcount(obj, NULL, context);

    if (!pb->bf_initbuffer(obj, &context))
        ... failed to initialise buffer api ...

    ... allocate segment size buffer ...
    pb->getsegcount(obj, &bytes, context);
    ... calculate total buffer size and allocate buffer ...

    for (i = offset = 0; i < segments; i++) {
        n = pb->getreadbuffer(obj, i, &p, context);
        if (n < 0)
            ... failed to fetch a given segment ...
        memcpy(buf + offset, p, n); // or write to file, or whatever
        offset = offset + n;
    }

    pb->bf_exitbuffer(obj, context);

in other words, this would give the target object a chance to keep some local context (like a temporary buffer) during a sequence of buffer operations... for PIL, this would make it possible to

1) store required metadata (size, mode, palette) along with the actual buffer contents.
2) possibly pack formats that use extra internal storage for performance reasons -- RGB pixels are stored as 32-bit integers, for example.

3) access virtual image memories (that can only be accessed via a buffer-like interface in themselves -- given an image object, you acquire an access handle, and use a getdata method to access the actual data. without initbuffer, there's no way to do two buffer accesses in parallel. without exitbuffer, there's no way to release the access handle. without the context variable, there's nowhere to keep the access handle between calls.)

4) access abstract image memories (like virtual memories, but they reside outside PIL, like on a remote server, or inside another image processing library, or on a hardware device).

5) convert to external formats on the fly: fp.write(im.buffer("JPEG"))

and probably a lot more. as far as I can tell, nothing of this can be done using the current design...

... besides, what about buffers and threads? if you return a pointer from getreadbuf, wouldn't it be good to know exactly when Python doesn't need that pointer any more? explicit initbuffer/exitbuffer calls around each sequence of buffer operations would make that a lot safer...

From mal at lemburg.com Fri Aug 13 21:16:44 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Fri, 13 Aug 1999 21:16:44 +0200
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us>
Message-ID: <37B46F1C.1A513F33@lemburg.com>

Fred L. Drake, Jr. wrote:
> M.-A. Lemburg writes:
> > Aside: Is the buffer interface reachable in any way from within
> > Python ? Why isn't the interface exposed via __XXX__ methods
> > on normal Python instances (could be implemented by returning a
> > buffer object) ?
>
> Would it even make sense? I though a large part of the intent was
> to for performance, avoiding memory copies.
> Perhaps there should be
> an .__as_buffer__() which returned an object that supports the C
> buffer interface. I'm not sure how useful it would be; perhaps for
> classes that represent image data? They could return a buffer object
> created from a string/array/NumPy array.

That's what I had in mind.

    def __getreadbuffer__(self):
        return buffer(self.data)

    def __getcharbuffer__(self):
        return buffer(self.string_data)

    def __getwritebuffer__(self):
        return buffer(self.mmaped_file)

Note that buffer() does not copy the data, it only adds a reference to the object being used.

Hmm, how about adding a writeable binary object to the core ? This would be useful for the __getwritebuffer__() API because currently, I think, only mmap'ed files are useable as write buffers -- no other in-memory type. Perhaps buffer objects could be used for this purpose too, e.g. by having them allocate the needed memory chunk in case you pass None as object.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 140 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From jack at oratrix.nl Fri Aug 13 23:48:12 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Fri, 13 Aug 1999 23:48:12 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
Message-ID: <19990813214817.5393C1C4742@oratrix.oratrix.nl>

This week again I was bitten by the fact that Python doesn't have any form of weak references, and while I was toying with some ideas I came up with the following quick-and-dirty scheme that I thought I'd bounce off this list. I might even volunteer to implement it, if people agree it is worth it:-)

We add a new builtin function (or a module with that function) weak(). This returns a weak reference to the object passed as a parameter. A weak object has one method: strong(), which returns the corresponding real object or raises an exception if the object doesn't exist anymore.
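[Editorial aside: Jack's proposed weak()/strong() pair maps directly onto the weakref module that Python eventually grew. A retro-fitted sketch -- weakref and ReferenceError did not exist in 1.5.2, so this is an anachronism for illustration only:]

```python
import weakref

class _Weak:
    """Sketch of the proposed weak-object type, built on the much
    later weakref module (illustrative, not Jack's implementation)."""
    def __init__(self, obj):
        self._ref = weakref.ref(obj)   # does not keep obj alive

    def strong(self):
        obj = self._ref()
        if obj is None:
            raise ReferenceError("referent no longer exists")
        return obj

def weak(obj):
    return _Weak(obj)

class Node:
    pass

n = Node()
w = weak(n)
assert w.strong() is n    # object still alive: strong() returns it
del n                     # last strong reference gone (CPython frees it here)
```

After the `del`, strong() raises, which is exactly the semantics proposed above.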
For convenience we could add a method exists() that returns true if the real object still exists.

Now comes the bit that I'm unsure about: to implement this I need to add a pointer to every object. This pointer is either NULL or points to the corresponding weak object (so for every object there is either no weak reference object or exactly one). But, for the price of 4 bytes extra in every object we get the nicety that there is little cpu-overhead: refcounting macros work identically to the way they do now, the only thing to take care of is that during object deallocation we have to zero the weak pointer.

(actually: we could make do with a single bit in every object, with the bit meaning "this object has an associated weak object". We could then use a global dictionary indexed by object address to find the weak object)

From mal at lemburg.com Sat Aug 14 01:15:39 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 14 Aug 1999 01:15:39 +0200
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
References:
Message-ID: <37B4A71B.2073875F@lemburg.com>

Greg Stein wrote:
> On Tue, 10 Aug 1999, Fredrik Lundh wrote:
> > maybe the unicode class shouldn't implement the
> > buffer interface at all? sure looks like the best way
>
> It is needed for fp.write(unicodeobj) ...
>
> It is also very handy for C functions to deal with Unicode strings.

Wouldn't a special C API be (even) more convenient ?

> > to avoid trivial mistakes (the current behaviour of
> > fp.write(unicodeobj) is even more serious than the
> > marshal glitch...)
>
> What's wrong with fp.write(unicodeobj)? It should write the unicode value
> to the file. Are you suggesting that it will need to be done differently?
> Icky.

Would this also write some kind of Unicode encoding header ? [Sorry, this is my Unicode ignorance shining through... I only remember lots of talk about these things on the string-sig.]

Since fp.write() uses "s#" this would use the getreadbuffer slot in 1.5.2...
I think what it *should* do is use the getcharbuffer slot instead (see my other post), since dumping the raw unicode data would lose too much information. Again, such things should be handled by extra methods, e.g. fp.rawwrite().

Hmm, I guess the philosophy behind the interface is not really clear. Binary data is fetched via getreadbuffer and then interpreted as character data... I always thought that the getcharbuffer should be used for such an interpretation.

Or maybe, we should dump the getcharbuffer slot again and use the getreadbuffer information just as we would a void* pointer in C: with no explicit or implicit type information.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 140 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From gstein at lyra.org Sat Aug 14 10:53:04 1999
From: gstein at lyra.org (Greg Stein)
Date: Sat, 14 Aug 1999 01:53:04 -0700
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
References: <37B4A71B.2073875F@lemburg.com>
Message-ID: <37B52E70.2D957546@lyra.org>

M.-A. Lemburg wrote:
> Greg Stein wrote:
> > On Tue, 10 Aug 1999, Fredrik Lundh wrote:
> > > maybe the unicode class shouldn't implement the
> > > buffer interface at all? sure looks like the best way
> >
> > It is needed for fp.write(unicodeobj) ...
> >
> > It is also very handy for C functions to deal with Unicode strings.
>
> Wouldn't a special C API be (even) more convenient ?

Why? Accessing the Unicode values as a series of bytes matches exactly to the semantics of the buffer interface. Why throw in Yet Another Function? Your abstract.c functions make it quite simple.

> > > to avoid trivial mistakes (the current behaviour of
> > > fp.write(unicodeobj) is even more serious than the
> > > marshal glitch...)
> >
> > What's wrong with fp.write(unicodeobj)? It should write the unicode value
> > to the file.
> > Are you suggesting that it will need to be done differently?
> > Icky.
>
> Would this also write some kind of Unicode encoding header ?
> [Sorry, this is my Unicode ignorance shining through... I only
> remember lots of talk about these things on the string-sig.]

Absolutely not. Placing the Byte Order Mark (BOM) into an output stream is an application-level task. It should never be done by any subsystem.

There are no other "encoding headers" that would go into the output stream. The output would simply be UTF-16 (2-byte values in host byte order).

> Since fp.write() uses "s#" this would use the getreadbuffer
> slot in 1.5.2... I think what it *should* do is use the
> getcharbuffer slot instead (see my other post), since dumping
> the raw unicode data would loose too much information. Again,

I very much disagree. To me, fp.write() is not about writing characters to a stream. I think it makes much more sense as "writing bytes to a stream" and the buffer interface fits that perfectly.

There is no loss of data. You could argue that the byte order is lost, but I think that is incorrect. The application defines the semantics: the file might be defined as using host-order, or the application may be writing a BOM at the head of the file.

> such things should be handled by extra methods, e.g. fp.rawwrite().

I believe this would be a needless complication of the interface.

> Hmm, I guess the philosophy behind the interface is not
> really clear.

I didn't design or implement it initially, but (as you may have guessed) I am a proponent of its existence.

> Binary data is fetched via getreadbuffer and then
> interpreted as character data... I always thought that the
> getcharbuffer should be used for such an interpretation.

The former is bad behavior. That is why getcharbuffer was added (by me, for 1.5.2). It was a preventative measure for the introduction of Unicode strings. Using getreadbuffer for characters would break badly given a Unicode string.
Therefore, "clients" that want (8-bit) characters from an object supporting the buffer interface should use getcharbuffer. The Unicode object doesn't implement it, implying that it cannot provide 8-bit characters. You can get the raw bytes thru getreadbuffer.

> Or maybe, we should dump the getcharbufer slot again and
> use the getreadbuffer information just as we would a
> void* pointer in C: with no explicit or implicit type information.

Nope. That path is fraught with failure :-)

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From mal at lemburg.com Sat Aug 14 12:21:51 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Sat, 14 Aug 1999 12:21:51 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <19990813214817.5393C1C4742@oratrix.oratrix.nl>
Message-ID: <37B5433F.61CE6F76@lemburg.com>

Jack Jansen wrote:
> This week again I was bitten by the fact that Python doesn't have any
> form of weak references, and while I was toying with some ideas I came
> up with the following quick-and-dirty scheme that I thought I'd bounce
> off this list. I might even volunteer to implement it, if people agree
> it is worth it:-)

Have you checked the weak reference dictionary implementation by Dieter Maurer ? It's at: http://www.handshake.de/~dieter/weakdict.html

While I like the idea of having weak references in the core, I think 4 extra bytes for *every* object is just a little too much. The flag bit idea (with the added global dictionary of weak referenced objects) looks promising though.

BTW, how would this be done in JPython ? I guess it doesn't make much sense there because cycles are no problem for the Java VM GC.

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 139 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mal at lemburg.com Sat Aug 14 14:30:45 1999
From: mal at lemburg.com (M.-A.
Lemburg) Date: Sat, 14 Aug 1999 14:30:45 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> Message-ID: <37B56175.23ABB350@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > > > > Greg Stein wrote: > > > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > > maybe the unicode class shouldn't implement the > > > > buffer interface at all? sure looks like the best way > > > > > > It is needed for fp.write(unicodeobj) ... > > > > > > It is also very handy for C functions to deal with Unicode strings. > > > > Wouldn't a special C API be (even) more convenient ? > > Why? Accessing the Unicode values as a series of bytes matches exactly > to the semantics of the buffer interface. Why throw in Yet Another > Function? I meant PyUnicode_* style APIs for dealing with all the aspects of Unicode objects -- much like the PyString_* APIs available. > Your abstract.c functions make it quite simple. BTW, do we need an extra set of those with buffer index or not ? Those would really be one-liners for the sake of hiding the type slots from applications. > > > > to avoid trivial mistakes (the current behaviour of > > > > fp.write(unicodeobj) is even more serious than the > > > > marshal glitch...) > > > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > > to the file. Are you suggesting that it will need to be done differently? > > > Icky. > > > > Would this also write some kind of Unicode encoding header ? > > [Sorry, this is my Unicode ignorance shining through... I only > > remember lots of talk about these things on the string-sig.] > > Absolutely not. Placing the Byte Order Mark (BOM) into an output stream > is an application-level task. It should never by done by any subsystem. > > There are no other "encoding headers" that would go into the output > stream. The output would simply be UTF-16 (2-byte values in host byte > order). Ok. 
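[Editorial aside: Greg's point -- the BOM is the application's business, and the raw output is just 2-byte code units in some byte order -- can be illustrated with the codecs machinery of much later Pythons (an anachronism relative to this thread; 1.5.2 had none of this):]

```python
import codecs

text = "Az"

# raw UTF-16 code units, no header: the byte order is whatever the
# application (or the platform) says it is
le = text.encode("utf-16-le")      # little-endian
be = text.encode("utf-16-be")      # big-endian
assert le == b"A\x00z\x00" and be == b"\x00A\x00z"

# a subsystem never prepends a BOM; an application that wants a
# self-describing file writes one itself, at the head of the stream
framed = codecs.BOM_UTF16_LE + le
assert framed.startswith(b"\xff\xfe")
```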
> > Since fp.write() uses "s#" this would use the getreadbuffer > > slot in 1.5.2... I think what it *should* do is use the > > getcharbuffer slot instead (see my other post), since dumping > > the raw unicode data would loose too much information. Again, > > I very much disagree. To me, fp.write() is not about writing characters > to a stream. I think it makes much more sense as "writing bytes to a > stream" and the buffer interface fits that perfectly. This is perfectly ok, but shouldn't the behaviour of fp.write() mimic that of previous Python versions ? How does JPython write the data ? Inlined different subject: I think the internal semantics of "s#" using the getreadbuffer slot and "t#" the getcharbuffer slot should be switched; see my other post. In previous Python versions "s#" had the semantics of string data with possibly embedded NULL bytes. Now it suddenly has the meaning of binary data and you can't simply change extensions to use the new "t#" because people are still using them with older Python versions. > There is no loss of data. You could argue that the byte order is lost, > but I think that is incorrect. The application defines the semantics: > the file might be defined as using host-order, or the application may be > writing a BOM at the head of the file. The problem here is that many application were not written to handle these kind of objects. Previously they could only handle strings, now they can suddenly handle any object having the buffer interface and then fail when the data gets read back in. > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > I believe this would be a needless complication of the interface. It would clarify things and make the interface 100% backward compatible again. > > Hmm, I guess the philosophy behind the interface is not > > really clear. > > I didn't design or implement it initially, but (as you may have guessed) > I am a proponent of its existence. 
> > > Binary data is fetched via getreadbuffer and then > > interpreted as character data... I always thought that the > > getcharbuffer should be used for such an interpretation. > > The former is bad behavior. That is why getcharbuffer was added (by me, > for 1.5.2). It was a preventative measure for the introduction of > Unicode strings. Using getreadbuffer for characters would break badly > given a Unicode string. Therefore, "clients" that want (8-bit) > characters from an object supporting the buffer interface should use > getcharbuffer. The Unicode object doesn't implement it, implying that it > cannot provide 8-bit characters. You can get the raw bytes thru > getreadbuffer. I agree 100%, but did you add the "t#" instead of having "s#" use the getcharbuffer interface ? E.g. my mxTextTools package uses "s#" on many APIs. Now someone could stick in a Unicode object and get pretty strange results without any notice about mxTextTools and Unicode being incompatible. You could argue that I change to "t#", but that doesn't work since many people out there still use Python versions <1.5.2 and those didn't have "t#", so mxTextTools would then fail completely for them. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 14 13:34:17 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 04:34:17 -0700 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <37B55439.683272D2@lyra.org> M.-A. Lemburg wrote: >... > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. Sure, these could be added as necessary. 
For raw access to the bytes, I would refer people to the abstract buffer functions, tho. > > Your abstract.c functions make it quite simple. > > BTW, do we need an extra set of those with buffer index or not ? > Those would really be one-liners for the sake of hiding the > type slots from applications. It sounds like NumPy and PIL would need it, which makes the landscape quite a bit different from the last time we discussed this (when we didn't imagine anybody needing those). >... > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > slot in 1.5.2... I think what it *should* do is use the > > > getcharbuffer slot instead (see my other post), since dumping > > > the raw unicode data would loose too much information. Again, > > > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? fp.write() had no semantics for writing Unicode objects since they didn't exist. Therefore, we are not breaking or changing any behavior. > Inlined different subject: > I think the internal semantics of "s#" using the getreadbuffer slot > and "t#" the getcharbuffer slot should be switched; see my other post. 1) Too late 2) The use of "t#" ("text") for the getcharbuffer slot was decided by the Benevolent Dictator. 3) see (2) > In previous Python versions "s#" had the semantics of string data > with possibly embedded NULL bytes. Now it suddenly has the meaning > of binary data and you can't simply change extensions to use the > new "t#" because people are still using them with older Python > versions. Guido and I had a pretty long discussion on what the best approach here was. I think we even pulled in Tim as a final arbiter, as I recall. 
I believe "s#" remained getreadbuffer simply because it *also* meant "give me the bytes of that object". If it changed to getcharbuffer, then you could see exceptions in code that didn't raise exceptions beforehand. (more below)

> > There is no loss of data. You could argue that the byte order is lost,
> > but I think that is incorrect. The application defines the semantics:
> > the file might be defined as using host-order, or the application may be
> > writing a BOM at the head of the file.
>
> The problem here is that many application were not written
> to handle these kind of objects. Previously they could only
> handle strings, now they can suddenly handle any object
> having the buffer interface and then fail when the data
> gets read back in.

An application is a complete unit. How are you suddenly going to manifest Unicode objects within that application? The only way is if the developer goes in and changes things; let them deal with the issues and fallout of their change. The other is external changes such as an upgrade to the interpreter or a module. Again, (IMO) if you're perturbing a system, then you are responsible for also correcting any problems you introduce.

In any case, Guido's position was that things can easily switch over to the "t#" interface to prevent the class of error where you pass a Unicode string to a function that expects a standard string.

> > > such things should be handled by extra methods, e.g. fp.rawwrite().
> >
> > I believe this would be a needless complication of the interface.
>
> It would clarify things and make the interface 100% backward
> compatible again.

No. "s#" used to pull bytes from any buffer-capable object. Your suggestion for "s#" to use the getcharbuffer could introduce exceptions into currently-working code. (this was probably Guido's prime motivation for the current meaning of "t#"...
I can dig up the mail thread if people need an authoritative commentary on the decision that was made) > > > Hmm, I guess the philosophy behind the interface is not > > > really clear. > > > > I didn't design or implement it initially, but (as you may have guessed) > > I am a proponent of its existence. > > > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > for 1.5.2). It was a preventative measure for the introduction of > > Unicode strings. Using getreadbuffer for characters would break badly > > given a Unicode string. Therefore, "clients" that want (8-bit) > > characters from an object supporting the buffer interface should use > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > cannot provide 8-bit characters. You can get the raw bytes thru > > getreadbuffer. > > I agree 100%, but did you add the "t#" instead of having > "s#" use the getcharbuffer interface ? Yes. For reasons detailed above. > E.g. my mxTextTools > package uses "s#" on many APIs. Now someone could stick > in a Unicode object and get pretty strange results without > any notice about mxTextTools and Unicode being incompatible. They could also stick in an array of integers. That supports the buffer interface, meaning the "s#" in your code would extract the bytes from it. In other words, people can already stick bogus stuff into your code. This seems to be a moot argument. > You could argue that I change to "t#", but that doesn't > work since many people out there still use Python versions > <1.5.2 and those didn't have "t#", so mxTextTools would then > fail completely for them. If support for the older versions is needed, then use an #ifdef to set up the appropriate macro in some header. Use that throughout your code. 
In any case: yes -- I would argue that you should absolutely be using "t#". Cheers, -g -- Greg Stein, http://www.lyra.org/ From fredrik at pythonware.com Sat Aug 14 15:19:07 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 15:19:07 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <003101bee657$972d1550$f29b12c2@secret.pythonware.com> M.-A. Lemburg wrote: > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. it's already there, of course. see unicode.h in the unicode distribution (Mark is hopefully adding this to 1.6 at this very moment...) > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? the crucial point is how an average user expects things to work. the current design is quite asymmetric -- you can easily *write* things that implement the buffer interface to a stream, but how the heck do you get them back? (as illustrated by the marshal buglet...) From fredrik at pythonware.com Sat Aug 14 17:21:48 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 17:21:48 +0200 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > E.g. my mxTextTools > > package uses "s#" on many APIs.
Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Except that people may expect unicode strings to work just like any other kind of string, while arrays are surely a different thing. I'm beginning to suspect that the current buffer design is partially broken; it tries to work around at least two problems at once: a) the current use of "string" objects for two purposes: as strings of 8-bit characters, and as buffers containing arbitrary binary data. b) performance issues when reading/writing certain kinds of data to/from streams. and fails to fully address either of them. From mal at lemburg.com Sat Aug 14 18:30:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:30:21 +0200 Subject: [Python-Dev] Re: buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <37B5999D.201EA88C@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > >... > > I meant PyUnicode_* style APIs for dealing with all the aspects > > of Unicode objects -- much like the PyString_* APIs available. > > Sure, these could be added as necessary. For raw access to the bytes, I > would refer people to the abstract buffer functions, tho. I guess that's up to them... PyUnicode_AS_WCHAR() could also be exposed I guess (are C's wchar strings useable as Unicode basis ?). > > > Your abstract.c functions make it quite simple. > > > > BTW, do we need an extra set of those with buffer index or not ? > > Those would really be one-liners for the sake of hiding the > > type slots from applications. 
> > It sounds like NumPy and PIL would need it, which makes the landscape > quite a bit different from the last time we discussed this (when we > didn't imagine anybody needing those). Ok, then I'll add them and post the new set next week. > >... > > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > > slot in 1.5.2... I think what it *should* do is use the > > > > getcharbuffer slot instead (see my other post), since dumping > > > > the raw unicode data would lose too much information. Again, > > > > > > I very much disagree. To me, fp.write() is not about writing characters > > > to a stream. I think it makes much more sense as "writing bytes to a > > > stream" and the buffer interface fits that perfectly. > > > > This is perfectly ok, but shouldn't the behaviour of fp.write() > > mimic that of previous Python versions ? How does JPython > > write the data ? > > fp.write() had no semantics for writing Unicode objects since they > didn't exist. Therefore, we are not breaking or changing any behavior. The problem is hidden in polymorphic functions and tools: previously they could not handle anything but strings, now they also work on arbitrary buffers without raising exceptions. That's what I'm concerned about. > > Inlined different subject: > > I think the internal semantics of "s#" using the getreadbuffer slot > > and "t#" the getcharbuffer slot should be switched; see my other post. > > 1) Too late > 2) The use of "t#" ("text") for the getcharbuffer slot was decided by > the Benevolent Dictator. > 3) see (2)

1) It's not too late: most people aren't even aware of the buffer interface (except maybe the small crowd on this list).

2) A mistake in a patchlevel release of Python can easily be undone in the next minor release. No big deal.

3) To remain compatible with 1.5.2 in future revisions, a new explicit marker, e.g. "r#" for raw data, could be added to hold the code for getreadbuffer. "s#" and "z#" should then switch to using getcharbuffer.
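Marc-Andre's raw-data-versus-character-data split can be made concrete with a later-Python sketch: the two views of a Unicode string that getreadbuffer and getcharbuffer would expose correspond roughly to two different encodes. The analogy is mine; the 1.5.2 slots themselves are C-level:

```python
s = "Grüß"

raw = s.encode("utf-16")     # ~ getreadbuffer: the object's raw bytes
chars = s.encode("latin-1")  # ~ getcharbuffer: 8-bit character data

# The two byte strings differ, which is exactly the information a
# reader needs later in order to interpret the file correctly.
assert raw != chars
assert raw.decode("utf-16") == s
assert chars.decode("latin-1") == s
```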
> > In previous Python versions "s#" had the semantics of string data > > with possibly embedded NULL bytes. Now it suddenly has the meaning > > of binary data and you can't simply change extensions to use the > > new "t#" because people are still using them with older Python > > versions. > > Guido and I had a pretty long discussion on what the best approach here > was. I think we even pulled in Tim as a final arbiter, as I recall. What was the final argument then ? (I guess the discussion was held *before* the addition of getcharbuffer, right ?) > I believe "s#" remained getreadbuffer simply because it *also* meant > "give me the bytes of that object". If it changed to getcharbuffer, then > you could see exceptions in code that didn't raise exceptions > beforehand. > > (more below) "s#" historically always meant "give me char* data with length". It did not mean: "give me a pointer to the data area and its length". That interpretation is new in 1.5.2. Even integers and lists could provide buffer access with the new interpretation... (sounds evil ;-) > > > There is no loss of data. You could argue that the byte order is lost, > > > but I think that is incorrect. The application defines the semantics: > > > the file might be defined as using host-order, or the application may be > > > writing a BOM at the head of the file. > > > > The problem here is that many applications were not written > > to handle these kinds of objects. Previously they could only > > handle strings, now they can suddenly handle any object > > having the buffer interface and then fail when the data > > gets read back in. > > An application is a complete unit. How are you suddenly going to > manifest Unicode objects within that application? The only way is if the > developer goes in and changes things; let them deal with the issues and > fallout of their change. The other is external changes such as an > upgrade to the interpreter or a module.
Again, (IMO) if you're > perturbing a system, then you are responsible for also correcting any > problems you introduce. Well, ok, if you're talking about standalone apps. I was referring to applications which interact with other applications, e.g. via files or sockets. You could pass a Unicode obj to a socket and have it transfer the data to the other end without getting an exception on the sending part of the connection. The receiver would read the data as a string and most probably fail. The whole application sitting in between and dealing with the protocol and connection management wouldn't even notice that you've just tried to extend its capabilities. > In any case, Guido's position was that things can easily switch over to > the "t#" interface to prevent the class of error where you pass a > Unicode string to a function that expects a standard string. Strange, why should code that relies on 8-bit character data be changed because a new unsupported object type pops up ? Code supporting the new type will have to be rewritten anyway, but why break existing extensions in unpredicted ways ? > > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > > > I believe this would be a needless complication of the interface. > > > > It would clarify things and make the interface 100% backward > > compatible again. > > No. "s#" used to pull bytes from any buffer-capable object. Your > suggestion for "s#" to use the getcharbuffer could introduce exceptions > into currently-working code. The buffer objects were introduced in 1.5.1, AFAIR. Changing the semantics back to the original ones would only break extensions relying on the behaviour you describe -- the distribution can easily be adapted to use some other marker, such as "r#". > (this was probably Guido's prime motivation for the current meaning of > "t#"...
I can dig up the mail thread if people need an authoritative > commentary on the decision that was made) > > > > > Hmm, I guess the philosophy behind the interface is not > > > > really clear. > > > > > > I didn't design or implement it initially, but (as you may have guessed) > > > I am a proponent of its existence. > > > > > > > Binary data is fetched via getreadbuffer and then > > > > interpreted as character data... I always thought that the > > > > getcharbuffer should be used for such an interpretation. > > > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > > for 1.5.2). It was a preventative measure for the introduction of > > > Unicode strings. Using getreadbuffer for characters would break badly > > > given a Unicode string. Therefore, "clients" that want (8-bit) > > > characters from an object supporting the buffer interface should use > > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > > cannot provide 8-bit characters. You can get the raw bytes thru > > > getreadbuffer. > > > > I agree 100%, but did you add the "t#" instead of having > > "s#" use the getcharbuffer interface ? > > Yes. For reasons detailed above. > > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Right now they can with 1.5.1 and 1.5.2 which is unfortunate. I'd rather have the parsing function raise an exception. > This seems to be a moot argument. Not really when you have to support extensions across three different patchlevels of Python. 
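Greg's array-of-integers point is easy to make concrete: the byte-level view a raw "s#"-style consumer gets bears no resemblance to the values the array holds. A small sketch in later-Python spelling (tobytes() is the modern name for what was tostring() at the time):

```python
from array import array

a = array("i", [65, 66, 67])

# What a raw-bytes consumer would see: the machine representation of
# the integers, in platform byte order -- not the characters A, B, C.
data = a.tobytes()

assert len(data) == 3 * a.itemsize   # itemsize bytes per integer
assert data != b"ABC"                # nothing like a text string
```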
> > You could argue that I change to "t#", but that doesn't > > work since many people out there still use Python versions > > <1.5.2 and those didn't have "t#", so mxTextTools would then > > fail completely for them. > > If support for the older versions is needed, then use an #ifdef to set > up the appropriate macro in some header. Use that throughout your code. > > In any case: yes -- I would argue that you should absolutely be using > "t#". I can easily change my code, no big deal, but what about the dozens of other extensions I don't want to bother diving into ? I'd rather see an exception than complete garbage written to a file or a socket. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Aug 14 18:53:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:53:45 +0200 Subject: [Python-Dev] buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Message-ID: <37B59F19.45C1D23B@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > > E.g. my mxTextTools > > > package uses "s#" on many APIs. Now someone could stick > > > in a Unicode object and get pretty strange results without > > > any notice about mxTextTools and Unicode being incompatible. > > > > They could also stick in an array of integers. That supports the buffer > > interface, meaning the "s#" in your code would extract the bytes from > > it. In other words, people can already stick bogus stuff into your code. > > Except that people may expect unicode strings > to work just like any other kind of string, while > arrays are surely a different thing.
> > I'm beginning to suspect that the current buffer > design is partially broken; it tries to work around > at least two problems at once: > > a) the current use of "string" objects for two purposes: > as strings of 8-bit characters, and as buffers containing > arbitrary binary data. > > b) performance issues when reading/writing certain kinds > of data to/from streams. > > and fails to fully address either of them. True, a higher level interface for those two objectives would certainly address them much better than what we are trying to do at bit level. Buffers should probably only be treated as pointers to abstract memory areas and nothing more. BTW, what about my suggestion to extend buffers to also allocate memory (in case you pass None as object) ? Or should array be used for that purpose (its an undocumented feature of arrays) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sun Aug 15 04:59:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 19:59:25 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Message-ID: <37B62D0D.6EC24240@lyra.org> Fredrik Lundh wrote: >... > besides, what about buffers and threads? if you > return a pointer from getreadbuf, wouldn't it be > good to know exactly when Python doesn't need > that pointer any more? 
explicit initbuffer/exitbuffer > calls around each sequence of buffer operations > would make that a lot safer... This is a pretty obvious one, I think: it lasts only as long as the object. PyString_AS_STRING is similar. Nothing new or funny here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 05:09:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 20:09:19 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> Message-ID: <37B62F5E.30C62070@lyra.org> M.-A. Lemburg wrote: > > Fred L. Drake, Jr. wrote: > > > > M.-A. Lemburg writes: > > > Aside: Is the buffer interface reachable in any way from within > > > Python ? Why isn't the interface exposed via __XXX__ methods > > > on normal Python instances (could be implemented by returning a > > > buffer object) ? > > > > Would it even make sense? I thought a large part of the intent was > > for performance, avoiding memory copies. Perhaps there should be > > an .__as_buffer__() which returned an object that supports the C > > buffer interface. I'm not sure how useful it would be; perhaps for > > classes that represent image data? They could return a buffer object > > created from a string/array/NumPy array. There is no way to do this. The buffer interface only returns pointers to memory. There would be no place to return an intermediary object, nor a way to retain the reference for it. For example, your class instance quickly sets up a PyBufferObject with the relevant data and returns that. The underlying C code must now hold that reference *and* return a pointer to the calling code. Impossible. Fredrik's open/close concept for buffer accesses would make this possible, as long as clients are aware that any returned pointer is valid only until the buffer_close call.
The context argument he proposes would hold the object reference. Having class instances respond to the buffer interface is interesting, but until more code attempts to *use* the interface, I'm not quite sure of the utility... >... > Hmm, how about adding a writeable binary object to the core ? > This would be useful for the __getwritebbuffer__() API because > currently, I think, only mmap'ed files are useable as write > buffers -- no other in-memory type. Perhaps buffer objects > could be used for this purpose too, e.g. by having them > allocate the needed memory chunk in case you pass None as > object. Yes, this would be very good. I would recommend that you pass an integer, however, rather than None. You need to tell it the size of the buffer to allocate. Since buffer(5) has no meaning at the moment, altering the semantics to include this form would not be a problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 08:10:59 1999 From: da at ski.org (David Ascher) Date: Sat, 14 Aug 1999 23:10:59 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B62F5E.30C62070@lyra.org> Message-ID: On Sat, 14 Aug 1999, Greg Stein wrote: > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, here's an example from my work today. Maybe someone can suggest an alternative that I haven't seen. I'm using buffer objects to pass pointers to structs back and forth between Python and Windows (Win32's GUI scheme involves sending messages to functions with, oftentimes, addresses of structs as arguments, and expect the called function to modify the struct directly -- similarly, I must call Win32 functions w/ pointers to memory that Windows will modify, and be able to read the modified memory). 
With 'raw' buffer object manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to Python), this works fine [*]. So far, no instances. I also have a class which allows the user to describe the buffer memory layout in a natural way given the C struct, and manipulate the buffer layout w/ getattr/setattr. For example:

class Win32MenuItemStruct(AutoStruct):
    #
    # for each slot, specify type (maps to a struct.pack specifier),
    # name (for setattr/getattr behavior) and optional defaults.
    #
    table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
             (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
             (UINT, 'fType', MFT_STRING),
             (UINT, 'fState', MFS_ENABLED),
             (UINT, 'wID', None),
             (HANDLE, 'hSubMenu', 0),
             (HANDLE, 'hbmpChecked', 0),
             (HANDLE, 'hbmpUnchecked', 0),
             (DWORD, 'dwItemData', 0),
             (LPSTR, 'name', None),
             (UINT, 'cch', 0)]

AutoStruct has machinery which allows setting of buffer slices by slot name, conversion of numeric types, etc. This is working well. The only hitch is that to send the buffer to the SWIG'ed function call, I have three options, none ideal:

1) define a __str__ method which makes a string of the buffer and pass that to the function which expects an "s#" argument. This sends a copy of the data, not the address. As a result, this works well for structs which I create from scratch as long as I don't need to see any changes that Windows might have performed on the memory.

2) send the instance but make up my own 'get-the-instance-as-buffer' API -- complicates extension module code.

3) send the buffer attribute of the instance instead of the instance -- complicates Python code, and the C code isn't trivial because there is no 'buffer' typecode for PyArg_ParseTuple().

If I could define a

    def __aswritebuffer__

and if there was a PyArg_ParseTuple() typecode associated with read/write buffers (I nominate 'w'!), I believe things would be simpler -- I could then send the instance, specify in the PyArg_ParseTuple that I want a pointer to memory, and I'd be golden.
What did I miss? --david [*] I feel naughty modifying random bits of memory from Python, but Bill Gates made me do it! From mal at lemburg.com Sun Aug 15 10:47:00 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:47:00 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> <37B62F5E.30C62070@lyra.org> Message-ID: <37B67E84.6BBC8136@lemburg.com> Greg Stein wrote: > > [me suggesting new __XXX__ methods on Python instances to provide > the buffer slots to Python programmers] > > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, there already is lots of code supporting the interface, e.g. fp.write(), socket.write() etc. Basically all streaming interfaces I guess. So these APIs could be used to "write" the object directly into a file. > >... > > Hmm, how about adding a writeable binary object to the core ? > > This would be useful for the __getwritebbuffer__() API because > > currently, I think, only mmap'ed files are useable as write > > buffers -- no other in-memory type. Perhaps buffer objects > > could be used for this purpose too, e.g. by having them > > allocate the needed memory chunk in case you pass None as > > object. > > Yes, this would be very good. I would recommend that you pass an > integer, however, rather than None. You need to tell it the size of the > buffer to allocate. Since buffer(5) has no meaning at the moment, > altering the semantics to include this form would not be a problem. I was thinking of using the existing buffer(object,offset,size) constructor... that's why I took None as object. offset would then always be 0 and size gives the size of the memory chunk to allocate. 
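The allocate-memory-through-a-buffer idea Marc-Andre sketches here is essentially what later Pythons provide as bytearray plus memoryview. A minimal sketch of the behaviour being asked for, in modern spelling rather than the 1.5.x buffer() constructor:

```python
# A writable in-memory buffer of a given size, plus a zero-copy view
# onto it -- the role buffer(None, 0, size) was meant to play.
buf = bytearray(8)       # allocates 8 writable, zero-filled bytes
view = memoryview(buf)   # zero-copy window onto the same memory

view[0] = 0xFF           # writing through the view...
assert buf[0] == 0xFF    # ...mutates the underlying storage
assert len(view) == 8
```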
Of course, buffer(size) would look nicer, but it seems a rather peculiar interface definition to say: ok, if you pass a real Python integer, we'll take that as size. Who knows, maybe at some point in the future, you want to "write" integers via the buffer interface too... then you'd probably also want to write None... so how about a new builtin writebuffer(size) ? Also, I think it would make sense to extend buffers to have methods and attributes:

.writeable - attribute that tells whether the buffer is writeable
.chardata - true iff the getcharbuffer slot is available
.asstring() - return the buffer as Python string object

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Aug 15 10:59:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:59:21 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B68169.73E03C84@lemburg.com> David Ascher wrote: > > On Sat, 14 Aug 1999, Greg Stein wrote: > > > Having class instances respond to the buffer interface is interesting, > > but until more code attempts to *use* the interface, I'm not quite sure > > of the utility... > > Well, here's an example from my work today. Maybe someone can suggest an > alternative that I haven't seen. > > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances.
So that's why you were suggesting that struct.pack returns a buffer rather than a string ;-) Actually, I think you could use arrays to do the trick right now, because they are writeable (unlike strings). Until creating writeable buffer objects becomes possible that is... > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. For example: > > class Win32MenuItemStruct(AutoStruct): > # > # for each slot, specify type (maps to a struct.pack specifier), > # name (for setattr/getattr behavior) and optional defaults. > # > table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct), > (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID), > (UINT, 'fType', MFT_STRING), > (UINT, 'fState', MFS_ENABLED), > (UINT, 'wID', None), > (HANDLE, 'hSubMenu', 0), > (HANDLE, 'hbmpChecked', 0), > (HANDLE, 'hbmpUnchecked', 0), > (DWORD, 'dwItemData', 0), > (LPSTR, 'name', None), > (UINT, 'cch', 0)] > > AutoStruct has machinery which allows setting of buffer slices by slot > name, conversion of numeric types, etc. This is working well. > > The only hitch is that to send the buffer to the SWIG'ed function call, I > have three options, none ideal: > > 1) define a __str__ method which makes a string of the buffer and pass > that to the function which expects an "s#" argument. This send > a copy of the data, not the address. As a result, this works > well for structs which I create from scratch as long as I don't need > to see any changes that Windows might have performed on the memory. > > 2) send the instance but make up my own 'get-the-instance-as-buffer' > API -- complicates extension module code. > > 3) send the buffer attribute of the instance instead of the instance -- > complicates Python code, and the C code isn't trivial because there > is no 'buffer' typecode for PyArg_ParseTuple(). 
> > If I could define an > > def __aswritebuffer__ > > and if there was a PyArg_ParseTuple() typecode associated with read/write > buffers (I nominate 'w'!), I believe things would be simpler -- I could > then send the instance, specify in the PyArgParse_Tuple that I want a > pointer to memory, and I'd be golden. > > What did I miss? Just a naming thingie: __getwritebuffer__ et al. would map to the C interfaces more directly. The new typecode "w#" for writeable buffer style objects is a good idea (it should only work on single segment buffers). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Sun Aug 15 12:32:59 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 15 Aug 1999 12:32:59 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> Message-ID: <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> > Fredrik Lundh wrote: > >... > > besides, what about buffers and threads? if you > > return a pointer from getreadbuf, wouldn't it be > > good to know exactly when Python doesn't need > > that pointer any more? explicit initbuffer/exitbuffer > > calls around each sequence of buffer operations > > would make that a lot safer... > > This is a pretty obvious one, I think: it lasts only as long as the > object. PyString_AS_STRING is similar. Nothing new or funny here. 
well, I think the buffer behaviour is both new and pretty funny:

from array import array

a = array("f", [0]*8192)

b = buffer(a)

for i in range(1000):
    a.append(1234)

print b

in other words, the buffer interface should be redesigned, or removed. (though I'm sure AOL would find some interesting use for this ;-) "Confusing? Yes, but this is a lot better than allowing arbitrary pointers!" -- GvR on assignment operators, November 91 From da at ski.org Sun Aug 15 18:54:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B68169.73E03C84@lemburg.com> Message-ID: On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > Actually, I think you could use arrays to do the trick right now, > because they are writeable (unlike strings). Until creating > writeable buffer objects becomes possible that is... No, because I can't make an array around existing memory which Win32 allocates before I get to it. > Just a naming thingie: __getwritebuffer__ et al. would map to the > C interfaces more directly. Whatever. > The new typecode "w#" for writeable buffer style objects is a good idea > (it should only work on single segment buffers). Indeed. --david From gstein at lyra.org Sun Aug 15 22:27:57 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:27:57 -0700 Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: Message-ID: <37B722CD.383A2A9E@lyra.org> David Ascher wrote: > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > ... > > The new typecode "w#" for writeable buffer style objects is a good idea > > (it should only work on single segment buffers). > > Indeed. I just borrowed Guido's time machine. That typecode is already in 1.5.2.
:-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 22:35:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:35:25 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B7248D.31E5D2BF@lyra.org> Fredrik Lundh wrote: >... > well, I think the buffer behaviour is both > new and pretty funny: I think the buffer interface was introduced in 1.5 (by Jack?). I added the 8-bit character buffer slot and buffer objects in 1.5.2. > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. I don't understand what you believe is weird here. Also, are you saying the buffer *interface* is weird, or the buffer *object* ? thx, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 22:49:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 13:49:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) In-Reply-To: <37B722CD.383A2A9E@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. 
That typecode is already in 1.5.2. Ha. Cool. --da From gstein at lyra.org Sun Aug 15 22:53:51 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:53:51 -0700 Subject: [Python-Dev] instances as buffers References: Message-ID: <37B728DF.2CA2A20A@lyra.org> David Ascher wrote: >... > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances. How do you manage the lifetimes of the memory and objects? PyBuffer_FromReadWriteMemory() creates a buffer object that points to memory. You need to ensure that the memory exists as long as the buffer does. Would it make more sense to use PyBuffer_New(size)? Note: PyBuffer_FromMemory() (read-only) was built primarily for the case where you have static constants in an extension module (strings, code objects, etc) and want to expose them to Python without copying them into the heap. Currently, stuff like this must be copied into a dynamic string object to be exposed to Python. The PyBuffer_FromReadWriteMemory() is there for symmetry, but it can be very dangerous to use because of the lifetime problem. PyBuffer_New() allocates its own memory, so the lifetimes are managed properly. PyBuffer_From*Object maintains a reference to the target object so that the target object can be kept around at least as long as the buffer. > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. 
> For example:

This is a very cool class. Mark and I had discussed doing something just
like this (a while back) for some of the COM stuff. Basically, we'd want
to generate these structures from type libraries.

>...
> The only hitch is that to send the buffer to the SWIG'ed function call, I
> have three options, none ideal:
> 
> 1) define a __str__ method which makes a string of the buffer and pass
>    that to the function which expects an "s#" argument.  This sends
>    a copy of the data, not the address.  As a result, this works
>    well for structs which I create from scratch as long as I don't need
>    to see any changes that Windows might have performed on the memory.

Note that "s#" can be used directly against the buffer object. You could
pass it directly rather than via __str__.

> 2) send the instance but make up my own 'get-the-instance-as-buffer'
>    API -- complicates extension module code.
> 
> 3) send the buffer attribute of the instance instead of the instance --
>    complicates Python code, and the C code isn't trivial because there
>    is no 'buffer' typecode for PyArg_ParseTuple().
> 
> If I could define an
> 
>    def __aswritebuffer__
> 
> and if there was a PyArg_ParseTuple() typecode associated with read/write
> buffers (I nominate 'w'!), I believe things would be simpler -- I could
> then send the instance, specify in the PyArg_ParseTuple() call that I
> want a pointer to memory, and I'd be golden.
> 
> What did I miss?

You can do #3 today since there is a buffer typecode present ("w" or
"w#"). It will complicate Python code a bit since you need to pass the
buffer, but it is the simplest of the three options.

Allowing instances to return buffers does seem to make sense, although it
exposes a lot of underlying machinery at the Python level. It might be
nicer to find a better semantic for this than just exposing the buffer
interface slots.
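In hindsight, the layout-describing class David sketches here is very close to what the later ctypes module standardized as ctypes.Structure: declare the C struct layout once, then read and write fields by attribute, with the raw bytes available to hand to a Win32 call. A rough sketch of the same idea on modern Python (the RECT layout is just an illustrative example, not David's actual class):

```python
import ctypes

class RECT(ctypes.Structure):
    # field layout mirrors the C struct, declared once
    _fields_ = [("left",   ctypes.c_long),
                ("top",    ctypes.c_long),
                ("right",  ctypes.c_long),
                ("bottom", ctypes.c_long)]

r = RECT(0, 0, 640, 480)
r.right = 800                  # setattr writes into the struct's memory
raw = bytes(r)                 # the raw buffer a Win32 call would see
print(len(raw) == ctypes.sizeof(RECT))   # True
print(r.right)                           # 800
```

The struct instance itself supports the buffer protocol, so it can be passed wherever writable memory is expected, which is essentially the `w#` behavior being discussed in this thread.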
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From da at ski.org  Sun Aug 15 23:07:35 1999
From: da at ski.org (David Ascher)
Date: Sun, 15 Aug 1999 14:07:35 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] Re: instances as buffers
In-Reply-To: <37B728DF.2CA2A20A@lyra.org>
Message-ID: 

On Sun, 15 Aug 1999, Greg Stein wrote:

> How do you manage the lifetimes of the memory and objects?
> PyBuffer_FromReadWriteMemory() creates a buffer object that points to
> memory. You need to ensure that the memory exists as long as the buffer
> does.

For those cases where I use PyBuffer_FromReadWriteMemory, I have no
control over the memory involved. Windows allocates the memory, lets me
use it for a little while, and it cleans it up whenever it feels like it.
It hasn't been a problem yet, but I agree that it's possibly a problem.
I'd call it a problem w/ the win32 API, though.

> Would it make more sense to use PyBuffer_New(size)?

Again, I can't because I am given a pointer and am expected to modify
e.g. bytes 10-12 starting from that memory location.

> This is a very cool class. Mark and I had discussed doing something just
> like this (a while back) for some of the COM stuff. Basically, we'd want
> to generate these structures from type libraries.

I know zilch about type libraries. This is for CE work, although nothing
about this class is CE-specific. Do type libraries give the same kind of
info?

> You can do #3 today since there is a buffer typecode present ("w" or
> "w#"). It will complicate Python code a bit since you need to pass the
> buffer, but it is the simplest of the three options.

Ok. Time to patch SWIG again!
--david From Vladimir.Marangozov at inrialpes.fr Mon Aug 16 01:35:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 16 Aug 1999 00:35:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908152335.AAA55842@pukapuka.inrialpes.fr> Tim Peters wrote: > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; if I used the debugger often enough to care , I'd think > about munging in a new opcode to make breakpoint sites explicit). > > immutability-is-made-to-be-violated-ly y'rs - tim > Could you elaborate a bit more on this? Do you mean setting breakpoints on a per opcode basis (for example by exchanging the original opcode with a new BREAKPOINT opcode in the code object) and use the lineno tab for breakpoints based on the source listing? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Mon Aug 16 04:31:16 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 15 Aug 1999 22:31:16 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908152335.AAA55842@pukapuka.inrialpes.fr> Message-ID: <000101bee78f$6aa217e0$f22d2399@tim> [Vladimir Marangozov] > Could you elaborate a bit more on this? No time for this now -- sorry. > Do you mean setting breakpoints on a per opcode basis (for example > by exchanging the original opcode with a new BREAKPOINT opcode in > the code object) and use the lineno tab for breakpoints based on > the source listing? Something like that. The classic way to implement positional breakpoints is to perturb the code; the classic problem is how to get back the effect of the code that was overwritten. 
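The "classic way" Tim describes can be sketched abstractly: remember the opcode at the breakpoint site, overwrite it with a trap opcode, and restore the original when the breakpoint is cleared (or emulate it when hit). A toy version over a bytearray of made-up opcodes, not real CPython bytecode:

```python
BREAKPOINT = 0xFF              # made-up trap opcode for the sketch

def set_breakpoint(code, offset, saved):
    saved[offset] = code[offset]       # remember the displaced opcode
    code[offset] = BREAKPOINT          # perturb the code in place

def clear_breakpoint(code, offset, saved):
    code[offset] = saved.pop(offset)   # put the original opcode back

code = bytearray([0x10, 0x20, 0x30, 0x40])   # fake instruction stream
saved = {}
set_breakpoint(code, 2, saved)
print(code[2] == BREAKPOINT)                       # True
clear_breakpoint(code, 2, saved)
print(code == bytearray([0x10, 0x20, 0x30, 0x40])) # True
```

The hard part Tim alludes to is the second half: a real debugger, on hitting the trap, must single-step the saved opcode (or restore-execute-reinstall it) so the program behaves as if the code had never been touched.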
From gstein at lyra.org Mon Aug 16 06:42:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 21:42:19 -0700 Subject: [Python-Dev] Re: why References: Message-ID: <37B796AB.34F6F93@lyra.org> David Ascher wrote: > > Why does buffer(array('c', 'test')) return a read-only buffer? Simply because the buffer() builtin always creates a read-only object, rather than selecting read/write when possible. Shouldn't be hard to alter the semantics of buffer() to do so. Maybe do this at the same time as updating it to create read/write buffers out of the blue. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Mon Aug 16 08:42:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 16 Aug 1999 02:42:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <000b01bee7b2$7c62d780$f22d2399@tim> [Jack Jansen] > ... A long time ago, Dianne Hackborn actually implemented a scheme like this, under the name VREF (for "virtual reference", or some such). IIRC, differences from your scheme were mainly that: 1) There was an elaborate proxy mechanism to avoid having to explicitly strengthen the weak. 2) Each object contained a pointer to a linked list of associated weak refs. This predates DejaNews, so may be a pain to find. > ... > We add a new builtin function (or a module with that function) > weak(). This returns a weak reference to the object passed as a > parameter. A weak object has one method: strong(), which returns the > corresponding real object or raises an exception if the object doesn't > exist anymore. 
This interface appears nearly isomorphic to MIT Scheme's "hash" and "unhash" functions, except that their hash returns an (unbounded) int and guarantees that hash(o1) != hash(o2) for any distinct objects o1 and o2 (this is a stronger guarantee than Python's "id", which may return the same int for objects with disjoint lifetimes; the other reason object address isn't appropriate for them is that objects can be moved by garbage collection, but hash is an object invariant). Of course unhash(hash(o)) is o, unless o has been gc'ed; then unhash raises an exception. By most accounts (I haven't used it seriously myself), it's a usable interface. > ... > to implement this I need to add a pointer to every object. That's unattractive, of course. > ... > (actually: we could make do with a single bit in every object, with > the bit meaning "this object has an associated weak object". We could > then use a global dictionary indexed by object address to find the > weak object) Is a single bit actually smaller than a pointer? For example, on most machines these days #define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; is two 4-byte fields packed solid already, and structure padding prevents adding anything less than a 4-byte increment in reality. I guess on Alpha there's a 4-byte hole here, but I don't want weak pointers enough to switch machines . OTOH, sooner or later Guido is going to want a mark bit too, so the other way to view this is that 32 new flag bits are as cheap as one . There's one other thing I like about this: it can get rid of the dicey > Strong() checks that self->object->weak == self and returns > self->object (INCREFfed) if it is. check. If object has gone away, you're worried that self->object may (on some systems) point to a newly-invalid address. But worse than that, its memory may get reused, and then self->object may point into the *middle* of some other object where the bit pattern at the "weak" offset just happens to equal self. 
Let's try a sketch in pseudo-Python, where __xxx are secret functions that
do the obvious things (and glossing over thread safety since these are
presumably really implemented in C):

    # invariant:  __is_weak_bit_set(obj) == id2weak.has_key(id(obj))
    # So "the weak bit" is simply an optimization, sparing most objects
    # from a dict lookup when they die.
    # The invariant is delicate in the presence of threads.

    id2weak = {}

    class _Weak:
        def __init__(self, obj):
            self.id = id(obj)   # obj's refcount not bumped
            __set_weak_bit(obj)
            id2weak[self.id] = self
            # note that "the system" (see below) sets self.id
            # to None if obj dies

        def strong(self):
            if self.id is None:
                raise DeadManWalkingError(self.id)
            return __id2obj(self.id)    # will bump obj's refcount

        def __del__(self):
            # this is purely an optimization:  if self gets nuked,
            # exempt its referent from greater expense when *it*
            # dies
            if self.id is not None:
                __clear_weak_bit(__id2obj(self.id))
                del id2weak[self.id]

    def weak(obj):
        return id2weak.get(id(obj), None) or _Weak(obj)

and then whenever an object of any kind is deleted the system does:

    if __is_weak_bit_set(obj):
        objid = id(obj)
        id2weak[objid].id = None
        del id2weak[objid]

In my current over-tired state, I think that's safe (modulo threads),
portable and reasonably fast; I do think the extra bit costs 4 bytes,
though.

> ...
> The weak object isn't transparent, because you have to call strong()
> before you can do anything with it, but this is an advantage (says he,
> aspiring to a career in politics or sales:-): with a transparent weak
> object the object could disappear at unexpected moments and with this
> scheme it can't, because when you have the object itself in hand you
> have a refcount too.

Explicit is better than implicit for me.

[M.-A. Lemburg]
> Have you checked the weak reference dictionary implementation
> by Dieter Maurer ? It's at:
> 
>     http://www.handshake.de/~dieter/weakdict.html

A project where I work is using it; it blows up a lot .
While some form of weak dict is what most people want in the end, I'm not sure Dieter's decision to support weak dicts with only weak values (not weak keys) is sufficient. For example, the aforementioned project wants to associate various computed long strings with certain hashable objects, and for some reason or other (ain't my project ...) these objects can't be changed. So they can't store the strings in the objects. So they'd like to map the objects to the strings via assorted dicts. But using the object as a dict key keeps it (and, via the dicts, also its associated strings) artificially alive; they really want a weakdict with weak *keys*. I'm not sure I know of a clear & fast way to implement a weakdict building only on the weak() function. Jack? Using weak objects as values (or keys) with an ordinary dict can prevent their referents from being kept artificially alive, but that doesn't get the dict itself cleaned up by magic. Perhaps "the system" should notify a weak object when its referent goes away; that would at least give the WO a chance to purge itself from structures it knows it's in ... > ... > BTW, how would this be done in JPython ? I guess it doesn't > make much sense there because cycles are no problem for the > Java VM GC. Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of "hard", "soft", "weak", and "phantom" references. See java.lang.ref for details. I stopped paying attention to Java, so it's up to you to tell us what you learn about it . 
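For the record, both halves of this thread's wish list — Jack's weak()/strong() pair and Tim's "weakdict with weak *keys*" — are recognizable in the weakref module that later shipped with Python 2.1. A sketch of each on the modern module (the Weak/Thing class names are illustrative, not part of any API):

```python
import weakref

class Weak:
    """Jack's weak()/strong() interface, rebuilt on weakref.ref."""
    def __init__(self, obj):
        self._ref = weakref.ref(obj)       # does not bump obj's refcount
    def strong(self):
        obj = self._ref()                  # None once the referent is gone
        if obj is None:
            raise ReferenceError("referent no longer exists")
        return obj

class Thing:          # stand-in for any weak-referenceable object
    pass

t = Thing()
w = Weak(t)
assert w.strong() is t        # referent alive: strong() hands it back
del t                         # CPython refcounting collects it immediately
try:
    w.strong()
    gone = False
except ReferenceError:
    gone = True
print(gone)                   # True

# Tim's weak-keyed dict: the entry vanishes with the key object, so
# mapping objects to computed strings keeps nothing artificially alive.
cache = weakref.WeakKeyDictionary()
o = Thing()
cache[o] = "expensive computed string"
print(len(cache))             # 1
del o                         # last strong reference gone
print(len(cache))             # 0 -- the entry went with its key
```

Note the immediate collection relies on CPython's reference counting; under a different collector the entries disappear only after the next GC pass.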
From fredrik at pythonware.com Mon Aug 16 09:06:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 16 Aug 1999 09:06:43 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> Message-ID: <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > from array import array > > > > a = array("f", [0]*8192) > > > > b = buffer(a) > > > > for i in range(1000): > > a.append(1234) > > > > print b > > > > in other words, the buffer interface should > > be redesigned, or removed. > > I don't understand what you believe is weird here. did you run that code? it may work, it may bomb, or it may generate bogus output. all depending on your memory allocator, the phase of the moon, etc. just like back in the C/C++ days... imo, that's not good enough for a core feature. 
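Fredrik's complaint is well founded: the buffer object caches a pointer that the array's realloc() silently invalidates. For what it's worth, the memoryview type that eventually replaced buffer() closes exactly this hole — while a view is exported, resizing the underlying object raises BufferError rather than leaving a dangling pointer. A minimal sketch on a modern Python, using bytearray in place of the array (bytearray reliably exhibits the export check):

```python
# The hazard under discussion: a cached pointer into a resizable object
# goes stale when the object realloc()s.  memoryview pins the exporter
# instead of caching a raw pointer:
ba = bytearray(b"\x00" * 8192)
view = memoryview(ba)

try:
    ba.extend(b"\x00" * 8192)      # would force a reallocation
    print("resized while exported")
except BufferError:
    print("BufferError: export blocks the resize")

view.release()                     # drop the export...
ba.extend(b"\x00" * 8192)          # ...and now resizing is fine
print(len(ba))                     # 16384
```

In other words, the eventual redesign made Fredrik's program a loud, immediate error instead of undefined behavior.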
From gstein at lyra.org Mon Aug 16 09:15:54 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 00:15:54 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> Message-ID: <37B7BAAA.1E6EE4CA@lyra.org> Fredrik Lundh wrote: > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > from array import array > > > > > > a = array("f", [0]*8192) > > > > > > b = buffer(a) > > > > > > for i in range(1000): > > > a.append(1234) > > > > > > print b > > > > > > in other words, the buffer interface should > > > be redesigned, or removed. > > > > I don't understand what you believe is weird here. > > did you run that code? Yup. It printed nothing. > it may work, it may bomb, or it may generate bogus > output. all depending on your memory allocator, the > phase of the moon, etc. just like back in the C/C++ > days... It probably appeared as an empty string because the construction of the array filled it with zeroes (at least the first byte). Regardless, I'd be surprised if it crashed the interpreter. The print command is supposed to do a str() on the object, which creates a PyStringObject from the buffer contents. Shouldn't be a crash there. > imo, that's not good enough for a core feature. If it crashed, then sure. 
But I'd say that indicates a bug rather than a design problem. Do you have a stack trace from a crash? Ah. I just worked through, in my head, what is happening here. The buffer object caches the pointer returned by the array object. The append on the array does a realloc() somewhere, thereby invalidating the pointer inside the buffer object. Icky. Gotta think on this one... As an initial thought, it would seem that the buffer would have to re-query the pointer for each operation. There are performance implications there, of course, but that would certainly fix the problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Mon Aug 16 11:42:42 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:42:42 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by David Ascher , Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) , Message-ID: <19990816094243.3CE83303120@snelboot.oratrix.nl> > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > > Actually, I think you could use arrays to do the trick right now, > > because they are writeable (unlike strings). Until creating > > writeable buffer objects becomes possible that is... > > No, because I can't make an array around existing memory which Win32 > allocates before I get to it. Would adding a buffer interface to cobject solve your problem? Cobject is described as being used for passing C objects between Python modules, but I've always thought of it as passing C objects from one C routine to another C routine through Python, which doesn't necessarily understand what the object is all about. That latter description seems to fit your bill quite nicely. 
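Jack's CObject suggestion points at the real need here: wrapping memory that some other party (Win32, in David's case) allocated. As a historical footnote, the capability that eventually covered this use case is ctypes' from_address machinery. A sketch, with a ctypes-owned buffer standing in for the foreign memory (in real code the address would come from the C side, with the same lifetime caveat Greg raised — the wrapper must not outlive the memory's true owner):

```python
import ctypes

# Stand-in for memory some other party handed us a pointer to.
backing = ctypes.create_string_buffer(b"0123456789abcdef")
addr = ctypes.addressof(backing)

# Wrap bytes 10-12 of the existing memory without copying -- David's
# "modify bytes 10-12 starting from that memory location" case.
window = (ctypes.c_char * 3).from_address(addr + 10)
window[0:3] = b"XYZ"               # writes straight into the memory
print(backing.value)               # b'0123456789XYZdef'
```

Nothing is copied: the window is a typed view onto the original allocation, which is exactly what an "array around existing memory" would have provided.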
--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From jack at oratrix.nl  Mon Aug 16 11:49:41 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Mon, 16 Aug 1999 11:49:41 +0200
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: Message by Greg Stein , Sun, 15 Aug 1999 13:35:25 -0700 , <37B7248D.31E5D2BF@lyra.org>
Message-ID: <19990816094941.83BE2303120@snelboot.oratrix.nl>

> >...
> > well, I think the buffer behaviour is both
> > new and pretty funny:
> 
> I think the buffer interface was introduced in 1.5 (by Jack?). I added
> the 8-bit character buffer slot and buffer objects in 1.5.2.

Ah, now I understand why I didn't understand some of the previous
conversation: I had never come across the buffer *objects* (as opposed
to the buffer *interface*) until Fredrik's example.

I've just looked at it, and I'm not sure I understand the full intentions
of the buffer object. Buffer objects can either behave as the
"buffer-aspect" of the object behind them (without the rest of their
functionality) or as array objects, and if they start out life as the
first they can evolve into the second, is that right?

Is there a rationale behind this design, or is it just something that
happened?

--
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From gstein at lyra.org  Mon Aug 16 11:56:31 1999
From: gstein at lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 02:56:31 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990816094941.83BE2303120@snelboot.oratrix.nl>
Message-ID: <37B7E04F.3843004@lyra.org>

Jack Jansen wrote:
>...
> I've just look at it, and I'm not sure I understand the full intentions of the > buffer object. Buffer objects can either behave as the "buffer-aspect" of the > object behind them (without the rest of their functionality) or as array > objects, and if they start out life as the first they can evolve into the > second, is that right? > > Is there a rationale behind this design, or is it just something that > happened? The object doesn't change. You create it as a reference to an existing object's buffer (as exported via the buffer interface), or you create it as a reference to some arbitrary memory. The buffer object provides (optionally read/write) string-like behavior to any object that supports buffer behavior. It can also be used to make lightweight slices of another object. For example: >>> a = "abcdefghi" >>> b = buffer(a, 3, 3) >>> print b def >>> In the above example, there is only one copy of "def" (the portion inside of the string object referenced by ). The string-like behavior can be quite nice for memory-mapped files. Andrew's mmapfile module's file objects export the buffer interface. This means that you can open a file, wrap a buffer around it, and perform quick and easy random-access on the thing. You could even select slices of the file and pass them around as if they were strings, without loading anything into the process heap. (I want to try mmap'ing a .pyc and create code objects that have buffer-based bytecode streams; it will be interesting to see if this significantly reduces memory consumption (in terms of the heap size; the mmap'd .pyc can be shared across processes)). 
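Greg's lightweight-slice and mmap points translate directly to the memoryview type that succeeded buffer(): a slice of a view shares storage with the underlying object instead of copying it. A sketch of both, with a temporary file standing in for the mmap'd .pyc:

```python
import mmap
import tempfile

# Zero-copy slicing, as in Greg's buffer(a, 3, 3) example: only one
# copy of "def" exists; the view merely points into a's storage.
a = b"abcdefghi"
view = memoryview(a)[3:6]
print(bytes(view))                 # b'def'

# Random access over a memory-mapped file, without pulling the file's
# contents into the process heap:
with tempfile.TemporaryFile() as f:
    f.write(b"abcdefghi")
    f.flush()
    m = mmap.mmap(f.fileno(), 0)   # map the whole file
    chunk = memoryview(m)[3:6]     # a string-like window on the pages
    print(bytes(chunk))            # b'def'
    chunk.release()                # drop the export before closing
    m.close()
```

The release() before close() matters: an mmap object, like the resizable objects discussed above, refuses to go away while a view is still exported.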
Cheers,
-g

--
Greg Stein, http://www.lyra.org/

From jim at digicool.com  Mon Aug 16 14:30:41 1999
From: jim at digicool.com (Jim Fulton)
Date: Mon, 16 Aug 1999 08:30:41 -0400
Subject: [Python-Dev] buffer interface considered harmful
References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com>
Message-ID: <37B80471.F0F467C9@digicool.com>

Fredrik Lundh wrote:
> 
> > Fredrik Lundh wrote:
> > >...
> > > besides, what about buffers and threads? if you
> > > return a pointer from getreadbuf, wouldn't it be
> > > good to know exactly when Python doesn't need
> > > that pointer any more? explicit initbuffer/exitbuffer
> > > calls around each sequence of buffer operations
> > > would make that a lot safer...
> >
> > This is a pretty obvious one, I think: it lasts only as long as the
> > object. PyString_AS_STRING is similar. Nothing new or funny here.
> 
> well, I think the buffer behaviour is both
> new and pretty funny:
> 
>     from array import array
> 
>     a = array("f", [0]*8192)
> 
>     b = buffer(a)
> 
>     for i in range(1000):
>         a.append(1234)
> 
>     print b
> 
> in other words, the buffer interface should
> be redesigned, or removed.

A while ago I asked for some documentation on the Buffer interface. I
basically got silence. At this point, I don't have a good idea what
buffers are for and I don't see a lot of evidence that there *is* a
design. I assume that there was a design, but I can't see it. This whole
discussion makes me very queasy.

I'm probably just out of it, since I don't have time to read the Python
list anymore.
Presumably the buffer interface was proposed and discussed there at some
distant point in the past.

(I can't pay as much attention to this discussion as I suspect I should,
due to time constraints and due to a basic lack of understanding of the
rationale for the buffer interface. Just now I caught a sniff of
something I find kinda repulsive. I think I hear you all talking about
beasties that hold a reference to some object's internal storage and that
have write operations so you can write directly to the object's storage,
bypassing the object interfaces. I probably just imagined it.)

Jim

--
Jim Fulton           mailto:jim at digicool.com   Python Powered!
Technical Director   (888) 344-4332            http://www.python.org
Digital Creations    http://www.digicool.com    http://www.zope.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list without my
permission. Violation of my privacy with advertising or SPAM will result
in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.

From gstein at lyra.org  Mon Aug 16 14:41:23 1999
From: gstein at lyra.org (Greg Stein)
Date: Mon, 16 Aug 1999 05:41:23 -0700
Subject: [Python-Dev] buffer interface considered harmful
References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com>
Message-ID: <37B806F3.2C5EDC44@lyra.org>

Jim Fulton wrote:
>...
> A while ago I asked for some documentation on the Buffer
> interface. I basically got silence.
At this point, I I think the silence was caused by the simple fact that the documentation does not (yet) exist. That's all... nothing nefarious. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Aug 16 14:05:35 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:05:35 +0200 Subject: [Python-Dev] Re: w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: <37B722CD.383A2A9E@lyra.org> Message-ID: <37B7FE8F.30C35284@lemburg.com> Greg Stein wrote: > > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. That typecode is already in 1.5.2. > > :-) Ah, cool :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:29:31 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:29:31 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000b01bee7b2$7c62d780$f22d2399@tim> Message-ID: <37B8042B.21DE6053@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Have you checked the weak reference dictionary implementation > > by Dieter Maurer ? It's at: > > > > http://www.handshake.de/~dieter/weakdict.html > > A project where I work is using it; it blows up a lot . > > While some form of weak dict is what most people want in the end, I'm not > sure Dieter's decision to support weak dicts with only weak values (not weak > keys) is sufficient. For example, the aforementioned project wants to > associate various computed long strings with certain hashable objects, and > for some reason or other (ain't my project ...) these objects can't be > changed. So they can't store the strings in the objects. 
So they'd like to > map the objects to the strings via assorted dicts. But using the object as > a dict key keeps it (and, via the dicts, also its associated strings) > artificially alive; they really want a weakdict with weak *keys*. > > I'm not sure I know of a clear & fast way to implement a weakdict building > only on the weak() function. Jack? > > Using weak objects as values (or keys) with an ordinary dict can prevent > their referents from being kept artificially alive, but that doesn't get the > dict itself cleaned up by magic. Perhaps "the system" should notify a weak > object when its referent goes away; that would at least give the WO a chance > to purge itself from structures it knows it's in ... Perhaps one could fiddle something out of the Proxy objects in mxProxy (you know where...). These support a special __cleanup__ protocol that I use a lot to work around circular garbage: the __cleanup__ method of the referenced object is called prior to destroying the proxy; even if the reference count on the object has not yet gone down to 0. This makes direct circles possible without problems: the parent can reference a child through the proxy and the child can reference the parent directly. As soon as the parent is cleaned up, the reference to the proxy is deleted which then automagically makes the back reference in the child disappear, allowing the parent to be deallocated after cleanup without leaving a circular reference around. > > ... > > BTW, how would this be done in JPython ? I guess it doesn't > > make much sense there because cycles are no problem for the > > Java VM GC. > > Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of > "hard", "soft", "weak", and "phantom" references. See java.lang.ref for > details. I stopped paying attention to Java, so it's up to you to tell us > what you learn about it . Thanks for the reference... 
but I guess this will remain a weak one for some time since the latter is currently a limited resource :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:41:51 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:41:51 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> <37B7BAAA.1E6EE4CA@lyra.org> Message-ID: <37B8070F.763C3FF8@lemburg.com> Greg Stein wrote: > > Fredrik Lundh wrote: > > > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > > > from array import array > > > > > > > > a = array("f", [0]*8192) > > > > > > > > b = buffer(a) > > > > > > > > for i in range(1000): > > > > a.append(1234) > > > > > > > > print b > > > > > > > > in other words, the buffer interface should > > > > be redesigned, or removed. > > > > > > I don't understand what you believe is weird here. > > > > did you run that code? > > Yup. It printed nothing. > > > it may work, it may bomb, or it may generate bogus > > output. all depending on your memory allocator, the > > phase of the moon, etc. just like back in the C/C++ > > days... 
> > It probably appeared as an empty string because the construction of the > array filled it with zeroes (at least the first byte). > > Regardless, I'd be surprised if it crashed the interpreter. The print > command is supposed to do a str() on the object, which creates a > PyStringObject from the buffer contents. Shouldn't be a crash there. > > > imo, that's not good enough for a core feature. > > If it crashed, then sure. But I'd say that indicates a bug rather than a > design problem. Do you have a stack trace from a crash? > > Ah. I just worked through, in my head, what is happening here. The > buffer object caches the pointer returned by the array object. The > append on the array does a realloc() somewhere, thereby invalidating the > pointer inside the buffer object. > > Icky. Gotta think on this one... As an initial thought, it would seem > that the buffer would have to re-query the pointer for each operation. > There are performance implications there, of course, but that would > certainly fix the problem. I guess that's the way to go. I wouldn't want to think about those details when using buffer objects and a function call is still better than a copy... it would do the init/exit wrapping implicitly: init at the time the getreadbuffer call is made and exit next time a thread switch is done - provided that the functions using the memory pointer also keep a reference to the buffer object alive (but that should be natural as this is always done when dealing with references in a safe way). 
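[Editorial aside: the stale-pointer hazard Greg diagnoses above is exactly what later Pythons close off. A memoryview (the modern descendant of buffer()) does not re-query the pointer; instead it "pins" the exporting object, which then refuses to resize while a view is alive. A minimal sketch along the lines of Fredrik's array example, using today's API:]

```python
from array import array

a = array("f", [0.0] * 8192)
m = memoryview(a)        # today's counterpart of b = buffer(a)

try:
    a.append(1234.0)     # this resize would realloc() and dangle the pointer...
except BufferError:
    pass                 # ...so the array refuses to resize while exported

m.release()              # drop the view
a.append(1234.0)         # resizing is allowed again
```

The design trades Greg's "re-query on every operation" idea for an outright ban on resizing while a buffer is exported.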
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at digicool.com Mon Aug 16 15:26:40 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 09:26:40 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> <37B806F3.2C5EDC44@lyra.org> Message-ID: <37B81190.165C373E@digicool.com> Greg Stein wrote: > > Jim Fulton wrote: > >... > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > I think the silence was caused by the simple fact that the documentation > does not (yet) exist. That's all... nothing nefarious. I didn't mean to suggest anything nefarious. I do think that a change that affects something as basic as the standard object type layout and that generates this much discussion really should be documented before it becomes part of the core. I'd especially like to see some kind of document that includes information like: - A problem statement that describes the problem the change is solving, - How does the solution solve the problem, - When and how should people writing new types support the new interfaces? We're not talking about a new library module here. There's been a change to the core object interface. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jack at oratrix.nl Mon Aug 16 15:45:31 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 15:45:31 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Jim Fulton , Mon, 16 Aug 1999 08:30:41 -0400 , <37B80471.F0F467C9@digicool.com> Message-ID: <19990816134531.C30B5303120@snelboot.oratrix.nl> > A while ago I asked for some documentation on the Buffer > interface. I basically got silence. At this point, I > don't have a good idea what buffers are for and I don't see a lot > of evidence that there *is* a design. I assume that there was > a design, but I can't see it. This whole discussion makes me > very queasy. Okay, as I'm apparently not the only one who is queasy let's start from scratch. First, there is the old buffer _interface_. This is a C interface that allows extension (and builtin) modules and functions a unified way to access objects if they want to write the object to file and similar things. It is also what the PyArg_ParseTuple "s#" returns. This is, in C, the getreadbuffer/getwritebuffer interface. Second, there's the extension to the buffer interface as of 1.5.2. This is again only available in C, and it allows C programmers to get an object _as an ASCII string_. This is meant for things like regexp modules, to access any "textual" object as an ASCII string. This is the getcharbuffer interface, and bound to the "t#" specifier in PyArg_ParseTuple. Third, there is the buffer _object_, also new in 1.5.2.
This sort-of exports the functionality of the buffer interface to Python, but it does a bit more as well, because the buffer objects have a sort of copy-on-write semantics that means they may or may not be "attached" to a python object through the buffer interface. I think that the C interface and the object should be treated completely separately. I definitely want the C interface, but I personally don't use the Python buffer objects, so I don't really care all that much about those. Also, I think that the buffer objects might become easier to understand if we don't think of it as "the buffer interface exported to python", but as "Python buffer objects, that may share memory with other Python objects as an optimization". -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jim at digicool.com Mon Aug 16 18:03:54 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 12:03:54 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990816134531.C30B5303120@snelboot.oratrix.nl> Message-ID: <37B8366A.82B305C7@digicool.com> Jack Jansen wrote: > > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > don't have a good idea what buffers are for and I don't see alot > > of evidence that there *is* a design. I assume that there was > > a design, but I can't see it. This whole discussion makes me > > very queasy. > > Okay, as I'm apparently not the only one who is queasy let's start from > scratch. Yee ha! > First, there is the old buffer _interface_. This is a C interface that allows > extension (and builtin) modules and functions a unified way to access objects > if they want to write the object to file and similar things. Is this serialization? 
What does this achieve that, say, the pickling protocols don't achieve? What other problems does it solve? > It is also what > the PyArg_ParseTuple "s#" returns. This is, in C, the > getreadbuffer/getwritebuffer interface. Huh? "s#" doesn't return a string? Or are you saying that you can pass a non-string object to a C function that uses "s#" and have it bufferized and then stringized? In either case, this is not consistent with the documentation (interface) of PyArg_ParseTuple. > Second, there's the extension to the buffer interface as of 1.5.2. This is > again only available in C, and it allows C programmers to get an object _as an > ASCII string_. This is meant for things like regexp modules, to access any > "textual" object as an ASCII string. This is the getcharbuffer interface, and > bound to the "t#" specifier in PyArg_ParseTuple. Hm. So this is making a little more sense. So, there is a notion that there are "textual" objects that want to provide a method for getting their "text". How does this text differ from what you get from __str__ or __repr__? > Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports > the functionality of the buffer interface to Python, How so? Maybe I'm at sea because I still don't get what the C buffer interface is for. > but it does a bit more as > well, because the buffer objects have a sort of copy-on-write semantics that > means they may or may not be "attached" to a python object through the buffer > interface. What is this thing used for? Where does the slot in tp_as_buffer come into all of this? Why does this need to be a slot in the first place? Are these "textual" objects really common? Is the presence of this slot a flag for "textualness"? It would help a lot, at least for me, if there was a clearer description of what motivates these things. What problems are they trying to solve? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered!
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org From da at ski.org Mon Aug 16 18:45:47 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8366A.82B305C7@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: > > Second, there's the extension to the buffer interface as of 1.5.2. This is > > again only available in C, and it allows C programmers to get an object _as an > > ASCII string_. This is meant for things like regexp modules, to access any > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > bound to the "t#" specifier in PyArg_ParseTuple. > > Hm. So this is making a little more sense. So, there is a notion that > there are "textual" objects that want to provide a method for getting > their "text". How does this text differ from what you get from __str__ > or __repr__? I'll let others give a well thought out rationale. Here are some examples of use which I think worthwhile: * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile objects fit this aspect of the buffer interface allows you to do regexp searches on it w/o ever building a twelve gigabyte PyString. * Consider a non-contiguous NumPy array. If the array type supported the multi-segment buffer interface, extension module writers could manipulate the data within this array w/o having to worry about the non-contiguous nature of the data. They'd still have to worry about the multi-byte nature of the data, but it's still a win.
In other words, I think that the buffer interface could be useful even w/ non-textual data. * If NumPy was modified to have arrays with data stored in buffer objects as opposed to the current "char *", and if PIL was modified to have images stored in buffer objects as opposed to whatever it uses, one could have arrays and images which shared data. I think all of these provide examples of motivations which are appealing to at least some Python users. I make no claim that they motivate the specific interface. In all the cases I can think of, one or both of two features are the key asset: - access to subset of huge data regions w/o creation of huge temporary variables. - sharing of data space. Yes, it's a power tool, and as such should come with safety goggles. But then again, the same is true for ExtensionClasses =). leaving-out-the-regexp-on-NumPy-arrays-example, --david PS: I take back the implicit suggestion that buffer() return read-write buffers when possible. From jim at digicool.com Mon Aug 16 19:06:19 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:06:19 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B8450B.C5D308E4@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > > > Second, there's the extension to the buffer interface as of 1.5.2. This is > > > again only available in C, and it allows C programmers to get an object _as an > > > ASCII string_. This is meant for things like regexp modules, to access any > > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > > bound to the "t#" specifier in PyArg_ParseTuple. > > > > Hm. So this is making a little more sense. So, there is a notion that > > there are "textual" objects that want to provide a method for getting > > their "text". How does this text differ from what you get from __str__ > > or __repr__? > > I'll let others give a well thought out rationale. I eagerly await this.
:) > Here are some examples > of use which I think worthwhile: > > * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile > objects fit this aspect of the buffer interface allows you to do regexp > searches on it w/o ever building a twelve gigabyte PyString. This seems reasonable, if a bit exotic. :) > * Consider a non-contiguous NumPy array. If the array type supported the > multi-segment buffer interface, extension module writers could > manipulate the data within this array w/o having to worry about the > non-contiguous nature of the data. They'd still have to worry about > the multi-byte nature of the data, but it's still a win. In other > words, I think that the buffer interface could be useful even w/ > non-textual data. Why is this a good thing? Why should extension module writers worry about the non-contiguous nature of the data now? Does the NumPy C API somehow expose this now? Will multi-segment buffers make it go away somehow? > * If NumPy was modified to have arrays with data stored in buffer objects > as opposed to the current "char *", and if PIL was modified to have > images stored in buffer objects as opposed to whatever it uses, one > could have arrays and images which shared data. Uh, and this would be a good thing? Maybe PIL should just be modified to use NumPy arrays. > I think all of these provide examples of motivations which are appealing > to at least some Python users. Perhaps, although Guido knows how they'd find out about them. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org
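[Editorial aside: David's mmap bullet is demonstrable with pieces that exist in today's standard library, since the re module will search any buffer-like object directly. The file below is a tiny stand-in for the "twelve gigabyte" case, and its contents and the GATTACA pattern are invented for illustration:]

```python
import mmap
import os
import re
import tempfile

# Small stand-in for the huge file; contents are made up.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"junk" * 2500 + b"GATTACA" + b"junk" * 2500)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    match = re.search(b"GATTACA", mm)   # regexp runs over the mapping itself;
    offset = match.start()              # no file-sized string is ever built
    mm.close()

os.remove(path)
print(offset)   # byte offset of the hit within the file
```

Only the pages the regexp engine actually touches get faulted in, which is the whole point for the bioinformatics-sized inputs mentioned below.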
From da at ski.org Mon Aug 16 19:18:46 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 10:18:46 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8450B.C5D308E4@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: >> [regexps on gigabyte files] > > This seems reasonable, if a bit exotic. :) In the bioinformatics world, I think it's everyday stuff. > Why is this a good thing? Why should extension module writes worry > abot the non-contiguous nature of the data now? Does the NumPy C API > somehow expose this now? Will multi-segment buffers make it go away > somehow? A NumPy extension module writer needs to create and modify NumPy arrays. These arrays may be non-contiguous (if e.g. they are the result of slicing). The NumPy C API exposes the non-contiguous nature, but it's hard enough to deal with it that I suspect most extension writers require contiguous arrays, which means unnecessary copies. Multi-segment buffers won't make the API go away necessarily (backwards compatibility and all that), but it could make it unnecessary for many extension writers. > > * If NumPy was modified to have arrays with data stored in buffer objects > > as opposed to the current "char *", and if PIL was modified to have > > images stored in buffer objects as opposed to whatever it uses, one > > could have arrays and images which shared data. > > Uh, and this would be a good thing? Maybe PIL should just be modified > to use NumPy arrays. Why? PIL was designed for image processing, and made design decisions appropriate to that domain. NumPy was designed for multidimensional numeric array processing, and made design decisions appropriate to that domain. The intersection of interests exists (e.g. in the medical imaging world), and I know people who spend a lot of their CPU time moving data between images and arrays with "stupid" tostring/fromstring operations. 
Given the size of the images, it's a prodigious waste of time, and kills the use of Python in many a project. > Perhaps, although Guido knows how they'd find out about them. ;) Uh? These issues have been discussed in the NumPy/PIL world for a while, with no solution in sight. Recently, I and others saw mentions of buffers in the source, and they seemed like a reasonable approach, which could be done w/o a rewrite of either PIL or NumPy. Don't get me wrong -- I'm all for better documentation of the buffer stuff, design guidelines, warnings and protocols. I stated as much on June 15: http://www.python.org/pipermail/python-dev/1999-June/000338.html --david From jim at digicool.com Mon Aug 16 19:38:22 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:38:22 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B84C8E.46885C8E@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > >> [regexps on gigabyte files] > > > > This seems reasonable, if a bit exotic. :) > > In the bioinformatics world, I think it's everyday stuff. Right, in some (exotic ;) domains it's not exotic at all. > > Why is this a good thing? Why should extension module writes worry > > abot the non-contiguous nature of the data now? Does the NumPy C API > > somehow expose this now? Will multi-segment buffers make it go away > > somehow? > > A NumPy extension module writer needs to create and modify NumPy arrays. > These arrays may be non-contiguous (if e.g. they are the result of > slicing). The NumPy C API exposes the non-contiguous nature, but it's > hard enough to deal with it that I suspect most extension writers require > contiguous arrays, which means unnecessary copies. Hm. This sounds like an API problem to me. > Multi-segment buffers won't make the API go away necessarily (backwards > compatibility and all that), but it could make it unnecessary for many > extension writers. 
Multi-segment buffers don't make the multi-segmented nature of the memory go away. Do they really simplify the API that much? They seem to strip away an awful lot of information hiding. > > > * If NumPy was modified to have arrays with data stored in buffer objects > > > as opposed to the current "char *", and if PIL was modified to have > > > images stored in buffer objects as opposed to whatever it uses, one > > > could have arrays and images which shared data. > > > > Uh, and this would be a good thing? Maybe PIL should just be modified > > to use NumPy arrays. > > Why? PIL was designed for image processing, and made design decisions > appropriate to that domain. NumPy was designed for multidimensional > numeric array processing, and made design decisions appropriate to that > domain. The intersection of interests exists (e.g. in the medical imaging > world), and I know people who spend a lot of their CPU time moving data > between images and arrays with "stupid" tostring/fromstring operations. > Given the size of the images, it's a prodigious waste of time, and kills > the use of Python in many a project. It seems to me that NumPy is sufficiently broad enough to encompass image processing. My main concern is having two systems rely on some low-level "shared memory" mechanism to achieve efficient communication. > > Perhaps, although Guido knows how they'd find out about them. ;) > > Uh? These issues have been discussed in the NumPy/PIL world for a while, > with no solution in sight. Recently, I and others saw mentions of buffers > in the source, and they seemed like a reasonable approach, which could be > done w/o a rewrite of either PIL or NumPy. My point was that people would be lucky to find out about buffers or about how to use them as things stand. > Don't get me wrong -- I'm all for better documentation of the buffer > stuff, design guidelines, warnings and protocols.
> I stated as much on June 15: > http://www.python.org/pipermail/python-dev/1999-June/000338.html Yes, that was quite a jihad you launched. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org From da at ski.org Mon Aug 16 20:25:54 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 11:25:54 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B84C8E.46885C8E@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: [ Aside: > It seems to me that NumPy is sufficiently broad enough to encompass > image processing. Well, I'll just say that you could have been right, but w/ the current NumPy, I don't blame /F for having developed his own data structures. NumPy is messy, and some of its design decisions are wrong for image things (memory handling, casting rules, etc.). It's all water under the bridge at this point. ] Back to the main topic: You say: > [Multi-segment buffers] seem to strip away an awful lot of information > hiding. My impression of the buffer notion was that it is intended to *provide* information hiding, by giving a simple API to byte arrays which could be stored in various ways. I do agree that whether those bytes should be shared or not is a decision which should be weighed carefully. > My main concern is having two systems rely on some low-level "shared > memory" mechanism to achieve efficient communication. I don't particularly care about the specific buffer interface (the low-level nature of which is what I think you object to).
I do care about having a well-defined mechanism for sharing memory between objects, and I think there is value in defining such an interface generically. Maybe the notion of segmented arrays of bytes is too low-level, and instead we should think of the data spaces as segmented arrays of chunks, where a chunk can be one or more bytes? Or do you object to any 'generic' interface? Just for fun, here's the list of things which either currently do or have been talked about possibly in the future supporting some sort of buffer interface, and my guesses as to (chunk size, segmented status, and writeability):

- strings (1 byte, single-segment, r/o)
- unicode strings (2 bytes, single-segment, r/o)
- struct.pack() things (1 byte, single-segment, r/o)
- arrays (1-4? bytes, single-segment, r/w)
- NumPy arrays (1-8 bytes, multi-segment, r/w)
- PIL images (1-? bytes, multi-segment, r/w)
- CObjects (1-byte, single-segment, r/?)
- mmapfiles (1-byte, multi-segment?, r/w)
- non-python-owned memory (1-byte, single-segment, r/w)

--david

From jack at oratrix.nl Mon Aug 16 21:36:40 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 21:36:40 +0200 Subject: [Python-Dev] Buffer interface and multiple threads Message-ID: <19990816193645.9E5B5CF320@oratrix.oratrix.nl> Hmm, something that just struck me: the buffer _interface_ (i.e. the C routines, not the buffer object stuff) is potentially thread-unsafe. In the "old world", where "s#" only worked on string objects, you could be sure that the C pointer returned remained valid as long as you had a reference to the python string object in hand, as strings are immutable. In the "new world", where "s#" also works on, say, array objects, this doesn't hold anymore. So, potentially, while one thread is in a write() system call writing the contents of the array to a file another thread could come in and change the data. Is this a problem?
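[Editorial aside: Jack's race is easy to make concrete in Python itself. In the sketch below the thread and the Event choreography are invented purely to force the interleaving deterministically; a live view of a mutable array stands in for the C pointer "s#" hands out, and the bytes behind it change while the "writer" is notionally blocked, which is exactly what a write() in progress could observe:]

```python
import threading
from array import array

data = array("b", [0] * 8)
view = memoryview(data)     # stands in for the pointer "s#" hands out
acquired = threading.Event()
mutated = threading.Event()

def other_thread():
    acquired.wait()         # run once the "writer" holds its pointer
    data[0] = 42            # mutate the object in place
    mutated.set()

t = threading.Thread(target=other_thread)
t.start()

before = view[0]            # what the writer believes it will write
acquired.set()
mutated.wait()              # deterministic stand-in for a thread switch
after = view[0]             # what actually sits behind the pointer now
t.join()
print(before, after)        # -> 0 42: the data changed under the live view
```

Note that pinning the exporter only prevents resizing (so the pointer stays valid); it does not prevent in-place mutation, so Jack's question is about data consistency rather than memory safety.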
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 16 22:22:12 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 22:22:12 +0200 Subject: [Python-Dev] New htmlentitydefs.py file Message-ID: <37B872F4.1C3F5D39@lemburg.com> Attached you find a new HTML entity definitions file taken and parsed from: http://www.w3.org/TR/1998/REC-html40-19980424/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLspecial.ent The latter two contain Unicode charcodes which obviously cannot (yet) be mapped to Unicode strings... perhaps Fredrik wants to include a spiced up version in with his Unicode type. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- """ Entity definitions for HTML4.0. 
Taken and parsed from: http://www.w3.org/TR/1998/REC-html40/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40/HTMLspecial.ent """ entitydefs = { 'AElig': chr(198), # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1 'Aacute': chr(193), # latin capital letter A with acute, U+00C1 ISOlat1 'Acirc': chr(194), # latin capital letter A with circumflex, U+00C2 ISOlat1 'Agrave': chr(192), # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1 'Alpha': 'Α', # greek capital letter alpha, U+0391 'Aring': chr(197), # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1 'Atilde': chr(195), # latin capital letter A with tilde, U+00C3 ISOlat1 'Auml': chr(196), # latin capital letter A with diaeresis, U+00C4 ISOlat1 'Beta': 'Β', # greek capital letter beta, U+0392 'Ccedil': chr(199), # latin capital letter C with cedilla, U+00C7 ISOlat1 'Chi': 'Χ', # greek capital letter chi, U+03A7 'Dagger': '‡', # double dagger, U+2021 ISOpub 'Delta': 'Δ', # greek capital letter delta, U+0394 ISOgrk3 'ETH': chr(208), # latin capital letter ETH, U+00D0 ISOlat1 'Eacute': chr(201), # latin capital letter E with acute, U+00C9 ISOlat1 'Ecirc': chr(202), # latin capital letter E with circumflex, U+00CA ISOlat1 'Egrave': chr(200), # latin capital letter E with grave, U+00C8 ISOlat1 'Epsilon': 'Ε', # greek capital letter epsilon, U+0395 'Eta': 'Η', # greek capital letter eta, U+0397 'Euml': chr(203), # latin capital letter E with diaeresis, U+00CB ISOlat1 'Gamma': 'Γ', # greek capital letter gamma, U+0393 ISOgrk3 'Iacute': chr(205), # latin capital letter I with acute, U+00CD ISOlat1 'Icirc': chr(206), # latin capital letter I with circumflex, U+00CE ISOlat1 'Igrave': chr(204), # latin capital letter I with grave, U+00CC ISOlat1 'Iota': 'Ι', # greek capital letter iota, U+0399 'Iuml': chr(207), # latin capital letter I with diaeresis, U+00CF ISOlat1 'Kappa': 'Κ', # greek 
capital letter kappa, U+039A 'Lambda': 'Λ', # greek capital letter lambda, U+039B ISOgrk3 'Mu': 'Μ', # greek capital letter mu, U+039C 'Ntilde': chr(209), # latin capital letter N with tilde, U+00D1 ISOlat1 'Nu': 'Ν', # greek capital letter nu, U+039D 'Oacute': chr(211), # latin capital letter O with acute, U+00D3 ISOlat1 'Ocirc': chr(212), # latin capital letter O with circumflex, U+00D4 ISOlat1 'Ograve': chr(210), # latin capital letter O with grave, U+00D2 ISOlat1 'Omega': 'Ω', # greek capital letter omega, U+03A9 ISOgrk3 'Omicron': 'Ο', # greek capital letter omicron, U+039F 'Oslash': chr(216), # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1 'Otilde': chr(213), # latin capital letter O with tilde, U+00D5 ISOlat1 'Ouml': chr(214), # latin capital letter O with diaeresis, U+00D6 ISOlat1 'Phi': 'Φ', # greek capital letter phi, U+03A6 ISOgrk3 'Pi': 'Π', # greek capital letter pi, U+03A0 ISOgrk3 'Prime': '″', # double prime = seconds = inches, U+2033 ISOtech 'Psi': 'Ψ', # greek capital letter psi, U+03A8 ISOgrk3 'Rho': 'Ρ', # greek capital letter rho, U+03A1 'Sigma': 'Σ', # greek capital letter sigma, U+03A3 ISOgrk3 'THORN': chr(222), # latin capital letter THORN, U+00DE ISOlat1 'Tau': 'Τ', # greek capital letter tau, U+03A4 'Theta': 'Θ', # greek capital letter theta, U+0398 ISOgrk3 'Uacute': chr(218), # latin capital letter U with acute, U+00DA ISOlat1 'Ucirc': chr(219), # latin capital letter U with circumflex, U+00DB ISOlat1 'Ugrave': chr(217), # latin capital letter U with grave, U+00D9 ISOlat1 'Upsilon': 'Υ', # greek capital letter upsilon, U+03A5 ISOgrk3 'Uuml': chr(220), # latin capital letter U with diaeresis, U+00DC ISOlat1 'Xi': 'Ξ', # greek capital letter xi, U+039E ISOgrk3 'Yacute': chr(221), # latin capital letter Y with acute, U+00DD ISOlat1 'Zeta': 'Ζ', # greek capital letter zeta, U+0396 'aacute': chr(225), # latin small letter a with acute, U+00E1 ISOlat1 'acirc': chr(226), # latin small letter a with circumflex, 
U+00E2 ISOlat1 'acute': chr(180), # acute accent = spacing acute, U+00B4 ISOdia 'aelig': chr(230), # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 'agrave': chr(224), # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 'alefsym': 'ℵ', # alef symbol = first transfinite cardinal, U+2135 NEW 'alpha': 'α', # greek small letter alpha, U+03B1 ISOgrk3 'and': '∧', # logical and = wedge, U+2227 ISOtech 'ang': '∠', # angle, U+2220 ISOamso 'aring': chr(229), # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 'asymp': '≈', # almost equal to = asymptotic to, U+2248 ISOamsr 'atilde': chr(227), # latin small letter a with tilde, U+00E3 ISOlat1 'auml': chr(228), # latin small letter a with diaeresis, U+00E4 ISOlat1 'bdquo': '„', # double low-9 quotation mark, U+201E NEW 'beta': 'β', # greek small letter beta, U+03B2 ISOgrk3 'brvbar': chr(166), # broken bar = broken vertical bar, U+00A6 ISOnum 'bull': '•', # bullet = black small circle, U+2022 ISOpub 'cap': '∩', # intersection = cap, U+2229 ISOtech 'ccedil': chr(231), # latin small letter c with cedilla, U+00E7 ISOlat1 'cedil': chr(184), # cedilla = spacing cedilla, U+00B8 ISOdia 'cent': chr(162), # cent sign, U+00A2 ISOnum 'chi': 'χ', # greek small letter chi, U+03C7 ISOgrk3 'clubs': '♣', # black club suit = shamrock, U+2663 ISOpub 'cong': '≅', # approximately equal to, U+2245 ISOtech 'copy': chr(169), # copyright sign, U+00A9 ISOnum 'crarr': '↵', # downwards arrow with corner leftwards = carriage return, U+21B5 NEW 'cup': '∪', # union = cup, U+222A ISOtech 'curren': chr(164), # currency sign, U+00A4 ISOnum 'dArr': '⇓', # downwards double arrow, U+21D3 ISOamsa 'dagger': '†', # dagger, U+2020 ISOpub 'darr': '↓', # downwards arrow, U+2193 ISOnum 'deg': chr(176), # degree sign, U+00B0 ISOnum 'delta': 'δ', # greek small letter delta, U+03B4 ISOgrk3 'diams': '♦', # black diamond suit, U+2666 ISOpub 'divide': chr(247), # division sign, U+00F7 ISOnum 'eacute': 
chr(233), # latin small letter e with acute, U+00E9 ISOlat1
'ecirc': chr(234), # latin small letter e with circumflex, U+00EA ISOlat1
'egrave': chr(232), # latin small letter e with grave, U+00E8 ISOlat1
'empty': '∅', # empty set = null set = diameter, U+2205 ISOamso
'emsp': ' ', # em space, U+2003 ISOpub
'ensp': ' ', # en space, U+2002 ISOpub
'epsilon': 'ε', # greek small letter epsilon, U+03B5 ISOgrk3
'equiv': '≡', # identical to, U+2261 ISOtech
'eta': 'η', # greek small letter eta, U+03B7 ISOgrk3
'eth': chr(240), # latin small letter eth, U+00F0 ISOlat1
'euml': chr(235), # latin small letter e with diaeresis, U+00EB ISOlat1
'exist': '∃', # there exists, U+2203 ISOtech
'fnof': 'ƒ', # latin small f with hook = function = florin, U+0192 ISOtech
'forall': '∀', # for all, U+2200 ISOtech
'frac12': chr(189), # vulgar fraction one half = fraction one half, U+00BD ISOnum
'frac14': chr(188), # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum
'frac34': chr(190), # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum
'frasl': '⁄', # fraction slash, U+2044 NEW
'gamma': 'γ', # greek small letter gamma, U+03B3 ISOgrk3
'ge': '≥', # greater-than or equal to, U+2265 ISOtech
'hArr': '⇔', # left right double arrow, U+21D4 ISOamsa
'harr': '↔', # left right arrow, U+2194 ISOamsa
'hearts': '♥', # black heart suit = valentine, U+2665 ISOpub
'hellip': '…', # horizontal ellipsis = three dot leader, U+2026 ISOpub
'iacute': chr(237), # latin small letter i with acute, U+00ED ISOlat1
'icirc': chr(238), # latin small letter i with circumflex, U+00EE ISOlat1
'iexcl': chr(161), # inverted exclamation mark, U+00A1 ISOnum
'igrave': chr(236), # latin small letter i with grave, U+00EC ISOlat1
'image': 'ℑ', # blackletter capital I = imaginary part, U+2111 ISOamso
'infin': '∞', # infinity, U+221E ISOtech
'int': '∫', # integral, U+222B ISOtech
'iota': 'ι', # greek small letter iota, U+03B9 ISOgrk3
'iquest': chr(191), # inverted question mark = turned question mark, U+00BF ISOnum
'isin': '∈', # element of, U+2208 ISOtech
'iuml': chr(239), # latin small letter i with diaeresis, U+00EF ISOlat1
'kappa': 'κ', # greek small letter kappa, U+03BA ISOgrk3
'lArr': '⇐', # leftwards double arrow, U+21D0 ISOtech
'lambda': 'λ', # greek small letter lambda, U+03BB ISOgrk3
'lang': '〈', # left-pointing angle bracket = bra, U+2329 ISOtech
'laquo': chr(171), # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum
'larr': '←', # leftwards arrow, U+2190 ISOnum
'lceil': '⌈', # left ceiling = apl upstile, U+2308 ISOamsc
'ldquo': '“', # left double quotation mark, U+201C ISOnum
'le': '≤', # less-than or equal to, U+2264 ISOtech
'lfloor': '⌊', # left floor = apl downstile, U+230A ISOamsc
'lowast': '∗', # asterisk operator, U+2217 ISOtech
'loz': '◊', # lozenge, U+25CA ISOpub
'lrm': '‎', # left-to-right mark, U+200E NEW RFC 2070
'lsaquo': '‹', # single left-pointing angle quotation mark, U+2039 ISO proposed
'lsquo': '‘', # left single quotation mark, U+2018 ISOnum
'macr': chr(175), # macron = spacing macron = overline = APL overbar, U+00AF ISOdia
'mdash': '—', # em dash, U+2014 ISOpub
'micro': chr(181), # micro sign, U+00B5 ISOnum
'middot': chr(183), # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum
'minus': '−', # minus sign, U+2212 ISOtech
'mu': 'μ', # greek small letter mu, U+03BC ISOgrk3
'nabla': '∇', # nabla = backward difference, U+2207 ISOtech
'nbsp': chr(160), # no-break space = non-breaking space, U+00A0 ISOnum
'ndash': '–', # en dash, U+2013 ISOpub
'ne': '≠', # not equal to, U+2260 ISOtech
'ni': '∋', # contains as member, U+220B ISOtech
'not': chr(172), # not sign, U+00AC ISOnum
'notin': '∉', # not an element of, U+2209 ISOtech
'nsub': '⊄', # not a subset of, U+2284 ISOamsn
'ntilde': chr(241), # latin small letter n with tilde, U+00F1 ISOlat1
'nu': 'ν', # greek small letter nu, U+03BD ISOgrk3
'oacute': chr(243), # latin small letter o with acute, U+00F3 ISOlat1
'ocirc': chr(244), # latin small letter o with circumflex, U+00F4 ISOlat1
'ograve': chr(242), # latin small letter o with grave, U+00F2 ISOlat1
'oline': '‾', # overline = spacing overscore, U+203E NEW
'omega': 'ω', # greek small letter omega, U+03C9 ISOgrk3
'omicron': 'ο', # greek small letter omicron, U+03BF NEW
'oplus': '⊕', # circled plus = direct sum, U+2295 ISOamsb
'or': '∨', # logical or = vee, U+2228 ISOtech
'ordf': chr(170), # feminine ordinal indicator, U+00AA ISOnum
'ordm': chr(186), # masculine ordinal indicator, U+00BA ISOnum
'oslash': chr(248), # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1
'otilde': chr(245), # latin small letter o with tilde, U+00F5 ISOlat1
'otimes': '⊗', # circled times = vector product, U+2297 ISOamsb
'ouml': chr(246), # latin small letter o with diaeresis, U+00F6 ISOlat1
'para': chr(182), # pilcrow sign = paragraph sign, U+00B6 ISOnum
'part': '∂', # partial differential, U+2202 ISOtech
'permil': '‰', # per mille sign, U+2030 ISOtech
'perp': '⊥', # up tack = orthogonal to = perpendicular, U+22A5 ISOtech
'phi': 'φ', # greek small letter phi, U+03C6 ISOgrk3
'pi': 'π', # greek small letter pi, U+03C0 ISOgrk3
'piv': 'ϖ', # greek pi symbol, U+03D6 ISOgrk3
'plusmn': chr(177), # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum
'pound': chr(163), # pound sign, U+00A3 ISOnum
'prime': '′', # prime = minutes = feet, U+2032 ISOtech
'prod': '∏', # n-ary product = product sign, U+220F ISOamsb
'prop': '∝', # proportional to, U+221D ISOtech
'psi': 'ψ', # greek small letter psi, U+03C8 ISOgrk3
'rArr': '⇒', # rightwards double arrow, U+21D2 ISOtech
'radic': '√', # square root = radical sign, U+221A ISOtech
'rang': '〉', # right-pointing angle bracket = ket, U+232A ISOtech
'raquo': chr(187), # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum
'rarr': '→', # rightwards arrow, U+2192 ISOnum
'rceil': '⌉', # right ceiling, U+2309 ISOamsc
'rdquo': '”', # right double quotation mark, U+201D ISOnum
'real': 'ℜ', # blackletter capital R = real part symbol, U+211C ISOamso
'reg': chr(174), # registered sign = registered trade mark sign, U+00AE ISOnum
'rfloor': '⌋', # right floor, U+230B ISOamsc
'rho': 'ρ', # greek small letter rho, U+03C1 ISOgrk3
'rlm': '‏', # right-to-left mark, U+200F NEW RFC 2070
'rsaquo': '›', # single right-pointing angle quotation mark, U+203A ISO proposed
'rsquo': '’', # right single quotation mark, U+2019 ISOnum
'sbquo': '‚', # single low-9 quotation mark, U+201A NEW
'sdot': '⋅', # dot operator, U+22C5 ISOamsb
'sect': chr(167), # section sign, U+00A7 ISOnum
'shy': chr(173), # soft hyphen = discretionary hyphen, U+00AD ISOnum
'sigma': 'σ', # greek small letter sigma, U+03C3 ISOgrk3
'sigmaf': 'ς', # greek small letter final sigma, U+03C2 ISOgrk3
'sim': '∼', # tilde operator = varies with = similar to, U+223C ISOtech
'spades': '♠', # black spade suit, U+2660 ISOpub
'sub': '⊂', # subset of, U+2282 ISOtech
'sube': '⊆', # subset of or equal to, U+2286 ISOtech
'sum': '∑', # n-ary sumation, U+2211 ISOamsb
'sup': '⊃', # superset of, U+2283 ISOtech
'sup1': chr(185), # superscript one = superscript digit one, U+00B9 ISOnum
'sup2': chr(178), # superscript two = superscript digit two = squared, U+00B2 ISOnum
'sup3': chr(179), # superscript three = superscript digit three = cubed, U+00B3 ISOnum
'supe': '⊇', # superset of or equal to, U+2287 ISOtech
'szlig': chr(223), # latin small letter sharp s = ess-zed, U+00DF ISOlat1
'tau': 'τ', # greek small letter tau, U+03C4 ISOgrk3
'there4': '∴', # therefore, U+2234 ISOtech
'theta': 'θ', # greek small letter theta, U+03B8 ISOgrk3
'thetasym': 'ϑ', # greek small letter theta symbol, U+03D1 NEW
'thinsp': ' ', # thin space, U+2009 ISOpub
'thorn': chr(254), # latin small letter thorn with, U+00FE ISOlat1
'times': chr(215), # multiplication sign, U+00D7 ISOnum
'trade': '™', # trade mark sign, U+2122 ISOnum
'uArr': '⇑', # upwards double arrow, U+21D1 ISOamsa
'uacute': chr(250), # latin small letter u with acute, U+00FA ISOlat1
'uarr': '↑', # upwards arrow, U+2191 ISOnum
'ucirc': chr(251), # latin small letter u with circumflex, U+00FB ISOlat1
'ugrave': chr(249), # latin small letter u with grave, U+00F9 ISOlat1
'uml': chr(168), # diaeresis = spacing diaeresis, U+00A8 ISOdia
'upsih': 'ϒ', # greek upsilon with hook symbol, U+03D2 NEW
'upsilon': 'υ', # greek small letter upsilon, U+03C5 ISOgrk3
'uuml': chr(252), # latin small letter u with diaeresis, U+00FC ISOlat1
'weierp': '℘', # script capital P = power set = Weierstrass p, U+2118 ISOamso
'xi': 'ξ', # greek small letter xi, U+03BE ISOgrk3
'yacute': chr(253), # latin small letter y with acute, U+00FD ISOlat1
'yen': chr(165), # yen sign = yuan sign, U+00A5 ISOnum
'yuml': chr(255), # latin small letter y with diaeresis, U+00FF ISOlat1
'zeta': 'ζ', # greek small letter zeta, U+03B6 ISOgrk3
'zwj': '‍', # zero width joiner, U+200D NEW RFC 2070
'zwnj': '‌', # zero width non-joiner, U+200C NEW RFC 2070
}

From tim_one at email.msn.com  Tue Aug 17 09:30:17 1999
From: tim_one at email.msn.com (Tim Peters)
Date: Tue, 17 Aug 1999 03:30:17 -0400
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37B8042B.21DE6053@lemburg.com>
Message-ID: <000001bee882$5b7d8da0$112d2399@tim>

[about weakdicts and the possibility of building them on weak
 references; the obvious way doesn't clean up the dict itself by
 magic; maybe a weak object should be notified when its referent
 goes away
]

[M.-A. Lemburg]
> Perhaps one could fiddle something out of the Proxy objects
> in mxProxy (you know where...). These support a special __cleanup__
> protocol that I use a lot to work around circular garbage:
> the __cleanup__ method of the referenced object is called prior
> to destroying the proxy; even if the reference count on the
> object has not yet gone down to 0.
>
> This makes direct circles possible without problems: the parent
> can reference a child through the proxy and the child can reference the
> parent directly.
What you just wrote is:

    parent --> proxy --> child -->+
      ^                           v
      +<--------------------------+

Looks like a plain old cycle to me!

> As soon as the parent is cleaned up, the reference to
> the proxy is deleted which then automagically makes the
> back reference in the child disappear, allowing the parent
> to be deallocated after cleanup without leaving a circular
> reference around.

M-A, this is making less sense by the paragraph : skipping the middle,
this says "as soon as the parent is cleaned up ... allowing the parent
to be deallocated after cleanup".  If we presume that the parent gets
cleaned up explicitly (since the reference from the child is keeping it
alive, it's not going to get cleaned up by magic, right?), then the
parent could just as well call the __cleanup__ methods of the things it
references directly without bothering with a proxy.  For that matter,
if it's the straightforward

    parent <-> child

kind of cycle, the parent's cleanup method can just do

    self.__dict__.clear()

and the cycle is broken without writing a __cleanup__ method anywhere
(that's what I usually do, and in this kind of cycle that clears the
last reference to the child, which then goes away, which in turn
automagically clears its back reference to the parent).

So, offhand, I don't see that the proxy protocol could help here.  In a
sense, what's really needed is the opposite: notifying the *proxy* when
the *real* object goes away (which makes no sense in the context of
what your proxy objects were designed to do).

[about Java and its four reference strengths]

Found a good introductory writeup at (sorry, my mailer will break this
URL, so I'll break it myself at a sensible place):

    http://developer.java.sun.com/developer/
    technicalArticles//ALT/RefObj/index.html

They have a class for each of the three "not strong" flavors of
references.  For all three you pass the referenced object to the
constructor, and all three accept (optional in two of the flavors) a
second ReferenceQueue argument.
In the latter case, when the referenced object goes away the
weak/soft/phantom-ref proxy object is placed on the queue.  Which, in
turn, is a thread-safe queue with various put, get, and timeout-limited
polling functions.  So you have to write code to look at the queue from
time to time, to find the proxies whose referents have gone away.

The three flavors may (or may not ...) have these motivations:

soft: an object reachable at strongest by soft references can go away
at any time, but the garbage collector strives to keep it intact until
it can't find any other way to get enough memory

weak: an object reachable at strongest by weak references can go away
at any time, and the collector makes no attempt to delay its death

phantom: an object reachable at strongest by phantom references can get
*finalized* at any time, but won't get *deallocated* before its phantom
proxy does something or other (goes away?  wasn't clear).  This is the
flavor that requires passing a queue argument to the constructor.
Seems to be a major hack to worm around Java's notorious problems with
order of finalization -- along the lines that you give phantom
referents trivial finalizers, and put the real cleanup logic in the
phantom proxy.  This lets your program take responsibility for running
the real cleanup code in the order-- and in the thread! --where it
makes sense.

Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
under-the-cover weak keys (unlike Dieter's flavor with weak values),
and where the key+value pairs vanish by magic when the key object goes
away.  The details and the implementation of these guys weren't clear
to me, but then I didn't download the code, just scanned the online
docs.

Ah, a correction to my last post:

class _Weak:
    ...
    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

Root of all evil: this method is useless, since the id2weak dict keeps
each _Weak object alive until its referent goes away (at which time
self.id gets set to None, so _Weak.__del__ doesn't do anything).  Even
if it did do something, it's no cheaper to do it here than in the
system cleanup code ("greater expense" was wrong).

weakly y'rs  - tim

PS: Ooh!  Ooh!  Fellow at work today was whining about weakdicts, and
called them "limp dicts".  I'm not entirely sure it was an innocent
Freudian slut, but it's a funny pun even if it wasn't (for you
foreigners, it sounds like American slang for "flaccid one-eyed trouser
snake" ...).

From fredrik at pythonware.com  Tue Aug 17 09:23:03 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 09:23:03 +0200
Subject: [Python-Dev] buffer interface considered harmful
References:
Message-ID: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>

David Ascher wrote:
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

as an aside, PIL 1.1 (*) introduces "virtual image memories" which are,
as I mentioned in an earlier post, accessed via an API rather than via
direct pointers.  it'll also include an adapter allowing you to use
NumPy objects as image memories.
unfortunately, the buffer interface is not good enough to use on top of
the virtual image memory interface...

*) 1.1 is our current development thread, which will be released to
plus customers in a number of weeks...

From mal at lemburg.com  Tue Aug 17 10:50:01 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 17 Aug 1999 10:50:01 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim>
Message-ID: <37B92239.4076841E@lemburg.com>

Tim Peters wrote:
>
> [about weakdicts and the possibility of building them on weak
> references; the obvious way doesn't clean up the dict itself by
> magic; maybe a weak object should be notified when its referent
> goes away
> ]
>
> [M.-A. Lemburg]
> > Perhaps one could fiddle something out of the Proxy objects
> > in mxProxy (you know where...). These support a special __cleanup__
> > protocol that I use a lot to work around circular garbage:
> > the __cleanup__ method of the referenced object is called prior
> > to destroying the proxy; even if the reference count on the
> > object has not yet gone down to 0.
> >
> > This makes direct circles possible without problems: the parent
> > can reference a child through the proxy and the child can reference the
> > parent directly.
>
> What you just wrote is:
>
>     parent --> proxy --> child -->+
>       ^                           v
>       +<--------------------------+
>
> Looks like a plain old cycle to me!

Sure :-) That was the intention. I'm using this to implement
acquisition without turning to ExtensionClasses. [Nice picture, BTW]

> > As soon as the parent is cleaned up, the reference to
> > the proxy is deleted which then automagically makes the
> > back reference in the child disappear, allowing the parent
> > to be deallocated after cleanup without leaving a circular
> > reference around.
>
> M-A, this is making less sense by the paragraph : skipping the
> middle, this says "as soon as the parent is cleaned up ... allowing the
> parent to be deallocated after cleanup". If we presume that the parent gets
> cleaned up explicitly (since the reference from the child is keeping it
> alive, it's not going to get cleaned up by magic, right?), then the parent
> could just as well call the __cleanup__ methods of the things it references
> directly without bothering with a proxy. For that matter, if it's the
> straightforward
>
>     parent <-> child
>
> kind of cycle, the parent's cleanup method can just do
>
>     self.__dict__.clear()
>
> and the cycle is broken without writing a __cleanup__ method anywhere
> (that's what I usually do, and in this kind of cycle that clears the last
> reference to the child, which then goes away, which in turn automagically
> clears its back reference to the parent).
>
> So, offhand, I don't see that the proxy protocol could help here. In a
> sense, what's really needed is the opposite: notifying the *proxy* when the
> *real* object goes away (which makes no sense in the context of what your
> proxy objects were designed to do).

All true :-). The nice thing about the proxy is that it takes care of
the process automagically. And yes, the parent is used via a proxy too.
So the picture looks like this:

    --> proxy --> parent --> proxy --> child -->+
                    ^                           v
                    +<--------------------------+

Since the proxy isn't noticed by the referencing objects (well, at
least if they don't fiddle with internals), the picture for the objects
looks like this:

    --> parent --> child -->+
          ^                 v
          +<----------------+

You could of course do the same via explicit invocation of the
__cleanup__ method, but the object references involved could be hidden
in some other structure, so they might be hard to find.

And there's another feature about Proxies (as defined in mxProxy): they
allow you to control access in a much more strict way than Python does.
You can actually hide attributes and methods you don't want exposed in
a way that doesn't even let you access them via some dict or
pass-me-the-frame-object trick. This is very useful when you program
multi-user application host servers where you don't want users to
access internal structures of the server.

> [about Java and its four reference strengths]
>
> Found a good introductory writeup at (sorry, my mailer will break this URL,
> so I'll break it myself at a sensible place):
>
> http://developer.java.sun.com/developer/
> technicalArticles//ALT/RefObj/index.html

Thanks for the reference... and for the summary ;-)

> They have a class for each of the three "not strong" flavors of references.
> For all three you pass the referenced object to the constructor, and all
> three accept (optional in two of the flavors) a second ReferenceQueue
> argument. In the latter case, when the referenced object goes away the
> weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn,
> is a thread-safe queue with various put, get, and timeout-limited polling
> functions. So you have to write code to look at the queue from time to
> time, to find the proxies whose referents have gone away.
>
> The three flavors may (or may not ...) have these motivations:
>
> soft: an object reachable at strongest by soft references can go away at
> any time, but the garbage collector strives to keep it intact until it can't
> find any other way to get enough memory

So there is a possibility of reviving these objects, right ?

I've just recently added a hackish function to my mxTools which allows
me to regain access to objects via their address (no, not thread safe,
not even necessarily correct).

sys.makeref(id)
    Provided that id is a valid address of a Python object (id(object)
    returns this address), this function returns a new reference to it.
    Only objects that are "alive" can be referenced this way, ones with
    zero reference count cause an exception to be raised.
    You can use this function to reaccess objects lost during garbage
    collection.

    USE WITH CARE: this is an expert-only function since it can cause
    instant core dumps and many other strange things -- even ruin your
    system if you don't know what you're doing !

    SECURITY WARNING: This function can provide you with access to
    objects that are otherwise not visible, e.g. in restricted mode,
    and thus be a potential security hole.

I use it for tracking objects via an id-keyed dictionary and hooks in
the create/del mechanisms of Python instances. It helps finding those
memory eating cycles.

> weak: an object reachable at strongest by weak references can go away at
> any time, and the collector makes no attempt to delay its death
>
> phantom: an object reachable at strongest by phantom references can get
> *finalized* at any time, but won't get *deallocated* before its phantom
> proxy does something or other (goes away? wasn't clear). This is the flavor
> that requires passing a queue argument to the constructor. Seems to be a
> major hack to worm around Java's notorious problems with order of
> finalization -- along the lines that you give phantom referents trivial
> finalizers, and put the real cleanup logic in the phantom proxy. This lets
> your program take responsibility for running the real cleanup code in the
> order-- and in the thread! --where it makes sense.

Wouldn't these flavors be possible using the following setup ? Note
that it's quite similar to your _Weak class except that I use a proxy
without the need to first get a strong reference for the object and
that it doesn't use a weak bit.

    --> proxy --> object
                    ^
                    |
         all_managed_objects

all_managed_objects is a dictionary indexed by address (its id) and
keeps a strong reference to the objects. The proxy does not keep a
strong reference to the object, but only the address as integer, and
checks the ref-count on the object in the all_managed_objects
dictionary prior to every dereferencing action.

In case this refcount falls down to 1 (only the all_managed_objects
dict references it), the proxy takes appropriate action, e.g. raises an
exception and deletes the reference in all_managed_objects to mimic a
weak reference. The same check is done prior to garbage collection of
the proxy.

Add to this some queues, pepper and salt and place it in an oven at
220° for 20 minutes... plus take a look every 10 seconds or so...

The downside is obvious: the zombified object will not get inspected
(and then GCed) until the next time a weak reference to it is used.

> Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
> under-the-cover weak keys (unlike Dieter's flavor with weak values), and
> where the key+value pairs vanish by magic when the key object goes away.
> The details and the implementation of these guys weren't clear to me, but
> then I didn't download the code, just scanned the online docs.

Would the above help in creating such beasts ?

> Ah, a correction to my last post:
>
> class _Weak:
>     ...
>     def __del__(self):
>         # this is purely an optimization: if self gets nuked,
>         # exempt its referent from greater expense when *it*
>         # dies
>         if self.id is not None:
>             __clear_weak_bit(__id2obj(self.id))
>             del id2weak[self.id]
>
> Root of all evil: this method is useless, since the id2weak dict keeps each
> _Weak object alive until its referent goes away (at which time self.id gets
> set to None, so _Weak.__del__ doesn't do anything). Even if it did do
> something, it's no cheaper to do it here than in the system cleanup code
> ("greater expense" was wrong).
>
> weakly y'rs - tim
>
> PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and
> called them "limp dicts". I'm not entirely sure it was an innocent Freudian
> slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds
> like American slang for "flaccid one-eyed trouser snake" ...).
:-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 136 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mhammond at skippinet.com.au  Tue Aug 17 18:05:40 1999
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 18 Aug 1999 02:05:40 +1000
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>
Message-ID: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>

Fredrik,

Care to elaborate? Statements like "buffer interface needs a redesign"
or "the buffer interface is not good enough to use on top of the
virtual image memory interface" really only give me the impression you
have a bee in your bonnet over these buffer interfaces.

If you could actually stretch these statements out to provide even
_some_ background, problem statement or potential solution it would
help. All I know is "Fredrik doesn't like it for some unexplained
reason". You found an issue with array reallocation - great - but
that's a bug rather than a design flaw. Can you tell us why it's not
good enough, and give an off-the-cuff design that would solve it? Or
are you suggesting it is unsolvable? I really don't have a clue what
your issue is.

Jim (for example) has made his position and reasoning clear. You have
only made your position clear, but your reasoning is still a mystery.

Mark.

>
> unfortunately, the buffer interface is not good enough to use
> on top of the virtual image memory interface...

From fredrik at pythonware.com  Tue Aug 17 18:48:31 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 18:48:31 +0200
Subject: [Python-Dev] buffer interface considered harmful
References: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>
Message-ID: <005201bee8d0$9b4737d0$f29b12c2@secret.pythonware.com>

> Care to elaborate? Statements like "buffer interface needs a redesign" or
> "the buffer interface is not good enough to use on top of the virtual image
> memory interface" really only give me the impression you have a bee in your
> bonnet over these buffer interfaces.

re "good enough":
http://www.python.org/pipermail/python-dev/1999-August/000650.html

re "needs a redesign":
http://www.python.org/pipermail/python-dev/1999-August/000659.html

and to some extent:
http://www.python.org/pipermail/python-dev/1999-August/000658.html

> Jim (for example) has made his position and reasoning clear.

among other things, Jim said: "At this point, I don't have a good idea
what buffers are for and I don't see a lot of evidence that there *is*
a design. I assume that there was a design, but I can't see it". which
pretty much echoes my concerns in:

http://www.python.org/pipermail/python-dev/1999-August/000612.html
http://www.python.org/pipermail/python-dev/1999-August/000648.html

> You found an issue with array reallocation - great - but that's
> a bug rather than a design flaw.

for me, that bug (and the marshal glitch) indicates that the design
isn't as crystal-clear as it needs to be, for such a fundamental
feature. otherwise, Greg would never have made that mistake, and Guido
would have spotted it when he added the "buffer" built-in...

so what are you folks waiting for? could someone who thinks he
understands exactly what this thing is spend an hour on writing that
design document, so me and Jim can put this entire thing behind us?

PS. btw, was it luck or careful analysis behind the decision to make
buffer() always return read-only buffers, also for objects
implementing the read/write protocol?

From da at ski.org  Wed Aug 18 00:41:14 1999
From: da at ski.org (David Ascher)
Date: Tue, 17 Aug 1999 15:41:14 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? )
In-Reply-To: <19990816094243.3CE83303120@snelboot.oratrix.nl>
Message-ID:

On Mon, 16 Aug 1999, Jack Jansen wrote:

> Would adding a buffer interface to cobject solve your problem? Cobject is
> described as being used for passing C objects between Python modules, but I've
> always thought of it as passing C objects from one C routine to another C
> routine through Python, which doesn't necessarily understand what the object
> is all about.
>
> That latter description seems to fit your bill quite nicely.

It's an interesting idea, but it wouldn't do as it is, as I'd need the
ability to create a CObject given a memory location and a size. Also, I
am not expected to free() the memory, which would happen when the
CObject got GC'ed.

(BTW: I am *not* arguing that PyBuffer_FromReadWriteMemory() should be
exposed by default. I'm happy with exposing it in my little extension
module for my exotic needs.)

--david

From mal at lemburg.com  Wed Aug 18 11:02:02 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 11:02:02 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim> <37B92239.4076841E@lemburg.com>
Message-ID: <37BA768A.50DF5574@lemburg.com>

[about weakdicts and the possibility of building them on weak
 references; the obvious way doesn't clean up the dict itself by
 magic; maybe a weak object should be notified when its referent
 goes away
]

Here is a new version of my Proxy package which includes a
self-managing weak reference mechanism without the need to add extra
bits or bytes to all Python objects:

    http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip

The docs and an explanation of how the thingie works are included in
the archive's Doc subdir.
Basically it builds upon the idea I posted earlier on in this thread --
with a few extra kicks to get it right in the end ;-)

Usage is pretty simple:

from Proxy import WeakProxy
object = []
wr = WeakProxy(object)
wr.append(8)
del object

>>> wr[0]
Traceback (innermost last):
  File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected

I have checked the ref counts pretty thoroughly, but before going
public I would like the Python-Dev crowd to run some tests as well:
after all, the point is for the weak references to be weak and that's
sometimes a bit hard to check.

Hope you have as much fun with it as I had writing it ;-)

Ah yes, for the raw details have a look at the code. The code uses a
list of back references to the weak Proxies and notifies them when the
object goes away... would it be useful to add a hook to the Proxies so
that they can apply some other action as well ?

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From Vladimir.Marangozov at inrialpes.fr  Wed Aug 18 13:42:08 1999
From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov)
Date: Wed, 18 Aug 1999 12:42:08 +0100 (NFT)
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37BA768A.50DF5574@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 11:02:02 am"
Message-ID: <199908181142.MAA22596@pukapuka.inrialpes.fr>

M.-A. Lemburg wrote:
>
> Usage is pretty simple:
>
> from Proxy import WeakProxy
> object = []
> wr = WeakProxy(object)
> wr.append(8)
> del object
>
> >>> wr[0]
> Traceback (innermost last):
>   File "", line 1, in ?
> mxProxy.LostReferenceError: object already garbage collected
>
> I have checked the ref counts pretty thoroughly, but before
> going public I would like the Python-Dev crowd to run some
> tests as well: after all, the point is for the weak references
> to be weak and that's sometimes a bit hard to check.

It's even harder to implement them without side effects. I used the
same hack for the __heirs__ class attribute some time ago. But I knew
that a parent class cannot be garbage collected before all of its
descendants. That allowed me to keep weak refs in the parent class,
and preserve the existing strong refs in the subclasses. On every
dealloc of a subclass, the corresponding weak ref in the parent class'
__heirs__ is removed.

In your case, the lifetime of the objects cannot be predicted, so
implementing weak refs by messing with refcounts or checking mem
pointers is a dead end. I don't know whether this is the case with
mxProxy as I just browsed the code quickly, but here's a scenario
where your scheme (or implementation) is not working:

>>> from Proxy import WeakProxy
>>> o = []
>>> p = WeakProxy(o)
>>> d = WeakProxy(o)
>>> p
>>> d
>>> print p
[]
>>> print d
[]
>>> del o
>>> p
>>> d
>>> print p
Illegal instruction (core dumped)

--
Vladimir MARANGOZOV          | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jack at oratrix.nl  Wed Aug 18 13:02:13 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 18 Aug 1999 13:02:13 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com>
Message-ID: <19990818110213.A558F303120@snelboot.oratrix.nl>

The one thing I'm not thrilled by in mxProxy is that a call to
CheckWeakReferences() is needed before an object is cleaned up.
I guess this boils down to the same problem I had with my weak reference scheme: you somehow want the Python core to tell the proxy stuff that the object can be cleaned up (although the details are different: in my scheme this would be triggered by refcount==0 and in mxProxy by refcount==1). And because objects are created and destroyed in Python at a tremendous rate you don't want to do this call for every object, only if you have a hint that the object has a weak reference (or a proxy). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Wed Aug 18 13:46:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 13:46:45 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <19990818110213.A558F303120@snelboot.oratrix.nl> Message-ID: <37BA9D25.95E46EA@lemburg.com> Jack Jansen wrote: > > The one thing I'm not thrilled by in mxProxy is that a call to > CheckWeakReferences() is needed before an object is cleaned up. I guess this > boils down to the same problem I had with my weak reference scheme: you > somehow want the Python core to tell the proxy stuff that the object can be > cleaned up (although the details are different: in my scheme this would be > triggered by refcount==0 and in mxProxy by refcount==1). And because objects > are created and destroyed in Python at a tremendous rate you don't want to do > this call for every object, only if you have a hint that the object has a weak > reference (or a proxy). Well, the check is done prior to every action using a proxy to the object and also when a proxy to it is deallocated. The additional checkweakrefs() API is only included to enable explicit checking of the whole weak refs dictionary, e.g. every 10 seconds or so (just like you would with a mark&sweep GC).
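[Editor's note: the access-time checking Marc-Andre describes -- every operation through a dead proxy raises an error -- is essentially the semantics the standard weakref module (added later, in Python 2.1) ended up providing. A minimal sketch of that behaviour using weakref rather than mxProxy; note that unlike mxProxy, the standard module cannot weakly reference a plain list, so a user-defined class stands in for the email's `object = []` example:]

```python
import weakref

class Node:
    """Plain container; standard weak references need a type with a weakref slot."""
    def __init__(self, value):
        self.value = value

obj = Node(8)
p = weakref.proxy(obj)      # creates no strong reference to obj
assert p.value == 8         # transparent while the referent is alive

del obj                     # last strong reference gone -> referent collected
try:
    p.value                 # any access through a dead proxy now fails
    raised = False
except ReferenceError:      # weakref's analogue of mxProxy.LostReferenceError
    raised = True
assert raised
```

[The check happens on every use of the proxy, so no explicit sweep call is needed -- the same trade-off discussed above.]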
But yes, GC of the phantom object is delayed a bit depending on how you set up the proxies. Still, I think most usages won't have this problem, since the proxies themselves are usually temporary objects. It may sometimes even make sense to have the phantom object around as long as possible, e.g. to implement the soft references Tim quoted from the Java paper. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Aug 18 13:33:18 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 13:33:18 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181142.MAA22596@pukapuka.inrialpes.fr> Message-ID: <37BA99FE.45D582AD@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > I have checked the ref counts pretty thoroughly, but before > > going public I would like the Python-Dev crowd to run some > > tests as well: after all, the point is for the weak references > > to be weak and that's sometimes a bit hard to check. > > It's even harder to implement them without side effects. I used > the same hack for the __heirs__ class attribute some time ago. > But I knew that a parent class cannot be garbage collected before > all of its descendants. That allowed me to keep weak refs in > the parent class, and preserve the existing strong refs in the > subclasses. On every dealloc of a subclass, the corresponding > weak ref in the parent class' __heirs__ is removed. > > In your case, the lifetime of the objects cannot be predicted, > so implementing weak refs by messing with refcounts or checking > mem pointers is a dead end. 
> I don't know whether this is the > case with mxProxy as I just browsed the code quickly, but here's > a scenario where your scheme (or implementation) is not working: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = WeakProxy(o) > >>> d = WeakProxy(o) > >>> p > > >>> d > > >>> print p > [] > >>> print d > [] > >>> del o > >>> p > > >>> d > > >>> print p > Illegal instruction (core dumped) Could you tell me where the core dump originates ? Also, it would help to compile the package with the -DMAL_DEBUG switch turned on (edit Setup) and then run the same things using 'python -d'. The package will then print a pretty complete list of things it is doing to mxProxy.log, which would help track down errors like these. BTW, I get: >>> print p Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> [Don't know why the print statement prints an empty line, though.] Thanks for trying it, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 15:12:14 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 14:12:14 +0100 (NFT) Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37BA99FE.45D582AD@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 01:33:18 pm" Message-ID: <199908181312.OAA20542@pukapuka.inrialpes.fr> [about mxProxy, WeakProxy] M.-A. Lemburg wrote: > > Could you tell me where the core dump originates ? Also, it would > help to compile the package with the -DMAL_DEBUG switch turned > on (edit Setup) and then run the same things using 'python -d'. > The package will then print a pretty complete list of things it > is doing to mxProxy.log, which would help track down errors like > these. 
> > BTW, I get: > >>> print p > > Traceback (innermost last): > File "", line 1, in ? > mxProxy.LostReferenceError: object already garbage collected > >>> > > [Don't know why the print statement prints an empty line, though.] > The previous example now *seems* to work fine in a freshly launched interpreter, so it's not a good example, but this shorter one definitely doesn't: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = q = WeakProxy(o) >>> del o >>> print p or q Illegal instruction (core dumped) Or even shorter: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = WeakProxy(o) >>> del o >>> print p Illegal instruction (core dumped) It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). I can mail you a complete trace in private, if you still need it. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Wed Aug 18 14:50:08 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 14:50:08 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> Message-ID: <37BAAC00.27A34FF7@lemburg.com> Vladimir Marangozov wrote: > > [about mxProxy, WeakProxy] > > M.-A. Lemburg wrote: > > > > Could you tell me where the core dump originates ? Also, it would > > help to compile the package with the -DMAL_DEBUG switch turned > > on (edit Setup) and then run the same things using 'python -d'. > > The package will then print a pretty complete list of things it > > is doing to mxProxy.log, which would help track down errors like > > these. > > > > BTW, I get: > > >>> print p > > > > Traceback (innermost last): > > File "", line 1, in ? > > mxProxy.LostReferenceError: object already garbage collected > > >>> > > > > [Don't know why the print statement prints an empty line, though.] 
> > > > The previous example now *seems* to work fine in a freshly launched > interpreter, so it's not a good example, but this shorter one > definitely doesn't: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = q = WeakProxy(o) > >>> p = q = WeakProxy(o) > >>> del o > >>> print p or q > Illegal instruction (core dumped) > > It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). > I can mail you a complete trace in private, if you still need it. That would be nice (please also include the log-file), because I get: >>> print p or q Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> Thank you, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Wed Aug 18 16:47:23 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 09:47:23 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart Message-ID: <199908181447.JAA05151@dolphin.mojam.com> I posted a note to the main list yesterday in response to Dan Connolly's complaint that the os module isn't very portable. I saw no followups (it's amazing how fast a thread can die out :-), but I think it's a reasonable idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback from people more interested in long-term Python developments. The basic premise is that for each platform on which Python runs there are portable and nonportable interfaces to the underlying operating system. The term POSIX has some portability connotations, so let's assume that the posix module exposes the portable subset of the OS interface. To keep things simple, let's also assume there are only three supported general OS platforms: unix, nt and mac.
The proposal then is that importing the platform's module by name will import both the portable and non-portable interface elements. Importing the posix module will import just that portion of the interface that is truly portable across all platforms. To add new functionality to the posix interface it would have to be added to all three platforms. The posix module will be able to ferret out the platform it is running on and import the correct OS-independent posix implementation: import sys _plat = sys.platform del sys if _plat == "mac": from posixmac import * elif _plat == "nt": from posixnt import * else: from posixunix import * # some unix variant The platform-dependent module would simply import everything it could, e.g.: from posixunix import * from nonposixunix import * The os module would vanish or be deprecated with its current behavior intact. The documentation would be modified so that the posix module documents the portable interface and the OS-dependent module's documentation documents the rest and just refers users to the posix module docs for the portable stuff. In theory, this could be done for 1.6, however as I've proposed it, the semantics of importing the posix module would change. Dan Connolly probably isn't going to have a problem with that, though I suppose Guido might... If this idea is good enough for 1.6, perhaps we leave os and posix module semantics alone and add a module named "portable", "portableos" or "portableposix" or something equally arcane. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From guido at CNRI.Reston.VA.US Wed Aug 18 16:54:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 10:54:28 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 09:47:23 CDT." 
<199908181447.JAA05151@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <199908181454.KAA07692@eric.cnri.reston.va.us> > I posted a note to the main list yesterday in response to Dan Connolly's > complaint that the os module isn't very portable. I saw no followups (it's > amazing how fast a thread can die out :-), but I think it's a reasonable > idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback > from people more interesting in long-term Python developments. > > The basic premise is that for each platform on which Python runs there are > portable and nonportable interfaces to the underlying operating system. The > term POSIX has some portability connotations, so let's assume that the posix > module exposes the portable subset of the OS interface. To keep things > simple, let's also assume there are only three supported general OS > platforms: unix, nt and mac. The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. To > add new functionality to the posix interface it would have to be added to > all three platforms. The posix module will be able to ferret out the > platform it is running on and import the correct OS-independent posix > implementation: > > import sys > _plat = sys.platform > del sys > > if _plat == "mac": from posixmac import * > elif _plat == "nt": from posixnt import * > else: from posixunix import * # some unix variant > > The platform-dependent module would simply import everything it could, e.g.: > > from posixunix import * > from nonposixunix import * > > The os module would vanish or be deprecated with its current behavior > intact. 
The documentation would be modified so that the posix module > documents the portable interface and the OS-dependent module's documentation > documents the rest and just refers users to the posix module docs for the > portable stuff. > > In theory, this could be done for 1.6, however as I've proposed it, the > semantics of importing the posix module would change. Dan Connolly probably > isn't going to have a problem with that, though I suppose Guido might... If > this idea is good enough for 1.6, perhaps we leave os and posix module > semantics alone and add a module named "portable", "portableos" or > "portableposix" or something equally arcane. And the advantage of this would be...? Basically, it seems you're just renaming the functionality of os to posix. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Wed Aug 18 17:10:41 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:10:41 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <199908181454.KAA07692@eric.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> Message-ID: <14266.51743.904066.470431@dolphin.mojam.com> Guido> And the advantage of this would be...? Guido> Basically, it seems you're just renaming the functionality of os Guido> to posix. I see a few advantages. 1. We will get the meaning of the noun "posix" more or less right. Programmers coming from other languages are used to thinking of programming to a POSIX API or the "POSIX subset of the OS API". Witness all the "#ifdef _POSIX" in the header files on my Linux box. In Python, the exact opposite is true. Importing the posix module is documented to be the non-portable way to interface to Unix platforms. 2. You would make it clear on all platforms when you expect to be programming in a non-portable fashion, by importing the platform-specific os (unix, nt, mac).
"import unix" would mean I expect this code to only run on Unix machines. You could argue that you are declaring your non-portability by importing the posix module today, but to the casual user or to a new Python programmer with a C or C++ background, that won't be obvious. 3. If Dan Connolly's contention is correct, importing the os module today is not all that portable. I can't really say one way or the other, because I'm lucky enough to be able to confine my serious programming to Unix. I'm sure there's someone out there that can try the following on a few platforms: import os dir(os) and compare the output. Skip From jack at oratrix.nl Wed Aug 18 17:33:20 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 17:33:20 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <19990818153320.D61F6303120@snelboot.oratrix.nl> > The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. There's one slight problem with this: when you use functionality that is partially portable, i.e. a call that is available on Windows and Unix but not on the Mac. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Wed Aug 18 17:39:30 1999 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Wed, 18 Aug 1999 11:39:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <14266.54194.715887.808096@amarok.cnri.reston.va.us> Skip Montanaro writes: > 2. You would make it clear on all platforms when you expect to be > programming in a non-portable fashion, by importing the > platform-specific os (unix, nt, mac). "import unix" would mean I To my mind, POSIX == Unix; other platforms may have bits of POSIX-ish functionality, but most POSIX functions will only be found on Unix systems. One of my projects for 1.6 is to go through the O'Reilly POSIX book and add all the missing calls to the posix modules. Practically none of those functions would exist on Windows or Mac. Perhaps it's really a documentation fix: the os module should document only those features common to all of the big 3 platforms (Unix, Windows, Mac), and have pointers to a section for each of the platform-specific modules, listing the platform-specific functions. -- A.M. Kuchling http://starship.python.net/crew/amk/ Setting loose on the battlefield weapons that are able to learn may be one of the biggest mistakes mankind has ever made. It could also be one of the last. 
-- Richard Forsyth, "Machine Learning for Expert Systems" From skip at mojam.com Wed Aug 18 17:52:20 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:52:20 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54194.715887.808096@amarok.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> Message-ID: <14266.54907.143970.101594@dolphin.mojam.com> Andrew> Perhaps it's really a documentation fix: the os module should Andrew> document only those features common to all of the big 3 Andrew> platforms (Unix, Windows, Mac), and have pointers to a section Andrew> for each of the platform-specific modules, listing the Andrew> platform-specific functions. Perhaps. Should that read ... the os module should *expose* only those features common to all of the big 3 platforms ... ? Skip From skip at mojam.com Wed Aug 18 17:54:11 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:54:11 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> Message-ID: <14266.54991.27912.12075@dolphin.mojam.com> >>>>> "Jack" == Jack Jansen writes: >> The proposal then is that importing the >> platform's module by name will import both the portable and non-portable >> interface elements. Importing the posix module will import just that >> portion of the interface that is truly portable across all platforms. Jack> There's one slight problem with this: when you use functionality that is Jack> partially portable, i.e. a call that is available on Windows and Unix but not Jack> on the Mac. Agreed. I'm not sure what to do there. 
Is the intersection of the common OS calls on Unix, Windows and Mac so small as to be useless (or are there some really gotta have functions not in the intersection because they are missing only on the Mac)? Skip From guido at CNRI.Reston.VA.US Wed Aug 18 18:16:27 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 12:16:27 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 10:52:20 CDT." <14266.54907.143970.101594@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> <14266.54907.143970.101594@dolphin.mojam.com> Message-ID: <199908181616.MAA07901@eric.cnri.reston.va.us> > ... the os module should *expose* only those features common to all of > the big 3 platforms ... Why? My experience has been that functionality that was thought to be Unix specific has gradually become available on other platforms, which makes it hard to decide in which module a function should be placed. The proper test for portability of a program is not whether it imports certain module names, but whether it uses certain functions from those modules (and whether it uses them in a portable fashion). As platforms evolve, a program that was previously thought to be non-portable might become more portable. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 19:33:44 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 18:33:44 +0100 (NFT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54991.27912.12075@dolphin.mojam.com> from "Skip Montanaro" at "Aug 18, 99 10:54:11 am" Message-ID: <199908181733.SAA08434@pukapuka.inrialpes.fr> Everybody's right in this debate. 
I have to type a lot to express objectively my opinion, but better filter my reasoning and just say the conclusion. Having in mind: - what POSIX is - what an OS is - that an OS may or may not comply w/ the POSIX standard, and if it doesn't, it may do so in a couple of years (Windows 3K and PyOS come to mind ;-) - that the os module claims portability amongst the different OSes, mainly regarding their filesystem & process management services, hence it's exposing only a *subset* of the os specific services - the current state of Python It would be nice: - to leave the os module as a common denominator - to have a "unix" module (which could further incorporate the different brands of unix) - to have the posix module capture the fraction of posix functionality, exported from a particular OS specific module, and add the appropriate POSIX propaganda in the docs - to manage to do this, or argue what's wrong with the above -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Thu Aug 19 12:02:26 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 19 Aug 1999 12:02:26 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> <37BAAC00.27A34FF7@lemburg.com> Message-ID: <37BBD632.3F66419C@lemburg.com> [about weak references and a sample implementation in mxProxy] With the help of Vladimir, I have solved the problem and uploaded a modified version of the prerelease: http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip The archive now also contains a precompiled Win32 PYD file for those on WinXX platforms. Please give it a try and tell me what you think. 
Cheers, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 134 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Thu Aug 19 16:06:01 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 19 Aug 1999 16:06:01 +0200 Subject: [Python-Dev] Optimization idea Message-ID: <19990819140602.433BC303120@snelboot.oratrix.nl> I just had yet another idea for optimizing Python that looks so plausible that I guess someone else must have looked into it already (and, hence, probably rejected it:-): We add to the type structure a "type identifier" number, a small integer for the common types (int=1, float=2, string=3, etc) and 0 for everything else. When eval_code2 sees, for instance, a MULTIPLY operation it does something like the following: case BINARY_MULTIPLY: w = POP(); v = POP(); code = (BINARY_MULTIPLY << 8) | ((v->ob_type->tp_typeid) << 4) | (w->ob_type->tp_typeid); x = (binopfuncs[code])(v, w); .... etc ... The idea is that all the 256 BINARY_MULTIPLY entries would be filled with PyNumber_Multiply, except for a few common cases. The int*int field could point straight to int_mul(), etc. Assuming the common cases are really more common than the uncommon cases the fact that they jump straight out to the implementation function instead of mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset the added overhead of shifts, ors and indexing. Any thoughts?
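[Editor's note: the mechanics of Jack's table can be modelled in a few lines of Python. Every name below (the type-id values, binopfuncs, the opcode number) is invented for illustration, loosely following the C fragment above -- this is a sketch of the dispatch scheme, not CPython code:]

```python
# Dispatch a binary operation through a table indexed by the opcode and
# the two operands' small type ids, falling back to a generic routine.
TYPE_IDS = {int: 1, float: 2, str: 3}   # 0 means "everything else"

def type_id(obj):
    return TYPE_IDS.get(type(obj), 0)

def generic_multiply(v, w):
    # Stands in for PyNumber_Multiply: handles any operand mix.
    return v * w

def int_mul(v, w):
    # Fast path: assumes both operands are already plain ints.
    return v * w

BINARY_MULTIPLY = 20    # illustrative opcode number

# Fill all 16 * 16 = 256 type-pair slots with the generic function...
binopfuncs = {(BINARY_MULTIPLY, l, r): generic_multiply
              for l in range(16) for r in range(16)}
# ...then overwrite the common slots with specialized implementations.
binopfuncs[(BINARY_MULTIPLY, 1, 1)] = int_mul

def eval_binary_multiply(v, w):
    # What the eval loop would do on seeing a MULTIPLY opcode.
    return binopfuncs[(BINARY_MULTIPLY, type_id(v), type_id(w))](v, w)

assert eval_binary_multiply(6, 7) == 42      # hits the int*int fast path
assert eval_binary_multiply(1.5, 4) == 6.0   # falls back to the generic slot
```

[The win Jack is betting on is that the table lookup replaces the coercion machinery for the common slots; the uncommon slots pay only the lookup on top of what they already did.]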
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Thu Aug 19 16:05:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:05:28 -0400 Subject: [Python-Dev] Localization expert needed Message-ID: <199908191405.KAA10401@eric.cnri.reston.va.us> My contact at HP is asking for expert advice on localization and multi-byte characters. I have little to share except pointing to Martin von Loewis and Pythonware. Does anyone on this list have a suggestion besides those? Don't hesitate to recommend yourself -- there's money in it! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 18 Aug 1999 23:15:55 -0700 From: JOE_ELLSWORTH To: guido at CNRI.Reston.VA.US Subject: Localization efforts and state in Python. Hi Guido. Can you give me some references to The best references currently available for using Python in CGI applications when multi-byte localization is known to be needed? Who is the expert in this in the Python area? Can you recomend that they work with us in this area? Thanks, Joe E. ------- End of Forwarded Message From guido at CNRI.Reston.VA.US Thu Aug 19 16:15:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:15:28 -0400 Subject: [Python-Dev] Optimization idea In-Reply-To: Your message of "Thu, 19 Aug 1999 16:06:01 +0200." 
<19990819140602.433BC303120@snelboot.oratrix.nl> References: <19990819140602.433BC303120@snelboot.oratrix.nl> Message-ID: <199908191415.KAA10432@eric.cnri.reston.va.us> > I just had yet another idea for optimizing Python that looks so > plausible that I guess someone else must have looked into it already > (and, hence, probably rejected it:-): > > We add to the type structure a "type identifier" number, a small integer for > the common types (int=1, float=2, string=3, etc) and 0 for everything else. > > When eval_code2 sees, for instance, a MULTIPLY operation it does something > like the following: > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > code = (BINARY_MULTIPLY << 8) | > ((v->ob_type->tp_typeid) << 4) | > ((w->ob_type->tp_typeid); > x = (binopfuncs[code])(v, w); > .... etc ... > > The idea is that all the 256 BINARY_MULTIPLY entries would be filled with > PyNumber_Multiply, except for a few common cases. The int*int field could > point straight to int_mul(), etc. > > Assuming the common cases are really more common than the uncommon cases the > fact that they jump straight out to the implementation function in stead of > mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset > the added overhead of shifts, ors and indexing. You're assuming that arithmetic operations are a major time sink. I doubt that; much of my code contains hardly any arithmetic these days. Of course, if you *do* have a piece of code that does a lot of basic arithmetic, it might pay off -- but even then I would guess that the majority of opcodes are things like list accessors and variable. But we needn't speculate. It's easy enough to measure the speedup: you can use tp_xxx5 in the type structure and plug a typecode into it for the int and float types. (Note that you would need a separate table of binopfuncs per operator.) 
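[Editor's note: Guido's "measure, don't speculate" point is easy to act on from pure Python today -- the dis module can show how rare arithmetic opcodes are next to load/store traffic in typical code. A sketch using dis.get_instructions, an API that did not exist in 1999 (Python 3.4+); the sample function is hypothetical:]

```python
import dis
from collections import Counter

def typical(items):
    # A loop that is mostly variable/attribute traffic, not arithmetic.
    out = []
    for item in items:
        out.append(item)
    return out

# Count how often each opcode name appears in the compiled function.
counts = Counter(ins.opname for ins in dis.get_instructions(typical))

loads = sum(n for name, n in counts.items() if name.startswith("LOAD"))
assert loads > 0                                          # loads dominate
assert not any("MULTIPLY" in name for name in counts)     # no arithmetic at all
```

[Static opcode counts are only a proxy for dynamic frequency, but they already support Guido's guess that loads and calls, not multiplies, are where typical code spends its opcodes.]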
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Aug 19 21:09:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 19 Aug 1999 20:09:26 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908191909.UAA20618@pukapuka.inrialpes.fr> [Tim, in an earlier msg] > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; Ok. In the meantime I think that folding the redundant SET_LINENO doesn't hurt. I ended up with a patchlet that seems to have no side effects, that updates the lnotab as it should and that even makes pdb a bit more clever, IMHO. Consider an extreme case for the function f (listed below). Currently, we get the following: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 SET_LINENO 4 12 SET_LINENO 5 15 LOAD_CONST 1 (1) 18 STORE_FAST 0 (a) 21 SET_LINENO 6 24 SET_LINENO 7 27 SET_LINENO 8 30 LOAD_CONST 2 (None) 33 RETURN_VALUE >>> pdb.runcall(f) > test.py(1)f() -> def f(): (Pdb) list 1, 20 1 -> def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(2)f() -> """Comment about f""" (Pdb) step > test.py(3)f() -> """Another one""" (Pdb) step > test.py(4)f() -> """A third one""" (Pdb) step > test.py(5)f() -> a = 1 (Pdb) step > test.py(6)f() -> """Forth""" (Pdb) step > test.py(7)f() -> "and pdb can set a breakpoint on this one (simple quotes)" (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple 
quotes...""" (Pdb) >>> ------------------------------------------- With folded SET_LINENO, we have this: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 5 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 SET_LINENO 8 12 LOAD_CONST 2 (None) 15 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 20 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) break 7 Breakpoint 1 at test.py:7 (Pdb) break 8 *** Blank or comment (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> ------------------------------------------- i.e., pdb stops at (points to) the first real instruction and doesn't step through the doc strings. Or is there something I'm missing here? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** compile.c-orig Thu Aug 19 19:27:13 1999 --- compile.c Thu Aug 19 19:00:31 1999 *************** *** 615,620 **** --- 615,623 ---- int arg; { if (op == SET_LINENO) { + if (!Py_OptimizeFlag && c->c_last_addr == c->c_nexti - 3) + /* Hack for folding several SET_LINENO in a row. */ + c->c_nexti -= 3; com_set_lineno(c, arg); if (Py_OptimizeFlag) return; From guido at CNRI.Reston.VA.US Thu Aug 19 23:10:33 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 17:10:33 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Thu, 19 Aug 1999 20:09:26 BST."
<199908191909.UAA20618@pukapuka.inrialpes.fr> References: <199908191909.UAA20618@pukapuka.inrialpes.fr> Message-ID: <199908192110.RAA12755@eric.cnri.reston.va.us> Earlier, you argued that this is "not an optimization," but rather avoiding redundancy. I should have responded right then that I disagree, or at least I'm lukewarm about your patch. Either you're not using -O, and then you don't care much about this; or you care, and then you should be using -O. Rather than encrusting the code with more and more ad-hoc micro optimizations, I'd prefer to have someone look into Tim's suggestion of supporting more efficient breakpoints... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 14:45:46 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 13:45:46 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908192110.RAA12755@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 19, 99 05:10:33 pm" Message-ID: <199908201245.NAA27098@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I haven't argued so much; I asked whether this would be reasonable. Probably I should have said that I don't see the purpose of emitting SET_LINENO instructions for those nodes for which the compiler generates no code, mainly because (as I learned subsequently) SET_LINENO serves no purpose other than debugging. As I hadn't paid much attention to this aspect of the code, I thought that they might still be used for tracebacks. But I couldn't have said that because I didn't know it. > I should have responded right then that I disagree, ... Although I agree this is a minor issue, I'm interested in your argument here, if it's something other than the dialectic: "we're more interested in long term improvements", which is also my opinion. > ... or at least I'm lukewarm about your patch.
No surprise here :-) But I haven't found another way of not generating SET_LINENO for doc strings other than backpatching. > Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. Neither of those. I don't really care, frankly. I was just intrigued by the consecutive SET_LINENO in my disassemblies, so I started to think and ask questions about it. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... This is *the* real issue with the real potential solution. I'm willing to have a look at this (although I don't know pdb/bdb in its finest details). All suggestions and thoughts are welcome. We would probably leave the SET_LINENO opcode as is and (eventually) introduce a new opcode (instead of transforming/renaming it) for compatibility reasons, methinks. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990818110213.A558F303120@snelboot.oratrix.nl> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <1276961301-70195@hypernet.com> In reply to no one in particular: I've often wished that the instance type object had an (optimized) __decref__ slot. With nothing but hand-waving to support it, I'll claim that would enable all these games. 
- Gordon From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <1276961295-70552@hypernet.com> Jack Jansen wrote: > There's one slight problem with this: when you use functionality > that is partially portable, i.e. a call that is available on Windows > and Unix but not on the Mac. It gets worse, I think. How about the inconsistencies in POSIX support among *nixes? How about NT being a superset of Win9x? How about NTFS having capabilities that FAT does not? I'd guess there are inconsistencies between Mac flavors, too. The Java approach (if you can't do it everywhere, you can't do it) sucks. In some cases you could probably have the missing functionality (in os) fail silently, but in other cases that would be a disaster. "Least-worst"-is-not-necessarily-"good"-ly y'rs - Gordon From tismer at appliedbiometrics.com Fri Aug 20 17:05:47 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Fri, 20 Aug 1999 17:05:47 +0200 Subject: [Python-Dev] about line numbers References: <199908191909.UAA20618@pukapuka.inrialpes.fr> <199908192110.RAA12755@eric.cnri.reston.va.us> Message-ID: <37BD6ECB.9DD17460@appliedbiometrics.com> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I should have responded right then that I > disagree, or at least I'm lukewarm about your patch. Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. 
> > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... I didn't think of this before, but I just realized that I have something like that already in Stackless Python. It is possible to set a breakpoint at every opcode, for every frame. Adding an extra opcode for breakpoints is a good thing as well. The former are good for tracing, conditional breakpoints and such, and cost a little more time since there is always one extra function call. The latter would be a quick, less versatile thing. Inserting extra breakpoint opcodes into running code turns out to be easy to implement, if the running frame gets a local extra copy of its code object, with the breakpoints replacing the original opcodes. The breakpoint handler would then simply look into the original code object. Inserting breakpoints on the source level gives us breakpoints per procedure. Doing it in a running frame gives "instance" level debugging of code. Checking a monitor function on every opcode is slightly more expensive but most general. We can have it all; what do you think? I'm going to finish and publish the stackless/continuous package and submit a paper by end of September. Should I include this debugging feature? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido at CNRI.Reston.VA.US Fri Aug 20 17:09:32 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:09:32 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT."
<1276961301-70195@hypernet.com> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> <1276961301-70195@hypernet.com> Message-ID: <199908201509.LAA14726@eric.cnri.reston.va.us> > In reply to no one in particular: > > I've often wished that the instance type object had an (optimized) > __decref__ slot. With nothing but hand-waving to support it, I'll > claim that would enable all these games. Without context, I don't know when this would be called. If you want this called on all DECREFs (regardless of the refcount value), realize that this is a huge slowdown because it would mean the DECREF macro has to inspect the type object, which means several indirections. This would slow down *every* DECREF operation, not just those on instances with a __decref__ slot, because the DECREF macro doesn't know the type of the object! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Fri Aug 20 17:13:16 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:13:16 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961295-70552@hypernet.com> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> <1276961295-70552@hypernet.com> Message-ID: <199908201513.LAA14741@eric.cnri.reston.va.us> From: "Gordon McMillan" > Jack Jansen wrote: > > > There's one slight problem with this: when you use functionality > > that is partially portable, i.e. a call that is available on Windows > > and Unix but not on the Mac. > > It gets worse, I think. How about the inconsistencies in POSIX > support among *nixes? How about NT being a superset of Win9x? How > about NTFS having capabilities that FAT does not? I'd guess there are > inconsistencies between Mac flavors, too. 
> > The Java approach (if you can't do it everywhere, you can't do it) > sucks. In some cases you could probably have the missing > functionality (in os) fail silently, but in other cases that would > be a disaster. The Python policy has always been "if it's available, there's a standard name and API for it; if it's not available, the function is not defined or will raise an exception; you can use hasattr(os, ...) or catch exceptions to cope if you can live without it." There are a few cases where unavailable calls are emulated, a few where they are made no-ops, and a few where they are made to raise an exception unconditionally, but in most cases the function will simply not exist, so it's easy to test. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 22:54:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 21:54:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BD6ECB.9DD17460@appliedbiometrics.com> from "Christian Tismer" at "Aug 20, 99 05:05:47 pm" Message-ID: <199908202054.VAA26970@pukapuka.inrialpes.fr> I'll try to sketch here the scheme I'm thinking of for the callback/breakpoint issue (without SET_LINENO), although some technical details are still missing. I'm assuming the following, in this order: 1) No radical changes in the current behavior, i.e. preserve the current architecture / strategy as much as possible. 2) We don't have breakpoints per opcode, but per source line. For that matter, we have sys.settrace (and for now, we don't aim to have sys.settracei that would be called on every opcode, although we might want this in the future) 3) SET_LINENO disappear. Actually, SET_LINENO are conditional breakpoints, used for callbacks from C to Python. So the basic problem is to generate these callbacks.
If any of the above is not an appropriate assumption and we want a radical change in the strategy of setting breakpoints/ generating callbacks, then this post is invalid. The solution I'm thinking of: a) Currently, we have a function PyCode_Addr2Line which computes the source line from the opcode's address. I hereby assume that we can write the reverse function PyCode_Line2Addr that returns the address from a given source line number. I don't have the implementation, but it should be doable. Furthermore, we can compute, having the co_lnotab table and co_firstlineno, the source line range for a code object. As a consequence, even with the dumbest of all algorithms, by looping through this source line range, we can enumerate with PyCode_Line2Addr the sequence of addresses for the source lines of this code object. b) As Chris pointed out, in case sys.settrace is defined, we can allocate and keep a copy of the original code string per frame. We can further dynamically overwrite the original code string with a new (internal, one byte) CALL_TRACE opcode at the addresses we have enumerated in a). The CALL_TRACE opcodes will trigger the callbacks from C to Python, just as the current SET_LINENO does. c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger the callback and if it returns successfully, we'll fetch the original opcode for the current location from the copy of the original co_code. Then we directly jump to the arg fetch code (or in case we fetch the entire original opcode in CALL_TRACE - we jump to the dispatch code). Hmm. I think that's all. At the heart of this scheme is the PyCode_Line2Addr function, which is the only blob in my head, for now. Christian Tismer wrote: > > I didn't think of this before, but I just realized that > I have something like that already in Stackless Python. > It is possible to set a breakpoint at every opcode, for every > frame. Adding an extra opcode for breakpoints is a good thing > as well.
The former are good for tracing, conditional breakpoints > and such, and cost a little more time since there is always one extra > function call. The latter would be a quick, less versatile thing. I don't think I understand clearly the difference you're talking about, and why the one thing is better than the other, probably because I'm a bit far from stackless python. > I'm going to finish and publish the stackless/continuous package > and submit a paper by end of September. Should I include this debugging > feature? Write the paper first, you have more than enough material to talk about already ;-). Then if you have time to implement some debugging support, you could always add another section, but it won't be a central point of your paper. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido at CNRI.Reston.VA.US Fri Aug 20 21:59:24 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 15:59:24 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Fri, 20 Aug 1999 21:54:10 BST." <199908202054.VAA26970@pukapuka.inrialpes.fr> References: <199908202054.VAA26970@pukapuka.inrialpes.fr> Message-ID: <199908201959.PAA16105@eric.cnri.reston.va.us> > I'll try to sketch here the scheme I'm thinking of for the > callback/breakpoint issue (without SET_LINENO), although some > technical details are still missing. > > I'm assuming the following, in this order: > > 1) No radical changes in the current behavior, i.e. preserve the > current architecture / strategy as much as possible. > > 2) We don't have breakpoints per opcode, but per source line. For that > matter, we have sys.settrace (and for now, we don't aim to have > sys.settracei that would be called on every opcode, although we might > want this in the future) > > 3) SET_LINENO disappear. Actually, SET_LINENO are conditional breakpoints, > used for callbacks from C to Python.
So the basic problem is to generate > these callbacks. They used to be the only mechanism by which the traceback code knew the current line number (long before the debugger hooks existed), but with the lnotab, that's no longer necessary. > If any of the above is not an appropriate assumption and we want a radical > change in the strategy of setting breakpoints/ generating callbacks, then > this post is invalid. Sounds reasonable. > The solution I'm thinking of: > > a) Currently, we have a function PyCode_Addr2Line which computes the source > line from the opcode's address. I hereby assume that we can write the > reverse function PyCode_Line2Addr that returns the address from a given > source line number. I don't have the implementation, but it should be > doable. Furthermore, we can compute, having the co_lnotab table and > co_firstlineno, the source line range for a code object. > > As a consequence, even with the dumbiest of all algorithms, by looping > trough this source line range, we can enumerate with PyCode_Line2Addr > the sequence of addresses for the source lines of this code object. > > b) As Chris pointed out, in case sys.settrace is defined, we can allocate > and keep a copy of the original code string per frame. We can further > dynamically overwrite the original code string with a new (internal, > one byte) CALL_TRACE opcode at the addresses we have enumerated in a). > > The CALL_TRACE opcodes will trigger the callbacks from C to Python, > just as the current SET_LINENO does. > > c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger > the callback and if it returns successfully, we'll fetch the original > opcode for the current location from the copy of the original co_code. > Then we directly jump to the arg fetch code (or in case we fetch the > entire original opcode in CALL_TRACE - we jump to the dispatch code). Tricky, but doable. > Hmm. I think that's all. 
> > At the heart of this scheme is the PyCode_Line2Addr function, which is > the only blob in my head, for now. I'm pretty sure that this would be straightforward. I'm a little anxious about modifying the code, and was thinking myself of a way to specify a bitvector of addresses where to break. But that would still cause some overhead for code without breakpoints, so I guess you're right (and it's certainly a long-standing tradition in breakpoint-setting!) --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 23:22:12 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 22:22:12 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908201959.PAA16105@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 20, 99 03:59:24 pm" Message-ID: <199908202122.WAA26956@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > > I'm a little anxious about modifying the code, and was thinking myself > of a way to specify a bitvector of addresses where to break. But that > would still cause some overhead for code without breakpoints, so I > guess you're right (and it's certainly a long-standing tradition in > breakpoint-setting!) > Hm. You're probably right, especially if someone wants to inspect a code object from the debugger or something. But I believe that we can manage to redirect the instruction pointer in the beginning of eval_code2 to the *copy* of co_code, and modify the copy with CALL_TRACE, preserving the original intact.
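[Editor's note: the Addr2Line/Line2Addr pair at the heart of the scheme above maps directly onto what modern CPython exposes through the dis module. This is a hedged sketch under today's Python 3, not the 1999 C implementation; the offset-to-line mapping now lives in the code object's line table rather than in SET_LINENO opcodes, and dis.findlinestarts enumerates it.]

```python
import dis

def addr_to_line(code):
    """Addr2Line: map each line-start bytecode offset to its source line."""
    return dict(dis.findlinestarts(code))

def line_to_addr(code):
    """Line2Addr: map each source line to the first bytecode offset for it."""
    return {line: offset for offset, line in dis.findlinestarts(code)}

def g():
    x = 1
    y = x + 1
    return y

addrs = addr_to_line(g.__code__)   # {offset: lineno, ...}, offsets ascending
lines = line_to_addr(g.__code__)   # {lineno: offset, ...}
```

Enumerating `addrs` in order gives exactly the sequence of addresses that step a) of the scheme needs.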
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Fri Aug 20 22:25:25 1999 From: skip at mojam.com (Skip Montanaro) Date: Fri, 20 Aug 1999 15:25:25 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <1276961295-70552@hypernet.com> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> <1276961295-70552@hypernet.com> Message-ID: <14269.47443.192469.525132@dolphin.mojam.com> Gordon> It gets worse, I think. How about the inconsistencies in POSIX Gordon> support among *nixes? How about NT being a superset of Win9x? Gordon> How about NTFS having capabilities that FAT does not? I'd guess Gordon> there are inconsistencies between Mac flavors, too. To a certain degree I think the C module(s) that interface to the underlying OS's API can iron out differences. In other cases you may have to document minor (known) differences. In still other cases you may have to relegate particular functionality to the OS-dependent modules. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From gmcm at hypernet.com Sat Aug 21 00:38:14 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 17:38:14 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <199908201509.LAA14726@eric.cnri.reston.va.us> References: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> Message-ID: <1276937670-1491544@hypernet.com> [me] > > > > I've often wished that the instance type object had an (optimized) > > __decref__ slot. With nothing but hand-waving to support it, I'll > > claim that would enable all these games. [Guido] > Without context, I don't know when this would be called. 
If you > want this called on all DECREFs (regardless of the refcount value), > realize that this is a huge slowdown because it would mean the > DECREF macro has to inspect the type object, which means several > indirections. This would slow down *every* DECREF operation, not > just those on instances with a __decref__ slot, because the DECREF > macro doesn't know the type of the object! This was more 2.0-ish speculation, and really thinking of classic C++ ref counting where decref would be a function call, not a macro. Still a slowdown, of course, but not quite so massive. The upside is opening up all kinds of tricks at the type object and user class levels, (such as weak refs and copy on write etc). Worth it? I'd think so, but I'm not a speed demon. - Gordon From tim_one at email.msn.com Sat Aug 21 10:09:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 04:09:17 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <000201beebac$776d32e0$0c2d2399@tim> [Skip Montanaro] > ... > 3. If Dan Connolly's contention is correct, importing the os module > today is not all that portable. I can't really say one way or the > other, because I'm lucky enough to be able to confine my serious > programming to Unix. I'm sure there's someone out there that > can try the following on a few platforms: > > import os > dir(os) > > and compare the output. There's no need to, Skip. Just read the os module docs; where a function says, e.g., "Availability: Unix.", it doesn't show up on a Windows or Mac box. In that sense using (some) os functions is certainly unportable. But I have no sympathy for the phrasing of Dan's complaint: if he calls os.getegid(), *he* knows perfectly well that's a Unix-specific function, and expressing outrage over it not working on NT is disingenuous. 
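[Editor's note: the "Availability: Unix." functions Tim mentions are exactly the case Guido's policy ("use hasattr(os, ...) or catch exceptions") is meant to handle. A small sketch, using os.getegid as the Unix-only example; both probe styles are shown.]

```python
import os

# hasattr-style probe: os.getegid exists only on Unix.
if hasattr(os, "getegid"):
    egid = os.getegid()
else:
    egid = None  # the platform doesn't provide it; fall back gracefully

# Exception-style probe: os.getlogin may be missing, or may fail at
# runtime (e.g. no controlling terminal), so catch both cases.
try:
    user = os.getlogin()
except (AttributeError, OSError):
    user = None
```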
OTOH, I don't think you're going to find anything in the OS module documented as available only on Windows or only on Macs, and some semi-portable functions (notoriously chmod) are documented in ways that make sense only to Unixheads. This certainly gives a strong impression of Unix-centricity to non-Unix weenies, and has got to baffle true newbies completely. So 'twould be nice to have a basic os module all of whose functions "run everywhere", whose interfaces aren't copies of cryptic old Unixisms, and whose docs are platform neutral. If Guido is right that the os functions tend to get more portable over time, fine, that module can grow over time too. In the meantime, life would be easier for everyone except Python's implementers. From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 17:34:32 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 16:34:32 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908202122.WAA26956@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 20, 99 10:22:12 pm" Message-ID: <199908211534.QAA22392@pukapuka.inrialpes.fr> [me] > > Guido van Rossum wrote: > > > > > > I'm a little anxious about modifying the code, and was thinking myself > > of a way to specify a bitvector of addresses where to break. But that > > would still cause some overhead for code without breakpoints, so I > > guess you're right (and it's certainly a long-standing tradition in > > breakpoint-setting!) > > > > Hm. You're probably right, especially if someone wants to inspect > a code object from the debugger or something. But I believe that > we can manage to redirect the instruction pointer in the beginning > of eval_code2 to the *copy* of co_code, and modify the copy with > CALL_TRACE, preserving the original intact. > I wrote a very rough first implementation of this idea.
The files are at: http://sirac.inrialpes.fr/~marangoz/python/lineno/ Basically, what I did is: 1) what I said :-) 2) No more SET_LINENO 3) In tracing mode, a copy of the original code is put in an additional slot (co_tracecode) of the code object. Then it's overwritten with CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. The VM is routed to execute this code, and not the original one. 4) When tracing is off (i.e. sys_tracefunc is NULL) the VM fallbacks to normal execution of the original code. A couple of things that need finalization: a) how to deallocate the modified code string when tracing is off b) the value of CALL_TRACE (I almost randomly picked 76) c) I don't handle the cases where sys_tracefunc is enabled or disabled within the same code object. Tracing or not is determined before the main loop. d) update pdb, so that it does not allow setting breakpoints on lines with no code. To achieve this, I think that python versions of PyCode_Addr2Line & PyCode_Line2Addr have to be integrated into pdb as helper functions. e) correct bugs and design flaws f) something else? 
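[Editor's note: steps 2)-4) above can be sketched at the Python level with today's dis module. This is an illustration only, under modern CPython (which has no SET_LINENO and stores line information differently); CALL_TRACE is the post's hypothetical one-byte opcode, represented here by an arbitrary marker byte. It shows the key invariant: the original co_code is never touched, only a per-frame copy is patched and later restored.]

```python
import dis

CALL_TRACE = 0xFF  # hypothetical breakpoint opcode from the post; just a marker

def h():
    a = 1
    b = a + 1
    return b

code = h.__code__
original = code.co_code          # stays untouched, as in the scheme
patched = bytearray(original)    # working copy, one per traced frame
saved = {}                       # offset -> original opcode byte

# Overwrite the opcode byte at each line-start address with CALL_TRACE,
# remembering the original so the callback can fetch and execute it.
for offset, _line in dis.findlinestarts(code):
    saved[offset] = patched[offset]
    patched[offset] = CALL_TRACE

# "Tracing off": restore the copy from the saved opcodes.
for offset, byte in saved.items():
    patched[offset] = byte
```

The real work in the post happens in C (eval_code2 executing the patched copy); this only demonstrates the bookkeeping.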
And here's the sample session of my lousy function f with this 'proof of concept' code: >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 LOAD_CONST 1 (1) 3 STORE_FAST 0 (a) 6 LOAD_CONST 2 (None) 9 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 10 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 19:10:50 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 19:10:50 +0200 Subject: [Python-Dev] about line numbers References: <199908211534.QAA22392@pukapuka.inrialpes.fr> Message-ID: <37BEDD9A.DBA817B1@appliedbiometrics.com> Vladimir Marangozov wrote: ... > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > Basically, what I did is: > > 1) what I said :-) > 2) No more SET_LINENO > 3) In tracing mode, a copy of the original code is put in an additional > slot (co_tracecode) of the code object. Then it's overwritten with > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. I'd rather keep the original code object as it is, create a copy with inserted breakpoints and put that into the frame slot. Pointing back to the original from there. Then I'd redirect the code from the CALL_TRACE opcode completely to a user-defined function. Getting rid of the extra code object would be done by this function when tracing is off. It also vanishes automatically when the frame is released. 
> a) how to deallocate the modified code string when tracing is off By making the copy a frame property which is temporary, I think. Or, if tracing should work for all frames, by pushing the original in the back of the modified. Both work. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 23:40:05 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 22:40:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BEDD9A.DBA817B1@appliedbiometrics.com> from "Christian Tismer" at "Aug 21, 99 07:10:50 pm" Message-ID: <199908212140.WAA51054@pukapuka.inrialpes.fr> Chris, could you please repeat that step by step in more detail? I'm not sure I understand your suggestions. Christian Tismer wrote: > > Vladimir Marangozov wrote: > ... > > I wrote a very rough first implementation of this idea. The files are at: > > > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > > > Basically, what I did is: > > > > 1) what I said :-) > > 2) No more SET_LINENO > > 3) In tracing mode, a copy of the original code is put in an additional > > slot (co_tracecode) of the code object. Then it's overwritten with > > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. > > I'd rather keep the original code object as it is, create a copy > with inserted breakpoints and put that into the frame slot. You seem to suggest to duplicate the entire code object, right? And reference the modified duplicate from the current frame? I actually duplicate only the opcode string (that is, the co_code string object) and I don't see the point of duplicating the entire code object.
Keeping a reference from the current frame makes sense, but won't it deallocate the modified version on every frame release (then redo all the code duplication work for every frame) ? > Pointing back to the original from there. I don't understand this. What points back where? > > Then I'd redirect the code from the CALL_TRACE opcode completely > to a user-defined function. What user-defined function? I don't understand that either... Except the sys_tracefunc, what other (user-defined) function do we have here? Is it a Python or a C function? > Getting rid of the extra code object would be done by this function > when tracing is off. How exactly? This seems to be obvious for you, but obviously, not for me ;-) > It also vanishes automatically when the frame is released. The function or the extra code object? > > > a) how to deallocate the modified code string when tracing is off > > By making the copy a frame property which is temporary, I think. I understood that the frame lifetime could be exploited "somehow"... > Or, if tracing should work for all frames, by pushing the original > in the back of the modified. Both works. Tracing is done for all frames, if sys_tracefunc is not NULL, which is a function that usually ends up in the f_trace slot. > > ciao - chris I'm confused. I didn't understand your idea. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 23:23:10 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 23:23:10 +0200 Subject: [Python-Dev] about line numbers References: <199908212140.WAA51054@pukapuka.inrialpes.fr> Message-ID: <37BF18BE.B3D58836@appliedbiometrics.com> Vladimir Marangozov wrote: > > Chris, could you please repeat that step by step in more detail? > I'm not sure I understand your suggestions. I think I was too quick. I thought of copying the whole code object, of course. ... 
> > I'd rather keep the original code object as it is, create a copy > > with inserted breakpoints and put that into the frame slot. > > You seem to suggest to duplicate the entire code object, right? > And reference the modified duplicate from the current frame? Yes. > I actually duplicate only the opcode string (that is, the co_code string > object) and I don't see the point of duplicating the entire code object. > > Keeping a reference from the current frame makes sense, but won't it > deallocate the modified version on every frame release (then redo all the > code duplication work for every frame) ? You get two options by that. 1) permanently modifying one code object to be traceable means pushing a copy of the original "behind" by means of some co_back pointer. This keeps the patched one where the original was, and makes a global debugging version. 2) Creating a copy for one frame, and putting the original into a co_back pointer. This gives debugging just for this one frame. ... > > Then I'd redirect the code from the CALL_TRACE opcode completely > > to a user-defined function. > > What user-defined function? I don't understand that either... > Except the sys_tracefunc, what other (user-defined) function do we have here? > Is it a Python or a C function? I would suggest a Python function, of course. > > Getting rid of the extra code object would be done by this function > > when tracing is off. > > How exactly? This seems to be obvious for you, but obviously, not for me ;-) If the permanent tracing "1)" is used, just restore the code object's contents from the original in co_back, and drop co_back. In the "2)" version, just pull the co_back into the frame's code pointer and lose the reference to the copy. Occurs automatically on frame release. > > It also vanishes automatically when the frame is released. > > The function or the extra code object? The extra code object. ... > I'm confused. I didn't understand your idea.
Forget it, it isn't more than another brain fart :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tim_one at email.msn.com Sun Aug 22 03:25:22 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 21:25:22 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908131347.OAA30740@pukapuka.inrialpes.fr> Message-ID: <000001beec3d$348f0160$cb2d2399@tim> [going back a week here, to dict resizing ...] [Vladimir Marangozov] > ... > All in all, for performance reasons, dicts remain an exception > to the rule of releasing memory ASAP. Yes, except I don't think there is such a rule! The actual rule is a balancing act between the cost of keeping memory around "just in case", and the expense of getting rid of it. Resizing a dict is extraordinarily expensive because the entire table needs to be rearranged, but lists make this tradeoff too (when you del a list element or list slice, it still goes thru NRESIZE, which still keeps space for as many as 100 "extra" elements around). The various internal caches for int and frame objects (etc) also play this sort of game; e.g., if I happen to have a million ints sitting around at some time, Python effectively assumes I'll never want to reuse that int storage for anything other than ints again. 
python-rarely-releases-memory-asap-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Sun Aug 22 21:41:59 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 22 Aug 1999 20:41:59 +0100 (NFT) Subject: [Python-Dev] Memory (was: about line numbers, which was shrinking dicts) In-Reply-To: <000001beec3d$348f0160$cb2d2399@tim> from "Tim Peters" at "Aug 21, 99 09:25:22 pm" Message-ID: <199908221941.UAA54480@pukapuka.inrialpes.fr> Tim Peters wrote: > > [going back a week here, to dict resizing ...] Yes, and the subject line does not correspond to the contents because at the moment I sent this message, I ran out of disk space and the mailer picked a random header after destroying half of the messages in this mailbox. > > [Vladimir Marangozov] > > ... > > All in all, for performance reasons, dicts remain an exception > > to the rule of releasing memory ASAP. > > Yes, except I don't think there is such a rule! The actual rule is a > balancing act between the cost of keeping memory around "just in case", and > the expense of getting rid of it. Good point. > > Resizing a dict is extraordinarily expensive because the entire table needs > to be rearranged, but lists make this tradeoff too (when you del a list > element or list slice, it still goes thru NRESIZE, which still keeps space > for as many as 100 "extra" elements around). > > The various internal caches for int and frame objects (etc) also play this > sort of game; e.g., if I happen to have a million ints sitting around at > some time, Python effectively assumes I'll never want to reuse that int > storage for anything other than ints again. > > python-rarely-releases-memory-asap-ly y'rs - tim Yes, and I'm somewhat sensitive to this issue after spending 6 years in a team which deals a lot with memory management (mainly DSM). In other words, you say that Python tolerates *virtual* memory fragmentation (a funny term :-).
In the case of dicts and strings, we tolerate "internal fragmentation" (a contiguous chunk is allocated, then partially used). In the case of ints, floats or frames, we tolerate "external fragmentation". And as you said, Python tolerates this because of the speed/space tradeoff. Hopefully, all we deal with at this level is virtual memory, so even if you have zillions of ints, it's the OS VMM that will help you more with its long-term scheduling than Python's wild guesses about a hypothetical usage of zillions of ints later. I think that some OS concepts can give us hints on how to reduce our virtual fragmentation (which, as we all know, is not a very good thing). A few keywords: compaction, segmentation, paging, sharing. We can't do much about our internal fragmentation, except changing the algorithms of dicts & strings (which is not appealing anyways). But it would be nice to think about the external fragmentation of Python's caches. Or even try to reduce the internal fragmentation in combination with the internal caches... BTW, this is the whole point of PyMalloc: in a virtual memory world, try to reduce the distance between the user view and the OS view on memory. PyMalloc addresses the fragmentation problem at a lower level of granularity than an OS (that is, *within* a page), because most of Python's objects are very small. However, it can't efficiently handle large chunks like the int/float caches. Basically what it does is: segmentation of the virtual space and sharing of the cached free space. I think that Python could improve on sharing its internal caches, without significant slowdowns... The bottom line is that there's still plenty of room for exploring alternate mem mgt strategies that better fit Python's memory needs as a whole.
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Sun Aug 22 23:25:56 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 22 Aug 1999 23:25:56 +0200 Subject: [Python-Dev] Converting C objects to Python objects and back Message-ID: <19990822212601.2D4BE18BA0D@oratrix.oratrix.nl> Here's another silly idea, not having to do with optimization. On the Mac, and as far as I know on Windows as well, there are quite a few OS API structures that have a Python Object representation that is little more than the PyObject boilerplate plus a pointer to the C API object. (And, of course, lots of methods to operate on the object). To convert these from Python to C I always use boilerplate code like WindowPtr *win; PyArg_ParseTuple(args, "O&", PyWin_Convert, &win); where PyWin_Convert is the function that takes a PyObject * and a void **, does the typecheck and sets the pointer. A similar way is used to convert C pointers back to Python objects in Py_BuildValue. What I was thinking is that it would be nice (if you are _very_ careful) if this functionality was available in struct. So, if I would somehow obtain (in a Python string) a C structure that contained, say, a WindowPtr and two ints, I would be able to say win, x, y = struct.unpack("Ohh", Win.WindowType) and struct would be able, through the WindowType type object, to get at the PyWin_Convert and PyWin_New functions. A nice side issue is that you can add an option to PyArg_ParseTuple so you can say PyArg_ParseTuple(args, "O+", Win_WinObject, &win) and you don't have to remember the different names the various types use for their conversion routines. Is this worth pursuing or is it just too dangerous? And, if it is worth pursuing, I have to stash away the two function pointers somewhere in the TypeObject, should I grab one of the tp_xxx fields for this or is there a better place?
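[For contrast, plain struct can already move a pointer-sized field around, but only as a raw integer — the native "P" (void *) format code carries no type information, which is exactly the gap the PyWin_Convert / PyWin_New hooks would fill. A sketch with dummy values, 0 standing in for the WindowPtr:]

```python
import struct

# A C struct holding a pointer and two shorts, as in Jack's example.
# Pack dummy values: 0 for the pointer, (3, 4) for the two ints.
packed = struct.pack("Phh", 0, 3, 4)

# "P" unpacks to a plain integer -- just the address.  There is no way
# to get a typed window object back without the proposed conversion
# hooks on the type object.
ptr, x, y = struct.unpack("Phh", packed)
```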
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fdrake at acm.org Mon Aug 23 16:54:07 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 23 Aug 1999 10:54:07 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000201beebac$776d32e0$0c2d2399@tim> References: <14266.51743.904066.470431@dolphin.mojam.com> <000201beebac$776d32e0$0c2d2399@tim> Message-ID: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Tim Peters writes: > OTOH, I don't think you're going to find anything in the OS module > documented as available only on Windows or only on Macs, and some Tim, Actually, the spawn*() functions are included in os and are documented as Windows-only, along with the related P_* constants. These are provided by the nt module. > everywhere", whose interfaces aren't copies of cryptic old Unixisms, and > whose docs are platform neutral. I'm always glad to see documentation patches, or even pointers to specific problems. Being a Unix-weenie myself, making the documentation more readable to Windows-weenies can be difficult at times. But given useful pointers, I can usually pull it off, or at least drive someone who can do so. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Tue Aug 24 08:32:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 24 Aug 1999 02:32:49 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Message-ID: <000701beedfa$7c5c8e40$902d2399@tim>
I stand corrected, Fred -- so how do the Unix dweebs like this Windows crap cluttering "their" docs ? [Tim, pitching a portable sane interface to a portable sane subset of os functionality] > I'm alwasy glad to see documentation patches, or even pointers to > specific problems. Being a Unix-weenie myself, making the > documentation more readable to Windows-weenies can be difficult at > times. But given useful pointers, I can usually pull it off, or at > least drive someone who canto do so. ;-) No, it's deeper than that. Some of the inherited Unix interfaces are flatly incomprehensible to anyone other than a Unix-head, but the functionality is supplied only in that form (docs may ease the pain, but the interfaces still suck); for example, mkdir (path[, mode]) Create a directory named path with numeric mode mode. The default mode is 0777 (octal). On some systems, mode is ignored. Where it is used, the current umask value is first masked out. Availability: Macintosh, Unix, Windows. If you have a sister or parent or 3-year-old child (they're all equivalent for this purpose ), just imagine them reading that. If you can't, I'll have my sister call you . Raw numeric permission modes, octal mode notation, and the "umask" business are Unix-specific -- and even Unices supply symbolic ways to specify permissions. chmod is likely the one I hear the most gripes about. Windows heads are looking to change "file attributes", the name "chmod" is gibberish to them, most of the Unix mode bits make no sense under Windows (& contra Guido's optimism, never will) even if you know the secret octal code, and Windows has several attributes (hidden bit, system bit, archive bit) chmod can't get at. The only portable functionality here is the write bit, but no non-Unix person could possibly guess either that chmod is the function they need, or what to type after someone tells them it's chmod. 
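[The mode/umask interaction that doc entry describes can be seen directly — POSIX only; on systems where mode is ignored this sketch does not apply:]

```python
import os
import stat
import tempfile

old_umask = os.umask(0o022)      # disallow group/other write bits
base = tempfile.mkdtemp()
target = os.path.join(base, 'sub')
os.mkdir(target, 0o777)          # request 0777; the umask masks out 022
mode = stat.S_IMODE(os.stat(target).st_mode)
os.umask(old_umask)              # restore the previous umask
os.rmdir(target)
os.rmdir(base)
```

With the 022 umask, mode comes out as 0755 — precisely the kind of incantation the paragraph above argues a non-Unix user should never have to learn.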
So this is less a doc issue than that more of os needs to become more like os.path (i.e., intelligently named functions with intelligently abstracted interfaces). never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - tim From skip at mojam.com Tue Aug 24 19:21:53 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 24 Aug 1999 12:21:53 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000701beedfa$7c5c8e40$902d2399@tim> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> Message-ID: <14274.53860.210265.71990@dolphin.mojam.com> Tim> chmod is likely the one I hear the most gripes about. Windows Tim> heads are looking to change "file attributes", the name "chmod" is Tim> gibberish to them Well, we could confuse everyone and rename "chmod" to "chfat" (is that like file system liposuction?). Windows probably has an equivalent function whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Tim> most of the Unix mode bits make no sense under Windows (& contra Tim> Guido's optimism, never will) even if you know the secret octal Tim> code ... It beats a secret handshake. Imagine all the extra peripherals we'd have to make available for everyone's computer. ;-) Tim> So this is less a doc issue than that more of os needs to become Tim> more like os.path (i.e., intelligently named functions with Tim> intelligently abstracted interfaces). Hasn't Guido's position been that the interface modules like os, posix, etc are just a thin layer over the underlying API (Guido: note how I cleverly attributed this position to you but also placed the responsibility for correctness on your head!)? If that's the case, perhaps we should provide a slightly higher level module that abstracts the file system as objects, and adopts a more user-friendly approach to the secret octal codes. 
Those of us worried about job security could continue to use the lower level module and leave the higher level interface for former Visual Basic programmers. Tim> never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - maybe-the-"e"-key-stuck-on-his-TTY-ly y'rs... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Tue Aug 24 20:21:44 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 24 Aug 1999 14:21:44 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Skip Montanaro writes: > whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Just 17? ;-) > Tim> So this is less a doc issue than that more of os needs to become > Tim> more like os.path (i.e., intelligently named functions with > Tim> intelligently abstracted interfaces). Sounds like some doc improvements can really help improve things, at least in the short term. > correctness on your head!)? If that's the case, perhaps we should provide a > slightly higher level module that abstracts the file system as objects, and > adopts a more user-friendly approach to the secret octal codes. Those of us I'm all for an object interface to a logical filesystem; having had to write just such a thing in Java not long ago, and we have a similar construct in Python (not by me, though), that we use in our Knowbot work. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tim_one at email.msn.com Wed Aug 25 09:02:21 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:02:21 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <000801beeec7$c6f06b20$fc2d153f@tim> [Skip Montanaro] > Well, we could confuse everyone and rename "chmod" to "chfat" ... I don't want to rename anything, nor do I want to use MS-specific names. chmod is both the wrong spelling & the wrong functionality for all non-Unix systems. os.path did a Good Thing by, e.g., introducing getmtime(), despite that everyone knows it's just os.stat()[8]. New isreadonly(path) and setreadonly(path) are more what I'm after; nothing beyond that is portable, & never will be. > Windows probably has an equivalent function whose name is 17 > characters long Indeed, SetFileAttributes is exactly 17 characters long (you moonlighting on NT, Skip?!). But while Windows geeks would like to use that, it's both the wrong spelling & the wrong functionality for all non-Windows systems. > ... > Hasn't Guido's position been that the interface modules like os, > posix, etc are just a thin layer over the underlying API (Guido: > note how I cleverly attributed this position to you but also placed > the responsibility for correctness on your head!)? If that's the > case, perhaps we should provide a slightly higher level module that > abstracts the file system as objects, and adopts a more user-friendly > approach to the secret octal codes. Like that, yes. > Those of us worried about job security could continue to use the > lower level module and leave the higher level interface for former > Visual Basic programmers. You're just *begging* Guido to make the Python2 os module take all of its names from the Win32 API . 
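[A sketch of what that isreadonly(path)/setreadonly(path) pair could look like on POSIX — the names come from Tim's proposal above and are hypothetical, not an existing os API; it touches only the write bits, the one portable piece:]

```python
import os
import stat

def setreadonly(path):
    # Clear every write bit (owner, group, other) -- the plain meaning
    # of "read-only", and the only portably meaningful chmod operation.
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def isreadonly(path):
    mode = os.stat(path).st_mode
    return not (mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

A Windows implementation would flip the read-only file attribute instead; the caller never sees an octal mode either way.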
it's-no-lamer-to-be-ignorant-of-unix-names-than-it-is- to-be-ignorant-of-chinese-ly y'rs - tim From tim_one at email.msn.com Wed Aug 25 09:05:31 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:05:31 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Message-ID: <000901beeec8$380d05c0$fc2d153f@tim> [Fred L. Drake, Jr.] > ... > I'm all for an object interface to a logical filesystem; having > had to write just such a thing in Java not long ago, and we have > a similar construct in Python (not by me, though), that we use in > our Knowbot work. Well, don't read anything unintended into this, but Guido *is* out of town, and you *do* have the power to check in code outside the doc subtree ... barry-will-help-he's-been-itching-to-revolt-too-ly y'rs - tim From bwarsaw at cnri.reston.va.us Wed Aug 25 13:20:16 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 25 Aug 1999 07:20:16 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14275.53616.585669.890621@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Well, don't read anything unintended into this, but Guido *is* TP> out of town, and you *do* have the power to check in code TP> outside the doc subtree ... TP> barry-will-help-he's-been-itching-to-revolt-too-ly y'rs I'll bring the pitchforks if you bring the torches! 
-Barry From skip at mojam.com Wed Aug 25 17:17:35 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 25 Aug 1999 10:17:35 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.2229.983969.228891@dolphin.mojam.com> > I'm all for an object interface to a logical filesystem; having had to > write just such a thing in Java not long ago, and we have a similar > construct in Python (not by me, though), that we use in our Knowbot > work. Fred, Since this is the dev group, how about showing us the Knowbot's logical filesystem API, and let's do some dev-ing... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Wed Aug 25 18:22:52 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:22:52 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000801beeec7$c6f06b20$fc2d153f@tim> References: <14274.53860.210265.71990@dolphin.mojam.com> <000801beeec7$c6f06b20$fc2d153f@tim> Message-ID: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Tim Peters writes: > os.path did a Good Thing by, e.g., introducing getmtime(), despite that > everyone knows it's just os.stat()[8]. New isreadonly(path) and > setreadonly(path) are more what I'm after; nothing beyond that is portable, Tim, I think we can simply declare that isreadonly() checks that the file doesn't allow the user to read it, but setreadonly() sounds to me like it wouldn't be portable to Unix. There's more than one (reasonable) way to make a file unreadable to a user just by manipulating permission bits, and which is best will vary according to both the user and the file's existing permissions. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 18:26:25 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:26:25 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.6449.428851.402955@weyr.cnri.reston.va.us> Tim Peters writes: > Well, don't read anything unintended into this, but Guido *is* out > of town, and you *do* have the power to check in code outside the > doc subtree ... Good thing I turned off the python-checkins list when I added the curly bracket patch I've been working on! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 20:46:30 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 14:46:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.2229.983969.228891@dolphin.mojam.com> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> <14276.2229.983969.228891@dolphin.mojam.com> Message-ID: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Skip Montanaro writes: > Since this is the dev group, how about showing us the Knowbot's logical > filesystem API, and let's do some dev-ing... Well, I took a look at it, and I must confess it's just not really different from the set of interfaces in the os module; the important point is that they are methods instead of functions (other than a few data items: sep, pardir, curdir). The path attribute provided the same interface as os.path. Its only user-visible state is the current-directory setting, which may or may not be that useful. We left off chmod(), which would make Tim happy, but that was only because it wasn't meaningful in context.
We'd have to add it (or something equivalent) for a general purpose filesystem object. So Tim's only happy if he can come up with a general interface that is actually portable (consider my earlier comments on setreadonly()). On the other hand, you don't need chmod() or anything like it for most situations where a filesystem object would be useful. An FTPFilesystem class would not be hard to write! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jack at oratrix.nl Wed Aug 25 23:43:16 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 25 Aug 1999 23:43:16 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by "Fred L. Drake, Jr." , Wed, 25 Aug 1999 12:22:52 -0400 (EDT) , <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <19990825214321.D50AD18BA0F@oratrix.oratrix.nl> But in Python, with its nice high-level datastructures, couldn't we design the Mother Of All File Attribute Calls, which would optionally map functionality from one platform to another? As an example consider the Mac resource fork size. If on unix I did fattrs = os.getfileattributes(filename) rfsize = fattrs.get('resourceforksize') it would raise an exception. If, however, I did rfsize = fattrs.get('resourceforksize', compat=1) I would get a "close approximation", 0. Note that you want some sort of a compat parameter, not a default value, as for some attributes (the various atime/mtime/ctimes, permission bits, etc) you'd get a default based on other file attributes that do exist on the current platform. Hmm, the file-attribute-object idea has the added advantage that you can then use setfileattributes(filename, fattrs) to be sure that you've copied all relevant attributes, independent of the platform you're on. 
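[The compat-fallback lookup Jack describes can be sketched like this — getfileattributes(), the FileAttributes class, and the table of approximations are all hypothetical names following his example:]

```python
class FileAttributes:
    # Hypothetical sketch of the file-attribute object proposed above.
    # Attributes the platform lacks are served from a table of "close
    # approximations", but only when the caller passes compat=1.
    _compat_defaults = {'resourceforksize': 0}

    def __init__(self, attrs):
        self._attrs = attrs      # what the platform really supports

    def get(self, name, compat=0):
        if name in self._attrs:
            return self._attrs[name]
        if compat and name in self._compat_defaults:
            return self._compat_defaults[name]
        raise KeyError(name)
```

On Unix, fattrs.get('resourceforksize') would then raise, while fattrs.get('resourceforksize', compat=1) returns the approximation 0, as in the example above.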
Mapping permissions takes a bit more (design-) work, with unix having user/group/other only and Windows having full-fledged ACLs (or nothing at all, depending how you look at it:-), but should also be doable. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Thu Aug 26 08:10:01 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 26 Aug 1999 07:10:01 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908211534.QAA22392@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 21, 99 04:34:32 pm" Message-ID: <199908260610.HAA20304@pukapuka.inrialpes.fr> [me, dropping SET_LINENO] > > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > ... > > A couple of things that need finalization: > > ... An updated version is available at the same location. I think that this one does The Right Thing (tm). a) Everything is internal to the VM and totally hidden, as it should be. b) No modifications of the code and frame objects (no additional slots) c) The modified code string (used for tracing) is allocated dynamically when the 1st frame pointing to its original switches in trace mode, and is deallocated automatically when the last frame pointing to its original dies. I feel better with this code so I can stop thinking about it and move on :-) (leaving it to your appreciation). What's next? File attributes? ;-) It's not easy to weigh what kind of common interface would be easy to grasp, intuitive and unambiguous for the average user. I think that the people on this list (being core developers) are more or less biased here (I'd say more than less). Perhaps some input from the community (c.l.py) would help?
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Thu Aug 26 07:06:57 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:57 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Message-ID: <000301beef80$d26158c0$522d153f@tim> [Fred L. Drake, Jr.] > ... > We left off chmod(), which would make Tim happy, but that was only > because it wasn't meaningful in context. I'd be appalled to see chmod go away; for many people it's comfortable and useful. I want *another* way, to do what little bit is portable in a way that doesn't require first mastering a badly designed interface from a dying OS . > We'd have to add it (or something equivalent) for a general purpose > filesystem object. So Tim's only happy if he can come up with a > general interface that is actually portable (consider my earlier > comments on setreadonly()). I don't care about general here; making up a general new way to spell everything that everyone may want to do under every OS would create an interface even worse than chmod's. My sister doesn't want to create files that are read-only to the world but writable to her group -- she just wants to mark certain precious files as read-only to minimize the chance of accidental destruction. What she wants is easy to do under Windows or Unix, and I expect she's the norm rather than the exception. > On the other hand, you don't need chmod() or anything like it for > most situations where a filesystem object would be useful. An > FTPFilesystem class would not be hard to write! An OO filesystem object with a .makereadonly method suits me fine . 
From tim_one at email.msn.com Thu Aug 26 07:06:54 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:54 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <000201beef80$d072f640$522d153f@tim> [Fred L. Drake, Jr.] > I think we can simply declare that isreadonly() checks that the > file doesn't allow the user to read it, Had more in mind that the file doesn't allow the user to write it . > but setreadonly() sounds to me like it wouldn't be portable to Unix. > There's more than one (reasonable) way to make a file unreadable to > a user just by manipulating permission bits, and which is best will > vary according to both the user and the file's existing permissions. "Portable" implies least common denominator, and the plain meaning of read-only is that nobody (whether owner, group or world in Unix) has write permission. People wanting something beyond that are going beyond what's portable, and that's fine -- I'm not suggesting getting rid of chmod for Unix dweebs. But by the same token, Windows dweebs should get some other (as non-portable as chmod) way to fiddle the bits important on *their* OS (only one of which chmod can affect). Billions of newbies will delightedly stick to the portable interface with the name that makes sense. the-percentage-of-programmers-doing-systems-programming-shrinks-by- the-millisecond-ly y'rs - tim From mal at lemburg.com Sat Aug 28 16:37:50 1999 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 28 Aug 1999 16:37:50 +0200 Subject: [Python-Dev] Iterating over dictionaries and objects in general References: <990826114149.ZM59302@rayburn.hcs.tl> <199908261702.NAA01866@eric.cnri.reston.va.us> <37C57E01.2ADC02AE@digicool.com> <990826150216.ZM60002@rayburn.hcs.tl> <37C5BAF1.4D6C1031@lemburg.com> <37C5C320.CF11BC7C@digicool.com> <37C643B0.7ECA586@lemburg.com> <37C69FB3.9CB279C7@digicool.com> Message-ID: <37C7F43E.67EEAB98@lemburg.com> [Followup to a discussion on psa-members about iterating over dictionaries without creating intermediate lists] Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > > > Jim Fulton wrote: > > > > > > > The problem with the PyDict_Next() approach is that it will only > > > > work reliably from within a single C call. You can't return > > > > to Python between calls to PyDict_Next(), because those could > > > > modify the dictionary causing the next PyDict_Next() call to > > > > fail or core dump. > > > > > > I do this all the time without problem. Basically, you provide an > > > index and if the index is out of range, you simply get an end-of-data return. > > > The only downside of this approach is that you might get "incorrect" > > > results if the dictionary is modified between calls. This isn't > > > all that different from iterating over a list with an index. > > > > Hmm, that's true... but what if the dictionary gets resized > > in between iterations ? The item layout is then likely to > > change, so you could potentially get completely bogus results. > > I think I said that. :) Just wanted to verify my understanding ;-) > > Even iterating over items twice may then occur, I guess. > > Yup. > > Again, this is not so different from iterating over > a list using a range: > > l=range(10) > for i in range(len(l)): > l.insert(0,'Bruce') > print l[i] > > This always outputs 'Bruce'. :) Ok, so the "risk" is under user control. Fine with me...
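[The kind of iterator under discussion — walking a dictionary's items one at a time instead of first materializing the item list — can be sketched in pure Python; the class name is hypothetical, and mutating the dict mid-iteration remains the caller's risk, as with the list/range example above:]

```python
class dictrange:
    # Yield (key, value) pairs without building an intermediate list
    # of the dictionary's items.
    def __init__(self, d):
        self._d = d

    def __iter__(self):
        for key in self._d:
            yield key, self._d[key]
```

(In current CPython, changing the dict's size during such an iteration raises RuntimeError rather than yielding bogus items.)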
> > Or perhaps via a special dictionary iterator, so that the following > > works: > > > > for item in dictrange(d): > > ... > > Yup. > > > The iterator could then also take some extra actions to insure > > that the dictionary hasn't been resized. > > I don't think it should do that. It should simply > stop when it has run out of items. I think I'll give such an iterator a spin. Would be a nice extension to mxTools. BTW, a generic type slot for iterating over types would probably be a nice feature too. The type slot could provide hooks of the form it_first, it_last, it_next, it_prev which all work integer index based, e.g. in pseudo code: int i; PyObject *item; /* set up i and item to point to the first item */ if (obj.it_first(&i,&item) < 0) goto onError; while (1) { PyObject_Print(item); /* move i and item to the next item; an IndexError is raised in case there are no more items */ if (obj.it_next(&i,&item) < 0) { PyErr_Clear(); break; } } These slots would cover all problem instances where iteration over non-sequences or non-uniform sequences (i.e. sequence-like objects which don't provide convex index sets, e.g. 1,2,3,6,7,8,11,12) is required, e.g. dictionaries, multi-segment buffers -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 127 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gward at cnri.reston.va.us Mon Aug 30 21:02:22 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:02:22 -0400 Subject: [Python-Dev] Portable "spawn" module for core? Message-ID: <19990830150222.B428@cnri.reston.va.us> Hi all -- it recently occurred to me that the 'spawn' module I wrote for the Distutils (and which Perry Stoll extended to handle NT), could fit nicely in the core library. On Unix, it's just a front-end to fork-and-exec; on NT, it's a front-end to spawnv().
In either case, it's just enough code (and just tricky enough code) that not everybody should have to duplicate it for their own uses. The basic idea is this:

from spawn import spawn
...
spawn (['cmd', 'arg1', 'arg2'])
# or
spawn (['cmd'] + args)

you get the idea: it takes a *list* representing the command to spawn: no strings to parse, no shells to get in the way, no sneaky meta-characters ruining your day, draining your efficiency, or compromising your security. (Conversely, no pipelines, redirection, etc.) The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' depending on os.name. Additionally, it takes a couple of optional keyword arguments (all booleans): 'search_path', 'verbose', and 'dry_run', which do pretty much what you'd expect. The module as it's currently in the Distutils code is attached. Let me know what you think... Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:11:50 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:11:50 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <14282.54880.922571.792484@dolphin.mojam.com> Greg> it recently occured to me that the 'spawn' module I wrote for the Greg> Distutils (and which Perry Stoll extended to handle NT), could fit Greg> nicely in the core library. How's spawn.spawn semantically different from the Windows-dependent os.spawn? How are stdout/stdin/stderr connected to the child process - just like fork+exec or something slightly higher level like os.popen?
If it's semantically like os.spawn and a little bit higher level abstraction than fork+exec, I'd vote for having the os module simply import it:

from spawn import spawn

and thus make that function more widely available... Greg> The module as it's currently in the Distutils code is attached. Not in the message I saw... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From gward at cnri.reston.va.us Mon Aug 30 21:14:57 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:14:57 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us>; from Greg Ward on Mon, Aug 30, 1999 at 03:02:22PM -0400 References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <19990830151457.C428@cnri.reston.va.us> On 30 August 1999, To python-dev at python.org said: > The module as it's currently in the Distutils code is attached. Let me > know what you think... New definition of "attached": I'll just reply to my own message with the code I meant to attach. D'oh!

------------------------------------------------------------------------

"""distutils.spawn

Provides the 'spawn()' function, a front-end to various platform-specific
functions for launching another program in a sub-process."""

# created 1999/07/24, Greg Ward

__rcsid__ = "$Id: spawn.py,v 1.2 1999/08/29 18:20:56 gward Exp $"

import sys, os, string
from distutils.errors import *

def spawn (cmd, search_path=1, verbose=0, dry_run=0):
    """Run another program, specified as a command list 'cmd', in a new
       process.  'cmd' is just the argument list for the new process, ie.
       cmd[0] is the program to run and cmd[1:] are the rest of its
       arguments.  There is no way to run a program with a name different
       from that of its executable.

       If 'search_path' is true (the default), the system's executable
       search path will be used to find the program; otherwise, cmd[0]
       must be the exact path to the executable.  If 'verbose' is true, a
       one-line summary of the command will be printed before it is run.
       If 'dry_run' is true, the command will not actually be run.

       Raise DistutilsExecError if running the program fails in any way;
       just return on success."""

    if os.name == 'posix':
        _spawn_posix (cmd, search_path, verbose, dry_run)
    elif os.name in ( 'nt', 'windows' ):   # ???
        _spawn_nt (cmd, search_path, verbose, dry_run)
    else:
        raise DistutilsPlatformError, \
              "don't know how to spawn programs on platform '%s'" % os.name

# spawn ()

def _spawn_nt ( cmd, search_path=1, verbose=0, dry_run=0):
    import string
    executable = cmd[0]
    if search_path:
        paths = string.split( os.environ['PATH'], os.pathsep)
        base,ext = os.path.splitext(executable)
        if (ext != '.exe'):
            executable = executable + '.exe'
        if not os.path.isfile(executable):
            paths.reverse()    # go over the paths and keep the last one
            for p in paths:
                f = os.path.join( p, executable )
                if os.path.isfile ( f ):
                    # the file exists, we have a shot at spawn working
                    executable = f
    if verbose:
        print string.join ( [executable] + cmd[1:], ' ')
    if not dry_run:
        # spawn for NT requires a full path to the .exe
        rc = os.spawnv (os.P_WAIT, executable, cmd)
        if rc != 0:
            raise DistutilsExecError("command failed: %d" % rc)

def _spawn_posix (cmd, search_path=1, verbose=0, dry_run=0):
    if verbose:
        print string.join (cmd, ' ')
    if dry_run:
        return
    exec_fn = search_path and os.execvp or os.execv
    pid = os.fork ()

    if pid == 0:                        # in the child
        try:
            #print "cmd[0] =", cmd[0]
            #print "cmd =", cmd
            exec_fn (cmd[0], cmd)
        except OSError, e:
            sys.stderr.write ("unable to execute %s: %s\n" %
                              (cmd[0], e.strerror))
            os._exit (1)

        sys.stderr.write ("unable to execute %s for unknown reasons" % cmd[0])
        os._exit (1)

    else:                               # in the parent
        # Loop until the child either exits or is terminated by a signal
        # (ie. keep waiting if it's merely stopped)
        while 1:
            (pid, status) = os.waitpid (pid, 0)
            if os.WIFSIGNALED (status):
                raise DistutilsExecError, \
                      "command %s terminated by signal %d" % \
                      (cmd[0], os.WTERMSIG (status))
            elif os.WIFEXITED (status):
                exit_status = os.WEXITSTATUS (status)
                if exit_status == 0:
                    return              # hey, it succeeded!
                else:
                    raise DistutilsExecError, \
                          "command %s failed with exit status %d" % \
                          (cmd[0], exit_status)
            elif os.WIFSTOPPED (status):
                continue
            else:
                raise DistutilsExecError, \
                      "unknown error executing %s: termination status %d" % \
                      (cmd[0], status)

# _spawn_posix ()

------------------------------------------------------------------------ -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From gward at cnri.reston.va.us Mon Aug 30 21:31:55 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:31:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <14282.54880.922571.792484@dolphin.mojam.com>; from Skip Montanaro on Mon, Aug 30, 1999 at 02:11:50PM -0500 References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> Message-ID: <19990830153155.D428@cnri.reston.va.us> On 30 August 1999, Skip Montanaro said: > > Greg> it recently occured to me that the 'spawn' module I wrote for the > Greg> Distutils (and which Perry Stoll extended to handle NT), could fit > Greg> nicely in the core library. > > How's spawn.spawn semantically different from the Windows-dependent > os.spawn? My understanding (purely from reading Perry's code!) is that the Windows spawnv() and spawnve() calls require the full path of the executable, and there is no spawnvp(). Hence, the bulk of Perry's '_spawn_nt()' function is code to search the system path if the 'search_path' flag is true.
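[Editor's note: the path-search logic Greg describes can be factored into a small stand-alone helper. This is only a sketch of that logic -- the name `find_executable` and its signature are made up here, and it stops at the first hit rather than scanning the whole path:]

```python
import os

def find_executable(executable, path=None):
    """Search the directories in 'path' (default: os.environ['PATH'])
    for 'executable'; return its full path, or None if not found.
    Hypothetical helper illustrating the search described above."""
    if path is None:
        path = os.environ.get('PATH', os.defpath)
    if os.name == 'nt' and os.path.splitext(executable)[1] != '.exe':
        executable = executable + '.exe'   # mirror the NT .exe handling
    if os.path.isfile(executable):
        return executable                  # already an explicit path
    for p in path.split(os.pathsep):
        f = os.path.join(p, executable)
        if os.path.isfile(f):
            return f                       # first hit wins
    return None
```

Returning at the first hit makes the reversed-path trick in `_spawn_nt()` unnecessary.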
In '_spawn_posix()', I just use either 'execv()' or 'execvp()' for this. The bulk of my code is the complicated dance required to wait for a fork'ed child process to finish. > How are stdout/stdin/stderr connected to the child process - just > like fork+exec or something slightly higher level like os.popen? Just like fork 'n exec -- '_spawn_posix()' is just a front end to fork and exec (either execv or execvp). In a previous life, I *did* implement a spawning module for a certain other popular scripting language that handles redirection and capturing (backticks in the shell and that other scripting language). It was a lot of fun, but pretty hairy. Took three attempts gradually developed over two years to get it right in the end. In fact, it does all the easy stuff that a Unix shell does in spawning commands, ie. search the path, fork 'n exec, and redirection and capturing. Doesn't handle the tricky stuff, ie. pipelines and job control. The documentation for this module is 22 pages long; the code is 600+ lines of somewhat tricky Perl (1300 lines if you leave in comments and blank lines). That's why the Distutils spawn module doesn't do anything with std{out,err,in}. > If it's semantically like os.spawn and a little bit higher level > abstraction than fork+exec, I'd vote for having the os module simply > import it: So os.spawnv and os.spawnve would be Windows-specific, but os.spawn portable? Could be confusing. And despite the recent extended discussion of the os module, I'm not sure if this fits the model. BTW, is there anything like this on the Mac? On what other OSs does it even make sense to talk about programs spawning other programs? (Surely those GUI user interfaces have to do *something*...) 
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:52:49 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:52:49 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> <19990830153155.D428@cnri.reston.va.us> Message-ID: <14282.57574.918011.54595@dolphin.mojam.com> Greg> BTW, is there anything like this on the Mac? There will be, once Jack Jansen contributes _spawn_mac... ;-) Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From jack at oratrix.nl Mon Aug 30 23:25:04 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:25:04 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 15:31:55 -0400 , <19990830153155.D428@cnri.reston.va.us> Message-ID: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Recently, Greg Ward said: > BTW, is there anything like this on the Mac? On what other OSs does it > even make sense to talk about programs spawning other programs? (Surely > those GUI user interfaces have to do *something*...) Yes, but the interface is quite a bit more high-level, so it's pretty difficult to reconcile with the Unix and Windows "every argument is a string" paradigm. You start the process and pass along an AppleEvent (basically an RPC-call) that will be presented to the program upon startup. 
So on the mac there's a serious difference between (inventing the API interface here, cut down to make it understandable to non-macheads:-) spawn("netscape", ("Open", "file.html")) and spawn("netscape", ("OpenURL", "http://foo.com/file.html")) The mac interface is (of course:-) infinitely more powerful, allowing you to talk to running apps, addressing stuff in it as COM/OLE does, etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is impossible to represent in a meaningful way. Add to that the fact that there's no stdin/stdout/stderr and there's little common ground. The one area of common ground is "run program X on files Y and Z and wait (or don't wait) for completion", so that is something that could maybe have a special method that could be implemented on all three mentioned platforms (and probably everything else as well). And even then it'll be surprising to Mac users that they have to _exit_ their editor (if you specify wait), not something people commonly do. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 30 23:29:55 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:29:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 23:25:04 +0200." <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Message-ID: <199908302129.RAA08442@eric.cnri.reston.va.us> > Recently, Greg Ward said: > > BTW, is there anything like this on the Mac? On what other OSs does it > > even make sense to talk about programs spawning other programs? (Surely > > those GUI user interfaces have to do *something*...)
> > Yes, but the interface is quite a bit more high-level, so it's pretty > difficult to reconcile with the Unix and Windows "every argument is a > string" paradigm. You start the process and pass along an AppleEvent > (basically an RPC-call) that will be presented to the program upon > startup. > > So on the mac there's a serious difference between (inventing the API > interface here, cut down to make it understandable to non-macheads:-) > spawn("netscape", ("Open", "file.html")) > and > spawn("netscape", ("OpenURL", "http://foo.com/file.html")) > > The mac interface is (of course:-) infinitely more powerful, allowing > you to talk to running apps, adressing stuff in it as COM/OLE does, > etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is > impossible to represent in a meaningful way. > > Add to that the fact that there's no stdin/stdout/stderr and there's > little common ground. The one area of common ground is "run program X > on files Y and Z and wait (or don't wait) for completion", so that is > something that could maybe have a special method that could be > implemented on all three mentioned platforms (and probably everything > else as well). And even then it'll be surprising to Mac users that > they have to _exit_ their editor (if you specify wait), not something > people commonly do. Indeed. I'm guessing that Greg wrote his code specifically to drive compilers, not so much to invoke an editor on a specific file. It so happens that the Windows compilers have command lines that look sufficiently like the Unix compilers that this might actually work. On the Mac, driving the compilers is best done using AppleEvents, so it's probably better not to try to abuse the spawn() interface for that... (Greg, is there a higher level where the compiler actions are described without referring to specific programs, but perhaps just to compiler actions and input and output files?)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 30 23:35:45 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:35:45 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 15:02:22 EDT." <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <199908302135.RAA08467@eric.cnri.reston.va.us> > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). In either case, > it's just enough code (and just tricky enough code) that not everybody > should have to duplicate it for their own uses. > > The basic idea is this: > > from spawn import spawn > ... > spawn (['cmd', 'arg1', 'arg2']) > # or > spawn (['cmd'] + args) > > you get the idea: it takes a *list* representing the command to spawn: > no strings to parse, no shells to get in the way, no sneaky > meta-characters ruining your day, draining your efficiency, or > compromising your security. (Conversely, no pipelines, redirection, > etc.) > > The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' > depending on os.name. Additionally, it takes a couple of optional > keyword arguments (all booleans): 'search_path', 'verbose', and > 'dry_run', which do pretty much what you'd expect. > > The module as it's currently in the Distutils code is attached. Let me > know what you think... I'm not sure that the verbose and dry_run options belong in the standard library. When both are given, this does something semi-useful; for Posix that's basically just printing the arguments, while for NT it prints the exact command that will be executed. Not sure if that's significant though. 
Perhaps it's better to extract the code that runs the path to find the right executable and make that into a separate routine. (Also, rather than reversing the path, I would break out of the loop at the first hit.) --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Mon Aug 30 23:38:36 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 17:38:36 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <199908302129.RAA08442@eric.cnri.reston.va.us>; from Guido van Rossum on Mon, Aug 30, 1999 at 05:29:55PM -0400 References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> <199908302129.RAA08442@eric.cnri.reston.va.us> Message-ID: <19990830173836.F428@cnri.reston.va.us> On 30 August 1999, Guido van Rossum said: > Indeed. I'm guessing that Greg wrote his code specifically to drive > compilers, not so much to invoke an editor on a specific file. It so > happens that the Windows compilers have command lines that look > sufficiently like the Unix compilers that this might actually work. Correct, but the spawn module I posted should work for any case where you want to run an external command synchronously without redirecting I/O. (And it could probably be extended to handle those cases, but a) I don't need them for Distutils [yet!], and b) I don't know how to do it portably.) > On the Mac, driving the compilers is best done using AppleEvents, so > it's probably better not to try to abuse the spawn() interface for > that... (Greg, is there a higher level where the compiler actions are > described without referring to specific programs, but perhaps just to > compiler actions and input and output files?) [off-topic alert... probably belongs on distutils-sig, but there you go] Yes, my CCompiler class is all about providing a (hopefully) compiler- and platform-neutral interface to a C/C++ compiler.
Currently there're only two concrete subclasses of this: UnixCCompiler and MSVCCompiler, and they both obviously use spawn, because Unix C compilers and MSVC both provide that kind of interface. A hypothetical sibling class that provides an interface to some Mac C compiler might use a souped-up spawn that "knows about" Apple Events, or it might use some other interface to Apple Events. If Jack's simplified summary of what passing Apple Events to a command looks like is accurate, maybe spawn can be souped up to work on the Mac. Or we might need a dedicated module for running Mac programs. So does anybody have code to run external programs on the Mac using Apple Events? Would it be possible/reasonable to add that as '_spawn_mac()' to my spawn module? Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From jack at oratrix.nl Mon Aug 30 23:52:29 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:52:29 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 17:38:36 -0400 , <19990830173836.F428@cnri.reston.va.us> Message-ID: <19990830215234.ED4E718B9FB@oratrix.oratrix.nl> Hmm, if we're talking about a "Python Make" or some such here the best way would probably be to use ToolServer. ToolServer is based on Apple's old MPW programming environment and is still supported by compiler vendors like MetroWerks. The nice thing about ToolServer for this type of work is that it _is_ command-line based, so you can probably send it things like spawn("cc", "-O", "test.c") But, although I know it is possible to do this with ToolServer, I haven't a clue on how to do it...
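[Editor's note: the arrangement Greg describes -- an abstract front end whose platform subclasses decide *how* the underlying tool is run (spawn a command line, send an AppleEvent, drive ToolServer) -- can be sketched in a few lines. The class and method names below are illustrative only, not the real Distutils API, and the spawn function is injected so the sketch is self-contained:]

```python
class CCompiler:
    """Abstract, compiler- and platform-neutral front end (sketch)."""
    def compile(self, source):
        raise NotImplementedError

class UnixCCompiler(CCompiler):
    """Drives a command-line compiler via a spawn-style callable."""
    def __init__(self, spawn):
        self.spawn = spawn              # e.g. the spawn() from this thread
    def compile(self, source):
        self.spawn(['cc', '-c', source])

# A Mac subclass would instead build an AppleEvent or a ToolServer
# command here; only the compile() entry point would stay the same.

calls = []
UnixCCompiler(calls.append).compile('test.c')
assert calls == [['cc', '-c', 'test.c']]
```

The point of the indirection is that callers depend only on `compile()`, never on how the platform actually launches the tool.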
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim_one at email.msn.com Tue Aug 31 07:44:18 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 31 Aug 1999 01:44:18 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> Message-ID: <000101bef373$de2974c0$932d153f@tim> [Greg Ward] > ... > In a previous life, I *did* implement a spawning module for > a certain other popular scripting language that handles > redirection and capturing (backticks in the shell and that other > scripting language). It was a lot of fun, but pretty hairy. Took > three attempts gradually developed over two years to get it right > in the end. In fact, it does all the easy stuff that a Unix shell > does in spawning commands, ie. search the path, fork 'n exec, and > redirection and capturing. Doesn't handle the tricky stuff, ie. > pipelines and job control. > > The documentation for this module is 22 pages long; the code > is 600+ lines of somewhat tricky Perl (1300 lines if you leave > in comments and blank lines). That's why the Distutils spawn > module doesn't do anything with std{out,err,in}. Note that win/tclWinPipe.c-- which contains the Windows-specific support for Tcl's "exec" cmd --is about 3,200 lines of C. It does handle pipelines and redirection, and even fakes pipes as needed with temp files when it can identify a pipeline component as belonging to the 16-bit subsystem. Even so, the Tcl help page for "exec" bristles with hilarious caveats under the Windows subsection; e.g., When redirecting from NUL:, some applications may hang, others will get an infinite stream of "0x01" bytes, and some will actually correctly get an immediate end-of-file; the behavior seems to depend upon something compiled into the application itself. 
When redirecting greater than 4K or so to NUL:, some applications will hang. The above problems do not happen with 32-bit applications. Still, people seem very happy with Tcl's exec, and I'm certain no language tries harder to provide a portable way to "do command lines". Two points to that: 1) If Python ever wants to do something similar, let's steal the Tcl code (& unlike stealing Perl's code, stealing Tcl's code actually looks possible -- it's very much better organized and written). 2) For all its heroic efforts to hide platform limitations,

int Tcl_ExecObjCmd(dummy, interp, objc, objv)
    ClientData dummy;           /* Not used. */
    Tcl_Interp *interp;         /* Current interpreter. */
    int objc;                   /* Number of arguments. */
    Tcl_Obj *CONST objv[];      /* Argument objects. */
{
#ifdef MAC_TCL
    Tcl_AppendResult(interp, "exec not implemented under Mac OS",
            (char *)NULL);
    return TCL_ERROR;
#else
    ...

a-generalized-spawn-is-a-good-start-ly y'rs - tim From fredrik at pythonware.com Tue Aug 31 08:39:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 31 Aug 1999 08:39:56 +0200 Subject: [Python-Dev] Portable "spawn" module for core? References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <005101bef37b$b0415070$f29b12c2@secret.pythonware.com> Greg Ward wrote: > it recently occured to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). any reason this couldn't go into the os module instead? just add parts of it to os.py, and change the docs to say that spawn* are supported on Windows and Unix... (supporting the full set of spawn* primitives would of course be nice, btw. just like os.py provides all exec variants...)
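[Editor's note: Fredrik's suggestion amounts to giving os.py a portable spawnv(); on Unix one can be emulated with the same fork/exec/waitpid dance used in '_spawn_posix()' above. A minimal, Unix-only sketch follows -- the function name and the negated-signal return convention are made up here, and only P_WAIT-style behaviour is shown:]

```python
import os

def spawnv_wait(file, args):
    """Run 'file' with argv 'args', wait for it, and return its exit
    status (or the negated signal number if it was killed).
    Hypothetical Unix-only sketch of a spawnv-style primitive."""
    pid = os.fork()
    if pid == 0:                  # child: replace ourselves with the program
        try:
            os.execv(file, args)
        finally:
            os._exit(127)         # only reached if execv() failed
    while True:                   # parent: wait, ignoring stop/continue
        pid, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        if os.WIFSIGNALED(status):
            return -os.WTERMSIG(status)
```

This is essentially '_spawn_posix()' with the path search stripped out and the status decoded into a return value instead of an exception.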
From gstein at lyra.org Tue Aug 3 03:51:43 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 02 Aug 1999 18:51:43 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A64B2F.3386F0A9@lyra.org> Mark Hammond wrote: > ... > Therefore, I would like to propose these functions to be added to > abstract.c: > > int PyObject_GetBufferSize(); > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > const void *PyObject_GetReadOnlyBuffer(); > > Although equivalent functions exist for the buffer object, I can't see the > equivalent abstract implementations - ie, that work with any object > supporting the protocol. > > Im willing to provide a patch if there is agreement a) the general idea is > good, and b) my specific spelling of the idea is OK (less likely - > PyBuffer_* seems better, but loses any implication of being abstract?).
Marc-Andre proposed exactly the same thing back at the end of March (to me and Guido). The two of us hashed out some of the stuff and M.A. came up with a full patch for the stuff. Guido was relatively non-committal at the point one way or another, but said they seemed fine. It appears the stuff never made it into source control. If Marc-Andre can resurface the final proposal/patch, then we'd be set. Until then: use the bufferprocs :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Tue Aug 3 11:11:11 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 11:11:11 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A6B22F.7A14BA2C@lemburg.com> David Ascher wrote: > > An issue which has dogged the NumPy project is that there is (to my > knowledge) no way to pickle very large arrays without creating strings > which contain all of the data. This can be a problem given that NumPy > arrays tend to be very large -- often several megabytes, sometimes much > bigger. This slows things down, sometimes a lot, depending on the > platform. It seems that it should be possible to do something more > efficient. > > Two alternatives come to mind: > > -- define a new pickling protocol which passes a file-like object to the > instance and have the instance write itself to that file, being as > efficient or inefficient as it cares to. This protocol is used only > if the instance/type defines the appropriate slot. Alternatively, > enrich the semantics of the getstate interaction, so that an object > can return partial data and tell the pickling mechanism to come back > for more. > > -- make pickling of objects which support the buffer interface use that > inteface's notion of segments and use that 'chunk' size to do > something more efficient if not necessarily most efficient. (oh, and > make NumPy arrays support the buffer interface =). 
This is simple > for NumPy arrays since we want to pickle "everything", but may not be > what other buffer-supporting objects want. > > Thoughts? Alternatives? Hmm, types can register their own pickling/unpickling functions via copy_reg, so they can access the self.write method in pickle.py to implement the write to file interface. Don't know how this would be done for cPickle.c though. For instances the situation is different since there is no dispatching done on a per-class basis. I guess an optional argument could help here. Perhaps some lazy pickling wrapper would help fix this in general: an object which calls back into the to-be-pickled object to access the data rather than store the data in a huge string. Yet another idea would be using memory mapped files instead of strings as temporary storage (but this is probably hard to implement right and not as portable). Dunno... just some thoughts. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 3 09:50:33 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 09:50:33 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <37A69F49.3575AE85@lemburg.com> Greg Stein wrote: > > Mark Hammond wrote: > > ... > > Therefore, I would like to propose these functions to be added to > > abstract.c: > > > > int PyObject_GetBufferSize(); > > void *PyObject_GetReadWriteBuffer(); /* or "char *"? */ > > const void *PyObject_GetReadOnlyBuffer(); > > > > Although equivalent functions exist for the buffer object, I can't see the > > equivalent abstract implementations - ie, that work with any object > > supporting the protocol. 
> > > > Im willing to provide a patch if there is agreement a) the general idea is > > good, and b) my specific spelling of the idea is OK (less likely - > > PyBuffer_* seems better, but loses any implication of being abstract?). > > Marc-Andre proposed exactly the same thing back at the end of March (to > me and Guido). The two of us hashed out some of the stuff and M.A. came > up with a full patch for the stuff. Guido was relatively non-committal > at the point one way or another, but said they seemed fine. It appears > the stuff never made it into source control. > > If Marc-Andre can resurface the final proposal/patch, then we'd be set. Below is the code I currently use. I don't really remember if this is what Greg and I discussed a while back, but I'm sure he'll correct me ;-) Note that you the buffer length is implicitly returned by these APIs. /* Takes an arbitrary object which must support the character (single segment) buffer interface and returns a pointer to a read-only memory location useable as character based input for subsequent processing. buffer and buffer_len are only set in case no error occurrs. Otherwise, -1 is returned and an exception set. */ static int PyObject_AsCharBuffer(PyObject *obj, const char **buffer, int *buffer_len) { PyBufferProcs *pb = obj->ob_type->tp_as_buffer; const char *pp; int len; if ( pb == NULL || pb->bf_getcharbuffer == NULL || pb->bf_getsegcount == NULL ) { PyErr_SetString(PyExc_TypeError, "expected a character buffer object"); goto onError; } if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) { PyErr_SetString(PyExc_TypeError, "expected a single-segment buffer object"); goto onError; } len = (*pb->bf_getcharbuffer)(obj,0,&pp); if (len < 0) goto onError; *buffer = pp; *buffer_len = len; return 0; onError: return -1; } /* Same as PyObject_AsCharBuffer() except that this API expects a readable (single segment) buffer interface and returns a pointer to a read-only memory location which can contain arbitrary data. 
buffer and buffer_len are only set in case no error occurs. Otherwise, -1 is returned and an exception set. */ static int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, int *buffer_len) { PyBufferProcs *pb = obj->ob_type->tp_as_buffer; void *pp; int len; if ( pb == NULL || pb->bf_getreadbuffer == NULL || pb->bf_getsegcount == NULL ) { PyErr_SetString(PyExc_TypeError, "expected a readable buffer object"); goto onError; } if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) { PyErr_SetString(PyExc_TypeError, "expected a single-segment buffer object"); goto onError; } len = (*pb->bf_getreadbuffer)(obj,0,&pp); if (len < 0) goto onError; *buffer = pp; *buffer_len = len; return 0; onError: return -1; } /* Takes an arbitrary object which must support the writeable (single segment) buffer interface and returns a pointer to a writeable memory location in buffer of size buffer_len. buffer and buffer_len are only set in case no error occurs. Otherwise, -1 is returned and an exception set. */ static int PyObject_AsWriteBuffer(PyObject *obj, void **buffer, int *buffer_len) { PyBufferProcs *pb = obj->ob_type->tp_as_buffer; void *pp; int len; if ( pb == NULL || pb->bf_getwritebuffer == NULL || pb->bf_getsegcount == NULL ) { PyErr_SetString(PyExc_TypeError, "expected a writeable buffer object"); goto onError; } if ( (*pb->bf_getsegcount)(obj,NULL) != 1 ) { PyErr_SetString(PyExc_TypeError, "expected a single-segment buffer object"); goto onError; } len = (*pb->bf_getwritebuffer)(obj,0,&pp); if (len < 0) goto onError; *buffer = pp; *buffer_len = len; return 0; onError: return -1; } -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 11:53:39 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 11:53:39 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by "M.-A.
Lemburg" , Tue, 03 Aug 1999 09:50:33 +0200 , <37A69F49.3575AE85@lemburg.com> Message-ID: <19990803095339.E02CE303120@snelboot.oratrix.nl> Why not pass the index to the As*Buffer routines as well and make getsegcount available too? Then you could code things like for(i=0; i < getsegcount(obj); i++) { if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 ) return -1; write(fp, buf, count); } -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein at lyra.org (Greg Stein) Date: Tue, 03 Aug 1999 03:25:11 -0700 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6C387.7360D792@lyra.org> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like > for(i=0; i < getsegcount(obj); i++) { > if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 ) > return -1; > write(fp, buf, count); > } Simply because multiple segments hasn't been seen. All objects supporting the buffer interface have a single segment. IMO, it is best to drop the argument to make typical usage easier. For handling multiple segments, a caller can use the raw interface rather than the handy functions. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at digicool.com Tue Aug 3 12:58:54 1999 From: jim at digicool.com (Jim Fulton) Date: Tue, 03 Aug 1999 06:58:54 -0400 Subject: [Python-Dev] Buffer interface in abstract.c? References: <001001bedd48$ea796280$1101a8c0@bobcat> Message-ID: <37A6CB6E.C990F561@digicool.com> Mark Hammond wrote: > > Hi all, > Im trying to slowly wean myself over to the buffer interfaces. OK, I'll bite. Where is the buffer interface documented? I found references to it in various places (e.g. built-in buffer()) but didn't find the interface itself. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From mal at lemburg.com Tue Aug 3 13:06:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 03 Aug 1999 13:06:46 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: <19990803095339.E02CE303120@snelboot.oratrix.nl> Message-ID: <37A6CD46.642A9C6D@lemburg.com> Jack Jansen wrote: > > Why not pass the index to the As*Buffer routines as well and make getsegcount > available too? Then you could code things like > for(i=0; i < getsegcount(obj); i++) { > if ( PyObject_AsCharBuffer(obj, &buf, &count, i) < 0 ) > return -1; > write(fp, buf, count); > } Well, just like Greg said, this is not much different than using the buffer interface directly. While the above would be a handy PyObject_WriteAsBuffer() kind of helper, I don't think that this is really used all that much. E.g. in mxODBC I use the APIs for accessing the raw char data in a buffer: the pointer is passed directly to the ODBC APIs without copying, which makes things quite fast. IMHO, this is the greatest advantage of the buffer interface. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at cnri.reston.va.us Tue Aug 3 15:07:44 1999 From: fdrake at cnri.reston.va.us (Fred L. Drake) Date: Tue, 3 Aug 1999 09:07:44 -0400 (EDT) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A64B2F.3386F0A9@lyra.org> References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A64B2F.3386F0A9@lyra.org> Message-ID: <14246.59808.561395.761772@weyr.cnri.reston.va.us> Greg Stein writes: > Until then: use the bufferprocs :-) Greg, On the topic of the buffer interface: Have you written documentation for this that I can include in the API reference? Bugging you about this is on my to-do list. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Tue Aug 3 13:29:43 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 13:29:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c?
References: <001001bedd48$ea796280$1101a8c0@bobcat> <37A6CB6E.C990F561@digicool.com> Message-ID: <37A6D2A7.27F27554@lemburg.com> Jim Fulton wrote: > > Mark Hammond wrote: > > > > Hi all, > > Im trying to slowly wean myself over to the buffer interfaces. > > OK, I'll bite. Where is the buffer interface documented? I found references > to it in various places (e.g. built-in buffer()) but didn't find the interface > itself. I guess it's a read-the-source feature :-) Objects/bufferobject.c and Include/object.h provide a start. Objects/stringobject.c has a "sample" implementation. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Tue Aug 3 16:45:25 1999 From: jack at oratrix.nl (Jack Jansen) Date: Tue, 03 Aug 1999 16:45:25 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: Message by Greg Stein , Tue, 03 Aug 1999 03:25:11 -0700 , <37A6C387.7360D792@lyra.org> Message-ID: <19990803144526.6B796303120@snelboot.oratrix.nl> > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > available too? > > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. Hmm. And I went out of my way to include this stupid multi-buffer stuff because the NumPy folks said they couldn't live without it (and one of the reasons for the buffer stuff was to allow NumPy arrays, which may be discontiguous, to be written efficiently). Can someone confirm that the Numeric stuff indeed doesn't use this? 
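[A present-day editorial aside, not part of the original thread: the segment-count API debated here was eventually superseded by the buffer protocol of PEP 3118, which describes discontiguous data with shape and strides rather than multiple segments. A small sketch of how that looks from Python today:]

```python
import array

# A one-dimensional array.array is one flat block of memory,
# so a view of it is contiguous -- the "single segment" case above.
flat = memoryview(array.array("f", [1.0, 2.0, 3.0]))

# Slicing a view with a step produces a strided, non-contiguous view --
# the modern analogue of the multi-segment case NumPy needed.
strided = memoryview(bytes(range(10)))[::2]

print(flat.contiguous, strided.contiguous)
```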
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From da at ski.org Tue Aug 3 18:19:19 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:19:19 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <37A6B22F.7A14BA2C@lemburg.com> Message-ID: On Tue, 3 Aug 1999, M.-A. Lemburg wrote: > Hmm, types can register their own pickling/unpickling functions > via copy_reg, so they can access the self.write method in pickle.py > to implement the write to file interface. Are you sure? My understanding of copy_reg is, as stated in the doc: pickle (type, function[, constructor]) Declares that function should be used as a ``reduction'' function for objects of type or class type. function should return either a string or a tuple. The optional constructor parameter, if provided, is a callable object which can be used to reconstruct the object when called with the tuple of arguments returned by function at pickling time. How does one access the 'self.write method in pickle.py'? > Perhaps some lazy pickling wrapper would help fix this in general: > an object which calls back into the to-be-pickled object to > access the data rather than store the data in a huge string. Right. That's an idea. > Yet another idea would be using memory mapped files instead > of strings as temporary storage (but this is probably hard to implement > right and not as portable). That's a very interesting idea! I'll try that -- it might just be the easiest way to do this. I think that portability isn't a huge concern -- the folks who are coming up with the speed issue are on platforms which have mmap support. Thanks for the suggestions. 
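[Another present-day aside: the copy_reg mechanism discussed above, spelled copyreg in today's Python, registers a reduce function that must return a (constructor, args) tuple -- which is exactly why the whole payload gets materialized as one big object before it reaches the pickle stream. A minimal sketch, with BigArray as a made-up stand-in for a NumPy-style array:]

```python
import copyreg
import pickle

class BigArray:
    # Toy stand-in for an array type whose payload is one large blob.
    def __init__(self, data):
        self.data = data

def reduce_bigarray(arr):
    # The reduction must return (callable, args); the args tuple holds
    # the full payload, so the pickler copies it wholesale into its
    # output -- the memory overhead this thread is complaining about.
    return (BigArray, (arr.data,))

copyreg.pickle(BigArray, reduce_bigarray)

payload = b"\x00" * 1000
blob = pickle.dumps(BigArray(payload))
restored = pickle.loads(blob)
```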
--david From da at ski.org Tue Aug 3 18:20:37 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:20:37 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <37A6C387.7360D792@lyra.org> Message-ID: On Tue, 3 Aug 1999, Greg Stein wrote: > Simply because multiple segments hasn't been seen. All objects > supporting the buffer interface have a single segment. IMO, it is best FYI, if/when NumPy objects support the buffer API, they will require multiple-segments. From da at ski.org Tue Aug 3 18:23:31 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 09:23:31 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Buffer interface in abstract.c? In-Reply-To: <19990803144526.6B796303120@snelboot.oratrix.nl> Message-ID: On Tue, 3 Aug 1999, Jack Jansen wrote: > > > Why not pass the index to the As*Buffer routines as well and make getsegcount > > > available too? > > > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. > > Hmm. And I went out of my way to include this stupid multi-buffer stuff > because the NumPy folks said they couldn't live without it (and one of the > reasons for the buffer stuff was to allow NumPy arrays, which may be > discontiguous, to be written efficiently). > > Can someone confirm that the Numeric stuff indeed doesn't use this? /usr/LLNLDistribution/Numerical/Include$ grep buffer *.h /usr/LLNLDistribution/Numerical/Include$ Yes. =) See the other thread on low-overhead pickling. But again, *if* multiarrays supported the buffer interface, they'd have to use the multi-segment feature (repeating myself). --david From mal at lemburg.com Tue Aug 3 21:17:16 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 03 Aug 1999 21:17:16 +0200 Subject: [Python-Dev] Pickling w/ low overhead References: Message-ID: <37A7403C.3BC05D02@lemburg.com> David Ascher wrote: > > On Tue, 3 Aug 1999, M.-A. 
Lemburg wrote: > > > Hmm, types can register their own pickling/unpickling functions > > via copy_reg, so they can access the self.write method in pickle.py > > to implement the write to file interface. > > Are you sure? My understanding of copy_reg is, as stated in the doc: > > pickle (type, function[, constructor]) > Declares that function should be used as a ``reduction'' function for > objects of type or class type. function should return either a string > or a tuple. The optional constructor parameter, if provided, is a > callable object which can be used to reconstruct the object when > called with the tuple of arguments returned by function at pickling > time. > > How does one access the 'self.write method in pickle.py'? Ooops. Sorry, that doesn't work... well at least not using "normal" Python ;-) You could of course simply go up one stack frame and then grab the self object and then... well, you know... -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 150 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Tue Aug 3 22:47:04 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 3 Aug 1999 15:47:04 -0500 (CDT) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: References: Message-ID: <14247.21628.225029.392711@dolphin.mojam.com> David> An issue which has dogged the NumPy project is that there is (to David> my knowledge) no way to pickle very large arrays without creating David> strings which contain all of the data. This can be a problem David> given that NumPy arrays tend to be very large -- often several David> megabytes, sometimes much bigger. This slows things down, David> sometimes a lot, depending on the platform. It seems that it David> should be possible to do something more efficient. David, Using __getstate__/__setstate__, could you create a compressed representation using zlib or some other scheme? 
I don't know how well numeric data compresses in general, but that might help. Also, I trust you use cPickle when it's available, yes? Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-475-3758 From da at ski.org Tue Aug 3 22:58:23 1999 From: da at ski.org (David Ascher) Date: Tue, 3 Aug 1999 13:58:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Pickling w/ low overhead In-Reply-To: <14247.21628.225029.392711@dolphin.mojam.com> Message-ID: On Tue, 3 Aug 1999, Skip Montanaro wrote: > Using __getstate__/__setstate__, could you create a compressed > representation using zlib or some other scheme? I don't know how well > numeric data compresses in general, but that might help. Also, I trust you > use cPickle when it's available, yes? I *really* hate to admit it, but I've found the source of the most massive problem in the pickling process that I was using. I didn't use binary mode, which meant that the huge strings were written & read one-character-at-a-time. I think I'll put a big fat note in the NumPy doc to that effect. (note that luckily this just affected my usage, not all NumPy users). --da From gstein at lyra.org Wed Aug 4 21:15:27 1999 From: gstein at lyra.org (Greg Stein) Date: Wed, 04 Aug 1999 12:15:27 -0700 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex References: <199908041313.JAA26344@weyr.cnri.reston.va.us> Message-ID: <37A8914F.6F5B9971@lyra.org> Fred L. Drake wrote: > > Update of /projects/cvsroot/python/dist/src/Doc/api > In directory weyr:/home/fdrake/projects/python/Doc/api > > Modified Files: > api.tex > Log Message: > > Started documentation on buffer objects & types. Very preliminary. > > Greg Stein: Please help with this; it's your baby! > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins All righty. I'll send some doc on this stuff. 
Somebody else did the initial buffer interface, but it seems that it has fallen to me now :-) Please give me a little while to get to this, though. I'm in and out of town for the next four weeks. I'm in the process of moving into a new house in Palo Alto, CA, and I'm travelling back and forth until Anni and I move for real in September. I should be able to get to this by the weekend, or possibly in a couple weeks. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Wed Aug 4 23:00:26 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 4 Aug 1999 17:00:26 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: <37A8914F.6F5B9971@lyra.org> References: <199908041313.JAA26344@weyr.cnri.reston.va.us> <37A8914F.6F5B9971@lyra.org> Message-ID: <14248.43498.664539.597656@weyr.cnri.reston.va.us> Greg Stein writes: > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I was not aware that you were not the origin of this work; feel free to pass it to the right person. > Please give me a little while to get to this, though. I'm in and out of > town for the next four weeks. I'm in the process of > moving into a new house in Palo Alto, CA, and I'm travelling back and > forth until Anni and I move for real in September. Cool! > I should be able to get to this by the weekend, or possibly in a couple > weeks. That's good enough for me. I expect it may be a couple of months or more before I try and get another release out with various fixes and additions. There's not a huge need to update the released doc set, other than a few embarrassing editorial...er, "oversights" (!). -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From jack at oratrix.nl Thu Aug 5 11:57:33 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 11:57:33 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Doc/api api.tex In-Reply-To: Message by Greg Stein , Wed, 04 Aug 1999 12:15:27 -0700 , <37A8914F.6F5B9971@lyra.org> Message-ID: <19990805095733.69D90303120@snelboot.oratrix.nl> > All righty. I'll send some doc on this stuff. Somebody else did the > initial buffer interface, but it seems that it has fallen to me now :-) I think I did, but I gladly bequeath it to you. (Hmm, that's the first time I typed "bequeath", I think). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fredrik at pythonware.com Thu Aug 5 17:46:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 5 Aug 1999 17:46:43 +0200 Subject: [Python-Dev] Buffer interface in abstract.c? References: Message-ID: <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> > > Simply because multiple segments hasn't been seen. All objects > > supporting the buffer interface have a single segment. IMO, it is best > > FYI, if/when NumPy objects support the buffer API, they will require > multiple-segments. same goes for PIL. in the worst case, there's one segment per line. ... on the other hand, I think something is missing from the buffer design; I definitely don't like that people can write and marshal objects that happen to implement the buffer interface, only to find that Python didn't do what they expected... >>> import unicode >>> import marshal >>> u = unicode.unicode >>> s = u("foo") >>> data = marshal.dumps(s) >>> marshal.loads(data) 'f\000o\000o\000' >>> type(marshal.loads(data)) <type 'string'> as for PIL, I would also prefer if the exported buffer corresponded to what you get from im.tostring().
iirc, that cannot be done -- I cannot export via a temporary memory buffer, since there's no way to know when to get rid of it... From jack at oratrix.nl Thu Aug 5 22:59:46 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 05 Aug 1999 22:59:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "Fredrik Lundh" , Thu, 5 Aug 1999 17:46:43 +0200 , <009801bedf59$b8150020$f29b12c2@secret.pythonware.com> Message-ID: <19990805205952.531B9E267A@oratrix.oratrix.nl> Recently, "Fredrik Lundh" said: > on the other hand, I think something is missing from > the buffer design; I definitely don't like that people > can write and marshal objects that happen to > implement the buffer interface, only to find that > Python didn't do what they expected... > > >>> import unicode > >>> import marshal > >>> u = unicode.unicode > >>> s = u("foo") > >>> data = marshal.dumps(s) > >>> marshal.loads(data) > 'f\000o\000o\000' > >>> type(marshal.loads(data)) > <type 'string'> Hmm. Looking at the code there is a catchall at the end, with a comment explicitly saying "Write unknown buffer-style objects as a string". IMHO this is an incorrect design, but that's a bit philosophical (so I'll gladly defer to Our Great Philosopher if he has anything to say on the matter:-). Unless, of course, there are buffer-style non-string objects around that are better read back as strings than not read back at all. Hmm again, I think I'd like it better if marshal.dumps() would barf on attempts to write unrepresentable data. Currently unrepresentable objects are written as TYPE_UNKNOWN (unless they have bufferness (or should I call that "a buffer-aspect"? :-)), which means you think you are writing correctly marshalled data but you'll be in for an exception when you try to read it back...
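[Editorial aside: the behaviour Jack wishes for here -- refusing at dump time instead of writing TYPE_UNKNOWN -- is what later versions of CPython's marshal adopted. A quick sketch against a modern interpreter:]

```python
import marshal

# Functions are not marshallable; modern marshal.dumps() raises
# ValueError at write time instead of emitting an unreadable record.
try:
    marshal.dumps(lambda: None)
    refused = False
except ValueError:
    refused = True
print("refused at write time:", refused)
```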
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Fri Aug 6 00:24:03 1999 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 5 Aug 1999 18:24:03 -0400 (EDT) Subject: [Python-Dev] mmapfile module Message-ID: <199908052224.SAA24159@amarok.cnri.reston.va.us> A while back the suggestion was made that the mmapfile module be added to the core distribution, and there was a guardedly positive reaction. Should I go ahead and do that? No one reported any problems when I asked for bug reports, but that was probably because no one tried it; putting it in the core would cause more people to try it. I suppose this leads to a more important question: at what point should we start checking 1.6-only things into the CVS tree? For example, once the current alphas of the re module are up to it (they're not yet), when should they be checked in? -- A.M. Kuchling http://starship.python.net/crew/amk/ Kids! Bringing about Armageddon can be dangerous. Do not attempt it in your home. -- Terry Pratchett & Neil Gaiman, _Good Omens_ From bwarsaw at cnri.reston.va.us Fri Aug 6 04:10:18 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 5 Aug 1999 22:10:18 -0400 (EDT) Subject: [Python-Dev] mmapfile module References: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <14250.17418.781127.684009@anthem.cnri.reston.va.us> >>>>> "AMK" == Andrew M Kuchling writes: AMK> I suppose this leads to a more important question: at what AMK> point should we start checking 1.6-only things into the CVS AMK> tree? For example, once the current alphas of the re module AMK> are up to it (they're not yet), when should they be checked AMK> in? Good question. 
I've had a bunch of people ask about the string methods branch, which I'm assuming will be a 1.6 feature, and I'd like to get that checked in at some point too. I think what's holding this up is that Guido hasn't decided whether there will be a patch release to 1.5.2 or not. -Barry From tim_one at email.msn.com Fri Aug 6 04:26:06 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 5 Aug 1999 22:26:06 -0400 Subject: [Python-Dev] mmapfile module In-Reply-To: <199908052224.SAA24159@amarok.cnri.reston.va.us> Message-ID: <000201bedfb3$09a99000$98a22299@tim> [Andrew M. Kuchling] > ... > I suppose this leads to a more important question: at what point > should we start checking 1.6-only things into the CVS tree? For > example, once the current alphas of the re module are up to it > (they're not yet), when should they be checked in? I'd like to see a bugfix release of 1.5.2 put out first, then have at it. There are several bugfixes that ought to go out ASAP. Thread tstate races, the cpickle/cookie.py snafu, and playing nice with current Tcl/Tk pop to mind immediately. I'm skeptical that anyone other than Guido could decide what *needs* to go out, so it's a good thing he's got nothing to do <wink>. one-boy's-opinion-ly y'rs - tim From mhammond at skippinet.com.au Fri Aug 6 05:30:55 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 6 Aug 1999 13:30:55 +1000 Subject: [Python-Dev] mmapfile module In-Reply-To: <000201bedfb3$09a99000$98a22299@tim> Message-ID: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> [Tim laments] > mind immediately. I'm skeptical that anyone other than Guido > could decide > what *needs* to go out, so it's a good thing he's got nothing > to do <wink>. He has been very quiet recently - where are you hiding, Guido? > one-boy's-opinion-ly y'rs - tim Here is another. Let's take a different tack - what has been checked in since 1.5.2 that should _not_ go out - ie, is too controversial?
If nothing else, makes a good starting point, and may help Guido out: Below is a summary of the CVS diff I just did, categorized by my opinion. It turns out that most of the changes would appear candidates. While not actually "bug-fixes", many have better documentation, removal of unused imports etc, so would definitely not hurt to get out. Looks like some build issues have been fixed too. Apart from possibly Tim's recent "UnboundLocalError" (which is the only serious behaviour change) I can't see anything that should obviously be omitted. Hopefully this is of interest... [Disclaimer - lots of files here - it is quite possible I missed something...] Mark. UNCONTROVERSIAL: ---------------- RCS file: /projects/cvsroot/python/dist/src/README,v RCS file: /projects/cvsroot/python/dist/src/Lib/cgi.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ftplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/poplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/re.py,v RCS file: /projects/cvsroot/python/dist/src/Tools/audiopy/README,v Doc changes. RCS file: /projects/cvsroot/python/dist/src/Lib/SimpleHTTPServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmd.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/htmllib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/netrc.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pipes.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/pty.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/shlex.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/urlparse.py,v Remove unused imports RCS file: /projects/cvsroot/python/dist/src/Lib/pdb.py,v Remove unused globals RCS file: /projects/cvsroot/python/dist/src/Lib/popen2.py,v Change to cleanup RCS file: /projects/cvsroot/python/dist/src/Lib/profile.py,v Remove unused imports and changes to comments. RCS file: /projects/cvsroot/python/dist/src/Lib/pyclbr.py,v Better doc, and support for module level functions.
RCS file: /projects/cvsroot/python/dist/src/Lib/repr.py,v self.maxlist changed to self.maxdict RCS file: /projects/cvsroot/python/dist/src/Lib/rfc822.py,v Doc changes, and better date handling. RCS file: /projects/cvsroot/python/dist/src/configure,v RCS file: /projects/cvsroot/python/dist/src/configure.in,v Looks like FreeBSD build flag changes. RCS file: /projects/cvsroot/python/dist/src/Demo/classes/bitvec.py,v RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v Whitespace fixes. RCS file: /projects/cvsroot/python/dist/src/Demo/scripts/makedir.py,v Check we have passed a non empty string RCS file: /projects/cvsroot/python/dist/src/Include/patchlevel.h,v 1.5.2+ RCS file: /projects/cvsroot/python/dist/src/Lib/BaseHTTPServer.py,v Remove import rfc822 and more robust errors. RCS file: /projects/cvsroot/python/dist/src/Lib/CGIHTTPServer.py,v Support for HTTP_COOKIE RCS file: /projects/cvsroot/python/dist/src/Lib/fpformat.py,v NotANumber supports class exceptions. RCS file: /projects/cvsroot/python/dist/src/Lib/macpath.py,v Use constants from stat module RCS file: /projects/cvsroot/python/dist/src/Lib/macurl2path.py,v Minor changes to path parsing RCS file: /projects/cvsroot/python/dist/src/Lib/mimetypes.py,v Recognise '.js': 'application/x-javascript', RCS file: /projects/cvsroot/python/dist/src/Lib/sunau.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/wave.py,v Support for binary files. RCS file: /projects/cvsroot/python/dist/src/Lib/whichdb.py,v Reads file header to check for bsddb format. RCS file: /projects/cvsroot/python/dist/src/Lib/xmllib.py,v XML may be at the start of the string, instead of the whole string. RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/tkSimpleDialog.py,v Destroy method added. RCS file: /projects/cvsroot/python/dist/src/Modules/cPickle.c,v As in the log :-) RCS file: /projects/cvsroot/python/dist/src/Modules/cStringIO.c,v No longer a Py_FatalError on module init failure. 
RCS file: /projects/cvsroot/python/dist/src/Modules/fpectlmodule.c,v Support for OSF in #ifdefs RCS file: /projects/cvsroot/python/dist/src/Modules/makesetup,v # to handle backslashes for sh's that don't automatically # continue a read when the last char is a backslash RCS file: /projects/cvsroot/python/dist/src/Modules/posixmodule.c,v Better error handling RCS file: /projects/cvsroot/python/dist/src/Modules/timemodule.c,v #ifdef changes for __GNU_LIBRARY__/_GLIBC_ RCS file: /projects/cvsroot/python/dist/src/Python/errors.c,v Better error messages on Win32 RCS file: /projects/cvsroot/python/dist/src/Python/getversion.c,v Bigger buffer and strings. RCS file: /projects/cvsroot/python/dist/src/Python/pystate.c,v Threading bug RCS file: /projects/cvsroot/python/dist/src/Objects/floatobject.c,v Tim Peters writes: 1. Fixes float divmod etc. RCS file: /projects/cvsroot/python/dist/src/Objects/listobject.c,v Doc changes, and when deallocating a list, DECREF the items from the end back to the start. RCS file: /projects/cvsroot/python/dist/src/Objects/stringobject.c,v Bug fix to do with the width of a format specifier RCS file: /projects/cvsroot/python/dist/src/Objects/tupleobject.c,v Appropriate overflow checks so that things like sys.maxint*(1,) can't dump core. RCS file: /projects/cvsroot/python/dist/src/Lib/tempfile.py,v don't cache attributes of type int RCS file: /projects/cvsroot/python/dist/src/Lib/urllib.py,v Number of revisions. RCS file: /projects/cvsroot/python/dist/src/Lib/aifc.py,v Chunk moved to new module. RCS file: /projects/cvsroot/python/dist/src/Lib/audiodev.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dbhash.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/dis.py,v Changes in comments. RCS file: /projects/cvsroot/python/dist/src/Lib/cmpcache.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/cmp.py,v New "shallow" arg. RCS file: /projects/cvsroot/python/dist/src/Lib/dumbdbm.py,v Coerce f.tell() to int.
RCS file: /projects/cvsroot/python/dist/src/Modules/main.c,v Fix to tracebacks off by a line with -x RCS file: /projects/cvsroot/python/dist/src/Lib/lib-tk/Tkinter.py,v Number of changes you can review! OTHERS: -------- RCS file: /projects/cvsroot/python/dist/src/Lib/asynchat.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/asyncore.py,v Latest versions from Sam??? RCS file: /projects/cvsroot/python/dist/src/Lib/smtplib.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/sched.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/ConfigParser.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/SocketServer.py,v RCS file: /projects/cvsroot/python/dist/src/Lib/calendar.py,v Sorry - out of time to detail RCS file: /projects/cvsroot/python/dist/src/Python/bltinmodule.c,v Unbound local, docstring, and better support for ExtensionClasses. Freeze: Few changes IDLE: Lotsa changes :-) Number of .h files have #ifdef changes for CE I won't detail (but would be great to get a few of these in - and I have more :-) Tools directory: Number of changes - outa time to detail From mal at lemburg.com Fri Aug 6 10:54:20 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 06 Aug 1999 10:54:20 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> Message-ID: <37AAA2BC.466750B5@lemburg.com> Jack Jansen wrote: > > Recently, "Fredrik Lundh" said: > > on the other hand, I think something is missing from > > the buffer design; I definitely don't like that people > > can write and marshal objects that happen to > > implement the buffer interface, only to find that > > Python didn't do what they expected... > > > > >>> import unicode > > >>> import marshal > > >>> u = unicode.unicode > > >>> s = u("foo") > > >>> data = marshal.dumps(s) > > >>> marshal.loads(data) > > 'f\000o\000o\000' > > >>> type(marshal.loads(data)) > > <type 'string'> Why do Unicode objects implement the bf_getcharbuffer slot ?
I thought that unicode objects use a two-byte character representation. Note that implementing the char buffer interface will also give you strange results with other code that uses PyArg_ParseTuple(...,"s#",...), e.g. you could search through Unicode strings as if they were normal 1-byte/char strings (and most certainly not find what you're looking for, I guess). > Hmm again, I think I'd like it better if marshal.dumps() would barf on > attempts to write unrepresentable data. Currently unrepresentable > objects are written as TYPE_UNKNOWN (unless they have bufferness (or > should I call that "a buffer-aspect"? :-)), which means you think you > are writing correctly marshalled data but you'll be in for an > exception when you try to read it back... I'd prefer an exception on write too. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 147 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Aug 6 16:44:35 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 6 Aug 1999 10:44:35 -0400 (EDT) Subject: [Python-Dev] mmapfile module In-Reply-To: <00a801bedfbc$1871a7e0$1101a8c0@bobcat> References: <000201bedfb3$09a99000$98a22299@tim> <00a801bedfbc$1871a7e0$1101a8c0@bobcat> Message-ID: <14250.62675.807129.878242@weyr.cnri.reston.va.us> Mark Hammond writes: > Apart from possibly Tim's recent "UnboundLocalError" (which is the only > serious behaviour change) I can't see anything that should obviously be Since UnboundLocalError is a subclass of NameError (what you got before) normally, and they are the same string when -X is used, this only represents a new name in the __builtin__ module for legacy code. This should not be a problem; the only real difference is that, using class exceptions for built-in exceptions, you get more useful information in your tracebacks. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives

From fredrik at pythonware.com Sat Aug 7 12:51:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 7 Aug 1999 12:51:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> Message-ID: <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com>

> > > >>> import unicode
> > > >>> import marshal
> > > >>> u = unicode.unicode
> > > >>> s = u("foo")
> > > >>> data = marshal.dumps(s)
> > > >>> marshal.loads(data)
> > > 'f\000o\000o\000'
> > > >>> type(marshal.loads(data))
> > >
> > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought
> that unicode objects use a two-byte character representation.

>>> import array
>>> import marshal
>>> a = array.array
>>> s = a("f", [1, 2, 3])
>>> data = marshal.dumps(s)
>>> marshal.loads(data)
'\000\000\200?\000\000\000@\000\000@@'

looks like the various implementors haven't really understood the intentions of whoever designed the buffer interface...

From mal at lemburg.com Sat Aug 7 18:14:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 07 Aug 1999 18:14:56 +0200 Subject: [Python-Dev] Some more constants for the socket module Message-ID: <37AC5B80.56F740DD@lemburg.com> Following the recent discussion on c.l.p about socket options, I found that the socket module does not define all constants defined in the (Linux) socket header file. Below is a patch that adds a few more (note that the SOL_* constants should be used for the setsockopt() level, not the IPPROTO_* constants).
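For readers following along with a current Python, the level/option distinction Marc points out can be sketched with today's socket module (the names used here all exist in the modern module; TCP options are nowadays usually set with IPPROTO_TCP, which equals SOL_TCP on Linux):

```python
import socket

# setsockopt(level, option, value): the *level* says which protocol
# layer interprets the option; SOL_* / IPPROTO_* values name layers.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Socket-level option: level SOL_SOCKET, option SO_REUSEADDR.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)

# TCP-level option: disable Nagle's algorithm.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

s.close()
```

Passing an IPPROTO_* value where a SOL_* level is expected happens to work on Linux precisely because the numbers coincide there, which is what makes the mix-up Marc warns about so easy to miss.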
--- socketmodule.c~ Sat Aug 7 17:56:05 1999
+++ socketmodule.c Sat Aug 7 18:10:07 1999
@@ -2005,14 +2005,48 @@ initsocket()
  PySocketSock_Type.tp_doc = sockettype_doc;
  Py_INCREF(&PySocketSock_Type);
  if (PyDict_SetItemString(d, "SocketType",
    (PyObject *)&PySocketSock_Type) != 0)
   return;
+
+ /* Address families (we only support AF_INET and AF_UNIX) */
+#ifdef AF_UNSPEC
+ insint(moddict, "AF_UNSPEC", AF_UNSPEC);
+#endif
  insint(d, "AF_INET", AF_INET);
 #ifdef AF_UNIX
  insint(d, "AF_UNIX", AF_UNIX);
 #endif /* AF_UNIX */
+#ifdef AF_AX25
+ insint(moddict, "AF_AX25", AF_AX25); /* Amateur Radio AX.25 */
+#endif
+#ifdef AF_IPX
+ insint(moddict, "AF_IPX", AF_IPX); /* Novell IPX */
+#endif
+#ifdef AF_APPLETALK
+ insint(moddict, "AF_APPLETALK", AF_APPLETALK); /* Appletalk DDP */
+#endif
+#ifdef AF_NETROM
+ insint(moddict, "AF_NETROM", AF_NETROM); /* Amateur radio NetROM */
+#endif
+#ifdef AF_BRIDGE
+ insint(moddict, "AF_BRIDGE", AF_BRIDGE); /* Multiprotocol bridge */
+#endif
+#ifdef AF_AAL5
+ insint(moddict, "AF_AAL5", AF_AAL5); /* Reserved for Werner's ATM */
+#endif
+#ifdef AF_X25
+ insint(moddict, "AF_X25", AF_X25); /* Reserved for X.25 project */
+#endif
+#ifdef AF_INET6
+ insint(moddict, "AF_INET6", AF_INET6); /* IP version 6 */
+#endif
+#ifdef AF_ROSE
+ insint(moddict, "AF_ROSE", AF_ROSE); /* Amateur Radio X.25 PLP */
+#endif
+
+ /* Socket types */
  insint(d, "SOCK_STREAM", SOCK_STREAM);
  insint(d, "SOCK_DGRAM", SOCK_DGRAM);
 #ifndef __BEOS__
 /* We have incomplete socket support. */
  insint(d, "SOCK_RAW", SOCK_RAW);
@@ -2048,11 +2082,10 @@ initsocket()
  insint(d, "SO_OOBINLINE", SO_OOBINLINE);
 #endif
 #ifdef SO_REUSEPORT
  insint(d, "SO_REUSEPORT", SO_REUSEPORT);
 #endif
-
 #ifdef SO_SNDBUF
  insint(d, "SO_SNDBUF", SO_SNDBUF);
 #endif
 #ifdef SO_RCVBUF
  insint(d, "SO_RCVBUF", SO_RCVBUF);
@@ -2111,14 +2144,43 @@ initsocket()
 #ifdef MSG_ETAG
  insint(d, "MSG_ETAG", MSG_ETAG);
 #endif

 /* Protocol level and numbers, usable for [gs]etsockopt */
-/* Sigh -- some systems (e.g. Linux) use enums for these. */
 #ifdef SOL_SOCKET
  insint(d, "SOL_SOCKET", SOL_SOCKET);
 #endif
+#ifdef SOL_IP
+ insint(moddict, "SOL_IP", SOL_IP);
+#else
+ insint(moddict, "SOL_IP", 0);
+#endif
+#ifdef SOL_IPX
+ insint(moddict, "SOL_IPX", SOL_IPX);
+#endif
+#ifdef SOL_AX25
+ insint(moddict, "SOL_AX25", SOL_AX25);
+#endif
+#ifdef SOL_ATALK
+ insint(moddict, "SOL_ATALK", SOL_ATALK);
+#endif
+#ifdef SOL_NETROM
+ insint(moddict, "SOL_NETROM", SOL_NETROM);
+#endif
+#ifdef SOL_ROSE
+ insint(moddict, "SOL_ROSE", SOL_ROSE);
+#endif
+#ifdef SOL_TCP
+ insint(moddict, "SOL_TCP", SOL_TCP);
+#else
+ insint(moddict, "SOL_TCP", 6);
+#endif
+#ifdef SOL_UDP
+ insint(moddict, "SOL_UDP", SOL_UDP);
+#else
+ insint(moddict, "SOL_UDP", 17);
+#endif
 #ifdef IPPROTO_IP
  insint(d, "IPPROTO_IP", IPPROTO_IP);
 #else
  insint(d, "IPPROTO_IP", 0);
 #endif
@@ -2266,10 +2328,32 @@ initsocket()
 #ifdef IP_ADD_MEMBERSHIP
  insint(d, "IP_ADD_MEMBERSHIP", IP_ADD_MEMBERSHIP);
 #endif
 #ifdef IP_DROP_MEMBERSHIP
  insint(d, "IP_DROP_MEMBERSHIP", IP_DROP_MEMBERSHIP);
+#endif
+#ifdef IP_DEFAULT_MULTICAST_TTL
+ insint(moddict, "IP_DEFAULT_MULTICAST_TTL", IP_DEFAULT_MULTICAST_TTL);
+#endif
+#ifdef IP_DEFAULT_MULTICAST_LOOP
+ insint(moddict, "IP_DEFAULT_MULTICAST_LOOP", IP_DEFAULT_MULTICAST_LOOP);
+#endif
+#ifdef IP_MAX_MEMBERSHIPS
+ insint(moddict, "IP_MAX_MEMBERSHIPS", IP_MAX_MEMBERSHIPS);
+#endif
+
+ /* TCP options */
+#ifdef TCP_NODELAY
+ insint(moddict, "TCP_NODELAY", TCP_NODELAY);
+#endif
+#ifdef TCP_MAXSEG
+ insint(moddict, "TCP_MAXSEG", TCP_MAXSEG);
+#endif
+
+ /* IPX options */
+#ifdef IPX_TYPE
+ insint(moddict, "IPX_TYPE", IPX_TYPE);
 #endif

 /* Initialize gethostbyname lock */
 #ifdef USE_GETHOSTBYNAME_LOCK
  gethostbyname_lock = PyThread_allocate_lock();

-- 
Marc-Andre Lemburg ______________________________________________________________________ Y2000: 146 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

From gstein at lyra.org Sat Aug 7 22:15:08 1999 From: gstein at lyra.org
(Greg Stein) Date: Sat, 07 Aug 1999 13:15:08 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> Message-ID: <37AC93CC.53982F3F@lyra.org>

Fredrik Lundh wrote:
>
> > > > >>> import unicode
> > > > >>> import marshal
> > > > >>> u = unicode.unicode
> > > > >>> s = u("foo")
> > > > >>> data = marshal.dumps(s)
> > > > >>> marshal.loads(data)
> > > > 'f\000o\000o\000'
> > > > >>> type(marshal.loads(data))
> > > >

This was a "nicety" that was put in during a round of patches that I submitted to Guido. We both had questions about it but figured that it couldn't hurt since it at least let some things be marshalled out that couldn't be marshalled before. I would suggest backing out the marshalling of buffer-interface objects and adding a mechanism for arbitrary type objects to marshal themselves. Without the second part, arrays and Unicode objects aren't marshallable at all (seems bad).

> > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought
> > that unicode objects use a two-byte character representation.

Unicode objects should *not* implement the getcharbuffer slot. Only read, write, and segcount.

> >>> import array
> >>> import marshal
> >>> a = array.array
> >>> s = a("f", [1, 2, 3])
> >>> data = marshal.dumps(s)
> >>> marshal.loads(data)
> '\000\000\200?\000\000\000@\000\000@@'
>
> looks like the various implementors haven't
> really understood the intentions of whoever
> designed the buffer interface...

Arrays can/should support both the getreadbuffer and getcharbuffer interface. The former: definitely. The latter: only if the contents are byte-sized. The loading back as a string is a different matter, as pointed out above.
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Sun Aug 8 22:20:52 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 08 Aug 1999 22:20:52 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by Greg Stein , Sat, 07 Aug 1999 13:15:08 -0700 , <37AC93CC.53982F3F@lyra.org> Message-ID: <19990808202057.DB803E267A@oratrix.oratrix.nl> Recently, Greg Stein said: > I would suggest backing out the marshalling of buffer-interface objects > and adding a mechanism for arbitrary type objects to marshal themselves. This sounds like the right approach. It would require 2 slots in the tp_ structure and a little extra glue for the typecodes (currently marshal knows all the 1-letter typecodes for all object types it can handle, but types marshalling their own objects would require a centralized registry of object types). For the time being it would probably suffice to have the mapping of type<->letter be hardcoded in marshal.h, but eventually you probably want a more extensible scheme, where Joe R. Extension-Writer could add a marshaller to his objects and know it won't collide with someone else's. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 9 10:56:30 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 10:56:30 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990808202057.DB803E267A@oratrix.oratrix.nl> Message-ID: <37AE97BE.2CADF48E@lemburg.com> Jack Jansen wrote: > > Recently, Greg Stein said: > > I would suggest backing out the marshalling of buffer-interface objects > > and adding a mechanism for arbitrary type objects to marshal themselves.
> > Without the second part, arrays and Unicode objects aren't marshallable > > at all (seems bad). > > This sounds like the right approach. It would require 2 slots in the > tp_ structure and a little extra glue for the typecodes (currently > marshal knows all the 1-letter typecodes for all object types it can > handle, but types marshalling their own objects would require a > centralized registry of object types). For the time being it would > probably suffice to have the mapping of type<->letter be hardcoded in > marshal.h, but eventually you probably want a more extensible scheme, > where Joe R. Extension-Writer could add a marshaller to his objects > and know it won't collide with someone else's. This registry should ideally be reachable via C APIs. Then a module writer could call these APIs in the init function of his module and he'd be set. Since marshal won't be able to handle imports on the fly (like pickle et al.), these modules will have to be imported before unmarshalling. Aside: wouldn't it make sense to move from marshal to pickle and deprecate marshal altogether ? cPickle is quite fast and much more flexible than marshal, plus it already provides mechanisms for registering new types. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Mon Aug 9 15:49:44 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 09 Aug 1999 15:49:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by "M.-A. Lemburg" , Mon, 09 Aug 1999 10:56:30 +0200 , <37AE97BE.2CADF48E@lemburg.com> Message-ID: <19990809134944.BB2FC303120@snelboot.oratrix.nl> > Aside: wouldn't it make sense to move from marshal to pickle and > deprecate marshal altogether ?
> cPickle is quite fast and much more > flexible than marshal, plus it already provides mechanisms for > registering new types. This is probably the best idea so far. Just remove the buffer-workaround in marshal, keep it functioning for the things it is used for now (like pyc files) and refer people to (c)Pickle for new development. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 9 16:50:46 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 10:50:46 -0400 Subject: [Python-Dev] Some more constants for the socket module In-Reply-To: Your message of "Sat, 07 Aug 1999 18:14:56 +0200." <37AC5B80.56F740DD@lemburg.com> References: <37AC5B80.56F740DD@lemburg.com> Message-ID: <199908091450.KAA29179@eric.cnri.reston.va.us> Thanks for the socketmodule patch, Marc. This was on my mental TO-DO list for a long time! I've checked it in. (One note: I had a bit of trouble applying the patch; apparently your mailer expanded all tabs to spaces. Perhaps you could use attachments to mail diffs? Also, you seem to have renamed 'd' to 'moddict' but you didn't send the patch for that...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 9 18:26:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 12:26:28 -0400 Subject: [Python-Dev] preferred conference date? Message-ID: <199908091626.MAA29411@eric.cnri.reston.va.us> I need your input about the date of the next Python conference. Foretec is close to a deal for a Python conference in January 2000 at the Alexandria Old Town Hilton hotel. Given our requirement of a good location in the DC area, this is a very good deal (it's a brand new hotel).
The prices are high (they tell me that the whole conference will cost $900, with a room rate of $129) but it's a class A location (metro, tons of restaurants, close to National Airport, etc.) and we have found no cheaper DC hotel suitable for our purposes (even in drab suburban locations). I'm worried that I'll be flamed to hell for this by the PSA members, but I don't think we can get the price any lower without starting all over in a different location, probably causing several months of delay. If people won't come, Foretec (and I) will have learned a valuable lesson and we'll rethink the issue for the 2001 conference. Anyway, given that Foretec is likely to go with this hotel, we have a choice of two dates: January 16-19, or 23-26 (both starting on a Sunday with the tutorials). This is where I need your help: which date would you prefer? Please mail me personally. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Mon Aug 9 18:31:43 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 9 Aug 1999 11:31:43 -0500 (CDT) Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> References: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <14255.557.474160.824877@dolphin.mojam.com> Guido> The prices are high (they tell me that the whole conference will Guido> cost $900, with a room rate of $129) but it's a class A location No way I (or my company) can afford to plunk down $900 for me to attend... Skip From mal at lemburg.com Mon Aug 9 18:40:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 09 Aug 1999 18:40:45 +0200 Subject: [Python-Dev] Some more constants for the socket module References: <37AC5B80.56F740DD@lemburg.com> <199908091450.KAA29179@eric.cnri.reston.va.us> Message-ID: <37AF048D.FC0A540@lemburg.com> Guido van Rossum wrote: > > Thanks for the socketmodule patch, Marc. This was on my mental TO-DO > list for a long time! I've checked it in. 
Cool, thanks. > (One note: I had a bit of trouble applying the patch; apparently your > mailer expanded all tabs to spaces. Perhaps you could use attachments > to mail diffs? Ok. > Also, you seem to have renamed 'd' to 'moddict' but > you didn't send the patch for that...) Oops, sorry... my "#define to insint" script uses 'd' as moddict, that's the reason why. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 144 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at CNRI.Reston.VA.US Mon Aug 9 19:30:36 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 09 Aug 1999 13:30:36 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: Your message of "Mon, 09 Aug 1999 11:31:43 CDT." <14255.557.474160.824877@dolphin.mojam.com> References: <199908091626.MAA29411@eric.cnri.reston.va.us> <14255.557.474160.824877@dolphin.mojam.com> Message-ID: <199908091730.NAA29559@eric.cnri.reston.va.us> > Guido> The prices are high (they tell me that the whole conference will > Guido> cost $900, with a room rate of $129) but it's a class A location > > No way I (or my company) can afford to plunk down $900 for me to attend... Let me clarify this. The $900 is for the whole 4-day conference, including a day of tutorials and developers' day. I don't know what the exact price breakdown will be, but the tutorials will probably be $300. Last year the total price was $700, with $250 for tutorials. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 14:04:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 13:04:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts Message-ID: <199908101204.NAA29572@pukapuka.inrialpes.fr> Currently, dictionaries always grow until they are deallocated from memory. 
This happens in PyDict_SetItem according to the following code (before inserting the new item into the dict):

    /* if fill >= 2/3 size, double in size */
    if (mp->ma_fill*3 >= mp->ma_size*2) {
        if (dictresize(mp, mp->ma_used*2) != 0) {
            if (mp->ma_fill+1 > mp->ma_size)
                return -1;
        }
    }

The symmetric case is missing and this has intrigued me for a long time, but I've never had the courage to look deeply into this portion of code and try to propose a solution. Which is: reduce the size of the dict by half when the nb of used items <= 1/6 the size. This situation occurs far less frequently than dict growing, but anyways, it seems useful for the degenerate cases where a dict has a peak usage, then most of the items are deleted. This is usually the case for global dicts holding dynamic object collections, etc. A bonus effect of shrinking big dicts with deleted items is that the lookup speed may be improved, because of the cleaned entries and the reduced overall size (resulting in a better hit ratio). The (only) solution I could come up with for this pb is the appended patch. It is not immediately obvious, but in practice, it seems to work fine. (inserting a print statement after the condition, showing the dict size and current usage helps in monitoring what's going on). Any other ideas on how to deal with this? Thoughts, comments? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

-------------------------------[ cut here ]---------------------------
*** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999
--- dictobject.c Tue Aug 10 12:21:15 1999
***************
*** 417,423 ****
      ep->me_value = NULL;
      mp->ma_used--;
      Py_DECREF(old_value);
!     Py_DECREF(old_key);

      return 0;
  }
--- 417,430 ----
      ep->me_value = NULL;
      mp->ma_used--;
      Py_DECREF(old_value);
!     Py_DECREF(old_key);
!     /* For bigger dictionaries, if used <= 1/6 size, half the size */
!     if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) {
!         if (dictresize(mp, mp->ma_used*2) != 0) {
!             if (mp->ma_fill > mp->ma_size)
!                 return -1;
!         }
!     }

      return 0;
  }

From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 15:20:36 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 14:20:36 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 10, 99 01:04:27 pm" Message-ID: <199908101320.OAA21986@pukapuka.inrialpes.fr> I wrote: > > The (only) solution I could come up with for this pb is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? > To clarify a bit what the patch does "as is", here's a short description: The code is triggered in PyDict_DelItem only for sizes which are > MINSIZE*4, i.e. greater than 4*4 = 16. Therefore, resizing will occur for a min size of 32 items.

    one third    32 / 3   = 10
    two thirds   32 * 2/3 = 21
    one sixth    32 / 6   = 5

So the shrinking will happen for a dict size of 32, of which 5 items are used (the sixth was just deleted). After the dictresize, the size will be 16, of which 5 items are used, i.e. one third. The threshold is fixed by the first condition of the patch. It could be made 64, instead of 32. This is subject to discussion... Obviously, this is most useful for bigger dicts, not for small ones. A threshold of 32 items seemed to me to be a reasonable compromise. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Tue Aug 10 14:35:33 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 10 Aug 1999 14:35:33 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> Message-ID: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > > > > >>> import unicode > > > > > >>> import marshal > > > > > >>> u = unicode.unicode > > > > > >>> s = u("foo") > > > > > >>> data = marshal.dumps(s) > > > > > >>> marshal.loads(data) > > > > > 'f\000o\000o\000' > > > > > >>> type(marshal.loads(data)) > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > that unicode objects use a two-byte character representation. > > Unicode objects should *not* implement the getcharbuffer slot. Only > read, write, and segcount.

unicode objects do not implement the getcharbuffer slot. here's the relevant descriptor:

    static PyBufferProcs unicode_as_buffer = {
        (getreadbufferproc) unicode_buffer_getreadbuf,
        (getwritebufferproc) unicode_buffer_getwritebuf,
        (getsegcountproc) unicode_buffer_getsegcount
    };

the array module uses a similar descriptor. maybe the unicode class shouldn't implement the buffer interface at all? sure looks like the best way to avoid trivial mistakes (the current behaviour of fp.write(unicodeobj) is even more serious than the marshal glitch...) or maybe the buffer design needs an overhaul? From guido at CNRI.Reston.VA.US Tue Aug 10 16:12:23 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Tue, 10 Aug 1999 10:12:23 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Tue, 10 Aug 1999 14:35:33 +0200."
<000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <199908101412.KAA02065@eric.cnri.reston.va.us> > Greg Stein wrote: > > > > > > >>> import unicode > > > > > > >>> import marshal > > > > > > >>> u = unicode.unicode > > > > > > >>> s = u("foo") > > > > > > >>> data = marshal.dumps(s) > > > > > > >>> marshal.loads(data) > > > > > > 'f\000o\000o\000' > > > > > > >>> type(marshal.loads(data)) > > > > > > > > > > > > Why do Unicode objects implement the bf_getcharbuffer slot ? I thought > > > > that unicode objects use a two-byte character representation. > > > > Unicode objects should *not* implement the getcharbuffer slot. Only > > read, write, and segcount. > > unicode objects do not implement the getcharbuffer slot. > here's the relevant descriptor: > > static PyBufferProcs unicode_as_buffer = { > (getreadbufferproc) unicode_buffer_getreadbuf, > (getwritebufferproc) unicode_buffer_getwritebuf, > (getsegcountproc) unicode_buffer_getsegcount > }; > > the array module uses a similar descriptor. > > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) > > or maybe the buffer design needs an overhaul? I think most places that should use the charbuffer interface actually use the readbuffer interface. This is what should be fixed. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Aug 10 19:53:56 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:53:56 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: <37B06734.4339D3BF@lemburg.com> Fredrik Lundh wrote: > > unicode objects do not implement the getcharbuffer slot. >... > or maybe the buffer design needs an overhaul? I think its usage does. The character slot should be used whenever character data is needed, not the read buffer slot. The latter one is for passing around raw binary data (without reinterpretation !), if I understood Greg correctly back when I gave those abstract APIs a try. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Aug 10 19:39:29 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 10 Aug 1999 19:39:29 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <37B063D1.29F3106A@lemburg.com> Vladimir Marangozov wrote: > > Currently, dictionaries always grow until they are deallocated from > memory. This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } > > The symmetric case is missing and this has intrigued me for a long time, > but I've never had the courage to look deeply into this portion of code > and try to propose a solution. Which is: reduce the size of the dict by > half when the nb of used items <= 1/6 the size. 
> > This situation occurs far less frequently than dict growing, but anyways, > it seems useful for the degenerate cases where a dict has a peak usage, > then most of the items are deleted. This is usually the case for global > dicts holding dynamic object collections, etc. > > A bonus effect of shrinking big dicts with deleted items is that > the lookup speed may be improved, because of the cleaned entries > and the reduced overall size (resulting in a better hit ratio). > > The (only) solution I could come up with for this pb is the appended patch. > It is not immediately obvious, but in practice, it seems to work fine. > (inserting a print statement after the condition, showing the dict size > and current usage helps in monitoring what's going on). > > Any other ideas on how to deal with this? Thoughts, comments? I think that integrating this into the C code is not really that effective since the situation will not occur that often and then it is often better to let the programmer decide rather than integrate an automatic downsize. You can call dict.update({}) to force an internal resize (the empty dictionary can be made global since it is not manipulated in any way and thus does not cause creation overhead). Perhaps a new method .resize(approx_size) would make this even clearer. This would also have the benefit of allowing a programmer to force allocation of the wanted size, e.g.

    d = {}
    d.resize(10000)
    # Insert 10000 items in a batch insert

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Tue Aug 10 21:58:27 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 10 Aug 1999 20:58:27 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B063D1.29F3106A@lemburg.com> from "M.-A.
Lemburg" at "Aug 10, 99 07:39:29 pm" Message-ID: <199908101958.UAA22028@pukapuka.inrialpes.fr> M.-A. Lemburg wrote: > > [me] > > Any other ideas on how to deal with this? Thoughts, comments? > > I think that integrating this into the C code is not really that > effective since the situation will not occur that often and then > it is often better to let the programmer decide rather than integrate > an automatic downsize. Agreed that the situation is rare. But if it occurs, it's Python's responsibility to manage its data structures (and system resources) efficiently. As a programmer, I really don't want to be bothered with internals -- I trust the interpreter for that. Moreover, how could I decide that at some point, some dict needs to be resized in my fairly big app, say IDLE? > > You can call dict.update({}) to force an internal > resize (the empty dictionary can be made global since it is not > manipulated in any way and thus does not cause creation overhead). I know that I can force the resize in other ways, but this is not the point. I'm usually against the idea of changing the programming logic because of my advanced knowledge of the internals. > > Perhaps a new method .resize(approx_size) would make this even > clearer. This would also have the benefit of allowing a programmer > to force allocation of the wanted size, e.g.
>
> d = {}
> d.resize(10000)
> # Insert 10000 items in a batch insert

This is interesting, but the two ideas are not mutually exclusive. Python has to downsize dicts automatically (just the same way it doubles the size automatically). Offering more through an API is a plus for hackers. ;-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Tue Aug 10 22:19:46 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Tue, 10 Aug 1999 22:19:46 +0200 Subject: [Python-Dev] shrinking dicts References: <199908101958.UAA22028@pukapuka.inrialpes.fr> Message-ID: <37B08962.6DFB3F0@lemburg.com> Vladimir Marangozov wrote: > > M.-A. Lemburg wrote: > > > > [me] > > > Any other ideas on how to deal with this? Thoughts, comments? > > > > I think that integrating this into the C code is not really that > > effective since the situation will not occur that often and then > > it is often better to let the programmer decide rather than integrate > > an automatic downsize. > > Agreed that the situation is rare. But if it occurs, it's Python's > responsibility to manage its data structures (and system resources) > efficiently. As a programmer, I really don't want to be bothered with > internals -- I trust the interpreter for that. Moreover, how could > I decide that at some point, some dict needs to be resized in my > fairly big app, say IDLE? You usually don't ;-) because "normal" dicts only grow (well, more or less). The downsizing thing will only become a problem if you use dictionaries in certain algorithms and there you handle the problem manually. My stack implementation uses the same trick, BTW. Memory is cheap and with an extra resize method (which the mxStack implementation has), problems can be dealt with explicitly for everyone to see in the code. > > You can call dict.update({}) to force an internal > > resize (the empty dictionary can be made global since it is not > > manipulated in any way and thus does not cause creation overhead). > > I know that I can force the resize in other ways, but this is not > the point. I'm usually against the idea of changing the programming > logic because of my advanced knowledge of the internals. True, that's why I mentioned... > > > > Perhaps a new method .resize(approx_size) would make this even > > clearer. This would also have the benefit of allowing a programmer > > to force allocation of the wanted size, e.g.
> > > > d = {} > > d.resize(10000) > > # Insert 10000 items in a batch insert > > This is interesting, but the two ideas are not mutually exclusive. > Python has to downsize dicts automatically (just the same way it doubles > the size automatically). Offering more through an API is a plus for > hackers. ;-) It's not really for hackers: the point is that it makes the technique visible and understandable (as opposed to the hack above). The same could be useful for lists too (the hack there is l = [None] * size, which I find rather difficult to understand at first sight...). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 143 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Wed Aug 11 00:39:30 1999 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 11 Aug 1999 08:39:30 +1000 Subject: [Python-Dev] shrinking dicts In-Reply-To: <37B08962.6DFB3F0@lemburg.com> Message-ID: <010901bee381$36ee5d30$1101a8c0@bobcat> Looking over the messages from Marc and Vladimir, I'm going to add my 2c worth. IMO, Marc's position is untenable iff it can be demonstrated that the "average" program is likely to see "sparse" dictionaries, and such dictionaries have an adverse effect on either speed or memory. The analogy is quite simple - you don't need to manually resize lists or dicts before inserting (to allocate more storage - an internal implementation issue) so neither should you need to manually resize when deleting (to reclaim that storage - still internal implementation). Suggesting that the allocation of resources should be automatic, but the recycling of them not be automatic flies in the face of everything else - e.g., you don't need to delete each object - when it is no longer referenced, its memory is reclaimed automatically.
Marc's position is only reasonable if the specific case we are talking about is very very rare, and unlikely to be hit by anyone with normal, real-world requirements or programs. In this case, exposing the implementation detail is reasonable. So, the question comes down to: "What is the benefit of Vladimir's patch?" Maybe we need some metrics on some dictionaries. For example, maybe a doctored Python that kept stats for each dictionary and logged this info. The output of this should be able to tell you what savings you could possibly expect. If you find that the average program really would not benefit at all (say only a few K from a small number of dicts) then the horse was probably dead well before we started flogging it. If however you can demonstrate serious benefits could be achieved, then interest may pick up and I too would lobby for automatic downsizing. Mark. From tim_one at email.msn.com Wed Aug 11 07:30:20 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 01:30:20 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908101204.NAA29572@pukapuka.inrialpes.fr> Message-ID: <000001bee3ba$9b226f60$8d2d2399@tim> [Vladimir] > Currently, dictionaries always grow until they are deallocated from > memory. It's more accurate to say they never shrink <0.9 wink>. Even that has exceptions, though, starting with: > This happens in PyDict_SetItem according to the following > code (before inserting the new item into the dict): > > /* if fill >= 2/3 size, double in size */ > if (mp->ma_fill*3 >= mp->ma_size*2) { > if (dictresize(mp, mp->ma_used*2) != 0) { > if (mp->ma_fill+1 > mp->ma_size) > return -1; > } > } This code can shrink the dict too. The load factor computation is based on "fill", but the resize is based on "used". If you grow a huge dict, then delete all the entries one by one, "used" falls to 0 but "fill" stays at its high-water mark.
At least 1/3rd of the entries are NULL, so "fill" continues to climb as keys are added again: when the load factor computation triggers again, "used" may be as small as 1, and dictresize can shrink the dict dramatically. The only clear a priori return I see in your patch is that I might save memory if I delete gobs of stuff from a dict and then neither get rid of it nor add keys to it again. But my programs generally grow dicts forever, grow then delete them entirely, or cycle through fat and lean times (in which case the code above already shrinks them from time to time). So I don't expect that your patch would buy me anything I want, but would cost me more on every delete. > ... > Any other ideas on how to deal with this? Thoughts, comments? Just that slowing the expected case to prevent theoretical bad cases is usually a net loss -- I think the onus is on you to demonstrate that this change is an exception to that rule. I do recall one real-life complaint about it on c.l.py a couple years ago: the poster had a huge dict, eventually deleted most of the items, and then kept it around purely for lookups. They were happy enough to copy the dict into a fresh one a key+value pair at a time; today they could just do d = d.copy() or even d.update({}) to shrink the dict. It would certainly be good to document these tricks! if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-see-why-1999-is-special-ly y'rs - tim From tim_one at email.msn.com Wed Aug 11 08:45:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 11 Aug 1999 02:45:49 -0400 Subject: [Python-Dev] preferred conference date? In-Reply-To: <199908091626.MAA29411@eric.cnri.reston.va.us> Message-ID: <000201bee3c5$25b47b00$8d2d2399@tim> [Guido] > ... > The prices are high (they tell me that the whole conference will cost > $900, with a room rate of $129) Is room rental in addition to, or included in, that $900? > ...
> I'm worried that I'll be flamed to hell for this by the PSA members, So have JulieK announce it . > ... > Anyway, given that Foretec is likely to go with this hotel, we have a > choice of two dates: January 16-19, or 23-26 (both starting on a > Sunday with the tutorials). This is where I need your help: which > date would you prefer? 23-26 for me; 16-19 may not be doable. or-everyone-can-switch-to-windows-and-we'll-do-the-conference-via-netmeeting-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Wed Aug 11 16:33:17 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 11 Aug 1999 15:33:17 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000001bee3ba$9b226f60$8d2d2399@tim> from "Tim Peters" at "Aug 11, 99 01:30:20 am" Message-ID: <199908111433.PAA31842@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir] > > Currently, dictionaries always grow until they are deallocated from > > memory. > > It's more accurate to say they never shrink <0.9 wink>. Even that has > exceptions, though, starting with: > > > This happens in PyDict_SetItem according to the following > > code (before inserting the new item into the dict): > > > > /* if fill >= 2/3 size, double in size */ > > if (mp->ma_fill*3 >= mp->ma_size*2) { > > if (dictresize(mp, mp->ma_used*2) != 0) { > > if (mp->ma_fill+1 > mp->ma_size) > > return -1; > > } > > } > > This code can shrink the dict too. The load factor computation is based on > "fill", but the resize is based on "used". If you grow a huge dict, then > delete all the entries one by one, "used" falls to 0 but "fill" stays at its > high-water mark. At least 1/3rd of the entries are NULL, so "fill" > continues to climb as keys are added again: when the load factor > computation triggers again, "used" may be as small as 1, and dictresize can > shrink the dict dramatically. Thanks for clarifying this! > [snip] > > > ... > > Any other ideas on how to deal with this? Thoughts, comments?
> > Just that slowing the expected case to prevent theoretical bad cases is > usually a net loss -- I think the onus is on you to demonstrate that this > change is an exception to that rule. I won't, because this case is rare in practice, classifying it already as an exception. A real exception. I'll have to think a bit more about all this. Adding 1/3 new entries to trigger the next resize sounds suboptimal (if it happens at all). > I do recall one real-life complaint > about it on c.l.py a couple years ago: the poster had a huge dict, > eventually deleted most of the items, and then kept it around purely for > lookups. They were happy enough to copy the dict into a fresh one a > key+value pair at a time; today they could just do > > d = d.copy() > > or even > > d.update({}) > > to shrink the dict. > > It would certainly be good to document these tricks! I think that officializing these tricks in the documentation is a bad idea. > > if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to- > see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, but don't forget that you've been around, teaching people various tricks. And 1999 is special -- we just had a solar eclipse today, the next being scheduled for 2081. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Wed Aug 11 16:07:44 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 11 Aug 1999 16:07:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. ok. btw, how about adding support for buffer access to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible (like "virtual" and "abstract" image buffers in PIL 1.1). something like: int initbuffer(PyObject* obj, void** context); int exitbuffer(PyObject* obj, void* context); and corresponding context arguments to the rest of the functions... From guido at CNRI.Reston.VA.US Wed Aug 11 16:42:10 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 11 Aug 1999 10:42:10 -0400 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Your message of "Wed, 11 Aug 1999 16:07:44 +0200." <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> Message-ID: <199908111442.KAA04423@eric.cnri.reston.va.us> > btw, how about adding support for buffer access > to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible
> something like: > > int initbuffer(PyObject* obj, void** context); > int exitbuffer(PyObject* obj, void* context); > > and corresponding context arguments to the > rest of the functions... Can you explain this idea more? Without more understanding of PIL I have no idea what you're talking about... --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Thu Aug 12 07:15:39 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 01:15:39 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908111433.PAA31842@pukapuka.inrialpes.fr> Message-ID: <000301bee481$b78ae5c0$4e2d2399@tim> [Tim] >> ...slowing the expected case to prevent theoretical bad cases is >> usually a net loss -- I think the onus is on you to demonstrate >> that this change is an exception to that rule. [Vladimir Marangozov] > I won't, because this case is rare in practice, classifying it already > as an exception. A real exception. I'll have to think a bit more about > all this. Adding 1/3 new entries to trigger the next resize sounds > suboptimal (if it happens at all). "Suboptimal" with respect to which specific cost model? Exhibiting a specific bad case isn't compelling, and especially not when it's considered to be "a real exception". Adding new expense to every delete is an obvious new burden -- where's the payback, and is the expected net effect amortized across all dict usage a win or loss? Offhand it sounds like a small loss to me, although I haven't worked up a formal cost model either . > ... > I think that officializing these tricks in the documentation is a > bad idea. It's rarely a good idea to keep truths secret, although implementation-du-jour tricks don't belong in the current doc set. Probably in a HowTo. 
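The shrink-by-copying trick Tim mentions is easy to see by measuring the dict itself; a minimal sketch, assuming a modern CPython where `copy()` rebuilds the table around the live entries (the `d.update({})` variant relied on the 1999-era resize trigger and isn't assumed to work here):

```python
import sys

# Grow a dict, then delete nearly everything, leaving 10 live entries.
d = {i: None for i in range(100_000)}
for i in range(99_990):
    del d[i]

# Deletion alone leaves the table at its high-water-mark allocation...
before = sys.getsizeof(d)

# ...while copying key/value pairs into a fresh dict -- the c.l.py
# poster's trick -- rebuilds the table around the surviving entries.
d = d.copy()
after = sys.getsizeof(d)

print(before, "->", after)  # the copy is far smaller
```

`sys.getsizeof` counts only the dict's own table, not the objects it refers to, which is exactly the storage this thread is arguing about.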
>> if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard- >> to-see-why-1999-is-special-ly y'rs - tim > This is a good (your favorite ;-) argument, I actually hate that kind of argument -- it's one of *Guido's* favorites, and in his current silent state I'm simply channeling him . > but don't forget that you've been around, teaching people various > tricks. As I said, this particular trick has come up only once in real life in my experience; it's never come up in my own code; it's an anti-FAQ. People are 100x more likely to whine about theoretical quadratic-time list growth nobody has ever encountered (although it looks like they may finally get it under an out-of-the-box BDW collector!). > And 1999 is special -- we just had a solar eclipse today, the next being > scheduled for 2081. Ya, like any of us will survive Y2K to see it . 1999-is-special-cuz-it's-the-end-of-civilization-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Thu Aug 12 20:22:06 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 12 Aug 1999 19:22:06 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908121822.TAA40444@pukapuka.inrialpes.fr> Just curious: Is python with vs. without "-O" equivalent today regarding line numbers? Are SET_LINENO opcodes a plus in some situations or not? Next, I see quite often several SET_LINENO in a row in the beginning of code objects due to doc strings, etc. Since I don't think that folding them into one SET_LINENO would be an optimisation (it would rather be avoiding the redundancy), is it possible and/or reasonable to do something in this direction? A trivial example: >>> def f(): ... "This is a comment about f" ... a = 1 ... 
>>> import dis >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 LOAD_CONST 1 (1) 12 STORE_FAST 0 (a) 15 LOAD_CONST 2 (None) 18 RETURN_VALUE >>> Can the above become something like this instead: 0 SET_LINENO 3 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 LOAD_CONST 2 (None) 12 RETURN_VALUE -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Fri Aug 13 00:02:06 1999 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 00:02:06 +0200 Subject: [Python-Dev] about line numbers In-Reply-To: Message by Vladimir Marangozov , Thu, 12 Aug 1999 19:22:06 +0100 (NFT) , <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <19990812220211.B3CED993@oratrix.oratrix.nl> The only possible problem I can see with folding linenumbers is if someone sets a breakpoint on such a line. And I think it'll be difficult to explain the missing line numbers to pdb, so there isn't an easy workaround (at least, it takes more than my 30 seconds of brainpower to come up with one:-). -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:10:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:10:26 +0100 (NFT) Subject: [Python-Dev] shrinking dicts In-Reply-To: <000301bee481$b78ae5c0$4e2d2399@tim> from "Tim Peters" at "Aug 12, 99 01:15:39 am" Message-ID: <199908122310.AAA29618@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Tim] > >> ...slowing the expected case to prevent theoretical bad cases is > >> usually a net loss -- I think the onus is on you to demonstrate > >> that this change is an exception to that rule.
> > [Vladimir Marangozov] > > I won't, because this case is rare in practice, classifying it already > > as an exception. A real exception. I'll have to think a bit more about > > all this. Adding 1/3 new entries to trigger the next resize sounds > > suboptimal (if it happens at all). > > "Suboptimal" with respect to which specific cost model? Exhibiting a > specific bad case isn't compelling, and especially not when it's considered > to be "a real exception". Adding new expense to every delete is an obvious > new burden -- where's the payback, and is the expected net effect amortized > across all dict usage a win or loss? Offhand it sounds like a small loss to > me, although I haven't worked up a formal cost model either . C'mon Tim, don't try to impress me with cost models. I'm already impressed :-) Anyways, I've looked at some traces. As expected, the conclusion is that this case is extremely rare wrt the average dict usage. There are 3 reasons: (1) dicts are usually deleted entirely, (2) del d[key] is rare in practice, and (3) often d[key] = None is used instead of (2). There is, however, a small percentage of dicts which are used below 1/3 of their size. I must say, below 1/3 of their peak size, because downsizing is also rare. To trigger a downsize, 1/3 new entries of the peak size must be inserted. Besides these observations, after looking at the code one more time, I can't really understand why the resize logic is based on the "fill" watermark and not on "used". fill = used + dummy, but since lookdict returns the first free slot (null or dummy), I don't really see what's the point of using a fill watermark... Perhaps you can enlighten me on this. Using only the "used" metrics seems fine to me. I even deactivated "fill" and replaced it with "used" to see what happens -- no visible changes, except a tiny speedup I'm willing to neglect.
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 01:21:48 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 00:21:48 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <19990812220211.B3CED993@oratrix.oratrix.nl> from "Jack Jansen" at "Aug 13, 99 00:02:06 am" Message-ID: <199908122321.AAA29572@pukapuka.inrialpes.fr> Jack Jansen wrote: > > > The only possible problem I can see with folding linenumbers is if > someone sets a breakpoint on such a line. And I think it'll be > difficult to explain the missing line numbers to pdb, so there isn't > an easy workaround (at least, it takes more than my 30 seconds of > brainpoewr to come up with one:-). > Eek! We can set a breakpoint on a doc string? :-) There's no code in there. It should be treated as a comment by pdb. I can't set a breakpoint on a comment line even in C ;-) There must be something deeper about it... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Fri Aug 13 02:07:32 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 12 Aug 1999 20:07:32 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908121822.TAA40444@pukapuka.inrialpes.fr> Message-ID: <000101bee51f$d7601de0$fb2d2399@tim> [Vladimir Marangozov] > Is python with vs. without "-O" equivalent today regarding > line numbers? > > Are SET_LINENO opcodes a plus in some situations or not? In theory it should make no difference, except that the trace mechanism makes a callback on each SET_LINENO, and that's how the debugger implements line-number breakpoints. Under -O, there are no SET_LINENOs, so debugger line-number breakpoints don't work under -O. 
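The trace callback Tim describes is the durable part of this machinery and can be watched from Python itself; a sketch in modern terms (today `sys.settrace` reports a 'line' event at each point where a 1999-era SET_LINENO would have fired, the opcode itself being an implementation detail):

```python
import sys

line_events = []

def tracer(frame, event, arg):
    # The interpreter calls this with 'call', 'line', 'return', ... events;
    # 'line' is the one a debugger hangs line-number breakpoints on, and is
    # where SET_LINENO made its callback in 1999-era bytecode.
    if event == "line" and frame.f_code.co_name == "f":
        line_events.append(frame.f_lineno)
    return tracer

def f():
    a = 1
    b = 2
    return a + b

sys.settrace(tracer)
f()
sys.settrace(None)

print(len(line_events))  # one 'line' event per executed line of f()
```

This also shows why line-triggered callbacks are costly: every executed line becomes a full Python-level call into the trace function.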
I think there's also a sporadic buglet, which I've never bothered to track down: sometimes a line number reported in a traceback under -O (&, IIRC, it's always the topmost line number) comes out as a senseless negative value. > Next, I see quite often several SET_LINENO in a row in the beginning > of code objects due to doc strings, etc. Since I don't think that > folding them into one SET_LINENO would be an optimisation (it would > rather be avoiding the redundancy), is it possible and/or reasonable > to do something in this direction? All opcodes consume time, although a wasted trip or two around the eval loop at the start of a function isn't worth much effort to avoid. Still, it's a legitimate opportunity for provable speedup, even if unmeasurable speedup . Would be more valuable to rethink the debugger's breakpoint approach so that SET_LINENO is never needed (line-triggered callbacks are expensive because called so frequently, turning each dynamic SET_LINENO into a full-blown Python call; if I used the debugger often enough to care , I'd think about munging in a new opcode to make breakpoint sites explicit). immutability-is-made-to-be-violated-ly y'rs - tim From tim_one at email.msn.com Fri Aug 13 06:53:38 1999 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 13 Aug 1999 00:53:38 -0400 Subject: [Python-Dev] shrinking dicts In-Reply-To: <199908122307.AAA06018@pukapuka.inrialpes.fr> Message-ID: <000101bee547$cffaa020$992d2399@tim> [Vladimir Marangozov, *almost* seems ready to give up on a counterproductive dict pessimization ] > ... > There is, however, a small percentage of dicts which are used > below 1/3 of their size. I must say, below 1/3 of their peak size, > because downsizing is also rare. To trigger a downsize, 1/3 new > entries of the peak size must be inserted. Not so, although "on average" 1/6 may be correct. Look at an extreme: Say a dict has size 333 (it can't, but it makes the math obvious ...). Say it contains 221 items.
Now someone deletes them all, one at a time. used==0 and fill==221 at this point. They insert one new key that happens to hit one of the 333-221 = 112 remaining NULL keys. Then used==1 and fill==222. They insert a 2nd key, and before the dict is searched the new fill of 222 triggers the 2/3rds load-factor resizing -- which asks for a new size of 1*2 == 2. For the minority of dicts that go up and down in size wildly many times, the current behavior is fine. > Besides these observations, after looking at the code one more > time, I can't really understand why the resize logic is based on > the "fill" watermark and not on "used". fill = used + dummy, but > since lookdict returns the first free slot (null or dummy), I don't > really see what's the point of using a fill watermark... Let's just consider an unsuccessful search. Then it does return "the first" free slot, but not necessarily at the time it *sees* the first free slot. So long as it sees a dummy, it has to keep searching; the search doesn't end until it finds a NULL. So consider this, assuming the resize triggered only on "used": d = {} for i in xrange(50000): d[random.randrange(1000000)] = 1 for k in d.keys(): del d[k] # now there are 50000 dummy dict keys, and some number of NULLs # loop invariant: used == 0 for i in xrange(sys.maxint): j = random.randrange(10000000) d[j] = 1 del d[j] assert not d.has_key(i) However many NULL slots remained, the last loop eventually transforms them *all* into dummies. The dummies act exactly like "real keys" with respect to expected time for an unsuccessful search, which is why it's thoroughly appropriate to include dummies in the load factor computation. The loop will run slower and slower as the percentage of dummies approaches 100%, and each failing has_key approaches O(N) time. 
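The dummy-slot mechanics behind this can be sketched with a toy open-addressing table (an illustrative sketch only: linear probing stands in for CPython's perturbed probe sequence, and `NULL`/`DUMMY`/`ToyTable` are this sketch's names, not dictobject.c's):

```python
NULL = object()   # never-used slot: an unsuccessful probe may stop here
DUMMY = object()  # deleted slot: a probe must keep going past it

class ToyTable:
    def __init__(self, size=8):
        self.slots = [NULL] * size

    def _find(self, key):
        # Remember the first free slot for a later insert, but only a
        # NULL slot ends an unsuccessful search -- dummies don't.
        size = len(self.slots)
        i = hash(key) % size
        freeslot = None
        for _ in range(size):
            slot = self.slots[i]
            if slot is NULL:
                return i if freeslot is None else freeslot
            if slot is DUMMY:
                if freeslot is None:
                    freeslot = i
            elif slot[0] == key:
                return i
            i = (i + 1) % size
        # Every slot live or dummy: the probe would wrap around forever,
        # which is the case CPython avoids by counting dummies in "fill".
        raise RuntimeError("table is all live+dummy slots")

    def insert(self, key, value):
        self.slots[self._find(key)] = (key, value)

    def delete(self, key):
        i = self._find(key)
        assert self.slots[i] is not NULL and self.slots[i] is not DUMMY
        # Can't write NULL here: that would cut other keys' probe chains.
        self.slots[i] = DUMMY

    def lookup(self, key):
        slot = self.slots[self._find(key)]
        return None if slot is NULL or slot is DUMMY else slot[1]

t = ToyTable()
t.insert("key", "val")
t.delete("key")
print(t.lookup("key"))  # the probe passes the dummy and stops at a NULL
```

Cycling inserts and deletes long enough turns every slot live-or-dummy, at which point even misses cost a full scan of the table.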
In most hash table implementations that's the worst that can happen (and it's a disaster), but under Python's implementation it's worse: Python never checks to see whether the probe sequence "wraps around", so the first search after the last NULL has been changed to a dummy never ends. Counting the dummies in the load-factor computation prevents all that: no matter how many inserts and deletes are intermixed, the "effective load factor" stays under 2/3rds so gives excellent expected-case behavior; and it also protects against an all-dummy dict, making the lack of an expensive inner-loop "wrapped around?" check safe. > Perhaps you can enlighten me on this. Using only the "used" metrics > seems fine to me. I even deactivated "fill" and replaced it with "used" > to see what happens -- no visible changes, except a tiny speedup I'm > willing to neglect. You need a mix of deletes and inserts for the dummies to make a difference; dicts that always grow don't have dummies, so they're not likely to have any dummy-related problems either . Try this (untested): import time from random import randrange N = 1000 thatmany = [None] * N d = {} while 1: start = time.clock() for i in thatmany: d[randrange(10000000)] = 1 for i in d.keys(): del d[i] finish = time.clock() print round(finish - start, 3) Succeeding iterations of the outer loop should grow dramatically slower, and finally get into an infinite loop, despite that "used" never exceeds N. Short course rewording: for purposes of predicting expected search time, a dummy is the same as a live key, because finding a dummy doesn't end a search -- it has to press on until either finding the key it was looking for, or finding a NULL. And with a mix of insertions and deletions, and if the hash function is doing a good job, then over time all the slots in the table will become either live or dummy, even if "used" stays within a very small range. So, that's why .
dictobject-may-be-the-subtlest-object-there-is-ly y'rs - tim From gstein at lyra.org Fri Aug 13 11:13:55 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 02:13:55 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> Message-ID: On Tue, 10 Aug 1999, Fredrik Lundh wrote: >... > unicode objects do not implement the getcharbuffer slot. This is Goodness. All righty. >... > maybe the unicode class shouldn't implement the > buffer interface at all? sure looks like the best way It is needed for fp.write(unicodeobj) ... It is also very handy for C functions to deal with Unicode strings. > to avoid trivial mistakes (the current behaviour of > fp.write(unicodeobj) is even more serious than the > marshal glitch...) What's wrong with fp.write(unicodeobj)? It should write the unicode value to the file. Are you suggesting that it will need to be done differently? Icky. > or maybe the buffer design needs an overhaul? Not that I know of. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Aug 13 12:36:13 1999 From: gstein at lyra.org (Greg Stein) Date: Fri, 13 Aug 1999 03:36:13 -0700 (PDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <199908101412.KAA02065@eric.cnri.reston.va.us> Message-ID: On Tue, 10 Aug 1999, Guido van Rossum wrote: >... > > or maybe the buffer design needs an overhaul? > > I think most places that should use the charbuffer interface actually > use the readbuffer interface. This is what should be fixed. I believe that I properly changed all of these within the core distribution. Per your requested design, third-party extensions must switch from "s#" to "t#" to move to the charbuffer interface, as needed. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Fri Aug 13 15:47:05 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 13 Aug 1999 14:47:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908131347.OAA30740@pukapuka.inrialpes.fr> Tim Peters wrote: > > [Vladimir Marangozov, *almost* seems ready to give up on a counter- > productive dict pessimization ] > Of course I will! Now everything is perfectly clear. Thanks. > ... > So, that's why . > Now, *this* one explanation of yours should go into a HowTo/BecauseOf for developers. I timed your scripts and a couple of mine which attest (again) the validity of the current implementation. My patch is out of bounds. It even disturbs from time to time the existing harmony in the results ;-) because of early resizing. All in all, for performance reasons, dicts remain an exception to the rule of releasing memory ASAP. They have been designed to tolerate caching because of their dynamics, which is the main reason for the rare case addressed by my patch. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Aug 13 19:27:19 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 19:27:19 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B45577.7772CAA1@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Guido van Rossum wrote: > >... > > > or maybe the buffer design needs an overhaul? > > > > I think most places that should use the charbuffer interface actually > > use the readbuffer interface. This is what should be fixed. > > I believe that I properly changed all of these within the core > distribution. 
Per your requested design, third-party extensions must > switch from "s#" to "t#" to move to the charbuffer interface, as needed. Shouldn't this be the other way around ? After all, extensions using "s#" do expect character data and not arbitrary binary encodings of information. IMHO, the latter should be special cased, not the former. E.g. it doesn't make sense to use the re module to scan over 2-byte Unicode with single character based search patterns. Aside: Is the buffer interface reachable in any way from within Python ? Why isn't the interface exposed via __XXX__ methods on normal Python instances (could be implemented by returning a buffer object) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Aug 13 17:32:40 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Aug 1999 11:32:40 -0400 (EDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B45577.7772CAA1@lemburg.com> References: <37B45577.7772CAA1@lemburg.com> Message-ID: <14260.15000.398399.840716@weyr.cnri.reston.va.us> M.-A. Lemburg writes: > Aside: Is the buffer interface reachable in any way from within > Python ? Why isn't the interface exposed via __XXX__ methods > on normal Python instances (could be implemented by returning a > buffer object) ? Would it even make sense? I thought a large part of the intent was for performance, avoiding memory copies. Perhaps there should be an .__as_buffer__() which returned an object that supports the C buffer interface. I'm not sure how useful it would be; perhaps for classes that represent image data? They could return a buffer object created from a string/array/NumPy array. -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 17:59:12 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 17:59:12 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies. looks like there's some confusion here over what the buffer interface is all about. time for a new GvR essay, perhaps? From fdrake at acm.org Fri Aug 13 18:22:09 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 13 Aug 1999 12:22:09 -0400 (EDT) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <00d401bee5a7$13f84750$f29b12c2@secret.pythonware.com> Message-ID: <14260.17969.497916.382752@weyr.cnri.reston.va.us> Fredrik Lundh writes: > looks like there's some confusion here over > what the buffer interface is all about. time > for a new GvR essay, perhaps? If he'll write something about it, I'll be glad to adapt it to the extending & embedding manual. It seems important that it be included in the standard documentation since it will be important for extension writers to understand when they should implement it. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fredrik at pythonware.com Fri Aug 13 18:34:46 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 13 Aug 1999 18:34:46 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? 
) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> Message-ID: <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Guido van Rossum wrote: > > btw, how about adding support for buffer access > > to data that have strange internal formats (like certain PIL image memories) or isn't directly accessible > > (like "virtual" and "abstract" image buffers in PIL 1.1). > > something like: > > > > int initbuffer(PyObject* obj, void** context); > > int exitbuffer(PyObject* obj, void* context); > > > > and corresponding context arguments to the > > rest of the functions... > > Can you explain this idea more? Without more understanding of PIL I > have no idea what you're talking about... in code:

    void* context;

    // this can be done at any time
    segments = pb->getsegcount(obj, NULL, context);

    if (!pb->bf_initbuffer(obj, &context))
        ... failed to initialise buffer api ...

    ... allocate segment size buffer ...
    pb->getsegcount(obj, &bytes, context);
    ... calculate total buffer size and allocate buffer ...

    for (i = offset = 0; i < segments; i++) {
        n = pb->getreadbuffer(obj, i, &p, context);
        if (n < 0)
            ... failed to fetch a given segment ...
        memcpy(buf + offset, p, n); // or write to file, or whatever
        offset = offset + n;
    }

    pb->bf_exitbuffer(obj, context);

in other words, this would give the target object a chance to keep some local context (like a temporary buffer) during a sequence of buffer operations... for PIL, this would make it possible to 1) store required metadata (size, mode, palette) along with the actual buffer contents.
2) possibly pack formats that use extra internal storage for performance reasons -- RGB pixels are stored as 32-bit integers, for example. 3) access virtual image memories (that can only be accessed via a buffer-like interface in themselves -- given an image object, you acquire an access handle, and use a getdata method to access the actual data. without initbuffer, there's no way to do two buffer accesses in parallel. without exitbuffer, there's no way to release the access handle. without the context variable, there's nowhere to keep the access handle between calls.) 4) access abstract image memories (like virtual memories, but they reside outside PIL, like on a remote server, or inside another image processing library, or on a hardware device). 5) convert to external formats on the fly: fp.write(im.buffer("JPEG")) and probably a lot more. as far as I can tell, nothing of this can be done using the current design... ... besides, what about buffers and threads? if you return a pointer from getreadbuf, wouldn't it be good to know exactly when Python doesn't need that pointer any more? explicit initbuffer/exitbuffer calls around each sequence of buffer operations would make that a lot safer... From mal at lemburg.com Fri Aug 13 21:16:44 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 13 Aug 1999 21:16:44 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> Message-ID: <37B46F1C.1A513F33@lemburg.com> Fred L. Drake, Jr. wrote: > > M.-A. Lemburg writes: > > Aside: Is the buffer interface reachable in any way from within > > Python ? Why isn't the interface exposed via __XXX__ methods > > on normal Python instances (could be implemented by returning a > > buffer object) ? > > Would it even make sense? I though a large part of the intent was > to for performance, avoiding memory copies.
Perhaps there should be > an .__as_buffer__() which returned an object that supports the C > buffer interface. I'm not sure how useful it would be; perhaps for > classes that represent image data? They could return a buffer object > created from a string/array/NumPy array. That's what I had in mind.

    def __getreadbuffer__(self):
        return buffer(self.data)

    def __getcharbuffer__(self):
        return buffer(self.string_data)

    def __getwritebuffer__(self):
        return buffer(self.mmaped_file)

Note that buffer() does not copy the data, it only adds a reference to the object being used. Hmm, how about adding a writeable binary object to the core ? This would be useful for the __getwritebuffer__() API because currently, I think, only mmap'ed files are useable as write buffers -- no other in-memory type. Perhaps buffer objects could be used for this purpose too, e.g. by having them allocate the needed memory chunk in case you pass None as object. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Fri Aug 13 23:48:12 1999 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 13 Aug 1999 23:48:12 +0200 Subject: [Python-Dev] Quick-and-dirty weak references Message-ID: <19990813214817.5393C1C4742@oratrix.oratrix.nl> This week again I was bitten by the fact that Python doesn't have any form of weak references, and while I was toying with some ideas I came up with the following quick-and-dirty scheme that I thought I'd bounce off this list. I might even volunteer to implement it, if people agree it is worth it:-) We add a new builtin function (or a module with that function) weak(). This returns a weak reference to the object passed as a parameter. A weak object has one method: strong(), which returns the corresponding real object or raises an exception if the object doesn't exist anymore.
For convenience we could add a method exists() that returns true if the real object still exists. Now comes the bit that I'm unsure about: to implement this I need to add a pointer to every object. This pointer is either NULL or points to the corresponding weak object (so for every object there is either no weak reference object or exactly one). But, for the price of 4 bytes extra in every object we get the nicety that there is little cpu-overhead: refcounting macros work identically to the way they do now, the only thing to take care of is that during object deallocation we have to zero the weak pointer. (actually: we could make do with a single bit in every object, with the bit meaning "this object has an associated weak object". We could then use a global dictionary indexed by object address to find the weak object) From mal at lemburg.com Sat Aug 14 01:15:39 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 01:15:39 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B4A71B.2073875F@lemburg.com> Greg Stein wrote: > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > maybe the unicode class shouldn't implement the > > buffer interface at all? sure looks like the best way > > It is needed for fp.write(unicodeobj) ... > > It is also very handy for C functions to deal with Unicode strings. Wouldn't a special C API be (even) more convenient ? > > to avoid trivial mistakes (the current behaviour of > > fp.write(unicodeobj) is even more serious than the > > marshal glitch...) > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > to the file. Are you suggesting that it will need to be done differently? > Icky. Would this also write some kind of Unicode encoding header ? [Sorry, this is my Unicode ignorance shining through... I only remember lots of talk about these things on the string-sig.] Since fp.write() uses "s#" this would use the getreadbuffer slot in 1.5.2...
I think what it *should* do is use the getcharbuffer slot instead (see my other post), since dumping the raw unicode data would lose too much information. Again, such things should be handled by extra methods, e.g. fp.rawwrite(). Hmm, I guess the philosophy behind the interface is not really clear. Binary data is fetched via getreadbuffer and then interpreted as character data... I always thought that the getcharbuffer should be used for such an interpretation. Or maybe, we should dump the getcharbuffer slot again and use the getreadbuffer information just as we would a void* pointer in C: with no explicit or implicit type information. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 140 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 14 10:53:04 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 01:53:04 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> Message-ID: <37B52E70.2D957546@lyra.org> M.-A. Lemburg wrote: > > Greg Stein wrote: > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > maybe the unicode class shouldn't implement the > > > buffer interface at all? sure looks like the best way > > > > It is needed for fp.write(unicodeobj) ... > > > > It is also very handy for C functions to deal with Unicode strings. > > Wouldn't a special C API be (even) more convenient ? Why? Accessing the Unicode values as a series of bytes matches exactly to the semantics of the buffer interface. Why throw in Yet Another Function? Your abstract.c functions make it quite simple. > > > to avoid trivial mistakes (the current behaviour of > > > fp.write(unicodeobj) is even more serious than the > > > marshal glitch...) > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > to the file.
Are you suggesting that it will need to be done differently? > > Icky. > > Would this also write some kind of Unicode encoding header ? > [Sorry, this is my Unicode ignorance shining through... I only > remember lots of talk about these things on the string-sig.] Absolutely not. Placing the Byte Order Mark (BOM) into an output stream is an application-level task. It should never be done by any subsystem. There are no other "encoding headers" that would go into the output stream. The output would simply be UTF-16 (2-byte values in host byte order). > Since fp.write() uses "s#" this would use the getreadbuffer > slot in 1.5.2... I think what it *should* do is use the > getcharbuffer slot instead (see my other post), since dumping > the raw unicode data would loose too much information. Again, I very much disagree. To me, fp.write() is not about writing characters to a stream. I think it makes much more sense as "writing bytes to a stream" and the buffer interface fits that perfectly. There is no loss of data. You could argue that the byte order is lost, but I think that is incorrect. The application defines the semantics: the file might be defined as using host-order, or the application may be writing a BOM at the head of the file. > such things should be handled by extra methods, e.g. fp.rawwrite(). I believe this would be a needless complication of the interface. > Hmm, I guess the philosophy behind the interface is not > really clear. I didn't design or implement it initially, but (as you may have guessed) I am a proponent of its existence. > Binary data is fetched via getreadbuffer and then > interpreted as character data... I always thought that the > getcharbuffer should be used for such an interpretation. The former is bad behavior. That is why getcharbuffer was added (by me, for 1.5.2). It was a preventative measure for the introduction of Unicode strings. Using getreadbuffer for characters would break badly given a Unicode string.
Therefore, "clients" that want (8-bit) characters from an object supporting the buffer interface should use getcharbuffer. The Unicode object doesn't implement it, implying that it cannot provide 8-bit characters. You can get the raw bytes thru getreadbuffer. > Or maybe, we should dump the getcharbufer slot again and > use the getreadbuffer information just as we would a > void* pointer in C: with no explicit or implicit type information. Nope. That path is fraught with failure :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Sat Aug 14 12:21:51 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 12:21:51 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <37B5433F.61CE6F76@lemburg.com> Jack Jansen wrote: > > This week again I was bitten by the fact that Python doesn't have any > form of weak references, and while I was toying with some ideas I came > up with the following quick-and-dirty scheme that I thought I'd bounce > off this list. I might even volunteer to implement it, if people agree > it is worth it:-) Have you checked the weak reference dictionary implementation by Dieter Maurer ? It's at: http://www.handshake.de/~dieter/weakdict.html While I like the idea of having weak references in the core, I think 4 extra bytes for *every* object is just a little too much. The flag bit idea (with the added global dictionary of weak referenced objects) looks promising though. BTW, how would this be done in JPython ? I guess it doesn't make much sense there because cycles are no problem for the Java VM GC. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Aug 14 14:30:45 1999 From: mal at lemburg.com (M.-A.
Lemburg) Date: Sat, 14 Aug 1999 14:30:45 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> Message-ID: <37B56175.23ABB350@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > > > > Greg Stein wrote: > > > > > > On Tue, 10 Aug 1999, Fredrik Lundh wrote: > > > > maybe the unicode class shouldn't implement the > > > > buffer interface at all? sure looks like the best way > > > > > > It is needed for fp.write(unicodeobj) ... > > > > > > It is also very handy for C functions to deal with Unicode strings. > > > > Wouldn't a special C API be (even) more convenient ? > > Why? Accessing the Unicode values as a series of bytes matches exactly > to the semantics of the buffer interface. Why throw in Yet Another > Function? I meant PyUnicode_* style APIs for dealing with all the aspects of Unicode objects -- much like the PyString_* APIs available. > Your abstract.c functions make it quite simple. BTW, do we need an extra set of those with buffer index or not ? Those would really be one-liners for the sake of hiding the type slots from applications. > > > > to avoid trivial mistakes (the current behaviour of > > > > fp.write(unicodeobj) is even more serious than the > > > > marshal glitch...) > > > > > > What's wrong with fp.write(unicodeobj)? It should write the unicode value > > > to the file. Are you suggesting that it will need to be done differently? > > > Icky. > > > > Would this also write some kind of Unicode encoding header ? > > [Sorry, this is my Unicode ignorance shining through... I only > > remember lots of talk about these things on the string-sig.] > > Absolutely not. Placing the Byte Order Mark (BOM) into an output stream > is an application-level task. It should never by done by any subsystem. > > There are no other "encoding headers" that would go into the output > stream. The output would simply be UTF-16 (2-byte values in host byte > order). Ok. 
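Greg's point about the BOM being an application-level concern can be illustrated with present-day Python, where the codecs module keeps the mark separate from the raw code units (a modern sketch using today's stdlib, not anything proposed in this thread):

```python
import codecs

# The raw UTF-16 code units and the Byte Order Mark are separate
# things; whether to emit the BOM is the application's decision,
# which is the division of labour argued for above.
raw = "abc".encode("utf-16-le")           # raw 2-byte units, no BOM
with_bom = codecs.BOM_UTF16_LE + raw      # application prepends the BOM

assert raw == b"a\x00b\x00c\x00"
assert with_bom.decode("utf-16") == "abc" # decoder consumes the BOM
```

With the explicit-endian codec the subsystem writes only the code units, exactly the "no encoding headers" behaviour Greg describes.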
> > Since fp.write() uses "s#" this would use the getreadbuffer > > slot in 1.5.2... I think what it *should* do is use the > > getcharbuffer slot instead (see my other post), since dumping > > the raw unicode data would loose too much information. Again, > > I very much disagree. To me, fp.write() is not about writing characters > to a stream. I think it makes much more sense as "writing bytes to a > stream" and the buffer interface fits that perfectly. This is perfectly ok, but shouldn't the behaviour of fp.write() mimic that of previous Python versions ? How does JPython write the data ? Inlined different subject: I think the internal semantics of "s#" using the getreadbuffer slot and "t#" the getcharbuffer slot should be switched; see my other post. In previous Python versions "s#" had the semantics of string data with possibly embedded NULL bytes. Now it suddenly has the meaning of binary data and you can't simply change extensions to use the new "t#" because people are still using them with older Python versions. > There is no loss of data. You could argue that the byte order is lost, > but I think that is incorrect. The application defines the semantics: > the file might be defined as using host-order, or the application may be > writing a BOM at the head of the file. The problem here is that many applications were not written to handle these kinds of objects. Previously they could only handle strings, now they can suddenly handle any object having the buffer interface and then fail when the data gets read back in. > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > I believe this would be a needless complication of the interface. It would clarify things and make the interface 100% backward compatible again. > > Hmm, I guess the philosophy behind the interface is not > really clear. > > I didn't design or implement it initially, but (as you may have guessed) > I am a proponent of its existence.
> > > Binary data is fetched via getreadbuffer and then > > interpreted as character data... I always thought that the > > getcharbuffer should be used for such an interpretation. > > The former is bad behavior. That is why getcharbuffer was added (by me, > for 1.5.2). It was a preventative measure for the introduction of > Unicode strings. Using getreadbuffer for characters would break badly > given a Unicode string. Therefore, "clients" that want (8-bit) > characters from an object supporting the buffer interface should use > getcharbuffer. The Unicode object doesn't implement it, implying that it > cannot provide 8-bit characters. You can get the raw bytes thru > getreadbuffer. I agree 100%, but did you add the "t#" instead of having "s#" use the getcharbuffer interface ? E.g. my mxTextTools package uses "s#" on many APIs. Now someone could stick in a Unicode object and get pretty strange results without any notice about mxTextTools and Unicode being incompatible. You could argue that I change to "t#", but that doesn't work since many people out there still use Python versions <1.5.2 and those didn't have "t#", so mxTextTools would then fail completely for them. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sat Aug 14 13:34:17 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 04:34:17 -0700 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <37B55439.683272D2@lyra.org> M.-A. Lemburg wrote: >... > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. Sure, these could be added as necessary. 
For raw access to the bytes, I would refer people to the abstract buffer functions, tho. > > Your abstract.c functions make it quite simple. > > BTW, do we need an extra set of those with buffer index or not ? > Those would really be one-liners for the sake of hiding the > type slots from applications. It sounds like NumPy and PIL would need it, which makes the landscape quite a bit different from the last time we discussed this (when we didn't imagine anybody needing those). >... > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > slot in 1.5.2... I think what it *should* do is use the > > > getcharbuffer slot instead (see my other post), since dumping > > > the raw unicode data would loose too much information. Again, > > > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? fp.write() had no semantics for writing Unicode objects since they didn't exist. Therefore, we are not breaking or changing any behavior. > Inlined different subject: > I think the internal semantics of "s#" using the getreadbuffer slot > and "t#" the getcharbuffer slot should be switched; see my other post. 1) Too late 2) The use of "t#" ("text") for the getcharbuffer slot was decided by the Benevolent Dictator. 3) see (2) > In previous Python versions "s#" had the semantics of string data > with possibly embedded NULL bytes. Now it suddenly has the meaning > of binary data and you can't simply change extensions to use the > new "t#" because people are still using them with older Python > versions. Guido and I had a pretty long discussion on what the best approach here was. I think we even pulled in Tim as a final arbiter, as I recall. 
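The boundary being negotiated here is the one modern Python eventually settled on: byte-oriented objects expose their raw bytes, while Unicode strings refuse to until an encoding is chosen. A small illustration in today's Python (modern semantics standing in for the getreadbuffer/getcharbuffer split, not 1999 behaviour):

```python
# Raw-byte access works for byte-oriented objects but is refused for
# Unicode strings -- the same split the "s#"/"t#" design was reaching
# for: raw bytes versus declared character data.
ok = memoryview(b"abc")          # bytes: buffer interface available
assert ok.nbytes == 3

try:
    memoryview("abc")            # str: no raw-byte view without encoding
    raise AssertionError("str unexpectedly exposed a buffer")
except TypeError:
    pass
```

In other words, the "exceptions in code that didn't raise exceptions beforehand" concern is precisely what a text type that refuses the raw-bytes interface produces.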
I believe "s#" remained getreadbuffer simply because it *also* meant "give me the bytes of that object". If it changed to getcharbuffer, then you could see exceptions in code that didn't raise exceptions beforehand. (more below) > > There is no loss of data. You could argue that the byte order is lost, > > but I think that is incorrect. The application defines the semantics: > > the file might be defined as using host-order, or the application may be > > writing a BOM at the head of the file. > > The problem here is that many application were not written > to handle these kind of objects. Previously they could only > handle strings, now they can suddenly handle any object > having the buffer interface and then fail when the data > gets read back in. An application is a complete unit. How are you suddenly going to manifest Unicode objects within that application? The only way is if the developer goes in and changes things; let them deal with the issues and fallout of their change. The other is external changes such as an upgrade to the interpreter or a module. Again, (IMO) if you're perturbing a system, then you are responsible for also correcting any problems you introduce. In any case, Guido's position was that things can easily switch over to the "t#" interface to prevent the class of error where you pass a Unicode string to a function that expects a standard string. > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > I believe this would be a needless complication of the interface. > > It would clarify things and make the interface 100% backward > compatible again. No. "s#" used to pull bytes from any buffer-capable object. Your suggestion for "s#" to use the getcharbuffer could introduce exceptions into currently-working code. (this was probably Guido's prime motivation for the current meaning of "t#"...
I can dig up the mail thread if people need an authoritative commentary on the decision that was made) > > > Hmm, I guess the philosophy behind the interface is not > > > really clear. > > > > I didn't design or implement it initially, but (as you may have guessed) > > I am a proponent of its existence. > > > > > Binary data is fetched via getreadbuffer and then > > > interpreted as character data... I always thought that the > > > getcharbuffer should be used for such an interpretation. > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > for 1.5.2). It was a preventative measure for the introduction of > > Unicode strings. Using getreadbuffer for characters would break badly > > given a Unicode string. Therefore, "clients" that want (8-bit) > > characters from an object supporting the buffer interface should use > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > cannot provide 8-bit characters. You can get the raw bytes thru > > getreadbuffer. > > I agree 100%, but did you add the "t#" instead of having > "s#" use the getcharbuffer interface ? Yes. For reasons detailed above. > E.g. my mxTextTools > package uses "s#" on many APIs. Now someone could stick > in a Unicode object and get pretty strange results without > any notice about mxTextTools and Unicode being incompatible. They could also stick in an array of integers. That supports the buffer interface, meaning the "s#" in your code would extract the bytes from it. In other words, people can already stick bogus stuff into your code. This seems to be a moot argument. > You could argue that I change to "t#", but that doesn't > work since many people out there still use Python versions > <1.5.2 and those didn't have "t#", so mxTextTools would then > fail completely for them. If support for the older versions is needed, then use an #ifdef to set up the appropriate macro in some header. Use that throughout your code. 
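Greg's array-of-integers point translates directly to modern Python: any buffer-capable object hands out its raw machine bytes, which is exactly what an "s#" consumer received (memoryview below stands in for the old getreadbuffer slot; the snippet is illustrative, not 1.5.x code):

```python
import array

# An array of integers supports the buffer interface, so "bogus"
# non-string data would flow straight into an "s#"-style consumer
# as raw bytes rather than raising an error.
ints = array.array("i", [65, 66])
raw = bytes(memoryview(ints))    # what an "s#" argument would receive

assert raw == ints.tobytes()     # raw machine bytes, not characters
assert len(raw) == 2 * ints.itemsize
```

This is why the objection is "moot": the permissive raw-bytes behaviour predates Unicode objects entirely.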
In any case: yes -- I would argue that you should absolutely be using "t#". Cheers, -g -- Greg Stein, http://www.lyra.org/ From fredrik at pythonware.com Sat Aug 14 15:19:07 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 15:19:07 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> Message-ID: <003101bee657$972d1550$f29b12c2@secret.pythonware.com> M.-A. Lemburg wrote: > I meant PyUnicode_* style APIs for dealing with all the aspects > of Unicode objects -- much like the PyString_* APIs available. it's already there, of course. see unicode.h in the unicode distribution (Mark is hopefully adding this to 1.6 in this very moment...) > > I very much disagree. To me, fp.write() is not about writing characters > > to a stream. I think it makes much more sense as "writing bytes to a > > stream" and the buffer interface fits that perfectly. > > This is perfectly ok, but shouldn't the behaviour of fp.write() > mimic that of previous Python versions ? How does JPython > write the data ? the crucial point is how an average user expects things to work. the current design is quite asymmetric -- you can easily *write* things that implement the buffer interface to a stream, but how the heck do you get them back? (as illustrated by the marshal buglet...) From fredrik at pythonware.com Sat Aug 14 17:21:48 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 14 Aug 1999 17:21:48 +0200 Subject: [Python-Dev] buffer design (was: marshal (was:Buffer interface in abstract.c?)) References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Greg Stein wrote: > > E.g. my mxTextTools > > package uses "s#" on many APIs.
Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Except that people may expect unicode strings to work just like any other kind of string, while arrays are surely a different thing. I'm beginning to suspect that the current buffer design is partially broken; it tries to work around at least two problems at once: a) the current use of "string" objects for two purposes: as strings of 8-bit characters, and as buffers containing arbitrary binary data. b) performance issues when reading/writing certain kinds of data to/from streams. and fails to fully address either of them. From mal at lemburg.com Sat Aug 14 18:30:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:30:21 +0200 Subject: [Python-Dev] Re: buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> Message-ID: <37B5999D.201EA88C@lemburg.com> Greg Stein wrote: > > M.-A. Lemburg wrote: > >... > > I meant PyUnicode_* style APIs for dealing with all the aspects > > of Unicode objects -- much like the PyString_* APIs available. > > Sure, these could be added as necessary. For raw access to the bytes, I > would refer people to the abstract buffer functions, tho. I guess that's up to them... PyUnicode_AS_WCHAR() could also be exposed I guess (are C's wchar strings useable as Unicode basis ?). > > > Your abstract.c functions make it quite simple. > > > > BTW, do we need an extra set of those with buffer index or not ? > > Those would really be one-liners for the sake of hiding the > > type slots from applications. 
> > It sounds like NumPy and PIL would need it, which makes the landscape > quite a bit different from the last time we discussed this (when we > didn't imagine anybody needing those). Ok, then I'll add them and post the new set next week. > >... > > > > Since fp.write() uses "s#" this would use the getreadbuffer > > > > slot in 1.5.2... I think what it *should* do is use the > > > > getcharbuffer slot instead (see my other post), since dumping > > > > the raw unicode data would loose too much information. Again, > > > > > > I very much disagree. To me, fp.write() is not about writing characters > > > to a stream. I think it makes much more sense as "writing bytes to a > > > stream" and the buffer interface fits that perfectly. > > > > This is perfectly ok, but shouldn't the behaviour of fp.write() > > mimic that of previous Python versions ? How does JPython > > write the data ? > > fp.write() had no semantics for writing Unicode objects since they > didn't exist. Therefore, we are not breaking or changing any behavior. The problem is hidden in polymorphic functions and tools: previously they could not handle anything but strings, now they also work on arbitrary buffers without raising exceptions. That's what I'm concerned about. > > Inlined different subject: > > I think the internal semantics of "s#" using the getreadbuffer slot > > and "t#" the getcharbuffer slot should be switched; see my other post. > > 1) Too late > 2) The use of "t#" ("text") for the getcharbuffer slot was decided by > the Benevolent Dictator. > 3) see (2) 1) It's not too late: most people aren't even aware of the buffer interface (except maybe the small crowd on this list). 2) A mistake in a patchlevel release of Python can easily be undone in the next minor release. No big deal. 3) To remain compatible with 1.5.2 even in future revisions, a new explicit marker, e.g. "r#" for raw data, could be added to hold the code for getreadbuffer. "s#" and "z#" should then switch to using getcharbuffer.
> > In previous Python versions "s#" had the semantics of string data > > with possibly embedded NULL bytes. Now it suddenly has the meaning > > of binary data and you can't simply change extensions to use the > > new "t#" because people are still using them with older Python > > versions. > > Guido and I had a pretty long discussion on what the best approach here > was. I think we even pulled in Tim as a final arbiter, as I recall. What was the final argument then ? (I guess the discussion was held *before* the addition of getcharbuffer, right ?) > I believe "s#" remained getreadbuffer simply because it *also* meant > "give me the bytes of that object". If it changed to getcharbuffer, then > you could see exceptions in code that didn't raise exceptions > beforehand. > > (more below) "s#" historically always meant "give me char* data with length". It did not mean: "give me a pointer to the data area and its length". That interpretation is new in 1.5.2. Even integers and lists could provide buffer access with the new interpretation... (sounds evil ;-) > > > There is no loss of data. You could argue that the byte order is lost, > > > but I think that is incorrect. The application defines the semantics: > > > the file might be defined as using host-order, or the application may be > > > writing a BOM at the head of the file. > > > > The problem here is that many applications were not written > > to handle these kinds of objects. Previously they could only > > handle strings, now they can suddenly handle any object > > having the buffer interface and then fail when the data > > gets read back in. > > An application is a complete unit. How are you suddenly going to > manifest Unicode objects within that application? The only way is if the > developer goes in and changes things; let them deal with the issues and > fallout of their change. The other is external changes such as an > upgrade to the interpreter or a module.
Again, (IMO) if you're > perturbing a system, then you are responsible for also correcting any > problems you introduce. Well, ok, if you're talking about standalone apps. I was referring to applications which interact with other applications, e.g. via files or sockets. You could pass a Unicode obj to a socket and have it transfer the data to the other end without getting an exception on the sending part of the connection. The receiver would read the data as string and most probably fail. The whole application sitting in between and dealing with the protocol and connection management wouldn't even notice that you've just tried to extend its capabilities. > In any case, Guido's position was that things can easily switch over to > the "t#" interface to prevent the class of error where you pass a > Unicode string to a function that expects a standard string. Strange, why should code that relies on 8-bit character data be changed because a new unsupported object type pops up ? Code supporting the new type will have to be rewritten anyway, but why break existing extensions in unpredicted ways ? > > > > such things should be handled by extra methods, e.g. fp.rawwrite(). > > > > > > I believe this would be a needless complication of the interface. > > > > It would clarify things and make the interface 100% backward > > compatible again. > > No. "s#" used to pull bytes from any buffer-capable object. Your > suggestion for "s#" to use the getcharbuffer could introduce exceptions > into currently-working code. The buffer objects were introduced in 1.5.1, AFAIR. Changing the semantics back to the original ones would only break extensions relying on the behaviour you describe -- the distribution can easily be adapted to use some other marker, such as "r#". > (this was probably Guido's prime motivation for the current meaning of > "t#"...
I can dig up the mail thread if people need an authoritative > commentary on the decision that was made) > > > > > Hmm, I guess the philosophy behind the interface is not > > > > really clear. > > > > > > I didn't design or implement it initially, but (as you may have guessed) > > > I am a proponent of its existence. > > > > > > > Binary data is fetched via getreadbuffer and then > > > > interpreted as character data... I always thought that the > > > > getcharbuffer should be used for such an interpretation. > > > > > > The former is bad behavior. That is why getcharbuffer was added (by me, > > > for 1.5.2). It was a preventative measure for the introduction of > > > Unicode strings. Using getreadbuffer for characters would break badly > > > given a Unicode string. Therefore, "clients" that want (8-bit) > > > characters from an object supporting the buffer interface should use > > > getcharbuffer. The Unicode object doesn't implement it, implying that it > > > cannot provide 8-bit characters. You can get the raw bytes thru > > > getreadbuffer. > > > > I agree 100%, but did you add the "t#" instead of having > > "s#" use the getcharbuffer interface ? > > Yes. For reasons detailed above. > > > E.g. my mxTextTools > > package uses "s#" on many APIs. Now someone could stick > > in a Unicode object and get pretty strange results without > > any notice about mxTextTools and Unicode being incompatible. > > They could also stick in an array of integers. That supports the buffer > interface, meaning the "s#" in your code would extract the bytes from > it. In other words, people can already stick bogus stuff into your code. Right now they can with 1.5.1 and 1.5.2 which is unfortunate. I'd rather have the parsing function raise an exception. > This seems to be a moot argument. Not really when you have to support extensions across three different patchlevels of Python. 
> > You could argue that I change to "t#", but that doesn't > > work since many people out there still use Python versions > > <1.5.2 and those didn't have "t#", so mxTextTools would then > > fail completely for them. > > If support for the older versions is needed, then use an #ifdef to set > up the appropriate macro in some header. Use that throughout your code. > > In any case: yes -- I would argue that you should absolutely be using > "t#". I can easily change my code, no big deal, but what about the dozens of other extensions I don't want to bother diving into ? I'd rather see an exception than complete garbage written to a file or a socket. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Aug 14 18:53:45 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 14 Aug 1999 18:53:45 +0200 Subject: [Python-Dev] buffer design References: <37B4A71B.2073875F@lemburg.com> <37B52E70.2D957546@lyra.org> <37B56175.23ABB350@lemburg.com> <37B55439.683272D2@lyra.org> <004201bee668$ba6e9870$f29b12c2@secret.pythonware.com> Message-ID: <37B59F19.45C1D23B@lemburg.com> Fredrik Lundh wrote: > > Greg Stein wrote: > > > E.g. my mxTextTools > > > package uses "s#" on many APIs. Now someone could stick > > > in a Unicode object and get pretty strange results without > > > any notice about mxTextTools and Unicode being incompatible. > > > > They could also stick in an array of integers. That supports the buffer > > interface, meaning the "s#" in your code would extract the bytes from > > it. In other words, people can already stick bogus stuff into your code. > > Except that people may expect unicode strings > to work just like any other kind of string, while > arrays are surely a different thing.
> > I'm beginning to suspect that the current buffer > design is partially broken; it tries to work around > at least two problems at once: > > a) the current use of "string" objects for two purposes: > as strings of 8-bit characters, and as buffers containing > arbitrary binary data. > > b) performance issues when reading/writing certain kinds > of data to/from streams. > > and fails to fully address either of them. True, a higher level interface for those two objectives would certainly address them much better than what we are trying to do at bit level. Buffers should probably only be treated as pointers to abstract memory areas and nothing more. BTW, what about my suggestion to extend buffers to also allocate memory (in case you pass None as object) ? Or should array be used for that purpose (its an undocumented feature of arrays) ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 139 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Sun Aug 15 04:59:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 19:59:25 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> Message-ID: <37B62D0D.6EC24240@lyra.org> Fredrik Lundh wrote: >... > besides, what about buffers and threads? if you > return a pointer from getreadbuf, wouldn't it be > good to know exactly when Python doesn't need > that pointer any more? 
explicit initbuffer/exitbuffer > calls around each sequence of buffer operations > would make that a lot safer... This is a pretty obvious one, I think: it lasts only as long as the object. PyString_AS_STRING is similar. Nothing new or funny here. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 05:09:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sat, 14 Aug 1999 20:09:19 -0700 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> Message-ID: <37B62F5E.30C62070@lyra.org> M.-A. Lemburg wrote: > > Fred L. Drake, Jr. wrote: > > > > M.-A. Lemburg writes: > > > Aside: Is the buffer interface reachable in any way from within > > > Python ? Why isn't the interface exposed via __XXX__ methods > > > on normal Python instances (could be implemented by returning a > > > buffer object) ? > > > > Would it even make sense? I though a large part of the intent was > > to for performance, avoiding memory copies. Perhaps there should be > > an .__as_buffer__() which returned an object that supports the C > > buffer interface. I'm not sure how useful it would be; perhaps for > > classes that represent image data? They could return a buffer object > > created from a string/array/NumPy array. There is no way to do this. The buffer interface only returns pointers to memory. There would be no place to return an intermediary object, nor a way to retain the reference for it. For example, your class instance quickly sets up a PyBufferObject with the relevant data and returns that. The underlying C code must now hold that reference *and* return a pointer to the calling code. Impossible. Fredrik's open/close concept for buffer accesses would make this possible, as long as clients are aware that any returned pointer is valid only until the buffer_close call. 
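With hindsight, the explicit bracketing Fredrik asks for is essentially what later buffer designs adopted: today's memoryview (a much later addition to the language, well after this thread) acquires a pointer on creation and invalidates it on an explicit release() call. A rough modern sketch of the open/close idea under discussion:

```python
data = bytearray(b"hello world")

view = memoryview(data)   # "buffer_open": a pointer into data's storage is acquired
assert bytes(view[0:5]) == b"hello"

view.release()            # "buffer_close": the pointer is explicitly dropped here
try:
    view.tobytes()        # later access is refused rather than left dangling
    refused = False
except ValueError:
    refused = True
assert refused
```

memoryview also works as a context manager (`with memoryview(data) as view: ...`), which gives exactly the paired open/close calls proposed here.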
The context argument he proposes would hold the object reference. Having class instances respond to the buffer interface is interesting, but until more code attempts to *use* the interface, I'm not quite sure of the utility... >... > Hmm, how about adding a writeable binary object to the core ? > This would be useful for the __getwritebuffer__() API because > currently, I think, only mmap'ed files are useable as write > buffers -- no other in-memory type. Perhaps buffer objects > could be used for this purpose too, e.g. by having them > allocate the needed memory chunk in case you pass None as > object. Yes, this would be very good. I would recommend that you pass an integer, however, rather than None. You need to tell it the size of the buffer to allocate. Since buffer(5) has no meaning at the moment, altering the semantics to include this form would not be a problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 08:10:59 1999 From: da at ski.org (David Ascher) Date: Sat, 14 Aug 1999 23:10:59 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B62F5E.30C62070@lyra.org> Message-ID: On Sat, 14 Aug 1999, Greg Stein wrote: > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, here's an example from my work today. Maybe someone can suggest an alternative that I haven't seen. I'm using buffer objects to pass pointers to structs back and forth between Python and Windows (Win32's GUI scheme involves sending messages to functions with, oftentimes, addresses of structs as arguments, and expect the called function to modify the struct directly -- similarly, I must call Win32 functions w/ pointers to memory that Windows will modify, and be able to read the modified memory).
With 'raw' buffer object manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to Python), this works fine [*]. So far, no instances. I also have a class which allows the user to describe the buffer memory layout in a natural way given the C struct, and manipulate the buffer layout w/ getattr/setattr. For example:

class Win32MenuItemStruct(AutoStruct):
    #
    # for each slot, specify type (maps to a struct.pack specifier),
    # name (for setattr/getattr behavior) and optional defaults.
    #
    table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct),
             (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID),
             (UINT, 'fType', MFT_STRING),
             (UINT, 'fState', MFS_ENABLED),
             (UINT, 'wID', None),
             (HANDLE, 'hSubMenu', 0),
             (HANDLE, 'hbmpChecked', 0),
             (HANDLE, 'hbmpUnchecked', 0),
             (DWORD, 'dwItemData', 0),
             (LPSTR, 'name', None),
             (UINT, 'cch', 0)]

AutoStruct has machinery which allows setting of buffer slices by slot name, conversion of numeric types, etc. This is working well. The only hitch is that to send the buffer to the SWIG'ed function call, I have three options, none ideal:

1) define a __str__ method which makes a string of the buffer and pass that to the function which expects an "s#" argument. This sends a copy of the data, not the address. As a result, this works well for structs which I create from scratch as long as I don't need to see any changes that Windows might have performed on the memory.

2) send the instance but make up my own 'get-the-instance-as-buffer' API -- complicates extension module code.

3) send the buffer attribute of the instance instead of the instance -- complicates Python code, and the C code isn't trivial because there is no 'buffer' typecode for PyArg_ParseTuple().

If I could define an

    def __aswritebuffer__

and if there was a PyArg_ParseTuple() typecode associated with read/write buffers (I nominate 'w'!), I believe things would be simpler -- I could then send the instance, specify in the PyArg_ParseTuple that I want a pointer to memory, and I'd be golden.
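For readers wanting to see the shape of the idea, here is a minimal sketch of an AutoStruct-like class built on the struct module. All names and the example struct are invented for illustration; this is not David's actual code, which handled buffer slices, Win32 types, and more:

```python
import struct

class AutoStruct:
    # subclasses define: table = [(format_code, field_name, default), ...]
    table = []

    def __init__(self, **fields):
        self._values = {name: default for _, name, default in self.table}
        self._values.update(fields)

    @property
    def _format(self):
        # concatenate the slot format codes in table order, little-endian
        return "<" + "".join(fmt for fmt, _, _ in self.table)

    def __setattr__(self, name, value):
        if name.startswith("_"):
            object.__setattr__(self, name, value)
        else:
            self._values[name] = value      # field writes go into the table

    def __getattr__(self, name):
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name)

    def pack(self):
        # lay the slots out contiguously, as a C struct would
        return struct.pack(self._format,
                           *(self._values[name] for _, name, _ in self.table))

class Point(AutoStruct):                    # invented example, not a Win32 struct
    table = [("i", "x", 0), ("i", "y", 0)]

p = Point(x=3)
p.y = 4
assert p.pack() == struct.pack("<ii", 3, 4)
```

The real class wrote fields into a buffer slice-by-slice rather than repacking the whole struct on each access, but the table-driven layout is the same idea.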
What did I miss? --david [*] I feel naughty modifying random bits of memory from Python, but Bill Gates made me do it! From mal at lemburg.com Sun Aug 15 10:47:00 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:47:00 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: <37B45577.7772CAA1@lemburg.com> <14260.15000.398399.840716@weyr.cnri.reston.va.us> <37B46F1C.1A513F33@lemburg.com> <37B62F5E.30C62070@lyra.org> Message-ID: <37B67E84.6BBC8136@lemburg.com> Greg Stein wrote: > > [me suggesting new __XXX__ methods on Python instances to provide > the buffer slots to Python programmers] > > Having class instances respond to the buffer interface is interesting, > but until more code attempts to *use* the interface, I'm not quite sure > of the utility... Well, there already is lots of code supporting the interface, e.g. fp.write(), socket.write() etc. Basically all streaming interfaces I guess. So these APIs could be used to "write" the object directly into a file. > >... > > Hmm, how about adding a writeable binary object to the core ? > > This would be useful for the __getwritebbuffer__() API because > > currently, I think, only mmap'ed files are useable as write > > buffers -- no other in-memory type. Perhaps buffer objects > > could be used for this purpose too, e.g. by having them > > allocate the needed memory chunk in case you pass None as > > object. > > Yes, this would be very good. I would recommend that you pass an > integer, however, rather than None. You need to tell it the size of the > buffer to allocate. Since buffer(5) has no meaning at the moment, > altering the semantics to include this form would not be a problem. I was thinking of using the existing buffer(object,offset,size) constructor... that's why I took None as object. offset would then always be 0 and size gives the size of the memory chunk to allocate. 
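For modern readers: the writeable in-memory object being designed in this exchange eventually arrived in the language as bytearray, with memoryview as the wrapper type, so the buffer(None, 0, size) / buffer(size) idea can be approximated today like this (both types postdate this thread):

```python
import struct

size = 12
chunk = bytearray(size)      # allocate a zero-filled, writable memory chunk
view = memoryview(chunk)     # buffer-style wrapper over that same memory

# in-place writes through the view, as a C extension would do via the pointer:
view[0:4] = struct.pack("<I", 0xDEADBEEF)
view[4:8] = struct.pack("<I", 42)

assert struct.unpack_from("<I", chunk, 4)[0] == 42
assert not view.readonly                   # writable, unlike a view of bytes
assert view.tobytes()[8:] == b"\x00" * 4   # the tail is still zero-filled
```

mv.readonly and mv.tobytes() also correspond roughly to the .writeable and .asstring() extensions proposed a little later in this thread.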
Of course, buffer(size) would look nicer, but it seems a rather peculiar interface definition to say: ok, if you pass a real Python integer, we'll take that as size. Who knows, maybe at some point in the future, you want to "write" integers via the buffer interface too... then you'd probably also want to write None... so how about a new builtin writebuffer(size) ? Also, I think it would make sense to extend buffers to have methods and attributes:

  .writeable  - attribute that tells whether the buffer is writeable
  .chardata   - true iff the getcharbuffer slot is available
  .asstring() - return the buffer as Python string object

-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sun Aug 15 10:59:21 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 15 Aug 1999 10:59:21 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) References: Message-ID: <37B68169.73E03C84@lemburg.com> David Ascher wrote: > > On Sat, 14 Aug 1999, Greg Stein wrote: > > > Having class instances respond to the buffer interface is interesting, > > but until more code attempts to *use* the interface, I'm not quite sure > > of the utility... > > Well, here's an example from my work today. Maybe someone can suggest an > alternative that I haven't seen. > > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances.
So that's why you were suggesting that struct.pack returns a buffer rather than a string ;-) Actually, I think you could use arrays to do the trick right now, because they are writeable (unlike strings). Until creating writeable buffer objects becomes possible that is... > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. For example: > > class Win32MenuItemStruct(AutoStruct): > # > # for each slot, specify type (maps to a struct.pack specifier), > # name (for setattr/getattr behavior) and optional defaults. > # > table = [(UINT, 'cbSize', AutoStruct.sizeOfStruct), > (UINT, 'fMask', MIIM_STRING | MIIM_TYPE | MIIM_ID), > (UINT, 'fType', MFT_STRING), > (UINT, 'fState', MFS_ENABLED), > (UINT, 'wID', None), > (HANDLE, 'hSubMenu', 0), > (HANDLE, 'hbmpChecked', 0), > (HANDLE, 'hbmpUnchecked', 0), > (DWORD, 'dwItemData', 0), > (LPSTR, 'name', None), > (UINT, 'cch', 0)] > > AutoStruct has machinery which allows setting of buffer slices by slot > name, conversion of numeric types, etc. This is working well. > > The only hitch is that to send the buffer to the SWIG'ed function call, I > have three options, none ideal: > > 1) define a __str__ method which makes a string of the buffer and pass > that to the function which expects an "s#" argument. This send > a copy of the data, not the address. As a result, this works > well for structs which I create from scratch as long as I don't need > to see any changes that Windows might have performed on the memory. > > 2) send the instance but make up my own 'get-the-instance-as-buffer' > API -- complicates extension module code. > > 3) send the buffer attribute of the instance instead of the instance -- > complicates Python code, and the C code isn't trivial because there > is no 'buffer' typecode for PyArg_ParseTuple(). 
> > If I could define an > > def __aswritebuffer__ > > and if there was a PyArg_ParseTuple() typecode associated with read/write > buffers (I nominate 'w'!), I believe things would be simpler -- I could > then send the instance, specify in the PyArgParse_Tuple that I want a > pointer to memory, and I'd be golden. > > What did I miss? Just a naming thingie: __getwritebuffer__ et al. would map to the C interfaces more directly. The new typecode "w#" for writeable buffer style objects is a good idea (it should only work on single segment buffers). -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Sun Aug 15 12:32:59 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 15 Aug 1999 12:32:59 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> Message-ID: <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> > Fredrik Lundh wrote: > >... > > besides, what about buffers and threads? if you > > return a pointer from getreadbuf, wouldn't it be > > good to know exactly when Python doesn't need > > that pointer any more? explicit initbuffer/exitbuffer > > calls around each sequence of buffer operations > > would make that a lot safer... > > This is a pretty obvious one, I think: it lasts only as long as the > object. PyString_AS_STRING is similar. Nothing new or funny here. 
well, I think the buffer behaviour is both new and pretty funny:

    from array import array

    a = array("f", [0]*8192)

    b = buffer(a)

    for i in range(1000):
        a.append(1234)

    print b

in other words, the buffer interface should be redesigned, or removed. (though I'm sure AOL would find some interesting use for this ;-) "Confusing? Yes, but this is a lot better than allowing arbitrary pointers!" -- GvR on assignment operators, November 91 From da at ski.org Sun Aug 15 18:54:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: <37B68169.73E03C84@lemburg.com> Message-ID: On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > Actually, I think you could use arrays to do the trick right now, > because they are writeable (unlike strings). Until creating > writeable buffer objects becomes possible that is... No, because I can't make an array around existing memory which Win32 allocates before I get to it. > Just a naming thingie: __getwritebuffer__ et al. would map to the > C interfaces more directly. Whatever. > The new typecode "w#" for writeable buffer style objects is a good idea > (it should only work on single segment buffers). Indeed. --david From gstein at lyra.org Sun Aug 15 22:27:57 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:27:57 -0700 Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: Message-ID: <37B722CD.383A2A9E@lyra.org> David Ascher wrote: > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > ... > > The new typecode "w#" for writeable buffer style objects is a good idea > > (it should only work on single segment buffers). > > Indeed. I just borrowed Guido's time machine. That typecode is already in 1.5.2.
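The failure mode Fredrik demonstrates above -- a buffer still pointing into storage that the array has since reallocated -- is exactly the hole that later buffer designs closed: while a view is exported, the exporting object must refuse to resize. A sketch with today's stand-ins for buffer/array (memoryview/bytearray, both later additions to the language):

```python
ba = bytearray(b"x" * 8192)
view = memoryview(ba)        # exports a pointer into ba's current storage

try:
    ba.append(0)             # would reallocate underneath the live view
    resized = True
except BufferError:          # the exporter refuses instead of dangling
    resized = False
assert not resized

view.release()               # once the view is released, resizing works again
ba.append(0)
assert len(ba) == 8193
```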
:-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sun Aug 15 22:35:25 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:35:25 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B7248D.31E5D2BF@lyra.org> Fredrik Lundh wrote: >... > well, I think the buffer behaviour is both > new and pretty funny: I think the buffer interface was introduced in 1.5 (by Jack?). I added the 8-bit character buffer slot and buffer objects in 1.5.2. > from array import array > > a = array("f", [0]*8192) > > b = buffer(a) > > for i in range(1000): > a.append(1234) > > print b > > in other words, the buffer interface should > be redesigned, or removed. I don't understand what you believe is weird here. Also, are you saying the buffer *interface* is weird, or the buffer *object* ? thx, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 22:49:23 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 13:49:23 -0700 (Pacific Daylight Time) Subject: [Python-Dev] w# typecode (was: marshal (was:Buffer interface in abstract.c? )) In-Reply-To: <37B722CD.383A2A9E@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. 
That typecode is already in 1.5.2. Ha. Cool. --da From gstein at lyra.org Sun Aug 15 22:53:51 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 13:53:51 -0700 Subject: [Python-Dev] instances as buffers References: Message-ID: <37B728DF.2CA2A20A@lyra.org> David Ascher wrote: >... > I'm using buffer objects to pass pointers to structs back and forth > between Python and Windows (Win32's GUI scheme involves sending messages > to functions with, oftentimes, addresses of structs as arguments, and > expect the called function to modify the struct directly -- similarly, I > must call Win32 functions w/ pointers to memory that Windows will modify, > and be able to read the modified memory). With 'raw' buffer object > manipulation (after exposing the PyBuffer_FromMemoryReadWrite call to > Python), this works fine [*]. So far, no instances. How do you manage the lifetimes of the memory and objects? PyBuffer_FromReadWriteMemory() creates a buffer object that points to memory. You need to ensure that the memory exists as long as the buffer does. Would it make more sense to use PyBuffer_New(size)? Note: PyBuffer_FromMemory() (read-only) was built primarily for the case where you have static constants in an extension module (strings, code objects, etc) and want to expose them to Python without copying them into the heap. Currently, stuff like this must be copied into a dynamic string object to be exposed to Python. The PyBuffer_FromReadWriteMemory() is there for symmetry, but it can be very dangerous to use because of the lifetime problem. PyBuffer_New() allocates its own memory, so the lifetimes are managed properly. PyBuffer_From*Object maintains a reference to the target object so that the target object can be kept around at least as long as the buffer. > I also have a class which allows the user to describe the buffer memory > layout in a natural way given the C struct, and manipulate the buffer > layout w/ getattr/setattr. 
For example: This is a very cool class. Mark and I had discussed doing something just like this (a while back) for some of the COM stuff. Basically, we'd want to generate these structures from type libraries. >... > The only hitch is that to send the buffer to the SWIG'ed function call, I > have three options, none ideal: > > 1) define a __str__ method which makes a string of the buffer and pass > that to the function which expects an "s#" argument. This send > a copy of the data, not the address. As a result, this works > well for structs which I create from scratch as long as I don't need > to see any changes that Windows might have performed on the memory. Note that "s#" can be used directly against the buffer object. You could pass it directly rather than via __str__. > 2) send the instance but make up my own 'get-the-instance-as-buffer' > API -- complicates extension module code. > > 3) send the buffer attribute of the instance instead of the instance -- > complicates Python code, and the C code isn't trivial because there > is no 'buffer' typecode for PyArg_ParseTuple(). > > If I could define an > > def __aswritebuffer__ > > and if there was a PyArg_ParseTuple() typecode associated with read/write > buffers (I nominate 'w'!), I believe things would be simpler -- I could > then send the instance, specify in the PyArgParse_Tuple that I want a > pointer to memory, and I'd be golden. > > What did I miss? You can do #3 today since there is a buffer typecode present ("w" or "w#"). It will complicate Python code a bit since you need to pass the buffer, but it is the simplest of the three options. Allowing instances to return buffers does seem to make sense, although it exposes a lot of underlying machinery at the Python level. It might be nicer to find a better semantic for this than just exposing the buffer interface slots. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From da at ski.org Sun Aug 15 23:07:35 1999 From: da at ski.org (David Ascher) Date: Sun, 15 Aug 1999 14:07:35 -0700 (Pacific Daylight Time) Subject: [Python-Dev] Re: instances as buffers In-Reply-To: <37B728DF.2CA2A20A@lyra.org> Message-ID: On Sun, 15 Aug 1999, Greg Stein wrote: > How do you manage the lifetimes of the memory and objects? > PyBuffer_FromReadWriteMemory() creates a buffer object that points to > memory. You need to ensure that the memory exists as long as the buffer > does. For those cases where I use PyBuffer_FromReadWriteMemory, I have no control over the memory involved. Windows allocates the memory, lets me use it for a little while, and it cleans it up whenever it feels like it. It hasn't been a problem yet, but I agree that it's possibly a problem. I'd call it a problem w/ the win32 API, though. > Would it make more sense to use PyBuffer_New(size)? Again, I can't because I am given a pointer and am expected to modify e.g. bytes 10-12 starting from that memory location. > This is a very cool class. Mark and I had discussed doing something just > like this (a while back) for some of the COM stuff. Basically, we'd want > to generate these structures from type libraries. I know zilch about type libraries. This is for CE work, although nothing about this class is CE-specific. Do type libraries give the same kind of info? > You can do #3 today since there is a buffer typecode present ("w" or > "w#"). It will complicate Python code a bit since you need to pass the > buffer, but it is the simplest of the three options. Ok. Time to patch SWIG again!
--david From Vladimir.Marangozov at inrialpes.fr Mon Aug 16 01:35:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 16 Aug 1999 00:35:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <000101bee51f$d7601de0$fb2d2399@tim> from "Tim Peters" at "Aug 12, 99 08:07:32 pm" Message-ID: <199908152335.AAA55842@pukapuka.inrialpes.fr> Tim Peters wrote: > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; if I used the debugger often enough to care , I'd think > about munging in a new opcode to make breakpoint sites explicit). > > immutability-is-made-to-be-violated-ly y'rs - tim > Could you elaborate a bit more on this? Do you mean setting breakpoints on a per opcode basis (for example by exchanging the original opcode with a new BREAKPOINT opcode in the code object) and use the lineno tab for breakpoints based on the source listing? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Mon Aug 16 04:31:16 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 15 Aug 1999 22:31:16 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908152335.AAA55842@pukapuka.inrialpes.fr> Message-ID: <000101bee78f$6aa217e0$f22d2399@tim> [Vladimir Marangozov] > Could you elaborate a bit more on this? No time for this now -- sorry. > Do you mean setting breakpoints on a per opcode basis (for example > by exchanging the original opcode with a new BREAKPOINT opcode in > the code object) and use the lineno tab for breakpoints based on > the source listing? Something like that. The classic way to implement positional breakpoints is to perturb the code; the classic problem is how to get back the effect of the code that was overwritten. 
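The classic code-perturbing breakpoint that Tim alludes to can be sketched in a few lines. The BREAKPOINT value and the bytearray standing in for a code object's bytecode are purely illustrative assumptions, not CPython internals:

```python
BREAKPOINT = 255                   # hypothetical opcode value

def set_breakpoint(code, offset):
    """Overwrite the opcode at offset, returning the displaced byte."""
    saved = code[offset]
    code[offset] = BREAKPOINT
    return saved

def clear_breakpoint(code, offset, saved):
    """Restore the displaced opcode so execution can resume."""
    code[offset] = saved

code = bytearray([9, 23, 100, 1])  # pretend bytecode stream
saved = set_breakpoint(code, 2)
assert code[2] == BREAKPOINT
clear_breakpoint(code, 2, saved)
assert list(code) == [9, 23, 100, 1]
```

The "classic problem" is exactly the saved byte: the debugger must re-execute or restore the displaced instruction before letting execution continue past the breakpoint.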
From gstein at lyra.org Mon Aug 16 06:42:19 1999 From: gstein at lyra.org (Greg Stein) Date: Sun, 15 Aug 1999 21:42:19 -0700 Subject: [Python-Dev] Re: why References: Message-ID: <37B796AB.34F6F93@lyra.org> David Ascher wrote: > > Why does buffer(array('c', 'test')) return a read-only buffer? Simply because the buffer() builtin always creates a read-only object, rather than selecting read/write when possible. Shouldn't be hard to alter the semantics of buffer() to do so. Maybe do this at the same time as updating it to create read/write buffers out of the blue. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Mon Aug 16 08:42:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 16 Aug 1999 02:42:17 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990813214817.5393C1C4742@oratrix.oratrix.nl> Message-ID: <000b01bee7b2$7c62d780$f22d2399@tim> [Jack Jansen] > ... A long time ago, Dianne Hackborn actually implemented a scheme like this, under the name VREF (for "virtual reference", or some such). IIRC, differences from your scheme were mainly that: 1) There was an elaborate proxy mechanism to avoid having to explicitly strengthen the weak. 2) Each object contained a pointer to a linked list of associated weak refs. This predates DejaNews, so may be a pain to find. > ... > We add a new builtin function (or a module with that function) > weak(). This returns a weak reference to the object passed as a > parameter. A weak object has one method: strong(), which returns the > corresponding real object or raises an exception if the object doesn't > exist anymore. 
This interface appears nearly isomorphic to MIT Scheme's "hash" and "unhash" functions, except that their hash returns an (unbounded) int and guarantees that hash(o1) != hash(o2) for any distinct objects o1 and o2 (this is a stronger guarantee than Python's "id", which may return the same int for objects with disjoint lifetimes; the other reason object address isn't appropriate for them is that objects can be moved by garbage collection, but hash is an object invariant). Of course unhash(hash(o)) is o, unless o has been gc'ed; then unhash raises an exception. By most accounts (I haven't used it seriously myself), it's a usable interface. > ... > to implement this I need to add a pointer to every object. That's unattractive, of course. > ... > (actually: we could make do with a single bit in every object, with > the bit meaning "this object has an associated weak object". We could > then use a global dictionary indexed by object address to find the > weak object) Is a single bit actually smaller than a pointer? For example, on most machines these days #define PyObject_HEAD \ int ob_refcnt; \ struct _typeobject *ob_type; is two 4-byte fields packed solid already, and structure padding prevents adding anything less than a 4-byte increment in reality. I guess on Alpha there's a 4-byte hole here, but I don't want weak pointers enough to switch machines . OTOH, sooner or later Guido is going to want a mark bit too, so the other way to view this is that 32 new flag bits are as cheap as one . There's one other thing I like about this: it can get rid of the dicey > Strong() checks that self->object->weak == self and returns > self->object (INCREFfed) if it is. check. If object has gone away, you're worried that self->object may (on some systems) point to a newly-invalid address. But worse than that, its memory may get reused, and then self->object may point into the *middle* of some other object where the bit pattern at the "weak" offset just happens to equal self. 
Let's try a sketch in pseudo-Python, where __xxx are secret functions that do the obvious things (and glossing over thread safety since these are presumably really implemented in C):

# invariant: __is_weak_bit_set(obj) == id2weak.has_key(id(obj))
# So "the weak bit" is simply an optimization, sparing most objects
# from a dict lookup when they die.
# The invariant is delicate in the presence of threads.

id2weak = {}

class _Weak:
    def __init__(self, obj):
        self.id = id(obj)  # obj's refcount not bumped
        __set_weak_bit(obj)
        id2weak[self.id] = self
        # note that "the system" (see below) sets self.id
        # to None if obj dies

    def strong(self):
        if self.id is None:
            raise DeadManWalkingError(self.id)
        return __id2obj(self.id)  # will bump obj's refcount

    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

def weak(obj):
    return id2weak.get(id(obj), None) or _Weak(obj)

and then whenever an object of any kind is deleted the system does:

if __is_weak_bit_set(obj):
    objid = id(obj)
    id2weak[objid].id = None
    del id2weak[objid]

In my current over-tired state, I think that's safe (modulo threads), portable and reasonably fast; I do think the extra bit costs 4 bytes, though. > ... > The weak object isn't transparent, because you have to call strong() > before you can do anything with it, but this is an advantage (says he, > aspiring to a career in politics or sales:-): with a transparent weak > object the object could disappear at unexpected moments and with this > scheme it can't, because when you have the object itself in hand you > have a refcount too. Explicit is better than implicit for me. [M.-A. Lemburg] > Have you checked the weak reference dictionary implementation > by Dieter Maurer ? It's at: > > http://www.handshake.de/~dieter/weakdict.html A project where I work is using it; it blows up a lot .
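For the record, later Pythons grew a weakref module (in 2.1) whose primitives make the weak()/strong() interface sketched above a few lines of pure Python. This is a modern rendition, not anything that existed in 1999:

```python
import weakref

class Weak:
    """weak()/strong() in the style sketched above, via weakref.ref."""
    def __init__(self, obj):
        self._ref = weakref.ref(obj)   # does not bump obj's refcount
    def strong(self):
        obj = self._ref()              # returns None once obj is gone
        if obj is None:
            raise ReferenceError("referent has been collected")
        return obj

class Thing:
    pass

t = Thing()
w = Weak(t)
assert w.strong() is t
del t                                  # CPython frees it immediately
try:
    w.strong()
    collected = False
except ReferenceError:
    collected = True
assert collected
```

The "weak bit plus global dict" bookkeeping Tim sketches is essentially what weakref hides behind weakref.ref.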
While some form of weak dict is what most people want in the end, I'm not sure Dieter's decision to support weak dicts with only weak values (not weak keys) is sufficient. For example, the aforementioned project wants to associate various computed long strings with certain hashable objects, and for some reason or other (ain't my project ...) these objects can't be changed. So they can't store the strings in the objects. So they'd like to map the objects to the strings via assorted dicts. But using the object as a dict key keeps it (and, via the dicts, also its associated strings) artificially alive; they really want a weakdict with weak *keys*. I'm not sure I know of a clear & fast way to implement a weakdict building only on the weak() function. Jack? Using weak objects as values (or keys) with an ordinary dict can prevent their referents from being kept artificially alive, but that doesn't get the dict itself cleaned up by magic. Perhaps "the system" should notify a weak object when its referent goes away; that would at least give the WO a chance to purge itself from structures it knows it's in ... > ... > BTW, how would this be done in JPython ? I guess it doesn't > make much sense there because cycles are no problem for the > Java VM GC. Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of "hard", "soft", "weak", and "phantom" references. See java.lang.ref for details. I stopped paying attention to Java, so it's up to you to tell us what you learn about it . 
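Postscript from the future: the weak-keyed mapping asked for above is precisely what the later weakref module ships as WeakKeyDictionary — entries evaporate when their key dies, so the dict no longer keeps the hashable objects (or their associated strings) artificially alive. A brief sketch, again with today's stdlib rather than anything available in 1999:

```python
import weakref

class Key:
    pass                               # stands in for the unmodifiable hashable objects

cache = weakref.WeakKeyDictionary()
k = Key()
cache[k] = "computed long string"      # illustrative payload
assert cache[k] == "computed long string"
del k                                  # CPython collects the key at once...
assert len(cache) == 0                 # ...and its entry vanishes with it
```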
From fredrik at pythonware.com Mon Aug 16 09:06:43 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 16 Aug 1999 09:06:43 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> Message-ID: <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2.

> > from array import array
> >
> > a = array("f", [0]*8192)
> >
> > b = buffer(a)
> >
> > for i in range(1000):
> >     a.append(1234)
> >
> > print b
> >
> > in other words, the buffer interface should
> > be redesigned, or removed.

> I don't understand what you believe is weird here. did you run that code? it may work, it may bomb, or it may generate bogus output. all depending on your memory allocator, the phase of the moon, etc. just like back in the C/C++ days... imo, that's not good enough for a core feature.
From gstein at lyra.org Mon Aug 16 09:15:54 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 00:15:54 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> Message-ID: <37B7BAAA.1E6EE4CA@lyra.org> Fredrik Lundh wrote: > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > from array import array > > > > > > a = array("f", [0]*8192) > > > > > > b = buffer(a) > > > > > > for i in range(1000): > > > a.append(1234) > > > > > > print b > > > > > > in other words, the buffer interface should > > > be redesigned, or removed. > > > > I don't understand what you believe is weird here. > > did you run that code? Yup. It printed nothing. > it may work, it may bomb, or it may generate bogus > output. all depending on your memory allocator, the > phase of the moon, etc. just like back in the C/C++ > days... It probably appeared as an empty string because the construction of the array filled it with zeroes (at least the first byte). Regardless, I'd be surprised if it crashed the interpreter. The print command is supposed to do a str() on the object, which creates a PyStringObject from the buffer contents. Shouldn't be a crash there. > imo, that's not good enough for a core feature. If it crashed, then sure. 
But I'd say that indicates a bug rather than a design problem. Do you have a stack trace from a crash? Ah. I just worked through, in my head, what is happening here. The buffer object caches the pointer returned by the array object. The append on the array does a realloc() somewhere, thereby invalidating the pointer inside the buffer object. Icky. Gotta think on this one... As an initial thought, it would seem that the buffer would have to re-query the pointer for each operation. There are performance implications there, of course, but that would certainly fix the problem. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jack at oratrix.nl Mon Aug 16 11:42:42 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:42:42 +0200 Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c? ) In-Reply-To: Message by David Ascher , Sun, 15 Aug 1999 09:54:23 -0700 (Pacific Daylight Time) , Message-ID: <19990816094243.3CE83303120@snelboot.oratrix.nl> > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > > Actually, I think you could use arrays to do the trick right now, > > because they are writeable (unlike strings). Until creating > > writeable buffer objects becomes possible that is... > > No, because I can't make an array around existing memory which Win32 > allocates before I get to it. Would adding a buffer interface to cobject solve your problem? Cobject is described as being used for passing C objects between Python modules, but I've always thought of it as passing C objects from one C routine to another C routine through Python, which doesn't necessarily understand what the object is all about. That latter description seems to fit your bill quite nicely. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jack at oratrix.nl Mon Aug 16 11:49:41 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 11:49:41 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Greg Stein , Sun, 15 Aug 1999 13:35:25 -0700 , <37B7248D.31E5D2BF@lyra.org> Message-ID: <19990816094941.83BE2303120@snelboot.oratrix.nl> > >... > > well, I think the buffer behaviour is both > > new and pretty funny: > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > the 8-bit character buffer slot and buffer objects in 1.5.2. Ah, now I understand why I didn't understand some of the previous conversation: I had never come across the buffer *objects* (as opposed to the buffer *interface*) until Fredrik's example. I've just looked at it, and I'm not sure I understand the full intentions of the buffer object. Buffer objects can either behave as the "buffer-aspect" of the object behind them (without the rest of their functionality) or as array objects, and if they start out life as the first they can evolve into the second, is that right? Is there a rationale behind this design, or is it just something that happened? -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From gstein at lyra.org Mon Aug 16 11:56:31 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 02:56:31 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990816094941.83BE2303120@snelboot.oratrix.nl> Message-ID: <37B7E04F.3843004@lyra.org> Jack Jansen wrote: >...
> I've just looked at it, and I'm not sure I understand the full intentions of the > buffer object. Buffer objects can either behave as the "buffer-aspect" of the > object behind them (without the rest of their functionality) or as array > objects, and if they start out life as the first they can evolve into the > second, is that right? > > Is there a rationale behind this design, or is it just something that > happened? The object doesn't change. You create it as a reference to an existing object's buffer (as exported via the buffer interface), or you create it as a reference to some arbitrary memory. The buffer object provides (optionally read/write) string-like behavior to any object that supports buffer behavior. It can also be used to make lightweight slices of another object. For example:

>>> a = "abcdefghi"
>>> b = buffer(a, 3, 3)
>>> print b
def
>>>

In the above example, there is only one copy of "def" (the portion inside of the string object referenced by a). The string-like behavior can be quite nice for memory-mapped files. Andrew's mmapfile module's file objects export the buffer interface. This means that you can open a file, wrap a buffer around it, and perform quick and easy random-access on the thing. You could even select slices of the file and pass them around as if they were strings, without loading anything into the process heap. (I want to try mmap'ing a .pyc and create code objects that have buffer-based bytecode streams; it will be interesting to see if this significantly reduces memory consumption (in terms of the heap size; the mmap'd .pyc can be shared across processes)).
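Greg's lightweight-slice example translates directly to the modern memoryview, buffer()'s successor: the slice shares the original storage rather than copying it. Offered as a present-day aside, not as the 1.5.2 API:

```python
# A sliced memoryview is a view, not a copy; .obj exposes the object
# whose storage it shares.
a = b"abcdefghi"
b = memoryview(a)[3:6]
assert bytes(b) == b"def"
assert b.obj is a                 # the slice still points into the original
```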
Cheers, -g -- Greg Stein, http://www.lyra.org/ From jim at digicool.com Mon Aug 16 14:30:41 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 08:30:41 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> Message-ID: <37B80471.F0F467C9@digicool.com> Fredrik Lundh wrote: > > > Fredrik Lundh wrote: > > >... > > > besides, what about buffers and threads? if you > > > return a pointer from getreadbuf, wouldn't it be > > > good to know exactly when Python doesn't need > > > that pointer any more? explicit initbuffer/exitbuffer > > > calls around each sequence of buffer operations > > > would make that a lot safer... > > > > This is a pretty obvious one, I think: it lasts only as long as the > > object. PyString_AS_STRING is similar. Nothing new or funny here. > > well, I think the buffer behaviour is both > new and pretty funny:

> from array import array
>
> a = array("f", [0]*8192)
>
> b = buffer(a)
>
> for i in range(1000):
>     a.append(1234)
>
> print b
>
> in other words, the buffer interface should
> be redesigned, or removed.

A while ago I asked for some documentation on the Buffer interface. I basically got silence. At this point, I don't have a good idea what buffers are for and I don't see a lot of evidence that there *is* a design. I assume that there was a design, but I can't see it. This whole discussion makes me very queasy. I'm probably just out of it, since I don't have time to read the Python list anymore.
Presumably the buffer interface was proposed and discussed there at some distant point in the past. (I can't pay as much attention to this discussion as I suspect I should, due to time constraints and due to a basic understanding of the rationale for the buffer interface. Just now I caught a sniff of something I find kinda repulsive. I think I hear you all talking about beasties that hold a reference to some object's internal storage and that have write operations so you can write directly to the object's storage bypassing the object interfaces. I probably just imagined it.) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From gstein at lyra.org Mon Aug 16 14:41:23 1999 From: gstein at lyra.org (Greg Stein) Date: Mon, 16 Aug 1999 05:41:23 -0700 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> Message-ID: <37B806F3.2C5EDC44@lyra.org> Jim Fulton wrote: >... > A while ago I asked for some documentation on the Buffer > interface. I basically got silence.
At this point, I I think the silence was caused by the simple fact that the documentation does not (yet) exist. That's all... nothing nefarious. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Aug 16 14:05:35 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:05:35 +0200 Subject: [Python-Dev] Re: w# typecode (was: marshal (was:Buffer interface in abstract.c? )) References: <37B722CD.383A2A9E@lyra.org> Message-ID: <37B7FE8F.30C35284@lemburg.com> Greg Stein wrote: > > David Ascher wrote: > > On Sun, 15 Aug 1999, M.-A. Lemburg wrote: > > ... > > > The new typecode "w#" for writeable buffer style objects is a good idea > > > (it should only work on single segment buffers). > > > > Indeed. > > I just borrowed Guido's time machine. That typecode is already in 1.5.2. > > :-) Ah, cool :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:29:31 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:29:31 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <000b01bee7b2$7c62d780$f22d2399@tim> Message-ID: <37B8042B.21DE6053@lemburg.com> Tim Peters wrote: > > [M.-A. Lemburg] > > Have you checked the weak reference dictionary implementation > > by Dieter Maurer ? It's at: > > > > http://www.handshake.de/~dieter/weakdict.html > > A project where I work is using it; it blows up a lot . > > While some form of weak dict is what most people want in the end, I'm not > sure Dieter's decision to support weak dicts with only weak values (not weak > keys) is sufficient. For example, the aforementioned project wants to > associate various computed long strings with certain hashable objects, and > for some reason or other (ain't my project ...) these objects can't be > changed. So they can't store the strings in the objects. 
So they'd like to > map the objects to the strings via assorted dicts. But using the object as > a dict key keeps it (and, via the dicts, also its associated strings) > artificially alive; they really want a weakdict with weak *keys*. > > I'm not sure I know of a clear & fast way to implement a weakdict building > only on the weak() function. Jack? > > Using weak objects as values (or keys) with an ordinary dict can prevent > their referents from being kept artificially alive, but that doesn't get the > dict itself cleaned up by magic. Perhaps "the system" should notify a weak > object when its referent goes away; that would at least give the WO a chance > to purge itself from structures it knows it's in ... Perhaps one could fiddle something out of the Proxy objects in mxProxy (you know where...). These support a special __cleanup__ protocol that I use a lot to work around circular garbage: the __cleanup__ method of the referenced object is called prior to destroying the proxy; even if the reference count on the object has not yet gone down to 0. This makes direct circles possible without problems: the parent can reference a child through the proxy and the child can reference the parent directly. As soon as the parent is cleaned up, the reference to the proxy is deleted which then automagically makes the back reference in the child disappear, allowing the parent to be deallocated after cleanup without leaving a circular reference around. > > ... > > BTW, how would this be done in JPython ? I guess it doesn't > > make much sense there because cycles are no problem for the > > Java VM GC. > > Weak refs have many uses beyond avoiding cycles, and Java 1.2 has all of > "hard", "soft", "weak", and "phantom" references. See java.lang.ref for > details. I stopped paying attention to Java, so it's up to you to tell us > what you learn about it . Thanks for the reference... 
but I guess this will remain a weak one for some time since the latter is currently a limited resource :-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Aug 16 14:41:51 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 14:41:51 +0200 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B7248D.31E5D2BF@lyra.org> <009401bee7b5$e5dc27e0$f29b12c2@secret.pythonware.com> <37B7BAAA.1E6EE4CA@lyra.org> Message-ID: <37B8070F.763C3FF8@lemburg.com> Greg Stein wrote: > > Fredrik Lundh wrote: > > > > > I think the buffer interface was introduced in 1.5 (by Jack?). I added > > > the 8-bit character buffer slot and buffer objects in 1.5.2. > > > > > > > from array import array > > > > > > > > a = array("f", [0]*8192) > > > > > > > > b = buffer(a) > > > > > > > > for i in range(1000): > > > > a.append(1234) > > > > > > > > print b > > > > > > > > in other words, the buffer interface should > > > > be redesigned, or removed. > > > > > > I don't understand what you believe is weird here. > > > > did you run that code? > > Yup. It printed nothing. > > > it may work, it may bomb, or it may generate bogus > > output. all depending on your memory allocator, the > > phase of the moon, etc. just like back in the C/C++ > > days... 
> > It probably appeared as an empty string because the construction of the > array filled it with zeroes (at least the first byte). > > Regardless, I'd be surprised if it crashed the interpreter. The print > command is supposed to do a str() on the object, which creates a > PyStringObject from the buffer contents. Shouldn't be a crash there. > > > imo, that's not good enough for a core feature. > > If it crashed, then sure. But I'd say that indicates a bug rather than a > design problem. Do you have a stack trace from a crash? > > Ah. I just worked through, in my head, what is happening here. The > buffer object caches the pointer returned by the array object. The > append on the array does a realloc() somewhere, thereby invalidating the > pointer inside the buffer object. > > Icky. Gotta think on this one... As an initial thought, it would seem > that the buffer would have to re-query the pointer for each operation. > There are performance implications there, of course, but that would > certainly fix the problem. I guess that's the way to go. I wouldn't want to think about those details when using buffer objects and a function call is still better than a copy... it would do the init/exit wrapping implicitly: init at the time the getreadbuffer call is made and exit next time a thread switch is done - provided that the functions using the memory pointer also keep a reference to the buffer object alive (but that should be natural as this is always done when dealing with references in a safe way). 
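As a footnote, the rule modern CPython eventually adopted is stricter than re-querying the pointer: while a buffer is exported, the owning object simply refuses to reallocate it. A small demonstration in today's Python, with bytearray/memoryview standing in for the array/buffer pair of Fredrik's example:

```python
ba = bytearray(b"abcd")
m = memoryview(ba)                # exports ba's buffer
try:
    ba.append(0x65)               # would realloc under the live export
except BufferError:
    resized = False               # modern CPython refuses to resize
else:
    resized = True
m.release()                       # drop the export...
ba.append(0x65)                   # ...and resizing works again
assert resized is False
assert ba == bytearray(b"abcde")
```

This makes the dangling-pointer scenario in Fredrik's example impossible by construction, at the cost of raising an error in code that mixes exports with resizes.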
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at digicool.com Mon Aug 16 15:26:40 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 09:26:40 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990805205952.531B9E267A@oratrix.oratrix.nl> <37AAA2BC.466750B5@lemburg.com> <003101bee0c2$de6a2c90$f29b12c2@secret.pythonware.com> <37AC93CC.53982F3F@lyra.org> <000b01bee32c$d7bc2350$f29b12c2@secret.pythonware.com> <199908101412.KAA02065@eric.cnri.reston.va.us> <000b01bee402$e29389e0$f29b12c2@secret.pythonware.com> <199908111442.KAA04423@eric.cnri.reston.va.us> <00db01bee5a9$c1ebc380$f29b12c2@secret.pythonware.com> <37B62D0D.6EC24240@lyra.org> <008601bee709$8c54eb50$f29b12c2@secret.pythonware.com> <37B80471.F0F467C9@digicool.com> <37B806F3.2C5EDC44@lyra.org> Message-ID: <37B81190.165C373E@digicool.com> Greg Stein wrote: > > Jim Fulton wrote: > >... > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > I think the silence was caused by the simple fact that the documentation > does not (yet) exist. That's all... nothing nefarious. I didn't mean to suggest anything nefarious. I do think that a change that affects something as basic as the standard object type layout and that generates this much discussion really should be documented before it becomes part of the core. I'd especially like to see some kind of document that includes information like: - A problem statement that describes the problem the change is solving, - How does the solution solve the problem, - When and how should people writing new types support the new interfaces? We're not talking about a new library module here. There's been a change to the core object interface. Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! 
Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list without my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats. From jack at oratrix.nl Mon Aug 16 15:45:31 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 15:45:31 +0200 Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: Message by Jim Fulton , Mon, 16 Aug 1999 08:30:41 -0400 , <37B80471.F0F467C9@digicool.com> Message-ID: <19990816134531.C30B5303120@snelboot.oratrix.nl> > A while ago I asked for some documentation on the Buffer > interface. I basically got silence. At this point, I > don't have a good idea what buffers are for and I don't see a lot > of evidence that there *is* a design. I assume that there was > a design, but I can't see it. This whole discussion makes me > very queasy. Okay, as I'm apparently not the only one who is queasy let's start from scratch. First, there is the old buffer _interface_. This is a C interface that allows extension (and builtin) modules and functions a unified way to access objects if they want to write the object to file and similar things. It is also what the PyArg_ParseTuple "s#" returns. This is, in C, the getreadbuffer/getwritebuffer interface. Second, there's the extension to the buffer interface as of 1.5.2. This is again only available in C, and it allows C programmers to get an object _as an ASCII string_. This is meant for things like regexp modules, to access any "textual" object as an ASCII string. This is the getcharbuffer interface, and bound to the "t#" specifier in PyArg_ParseTuple. Third, there is the buffer _object_, also new in 1.5.2.
This sort-of exports the functionality of the buffer interface to Python, but it does a bit more as well, because the buffer objects have a sort of copy-on-write semantics that means they may or may not be "attached" to a python object through the buffer interface. I think that the C interface and the object should be treated completely separately. I definitely want the C interface, but I personally don't use the Python buffer objects, so I don't really care all that much about those. Also, I think that the buffer objects might become easier to understand if we don't think of it as "the buffer interface exported to python", but as "Python buffer objects, that may share memory with other Python objects as an optimization". -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From jim at digicool.com Mon Aug 16 18:03:54 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 12:03:54 -0400 Subject: [Python-Dev] buffer interface considered harmful References: <19990816134531.C30B5303120@snelboot.oratrix.nl> Message-ID: <37B8366A.82B305C7@digicool.com> Jack Jansen wrote: > > > A while ago I asked for some documentation on the Buffer > > interface. I basically got silence. At this point, I > > don't have a good idea what buffers are for and I don't see alot > > of evidence that there *is* a design. I assume that there was > > a design, but I can't see it. This whole discussion makes me > > very queasy. > > Okay, as I'm apparently not the only one who is queasy let's start from > scratch. Yee ha! > First, there is the old buffer _interface_. This is a C interface that allows > extension (and builtin) modules and functions a unified way to access objects > if they want to write the object to file and similar things. Is this serialization? 
What does this achieve that, say, the pickling protocols don't achieve? What other problems does it solve? > It is also what > the PyArg_ParseTuple "s#" returns. This is, in C, the > getreadbuffer/getwritebuffer interface. Huh? "s#" doesn't return a string? Or are you saying that you can pass a non-string object to a C function that uses "s#" and have it bufferized and then stringized? In either case, this is not consistent with the documentation (interface) of PyArg_ParseTuple. > Second, there's the extension to the buffer interface as of 1.5.2. This is > again only available in C, and it allows C programmers to get an object _as an > ASCII string_. This is meant for things like regexp modules, to access any > "textual" object as an ASCII string. This is the getcharbuffer interface, and > bound to the "t#" specifier in PyArg_ParseTuple. Hm. So this is making a little more sense. So, there is a notion that there are "textual" objects that want to provide a method for getting their "text". How does this text differ from what you get from __str__ or __repr__? > Third, there is the buffer _object_, also new in 1.5.2. This sort-of exports > the functionality of the buffer interface to Python, How so? Maybe I'm at sea because I still don't get what the C buffer interface is for. > but it does a bit more as > well, because the buffer objects have a sort of copy-on-write semantics that > means they may or may not be "attached" to a python object through the buffer > interface. What is this thing used for? Where does the slot in tp_as_buffer come into all of this? Why does this need to be a slot in the first place? Are these "textual" objects really common? Is the presence of this slot a flag for "textualness"? It would help a lot, at least for me, if there was a clearer description of what motivates these things. What problems are they trying to solve? Jim -- Jim Fulton mailto:jim at digicool.com Python Powered!
From da at ski.org Mon Aug 16 18:45:47 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 09:45:47 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8366A.82B305C7@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: > > Second, there's the extension to the buffer interface as of 1.5.2. This is > > again only available in C, and it allows C programmers to get an object _as an > > ASCII string_. This is meant for things like regexp modules, to access any > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > bound to the "t#" specifier in PyArg_ParseTuple. > > Hm. So this is making a little more sense. So, there is a notion that > there are "textual" objects that want to provide a method for getting > their "text". How does this text differ from what you get from __str__ > or __repr__? I'll let others give a well thought out rationale. Here are some examples of use which I think worthwhile: * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile objects fit this aspect of the buffer interface allows you to do regexp searches on it w/o ever building a twelve gigabyte PyString. * Consider a non-contiguous NumPy array. If the array type supported the multi-segment buffer interface, extension module writers could manipulate the data within this array w/o having to worry about the non-contiguous nature of the data. They'd still have to worry about the multi-byte nature of the data, but it's still a win.
In other words, I think that the buffer interface could be useful even w/ non-textual data. * If NumPy was modified to have arrays with data stored in buffer objects as opposed to the current "char *", and if PIL was modified to have images stored in buffer objects as opposed to whatever it uses, one could have arrays and images which shared data. I think all of these provide examples of motivations which are appealing to at least some Python users. I make no claim that they motivate the specific interface. In all the cases I can think of, one or both of two features are the key asset: - access to subset of huge data regions w/o creation of huge temporary variables. - sharing of data space. Yes, it's a power tool, and as a such should come with safety goggles. But then again, the same is true for ExtensionClasses =). leaving-out-the-regexp-on-NumPy-arrays-example, --david PS: I take back the implicit suggestion that buffer() return read-write buffers when possible. From jim at digicool.com Mon Aug 16 19:06:19 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:06:19 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B8450B.C5D308E4@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > > > Second, there's the extension the the buffer interface as of 1.5.2. This is > > > again only available in C, and it allows C programmers to get an object _as an > > > ASCII string_. This is meant for things like regexp modules, to access any > > > "textual" object as an ASCII string. This is the getcharbuffer interface, and > > > bound to the "t#" specifier in PyArg_ParseTuple. > > > > Hm. So this is making a little more sense. So, there is a notion that > > there are "textual" objects that want to provide a method for getting > > their "text". How does this text differ from what you get from __str__ > > or __repr__? > > I'll let others give a well thought out rationale. I eagerly await this. 
:) > Here are some examples > of use which I think worthwhile: > > * Consider an mmap()'ed file, twelve gigabytes long. Making mmapfile > objects fit this aspect of the buffer interface allows you to do regexp > searches on it w/o ever building a twelve gigabyte PyString. This seems reasonable, if a bit exotic. :) > * Consider a non-contiguous NumPy array. If the array type supported the > multi-segment buffer interface, extension module writers could > manipulate the data within this array w/o having to worry about the > non-contiguous nature of the data. They'd still have to worry about > the multi-byte nature of the data, but it's still a win. In other > words, I think that the buffer interface could be useful even w/ > non-textual data. Why is this a good thing? Why should extension module writers worry about the non-contiguous nature of the data now? Does the NumPy C API somehow expose this now? Will multi-segment buffers make it go away somehow? > * If NumPy was modified to have arrays with data stored in buffer objects > as opposed to the current "char *", and if PIL was modified to have > images stored in buffer objects as opposed to whatever it uses, one > could have arrays and images which shared data. Uh, and this would be a good thing? Maybe PIL should just be modified to use NumPy arrays. > I think all of these provide examples of motivations which are appealing > to at least some Python users. Perhaps, although Guido knows how they'd find out about them. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered!
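The data sharing being debated here can be sketched in modern Python, where memoryview (a descendant of this buffer machinery; it did not exist in 1999 and is used purely as a stand-in for the buffer object under discussion) lets two names view one block of memory:

```python
# Sketch of the "shared data" idea: two views of one memory block, so a
# write through one is visible through the other, with no copying.
pixels = bytearray(8)             # stand-in for image/array storage
region = memoryview(pixels)[2:6]  # zero-copy slice: no data is duplicated

region[0] = 255                   # mutate through the view...
print(pixels[2])                  # ...and the original sees it: prints 255
```

This is the upside Jim is questioning: an "array" and an "image" built over the same block would stay in sync for free, at the cost of the information hiding he worries about.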
From da at ski.org Mon Aug 16 19:18:46 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 10:18:46 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B8450B.C5D308E4@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: >> [regexps on gigabyte files] > > This seems reasonable, if a bit exotic. :) In the bioinformatics world, I think it's everyday stuff. > Why is this a good thing? Why should extension module writers worry > about the non-contiguous nature of the data now? Does the NumPy C API > somehow expose this now? Will multi-segment buffers make it go away > somehow? A NumPy extension module writer needs to create and modify NumPy arrays. These arrays may be non-contiguous (if e.g. they are the result of slicing). The NumPy C API exposes the non-contiguous nature, but it's hard enough to deal with it that I suspect most extension writers require contiguous arrays, which means unnecessary copies. Multi-segment buffers won't make the API go away necessarily (backwards compatibility and all that), but it could make it unnecessary for many extension writers.
Given the size of the images, it's a prodigious waste of time, and kills the use of Python in many a project. > Perhaps, although Guido knows how they'd find out about them. ;) Uh? These issues have been discussed in the NumPy/PIL world for a while, with no solution in sight. Recently, I and others saw mentions of buffers in the source, and they seemed like a reasonable approach, which could be done w/o a rewrite of either PIL or NumPy. Don't get me wrong -- I'm all for better documentation of the buffer stuff, design guidelines, warnings and protocols. I stated as much on June 15: http://www.python.org/pipermail/python-dev/1999-June/000338.html --david From jim at digicool.com Mon Aug 16 19:38:22 1999 From: jim at digicool.com (Jim Fulton) Date: Mon, 16 Aug 1999 13:38:22 -0400 Subject: [Python-Dev] buffer interface considered harmful References: Message-ID: <37B84C8E.46885C8E@digicool.com> David Ascher wrote: > > On Mon, 16 Aug 1999, Jim Fulton wrote: > > >> [regexps on gigabyte files] > > > > This seems reasonable, if a bit exotic. :) > > In the bioinformatics world, I think it's everyday stuff. Right, in some (exotic ;) domains it's not exotic at all. > > Why is this a good thing? Why should extension module writes worry > > abot the non-contiguous nature of the data now? Does the NumPy C API > > somehow expose this now? Will multi-segment buffers make it go away > > somehow? > > A NumPy extension module writer needs to create and modify NumPy arrays. > These arrays may be non-contiguous (if e.g. they are the result of > slicing). The NumPy C API exposes the non-contiguous nature, but it's > hard enough to deal with it that I suspect most extension writers require > contiguous arrays, which means unnecessary copies. Hm. This sounds like an API problem to me. > Multi-segment buffers won't make the API go away necessarily (backwards > compatibility and all that), but it could make it unnecessary for many > extension writers. 
Multi-segment buffers don't make the multi-segmented nature of the memory go away. Do they really simplify the API that much? They seem to strip away an awful lot of information hiding. > > > * If NumPy was modified to have arrays with data stored in buffer objects > > > as opposed to the current "char *", and if PIL was modified to have > > > images stored in buffer objects as opposed to whatever it uses, one > > > could have arrays and images which shared data. > > > > Uh, and this would be a good thing? Maybe PIL should just be modified > > to use NumPy arrays. > > Why? PIL was designed for image processing, and made design decisions > appropriate to that domain. NumPy was designed for multidimensional > numeric array processing, and made design decisions appropriate to that > domain. The intersection of interests exists (e.g. in the medical imaging > world), and I know people who spend a lot of their CPU time moving data > between images and arrays with "stupid" tostring/fromstring operations. > Given the size of the images, it's a prodigious waste of time, and kills > the use of Python in many a project. It seems to me that NumPy is broad enough to encompass image processing. My main concern is having two systems rely on some low-level "shared memory" mechanism to achieve efficient communication. > > Perhaps, although Guido knows how they'd find out about them. ;) > > Uh? These issues have been discussed in the NumPy/PIL world for a while, > with no solution in sight. Recently, I and others saw mentions of buffers > in the source, and they seemed like a reasonable approach, which could be > done w/o a rewrite of either PIL or NumPy. My point was that people would be lucky to find out about buffers or about how to use them as things stand. > Don't get me wrong -- I'm all for better documentation of the buffer > stuff, design guidelines, warnings and protocols.
I stated as much on > June 15: > > http://www.python.org/pipermail/python-dev/1999-June/000338.html Yes, that was quite a jihad you launched. ;) Jim -- Jim Fulton mailto:jim at digicool.com Python Powered! From da at ski.org Mon Aug 16 20:25:54 1999 From: da at ski.org (David Ascher) Date: Mon, 16 Aug 1999 11:25:54 -0700 (Pacific Daylight Time) Subject: [Python-Dev] buffer interface considered harmful In-Reply-To: <37B84C8E.46885C8E@digicool.com> Message-ID: On Mon, 16 Aug 1999, Jim Fulton wrote: [ Aside: > It seems to me that NumPy is broad enough to encompass > image processing. Well, I'll just say that you could have been right, but w/ the current NumPy, I don't blame F/ for having developed his own data structures. NumPy is messy, and some of its design decisions are wrong for image things (memory handling, casting rules, etc.). It's all water under the bridge at this point. ] Back to the main topic: You say: > [Multi-segment buffers] seem to strip away an awful lot of information > hiding. My impression of the buffer notion was that it is intended to *provide* information hiding, by giving a simple API to byte arrays which could be stored in various ways. I do agree that whether those bytes should be shared or not is a decision which should be weighed carefully. > My main concern is having two systems rely on some low-level "shared > memory" mechanism to achieve efficient communication. I don't particularly care about the specific buffer interface (the low-level nature of which is what I think you object to).
I do care about having a well-defined mechanism for sharing memory between objects, and I think there is value in defining such an interface generically. Maybe the notion of segmented arrays of bytes is too low-level, and instead we should think of the data spaces as segmented arrays of chunks, where a chunk can be one or more bytes? Or do you object to any 'generic' interface? Just for fun, here's the list of things which either currently do or have been talked about possibly in the future supporting some sort of buffer interface, and my guesses as to chunk size, segmented status and writeability:
- strings (1 byte, single-segment, r/o)
- unicode strings (2 bytes, single-segment, r/o)
- struct.pack() things (1 byte, single-segment, r/o)
- arrays (1-4? bytes, single-segment, r/w)
- NumPy arrays (1-8 bytes, multi-segment, r/w)
- PIL images (1-? bytes, multi-segment, r/w)
- CObjects (1-byte, single-segment, r/?)
- mmapfiles (1-byte, multi-segment?, r/w)
- non-python-owned memory (1-byte, single-segment, r/w)
--david From jack at oratrix.nl Mon Aug 16 21:36:40 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 16 Aug 1999 21:36:40 +0200 Subject: [Python-Dev] Buffer interface and multiple threads Message-ID: <19990816193645.9E5B5CF320@oratrix.oratrix.nl> Hmm, something that just struck me: the buffer _interface_ (i.e. the C routines, not the buffer object stuff) is potentially thread-unsafe. In the "old world", where "s#" only worked on string objects, you could be sure that the C pointer returned remained valid as long as you had a reference to the python string object in hand, as strings are immutable. In the "new world", where "s#" also works on, say, array objects, this doesn't hold anymore. So, potentially, while one thread is in a write() system call writing the contents of the array to a file, another thread could come in and change the data. Is this a problem?
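The hazard Jack describes can be made concrete without any C: a buffer-style pointer is a live alias into the object's storage, not a snapshot, so a mutation from elsewhere (another thread, in his write() scenario) changes what the pointer sees. A hedged single-threaded sketch in modern Python, with memoryview standing in for the raw C pointer that "s#" hands out:

```python
buf = bytearray(b"stable data")   # a mutable object passed via "s#"
alias = memoryview(buf)           # like the C pointer: a live alias
snapshot = bytes(buf)             # like the old immutable-string guarantee

buf[0:6] = b"mutate"              # "another thread" rewrites the data mid-write
print(bytes(alias[:6]))           # the alias sees the new bytes: b'mutate'
print(snapshot[:6])               # the copy still holds the old: b'stable'
```

The "old world" safety came from strings being immutable copies (the snapshot); the "new world" pointer is the alias, which is exactly why the data can change under a thread that is still using it.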
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From mal at lemburg.com Mon Aug 16 22:22:12 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 16 Aug 1999 22:22:12 +0200 Subject: [Python-Dev] New htmlentitydefs.py file Message-ID: <37B872F4.1C3F5D39@lemburg.com> Attached you find a new HTML entity definitions file taken and parsed from: http://www.w3.org/TR/1998/REC-html40-19980424/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40-19980424/HTMLspecial.ent The latter two contain Unicode charcodes which obviously cannot (yet) be mapped to Unicode strings... perhaps Fredrik wants to include a spiced up version in with his Unicode type. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 138 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- """ Entity definitions for HTML4.0. 
Taken and parsed from: http://www.w3.org/TR/1998/REC-html40/HTMLlat1.ent http://www.w3.org/TR/1998/REC-html40/HTMLsymbol.ent http://www.w3.org/TR/1998/REC-html40/HTMLspecial.ent """ entitydefs = { 'AElig': chr(198), # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1 'Aacute': chr(193), # latin capital letter A with acute, U+00C1 ISOlat1 'Acirc': chr(194), # latin capital letter A with circumflex, U+00C2 ISOlat1 'Agrave': chr(192), # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1 'Alpha': 'Α', # greek capital letter alpha, U+0391 'Aring': chr(197), # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1 'Atilde': chr(195), # latin capital letter A with tilde, U+00C3 ISOlat1 'Auml': chr(196), # latin capital letter A with diaeresis, U+00C4 ISOlat1 'Beta': 'Β', # greek capital letter beta, U+0392 'Ccedil': chr(199), # latin capital letter C with cedilla, U+00C7 ISOlat1 'Chi': 'Χ', # greek capital letter chi, U+03A7 'Dagger': '‡', # double dagger, U+2021 ISOpub 'Delta': 'Δ', # greek capital letter delta, U+0394 ISOgrk3 'ETH': chr(208), # latin capital letter ETH, U+00D0 ISOlat1 'Eacute': chr(201), # latin capital letter E with acute, U+00C9 ISOlat1 'Ecirc': chr(202), # latin capital letter E with circumflex, U+00CA ISOlat1 'Egrave': chr(200), # latin capital letter E with grave, U+00C8 ISOlat1 'Epsilon': 'Ε', # greek capital letter epsilon, U+0395 'Eta': 'Η', # greek capital letter eta, U+0397 'Euml': chr(203), # latin capital letter E with diaeresis, U+00CB ISOlat1 'Gamma': 'Γ', # greek capital letter gamma, U+0393 ISOgrk3 'Iacute': chr(205), # latin capital letter I with acute, U+00CD ISOlat1 'Icirc': chr(206), # latin capital letter I with circumflex, U+00CE ISOlat1 'Igrave': chr(204), # latin capital letter I with grave, U+00CC ISOlat1 'Iota': 'Ι', # greek capital letter iota, U+0399 'Iuml': chr(207), # latin capital letter I with diaeresis, U+00CF ISOlat1 'Kappa': 'Κ', # greek 
capital letter kappa, U+039A 'Lambda': 'Λ', # greek capital letter lambda, U+039B ISOgrk3 'Mu': 'Μ', # greek capital letter mu, U+039C 'Ntilde': chr(209), # latin capital letter N with tilde, U+00D1 ISOlat1 'Nu': 'Ν', # greek capital letter nu, U+039D 'Oacute': chr(211), # latin capital letter O with acute, U+00D3 ISOlat1 'Ocirc': chr(212), # latin capital letter O with circumflex, U+00D4 ISOlat1 'Ograve': chr(210), # latin capital letter O with grave, U+00D2 ISOlat1 'Omega': 'Ω', # greek capital letter omega, U+03A9 ISOgrk3 'Omicron': 'Ο', # greek capital letter omicron, U+039F 'Oslash': chr(216), # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1 'Otilde': chr(213), # latin capital letter O with tilde, U+00D5 ISOlat1 'Ouml': chr(214), # latin capital letter O with diaeresis, U+00D6 ISOlat1 'Phi': 'Φ', # greek capital letter phi, U+03A6 ISOgrk3 'Pi': 'Π', # greek capital letter pi, U+03A0 ISOgrk3 'Prime': '″', # double prime = seconds = inches, U+2033 ISOtech 'Psi': 'Ψ', # greek capital letter psi, U+03A8 ISOgrk3 'Rho': 'Ρ', # greek capital letter rho, U+03A1 'Sigma': 'Σ', # greek capital letter sigma, U+03A3 ISOgrk3 'THORN': chr(222), # latin capital letter THORN, U+00DE ISOlat1 'Tau': 'Τ', # greek capital letter tau, U+03A4 'Theta': 'Θ', # greek capital letter theta, U+0398 ISOgrk3 'Uacute': chr(218), # latin capital letter U with acute, U+00DA ISOlat1 'Ucirc': chr(219), # latin capital letter U with circumflex, U+00DB ISOlat1 'Ugrave': chr(217), # latin capital letter U with grave, U+00D9 ISOlat1 'Upsilon': 'Υ', # greek capital letter upsilon, U+03A5 ISOgrk3 'Uuml': chr(220), # latin capital letter U with diaeresis, U+00DC ISOlat1 'Xi': 'Ξ', # greek capital letter xi, U+039E ISOgrk3 'Yacute': chr(221), # latin capital letter Y with acute, U+00DD ISOlat1 'Zeta': 'Ζ', # greek capital letter zeta, U+0396 'aacute': chr(225), # latin small letter a with acute, U+00E1 ISOlat1 'acirc': chr(226), # latin small letter a with circumflex, 
U+00E2 ISOlat1 'acute': chr(180), # acute accent = spacing acute, U+00B4 ISOdia 'aelig': chr(230), # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 'agrave': chr(224), # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 'alefsym': 'ℵ', # alef symbol = first transfinite cardinal, U+2135 NEW 'alpha': 'α', # greek small letter alpha, U+03B1 ISOgrk3 'and': '∧', # logical and = wedge, U+2227 ISOtech 'ang': '∠', # angle, U+2220 ISOamso 'aring': chr(229), # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 'asymp': '≈', # almost equal to = asymptotic to, U+2248 ISOamsr 'atilde': chr(227), # latin small letter a with tilde, U+00E3 ISOlat1 'auml': chr(228), # latin small letter a with diaeresis, U+00E4 ISOlat1 'bdquo': '„', # double low-9 quotation mark, U+201E NEW 'beta': 'β', # greek small letter beta, U+03B2 ISOgrk3 'brvbar': chr(166), # broken bar = broken vertical bar, U+00A6 ISOnum 'bull': '•', # bullet = black small circle, U+2022 ISOpub 'cap': '∩', # intersection = cap, U+2229 ISOtech 'ccedil': chr(231), # latin small letter c with cedilla, U+00E7 ISOlat1 'cedil': chr(184), # cedilla = spacing cedilla, U+00B8 ISOdia 'cent': chr(162), # cent sign, U+00A2 ISOnum 'chi': 'χ', # greek small letter chi, U+03C7 ISOgrk3 'clubs': '♣', # black club suit = shamrock, U+2663 ISOpub 'cong': '≅', # approximately equal to, U+2245 ISOtech 'copy': chr(169), # copyright sign, U+00A9 ISOnum 'crarr': '↵', # downwards arrow with corner leftwards = carriage return, U+21B5 NEW 'cup': '∪', # union = cup, U+222A ISOtech 'curren': chr(164), # currency sign, U+00A4 ISOnum 'dArr': '⇓', # downwards double arrow, U+21D3 ISOamsa 'dagger': '†', # dagger, U+2020 ISOpub 'darr': '↓', # downwards arrow, U+2193 ISOnum 'deg': chr(176), # degree sign, U+00B0 ISOnum 'delta': 'δ', # greek small letter delta, U+03B4 ISOgrk3 'diams': '♦', # black diamond suit, U+2666 ISOpub 'divide': chr(247), # division sign, U+00F7 ISOnum 'eacute': 
chr(233), # latin small letter e with acute, U+00E9 ISOlat1 'ecirc': chr(234), # latin small letter e with circumflex, U+00EA ISOlat1 'egrave': chr(232), # latin small letter e with grave, U+00E8 ISOlat1 'empty': '∅', # empty set = null set = diameter, U+2205 ISOamso 'emsp': ' ', # em space, U+2003 ISOpub 'ensp': ' ', # en space, U+2002 ISOpub 'epsilon': 'ε', # greek small letter epsilon, U+03B5 ISOgrk3 'equiv': '≡', # identical to, U+2261 ISOtech 'eta': 'η', # greek small letter eta, U+03B7 ISOgrk3 'eth': chr(240), # latin small letter eth, U+00F0 ISOlat1 'euml': chr(235), # latin small letter e with diaeresis, U+00EB ISOlat1 'exist': '∃', # there exists, U+2203 ISOtech 'fnof': 'ƒ', # latin small f with hook = function = florin, U+0192 ISOtech 'forall': '∀', # for all, U+2200 ISOtech 'frac12': chr(189), # vulgar fraction one half = fraction one half, U+00BD ISOnum 'frac14': chr(188), # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum 'frac34': chr(190), # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum 'frasl': '⁄', # fraction slash, U+2044 NEW 'gamma': 'γ', # greek small letter gamma, U+03B3 ISOgrk3 'ge': '≥', # greater-than or equal to, U+2265 ISOtech 'hArr': '⇔', # left right double arrow, U+21D4 ISOamsa 'harr': '↔', # left right arrow, U+2194 ISOamsa 'hearts': '♥', # black heart suit = valentine, U+2665 ISOpub 'hellip': '…', # horizontal ellipsis = three dot leader, U+2026 ISOpub 'iacute': chr(237), # latin small letter i with acute, U+00ED ISOlat1 'icirc': chr(238), # latin small letter i with circumflex, U+00EE ISOlat1 'iexcl': chr(161), # inverted exclamation mark, U+00A1 ISOnum 'igrave': chr(236), # latin small letter i with grave, U+00EC ISOlat1 'image': 'ℑ', # blackletter capital I = imaginary part, U+2111 ISOamso 'infin': '∞', # infinity, U+221E ISOtech 'int': '∫', # integral, U+222B ISOtech 'iota': 'ι', # greek small letter iota, U+03B9 ISOgrk3 'iquest': chr(191), # inverted question mark = turned question mark, 
U+00BF ISOnum
    'isin': '∈', # element of, U+2208 ISOtech
    'iuml': chr(239), # latin small letter i with diaeresis, U+00EF ISOlat1
    'kappa': 'κ', # greek small letter kappa, U+03BA ISOgrk3
    'lArr': '⇐', # leftwards double arrow, U+21D0 ISOtech
    'lambda': 'λ', # greek small letter lambda, U+03BB ISOgrk3
    'lang': '〈', # left-pointing angle bracket = bra, U+2329 ISOtech
    'laquo': chr(171), # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum
    'larr': '←', # leftwards arrow, U+2190 ISOnum
    'lceil': '⌈', # left ceiling = apl upstile, U+2308 ISOamsc
    'ldquo': '“', # left double quotation mark, U+201C ISOnum
    'le': '≤', # less-than or equal to, U+2264 ISOtech
    'lfloor': '⌊', # left floor = apl downstile, U+230A ISOamsc
    'lowast': '∗', # asterisk operator, U+2217 ISOtech
    'loz': '◊', # lozenge, U+25CA ISOpub
    'lrm': '‎', # left-to-right mark, U+200E NEW RFC 2070
    'lsaquo': '‹', # single left-pointing angle quotation mark, U+2039 ISO proposed
    'lsquo': '‘', # left single quotation mark, U+2018 ISOnum
    'macr': chr(175), # macron = spacing macron = overline = APL overbar, U+00AF ISOdia
    'mdash': '—', # em dash, U+2014 ISOpub
    'micro': chr(181), # micro sign, U+00B5 ISOnum
    'middot': chr(183), # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum
    'minus': '−', # minus sign, U+2212 ISOtech
    'mu': 'μ', # greek small letter mu, U+03BC ISOgrk3
    'nabla': '∇', # nabla = backward difference, U+2207 ISOtech
    'nbsp': chr(160), # no-break space = non-breaking space, U+00A0 ISOnum
    'ndash': '–', # en dash, U+2013 ISOpub
    'ne': '≠', # not equal to, U+2260 ISOtech
    'ni': '∋', # contains as member, U+220B ISOtech
    'not': chr(172), # not sign, U+00AC ISOnum
    'notin': '∉', # not an element of, U+2209 ISOtech
    'nsub': '⊄', # not a subset of, U+2284 ISOamsn
    'ntilde': chr(241), # latin small letter n with tilde, U+00F1 ISOlat1
    'nu': 'ν', # greek small letter nu, U+03BD ISOgrk3
    'oacute': chr(243), # latin small letter o with acute, U+00F3 ISOlat1
    'ocirc': chr(244), # latin small letter o with circumflex, U+00F4 ISOlat1
    'ograve': chr(242), # latin small letter o with grave, U+00F2 ISOlat1
    'oline': '‾', # overline = spacing overscore, U+203E NEW
    'omega': 'ω', # greek small letter omega, U+03C9 ISOgrk3
    'omicron': 'ο', # greek small letter omicron, U+03BF NEW
    'oplus': '⊕', # circled plus = direct sum, U+2295 ISOamsb
    'or': '∨', # logical or = vee, U+2228 ISOtech
    'ordf': chr(170), # feminine ordinal indicator, U+00AA ISOnum
    'ordm': chr(186), # masculine ordinal indicator, U+00BA ISOnum
    'oslash': chr(248), # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1
    'otilde': chr(245), # latin small letter o with tilde, U+00F5 ISOlat1
    'otimes': '⊗', # circled times = vector product, U+2297 ISOamsb
    'ouml': chr(246), # latin small letter o with diaeresis, U+00F6 ISOlat1
    'para': chr(182), # pilcrow sign = paragraph sign, U+00B6 ISOnum
    'part': '∂', # partial differential, U+2202 ISOtech
    'permil': '‰', # per mille sign, U+2030 ISOtech
    'perp': '⊥', # up tack = orthogonal to = perpendicular, U+22A5 ISOtech
    'phi': 'φ', # greek small letter phi, U+03C6 ISOgrk3
    'pi': 'π', # greek small letter pi, U+03C0 ISOgrk3
    'piv': 'ϖ', # greek pi symbol, U+03D6 ISOgrk3
    'plusmn': chr(177), # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum
    'pound': chr(163), # pound sign, U+00A3 ISOnum
    'prime': '′', # prime = minutes = feet, U+2032 ISOtech
    'prod': '∏', # n-ary product = product sign, U+220F ISOamsb
    'prop': '∝', # proportional to, U+221D ISOtech
    'psi': 'ψ', # greek small letter psi, U+03C8 ISOgrk3
    'rArr': '⇒', # rightwards double arrow, U+21D2 ISOtech
    'radic': '√', # square root = radical sign, U+221A ISOtech
    'rang': '〉', # right-pointing angle bracket = ket, U+232A ISOtech
    'raquo': chr(187), # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum
    'rarr': '→', # rightwards arrow, U+2192 ISOnum
    'rceil': '⌉', # right ceiling, U+2309 ISOamsc
    'rdquo': '”', # right double quotation mark, U+201D ISOnum
    'real': 'ℜ', # blackletter capital R = real part symbol, U+211C ISOamso
    'reg': chr(174), # registered sign = registered trade mark sign, U+00AE ISOnum
    'rfloor': '⌋', # right floor, U+230B ISOamsc
    'rho': 'ρ', # greek small letter rho, U+03C1 ISOgrk3
    'rlm': '‏', # right-to-left mark, U+200F NEW RFC 2070
    'rsaquo': '›', # single right-pointing angle quotation mark, U+203A ISO proposed
    'rsquo': '’', # right single quotation mark, U+2019 ISOnum
    'sbquo': '‚', # single low-9 quotation mark, U+201A NEW
    'sdot': '⋅', # dot operator, U+22C5 ISOamsb
    'sect': chr(167), # section sign, U+00A7 ISOnum
    'shy': chr(173), # soft hyphen = discretionary hyphen, U+00AD ISOnum
    'sigma': 'σ', # greek small letter sigma, U+03C3 ISOgrk3
    'sigmaf': 'ς', # greek small letter final sigma, U+03C2 ISOgrk3
    'sim': '∼', # tilde operator = varies with = similar to, U+223C ISOtech
    'spades': '♠', # black spade suit, U+2660 ISOpub
    'sub': '⊂', # subset of, U+2282 ISOtech
    'sube': '⊆', # subset of or equal to, U+2286 ISOtech
    'sum': '∑', # n-ary sumation, U+2211 ISOamsb
    'sup': '⊃', # superset of, U+2283 ISOtech
    'sup1': chr(185), # superscript one = superscript digit one, U+00B9 ISOnum
    'sup2': chr(178), # superscript two = superscript digit two = squared, U+00B2 ISOnum
    'sup3': chr(179), # superscript three = superscript digit three = cubed, U+00B3 ISOnum
    'supe': '⊇', # superset of or equal to, U+2287 ISOtech
    'szlig': chr(223), # latin small letter sharp s = ess-zed, U+00DF ISOlat1
    'tau': 'τ', # greek small letter tau, U+03C4 ISOgrk3
    'there4': '∴', # therefore, U+2234 ISOtech
    'theta': 'θ', # greek small letter theta, U+03B8 ISOgrk3
    'thetasym': 'ϑ', # greek small letter theta symbol, U+03D1 NEW
    'thinsp': ' ', # thin space, U+2009 ISOpub
    'thorn': chr(254), # latin small letter thorn with, U+00FE ISOlat1
    'times': chr(215), # multiplication sign, U+00D7 ISOnum
    'trade': '™', # trade mark sign, U+2122 ISOnum
    'uArr': '⇑', # upwards double arrow, U+21D1 ISOamsa
    'uacute': chr(250), # latin small letter u with acute, U+00FA ISOlat1
    'uarr': '↑', # upwards arrow, U+2191 ISOnum
    'ucirc': chr(251), # latin small letter u with circumflex, U+00FB ISOlat1
    'ugrave': chr(249), # latin small letter u with grave, U+00F9 ISOlat1
    'uml': chr(168), # diaeresis = spacing diaeresis, U+00A8 ISOdia
    'upsih': 'ϒ', # greek upsilon with hook symbol, U+03D2 NEW
    'upsilon': 'υ', # greek small letter upsilon, U+03C5 ISOgrk3
    'uuml': chr(252), # latin small letter u with diaeresis, U+00FC ISOlat1
    'weierp': '℘', # script capital P = power set = Weierstrass p, U+2118 ISOamso
    'xi': 'ξ', # greek small letter xi, U+03BE ISOgrk3
    'yacute': chr(253), # latin small letter y with acute, U+00FD ISOlat1
    'yen': chr(165), # yen sign = yuan sign, U+00A5 ISOnum
    'yuml': chr(255), # latin small letter y with diaeresis, U+00FF ISOlat1
    'zeta': 'ζ', # greek small letter zeta, U+03B6 ISOgrk3
    'zwj': '‍', # zero width joiner, U+200D NEW RFC 2070
    'zwnj': '‌', # zero width non-joiner, U+200C NEW RFC 2070
}

From tim_one at email.msn.com  Tue Aug 17 09:30:17 1999
From: tim_one at email.msn.com (Tim Peters)
Date: Tue, 17 Aug 1999 03:30:17 -0400
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37B8042B.21DE6053@lemburg.com>
Message-ID: <000001bee882$5b7d8da0$112d2399@tim>

[about weakdicts and the possibility of building them on weak
references; the obvious way doesn't clean up the dict itself by
magic; maybe a weak object should be notified when its referent
goes away
]

[M.-A. Lemburg]
> Perhaps one could fiddle something out of the Proxy objects
> in mxProxy (you know where...). These support a special __cleanup__
> protocol that I use a lot to work around circular garbage:
> the __cleanup__ method of the referenced object is called prior
> to destroying the proxy; even if the reference count on the
> object has not yet gone down to 0.
>
> This makes direct circles possible without problems: the parent
> can reference a child through the proxy and the child can reference the
> parent directly.
What you just wrote is:

    parent --> proxy --> child -->+
      ^                           v
      +<--------------------------+

Looks like a plain old cycle to me!

> As soon as the parent is cleaned up, the reference to
> the proxy is deleted which then automagically makes the
> back reference in the child disappear, allowing the parent
> to be deallocated after cleanup without leaving a circular
> reference around.

M-A, this is making less sense by the paragraph : skipping the middle, this says "as soon as the parent is cleaned up ... allowing the parent to be deallocated after cleanup". If we presume that the parent gets cleaned up explicitly (since the reference from the child is keeping it alive, it's not going to get cleaned up by magic, right?), then the parent could just as well call the __cleanup__ methods of the things it references directly without bothering with a proxy. For that matter, if it's the straightforward

    parent <-> child

kind of cycle, the parent's cleanup method can just do

    self.__dict__.clear()

and the cycle is broken without writing a __cleanup__ method anywhere (that's what I usually do, and in this kind of cycle that clears the last reference to the child, which then goes away, which in turn automagically clears its back reference to the parent).

So, offhand, I don't see that the proxy protocol could help here. In a sense, what's really needed is the opposite: notifying the *proxy* when the *real* object goes away (which makes no sense in the context of what your proxy objects were designed to do).

[about Java and its four reference strengths]

Found a good introductory writeup at (sorry, my mailer will break this URL, so I'll break it myself at a sensible place):

http://developer.java.sun.com/developer/
technicalArticles//ALT/RefObj/index.html

They have a class for each of the three "not strong" flavors of references. For all three you pass the referenced object to the constructor, and all three accept (optional in two of the flavors) a second ReferenceQueue argument.
In the latter case, when the referenced object goes away the weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn, is a thread-safe queue with various put, get, and timeout-limited polling functions. So you have to write code to look at the queue from time to time, to find the proxies whose referents have gone away.

The three flavors may (or may not ...) have these motivations:

soft: an object reachable at strongest by soft references can go away at any time, but the garbage collector strives to keep it intact until it can't find any other way to get enough memory

weak: an object reachable at strongest by weak references can go away at any time, and the collector makes no attempt to delay its death

phantom: an object reachable at strongest by phantom references can get *finalized* at any time, but won't get *deallocated* before its phantom proxy does something or other (goes away? wasn't clear). This is the flavor that requires passing a queue argument to the constructor. Seems to be a major hack to worm around Java's notorious problems with order of finalization -- along the lines that you give phantom referents trivial finalizers, and put the real cleanup logic in the phantom proxy. This lets your program take responsibility for running the real cleanup code in the order-- and in the thread! --where it makes sense.

Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with under-the-cover weak keys (unlike Dieter's flavor with weak values), and where the key+value pairs vanish by magic when the key object goes away. The details and the implementation of these guys weren't clear to me, but then I didn't download the code, just scanned the online docs.

Ah, a correction to my last post:

class _Weak:
    ...
    def __del__(self):
        # this is purely an optimization: if self gets nuked,
        # exempt its referent from greater expense when *it*
        # dies
        if self.id is not None:
            __clear_weak_bit(__id2obj(self.id))
            del id2weak[self.id]

Root of all evil: this method is useless, since the id2weak dict keeps each _Weak object alive until its referent goes away (at which time self.id gets set to None, so _Weak.__del__ doesn't do anything). Even if it did do something, it's no cheaper to do it here than in the system cleanup code ("greater expense" was wrong).

weakly y'rs - tim

PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and called them "limp dicts". I'm not entirely sure it was an innocent Freudian slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds like American slang for "flaccid one-eyed trouser snake" ...).

From fredrik at pythonware.com  Tue Aug 17 09:23:03 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 09:23:03 +0200
Subject: [Python-Dev] buffer interface considered harmful
References:
Message-ID: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>

David Ascher wrote:
> Why? PIL was designed for image processing, and made design decisions
> appropriate to that domain. NumPy was designed for multidimensional
> numeric array processing, and made design decisions appropriate to that
> domain. The intersection of interests exists (e.g. in the medical imaging
> world), and I know people who spend a lot of their CPU time moving data
> between images and arrays with "stupid" tostring/fromstring operations.
> Given the size of the images, it's a prodigious waste of time, and kills
> the use of Python in many a project.

as an aside, PIL 1.1 (*) introduces "virtual image memories" which are, as I mentioned in an earlier post, accessed via an API rather than via direct pointers. it'll also include an adapter allowing you to use NumPy objects as image memories.
unfortunately, the buffer interface is not good enough to use on top of the virtual image memory interface...

*) 1.1 is our current development thread, which will be released to plus customers in a number of weeks...

From mal at lemburg.com  Tue Aug 17 10:50:01 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Tue, 17 Aug 1999 10:50:01 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim>
Message-ID: <37B92239.4076841E@lemburg.com>

Tim Peters wrote:
>
> [about weakdicts and the possibility of building them on weak
> references; the obvious way doesn't clean up the dict itself by
> magic; maybe a weak object should be notified when its referent
> goes away
> ]
>
> [M.-A. Lemburg]
> > Perhaps one could fiddle something out of the Proxy objects
> > in mxProxy (you know where...). These support a special __cleanup__
> > protocol that I use a lot to work around circular garbage:
> > the __cleanup__ method of the referenced object is called prior
> > to destroying the proxy; even if the reference count on the
> > object has not yet gone down to 0.
> >
> > This makes direct circles possible without problems: the parent
> > can reference a child through the proxy and the child can reference the
> > parent directly.
>
> What you just wrote is:
>
>     parent --> proxy --> child -->+
>       ^                           v
>       +<--------------------------+
>
> Looks like a plain old cycle to me!

Sure :-) That was the intention. I'm using this to implement acquisition without turning to ExtensionClasses. [Nice picture, BTW]

> > As soon as the parent is cleaned up, the reference to
> > the proxy is deleted which then automagically makes the
> > back reference in the child disappear, allowing the parent
> > to be deallocated after cleanup without leaving a circular
> > reference around.
>
> M-A, this is making less sense by the paragraph : skipping the
> middle, this says "as soon as the parent is cleaned up ... allowing the
> parent to be deallocated after cleanup". If we presume that the parent gets
> cleaned up explicitly (since the reference from the child is keeping it
> alive, it's not going to get cleaned up by magic, right?), then the parent
> could just as well call the __cleanup__ methods of the things it references
> directly without bothering with a proxy. For that matter, if it's the
> straightforward
>
>     parent <-> child
>
> kind of cycle, the parent's cleanup method can just do
>
>     self.__dict__.clear()
>
> and the cycle is broken without writing a __cleanup__ method anywhere
> (that's what I usually do, and in this kind of cycle that clears the last
> reference to the child, which then goes away, which in turn automagically
> clears its back reference to the parent).
>
> So, offhand, I don't see that the proxy protocol could help here. In a
> sense, what's really needed is the opposite: notifying the *proxy* when the
> *real* object goes away (which makes no sense in the context of what your
> proxy objects were designed to do).

All true :-). The nice thing about the proxy is that it takes care of the process automagically. And yes, the parent is used via a proxy too. So the picture looks like this:

    --> proxy --> parent --> proxy --> child -->+
                    ^                           v
                    +<--------------------------+

Since the proxy isn't noticed by the referencing objects (well, at least if they don't fiddle with internals), the picture for the objects looks like this:

    --> parent --> child -->+
          ^                 v
          +<----------------+

You could of course do the same via explicit invocation of the __cleanup__ method, but the object references involved could be hidden in some other structure, so they might be hard to find.

And there's another feature about Proxies (as defined in mxProxy): they allow you to control access in a much more strict way than Python does.
You can actually hide attributes and methods you don't want exposed in a way that doesn't even let you access them via some dict or pass me the frame object trick. This is very useful when you program multi-user application host servers where you don't want users to access internal structures of the server.

> [about Java and its four reference strengths]
>
> Found a good introductory writeup at (sorry, my mailer will break this URL,
> so I'll break it myself at a sensible place):
>
> http://developer.java.sun.com/developer/
> technicalArticles//ALT/RefObj/index.html

Thanks for the reference... and for the summary ;-)

> They have a class for each of the three "not strong" flavors of references.
> For all three you pass the referenced object to the constructor, and all
> three accept (optional in two of the flavors) a second ReferenceQueue
> argument. In the latter case, when the referenced object goes away the
> weak/soft/phantom-ref proxy object is placed on the queue. Which, in turn,
> is a thread-safe queue with various put, get, and timeout-limited polling
> functions. So you have to write code to look at the queue from time to
> time, to find the proxies whose referents have gone away.
>
> The three flavors may (or may not ...) have these motivations:
>
> soft: an object reachable at strongest by soft references can go away at
> any time, but the garbage collector strives to keep it intact until it can't
> find any other way to get enough memory

So there is a possibility of reviving these objects, right ?

I've just recently added a hackish function to my mxTools which allows me to regain access to objects via their address (no, not thread safe, not even necessarily correct).

sys.makeref(id)
    Provided that id is a valid address of a Python object (id(object)
    returns this address), this function returns a new reference to it.
    Only objects that are "alive" can be referenced this way, ones with
    zero reference count cause an exception to be raised.

You can use this function to reaccess objects lost during garbage collection.

USE WITH CARE: this is an expert-only function since it can cause instant core dumps and many other strange things -- even ruin your system if you don't know what you're doing !

SECURITY WARNING: This function can provide you with access to objects that are otherwise not visible, e.g. in restricted mode, and thus be a potential security hole.

I use it for tracking objects via id-key based dictionary and hooks in the create/del mechanisms of Python instances. It helps finding those memory eating cycles.

> weak: an object reachable at strongest by weak references can go away at
> any time, and the collector makes no attempt to delay its death
>
> phantom: an object reachable at strongest by phantom references can get
> *finalized* at any time, but won't get *deallocated* before its phantom
> proxy does something or other (goes away? wasn't clear). This is the flavor
> that requires passing a queue argument to the constructor. Seems to be a
> major hack to worm around Java's notorious problems with order of
> finalization -- along the lines that you give phantom referents trivial
> finalizers, and put the real cleanup logic in the phantom proxy. This lets
> your program take responsibility for running the real cleanup code in the
> order-- and in the thread! --where it makes sense.

Wouldn't these flavors be possible using the following setup ? Note that it's quite similar to your _Weak class except that I use a proxy without the need to first get a strong reference for the object and that it doesn't use a weak bit.

    --> proxy --> object
                    ^
                    |
          all_managed_objects

all_managed_objects is a dictionary indexed by address (its id) and keeps a strong reference to the objects. The proxy does not keep a strong reference to the object, but only the address as integer and checks the ref-count on the object in the all_managed_objects dictionary prior to every dereferencing action.
In case this refcount falls down to 1 (only the all_managed_objects dict references it), the proxy takes appropriate action, e.g. raises an exception and deletes the reference in all_managed_objects to mimic a weak reference. The same check is done prior to garbage collection of the proxy.

Add to this some queues, pepper and salt and place it in an oven at 220° for 20 minutes... plus take a look every 10 seconds or so...

The downside is obvious: the zombified object will not get inspected (and then GCed) until the next time a weak reference to it is used.

> Java 1.2 *also* tosses in a WeakHashMap class, which is a dict with
> under-the-cover weak keys (unlike Dieter's flavor with weak values), and
> where the key+value pairs vanish by magic when the key object goes away.
> The details and the implementation of these guys weren't clear to me, but
> then I didn't download the code, just scanned the online docs.

Would the above help in creating such beasts ?

> Ah, a correction to my last post:
>
> class _Weak:
>     ...
>     def __del__(self):
>         # this is purely an optimization: if self gets nuked,
>         # exempt its referent from greater expense when *it*
>         # dies
>         if self.id is not None:
>             __clear_weak_bit(__id2obj(self.id))
>             del id2weak[self.id]
>
> Root of all evil: this method is useless, since the id2weak dict keeps each
> _Weak object alive until its referent goes away (at which time self.id gets
> set to None, so _Weak.__del__ doesn't do anything). Even if it did do
> something, it's no cheaper to do it here than in the system cleanup code
> ("greater expense" was wrong).
>
> weakly y'rs - tim
>
> PS: Ooh! Ooh! Fellow at work today was whining about weakdicts, and
> called them "limp dicts". I'm not entirely sure it was an innocent Freudian
> slut, but it's a funny pun even if it wasn't (for you foreigners, it sounds
> like American slang for "flaccid one-eyed trouser snake" ...).
:-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 136 days left
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From mhammond at skippinet.com.au  Tue Aug 17 18:05:40 1999
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 18 Aug 1999 02:05:40 +1000
Subject: [Python-Dev] buffer interface considered harmful
In-Reply-To: <00c201bee884$42a10ad0$f29b12c2@secret.pythonware.com>
Message-ID: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>

Fredrik,

Care to elaborate? Statements like "buffer interface needs a redesign" or "the buffer interface is not good enough to use on top of the virtual image memory interface" really only give me the impression you have a bee in your bonnet over these buffer interfaces.

If you could actually stretch these statements out to provide even _some_ background, problem statement or potential solution it would help. All I know is "Fredrik doesn't like it for some unexplained reason". You found an issue with array reallocation - great - but that's a bug rather than a design flaw. Can you tell us why it's not good enough, and an off-the-cuff design that would solve it? Or are you suggesting it is unsolvable? I really don't have a clue what your issue is.

Jim (for example) has made his position and reasoning clear. You have only made your position clear, but your reasoning is still a mystery.

Mark.

>
> unfortunately, the buffer interface is not good enough to use
> on top of the virtual image memory interface...

From fredrik at pythonware.com  Tue Aug 17 18:48:31 1999
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Tue, 17 Aug 1999 18:48:31 +0200
Subject: [Python-Dev] buffer interface considered harmful
References: <000901bee8ca$5ceff4a0$1101a8c0@bobcat>
Message-ID: <005201bee8d0$9b4737d0$f29b12c2@secret.pythonware.com>

> Care to elaborate? Statements like "buffer interface needs a redesign" or
> "the buffer interface is not good enough to use on top of the virtual image
> memory interface" really only give me the impression you have a bee in your
> bonnet over these buffer interfaces.

re "good enough":
http://www.python.org/pipermail/python-dev/1999-August/000650.html

re "needs a redesign":
http://www.python.org/pipermail/python-dev/1999-August/000659.html

and to some extent:
http://www.python.org/pipermail/python-dev/1999-August/000658.html

> Jim (for example) has made his position and reasoning clear.

among other things, Jim said: "At this point, I don't have a good idea what buffers are for and I don't see a lot of evidence that there *is* a design. I assume that there was a design, but I can't see it".

which pretty much echoes my concerns in:

http://www.python.org/pipermail/python-dev/1999-August/000612.html
http://www.python.org/pipermail/python-dev/1999-August/000648.html

> You found an issue with array reallocation - great - but that's
> a bug rather than a design flaw.

for me, that bug (and the marshal glitch) indicates that the design isn't as crystal-clear as it needs to be, for such a fundamental feature. otherwise, Greg would never have made that mistake, and Guido would have spotted it when he added the "buffer" built-in...

so what are you folks waiting for? could someone who thinks he understands exactly what this thing is spend an hour on writing that design document, so Jim and I can put this entire thing behind us?

PS. btw, was it luck or careful analysis behind the decision to make buffer() always return read-only buffers, also for objects implementing the read/write protocol?

From da at ski.org  Wed Aug 18 00:41:14 1999
From: da at ski.org (David Ascher)
Date: Tue, 17 Aug 1999 15:41:14 -0700 (Pacific Daylight Time)
Subject: [Python-Dev] marshal (was:Buffer interface in abstract.c?
)
In-Reply-To: <19990816094243.3CE83303120@snelboot.oratrix.nl>
Message-ID:

On Mon, 16 Aug 1999, Jack Jansen wrote:

> Would adding a buffer interface to cobject solve your problem? Cobject is
> described as being used for passing C objects between Python modules, but I've
> always thought of it as passing C objects from one C routine to another C
> routine through Python, which doesn't necessarily understand what the object
> is all about.
>
> That latter description seems to fit your bill quite nicely.

It's an interesting idea, but it wouldn't do as it is, as I'd need the ability to create a CObject given a memory location and a size. Also, I am not expected to free() the memory, which would happen when the CObject got GC'ed.

(BTW: I am *not* arguing that PyBuffer_FromReadWriteMemory() should be exposed by default. I'm happy with exposing it in my little extension module for my exotic needs.)

--david

From mal at lemburg.com  Wed Aug 18 11:02:02 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 11:02:02 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <000001bee882$5b7d8da0$112d2399@tim> <37B92239.4076841E@lemburg.com>
Message-ID: <37BA768A.50DF5574@lemburg.com>

[about weakdicts and the possibility of building them on weak
references; the obvious way doesn't clean up the dict itself by
magic; maybe a weak object should be notified when its referent
goes away
]

Here is a new version of my Proxy package which includes a self-managing weak reference mechanism without the need to add extra bits or bytes to all Python objects:

http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip

The docs and an explanation of how the thingie works are included in the archive's Doc subdir.
Basically it builds upon the idea I posted earlier on in this thread -- with a few extra kicks to get it right in the end ;-)

Usage is pretty simple:

from Proxy import WeakProxy
object = []
wr = WeakProxy(object)
wr.append(8)
del object

>>> wr[0]
Traceback (innermost last):
File "", line 1, in ?
mxProxy.LostReferenceError: object already garbage collected

I have checked the ref counts pretty thoroughly, but before going public I would like the Python-Dev crowd to run some tests as well: after all, the point is for the weak references to be weak and that's sometimes a bit hard to check.

Hope you have as much fun with it as I had writing it ;-)

Ah yes, for the raw details have a look at the code. The code uses a list of back references to the weak Proxies and notifies them when the object goes away... would it be useful to add a hook to the Proxies so that they can apply some other action as well ?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From Vladimir.Marangozov at inrialpes.fr  Wed Aug 18 13:42:08 1999
From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov)
Date: Wed, 18 Aug 1999 12:42:08 +0100 (NFT)
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: <37BA768A.50DF5574@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 11:02:02 am"
Message-ID: <199908181142.MAA22596@pukapuka.inrialpes.fr>

M.-A. Lemburg wrote:
>
> Usage is pretty simple:
>
> from Proxy import WeakProxy
> object = []
> wr = WeakProxy(object)
> wr.append(8)
> del object
>
> >>> wr[0]
> Traceback (innermost last):
> File "", line 1, in ?
> mxProxy.LostReferenceError: object already garbage collected
>
> I have checked the ref counts pretty thoroughly, but before
> going public I would like the Python-Dev crowd to run some
> tests as well: after all, the point is for the weak references
> to be weak and that's sometimes a bit hard to check.

It's even harder to implement them without side effects. I used the same hack for the __heirs__ class attribute some time ago. But I knew that a parent class cannot be garbage collected before all of its descendants. That allowed me to keep weak refs in the parent class, and preserve the existing strong refs in the subclasses. On every dealloc of a subclass, the corresponding weak ref in the parent class' __heirs__ is removed.

In your case, the lifetime of the objects cannot be predicted, so implementing weak refs by messing with refcounts or checking mem pointers is a dead end. I don't know whether this is the case with mxProxy as I just browsed the code quickly, but here's a scenario where your scheme (or implementation) is not working:

>>> from Proxy import WeakProxy
>>> o = []
>>> p = WeakProxy(o)
>>> d = WeakProxy(o)
>>> p

>>> d

>>> print p
[]
>>> print d
[]
>>> del o
>>> p

>>> d

>>> print p
Illegal instruction (core dumped)

-- 
Vladimir MARANGOZOV                 | Vladimir.Marangozov at inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

From jack at oratrix.nl  Wed Aug 18 13:02:13 1999
From: jack at oratrix.nl (Jack Jansen)
Date: Wed, 18 Aug 1999 13:02:13 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
In-Reply-To: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com>
Message-ID: <19990818110213.A558F303120@snelboot.oratrix.nl>

The one thing I'm not thrilled by in mxProxy is that a call to CheckWeakReferences() is needed before an object is cleaned up.
I guess this boils down to the same problem I had with my weak reference scheme: you somehow want the Python core to tell the proxy stuff that the object can be cleaned up (although the details are different: in my scheme this would be triggered by refcount==0 and in mxProxy by refcount==1). And because objects are created and destroyed in Python at a tremendous rate you don't want to do this call for every object, only if you have a hint that the object has a weak reference (or a proxy).

-- 
Jack Jansen             | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack    | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm

From mal at lemburg.com  Wed Aug 18 13:46:45 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 13:46:45 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <19990818110213.A558F303120@snelboot.oratrix.nl>
Message-ID: <37BA9D25.95E46EA@lemburg.com>

Jack Jansen wrote:
>
> The one thing I'm not thrilled by in mxProxy is that a call to
> CheckWeakReferences() is needed before an object is cleaned up. I guess this
> boils down to the same problem I had with my weak reference scheme: you
> somehow want the Python core to tell the proxy stuff that the object can be
> cleaned up (although the details are different: in my scheme this would be
> triggered by refcount==0 and in mxProxy by refcount==1). And because objects
> are created and destroyed in Python at a tremendous rate you don't want to do
> this call for every object, only if you have a hint that the object has a weak
> reference (or a proxy).

Well, the check is done prior to every action using a proxy to the object and also when a proxy to it is deallocated. The additional checkweakrefs() API is only included to enable additional explicit checking of the whole weak refs dictionary, e.g. every 10 seconds or so (just like you would with a mark&sweep GC).
But yes, GC of the phantom object is delayed a bit depending on how you set up the proxies. Still, I think most usages won't have this problem, since the proxies themselves are usually temporary objects. It may sometimes even make sense to have the phantom object around as long as possible, e.g. to implement the soft references Tim quoted from the Java paper.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 135 days left
Business:      http://www.lemburg.com/
Python Pages:  http://www.lemburg.com/python/

From mal at lemburg.com  Wed Aug 18 13:33:18 1999
From: mal at lemburg.com (M.-A. Lemburg)
Date: Wed, 18 Aug 1999 13:33:18 +0200
Subject: [Python-Dev] Quick-and-dirty weak references
References: <199908181142.MAA22596@pukapuka.inrialpes.fr>
Message-ID: <37BA99FE.45D582AD@lemburg.com>

Vladimir Marangozov wrote:
>
> M.-A. Lemburg wrote:
> > I have checked the ref counts pretty thoroughly, but before
> > going public I would like the Python-Dev crowd to run some
> > tests as well: after all, the point is for the weak references
> > to be weak and that's sometimes a bit hard to check.
>
> It's even harder to implement them without side effects. I used
> the same hack for the __heirs__ class attribute some time ago.
> But I knew that a parent class cannot be garbage collected before
> all of its descendants. That allowed me to keep weak refs in
> the parent class, and preserve the existing strong refs in the
> subclasses. On every dealloc of a subclass, the corresponding
> weak ref in the parent class' __heirs__ is removed.
>
> In your case, the lifetime of the objects cannot be predicted,
> so implementing weak refs by messing with refcounts or checking
> mem pointers is a dead end.
> I don't know whether this is the > case with mxProxy as I just browsed the code quickly, but here's > a scenario where your scheme (or implementation) is not working: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = WeakProxy(o) > >>> d = WeakProxy(o) > >>> p > > >>> d > > >>> print p > [] > >>> print d > [] > >>> del o > >>> p > > >>> d > > >>> print p > Illegal instruction (core dumped) Could you tell me where the core dump originates ? Also, it would help to compile the package with the -DMAL_DEBUG switch turned on (edit Setup) and then run the same things using 'python -d'. The package will then print a pretty complete list of things it is doing to mxProxy.log, which would help track down errors like these. BTW, I get: >>> print p Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> [Don't know why the print statement prints an empty line, though.] Thanks for trying it, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 15:12:14 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 14:12:14 +0100 (NFT) Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <37BA99FE.45D582AD@lemburg.com> from "M.-A. Lemburg" at "Aug 18, 99 01:33:18 pm" Message-ID: <199908181312.OAA20542@pukapuka.inrialpes.fr> [about mxProxy, WeakProxy] M.-A. Lemburg wrote: > > Could you tell me where the core dump originates ? Also, it would > help to compile the package with the -DMAL_DEBUG switch turned > on (edit Setup) and then run the same things using 'python -d'. > The package will then print a pretty complete list of things it > is doing to mxProxy.log, which would help track down errors like > these. 
> > BTW, I get: > >>> print p > > Traceback (innermost last): > File "", line 1, in ? > mxProxy.LostReferenceError: object already garbage collected > >>> > > [Don't know why the print statement prints an empty line, though.] > The previous example now *seems* to work fine in a freshly launched interpreter, so it's not a good example, but this shorter one definitely doesn't: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = q = WeakProxy(o) >>> del o >>> print p or q Illegal instruction (core dumped) Or even shorter: >>> from Proxy import WeakProxy >>> o = [] >>> p = q = WeakProxy(o) >>> p = WeakProxy(o) >>> del o >>> print p Illegal instruction (core dumped) It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). I can mail you a complete trace in private, if you still need it. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Wed Aug 18 14:50:08 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 18 Aug 1999 14:50:08 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> Message-ID: <37BAAC00.27A34FF7@lemburg.com> Vladimir Marangozov wrote: > > [about mxProxy, WeakProxy] > > M.-A. Lemburg wrote: > > > > Could you tell me where the core dump originates ? Also, it would > > help to compile the package with the -DMAL_DEBUG switch turned > > on (edit Setup) and then run the same things using 'python -d'. > > The package will then print a pretty complete list of things it > > is doing to mxProxy.log, which would help track down errors like > > these. > > > > BTW, I get: > > >>> print p > > > > Traceback (innermost last): > > File "", line 1, in ? > > mxProxy.LostReferenceError: object already garbage collected > > >>> > > > > [Don't know why the print statement prints an empty line, though.] 
> > > > The previous example now *seems* to work fine in a freshly launched > interpreter, so it's not a good example, but this shorter one > definitely doesn't: > > >>> from Proxy import WeakProxy > >>> o = [] > >>> p = q = WeakProxy(o) > >>> p = q = WeakProxy(o) > >>> del o > >>> print p or q > Illegal instruction (core dumped) > > It crashes in PyDict_DelItem() called from mxProxy_CollectWeakReference(). > I can mail you a complete trace in private, if you still need it. That would be nice (please also include the log-file), because I get: >>> print p or q Traceback (innermost last): File "", line 1, in ? mxProxy.LostReferenceError: object already garbage collected >>> Thank you, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 135 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Wed Aug 18 16:47:23 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 09:47:23 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart Message-ID: <199908181447.JAA05151@dolphin.mojam.com> I posted a note to the main list yesterday in response to Dan Connolly's complaint that the os module isn't very portable. I saw no followups (it's amazing how fast a thread can die out :-), but I think it's a reasonable idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback from people more interested in long-term Python developments. The basic premise is that for each platform on which Python runs there are portable and nonportable interfaces to the underlying operating system. The term POSIX has some portability connotations, so let's assume that the posix module exposes the portable subset of the OS interface. To keep things simple, let's also assume there are only three supported general OS platforms: unix, nt and mac.
The proposal then is that importing the platform's module by name will import both the portable and non-portable interface elements. Importing the posix module will import just that portion of the interface that is truly portable across all platforms. To add new functionality to the posix interface it would have to be added to all three platforms. The posix module will be able to ferret out the platform it is running on and import the correct OS-independent posix implementation: import sys _plat = sys.platform del sys if _plat == "mac": from posixmac import * elif _plat == "nt": from posixnt import * else: from posixunix import * # some unix variant The platform-dependent module would simply import everything it could, e.g.: from posixunix import * from nonposixunix import * The os module would vanish or be deprecated with its current behavior intact. The documentation would be modified so that the posix module documents the portable interface and the OS-dependent module's documentation documents the rest and just refers users to the posix module docs for the portable stuff. In theory, this could be done for 1.6, however as I've proposed it, the semantics of importing the posix module would change. Dan Connolly probably isn't going to have a problem with that, though I suppose Guido might... If this idea is good enough for 1.6, perhaps we leave os and posix module semantics alone and add a module named "portable", "portableos" or "portableposix" or something equally arcane. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From guido at CNRI.Reston.VA.US Wed Aug 18 16:54:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 10:54:28 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 09:47:23 CDT." 
<199908181447.JAA05151@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <199908181454.KAA07692@eric.cnri.reston.va.us> > I posted a note to the main list yesterday in response to Dan Connolly's > complaint that the os module isn't very portable. I saw no followups (it's > amazing how fast a thread can die out :-), but I think it's a reasonable > idea, perhaps for Python 2.0, so I'll repeat it here to get some feedback > from people more interesting in long-term Python developments. > > The basic premise is that for each platform on which Python runs there are > portable and nonportable interfaces to the underlying operating system. The > term POSIX has some portability connotations, so let's assume that the posix > module exposes the portable subset of the OS interface. To keep things > simple, let's also assume there are only three supported general OS > platforms: unix, nt and mac. The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. To > add new functionality to the posix interface it would have to be added to > all three platforms. The posix module will be able to ferret out the > platform it is running on and import the correct OS-independent posix > implementation: > > import sys > _plat = sys.platform > del sys > > if _plat == "mac": from posixmac import * > elif _plat == "nt": from posixnt import * > else: from posixunix import * # some unix variant > > The platform-dependent module would simply import everything it could, e.g.: > > from posixunix import * > from nonposixunix import * > > The os module would vanish or be deprecated with its current behavior > intact. 
The documentation would be modified > documents the portable interface and the OS-dependent module's documentation > documents the rest and just refers users to the posix module docs for the > portable stuff. > > In theory, this could be done for 1.6, however as I've proposed it, the > semantics of importing the posix module would change. Dan Connolly probably > isn't going to have a problem with that, though I suppose Guido might... If > this idea is good enough for 1.6, perhaps we leave os and posix module > semantics alone and add a module named "portable", "portableos" or > "portableposix" or something equally arcane. And the advantage of this would be...? Basically, it seems you're just renaming the functionality of os to posix. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Wed Aug 18 17:10:41 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:10:41 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <199908181454.KAA07692@eric.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> Message-ID: <14266.51743.904066.470431@dolphin.mojam.com> Guido> And the advantage of this would be...? Guido> Basically, it seems you're just renaming the functionality of os Guido> to posix. I see a few advantages. 1. We will get the meaning of the noun "posix" more or less right. Programmers coming from other languages are used to thinking of programming to a POSIX API or the "POSIX subset of the OS API". Witness all the "#ifdef _POSIX" in the header files on my Linux box. In Python, the exact opposite is true. Importing the posix module is documented to be the non-portable way to interface to Unix platforms. 2. You would make it clear on all platforms when you expect to be programming in a non-portable fashion, by importing the platform-specific os (unix, nt, mac).
"import unix" would mean I expect this code to only run on Unix machines. You could argue that you are declaring your non-portability by importing the posix module today, but to the casual user or to a new Python programmer with a C or C++ background, that won't be obvious. 3. If Dan Connolly's contention is correct, importing the os module today is not all that portable. I can't really say one way or the other, because I'm lucky enough to be able to confine my serious programming to Unix. I'm sure there's someone out there that can try the following on a few platforms: import os dir(os) and compare the output. Skip From jack at oratrix.nl Wed Aug 18 17:33:20 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 18 Aug 1999 17:33:20 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <19990818153320.D61F6303120@snelboot.oratrix.nl> > The proposal then is that importing the > platform's module by name will import both the portable and non-portable > interface elements. Importing the posix module will import just that > portion of the interface that is truly portable across all platforms. There's one slight problem with this: when you use functionality that is partially portable, i.e. a call that is available on Windows and Unix but not on the Mac. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From akuchlin at mems-exchange.org Wed Aug 18 17:39:30 1999 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Wed, 18 Aug 1999 11:39:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <14266.54194.715887.808096@amarok.cnri.reston.va.us> Skip Montanaro writes: > 2. You would make it clear on all platforms when you expect to be > programming in a non-portable fashion, by importing the > platform-specific os (unix, nt, mac). "import unix" would mean I To my mind, POSIX == Unix; other platforms may have bits of POSIX-ish functionality, but most POSIX functions will only be found on Unix systems. One of my projects for 1.6 is to go through the O'Reilly POSIX book and add all the missing calls to the posix modules. Practically none of those functions would exist on Windows or Mac. Perhaps it's really a documentation fix: the os module should document only those features common to all of the big 3 platforms (Unix, Windows, Mac), and have pointers to a section for each of the platform-specific modules, listing the platform-specific functions. -- A.M. Kuchling http://starship.python.net/crew/amk/ Setting loose on the battlefield weapons that are able to learn may be one of the biggest mistakes mankind has ever made. It could also be one of the last. 
-- Richard Forsyth, "Machine Learning for Expert Systems" From skip at mojam.com Wed Aug 18 17:52:20 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:52:20 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54194.715887.808096@amarok.cnri.reston.va.us> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> Message-ID: <14266.54907.143970.101594@dolphin.mojam.com> Andrew> Perhaps it's really a documentation fix: the os module should Andrew> document only those features common to all of the big 3 Andrew> platforms (Unix, Windows, Mac), and have pointers to a section Andrew> for each of the platform-specific modules, listing the Andrew> platform-specific functions. Perhaps. Should that read ... the os module should *expose* only those features common to all of the big 3 platforms ... ? Skip From skip at mojam.com Wed Aug 18 17:54:11 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 18 Aug 1999 10:54:11 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> Message-ID: <14266.54991.27912.12075@dolphin.mojam.com> >>>>> "Jack" == Jack Jansen writes: >> The proposal then is that importing the >> platform's module by name will import both the portable and non-portable >> interface elements. Importing the posix module will import just that >> portion of the interface that is truly portable across all platforms. Jack> There's one slight problem with this: when you use functionality that is Jack> partially portable, i.e. a call that is available on Windows and Unix but not Jack> on the Mac. Agreed. I'm not sure what to do there. 
Is the intersection of the common OS calls on Unix, Windows and Mac so small as to be useless (or are there some really gotta have functions not in the intersection because they are missing only on the Mac)? Skip From guido at CNRI.Reston.VA.US Wed Aug 18 18:16:27 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Wed, 18 Aug 1999 12:16:27 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Your message of "Wed, 18 Aug 1999 10:52:20 CDT." <14266.54907.143970.101594@dolphin.mojam.com> References: <199908181447.JAA05151@dolphin.mojam.com> <199908181454.KAA07692@eric.cnri.reston.va.us> <14266.51743.904066.470431@dolphin.mojam.com> <14266.54194.715887.808096@amarok.cnri.reston.va.us> <14266.54907.143970.101594@dolphin.mojam.com> Message-ID: <199908181616.MAA07901@eric.cnri.reston.va.us> > ... the os module should *expose* only those features common to all of > the big 3 platforms ... Why? My experience has been that functionality that was thought to be Unix specific has gradually become available on other platforms, which makes it hard to decide in which module a function should be placed. The proper test for portability of a program is not whether it imports certain module names, but whether it uses certain functions from those modules (and whether it uses them in a portable fashion). As platforms evolve, a program that was previously thought to be non-portable might become more portable. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Wed Aug 18 19:33:44 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 18 Aug 1999 18:33:44 +0100 (NFT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.54991.27912.12075@dolphin.mojam.com> from "Skip Montanaro" at "Aug 18, 99 10:54:11 am" Message-ID: <199908181733.SAA08434@pukapuka.inrialpes.fr> Everybody's right in this debate. 
I have to type a lot to express objectively my opinion, but better filter my reasoning and just say the conclusion. Having in mind: - what POSIX is - what an OS is - that an OS may or may not comply w/ the POSIX standard, and if it doesn't, it may do so in a couple of years (Windows 3K and PyOS come to mind ;-) - that the os module claims portability amongst the different OSes, mainly regarding their filesystem & process management services, hence it's exposing only a *subset* of the os specific services - the current state of Python It would be nice: - to leave the os module as a common denominator - to have a "unix" module (which could further incorporate the different brands of unix) - to have the posix module capture the fraction of posix functionality, exported from a particular OS specific module, and add the appropriate POSIX propaganda in the docs - to manage to do this, or argue what's wrong with the above -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Thu Aug 19 12:02:26 1999 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 19 Aug 1999 12:02:26 +0200 Subject: [Python-Dev] Quick-and-dirty weak references References: <199908181312.OAA20542@pukapuka.inrialpes.fr> <37BAAC00.27A34FF7@lemburg.com> Message-ID: <37BBD632.3F66419C@lemburg.com> [about weak references and a sample implementation in mxProxy] With the help of Vladimir, I have solved the problem and uploaded a modified version of the prerelease: http://starship.skyport.net/~lemburg/mxProxy-pre0.2.0.zip The archive now also contains a precompiled Win32 PYD file for those on WinXX platforms. Please give it a try and tell me what you think. 
Cheers, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 134 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jack at oratrix.nl Thu Aug 19 16:06:01 1999 From: jack at oratrix.nl (Jack Jansen) Date: Thu, 19 Aug 1999 16:06:01 +0200 Subject: [Python-Dev] Optimization idea Message-ID: <19990819140602.433BC303120@snelboot.oratrix.nl> I just had yet another idea for optimizing Python that looks so plausible that I guess someone else must have looked into it already (and, hence, probably rejected it:-): We add to the type structure a "type identifier" number, a small integer for the common types (int=1, float=2, string=3, etc) and 0 for everything else. When eval_code2 sees, for instance, a MULTIPLY operation it does something like the following: case BINARY_MULTIPLY: w = POP(); v = POP(); code = (BINARY_MULTIPLY << 8) | ((v->ob_type->tp_typeid) << 4) | (w->ob_type->tp_typeid); x = (binopfuncs[code])(v, w); .... etc ... The idea is that all the 256 BINARY_MULTIPLY entries would be filled with PyNumber_Multiply, except for a few common cases. The int*int field could point straight to int_mul(), etc. Assuming the common cases are really more common than the uncommon cases the fact that they jump straight out to the implementation function instead of mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset the added overhead of shifts, ors and indexing. Any thoughts?
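[Editor's note: Jack's C sketch translates almost directly into a small Python model. The type ids, the 4-bit packing and the table layout mirror his proposal; the function names are illustrative, not CPython's.]

```python
# Rough Python model of the per-type-pair dispatch table sketched above.

NUM_TYPE_IDS = 16
TYPE_ID = {int: 1, float: 2, str: 3}        # 0 = "everything else"

def generic_multiply(v, w):
    # stands in for PyNumber_Multiply: the slow, fully general path
    return v * w

def int_mul(v, w):
    # specialized fast path reached only through the int*int slot
    return v * w

# one table per operator, indexed by (left_id << 4) | right_id
multiply_table = [generic_multiply] * (NUM_TYPE_IDS * NUM_TYPE_IDS)
multiply_table[(TYPE_ID[int] << 4) | TYPE_ID[int]] = int_mul

def binary_multiply(v, w):
    code = (TYPE_ID.get(type(v), 0) << 4) | TYPE_ID.get(type(w), 0)
    return multiply_table[code](v, w)

assert binary_multiply(6, 7) == 42          # hits the int_mul slot
assert binary_multiply([1], 2) == [1, 1]    # list has id 0: generic path
```

The point of the scheme is visible here: once the table is built, dispatch is two shifts, an or, and one indexed call, with no per-operation type tests.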
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Thu Aug 19 16:05:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:05:28 -0400 Subject: [Python-Dev] Localization expert needed Message-ID: <199908191405.KAA10401@eric.cnri.reston.va.us> My contact at HP is asking for expert advice on localization and multi-byte characters. I have little to share except pointing to Martin von Loewis and Pythonware. Does anyone on this list have a suggestion besides those? Don't hesitate to recommend yourself -- there's money in it! --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Wed, 18 Aug 1999 23:15:55 -0700 From: JOE_ELLSWORTH To: guido at CNRI.Reston.VA.US Subject: Localization efforts and state in Python. Hi Guido. Can you give me some references to the best references currently available for using Python in CGI applications when multi-byte localization is known to be needed? Who is the expert in this in the Python area? Can you recommend that they work with us in this area? Thanks, Joe E. ------- End of Forwarded Message From guido at CNRI.Reston.VA.US Thu Aug 19 16:15:28 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 10:15:28 -0400 Subject: [Python-Dev] Optimization idea In-Reply-To: Your message of "Thu, 19 Aug 1999 16:06:01 +0200."
<19990819140602.433BC303120@snelboot.oratrix.nl> References: <19990819140602.433BC303120@snelboot.oratrix.nl> Message-ID: <199908191415.KAA10432@eric.cnri.reston.va.us> > I just had yet another idea for optimizing Python that looks so > plausible that I guess someone else must have looked into it already > (and, hence, probably rejected it:-): > > We add to the type structure a "type identifier" number, a small integer for > the common types (int=1, float=2, string=3, etc) and 0 for everything else. > > When eval_code2 sees, for instance, a MULTIPLY operation it does something > like the following: > case BINARY_MULTIPLY: > w = POP(); > v = POP(); > code = (BINARY_MULTIPLY << 8) | > ((v->ob_type->tp_typeid) << 4) | > (w->ob_type->tp_typeid); > x = (binopfuncs[code])(v, w); > .... etc ... > > The idea is that all the 256 BINARY_MULTIPLY entries would be filled with > PyNumber_Multiply, except for a few common cases. The int*int field could > point straight to int_mul(), etc. > > Assuming the common cases are really more common than the uncommon cases the > fact that they jump straight out to the implementation function instead of > mucking around in PyNumber_Multiply and PyNumber_Coerce should easily offset > the added overhead of shifts, ors and indexing. You're assuming that arithmetic operations are a major time sink. I doubt that; much of my code contains hardly any arithmetic these days. Of course, if you *do* have a piece of code that does a lot of basic arithmetic, it might pay off -- but even then I would guess that the majority of opcodes are things like list accessors and variable accesses. But we needn't speculate. It's easy enough to measure the speedup: you can use tp_xxx5 in the type structure and plug a typecode into it for the int and float types. (Note that you would need a separate table of binopfuncs per operator.)
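[Editor's note: Guido's hunch about opcode frequencies can be spot-checked statically with the dis module. The sample function below is an arbitrary stand-in, and these are static counts over the compiled bytecode, not dynamic execution counts, but they are indicative.]

```python
# Count arithmetic opcodes versus load/store opcodes in a typical function.
import dis
from collections import Counter

def sample(seq):
    total = 0
    for item in seq:
        total = total + item * 2
    return total

counts = Counter(ins.opname for ins in dis.get_instructions(sample))
arith = sum(n for op, n in counts.items() if op.startswith('BINARY'))
moves = sum(n for op, n in counts.items() if op.startswith(('LOAD', 'STORE')))
print('arithmetic:', arith, 'loads/stores:', moves)
```

Even in this arithmetic-heavy loop, the loads and stores comfortably outnumber the binary operations, which supports the "measure before specializing" point.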
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Aug 19 21:09:26 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 19 Aug 1999 20:09:26 +0100 (NFT) Subject: [Python-Dev] about line numbers Message-ID: <199908191909.UAA20618@pukapuka.inrialpes.fr> [Tim, in an earlier msg] > > Would be more valuable to rethink the debugger's breakpoint approach so that > SET_LINENO is never needed (line-triggered callbacks are expensive because > called so frequently, turning each dynamic SET_LINENO into a full-blown > Python call; Ok. In the meantime I think that folding the redundant SET_LINENO doesn't hurt. I ended up with a patchlet that seems to have no side effects, that updates the lnotab as it should and that even makes pdb a bit more clever, IMHO. Consider an extreme case for the function f (listed below). Currently, we get the following: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 1 3 SET_LINENO 2 6 SET_LINENO 3 9 SET_LINENO 4 12 SET_LINENO 5 15 LOAD_CONST 1 (1) 18 STORE_FAST 0 (a) 21 SET_LINENO 6 24 SET_LINENO 7 27 SET_LINENO 8 30 LOAD_CONST 2 (None) 33 RETURN_VALUE >>> pdb.runcall(f) > test.py(1)f() -> def f(): (Pdb) list 1, 20 1 -> def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(2)f() -> """Comment about f""" (Pdb) step > test.py(3)f() -> """Another one""" (Pdb) step > test.py(4)f() -> """A third one""" (Pdb) step > test.py(5)f() -> a = 1 (Pdb) step > test.py(6)f() -> """Forth""" (Pdb) step > test.py(7)f() -> "and pdb can set a breakpoint on this one (simple quotes)" (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple 
quotes...""" (Pdb) >>> ------------------------------------------- With folded SET_LINENO, we have this: ------------------------------------------- >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 SET_LINENO 5 3 LOAD_CONST 1 (1) 6 STORE_FAST 0 (a) 9 SET_LINENO 8 12 LOAD_CONST 2 (None) 15 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 20 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) break 7 Breakpoint 1 at test.py:7 (Pdb) break 8 *** Blank or comment (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> ------------------------------------------- i.e., pdb stops at (points to) the first real instruction and doesn't step through the doc strings. Or is there something I'm missing here? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -------------------------------[ cut here ]--------------------------- *** compile.c-orig Thu Aug 19 19:27:13 1999 --- compile.c Thu Aug 19 19:00:31 1999 *************** *** 615,620 **** --- 615,623 ---- int arg; { if (op == SET_LINENO) { + if (!Py_OptimizeFlag && c->c_last_addr == c->c_nexti - 3) + /* Hack for folding several SET_LINENO in a row. */ + c->c_nexti -= 3; com_set_lineno(c, arg); if (Py_OptimizeFlag) return; From guido at CNRI.Reston.VA.US Thu Aug 19 23:10:33 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Thu, 19 Aug 1999 17:10:33 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Thu, 19 Aug 1999 20:09:26 BST."
<199908191909.UAA20618@pukapuka.inrialpes.fr> References: <199908191909.UAA20618@pukapuka.inrialpes.fr> Message-ID: <199908192110.RAA12755@eric.cnri.reston.va.us> Earlier, you argued that this is "not an optimization," but rather avoiding redundancy. I should have responded right then that I disagree, or at least I'm lukewarm about your patch. Either you're not using -O, and then you don't care much about this; or you care, and then you should be using -O. Rather than encrusting the code with more and more ad-hoc micro optimizations, I'd prefer to have someone look into Tim's suggestion of supporting more efficient breakpoints... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 14:45:46 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 13:45:46 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908192110.RAA12755@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 19, 99 05:10:33 pm" Message-ID: <199908201245.NAA27098@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I haven't argued so much; I asked whether this would be reasonable. Probably I should have said that I don't see the purpose of emitting SET_LINENO instructions for those nodes for which the compiler generates no code, mainly because (as I learned subsequently) SET_LINENO serve no other purpose but debugging. As I haven't payed much attention to this aspect of the code, I thought thay they might still be used for tracebacks. But I couldn't have said that because I didn't know it. > I should have responded right then that I disagree, ... Although I agree this is a minor issue, I'm interested in your argument here, if it's something else than the dialectic: "we're more interested in long term improvements" which is also my opinion. > ... or at least I'm lukewarm about your patch. 
No surprise here :-) But I haven't found another way of not generating SET_LINENO for doc strings other than backpatching. > Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. Neither of those. I don't really care, frankly. I was just intrigued by the consecutive SET_LINENO in my disassemblies, so I started to think and ask questions about it. > > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... This is *the* real issue with the real potential solution. I'm willing to have a look at this (although I don't know pdb/bdb in its finest details). All suggestions and thoughts are welcome. We would probably leave the SET_LINENO opcode as is and (eventually) introduce a new opcode (instead of transforming/renaming it) for compatibility reasons, methinks. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <19990818110213.A558F303120@snelboot.oratrix.nl> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> Message-ID: <1276961301-70195@hypernet.com> In reply to no one in particular: I've often wished that the instance type object had an (optimized) __decref__ slot. With nothing but hand-waving to support it, I'll claim that would enable all these games. 
- Gordon From gmcm at hypernet.com Fri Aug 20 18:04:22 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 11:04:22 -0500 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <19990818153320.D61F6303120@snelboot.oratrix.nl> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> Message-ID: <1276961295-70552@hypernet.com> Jack Jansen wrote: > There's one slight problem with this: when you use functionality > that is partially portable, i.e. a call that is available on Windows > and Unix but not on the Mac. It gets worse, I think. How about the inconsistencies in POSIX support among *nixes? How about NT being a superset of Win9x? How about NTFS having capabilities that FAT does not? I'd guess there are inconsistencies between Mac flavors, too. The Java approach (if you can't do it everywhere, you can't do it) sucks. In some cases you could probably have the missing functionality (in os) fail silently, but in other cases that would be a disaster. "Least-worst"-is-not-necessarily-"good"-ly y'rs - Gordon From tismer at appliedbiometrics.com Fri Aug 20 17:05:47 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Fri, 20 Aug 1999 17:05:47 +0200 Subject: [Python-Dev] about line numbers References: <199908191909.UAA20618@pukapuka.inrialpes.fr> <199908192110.RAA12755@eric.cnri.reston.va.us> Message-ID: <37BD6ECB.9DD17460@appliedbiometrics.com> Guido van Rossum wrote: > > Earlier, you argued that this is "not an optimization," but rather > avoiding redundancy. I should have responded right then that I > disagree, or at least I'm lukewarm about your patch. Either you're > not using -O, and then you don't care much about this; or you care, > and then you should be using -O. 
> > Rather than encrusting the code with more and more ad-hoc micro > optimizations, I'd prefer to have someone look into Tim's suggestion > of supporting more efficient breakpoints... I didn't think of this before, but I just realized that I have something like that already in Stackless Python. It is possible to set a breakpoint at every opcode, for every frame. Adding an extra opcode for breakpoints is a good thing as well. The former are good for tracing, conditional breakpoints and such, and cost a little more time since there is always one extra function call. The latter would be a quick, less versatile thing. Inserting extra breakpoint opcodes into running code turns out to be easy to implement, if the running frame gets a local extra copy of its code object, with the breakpoints replacing the original opcodes. The breakpoint handler would then simply look into the original code object. Inserting breakpoints on the source level gives us breakpoints per procedure. Doing it in a running frame gives "instance" level debugging of code. Checking a monitor function on every opcode is slightly more expensive but most general. We can have it all; what do you think? I'm going to finish and publish the stackless/continuation package and submit a paper by the end of September. Should I include this debugging feature? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido at CNRI.Reston.VA.US Fri Aug 20 17:09:32 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:09:32 -0400 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT."
<1276961301-70195@hypernet.com> References: Message by "M.-A. Lemburg" , Wed, 18 Aug 1999 11:02:02 +0200 , <37BA768A.50DF5574@lemburg.com> <1276961301-70195@hypernet.com> Message-ID: <199908201509.LAA14726@eric.cnri.reston.va.us> > In reply to no one in particular: > > I've often wished that the instance type object had an (optimized) > __decref__ slot. With nothing but hand-waving to support it, I'll > claim that would enable all these games. Without context, I don't know when this would be called. If you want this called on all DECREFs (regardless of the refcount value), realize that this is a huge slowdown because it would mean the DECREF macro has to inspect the type object, which means several indirections. This would slow down *every* DECREF operation, not just those on instances with a __decref__ slot, because the DECREF macro doesn't know the type of the object! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Fri Aug 20 17:13:16 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 11:13:16 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961295-70552@hypernet.com> References: Message by Skip Montanaro , Wed, 18 Aug 1999 09:47:23 -0500 , <199908181447.JAA05151@dolphin.mojam.com> <1276961295-70552@hypernet.com> Message-ID: <199908201513.LAA14741@eric.cnri.reston.va.us> From: "Gordon McMillan" > Jack Jansen wrote: > > > There's one slight problem with this: when you use functionality > > that is partially portable, i.e. a call that is available on Windows > > and Unix but not on the Mac. > > It gets worse, I think. How about the inconsistencies in POSIX > support among *nixes? How about NT being a superset of Win9x? How > about NTFS having capabilities that FAT does not? I'd guess there are > inconsistencies between Mac flavors, too. 
> > The Java approach (if you can't do it everywhere, you can't do it) > sucks. In some cases you could probably have the missing > functionality (in os) fail silently, but in other cases that would > be a disaster. The Python policy has always been "if it's available, there's a standard name and API for it; if it's not available, the function is not defined or will raise an exception; you can use hasattr(os, ...) or catch exceptions to cope if you can live without it." There are a few cases where unavailable calls are emulated, a few where they are made no-ops, and a few where they are made to raise an exception unconditionally, but in most cases the function will simply not exist, so it's easy to test. --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 22:54:10 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 21:54:10 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BD6ECB.9DD17460@appliedbiometrics.com> from "Christian Tismer" at "Aug 20, 99 05:05:47 pm" Message-ID: <199908202054.VAA26970@pukapuka.inrialpes.fr> I'll try to sketch here the scheme I'm thinking of for the callback/breakpoint issue (without SET_LINENO), although some technical details are still missing. I'm assuming the following, in this order: 1) No radical changes in the current behavior, i.e. preserve the current architecture / strategy as much as possible. 2) We don't have breakpoints per opcode, but per source line. For that matter, we have sys.settrace (and for now, we don't aim to have sys.settracei that would be called on every opcode, although we might want this in the future) 3) SET_LINENO disappears. Actually, SET_LINENO are conditional breakpoints, used for callbacks from C to Python.
If any of the above is not an appropriate assumption and we want a radical change in the strategy of setting breakpoints / generating callbacks, then this post is invalid. The solution I'm thinking of: a) Currently, we have a function PyCode_Addr2Line which computes the source line from the opcode's address. I hereby assume that we can write the reverse function PyCode_Line2Addr that returns the address from a given source line number. I don't have the implementation, but it should be doable. Furthermore, we can compute, having the co_lnotab table and co_firstlineno, the source line range for a code object. As a consequence, even with the dumbest of all algorithms, by looping through this source line range, we can enumerate with PyCode_Line2Addr the sequence of addresses for the source lines of this code object. b) As Chris pointed out, in case sys.settrace is defined, we can allocate and keep a copy of the original code string per frame. We can further dynamically overwrite the original code string with a new (internal, one byte) CALL_TRACE opcode at the addresses we have enumerated in a). The CALL_TRACE opcodes will trigger the callbacks from C to Python, just as the current SET_LINENO does. c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger the callback and if it returns successfully, we'll fetch the original opcode for the current location from the copy of the original co_code. Then we directly jump to the arg fetch code (or in case we fetch the entire original opcode in CALL_TRACE - we jump to the dispatch code). Hmm. I think that's all. At the heart of this scheme is the PyCode_Line2Addr function, which is the only blob in my head, for now. Christian Tismer wrote: > > I didn't think of this before, but I just realized that > I have something like that already in Stackless Python. > It is possible to set a breakpoint at every opcode, for every > frame. Adding an extra opcode for breakpoints is a good thing > as well.
The former are good for tracing, conditional breakpoints > and such, and cost a little more time since there is always one extra > function call. The latter would be a quick, less versatile thing. I don't think I understand clearly the difference you're talking about, and why the one thing is better than the other, probably because I'm a bit far from stackless python. > I'm going to finish and publish the stackless/continuation package > and submit a paper by end of September. Should I include this debugging > feature? Write the paper first; you have more than enough material to talk about already ;-). Then if you have time to implement some debugging support, you could always add another section, but it won't be a central point of your paper. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido at CNRI.Reston.VA.US Fri Aug 20 21:59:24 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Fri, 20 Aug 1999 15:59:24 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: Your message of "Fri, 20 Aug 1999 21:54:10 BST." <199908202054.VAA26970@pukapuka.inrialpes.fr> References: <199908202054.VAA26970@pukapuka.inrialpes.fr> Message-ID: <199908201959.PAA16105@eric.cnri.reston.va.us> > I'll try to sketch here the scheme I'm thinking of for the > callback/breakpoint issue (without SET_LINENO), although some > technical details are still missing. > > I'm assuming the following, in this order: > > 1) No radical changes in the current behavior, i.e. preserve the > current architecture / strategy as much as possible. > > 2) We don't have breakpoints per opcode, but per source line. For that > matter, we have sys.settrace (and for now, we don't aim to have > sys.settracei that would be called on every opcode, although we might > want this in the future) > > 3) SET_LINENO disappears. Actually, SET_LINENO are conditional breakpoints, > used for callbacks from C to Python.
So the basic problem is to generate > these callbacks. They used to be the only mechanism by which the traceback code knew the current line number (long before the debugger hooks existed), but with the lnotab, that's no longer necessary. > If any of the above is not an appropriate assumption and we want a radical > change in the strategy of setting breakpoints / generating callbacks, then > this post is invalid. Sounds reasonable. > The solution I'm thinking of: > > a) Currently, we have a function PyCode_Addr2Line which computes the source > line from the opcode's address. I hereby assume that we can write the > reverse function PyCode_Line2Addr that returns the address from a given > source line number. I don't have the implementation, but it should be > doable. Furthermore, we can compute, having the co_lnotab table and > co_firstlineno, the source line range for a code object. > > As a consequence, even with the dumbest of all algorithms, by looping > through this source line range, we can enumerate with PyCode_Line2Addr > the sequence of addresses for the source lines of this code object. > > b) As Chris pointed out, in case sys.settrace is defined, we can allocate > and keep a copy of the original code string per frame. We can further > dynamically overwrite the original code string with a new (internal, > one byte) CALL_TRACE opcode at the addresses we have enumerated in a). > > The CALL_TRACE opcodes will trigger the callbacks from C to Python, > just as the current SET_LINENO does. > > c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger > the callback and if it returns successfully, we'll fetch the original > opcode for the current location from the copy of the original co_code. > Then we directly jump to the arg fetch code (or in case we fetch the > entire original opcode in CALL_TRACE - we jump to the dispatch code). Tricky, but doable. > Hmm. I think that's all.
> > At the heart of this scheme is the PyCode_Line2Addr function, which is > the only blob in my head, for now. I'm pretty sure that this would be straightforward. I'm a little anxious about modifying the code, and was thinking myself of a way to specify a bitvector of addresses where to break. But that would still cause some overhead for code without breakpoints, so I guess you're right (and it's certainly a long-standing tradition in breakpoint-setting!) --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Aug 20 23:22:12 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 20 Aug 1999 22:22:12 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908201959.PAA16105@eric.cnri.reston.va.us> from "Guido van Rossum" at "Aug 20, 99 03:59:24 pm" Message-ID: <199908202122.WAA26956@pukapuka.inrialpes.fr> Guido van Rossum wrote: > > > I'm a little anxious about modifying the code, and was thinking myself > of a way to specify a bitvector of addresses where to break. But that > would still cause some overhead for code without breakpoints, so I > guess you're right (and it's certainly a long-standing tradition in > breakpoint-setting!) > Hm. You're probably right, especially if someone wants to inspect a code object from the debugger or something. But I believe that we can manage to redirect the instruction pointer in the beginning of eval_code2 to the *copy* of co_code, and modify the copy with CALL_TRACE, preserving the original intact.
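What Guido and Vladimir are converging on here is the classic patch-and-restore breakpoint technique used by machine-level debuggers. A toy Python sketch of the control flow, to make the scheme concrete (all names and the trap value are illustrative, not CPython internals; opcode arguments are ignored):

```python
CALL_TRACE = 76  # hypothetical one-byte trap opcode (value is arbitrary)

def patch_code(co_code, break_addrs):
    """Return a patched copy of the code string; the original is untouched."""
    patched = bytearray(co_code)
    for addr in break_addrs:
        patched[addr] = CALL_TRACE
    return bytes(patched)

def run(patched, original, trace):
    """Toy dispatch loop: on CALL_TRACE, fire the trace callback, then
    fall back to the pristine co_code for the real opcode at that address."""
    executed = []
    for addr, op in enumerate(patched):
        if op == CALL_TRACE:
            trace(addr)            # the callback from C into Python
            op = original[addr]    # fetch the untouched opcode
        executed.append(op)
    return executed
```

Note that the saved copy is consulted by address, not by value, so the scheme still works even if a legitimate opcode happens to share the trap's numeric value.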
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Fri Aug 20 22:25:25 1999 From: skip at mojam.com (Skip Montanaro) Date: Fri, 20 Aug 1999 15:25:25 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/ In-Reply-To: <1276961295-70552@hypernet.com> References: <199908181447.JAA05151@dolphin.mojam.com> <19990818153320.D61F6303120@snelboot.oratrix.nl> <1276961295-70552@hypernet.com> Message-ID: <14269.47443.192469.525132@dolphin.mojam.com> Gordon> It gets worse, I think. How about the inconsistencies in POSIX Gordon> support among *nixes? How about NT being a superset of Win9x? Gordon> How about NTFS having capabilities that FAT does not? I'd guess Gordon> there are inconsistencies between Mac flavors, too. To a certain degree I think the C module(s) that interface to the underlying OS's API can iron out differences. In other cases you may have to document minor (known) differences. In still other cases you may have to relegate particular functionality to the OS-dependent modules. Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 From gmcm at hypernet.com Sat Aug 21 00:38:14 1999 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 20 Aug 1999 17:38:14 -0500 Subject: [Python-Dev] Quick-and-dirty weak references In-Reply-To: <199908201509.LAA14726@eric.cnri.reston.va.us> References: Your message of "Fri, 20 Aug 1999 11:04:22 CDT." <1276961301-70195@hypernet.com> Message-ID: <1276937670-1491544@hypernet.com> [me] > > > > I've often wished that the instance type object had an (optimized) > > __decref__ slot. With nothing but hand-waving to support it, I'll > > claim that would enable all these games. [Guido] > Without context, I don't know when this would be called. 
If you > want this called on all DECREFs (regardless of the refcount value), > realize that this is a huge slowdown because it would mean the > DECREF macro has to inspect the type object, which means several > indirections. This would slow down *every* DECREF operation, not > just those on instances with a __decref__ slot, because the DECREF > macro doesn't know the type of the object! This was more 2.0-ish speculation, and really thinking of classic C++ ref counting where decref would be a function call, not a macro. Still a slowdown, of course, but not quite so massive. The upside is opening up all kinds of tricks at the type object and user class levels, (such as weak refs and copy on write etc). Worth it? I'd think so, but I'm not a speed demon. - Gordon From tim_one at email.msn.com Sat Aug 21 10:09:17 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 04:09:17 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14266.51743.904066.470431@dolphin.mojam.com> Message-ID: <000201beebac$776d32e0$0c2d2399@tim> [Skip Montanaro] > ... > 3. If Dan Connolly's contention is correct, importing the os module > today is not all that portable. I can't really say one way or the > other, because I'm lucky enough to be able to confine my serious > programming to Unix. I'm sure there's someone out there that > can try the following on a few platforms: > > import os > dir(os) > > and compare the output. There's no need to, Skip. Just read the os module docs; where a function says, e.g., "Availability: Unix.", it doesn't show up on a Windows or Mac box. In that sense using (some) os functions is certainly unportable. But I have no sympathy for the phrasing of Dan's complaint: if he calls os.getegid(), *he* knows perfectly well that's a Unix-specific function, and expressing outrage over it not working on NT is disingenuous. 
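The usual way to cope, per the policy described elsewhere in this thread, is to feature-test rather than assume the platform. A minimal sketch of the idiom:

```python
import os

def effective_gid():
    """Return the effective group id where the platform provides it,
    and None elsewhere (e.g. Windows), via the hasattr(os, ...) idiom."""
    if hasattr(os, "getegid"):
        return os.getegid()
    return None
```

The same test can equally be spelled as a try/except AttributeError around the call.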
OTOH, I don't think you're going to find anything in the OS module documented as available only on Windows or only on Macs, and some semi-portable functions (notoriously chmod) are documented in ways that make sense only to Unixheads. This certainly gives a strong impression of Unix-centricity to non-Unix weenies, and has got to baffle true newbies completely. So 'twould be nice to have a basic os module all of whose functions "run everywhere", whose interfaces aren't copies of cryptic old Unixisms, and whose docs are platform neutral. If Guido is right that the os functions tend to get more portable over time, fine, that module can grow over time too. In the meantime, life would be easier for everyone except Python's implementers. From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 17:34:32 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 16:34:32 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908202122.WAA26956@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 20, 99 10:22:12 pm" Message-ID: <199908211534.QAA22392@pukapuka.inrialpes.fr> [me] > > Guido van Rossum wrote: > > > > > > I'm a little anxious about modifying the code, and was thinking myself > > of a way to specify a bitvector of addresses where to break. But that > > would still cause some overhead for code without breakpoints, so I > > guess you're right (and it's certainly a long-standing tradition in > > breakpoint-setting!) > > > > Hm. You're probably right, especially if someone wants to inspect > a code object from the debugger or something. But I believe that > we can manage to redirect the instruction pointer in the beginning > of eval_code2 to the *copy* of co_code, and modify the copy with > CALL_TRACE, preserving the original intact. > I wrote a very rough first implementation of this idea.
The files are at: http://sirac.inrialpes.fr/~marangoz/python/lineno/ Basically, what I did is: 1) what I said :-) 2) No more SET_LINENO 3) In tracing mode, a copy of the original code is put in an additional slot (co_tracecode) of the code object. Then it's overwritten with CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. The VM is routed to execute this code, and not the original one. 4) When tracing is off (i.e. sys_tracefunc is NULL) the VM falls back to normal execution of the original code. A couple of things that need finalization: a) how to deallocate the modified code string when tracing is off b) the value of CALL_TRACE (I almost randomly picked 76) c) I don't handle the cases where sys_tracefunc is enabled or disabled within the same code object. Tracing or not is determined before the main loop. d) update pdb, so that it does not allow setting breakpoints on lines with no code. To achieve this, I think that python versions of PyCode_Addr2Line & PyCode_Line2Addr have to be integrated into pdb as helper functions. e) correct bugs and design flaws f) something else?
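For point d), pure-Python equivalents of the two helpers might look like this. This is only a sketch, assuming the classic co_lnotab encoding, i.e. a byte string of (address increment, line increment) pairs accumulated relative to co_firstlineno, with boundary cases simplified:

```python
def addr2line(firstlineno, lnotab, addr):
    """Map a bytecode address to its source line (cf. PyCode_Addr2Line)."""
    line, cur = firstlineno, 0
    for i in range(0, len(lnotab), 2):
        cur += lnotab[i]          # address increment
        if cur > addr:
            break
        line += lnotab[i + 1]     # line increment
    return line

def line2addr(firstlineno, lnotab, target):
    """Inverse mapping: first bytecode address of a given source line."""
    line, addr = firstlineno, 0
    if target <= line:
        return 0
    for i in range(0, len(lnotab), 2):
        addr += lnotab[i]
        line += lnotab[i + 1]
        if line >= target:
            return addr
    return addr
```

With co_firstlineno == 1 and a table of bytes([6, 1, 6, 1]) (two increments of six bytes / one line each), address 6 maps to line 2 and line 3 starts at address 12.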
And here's the sample session of my lousy function f with this 'proof of concept' code: >>> from test import f >>> import dis, pdb >>> dis.dis(f) 0 LOAD_CONST 1 (1) 3 STORE_FAST 0 (a) 6 LOAD_CONST 2 (None) 9 RETURN_VALUE >>> pdb.runcall(f) > test.py(5)f() -> a = 1 (Pdb) list 1, 10 1 def f(): 2 """Comment about f""" 3 """Another one""" 4 """A third one""" 5 -> a = 1 6 """Forth""" 7 "and pdb can set a breakpoint on this one (simple quotes)" 8 """but it's intelligent about triple quotes...""" [EOF] (Pdb) step > test.py(8)f() -> """but it's intelligent about triple quotes...""" (Pdb) step --Return-- > test.py(8)f()->None -> """but it's intelligent about triple quotes...""" (Pdb) >>> -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 19:10:50 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 19:10:50 +0200 Subject: [Python-Dev] about line numbers References: <199908211534.QAA22392@pukapuka.inrialpes.fr> Message-ID: <37BEDD9A.DBA817B1@appliedbiometrics.com> Vladimir Marangozov wrote: ... > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > Basically, what I did is: > > 1) what I said :-) > 2) No more SET_LINENO > 3) In tracing mode, a copy of the original code is put in an additional > slot (co_tracecode) of the code object. Then it's overwritten with > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. I'd rather keep the original code object as it is, create a copy with inserted breakpoints and put that into the frame slot. Pointing back to the original from there. Then I'd redirect the code from the CALL_TRACE opcode completely to a user-defined function. Getting rid of the extra code object would be done by this function when tracing is off. It also vanishes automatically when the frame is released. 
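Christian's arrangement, a frame-local patched copy that points back at the pristine original, can be pictured with a small stand-in object (CodeCopy and co_back are names used only for this sketch; 76 is the trap value the prototype in this thread happens to pick):

```python
CALL_TRACE = 76  # illustrative trap value, as in the prototype above

class CodeCopy:
    """Patched working copy of a code string that keeps a pointer back
    to the untouched original, so it can be dropped or restored at will."""
    def __init__(self, original, breakpoints):
        self.co_back = original              # pristine original
        self.co_code = bytearray(original)   # frame-local patched copy
        for addr in breakpoints:
            self.co_code[addr] = CALL_TRACE
    def original_opcode(self, addr):
        # The breakpoint handler looks the real opcode up in the original.
        return self.co_back[addr]
```

When the frame dies, its CodeCopy goes with it; permanent tracing would instead restore co_code from co_back and drop the pointer.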
> a) how to deallocate the modified code string when tracing is off By making the copy a frame property which is temporary, I think. Or, if tracing should work for all frames, by pushing the original in the back of the modified. Both work. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From Vladimir.Marangozov at inrialpes.fr Sat Aug 21 23:40:05 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 21 Aug 1999 22:40:05 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <37BEDD9A.DBA817B1@appliedbiometrics.com> from "Christian Tismer" at "Aug 21, 99 07:10:50 pm" Message-ID: <199908212140.WAA51054@pukapuka.inrialpes.fr> Chris, could you please repeat that step by step in more detail? I'm not sure I understand your suggestions. Christian Tismer wrote: > > Vladimir Marangozov wrote: > ... > > I wrote a very rough first implementation of this idea. The files are at: > > > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > > > Basically, what I did is: > > > > 1) what I said :-) > > 2) No more SET_LINENO > > 3) In tracing mode, a copy of the original code is put in an additional > > slot (co_tracecode) of the code object. Then it's overwritten with > > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr. > > I'd rather keep the original code object as it is, create a copy > with inserted breakpoints and put that into the frame slot. You seem to suggest duplicating the entire code object, right? And referencing the modified duplicate from the current frame? I actually duplicate only the opcode string (that is, the co_code string object) and I don't see the point of duplicating the entire code object.
Keeping a reference from the current frame makes sense, but won't it deallocate the modified version on every frame release (then redo all the code duplication work for every frame)? > Pointing back to the original from there. I don't understand this. What points back where? > > Then I'd redirect the code from the CALL_TRACE opcode completely > to a user-defined function. What user-defined function? I don't understand that either... Except the sys_tracefunc, what other (user-defined) function do we have here? Is it a Python or a C function? > Getting rid of the extra code object would be done by this function > when tracing is off. How exactly? This seems to be obvious for you, but obviously, not for me ;-) > It also vanishes automatically when the frame is released. The function or the extra code object? > > a) how to deallocate the modified code string when tracing is off > By making the copy a frame property which is temporary, I think. I understood that the frame lifetime could be exploited "somehow"... > Or, if tracing should work for all frames, by pushing the original > in the back of the modified. Both work. Tracing is done for all frames, if sys_tracefunc is not NULL, which is a function that usually ends up in the f_trace slot. > > ciao - chris I'm confused. I didn't understand your idea. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at appliedbiometrics.com Sat Aug 21 23:23:10 1999 From: tismer at appliedbiometrics.com (Christian Tismer) Date: Sat, 21 Aug 1999 23:23:10 +0200 Subject: [Python-Dev] about line numbers References: <199908212140.WAA51054@pukapuka.inrialpes.fr> Message-ID: <37BF18BE.B3D58836@appliedbiometrics.com> Vladimir Marangozov wrote: > > Chris, could you please repeat that step by step in more detail? > I'm not sure I understand your suggestions. I think I was too quick. I thought of copying the whole code object, of course. ...
> > I'd rather keep the original code object as it is, create a copy > > with inserted breakpoints and put that into the frame slot. > > You seem to suggest duplicating the entire code object, right? > And referencing the modified duplicate from the current frame? Yes. > I actually duplicate only the opcode string (that is, the co_code string > object) and I don't see the point of duplicating the entire code object. > > Keeping a reference from the current frame makes sense, but won't it > deallocate the modified version on every frame release (then redo all the > code duplication work for every frame)? You get two options by that. 1) Permanently modifying one code object to be traceable means pushing a copy of the original "behind" by means of some co_back pointer. This keeps the patched one where the original was, and makes a global debugging version. 2) Creating a copy for one frame, and putting the original into a co_back pointer. This gives debugging just for this one frame. ... > > Then I'd redirect the code from the CALL_TRACE opcode completely > > to a user-defined function. > > What user-defined function? I don't understand that either... > Except the sys_tracefunc, what other (user-defined) function do we have here? > Is it a Python or a C function? I would suggest a Python function, of course. > > Getting rid of the extra code object would be done by this function > > when tracing is off. > > How exactly? This seems to be obvious for you, but obviously, not for me ;-) If the permanent tracing "1)" is used, just restore the code object's contents from the original in co_back, and drop co_back. In the "2)" version, just pull the co_back into the frame's code pointer and lose the reference to the copy. Occurs automatically on frame release. > > It also vanishes automatically when the frame is released. > > The function or the extra code object? The extra code object. ... > I'm confused. I didn't understand your idea.
Forget it, it isn't more than another brain fart :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From tim_one at email.msn.com Sun Aug 22 03:25:22 1999 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 21 Aug 1999 21:25:22 -0400 Subject: [Python-Dev] about line numbers In-Reply-To: <199908131347.OAA30740@pukapuka.inrialpes.fr> Message-ID: <000001beec3d$348f0160$cb2d2399@tim> [going back a week here, to dict resizing ...] [Vladimir Marangozov] > ... > All in all, for performance reasons, dicts remain an exception > to the rule of releasing memory ASAP. Yes, except I don't think there is such a rule! The actual rule is a balancing act between the cost of keeping memory around "just in case", and the expense of getting rid of it. Resizing a dict is extraordinarily expensive because the entire table needs to be rearranged, but lists make this tradeoff too (when you del a list element or list slice, it still goes thru NRESIZE, which still keeps space for as many as 100 "extra" elements around). The various internal caches for int and frame objects (etc) also play this sort of game; e.g., if I happen to have a million ints sitting around at some time, Python effectively assumes I'll never want to reuse that int storage for anything other than ints again. 
python-rarely-releases-memory-asap-ly y'rs - tim From Vladimir.Marangozov at inrialpes.fr Sun Aug 22 21:41:59 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 22 Aug 1999 20:41:59 +0100 (NFT) Subject: [Python-Dev] Memory (was: about line numbers, which was shrinking dicts) In-Reply-To: <000001beec3d$348f0160$cb2d2399@tim> from "Tim Peters" at "Aug 21, 99 09:25:22 pm" Message-ID: <199908221941.UAA54480@pukapuka.inrialpes.fr> Tim Peters wrote: > > [going back a week here, to dict resizing ...] Yes, and the subject line does not correspond to the contents because at the moment I sent this message, I ran out of disk space and the mailer picked a random header after destroying half of the messages in this mailbox. > > [Vladimir Marangozov] > > ... > > All in all, for performance reasons, dicts remain an exception > > to the rule of releasing memory ASAP. > > Yes, except I don't think there is such a rule! The actual rule is a > balancing act between the cost of keeping memory around "just in case", and > the expense of getting rid of it. Good point. > > Resizing a dict is extraordinarily expensive because the entire table needs > to be rearranged, but lists make this tradeoff too (when you del a list > element or list slice, it still goes thru NRESIZE, which still keeps space > for as many as 100 "extra" elements around). > > The various internal caches for int and frame objects (etc) also play this > sort of game; e.g., if I happen to have a million ints sitting around at > some time, Python effectively assumes I'll never want to reuse that int > storage for anything other than ints again. > > python-rarely-releases-memory-asap-ly y'rs - tim Yes, and I'm somewhat sensitive to this issue after spending 6 years in a team which deals a lot with memory management (mainly DSM). In other words, you say that Python tolerates *virtual* memory fragmentation (a funny term :-).
In the case of dicts and strings, we tolerate "internal fragmentation" (a contiguous chunk is allocated, then partially used). In the case of ints, floats or frames, we tolerate "external fragmentation". And as you said, Python tolerates this because of the speed/space tradeoff. Hopefully, all we deal with at this level is virtual memory, so even if you have zillions of ints, it's the OS VMM that will help you more with its long-term scheduling than Python's wild guesses about a hypothetical usage of zillions of ints later. I think that some OS concepts can give us hints on how to reduce our virtual fragmentation (which, as we all know, is not a very good thing). A few keywords: compaction, segmentation, paging, sharing. We can't do much about our internal fragmentation, except changing the algorithms of dicts & strings (which is not appealing anyways). But it would be nice to think about the external fragmentation of Python's caches. Or even try to reduce the internal fragmentation in combination with the internal caches... BTW, this is the whole point of PyMalloc: in a virtual memory world, try to reduce the distance between the user view and the OS view on memory. PyMalloc addresses the fragmentation problem at a lower level of granularity than an OS (that is, *within* a page), because most of Python's objects are very small. However, it can't handle efficiently large chunks like the int/float caches. Basically what it does is: segmentation of the virtual space and sharing of the cached free space. I think that Python could improve on sharing its internal caches, without significant slowdowns... The bottom line is that there's still plenty of room for exploring alternate mem mgt strategies that better fit Python's memory needs as a whole.
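The "segmentation of the virtual space and sharing of the cached free space" idea can be caricatured in a few lines. Everything below is a toy illustration of the principle, not the real PyMalloc design; all names are made up:

```python
class SizeClassAllocator:
    """Toy allocator: requests are rounded up to a size class, and freed
    blocks go on one shared free list per class -- so any object type may
    reuse the cached space, unlike per-type caches (int, frame, ...)."""
    GRAIN = 8   # round requests up to a multiple of 8 bytes

    def __init__(self):
        self.free = {}       # size class -> list of recycled blocks
        self.next_id = 0     # stands in for "fresh memory from the OS"

    def _size_class(self, nbytes):
        return (nbytes + self.GRAIN - 1) // self.GRAIN * self.GRAIN

    def alloc(self, nbytes):
        cls = self._size_class(nbytes)
        pool = self.free.setdefault(cls, [])
        if pool:
            return pool.pop()        # shared cached space, any caller
        self.next_id += 1
        return (cls, self.next_id)   # "fresh" block

    def free_block(self, block):
        cls, _ = block
        self.free.setdefault(cls, []).append(block)   # cache for reuse
```

A 5-byte and a 7-byte request fall in the same 8-byte class, so freeing one makes it immediately reusable for the other -- the sharing that a per-type free list forbids.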
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jack at oratrix.nl Sun Aug 22 23:25:56 1999 From: jack at oratrix.nl (Jack Jansen) Date: Sun, 22 Aug 1999 23:25:56 +0200 Subject: [Python-Dev] Converting C objects to Python objects and back Message-ID: <19990822212601.2D4BE18BA0D@oratrix.oratrix.nl> Here's another silly idea, not having to do with optimization. On the Mac, and as far as I know on Windows as well, there are quite a few OS API structures that have a Python Object representation that is little more than the PyObject boilerplate plus a pointer to the C API object. (And, of course, lots of methods to operate on the object). To convert these from Python to C I always use boilerplate code like

    WindowPtr *win;
    PyArg_ParseTuple(args, "O&", PyWin_Convert, &win);

where PyWin_Convert is the function that takes a PyObject * and a void **, does the typecheck and sets the pointer. A similar way is used to convert C pointers back to Python objects in Py_BuildValue. What I was thinking is that it would be nice (if you are _very_ careful) if this functionality was available in struct. So, if I would somehow obtain (in a Python string) a C structure that contained, say, a WindowPtr and two ints, I would be able to say

    win, x, y = struct.unpack("Ohh", Win.WindowType)

and struct would be able, through the WindowType type object, to get at the PyWin_Convert and PyWin_New functions. A nice side issue is that you can add an option to PyArg_ParseTuple so you can say

    PyArg_ParseTuple(args, "O+", Win_WinObject, &win)

and you don't have to remember the different names the various types use for their conversion routines. Is this worth pursuing or is it just too dangerous? And, if it is worth pursuing, I have to stash away the two function pointers somewhere in the TypeObject, should I grab one of the tp_xxx fields for this or is there a better place?
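The shape of Jack's proposal can be sketched in pure Python (every name here is hypothetical): each wrapped C type registers a converter pair -- the analog of PyWin_Convert/PyWin_New -- and an extended unpack routes raw pointer-sized fields through the registered "new" converter, much as Py_BuildValue("O&", ...) would in C:

```python
import struct

class Window:
    """Stand-in for a C WindowPtr wrapper object (hypothetical)."""
    def __init__(self, handle):
        self.handle = handle

def win_new(raw):            # C value -> Python object (cf. PyWin_New)
    return Window(raw)

def win_convert(obj):        # Python object -> C value (cf. PyWin_Convert)
    return obj.handle

# format tag -> (underlying struct code, "new" fn, "convert" fn)
REGISTRY = {'W': ('I', win_new, win_convert)}

def unpack_mixed(fmt, data):
    """Like struct.unpack, but registered tags become wrapper objects.
    Assumes every format character encodes exactly one value."""
    real_fmt = ''.join(REGISTRY[c][0] if c in REGISTRY else c for c in fmt)
    out = []
    for c, v in zip(fmt, struct.unpack(real_fmt, data)):
        out.append(REGISTRY[c][1](v) if c in REGISTRY else v)
    return tuple(out)

# a fake "WindowPtr plus two ints" structure, as in Jack's example
data = struct.pack('Ihh', 0xCAFE, 3, 4)
win, x, y = unpack_mixed('Whh', data)
print(win.handle == 0xCAFE, x, y)    # → True 3 4
```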
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fdrake at acm.org Mon Aug 23 16:54:07 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 23 Aug 1999 10:54:07 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000201beebac$776d32e0$0c2d2399@tim> References: <14266.51743.904066.470431@dolphin.mojam.com> <000201beebac$776d32e0$0c2d2399@tim> Message-ID: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Tim Peters writes: > OTOH, I don't think you're going to find anything in the OS module > documented as available only on Windows or only on Macs, and some Tim, Actually, the spawn*() functions are included in os and are documented as Windows-only, along with the related P_* constants. These are provided by the nt module. > everywhere", whose interfaces aren't copies of cryptic old Unixisms, and > whose docs are platform neutral. I'm always glad to see documentation patches, or even pointers to specific problems. Being a Unix-weenie myself, making the documentation more readable to Windows-weenies can be difficult at times. But given useful pointers, I can usually pull it off, or at least drive someone who can to do so. ;-) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tim_one at email.msn.com Tue Aug 24 08:32:49 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 24 Aug 1999 02:32:49 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14273.24719.865520.797568@weyr.cnri.reston.va.us> Message-ID: <000701beedfa$7c5c8e40$902d2399@tim> [Fred L. Drake, Jr.] > Actually, the spawn*() functions are included in os and are > documented as Windows-only, along with the related P_* constants. > These are provided by the nt module.
I stand corrected, Fred -- so how do the Unix dweebs like this Windows crap cluttering "their" docs ? [Tim, pitching a portable sane interface to a portable sane subset of os functionality] > I'm always glad to see documentation patches, or even pointers to > specific problems. Being a Unix-weenie myself, making the > documentation more readable to Windows-weenies can be difficult at > times. But given useful pointers, I can usually pull it off, or at > least drive someone who can to do so. ;-) No, it's deeper than that. Some of the inherited Unix interfaces are flatly incomprehensible to anyone other than a Unix-head, but the functionality is supplied only in that form (docs may ease the pain, but the interfaces still suck); for example,

    mkdir (path[, mode])
        Create a directory named path with numeric mode mode. The default
        mode is 0777 (octal). On some systems, mode is ignored. Where it
        is used, the current umask value is first masked out.
        Availability: Macintosh, Unix, Windows.

If you have a sister or parent or 3-year-old child (they're all equivalent for this purpose ), just imagine them reading that. If you can't, I'll have my sister call you . Raw numeric permission modes, octal mode notation, and the "umask" business are Unix-specific -- and even Unices supply symbolic ways to specify permissions. chmod is likely the one I hear the most gripes about. Windows heads are looking to change "file attributes", the name "chmod" is gibberish to them, most of the Unix mode bits make no sense under Windows (& contra Guido's optimism, never will) even if you know the secret octal code, and Windows has several attributes (hidden bit, system bit, archive bit) chmod can't get at. The only portable functionality here is the write bit, but no non-Unix person could possibly guess either that chmod is the function they need, or what to type after someone tells them it's chmod.
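For what it's worth, the "umask value is first masked out" sentence Tim quotes can at least be made concrete. A modern-Python demonstration of the Unix behavior (the requested mode is ANDed with the complement of the process umask):

```python
import os
import stat
import tempfile

old = os.umask(0o022)                # a common default umask
try:
    parent = tempfile.mkdtemp()
    target = os.path.join(parent, 'demo')
    os.mkdir(target, 0o777)          # ask for rwxrwxrwx ...
    mode = stat.S_IMODE(os.stat(target).st_mode)
    print(oct(mode))                 # → 0o755, i.e. 0o777 & ~0o022
finally:
    os.umask(old)                    # restore the previous umask
```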
So this is less a doc issue than that more of os needs to become more like os.path (i.e., intelligently named functions with intelligently abstracted interfaces). never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - tim From skip at mojam.com Tue Aug 24 19:21:53 1999 From: skip at mojam.com (Skip Montanaro) Date: Tue, 24 Aug 1999 12:21:53 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000701beedfa$7c5c8e40$902d2399@tim> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> Message-ID: <14274.53860.210265.71990@dolphin.mojam.com> Tim> chmod is likely the one I hear the most gripes about. Windows Tim> heads are looking to change "file attributes", the name "chmod" is Tim> gibberish to them Well, we could confuse everyone and rename "chmod" to "chfat" (is that like file system liposuction?). Windows probably has an equivalent function whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Tim> most of the Unix mode bits make no sense under Windows (& contra Tim> Guido's optimism, never will) even if you know the secret octal Tim> code ... It beats a secret handshake. Imagine all the extra peripherals we'd have to make available for everyone's computer. ;-) Tim> So this is less a doc issue than that more of os needs to become Tim> more like os.path (i.e., intelligently named functions with Tim> intelligently abstracted interfaces). Hasn't Guido's position been that the interface modules like os, posix, etc are just a thin layer over the underlying API (Guido: note how I cleverly attributed this position to you but also placed the responsibility for correctness on your head!)? If that's the case, perhaps we should provide a slightly higher level module that abstracts the file system as objects, and adopts a more user-friendly approach to the secret octal codes. 
Those of us worried about job security could continue to use the lower level module and leave the higher level interface for former Visual Basic programmers. Tim> never-grasped-what-ken-thompson-had-against-trailing-"e"s-ly y'rs - maybe-the-"e"-key-stuck-on-his-TTY-ly y'rs... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Tue Aug 24 20:21:44 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 24 Aug 1999 14:21:44 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> References: <14273.24719.865520.797568@weyr.cnri.reston.va.us> <000701beedfa$7c5c8e40$902d2399@tim> <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Skip Montanaro writes: > whose name is 17 characters long which we'd all love to type, I'm sure. ;-) Just 17? ;-) > Tim> So this is less a doc issue than that more of os needs to become > Tim> more like os.path (i.e., intelligently named functions with > Tim> intelligently abstracted interfaces). Sounds like some doc improvements can really help improve things, at least in the short term. > correctness on your head!)? If that's the case, perhaps we should provide a > slightly higher level module that abstracts the file system as objects, and > adopts a more user-friendly approach to the secret octal codes. Those of us I'm all for an object interface to a logical filesystem; having had to write just such a thing in Java not long ago, and we have a similar construct in Python (not by me, though), that we use in our Knowbot work. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From tim_one at email.msn.com Wed Aug 25 09:02:21 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:02:21 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.53860.210265.71990@dolphin.mojam.com> Message-ID: <000801beeec7$c6f06b20$fc2d153f@tim> [Skip Montanaro] > Well, we could confuse everyone and rename "chmod" to "chfat" ... I don't want to rename anything, nor do I want to use MS-specific names. chmod is both the wrong spelling & the wrong functionality for all non-Unix systems. os.path did a Good Thing by, e.g., introducing getmtime(), despite that everyone knows it's just os.stat()[8]. New isreadonly(path) and setreadonly(path) are more what I'm after; nothing beyond that is portable, & never will be. > Windows probably has an equivalent function whose name is 17 > characters long Indeed, SetFileAttributes is exactly 17 characters long (you moonlighting on NT, Skip?!). But while Windows geeks would like to use that, it's both the wrong spelling & the wrong functionality for all non-Windows systems. > ... > Hasn't Guido's position been that the interface modules like os, > posix, etc are just a thin layer over the underlying API (Guido: > note how I cleverly attributed this position to you but also placed > the responsibility for correctness on your head!)? If that's the > case, perhaps we should provide a slightly higher level module that > abstracts the file system as objects, and adopts a more user-friendly > approach to the secret octal codes. Like that, yes. > Those of us worried about job security could continue to use the > lower level module and leave the higher level interface for former > Visual Basic programmers. You're just *begging* Guido to make the Python2 os module take all of its names from the Win32 API . 
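Tim's isreadonly()/setreadonly() never existed in the os module; here is one hedged sketch of what they might look like on top of what does exist, taking "read-only" to mean "no write-permission bit set" -- the one portable bit, and the one Python maps the Windows read-only attribute onto:

```python
import os
import stat

# hypothetical functions, sketched with os.stat()/os.chmod()
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def isreadonly(path):
    """True if no write-permission bit is set on 'path'."""
    return not (os.stat(path).st_mode & WRITE_BITS)

def setreadonly(path):
    """Clear every write-permission bit on 'path', leaving the rest."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)
```

This checks and changes the permission bits only; it deliberately says nothing about group schemes, ACLs, or the other Windows attribute bits, which is exactly the point of keeping the portable interface small.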
it's-no-lamer-to-be-ignorant-of-unix-names-than-it-is- to-be-ignorant-of-chinese-ly y'rs - tim From tim_one at email.msn.com Wed Aug 25 09:05:31 1999 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 25 Aug 1999 03:05:31 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14274.58040.138331.413958@weyr.cnri.reston.va.us> Message-ID: <000901beeec8$380d05c0$fc2d153f@tim> [Fred L. Drake, Jr.] > ... > I'm all for an object interface to a logical filesystem; having > had to write just such a thing in Java not long ago, and we have > a similar construct in Python (not by me, though), that we use in > our Knowbot work. Well, don't read anything unintended into this, but Guido *is* out of town, and you *do* have the power to check in code outside the doc subtree ... barry-will-help-he's-been-itching-to-revolt-too-ly y'rs - tim From bwarsaw at cnri.reston.va.us Wed Aug 25 13:20:16 1999 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 25 Aug 1999 07:20:16 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14275.53616.585669.890621@anthem.cnri.reston.va.us> >>>>> "TP" == Tim Peters writes: TP> Well, don't read anything unintended into this, but Guido *is* TP> out of town, and you *do* have the power to check in code TP> outside the doc subtree ... TP> barry-will-help-he's-been-itching-to-revolt-too-ly y'rs I'll bring the pitchforks if you bring the torches! 
-Barry From skip at mojam.com Wed Aug 25 17:17:35 1999 From: skip at mojam.com (Skip Montanaro) Date: Wed, 25 Aug 1999 10:17:35 -0500 (CDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.2229.983969.228891@dolphin.mojam.com> > I'm all for an object interface to a logical filesystem; having had to > write just such a thing in Java not long ago, and we have a similar > construct in Python (not by me, though), that we use in our Knowbot > work. Fred, Since this is the dev group, how about showing us the Knowbot's logical filesystem API, and let's do some dev-ing... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From fdrake at acm.org Wed Aug 25 18:22:52 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:22:52 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000801beeec7$c6f06b20$fc2d153f@tim> References: <14274.53860.210265.71990@dolphin.mojam.com> <000801beeec7$c6f06b20$fc2d153f@tim> Message-ID: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Tim Peters writes: > os.path did a Good Thing by, e.g., introducing getmtime(), despite that > everyone knows it's just os.stat()[8]. New isreadonly(path) and > setreadonly(path) are more what I'm after; nothing beyond that is portable, Tim, I think we can simply declare that isreadonly() checks that the file doesn't allow the user to read it, but setreadonly() sounds to me like it wouldn't be portable to Unix. There's more than one (reasonable) way to make a file unreadable to a user just by manipulating permission bits, and which is best will vary according to both the user and the file's existing permissions. -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 18:26:25 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 12:26:25 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <000901beeec8$380d05c0$fc2d153f@tim> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> Message-ID: <14276.6449.428851.402955@weyr.cnri.reston.va.us> Tim Peters writes: > Well, don't read anything unintended into this, but Guido *is* out > of town, and you *do* have the power to check in code outside the > doc subtree ... Good thing I turned off the python-checkins list when I added the curly bracket patch I've been working on! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Wed Aug 25 20:46:30 1999 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 25 Aug 1999 14:46:30 -0400 (EDT) Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.2229.983969.228891@dolphin.mojam.com> References: <14274.58040.138331.413958@weyr.cnri.reston.va.us> <000901beeec8$380d05c0$fc2d153f@tim> <14276.2229.983969.228891@dolphin.mojam.com> Message-ID: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Skip Montanaro writes: > Since this is the dev group, how about showing us the Knowbot's logical > filesystem API, and let's do some dev-ing... Well, I took a look at it, and I must confess it's just not really different from the set of interfaces in the os module; the important point is that they are methods instead of functions (other than a few data items: sep, pardir, curdir). The path attribute provided the same interface as os.path. Its only user-visible state is the current-directory setting, which may or may not be that useful. We left off chmod(), which would make Tim happy, but that was only because it wasn't meaningful in context.
We'd have to add it (or something equivalent) for a general purpose filesystem object. So Tim's only happy if he can come up with a general interface that is actually portable (consider my earlier comments on setreadonly()). On the other hand, you don't need chmod() or anything like it for most situations where a filesystem object would be useful. An FTPFilesystem class would not be hard to write! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From jack at oratrix.nl Wed Aug 25 23:43:16 1999 From: jack at oratrix.nl (Jack Jansen) Date: Wed, 25 Aug 1999 23:43:16 +0200 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: Message by "Fred L. Drake, Jr." , Wed, 25 Aug 1999 12:22:52 -0400 (EDT) , <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <19990825214321.D50AD18BA0F@oratrix.oratrix.nl> But in Python, with its nice high-level datastructures, couldn't we design the Mother Of All File Attribute Calls, which would optionally map functionality from one platform to another? As an example consider the Mac resource fork size. If on unix I did fattrs = os.getfileattributes(filename) rfsize = fattrs.get('resourceforksize') it would raise an exception. If, however, I did rfsize = fattrs.get('resourceforksize', compat=1) I would get a "close approximation", 0. Note that you want some sort of a compat parameter, not a default value, as for some attributes (the various atime/mtime/ctimes, permission bits, etc) you'd get a default based on other file attributes that do exist on the current platform. Hmm, the file-attribute-object idea has the added advantage that you can then use setfileattributes(filename, fattrs) to be sure that you've copied all relevant attributes, independent of the platform you're on. 
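os.getfileattributes() is purely hypothetical -- nothing like it exists -- but the shape of Jack's idea is easy to sketch: a mapping of attribute names whose get() either raises for attributes the platform lacks or, with compat=1, synthesizes a close approximation:

```python
class FileAttributes:
    """Sketch of Jack's hypothetical file-attribute object."""

    def __init__(self, attrs, compat_defaults):
        self._attrs = attrs              # what this platform reports
        self._compat = compat_defaults   # approximations for the rest

    def get(self, name, compat=0):
        if name in self._attrs:
            return self._attrs[name]
        if compat and name in self._compat:
            return self._compat[name]    # e.g. resource fork size -> 0
        raise KeyError(name)             # attribute unknown here

# e.g. on Unix, where a Mac resource fork does not exist:
fattrs = FileAttributes({'size': 1024}, {'resourceforksize': 0})
print(fattrs.get('resourceforksize', compat=1))   # → 0
```

Without compat=1 the same lookup raises, matching the two behaviors Jack distinguishes.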
Mapping permissions takes a bit more (design-) work, with unix having user/group/other only and Windows having full-fledged ACLs (or nothing at all, depending how you look at it:-), but should also be doable. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From Vladimir.Marangozov at inrialpes.fr Thu Aug 26 08:10:01 1999 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 26 Aug 1999 07:10:01 +0100 (NFT) Subject: [Python-Dev] about line numbers In-Reply-To: <199908211534.QAA22392@pukapuka.inrialpes.fr> from "Vladimir Marangozov" at "Aug 21, 99 04:34:32 pm" Message-ID: <199908260610.HAA20304@pukapuka.inrialpes.fr> [me, dropping SET_LINENO] > > I wrote a very rough first implementation of this idea. The files are at: > > http://sirac.inrialpes.fr/~marangoz/python/lineno/ > > ... > > A couple of things that need finalization: > > ... An updated version is available at the same location. I think that this one does The Right Thing (tm). a) Everything is internal to the VM and totally hidden, as it should be. b) No modifications of the code and frame objects (no additional slots) c) The modified code string (used for tracing) is allocated dynamically when the 1st frame pointing to its original switches in trace mode, and is deallocated automatically when the last frame pointing to its original dies. I feel better with this code so I can stop thinking about it and move on :-) (leaving it to your appreciation). What's next? File attributes? ;-) It's not easy to weight what kind of common interface would be easy to grasp, intuitive and unambiguous for the average user. I think that the people on this list (being core developers) are more or less biased here (I'd say more than less). Perhaps some input from the community (c.l.py) would help? 
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Thu Aug 26 07:06:57 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:57 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.14854.366220.664463@weyr.cnri.reston.va.us> Message-ID: <000301beef80$d26158c0$522d153f@tim> [Fred L. Drake, Jr.] > ... > We left off chmod(), which would make Tim happy, but that was only > because it wasn't meaningful in context. I'd be appalled to see chmod go away; for many people it's comfortable and useful. I want *another* way, to do what little bit is portable in a way that doesn't require first mastering a badly designed interface from a dying OS . > We'd have to add it (or something equivalent) for a general purpose > filesystem object. So Tim's only happy if he can come up with a > general interface that is actually portable (consider my earlier > comments on setreadonly()). I don't care about general here; making up a general new way to spell everything that everyone may want to do under every OS would create an interface even worse than chmod's. My sister doesn't want to create files that are read-only to the world but writable to her group -- she just wants to mark certain precious files as read-only to minimize the chance of accidental destruction. What she wants is easy to do under Windows or Unix, and I expect she's the norm rather than the exception. > On the other hand, you don't need chmod() or anything like it for > most situations where a filesystem object would be useful. An > FTPFilesystem class would not be hard to write! An OO filesystem object with a .makereadonly method suits me fine . 
From tim_one at email.msn.com Thu Aug 26 07:06:54 1999 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 26 Aug 1999 01:06:54 -0400 Subject: [Python-Dev] Portable and OS-dependent module idea/proposal/brain fart In-Reply-To: <14276.6236.605103.369339@weyr.cnri.reston.va.us> Message-ID: <000201beef80$d072f640$522d153f@tim> [Fred L. Drake, Jr.] > I think we can simply declare that isreadonly() checks that the > file doesn't allow the user to read it, Had more in mind that the file doesn't allow the user to write it . > but setreadonly() sounds to me like it wouldn't be portable to Unix. > There's more than one (reasonable) way to make a file unreadable to > a user just by manipulating permission bits, and which is best will > vary according to both the user and the file's existing permissions. "Portable" implies least common denominator, and the plain meaning of read-only is that nobody (whether owner, group or world in Unix) has write permission. People wanting something beyond that are going beyond what's portable, and that's fine -- I'm not suggesting getting rid of chmod for Unix dweebs. But by the same token, Windows dweebs should get some other (as non-portable as chmod) way to fiddle the bits important on *their* OS (only one of which chmod can affect). Billions of newbies will delightedly stick to the portable interface with the name that makes sense. the-percentage-of-programmers-doing-systems-programming-shrinks-by- the-millisecond-ly y'rs - tim From mal at lemburg.com Sat Aug 28 16:37:50 1999 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Sat, 28 Aug 1999 16:37:50 +0200 Subject: [Python-Dev] Iterating over dictionaries and objects in general References: <990826114149.ZM59302@rayburn.hcs.tl> <199908261702.NAA01866@eric.cnri.reston.va.us> <37C57E01.2ADC02AE@digicool.com> <990826150216.ZM60002@rayburn.hcs.tl> <37C5BAF1.4D6C1031@lemburg.com> <37C5C320.CF11BC7C@digicool.com> <37C643B0.7ECA586@lemburg.com> <37C69FB3.9CB279C7@digicool.com> Message-ID: <37C7F43E.67EEAB98@lemburg.com> [Followup to a discussion on psa-members about iterating over dictionaries without creating intermediate lists] Jim Fulton wrote: > > "M.-A. Lemburg" wrote: > > > > Jim Fulton wrote: > > > > > > > The problem with the PyDict_Next() approach is that it will only > > > > work reliably from within a single C call. You can't return > > > > to Python between calls to PyDict_Next(), because those could > > > > modify the dictionary causing the next PyDict_Next() call to > > > > fail or core dump. > > > > > > I do this all the time without problem. Basically, you provide an > > > index and if the index is out of range, you simply get an end-of-data return. > > > The only downside of this approach is that you might get "incorrect" > > > results if the dictionary is modified between calls. This isn't > > > all that different from iterating over a list with an index. > > > > Hmm, that's true... but what if the dictionary gets resized > > in between iterations ? The item layout is then likely to > > change, so you could potentially get completely bogus results. > > I think I said that. :) Just wanted to verify my understanding ;-) > > Even iterating over items twice may then occur, I guess. > > Yup. > > Again, this is not so different from iterating over > a list using a range:
>
>     l=range(10)
>     for i in range(len(l)):
>         l.insert(0,'Bruce')
>         print l[i]
>
> This always outputs 'Bruce'. :) Ok, so the "risk" is under user control. Fine with me...
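Jim's example behaves exactly as advertised; here it is runnable in modern Python:

```python
# Every insert at index 0 shifts the original elements right, so by the
# time the loop reads l[i] there are i+1 copies of 'Bruce' at the front
# and l[i] is always the value just inserted.
l = list(range(10))
seen = []
for i in range(len(l)):       # the index range is fixed at 10 up front
    l.insert(0, 'Bruce')
    seen.append(l[i])

print(set(seen))              # → {'Bruce'}
```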
> > Or perhaps via a special dictionary iterator, so that the following > > works: > > > > for item in dictrange(d): > > ... > > Yup. > > > The iterator could then also take some extra actions to insure > > that the dictionary hasn't been resized. > I don't think it should do that. It should simply > stop when it has run out of items. I think I'll give such an iterator a spin. Would be a nice extension to mxTools. BTW, a generic type slot for iterating over types would probably be a nice feature too. The type slot could provide hooks of the form it_first, it_last, it_next, it_prev which all work integer index based, e.g. in pseudo code:

    int i;
    PyObject *item;

    /* set up i and item to point to the first item */
    if (obj.it_first(&i,&item) < 0)
        goto onError;
    while (1) {
        PyObject_Print(item);
        /* move i and item to the next item; an IndexError is
           raised in case there are no more items */
        if (obj.it_next(&i,&item) < 0) {
            PyErr_Clear();
            break;
        }
    }

These slots would cover all problem instances where iteration over non-sequences or non-uniform sequences (i.e. sequences like objects which don't provide convex index sets, e.g. 1,2,3,6,7,8,11,12) is required, e.g. dictionaries, multi-segment buffers -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 127 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gward at cnri.reston.va.us Mon Aug 30 21:02:22 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:02:22 -0400 Subject: [Python-Dev] Portable "spawn" module for core? Message-ID: <19990830150222.B428@cnri.reston.va.us> Hi all -- it recently occurred to me that the 'spawn' module I wrote for the Distutils (and which Perry Stoll extended to handle NT), could fit nicely in the core library. On Unix, it's just a front-end to fork-and-exec; on NT, it's a front-end to spawnv().
In either case, it's just enough code (and just tricky enough code) that not everybody should have to duplicate it for their own uses. The basic idea is this:

    from spawn import spawn
    ...
    spawn (['cmd', 'arg1', 'arg2'])
    # or
    spawn (['cmd'] + args)

you get the idea: it takes a *list* representing the command to spawn: no strings to parse, no shells to get in the way, no sneaky meta-characters ruining your day, draining your efficiency, or compromising your security. (Conversely, no pipelines, redirection, etc.) The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' depending on os.name. Additionally, it takes a couple of optional keyword arguments (all booleans): 'search_path', 'verbose', and 'dry_run', which do pretty much what you'd expect. The module as it's currently in the Distutils code is attached. Let me know what you think... Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:11:50 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:11:50 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <14282.54880.922571.792484@dolphin.mojam.com> Greg> it recently occurred to me that the 'spawn' module I wrote for the Greg> Distutils (and which Perry Stoll extended to handle NT), could fit Greg> nicely in the core library. How's spawn.spawn semantically different from the Windows-dependent os.spawn? How are stdout/stdin/stderr connected to the child process - just like fork+exec or something slightly higher level like os.popen?
If it's semantically like os.spawn and a little bit higher level abstraction than fork+exec, I'd vote for having the os module simply import it: from spawn import spawn and thus make that function more widely available... Greg> The module as it's currently in the Distutils code is attached. Not in the message I saw... Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From gward at cnri.reston.va.us Mon Aug 30 21:14:57 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:14:57 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830150222.B428@cnri.reston.va.us>; from Greg Ward on Mon, Aug 30, 1999 at 03:02:22PM -0400 References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <19990830151457.C428@cnri.reston.va.us> On 30 August 1999, To python-dev at python.org said: > The module as it's currently in the Distutils code is attached. Let me > know what you think... New definition of "attached": I'll just reply to my own message with the code I meant to attach. D'oh! ------------------------------------------------------------------------

"""distutils.spawn

Provides the 'spawn()' function, a front-end to various platform-specific
functions for launching another program in a sub-process."""

# created 1999/07/24, Greg Ward

__rcsid__ = "$Id: spawn.py,v 1.2 1999/08/29 18:20:56 gward Exp $"

import sys, os, string
from distutils.errors import *


def spawn (cmd, search_path=1, verbose=0, dry_run=0):
    """Run another program, specified as a command list 'cmd', in a new
    process.  'cmd' is just the argument list for the new process, ie.
    cmd[0] is the program to run and cmd[1:] are the rest of its
    arguments.  There is no way to run a program with a name different
    from that of its executable.
If 'search_path' is true (the default), the system's executable search path will be used to find the program; otherwise, cmd[0] must be the exact path to the executable. If 'verbose' is true, a one-line summary of the command will be printed before it is run. If 'dry_run' is true, the command will not actually be run. Raise DistutilsExecError if running the program fails in any way; just return on success.""" if os.name == 'posix': _spawn_posix (cmd, search_path, verbose, dry_run) elif os.name in ( 'nt', 'windows' ): # ??? _spawn_nt (cmd, search_path, verbose, dry_run) else: raise DistutilsPlatformError, \ "don't know how to spawn programs on platform '%s'" % os.name # spawn () def _spawn_nt ( cmd, search_path=1, verbose=0, dry_run=0): import string executable = cmd[0] if search_path: paths = string.split( os.environ['PATH'], os.pathsep) base,ext = os.path.splitext(executable) if (ext != '.exe'): executable = executable + '.exe' if not os.path.isfile(executable): paths.reverse() # go over the paths and keep the last one for p in paths: f = os.path.join( p, executable ) if os.path.isfile ( f ): # the file exists, we have a shot at spawn working executable = f if verbose: print string.join ( [executable] + cmd[1:], ' ') if not dry_run: # spawn for NT requires a full path to the .exe rc = os.spawnv (os.P_WAIT, executable, cmd) if rc != 0: raise DistutilsExecError("command failed: %d" % rc) def _spawn_posix (cmd, search_path=1, verbose=0, dry_run=0): if verbose: print string.join (cmd, ' ') if dry_run: return exec_fn = search_path and os.execvp or os.execv pid = os.fork () if pid == 0: # in the child try: #print "cmd[0] =", cmd[0] #print "cmd =", cmd exec_fn (cmd[0], cmd) except OSError, e: sys.stderr.write ("unable to execute %s: %s\n" % (cmd[0], e.strerror)) os._exit (1) sys.stderr.write ("unable to execute %s for unknown reasons" % cmd[0]) os._exit (1) else: # in the parent # Loop until the child either exits or is terminated by a signal # (ie. 
keep waiting if it's merely stopped) while 1: (pid, status) = os.waitpid (pid, 0) if os.WIFSIGNALED (status): raise DistutilsExecError, \ "command %s terminated by signal %d" % \ (cmd[0], os.WTERMSIG (status)) elif os.WIFEXITED (status): exit_status = os.WEXITSTATUS (status) if exit_status == 0: return # hey, it succeeded! else: raise DistutilsExecError, \ "command %s failed with exit status %d" % \ (cmd[0], exit_status) elif os.WIFSTOPPED (status): continue else: raise DistutilsExecError, \ "unknown error executing %s: termination status %d" % \ (cmd[0], status) # _spawn_posix () ------------------------------------------------------------------------ -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From gward at cnri.reston.va.us Mon Aug 30 21:31:55 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 15:31:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <14282.54880.922571.792484@dolphin.mojam.com>; from Skip Montanaro on Mon, Aug 30, 1999 at 02:11:50PM -0500 References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> Message-ID: <19990830153155.D428@cnri.reston.va.us> On 30 August 1999, Skip Montanaro said: > > Greg> it recently occured to me that the 'spawn' module I wrote for the > Greg> Distutils (and which Perry Stoll extended to handle NT), could fit > Greg> nicely in the core library. > > How's spawn.spawn semantically different from the Windows-dependent > os.spawn? My understanding (purely from reading Perry's code!) is that the Windows spawnv() and spawnve() calls require the full path of the executable, and there is no spawnvp(). Hence, the bulk of Perry's '_spawn_nt()' function is code to search the system path if the 'search_path' flag is true. 
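[Editorial sketch: the path search that makes up the bulk of '_spawn_nt()' can be pulled out into a standalone helper, sketched here in present-day Python. 'find_executable' is an illustrative name, not part of the posted module; this version stops at the first hit instead of reversing the list:]

```python
import os

def find_executable(name, path=None):
    """Search the directories in 'path' (default: os.environ['PATH'])
    for 'name'; return the first full path found, or None.

    Like the posted _spawn_nt(), this only checks that a regular file
    exists -- it does not check execute permission.  On NT you would
    also try appending '.exe' before searching."""
    if path is None:
        path = os.environ.get('PATH', os.defpath)
    if os.sep in name:
        # Already a path: no search needed.
        return name if os.path.isfile(name) else None
    for p in path.split(os.pathsep):
        candidate = os.path.join(p, name)
        if os.path.isfile(candidate):
            return candidate  # first hit wins
    return None
```

[With a helper like this, both _spawn_nt() and a dry_run mode can resolve the executable up front and report exactly what would be run.]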
In '_spawn_posix()', I just use either 'execv()' or 'execvp()' for this. The bulk of my code is the complicated dance required to wait for a fork'ed child process to finish. > How are stdout/stdin/stderr connected to the child process - just > like fork+exec or something slightly higher level like os.popen? Just like fork 'n exec -- '_spawn_posix()' is just a front end to fork and exec (either execv or execvp). In a previous life, I *did* implement a spawning module for a certain other popular scripting language that handles redirection and capturing (backticks in the shell and that other scripting language). It was a lot of fun, but pretty hairy. Took three attempts gradually developed over two years to get it right in the end. In fact, it does all the easy stuff that a Unix shell does in spawning commands, ie. search the path, fork 'n exec, and redirection and capturing. Doesn't handle the tricky stuff, ie. pipelines and job control. The documentation for this module is 22 pages long; the code is 600+ lines of somewhat tricky Perl (1300 lines if you leave in comments and blank lines). That's why the Distutils spawn module doesn't do anything with std{out,err,in}. > If it's semantically like os.spawn and a little bit higher level > abstraction than fork+exec, I'd vote for having the os module simply > import it: So os.spawnv and os.spawnve would be Windows-specific, but os.spawn portable? Could be confusing. And despite the recent extended discussion of the os module, I'm not sure if this fits the model. BTW, is there anything like this on the Mac? On what other OSs does it even make sense to talk about programs spawning other programs? (Surely those GUI user interfaces have to do *something*...) 
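[Editorial sketch: the "complicated dance" Greg mentions comes down to decoding the status word that waitpid() returns. Just that step, isolated, assuming the POSIX status encoding; 'decode_status' is an illustrative name, and the branch order matches the posted _spawn_posix():]

```python
import os

def decode_status(status):
    """Classify a raw waitpid() status word using the os.WIF* macros,
    as _spawn_posix() does inline."""
    if os.WIFSIGNALED(status):
        return ('signaled', os.WTERMSIG(status))   # terminated by a signal
    elif os.WIFEXITED(status):
        return ('exited', os.WEXITSTATUS(status))  # normal exit, with status
    elif os.WIFSTOPPED(status):
        return ('stopped', os.WSTOPSIG(status))    # merely stopped: keep waiting
    else:
        return ('unknown', status)

print(decode_status(0))  # -> ('exited', 0): a zero status word is a clean exit
```

[The loop in _spawn_posix() is then just: wait, decode, and either return, raise, or (if stopped) wait again.]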
Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From skip at mojam.com Mon Aug 30 21:52:49 1999 From: skip at mojam.com (Skip Montanaro) Date: Mon, 30 Aug 1999 14:52:49 -0500 (CDT) Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> <14282.54880.922571.792484@dolphin.mojam.com> <19990830153155.D428@cnri.reston.va.us> Message-ID: <14282.57574.918011.54595@dolphin.mojam.com> Greg> BTW, is there anything like this on the Mac? There will be, once Jack Jansen contributes _spawn_mac... ;-) Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented... From jack at oratrix.nl Mon Aug 30 23:25:04 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:25:04 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 15:31:55 -0400 , <19990830153155.D428@cnri.reston.va.us> Message-ID: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Recently, Greg Ward said: > BTW, is there anything like this on the Mac? On what other OSs does it > even make sense to talk about programs spawning other programs? (Surely > those GUI user interfaces have to do *something*...) Yes, but the interface is quite a bit more high-level, so it's pretty difficult to reconcile with the Unix and Windows "every argument is a string" paradigm. You start the process and pass along an AppleEvent (basically an RPC-call) that will be presented to the program upon startup. 
So on the mac there's a serious difference between (inventing the API interface here, cut down to make it understandable to non-macheads:-) spawn("netscape", ("Open", "file.html")) and spawn("netscape", ("OpenURL", "http://foo.com/file.html")) The mac interface is (of course:-) infinitely more powerful, allowing you to talk to running apps, addressing stuff in it as COM/OLE does, etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is impossible to represent in a meaningful way. Add to that the fact that there's no stdin/stdout/stderr and there's little common ground. The one area of common ground is "run program X on files Y and Z and wait (or don't wait) for completion", so that is something that could maybe have a special method that could be implemented on all three mentioned platforms (and probably everything else as well). And even then it'll be surprising to Mac users that they have to _exit_ their editor (if you specify wait), not something people commonly do. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at CNRI.Reston.VA.US Mon Aug 30 23:29:55 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:29:55 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 23:25:04 +0200." <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> Message-ID: <199908302129.RAA08442@eric.cnri.reston.va.us> > Recently, Greg Ward said: > > BTW, is there anything like this on the Mac? On what other OSs does it > > even make sense to talk about programs spawning other programs? (Surely > > those GUI user interfaces have to do *something*...)
> > Yes, but the interface is quite a bit more high-level, so it's pretty > difficult to reconcile with the Unix and Windows "every argument is a > string" paradigm. You start the process and pass along an AppleEvent > (basically an RPC-call) that will be presented to the program upon > startup. > > So on the mac there's a serious difference between (inventing the API > interface here, cut down to make it understandable to non-macheads:-) > spawn("netscape", ("Open", "file.html")) > and > spawn("netscape", ("OpenURL", "http://foo.com/file.html")) > > The mac interface is (of course:-) infinitely more powerful, allowing > you to talk to running apps, addressing stuff in it as COM/OLE does, > etc. but unfortunately the simple case of spawn("rm", "-rf", "/") is > impossible to represent in a meaningful way. > > Add to that the fact that there's no stdin/stdout/stderr and there's > little common ground. The one area of common ground is "run program X > on files Y and Z and wait (or don't wait) for completion", so that is > something that could maybe have a special method that could be > implemented on all three mentioned platforms (and probably everything > else as well). And even then it'll be surprising to Mac users that > they have to _exit_ their editor (if you specify wait), not something > people commonly do. Indeed. I'm guessing that Greg wrote his code specifically to drive compilers, not so much to invoke an editor on a specific file. It so happens that the Windows compilers have command lines that look sufficiently like the Unix compilers that this might actually work. On the Mac, driving the compilers is best done using AppleEvents, so it's probably better not to try to abuse the spawn() interface for that... (Greg, is there a higher level where the compiler actions are described without referring to specific programs, but perhaps just to compiler actions and input and output files?)
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at CNRI.Reston.VA.US Mon Aug 30 23:35:45 1999 From: guido at CNRI.Reston.VA.US (Guido van Rossum) Date: Mon, 30 Aug 1999 17:35:45 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Your message of "Mon, 30 Aug 1999 15:02:22 EDT." <19990830150222.B428@cnri.reston.va.us> References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <199908302135.RAA08467@eric.cnri.reston.va.us> > it recently occurred to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). In either case, > it's just enough code (and just tricky enough code) that not everybody > should have to duplicate it for their own uses. > > The basic idea is this: > > from spawn import spawn > ... > spawn (['cmd', 'arg1', 'arg2']) > # or > spawn (['cmd'] + args) > > you get the idea: it takes a *list* representing the command to spawn: > no strings to parse, no shells to get in the way, no sneaky > meta-characters ruining your day, draining your efficiency, or > compromising your security. (Conversely, no pipelines, redirection, > etc.) > > The 'spawn()' function just calls '_spawn_posix()' or '_spawn_nt()' > depending on os.name. Additionally, it takes a couple of optional > keyword arguments (all booleans): 'search_path', 'verbose', and > 'dry_run', which do pretty much what you'd expect. > > The module as it's currently in the Distutils code is attached. Let me > know what you think... I'm not sure that the verbose and dry_run options belong in the standard library. When both are given, this does something semi-useful; for Posix that's basically just printing the arguments, while for NT it prints the exact command that will be executed. Not sure if that's significant though.
Perhaps it's better to extract the code that runs the path to find the right executable and make that into a separate routine. (Also, rather than reversing the path, I would break out of the loop at the first hit.) --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at cnri.reston.va.us Mon Aug 30 23:38:36 1999 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 30 Aug 1999 17:38:36 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <199908302129.RAA08442@eric.cnri.reston.va.us>; from Guido van Rossum on Mon, Aug 30, 1999 at 05:29:55PM -0400 References: <19990830212509.7F5C018B9FB@oratrix.oratrix.nl> <199908302129.RAA08442@eric.cnri.reston.va.us> Message-ID: <19990830173836.F428@cnri.reston.va.us> On 30 August 1999, Guido van Rossum said: > Indeed. I'm guessing that Greg wrote his code specifically to drive > compilers, not so much to invoke an editor on a specific file. It so > happens that the Windows compilers have command lines that look > sufficiently like the Unix compilers that this might actually work. Correct, but the spawn module I posted should work for any case where you want to run an external command synchronously without redirecting I/O. (And it could probably be extended to handle those cases, but a) I don't need them for Distutils [yet!], and b) I don't know how to do it portably.) > On the Mac, driving the compilers is best done using AppleEvents, so > it's probably better to to try to abuse the spawn() interface for > that... (Greg, is there a higher level where the compiler actions are > described without referring to specific programs, but perhaps just to > compiler actions and input and output files?) [off-topic alert... probably belongs on distutils-sig, but there you go] Yes, my CCompiler class is all about providing a (hopefully) compiler- and platform-neutral interface to a C/C++ compiler. 
Currently there're only two concrete subclasses of this: UnixCCompiler and MSVCCompiler, and they both obviously use spawn, because Unix C compilers and MSVC both provide that kind of interface. A hypothetical sibling class that provides an interface to some Mac C compiler might use a souped-up spawn that "knows about" Apple Events, or it might use some other interface to Apple Events. If Jack's simplified summary of what passing Apple Events to a command looks like is accurate, maybe spawn can be souped up to work on the Mac. Or we might need a dedicated module for running Mac programs. So does anybody have code to run external programs on the Mac using Apple Events? Would it be possible/reasonable to add that as '_spawn_mac()' to my spawn module? Greg -- Greg Ward - software developer gward at cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913 From jack at oratrix.nl Mon Aug 30 23:52:29 1999 From: jack at oratrix.nl (Jack Jansen) Date: Mon, 30 Aug 1999 23:52:29 +0200 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: Message by Greg Ward , Mon, 30 Aug 1999 17:38:36 -0400 , <19990830173836.F428@cnri.reston.va.us> Message-ID: <19990830215234.ED4E718B9FB@oratrix.oratrix.nl> Hmm, if we're talking a "Python Make" or some such here the best way would probably be to use Tool Server. Tool Server is a thing that is based on Apple's old MPW programming environment, that is still supported by compiler vendors like MetroWerks. The nice thing of Tool Server for this type of work is that it _is_ command-line based, so you can probably send it things like spawn("cc", "-O", "test.c") But, although I know it is possible to do this with ToolServer, I haven't a clue on how to do it... 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From tim_one at email.msn.com Tue Aug 31 07:44:18 1999 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 31 Aug 1999 01:44:18 -0400 Subject: [Python-Dev] Portable "spawn" module for core? In-Reply-To: <19990830153155.D428@cnri.reston.va.us> Message-ID: <000101bef373$de2974c0$932d153f@tim> [Greg Ward] > ... > In a previous life, I *did* implement a spawning module for > a certain other popular scripting language that handles > redirection and capturing (backticks in the shell and that other > scripting language). It was a lot of fun, but pretty hairy. Took > three attempts gradually developed over two years to get it right > in the end. In fact, it does all the easy stuff that a Unix shell > does in spawning commands, ie. search the path, fork 'n exec, and > redirection and capturing. Doesn't handle the tricky stuff, ie. > pipelines and job control. > > The documentation for this module is 22 pages long; the code > is 600+ lines of somewhat tricky Perl (1300 lines if you leave > in comments and blank lines). That's why the Distutils spawn > module doesn't do anything with std{out,err,in}. Note that win/tclWinPipe.c-- which contains the Windows-specific support for Tcl's "exec" cmd --is about 3,200 lines of C. It does handle pipelines and redirection, and even fakes pipes as needed with temp files when it can identify a pipeline component as belonging to the 16-bit subsystem. Even so, the Tcl help page for "exec" bristles with hilarious caveats under the Windows subsection; e.g., When redirecting from NUL:, some applications may hang, others will get an infinite stream of "0x01" bytes, and some will actually correctly get an immediate end-of-file; the behavior seems to depend upon something compiled into the application itself. 
When redirecting greater than 4K or so to NUL:, some applications will hang. The above problems do not happen with 32-bit applications. Still, people seem very happy with Tcl's exec, and I'm certain no language tries harder to provide a portable way to "do command lines". Two points to that: 1) If Python ever wants to do something similar, let's steal the Tcl code (& unlike stealing Perl's code, stealing Tcl's code actually looks possible -- it's very much better organized and written). 2) For all its heroic efforts to hide platform limitations, int Tcl_ExecObjCmd(dummy, interp, objc, objv) ClientData dummy; /* Not used. */ Tcl_Interp *interp; /* Current interpreter. */ int objc; /* Number of arguments. */ Tcl_Obj *CONST objv[]; /* Argument objects. */ { #ifdef MAC_TCL Tcl_AppendResult(interp, "exec not implemented under Mac OS", (char *)NULL); return TCL_ERROR; #else ... a-generalized-spawn-is-a-good-start-ly y'rs - tim From fredrik at pythonware.com Tue Aug 31 08:39:56 1999 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 31 Aug 1999 08:39:56 +0200 Subject: [Python-Dev] Portable "spawn" module for core? References: <19990830150222.B428@cnri.reston.va.us> Message-ID: <005101bef37b$b0415070$f29b12c2@secret.pythonware.com> Greg Ward wrote: > it recently occurred to me that the 'spawn' module I wrote for the > Distutils (and which Perry Stoll extended to handle NT), could fit > nicely in the core library. On Unix, it's just a front-end to > fork-and-exec; on NT, it's a front-end to spawnv(). any reason this couldn't go into the os module instead? just add parts of it to os.py, and change the docs to say that spawn* are supported on Windows and Unix... (supporting the full set of spawn* primitives would of course be nice, btw. just like os.py provides all exec variants...)